Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/24 20:16
To version 2.3
edited by Robert Schaub
on 2025/12/24 20:16
Change comment:
Update document after refactoring.
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Parent
... ...
@@ -1,1 +1,1 @@
1 -Test.FactHarbor.Specification.POC.WebHome
1 +WebHome
- Content
... ...
@@ -18,9 +18,11 @@
18 18 === 1.1 What POC Tests ===
19 19
20 20 **Core Question:**
21 +
21 21 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
22 22
23 23 **What we're proving:**
25 +
24 24 * AI can identify factual claims from text
25 25 * AI can evaluate those claims and produce verdicts
26 26 * Output is comprehensible and useful
... ...
@@ -27,6 +27,7 @@
27 27 * Fully automated approach is viable
28 28
29 29 **What we're NOT testing:**
32 +
30 30 * Scenario generation (deferred to POC2)
31 31 * Evidence display (deferred to POC2)
32 32 * Production scalability
... ...
@@ -40,6 +40,7 @@
40 40 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
41 41
42 42 **Rationale:**
46 +
43 43 * **POC1 tests:** Can AI extract claims and generate verdicts?
44 44 * **POC2 will add:** Scenario generation and management
45 45 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
... ...
@@ -51,6 +51,7 @@
51 51 **No Risk:**
52 52
53 53 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58 +
54 54 * Faster POC1 validation
55 55 * Learning from POC1 to inform scenario design
56 56 * Iterative approach: fail fast if basic AI doesn't work
... ...
@@ -57,14 +57,10 @@
57 57 * Flexibility to adjust scenario architecture based on POC1 insights
58 58
59 59 **Full System Workflow (Future):**
60 -{{code}}
61 -Claims → Scenarios → Evidence → Verdicts
62 -{{/code}}
65 +{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
63 63
64 64 **POC1 Simplified Workflow:**
65 -{{code}}
66 -Claims → Verdicts (scenarios implicit in reasoning)
67 -{{/code}}
68 +{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
68 68
69 69 == 2. POC Output Specification ==
70 70
... ...
@@ -75,6 +75,7 @@
75 75 **Length:** 4-6 sentences
76 76
77 77 **Content (Required Elements):**
79 +
78 78 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
79 79 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
80 80 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
... ...
@@ -84,30 +84,28 @@
84 84 **Critical Innovation:**
85 85
86 86 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89 +
87 87 * Make accurate supporting facts but draw unsupported conclusions
88 88 * Have one false central claim that invalidates the whole argument
89 89 * Misframe accurate information to mislead
90 90
91 91 **Good Example (Context-Aware):**
92 -{{code}}
93 -This article argues that coffee cures cancer based on its antioxidant
95 +{{code}}This article argues that coffee cures cancer based on its antioxidant
94 94 content. We analyzed 3 factual claims: 2 about coffee's chemical
95 95 properties are well-supported, but the main causal claim is refuted
96 96 by current evidence. The article confuses correlation with causation.
97 97 Overall assessment: MISLEADING - makes an unsupported medical claim
98 -despite citing some accurate facts.
99 -{{/code}}
100 +despite citing some accurate facts.{{/code}}
100 100
101 101 **Poor Example (Simple Aggregation - Don't Do This):**
102 -{{code}}
103 -This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 -Overall assessment: mostly accurate (67% accurate).
105 -{{/code}}
103 +{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 +Overall assessment: mostly accurate (67% accurate).{{/code}}
106 106 ↑ This misses that the refuted claim IS the article's main point!
107 107
108 108 **What POC1 Tests:**
109 109
110 110 Can AI identify and assess:
110 +
111 111 * ✅ The article's main thesis/conclusion?
112 112 * ✅ Which claims are central vs. supporting?
113 113 * ✅ Whether the evidence supports the conclusion?
... ...
@@ -116,6 +116,7 @@
116 116 **If AI Cannot Do This:**
117 117
118 118 That's valuable to learn in POC1! We'll:
119 +
119 119 * Note as limitation
120 120 * Fall back to simple aggregation with warning
121 121 * Design explicit article-level analysis for POC2
... ...
@@ -126,19 +126,18 @@
126 126 **Format:** Numbered list
127 127 **Quantity:** 3-5 claims
128 128 **Requirements:**
130 +
129 129 * Factual claims only (not opinions/questions)
130 130 * Clearly stated
131 131 * Automatically extracted by AI
132 132
133 133 **Example:**
134 -{{code}}
135 -CLAIMS IDENTIFIED:
136 +{{code}}CLAIMS IDENTIFIED:
136 136
137 137 [1] Coffee reduces diabetes risk by 30%
138 138 [2] Coffee improves heart health
139 139 [3] Decaf has same benefits as regular
140 -[4] Coffee prevents Alzheimer's completely
141 -{{/code}}
141 +[4] Coffee prevents Alzheimer's completely{{/code}}
142 142
143 143 === 2.3 Component 3: CLAIMS VERDICTS ===
144 144
... ...
@@ -146,6 +146,7 @@
146 146 **Format:** Per claim structure
147 147
148 148 **Required Elements:**
149 +
149 149 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
150 150 * **Confidence Score:** 0-100%
151 151 * **Brief Reasoning:** 1-3 sentences explaining why
... ...
@@ -152,8 +152,7 @@
152 152 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
153 153
154 154 **Example:**
155 -{{code}}
156 -VERDICTS:
156 +{{code}}VERDICTS:
157 157
158 158 [1] WELL-SUPPORTED (85%) [Risk: C]
159 159 Multiple studies confirm 25-30% risk reduction with regular consumption.
... ...
@@ -165,10 +165,10 @@
165 165 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166 166
167 167 [4] REFUTED (90%) [Risk: B]
168 -No evidence for complete prevention. Claim is significantly overstated.
169 -{{/code}}
168 +No evidence for complete prevention. Claim is significantly overstated.{{/code}}
170 170
171 171 **Risk Tier Display:**
171 +
172 172 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 173 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 174 * **Tier C (Green):** Low Risk - Facts/Definitions/History
... ...
@@ -182,13 +182,11 @@
182 182 **Tone:** Neutral (article's position, not FactHarbor's analysis)
183 183
184 184 **Example:**
185 -{{code}}
186 -ARTICLE SUMMARY:
185 +{{code}}ARTICLE SUMMARY:
187 187
188 188 Health News Today article discusses coffee benefits, citing studies
189 189 on diabetes and Alzheimer's. Author highlights research linking coffee
190 -to disease prevention. Recommends 2-3 cups daily for optimal health.
191 -{{/code}}
189 +to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
192 192
193 193 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
194 194
... ...
@@ -195,6 +195,7 @@
195 195 **What:** LLM usage metrics for cost optimization and scaling decisions
196 196
197 197 **Purpose:**
196 +
198 198 * Understand cost per analysis
199 199 * Identify optimization opportunities
200 200 * Project costs at scale
... ...
@@ -201,8 +201,7 @@
201 201 * Inform architecture decisions
202 202
203 203 **Display Format:**
204 -{{code}}
205 -USAGE STATISTICS:
203 +{{code}}USAGE STATISTICS:
206 206 • Article: 2,450 words (12,300 characters)
207 207 • Input tokens: 15,234
208 208 • Output tokens: 892
... ...
@@ -210,17 +210,18 @@
210 210 • Estimated cost: $0.24 USD
211 211 • Response time: 8.3 seconds
212 212 • Cost per claim: $0.048
213 -• Model: claude-sonnet-4-20250514
214 -{{/code}}
211 +• Model: claude-sonnet-4-20250514{{/code}}
215 215
216 216 **Why This Matters:**
217 217
218 218 At scale, LLM costs are critical:
216 +
219 219 * 10,000 articles/month ≈ $200-500/month
220 220 * 100,000 articles/month ≈ $2,000-5,000/month
221 221 * Cost optimization can reduce expenses 30-50%
222 222
223 223 **What POC1 Learns:**
222 +
224 224 * How cost scales with article length
225 225 * Prompt optimization opportunities (caching, compression)
226 226 * Output verbosity tradeoffs
... ...
@@ -228,6 +228,7 @@
228 228 * Article length limits (if needed)
229 229
230 230 **Implementation:**
230 +
231 231 * Claude API already returns usage data
232 232 * No extra API calls needed
233 233 * Display to user + log for aggregate analysis
... ...
@@ -237,7 +237,8 @@
237 237
238 238 === 2.6 Total Output Size ===
239 239
240 -**Combined:** ~220-350 words
240 +**Combined:** 220-350 words
241 +
241 241 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
242 242 * Claims Identification: 30-50 words
243 243 * Claims Verdicts: 100-150 words
... ...
@@ -252,6 +252,7 @@
252 252 The following are **explicitly excluded** from POC:
253 253
254 254 **Content Features:**
256 +
255 255 * ❌ Scenarios (deferred to POC2)
256 256 * ❌ Evidence display (supporting/opposing lists)
257 257 * ❌ Source links (clickable references)
... ...
@@ -261,6 +261,7 @@
261 261 * ❌ Risk assessment (shown but not workflow-integrated)
262 262
263 263 **Platform Features:**
266 +
264 264 * ❌ User accounts / authentication
265 265 * ❌ Saved history
266 266 * ❌ Search functionality
... ...
@@ -270,6 +270,7 @@
270 270 * ❌ Social sharing
271 271
272 272 **Technical Features:**
276 +
273 273 * ❌ Browser extensions
274 274 * ❌ Mobile apps
275 275 * ❌ API endpoints
... ...
@@ -277,6 +277,7 @@
277 277 * ❌ Export features (PDF, CSV)
278 278
279 279 **Quality Features:**
284 +
280 280 * ❌ Accessibility (WCAG compliance)
281 281 * ❌ Multilingual support
282 282 * ❌ Mobile optimization
... ...
@@ -283,6 +283,7 @@
283 283 * ❌ Media verification (images/videos)
284 284
285 285 **Production Features:**
291 +
286 286 * ❌ Security hardening
287 287 * ❌ Privacy compliance (GDPR)
288 288 * ❌ Terms of service
... ...
@@ -296,17 +296,13 @@
296 296 === 4.1 Architecture Comparison ===
297 297
298 298 **POC Architecture (Simplified):**
299 -{{code}}
300 -User Input → Single AKEL Call → Output Display
301 - (all processing)
302 -{{/code}}
305 +{{code}}User Input → Single AKEL Call → Output Display
306 + (all processing){{/code}}
303 303
304 304 **Full System Architecture:**
305 -{{code}}
306 -User Input → Claim Extractor → Claim Classifier → Scenario Generator
309 +{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
307 307 → Evidence Summarizer → Contradiction Detector → Verdict Generator
308 -→ Quality Gates → Publication → Output Display
309 -{{/code}}
311 +→ Quality Gates → Publication → Output Display{{/code}}
310 310
311 311 **Key Differences:**
312 312
... ...
@@ -322,12 +322,14 @@
322 322 === 4.2 Workflow Comparison ===
323 323
324 324 **POC1 Workflow:**
327 +
325 325 1. User submits text/URL
326 326 2. Single AKEL call (all processing in one prompt)
327 327 3. Display results
328 -**Total: 3 steps, ~10-18 seconds**
331 +**Total: 3 steps, 10-18 seconds**
329 329
330 330 **Full System Workflow:**
334 +
331 331 1. **Claim Submission** (extraction, normalization, clustering)
332 332 2. **Scenario Building** (definitions, assumptions, boundaries)
333 333 3. **Evidence Handling** (retrieval, assessment, linking)
... ...
@@ -334,7 +334,7 @@
334 334 4. **Verdict Creation** (synthesis, reasoning, approval)
335 335 5. **Public Presentation** (summaries, landscapes, deep dives)
336 336 6. **Time Evolution** (versioning, re-evaluation triggers)
337 -**Total: 6 phases with quality gates, ~10-30 seconds**
341 +**Total: 6 phases with quality gates, 10-30 seconds**
338 338
339 339 === 4.3 Why POC is Simplified ===
340 340
... ...
@@ -357,6 +357,7 @@
357 357 === 4.4 Gap Between POC1 and POC2/Beta ===
358 358
359 359 **What needs to be built for POC2:**
364 +
360 360 * Scenario generation component
361 361 * Evidence Model structure (full)
362 362 * Scenario-evidence linking
... ...
@@ -364,6 +364,7 @@
364 364 * Truth landscape visualization
365 365
366 366 **What needs to be built for Beta:**
372 +
367 367 * Multi-component AKEL pipeline
368 368 * Quality gate infrastructure
369 369 * Review workflow system
... ...
@@ -380,6 +380,7 @@
380 380 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
381 381
382 382 Per FactHarbor Specification Section 11 "POC v1 Behavior":
389 +
383 383 * Produces public AI-generated output
384 384 * No human approval gate
385 385 * Clear AI-Generated labeling
... ...
@@ -389,8 +389,7 @@
389 389 === 5.2 User-Facing Labels ===
390 390
391 391 **Primary Label (top of analysis):**
392 -{{code}}
393 -╔════════════════════════════════════════════════════════════╗
399 +{{code}}╔════════════════════════════════════════════════════════════╗
394 394 ║ [AI-GENERATED - POC/DEMO] ║
395 395 ║ ║
396 396 ║ This analysis was produced entirely by AI and has not ║
... ...
@@ -400,10 +400,10 @@
400 400 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
401 401 ║ Quality Gates: 4/4 Passed (Simplified) ║
402 402 ║ Last Updated: [timestamp] ║
403 -╚════════════════════════════════════════════════════════════╝
404 -{{/code}}
409 +╚════════════════════════════════════════════════════════════╝{{/code}}
405 405
406 406 **Per-Claim Risk Labels:**
412 +
407 407 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
408 408 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
409 409 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
... ...
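The per-claim risk labels listed above map a tier letter to a fixed display string, so they can be produced by a small lookup. The following is only a sketch of one way to do that; the names `RISK_TIERS` and `format_risk_label` are hypothetical and do not come from the specification.

```python
# Hypothetical lookup table: tier letter -> (icon, level, scope),
# copied from the "Per-Claim Risk Labels" list in the spec.
RISK_TIERS = {
    "A": ("🔴", "High Risk", "Medical/Legal/Safety"),
    "B": ("🟡", "Medium Risk", "Policy/Science"),
    "C": ("🟢", "Low Risk", "Facts/Definitions"),
}


def format_risk_label(tier: str) -> str:
    """Return the display label for a risk tier letter (A, B or C)."""
    key = tier.strip().upper()
    if key not in RISK_TIERS:
        # Assumption: an unknown tier is treated as an error, not silently defaulted.
        raise ValueError(f"Unknown risk tier: {tier!r}")
    icon, level, scope = RISK_TIERS[key]
    return f"[Risk: {key}] {icon} {level} ({scope})"
```

Rejecting unknown tiers is an assumption made here so that a malformed AI response is surfaced instead of being displayed with a default label.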
@@ -411,6 +411,7 @@
411 411 === 5.3 Display Requirements ===
412 412
413 413 **Must Show:**
420 +
414 414 * AI-Generated status (prominent)
415 415 * POC/Demo disclaimer
416 416 * Risk tier per claim
... ...
@@ -419,6 +419,7 @@
419 419 * Timestamp
420 420
421 421 **Must NOT Claim:**
429 +
422 422 * Human review
423 423 * Production quality
424 424 * Medical/legal advice
... ...
@@ -442,6 +442,7 @@
442 442 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
443 443
444 444 **Full System Has 4 Gates:**
453 +
445 445 1. Source Quality
446 446 2. Contradiction Search (MANDATORY)
447 447 3. Uncertainty Quantification
... ...
@@ -448,6 +448,7 @@
448 448 4. Structural Integrity
449 449
450 450 **POC Implements Simplified Versions:**
460 +
451 451 * Focus on demonstrating concept
452 452 * Basic implementations sufficient
453 453 * Failures displayed to user (not blocking)
... ...
@@ -456,6 +456,7 @@
456 456 === 6.2 Gate 1: Source Quality (Basic) ===
457 457
458 458 **Full System Requirements:**
469 +
459 459 * Primary sources identified and accessible
460 460 * Source reliability scored against whitelist
461 461 * Citation completeness verified
... ...
@@ -463,6 +463,7 @@
463 463 * Author credentials validated
464 464
465 465 **POC Implementation:**
477 +
466 466 * ✅ At least 2 sources found
467 467 * ✅ Sources accessible (URLs valid)
468 468 * ❌ No whitelist checking
... ...
@@ -476,6 +476,7 @@
476 476 === 6.3 Gate 2: Contradiction Search (Basic) ===
477 477
478 478 **Full System Requirements:**
491 +
479 479 * Counter-evidence actively searched
480 480 * Reservations and limitations identified
481 481 * Alternative interpretations explored
... ...
@@ -484,6 +484,7 @@
484 484 * Academic literature (supporting AND opposing)
485 485
486 486 **POC Implementation:**
500 +
487 487 * ✅ Basic search for counter-evidence
488 488 * ✅ Identify obvious contradictions
489 489 * ❌ No comprehensive academic search
... ...
@@ -498,6 +498,7 @@
498 498 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
499 499
500 500 **Full System Requirements:**
515 +
501 501 * Confidence scores calculated for all claims/verdicts
502 502 * Limitations explicitly stated
503 503 * Data gaps identified and disclosed
... ...
@@ -505,6 +505,7 @@
505 505 * Alternative scenarios considered
506 506
507 507 **POC Implementation:**
523 +
508 508 * ✅ Confidence scores (0-100%)
509 509 * ✅ Basic uncertainty acknowledgment
510 510 * ❌ No detailed limitation disclosure
... ...
@@ -518,6 +518,7 @@
518 518 === 6.5 Gate 4: Structural Integrity (Basic) ===
519 519
520 520 **Full System Requirements:**
537 +
521 521 * No hallucinations detected (fact-checking against sources)
522 522 * Logic chain valid and traceable
523 523 * References accessible and verifiable
... ...
@@ -525,6 +525,7 @@
525 525 * Premises clearly stated
526 526
527 527 **POC Implementation:**
545 +
528 528 * ✅ Basic coherence check
529 529 * ✅ References accessible
530 530 * ❌ No comprehensive hallucination detection
... ...
@@ -538,24 +538,20 @@
538 538 === 6.6 Quality Gate Display ===
539 539
540 540 **POC shows simplified status:**
541 -{{code}}
542 -Quality Gates: 4/4 Passed (Simplified)
559 +{{code}}Quality Gates: 4/4 Passed (Simplified)
543 543 ✓ Source Quality: 3 sources found
544 544 ✓ Contradiction Search: Basic search completed
545 545 ✓ Uncertainty: Confidence scores assigned
546 -✓ Structural Integrity: Output coherent
547 -{{/code}}
563 +✓ Structural Integrity: Output coherent{{/code}}
548 548
549 549 **If any gate fails:**
550 -{{code}}
551 -Quality Gates: 3/4 Passed (Simplified)
566 +{{code}}Quality Gates: 3/4 Passed (Simplified)
552 552 ✓ Source Quality: 3 sources found
553 553 ✗ Contradiction Search: Search failed - limited evidence
554 554 ✓ Uncertainty: Confidence scores assigned
555 555 ✓ Structural Integrity: Output coherent
556 556
557 -Note: This analysis has limited evidence. Use with caution.
558 -{{/code}}
572 +Note: This analysis has limited evidence. Use with caution.{{/code}}
559 559
560 560 === 6.7 Simplified vs. Full System ===
561 561
... ...
@@ -572,6 +572,7 @@
572 572 === 7.1 POC AKEL (Simplified) ===
573 573
574 574 **Implementation:**
589 +
575 575 * Single Claude API call (Sonnet 4.5)
576 576 * One comprehensive prompt
577 577 * All processing in single request
... ...
@@ -579,8 +579,7 @@
579 579 * No orchestration layer
580 580
581 581 **Prompt Structure:**
582 -{{code}}
583 -Task: Analyze this article and provide:
597 +{{code}}Task: Analyze this article and provide:
584 584
585 585 1. Extract 3-5 factual claims
586 586 2. For each claim:
... ...
@@ -592,8 +592,7 @@
592 592 4. Generate article summary (3-5 sentences)
593 593 5. Run basic quality checks
594 594
595 -Return as structured JSON.
596 -{{/code}}
609 +Return as structured JSON.{{/code}}
597 597
598 598 **Processing Time:** 10-18 seconds (estimate)
599 599
... ...
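The quality gate status block shown in section 6.6 above follows a regular pattern: a pass count, one line per gate, and a caution note when any gate fails. A minimal sketch of a renderer for that block follows. `GateResult` and `render_quality_gates` are hypothetical names, and reusing the "limited evidence" note for every kind of failure is an assumption; the specification only shows it for a failed contradiction search.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str      # e.g. "Source Quality"
    passed: bool   # gate outcome
    detail: str    # short explanation shown to the user


def render_quality_gates(results: list[GateResult]) -> str:
    """Render gate results as plain text; failures are displayed, never blocking."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        # Assumption: one generic caution note for any failed gate.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```

The function only formats results that were computed elsewhere, which matches the non-blocking behavior the POC describes: a failed gate changes the text shown, not whether the analysis is displayed.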
@@ -600,8 +600,7 @@
600 600 === 7.2 Full System AKEL (Production) ===
601 601
602 602 **Architecture:**
603 -{{code}}
604 -AKEL Orchestrator
616 +{{code}}AKEL Orchestrator
605 605 ├── Claim Extractor
606 606 ├── Claim Classifier (with risk tier assignment)
607 607 ├── Scenario Generator
... ...
@@ -609,10 +609,10 @@
609 609 ├── Contradiction Detector
610 610 ├── Quality Gate Validator
611 611 ├── Audit Sampling Scheduler
612 -└── Federation Sync Adapter (Release 1.0+)
613 -{{/code}}
624 +└── Federation Sync Adapter (Release 1.0+){{/code}}
614 614
615 615 **Processing:**
627 +
616 616 * Parallel processing where possible
617 617 * Separate component calls
618 618 * Quality gates between phases
... ...
@@ -624,6 +624,7 @@
624 624 === 7.3 Why POC Uses Single Call ===
625 625
626 626 **Advantages:**
639 +
627 627 * ✅ Simpler to implement
628 628 * ✅ Faster POC development
629 629 * ✅ Easier to debug
... ...
@@ -631,6 +631,7 @@
631 631 * ✅ Good enough for concept validation
632 632
633 633 **Limitations:**
647 +
634 634 * ❌ No component reusability
635 635 * ❌ No parallel processing
636 636 * ❌ All-or-nothing (can't partially succeed)
... ...
@@ -657,6 +657,7 @@
657 657 **Requirement:** User can submit article for analysis
658 658
659 659 **Functionality:**
674 +
660 660 * Text input field (paste article text, up to 5000 characters)
661 661 * URL input field (paste article URL)
662 662 * "Analyze" button to trigger processing
... ...
@@ -663,6 +663,7 @@
663 663 * Loading indicator during analysis
664 664
665 665 **Excluded:**
681 +
666 666 * No user authentication
667 667 * No claim history
668 668 * No search functionality
... ...
@@ -669,6 +669,7 @@
669 669 * No saved templates
670 670
671 671 **Acceptance Criteria:**
688 +
672 672 * User can paste text from article
673 673 * User can paste URL of article
674 674 * System accepts input and triggers analysis
... ...
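The input requirement above (pasted text of up to 5000 characters, or an article URL) can be checked before any AKEL call is made. The sketch below shows one possible check. The function name `validate_submission`, the rule that exactly one of the two fields must be filled, and the restriction to `http`/`https` URLs are all assumptions, not part of the specification.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated for the text input field


def validate_submission(text: str = "", url: str = "") -> tuple[str, str]:
    """Return ("text", value) or ("url", value); raise ValueError on bad input."""
    text, url = text.strip(), url.strip()
    # Assumption: exactly one of the two input fields is used per analysis.
    if bool(text) == bool(url):
        raise ValueError("Provide either article text or an article URL.")
    if text:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"Text exceeds {MAX_TEXT_CHARS} characters.")
        return ("text", text)
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an absolute http(s) address.")
    return ("url", url)
```

Raising `ValueError` with a readable message fits the POC's error handling, where the user sees a clear message and can retry with different input.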
@@ -678,6 +678,7 @@
678 678 **Requirement:** AI automatically extracts 3-5 factual claims
679 679
680 680 **Functionality:**
698 +
681 681 * AI reads article text
682 682 * AI identifies factual claims (not opinions/questions)
683 683 * AI extracts 3-5 most important claims
... ...
@@ -684,6 +684,7 @@
684 684 * System displays numbered list
685 685
686 686 **Critical:** NO MANUAL EDITING ALLOWED
705 +
687 687 * AI selects which claims to extract
688 688 * AI identifies factual vs. non-factual
689 689 * System processes claims as extracted
... ...
@@ -690,11 +690,13 @@
690 690 * No human curation or correction
691 691
692 692 **Error Handling:**
712 +
693 693 * If extraction fails: Display error message
694 694 * User can retry with different input
695 695 * No manual intervention to fix extraction
696 696
697 697 **Acceptance Criteria:**
718 +
698 698 * AI extracts 3-5 claims automatically
699 699 * Claims are factual (not opinions)
700 700 * Claims are clearly stated
... ...
@@ -705,15 +705,17 @@
705 705 **Requirement:** AI automatically generates verdict for each claim
706 706
707 707 **Functionality:**
729 +
708 708 * For each claim, AI:
709 - * Evaluates claim based on available evidence/knowledge
710 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
711 - * Assigns confidence score (0-100%)
712 - * Assigns risk tier (A/B/C)
713 - * Writes brief reasoning (1-3 sentences)
731 +* Evaluates claim based on available evidence/knowledge
732 +* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 +* Assigns confidence score (0-100%)
734 +* Assigns risk tier (A/B/C)
735 +* Writes brief reasoning (1-3 sentences)
714 714 * System displays verdict for each claim
715 715
716 716 **Critical:** NO MANUAL EDITING ALLOWED
739 +
717 717 * AI computes verdicts based on evidence
718 718 * AI generates confidence scores
719 719 * AI writes reasoning
... ...
@@ -720,11 +720,13 @@
720 720 * No human review or adjustment
721 721
722 722 **Error Handling:**
746 +
723 723 * If verdict generation fails: Display error message
724 724 * User can retry
725 725 * No manual intervention to adjust verdicts
726 726
727 727 **Acceptance Criteria:**
752 +
728 728 * Each claim has a verdict
729 729 * Confidence score is displayed (0-100%)
730 730 * Risk tier is displayed (A/B/C)
... ...
@@ -737,15 +737,17 @@
737 737 **Requirement:** AI generates brief summary of analysis
738 738
739 739 **Functionality:**
765 +
740 740 * AI summarizes findings in 3-5 sentences:
741 - * How many claims found
742 - * Distribution of verdicts
743 - * Overall assessment
767 +* How many claims found
768 +* Distribution of verdicts
769 +* Overall assessment
744 744 * System displays at top of results
745 745
746 746 **Critical:** NO MANUAL EDITING ALLOWED
747 747
748 748 **Acceptance Criteria:**
775 +
749 749 * Summary is coherent
750 750 * Accurately reflects analysis
751 751 * 3-5 sentences
... ...
@@ -756,6 +756,7 @@
756 756 **Requirement:** AI generates brief summary of original article
757 757
758 758 **Functionality:**
786 +
759 759 * AI summarizes article content (not FactHarbor's analysis)
760 760 * 3-5 sentences
761 761 * System displays
... ...
@@ -765,6 +765,7 @@
765 765 **Critical:** NO MANUAL EDITING ALLOWED
766 766
767 767 **Acceptance Criteria:**
796 +
768 768 * Summary is neutral (article's position)
769 769 * Accurately reflects article content
770 770 * 3-5 sentences
... ...
@@ -775,6 +775,7 @@
775 775 **Requirement:** Clear labeling of AI-generated content
776 776
777 777 **Functionality:**
807 +
778 778 * Display Mode 2 publication label
779 779 * Show POC/Demo disclaimer
780 780 * Display risk tiers per claim
... ...
@@ -782,6 +782,7 @@
782 782 * Display timestamp
783 783
784 784 **Acceptance Criteria:**
815 +
785 785 * Label is prominent and clear
786 786 * User understands this is AI-generated POC output
787 787 * Risk tiers are color-coded
... ...
@@ -792,6 +792,7 @@
792 792 **Requirement:** Execute simplified quality gates
793 793
794 794 **Functionality:**
826 +
795 795 * Check source quality (basic)
796 796 * Attempt contradiction search (basic)
797 797 * Calculate confidence scores
... ...
@@ -799,6 +799,7 @@
799 799 * Display gate results
800 800
801 801 **Acceptance Criteria:**
834 +
802 802 * All 4 gates attempted
803 803 * Pass/fail status displayed
804 804 * Failures explained to user
... ...
@@ -813,6 +813,7 @@
813 813 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
814 814
815 815 **What this means:**
849 +
816 816 * Claims: AI selects (no human curation)
817 817 * Scenarios: N/A (deferred to POC2)
818 818 * Evidence: AI evaluates (no human selection)
... ...
@@ -820,13 +820,12 @@
820 820 * Summaries: AI writes (no human editing)
821 821
822 822 **Pipeline:**
823 -{{code}}
824 -User Input → AKEL Processing → Output Display
857 +{{code}}User Input → AKEL Processing → Output Display
825 825 ↓
826 - ZERO human editing
827 -{{/code}}
859 + ZERO human editing{{/code}}
828 828
829 829 **If AI output is poor:**
862 +
830 830 * ❌ Do NOT manually fix it
831 831 * ✅ Document the failure
832 832 * ✅ Improve prompts and retry
... ...
@@ -833,6 +833,7 @@
833 833 * ✅ Accept that POC might fail
834 834
835 835 **Why this matters:**
869 +
836 836 * Tests whether AI can do this without humans
837 837 * Validates scalability (humans can't review every analysis)
838 838 * Honest test of technical feasibility
... ...
@@ -842,16 +842,19 @@
842 842 **Requirement:** Analysis completes in reasonable time
843 843
844 844 **Acceptable Performance:**
879 +
845 845 * Processing time: 1-5 minutes (acceptable for POC)
846 846 * Display loading indicator to user
847 847 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
848 848
849 849 **Not Required:**
885 +
850 850 * Production-level speed (< 30 seconds)
851 851 * Optimization for scale
852 852 * Caching
853 853
854 854 **Acceptance Criteria:**
891 +
855 855 * Analysis completes within 5 minutes
856 856 * User sees loading indicator
857 857 * No timeout errors
... ...
@@ -861,16 +861,19 @@
861 861 **Requirement:** System works for manual testing sessions
862 862
863 863 **Acceptable:**
901 +
864 864 * Occasional errors (< 20% failure rate)
865 865 * Manual restart if needed
866 866 * Display error messages clearly
867 867
868 868 **Not Required:**
907 +
869 869 * 99.9% uptime
870 870 * Automatic error recovery
871 871 * Production monitoring
872 872
873 873 **Acceptance Criteria:**
913 +
874 874 * System works for test demonstrations
875 875 * Errors are handled gracefully
876 876 * User receives clear error messages
... ...
@@ -880,6 +880,7 @@
880 880 **Requirement:** Runs on simple infrastructure
881 881
882 882 **Acceptable:**
923 +
883 883 * Single machine or simple cloud setup
884 884 * No distributed architecture
885 885 * No load balancing
... ...
@@ -887,6 +887,7 @@
887 887 * Local development environment viable
888 888
889 889 **Not Required:**
931 +
890 890 * Production infrastructure
891 891 * Multi-region deployment
892 892 * Auto-scaling
... ...
@@ -897,6 +897,7 @@
897 897 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
898 898
899 899 **Must Track:**
942 +
900 900 * Input tokens (article + prompt)
901 901 * Output tokens (generated analysis)
902 902 * Total tokens
... ...
@@ -905,16 +905,19 @@
905 905 * Article length (words/characters)
906 906
907 907 **Must Display:**
951 +
908 908 * Usage statistics in UI (Component 5)
909 909 * Cost per analysis
910 910 * Cost per claim extracted
911 911
912 912 **Must Log:**
957 +
913 913 * Aggregate metrics for analysis
914 914 * Cost distribution by article length
915 915 * Token efficiency trends
916 916
917 917 **Purpose:**
963 +
918 918 * Understand unit economics
919 919 * Identify optimization opportunities
920 920 * Project costs at scale
... ...
@@ -921,6 +921,7 @@
921 921 * Inform architecture decisions (caching, model selection, etc.)
922 922
923 923 **Acceptance Criteria:**
970 +
924 924 * ✅ Usage data displayed after each analysis
925 925 * ✅ Metrics logged for aggregate analysis
926 926 * ✅ Cost calculated accurately (Claude API pricing)
... ...
@@ -928,6 +928,7 @@
928 928 * ✅ POC1 report includes cost analysis section
929 929
930 930 **Success Target:**
978 +
931 931 * Average cost per analysis < $0.05 USD
932 932 * Cost scaling behavior understood (linear/exponential)
933 933 * 2+ optimization opportunities identified
... ...
@@ -939,11 +939,13 @@
939 939 === 10.1 System Components ===
940 940
941 941 **Frontend:**
990 +
942 942 * Simple HTML form (text input + URL input + button)
943 943 * Loading indicator
944 944 * Results display page (single page, no tabs/navigation)
945 945
946 946 **Backend:**
996 +
947 947 * Single API endpoint
948 948 * Calls Claude API (Sonnet 4.5 or latest)
949 949 * Parses response
... ...
@@ -950,10 +950,12 @@
950 950 * Returns JSON to frontend
951 951
952 952 **Data Storage:**
1003 +
953 953 * None required (stateless POC)
954 954 * Optional: Simple file storage or SQLite for demo examples
955 955
956 956 **External Services:**
1008 +
957 957 * Claude API (Anthropic) - required
958 958 * Optional: URL fetch service for article text extraction
959 959
... ...
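The usage-tracking requirement above (input, output and total tokens, cost per analysis, cost per claim, and the success target below $0.05 per analysis) reduces to simple arithmetic once the token counts are known. The sketch below is one way the backend might compute these values. `UsageStats` and its fields are hypothetical names, and the per-million-token prices are deliberately left as parameters: the caller is assumed to supply current Claude API pricing, since no rates are stated in this document.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    input_tokens: int              # article + prompt
    output_tokens: int             # generated analysis
    claims_extracted: int
    usd_per_million_input: float   # assumption: supplied by the caller
    usd_per_million_output: float  # assumption: supplied by the caller

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def estimated_cost_usd(self) -> float:
        return (
            self.input_tokens * self.usd_per_million_input
            + self.output_tokens * self.usd_per_million_output
        ) / 1_000_000

    @property
    def cost_per_claim_usd(self) -> float:
        # Avoid division by zero when extraction produced no claims.
        if self.claims_extracted == 0:
            return 0.0
        return self.estimated_cost_usd / self.claims_extracted

    def meets_cost_target(self, target_usd: float = 0.05) -> bool:
        """Check the POC success target: cost per analysis below the target."""
        return self.estimated_cost_usd < target_usd
```

With the Anthropic Python SDK, the token counts would typically be read from the `usage.input_tokens` and `usage.output_tokens` fields of the message response, which is consistent with the note that no extra API calls are needed.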
@@ -986,8 +986,7 @@
986 986 === 10.3 AI Prompt Strategy ===
987 987
988 988 **Single Comprehensive Prompt:**
989 -{{code}}
990 -Task: Analyze this article and provide:
1041 +{{code}}Task: Analyze this article and provide:
991 991
992 992 1. Identify the article's main thesis/conclusion
993 993 - What is the article trying to argue or prove?
... ...
@@ -1023,8 +1023,7 @@
1023 1023
1024 1024 7. Write article summary (3-5 sentences: neutral summary of article content)
1025 1025
1026 -Return as structured JSON with quality gate results.
1027 -{{/code}}
1077 +Return as structured JSON with quality gate results.{{/code}}
1028 1028
1029 1029 **One prompt generates everything.**
1030 1030
... ...
@@ -1035,25 +1035,30 @@
1035 1035 === 10.4 Technology Stack Suggestions ===
1036 1036
1037 1037 **Frontend:**
1088 +
1038 1038 * HTML + CSS + JavaScript (minimal framework)
1039 1039 * OR: Next.js (if team prefers)
1040 1040 * Hosted: Local machine OR Vercel/Netlify free tier
1041 1041
1042 1042 **Backend:**
1094 +
1043 1043 * Python Flask/FastAPI (simple REST API)
1044 1044 * OR: Next.js API routes (if using Next.js)
1045 1045 * Hosted: Local machine OR Railway/Render free tier
1046 1046
1047 1047 **AKEL Integration:**
1100 +
1048 1048 * Claude API via Anthropic SDK
1049 1049 * Model: Claude Sonnet 4.5 or latest available
1050 1050
1051 1051 **Database:**
1105 +
1052 1052 * None (stateless acceptable)
1053 1053 * OR: SQLite if want to store demo examples
1054 1054 * OR: JSON files on disk
1055 1055
1056 1056 **Deployment:**
1111 +
1057 1057 * Local development environment sufficient for POC
1058 1058 * Optional: Deploy to cloud for remote demos
1059 1059
... ...
@@ -1062,6 +1062,7 @@
1062 1062 === 11.1 Minimum Success (POC Passes) ===
1063 1063
1064 1064 **Required for GO decision:**
1120 +
1065 1065 * ✅ AI extracts 3-5 factual claims automatically
1066 1066 * ✅ AI provides verdict for each claim automatically
1067 1067 * ✅ Verdicts are reasonable (≥70% make logical sense)
... ...
@@ -1075,6 +1075,7 @@
1075 1075 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1076 1076
1077 1077 **Quality Definition:**
1134 +
1078 1078 * "Reasonable verdict" = Defensible given general knowledge
1079 1079 * "Coherent summary" = Logically structured, grammatically correct
1080 1080 * "Comprehensible" = Reviewers understand what analysis means
... ...
@@ -1082,6 +1082,7 @@
1082 1082 === 11.2 POC Fails If ===
1083 1083
1084 1084 **Automatic NO-GO if any of these:**
1142 +
1085 1085 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1086 1086 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1087 1087 * ❌ Output incomprehensible (reviewers can't understand analysis)
... ...
@@ -1093,14 +1093,15 @@
1093 1093 **POC quality expectations:**
1094 1094
1095 1095 |=Component|=Quality Threshold|=Definition
1096 -|Claim Extraction|(% class="success" %)≥70% accuracy (%%)|Identifies obvious factual claims, may miss some edge cases
1097 -|Verdict Logic|(% class="success" %)≥70% defensible (%%)|Verdicts are logical given reasoning provided
1098 -|Reasoning Clarity|(% class="success" %)≥70% clear (%%)|1-3 sentences are understandable and relevant
1099 -|Overall Analysis|(% class="success" %)≥70% useful (%%)|Output helps user understand article claims
1154 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1155 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1156 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1157 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1100 1100
1101 1101 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1102 1102
1103 1103 **Not expecting:**
1162 +
1104 1104 * 100% accuracy
1105 1105 * Perfect claim coverage
1106 1106 * Comprehensive evidence gathering
... ...
@@ -1108,6 +1108,7 @@
1108 1108 * Production polish
1109 1109
1110 1110 **Expecting:**
1170 +
1111 1111 * Reasonable claim extraction
1112 1112 * Defensible verdicts
1113 1113 * Understandable reasoning
... ...
@@ -1120,6 +1120,7 @@
1120 1120 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1121 1121
1122 1122 **Expected Output:**
1183 +
1123 1123 * Extract claim correctly
1124 1124 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1125 1125 * Confidence: 70-90%
... ...
@@ -1133,6 +1133,7 @@
1133 1133 **Input:** News article URL with multiple claims about politics/health/science
1134 1134
1135 1135 **Expected Output:**
1197 +
1136 1136 * Extract 3-5 key claims
1137 1137 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1138 1138 * Coherent analysis summary
... ...
@@ -1146,6 +1146,7 @@
1146 1146 **Input:** Article on contested political or scientific topic
1147 1147
1148 1148 **Expected Output:**
1211 +
1149 1149 * Balanced analysis
1150 1150 * Acknowledges uncertainty where appropriate
1151 1151 * Doesn't overstate confidence
... ...
@@ -1158,6 +1158,7 @@
1158 1158 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1159 1159
1160 1160 **Expected Output:**
1224 +
1161 1161 * Extract claim
1162 1162 * Verdict: REFUTED
1163 1163 * High confidence (> 90%)
... ...
@@ -1171,6 +1171,7 @@
1171 1171 **Input:** Article with claim where evidence is genuinely mixed
1172 1172
1173 1173 **Expected Output:**
1238 +
1174 1174 * Extract claim
1175 1175 * Verdict: UNCERTAIN
1176 1176 * Moderate confidence (40-60%)
... ...
@@ -1183,6 +1183,7 @@
1183 1183 **Input:** Article making medical claims
1184 1184
1185 1185 **Expected Output:**
1251 +
1186 1186 * Extract claim
1187 1187 * Verdict: [appropriate based on evidence]
1188 1188 * Risk tier: A (High - medical)
... ...
@@ -1200,6 +1200,7 @@ 1200 1200 **Option A: GO (Proceed to POC2)** 1201 1201 1202 1202 **Conditions:** 1269 + 1203 1203 * AI quality ≥70% without manual editing 1204 1204 * Basic claim → verdict pipeline validated 1205 1205 * Internal + advisor feedback positive ... ... @@ -1208,6 +1208,7 @@ 1208 1208 * Clear path to improving AI quality to ≥90% 1209 1209 1210 1210 **Next Steps:** 1278 + 1211 1211 * Plan POC2 development (add scenarios) 1212 1212 * Design scenario architecture 1213 1213 * Expand to Evidence Model structure ... ... @@ -1216,6 +1216,7 @@ 1216 1216 **Option B: NO-GO (Pivot or Stop)** 1217 1217 1218 1218 **Conditions:** 1287 + 1219 1219 * AI quality < 60% 1220 1220 * Requires manual editing for most analyses (> 50%) 1221 1221 * Feedback indicates fundamental flaws ... ... @@ -1223,6 +1223,7 @@ 1223 1223 * No clear path to improvement 1224 1224 1225 1225 **Next Steps:** 1295 + 1226 1226 * **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1227 1227 * **Stop:** Conclude approach not viable, revisit later 1228 1228 ... ... @@ -1229,6 +1229,7 @@ 1229 1229 **Option C: ITERATE (Improve POC)** 1230 1230 1231 1231 **Conditions:** 1302 + 1232 1232 * Concept has merit but execution needs work 1233 1233 * Specific improvements identified 1234 1234 * Addressable with better prompts/approach ... ... @@ -1235,6 +1235,7 @@ 1235 1235 * AI quality between 60-70% 1236 1236 1237 1237 **Next Steps:** 1309 + 1238 1238 * Improve AI prompts 1239 1239 * Test different approaches 1240 1240 * Re-run POC with improvements ... ... @@ -1256,6 +1256,7 @@ 1256 1256 **Impact:** POC fails 1257 1257 1258 1258 **Mitigation:** 1331 + 1259 1259 * Extensive prompt engineering and testing 1260 1260 * Use best available AI models (Sonnet 4.5) 1261 1261 * Test with diverse article types ... ... 
@@ -1269,6 +1269,7 @@ 1269 1269 **Impact:** Works sometimes, fails other times 1270 1270 1271 1271 **Mitigation:** 1345 + 1272 1272 * Test with 10+ diverse articles 1273 1273 * Measure success rate honestly 1274 1274 * Improve prompts to increase consistency ... ... @@ -1281,6 +1281,7 @@ 1281 1281 **Impact:** Users can't understand analysis 1282 1282 1283 1283 **Mitigation:** 1358 + 1284 1284 * Create clear explainer document 1285 1285 * Iterate on output format 1286 1286 * Test with non-technical reviewers ... ... @@ -1294,6 +1294,7 @@ 1294 1294 **Impact:** System slow or expensive 1295 1295 1296 1296 **Mitigation:** 1372 + 1297 1297 * Monitor API usage 1298 1298 * Implement retry logic 1299 1299 * Estimate costs before scaling ... ... @@ -1306,6 +1306,7 @@ 1306 1306 **Impact:** POC becomes too complex 1307 1307 1308 1308 **Mitigation:** 1385 + 1309 1309 * Strict scope discipline 1310 1310 * Say NO to feature additions 1311 1311 * Keep focus on core question ... ... @@ -1316,12 +1316,15 @@ 1316 1316 1317 1317 === 15.1 Core Principles === 1318 1318 1319 -**1. Build Less, Learn More** 1396 +* 1397 +** 1398 +**1. Build Less, Learn More 1320 1320 * Minimum features to test hypothesis 1321 1321 * Don't build unvalidated features 1322 1322 * Focus on core question only 1323 1323 1324 1324 **2. Fail Fast** 1404 + 1325 1325 * Quick test of hardest part (AI capability) 1326 1326 * Accept that POC might fail 1327 1327 * Better to discover issues early ... ... @@ -1328,16 +1328,19 @@ 1328 1328 * Honest assessment over optimistic hope 1329 1329 1330 1330 **3. Test First, Build Second** 1411 + 1331 1331 * Validate AI can do this before building platform 1332 1332 * Don't assume it will work 1333 1333 * Let results guide decisions 1334 1334 1335 1335 **4. Automation First** 1417 + 1336 1336 * No manual editing allowed 1337 1337 * Tests scalability, not just feasibility 1338 1338 * Proves approach can work at scale 1339 1339 1340 1340 **5. 
Honest Assessment** 1423 + 1341 1341 * Don't cherry-pick examples 1342 1342 * Don't manually fix bad outputs 1343 1343 * Document failures openly ... ... @@ -1358,9 +1358,12 @@ 1358 1358 ❌ Perfectly accurate analysis 1359 1359 ❌ Polished user experience 1360 1360 1361 -== 16. Success = Clear Path Forward== 1444 +== 16. Success == 1362 1362 1446 + Clear Path Forward == 1447 + 1363 1363 **If POC succeeds (≥70% AI quality):** 1449 + 1364 1364 * ✅ Approach validated 1365 1365 * ✅ Proceed to POC2 (add scenarios) 1366 1366 * ✅ Design full Evidence Model structure ... ... @@ -1368,6 +1368,7 @@ 1368 1368 * ✅ Focus on improving AI quality from 70% → 90% 1369 1369 1370 1370 **If POC fails (< 60% AI quality):** 1457 + 1371 1371 * ✅ Learn what doesn't work 1372 1372 * ✅ Pivot to different approach 1373 1373 * ✅ OR wait for better AI technology ... ... @@ -1394,16 +1394,16 @@ 1394 1394 **POC1 Implementation:** 1395 1395 1396 1396 * **Primary Provider:** Anthropic Claude API 1397 - * Stage 1: Claude Haiku 4 1398 - * Stage 2: Claude Sonnet 3.5 (cached) 1399 - * Stage 3: Claude Sonnet 3.5 1484 +* Stage 1: Claude Haiku 4 1485 +* Stage 2: Claude Sonnet 3.5 (cached) 1486 +* Stage 3: Claude Sonnet 3.5 1400 1400 1401 1401 * **Provider Interface:** Abstract LLMProvider interface implemented 1402 1402 1403 1403 * **Configuration:** Environment variables for provider selection 1404 - * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1405 - * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1406 - * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1491 +* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1492 +* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1493 +* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1407 1407 1408 1408 * **Failover:** Basic error handling with cache fallback for Stage 2 1409 1409 ... ... 
@@ -1423,9 +1423,10 @@ 1423 1423 * Cost tracking includes provider name in logs 1424 1424 * Stage 2 falls back to cache on provider failure 1425 1425 1426 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6 1513 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.105.Specification.POC.API-and-Schemas.WebHome]] Section 6 1427 1427 1428 1428 **Dependencies:** 1516 + 1429 1429 * NFR-14 (Main Requirements) 1430 1430 * Design Decision 9 1431 1431 * Architecture Section 2.2 ... ... @@ -1433,5 +1433,3 @@ 1433 1433 **Priority:** HIGH (P1) 1434 1434 1435 1435 **Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later. 1436 - 1437 -
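
**Illustrative sketch (not part of the specification):** The provider abstraction, environment-variable configuration, and Stage 2 cache fallback described in this requirement could look roughly like the Python below. Only the name {{code}}LLMProvider{{/code}} and the three {{code}}LLM_*{{/code}} variables come from the page. The {{code}}complete{{/code}} method, {{code}}StubAnthropicProvider{{/code}}, {{code}}load_config{{/code}}, {{code}}run_stage2{{/code}}, and the in-memory dict cache are hypothetical names chosen for illustration. The stub makes no network calls and does not reflect the Anthropic SDK.

```python
import os
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Provider interface named in the requirement; the method shape is an assumption."""

    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Return the model's text response for the given prompt."""


class StubAnthropicProvider(LLMProvider):
    """Hypothetical stand-in for a real provider; performs no API calls."""

    name = "anthropic"

    def __init__(self, fail: bool = False):
        self.fail = fail  # simulate a provider outage when True

    def complete(self, model: str, prompt: str) -> str:
        if self.fail:
            raise RuntimeError("provider unavailable")
        return f"[{self.name}:{model}] {prompt}"


def load_config(env=None) -> dict:
    """Read provider selection from environment variables.

    Defaults mirror the example values listed under Configuration.
    """
    env = os.environ if env is None else env
    return {
        "provider": env.get("LLM_PRIMARY_PROVIDER", "anthropic"),
        "stage1_model": env.get("LLM_STAGE1_MODEL", "claude-haiku-4"),
        "stage2_model": env.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5"),
    }


def run_stage2(provider: LLMProvider, config: dict, prompt: str, cache: dict) -> str:
    """Run Stage 2; on provider failure, fall back to a cached result if one exists."""
    try:
        result = provider.complete(config["stage2_model"], prompt)
    except Exception as exc:
        # Log with the provider name, as the acceptance criteria require for cost tracking.
        print(f"stage2 provider={provider.name} failed: {exc}")
        if prompt in cache:
            return cache[prompt]
        raise  # no cached answer available: surface the error
    cache[prompt] = result
    return result
```

Because callers depend only on {{code}}LLMProvider{{/code}}, adding a second provider later would mean adding another subclass and selecting it via {{code}}LLM_PRIMARY_PROVIDER{{/code}}, which is the refactoring cost the rationale aims to avoid.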