Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/24 18:27
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ... @@ -9,9 +9,11 @@ 9 9 === 1.1 What POC Tests === 10 10 11 11 **Core Question:** 12 + 12 12 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts? 13 13 14 14 **What we're proving:** 16 + 15 15 * AI can identify factual claims from text 16 16 * AI can evaluate those claims and produce verdicts 17 17 * Output is comprehensible and useful ... ... @@ -18,6 +18,7 @@ 18 18 * Fully automated approach is viable 19 19 20 20 **What we're NOT testing:** 23 + 21 21 * Scenario generation (deferred to POC2) 22 22 * Evidence display (deferred to POC2) 23 23 * Production scalability ... ... @@ -31,6 +31,7 @@ 31 31 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**. 32 32 33 33 **Rationale:** 37 + 34 34 * **POC1 tests:** Can AI extract claims and generate verdicts? 35 35 * **POC2 will add:** Scenario generation and management 36 36 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow? ... ... @@ -42,6 +42,7 @@ 42 42 **No Risk:** 43 43 44 44 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows: 49 + 45 45 * Faster POC1 validation 46 46 * Learning from POC1 to inform scenario design 47 47 * Iterative approach: fail fast if basic AI doesn't work ... ... @@ -48,14 +48,10 @@ 48 48 * Flexibility to adjust scenario architecture based on POC1 insights 49 49 50 50 **Full System Workflow (Future):** 51 -{{code}} 52 -Claims → Scenarios → Evidence → Verdicts 53 -{{/code}} 56 +{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}} 54 54 55 55 **POC1 Simplified Workflow:** 56 -{{code}} 57 -Claims → Verdicts (scenarios implicit in reasoning) 58 -{{/code}} 59 +{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}} 59 59 60 60 == 2. POC Output Specification == 61 61 ... ... 
@@ -66,6 +66,7 @@ 66 66 **Length:** 4-6 sentences 67 67 68 68 **Content (Required Elements):** 70 + 69 69 1. **Article's main thesis/claim** - What is the article trying to argue or prove? 70 70 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts 71 71 3. **Central vs. supporting claims** - Which claims are central to the article's argument? ... ... @@ -75,30 +75,28 @@ 75 75 **Critical Innovation:** 76 76 77 77 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might: 80 + 78 78 * Make accurate supporting facts but draw unsupported conclusions 79 79 * Have one false central claim that invalidates the whole argument 80 80 * Misframe accurate information to mislead 81 81 82 82 **Good Example (Context-Aware):** 83 -{{code}} 84 -This article argues that coffee cures cancer based on its antioxidant 86 +{{code}}This article argues that coffee cures cancer based on its antioxidant 85 85 content. We analyzed 3 factual claims: 2 about coffee's chemical 86 86 properties are well-supported, but the main causal claim is refuted 87 87 by current evidence. The article confuses correlation with causation. 88 88 Overall assessment: MISLEADING - makes an unsupported medical claim 89 -despite citing some accurate facts. 90 -{{/code}} 91 +despite citing some accurate facts.{{/code}} 91 91 92 92 **Poor Example (Simple Aggregation - Don't Do This):** 93 -{{code}} 94 -This article makes 3 claims. 2 are well-supported and 1 is refuted. 95 -Overall assessment: mostly accurate (67% accurate). 96 -{{/code}} 94 +{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted. 95 +Overall assessment: mostly accurate (67% accurate).{{/code}} 97 97 ↑ This misses that the refuted claim IS the article's main point! 98 98 99 99 **What POC1 Tests:** 100 100 101 101 Can AI identify and assess: 101 + 102 102 * ✅ The article's main thesis/conclusion? 103 103 * ✅ Which claims are central vs. 
supporting? 104 104 * ✅ Whether the evidence supports the conclusion? ... ... @@ -107,6 +107,7 @@ 107 107 **If AI Cannot Do This:** 108 108 109 109 That's valuable to learn in POC1! We'll: 110 + 110 110 * Note as limitation 111 111 * Fall back to simple aggregation with warning 112 112 * Design explicit article-level analysis for POC2 ... ... @@ -117,19 +117,18 @@ 117 117 **Format:** Numbered list 118 118 **Quantity:** 3-5 claims 119 119 **Requirements:** 121 + 120 120 * Factual claims only (not opinions/questions) 121 121 * Clearly stated 122 122 * Automatically extracted by AI 123 123 124 124 **Example:** 125 -{{code}} 126 -CLAIMS IDENTIFIED: 127 +{{code}}CLAIMS IDENTIFIED: 127 127 128 128 [1] Coffee reduces diabetes risk by 30% 129 129 [2] Coffee improves heart health 130 130 [3] Decaf has same benefits as regular 131 -[4] Coffee prevents Alzheimer's completely 132 -{{/code}} 132 +[4] Coffee prevents Alzheimer's completely{{/code}} 133 133 134 134 === 2.3 Component 3: CLAIMS VERDICTS === 135 135 ... ... @@ -137,6 +137,7 @@ 137 137 **Format:** Per claim structure 138 138 139 139 **Required Elements:** 140 + 140 140 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 141 141 * **Confidence Score:** 0-100% 142 142 * **Brief Reasoning:** 1-3 sentences explaining why ... ... @@ -143,8 +143,7 @@ 143 143 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration 144 144 145 145 **Example:** 146 -{{code}} 147 -VERDICTS: 147 +{{code}}VERDICTS: 148 148 149 149 [1] WELL-SUPPORTED (85%) [Risk: C] 150 150 Multiple studies confirm 25-30% risk reduction with regular consumption. ... ... @@ -156,10 +156,10 @@ 156 156 Some benefits overlap, but caffeine-related benefits are reduced in decaf. 157 157 158 158 [4] REFUTED (90%) [Risk: B] 159 -No evidence for complete prevention. Claim is significantly overstated. 160 -{{/code}} 159 +No evidence for complete prevention. 
Claim is significantly overstated.{{/code}} 161 161 162 162 **Risk Tier Display:** 162 + 163 163 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections 164 164 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality 165 165 * **Tier C (Green):** Low Risk - Facts/Definitions/History ... ... @@ -173,13 +173,11 @@ 173 173 **Tone:** Neutral (article's position, not FactHarbor's analysis) 174 174 175 175 **Example:** 176 -{{code}} 177 -ARTICLE SUMMARY: 176 +{{code}}ARTICLE SUMMARY: 178 178 179 179 Health News Today article discusses coffee benefits, citing studies 180 180 on diabetes and Alzheimer's. Author highlights research linking coffee 181 -to disease prevention. Recommends 2-3 cups daily for optimal health. 182 -{{/code}} 180 +to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}} 183 183 184 184 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) === 185 185 ... ... @@ -186,6 +186,7 @@ 186 186 **What:** LLM usage metrics for cost optimization and scaling decisions 187 187 188 188 **Purpose:** 187 + 189 189 * Understand cost per analysis 190 190 * Identify optimization opportunities 191 191 * Project costs at scale ... ... @@ -192,8 +192,7 @@ 192 192 * Inform architecture decisions 193 193 194 194 **Display Format:** 195 -{{code}} 196 -USAGE STATISTICS: 194 +{{code}}USAGE STATISTICS: 197 197 • Article: 2,450 words (12,300 characters) 198 198 • Input tokens: 15,234 199 199 • Output tokens: 892 ... ... 
@@ -201,17 +201,18 @@ 201 201 • Estimated cost: $0.24 USD 202 202 • Response time: 8.3 seconds 203 203 • Cost per claim: $0.048 204 -• Model: claude-sonnet-4-20250514 205 -{{/code}} 202 +• Model: claude-sonnet-4-20250514{{/code}} 206 206 207 207 **Why This Matters:** 208 208 209 209 At scale, LLM costs are critical: 207 + 210 210 * 10,000 articles/month ≈ $200-500/month 211 211 * 100,000 articles/month ≈ $2,000-5,000/month 212 212 * Cost optimization can reduce expenses 30-50% 213 213 214 214 **What POC1 Learns:** 213 + 215 215 * How cost scales with article length 216 216 * Prompt optimization opportunities (caching, compression) 217 217 * Output verbosity tradeoffs ... ... @@ -219,6 +219,7 @@ 219 219 * Article length limits (if needed) 220 220 221 221 **Implementation:** 221 + 222 222 * Claude API already returns usage data 223 223 * No extra API calls needed 224 224 * Display to user + log for aggregate analysis ... ... @@ -228,7 +228,8 @@ 228 228 229 229 === 2.6 Total Output Size === 230 230 231 -**Combined:** ~220-350 words 231 +**Combined:** 220-350 words 232 + 232 232 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences) 233 233 * Claims Identification: 30-50 words 234 234 * Claims Verdicts: 100-150 words ... ... @@ -243,6 +243,7 @@ 243 243 The following are **explicitly excluded** from POC: 244 244 245 245 **Content Features:** 247 + 246 246 * ❌ Scenarios (deferred to POC2) 247 247 * ❌ Evidence display (supporting/opposing lists) 248 248 * ❌ Source links (clickable references) ... ... @@ -252,6 +252,7 @@ 252 252 * ❌ Risk assessment (shown but not workflow-integrated) 253 253 254 254 **Platform Features:** 257 + 255 255 * ❌ User accounts / authentication 256 256 * ❌ Saved history 257 257 * ❌ Search functionality ... ... @@ -261,6 +261,7 @@ 261 261 * ❌ Social sharing 262 262 263 263 **Technical Features:** 267 + 264 264 * ❌ Browser extensions 265 265 * ❌ Mobile apps 266 266 * ❌ API endpoints ... ... 
@@ -268,6 +268,7 @@ 268 268 * ❌ Export features (PDF, CSV) 269 269 270 270 **Quality Features:** 275 + 271 271 * ❌ Accessibility (WCAG compliance) 272 272 * ❌ Multilingual support 273 273 * ❌ Mobile optimization ... ... @@ -274,6 +274,7 @@ 274 274 * ❌ Media verification (images/videos) 275 275 276 276 **Production Features:** 282 + 277 277 * ❌ Security hardening 278 278 * ❌ Privacy compliance (GDPR) 279 279 * ❌ Terms of service ... ... @@ -287,17 +287,13 @@ 287 287 === 4.1 Architecture Comparison === 288 288 289 289 **POC Architecture (Simplified):** 290 -{{code}} 291 -User Input → Single AKEL Call → Output Display 292 - (all processing) 293 -{{/code}} 296 +{{code}}User Input → Single AKEL Call → Output Display 297 + (all processing){{/code}} 294 294 295 295 **Full System Architecture:** 296 -{{code}} 297 -User Input → Claim Extractor → Claim Classifier → Scenario Generator 300 +{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator 298 298 → Evidence Summarizer → Contradiction Detector → Verdict Generator 299 -→ Quality Gates → Publication → Output Display 300 -{{/code}} 302 +→ Quality Gates → Publication → Output Display{{/code}} 301 301 302 302 **Key Differences:** 303 303 ... ... @@ -313,12 +313,14 @@ 313 313 === 4.2 Workflow Comparison === 314 314 315 315 **POC1 Workflow:** 318 + 316 316 1. User submits text/URL 317 317 2. Single AKEL call (all processing in one prompt) 318 318 3. Display results 319 -**Total: 3 steps, ~10-18 seconds** 322 +**Total: 3 steps, 10-18 seconds** 320 320 321 321 **Full System Workflow:** 325 + 322 322 1. **Claim Submission** (extraction, normalization, clustering) 323 323 2. **Scenario Building** (definitions, assumptions, boundaries) 324 324 3. **Evidence Handling** (retrieval, assessment, linking) ... ... @@ -325,7 +325,7 @@ 325 325 4. **Verdict Creation** (synthesis, reasoning, approval) 326 326 5. **Public Presentation** (summaries, landscapes, deep dives) 327 327 6.
**Time Evolution** (versioning, re-evaluation triggers) 328 -**Total: 6 phases with quality gates, ~10-30 seconds** 332 +**Total: 6 phases with quality gates, 10-30 seconds** 329 329 330 330 === 4.3 Why POC is Simplified === 331 331 ... ... @@ -348,6 +348,7 @@ 348 348 === 4.4 Gap Between POC1 and POC2/Beta === 349 349 350 350 **What needs to be built for POC2:** 355 + 351 351 * Scenario generation component 352 352 * Evidence Model structure (full) 353 353 * Scenario-evidence linking ... ... @@ -355,6 +355,7 @@ 355 355 * Truth landscape visualization 356 356 357 357 **What needs to be built for Beta:** 363 + 358 358 * Multi-component AKEL pipeline 359 359 * Quality gate infrastructure 360 360 * Review workflow system ... ... @@ -371,6 +371,7 @@ 371 371 **Mode:** Mode 2 (AI-Generated, No Prior Human Review) 372 372 373 373 Per FactHarbor Specification Section 11 "POC v1 Behavior": 380 + 374 374 * Produces public AI-generated output 375 375 * No human approval gate 376 376 * Clear AI-Generated labeling ... ... @@ -380,8 +380,7 @@ 380 380 === 5.2 User-Facing Labels === 381 381 382 382 **Primary Label (top of analysis):** 383 -{{code}} 384 -╔════════════════════════════════════════════════════════════╗ 390 +{{code}}╔════════════════════════════════════════════════════════════╗ 385 385 ║ [AI-GENERATED - POC/DEMO] ║ 386 386 ║ ║ 387 387 ║ This analysis was produced entirely by AI and has not ║ ... ... @@ -391,10 +391,10 @@ 391 391 ║ Review Status: Not Reviewed (Proof-of-Concept) ║ 392 392 ║ Quality Gates: 4/4 Passed (Simplified) ║ 393 393 ║ Last Updated: [timestamp] ║ 394 -╚════════════════════════════════════════════════════════════╝ 395 -{{/code}} 400 +╚════════════════════════════════════════════════════════════╝{{/code}} 396 396 397 397 **Per-Claim Risk Labels:** 403 + 398 398 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety) 399 399 * **[Risk: B]** 🟡 Medium Risk (Policy/Science) 400 400 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions) ... ...
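The per-claim risk labels listed above can be sketched as a small lookup plus a formatter. This is a minimal illustration, not FactHarbor's implementation: `RISK_TIERS`, `VERDICT_LABELS`, and `format_verdict_line` are hypothetical names, and the output layout only mirrors the `[1] WELL-SUPPORTED (85%) [Risk: C]` header used in the section 2.3 example.

```python
# Hypothetical lookup for the three risk tiers (label, display color).
RISK_TIERS = {
    "A": ("High Risk", "red"),
    "B": ("Medium Risk", "yellow"),
    "C": ("Low Risk", "green"),
}

# Verdict labels as listed in the output specification.
VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}


def format_verdict_line(index: int, verdict: str, confidence: int, tier: str) -> str:
    """Render one claim header, e.g. '[1] WELL-SUPPORTED (85%) [Risk: C]'."""
    if verdict not in VERDICT_LABELS:
        raise ValueError(f"unknown verdict label: {verdict!r}")
    if tier not in RISK_TIERS:
        raise ValueError(f"unknown risk tier: {tier!r}")
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return f"[{index}] {verdict} ({confidence}%) [Risk: {tier}]"
```

Rejecting unknown tiers and labels up front keeps a malformed AI reply from being rendered with a misleading color or verdict.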
@@ -402,6 +402,7 @@ 402 402 === 5.3 Display Requirements === 403 403 404 404 **Must Show:** 411 + 405 405 * AI-Generated status (prominent) 406 406 * POC/Demo disclaimer 407 407 * Risk tier per claim ... ... @@ -410,6 +410,7 @@ 410 410 * Timestamp 411 411 412 412 **Must NOT Claim:** 420 + 413 413 * Human review 414 414 * Production quality 415 415 * Medical/legal advice ... ... @@ -433,6 +433,7 @@ 433 433 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates. 434 434 435 435 **Full System Has 4 Gates:** 444 + 436 436 1. Source Quality 437 437 2. Contradiction Search (MANDATORY) 438 438 3. Uncertainty Quantification ... ... @@ -439,6 +439,7 @@ 439 439 4. Structural Integrity 440 440 441 441 **POC Implements Simplified Versions:** 451 + 442 442 * Focus on demonstrating concept 443 443 * Basic implementations sufficient 444 444 * Failures displayed to user (not blocking) ... ... @@ -447,6 +447,7 @@ 447 447 === 6.2 Gate 1: Source Quality (Basic) === 448 448 449 449 **Full System Requirements:** 460 + 450 450 * Primary sources identified and accessible 451 451 * Source reliability scored against whitelist 452 452 * Citation completeness verified ... ... @@ -454,6 +454,7 @@ 454 454 * Author credentials validated 455 455 456 456 **POC Implementation:** 468 + 457 457 * ✅ At least 2 sources found 458 458 * ✅ Sources accessible (URLs valid) 459 459 * ❌ No whitelist checking ... ... @@ -467,6 +467,7 @@ 467 467 === 6.3 Gate 2: Contradiction Search (Basic) === 468 468 469 469 **Full System Requirements:** 482 + 470 470 * Counter-evidence actively searched 471 471 * Reservations and limitations identified 472 472 * Alternative interpretations explored ... ... 
@@ -475,6 +475,7 @@ 475 475 * Academic literature (supporting AND opposing) 476 476 477 477 **POC Implementation:** 491 + 478 478 * ✅ Basic search for counter-evidence 479 479 * ✅ Identify obvious contradictions 480 480 * ❌ No comprehensive academic search ... ... @@ -489,6 +489,7 @@ 489 489 === 6.4 Gate 3: Uncertainty Quantification (Basic) === 490 490 491 491 **Full System Requirements:** 506 + 492 492 * Confidence scores calculated for all claims/verdicts 493 493 * Limitations explicitly stated 494 494 * Data gaps identified and disclosed ... ... @@ -496,6 +496,7 @@ 496 496 * Alternative scenarios considered 497 497 498 498 **POC Implementation:** 514 + 499 499 * ✅ Confidence scores (0-100%) 500 500 * ✅ Basic uncertainty acknowledgment 501 501 * ❌ No detailed limitation disclosure ... ... @@ -509,6 +509,7 @@ 509 509 === 6.5 Gate 4: Structural Integrity (Basic) === 510 510 511 511 **Full System Requirements:** 528 + 512 512 * No hallucinations detected (fact-checking against sources) 513 513 * Logic chain valid and traceable 514 514 * References accessible and verifiable ... ... @@ -516,6 +516,7 @@ 516 516 * Premises clearly stated 517 517 518 518 **POC Implementation:** 536 + 519 519 * ✅ Basic coherence check 520 520 * ✅ References accessible 521 521 * ❌ No comprehensive hallucination detection ... ... 
@@ -529,24 +529,20 @@ 529 529 === 6.6 Quality Gate Display === 530 530 531 531 **POC shows simplified status:** 532 -{{code}} 533 -Quality Gates: 4/4 Passed (Simplified) 550 +{{code}}Quality Gates: 4/4 Passed (Simplified) 534 534 ✓ Source Quality: 3 sources found 535 535 ✓ Contradiction Search: Basic search completed 536 536 ✓ Uncertainty: Confidence scores assigned 537 -✓ Structural Integrity: Output coherent 538 -{{/code}} 554 +✓ Structural Integrity: Output coherent{{/code}} 539 539 540 540 **If any gate fails:** 541 -{{code}} 542 -Quality Gates: 3/4 Passed (Simplified) 557 +{{code}}Quality Gates: 3/4 Passed (Simplified) 543 543 ✓ Source Quality: 3 sources found 544 544 ✗ Contradiction Search: Search failed - limited evidence 545 545 ✓ Uncertainty: Confidence scores assigned 546 546 ✓ Structural Integrity: Output coherent 547 547 548 -Note: This analysis has limited evidence. Use with caution. 549 -{{/code}} 563 +Note: This analysis has limited evidence. Use with caution.{{/code}} 550 550 551 551 === 6.7 Simplified vs. Full System === 552 552 ... ... @@ -563,6 +563,7 @@ 563 563 === 7.1 POC AKEL (Simplified) === 564 564 565 565 **Implementation:** 580 + 566 566 * Single Claude API call (Sonnet 4.5) 567 567 * One comprehensive prompt 568 568 * All processing in single request ... ... @@ -570,8 +570,7 @@ 570 570 * No orchestration layer 571 571 572 572 **Prompt Structure:** 573 -{{code}} 574 -Task: Analyze this article and provide: 588 +{{code}}Task: Analyze this article and provide: 575 575 576 576 1. Extract 3-5 factual claims 577 577 2. For each claim: ... ... @@ -583,8 +583,7 @@ 583 583 4. Generate article summary (3-5 sentences) 584 584 5. Run basic quality checks 585 585 586 -Return as structured JSON. 587 -{{/code}} 600 +Return as structured JSON.{{/code}} 588 588 589 589 **Processing Time:** 10-18 seconds (estimate) 590 590 ... ... 
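Because the single call returns everything as structured JSON, the POC needs some check that the reply matches the output specification before it is displayed. The following is a minimal sketch under stated assumptions, not the FactHarbor implementation: the field names `claims`, `verdict`, `confidence`, and `risk_tier` are hypothetical (the page does not define the JSON schema), and `validate_analysis` is an illustrative helper name. It uses only the Python standard library and makes no API call.

```python
import json

ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_TIERS = {"A", "B", "C"}


def validate_analysis(raw_json: str) -> list[str]:
    """Return a list of problems found in the model's JSON reply (empty if none)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"reply is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["reply must be a JSON object"]

    claims = data.get("claims")  # assumed field name
    if not isinstance(claims, list):
        return ["reply has no 'claims' list"]

    problems = []
    if not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")
    for i, claim in enumerate(claims, start=1):
        if not isinstance(claim, dict):
            problems.append(f"claim {i}: not a JSON object")
            continue
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            problems.append(f"claim {i}: unknown verdict")
        confidence = claim.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
            problems.append(f"claim {i}: confidence outside 0-100")
        if claim.get("risk_tier") not in ALLOWED_TIERS:
            problems.append(f"claim {i}: unknown risk tier")
    return problems
```

Returning a list of problems instead of raising fits the POC rule that gate failures are displayed to the user rather than blocking the output.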
@@ -591,8 +591,7 @@ 591 591 === 7.2 Full System AKEL (Production) === 592 592 593 593 **Architecture:** 594 -{{code}} 595 -AKEL Orchestrator 607 +{{code}}AKEL Orchestrator 596 596 ├── Claim Extractor 597 597 ├── Claim Classifier (with risk tier assignment) 598 598 ├── Scenario Generator ... ... @@ -600,10 +600,10 @@ 600 600 ├── Contradiction Detector 601 601 ├── Quality Gate Validator 602 602 ├── Audit Sampling Scheduler 603 -└── Federation Sync Adapter (Release 1.0+) 604 -{{/code}} 615 +└── Federation Sync Adapter (Release 1.0+){{/code}} 605 605 606 606 **Processing:** 618 + 607 607 * Parallel processing where possible 608 608 * Separate component calls 609 609 * Quality gates between phases ... ... @@ -615,6 +615,7 @@ 615 615 === 7.3 Why POC Uses Single Call === 616 616 617 617 **Advantages:** 630 + 618 618 * ✅ Simpler to implement 619 619 * ✅ Faster POC development 620 620 * ✅ Easier to debug ... ... @@ -622,6 +622,7 @@ 622 622 * ✅ Good enough for concept validation 623 623 624 624 **Limitations:** 638 + 625 625 * ❌ No component reusability 626 626 * ❌ No parallel processing 627 627 * ❌ All-or-nothing (can't partially succeed) ... ... @@ -648,6 +648,7 @@ 648 648 **Requirement:** User can submit article for analysis 649 649 650 650 **Functionality:** 665 + 651 651 * Text input field (paste article text, up to 5000 characters) 652 652 * URL input field (paste article URL) 653 653 * "Analyze" button to trigger processing ... ... @@ -654,6 +654,7 @@ 654 654 * Loading indicator during analysis 655 655 656 656 **Excluded:** 672 + 657 657 * No user authentication 658 658 * No claim history 659 659 * No search functionality ... ... @@ -660,6 +660,7 @@ 660 660 * No saved templates 661 661 662 662 **Acceptance Criteria:** 679 + 663 663 * User can paste text from article 664 664 * User can paste URL of article 665 665 * System accepts input and triggers analysis ... ... 
@@ -669,6 +669,7 @@ 669 669 **Requirement:** AI automatically extracts 3-5 factual claims 670 670 671 671 **Functionality:** 689 + 672 672 * AI reads article text 673 673 * AI identifies factual claims (not opinions/questions) 674 674 * AI extracts 3-5 most important claims ... ... @@ -675,6 +675,7 @@ 675 675 * System displays numbered list 676 676 677 677 **Critical:** NO MANUAL EDITING ALLOWED 696 + 678 678 * AI selects which claims to extract 679 679 * AI identifies factual vs. non-factual 680 680 * System processes claims as extracted ... ... @@ -681,11 +681,13 @@ 681 681 * No human curation or correction 682 682 683 683 **Error Handling:** 703 + 684 684 * If extraction fails: Display error message 685 685 * User can retry with different input 686 686 * No manual intervention to fix extraction 687 687 688 688 **Acceptance Criteria:** 709 + 689 689 * AI extracts 3-5 claims automatically 690 690 * Claims are factual (not opinions) 691 691 * Claims are clearly stated ... ... @@ -696,15 +696,17 @@ 696 696 **Requirement:** AI automatically generates verdict for each claim 697 697 698 698 **Functionality:** 720 + 699 699 * For each claim, AI: 700 - * Evaluates claim based on available evidence/knowledge 701 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 702 - * Assigns confidence score (0-100%) 703 - * Assigns risk tier (A/B/C) 704 - * Writes brief reasoning (1-3 sentences) 722 +* Evaluates claim based on available evidence/knowledge 723 +* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 724 +* Assigns confidence score (0-100%) 725 +* Assigns risk tier (A/B/C) 726 +* Writes brief reasoning (1-3 sentences) 705 705 * System displays verdict for each claim 706 706 707 707 **Critical:** NO MANUAL EDITING ALLOWED 730 + 708 708 * AI computes verdicts based on evidence 709 709 * AI generates confidence scores 710 710 * AI writes reasoning ... ...
@@ -711,11 +711,13 @@ 711 711 * No human review or adjustment 712 712 713 713 **Error Handling:** 737 + 714 714 * If verdict generation fails: Display error message 715 715 * User can retry 716 716 * No manual intervention to adjust verdicts 717 717 718 718 **Acceptance Criteria:** 743 + 719 719 * Each claim has a verdict 720 720 * Confidence score is displayed (0-100%) 721 721 * Risk tier is displayed (A/B/C) ... ... @@ -728,15 +728,17 @@ 728 728 **Requirement:** AI generates brief summary of analysis 729 729 730 730 **Functionality:** 756 + 731 731 * AI summarizes findings in 3-5 sentences: 732 - * How many claims found 733 - * Distribution of verdicts 734 - * Overall assessment 758 +* How many claims found 759 +* Distribution of verdicts 760 +* Overall assessment 735 735 * System displays at top of results 736 736 737 737 **Critical:** NO MANUAL EDITING ALLOWED 738 738 739 739 **Acceptance Criteria:** 766 + 740 740 * Summary is coherent 741 741 * Accurately reflects analysis 742 742 * 3-5 sentences ... ... @@ -747,6 +747,7 @@ 747 747 **Requirement:** AI generates brief summary of original article 748 748 749 749 **Functionality:** 777 + 750 750 * AI summarizes article content (not FactHarbor's analysis) 751 751 * 3-5 sentences 752 752 * System displays ... ... @@ -756,6 +756,7 @@ 756 756 **Critical:** NO MANUAL EDITING ALLOWED 757 757 758 758 **Acceptance Criteria:** 787 + 759 759 * Summary is neutral (article's position) 760 760 * Accurately reflects article content 761 761 * 3-5 sentences ... ... @@ -766,6 +766,7 @@ 766 766 **Requirement:** Clear labeling of AI-generated content 767 767 768 768 **Functionality:** 798 + 769 769 * Display Mode 2 publication label 770 770 * Show POC/Demo disclaimer 771 771 * Display risk tiers per claim ... ...
@@ -773,6 +773,7 @@ 773 773 * Display timestamp 774 774 775 775 **Acceptance Criteria:** 806 + 776 776 * Label is prominent and clear 777 777 * User understands this is AI-generated POC output 778 778 * Risk tiers are color-coded ... ... @@ -783,6 +783,7 @@ 783 783 **Requirement:** Execute simplified quality gates 784 784 785 785 **Functionality:** 817 + 786 786 * Check source quality (basic) 787 787 * Attempt contradiction search (basic) 788 788 * Calculate confidence scores ... ... @@ -790,6 +790,7 @@ 790 790 * Display gate results 791 791 792 792 **Acceptance Criteria:** 825 + 793 793 * All 4 gates attempted 794 794 * Pass/fail status displayed 795 795 * Failures explained to user ... ... @@ -804,6 +804,7 @@ 804 804 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE 805 805 806 806 **What this means:** 840 + 807 807 * Claims: AI selects (no human curation) 808 808 * Scenarios: N/A (deferred to POC2) 809 809 * Evidence: AI evaluates (no human selection) ... ... @@ -811,13 +811,12 @@ 811 811 * Summaries: AI writes (no human editing) 812 812 813 813 **Pipeline:** 814 -{{code}} 815 -User Input → AKEL Processing → Output Display 848 +{{code}}User Input → AKEL Processing → Output Display 816 816 ↓ 817 - ZERO human editing 818 -{{/code}} 850 + ZERO human editing{{/code}} 819 819 820 820 **If AI output is poor:** 853 + 821 821 * ❌ Do NOT manually fix it 822 822 * ✅ Document the failure 823 823 * ✅ Improve prompts and retry ... ... @@ -824,6 +824,7 @@ 824 824 * ✅ Accept that POC might fail 825 825 826 826 **Why this matters:** 860 + 827 827 * Tests whether AI can do this without humans 828 828 * Validates scalability (humans can't review every analysis) 829 829 * Honest test of technical feasibility ... ... 
@@ -833,16 +833,19 @@ 833 833 **Requirement:** Analysis completes in reasonable time 834 834 835 835 **Acceptable Performance:** 870 + 836 836 * Processing time: 1-5 minutes (acceptable for POC) 837 837 * Display loading indicator to user 838 838 * Show progress if possible ("Extracting claims...", "Generating verdicts...") 839 839 840 840 **Not Required:** 876 + 841 841 * Production-level speed (< 30 seconds) 842 842 * Optimization for scale 843 843 * Caching 844 844 845 845 **Acceptance Criteria:** 882 + 846 846 * Analysis completes within 5 minutes 847 847 * User sees loading indicator 848 848 * No timeout errors ... ... @@ -852,16 +852,19 @@ 852 852 **Requirement:** System works for manual testing sessions 853 853 854 854 **Acceptable:** 892 + 855 855 * Occasional errors (< 20% failure rate) 856 856 * Manual restart if needed 857 857 * Display error messages clearly 858 858 859 859 **Not Required:** 898 + 860 860 * 99.9% uptime 861 861 * Automatic error recovery 862 862 * Production monitoring 863 863 864 864 **Acceptance Criteria:** 904 + 865 865 * System works for test demonstrations 866 866 * Errors are handled gracefully 867 867 * User receives clear error messages ... ... @@ -871,6 +871,7 @@ 871 871 **Requirement:** Runs on simple infrastructure 872 872 873 873 **Acceptable:** 914 + 874 874 * Single machine or simple cloud setup 875 875 * No distributed architecture 876 876 * No load balancing ... ... @@ -878,6 +878,7 @@ 878 878 * Local development environment viable 879 879 880 880 **Not Required:** 922 + 881 881 * Production infrastructure 882 882 * Multi-region deployment 883 883 * Auto-scaling ... ... @@ -888,6 +888,7 @@ 888 888 **Requirement:** Track and display LLM usage metrics to inform optimization decisions 889 889 890 890 **Must Track:** 933 + 891 891 * Input tokens (article + prompt) 892 892 * Output tokens (generated analysis) 893 893 * Total tokens ... ... 
@@ -896,16 +896,19 @@ 896 896 * Article length (words/characters) 897 897 898 898 **Must Display:** 942 + 899 899 * Usage statistics in UI (Component 5) 900 900 * Cost per analysis 901 901 * Cost per claim extracted 902 902 903 903 **Must Log:** 948 + 904 904 * Aggregate metrics for analysis 905 905 * Cost distribution by article length 906 906 * Token efficiency trends 907 907 908 908 **Purpose:** 954 + 909 909 * Understand unit economics 910 910 * Identify optimization opportunities 911 911 * Project costs at scale ... ... @@ -912,6 +912,7 @@ 912 912 * Inform architecture decisions (caching, model selection, etc.) 913 913 914 914 **Acceptance Criteria:** 961 + 915 915 * ✅ Usage data displayed after each analysis 916 916 * ✅ Metrics logged for aggregate analysis 917 917 * ✅ Cost calculated accurately (Claude API pricing) ... ... @@ -919,6 +919,7 @@ 919 919 * ✅ POC1 report includes cost analysis section 920 920 921 921 **Success Target:** 969 + 922 922 * Average cost per analysis < $0.05 USD 923 923 * Cost scaling behavior understood (linear/exponential) 924 924 * 2+ optimization opportunities identified ... ... @@ -930,11 +930,13 @@ 930 930 === 10.1 System Components === 931 931 932 932 **Frontend:** 981 + 933 933 * Simple HTML form (text input + URL input + button) 934 934 * Loading indicator 935 935 * Results display page (single page, no tabs/navigation) 936 936 937 937 **Backend:** 987 + 938 938 * Single API endpoint 939 939 * Calls Claude API (Sonnet 4.5 or latest) 940 940 * Parses response ... ... @@ -941,10 +941,12 @@ 941 941 * Returns JSON to frontend 942 942 943 943 **Data Storage:** 994 + 944 944 * None required (stateless POC) 945 945 * Optional: Simple file storage or SQLite for demo examples 946 946 947 947 **External Services:** 999 + 948 948 * Claude API (Anthropic) - required 949 949 * Optional: URL fetch service for article text extraction 950 950 ... ... 
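The cost-tracking requirement (cost per analysis, cost per claim extracted) reduces to simple arithmetic on the token counts the API already returns. The sketch below is an assumption-laden illustration rather than the POC's code: `usage_statistics` is a hypothetical helper, and the two per-million-token prices are caller-supplied placeholders, not actual Claude API pricing.

```python
def usage_statistics(
    input_tokens: int,
    output_tokens: int,
    claims_extracted: int,
    usd_per_million_input: float,   # placeholder price, supplied by the caller
    usd_per_million_output: float,  # placeholder price, supplied by the caller
) -> dict:
    """Derive the displayed usage metrics from raw token counts."""
    if claims_extracted <= 0:
        raise ValueError("claims_extracted must be positive")
    cost = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000
    return {
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": cost,
        "cost_per_claim_usd": cost / claims_extracted,
    }
```

Keeping prices as parameters means the logged metrics can be recomputed if the model or its pricing changes, which matters for the cost-scaling analysis the POC1 report is meant to include.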
@@ -977,8 +977,7 @@ 977 977 === 10.3 AI Prompt Strategy === 978 978 979 979 **Single Comprehensive Prompt:** 980 -{{code}} 981 -Task: Analyze this article and provide: 1032 +{{code}}Task: Analyze this article and provide: 982 982 983 983 1. Identify the article's main thesis/conclusion 984 984 - What is the article trying to argue or prove? ... ... @@ -1014,8 +1014,7 @@ 1014 1014 1015 1015 7. Write article summary (3-5 sentences: neutral summary of article content) 1016 1016 1017 -Return as structured JSON with quality gate results. 1018 -{{/code}} 1068 +Return as structured JSON with quality gate results.{{/code}} 1019 1019 1020 1020 **One prompt generates everything.** 1021 1021 ... ... @@ -1026,25 +1026,30 @@ 1026 1026 === 10.4 Technology Stack Suggestions === 1027 1027 1028 1028 **Frontend:** 1079 + 1029 1029 * HTML + CSS + JavaScript (minimal framework) 1030 1030 * OR: Next.js (if team prefers) 1031 1031 * Hosted: Local machine OR Vercel/Netlify free tier 1032 1032 1033 1033 **Backend:** 1085 + 1034 1034 * Python Flask/FastAPI (simple REST API) 1035 1035 * OR: Next.js API routes (if using Next.js) 1036 1036 * Hosted: Local machine OR Railway/Render free tier 1037 1037 1038 1038 **AKEL Integration:** 1091 + 1039 1039 * Claude API via Anthropic SDK 1040 1040 * Model: Claude Sonnet 4.5 or latest available 1041 1041 1042 1042 **Database:** 1096 + 1043 1043 * None (stateless acceptable) 1044 1044 * OR: SQLite if want to store demo examples 1045 1045 * OR: JSON files on disk 1046 1046 1047 1047 **Deployment:** 1102 + 1048 1048 * Local development environment sufficient for POC 1049 1049 * Optional: Deploy to cloud for remote demos 1050 1050 ... ... @@ -1053,6 +1053,7 @@ 1053 1053 === 11.1 Minimum Success (POC Passes) === 1054 1054 1055 1055 **Required for GO decision:** 1111 + 1056 1056 * ✅ AI extracts 3-5 factual claims automatically 1057 1057 * ✅ AI provides verdict for each claim automatically 1058 1058 * ✅ Verdicts are reasonable (≥70% make logical sense) ... 
... @@ -1066,6 +1066,7 @@ 1066 1066 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented) 1067 1067 1068 1068 **Quality Definition:** 1125 + 1069 1069 * "Reasonable verdict" = Defensible given general knowledge 1070 1070 * "Coherent summary" = Logically structured, grammatically correct 1071 1071 * "Comprehensible" = Reviewers understand what analysis means ... ... @@ -1073,6 +1073,7 @@ 1073 1073 === 11.2 POC Fails If === 1074 1074 1075 1075 **Automatic NO-GO if any of these:** 1133 + 1076 1076 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1077 1077 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1078 1078 * ❌ Output incomprehensible (reviewers can't understand analysis) ... ... @@ -1084,14 +1084,15 @@ 1084 1084 **POC quality expectations:** 1085 1085 1086 1086 |=Component|=Quality Threshold|=Definition 1087 -|Claim Extraction|(% class="success" %)≥70% accuracy (%%)|Identifies obvious factual claims, may miss some edge cases 1088 -|Verdict Logic|(% class="success" %)≥70% defensible (%%)|Verdicts are logical given reasoning provided 1089 -|Reasoning Clarity|(% class="success" %)≥70% clear (%%)|1-3 sentences are understandable and relevant 1090 -|Overall Analysis|(% class="success" %)≥70% useful (%%)|Output helps user understand article claims 1145 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases 1146 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided 1147 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant 1148 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims 1091 1091 1092 1092 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet 1093 1093 1094 1094 **Not expecting:** 1153 + 1095 1095 * 100% accuracy 1096 1096 * Perfect claim coverage 1097 1097 *
Comprehensive evidence gathering ... ... @@ -1099,6 +1099,7 @@ 1099 1099 * Production polish 1100 1100 1101 1101 **Expecting:** 1161 + 1102 1102 * Reasonable claim extraction 1103 1103 * Defensible verdicts 1104 1104 * Understandable reasoning ... ... @@ -1111,6 +1111,7 @@ 1111 1111 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 1112 1112 1113 1113 **Expected Output:** 1174 + 1114 1114 * Extract claim correctly 1115 1115 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1116 1116 * Confidence: 70-90% ... ... @@ -1124,6 +1124,7 @@ 1124 1124 **Input:** News article URL with multiple claims about politics/health/science 1125 1125 1126 1126 **Expected Output:** 1188 + 1127 1127 * Extract 3-5 key claims 1128 1128 * Verdict for each (may vary: some supported, some uncertain, some refuted) 1129 1129 * Coherent analysis summary ... ... @@ -1137,6 +1137,7 @@ 1137 1137 **Input:** Article on contested political or scientific topic 1138 1138 1139 1139 **Expected Output:** 1202 + 1140 1140 * Balanced analysis 1141 1141 * Acknowledges uncertainty where appropriate 1142 1142 * Doesn't overstate confidence ... ... @@ -1149,6 +1149,7 @@ 1149 1149 **Input:** Article with obviously false claim (e.g., "The Earth is flat") 1150 1150 1151 1151 **Expected Output:** 1215 + 1152 1152 * Extract claim 1153 1153 * Verdict: REFUTED 1154 1154 * High confidence (> 90%) ... ... @@ -1162,6 +1162,7 @@ 1162 1162 **Input:** Article with claim where evidence is genuinely mixed 1163 1163 1164 1164 **Expected Output:** 1229 + 1165 1165 * Extract claim 1166 1166 * Verdict: UNCERTAIN 1167 1167 * Moderate confidence (40-60%) ... ... @@ -1174,6 +1174,7 @@ 1174 1174 **Input:** Article making medical claims 1175 1175 1176 1176 **Expected Output:** 1242 + 1177 1177 * Extract claim 1178 1178 * Verdict: [appropriate based on evidence] 1179 1179 * Risk tier: A (High - medical) ... ... 
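The expected outputs in the test cases above pair a verdict label with a confidence range, so they could be checked mechanically. The following is only a minimal sketch of such a check: the case keys (`coffee_diabetes`, `flat_earth`, `mixed_evidence`), the `Expectation` class, and the `meets_expectation` helper are hypothetical names chosen for illustration, and confidence is assumed to be reported in percent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    verdicts: frozenset          # acceptable verdict labels
    low: float                   # lower confidence bound, in percent
    high: float                  # upper confidence bound, in percent
    low_exclusive: bool = False  # True for "> 90%" style bounds


# Values copied from the test cases; the dictionary keys are hypothetical.
EXPECTATIONS = {
    "coffee_diabetes": Expectation(
        frozenset({"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}), 70, 90
    ),
    "flat_earth": Expectation(frozenset({"REFUTED"}), 90, 100, low_exclusive=True),
    "mixed_evidence": Expectation(frozenset({"UNCERTAIN"}), 40, 60),
}


def meets_expectation(case: str, verdict: str, confidence: float) -> bool:
    """Return True if a verdict and confidence match the expected output."""
    exp = EXPECTATIONS[case]
    if exp.low_exclusive:
        above_low = confidence > exp.low
    else:
        above_low = confidence >= exp.low
    return verdict in exp.verdicts and above_low and confidence <= exp.high
```

A check like this only covers the verdict and confidence bullets; the qualitative expectations (balanced analysis, coherent summary) would still need reviewer judgment.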
@@ -1191,6 +1191,7 @@
1191 1191 **Option A: GO (Proceed to POC2)**
1192 1192
1193 1193 **Conditions:**
1260 +
1194 1194 * AI quality ≥70% without manual editing
1195 1195 * Basic claim → verdict pipeline validated
1196 1196 * Internal + advisor feedback positive
... ...
@@ -1199,6 +1199,7 @@
1199 1199 * Clear path to improving AI quality to ≥90%
1200 1200
1201 1201 **Next Steps:**
1269 +
1202 1202 * Plan POC2 development (add scenarios)
1203 1203 * Design scenario architecture
1204 1204 * Expand to Evidence Model structure
... ...
@@ -1207,6 +1207,7 @@
1207 1207 **Option B: NO-GO (Pivot or Stop)**
1208 1208
1209 1209 **Conditions:**
1278 +
1210 1210 * AI quality < 60%
1211 1211 * Requires manual editing for most analyses (> 50%)
1212 1212 * Feedback indicates fundamental flaws
... ...
@@ -1214,6 +1214,7 @@
1214 1214 * No clear path to improvement
1215 1215
1216 1216 **Next Steps:**
1286 +
1217 1217 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1218 1218 * **Stop:** Conclude approach not viable, revisit later
1219 1219
... ...
@@ -1220,6 +1220,7 @@
1220 1220 **Option C: ITERATE (Improve POC)**
1221 1221
1222 1222 **Conditions:**
1293 +
1223 1223 * Concept has merit but execution needs work
1224 1224 * Specific improvements identified
1225 1225 * Addressable with better prompts/approach
... ...
@@ -1226,6 +1226,7 @@
1226 1226 * AI quality between 60-70%
1227 1227
1228 1228 **Next Steps:**
1300 +
1229 1229 * Improve AI prompts
1230 1230 * Test different approaches
1231 1231 * Re-run POC with improvements
... ...
@@ -1247,6 +1247,7 @@
1247 1247 **Impact:** POC fails
1248 1248
1249 1249 **Mitigation:**
1322 +
1250 1250 * Extensive prompt engineering and testing
1251 1251 * Use best available AI models (Sonnet 4.5)
1252 1252 * Test with diverse article types
... ...
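The numeric conditions in Options A to C above (GO at ≥70% without manual editing, NO-GO below 60% or when more than 50% of analyses need manual editing, ITERATE at 60-70%) can be sketched as a small decision function. This is an illustration under stated assumptions, not part of the specification: the function name `decide` and its parameters are hypothetical, the qualitative conditions (feedback, path to improvement) are not modelled, and treating "≥70% quality but some manual editing" as ITERATE is an assumption.

```python
from enum import Enum


class Decision(Enum):
    GO = "GO"            # Option A: proceed to POC2
    NO_GO = "NO-GO"      # Option B: pivot or stop
    ITERATE = "ITERATE"  # Option C: improve POC and re-run


def decide(ai_quality: float, manual_edit_rate: float = 0.0) -> Decision:
    """Map measured POC results to a decision option.

    ai_quality: fraction of outputs judged acceptable (0.0 to 1.0).
    manual_edit_rate: fraction of analyses that needed manual editing.
    """
    if not 0.0 <= ai_quality <= 1.0 or not 0.0 <= manual_edit_rate <= 1.0:
        raise ValueError("rates must be fractions between 0.0 and 1.0")
    # NO-GO: quality below 60%, or most analyses (> 50%) need manual editing.
    if ai_quality < 0.60 or manual_edit_rate > 0.50:
        return Decision.NO_GO
    # GO: quality at or above 70% without manual editing.
    if ai_quality >= 0.70 and manual_edit_rate == 0.0:
        return Decision.GO
    # Everything in between is treated as ITERATE (assumption, see lead-in).
    return Decision.ITERATE
```

For example, `decide(0.65)` falls in the 60-70% band and returns `Decision.ITERATE`.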
@@ -1260,6 +1260,7 @@
1260 1260 **Impact:** Works sometimes, fails other times
1261 1261
1262 1262 **Mitigation:**
1336 +
1263 1263 * Test with 10+ diverse articles
1264 1264 * Measure success rate honestly
1265 1265 * Improve prompts to increase consistency
... ...
@@ -1272,6 +1272,7 @@
1272 1272 **Impact:** Users can't understand analysis
1273 1273
1274 1274 **Mitigation:**
1349 +
1275 1275 * Create clear explainer document
1276 1276 * Iterate on output format
1277 1277 * Test with non-technical reviewers
... ...
@@ -1285,6 +1285,7 @@
1285 1285 **Impact:** System slow or expensive
1286 1286
1287 1287 **Mitigation:**
1363 +
1288 1288 * Monitor API usage
1289 1289 * Implement retry logic
1290 1290 * Estimate costs before scaling
... ...
@@ -1297,6 +1297,7 @@
1297 1297 **Impact:** POC becomes too complex
1298 1298
1299 1299 **Mitigation:**
1376 +
1300 1300 * Strict scope discipline
1301 1301 * Say NO to feature additions
1302 1302 * Keep focus on core question
... ...
@@ -1307,12 +1307,15 @@
1307 1307
1308 1308 === 15.1 Core Principles ===
1309 1309
1310 -**1. Build Less, Learn More**
1387 +*
1388 +**
1389 +**1. Build Less, Learn More
1311 1311 * Minimum features to test hypothesis
1312 1312 * Don't build unvalidated features
1313 1313 * Focus on core question only
1314 1314
1315 1315 **2. Fail Fast**
1395 +
1316 1316 * Quick test of hardest part (AI capability)
1317 1317 * Accept that POC might fail
1318 1318 * Better to discover issues early
... ...
@@ -1319,16 +1319,19 @@
1319 1319 * Honest assessment over optimistic hope
1320 1320
1321 1321 **3. Test First, Build Second**
1402 +
1322 1322 * Validate AI can do this before building platform
1323 1323 * Don't assume it will work
1324 1324 * Let results guide decisions
1325 1325
1326 1326 **4. Automation First**
1408 +
1327 1327 * No manual editing allowed
1328 1328 * Tests scalability, not just feasibility
1329 1329 * Proves approach can work at scale
1330 1330
1331 1331 **5. Honest Assessment**
1414 +
1332 1332 * Don't cherry-pick examples
1333 1333 * Don't manually fix bad outputs
1334 1334 * Document failures openly
... ...
@@ -1349,9 +1349,12 @@
1349 1349 ❌ Perfectly accurate analysis
1350 1350 ❌ Polished user experience
1351 1351
1352 -== 16. Success = Clear Path Forward==
1435 +== 16. Success ==
1353 1353
1437 + Clear Path Forward ==
1438 +
1354 1354 **If POC succeeds (≥70% AI quality):**
1440 +
1355 1355 * ✅ Approach validated
1356 1356 * ✅ Proceed to POC2 (add scenarios)
1357 1357 * ✅ Design full Evidence Model structure
... ...
@@ -1359,6 +1359,7 @@
1359 1359 * ✅ Focus on improving AI quality from 70% → 90%
1360 1360
1361 1361 **If POC fails (< 60% AI quality):**
1448 +
1362 1362 * ✅ Learn what doesn't work
1363 1363 * ✅ Pivot to different approach
1364 1364 * ✅ OR wait for better AI technology
... ...
@@ -1385,16 +1385,16 @@
1385 1385 **POC1 Implementation:**
1386 1386
1387 1387 * **Primary Provider:** Anthropic Claude API
1388 - * Stage 1: Claude Haiku 4
1389 - * Stage 2: Claude Sonnet 3.5 (cached)
1390 - * Stage 3: Claude Sonnet 3.5
1475 +* Stage 1: Claude Haiku 4
1476 +* Stage 2: Claude Sonnet 3.5 (cached)
1477 +* Stage 3: Claude Sonnet 3.5
1391 1391
1392 1392 * **Provider Interface:** Abstract LLMProvider interface implemented
1393 1393
1394 1394 * **Configuration:** Environment variables for provider selection
1395 - * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1396 - * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1397 - * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1482 +* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1483 +* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1484 +* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1398 1398
1399 1399 * **Failover:** Basic error handling with cache fallback for Stage 2
1400 1400
1401 1401
... ...
@@ -1414,9 +1414,10 @@
1414 1414 * Cost tracking includes provider name in logs
1415 1415 * Stage 2 falls back to cache on provider failure
1416 1416
1417 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1504 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.103.Specification.POC.API-and-Schemas.WebHome]] Section 6
1418 1418
1419 1419 **Dependencies:**
1507 +
1420 1420 * NFR-14 (Main Requirements)
1421 1421 * Design Decision 9
1422 1422 * Architecture Section 2.2
... ...
@@ -1424,5 +1424,3 @@
1424 1424 **Priority:** HIGH (P1)
1425 1425
1426 1426 **Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
1427 -
1428 -
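The provider requirements above (an abstract LLMProvider interface, environment variables for provider and model selection, the provider name in logs, and a cache fallback for Stage 2) might look roughly like the sketch below. Only `LLMProvider` and the three `LLM_*` variable names come from the page; the `complete` method, `ProviderError`, `StubProvider`, `load_config`, and `run_stage2` are hypothetical names, and the stub stands in for a real wrapper around the Anthropic SDK, which is not shown here.

```python
import os
from abc import ABC, abstractmethod


class ProviderError(Exception):
    """Raised when a provider call fails (hypothetical error type)."""


class LLMProvider(ABC):
    """Provider-agnostic interface; the `complete` method is an assumption."""

    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        ...


class StubProvider(LLMProvider):
    """Stand-in for illustration; a real provider would call the vendor SDK."""

    name = "anthropic"

    def __init__(self, fail: bool = False):
        self.fail = fail

    def complete(self, model: str, prompt: str) -> str:
        if self.fail:
            raise ProviderError("provider unavailable")
        return f"[{model}] {prompt}"


def load_config(env=os.environ) -> dict:
    """Read provider and model selection from environment variables."""
    return {
        "provider": env.get("LLM_PRIMARY_PROVIDER", "anthropic"),
        "stage1_model": env.get("LLM_STAGE1_MODEL", "claude-haiku-4"),
        "stage2_model": env.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5"),
    }


def run_stage2(provider: LLMProvider, config: dict, prompt: str,
               cache: dict, log=print) -> str:
    """Call the provider for Stage 2; fall back to the cache on failure."""
    try:
        result = provider.complete(config["stage2_model"], prompt)
    except ProviderError:
        if prompt in cache:
            log(f"provider={provider.name} stage=2 source=cache")
            return cache[prompt]
        raise  # nothing cached, so surface the failure
    cache[prompt] = result
    log(f"provider={provider.name} stage=2 source=api")
    return result
```

Keeping the pipeline dependent only on `LLMProvider` means a second provider could be added later by writing one more subclass, which is the refactoring cost the rationale aims to avoid.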