Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2026/02/08 08:26
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Title
@@ -1,1 +1,1 @@
-POC Requirements (POC1 & POC2)
+POC Requirements
- Content
@@ -1,18 +1,11 @@
 = POC Requirements =
 
-
-{{info}}
-**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
-
-See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
-{{/info}}
-
-
-
-**Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
+**Status:** ✅ Approved for Development
+**Version:** 2.0 (Updated after Specification Cross-Check)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
 
+---
+
 == 1. POC Overview ==
 
 === 1.1 What POC Tests ===
@@ -33,6 +33,8 @@
 * Perfect accuracy
 * Complete feature set
 
+---
+
 === 1.2 Scenarios Deferred to POC2 ===
 
 **Intentional Simplification:**
@@ -66,65 +66,33 @@
 Claims → Verdicts (scenarios implicit in reasoning)
 {{/code}}
 
+---
+
 == 2. POC Output Specification ==
 
-=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware)===
+=== 2.1 Component 1: ANALYSIS SUMMARY ===
 
-**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
+**What:** Brief overview of findings
+**Length:** 3-5 sentences
+**Content:**
+* How many claims found
+* Distribution of verdicts
+* Overall assessment
 
-**Length:** 4-6 sentences
-
-**Content (Required Elements):**
-1. **Article's main thesis/claim** - What is the article trying to argue or prove?
-2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
-3. **Central vs. supporting claims** - Which claims are central to the article's argument?
-4. **Relationship assessment** - Do the claims support the article's conclusion?
-5. **Overall credibility** - Final assessment considering claim importance
-
-**Critical Innovation:**
-
-POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
-* Make accurate supporting facts but draw unsupported conclusions
-* Have one false central claim that invalidates the whole argument
-* Misframe accurate information to mislead
-
-**Good Example (Context-Aware):**
+**Example:**
 {{code}}
-This article argues that coffee cures cancer based on its antioxidant
-content. We analyzed 3 factual claims: 2 about coffee's chemical
-properties are well-supported, but the main causal claim is refuted
-by current evidence. The article confuses correlation with causation.
-Overall assessment: MISLEADING - makes an unsupported medical claim
-despite citing some accurate facts.
+This article makes 4 claims about coffee's health effects. We found
+2 claims are well-supported, 1 is uncertain, and 1 is refuted.
+Overall assessment: mostly accurate with some exaggeration.
 {{/code}}
 
-**Poor Example (Simple Aggregation - Don't Do This):**
-{{code}}
-This article makes 3 claims. 2 are well-supported and 1 is refuted.
-Overall assessment: mostly accurate (67% accurate).
-{{/code}}
-↑ This misses that the refuted claim IS the article's main point!
+---
 
-**What POC1 Tests:**
-
-Can AI identify and assess:
-* ✅ The article's main thesis/conclusion?
-* ✅ Which claims are central vs. supporting?
-* ✅ Whether the evidence supports the conclusion?
-* ✅ Overall credibility considering logical structure?
-
-**If AI Cannot Do This:**
-
-That's valuable to learn in POC1! We'll:
-* Note as limitation
-* Fall back to simple aggregation with warning
-* Design explicit article-level analysis for POC2
-
 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
 
-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
 **Requirements:**
 * Factual claims only (not opinions/questions)
 * Clearly stated
@@ -140,10 +140,12 @@
 [4] Coffee prevents Alzheimer's completely
 {{/code}}
 
+---
+
 === 2.3 Component 3: CLAIMS VERDICTS ===
 
-**What:** Verdict for each claim identified
-**Format:** Per claim structure
+**What:** Verdict for each claim identified
+**Format:** Per claim structure
 
 **Required Elements:**
 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
@@ -170,15 +170,17 @@
 
 **Risk Tier Display:**
 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
 * **Tier C (Green):** Low Risk - Facts/Definitions/History
 
 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
 
+---
+
 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
 
-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
 **Tone:** Neutral (article's position, not FactHarbor's analysis)
 
 **Example:**
@@ -190,60 +190,17 @@
 to disease prevention. Recommends 2-3 cups daily for optimal health.
 {{/code}}
 
-=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
+---
 
-**What:** LLM usage metrics for cost optimization and scaling decisions
+=== 2.5 Total Output Size ===
 
-**Purpose:**
-* Understand cost per analysis
-* Identify optimization opportunities
-* Project costs at scale
-* Inform architecture decisions
-
-**Display Format:**
-{{code}}
-USAGE STATISTICS:
-• Article: 2,450 words (12,300 characters)
-• Input tokens: 15,234
-• Output tokens: 892
-• Total tokens: 16,126
-• Estimated cost: $0.24 USD
-• Response time: 8.3 seconds
-• Cost per claim: $0.048
-• Model: claude-sonnet-4-20250514
-{{/code}}
-
-**Why This Matters:**
-
-At scale, LLM costs are critical:
-* 10,000 articles/month ≈ $200-500/month
-* 100,000 articles/month ≈ $2,000-5,000/month
-* Cost optimization can reduce expenses 30-50%
-
-**What POC1 Learns:**
-* How cost scales with article length
-* Prompt optimization opportunities (caching, compression)
-* Output verbosity tradeoffs
-* Model selection strategy (FAST vs. REASONING roles)
-* Article length limits (if needed)
-
-**Implementation:**
-* Claude API already returns usage data
-* No extra API calls needed
-* Display to user + log for aggregate analysis
-* Test with articles of varying lengths
-
-**Critical for GO/NO-GO:** Unit economics must be viable at scale!
-
-=== 2.6 Total Output Size ===
-
-**Combined:** ~220-350 words
-* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
+**Combined:** ~200-300 words
+* Analysis Summary: 50-70 words
 * Claims Identification: 30-50 words
 * Claims Verdicts: 100-150 words
 * Article Summary: 30-50 words (optional)
 
-**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
+---
 
 == 3. What's NOT in POC Scope ==
 
@@ -291,6 +291,8 @@
 * ❌ Analytics
 * ❌ A/B testing
 
+---
+
 == 4. POC Simplifications vs. Full System ==
 
 === 4.1 Architecture Comparison ===
@@ -298,7 +298,7 @@
 **POC Architecture (Simplified):**
 {{code}}
 User Input → Single AKEL Call → Output Display
- (all processing)
+ (all processing)
 {{/code}}
 
 **Full System Architecture:**
@@ -319,6 +319,8 @@
 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
 
+---
+
 === 4.2 Workflow Comparison ===
 
 **POC1 Workflow:**
@@ -336,6 +336,8 @@
 6. **Time Evolution** (versioning, re-evaluation triggers)
 **Total: 6 phases with quality gates, ~10-30 seconds**
 
+---
+
 === 4.3 Why POC is Simplified ===
 
 **Engineering Rationale:**
@@ -354,6 +354,8 @@
 * ❌ POC doesn't validate scale (test in Beta)
 * ❌ POC doesn't validate scenario architecture (design in POC2)
 
+---
+
 === 4.4 Gap Between POC1 and POC2/Beta ===
 
 **What needs to be built for POC2:**
@@ -373,6 +373,8 @@
 
 **POC1 → POC2 is significant architectural expansion.**
 
+---
+
 == 5. Publication Mode & Labeling ==
 
 === 5.1 POC Publication Mode ===
@@ -386,20 +386,22 @@
 * All quality gates active (simplified)
 * Risk tier classification shown (demo)
 
+---
+
 === 5.2 User-Facing Labels ===
 
 **Primary Label (top of analysis):**
 {{code}}
 ╔════════════════════════════════════════════════════════════╗
-║ [AI-GENERATED - POC/DEMO] ║
-║ ║
-║ This analysis was produced entirely by AI and has not ║
-║ been human-reviewed. Use for demonstration purposes. ║
-║ ║
-║ Source: AI/AKEL v1.0 (POC) ║
-║ Review Status: Not Reviewed (Proof-of-Concept) ║
-║ Quality Gates: 4/4 Passed (Simplified) ║
-║ Last Updated: [timestamp] ║
+║ [AI-GENERATED - POC/DEMO] ║
+║ ║
+║ This analysis was produced entirely by AI and has not ║
+║ been human-reviewed. Use for demonstration purposes. ║
+║ ║
+║ Source: AI/AKEL v1.0 (POC) ║
+║ Review Status: Not Reviewed (Proof-of-Concept) ║
+║ Quality Gates: 4/4 Passed (Simplified) ║
+║ Last Updated: [timestamp] ║
 ╚════════════════════════════════════════════════════════════╝
 {{/code}}
 
@@ -408,6 +408,8 @@
 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
 
+---
+
 === 5.3 Display Requirements ===
 
 **Must Show:**
@@ -425,6 +425,8 @@
 * Authoritative verdicts
 * Complete accuracy
 
+---
+
 === 5.4 Mode 2 vs. Full System Publication ===
 
 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
@@ -435,6 +435,8 @@
 |Risk Display|Demo only|Workflow-integrated|Validated
 |User Actions|View only|Flag for review|Trust rating
 
+---
+
 == 6. Quality Gates (Simplified Implementation) ==
 
 === 6.1 Overview ===
@@ -453,6 +453,8 @@
 * Failures displayed to user (not blocking)
 * Full system has comprehensive validation
 
+---
+
 === 6.2 Gate 1: Source Quality (Basic) ===
 
 **Full System Requirements:**
@@ -473,6 +473,8 @@
 
 **Failure Handling:** Display error message, don't generate verdict
 
+---
+
 === 6.3 Gate 2: Contradiction Search (Basic) ===
 
 **Full System Requirements:**
@@ -495,6 +495,8 @@
 
 **Failure Handling:** Note "limited contradiction search" in output
 
+---
+
 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
 
 **Full System Requirements:**
@@ -515,6 +515,8 @@
 
 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
 
+---
+
 === 6.5 Gate 4: Structural Integrity (Basic) ===
 
 **Full System Requirements:**
@@ -535,6 +535,8 @@
 
 **Failure Handling:** Display error message
 
+---
+
 === 6.6 Quality Gate Display ===
 
 **POC shows simplified status:**
@@ -557,6 +557,8 @@
 Note: This analysis has limited evidence. Use with caution.
 {{/code}}
 
+---
+
 === 6.7 Simplified vs. Full System ===
 
 |=Gate|=POC (Simplified)|=Full System
@@ -567,12 +567,14 @@
 
 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
 
+---
+
 == 7. AKEL Architecture Comparison ==
 
 === 7.1 POC AKEL (Simplified) ===
 
 **Implementation:**
-* Single provider API call (REASONING model)
+* Single Claude API call (Sonnet 4.5)
 * One comprehensive prompt
 * All processing in single request
 * No separate components
@@ -584,10 +584,10 @@
 
 1. Extract 3-5 factual claims
 2. For each claim:
- - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
- - Assign confidence score (0-100%)
- - Assign risk tier (A/B/C)
- - Write brief reasoning (1-3 sentences)
+ - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+ - Assign confidence score (0-100%)
+ - Assign risk tier (A/B/C)
+ - Write brief reasoning (1-3 sentences)
 3. Generate analysis summary (3-5 sentences)
 4. Generate article summary (3-5 sentences)
 5. Run basic quality checks
@@ -597,6 +597,8 @@
 
 **Processing Time:** 10-18 seconds (estimate)
 
+---
+
 === 7.2 Full System AKEL (Production) ===
 
 **Architecture:**
@@ -621,6 +621,8 @@
 
 **Processing Time:** 10-30 seconds (full pipeline)
 
+---
+
 === 7.3 Why POC Uses Single Call ===
 
 **Advantages:**
@@ -643,6 +643,8 @@
 
 Full component architecture comes in Beta after POC validates concept.
 
+---
+
 === 7.4 Evolution Path ===
 
 **POC1:** Single prompt → Prove concept
@@ -650,6 +650,8 @@
 **Beta:** Multi-component AKEL → Production architecture
 **Release 1.0:** Full AKEL + Federation → Scale
 
+---
+
 == 8. Functional Requirements ==
 
 === FR-POC-1: Article Input ===
@@ -673,6 +673,8 @@
 * User can paste URL of article
 * System accepts input and triggers analysis
 
+---
+
 === FR-POC-2: Claim Extraction (Fully Automated) ===
 
 **Requirement:** AI automatically extracts 3-5 factual claims
@@ -700,6 +700,8 @@
 * Claims are clearly stated
 * No manual editing required
 
+---
+
 === FR-POC-3: Verdict Generation (Fully Automated) ===
 
 **Requirement:** AI automatically generates verdict for each claim
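The per-claim verdict described for FR-POC-3 (verdict label, confidence score 0-100%, risk tier A/B/C, brief reasoning) could be held in a small validated record. The following is only a sketch: the class name `ClaimVerdict` and its field names are illustrative assumptions, not part of the specification; only the label set, the confidence range, and the tier letters come from the page.

```python
from dataclasses import dataclass

# Labels and tiers are taken from the requirement text; the class and
# field names below are hypothetical and chosen only for illustration.
VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


@dataclass(frozen=True)
class ClaimVerdict:
    claim: str
    verdict: str
    confidence: int  # percent, 0-100
    risk_tier: str
    reasoning: str  # the page asks for 1-3 sentences; length is not checked here

    def __post_init__(self) -> None:
        # Reject values the requirement does not allow.
        if self.verdict not in VERDICT_LABELS:
            raise ValueError(f"unknown verdict label: {self.verdict!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")
```

Validating at construction time means a malformed model reply fails loudly before display, which fits the "no manual editing" rule: bad output is surfaced rather than silently corrected.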
@@ -706,11 +706,11 @@
 
 **Functionality:**
 * For each claim, AI:
- * Evaluates claim based on available evidence/knowledge
- * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
- * Assigns confidence score (0-100%)
- * Assigns risk tier (A/B/C)
- * Writes brief reasoning (1-3 sentences)
+ * Evaluates claim based on available evidence/knowledge
+ * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+ * Assigns confidence score (0-100%)
+ * Assigns risk tier (A/B/C)
+ * Writes brief reasoning (1-3 sentences)
 * System displays verdict for each claim
 
 **Critical:** NO MANUAL EDITING ALLOWED
@@ -732,6 +732,8 @@
 * Verdict is defensible given reasoning
 * All generated automatically by AI
 
+---
+
 === FR-POC-4: Analysis Summary (Fully Automated) ===
 
 **Requirement:** AI generates brief summary of analysis
@@ -738,9 +738,9 @@
 
 **Functionality:**
 * AI summarizes findings in 3-5 sentences:
- * How many claims found
- * Distribution of verdicts
- * Overall assessment
+ * How many claims found
+ * Distribution of verdicts
+ * Overall assessment
 * System displays at top of results
 
 **Critical:** NO MANUAL EDITING ALLOWED
@@ -751,6 +751,8 @@
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
 
 **Requirement:** AI generates brief summary of original article
@@ -770,6 +770,8 @@
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-6: Publication Mode Display ===
 
 **Requirement:** Clear labeling of AI-generated content
@@ -787,6 +787,8 @@
 * Risk tiers are color-coded
 * Quality gate status is visible
 
+---
+
 === FR-POC-7: Quality Gate Execution ===
 
 **Requirement:** Execute simplified quality gates
@@ -804,6 +804,8 @@
 * Failures explained to user
 * Gates don't block publication (POC mode)
 
+---
+
 == 9. Non-Functional Requirements ==
 
 === NFR-POC-1: Fully Automated Processing ===
@@ -822,8 +822,8 @@
 **Pipeline:**
 {{code}}
 User Input → AKEL Processing → Output Display
- ↓
- ZERO human editing
+ ↓
+ ZERO human editing
 {{/code}}
 
 **If AI output is poor:**
@@ -837,6 +837,8 @@
 * Validates scalability (humans can't review every analysis)
 * Honest test of technical feasibility
 
+---
+
 === NFR-POC-2: Performance ===
 
 **Requirement:** Analysis completes in reasonable time
@@ -856,6 +856,8 @@
 * User sees loading indicator
 * No timeout errors
 
+---
+
 === NFR-POC-3: Reliability ===
 
 **Requirement:** System works for manual testing sessions
@@ -875,6 +875,8 @@
 * Errors are handled gracefully
 * User receives clear error messages
 
+---
+
 === NFR-POC-4: Environment ===
 
 **Requirement:** Runs on simple infrastructure
@@ -892,48 +892,8 @@
 * Auto-scaling
 * Disaster recovery
 
-=== NFR-POC-5: Cost Efficiency Tracking ===
+---
 
-**Requirement:** Track and display LLM usage metrics to inform optimization decisions
-
-**Must Track:**
-* Input tokens (article + prompt)
-* Output tokens (generated analysis)
-* Total tokens
-* Estimated cost (USD)
-* Response time (seconds)
-* Article length (words/characters)
-
-**Must Display:**
-* Usage statistics in UI (Component 5)
-* Cost per analysis
-* Cost per claim extracted
-
-**Must Log:**
-* Aggregate metrics for analysis
-* Cost distribution by article length
-* Token efficiency trends
-
-**Purpose:**
-* Understand unit economics
-* Identify optimization opportunities
-* Project costs at scale
-* Inform architecture decisions (caching, model selection, etc.)
-
-**Acceptance Criteria:**
-* ✅ Usage data displayed after each analysis
-* ✅ Metrics logged for aggregate analysis
-* ✅ Cost calculated accurately (Claude API pricing)
-* ✅ Test cases include varying article lengths
-* ✅ POC1 report includes cost analysis section
-
-**Success Target:**
-* Average cost per analysis < $0.05 USD
-* Cost scaling behavior understood (linear/exponential)
-* 2+ optimization opportunities identified
-
-**Critical:** Unit economics must be viable for scaling decision!
-
 == 10. Technical Architecture ==
 
 === 10.1 System Components ===
@@ -945,7 +945,7 @@
 
 **Backend:**
 * Single API endpoint
-* Calls provider API (REASONING model; configured via LLM abstraction)
+* Calls Claude API (Sonnet 4.5 or latest)
 * Parses response
 * Returns JSON to frontend
 
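The backend's "parses response" step could look roughly like the sketch below, assuming the model is asked to reply with a single JSON object. The function name `parse_analysis` and the `"claims"` key are hypothetical; the page only states that the reply is structured JSON and that 3-5 claims are expected.

```python
import json


def parse_analysis(raw_reply: str) -> dict:
    """Parse the model's JSON reply and check the 3-5 claim requirement.

    The "claims" key is an assumed name; the page does not fix a schema here.
    """
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    return data
```

Raising on a malformed reply keeps the failure visible to the tester, in line with the page's rule that poor AI output is documented rather than fixed by hand.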
@@ -957,32 +957,36 @@
 * Claude API (Anthropic) - required
 * Optional: URL fetch service for article text extraction
 
+---
+
 === 10.2 Processing Flow ===
 
 {{code}}
 1. User submits text or URL
- ↓
+ ↓
 2. Backend receives request
- ↓
+ ↓
 3. If URL: Fetch article text
- ↓
+ ↓
 4. Call Claude API with single prompt:
- "Extract claims, evaluate each, provide verdicts"
- ↓
+ "Extract claims, evaluate each, provide verdicts"
+ ↓
 5. Claude API returns:
- - Analysis summary
- - Claims list
- - Verdicts for each claim (with risk tiers)
- - Article summary (optional)
- - Quality gate results
- ↓
+ - Analysis summary
+ - Claims list
+ - Verdicts for each claim (with risk tiers)
+ - Article summary (optional)
+ - Quality gate results
+ ↓
 6. Backend parses response
- ↓
+ ↓
 7. Frontend displays results with Mode 2 labeling
 {{/code}}
 
 **Key Simplification:** Single API call does entire analysis
 
+---
+
 === 10.3 AI Prompt Strategy ===
 
 **Single Comprehensive Prompt:**
@@ -989,49 +989,27 @@
 {{code}}
 Task: Analyze this article and provide:
 
-1. Identify the article's main thesis/conclusion
- - What is the article trying to argue or prove?
- - What is the primary claim or conclusion?
+1. Extract 3-5 factual claims from the article
+2. For each claim:
+ - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
+ - Assign confidence score (0-100%)
+ - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
+ - Write brief reasoning (1-3 sentences)
+3. Run quality gates:
+ - Check: ≥2 sources found
+ - Attempt: Basic contradiction search
+ - Calculate: Confidence scores
+ - Verify: Structural integrity
+4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
+5. Write article summary (3-5 sentences: neutral summary of article content)
 
-2. Extract 3-5 factual claims from the article
- - Note which claims are CENTRAL to the main thesis
- - Note which claims are SUPPORTING facts
-
-3. For each claim:
- - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
- - Assign confidence score (0-100%)
- - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
- - Write brief reasoning (1-3 sentences)
-
-4. Assess relationship between claims and main thesis:
- - Do the claims actually support the article's conclusion?
- - Are there logical leaps or unsupported inferences?
- - Is the article's framing misleading even if individual facts are accurate?
-
-5. Run quality gates:
- - Check: ≥2 sources found
- - Attempt: Basic contradiction search
- - Calculate: Confidence scores
- - Verify: Structural integrity
-
-6. Write context-aware analysis summary (4-6 sentences):
- - State article's main thesis
- - Report claims found and verdict distribution
- - Note if central claims are problematic
- - Assess whether evidence supports conclusion
- - Overall credibility considering claim importance
-
-7. Write article summary (3-5 sentences: neutral summary of article content)
-
 Return as structured JSON with quality gate results.
 {{/code}}
 
 **One prompt generates everything.**
 
-**Critical Addition:**
+---
 
-Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
-
 === 10.4 Technology Stack Suggestions ===
 
 **Frontend:**
@@ -1046,7 +1046,7 @@
 
 **AKEL Integration:**
 * Claude API via Anthropic SDK
-* Model: Provider-default REASONING model or latest available
+* Model: Claude Sonnet 4.5 or latest available
 
 **Database:**
 * None (stateless acceptable)
@@ -1057,6 +1057,8 @@
 * Local development environment sufficient for POC
 * Optional: Deploy to cloud for remote demos
 
+---
+
 == 11. Success Criteria ==
 
 === 11.1 Minimum Success (POC Passes) ===
@@ -1070,9 +1070,6 @@
 * ✅ Team/advisors understand the output
 * ✅ Team agrees approach has merit
 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
-* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
-* ✅ **Cost scaling understood** (data collected on article length vs. cost)
-* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
 
 **Quality Definition:**
 * "Reasonable verdict" = Defensible given general knowledge
@@ -1079,6 +1079,8 @@
 * "Coherent summary" = Logically structured, grammatically correct
 * "Comprehensible" = Reviewers understand what analysis means
 
+---
+
 === 11.2 POC Fails If ===
 
 **Automatic NO-GO if any of these:**
@@ -1088,6 +1088,8 @@
 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
 * ❌ Team loses confidence in AI-automated approach
 
+---
+
 === 11.3 Quality Thresholds ===
 
 **POC quality expectations:**
@@ -1113,6 +1113,8 @@
 * Understandable reasoning
 * Useful output
 
+---
+
 == 12. Test Cases ==
 
 === 12.1 Test Case 1: Simple Factual Claim ===
@@ -1128,6 +1128,8 @@
 
 **Success:** Verdict is reasonable and reasoning makes sense
 
+---
+
 === 12.2 Test Case 2: Complex News Article ===
 
 **Input:** News article URL with multiple claims about politics/health/science
@@ -1141,6 +1141,8 @@
 
 **Success:** Claims identified are actually from article, verdicts are reasonable
 
+---
+
 === 12.3 Test Case 3: Controversial Topic ===
 
 **Input:** Article on contested political or scientific topic
@@ -1153,6 +1153,8 @@
 
 **Success:** Analysis is fair and doesn't show obvious bias
 
+---
+
 === 12.4 Test Case 4: Clearly False Claim ===
 
 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
@@ -1166,6 +1166,8 @@
 
 **Success:** AI correctly identifies false claim with high confidence
 
+---
+
 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
 
 **Input:** Article with claim where evidence is genuinely mixed
@@ -1178,6 +1178,8 @@
 
 **Success:** AI recognizes uncertainty and doesn't overstate confidence
 
+---
+
 === 12.6 Test Case 6: High-Risk Medical Claim ===
 
 **Input:** Article making medical claims
@@ -1191,6 +1191,8 @@
 
 **Success:** Risk tier correctly assigned, appropriate warnings shown
 
+---
+
 == 13. POC Decision Gate ==
 
 === 13.1 Decision Framework ===
@@ -1213,6 +1213,8 @@
 * Expand to Evidence Model structure
 * Test with more complex articles
 
+---
+
 **Option B: NO-GO (Pivot or Stop)**
 
 **Conditions:**
@@ -1226,6 +1226,8 @@
 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
 * **Stop:** Conclude approach not viable, revisit later
 
+---
+
 **Option C: ITERATE (Improve POC)**
 
 **Conditions:**
@@ -1240,33 +1240,39 @@
 * Re-run POC with improvements
 * Then make GO/NO-GO decision
 
+---
+
 === 13.2 Decision Criteria Summary ===
 
 {{code}}
-AI Quality < 60% → NO-GO (approach doesn't work)
+AI Quality < 60% → NO-GO (approach doesn't work)
 AI Quality 60-70% → ITERATE (improve and retry)
-AI Quality ≥70% → GO (proceed to POC2)
+AI Quality ≥70% → GO (proceed to POC2)
 {{/code}}
 
+---
+
 == 14. Key Risks & Mitigations ==
 
 === 14.1 Risk: AI Quality Not Good Enough ===
 
-**Likelihood:** Medium-High
-**Impact:** POC fails
+**Likelihood:** Medium-High
+**Impact:** POC fails
 
 **Mitigation:**
 * Extensive prompt engineering and testing
-* Use best available AI models (role-based selection; configured via LLM abstraction)
+* Use best available AI models (Sonnet 4.5)
 * Test with diverse article types
 * Iterate on prompts based on results
 
 **Acceptance:** This is what POC tests - be ready for failure
 
+---
+
 === 14.2 Risk: AI Consistency Issues ===
 
-**Likelihood:** Medium
-**Impact:** Works sometimes, fails other times
+**Likelihood:** Medium
+**Impact:** Works sometimes, fails other times
 
 **Mitigation:**
 * Test with 10+ diverse articles
@@ -1275,10 +1275,12 @@
 
 **Acceptance:** Some variability OK if average quality ≥70%
 
+---
+
 === 14.3 Risk: Output Incomprehensible ===
 
-**Likelihood:** Low-Medium
-**Impact:** Users can't understand analysis
+**Likelihood:** Low-Medium
+**Impact:** Users can't understand analysis
 
 **Mitigation:**
 * Create clear explainer document
@@ -1288,10 +1288,12 @@
 
 **Acceptance:** Iterate until comprehensible
 
+---
+
 === 14.4 Risk: API Rate Limits / Costs ===
 
-**Likelihood:** Low
-**Impact:** System slow or expensive
+**Likelihood:** Low
+**Impact:** System slow or expensive
 
 **Mitigation:**
 * Monitor API usage
@@ -1300,10 +1300,12 @@
 
 **Acceptance:** POC can be slow and expensive (optimization later)
 
+---
+
 === 14.5 Risk: Scope Creep ===
 
-**Likelihood:** Medium
-**Impact:** POC becomes too complex
+**Likelihood:** Medium
+**Impact:** POC becomes too complex
 
 **Mitigation:**
 * Strict scope discipline
@@ -1312,6 +1312,8 @@
 
 **Acceptance:** POC is minimal by design
 
+---
+
 == 15. POC Philosophy ==
 
 === 15.1 Core Principles ===
@@ -1343,21 +1343,27 @@
 * Document failures openly
 * Make data-driven decisions
 
+---
+
 === 15.2 What POC Is ===
 
-✅ Testing AI capability without humans
-✅ Proving core technical concept
-✅ Fast validation of approach
-✅ Honest assessment of feasibility
+✅ Testing AI capability without humans
+✅ Proving core technical concept
+✅ Fast validation of approach
+✅ Honest assessment of feasibility
 
+---
+
 === 15.3 What POC Is NOT ===
 
-❌ Building a product
-❌ Production-ready system
-❌ Feature-complete platform
-❌ Perfectly accurate analysis
-❌ Polished user experience
+❌ Building a product
+❌ Production-ready system
+❌ Feature-complete platform
+❌ Perfectly accurate analysis
+❌ Polished user experience
 
+---
+
 == 16. Success = Clear Path Forward ==
 
 **If POC succeeds (≥70% AI quality):**
Specification.Requirements.GapAnalysis]]1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]] 1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]] 1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]] 1383 1383 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1384 1384 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1385 1385 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] 1386 1386 1356 +--- 1357 + 1387 1387 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1388 1388 1389 - 1390 -=== NFR-POC-11: LLM Provider Abstraction (POC1) === 1391 - 1392 -**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers. 1393 - 1394 -**POC1 Implementation:** 1395 - 1396 -* **Primary Provider:** Anthropic Claude API 1397 - * Stage 1: Provider-default FAST model 1398 - * Stage 2: Provider-default REASONING model (cached) 1399 - * Stage 3: Provider-default REASONING model 1400 - 1401 -* **Provider Interface:** Abstract LLMProvider interface implemented 1402 - 1403 -* **Configuration:** Environment variables for provider selection 1404 - * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1405 - * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1406 - * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1407 - 1408 -* **Failover:** Basic error handling with cache fallback for Stage 2 1409 - 1410 -* **Cost Tracking:** Log provider name and cost per request 1411 - 1412 -**Future (POC2/Beta):** 1413 - 1414 -* Secondary provider (OpenAI) with automatic failover 1415 -* Admin API for runtime provider switching 1416 -* Cost comparison dashboard 1417 -* Cross-provider output verification 1418 - 1419 -**Success Criteria:** 1420 - 1421 -* All LLM calls go through abstraction layer (no direct API calls) 1422 -* Provider can be changed via environment variable without code changes 1423 -* Cost tracking includes provider name in 
logs 1424 -* Stage 2 falls back to cache on provider failure 1425 - 1426 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6 1427 - 1428 -**Dependencies:** 1429 -* NFR-14 (Main Requirements) 1430 -* Design Decision 9 1431 -* Architecture Section 2.2 1432 - 1433 -**Priority:** HIGH (P1) 1434 - 1435 -**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later. 1436 - 1437 -