Last modified by Robert Schaub on 2026/02/08 08:26

From version 1.1
edited by Robert Schaub
on 2025/12/19 16:13
Change comment: Imported from XAR
To version 2.1
edited by Robert Schaub
on 2025/12/24 21:53
Change comment: Imported from XAR

Details

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements
1 +POC Requirements (POC1 & POC2)
Content
... ... @@ -1,11 +1,18 @@
1 1  = POC Requirements =
2 2  
3 -**Status:** ✅ Approved for Development
4 -**Version:** 2.0 (Updated after Specification Cross-Check)
5 -**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 ----
4 +{{info}}
5 +**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
8 8  
7 +See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 +{{/info}}
9 +
10 +
11 +
12 +**Status:** ✅ Approved for Development
13 +**Version:** 2.0 (Updated after Specification Cross-Check)
14 +**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15 +
9 9  == 1. POC Overview ==
10 10  
11 11  === 1.1 What POC Tests ===
... ... @@ -26,8 +26,6 @@
26 26  * Perfect accuracy
27 27  * Complete feature set
28 28  
29 ----
30 -
31 31  === 1.2 Scenarios Deferred to POC2 ===
32 32  
33 33  **Intentional Simplification:**
... ... @@ -61,33 +61,65 @@
61 61  Claims → Verdicts (scenarios implicit in reasoning)
62 62  {{/code}}
63 63  
64 ----
65 -
66 66  == 2. POC Output Specification ==
67 67  
68 -=== 2.1 Component 1: ANALYSIS SUMMARY ===
71 +=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
69 69  
70 -**What:** Brief overview of findings
71 -**Length:** 3-5 sentences
72 -**Content:**
73 -* How many claims found
74 -* Distribution of verdicts
75 -* Overall assessment
73 +**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
76 76  
77 -**Example:**
75 +**Length:** 4-6 sentences
76 +
77 +**Content (Required Elements):**
78 +1. **Article's main thesis/claim** - What is the article trying to argue or prove?
79 +2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
80 +3. **Central vs. supporting claims** - Which claims are central to the article's argument?
81 +4. **Relationship assessment** - Do the claims support the article's conclusion?
82 +5. **Overall credibility** - Final assessment considering claim importance
83 +
84 +**Critical Innovation:**
85 +
86 +POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
87 +* State accurate supporting facts but draw unsupported conclusions
88 +* Have one false central claim that invalidates the whole argument
89 +* Misframe accurate information to mislead
90 +
91 +**Good Example (Context-Aware):**
78 78  {{code}}
79 -This article makes 4 claims about coffee's health effects. We found
80 -2 claims are well-supported, 1 is uncertain, and 1 is refuted.
81 -Overall assessment: mostly accurate with some exaggeration.
93 +This article argues that coffee cures cancer based on its antioxidant
94 +content. We analyzed 3 factual claims: 2 about coffee's chemical
95 +properties are well-supported, but the main causal claim is refuted
96 +by current evidence. The article confuses correlation with causation.
97 +Overall assessment: MISLEADING - makes an unsupported medical claim
98 +despite citing some accurate facts.
82 82  {{/code}}
83 83  
84 ----
101 +**Poor Example (Simple Aggregation - Don't Do This):**
102 +{{code}}
103 +This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 +Overall assessment: mostly accurate (67% accurate).
105 +{{/code}}
106 +↑ This misses that the refuted claim IS the article's main point!
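
The contrast between the two examples can be sketched in code. This is a minimal illustration under stated assumptions, not the POC's method: the `Claim` type, the `is_central` flag, the `UNRELIABLE` label, and the rule that a refuted central claim makes the article MISLEADING are all introduced here for illustration, as are the two supporting claim texts.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str      # one of the four verdict labels used in this spec
    is_central: bool  # hypothetical flag: is the claim central to the thesis?


SUPPORTED = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}


def simple_average(claims):
    """Naive aggregation: share of claims that are at least partially supported."""
    return sum(c.verdict in SUPPORTED for c in claims) / len(claims)


def context_aware(claims):
    """Illustrative rule: a refuted central claim outweighs accurate supporting facts."""
    if any(c.is_central and c.verdict == "REFUTED" for c in claims):
        return "MISLEADING"
    return "MOSTLY ACCURATE" if simple_average(claims) >= 0.5 else "UNRELIABLE"


# Illustrative claims modelled on the coffee example above.
claims = [
    Claim("Coffee contains antioxidants", "WELL-SUPPORTED", False),
    Claim("Antioxidants neutralize free radicals", "WELL-SUPPORTED", False),
    Claim("Coffee cures cancer", "REFUTED", True),
]

print(f"Simple average: {simple_average(claims):.0%}")
print(f"Context-aware:  {context_aware(claims)}")
```

The naive score reports two of three claims as supported, while the context-aware rule flags the article because the one refuted claim is its thesis.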
85 85  
108 +**What POC1 Tests:**
109 +
110 +Can AI identify and assess:
111 +* ✅ The article's main thesis/conclusion?
112 +* ✅ Which claims are central vs. supporting?
113 +* ✅ Whether the evidence supports the conclusion?
114 +* ✅ Overall credibility considering logical structure?
115 +
116 +**If AI Cannot Do This:**
117 +
118 +That's valuable to learn in POC1! We'll:
119 +* Note it as a limitation
120 +* Fall back to simple aggregation with warning
121 +* Design explicit article-level analysis for POC2
122 +
86 86  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
87 87  
88 -**What:** List of factual claims extracted from article
89 -**Format:** Numbered list
90 -**Quantity:** 3-5 claims
125 +**What:** List of factual claims extracted from article
126 +**Format:** Numbered list
127 +**Quantity:** 3-5 claims
91 91  **Requirements:**
92 92  * Factual claims only (not opinions/questions)
93 93  * Clearly stated
... ... @@ -103,12 +103,10 @@
103 103  [4] Coffee prevents Alzheimer's completely
104 104  {{/code}}
105 105  
106 ----
107 -
108 108  === 2.3 Component 3: CLAIMS VERDICTS ===
109 109  
110 -**What:** Verdict for each claim identified
111 -**Format:** Per claim structure
145 +**What:** Verdict for each claim identified
146 +**Format:** Per claim structure
112 112  
113 113  **Required Elements:**
114 114  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
... ... @@ -135,17 +135,15 @@
135 135  
136 136  **Risk Tier Display:**
137 137  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
173 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 139  * **Tier C (Green):** Low Risk - Facts/Definitions/History
140 140  
141 141  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
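
The tier table above maps naturally onto a small lookup. The following is a sketch only: the `RiskTier` enum and the `risk_label` helper are hypothetical names, and the label wording is an assumption rather than a required display format.

```python
from enum import Enum


class RiskTier(Enum):
    """Risk tiers as listed above: (color, level, example domains)."""
    A = ("Red", "High Risk", "Medical/Legal/Safety/Elections")
    B = ("Yellow", "Medium Risk", "Policy/Science/Causality")
    C = ("Green", "Low Risk", "Facts/Definitions/History")


def risk_label(tier: RiskTier) -> str:
    """Build a display label for one tier (hypothetical format)."""
    color, level, domains = tier.value
    return f"[Risk: {tier.name}] {level} ({domains}), shown in {color.lower()}"


for tier in RiskTier:
    print(risk_label(tier))
```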
142 142  
143 ----
144 -
145 145  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146 146  
147 -**What:** Brief summary of original article content
148 -**Length:** 3-5 sentences
180 +**What:** Brief summary of original article content
181 +**Length:** 3-5 sentences
149 149  **Tone:** Neutral (article's position, not FactHarbor's analysis)
150 150  
151 151  **Example:**
... ... @@ -157,17 +157,60 @@
157 157  to disease prevention. Recommends 2-3 cups daily for optimal health.
158 158  {{/code}}
159 159  
160 ----
193 +=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
161 161  
162 -=== 2.5 Total Output Size ===
195 +**What:** LLM usage metrics for cost optimization and scaling decisions
163 163  
164 -**Combined:** ~200-300 words
165 -* Analysis Summary: 50-70 words
197 +**Purpose:**
198 +* Understand cost per analysis
199 +* Identify optimization opportunities
200 +* Project costs at scale
201 +* Inform architecture decisions
202 +
203 +**Display Format:**
204 +{{code}}
205 +USAGE STATISTICS:
206 +• Article: 2,450 words (12,300 characters)
207 +• Input tokens: 15,234
208 +• Output tokens: 892
209 +• Total tokens: 16,126
210 +• Estimated cost: $0.24 USD
211 +• Response time: 8.3 seconds
212 +• Cost per claim: $0.048
213 +• Model: claude-sonnet-4-20250514
214 +{{/code}}
215 +
216 +**Why This Matters:**
217 +
218 +At scale, LLM costs are critical:
219 +* 10,000 articles/month ≈ $200-500/month
220 +* 100,000 articles/month ≈ $2,000-5,000/month
221 +* Cost optimization may reduce expenses by 30-50%
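
The monthly figures above imply a per-article cost of roughly $0.02-0.05 (for example, $200 / 10,000 articles). A minimal sketch of that projection follows; `monthly_cost_usd` is a hypothetical helper, and the per-article range is derived from the list above rather than measured.

```python
def monthly_cost_usd(articles_per_month: int, cost_per_article: float) -> float:
    """Project monthly LLM spend from volume and average cost per article."""
    return articles_per_month * cost_per_article


# Per-article range implied by the figures above (an inference, not a measurement).
LOW, HIGH = 0.02, 0.05

for volume in (10_000, 100_000):
    low = monthly_cost_usd(volume, LOW)
    high = monthly_cost_usd(volume, HIGH)
    print(f"{volume:,} articles/month: ${low:,.0f}-${high:,.0f}")
```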
222 +
223 +**What POC1 Learns:**
224 +* How cost scales with article length
225 +* Prompt optimization opportunities (caching, compression)
226 +* Output verbosity tradeoffs
227 +* Model selection strategy (FAST vs. REASONING roles)
228 +* Article length limits (if needed)
229 +
230 +**Implementation:**
231 +* Claude API already returns usage data
232 +* No extra API calls needed
233 +* Display to user + log for aggregate analysis
234 +* Test with articles of varying lengths
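
As a sketch of how the displayed statistics could be derived from the token counts the API returns: the `Usage` type and `estimate_cost_usd` are hypothetical names, the per-million-token rates are placeholders rather than quoted pricing (so the result need not match the sample display), and the count of 5 claims is inferred from the sample's cost and cost-per-claim figures.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    claim_count: int


def estimate_cost_usd(usage: Usage, in_rate: float, out_rate: float) -> float:
    """Cost in USD, given USD-per-million-token rates for input and output."""
    return (usage.input_tokens * in_rate + usage.output_tokens * out_rate) / 1_000_000


# Token counts from the display example; the rates are PLACEHOLDERS, not real pricing.
usage = Usage(input_tokens=15_234, output_tokens=892, claim_count=5)
cost = estimate_cost_usd(usage, in_rate=3.0, out_rate=15.0)
total_tokens = usage.input_tokens + usage.output_tokens
cost_per_claim = cost / usage.claim_count

print(f"Total tokens: {total_tokens:,}")
print(f"Estimated cost: ${cost:.3f} USD")
print(f"Cost per claim: ${cost_per_claim:.4f}")
```

Keeping the rates as parameters means a pricing change or a model switch only touches configuration, which fits the LLM abstraction layer mentioned at the top of the page.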
235 +
236 +**Critical for GO/NO-GO:** Unit economics must be viable at scale!
237 +
238 +=== 2.6 Total Output Size ===
239 +
240 +**Combined:** ~220-350 words
241 +* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
166 166  * Claims Identification: 30-50 words
167 167  * Claims Verdicts: 100-150 words
168 168  * Article Summary: 30-50 words (optional)
169 169  
170 ----
246 +**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
171 171  
172 172  == 3. What's NOT in POC Scope ==
173 173  
... ... @@ -215,8 +215,6 @@
215 215  * ❌ Analytics
216 216  * ❌ A/B testing
217 217  
218 ----
219 -
220 220  == 4. POC Simplifications vs. Full System ==
221 221  
222 222  === 4.1 Architecture Comparison ===
... ... @@ -224,7 +224,7 @@
224 224  **POC Architecture (Simplified):**
225 225  {{code}}
226 226  User Input → Single AKEL Call → Output Display
227 - (all processing)
301 + (all processing)
228 228  {{/code}}
229 229  
230 230  **Full System Architecture:**
... ... @@ -245,8 +245,6 @@
245 245  |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 246  |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247 247  
248 ----
249 -
250 250  === 4.2 Workflow Comparison ===
251 251  
252 252  **POC1 Workflow:**
... ... @@ -264,8 +264,6 @@
264 264  6. **Time Evolution** (versioning, re-evaluation triggers)
265 265  **Total: 6 phases with quality gates, ~10-30 seconds**
266 266  
267 ----
268 -
269 269  === 4.3 Why POC is Simplified ===
270 270  
271 271  **Engineering Rationale:**
... ... @@ -284,8 +284,6 @@
284 284  * ❌ POC doesn't validate scale (test in Beta)
285 285  * ❌ POC doesn't validate scenario architecture (design in POC2)
286 286  
287 ----
288 -
289 289  === 4.4 Gap Between POC1 and POC2/Beta ===
290 290  
291 291  **What needs to be built for POC2:**
... ... @@ -305,8 +305,6 @@
305 305  
306 306  **POC1 → POC2 is significant architectural expansion.**
307 307  
308 ----
309 -
310 310  == 5. Publication Mode & Labeling ==
311 311  
312 312  === 5.1 POC Publication Mode ===
... ... @@ -320,22 +320,20 @@
320 320  * All quality gates active (simplified)
321 321  * Risk tier classification shown (demo)
322 322  
323 ----
324 -
325 325  === 5.2 User-Facing Labels ===
326 326  
327 327  **Primary Label (top of analysis):**
328 328  {{code}}
329 329  ╔════════════════════════════════════════════════════════════╗
330 -║ [AI-GENERATED - POC/DEMO]
331 -║
332 -║ This analysis was produced entirely by AI and has not
333 -║ been human-reviewed. Use for demonstration purposes.
334 -║
335 -║ Source: AI/AKEL v1.0 (POC)
336 -║ Review Status: Not Reviewed (Proof-of-Concept)
337 -║ Quality Gates: 4/4 Passed (Simplified)
338 -║ Last Updated: [timestamp]
394 +║ [AI-GENERATED - POC/DEMO] ║
395 +║ ║
396 +║ This analysis was produced entirely by AI and has not ║
397 +║ been human-reviewed. Use for demonstration purposes. ║
398 +║ ║
399 +║ Source: AI/AKEL v1.0 (POC) ║
400 +║ Review Status: Not Reviewed (Proof-of-Concept) ║
401 +║ Quality Gates: 4/4 Passed (Simplified) ║
402 +║ Last Updated: [timestamp] ║
339 339  ╚════════════════════════════════════════════════════════════╝
340 340  {{/code}}
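
A label box like the one above is easier to keep aligned when it is generated rather than typed. The following sketch assumes an inner width of 60 characters; `render_label` is a hypothetical helper, and lines longer than the inner width are not wrapped.

```python
def render_label(lines, width=60):
    """Draw a box around `lines`, padding each so the right border aligns."""
    top = "╔" + "═" * width + "╗"
    bottom = "╚" + "═" * width + "╝"
    # One space of padding on each side, so text is padded to width - 2.
    body = ["║ " + line.ljust(width - 2) + " ║" for line in lines]
    return "\n".join([top, *body, bottom])


print(render_label([
    "[AI-GENERATED - POC/DEMO]",
    "",
    "Review Status: Not Reviewed (Proof-of-Concept)",
    "Last Updated: [timestamp]",
]))
```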
341 341  
... ... @@ -344,8 +344,6 @@
344 344  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 345  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346 346  
347 ----
348 -
349 349  === 5.3 Display Requirements ===
350 350  
351 351  **Must Show:**
... ... @@ -363,8 +363,6 @@
363 363  * Authoritative verdicts
364 364  * Complete accuracy
365 365  
366 ----
367 -
368 368  === 5.4 Mode 2 vs. Full System Publication ===
369 369  
370 370  |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
... ... @@ -375,8 +375,6 @@
375 375  |Risk Display|Demo only|Workflow-integrated|Validated
376 376  |User Actions|View only|Flag for review|Trust rating
377 377  
378 ----
379 -
380 380  == 6. Quality Gates (Simplified Implementation) ==
381 381  
382 382  === 6.1 Overview ===
... ... @@ -395,8 +395,6 @@
395 395  * Failures displayed to user (not blocking)
396 396  * Full system has comprehensive validation
397 397  
398 ----
399 -
400 400  === 6.2 Gate 1: Source Quality (Basic) ===
401 401  
402 402  **Full System Requirements:**
... ... @@ -417,8 +417,6 @@
417 417  
418 418  **Failure Handling:** Display error message, don't generate verdict
419 419  
420 ----
421 -
422 422  === 6.3 Gate 2: Contradiction Search (Basic) ===
423 423  
424 424  **Full System Requirements:**
... ... @@ -441,8 +441,6 @@
441 441  
442 442  **Failure Handling:** Note "limited contradiction search" in output
443 443  
444 ----
445 -
446 446  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
447 447  
448 448  **Full System Requirements:**
... ... @@ -463,8 +463,6 @@
463 463  
464 464  **Failure Handling:** Show "Confidence: Unknown" if calculation fails
465 465  
466 ----
467 -
468 468  === 6.5 Gate 4: Structural Integrity (Basic) ===
469 469  
470 470  **Full System Requirements:**
... ... @@ -485,8 +485,6 @@
485 485  
486 486  **Failure Handling:** Display error message
487 487  
488 ----
489 -
490 490  === 6.6 Quality Gate Display ===
491 491  
492 492  **POC shows simplified status:**
... ... @@ -509,8 +509,6 @@
509 509  Note: This analysis has limited evidence. Use with caution.
510 510  {{/code}}
511 511  
512 ----
513 -
514 514  === 6.7 Simplified vs. Full System ===
515 515  
516 516  |=Gate|=POC (Simplified)|=Full System
... ... @@ -521,14 +521,12 @@
521 521  
522 522  **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523 523  
524 ----
525 -
526 526  == 7. AKEL Architecture Comparison ==
527 527  
528 528  === 7.1 POC AKEL (Simplified) ===
529 529  
530 530  **Implementation:**
531 -* Single Claude API call (Sonnet 4.5)
575 +* Single provider API call (REASONING model)
532 532  * One comprehensive prompt
533 533  * All processing in single request
534 534  * No separate components
... ... @@ -540,10 +540,10 @@
540 540  
541 541  1. Extract 3-5 factual claims
542 542  2. For each claim:
543 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 - - Assign confidence score (0-100%)
545 - - Assign risk tier (A/B/C)
546 - - Write brief reasoning (1-3 sentences)
587 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
588 + - Assign confidence score (0-100%)
589 + - Assign risk tier (A/B/C)
590 + - Write brief reasoning (1-3 sentences)
547 547  3. Generate analysis summary (4-6 sentences)
548 548  4. Generate article summary (3-5 sentences)
549 549  5. Run basic quality checks
... ... @@ -553,8 +553,6 @@
553 553  
554 554  **Processing Time:** 10-18 seconds (estimate)
555 555  
556 ----
557 -
558 558  === 7.2 Full System AKEL (Production) ===
559 559  
560 560  **Architecture:**
... ... @@ -579,8 +579,6 @@
579 579  
580 580  **Processing Time:** 10-30 seconds (full pipeline)
581 581  
582 ----
583 -
584 584  === 7.3 Why POC Uses Single Call ===
585 585  
586 586  **Advantages:**
... ... @@ -603,8 +603,6 @@
603 603  
604 604  Full component architecture comes in Beta after POC validates concept.
605 605  
606 ----
607 -
608 608  === 7.4 Evolution Path ===
609 609  
610 610  **POC1:** Single prompt → Prove concept
... ... @@ -612,8 +612,6 @@
612 612  **Beta:** Multi-component AKEL → Production architecture
613 613  **Release 1.0:** Full AKEL + Federation → Scale
614 614  
615 ----
616 -
617 617  == 8. Functional Requirements ==
618 618  
619 619  === FR-POC-1: Article Input ===
... ... @@ -637,8 +637,6 @@
637 637  * User can paste URL of article
638 638  * System accepts input and triggers analysis
639 639  
640 ----
641 -
642 642  === FR-POC-2: Claim Extraction (Fully Automated) ===
643 643  
644 644  **Requirement:** AI automatically extracts 3-5 factual claims
... ... @@ -666,8 +666,6 @@
666 666  * Claims are clearly stated
667 667  * No manual editing required
668 668  
669 ----
670 -
671 671  === FR-POC-3: Verdict Generation (Fully Automated) ===
672 672  
673 673  **Requirement:** AI automatically generates verdict for each claim
... ... @@ -674,11 +674,11 @@
674 674  
675 675  **Functionality:**
676 676  * For each claim, AI:
677 - * Evaluates claim based on available evidence/knowledge
678 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 - * Assigns confidence score (0-100%)
680 - * Assigns risk tier (A/B/C)
681 - * Writes brief reasoning (1-3 sentences)
709 + * Evaluates claim based on available evidence/knowledge
710 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
711 + * Assigns confidence score (0-100%)
712 + * Assigns risk tier (A/B/C)
713 + * Writes brief reasoning (1-3 sentences)
682 682  * System displays verdict for each claim
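
The per-claim output listed above can be captured in a small validated record. This is a sketch under assumptions: the `ClaimVerdict` type and its field names are hypothetical, and only the value ranges come from the requirement.

```python
from dataclasses import dataclass

VERDICTS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")
RISK_TIERS = ("A", "B", "C")


@dataclass(frozen=True)
class ClaimVerdict:
    claim: str
    verdict: str      # one of VERDICTS
    confidence: int   # percent, 0-100
    risk_tier: str    # one of RISK_TIERS
    reasoning: str    # 1-3 sentences

    def __post_init__(self):
        # Reject values outside the ranges the requirement allows.
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")
```

Validating at construction time surfaces malformed AI output as an explicit error instead of letting it reach the display.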
683 683  
684 684  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -700,8 +700,6 @@
700 700  * Verdict is defensible given reasoning
701 701  * All generated automatically by AI
702 702  
703 ----
704 -
705 705  === FR-POC-4: Analysis Summary (Fully Automated) ===
706 706  
707 707  **Requirement:** AI generates brief summary of analysis
... ... @@ -708,9 +708,9 @@
708 708  
709 709  **Functionality:**
710 710  * AI summarizes findings in 4-6 sentences:
711 - * How many claims found
712 - * Distribution of verdicts
713 - * Overall assessment
741 + * How many claims found
742 + * Distribution of verdicts
743 + * Overall assessment
714 714  * System displays at top of results
715 715  
716 716  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -721,8 +721,6 @@
721 721  * 4-6 sentences
722 722  * Automatically generated
723 723  
724 ----
725 -
726 726  === FR-POC-5: Article Summary (Fully Automated, Optional) ===
727 727  
728 728  **Requirement:** AI generates brief summary of original article
... ... @@ -742,8 +742,6 @@
742 742  * 3-5 sentences
743 743  * Automatically generated
744 744  
745 ----
746 -
747 747  === FR-POC-6: Publication Mode Display ===
748 748  
749 749  **Requirement:** Clear labeling of AI-generated content
... ... @@ -761,8 +761,6 @@
761 761  * Risk tiers are color-coded
762 762  * Quality gate status is visible
763 763  
764 ----
765 -
766 766  === FR-POC-7: Quality Gate Execution ===
767 767  
768 768  **Requirement:** Execute simplified quality gates
... ... @@ -780,8 +780,6 @@
780 780  * Failures explained to user
781 781  * Gates don't block publication (POC mode)
782 782  
783 ----
784 -
785 785  == 9. Non-Functional Requirements ==
786 786  
787 787  === NFR-POC-1: Fully Automated Processing ===
... ... @@ -800,8 +800,8 @@
800 800  **Pipeline:**
801 801  {{code}}
802 802  User Input → AKEL Processing → Output Display
803 -
804 - ZERO human editing
825 + ↓
826 + ZERO human editing
805 805  {{/code}}
806 806  
807 807  **If AI output is poor:**
... ... @@ -815,8 +815,6 @@
815 815  * Validates scalability (humans can't review every analysis)
816 816  * Honest test of technical feasibility
817 817  
818 ----
819 -
820 820  === NFR-POC-2: Performance ===
821 821  
822 822  **Requirement:** Analysis completes in reasonable time
... ... @@ -836,8 +836,6 @@
836 836  * User sees loading indicator
837 837  * No timeout errors
838 838  
839 ----
840 -
841 841  === NFR-POC-3: Reliability ===
842 842  
843 843  **Requirement:** System works for manual testing sessions
... ... @@ -857,8 +857,6 @@
857 857  * Errors are handled gracefully
858 858  * User receives clear error messages
859 859  
860 ----
861 -
862 862  === NFR-POC-4: Environment ===
863 863  
864 864  **Requirement:** Runs on simple infrastructure
... ... @@ -876,8 +876,48 @@
876 876  * Auto-scaling
877 877  * Disaster recovery
878 878  
879 ----
895 +=== NFR-POC-5: Cost Efficiency Tracking ===
880 880  
897 +**Requirement:** Track and display LLM usage metrics to inform optimization decisions
898 +
899 +**Must Track:**
900 +* Input tokens (article + prompt)
901 +* Output tokens (generated analysis)
902 +* Total tokens
903 +* Estimated cost (USD)
904 +* Response time (seconds)
905 +* Article length (words/characters)
906 +
907 +**Must Display:**
908 +* Usage statistics in UI (Component 5)
909 +* Cost per analysis
910 +* Cost per claim extracted
911 +
912 +**Must Log:**
913 +* Aggregate metrics for analysis
914 +* Cost distribution by article length
915 +* Token efficiency trends
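
One way to obtain the cost distribution by article length is to bucket logged records by word count. This is a sketch only: the `(word_count, cost_usd)` record shape, the 1,000-word bucket size, and the sample values are assumptions for illustration, not logged data.

```python
from collections import defaultdict
from statistics import mean


def cost_by_length(records, bucket_words=1000):
    """Group (word_count, cost_usd) records into word-count buckets; average each."""
    buckets = defaultdict(list)
    for words, cost in records:
        bucket_start = (words // bucket_words) * bucket_words
        buckets[bucket_start].append(cost)
    return {start: mean(costs) for start, costs in sorted(buckets.items())}


# Made-up sample records, for illustration only.
sample = [(500, 0.02), (900, 0.04), (2450, 0.06)]
for start, avg in cost_by_length(sample).items():
    print(f"{start}-{start + 999} words: ${avg:.3f} average")
```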
916 +
917 +**Purpose:**
918 +* Understand unit economics
919 +* Identify optimization opportunities
920 +* Project costs at scale
921 +* Inform architecture decisions (caching, model selection, etc.)
922 +
923 +**Acceptance Criteria:**
924 +* ✅ Usage data displayed after each analysis
925 +* ✅ Metrics logged for aggregate analysis
926 +* ✅ Cost calculated accurately (Claude API pricing)
927 +* ✅ Test cases include varying article lengths
928 +* ✅ POC1 report includes cost analysis section
929 +
930 +**Success Target:**
931 +* Average cost per analysis < $0.05 USD
932 +* Cost scaling behavior understood (e.g., linear vs. exponential growth with article length)
933 +* 2+ optimization opportunities identified
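
The two quantitative targets above can be checked mechanically once the POC1 data is collected. In this sketch, `cost_targets_met` is a hypothetical helper; the thresholds are taken from the success target, and whether the scaling behavior is "understood" remains a human judgment that is not modelled here.

```python
from statistics import mean

COST_TARGET_USD = 0.05   # average cost per analysis must stay below this
MIN_OPPORTUNITIES = 2    # at least this many optimization opportunities


def cost_targets_met(costs_usd, opportunities):
    """True if the average cost and the opportunity count both meet the targets."""
    return (mean(costs_usd) < COST_TARGET_USD
            and len(opportunities) >= MIN_OPPORTUNITIES)
```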
934 +
935 +**Critical:** Unit economics must be viable for scaling decision!
936 +
881 881  == 10. Technical Architecture ==
882 882  
883 883  === 10.1 System Components ===
... ... @@ -889,7 +889,7 @@
889 889  
890 890  **Backend:**
891 891  * Single API endpoint
892 -* Calls Claude API (Sonnet 4.5 or latest)
948 +* Calls provider API (REASONING model; configured via LLM abstraction)
893 893  * Parses response
894 894  * Returns JSON to frontend
895 895  
... ... @@ -901,36 +901,32 @@
901 901  * Claude API (Anthropic) - required
902 902  * Optional: URL fetch service for article text extraction
903 903  
904 ----
905 -
906 906  === 10.2 Processing Flow ===
907 907  
908 908  {{code}}
909 909  1. User submits text or URL
910 -
964 + ↓
911 911  2. Backend receives request
912 -
966 + ↓
913 913  3. If URL: Fetch article text
914 -
968 + ↓
915 915  4. Call Claude API with single prompt:
916 - "Extract claims, evaluate each, provide verdicts"
917 -
970 + "Extract claims, evaluate each, provide verdicts"
971 + ↓
918 918  5. Claude API returns:
919 - - Analysis summary
920 - - Claims list
921 - - Verdicts for each claim (with risk tiers)
922 - - Article summary (optional)
923 - - Quality gate results
924 -
973 + - Analysis summary
974 + - Claims list
975 + - Verdicts for each claim (with risk tiers)
976 + - Article summary (optional)
977 + - Quality gate results
978 + ↓
925 925  6. Backend parses response
926 -
980 + ↓
927 927  7. Frontend displays results with Mode 2 labeling
928 928  {{/code}}
929 929  
930 930  **Key Simplification:** Single API call does entire analysis
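
The flow above can be sketched as a single function. This is not the POC backend: `analyze`, `fetch_article`, and `call_llm` are hypothetical names, and the fetcher and LLM client are passed in as plain callables so that no particular SDK call is assumed.

```python
import json
from typing import Callable


def analyze(user_input: str,
            fetch_article: Callable[[str], str],
            call_llm: Callable[[str], str]) -> dict:
    """Steps 2-6 of the flow; the two callables are injected stand-ins."""
    is_url = user_input.startswith(("http://", "https://"))
    # Step 3: fetch the article text only when a URL was submitted.
    article = fetch_article(user_input) if is_url else user_input
    # Step 4: one prompt covers the entire analysis.
    prompt = "Extract claims, evaluate each, provide verdicts\n\n" + article
    # Step 5: a single LLM call returns the whole result as JSON text.
    raw = call_llm(prompt)
    # Step 6: parse; malformed JSON raises json.JSONDecodeError.
    return json.loads(raw)
```

Injecting the callables keeps the flow testable with fakes and leaves the provider choice to the LLM abstraction layer.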
931 931  
932 ----
933 -
934 934  === 10.3 AI Prompt Strategy ===
935 935  
936 936  **Single Comprehensive Prompt:**
... ... @@ -937,27 +937,49 @@
937 937  {{code}}
938 938  Task: Analyze this article and provide:
939 939  
940 -1. Extract 3-5 factual claims from the article
941 -2. For each claim:
942 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 - - Assign confidence score (0-100%)
944 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 - - Write brief reasoning (1-3 sentences)
946 -3. Run quality gates:
947 - - Check: ≥2 sources found
948 - - Attempt: Basic contradiction search
949 - - Calculate: Confidence scores
950 - - Verify: Structural integrity
951 -4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 -5. Write article summary (3-5 sentences: neutral summary of article content)
992 +1. Identify the article's main thesis/conclusion
993 + - What is the article trying to argue or prove?
994 + - What is the primary claim or conclusion?
953 953  
996 +2. Extract 3-5 factual claims from the article
997 + - Note which claims are CENTRAL to the main thesis
998 + - Note which claims are SUPPORTING facts
999 +
1000 +3. For each claim:
1001 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1002 + - Assign confidence score (0-100%)
1003 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1004 + - Write brief reasoning (1-3 sentences)
1005 +
1006 +4. Assess relationship between claims and main thesis:
1007 + - Do the claims actually support the article's conclusion?
1008 + - Are there logical leaps or unsupported inferences?
1009 + - Is the article's framing misleading even if individual facts are accurate?
1010 +
1011 +5. Run quality gates:
1012 + - Check: ≥2 sources found
1013 + - Attempt: Basic contradiction search
1014 + - Calculate: Confidence scores
1015 + - Verify: Structural integrity
1016 +
1017 +6. Write context-aware analysis summary (4-6 sentences):
1018 + - State article's main thesis
1019 + - Report claims found and verdict distribution
1020 + - Note if central claims are problematic
1021 + - Assess whether evidence supports conclusion
1022 + - Overall credibility considering claim importance
1023 +
1024 +7. Write article summary (3-5 sentences: neutral summary of article content)
1025 +
954 954  Return as structured JSON with quality gate results.
955 955  {{/code}}
956 956  
957 957  **One prompt generates everything.**
958 958  
959 ----
1031 +**Critical Addition:**
960 960  
1033 +Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether AI can distinguish an article whose facts are accurate but poorly reasoned from a genuinely credible article.
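
Because the prompt asks for structured JSON, the backend can check the response before displaying it. This is a sketch only: every key name (`main_thesis`, `claims`, `is_central`, `analysis_summary`, `quality_gates`) is hypothetical, since the actual schema is defined in the API specification, and requiring a central claim is an illustrative check rather than a stated requirement.

```python
import json

# Hypothetical key names; the real schema lives in the API specification.
REQUIRED_KEYS = {"main_thesis", "claims", "analysis_summary", "quality_gates"}


def validate_response(raw: str) -> dict:
    """Parse the LLM's JSON reply and check the parts the prompt asked for."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 3 <= len(data["claims"]) <= 5:
        raise ValueError("expected 3-5 claims")
    if not any(claim.get("is_central") for claim in data["claims"]):
        raise ValueError("no claim marked as central to the thesis")
    return data
```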
1034 +
961 961  === 10.4 Technology Stack Suggestions ===
962 962  
963 963  **Frontend:**
... ... @@ -972,7 +972,7 @@
972 972  
973 973  **AKEL Integration:**
974 974  * Claude API via Anthropic SDK
975 -* Model: Claude Sonnet 4.5 or latest available
1049 +* Model: Provider-default REASONING model or latest available
976 976  
977 977  **Database:**
978 978  * None (stateless acceptable)
... ... @@ -983,8 +983,6 @@
983 983  * Local development environment sufficient for POC
984 984  * Optional: Deploy to cloud for remote demos
985 985  
986 ----
987 -
988 988  == 11. Success Criteria ==
989 989  
990 990  === 11.1 Minimum Success (POC Passes) ===
... ... @@ -998,6 +998,9 @@
998 998  * ✅ Team/advisors understand the output
999 999  * ✅ Team agrees approach has merit
1000 1000  * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1073 +* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1074 +* ✅ **Cost scaling understood** (data collected on article length vs. cost)
1075 +* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1001 1001  
1002 1002  **Quality Definition:**
1003 1003  * "Reasonable verdict" = Defensible given general knowledge
... ... @@ -1004,8 +1004,6 @@
1004 1004  * "Coherent summary" = Logically structured, grammatically correct
1005 1005  * "Comprehensible" = Reviewers understand what analysis means
1006 1006  
1007 ----
1008 -
1009 1009  === 11.2 POC Fails If ===
1010 1010  
1011 1011  **Automatic NO-GO if any of these:**
... ... @@ -1015,8 +1015,6 @@
1015 1015  * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 1016  * ❌ Team loses confidence in AI-automated approach
1017 1017  
1018 ----
1019 -
1020 1020  === 11.3 Quality Thresholds ===
1021 1021  
1022 1022  **POC quality expectations:**
... ... @@ -1042,8 +1042,6 @@
1042 1042  * Understandable reasoning
1043 1043  * Useful output
1044 1044  
1045 ----
1046 -
1047 1047  == 12. Test Cases ==
1048 1048  
1049 1049  === 12.1 Test Case 1: Simple Factual Claim ===
... ... @@ -1059,8 +1059,6 @@
1059 1059  
1060 1060  **Success:** Verdict is reasonable and reasoning makes sense
1061 1061  
1062 ----
1063 -
1064 1064  === 12.2 Test Case 2: Complex News Article ===
1065 1065  
1066 1066  **Input:** News article URL with multiple claims about politics/health/science
... ... @@ -1074,8 +1074,6 @@
1074 1074  
1075 1075  **Success:** Claims identified are actually from article, verdicts are reasonable
1076 1076  
1077 ----
1078 -
1079 1079  === 12.3 Test Case 3: Controversial Topic ===
1080 1080  
1081 1081  **Input:** Article on contested political or scientific topic
... ... @@ -1088,8 +1088,6 @@
1088 1088  
1089 1089  **Success:** Analysis is fair and doesn't show obvious bias
1090 1090  
1091 ----
1092 -
1093 1093  === 12.4 Test Case 4: Clearly False Claim ===
1094 1094  
1095 1095  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
... ... @@ -1103,8 +1103,6 @@
1103 1103  
1104 1104  **Success:** AI correctly identifies false claim with high confidence
1105 1105  
1106 ----
1107 -
1108 1108  === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1109 1109  
1110 1110  **Input:** Article with claim where evidence is genuinely mixed
... ... @@ -1117,8 +1117,6 @@
1117 1117  
1118 1118  **Success:** AI recognizes uncertainty and doesn't overstate confidence
1119 1119  
1120 ----
1121 -
1122 1122  === 12.6 Test Case 6: High-Risk Medical Claim ===
1123 1123  
1124 1124  **Input:** Article making medical claims
... ... @@ -1132,8 +1132,6 @@
1132 1132  
1133 1133  **Success:** Risk tier correctly assigned, appropriate warnings shown
1134 1134  
1135 ----
1136 -
1137 1137  == 13. POC Decision Gate ==
1138 1138  
1139 1139  === 13.1 Decision Framework ===
... ... @@ -1156,8 +1156,6 @@
1156 1156  * Expand to Evidence Model structure
1157 1157  * Test with more complex articles
1158 1158  
1159 ----
1160 -
1161 1161  **Option B: NO-GO (Pivot or Stop)**
1162 1162  
1163 1163  **Conditions:**
... ... @@ -1171,8 +1171,6 @@
1171 1171  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1172 1172  * **Stop:** Conclude approach not viable, revisit later
1173 1173  
1174 ----
1175 -
1176 1176  **Option C: ITERATE (Improve POC)**
1177 1177  
1178 1178  **Conditions:**
... ... @@ -1187,39 +1187,33 @@
1187 1187  * Re-run POC with improvements
1188 1188  * Then make GO/NO-GO decision
1189 1189  
1190 ----
1191 -
1192 1192  === 13.2 Decision Criteria Summary ===
1193 1193  
1194 1194  {{code}}
1195 -AI Quality < 60% → NO-GO (approach doesn't work)
1246 +AI Quality < 60% → NO-GO (approach doesn't work)
1196 1196  AI Quality 60-70% → ITERATE (improve and retry)
1197 -AI Quality ≥70% → GO (proceed to POC2)
1248 +AI Quality ≥70% → GO (proceed to POC2)
1198 1198  {{/code}}
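
The thresholds above translate directly into a decision function. In this sketch, `decide` is a hypothetical name, and a score of exactly 70% is treated as GO, following the "≥70%" line.

```python
def decide(ai_quality_pct: float) -> str:
    """Map an AI quality score (percent) to the POC decision."""
    if ai_quality_pct < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality_pct < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2


for score in (55, 65, 70):
    print(f"{score}% -> {decide(score)}")
```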
1199 1199  
1200 ----
1201 -
1202 1202  == 14. Key Risks & Mitigations ==
1203 1203  
1204 1204  === 14.1 Risk: AI Quality Not Good Enough ===
1205 1205  
1206 -**Likelihood:** Medium-High
1207 -**Impact:** POC fails
1255 +**Likelihood:** Medium-High
1256 +**Impact:** POC fails
1208 1208  
1209 1209  **Mitigation:**
1210 1210  * Extensive prompt engineering and testing
1211 -* Use best available AI models (Sonnet 4.5)
1260 +* Use best available AI models (role-based selection; configured via LLM abstraction)
1212 1212  * Test with diverse article types
1213 1213  * Iterate on prompts based on results
1214 1214  
1215 1215  **Acceptance:** This is what POC tests - be ready for failure
1216 1216  
1217 ----
1218 -
1219 1219  === 14.2 Risk: AI Consistency Issues ===
1220 1220  
1221 -**Likelihood:** Medium
1222 -**Impact:** Works sometimes, fails other times
1268 +**Likelihood:** Medium
1269 +**Impact:** Works sometimes, fails other times
1223 1223  
1224 1224  **Mitigation:**
1225 1225  * Test with 10+ diverse articles
... ... @@ -1228,12 +1228,10 @@
1228 1228  
1229 1229  **Acceptance:** Some variability OK if average quality ≥70%
1230 1230  
1231 ----
1232 -
1233 1233  === 14.3 Risk: Output Incomprehensible ===
1234 1234  
1235 -**Likelihood:** Low-Medium
1236 -**Impact:** Users can't understand analysis
1280 +**Likelihood:** Low-Medium
1281 +**Impact:** Users can't understand analysis
1237 1237  
1238 1238  **Mitigation:**
1239 1239  * Create clear explainer document
... ... @@ -1243,12 +1243,10 @@
1243 1243  
1244 1244  **Acceptance:** Iterate until comprehensible
1245 1245  
1246 ----
1247 -
1248 1248  === 14.4 Risk: API Rate Limits / Costs ===
1249 1249  
1250 -**Likelihood:** Low
1251 -**Impact:** System slow or expensive
1293 +**Likelihood:** Low
1294 +**Impact:** System slow or expensive
1252 1252  
1253 1253  **Mitigation:**
1254 1254  * Monitor API usage
... ... @@ -1257,12 +1257,10 @@
1257 1257  
1258 1258  **Acceptance:** POC can be slow and expensive (optimization later)
1259 1259  
1260 ----
1261 -
1262 1262  === 14.5 Risk: Scope Creep ===
1263 1263  
1264 -**Likelihood:** Medium
1265 -**Impact:** POC becomes too complex
1305 +**Likelihood:** Medium
1306 +**Impact:** POC becomes too complex
1266 1266  
1267 1267  **Mitigation:**
1268 1268  * Strict scope discipline
... ... @@ -1271,8 +1271,6 @@
1271 1271  
1272 1272  **Acceptance:** POC is minimal by design
1273 1273  
1274 ----
1275 -
1276 1276  == 15. POC Philosophy ==
1277 1277  
1278 1278  === 15.1 Core Principles ===
... ... @@ -1304,27 +1304,21 @@
1304 1304  * Document failures openly
1305 1305  * Make data-driven decisions
1306 1306  
1307 ----
1308 -
1309 1309  === 15.2 What POC Is ===
1310 1310  
1311 -✅ Testing AI capability without humans
1312 -✅ Proving core technical concept
1313 -✅ Fast validation of approach
1314 -✅ Honest assessment of feasibility
1348 +✅ Testing AI capability without humans
1349 +✅ Proving core technical concept
1350 +✅ Fast validation of approach
1351 +✅ Honest assessment of feasibility
1315 1315  
1316 ----
1317 -
1318 1318  === 15.3 What POC Is NOT ===
1319 1319  
1320 -❌ Building a product
1321 -❌ Production-ready system
1322 -❌ Feature-complete platform
1323 -❌ Perfectly accurate analysis
1324 -❌ Polished user experience
1355 +❌ Building a product
1356 +❌ Production-ready system
1357 +❌ Feature-complete platform
1358 +❌ Perfectly accurate analysis
1359 +❌ Polished user experience
1325 1325  
1326 ----
1327 -
1328 1328  == 16. Success = Clear Path Forward ==
1329 1329  
1330 1330  **If POC succeeds (≥70% AI quality):**
... ... @@ -1342,18 +1342,63 @@
1342 1342  
1343 1343  **Either way, POC provides clarity.**
1344 1344  
1345 ----
1346 -
1347 1347  == 17. Related Pages ==
1348 1348  
1349 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 -* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 -* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1380 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
1381 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1382 +* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1352 1352  * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 1353  * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 1354  * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355 1355  
1356 ----
1357 -
1358 1358  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359 1359  
1389 +
1390 +=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
1391 +
1392 +**Requirement:** POC1 MUST implement an LLM abstraction layer that supports multiple providers.
1393 +
1394 +**POC1 Implementation:**
1395 +
1396 +* **Primary Provider:** Anthropic Claude API
1397 + * Stage 1: Provider-default FAST model
1398 + * Stage 2: Provider-default REASONING model (cached)
1399 + * Stage 3: Provider-default REASONING model
1400 +
1401 +* **Provider Interface:** Abstract LLMProvider interface implemented
1402 +
1403 +* **Configuration:** Environment variables for provider selection
1404 + * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1405 + * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1406 + * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1407 +
1408 +* **Failover:** Basic error handling with cache fallback for Stage 2
1409 +
1410 +* **Cost Tracking:** Log provider name and cost per request
1411 +
1412 +**Future (POC2/Beta):**
1413 +
1414 +* Secondary provider (OpenAI) with automatic failover
1415 +* Admin API for runtime provider switching
1416 +* Cost comparison dashboard
1417 +* Cross-provider output verification
1418 +
1419 +**Success Criteria:**
1420 +
1421 +* All LLM calls go through abstraction layer (no direct API calls)
1422 +* Provider can be changed via environment variable without code changes
1423 +* Cost tracking includes provider name in logs
1424 +* Stage 2 falls back to cache on provider failure
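
The success criteria above can be sketched in a few lines of Python. This is a hedged illustration, not the POC1 implementation: the `LLMResult` type, the `complete` method, the stand-in providers, the `PROVIDERS` registry, and the dict used as a cache are all assumptions. Only the `LLMProvider` name and the `LLM_PRIMARY_PROVIDER` and `LLM_STAGE2_MODEL` variables come from the requirement itself.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

logger = logging.getLogger("llm")


@dataclass
class LLMResult:
    text: str
    provider: str
    cost_usd: float


class LLMProvider(ABC):
    """Abstract interface; every LLM call goes through a subclass of this."""
    name = "abstract"

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult:
        ...


class EchoProvider(LLMProvider):
    """Stand-in provider for illustration; makes no network calls."""
    name = "echo"

    def complete(self, model, prompt):
        return LLMResult(f"[{model}] {prompt}", self.name, 0.0)


class FailingProvider(LLMProvider):
    """Stand-in provider that simulates an outage."""
    name = "failing"

    def complete(self, model, prompt):
        raise RuntimeError("provider unavailable")


# Hypothetical registry; a real Anthropic-backed class would be added under "anthropic".
PROVIDERS = {"echo": EchoProvider, "failing": FailingProvider}


def get_provider(env=os.environ) -> LLMProvider:
    """Select the provider from the environment, with no code changes needed."""
    key = env.get("LLM_PRIMARY_PROVIDER", "echo")
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider: {key}")
    return PROVIDERS[key]()


def run_stage2(provider, prompt, cache, env=os.environ) -> LLMResult:
    """Run Stage 2; on provider failure, fall back to a cached result if one exists."""
    model = env.get("LLM_STAGE2_MODEL", "default-reasoning")
    try:
        result = provider.complete(model, prompt)
    except Exception:
        if prompt in cache:
            logger.warning("provider=%s failed; serving cached result", provider.name)
            return cache[prompt]
        raise
    # Cost tracking: log the provider name and cost for each request.
    logger.info("provider=%s model=%s cost_usd=%.4f", result.provider, model, result.cost_usd)
    cache[prompt] = result
    return result
```

Keeping provider selection in `get_provider` means call sites never name a vendor, which is what makes the environment-variable switch possible. In this sketch the plain dict stands in for the Redis cache.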
1425 +
1426 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1427 +
1428 +**Dependencies:**
1429 +* NFR-14 (Main Requirements)
1430 +* Design Decision 9
1431 +* Architecture Section 2.2
1432 +
1433 +**Priority:** HIGH (P1)
1434 +
1435 +**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
1436 +
1437 +