Wiki source code of POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/24 20:16

1 = POC Requirements =
2
3
4 {{info}}
5 **POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
6
7 See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 {{/info}}
9
10
11
12 **Status:** ✅ Approved for Development
13 **Version:** 2.0 (Updated after Specification Cross-Check)
14 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15
16 == 1. POC Overview ==
17
18 === 1.1 What POC Tests ===
19
20 **Core Question:**
21
22 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
23
24 **What we're proving:**
25
26 * AI can identify factual claims from text
27 * AI can evaluate those claims and produce verdicts
28 * Output is comprehensible and useful
29 * Fully automated approach is viable
30
31 **What we're NOT testing:**
32
33 * Scenario generation (deferred to POC2)
34 * Evidence display (deferred to POC2)
35 * Production scalability
36 * Perfect accuracy
37 * Complete feature set
38
39 === 1.2 Scenarios Deferred to POC2 ===
40
41 **Intentional Simplification:**
42
43 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
44
45 **Rationale:**
46
47 * **POC1 tests:** Can AI extract claims and generate verdicts?
48 * **POC2 will add:** Scenario generation and management
49 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
50
51 **Design Decision:**
52
53 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
54
55 **No Risk:**
56
57 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58
59 * Faster POC1 validation
60 * Learning from POC1 to inform scenario design
61 * Iterative approach: fail fast if basic AI doesn't work
62 * Flexibility to adjust scenario architecture based on POC1 insights
63
64 **Full System Workflow (Future):**
65 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
66
67 **POC1 Simplified Workflow:**
68 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
69
70 == 2. POC Output Specification ==
71
72 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
73
74 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
75
76 **Length:** 4-6 sentences
77
78 **Content (Required Elements):**
79
80 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
81 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
82 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
83 4. **Relationship assessment** - Do the claims support the article's conclusion?
84 5. **Overall credibility** - Final assessment considering claim importance
85
86 **Critical Innovation:**
87
88 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89
90 * Present accurate supporting facts but draw unsupported conclusions
91 * Have one false central claim that invalidates the whole argument
92 * Misframe accurate information to mislead
93
94 **Good Example (Context-Aware):**
95 {{code}}This article argues that coffee cures cancer based on its antioxidant
96 content. We analyzed 3 factual claims: 2 about coffee's chemical
97 properties are well-supported, but the main causal claim is refuted
98 by current evidence. The article confuses correlation with causation.
99 Overall assessment: MISLEADING - makes an unsupported medical claim
100 despite citing some accurate facts.{{/code}}
101
102 **Poor Example (Simple Aggregation - Don't Do This):**
103 {{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 Overall assessment: mostly accurate (67% accurate).{{/code}}
105 ↑ This misses that the refuted claim IS the article's main point!
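The contrast between the two summaries can be sketched in code. This is a minimal illustration, assuming a hypothetical claim structure with a `central` flag (not part of the POC schema): naive averaging reproduces the poor example's "67% accurate", while a context-aware check lets a refuted central claim dominate.

```python
# Minimal sketch of naive aggregation vs. context-aware assessment.
# The claim dicts and the "central" flag are illustrative assumptions,
# not the POC's actual data model.

def naive_assessment(claims):
    """Simple aggregation: percent of claims that are well-supported."""
    supported = sum(1 for c in claims if c["verdict"] == "WELL-SUPPORTED")
    return f"{round(100 * supported / len(claims))}% accurate"

def context_aware_assessment(claims):
    """A refuted central claim invalidates the article, whatever the average."""
    if any(c["verdict"] == "REFUTED" and c["central"] for c in claims):
        return "MISLEADING - central claim is refuted"
    return naive_assessment(claims)

# The coffee-cures-cancer example: two accurate supporting facts,
# one refuted central claim.
claims = [
    {"verdict": "WELL-SUPPORTED", "central": False},
    {"verdict": "WELL-SUPPORTED", "central": False},
    {"verdict": "REFUTED", "central": True},
]

print(naive_assessment(claims))          # the poor example: 67% accurate
print(context_aware_assessment(claims))  # MISLEADING - central claim is refuted
```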
106
107 **What POC1 Tests:**
108
109 Can AI identify and assess:
110
111 * ✅ The article's main thesis/conclusion?
112 * ✅ Which claims are central vs. supporting?
113 * ✅ Whether the evidence supports the conclusion?
114 * ✅ Overall credibility considering logical structure?
115
116 **If AI Cannot Do This:**
117
118 That's valuable to learn in POC1! We'll:
119
120 * Note as limitation
121 * Fall back to simple aggregation with warning
122 * Design explicit article-level analysis for POC2
123
124 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
125
126 **What:** List of factual claims extracted from article
127 **Format:** Numbered list
128 **Quantity:** 3-5 claims
129 **Requirements:**
130
131 * Factual claims only (not opinions/questions)
132 * Clearly stated
133 * Automatically extracted by AI
134
135 **Example:**
136 {{code}}CLAIMS IDENTIFIED:
137
138 [1] Coffee reduces diabetes risk by 30%
139 [2] Coffee improves heart health
140 [3] Decaf has same benefits as regular
141 [4] Coffee prevents Alzheimer's completely{{/code}}
142
143 === 2.3 Component 3: CLAIMS VERDICTS ===
144
145 **What:** Verdict for each claim identified
146 **Format:** Per claim structure
147
148 **Required Elements:**
149
150 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
151 * **Confidence Score:** 0-100%
152 * **Brief Reasoning:** 1-3 sentences explaining why
153 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
154
155 **Example:**
156 {{code}}VERDICTS:
157
158 [1] WELL-SUPPORTED (85%) [Risk: C]
159 Multiple studies confirm 25-30% risk reduction with regular consumption.
160
161 [2] UNCERTAIN (65%) [Risk: B]
162 Evidence is mixed. Some studies show benefits, others show no effect.
163
164 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
165 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166
167 [4] REFUTED (90%) [Risk: B]
168 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
169
170 **Risk Tier Display:**
171
172 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 * **Tier C (Green):** Low Risk - Facts/Definitions/History
175
176 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
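The per-claim structure above can be modeled as a small record type. A sketch in Python; the class and field names are assumptions for illustration, constrained to the verdict labels and risk tiers defined in this section:

```python
# Sketch of a per-claim verdict record. Class and field names are assumptions,
# not the POC's actual schema; the enum values mirror this section's labels.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"

class RiskTier(Enum):
    A = "High Risk - Medical/Legal/Safety/Elections"
    B = "Medium Risk - Policy/Science/Causality"
    C = "Low Risk - Facts/Definitions/History"

@dataclass
class ClaimVerdict:
    claim: str
    verdict: Verdict
    confidence: int      # 0-100
    risk_tier: RiskTier
    reasoning: str       # 1-3 sentences

    def render(self) -> str:
        """Format one entry as shown in the VERDICTS example above."""
        return (f"{self.verdict.value} ({self.confidence}%) "
                f"[Risk: {self.risk_tier.name}]\n{self.reasoning}")

v = ClaimVerdict("Coffee reduces diabetes risk by 30%", Verdict.WELL_SUPPORTED,
                 85, RiskTier.C, "Multiple studies confirm 25-30% risk reduction.")
print(v.render())
```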
177
178 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
179
180 **What:** Brief summary of original article content
181 **Length:** 3-5 sentences
182 **Tone:** Neutral (article's position, not FactHarbor's analysis)
183
184 **Example:**
185 {{code}}ARTICLE SUMMARY:
186
187 Health News Today article discusses coffee benefits, citing studies
188 on diabetes and Alzheimer's. Author highlights research linking coffee
189 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
190
191 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
192
193 **What:** LLM usage metrics for cost optimization and scaling decisions
194
195 **Purpose:**
196
197 * Understand cost per analysis
198 * Identify optimization opportunities
199 * Project costs at scale
200 * Inform architecture decisions
201
202 **Display Format:**
203 {{code}}USAGE STATISTICS:
204 • Article: 2,450 words (12,300 characters)
205 • Input tokens: 15,234
206 • Output tokens: 892
207 • Total tokens: 16,126
208 • Estimated cost: $0.24 USD
209 • Response time: 8.3 seconds
210 • Cost per claim: $0.048
211 • Model: claude-sonnet-4-20250514{{/code}}
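The displayed metrics can be derived directly from the token counts the API returns. A sketch of the calculation; the per-million-token rates are placeholder assumptions (actual Anthropic pricing varies by model and over time, so they must be checked before use):

```python
# Sketch: derive the displayed cost metrics from token counts.
# The per-million-token rates below are ASSUMED for illustration only;
# check current Anthropic pricing before relying on them.

INPUT_RATE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_RATE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def usage_stats(input_tokens: int, output_tokens: int, n_claims: int) -> dict:
    cost = (input_tokens * INPUT_RATE_PER_MTOK
            + output_tokens * OUTPUT_RATE_PER_MTOK) / 1_000_000
    return {
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 2),
        "cost_per_claim_usd": round(cost / n_claims, 3),
    }

# Token counts from the display example above; the resulting cost depends
# entirely on the assumed rates.
stats = usage_stats(input_tokens=15_234, output_tokens=892, n_claims=5)
print(stats)
```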
212
213 **Why This Matters:**
214
215 At scale, LLM costs are critical:
216
217 * 10,000 articles/month ≈ $200-500/month
218 * 100,000 articles/month ≈ $2,000-5,000/month
219 * Cost optimization can reduce expenses 30-50%
220
221 **What POC1 Learns:**
222
223 * How cost scales with article length
224 * Prompt optimization opportunities (caching, compression)
225 * Output verbosity tradeoffs
226 * Model selection strategy (Sonnet vs. Haiku)
227 * Article length limits (if needed)
228
229 **Implementation:**
230
231 * Claude API already returns usage data
232 * No extra API calls needed
233 * Display to user + log for aggregate analysis
234 * Test with articles of varying lengths
235
236 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
237
238 === 2.6 Total Output Size ===
239
240 **Combined:** 220-340 words
241
242 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
243 * Claims Identification: 30-50 words
244 * Claims Verdicts: 100-150 words
245 * Article Summary: 30-50 words (optional)
246
247 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
248
249 == 3. What's NOT in POC Scope ==
250
251 === 3.1 Feature Exclusions ===
252
253 The following are **explicitly excluded** from POC:
254
255 **Content Features:**
256
257 * ❌ Scenarios (deferred to POC2)
258 * ❌ Evidence display (supporting/opposing lists)
259 * ❌ Source links (clickable references)
260 * ❌ Detailed reasoning chains
261 * ❌ Source quality ratings (shown but not detailed)
262 * ❌ Contradiction detection (basic only)
263 * ❌ Risk assessment (shown but not workflow-integrated)
264
265 **Platform Features:**
266
267 * ❌ User accounts / authentication
268 * ❌ Saved history
269 * ❌ Search functionality
270 * ❌ Claim comparison
271 * ❌ User contributions
272 * ❌ Commenting system
273 * ❌ Social sharing
274
275 **Technical Features:**
276
277 * ❌ Browser extensions
278 * ❌ Mobile apps
279 * ❌ API endpoints
280 * ❌ Webhooks
281 * ❌ Export features (PDF, CSV)
282
283 **Quality Features:**
284
285 * ❌ Accessibility (WCAG compliance)
286 * ❌ Multilingual support
287 * ❌ Mobile optimization
288 * ❌ Media verification (images/videos)
289
290 **Production Features:**
291
292 * ❌ Security hardening
293 * ❌ Privacy compliance (GDPR)
294 * ❌ Terms of service
295 * ❌ Monitoring/logging
296 * ❌ Error tracking
297 * ❌ Analytics
298 * ❌ A/B testing
299
300 == 4. POC Simplifications vs. Full System ==
301
302 === 4.1 Architecture Comparison ===
303
304 **POC Architecture (Simplified):**
305 {{code}}User Input → Single AKEL Call → Output Display
306 (all processing){{/code}}
307
308 **Full System Architecture:**
309 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
310 → Evidence Summarizer → Contradiction Detector → Verdict Generator
311 → Quality Gates → Publication → Output Display{{/code}}
312
313 **Key Differences:**
314
315 |=Aspect|=POC1|=Full System
316 |Processing|Single API call|Multi-component pipeline
317 |Scenarios|None (implicit)|Explicit entities with versioning
318 |Evidence|Basic retrieval|Comprehensive with quality scoring
319 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
320 |Workflow|3 steps (input/process/output)|6 phases with gates
321 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
322 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
323
324 === 4.2 Workflow Comparison ===
325
326 **POC1 Workflow:**
327
328 1. User submits text/URL
329 2. Single AKEL call (all processing in one prompt)
330 3. Display results
331 **Total: 3 steps, 10-18 seconds**
332
333 **Full System Workflow:**
334
335 1. **Claim Submission** (extraction, normalization, clustering)
336 2. **Scenario Building** (definitions, assumptions, boundaries)
337 3. **Evidence Handling** (retrieval, assessment, linking)
338 4. **Verdict Creation** (synthesis, reasoning, approval)
339 5. **Public Presentation** (summaries, landscapes, deep dives)
340 6. **Time Evolution** (versioning, re-evaluation triggers)
341 **Total: 6 phases with quality gates, 10-30 seconds**
342
343 === 4.3 Why POC is Simplified ===
344
345 **Engineering Rationale:**
346
347 1. **Test core capability first:** Can AI do basic fact-checking without humans?
348 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
349 3. **Learn before building:** POC1 insights inform full architecture
350 4. **Iterative approach:** Add complexity only after validating foundations
351 5. **Resource efficiency:** Don't build full system if core concept fails
352
353 **Acceptable Trade-offs:**
354
355 * ✅ POC proves AI capability (most risky assumption)
356 * ✅ POC validates user comprehension (can people understand output?)
357 * ❌ POC doesn't validate full workflow (test in Beta)
358 * ❌ POC doesn't validate scale (test in Beta)
359 * ❌ POC doesn't validate scenario architecture (design in POC2)
360
361 === 4.4 Gap Between POC1 and POC2/Beta ===
362
363 **What needs to be built for POC2:**
364
365 * Scenario generation component
366 * Evidence Model structure (full)
367 * Scenario-evidence linking
368 * Multi-interpretation comparison
369 * Truth landscape visualization
370
371 **What needs to be built for Beta:**
372
373 * Multi-component AKEL pipeline
374 * Quality gate infrastructure
375 * Review workflow system
376 * Audit sampling framework
377 * Production data model
378 * Federation architecture (Release 1.0)
379
380 **POC1 → POC2 is significant architectural expansion.**
381
382 == 5. Publication Mode & Labeling ==
383
384 === 5.1 POC Publication Mode ===
385
386 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
387
388 Per FactHarbor Specification Section 11 "POC v1 Behavior":
389
390 * Produces public AI-generated output
391 * No human approval gate
392 * Clear AI-Generated labeling
393 * All quality gates active (simplified)
394 * Risk tier classification shown (demo)
395
396 === 5.2 User-Facing Labels ===
397
398 **Primary Label (top of analysis):**
399 {{code}}╔════════════════════════════════════════════════════════════╗
400 ║ [AI-GENERATED - POC/DEMO] ║
401 ║ ║
402 ║ This analysis was produced entirely by AI and has not ║
403 ║ been human-reviewed. Use for demonstration purposes. ║
404 ║ ║
405 ║ Source: AI/AKEL v1.0 (POC) ║
406 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
407 ║ Quality Gates: 4/4 Passed (Simplified) ║
408 ║ Last Updated: [timestamp] ║
409 ╚════════════════════════════════════════════════════════════╝{{/code}}
410
411 **Per-Claim Risk Labels:**
412
413 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
414 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
415 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
416
417 === 5.3 Display Requirements ===
418
419 **Must Show:**
420
421 * AI-Generated status (prominent)
422 * POC/Demo disclaimer
423 * Risk tier per claim
424 * Confidence scores (0-100%)
425 * Quality gate status (passed/failed)
426 * Timestamp
427
428 **Must NOT Claim:**
429
430 * Human review
431 * Production quality
432 * Medical/legal advice
433 * Authoritative verdicts
434 * Complete accuracy
435
436 === 5.4 Mode 2 vs. Full System Publication ===
437
438 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
439 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
440 |Review|None|None|Human-Reviewed
441 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
442 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
443 |Risk Display|Demo only|Workflow-integrated|Validated
444 |User Actions|View only|Flag for review|Trust rating
445
446 == 6. Quality Gates (Simplified Implementation) ==
447
448 === 6.1 Overview ===
449
450 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
451
452 **The 4 Mandatory Gates:**
453
454 1. Source Quality
455 2. Contradiction Search (MANDATORY)
456 3. Uncertainty Quantification
457 4. Structural Integrity
458
459 **POC Implements Simplified Versions:**
460
461 * Focus on demonstrating concept
462 * Basic implementations sufficient
463 * Failures displayed to user (not blocking)
464 * Full system has comprehensive validation
465
466 === 6.2 Gate 1: Source Quality (Basic) ===
467
468 **Full System Requirements:**
469
470 * Primary sources identified and accessible
471 * Source reliability scored against whitelist
472 * Citation completeness verified
473 * Publication dates checked
474 * Author credentials validated
475
476 **POC Implementation:**
477
478 * ✅ At least 2 sources found
479 * ✅ Sources accessible (URLs valid)
480 * ❌ No whitelist checking
481 * ❌ No credential validation
482 * ❌ No comprehensive reliability scoring
483
484 **Pass Criteria:** ≥2 accessible sources found
485
486 **Failure Handling:** Display error message, don't generate verdict
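The pass criterion above reduces to a small check. A sketch using a purely syntactic URL test as a stand-in for real accessibility checking (a full implementation would issue HTTP requests):

```python
# Sketch of the POC source-quality gate: pass if at least 2 sources have
# valid, accessible URLs. The check here is syntactic only; a real
# implementation would verify accessibility with HTTP requests.
from urllib.parse import urlparse

def looks_accessible(url: str) -> bool:
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def source_quality_gate(source_urls: list) -> bool:
    """Pass criteria: >=2 accessible sources found."""
    return sum(looks_accessible(u) for u in source_urls) >= 2

print(source_quality_gate(["https://example.org/study", "not a url"]))         # False
print(source_quality_gate(["https://example.org/a", "http://example.com/b"]))  # True
```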
487
488 === 6.3 Gate 2: Contradiction Search (Basic) ===
489
490 **Full System Requirements:**
491
492 * Counter-evidence actively searched
493 * Reservations and limitations identified
494 * Alternative interpretations explored
495 * Bubble detection (echo chambers, conspiracy theories)
496 * Cross-cultural and international perspectives
497 * Academic literature (supporting AND opposing)
498
499 **POC Implementation:**
500
501 * ✅ Basic search for counter-evidence
502 * ✅ Identify obvious contradictions
503 * ❌ No comprehensive academic search
504 * ❌ No bubble detection
505 * ❌ No systematic alternative interpretation search
506 * ❌ No international perspective verification
507
508 **Pass Criteria:** Basic contradiction search attempted
509
510 **Failure Handling:** Note "limited contradiction search" in output
511
512 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
513
514 **Full System Requirements:**
515
516 * Confidence scores calculated for all claims/verdicts
517 * Limitations explicitly stated
518 * Data gaps identified and disclosed
519 * Strength of evidence assessed
520 * Alternative scenarios considered
521
522 **POC Implementation:**
523
524 * ✅ Confidence scores (0-100%)
525 * ✅ Basic uncertainty acknowledgment
526 * ❌ No detailed limitation disclosure
527 * ❌ No data gap identification
528 * ❌ No alternative scenario consideration (deferred to POC2)
529
530 **Pass Criteria:** Confidence score assigned
531
532 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
533
534 === 6.5 Gate 4: Structural Integrity (Basic) ===
535
536 **Full System Requirements:**
537
538 * No hallucinations detected (fact-checking against sources)
539 * Logic chain valid and traceable
540 * References accessible and verifiable
541 * No circular reasoning
542 * Premises clearly stated
543
544 **POC Implementation:**
545
546 * ✅ Basic coherence check
547 * ✅ References accessible
548 * ❌ No comprehensive hallucination detection
549 * ❌ No formal logic validation
550 * ❌ No premise extraction and verification
551
552 **Pass Criteria:** Output is coherent and references are accessible
553
554 **Failure Handling:** Display error message
555
556 === 6.6 Quality Gate Display ===
557
558 **POC shows simplified status:**
559 {{code}}Quality Gates: 4/4 Passed (Simplified)
560 ✓ Source Quality: 3 sources found
561 ✓ Contradiction Search: Basic search completed
562 ✓ Uncertainty: Confidence scores assigned
563 ✓ Structural Integrity: Output coherent{{/code}}
564
565 **If any gate fails:**
566 {{code}}Quality Gates: 3/4 Passed (Simplified)
567 ✓ Source Quality: 3 sources found
568 ✗ Contradiction Search: Search failed - limited evidence
569 ✓ Uncertainty: Confidence scores assigned
570 ✓ Structural Integrity: Output coherent
571
572 Note: This analysis has limited evidence. Use with caution.{{/code}}
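Both displays above can be produced by one rendering helper. A sketch; the result-tuple structure is an assumption for illustration:

```python
# Sketch: render the quality-gate status block from per-gate results.
# The (name, passed, detail) tuple structure is an assumption.

def render_gates(results) -> str:
    """results: (gate name, passed, detail) tuples in display order."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        lines.append(f"{mark} {name}: {detail}")
    if passed < len(results):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)

status = render_gates([
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
])
print(status)
```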
573
574 === 6.7 Simplified vs. Full System ===
575
576 |=Gate|=POC (Simplified)|=Full System
577 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
578 |Contradiction|Basic search|Systematic academic + media + international
579 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
580 |Structural|Coherence check|Hallucination detection, logic validation, premise check
581
582 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
583
584 == 7. AKEL Architecture Comparison ==
585
586 === 7.1 POC AKEL (Simplified) ===
587
588 **Implementation:**
589
590 * Single Claude API call (Sonnet 4.5)
591 * One comprehensive prompt
592 * All processing in single request
593 * No separate components
594 * No orchestration layer
595
596 **Prompt Structure:**
597 {{code}}Task: Analyze this article and provide:
598
599 1. Extract 3-5 factual claims
600 2. For each claim:
601 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
602 - Assign confidence score (0-100%)
603 - Assign risk tier (A/B/C)
604 - Write brief reasoning (1-3 sentences)
605 3. Generate analysis summary (4-6 sentences)
606 4. Generate article summary (3-5 sentences)
607 5. Run basic quality checks
608
609 Return as structured JSON.{{/code}}
610
611 **Processing Time:** 10-18 seconds (estimate)
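Because the single call returns everything as structured JSON, the backend's main job is validating the response shape before display. A minimal sketch; the key names are assumptions standing in for whatever schema the prompt actually specifies:

```python
# Sketch: minimal validation of the structured-JSON response from the single
# AKEL call. Key names are assumptions, not a fixed schema.
import json

REQUIRED_TOP_LEVEL = {"analysis_summary", "claims", "quality_gates"}
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}

def validate_response(raw: str) -> dict:
    """Parse and sanity-check the model's JSON; raise ValueError on bad shape."""
    data = json.loads(raw)
    missing = REQUIRED_TOP_LEVEL - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 3 <= len(data["claims"]) <= 5:
        raise ValueError("expected 3-5 claims")
    for c in data["claims"]:
        if c["verdict"] not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {c['verdict']}")
        if not 0 <= c["confidence"] <= 100:
            raise ValueError("confidence out of range")
    return data

raw = json.dumps({
    "analysis_summary": "…",
    "claims": [{"text": "Coffee reduces diabetes risk by 30%",
                "verdict": "WELL-SUPPORTED", "confidence": 85,
                "risk_tier": "C"}] * 3,
    "quality_gates": {"passed": 4, "total": 4},
})
print(validate_response(raw)["claims"][0]["verdict"])
```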
612
613 === 7.2 Full System AKEL (Production) ===
614
615 **Architecture:**
616 {{code}}AKEL Orchestrator
617 ├── Claim Extractor
618 ├── Claim Classifier (with risk tier assignment)
619 ├── Scenario Generator
620 ├── Evidence Summarizer
621 ├── Contradiction Detector
622 ├── Quality Gate Validator
623 ├── Audit Sampling Scheduler
624 └── Federation Sync Adapter (Release 1.0+){{/code}}
625
626 **Processing:**
627
628 * Parallel processing where possible
629 * Separate component calls
630 * Quality gates between phases
631 * Audit sampling selection
632 * Cross-node coordination (federated mode)
633
634 **Processing Time:** 10-30 seconds (full pipeline)
635
636 === 7.3 Why POC Uses Single Call ===
637
638 **Advantages:**
639
640 * ✅ Simpler to implement
641 * ✅ Faster POC development
642 * ✅ Easier to debug
643 * ✅ Proves AI capability
644 * ✅ Good enough for concept validation
645
646 **Limitations:**
647
648 * ❌ No component reusability
649 * ❌ No parallel processing
650 * ❌ All-or-nothing (can't partially succeed)
651 * ❌ Harder to improve individual components
652 * ❌ No audit sampling
653
654 **Acceptable Trade-off:**
655
656 POC tests "Can AI do this?" not "How should we architect it?"
657
658 Full component architecture comes in Beta after POC validates concept.
659
660 === 7.4 Evolution Path ===
661
662 **POC1:** Single prompt → Prove concept
663 **POC2:** Add scenario component → Test full pipeline
664 **Beta:** Multi-component AKEL → Production architecture
665 **Release 1.0:** Full AKEL + Federation → Scale
666
667 == 8. Functional Requirements ==
668
669 === FR-POC-1: Article Input ===
670
671 **Requirement:** User can submit article for analysis
672
673 **Functionality:**
674
675 * Text input field (paste article text, up to 5000 characters)
676 * URL input field (paste article URL)
677 * "Analyze" button to trigger processing
678 * Loading indicator during analysis
679
680 **Excluded:**
681
682 * No user authentication
683 * No claim history
684 * No search functionality
685 * No saved templates
686
687 **Acceptance Criteria:**
688
689 * User can paste text from article
690 * User can paste URL of article
691 * System accepts input and triggers analysis
692
693 === FR-POC-2: Claim Extraction (Fully Automated) ===
694
695 **Requirement:** AI automatically extracts 3-5 factual claims
696
697 **Functionality:**
698
699 * AI reads article text
700 * AI identifies factual claims (not opinions/questions)
701 * AI extracts 3-5 most important claims
702 * System displays numbered list
703
704 **Critical:** NO MANUAL EDITING ALLOWED
705
706 * AI selects which claims to extract
707 * AI identifies factual vs. non-factual
708 * System processes claims as extracted
709 * No human curation or correction
710
711 **Error Handling:**
712
713 * If extraction fails: Display error message
714 * User can retry with different input
715 * No manual intervention to fix extraction
716
717 **Acceptance Criteria:**
718
719 * AI extracts 3-5 claims automatically
720 * Claims are factual (not opinions)
721 * Claims are clearly stated
722 * No manual editing required
723
724 === FR-POC-3: Verdict Generation (Fully Automated) ===
725
726 **Requirement:** AI automatically generates verdict for each claim
727
728 **Functionality:**
729
730 * For each claim, AI:
731 * Evaluates claim based on available evidence/knowledge
732 * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 * Assigns confidence score (0-100%)
734 * Assigns risk tier (A/B/C)
735 * Writes brief reasoning (1-3 sentences)
736 * System displays verdict for each claim
737
738 **Critical:** NO MANUAL EDITING ALLOWED
739
740 * AI computes verdicts based on evidence
741 * AI generates confidence scores
742 * AI writes reasoning
743 * No human review or adjustment
744
745 **Error Handling:**
746
747 * If verdict generation fails: Display error message
748 * User can retry
749 * No manual intervention to adjust verdicts
750
751 **Acceptance Criteria:**
752
753 * Each claim has a verdict
754 * Confidence score is displayed (0-100%)
755 * Risk tier is displayed (A/B/C)
756 * Reasoning is understandable (1-3 sentences)
757 * Verdict is defensible given reasoning
758 * All generated automatically by AI
759
760 === FR-POC-4: Analysis Summary (Fully Automated) ===
761
762 **Requirement:** AI generates brief summary of analysis
763
764 **Functionality:**
765
766 * AI summarizes findings in 4-6 sentences:
767 * How many claims found
768 * Distribution of verdicts
769 * Overall assessment
770 * System displays at top of results
771
772 **Critical:** NO MANUAL EDITING ALLOWED
773
774 **Acceptance Criteria:**
775
776 * Summary is coherent
777 * Accurately reflects analysis
778 * 4-6 sentences
779 * Automatically generated
780
781 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
782
783 **Requirement:** AI generates brief summary of original article
784
785 **Functionality:**
786
787 * AI summarizes article content (not FactHarbor's analysis)
788 * 3-5 sentences
789 * System displays
790
791 **Note:** Optional - can skip if time limited
792
793 **Critical:** NO MANUAL EDITING ALLOWED
794
795 **Acceptance Criteria:**
796
797 * Summary is neutral (article's position)
798 * Accurately reflects article content
799 * 3-5 sentences
800 * Automatically generated
801
802 === FR-POC-6: Publication Mode Display ===
803
804 **Requirement:** Clear labeling of AI-generated content
805
806 **Functionality:**
807
808 * Display Mode 2 publication label
809 * Show POC/Demo disclaimer
810 * Display risk tiers per claim
811 * Show quality gate status
812 * Display timestamp
813
814 **Acceptance Criteria:**
815
816 * Label is prominent and clear
817 * User understands this is AI-generated POC output
818 * Risk tiers are color-coded
819 * Quality gate status is visible
820
821 === FR-POC-7: Quality Gate Execution ===
822
823 **Requirement:** Execute simplified quality gates
824
825 **Functionality:**
826
827 * Check source quality (basic)
828 * Attempt contradiction search (basic)
829 * Calculate confidence scores
830 * Verify structural integrity (basic)
831 * Display gate results
832
833 **Acceptance Criteria:**
834
835 * All 4 gates attempted
836 * Pass/fail status displayed
837 * Failures explained to user
838 * Gates warn rather than block publication (POC mode); per Sections 6.2 and 6.5, only Source Quality and Structural Integrity failures halt verdict generation
839
840 == 9. Non-Functional Requirements ==
841
842 === NFR-POC-1: Fully Automated Processing ===
843
844 **Requirement:** Complete AI automation with zero manual intervention
845
846 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
847
848 **What this means:**
849
850 * Claims: AI selects (no human curation)
851 * Scenarios: N/A (deferred to POC2)
852 * Evidence: AI evaluates (no human selection)
853 * Verdicts: AI determines (no human adjustment)
854 * Summaries: AI writes (no human editing)
855
856 **Pipeline:**
857 {{code}}User Input → AKEL Processing → Output Display
858
859 ZERO human editing{{/code}}
860
861 **If AI output is poor:**
862
863 * ❌ Do NOT manually fix it
864 * ✅ Document the failure
865 * ✅ Improve prompts and retry
866 * ✅ Accept that POC might fail
867
868 **Why this matters:**
869
870 * Tests whether AI can do this without humans
871 * Validates scalability (humans can't review every analysis)
872 * Honest test of technical feasibility
873
874 === NFR-POC-2: Performance ===
875
876 **Requirement:** Analysis completes in reasonable time
877
878 **Acceptable Performance:**
879
880 * Processing time: 1-5 minutes (acceptable for POC)
881 * Display loading indicator to user
882 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
883
884 **Not Required:**
885
886 * Production-level speed (< 30 seconds)
887 * Optimization for scale
888 * Caching
889
890 **Acceptance Criteria:**
891
892 * Analysis completes within 5 minutes
893 * User sees loading indicator
894 * No timeout errors
895
896 === NFR-POC-3: Reliability ===
897
898 **Requirement:** System works for manual testing sessions
899
900 **Acceptable:**
901
902 * Occasional errors (< 20% failure rate)
903 * Manual restart if needed
904 * Display error messages clearly
905
906 **Not Required:**
907
908 * 99.9% uptime
909 * Automatic error recovery
910 * Production monitoring
911
912 **Acceptance Criteria:**
913
914 * System works for test demonstrations
915 * Errors are handled gracefully
916 * User receives clear error messages
917
918 === NFR-POC-4: Environment ===
919
920 **Requirement:** Runs on simple infrastructure
921
922 **Acceptable:**
923
924 * Single machine or simple cloud setup
925 * No distributed architecture
926 * No load balancing
927 * No redundancy
928 * Local development environment viable
929
930 **Not Required:**
931
932 * Production infrastructure
933 * Multi-region deployment
934 * Auto-scaling
935 * Disaster recovery
936
937 === NFR-POC-5: Cost Efficiency Tracking ===
938
939 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
940
941 **Must Track:**
942
943 * Input tokens (article + prompt)
944 * Output tokens (generated analysis)
945 * Total tokens
946 * Estimated cost (USD)
947 * Response time (seconds)
948 * Article length (words/characters)
949
950 **Must Display:**
951
952 * Usage statistics in UI (Component 5)
953 * Cost per analysis
954 * Cost per claim extracted
955
956 **Must Log:**
957
958 * Aggregate metrics for analysis
959 * Cost distribution by article length
960 * Token efficiency trends
961
962 **Purpose:**
963
964 * Understand unit economics
965 * Identify optimization opportunities
966 * Project costs at scale
967 * Inform architecture decisions (caching, model selection, etc.)
968
969 **Acceptance Criteria:**
970
971 * ✅ Usage data displayed after each analysis
972 * ✅ Metrics logged for aggregate analysis
973 * ✅ Cost calculated accurately (Claude API pricing)
974 * ✅ Test cases include varying article lengths
975 * ✅ POC1 report includes cost analysis section
976
977 **Success Target:**
978
979 * Average cost per analysis < $0.05 USD
980 * Cost scaling behavior understood (linear/exponential)
981 * 2+ optimization opportunities identified
982
983 **Critical:** Unit economics must be viable for scaling decision!
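The "Must Log" items above amount to appending one metrics record per analysis and aggregating the records for the cost-analysis report. A sketch using a JSONL file; the file path and field names are assumptions:

```python
# Sketch: append per-analysis metrics to a JSONL log and aggregate them for
# the cost-analysis report. File name and field names are assumptions.
import json
import os
import statistics
import tempfile

def log_metrics(path: str, metrics: dict) -> None:
    """Append one analysis record as a JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(metrics) + "\n")

def average_cost(path: str) -> float:
    """Aggregate: mean cost per analysis across all logged records."""
    with open(path) as f:
        costs = [json.loads(line)["cost_usd"] for line in f]
    return round(statistics.mean(costs), 3)

# Illustrative records for articles of varying lengths (made-up costs).
path = os.path.join(tempfile.mkdtemp(), "usage.jsonl")
for words, cost in [(800, 0.02), (2450, 0.04), (4000, 0.07)]:
    log_metrics(path, {"article_words": words, "cost_usd": cost})

print(average_cost(path))  # compare against the < $0.05 success target
```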

== 10. Technical Architecture ==

=== 10.1 System Components ===

**Frontend:**

* Simple HTML form (text input + URL input + button)
* Loading indicator
* Results display page (single page, no tabs/navigation)

**Backend:**

* Single API endpoint
* Calls Claude API (Sonnet 4.5 or latest)
* Parses response
* Returns JSON to frontend

**Data Storage:**

* None required (stateless POC)
* Optional: Simple file storage or SQLite for demo examples

**External Services:**

* Claude API (Anthropic) - required
* Optional: URL fetch service for article text extraction

=== 10.2 Processing Flow ===

{{code}}
1. User submits text or URL

2. Backend receives request

3. If URL: Fetch article text

4. Call Claude API with single prompt:
   "Extract claims, evaluate each, provide verdicts"

5. Claude API returns:
   - Analysis summary
   - Claims list
   - Verdicts for each claim (with risk tiers)
   - Article summary (optional)
   - Quality gate results

6. Backend parses response

7. Frontend displays results with Mode 2 labeling
{{/code}}

**Key Simplification:** A single API call performs the entire analysis.
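The flow above can be sketched as one backend handler. This is a hedged sketch, not the POC's actual code: the LLM call and URL fetcher are injected so they can be stubbed, and the parsed field names (`analysis_summary`, `claims`) are assumptions based on the response shape listed in step 5.

```python
import json

def analyze(payload: dict, call_llm, fetch_url=None) -> dict:
    """Steps 2-6 of the processing flow: accept text or URL, build the single
    comprehensive prompt, call the LLM once, and parse the structured JSON."""
    # Step 3: if a URL was submitted, fetch the article text first.
    text = payload.get("text")
    if not text and payload.get("url"):
        if fetch_url is None:
            raise ValueError("URL submitted but no fetcher configured")
        text = fetch_url(payload["url"])
    if not text:
        raise ValueError("No article text provided")

    # Step 4: one prompt does the entire analysis.
    prompt = f"Extract claims, evaluate each, provide verdicts.\n\nArticle:\n{text}"
    raw = call_llm(prompt)

    # Step 6: parse the structured JSON the model was asked to return,
    # checking that the assumed required fields are present.
    result = json.loads(raw)
    for key in ("analysis_summary", "claims"):
        if key not in result:
            raise ValueError(f"Missing field in LLM response: {key}")
    return result
```

In the real POC, `call_llm` would wrap the Anthropic SDK; injecting it keeps the handler testable without API keys.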

=== 10.3 AI Prompt Strategy ===

**Single Comprehensive Prompt:**
{{code}}Task: Analyze this article and provide:

1. Identify the article's main thesis/conclusion
   - What is the article trying to argue or prove?
   - What is the primary claim or conclusion?

2. Extract 3-5 factual claims from the article
   - Note which claims are CENTRAL to the main thesis
   - Note which claims are SUPPORTING facts

3. For each claim:
   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
   - Assign confidence score (0-100%)
   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
   - Write brief reasoning (1-3 sentences)

4. Assess relationship between claims and main thesis:
   - Do the claims actually support the article's conclusion?
   - Are there logical leaps or unsupported inferences?
   - Is the article's framing misleading even if individual facts are accurate?

5. Run quality gates:
   - Check: ≥2 sources found
   - Attempt: Basic contradiction search
   - Calculate: Confidence scores
   - Verify: Structural integrity

6. Write context-aware analysis summary (4-6 sentences):
   - State article's main thesis
   - Report claims found and verdict distribution
   - Note if central claims are problematic
   - Assess whether evidence supports conclusion
   - Overall credibility considering claim importance

7. Write article summary (3-5 sentences: neutral summary of article content)

Return as structured JSON with quality gate results.{{/code}}

**One prompt generates everything.**

**Critical Addition:**

Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article whose facts are accurate but poorly reasoned from one that is genuinely credible.
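The quality gates from step 5 can be applied to the parsed response on the backend as a sanity check. A minimal sketch, assuming hypothetical field names (`claims`, `sources`, `confidence`, `text`, `verdict`) for the returned JSON schema:

```python
def run_quality_gates(result: dict) -> dict:
    """Apply the step-5 quality gates to a parsed analysis. Field names are
    assumptions about the JSON schema, for illustration only."""
    claims = result.get("claims", [])
    gates = {
        # Gate: every claim should cite at least 2 sources.
        "min_two_sources": all(len(c.get("sources", [])) >= 2 for c in claims),
        # Gate: every claim carries a confidence score in [0, 100].
        "confidence_present": all(0 <= c.get("confidence", -1) <= 100 for c in claims),
        # Gate: structural integrity - at least one claim, each with text and verdict.
        "structure_ok": all("text" in c and "verdict" in c for c in claims) and bool(claims),
    }
    gates["all_passed"] = all(gates.values())
    return gates
```

The contradiction-search gate is omitted here because it depends on the model's own output rather than on structure the backend can verify.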

=== 10.4 Technology Stack Suggestions ===

**Frontend:**

* HTML + CSS + JavaScript (minimal framework)
* OR: Next.js (if team prefers)
* Hosted: Local machine OR Vercel/Netlify free tier

**Backend:**

* Python Flask/FastAPI (simple REST API)
* OR: Next.js API routes (if using Next.js)
* Hosted: Local machine OR Railway/Render free tier

**AKEL Integration:**

* Claude API via Anthropic SDK
* Model: Claude Sonnet 4.5 or latest available

**Database:**

* None (stateless acceptable)
* OR: SQLite if you want to store demo examples
* OR: JSON files on disk

**Deployment:**

* Local development environment sufficient for POC
* Optional: Deploy to cloud for remote demos

== 11. Success Criteria ==

=== 11.1 Minimum Success (POC Passes) ===

**Required for GO decision:**

* ✅ AI extracts 3-5 factual claims automatically
* ✅ AI provides verdict for each claim automatically
* ✅ Verdicts are reasonable (≥70% make logical sense)
* ✅ Analysis summary is coherent
* ✅ Output is comprehensible to reviewers
* ✅ Team/advisors understand the output
* ✅ Team agrees approach has merit
* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
* ✅ **Cost scaling understood** (data collected on article length vs. cost)
* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)

**Quality Definition:**

* "Reasonable verdict" = Defensible given general knowledge
* "Coherent summary" = Logically structured, grammatically correct
* "Comprehensible" = Reviewers understand what the analysis means

=== 11.2 POC Fails If ===

**Automatic NO-GO if any of these:**

* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
* ❌ Output incomprehensible (reviewers can't understand analysis)
* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
* ❌ Team loses confidence in AI-automated approach

=== 11.3 Quality Thresholds ===

**POC quality expectations:**

|=Component|=Quality Threshold|=Definition
|Claim Extraction|(% class="success" %)≥70% accuracy|Identifies obvious factual claims, may miss some edge cases
|Verdict Logic|(% class="success" %)≥70% defensible|Verdicts are logical given the reasoning provided
|Reasoning Clarity|(% class="success" %)≥70% clear|1-3 sentences are understandable and relevant
|Overall Analysis|(% class="success" %)≥70% useful|Output helps the user understand the article's claims

**Analogy:** "B student" quality (70-80%), not "A+" perfection yet
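Checking an evaluation run against the table above is mechanical. A sketch, assuming a hypothetical score dictionary keyed by component (the names and the 70% threshold come from the table; the measurement format is an assumption):

```python
# Threshold and component names from the 11.3 quality table.
THRESHOLD = 0.70
COMPONENTS = ("claim_extraction", "verdict_logic", "reasoning_clarity", "overall_analysis")

def meets_quality_thresholds(scores: dict) -> dict:
    """Return a pass/fail flag per component plus an overall flag.
    Missing components count as failing."""
    results = {name: scores.get(name, 0.0) >= THRESHOLD for name in COMPONENTS}
    results["all_pass"] = all(results[name] for name in COMPONENTS)
    return results
```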

**Not expecting:**

* 100% accuracy
* Perfect claim coverage
* Comprehensive evidence gathering
* Flawless verdicts
* Production polish

**Expecting:**

* Reasonable claim extraction
* Defensible verdicts
* Understandable reasoning
* Useful output

== 12. Test Cases ==

=== 12.1 Test Case 1: Simple Factual Claim ===

**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"

**Expected Output:**

* Extract claim correctly
* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
* Confidence: 70-90%
* Risk tier: C (Low)
* Reasoning: Mentions studies or evidence

**Success:** Verdict is reasonable and the reasoning makes sense

=== 12.2 Test Case 2: Complex News Article ===

**Input:** News article URL with multiple claims about politics/health/science

**Expected Output:**

* Extract 3-5 key claims
* Verdict for each (may vary: some supported, some uncertain, some refuted)
* Coherent analysis summary
* Article summary
* Risk tiers assigned appropriately

**Success:** Claims identified actually come from the article, and verdicts are reasonable

=== 12.3 Test Case 3: Controversial Topic ===

**Input:** Article on a contested political or scientific topic

**Expected Output:**

* Balanced analysis
* Acknowledges uncertainty where appropriate
* Doesn't overstate confidence
* Reasoning shows awareness of complexity

**Success:** Analysis is fair and doesn't show obvious bias

=== 12.4 Test Case 4: Clearly False Claim ===

**Input:** Article with an obviously false claim (e.g., "The Earth is flat")

**Expected Output:**

* Extract claim
* Verdict: REFUTED
* High confidence (> 90%)
* Risk tier: C (Low - established fact)
* Clear reasoning

**Success:** AI correctly identifies the false claim with high confidence

=== 12.5 Test Case 5: Genuinely Uncertain Claim ===

**Input:** Article with a claim where the evidence is genuinely mixed

**Expected Output:**

* Extract claim
* Verdict: UNCERTAIN
* Moderate confidence (40-60%)
* Reasoning explains why uncertain

**Success:** AI recognizes the uncertainty and doesn't overstate confidence

=== 12.6 Test Case 6: High-Risk Medical Claim ===

**Input:** Article making medical claims

**Expected Output:**

* Extract claim
* Verdict: [appropriate based on evidence]
* Risk tier: A (High - medical)
* Red label displayed
* Clear disclaimer about not being medical advice

**Success:** Risk tier correctly assigned, appropriate warnings shown
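Test Case 6's red label and disclaimer can be driven by a simple tier-to-label mapping, matching the tier definitions used in the prompt (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions). The colors and disclaimer wording below are assumptions for illustration:

```python
# Display metadata per risk tier; colors and wording are illustrative assumptions.
RISK_TIERS = {
    "A": {"name": "High (Medical/Legal/Safety)", "color": "red",
          "disclaimer": "This analysis is not medical, legal, or safety advice."},
    "B": {"name": "Medium (Policy/Science)", "color": "yellow", "disclaimer": None},
    "C": {"name": "Low (Facts/Definitions)", "color": "green", "disclaimer": None},
}

def risk_label(tier: str) -> dict:
    """Return the display label for a claim's risk tier. Test Case 6 expects
    a red label and a disclaimer for tier A."""
    if tier not in RISK_TIERS:
        raise ValueError(f"Unknown risk tier: {tier!r}")
    return RISK_TIERS[tier]
```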

== 13. POC Decision Gate ==

=== 13.1 Decision Framework ===

After POC testing is complete, the team makes one of three decisions:

**Option A: GO (Proceed to POC2)**

**Conditions:**

* AI quality ≥70% without manual editing
* Basic claim → verdict pipeline validated
* Internal + advisor feedback positive
* Technical feasibility confirmed
* Team confident in direction
* Clear path to improving AI quality to ≥90%

**Next Steps:**

* Plan POC2 development (add scenarios)
* Design scenario architecture
* Expand to Evidence Model structure
* Test with more complex articles

**Option B: NO-GO (Pivot or Stop)**

**Conditions:**

* AI quality < 60%
* Requires manual editing for most analyses (> 50%)
* Feedback indicates fundamental flaws
* Cost/effort not justified by value
* No clear path to improvement

**Next Steps:**

* **Pivot:** Change to hybrid human-AI approach (accept that manual review is required)
* **Stop:** Conclude approach is not viable, revisit later

**Option C: ITERATE (Improve POC)**

**Conditions:**

* Concept has merit but execution needs work
* Specific improvements identified
* Addressable with better prompts/approach
* AI quality between 60% and 70%

**Next Steps:**

* Improve AI prompts
* Test different approaches
* Re-run POC with improvements
* Then make GO/NO-GO decision

=== 13.2 Decision Criteria Summary ===

{{code}}
AI Quality < 60%   → NO-GO (approach doesn't work)
AI Quality 60-70%  → ITERATE (improve and retry)
AI Quality ≥ 70%   → GO (proceed to POC2)
{{/code}}
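The three-way gate above is simple enough to encode directly, which keeps the decision unambiguous at the boundaries (a sketch; the 0-1 quality scale is an assumption):

```python
def decision_gate(ai_quality: float) -> str:
    """Map measured AI quality (0.0-1.0) to the three-way decision."""
    if ai_quality >= 0.70:
        return "GO"       # proceed to POC2
    if ai_quality >= 0.60:
        return "ITERATE"  # improve prompts and retry
    return "NO-GO"        # approach doesn't work
```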

== 14. Key Risks & Mitigations ==

=== 14.1 Risk: AI Quality Not Good Enough ===

**Likelihood:** Medium-High
**Impact:** POC fails

**Mitigation:**

* Extensive prompt engineering and testing
* Use best available AI models (Sonnet 4.5)
* Test with diverse article types
* Iterate on prompts based on results

**Acceptance:** This is exactly what the POC tests; be prepared for failure

=== 14.2 Risk: AI Consistency Issues ===

**Likelihood:** Medium
**Impact:** Works sometimes, fails other times

**Mitigation:**

* Test with 10+ diverse articles
* Measure success rate honestly
* Improve prompts to increase consistency

**Acceptance:** Some variability is OK if average quality is ≥70%

=== 14.3 Risk: Output Incomprehensible ===

**Likelihood:** Low-Medium
**Impact:** Users can't understand the analysis

**Mitigation:**

* Create clear explainer document
* Iterate on output format
* Test with non-technical reviewers
* Simplify language if needed

**Acceptance:** Iterate until comprehensible

=== 14.4 Risk: API Rate Limits / Costs ===

**Likelihood:** Low
**Impact:** System slow or expensive

**Mitigation:**

* Monitor API usage
* Implement retry logic
* Estimate costs before scaling

**Acceptance:** POC can be slow and expensive (optimization comes later)

=== 14.5 Risk: Scope Creep ===

**Likelihood:** Medium
**Impact:** POC becomes too complex

**Mitigation:**

* Strict scope discipline
* Say NO to feature additions
* Keep focus on the core question

**Acceptance:** POC is minimal by design

== 15. POC Philosophy ==

=== 15.1 Core Principles ===

**1. Build Less, Learn More**

* Minimum features to test hypothesis
* Don't build unvalidated features
* Focus on core question only

**2. Fail Fast**

* Quick test of hardest part (AI capability)
* Accept that POC might fail
* Better to discover issues early
* Honest assessment over optimistic hope

**3. Test First, Build Second**

* Validate AI can do this before building platform
* Don't assume it will work
* Let results guide decisions

**4. Automation First**

* No manual editing allowed
* Tests scalability, not just feasibility
* Proves approach can work at scale

**5. Honest Assessment**

* Don't cherry-pick examples
* Don't manually fix bad outputs
* Document failures openly
* Make data-driven decisions

=== 15.2 What POC Is ===

✅ Testing AI capability without humans
✅ Proving core technical concept
✅ Fast validation of approach
✅ Honest assessment of feasibility

=== 15.3 What POC Is NOT ===

❌ Building a product
❌ Production-ready system
❌ Feature-complete platform
❌ Perfectly accurate analysis
❌ Polished user experience

== 16. Success = Clear Path Forward ==

**If POC succeeds (≥70% AI quality):**

* ✅ Approach validated
* ✅ Proceed to POC2 (add scenarios)
* ✅ Design full Evidence Model structure
* ✅ Test multi-scenario comparison
* ✅ Focus on improving AI quality from 70% → 90%

**If POC fails (< 60% AI quality):**

* ✅ Learn what doesn't work
* ✅ Pivot to different approach
* ✅ OR wait for better AI technology
* ✅ Avoid wasting resources on non-viable approach

**Either way, the POC provides clarity.**

== 17. Related Pages ==

* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]

**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)


=== NFR-POC-11: LLM Provider Abstraction (POC1) ===

**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.

**POC1 Implementation:**

* **Primary Provider:** Anthropic Claude API
* Stage 1: Claude Haiku 4
* Stage 2: Claude Sonnet 3.5 (cached)
* Stage 3: Claude Sonnet 3.5

* **Provider Interface:** Abstract LLMProvider interface implemented

* **Configuration:** Environment variables for provider selection
* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}

* **Failover:** Basic error handling with cache fallback for Stage 2

* **Cost Tracking:** Log provider name and cost per request

**Future (POC2/Beta):**

* Secondary provider (OpenAI) with automatic failover
* Admin API for runtime provider switching
* Cost comparison dashboard
* Cross-provider output verification

**Success Criteria:**

* All LLM calls go through the abstraction layer (no direct API calls)
* Provider can be changed via environment variable without code changes
* Cost tracking includes provider name in logs
* Stage 2 falls back to cache on provider failure
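The success criteria above can be sketched as an abstract provider interface with env-var selection and a Stage 2 cache fallback. Class, method, and variable names here are illustrative assumptions, not the POC's actual interface; only the environment variable names come from the requirement.

```python
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """All LLM calls go through this interface - no direct API calls."""
    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, model: str, prompt: str) -> str:
        raise NotImplementedError("would call the Anthropic SDK here")

_PROVIDERS = {"anthropic": AnthropicProvider}

def get_provider() -> LLMProvider:
    """Select the provider from configuration, not code (success criterion 2)."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _PROVIDERS[name]()

def stage2_analyze(prompt: str, provider: LLMProvider, cache: dict) -> str:
    """Stage 2 call: on provider failure, fall back to the cache (criterion 4)."""
    model = os.environ.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5")
    try:
        result = provider.complete(model, prompt)
        cache[prompt] = result
        return result
    except Exception:
        if prompt in cache:
            return cache[prompt]
        raise
```

Adding a second provider later (OpenAI, per the POC2 plan) would only mean registering another `LLMProvider` subclass in `_PROVIDERS`.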

**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.105.Specification.POC.API-and-Schemas.WebHome]] Section 6

**Dependencies:**

* NFR-14 (Main Requirements)
* Design Decision 9
* Architecture Section 2.2

**Priority:** HIGH (P1)

**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.