1 = POC Requirements =
2
3 **Status:** ✅ Approved for Development
4 **Version:** 2.0 (Updated after Specification Cross-Check)
5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6
7 == 1. POC Overview ==
8
9 === 1.1 What POC Tests ===
10
11 **Core Question:**
12
13 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
14
15 **What we're proving:**
16
17 * AI can identify factual claims from text
18 * AI can evaluate those claims and produce verdicts
19 * Output is comprehensible and useful
20 * Fully automated approach is viable
21
22 **What we're NOT testing:**
23
24 * Scenario generation (deferred to POC2)
25 * Evidence display (deferred to POC2)
26 * Production scalability
27 * Perfect accuracy
28 * Complete feature set
29
30 === 1.2 Scenarios Deferred to POC2 ===
31
32 **Intentional Simplification:**
33
34 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
35
36 **Rationale:**
37
38 * **POC1 tests:** Can AI extract claims and generate verdicts?
39 * **POC2 will add:** Scenario generation and management
40 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
41
42 **Design Decision:**
43
44 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
45
46 **Low Risk:**
47
48 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49
50 * Faster POC1 validation
51 * Learning from POC1 to inform scenario design
52 * Iterative approach: fail fast if basic AI doesn't work
53 * Flexibility to adjust scenario architecture based on POC1 insights
54
55 **Full System Workflow (Future):**
56 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
57
58 **POC1 Simplified Workflow:**
59 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
60
61 == 2. POC Output Specification ==
62
63 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
64
65 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
66
67 **Length:** 4-6 sentences
68
69 **Content (Required Elements):**
70
71 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
72 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
73 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
74 4. **Relationship assessment** - Do the claims support the article's conclusion?
75 5. **Overall credibility** - Final assessment considering claim importance
76
77 **Critical Innovation:**
78
79 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
80
81 * Present accurate supporting facts but draw unsupported conclusions
82 * Have one false central claim that invalidates the whole argument
83 * Misframe accurate information to mislead
84
85 **Good Example (Context-Aware):**
86 {{code}}This article argues that coffee cures cancer based on its antioxidant
87 content. We analyzed 3 factual claims: 2 about coffee's chemical
88 properties are well-supported, but the main causal claim is refuted
89 by current evidence. The article confuses correlation with causation.
90 Overall assessment: MISLEADING - makes an unsupported medical claim
91 despite citing some accurate facts.{{/code}}
92
93 **Poor Example (Simple Aggregation - Don't Do This):**
94 {{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 Overall assessment: mostly accurate (67% accurate).{{/code}}
96 ↑ This misses that the refuted claim IS the article's main point!
97
98 **What POC1 Tests:**
99
100 Can AI identify and assess:
101
102 * ✅ The article's main thesis/conclusion?
103 * ✅ Which claims are central vs. supporting?
104 * ✅ Whether the evidence supports the conclusion?
105 * ✅ Overall credibility considering logical structure?
106
107 **If AI Cannot Do This:**
108
109 That's valuable to learn in POC1! We'll:
110
111 * Note as limitation
112 * Fall back to simple aggregation with warning
113 * Design explicit article-level analysis for POC2
114
115 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
116
117 **What:** List of factual claims extracted from article
118 **Format:** Numbered list
119 **Quantity:** 3-5 claims
120 **Requirements:**
121
122 * Factual claims only (not opinions/questions)
123 * Clearly stated
124 * Automatically extracted by AI
125
126 **Example:**
127 {{code}}CLAIMS IDENTIFIED:
128
129 [1] Coffee reduces diabetes risk by 30%
130 [2] Coffee improves heart health
131 [3] Decaf has same benefits as regular
132 [4] Coffee prevents Alzheimer's completely{{/code}}
133
134 === 2.3 Component 3: CLAIMS VERDICTS ===
135
136 **What:** Verdict for each claim identified
137 **Format:** Per claim structure
138
139 **Required Elements:**
140
141 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
142 * **Confidence Score:** 0-100%
143 * **Brief Reasoning:** 1-3 sentences explaining why
144 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
145
146 **Example:**
147 {{code}}VERDICTS:
148
149 [1] WELL-SUPPORTED (85%) [Risk: C]
150 Multiple studies confirm 25-30% risk reduction with regular consumption.
151
152 [2] UNCERTAIN (65%) [Risk: B]
153 Evidence is mixed. Some studies show benefits, others show no effect.
154
155 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
156 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
157
158 [4] REFUTED (90%) [Risk: B]
159 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
160
161 **Risk Tier Display:**
162
163 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
164 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
165 * **Tier C (Green):** Low Risk - Facts/Definitions/History
166
167 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
168
169 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
170
171 **What:** Brief summary of original article content
172 **Length:** 3-5 sentences
173 **Tone:** Neutral (reports the article's position, not FactHarbor's analysis)
174
175 **Example:**
176 {{code}}ARTICLE SUMMARY:
177
178 Health News Today article discusses coffee benefits, citing studies
179 on diabetes and Alzheimer's. Author highlights research linking coffee
180 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
181
182 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
183
184 **What:** LLM usage metrics for cost optimization and scaling decisions
185
186 **Purpose:**
187
188 * Understand cost per analysis
189 * Identify optimization opportunities
190 * Project costs at scale
191 * Inform architecture decisions
192
193 **Display Format:**
194 {{code}}USAGE STATISTICS:
195 • Article: 2,450 words (12,300 characters)
196 • Input tokens: 15,234
197 • Output tokens: 892
198 • Total tokens: 16,126
199 • Estimated cost: $0.24 USD
200 • Response time: 8.3 seconds
201 • Cost per claim: $0.048
202 • Model: claude-sonnet-4-20250514{{/code}}
203
204 **Why This Matters:**
205
206 At scale, LLM costs are critical:
207
208 * 10,000 articles/month ≈ $200-500/month
209 * 100,000 articles/month ≈ $2,000-5,000/month
210 * Cost optimization can reduce expenses 30-50%
211
212 **What POC1 Learns:**
213
214 * How cost scales with article length
215 * Prompt optimization opportunities (caching, compression)
216 * Output verbosity tradeoffs
217 * Model selection strategy (Sonnet vs. Haiku)
218 * Article length limits (if needed)
219
220 **Implementation:**
221
222 * Claude API already returns usage data (see the sketch below)
223 * No extra API calls needed
224 * Display to user + log for aggregate analysis
225 * Test with articles of varying lengths
226
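A minimal sketch of capturing these metrics, assuming the Anthropic Python SDK; the pricing constants are placeholders, not confirmed rates:

{{code language="python"}}
# Sketch: derive the Component 5 metrics from a single Messages API call.
# PRICE_IN / PRICE_OUT are assumed USD rates per million tokens --
# substitute current Anthropic pricing.
import time
import anthropic

PRICE_IN = 3.00    # placeholder input rate (USD per 1M tokens)
PRICE_OUT = 15.00  # placeholder output rate (USD per 1M tokens)

def analyze_with_usage(article_text: str, prompt: str, model: str) -> dict:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
    start = time.monotonic()
    message = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": f"{prompt}\n\n{article_text}"}],
    )
    elapsed = time.monotonic() - start
    usage = message.usage  # returned with every response; no extra API call
    cost = (usage.input_tokens * PRICE_IN
            + usage.output_tokens * PRICE_OUT) / 1_000_000
    return {
        "words": len(article_text.split()),
        "characters": len(article_text),
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
        "estimated_cost_usd": round(cost, 4),
        "response_time_s": round(elapsed, 1),
        "model": model,
    }
{{/code}}
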
227 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
228
229 === 2.6 Total Output Size ===
230
231 **Combined:** 220-350 words
232
233 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
234 * Claims Identification: 30-50 words
235 * Claims Verdicts: 100-150 words
236 * Article Summary: 30-50 words (optional)
237
238 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
239
240 == 3. What's NOT in POC Scope ==
241
242 === 3.1 Feature Exclusions ===
243
244 The following are **explicitly excluded** from POC:
245
246 **Content Features:**
247
248 * ❌ Scenarios (deferred to POC2)
249 * ❌ Evidence display (supporting/opposing lists)
250 * ❌ Source links (clickable references)
251 * ❌ Detailed reasoning chains
252 * ❌ Source quality ratings (shown but not detailed)
253 * ❌ Contradiction detection (basic only)
254 * ❌ Risk assessment (shown but not workflow-integrated)
255
256 **Platform Features:**
257
258 * ❌ User accounts / authentication
259 * ❌ Saved history
260 * ❌ Search functionality
261 * ❌ Claim comparison
262 * ❌ User contributions
263 * ❌ Commenting system
264 * ❌ Social sharing
265
266 **Technical Features:**
267
268 * ❌ Browser extensions
269 * ❌ Mobile apps
270 * ❌ API endpoints
271 * ❌ Webhooks
272 * ❌ Export features (PDF, CSV)
273
274 **Quality Features:**
275
276 * ❌ Accessibility (WCAG compliance)
277 * ❌ Multilingual support
278 * ❌ Mobile optimization
279 * ❌ Media verification (images/videos)
280
281 **Production Features:**
282
283 * ❌ Security hardening
284 * ❌ Privacy compliance (GDPR)
285 * ❌ Terms of service
286 * ❌ Monitoring/logging
287 * ❌ Error tracking
288 * ❌ Analytics
289 * ❌ A/B testing
290
291 == 4. POC Simplifications vs. Full System ==
292
293 === 4.1 Architecture Comparison ===
294
295 **POC Architecture (Simplified):**
296 {{code}}User Input → Single AKEL Call → Output Display
297 (all processing){{/code}}
298
299 **Full System Architecture:**
300 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
301 → Evidence Summarizer → Contradiction Detector → Verdict Generator
302 → Quality Gates → Publication → Output Display{{/code}}
303
304 **Key Differences:**
305
306 |=Aspect|=POC1|=Full System
307 |Processing|Single API call|Multi-component pipeline
308 |Scenarios|None (implicit)|Explicit entities with versioning
309 |Evidence|Basic retrieval|Comprehensive with quality scoring
310 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
311 |Workflow|3 steps (input/process/output)|6 phases with gates
312 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
313 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
314
315 === 4.2 Workflow Comparison ===
316
317 **POC1 Workflow:**
318
319 1. User submits text/URL
320 2. Single AKEL call (all processing in one prompt)
321 3. Display results
322 **Total: 3 steps, 10-18 seconds**
323
324 **Full System Workflow:**
325
326 1. **Claim Submission** (extraction, normalization, clustering)
327 2. **Scenario Building** (definitions, assumptions, boundaries)
328 3. **Evidence Handling** (retrieval, assessment, linking)
329 4. **Verdict Creation** (synthesis, reasoning, approval)
330 5. **Public Presentation** (summaries, landscapes, deep dives)
331 6. **Time Evolution** (versioning, re-evaluation triggers)
332 **Total: 6 phases with quality gates, 10-30 seconds**
333
334 === 4.3 Why POC is Simplified ===
335
336 **Engineering Rationale:**
337
338 1. **Test core capability first:** Can AI do basic fact-checking without humans?
339 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
340 3. **Learn before building:** POC1 insights inform full architecture
341 4. **Iterative approach:** Add complexity only after validating foundations
342 5. **Resource efficiency:** Don't build full system if core concept fails
343
344 **Acceptable Trade-offs:**
345
346 * ✅ POC proves AI capability (most risky assumption)
347 * ✅ POC validates user comprehension (can people understand output?)
348 * ❌ POC doesn't validate full workflow (test in Beta)
349 * ❌ POC doesn't validate scale (test in Beta)
350 * ❌ POC doesn't validate scenario architecture (design in POC2)
351
352 === 4.4 Gap Between POC1 and POC2/Beta ===
353
354 **What needs to be built for POC2:**
355
356 * Scenario generation component
357 * Evidence Model structure (full)
358 * Scenario-evidence linking
359 * Multi-interpretation comparison
360 * Truth landscape visualization
361
362 **What needs to be built for Beta:**
363
364 * Multi-component AKEL pipeline
365 * Quality gate infrastructure
366 * Review workflow system
367 * Audit sampling framework
368 * Production data model
369 * Federation architecture (Release 1.0)
370
371 **POC1 → POC2 is significant architectural expansion.**
372
373 == 5. Publication Mode & Labeling ==
374
375 === 5.1 POC Publication Mode ===
376
377 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
378
379 Per FactHarbor Specification Section 11 "POC v1 Behavior":
380
381 * Produces public AI-generated output
382 * No human approval gate
383 * Clear AI-Generated labeling
384 * All quality gates active (simplified)
385 * Risk tier classification shown (demo)
386
387 === 5.2 User-Facing Labels ===
388
389 **Primary Label (top of analysis):**
390 {{code}}╔════════════════════════════════════════════════════════════╗
391 ║ [AI-GENERATED - POC/DEMO] ║
392 ║ ║
393 ║ This analysis was produced entirely by AI and has not ║
394 ║ been human-reviewed. Use for demonstration purposes. ║
395 ║ ║
396 ║ Source: AI/AKEL v1.0 (POC) ║
397 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
398 ║ Quality Gates: 4/4 Passed (Simplified) ║
399 ║ Last Updated: [timestamp] ║
400 ╚════════════════════════════════════════════════════════════╝{{/code}}
401
402 **Per-Claim Risk Labels:**
403
404 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
405 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
406 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
407
408 === 5.3 Display Requirements ===
409
410 **Must Show:**
411
412 * AI-Generated status (prominent)
413 * POC/Demo disclaimer
414 * Risk tier per claim
415 * Confidence scores (0-100%)
416 * Quality gate status (passed/failed)
417 * Timestamp
418
419 **Must NOT Claim:**
420
421 * Human review
422 * Production quality
423 * Medical/legal advice
424 * Authoritative verdicts
425 * Complete accuracy
426
427 === 5.4 Mode 2 vs. Full System Publication ===
428
429 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
430 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
431 |Review|None|None|Human-Reviewed
432 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
433 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
434 |Risk Display|Demo only|Workflow-integrated|Validated
435 |User Actions|View only|Flag for review|Trust rating
436
437 == 6. Quality Gates (Simplified Implementation) ==
438
439 === 6.1 Overview ===
440
441 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
442
443 **The 4 Mandatory Gates:**
444
445 1. Source Quality
446 2. Contradiction Search (MANDATORY)
447 3. Uncertainty Quantification
448 4. Structural Integrity
449
450 **POC Implements Simplified Versions:**
451
452 * Focus on demonstrating concept
453 * Basic implementations sufficient
454 * Failures displayed to user (not blocking)
455 * Full system has comprehensive validation
456
457 === 6.2 Gate 1: Source Quality (Basic) ===
458
459 **Full System Requirements:**
460
461 * Primary sources identified and accessible
462 * Source reliability scored against whitelist
463 * Citation completeness verified
464 * Publication dates checked
465 * Author credentials validated
466
467 **POC Implementation:**
468
469 * ✅ At least 2 sources found
470 * ✅ Sources accessible (URLs valid)
471 * ❌ No whitelist checking
472 * ❌ No credential validation
473 * ❌ No comprehensive reliability scoring
474
475 **Pass Criteria:** ≥2 accessible sources found
476
477 **Failure Handling:** Display error message, don't generate verdict
478
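One way the simplified Gate 1 could be implemented, sketched under the assumption that the parsed AI output exposes a list of source URLs; a HEAD request serves as a cheap accessibility probe:

{{code language="python"}}
# Sketch of the simplified Gate 1 check: pass if at least two cited
# URLs respond successfully. Some servers reject HEAD requests, so a
# GET fallback could be added in practice.
import requests

def gate_source_quality(source_urls: list[str], min_sources: int = 2) -> dict:
    accessible = []
    for url in source_urls:
        try:
            resp = requests.head(url, timeout=5, allow_redirects=True)
            if resp.status_code < 400:
                accessible.append(url)
        except requests.RequestException:
            continue  # unreachable URL -> not counted
    return {
        "gate": "source_quality",
        "passed": len(accessible) >= min_sources,
        "detail": f"{len(accessible)} accessible sources found",
    }
{{/code}}
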
479 === 6.3 Gate 2: Contradiction Search (Basic) ===
480
481 **Full System Requirements:**
482
483 * Counter-evidence actively searched
484 * Reservations and limitations identified
485 * Alternative interpretations explored
486 * Bubble detection (echo chambers, conspiracy theories)
487 * Cross-cultural and international perspectives
488 * Academic literature (supporting AND opposing)
489
490 **POC Implementation:**
491
492 * ✅ Basic search for counter-evidence
493 * ✅ Identify obvious contradictions
494 * ❌ No comprehensive academic search
495 * ❌ No bubble detection
496 * ❌ No systematic alternative interpretation search
497 * ❌ No international perspective verification
498
499 **Pass Criteria:** Basic contradiction search attempted
500
501 **Failure Handling:** Note "limited contradiction search" in output
502
503 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
504
505 **Full System Requirements:**
506
507 * Confidence scores calculated for all claims/verdicts
508 * Limitations explicitly stated
509 * Data gaps identified and disclosed
510 * Strength of evidence assessed
511 * Alternative scenarios considered
512
513 **POC Implementation:**
514
515 * ✅ Confidence scores (0-100%)
516 * ✅ Basic uncertainty acknowledgment
517 * ❌ No detailed limitation disclosure
518 * ❌ No data gap identification
519 * ❌ No alternative scenario consideration (deferred to POC2)
520
521 **Pass Criteria:** Confidence score assigned
522
523 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
524
525 === 6.5 Gate 4: Structural Integrity (Basic) ===
526
527 **Full System Requirements:**
528
529 * No hallucinations detected (fact-checking against sources)
530 * Logic chain valid and traceable
531 * References accessible and verifiable
532 * No circular reasoning
533 * Premises clearly stated
534
535 **POC Implementation:**
536
537 * ✅ Basic coherence check
538 * ✅ References accessible
539 * ❌ No comprehensive hallucination detection
540 * ❌ No formal logic validation
541 * ❌ No premise extraction and verification
542
543 **Pass Criteria:** Output is coherent and references are accessible
544
545 **Failure Handling:** Display error message
546
547 === 6.6 Quality Gate Display ===
548
549 **POC shows simplified status:**
550 {{code}}Quality Gates: 4/4 Passed (Simplified)
551 ✓ Source Quality: 3 sources found
552 ✓ Contradiction Search: Basic search completed
553 ✓ Uncertainty: Confidence scores assigned
554 ✓ Structural Integrity: Output coherent{{/code}}
555
556 **If any gate fails:**
557 {{code}}Quality Gates: 3/4 Passed (Simplified)
558 ✓ Source Quality: 3 sources found
559 ✗ Contradiction Search: Search failed - limited evidence
560 ✓ Uncertainty: Confidence scores assigned
561 ✓ Structural Integrity: Output coherent
562
563 Note: This analysis has limited evidence. Use with caution.{{/code}}
564
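A non-normative sketch of rendering this display from a list of gate results (the dict shape from the Gate 1 sketch in Section 6.2 is assumed):

{{code language="python"}}
# Sketch: build the user-facing gate status block from gate results.
def render_gates(results: list[dict]) -> str:
    passed = sum(r["passed"] for r in results)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r["passed"] else "✗"
        lines.append(f"{mark} {r['gate'].replace('_', ' ').title()}: {r['detail']}")
    if passed < len(results):
        lines.append("\nNote: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
{{/code}}
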
565 === 6.7 Simplified vs. Full System ===
566
567 |=Gate|=POC (Simplified)|=Full System
568 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
569 |Contradiction|Basic search|Systematic academic + media + international
570 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
571 |Structural|Coherence check|Hallucination detection, logic validation, premise check
572
573 **POC Goal:** Demonstrate that quality gates are feasible, not to implement them perfectly.
574
575 == 7. AKEL Architecture Comparison ==
576
577 === 7.1 POC AKEL (Simplified) ===
578
579 **Implementation:**
580
581 * Single Claude API call (Sonnet 4.5)
582 * One comprehensive prompt
583 * All processing in single request
584 * No separate components
585 * No orchestration layer
586
587 **Prompt Structure:**
588 {{code}}Task: Analyze this article and provide:
589
590 1. Extract 3-5 factual claims
591 2. For each claim:
592 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
593 - Assign confidence score (0-100%)
594 - Assign risk tier (A/B/C)
595 - Write brief reasoning (1-3 sentences)
596 3. Generate analysis summary (4-6 sentences)
597 4. Generate article summary (3-5 sentences)
598 5. Run basic quality checks
599
600 Return as structured JSON.{{/code}}
601
602 **Processing Time:** 10-18 seconds (estimate)
603
604 === 7.2 Full System AKEL (Production) ===
605
606 **Architecture:**
607 {{code}}AKEL Orchestrator
608 ├── Claim Extractor
609 ├── Claim Classifier (with risk tier assignment)
610 ├── Scenario Generator
611 ├── Evidence Summarizer
612 ├── Contradiction Detector
613 ├── Quality Gate Validator
614 ├── Audit Sampling Scheduler
615 └── Federation Sync Adapter (Release 1.0+){{/code}}
616
617 **Processing:**
618
619 * Parallel processing where possible
620 * Separate component calls
621 * Quality gates between phases
622 * Audit sampling selection
623 * Cross-node coordination (federated mode)
624
625 **Processing Time:** 10-30 seconds (full pipeline)
626
627 === 7.3 Why POC Uses Single Call ===
628
629 **Advantages:**
630
631 * ✅ Simpler to implement
632 * ✅ Faster POC development
633 * ✅ Easier to debug
634 * ✅ Proves AI capability
635 * ✅ Good enough for concept validation
636
637 **Limitations:**
638
639 * ❌ No component reusability
640 * ❌ No parallel processing
641 * ❌ All-or-nothing (can't partially succeed)
642 * ❌ Harder to improve individual components
643 * ❌ No audit sampling
644
645 **Acceptable Trade-off:**
646
647 POC tests "Can AI do this?" not "How should we architect it?"
648
649 Full component architecture comes in Beta after POC validates concept.
650
651 === 7.4 Evolution Path ===
652
653 **POC1:** Single prompt → Prove concept
654 **POC2:** Add scenario component → Test full pipeline
655 **Beta:** Multi-component AKEL → Production architecture
656 **Release 1.0:** Full AKEL + Federation → Scale
657
658 == 8. Functional Requirements ==
659
660 === FR-POC-1: Article Input ===
661
662 **Requirement:** User can submit article for analysis
663
664 **Functionality:**
665
666 * Text input field (paste article text, up to 5000 characters)
667 * URL input field (paste article URL)
668 * "Analyze" button to trigger processing
669 * Loading indicator during analysis
670
671 **Excluded:**
672
673 * No user authentication
674 * No claim history
675 * No search functionality
676 * No saved templates
677
678 **Acceptance Criteria:**
679
680 * User can paste text from article
681 * User can paste URL of article
682 * System accepts input and triggers analysis
683
684 === FR-POC-2: Claim Extraction (Fully Automated) ===
685
686 **Requirement:** AI automatically extracts 3-5 factual claims
687
688 **Functionality:**
689
690 * AI reads article text
691 * AI identifies factual claims (not opinions/questions)
692 * AI extracts 3-5 most important claims
693 * System displays numbered list
694
695 **Critical:** NO MANUAL EDITING ALLOWED
696
697 * AI selects which claims to extract
698 * AI identifies factual vs. non-factual
699 * System processes claims as extracted
700 * No human curation or correction
701
702 **Error Handling:**
703
704 * If extraction fails: Display error message
705 * User can retry with different input
706 * No manual intervention to fix extraction
707
708 **Acceptance Criteria:**
709
710 * AI extracts 3-5 claims automatically
711 * Claims are factual (not opinions)
712 * Claims are clearly stated
713 * No manual editing required
714
715 === FR-POC-3: Verdict Generation (Fully Automated) ===
716
717 **Requirement:** AI automatically generates verdict for each claim
718
719 **Functionality:**
720
721 * For each claim, AI:
722 * Evaluates claim based on available evidence/knowledge
723 * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
724 * Assigns confidence score (0-100%)
725 * Assigns risk tier (A/B/C)
726 * Writes brief reasoning (1-3 sentences)
727 * System displays verdict for each claim
728
729 **Critical:** NO MANUAL EDITING ALLOWED
730
731 * AI computes verdicts based on evidence
732 * AI generates confidence scores
733 * AI writes reasoning
734 * No human review or adjustment
735
736 **Error Handling:**
737
738 * If verdict generation fails: Display error message
739 * User can retry
740 * No manual intervention to adjust verdicts
741
742 **Acceptance Criteria:**
743
744 * Each claim has a verdict
745 * Confidence score is displayed (0-100%)
746 * Risk tier is displayed (A/B/C)
747 * Reasoning is understandable (1-3 sentences)
748 * Verdict is defensible given reasoning
749 * All generated automatically by AI
750
751 === FR-POC-4: Analysis Summary (Fully Automated) ===
752
753 **Requirement:** AI generates brief summary of analysis
754
755 **Functionality:**
756
757 * AI summarizes findings in 4-6 sentences:
758 * How many claims found
759 * Distribution of verdicts
760 * Overall assessment
761 * System displays at top of results
762
763 **Critical:** NO MANUAL EDITING ALLOWED
764
765 **Acceptance Criteria:**
766
767 * Summary is coherent
768 * Accurately reflects analysis
769 * 4-6 sentences
770 * Automatically generated
771
772 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
773
774 **Requirement:** AI generates brief summary of original article
775
776 **Functionality:**
777
778 * AI summarizes article content (not FactHarbor's analysis)
779 * 3-5 sentences
780 * System displays
781
782 **Note:** Optional - can be skipped if time is limited
783
784 **Critical:** NO MANUAL EDITING ALLOWED
785
786 **Acceptance Criteria:**
787
788 * Summary is neutral (article's position)
789 * Accurately reflects article content
790 * 3-5 sentences
791 * Automatically generated
792
793 === FR-POC-6: Publication Mode Display ===
794
795 **Requirement:** Clear labeling of AI-generated content
796
797 **Functionality:**
798
799 * Display Mode 2 publication label
800 * Show POC/Demo disclaimer
801 * Display risk tiers per claim
802 * Show quality gate status
803 * Display timestamp
804
805 **Acceptance Criteria:**
806
807 * Label is prominent and clear
808 * User understands this is AI-generated POC output
809 * Risk tiers are color-coded
810 * Quality gate status is visible
811
812 === FR-POC-7: Quality Gate Execution ===
813
814 **Requirement:** Execute simplified quality gates
815
816 **Functionality:**
817
818 * Check source quality (basic)
819 * Attempt contradiction search (basic)
820 * Calculate confidence scores
821 * Verify structural integrity (basic)
822 * Display gate results
823
824 **Acceptance Criteria:**
825
826 * All 4 gates attempted
827 * Pass/fail status displayed
828 * Failures explained to user
829 * Gates don't block publication (POC mode)
830
831 == 9. Non-Functional Requirements ==
832
833 === NFR-POC-1: Fully Automated Processing ===
834
835 **Requirement:** Complete AI automation with zero manual intervention
836
837 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
838
839 **What this means:**
840
841 * Claims: AI selects (no human curation)
842 * Scenarios: N/A (deferred to POC2)
843 * Evidence: AI evaluates (no human selection)
844 * Verdicts: AI determines (no human adjustment)
845 * Summaries: AI writes (no human editing)
846
847 **Pipeline:**
848 {{code}}User Input → AKEL Processing → Output Display
849
850 ZERO human editing{{/code}}
851
852 **If AI output is poor:**
853
854 * ❌ Do NOT manually fix it
855 * ✅ Document the failure
856 * ✅ Improve prompts and retry
857 * ✅ Accept that POC might fail
858
859 **Why this matters:**
860
861 * Tests whether AI can do this without humans
862 * Validates scalability (humans can't review every analysis)
863 * Honest test of technical feasibility
864
865 === NFR-POC-2: Performance ===
866
867 **Requirement:** Analysis completes in reasonable time
868
869 **Acceptable Performance:**
870
871 * Processing time: 1-5 minutes (acceptable for POC)
872 * Display loading indicator to user
873 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
874
875 **Not Required:**
876
877 * Production-level speed (< 30 seconds)
878 * Optimization for scale
879 * Caching
880
881 **Acceptance Criteria:**
882
883 * Analysis completes within 5 minutes
884 * User sees loading indicator
885 * No timeout errors
886
887 === NFR-POC-3: Reliability ===
888
889 **Requirement:** System works for manual testing sessions
890
891 **Acceptable:**
892
893 * Occasional errors (< 20% failure rate)
894 * Manual restart if needed
895 * Display error messages clearly
896
897 **Not Required:**
898
899 * 99.9% uptime
900 * Automatic error recovery
901 * Production monitoring
902
903 **Acceptance Criteria:**
904
905 * System works for test demonstrations
906 * Errors are handled gracefully
907 * User receives clear error messages
908
909 === NFR-POC-4: Environment ===
910
911 **Requirement:** Runs on simple infrastructure
912
913 **Acceptable:**
914
915 * Single machine or simple cloud setup
916 * No distributed architecture
917 * No load balancing
918 * No redundancy
919 * Local development environment viable
920
921 **Not Required:**
922
923 * Production infrastructure
924 * Multi-region deployment
925 * Auto-scaling
926 * Disaster recovery
927
928 === NFR-POC-5: Cost Efficiency Tracking ===
929
930 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
931
932 **Must Track:**
933
934 * Input tokens (article + prompt)
935 * Output tokens (generated analysis)
936 * Total tokens
937 * Estimated cost (USD)
938 * Response time (seconds)
939 * Article length (words/characters)
940
941 **Must Display:**
942
943 * Usage statistics in UI (Component 5)
944 * Cost per analysis
945 * Cost per claim extracted
946
947 **Must Log:** (a sketch follows this list)
948
949 * Aggregate metrics for analysis
950 * Cost distribution by article length
951 * Token efficiency trends
952
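A minimal logging sketch, assuming the metrics dict is the one assembled for Component 5; one JSON line per analysis keeps aggregate analysis trivial:

{{code language="python"}}
# Sketch: append one JSON record per analysis so cost-vs-length curves
# and token-efficiency trends can be computed later (e.g., with pandas).
import json
import datetime

def log_usage(metrics: dict, path: str = "usage_log.jsonl") -> None:
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        **metrics,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, default=str) + "\n")
{{/code}}
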
953 **Purpose:**
954
955 * Understand unit economics
956 * Identify optimization opportunities
957 * Project costs at scale
958 * Inform architecture decisions (caching, model selection, etc.)
959
960 **Acceptance Criteria:**
961
962 * ✅ Usage data displayed after each analysis
963 * ✅ Metrics logged for aggregate analysis
964 * ✅ Cost calculated accurately (Claude API pricing)
965 * ✅ Test cases include varying article lengths
966 * ✅ POC1 report includes cost analysis section
967
968 **Success Target:**
969
970 * Average cost per analysis < $0.05 USD
971 * Cost scaling behavior understood (linear/exponential)
972 * 2+ optimization opportunities identified
973
974 **Critical:** Unit economics must be viable for scaling decision!
975
976 == 10. Technical Architecture ==
977
978 === 10.1 System Components ===
979
980 **Frontend:**
981
982 * Simple HTML form (text input + URL input + button)
983 * Loading indicator
984 * Results display page (single page, no tabs/navigation)
985
986 **Backend:**
987
988 * Single API endpoint
989 * Calls Claude API (Sonnet 4.5 or latest)
990 * Parses response
991 * Returns JSON to frontend
992
993 **Data Storage:**
994
995 * None required (stateless POC)
996 * Optional: Simple file storage or SQLite for demo examples
997
998 **External Services:**
999
1000 * Claude API (Anthropic) - required
1001 * Optional: URL fetch service for article text extraction
1002
1003 === 10.2 Processing Flow ===
1004
1005 {{code}}
1006 1. User submits text or URL
1007
1008 2. Backend receives request
1009
1010 3. If URL: Fetch article text
1011
1012 4. Call Claude API with single prompt:
1013 "Extract claims, evaluate each, provide verdicts"
1014
1015 5. Claude API returns:
1016 - Analysis summary
1017 - Claims list
1018 - Verdicts for each claim (with risk tiers)
1019 - Article summary (optional)
1020 - Quality gate results
1021
1022 6. Backend parses response
1023
1024 7. Frontend displays results with Mode 2 labeling
1025 {{/code}}
1026
1027 **Key Simplification:** Single API call does entire analysis
1028
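A sketch of the single-endpoint backend in the suggested FastAPI flavor; the endpoint name and JSON contract are assumptions, and PROMPT stands in for the comprehensive prompt from Section 10.3:

{{code language="python"}}
# Sketch: one endpoint, one Claude call, parse-and-return.
import json
import anthropic
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from environment
PROMPT = "Analyze this article ..."  # the full prompt from Section 10.3

class AnalyzeRequest(BaseModel):
    text: str  # article text (URL fetching omitted in this sketch)

@app.post("/analyze")
def analyze(req: AnalyzeRequest) -> dict:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{req.text}"}],
    )
    try:
        return json.loads(message.content[0].text)  # structured JSON expected
    except (ValueError, IndexError):
        raise HTTPException(status_code=502, detail="AI returned unparseable output")
{{/code}}
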
1029 === 10.3 AI Prompt Strategy ===
1030
1031 **Single Comprehensive Prompt:**
1032 {{code}}Task: Analyze this article and provide:
1033
1034 1. Identify the article's main thesis/conclusion
1035 - What is the article trying to argue or prove?
1036 - What is the primary claim or conclusion?
1037
1038 2. Extract 3-5 factual claims from the article
1039 - Note which claims are CENTRAL to the main thesis
1040 - Note which claims are SUPPORTING facts
1041
1042 3. For each claim:
1043 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1044 - Assign confidence score (0-100%)
1045 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1046 - Write brief reasoning (1-3 sentences)
1047
1048 4. Assess relationship between claims and main thesis:
1049 - Do the claims actually support the article's conclusion?
1050 - Are there logical leaps or unsupported inferences?
1051 - Is the article's framing misleading even if individual facts are accurate?
1052
1053 5. Run quality gates:
1054 - Check: ≥2 sources found
1055 - Attempt: Basic contradiction search
1056 - Calculate: Confidence scores
1057 - Verify: Structural integrity
1058
1059 6. Write context-aware analysis summary (4-6 sentences):
1060 - State article's main thesis
1061 - Report claims found and verdict distribution
1062 - Note if central claims are problematic
1063 - Assess whether evidence supports conclusion
1064 - Overall credibility considering claim importance
1065
1066 7. Write article summary (3-5 sentences: neutral summary of article content)
1067
1068 Return as structured JSON with quality gate results.{{/code}}
1069
1070 **One prompt generates everything.**
1071
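This page does not fix the JSON contract, so the field names below are assumptions; a basic validation pass over the assumed shape might look like this:

{{code language="python"}}
# Sketch: validate the assumed response shape before display. The real
# contract should be pinned down in the POC1 API & Schemas spec.
VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}

def validate_analysis(data: dict) -> list[str]:
    errors = []
    if not isinstance(data.get("analysis_summary"), str):
        errors.append("missing analysis_summary")
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        errors.append(f"expected 3-5 claims, got {len(claims)}")
    for i, claim in enumerate(claims, start=1):
        if claim.get("verdict") not in VERDICTS:
            errors.append(f"claim {i}: invalid verdict")
        confidence = claim.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
            errors.append(f"claim {i}: confidence outside 0-100")
        if claim.get("risk_tier") not in RISK_TIERS:
            errors.append(f"claim {i}: invalid risk tier")
    return errors  # empty list means the output is displayable
{{/code}}
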
1072 **Critical Addition:**
1073
1074 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
1075
1076 === 10.4 Technology Stack Suggestions ===
1077
1078 **Frontend:**
1079
1080 * HTML + CSS + JavaScript (minimal framework)
1081 * OR: Next.js (if team prefers)
1082 * Hosted: Local machine OR Vercel/Netlify free tier
1083
1084 **Backend:**
1085
1086 * Python Flask/FastAPI (simple REST API)
1087 * OR: Next.js API routes (if using Next.js)
1088 * Hosted: Local machine OR Railway/Render free tier
1089
1090 **AKEL Integration:**
1091
1092 * Claude API via Anthropic SDK
1093 * Model: Claude Sonnet 4.5 or latest available
1094
1095 **Database:**
1096
1097 * None (stateless acceptable)
1098 * OR: SQLite if you want to store demo examples
1099 * OR: JSON files on disk
1100
1101 **Deployment:**
1102
1103 * Local development environment sufficient for POC
1104 * Optional: Deploy to cloud for remote demos
1105
1106 == 11. Success Criteria ==
1107
1108 === 11.1 Minimum Success (POC Passes) ===
1109
1110 **Required for GO decision:**
1111
1112 * ✅ AI extracts 3-5 factual claims automatically
1113 * ✅ AI provides verdict for each claim automatically
1114 * ✅ Verdicts are reasonable (≥70% make logical sense)
1115 * ✅ Analysis summary is coherent
1116 * ✅ Output is comprehensible to reviewers
1117 * ✅ Team/advisors understand the output
1118 * ✅ Team agrees approach has merit
1119 * ✅ **Minimal or no manual editing needed** (< 30% of analyses would require human correction to be usable)
1120 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1121 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1122 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1123
1124 **Quality Definition:**
1125
1126 * "Reasonable verdict" = Defensible given general knowledge
1127 * "Coherent summary" = Logically structured, grammatically correct
1128 * "Comprehensible" = Reviewers understand what analysis means
1129
1130 === 11.2 POC Fails If ===
1131
1132 **Automatic NO-GO if any of these:**
1133
1134 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1135 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1136 * ❌ Output incomprehensible (reviewers can't understand analysis)
1137 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1138 * ❌ Team loses confidence in AI-automated approach
1139
1140 === 11.3 Quality Thresholds ===
1141
1142 **POC quality expectations:**
1143
1144 |=Component|=Quality Threshold|=Definition
1145 |Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1146 |Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1147 |Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1148 |Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1149
1150 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1151
1152 **Not expecting:**
1153
1154 * 100% accuracy
1155 * Perfect claim coverage
1156 * Comprehensive evidence gathering
1157 * Flawless verdicts
1158 * Production polish
1159
1160 **Expecting:**
1161
1162 * Reasonable claim extraction
1163 * Defensible verdicts
1164 * Understandable reasoning
1165 * Useful output
1166
1167 == 12. Test Cases ==
1168
1169 === 12.1 Test Case 1: Simple Factual Claim ===
1170
1171 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1172
1173 **Expected Output:**
1174
1175 * Extract claim correctly
1176 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1177 * Confidence: 70-90%
1178 * Risk tier: C (Low)
1179 * Reasoning: Mentions studies or evidence
1180
1181 **Success:** Verdict is reasonable and reasoning makes sense
1182
1183 === 12.2 Test Case 2: Complex News Article ===
1184
1185 **Input:** News article URL with multiple claims about politics/health/science
1186
1187 **Expected Output:**
1188
1189 * Extract 3-5 key claims
1190 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1191 * Coherent analysis summary
1192 * Article summary
1193 * Risk tiers assigned appropriately
1194
1195 **Success:** Claims identified are actually from article, verdicts are reasonable
1196
1197 === 12.3 Test Case 3: Controversial Topic ===
1198
1199 **Input:** Article on contested political or scientific topic
1200
1201 **Expected Output:**
1202
1203 * Balanced analysis
1204 * Acknowledges uncertainty where appropriate
1205 * Doesn't overstate confidence
1206 * Reasoning shows awareness of complexity
1207
1208 **Success:** Analysis is fair and doesn't show obvious bias
1209
1210 === 12.4 Test Case 4: Clearly False Claim ===
1211
1212 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1213
1214 **Expected Output:**
1215
1216 * Extract claim
1217 * Verdict: REFUTED
1218 * High confidence (> 90%)
1219 * Risk tier: C (Low - established fact)
1220 * Clear reasoning
1221
1222 **Success:** AI correctly identifies false claim with high confidence
1223
1224 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1225
1226 **Input:** Article with claim where evidence is genuinely mixed
1227
1228 **Expected Output:**
1229
1230 * Extract claim
1231 * Verdict: UNCERTAIN
1232 * Moderate confidence (40-60%)
1233 * Reasoning explains why uncertain
1234
1235 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1236
1237 === 12.6 Test Case 6: High-Risk Medical Claim ===
1238
1239 **Input:** Article making medical claims
1240
1241 **Expected Output:**
1242
1243 * Extract claim
1244 * Verdict: [appropriate based on evidence]
1245 * Risk tier: A (High - medical)
1246 * Red label displayed
1247 * Clear disclaimer about not being medical advice
1248
1249 **Success:** Risk tier correctly assigned, appropriate warnings shown
1250
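The cases above lend themselves to a small automated harness. The following sketch assumes the /analyze endpoint and field names from Section 10 and shows only the first two cases:

{{code language="python"}}
# Sketch: run each test case against the POC endpoint and apply
# per-case expectations. Endpoint URL and field names are assumptions.
import requests

CASES = [
    # (input text, acceptable verdicts, expected risk tier)
    ("Coffee reduces the risk of type 2 diabetes by 30%",
     {"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}, "C"),
    ("The Earth is flat", {"REFUTED"}, "C"),
]

def run_cases(base_url: str = "http://localhost:8000") -> None:
    for text, verdicts, tier in CASES:
        data = requests.post(f"{base_url}/analyze",
                             json={"text": text}, timeout=300).json()
        claim = data["claims"][0]  # single-claim inputs: first claim
        assert claim["verdict"] in verdicts, claim
        assert claim["risk_tier"] == tier, claim
        print(f"OK: {text[:40]} -> {claim['verdict']}")
{{/code}}
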
1251 == 13. POC Decision Gate ==
1252
1253 === 13.1 Decision Framework ===
1254
1255 After POC testing complete, team makes one of three decisions:
1256
1257 **Option A: GO (Proceed to POC2)**
1258
1259 **Conditions:**
1260
1261 * AI quality ≥70% without manual editing
1262 * Basic claim → verdict pipeline validated
1263 * Internal + advisor feedback positive
1264 * Technical feasibility confirmed
1265 * Team confident in direction
1266 * Clear path to improving AI quality to ≥90%
1267
1268 **Next Steps:**
1269
1270 * Plan POC2 development (add scenarios)
1271 * Design scenario architecture
1272 * Expand to Evidence Model structure
1273 * Test with more complex articles
1274
1275 **Option B: NO-GO (Pivot or Stop)**
1276
1277 **Conditions:**
1278
1279 * AI quality < 60%
1280 * Requires manual editing for most analyses (> 50%)
1281 * Feedback indicates fundamental flaws
1282 * Cost/effort not justified by value
1283 * No clear path to improvement
1284
1285 **Next Steps:**
1286
1287 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1288 * **Stop:** Conclude approach not viable, revisit later
1289
1290 **Option C: ITERATE (Improve POC)**
1291
1292 **Conditions:**
1293
1294 * Concept has merit but execution needs work
1295 * Specific improvements identified
1296 * Addressable with better prompts/approach
1297 * AI quality between 60-70%
1298
1299 **Next Steps:**
1300
1301 * Improve AI prompts
1302 * Test different approaches
1303 * Re-run POC with improvements
1304 * Then make GO/NO-GO decision
1305
1306 === 13.2 Decision Criteria Summary ===
1307
1308 {{code}}
1309 AI Quality < 60% → NO-GO (approach doesn't work)
1310 AI Quality 60-70% → ITERATE (improve and retry)
1311 AI Quality ≥70% → GO (proceed to POC2)
1312 {{/code}}
1313
1314 == 14. Key Risks & Mitigations ==
1315
1316 === 14.1 Risk: AI Quality Not Good Enough ===
1317
1318 **Likelihood:** Medium-High
1319 **Impact:** POC fails
1320
1321 **Mitigation:**
1322
1323 * Extensive prompt engineering and testing
1324 * Use best available AI models (Sonnet 4.5)
1325 * Test with diverse article types
1326 * Iterate on prompts based on results
1327
1328 **Acceptance:** This is what POC tests - be ready for failure
1329
1330 === 14.2 Risk: AI Consistency Issues ===
1331
1332 **Likelihood:** Medium
1333 **Impact:** Works sometimes, fails other times
1334
1335 **Mitigation:**
1336
1337 * Test with 10+ diverse articles
1338 * Measure success rate honestly
1339 * Improve prompts to increase consistency
1340
1341 **Acceptance:** Some variability OK if average quality ≥70%
1342
1343 === 14.3 Risk: Output Incomprehensible ===
1344
1345 **Likelihood:** Low-Medium
1346 **Impact:** Users can't understand analysis
1347
1348 **Mitigation:**
1349
1350 * Create clear explainer document
1351 * Iterate on output format
1352 * Test with non-technical reviewers
1353 * Simplify language if needed
1354
1355 **Acceptance:** Iterate until comprehensible
1356
1357 === 14.4 Risk: API Rate Limits / Costs ===
1358
1359 **Likelihood:** Low
1360 **Impact:** System slow or expensive
1361
1362 **Mitigation:**
1363
1364 * Monitor API usage
1365 * Implement retry logic
1366 * Estimate costs before scaling
1367
1368 **Acceptance:** POC can be slow and expensive (optimization later)
1369
1370 === 14.5 Risk: Scope Creep ===
1371
1372 **Likelihood:** Medium
1373 **Impact:** POC becomes too complex
1374
1375 **Mitigation:**
1376
1377 * Strict scope discipline
1378 * Say NO to feature additions
1379 * Keep focus on core question
1380
1381 **Acceptance:** POC is minimal by design
1382
1383 == 15. POC Philosophy ==
1384
1385 === 15.1 Core Principles ===
1386
1387 **1. Build Less, Learn More**
1388
1390 * Minimum features to test hypothesis
1391 * Don't build unvalidated features
1392 * Focus on core question only
1393
1394 **2. Fail Fast**
1395
1396 * Quick test of hardest part (AI capability)
1397 * Accept that POC might fail
1398 * Better to discover issues early
1399 * Honest assessment over optimistic hope
1400
1401 **3. Test First, Build Second**
1402
1403 * Validate AI can do this before building platform
1404 * Don't assume it will work
1405 * Let results guide decisions
1406
1407 **4. Automation First**
1408
1409 * No manual editing allowed
1410 * Tests scalability, not just feasibility
1411 * Proves approach can work at scale
1412
1413 **5. Honest Assessment**
1414
1415 * Don't cherry-pick examples
1416 * Don't manually fix bad outputs
1417 * Document failures openly
1418 * Make data-driven decisions
1419
1420 === 15.2 What POC Is ===
1421
1422 ✅ Testing AI capability without humans
1423 ✅ Proving core technical concept
1424 ✅ Fast validation of approach
1425 ✅ Honest assessment of feasibility
1426
1427 === 15.3 What POC Is NOT ===
1428
1429 ❌ Building a product
1430 ❌ Production-ready system
1431 ❌ Feature-complete platform
1432 ❌ Perfectly accurate analysis
1433 ❌ Polished user experience
1434
1435 == 16. Success: Clear Path Forward ==
1438
1439 **If POC succeeds (≥70% AI quality):**
1440
1441 * ✅ Approach validated
1442 * ✅ Proceed to POC2 (add scenarios)
1443 * ✅ Design full Evidence Model structure
1444 * ✅ Test multi-scenario comparison
1445 * ✅ Focus on improving AI quality from 70% → 90%
1446
1447 **If POC fails (< 60% AI quality):**
1448
1449 * ✅ Learn what doesn't work
1450 * ✅ Pivot to different approach
1451 * ✅ OR wait for better AI technology
1452 * ✅ Avoid wasting resources on non-viable approach
1453
1454 **Either way, POC provides clarity.**
1455
1456 == 17. Related Pages ==
1457
1458 * [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
1459 * [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1460 * [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1461 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1462 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1463 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1464
1465 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1466
1467
1468 === NFR-POC-11: LLM Provider Abstraction (POC1) ===
1469
1470 **Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1471
1472 **POC1 Implementation:**
1473
1474 * **Primary Provider:** Anthropic Claude API
1475 * Stage 1: Claude Haiku 4
1476 * Stage 2: Claude Sonnet 3.5 (cached)
1477 * Stage 3: Claude Sonnet 3.5
1478
1479 * **Provider Interface:** Abstract LLMProvider interface implemented (see the sketch below)
1480
1481 * **Configuration:** Environment variables for provider selection
1482 * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1483 * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1484 * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1485
1486 * **Failover:** Basic error handling with cache fallback for Stage 2
1487
1488 * **Cost Tracking:** Log provider name and cost per request
1489
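A minimal sketch of the abstraction in a Python backend; class and method names are illustrative, and the normative interface lives in the API & Schemas specification (Section 6):

{{code language="python"}}
# Sketch: all LLM calls go through LLMProvider; the concrete provider
# is chosen from environment variables, so switching providers needs
# no code changes. Names here are illustrative, not normative.
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResult:
    text: str
    input_tokens: int
    output_tokens: int
    provider: str  # logged for per-provider cost tracking

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, model: str) -> LLMResult: ...

class AnthropicProvider(LLMProvider):
    def complete(self, prompt: str, model: str) -> LLMResult:
        import anthropic
        msg = anthropic.Anthropic().messages.create(
            model=model, max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return LLMResult(msg.content[0].text, msg.usage.input_tokens,
                         msg.usage.output_tokens, provider="anthropic")

def provider_from_env() -> LLMProvider:
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if name == "anthropic":
        return AnthropicProvider()
    raise ValueError(f"unsupported provider: {name}")  # OpenAI planned for POC2
{{/code}}
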
1490 **Future (POC2/Beta):**
1491
1492 * Secondary provider (OpenAI) with automatic failover
1493 * Admin API for runtime provider switching
1494 * Cost comparison dashboard
1495 * Cross-provider output verification
1496
1497 **Success Criteria:**
1498
1499 * All LLM calls go through abstraction layer (no direct API calls)
1500 * Provider can be changed via environment variable without code changes
1501 * Cost tracking includes provider name in logs
1502 * Stage 2 falls back to cache on provider failure
1503
1504 **Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.104.Specification.POC.API-and-Schemas.WebHome]] Section 6
1505
1506 **Dependencies:**
1507
1508 * NFR-14 (Main Requirements)
1509 * Design Decision 9
1510 * Architecture Section 2.2
1511
1512 **Priority:** HIGH (P1)
1513
1514 **Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.