Wiki source code of POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/24 21:27

1 = POC Requirements =
2
3
4 {{info}}
5 **POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
6
7 See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 {{/info}}
9
10
11
12 **Status:** ✅ Approved for Development
13 **Version:** 2.0 (Updated after Specification Cross-Check)
14 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15
16 == 1. POC Overview ==
17
18 === 1.1 What POC Tests ===
19
20 **Core Question:**
21 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
22
23 **What we're proving:**
24 * AI can identify factual claims from text
25 * AI can evaluate those claims and produce verdicts
26 * Output is comprehensible and useful
27 * Fully automated approach is viable
28
29 **What we're NOT testing:**
30 * Scenario generation (deferred to POC2)
31 * Evidence display (deferred to POC2)
32 * Production scalability
33 * Perfect accuracy
34 * Complete feature set
35
36 === 1.2 Scenarios Deferred to POC2 ===
37
38 **Intentional Simplification:**
39
40 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
41
42 **Rationale:**
43 * **POC1 tests:** Can AI extract claims and generate verdicts?
44 * **POC2 will add:** Scenario generation and management
45 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
46
47 **Design Decision:**
48
49 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
50
51 **No Risk:**
52
53 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
54 * Faster POC1 validation
55 * Learning from POC1 to inform scenario design
56 * Iterative approach: fail fast if basic AI doesn't work
57 * Flexibility to adjust scenario architecture based on POC1 insights
58
59 **Full System Workflow (Future):**
60 {{code}}
61 Claims → Scenarios → Evidence → Verdicts
62 {{/code}}
63
64 **POC1 Simplified Workflow:**
65 {{code}}
66 Claims → Verdicts (scenarios implicit in reasoning)
67 {{/code}}
68
69 == 2. POC Output Specification ==
70
71 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
72
73 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
74
75 **Length:** 4-6 sentences
76
77 **Content (Required Elements):**
78 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
79 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
80 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
81 4. **Relationship assessment** - Do the claims support the article's conclusion?
82 5. **Overall credibility** - Final assessment considering claim importance
83
84 **Critical Innovation:**
85
86 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
87 * State accurate supporting facts but draw unsupported conclusions
88 * Have one false central claim that invalidates the whole argument
89 * Misframe accurate information to mislead
90
91 **Good Example (Context-Aware):**
92 {{code}}
93 This article argues that coffee cures cancer based on its antioxidant
94 content. We analyzed 3 factual claims: 2 about coffee's chemical
95 properties are well-supported, but the main causal claim is refuted
96 by current evidence. The article confuses correlation with causation.
97 Overall assessment: MISLEADING - makes an unsupported medical claim
98 despite citing some accurate facts.
99 {{/code}}
100
101 **Poor Example (Simple Aggregation - Don't Do This):**
102 {{code}}
103 This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 Overall assessment: mostly accurate (67% accurate).
105 {{/code}}
106 ↑ This misses that the refuted claim IS the article's main point!
107
108 **What POC1 Tests:**
109
110 Can AI identify and assess:
111 * ✅ The article's main thesis/conclusion?
112 * ✅ Which claims are central vs. supporting?
113 * ✅ Whether the evidence supports the conclusion?
114 * ✅ Overall credibility considering logical structure?
115
116 **If AI Cannot Do This:**
117
118 That's valuable to learn in POC1! We'll:
119 * Note it as a limitation
120 * Fall back to simple aggregation with warning
121 * Design explicit article-level analysis for POC2
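
The fallback listed above (simple aggregation with a warning) could look roughly like the sketch below. It is only an illustration: the function name and the warning wording are assumptions, not part of the specification. The verdict labels are the four defined in section 2.3.

```python
from collections import Counter

# Verdict labels as defined in section 2.3.
VERDICTS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")


def simple_aggregation_summary(verdicts: list[str]) -> str:
    """Fallback summary: count verdicts and attach a warning.

    The function name and warning text are hypothetical.
    """
    unknown = [v for v in verdicts if v not in VERDICTS]
    if unknown:
        raise ValueError(f"unknown verdict labels: {unknown}")
    counts = Counter(verdicts)
    # Report only the labels that actually occur, in the canonical order.
    parts = [f"{counts[v]} {v}" for v in VERDICTS if counts[v]]
    return (
        f"This article makes {len(verdicts)} claims: {', '.join(parts)}. "
        "WARNING: simple aggregation only; claim importance and the "
        "article's main thesis were not assessed."
    )
```

The warning matters because this output has exactly the weakness of the poor example: it cannot tell that a refuted claim is the article's main point.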
122
123 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
124
125 **What:** List of factual claims extracted from article
126 **Format:** Numbered list
127 **Quantity:** 3-5 claims
128 **Requirements:**
129 * Factual claims only (not opinions/questions)
130 * Clearly stated
131 * Automatically extracted by AI
132
133 **Example:**
134 {{code}}
135 CLAIMS IDENTIFIED:
136
137 [1] Coffee reduces diabetes risk by 30%
138 [2] Coffee improves heart health
139 [3] Decaf has same benefits as regular
140 [4] Coffee prevents Alzheimer's completely
141 {{/code}}
142
143 === 2.3 Component 3: CLAIMS VERDICTS ===
144
145 **What:** Verdict for each claim identified
146 **Format:** Per-claim structure
147
148 **Required Elements:**
149 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
150 * **Confidence Score:** 0-100%
151 * **Brief Reasoning:** 1-3 sentences explaining why
152 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
153
154 **Example:**
155 {{code}}
156 VERDICTS:
157
158 [1] WELL-SUPPORTED (85%) [Risk: C]
159 Multiple studies confirm 25-30% risk reduction with regular consumption.
160
161 [2] UNCERTAIN (65%) [Risk: B]
162 Evidence is mixed. Some studies show benefits, others show no effect.
163
164 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
165 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166
167 [4] REFUTED (90%) [Risk: B]
168 No evidence for complete prevention. Claim is significantly overstated.
169 {{/code}}
170
171 **Risk Tier Display:**
172 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 * **Tier C (Green):** Low Risk - Facts/Definitions/History
175
176 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
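
As a minimal sketch, the required verdict elements can be held in one small record that rejects values outside the ranges listed above. The class and field names are assumptions made for illustration. Only the labels, the 0-100% range, the A/B/C tiers and the display format come from this section.

```python
from dataclasses import dataclass

VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A": "High", "B": "Medium", "C": "Low"}


@dataclass(frozen=True)
class ClaimVerdict:
    """One verdict entry; class and field names are hypothetical."""

    number: int
    verdict: str
    confidence: int  # percent, 0-100
    risk_tier: str
    reasoning: str  # 1-3 sentences

    def __post_init__(self) -> None:
        if self.verdict not in VERDICT_LABELS:
            raise ValueError(f"invalid verdict label: {self.verdict!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"invalid risk tier: {self.risk_tier!r}")

    def render(self) -> str:
        # Mirrors the example layout: header line, then the reasoning.
        header = f"[{self.number}] {self.verdict} ({self.confidence}%) [Risk: {self.risk_tier}]"
        return f"{header}\n{self.reasoning}"
```

Validating at construction time means a malformed model response fails loudly instead of being displayed.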
177
178 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
179
180 **What:** Brief summary of original article content
181 **Length:** 3-5 sentences
182 **Tone:** Neutral (article's position, not FactHarbor's analysis)
183
184 **Example:**
185 {{code}}
186 ARTICLE SUMMARY:
187
188 Health News Today article discusses coffee benefits, citing studies
189 on diabetes and Alzheimer's. Author highlights research linking coffee
190 to disease prevention. Recommends 2-3 cups daily for optimal health.
191 {{/code}}
192
193 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
194
195 **What:** LLM usage metrics for cost optimization and scaling decisions
196
197 **Purpose:**
198 * Understand cost per analysis
199 * Identify optimization opportunities
200 * Project costs at scale
201 * Inform architecture decisions
202
203 **Display Format:**
204 {{code}}
205 USAGE STATISTICS:
206 • Article: 2,450 words (12,300 characters)
207 • Input tokens: 15,234
208 • Output tokens: 892
209 • Total tokens: 16,126
210 • Estimated cost: $0.24 USD
211 • Response time: 8.3 seconds
212 • Cost per claim: $0.048
213 • Model: claude-sonnet-4-20250514
214 {{/code}}
215
216 **Why This Matters:**
217
218 At scale, LLM costs are critical:
219 * 10,000 articles/month ≈ $200-500/month
220 * 100,000 articles/month ≈ $2,000-5,000/month
221 * Cost optimization can reduce expenses 30-50%
222
223 **What POC1 Learns:**
224 * How cost scales with article length
225 * Prompt optimization opportunities (caching, compression)
226 * Output verbosity tradeoffs
227 * Model selection strategy (FAST vs. REASONING roles)
228 * Article length limits (if needed)
229
230 **Implementation:**
231 * Claude API already returns usage data
232 * No extra API calls needed
233 * Display to user + log for aggregate analysis
234 * Test with articles of varying lengths
235
236 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
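
The arithmetic behind these statistics is simple enough to sketch. All names below are hypothetical. The per-million-token rates are parameters because this page does not state them, so actual provider pricing has to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts for one analysis (field names are hypothetical)."""

    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def estimate_cost_usd(usage: Usage, usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Cost of one analysis; rates are USD per million tokens (caller-supplied)."""
    return (
        usage.input_tokens * usd_per_mtok_in
        + usage.output_tokens * usd_per_mtok_out
    ) / 1_000_000


def cost_per_claim(cost_usd: float, claims: int) -> float:
    if claims <= 0:
        raise ValueError("claims must be positive")
    return cost_usd / claims


def monthly_cost(articles_per_month: int, cost_per_article_usd: float) -> float:
    """Linear projection; assumes cost per article stays constant at scale."""
    return articles_per_month * cost_per_article_usd
```

With the figures from the sample display, `Usage(15234, 892).total_tokens` gives 16,126 and `cost_per_claim(0.24, 5)` gives 0.048. The scale estimates above correspond to `monthly_cost(10_000, 0.02)` through `monthly_cost(10_000, 0.05)`, that is, $200 to $500.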
237
238 === 2.6 Total Output Size ===
239
240 **Combined:** ~220-340 words (including the optional Article Summary)
241 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
242 * Claims Identification: 30-50 words
243 * Claims Verdicts: 100-150 words
244 * Article Summary: 30-50 words (optional)
245
246 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
247
248 == 3. What's NOT in POC Scope ==
249
250 === 3.1 Feature Exclusions ===
251
252 The following are **explicitly excluded** from POC:
253
254 **Content Features:**
255 * ❌ Scenarios (deferred to POC2)
256 * ❌ Evidence display (supporting/opposing lists)
257 * ❌ Source links (clickable references)
258 * ❌ Detailed reasoning chains
259 * ❌ Source quality ratings (shown but not detailed)
260 * ❌ Contradiction detection (basic only)
261 * ❌ Risk assessment (shown but not workflow-integrated)
262
263 **Platform Features:**
264 * ❌ User accounts / authentication
265 * ❌ Saved history
266 * ❌ Search functionality
267 * ❌ Claim comparison
268 * ❌ User contributions
269 * ❌ Commenting system
270 * ❌ Social sharing
271
272 **Technical Features:**
273 * ❌ Browser extensions
274 * ❌ Mobile apps
275 * ❌ API endpoints
276 * ❌ Webhooks
277 * ❌ Export features (PDF, CSV)
278
279 **Quality Features:**
280 * ❌ Accessibility (WCAG compliance)
281 * ❌ Multilingual support
282 * ❌ Mobile optimization
283 * ❌ Media verification (images/videos)
284
285 **Production Features:**
286 * ❌ Security hardening
287 * ❌ Privacy compliance (GDPR)
288 * ❌ Terms of service
289 * ❌ Monitoring/logging
290 * ❌ Error tracking
291 * ❌ Analytics
292 * ❌ A/B testing
293
294 == 4. POC Simplifications vs. Full System ==
295
296 === 4.1 Architecture Comparison ===
297
298 **POC Architecture (Simplified):**
299 {{code}}
300 User Input → Single AKEL Call → Output Display
301 (all processing)
302 {{/code}}
303
304 **Full System Architecture:**
305 {{code}}
306 User Input → Claim Extractor → Claim Classifier → Scenario Generator
307 → Evidence Summarizer → Contradiction Detector → Verdict Generator
308 → Quality Gates → Publication → Output Display
309 {{/code}}
310
311 **Key Differences:**
312
313 |=Aspect|=POC1|=Full System
314 |Processing|Single API call|Multi-component pipeline
315 |Scenarios|None (implicit)|Explicit entities with versioning
316 |Evidence|Basic retrieval|Comprehensive with quality scoring
317 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
318 |Workflow|3 steps (input/process/output)|6 phases with gates
319 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
320 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
321
322 === 4.2 Workflow Comparison ===
323
324 **POC1 Workflow:**
325 1. User submits text/URL
326 2. Single AKEL call (all processing in one prompt)
327 3. Display results
328 **Total: 3 steps, ~10-18 seconds**
329
330 **Full System Workflow:**
331 1. **Claim Submission** (extraction, normalization, clustering)
332 2. **Scenario Building** (definitions, assumptions, boundaries)
333 3. **Evidence Handling** (retrieval, assessment, linking)
334 4. **Verdict Creation** (synthesis, reasoning, approval)
335 5. **Public Presentation** (summaries, landscapes, deep dives)
336 6. **Time Evolution** (versioning, re-evaluation triggers)
337 **Total: 6 phases with quality gates, ~10-30 seconds**
338
339 === 4.3 Why POC is Simplified ===
340
341 **Engineering Rationale:**
342
343 1. **Test core capability first:** Can AI do basic fact-checking without humans?
344 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
345 3. **Learn before building:** POC1 insights inform full architecture
346 4. **Iterative approach:** Add complexity only after validating foundations
347 5. **Resource efficiency:** Don't build full system if core concept fails
348
349 **Acceptable Trade-offs:**
350
351 * ✅ POC proves AI capability (most risky assumption)
352 * ✅ POC validates user comprehension (can people understand output?)
353 * ❌ POC doesn't validate full workflow (test in Beta)
354 * ❌ POC doesn't validate scale (test in Beta)
355 * ❌ POC doesn't validate scenario architecture (design in POC2)
356
357 === 4.4 Gap Between POC1 and POC2/Beta ===
358
359 **What needs to be built for POC2:**
360 * Scenario generation component
361 * Evidence Model structure (full)
362 * Scenario-evidence linking
363 * Multi-interpretation comparison
364 * Truth landscape visualization
365
366 **What needs to be built for Beta:**
367 * Multi-component AKEL pipeline
368 * Quality gate infrastructure
369 * Review workflow system
370 * Audit sampling framework
371 * Production data model
372 * Federation architecture (Release 1.0)
373
374 **POC1 → POC2 is a significant architectural expansion.**
375
376 == 5. Publication Mode & Labeling ==
377
378 === 5.1 POC Publication Mode ===
379
380 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
381
382 Per FactHarbor Specification Section 11 "POC v1 Behavior":
383 * Produces public AI-generated output
384 * No human approval gate
385 * Clear AI-Generated labeling
386 * All quality gates active (simplified)
387 * Risk tier classification shown (demo)
388
389 === 5.2 User-Facing Labels ===
390
391 **Primary Label (top of analysis):**
392 {{code}}
393 ╔════════════════════════════════════════════════════════════╗
394 ║ [AI-GENERATED - POC/DEMO] ║
395 ║ ║
396 ║ This analysis was produced entirely by AI and has not ║
397 ║ been human-reviewed. Use for demonstration purposes. ║
398 ║ ║
399 ║ Source: AI/AKEL v1.0 (POC) ║
400 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
401 ║ Quality Gates: 4/4 Passed (Simplified) ║
402 ║ Last Updated: [timestamp] ║
403 ╚════════════════════════════════════════════════════════════╝
404 {{/code}}
405
406 **Per-Claim Risk Labels:**
407 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
408 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
409 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
410
411 === 5.3 Display Requirements ===
412
413 **Must Show:**
414 * AI-Generated status (prominent)
415 * POC/Demo disclaimer
416 * Risk tier per claim
417 * Confidence scores (0-100%)
418 * Quality gate status (passed/failed)
419 * Timestamp
420
421 **Must NOT Claim:**
422 * Human review
423 * Production quality
424 * Medical/legal advice
425 * Authoritative verdicts
426 * Complete accuracy
427
428 === 5.4 Mode 2 vs. Full System Publication ===
429
430 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
431 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
432 |Review|None|None|Human-Reviewed
433 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
434 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
435 |Risk Display|Demo only|Workflow-integrated|Validated
436 |User Actions|View only|Flag for review|Trust rating
437
438 == 6. Quality Gates (Simplified Implementation) ==
439
440 === 6.1 Overview ===
441
442 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
443
444 **Full System Has 4 Gates:**
445 1. Source Quality
446 2. Contradiction Search (MANDATORY)
447 3. Uncertainty Quantification
448 4. Structural Integrity
449
450 **POC Implements Simplified Versions:**
451 * Focus on demonstrating concept
452 * Basic implementations sufficient
453 * Failures displayed to user (not blocking)
454 * Full system has comprehensive validation
455
456 === 6.2 Gate 1: Source Quality (Basic) ===
457
458 **Full System Requirements:**
459 * Primary sources identified and accessible
460 * Source reliability scored against whitelist
461 * Citation completeness verified
462 * Publication dates checked
463 * Author credentials validated
464
465 **POC Implementation:**
466 * ✅ At least 2 sources found
467 * ✅ Sources accessible (URLs valid)
468 * ❌ No whitelist checking
469 * ❌ No credential validation
470 * ❌ No comprehensive reliability scoring
471
472 **Pass Criteria:** ≥2 accessible sources found
473
474 **Failure Handling:** Display error message, don't generate verdict
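
A minimal sketch of the POC pass criterion follows. The accessibility check is passed in as a function so the gate logic can be exercised without network access. The function name, the de-duplication of URLs and the wording of the failure detail are assumptions.

```python
from typing import Callable, Iterable

MIN_SOURCES = 2  # pass criterion stated above


def gate_source_quality(
    urls: Iterable[str],
    is_accessible: Callable[[str], bool],
) -> tuple[bool, str]:
    """Return (passed, detail) for the simplified Gate 1.

    `is_accessible` is a caller-supplied check (for example an HTTP probe).
    Duplicate URLs are counted once, which is an assumption of this sketch.
    """
    unique_urls = dict.fromkeys(urls)  # de-duplicate, keep first-seen order
    accessible = [u for u in unique_urls if is_accessible(u)]
    passed = len(accessible) >= MIN_SOURCES
    if passed:
        detail = f"{len(accessible)} sources found"
    else:
        detail = f"only {len(accessible)} accessible source(s) found"
    return passed, detail
```

Injecting the check keeps the pass/fail rule separate from however URL validity ends up being tested.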
475
476 === 6.3 Gate 2: Contradiction Search (Basic) ===
477
478 **Full System Requirements:**
479 * Counter-evidence actively searched
480 * Reservations and limitations identified
481 * Alternative interpretations explored
482 * Bubble detection (echo chambers, conspiracy theories)
483 * Cross-cultural and international perspectives
484 * Academic literature (supporting AND opposing)
485
486 **POC Implementation:**
487 * ✅ Basic search for counter-evidence
488 * ✅ Identify obvious contradictions
489 * ❌ No comprehensive academic search
490 * ❌ No bubble detection
491 * ❌ No systematic alternative interpretation search
492 * ❌ No international perspective verification
493
494 **Pass Criteria:** Basic contradiction search attempted
495
496 **Failure Handling:** Note "limited contradiction search" in output
497
498 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
499
500 **Full System Requirements:**
501 * Confidence scores calculated for all claims/verdicts
502 * Limitations explicitly stated
503 * Data gaps identified and disclosed
504 * Strength of evidence assessed
505 * Alternative scenarios considered
506
507 **POC Implementation:**
508 * ✅ Confidence scores (0-100%)
509 * ✅ Basic uncertainty acknowledgment
510 * ❌ No detailed limitation disclosure
511 * ❌ No data gap identification
512 * ❌ No alternative scenario consideration (deferred to POC2)
513
514 **Pass Criteria:** Confidence score assigned
515
516 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
517
518 === 6.5 Gate 4: Structural Integrity (Basic) ===
519
520 **Full System Requirements:**
521 * No hallucinations detected (fact-checking against sources)
522 * Logic chain valid and traceable
523 * References accessible and verifiable
524 * No circular reasoning
525 * Premises clearly stated
526
527 **POC Implementation:**
528 * ✅ Basic coherence check
529 * ✅ References accessible
530 * ❌ No comprehensive hallucination detection
531 * ❌ No formal logic validation
532 * ❌ No premise extraction and verification
533
534 **Pass Criteria:** Output is coherent and references are accessible
535
536 **Failure Handling:** Display error message
537
538 === 6.6 Quality Gate Display ===
539
540 **POC shows simplified status:**
541 {{code}}
542 Quality Gates: 4/4 Passed (Simplified)
543 ✓ Source Quality: 3 sources found
544 ✓ Contradiction Search: Basic search completed
545 ✓ Uncertainty: Confidence scores assigned
546 ✓ Structural Integrity: Output coherent
547 {{/code}}
548
549 **If any gate fails:**
550 {{code}}
551 Quality Gates: 3/4 Passed (Simplified)
552 ✓ Source Quality: 3 sources found
553 ✗ Contradiction Search: Search failed - limited evidence
554 ✓ Uncertainty: Confidence scores assigned
555 ✓ Structural Integrity: Output coherent
556
557 Note: This analysis has limited evidence. Use with caution.
558 {{/code}}
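
The two displays above can be produced from a list of per-gate results. The sketch below assumes a hypothetical `GateResult` record and reuses the cautionary note from the failing example for any failed gate. Whether the note should vary by gate is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of one simplified gate (hypothetical record)."""

    name: str
    passed: bool
    detail: str


def render_gate_status(results: list[GateResult]) -> str:
    passed = sum(r.passed for r in results)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    lines += [f"{'✓' if r.passed else '✗'} {r.name}: {r.detail}" for r in results]
    if passed < len(results):
        # Gates are displayed, not blocking, so a failure only adds a note.
        lines += ["", "Note: This analysis has limited evidence. Use with caution."]
    return "\n".join(lines)
```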
559
560 === 6.7 Simplified vs. Full System ===
561
562 |=Gate|=POC (Simplified)|=Full System
563 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
564 |Contradiction|Basic search|Systematic academic + media + international
565 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
566 |Structural|Coherence check|Hallucination detection, logic validation, premise check
567
568 **POC Goal:** Demonstrate that quality gates are feasible, not deliver a perfect implementation.
569
570 == 7. AKEL Architecture Comparison ==
571
572 === 7.1 POC AKEL (Simplified) ===
573
574 **Implementation:**
575 * Single provider API call (REASONING model)
576 * One comprehensive prompt
577 * All processing in single request
578 * No separate components
579 * No orchestration layer
580
581 **Prompt Structure:**
582 {{code}}
583 Task: Analyze this article and provide:
584
585 1. Extract 3-5 factual claims
586 2. For each claim:
587 - Determine verdict (WELL-SUPPORTED/PARTIALLY SUPPORTED/UNCERTAIN/REFUTED)
588 - Assign confidence score (0-100%)
589 - Assign risk tier (A/B/C)
590 - Write brief reasoning (1-3 sentences)
591 3. Generate analysis summary (4-6 sentences)
592 4. Generate article summary (3-5 sentences)
593 5. Run basic quality checks
594
595 Return as structured JSON.
596 {{/code}}
597
598 **Processing Time:** 10-18 seconds (estimate)
599
600 === 7.2 Full System AKEL (Production) ===
601
602 **Architecture:**
603 {{code}}
604 AKEL Orchestrator
605 ├── Claim Extractor
606 ├── Claim Classifier (with risk tier assignment)
607 ├── Scenario Generator
608 ├── Evidence Summarizer
609 ├── Contradiction Detector
610 ├── Quality Gate Validator
611 ├── Audit Sampling Scheduler
612 └── Federation Sync Adapter (Release 1.0+)
613 {{/code}}
614
615 **Processing:**
616 * Parallel processing where possible
617 * Separate component calls
618 * Quality gates between phases
619 * Audit sampling selection
620 * Cross-node coordination (federated mode)
621
622 **Processing Time:** 10-30 seconds (full pipeline)
623
624 === 7.3 Why POC Uses Single Call ===
625
626 **Advantages:**
627 * ✅ Simpler to implement
628 * ✅ Faster POC development
629 * ✅ Easier to debug
630 * ✅ Proves AI capability
631 * ✅ Good enough for concept validation
632
633 **Limitations:**
634 * ❌ No component reusability
635 * ❌ No parallel processing
636 * ❌ All-or-nothing (can't partially succeed)
637 * ❌ Harder to improve individual components
638 * ❌ No audit sampling
639
640 **Acceptable Trade-off:**
641
642 POC tests "Can AI do this?" not "How should we architect it?"
643
644 Full component architecture comes in Beta after POC validates concept.
645
646 === 7.4 Evolution Path ===
647
648 **POC1:** Single prompt → Prove concept
649 **POC2:** Add scenario component → Test full pipeline
650 **Beta:** Multi-component AKEL → Production architecture
651 **Release 1.0:** Full AKEL + Federation → Scale
652
653 == 8. Functional Requirements ==
654
655 === FR-POC-1: Article Input ===
656
657 **Requirement:** User can submit article for analysis
658
659 **Functionality:**
660 * Text input field (paste article text, up to 5000 characters)
661 * URL input field (paste article URL)
662 * "Analyze" button to trigger processing
663 * Loading indicator during analysis
664
665 **Excluded:**
666 * No user authentication
667 * No claim history
668 * No search functionality
669 * No saved templates
670
671 **Acceptance Criteria:**
672 * User can paste text from article
673 * User can paste URL of article
674 * System accepts input and triggers analysis
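
Input handling for this requirement might be sketched as follows. The 5000-character limit comes from the functionality list above. Two rules are assumptions of this sketch and are not stated in the requirement: exactly one of the two fields must be filled, and only `http`/`https` URLs are accepted.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated for the text input field


def validate_submission(text: str = "", url: str = "") -> tuple[str, str]:
    """Return ("text", value) or ("url", value); raise ValueError otherwise."""
    text, url = text.strip(), url.strip()
    # Assumption: the user fills exactly one of the two fields.
    if bool(text) == bool(url):
        raise ValueError("provide either article text or an article URL")
    if text:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        return "text", text
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an absolute http(s) address")
    return "url", url
```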
675
676 === FR-POC-2: Claim Extraction (Fully Automated) ===
677
678 **Requirement:** AI automatically extracts 3-5 factual claims
679
680 **Functionality:**
681 * AI reads article text
682 * AI identifies factual claims (not opinions/questions)
683 * AI extracts 3-5 most important claims
684 * System displays numbered list
685
686 **Critical:** NO MANUAL EDITING ALLOWED
687 * AI selects which claims to extract
688 * AI identifies factual vs. non-factual
689 * System processes claims as extracted
690 * No human curation or correction
691
692 **Error Handling:**
693 * If extraction fails: Display error message
694 * User can retry with different input
695 * No manual intervention to fix extraction
696
697 **Acceptance Criteria:**
698 * AI extracts 3-5 claims automatically
699 * Claims are factual (not opinions)
700 * Claims are clearly stated
701 * No manual editing required
702
703 === FR-POC-3: Verdict Generation (Fully Automated) ===
704
705 **Requirement:** AI automatically generates verdict for each claim
706
707 **Functionality:**
708 * For each claim, AI:
709 ** Evaluates claim based on available evidence/knowledge
710 ** Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
711 ** Assigns confidence score (0-100%)
712 ** Assigns risk tier (A/B/C)
713 ** Writes brief reasoning (1-3 sentences)
714 * System displays verdict for each claim
715
716 **Critical:** NO MANUAL EDITING ALLOWED
717 * AI computes verdicts based on evidence
718 * AI generates confidence scores
719 * AI writes reasoning
720 * No human review or adjustment
721
722 **Error Handling:**
723 * If verdict generation fails: Display error message
724 * User can retry
725 * No manual intervention to adjust verdicts
726
727 **Acceptance Criteria:**
728 * Each claim has a verdict
729 * Confidence score is displayed (0-100%)
730 * Risk tier is displayed (A/B/C)
731 * Reasoning is understandable (1-3 sentences)
732 * Verdict is defensible given reasoning
733 * All generated automatically by AI
734
735 === FR-POC-4: Analysis Summary (Fully Automated) ===
736
737 **Requirement:** AI generates brief summary of analysis
738
739 **Functionality:**
740 * AI summarizes findings in 4-6 sentences:
741 ** How many claims found
742 ** Distribution of verdicts
743 ** Overall assessment
744 * System displays at top of results
745
746 **Critical:** NO MANUAL EDITING ALLOWED
747
748 **Acceptance Criteria:**
749 * Summary is coherent
750 * Accurately reflects analysis
751 * 4-6 sentences
752 * Automatically generated
753
754 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
755
756 **Requirement:** AI generates brief summary of original article
757
758 **Functionality:**
759 * AI summarizes article content (not FactHarbor's analysis)
760 * 3-5 sentences
761 * System displays
762
763 **Note:** Optional - can skip if time limited
764
765 **Critical:** NO MANUAL EDITING ALLOWED
766
767 **Acceptance Criteria:**
768 * Summary is neutral (article's position)
769 * Accurately reflects article content
770 * 3-5 sentences
771 * Automatically generated
772
773 === FR-POC-6: Publication Mode Display ===
774
775 **Requirement:** Clear labeling of AI-generated content
776
777 **Functionality:**
778 * Display Mode 2 publication label
779 * Show POC/Demo disclaimer
780 * Display risk tiers per claim
781 * Show quality gate status
782 * Display timestamp
783
784 **Acceptance Criteria:**
785 * Label is prominent and clear
786 * User understands this is AI-generated POC output
787 * Risk tiers are color-coded
788 * Quality gate status is visible
789
790 === FR-POC-7: Quality Gate Execution ===
791
792 **Requirement:** Execute simplified quality gates
793
794 **Functionality:**
795 * Check source quality (basic)
796 * Attempt contradiction search (basic)
797 * Calculate confidence scores
798 * Verify structural integrity (basic)
799 * Display gate results
800
801 **Acceptance Criteria:**
802 * All 4 gates attempted
803 * Pass/fail status displayed
804 * Failures explained to user
805 * Gates don't block publication (POC mode)
806
807 == 9. Non-Functional Requirements ==
808
809 === NFR-POC-1: Fully Automated Processing ===
810
811 **Requirement:** Complete AI automation with zero manual intervention
812
813 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
814
815 **What this means:**
816 * Claims: AI selects (no human curation)
817 * Scenarios: N/A (deferred to POC2)
818 * Evidence: AI evaluates (no human selection)
819 * Verdicts: AI determines (no human adjustment)
820 * Summaries: AI writes (no human editing)
821
822 **Pipeline:**
823 {{code}}
824 User Input → AKEL Processing → Output Display
825
826 ZERO human editing
827 {{/code}}
828
829 **If AI output is poor:**
830 * ❌ Do NOT manually fix it
831 * ✅ Document the failure
832 * ✅ Improve prompts and retry
833 * ✅ Accept that POC might fail
834
835 **Why this matters:**
836 * Tests whether AI can do this without humans
837 * Validates scalability (humans can't review every analysis)
838 * Honest test of technical feasibility
839
840 === NFR-POC-2: Performance ===
841
842 **Requirement:** Analysis completes in reasonable time
843
844 **Acceptable Performance:**
845 * Processing time: 1-5 minutes (acceptable for POC)
846 * Display loading indicator to user
847 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
848
849 **Not Required:**
850 * Production-level speed (< 30 seconds)
851 * Optimization for scale
852 * Caching
853
854 **Acceptance Criteria:**
855 * Analysis completes within 5 minutes
856 * User sees loading indicator
857 * No timeout errors
858
859 === NFR-POC-3: Reliability ===
860
861 **Requirement:** System works for manual testing sessions
862
863 **Acceptable:**
864 * Occasional errors (< 20% failure rate)
865 * Manual restart if needed
866 * Display error messages clearly
867
868 **Not Required:**
869 * 99.9% uptime
870 * Automatic error recovery
871 * Production monitoring
872
873 **Acceptance Criteria:**
874 * System works for test demonstrations
875 * Errors are handled gracefully
876 * User receives clear error messages
877
878 === NFR-POC-4: Environment ===
879
880 **Requirement:** Runs on simple infrastructure
881
882 **Acceptable:**
883 * Single machine or simple cloud setup
884 * No distributed architecture
885 * No load balancing
886 * No redundancy
887 * Local development environment viable
888
889 **Not Required:**
890 * Production infrastructure
891 * Multi-region deployment
892 * Auto-scaling
893 * Disaster recovery
894
895 === NFR-POC-5: Cost Efficiency Tracking ===
896
897 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
898
899 **Must Track:**
900 * Input tokens (article + prompt)
901 * Output tokens (generated analysis)
902 * Total tokens
903 * Estimated cost (USD)
904 * Response time (seconds)
905 * Article length (words/characters)
906
907 **Must Display:**
908 * Usage statistics in UI (Component 5)
909 * Cost per analysis
910 * Cost per claim extracted
911
912 **Must Log:**
913 * Aggregate metrics for analysis
914 * Cost distribution by article length
915 * Token efficiency trends
916
917 **Purpose:**
918 * Understand unit economics
919 * Identify optimization opportunities
920 * Project costs at scale
921 * Inform architecture decisions (caching, model selection, etc.)
922
923 **Acceptance Criteria:**
924 * ✅ Usage data displayed after each analysis
925 * ✅ Metrics logged for aggregate analysis
926 * ✅ Cost calculated accurately (Claude API pricing)
927 * ✅ Test cases include varying article lengths
928 * ✅ POC1 report includes cost analysis section
929
930 **Success Target:**
931 * Average cost per analysis < $0.05 USD
932 * Cost scaling behavior understood (linear/exponential)
933 * 2+ optimization opportunities identified
934
935 **Critical:** Unit economics must be viable for scaling decision!
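
For the aggregate logging, one possible shape is to group per-analysis costs by article length and compare the overall average with the success target. The bucket boundaries and all names below are hypothetical, since this page defines none. Only the $0.05 target comes from the list above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word-count buckets: (lower bound, upper bound or None, label).
BUCKETS = ((0, 500, "short"), (500, 1500, "medium"), (1500, None, "long"))

TARGET_USD = 0.05  # success target: average cost per analysis


def bucket_for(words: int) -> str:
    for low, high, label in BUCKETS:
        if words >= low and (high is None or words < high):
            return label
    raise ValueError("word count must be non-negative")


def cost_by_length(records: list[tuple[int, float]]) -> dict[str, float]:
    """Average cost per bucket; records are (word count, cost in USD)."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for words, cost in records:
        grouped[bucket_for(words)].append(cost)
    return {label: mean(costs) for label, costs in grouped.items()}


def meets_target(costs: list[float], target_usd: float = TARGET_USD) -> bool:
    """True if the average cost per analysis is below the target."""
    return mean(costs) < target_usd
```

Comparing bucket averages across lengths gives a first indication of whether cost grows roughly in proportion to article length.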
936
937 == 10. Technical Architecture ==
938
939 === 10.1 System Components ===
940
941 **Frontend:**
942 * Simple HTML form (text input + URL input + button)
943 * Loading indicator
944 * Results display page (single page, no tabs/navigation)
945
946 **Backend:**
947 * Single API endpoint
948 * Calls provider API (REASONING model; configured via LLM abstraction)
949 * Parses response
950 * Returns JSON to frontend
951
952 **Data Storage:**
953 * None required (stateless POC)
954 * Optional: Simple file storage or SQLite for demo examples
955
956 **External Services:**
957 * Claude API (Anthropic) - required
958 * Optional: URL fetch service for article text extraction
959
960 === 10.2 Processing Flow ===
961
962 {{code}}
963 1. User submits text or URL
964
965 2. Backend receives request
966
967 3. If URL: Fetch article text
968
969 4. Call Claude API with single prompt:
970 "Extract claims, evaluate each, provide verdicts"
971
972 5. Claude API returns:
973 - Analysis summary
974 - Claims list
975 - Verdicts for each claim (with risk tiers)
976 - Article summary (optional)
977 - Quality gate results
978
979 6. Backend parses response
980
981 7. Frontend displays results with Mode 2 labeling
982 {{/code}}
983
984 **Key Simplification:** A single API call performs the entire analysis
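
The flow above can be sketched as one backend function. This is an illustration under assumptions, not the specified implementation: the LLM call and the URL fetcher are injected as plain callables so the sketch does not depend on any SDK, and the JSON key names in {{code}}REQUIRED_KEYS{{/code}} are hypothetical.

```python
import json
from typing import Callable, Optional

PROMPT = "Extract claims, evaluate each, provide verdicts"

# ASSUMPTION: hypothetical top-level keys of the JSON the model returns.
REQUIRED_KEYS = ("analysis_summary", "claims", "quality_gates")


def analyze(
    text: Optional[str],
    url: Optional[str],
    call_llm: Callable[[str], str],
    fetch_article: Callable[[str], str],
) -> dict:
    """Run steps 2-6 of the flow and return parsed JSON for the frontend."""
    if url and not text:
        text = fetch_article(url)  # step 3: fetch article text
    if not text:
        raise ValueError("either text or url is required")

    raw = call_llm(f"{PROMPT}\n\nARTICLE:\n{text}")  # step 4: single call
    result = json.loads(raw)  # step 6: parse the response

    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return result
```

Passing the LLM call in as a parameter keeps the function testable with a stub and leaves room for the LLM abstraction layer to supply the real call.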
985
986 === 10.3 AI Prompt Strategy ===
987
988 **Single Comprehensive Prompt:**
989 {{code}}
990 Task: Analyze this article and provide:
991
992 1. Identify the article's main thesis/conclusion
993 - What is the article trying to argue or prove?
994 - What is the primary claim or conclusion?
995
996 2. Extract 3-5 factual claims from the article
997 - Note which claims are CENTRAL to the main thesis
998 - Note which claims are SUPPORTING facts
999
1000 3. For each claim:
1001 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1002 - Assign confidence score (0-100%)
1003 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1004 - Write brief reasoning (1-3 sentences)
1005
1006 4. Assess relationship between claims and main thesis:
1007 - Do the claims actually support the article's conclusion?
1008 - Are there logical leaps or unsupported inferences?
1009 - Is the article's framing misleading even if individual facts are accurate?
1010
1011 5. Run quality gates:
1012 - Check: ≥2 sources found
1013 - Attempt: Basic contradiction search
1014 - Calculate: Confidence scores
1015 - Verify: Structural integrity
1016
1017 6. Write context-aware analysis summary (4-6 sentences):
1018 - State article's main thesis
1019 - Report claims found and verdict distribution
1020 - Note if central claims are problematic
1021 - Assess whether evidence supports conclusion
1022 - Overall credibility considering claim importance
1023
1024 7. Write article summary (3-5 sentences: neutral summary of article content)
1025
1026 Return as structured JSON with quality gate results.
1027 {{/code}}
1028
1029 **One prompt generates everything.**
1030
1031 **Critical Addition:**
1032
1033 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
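
Because one prompt produces everything, the backend may want to check the returned JSON against the constraints the prompt states (verdict labels, 0-100 confidence, risk tiers A/B/C, 3-5 claims). The sketch below assumes hypothetical field names ({{code}}claims{{/code}}, {{code}}verdict{{/code}}, {{code}}confidence{{/code}}, {{code}}risk_tier{{/code}}, {{code}}reasoning{{/code}}); the actual schema is defined in the API specification.

```python
VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


def validate_claim(claim: dict) -> list[str]:
    """Return the problems found in one claim object (empty list if valid)."""
    problems = []
    if claim.get("verdict") not in VERDICTS:
        problems.append(f"unknown verdict: {claim.get('verdict')!r}")
    confidence = claim.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
        problems.append(f"confidence out of range: {confidence!r}")
    if claim.get("risk_tier") not in RISK_TIERS:
        problems.append(f"unknown risk tier: {claim.get('risk_tier')!r}")
    if not str(claim.get("reasoning", "")).strip():
        problems.append("missing reasoning")
    return problems


def validate_response(response: dict) -> list[str]:
    """Check the claim count and every claim; return all problems found."""
    claims = response.get("claims", [])
    problems = []
    if not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")
    for index, claim in enumerate(claims):
        problems += [f"claim {index}: {p}" for p in validate_claim(claim)]
    return problems
```

A non-empty result counts as a failed analysis in the POC metrics, since manually repairing the output would contradict the automation goal.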
1034
1035 === 10.4 Technology Stack Suggestions ===
1036
1037 **Frontend:**
1038 * HTML + CSS + JavaScript (minimal framework)
1039 * OR: Next.js (if team prefers)
1040 * Hosted: Local machine OR Vercel/Netlify free tier
1041
1042 **Backend:**
1043 * Python Flask/FastAPI (simple REST API)
1044 * OR: Next.js API routes (if using Next.js)
1045 * Hosted: Local machine OR Railway/Render free tier
1046
1047 **AKEL Integration:**
1048 * Claude API via Anthropic SDK
1049 * Model: Provider-default REASONING model or latest available
1050
1051 **Database:**
1052 * None (stateless acceptable)
1053 * OR: SQLite if you want to store demo examples
1054 * OR: JSON files on disk
1055
1056 **Deployment:**
1057 * Local development environment sufficient for POC
1058 * Optional: Deploy to cloud for remote demos
1059
1060 == 11. Success Criteria ==
1061
1062 === 11.1 Minimum Success (POC Passes) ===
1063
1064 **Required for GO decision:**
1065 * ✅ AI extracts 3-5 factual claims automatically
1066 * ✅ AI provides verdict for each claim automatically
1067 * ✅ Verdicts are reasonable (≥70% make logical sense)
1068 * ✅ Analysis summary is coherent
1069 * ✅ Output is comprehensible to reviewers
1070 * ✅ Team/advisors understand the output
1071 * ✅ Team agrees approach has merit
1072 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1073 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1074 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1075 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1076
1077 **Quality Definition:**
1078 * "Reasonable verdict" = Defensible given general knowledge
1079 * "Coherent summary" = Logically structured, grammatically correct
1080 * "Comprehensible" = Reviewers understand what analysis means
1081
1082 === 11.2 POC Fails If ===
1083
1084 **Automatic NO-GO if any of these:**
1085 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1086 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1087 * ❌ Output incomprehensible (reviewers can't understand analysis)
1088 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1089 * ❌ Team loses confidence in AI-automated approach
1090
1091 === 11.3 Quality Thresholds ===
1092
1093 **POC quality expectations:**
1094
1095 |=Component|=Quality Threshold|=Definition
1096 |Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1097 |Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1098 |Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1099 |Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1100
1101 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1102
1103 **Not expecting:**
1104 * 100% accuracy
1105 * Perfect claim coverage
1106 * Comprehensive evidence gathering
1107 * Flawless verdicts
1108 * Production polish
1109
1110 **Expecting:**
1111 * Reasonable claim extraction
1112 * Defensible verdicts
1113 * Understandable reasoning
1114 * Useful output
1115
1116 == 12. Test Cases ==
1117
1118 === 12.1 Test Case 1: Simple Factual Claim ===
1119
1120 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1121
1122 **Expected Output:**
1123 * Extract claim correctly
1124 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1125 * Confidence: 70-90%
1126 * Risk tier: C (Low)
1127 * Reasoning: Mentions studies or evidence
1128
1129 **Success:** Verdict is reasonable and reasoning makes sense
1130
1131 === 12.2 Test Case 2: Complex News Article ===
1132
1133 **Input:** News article URL with multiple claims about politics/health/science
1134
1135 **Expected Output:**
1136 * Extract 3-5 key claims
1137 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1138 * Coherent analysis summary
1139 * Article summary
1140 * Risk tiers assigned appropriately
1141
1142 **Success:** Claims identified are actually from article, verdicts are reasonable
1143
1144 === 12.3 Test Case 3: Controversial Topic ===
1145
1146 **Input:** Article on contested political or scientific topic
1147
1148 **Expected Output:**
1149 * Balanced analysis
1150 * Acknowledges uncertainty where appropriate
1151 * Doesn't overstate confidence
1152 * Reasoning shows awareness of complexity
1153
1154 **Success:** Analysis is fair and doesn't show obvious bias
1155
1156 === 12.4 Test Case 4: Clearly False Claim ===
1157
1158 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1159
1160 **Expected Output:**
1161 * Extract claim
1162 * Verdict: REFUTED
1163 * High confidence (> 90%)
1164 * Risk tier: C (Low - established fact)
1165 * Clear reasoning
1166
1167 **Success:** AI correctly identifies false claim with high confidence
1168
1169 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1170
1171 **Input:** Article with claim where evidence is genuinely mixed
1172
1173 **Expected Output:**
1174 * Extract claim
1175 * Verdict: UNCERTAIN
1176 * Moderate confidence (40-60%)
1177 * Reasoning explains why uncertain
1178
1179 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1180
1181 === 12.6 Test Case 6: High-Risk Medical Claim ===
1182
1183 **Input:** Article making medical claims
1184
1185 **Expected Output:**
1186 * Extract claim
1187 * Verdict: [appropriate based on evidence]
1188 * Risk tier: A (High - medical)
1189 * Red label displayed
1190 * Clear disclaimer about not being medical advice
1191
1192 **Success:** Risk tier correctly assigned, appropriate warnings shown
1193
1194 == 13. POC Decision Gate ==
1195
1196 === 13.1 Decision Framework ===
1197
1198 After POC testing is complete, the team makes one of three decisions:
1199
1200 **Option A: GO (Proceed to POC2)**
1201
1202 **Conditions:**
1203 * AI quality ≥70% without manual editing
1204 * Basic claim → verdict pipeline validated
1205 * Internal + advisor feedback positive
1206 * Technical feasibility confirmed
1207 * Team confident in direction
1208 * Clear path to improving AI quality to ≥90%
1209
1210 **Next Steps:**
1211 * Plan POC2 development (add scenarios)
1212 * Design scenario architecture
1213 * Expand to Evidence Model structure
1214 * Test with more complex articles
1215
1216 **Option B: NO-GO (Pivot or Stop)**
1217
1218 **Conditions:**
1219 * AI quality < 60%
1220 * Requires manual editing for most analyses (> 50%)
1221 * Feedback indicates fundamental flaws
1222 * Cost/effort not justified by value
1223 * No clear path to improvement
1224
1225 **Next Steps:**
1226 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1227 * **Stop:** Conclude the approach is not viable; revisit later
1228
1229 **Option C: ITERATE (Improve POC)**
1230
1231 **Conditions:**
1232 * Concept has merit but execution needs work
1233 * Specific improvements identified
1234 * Addressable with better prompts/approach
1235 * AI quality between 60% and 70% (at least 60%, below 70%)
1236
1237 **Next Steps:**
1238 * Improve AI prompts
1239 * Test different approaches
1240 * Re-run POC with improvements
1241 * Then make GO/NO-GO decision
1242
1243 === 13.2 Decision Criteria Summary ===
1244
1245 {{code}}
1246 AI Quality < 60% → NO-GO (approach doesn't work)
1247 AI Quality 60-70% → ITERATE (improve and retry)
1248 AI Quality ≥70% → GO (proceed to POC2)
1249 {{/code}}
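
The thresholds above translate directly into a small function. This sketch treats exactly 60% as ITERATE and exactly 70% as GO, which follows from the "< 60%" and "≥70%" bounds; the function name {{code}}decide{{/code}} is hypothetical.

```python
def decide(ai_quality_percent: float) -> str:
    """Map a measured AI quality percentage to the decision-gate outcome."""
    if not 0 <= ai_quality_percent <= 100:
        raise ValueError("quality must be a percentage between 0 and 100")
    if ai_quality_percent < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality_percent < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2
```

Quality is only one of the listed conditions; the function covers the numeric gate and leaves feedback, cost, and team confidence to the review itself.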
1250
1251 == 14. Key Risks & Mitigations ==
1252
1253 === 14.1 Risk: AI Quality Not Good Enough ===
1254
1255 **Likelihood:** Medium-High
1256 **Impact:** POC fails
1257
1258 **Mitigation:**
1259 * Extensive prompt engineering and testing
1260 * Use best available AI models (role-based selection; configured via LLM abstraction)
1261 * Test with diverse article types
1262 * Iterate on prompts based on results
1263
1264 **Acceptance:** This is what the POC tests, so be ready for failure
1265
1266 === 14.2 Risk: AI Consistency Issues ===
1267
1268 **Likelihood:** Medium
1269 **Impact:** Works sometimes, fails other times
1270
1271 **Mitigation:**
1272 * Test with 10+ diverse articles
1273 * Measure success rate honestly
1274 * Improve prompts to increase consistency
1275
1276 **Acceptance:** Some variability OK if average quality ≥70%
1277
1278 === 14.3 Risk: Output Incomprehensible ===
1279
1280 **Likelihood:** Low-Medium
1281 **Impact:** Users can't understand analysis
1282
1283 **Mitigation:**
1284 * Create clear explainer document
1285 * Iterate on output format
1286 * Test with non-technical reviewers
1287 * Simplify language if needed
1288
1289 **Acceptance:** Iterate until comprehensible
1290
1291 === 14.4 Risk: API Rate Limits / Costs ===
1292
1293 **Likelihood:** Low
1294 **Impact:** System slow or expensive
1295
1296 **Mitigation:**
1297 * Monitor API usage
1298 * Implement retry logic
1299 * Estimate costs before scaling
1300
1301 **Acceptance:** POC can be slow and expensive (optimization later)
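
One possible shape for the retry logic named in the mitigation is exponential backoff with jitter. The sketch is an assumption about how it could look: {{code}}TransientAPIError{{/code}} is a hypothetical marker exception that a real implementation would map from the provider SDK's rate-limit and timeout errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical marker for retryable failures (rate limit, timeout)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            # Jitter spreads out retries from concurrent requests.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The {{code}}sleep{{/code}} parameter exists so tests can pass a stub instead of waiting in real time.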
1302
1303 === 14.5 Risk: Scope Creep ===
1304
1305 **Likelihood:** Medium
1306 **Impact:** POC becomes too complex
1307
1308 **Mitigation:**
1309 * Strict scope discipline
1310 * Say NO to feature additions
1311 * Keep focus on core question
1312
1313 **Acceptance:** POC is minimal by design
1314
1315 == 15. POC Philosophy ==
1316
1317 === 15.1 Core Principles ===
1318
1319 **1. Build Less, Learn More**
1320 * Minimum features to test hypothesis
1321 * Don't build unvalidated features
1322 * Focus on core question only
1323
1324 **2. Fail Fast**
1325 * Quick test of hardest part (AI capability)
1326 * Accept that POC might fail
1327 * Better to discover issues early
1328 * Honest assessment over optimistic hope
1329
1330 **3. Test First, Build Second**
1331 * Validate AI can do this before building platform
1332 * Don't assume it will work
1333 * Let results guide decisions
1334
1335 **4. Automation First**
1336 * No manual editing allowed
1337 * Tests scalability, not just feasibility
1338 * Proves approach can work at scale
1339
1340 **5. Honest Assessment**
1341 * Don't cherry-pick examples
1342 * Don't manually fix bad outputs
1343 * Document failures openly
1344 * Make data-driven decisions
1345
1346 === 15.2 What POC Is ===
1347
1348 ✅ Testing AI capability without humans
1349 ✅ Proving core technical concept
1350 ✅ Fast validation of approach
1351 ✅ Honest assessment of feasibility
1352
1353 === 15.3 What POC Is NOT ===
1354
1355 ❌ Building a product
1356 ❌ Production-ready system
1357 ❌ Feature-complete platform
1358 ❌ Perfectly accurate analysis
1359 ❌ Polished user experience
1360
1361 == 16. Success = Clear Path Forward ==
1362
1363 **If POC succeeds (≥70% AI quality):**
1364 * ✅ Approach validated
1365 * ✅ Proceed to POC2 (add scenarios)
1366 * ✅ Design full Evidence Model structure
1367 * ✅ Test multi-scenario comparison
1368 * ✅ Focus on improving AI quality from 70% → 90%
1369
1370 **If POC fails (< 60% AI quality):**
1371 * ✅ Learn what doesn't work
1372 * ✅ Pivot to different approach
1373 * ✅ OR wait for better AI technology
1374 * ✅ Avoid wasting resources on non-viable approach
1375
1376 **Either way, POC provides clarity.**
1377
1378 == 17. Related Pages ==
1379
1380 * [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
1381 * [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1382 * [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1383 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1384 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1385 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1386
1387 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1388
1389
1390 === NFR-POC-11: LLM Provider Abstraction (POC1) ===
1391
1392 **Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.
1393
1394 **POC1 Implementation:**
1395
1396 * **Primary Provider:** Anthropic Claude API
1397 * Stage 1: Provider-default FAST model
1398 * Stage 2: Provider-default REASONING model (cached)
1399 * Stage 3: Provider-default REASONING model
1400
1401 * **Provider Interface:** Abstract LLMProvider interface implemented
1402
1403 * **Configuration:** Environment variables for provider selection
1404 * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1405 * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1406 * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1407
1408 * **Failover:** Basic error handling with cache fallback for Stage 2
1409
1410 * **Cost Tracking:** Log provider name and cost per request
1411
1412 **Future (POC2/Beta):**
1413
1414 * Secondary provider (OpenAI) with automatic failover
1415 * Admin API for runtime provider switching
1416 * Cost comparison dashboard
1417 * Cross-provider output verification
1418
1419 **Success Criteria:**
1420
1421 * All LLM calls go through abstraction layer (no direct API calls)
1422 * Provider can be changed via environment variable without code changes
1423 * Cost tracking includes provider name in logs
1424 * Stage 2 falls back to cache on provider failure
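
A minimal sketch of how these criteria could fit together, assuming a hypothetical {{code}}complete(model, prompt){{/code}} method on the interface and an in-memory dict standing in for the Redis cache. The actual interface is defined in the API specification; the Anthropic call is deliberately left unimplemented here rather than guessed.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping


class LLMProvider(ABC):
    """Abstract provider interface; all LLM calls go through it."""
    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Send the prompt to the given model and return the text reply."""


class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> str:
        # Placeholder: the real SDK call belongs here.
        raise NotImplementedError("wire up the Anthropic SDK here")


PROVIDERS = {"anthropic": AnthropicProvider}


def provider_from_env(env: Mapping[str, str] = os.environ) -> LLMProvider:
    """Select the provider via LLM_PRIMARY_PROVIDER, without code changes."""
    key = env.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider: {key!r}")
    return PROVIDERS[key]()


def stage2_analyze(provider: LLMProvider, model: str, prompt: str,
                   cache: dict) -> str:
    """Stage 2 call that falls back to a cached result on provider failure."""
    try:
        result = provider.complete(model, prompt)
    except Exception:
        if prompt in cache:
            return cache[prompt]  # failover: serve the cached analysis
        raise
    cache[prompt] = result
    return result
```

Adding a second provider later would mean registering another class in {{code}}PROVIDERS{{/code}}, which is the refactoring cost the abstraction is meant to avoid.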
1425
1426 **Implementation:** See [[POC1 API & Schemas Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1427
1428 **Dependencies:**
1429 * NFR-14 (Main Requirements)
1430 * Design Decision 9
1431 * Architecture Section 2.2
1432
1433 **Priority:** HIGH (P1)
1434
1435 **Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.