Wiki source code of POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2026/02/08 08:32

1 = POC Requirements =
2
3 {{warning}}
4 **Implementation Status (Updated January 2026)**
5
6 This specification describes the original POC1 design. The current implementation (v2.6.33) has evolved in several key areas:
7
8 * **Verdict Scale**: Expanded from 4-point to **7-point symmetric scale** (TRUE → MOSTLY-TRUE → LEANING-TRUE → MIXED/UNVERIFIED → LEANING-FALSE → MOSTLY-FALSE → FALSE)
9 * **Scenarios**: Replaced with **KeyFactors** (decomposition questions discovered during analysis)
10 * **Quality Gates**: Gate 1 (Claim Validation) and Gate 4 (Verdict Confidence) implemented; Gates 2-3 deferred
11 * **Caching**: Redis claim caching **not yet implemented**; all data stored as JSON blobs
12 * **Data Model**: Normalized tables **not implemented**; using JSON blob storage in SQLite
13
14 See `Docs/STATUS/Documentation_Inconsistencies.md` for full comparison.
15 {{/warning}}
16
17 {{info}}
18 **POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
19
20 See [[POC1 API Specification>>Archive.FactHarbor 2026\.02\.08.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
21 {{/info}}
22
23
24
25 **Status:** ✅ Approved for Development
26 **Version:** 2.0 (Updated after Specification Cross-Check)
27 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
28
29 == 1. POC Overview ==
30
31 === 1.1 What POC Tests ===
32
33 **Core Question:**
34
35 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
36
37 **What we're proving:**
38
39 * AI can identify factual claims from text
40 * AI can evaluate those claims and produce verdicts
41 * Output is comprehensible and useful
42 * Fully automated approach is viable
43
44 **What we're NOT testing:**
45
46 * Scenario generation (deferred to POC2)
47 * Evidence display (deferred to POC2)
48 * Production scalability
49 * Perfect accuracy
50 * Complete feature set
51
52 === 1.2 Scenarios Deferred to POC2 ===
53
54 {{info}}
55 **Implementation Update (v2.6.33):** Scenarios were **replaced with KeyFactors**: decomposition questions discovered during the understanding phase. KeyFactors are optional, emergent, and do not require a separate entity. See `Docs/ARCHITECTURE/KeyFactors_Design.md` for rationale.
56 {{/info}}
57
58 **Intentional Simplification:**
59
60 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
61
62 **Rationale:**
63
64 * **POC1 tests:** Can AI extract claims and generate verdicts?
65 * **POC2 will add:** Scenario generation and management
66 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
67
68 **Design Decision:**
69
70 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
71
72 **No Risk:**
73
74 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
75
76 * Faster POC1 validation
77 * Learning from POC1 to inform scenario design
78 * Iterative approach: fail fast if basic AI doesn't work
79 * Flexibility to adjust scenario architecture based on POC1 insights
80
81 **Full System Workflow (Future):**
82 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
83
84 **POC1 Simplified Workflow:**
85 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
86
87 == 2. POC Output Specification ==
88
89 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
90
91 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
92
93 **Length:** 4-6 sentences
94
95 **Content (Required Elements):**
96
97 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
98 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
99 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
100 4. **Relationship assessment** - Do the claims support the article's conclusion?
101 5. **Overall credibility** - Final assessment considering claim importance
102
103 **Critical Innovation:**
104
105 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
106
107 * Present accurate supporting facts but draw unsupported conclusions
108 * Have one false central claim that invalidates the whole argument
109 * Misframe accurate information to mislead
110
111 **Good Example (Context-Aware):**
112 {{code}}This article argues that coffee cures cancer based on its antioxidant
113 content. We analyzed 3 factual claims: 2 about coffee's chemical
114 properties are well-supported, but the main causal claim is refuted
115 by current evidence. The article confuses correlation with causation.
116 Overall assessment: MISLEADING - makes an unsupported medical claim
117 despite citing some accurate facts.{{/code}}
118
119 **Poor Example (Simple Aggregation - Don't Do This):**
120 {{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
121 Overall assessment: mostly accurate (67% accurate).{{/code}}
122 ↑ This misses that the refuted claim IS the article's main point!
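
The context-aware assessment above can be sketched in a few lines. This is an illustrative example, not part of the spec; the claim texts, the 0-1 "centrality" weights, and the binary "supported" flag are invented for demonstration:

```python
# Why article credibility != a simple average of claim verdicts.
# All data and field names here are hypothetical.

def simple_average(claims):
    """Naive aggregation: fraction of claims that are supported."""
    return sum(c["supported"] for c in claims) / len(claims)

def centrality_weighted(claims):
    """Weight each claim by how central it is to the article's thesis."""
    total = sum(c["centrality"] for c in claims)
    return sum(c["supported"] * c["centrality"] for c in claims) / total

claims = [
    {"text": "Coffee contains antioxidants",      "supported": 1, "centrality": 0.1},
    {"text": "Antioxidants occur in many plants", "supported": 1, "centrality": 0.1},
    {"text": "Coffee cures cancer",               "supported": 0, "centrality": 0.8},
]

print(round(simple_average(claims), 2))       # 0.67 -> "mostly accurate"
print(round(centrality_weighted(claims), 2))  # 0.2  -> misleading
```

The naive average rates the article "mostly accurate"; the weighted view shows the refuted claim carries the argument.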
123
124 **What POC1 Tests:**
125
126 Can AI identify and assess:
127
128 * ✅ The article's main thesis/conclusion?
129 * ✅ Which claims are central vs. supporting?
130 * ✅ Whether the evidence supports the conclusion?
131 * ✅ Overall credibility considering logical structure?
132
133 **If AI Cannot Do This:**
134
135 That's valuable to learn in POC1! We'll:
136
137 * Note as limitation
138 * Fall back to simple aggregation with warning
139 * Design explicit article-level analysis for POC2
140
141 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
142
143 **What:** List of factual claims extracted from article
144 **Format:** Numbered list
145 **Quantity:** 3-5 claims
146 **Requirements:**
147
148 * Factual claims only (not opinions/questions)
149 * Clearly stated
150 * Automatically extracted by AI
151
152 **Example:**
153 {{code}}CLAIMS IDENTIFIED:
154
155 [1] Coffee reduces diabetes risk by 30%
156 [2] Coffee improves heart health
157 [3] Decaf has same benefits as regular
158 [4] Coffee prevents Alzheimer's completely{{/code}}
159
160 === 2.3 Component 3: CLAIMS VERDICTS ===
161
162 **What:** Verdict for each claim identified
163 **Format:** Per claim structure
164
165 **Required Elements:**
166
167 * **Verdict Label:** ~~WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED~~
168 //**Current Implementation (v2.6.33):** 7-point symmetric scale://
169 * TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
170 * MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
171 * LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
172 * **Confidence Score:** 0-100%
173 * **Brief Reasoning:** 1-3 sentences explaining why
174 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
175
176 **Example:**
177 {{code}}VERDICTS:
178
179 [1] WELL-SUPPORTED (85%) [Risk: C]
180 Multiple studies confirm 25-30% risk reduction with regular consumption.
181
182 [2] UNCERTAIN (65%) [Risk: B]
183 Evidence is mixed. Some studies show benefits, others show no effect.
184
185 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
186 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
187
188 [4] REFUTED (90%) [Risk: B]
189 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
190
191 **Risk Tier Display:**
192
193 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
194 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
195 * **Tier C (Green):** Low Risk - Facts/Definitions/History
196
197 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
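
The 7-point bands listed above can be expressed as a simple mapping. This is an illustrative sketch, not the implementation; the 50% confidence cutoff separating MIXED from UNVERIFIED is an assumption, since the spec only says "high confidence" vs. "low confidence":

```python
def verdict_label(truth_pct: float, confidence_pct: float) -> str:
    """Map a 0-100 truth score to the 7-point symmetric scale.

    Bands follow the ranges listed above. In the 43-57 band the label
    depends on confidence; the 50% cutoff is an illustrative assumption.
    """
    if truth_pct >= 86:
        return "TRUE"
    if truth_pct >= 72:
        return "MOSTLY-TRUE"
    if truth_pct >= 58:
        return "LEANING-TRUE"
    if truth_pct >= 43:
        return "MIXED" if confidence_pct >= 50 else "UNVERIFIED"
    if truth_pct >= 29:
        return "LEANING-FALSE"
    if truth_pct >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"

print(verdict_label(90, 80))  # TRUE
print(verdict_label(50, 30))  # UNVERIFIED
```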
198
199 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
200
201 **What:** Brief summary of original article content
202 **Length:** 3-5 sentences
203 **Tone:** Neutral (article's position, not FactHarbor's analysis)
204
205 **Example:**
206 {{code}}ARTICLE SUMMARY:
207
208 Health News Today article discusses coffee benefits, citing studies
209 on diabetes and Alzheimer's. Author highlights research linking coffee
210 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
211
212 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
213
214 **What:** LLM usage metrics for cost optimization and scaling decisions
215
216 **Purpose:**
217
218 * Understand cost per analysis
219 * Identify optimization opportunities
220 * Project costs at scale
221 * Inform architecture decisions
222
223 **Display Format:**
224 {{code}}USAGE STATISTICS:
225 • Article: 2,450 words (12,300 characters)
226 • Input tokens: 15,234
227 • Output tokens: 892
228 • Total tokens: 16,126
229 • Estimated cost: $0.24 USD
230 • Response time: 8.3 seconds
231 • Cost per claim: $0.048
232 • Model: claude-sonnet-4-20250514{{/code}}
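
The cost arithmetic behind this display is straightforward. The sketch below is illustrative only: the per-million-token prices are placeholders, not actual provider pricing, so the resulting dollar figure differs from the example above:

```python
# Placeholder prices -- substitute current provider pricing.
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one analysis from token counts."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

cost = estimate_cost(15_234, 892)  # token counts from the example above
print(f"Estimated cost: ${cost:.2f}")
print(f"Cost per claim: ${cost / 5:.3f}")  # assuming 5 claims extracted
```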
233
234 **Why This Matters:**
235
236 At scale, LLM costs are critical:
237
238 * 10,000 articles/month ≈ $200-500/month
239 * 100,000 articles/month ≈ $2,000-5,000/month
240 * Cost optimization can reduce expenses 30-50%
241
242 **What POC1 Learns:**
243
244 * How cost scales with article length
245 * Prompt optimization opportunities (caching, compression)
246 * Output verbosity tradeoffs
247 * Model selection strategy (FAST vs. REASONING roles)
248 * Article length limits (if needed)
249
250 **Implementation:**
251
252 * Claude API already returns usage data
253 * No extra API calls needed
254 * Display to user + log for aggregate analysis
255 * Test with articles of varying lengths
256
257 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
258
259 === 2.6 Total Output Size ===
260
261 **Combined:** 220-350 words
262
263 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
264 * Claims Identification: 30-50 words
265 * Claims Verdicts: 100-150 words
266 * Article Summary: 30-50 words (optional)
267
268 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
269
270 == 3. What's NOT in POC Scope ==
271
272 === 3.1 Feature Exclusions ===
273
274 The following are **explicitly excluded** from POC:
275
276 **Content Features:**
277
278 * ❌ Scenarios (deferred to POC2)
279 * ❌ Evidence display (supporting/opposing lists)
280 * ❌ Source links (clickable references)
281 * ❌ Detailed reasoning chains
282 * ❌ Source quality ratings (shown but not detailed)
283 * ❌ Contradiction detection (basic only)
284 * ❌ Risk assessment (shown but not workflow-integrated)
285
286 **Platform Features:**
287
288 * ❌ User accounts / authentication
289 * ❌ Saved history
290 * ❌ Search functionality
291 * ❌ Claim comparison
292 * ❌ User contributions
293 * ❌ Commenting system
294 * ❌ Social sharing
295
296 **Technical Features:**
297
298 * ❌ Browser extensions
299 * ❌ Mobile apps
300 * ❌ API endpoints
301 * ❌ Webhooks
302 * ❌ Export features (PDF, CSV)
303
304 **Quality Features:**
305
306 * ❌ Accessibility (WCAG compliance)
307 * ❌ Multilingual support
308 * ❌ Mobile optimization
309 * ❌ Media verification (images/videos)
310
311 **Production Features:**
312
313 * ❌ Security hardening
314 * ❌ Privacy compliance (GDPR)
315 * ❌ Terms of service
316 * ❌ Monitoring/logging
317 * ❌ Error tracking
318 * ❌ Analytics
319 * ❌ A/B testing
320
321 == 4. POC Simplifications vs. Full System ==
322
323 === 4.1 Architecture Comparison ===
324
325 **POC Architecture (Simplified):**
326 {{code}}User Input → Single AKEL Call → Output Display
327 (all processing){{/code}}
328
329 **Full System Architecture:**
330 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
331 → Evidence Summarizer → Contradiction Detector → Verdict Generator
332 → Quality Gates → Publication → Output Display{{/code}}
333
334 **Key Differences:**
335
336 |=Aspect|=POC1|=Full System
337 |Processing|Single API call|Multi-component pipeline
338 |Scenarios|None (implicit)|Explicit entities with versioning
339 |Evidence|Basic retrieval|Comprehensive with quality scoring
340 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
341 |Workflow|3 steps (input/process/output)|6 phases with gates
342 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
343 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
344
345 === 4.2 Workflow Comparison ===
346
347 **POC1 Workflow:**
348
349 1. User submits text/URL
350 2. Single AKEL call (all processing in one prompt)
351 3. Display results
352 **Total: 3 steps, 10-18 seconds**
353
354 **Full System Workflow:**
355
356 1. **Claim Submission** (extraction, normalization, clustering)
357 2. **Scenario Building** (definitions, assumptions, boundaries)
358 3. **Evidence Handling** (retrieval, assessment, linking)
359 4. **Verdict Creation** (synthesis, reasoning, approval)
360 5. **Public Presentation** (summaries, landscapes, deep dives)
361 6. **Time Evolution** (versioning, re-evaluation triggers)
362 **Total: 6 phases with quality gates, 10-30 seconds**
363
364 === 4.3 Why POC is Simplified ===
365
366 **Engineering Rationale:**
367
368 1. **Test core capability first:** Can AI do basic fact-checking without humans?
369 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
370 3. **Learn before building:** POC1 insights inform full architecture
371 4. **Iterative approach:** Add complexity only after validating foundations
372 5. **Resource efficiency:** Don't build full system if core concept fails
373
374 **Acceptable Trade-offs:**
375
376 * ✅ POC proves AI capability (most risky assumption)
377 * ✅ POC validates user comprehension (can people understand output?)
378 * ❌ POC doesn't validate full workflow (test in Beta)
379 * ❌ POC doesn't validate scale (test in Beta)
380 * ❌ POC doesn't validate scenario architecture (design in POC2)
381
382 === 4.4 Gap Between POC1 and POC2/Beta ===
383
384 **What needs to be built for POC2:**
385
386 * Scenario generation component
387 * Evidence Model structure (full)
388 * Scenario-evidence linking
389 * Multi-interpretation comparison
390 * Truth landscape visualization
391
392 **What needs to be built for Beta:**
393
394 * Multi-component AKEL pipeline
395 * Quality gate infrastructure
396 * Review workflow system
397 * Audit sampling framework
398 * Production data model
399 * Federation architecture (Release 1.0)
400
401 **POC1 → POC2 is significant architectural expansion.**
402
403 == 5. Publication Mode & Labeling ==
404
405 === 5.1 POC Publication Mode ===
406
407 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
408
409 Per FactHarbor Specification Section 11 "POC v1 Behavior":
410
411 * Produces public AI-generated output
412 * No human approval gate
413 * Clear AI-Generated labeling
414 * All quality gates active (simplified)
415 * Risk tier classification shown (demo)
416
417 === 5.2 User-Facing Labels ===
418
419 **Primary Label (top of analysis):**
420 {{code}}╔════════════════════════════════════════════════════════════╗
421 ║ [AI-GENERATED - POC/DEMO] ║
422 ║ ║
423 ║ This analysis was produced entirely by AI and has not ║
424 ║ been human-reviewed. Use for demonstration purposes. ║
425 ║ ║
426 ║ Source: AI/AKEL v1.0 (POC) ║
427 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
428 ║ Quality Gates: 4/4 Passed (Simplified) ║
429 ║ Last Updated: [timestamp] ║
430 ╚════════════════════════════════════════════════════════════╝{{/code}}
431
432 **Per-Claim Risk Labels:**
433
434 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
435 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
436 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
437
438 === 5.3 Display Requirements ===
439
440 **Must Show:**
441
442 * AI-Generated status (prominent)
443 * POC/Demo disclaimer
444 * Risk tier per claim
445 * Confidence scores (0-100%)
446 * Quality gate status (passed/failed)
447 * Timestamp
448
449 **Must NOT Claim:**
450
451 * Human review
452 * Production quality
453 * Medical/legal advice
454 * Authoritative verdicts
455 * Complete accuracy
456
457 === 5.4 Mode 2 vs. Full System Publication ===
458
459 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
460 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
461 |Review|None|None|Human-Reviewed
462 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
463 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
464 |Risk Display|Demo only|Workflow-integrated|Validated
465 |User Actions|View only|Flag for review|Trust rating
466
467 == 6. Quality Gates (Simplified Implementation) ==
468
469 === 6.1 Overview ===
470
471 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
472
473 **Full System Has 4 Gates:**
474
475 1. Source Quality
476 2. Contradiction Search (MANDATORY)
477 3. Uncertainty Quantification
478 4. Structural Integrity
479
480 **POC Implements Simplified Versions:**
481
482 * Focus on demonstrating concept
483 * Basic implementations sufficient
484 * Failures displayed to user (not blocking)
485 * Full system has comprehensive validation
486
487 === 6.2 Gate 1: Source Quality (Basic) ===
488
489 **Full System Requirements:**
490
491 * Primary sources identified and accessible
492 * Source reliability scored against whitelist
493 * Citation completeness verified
494 * Publication dates checked
495 * Author credentials validated
496
497 **POC Implementation:**
498
499 * ✅ At least 2 sources found
500 * ✅ Sources accessible (URLs valid)
501 * ❌ No whitelist checking
502 * ❌ No credential validation
503 * ❌ No comprehensive reliability scoring
504
505 **Pass Criteria:** ≥2 accessible sources found
506
507 **Failure Handling:** Display error message, don't generate verdict
508
509 === 6.3 Gate 2: Contradiction Search (Basic) ===
510
511 **Full System Requirements:**
512
513 * Counter-evidence actively searched
514 * Reservations and limitations identified
515 * Alternative interpretations explored
516 * Bubble detection (echo chambers, conspiracy theories)
517 * Cross-cultural and international perspectives
518 * Academic literature (supporting AND opposing)
519
520 **POC Implementation:**
521
522 * ✅ Basic search for counter-evidence
523 * ✅ Identify obvious contradictions
524 * ❌ No comprehensive academic search
525 * ❌ No bubble detection
526 * ❌ No systematic alternative interpretation search
527 * ❌ No international perspective verification
528
529 **Pass Criteria:** Basic contradiction search attempted
530
531 **Failure Handling:** Note "limited contradiction search" in output
532
533 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
534
535 **Full System Requirements:**
536
537 * Confidence scores calculated for all claims/verdicts
538 * Limitations explicitly stated
539 * Data gaps identified and disclosed
540 * Strength of evidence assessed
541 * Alternative scenarios considered
542
543 **POC Implementation:**
544
545 * ✅ Confidence scores (0-100%)
546 * ✅ Basic uncertainty acknowledgment
547 * ❌ No detailed limitation disclosure
548 * ❌ No data gap identification
549 * ❌ No alternative scenario consideration (deferred to POC2)
550
551 **Pass Criteria:** Confidence score assigned
552
553 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
554
555 === 6.5 Gate 4: Structural Integrity (Basic) ===
556
557 **Full System Requirements:**
558
559 * No hallucinations detected (fact-checking against sources)
560 * Logic chain valid and traceable
561 * References accessible and verifiable
562 * No circular reasoning
563 * Premises clearly stated
564
565 **POC Implementation:**
566
567 * ✅ Basic coherence check
568 * ✅ References accessible
569 * ❌ No comprehensive hallucination detection
570 * ❌ No formal logic validation
571 * ❌ No premise extraction and verification
572
573 **Pass Criteria:** Output is coherent and references are accessible
574
575 **Failure Handling:** Display error message
576
577 === 6.6 Quality Gate Display ===
578
579 **POC shows simplified status:**
580 {{code}}Quality Gates: 4/4 Passed (Simplified)
581 ✓ Source Quality: 3 sources found
582 ✓ Contradiction Search: Basic search completed
583 ✓ Uncertainty: Confidence scores assigned
584 ✓ Structural Integrity: Output coherent{{/code}}
585
586 **If any gate fails:**
587 {{code}}Quality Gates: 3/4 Passed (Simplified)
588 ✓ Source Quality: 3 sources found
589 ✗ Contradiction Search: Search failed - limited evidence
590 ✓ Uncertainty: Confidence scores assigned
591 ✓ Structural Integrity: Output coherent
592
593 Note: This analysis has limited evidence. Use with caution.{{/code}}
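
Rendering this status block from per-gate results could look like the sketch below. The `(name, passed, detail)` tuple structure is an assumption for illustration, not a defined interface:

```python
def render_gates(results):
    """Render the simplified gate status display from (name, passed, detail) tuples."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        lines.append(f"{'✓' if ok else '✗'} {name}: {detail}")
    if passed < len(results):
        # Gates don't block publication in POC mode; warn instead.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)

print(render_gates([
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
]))
```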
594
595 === 6.7 Simplified vs. Full System ===
596
597 |=Gate|=POC (Simplified)|=Full System
598 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
599 |Contradiction|Basic search|Systematic academic + media + international
600 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
601 |Structural|Coherence check|Hallucination detection, logic validation, premise check
602
603 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
604
605 == 7. AKEL Architecture Comparison ==
606
607 === 7.1 POC AKEL (Simplified) ===
608
609 **Implementation:**
610
611 * Single provider API call (REASONING model)
612 * One comprehensive prompt
613 * All processing in single request
614 * No separate components
615 * No orchestration layer
616
617 **Prompt Structure:**
618 {{code}}Task: Analyze this article and provide:
619
620 1. Extract 3-5 factual claims
621 2. For each claim:
622 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
623 - Assign confidence score (0-100%)
624 - Assign risk tier (A/B/C)
625 - Write brief reasoning (1-3 sentences)
626 3. Generate analysis summary (3-5 sentences)
627 4. Generate article summary (3-5 sentences)
628 5. Run basic quality checks
629
630 Return as structured JSON.{{/code}}
631
632 **Processing Time:** 10-18 seconds (estimate)
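
Since the single call returns structured JSON, the backend mainly needs to validate it. The sketch below is a minimal illustration; the field names (`claims`, `verdict`, `risk_tier`, etc.) are assumptions about the response schema, not a confirmed contract:

```python
import json

def parse_akel_response(raw: str) -> dict:
    """Validate the structured JSON returned by the single AKEL call.

    Field names are assumed for illustration; adjust to the actual schema.
    """
    data = json.loads(raw)
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        for field in ("text", "verdict", "confidence", "risk_tier", "reasoning"):
            if field not in claim:
                raise ValueError(f"claim missing field: {field}")
    return data

raw = json.dumps({"claims": [
    {"text": "Coffee reduces diabetes risk by 30%", "verdict": "WELL-SUPPORTED",
     "confidence": 85, "risk_tier": "C", "reasoning": "Multiple studies."},
    {"text": "Coffee improves heart health", "verdict": "UNCERTAIN",
     "confidence": 65, "risk_tier": "B", "reasoning": "Evidence is mixed."},
    {"text": "Decaf has same benefits as regular", "verdict": "PARTIALLY SUPPORTED",
     "confidence": 60, "risk_tier": "C", "reasoning": "Partial overlap."},
]})
print(len(parse_akel_response(raw)["claims"]))  # 3
```

A parse failure here maps to the POC's "display error message, user can retry" handling, with no manual correction.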
633
634 === 7.2 Full System AKEL (Production) ===
635
636 **Architecture:**
637 {{code}}AKEL Orchestrator
638 ├── Claim Extractor
639 ├── Claim Classifier (with risk tier assignment)
640 ├── Scenario Generator
641 ├── Evidence Summarizer
642 ├── Contradiction Detector
643 ├── Quality Gate Validator
644 ├── Audit Sampling Scheduler
645 └── Federation Sync Adapter (Release 1.0+){{/code}}
646
647 **Processing:**
648
649 * Parallel processing where possible
650 * Separate component calls
651 * Quality gates between phases
652 * Audit sampling selection
653 * Cross-node coordination (federated mode)
654
655 **Processing Time:** 10-30 seconds (full pipeline)
656
657 === 7.3 Why POC Uses Single Call ===
658
659 **Advantages:**
660
661 * ✅ Simpler to implement
662 * ✅ Faster POC development
663 * ✅ Easier to debug
664 * ✅ Proves AI capability
665 * ✅ Good enough for concept validation
666
667 **Limitations:**
668
669 * ❌ No component reusability
670 * ❌ No parallel processing
671 * ❌ All-or-nothing (can't partially succeed)
672 * ❌ Harder to improve individual components
673 * ❌ No audit sampling
674
675 **Acceptable Trade-off:**
676
677 POC tests "Can AI do this?", not "How should we architect it?"
678
679 Full component architecture comes in Beta after POC validates concept.
680
681 === 7.4 Evolution Path ===
682
683 **POC1:** Single prompt → Prove concept
684 **POC2:** Add scenario component → Test full pipeline
685 **Beta:** Multi-component AKEL → Production architecture
686 **Release 1.0:** Full AKEL + Federation → Scale
687
688 == 8. Functional Requirements ==
689
690 === FR-POC-1: Article Input ===
691
692 **Requirement:** User can submit article for analysis
693
694 **Functionality:**
695
696 * Text input field (paste article text, up to 5000 characters)
697 * URL input field (paste article URL)
698 * "Analyze" button to trigger processing
699 * Loading indicator during analysis
700
701 **Excluded:**
702
703 * No user authentication
704 * No claim history
705 * No search functionality
706 * No saved templates
707
708 **Acceptance Criteria:**
709
710 * User can paste text from article
711 * User can paste URL of article
712 * System accepts input and triggers analysis
713
714 === FR-POC-2: Claim Extraction (Fully Automated) ===
715
716 **Requirement:** AI automatically extracts 3-5 factual claims
717
718 **Functionality:**
719
720 * AI reads article text
721 * AI identifies factual claims (not opinions/questions)
722 * AI extracts 3-5 most important claims
723 * System displays numbered list
724
725 **Critical:** NO MANUAL EDITING ALLOWED
726
727 * AI selects which claims to extract
728 * AI identifies factual vs. non-factual
729 * System processes claims as extracted
730 * No human curation or correction
731
732 **Error Handling:**
733
734 * If extraction fails: Display error message
735 * User can retry with different input
736 * No manual intervention to fix extraction
737
738 **Acceptance Criteria:**
739
740 * AI extracts 3-5 claims automatically
741 * Claims are factual (not opinions)
742 * Claims are clearly stated
743 * No manual editing required
744
745 === FR-POC-3: Verdict Generation (Fully Automated) ===
746
747 **Requirement:** AI automatically generates verdict for each claim
748
749 **Functionality:**
750
751 * For each claim, AI:
752 * Evaluates claim based on available evidence/knowledge
753 * Determines verdict: ~~WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED~~ //(Now: 7-point scale - see Section 2.3)//
754 * Assigns confidence score (0-100%)
755 * Assigns risk tier (A/B/C)
756 * Writes brief reasoning (1-3 sentences)
757 * System displays verdict for each claim
758
759 **Critical:** NO MANUAL EDITING ALLOWED
760
761 * AI computes verdicts based on evidence
762 * AI generates confidence scores
763 * AI writes reasoning
764 * No human review or adjustment
765
766 **Error Handling:**
767
768 * If verdict generation fails: Display error message
769 * User can retry
770 * No manual intervention to adjust verdicts
771
772 **Acceptance Criteria:**
773
774 * Each claim has a verdict
775 * Confidence score is displayed (0-100%)
776 * Risk tier is displayed (A/B/C)
777 * Reasoning is understandable (1-3 sentences)
778 * Verdict is defensible given reasoning
779 * All generated automatically by AI
780
781 === FR-POC-4: Analysis Summary (Fully Automated) ===
782
783 **Requirement:** AI generates brief summary of analysis
784
785 **Functionality:**
786
787 * AI summarizes findings in 3-5 sentences:
788 * How many claims found
789 * Distribution of verdicts
790 * Overall assessment
791 * System displays at top of results
792
793 **Critical:** NO MANUAL EDITING ALLOWED
794
795 **Acceptance Criteria:**
796
797 * Summary is coherent
798 * Accurately reflects analysis
799 * 3-5 sentences
800 * Automatically generated
801
802 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
803
804 **Requirement:** AI generates brief summary of original article
805
806 **Functionality:**
807
808 * AI summarizes article content (not FactHarbor's analysis)
809 * 3-5 sentences
810 * System displays
811
812 **Note:** Optional - can skip if time limited
813
814 **Critical:** NO MANUAL EDITING ALLOWED
815
816 **Acceptance Criteria:**
817
818 * Summary is neutral (article's position)
819 * Accurately reflects article content
820 * 3-5 sentences
821 * Automatically generated
822
823 === FR-POC-6: Publication Mode Display ===
824
825 **Requirement:** Clear labeling of AI-generated content
826
827 **Functionality:**
828
829 * Display Mode 2 publication label
830 * Show POC/Demo disclaimer
831 * Display risk tiers per claim
832 * Show quality gate status
833 * Display timestamp
834
835 **Acceptance Criteria:**
836
837 * Label is prominent and clear
838 * User understands this is AI-generated POC output
839 * Risk tiers are color-coded
840 * Quality gate status is visible
841
842 === FR-POC-7: Quality Gate Execution ===
843
844 **Requirement:** Execute simplified quality gates
845
846 **Functionality:**
847
848 * Check source quality (basic)
849 * Attempt contradiction search (basic)
850 * Calculate confidence scores
851 * Verify structural integrity (basic)
852 * Display gate results
853
854 **Acceptance Criteria:**
855
856 * All 4 gates attempted
857 * Pass/fail status displayed
858 * Failures explained to user
859 * Gates don't block publication (POC mode)
860
861 == 9. Non-Functional Requirements ==
862
863 === NFR-POC-1: Fully Automated Processing ===
864
865 **Requirement:** Complete AI automation with zero manual intervention
866
867 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
868
869 **What this means:**
870
871 * Claims: AI selects (no human curation)
872 * Scenarios: N/A (deferred to POC2)
873 * Evidence: AI evaluates (no human selection)
874 * Verdicts: AI determines (no human adjustment)
875 * Summaries: AI writes (no human editing)
876
877 **Pipeline:**
878 {{code}}User Input → AKEL Processing → Output Display
879
880 ZERO human editing{{/code}}
881
882 **If AI output is poor:**
883
884 * ❌ Do NOT manually fix it
885 * ✅ Document the failure
886 * ✅ Improve prompts and retry
887 * ✅ Accept that POC might fail
888
889 **Why this matters:**
890
891 * Tests whether AI can do this without humans
892 * Validates scalability (humans can't review every analysis)
893 * Honest test of technical feasibility
894
895 === NFR-POC-2: Performance ===
896
897 **Requirement:** Analysis completes in reasonable time
898
899 **Acceptable Performance:**
900
901 * Processing time: 1-5 minutes (acceptable for POC)
902 * Display loading indicator to user
903 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
904
905 **Not Required:**
906
907 * Production-level speed (< 30 seconds)
908 * Optimization for scale
909 * Caching
910
911 **Acceptance Criteria:**
912
913 * Analysis completes within 5 minutes
914 * User sees loading indicator
915 * No timeout errors
916
917 === NFR-POC-3: Reliability ===
918
919 **Requirement:** System works for manual testing sessions
920
921 **Acceptable:**
922
923 * Occasional errors (< 20% failure rate)
924 * Manual restart if needed
925 * Display error messages clearly
926
927 **Not Required:**
928
929 * 99.9% uptime
930 * Automatic error recovery
931 * Production monitoring
932
933 **Acceptance Criteria:**
934
935 * System works for test demonstrations
936 * Errors are handled gracefully
937 * User receives clear error messages
938
939 === NFR-POC-4: Environment ===
940
941 **Requirement:** Runs on simple infrastructure
942
943 **Acceptable:**
944
945 * Single machine or simple cloud setup
946 * No distributed architecture
947 * No load balancing
948 * No redundancy
949 * Local development environment viable
950
951 **Not Required:**
952
953 * Production infrastructure
954 * Multi-region deployment
955 * Auto-scaling
956 * Disaster recovery
957
958 === NFR-POC-5: Cost Efficiency Tracking ===
959
960 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
961
962 **Must Track:**
963
964 * Input tokens (article + prompt)
965 * Output tokens (generated analysis)
966 * Total tokens
967 * Estimated cost (USD)
968 * Response time (seconds)
969 * Article length (words/characters)
970
971 **Must Display:**
972
973 * Usage statistics in UI (Component 5)
974 * Cost per analysis
975 * Cost per claim extracted
976
977 **Must Log:**
978
979 * Aggregate metrics for analysis
980 * Cost distribution by article length
981 * Token efficiency trends
982
983 **Purpose:**
984
985 * Understand unit economics
986 * Identify optimization opportunities
987 * Project costs at scale
988 * Inform architecture decisions (caching, model selection, etc.)
989
990 **Acceptance Criteria:**
991
992 * ✅ Usage data displayed after each analysis
993 * ✅ Metrics logged for aggregate analysis
994 * ✅ Cost calculated accurately (Claude API pricing)
995 * ✅ Test cases include varying article lengths
996 * ✅ POC1 report includes cost analysis section
997
998 **Success Target:**
999
1000 * Average cost per analysis < $0.05 USD
1001 * Cost scaling behavior understood (linear/exponential)
1002 * 2+ optimization opportunities identified
1003
1004 **Critical:** Unit economics must be viable for scaling decision!
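The metrics above reduce to a small helper. Note that the per-million-token prices below are placeholder values for illustration, not actual Anthropic pricing, and the field names are assumptions rather than a prescribed schema:

{{code language="python"}}
from dataclasses import dataclass

# Illustrative per-million-token prices (assumed, NOT real Claude API pricing).
PRICE_PER_MTOK_INPUT = 3.00    # USD per 1M input tokens
PRICE_PER_MTOK_OUTPUT = 15.00  # USD per 1M output tokens

@dataclass
class UsageMetrics:
    """Per-analysis usage record, as required by NFR-POC-5."""
    input_tokens: int
    output_tokens: int
    response_time_s: float
    article_chars: int
    claims_extracted: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def estimated_cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_MTOK_INPUT
                + self.output_tokens * PRICE_PER_MTOK_OUTPUT) / 1_000_000

    @property
    def cost_per_claim_usd(self) -> float:
        # Guard against division by zero when no claims were extracted.
        return self.estimated_cost_usd / max(self.claims_extracted, 1)

m = UsageMetrics(input_tokens=4000, output_tokens=1200,
                 response_time_s=42.0, article_chars=9500, claims_extracted=4)
print(f"total={m.total_tokens} cost=${m.estimated_cost_usd:.4f} "
      f"per-claim=${m.cost_per_claim_usd:.4f}")
# → total=5200 cost=$0.0300 per-claim=$0.0075
{{/code}}

At these assumed prices a typical analysis lands around $0.03, inside the < $0.05 success target; logging one such record per analysis yields the aggregate data the requirement asks for.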
1005
1006 == 10. Technical Architecture ==
1007
1008 === 10.1 System Components ===
1009
1010 **Frontend:**
1011
1012 * Simple HTML form (text input + URL input + button)
1013 * Loading indicator
1014 * Results display page (single page, no tabs/navigation)
1015
1016 **Backend:**
1017
1018 * Single API endpoint
1019 * Calls provider API (REASONING model; configured via LLM abstraction)
1020 * Parses response
1021 * Returns JSON to frontend
1022
1023 **Data Storage:**
1024
1025 * None required (stateless POC)
1026 * Optional: Simple file storage or SQLite for demo examples
1027
1028 **External Services:**
1029
1030 * Claude API (Anthropic) - required
1031 * Optional: URL fetch service for article text extraction
1032
1033 === 10.2 Processing Flow ===
1034
1035 {{code}}
1036 1. User submits text or URL
1037
1038 2. Backend receives request
1039
1040 3. If URL: Fetch article text
1041
1042 4. Call Claude API with single prompt:
1043 "Extract claims, evaluate each, provide verdicts"
1044
1045 5. Claude API returns:
1046 - Analysis summary
1047 - Claims list
1048 - Verdicts for each claim (with risk tiers)
1049 - Article summary (optional)
1050 - Quality gate results
1051
1052 6. Backend parses response
1053
1054 7. Frontend displays results with Mode 2 labeling
1055 {{/code}}
1056
1057 **Key Simplification:** Single API call does entire analysis
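The flow above (steps 2-6) can be sketched as a single function with the LLM call injected, so the parsing and structural checks run without network access. The function names, prompt wording, and JSON keys here are illustrative assumptions, not a mandated interface:

{{code language="python"}}
import json

def analyze_article(text: str, llm_call) -> dict:
    """Single-call analysis: build the prompt, call the LLM, parse structured JSON."""
    prompt = (
        "Extract claims, evaluate each, provide verdicts. "
        "Return JSON with keys: summary, claims "
        "(each with verdict, confidence, risk_tier, reasoning).\n\n"
        "Article:\n" + text
    )
    raw = llm_call(prompt)           # in production: the Anthropic SDK call
    result = json.loads(raw)         # fail loudly on malformed output
    for claim in result["claims"]:   # basic structural quality gate
        assert {"verdict", "confidence", "risk_tier"} <= claim.keys()
    return result

def fake_llm(prompt: str) -> str:
    """Stubbed LLM response so the flow runs without network access."""
    return json.dumps({
        "summary": "One claim analysed.",
        "claims": [{"text": "Coffee reduces diabetes risk by 30%",
                    "verdict": "LEANING-TRUE", "confidence": 75,
                    "risk_tier": "C",
                    "reasoning": "Observational studies only."}],
    })

out = analyze_article("Coffee reduces the risk of type 2 diabetes by 30%",
                      fake_llm)
print(out["claims"][0]["verdict"])  # → LEANING-TRUE
{{/code}}

Injecting the LLM call also keeps the backend testable, which matters given the < 20% failure-rate tolerance in NFR-POC-3.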
1058
1059 === 10.3 AI Prompt Strategy ===
1060
1061 **Single Comprehensive Prompt:**
1062 {{code}}Task: Analyze this article and provide:
1063
1064 1. Identify the article's main thesis/conclusion
1065 - What is the article trying to argue or prove?
1066 - What is the primary claim or conclusion?
1067
1068 2. Extract 3-5 factual claims from the article
1069 - Note which claims are CENTRAL to the main thesis
1070 - Note which claims are SUPPORTING facts
1071
1072 3. For each claim:
1073 - Determine verdict (7-point scale: TRUE → MOSTLY-TRUE → LEANING-TRUE → MIXED/UNVERIFIED → LEANING-FALSE → MOSTLY-FALSE → FALSE)
1074 - Assign confidence score (0-100%)
1075 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1076 - Write brief reasoning (1-3 sentences)
1077
1078 4. Assess relationship between claims and main thesis:
1079 - Do the claims actually support the article's conclusion?
1080 - Are there logical leaps or unsupported inferences?
1081 - Is the article's framing misleading even if individual facts are accurate?
1082
1083 5. Run quality gates:
1084 - Check: ≥2 sources found
1085 - Attempt: Basic contradiction search
1086 - Calculate: Confidence scores
1087 - Verify: Structural integrity
1088
1089 6. Write context-aware analysis summary (4-6 sentences):
1090 - State article's main thesis
1091 - Report claims found and verdict distribution
1092 - Note if central claims are problematic
1093 - Assess whether evidence supports conclusion
1094 - Overall credibility considering claim importance
1095
1096 7. Write article summary (3-5 sentences: neutral summary of article content)
1097
1098 Return as structured JSON with quality gate results.{{/code}}
1099
1100 **One prompt generates everything.**
1101
1102 **Critical Addition:**
1103
1104 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article that is "accurate facts, poorly reasoned" from one that is genuinely credible.
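For illustration only, the structured JSON the prompt requests might look like the following; every field name here is an assumption, not a fixed schema:

{{code language="python"}}
# Hypothetical response shape covering prompt steps 1-7.
# All keys and values are illustrative assumptions.
example_response = {
    "main_thesis": "New policy X will cut emissions in half.",
    "claims": [
        {"text": "Emissions fell 5% last year", "role": "CENTRAL",
         "verdict": "MOSTLY-TRUE", "confidence": 80, "risk_tier": "B",
         "reasoning": "Consistent with official statistics."},
        {"text": "Policy X was enacted in 2024", "role": "SUPPORTING",
         "verdict": "TRUE", "confidence": 95, "risk_tier": "C",
         "reasoning": "Well-documented public record."},
    ],
    "thesis_assessment": ("Individual facts hold, but the conclusion "
                          "requires a logical leap they do not support."),
    "quality_gates": {"sources_found": 2, "contradiction_search": "attempted",
                      "confidence_calculated": True,
                      "structural_integrity": True},
    "analysis_summary": "The article's central claim is only partially supported...",
    "article_summary": "Neutral summary of the article content...",
}
central = [c for c in example_response["claims"] if c["role"] == "CENTRAL"]
print(len(central))  # → 1
{{/code}}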
1105
1106 === 10.4 Technology Stack Suggestions ===
1107
1108 **Frontend:**
1109
1110 * HTML + CSS + JavaScript (minimal framework)
1111 * OR: Next.js (if team prefers)
1112 * Hosted: Local machine OR Vercel/Netlify free tier
1113
1114 **Backend:**
1115
1116 * Python Flask/FastAPI (simple REST API)
1117 * OR: Next.js API routes (if using Next.js)
1118 * Hosted: Local machine OR Railway/Render free tier
1119
1120 **AKEL Integration:**
1121
1122 * Claude API via Anthropic SDK
1123 * Model: Provider-default REASONING model or latest available
1124
1125 **Database:**
1126
1127 * None (stateless acceptable)
1128 * OR: SQLite if the team wants to store demo examples
1129 * OR: JSON files on disk
1130
1131 **Deployment:**
1132
1133 * Local development environment sufficient for POC
1134 * Optional: Deploy to cloud for remote demos
1135
1136 == 11. Success Criteria ==
1137
1138 === 11.1 Minimum Success (POC Passes) ===
1139
1140 **Required for GO decision:**
1141
1142 * ✅ AI extracts 3-5 factual claims automatically
1143 * ✅ AI provides verdict for each claim automatically
1144 * ✅ Verdicts are reasonable (≥70% make logical sense)
1145 * ✅ Analysis summary is coherent
1146 * ✅ Output is comprehensible to reviewers
1147 * ✅ Team/advisors understand the output
1148 * ✅ Team agrees approach has merit
1149 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1150 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1151 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1152 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1153
1154 **Quality Definition:**
1155
1156 * "Reasonable verdict" = Defensible given general knowledge
1157 * "Coherent summary" = Logically structured, grammatically correct
1158 * "Comprehensible" = Reviewers understand what analysis means
1159
1160 === 11.2 POC Fails If ===
1161
1162 **Automatic NO-GO if any of these:**
1163
1164 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1165 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1166 * ❌ Output incomprehensible (reviewers can't understand analysis)
1167 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1168 * ❌ Team loses confidence in AI-automated approach
1169
1170 === 11.3 Quality Thresholds ===
1171
1172 **POC quality expectations:**
1173
1174 |=Component|=Quality Threshold|=Definition
1175 |Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1176 |Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1177 |Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1178 |Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1179
1180 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1181
1182 **Not expecting:**
1183
1184 * 100% accuracy
1185 * Perfect claim coverage
1186 * Comprehensive evidence gathering
1187 * Flawless verdicts
1188 * Production polish
1189
1190 **Expecting:**
1191
1192 * Reasonable claim extraction
1193 * Defensible verdicts
1194 * Understandable reasoning
1195 * Useful output
1196
1197 == 12. Test Cases ==
1198
1199 === 12.1 Test Case 1: Simple Factual Claim ===
1200
1201 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1202
1203 **Expected Output:**
1204
1205 * Extract claim correctly
1206 * Provide verdict: MOSTLY-TRUE or LEANING-TRUE
1207 * Confidence: 70-90%
1208 * Risk tier: C (Low)
1209 * Reasoning: Mentions studies or evidence
1210
1211 **Success:** Verdict is reasonable and reasoning makes sense
1212
1213 === 12.2 Test Case 2: Complex News Article ===
1214
1215 **Input:** News article URL with multiple claims about politics/health/science
1216
1217 **Expected Output:**
1218
1219 * Extract 3-5 key claims
1220 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1221 * Coherent analysis summary
1222 * Article summary
1223 * Risk tiers assigned appropriately
1224
1225 **Success:** Claims identified are actually from article, verdicts are reasonable
1226
1227 === 12.3 Test Case 3: Controversial Topic ===
1228
1229 **Input:** Article on contested political or scientific topic
1230
1231 **Expected Output:**
1232
1233 * Balanced analysis
1234 * Acknowledges uncertainty where appropriate
1235 * Doesn't overstate confidence
1236 * Reasoning shows awareness of complexity
1237
1238 **Success:** Analysis is fair and doesn't show obvious bias
1239
1240 === 12.4 Test Case 4: Clearly False Claim ===
1241
1242 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1243
1244 **Expected Output:**
1245
1246 * Extract claim
1247 * Verdict: FALSE
1248 * High confidence (> 90%)
1249 * Risk tier: C (Low - established fact)
1250 * Clear reasoning
1251
1252 **Success:** AI correctly identifies false claim with high confidence
1253
1254 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1255
1256 **Input:** Article with claim where evidence is genuinely mixed
1257
1258 **Expected Output:**
1259
1260 * Extract claim
1261 * Verdict: MIXED/UNVERIFIED
1262 * Moderate confidence (40-60%)
1263 * Reasoning explains why uncertain
1264
1265 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1266
1267 === 12.6 Test Case 6: High-Risk Medical Claim ===
1268
1269 **Input:** Article making medical claims
1270
1271 **Expected Output:**
1272
1273 * Extract claim
1274 * Verdict: [appropriate based on evidence]
1275 * Risk tier: A (High - medical)
1276 * Red label displayed
1277 * Clear disclaimer about not being medical advice
1278
1279 **Success:** Risk tier correctly assigned, appropriate warnings shown
1280
1281 == 13. POC Decision Gate ==
1282
1283 === 13.1 Decision Framework ===
1284
1285 After POC testing complete, team makes one of three decisions:
1286
1287 **Option A: GO (Proceed to POC2)**
1288
1289 **Conditions:**
1290
1291 * AI quality ≥70% without manual editing
1292 * Basic claim → verdict pipeline validated
1293 * Internal + advisor feedback positive
1294 * Technical feasibility confirmed
1295 * Team confident in direction
1296 * Clear path to improving AI quality to ≥90%
1297
1298 **Next Steps:**
1299
1300 * Plan POC2 development (add scenarios)
1301 * Design scenario architecture
1302 * Expand to Evidence Model structure
1303 * Test with more complex articles
1304
1305 **Option B: NO-GO (Pivot or Stop)**
1306
1307 **Conditions:**
1308
1309 * AI quality < 60%
1310 * Requires manual editing for most analyses (> 50%)
1311 * Feedback indicates fundamental flaws
1312 * Cost/effort not justified by value
1313 * No clear path to improvement
1314
1315 **Next Steps:**
1316
1317 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1318 * **Stop:** Conclude approach not viable, revisit later
1319
1320 **Option C: ITERATE (Improve POC)**
1321
1322 **Conditions:**
1323
1324 * Concept has merit but execution needs work
1325 * Specific improvements identified
1326 * Addressable with better prompts/approach
1327 * AI quality 60-70%
1328
1329 **Next Steps:**
1330
1331 * Improve AI prompts
1332 * Test different approaches
1333 * Re-run POC with improvements
1334 * Then make GO/NO-GO decision
1335
1336 === 13.2 Decision Criteria Summary ===
1337
1338 {{code}}
1339 AI Quality < 60% → NO-GO (approach doesn't work)
1340 AI Quality 60-70% → ITERATE (improve and retry)
1341 AI Quality ≥70% → GO (proceed to POC2)
1342 {{/code}}
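The mapping above can be written as a one-line decision function (a sketch using the thresholds exactly as stated):

{{code language="python"}}
def poc_decision(ai_quality_pct: float) -> str:
    """Map measured AI quality to the POC gate decision."""
    if ai_quality_pct >= 70:
        return "GO"        # proceed to POC2
    if ai_quality_pct >= 60:
        return "ITERATE"   # improve prompts and retry
    return "NO-GO"         # approach doesn't work

print(poc_decision(72), poc_decision(65), poc_decision(55))
# → GO ITERATE NO-GO
{{/code}}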
1343
1344 == 14. Key Risks & Mitigations ==
1345
1346 === 14.1 Risk: AI Quality Not Good Enough ===
1347
1348 **Likelihood:** Medium-High
1349 **Impact:** POC fails
1350
1351 **Mitigation:**
1352
1353 * Extensive prompt engineering and testing
1354 * Use best available AI models (role-based selection; configured via LLM abstraction)
1355 * Test with diverse article types
1356 * Iterate on prompts based on results
1357
1358 **Acceptance:** This is what POC tests - be ready for failure
1359
1360 === 14.2 Risk: AI Consistency Issues ===
1361
1362 **Likelihood:** Medium
1363 **Impact:** Works sometimes, fails other times
1364
1365 **Mitigation:**
1366
1367 * Test with 10+ diverse articles
1368 * Measure success rate honestly
1369 * Improve prompts to increase consistency
1370
1371 **Acceptance:** Some variability OK if average quality ≥70%
1372
1373 === 14.3 Risk: Output Incomprehensible ===
1374
1375 **Likelihood:** Low-Medium
1376 **Impact:** Users can't understand analysis
1377
1378 **Mitigation:**
1379
1380 * Create clear explainer document
1381 * Iterate on output format
1382 * Test with non-technical reviewers
1383 * Simplify language if needed
1384
1385 **Acceptance:** Iterate until comprehensible
1386
1387 === 14.4 Risk: API Rate Limits / Costs ===
1388
1389 **Likelihood:** Low
1390 **Impact:** System slow or expensive
1391
1392 **Mitigation:**
1393
1394 * Monitor API usage
1395 * Implement retry logic
1396 * Estimate costs before scaling
1397
1398 **Acceptance:** POC can be slow and expensive (optimization later)
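The retry-logic mitigation might look like simple exponential backoff around the API call; `RetryableAPIError` and `call_llm` are placeholders for whatever the chosen SDK actually raises and exposes:

{{code language="python"}}
import time

class RetryableAPIError(Exception):
    """Placeholder for rate-limit / transient errors from the provider SDK."""

def call_with_retry(call_llm, prompt: str, max_attempts: int = 4,
                    base_delay_s: float = 1.0):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call_llm(prompt)
        except RetryableAPIError:
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the error to the caller
            time.sleep(base_delay_s * 2 ** attempt)

# Demo: a stub that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_llm(prompt):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableAPIError("rate limited")
    return "ok"

print(call_with_retry(flaky_llm, "analyze", base_delay_s=0.01))  # → ok
{{/code}}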
1399
1400 === 14.5 Risk: Scope Creep ===
1401
1402 **Likelihood:** Medium
1403 **Impact:** POC becomes too complex
1404
1405 **Mitigation:**
1406
1407 * Strict scope discipline
1408 * Say NO to feature additions
1409 * Keep focus on core question
1410
1411 **Acceptance:** POC is minimal by design
1412
1413 == 15. POC Philosophy ==
1414
1415 === 15.1 Core Principles ===
1416
1417 **1. Build Less, Learn More**
1418
1420 * Minimum features to test hypothesis
1421 * Don't build unvalidated features
1422 * Focus on core question only
1423
1424 **2. Fail Fast**
1425
1426 * Quick test of hardest part (AI capability)
1427 * Accept that POC might fail
1428 * Better to discover issues early
1429 * Honest assessment over optimistic hope
1430
1431 **3. Test First, Build Second**
1432
1433 * Validate AI can do this before building platform
1434 * Don't assume it will work
1435 * Let results guide decisions
1436
1437 **4. Automation First**
1438
1439 * No manual editing allowed
1440 * Tests scalability, not just feasibility
1441 * Proves approach can work at scale
1442
1443 **5. Honest Assessment**
1444
1445 * Don't cherry-pick examples
1446 * Don't manually fix bad outputs
1447 * Document failures openly
1448 * Make data-driven decisions
1449
1450 === 15.2 What POC Is ===
1451
1452 ✅ Testing AI capability without humans
1453 ✅ Proving core technical concept
1454 ✅ Fast validation of approach
1455 ✅ Honest assessment of feasibility
1456
1457 === 15.3 What POC Is NOT ===
1458
1459 ❌ Building a product
1460 ❌ Production-ready system
1461 ❌ Feature-complete platform
1462 ❌ Perfectly accurate analysis
1463 ❌ Polished user experience
1464
1465 == 16. Success: Clear Path Forward ==
1468
1469 **If POC succeeds (≥70% AI quality):**
1470
1471 * ✅ Approach validated
1472 * ✅ Proceed to POC2 (add scenarios)
1473 * ✅ Design full Evidence Model structure
1474 * ✅ Test multi-scenario comparison
1475 * ✅ Focus on improving AI quality from 70% → 90%
1476
1477 **If POC fails (< 60% AI quality):**
1478
1479 * ✅ Learn what doesn't work
1480 * ✅ Pivot to different approach
1481 * ✅ OR wait for better AI technology
1482 * ✅ Avoid wasting resources on non-viable approach
1483
1484 **Either way, POC provides clarity.**
1485
1486 == 17. Related Pages ==
1487
1488 * [[User Needs>>Archive.FactHarbor 2026\.02\.08.Specification.Requirements.User Needs.WebHome]]
1489 * [[Requirements>>Archive.FactHarbor 2026\.02\.08.Specification.Requirements.WebHome]]
1490 * [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1491 * [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]
1492 * [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1493 * [[Workflows>>Archive.FactHarbor 2026\.02\.08.Specification.Workflows.WebHome]]
1494
1495 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1496
1497
1498 === NFR-POC-11: LLM Provider Abstraction (POC1) ===
1499
1500 **Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1501
1502 **POC1 Implementation:**
1503
1504 * **Primary Provider:** Anthropic Claude API
1505 * Stage 1: Provider-default FAST model
1506 * Stage 2: Provider-default REASONING model (cached)
1507 * Stage 3: Provider-default REASONING model
1508
1509 * **Provider Interface:** Abstract LLMProvider interface implemented
1510
1511 * **Configuration:** Environment variables for provider selection
1512 * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1513 * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1514 * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1515
1516 * **Failover:** Basic error handling with cache fallback for Stage 2
1517
1518 * **Cost Tracking:** Log provider name and cost per request
1519
1520 **Future (POC2/Beta):**
1521
1522 * Secondary provider (OpenAI) with automatic failover
1523 * Admin API for runtime provider switching
1524 * Cost comparison dashboard
1525 * Cross-provider output verification
1526
1527 **Success Criteria:**
1528
1529 * All LLM calls go through abstraction layer (no direct API calls)
1530 * Provider can be changed via environment variable without code changes
1531 * Cost tracking includes provider name in logs
1532 * Stage 2 falls back to cache on provider failure
1533
1534 **Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1535
1536 **Dependencies:**
1537
1538 * NFR-14 (Main Requirements)
1539 * Design Decision 9
1540 * Architecture Section 2.2
1541
1542 **Priority:** HIGH (P1)
1543
1544 **Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
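A minimal sketch of the abstraction this NFR requires, assuming the environment variables listed above; the interface details (method names, registry) are illustrative, not prescribed:

{{code language="python"}}
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Abstract interface: every LLM call goes through this, never a raw SDK call."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, model: str, prompt: str) -> str:
        # A real implementation would call the Anthropic SDK here;
        # this stub just echoes the model name for demonstration.
        return f"[{model}] response"

_REGISTRY = {"anthropic": AnthropicProvider}

def get_provider() -> LLMProvider:
    """Select the provider from LLM_PRIMARY_PROVIDER, with no code changes."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _REGISTRY[name]()

provider = get_provider()
stage1_model = os.environ.get("LLM_STAGE1_MODEL", "claude-haiku-4")
print(provider.complete(stage1_model, "Extract claims..."))
{{/code}}

Adding a secondary provider in POC2 then means registering one more class in `_REGISTRY`, with no call-site changes, which is the refactoring cost this NFR is designed to avoid.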