Last modified by Robert Schaub on 2025/12/24 18:27

From version 2.1
edited by Robert Schaub
on 2025/12/24 13:58
Change comment: Imported from XAR
To version 3.2
edited by Robert Schaub
on 2025/12/24 18:26
Change comment: Renamed back-links.

Details

Page properties
Content
... ... @@ -1,7 +1,7 @@
1 1  = POC Requirements =
2 2  
3 -**Status:** ✅ Approved for Development
4 -**Version:** 2.0 (Updated after Specification Cross-Check)
3 +**Status:** ✅ Approved for Development
4 +**Version:** 2.0 (Updated after Specification Cross-Check)
5 5  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 7  == 1. POC Overview ==
... ... @@ -9,9 +9,11 @@
9 9  === 1.1 What POC Tests ===
10 10  
11 11  **Core Question:**
12 +
12 12  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
13 13  
14 14  **What we're proving:**
16 +
15 15  * AI can identify factual claims from text
16 16  * AI can evaluate those claims and produce verdicts
17 17  * Output is comprehensible and useful
... ... @@ -18,6 +18,7 @@
18 18  * Fully automated approach is viable
19 19  
20 20  **What we're NOT testing:**
23 +
21 21  * Scenario generation (deferred to POC2)
22 22  * Evidence display (deferred to POC2)
23 23  * Production scalability
... ... @@ -31,6 +31,7 @@
31 31  Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
32 32  
33 33  **Rationale:**
37 +
34 34  * **POC1 tests:** Can AI extract claims and generate verdicts?
35 35  * **POC2 will add:** Scenario generation and management
36 36  * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
... ... @@ -42,6 +42,7 @@
42 42  **No Risk:**
43 43  
44 44  Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49 +
45 45  * Faster POC1 validation
46 46  * Learning from POC1 to inform scenario design
47 47  * Iterative approach: fail fast if basic AI doesn't work
... ... @@ -48,14 +48,10 @@
48 48  * Flexibility to adjust scenario architecture based on POC1 insights
49 49  
50 50  **Full System Workflow (Future):**
51 -{{code}}
52 -Claims → Scenarios → Evidence → Verdicts
53 -{{/code}}
56 +{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
54 54  
55 55  **POC1 Simplified Workflow:**
56 -{{code}}
57 -Claims → Verdicts (scenarios implicit in reasoning)
58 -{{/code}}
59 +{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
59 59  
60 60  == 2. POC Output Specification ==
61 61  
... ... @@ -63,9 +63,10 @@
63 63  
64 64  **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
65 65  
66 -**Length:** 4-6 sentences
67 +**Length:** 4-6 sentences
67 67  
68 68  **Content (Required Elements):**
70 +
69 69  1. **Article's main thesis/claim** - What is the article trying to argue or prove?
70 70  2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
71 71  3. **Central vs. supporting claims** - Which claims are central to the article's argument?
... ... @@ -75,30 +75,28 @@
75 75  **Critical Innovation:**
76 76  
77 77  POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
80 +
78 78  * Make accurate supporting facts but draw unsupported conclusions
79 79  * Have one false central claim that invalidates the whole argument
80 80  * Misframe accurate information to mislead
81 81  
82 82  **Good Example (Context-Aware):**
83 -{{code}}
84 -This article argues that coffee cures cancer based on its antioxidant
86 +{{code}}This article argues that coffee cures cancer based on its antioxidant
85 85  content. We analyzed 3 factual claims: 2 about coffee's chemical
86 86  properties are well-supported, but the main causal claim is refuted
87 87  by current evidence. The article confuses correlation with causation.
88 88  Overall assessment: MISLEADING - makes an unsupported medical claim
89 -despite citing some accurate facts.
90 -{{/code}}
91 +despite citing some accurate facts.{{/code}}
91 91  
92 92  **Poor Example (Simple Aggregation - Don't Do This):**
93 -{{code}}
94 -This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 -Overall assessment: mostly accurate (67% accurate).
96 -{{/code}}
94 +{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 +Overall assessment: mostly accurate (67% accurate).{{/code}}
97 97  ↑ This misses that the refuted claim IS the article's main point!
98 98  
99 99  **What POC1 Tests:**
100 100  
101 101  Can AI identify and assess:
101 +
102 102  * ✅ The article's main thesis/conclusion?
103 103  * ✅ Which claims are central vs. supporting?
104 104  * ✅ Whether the evidence supports the conclusion?
... ... @@ -107,6 +107,7 @@
107 107  **If AI Cannot Do This:**
108 108  
109 109  That's valuable to learn in POC1! We'll:
110 +
110 110  * Note as limitation
111 111  * Fall back to simple aggregation with warning
112 112  * Design explicit article-level analysis for POC2
... ... @@ -113,30 +113,30 @@
113 113  
114 114  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
115 115  
116 -**What:** List of factual claims extracted from article
117 -**Format:** Numbered list
118 -**Quantity:** 3-5 claims
117 +**What:** List of factual claims extracted from article
118 +**Format:** Numbered list
119 +**Quantity:** 3-5 claims
119 119  **Requirements:**
121 +
120 120  * Factual claims only (not opinions/questions)
121 121  * Clearly stated
122 122  * Automatically extracted by AI
123 123  
124 124  **Example:**
125 -{{code}}
126 -CLAIMS IDENTIFIED:
127 +{{code}}CLAIMS IDENTIFIED:
127 127  
128 128  [1] Coffee reduces diabetes risk by 30%
129 129  [2] Coffee improves heart health
130 130  [3] Decaf has same benefits as regular
131 -[4] Coffee prevents Alzheimer's completely
132 -{{/code}}
132 +[4] Coffee prevents Alzheimer's completely{{/code}}
133 133  
134 134  === 2.3 Component 3: CLAIMS VERDICTS ===
135 135  
136 -**What:** Verdict for each claim identified
137 -**Format:** Per claim structure
136 +**What:** Verdict for each claim identified
137 +**Format:** Per claim structure
138 138  
139 139  **Required Elements:**
140 +
140 140  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
141 141  * **Confidence Score:** 0-100%
142 142  * **Brief Reasoning:** 1-3 sentences explaining why
... ... @@ -143,8 +143,7 @@
143 143  * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
144 144  
145 145  **Example:**
146 -{{code}}
147 -VERDICTS:
147 +{{code}}VERDICTS:
148 148  
149 149  [1] WELL-SUPPORTED (85%) [Risk: C]
150 150  Multiple studies confirm 25-30% risk reduction with regular consumption.
... ... @@ -156,12 +156,12 @@
156 156  Some benefits overlap, but caffeine-related benefits are reduced in decaf.
157 157  
158 158  [4] REFUTED (90%) [Risk: B]
159 -No evidence for complete prevention. Claim is significantly overstated.
160 -{{/code}}
159 +No evidence for complete prevention. Claim is significantly overstated.{{/code}}
161 161  
162 162  **Risk Tier Display:**
162 +
163 163  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
164 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
164 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
165 165  * **Tier C (Green):** Low Risk - Facts/Definitions/History
166 166  
167 167  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
... ... @@ -168,18 +168,16 @@
168 168  
169 169  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
170 170  
171 -**What:** Brief summary of original article content
172 -**Length:** 3-5 sentences
171 +**What:** Brief summary of original article content
172 +**Length:** 3-5 sentences
173 173  **Tone:** Neutral (article's position, not FactHarbor's analysis)
174 174  
175 175  **Example:**
176 -{{code}}
177 -ARTICLE SUMMARY:
176 +{{code}}ARTICLE SUMMARY:
178 178  
179 179  Health News Today article discusses coffee benefits, citing studies
180 180  on diabetes and Alzheimer's. Author highlights research linking coffee
181 -to disease prevention. Recommends 2-3 cups daily for optimal health.
182 -{{/code}}
180 +to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
183 183  
184 184  === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
185 185  
... ... @@ -186,6 +186,7 @@
186 186  **What:** LLM usage metrics for cost optimization and scaling decisions
187 187  
188 188  **Purpose:**
187 +
189 189  * Understand cost per analysis
190 190  * Identify optimization opportunities
191 191  * Project costs at scale
... ... @@ -192,8 +192,7 @@
192 192  * Inform architecture decisions
193 193  
194 194  **Display Format:**
195 -{{code}}
196 -USAGE STATISTICS:
194 +{{code}}USAGE STATISTICS:
197 197  • Article: 2,450 words (12,300 characters)
198 198  • Input tokens: 15,234
199 199  • Output tokens: 892
... ... @@ -201,17 +201,18 @@
201 201  • Estimated cost: $0.24 USD
202 202  • Response time: 8.3 seconds
203 203  • Cost per claim: $0.048
204 -• Model: claude-sonnet-4-20250514
205 -{{/code}}
202 +• Model: claude-sonnet-4-20250514{{/code}}
206 206  
207 207  **Why This Matters:**
208 208  
209 209  At scale, LLM costs are critical:
207 +
210 210  * 10,000 articles/month ≈ $200-500/month
211 211  * 100,000 articles/month ≈ $2,000-5,000/month
212 212  * Cost optimization can reduce expenses 30-50%
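As a rough illustration of the arithmetic behind these projections, a minimal cost-estimator sketch follows; the per-million-token prices are placeholders and should be replaced with the current Claude API price list.

{{code}}
# Hypothetical cost estimator for Component 5 (Usage Statistics).
# PRICE_* values are placeholders, not official Anthropic pricing.
PRICE_PER_M_INPUT_TOKENS = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT_TOKENS = 15.00  # USD per 1M output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int, claims: int) -> dict:
    """Estimate cost per analysis and per extracted claim."""
    cost = (input_tokens / 1_000_000) * PRICE_PER_M_INPUT_TOKENS \
         + (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT_TOKENS
    return {
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 4),
        "cost_per_claim_usd": round(cost / claims, 4) if claims else None,
    }

# Example call with the token counts from the display format above:
print(estimate_cost(input_tokens=15_234, output_tokens=892, claims=5))
{{/code}}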
213 213  
214 214  **What POC1 Learns:**
213 +
215 215  * How cost scales with article length
216 216  * Prompt optimization opportunities (caching, compression)
217 217  * Output verbosity tradeoffs
... ... @@ -219,6 +219,7 @@
219 219  * Article length limits (if needed)
220 220  
221 221  **Implementation:**
221 +
222 222  * Claude API already returns usage data
223 223  * No extra API calls needed
224 224  * Display to user + log for aggregate analysis
... ... @@ -228,7 +228,8 @@
228 228  
229 229  === 2.6 Total Output Size ===
230 230  
231 -**Combined:** ~220-350 words
231 +**Combined:** 220-350 words
232 +
232 232  * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
233 233  * Claims Identification: 30-50 words
234 234  * Claims Verdicts: 100-150 words
... ... @@ -243,6 +243,7 @@
243 243  The following are **explicitly excluded** from POC:
244 244  
245 245  **Content Features:**
247 +
246 246  * ❌ Scenarios (deferred to POC2)
247 247  * ❌ Evidence display (supporting/opposing lists)
248 248  * ❌ Source links (clickable references)
... ... @@ -252,6 +252,7 @@
252 252  * ❌ Risk assessment (shown but not workflow-integrated)
253 253  
254 254  **Platform Features:**
257 +
255 255  * ❌ User accounts / authentication
256 256  * ❌ Saved history
257 257  * ❌ Search functionality
... ... @@ -261,6 +261,7 @@
261 261  * ❌ Social sharing
262 262  
263 263  **Technical Features:**
267 +
264 264  * ❌ Browser extensions
265 265  * ❌ Mobile apps
266 266  * ❌ API endpoints
... ... @@ -268,6 +268,7 @@
268 268  * ❌ Export features (PDF, CSV)
269 269  
270 270  **Quality Features:**
275 +
271 271  * ❌ Accessibility (WCAG compliance)
272 272  * ❌ Multilingual support
273 273  * ❌ Mobile optimization
... ... @@ -274,6 +274,7 @@
274 274  * ❌ Media verification (images/videos)
275 275  
276 276  **Production Features:**
282 +
277 277  * ❌ Security hardening
278 278  * ❌ Privacy compliance (GDPR)
279 279  * ❌ Terms of service
... ... @@ -287,17 +287,13 @@
287 287  === 4.1 Architecture Comparison ===
288 288  
289 289  **POC Architecture (Simplified):**
290 -{{code}}
291 -User Input → Single AKEL Call → Output Display
292 - (all processing)
293 -{{/code}}
296 +{{code}}User Input → Single AKEL Call → Output Display
297 + (all processing){{/code}}
294 294  
295 295  **Full System Architecture:**
296 -{{code}}
297 -User Input → Claim Extractor → Claim Classifier → Scenario Generator
300 +{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
298 298  → Evidence Summarizer → Contradiction Detector → Verdict Generator
299 -→ Quality Gates → Publication → Output Display
300 -{{/code}}
302 +→ Quality Gates → Publication → Output Display{{/code}}
301 301  
302 302  **Key Differences:**
303 303  
... ... @@ -313,12 +313,14 @@
313 313  === 4.2 Workflow Comparison ===
314 314  
315 315  **POC1 Workflow:**
318 +
316 316  1. User submits text/URL
317 317  2. Single AKEL call (all processing in one prompt)
318 318  3. Display results
319 -**Total: 3 steps, ~10-18 seconds**
322 +**Total: 3 steps, 10-18 seconds**
320 320  
321 321  **Full System Workflow:**
325 +
322 322  1. **Claim Submission** (extraction, normalization, clustering)
323 323  2. **Scenario Building** (definitions, assumptions, boundaries)
324 324  3. **Evidence Handling** (retrieval, assessment, linking)
... ... @@ -325,7 +325,7 @@
325 325  4. **Verdict Creation** (synthesis, reasoning, approval)
326 326  5. **Public Presentation** (summaries, landscapes, deep dives)
327 327  6. **Time Evolution** (versioning, re-evaluation triggers)
328 -**Total: 6 phases with quality gates, ~10-30 seconds**
332 +**Total: 6 phases with quality gates, 10-30 seconds**
329 329  
330 330  === 4.3 Why POC is Simplified ===
331 331  
... ... @@ -348,6 +348,7 @@
348 348  === 4.4 Gap Between POC1 and POC2/Beta ===
349 349  
350 350  **What needs to be built for POC2:**
355 +
351 351  * Scenario generation component
352 352  * Evidence Model structure (full)
353 353  * Scenario-evidence linking
... ... @@ -355,6 +355,7 @@
355 355  * Truth landscape visualization
356 356  
357 357  **What needs to be built for Beta:**
363 +
358 358  * Multi-component AKEL pipeline
359 359  * Quality gate infrastructure
360 360  * Review workflow system
... ... @@ -371,6 +371,7 @@
371 371  **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
372 372  
373 373  Per FactHarbor Specification Section 11 "POC v1 Behavior":
380 +
374 374  * Produces public AI-generated output
375 375  * No human approval gate
376 376  * Clear AI-Generated labeling
... ... @@ -380,21 +380,20 @@
380 380  === 5.2 User-Facing Labels ===
381 381  
382 382  **Primary Label (top of analysis):**
383 -{{code}}
384 -╔════════════════════════════════════════════════════════════╗
385 -║ [AI-GENERATED - POC/DEMO] ║
386 -║ ║
387 -║ This analysis was produced entirely by AI and has not ║
388 -║ been human-reviewed. Use for demonstration purposes. ║
389 -║ ║
390 -║ Source: AI/AKEL v1.0 (POC) ║
391 -║ Review Status: Not Reviewed (Proof-of-Concept) ║
392 -║ Quality Gates: 4/4 Passed (Simplified) ║
393 -║ Last Updated: [timestamp] ║
394 -╚════════════════════════════════════════════════════════════╝
395 -{{/code}}
390 +{{code}}╔════════════════════════════════════════════════════════════╗
391 +║ [AI-GENERATED - POC/DEMO] ║
392 +║ ║
393 +║ This analysis was produced entirely by AI and has not ║
394 +║ been human-reviewed. Use for demonstration purposes. ║
395 +║ ║
396 +║ Source: AI/AKEL v1.0 (POC) ║
397 +║ Review Status: Not Reviewed (Proof-of-Concept) ║
398 +║ Quality Gates: 4/4 Passed (Simplified) ║
399 +║ Last Updated: [timestamp] ║
400 +╚════════════════════════════════════════════════════════════╝{{/code}}
396 396  
397 397  **Per-Claim Risk Labels:**
403 +
398 398  * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
399 399  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
400 400  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
... ... @@ -402,6 +402,7 @@
402 402  === 5.3 Display Requirements ===
403 403  
404 404  **Must Show:**
411 +
405 405  * AI-Generated status (prominent)
406 406  * POC/Demo disclaimer
407 407  * Risk tier per claim
... ... @@ -410,6 +410,7 @@
410 410  * Timestamp
411 411  
412 412  **Must NOT Claim:**
420 +
413 413  * Human review
414 414  * Production quality
415 415  * Medical/legal advice
... ... @@ -433,6 +433,7 @@
433 433  Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
434 434  
435 435  **Full System Has 4 Gates:**
444 +
436 436  1. Source Quality
437 437  2. Contradiction Search (MANDATORY)
438 438  3. Uncertainty Quantification
... ... @@ -439,6 +439,7 @@
439 439  4. Structural Integrity
440 440  
441 441  **POC Implements Simplified Versions:**
451 +
442 442  * Focus on demonstrating concept
443 443  * Basic implementations sufficient
444 444  * Failures displayed to user (not blocking)
... ... @@ -447,6 +447,7 @@
447 447  === 6.2 Gate 1: Source Quality (Basic) ===
448 448  
449 449  **Full System Requirements:**
460 +
450 450  * Primary sources identified and accessible
451 451  * Source reliability scored against whitelist
452 452  * Citation completeness verified
... ... @@ -454,6 +454,7 @@
454 454  * Author credentials validated
455 455  
456 456  **POC Implementation:**
468 +
457 457  * ✅ At least 2 sources found
458 458  * ✅ Sources accessible (URLs valid)
459 459  * ❌ No whitelist checking
... ... @@ -467,6 +467,7 @@
467 467  === 6.3 Gate 2: Contradiction Search (Basic) ===
468 468  
469 469  **Full System Requirements:**
482 +
470 470  * Counter-evidence actively searched
471 471  * Reservations and limitations identified
472 472  * Alternative interpretations explored
... ... @@ -475,6 +475,7 @@
475 475  * Academic literature (supporting AND opposing)
476 476  
477 477  **POC Implementation:**
491 +
478 478  * ✅ Basic search for counter-evidence
479 479  * ✅ Identify obvious contradictions
480 480  * ❌ No comprehensive academic search
... ... @@ -489,6 +489,7 @@
489 489  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
490 490  
491 491  **Full System Requirements:**
506 +
492 492  * Confidence scores calculated for all claims/verdicts
493 493  * Limitations explicitly stated
494 494  * Data gaps identified and disclosed
... ... @@ -496,6 +496,7 @@
496 496  * Alternative scenarios considered
497 497  
498 498  **POC Implementation:**
514 +
499 499  * ✅ Confidence scores (0-100%)
500 500  * ✅ Basic uncertainty acknowledgment
501 501  * ❌ No detailed limitation disclosure
... ... @@ -509,6 +509,7 @@
509 509  === 6.5 Gate 4: Structural Integrity (Basic) ===
510 510  
511 511  **Full System Requirements:**
528 +
512 512  * No hallucinations detected (fact-checking against sources)
513 513  * Logic chain valid and traceable
514 514  * References accessible and verifiable
... ... @@ -516,6 +516,7 @@
516 516  * Premises clearly stated
517 517  
518 518  **POC Implementation:**
536 +
519 519  * ✅ Basic coherence check
520 520  * ✅ References accessible
521 521  * ❌ No comprehensive hallucination detection
... ... @@ -529,24 +529,20 @@
529 529  === 6.6 Quality Gate Display ===
530 530  
531 531  **POC shows simplified status:**
532 -{{code}}
533 -Quality Gates: 4/4 Passed (Simplified)
550 +{{code}}Quality Gates: 4/4 Passed (Simplified)
534 534  ✓ Source Quality: 3 sources found
535 535  ✓ Contradiction Search: Basic search completed
536 536  ✓ Uncertainty: Confidence scores assigned
537 -✓ Structural Integrity: Output coherent
538 -{{/code}}
554 +✓ Structural Integrity: Output coherent{{/code}}
539 539  
540 540  **If any gate fails:**
541 -{{code}}
542 -Quality Gates: 3/4 Passed (Simplified)
557 +{{code}}Quality Gates: 3/4 Passed (Simplified)
543 543  ✓ Source Quality: 3 sources found
544 544  ✗ Contradiction Search: Search failed - limited evidence
545 545  ✓ Uncertainty: Confidence scores assigned
546 546  ✓ Structural Integrity: Output coherent
547 547  
548 -Note: This analysis has limited evidence. Use with caution.
549 -{{/code}}
563 +Note: This analysis has limited evidence. Use with caution.{{/code}}
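A small sketch of how this status block could be rendered from parsed gate results; the data structure and pass/fail logic are illustrative, and only the gate names and wording come from the examples above.

{{code}}
# Hypothetical renderer for the simplified quality gate status block.
def render_gate_status(gates: dict) -> str:
    """gates maps gate name -> (passed: bool, note: str),
    e.g. {"Source Quality": (True, "3 sources found"), ...}"""
    passed = sum(1 for ok, _ in gates.values() if ok)
    lines = [f"Quality Gates: {passed}/{len(gates)} Passed (Simplified)"]
    for name, (ok, note) in gates.items():
        lines.append(f"{'✓' if ok else '✗'} {name}: {note}")
    if passed < len(gates):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
{{/code}}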
550 550  
551 551  === 6.7 Simplified vs. Full System ===
552 552  
... ... @@ -563,6 +563,7 @@
563 563  === 7.1 POC AKEL (Simplified) ===
564 564  
565 565  **Implementation:**
580 +
566 566  * Single Claude API call (Sonnet 4.5)
567 567  * One comprehensive prompt
568 568  * All processing in single request
... ... @@ -570,21 +570,19 @@
570 570  * No orchestration layer
571 571  
572 572  **Prompt Structure:**
573 -{{code}}
574 -Task: Analyze this article and provide:
588 +{{code}}Task: Analyze this article and provide:
575 575  
576 576  1. Extract 3-5 factual claims
577 577  2. For each claim:
578 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
579 - - Assign confidence score (0-100%)
580 - - Assign risk tier (A/B/C)
581 - - Write brief reasoning (1-3 sentences)
592 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
593 + - Assign confidence score (0-100%)
594 + - Assign risk tier (A/B/C)
595 + - Write brief reasoning (1-3 sentences)
582 582  3. Generate analysis summary (3-5 sentences)
583 583  4. Generate article summary (3-5 sentences)
584 584  5. Run basic quality checks
585 585  
586 -Return as structured JSON.
587 -{{/code}}
600 +Return as structured JSON.{{/code}}
588 588  
589 589  **Processing Time:** 10-18 seconds (estimate)
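A sketch of what this single call could look like with the Anthropic Python SDK; the model name, token limit, and JSON-only parsing are assumptions for illustration, not final choices.

{{code}}
# Hypothetical single-call AKEL for POC1: one request, structured JSON back.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def analyze_article(article_text: str) -> dict:
    prompt = (
        "Task: Analyze this article and provide:\n"
        "1. Extract 3-5 factual claims\n"
        "2. For each claim: verdict, confidence (0-100%), risk tier (A/B/C), brief reasoning\n"
        "3. Context-aware analysis summary\n"
        "4. Neutral article summary\n"
        "5. Basic quality gate results\n"
        "Return as structured JSON only.\n\n"
        f"ARTICLE:\n{article_text}"
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # model string taken from the usage example above
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    result = json.loads(response.content[0].text)  # assumes the model returned JSON only
    result["usage"] = {
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
    }
    return result
{{/code}}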
590 590  
... ... @@ -591,8 +591,7 @@
591 591  === 7.2 Full System AKEL (Production) ===
592 592  
593 593  **Architecture:**
594 -{{code}}
595 -AKEL Orchestrator
607 +{{code}}AKEL Orchestrator
596 596  ├── Claim Extractor
597 597  ├── Claim Classifier (with risk tier assignment)
598 598  ├── Scenario Generator
... ... @@ -600,10 +600,10 @@
600 600  ├── Contradiction Detector
601 601  ├── Quality Gate Validator
602 602  ├── Audit Sampling Scheduler
603 -└── Federation Sync Adapter (Release 1.0+)
604 -{{/code}}
615 +└── Federation Sync Adapter (Release 1.0+){{/code}}
605 605  
606 606  **Processing:**
618 +
607 607  * Parallel processing where possible
608 608  * Separate component calls
609 609  * Quality gates between phases
... ... @@ -615,6 +615,7 @@
615 615  === 7.3 Why POC Uses Single Call ===
616 616  
617 617  **Advantages:**
630 +
618 618  * ✅ Simpler to implement
619 619  * ✅ Faster POC development
620 620  * ✅ Easier to debug
... ... @@ -622,6 +622,7 @@
622 622  * ✅ Good enough for concept validation
623 623  
624 624  **Limitations:**
638 +
625 625  * ❌ No component reusability
626 626  * ❌ No parallel processing
627 627  * ❌ All-or-nothing (can't partially succeed)
... ... @@ -648,6 +648,7 @@
648 648  **Requirement:** User can submit article for analysis
649 649  
650 650  **Functionality:**
665 +
651 651  * Text input field (paste article text, up to 5000 characters)
652 652  * URL input field (paste article URL)
653 653  * "Analyze" button to trigger processing
... ... @@ -654,6 +654,7 @@
654 654  * Loading indicator during analysis
655 655  
656 656  **Excluded:**
672 +
657 657  * No user authentication
658 658  * No claim history
659 659  * No search functionality
... ... @@ -660,6 +660,7 @@
660 660  * No saved templates
661 661  
662 662  **Acceptance Criteria:**
679 +
663 663  * User can paste text from article
664 664  * User can paste URL of article
665 665  * System accepts input and triggers analysis
... ... @@ -669,6 +669,7 @@
669 669  **Requirement:** AI automatically extracts 3-5 factual claims
670 670  
671 671  **Functionality:**
689 +
672 672  * AI reads article text
673 673  * AI identifies factual claims (not opinions/questions)
674 674  * AI extracts 3-5 most important claims
... ... @@ -675,6 +675,7 @@
675 675  * System displays numbered list
676 676  
677 677  **Critical:** NO MANUAL EDITING ALLOWED
696 +
678 678  * AI selects which claims to extract
679 679  * AI identifies factual vs. non-factual
680 680  * System processes claims as extracted
... ... @@ -681,11 +681,13 @@
681 681  * No human curation or correction
682 682  
683 683  **Error Handling:**
703 +
684 684  * If extraction fails: Display error message
685 685  * User can retry with different input
686 686  * No manual intervention to fix extraction
687 687  
688 688  **Acceptance Criteria:**
709 +
689 689  * AI extracts 3-5 claims automatically
690 690  * Claims are factual (not opinions)
691 691  * Claims are clearly stated
... ... @@ -696,15 +696,17 @@
696 696  **Requirement:** AI automatically generates verdict for each claim
697 697  
698 698  **Functionality:**
720 +
699 699  * For each claim, AI:
700 - * Evaluates claim based on available evidence/knowledge
701 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 - * Assigns confidence score (0-100%)
703 - * Assigns risk tier (A/B/C)
704 - * Writes brief reasoning (1-3 sentences)
722 +** Evaluates claim based on available evidence/knowledge
723 +** Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
724 +** Assigns confidence score (0-100%)
725 +** Assigns risk tier (A/B/C)
726 +** Writes brief reasoning (1-3 sentences)
705 705  * System displays verdict for each claim
706 706  
707 707  **Critical:** NO MANUAL EDITING ALLOWED
730 +
708 708  * AI computes verdicts based on evidence
709 709  * AI generates confidence scores
710 710  * AI writes reasoning
... ... @@ -711,11 +711,13 @@
711 711  * No human review or adjustment
712 712  
713 713  **Error Handling:**
737 +
714 714  * If verdict generation fails: Display error message
715 715  * User can retry
716 716  * No manual intervention to adjust verdicts
717 717  
718 718  **Acceptance Criteria:**
743 +
719 719  * Each claim has a verdict
720 720  * Confidence score is displayed (0-100%)
721 721  * Risk tier is displayed (A/B/C)
... ... @@ -728,15 +728,17 @@
728 728  **Requirement:** AI generates brief summary of analysis
729 729  
730 730  **Functionality:**
756 +
731 731  * AI summarizes findings in 3-5 sentences:
732 - * How many claims found
733 - * Distribution of verdicts
734 - * Overall assessment
758 +** How many claims found
759 +** Distribution of verdicts
760 +** Overall assessment
735 735  * System displays at top of results
736 736  
737 737  **Critical:** NO MANUAL EDITING ALLOWED
738 738  
739 739  **Acceptance Criteria:**
766 +
740 740  * Summary is coherent
741 741  * Accurately reflects analysis
742 742  * 3-5 sentences
... ... @@ -747,6 +747,7 @@
747 747  **Requirement:** AI generates brief summary of original article
748 748  
749 749  **Functionality:**
777 +
750 750  * AI summarizes article content (not FactHarbor's analysis)
751 751  * 3-5 sentences
752 752  * System displays
... ... @@ -756,6 +756,7 @@
756 756  **Critical:** NO MANUAL EDITING ALLOWED
757 757  
758 758  **Acceptance Criteria:**
787 +
759 759  * Summary is neutral (article's position)
760 760  * Accurately reflects article content
761 761  * 3-5 sentences
... ... @@ -766,6 +766,7 @@
766 766  **Requirement:** Clear labeling of AI-generated content
767 767  
768 768  **Functionality:**
798 +
769 769  * Display Mode 2 publication label
770 770  * Show POC/Demo disclaimer
771 771  * Display risk tiers per claim
... ... @@ -773,6 +773,7 @@
773 773  * Display timestamp
774 774  
775 775  **Acceptance Criteria:**
806 +
776 776  * Label is prominent and clear
777 777  * User understands this is AI-generated POC output
778 778  * Risk tiers are color-coded
... ... @@ -783,6 +783,7 @@
783 783  **Requirement:** Execute simplified quality gates
784 784  
785 785  **Functionality:**
817 +
786 786  * Check source quality (basic)
787 787  * Attempt contradiction search (basic)
788 788  * Calculate confidence scores
... ... @@ -790,6 +790,7 @@
790 790  * Display gate results
791 791  
792 792  **Acceptance Criteria:**
825 +
793 793  * All 4 gates attempted
794 794  * Pass/fail status displayed
795 795  * Failures explained to user
... ... @@ -804,6 +804,7 @@
804 804  **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
805 805  
806 806  **What this means:**
840 +
807 807  * Claims: AI selects (no human curation)
808 808  * Scenarios: N/A (deferred to POC2)
809 809  * Evidence: AI evaluates (no human selection)
... ... @@ -811,13 +811,12 @@
811 811  * Summaries: AI writes (no human editing)
812 812  
813 813  **Pipeline:**
814 -{{code}}
815 -User Input → AKEL Processing → Output Display
816 - ↓
817 - ZERO human editing
818 -{{/code}}
848 +{{code}}User Input → AKEL Processing → Output Display
849 + ↓
850 + ZERO human editing{{/code}}
819 819  
820 820  **If AI output is poor:**
853 +
821 821  * ❌ Do NOT manually fix it
822 822  * ✅ Document the failure
823 823  * ✅ Improve prompts and retry
... ... @@ -824,6 +824,7 @@
824 824  * ✅ Accept that POC might fail
825 825  
826 826  **Why this matters:**
860 +
827 827  * Tests whether AI can do this without humans
828 828  * Validates scalability (humans can't review every analysis)
829 829  * Honest test of technical feasibility
... ... @@ -833,16 +833,19 @@
833 833  **Requirement:** Analysis completes in reasonable time
834 834  
835 835  **Acceptable Performance:**
870 +
836 836  * Processing time: 1-5 minutes (acceptable for POC)
837 837  * Display loading indicator to user
838 838  * Show progress if possible ("Extracting claims...", "Generating verdicts...")
839 839  
840 840  **Not Required:**
876 +
841 841  * Production-level speed (< 30 seconds)
842 842  * Optimization for scale
843 843  * Caching
844 844  
845 845  **Acceptance Criteria:**
882 +
846 846  * Analysis completes within 5 minutes
847 847  * User sees loading indicator
848 848  * No timeout errors
... ... @@ -852,16 +852,19 @@
852 852  **Requirement:** System works for manual testing sessions
853 853  
854 854  **Acceptable:**
892 +
855 855  * Occasional errors (< 20% failure rate)
856 856  * Manual restart if needed
857 857  * Display error messages clearly
858 858  
859 859  **Not Required:**
898 +
860 860  * 99.9% uptime
861 861  * Automatic error recovery
862 862  * Production monitoring
863 863  
864 864  **Acceptance Criteria:**
904 +
865 865  * System works for test demonstrations
866 866  * Errors are handled gracefully
867 867  * User receives clear error messages
... ... @@ -871,6 +871,7 @@
871 871  **Requirement:** Runs on simple infrastructure
872 872  
873 873  **Acceptable:**
914 +
874 874  * Single machine or simple cloud setup
875 875  * No distributed architecture
876 876  * No load balancing
... ... @@ -878,6 +878,7 @@
878 878  * Local development environment viable
879 879  
880 880  **Not Required:**
922 +
881 881  * Production infrastructure
882 882  * Multi-region deployment
883 883  * Auto-scaling
... ... @@ -888,6 +888,7 @@
888 888  **Requirement:** Track and display LLM usage metrics to inform optimization decisions
889 889  
890 890  **Must Track:**
933 +
891 891  * Input tokens (article + prompt)
892 892  * Output tokens (generated analysis)
893 893  * Total tokens
... ... @@ -896,16 +896,19 @@
896 896  * Article length (words/characters)
897 897  
898 898  **Must Display:**
942 +
899 899  * Usage statistics in UI (Component 5)
900 900  * Cost per analysis
901 901  * Cost per claim extracted
902 902  
903 903  **Must Log:**
948 +
904 904  * Aggregate metrics for analysis
905 905  * Cost distribution by article length
906 906  * Token efficiency trends
907 907  
908 908  **Purpose:**
954 +
909 909  * Understand unit economics
910 910  * Identify optimization opportunities
911 911  * Project costs at scale
... ... @@ -912,6 +912,7 @@
912 912  * Inform architecture decisions (caching, model selection, etc.)
913 913  
914 914  **Acceptance Criteria:**
961 +
915 915  * ✅ Usage data displayed after each analysis
916 916  * ✅ Metrics logged for aggregate analysis
917 917  * ✅ Cost calculated accurately (Claude API pricing)
... ... @@ -919,6 +919,7 @@
919 919  * ✅ POC1 report includes cost analysis section
920 920  
921 921  **Success Target:**
969 +
922 922  * Average cost per analysis < $0.05 USD
923 923  * Cost scaling behavior understood (linear/exponential)
924 924  * 2+ optimization opportunities identified
... ... @@ -930,11 +930,13 @@
930 930  === 10.1 System Components ===
931 931  
932 932  **Frontend:**
981 +
933 933  * Simple HTML form (text input + URL input + button)
934 934  * Loading indicator
935 935  * Results display page (single page, no tabs/navigation)
936 936  
937 937  **Backend:**
987 +
938 938  * Single API endpoint
939 939  * Calls Claude API (Sonnet 4.5 or latest)
940 940  * Parses response
... ... @@ -941,10 +941,12 @@
941 941  * Returns JSON to frontend
942 942  
943 943  **Data Storage:**
994 +
944 944  * None required (stateless POC)
945 945  * Optional: Simple file storage or SQLite for demo examples
946 946  
947 947  **External Services:**
999 +
948 948  * Claude API (Anthropic) - required
949 949  * Optional: URL fetch service for article text extraction
950 950  
... ... @@ -952,23 +952,23 @@
952 952  
953 953  {{code}}
954 954  1. User submits text or URL
955 -
1007 + ↓
956 956  2. Backend receives request
957 -
1009 + ↓
958 958  3. If URL: Fetch article text
959 -
1011 + ↓
960 960  4. Call Claude API with single prompt:
961 - "Extract claims, evaluate each, provide verdicts"
962 -
1013 + "Extract claims, evaluate each, provide verdicts"
1014 + ↓
963 963  5. Claude API returns:
964 - - Analysis summary
965 - - Claims list
966 - - Verdicts for each claim (with risk tiers)
967 - - Article summary (optional)
968 - - Quality gate results
969 -
1016 + - Analysis summary
1017 + - Claims list
1018 + - Verdicts for each claim (with risk tiers)
1019 + - Article summary (optional)
1020 + - Quality gate results
1021 + ↓
970 970  6. Backend parses response
971 -
1023 + ↓
972 972  7. Frontend displays results with Mode 2 labeling
973 973  {{/code}}
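A minimal backend sketch of steps 1-7, assuming FastAPI (one of the stack options suggested below) plus httpx for the optional URL fetch; run_akel_analysis() stands in for the single Claude call and is a placeholder here.

{{code}}
# Hypothetical POC1 backend: one stateless endpoint covering steps 1-7.
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class AnalyzeRequest(BaseModel):
    text: str | None = None   # pasted article text (up to 5000 characters)
    url: str | None = None    # or an article URL

def run_akel_analysis(article_text: str) -> dict:
    """Placeholder for the single AKEL call (see the Section 7.1 sketch)."""
    raise NotImplementedError

@app.post("/analyze")
def analyze(req: AnalyzeRequest) -> dict:
    # Steps 1-3: accept input and fetch article text if a URL was given.
    article = req.text
    if article is None and req.url:
        article = httpx.get(req.url, timeout=30).text
    if not article:
        raise HTTPException(status_code=400, detail="Provide article text or a URL")

    # Steps 4-6: single Claude call, parsed into structured JSON.
    result = run_akel_analysis(article)

    # Step 7: frontend displays this with Mode 2 (AI-Generated) labeling.
    result["publication_mode"] = "Mode 2 - AI-Generated, Not Reviewed (POC)"
    return result
{{/code}}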
974 974  
... ... @@ -977,45 +977,43 @@
977 977  === 10.3 AI Prompt Strategy ===
978 978  
979 979  **Single Comprehensive Prompt:**
980 -{{code}}
981 -Task: Analyze this article and provide:
1032 +{{code}}Task: Analyze this article and provide:
982 982  
983 983  1. Identify the article's main thesis/conclusion
984 - - What is the article trying to argue or prove?
985 - - What is the primary claim or conclusion?
1035 + - What is the article trying to argue or prove?
1036 + - What is the primary claim or conclusion?
986 986  
987 987  2. Extract 3-5 factual claims from the article
988 - - Note which claims are CENTRAL to the main thesis
989 - - Note which claims are SUPPORTING facts
1039 + - Note which claims are CENTRAL to the main thesis
1040 + - Note which claims are SUPPORTING facts
990 990  
991 991  3. For each claim:
992 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
993 - - Assign confidence score (0-100%)
994 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
995 - - Write brief reasoning (1-3 sentences)
1043 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1044 + - Assign confidence score (0-100%)
1045 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1046 + - Write brief reasoning (1-3 sentences)
996 996  
997 997  4. Assess relationship between claims and main thesis:
998 - - Do the claims actually support the article's conclusion?
999 - - Are there logical leaps or unsupported inferences?
1000 - - Is the article's framing misleading even if individual facts are accurate?
1049 + - Do the claims actually support the article's conclusion?
1050 + - Are there logical leaps or unsupported inferences?
1051 + - Is the article's framing misleading even if individual facts are accurate?
1001 1001  
1002 1002  5. Run quality gates:
1003 - - Check: ≥2 sources found
1004 - - Attempt: Basic contradiction search
1005 - - Calculate: Confidence scores
1006 - - Verify: Structural integrity
1054 + - Check: ≥2 sources found
1055 + - Attempt: Basic contradiction search
1056 + - Calculate: Confidence scores
1057 + - Verify: Structural integrity
1007 1007  
1008 1008  6. Write context-aware analysis summary (4-6 sentences):
1009 - - State article's main thesis
1010 - - Report claims found and verdict distribution
1011 - - Note if central claims are problematic
1012 - - Assess whether evidence supports conclusion
1013 - - Overall credibility considering claim importance
1060 + - State article's main thesis
1061 + - Report claims found and verdict distribution
1062 + - Note if central claims are problematic
1063 + - Assess whether evidence supports conclusion
1064 + - Overall credibility considering claim importance
1014 1014  
1015 1015  7. Write article summary (3-5 sentences: neutral summary of article content)
1016 1016  
1017 -Return as structured JSON with quality gate results.
1018 -{{/code}}
1068 +Return as structured JSON with quality gate results.{{/code}}
1019 1019  
1020 1020  **One prompt generates everything.**
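One possible shape for that JSON, written as a Python literal purely for illustration; the field names are assumptions and would be pinned down in the POC1 API & Schemas specification.

{{code}}
# Illustrative parse target for the single-call output (field names are assumptions).
expected_shape = {
    "article_thesis": "Coffee cures cancer",
    "analysis_summary": "4-6 sentence context-aware summary ...",
    "article_summary": "3-5 sentence neutral summary ...",
    "claims": [
        {
            "id": 1,
            "text": "Coffee reduces diabetes risk by 30%",
            "central": False,                 # central vs. supporting claim
            "verdict": "WELL-SUPPORTED",
            "confidence": 85,                 # 0-100%
            "risk_tier": "C",                 # A / B / C
            "reasoning": "Multiple studies confirm 25-30% risk reduction.",
        },
    ],
    "quality_gates": {
        "source_quality": {"passed": True, "note": "3 sources found"},
        "contradiction_search": {"passed": True, "note": "Basic search completed"},
        "uncertainty": {"passed": True, "note": "Confidence scores assigned"},
        "structural_integrity": {"passed": True, "note": "Output coherent"},
    },
}
{{/code}}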
1021 1021  
... ... @@ -1026,25 +1026,30 @@
1026 1026  === 10.4 Technology Stack Suggestions ===
1027 1027  
1028 1028  **Frontend:**
1079 +
1029 1029  * HTML + CSS + JavaScript (minimal framework)
1030 1030  * OR: Next.js (if team prefers)
1031 1031  * Hosted: Local machine OR Vercel/Netlify free tier
1032 1032  
1033 1033  **Backend:**
1085 +
1034 1034  * Python Flask/FastAPI (simple REST API)
1035 1035  * OR: Next.js API routes (if using Next.js)
1036 1036  * Hosted: Local machine OR Railway/Render free tier
1037 1037  
1038 1038  **AKEL Integration:**
1091 +
1039 1039  * Claude API via Anthropic SDK
1040 1040  * Model: Claude Sonnet 4.5 or latest available
1041 1041  
1042 1042  **Database:**
1096 +
1043 1043  * None (stateless acceptable)
1044 1044  * OR: SQLite if you want to store demo examples
1045 1045  * OR: JSON files on disk
1046 1046  
1047 1047  **Deployment:**
1102 +
1048 1048  * Local development environment sufficient for POC
1049 1049  * Optional: Deploy to cloud for remote demos
1050 1050  
... ... @@ -1053,6 +1053,7 @@
1053 1053  === 11.1 Minimum Success (POC Passes) ===
1054 1054  
1055 1055  **Required for GO decision:**
1111 +
1056 1056  * ✅ AI extracts 3-5 factual claims automatically
1057 1057  * ✅ AI provides verdict for each claim automatically
1058 1058  * ✅ Verdicts are reasonable (≥70% make logical sense)
... ... @@ -1066,6 +1066,7 @@
1066 1066  * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1067 1067  
1068 1068  **Quality Definition:**
1125 +
1069 1069  * "Reasonable verdict" = Defensible given general knowledge
1070 1070  * "Coherent summary" = Logically structured, grammatically correct
1071 1071  * "Comprehensible" = Reviewers understand what analysis means
... ... @@ -1073,6 +1073,7 @@
1073 1073  === 11.2 POC Fails If ===
1074 1074  
1075 1075  **Automatic NO-GO if any of these:**
1133 +
1076 1076  * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1077 1077  * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1078 1078  * ❌ Output incomprehensible (reviewers can't understand analysis)
... ... @@ -1084,14 +1084,15 @@
1084 1084  **POC quality expectations:**
1085 1085  
1086 1086  |=Component|=Quality Threshold|=Definition
1087 -|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1088 -|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1089 -|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1090 -|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1145 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1146 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1147 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1148 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1091 1091  
1092 1092  **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1093 1093  
1094 1094  **Not expecting:**
1153 +
1095 1095  * 100% accuracy
1096 1096  * Perfect claim coverage
1097 1097  * Comprehensive evidence gathering
... ... @@ -1099,6 +1099,7 @@
1099 1099  * Production polish
1100 1100  
1101 1101  **Expecting:**
1161 +
1102 1102  * Reasonable claim extraction
1103 1103  * Defensible verdicts
1104 1104  * Understandable reasoning
... ... @@ -1111,6 +1111,7 @@
1111 1111  **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1112 1112  
1113 1113  **Expected Output:**
1174 +
1114 1114  * Extract claim correctly
1115 1115  * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1116 1116  * Confidence: 70-90%
... ... @@ -1124,6 +1124,7 @@
1124 1124  **Input:** News article URL with multiple claims about politics/health/science
1125 1125  
1126 1126  **Expected Output:**
1188 +
1127 1127  * Extract 3-5 key claims
1128 1128  * Verdict for each (may vary: some supported, some uncertain, some refuted)
1129 1129  * Coherent analysis summary
... ... @@ -1137,6 +1137,7 @@
1137 1137  **Input:** Article on contested political or scientific topic
1138 1138  
1139 1139  **Expected Output:**
1202 +
1140 1140  * Balanced analysis
1141 1141  * Acknowledges uncertainty where appropriate
1142 1142  * Doesn't overstate confidence
... ... @@ -1149,6 +1149,7 @@
1149 1149  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1150 1150  
1151 1151  **Expected Output:**
1215 +
1152 1152  * Extract claim
1153 1153  * Verdict: REFUTED
1154 1154  * High confidence (> 90%)
... ... @@ -1162,6 +1162,7 @@
1162 1162  **Input:** Article with claim where evidence is genuinely mixed
1163 1163  
1164 1164  **Expected Output:**
1229 +
1165 1165  * Extract claim
1166 1166  * Verdict: UNCERTAIN
1167 1167  * Moderate confidence (40-60%)
... ... @@ -1174,6 +1174,7 @@
1174 1174  **Input:** Article making medical claims
1175 1175  
1176 1176  **Expected Output:**
1242 +
1177 1177  * Extract claim
1178 1178  * Verdict: [appropriate based on evidence]
1179 1179  * Risk tier: A (High - medical)
... ... @@ -1191,6 +1191,7 @@
1191 1191  **Option A: GO (Proceed to POC2)**
1192 1192  
1193 1193  **Conditions:**
1260 +
1194 1194  * AI quality ≥70% without manual editing
1195 1195  * Basic claim → verdict pipeline validated
1196 1196  * Internal + advisor feedback positive
... ... @@ -1199,6 +1199,7 @@
1199 1199  * Clear path to improving AI quality to ≥90%
1200 1200  
1201 1201  **Next Steps:**
1269 +
1202 1202  * Plan POC2 development (add scenarios)
1203 1203  * Design scenario architecture
1204 1204  * Expand to Evidence Model structure
... ... @@ -1207,6 +1207,7 @@
1207 1207  **Option B: NO-GO (Pivot or Stop)**
1208 1208  
1209 1209  **Conditions:**
1278 +
1210 1210  * AI quality < 60%
1211 1211  * Requires manual editing for most analyses (> 50%)
1212 1212  * Feedback indicates fundamental flaws
... ... @@ -1214,6 +1214,7 @@
1214 1214  * No clear path to improvement
1215 1215  
1216 1216  **Next Steps:**
1286 +
1217 1217  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1218 1218  * **Stop:** Conclude approach not viable, revisit later
1219 1219  
... ... @@ -1220,6 +1220,7 @@
1220 1220  **Option C: ITERATE (Improve POC)**
1221 1221  
1222 1222  **Conditions:**
1293 +
1223 1223  * Concept has merit but execution needs work
1224 1224  * Specific improvements identified
1225 1225  * Addressable with better prompts/approach
... ... @@ -1226,6 +1226,7 @@
1226 1226  * AI quality between 60-70%
1227 1227  
1228 1228  **Next Steps:**
1300 +
1229 1229  * Improve AI prompts
1230 1230  * Test different approaches
1231 1231  * Re-run POC with improvements
... ... @@ -1234,9 +1234,9 @@
1234 1234  === 13.2 Decision Criteria Summary ===
1235 1235  
1236 1236  {{code}}
1237 -AI Quality < 60% → NO-GO (approach doesn't work)
1309 +AI Quality < 60% → NO-GO (approach doesn't work)
1238 1238  AI Quality 60-70% → ITERATE (improve and retry)
1239 -AI Quality ≥70% → GO (proceed to POC2)
1311 +AI Quality ≥70% → GO (proceed to POC2)
1240 1240  {{/code}}
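The same thresholds expressed as a tiny helper, shown as a sketch; the cutoffs are copied from the criteria above.

{{code}}
def poc1_decision(ai_quality_percent: float) -> str:
    """Map measured AI quality (0-100%) to the POC1 gate decision."""
    if ai_quality_percent >= 70:
        return "GO"        # proceed to POC2
    if ai_quality_percent >= 60:
        return "ITERATE"   # improve prompts and retry
    return "NO-GO"         # pivot or stop
{{/code}}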
1241 1241  
1242 1242  == 14. Key Risks & Mitigations ==
... ... @@ -1243,10 +1243,11 @@
1243 1243  
1244 1244  === 14.1 Risk: AI Quality Not Good Enough ===
1245 1245  
1246 -**Likelihood:** Medium-High
1247 -**Impact:** POC fails
1318 +**Likelihood:** Medium-High
1319 +**Impact:** POC fails
1248 1248  
1249 1249  **Mitigation:**
1322 +
1250 1250  * Extensive prompt engineering and testing
1251 1251  * Use best available AI models (Sonnet 4.5)
1252 1252  * Test with diverse article types
... ... @@ -1256,10 +1256,11 @@
1256 1256  
1257 1257  === 14.2 Risk: AI Consistency Issues ===
1258 1258  
1259 -**Likelihood:** Medium
1260 -**Impact:** Works sometimes, fails other times
1332 +**Likelihood:** Medium
1333 +**Impact:** Works sometimes, fails other times
1261 1261  
1262 1262  **Mitigation:**
1336 +
1263 1263  * Test with 10+ diverse articles
1264 1264  * Measure success rate honestly
1265 1265  * Improve prompts to increase consistency
... ... @@ -1268,10 +1268,11 @@
1268 1268  
1269 1269  === 14.3 Risk: Output Incomprehensible ===
1270 1270  
1271 -**Likelihood:** Low-Medium
1272 -**Impact:** Users can't understand analysis
1345 +**Likelihood:** Low-Medium
1346 +**Impact:** Users can't understand analysis
1273 1273  
1274 1274  **Mitigation:**
1349 +
1275 1275  * Create clear explainer document
1276 1276  * Iterate on output format
1277 1277  * Test with non-technical reviewers
... ... @@ -1281,10 +1281,11 @@
1281 1281  
1282 1282  === 14.4 Risk: API Rate Limits / Costs ===
1283 1283  
1284 -**Likelihood:** Low
1285 -**Impact:** System slow or expensive
1359 +**Likelihood:** Low
1360 +**Impact:** System slow or expensive
1286 1286  
1287 1287  **Mitigation:**
1363 +
1288 1288  * Monitor API usage
1289 1289  * Implement retry logic
1290 1290  * Estimate costs before scaling
... ... @@ -1293,10 +1293,11 @@
1293 1293  
1294 1294  === 14.5 Risk: Scope Creep ===
1295 1295  
1296 -**Likelihood:** Medium
1297 -**Impact:** POC becomes too complex
1372 +**Likelihood:** Medium
1373 +**Impact:** POC becomes too complex
1298 1298  
1299 1299  **Mitigation:**
1376 +
1300 1300  * Strict scope discipline
1301 1301  * Say NO to feature additions
1302 1302  * Keep focus on core question
... ... @@ -1307,12 +1307,15 @@
1307 1307  
1308 1308  === 15.1 Core Principles ===
1309 1309  
1310 -**1. Build Less, Learn More**
1387 +**1. Build Less, Learn More**
1388 +
1311 1311  * Minimum features to test hypothesis
1312 1312  * Don't build unvalidated features
1313 1313  * Focus on core question only
1314 1314  
1315 1315  **2. Fail Fast**
1395 +
1316 1316  * Quick test of hardest part (AI capability)
1317 1317  * Accept that POC might fail
1318 1318  * Better to discover issues early
... ... @@ -1319,16 +1319,19 @@
1319 1319  * Honest assessment over optimistic hope
1320 1320  
1321 1321  **3. Test First, Build Second**
1402 +
1322 1322  * Validate AI can do this before building platform
1323 1323  * Don't assume it will work
1324 1324  * Let results guide decisions
1325 1325  
1326 1326  **4. Automation First**
1408 +
1327 1327  * No manual editing allowed
1328 1328  * Tests scalability, not just feasibility
1329 1329  * Proves approach can work at scale
1330 1330  
1331 1331  **5. Honest Assessment**
1414 +
1332 1332  * Don't cherry-pick examples
1333 1333  * Don't manually fix bad outputs
1334 1334  * Document failures openly
... ... @@ -1336,22 +1336,25 @@
1336 1336  
1337 1337  === 15.2 What POC Is ===
1338 1338  
1339 -✅ Testing AI capability without humans
1340 -✅ Proving core technical concept
1341 -✅ Fast validation of approach
1342 -✅ Honest assessment of feasibility
1422 +✅ Testing AI capability without humans
1423 +✅ Proving core technical concept
1424 +✅ Fast validation of approach
1425 +✅ Honest assessment of feasibility
1343 1343  
1344 1344  === 15.3 What POC Is NOT ===
1345 1345  
1346 -❌ Building a product
1347 -❌ Production-ready system
1348 -❌ Feature-complete platform
1349 -❌ Perfectly accurate analysis
1350 -❌ Polished user experience
1429 +❌ Building a product
1430 +❌ Production-ready system
1431 +❌ Feature-complete platform
1432 +❌ Perfectly accurate analysis
1433 +❌ Polished user experience
1351 1351  
1352 -== 16. Success = Clear Path Forward ==
1435 +== 16. Success = Clear Path Forward ==
1353 1353  
1354 1354  **If POC succeeds (≥70% AI quality):**
1440 +
1355 1355  * ✅ Approach validated
1356 1356  * ✅ Proceed to POC2 (add scenarios)
1357 1357  * ✅ Design full Evidence Model structure
... ... @@ -1359,6 +1359,7 @@
1359 1359  * ✅ Focus on improving AI quality from 70% → 90%
1360 1360  
1361 1361  **If POC fails (< 60% AI quality):**
1448 +
1362 1362  * ✅ Learn what doesn't work
1363 1363  * ✅ Pivot to different approach
1364 1364  * ✅ OR wait for better AI technology
... ... @@ -1377,3 +1377,51 @@
1377 1377  
1378 1378  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1379 1379  
1467 +
1468 +=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
1469 +
1470 +**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.
1471 +
1472 +**POC1 Implementation:**
1473 +
1474 +* **Primary Provider:** Anthropic Claude API
1475 +** Stage 1: Claude Haiku 4
1476 +** Stage 2: Claude Sonnet 3.5 (cached)
1477 +** Stage 3: Claude Sonnet 3.5
1478 +
1479 +* **Provider Interface:** Abstract LLMProvider interface implemented
1480 +
1481 +* **Configuration:** Environment variables for provider selection
1482 +** {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1483 +** {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1484 +** {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1485 +
1486 +* **Failover:** Basic error handling with cache fallback for Stage 2
1487 +
1488 +* **Cost Tracking:** Log provider name and cost per request
1489 +
1490 +**Future (POC2/Beta):**
1491 +
1492 +* Secondary provider (OpenAI) with automatic failover
1493 +* Admin API for runtime provider switching
1494 +* Cost comparison dashboard
1495 +* Cross-provider output verification
1496 +
1497 +**Success Criteria:**
1498 +
1499 +* All LLM calls go through abstraction layer (no direct API calls)
1500 +* Provider can be changed via environment variable without code changes
1501 +* Cost tracking includes provider name in logs
1502 +* Stage 2 falls back to cache on provider failure
1503 +
1504 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.103.Specification.POC.API-and-Schemas.WebHome]] Section 6
1505 +
1506 +**Dependencies:**
1507 +
1508 +* NFR-14 (Main Requirements)
1509 +* Design Decision 9
1510 +* Architecture Section 2.2
1511 +
1512 +**Priority:** HIGH (P1)
1513 +
1514 +**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
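A minimal sketch of the abstraction described above; the class and method names are illustrative, and only the environment variable and provider names come from this requirement.

{{code}}
# Hypothetical LLMProvider abstraction for NFR-POC-11.
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResult:
    text: str
    input_tokens: int
    output_tokens: int
    provider: str      # logged for cost tracking
    cost_usd: float

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, model: str) -> LLMResult: ...

class AnthropicProvider(LLMProvider):
    def complete(self, prompt: str, model: str) -> LLMResult:
        import anthropic
        client = anthropic.Anthropic()
        resp = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return LLMResult(
            text=resp.content[0].text,
            input_tokens=resp.usage.input_tokens,
            output_tokens=resp.usage.output_tokens,
            provider="anthropic",
            cost_usd=0.0,  # fill in from the current price list
        )

def get_provider() -> LLMProvider:
    """Provider selection via environment variable; switching needs no code change."""
    name = os.getenv("LLM_PRIMARY_PROVIDER", "anthropic")
    registry = {"anthropic": AnthropicProvider}
    return registry[name]()   # an OpenAI provider would be registered here in POC2
{{/code}}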