Last modified by Robert Schaub on 2026/02/08 08:26

From version 1.1
edited by Robert Schaub
on 2025/12/19 16:13
Change comment: Imported from XAR
To version 2.2
edited by Robert Schaub
on 2026/01/20 20:24
Change comment: Renamed back-links.

Summary

Details

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements
1 +POC Requirements (POC1 & POC2)
Content
... ... @@ -1,19 +1,28 @@
1 1  = POC Requirements =
2 2  
3 -**Status:** ✅ Approved for Development
4 -**Version:** 2.0 (Updated after Specification Cross-Check)
5 -**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 ----
4 +{{info}}
5 +**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
8 8  
7 +See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 +{{/info}}
9 +
10 +
11 +
12 +**Status:** ✅ Approved for Development
13 +**Version:** 2.0 (Updated after Specification Cross-Check)
14 +**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15 +
9 9  == 1. POC Overview ==
10 10  
11 11  === 1.1 What POC Tests ===
12 12  
13 13  **Core Question:**
21 +
14 14  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
15 15  
16 16  **What we're proving:**
25 +
17 17  * AI can identify factual claims from text
18 18  * AI can evaluate those claims and produce verdicts
19 19  * Output is comprehensible and useful
... ... @@ -20,6 +20,7 @@
20 20  * Fully automated approach is viable
21 21  
22 22  **What we're NOT testing:**
32 +
23 23  * Scenario generation (deferred to POC2)
24 24  * Evidence display (deferred to POC2)
25 25  * Production scalability
... ... @@ -26,8 +26,6 @@
26 26  * Perfect accuracy
27 27  * Complete feature set
28 28  
29 ----
30 -
31 31  === 1.2 Scenarios Deferred to POC2 ===
32 32  
33 33  **Intentional Simplification:**
... ... @@ -35,6 +35,7 @@
35 35  Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
36 36  
37 37  **Rationale:**
46 +
38 38  * **POC1 tests:** Can AI extract claims and generate verdicts?
39 39  * **POC2 will add:** Scenario generation and management
40 40  * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
... ... @@ -46,6 +46,7 @@
46 46  **No Risk:**
47 47  
48 48  Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58 +
49 49  * Faster POC1 validation
50 50  * Learning from POC1 to inform scenario design
51 51  * Iterative approach: fail fast if basic AI doesn't work
... ... @@ -52,65 +52,91 @@
52 52  * Flexibility to adjust scenario architecture based on POC1 insights
53 53  
54 54  **Full System Workflow (Future):**
55 -{{code}}
56 -Claims → Scenarios → Evidence → Verdicts
57 -{{/code}}
65 +{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
58 58  
59 59  **POC1 Simplified Workflow:**
60 -{{code}}
61 -Claims → Verdicts (scenarios implicit in reasoning)
62 -{{/code}}
68 +{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
63 63  
64 ----
65 -
66 66  == 2. POC Output Specification ==
67 67  
68 -=== 2.1 Component 1: ANALYSIS SUMMARY ===
72 +=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
69 69  
70 -**What:** Brief overview of findings
71 -**Length:** 3-5 sentences
72 -**Content:**
73 -* How many claims found
74 -* Distribution of verdicts
75 -* Overall assessment
74 +**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
76 76  
77 -**Example:**
78 -{{code}}
79 -This article makes 4 claims about coffee's health effects. We found
80 -2 claims are well-supported, 1 is uncertain, and 1 is refuted.
81 -Overall assessment: mostly accurate with some exaggeration.
82 -{{/code}}
76 +**Length:** 4-6 sentences
83 83  
84 ----
78 +**Content (Required Elements):**
85 85  
80 +1. **Article's main thesis/claim** - What is the article trying to argue or prove?
81 +2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
82 +3. **Central vs. supporting claims** - Which claims are central to the article's argument?
83 +4. **Relationship assessment** - Do the claims support the article's conclusion?
84 +5. **Overall credibility** - Final assessment considering claim importance
85 +
86 +**Critical Innovation:**
87 +
88 +POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89 +
90 +* Make accurate supporting facts but draw unsupported conclusions
91 +* Have one false central claim that invalidates the whole argument
92 +* Misframe accurate information to mislead
93 +
94 +**Good Example (Context-Aware):**
95 +{{code}}This article argues that coffee cures cancer based on its antioxidant
96 +content. We analyzed 3 factual claims: 2 about coffee's chemical
97 +properties are well-supported, but the main causal claim is refuted
98 +by current evidence. The article confuses correlation with causation.
99 +Overall assessment: MISLEADING - makes an unsupported medical claim
100 +despite citing some accurate facts.{{/code}}
101 +
102 +**Poor Example (Simple Aggregation - Don't Do This):**
103 +{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 +Overall assessment: mostly accurate (67% accurate).{{/code}}
105 +↑ This misses that the refuted claim IS the article's main point!
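The gap between the two examples can be sketched in a few lines. The `central` flag and the override rule below are illustrative assumptions, not part of this spec; they only show why a centrality-aware assessment diverges from a simple average:

```python
# Hypothetical sketch: simple aggregation vs. context-aware assessment.
# The "central" flag is an illustrative assumption, not part of the spec.
claims = [
    {"id": 1, "verdict": "WELL-SUPPORTED", "central": False},
    {"id": 2, "verdict": "WELL-SUPPORTED", "central": False},
    {"id": 3, "verdict": "REFUTED", "central": True},  # the article's main causal claim
]

supported = sum(c["verdict"] == "WELL-SUPPORTED" for c in claims)
naive_score = supported / len(claims)  # 0.67 -- "mostly accurate"

# Context-aware rule: a refuted central claim overrides the naive average.
central_refuted = any(c["central"] and c["verdict"] == "REFUTED" for c in claims)
assessment = "MISLEADING" if central_refuted else f"{naive_score:.0%} accurate"
print(assessment)  # MISLEADING
```

Both paths see the same 2-of-3 supported claims; only the second notices that the one refuted claim carries the article's thesis.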
106 +
107 +**What POC1 Tests:**
108 +
109 +Can AI identify and assess:
110 +
111 +* ✅ The article's main thesis/conclusion?
112 +* ✅ Which claims are central vs. supporting?
113 +* ✅ Whether the evidence supports the conclusion?
114 +* ✅ Overall credibility considering logical structure?
115 +
116 +**If AI Cannot Do This:**
117 +
118 +That's valuable to learn in POC1! We'll:
119 +
120 +* Note as limitation
121 +* Fall back to simple aggregation with warning
122 +* Design explicit article-level analysis for POC2
123 +
86 86  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
87 87  
88 -**What:** List of factual claims extracted from article
89 -**Format:** Numbered list
90 -**Quantity:** 3-5 claims
126 +**What:** List of factual claims extracted from article
127 +**Format:** Numbered list
128 +**Quantity:** 3-5 claims
91 91  **Requirements:**
130 +
92 92  * Factual claims only (not opinions/questions)
93 93  * Clearly stated
94 94  * Automatically extracted by AI
95 95  
96 96  **Example:**
97 -{{code}}
98 -CLAIMS IDENTIFIED:
136 +{{code}}CLAIMS IDENTIFIED:
99 99  
100 100  [1] Coffee reduces diabetes risk by 30%
101 101  [2] Coffee improves heart health
102 102  [3] Decaf has same benefits as regular
103 -[4] Coffee prevents Alzheimer's completely
104 -{{/code}}
141 +[4] Coffee prevents Alzheimer's completely{{/code}}
105 105  
106 ----
107 -
108 108  === 2.3 Component 3: CLAIMS VERDICTS ===
109 109  
110 -**What:** Verdict for each claim identified
111 -**Format:** Per claim structure
145 +**What:** Verdict for each claim identified
146 +**Format:** Per claim structure
112 112  
113 113  **Required Elements:**
149 +
114 114  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 115  * **Confidence Score:** 0-100%
116 116  * **Brief Reasoning:** 1-3 sentences explaining why
... ... @@ -117,8 +117,7 @@
117 117  * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
118 118  
119 119  **Example:**
120 -{{code}}
121 -VERDICTS:
156 +{{code}}VERDICTS:
122 122  
123 123  [1] WELL-SUPPORTED (85%) [Risk: C]
124 124  Multiple studies confirm 25-30% risk reduction with regular consumption.
... ... @@ -130,44 +130,86 @@
130 130  Some benefits overlap, but caffeine-related benefits are reduced in decaf.
131 131  
132 132  [4] REFUTED (90%) [Risk: B]
133 -No evidence for complete prevention. Claim is significantly overstated.
134 -{{/code}}
168 +No evidence for complete prevention. Claim is significantly overstated.{{/code}}
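The required elements above map naturally onto a small record type. This is one possible representation, with illustrative field names; the spec only fixes the four required elements:

```python
from dataclasses import dataclass

# Sketch of the per-claim verdict structure in section 2.3.
# Field names are assumptions; the spec fixes only the four required elements.
VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}

@dataclass
class ClaimVerdict:
    claim_id: int
    verdict: str        # one of VERDICTS
    confidence: int     # 0-100%
    risk_tier: str      # A (High) / B (Medium) / C (Low)
    reasoning: str      # 1-3 sentences

    def __post_init__(self):
        # Basic structural checks, in the spirit of Gate 4.
        assert self.verdict in VERDICTS
        assert 0 <= self.confidence <= 100
        assert self.risk_tier in RISK_TIERS

v = ClaimVerdict(1, "WELL-SUPPORTED", 85, "C",
                 "Multiple studies confirm 25-30% risk reduction.")
print(f"[{v.claim_id}] {v.verdict} ({v.confidence}%) [Risk: {v.risk_tier}]")
# → [1] WELL-SUPPORTED (85%) [Risk: C]
```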
135 135  
136 136  **Risk Tier Display:**
171 +
137 137  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
173 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 139  * **Tier C (Green):** Low Risk - Facts/Definitions/History
140 140  
141 141  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
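The tier-to-label mapping above is small enough to sketch directly; the function name is an assumption, and the icons match the per-claim labels in section 5.2:

```python
# Illustrative mapping of risk tiers to display labels (icons per section 5.2).
RISK_DISPLAY = {
    "A": ("🔴", "High Risk - Medical/Legal/Safety/Elections"),
    "B": ("🟡", "Medium Risk - Policy/Science/Causality"),
    "C": ("🟢", "Low Risk - Facts/Definitions/History"),
}

def risk_label(tier: str) -> str:
    icon, desc = RISK_DISPLAY[tier]
    return f"[Risk: {tier}] {icon} {desc}"

print(risk_label("B"))  # [Risk: B] 🟡 Medium Risk - Policy/Science/Causality
```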
142 142  
143 ----
144 -
145 145  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146 146  
147 -**What:** Brief summary of original article content
148 -**Length:** 3-5 sentences
180 +**What:** Brief summary of original article content
181 +**Length:** 3-5 sentences
149 149  **Tone:** Neutral (article's position, not FactHarbor's analysis)
150 150  
151 151  **Example:**
152 -{{code}}
153 -ARTICLE SUMMARY:
185 +{{code}}ARTICLE SUMMARY:
154 154  
155 155  Health News Today article discusses coffee benefits, citing studies
156 156  on diabetes and Alzheimer's. Author highlights research linking coffee
157 -to disease prevention. Recommends 2-3 cups daily for optimal health.
158 -{{/code}}
189 +to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
159 159  
160 ----
191 +=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
161 161  
162 -=== 2.5 Total Output Size ===
193 +**What:** LLM usage metrics for cost optimization and scaling decisions
163 163  
164 -**Combined:** ~200-300 words
165 -* Analysis Summary: 50-70 words
195 +**Purpose:**
196 +
197 +* Understand cost per analysis
198 +* Identify optimization opportunities
199 +* Project costs at scale
200 +* Inform architecture decisions
201 +
202 +**Display Format:**
203 +{{code}}USAGE STATISTICS:
204 +• Article: 2,450 words (12,300 characters)
205 +• Input tokens: 15,234
206 +• Output tokens: 892
207 +• Total tokens: 16,126
208 +• Estimated cost: $0.24 USD
209 +• Response time: 8.3 seconds
210 +• Cost per claim: $0.048
211 +• Model: claude-sonnet-4-20250514{{/code}}
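The derived figures above (total tokens, estimated cost, cost per claim) follow directly from the provider's usage data. A minimal sketch, assuming placeholder per-token prices; real prices depend on the provider and model:

```python
# Sketch of the usage-statistics computation. The per-token prices below are
# placeholder assumptions -- actual pricing depends on provider and model.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # assumed USD per input token
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # assumed USD per output token

def usage_stats(input_tokens: int, output_tokens: int, n_claims: int) -> dict:
    cost = (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)
    return {
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 2),
        "cost_per_claim_usd": round(cost / n_claims, 3),
    }

stats = usage_stats(input_tokens=15_234, output_tokens=892, n_claims=5)
print(stats["total_tokens"])  # 16126
```

Monthly projections for the scaling estimates below are then just `estimated_cost_usd × articles/month`.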
212 +
213 +**Why This Matters:**
214 +
215 +At scale, LLM costs are critical:
216 +
217 +* 10,000 articles/month ≈ $200-500/month
218 +* 100,000 articles/month ≈ $2,000-5,000/month
219 +* Cost optimization can reduce expenses 30-50%
220 +
221 +**What POC1 Learns:**
222 +
223 +* How cost scales with article length
224 +* Prompt optimization opportunities (caching, compression)
225 +* Output verbosity tradeoffs
226 +* Model selection strategy (FAST vs. REASONING roles)
227 +* Article length limits (if needed)
228 +
229 +**Implementation:**
230 +
231 +* Claude API already returns usage data
232 +* No extra API calls needed
233 +* Display to user + log for aggregate analysis
234 +* Test with articles of varying lengths
235 +
236 +**Critical for GO/NO-GO:** Unit economics must be viable at scale!
237 +
238 +=== 2.6 Total Output Size ===
239 +
240 +**Combined:** 220-340 words
241 +
242 +* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
166 166  * Claims Identification: 30-50 words
167 167  * Claims Verdicts: 100-150 words
168 168  * Article Summary: 30-50 words (optional)
169 169  
170 ----
247 +**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
171 171  
172 172  == 3. What's NOT in POC Scope ==
173 173  
... ... @@ -176,6 +176,7 @@
176 176  The following are **explicitly excluded** from POC:
177 177  
178 178  **Content Features:**
256 +
179 179  * ❌ Scenarios (deferred to POC2)
180 180  * ❌ Evidence display (supporting/opposing lists)
181 181  * ❌ Source links (clickable references)
... ... @@ -185,6 +185,7 @@
185 185  * ❌ Risk assessment (shown but not workflow-integrated)
186 186  
187 187  **Platform Features:**
266 +
188 188  * ❌ User accounts / authentication
189 189  * ❌ Saved history
190 190  * ❌ Search functionality
... ... @@ -194,6 +194,7 @@
194 194  * ❌ Social sharing
195 195  
196 196  **Technical Features:**
276 +
197 197  * ❌ Browser extensions
198 198  * ❌ Mobile apps
199 199  * ❌ API endpoints
... ... @@ -201,6 +201,7 @@
201 201  * ❌ Export features (PDF, CSV)
202 202  
203 203  **Quality Features:**
284 +
204 204  * ❌ Accessibility (WCAG compliance)
205 205  * ❌ Multilingual support
206 206  * ❌ Mobile optimization
... ... @@ -207,6 +207,7 @@
207 207  * ❌ Media verification (images/videos)
208 208  
209 209  **Production Features:**
291 +
210 210  * ❌ Security hardening
211 211  * ❌ Privacy compliance (GDPR)
212 212  * ❌ Terms of service
... ... @@ -215,24 +215,18 @@
215 215  * ❌ Analytics
216 216  * ❌ A/B testing
217 217  
218 ----
219 -
220 220  == 4. POC Simplifications vs. Full System ==
221 221  
222 222  === 4.1 Architecture Comparison ===
223 223  
224 224  **POC Architecture (Simplified):**
225 -{{code}}
226 -User Input → Single AKEL Call → Output Display
227 - (all processing)
228 -{{/code}}
305 +{{code}}User Input → Single AKEL Call → Output Display
306 + (all processing){{/code}}
229 229  
230 230  **Full System Architecture:**
231 -{{code}}
232 -User Input → Claim Extractor → Claim Classifier → Scenario Generator
309 +{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 233  → Evidence Summarizer → Contradiction Detector → Verdict Generator
234 -→ Quality Gates → Publication → Output Display
235 -{{/code}}
311 +→ Quality Gates → Publication → Output Display{{/code}}
236 236  
237 237  **Key Differences:**
238 238  
... ... @@ -245,17 +245,17 @@
245 245  |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 246  |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247 247  
248 ----
249 -
250 250  === 4.2 Workflow Comparison ===
251 251  
252 252  **POC1 Workflow:**
327 +
253 253  1. User submits text/URL
254 254  2. Single AKEL call (all processing in one prompt)
255 255  3. Display results
256 -**Total: 3 steps, ~10-18 seconds**
331 +**Total: 3 steps, 10-18 seconds**
257 257  
258 258  **Full System Workflow:**
334 +
259 259  1. **Claim Submission** (extraction, normalization, clustering)
260 260  2. **Scenario Building** (definitions, assumptions, boundaries)
261 261  3. **Evidence Handling** (retrieval, assessment, linking)
... ... @@ -262,10 +262,8 @@
262 262  4. **Verdict Creation** (synthesis, reasoning, approval)
263 263  5. **Public Presentation** (summaries, landscapes, deep dives)
264 264  6. **Time Evolution** (versioning, re-evaluation triggers)
265 -**Total: 6 phases with quality gates, ~10-30 seconds**
341 +**Total: 6 phases with quality gates, 10-30 seconds**
266 266  
267 ----
268 -
269 269  === 4.3 Why POC is Simplified ===
270 270  
271 271  **Engineering Rationale:**
... ... @@ -284,11 +284,10 @@
284 284  * ❌ POC doesn't validate scale (test in Beta)
285 285  * ❌ POC doesn't validate scenario architecture (design in POC2)
286 286  
287 ----
288 -
289 289  === 4.4 Gap Between POC1 and POC2/Beta ===
290 290  
291 291  **What needs to be built for POC2:**
364 +
292 292  * Scenario generation component
293 293  * Evidence Model structure (full)
294 294  * Scenario-evidence linking
... ... @@ -296,6 +296,7 @@
296 296  * Truth landscape visualization
297 297  
298 298  **What needs to be built for Beta:**
372 +
299 299  * Multi-component AKEL pipeline
300 300  * Quality gate infrastructure
301 301  * Review workflow system
... ... @@ -305,8 +305,6 @@
305 305  
306 306  **POC1 → POC2 is significant architectural expansion.**
307 307  
308 ----
309 -
310 310  == 5. Publication Mode & Labeling ==
311 311  
312 312  === 5.1 POC Publication Mode ===
... ... @@ -314,6 +314,7 @@
314 314  **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
315 315  
316 316  Per FactHarbor Specification Section 11 "POC v1 Behavior":
389 +
317 317  * Produces public AI-generated output
318 318  * No human approval gate
319 319  * Clear AI-Generated labeling
... ... @@ -320,35 +320,31 @@
320 320  * All quality gates active (simplified)
321 321  * Risk tier classification shown (demo)
322 322  
323 ----
324 -
325 325  === 5.2 User-Facing Labels ===
326 326  
327 327  **Primary Label (top of analysis):**
328 -{{code}}
329 -╔════════════════════════════════════════════════════════════╗
330 -║ [AI-GENERATED - POC/DEMO] ║
331 -║ ║
332 -║ This analysis was produced entirely by AI and has not ║
333 -║ been human-reviewed. Use for demonstration purposes. ║
334 -║ ║
335 -║ Source: AI/AKEL v1.0 (POC) ║
336 -║ Review Status: Not Reviewed (Proof-of-Concept) ║
337 -║ Quality Gates: 4/4 Passed (Simplified) ║
338 -║ Last Updated: [timestamp] ║
339 -╚════════════════════════════════════════════════════════════╝
340 -{{/code}}
399 +{{code}}╔════════════════════════════════════════════════════════════╗
400 +║ [AI-GENERATED - POC/DEMO] ║
401 +║ ║
402 +║ This analysis was produced entirely by AI and has not ║
403 +║ been human-reviewed. Use for demonstration purposes. ║
404 +║ ║
405 +║ Source: AI/AKEL v1.0 (POC) ║
406 +║ Review Status: Not Reviewed (Proof-of-Concept) ║
407 +║ Quality Gates: 4/4 Passed (Simplified) ║
408 +║ Last Updated: [timestamp] ║
409 +╚════════════════════════════════════════════════════════════╝{{/code}}
341 341  
342 342  **Per-Claim Risk Labels:**
412 +
343 343  * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
344 344  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 345  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346 346  
347 ----
348 -
349 349  === 5.3 Display Requirements ===
350 350  
351 351  **Must Show:**
420 +
352 352  * AI-Generated status (prominent)
353 353  * POC/Demo disclaimer
354 354  * Risk tier per claim
... ... @@ -357,6 +357,7 @@
357 357  * Timestamp
358 358  
359 359  **Must NOT Claim:**
429 +
360 360  * Human review
361 361  * Production quality
362 362  * Medical/legal advice
... ... @@ -363,8 +363,6 @@
363 363  * Authoritative verdicts
364 364  * Complete accuracy
365 365  
366 ----
367 -
368 368  === 5.4 Mode 2 vs. Full System Publication ===
369 369  
370 370  |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
... ... @@ -375,8 +375,6 @@
375 375  |Risk Display|Demo only|Workflow-integrated|Validated
376 376  |User Actions|View only|Flag for review|Trust rating
377 377  
378 ----
379 -
380 380  == 6. Quality Gates (Simplified Implementation) ==
381 381  
382 382  === 6.1 Overview ===
... ... @@ -384,6 +384,7 @@
384 384  Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
385 385  
386 386  **Full System Has 4 Gates:**
453 +
387 387  1. Source Quality
388 388  2. Contradiction Search (MANDATORY)
389 389  3. Uncertainty Quantification
... ... @@ -390,16 +390,16 @@
390 390  4. Structural Integrity
391 391  
392 392  **POC Implements Simplified Versions:**
460 +
393 393  * Focus on demonstrating concept
394 394  * Basic implementations sufficient
395 395  * Failures displayed to user (not blocking)
396 396  * Full system has comprehensive validation
397 397  
398 ----
399 -
400 400  === 6.2 Gate 1: Source Quality (Basic) ===
401 401  
402 402  **Full System Requirements:**
469 +
403 403  * Primary sources identified and accessible
404 404  * Source reliability scored against whitelist
405 405  * Citation completeness verified
... ... @@ -407,6 +407,7 @@
407 407  * Author credentials validated
408 408  
409 409  **POC Implementation:**
477 +
410 410  * ✅ At least 2 sources found
411 411  * ✅ Sources accessible (URLs valid)
412 412  * ❌ No whitelist checking
... ... @@ -417,11 +417,10 @@
417 417  
418 418  **Failure Handling:** Display error message, don't generate verdict
419 419  
420 ----
421 -
422 422  === 6.3 Gate 2: Contradiction Search (Basic) ===
423 423  
424 424  **Full System Requirements:**
491 +
425 425  * Counter-evidence actively searched
426 426  * Reservations and limitations identified
427 427  * Alternative interpretations explored
... ... @@ -430,6 +430,7 @@
430 430  * Academic literature (supporting AND opposing)
431 431  
432 432  **POC Implementation:**
500 +
433 433  * ✅ Basic search for counter-evidence
434 434  * ✅ Identify obvious contradictions
435 435  * ❌ No comprehensive academic search
... ... @@ -441,11 +441,10 @@
441 441  
442 442  **Failure Handling:** Note "limited contradiction search" in output
443 443  
444 ----
445 -
446 446  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
447 447  
448 448  **Full System Requirements:**
515 +
449 449  * Confidence scores calculated for all claims/verdicts
450 450  * Limitations explicitly stated
451 451  * Data gaps identified and disclosed
... ... @@ -453,6 +453,7 @@
453 453  * Alternative scenarios considered
454 454  
455 455  **POC Implementation:**
523 +
456 456  * ✅ Confidence scores (0-100%)
457 457  * ✅ Basic uncertainty acknowledgment
458 458  * ❌ No detailed limitation disclosure
... ... @@ -463,11 +463,10 @@
463 463  
464 464  **Failure Handling:** Show "Confidence: Unknown" if calculation fails
465 465  
466 ----
467 -
468 468  === 6.5 Gate 4: Structural Integrity (Basic) ===
469 469  
470 470  **Full System Requirements:**
537 +
471 471  * No hallucinations detected (fact-checking against sources)
472 472  * Logic chain valid and traceable
473 473  * References accessible and verifiable
... ... @@ -475,6 +475,7 @@
475 475  * Premises clearly stated
476 476  
477 477  **POC Implementation:**
545 +
478 478  * ✅ Basic coherence check
479 479  * ✅ References accessible
480 480  * ❌ No comprehensive hallucination detection
... ... @@ -485,32 +485,24 @@
485 485  
486 486  **Failure Handling:** Display error message
487 487  
488 ----
489 -
490 490  === 6.6 Quality Gate Display ===
491 491  
492 492  **POC shows simplified status:**
493 -{{code}}
494 -Quality Gates: 4/4 Passed (Simplified)
559 +{{code}}Quality Gates: 4/4 Passed (Simplified)
495 495  ✓ Source Quality: 3 sources found
496 496  ✓ Contradiction Search: Basic search completed
497 497  ✓ Uncertainty: Confidence scores assigned
498 -✓ Structural Integrity: Output coherent
499 -{{/code}}
563 +✓ Structural Integrity: Output coherent{{/code}}
500 500  
501 501  **If any gate fails:**
502 -{{code}}
503 -Quality Gates: 3/4 Passed (Simplified)
566 +{{code}}Quality Gates: 3/4 Passed (Simplified)
504 504  ✓ Source Quality: 3 sources found
505 505  ✗ Contradiction Search: Search failed - limited evidence
506 506  ✓ Uncertainty: Confidence scores assigned
507 507  ✓ Structural Integrity: Output coherent
508 508  
509 -Note: This analysis has limited evidence. Use with caution.
510 -{{/code}}
572 +Note: This analysis has limited evidence. Use with caution.{{/code}}
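Both displays above follow one template: a pass count, one line per gate, and a caution note when any gate fails. A minimal rendering sketch, with the gate result structure assumed:

```python
# Sketch of the quality-gate status display (section 6.6).
# Gate names and messages follow the examples above; the tuple structure is assumed.
def render_gates(gates: list[tuple[str, bool, str]]) -> str:
    passed = sum(ok for _, ok, _ in gates)
    lines = [f"Quality Gates: {passed}/{len(gates)} Passed (Simplified)"]
    for name, ok, detail in gates:
        lines.append(f"{'✓' if ok else '✗'} {name}: {detail}")
    if passed < len(gates):
        # Gates don't block publication in POC mode; failures become warnings.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)

print(render_gates([
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
]))
```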
511 511  
512 ----
513 -
514 514  === 6.7 Simplified vs. Full System ===
515 515  
516 516  |=Gate|=POC (Simplified)|=Full System
... ... @@ -521,14 +521,13 @@
521 521  
522 522  **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523 523  
524 ----
525 -
526 526  == 7. AKEL Architecture Comparison ==
527 527  
528 528  === 7.1 POC AKEL (Simplified) ===
529 529  
530 530  **Implementation:**
531 -* Single Claude API call (Sonnet 4.5)
589 +
590 +* Single provider API call (REASONING model)
532 532  * One comprehensive prompt
533 533  * All processing in single request
534 534  * No separate components
... ... @@ -535,31 +535,26 @@
535 535  * No orchestration layer
536 536  
537 537  **Prompt Structure:**
538 -{{code}}
539 -Task: Analyze this article and provide:
597 +{{code}}Task: Analyze this article and provide:
540 540  
541 541  1. Extract 3-5 factual claims
542 542  2. For each claim:
543 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 - - Assign confidence score (0-100%)
545 - - Assign risk tier (A/B/C)
546 - - Write brief reasoning (1-3 sentences)
601 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
602 + - Assign confidence score (0-100%)
603 + - Assign risk tier (A/B/C)
604 + - Write brief reasoning (1-3 sentences)
547 547  3. Generate analysis summary (3-5 sentences)
548 548  4. Generate article summary (3-5 sentences)
549 549  5. Run basic quality checks
550 550  
551 -Return as structured JSON.
552 -{{/code}}
609 +Return as structured JSON.{{/code}}
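The single-call design reduces to one function: format the prompt, make one provider call, parse JSON. A sketch under stated assumptions; `call_llm` stands in for whatever provider SDK is used, and its signature is an assumption:

```python
import json

# Sketch of the POC single-call pipeline. `call_llm` is a stand-in for the
# provider SDK (name and signature assumed); the prompt is abridged from above.
PROMPT_TEMPLATE = """Task: Analyze this article and provide:
1. Extract 3-5 factual claims
2. For each claim: verdict, confidence (0-100%), risk tier (A/B/C), reasoning
3. Analysis summary and article summary
Return as structured JSON.

Article:
{article}"""

def analyze(article: str, call_llm) -> dict:
    raw = call_llm(PROMPT_TEMPLATE.format(article=article))
    # All-or-nothing: a malformed response fails the whole run (section 7.3).
    return json.loads(raw)

# Stub provider, for illustration only:
def fake_llm(prompt: str) -> str:
    return json.dumps({"claims": [], "verdicts": [], "summary": ""})

result = analyze("Coffee reduces diabetes risk by 30%.", fake_llm)
print(sorted(result))  # ['claims', 'summary', 'verdicts']
```

Note how the all-or-nothing limitation from section 7.3 falls out of the structure: if `json.loads` fails, there is no partial result to salvage.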
553 553  
554 554  **Processing Time:** 10-18 seconds (estimate)
555 555  
556 ----
557 -
558 558  === 7.2 Full System AKEL (Production) ===
559 559  
560 560  **Architecture:**
561 -{{code}}
562 -AKEL Orchestrator
616 +{{code}}AKEL Orchestrator
563 563  ├── Claim Extractor
564 564  ├── Claim Classifier (with risk tier assignment)
565 565  ├── Scenario Generator
... ... @@ -567,10 +567,10 @@
567 567  ├── Contradiction Detector
568 568  ├── Quality Gate Validator
569 569  ├── Audit Sampling Scheduler
570 -└── Federation Sync Adapter (Release 1.0+)
571 -{{/code}}
624 +└── Federation Sync Adapter (Release 1.0+){{/code}}
572 572  
573 573  **Processing:**
627 +
574 574  * Parallel processing where possible
575 575  * Separate component calls
576 576  * Quality gates between phases
... ... @@ -579,11 +579,10 @@
579 579  
580 580  **Processing Time:** 10-30 seconds (full pipeline)
581 581  
582 ----
583 -
584 584  === 7.3 Why POC Uses Single Call ===
585 585  
586 586  **Advantages:**
639 +
587 587  * ✅ Simpler to implement
588 588  * ✅ Faster POC development
589 589  * ✅ Easier to debug
... ... @@ -591,6 +591,7 @@
591 591  * ✅ Good enough for concept validation
592 592  
593 593  **Limitations:**
647 +
594 594  * ❌ No component reusability
595 595  * ❌ No parallel processing
596 596  * ❌ All-or-nothing (can't partially succeed)
... ... @@ -603,8 +603,6 @@
603 603  
604 604  Full component architecture comes in Beta after POC validates concept.
605 605  
606 ----
607 -
608 608  === 7.4 Evolution Path ===
609 609  
610 610  **POC1:** Single prompt → Prove concept
... ... @@ -612,8 +612,6 @@
612 612  **Beta:** Multi-component AKEL → Production architecture
613 613  **Release 1.0:** Full AKEL + Federation → Scale
614 614  
615 ----
616 -
617 617  == 8. Functional Requirements ==
618 618  
619 619  === FR-POC-1: Article Input ===
... ... @@ -621,6 +621,7 @@
621 621  **Requirement:** User can submit article for analysis
622 622  
623 623  **Functionality:**
674 +
624 624  * Text input field (paste article text, up to 5000 characters)
625 625  * URL input field (paste article URL)
626 626  * "Analyze" button to trigger processing
... ... @@ -627,6 +627,7 @@
627 627  * Loading indicator during analysis
628 628  
629 629  **Excluded:**
681 +
630 630  * No user authentication
631 631  * No claim history
632 632  * No search functionality
... ... @@ -633,17 +633,17 @@
633 633  * No saved templates
634 634  
635 635  **Acceptance Criteria:**
688 +
636 636  * User can paste text from article
637 637  * User can paste URL of article
638 638  * System accepts input and triggers analysis
639 639  
640 ----
641 -
642 642  === FR-POC-2: Claim Extraction (Fully Automated) ===
643 643  
644 644  **Requirement:** AI automatically extracts 3-5 factual claims
645 645  
646 646  **Functionality:**
698 +
647 647  * AI reads article text
648 648  * AI identifies factual claims (not opinions/questions)
649 649  * AI extracts 3-5 most important claims
... ... @@ -650,6 +650,7 @@
650 650  * System displays numbered list
651 651  
652 652  **Critical:** NO MANUAL EDITING ALLOWED
705 +
653 653  * AI selects which claims to extract
654 654  * AI identifies factual vs. non-factual
655 655  * System processes claims as extracted
... ... @@ -656,32 +656,34 @@
656 656  * No human curation or correction
657 657  
658 658  **Error Handling:**
712 +
659 659  * If extraction fails: Display error message
660 660  * User can retry with different input
661 661  * No manual intervention to fix extraction
662 662  
663 663  **Acceptance Criteria:**
718 +
664 664  * AI extracts 3-5 claims automatically
665 665  * Claims are factual (not opinions)
666 666  * Claims are clearly stated
667 667  * No manual editing required
668 668  
669 ----
670 -
671 671  === FR-POC-3: Verdict Generation (Fully Automated) ===
672 672  
673 673  **Requirement:** AI automatically generates verdict for each claim
674 674  
675 675  **Functionality:**
729 +
676 676  * For each claim, AI:
677 - * Evaluates claim based on available evidence/knowledge
678 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 - * Assigns confidence score (0-100%)
680 - * Assigns risk tier (A/B/C)
681 - * Writes brief reasoning (1-3 sentences)
731 +* Evaluates claim based on available evidence/knowledge
732 +* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 +* Assigns confidence score (0-100%)
734 +* Assigns risk tier (A/B/C)
735 +* Writes brief reasoning (1-3 sentences)
682 682  * System displays verdict for each claim
683 683  
684 684  **Critical:** NO MANUAL EDITING ALLOWED
739 +
685 685  * AI computes verdicts based on evidence
686 686  * AI generates confidence scores
687 687  * AI writes reasoning
... ... @@ -688,11 +688,13 @@
688 688  * No human review or adjustment
689 689  
690 690  **Error Handling:**
746 +
691 691  * If verdict generation fails: Display error message
692 692  * User can retry
693 693  * No manual intervention to adjust verdicts
694 694  
695 695  **Acceptance Criteria:**
752 +
696 696  * Each claim has a verdict
697 697  * Confidence score is displayed (0-100%)
698 698  * Risk tier is displayed (A/B/C)
... ... @@ -700,34 +700,33 @@
700 700  * Verdict is defensible given reasoning
701 701  * All generated automatically by AI
702 702  
703 ----
704 -
705 705  === FR-POC-4: Analysis Summary (Fully Automated) ===
706 706  
707 707  **Requirement:** AI generates brief summary of analysis
708 708  
709 709  **Functionality:**
765 +
710 710  * AI summarizes findings in 3-5 sentences:
711 - * How many claims found
712 - * Distribution of verdicts
713 - * Overall assessment
767 +* How many claims found
768 +* Distribution of verdicts
769 +* Overall assessment
714 714  * System displays at top of results
715 715  
716 716  **Critical:** NO MANUAL EDITING ALLOWED
717 717  
718 718  **Acceptance Criteria:**
775 +
719 719  * Summary is coherent
720 720  * Accurately reflects analysis
721 721  * 3-5 sentences
722 722  * Automatically generated
723 723  
724 ----
725 -
726 726  === FR-POC-5: Article Summary (Fully Automated, Optional) ===
727 727  
728 728  **Requirement:** AI generates brief summary of original article
729 729  
730 730  **Functionality:**
786 +
731 731  * AI summarizes article content (not FactHarbor's analysis)
732 732  * 3-5 sentences
733 733  * System displays
... ... @@ -737,18 +737,18 @@
737 737  **Critical:** NO MANUAL EDITING ALLOWED
738 738  
739 739  **Acceptance Criteria:**
796 +
740 740  * Summary is neutral (article's position)
741 741  * Accurately reflects article content
742 742  * 3-5 sentences
743 743  * Automatically generated
744 744  
745 ----
746 -
747 747  === FR-POC-6: Publication Mode Display ===
748 748  
749 749  **Requirement:** Clear labeling of AI-generated content
750 750  
751 751  **Functionality:**
807 +
752 752  * Display Mode 2 publication label
753 753  * Show POC/Demo disclaimer
754 754  * Display risk tiers per claim
... ... @@ -756,18 +756,18 @@
756 756  * Display timestamp
757 757  
758 758  **Acceptance Criteria:**
815 +
759 759  * Label is prominent and clear
760 760  * User understands this is AI-generated POC output
761 761  * Risk tiers are color-coded
762 762  * Quality gate status is visible
763 763  
764 ----
765 -
766 766  === FR-POC-7: Quality Gate Execution ===
767 767  
768 768  **Requirement:** Execute simplified quality gates
769 769  
770 770  **Functionality:**
826 +
771 771  * Check source quality (basic)
772 772  * Attempt contradiction search (basic)
773 773  * Calculate confidence scores
... ... @@ -775,13 +775,12 @@
775 775  * Display gate results
776 776  
777 777  **Acceptance Criteria:**
834 +
778 778  * All 4 gates attempted
779 779  * Pass/fail status displayed
780 780  * Failures explained to user
781 781  * Gates don't block publication (POC mode)
782 782  
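The four gates and their displayed results might be represented as follows; the gate names and the `GateResult` structure are illustrative, not part of the specification:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    name: str
    passed: bool
    explanation: str = ""  # shown to the user on failure

def summarize_gates(gates):
    """Render pass/fail per gate; failures are explained but never block publication (POC mode)."""
    lines = []
    for g in gates:
        status = "PASS" if g.passed else "FAIL"
        suffix = "" if g.passed else f" ({g.explanation})"
        lines.append(f"{g.name}: {status}{suffix}")
    return "\n".join(lines)

gates = [
    GateResult("source_quality", True),
    GateResult("contradiction_search", False, "no contradicting sources found to check"),
    GateResult("confidence_scores", True),
    GateResult("structural_integrity", True),
]
print(summarize_gates(gates))
```
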
783 ----
784 -
785 785  == 9. Non-Functional Requirements ==
786 786  
787 787  === NFR-POC-1: Fully Automated Processing ===
... ... @@ -791,6 +791,7 @@
791 791  **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
792 792  
793 793  **What this means:**
849 +
794 794  * Claims: AI selects (no human curation)
795 795  * Scenarios: N/A (deferred to POC2)
796 796  * Evidence: AI evaluates (no human selection)
... ... @@ -798,13 +798,12 @@
798 798  * Summaries: AI writes (no human editing)
799 799  
800 800  **Pipeline:**
801 -{{code}}
802 -User Input → AKEL Processing → Output Display
803 - ↓
804 - ZERO human editing
805 -{{/code}}
857 +{{code}}User Input → AKEL Processing → Output Display
858 + ↓
859 + ZERO human editing{{/code}}
806 806  
807 807  **If AI output is poor:**
862 +
808 808  * ❌ Do NOT manually fix it
809 809  * ✅ Document the failure
810 810  * ✅ Improve prompts and retry
... ... @@ -811,59 +811,61 @@
811 811  * ✅ Accept that POC might fail
812 812  
813 813  **Why this matters:**
869 +
814 814  * Tests whether AI can do this without humans
815 815  * Validates scalability (humans can't review every analysis)
816 816  * Honest test of technical feasibility
817 817  
818 ----
819 -
820 820  === NFR-POC-2: Performance ===
821 821  
822 822  **Requirement:** Analysis completes in reasonable time
823 823  
824 824  **Acceptable Performance:**
879 +
825 825  * Processing time: 1-5 minutes (acceptable for POC)
826 826  * Display loading indicator to user
827 827  * Show progress if possible ("Extracting claims...", "Generating verdicts...")
828 828  
829 829  **Not Required:**
885 +
830 830  * Production-level speed (< 30 seconds)
831 831  * Optimization for scale
832 832  * Caching
833 833  
834 834  **Acceptance Criteria:**
891 +
835 835  * Analysis completes within 5 minutes
836 836  * User sees loading indicator
837 837  * No timeout errors
838 838  
839 ----
840 -
841 841  === NFR-POC-3: Reliability ===
842 842  
843 843  **Requirement:** System works for manual testing sessions
844 844  
845 845  **Acceptable:**
901 +
846 846  * Occasional errors (< 20% failure rate)
847 847  * Manual restart if needed
848 848  * Display error messages clearly
849 849  
850 850  **Not Required:**
907 +
851 851  * 99.9% uptime
852 852  * Automatic error recovery
853 853  * Production monitoring
854 854  
855 855  **Acceptance Criteria:**
913 +
856 856  * System works for test demonstrations
857 857  * Errors are handled gracefully
858 858  * User receives clear error messages
859 859  
860 ----
861 -
862 862  === NFR-POC-4: Environment ===
863 863  
864 864  **Requirement:** Runs on simple infrastructure
865 865  
866 866  **Acceptable:**
923 +
867 867  * Single machine or simple cloud setup
868 868  * No distributed architecture
869 869  * No load balancing
... ... @@ -871,125 +871,196 @@
871 871  * Local development environment viable
872 872  
873 873  **Not Required:**
931 +
874 874  * Production infrastructure
875 875  * Multi-region deployment
876 876  * Auto-scaling
877 877  * Disaster recovery
878 878  
879 ----
937 +=== NFR-POC-5: Cost Efficiency Tracking ===
880 880  
939 +**Requirement:** Track and display LLM usage metrics to inform optimization decisions
940 +
941 +**Must Track:**
942 +
943 +* Input tokens (article + prompt)
944 +* Output tokens (generated analysis)
945 +* Total tokens
946 +* Estimated cost (USD)
947 +* Response time (seconds)
948 +* Article length (words/characters)
949 +
950 +**Must Display:**
951 +
952 +* Usage statistics in UI (Component 5)
953 +* Cost per analysis
954 +* Cost per claim extracted
955 +
956 +**Must Log:**
957 +
958 +* Aggregate metrics for analysis
959 +* Cost distribution by article length
960 +* Token efficiency trends
961 +
962 +**Purpose:**
963 +
964 +* Understand unit economics
965 +* Identify optimization opportunities
966 +* Project costs at scale
967 +* Inform architecture decisions (caching, model selection, etc.)
968 +
969 +**Acceptance Criteria:**
970 +
971 +* ✅ Usage data displayed after each analysis
972 +* ✅ Metrics logged for aggregate analysis
973 +* ✅ Cost calculated accurately (Claude API pricing)
974 +* ✅ Test cases include varying article lengths
975 +* ✅ POC1 report includes cost analysis section
976 +
977 +**Success Target:**
978 +
979 +* Average cost per analysis < $0.05 USD
980 +* Cost scaling behavior understood (linear/exponential)
981 +* 2+ optimization opportunities identified
982 +
983 +**Critical:** Unit economics must be viable to support the scaling decision!
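
The tracked metrics can be computed from token counts as sketched below; the per-million-token prices are placeholders for illustration, not actual API pricing:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_mtok: float = 3.00,    # placeholder USD per million input tokens
                  price_out_per_mtok: float = 15.00   # placeholder USD per million output tokens
                  ) -> dict:
    """Return the per-analysis usage metrics NFR-POC-5 requires."""
    cost = (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 4),
    }

metrics = estimate_cost(4_000, 1_200)
print(metrics["estimated_cost_usd"])  # 0.03 with the placeholder prices: within the <$0.05 target
```
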
984 +
881 881  == 10. Technical Architecture ==
882 882  
883 883  === 10.1 System Components ===
884 884  
885 885  **Frontend:**
990 +
886 886  * Simple HTML form (text input + URL input + button)
887 887  * Loading indicator
888 888  * Results display page (single page, no tabs/navigation)
889 889  
890 890  **Backend:**
996 +
891 891  * Single API endpoint
892 -* Calls Claude API (Sonnet 4.5 or latest)
998 +* Calls provider API (REASONING model; configured via LLM abstraction)
893 893  * Parses response
894 894  * Returns JSON to frontend
895 895  
896 896  **Data Storage:**
1003 +
897 897  * None required (stateless POC)
898 898  * Optional: Simple file storage or SQLite for demo examples
899 899  
900 900  **External Services:**
1008 +
901 901  * Claude API (Anthropic) - required
902 902  * Optional: URL fetch service for article text extraction
903 903  
904 ----
905 -
906 906  === 10.2 Processing Flow ===
907 907  
908 908  {{code}}
909 909  1. User submits text or URL
910 -
1016 + ↓
911 911  2. Backend receives request
912 -
1018 + ↓
913 913  3. If URL: Fetch article text
914 -
1020 + ↓
915 915  4. Call Claude API with single prompt:
916 - "Extract claims, evaluate each, provide verdicts"
917 -
1022 + "Extract claims, evaluate each, provide verdicts"
1023 + ↓
918 918  5. Claude API returns:
919 - - Analysis summary
920 - - Claims list
921 - - Verdicts for each claim (with risk tiers)
922 - - Article summary (optional)
923 - - Quality gate results
924 -
1025 + - Analysis summary
1026 + - Claims list
1027 + - Verdicts for each claim (with risk tiers)
1028 + - Article summary (optional)
1029 + - Quality gate results
1030 + ↓
925 925  6. Backend parses response
926 -
1032 + ↓
927 927  7. Frontend displays results with Mode 2 labeling
928 928  {{/code}}
929 929  
930 930  **Key Simplification:** Single API call does entire analysis
931 931  
932 ----
933 -
934 934  === 10.3 AI Prompt Strategy ===
935 935  
936 936  **Single Comprehensive Prompt:**
937 -{{code}}
938 -Task: Analyze this article and provide:
1041 +{{code}}Task: Analyze this article and provide:
939 939  
940 -1. Extract 3-5 factual claims from the article
941 -2. For each claim:
942 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 - - Assign confidence score (0-100%)
944 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 - - Write brief reasoning (1-3 sentences)
946 -3. Run quality gates:
947 - - Check: ≥2 sources found
948 - - Attempt: Basic contradiction search
949 - - Calculate: Confidence scores
950 - - Verify: Structural integrity
951 -4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 -5. Write article summary (3-5 sentences: neutral summary of article content)
1043 +1. Identify the article's main thesis/conclusion
1044 + - What is the article trying to argue or prove?
1045 + - What is the primary claim or conclusion?
953 953  
954 -Return as structured JSON with quality gate results.
955 -{{/code}}
1047 +2. Extract 3-5 factual claims from the article
1048 + - Note which claims are CENTRAL to the main thesis
1049 + - Note which claims are SUPPORTING facts
956 956  
1051 +3. For each claim:
1052 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1053 + - Assign confidence score (0-100%)
1054 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1055 + - Write brief reasoning (1-3 sentences)
1056 +
1057 +4. Assess relationship between claims and main thesis:
1058 + - Do the claims actually support the article's conclusion?
1059 + - Are there logical leaps or unsupported inferences?
1060 + - Is the article's framing misleading even if individual facts are accurate?
1061 +
1062 +5. Run quality gates:
1063 + - Check: ≥2 sources found
1064 + - Attempt: Basic contradiction search
1065 + - Calculate: Confidence scores
1066 + - Verify: Structural integrity
1067 +
1068 +6. Write context-aware analysis summary (4-6 sentences):
1069 + - State article's main thesis
1070 + - Report claims found and verdict distribution
1071 + - Note if central claims are problematic
1072 + - Assess whether evidence supports conclusion
1073 + - Overall credibility considering claim importance
1074 +
1075 +7. Write article summary (3-5 sentences: neutral summary of article content)
1076 +
1077 +Return as structured JSON with quality gate results.{{/code}}
1078 +
957 957  **One prompt generates everything.**
958 958  
959 ----
1081 +**Critical Addition:**
960 960  
1083 +Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether AI can distinguish an article whose facts are accurate but poorly reasoned from one that is genuinely credible.
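
One possible shape for the structured JSON the prompt requests; the field names here are assumptions for illustration, not a fixed schema:

```python
import json

# Illustrative response shape only; field names are not a defined schema.
example_response = {
    "main_thesis": "...",                  # step 1
    "claims": [
        {
            "text": "Coffee reduces the risk of type 2 diabetes by 30%",
            "role": "CENTRAL",             # CENTRAL or SUPPORTING (step 2)
            "verdict": "PARTIALLY SUPPORTED",
            "confidence": 75,              # 0-100
            "risk_tier": "A",              # A / B / C
            "reasoning": "...",
        }
    ],
    "thesis_assessment": "...",            # step 4
    "quality_gates": {"sources_found": True, "contradiction_search": True,
                      "confidence_scores": True, "structural_integrity": True},
    "analysis_summary": "...",             # step 6
    "article_summary": "...",              # step 7
}

# Round-trip to confirm the shape is valid JSON.
print(json.loads(json.dumps(example_response))["claims"][0]["verdict"])
```
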
1084 +
961 961  === 10.4 Technology Stack Suggestions ===
962 962  
963 963  **Frontend:**
1088 +
964 964  * HTML + CSS + JavaScript (minimal framework)
965 965  * OR: Next.js (if team prefers)
966 966  * Hosted: Local machine OR Vercel/Netlify free tier
967 967  
968 968  **Backend:**
1094 +
969 969  * Python Flask/FastAPI (simple REST API)
970 970  * OR: Next.js API routes (if using Next.js)
971 971  * Hosted: Local machine OR Railway/Render free tier
972 972  
973 973  **AKEL Integration:**
1100 +
974 974  * Claude API via Anthropic SDK
975 -* Model: Claude Sonnet 4.5 or latest available
1102 +* Model: Provider-default REASONING model or latest available
976 976  
977 977  **Database:**
1105 +
978 978  * None (stateless acceptable)
979 979  * OR: SQLite if the team wants to store demo examples
980 980  * OR: JSON files on disk
981 981  
982 982  **Deployment:**
1111 +
983 983  * Local development environment sufficient for POC
984 984  * Optional: Deploy to cloud for remote demos
985 985  
986 ----
987 -
988 988  == 11. Success Criteria ==
989 989  
990 990  === 11.1 Minimum Success (POC Passes) ===
991 991  
992 992  **Required for GO decision:**
1120 +
993 993  * ✅ AI extracts 3-5 factual claims automatically
994 994  * ✅ AI provides verdict for each claim automatically
995 995  * ✅ Verdicts are reasonable (≥70% make logical sense)
... ... @@ -998,17 +998,20 @@
998 998  * ✅ Team/advisors understand the output
999 999  * ✅ Team agrees approach has merit
1000 1000  * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1129 +* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1130 +* ✅ **Cost scaling understood** (data collected on article length vs. cost)
1131 +* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1001 1001  
1002 1002  **Quality Definition:**
1134 +
1003 1003  * "Reasonable verdict" = Defensible given general knowledge
1004 1004  * "Coherent summary" = Logically structured, grammatically correct
1005 1005  * "Comprehensible" = Reviewers understand what analysis means
1006 1006  
1007 ----
1008 -
1009 1009  === 11.2 POC Fails If ===
1010 1010  
1011 1011  **Automatic NO-GO if any of these:**
1142 +
1012 1012  * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1013 1013  * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1014 1014  * ❌ Output incomprehensible (reviewers can't understand analysis)
... ... @@ -1015,21 +1015,20 @@
1015 1015  * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 1016  * ❌ Team loses confidence in AI-automated approach
1017 1017  
1018 ----
1019 -
1020 1020  === 11.3 Quality Thresholds ===
1021 1021  
1022 1022  **POC quality expectations:**
1023 1023  
1024 1024  |=Component|=Quality Threshold|=Definition
1025 -|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1026 -|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1027 -|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1028 -|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1154 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1155 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1156 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1157 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1029 1029  
1030 1030  **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1031 1031  
1032 1032  **Not expecting:**
1162 +
1033 1033  * 100% accuracy
1034 1034  * Perfect claim coverage
1035 1035  * Comprehensive evidence gathering
... ... @@ -1037,13 +1037,12 @@
1037 1037  * Production polish
1038 1038  
1039 1039  **Expecting:**
1170 +
1040 1040  * Reasonable claim extraction
1041 1041  * Defensible verdicts
1042 1042  * Understandable reasoning
1043 1043  * Useful output
1044 1044  
1045 ----
1046 -
1047 1047  == 12. Test Cases ==
1048 1048  
1049 1049  === 12.1 Test Case 1: Simple Factual Claim ===
... ... @@ -1051,6 +1051,7 @@
1051 1051  **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1052 1052  
1053 1053  **Expected Output:**
1183 +
1054 1054  * Extract claim correctly
1055 1055  * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1056 1056  * Confidence: 70-90%
... ... @@ -1059,13 +1059,12 @@
1059 1059  
1060 1060  **Success:** Verdict is reasonable and reasoning makes sense
1061 1061  
1062 ----
1063 -
1064 1064  === 12.2 Test Case 2: Complex News Article ===
1065 1065  
1066 1066  **Input:** News article URL with multiple claims about politics/health/science
1067 1067  
1068 1068  **Expected Output:**
1197 +
1069 1069  * Extract 3-5 key claims
1070 1070  * Verdict for each (may vary: some supported, some uncertain, some refuted)
1071 1071  * Coherent analysis summary
... ... @@ -1074,13 +1074,12 @@
1074 1074  
1075 1075  **Success:** Claims identified are actually from article, verdicts are reasonable
1076 1076  
1077 ----
1078 -
1079 1079  === 12.3 Test Case 3: Controversial Topic ===
1080 1080  
1081 1081  **Input:** Article on contested political or scientific topic
1082 1082  
1083 1083  **Expected Output:**
1211 +
1084 1084  * Balanced analysis
1085 1085  * Acknowledges uncertainty where appropriate
1086 1086  * Doesn't overstate confidence
... ... @@ -1088,13 +1088,12 @@
1088 1088  
1089 1089  **Success:** Analysis is fair and doesn't show obvious bias
1090 1090  
1091 ----
1092 -
1093 1093  === 12.4 Test Case 4: Clearly False Claim ===
1094 1094  
1095 1095  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1096 1096  
1097 1097  **Expected Output:**
1224 +
1098 1098  * Extract claim
1099 1099  * Verdict: REFUTED
1100 1100  * High confidence (> 90%)
... ... @@ -1103,13 +1103,12 @@
1103 1103  
1104 1104  **Success:** AI correctly identifies false claim with high confidence
1105 1105  
1106 ----
1107 -
1108 1108  === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1109 1109  
1110 1110  **Input:** Article with claim where evidence is genuinely mixed
1111 1111  
1112 1112  **Expected Output:**
1238 +
1113 1113  * Extract claim
1114 1114  * Verdict: UNCERTAIN
1115 1115  * Moderate confidence (40-60%)
... ... @@ -1117,13 +1117,12 @@
1117 1117  
1118 1118  **Success:** AI recognizes uncertainty and doesn't overstate confidence
1119 1119  
1120 ----
1121 -
1122 1122  === 12.6 Test Case 6: High-Risk Medical Claim ===
1123 1123  
1124 1124  **Input:** Article making medical claims
1125 1125  
1126 1126  **Expected Output:**
1251 +
1127 1127  * Extract claim
1128 1128  * Verdict: [appropriate based on evidence]
1129 1129  * Risk tier: A (High - medical)
... ... @@ -1132,8 +1132,6 @@
1132 1132  
1133 1133  **Success:** Risk tier correctly assigned, appropriate warnings shown
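
The expectations in the six test cases above could be driven by a tiny check harness; the case names and expectations table are illustrative:

```python
# Hypothetical expectations for a few of the test cases above,
# keyed by an illustrative case name -> set of acceptable verdicts.
EXPECTED = {
    "simple_factual": {"WELL-SUPPORTED", "PARTIALLY SUPPORTED"},  # Test Case 1
    "clearly_false": {"REFUTED"},                                 # Test Case 4
    "genuinely_uncertain": {"UNCERTAIN"},                         # Test Case 5
}

def check(case: str, verdict: str) -> bool:
    """Return True if the AI's verdict matches the test case expectation."""
    return verdict in EXPECTED[case]

print(check("clearly_false", "REFUTED"))
```
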
1134 1134  
1135 ----
1136 -
1137 1137  == 13. POC Decision Gate ==
1138 1138  
1139 1139  === 13.1 Decision Framework ===
... ... @@ -1143,6 +1143,7 @@
1143 1143  **Option A: GO (Proceed to POC2)**
1144 1144  
1145 1145  **Conditions:**
1269 +
1146 1146  * AI quality ≥70% without manual editing
1147 1147  * Basic claim → verdict pipeline validated
1148 1148  * Internal + advisor feedback positive
... ... @@ -1151,16 +1151,16 @@
1151 1151  * Clear path to improving AI quality to ≥90%
1152 1152  
1153 1153  **Next Steps:**
1278 +
1154 1154  * Plan POC2 development (add scenarios)
1155 1155  * Design scenario architecture
1156 1156  * Expand to Evidence Model structure
1157 1157  * Test with more complex articles
1158 1158  
1159 ----
1160 -
1161 1161  **Option B: NO-GO (Pivot or Stop)**
1162 1162  
1163 1163  **Conditions:**
1287 +
1164 1164  * AI quality < 60%
1165 1165  * Requires manual editing for most analyses (> 50%)
1166 1166  * Feedback indicates fundamental flaws
... ... @@ -1168,14 +1168,14 @@
1168 1168  * No clear path to improvement
1169 1169  
1170 1170  **Next Steps:**
1295 +
1171 1171  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1172 1172  * **Stop:** Conclude approach not viable, revisit later
1173 1173  
1174 ----
1175 -
1176 1176  **Option C: ITERATE (Improve POC)**
1177 1177  
1178 1178  **Conditions:**
1302 +
1179 1179  * Concept has merit but execution needs work
1180 1180  * Specific improvements identified
1181 1181  * Addressable with better prompts/approach
... ... @@ -1182,46 +1182,43 @@
1182 1182  * AI quality between 60-70%
1183 1183  
1184 1184  **Next Steps:**
1309 +
1185 1185  * Improve AI prompts
1186 1186  * Test different approaches
1187 1187  * Re-run POC with improvements
1188 1188  * Then make GO/NO-GO decision
1189 1189  
1190 ----
1191 -
1192 1192  === 13.2 Decision Criteria Summary ===
1193 1193  
1194 1194  {{code}}
1195 -AI Quality < 60% → NO-GO (approach doesn't work)
1318 +AI Quality < 60% → NO-GO (approach doesn't work)
1196 1196  AI Quality 60-70% → ITERATE (improve and retry)
1197 -AI Quality ≥70% → GO (proceed to POC2)
1320 +AI Quality ≥70% → GO (proceed to POC2)
1198 1198  {{/code}}
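
The thresholds above map directly to a small decision helper, sketched here for illustration:

```python
def poc_decision(ai_quality: float) -> str:
    """Map measured AI quality (0-100%) to the Section 13.2 gate decision."""
    if ai_quality < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2

print(poc_decision(55), poc_decision(65), poc_decision(72))  # NO-GO ITERATE GO
```
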
1199 1199  
1200 ----
1201 -
1202 1202  == 14. Key Risks & Mitigations ==
1203 1203  
1204 1204  === 14.1 Risk: AI Quality Not Good Enough ===
1205 1205  
1206 -**Likelihood:** Medium-High
1207 -**Impact:** POC fails
1327 +**Likelihood:** Medium-High
1328 +**Impact:** POC fails
1208 1208  
1209 1209  **Mitigation:**
1331 +
1210 1210  * Extensive prompt engineering and testing
1211 -* Use best available AI models (Sonnet 4.5)
1333 +* Use best available AI models (role-based selection; configured via LLM abstraction)
1212 1212  * Test with diverse article types
1213 1213  * Iterate on prompts based on results
1214 1214  
1215 1215  **Acceptance:** This is what POC tests - be ready for failure
1216 1216  
1217 ----
1218 -
1219 1219  === 14.2 Risk: AI Consistency Issues ===
1220 1220  
1221 -**Likelihood:** Medium
1222 -**Impact:** Works sometimes, fails other times
1341 +**Likelihood:** Medium
1342 +**Impact:** Works sometimes, fails other times
1223 1223  
1224 1224  **Mitigation:**
1345 +
1225 1225  * Test with 10+ diverse articles
1226 1226  * Measure success rate honestly
1227 1227  * Improve prompts to increase consistency
... ... @@ -1228,14 +1228,13 @@
1228 1228  
1229 1229  **Acceptance:** Some variability OK if average quality ≥70%
1230 1230  
1231 ----
1232 -
1233 1233  === 14.3 Risk: Output Incomprehensible ===
1234 1234  
1235 -**Likelihood:** Low-Medium
1236 -**Impact:** Users can't understand analysis
1354 +**Likelihood:** Low-Medium
1355 +**Impact:** Users can't understand analysis
1237 1237  
1238 1238  **Mitigation:**
1358 +
1239 1239  * Create clear explainer document
1240 1240  * Iterate on output format
1241 1241  * Test with non-technical reviewers
... ... @@ -1243,14 +1243,13 @@
1243 1243  
1244 1244  **Acceptance:** Iterate until comprehensible
1245 1245  
1246 ----
1247 -
1248 1248  === 14.4 Risk: API Rate Limits / Costs ===
1249 1249  
1250 -**Likelihood:** Low
1251 -**Impact:** System slow or expensive
1368 +**Likelihood:** Low
1369 +**Impact:** System slow or expensive
1252 1252  
1253 1253  **Mitigation:**
1372 +
1254 1254  * Monitor API usage
1255 1255  * Implement retry logic
1256 1256  * Estimate costs before scaling
... ... @@ -1257,14 +1257,13 @@
1257 1257  
1258 1258  **Acceptance:** POC can be slow and expensive (optimization later)
1259 1259  
1260 ----
1261 -
1262 1262  === 14.5 Risk: Scope Creep ===
1263 1263  
1264 -**Likelihood:** Medium
1265 -**Impact:** POC becomes too complex
1381 +**Likelihood:** Medium
1382 +**Impact:** POC becomes too complex
1266 1266  
1267 1267  **Mitigation:**
1385 +
1268 1268  * Strict scope discipline
1269 1269  * Say NO to feature additions
1270 1270  * Keep focus on core question
... ... @@ -1271,18 +1271,19 @@
1271 1271  
1272 1272  **Acceptance:** POC is minimal by design
1273 1273  
1274 ----
1275 -
1276 1276  == 15. POC Philosophy ==
1277 1277  
1278 1278  === 15.1 Core Principles ===
1279 1279  
1280 -**1. Build Less, Learn More**
1396 +**1. Build Less, Learn More**
1397 +
1281 1281  * Minimum features to test hypothesis
1282 1282  * Don't build unvalidated features
1283 1283  * Focus on core question only
1284 1284  
1285 1285  **2. Fail Fast**
1404 +
1286 1286  * Quick test of hardest part (AI capability)
1287 1287  * Accept that POC might fail
1288 1288  * Better to discover issues early
... ... @@ -1289,45 +1289,45 @@
1289 1289  * Honest assessment over optimistic hope
1290 1290  
1291 1291  **3. Test First, Build Second**
1411 +
1292 1292  * Validate AI can do this before building platform
1293 1293  * Don't assume it will work
1294 1294  * Let results guide decisions
1295 1295  
1296 1296  **4. Automation First**
1417 +
1297 1297  * No manual editing allowed
1298 1298  * Tests scalability, not just feasibility
1299 1299  * Proves approach can work at scale
1300 1300  
1301 1301  **5. Honest Assessment**
1423 +
1302 1302  * Don't cherry-pick examples
1303 1303  * Don't manually fix bad outputs
1304 1304  * Document failures openly
1305 1305  * Make data-driven decisions
1306 1306  
1307 ----
1308 -
1309 1309  === 15.2 What POC Is ===
1310 1310  
1311 -✅ Testing AI capability without humans
1312 -✅ Proving core technical concept
1313 -✅ Fast validation of approach
1314 -✅ Honest assessment of feasibility
1431 +✅ Testing AI capability without humans
1432 +✅ Proving core technical concept
1433 +✅ Fast validation of approach
1434 +✅ Honest assessment of feasibility
1315 1315  
1316 ----
1317 -
1318 1318  === 15.3 What POC Is NOT ===
1319 1319  
1320 -❌ Building a product
1321 -❌ Production-ready system
1322 -❌ Feature-complete platform
1323 -❌ Perfectly accurate analysis
1324 -❌ Polished user experience
1438 +❌ Building a product
1439 +❌ Production-ready system
1440 +❌ Feature-complete platform
1441 +❌ Perfectly accurate analysis
1442 +❌ Polished user experience
1325 1325  
1326 ----
1444 +== 16. Success = Clear Path Forward ==
1327 
1328 -== 16. Success = Clear Path Forward ==
1329 1329  
1330 1330  **If POC succeeds (≥70% AI quality):**
1449 +
1331 1331  * ✅ Approach validated
1332 1332  * ✅ Proceed to POC2 (add scenarios)
1333 1333  * ✅ Design full Evidence Model structure
... ... @@ -1335,6 +1335,7 @@
1335 1335  * ✅ Focus on improving AI quality from 70% → 90%
1336 1336  
1337 1337  **If POC fails (< 60% AI quality):**
1457 +
1338 1338  * ✅ Learn what doesn't work
1339 1339  * ✅ Pivot to different approach
1340 1340  * ✅ OR wait for better AI technology
... ... @@ -1342,18 +1342,62 @@
1342 1342  
1343 1343  **Either way, POC provides clarity.**
1344 1344  
1345 ----
1346 -
1347 1347  == 17. Related Pages ==
1348 1348  
1349 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 -* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 -* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1467 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
1468 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1469 +* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1352 1352  * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 -* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1471 +* [[AKEL>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 1354  * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355 1355  
1356 ----
1357 -
1358 1358  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359 1359  
1476 +
1477 +=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
1478 +
1479 +**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.
1480 +
1481 +**POC1 Implementation:**
1482 +
1483 +* **Primary Provider:** Anthropic Claude API
1484 +** Stage 1: Provider-default FAST model
1485 +** Stage 2: Provider-default REASONING model (cached)
1486 +** Stage 3: Provider-default REASONING model
1487 +
1488 +* **Provider Interface:** Abstract LLMProvider interface implemented
1489 +
1490 +* **Configuration:** Environment variables for provider selection
1491 +** {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1492 +** {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1493 +** {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1494 +
1495 +* **Failover:** Basic error handling with cache fallback for Stage 2
1496 +
1497 +* **Cost Tracking:** Log provider name and cost per request
1498 +
1499 +**Future (POC2/Beta):**
1500 +
1501 +* Secondary provider (OpenAI) with automatic failover
1502 +* Admin API for runtime provider switching
1503 +* Cost comparison dashboard
1504 +* Cross-provider output verification
1505 +
1506 +**Success Criteria:**
1507 +
1508 +* All LLM calls go through abstraction layer (no direct API calls)
1509 +* Provider can be changed via environment variable without code changes
1510 +* Cost tracking includes provider name in logs
1511 +* Stage 2 falls back to cache on provider failure
1512 +
1513 +**Implementation:** See [[POC1 API & Schemas Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1514 +
1515 +**Dependencies:**
1516 +
1517 +* NFR-14 (Main Requirements)
1518 +* Design Decision 9
1519 +* Architecture Section 2.2
1520 +
1521 +**Priority:** HIGH (P1)
1522 +
1523 +**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
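
A minimal sketch of the abstraction layer under the configuration above; `LLMProvider`, `AnthropicProvider`, and `get_provider` are illustrative names, and only the environment-variable selection mirrors the requirement:

```python
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Abstract interface; all pipeline stages call this, never a vendor SDK directly."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, model: str, prompt: str) -> str:
        # The real implementation would call the vendor SDK and log
        # provider name, token counts, and estimated cost per request.
        raise NotImplementedError

_PROVIDERS = {"anthropic": AnthropicProvider}

def get_provider() -> LLMProvider:
    """Select the provider from configuration, not code (NFR-POC-11)."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _PROVIDERS[name]()

provider = get_provider()
print(type(provider).__name__)  # AnthropicProvider
```

Swapping providers then means registering another `LLMProvider` subclass and changing the environment variable, with no changes to pipeline code.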