Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2026/02/08 08:26
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
Page properties

Title
@@ -1,1 +1,1 @@
-POC Requirements
+POC Requirements (POC1 & POC2)

Content
@@ -1,19 +1,28 @@
 = POC Requirements =

-**Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
-**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention

----
+{{info}}
+**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.

+See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
+{{/info}}
+
+
+
+**Status:** ✅ Approved for Development
+**Version:** 2.0 (Updated after Specification Cross-Check)
+**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
+
 == 1. POC Overview ==

 === 1.1 What POC Tests ===

 **Core Question:**
+
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

 **What we're proving:**
+
 * AI can identify factual claims from text
 * AI can evaluate those claims and produce verdicts
 * Output is comprehensible and useful
@@ -20,6 +20,7 @@
 * Fully automated approach is viable

 **What we're NOT testing:**
+
 * Scenario generation (deferred to POC2)
 * Evidence display (deferred to POC2)
 * Production scalability
@@ -26,8 +26,6 @@
 * Perfect accuracy
 * Complete feature set

----
-
 === 1.2 Scenarios Deferred to POC2 ===

 **Intentional Simplification:**
@@ -35,6 +35,7 @@
 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.

 **Rationale:**
+
 * **POC1 tests:** Can AI extract claims and generate verdicts?
 * **POC2 will add:** Scenario generation and management
 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
@@ -46,6 +46,7 @@
 **No Risk:**

 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
+
 * Faster POC1 validation
 * Learning from POC1 to inform scenario design
 * Iterative approach: fail fast if basic AI doesn't work
@@ -52,65 +52,91 @@
 * Flexibility to adjust scenario architecture based on POC1 insights

 **Full System Workflow (Future):**
-{{code}}
-Claims → Scenarios → Evidence → Verdicts
-{{/code}}
+{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}

 **POC1 Simplified Workflow:**
-{{code}}
-Claims → Verdicts (scenarios implicit in reasoning)
-{{/code}}
+{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}

----
-
 == 2. POC Output Specification ==

-=== 2.1 Component 1: ANALYSIS SUMMARY ===
+=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===

-**What:** Brief overview of findings
-**Length:** 3-5 sentences
-**Content:**
-* How many claims found
-* Distribution of verdicts
-* Overall assessment
+**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument

-**Example:**
-{{code}}
-This article makes 4 claims about coffee's health effects. We found
-2 claims are well-supported, 1 is uncertain, and 1 is refuted.
-Overall assessment: mostly accurate with some exaggeration.
-{{/code}}
+**Length:** 4-6 sentences

----
+**Content (Required Elements):**

+1. **Article's main thesis/claim** - What is the article trying to argue or prove?
+2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
+3. **Central vs. supporting claims** - Which claims are central to the article's argument?
+4. **Relationship assessment** - Do the claims support the article's conclusion?
+5. **Overall credibility** - Final assessment considering claim importance
+
+**Critical Innovation:**
+
+POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
+
+* Make accurate supporting facts but draw unsupported conclusions
+* Have one false central claim that invalidates the whole argument
+* Misframe accurate information to mislead
+
+**Good Example (Context-Aware):**
+{{code}}This article argues that coffee cures cancer based on its antioxidant
+content. We analyzed 3 factual claims: 2 about coffee's chemical
+properties are well-supported, but the main causal claim is refuted
+by current evidence. The article confuses correlation with causation.
+Overall assessment: MISLEADING - makes an unsupported medical claim
+despite citing some accurate facts.{{/code}}
+
+**Poor Example (Simple Aggregation - Don't Do This):**
+{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
+Overall assessment: mostly accurate (67% accurate).{{/code}}
+↑ This misses that the refuted claim IS the article's main point!
+
+**What POC1 Tests:**
+
+Can AI identify and assess:
+
+* ✅ The article's main thesis/conclusion?
+* ✅ Which claims are central vs. supporting?
+* ✅ Whether the evidence supports the conclusion?
+* ✅ Overall credibility considering logical structure?
+
+**If AI Cannot Do This:**
+
+That's valuable to learn in POC1! We'll:
+
+* Note as limitation
+* Fall back to simple aggregation with warning
+* Design explicit article-level analysis for POC2
+
 === 2.2 Component 2: CLAIMS IDENTIFICATION ===

-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
 **Requirements:**
+
 * Factual claims only (not opinions/questions)
 * Clearly stated
 * Automatically extracted by AI

 **Example:**
-{{code}}
-CLAIMS IDENTIFIED:
+{{code}}CLAIMS IDENTIFIED:

 [1] Coffee reduces diabetes risk by 30%
 [2] Coffee improves heart health
 [3] Decaf has same benefits as regular
-[4] Coffee prevents Alzheimer's completely
-{{/code}}
+[4] Coffee prevents Alzheimer's completely{{/code}}

----
-
 === 2.3 Component 3: CLAIMS VERDICTS ===

-**What:** Verdict for each claim identified
-**Format:** Per claim structure
+**What:** Verdict for each claim identified
+**Format:** Per claim structure

 **Required Elements:**
+
 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
 * **Confidence Score:** 0-100%
 * **Brief Reasoning:** 1-3 sentences explaining why
@@ -117,8 +117,7 @@
 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration

 **Example:**
-{{code}}
-VERDICTS:
+{{code}}VERDICTS:

 [1] WELL-SUPPORTED (85%) [Risk: C]
 Multiple studies confirm 25-30% risk reduction with regular consumption.
@@ -130,44 +130,86 @@
 Some benefits overlap, but caffeine-related benefits are reduced in decaf.

 [4] REFUTED (90%) [Risk: B]
-No evidence for complete prevention. Claim is significantly overstated.
-{{/code}}
+No evidence for complete prevention. Claim is significantly overstated.{{/code}}

 **Risk Tier Display:**
+
 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
 * **Tier C (Green):** Low Risk - Facts/Definitions/History

 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.

----
-
 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===

-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
 **Tone:** Neutral (article's position, not FactHarbor's analysis)

 **Example:**
-{{code}}
-ARTICLE SUMMARY:
+{{code}}ARTICLE SUMMARY:

 Health News Today article discusses coffee benefits, citing studies
 on diabetes and Alzheimer's. Author highlights research linking coffee
-to disease prevention. Recommends 2-3 cups daily for optimal health.
-{{/code}}
+to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}

----
+=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===

-=== 2.5 Total Output Size ===
+**What:** LLM usage metrics for cost optimization and scaling decisions

-**Combined:** ~200-300 words
-* Analysis Summary: 50-70 words
+**Purpose:**
+
+* Understand cost per analysis
+* Identify optimization opportunities
+* Project costs at scale
+* Inform architecture decisions
+
+**Display Format:**
+{{code}}USAGE STATISTICS:
+• Article: 2,450 words (12,300 characters)
+• Input tokens: 15,234
+• Output tokens: 892
+• Total tokens: 16,126
+• Estimated cost: $0.24 USD
+• Response time: 8.3 seconds
+• Cost per claim: $0.048
+• Model: claude-sonnet-4-20250514{{/code}}
+
+**Why This Matters:**
+
+At scale, LLM costs are critical:
+
+* 10,000 articles/month ≈ $200-500/month
+* 100,000 articles/month ≈ $2,000-5,000/month
+* Cost optimization can reduce expenses 30-50%
+
+**What POC1 Learns:**
+
+* How cost scales with article length
+* Prompt optimization opportunities (caching, compression)
+* Output verbosity tradeoffs
+* Model selection strategy (FAST vs. REASONING roles)
+* Article length limits (if needed)
+
+**Implementation:**
+
+* Claude API already returns usage data
+* No extra API calls needed
+* Display to user + log for aggregate analysis
+* Test with articles of varying lengths
+
+**Critical for GO/NO-GO:** Unit economics must be viable at scale!
+
+=== 2.6 Total Output Size ===
+
+**Combined:** 220-350 words
+
+* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
 * Claims Identification: 30-50 words
 * Claims Verdicts: 100-150 words
 * Article Summary: 30-50 words (optional)

----
+**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.

 == 3. What's NOT in POC Scope ==

@@ -176,6 +176,7 @@
 The following are **explicitly excluded** from POC:

 **Content Features:**
+
 * ❌ Scenarios (deferred to POC2)
 * ❌ Evidence display (supporting/opposing lists)
 * ❌ Source links (clickable references)
@@ -185,6 +185,7 @@
 * ❌ Risk assessment (shown but not workflow-integrated)

 **Platform Features:**
+
 * ❌ User accounts / authentication
 * ❌ Saved history
 * ❌ Search functionality
@@ -194,6 +194,7 @@
 * ❌ Social sharing

 **Technical Features:**
+
 * ❌ Browser extensions
 * ❌ Mobile apps
 * ❌ API endpoints
@@ -201,6 +201,7 @@
 * ❌ Export features (PDF, CSV)

 **Quality Features:**
+
 * ❌ Accessibility (WCAG compliance)
 * ❌ Multilingual support
 * ❌ Mobile optimization
@@ -207,6 +207,7 @@
 * ❌ Media verification (images/videos)

 **Production Features:**
+
 * ❌ Security hardening
 * ❌ Privacy compliance (GDPR)
 * ❌ Terms of service
@@ -215,24 +215,18 @@
 * ❌ Analytics
 * ❌ A/B testing

----
-
 == 4. POC Simplifications vs. Full System ==

 === 4.1 Architecture Comparison ===

 **POC Architecture (Simplified):**
-{{code}}
-User Input → Single AKEL Call → Output Display
- (all processing)
-{{/code}}
+{{code}}User Input → Single AKEL Call → Output Display
+ (all processing){{/code}}

 **Full System Architecture:**
-{{code}}
-User Input → Claim Extractor → Claim Classifier → Scenario Generator
+{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
 → Evidence Summarizer → Contradiction Detector → Verdict Generator
-→ Quality Gates → Publication → Output Display
-{{/code}}
+→ Quality Gates → Publication → Output Display{{/code}}

 **Key Differences:**

@@ -245,17 +245,17 @@
 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components

----
-
 === 4.2 Workflow Comparison ===

 **POC1 Workflow:**
+
 1. User submits text/URL
 2. Single AKEL call (all processing in one prompt)
 3. Display results
-**Total: 3 steps, ~10-18 seconds**
+**Total: 3 steps, 10-18 seconds**

 **Full System Workflow:**
+
 1. **Claim Submission** (extraction, normalization, clustering)
 2. **Scenario Building** (definitions, assumptions, boundaries)
 3. **Evidence Handling** (retrieval, assessment, linking)
@@ -262,10 +262,8 @@
 4. **Verdict Creation** (synthesis, reasoning, approval)
 5. **Public Presentation** (summaries, landscapes, deep dives)
 6. **Time Evolution** (versioning, re-evaluation triggers)
-**Total: 6 phases with quality gates, ~10-30 seconds**
+**Total: 6 phases with quality gates, 10-30 seconds**

----
-
 === 4.3 Why POC is Simplified ===

 **Engineering Rationale:**
@@ -284,11 +284,10 @@
 * ❌ POC doesn't validate scale (test in Beta)
 * ❌ POC doesn't validate scenario architecture (design in POC2)

----
-
 === 4.4 Gap Between POC1 and POC2/Beta ===

 **What needs to be built for POC2:**
+
 * Scenario generation component
 * Evidence Model structure (full)
 * Scenario-evidence linking
@@ -296,6 +296,7 @@
 * Truth landscape visualization

 **What needs to be built for Beta:**
+
 * Multi-component AKEL pipeline
 * Quality gate infrastructure
 * Review workflow system
@@ -305,8 +305,6 @@

 **POC1 → POC2 is significant architectural expansion.**

----
-
 == 5. Publication Mode & Labeling ==

 === 5.1 POC Publication Mode ===
@@ -314,6 +314,7 @@
 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)

 Per FactHarbor Specification Section 11 "POC v1 Behavior":
+
 * Produces public AI-generated output
 * No human approval gate
 * Clear AI-Generated labeling
@@ -320,35 +320,31 @@
 * All quality gates active (simplified)
 * Risk tier classification shown (demo)

----
-
 === 5.2 User-Facing Labels ===

 **Primary Label (top of analysis):**
-{{code}}
-╔════════════════════════════════════════════════════════════╗
-║ [AI-GENERATED - POC/DEMO] ║
-║ ║
-║ This analysis was produced entirely by AI and has not ║
-║ been human-reviewed. Use for demonstration purposes. ║
-║ ║
-║ Source: AI/AKEL v1.0 (POC) ║
-║ Review Status: Not Reviewed (Proof-of-Concept) ║
-║ Quality Gates: 4/4 Passed (Simplified) ║
-║ Last Updated: [timestamp] ║
-╚════════════════════════════════════════════════════════════╝
-{{/code}}
+{{code}}╔════════════════════════════════════════════════════════════╗
+║ [AI-GENERATED - POC/DEMO] ║
+║ ║
+║ This analysis was produced entirely by AI and has not ║
+║ been human-reviewed. Use for demonstration purposes. ║
+║ ║
+║ Source: AI/AKEL v1.0 (POC) ║
+║ Review Status: Not Reviewed (Proof-of-Concept) ║
+║ Quality Gates: 4/4 Passed (Simplified) ║
+║ Last Updated: [timestamp] ║
+╚════════════════════════════════════════════════════════════╝{{/code}}

 **Per-Claim Risk Labels:**
+
 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)

----
-
 === 5.3 Display Requirements ===

 **Must Show:**
+
 * AI-Generated status (prominent)
 * POC/Demo disclaimer
 * Risk tier per claim
@@ -357,6 +357,7 @@
 * Timestamp

 **Must NOT Claim:**
+
 * Human review
 * Production quality
 * Medical/legal advice
@@ -363,8 +363,6 @@
 * Authoritative verdicts
 * Complete accuracy

----
-
 === 5.4 Mode 2 vs. Full System Publication ===

 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
@@ -375,8 +375,6 @@
 |Risk Display|Demo only|Workflow-integrated|Validated
 |User Actions|View only|Flag for review|Trust rating

----
-
 == 6. Quality Gates (Simplified Implementation) ==

 === 6.1 Overview ===
@@ -384,6 +384,7 @@
 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.

 **Full System Has 4 Gates:**
+
 1. Source Quality
 2. Contradiction Search (MANDATORY)
 3. Uncertainty Quantification
@@ -390,16 +390,16 @@
 4. Structural Integrity

 **POC Implements Simplified Versions:**
+
 * Focus on demonstrating concept
 * Basic implementations sufficient
 * Failures displayed to user (not blocking)
 * Full system has comprehensive validation

----
-
 === 6.2 Gate 1: Source Quality (Basic) ===

 **Full System Requirements:**
+
 * Primary sources identified and accessible
 * Source reliability scored against whitelist
 * Citation completeness verified
@@ -407,6 +407,7 @@
 * Author credentials validated

 **POC Implementation:**
+
 * ✅ At least 2 sources found
 * ✅ Sources accessible (URLs valid)
 * ❌ No whitelist checking
@@ -417,11 +417,10 @@

 **Failure Handling:** Display error message, don't generate verdict

----
-
 === 6.3 Gate 2: Contradiction Search (Basic) ===

 **Full System Requirements:**
+
 * Counter-evidence actively searched
 * Reservations and limitations identified
 * Alternative interpretations explored
@@ -430,6 +430,7 @@
 * Academic literature (supporting AND opposing)

 **POC Implementation:**
+
 * ✅ Basic search for counter-evidence
 * ✅ Identify obvious contradictions
 * ❌ No comprehensive academic search
@@ -441,11 +441,10 @@

 **Failure Handling:** Note "limited contradiction search" in output

----
-
 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===

 **Full System Requirements:**
+
 * Confidence scores calculated for all claims/verdicts
 * Limitations explicitly stated
 * Data gaps identified and disclosed
@@ -453,6 +453,7 @@
 * Alternative scenarios considered

 **POC Implementation:**
+
 * ✅ Confidence scores (0-100%)
 * ✅ Basic uncertainty acknowledgment
 * ❌ No detailed limitation disclosure
@@ -463,11 +463,10 @@

 **Failure Handling:** Show "Confidence: Unknown" if calculation fails

----
-
 === 6.5 Gate 4: Structural Integrity (Basic) ===

 **Full System Requirements:**
+
 * No hallucinations detected (fact-checking against sources)
 * Logic chain valid and traceable
 * References accessible and verifiable
@@ -475,6 +475,7 @@
 * Premises clearly stated

 **POC Implementation:**
+
 * ✅ Basic coherence check
 * ✅ References accessible
 * ❌ No comprehensive hallucination detection
@@ -485,32 +485,24 @@

 **Failure Handling:** Display error message

----
-
 === 6.6 Quality Gate Display ===

 **POC shows simplified status:**
-{{code}}
-Quality Gates: 4/4 Passed (Simplified)
+{{code}}Quality Gates: 4/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✓ Contradiction Search: Basic search completed
 ✓ Uncertainty: Confidence scores assigned
-✓ Structural Integrity: Output coherent
-{{/code}}
+✓ Structural Integrity: Output coherent{{/code}}

 **If any gate fails:**
-{{code}}
-Quality Gates: 3/4 Passed (Simplified)
+{{code}}Quality Gates: 3/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✗ Contradiction Search: Search failed - limited evidence
 ✓ Uncertainty: Confidence scores assigned
 ✓ Structural Integrity: Output coherent

-Note: This analysis has limited evidence. Use with caution.
-{{/code}}
+Note: This analysis has limited evidence. Use with caution.{{/code}}

----
-
 === 6.7 Simplified vs. Full System ===

 |=Gate|=POC (Simplified)|=Full System
@@ -521,14 +521,13 @@

 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.

----
-
 == 7. AKEL Architecture Comparison ==

 === 7.1 POC AKEL (Simplified) ===

 **Implementation:**
-* Single Claude API call (Sonnet 4.5)
+
+* Single provider API call (REASONING model)
 * One comprehensive prompt
 * All processing in single request
 * No separate components
@@ -535,31 +535,26 @@
 * No orchestration layer

 **Prompt Structure:**
-{{code}}
-Task: Analyze this article and provide:
+{{code}}Task: Analyze this article and provide:

 1. Extract 3-5 factual claims
 2. For each claim:
- - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
- - Assign confidence score (0-100%)
- - Assign risk tier (A/B/C)
- - Write brief reasoning (1-3 sentences)
+ - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+ - Assign confidence score (0-100%)
+ - Assign risk tier (A/B/C)
+ - Write brief reasoning (1-3 sentences)
 3. Generate analysis summary (3-5 sentences)
 4. Generate article summary (3-5 sentences)
 5. Run basic quality checks

-Return as structured JSON.
-{{/code}}
+Return as structured JSON.{{/code}}

 **Processing Time:** 10-18 seconds (estimate)

----
-
 === 7.2 Full System AKEL (Production) ===

 **Architecture:**
-{{code}}
-AKEL Orchestrator
+{{code}}AKEL Orchestrator
 ├── Claim Extractor
 ├── Claim Classifier (with risk tier assignment)
 ├── Scenario Generator
@@ -567,10 +567,10 @@
 ├── Contradiction Detector
 ├── Quality Gate Validator
 ├── Audit Sampling Scheduler
-└── Federation Sync Adapter (Release 1.0+)
-{{/code}}
+└── Federation Sync Adapter (Release 1.0+){{/code}}

 **Processing:**
+
 * Parallel processing where possible
 * Separate component calls
 * Quality gates between phases
@@ -579,11 +579,10 @@

 **Processing Time:** 10-30 seconds (full pipeline)

----
-
 === 7.3 Why POC Uses Single Call ===

 **Advantages:**
+
 * ✅ Simpler to implement
 * ✅ Faster POC development
 * ✅ Easier to debug
@@ -591,6 +591,7 @@
 * ✅ Good enough for concept validation

 **Limitations:**
+
 * ❌ No component reusability
 * ❌ No parallel processing
 * ❌ All-or-nothing (can't partially succeed)
@@ -603,8 +603,6 @@

 Full component architecture comes in Beta after POC validates concept.

----
-
 === 7.4 Evolution Path ===

 **POC1:** Single prompt → Prove concept
@@ -612,8 +612,6 @@
 **Beta:** Multi-component AKEL → Production architecture
 **Release 1.0:** Full AKEL + Federation → Scale

----
-
 == 8. Functional Requirements ==

 === FR-POC-1: Article Input ===
@@ -621,6 +621,7 @@
 **Requirement:** User can submit article for analysis

 **Functionality:**
+
 * Text input field (paste article text, up to 5000 characters)
 * URL input field (paste article URL)
 * "Analyze" button to trigger processing
@@ -627,6 +627,7 @@
 * Loading indicator during analysis

 **Excluded:**
+
 * No user authentication
 * No claim history
 * No search functionality
@@ -633,17 +633,17 @@
 * No saved templates

 **Acceptance Criteria:**
+
 * User can paste text from article
 * User can paste URL of article
 * System accepts input and triggers analysis

----
-
 === FR-POC-2: Claim Extraction (Fully Automated) ===

 **Requirement:** AI automatically extracts 3-5 factual claims

 **Functionality:**
+
 * AI reads article text
 * AI identifies factual claims (not opinions/questions)
 * AI extracts 3-5 most important claims
@@ -650,6 +650,7 @@
 * System displays numbered list

 **Critical:** NO MANUAL EDITING ALLOWED
+
 * AI selects which claims to extract
 * AI identifies factual vs. non-factual
 * System processes claims as extracted
@@ -656,32 +656,34 @@
 * No human curation or correction

 **Error Handling:**
+
 * If extraction fails: Display error message
 * User can retry with different input
 * No manual intervention to fix extraction

 **Acceptance Criteria:**
+
 * AI extracts 3-5 claims automatically
 * Claims are factual (not opinions)
 * Claims are clearly stated
 * No manual editing required

----
-
 === FR-POC-3: Verdict Generation (Fully Automated) ===

 **Requirement:** AI automatically generates verdict for each claim

 **Functionality:**
+
 * For each claim, AI:
- * Evaluates claim based on available evidence/knowledge
- * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
- * Assigns confidence score (0-100%)
- * Assigns risk tier (A/B/C)
- * Writes brief reasoning (1-3 sentences)
+* Evaluates claim based on available evidence/knowledge
+* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+* Assigns confidence score (0-100%)
+* Assigns risk tier (A/B/C)
+* Writes brief reasoning (1-3 sentences)
 * System displays verdict for each claim

 **Critical:** NO MANUAL EDITING ALLOWED
+
 * AI computes verdicts based on evidence
 * AI generates confidence scores
 * AI writes reasoning
@@ -688,11 +688,13 @@
 * No human review or adjustment

 **Error Handling:**
+
 * If verdict generation fails: Display error message
 * User can retry
 * No manual intervention to adjust verdicts

 **Acceptance Criteria:**
+
 * Each claim has a verdict
 * Confidence score is displayed (0-100%)
 * Risk tier is displayed (A/B/C)
@@ -700,34 +700,33 @@
 * Verdict is defensible given reasoning
 * All generated automatically by AI

----
-
 === FR-POC-4: Analysis Summary (Fully Automated) ===

 **Requirement:** AI generates brief summary of analysis

 **Functionality:**
+
 * AI summarizes findings in 3-5 sentences:
- * How many claims found
- * Distribution of verdicts
- * Overall assessment
+* How many claims found
+* Distribution of verdicts
+* Overall assessment
 * System displays at top of results

 **Critical:** NO MANUAL EDITING ALLOWED

 **Acceptance Criteria:**
+
 * Summary is coherent
 * Accurately reflects analysis
 * 3-5 sentences
 * Automatically generated

----
-
 === FR-POC-5: Article Summary (Fully Automated, Optional) ===

 **Requirement:** AI generates brief summary of original article

 **Functionality:**
+
 * AI summarizes article content (not FactHarbor's analysis)
 * 3-5 sentences
 * System displays
@@ -737,18 +737,18 @@
 **Critical:** NO MANUAL EDITING ALLOWED

 **Acceptance Criteria:**
+
 * Summary is neutral (article's position)
 * Accurately reflects article content
 * 3-5 sentences
 * Automatically generated

----
-
 === FR-POC-6: Publication Mode Display ===

 **Requirement:** Clear labeling of AI-generated content

 **Functionality:**
+
 * Display Mode 2 publication label
 * Show POC/Demo disclaimer
 * Display risk tiers per claim
@@ -756,18 +756,18 @@
 * Display timestamp

 **Acceptance Criteria:**
+
 * Label is prominent and clear
 * User understands this is AI-generated POC output
 * Risk tiers are color-coded
 * Quality gate status is visible

----
-
 === FR-POC-7: Quality Gate Execution ===

 **Requirement:** Execute simplified quality gates

 **Functionality:**
+
 * Check source quality (basic)
 * Attempt contradiction search (basic)
 * Calculate confidence scores
@@ -775,13 +775,12 @@
 * Display gate results

 **Acceptance Criteria:**
+
 * All 4 gates attempted
 * Pass/fail status displayed
 * Failures explained to user
 * Gates don't block publication (POC mode)

----
-
 == 9. Non-Functional Requirements ==

 === NFR-POC-1: Fully Automated Processing ===
@@ -791,6 +791,7 @@
 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE

 **What this means:**
+
 * Claims: AI selects (no human curation)
 * Scenarios: N/A (deferred to POC2)
 * Evidence: AI evaluates (no human selection)
@@ -798,13 +798,12 @@
 * Summaries: AI writes (no human editing)

 **Pipeline:**
-{{code}}
-User Input → AKEL Processing → Output Display
- ↓
- ZERO human editing
-{{/code}}
+{{code}}User Input → AKEL Processing → Output Display
+ ↓
+ ZERO human editing{{/code}}

 **If AI output is poor:**
+
 * ❌ Do NOT manually fix it
 * ✅ Document the failure
 * ✅ Improve prompts and retry
@@ -811,59 +811,61 @@ 811 811 * ✅ Accept that POC might fail 812 812 813 813 **Why this matters:** 869 + 814 814 * Tests whether AI can do this without humans 815 815 * Validates scalability (humans can't review every analysis) 816 816 * Honest test of technical feasibility 817 817 818 ---- 819 - 820 820 === NFR-POC-2: Performance === 821 821 822 822 **Requirement:** Analysis completes in reasonable time 823 823 824 824 **Acceptable Performance:** 879 + 825 825 * Processing time: 1-5 minutes (acceptable for POC) 826 826 * Display loading indicator to user 827 827 * Show progress if possible ("Extracting claims...", "Generating verdicts...") 828 828 829 829 **Not Required:** 885 + 830 830 * Production-level speed (< 30 seconds) 831 831 * Optimization for scale 832 832 * Caching 833 833 834 834 **Acceptance Criteria:** 891 + 835 835 * Analysis completes within 5 minutes 836 836 * User sees loading indicator 837 837 * No timeout errors 838 838 839 ---- 840 - 841 841 === NFR-POC-3: Reliability === 842 842 843 843 **Requirement:** System works for manual testing sessions 844 844 845 845 **Acceptable:** 901 + 846 846 * Occasional errors (< 20% failure rate) 847 847 * Manual restart if needed 848 848 * Display error messages clearly 849 849 850 850 **Not Required:** 907 + 851 851 * 99.9% uptime 852 852 * Automatic error recovery 853 853 * Production monitoring 854 854 855 855 **Acceptance Criteria:** 913 + 856 856 * System works for test demonstrations 857 857 * Errors are handled gracefully 858 858 * User receives clear error messages 859 859 860 ---- 861 - 862 862 === NFR-POC-4: Environment === 863 863 864 864 **Requirement:** Runs on simple infrastructure 865 865 866 866 **Acceptable:** 923 + 867 867 * Single machine or simple cloud setup 868 868 * No distributed architecture 869 869 * No load balancing ... ... 
@@ -871,125 +871,196 @@ 871 871 * Local development environment viable 872 872 873 873 **Not Required:** 931 + 874 874 * Production infrastructure 875 875 * Multi-region deployment 876 876 * Auto-scaling 877 877 * Disaster recovery 878 878 879 --- -937 +=== NFR-POC-5: Cost Efficiency Tracking === 880 880 939 +**Requirement:** Track and display LLM usage metrics to inform optimization decisions 940 + 941 +**Must Track:** 942 + 943 +* Input tokens (article + prompt) 944 +* Output tokens (generated analysis) 945 +* Total tokens 946 +* Estimated cost (USD) 947 +* Response time (seconds) 948 +* Article length (words/characters) 949 + 950 +**Must Display:** 951 + 952 +* Usage statistics in UI (Component 5) 953 +* Cost per analysis 954 +* Cost per claim extracted 955 + 956 +**Must Log:** 957 + 958 +* Aggregate metrics for analysis 959 +* Cost distribution by article length 960 +* Token efficiency trends 961 + 962 +**Purpose:** 963 + 964 +* Understand unit economics 965 +* Identify optimization opportunities 966 +* Project costs at scale 967 +* Inform architecture decisions (caching, model selection, etc.) 968 + 969 +**Acceptance Criteria:** 970 + 971 +* ✅ Usage data displayed after each analysis 972 +* ✅ Metrics logged for aggregate analysis 973 +* ✅ Cost calculated accurately (Claude API pricing) 974 +* ✅ Test cases include varying article lengths 975 +* ✅ POC1 report includes cost analysis section 976 + 977 +**Success Target:** 978 + 979 +* Average cost per analysis < $0.05 USD 980 +* Cost scaling behavior understood (linear/exponential) 981 +* 2+ optimization opportunities identified 982 + 983 +**Critical:** Unit economics must be viable for scaling decision! 984 + 881 881 == 10. 
Technical Architecture == 882 882 883 883 === 10.1 System Components === 884 884 885 885 **Frontend:** 990 + 886 886 * Simple HTML form (text input + URL input + button) 887 887 * Loading indicator 888 888 * Results display page (single page, no tabs/navigation) 889 889 890 890 **Backend:** 996 + 891 891 * Single API endpoint 892 -* Calls Claude API (Sonnet 4.5 or latest) 998 +* Calls provider API (REASONING model; configured via LLM abstraction) 893 893 * Parses response 894 894 * Returns JSON to frontend 895 895 896 896 **Data Storage:** 1003 + 897 897 * None required (stateless POC) 898 898 * Optional: Simple file storage or SQLite for demo examples 899 899 900 900 **External Services:** 1008 + 901 901 * Claude API (Anthropic) - required 902 902 * Optional: URL fetch service for article text extraction 903 903 904 ---- 905 - 906 906 === 10.2 Processing Flow === 907 907 908 908 {{code}} 909 909 1. User submits text or URL 910 - ↓ 1016 + ↓ 911 911 2. Backend receives request 912 - ↓ 1018 + ↓ 913 913 3. If URL: Fetch article text 914 - ↓ 1020 + ↓ 915 915 4. Call Claude API with single prompt: 916 - "Extract claims, evaluate each, provide verdicts" 917 - ↓ 1022 + "Extract claims, evaluate each, provide verdicts" 1023 + ↓ 918 918 5. Claude API returns: 919 - - Analysis summary 920 - - Claims list 921 - - Verdicts for each claim (with risk tiers) 922 - - Article summary (optional) 923 - - Quality gate results 924 - ↓ 1025 + - Analysis summary 1026 + - Claims list 1027 + - Verdicts for each claim (with risk tiers) 1028 + - Article summary (optional) 1029 + - Quality gate results 1030 + ↓ 925 925 6. Backend parses response 926 - ↓ 1032 + ↓ 927 927 7.
Frontend displays results with Mode 2 labeling 928 928 {{/code}} 929 929 930 930 **Key Simplification:** Single API call does entire analysis 931 931 932 ---- 933 - 934 934 === 10.3 AI Prompt Strategy === 935 935 936 936 **Single Comprehensive Prompt:** 937 -{{code}} 938 -Task: Analyze this article and provide: 1041 +{{code}}Task: Analyze this article and provide: 939 939 940 -1. Extract 3-5 factual claims from the article 941 -2. For each claim: 942 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 943 - - Assign confidence score (0-100%) 944 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 945 - - Write brief reasoning (1-3 sentences) 946 -3. Run quality gates: 947 - - Check: ≥2 sources found 948 - - Attempt: Basic contradiction search 949 - - Calculate: Confidence scores 950 - - Verify: Structural integrity 951 -4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment) 952 -5. Write article summary (3-5 sentences: neutral summary of article content) 1043 +1. Identify the article's main thesis/conclusion 1044 + - What is the article trying to argue or prove? 1045 + - What is the primary claim or conclusion? 953 953 954 -Return as structured JSON with quality gate results. 955 -{{/code}} 1047 +2. Extract 3-5 factual claims from the article 1048 + - Note which claims are CENTRAL to the main thesis 1049 + - Note which claims are SUPPORTING facts 956 956 1051 +3. For each claim: 1052 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 1053 + - Assign confidence score (0-100%) 1054 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 1055 + - Write brief reasoning (1-3 sentences) 1056 + 1057 +4. Assess relationship between claims and main thesis: 1058 + - Do the claims actually support the article's conclusion? 1059 + - Are there logical leaps or unsupported inferences? 
1060 + - Is the article's framing misleading even if individual facts are accurate? 1061 + 1062 +5. Run quality gates: 1063 + - Check: ≥2 sources found 1064 + - Attempt: Basic contradiction search 1065 + - Calculate: Confidence scores 1066 + - Verify: Structural integrity 1067 + 1068 +6. Write context-aware analysis summary (4-6 sentences): 1069 + - State article's main thesis 1070 + - Report claims found and verdict distribution 1071 + - Note if central claims are problematic 1072 + - Assess whether evidence supports conclusion 1073 + - Overall credibility considering claim importance 1074 + 1075 +7. Write article summary (3-5 sentences: neutral summary of article content) 1076 + 1077 +Return as structured JSON with quality gate results.{{/code}} 1078 + 957 957 **One prompt generates everything.** 958 958 959 - --- 1081 +**Critical Addition:** 960 960 1083 +Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" and a "genuinely credible article."
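Because the prompt above asks for structured JSON, the backend's parsing step can also check that each claim carries one of the four verdict labels, a 0-100 confidence score, and a risk tier of A, B or C. The following is a minimal Python sketch of such a check. The field names (`claims`, `text`, `verdict`, `confidence`, `risk_tier`, `reasoning`, `central`) are assumptions made for illustration; they are not taken from the POC1 API Specification.

```python
import json
from dataclasses import dataclass

# Verdict labels and risk tiers are the ones named in the prompt.
# The JSON field names used below are hypothetical.
VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


@dataclass
class Claim:
    text: str
    verdict: str
    confidence: int   # percent, 0-100
    risk_tier: str    # "A", "B" or "C"
    reasoning: str
    central: bool     # True if the claim is CENTRAL to the main thesis


def parse_analysis(raw: str) -> list[Claim]:
    """Parse the model's JSON reply and reject structurally invalid claims."""
    data = json.loads(raw)
    claims = []
    for item in data["claims"]:
        claim = Claim(
            text=item["text"],
            verdict=item["verdict"],
            confidence=int(item["confidence"]),
            risk_tier=item["risk_tier"],
            reasoning=item["reasoning"],
            central=bool(item.get("central", False)),
        )
        if claim.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {claim.verdict!r}")
        if claim.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {claim.risk_tier!r}")
        if not 0 <= claim.confidence <= 100:
            raise ValueError(f"confidence out of range: {claim.confidence}")
        claims.append(claim)
    return claims
```

In keeping with NFR-POC-1, a reply that fails this check would be recorded as a failure and used to improve the prompt, not corrected by hand.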
1084 + 961 961 === 10.4 Technology Stack Suggestions === 962 962 963 963 **Frontend:** 1088 + 964 964 * HTML + CSS + JavaScript (minimal framework) 965 965 * OR: Next.js (if team prefers) 966 966 * Hosted: Local machine OR Vercel/Netlify free tier 967 967 968 968 **Backend:** 1094 + 969 969 * Python Flask/FastAPI (simple REST API) 970 970 * OR: Next.js API routes (if using Next.js) 971 971 * Hosted: Local machine OR Railway/Render free tier 972 972 973 973 **AKEL Integration:** 1100 + 974 974 * Claude API via Anthropic SDK 975 -* Model: Claude Sonnet 4.5 or latest available 1102 +* Model: Provider-default REASONING model or latest available 976 976 977 977 **Database:** 1105 + 978 978 * None (stateless acceptable) 979 979 * OR: SQLite if want to store demo examples 980 980 * OR: JSON files on disk 981 981 982 982 **Deployment:** 1111 + 983 983 * Local development environment sufficient for POC 984 984 * Optional: Deploy to cloud for remote demos 985 985 986 ---- 987 - 988 988 == 11. Success Criteria == 989 989 990 990 === 11.1 Minimum Success (POC Passes) === 991 991 992 992 **Required for GO decision:** 1120 + 993 993 * ✅ AI extracts 3-5 factual claims automatically 994 994 * ✅ AI provides verdict for each claim automatically 995 995 * ✅ Verdicts are reasonable (≥70% make logical sense) ... ... @@ -998,17 +998,20 @@ 998 998 * ✅ Team/advisors understand the output 999 999 * ✅ Team agrees approach has merit 1000 1000 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention) 1129 +* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target) 1130 +* ✅ **Cost scaling understood** (data collected on article length vs.
cost) 1131 +* ✅ **Optimization opportunities identified** (≥2 potential improvements documented) 1001 1001 1002 1002 **Quality Definition:** 1134 + 1003 1003 * "Reasonable verdict" = Defensible given general knowledge 1004 1004 * "Coherent summary" = Logically structured, grammatically correct 1005 1005 * "Comprehensible" = Reviewers understand what analysis means 1006 1006 1007 ---- 1008 - 1009 1009 === 11.2 POC Fails If === 1010 1010 1011 1011 **Automatic NO-GO if any of these:** 1142 + 1012 1012 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1013 1013 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1014 1014 * ❌ Output incomprehensible (reviewers can't understand analysis) ... ... @@ -1015,21 +1015,20 @@ 1015 1015 * ❌ **Requires manual editing for most analyses** (> 50% need human correction) 1016 1016 * ❌ Team loses confidence in AI-automated approach 1017 1017 1018 ---- 1019 - 1020 1020 === 11.3 Quality Thresholds === 1021 1021 1022 1022 **POC quality expectations:** 1023 1023 1024 1024 |=Component|=Quality Threshold|=Definition 1025 -|Claim Extraction|(% class="success" %)≥70% accuracy (%%)|Identifies obvious factual claims, may miss some edge cases 1026 -|Verdict Logic|(% class="success" %)≥70% defensible (%%)|Verdicts are logical given reasoning provided 1027 -|Reasoning Clarity|(% class="success" %)≥70% clear (%%)|1-3 sentences are understandable and relevant 1028 -|Overall Analysis|(% class="success" %)≥70% useful (%%)|Output helps user understand article claims 1154 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases 1155 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided 1156 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant 1157 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims 1029 1029
1030 1030 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet 1031 1031 1032 1032 **Not expecting:** 1162 + 1033 1033 * 100% accuracy 1034 1034 * Perfect claim coverage 1035 1035 * Comprehensive evidence gathering ... ... @@ -1037,13 +1037,12 @@ 1037 1037 * Production polish 1038 1038 1039 1039 **Expecting:** 1170 + 1040 1040 * Reasonable claim extraction 1041 1041 * Defensible verdicts 1042 1042 * Understandable reasoning 1043 1043 * Useful output 1044 1044 1045 ---- 1046 - 1047 1047 == 12. Test Cases == 1048 1048 1049 1049 === 12.1 Test Case 1: Simple Factual Claim === ... ... @@ -1051,6 +1051,7 @@ 1051 1051 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 1052 1052 1053 1053 **Expected Output:** 1183 + 1054 1054 * Extract claim correctly 1055 1055 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1056 1056 * Confidence: 70-90% ... ... @@ -1059,13 +1059,12 @@ 1059 1059 1060 1060 **Success:** Verdict is reasonable and reasoning makes sense 1061 1061 1062 ---- 1063 - 1064 1064 === 12.2 Test Case 2: Complex News Article === 1065 1065 1066 1066 **Input:** News article URL with multiple claims about politics/health/science 1067 1067 1068 1068 **Expected Output:** 1197 + 1069 1069 * Extract 3-5 key claims 1070 1070 * Verdict for each (may vary: some supported, some uncertain, some refuted) 1071 1071 * Coherent analysis summary ... ... @@ -1074,13 +1074,12 @@ 1074 1074 1075 1075 **Success:** Claims identified are actually from article, verdicts are reasonable 1076 1076 1077 ---- 1078 - 1079 1079 === 12.3 Test Case 3: Controversial Topic === 1080 1080 1081 1081 **Input:** Article on contested political or scientific topic 1082 1082 1083 1083 **Expected Output:** 1211 + 1084 1084 * Balanced analysis 1085 1085 * Acknowledges uncertainty where appropriate 1086 1086 * Doesn't overstate confidence ... ... 
@@ -1088,13 +1088,12 @@ 1088 1088 1089 1089 **Success:** Analysis is fair and doesn't show obvious bias 1090 1090 1091 ---- 1092 - 1093 1093 === 12.4 Test Case 4: Clearly False Claim === 1094 1094 1095 1095 **Input:** Article with obviously false claim (e.g., "The Earth is flat") 1096 1096 1097 1097 **Expected Output:** 1224 + 1098 1098 * Extract claim 1099 1099 * Verdict: REFUTED 1100 1100 * High confidence (> 90%) ... ... @@ -1103,13 +1103,12 @@ 1103 1103 1104 1104 **Success:** AI correctly identifies false claim with high confidence 1105 1105 1106 ---- 1107 - 1108 1108 === 12.5 Test Case 5: Genuinely Uncertain Claim === 1109 1109 1110 1110 **Input:** Article with claim where evidence is genuinely mixed 1111 1111 1112 1112 **Expected Output:** 1238 + 1113 1113 * Extract claim 1114 1114 * Verdict: UNCERTAIN 1115 1115 * Moderate confidence (40-60%) ... ... @@ -1117,13 +1117,12 @@ 1117 1117 1118 1118 **Success:** AI recognizes uncertainty and doesn't overstate confidence 1119 1119 1120 ---- 1121 - 1122 1122 === 12.6 Test Case 6: High-Risk Medical Claim === 1123 1123 1124 1124 **Input:** Article making medical claims 1125 1125 1126 1126 **Expected Output:** 1251 + 1127 1127 * Extract claim 1128 1128 * Verdict: [appropriate based on evidence] 1129 1129 * Risk tier: A (High - medical) ... ... @@ -1132,8 +1132,6 @@ 1132 1132 1133 1133 **Success:** Risk tier correctly assigned, appropriate warnings shown 1134 1134 1135 ---- 1136 - 1137 1137 == 13. POC Decision Gate == 1138 1138 1139 1139 === 13.1 Decision Framework === ... ... @@ -1143,6 +1143,7 @@ 1143 1143 **Option A: GO (Proceed to POC2)** 1144 1144 1145 1145 **Conditions:** 1269 + 1146 1146 * AI quality ≥70% without manual editing 1147 1147 * Basic claim → verdict pipeline validated 1148 1148 * Internal + advisor feedback positive ... ... 
@@ -1151,16 +1151,16 @@ 1151 1151 * Clear path to improving AI quality to ≥90% 1152 1152 1153 1153 **Next Steps:** 1278 + 1154 1154 * Plan POC2 development (add scenarios) 1155 1155 * Design scenario architecture 1156 1156 * Expand to Evidence Model structure 1157 1157 * Test with more complex articles 1158 1158 1159 ---- 1160 - 1161 1161 **Option B: NO-GO (Pivot or Stop)** 1162 1162 1163 1163 **Conditions:** 1287 + 1164 1164 * AI quality < 60% 1165 1165 * Requires manual editing for most analyses (> 50%) 1166 1166 * Feedback indicates fundamental flaws ... ... @@ -1168,14 +1168,14 @@ 1168 1168 * No clear path to improvement 1169 1169 1170 1170 **Next Steps:** 1295 + 1171 1171 * **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1172 1172 * **Stop:** Conclude approach not viable, revisit later 1173 1173 1174 ---- 1175 - 1176 1176 **Option C: ITERATE (Improve POC)** 1177 1177 1178 1178 **Conditions:** 1302 + 1179 1179 * Concept has merit but execution needs work 1180 1180 * Specific improvements identified 1181 1181 * Addressable with better prompts/approach ... ... @@ -1182,46 +1182,43 @@ 1182 1182 * AI quality between 60-70% 1183 1183 1184 1184 **Next Steps:** 1309 + 1185 1185 * Improve AI prompts 1186 1186 * Test different approaches 1187 1187 * Re-run POC with improvements 1188 1188 * Then make GO/NO-GO decision 1189 1189 1190 ---- 1191 - 1192 1192 === 13.2 Decision Criteria Summary === 1193 1193 1194 1194 {{code}} 1195 -AI Quality < 60% → NO-GO (approach doesn't work) 1318 +AI Quality < 60% → NO-GO (approach doesn't work) 1196 1196 AI Quality 60-70% → ITERATE (improve and retry) 1197 -AI Quality ≥70% → GO (proceed to POC2) 1320 +AI Quality ≥70% → GO (proceed to POC2) 1198 1198 {{/code}} 1199 1199 1200 ---- 1201 - 1202 1202 == 14.
Key Risks & Mitigations == 1203 1203 1204 1204 === 14.1 Risk: AI Quality Not Good Enough === 1205 1205 1206 -**Likelihood:** Medium-High 1207 -**Impact:** POC fails 1327 +**Likelihood:** Medium-High 1328 +**Impact:** POC fails 1208 1208 1209 1209 **Mitigation:** 1331 + 1210 1210 * Extensive prompt engineering and testing 1211 -* Use best available AI models (Sonnet 4.5) 1333 +* Use best available AI models (role-based selection; configured via LLM abstraction) 1212 1212 * Test with diverse article types 1213 1213 * Iterate on prompts based on results 1214 1214 1215 1215 **Acceptance:** This is what POC tests - be ready for failure 1216 1216 1217 ---- 1218 - 1219 1219 === 14.2 Risk: AI Consistency Issues === 1220 1220 1221 -**Likelihood:** Medium 1222 -**Impact:** Works sometimes, fails other times 1341 +**Likelihood:** Medium 1342 +**Impact:** Works sometimes, fails other times 1223 1223 1224 1224 **Mitigation:** 1345 + 1225 1225 * Test with 10+ diverse articles 1226 1226 * Measure success rate honestly 1227 1227 * Improve prompts to increase consistency ... ... @@ -1228,14 +1228,13 @@ 1228 1228 1229 1229 **Acceptance:** Some variability OK if average quality ≥70% 1230 1230 1231 ---- 1232 - 1233 1233 === 14.3 Risk: Output Incomprehensible === 1234 1234 1235 -**Likelihood:** Low-Medium 1236 -**Impact:** Users can't understand analysis 1354 +**Likelihood:** Low-Medium 1355 +**Impact:** Users can't understand analysis 1237 1237 1238 1238 **Mitigation:** 1358 + 1239 1239 * Create clear explainer document 1240 1240 * Iterate on output format 1241 1241 * Test with non-technical reviewers ... ...
@@ -1243,14 +1243,13 @@ 1243 1243 1244 1244 **Acceptance:** Iterate until comprehensible 1245 1245 1246 ---- 1247 - 1248 1248 === 14.4 Risk: API Rate Limits / Costs === 1249 1249 1250 -**Likelihood:** Low 1251 -**Impact:** System slow or expensive 1368 +**Likelihood:** Low 1369 +**Impact:** System slow or expensive 1252 1252 1253 1253 **Mitigation:** 1372 + 1254 1254 * Monitor API usage 1255 1255 * Implement retry logic 1256 1256 * Estimate costs before scaling ... ... @@ -1257,14 +1257,13 @@ 1257 1257 1258 1258 **Acceptance:** POC can be slow and expensive (optimization later) 1259 1259 1260 ---- 1261 - 1262 1262 === 14.5 Risk: Scope Creep === 1263 1263 1264 -**Likelihood:** Medium 1265 -**Impact:** POC becomes too complex 1381 +**Likelihood:** Medium 1382 +**Impact:** POC becomes too complex 1266 1266 1267 1267 **Mitigation:** 1385 + 1268 1268 * Strict scope discipline 1269 1269 * Say NO to feature additions 1270 1270 * Keep focus on core question ... ... @@ -1271,18 +1271,19 @@ 1271 1271 1272 1272 **Acceptance:** POC is minimal by design 1273 1273 1274 ---- 1275 - 1276 1276 == 15. POC Philosophy == 1277 1277 1278 1278 === 15.1 Core Principles === 1279 1279 1280 -**1. Build Less, Learn More** 1396 +* 1397 +** 1398 +**1. Build Less, Learn More 1281 1281 * Minimum features to test hypothesis 1282 1282 * Don't build unvalidated features 1283 1283 * Focus on core question only 1284 1284 1285 1285 **2. Fail Fast** 1404 + 1286 1286 * Quick test of hardest part (AI capability) 1287 1287 * Accept that POC might fail 1288 1288 * Better to discover issues early ... ... @@ -1289,45 +1289,45 @@ 1289 1289 * Honest assessment over optimistic hope 1290 1290 1291 1291 **3. Test First, Build Second** 1411 + 1292 1292 * Validate AI can do this before building platform 1293 1293 * Don't assume it will work 1294 1294 * Let results guide decisions 1295 1295 1296 1296 **4. 
Automation First** 1417 + 1297 1297 * No manual editing allowed 1298 1298 * Tests scalability, not just feasibility 1299 1299 * Proves approach can work at scale 1300 1300 1301 1301 **5. Honest Assessment** 1423 + 1302 1302 * Don't cherry-pick examples 1303 1303 * Don't manually fix bad outputs 1304 1304 * Document failures openly 1305 1305 * Make data-driven decisions 1306 1306 1307 ---- 1308 - 1309 1309 === 15.2 What POC Is === 1310 1310 1311 -✅ Testing AI capability without humans 1312 -✅ Proving core technical concept 1313 -✅ Fast validation of approach 1314 -✅ Honest assessment of feasibility 1431 +✅ Testing AI capability without humans 1432 +✅ Proving core technical concept 1433 +✅ Fast validation of approach 1434 +✅ Honest assessment of feasibility 1315 1315 1316 ---- 1317 - 1318 1318 === 15.3 What POC Is NOT === 1319 1319 1320 -❌ Building a product 1321 -❌ Production-ready system 1322 -❌ Feature-complete platform 1323 -❌ Perfectly accurate analysis 1324 -❌ Polished user experience 1438 +❌ Building a product 1439 +❌ Production-ready system 1440 +❌ Feature-complete platform 1441 +❌ Perfectly accurate analysis 1442 +❌ Polished user experience 1325 1325 1326 - --- 1444 +== 16. Success == 1327 1327 1328 -== 16. Success = Clear Path Forward == 1446 + Clear Path Forward == 1329 1329 1330 1330 **If POC succeeds (≥70% AI quality):** 1449 + 1331 1331 * ✅ Approach validated 1332 1332 * ✅ Proceed to POC2 (add scenarios) 1333 1333 * ✅ Design full Evidence Model structure ... ... @@ -1335,6 +1335,7 @@ 1335 1335 * ✅ Focus on improving AI quality from 70% → 90% 1336 1336 1337 1337 **If POC fails (< 60% AI quality):** 1457 + 1338 1338 * ✅ Learn what doesn't work 1339 1339 * ✅ Pivot to different approach 1340 1340 * ✅ OR wait for better AI technology ... ... @@ -1342,18 +1342,62 @@ 1342 1342 1343 1343 **Either way, POC provides clarity.** 1344 1344 1345 ---- 1346 - 1347 1347 == 17.
Related Pages == 1348 1348 1349 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]] 1350 -* [[Requirements>>FactHarbor.Requirements.WebHome]] 1351 -* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]] 1467 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]] 1468 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]] 1469 +* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]] 1352 1352 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1353 -* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1471 +* [[AKEL>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1354 1354 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] 1355 1355 1356 ---- 1357 - 1358 1358 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1359 1359 1476 + 1477 +=== NFR-POC-11: LLM Provider Abstraction (POC1) === 1478 + 1479 +**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1480 + 1481 +**POC1 Implementation:** 1482 + 1483 +* **Primary Provider:** Anthropic Claude API 1484 +* Stage 1: Provider-default FAST model 1485 +* Stage 2: Provider-default REASONING model (cached) 1486 +* Stage 3: Provider-default REASONING model 1487 + 1488 +* **Provider Interface:** Abstract LLMProvider interface implemented 1489 + 1490 +* **Configuration:** Environment variables for provider selection 1491 +* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1492 +* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1493 +* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1494 + 1495 +* **Failover:** Basic error handling with cache fallback for Stage 2 1496 + 1497 +* **Cost Tracking:** Log provider name and cost per request 1498 + 1499 +**Future (POC2/Beta):** 1500 + 1501 +* Secondary provider (OpenAI) with automatic failover 1502 +* Admin API for runtime provider switching 1503 +* Cost comparison dashboard 1504 +* Cross-provider output verification 1505 + 1506 +**Success Criteria:** 1507 + 1508 +* All LLM calls go through abstraction layer (no direct API calls) 1509 +* Provider can be changed via environment variable without code changes 1510 +* Cost tracking includes provider name in logs 1511 +* Stage 2 falls back to cache on provider failure 1512 + 1513 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6 1514 + 1515 +**Dependencies:** 1516 + 1517 +* NFR-14 (Main Requirements) 1518 +* Design Decision 9 1519 +* Architecture Section 2.2 1520 + 1521 +**Priority:** HIGH (P1) 1522 + 1523 +**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
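The NFR-POC-11 requirements above (an abstract `LLMProvider` interface, provider selection via environment variable, cache fallback for Stage 2, and provider name plus cost in the logs) could be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the design from the POC1 API & Schemas Specification. `LLMResult`, `EchoProvider`, `FailingProvider`, the `PROVIDERS` registry, `run_stage2`, and the in-memory `dict` standing in for the cache are all hypothetical. No vendor SDK is called, so a real `anthropic` entry would still have to be written and registered.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

log = logging.getLogger("akel.llm")


@dataclass
class LLMResult:
    text: str
    cost_usd: float


class LLMProvider(ABC):
    """Abstract interface; all LLM calls are meant to go through it."""
    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult: ...


class EchoProvider(LLMProvider):
    """Hypothetical stand-in so the sketch runs without network access."""
    name = "echo"

    def complete(self, model: str, prompt: str) -> LLMResult:
        return LLMResult(text=f"[{model}] {prompt}", cost_usd=0.0)


class FailingProvider(LLMProvider):
    """Hypothetical provider that always fails, to exercise the fallback."""
    name = "failing"

    def complete(self, model: str, prompt: str) -> LLMResult:
        raise RuntimeError("provider unavailable")


# Registry keyed by the value of LLM_PRIMARY_PROVIDER (assumed design).
PROVIDERS = {"echo": EchoProvider, "failing": FailingProvider}


def get_provider() -> LLMProvider:
    """Select the provider from the environment, with no code change needed."""
    return PROVIDERS[os.environ.get("LLM_PRIMARY_PROVIDER", "echo")]()


def run_stage2(prompt: str, cache: dict[str, str]) -> str:
    """Run Stage 2; on provider failure, fall back to a cached answer."""
    provider = get_provider()
    model = os.environ.get("LLM_STAGE2_MODEL", "reasoning-default")
    try:
        result = provider.complete(model, prompt)
    except Exception:
        log.warning("provider=%s failed; trying cache", provider.name)
        if prompt in cache:
            return cache[prompt]
        raise
    # Cost tracking: log provider name and cost per request.
    log.info("provider=%s model=%s cost_usd=%.4f",
             provider.name, model, result.cost_usd)
    cache[prompt] = result.text
    return result.text
```

Keying the cache on the raw prompt is a simplification for this sketch; the requirements above do not specify how cache keys are derived.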