Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2026/02/08 08:26
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Title
@@ -1,1 +1,1 @@
-POC Requirements (POC1 & POC2)
+POC Requirements

Content
@@ -1,28 +1,19 @@
 = POC Requirements =
 
-
-{{info}}
-**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
-
-See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
-{{/info}}
-
-
-
-**Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
+**Status:** ✅ Approved for Development
+**Version:** 2.0 (Updated after Specification Cross-Check)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
 
+---
+
 == 1. POC Overview ==
 
 === 1.1 What POC Tests ===
 
 **Core Question:**
-
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
 
 **What we're proving:**
-
 * AI can identify factual claims from text
 * AI can evaluate those claims and produce verdicts
 * Output is comprehensible and useful
@@ -29,7 +29,6 @@
 * Fully automated approach is viable
 
 **What we're NOT testing:**
-
 * Scenario generation (deferred to POC2)
 * Evidence display (deferred to POC2)
 * Production scalability
@@ -36,6 +36,8 @@
 * Perfect accuracy
 * Complete feature set
 
+---
+
 === 1.2 Scenarios Deferred to POC2 ===
 
 **Intentional Simplification:**
@@ -43,7 +43,6 @@
 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
 
 **Rationale:**
-
 * **POC1 tests:** Can AI extract claims and generate verdicts?
 * **POC2 will add:** Scenario generation and management
 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
@@ -55,7 +55,6 @@
 **No Risk:**
 
 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
-
 * Faster POC1 validation
 * Learning from POC1 to inform scenario design
 * Iterative approach: fail fast if basic AI doesn't work
@@ -62,91 +62,65 @@
 * Flexibility to adjust scenario architecture based on POC1 insights
 
 **Full System Workflow (Future):**
-{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
+{{code}}
+Claims → Scenarios → Evidence → Verdicts
+{{/code}}
 
 **POC1 Simplified Workflow:**
-{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
+{{code}}
+Claims → Verdicts (scenarios implicit in reasoning)
+{{/code}}
 
+---
+
 == 2. POC Output Specification ==
 
-=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
+=== 2.1 Component 1: ANALYSIS SUMMARY ===
 
-**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
+**What:** Brief overview of findings
+**Length:** 3-5 sentences
+**Content:**
+* How many claims found
+* Distribution of verdicts
+* Overall assessment
 
-**Length:** 4-6 sentences
+**Example:**
+{{code}}
+This article makes 4 claims about coffee's health effects. We found
+2 claims are well-supported, 1 is uncertain, and 1 is refuted.
+Overall assessment: mostly accurate with some exaggeration.
+{{/code}}
 
-**Content (Required Elements):**
+---
 
-1. **Article's main thesis/claim** - What is the article trying to argue or prove?
-2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
-3. **Central vs. supporting claims** - Which claims are central to the article's argument?
-4. **Relationship assessment** - Do the claims support the article's conclusion?
-5. **Overall credibility** - Final assessment considering claim importance
-
-**Critical Innovation:**
-
-POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
-
-* Make accurate supporting facts but draw unsupported conclusions
-* Have one false central claim that invalidates the whole argument
-* Misframe accurate information to mislead
-
-**Good Example (Context-Aware):**
-{{code}}This article argues that coffee cures cancer based on its antioxidant
-content. We analyzed 3 factual claims: 2 about coffee's chemical
-properties are well-supported, but the main causal claim is refuted
-by current evidence. The article confuses correlation with causation.
-Overall assessment: MISLEADING - makes an unsupported medical claim
-despite citing some accurate facts.{{/code}}
-
-**Poor Example (Simple Aggregation - Don't Do This):**
-{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
-Overall assessment: mostly accurate (67% accurate).{{/code}}
-↑ This misses that the refuted claim IS the article's main point!
-
-**What POC1 Tests:**
-
-Can AI identify and assess:
-
-* ✅ The article's main thesis/conclusion?
-* ✅ Which claims are central vs. supporting?
-* ✅ Whether the evidence supports the conclusion?
-* ✅ Overall credibility considering logical structure?
-
-**If AI Cannot Do This:**
-
-That's valuable to learn in POC1! We'll:
-
-* Note as limitation
-* Fall back to simple aggregation with warning
-* Design explicit article-level analysis for POC2
-
 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
 
-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
 **Requirements:**
-
 * Factual claims only (not opinions/questions)
 * Clearly stated
 * Automatically extracted by AI
 
 **Example:**
-{{code}}CLAIMS IDENTIFIED:
+{{code}}
+CLAIMS IDENTIFIED:
 
 [1] Coffee reduces diabetes risk by 30%
 [2] Coffee improves heart health
 [3] Decaf has same benefits as regular
-[4] Coffee prevents Alzheimer's completely{{/code}}
+[4] Coffee prevents Alzheimer's completely
+{{/code}}
 
+---
+
 === 2.3 Component 3: CLAIMS VERDICTS ===
 
-**What:** Verdict for each claim identified
-**Format:** Per claim structure
+**What:** Verdict for each claim identified
+**Format:** Per claim structure
 
 **Required Elements:**
-
 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
 * **Confidence Score:** 0-100%
 * **Brief Reasoning:** 1-3 sentences explaining why
@@ -153,7 +153,8 @@
 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
 
 **Example:**
-{{code}}VERDICTS:
+{{code}}
+VERDICTS:
 
 [1] WELL-SUPPORTED (85%) [Risk: C]
 Multiple studies confirm 25-30% risk reduction with regular consumption.
@@ -165,86 +165,44 @@
 
 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
 
-No evidence for complete prevention. Claim is significantly overstated.{{/code}}
+No evidence for complete prevention. Claim is significantly overstated.
+{{/code}}
 
 **Risk Tier Display:**
-
 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
 * **Tier C (Green):** Low Risk - Facts/Definitions/History
 
 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
 
+---
+
 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
 
-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
 **Tone:** Neutral (article's position, not FactHarbor's analysis)
 
 **Example:**
-{{code}}ARTICLE SUMMARY:
+{{code}}
+ARTICLE SUMMARY:
 
 Health News Today article discusses coffee benefits, citing studies
 on diabetes and Alzheimer's. Author highlights research linking coffee
-to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
+to disease prevention. Recommends 2-3 cups daily for optimal health.
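The per-claim verdict structure specified in this diff (verdict label, confidence score, risk tier, brief reasoning) can be sketched as a small data model. This is an editorial sketch only; the type and field names (`ClaimVerdict`, `risk_tier`, `display`) are assumptions, not part of the page being revised.

```python
from dataclasses import dataclass
from typing import Literal

# The four verdict labels and three risk tiers come from the spec text above.
Verdict = Literal["WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"]
RiskTier = Literal["A", "B", "C"]  # A = High, B = Medium, C = Low risk


@dataclass
class ClaimVerdict:
    claim: str        # the extracted factual claim, verbatim
    verdict: Verdict  # one of the four verdict labels
    confidence: int   # confidence score, 0-100 (%)
    risk_tier: RiskTier
    reasoning: str    # 1-3 sentences explaining the verdict

    def display(self) -> str:
        # Mirrors the example layout: "WELL-SUPPORTED (85%) [Risk: C]"
        return f"{self.verdict} ({self.confidence}%) [Risk: {self.risk_tier}]\n{self.reasoning}"


v = ClaimVerdict(
    claim="Coffee reduces diabetes risk by 30%",
    verdict="WELL-SUPPORTED",
    confidence=85,
    risk_tier="C",
    reasoning="Multiple studies confirm 25-30% risk reduction with regular consumption.",
)
print(v.display())
```

A fixed data model like this also makes the "no manual editing" rule checkable: the display layer renders whatever the AI produced, with no free-form override field.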
+{{/code}}
 
-=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
+---
 
-**What:** LLM usage metrics for cost optimization and scaling decisions
+=== 2.5 Total Output Size ===
 
-**Purpose:**
-
-* Understand cost per analysis
-* Identify optimization opportunities
-* Project costs at scale
-* Inform architecture decisions
-
-**Display Format:**
-{{code}}USAGE STATISTICS:
-• Article: 2,450 words (12,300 characters)
-• Input tokens: 15,234
-• Output tokens: 892
-• Total tokens: 16,126
-• Estimated cost: $0.24 USD
-• Response time: 8.3 seconds
-• Cost per claim: $0.048
-• Model: claude-sonnet-4-20250514{{/code}}
-
-**Why This Matters:**
-
-At scale, LLM costs are critical:
-
-* 10,000 articles/month ≈ $200-500/month
-* 100,000 articles/month ≈ $2,000-5,000/month
-* Cost optimization can reduce expenses 30-50%
-
-**What POC1 Learns:**
-
-* How cost scales with article length
-* Prompt optimization opportunities (caching, compression)
-* Output verbosity tradeoffs
-* Model selection strategy (FAST vs. REASONING roles)
-* Article length limits (if needed)
-
-**Implementation:**
-
-* Claude API already returns usage data
-* No extra API calls needed
-* Display to user + log for aggregate analysis
-* Test with articles of varying lengths
-
-**Critical for GO/NO-GO:** Unit economics must be viable at scale!
-
-=== 2.6 Total Output Size ===
-
-**Combined:** 220-350 words
-
-* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
+**Combined:** ~200-300 words
+* Analysis Summary: 50-70 words
 * Claims Identification: 30-50 words
 * Claims Verdicts: 100-150 words
 * Article Summary: 30-50 words (optional)
 
-**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
+---
 
 == 3. What's NOT in POC Scope ==
 
@@ -253,7 +253,6 @@
 The following are **explicitly excluded** from POC:
 
 **Content Features:**
-
 * ❌ Scenarios (deferred to POC2)
 * ❌ Evidence display (supporting/opposing lists)
 * ❌ Source links (clickable references)
@@ -263,7 +263,6 @@
 * ❌ Risk assessment (shown but not workflow-integrated)
 
 **Platform Features:**
-
 * ❌ User accounts / authentication
 * ❌ Saved history
 * ❌ Search functionality
@@ -273,7 +273,6 @@
 * ❌ Social sharing
 
 **Technical Features:**
-
 * ❌ Browser extensions
 * ❌ Mobile apps
 * ❌ API endpoints
@@ -281,7 +281,6 @@
 * ❌ Export features (PDF, CSV)
 
 **Quality Features:**
-
 * ❌ Accessibility (WCAG compliance)
 * ❌ Multilingual support
 * ❌ Mobile optimization
@@ -288,7 +288,6 @@
 * ❌ Media verification (images/videos)
 
 **Production Features:**
-
 * ❌ Security hardening
 * ❌ Privacy compliance (GDPR)
 * ❌ Terms of service
@@ -297,18 +297,24 @@
 * ❌ Analytics
 * ❌ A/B testing
 
+---
+
 == 4. POC Simplifications vs. Full System ==
 
 === 4.1 Architecture Comparison ===
 
 **POC Architecture (Simplified):**
-{{code}}User Input → Single AKEL Call → Output Display
-                     (all processing){{/code}}
+{{code}}
+User Input → Single AKEL Call → Output Display
+             (all processing)
+{{/code}}
 
 **Full System Architecture:**
-{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
+{{code}}
+User Input → Claim Extractor → Claim Classifier → Scenario Generator
 → Evidence Summarizer → Contradiction Detector → Verdict Generator
-→ Quality Gates → Publication → Output Display{{/code}}
+→ Quality Gates → Publication → Output Display
+{{/code}}
 
 **Key Differences:**
 
@@ -321,17 +321,17 @@
 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
 
+---
+
 === 4.2 Workflow Comparison ===
 
 **POC1 Workflow:**
-
 1. User submits text/URL
 2. Single AKEL call (all processing in one prompt)
 3. Display results
-**Total: 3 steps, 10-18 seconds**
+**Total: 3 steps, ~10-18 seconds**
 
 **Full System Workflow:**
-
 1. **Claim Submission** (extraction, normalization, clustering)
 2. **Scenario Building** (definitions, assumptions, boundaries)
 3. **Evidence Handling** (retrieval, assessment, linking)
@@ -338,8 +338,10 @@
 4. **Verdict Creation** (synthesis, reasoning, approval)
 5. **Public Presentation** (summaries, landscapes, deep dives)
 6. **Time Evolution** (versioning, re-evaluation triggers)
-**Total: 6 phases with quality gates, 10-30 seconds**
+**Total: 6 phases with quality gates, ~10-30 seconds**
 
+---
+
 === 4.3 Why POC is Simplified ===
 
 **Engineering Rationale:**
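The usage-statistics component that this change removes tracked input/output tokens and derived an estimated USD cost per analysis and per claim. A minimal sketch of that derivation follows; the per-million-token prices are illustrative placeholders, not real API pricing, and the function name is an assumption.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float = 3.0,
                  usd_per_m_output: float = 15.0) -> float:
    """Estimated USD cost of one analysis.

    The default per-million-token prices are placeholders for illustration;
    swap in the current provider rates before relying on this number.
    """
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Token counts taken from the USAGE STATISTICS example in this section.
cost = estimate_cost(15_234, 892)
cost_per_claim = cost / 4  # the running example extracts 4 claims
print(f"Estimated cost: ${cost:.3f} USD (${cost_per_claim:.4f} per claim)")
```

Logging this pair of numbers per analysis is enough to answer the section's scaling questions (cost vs. article length, linear vs. super-linear growth) without any extra API calls.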
@@ -358,10 +358,11 @@
 * ❌ POC doesn't validate scale (test in Beta)
 * ❌ POC doesn't validate scenario architecture (design in POC2)
 
+---
+
 === 4.4 Gap Between POC1 and POC2/Beta ===
 
 **What needs to be built for POC2:**
-
 * Scenario generation component
 * Evidence Model structure (full)
 * Scenario-evidence linking
@@ -369,7 +369,6 @@
 * Truth landscape visualization
 
 **What needs to be built for Beta:**
-
 * Multi-component AKEL pipeline
 * Quality gate infrastructure
 * Review workflow system
@@ -379,6 +379,8 @@
 
 **POC1 → POC2 is significant architectural expansion.**
 
+---
+
 == 5. Publication Mode & Labeling ==
 
 === 5.1 POC Publication Mode ===
@@ -386,7 +386,6 @@
 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
 
 Per FactHarbor Specification Section 11 "POC v1 Behavior":
-
 * Produces public AI-generated output
 * No human approval gate
 * Clear AI-Generated labeling
@@ -393,31 +393,35 @@
 * All quality gates active (simplified)
 * Risk tier classification shown (demo)
 
+---
+
 === 5.2 User-Facing Labels ===
 
 **Primary Label (top of analysis):**
-{{code}}╔════════════════════════════════════════════════════════════╗
-║  [AI-GENERATED - POC/DEMO]                                 ║
-║                                                            ║
-║  This analysis was produced entirely by AI and has not     ║
-║  been human-reviewed. Use for demonstration purposes.      ║
-║                                                            ║
-║  Source: AI/AKEL v1.0 (POC)                                ║
-║  Review Status: Not Reviewed (Proof-of-Concept)            ║
-║  Quality Gates: 4/4 Passed (Simplified)                    ║
-║  Last Updated: [timestamp]                                 ║
-╚════════════════════════════════════════════════════════════╝{{/code}}
+{{code}}
+╔════════════════════════════════════════════════════════════╗
+║  [AI-GENERATED - POC/DEMO]                                 ║
+║                                                            ║
+║  This analysis was produced entirely by AI and has not     ║
+║  been human-reviewed. Use for demonstration purposes.      ║
+║                                                            ║
+║  Source: AI/AKEL v1.0 (POC)                                ║
+║  Review Status: Not Reviewed (Proof-of-Concept)            ║
+║  Quality Gates: 4/4 Passed (Simplified)                    ║
+║  Last Updated: [timestamp]                                 ║
+╚════════════════════════════════════════════════════════════╝
+{{/code}}
 
 **Per-Claim Risk Labels:**
-
 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
 
+---
+
 === 5.3 Display Requirements ===
 
 **Must Show:**
-
 * AI-Generated status (prominent)
 * POC/Demo disclaimer
 * Risk tier per claim
@@ -426,7 +426,6 @@
 * Timestamp
 
 **Must NOT Claim:**
-
 * Human review
 * Production quality
 * Medical/legal advice
@@ -433,6 +433,8 @@
 * Authoritative verdicts
 * Complete accuracy
 
+---
+
 === 5.4 Mode 2 vs. Full System Publication ===
 
 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
@@ -443,6 +443,8 @@
 |Risk Display|Demo only|Workflow-integrated|Validated
 |User Actions|View only|Flag for review|Trust rating
 
+---
+
 == 6. Quality Gates (Simplified Implementation) ==
 
 === 6.1 Overview ===
@@ -450,7 +450,6 @@
 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
 
 **Full System Has 4 Gates:**
-
 1. Source Quality
 2. Contradiction Search (MANDATORY)
 3. Uncertainty Quantification
@@ -457,16 +457,16 @@
 4. Structural Integrity
 
 **POC Implements Simplified Versions:**
-
 * Focus on demonstrating concept
 * Basic implementations sufficient
 * Failures displayed to user (not blocking)
 * Full system has comprehensive validation
 
+---
+
 === 6.2 Gate 1: Source Quality (Basic) ===
 
 **Full System Requirements:**
-
 * Primary sources identified and accessible
 * Source reliability scored against whitelist
 * Citation completeness verified
@@ -474,7 +474,6 @@
 * Author credentials validated
 
 **POC Implementation:**
-
 * ✅ At least 2 sources found
 * ✅ Sources accessible (URLs valid)
 * ❌ No whitelist checking
@@ -485,10 +485,11 @@
 
 **Failure Handling:** Display error message, don't generate verdict
 
+---
+
 === 6.3 Gate 2: Contradiction Search (Basic) ===
 
 **Full System Requirements:**
-
 * Counter-evidence actively searched
 * Reservations and limitations identified
 * Alternative interpretations explored
@@ -497,7 +497,6 @@
 * Academic literature (supporting AND opposing)
 
 **POC Implementation:**
-
 * ✅ Basic search for counter-evidence
 * ✅ Identify obvious contradictions
 * ❌ No comprehensive academic search
@@ -509,10 +509,11 @@
 
 **Failure Handling:** Note "limited contradiction search" in output
 
+---
+
 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
 
 **Full System Requirements:**
-
 * Confidence scores calculated for all claims/verdicts
 * Limitations explicitly stated
 * Data gaps identified and disclosed
@@ -520,7 +520,6 @@
 * Alternative scenarios considered
 
 **POC Implementation:**
-
 * ✅ Confidence scores (0-100%)
 * ✅ Basic uncertainty acknowledgment
 * ❌ No detailed limitation disclosure
@@ -531,10 +531,11 @@
 
 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
 
+---
+
 === 6.5 Gate 4: Structural Integrity (Basic) ===
 
 **Full System Requirements:**
-
 * No hallucinations detected (fact-checking against sources)
 * Logic chain valid and traceable
 * References accessible and verifiable
@@ -542,7 +542,6 @@
 * Premises clearly stated
 
 **POC Implementation:**
-
 * ✅ Basic coherence check
 * ✅ References accessible
 * ❌ No comprehensive hallucination detection
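Each simplified gate described in this section reduces to a pass/fail result plus a short detail string, reported to the user but never blocking publication. A hypothetical rendering of that status report (type and function names are this editor's assumptions, not the page's):

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str     # e.g. "Source Quality"
    passed: bool
    detail: str   # short explanation shown to the user


def render_gates(results: list[GateResult]) -> str:
    # Gates are reported, never blocking, per the POC's simplified handling.
    passed = sum(r.passed for r in results)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        lines.append(f"{'✓' if r.passed else '✗'} {r.name}: {r.detail}")
    if passed < len(results):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)


report = render_gates([
    GateResult("Source Quality", True, "3 sources found"),
    GateResult("Contradiction Search", False, "Search failed - limited evidence"),
    GateResult("Uncertainty", True, "Confidence scores assigned"),
    GateResult("Structural Integrity", True, "Output coherent"),
])
print(report)
```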
@@ -553,24 +553,32 @@
 
 **Failure Handling:** Display error message
 
+---
+
 === 6.6 Quality Gate Display ===
 
 **POC shows simplified status:**
-{{code}}Quality Gates: 4/4 Passed (Simplified)
+{{code}}
+Quality Gates: 4/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✓ Contradiction Search: Basic search completed
 ✓ Uncertainty: Confidence scores assigned
-✓ Structural Integrity: Output coherent{{/code}}
+✓ Structural Integrity: Output coherent
+{{/code}}
 
 **If any gate fails:**
-{{code}}Quality Gates: 3/4 Passed (Simplified)
+{{code}}
+Quality Gates: 3/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✗ Contradiction Search: Search failed - limited evidence
 ✓ Uncertainty: Confidence scores assigned
 ✓ Structural Integrity: Output coherent
 
-Note: This analysis has limited evidence. Use with caution.{{/code}}
+Note: This analysis has limited evidence. Use with caution.
+{{/code}}
 
+---
+
 === 6.7 Simplified vs. Full System ===
 
 |=Gate|=POC (Simplified)|=Full System
@@ -581,13 +581,14 @@
 
 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
 
+---
+
 == 7. AKEL Architecture Comparison ==
 
 === 7.1 POC AKEL (Simplified) ===
 
 **Implementation:**
-
-* Single provider API call (REASONING model)
+* Single Claude API call (Sonnet 4.5)
 * One comprehensive prompt
 * All processing in single request
 * No separate components
@@ -594,26 +594,31 @@
 * No orchestration layer
 
 **Prompt Structure:**
-{{code}}Task: Analyze this article and provide:
+{{code}}
+Task: Analyze this article and provide:
 
 1. Extract 3-5 factual claims
 2. For each claim:
-   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
-   - Assign confidence score (0-100%)
-   - Assign risk tier (A/B/C)
-   - Write brief reasoning (1-3 sentences)
+   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+   - Assign confidence score (0-100%)
+   - Assign risk tier (A/B/C)
+   - Write brief reasoning (1-3 sentences)
 3. Generate analysis summary (3-5 sentences)
 4. Generate article summary (3-5 sentences)
 5. Run basic quality checks
 
-Return as structured JSON.{{/code}}
+Return as structured JSON.
+{{/code}}
 
 **Processing Time:** 10-18 seconds (estimate)
 
+---
+
 === 7.2 Full System AKEL (Production) ===
 
 **Architecture:**
-{{code}}AKEL Orchestrator
+{{code}}
+AKEL Orchestrator
 ├── Claim Extractor
 ├── Claim Classifier (with risk tier assignment)
 ├── Scenario Generator
@@ -621,10 +621,10 @@
 ├── Contradiction Detector
 ├── Quality Gate Validator
 ├── Audit Sampling Scheduler
-└── Federation Sync Adapter (Release 1.0+){{/code}}
+└── Federation Sync Adapter (Release 1.0+)
+{{/code}}
 
 **Processing:**
-
 * Parallel processing where possible
 * Separate component calls
 * Quality gates between phases
@@ -633,10 +633,11 @@
 
 **Processing Time:** 10-30 seconds (full pipeline)
 
+---
+
 === 7.3 Why POC Uses Single Call ===
 
 **Advantages:**
-
 * ✅ Simpler to implement
 * ✅ Faster POC development
 * ✅ Easier to debug
@@ -644,7 +644,6 @@
 * ✅ Good enough for concept validation
 
 **Limitations:**
-
 * ❌ No component reusability
 * ❌ No parallel processing
 * ❌ All-or-nothing (can't partially succeed)
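The prompt above asks the model to "return as structured JSON", but the page does not define the schema. One hypothetical shape, with field names assumed by this editor for illustration, might be parsed like this:

```python
import json

# Hypothetical response for the single-call prompt; the field names below
# are assumptions, not defined by the specification being revised.
raw = """
{
  "claims": [
    {
      "id": 1,
      "text": "Coffee reduces diabetes risk by 30%",
      "verdict": "WELL-SUPPORTED",
      "confidence": 85,
      "risk_tier": "C",
      "reasoning": "Multiple studies confirm 25-30% risk reduction."
    }
  ],
  "analysis_summary": "This article makes 4 claims about coffee's health effects.",
  "article_summary": "Health News Today article discusses coffee benefits.",
  "quality_gates": {"passed": 4, "total": 4}
}
"""

result = json.loads(raw)
for c in result["claims"]:
    # Render each claim in the display format used elsewhere in the spec.
    print(f"[{c['id']}] {c['verdict']} ({c['confidence']}%) [Risk: {c['risk_tier']}]")
```

Whatever schema is chosen, pinning it down explicitly matters for the single-call design: a malformed response fails the whole analysis, since there are no per-component retries.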
@@ -657,6 +657,8 @@
 
 Full component architecture comes in Beta after POC validates concept.
 
+---
+
 === 7.4 Evolution Path ===
 
 **POC1:** Single prompt → Prove concept
@@ -664,6 +664,8 @@
 **Beta:** Multi-component AKEL → Production architecture
 **Release 1.0:** Full AKEL + Federation → Scale
 
+---
+
 == 8. Functional Requirements ==
 
 === FR-POC-1: Article Input ===
@@ -671,7 +671,6 @@
 **Requirement:** User can submit article for analysis
 
 **Functionality:**
-
 * Text input field (paste article text, up to 5000 characters)
 * URL input field (paste article URL)
 * "Analyze" button to trigger processing
@@ -678,7 +678,6 @@
 * Loading indicator during analysis
 
 **Excluded:**
-
 * No user authentication
 * No claim history
 * No search functionality
@@ -685,17 +685,17 @@
 * No saved templates
 
 **Acceptance Criteria:**
-
 * User can paste text from article
 * User can paste URL of article
 * System accepts input and triggers analysis
 
+---
+
 === FR-POC-2: Claim Extraction (Fully Automated) ===
 
 **Requirement:** AI automatically extracts 3-5 factual claims
 
 **Functionality:**
-
 * AI reads article text
 * AI identifies factual claims (not opinions/questions)
 * AI extracts 3-5 most important claims
@@ -702,7 +702,6 @@
 * System displays numbered list
 
 **Critical:** NO MANUAL EDITING ALLOWED
-
 * AI selects which claims to extract
 * AI identifies factual vs. non-factual
 * System processes claims as extracted
@@ -709,34 +709,32 @@
 * No human curation or correction
 
 **Error Handling:**
-
 * If extraction fails: Display error message
 * User can retry with different input
 * No manual intervention to fix extraction
 
 **Acceptance Criteria:**
-
 * AI extracts 3-5 claims automatically
 * Claims are factual (not opinions)
 * Claims are clearly stated
 * No manual editing required
 
+---
+
 === FR-POC-3: Verdict Generation (Fully Automated) ===
 
 **Requirement:** AI automatically generates verdict for each claim
 
 **Functionality:**
-
 * For each claim, AI:
-* Evaluates claim based on available evidence/knowledge
-* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
-* Assigns confidence score (0-100%)
-* Assigns risk tier (A/B/C)
-* Writes brief reasoning (1-3 sentences)
+  * Evaluates claim based on available evidence/knowledge
+  * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+  * Assigns confidence score (0-100%)
+  * Assigns risk tier (A/B/C)
+  * Writes brief reasoning (1-3 sentences)
 * System displays verdict for each claim
 
 **Critical:** NO MANUAL EDITING ALLOWED
-
 * AI computes verdicts based on evidence
 * AI generates confidence scores
 * AI writes reasoning
@@ -743,13 +743,11 @@
 * No human review or adjustment
 
 **Error Handling:**
-
 * If verdict generation fails: Display error message
 * User can retry
 * No manual intervention to adjust verdicts
 
 **Acceptance Criteria:**
-
 * Each claim has a verdict
 * Confidence score is displayed (0-100%)
 * Risk tier is displayed (A/B/C)
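FR-POC-2 and FR-POC-3 both require that failures be surfaced to the user for retry, with no manual correction of the output. That loop can be sketched with a deterministic stand-in for the AKEL call; `FlakyAKEL` and `analyze_with_retry` are illustrative names invented here, not part of the requirements.

```python
from typing import Optional


class FlakyAKEL:
    """Deterministic stand-in for the single AKEL call: fails N times, then succeeds."""

    def __init__(self, failures: int = 2):
        self.failures = failures

    def analyze(self, text: str) -> dict:
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("claim extraction failed")
        return {"claims": ["Coffee reduces diabetes risk by 30%"]}


def analyze_with_retry(akel: FlakyAKEL, text: str, attempts: int = 3) -> Optional[dict]:
    # Surface each failure to the user and retry; per the POC rules,
    # the AI output is never repaired by hand.
    for i in range(attempts):
        try:
            return akel.analyze(text)
        except RuntimeError as err:
            print(f"Attempt {i + 1} failed: {err}")
    return None


result = analyze_with_retry(FlakyAKEL(), "pasted article text")
```

Returning `None` after exhausted retries is the honest POC outcome: the failure gets documented rather than patched, which is exactly what NFR-POC-1 asks for.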
@@ -757,33 +757,34 @@
 * Verdict is defensible given reasoning
 * All generated automatically by AI
 
+---
+
 === FR-POC-4: Analysis Summary (Fully Automated) ===
 
 **Requirement:** AI generates brief summary of analysis
 
 **Functionality:**
-
 * AI summarizes findings in 3-5 sentences:
-* How many claims found
-* Distribution of verdicts
-* Overall assessment
+  * How many claims found
+  * Distribution of verdicts
+  * Overall assessment
 * System displays at top of results
 
 **Critical:** NO MANUAL EDITING ALLOWED
 
 **Acceptance Criteria:**
-
 * Summary is coherent
 * Accurately reflects analysis
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
 
 **Requirement:** AI generates brief summary of original article
 
 **Functionality:**
-
 * AI summarizes article content (not FactHarbor's analysis)
 * 3-5 sentences
 * System displays
@@ -793,18 +793,18 @@
 **Critical:** NO MANUAL EDITING ALLOWED
 
 **Acceptance Criteria:**
-
 * Summary is neutral (article's position)
 * Accurately reflects article content
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-6: Publication Mode Display ===
 
 **Requirement:** Clear labeling of AI-generated content
 
 **Functionality:**
-
 * Display Mode 2 publication label
 * Show POC/Demo disclaimer
 * Display risk tiers per claim
@@ -812,18 +812,18 @@
 * Display timestamp
 
 **Acceptance Criteria:**
-
 * Label is prominent and clear
 * User understands this is AI-generated POC output
 * Risk tiers are color-coded
 * Quality gate status is visible
 
+---
+
 === FR-POC-7: Quality Gate Execution ===
 
 **Requirement:** Execute simplified quality gates
 
 **Functionality:**
-
 * Check source quality (basic)
 * Attempt contradiction search (basic)
 * Calculate confidence scores
@@ -831,12 +831,13 @@
 * Display gate results
 
 **Acceptance Criteria:**
-
 * All 4 gates attempted
 * Pass/fail status displayed
 * Failures explained to user
 * Gates don't block publication (POC mode)
 
+---
+
 == 9. Non-Functional Requirements ==
 
 === NFR-POC-1: Fully Automated Processing ===
@@ -846,7 +846,6 @@
 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
 
 **What this means:**
-
 * Claims: AI selects (no human curation)
 * Scenarios: N/A (deferred to POC2)
 * Evidence: AI evaluates (no human selection)
@@ -854,12 +854,13 @@
 * Summaries: AI writes (no human editing)
 
 **Pipeline:**
-{{code}}User Input → AKEL Processing → Output Display
-                    ↓
-            ZERO human editing{{/code}}
+{{code}}
+User Input → AKEL Processing → Output Display
+                    ↓
+            ZERO human editing
+{{/code}}
 
 **If AI output is poor:**
-
 * ❌ Do NOT manually fix it
 * ✅ Document the failure
 * ✅ Improve prompts and retry
@@ -866,61 +866,59 @@ 866 866 * ✅ Accept that POC might fail 867 867 868 868 **Why this matters:** 869 - 870 870 * Tests whether AI can do this without humans 871 871 * Validates scalability (humans can't review every analysis) 872 872 * Honest test of technical feasibility 873 873 818 +--- 819 + 874 874 === NFR-POC-2: Performance === 875 875 876 876 **Requirement:** Analysis completes in reasonable time 877 877 878 878 **Acceptable Performance:** 879 - 880 880 * Processing time: 1-5 minutes (acceptable for POC) 881 881 * Display loading indicator to user 882 882 * Show progress if possible ("Extracting claims...", "Generating verdicts...") 883 883 884 884 **Not Required:** 885 - 886 886 * Production-level speed (< 30 seconds) 887 887 * Optimization for scale 888 888 * Caching 889 889 890 890 **Acceptance Criteria:** 891 - 892 892 * Analysis completes within 5 minutes 893 893 * User sees loading indicator 894 894 * No timeout errors 895 895 839 +--- 840 + 896 896 === NFR-POC-3: Reliability === 897 897 898 898 **Requirement:** System works for manual testing sessions 899 899 900 900 **Acceptable:** 901 - 902 902 * Occasional errors (< 20% failure rate) 903 903 * Manual restart if needed 904 904 * Display error messages clearly 905 905 906 906 **Not Required:** 907 - 908 908 * 99.9% uptime 909 909 * Automatic error recovery 910 910 * Production monitoring 911 911 912 912 **Acceptance Criteria:** 913 - 914 914 * System works for test demonstrations 915 915 * Errors are handled gracefully 916 916 * User receives clear error messages 917 917 860 +--- 861 + 918 918 === NFR-POC-4: Environment === 919 919 920 920 **Requirement:** Runs on simple infrastructure 921 921 922 922 **Acceptable:** 923 - 924 924 * Single machine or simple cloud setup 925 925 * No distributed architecture 926 926 * No load balancing ... ... 
@@ -928,196 +928,125 @@ 928 928 * Local development environment viable 929 929 930 930 **Not Required:** 931 - 932 932 * Production infrastructure 933 933 * Multi-region deployment 934 934 * Auto-scaling 935 935 * Disaster recovery 936 936 937 - === NFR-POC-5: Cost Efficiency Tracking ===879 +--- 938 938 939 -**Requirement:** Track and display LLM usage metrics to inform optimization decisions 940 - 941 -**Must Track:** 942 - 943 -* Input tokens (article + prompt) 944 -* Output tokens (generated analysis) 945 -* Total tokens 946 -* Estimated cost (USD) 947 -* Response time (seconds) 948 -* Article length (words/characters) 949 - 950 -**Must Display:** 951 - 952 -* Usage statistics in UI (Component 5) 953 -* Cost per analysis 954 -* Cost per claim extracted 955 - 956 -**Must Log:** 957 - 958 -* Aggregate metrics for analysis 959 -* Cost distribution by article length 960 -* Token efficiency trends 961 - 962 -**Purpose:** 963 - 964 -* Understand unit economics 965 -* Identify optimization opportunities 966 -* Project costs at scale 967 -* Inform architecture decisions (caching, model selection, etc.) 968 - 969 -**Acceptance Criteria:** 970 - 971 -* ✅ Usage data displayed after each analysis 972 -* ✅ Metrics logged for aggregate analysis 973 -* ✅ Cost calculated accurately (Claude API pricing) 974 -* ✅ Test cases include varying article lengths 975 -* ✅ POC1 report includes cost analysis section 976 - 977 -**Success Target:** 978 - 979 -* Average cost per analysis < $0.05 USD 980 -* Cost scaling behavior understood (linear/exponential) 981 -* 2+ optimization opportunities identified 982 - 983 -**Critical:** Unit economics must be viable for scaling decision! 984 - 985 985 == 10. 
Technical Architecture == 986 986 987 987 === 10.1 System Components === 988 988 989 989 **Frontend:** 990 - 991 991 * Simple HTML form (text input + URL input + button) 992 992 * Loading indicator 993 993 * Results display page (single page, no tabs/navigation) 994 994 995 995 **Backend:** 996 - 997 997 * Single API endpoint 998 -* Calls provider API (REASONING model; configured via LLM abstraction)892 +* Calls Claude API (Sonnet 4.5 or latest) 999 999 * Parses response 1000 1000 * Returns JSON to frontend 1001 1001 1002 1002 **Data Storage:** 1003 - 1004 1004 * None required (stateless POC) 1005 1005 * Optional: Simple file storage or SQLite for demo examples 1006 1006 1007 1007 **External Services:** 1008 - 1009 1009 * Claude API (Anthropic) - required 1010 1010 * Optional: URL fetch service for article text extraction 904 +--- 905 + 1012 1012 === 10.2 Processing Flow === 1013 1013 1014 1014 {{code}} 1015 1015 1. User submits text or URL 1016 - ↓ 910 + ↓ 1017 1017 2. Backend receives request 1018 - ↓ 912 + ↓ 1019 1019 3. If URL: Fetch article text 1020 - ↓ 914 + ↓ 1021 1021 4. Call Claude API with single prompt: 1022 - "Extract claims, evaluate each, provide verdicts" 1023 - ↓ 916 + "Extract claims, evaluate each, provide verdicts" 917 + ↓ 1024 1024 5. Claude API returns: 1025 - - Analysis summary 1026 - - Claims list 1027 - - Verdicts for each claim (with risk tiers) 1028 - - Article summary (optional) 1029 - - Quality gate results 1030 - ↓ 919 + - Analysis summary 920 + - Claims list 921 + - Verdicts for each claim (with risk tiers) 922 + - Article summary (optional) 923 + - Quality gate results 924 + ↓ 1031 1031 6. Backend parses response 1032 - ↓ 926 + ↓ 1033 1033 7.
Frontend displays results with Mode 2 labeling 1034 1034 {{/code}} 1035 1035 1036 1036 **Key Simplification:** Single API call does entire analysis 1037 1037 932 +--- 933 + 1038 1038 === 10.3 AI Prompt Strategy === 1039 1039 1040 1040 **Single Comprehensive Prompt:** 1041 -{{code}}Task: Analyze this article and provide: 937 +{{code}} 938 +Task: Analyze this article and provide: 1042 1042 1043 -1. Identify the article's main thesis/conclusion 1044 - - What is the article trying to argue or prove? 1045 - - What is the primary claim or conclusion? 940 +1. Extract 3-5 factual claims from the article 941 +2. For each claim: 942 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 943 + - Assign confidence score (0-100%) 944 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 945 + - Write brief reasoning (1-3 sentences) 946 +3. Run quality gates: 947 + - Check: ≥2 sources found 948 + - Attempt: Basic contradiction search 949 + - Calculate: Confidence scores 950 + - Verify: Structural integrity 951 +4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment) 952 +5. Write article summary (3-5 sentences: neutral summary of article content) 1046 1046 1047 -2. Extract 3-5 factual claims from the article 1048 - - Note which claims are CENTRAL to the main thesis 1049 - - Note which claims are SUPPORTING facts 954 +Return as structured JSON with quality gate results. 955 +{{/code}} 1050 1050 1051 -3. For each claim: 1052 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 1053 - - Assign confidence score (0-100%) 1054 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 1055 - - Write brief reasoning (1-3 sentences) 1056 - 1057 -4. Assess relationship between claims and main thesis: 1058 - - Do the claims actually support the article's conclusion? 1059 - - Are there logical leaps or unsupported inferences? 
1060 - - Is the article's framing misleading even if individual facts are accurate? 1061 - 1062 -5. Run quality gates: 1063 - - Check: ≥2 sources found 1064 - - Attempt: Basic contradiction search 1065 - - Calculate: Confidence scores 1066 - - Verify: Structural integrity 1067 - 1068 -6. Write context-aware analysis summary (4-6 sentences): 1069 - - State article's main thesis 1070 - - Report claims found and verdict distribution 1071 - - Note if central claims are problematic 1072 - - Assess whether evidence supports conclusion 1073 - - Overall credibility considering claim importance 1074 - 1075 -7. Write article summary (3-5 sentences: neutral summary of article content) 1076 - 1077 -Return as structured JSON with quality gate results.{{/code}} 1078 - 1079 1079 **One prompt generates everything.** 1080 1080 1081 - **Critical Addition:**959 +--- 1082 1082 1083 -Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article." 
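Step 6 of the processing flow ("Backend parses response") is where malformed model output surfaces, and the no-manual-editing rule means the backend should reject bad output rather than patch it. A minimal sketch of that parse-and-validate step, assuming the single prompt asks for JSON containing `analysis_summary` and a `claims` array — these field names are illustrative assumptions, not fixed by the spec:

```python
import json

# Vocabulary from the prompt strategy in section 10.3
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_TIERS = {"A", "B", "C"}  # A: Medical/Legal/Safety, B: Policy/Science, C: Facts


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and validate required fields.

    Raises ValueError on malformed output so the backend can log the
    failure and retry the prompt instead of hand-fixing the result.
    """
    data = json.loads(raw)
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {claim.get('verdict')!r}")
        if claim.get("risk_tier") not in ALLOWED_TIERS:
            raise ValueError(f"unknown risk tier: {claim.get('risk_tier')!r}")
        if not 0 <= claim.get("confidence", -1) <= 100:
            raise ValueError("confidence must be 0-100")
    return {"analysis_summary": data.get("analysis_summary", ""), "claims": claims}


# Example with a mocked model reply (no API call needed):
reply = json.dumps({
    "analysis_summary": "Three claims found; two well-supported, one uncertain.",
    "claims": [
        {"text": "Coffee reduces diabetes risk", "verdict": "PARTIALLY SUPPORTED",
         "confidence": 75, "risk_tier": "A", "reasoning": "Observational studies only."},
        {"text": "Water boils at 100C at sea level", "verdict": "WELL-SUPPORTED",
         "confidence": 95, "risk_tier": "C", "reasoning": "Basic physics."},
        {"text": "Policy X cut crime 40%", "verdict": "UNCERTAIN",
         "confidence": 50, "risk_tier": "B", "reasoning": "Evidence is mixed."},
    ],
})
result = parse_analysis(reply)
print(len(result["claims"]))  # -> 3
```

Raising on bad output (instead of silently repairing it) keeps the POC honest: a failed parse is documented and the prompt retried, in line with the "do NOT manually fix it" rule.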
1084 - 1085 1085 === 10.4 Technology Stack Suggestions === 1086 1086 1087 1087 **Frontend:** 1088 - 1089 1089 * HTML + CSS + JavaScript (minimal framework) 1090 1090 * OR: Next.js (if team prefers) 1091 1091 * Hosted: Local machine OR Vercel/Netlify free tier 1092 1092 1093 1093 **Backend:** 1094 - 1095 1095 * Python Flask/FastAPI (simple REST API) 1096 1096 * OR: Next.js API routes (if using Next.js) 1097 1097 * Hosted: Local machine OR Railway/Render free tier 1098 1098 1099 1099 **AKEL Integration:** 1100 - 1101 1101 * Claude API via Anthropic SDK 1102 -* Model: Provider-default REASONING model or latest available975 +* Model: Claude Sonnet 4.5 or latest available 1103 1103 1104 1104 **Database:** 1105 - 1106 1106 * None (stateless acceptable) 1107 1107 * OR: SQLite if want to store demo examples 1108 1108 * OR: JSON files on disk 1109 1109 1110 1110 **Deployment:** 1111 - 1112 1112 * Local development environment sufficient for POC 1113 1113 * Optional: Deploy to cloud for remote demos 1114 1114 986 +--- 987 + 1115 1115 == 11. Success Criteria == 1116 1116 1117 1117 === 11.1 Minimum Success (POC Passes) === 1118 1118 1119 1119 **Required for GO decision:** 1120 - 1121 1121 * ✅ AI extracts 3-5 factual claims automatically 1122 1122 * ✅ AI provides verdict for each claim automatically 1123 1123 * ✅ Verdicts are reasonable (≥70% make logical sense) ... ... @@ -1126,20 +1126,17 @@ 1126 1126 * ✅ Team/advisors understand the output 1127 1127 * ✅ Team agrees approach has merit 1128 1128 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention) 1129 -* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target) 1130 -* ✅ **Cost scaling understood** (data collected on article length vs.
cost) 1131 -* ✅ **Optimization opportunities identified** (≥2 potential improvements documented) 1132 1132 1133 1133 **Quality Definition:** 1134 - 1135 1135 * "Reasonable verdict" = Defensible given general knowledge 1136 1136 * "Coherent summary" = Logically structured, grammatically correct 1137 1137 * "Comprehensible" = Reviewers understand what analysis means 1138 1138 1007 +--- 1008 + 1139 1139 === 11.2 POC Fails If === 1140 1140 1141 1141 **Automatic NO-GO if any of these:** 1142 - 1143 1143 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1144 1144 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1145 1145 * ❌ Output incomprehensible (reviewers can't understand analysis) ... ... @@ -1146,20 +1146,21 @@ 1146 1146 * ❌ **Requires manual editing for most analyses** (> 50% need human correction) 1147 1147 * ❌ Team loses confidence in AI-automated approach 1148 1148 1018 +--- 1019 + 1149 1149 === 11.3 Quality Thresholds === 1150 1150 1151 1151 **POC quality expectations:** 1152 1152 1153 1153 |=Component|=Quality Threshold|=Definition 1154 -|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases 1155 -|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided 1156 -|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant 1157 -|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims 1025 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases 1026 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided 1027 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant 1028 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims 1158 
1158 1159 1159 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet 1160 1160 1161 1161 **Not expecting:** 1162 - 1163 1163 * 100% accuracy 1164 1164 * Perfect claim coverage 1165 1165 * Comprehensive evidence gathering ... ... @@ -1167,12 +1167,13 @@ 1167 1167 * Production polish 1168 1168 1169 1169 **Expecting:** 1170 - 1171 1171 * Reasonable claim extraction 1172 1172 * Defensible verdicts 1173 1173 * Understandable reasoning 1174 1174 * Useful output 1175 1175 1045 +--- 1046 + 1176 1176 == 12. Test Cases == 1177 1177 1178 1178 === 12.1 Test Case 1: Simple Factual Claim === ... ... @@ -1180,7 +1180,6 @@ 1180 1180 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 1181 1181 1182 1182 **Expected Output:** 1183 - 1184 1184 * Extract claim correctly 1185 1185 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1186 1186 * Confidence: 70-90% ... ... @@ -1189,12 +1189,13 @@ 1189 1189 1190 1190 **Success:** Verdict is reasonable and reasoning makes sense 1191 1191 1062 +--- 1063 + 1192 1192 === 12.2 Test Case 2: Complex News Article === 1193 1193 1194 1194 **Input:** News article URL with multiple claims about politics/health/science 1195 1195 1196 1196 **Expected Output:** 1197 - 1198 1198 * Extract 3-5 key claims 1199 1199 * Verdict for each (may vary: some supported, some uncertain, some refuted) 1200 1200 * Coherent analysis summary ... ... @@ -1203,12 +1203,13 @@ 1203 1203 1204 1204 **Success:** Claims identified are actually from article, verdicts are reasonable 1205 1205 1077 +--- 1078 + 1206 1206 === 12.3 Test Case 3: Controversial Topic === 1207 1207 1208 1208 **Input:** Article on contested political or scientific topic 1209 1209 1210 1210 **Expected Output:** 1211 - 1212 1212 * Balanced analysis 1213 1213 * Acknowledges uncertainty where appropriate 1214 1214 * Doesn't overstate confidence ... ... 
@@ -1216,12 +1216,13 @@ 1216 1216 1217 1217 **Success:** Analysis is fair and doesn't show obvious bias 1218 1218 1091 +--- 1092 + 1219 1219 === 12.4 Test Case 4: Clearly False Claim === 1220 1220 1221 1221 **Input:** Article with obviously false claim (e.g., "The Earth is flat") 1222 1222 1223 1223 **Expected Output:** 1224 - 1225 1225 * Extract claim 1226 1226 * Verdict: REFUTED 1227 1227 * High confidence (> 90%) ... ... @@ -1230,12 +1230,13 @@ 1230 1230 1231 1231 **Success:** AI correctly identifies false claim with high confidence 1232 1232 1106 +--- 1107 + 1233 1233 === 12.5 Test Case 5: Genuinely Uncertain Claim === 1234 1234 1235 1235 **Input:** Article with claim where evidence is genuinely mixed 1236 1236 1237 1237 **Expected Output:** 1238 - 1239 1239 * Extract claim 1240 1240 * Verdict: UNCERTAIN 1241 1241 * Moderate confidence (40-60%) ... ... @@ -1243,12 +1243,13 @@ 1243 1243 1244 1244 **Success:** AI recognizes uncertainty and doesn't overstate confidence 1245 1245 1120 +--- 1121 + 1246 1246 === 12.6 Test Case 6: High-Risk Medical Claim === 1247 1247 1248 1248 **Input:** Article making medical claims 1249 1249 1250 1250 **Expected Output:** 1251 - 1252 1252 * Extract claim 1253 1253 * Verdict: [appropriate based on evidence] 1254 1254 * Risk tier: A (High - medical) ... ... @@ -1257,6 +1257,8 @@ 1257 1257 1258 1258 **Success:** Risk tier correctly assigned, appropriate warnings shown 1259 1259 1135 +--- 1136 + 1260 1260 == 13. POC Decision Gate == 1261 1261 1262 1262 === 13.1 Decision Framework === ... ... @@ -1266,7 +1266,6 @@ 1266 1266 **Option A: GO (Proceed to POC2)** 1267 1267 1268 1268 **Conditions:** 1269 - 1270 1270 * AI quality ≥70% without manual editing 1271 1271 * Basic claim → verdict pipeline validated 1272 1272 * Internal + advisor feedback positive ... ... 
@@ -1275,16 +1275,16 @@ 1275 1275 * Clear path to improving AI quality to ≥90% 1276 1276 1277 1277 **Next Steps:** 1278 - 1279 1279 * Plan POC2 development (add scenarios) 1280 1280 * Design scenario architecture 1281 1281 * Expand to Evidence Model structure 1282 1282 * Test with more complex articles 1283 1283 1159 +--- 1160 + 1284 1284 **Option B: NO-GO (Pivot or Stop)** 1285 1285 1286 1286 **Conditions:** 1287 - 1288 1288 * AI quality < 60% 1289 1289 * Requires manual editing for most analyses (> 50%) 1290 1290 * Feedback indicates fundamental flaws ... ... @@ -1292,14 +1292,14 @@ 1292 1292 * No clear path to improvement 1293 1293 1294 1294 **Next Steps:** 1295 - 1296 1296 * **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1297 1297 * **Stop:** Conclude approach not viable, revisit later 1298 1298 1174 +--- 1175 + 1299 1299 **Option C: ITERATE (Improve POC)** 1300 1300 1301 1301 **Conditions:** 1302 - 1303 1303 * Concept has merit but execution needs work 1304 1304 * Specific improvements identified 1305 1305 * Addressable with better prompts/approach ... ... @@ -1306,43 +1306,46 @@ 1306 1306 * AI quality between 60-70% 1307 1307 1308 1308 **Next Steps:** 1309 - 1310 1310 * Improve AI prompts 1311 1311 * Test different approaches 1312 1312 * Re-run POC with improvements 1313 1313 * Then make GO/NO-GO decision 1314 1314 1190 +--- 1191 + 1315 1315 === 13.2 Decision Criteria Summary === 1316 1316 1317 1317 {{code}} 1318 -AI Quality < 60% → NO-GO (approach doesn't work) 1195 +AI Quality < 60% → NO-GO (approach doesn't work) 1319 1319 AI Quality 60-70% → ITERATE (improve and retry) 1320 -AI Quality ≥70% → GO (proceed to POC2) 1197 +AI Quality ≥70% → GO (proceed to POC2) 1321 1321 {{/code}} 1322 1322 1200 +--- 1201 + 1323 1323 == 14. 
Key Risks & Mitigations == 1324 1324 1325 1325 === 14.1 Risk: AI Quality Not Good Enough === 1326 1326 1327 -**Likelihood:** Medium-High 1328 -**Impact:** POC fails 1206 +**Likelihood:** Medium-High 1207 +**Impact:** POC fails 1329 1329 1330 1330 **Mitigation:** 1331 - 1332 1332 * Extensive prompt engineering and testing 1333 -* Use best available AI models ( role-based selection; configured via LLM abstraction)1211 +* Use best available AI models (Sonnet 4.5) 1334 1334 * Test with diverse article types 1335 1335 * Iterate on prompts based on results 1336 1336 1337 1337 **Acceptance:** This is what POC tests - be ready for failure 1338 1338 1217 +--- 1218 + 1339 1339 === 14.2 Risk: AI Consistency Issues === 1340 1340 1341 -**Likelihood:** Medium 1342 -**Impact:** Works sometimes, fails other times 1221 +**Likelihood:** Medium 1222 +**Impact:** Works sometimes, fails other times 1343 1343 1344 1344 **Mitigation:** 1345 - 1346 1346 * Test with 10+ diverse articles 1347 1347 * Measure success rate honestly 1348 1348 * Improve prompts to increase consistency ... ... @@ -1349,13 +1349,14 @@ 1349 1349 1350 1350 **Acceptance:** Some variability OK if average quality ≥70% 1351 1351 1231 +--- 1232 + 1352 1352 === 14.3 Risk: Output Incomprehensible === 1353 1353 1354 -**Likelihood:** Low-Medium 1355 -**Impact:** Users can't understand analysis 1235 +**Likelihood:** Low-Medium 1236 +**Impact:** Users can't understand analysis 1356 1356 1357 1357 **Mitigation:** 1358 - 1359 1359 * Create clear explainer document 1360 1360 * Iterate on output format 1361 1361 * Test with non-technical reviewers ... ... 
@@ -1363,13 +1363,14 @@ 1363 1363 1364 1364 **Acceptance:** Iterate until comprehensible 1365 1365 1246 +--- 1247 + 1366 1366 === 14.4 Risk: API Rate Limits / Costs === 1367 1367 1368 -**Likelihood:** Low 1369 -**Impact:** System slow or expensive 1250 +**Likelihood:** Low 1251 +**Impact:** System slow or expensive 1370 1370 1371 1371 **Mitigation:** 1372 - 1373 1373 * Monitor API usage 1374 1374 * Implement retry logic 1375 1375 * Estimate costs before scaling ... ... @@ -1376,13 +1376,14 @@ 1376 1376 1377 1377 **Acceptance:** POC can be slow and expensive (optimization later) 1378 1378 1260 +--- 1261 + 1379 1379 === 14.5 Risk: Scope Creep === 1380 1380 1381 -**Likelihood:** Medium 1382 -**Impact:** POC becomes too complex 1264 +**Likelihood:** Medium 1265 +**Impact:** POC becomes too complex 1383 1383 1384 1384 **Mitigation:** 1385 - 1386 1386 * Strict scope discipline 1387 1387 * Say NO to feature additions 1388 1388 * Keep focus on core question ... ... @@ -1389,19 +1389,18 @@ 1389 1389 1390 1390 **Acceptance:** POC is minimal by design 1391 1391 1274 +--- 1275 + 1392 1392 == 15. POC Philosophy == 1393 1393 1394 1394 === 15.1 Core Principles === 1395 1395 1396 -* \\ 1397 -** \\ 1398 -**1. Build Less, Learn More 1280 +**1. Build Less, Learn More** 1399 1399 * Minimum features to test hypothesis 1400 1400 * Don't build unvalidated features 1401 1401 * Focus on core question only 1402 1402 1403 1403 **2. Fail Fast** 1404 - 1405 1405 * Quick test of hardest part (AI capability) 1406 1406 * Accept that POC might fail 1407 1407 * Better to discover issues early ... ... @@ -1408,45 +1408,45 @@ 1408 1408 * Honest assessment over optimistic hope 1409 1409 1410 1410 **3. Test First, Build Second** 1411 - 1412 1412 * Validate AI can do this before building platform 1413 1413 * Don't assume it will work 1414 1414 * Let results guide decisions 1415 1415 1416 1416 **4. 
Automation First** 1417 - 1418 1418 * No manual editing allowed 1419 1419 * Tests scalability, not just feasibility 1420 1420 * Proves approach can work at scale 1421 1421 1422 1422 **5. Honest Assessment** 1423 - 1424 1424 * Don't cherry-pick examples 1425 1425 * Don't manually fix bad outputs 1426 1426 * Document failures openly 1427 1427 * Make data-driven decisions 1428 1428 1307 +--- 1308 + 1429 1429 === 15.2 What POC Is === 1430 1430 1431 -✅ Testing AI capability without humans 1432 -✅ Proving core technical concept 1433 -✅ Fast validation of approach 1434 -✅ Honest assessment of feasibility 1311 +✅ Testing AI capability without humans 1312 +✅ Proving core technical concept 1313 +✅ Fast validation of approach 1314 +✅ Honest assessment of feasibility 1435 1435 1316 +--- 1317 + 1436 1436 === 15.3 What POC Is NOT === 1437 1437 1438 -❌ Building a product 1439 -❌ Production-ready system 1440 -❌ Feature-complete platform 1441 -❌ Perfectly accurate analysis 1442 -❌ Polished user experience 1320 +❌ Building a product 1321 +❌ Production-ready system 1322 +❌ Feature-complete platform 1323 +❌ Perfectly accurate analysis 1324 +❌ Polished user experience 1443 1443 1444 - == 16. Success ==1326 +--- 1445 1445 1446 - Clear Path Forward == 1328 +== 16. Success = Clear Path Forward == 1447 1447 1448 1448 **If POC succeeds (≥70% AI quality):** 1449 - 1450 1450 * ✅ Approach validated 1451 1451 * ✅ Proceed to POC2 (add scenarios) 1452 1452 * ✅ Design full Evidence Model structure ... ... @@ -1454,7 +1454,6 @@ 1454 1454 * ✅ Focus on improving AI quality from 70% → 90% 1455 1455 1456 1456 **If POC fails (< 60% AI quality):** 1457 - 1458 1458 * ✅ Learn what doesn't work 1459 1459 * ✅ Pivot to different approach 1460 1460 * ✅ OR wait for better AI technology ... ... @@ -1462,62 +1462,18 @@ 1462 1462 1463 1463 **Either way, POC provides clarity.** 1464 1464 1345 +--- 1346 + 1465 1465 == 17. 
Related Pages == 1466 1466 1467 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]1468 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]1469 -* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]1470 -* [[Architecture>>Archive.FactHarbor.Specification.Architecture.WebHome]]1471 -* [[AKEL>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]] 1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]] 1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]] 1352 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1353 +* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1472 1472 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] 1356 +--- + 1474 1474 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1476 - 1477 -=== NFR-POC-11: LLM Provider Abstraction (POC1) === 1479 -**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1480 - 1481 -**POC1 Implementation:** 1482 - 1483 -* **Primary Provider:** Anthropic Claude API 1484 -* Stage 1: Provider-default FAST model 1485 -* Stage 2: Provider-default REASONING model (cached) 1486 -* Stage 3: Provider-default REASONING model 1487 - 1488 -* **Provider Interface:** Abstract LLMProvider interface implemented 1489 - 1490 -* **Configuration:** Environment variables for provider selection 1491 -* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1492 -* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1493 -* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1494 - 1495 -* **Failover:** Basic error handling with cache fallback for Stage 2 1496 - 1497 -* **Cost Tracking:** Log provider name and cost per request 1498 - 1499 -**Future (POC2/Beta):** 1500 - 1501 -* Secondary provider (OpenAI) with automatic failover 1502 -* Admin API for runtime provider switching 1503 -* Cost comparison dashboard 1504 -* Cross-provider output verification 1505 - 1506 -**Success Criteria:** 1507 - 1508 -* All LLM calls go through abstraction layer (no direct API calls) 1509 -* Provider can be changed via environment variable without code changes 1510 -* Cost tracking includes provider name in logs 1511 -* Stage 2 falls back to cache on provider failure 1512 - 1513 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6 1514 - 1515 -**Dependencies:** 1516 - 1517 -* NFR-14 (Main Requirements) 1518 -* Design Decision 9 1519 -* Architecture Section 2.2 1520 - 1521 -**Priority:** HIGH (P1) 1522 - 1523 -**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
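The NFR-POC-11 success criteria (all calls through the abstraction, provider switchable via environment variable, provider name logged for cost tracking) could be sketched as follows. The class names `LLMProvider`/`AnthropicProvider` and the helper functions are assumptions for illustration; only the `LLM_PRIMARY_PROVIDER` and `LLM_STAGE*_MODEL` variable names come from the configuration listed above:

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMResult:
    text: str
    provider: str            # logged per request, as required for cost tracking
    estimated_cost_usd: float


class LLMProvider(ABC):
    """All LLM calls go through this interface -- no direct API calls."""
    name: str = "base"

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult:
        ...


class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> LLMResult:
        # Stub: a real build would call the Anthropic SDK here and compute
        # estimated_cost_usd from actual token counts (NFR-POC-5 tracking).
        return LLMResult(text="[stubbed response]", provider=self.name,
                         estimated_cost_usd=0.0)


_REGISTRY = {"anthropic": AnthropicProvider}


def get_provider() -> LLMProvider:
    """Provider selected via LLM_PRIMARY_PROVIDER -- no code changes needed."""
    key = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _REGISTRY[key]()


def stage_model(stage: int) -> str:
    """Per-stage model names come from LLM_STAGE1_MODEL, LLM_STAGE2_MODEL, ..."""
    return os.environ[f"LLM_STAGE{stage}_MODEL"]
```

Adding a second provider for POC2 would then mean one new subclass plus a registry entry, which is the refactoring cost the rationale above aims to avoid paying later.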