Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/24 18:27
From version 3.3
edited by Robert Schaub
on 2025/12/24 18:27
Change comment:
Update document after refactoring.
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Parent
... ... @@ -1,1 +1,1 @@ 1 -WebHome 1 +Test.FactHarbor.Specification.POC.WebHome
- Content
... ... @@ -1,7 +1,7 @@ 1 1 = POC Requirements = 2 2 3 -**Status:** ✅ Approved for Development 4 -**Version:** 2.0 (Updated after Specification Cross-Check) 3 +**Status:** ✅ Approved for Development 4 +**Version:** 2.0 (Updated after Specification Cross-Check) 5 5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention 6 6 7 7 == 1. POC Overview == ... ... @@ -9,11 +9,9 @@ 9 9 === 1.1 What POC Tests === 10 10 11 11 **Core Question:** 12 - 13 13 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts? 14 14 15 15 **What we're proving:** 16 - 17 17 * AI can identify factual claims from text 18 18 * AI can evaluate those claims and produce verdicts 19 19 * Output is comprehensible and useful ... ... @@ -20,7 +20,6 @@ 20 20 * Fully automated approach is viable 21 21 22 22 **What we're NOT testing:** 23 - 24 24 * Scenario generation (deferred to POC2) 25 25 * Evidence display (deferred to POC2) 26 26 * Production scalability ... ... @@ -34,7 +34,6 @@ 34 34 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**. 35 35 36 36 **Rationale:** 37 - 38 38 * **POC1 tests:** Can AI extract claims and generate verdicts? 39 39 * **POC2 will add:** Scenario generation and management 40 40 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow? ... ... @@ -46,7 +46,6 @@ 46 46 **No Risk:** 47 47 48 48 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows: 49 - 50 50 * Faster POC1 validation 51 51 * Learning from POC1 to inform scenario design 52 52 * Iterative approach: fail fast if basic AI doesn't work ... ... 
@@ -53,10 +53,14 @@ 53 53 * Flexibility to adjust scenario architecture based on POC1 insights 54 54 55 55 **Full System Workflow (Future):** 56 -{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}} 51 +{{code}} 52 +Claims → Scenarios → Evidence → Verdicts 53 +{{/code}} 57 57 58 58 **POC1 Simplified Workflow:** 59 -{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}} 56 +{{code}} 57 +Claims → Verdicts (scenarios implicit in reasoning) 58 +{{/code}} 60 60 61 61 == 2. POC Output Specification == 62 62 ... ... @@ -64,10 +64,9 @@ 64 64 65 65 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument 66 66 67 -**Length:** 4-6 sentences 66 +**Length:** 4-6 sentences 68 68 69 69 **Content (Required Elements):** 70 - 71 71 1. **Article's main thesis/claim** - What is the article trying to argue or prove? 72 72 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts 73 73 3. **Central vs. supporting claims** - Which claims are central to the article's argument? ... ... @@ -77,28 +77,30 @@ 77 77 **Critical Innovation:** 78 78 79 79 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might: 80 - 81 81 * Make accurate supporting facts but draw unsupported conclusions 82 82 * Have one false central claim that invalidates the whole argument 83 83 * Misframe accurate information to mislead 84 84 85 85 **Good Example (Context-Aware):** 86 -{{code}}This article argues that coffee cures cancer based on its antioxidant 83 +{{code}} 84 +This article argues that coffee cures cancer based on its antioxidant 87 87 content. We analyzed 3 factual claims: 2 about coffee's chemical 88 88 properties are well-supported, but the main causal claim is refuted 89 89 by current evidence. The article confuses correlation with causation. 
90 90 Overall assessment: MISLEADING - makes an unsupported medical claim 91 -despite citing some accurate facts.{{/code}} 89 +despite citing some accurate facts. 90 +{{/code}} 92 92 93 93 **Poor Example (Simple Aggregation - Don't Do This):** 94 -{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted. 95 -Overall assessment: mostly accurate (67% accurate).{{/code}} 93 +{{code}} 94 +This article makes 3 claims. 2 are well-supported and 1 is refuted. 95 +Overall assessment: mostly accurate (67% accurate). 96 +{{/code}} 96 96 ↑ This misses that the refuted claim IS the article's main point! 97 97 98 98 **What POC1 Tests:** 99 99 100 100 Can AI identify and assess: 101 - 102 102 * ✅ The article's main thesis/conclusion? 103 103 * ✅ Which claims are central vs. supporting? 104 104 * ✅ Whether the evidence supports the conclusion? ... ... @@ -107,7 +107,6 @@ 107 107 **If AI Cannot Do This:** 108 108 109 109 That's valuable to learn in POC1! We'll: 110 - 111 111 * Note as limitation 112 112 * Fall back to simple aggregation with warning 113 113 * Design explicit article-level analysis for POC2 ... ... 
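The difference between the good and poor examples in Section 2.1 can be made concrete with a minimal Python sketch. It assumes each claim carries a hypothetical `central` flag marking whether it is the article's main point. The `Claim` class, both function names, and the claim wording are illustrative and not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str   # one of the verdict labels used in this specification
    central: bool  # hypothetical flag: is this claim the article's main point?


def simple_aggregate(claims):
    """Percentage of WELL-SUPPORTED claims (the approach of the poor example)."""
    supported = sum(1 for c in claims if c.verdict == "WELL-SUPPORTED")
    return round(100 * supported / len(claims))


def context_aware(claims):
    """A refuted central claim dominates, whatever the supporting claims say."""
    if any(c.central and c.verdict == "REFUTED" for c in claims):
        return "MISLEADING"
    return f"{simple_aggregate(claims)}% of claims well-supported"


# Illustrative stand-ins for the three claims of the coffee article.
coffee = [
    Claim("Coffee contains antioxidants", "WELL-SUPPORTED", central=False),
    Claim("Antioxidants can protect cells", "WELL-SUPPORTED", central=False),
    Claim("Coffee cures cancer", "REFUTED", central=True),
]
```

For the coffee article, `simple_aggregate(coffee)` reports 67 while `context_aware(coffee)` returns MISLEADING, because the single refuted claim is the central one.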
@@ -114,30 +114,30 @@ 114 114 115 115 === 2.2 Component 2: CLAIMS IDENTIFICATION === 116 116 117 -**What:** List of factual claims extracted from article 118 -**Format:** Numbered list 119 -**Quantity:** 3-5 claims 116 +**What:** List of factual claims extracted from article 117 +**Format:** Numbered list 118 +**Quantity:** 3-5 claims 120 120 **Requirements:** 121 - 122 122 * Factual claims only (not opinions/questions) 123 123 * Clearly stated 124 124 * Automatically extracted by AI 125 125 126 126 **Example:** 127 -{{code}}CLAIMS IDENTIFIED: 125 +{{code}} 126 +CLAIMS IDENTIFIED: 128 128 129 129 [1] Coffee reduces diabetes risk by 30% 130 130 [2] Coffee improves heart health 131 131 [3] Decaf has same benefits as regular 132 -[4] Coffee prevents Alzheimer's completely{{/code}} 131 +[4] Coffee prevents Alzheimer's completely 132 +{{/code}} 133 133 134 134 === 2.3 Component 3: CLAIMS VERDICTS === 135 135 136 -**What:** Verdict for each claim identified 137 -**Format:** Per claim structure 136 +**What:** Verdict for each claim identified 137 +**Format:** Per claim structure 138 138 139 139 **Required Elements:** 140 - 141 141 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 142 142 * **Confidence Score:** 0-100% 143 143 * **Brief Reasoning:** 1-3 sentences explaining why ... ... @@ -144,7 +144,8 @@ 144 144 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration 145 145 146 146 **Example:** 147 -{{code}}VERDICTS: 146 +{{code}} 147 +VERDICTS: 148 148 149 149 [1] WELL-SUPPORTED (85%) [Risk: C] 150 150 Multiple studies confirm 25-30% risk reduction with regular consumption. ... ... @@ -156,12 +156,12 @@ 156 156 Some benefits overlap, but caffeine-related benefits are reduced in decaf. 157 157 158 158 [4] REFUTED (90%) [Risk: B] 159 -No evidence for complete prevention. Claim is significantly overstated.{{/code}} 159 +No evidence for complete prevention. Claim is significantly overstated. 
160 +{{/code}} 160 160 161 161 **Risk Tier Display:** 162 - 163 163 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections 164 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality 164 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality 165 165 * **Tier C (Green):** Low Risk - Facts/Definitions/History 166 166 167 167 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow. ... ... @@ -168,16 +168,18 @@ 168 168 169 169 === 2.4 Component 4: ARTICLE SUMMARY (Optional) === 170 170 171 -**What:** Brief summary of original article content 172 -**Length:** 3-5 sentences 171 +**What:** Brief summary of original article content 172 +**Length:** 3-5 sentences 173 173 **Tone:** Neutral (article's position, not FactHarbor's analysis) 174 174 175 175 **Example:** 176 -{{code}}ARTICLE SUMMARY: 176 +{{code}} 177 +ARTICLE SUMMARY: 177 177 178 178 Health News Today article discusses coffee benefits, citing studies 179 179 on diabetes and Alzheimer's. Author highlights research linking coffee 180 -to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}} 181 +to disease prevention. Recommends 2-3 cups daily for optimal health. 182 +{{/code}} 181 181 182 182 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) === 183 183 ... ... @@ -184,7 +184,6 @@ 184 184 **What:** LLM usage metrics for cost optimization and scaling decisions 185 185 186 186 **Purpose:** 187 - 188 188 * Understand cost per analysis 189 189 * Identify optimization opportunities 190 190 * Project costs at scale ... ... @@ -191,7 +191,8 @@ 191 191 * Inform architecture decisions 192 192 193 193 **Display Format:** 194 -{{code}}USAGE STATISTICS: 195 +{{code}} 196 +USAGE STATISTICS: 195 195 • Article: 2,450 words (12,300 characters) 196 196 • Input tokens: 15,234 197 197 • Output tokens: 892 ... ... 
@@ -199,18 +199,17 @@ 199 199 • Estimated cost: $0.24 USD 200 200 • Response time: 8.3 seconds 201 201 • Cost per claim: $0.048 202 -• Model: claude-sonnet-4-20250514{{/code}} 204 +• Model: claude-sonnet-4-20250514 205 +{{/code}} 203 203 204 204 **Why This Matters:** 205 205 206 206 At scale, LLM costs are critical: 207 - 208 208 * 10,000 articles/month ≈ $200-500/month 209 209 * 100,000 articles/month ≈ $2,000-5,000/month 210 210 * Cost optimization can reduce expenses 30-50% 211 211 212 212 **What POC1 Learns:** 213 - 214 214 * How cost scales with article length 215 215 * Prompt optimization opportunities (caching, compression) 216 216 * Output verbosity tradeoffs ... ... @@ -218,7 +218,6 @@ 218 218 * Article length limits (if needed) 219 219 220 220 **Implementation:** 221 - 222 222 * Claude API already returns usage data 223 223 * No extra API calls needed 224 224 * Display to user + log for aggregate analysis ... ... @@ -228,8 +228,7 @@ 228 228 229 229 === 2.6 Total Output Size === 230 230 231 -**Combined:** 220-350 words 232 - 231 +**Combined:** ~220-350 words 233 233 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences) 234 234 * Claims Identification: 30-50 words 235 235 * Claims Verdicts: 100-150 words ... ... @@ -244,7 +244,6 @@ 244 244 The following are **explicitly excluded** from POC: 245 245 246 246 **Content Features:** 247 - 248 248 * ❌ Scenarios (deferred to POC2) 249 249 * ❌ Evidence display (supporting/opposing lists) 250 250 * ❌ Source links (clickable references) ... ... @@ -254,7 +254,6 @@ 254 254 * ❌ Risk assessment (shown but not workflow-integrated) 255 255 256 256 **Platform Features:** 257 - 258 258 * ❌ User accounts / authentication 259 259 * ❌ Saved history 260 260 * ❌ Search functionality ... ... @@ -264,7 +264,6 @@ 264 264 * ❌ Social sharing 265 265 266 266 **Technical Features:** 267 - 268 268 * ❌ Browser extensions 269 269 * ❌ Mobile apps 270 270 * ❌ API endpoints ... ... 
@@ -272,7 +272,6 @@ 272 272 * ❌ Export features (PDF, CSV) 273 273 274 274 **Quality Features:** 275 - 276 276 * ❌ Accessibility (WCAG compliance) 277 277 * ❌ Multilingual support 278 278 * ❌ Mobile optimization ... ... @@ -279,7 +279,6 @@ 279 279 * ❌ Media verification (images/videos) 280 280 281 281 **Production Features:** 282 - 283 283 * ❌ Security hardening 284 284 * ❌ Privacy compliance (GDPR) 285 285 * ❌ Terms of service ... ... @@ -293,13 +293,17 @@ 293 293 === 4.1 Architecture Comparison === 294 294 295 295 **POC Architecture (Simplified):** 296 -{{code}}User Input → Single AKEL Call → Output Display 297 - (all processing){{/code}} 290 +{{code}} 291 +User Input → Single AKEL Call → Output Display 292 + (all processing) 293 +{{/code}} 298 298 299 299 **Full System Architecture:** 300 -{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator 296 +{{code}} 297 +User Input → Claim Extractor → Claim Classifier → Scenario Generator 301 301 → Evidence Summarizer → Contradiction Detector → Verdict Generator 302 -→ Quality Gates → Publication → Output Display{{/code}} 299 +→ Quality Gates → Publication → Output Display 300 +{{/code}} 303 303 304 304 **Key Differences:** 305 305 ... ... @@ -315,14 +315,12 @@ 315 315 === 4.2 Workflow Comparison === 316 316 317 317 **POC1 Workflow:** 318 - 319 319 1. User submits text/URL 320 320 2. Single AKEL call (all processing in one prompt) 321 321 3. Display results 322 -**Total: 3 steps, 10-18 seconds** 319 +**Total: 3 steps, ~10-18 seconds** 323 323 324 324 **Full System Workflow:** 325 - 326 326 1. **Claim Submission** (extraction, normalization, clustering) 327 327 2. **Scenario Building** (definitions, assumptions, boundaries) 328 328 3. **Evidence Handling** (retrieval, assessment, linking) ... ... @@ -329,7 +329,7 @@ 329 329 4. **Verdict Creation** (synthesis, reasoning, approval) 330 330 5. **Public Presentation** (summaries, landscapes, deep dives) 331 331 6. 
**Time Evolution** (versioning, re-evaluation triggers) 332 -**Total: 6 phases with quality gates, 10-30 seconds** 328 +**Total: 6 phases with quality gates, ~10-30 seconds** 333 333 334 334 === 4.3 Why POC is Simplified === 335 335 ... ... @@ -352,7 +352,6 @@ 352 352 === 4.4 Gap Between POC1 and POC2/Beta === 353 353 354 354 **What needs to be built for POC2:** 355 - 356 356 * Scenario generation component 357 357 * Evidence Model structure (full) 358 358 * Scenario-evidence linking ... ... @@ -360,7 +360,6 @@ 360 360 * Truth landscape visualization 361 361 362 362 **What needs to be built for Beta:** 363 - 364 364 * Multi-component AKEL pipeline 365 365 * Quality gate infrastructure 366 366 * Review workflow system ... ... @@ -377,7 +377,6 @@ 377 377 **Mode:** Mode 2 (AI-Generated, No Prior Human Review) 378 378 379 379 Per FactHarbor Specification Section 11 "POC v1 Behavior": 380 - 381 381 * Produces public AI-generated output 382 382 * No human approval gate 383 383 * Clear AI-Generated labeling ... ... @@ -387,20 +387,21 @@ 387 387 === 5.2 User-Facing Labels === 388 388 389 389 **Primary Label (top of analysis):** 390 -{{code}}╔════════════════════════════════════════════════════════════╗ 391 -║ [AI-GENERATED - POC/DEMO] ║ 392 -║ ║ 393 -║ This analysis was produced entirely by AI and has not ║ 394 -║ been human-reviewed. Use for demonstration purposes. ║ 395 -║ ║ 396 -║ Source: AI/AKEL v1.0 (POC) ║ 397 -║ Review Status: Not Reviewed (Proof-of-Concept) ║ 398 -║ Quality Gates: 4/4 Passed (Simplified) ║ 399 -║ Last Updated: [timestamp] ║ 400 -╚════════════════════════════════════════════════════════════╝{{/code}} 383 +{{code}} 384 +╔════════════════════════════════════════════════════════════╗ 385 +║ [AI-GENERATED - POC/DEMO] ║ 386 +║ ║ 387 +║ This analysis was produced entirely by AI and has not ║ 388 +║ been human-reviewed. Use for demonstration purposes. 
║ 389 +║ ║ 390 +║ Source: AI/AKEL v1.0 (POC) ║ 391 +║ Review Status: Not Reviewed (Proof-of-Concept) ║ 392 +║ Quality Gates: 4/4 Passed (Simplified) ║ 393 +║ Last Updated: [timestamp] ║ 394 +╚════════════════════════════════════════════════════════════╝ 395 +{{/code}} 401 401 402 402 **Per-Claim Risk Labels:** 403 - 404 404 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety) 405 405 * **[Risk: B]** 🟡 Medium Risk (Policy/Science) 406 406 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions) ... ... @@ -408,7 +408,6 @@ 408 408 === 5.3 Display Requirements === 409 409 410 410 **Must Show:** 411 - 412 412 * AI-Generated status (prominent) 413 413 * POC/Demo disclaimer 414 414 * Risk tier per claim ... ... @@ -417,7 +417,6 @@ 417 417 * Timestamp 418 418 419 419 **Must NOT Claim:** 420 - 421 421 * Human review 422 422 * Production quality 423 423 * Medical/legal advice ... ... @@ -441,7 +441,6 @@ 441 441 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates. 442 442 443 443 **Full System Has 4 Gates:** 444 - 445 445 1. Source Quality 446 446 2. Contradiction Search (MANDATORY) 447 447 3. Uncertainty Quantification ... ... @@ -448,7 +448,6 @@ 448 448 4. Structural Integrity 449 449 450 450 **POC Implements Simplified Versions:** 451 - 452 452 * Focus on demonstrating concept 453 453 * Basic implementations sufficient 454 454 * Failures displayed to user (not blocking) ... ... @@ -457,7 +457,6 @@ 457 457 === 6.2 Gate 1: Source Quality (Basic) === 458 458 459 459 **Full System Requirements:** 460 - 461 461 * Primary sources identified and accessible 462 462 * Source reliability scored against whitelist 463 463 * Citation completeness verified ... ... 
@@ -465,7 +465,6 @@ 465 465 * Author credentials validated 466 466 467 467 **POC Implementation:** 468 - 469 469 * ✅ At least 2 sources found 470 470 * ✅ Sources accessible (URLs valid) 471 471 * ❌ No whitelist checking ... ... @@ -479,7 +479,6 @@ 479 479 === 6.3 Gate 2: Contradiction Search (Basic) === 480 480 481 481 **Full System Requirements:** 482 - 483 483 * Counter-evidence actively searched 484 484 * Reservations and limitations identified 485 485 * Alternative interpretations explored ... ... @@ -488,7 +488,6 @@ 488 488 * Academic literature (supporting AND opposing) 489 489 490 490 **POC Implementation:** 491 - 492 492 * ✅ Basic search for counter-evidence 493 493 * ✅ Identify obvious contradictions 494 494 * ❌ No comprehensive academic search ... ... @@ -503,7 +503,6 @@ 503 503 === 6.4 Gate 3: Uncertainty Quantification (Basic) === 504 504 505 505 **Full System Requirements:** 506 - 507 507 * Confidence scores calculated for all claims/verdicts 508 508 * Limitations explicitly stated 509 509 * Data gaps identified and disclosed ... ... @@ -511,7 +511,6 @@ 511 511 * Alternative scenarios considered 512 512 513 513 **POC Implementation:** 514 - 515 515 * ✅ Confidence scores (0-100%) 516 516 * ✅ Basic uncertainty acknowledgment 517 517 * ❌ No detailed limitation disclosure ... ... @@ -525,7 +525,6 @@ 525 525 === 6.5 Gate 4: Structural Integrity (Basic) === 526 526 527 527 **Full System Requirements:** 528 - 529 529 * No hallucinations detected (fact-checking against sources) 530 530 * Logic chain valid and traceable 531 531 * References accessible and verifiable ... ... @@ -533,7 +533,6 @@ 533 533 * Premises clearly stated 534 534 535 535 **POC Implementation:** 536 - 537 537 * ✅ Basic coherence check 538 538 * ✅ References accessible 539 539 * ❌ No comprehensive hallucination detection ... ... 
@@ -547,20 +547,24 @@ 547 547 === 6.6 Quality Gate Display === 548 548 549 549 **POC shows simplified status:** 550 -{{code}}Quality Gates: 4/4 Passed (Simplified) 532 +{{code}} 533 +Quality Gates: 4/4 Passed (Simplified) 551 551 ✓ Source Quality: 3 sources found 552 552 ✓ Contradiction Search: Basic search completed 553 553 ✓ Uncertainty: Confidence scores assigned 554 -✓ Structural Integrity: Output coherent{{/code}} 537 +✓ Structural Integrity: Output coherent 538 +{{/code}} 555 555 556 556 **If any gate fails:** 557 -{{code}}Quality Gates: 3/4 Passed (Simplified) 541 +{{code}} 542 +Quality Gates: 3/4 Passed (Simplified) 558 558 ✓ Source Quality: 3 sources found 559 559 ✗ Contradiction Search: Search failed - limited evidence 560 560 ✓ Uncertainty: Confidence scores assigned 561 561 ✓ Structural Integrity: Output coherent 562 562 563 -Note: This analysis has limited evidence. Use with caution.{{/code}} 548 +Note: This analysis has limited evidence. Use with caution. 549 +{{/code}} 564 564 565 565 === 6.7 Simplified vs. Full System === 566 566 ... ... @@ -577,7 +577,6 @@ 577 577 === 7.1 POC AKEL (Simplified) === 578 578 579 579 **Implementation:** 580 - 581 581 * Single Claude API call (Sonnet 4.5) 582 582 * One comprehensive prompt 583 583 * All processing in single request ... ... @@ -585,19 +585,21 @@ 585 585 * No orchestration layer 586 586 587 587 **Prompt Structure:** 588 -{{code}}Task: Analyze this article and provide: 573 +{{code}} 574 +Task: Analyze this article and provide: 589 589 590 590 1. Extract 3-5 factual claims 591 591 2. For each claim: 592 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED) 593 - - Assign confidence score (0-100%) 594 - - Assign risk tier (A/B/C) 595 - - Write brief reasoning (1-3 sentences) 578 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED) 579 + - Assign confidence score (0-100%) 580 + - Assign risk tier (A/B/C) 581 + - Write brief reasoning (1-3 sentences) 596 596 3. 
Generate analysis summary (3-5 sentences) 597 597 4. Generate article summary (3-5 sentences) 598 598 5. Run basic quality checks 599 599 600 -Return as structured JSON.{{/code}} 586 +Return as structured JSON. 587 +{{/code}} 601 601 602 602 **Processing Time:** 10-18 seconds (estimate) 603 603 ... ... @@ -604,7 +604,8 @@ 604 604 === 7.2 Full System AKEL (Production) === 605 605 606 606 **Architecture:** 607 -{{code}}AKEL Orchestrator 594 +{{code}} 595 +AKEL Orchestrator 608 608 ├── Claim Extractor 609 609 ├── Claim Classifier (with risk tier assignment) 610 610 ├── Scenario Generator ... ... @@ -612,10 +612,10 @@ 612 612 ├── Contradiction Detector 613 613 ├── Quality Gate Validator 614 614 ├── Audit Sampling Scheduler 615 -└── Federation Sync Adapter (Release 1.0+){{/code}} 603 +└── Federation Sync Adapter (Release 1.0+) 604 +{{/code}} 616 616 617 617 **Processing:** 618 - 619 619 * Parallel processing where possible 620 620 * Separate component calls 621 621 * Quality gates between phases ... ... @@ -627,7 +627,6 @@ 627 627 === 7.3 Why POC Uses Single Call === 628 628 629 629 **Advantages:** 630 - 631 631 * ✅ Simpler to implement 632 632 * ✅ Faster POC development 633 633 * ✅ Easier to debug ... ... @@ -635,7 +635,6 @@ 635 635 * ✅ Good enough for concept validation 636 636 637 637 **Limitations:** 638 - 639 639 * ❌ No component reusability 640 640 * ❌ No parallel processing 641 641 * ❌ All-or-nothing (can't partially succeed) ... ... @@ -662,7 +662,6 @@ 662 662 **Requirement:** User can submit article for analysis 663 663 664 664 **Functionality:** 665 - 666 666 * Text input field (paste article text, up to 5000 characters) 667 667 * URL input field (paste article URL) 668 668 * "Analyze" button to trigger processing ... ... @@ -669,7 +669,6 @@ 669 669 * Loading indicator during analysis 670 670 671 671 **Excluded:** 672 - 673 673 * No user authentication 674 674 * No claim history 675 675 * No search functionality ... ... 
@@ -676,7 +676,6 @@ 676 676 * No saved templates 677 677 678 678 **Acceptance Criteria:** 679 - 680 680 * User can paste text from article 681 681 * User can paste URL of article 682 682 * System accepts input and triggers analysis ... ... @@ -686,7 +686,6 @@ 686 686 **Requirement:** AI automatically extracts 3-5 factual claims 687 687 688 688 **Functionality:** 689 - 690 690 * AI reads article text 691 691 * AI identifies factual claims (not opinions/questions) 692 692 * AI extracts 3-5 most important claims ... ... @@ -693,7 +693,6 @@ 693 693 * System displays numbered list 694 694 695 695 **Critical:** NO MANUAL EDITING ALLOWED 696 - 697 697 * AI selects which claims to extract 698 698 * AI identifies factual vs. non-factual 699 699 * System processes claims as extracted ... ... @@ -700,13 +700,11 @@ 700 700 * No human curation or correction 701 701 702 702 **Error Handling:** 703 - 704 704 * If extraction fails: Display error message 705 705 * User can retry with different input 706 706 * No manual intervention to fix extraction 707 707 708 708 **Acceptance Criteria:** 709 - 710 710 * AI extracts 3-5 claims automatically 711 711 * Claims are factual (not opinions) 712 712 * Claims are clearly stated ... ... 
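The FR-POC-2 acceptance criteria and the display format of Section 2.2 could be enforced with a small helper such as the one below. This is a sketch under assumptions: the function name `format_claims` and the choice to raise `ValueError` on a failed extraction are hypothetical. The specification itself only requires an error message and a retry option.

```python
def format_claims(claims):
    """Validate the 3-5 claim requirement and render the numbered list."""
    # Drop empty strings so blank entries do not count as claims.
    cleaned = [c.strip() for c in claims if c.strip()]
    if not 3 <= len(cleaned) <= 5:
        # No manual fix-up: the caller shows an error and lets the user retry.
        raise ValueError(f"Expected 3-5 claims, got {len(cleaned)}")
    lines = ["CLAIMS IDENTIFIED:", ""]
    lines += [f"[{i}] {claim}" for i, claim in enumerate(cleaned, start=1)]
    return "\n".join(lines)
```

Rejecting the result instead of trimming or padding it keeps the pipeline free of manual curation, in line with the "NO MANUAL EDITING" rule.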
@@ -717,17 +717,15 @@ 717 717 **Requirement:** AI automatically generates verdict for each claim 718 718 719 719 **Functionality:** 720 - 721 721 * For each claim, AI: 722 -* Evaluates claim based on available evidence/knowledge 723 -* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 724 -* Assigns confidence score (0-100%) 725 -* Assigns risk tier (A/B/C) 726 -* Writes brief reasoning (1-3 sentences) 700 + * Evaluates claim based on available evidence/knowledge 701 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 702 + * Assigns confidence score (0-100%) 703 + * Assigns risk tier (A/B/C) 704 + * Writes brief reasoning (1-3 sentences) 727 727 * System displays verdict for each claim 728 728 729 729 **Critical:** NO MANUAL EDITING ALLOWED 730 - 731 731 * AI computes verdicts based on evidence 732 732 * AI generates confidence scores 733 733 * AI writes reasoning ... ... @@ -734,13 +734,11 @@ 734 734 * No human review or adjustment 735 735 736 736 **Error Handling:** 737 - 738 738 * If verdict generation fails: Display error message 739 739 * User can retry 740 740 * No manual intervention to adjust verdicts 741 741 742 742 **Acceptance Criteria:** 743 - 744 744 * Each claim has a verdict 745 745 * Confidence score is displayed (0-100%) 746 746 * Risk tier is displayed (A/B/C) ... ... @@ -753,17 +753,15 @@ 753 753 **Requirement:** AI generates brief summary of analysis 754 754 755 755 **Functionality:** 756 - 757 757 * AI summarizes findings in 3-5 sentences: 758 -* How many claims found 759 -* Distribution of verdicts 760 -* Overall assessment 732 + * How many claims found 733 + * Distribution of verdicts 734 + * Overall assessment 761 761 * System displays at top of results 762 762 763 763 **Critical:** NO MANUAL EDITING ALLOWED 764 764 765 765 **Acceptance Criteria:** 766 - 767 767 * Summary is coherent 768 768 * Accurately reflects analysis 769 769 * 3-5 sentences ... ... 
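The per-claim verdict structure required by FR-POC-3 (label, confidence, risk tier, reasoning) maps naturally onto a validated record. The sketch below is one possible shape, not the specified data model: the `Verdict` class and its field names are assumptions, while the allowed labels, the 0-100% range, and the A/B/C tiers come from Section 2.3.

```python
from dataclasses import dataclass

VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


@dataclass(frozen=True)
class Verdict:
    claim_number: int
    label: str
    confidence: int  # percent, 0-100
    risk_tier: str
    reasoning: str   # 1-3 sentences

    def __post_init__(self):
        # Reject malformed AI output instead of repairing it by hand.
        if self.label not in VERDICT_LABELS:
            raise ValueError(f"Unknown verdict label: {self.label!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"Confidence out of range: {self.confidence}")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"Unknown risk tier: {self.risk_tier!r}")

    def render(self):
        """Render in the layout of the VERDICTS example in Section 2.3."""
        header = (f"[{self.claim_number}] {self.label} "
                  f"({self.confidence}%) [Risk: {self.risk_tier}]")
        return f"{header}\n{self.reasoning}"
```

A record that fails validation would be treated like a failed verdict generation: display an error and let the user retry.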
@@ -774,7 +774,6 @@ 774 774 **Requirement:** AI generates brief summary of original article 775 775 776 776 **Functionality:** 777 - 778 778 * AI summarizes article content (not FactHarbor's analysis) 779 779 * 3-5 sentences 780 780 * System displays ... ... @@ -784,7 +784,6 @@ 784 784 **Critical:** NO MANUAL EDITING ALLOWED 785 785 786 786 **Acceptance Criteria:** 787 - 788 788 * Summary is neutral (article's position) 789 789 * Accurately reflects article content 790 790 * 3-5 sentences ... ... @@ -795,7 +795,6 @@ 795 795 **Requirement:** Clear labeling of AI-generated content 796 796 797 797 **Functionality:** 798 - 799 799 * Display Mode 2 publication label 800 800 * Show POC/Demo disclaimer 801 801 * Display risk tiers per claim ... ... @@ -803,7 +803,6 @@ 803 803 * Display timestamp 804 804 805 805 **Acceptance Criteria:** 806 - 807 807 * Label is prominent and clear 808 808 * User understands this is AI-generated POC output 809 809 * Risk tiers are color-coded ... ... @@ -814,7 +814,6 @@ 814 814 **Requirement:** Execute simplified quality gates 815 815 816 816 **Functionality:** 817 - 818 818 * Check source quality (basic) 819 819 * Attempt contradiction search (basic) 820 820 * Calculate confidence scores ... ... @@ -822,7 +822,6 @@ 822 822 * Display gate results 823 823 824 824 **Acceptance Criteria:** 825 - 826 826 * All 4 gates attempted 827 827 * Pass/fail status displayed 828 828 * Failures explained to user ... ... @@ -837,7 +837,6 @@ 837 837 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE 838 838 839 839 **What this means:** 840 - 841 841 * Claims: AI selects (no human curation) 842 842 * Scenarios: N/A (deferred to POC2) 843 843 * Evidence: AI evaluates (no human selection) ... ... 
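The non-blocking gate behaviour required by FR-POC-7 can be sketched as follows. The example assumes that each gate returns a result object instead of raising, so that failures are displayed and never stop the analysis. `GateResult`, `source_quality_gate`, and `render_gates` are hypothetical names, the URL check is purely syntactic (weaker than a real accessibility check), and the wording of the warning note is a placeholder.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def source_quality_gate(source_urls):
    """Gate 1 (basic): at least 2 sources with syntactically valid URLs."""
    valid = [
        u for u in source_urls
        if urlparse(u).scheme in ("http", "https") and urlparse(u).netloc
    ]
    return GateResult("Source Quality", len(valid) >= 2, f"{len(valid)} sources found")


def render_gates(results):
    """Build the simplified status block; failed gates are shown, not blocking."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        lines += ["", "Note: One or more quality gates failed. Use with caution."]
    return "\n".join(lines)
```

The other three gates would return `GateResult` objects in the same way, which keeps the display code independent of how each check is implemented.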
@@ -845,12 +845,13 @@ 845 845 * Summaries: AI writes (no human editing) 846 846 847 847 **Pipeline:** 848 -{{code}}User Input → AKEL Processing → Output Display 849 - ↓ 850 - ZERO human editing{{/code}} 814 +{{code}} 815 +User Input → AKEL Processing → Output Display 816 + ↓ 817 + ZERO human editing 818 +{{/code}} 851 851 852 852 **If AI output is poor:** 853 - 854 854 * ❌ Do NOT manually fix it 855 855 * ✅ Document the failure 856 856 * ✅ Improve prompts and retry ... ... @@ -857,7 +857,6 @@ 857 857 * ✅ Accept that POC might fail 858 858 859 859 **Why this matters:** 860 - 861 861 * Tests whether AI can do this without humans 862 862 * Validates scalability (humans can't review every analysis) 863 863 * Honest test of technical feasibility ... ... @@ -867,19 +867,16 @@ 867 867 **Requirement:** Analysis completes in reasonable time 868 868 869 869 **Acceptable Performance:** 870 - 871 871 * Processing time: 1-5 minutes (acceptable for POC) 872 872 * Display loading indicator to user 873 873 * Show progress if possible ("Extracting claims...", "Generating verdicts...") 874 874 875 875 **Not Required:** 876 - 877 877 * Production-level speed (< 30 seconds) 878 878 * Optimization for scale 879 879 * Caching 880 880 881 881 **Acceptance Criteria:** 882 - 883 883 * Analysis completes within 5 minutes 884 884 * User sees loading indicator 885 885 * No timeout errors ... ... @@ -889,19 +889,16 @@ 889 889 **Requirement:** System works for manual testing sessions 890 890 891 891 **Acceptable:** 892 - 893 893 * Occasional errors (< 20% failure rate) 894 894 * Manual restart if needed 895 895 * Display error messages clearly 896 896 897 897 **Not Required:** 898 - 899 899 * 99.9% uptime 900 900 * Automatic error recovery 901 901 * Production monitoring 902 902 903 903 **Acceptance Criteria:** 904 - 905 905 * System works for test demonstrations 906 906 * Errors are handled gracefully 907 907 * User receives clear error messages ... ... 
@@ -911,7 +911,6 @@ 911 911 **Requirement:** Runs on simple infrastructure 912 912 913 913 **Acceptable:** 914 - 915 915 * Single machine or simple cloud setup 916 916 * No distributed architecture 917 917 * No load balancing ... ... @@ -919,7 +919,6 @@ 919 919 * Local development environment viable 920 920 921 921 **Not Required:** 922 - 923 923 * Production infrastructure 924 924 * Multi-region deployment 925 925 * Auto-scaling ... ... @@ -930,7 +930,6 @@ 930 930 **Requirement:** Track and display LLM usage metrics to inform optimization decisions 931 931 932 932 **Must Track:** 933 - 934 934 * Input tokens (article + prompt) 935 935 * Output tokens (generated analysis) 936 936 * Total tokens ... ... @@ -939,19 +939,16 @@ 939 939 * Article length (words/characters) 940 940 941 941 **Must Display:** 942 - 943 943 * Usage statistics in UI (Component 5) 944 944 * Cost per analysis 945 945 * Cost per claim extracted 946 946 947 947 **Must Log:** 948 - 949 949 * Aggregate metrics for analysis 950 950 * Cost distribution by article length 951 951 * Token efficiency trends 952 952 953 953 **Purpose:** 954 - 955 955 * Understand unit economics 956 956 * Identify optimization opportunities 957 957 * Project costs at scale ... ... @@ -958,7 +958,6 @@ 958 958 * Inform architecture decisions (caching, model selection, etc.) 959 959 960 960 **Acceptance Criteria:** 961 - 962 962 * ✅ Usage data displayed after each analysis 963 963 * ✅ Metrics logged for aggregate analysis 964 964 * ✅ Cost calculated accurately (Claude API pricing) ... ... @@ -966,7 +966,6 @@ 966 966 * ✅ POC1 report includes cost analysis section 967 967 968 968 **Success Target:** 969 - 970 970 * Average cost per analysis < $0.05 USD 971 971 * Cost scaling behavior understood (linear/exponential) 972 972 * 2+ optimization opportunities identified ... ... 
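The tracking and logging requirements of NFR-POC-4 reduce to a small amount of arithmetic over logged token counts. The sketch below is illustrative only: `UsageRecord`, `cost_usd`, and `summarize` are hypothetical names, and the per-million-token prices are parameters that must be filled in from the current Claude API price list. No actual price is assumed here.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    input_tokens: int   # article + prompt
    output_tokens: int  # generated analysis
    claims: int         # number of claims extracted
    article_words: int


def cost_usd(rec, in_price_per_mtok, out_price_per_mtok):
    """Cost of one analysis, given prices in USD per million tokens."""
    return (rec.input_tokens * in_price_per_mtok
            + rec.output_tokens * out_price_per_mtok) / 1_000_000


def summarize(records, in_price_per_mtok, out_price_per_mtok,
              target_usd=0.05, articles_per_month=10_000):
    """Aggregate logged records into the figures the POC1 report needs."""
    if not records:
        raise ValueError("No usage records logged")
    costs = [cost_usd(r, in_price_per_mtok, out_price_per_mtok) for r in records]
    average = sum(costs) / len(costs)
    return {
        "avg_cost_per_analysis": average,
        "cost_per_claim": sum(costs) / sum(r.claims for r in records),
        "projected_monthly_cost": average * articles_per_month,
        "meets_target": average < target_usd,  # success target: < $0.05 USD
    }
```

Keeping prices as parameters lets the same log be re-evaluated when comparing models or after a pricing change.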
@@ -978,13 +978,11 @@ 978 978 === 10.1 System Components === 979 979 980 980 **Frontend:** 981 - 982 982 * Simple HTML form (text input + URL input + button) 983 983 * Loading indicator 984 984 * Results display page (single page, no tabs/navigation) 985 985 986 986 **Backend:** 987 - 988 988 * Single API endpoint 989 989 * Calls Claude API (Sonnet 4.5 or latest) 990 990 * Parses response ... ... @@ -991,12 +991,10 @@ 991 991 * Returns JSON to frontend 992 992 993 993 **Data Storage:** 994 - 995 995 * None required (stateless POC) 996 996 * Optional: Simple file storage or SQLite for demo examples 997 997 998 998 **External Services:** 999 - 1000 1000 * Claude API (Anthropic) - required 1001 1001 * Optional: URL fetch service for article text extraction 1002 1002 ... ... @@ -1004,23 +1004,23 @@ 1004 1004 1005 1005 {{code}} 1006 1006 1. User submits text or URL 1007 - ↓ 955 + ↓ 1008 1008 2. Backend receives request 1009 - ↓ 957 + ↓ 1010 1010 3. If URL: Fetch article text 1011 - ↓ 959 + ↓ 1012 1012 4. Call Claude API with single prompt: 1013 - "Extract claims, evaluate each, provide verdicts" 1014 - ↓ 961 + "Extract claims, evaluate each, provide verdicts" 962 + ↓ 1015 1015 5. Claude API returns: 1016 - - Analysis summary 1017 - - Claims list 1018 - - Verdicts for each claim (with risk tiers) 1019 - - Article summary (optional) 1020 - - Quality gate results 1021 - ↓ 964 + - Analysis summary 965 + - Claims list 966 + - Verdicts for each claim (with risk tiers) 967 + - Article summary (optional) 968 + - Quality gate results 969 + ↓ 1022 1022 6. Backend parses response 1023 - ↓ 971 + ↓ 1024 1024 7. Frontend displays results with Mode 2 labeling 1025 1025 {{/code}} 1026 1026 ... ... @@ -1029,43 +1029,45 @@ 1029 1029 === 10.3 AI Prompt Strategy === 1030 1030 1031 1031 **Single Comprehensive Prompt:** 1032 -{{code}}Task: Analyze this article and provide: 980 +{{code}} 981 +Task: Analyze this article and provide: 1033 1033 1034 1034 1. 
Identify the article's main thesis/conclusion 1035 - - What is the article trying to argue or prove? 1036 - - What is the primary claim or conclusion? 984 + - What is the article trying to argue or prove? 985 + - What is the primary claim or conclusion? 1037 1037 1038 1038 2. Extract 3-5 factual claims from the article 1039 - - Note which claims are CENTRAL to the main thesis 1040 - - Note which claims are SUPPORTING facts 988 + - Note which claims are CENTRAL to the main thesis 989 + - Note which claims are SUPPORTING facts 1041 1041 1042 1042 3. For each claim: 1043 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 1044 - - Assign confidence score (0-100%) 1045 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 1046 - - Write brief reasoning (1-3 sentences) 992 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 993 + - Assign confidence score (0-100%) 994 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 995 + - Write brief reasoning (1-3 sentences) 1047 1047 1048 1048 4. Assess relationship between claims and main thesis: 1049 - - Do the claims actually support the article's conclusion? 1050 - - Are there logical leaps or unsupported inferences? 1051 - - Is the article's framing misleading even if individual facts are accurate? 998 + - Do the claims actually support the article's conclusion? 999 + - Are there logical leaps or unsupported inferences? 1000 + - Is the article's framing misleading even if individual facts are accurate? 1052 1052 1053 1053 5. Run quality gates: 1054 - - Check: ≥2 sources found 1055 - - Attempt: Basic contradiction search 1056 - - Calculate: Confidence scores 1057 - - Verify: Structural integrity 1003 + - Check: ≥2 sources found 1004 + - Attempt: Basic contradiction search 1005 + - Calculate: Confidence scores 1006 + - Verify: Structural integrity 1058 1058 1059 1059 6. 
Write context-aware analysis summary (4-6 sentences): 1060 - - State article's main thesis 1061 - - Report claims found and verdict distribution 1062 - - Note if central claims are problematic 1063 - - Assess whether evidence supports conclusion 1064 - - Overall credibility considering claim importance 1009 + - State article's main thesis 1010 + - Report claims found and verdict distribution 1011 + - Note if central claims are problematic 1012 + - Assess whether evidence supports conclusion 1013 + - Overall credibility considering claim importance 1065 1065 1066 1066 7. Write article summary (3-5 sentences: neutral summary of article content) 1067 1067 1068 -Return as structured JSON with quality gate results.{{/code}} 1017 +Return as structured JSON with quality gate results. 1018 +{{/code}} 1069 1069 1070 1070 **One prompt generates everything.** 1071 1071 ... ... @@ -1076,30 +1076,25 @@ 1076 1076 === 10.4 Technology Stack Suggestions === 1077 1077 1078 1078 **Frontend:** 1079 - 1080 1080 * HTML + CSS + JavaScript (minimal framework) 1081 1081 * OR: Next.js (if team prefers) 1082 1082 * Hosted: Local machine OR Vercel/Netlify free tier 1083 1083 1084 1084 **Backend:** 1085 - 1086 1086 * Python Flask/FastAPI (simple REST API) 1087 1087 * OR: Next.js API routes (if using Next.js) 1088 1088 * Hosted: Local machine OR Railway/Render free tier 1089 1089 1090 1090 **AKEL Integration:** 1091 - 1092 1092 * Claude API via Anthropic SDK 1093 1093 * Model: Claude Sonnet 4.5 or latest available 1094 1094 1095 1095 **Database:** 1096 - 1097 1097 * None (stateless acceptable) 1098 1098 * OR: SQLite if want to store demo examples 1099 1099 * OR: JSON files on disk 1100 1100 1101 1101 **Deployment:** 1102 - 1103 1103 * Local development environment sufficient for POC 1104 1104 * Optional: Deploy to cloud for remote demos 1105 1105 ... ... 
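Step 6 of the processing flow ("Backend parses response") depends on the model returning well-formed JSON. The sketch below shows one way a backend might validate that response before passing it to the frontend. The spec does not define the JSON schema, so the field names `claims`, `verdict`, `confidence`, and `risk_tier` are assumptions. Only the allowed verdict labels, the 0-100 confidence range, and the A/B/C risk tiers come from the prompt above.

```python
import json

# Values taken from the prompt text in section 10.3.
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_RISK_TIERS = {"A", "B", "C"}


def parse_analysis(raw: str) -> dict:
    """Parse and validate the model's JSON reply (hypothetical schema).

    Raises ValueError on malformed JSON or on any claim that violates
    the constraints stated in the prompt.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")

    claims = data.get("claims")
    if not isinstance(claims, list) or not claims:
        raise ValueError("response has no claims list")

    for i, claim in enumerate(claims):
        if not isinstance(claim, dict):
            raise ValueError(f"claim {i} is not an object")
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            raise ValueError(f"claim {i} has an unknown verdict")
        confidence = claim.get("confidence")
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0 <= confidence <= 100:
            raise ValueError(f"claim {i} confidence must be a number in 0-100")
        if claim.get("risk_tier") not in ALLOWED_RISK_TIERS:
            raise ValueError(f"claim {i} has an unknown risk tier")
    return data
```

Rejecting a malformed reply outright, rather than repairing it by hand, is consistent with the POC's "no manual editing" principle. Whether to retry the API call on failure is left open here.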
@@ -1108,7 +1108,6 @@ 1108 1108 === 11.1 Minimum Success (POC Passes) === 1109 1109 1110 1110 **Required for GO decision:** 1111 - 1112 1112 * ✅ AI extracts 3-5 factual claims automatically 1113 1113 * ✅ AI provides verdict for each claim automatically 1114 1114 * ✅ Verdicts are reasonable (≥70% make logical sense) ... ... @@ -1122,7 +1122,6 @@ 1122 1122 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented) 1123 1123 1124 1124 **Quality Definition:** 1125 - 1126 1126 * "Reasonable verdict" = Defensible given general knowledge 1127 1127 * "Coherent summary" = Logically structured, grammatically correct 1128 1128 * "Comprehensible" = Reviewers understand what analysis means ... ... @@ -1130,7 +1130,6 @@ 1130 1130 === 11.2 POC Fails If === 1131 1131 1132 1132 **Automatic NO-GO if any of these:** 1133 - 1134 1134 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1135 1135 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1136 1136 * ❌ Output incomprehensible (reviewers can't understand analysis) ... ... 
@@ -1142,15 +1142,14 @@ 1142 1142 **POC quality expectations:** 1143 1143 1144 1144 |=Component|=Quality Threshold|=Definition 1145 -|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases 1146 -|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided 1147 -|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant 1148 -|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims 1087 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases 1088 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided 1089 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant 1090 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims 1149 1149 1150 1150 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet 1151 1151 1152 1152 **Not expecting:** 1153 - 1154 1154 * 100% accuracy 1155 1155 * Perfect claim coverage 1156 1156 * Comprehensive evidence gathering ... ... @@ -1158,7 +1158,6 @@ 1158 1158 * Production polish 1159 1159 1160 1160 **Expecting:** 1161 - 1162 1162 * Reasonable claim extraction 1163 1163 * Defensible verdicts 1164 1164 * Understandable reasoning ... ... @@ -1171,7 +1171,6 @@ 1171 1171 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 1172 1172 1173 1173 **Expected Output:** 1174 - 1175 1175 * Extract claim correctly 1176 1176 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1177 1177 * Confidence: 70-90% ... ... 
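The expected output of the coffee test case can be written as a machine-checkable expectation, so that each test run is scored the same way. This is only a sketch. The `Expectation` type and `meets_expectation` helper are hypothetical. The allowed verdicts and the 70-90% confidence bounds are the only values taken from the test case above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Expected verdicts and confidence range for one test case (hypothetical type)."""
    allowed_verdicts: frozenset
    min_confidence: float
    max_confidence: float


def meets_expectation(verdict: str, confidence: float, expected: Expectation) -> bool:
    """True when the verdict is acceptable and confidence lies in the expected range."""
    in_range = expected.min_confidence <= confidence <= expected.max_confidence
    return verdict in expected.allowed_verdicts and in_range


# Test case: "Coffee reduces the risk of type 2 diabetes by 30%"
COFFEE_CASE = Expectation(
    allowed_verdicts=frozenset({"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}),
    min_confidence=70,
    max_confidence=90,
)
```

A run that returns `PARTIALLY SUPPORTED` at 82% confidence would pass this check. A run that returns `REFUTED`, or a confidence of 95%, would not.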
@@ -1185,7 +1185,6 @@ 1185 1185 **Input:** News article URL with multiple claims about politics/health/science 1186 1186 1187 1187 **Expected Output:** 1188 - 1189 1189 * Extract 3-5 key claims 1190 1190 * Verdict for each (may vary: some supported, some uncertain, some refuted) 1191 1191 * Coherent analysis summary ... ... @@ -1199,7 +1199,6 @@ 1199 1199 **Input:** Article on contested political or scientific topic 1200 1200 1201 1201 **Expected Output:** 1202 - 1203 1203 * Balanced analysis 1204 1204 * Acknowledges uncertainty where appropriate 1205 1205 * Doesn't overstate confidence ... ... @@ -1212,7 +1212,6 @@ 1212 1212 **Input:** Article with obviously false claim (e.g., "The Earth is flat") 1213 1213 1214 1214 **Expected Output:** 1215 - 1216 1216 * Extract claim 1217 1217 * Verdict: REFUTED 1218 1218 * High confidence (> 90%) ... ... @@ -1226,7 +1226,6 @@ 1226 1226 **Input:** Article with claim where evidence is genuinely mixed 1227 1227 1228 1228 **Expected Output:** 1229 - 1230 1230 * Extract claim 1231 1231 * Verdict: UNCERTAIN 1232 1232 * Moderate confidence (40-60%) ... ... @@ -1239,7 +1239,6 @@ 1239 1239 **Input:** Article making medical claims 1240 1240 1241 1241 **Expected Output:** 1242 - 1243 1243 * Extract claim 1244 1244 * Verdict: [appropriate based on evidence] 1245 1245 * Risk tier: A (High - medical) ... ... @@ -1257,7 +1257,6 @@ 1257 1257 **Option A: GO (Proceed to POC2)** 1258 1258 1259 1259 **Conditions:** 1260 - 1261 1261 * AI quality ≥70% without manual editing 1262 1262 * Basic claim → verdict pipeline validated 1263 1263 * Internal + advisor feedback positive ... ... @@ -1266,7 +1266,6 @@ 1266 1266 * Clear path to improving AI quality to ≥90% 1267 1267 1268 1268 **Next Steps:** 1269 - 1270 1270 * Plan POC2 development (add scenarios) 1271 1271 * Design scenario architecture 1272 1272 * Expand to Evidence Model structure ... ... 
@@ -1275,7 +1275,6 @@ 1275 1275 **Option B: NO-GO (Pivot or Stop)** 1276 1276 1277 1277 **Conditions:** 1278 - 1279 1279 * AI quality < 60% 1280 1280 * Requires manual editing for most analyses (> 50%) 1281 1281 * Feedback indicates fundamental flaws ... ... @@ -1283,7 +1283,6 @@ 1283 1283 * No clear path to improvement 1284 1284 1285 1285 **Next Steps:** 1286 - 1287 1287 * **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1288 1288 * **Stop:** Conclude approach not viable, revisit later 1289 1289 ... ... @@ -1290,7 +1290,6 @@ 1290 1290 **Option C: ITERATE (Improve POC)** 1291 1291 1292 1292 **Conditions:** 1293 - 1294 1294 * Concept has merit but execution needs work 1295 1295 * Specific improvements identified 1296 1296 * Addressable with better prompts/approach ... ... @@ -1297,7 +1297,6 @@ 1297 1297 * AI quality between 60-70% 1298 1298 1299 1299 **Next Steps:** 1300 - 1301 1301 * Improve AI prompts 1302 1302 * Test different approaches 1303 1303 * Re-run POC with improvements ... ... @@ -1306,9 +1306,9 @@ 1306 1306 === 13.2 Decision Criteria Summary === 1307 1307 1308 1308 {{code}} 1309 -AI Quality < 60% → NO-GO (approach doesn't work) 1237 +AI Quality < 60% → NO-GO (approach doesn't work) 1310 1310 AI Quality 60-70% → ITERATE (improve and retry) 1311 -AI Quality ≥70% → GO (proceed to POC2) 1239 +AI Quality ≥70% → GO (proceed to POC2) 1312 1312 {{/code}} 1313 1313 1314 1314 == 14. Key Risks & Mitigations == ... ... @@ -1315,11 +1315,10 @@ 1315 1315 1316 1316 === 14.1 Risk: AI Quality Not Good Enough === 1317 1317 1318 -**Likelihood:** Medium-High 1319 -**Impact:** POC fails 1246 +**Likelihood:** Medium-High 1247 +**Impact:** POC fails 1320 1320 1321 1321 **Mitigation:** 1322 - 1323 1323 * Extensive prompt engineering and testing 1324 1324 * Use best available AI models (Sonnet 4.5) 1325 1325 * Test with diverse article types ... ... 
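The thresholds in the Decision Criteria Summary (section 13.2) map directly onto a small function. The sketch below assumes that "AI quality" is measured as the percentage of outputs that reviewers rate as reasonable. The spec does not fix that measurement, so `quality_pct` is an illustrative assumption and both function names are hypothetical.

```python
def quality_pct(reasonable: int, total: int) -> float:
    """Share of outputs judged reasonable, as a percentage (assumed metric)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= reasonable <= total:
        raise ValueError("reasonable must be between 0 and total")
    return 100.0 * reasonable / total


def decide(ai_quality_pct: float) -> str:
    """Apply the section 13.2 thresholds: <60 NO-GO, 60-70 ITERATE, >=70 GO."""
    if not 0 <= ai_quality_pct <= 100:
        raise ValueError("quality must be a percentage in 0-100")
    if ai_quality_pct < 60:
        return "NO-GO"
    if ai_quality_pct < 70:
        return "ITERATE"
    return "GO"
```

Note the boundary handling: exactly 60% falls into ITERATE and exactly 70% falls into GO, matching "AI Quality ≥70% → GO".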
@@ -1329,11 +1329,10 @@ 1329 1329 1330 1330 === 14.2 Risk: AI Consistency Issues === 1331 1331 1332 -**Likelihood:** Medium 1333 -**Impact:** Works sometimes, fails other times 1259 +**Likelihood:** Medium 1260 +**Impact:** Works sometimes, fails other times 1334 1334 1335 1335 **Mitigation:** 1336 - 1337 1337 * Test with 10+ diverse articles 1338 1338 * Measure success rate honestly 1339 1339 * Improve prompts to increase consistency ... ... @@ -1342,11 +1342,10 @@ 1342 1342 1343 1343 === 14.3 Risk: Output Incomprehensible === 1344 1344 1345 -**Likelihood:** Low-Medium 1346 -**Impact:** Users can't understand analysis 1271 +**Likelihood:** Low-Medium 1272 +**Impact:** Users can't understand analysis 1347 1347 1348 1348 **Mitigation:** 1349 - 1350 1350 * Create clear explainer document 1351 1351 * Iterate on output format 1352 1352 * Test with non-technical reviewers ... ... @@ -1356,11 +1356,10 @@ 1356 1356 1357 1357 === 14.4 Risk: API Rate Limits / Costs === 1358 1358 1359 -**Likelihood:** Low 1360 -**Impact:** System slow or expensive 1284 +**Likelihood:** Low 1285 +**Impact:** System slow or expensive 1361 1361 1362 1362 **Mitigation:** 1363 - 1364 1364 * Monitor API usage 1365 1365 * Implement retry logic 1366 1366 * Estimate costs before scaling ... ... @@ -1369,11 +1369,10 @@ 1369 1369 1370 1370 === 14.5 Risk: Scope Creep === 1371 1371 1372 -**Likelihood:** Medium 1373 -**Impact:** POC becomes too complex 1296 +**Likelihood:** Medium 1297 +**Impact:** POC becomes too complex 1374 1374 1375 1375 **Mitigation:** 1376 - 1377 1377 * Strict scope discipline 1378 1378 * Say NO to feature additions 1379 1379 * Keep focus on core question ... ... @@ -1384,15 +1384,12 @@ 1384 1384 1385 1385 === 15.1 Core Principles === 1386 1386 1387 -* 1388 -** 1389 -**1. Build Less, Learn More 1310 +**1. 
Build Less, Learn More** 1390 1390 * Minimum features to test hypothesis 1391 1391 * Don't build unvalidated features 1392 1392 * Focus on core question only 1393 1393 1394 1394 **2. Fail Fast** 1395 - 1396 1396 * Quick test of hardest part (AI capability) 1397 1397 * Accept that POC might fail 1398 1398 * Better to discover issues early ... ... @@ -1399,19 +1399,16 @@ 1399 1399 * Honest assessment over optimistic hope 1400 1400 1401 1401 **3. Test First, Build Second** 1402 - 1403 1403 * Validate AI can do this before building platform 1404 1404 * Don't assume it will work 1405 1405 * Let results guide decisions 1406 1406 1407 1407 **4. Automation First** 1408 - 1409 1409 * No manual editing allowed 1410 1410 * Tests scalability, not just feasibility 1411 1411 * Proves approach can work at scale 1412 1412 1413 1413 **5. Honest Assessment** 1414 - 1415 1415 * Don't cherry-pick examples 1416 1416 * Don't manually fix bad outputs 1417 1417 * Document failures openly ... ... @@ -1419,25 +1419,22 @@ 1419 1419 1420 1420 === 15.2 What POC Is === 1421 1421 1422 -✅ Testing AI capability without humans 1423 -✅ Proving core technical concept 1424 -✅ Fast validation of approach 1425 -✅ Honest assessment of feasibility 1339 +✅ Testing AI capability without humans 1340 +✅ Proving core technical concept 1341 +✅ Fast validation of approach 1342 +✅ Honest assessment of feasibility 1426 1426 1427 1427 === 15.3 What POC Is NOT === 1428 1428 1429 -❌ Building a product 1430 -❌ Production-ready system 1431 -❌ Feature-complete platform 1432 -❌ Perfectly accurate analysis 1433 -❌ Polished user experience 1346 +❌ Building a product 1347 +❌ Production-ready system 1348 +❌ Feature-complete platform 1349 +❌ Perfectly accurate analysis 1350 +❌ Polished user experience 1434 1434 1435 -== 16. Success == 1352 +== 16. 
Success = Clear Path Forward == 1436 1436 1437 - Clear Path Forward == 1438 - 1439 1439 **If POC succeeds (≥70% AI quality):** 1440 - 1441 1441 * ✅ Approach validated 1442 1442 * ✅ Proceed to POC2 (add scenarios) 1443 1443 * ✅ Design full Evidence Model structure ... ... @@ -1445,7 +1445,6 @@ 1445 1445 * ✅ Focus on improving AI quality from 70% → 90% 1446 1446 1447 1447 **If POC fails (< 60% AI quality):** 1448 - 1449 1449 * ✅ Learn what doesn't work 1450 1450 * ✅ Pivot to different approach 1451 1451 * ✅ OR wait for better AI technology ... ... @@ -1455,9 +1455,9 @@ 1455 1455 1456 1456 == 17. Related Pages == 1457 1457 1458 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]] 1459 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]] 1460 -* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]] 1371 +* [[User Needs>>Test.FactHarbor.Specification.Requirements.User Needs.WebHome]] 1372 +* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]] 1373 +* [[Gap Analysis>>Test.FactHarbor.Specification.Requirements.GapAnalysis]] 1461 1461 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1462 1462 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1463 1463 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] ... ... @@ -1464,51 +1464,3 @@ 1464 1464 1465 1465 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1466 1466 1467 - 1468 -=== NFR-POC-11: LLM Provider Abstraction (POC1) === 1469 - 1470 -**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers. 
1471 -
1472 -**POC1 Implementation:**
1473 -
1474 -* **Primary Provider:** Anthropic Claude API
1475 -* Stage 1: Claude Haiku 4
1476 -* Stage 2: Claude Sonnet 3.5 (cached)
1477 -* Stage 3: Claude Sonnet 3.5
1478 -
1479 -* **Provider Interface:** Abstract LLMProvider interface implemented
1480 -
1481 -* **Configuration:** Environment variables for provider selection
1482 -* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1483 -* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1484 -* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1485 -
1486 -* **Failover:** Basic error handling with cache fallback for Stage 2
1487 -
1488 -* **Cost Tracking:** Log provider name and cost per request
1489 -
1490 -**Future (POC2/Beta):**
1491 -
1492 -* Secondary provider (OpenAI) with automatic failover
1493 -* Admin API for runtime provider switching
1494 -* Cost comparison dashboard
1495 -* Cross-provider output verification
1496 -
1497 -**Success Criteria:**
1498 -
1499 -* All LLM calls go through abstraction layer (no direct API calls)
1500 -* Provider can be changed via environment variable without code changes
1501 -* Cost tracking includes provider name in logs
1502 -* Stage 2 falls back to cache on provider failure
1503 -
1504 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.103.Specification.POC.API-and-Schemas.WebHome]] Section 6
1505 -
1506 -**Dependencies:**
1507 -
1508 -* NFR-14 (Main Requirements)
1509 -* Design Decision 9
1510 -* Architecture Section 2.2
1511 -
1512 -**Priority:** HIGH (P1)
1513 -
1514 -**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
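The provider abstraction described in NFR-POC-11 (an abstract `LLMProvider` interface, provider selection through `LLM_PRIMARY_PROVIDER`, and cache fallback for Stage 2) could look roughly like the sketch below. It is an illustration under stated assumptions, not the referenced API & Schemas design: the `complete` method, the `PROVIDERS` registry, and both stand-in providers are hypothetical, and no real vendor SDK is called.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Optional


class LLMProvider(ABC):
    """Abstract provider interface; the method name is an assumption."""
    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        ...


class EchoProvider(LLMProvider):
    """Stand-in provider that makes no network calls (for illustration only)."""
    name = "echo"

    def complete(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


class FailingProvider(LLMProvider):
    """Stand-in provider that always fails, to exercise the fallback path."""
    name = "failing"

    def complete(self, model: str, prompt: str) -> str:
        raise RuntimeError("provider unavailable")


# Hypothetical registry; a real one would map "anthropic" to an SDK-backed class.
PROVIDERS = {"echo": EchoProvider, "failing": FailingProvider}


def provider_from_env(env: Optional[Mapping[str, str]] = None) -> LLMProvider:
    """Select the provider named by LLM_PRIMARY_PROVIDER, without code changes."""
    env = os.environ if env is None else env
    key = env.get("LLM_PRIMARY_PROVIDER", "echo")
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider: {key}")
    return PROVIDERS[key]()


def stage2_with_cache_fallback(provider: LLMProvider, model: str,
                               prompt: str, cache: Dict[str, str]) -> str:
    """Call the provider; on failure, serve a cached result if one exists."""
    try:
        result = provider.complete(model, prompt)
    except Exception:
        if prompt in cache:
            return cache[prompt]
        raise  # nothing cached: surface the provider error
    cache[prompt] = result
    return result
```

Routing every call through `provider_from_env` is what would make the first success criterion checkable: application code never imports a vendor SDK directly.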