Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/23 18:00
To version 2.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment:
Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
Summary
Page properties (3 modified, 0 added, 0 removed)
Details
- Page properties
- Title
@@ -1,1 +1,1 @@
-POC Requirements
+POC Requirements (POC1 & POC2)
- Parent
@@ -1,1 +1,1 @@
-FactHarbor.Specification.POC.WebHome
+WebHome
- Content
@@ -1,1359 +1,581 @@
 = POC Requirements =

 **Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
+**Version:** 3.0 (Aligned with Main Requirements)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention

----
+{{info}}
+**Core Philosophy:** POC validates the [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] through simplified implementation. All POC features map to formal FR/NFR requirements.
+{{/info}}

+
 == 1. POC Overview ==

 === 1.1 What POC Tests ===

 **Core Question:**
+
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

 **What we're proving:**
+
 * AI can identify factual claims from text
-* AI can evaluate those claims and produce verdicts
-* Output is comprehensible and useful
-* Fully automated approach is viable
+* AI can evaluate those claims with structured evidence
+* Quality gates can filter unreliable outputs
+* The core workflow is technically feasible

-**What we're NOT testing:**
-* Scenario generation (deferred to POC2)
-* Evidence display (deferred to POC2)
-* Production scalability
-* Perfect accuracy
-* Complete feature set
+**What we're NOT proving:**

----
+* Production-ready reliability (that's POC2)
+* User-facing features (that's Beta 0)
+* Full IFCN compliance (that's V1.0)

-=== 1.2 Scenarios Deferred to POC2 ===
+=== 1.2 Requirements Mapping ===

-**Intentional Simplification:**
+POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].

-Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
+**Scope Summary:**

-**Rationale:**
-* **POC1 tests:** Can AI extract claims and generate verdicts?
-* **POC2 will add:** Scenario generation and management
-* **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
+* **In Scope:** 8 requirements (7 FRs + 1 NFR)
+* **Partial:** 3 NFRs (simplified versions)
+* **Out of Scope:** 19 requirements (deferred to later phases)

-**Design Decision:**
+== 2. POC1 Scope ==

-Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
+{{success}}
+**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88 ex 2 new Org Pages.Roadmap.Requirements-Roadmap-Matrix.WebHome]]

-**No Risk:**
+The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
+{{/success}}

-Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
-* Faster POC1 validation
-* Learning from POC1 to inform scenario design
-* Iterative approach: fail fast if basic AI doesn't work
-* Flexibility to adjust scenario architecture based on POC1 insights
+**POC1 implements these formal requirements:**

-**Full System Workflow (Future):**
-{{code}}
-Claims → Scenarios → Evidence → Verdicts
-{{/code}}
+|= Formal Req |= Implementation in POC1 |= Notes
+| **FR4** | Analysis Summary | Basic format; quality metadata deferred to POC2
+| **FR7** | Automated Verdicts | Full implementation with quality gates (NFR11)
+| **NFR11** | Quality Assurance Framework | 4 quality gates implemented

-**POC1 Simplified Workflow:**
-{{code}}
-Claims → Verdicts (scenarios implicit in reasoning)
-{{/code}}
+**POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)

----
+{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:

-== 2. POC Output Specification ==
+* Claim extraction (FR1)
+* Claim context (FR2)
+* Multiple scenarios (FR3)
+* Evidence collection (FR5)
+* Source quality assessment (FR6)
+* Time evolution tracking (FR8) - deferred to POC2
+* Audit trail (FR11) - deferred to Beta 0
+* In-article highlighting (FR13) - deferred to Beta 0

-=== 2.1 Component 1: ANALYSIS SUMMARY ===
+**Partial implementations:**

-**What:** Brief overview of findings
-**Length:** 3-5 sentences
-**Content:**
-* How many claims found
-* Distribution of verdicts
-* Overall assessment
+* NFR1 (Explainability) - Basic only
+* NFR2 (Performance) - Functional but not optimized
+* NFR3 (Transparency) - Basic only

-**Example:**
-{{code}}
-This article makes 4 claims about coffee's health effects. We found
-2 claims are well-supported, 1 is uncertain, and 1 is refuted.
-Overall assessment: mostly accurate with some exaggeration.
-{{/code}}
+**Detailed POC1 implementation specifications continue below...**

----

-=== 2.2 Component 2: CLAIMS IDENTIFICATION ===

-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
-**Requirements:**
-* Factual claims only (not opinions/questions)
-* Clearly stated
-* Automatically extracted by AI
+== 3. POC Simplifications ==

-**Example:**
-{{code}}
-CLAIMS IDENTIFIED:
+=== 3.1 FR1: Claim Extraction (Full Implementation) ===

-[1] Coffee reduces diabetes risk by 30%
-[2] Coffee improves heart health
-[3] Decaf has same benefits as regular
-[4] Coffee prevents Alzheimer's completely
-{{/code}}
+**Main Requirement:** AI extracts factual claims from input text

----
-
-=== 2.3 Component 3: CLAIMS VERDICTS ===
-
-**What:** Verdict for each claim identified
-**Format:** Per claim structure
-
-**Required Elements:**
-* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
-* **Confidence Score:** 0-100%
-* **Brief Reasoning:** 1-3 sentences explaining why
-* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
-
-**Example:**
-{{code}}
-VERDICTS:
-
-[1] WELL-SUPPORTED (85%) [Risk: C]
-Multiple studies confirm 25-30% risk reduction with regular consumption.
-
-[2] UNCERTAIN (65%) [Risk: B]
-Evidence is mixed. Some studies show benefits, others show no effect.
-
-[3] PARTIALLY SUPPORTED (60%) [Risk: C]
-Some benefits overlap, but caffeine-related benefits are reduced in decaf.
-
-[4] REFUTED (90%) [Risk: B]
-No evidence for complete prevention. Claim is significantly overstated.
-{{/code}}
-
-**Risk Tier Display:**
-* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
-* **Tier C (Green):** Low Risk - Facts/Definitions/History
-
-**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
-
----
-
-=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
-
-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
-**Tone:** Neutral (article's position, not FactHarbor's analysis)
-
-**Example:**
-{{code}}
-ARTICLE SUMMARY:
-
-Health News Today article discusses coffee benefits, citing studies
-on diabetes and Alzheimer's. Author highlights research linking coffee
-to disease prevention. Recommends 2-3 cups daily for optimal health.
-{{/code}}
-
----
-
-=== 2.5 Total Output Size ===
-
-**Combined:** ~200-300 words
-* Analysis Summary: 50-70 words
-* Claims Identification: 30-50 words
-* Claims Verdicts: 100-150 words
-* Article Summary: 30-50 words (optional)
-
----
-
-== 3. What's NOT in POC Scope ==
-
-=== 3.1 Feature Exclusions ===
-
-The following are **explicitly excluded** from POC:
-
-**Content Features:**
-* ❌ Scenarios (deferred to POC2)
-* ❌ Evidence display (supporting/opposing lists)
-* ❌ Source links (clickable references)
-* ❌ Detailed reasoning chains
-* ❌ Source quality ratings (shown but not detailed)
-* ❌ Contradiction detection (basic only)
-* ❌ Risk assessment (shown but not workflow-integrated)
-
-**Platform Features:**
-* ❌ User accounts / authentication
-* ❌ Saved history
-* ❌ Search functionality
-* ❌ Claim comparison
-* ❌ User contributions
-* ❌ Commenting system
-* ❌ Social sharing
-
-**Technical Features:**
-* ❌ Browser extensions
-* ❌ Mobile apps
-* ❌ API endpoints
-* ❌ Webhooks
-* ❌ Export features (PDF, CSV)
-
-**Quality Features:**
-* ❌ Accessibility (WCAG compliance)
-* ❌ Multilingual support
-* ❌ Mobile optimization
-* ❌ Media verification (images/videos)
-
-**Production Features:**
-* ❌ Security hardening
-* ❌ Privacy compliance (GDPR)
-* ❌ Terms of service
-* ❌ Monitoring/logging
-* ❌ Error tracking
-* ❌ Analytics
-* ❌ A/B testing
-
----
-
-== 4. POC Simplifications vs. Full System ==
-
-=== 4.1 Architecture Comparison ===
-
-**POC Architecture (Simplified):**
-{{code}}
-User Input → Single AKEL Call → Output Display
- (all processing)
-{{/code}}
-
-**Full System Architecture:**
-{{code}}
-User Input → Claim Extractor → Claim Classifier → Scenario Generator
-→ Evidence Summarizer → Contradiction Detector → Verdict Generator
-→ Quality Gates → Publication → Output Display
-{{/code}}
-
-**Key Differences:**
-
-|=Aspect|=POC1|=Full System
-|Processing|Single API call|Multi-component pipeline
-|Scenarios|None (implicit)|Explicit entities with versioning
-|Evidence|Basic retrieval|Comprehensive with quality scoring
-|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
-|Workflow|3 steps (input/process/output)|6 phases with gates
-|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
-|Architecture|Single prompt to Claude|AKEL Orchestrator + Components
-
----
-
-=== 4.2 Workflow Comparison ===
-
-**POC1 Workflow:**
-1. User submits text/URL
-2. Single AKEL call (all processing in one prompt)
-3. Display results
-**Total: 3 steps, ~10-18 seconds**
-
-**Full System Workflow:**
-1. **Claim Submission** (extraction, normalization, clustering)
-2. **Scenario Building** (definitions, assumptions, boundaries)
-3. **Evidence Handling** (retrieval, assessment, linking)
-4. **Verdict Creation** (synthesis, reasoning, approval)
-5. **Public Presentation** (summaries, landscapes, deep dives)
-6. **Time Evolution** (versioning, re-evaluation triggers)
-**Total: 6 phases with quality gates, ~10-30 seconds**
-
----
-
-=== 4.3 Why POC is Simplified ===
-
-**Engineering Rationale:**
-
-1. **Test core capability first:** Can AI do basic fact-checking without humans?
-2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
-3. **Learn before building:** POC1 insights inform full architecture
-4. **Iterative approach:** Add complexity only after validating foundations
-5. **Resource efficiency:** Don't build full system if core concept fails
-
-**Acceptable Trade-offs:**
-
-* ✅ POC proves AI capability (most risky assumption)
-* ✅ POC validates user comprehension (can people understand output?)
-* ❌ POC doesn't validate full workflow (test in Beta)
-* ❌ POC doesn't validate scale (test in Beta)
-* ❌ POC doesn't validate scenario architecture (design in POC2)
-
----
-
-=== 4.4 Gap Between POC1 and POC2/Beta ===
-
-**What needs to be built for POC2:**
-* Scenario generation component
-* Evidence Model structure (full)
-* Scenario-evidence linking
-* Multi-interpretation comparison
-* Truth landscape visualization
-
-**What needs to be built for Beta:**
-* Multi-component AKEL pipeline
-* Quality gate infrastructure
-* Review workflow system
-* Audit sampling framework
-* Production data model
-* Federation architecture (Release 1.0)
-
-**POC1 → POC2 is significant architectural expansion.**
-
----
-
-== 5. Publication Mode & Labeling ==
-
-=== 5.1 POC Publication Mode ===
-
-**Mode:** Mode 2 (AI-Generated, No Prior Human Review)
-
-Per FactHarbor Specification Section 11 "POC v1 Behavior":
-* Produces public AI-generated output
-* No human approval gate
-* Clear AI-Generated labeling
-* All quality gates active (simplified)
-* Risk tier classification shown (demo)
-
----
-
-=== 5.2 User-Facing Labels ===
-
-**Primary Label (top of analysis):**
-{{code}}
-╔════════════════════════════════════════════════════════════╗
-║ [AI-GENERATED - POC/DEMO] ║
-║ ║
-║ This analysis was produced entirely by AI and has not ║
-║ been human-reviewed. Use for demonstration purposes. ║
-║ ║
-║ Source: AI/AKEL v1.0 (POC) ║
-║ Review Status: Not Reviewed (Proof-of-Concept) ║
-║ Quality Gates: 4/4 Passed (Simplified) ║
-║ Last Updated: [timestamp] ║
-╚════════════════════════════════════════════════════════════╝
-{{/code}}
-
-**Per-Claim Risk Labels:**
-* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
-* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
-* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
-
----
-
-=== 5.3 Display Requirements ===
-
-**Must Show:**
-* AI-Generated status (prominent)
-* POC/Demo disclaimer
-* Risk tier per claim
-* Confidence scores (0-100%)
-* Quality gate status (passed/failed)
-* Timestamp
-
-**Must NOT Claim:**
-* Human review
-* Production quality
-* Medical/legal advice
-* Authoritative verdicts
-* Complete accuracy
-
----
-
-=== 5.4 Mode 2 vs. Full System Publication ===
-
-|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
-|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
-|Review|None|None|Human-Reviewed
-|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
-|Audit|None (POC)|Sampling (5-50%)|Pre-publication
-|Risk Display|Demo only|Workflow-integrated|Validated
-|User Actions|View only|Flag for review|Trust rating
-
----
-
-== 6. Quality Gates (Simplified Implementation) ==
-
-=== 6.1 Overview ===
-
-Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
-
-**Full System Has 4 Gates:**
-1. Source Quality
-2. Contradiction Search (MANDATORY)
-3. Uncertainty Quantification
-4. Structural Integrity
-
-**POC Implements Simplified Versions:**
-* Focus on demonstrating concept
-* Basic implementations sufficient
-* Failures displayed to user (not blocking)
-* Full system has comprehensive validation
-
----
-
-=== 6.2 Gate 1: Source Quality (Basic) ===
-
-**Full System Requirements:**
-* Primary sources identified and accessible
-* Source reliability scored against whitelist
-* Citation completeness verified
-* Publication dates checked
-* Author credentials validated
-
 **POC Implementation:**
-* ✅ At least 2 sources found
-* ✅ Sources accessible (URLs valid)
-* ❌ No whitelist checking
-* ❌ No credential validation
-* ❌ No comprehensive reliability scoring

-**Pass Criteria:** ≥2 accessible sources found
+* ✅ AKEL extracts claims using LLM
+* ✅ Each claim includes original text reference
+* ✅ Claims are identified as factual/non-factual
+* ❌ No advanced claim parsing (added in POC2)

-**Failure Handling:** Display error message, don't generate verdict
+**Acceptance Criteria:**

----
+* Extracts 3-5 claims from typical article
+* Identifies factual vs non-factual claims
+* Quality Gate 1 validates extraction

-=== 6.3 Gate 2: Contradiction Search (Basic) ===
+=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===

-**Full System Requirements:**
-* Counter-evidence actively searched
-* Reservations and limitations identified
-* Alternative interpretations explored
-* Bubble detection (echo chambers, conspiracy theories)
-* Cross-cultural and international perspectives
-* Academic literature (supporting AND opposing)
+**Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims

 **POC Implementation:**
-* ✅ Basic search for counter-evidence
-* ✅ Identify obvious contradictions
-* ❌ No comprehensive academic search
-* ❌ No bubble detection
-* ❌ No systematic alternative interpretation search
-* ❌ No international perspective verification

-**Pass Criteria:** Basic contradiction search attempted
+* ✅ AKEL generates 2-3 scenarios per claim
+* ✅ Scenarios capture different interpretations
+* ✅ Each scenario is evaluated separately
+* ✅ Verdict considers all scenarios

-**Failure Handling:** Note "limited contradiction search" in output
+**Acceptance Criteria:**

----
+* Generates 2+ scenarios for ambiguous claims
+* Scenarios are meaningfully different
+* All scenarios are evaluated

-=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===
+=== 3.3 FR4: Analysis Summary (Basic Implementation) ===

-**Full System Requirements:**
-* Confidence scores calculated for all claims/verdicts
-* Limitations explicitly stated
-* Data gaps identified and disclosed
-* Strength of evidence assessed
-* Alternative scenarios considered
+**Main Requirement:** Provide user-friendly summary of analysis

 **POC Implementation:**
-* ✅ Confidence scores (0-100%)
-* ✅ Basic uncertainty acknowledgment
-* ❌ No detailed limitation disclosure
-* ❌ No data gap identification
-* ❌ No alternative scenario consideration (deferred to POC2)

-**Pass Criteria:** Confidence score assigned
+* ✅ Simple text summary generated
+* ❌ No rich formatting (added in Beta 0)
+* ❌ No visual elements (added in Beta 0)
+* ❌ No interactive features (added in Beta 0)

-**Failure Handling:** Show "Confidence: Unknown" if calculation fails
+**POC Format:**
+```
+Claim: [extracted claim]
+Scenarios: [list of scenarios]
+Evidence: [supporting/opposing evidence]
+Verdict: [probability with uncertainty]
+```

----

-=== 6.5 Gate 4: Structural Integrity (Basic) ===
+=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===

-**Full System Requirements:**
-* No hallucinations detected (fact-checking against sources)
-* Logic chain valid and traceable
-* References accessible and verifiable
-* No circular reasoning
-* Premises clearly stated
+**Main Requirements:**

+* FR5: Collect supporting and opposing evidence
+* FR6: Evaluate evidence source reliability
+
 **POC Implementation:**
-* ✅ Basic coherence check
-* ✅ References accessible
-* ❌ No comprehensive hallucination detection
-* ❌ No formal logic validation
-* ❌ No premise extraction and verification

-**Pass Criteria:** Output is coherent and references are accessible
+* ✅ AKEL searches for evidence (web/knowledge base)
+* ✅ **Mandatory contradiction search** (finds opposing evidence)
+* ✅ Source reliability scoring
+* ❌ No evidence deduplication (added in POC2)
+* ❌ No advanced source verification (added in POC2)

-**Failure Handling:** Display error message
-
----
-
-=== 6.6 Quality Gate Display ===
-
-**POC shows simplified status:**
-{{code}}
-Quality Gates: 4/4 Passed (Simplified)
-✓ Source Quality: 3 sources found
-✓ Contradiction Search: Basic search completed
-✓ Uncertainty: Confidence scores assigned
-✓ Structural Integrity: Output coherent
-{{/code}}
-
-**If any gate fails:**
-{{code}}
-Quality Gates: 3/4 Passed (Simplified)
-✓ Source Quality: 3 sources found
-✗ Contradiction Search: Search failed - limited evidence
-✓ Uncertainty: Confidence scores assigned
-✓ Structural Integrity: Output coherent
-
-Note: This analysis has limited evidence. Use with caution.
-{{/code}}
-
----
-
-=== 6.7 Simplified vs. Full System ===
-
-|=Gate|=POC (Simplified)|=Full System
-|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
-|Contradiction|Basic search|Systematic academic + media + international
-|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
-|Structural|Coherence check|Hallucination detection, logic validation, premise check
-
-**POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
-
----
-
-== 7. AKEL Architecture Comparison ==
-
-=== 7.1 POC AKEL (Simplified) ===
-
-**Implementation:**
-* Single Claude API call (Sonnet 4.5)
-* One comprehensive prompt
-* All processing in single request
-* No separate components
-* No orchestration layer
-
-**Prompt Structure:**
-{{code}}
-Task: Analyze this article and provide:
-
-1. Extract 3-5 factual claims
-2. For each claim:
-   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
-   - Assign confidence score (0-100%)
-   - Assign risk tier (A/B/C)
-   - Write brief reasoning (1-3 sentences)
-3. Generate analysis summary (3-5 sentences)
-4. Generate article summary (3-5 sentences)
-5. Run basic quality checks
-
-Return as structured JSON.
-{{/code}}
-
-**Processing Time:** 10-18 seconds (estimate)
-
----
-
-=== 7.2 Full System AKEL (Production) ===
-
-**Architecture:**
-{{code}}
-AKEL Orchestrator
-├── Claim Extractor
-├── Claim Classifier (with risk tier assignment)
-├── Scenario Generator
-├── Evidence Summarizer
-├── Contradiction Detector
-├── Quality Gate Validator
-├── Audit Sampling Scheduler
-└── Federation Sync Adapter (Release 1.0+)
-{{/code}}
-
-**Processing:**
-* Parallel processing where possible
-* Separate component calls
-* Quality gates between phases
-* Audit sampling selection
-* Cross-node coordination (federated mode)
-
-**Processing Time:** 10-30 seconds (full pipeline)
-
----
-
-=== 7.3 Why POC Uses Single Call ===
-
-**Advantages:**
-* ✅ Simpler to implement
-* ✅ Faster POC development
-* ✅ Easier to debug
-* ✅ Proves AI capability
-* ✅ Good enough for concept validation
-
-**Limitations:**
-* ❌ No component reusability
-* ❌ No parallel processing
-* ❌ All-or-nothing (can't partially succeed)
-* ❌ Harder to improve individual components
-* ❌ No audit sampling
-
-**Acceptable Trade-off:**
-
-POC tests "Can AI do this?" not "How should we architect it?"
-
-Full component architecture comes in Beta after POC validates concept.
-
----
-
-=== 7.4 Evolution Path ===
-
-**POC1:** Single prompt → Prove concept
-**POC2:** Add scenario component → Test full pipeline
-**Beta:** Multi-component AKEL → Production architecture
-**Release 1.0:** Full AKEL + Federation → Scale
-
----
-
-== 8. Functional Requirements ==
-
-=== FR-POC-1: Article Input ===
-
-**Requirement:** User can submit article for analysis
-
-**Functionality:**
-* Text input field (paste article text, up to 5000 characters)
-* URL input field (paste article URL)
-* "Analyze" button to trigger processing
-* Loading indicator during analysis
-
-**Excluded:**
-* No user authentication
-* No claim history
-* No search functionality
-* No saved templates
-
 **Acceptance Criteria:**
-* User can paste text from article
-* User can paste URL of article
-* System accepts input and triggers analysis

----
+* Finds 2+ supporting evidence items
+* Finds 1+ opposing evidence (if exists)
+* Sources scored for reliability

-=== FR-POC-2: Claim Extraction (Fully Automated) ===
+=== 3.5 FR7: Automated Verdicts (Full Implementation) ===

-**Requirement:** AI automatically extracts 3-5 factual claims
+**Main Requirement:** AI computes verdicts with uncertainty quantification

-**Functionality:**
-* AI reads article text
-* AI identifies factual claims (not opinions/questions)
-* AI extracts 3-5 most important claims
-* System displays numbered list
+**POC Implementation:**

-**Critical:** NO MANUAL EDITING ALLOWED
-* AI selects which claims to extract
-* AI identifies factual vs. non-factual
-* System processes claims as extracted
-* No human curation or correction
+* ✅ Probabilistic verdicts (0-100% confidence)
+* ✅ Uncertainty explicitly stated
+* ✅ Reasoning chain provided
+* ✅ Quality Gate 4 validates verdict confidence

-**Error Handling:**
-* If extraction fails: Display error message
-* User can retry with different input
-* No manual intervention to fix extraction
+**POC Output:**
+```
+Verdict: 70% likely true
+Uncertainty: ±15% (moderate confidence)
+Reasoning: Based on 3 high-quality sources...
+Confidence Level: MEDIUM
+```

 **Acceptance Criteria:**
-* AI extracts 3-5 claims automatically
-* Claims are factual (not opinions)
-* Claims are clearly stated
-* No manual editing required

----
+* Verdicts include probability (0-100%)
+* Uncertainty explicitly quantified
+* Reasoning chain explains verdict

-=== FR-POC-3: Verdict Generation (Fully Automated) ===
+=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===

-**Requirement:** AI automatically generates verdict for each claim
+**Main Requirement:** Complete quality assurance with 7 quality gates

-**Functionality:**
-* For each claim, AI:
-  * Evaluates claim based on available evidence/knowledge
-  * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
-  * Assigns confidence score (0-100%)
-  * Assigns risk tier (A/B/C)
-  * Writes brief reasoning (1-3 sentences)
-* System displays verdict for each claim
+**POC Implementation:** **2 gates only**

-**Critical:** NO MANUAL EDITING ALLOWED
-* AI computes verdicts based on evidence
-* AI generates confidence scores
-* AI writes reasoning
-* No human review or adjustment
+**Quality Gate 1: Claim Validation**

-**Error Handling:**
-* If verdict generation fails: Display error message
-* User can retry
-* No manual intervention to adjust verdicts
+* ✅ Validates claim is factual and verifiable
+* ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
+* ✅ Provides clear rejection reason

-**Acceptance Criteria:**
-* Each claim has a verdict
-* Confidence score is displayed (0-100%)
-* Risk tier is displayed (A/B/C)
-* Reasoning is understandable (1-3 sentences)
-* Verdict is defensible given reasoning
-* All generated automatically by AI
+**Quality Gate 4: Verdict Confidence Assessment**

----
+* ✅ Validates ≥2 sources found
+* ✅ Validates quality score ≥0.6
+* ✅ Blocks low-confidence verdicts
+* ✅ Provides clear rejection reason

-=== FR-POC-4: Analysis Summary (Fully Automated) ===
+**Out of Scope (POC2+):**

-**Requirement:** AI generates brief summary of analysis
+* ❌ Gate 2: Evidence Relevance
+* ❌ Gate 3: Scenario Coherence
+* ❌ Gate 5: Source Diversity
+* ❌ Gate 6: Reasoning Validity
+* ❌ Gate 7: Output Completeness

-**Functionality:**
-* AI summarizes findings in 3-5 sentences:
-  * How many claims found
-  * Distribution of verdicts
-  * Overall assessment
-* System displays at top of results
+**Rationale:** Prove gate concept works. Add remaining gates in POC2 after validating approach.
715 715 716 -**Critical:** NO MANUAL EDITING ALLOWED 717 717 718 -**Acceptance Criteria:** 719 -* Summary is coherent 720 -* Accurately reflects analysis 721 -* 3-5 sentences 722 -* Automatically generated 213 +=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) === 723 723 724 - ---215 +**Main Requirements:** 725 725 726 -=== FR-POC-5: Article Summary (Fully Automated, Optional) === 217 +* NFR1: Response time < 30 seconds 218 +* NFR2: Handle 1000+ concurrent users 219 +* NFR3: 99.9% uptime 727 727 728 -** Requirement:** AI generates brief summaryof original article221 +**POC Implementation:** 729 729 730 -**Functionality:** 731 -* AI summarizes article content (not FactHarbor's analysis) 732 -* 3-5 sentences 733 -* System displays 223 +* ⚠️ **Response time monitored** (not optimized) 224 +* ⚠️ **Single-threaded processing** (no concurrency) 225 +* ⚠️ **Basic error handling** (no advanced retry logic) 734 734 735 -** Note:** Optional- canskipiftime limited227 +**Rationale:** POC proves functionality. Performance optimization happens in POC2. 736 736 737 -**C ritical:**NO MANUAL EDITING ALLOWED229 +**POC Acceptance:** 738 738 739 -**Acceptance Criteria:** 740 -* Summary is neutral (article's position) 741 -* Accurately reflects article content 742 -* 3-5 sentences 743 -* Automatically generated 231 +* Analysis completes (no timeout requirement) 232 +* Errors don't crash system 233 +* Basic logging in place 744 744 745 - ---235 +== 4. 
What's NOT in POC Scope == 746 746 747 -=== FR-POC-6:PublicationModeDisplay===237 +=== 4.1 User-Facing Features (Beta 0+) === 748 748 749 -**Requirement:** Clear labeling of AI-generated content 239 +{{warning}} 240 +**Deferred to Beta 0:** 241 +{{/warning}} 750 750 751 -**Functionality:** 752 -* Display Mode 2 publication label 753 -* Show POC/Demo disclaimer 754 -* Display risk tiers per claim 755 -* Show quality gate status 756 -* Display timestamp 243 +**Out of Scope:** 757 757 758 -**Acceptance Criteria:** 759 -* Label is prominent and clear 760 -* User understands this is AI-generated POC output 761 -* Risk tiers are color-coded 762 -* Quality gate status is visible 245 +* ❌ User accounts and authentication (FR8) 246 +* ❌ User corrections system (FR9, FR45-46) 247 +* ❌ Public publishing interface (FR10) 248 +* ❌ Social sharing (FR11) 249 +* ❌ Email notifications (FR12) 250 +* ❌ API access (FR13) 763 763 764 - ---252 +**Rationale:** POC validates AI capabilities. User features added in Beta 0. 765 765 766 -=== FR-POC-7: Quality Gate Execution === 767 767 768 - **Requirement:**Executesimplifiedquality gates255 +=== 4.2 Advanced Features (V1.0+) === 769 769 770 -**Functionality:** 771 -* Check source quality (basic) 772 -* Attempt contradiction search (basic) 773 -* Calculate confidence scores 774 -* Verify structural integrity (basic) 775 -* Display gate results 257 +**Out of Scope:** 776 776 777 -**Acceptance Criteria:** 778 -* All 4 gates attempted 779 -* Pass/fail status displayed 780 -* Failures explained to user 781 -* Gates don't block publication (POC mode) 259 +* ❌ IFCN compliance (FR47) 260 +* ❌ ClaimReview schema (FR48) 261 +* ❌ Archive.org integration (FR49) 262 +* ❌ OSINT toolkit (FR50) 263 +* ❌ Video verification (FR51) 264 +* ❌ Deepfake detection (FR52) 265 +* ❌ Cross-org sharing (FR53) 782 782 783 -- --267 +**Rationale:** Advanced features require proven platform. Added post-V1.0. 784 784 785 -== 9. 
Non-Functional Requirements == 786 786 787 -=== NFR-POC-1: Fully Automated Processing === 270 +=== 4.3 Production Requirements (POC2, Beta 0) === 788 788 789 -**Requirement:** Complete AI automation with zero manual intervention 272 +**Out of Scope:** 790 790 791 -**Critical Rule:** NO MANUAL EDITING AT ANY STAGE 274 +* ❌ Security controls (NFR4, NFR12) 275 +* ❌ Code maintainability (NFR5) 276 +* ❌ System monitoring (NFR13) 277 +* ❌ Evidence deduplication 278 +* ❌ Advanced source verification 279 +* ❌ Full 7-gate quality framework 792 792 793 -**What this means:** 794 -* Claims: AI selects (no human curation) 795 -* Scenarios: N/A (deferred to POC2) 796 -* Evidence: AI evaluates (no human selection) 797 -* Verdicts: AI determines (no human adjustment) 798 -* Summaries: AI writes (no human editing) 281 +**Rationale:** POC proves concept. Production hardening happens in POC2 and Beta 0. 799 799 800 -**Pipeline:** 801 -{{code}} 802 -User Input → AKEL Processing → Output Display 803 - ↓ 804 - ZERO human editing 805 -{{/code}} 806 806 807 -**If AI output is poor:** 808 -* ❌ Do NOT manually fix it 809 -* ✅ Document the failure 810 -* ✅ Improve prompts and retry 811 -* ✅ Accept that POC might fail 284 +== 5. POC Output Specification == 812 812 813 -**Why this matters:** 814 -* Tests whether AI can do this without humans 815 -* Validates scalability (humans can't review every analysis) 816 -* Honest test of technical feasibility 286 +=== 5.1 Required Output Elements === 817 817 818 ---- 288 +For each analyzed claim, POC must produce: 819 819 820 -=== NFR-POC-2: Performance === 290 +**1. Claim** 293 +* Original text 294 +* Classification (factual/non-factual/ambiguous) 295 +* If non-factual: Clear reason why 821 821 822 -**Requirement:** Analysis completes in reasonable time 297 +**2.
Scenarios** (if factual) 823 823 824 -**Acceptable Performance:** 825 -* Processing time: 1-5 minutes (acceptable for POC) 826 -* Display loading indicator to user 827 -* Show progress if possible ("Extracting claims...", "Generating verdicts...") 299 +* 2-3 interpretation scenarios 300 +* Each scenario clearly described 828 828 829 -**Not Required:** 830 -* Production-level speed (< 30 seconds) 831 -* Optimization for scale 832 -* Caching 302 +**3. Evidence** (if factual) 833 833 834 -**Acceptance Criteria:** 835 -* Analysis completes within 5 minutes 836 -* User sees loading indicator 837 -* No timeout errors 304 +* Supporting evidence (2+ items) 305 +* Opposing evidence (if exists) 306 +* Source URLs and reliability scores 838 838 839 ---- 308 +**4. Verdict** (if factual) 840 840 841 -=== NFR-POC-3: Reliability === 310 +* Probability (0-100%) 311 +* Uncertainty quantification 312 +* Confidence level (LOW/MEDIUM/HIGH) 313 +* Reasoning chain 842 842 843 -**Requirement:** System works for manual testing sessions 315 +**5.
Quality Status** 844 844 845 -**Acceptable:** 846 -* Occasional errors (< 20% failure rate) 847 -* Manual restart if needed 848 -* Display error messages clearly 317 +* Which gates passed/failed 318 +* If failed: Clear explanation why 849 849 850 -**Not Required:** 851 -* 99.9% uptime 852 -* Automatic error recovery 853 -* Production monitoring 320 +=== 5.2 Example POC Output === 854 854 855 -**Acceptance Criteria:** 856 -* System works for test demonstrations 857 -* Errors are handled gracefully 858 -* User receives clear error messages 859 - 860 ---- 861 - 862 -=== NFR-POC-4: Environment === 863 - 864 -**Requirement:** Runs on simple infrastructure 865 - 866 -**Acceptable:** 867 -* Single machine or simple cloud setup 868 -* No distributed architecture 869 -* No load balancing 870 -* No redundancy 871 -* Local development environment viable 872 - 873 -**Not Required:** 874 -* Production infrastructure 875 -* Multi-region deployment 876 -* Auto-scaling 877 -* Disaster recovery 878 - 879 ---- 880 - 881 -== 10. Technical Architecture == 882 - 883 -=== 10.1 System Components === 884 - 885 -**Frontend:** 886 -* Simple HTML form (text input + URL input + button) 887 -* Loading indicator 888 -* Results display page (single page, no tabs/navigation) 889 - 890 -**Backend:** 891 -* Single API endpoint 892 -* Calls Claude API (Sonnet 4.5 or latest) 893 -* Parses response 894 -* Returns JSON to frontend 895 - 896 -**Data Storage:** 897 -* None required (stateless POC) 898 -* Optional: Simple file storage or SQLite for demo examples 899 - 900 -**External Services:** 901 -* Claude API (Anthropic) - required 902 -* Optional: URL fetch service for article text extraction 903 - 904 ---- 905 - 906 -=== 10.2 Processing Flow === 907 - 908 -{{code}} 909 -1. User submits text or URL 910 - ↓ 911 -2. Backend receives request 912 - ↓ 913 -3. If URL: Fetch article text 914 - ↓ 915 -4. 
Call Claude API with single prompt: 916 - "Extract claims, evaluate each, provide verdicts" 917 - ↓ 918 -5. Claude API returns: 919 - - Analysis summary 920 - - Claims list 921 - - Verdicts for each claim (with risk tiers) 922 - - Article summary (optional) 923 - - Quality gate results 924 - ↓ 925 -6. Backend parses response 926 - ↓ 927 -7. Frontend displays results with Mode 2 labeling 322 +{{code language="json"}} 323 +{ 324 + "claim": { 325 + "text": "Switzerland has the highest life expectancy in Europe", 326 + "type": "factual", 327 + "gate1_status": "PASS" 328 + }, 329 + "scenarios": [ 330 + "Switzerland's overall life expectancy is highest", 331 + "Switzerland ranks highest for specific age groups" 332 + ], 333 + "evidence": { 334 + "supporting": [ 335 + { 336 + "source": "WHO Report 2023", 337 + "reliability": 0.95, 338 + "excerpt": "Switzerland: 83.4 years average..." 339 + } 340 + ], 341 + "opposing": [ 342 + { 343 + "source": "Eurostat 2024", 344 + "reliability": 0.90, 345 + "excerpt": "Spain leads at 83.5 years..." 346 + } 347 + ] 348 + }, 349 + "verdict": { 350 + "probability": 0.65, 351 + "uncertainty": 0.15, 352 + "confidence": "MEDIUM", 353 + "reasoning": "WHO and Eurostat show similar but conflicting data...", 354 + "gate4_status": "PASS" 355 + } 356 +} 928 928 {{/code}} 929 929 930 -**Key Simplification:** Single API call does entire analysis 931 931 932 ---- 360 +== 6. Success Criteria == 933 933 934 -=== 10.3 AI Prompt Strategy === 362 +{{success}} 363 +**POC Success Definition:** POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts with quality gates improving output quality. 364 +{{/success}} 935 935 936 -**Single Comprehensive Prompt:** 937 -{{code}} 938 -Task: Analyze this article and provide: 366 +=== 6.1 Functional Success === 939 939 940 -1. Extract 3-5 factual claims from the article 941 -2.
For each claim: 942 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 943 - - Assign confidence score (0-100%) 944 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 945 - - Write brief reasoning (1-3 sentences) 946 -3. Run quality gates: 947 - - Check: ≥2 sources found 948 - - Attempt: Basic contradiction search 949 - - Calculate: Confidence scores 950 - - Verify: Structural integrity 951 -4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment) 952 -5. Write article summary (3-5 sentences: neutral summary of article content) 368 +POC is successful if: 953 953 954 -Return as structured JSON with quality gate results. 955 -{{/code}} 370 +✅ **FR1-FR7 Requirements Met:** 956 956 957 -**One prompt generates everything.** 372 +1. Extracts 3-5 factual claims from test articles 373 +2. Generates 2-3 scenarios per ambiguous claim 374 +3. Finds supporting AND opposing evidence 375 +4. Computes probabilistic verdicts with uncertainty 376 +5. Provides clear reasoning chains 958 958 959 ---- 378 +✅ **Quality Gates Work:** 960 960 961 -=== 10.4 Technology Stack Suggestions === 380 +1. Gate 1 blocks non-factual claims (100% block rate) 381 +2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6) 382 +3. Clear rejection reasons provided 962 962 963 -**Frontend:** 964 -* HTML + CSS + JavaScript (minimal framework) 965 -* OR: Next.js (if team prefers) 966 -* Hosted: Local machine OR Vercel/Netlify free tier 384 +✅ **NFR11 Met:** 967 967 968 -**Backend:** 969 -* Python Flask/FastAPI (simple REST API) 970 -* OR: Next.js API routes (if using Next.js) 971 -* Hosted: Local machine OR Railway/Render free tier 386 +1. Quality gates reduce hallucination rate 387 +2. Blocked outputs have clear explanations 388 +3.
Quality metrics are logged 972 972 973 -**AKEL Integration:** 974 -* Claude API via Anthropic SDK 975 -* Model: Claude Sonnet 4.5 or latest available 390 +=== 6.2 Quality Thresholds === 976 976 977 -**Database:** 978 -* None (stateless acceptable) 979 -* OR: SQLite if want to store demo examples 980 -* OR: JSON files on disk 392 +**Minimum Acceptable:** 981 981 982 -**Deployment:** 983 -* Local development environment sufficient for POC 984 -* Optional: Deploy to cloud for remote demos 394 +* ≥70% of test claims correctly classified (factual/non-factual) 395 +* ≥60% of verdicts are reasonable (human evaluation) 396 +* Gate 1 blocks 100% of non-factual claims 397 +* Gate 4 blocks verdicts with <2 sources 985 985 986 ---- 399 +**Target:** 987 987 988 -== 11. Success Criteria == 401 +* ≥80% claims correctly classified 402 +* ≥75% verdicts are reasonable 403 +* <10% false positives (blocking good claims) 989 989 990 -=== 11.1 Minimum Success (POC Passes) === 405 +=== 6.3 POC Decision Gate === 991 991 992 -**Required for GO decision:** 993 -* ✅ AI extracts 3-5 factual claims automatically 994 -* ✅ AI provides verdict for each claim automatically 995 -* ✅ Verdicts are reasonable (≥70% make logical sense) 996 -* ✅ Analysis summary is coherent 997 -* ✅ Output is comprehensible to reviewers 998 -* ✅ Team/advisors understand the output 999 -* ✅ Team agrees approach has merit 1000 -* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention) 407 +**After POC1, we decide:** 1001 1001 1002 -**Quality Definition:** 1003 -* "Reasonable verdict" = Defensible given general knowledge 1004 -* "Coherent summary" = Logically structured, grammatically correct 1005 -* "Comprehensible" = Reviewers understand what analysis means 409 +**✅ PROCEED to POC2** if: 1006 1006 1007 ---- 411 +* Success criteria met 412 +* Quality gates demonstrably improve output 413 +* Core workflow is technically sound 414 +* Clear path to production quality 1008 1008 1009 -
=== 11.2 POC Fails If === 416 +**⚠️ ITERATE POC1** if: 1010 1010 1011 -**Automatic NO-GO if any of these:** 1012 -* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1013 -* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1014 -* ❌ Output incomprehensible (reviewers can't understand analysis) 1015 -* ❌ **Requires manual editing for most analyses** (> 50% need human correction) 1016 -* ❌ Team loses confidence in AI-automated approach 418 +* Success criteria partially met 419 +* Gates work but need tuning 420 +* Core issues identified but fixable 1017 1017 1018 ---- 422 +**❌ PIVOT APPROACH** if: 1019 1019 1020 -=== 11.3 Quality Thresholds === 424 +* Success criteria not met 425 +* Fundamental AI limitations discovered 426 +* Quality gates insufficient 427 +* Alternative approach needed 1021 1021 1022 -**POC quality expectations:** 429 +== 7. Test Cases == 1023 1023 1024 -|=Component|=Quality Threshold|=Definition 1025 -|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases 1026 -|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided 1027 -|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant 1028 -|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims 431 +=== 7.1 Happy Path === 1029 1029 1030 -**Analogy:** "B student" quality (70-80%), not "A+" perfection yet 433 +**Test 1: Simple Factual Claim** 1031 1031 1032 -**Not expecting:** 1033 -* 100% accuracy 1034 -* Perfect claim coverage 1035 -* Comprehensive evidence gathering 1036 -* Flawless verdicts 1037 -* Production polish 435 +* Input: "Paris is the capital of France" 436 +* Expected: Factual, 1 scenario, verdict 95% true 1038 1038 1039 -**Expecting:** 1040 -* Reasonable claim extraction 1041 -* Defensible verdicts 1042 -* Understandable reasoning 1043 -* Useful output
438 +**Test 2: Ambiguous Claim** 1044 1044 1045 ---- 440 +* Input: "Switzerland has the highest income in Europe" 441 +* Expected: Factual, 2-3 scenarios, verdict with uncertainty 1046 1046 1047 -== 12. Test Cases == 443 +**Test 3: Statistical Claim** 1048 1048 1049 -=== 12.1 Test Case 1: Simple Factual Claim === 445 +* Input: "10% of people have condition X" 446 +* Expected: Factual, evidence with numbers, probabilistic verdict 1050 1050 1051 -**Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 448 +=== 7.2 Edge Cases === 1052 1052 1053 -**Expected Output:** 1054 -* Extract claim correctly 1055 -* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1056 -* Confidence: 70-90% 1057 -* Risk tier: C (Low) 1058 -* Reasoning: Mentions studies or evidence 450 +**Test 4: Opinion** 1059 1059 1060 -**Success:** Verdict is reasonable and reasoning makes sense 452 +* Input: "Paris is the best city" 453 +* Expected: Non-factual (opinion), blocked by Gate 1 1061 1061 1062 ---- 455 +**Test 5: Prediction** 1063 1063 1064 -=== 12.2 Test Case 2: Complex News Article === 457 +* Input: "Bitcoin will reach $100,000 next year" 458 +* Expected: Non-factual (prediction), blocked by Gate 1 1065 1065 1066 -**Input:** News article URL with multiple claims about politics/health/science 460 +**Test 6: Insufficient Evidence** 1067 1067 1068 -**Expected Output:** 1069 -* Extract 3-5 key claims 1070 -* Verdict for each (may vary: some supported, some uncertain, some refuted) 1071 -* Coherent analysis summary 1072 -* Article summary 1073 -* Risk tiers assigned appropriately 462 +* Input: Obscure factual claim with no sources 463 +* Expected: Blocked by Gate 4 (<2 sources) 1074 1074 1075 -**Success:** Claims identified are actually from article, verdicts are reasonable 465 +=== 7.3 Quality Gate Tests === 1076 1076 1077 ---- 467 +**Test 7: Gate 1 Effectiveness** 1078 1078 1079 -=== 12.3 Test Case 3: Controversial Topic === 469 +* Input: Mix of 10 factual + 10 non-factual claims 470 +* Expected: Gate 1
blocks all 10 non-factual (100% block rate) 1080 1080 1081 -**Input:** Article on contested political or scientific topic 472 +**Test 8: Gate 4 Effectiveness** 1082 1082 1083 -**Expected Output:** 1084 -* Balanced analysis 1085 -* Acknowledges uncertainty where appropriate 1086 -* Doesn't overstate confidence 1087 -* Reasoning shows awareness of complexity 474 +* Input: Claims with varying evidence availability 475 +* Expected: Gate 4 blocks low-confidence verdicts 1088 1088 1089 -**Success:** Analysis is fair and doesn't show obvious bias 477 +== 8. Technical Architecture (POC) == 1090 1090 1091 ---- 479 +=== 8.1 Simplified Architecture === 1092 1092 1093 -=== 12.4 Test Case 4: Clearly False Claim === 481 +**POC Tech Stack:** 1094 1094 1095 -**Input:** Article with obviously false claim (e.g., "The Earth is flat") 483 +* **Frontend:** Simple web interface (Next.js + TypeScript) 484 +* **Backend:** Single API endpoint 485 +* **AI:** Claude API (Sonnet 4.5) 486 +* **Database:** Local JSON files (no database) 487 +* **Deployment:** Single server 1096 1096 1097 -**Expected Output:** 1098 -* Extract claim 1099 -* Verdict: REFUTED 1100 -* High confidence (> 90%) 1101 -* Risk tier: C (Low - established fact) 1102 -* Clear reasoning 489 +**Architecture Diagram:** See [[POC1 Specification>>FactHarbor.Specification.POC.Specification]] 1103 1103 1104 -**Success:** AI correctly identifies false claim with high confidence 1105 1105 1106 ---- 492 +=== 8.2 AKEL Implementation === 1107 1107 1108 -=== 12.5 Test Case 5: Genuinely Uncertain Claim === 494 +**POC AKEL:** 1109 1109 1110 -**Input:** Article with claim where evidence is genuinely mixed 496 +* Single-threaded processing 497 +* Synchronous API calls 498 +* No caching 499 +* Basic error handling 500 +* Console logging 1111 1111 1112 -**Expected Output:** 1113 -* Extract claim 1114 -* Verdict: UNCERTAIN 1115 -* Moderate confidence (40-60%) 1116 -* Reasoning explains why uncertain 502 +**Full AKEL (POC2+):** 1117 1117 1118 -**Success:** AI
recognizes uncertainty and doesn't overstate confidence 504 +* Multi-threaded processing 505 +* Async API calls 506 +* Evidence caching 507 +* Advanced error handling with retry 508 +* Structured logging + monitoring 1119 1119 1120 ---- 510 +== 9. POC Philosophy == 1121 1121 1122 -=== 12.6 Test Case 6: High-Risk Medical Claim === 512 +{{info}} 513 +**Important:** POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases. 514 +{{/info}} 1123 1123 1124 -**Input:** Article making medical claims 516 +=== 9.1 Core Principles === 1125 1125 1126 -**Expected Output:** 1127 -* Extract claim 1128 -* Verdict: [appropriate based on evidence] 1129 -* Risk tier: A (High - medical) 1130 -* Red label displayed 1131 -* Clear disclaimer about not being medical advice 518 +**1. Prove Concept, Not Production** 521 +* POC validates AI can do the job 522 +* Production quality comes in POC2 and Beta 0 523 +* Focus on "does it work?" not "is it perfect?" 1132 1132 1133 -**Success:** Risk tier correctly assigned, appropriate warnings shown 525 +**2. Implement Subset of Requirements** 1134 1134 1135 ---- 527 +* POC covers FR1-7, NFR11 (lite) 528 +* All other requirements deferred 529 +* Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] 1136 1136 1137 -== 13. POC Decision Gate == 531 +**3. Quality Gates Validate Approach** 1138 1138 1139 -=== 13.1 Decision Framework === 533 +* 2 gates prove the concept 534 +* Remaining 5 gates added in POC2 535 +* Gates must demonstrably improve quality 1140 1140 1141 -After POC testing complete, team makes one of three decisions: 537 +**4.
Iterate Based on Results** 1142 1142 1143 -**Option A: GO (Proceed to POC2)** 539 +* POC results determine next steps 540 +* Decision gate after POC1 541 +* Flexibility to pivot if needed 1144 1144 1145 -**Conditions:** 1146 -* AI quality ≥70% without manual editing 1147 -* Basic claim → verdict pipeline validated 1148 -* Internal + advisor feedback positive 1149 -* Technical feasibility confirmed 1150 -* Team confident in direction 1151 -* Clear path to improving AI quality to ≥90% 543 +=== 9.2 Success = Clear Path Forward === 1152 1152 1153 -**Next Steps:** 1154 -* Plan POC2 development (add scenarios) 1155 -* Design scenario architecture 1156 -* Expand to Evidence Model structure 1157 -* Test with more complex articles 1158 1158 1159 ---- 547 +POC succeeds if we can confidently answer: 1160 1160 1161 -**Option B: NO-GO (Pivot or Stop)** 549 +✅ **Technical Feasibility:** 1162 1162 1163 -**Conditions:** 1164 -* AI quality < 60% 1165 -* Requires manual editing for most analyses (> 50%) 1166 -* Feedback indicates fundamental flaws 1167 -* Cost/effort not justified by value 1168 -* No clear path to improvement 551 +* Can AI extract claims reliably? 552 +* Can AI find balanced evidence? 553 +* Can AI compute reasonable verdicts? 1169 1169 1170 -**Next Steps:** 1171 -* **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1172 -* **Stop:** Conclude approach not viable, revisit later 555 +✅ **Quality Approach:** 1173 1173 1174 ---- 557 +* Do quality gates improve output? 558 +* Can we measure and track quality? 559 +* Is the gate approach scalable? 1175 1175 1176 -**Option C: ITERATE (Improve POC)** 561 +✅ **Production Path:** 1177 1177 1178 -**Conditions:** 1179 -* Concept has merit but execution needs work 1180 -* Specific improvements identified 1181 -* Addressable with better prompts/approach 1182 -* AI quality between 60-70% 563 +* Is the core architecture sound? 564 +* What needs improvement for production?
565 +* Is POC2 the right next step? 1183 1183 1184 -**Next Steps:** 1185 -* Improve AI prompts 1186 -* Test different approaches 1187 -* Re-run POC with improvements 1188 -* Then make GO/NO-GO decision 567 +== 10. Related Pages == 1189 1189 1190 ---- 569 +* **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset) 570 +* **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs 571 +* **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview 572 +* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases 573 +* **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements) 1191 1191 1192 -=== 13.2 Decision Criteria Summary === 575 +**Document Owner:** Technical Team 576 +**Review Frequency:** After each POC iteration 577 +**Version History:** 1193 1193 1194 -{{code}} 1195 -AI Quality < 60% → NO-GO (approach doesn't work) 1196 -AI Quality 60-70% → ITERATE (improve and retry) 1197 -AI Quality ≥70% → GO (proceed to POC2) 1198 -{{/code}} 1199 - 1200 ---- 1201 - 1202 -== 14. 
Key Risks & Mitigations == 1203 - 1204 -=== 14.1 Risk: AI Quality Not Good Enough === 1205 - 1206 -**Likelihood:** Medium-High 1207 -**Impact:** POC fails 1208 - 1209 -**Mitigation:** 1210 -* Extensive prompt engineering and testing 1211 -* Use best available AI models (Sonnet 4.5) 1212 -* Test with diverse article types 1213 -* Iterate on prompts based on results 1214 - 1215 -**Acceptance:** This is what POC tests - be ready for failure 1216 - 1217 ---- 1218 - 1219 -=== 14.2 Risk: AI Consistency Issues === 1220 - 1221 -**Likelihood:** Medium 1222 -**Impact:** Works sometimes, fails other times 1223 - 1224 -**Mitigation:** 1225 -* Test with 10+ diverse articles 1226 -* Measure success rate honestly 1227 -* Improve prompts to increase consistency 1228 - 1229 -**Acceptance:** Some variability OK if average quality ≥70% 1230 - 1231 ---- 1232 - 1233 -=== 14.3 Risk: Output Incomprehensible === 1234 - 1235 -**Likelihood:** Low-Medium 1236 -**Impact:** Users can't understand analysis 1237 - 1238 -**Mitigation:** 1239 -* Create clear explainer document 1240 -* Iterate on output format 1241 -* Test with non-technical reviewers 1242 -* Simplify language if needed 1243 - 1244 -**Acceptance:** Iterate until comprehensible 1245 - 1246 ---- 1247 - 1248 -=== 14.4 Risk: API Rate Limits / Costs === 1249 - 1250 -**Likelihood:** Low 1251 -**Impact:** System slow or expensive 1252 - 1253 -**Mitigation:** 1254 -* Monitor API usage 1255 -* Implement retry logic 1256 -* Estimate costs before scaling 1257 - 1258 -**Acceptance:** POC can be slow and expensive (optimization later) 1259 - 1260 ---- 1261 - 1262 -=== 14.5 Risk: Scope Creep === 1263 - 1264 -**Likelihood:** Medium 1265 -**Impact:** POC becomes too complex 1266 - 1267 -**Mitigation:** 1268 -* Strict scope discipline 1269 -* Say NO to feature additions 1270 -* Keep focus on core question 1271 - 1272 -**Acceptance:** POC is minimal by design 1273 - 1274 ---- 1275 - 1276 -== 15. 
POC Philosophy == 1277 - 1278 -=== 15.1 Core Principles === 1279 - 1280 -**1. Build Less, Learn More** 1281 -* Minimum features to test hypothesis 1282 -* Don't build unvalidated features 1283 -* Focus on core question only 1284 - 1285 -**2. Fail Fast** 1286 -* Quick test of hardest part (AI capability) 1287 -* Accept that POC might fail 1288 -* Better to discover issues early 1289 -* Honest assessment over optimistic hope 1290 - 1291 -**3. Test First, Build Second** 1292 -* Validate AI can do this before building platform 1293 -* Don't assume it will work 1294 -* Let results guide decisions 1295 - 1296 -**4. Automation First** 1297 -* No manual editing allowed 1298 -* Tests scalability, not just feasibility 1299 -* Proves approach can work at scale 1300 - 1301 -**5. Honest Assessment** 1302 -* Don't cherry-pick examples 1303 -* Don't manually fix bad outputs 1304 -* Document failures openly 1305 -* Make data-driven decisions 1306 - 1307 ---- 1308 - 1309 -=== 15.2 What POC Is === 1310 - 1311 -✅ Testing AI capability without humans 1312 -✅ Proving core technical concept 1313 -✅ Fast validation of approach 1314 -✅ Honest assessment of feasibility 1315 - 1316 ---- 1317 - 1318 -=== 15.3 What POC Is NOT === 1319 - 1320 -❌ Building a product 1321 -❌ Production-ready system 1322 -❌ Feature-complete platform 1323 -❌ Perfectly accurate analysis 1324 -❌ Polished user experience 1325 - 1326 ---- 1327 - 1328 -== 16. 
Success = Clear Path Forward == 1329 - 1330 -**If POC succeeds (≥70% AI quality):** 1331 -* ✅ Approach validated 1332 -* ✅ Proceed to POC2 (add scenarios) 1333 -* ✅ Design full Evidence Model structure 1334 -* ✅ Test multi-scenario comparison 1335 -* ✅ Focus on improving AI quality from 70% → 90% 1336 - 1337 -**If POC fails (< 60% AI quality):** 1338 -* ✅ Learn what doesn't work 1339 -* ✅ Pivot to different approach 1340 -* ✅ OR wait for better AI technology 1341 -* ✅ Avoid wasting resources on non-viable approach 1342 - 1343 -**Either way, POC provides clarity.** 1344 - 1345 ---- 1346 - 1347 -== 17. Related Pages == 1348 - 1349 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]] 1350 -* [[Requirements>>FactHarbor.Requirements.WebHome]] 1351 -* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]] 1352 -* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1353 -* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1354 -* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] 1355 - 1356 ---- 1357 - 1358 -**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1359 - 579 +* v1.0 - Initial POC requirements 580 +* v2.0 - Updated after specification cross-check 581 +* v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
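
**Illustrative sketch (not part of the specification):** The Gate 4 rule from section 6.1 (block if <2 sources or quality <0.6) could be checked roughly as follows. This is a minimal Python sketch under stated assumptions: the names `Evidence` and `gate4` are hypothetical, supporting and opposing items are counted together as sources, and "quality" is taken to be the mean source reliability, because this page does not define how quality is computed.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_SOURCES = 2    # section 6.1: Gate 4 blocks if <2 sources
MIN_QUALITY = 0.6  # section 6.1: Gate 4 blocks if quality <0.6


@dataclass
class Evidence:
    """Hypothetical evidence record mirroring the example output fields."""
    source: str
    reliability: float  # 0.0-1.0, as in the section 5.2 example


def gate4(evidence: list[Evidence]) -> tuple[str, str]:
    """Return (status, reason); reason is empty when the gate passes.

    ASSUMPTION: quality is the mean reliability of all evidence items.
    """
    if len(evidence) < MIN_SOURCES:
        return "FAIL", f"only {len(evidence)} source(s); at least {MIN_SOURCES} required"
    quality = sum(item.reliability for item in evidence) / len(evidence)
    if quality < MIN_QUALITY:
        return "FAIL", f"mean source reliability {quality:.2f} is below {MIN_QUALITY}"
    return "PASS", ""


# Evidence taken from the section 5.2 example (one supporting, one opposing item).
example = [Evidence("WHO Report 2023", 0.95), Evidence("Eurostat 2024", 0.90)]
status, reason = gate4(example)
```

With the two sources from the section 5.2 example, this sketch passes the gate, which agrees with the `gate4_status` shown there. A single-source claim such as Test 6 would be blocked with an explanation, as the "clear rejection reasons" criterion asks.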