Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/23 18:00
From version 2.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment:
Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
Summary
- Page properties (3 modified, 0 added, 0 removed)
Details
- Page properties
  - Title
    @@ -1,1 +1,1 @@
    -POC Requirements (POC1 & POC2)
    +POC Requirements
  - Parent
    @@ -1,1 +1,1 @@
    -WebHome
    +FactHarbor.Specification.POC.WebHome
  - Content
@@ -1,581 +1,1359 @@
 = POC Requirements =

 **Status:** ✅ Approved for Development
-**Version:** 3.0 (Aligned with Main Requirements)
+**Version:** 2.0 (Updated after Specification Cross-Check)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention

-{{info}}
-**Core Philosophy:** POC validates the [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] through simplified implementation. All POC features map to formal FR/NFR requirements.
-{{/info}}
+---

-
 == 1. POC Overview ==

 === 1.1 What POC Tests ===

 **Core Question:**
-
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

 **What we're proving:**
-
 * AI can identify factual claims from text
-* AI can evaluate those claims with structured evidence
-* Quality gates can filter unreliable outputs
-* The core workflow is technically feasible
+* AI can evaluate those claims and produce verdicts
+* Output is comprehensible and useful
+* Fully automated approach is viable

-**What we're NOT proving:**
+**What we're NOT testing:**
+* Scenario generation (deferred to POC2)
+* Evidence display (deferred to POC2)
+* Production scalability
+* Perfect accuracy
+* Complete feature set

-* Production-ready reliability (that's POC2)
-* User-facing features (that's Beta 0)
-* Full IFCN compliance (that's V1.0)
+---

-=== 1.2 Requirements Mapping ===
+=== 1.2 Scenarios Deferred to POC2 ===

-POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
+**Intentional Simplification:**

-**Scope Summary:**
+Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.

-* **In Scope:** 8 requirements (7 FRs + 1 NFR)
-* **Partial:** 3 NFRs (simplified versions)
-* **Out of Scope:** 19 requirements (deferred to later phases)
+**Rationale:**
+* **POC1 tests:** Can AI extract claims and generate verdicts?
+* **POC2 will add:** Scenario generation and management
+* **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?

-== 2. POC1 Scope ==
+**Design Decision:**

-{{success}}
-**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88 ex 2 new Org Pages.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
+Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.

-The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
-{{/success}}
+**No Risk:**

-**POC1 implements these formal requirements:**
+Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
+* Faster POC1 validation
+* Learning from POC1 to inform scenario design
+* Iterative approach: fail fast if basic AI doesn't work
+* Flexibility to adjust scenario architecture based on POC1 insights

-|= Formal Req |= Implementation in POC1 |= Notes
-| **FR4** | Analysis Summary | Basic format; quality metadata deferred to POC2
-| **FR7** | Automated Verdicts | Full implementation with quality gates (NFR11)
-| **NFR11** | Quality Assurance Framework | 4 quality gates implemented
+**Full System Workflow (Future):**
+{{code}}
+Claims → Scenarios → Evidence → Verdicts
+{{/code}}

-**POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)
+**POC1 Simplified Workflow:**
+{{code}}
+Claims → Verdicts (scenarios implicit in reasoning)
+{{/code}}

-{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
+---

-* Claim extraction (FR1)
-* Claim context (FR2)
-* Multiple scenarios (FR3)
-* Evidence collection (FR5)
-* Source quality assessment (FR6)
-* Time evolution tracking (FR8) - deferred to POC2
-* Audit trail (FR11) - deferred to Beta 0
-* In-article highlighting (FR13) - deferred to Beta 0
+== 2. POC Output Specification ==

-**Partial implementations:**
+=== 2.1 Component 1: ANALYSIS SUMMARY ===

-* NFR1 (Explainability) - Basic only
-* NFR2 (Performance) - Functional but not optimized
-* NFR3 (Transparency) - Basic only
+**What:** Brief overview of findings
+**Length:** 3-5 sentences
+**Content:**
+* How many claims found
+* Distribution of verdicts
+* Overall assessment

-**Detailed POC1 implementation specifications continue below...**
+**Example:**
+{{code}}
+This article makes 4 claims about coffee's health effects. We found
+2 claims are well-supported, 1 is uncertain, and 1 is refuted.
+Overall assessment: mostly accurate with some exaggeration.
+{{/code}}

+---

+=== 2.2 Component 2: CLAIMS IDENTIFICATION ===

-== 3. POC Simplifications ==
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
+**Requirements:**
+* Factual claims only (not opinions/questions)
+* Clearly stated
+* Automatically extracted by AI

-=== 3.1 FR1: Claim Extraction (Full Implementation) ===
+**Example:**
+{{code}}
+CLAIMS IDENTIFIED:

-**Main Requirement:** AI extracts factual claims from input text
+[1] Coffee reduces diabetes risk by 30%
+[2] Coffee improves heart health
+[3] Decaf has same benefits as regular
+[4] Coffee prevents Alzheimer's completely
+{{/code}}

-**POC Implementation:**
+---

-* ✅ AKEL extracts claims using LLM
-* ✅ Each claim includes original text reference
-* ✅ Claims are identified as factual/non-factual
-* ❌ No advanced claim parsing (added in POC2)
+=== 2.3 Component 3: CLAIMS VERDICTS ===

-**Acceptance Criteria:**
+**What:** Verdict for each claim identified
+**Format:** Per claim structure

-* Extracts 3-5 claims from typical article
-* Identifies factual vs non-factual claims
-* Quality Gate 1 validates extraction
+**Required Elements:**
+* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+* **Confidence Score:** 0-100%
+* **Brief Reasoning:** 1-3 sentences explaining why
+* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration

-=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===
+**Example:**
+{{code}}
+VERDICTS:

-**Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
+[1] WELL-SUPPORTED (85%) [Risk: C]
+Multiple studies confirm 25-30% risk reduction with regular consumption.

+[2] UNCERTAIN (65%) [Risk: B]
+Evidence is mixed. Some studies show benefits, others show no effect.
+
+[3] PARTIALLY SUPPORTED (60%) [Risk: C]
+Some benefits overlap, but caffeine-related benefits are reduced in decaf.
+
+[4] REFUTED (90%) [Risk: B]
+No evidence for complete prevention. Claim is significantly overstated.
+{{/code}}
+
+**Risk Tier Display:**
+* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier C (Green):** Low Risk - Facts/Definitions/History
+
+**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
+
+---
+
+=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
+
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
+**Tone:** Neutral (article's position, not FactHarbor's analysis)
+
+**Example:**
+{{code}}
+ARTICLE SUMMARY:
+
+Health News Today article discusses coffee benefits, citing studies
+on diabetes and Alzheimer's. Author highlights research linking coffee
+to disease prevention. Recommends 2-3 cups daily for optimal health.
+{{/code}}
+
+---
+
+=== 2.5 Total Output Size ===
+
+**Combined:** ~200-300 words
+* Analysis Summary: 50-70 words
+* Claims Identification: 30-50 words
+* Claims Verdicts: 100-150 words
+* Article Summary: 30-50 words (optional)
+
+---
+
+== 3. What's NOT in POC Scope ==
+
+=== 3.1 Feature Exclusions ===
+
+The following are **explicitly excluded** from POC:
+
+**Content Features:**
+* ❌ Scenarios (deferred to POC2)
+* ❌ Evidence display (supporting/opposing lists)
+* ❌ Source links (clickable references)
+* ❌ Detailed reasoning chains
+* ❌ Source quality ratings (shown but not detailed)
+* ❌ Contradiction detection (basic only)
+* ❌ Risk assessment (shown but not workflow-integrated)
+
+**Platform Features:**
+* ❌ User accounts / authentication
+* ❌ Saved history
+* ❌ Search functionality
+* ❌ Claim comparison
+* ❌ User contributions
+* ❌ Commenting system
+* ❌ Social sharing
+
+**Technical Features:**
+* ❌ Browser extensions
+* ❌ Mobile apps
+* ❌ API endpoints
+* ❌ Webhooks
+* ❌ Export features (PDF, CSV)
+
+**Quality Features:**
+* ❌ Accessibility (WCAG compliance)
+* ❌ Multilingual support
+* ❌ Mobile optimization
+* ❌ Media verification (images/videos)
+
+**Production Features:**
+* ❌ Security hardening
+* ❌ Privacy compliance (GDPR)
+* ❌ Terms of service
+* ❌ Monitoring/logging
+* ❌ Error tracking
+* ❌ Analytics
+* ❌ A/B testing
+
+---
+
+== 4. POC Simplifications vs. Full System ==
+
+=== 4.1 Architecture Comparison ===
+
+**POC Architecture (Simplified):**
+{{code}}
+User Input → Single AKEL Call → Output Display
+             (all processing)
+{{/code}}
+
+**Full System Architecture:**
+{{code}}
+User Input → Claim Extractor → Claim Classifier → Scenario Generator
+→ Evidence Summarizer → Contradiction Detector → Verdict Generator
+→ Quality Gates → Publication → Output Display
+{{/code}}
+
+**Key Differences:**
+
+|=Aspect|=POC1|=Full System
+|Processing|Single API call|Multi-component pipeline
+|Scenarios|None (implicit)|Explicit entities with versioning
+|Evidence|Basic retrieval|Comprehensive with quality scoring
+|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
+|Workflow|3 steps (input/process/output)|6 phases with gates
+|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
+|Architecture|Single prompt to Claude|AKEL Orchestrator + Components
+
+---
+
+=== 4.2 Workflow Comparison ===
+
+**POC1 Workflow:**
+1. User submits text/URL
+2. Single AKEL call (all processing in one prompt)
+3. Display results
+**Total: 3 steps, ~10-18 seconds**
+
+**Full System Workflow:**
+1. **Claim Submission** (extraction, normalization, clustering)
+2. **Scenario Building** (definitions, assumptions, boundaries)
+3. **Evidence Handling** (retrieval, assessment, linking)
+4. **Verdict Creation** (synthesis, reasoning, approval)
+5. **Public Presentation** (summaries, landscapes, deep dives)
+6. **Time Evolution** (versioning, re-evaluation triggers)
+**Total: 6 phases with quality gates, ~10-30 seconds**
+
+---
+
+=== 4.3 Why POC is Simplified ===
+
+**Engineering Rationale:**
+
+1. **Test core capability first:** Can AI do basic fact-checking without humans?
+2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
+3. **Learn before building:** POC1 insights inform full architecture
+4. **Iterative approach:** Add complexity only after validating foundations
+5. **Resource efficiency:** Don't build full system if core concept fails
+
+**Acceptable Trade-offs:**
+
+* ✅ POC proves AI capability (most risky assumption)
+* ✅ POC validates user comprehension (can people understand output?)
+* ❌ POC doesn't validate full workflow (test in Beta)
+* ❌ POC doesn't validate scale (test in Beta)
+* ❌ POC doesn't validate scenario architecture (design in POC2)
+
+---
+
+=== 4.4 Gap Between POC1 and POC2/Beta ===
+
+**What needs to be built for POC2:**
+* Scenario generation component
+* Evidence Model structure (full)
+* Scenario-evidence linking
+* Multi-interpretation comparison
+* Truth landscape visualization
+
+**What needs to be built for Beta:**
+* Multi-component AKEL pipeline
+* Quality gate infrastructure
+* Review workflow system
+* Audit sampling framework
+* Production data model
+* Federation architecture (Release 1.0)
+
+**POC1 → POC2 is significant architectural expansion.**
+
+---
+
+== 5. Publication Mode & Labeling ==
+
+=== 5.1 POC Publication Mode ===
+
+**Mode:** Mode 2 (AI-Generated, No Prior Human Review)
+
+Per FactHarbor Specification Section 11 "POC v1 Behavior":
+* Produces public AI-generated output
+* No human approval gate
+* Clear AI-Generated labeling
+* All quality gates active (simplified)
+* Risk tier classification shown (demo)
+
+---
+
+=== 5.2 User-Facing Labels ===
+
+**Primary Label (top of analysis):**
+{{code}}
+╔════════════════════════════════════════════════════════════╗
+║ [AI-GENERATED - POC/DEMO] ║
+║ ║
+║ This analysis was produced entirely by AI and has not ║
+║ been human-reviewed. Use for demonstration purposes. ║
+║ ║
+║ Source: AI/AKEL v1.0 (POC) ║
+║ Review Status: Not Reviewed (Proof-of-Concept) ║
+║ Quality Gates: 4/4 Passed (Simplified) ║
+║ Last Updated: [timestamp] ║
+╚════════════════════════════════════════════════════════════╝
+{{/code}}
+
+**Per-Claim Risk Labels:**
+* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
+* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
+* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
+
+---
+
+=== 5.3 Display Requirements ===
+
+**Must Show:**
+* AI-Generated status (prominent)
+* POC/Demo disclaimer
+* Risk tier per claim
+* Confidence scores (0-100%)
+* Quality gate status (passed/failed)
+* Timestamp
+
+**Must NOT Claim:**
+* Human review
+* Production quality
+* Medical/legal advice
+* Authoritative verdicts
+* Complete accuracy
+
+---
+
+=== 5.4 Mode 2 vs. Full System Publication ===
+
+|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
+|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
+|Review|None|None|Human-Reviewed
+|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
+|Audit|None (POC)|Sampling (5-50%)|Pre-publication
+|Risk Display|Demo only|Workflow-integrated|Validated
+|User Actions|View only|Flag for review|Trust rating
+
+---
+
+== 6. Quality Gates (Simplified Implementation) ==
+
+=== 6.1 Overview ===
+
+Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
+
+**Full System Has 4 Gates:**
+1. Source Quality
+2. Contradiction Search (MANDATORY)
+3. Uncertainty Quantification
+4. Structural Integrity
+
+**POC Implements Simplified Versions:**
+* Focus on demonstrating concept
+* Basic implementations sufficient
+* Failures displayed to user (not blocking)
+* Full system has comprehensive validation
+
+---
+
+=== 6.2 Gate 1: Source Quality (Basic) ===
+
+**Full System Requirements:**
+* Primary sources identified and accessible
+* Source reliability scored against whitelist
+* Citation completeness verified
+* Publication dates checked
+* Author credentials validated
+
 **POC Implementation:**
+* ✅ At least 2 sources found
+* ✅ Sources accessible (URLs valid)
+* ❌ No whitelist checking
+* ❌ No credential validation
+* ❌ No comprehensive reliability scoring

-* ✅ AKEL generates 2-3 scenarios per claim
-* ✅ Scenarios capture different interpretations
-* ✅ Each scenario is evaluated separately
-* ✅ Verdict considers all scenarios
+**Pass Criteria:** ≥2 accessible sources found

-**Acceptance Criteria:**
+**Failure Handling:** Display error message, don't generate verdict

-* Generates 2+ scenarios for ambiguous claims
-* Scenarios are meaningfully different
-* All scenarios are evaluated
+---

-=== 3.3 FR4: Analysis Summary (Basic Implementation) ===
+=== 6.3 Gate 2: Contradiction Search (Basic) ===

-**Main Requirement:** Provide user-friendly summary of analysis
+**Full System Requirements:**
+* Counter-evidence actively searched
+* Reservations and limitations identified
+* Alternative interpretations explored
+* Bubble detection (echo chambers, conspiracy theories)
+* Cross-cultural and international perspectives
+* Academic literature (supporting AND opposing)

 **POC Implementation:**
+* ✅ Basic search for counter-evidence
+* ✅ Identify obvious contradictions
+* ❌ No comprehensive academic search
+* ❌ No bubble detection
+* ❌ No systematic alternative interpretation search
+* ❌ No international perspective verification

-* ✅ Simple text summary generated
-* ❌ No rich formatting (added in Beta 0)
-* ❌ No visual elements (added in Beta 0)
-* ❌ No interactive features (added in Beta 0)
+**Pass Criteria:** Basic contradiction search attempted

-**POC Format:**
-```
-Claim: [extracted claim]
-Scenarios: [list of scenarios]
-Evidence: [supporting/opposing evidence]
-Verdict: [probability with uncertainty]
-```
+**Failure Handling:** Note "limited contradiction search" in output

+---

-=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
+=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===

-**Main Requirements:**
+**Full System Requirements:**
+* Confidence scores calculated for all claims/verdicts
+* Limitations explicitly stated
+* Data gaps identified and disclosed
+* Strength of evidence assessed
+* Alternative scenarios considered

-* FR5: Collect supporting and opposing evidence
-* FR6: Evaluate evidence source reliability
+**POC Implementation:**
+* ✅ Confidence scores (0-100%)
+* ✅ Basic uncertainty acknowledgment
+* ❌ No detailed limitation disclosure
+* ❌ No data gap identification
+* ❌ No alternative scenario consideration (deferred to POC2)

+**Pass Criteria:** Confidence score assigned
+
+**Failure Handling:** Show "Confidence: Unknown" if calculation fails
+
+---
+
+=== 6.5 Gate 4: Structural Integrity (Basic) ===
+
+**Full System Requirements:**
+* No hallucinations detected (fact-checking against sources)
+* Logic chain valid and traceable
+* References accessible and verifiable
+* No circular reasoning
+* Premises clearly stated
+
 **POC Implementation:**
+* ✅ Basic coherence check
+* ✅ References accessible
+* ❌ No comprehensive hallucination detection
+* ❌ No formal logic validation
+* ❌ No premise extraction and verification

-* ✅ AKEL searches for evidence (web/knowledge base)
-* ✅ **Mandatory contradiction search** (finds opposing evidence)
-* ✅ Source reliability scoring
-* ❌ No evidence deduplication (added in POC2)
-* ❌ No advanced source verification (added in POC2)
+**Pass Criteria:** Output is coherent and references are accessible

+**Failure Handling:** Display error message
+
+---
+
+=== 6.6 Quality Gate Display ===
+
+**POC shows simplified status:**
+{{code}}
+Quality Gates: 4/4 Passed (Simplified)
+✓ Source Quality: 3 sources found
+✓ Contradiction Search: Basic search completed
+✓ Uncertainty: Confidence scores assigned
+✓ Structural Integrity: Output coherent
+{{/code}}
+
+**If any gate fails:**
+{{code}}
+Quality Gates: 3/4 Passed (Simplified)
+✓ Source Quality: 3 sources found
+✗ Contradiction Search: Search failed - limited evidence
+✓ Uncertainty: Confidence scores assigned
+✓ Structural Integrity: Output coherent
+
+Note: This analysis has limited evidence. Use with caution.
+{{/code}}
+
+---
+
+=== 6.7 Simplified vs. Full System ===
+
+|=Gate|=POC (Simplified)|=Full System
+|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
+|Contradiction|Basic search|Systematic academic + media + international
+|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
+|Structural|Coherence check|Hallucination detection, logic validation, premise check
+
+**POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
+
+---
+
+== 7. AKEL Architecture Comparison ==
+
+=== 7.1 POC AKEL (Simplified) ===
+
+**Implementation:**
+* Single Claude API call (Sonnet 4.5)
+* One comprehensive prompt
+* All processing in single request
+* No separate components
+* No orchestration layer
+
+**Prompt Structure:**
+{{code}}
+Task: Analyze this article and provide:
+
+1. Extract 3-5 factual claims
+2. For each claim:
+   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+   - Assign confidence score (0-100%)
+   - Assign risk tier (A/B/C)
+   - Write brief reasoning (1-3 sentences)
+3. Generate analysis summary (3-5 sentences)
+4. Generate article summary (3-5 sentences)
+5. Run basic quality checks
+
+Return as structured JSON.
+{{/code}}
+
+**Processing Time:** 10-18 seconds (estimate)
+
+---
+
+=== 7.2 Full System AKEL (Production) ===
+
+**Architecture:**
+{{code}}
+AKEL Orchestrator
+├── Claim Extractor
+├── Claim Classifier (with risk tier assignment)
+├── Scenario Generator
+├── Evidence Summarizer
+├── Contradiction Detector
+├── Quality Gate Validator
+├── Audit Sampling Scheduler
+└── Federation Sync Adapter (Release 1.0+)
+{{/code}}
+
+**Processing:**
+* Parallel processing where possible
+* Separate component calls
+* Quality gates between phases
+* Audit sampling selection
+* Cross-node coordination (federated mode)
+
+**Processing Time:** 10-30 seconds (full pipeline)
+
+---
+
+=== 7.3 Why POC Uses Single Call ===
+
+**Advantages:**
+* ✅ Simpler to implement
+* ✅ Faster POC development
+* ✅ Easier to debug
+* ✅ Proves AI capability
+* ✅ Good enough for concept validation
+
+**Limitations:**
+* ❌ No component reusability
+* ❌ No parallel processing
+* ❌ All-or-nothing (can't partially succeed)
+* ❌ Harder to improve individual components
+* ❌ No audit sampling
+
+**Acceptable Trade-off:**
+
+POC tests "Can AI do this?" not "How should we architect it?"
+
+Full component architecture comes in Beta after POC validates concept.
+
+---
+
+=== 7.4 Evolution Path ===
+
+**POC1:** Single prompt → Prove concept
+**POC2:** Add scenario component → Test full pipeline
+**Beta:** Multi-component AKEL → Production architecture
+**Release 1.0:** Full AKEL + Federation → Scale
+
+---
+
+== 8. Functional Requirements ==
+
+=== FR-POC-1: Article Input ===
+
+**Requirement:** User can submit article for analysis
+
+**Functionality:**
+* Text input field (paste article text, up to 5000 characters)
+* URL input field (paste article URL)
+* "Analyze" button to trigger processing
+* Loading indicator during analysis
+
+**Excluded:**
+* No user authentication
+* No claim history
+* No search functionality
+* No saved templates
+
 **Acceptance Criteria:**
+* User can paste text from article
+* User can paste URL of article
+* System accepts input and triggers analysis

-* Finds 2+ supporting evidence items
-* Finds 1+ opposing evidence (if exists)
-* Sources scored for reliability
+---

-=== 3.5 FR7: Automated Verdicts (Full Implementation) ===
+=== FR-POC-2: Claim Extraction (Fully Automated) ===

-**Main Requirement:** AI computes verdicts with uncertainty quantification
+**Requirement:** AI automatically extracts 3-5 factual claims

-**POC Implementation:**
+**Functionality:**
+* AI reads article text
+* AI identifies factual claims (not opinions/questions)
+* AI extracts 3-5 most important claims
+* System displays numbered list

-* ✅ Probabilistic verdicts (0-100% confidence)
-* ✅ Uncertainty explicitly stated
-* ✅ Reasoning chain provided
-* ✅ Quality Gate 4 validates verdict confidence
+**Critical:** NO MANUAL EDITING ALLOWED
+* AI selects which claims to extract
+* AI identifies factual vs. non-factual
+* System processes claims as extracted
+* No human curation or correction

-**POC Output:**
-```
-Verdict: 70% likely true
-Uncertainty: ±15% (moderate confidence)
-Reasoning: Based on 3 high-quality sources...
174 -Confidence Level: MEDIUM 175 -``` 658 +**Error Handling:** 659 +* If extraction fails: Display error message 660 +* User can retry with different input 661 +* No manual intervention to fix extraction 176 176 177 177 **Acceptance Criteria:** 664 +* AI extracts 3-5 claims automatically 665 +* Claims are factual (not opinions) 666 +* Claims are clearly stated 667 +* No manual editing required 178 178 179 -* Verdicts include probability (0-100%) 180 -* Uncertainty explicitly quantified 181 -* Reasoning chain explains verdict 669 +--- 182 182 183 -=== 3.6 NFR11:QualityAssuranceFramework(LITEVERSION) ===671 +=== FR-POC-3: Verdict Generation (Fully Automated) === 184 184 185 -** MainRequirement:**Complete qualityassurancewith7qualitygates673 +**Requirement:** AI automatically generates verdict for each claim 186 186 187 -**POC Implementation:** **2 gates only** 675 +**Functionality:** 676 +* For each claim, AI: 677 + * Evaluates claim based on available evidence/knowledge 678 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED 679 + * Assigns confidence score (0-100%) 680 + * Assigns risk tier (A/B/C) 681 + * Writes brief reasoning (1-3 sentences) 682 +* System displays verdict for each claim 188 188 189 -**Quality Gate 1: Claim Validation** 684 +**Critical:** NO MANUAL EDITING ALLOWED 685 +* AI computes verdicts based on evidence 686 +* AI generates confidence scores 687 +* AI writes reasoning 688 +* No human review or adjustment 190 190 191 -* ✅ Validates claim is factual and verifiable 192 -* ✅ Blocks non-factual claims (opinion/prediction/ambiguous) 193 -* ✅ Provides clear rejection reason 690 +**Error Handling:** 691 +* If verdict generation fails: Display error message 692 +* User can retry 693 +* No manual intervention to adjust verdicts 194 194 195 -**Quality Gate 4: Verdict Confidence Assessment** 695 +**Acceptance Criteria:** 696 +* Each claim has a verdict 697 +* Confidence score is displayed (0-100%) 698 +* Risk tier is 
displayed (A/B/C) 699 +* Reasoning is understandable (1-3 sentences) 700 +* Verdict is defensible given reasoning 701 +* All generated automatically by AI 196 196 197 -* ✅ Validates ≥2 sources found 198 -* ✅ Validates quality score ≥0.6 199 -* ✅ Blocks low-confidence verdicts 200 -* ✅ Provides clear rejection reason 703 +--- 201 201 202 - **OutofScope(POC2+):**705 +=== FR-POC-4: Analysis Summary (Fully Automated) === 203 203 204 -* ❌ Gate 2: Evidence Relevance 205 -* ❌ Gate 3: Scenario Coherence 206 -* ❌ Gate 5: Source Diversity 207 -* ❌ Gate 6: Reasoning Validity 208 -* ❌ Gate 7: Output Completeness 707 +**Requirement:** AI generates brief summary of analysis 209 209 210 -**Rationale:** Prove gate concept works. Add remaining gates in POC2 after validating approach. 709 +**Functionality:** 710 +* AI summarizes findings in 3-5 sentences: 711 + * How many claims found 712 + * Distribution of verdicts 713 + * Overall assessment 714 +* System displays at top of results 211 211 716 +**Critical:** NO MANUAL EDITING ALLOWED 212 212 213 -=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) === 718 +**Acceptance Criteria:** 719 +* Summary is coherent 720 +* Accurately reflects analysis 721 +* 3-5 sentences 722 +* Automatically generated 214 214 215 - **Main Requirements:**724 +--- 216 216 217 -* NFR1: Response time < 30 seconds 218 -* NFR2: Handle 1000+ concurrent users 219 -* NFR3: 99.9% uptime 726 +=== FR-POC-5: Article Summary (Fully Automated, Optional) === 220 220 221 -** POC Implementation:**728 +**Requirement:** AI generates brief summary of original article 222 222 223 -* ⚠️ **Response time monitored** (not optimized) 224 -* ⚠️ **Single-threaded processing** (no concurrency) 225 -* ⚠️ **Basic error handling** (no advanced retry logic) 730 +**Functionality:** 731 +* AI summarizes article content (not FactHarbor's analysis) 732 +* 3-5 sentences 733 +* System displays 226 226 227 -** Rationale:**POCproves functionality.Performanceoptimization happensin 
POC2.735 +**Note:** Optional - can skip if time limited 228 228 229 -** POCAcceptance:**737 +**Critical:** NO MANUAL EDITING ALLOWED 230 230 231 -* Analysis completes (no timeout requirement) 232 -* Errors don't crash system 233 -* Basic logging in place 739 +**Acceptance Criteria:** 740 +* Summary is neutral (article's position) 741 +* Accurately reflects article content 742 +* 3-5 sentences 743 +* Automatically generated 234 234 235 - == 4. What's NOT in POC Scope ==745 +--- 236 236 237 -=== 4.1 User-FacingFeatures(Beta0+)===747 +=== FR-POC-6: Publication Mode Display === 238 238 239 -{{warning}} 240 -**Deferred to Beta 0:** 241 -{{/warning}} 749 +**Requirement:** Clear labeling of AI-generated content 242 242 243 -**Out of Scope:** 751 +**Functionality:** 752 +* Display Mode 2 publication label 753 +* Show POC/Demo disclaimer 754 +* Display risk tiers per claim 755 +* Show quality gate status 756 +* Display timestamp 244 244 245 -* ❌ User accounts and authentication (FR8) 246 -* ❌ User corrections system (FR9, FR45-46) 247 -* ❌ Public publishing interface (FR10) 248 -* ❌ Social sharing (FR11) 249 -* ❌ Email notifications (FR12) 250 -* ❌ API access (FR13) 758 +**Acceptance Criteria:** 759 +* Label is prominent and clear 760 +* User understands this is AI-generated POC output 761 +* Risk tiers are color-coded 762 +* Quality gate status is visible 251 251 252 - **Rationale:** POC validates AI capabilities. 
User features added in Beta 0.
+---

+=== FR-POC-7: Quality Gate Execution ===

-=== 4.2 Advanced Features (V1.0+) ===
+**Requirement:** Execute simplified quality gates

-**Out of Scope:**
+**Functionality:**
+* Check source quality (basic)
+* Attempt contradiction search (basic)
+* Calculate confidence scores
+* Verify structural integrity (basic)
+* Display gate results

-* ❌ IFCN compliance (FR47)
-* ❌ ClaimReview schema (FR48)
-* ❌ Archive.org integration (FR49)
-* ❌ OSINT toolkit (FR50)
-* ❌ Video verification (FR51)
-* ❌ Deepfake detection (FR52)
-* ❌ Cross-org sharing (FR53)
+**Acceptance Criteria:**
+* All 4 gates attempted
+* Pass/fail status displayed
+* Failures explained to user
+* Gates don't block publication (POC mode)

-**Rationale:** Advanced features require proven platform. Added post-V1.0.
+---

+== 9. Non-Functional Requirements ==

-=== 4.3 Production Requirements (POC2, Beta 0) ===
+=== NFR-POC-1: Fully Automated Processing ===

-**Out of Scope:**
+**Requirement:** Complete AI automation with zero manual intervention

-* ❌ Security controls (NFR4, NFR12)
-* ❌ Code maintainability (NFR5)
-* ❌ System monitoring (NFR13)
-* ❌ Evidence deduplication
-* ❌ Advanced source verification
-* ❌ Full 7-gate quality framework
+**Critical Rule:** NO MANUAL EDITING AT ANY STAGE

-**Rationale:** POC proves concept. Production hardening happens in POC2 and Beta 0.
+**What this means:**
+* Claims: AI selects (no human curation)
+* Scenarios: N/A (deferred to POC2)
+* Evidence: AI evaluates (no human selection)
+* Verdicts: AI determines (no human adjustment)
+* Summaries: AI writes (no human editing)

+**Pipeline:**
+{{code}}
+User Input → AKEL Processing → Output Display
+                  ↓
+         ZERO human editing
+{{/code}}

-== 5. POC Output Specification ==
+**If AI output is poor:**
+* ❌ Do NOT manually fix it
+* ✅ Document the failure
+* ✅ Improve prompts and retry
+* ✅ Accept that POC might fail

-=== 5.1 Required Output Elements ===
+**Why this matters:**
+* Tests whether AI can do this without humans
+* Validates scalability (humans can't review every analysis)
+* Honest test of technical feasibility

-For each analyzed claim, POC must produce:
+---

-*
-**
-**1. Claim
-* Original text
-* Classification (factual/non-factual/ambiguous)
-* If non-factual: Clear reason why
+=== NFR-POC-2: Performance ===

-**2. Scenarios** (if factual)
+**Requirement:** Analysis completes in reasonable time

-* 2-3 interpretation scenarios
-* Each scenario clearly described
+**Acceptable Performance:**
+* Processing time: 1-5 minutes (acceptable for POC)
+* Display loading indicator to user
+* Show progress if possible ("Extracting claims...", "Generating verdicts...")

-**3. Evidence** (if factual)
+**Not Required:**
+* Production-level speed (< 30 seconds)
+* Optimization for scale
+* Caching

-* Supporting evidence (2+ items)
-* Opposing evidence (if exists)
-* Source URLs and reliability scores
+**Acceptance Criteria:**
+* Analysis completes within 5 minutes
+* User sees loading indicator
+* No timeout errors

-**4.
Verdict** (if factual)
+---

-* Probability (0-100%)
-* Uncertainty quantification
-* Confidence level (LOW/MEDIUM/HIGH)
-* Reasoning chain
+=== NFR-POC-3: Reliability ===

-**5. Quality Status**
+**Requirement:** System works for manual testing sessions

-* Which gates passed/failed
-* If failed: Clear explanation why
+**Acceptable:**
+* Occasional errors (< 20% failure rate)
+* Manual restart if needed
+* Display error messages clearly

-=== 5.2 Example POC Output ===
+**Not Required:**
+* 99.9% uptime
+* Automatic error recovery
+* Production monitoring

-{{code language="json"}}
-{
-  "claim": {
-    "text": "Switzerland has the highest life expectancy in Europe",
-    "type": "factual",
-    "gate1_status": "PASS"
-  },
-  "scenarios": [
-    "Switzerland's overall life expectancy is highest",
-    "Switzerland ranks highest for specific age groups"
-  ],
-  "evidence": {
-    "supporting": [
-      {
-        "source": "WHO Report 2023",
-        "reliability": 0.95,
-        "excerpt": "Switzerland: 83.4 years average..."
-      }
-    ],
-    "opposing": [
-      {
-        "source": "Eurostat 2024",
-        "reliability": 0.90,
-        "excerpt": "Spain leads at 83.5 years..."
-      }
-    ]
-  },
-  "verdict": {
-    "probability": 0.65,
-    "uncertainty": 0.15,
-    "confidence": "MEDIUM",
-    "reasoning": "WHO and Eurostat show similar but conflicting data...",
-    "gate4_status": "PASS"
-  }
-}
+**Acceptance Criteria:**
+* System works for test demonstrations
+* Errors are handled gracefully
+* User receives clear error messages
+
+---
+
+=== NFR-POC-4: Environment ===
+
+**Requirement:** Runs on simple infrastructure
+
+**Acceptable:**
+* Single machine or simple cloud setup
+* No distributed architecture
+* No load balancing
+* No redundancy
+* Local development environment viable
+
+**Not Required:**
+* Production infrastructure
+* Multi-region deployment
+* Auto-scaling
+* Disaster recovery
+
+---
+
+== 10. Technical Architecture ==
+
+=== 10.1 System Components ===
+
+**Frontend:**
+* Simple HTML form (text input + URL input + button)
+* Loading indicator
+* Results display page (single page, no tabs/navigation)
+
+**Backend:**
+* Single API endpoint
+* Calls Claude API (Sonnet 4.5 or latest)
+* Parses response
+* Returns JSON to frontend
+
+**Data Storage:**
+* None required (stateless POC)
+* Optional: Simple file storage or SQLite for demo examples
+
+**External Services:**
+* Claude API (Anthropic) - required
+* Optional: URL fetch service for article text extraction
+
+---
+
+=== 10.2 Processing Flow ===
+
+{{code}}
+1. User submits text or URL
+   ↓
+2. Backend receives request
+   ↓
+3. If URL: Fetch article text
+   ↓
+4. Call Claude API with single prompt:
+   "Extract claims, evaluate each, provide verdicts"
+   ↓
+5.
Claude API returns:
+   - Analysis summary
+   - Claims list
+   - Verdicts for each claim (with risk tiers)
+   - Article summary (optional)
+   - Quality gate results
+   ↓
+6. Backend parses response
+   ↓
+7. Frontend displays results with Mode 2 labeling
 {{/code}}

+**Key Simplification:** Single API call does entire analysis

-== 6. Success Criteria ==
+---

-{{success}}
-**POC Success Definition:** POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts with quality gates improving output quality.
-{{/success}}
+=== 10.3 AI Prompt Strategy ===

-=== 6.1 Functional Success ===
+**Single Comprehensive Prompt:**
+{{code}}
+Task: Analyze this article and provide:

-POC is successful if:
+1. Extract 3-5 factual claims from the article
+2. For each claim:
+   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
+   - Assign confidence score (0-100%)
+   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
+   - Write brief reasoning (1-3 sentences)
+3. Run quality gates:
+   - Check: ≥2 sources found
+   - Attempt: Basic contradiction search
+   - Calculate: Confidence scores
+   - Verify: Structural integrity
+4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
+5. Write article summary (3-5 sentences: neutral summary of article content)

-✅ **FR1-FR7 Requirements Met:**
+Return as structured JSON with quality gate results.
+{{/code}}

-1. Extracts 3-5 factual claims from test articles
-2. Generates 2-3 scenarios per ambiguous claim
-3. Finds supporting AND opposing evidence
-4. Computes probabilistic verdicts with uncertainty
-5.
Provides clear reasoning chains
+**One prompt generates everything.**

-✅ **Quality Gates Work:**
+---

-1. Gate 1 blocks non-factual claims (100% block rate)
-2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
-3. Clear rejection reasons provided
+=== 10.4 Technology Stack Suggestions ===

-✅ **NFR11 Met:**
+**Frontend:**
+* HTML + CSS + JavaScript (minimal framework)
+* OR: Next.js (if team prefers)
+* Hosted: Local machine OR Vercel/Netlify free tier

-1. Quality gates reduce hallucination rate
-2. Blocked outputs have clear explanations
-3. Quality metrics are logged
+**Backend:**
+* Python Flask/FastAPI (simple REST API)
+* OR: Next.js API routes (if using Next.js)
+* Hosted: Local machine OR Railway/Render free tier

-=== 6.2 Quality Thresholds ===
+**AKEL Integration:**
+* Claude API via Anthropic SDK
+* Model: Claude Sonnet 4.5 or latest available

-**Minimum Acceptable:**
+**Database:**
+* None (stateless acceptable)
+* OR: SQLite if want to store demo examples
+* OR: JSON files on disk

-* ≥70% of test claims correctly classified (factual/non-factual)
-* ≥60% of verdicts are reasonable (human evaluation)
-* Gate 1 blocks 100% of non-factual claims
-* Gate 4 blocks verdicts with <2 sources
+**Deployment:**
+* Local development environment sufficient for POC
+* Optional: Deploy to cloud for remote demos

-**Target:**
+---

-* ≥80% claims correctly classified
-* ≥75% verdicts are reasonable
-* <10% false positives (blocking good claims)
+== 11.
Success Criteria ==

-=== 6.3 POC Decision Gate ===
+=== 11.1 Minimum Success (POC Passes) ===

-**After POC1, we decide:**
+**Required for GO decision:**
+* ✅ AI extracts 3-5 factual claims automatically
+* ✅ AI provides verdict for each claim automatically
+* ✅ Verdicts are reasonable (≥70% make logical sense)
+* ✅ Analysis summary is coherent
+* ✅ Output is comprehensible to reviewers
+* ✅ Team/advisors understand the output
+* ✅ Team agrees approach has merit
+* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)

-**✅ PROCEED to POC2** if:
+**Quality Definition:**
+* "Reasonable verdict" = Defensible given general knowledge
+* "Coherent summary" = Logically structured, grammatically correct
+* "Comprehensible" = Reviewers understand what analysis means

-* Success criteria met
-* Quality gates demonstrably improve output
-* Core workflow is technically sound
-* Clear path to production quality
+---

-**⚠️ ITERATE POC1** if:
+=== 11.2 POC Fails If ===

-* Success criteria partially met
-* Gates work but need tuning
-* Core issues identified but fixable
+**Automatic NO-GO if any of these:**
+* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
+* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
+* ❌ Output incomprehensible (reviewers can't understand analysis)
+* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
+* ❌ Team loses confidence in AI-automated approach

-**❌ PIVOT APPROACH** if:
+---

-* Success criteria not met
-* Fundamental AI limitations discovered
-* Quality gates insufficient
-* Alternative approach needed
+=== 11.3 Quality Thresholds ===

-== 7. Test Cases ==
+**POC quality expectations:**
-=== 7.1 Happy Path ===
+|=Component|=Quality Threshold|=Definition
+|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
+|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
+|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
+|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims

-**Test 1: Simple Factual Claim**
+**Analogy:** "B student" quality (70-80%), not "A+" perfection yet

-* Input: "Paris is the capital of France"
-* Expected: Factual, 1 scenario, verdict 95% true
+**Not expecting:**
+* 100% accuracy
+* Perfect claim coverage
+* Comprehensive evidence gathering
+* Flawless verdicts
+* Production polish

-**Test 2: Ambiguous Claim**
+**Expecting:**
+* Reasonable claim extraction
+* Defensible verdicts
+* Understandable reasoning
+* Useful output

-* Input: "Switzerland has the highest income in Europe"
-* Expected: Factual, 2-3 scenarios, verdict with uncertainty
+---

-**Test 3: Statistical Claim**
+== 12.
Test Cases ==

-* Input: "10% of people have condition X"
-* Expected: Factual, evidence with numbers, probabilistic verdict
+=== 12.1 Test Case 1: Simple Factual Claim ===

-=== 7.2 Edge Cases ===
+**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"

-**Test 4: Opinion**
+**Expected Output:**
+* Extract claim correctly
+* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
+* Confidence: 70-90%
+* Risk tier: C (Low)
+* Reasoning: Mentions studies or evidence

-* Input: "Paris is the best city"
-* Expected: Non-factual (opinion), blocked by Gate 1
+**Success:** Verdict is reasonable and reasoning makes sense

-**Test 5: Prediction**
+---

-* Input: "Bitcoin will reach $100,000 next year"
-* Expected: Non-factual (prediction), blocked by Gate 1
+=== 12.2 Test Case 2: Complex News Article ===

-**Test 6: Insufficient Evidence**
+**Input:** News article URL with multiple claims about politics/health/science

-* Input: Obscure factual claim with no sources
-* Expected: Blocked by Gate 4 (<2 sources)
+**Expected Output:**
+* Extract 3-5 key claims
+* Verdict for each (may vary: some supported, some uncertain, some refuted)
+* Coherent analysis summary
+* Article summary
+* Risk tiers assigned appropriately

-=== 7.3 Quality Gate Tests ===
+**Success:** Claims identified are actually from article, verdicts are reasonable

-**Test 7: Gate 1 Effectiveness**
+---

-* Input: Mix of 10 factual + 10 non-factual claims
-* Expected: Gate 1 blocks all 10 non-factual (100% precision)
+=== 12.3 Test Case 3: Controversial Topic ===

-**Test 8: Gate 4 Effectiveness**
+**Input:** Article on contested political or scientific topic

-* Input: Claims with varying evidence availability
-* Expected: Gate 4 blocks
low-confidence verdicts
+**Expected Output:**
+* Balanced analysis
+* Acknowledges uncertainty where appropriate
+* Doesn't overstate confidence
+* Reasoning shows awareness of complexity

-== 8. Technical Architecture (POC) ==
+**Success:** Analysis is fair and doesn't show obvious bias

-=== 8.1 Simplified Architecture ===
+---

-**POC Tech Stack:**
+=== 12.4 Test Case 4: Clearly False Claim ===

-* **Frontend:** Simple web interface (Next.js + TypeScript)
-* **Backend:** Single API endpoint
-* **AI:** Claude API (Sonnet 4.5)
-* **Database:** Local JSON files (no database)
-* **Deployment:** Single server
+**Input:** Article with obviously false claim (e.g., "The Earth is flat")

-**Architecture Diagram:** See [[POC1 Specification>>FactHarbor.Specification.POC.Specification]]
+**Expected Output:**
+* Extract claim
+* Verdict: REFUTED
+* High confidence (> 90%)
+* Risk tier: C (Low - established fact)
+* Clear reasoning

+**Success:** AI correctly identifies false claim with high confidence

-=== 8.2 AKEL Implementation ===
+---

-**POC AKEL:**
+=== 12.5 Test Case 5: Genuinely Uncertain Claim ===

-* Single-threaded processing
-* Synchronous API calls
-* No caching
-* Basic error handling
-* Console logging
+**Input:** Article with claim where evidence is genuinely mixed

-**Full AKEL (POC2+):**
+**Expected Output:**
+* Extract claim
+* Verdict: UNCERTAIN
+* Moderate confidence (40-60%)
+* Reasoning explains why uncertain

-* Multi-threaded processing
-* Async API calls
-* Evidence caching
-* Advanced error handling with retry
-* Structured logging + monitoring
+**Success:** AI recognizes uncertainty and doesn't overstate confidence

-== 9.
POC Philosophy ==
+---

-{{info}}
-**Important:** POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases.
-{{/info}}
+=== 12.6 Test Case 6: High-Risk Medical Claim ===

-=== 9.1 Core Principles ===
+**Input:** Article making medical claims

-*
-**
-**1. Prove Concept, Not Production
-* POC validates AI can do the job
-* Production quality comes in POC2 and Beta 0
-* Focus on "does it work?" not "is it perfect?"
+**Expected Output:**
+* Extract claim
+* Verdict: [appropriate based on evidence]
+* Risk tier: A (High - medical)
+* Red label displayed
+* Clear disclaimer about not being medical advice

-**2. Implement Subset of Requirements**
+**Success:** Risk tier correctly assigned, appropriate warnings shown

-* POC covers FR1-7, NFR11 (lite)
-* All other requirements deferred
-* Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
+---

-**3. Quality Gates Validate Approach**
+== 13. POC Decision Gate ==

-* 2 gates prove the concept
-* Remaining 5 gates added in POC2
-* Gates must demonstrably improve quality
+=== 13.1 Decision Framework ===

-**4.
Iterate Based on Results**
+After POC testing complete, team makes one of three decisions:

-* POC results determine next steps
-* Decision gate after POC1
-* Flexibility to pivot if needed
+**Option A: GO (Proceed to POC2)**

-=== 9.2 Success ===
+**Conditions:**
+* AI quality ≥70% without manual editing
+* Basic claim → verdict pipeline validated
+* Internal + advisor feedback positive
+* Technical feasibility confirmed
+* Team confident in direction
+* Clear path to improving AI quality to ≥90%

-Clear Path Forward ===
+**Next Steps:**
+* Plan POC2 development (add scenarios)
+* Design scenario architecture
+* Expand to Evidence Model structure
+* Test with more complex articles

-POC succeeds if we can confidently answer:
+---

-✅ **Technical Feasibility:**
+**Option B: NO-GO (Pivot or Stop)**

-* Can AI extract claims reliably?
-* Can AI find balanced evidence?
-* Can AI compute reasonable verdicts?
+**Conditions:**
+* AI quality < 60%
+* Requires manual editing for most analyses (> 50%)
+* Feedback indicates fundamental flaws
+* Cost/effort not justified by value
+* No clear path to improvement

-✅ **Quality Approach:**
+**Next Steps:**
+* **Pivot:** Change to hybrid human-AI approach (accept manual review required)
+* **Stop:** Conclude approach not viable, revisit later

-* Do quality gates improve output?
-* Can we measure and track quality?
-* Is the gate approach scalable?
+---

-✅ **Production Path:**
+**Option C: ITERATE (Improve POC)**

-* Is the core architecture sound?
-* What needs improvement for production?
-* Is POC2 the right next step?
+**Conditions:**
+* Concept has merit but execution needs work
+* Specific improvements identified
+* Addressable with better prompts/approach
+* AI quality between 60-70%

-== 10. Related Pages ==
+**Next Steps:**
+* Improve AI prompts
+* Test different approaches
+* Re-run POC with improvements
+* Then make GO/NO-GO decision

-* **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
-* **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
-* **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
-* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
-* **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
+---

-**Document Owner:** Technical Team
-**Review Frequency:** After each POC iteration
-**Version History:**
+=== 13.2 Decision Criteria Summary ===

-* v1.0 - Initial POC requirements
-* v2.0 - Updated after specification cross-check
-* v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
+{{code}}
+AI Quality < 60%   → NO-GO (approach doesn't work)
+AI Quality 60-70%  → ITERATE (improve and retry)
+AI Quality ≥70%    → GO (proceed to POC2)
+{{/code}}
+
+---
+
+== 14.
Key Risks & Mitigations ==
+
+=== 14.1 Risk: AI Quality Not Good Enough ===
+
+**Likelihood:** Medium-High
+**Impact:** POC fails
+
+**Mitigation:**
+* Extensive prompt engineering and testing
+* Use best available AI models (Sonnet 4.5)
+* Test with diverse article types
+* Iterate on prompts based on results
+
+**Acceptance:** This is what POC tests - be ready for failure
+
+---
+
+=== 14.2 Risk: AI Consistency Issues ===
+
+**Likelihood:** Medium
+**Impact:** Works sometimes, fails other times
+
+**Mitigation:**
+* Test with 10+ diverse articles
+* Measure success rate honestly
+* Improve prompts to increase consistency
+
+**Acceptance:** Some variability OK if average quality ≥70%
+
+---
+
+=== 14.3 Risk: Output Incomprehensible ===
+
+**Likelihood:** Low-Medium
+**Impact:** Users can't understand analysis
+
+**Mitigation:**
+* Create clear explainer document
+* Iterate on output format
+* Test with non-technical reviewers
+* Simplify language if needed
+
+**Acceptance:** Iterate until comprehensible
+
+---
+
+=== 14.4 Risk: API Rate Limits / Costs ===
+
+**Likelihood:** Low
+**Impact:** System slow or expensive
+
+**Mitigation:**
+* Monitor API usage
+* Implement retry logic
+* Estimate costs before scaling
+
+**Acceptance:** POC can be slow and expensive (optimization later)
+
+---
+
+=== 14.5 Risk: Scope Creep ===
+
+**Likelihood:** Medium
+**Impact:** POC becomes too complex
+
+**Mitigation:**
+* Strict scope discipline
+* Say NO to feature additions
+* Keep focus on core question
+
+**Acceptance:** POC is minimal by design
+
+---
+
+== 15.
POC Philosophy ==
+
+=== 15.1 Core Principles ===
+
+**1. Build Less, Learn More**
+* Minimum features to test hypothesis
+* Don't build unvalidated features
+* Focus on core question only
+
+**2. Fail Fast**
+* Quick test of hardest part (AI capability)
+* Accept that POC might fail
+* Better to discover issues early
+* Honest assessment over optimistic hope
+
+**3. Test First, Build Second**
+* Validate AI can do this before building platform
+* Don't assume it will work
+* Let results guide decisions
+
+**4. Automation First**
+* No manual editing allowed
+* Tests scalability, not just feasibility
+* Proves approach can work at scale
+
+**5. Honest Assessment**
+* Don't cherry-pick examples
+* Don't manually fix bad outputs
+* Document failures openly
+* Make data-driven decisions
+
+---
+
+=== 15.2 What POC Is ===
+
+✅ Testing AI capability without humans
+✅ Proving core technical concept
+✅ Fast validation of approach
+✅ Honest assessment of feasibility
+
+---
+
+=== 15.3 What POC Is NOT ===
+
+❌ Building a product
+❌ Production-ready system
+❌ Feature-complete platform
+❌ Perfectly accurate analysis
+❌ Polished user experience
+
+---
+
+== 16.
Success = Clear Path Forward ==
+
+**If POC succeeds (≥70% AI quality):**
+* ✅ Approach validated
+* ✅ Proceed to POC2 (add scenarios)
+* ✅ Design full Evidence Model structure
+* ✅ Test multi-scenario comparison
+* ✅ Focus on improving AI quality from 70% → 90%
+
+**If POC fails (< 60% AI quality):**
+* ✅ Learn what doesn't work
+* ✅ Pivot to different approach
+* ✅ OR wait for better AI technology
+* ✅ Avoid wasting resources on non-viable approach
+
+**Either way, POC provides clarity.**
+
+---
+
+== 17. Related Pages ==
+
+* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
+* [[Requirements>>FactHarbor.Requirements.WebHome]]
+* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
+* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
+* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
+* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
+
+---
+
+**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
+
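
The GO / ITERATE / NO-GO thresholds stated in the new version (≥70% AI quality to proceed, below 60% to stop, more than 50% manual editing as an automatic NO-GO, under 30% manual editing for GO) could be sketched as a small decision helper. This is only an illustration under assumptions: the names `Decision` and `decide` are hypothetical, a quality of exactly 70% is treated as GO, and a manual-editing share between 30% and 50% is assumed to mean ITERATE, which the page itself does not spell out.

```python
from enum import Enum


class Decision(Enum):
    GO = "GO"
    ITERATE = "ITERATE"
    NO_GO = "NO-GO"


def decide(ai_quality_pct: float, manual_edit_pct: float = 0.0) -> Decision:
    """Map measured POC results onto the three decision options.

    ai_quality_pct:  share of outputs judged reasonable (0-100).
    manual_edit_pct: share of analyses that needed manual correction (0-100).
    """
    for value in (ai_quality_pct, manual_edit_pct):
        if not 0 <= value <= 100:
            raise ValueError("percentages must be between 0 and 100")

    # NO-GO: quality below 60% or most analyses need human correction.
    if ai_quality_pct < 60 or manual_edit_pct > 50:
        return Decision.NO_GO

    # GO: quality at or above 70% with little manual editing (< 30%).
    if ai_quality_pct >= 70 and manual_edit_pct < 30:
        return Decision.GO

    # Everything in between is assumed to mean "improve and retry".
    return Decision.ITERATE
```

For example, `decide(65)` falls in the 60-70% band and yields `Decision.ITERATE`, while `decide(80, manual_edit_pct=60)` yields `Decision.NO_GO` because the manual-editing limit is exceeded even though quality is high.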