Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2026/02/08 08:25
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -14,9 +14,11 @@
 === 1.1 What POC Tests ===

 **Core Question:**
+
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

 **What we're proving:**
+
 * AI can identify factual claims from text
 * AI can evaluate those claims with structured evidence
 * Quality gates can filter unreliable outputs
@@ -23,6 +23,7 @@
 * The core workflow is technically feasible

 **What we're NOT proving:**
+
 * Production-ready reliability (that's POC2)
 * User-facing features (that's Beta 0)
 * Full IFCN compliance (that's V1.0)
@@ -32,15 +32,15 @@
 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].

 **Scope Summary:**
+
 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
 * **Partial:** 3 NFRs (simplified versions)
 * **Out of Scope:** 19 requirements (deferred to later phases)

-
 == 2. POC1 Scope ==

 {{success}}
-**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
+**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88.Roadmap.Requirements-Roadmap-Matrix.WebHome]]

 The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
 {{/success}}
@@ -54,9 +54,8 @@

 **POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)

-{{info}}
-**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
-{{/info}}:
+{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
+
 * Claim extraction (FR1)
 * Claim context (FR2)
 * Multiple scenarios (FR3)
@@ -67,6 +67,7 @@
 * In-article highlighting (FR13) - deferred to Beta 0

 **Partial implementations:**
+
 * NFR1 (Explainability) - Basic only
 * NFR2 (Performance) - Functional but not optimized
 * NFR3 (Transparency) - Basic only
@@ -82,6 +82,7 @@
 **Main Requirement:** AI extracts factual claims from input text

 **POC Implementation:**
+
 * ✅ AKEL extracts claims using LLM
 * ✅ Each claim includes original text reference
 * ✅ Claims are identified as factual/non-factual
@@ -88,16 +88,17 @@
 * ❌ No advanced claim parsing (added in POC2)

 **Acceptance Criteria:**
+
 * Extracts 3-5 claims from typical article
 * Identifies factual vs non-factual claims
 * Quality Gate 1 validates extraction

-
 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===

 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims

 **POC Implementation:**
+
 * ✅ AKEL generates 2-3 scenarios per claim
 * ✅ Scenarios capture different interpretations
 * ✅ Each scenario is evaluated separately
@@ -104,16 +104,17 @@
 * ✅ Verdict considers all scenarios

 **Acceptance Criteria:**
+
 * Generates 2+ scenarios for ambiguous claims
 * Scenarios are meaningfully different
 * All scenarios are evaluated

-
 === 3.3 FR4: Analysis Summary (Basic Implementation) ===

 **Main Requirement:** Provide user-friendly summary of analysis

 **POC Implementation:**
+
 * ✅ Simple text summary generated
 * ❌ No rich formatting (added in Beta 0)
 * ❌ No visual elements (added in Beta 0)
@@ -131,10 +131,12 @@
 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===

 **Main Requirements:**
+
 * FR5: Collect supporting and opposing evidence
 * FR6: Evaluate evidence source reliability

 **POC Implementation:**
+
 * ✅ AKEL searches for evidence (web/knowledge base)
 * ✅ **Mandatory contradiction search** (finds opposing evidence)
 * ✅ Source reliability scoring
@@ -142,16 +142,17 @@
 * ❌ No advanced source verification (added in POC2)

 **Acceptance Criteria:**
+
 * Finds 2+ supporting evidence items
 * Finds 1+ opposing evidence (if exists)
 * Sources scored for reliability

-
 === 3.5 FR7: Automated Verdicts (Full Implementation) ===

 **Main Requirement:** AI computes verdicts with uncertainty quantification

 **POC Implementation:**
+
 * ✅ Probabilistic verdicts (0-100% confidence)
 * ✅ Uncertainty explicitly stated
 * ✅ Reasoning chain provided
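The FR5-FR6 acceptance criteria and the FR7 verdict properties listed above are concrete enough to check mechanically. The TypeScript sketch below shows one hypothetical way to do that. The `EvidenceItem` and `Verdict` shapes, the 0-1 reliability scale, the reading of uncertainty as plus/minus percentage points, and the `checkAcceptance` helper are all assumptions made for illustration. None of them come from the FactHarbor specification.

```typescript
// Hypothetical shapes for one evaluated claim. Field names and the
// 0.0-1.0 reliability scale are assumptions, not the FactHarbor data model.
type Stance = "supporting" | "opposing";

interface EvidenceItem {
  url: string;
  stance: Stance;
  reliability: number; // FR6 source reliability score, assumed in [0, 1]
}

interface Verdict {
  probability: number; // FR7: probability that the claim is true, 0-100
  uncertainty: number; // assumed: +/- percentage points around probability
  reasoning: string[]; // FR7 reasoning chain, one step per entry
}

// Returns one message per violated FR5-FR7 criterion;
// an empty array means the evaluated claim passes.
function checkAcceptance(evidence: EvidenceItem[], verdict: Verdict): string[] {
  const failures: string[] = [];

  // FR5: "Finds 2+ supporting evidence items"
  const supporting = evidence.filter((e) => e.stance === "supporting").length;
  if (supporting < 2) {
    failures.push(`FR5: ${supporting} supporting item(s) found, 2+ required`);
  }
  // FR5 "1+ opposing evidence (if exists)" is deliberately not checked:
  // an empty opposing set can be legitimate, so it needs human review.

  // FR6: every source must carry a reliability score in range
  for (const e of evidence) {
    if (!(e.reliability >= 0 && e.reliability <= 1)) {
      failures.push(`FR6: reliability ${e.reliability} out of range for ${e.url}`);
    }
  }

  // FR7: probability 0-100, uncertainty quantified, reasoning chain present
  if (!(verdict.probability >= 0 && verdict.probability <= 100)) {
    failures.push(`FR7: probability ${verdict.probability} outside 0-100`);
  }
  if (!isFinite(verdict.uncertainty) || verdict.uncertainty < 0) {
    failures.push("FR7: uncertainty is missing or negative");
  }
  if (verdict.reasoning.length === 0) {
    failures.push("FR7: reasoning chain is empty");
  }
  return failures;
}
```

Consider a claim with two supporting items, one opposing item, and a verdict that has a non-empty reasoning chain. It yields an empty array. Now consider a single source with an out-of-range score, an out-of-range probability, negative uncertainty, and no reasoning. It yields one message per violated criterion, five in total.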
@@ -166,11 +166,11 @@
 ```

 **Acceptance Criteria:**
+
 * Verdicts include probability (0-100%)
 * Uncertainty explicitly quantified
 * Reasoning chain explains verdict

-
 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===

 **Main Requirement:** Complete quality assurance with 7 quality gates
@@ -178,11 +178,13 @@
 **POC Implementation:** **2 gates only**

 **Quality Gate 1: Claim Validation**
+
 * ✅ Validates claim is factual and verifiable
 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
 * ✅ Provides clear rejection reason

 **Quality Gate 4: Verdict Confidence Assessment**
+
 * ✅ Validates ≥2 sources found
 * ✅ Validates quality score ≥0.6
 * ✅ Blocks low-confidence verdicts
@@ -189,6 +189,7 @@
 * ✅ Provides clear rejection reason

 **Out of Scope (POC2+):**
+
 * ❌ Gate 2: Evidence Relevance
 * ❌ Gate 3: Scenario Coherence
 * ❌ Gate 5: Source Diversity
@@ -201,11 +201,13 @@
 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===

 **Main Requirements:**
+
 * NFR1: Response time < 30 seconds
 * NFR2: Handle 1000+ concurrent users
 * NFR3: 99.9% uptime

 **POC Implementation:**
+
 * ⚠️ **Response time monitored** (not optimized)
 * ⚠️ **Single-threaded processing** (no concurrency)
 * ⚠️ **Basic error handling** (no advanced retry logic)
@@ -213,11 +213,11 @@
 **Rationale:** POC proves functionality. Performance optimization happens in POC2.

 **POC Acceptance:**
+
 * Analysis completes (no timeout requirement)
 * Errors don't crash system
 * Basic logging in place

-
 == 4. What's NOT in POC Scope ==

 === 4.1 User-Facing Features (Beta 0+) ===
@@ -227,6 +227,7 @@
 {{/warning}}

 **Out of Scope:**
+
 * ❌ User accounts and authentication (FR8)
 * ❌ User corrections system (FR9, FR45-46)
 * ❌ Public publishing interface (FR10)
@@ -240,6 +240,7 @@
 === 4.2 Advanced Features (V1.0+) ===

 **Out of Scope:**
+
 * ❌ IFCN compliance (FR47)
 * ❌ ClaimReview schema (FR48)
 * ❌ Archive.org integration (FR49)
@@ -254,6 +254,7 @@
 === 4.3 Production Requirements (POC2, Beta 0) ===

 **Out of Scope:**
+
 * ❌ Security controls (NFR4, NFR12)
 * ❌ Code maintainability (NFR5)
 * ❌ System monitoring (NFR13)
@@ -270,21 +270,26 @@

 For each analyzed claim, POC must produce:

-**1. Claim**
+*
+**
+**1. Claim
 * Original text
 * Classification (factual/non-factual/ambiguous)
 * If non-factual: Clear reason why

 **2. Scenarios** (if factual)
+
 * 2-3 interpretation scenarios
 * Each scenario clearly described

 **3. Evidence** (if factual)
+
 * Supporting evidence (2+ items)
 * Opposing evidence (if exists)
 * Source URLs and reliability scores

 **4. Verdict** (if factual)
+
 * Probability (0-100%)
 * Uncertainty quantification
 * Confidence level (LOW/MEDIUM/HIGH)
@@ -291,10 +291,10 @@
 * Reasoning chain

 **5. Quality Status**
+
 * Which gates passed/failed
 * If failed: Clear explanation why

-
 === 5.2 Example POC Output ===

 {{code language="json"}}
@@ -346,6 +346,7 @@
 POC is successful if:

 ✅ **FR1-FR7 Requirements Met:**
+
 1. Extracts 3-5 factual claims from test articles
 2. Generates 2-3 scenarios per ambiguous claim
 3. Finds supporting AND opposing evidence
@@ -353,19 +353,21 @@
 5. Provides clear reasoning chains

 ✅ **Quality Gates Work:**
+
 1. Gate 1 blocks non-factual claims (100% block rate)
 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
 3. Clear rejection reasons provided

 ✅ **NFR11 Met:**
+
 1. Quality gates reduce hallucination rate
 2. Blocked outputs have clear explanations
 3. Quality metrics are logged

-
 === 6.2 Quality Thresholds ===

 **Minimum Acceptable:**
+
 * ≥70% of test claims correctly classified (factual/non-factual)
 * ≥60% of verdicts are reasonable (human evaluation)
 * Gate 1 blocks 100% of non-factual claims
@@ -372,16 +372,17 @@
 * Gate 4 blocks verdicts with <2 sources

 **Target:**
+
 * ≥80% claims correctly classified
 * ≥75% verdicts are reasonable
 * <10% false positives (blocking good claims)

-
 === 6.3 POC Decision Gate ===

 **After POC1, we decide:**

 **✅ PROCEED to POC2** if:
+
 * Success criteria met
 * Quality gates demonstrably improve output
 * Core workflow is technically sound
@@ -388,65 +388,72 @@
 * Clear path to production quality

 **⚠️ ITERATE POC1** if:
+
 * Success criteria partially met
 * Gates work but need tuning
 * Core issues identified but fixable

 **❌ PIVOT APPROACH** if:
+
 * Success criteria not met
 * Fundamental AI limitations discovered
 * Quality gates insufficient
 * Alternative approach needed

-
 == 7. Test Cases ==

 === 7.1 Happy Path ===

 **Test 1: Simple Factual Claim**
+
 * Input: "Paris is the capital of France"
-* Expected: Factual, 1 scenario, verdict ~95% true
+* Expected: Factual, 1 scenario, verdict 95% true

 **Test 2: Ambiguous Claim**
+
 * Input: "Switzerland has the highest income in Europe"
 * Expected: Factual, 2-3 scenarios, verdict with uncertainty

 **Test 3: Statistical Claim**
+
 * Input: "10% of people have condition X"
 * Expected: Factual, evidence with numbers, probabilistic verdict

-
 === 7.2 Edge Cases ===

 **Test 4: Opinion**
+
 * Input: "Paris is the best city"
 * Expected: Non-factual (opinion), blocked by Gate 1

 **Test 5: Prediction**
+
 * Input: "Bitcoin will reach $100,000 next year"
 * Expected: Non-factual (prediction), blocked by Gate 1

 **Test 6: Insufficient Evidence**
+
 * Input: Obscure factual claim with no sources
 * Expected: Blocked by Gate 4 (<2 sources)

-
 === 7.3 Quality Gate Tests ===

 **Test 7: Gate 1 Effectiveness**
+
 * Input: Mix of 10 factual + 10 non-factual claims
 * Expected: Gate 1 blocks all 10 non-factual (100% precision)

 **Test 8: Gate 4 Effectiveness**
+
 * Input: Claims with varying evidence availability
 * Expected: Gate 4 blocks low-confidence verdicts

-
 == 8. Technical Architecture (POC) ==

 === 8.1 Simplified Architecture ===

 **POC Tech Stack:**
+
 * **Frontend:** Simple web interface (Next.js + TypeScript)
 * **Backend:** Single API endpoint
 * **AI:** Claude API (Sonnet 4.5)
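Quality Gates 1 and 4 (section 3.6) reduce to simple rules once the upstream LLM step has produced three inputs: a claim classification, a source count, and a quality score. The TypeScript sketch below applies those rules to test cases 4 to 6. The `ClaimClass` labels and the `GateResult` shape are assumptions made for illustration. The sketch also assumes the quality score arrives precomputed on a 0-1 scale, because the page does not specify how it is derived.

```typescript
// Hypothetical classification labels, mirroring Gate 1's
// "opinion/prediction/ambiguous" blocking list. Producing the label is
// assumed to be the LLM's job; the gate only enforces the rule.
type ClaimClass = "factual" | "opinion" | "prediction" | "ambiguous";

interface GateResult {
  gate: "Gate 1" | "Gate 4";
  passed: boolean;
  reason?: string; // set whenever passed is false ("clear rejection reason")
}

// Gate 1: Claim Validation. Only factual, verifiable claims pass.
function gate1ClaimValidation(classification: ClaimClass): GateResult {
  if (classification === "factual") {
    return { gate: "Gate 1", passed: true };
  }
  return {
    gate: "Gate 1",
    passed: false,
    reason: `Non-factual claim (${classification}); cannot be verified`,
  };
}

// Gate 4 thresholds as stated in the POC spec: >=2 sources, quality >=0.6.
const MIN_SOURCES = 2;
const MIN_QUALITY = 0.6;

// Gate 4: Verdict Confidence Assessment. The quality score is taken as an
// input here because its computation is not specified for the POC.
function gate4VerdictConfidence(sourceCount: number, qualityScore: number): GateResult {
  if (sourceCount < MIN_SOURCES) {
    return {
      gate: "Gate 4",
      passed: false,
      reason: `Only ${sourceCount} source(s) found; at least ${MIN_SOURCES} required`,
    };
  }
  if (qualityScore < MIN_QUALITY) {
    return {
      gate: "Gate 4",
      passed: false,
      reason: `Quality score ${qualityScore} is below ${MIN_QUALITY}`,
    };
  }
  return { gate: "Gate 4", passed: true };
}

// Test 4 ("Paris is the best city"): classified as opinion, blocked by Gate 1.
const test4 = gate1ClaimValidation("opinion");
// Test 6 (obscure claim with no sources): blocked by Gate 4.
const test6 = gate4VerdictConfidence(0, 0);
```

The thresholds are kept as named constants so that the Gate 4 boundary is explicit. A verdict with exactly 2 sources and a score of exactly 0.6 passes, which matches the "≥" wording of the gate.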
@@ -459,6 +459,7 @@ 459 459 === 8.2 AKEL Implementation === 460 460 461 461 **POC AKEL:** 495 + 462 462 * Single-threaded processing 463 463 * Synchronous API calls 464 464 * No caching ... ... @@ -466,6 +466,7 @@ 466 466 * Console logging 467 467 468 468 **Full AKEL (POC2+):** 503 + 469 469 * Multi-threaded processing 470 470 * Async API calls 471 471 * Evidence caching ... ... @@ -472,7 +472,6 @@ 472 472 * Advanced error handling with retry 473 473 * Structured logging + monitoring 474 474 475 - 476 476 == 9. POC Philosophy == 477 477 478 478 {{info}} ... ... @@ -481,47 +481,55 @@ 481 481 482 482 === 9.1 Core Principles === 483 483 484 -**1. Prove Concept, Not Production** 518 +* 519 +** 520 +**1. Prove Concept, Not Production 485 485 * POC validates AI can do the job 486 486 * Production quality comes in POC2 and Beta 0 487 487 * Focus on "does it work?" not "is it perfect?" 488 488 489 489 **2. Implement Subset of Requirements** 526 + 490 490 * POC covers FR1-7, NFR11 (lite) 491 491 * All other requirements deferred 492 492 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] 493 493 494 494 **3. Quality Gates Validate Approach** 532 + 495 495 * 2 gates prove the concept 496 496 * Remaining 5 gates added in POC2 497 497 * Gates must demonstrably improve quality 498 498 499 499 **4. Iterate Based on Results** 538 + 500 500 * POC results determine next steps 501 501 * Decision gate after POC1 502 502 * Flexibility to pivot if needed 503 503 543 +=== 9.2 Success === 504 504 505 - ===9.2 Success =Clear Path Forward ===545 + Clear Path Forward === 506 506 507 507 POC succeeds if we can confidently answer: 508 508 509 509 ✅ **Technical Feasibility:** 550 + 510 510 * Can AI extract claims reliably? 511 511 * Can AI find balanced evidence? 512 512 * Can AI compute reasonable verdicts? 513 513 514 514 ✅ **Quality Approach:** 556 + 515 515 * Do quality gates improve output? 516 516 * Can we measure and track quality? 
 * Is the gate approach scalable?

 ✅ **Production Path:**
+
 * Is the core architecture sound?
 * What needs improvement for production?
 * Is POC2 the right next step?

-
 == 10. Related Pages ==

 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
@@ -530,11 +530,10 @@
 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)

-
 **Document Owner:** Technical Team
 **Review Frequency:** After each POC iteration
 **Version History:**
+
 * v1.0 - Initial POC requirements
 * v2.0 - Updated after specification cross-check
 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
-
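The numeric thresholds in section 6.2 can be evaluated mechanically once the test results have been tallied. The TypeScript sketch below is one hypothetical approach. The `PocTally` field names are invented for illustration. The sketch also treats the target tier as "all minimum conditions plus the target conditions"; that reading is an assumption, because the page lists the two tiers separately.

```typescript
// Hypothetical tally of a POC1 test run; all fields are raw counts.
interface PocTally {
  claimsTotal: number;
  claimsCorrectlyClassified: number; // factual vs non-factual
  verdictsReviewed: number;
  verdictsJudgedReasonable: number; // human evaluation
  nonFactualClaims: number;
  nonFactualBlockedByGate1: number;
  verdictsUnderTwoSources: number;
  underTwoSourcesBlockedByGate4: number;
  goodClaims: number;
  goodClaimsBlocked: number; // false positives
}

// Ratio that yields NaN for an empty denominator, so missing data never
// satisfies a threshold (every comparison involving NaN is false).
function rate(part: number, whole: number): number {
  return whole === 0 ? NaN : part / whole;
}

function evaluateThresholds(t: PocTally): { meetsMinimum: boolean; meetsTarget: boolean } {
  const accuracy = rate(t.claimsCorrectlyClassified, t.claimsTotal);
  const reasonable = rate(t.verdictsJudgedReasonable, t.verdictsReviewed);
  const gate1BlockRate = rate(t.nonFactualBlockedByGate1, t.nonFactualClaims);
  const gate4BlockRate = rate(t.underTwoSourcesBlockedByGate4, t.verdictsUnderTwoSources);
  const falsePositiveRate = rate(t.goodClaimsBlocked, t.goodClaims);

  // Minimum Acceptable: >=70% classified, >=60% reasonable,
  // Gate 1 blocks 100% of non-factual, Gate 4 blocks every <2-source verdict.
  const meetsMinimum =
    accuracy >= 0.7 && reasonable >= 0.6 && gate1BlockRate === 1 && gate4BlockRate === 1;

  // Target (assumed to include the minimum): >=80% classified,
  // >=75% reasonable, <10% false positives.
  const meetsTarget =
    meetsMinimum && accuracy >= 0.8 && reasonable >= 0.75 && falsePositiveRate < 0.1;

  return { meetsMinimum, meetsTarget };
}
```

Here is a run that meets both tiers:

* 16 of 20 claims correctly classified
* 15 of 20 verdicts judged reasonable
* every non-factual claim and every under-sourced verdict blocked
* no good claims blocked

Drop classification to 14 of 20 and reasonable verdicts to 12 of 20, and the run meets only the minimum.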