Changes for page POC Requirements
Last modified by Robert Schaub on 2026/02/08 08:25
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ...
@@ -14,9 +14,11 @@
14 14 === 1.1 What POC Tests ===
15 15
16 16 **Core Question:**
17 +
17 17 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18
19 19 **What we're proving:**
21 +
20 20 * AI can identify factual claims from text
21 21 * AI can evaluate those claims with structured evidence
22 22 * Quality gates can filter unreliable outputs
... ...
@@ -23,6 +23,7 @@
23 23 * The core workflow is technically feasible
24 24
25 25 **What we're NOT proving:**
28 +
26 26 * Production-ready reliability (that's POC2)
27 27 * User-facing features (that's Beta 0)
28 28 * Full IFCN compliance (that's V1.0)
... ...
@@ -32,11 +32,11 @@
32 32 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33
34 34 **Scope Summary:**
38 +
35 35 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36 * **Partial:** 3 NFRs (simplified versions)
37 37 * **Out of Scope:** 19 requirements (deferred to later phases)
38 38
39 -
40 40 == 2. Requirements Scope Matrix ==
41 41
42 42 {{success}}
... ...
@@ -44,7 +44,7 @@
44 44 {{/success}}
45 45
46 46 |=Requirement|=POC1 Status|=Implementation Level|=Notes
47 -|**CORE WORKFLOW**||||
50 +|**CORE WORKFLOW**||||\\
48 48 |FR1: Claim Extraction|✅ **In Scope**|Full|AKEL extracts claims from text
49 49 |FR2: Claim Context|✅ **In Scope**|Basic|Context preserved with claim
50 50 |FR3: Multiple Scenarios|✅ **In Scope**|Full|AKEL generates interpretation scenarios
... ...
@@ -52,12 +52,12 @@
52 52 |FR5: Evidence Collection|✅ **In Scope**|Full|AKEL searches for evidence
53 53 |FR6: Evidence Evaluation|✅ **In Scope**|Full|AKEL evaluates source reliability
54 54 |FR7: Automated Verdicts|✅ **In Scope**|Full|AKEL computes verdicts with uncertainty
55 -|**QUALITY & RELIABILITY**||||
58 +|**QUALITY & RELIABILITY**||||\\
56 56 |NFR11: Quality Assurance|✅ **In Scope**|**Lite**|**2 gates only** (Gate 1 & 4)
57 57 |NFR1: Performance|⚠️ **Partial**|Basic|Response time monitored, not optimized
58 58 |NFR2: Scalability|⚠️ **Partial**|Single-thread|No concurrent processing
59 59 |NFR3: Reliability|⚠️ **Partial**|Basic|Error handling, no retry logic
60 -|**DEFERRED TO LATER**||||
63 +|**DEFERRED TO LATER**||||\\
61 61 |FR8-FR13|❌ Out of Scope|N/A|User accounts, corrections, publishing
62 62 |FR44-FR53|❌ Out of Scope|N/A|Advanced features (V1.0+)
63 63 |NFR4: Security|❌ Out of Scope|N/A|POC2
... ...
@@ -65,7 +65,6 @@
65 65 |NFR12: Security Controls|❌ Out of Scope|N/A|Beta 0
66 66 |NFR13: Monitoring|❌ Out of Scope|N/A|POC2
67 67
68 -
69 69 == 3. POC Simplifications ==
70 70
71 71 === 3.1 FR1: Claim Extraction (Full Implementation) ===
... ...
@@ -73,6 +73,7 @@
73 73 **Main Requirement:** AI extracts factual claims from input text
74 74
75 75 **POC Implementation:**
78 +
76 76 * ✅ AKEL extracts claims using LLM
77 77 * ✅ Each claim includes original text reference
78 78 * ✅ Claims are identified as factual/non-factual
... ...
@@ -79,16 +79,17 @@
79 79 * ❌ No advanced claim parsing (added in POC2)
80 80
81 81 **Acceptance Criteria:**
85 +
82 82 * Extracts 3-5 claims from typical article
83 83 * Identifies factual vs non-factual claims
84 84 * Quality Gate 1 validates extraction
85 85
86 -
87 87 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
88 88
89 89 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
90 90
91 91 **POC Implementation:**
95 +
92 92 * ✅ AKEL generates 2-3 scenarios per claim
93 93 * ✅ Scenarios capture different interpretations
94 94 * ✅ Each scenario is evaluated separately
... ...
@@ -95,16 +95,17 @@
95 95 * ✅ Verdict considers all scenarios
96 96
97 97 **Acceptance Criteria:**
102 +
98 98 * Generates 2+ scenarios for ambiguous claims
99 99 * Scenarios are meaningfully different
100 100 * All scenarios are evaluated
101 101
102 -
103 103 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
104 104
105 105 **Main Requirement:** Provide user-friendly summary of analysis
106 106
107 107 **POC Implementation:**
112 +
108 108 * ✅ Simple text summary generated
109 109 * ❌ No rich formatting (added in Beta 0)
110 110 * ❌ No visual elements (added in Beta 0)
... ...
@@ -122,10 +122,12 @@
122 122 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
123 123
124 124 **Main Requirements:**
130 +
125 125 * FR5: Collect supporting and opposing evidence
126 126 * FR6: Evaluate evidence source reliability
127 127
128 128 **POC Implementation:**
135 +
129 129 * ✅ AKEL searches for evidence (web/knowledge base)
130 130 * ✅ **Mandatory contradiction search** (finds opposing evidence)
131 131 * ✅ Source reliability scoring
... ...
@@ -133,16 +133,17 @@
133 133 * ❌ No advanced source verification (added in POC2)
134 134
135 135 **Acceptance Criteria:**
143 +
136 136 * Finds 2+ supporting evidence items
137 137 * Finds 1+ opposing evidence (if exists)
138 138 * Sources scored for reliability
139 139
140 -
141 141 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
142 142
143 143 **Main Requirement:** AI computes verdicts with uncertainty quantification
144 144
145 145 **POC Implementation:**
153 +
146 146 * ✅ Probabilistic verdicts (0-100% confidence)
147 147 * ✅ Uncertainty explicitly stated
148 148 * ✅ Reasoning chain provided
... ...
@@ -157,11 +157,11 @@
157 157 ```
158 158
159 159 **Acceptance Criteria:**
168 +
160 160 * Verdicts include probability (0-100%)
161 161 * Uncertainty explicitly quantified
162 162 * Reasoning chain explains verdict
163 163
164 -
165 165 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
166 166
167 167 **Main Requirement:** Complete quality assurance with 7 quality gates
... ...
@@ -169,11 +169,13 @@
169 169 **POC Implementation:** **2 gates only**
170 170
171 171 **Quality Gate 1: Claim Validation**
180 +
172 172 * ✅ Validates claim is factual and verifiable
173 173 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
174 174 * ✅ Provides clear rejection reason
175 175
176 176 **Quality Gate 4: Verdict Confidence Assessment**
186 +
177 177 * ✅ Validates ≥2 sources found
178 178 * ✅ Validates quality score ≥0.6
179 179 * ✅ Blocks low-confidence verdicts
... ...
@@ -180,6 +180,7 @@
180 180 * ✅ Provides clear rejection reason
181 181
182 182 **Out of Scope (POC2+):**
193 +
183 183 * ❌ Gate 2: Evidence Relevance
184 184 * ❌ Gate 3: Scenario Coherence
185 185 * ❌ Gate 5: Source Diversity
... ...
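Editorial illustration (not part of the diffed page): the Quality Gate 4 rules listed above (at least 2 sources, quality score of at least 0.6, a clear rejection reason) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not AKEL's actual implementation; the names `GateResult` and `gate4_verdict_confidence` are hypothetical, and the quality score is assumed to be a float in the range 0.0 to 1.0.

```python
from dataclasses import dataclass

# Thresholds taken from the Gate 4 bullets in the page text.
MIN_SOURCES = 2
MIN_QUALITY_SCORE = 0.6


@dataclass
class GateResult:
    """Hypothetical result type: whether the gate passed, and why."""
    passed: bool
    reason: str


def gate4_verdict_confidence(source_count: int, quality_score: float) -> GateResult:
    """Block a verdict that has too few sources or too low a quality score."""
    if source_count < MIN_SOURCES:
        return GateResult(
            False,
            f"Only {source_count} source(s) found; at least {MIN_SOURCES} required",
        )
    if quality_score < MIN_QUALITY_SCORE:
        return GateResult(
            False,
            f"Quality score {quality_score:.2f} is below {MIN_QUALITY_SCORE}",
        )
    return GateResult(True, "Verdict meets minimum source count and quality score")
```

For example, `gate4_verdict_confidence(1, 0.9)` would be blocked for having a single source, while `gate4_verdict_confidence(2, 0.6)` would pass because both thresholds are inclusive.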
@@ -192,11 +192,13 @@
192 192 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
193 193
194 194 **Main Requirements:**
206 +
195 195 * NFR1: Response time < 30 seconds
196 196 * NFR2: Handle 1000+ concurrent users
197 197 * NFR3: 99.9% uptime
198 198
199 199 **POC Implementation:**
212 +
200 200 * ⚠️ **Response time monitored** (not optimized)
201 201 * ⚠️ **Single-threaded processing** (no concurrency)
202 202 * ⚠️ **Basic error handling** (no advanced retry logic)
... ...
@@ -204,11 +204,11 @@
204 204 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
205 205
206 206 **POC Acceptance:**
220 +
207 207 * Analysis completes (no timeout requirement)
208 208 * Errors don't crash system
209 209 * Basic logging in place
210 210
211 -
212 212 == 4. What's NOT in POC Scope ==
213 213
214 214 === 4.1 User-Facing Features (Beta 0+) ===
... ...
@@ -218,6 +218,7 @@
218 218 {{/warning}}
219 219
220 220 **Out of Scope:**
234 +
221 221 * ❌ User accounts and authentication (FR8)
222 222 * ❌ User corrections system (FR9, FR45-46)
223 223 * ❌ Public publishing interface (FR10)
... ...
@@ -231,6 +231,7 @@
231 231 === 4.2 Advanced Features (V1.0+) ===
232 232
233 233 **Out of Scope:**
248 +
234 234 * ❌ IFCN compliance (FR47)
235 235 * ❌ ClaimReview schema (FR48)
236 236 * ❌ Archive.org integration (FR49)
... ...
@@ -245,6 +245,7 @@
245 245 === 4.3 Production Requirements (POC2, Beta 0) ===
246 246
247 247 **Out of Scope:**
263 +
248 248 * ❌ Security controls (NFR4, NFR12)
249 249 * ❌ Code maintainability (NFR5)
250 250 * ❌ System monitoring (NFR13)
... ...
@@ -261,21 +261,26 @@
261 261
262 262 For each analyzed claim, POC must produce:
263 263
264 -**1. Claim**
280 +*
281 +**
282 +**1. Claim
265 265 * Original text
266 266 * Classification (factual/non-factual/ambiguous)
267 267 * If non-factual: Clear reason why
268 268
269 269 **2. Scenarios** (if factual)
288 +
270 270 * 2-3 interpretation scenarios
271 271 * Each scenario clearly described
272 272
273 273 **3. Evidence** (if factual)
293 +
274 274 * Supporting evidence (2+ items)
275 275 * Opposing evidence (if exists)
276 276 * Source URLs and reliability scores
277 277
278 278 **4. Verdict** (if factual)
299 +
279 279 * Probability (0-100%)
280 280 * Uncertainty quantification
281 281 * Confidence level (LOW/MEDIUM/HIGH)
... ...
@@ -282,10 +282,10 @@
282 282 * Reasoning chain
283 283
284 284 **5. Quality Status**
306 +
285 285 * Which gates passed/failed
286 286 * If failed: Clear explanation why
287 287
288 -
289 289 === 5.2 Example POC Output ===
290 290
291 291 {{code language="json"}}
... ...
@@ -337,6 +337,7 @@
337 337 POC is successful if:
338 338
339 339 ✅ **FR1-FR7 Requirements Met:**
361 +
340 340 1. Extracts 3-5 factual claims from test articles
341 341 2. Generates 2-3 scenarios per ambiguous claim
342 342 3. Finds supporting AND opposing evidence
... ...
@@ -344,19 +344,21 @@
344 344 5. Provides clear reasoning chains
345 345
346 346 ✅ **Quality Gates Work:**
369 +
347 347 1. Gate 1 blocks non-factual claims (100% block rate)
348 348 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
349 349 3. Clear rejection reasons provided
350 350
351 351 ✅ **NFR11 Met:**
375 +
352 352 1. Quality gates reduce hallucination rate
353 353 2. Blocked outputs have clear explanations
354 354 3. Quality metrics are logged
355 355
356 -
357 357 === 6.2 Quality Thresholds ===
358 358
359 359 **Minimum Acceptable:**
383 +
360 360 * ≥70% of test claims correctly classified (factual/non-factual)
361 361 * ≥60% of verdicts are reasonable (human evaluation)
362 362 * Gate 1 blocks 100% of non-factual claims
... ...
@@ -363,16 +363,17 @@
363 363 * Gate 4 blocks verdicts with <2 sources
364 364
365 365 **Target:**
390 +
366 366 * ≥80% claims correctly classified
367 367 * ≥75% verdicts are reasonable
368 368 * <10% false positives (blocking good claims)
369 369
370 -
371 371 === 6.3 POC Decision Gate ===
372 372
373 373 **After POC1, we decide:**
374 374
375 375 **✅ PROCEED to POC2** if:
400 +
376 376 * Success criteria met
377 377 * Quality gates demonstrably improve output
378 378 * Core workflow is technically sound
... ...
@@ -379,65 +379,72 @@
379 379 * Clear path to production quality
380 380
381 381 **⚠️ ITERATE POC1** if:
407 +
382 382 * Success criteria partially met
383 383 * Gates work but need tuning
384 384 * Core issues identified but fixable
385 385
386 386 **❌ PIVOT APPROACH** if:
413 +
387 387 * Success criteria not met
388 388 * Fundamental AI limitations discovered
389 389 * Quality gates insufficient
390 390 * Alternative approach needed
391 391
392 -
393 393 == 7. Test Cases ==
394 394
395 395 === 7.1 Happy Path ===
396 396
397 397 **Test 1: Simple Factual Claim**
424 +
398 398 * Input: "Paris is the capital of France"
399 -* Expected: Factual, 1 scenario, verdict ~95% true
426 +* Expected: Factual, 1 scenario, verdict 95% true
400 400
401 401 **Test 2: Ambiguous Claim**
429 +
402 402 * Input: "Switzerland has the highest income in Europe"
403 403 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
404 404
405 405 **Test 3: Statistical Claim**
434 +
406 406 * Input: "10% of people have condition X"
407 407 * Expected: Factual, evidence with numbers, probabilistic verdict
408 408
409 -
410 410 === 7.2 Edge Cases ===
411 411
412 412 **Test 4: Opinion**
441 +
413 413 * Input: "Paris is the best city"
414 414 * Expected: Non-factual (opinion), blocked by Gate 1
415 415
416 416 **Test 5: Prediction**
446 +
417 417 * Input: "Bitcoin will reach $100,000 next year"
418 418 * Expected: Non-factual (prediction), blocked by Gate 1
419 419
420 420 **Test 6: Insufficient Evidence**
451 +
421 421 * Input: Obscure factual claim with no sources
422 422 * Expected: Blocked by Gate 4 (<2 sources)
423 423
424 -
425 425 === 7.3 Quality Gate Tests ===
426 426
427 427 **Test 7: Gate 1 Effectiveness**
458 +
428 428 * Input: Mix of 10 factual + 10 non-factual claims
429 429 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
430 430
431 431 **Test 8: Gate 4 Effectiveness**
463 +
432 432 * Input: Claims with varying evidence availability
433 433 * Expected: Gate 4 blocks low-confidence verdicts
434 434
435 -
436 436 == 8. Technical Architecture (POC) ==
437 437
438 438 === 8.1 Simplified Architecture ===
439 439
440 440 **POC Tech Stack:**
472 +
441 441 * **Frontend:** Simple web interface (Next.js + TypeScript)
442 442 * **Backend:** Single API endpoint
443 443 * **AI:** Claude API (Sonnet 4.5)
... ...
@@ -450,6 +450,7 @@
450 450 === 8.2 AKEL Implementation ===
451 451
452 452 **POC AKEL:**
485 +
453 453 * Single-threaded processing
454 454 * Synchronous API calls
455 455 * No caching
... ...
@@ -457,6 +457,7 @@
457 457 * Console logging
458 458
459 459 **Full AKEL (POC2+):**
493 +
460 460 * Multi-threaded processing
461 461 * Async API calls
462 462 * Evidence caching
... ...
@@ -463,7 +463,6 @@
463 463 * Advanced error handling with retry
464 464 * Structured logging + monitoring
465 465
466 -
467 467 == 9. POC Philosophy ==
468 468
469 469 {{info}}
... ...
@@ -472,60 +472,67 @@
472 472
473 473 === 9.1 Core Principles ===
474 474
475 -**1. Prove Concept, Not Production**
508 +*
509 +**
510 +**1. Prove Concept, Not Production
476 476 * POC validates AI can do the job
477 477 * Production quality comes in POC2 and Beta 0
478 478 * Focus on "does it work?" not "is it perfect?"
479 479
480 480 **2. Implement Subset of Requirements**
516 +
481 481 * POC covers FR1-7, NFR11 (lite)
482 482 * All other requirements deferred
483 483 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
484 484
485 485 **3. Quality Gates Validate Approach**
522 +
486 486 * 2 gates prove the concept
487 487 * Remaining 5 gates added in POC2
488 488 * Gates must demonstrably improve quality
489 489
490 490 **4. Iterate Based on Results**
528 +
491 491 * POC results determine next steps
492 492 * Decision gate after POC1
493 493 * Flexibility to pivot if needed
494 494
533 +=== 9.2 Success ===
495 495
496 -=== 9.2 Success = Clear Path Forward ===
535 + Clear Path Forward ===
497 497
498 498 POC succeeds if we can confidently answer:
499 499
500 500 ✅ **Technical Feasibility:**
540 +
501 501 * Can AI extract claims reliably?
502 502 * Can AI find balanced evidence?
503 503 * Can AI compute reasonable verdicts?
504 504
505 505 ✅ **Quality Approach:**
546 +
506 506 * Do quality gates improve output?
507 507 * Can we measure and track quality?
508 508 * Is the gate approach scalable?
509 509
510 510 ✅ **Production Path:**
552 +
511 511 * Is the core architecture sound?
512 512 * What needs improvement for production?
513 513 * Is POC2 the right next step?
514 514
515 -
516 516 == 10. Related Pages ==
517 517
518 518 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
519 519 * **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
520 520 * **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
521 -* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
562 +* **[[Implementation Roadmap>>Archive.FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
522 522 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
523 523
524 -
525 525 **Document Owner:** Technical Team
526 526 **Review Frequency:** After each POC iteration
527 527 **Version History:**
568 +
528 528 * v1.0 - Initial POC requirements
529 529 * v2.0 - Updated after specification cross-check
530 530 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
531 -
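Editorial illustration (not part of the diffed page): the "Minimum Acceptable" and "Target" thresholds from section 6.2 could be checked against measured POC results along these lines. This is only a sketch under assumptions: the function name, parameter names, and the three return labels are hypothetical, rates are assumed to be fractions between 0.0 and 1.0, and the target tier is assumed to also require the minimum-tier gate conditions.

```python
def classify_poc_quality(
    classification_accuracy: float,   # share of test claims correctly classified
    reasonable_verdict_rate: float,   # share of verdicts judged reasonable by humans
    gate1_block_rate: float,          # share of non-factual claims blocked by Gate 1
    gate4_blocks_under_sourced: bool, # Gate 4 blocked every verdict with <2 sources
    false_positive_rate: float,       # share of good claims wrongly blocked
) -> str:
    """Return 'target', 'minimum', or 'below_minimum' for a POC test run."""
    meets_minimum = (
        classification_accuracy >= 0.70
        and reasonable_verdict_rate >= 0.60
        and gate1_block_rate >= 1.0
        and gate4_blocks_under_sourced
    )
    if not meets_minimum:
        return "below_minimum"

    # Assumption: the target tier builds on the minimum tier.
    meets_target = (
        classification_accuracy >= 0.80
        and reasonable_verdict_rate >= 0.75
        and false_positive_rate < 0.10
    )
    return "target" if meets_target else "minimum"
```

The page does not say how these tiers map onto the PROCEED / ITERATE / PIVOT decision in section 6.3, so the sketch stops at reporting the tier and leaves the decision to human review.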