Changes for page POC Requirements
Last modified by Robert Schaub on 2025/12/23 11:35
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ...
@@ -14,11 +14,9 @@
14 14 === 1.1 What POC Tests ===
15 15
16 16 **Core Question:**
17 -
18 18 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
19 19
20 20 **What we're proving:**
21 -
22 22 * AI can identify factual claims from text
23 23 * AI can evaluate those claims with structured evidence
24 24 * Quality gates can filter unreliable outputs
... ...
@@ -25,7 +25,6 @@
25 25 * The core workflow is technically feasible
26 26
27 27 **What we're NOT proving:**
28 -
29 29 * Production-ready reliability (that's POC2)
30 30 * User-facing features (that's Beta 0)
31 31 * Full IFCN compliance (that's V1.0)
... ...
@@ -35,11 +35,11 @@
35 35 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
36 36
37 37 **Scope Summary:**
38 -
39 39 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
40 40 * **Partial:** 3 NFRs (simplified versions)
41 41 * **Out of Scope:** 19 requirements (deferred to later phases)
42 42
39 +
43 43 == 2. Requirements Scope Matrix ==
44 44
45 45 {{success}}
... ...
@@ -48,10 +48,9 @@
48 48
49 49 **POC1 Scope Summary:**
50 50
51 -POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor V0\.9\.78.Specification.Requirements.WebHome]]:
48 +POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]:
52 52
53 53 **Full Implementation (8 requirements):**
54 -
55 55 * FR1: Claim Extraction
56 56 * FR2: Claim Context
57 57 * FR3: Multiple Scenarios
... ...
@@ -62,13 +62,11 @@
62 62 * NFR11: AKEL Quality Assurance Framework (Basic - 4 quality gates)
63 63
64 64 **Partial Implementation (3 requirements):**
65 -
66 66 * NFR1: Explainability (Basic explanations only)
67 67 * NFR2: Performance (Functional but not optimized)
68 68 * NFR3: Transparency (Basic transparency)
69 69
70 70 **Deferred to Later Phases:**
71 -
72 72 * All other requirements (see Roadmap Matrix for phase assignments)
73 73
74 74 **Detailed POC1 specifications continue below...**
... ...
@@ -81,7 +81,6 @@
81 81 **Main Requirement:** AI extracts factual claims from input text
82 82
83 83 **POC Implementation:**
84 -
85 85 * ✅ AKEL extracts claims using LLM
86 86 * ✅ Each claim includes original text reference
87 87 * ✅ Claims are identified as factual/non-factual
... ...
@@ -88,17 +88,16 @@
88 88 * ❌ No advanced claim parsing (added in POC2)
89 89
90 90 **Acceptance Criteria:**
91 -
92 92 * Extracts 3-5 claims from typical article
93 93 * Identifies factual vs non-factual claims
94 94 * Quality Gate 1 validates extraction
95 95
88 +
96 96 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
97 97
98 98 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
99 99
100 100 **POC Implementation:**
101 -
102 102 * ✅ AKEL generates 2-3 scenarios per claim
103 103 * ✅ Scenarios capture different interpretations
104 104 * ✅ Each scenario is evaluated separately
... ...
@@ -105,17 +105,16 @@
105 105 * ✅ Verdict considers all scenarios
106 106
107 107 **Acceptance Criteria:**
108 -
109 109 * Generates 2+ scenarios for ambiguous claims
110 110 * Scenarios are meaningfully different
111 111 * All scenarios are evaluated
112 112
104 +
113 113 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
114 114
115 115 **Main Requirement:** Provide user-friendly summary of analysis
116 116
117 117 **POC Implementation:**
118 -
119 119 * ✅ Simple text summary generated
120 120 * ❌ No rich formatting (added in Beta 0)
121 121 * ❌ No visual elements (added in Beta 0)
... ...
@@ -133,12 +133,10 @@
133 133 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
134 134
135 135 **Main Requirements:**
136 -
137 137 * FR5: Collect supporting and opposing evidence
138 138 * FR6: Evaluate evidence source reliability
139 139
140 140 **POC Implementation:**
141 -
142 142 * ✅ AKEL searches for evidence (web/knowledge base)
143 143 * ✅ **Mandatory contradiction search** (finds opposing evidence)
144 144 * ✅ Source reliability scoring
... ...
@@ -146,17 +146,16 @@
146 146 * ❌ No advanced source verification (added in POC2)
147 147
148 148 **Acceptance Criteria:**
149 -
150 150 * Finds 2+ supporting evidence items
151 151 * Finds 1+ opposing evidence (if exists)
152 152 * Sources scored for reliability
153 153
142 +
154 154 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
155 155
156 156 **Main Requirement:** AI computes verdicts with uncertainty quantification
157 157
158 158 **POC Implementation:**
159 -
160 160 * ✅ Probabilistic verdicts (0-100% confidence)
161 161 * ✅ Uncertainty explicitly stated
162 162 * ✅ Reasoning chain provided
... ...
@@ -171,11 +171,11 @@
171 171 ```
172 172
173 173 **Acceptance Criteria:**
174 -
175 175 * Verdicts include probability (0-100%)
176 176 * Uncertainty explicitly quantified
177 177 * Reasoning chain explains verdict
178 178
166 +
179 179 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
180 180
181 181 **Main Requirement:** Complete quality assurance with 7 quality gates
... ...
@@ -183,13 +183,11 @@
183 183 **POC Implementation:** **2 gates only**
184 184
185 185 **Quality Gate 1: Claim Validation**
186 -
187 187 * ✅ Validates claim is factual and verifiable
188 188 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
189 189 * ✅ Provides clear rejection reason
190 190
191 191 **Quality Gate 4: Verdict Confidence Assessment**
192 -
193 193 * ✅ Validates ≥2 sources found
194 194 * ✅ Validates quality score ≥0.6
195 195 * ✅ Blocks low-confidence verdicts
... ...
@@ -196,7 +196,6 @@
196 196 * ✅ Provides clear rejection reason
197 197
198 198 **Out of Scope (POC2+):**
199 -
200 200 * ❌ Gate 2: Evidence Relevance
201 201 * ❌ Gate 3: Scenario Coherence
202 202 * ❌ Gate 5: Source Diversity
... ...
@@ -209,13 +209,11 @@
209 209 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
210 210
211 211 **Main Requirements:**
212 -
213 213 * NFR1: Response time < 30 seconds
214 214 * NFR2: Handle 1000+ concurrent users
215 215 * NFR3: 99.9% uptime
216 216
217 217 **POC Implementation:**
218 -
219 219 * ⚠️ **Response time monitored** (not optimized)
220 220 * ⚠️ **Single-threaded processing** (no concurrency)
221 221 * ⚠️ **Basic error handling** (no advanced retry logic)
... ...
@@ -223,11 +223,11 @@
223 223 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
224 224
225 225 **POC Acceptance:**
226 -
227 227 * Analysis completes (no timeout requirement)
228 228 * Errors don't crash system
229 229 * Basic logging in place
230 230
213 +
231 231 == 4. What's NOT in POC Scope ==
232 232
233 233 === 4.1 User-Facing Features (Beta 0+) ===
... ...
@@ -237,7 +237,6 @@
237 237 {{/warning}}
238 238
239 239 **Out of Scope:**
240 -
241 241 * ❌ User accounts and authentication (FR8)
242 242 * ❌ User corrections system (FR9, FR45-46)
243 243 * ❌ Public publishing interface (FR10)
... ...
@@ -251,7 +251,6 @@
251 251 === 4.2 Advanced Features (V1.0+) ===
252 252
253 253 **Out of Scope:**
254 -
255 255 * ❌ IFCN compliance (FR47)
256 256 * ❌ ClaimReview schema (FR48)
257 257 * ❌ Archive.org integration (FR49)
... ...
@@ -266,7 +266,6 @@
266 266 === 4.3 Production Requirements (POC2, Beta 0) ===
267 267
268 268 **Out of Scope:**
269 -
270 270 * ❌ Security controls (NFR4, NFR12)
271 271 * ❌ Code maintainability (NFR5)
272 272 * ❌ System monitoring (NFR13)
... ...
@@ -283,26 +283,21 @@
283 283
284 284 For each analyzed claim, POC must produce:
285 285
286 -*
287 -**
288 -**1. Claim
266 +**1. Claim**
289 289 * Original text
290 290 * Classification (factual/non-factual/ambiguous)
291 291 * If non-factual: Clear reason why
292 292
293 293 **2. Scenarios** (if factual)
294 -
295 295 * 2-3 interpretation scenarios
296 296 * Each scenario clearly described
297 297
298 298 **3. Evidence** (if factual)
299 -
300 300 * Supporting evidence (2+ items)
301 301 * Opposing evidence (if exists)
302 302 * Source URLs and reliability scores
303 303
304 304 **4. Verdict** (if factual)
305 -
306 306 * Probability (0-100%)
307 307 * Uncertainty quantification
308 308 * Confidence level (LOW/MEDIUM/HIGH)
... ...
@@ -309,10 +309,10 @@
309 309 * Reasoning chain
310 310
311 311 **5. Quality Status**
312 -
313 313 * Which gates passed/failed
314 314 * If failed: Clear explanation why
315 315
290 +
316 316 === 5.2 Example POC Output ===
317 317
318 318 {{code language="json"}}
... ...
@@ -364,7 +364,6 @@
364 364 POC is successful if:
365 365
366 366 ✅ **FR1-FR7 Requirements Met:**
367 -
368 368 1. Extracts 3-5 factual claims from test articles
369 369 2. Generates 2-3 scenarios per ambiguous claim
370 370 3. Finds supporting AND opposing evidence
... ...
@@ -372,21 +372,19 @@
372 372 5. Provides clear reasoning chains
373 373
374 374 ✅ **Quality Gates Work:**
375 -
376 376 1. Gate 1 blocks non-factual claims (100% block rate)
377 377 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
378 378 3. Clear rejection reasons provided
379 379
380 380 ✅ **NFR11 Met:**
381 -
382 382 1. Quality gates reduce hallucination rate
383 383 2. Blocked outputs have clear explanations
384 384 3. Quality metrics are logged
385 385
358 +
386 386 === 6.2 Quality Thresholds ===
387 387
388 388 **Minimum Acceptable:**
389 -
390 390 * ≥70% of test claims correctly classified (factual/non-factual)
391 391 * ≥60% of verdicts are reasonable (human evaluation)
392 392 * Gate 1 blocks 100% of non-factual claims
... ...
@@ -393,17 +393,16 @@
393 393 * Gate 4 blocks verdicts with <2 sources
394 394
395 395 **Target:**
396 -
397 397 * ≥80% claims correctly classified
398 398 * ≥75% verdicts are reasonable
399 399 * <10% false positives (blocking good claims)
400 400
372 +
401 401 === 6.3 POC Decision Gate ===
402 402
403 403 **After POC1, we decide:**
404 404
405 405 **✅ PROCEED to POC2** if:
406 -
407 407 * Success criteria met
408 408 * Quality gates demonstrably improve output
409 409 * Core workflow is technically sound
... ...
@@ -410,72 +410,65 @@
410 410 * Clear path to production quality
411 411
412 412 **⚠️ ITERATE POC1** if:
413 -
414 414 * Success criteria partially met
415 415 * Gates work but need tuning
416 416 * Core issues identified but fixable
417 417
418 418 **❌ PIVOT APPROACH** if:
419 -
420 420 * Success criteria not met
421 421 * Fundamental AI limitations discovered
422 422 * Quality gates insufficient
423 423 * Alternative approach needed
424 424
394 +
425 425 == 7. Test Cases ==
426 426
427 427 === 7.1 Happy Path ===
428 428
429 429 **Test 1: Simple Factual Claim**
430 -
431 431 * Input: "Paris is the capital of France"
432 -* Expected: Factual, 1 scenario, verdict 95% true
401 +* Expected: Factual, 1 scenario, verdict ~95% true
433 433
434 434 **Test 2: Ambiguous Claim**
435 -
436 436 * Input: "Switzerland has the highest income in Europe"
437 437 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
438 438
439 439 **Test 3: Statistical Claim**
440 -
441 441 * Input: "10% of people have condition X"
442 442 * Expected: Factual, evidence with numbers, probabilistic verdict
443 443
411 +
444 444 === 7.2 Edge Cases ===
445 445
446 446 **Test 4: Opinion**
447 -
448 448 * Input: "Paris is the best city"
449 449 * Expected: Non-factual (opinion), blocked by Gate 1
450 450
451 451 **Test 5: Prediction**
452 -
453 453 * Input: "Bitcoin will reach $100,000 next year"
454 454 * Expected: Non-factual (prediction), blocked by Gate 1
455 455
456 456 **Test 6: Insufficient Evidence**
457 -
458 458 * Input: Obscure factual claim with no sources
459 459 * Expected: Blocked by Gate 4 (<2 sources)
460 460
426 +
461 461 === 7.3 Quality Gate Tests ===
462 462
463 463 **Test 7: Gate 1 Effectiveness**
464 -
465 465 * Input: Mix of 10 factual + 10 non-factual claims
466 466 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
467 467
468 468 **Test 8: Gate 4 Effectiveness**
469 -
470 470 * Input: Claims with varying evidence availability
471 471 * Expected: Gate 4 blocks low-confidence verdicts
472 472
437 +
473 473 == 8. Technical Architecture (POC) ==
474 474
475 475 === 8.1 Simplified Architecture ===
476 476
477 477 **POC Tech Stack:**
478 -
479 479 * **Frontend:** Simple web interface (Next.js + TypeScript)
480 480 * **Backend:** Single API endpoint
481 481 * **AI:** Claude API (Sonnet 4.5)
... ...
@@ -488,7 +488,6 @@
488 488 === 8.2 AKEL Implementation ===
489 489
490 490 **POC AKEL:**
491 -
492 492 * Single-threaded processing
493 493 * Synchronous API calls
494 494 * No caching
... ...
@@ -496,7 +496,6 @@
496 496 * Console logging
497 497
498 498 **Full AKEL (POC2+):**
499 -
500 500 * Multi-threaded processing
501 501 * Async API calls
502 502 * Evidence caching
... ...
@@ -503,6 +503,7 @@
503 503 * Advanced error handling with retry
504 504 * Structured logging + monitoring
505 505
468 +
506 506 == 9. POC Philosophy ==
507 507
508 508 {{info}}
... ...
@@ -511,55 +511,47 @@
511 511
512 512 === 9.1 Core Principles ===
513 513
514 -*
515 -**
516 -**1. Prove Concept, Not Production
477 +**1. Prove Concept, Not Production**
517 517 * POC validates AI can do the job
518 518 * Production quality comes in POC2 and Beta 0
519 519 * Focus on "does it work?" not "is it perfect?"
520 520
521 521 **2. Implement Subset of Requirements**
522 -
523 523 * POC covers FR1-7, NFR11 (lite)
524 524 * All other requirements deferred
525 525 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
526 526
527 527 **3. Quality Gates Validate Approach**
528 -
529 529 * 2 gates prove the concept
530 530 * Remaining 5 gates added in POC2
531 531 * Gates must demonstrably improve quality
532 532
533 533 **4. Iterate Based on Results**
534 -
535 535 * POC results determine next steps
536 536 * Decision gate after POC1
537 537 * Flexibility to pivot if needed
538 538
539 -=== 9.2 Success ===
540 540
541 - Clear Path Forward ===
498 +=== 9.2 Success = Clear Path Forward ===
542 542
543 543 POC succeeds if we can confidently answer:
544 544
545 545 ✅ **Technical Feasibility:**
546 -
547 547 * Can AI extract claims reliably?
548 548 * Can AI find balanced evidence?
549 549 * Can AI compute reasonable verdicts?
550 550
551 551 ✅ **Quality Approach:**
552 -
553 553 * Do quality gates improve output?
554 554 * Can we measure and track quality?
555 555 * Is the gate approach scalable?
556 556
557 557 ✅ **Production Path:**
558 -
559 559 * Is the core architecture sound?
560 560 * What needs improvement for production?
561 561 * Is POC2 the right next step?
562 562
517 +
563 563 == 10. Related Pages ==
564 564
565 565 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ...
@@ -568,10 +568,11 @@
568 568 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
569 569 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
570 570
526 +
571 571 **Document Owner:** Technical Team
572 572 **Review Frequency:** After each POC iteration
573 573 **Version History:**
574 -
575 575 * v1.0 - Initial POC requirements
576 576 * v2.0 - Updated after specification cross-check
577 577 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
533 +
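Reviewer note: the Quality Gate 4 rule that this page states (block a verdict when fewer than 2 sources are found or the quality score is below 0.6, and give a clear rejection reason) could be sketched roughly as below. This is only an illustration under stated assumptions: the names `VerdictCandidate`, `GateResult`, and `checkGate4` are hypothetical and do not come from the FactHarbor codebase, and the page does not say how the quality score is computed, so it is simply taken as an input between 0 and 1.

```typescript
// Hypothetical shapes for illustration only; field names are assumptions.
interface VerdictCandidate {
  sourceCount: number;  // number of evidence sources found for the claim
  qualityScore: number; // aggregate quality score, assumed to lie in 0..1
}

interface GateResult {
  gate: string;
  passed: boolean;
  reasons: string[]; // rejection reasons; empty when the gate passes
}

// Thresholds as stated for Quality Gate 4 on this page.
const MIN_SOURCES = 2;
const MIN_QUALITY_SCORE = 0.6;

// Returns a pass/fail result and collects one reason per violated threshold.
function checkGate4(candidate: VerdictCandidate): GateResult {
  const reasons: string[] = [];
  if (candidate.sourceCount < MIN_SOURCES) {
    reasons.push(
      `Found ${candidate.sourceCount} source(s); at least ${MIN_SOURCES} are required`
    );
  }
  if (candidate.qualityScore < MIN_QUALITY_SCORE) {
    reasons.push(
      `Quality score ${candidate.qualityScore} is below the minimum of ${MIN_QUALITY_SCORE}`
    );
  }
  return {
    gate: "Gate 4: Verdict Confidence Assessment",
    passed: reasons.length === 0,
    reasons,
  };
}
```

Under these assumptions, a call such as `checkGate4({ sourceCount: 1, qualityScore: 0.9 })` would be blocked with a single reason about the source count, which is the situation Test 6 describes. Collecting every failed threshold instead of stopping at the first one is a design choice made here so that the rejection reason stays complete.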