Changes for page POC Requirements
Last modified by Robert Schaub on 2025/12/23 11:35
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -14,9 +14,11 @@
14 14 === 1.1 What POC Tests ===
15 15
16 16 **Core Question:**
17 +
17 17 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18
19 19 **What we're proving:**
21 +
20 20 * AI can identify factual claims from text
21 21 * AI can evaluate those claims with structured evidence
22 22 * Quality gates can filter unreliable outputs
@@ -23,6 +23,7 @@
23 23 * The core workflow is technically feasible
24 24
25 25 **What we're NOT proving:**
28 +
26 26 * Production-ready reliability (that's POC2)
27 27 * User-facing features (that's Beta 0)
28 28 * Full IFCN compliance (that's V1.0)
@@ -32,22 +32,23 @@
32 32 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33
34 34 **Scope Summary:**
38 +
35 35 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36 * **Partial:** 3 NFRs (simplified versions)
37 37 * **Out of Scope:** 19 requirements (deferred to later phases)
38 38
39 -
40 40 == 2. Requirements Scope Matrix ==
41 41
42 42 {{success}}
43 -**Authoritative Source:** See [[Requirements Roadmap Matrix>>Test.FactHarbor.Specification.Requirements-Roadmap-Matrix.WebHome]] for complete phase-to-requirement mapping across all phases.
46 +**Authoritative Source:** See [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.78.Specification.Requirements-Roadmap-Matrix.WebHome]] for complete phase-to-requirement mapping across all phases.
44 44 {{/success}}
45 45
46 46 **POC1 Scope Summary:**
47 47
48 -POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]:
51 +POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor V0\.9\.78.Specification.Requirements.WebHome]]:
49 49
50 50 **Full Implementation (8 requirements):**
54 +
51 51 * FR1: Claim Extraction
52 52 * FR2: Claim Context
53 53 * FR3: Multiple Scenarios
@@ -58,11 +58,13 @@
58 58 * NFR11: AKEL Quality Assurance Framework (Basic - 4 quality gates)
59 59
60 60 **Partial Implementation (3 requirements):**
65 +
61 61 * NFR1: Explainability (Basic explanations only)
62 62 * NFR2: Performance (Functional but not optimized)
63 63 * NFR3: Transparency (Basic transparency)
64 64
65 65 **Deferred to Later Phases:**
71 +
66 66 * All other requirements (see Roadmap Matrix for phase assignments)
67 67
68 68 **Detailed POC1 specifications continue below...**
@@ -75,6 +75,7 @@
75 75 **Main Requirement:** AI extracts factual claims from input text
76 76
77 77 **POC Implementation:**
84 +
78 78 * ✅ AKEL extracts claims using LLM
79 79 * ✅ Each claim includes original text reference
80 80 * ✅ Claims are identified as factual/non-factual
@@ -81,16 +81,17 @@
81 81 * ❌ No advanced claim parsing (added in POC2)
82 82
83 83 **Acceptance Criteria:**
91 +
84 84 * Extracts 3-5 claims from typical article
85 85 * Identifies factual vs non-factual claims
86 86 * Quality Gate 1 validates extraction
87 87
88 -
89 89 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
90 90
91 91 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
92 92
93 93 **POC Implementation:**
101 +
94 94 * ✅ AKEL generates 2-3 scenarios per claim
95 95 * ✅ Scenarios capture different interpretations
96 96 * ✅ Each scenario is evaluated separately
@@ -97,16 +97,17 @@
97 97 * ✅ Verdict considers all scenarios
98 98
99 99 **Acceptance Criteria:**
108 +
100 100 * Generates 2+ scenarios for ambiguous claims
101 101 * Scenarios are meaningfully different
102 102 * All scenarios are evaluated
103 103
104 -
105 105 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
106 106
107 107 **Main Requirement:** Provide user-friendly summary of analysis
108 108
109 109 **POC Implementation:**
118 +
110 110 * ✅ Simple text summary generated
111 111 * ❌ No rich formatting (added in Beta 0)
112 112 * ❌ No visual elements (added in Beta 0)
@@ -124,10 +124,12 @@
124 124 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
125 125
126 126 **Main Requirements:**
136 +
127 127 * FR5: Collect supporting and opposing evidence
128 128 * FR6: Evaluate evidence source reliability
129 129
130 130 **POC Implementation:**
141 +
131 131 * ✅ AKEL searches for evidence (web/knowledge base)
132 132 * ✅ **Mandatory contradiction search** (finds opposing evidence)
133 133 * ✅ Source reliability scoring
@@ -135,16 +135,17 @@
135 135 * ❌ No advanced source verification (added in POC2)
136 136
137 137 **Acceptance Criteria:**
149 +
138 138 * Finds 2+ supporting evidence items
139 139 * Finds 1+ opposing evidence (if exists)
140 140 * Sources scored for reliability
141 141
142 -
143 143 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
144 144
145 145 **Main Requirement:** AI computes verdicts with uncertainty quantification
146 146
147 147 **POC Implementation:**
159 +
148 148 * ✅ Probabilistic verdicts (0-100% confidence)
149 149 * ✅ Uncertainty explicitly stated
150 150 * ✅ Reasoning chain provided
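The FR5-FR6 acceptance criteria in section 3.4 of the page (2+ supporting items, 1+ opposing item where one exists, every source scored for reliability) can be checked mechanically against a labelled test case. The TypeScript sketch below is illustrative only. The `EvidenceItem` shape, the `stance` field, the 0 to 1 reliability scale and the `opposingEvidenceExists` flag are all assumptions and are not part of the specification.

```typescript
// Hypothetical evidence record; field names and the 0..1 reliability scale are assumptions.
interface EvidenceItem {
  url: string;
  stance: "supporting" | "opposing";
  reliability?: number; // undefined means the source was never scored
}

interface AcceptanceResult {
  passed: boolean;
  failures: string[];
}

// Checks the FR5-FR6 POC acceptance criteria for a single claim.
// `opposingEvidenceExists` is assumed to come from a human-labelled test case,
// because "1+ opposing evidence (if exists)" can only be judged against ground truth.
function checkEvidenceAcceptance(
  items: EvidenceItem[],
  opposingEvidenceExists: boolean,
): AcceptanceResult {
  const failures: string[] = [];

  const supporting = items.filter((e) => e.stance === "supporting").length;
  if (supporting < 2) {
    failures.push(`found ${supporting} supporting item(s); 2+ required`);
  }

  const opposing = items.filter((e) => e.stance === "opposing").length;
  if (opposingEvidenceExists && opposing < 1) {
    failures.push("opposing evidence exists but none was found");
  }

  const unscored = items.filter((e) => e.reliability === undefined).length;
  if (unscored > 0) {
    failures.push(`${unscored} source(s) have no reliability score`);
  }

  return { passed: failures.length === 0, failures };
}
```

The function collects every unmet criterion instead of stopping at the first one. A test report can then show all of a claim's failures at once.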
@@ -159,11 +159,11 @@
159 159 ```
160 160
161 161 **Acceptance Criteria:**
174 +
162 162 * Verdicts include probability (0-100%)
163 163 * Uncertainty explicitly quantified
164 164 * Reasoning chain explains verdict
165 165
166 -
167 167 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
168 168
169 169 **Main Requirement:** Complete quality assurance with 7 quality gates
@@ -171,11 +171,13 @@
171 171 **POC Implementation:** **2 gates only**
172 172
173 173 **Quality Gate 1: Claim Validation**
186 +
174 174 * ✅ Validates claim is factual and verifiable
175 175 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
176 176 * ✅ Provides clear rejection reason
177 177
178 178 **Quality Gate 4: Verdict Confidence Assessment**
192 +
179 179 * ✅ Validates ≥2 sources found
180 180 * ✅ Validates quality score ≥0.6
181 181 * ✅ Blocks low-confidence verdicts
@@ -182,6 +182,7 @@
182 182 * ✅ Provides clear rejection reason
183 183
184 184 **Out of Scope (POC2+):**
199 +
185 185 * ❌ Gate 2: Evidence Relevance
186 186 * ❌ Gate 3: Scenario Coherence
187 187 * ❌ Gate 5: Source Diversity
@@ -194,11 +194,13 @@
194 194 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
195 195
196 196 **Main Requirements:**
212 +
197 197 * NFR1: Response time < 30 seconds
198 198 * NFR2: Handle 1000+ concurrent users
199 199 * NFR3: 99.9% uptime
200 200
201 201 **POC Implementation:**
218 +
202 202 * ⚠️ **Response time monitored** (not optimized)
203 203 * ⚠️ **Single-threaded processing** (no concurrency)
204 204 * ⚠️ **Basic error handling** (no advanced retry logic)
@@ -206,11 +206,11 @@
206 206 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
207 207
208 208 **POC Acceptance:**
226 +
209 209 * Analysis completes (no timeout requirement)
210 210 * Errors don't crash system
211 211 * Basic logging in place
212 212
213 -
214 214 == 4. What's NOT in POC Scope ==
215 215
216 216 === 4.1 User-Facing Features (Beta 0+) ===
@@ -220,6 +220,7 @@
220 220 {{/warning}}
221 221
222 222 **Out of Scope:**
240 +
223 223 * ❌ User accounts and authentication (FR8)
224 224 * ❌ User corrections system (FR9, FR45-46)
225 225 * ❌ Public publishing interface (FR10)
@@ -233,6 +233,7 @@
233 233 === 4.2 Advanced Features (V1.0+) ===
234 234
235 235 **Out of Scope:**
254 +
236 236 * ❌ IFCN compliance (FR47)
237 237 * ❌ ClaimReview schema (FR48)
238 238 * ❌ Archive.org integration (FR49)
@@ -247,6 +247,7 @@
247 247 === 4.3 Production Requirements (POC2, Beta 0) ===
248 248
249 249 **Out of Scope:**
269 +
250 250 * ❌ Security controls (NFR4, NFR12)
251 251 * ❌ Code maintainability (NFR5)
252 252 * ❌ System monitoring (NFR13)
@@ -263,21 +263,26 @@
263 263
264 264 For each analyzed claim, POC must produce:
265 265
266 -**1. Claim**
286 +* \\
287 +** \\
288 +**1. Claim
267 267 * Original text
268 268 * Classification (factual/non-factual/ambiguous)
269 269 * If non-factual: Clear reason why
270 270
271 271 **2. Scenarios** (if factual)
294 +
272 272 * 2-3 interpretation scenarios
273 273 * Each scenario clearly described
274 274
275 275 **3. Evidence** (if factual)
299 +
276 276 * Supporting evidence (2+ items)
277 277 * Opposing evidence (if exists)
278 278 * Source URLs and reliability scores
279 279
280 280 **4. Verdict** (if factual)
305 +
281 281 * Probability (0-100%)
282 282 * Uncertainty quantification
283 283 * Confidence level (LOW/MEDIUM/HIGH)
@@ -284,10 +284,10 @@
284 284 * Reasoning chain
285 285
286 286 **5. Quality Status**
312 +
287 287 * Which gates passed/failed
288 288 * If failed: Clear explanation why
289 289
290 -
291 291 === 5.2 Example POC Output ===
292 292
293 293 {{code language="json"}}
@@ -339,6 +339,7 @@
339 339 POC is successful if:
340 340
341 341 ✅ **FR1-FR7 Requirements Met:**
367 +
342 342 1. Extracts 3-5 factual claims from test articles
343 343 2. Generates 2-3 scenarios per ambiguous claim
344 344 3. Finds supporting AND opposing evidence
@@ -346,19 +346,21 @@
346 346 5. Provides clear reasoning chains
347 347
348 348 ✅ **Quality Gates Work:**
375 +
349 349 1. Gate 1 blocks non-factual claims (100% block rate)
350 350 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
351 351 3. Clear rejection reasons provided
352 352
353 353 ✅ **NFR11 Met:**
381 +
354 354 1. Quality gates reduce hallucination rate
355 355 2. Blocked outputs have clear explanations
356 356 3. Quality metrics are logged
357 357
358 -
359 359 === 6.2 Quality Thresholds ===
360 360
361 361 **Minimum Acceptable:**
389 +
362 362 * ≥70% of test claims correctly classified (factual/non-factual)
363 363 * ≥60% of verdicts are reasonable (human evaluation)
364 364 * Gate 1 blocks 100% of non-factual claims
@@ -365,16 +365,17 @@
365 365 * Gate 4 blocks verdicts with <2 sources
366 366
367 367 **Target:**
396 +
368 368 * ≥80% claims correctly classified
369 369 * ≥75% verdicts are reasonable
370 370 * <10% false positives (blocking good claims)
371 371
372 -
373 373 === 6.3 POC Decision Gate ===
374 374
375 375 **After POC1, we decide:**
376 376
377 377 **✅ PROCEED to POC2** if:
406 +
378 378 * Success criteria met
379 379 * Quality gates demonstrably improve output
380 380 * Core workflow is technically sound
@@ -381,65 +381,72 @@
381 381 * Clear path to production quality
382 382
383 383 **⚠️ ITERATE POC1** if:
413 +
384 384 * Success criteria partially met
385 385 * Gates work but need tuning
386 386 * Core issues identified but fixable
387 387
388 388 **❌ PIVOT APPROACH** if:
419 +
389 389 * Success criteria not met
390 390 * Fundamental AI limitations discovered
391 391 * Quality gates insufficient
392 392 * Alternative approach needed
393 393
394 -
395 395 == 7. Test Cases ==
396 396
397 397 === 7.1 Happy Path ===
398 398
399 399 **Test 1: Simple Factual Claim**
430 +
400 400 * Input: "Paris is the capital of France"
401 -* Expected: Factual, 1 scenario, verdict ~95% true
432 +* Expected: Factual, 1 scenario, verdict 95% true
402 402
403 403 **Test 2: Ambiguous Claim**
435 +
404 404 * Input: "Switzerland has the highest income in Europe"
405 405 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
406 406
407 407 **Test 3: Statistical Claim**
440 +
408 408 * Input: "10% of people have condition X"
409 409 * Expected: Factual, evidence with numbers, probabilistic verdict
410 410
411 -
412 412 === 7.2 Edge Cases ===
413 413
414 414 **Test 4: Opinion**
447 +
415 415 * Input: "Paris is the best city"
416 416 * Expected: Non-factual (opinion), blocked by Gate 1
417 417
418 418 **Test 5: Prediction**
452 +
419 419 * Input: "Bitcoin will reach $100,000 next year"
420 420 * Expected: Non-factual (prediction), blocked by Gate 1
421 421
422 422 **Test 6: Insufficient Evidence**
457 +
423 423 * Input: Obscure factual claim with no sources
424 424 * Expected: Blocked by Gate 4 (<2 sources)
425 425
426 -
427 427 === 7.3 Quality Gate Tests ===
428 428
429 429 **Test 7: Gate 1 Effectiveness**
464 +
430 430 * Input: Mix of 10 factual + 10 non-factual claims
431 431 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
432 432
433 433 **Test 8: Gate 4 Effectiveness**
469 +
434 434 * Input: Claims with varying evidence availability
435 435 * Expected: Gate 4 blocks low-confidence verdicts
436 436
437 -
438 438 == 8. Technical Architecture (POC) ==
439 439
440 440 === 8.1 Simplified Architecture ===
441 441
442 442 **POC Tech Stack:**
478 +
443 443 * **Frontend:** Simple web interface (Next.js + TypeScript)
444 444 * **Backend:** Single API endpoint
445 445 * **AI:** Claude API (Sonnet 4.5)
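Sections 3.6 and 5.1 of the page define the two POC1 quality gates and the per-claim output record. A minimal TypeScript sketch of both follows. All type and function names here are hypothetical. The 0 to 1 `qualityScore` scale is inferred from the ≥0.6 threshold, and counting distinct URLs as distinct sources is an assumption. How the classification itself is produced (the LLM step) is out of scope.

```typescript
// Illustrative shape of the per-claim POC1 output (section 5.1); names are hypothetical.
type Classification = "factual" | "non-factual" | "ambiguous";
type ConfidenceLevel = "LOW" | "MEDIUM" | "HIGH";

interface Evidence {
  url: string;
  stance: "supporting" | "opposing";
  reliability: number; // assumed 0..1
}

interface Verdict {
  probability: number; // 0-100 (%)
  uncertainty: string;
  confidence: ConfidenceLevel;
  reasoningChain: string[];
}

interface GateResult {
  gate: string;
  passed: boolean;
  reason?: string; // clear rejection reason, set only when the gate blocks
}

interface ClaimAnalysis {
  originalText: string;
  classification: Classification;
  nonFactualReason?: string;
  scenarios: string[]; // 2-3 when factual
  evidence: Evidence[];
  qualityScore: number; // assumed 0..1, compared against 0.6 by Gate 4
  verdict?: Verdict;
  qualityStatus: GateResult[];
}

// Gate 1 (Claim Validation): only factual claims pass; opinions, predictions
// and ambiguous claims are blocked with a reason.
function gate1ClaimValidation(
  claim: Pick<ClaimAnalysis, "classification" | "nonFactualReason">,
): GateResult {
  const gate = "Gate 1: Claim Validation";
  if (claim.classification === "factual") {
    return { gate, passed: true };
  }
  return {
    gate,
    passed: false,
    reason: claim.nonFactualReason ?? `claim classified as ${claim.classification}`,
  };
}

// Gate 4 (Verdict Confidence Assessment): block verdicts backed by fewer than
// 2 distinct sources (distinct URLs, an assumption) or a quality score below 0.6.
function gate4VerdictConfidence(
  claim: Pick<ClaimAnalysis, "evidence" | "qualityScore">,
): GateResult {
  const gate = "Gate 4: Verdict Confidence Assessment";
  const urls = claim.evidence.map((e) => e.url);
  const sourceCount = urls.filter((u, i) => urls.indexOf(u) === i).length;
  if (sourceCount < 2) {
    return { gate, passed: false, reason: `only ${sourceCount} source(s) found; at least 2 required` };
  }
  if (claim.qualityScore < 0.6) {
    return { gate, passed: false, reason: `quality score ${claim.qualityScore} is below 0.6` };
  }
  return { gate, passed: true };
}
```

Section 5.1 produces scenarios, evidence and a verdict only for factual claims. In a pipeline, Gate 1 would therefore run first, and Gate 4 would run only on claims that reached a verdict.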
@@ -452,6 +452,7 @@
452 452 === 8.2 AKEL Implementation ===
453 453
454 454 **POC AKEL:**
491 +
455 455 * Single-threaded processing
456 456 * Synchronous API calls
457 457 * No caching
@@ -459,6 +459,7 @@
459 459 * Console logging
460 460
461 461 **Full AKEL (POC2+):**
499 +
462 462 * Multi-threaded processing
463 463 * Async API calls
464 464 * Evidence caching
@@ -465,7 +465,6 @@
465 465 * Advanced error handling with retry
466 466 * Structured logging + monitoring
467 467
468 -
469 469 == 9. POC Philosophy ==
470 470
471 471 {{info}}
@@ -474,47 +474,55 @@
474 474
475 475 === 9.1 Core Principles ===
476 476
477 -**1. Prove Concept, Not Production**
514 +* \\
515 +** \\
516 +**1. Prove Concept, Not Production
478 478 * POC validates AI can do the job
479 479 * Production quality comes in POC2 and Beta 0
480 480 * Focus on "does it work?" not "is it perfect?"
481 481
482 482 **2. Implement Subset of Requirements**
522 +
483 483 * POC covers FR1-7, NFR11 (lite)
484 484 * All other requirements deferred
485 485 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
486 486
487 487 **3. Quality Gates Validate Approach**
528 +
488 488 * 2 gates prove the concept
489 489 * Remaining 5 gates added in POC2
490 490 * Gates must demonstrably improve quality
491 491
492 492 **4. Iterate Based on Results**
534 +
493 493 * POC results determine next steps
494 494 * Decision gate after POC1
495 495 * Flexibility to pivot if needed
496 496
539 +=== 9.2 Success ===
497 497
498 - ===9.2 Success =Clear Path Forward ===
541 + Clear Path Forward ===
499 499
500 500 POC succeeds if we can confidently answer:
501 501
502 502 ✅ **Technical Feasibility:**
546 +
503 503 * Can AI extract claims reliably?
504 504 * Can AI find balanced evidence?
505 505 * Can AI compute reasonable verdicts?
506 506
507 507 ✅ **Quality Approach:**
552 +
508 508 * Do quality gates improve output?
509 509 * Can we measure and track quality?
510 510 * Is the gate approach scalable?
511 511
512 512 ✅ **Production Path:**
558 +
513 513 * Is the core architecture sound?
514 514 * What needs improvement for production?
515 515 * Is POC2 the right next step?
516 516
517 -
518 518 == 10. Related Pages ==
519 519
520 520 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
@@ -523,11 +523,10 @@
523 523 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
524 524 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
525 525
526 -
527 527 **Document Owner:** Technical Team
528 528 **Review Frequency:** After each POC iteration
529 529 **Version History:**
574 +
530 530 * v1.0 - Initial POC requirements
531 531 * v2.0 - Updated after specification cross-check
532 532 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
533 -
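The quality thresholds in section 6.2 of the page can be evaluated from counts gathered during a test run. The TypeScript sketch below shows one way to do this, under three assumptions. The `RunCounts` field names are hypothetical. The "Target" level is treated as also requiring every "Minimum Acceptable" condition. The "<10% false positives" target is read as the share of good (factual) claims that were blocked.

```typescript
// Counts from one POC1 evaluation run; field names are hypothetical.
interface RunCounts {
  claimsTotal: number;
  claimsClassifiedCorrectly: number;
  verdictsTotal: number;
  verdictsJudgedReasonable: number; // from human evaluation
  nonFactualTotal: number;
  nonFactualBlockedByGate1: number;
  underSourcedTotal: number; // verdicts with fewer than 2 sources
  underSourcedBlockedByGate4: number;
  goodClaimsTotal: number;
  goodClaimsBlocked: number; // false positives
}

interface ThresholdReport {
  meetsMinimum: boolean;
  meetsTarget: boolean;
}

function evaluateThresholds(c: RunCounts): ThresholdReport {
  // An empty denominator yields 0, so an empty run never meets the accuracy thresholds.
  const ratio = (n: number, d: number): number => (d === 0 ? 0 : n / d);

  const classification = ratio(c.claimsClassifiedCorrectly, c.claimsTotal);
  const reasonable = ratio(c.verdictsJudgedReasonable, c.verdictsTotal);
  const falsePositiveRate = ratio(c.goodClaimsBlocked, c.goodClaimsTotal);

  // "Minimum Acceptable" from section 6.2; the gate checks compare counts directly
  // because both require a 100% block rate.
  const meetsMinimum =
    classification >= 0.7 &&
    reasonable >= 0.6 &&
    c.nonFactualBlockedByGate1 === c.nonFactualTotal &&
    c.underSourcedBlockedByGate4 === c.underSourcedTotal;

  // "Target" from section 6.2, assumed to build on the minimum.
  const meetsTarget =
    meetsMinimum && classification >= 0.8 && reasonable >= 0.75 && falsePositiveRate < 0.1;

  return { meetsMinimum, meetsTarget };
}
```

The report does not choose between PROCEED, ITERATE and PIVOT. Section 6.3 bases that decision on broader success criteria than these thresholds alone.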