Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/23 16:13
From version 1.2
edited by Robert Schaub
on 2025/12/23 16:13
Change comment:
Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -14,11 +14,9 @@
14 14 === 1.1 What POC Tests ===
15 15
16 16 **Core Question:**
17 -
18 18 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
19 19
20 20 **What we're proving:**
21 -
22 22 * AI can identify factual claims from text
23 23 * AI can evaluate those claims with structured evidence
24 24 * Quality gates can filter unreliable outputs
@@ -25,7 +25,6 @@
25 25 * The core workflow is technically feasible
26 26
27 27 **What we're NOT proving:**
28 -
29 29 * Production-ready reliability (that's POC2)
30 30 * User-facing features (that's Beta 0)
31 31 * Full IFCN compliance (that's V1.0)
@@ -35,15 +35,15 @@
35 35 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
36 36
37 37 **Scope Summary:**
38 -
39 39 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
40 40 * **Partial:** 3 NFRs (simplified versions)
41 41 * **Out of Scope:** 19 requirements (deferred to later phases)
42 42
39 +
43 43 == 2. POC1 Scope ==
44 44
45 45 {{success}}
46 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.83.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
43 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
47 47
48 48 The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
49 49 {{/success}}
@@ -56,7 +56,6 @@
56 56 | **NFR11** | Quality Assurance Framework | 4 quality gates implemented
57 57
58 58 **POC1 also implements these workflow components** (detailed as FR1-FR6, FR8, FR11, FR13 in implementation sections below):
59 -
60 60 * Claim extraction (FR1)
61 61 * Claim context (FR2)
62 62 * Multiple scenarios (FR3)
@@ -67,7 +67,6 @@
67 67 * In-article highlighting (FR13) - deferred to Beta 0
68 68
69 69 **Partial implementations:**
70 -
71 71 * NFR1 (Explainability) - Basic only
72 72 * NFR2 (Performance) - Functional but not optimized
73 73 * NFR3 (Transparency) - Basic only
@@ -83,7 +83,6 @@
83 83 **Main Requirement:** AI extracts factual claims from input text
84 84
85 85 **POC Implementation:**
86 -
87 87 * ✅ AKEL extracts claims using LLM
88 88 * ✅ Each claim includes original text reference
89 89 * ✅ Claims are identified as factual/non-factual
@@ -90,17 +90,16 @@
90 90 * ❌ No advanced claim parsing (added in POC2)
91 91
92 92 **Acceptance Criteria:**
93 -
94 94 * Extracts 3-5 claims from typical article
95 95 * Identifies factual vs non-factual claims
96 96 * Quality Gate 1 validates extraction
97 97
91 +
98 98 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
99 99
100 100 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
101 101
102 102 **POC Implementation:**
103 -
104 104 * ✅ AKEL generates 2-3 scenarios per claim
105 105 * ✅ Scenarios capture different interpretations
106 106 * ✅ Each scenario is evaluated separately
@@ -107,17 +107,16 @@
107 107 * ✅ Verdict considers all scenarios
108 108
109 109 **Acceptance Criteria:**
110 -
111 111 * Generates 2+ scenarios for ambiguous claims
112 112 * Scenarios are meaningfully different
113 113 * All scenarios are evaluated
114 114
107 +
115 115 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
116 116
117 117 **Main Requirement:** Provide user-friendly summary of analysis
118 118
119 119 **POC Implementation:**
120 -
121 121 * ✅ Simple text summary generated
122 122 * ❌ No rich formatting (added in Beta 0)
123 123 * ❌ No visual elements (added in Beta 0)
@@ -135,12 +135,10 @@
135 135 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
136 136
137 137 **Main Requirements:**
138 -
139 139 * FR5: Collect supporting and opposing evidence
140 140 * FR6: Evaluate evidence source reliability
141 141
142 142 **POC Implementation:**
143 -
144 144 * ✅ AKEL searches for evidence (web/knowledge base)
145 145 * ✅ **Mandatory contradiction search** (finds opposing evidence)
146 146 * ✅ Source reliability scoring
@@ -148,17 +148,16 @@
148 148 * ❌ No advanced source verification (added in POC2)
149 149
150 150 **Acceptance Criteria:**
151 -
152 152 * Finds 2+ supporting evidence items
153 153 * Finds 1+ opposing evidence (if exists)
154 154 * Sources scored for reliability
155 155
145 +
156 156 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
157 157
158 158 **Main Requirement:** AI computes verdicts with uncertainty quantification
159 159
160 160 **POC Implementation:**
161 -
162 162 * ✅ Probabilistic verdicts (0-100% confidence)
163 163 * ✅ Uncertainty explicitly stated
164 164 * ✅ Reasoning chain provided
@@ -173,11 +173,11 @@
173 173 ```
174 174
175 175 **Acceptance Criteria:**
176 -
177 177 * Verdicts include probability (0-100%)
178 178 * Uncertainty explicitly quantified
179 179 * Reasoning chain explains verdict
180 180
169 +
181 181 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
182 182
183 183 **Main Requirement:** Complete quality assurance with 7 quality gates
@@ -185,13 +185,11 @@
185 185 **POC Implementation:** **2 gates only**
186 186
187 187 **Quality Gate 1: Claim Validation**
188 -
189 189 * ✅ Validates claim is factual and verifiable
190 190 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
191 191 * ✅ Provides clear rejection reason
192 192
193 193 **Quality Gate 4: Verdict Confidence Assessment**
194 -
195 195 * ✅ Validates ≥2 sources found
196 196 * ✅ Validates quality score ≥0.6
197 197 * ✅ Blocks low-confidence verdicts
@@ -198,7 +198,6 @@
198 198 * ✅ Provides clear rejection reason
199 199
200 200 **Out of Scope (POC2+):**
201 -
202 202 * ❌ Gate 2: Evidence Relevance
203 203 * ❌ Gate 3: Scenario Coherence
204 204 * ❌ Gate 5: Source Diversity
@@ -211,13 +211,11 @@
211 211 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
212 212
213 213 **Main Requirements:**
214 -
215 215 * NFR1: Response time < 30 seconds
216 216 * NFR2: Handle 1000+ concurrent users
217 217 * NFR3: 99.9% uptime
218 218
219 219 **POC Implementation:**
220 -
221 221 * ⚠️ **Response time monitored** (not optimized)
222 222 * ⚠️ **Single-threaded processing** (no concurrency)
223 223 * ⚠️ **Basic error handling** (no advanced retry logic)
@@ -225,11 +225,11 @@
225 225 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
226 226
227 227 **POC Acceptance:**
228 -
229 229 * Analysis completes (no timeout requirement)
230 230 * Errors don't crash system
231 231 * Basic logging in place
232 232
216 +
233 233 == 4. What's NOT in POC Scope ==
234 234
235 235 === 4.1 User-Facing Features (Beta 0+) ===
@@ -239,7 +239,6 @@
239 239 {{/warning}}
240 240
241 241 **Out of Scope:**
242 -
243 243 * ❌ User accounts and authentication (FR8)
244 244 * ❌ User corrections system (FR9, FR45-46)
245 245 * ❌ Public publishing interface (FR10)
@@ -253,7 +253,6 @@
253 253 === 4.2 Advanced Features (V1.0+) ===
254 254
255 255 **Out of Scope:**
256 -
257 257 * ❌ IFCN compliance (FR47)
258 258 * ❌ ClaimReview schema (FR48)
259 259 * ❌ Archive.org integration (FR49)
@@ -268,7 +268,6 @@
268 268 === 4.3 Production Requirements (POC2, Beta 0) ===
269 269
270 270 **Out of Scope:**
271 -
272 272 * ❌ Security controls (NFR4, NFR12)
273 273 * ❌ Code maintainability (NFR5)
274 274 * ❌ System monitoring (NFR13)
@@ -285,26 +285,21 @@
285 285
286 286 For each analyzed claim, POC must produce:
287 287
288 -*
289 -**
290 -**1. Claim
269 +**1. Claim**
291 291 * Original text
292 292 * Classification (factual/non-factual/ambiguous)
293 293 * If non-factual: Clear reason why
294 294
295 295 **2. Scenarios** (if factual)
296 -
297 297 * 2-3 interpretation scenarios
298 298 * Each scenario clearly described
299 299
300 300 **3. Evidence** (if factual)
301 -
302 302 * Supporting evidence (2+ items)
303 303 * Opposing evidence (if exists)
304 304 * Source URLs and reliability scores
305 305
306 306 **4. Verdict** (if factual)
307 -
308 308 * Probability (0-100%)
309 309 * Uncertainty quantification
310 310 * Confidence level (LOW/MEDIUM/HIGH)
@@ -311,10 +311,10 @@
311 311 * Reasoning chain
312 312
313 313 **5. Quality Status**
314 -
315 315 * Which gates passed/failed
316 316 * If failed: Clear explanation why
317 317
293 +
318 318 === 5.2 Example POC Output ===
319 319
320 320 {{code language="json"}}
@@ -366,7 +366,6 @@
366 366 POC is successful if:
367 367
368 368 ✅ **FR1-FR7 Requirements Met:**
369 -
370 370 1. Extracts 3-5 factual claims from test articles
371 371 2. Generates 2-3 scenarios per ambiguous claim
372 372 3. Finds supporting AND opposing evidence
@@ -374,21 +374,19 @@
374 374 5. Provides clear reasoning chains
375 375
376 376 ✅ **Quality Gates Work:**
377 -
378 378 1. Gate 1 blocks non-factual claims (100% block rate)
379 379 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
380 380 3. Clear rejection reasons provided
381 381
382 382 ✅ **NFR11 Met:**
383 -
384 384 1. Quality gates reduce hallucination rate
385 385 2. Blocked outputs have clear explanations
386 386 3. Quality metrics are logged
387 387
361 +
388 388 === 6.2 Quality Thresholds ===
389 389
390 390 **Minimum Acceptable:**
391 -
392 392 * ≥70% of test claims correctly classified (factual/non-factual)
393 393 * ≥60% of verdicts are reasonable (human evaluation)
394 394 * Gate 1 blocks 100% of non-factual claims
@@ -395,17 +395,16 @@
395 395 * Gate 4 blocks verdicts with <2 sources
396 396
397 397 **Target:**
398 -
399 399 * ≥80% claims correctly classified
400 400 * ≥75% verdicts are reasonable
401 401 * <10% false positives (blocking good claims)
402 402
375 +
403 403 === 6.3 POC Decision Gate ===
404 404
405 405 **After POC1, we decide:**
406 406
407 407 **✅ PROCEED to POC2** if:
408 -
409 409 * Success criteria met
410 410 * Quality gates demonstrably improve output
411 411 * Core workflow is technically sound
@@ -412,72 +412,65 @@
412 412 * Clear path to production quality
413 413
414 414 **⚠️ ITERATE POC1** if:
415 -
416 416 * Success criteria partially met
417 417 * Gates work but need tuning
418 418 * Core issues identified but fixable
419 419
420 420 **❌ PIVOT APPROACH** if:
421 -
422 422 * Success criteria not met
423 423 * Fundamental AI limitations discovered
424 424 * Quality gates insufficient
425 425 * Alternative approach needed
426 426
397 +
427 427 == 7. Test Cases ==
428 428
429 429 === 7.1 Happy Path ===
430 430
431 431 **Test 1: Simple Factual Claim**
432 -
433 433 * Input: "Paris is the capital of France"
434 -* Expected: Factual, 1 scenario, verdict 95% true
404 +* Expected: Factual, 1 scenario, verdict ~95% true
435 435
436 436 **Test 2: Ambiguous Claim**
437 -
438 438 * Input: "Switzerland has the highest income in Europe"
439 439 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
440 440
441 441 **Test 3: Statistical Claim**
442 -
443 443 * Input: "10% of people have condition X"
444 444 * Expected: Factual, evidence with numbers, probabilistic verdict
445 445
414 +
446 446 === 7.2 Edge Cases ===
447 447
448 448 **Test 4: Opinion**
449 -
450 450 * Input: "Paris is the best city"
451 451 * Expected: Non-factual (opinion), blocked by Gate 1
452 452
453 453 **Test 5: Prediction**
454 -
455 455 * Input: "Bitcoin will reach $100,000 next year"
456 456 * Expected: Non-factual (prediction), blocked by Gate 1
457 457
458 458 **Test 6: Insufficient Evidence**
459 -
460 460 * Input: Obscure factual claim with no sources
461 461 * Expected: Blocked by Gate 4 (<2 sources)
462 462
429 +
463 463 === 7.3 Quality Gate Tests ===
464 464
465 465 **Test 7: Gate 1 Effectiveness**
466 -
467 467 * Input: Mix of 10 factual + 10 non-factual claims
468 468 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
469 469
470 470 **Test 8: Gate 4 Effectiveness**
471 -
472 472 * Input: Claims with varying evidence availability
473 473 * Expected: Gate 4 blocks low-confidence verdicts
474 474
440 +
475 475 == 8. Technical Architecture (POC) ==
476 476
477 477 === 8.1 Simplified Architecture ===
478 478
479 479 **POC Tech Stack:**
480 -
481 481 * **Frontend:** Simple web interface (Next.js + TypeScript)
482 482 * **Backend:** Single API endpoint
483 483 * **AI:** Claude API (Sonnet 4.5)
@@ -490,7 +490,6 @@
490 490 === 8.2 AKEL Implementation ===
491 491
492 492 **POC AKEL:**
493 -
494 494 * Single-threaded processing
495 495 * Synchronous API calls
496 496 * No caching
@@ -498,7 +498,6 @@
498 498 * Console logging
499 499
500 500 **Full AKEL (POC2+):**
501 -
502 502 * Multi-threaded processing
503 503 * Async API calls
504 504 * Evidence caching
@@ -505,6 +505,7 @@
505 505 * Advanced error handling with retry
506 506 * Structured logging + monitoring
507 507
471 +
508 508 == 9. POC Philosophy ==
509 509
510 510 {{info}}
@@ -513,55 +513,47 @@
513 513
514 514 === 9.1 Core Principles ===
515 515
516 -*
517 -**
518 -**1. Prove Concept, Not Production
480 +**1. Prove Concept, Not Production**
519 519 * POC validates AI can do the job
520 520 * Production quality comes in POC2 and Beta 0
521 521 * Focus on "does it work?" not "is it perfect?"
522 522
523 523 **2. Implement Subset of Requirements**
524 -
525 525 * POC covers FR1-7, NFR11 (lite)
526 526 * All other requirements deferred
527 527 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
528 528
529 529 **3. Quality Gates Validate Approach**
530 -
531 531 * 2 gates prove the concept
532 532 * Remaining 5 gates added in POC2
533 533 * Gates must demonstrably improve quality
534 534
535 535 **4. Iterate Based on Results**
536 -
537 537 * POC results determine next steps
538 538 * Decision gate after POC1
539 539 * Flexibility to pivot if needed
540 540
541 -=== 9.2 Success ===
542 542
543 - Clear Path Forward ===
501 +=== 9.2 Success = Clear Path Forward ===
544 544
545 545 POC succeeds if we can confidently answer:
546 546
547 547 ✅ **Technical Feasibility:**
548 -
549 549 * Can AI extract claims reliably?
550 550 * Can AI find balanced evidence?
551 551 * Can AI compute reasonable verdicts?
552 552
553 553 ✅ **Quality Approach:**
554 -
555 555 * Do quality gates improve output?
556 556 * Can we measure and track quality?
557 557 * Is the gate approach scalable?
558 558
559 559 ✅ **Production Path:**
560 -
561 561 * Is the core architecture sound?
562 562 * What needs improvement for production?
563 563 * Is POC2 the right next step?
564 564
520 +
565 565 == 10. Related Pages ==
566 566
567 567 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
@@ -570,10 +570,11 @@
570 570 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
571 571 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
572 572
529 +
573 573 **Document Owner:** Technical Team
574 574 **Review Frequency:** After each POC iteration
575 575 **Version History:**
576 -
577 577 * v1.0 - Initial POC requirements
578 578 * v2.0 - Updated after specification cross-check
579 579 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
536 +
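
The two quality gates described in the page content above (Gate 1 blocks non-factual claims; Gate 4 blocks verdicts with <2 sources or a quality score <0.6) could be sketched roughly as follows. This is only an illustrative sketch, not the AKEL implementation: the names `GateResult`, `gate1_claim_validation`, and `gate4_verdict_confidence` are hypothetical, and the only values taken from the page are the two thresholds and the requirement that a blocked output carries a clear rejection reason.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted from the Gate 4 description on this page.
MIN_SOURCES = 2
MIN_QUALITY_SCORE = 0.6


@dataclass
class GateResult:
    """Hypothetical result type: pass/fail plus a rejection reason when blocked."""
    passed: bool
    reason: Optional[str] = None


def gate1_claim_validation(classification: str) -> GateResult:
    """Gate 1 sketch: only claims classified as 'factual' pass.

    The classification labels (factual, opinion, prediction, ambiguous)
    follow the page; how a claim gets its label is out of scope here.
    """
    if classification == "factual":
        return GateResult(True)
    return GateResult(False, f"Claim classified as '{classification}', not a verifiable factual claim")


def gate4_verdict_confidence(source_count: int, quality_score: float) -> GateResult:
    """Gate 4 sketch: block if fewer than 2 sources or quality score below 0.6."""
    if source_count < MIN_SOURCES:
        return GateResult(False, f"Only {source_count} source(s) found; at least {MIN_SOURCES} required")
    if quality_score < MIN_QUALITY_SCORE:
        return GateResult(False, f"Quality score {quality_score:.2f} is below {MIN_QUALITY_SCORE}")
    return GateResult(True)
```

Under these assumptions, Test 4 ("Paris is the best city", classified as an opinion) would be blocked by `gate1_claim_validation("opinion")`, and Test 6 (no sources) would be blocked by `gate4_verdict_confidence(0, 0.0)`, each with a populated `reason`.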