Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/23 18:00
From version 2.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment:
Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -14,11 +14,9 @@
 === 1.1 What POC Tests ===
 
 **Core Question:**
-
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
 
 **What we're proving:**
-
 * AI can identify factual claims from text
 * AI can evaluate those claims with structured evidence
 * Quality gates can filter unreliable outputs
@@ -25,7 +25,6 @@
 * The core workflow is technically feasible
 
 **What we're NOT proving:**
-
 * Production-ready reliability (that's POC2)
 * User-facing features (that's Beta 0)
 * Full IFCN compliance (that's V1.0)
@@ -35,15 +35,15 @@
 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
 
 **Scope Summary:**
-
 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
 * **Partial:** 3 NFRs (simplified versions)
 * **Out of Scope:** 19 requirements (deferred to later phases)
 
+
 == 2. POC1 Scope ==
 
 {{success}}
-**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88 ex 2 new Org Pages.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
+**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
 
 The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
 {{/success}}
@@ -57,8 +57,9 @@
 
 **POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)
 
-{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
-
+{{info}}
+**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
+{{/info}}:
 * Claim extraction (FR1)
 * Claim context (FR2)
 * Multiple scenarios (FR3)
@@ -69,7 +69,6 @@
 * In-article highlighting (FR13) - deferred to Beta 0
 
 **Partial implementations:**
-
 * NFR1 (Explainability) - Basic only
 * NFR2 (Performance) - Functional but not optimized
 * NFR3 (Transparency) - Basic only
@@ -85,7 +85,6 @@
 **Main Requirement:** AI extracts factual claims from input text
 
 **POC Implementation:**
-
 * ✅ AKEL extracts claims using LLM
 * ✅ Each claim includes original text reference
 * ✅ Claims are identified as factual/non-factual
@@ -92,17 +92,16 @@
 * ❌ No advanced claim parsing (added in POC2)
 
 **Acceptance Criteria:**
-
 * Extracts 3-5 claims from typical article
 * Identifies factual vs non-factual claims
 * Quality Gate 1 validates extraction
 
+
 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
 
 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
 
 **POC Implementation:**
-
 * ✅ AKEL generates 2-3 scenarios per claim
 * ✅ Scenarios capture different interpretations
 * ✅ Each scenario is evaluated separately
@@ -109,17 +109,16 @@
 * ✅ Verdict considers all scenarios
 
 **Acceptance Criteria:**
-
 * Generates 2+ scenarios for ambiguous claims
 * Scenarios are meaningfully different
 * All scenarios are evaluated
 
+
 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
 
 **Main Requirement:** Provide user-friendly summary of analysis
 
 **POC Implementation:**
-
 * ✅ Simple text summary generated
 * ❌ No rich formatting (added in Beta 0)
 * ❌ No visual elements (added in Beta 0)
@@ -137,12 +137,10 @@
 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
 
 **Main Requirements:**
-
 * FR5: Collect supporting and opposing evidence
 * FR6: Evaluate evidence source reliability
 
 **POC Implementation:**
-
 * ✅ AKEL searches for evidence (web/knowledge base)
 * ✅ **Mandatory contradiction search** (finds opposing evidence)
 * ✅ Source reliability scoring
@@ -150,17 +150,16 @@
 * ❌ No advanced source verification (added in POC2)
 
 **Acceptance Criteria:**
-
 * Finds 2+ supporting evidence items
 * Finds 1+ opposing evidence (if exists)
 * Sources scored for reliability
 
+
 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
 
 **Main Requirement:** AI computes verdicts with uncertainty quantification
 
 **POC Implementation:**
-
 * ✅ Probabilistic verdicts (0-100% confidence)
 * ✅ Uncertainty explicitly stated
 * ✅ Reasoning chain provided
@@ -175,11 +175,11 @@
 ```
 
 **Acceptance Criteria:**
-
 * Verdicts include probability (0-100%)
 * Uncertainty explicitly quantified
 * Reasoning chain explains verdict
 
+
 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
 
 **Main Requirement:** Complete quality assurance with 7 quality gates
@@ -187,13 +187,11 @@
 **POC Implementation:** **2 gates only**
 
 **Quality Gate 1: Claim Validation**
-
 * ✅ Validates claim is factual and verifiable
 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
 * ✅ Provides clear rejection reason
 
 **Quality Gate 4: Verdict Confidence Assessment**
-
 * ✅ Validates ≥2 sources found
 * ✅ Validates quality score ≥0.6
 * ✅ Blocks low-confidence verdicts
@@ -200,7 +200,6 @@
 * ✅ Provides clear rejection reason
 
 **Out of Scope (POC2+):**
-
 * ❌ Gate 2: Evidence Relevance
 * ❌ Gate 3: Scenario Coherence
 * ❌ Gate 5: Source Diversity
@@ -213,13 +213,11 @@
 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
 
 **Main Requirements:**
-
 * NFR1: Response time < 30 seconds
 * NFR2: Handle 1000+ concurrent users
 * NFR3: 99.9% uptime
 
 **POC Implementation:**
-
 * ⚠️ **Response time monitored** (not optimized)
 * ⚠️ **Single-threaded processing** (no concurrency)
 * ⚠️ **Basic error handling** (no advanced retry logic)
@@ -227,11 +227,11 @@
 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
 
 **POC Acceptance:**
-
 * Analysis completes (no timeout requirement)
 * Errors don't crash system
 * Basic logging in place
 
+
 == 4. What's NOT in POC Scope ==
 
 === 4.1 User-Facing Features (Beta 0+) ===
@@ -241,7 +241,6 @@
 {{/warning}}
 
 **Out of Scope:**
-
 * ❌ User accounts and authentication (FR8)
 * ❌ User corrections system (FR9, FR45-46)
 * ❌ Public publishing interface (FR10)
@@ -255,7 +255,6 @@
 === 4.2 Advanced Features (V1.0+) ===
 
 **Out of Scope:**
-
 * ❌ IFCN compliance (FR47)
 * ❌ ClaimReview schema (FR48)
 * ❌ Archive.org integration (FR49)
@@ -270,7 +270,6 @@
 === 4.3 Production Requirements (POC2, Beta 0) ===
 
 **Out of Scope:**
-
 * ❌ Security controls (NFR4, NFR12)
 * ❌ Code maintainability (NFR5)
 * ❌ System monitoring (NFR13)
@@ -287,26 +287,21 @@
 
 For each analyzed claim, POC must produce:
 
-*
-**
-**1. Claim
+**1. Claim**
 * Original text
 * Classification (factual/non-factual/ambiguous)
 * If non-factual: Clear reason why
 
 **2. Scenarios** (if factual)
-
 * 2-3 interpretation scenarios
 * Each scenario clearly described
 
 **3. Evidence** (if factual)
-
 * Supporting evidence (2+ items)
 * Opposing evidence (if exists)
 * Source URLs and reliability scores
 
 **4. Verdict** (if factual)
-
 * Probability (0-100%)
 * Uncertainty quantification
 * Confidence level (LOW/MEDIUM/HIGH)
@@ -313,10 +313,10 @@
 * Reasoning chain
 
 **5. Quality Status**
-
 * Which gates passed/failed
 * If failed: Clear explanation why
 
+
 === 5.2 Example POC Output ===
 
 {{code language="json"}}
@@ -368,7 +368,6 @@
 POC is successful if:
 
 ✅ **FR1-FR7 Requirements Met:**
-
 1. Extracts 3-5 factual claims from test articles
 2. Generates 2-3 scenarios per ambiguous claim
 3. Finds supporting AND opposing evidence
@@ -376,21 +376,19 @@
 5. Provides clear reasoning chains
 
 ✅ **Quality Gates Work:**
-
 1. Gate 1 blocks non-factual claims (100% block rate)
 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
 3. Clear rejection reasons provided
 
 ✅ **NFR11 Met:**
-
 1. Quality gates reduce hallucination rate
 2. Blocked outputs have clear explanations
 3. Quality metrics are logged
 
+
 === 6.2 Quality Thresholds ===
 
 **Minimum Acceptable:**
-
 * ≥70% of test claims correctly classified (factual/non-factual)
 * ≥60% of verdicts are reasonable (human evaluation)
 * Gate 1 blocks 100% of non-factual claims
@@ -397,17 +397,16 @@
 * Gate 4 blocks verdicts with <2 sources
 
 **Target:**
-
 * ≥80% claims correctly classified
 * ≥75% verdicts are reasonable
 * <10% false positives (blocking good claims)
 
+
 === 6.3 POC Decision Gate ===
 
 **After POC1, we decide:**
 
 **✅ PROCEED to POC2** if:
-
 * Success criteria met
 * Quality gates demonstrably improve output
 * Core workflow is technically sound
@@ -414,72 +414,65 @@
 * Clear path to production quality
 
 **⚠️ ITERATE POC1** if:
-
 * Success criteria partially met
 * Gates work but need tuning
 * Core issues identified but fixable
 
 **❌ PIVOT APPROACH** if:
-
 * Success criteria not met
 * Fundamental AI limitations discovered
 * Quality gates insufficient
 * Alternative approach needed
 
+
 == 7. Test Cases ==
 
 === 7.1 Happy Path ===
 
 **Test 1: Simple Factual Claim**
-
 * Input: "Paris is the capital of France"
-* Expected: Factual, 1 scenario, verdict 95% true
+* Expected: Factual, 1 scenario, verdict ~95% true
 
 **Test 2: Ambiguous Claim**
-
 * Input: "Switzerland has the highest income in Europe"
 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
 
 **Test 3: Statistical Claim**
-
 * Input: "10% of people have condition X"
 * Expected: Factual, evidence with numbers, probabilistic verdict
 
+
 === 7.2 Edge Cases ===
 
 **Test 4: Opinion**
-
 * Input: "Paris is the best city"
 * Expected: Non-factual (opinion), blocked by Gate 1
 
 **Test 5: Prediction**
-
 * Input: "Bitcoin will reach $100,000 next year"
 * Expected: Non-factual (prediction), blocked by Gate 1
 
 **Test 6: Insufficient Evidence**
-
 * Input: Obscure factual claim with no sources
 * Expected: Blocked by Gate 4 (<2 sources)
 
+
 === 7.3 Quality Gate Tests ===
 
 **Test 7: Gate 1 Effectiveness**
-
 * Input: Mix of 10 factual + 10 non-factual claims
 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
 
 **Test 8: Gate 4 Effectiveness**
-
 * Input: Claims with varying evidence availability
 * Expected: Gate 4 blocks low-confidence verdicts
 
+
 == 8. Technical Architecture (POC) ==
 
 === 8.1 Simplified Architecture ===
 
 **POC Tech Stack:**
-
 * **Frontend:** Simple web interface (Next.js + TypeScript)
 * **Backend:** Single API endpoint
 * **AI:** Claude API (Sonnet 4.5)
@@ -492,7 +492,6 @@
 === 8.2 AKEL Implementation ===
 
 **POC AKEL:**
-
 * Single-threaded processing
 * Synchronous API calls
 * No caching
@@ -500,7 +500,6 @@
 * Console logging
 
 **Full AKEL (POC2+):**
-
 * Multi-threaded processing
 * Async API calls
 * Evidence caching
@@ -507,6 +507,7 @@
 * Advanced error handling with retry
 * Structured logging + monitoring
 
+
 == 9. POC Philosophy ==
 
 {{info}}
@@ -515,55 +515,47 @@
 
 === 9.1 Core Principles ===
 
-*
-**
-**1. Prove Concept, Not Production
+**1. Prove Concept, Not Production**
 * POC validates AI can do the job
 * Production quality comes in POC2 and Beta 0
 * Focus on "does it work?" not "is it perfect?"
 
 **2. Implement Subset of Requirements**
-
 * POC covers FR1-7, NFR11 (lite)
 * All other requirements deferred
 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
 
 **3. Quality Gates Validate Approach**
-
 * 2 gates prove the concept
 * Remaining 5 gates added in POC2
 * Gates must demonstrably improve quality
 
 **4. Iterate Based on Results**
-
 * POC results determine next steps
 * Decision gate after POC1
 * Flexibility to pivot if needed
 
-=== 9.2 Success ===
 
- Clear Path Forward ===
+=== 9.2 Success = Clear Path Forward ===
 
 POC succeeds if we can confidently answer:
 
 ✅ **Technical Feasibility:**
-
 * Can AI extract claims reliably?
 * Can AI find balanced evidence?
 * Can AI compute reasonable verdicts?
 
 ✅ **Quality Approach:**
-
 * Do quality gates improve output?
 * Can we measure and track quality?
 * Is the gate approach scalable?
 
 ✅ **Production Path:**
-
 * Is the core architecture sound?
 * What needs improvement for production?
 * Is POC2 the right next step?
 
+
 == 10. Related Pages ==
 
 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
@@ -572,10 +572,11 @@
 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
 
+
 **Document Owner:** Technical Team
 **Review Frequency:** After each POC iteration
 **Version History:**
-
 * v1.0 - Initial POC requirements
 * v2.0 - Updated after specification cross-check
 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
+
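
Reviewer note on section 3.6: the two POC1 gates reduce to simple checks, so a small sketch may help when reading the acceptance criteria. The Python below is only an illustration under stated assumptions. The names `GateResult`, `gate1_claim_validation` and `gate4_verdict_confidence` are hypothetical and do not come from the page; the only values taken from it are the thresholds (at least 2 sources, quality score of at least 0.6) and the rule that each blocked output carries a clear rejection reason.

```python
from dataclasses import dataclass

# Thresholds quoted from sections 3.6 and 6.1 of the page.
MIN_SOURCES = 2
MIN_QUALITY_SCORE = 0.6


@dataclass(frozen=True)
class GateResult:
    """Hypothetical result record: which gate ran, whether it passed, and why."""
    gate: str
    passed: bool
    reason: str


def gate1_claim_validation(classification: str) -> GateResult:
    """Gate 1 sketch: only claims classified as 'factual' pass.

    Opinions, predictions and ambiguous claims are blocked with a reason.
    """
    if classification == "factual":
        return GateResult("gate1", True, "claim is factual and verifiable")
    return GateResult("gate1", False, f"claim classified as {classification}, not factual")


def gate4_verdict_confidence(source_count: int, quality_score: float) -> GateResult:
    """Gate 4 sketch: block if fewer than 2 sources or quality score below 0.6."""
    if source_count < MIN_SOURCES:
        return GateResult(
            "gate4", False,
            f"found {source_count} source(s), need at least {MIN_SOURCES}",
        )
    if quality_score < MIN_QUALITY_SCORE:
        return GateResult(
            "gate4", False,
            f"quality score {quality_score:.2f} is below {MIN_QUALITY_SCORE}",
        )
    return GateResult("gate4", True, "source count and quality score meet thresholds")


if __name__ == "__main__":
    # Inputs loosely follow Test 4 (opinion) and Test 6 (insufficient evidence).
    print(gate1_claim_validation("opinion"))
    print(gate4_verdict_confidence(source_count=1, quality_score=0.9))
```

Returning a result record instead of a bare boolean keeps the rejection reason next to the decision, which is what the "Quality Status" output in section 5.1 asks for.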