Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/23 16:49
To version 1.2
edited by Robert Schaub
on 2025/12/23 16:49
Change comment:
Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ...
@@ -14,9 +14,11 @@
14 14 === 1.1 What POC Tests ===
15 15
16 16 **Core Question:**
17 +
17 17 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18
19 19 **What we're proving:**
21 +
20 20 * AI can identify factual claims from text
21 21 * AI can evaluate those claims with structured evidence
22 22 * Quality gates can filter unreliable outputs
... ...
@@ -23,6 +23,7 @@
23 23 * The core workflow is technically feasible
24 24
25 25 **What we're NOT proving:**
28 +
26 26 * Production-ready reliability (that's POC2)
27 27 * User-facing features (that's Beta 0)
28 28 * Full IFCN compliance (that's V1.0)
... ...
@@ -32,15 +32,15 @@
32 32 POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33
34 34 **Scope Summary:**
38 +
35 35 * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36 * **Partial:** 3 NFRs (simplified versions)
37 37 * **Out of Scope:** 19 requirements (deferred to later phases)
38 38
39 -
40 40 == 2. POC1 Scope ==
41 41
42 42 {{success}}
43 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
46 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.84.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
44 44
45 45 The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
46 46 {{/success}}
... ...
@@ -53,6 +53,7 @@
53 53 | **NFR11** | Quality Assurance Framework | 4 quality gates implemented
54 54
55 55 **POC1 also implements these workflow components** (detailed as FR1-FR6, FR11, FR13 in implementation sections below):
59 +
56 56 * Claim extraction (FR1)
57 57 * Claim context (FR2)
58 58 * Multiple scenarios (FR3)
... ...
@@ -63,6 +63,7 @@
63 63 * In-article highlighting (FR13) - deferred to Beta 0
64 64
65 65 **Partial implementations:**
70 +
66 66 * NFR1 (Explainability) - Basic only
67 67 * NFR2 (Performance) - Functional but not optimized
68 68 * NFR3 (Transparency) - Basic only
... ...
@@ -78,6 +78,7 @@
78 78 **Main Requirement:** AI extracts factual claims from input text
79 79
80 80 **POC Implementation:**
86 +
81 81 * ✅ AKEL extracts claims using LLM
82 82 * ✅ Each claim includes original text reference
83 83 * ✅ Claims are identified as factual/non-factual
... ...
@@ -84,16 +84,17 @@
84 84 * ❌ No advanced claim parsing (added in POC2)
85 85
86 86 **Acceptance Criteria:**
93 +
87 87 * Extracts 3-5 claims from typical article
88 88 * Identifies factual vs non-factual claims
89 89 * Quality Gate 1 validates extraction
90 90
91 -
92 92 === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
93 93
94 94 **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
95 95
96 96 **POC Implementation:**
103 +
97 97 * ✅ AKEL generates 2-3 scenarios per claim
98 98 * ✅ Scenarios capture different interpretations
99 99 * ✅ Each scenario is evaluated separately
... ...
@@ -100,16 +100,17 @@
100 100 * ✅ Verdict considers all scenarios
101 101
102 102 **Acceptance Criteria:**
110 +
103 103 * Generates 2+ scenarios for ambiguous claims
104 104 * Scenarios are meaningfully different
105 105 * All scenarios are evaluated
106 106
107 -
108 108 === 3.3 FR4: Analysis Summary (Basic Implementation) ===
109 109
110 110 **Main Requirement:** Provide user-friendly summary of analysis
111 111
112 112 **POC Implementation:**
120 +
113 113 * ✅ Simple text summary generated
114 114 * ❌ No rich formatting (added in Beta 0)
115 115 * ❌ No visual elements (added in Beta 0)
... ...
@@ -127,10 +127,12 @@
127 127 === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
128 128
129 129 **Main Requirements:**
138 +
130 130 * FR5: Collect supporting and opposing evidence
131 131 * FR6: Evaluate evidence source reliability
132 132
133 133 **POC Implementation:**
143 +
134 134 * ✅ AKEL searches for evidence (web/knowledge base)
135 135 * ✅ **Mandatory contradiction search** (finds opposing evidence)
136 136 * ✅ Source reliability scoring
... ...
@@ -138,16 +138,17 @@
138 138 * ❌ No advanced source verification (added in POC2)
139 139
140 140 **Acceptance Criteria:**
151 +
141 141 * Finds 2+ supporting evidence items
142 142 * Finds 1+ opposing evidence (if exists)
143 143 * Sources scored for reliability
144 144
145 -
146 146 === 3.5 FR7: Automated Verdicts (Full Implementation) ===
147 147
148 148 **Main Requirement:** AI computes verdicts with uncertainty quantification
149 149
150 150 **POC Implementation:**
161 +
151 151 * ✅ Probabilistic verdicts (0-100% confidence)
152 152 * ✅ Uncertainty explicitly stated
153 153 * ✅ Reasoning chain provided
... ...
@@ -162,11 +162,11 @@
162 162 ```
163 163
164 164 **Acceptance Criteria:**
176 +
165 165 * Verdicts include probability (0-100%)
166 166 * Uncertainty explicitly quantified
167 167 * Reasoning chain explains verdict
168 168
169 -
170 170 === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
171 171
172 172 **Main Requirement:** Complete quality assurance with 7 quality gates
... ...
@@ -174,11 +174,13 @@
174 174 **POC Implementation:** **2 gates only**
175 175
176 176 **Quality Gate 1: Claim Validation**
188 +
177 177 * ✅ Validates claim is factual and verifiable
178 178 * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
179 179 * ✅ Provides clear rejection reason
180 180
181 181 **Quality Gate 4: Verdict Confidence Assessment**
194 +
182 182 * ✅ Validates ≥2 sources found
183 183 * ✅ Validates quality score ≥0.6
184 184 * ✅ Blocks low-confidence verdicts
... ...
@@ -185,6 +185,7 @@
185 185 * ✅ Provides clear rejection reason
186 186
187 187 **Out of Scope (POC2+):**
201 +
188 188 * ❌ Gate 2: Evidence Relevance
189 189 * ❌ Gate 3: Scenario Coherence
190 190 * ❌ Gate 5: Source Diversity
... ...
@@ -197,11 +197,13 @@
197 197 === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
198 198
199 199 **Main Requirements:**
214 +
200 200 * NFR1: Response time < 30 seconds
201 201 * NFR2: Handle 1000+ concurrent users
202 202 * NFR3: 99.9% uptime
203 203
204 204 **POC Implementation:**
220 +
205 205 * ⚠️ **Response time monitored** (not optimized)
206 206 * ⚠️ **Single-threaded processing** (no concurrency)
207 207 * ⚠️ **Basic error handling** (no advanced retry logic)
... ...
@@ -209,11 +209,11 @@
209 209 **Rationale:** POC proves functionality. Performance optimization happens in POC2.
210 210
211 211 **POC Acceptance:**
228 +
212 212 * Analysis completes (no timeout requirement)
213 213 * Errors don't crash system
214 214 * Basic logging in place
215 215
216 -
217 217 == 4. What's NOT in POC Scope ==
218 218
219 219 === 4.1 User-Facing Features (Beta 0+) ===
... ...
@@ -223,6 +223,7 @@
223 223 {{/warning}}
224 224
225 225 **Out of Scope:**
242 +
226 226 * ❌ User accounts and authentication (FR8)
227 227 * ❌ User corrections system (FR9, FR45-46)
228 228 * ❌ Public publishing interface (FR10)
... ...
@@ -236,6 +236,7 @@
236 236 === 4.2 Advanced Features (V1.0+) ===
237 237
238 238 **Out of Scope:**
256 +
239 239 * ❌ IFCN compliance (FR47)
240 240 * ❌ ClaimReview schema (FR48)
241 241 * ❌ Archive.org integration (FR49)
... ...
@@ -250,6 +250,7 @@
250 250 === 4.3 Production Requirements (POC2, Beta 0) ===
251 251
252 252 **Out of Scope:**
271 +
253 253 * ❌ Security controls (NFR4, NFR12)
254 254 * ❌ Code maintainability (NFR5)
255 255 * ❌ System monitoring (NFR13)
... ...
@@ -266,21 +266,26 @@
266 266
267 267 For each analyzed claim, POC must produce:
268 268
269 -**1. Claim**
288 +*
289 +**
290 +**1. Claim
270 270 * Original text
271 271 * Classification (factual/non-factual/ambiguous)
272 272 * If non-factual: Clear reason why
273 273
274 274 **2. Scenarios** (if factual)
296 +
275 275 * 2-3 interpretation scenarios
276 276 * Each scenario clearly described
277 277
278 278 **3. Evidence** (if factual)
301 +
279 279 * Supporting evidence (2+ items)
280 280 * Opposing evidence (if exists)
281 281 * Source URLs and reliability scores
282 282
283 283 **4. Verdict** (if factual)
307 +
284 284 * Probability (0-100%)
285 285 * Uncertainty quantification
286 286 * Confidence level (LOW/MEDIUM/HIGH)
... ...
@@ -287,10 +287,10 @@
287 287 * Reasoning chain
288 288
289 289 **5. Quality Status**
314 +
290 290 * Which gates passed/failed
291 291 * If failed: Clear explanation why
292 292
293 -
294 294 === 5.2 Example POC Output ===
295 295
296 296 {{code language="json"}}
... ...
@@ -342,6 +342,7 @@
342 342 POC is successful if:
343 343
344 344 ✅ **FR1-FR7 Requirements Met:**
369 +
345 345 1. Extracts 3-5 factual claims from test articles
346 346 2. Generates 2-3 scenarios per ambiguous claim
347 347 3. Finds supporting AND opposing evidence
... ...
@@ -349,19 +349,21 @@
349 349 5. Provides clear reasoning chains
350 350
351 351 ✅ **Quality Gates Work:**
377 +
352 352 1. Gate 1 blocks non-factual claims (100% block rate)
353 353 2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
354 354 3. Clear rejection reasons provided
355 355
356 356 ✅ **NFR11 Met:**
383 +
357 357 1. Quality gates reduce hallucination rate
358 358 2. Blocked outputs have clear explanations
359 359 3. Quality metrics are logged
360 360
361 -
362 362 === 6.2 Quality Thresholds ===
363 363
364 364 **Minimum Acceptable:**
391 +
365 365 * ≥70% of test claims correctly classified (factual/non-factual)
366 366 * ≥60% of verdicts are reasonable (human evaluation)
367 367 * Gate 1 blocks 100% of non-factual claims
... ...
@@ -368,16 +368,17 @@
368 368 * Gate 4 blocks verdicts with <2 sources
369 369
370 370 **Target:**
398 +
371 371 * ≥80% claims correctly classified
372 372 * ≥75% verdicts are reasonable
373 373 * <10% false positives (blocking good claims)
374 374
375 -
376 376 === 6.3 POC Decision Gate ===
377 377
378 378 **After POC1, we decide:**
379 379
380 380 **✅ PROCEED to POC2** if:
408 +
381 381 * Success criteria met
382 382 * Quality gates demonstrably improve output
383 383 * Core workflow is technically sound
... ...
@@ -384,65 +384,72 @@
384 384 * Clear path to production quality
385 385
386 386 **⚠️ ITERATE POC1** if:
415 +
387 387 * Success criteria partially met
388 388 * Gates work but need tuning
389 389 * Core issues identified but fixable
390 390
391 391 **❌ PIVOT APPROACH** if:
421 +
392 392 * Success criteria not met
393 393 * Fundamental AI limitations discovered
394 394 * Quality gates insufficient
395 395 * Alternative approach needed
396 396
397 -
398 398 == 7. Test Cases ==
399 399
400 400 === 7.1 Happy Path ===
401 401
402 402 **Test 1: Simple Factual Claim**
432 +
403 403 * Input: "Paris is the capital of France"
404 -* Expected: Factual, 1 scenario, verdict ~95% true
434 +* Expected: Factual, 1 scenario, verdict 95% true
405 405
406 406 **Test 2: Ambiguous Claim**
437 +
407 407 * Input: "Switzerland has the highest income in Europe"
408 408 * Expected: Factual, 2-3 scenarios, verdict with uncertainty
409 409
410 410 **Test 3: Statistical Claim**
442 +
411 411 * Input: "10% of people have condition X"
412 412 * Expected: Factual, evidence with numbers, probabilistic verdict
413 413
414 -
415 415 === 7.2 Edge Cases ===
416 416
417 417 **Test 4: Opinion**
449 +
418 418 * Input: "Paris is the best city"
419 419 * Expected: Non-factual (opinion), blocked by Gate 1
420 420
421 421 **Test 5: Prediction**
454 +
422 422 * Input: "Bitcoin will reach $100,000 next year"
423 423 * Expected: Non-factual (prediction), blocked by Gate 1
424 424
425 425 **Test 6: Insufficient Evidence**
459 +
426 426 * Input: Obscure factual claim with no sources
427 427 * Expected: Blocked by Gate 4 (<2 sources)
428 428
429 -
430 430 === 7.3 Quality Gate Tests ===
431 431
432 432 **Test 7: Gate 1 Effectiveness**
466 +
433 433 * Input: Mix of 10 factual + 10 non-factual claims
434 434 * Expected: Gate 1 blocks all 10 non-factual (100% precision)
435 435
436 436 **Test 8: Gate 4 Effectiveness**
471 +
437 437 * Input: Claims with varying evidence availability
438 438 * Expected: Gate 4 blocks low-confidence verdicts
439 439
440 -
441 441 == 8. Technical Architecture (POC) ==
442 442
443 443 === 8.1 Simplified Architecture ===
444 444
445 445 **POC Tech Stack:**
480 +
446 446 * **Frontend:** Simple web interface (Next.js + TypeScript)
447 447 * **Backend:** Single API endpoint
448 448 * **AI:** Claude API (Sonnet 4.5)
... ...
@@ -455,6 +455,7 @@
455 455 === 8.2 AKEL Implementation ===
456 456
457 457 **POC AKEL:**
493 +
458 458 * Single-threaded processing
459 459 * Synchronous API calls
460 460 * No caching
... ...
@@ -462,6 +462,7 @@
462 462 * Console logging
463 463
464 464 **Full AKEL (POC2+):**
501 +
465 465 * Multi-threaded processing
466 466 * Async API calls
467 467 * Evidence caching
... ...
@@ -468,7 +468,6 @@
468 468 * Advanced error handling with retry
469 469 * Structured logging + monitoring
470 470
471 -
472 472 == 9. POC Philosophy ==
473 473
474 474 {{info}}
... ...
@@ -477,47 +477,55 @@
477 477
478 478 === 9.1 Core Principles ===
479 479
480 -**1. Prove Concept, Not Production**
516 +*
517 +**
518 +**1. Prove Concept, Not Production
481 481 * POC validates AI can do the job
482 482 * Production quality comes in POC2 and Beta 0
483 483 * Focus on "does it work?" not "is it perfect?"
484 484
485 485 **2. Implement Subset of Requirements**
524 +
486 486 * POC covers FR1-7, NFR11 (lite)
487 487 * All other requirements deferred
488 488 * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
489 489
490 490 **3. Quality Gates Validate Approach**
530 +
491 491 * 2 gates prove the concept
492 492 * Remaining 5 gates added in POC2
493 493 * Gates must demonstrably improve quality
494 494
495 495 **4. Iterate Based on Results**
536 +
496 496 * POC results determine next steps
497 497 * Decision gate after POC1
498 498 * Flexibility to pivot if needed
499 499
541 +=== 9.2 Success ===
500 500
501 -=== 9.2 Success = Clear Path Forward ===
543 + Clear Path Forward ===
502 502
503 503 POC succeeds if we can confidently answer:
504 504
505 505 ✅ **Technical Feasibility:**
548 +
506 506 * Can AI extract claims reliably?
507 507 * Can AI find balanced evidence?
508 508 * Can AI compute reasonable verdicts?
509 509
510 510 ✅ **Quality Approach:**
554 +
511 511 * Do quality gates improve output?
512 512 * Can we measure and track quality?
513 513 * Is the gate approach scalable?
514 514
515 515 ✅ **Production Path:**
560 +
516 516 * Is the core architecture sound?
517 517 * What needs improvement for production?
518 518 * Is POC2 the right next step?
519 519
520 -
521 521 == 10. Related Pages ==
522 522
523 523 * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ...
@@ -526,11 +526,10 @@
526 526 * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
527 527 * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
528 528
529 -
530 530 **Document Owner:** Technical Team
531 531 **Review Frequency:** After each POC iteration
532 532 **Version History:**
576 +
533 533 * v1.0 - Initial POC requirements
534 534 * v2.0 - Updated after specification cross-check
535 535 * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
536 -
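
The two POC quality gates described in section 3.6 (Gate 1 blocks non-factual claims; Gate 4 blocks verdicts with fewer than 2 sources or a quality score below 0.6) could be sketched as follows in TypeScript, the language named in the POC tech stack. This is a minimal illustration under stated assumptions, not the AKEL implementation: the type names, the field names, and the functions `gate1ClaimValidation` and `gate4VerdictConfidence` are all hypothetical. Only the thresholds and the blocked categories come from the page.

```typescript
// Hypothetical classification labels; section 3.6 lists opinion, prediction,
// and ambiguous as the non-factual categories that Gate 1 blocks.
type Classification = "factual" | "opinion" | "prediction" | "ambiguous";

// Hypothetical result shape: every blocked output carries a rejection reason.
interface GateResult {
  gate: "gate1" | "gate4";
  passed: boolean;
  reason?: string; // set only when the gate blocks the output
}

// Thresholds stated in section 3.6 for Quality Gate 4.
const MIN_SOURCES = 2;
const MIN_QUALITY_SCORE = 0.6;

// Gate 1: only factual claims pass; anything else is blocked with a reason.
function gate1ClaimValidation(classification: Classification): GateResult {
  if (classification === "factual") {
    return { gate: "gate1", passed: true };
  }
  return {
    gate: "gate1",
    passed: false,
    reason: `Claim is non-factual (${classification})`,
  };
}

// Gate 4: block when fewer than 2 sources were found or quality is below 0.6.
function gate4VerdictConfidence(sourceCount: number, qualityScore: number): GateResult {
  if (sourceCount < MIN_SOURCES) {
    return {
      gate: "gate4",
      passed: false,
      reason: `Found ${sourceCount} source(s); at least ${MIN_SOURCES} required`,
    };
  }
  if (qualityScore < MIN_QUALITY_SCORE) {
    return {
      gate: "gate4",
      passed: false,
      reason: `Quality score ${qualityScore} is below ${MIN_QUALITY_SCORE}`,
    };
  }
  return { gate: "gate4", passed: true };
}
```

Under these assumptions, `gate1ClaimValidation("opinion")` corresponds to Test 4 and `gate4VerdictConfidence(0, 0.9)` to Test 6. Both return `passed: false` with a reason string, matching the "clear rejection reason" requirement.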