Last modified by Robert Schaub on 2025/12/23 15:53

From version 1.2
edited by Robert Schaub
on 2025/12/23 15:53
Change comment: Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
To version 1.1
edited by Robert Schaub
on 2025/12/23 15:32
Change comment: Imported from XAR

... ... @@ -14,11 +14,9 @@
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 -
18 18  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
19 19  
20 20  **What we're proving:**
21 -
22 22  * AI can identify factual claims from text
23 23  * AI can evaluate those claims with structured evidence
24 24  * Quality gates can filter unreliable outputs
... ... @@ -25,7 +25,6 @@
25 25  * The core workflow is technically feasible
26 26  
27 27  **What we're NOT proving:**
28 -
29 29  * Production-ready reliability (that's POC2)
30 30  * User-facing features (that's Beta 0)
31 31  * Full IFCN compliance (that's V1.0)
... ... @@ -35,15 +35,15 @@
35 35  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
36 36  
37 37  **Scope Summary:**
38 -
39 39  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
40 40  * **Partial:** 3 NFRs (simplified versions)
41 41  * **Out of Scope:** 19 requirements (deferred to later phases)
42 42  
39 +
43 43  == 2. POC1 Scope ==
44 44  
45 45  {{success}}
46 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.82.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
43 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
47 47  
48 48  The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
49 49  {{/success}}
... ... @@ -56,7 +56,6 @@
56 56  | **NFR11** | Quality Assurance Framework | 2 quality gates implemented (Gates 1 and 4)
57 57  
58 58  **POC1 also implements these workflow components** (detailed as FR1-FR6, FR8, FR11, FR13 in implementation sections below):
59 -
60 60  * Claim extraction (FR1)
61 61  * Claim context (FR2)
62 62  * Multiple scenarios (FR3)
... ... @@ -67,7 +67,6 @@
67 67  * In-article highlighting (FR13) - deferred to Beta 0
68 68  
69 69  **Partial implementations:**
70 -
71 71  * NFR1 (Explainability) - Basic only
72 72  * NFR2 (Performance) - Functional but not optimized
73 73  * NFR3 (Transparency) - Basic only
... ... @@ -83,7 +83,6 @@
83 83  **Main Requirement:** AI extracts factual claims from input text
84 84  
85 85  **POC Implementation:**
86 -
87 87  * ✅ AKEL extracts claims using LLM
88 88  * ✅ Each claim includes original text reference
89 89  * ✅ Claims are identified as factual/non-factual
... ... @@ -90,17 +90,16 @@
90 90  * ❌ No advanced claim parsing (added in POC2)
91 91  
92 92  **Acceptance Criteria:**
93 -
94 94  * Extracts 3-5 claims from typical article
95 95  * Identifies factual vs non-factual claims
96 96  * Quality Gate 1 validates extraction
97 97  
91 +
98 98  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
99 99  
100 100  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
101 101  
102 102  **POC Implementation:**
103 -
104 104  * ✅ AKEL generates 2-3 scenarios per claim
105 105  * ✅ Scenarios capture different interpretations
106 106  * ✅ Each scenario is evaluated separately
... ... @@ -107,17 +107,16 @@
107 107  * ✅ Verdict considers all scenarios
108 108  
109 109  **Acceptance Criteria:**
110 -
111 111  * Generates 2+ scenarios for ambiguous claims
112 112  * Scenarios are meaningfully different
113 113  * All scenarios are evaluated
114 114  
107 +
115 115  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
116 116  
117 117  **Main Requirement:** Provide user-friendly summary of analysis
118 118  
119 119  **POC Implementation:**
120 -
121 121  * ✅ Simple text summary generated
122 122  * ❌ No rich formatting (added in Beta 0)
123 123  * ❌ No visual elements (added in Beta 0)
... ... @@ -135,12 +135,10 @@
135 135  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
136 136  
137 137  **Main Requirements:**
138 -
139 139  * FR5: Collect supporting and opposing evidence
140 140  * FR6: Evaluate evidence source reliability
141 141  
142 142  **POC Implementation:**
143 -
144 144  * ✅ AKEL searches for evidence (web/knowledge base)
145 145  * ✅ **Mandatory contradiction search** (finds opposing evidence)
146 146  * ✅ Source reliability scoring
... ... @@ -148,17 +148,16 @@
148 148  * ❌ No advanced source verification (added in POC2)
149 149  
150 150  **Acceptance Criteria:**
151 -
152 152  * Finds 2+ supporting evidence items
153 153  * Finds 1+ opposing evidence (if exists)
154 154  * Sources scored for reliability
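The FR5-FR6 acceptance criteria can be checked mechanically for each analyzed claim. The TypeScript sketch below assumes a hypothetical `EvidenceItem` shape (`url`, `stance`, `reliability`) and a test-fixture flag `opposingExists`, because "if exists" can only be judged when the test set records whether opposing evidence is available. None of these names come from the spec.

```typescript
// Hypothetical evidence shape; field names are illustrative, not from the spec.
interface EvidenceItem {
  url: string;
  stance: "supporting" | "opposing";
  reliability?: number; // source reliability score, assumed to be in [0, 1]
}

interface CriteriaResult {
  passed: boolean;
  failures: string[];
}

// Checks the FR5-FR6 acceptance criteria for a single claim.
// opposingExists comes from the test fixture: 1+ opposing item is only
// required when opposing evidence is known to exist for the claim.
function checkEvidenceCriteria(items: EvidenceItem[], opposingExists: boolean): CriteriaResult {
  const failures: string[] = [];
  const supporting = items.filter((e) => e.stance === "supporting").length;
  const opposing = items.filter((e) => e.stance === "opposing").length;
  const unscored = items.filter((e) => e.reliability === undefined).length;

  if (supporting < 2) {
    failures.push(`expected 2+ supporting items, found ${supporting}`);
  }
  if (opposingExists && opposing < 1) {
    failures.push("expected 1+ opposing item, found none");
  }
  if (unscored > 0) {
    failures.push(`${unscored} source(s) have no reliability score`);
  }
  return { passed: failures.length === 0, failures };
}
```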
155 155  
145 +
156 156  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
157 157  
158 158  **Main Requirement:** AI computes verdicts with uncertainty quantification
159 159  
160 160  **POC Implementation:**
161 -
162 162  * ✅ Probabilistic verdicts (0-100% confidence)
163 163  * ✅ Uncertainty explicitly stated
164 164  * ✅ Reasoning chain provided
... ... @@ -173,11 +173,11 @@
173 173  ```
174 174  
175 175  **Acceptance Criteria:**
176 -
177 177  * Verdicts include probability (0-100%)
178 178  * Uncertainty explicitly quantified
179 179  * Reasoning chain explains verdict
180 180  
169 +
181 181  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
182 182  
183 183  **Main Requirement:** Complete quality assurance with 7 quality gates
... ... @@ -185,13 +185,11 @@
185 185  **POC Implementation:** **2 gates only**
186 186  
187 187  **Quality Gate 1: Claim Validation**
188 -
189 189  * ✅ Validates claim is factual and verifiable
190 190  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
191 191  * ✅ Provides clear rejection reason
192 192  
193 193  **Quality Gate 4: Verdict Confidence Assessment**
194 -
195 195  * ✅ Validates ≥2 sources found
196 196  * ✅ Validates quality score ≥0.6
197 197  * ✅ Blocks low-confidence verdicts
... ... @@ -198,7 +198,6 @@
198 198  * ✅ Provides clear rejection reason
199 199  
200 200  **Out of Scope (POC2+):**
201 -
202 202  * ❌ Gate 2: Evidence Relevance
203 203  * ❌ Gate 3: Scenario Coherence
204 204  * ❌ Gate 5: Source Diversity
... ... @@ -211,13 +211,11 @@
211 211  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
212 212  
213 213  **Main Requirements:**
214 -
215 215  * NFR1: Response time < 30 seconds
216 216  * NFR2: Handle 1000+ concurrent users
217 217  * NFR3: 99.9% uptime
218 218  
219 219  **POC Implementation:**
220 -
221 221  * ⚠️ **Response time monitored** (not optimized)
222 222  * ⚠️ **Single-threaded processing** (no concurrency)
223 223  * ⚠️ **Basic error handling** (no advanced retry logic)
... ... @@ -225,11 +225,11 @@
225 225  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
226 226  
227 227  **POC Acceptance:**
228 -
229 229  * Analysis completes (no timeout requirement)
230 230  * Errors don't crash system
231 231  * Basic logging in place
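The POC acceptance criteria above (errors don't crash the system, basic logging, response time monitored but not optimized) can be met with a plain wrapper around each synchronous pipeline step. This is a sketch under assumptions: `runStep`, the `[AKEL]` log prefix, and the outcome shape are hypothetical, and there is deliberately no retry, matching the POC's basic error handling.

```typescript
type StepOutcome<T> =
  | { ok: true; value: T; elapsedMs: number }
  | { ok: false; error: string; elapsedMs: number };

// Runs one synchronous pipeline step, logs its duration to the console,
// and turns a thrown error into a failed outcome instead of crashing.
// No retries: a failed step is reported and the caller decides what to do.
function runStep<T>(name: string, fn: () => T): StepOutcome<T> {
  const start = Date.now();
  try {
    const value = fn();
    const elapsedMs = Date.now() - start;
    console.log(`[AKEL] ${name} completed in ${elapsedMs} ms`);
    return { ok: true, value, elapsedMs };
  } catch (err) {
    const elapsedMs = Date.now() - start;
    const error = err instanceof Error ? err.message : String(err);
    console.error(`[AKEL] ${name} failed after ${elapsedMs} ms: ${error}`);
    return { ok: false, error, elapsedMs };
  }
}
```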
232 232  
216 +
233 233  == 4. What's NOT in POC Scope ==
234 234  
235 235  === 4.1 User-Facing Features (Beta 0+) ===
... ... @@ -239,7 +239,6 @@
239 239  {{/warning}}
240 240  
241 241  **Out of Scope:**
242 -
243 243  * ❌ User accounts and authentication (FR8)
244 244  * ❌ User corrections system (FR9, FR45-46)
245 245  * ❌ Public publishing interface (FR10)
... ... @@ -253,7 +253,6 @@
253 253  === 4.2 Advanced Features (V1.0+) ===
254 254  
255 255  **Out of Scope:**
256 -
257 257  * ❌ IFCN compliance (FR47)
258 258  * ❌ ClaimReview schema (FR48)
259 259  * ❌ Archive.org integration (FR49)
... ... @@ -268,7 +268,6 @@
268 268  === 4.3 Production Requirements (POC2, Beta 0) ===
269 269  
270 270  **Out of Scope:**
271 -
272 272  * ❌ Security controls (NFR4, NFR12)
273 273  * ❌ Code maintainability (NFR5)
274 274  * ❌ System monitoring (NFR13)
... ... @@ -285,26 +285,21 @@
285 285  
286 286  For each analyzed claim, POC must produce:
287 287  
288 -*
289 -**
290 -**1. Claim
269 +**1. Claim**
291 291  * Original text
292 292  * Classification (factual/non-factual/ambiguous)
293 293  * If non-factual: Clear reason why
294 294  
295 295  **2. Scenarios** (if factual)
296 -
297 297  * 2-3 interpretation scenarios
298 298  * Each scenario clearly described
299 299  
300 300  **3. Evidence** (if factual)
301 -
302 302  * Supporting evidence (2+ items)
303 303  * Opposing evidence (if exists)
304 304  * Source URLs and reliability scores
305 305  
306 306  **4. Verdict** (if factual)
307 -
308 308  * Probability (0-100%)
309 309  * Uncertainty quantification
310 310  * Confidence level (LOW/MEDIUM/HIGH)
... ... @@ -311,10 +311,10 @@
311 311  * Reasoning chain
312 312  
313 313  **5. Quality Status**
314 -
315 315  * Which gates passed/failed
316 316  * If failed: Clear explanation why
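One way to make the five required parts checkable is a typed output record plus a completeness check. All field names here are hypothetical and may differ from the JSON shown in section 5.2; the check only verifies that each required part is present (scenarios, evidence, and verdict only for factual claims), not the per-part counts covered by the acceptance criteria.

```typescript
// Hypothetical output record; field names are illustrative only.
type Classification = "factual" | "non-factual" | "ambiguous";
type ConfidenceLevel = "LOW" | "MEDIUM" | "HIGH";

interface SourcedItem {
  url: string;
  reliability: number; // source reliability score
}

interface ClaimOutput {
  claim: { originalText: string; classification: Classification; nonFactualReason?: string };
  scenarios?: string[];
  evidence?: { supporting: SourcedItem[]; opposing: SourcedItem[] };
  verdict?: {
    probability: number; // 0-100
    uncertainty: number;
    confidence: ConfidenceLevel;
    reasoningChain: string[];
  };
  qualityStatus: { gate: string; passed: boolean; reason?: string }[];
}

// Lists the required parts missing from one claim's output.
function missingOutputParts(o: ClaimOutput): string[] {
  const missing: string[] = [];
  if (o.claim.originalText.trim() === "") missing.push("claim.originalText");
  if (o.claim.classification === "non-factual" && !o.claim.nonFactualReason) {
    missing.push("claim.nonFactualReason");
  }
  // Parts 2-4 are only required when the claim is factual.
  if (o.claim.classification === "factual") {
    if (!o.scenarios || o.scenarios.length === 0) missing.push("scenarios");
    if (!o.evidence) missing.push("evidence");
    if (!o.verdict) missing.push("verdict");
  }
  if (o.qualityStatus.length === 0) missing.push("qualityStatus");
  // Every failed gate must explain why it failed.
  for (const g of o.qualityStatus) {
    if (!g.passed && !g.reason) missing.push(`qualityStatus[${g.gate}].reason`);
  }
  return missing;
}
```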
317 317  
293 +
318 318  === 5.2 Example POC Output ===
319 319  
320 320  {{code language="json"}}
... ... @@ -366,7 +366,6 @@
366 366  POC is successful if:
367 367  
368 368  ✅ **FR1-FR7 Requirements Met:**
369 -
370 370  1. Extracts 3-5 factual claims from test articles
371 371  2. Generates 2-3 scenarios per ambiguous claim
372 372  3. Finds supporting AND opposing evidence
... ... @@ -374,21 +374,19 @@
374 374  5. Provides clear reasoning chains
375 375  
376 376  ✅ **Quality Gates Work:**
377 -
378 378  1. Gate 1 blocks non-factual claims (100% block rate)
379 379  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
380 380  3. Clear rejection reasons provided
381 381  
382 382  ✅ **NFR11 Met:**
383 -
384 384  1. Quality gates reduce hallucination rate
385 385  2. Blocked outputs have clear explanations
386 386  3. Quality metrics are logged
387 387  
361 +
388 388  === 6.2 Quality Thresholds ===
389 389  
390 390  **Minimum Acceptable:**
391 -
392 392  * ≥70% of test claims correctly classified (factual/non-factual)
393 393  * ≥60% of verdicts are reasonable (human evaluation)
394 394  * Gate 1 blocks 100% of non-factual claims
... ... @@ -395,17 +395,16 @@
395 395  * Gate 4 blocks verdicts with <2 sources
396 396  
397 397  **Target:**
398 -
399 399  * ≥80% claims correctly classified
400 400  * ≥75% verdicts are reasonable
401 401  * <10% false positives (blocking good claims)
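The thresholds above are simple ratios over a labeled test set. The sketch below assumes each test claim records the human label, the POC's classification, whether Gate 1 blocked it, and (when a verdict was produced) a human judgement of reasonableness. It also assumes "false positives" means factual claims blocked by Gate 1 and that meeting the target includes meeting the minimum; the per-claim Gate 4 source check is left out. All names are hypothetical.

```typescript
interface TestResult {
  expectedFactual: boolean;    // human label for the test claim
  predictedFactual: boolean;   // POC classification
  blockedByGate1: boolean;
  verdictReasonable?: boolean; // human judgement; undefined when no verdict was produced
}

interface Metrics {
  classificationAccuracy: number;
  verdictReasonableRate: number;
  gate1BlockRate: number;    // share of non-factual claims blocked by Gate 1
  falsePositiveRate: number; // share of factual claims blocked by Gate 1
}

// Fraction num/den, or 0 when the denominator is empty.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function computeMetrics(results: TestResult[]): Metrics {
  const factual = results.filter((r) => r.expectedFactual);
  const nonFactual = results.filter((r) => !r.expectedFactual);
  const judged = results.filter((r) => r.verdictReasonable !== undefined);
  return {
    classificationAccuracy: ratio(results.filter((r) => r.expectedFactual === r.predictedFactual).length, results.length),
    verdictReasonableRate: ratio(judged.filter((r) => r.verdictReasonable === true).length, judged.length),
    gate1BlockRate: ratio(nonFactual.filter((r) => r.blockedByGate1).length, nonFactual.length),
    falsePositiveRate: ratio(factual.filter((r) => r.blockedByGate1).length, factual.length),
  };
}

// Minimum acceptable thresholds listed above.
function meetsMinimum(m: Metrics): boolean {
  return m.classificationAccuracy >= 0.7 && m.verdictReasonableRate >= 0.6 && m.gate1BlockRate === 1;
}

// Target thresholds; assumed to also require the minimum.
function meetsTarget(m: Metrics): boolean {
  return (
    meetsMinimum(m) &&
    m.classificationAccuracy >= 0.8 &&
    m.verdictReasonableRate >= 0.75 &&
    m.falsePositiveRate < 0.1
  );
}
```

For example, in Test 7's set of 10 factual and 10 non-factual claims, wrongly blocking a single factual claim gives a false-positive rate of 1/10 = 10%, which misses the <10% target even if every other metric passes.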
402 402  
375 +
403 403  === 6.3 POC Decision Gate ===
404 404  
405 405  **After POC1, we decide:**
406 406  
407 407  **✅ PROCEED to POC2** if:
408 -
409 409  * Success criteria met
410 410  * Quality gates demonstrably improve output
411 411  * Core workflow is technically sound
... ... @@ -412,72 +412,65 @@
412 412  * Clear path to production quality
413 413  
414 414  **⚠️ ITERATE POC1** if:
415 -
416 416  * Success criteria partially met
417 417  * Gates work but need tuning
418 418  * Core issues identified but fixable
419 419  
420 420  **❌ PIVOT APPROACH** if:
421 -
422 422  * Success criteria not met
423 423  * Fundamental AI limitations discovered
424 424  * Quality gates insufficient
425 425  * Alternative approach needed
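The decision gate is a human judgement, but its ordering can be made explicit. This sketch assumes "partially met" means at least one but not all success criteria were met, "not met" means none were, and a fundamental limitation (or insufficient quality gates) forces a pivot regardless of the count; `decidePoc1` and `Poc1Review` are hypothetical.

```typescript
type Decision = "PROCEED to POC2" | "ITERATE POC1" | "PIVOT APPROACH";

interface Poc1Review {
  criteriaMet: number;            // success criteria met
  criteriaTotal: number;          // success criteria evaluated
  fundamentalLimitation: boolean; // fundamental AI limitation found, or gates insufficient
}

// Encodes the ordering of the POC1 decision gate under the assumptions stated above:
// a fundamental problem or zero criteria met means pivot, partial means iterate.
function decidePoc1(r: Poc1Review): Decision {
  if (r.fundamentalLimitation || r.criteriaMet === 0) return "PIVOT APPROACH";
  if (r.criteriaMet < r.criteriaTotal) return "ITERATE POC1";
  return "PROCEED to POC2";
}
```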
426 426  
397 +
427 427  == 7. Test Cases ==
428 428  
429 429  === 7.1 Happy Path ===
430 430  
431 431  **Test 1: Simple Factual Claim**
432 -
433 433  * Input: "Paris is the capital of France"
434 -* Expected: Factual, 1 scenario, verdict 95% true
404 +* Expected: Factual, 1 scenario, verdict ~95% true
435 435  
436 436  **Test 2: Ambiguous Claim**
437 -
438 438  * Input: "Switzerland has the highest income in Europe"
439 439  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
440 440  
441 441  **Test 3: Statistical Claim**
442 -
443 443  * Input: "10% of people have condition X"
444 444  * Expected: Factual, evidence with numbers, probabilistic verdict
445 445  
414 +
446 446  === 7.2 Edge Cases ===
447 447  
448 448  **Test 4: Opinion**
449 -
450 450  * Input: "Paris is the best city"
451 451  * Expected: Non-factual (opinion), blocked by Gate 1
452 452  
453 453  **Test 5: Prediction**
454 -
455 455  * Input: "Bitcoin will reach $100,000 next year"
456 456  * Expected: Non-factual (prediction), blocked by Gate 1
457 457  
458 458  **Test 6: Insufficient Evidence**
459 -
460 460  * Input: Obscure factual claim with no sources
461 461  * Expected: Blocked by Gate 4 (<2 sources)
462 462  
429 +
463 463  === 7.3 Quality Gate Tests ===
464 464  
465 465  **Test 7: Gate 1 Effectiveness**
466 -
467 467  * Input: Mix of 10 factual + 10 non-factual claims
468 468  * Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)
469 469  
470 470  **Test 8: Gate 4 Effectiveness**
471 -
472 472  * Input: Claims with varying evidence availability
473 473  * Expected: Gate 4 blocks low-confidence verdicts
474 474  
440 +
475 475  == 8. Technical Architecture (POC) ==
476 476  
477 477  === 8.1 Simplified Architecture ===
478 478  
479 479  **POC Tech Stack:**
480 -
481 481  * **Frontend:** Simple web interface (Next.js + TypeScript)
482 482  * **Backend:** Single API endpoint
483 483  * **AI:** Claude API (Sonnet 4.5)
... ... @@ -490,7 +490,6 @@
490 490  === 8.2 AKEL Implementation ===
491 491  
492 492  **POC AKEL:**
493 -
494 494  * Single-threaded processing
495 495  * Synchronous API calls
496 496  * No caching
... ... @@ -498,7 +498,6 @@
498 498  * Console logging
499 499  
500 500  **Full AKEL (POC2+):**
501 -
502 502  * Multi-threaded processing
503 503  * Async API calls
504 504  * Evidence caching
... ... @@ -505,6 +505,7 @@
505 505  * Advanced error handling with retry
506 506  * Structured logging + monitoring
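The retry handling listed for the full AKEL could take the form of an async wrapper with exponential backoff. This is an illustrative sketch, not the planned POC2 design: `withRetry`, the default of 3 attempts, and the 500 ms base delay are assumptions.

```typescript
// Retries an async call, doubling the delay after each failed attempt.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Delays: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
        const delay = baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Retrying every error is a simplification; in practice only transient failures (timeouts, rate limits) are worth retrying, while invalid requests should fail immediately.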
507 507  
471 +
508 508  == 9. POC Philosophy ==
509 509  
510 510  {{info}}
... ... @@ -513,55 +513,47 @@
513 513  
514 514  === 9.1 Core Principles ===
515 515  
516 -*
517 -**
518 -**1. Prove Concept, Not Production
480 +**1. Prove Concept, Not Production**
519 519  * POC validates AI can do the job
520 520  * Production quality comes in POC2 and Beta 0
521 521  * Focus on "does it work?" not "is it perfect?"
522 522  
523 523  **2. Implement Subset of Requirements**
524 -
525 525  * POC covers FR1-7, NFR11 (lite)
526 526  * All other requirements deferred
527 527  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
528 528  
529 529  **3. Quality Gates Validate Approach**
530 -
531 531  * 2 gates prove the concept
532 532  * Remaining 5 gates added in POC2
533 533  * Gates must demonstrably improve quality
534 534  
535 535  **4. Iterate Based on Results**
536 -
537 537  * POC results determine next steps
538 538  * Decision gate after POC1
539 539  * Flexibility to pivot if needed
540 540  
541 -=== 9.2 Success ===
542 542  
543 - Clear Path Forward ===
501 +=== 9.2 Success = Clear Path Forward ===
544 544  
545 545  POC succeeds if we can confidently answer:
546 546  
547 547  ✅ **Technical Feasibility:**
548 -
549 549  * Can AI extract claims reliably?
550 550  * Can AI find balanced evidence?
551 551  * Can AI compute reasonable verdicts?
552 552  
553 553  ✅ **Quality Approach:**
554 -
555 555  * Do quality gates improve output?
556 556  * Can we measure and track quality?
557 557  * Is the gate approach scalable?
558 558  
559 559  ✅ **Production Path:**
560 -
561 561  * Is the core architecture sound?
562 562  * What needs improvement for production?
563 563  * Is POC2 the right next step?
564 564  
520 +
565 565  == 10. Related Pages ==
566 566  
567 567  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ... @@ -570,10 +570,11 @@
570 570  * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
571 571  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
572 572  
529 +
573 573  **Document Owner:** Technical Team
574 574  **Review Frequency:** After each POC iteration
575 575  **Version History:**
576 -
577 577  * v1.0 - Initial POC requirements
578 578  * v2.0 - Updated after specification cross-check
579 579  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
536 +