Last modified by Robert Schaub on 2025/12/23 18:00

From version 2.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment: Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
To version 2.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR

... ... @@ -14,11 +14,9 @@
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 -
18 18  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
19 19  
20 20  **What we're proving:**
21 -
22 22  * AI can identify factual claims from text
23 23  * AI can evaluate those claims with structured evidence
24 24  * Quality gates can filter unreliable outputs
... ... @@ -25,7 +25,6 @@
25 25  * The core workflow is technically feasible
26 26  
27 27  **What we're NOT proving:**
28 -
29 29  * Production-ready reliability (that's POC2)
30 30  * User-facing features (that's Beta 0)
31 31  * Full IFCN compliance (that's V1.0)
... ... @@ -35,15 +35,15 @@
35 35  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
36 36  
37 37  **Scope Summary:**
38 -
39 39  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
40 40  * **Partial:** 3 NFRs (simplified versions)
41 41  * **Out of Scope:** 19 requirements (deferred to later phases)
42 42  
39 +
43 43  == 2. POC1 Scope ==
44 44  
45 45  {{success}}
46 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88 ex 2 new Org Pages.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
43 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
47 47  
48 48  The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
49 49  {{/success}}
... ... @@ -57,8 +57,9 @@
57 57  
58 58  **POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below):
59 59  
60 -{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
61 -
57 +{{info}}
58 +**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
59 +{{/info}}
62 62  * Claim extraction (FR1)
63 63  * Claim context (FR2)
64 64  * Multiple scenarios (FR3)
... ... @@ -69,7 +69,6 @@
69 69  * In-article highlighting (FR13) - deferred to Beta 0
70 70  
71 71  **Partial implementations:**
72 -
73 73  * NFR1 (Explainability) - Basic only
74 74  * NFR2 (Performance) - Functional but not optimized
75 75  * NFR3 (Transparency) - Basic only
... ... @@ -85,7 +85,6 @@
85 85  **Main Requirement:** AI extracts factual claims from input text
86 86  
87 87  **POC Implementation:**
88 -
89 89  * ✅ AKEL extracts claims using LLM
90 90  * ✅ Each claim includes original text reference
91 91  * ✅ Claims are identified as factual/non-factual
... ... @@ -92,17 +92,16 @@
92 92  * ❌ No advanced claim parsing (added in POC2)
93 93  
94 94  **Acceptance Criteria:**
95 -
96 96  * Extracts 3-5 claims from a typical article
97 97  * Identifies factual vs non-factual claims
98 98  * Quality Gate 1 validates extraction
99 99  
95 +
100 100  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
101 101  
102 102  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
103 103  
104 104  **POC Implementation:**
105 -
106 106  * ✅ AKEL generates 2-3 scenarios per claim
107 107  * ✅ Scenarios capture different interpretations
108 108  * ✅ Each scenario is evaluated separately
... ... @@ -109,17 +109,16 @@
109 109  * ✅ Verdict considers all scenarios
110 110  
111 111  **Acceptance Criteria:**
112 -
113 113  * Generates 2+ scenarios for ambiguous claims
114 114  * Scenarios are meaningfully different
115 115  * All scenarios are evaluated
116 116  
111 +
117 117  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
118 118  
119 119  **Main Requirement:** Provide user-friendly summary of analysis
120 120  
121 121  **POC Implementation:**
122 -
123 123  * ✅ Simple text summary generated
124 124  * ❌ No rich formatting (added in Beta 0)
125 125  * ❌ No visual elements (added in Beta 0)
... ... @@ -137,12 +137,10 @@
137 137  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
138 138  
139 139  **Main Requirements:**
140 -
141 141  * FR5: Collect supporting and opposing evidence
142 142  * FR6: Evaluate evidence source reliability
143 143  
144 144  **POC Implementation:**
145 -
146 146  * ✅ AKEL searches for evidence (web/knowledge base)
147 147  * ✅ **Mandatory contradiction search** (finds opposing evidence)
148 148  * ✅ Source reliability scoring
... ... @@ -150,17 +150,16 @@
150 150  * ❌ No advanced source verification (added in POC2)
151 151  
152 152  **Acceptance Criteria:**
153 -
154 154  * Finds 2+ supporting evidence items
155 155  * Finds 1+ opposing evidence items (if any exist)
156 156  * Sources scored for reliability
157 157  
149 +
158 158  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
159 159  
160 160  **Main Requirement:** AI computes verdicts with uncertainty quantification
161 161  
162 162  **POC Implementation:**
163 -
164 164  * ✅ Probabilistic verdicts (0-100% confidence)
165 165  * ✅ Uncertainty explicitly stated
166 166  * ✅ Reasoning chain provided
... ... @@ -175,11 +175,11 @@
175 175  ```
176 176  
177 177  **Acceptance Criteria:**
178 -
179 179  * Verdicts include probability (0-100%)
180 180  * Uncertainty explicitly quantified
181 181  * Reasoning chain explains verdict
182 182  
173 +
183 183  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
184 184  
185 185  **Main Requirement:** Complete quality assurance with 7 quality gates
... ... @@ -187,13 +187,11 @@
187 187  **POC Implementation:** **2 gates only**
188 188  
189 189  **Quality Gate 1: Claim Validation**
190 -
191 191  * ✅ Validates claim is factual and verifiable
192 192  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
193 193  * ✅ Provides clear rejection reason
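
The Gate 1 behaviour listed above could be sketched roughly as follows. This is a minimal illustration, not the AKEL implementation: it assumes the classification label is produced by an earlier LLM step, and the names `GateResult`, `BLOCKED_TYPES`, and `gate1_claim_validation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Claim types the spec says Gate 1 blocks. The exact label strings are an assumption.
BLOCKED_TYPES = {"opinion", "prediction", "ambiguous"}


@dataclass
class GateResult:
    passed: bool
    reason: Optional[str] = None  # set only when the gate blocks the claim


def gate1_claim_validation(claim_type: str) -> GateResult:
    """Pass factual claims; block everything else with a clear rejection reason."""
    if claim_type == "factual":
        return GateResult(passed=True)
    if claim_type in BLOCKED_TYPES:
        return GateResult(
            passed=False,
            reason=f"Blocked by Gate 1: claim classified as {claim_type}, not verifiable fact",
        )
    # Unknown labels are blocked too, so nothing unclassified slips through.
    return GateResult(
        passed=False,
        reason=f"Blocked by Gate 1: unrecognized classification '{claim_type}'",
    )
```

Blocking unknown labels by default is a design choice made for this sketch; the spec itself only names the three blocked types.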
194 194  
195 195  **Quality Gate 4: Verdict Confidence Assessment**
196 -
197 197  * ✅ Validates ≥2 sources found
198 198  * ✅ Validates quality score ≥0.6
199 199  * ✅ Blocks low-confidence verdicts
... ... @@ -200,7 +200,6 @@
200 200  * ✅ Provides clear rejection reason
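
As an illustration of the Gate 4 checks above, the two thresholds (at least 2 sources, quality score of at least 0.6) might be applied like this. The function and type names are hypothetical, and the sketch assumes the quality score is a float on a 0.0-1.0 scale.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SOURCES = 2          # from the spec: at least 2 sources
MIN_QUALITY_SCORE = 0.6  # from the spec: quality score >= 0.6


@dataclass
class GateResult:
    passed: bool
    reason: Optional[str] = None  # set only when the gate blocks the verdict


def gate4_verdict_confidence(source_count: int, quality_score: float) -> GateResult:
    """Block a verdict that has too few sources or too low a quality score."""
    problems = []
    if source_count < MIN_SOURCES:
        problems.append(f"only {source_count} source(s) found, need at least {MIN_SOURCES}")
    if quality_score < MIN_QUALITY_SCORE:
        problems.append(f"quality score {quality_score:.2f} is below {MIN_QUALITY_SCORE}")
    if problems:
        return GateResult(False, "Blocked by Gate 4: " + "; ".join(problems))
    return GateResult(True)
```

Collecting every failed check before returning means the rejection reason names all problems at once, rather than only the first one found.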
201 201  
202 202  **Out of Scope (POC2+):**
203 -
204 204  * ❌ Gate 2: Evidence Relevance
205 205  * ❌ Gate 3: Scenario Coherence
206 206  * ❌ Gate 5: Source Diversity
... ... @@ -213,13 +213,11 @@
213 213  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
214 214  
215 215  **Main Requirements:**
216 -
217 217  * NFR1: Response time < 30 seconds
218 218  * NFR2: Handle 1000+ concurrent users
219 219  * NFR3: 99.9% uptime
220 220  
221 221  **POC Implementation:**
222 -
223 223  * ⚠️ **Response time monitored** (not optimized)
224 224  * ⚠️ **Single-threaded processing** (no concurrency)
225 225  * ⚠️ **Basic error handling** (no advanced retry logic)
... ... @@ -227,11 +227,11 @@
227 227  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
228 228  
229 229  **POC Acceptance:**
230 -
231 231  * Analysis completes (no timeout requirement)
232 232  * Errors don't crash system
233 233  * Basic logging in place
234 234  
220 +
235 235  == 4. What's NOT in POC Scope ==
236 236  
237 237  === 4.1 User-Facing Features (Beta 0+) ===
... ... @@ -241,7 +241,6 @@
241 241  {{/warning}}
242 242  
243 243  **Out of Scope:**
244 -
245 245  * ❌ User accounts and authentication (FR8)
246 246  * ❌ User corrections system (FR9, FR45-46)
247 247  * ❌ Public publishing interface (FR10)
... ... @@ -255,7 +255,6 @@
255 255  === 4.2 Advanced Features (V1.0+) ===
256 256  
257 257  **Out of Scope:**
258 -
259 259  * ❌ IFCN compliance (FR47)
260 260  * ❌ ClaimReview schema (FR48)
261 261  * ❌ Archive.org integration (FR49)
... ... @@ -270,7 +270,6 @@
270 270  === 4.3 Production Requirements (POC2, Beta 0) ===
271 271  
272 272  **Out of Scope:**
273 -
274 274  * ❌ Security controls (NFR4, NFR12)
275 275  * ❌ Code maintainability (NFR5)
276 276  * ❌ System monitoring (NFR13)
... ... @@ -287,26 +287,21 @@
287 287  
288 288  For each analyzed claim, POC must produce:
289 289  
290 -*
291 -**
292 -**1. Claim
273 +**1. Claim**
293 293  * Original text
294 294  * Classification (factual/non-factual/ambiguous)
295 295  * If non-factual: Clear reason why
296 296  
297 297  **2. Scenarios** (if factual)
298 -
299 299  * 2-3 interpretation scenarios
300 300  * Each scenario clearly described
301 301  
302 302  **3. Evidence** (if factual)
303 -
304 304  * Supporting evidence (2+ items)
305 305  * Opposing evidence (if any exists)
306 306  * Source URLs and reliability scores
307 307  
308 308  **4. Verdict** (if factual)
309 -
310 310  * Probability (0-100%)
311 311  * Uncertainty quantification
312 312  * Confidence level (LOW/MEDIUM/HIGH)
... ... @@ -313,10 +313,10 @@
313 313  * Reasoning chain
314 314  
315 315  **5. Quality Status**
316 -
317 317  * Which gates passed/failed
318 318  * If failed: Clear explanation why
319 319  
297 +
320 320  === 5.2 Example POC Output ===
321 321  
322 322  {{code language="json"}}
... ... @@ -368,7 +368,6 @@
368 368  POC is successful if:
369 369  
370 370  ✅ **FR1-FR7 Requirements Met:**
371 -
372 372  1. Extracts 3-5 factual claims from test articles
373 373  2. Generates 2-3 scenarios per ambiguous claim
374 374  3. Finds supporting AND opposing evidence
... ... @@ -376,21 +376,19 @@
376 376  5. Provides clear reasoning chains
377 377  
378 378  ✅ **Quality Gates Work:**
379 -
380 380  1. Gate 1 blocks non-factual claims (100% block rate)
381 381  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
382 382  3. Clear rejection reasons provided
383 383  
384 384  ✅ **NFR11 Met:**
385 -
386 386  1. Quality gates reduce hallucination rate
387 387  2. Blocked outputs have clear explanations
388 388  3. Quality metrics are logged
389 389  
365 +
390 390  === 6.2 Quality Thresholds ===
391 391  
392 392  **Minimum Acceptable:**
393 -
394 394  * ≥70% of test claims correctly classified (factual/non-factual)
395 395  * ≥60% of verdicts are reasonable (human evaluation)
396 396  * Gate 1 blocks 100% of non-factual claims
... ... @@ -397,17 +397,16 @@
397 397  * Gate 4 blocks verdicts with <2 sources
398 398  
399 399  **Target:**
400 -
401 401  * ≥80% claims correctly classified
402 402  * ≥75% verdicts are reasonable
403 403  * <10% false positives (blocking good claims)
404 404  
379 +
405 405  === 6.3 POC Decision Gate ===
406 406  
407 407  **After POC1, we decide:**
408 408  
409 409  **✅ PROCEED to POC2** if:
410 -
411 411  * Success criteria met
412 412  * Quality gates demonstrably improve output
413 413  * Core workflow is technically sound
... ... @@ -414,72 +414,65 @@
414 414  * Clear path to production quality
415 415  
416 416  **⚠️ ITERATE POC1** if:
417 -
418 418  * Success criteria partially met
419 419  * Gates work but need tuning
420 420  * Core issues identified but fixable
421 421  
422 422  **❌ PIVOT APPROACH** if:
423 -
424 424  * Success criteria not met
425 425  * Fundamental AI limitations discovered
426 426  * Quality gates insufficient
427 427  * Alternative approach needed
428 428  
401 +
429 429  == 7. Test Cases ==
430 430  
431 431  === 7.1 Happy Path ===
432 432  
433 433  **Test 1: Simple Factual Claim**
434 -
435 435  * Input: "Paris is the capital of France"
436 -* Expected: Factual, 1 scenario, verdict 95% true
408 +* Expected: Factual, 1 scenario, verdict ~95% true
437 437  
438 438  **Test 2: Ambiguous Claim**
439 -
440 440  * Input: "Switzerland has the highest income in Europe"
441 441  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
442 442  
443 443  **Test 3: Statistical Claim**
444 -
445 445  * Input: "10% of people have condition X"
446 446  * Expected: Factual, evidence with numbers, probabilistic verdict
447 447  
418 +
448 448  === 7.2 Edge Cases ===
449 449  
450 450  **Test 4: Opinion**
451 -
452 452  * Input: "Paris is the best city"
453 453  * Expected: Non-factual (opinion), blocked by Gate 1
454 454  
455 455  **Test 5: Prediction**
456 -
457 457  * Input: "Bitcoin will reach $100,000 next year"
458 458  * Expected: Non-factual (prediction), blocked by Gate 1
459 459  
460 460  **Test 6: Insufficient Evidence**
461 -
462 462  * Input: Obscure factual claim with no sources
463 463  * Expected: Blocked by Gate 4 (<2 sources)
464 464  
433 +
465 465  === 7.3 Quality Gate Tests ===
466 466  
467 467  **Test 7: Gate 1 Effectiveness**
468 -
469 469  * Input: Mix of 10 factual + 10 non-factual claims
470 470  * Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)
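
A test like this one could be scored with a helper along the following lines. It is a hedged sketch: `gate1_rates` is a hypothetical name, and each result is assumed to be a pair of the human label (`is_factual`) and the gate's decision (`was_blocked`).

```python
from typing import List, Tuple


def gate1_rates(results: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (block_rate, false_positive_rate) from (is_factual, was_blocked) pairs.

    block_rate: share of non-factual claims that Gate 1 blocked.
    false_positive_rate: share of factual claims that Gate 1 wrongly blocked.
    """
    non_factual = [blocked for is_factual, blocked in results if not is_factual]
    factual = [blocked for is_factual, blocked in results if is_factual]

    # Guard against empty groups so the helper never divides by zero.
    block_rate = sum(non_factual) / len(non_factual) if non_factual else 0.0
    false_positive_rate = sum(factual) / len(factual) if factual else 0.0
    return block_rate, false_positive_rate
```

Reporting the false positive rate alongside the block rate matters because a gate that blocks everything would also reach a 100% block rate.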
471 471  
472 472  **Test 8: Gate 4 Effectiveness**
473 -
474 474  * Input: Claims with varying evidence availability
475 475  * Expected: Gate 4 blocks low-confidence verdicts
476 476  
444 +
477 477  == 8. Technical Architecture (POC) ==
478 478  
479 479  === 8.1 Simplified Architecture ===
480 480  
481 481  **POC Tech Stack:**
482 -
483 483  * **Frontend:** Simple web interface (Next.js + TypeScript)
484 484  * **Backend:** Single API endpoint
485 485  * **AI:** Claude API (Sonnet 4.5)
... ... @@ -492,7 +492,6 @@
492 492  === 8.2 AKEL Implementation ===
493 493  
494 494  **POC AKEL:**
495 -
496 496  * Single-threaded processing
497 497  * Synchronous API calls
498 498  * No caching
... ... @@ -500,7 +500,6 @@
500 500  * Console logging
501 501  
502 502  **Full AKEL (POC2+):**
503 -
504 504  * Multi-threaded processing
505 505  * Async API calls
506 506  * Evidence caching
... ... @@ -507,6 +507,7 @@
507 507  * Advanced error handling with retry
508 508  * Structured logging + monitoring
509 509  
475 +
510 510  == 9. POC Philosophy ==
511 511  
512 512  {{info}}
... ... @@ -515,55 +515,47 @@
515 515  
516 516  === 9.1 Core Principles ===
517 517  
518 -*
519 -**
520 -**1. Prove Concept, Not Production
484 +**1. Prove Concept, Not Production**
521 521  * POC validates AI can do the job
522 522  * Production quality comes in POC2 and Beta 0
523 523  * Focus on "does it work?" not "is it perfect?"
524 524  
525 525  **2. Implement Subset of Requirements**
526 -
527 527  * POC covers FR1-7, NFR11 (lite)
528 528  * All other requirements deferred
529 529  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
530 530  
531 531  **3. Quality Gates Validate Approach**
532 -
533 533  * 2 gates prove the concept
534 534  * Remaining 5 gates added in POC2
535 535  * Gates must demonstrably improve quality
536 536  
537 537  **4. Iterate Based on Results**
538 -
539 539  * POC results determine next steps
540 540  * Decision gate after POC1
541 541  * Flexibility to pivot if needed
542 542  
543 -=== 9.2 Success ===
544 544  
545 - Clear Path Forward ===
505 +=== 9.2 Success = Clear Path Forward ===
546 546  
547 547  POC succeeds if we can confidently answer:
548 548  
549 549  ✅ **Technical Feasibility:**
550 -
551 551  * Can AI extract claims reliably?
552 552  * Can AI find balanced evidence?
553 553  * Can AI compute reasonable verdicts?
554 554  
555 555  ✅ **Quality Approach:**
556 -
557 557  * Do quality gates improve output?
558 558  * Can we measure and track quality?
559 559  * Is the gate approach scalable?
560 560  
561 561  ✅ **Production Path:**
562 -
563 563  * Is the core architecture sound?
564 564  * What needs improvement for production?
565 565  * Is POC2 the right next step?
566 566  
524 +
567 567  == 10. Related Pages ==
568 568  
569 569  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ... @@ -572,10 +572,11 @@
572 572  * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
573 573  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
574 574  
533 +
575 575  **Document Owner:** Technical Team
576 576  **Review Frequency:** After each POC iteration
577 577  **Version History:**
578 -
579 579  * v1.0 - Initial POC requirements
580 580  * v2.0 - Updated after specification cross-check
581 581  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
540 +