Changes for page POC Requirements

Last modified by Robert Schaub on 2025/12/23 14:37

From version 1.1
edited by Robert Schaub
on 2025/12/23 12:09
Change comment: Imported from XAR
To version 1.2
edited by Robert Schaub
on 2025/12/23 14:37
Change comment: Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements

Page properties
Content
... ... @@ -14,9 +14,11 @@
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 +
17 17  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18  
19 19  **What we're proving:**
21 +
20 20  * AI can identify factual claims from text
21 21  * AI can evaluate those claims with structured evidence
22 22  * Quality gates can filter unreliable outputs
... ... @@ -23,6 +23,7 @@
23 23  * The core workflow is technically feasible
24 24  
25 25  **What we're NOT proving:**
28 +
26 26  * Production-ready reliability (that's POC2)
27 27  * User-facing features (that's Beta 0)
28 28  * Full IFCN compliance (that's V1.0)
... ... @@ -32,15 +32,15 @@
32 32  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33  
34 34  **Scope Summary:**
38 +
35 35  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36  * **Partial:** 3 NFRs (simplified versions)
37 37  * **Out of Scope:** 19 requirements (deferred to later phases)
38 38  
39 -
40 40  == 2. POC1 Scope ==
41 41  
42 42  {{success}}
43 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
46 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.79.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
44 44  
45 45  The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
46 46  {{/success}}
... ... @@ -53,6 +53,7 @@
53 53  | **NFR11** | Quality Assurance Framework | 2 of 7 quality gates implemented (Gates 1 and 4)
54 54  
55 55  **POC1 also implements these workflow components** (detailed as FR1-FR6, FR8, FR11, FR13 in implementation sections below):
59 +
56 56  * Claim extraction (FR1)
57 57  * Claim context (FR2)
58 58  * Multiple scenarios (FR3)
... ... @@ -63,6 +63,7 @@
63 63  * In-article highlighting (FR13) - deferred to Beta 0
64 64  
65 65  **Partial implementations:**
70 +
66 66  * NFR1 (Explainability) - Basic only
67 67  * NFR2 (Performance) - Functional but not optimized
68 68  * NFR3 (Transparency) - Basic only
... ... @@ -78,6 +78,7 @@
78 78  **Main Requirement:** AI extracts factual claims from input text
79 79  
80 80  **POC Implementation:**
86 +
81 81  * ✅ AKEL extracts claims using LLM
82 82  * ✅ Each claim includes original text reference
83 83  * ✅ Claims are identified as factual/non-factual
... ... @@ -84,16 +84,17 @@
84 84  * ❌ No advanced claim parsing (added in POC2)
85 85  
86 86  **Acceptance Criteria:**
93 +
87 87  * Extracts 3-5 claims from typical article
88 88  * Identifies factual vs non-factual claims
89 89  * Quality Gate 1 validates extraction
90 90  
91 -
92 92  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
93 93  
94 94  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
95 95  
96 96  **POC Implementation:**
103 +
97 97  * ✅ AKEL generates 2-3 scenarios per claim
98 98  * ✅ Scenarios capture different interpretations
99 99  * ✅ Each scenario is evaluated separately
... ... @@ -100,16 +100,17 @@
100 100  * ✅ Verdict considers all scenarios
101 101  
102 102  **Acceptance Criteria:**
110 +
103 103  * Generates 2+ scenarios for ambiguous claims
104 104  * Scenarios are meaningfully different
105 105  * All scenarios are evaluated
106 106  
107 -
108 108  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
109 109  
110 110  **Main Requirement:** Provide user-friendly summary of analysis
111 111  
112 112  **POC Implementation:**
120 +
113 113  * ✅ Simple text summary generated
114 114  * ❌ No rich formatting (added in Beta 0)
115 115  * ❌ No visual elements (added in Beta 0)
... ... @@ -127,10 +127,12 @@
127 127  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
128 128  
129 129  **Main Requirements:**
138 +
130 130  * FR5: Collect supporting and opposing evidence
131 131  * FR6: Evaluate evidence source reliability
132 132  
133 133  **POC Implementation:**
143 +
134 134  * ✅ AKEL searches for evidence (web/knowledge base)
135 135  * ✅ **Mandatory contradiction search** (finds opposing evidence)
136 136  * ✅ Source reliability scoring
... ... @@ -138,16 +138,17 @@
138 138  * ❌ No advanced source verification (added in POC2)
139 139  
140 140  **Acceptance Criteria:**
151 +
141 141  * Finds 2+ supporting evidence items
142 142  * Finds 1+ opposing evidence (if exists)
143 143  * Sources scored for reliability
144 144  
145 -
146 146  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
147 147  
148 148  **Main Requirement:** AI computes verdicts with uncertainty quantification
149 149  
150 150  **POC Implementation:**
161 +
151 151  * ✅ Probabilistic verdicts (0-100% confidence)
152 152  * ✅ Uncertainty explicitly stated
153 153  * ✅ Reasoning chain provided
... ... @@ -162,11 +162,11 @@
162 162  ```
163 163  
164 164  **Acceptance Criteria:**
176 +
165 165  * Verdicts include probability (0-100%)
166 166  * Uncertainty explicitly quantified
167 167  * Reasoning chain explains verdict
168 168  
169 -
170 170  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
171 171  
172 172  **Main Requirement:** Complete quality assurance with 7 quality gates
... ... @@ -174,11 +174,13 @@
174 174  **POC Implementation:** **2 gates only**
175 175  
176 176  **Quality Gate 1: Claim Validation**
188 +
177 177  * ✅ Validates claim is factual and verifiable
178 178  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
179 179  * ✅ Provides clear rejection reason
180 180  
181 181  **Quality Gate 4: Verdict Confidence Assessment**
194 +
182 182  * ✅ Validates ≥2 sources found
183 183  * ✅ Validates quality score ≥0.6
184 184  * ✅ Blocks low-confidence verdicts
... ... @@ -185,6 +185,7 @@
185 185  * ✅ Provides clear rejection reason
186 186  
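The Gate 4 rules above can be sketched as a small check. This is a minimal illustration, not the AKEL implementation: the `VerdictCandidate` and `GateResult` shapes, their field names, and the assumption that the quality score lies between 0 and 1 are all hypothetical. Only the two thresholds (at least 2 sources, quality score at least 0.6) come from the requirements.

```typescript
// Hypothetical input shape; field names are assumptions for illustration only.
interface VerdictCandidate {
  sourceUrls: string[];
  qualityScore: number; // assumed to lie in the range 0..1
}

// Hypothetical result shape: a pass/fail flag plus human-readable rejection reasons.
interface GateResult {
  passed: boolean;
  reasons: string[];
}

const MIN_SOURCES = 2;         // from "Validates ≥2 sources found"
const MIN_QUALITY_SCORE = 0.6; // from "Validates quality score ≥0.6"

// Blocks a verdict when either threshold is missed and records why.
function checkGate4(candidate: VerdictCandidate): GateResult {
  const reasons: string[] = [];
  if (candidate.sourceUrls.length < MIN_SOURCES) {
    reasons.push(
      `Only ${candidate.sourceUrls.length} source(s) found; at least ${MIN_SOURCES} required.`
    );
  }
  if (candidate.qualityScore < MIN_QUALITY_SCORE) {
    reasons.push(
      `Quality score ${candidate.qualityScore} is below the minimum of ${MIN_QUALITY_SCORE}.`
    );
  }
  return { passed: reasons.length === 0, reasons };
}
```

Returning every failed condition, rather than stopping at the first, is one way to satisfy the "clear rejection reason" requirement when a verdict misses both thresholds.
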
187 187  **Out of Scope (POC2+):**
201 +
188 188  * ❌ Gate 2: Evidence Relevance
189 189  * ❌ Gate 3: Scenario Coherence
190 190  * ❌ Gate 5: Source Diversity
... ... @@ -197,11 +197,13 @@
197 197  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
198 198  
199 199  **Main Requirements:**
214 +
200 200  * NFR1: Response time < 30 seconds
201 201  * NFR2: Handle 1000+ concurrent users
202 202  * NFR3: 99.9% uptime
203 203  
204 204  **POC Implementation:**
220 +
205 205  * ⚠️ **Response time monitored** (not optimized)
206 206  * ⚠️ **Single-threaded processing** (no concurrency)
207 207  * ⚠️ **Basic error handling** (no advanced retry logic)
... ... @@ -209,11 +209,11 @@
209 209  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
210 210  
211 211  **POC Acceptance:**
228 +
212 212  * Analysis completes (no timeout requirement)
213 213  * Errors don't crash system
214 214  * Basic logging in place
215 215  
216 -
217 217  == 4. What's NOT in POC Scope ==
218 218  
219 219  === 4.1 User-Facing Features (Beta 0+) ===
... ... @@ -223,6 +223,7 @@
223 223  {{/warning}}
224 224  
225 225  **Out of Scope:**
242 +
226 226  * ❌ User accounts and authentication (FR8)
227 227  * ❌ User corrections system (FR9, FR45-46)
228 228  * ❌ Public publishing interface (FR10)
... ... @@ -236,6 +236,7 @@
236 236  === 4.2 Advanced Features (V1.0+) ===
237 237  
238 238  **Out of Scope:**
256 +
239 239  * ❌ IFCN compliance (FR47)
240 240  * ❌ ClaimReview schema (FR48)
241 241  * ❌ Archive.org integration (FR49)
... ... @@ -250,6 +250,7 @@
250 250  === 4.3 Production Requirements (POC2, Beta 0) ===
251 251  
252 252  **Out of Scope:**
271 +
253 253  * ❌ Security controls (NFR4, NFR12)
254 254  * ❌ Code maintainability (NFR5)
255 255  * ❌ System monitoring (NFR13)
... ... @@ -266,21 +266,26 @@
266 266  
267 267  For each analyzed claim, POC must produce:
268 268  
269 -**1. Claim**
288 +*
289 +**
290 +**1. Claim
270 270  * Original text
271 271  * Classification (factual/non-factual/ambiguous)
272 272  * If non-factual: Clear reason why
273 273  
274 274  **2. Scenarios** (if factual)
296 +
275 275  * 2-3 interpretation scenarios
276 276  * Each scenario clearly described
277 277  
278 278  **3. Evidence** (if factual)
301 +
279 279  * Supporting evidence (2+ items)
280 280  * Opposing evidence (if exists)
281 281  * Source URLs and reliability scores
282 282  
283 283  **4. Verdict** (if factual)
307 +
284 284  * Probability (0-100%)
285 285  * Uncertainty quantification
286 286  * Confidence level (LOW/MEDIUM/HIGH)
... ... @@ -287,10 +287,10 @@
287 287  * Reasoning chain
288 288  
289 289  **5. Quality Status**
314 +
290 290  * Which gates passed/failed
291 291  * If failed: Clear explanation why
292 292  
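The five output parts listed above could be typed roughly as follows. This is a sketch under stated assumptions: every type and field name here is hypothetical, uncertainty is assumed to be a number in percentage points, and the shapes may differ from the example JSON in the specification. The `missingSections` helper is likewise illustrative; it only reports which "if factual" parts are absent.

```typescript
type Classification = "factual" | "non-factual" | "ambiguous";
type ConfidenceLevel = "LOW" | "MEDIUM" | "HIGH";

// Hypothetical shapes; names are assumptions, not taken from the specification.
interface EvidenceItem {
  stance: "supporting" | "opposing";
  url: string;
  reliabilityScore: number;
}

interface ClaimAnalysis {
  claim: {
    originalText: string;
    classification: Classification;
    nonFactualReason?: string; // expected when the claim is non-factual
  };
  scenarios?: { description: string }[]; // 2-3 when the claim is factual
  evidence?: EvidenceItem[];
  verdict?: {
    probability: number;  // 0-100
    uncertainty: number;  // assumed unit: percentage points
    confidence: ConfidenceLevel;
    reasoning: string[];
  };
  qualityStatus: { gate: string; passed: boolean; explanation?: string }[];
}

// Lists the sections that are required for factual claims but missing.
function missingSections(a: ClaimAnalysis): string[] {
  if (a.claim.classification !== "factual") return [];
  const missing: string[] = [];
  if (!a.scenarios || a.scenarios.length === 0) missing.push("scenarios");
  if (!a.evidence || a.evidence.length === 0) missing.push("evidence");
  if (!a.verdict) missing.push("verdict");
  return missing;
}
```
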
293 -
294 294  === 5.2 Example POC Output ===
295 295  
296 296  {{code language="json"}}
... ... @@ -342,6 +342,7 @@
342 342  POC is successful if:
343 343  
344 344  ✅ **FR1-FR7 Requirements Met:**
369 +
345 345  1. Extracts 3-5 factual claims from test articles
346 346  2. Generates 2-3 scenarios per ambiguous claim
347 347  3. Finds supporting AND opposing evidence
... ... @@ -349,19 +349,21 @@
349 349  5. Provides clear reasoning chains
350 350  
351 351  ✅ **Quality Gates Work:**
377 +
352 352  1. Gate 1 blocks non-factual claims (100% block rate)
353 353  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
354 354  3. Clear rejection reasons provided
355 355  
356 356  ✅ **NFR11 Met:**
383 +
357 357  1. Quality gates reduce hallucination rate
358 358  2. Blocked outputs have clear explanations
359 359  3. Quality metrics are logged
360 360  
361 -
362 362  === 6.2 Quality Thresholds ===
363 363  
364 364  **Minimum Acceptable:**
391 +
365 365  * ≥70% of test claims correctly classified (factual/non-factual)
366 366  * ≥60% of verdicts are reasonable (human evaluation)
367 367  * Gate 1 blocks 100% of non-factual claims
... ... @@ -368,16 +368,17 @@
368 368  * Gate 4 blocks verdicts with <2 sources
369 369  
370 370  **Target:**
398 +
371 371  * ≥80% claims correctly classified
372 372  * ≥75% verdicts are reasonable
373 373  * <10% false positives (blocking good claims)
374 374  
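The thresholds above can be checked mechanically once a test run has been scored. The following is a minimal sketch: the `PocMetrics` record and its field names are hypothetical, all rates are assumed to be fractions between 0 and 1, and two readings are assumptions rather than stated requirements: that "Gate 4 blocks verdicts with <2 sources" means a 100% block rate, and that the target level also requires the minimum level to be met.

```typescript
// Hypothetical metrics record; every rate is a fraction in the range 0..1.
interface PocMetrics {
  classificationAccuracy: number; // test claims correctly classified
  reasonableVerdictRate: number;  // verdicts judged reasonable by human evaluation
  gate1BlockRate: number;         // non-factual claims blocked by Gate 1
  gate4BlockRate: number;         // verdicts with <2 sources blocked by Gate 4
  falsePositiveRate: number;      // good claims wrongly blocked
}

type ThresholdLevel = "below-minimum" | "minimum" | "target";

function assessThresholds(m: PocMetrics): ThresholdLevel {
  // Minimum acceptable: 70% classification, 60% reasonable verdicts, both gates block fully.
  const meetsMinimum =
    m.classificationAccuracy >= 0.7 &&
    m.reasonableVerdictRate >= 0.6 &&
    m.gate1BlockRate === 1 &&
    m.gate4BlockRate === 1; // assumption: "blocks verdicts with <2 sources" means all of them
  if (!meetsMinimum) return "below-minimum";

  // Target: 80% classification, 75% reasonable verdicts, under 10% false positives.
  const meetsTarget =
    m.classificationAccuracy >= 0.8 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.1;
  return meetsTarget ? "target" : "minimum";
}
```
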
375 -
376 376  === 6.3 POC Decision Gate ===
377 377  
378 378  **After POC1, we decide:**
379 379  
380 380  **✅ PROCEED to POC2** if:
408 +
381 381  * Success criteria met
382 382  * Quality gates demonstrably improve output
383 383  * Core workflow is technically sound
... ... @@ -384,65 +384,72 @@
384 384  * Clear path to production quality
385 385  
386 386  **⚠️ ITERATE POC1** if:
415 +
387 387  * Success criteria partially met
388 388  * Gates work but need tuning
389 389  * Core issues identified but fixable
390 390  
391 391  **❌ PIVOT APPROACH** if:
421 +
392 392  * Success criteria not met
393 393  * Fundamental AI limitations discovered
394 394  * Quality gates insufficient
395 395  * Alternative approach needed
396 396  
397 -
398 398  == 7. Test Cases ==
399 399  
400 400  === 7.1 Happy Path ===
401 401  
402 402  **Test 1: Simple Factual Claim**
432 +
403 403  * Input: "Paris is the capital of France"
404 -* Expected: Factual, 1 scenario, verdict ~95% true
434 +* Expected: Factual, 1 scenario, verdict 95% true
405 405  
406 406  **Test 2: Ambiguous Claim**
437 +
407 407  * Input: "Switzerland has the highest income in Europe"
408 408  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
409 409  
410 410  **Test 3: Statistical Claim**
442 +
411 411  * Input: "10% of people have condition X"
412 412  * Expected: Factual, evidence with numbers, probabilistic verdict
413 413  
414 -
415 415  === 7.2 Edge Cases ===
416 416  
417 417  **Test 4: Opinion**
449 +
418 418  * Input: "Paris is the best city"
419 419  * Expected: Non-factual (opinion), blocked by Gate 1
420 420  
421 421  **Test 5: Prediction**
454 +
422 422  * Input: "Bitcoin will reach $100,000 next year"
423 423  * Expected: Non-factual (prediction), blocked by Gate 1
424 424  
425 425  **Test 6: Insufficient Evidence**
459 +
426 426  * Input: Obscure factual claim with no sources
427 427  * Expected: Blocked by Gate 4 (<2 sources)
428 428  
429 -
430 430  === 7.3 Quality Gate Tests ===
431 431  
432 432  **Test 7: Gate 1 Effectiveness**
466 +
433 433  * Input: Mix of 10 factual + 10 non-factual claims
434 434  * Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)
435 435  
436 436  **Test 8: Gate 4 Effectiveness**
471 +
437 437  * Input: Claims with varying evidence availability
438 438  * Expected: Gate 4 blocks low-confidence verdicts
439 439  
440 -
441 441  == 8. Technical Architecture (POC) ==
442 442  
443 443  === 8.1 Simplified Architecture ===
444 444  
445 445  **POC Tech Stack:**
480 +
446 446  * **Frontend:** Simple web interface (Next.js + TypeScript)
447 447  * **Backend:** Single API endpoint
448 448  * **AI:** Claude API (Sonnet 4.5)
... ... @@ -455,6 +455,7 @@
455 455  === 8.2 AKEL Implementation ===
456 456  
457 457  **POC AKEL:**
493 +
458 458  * Single-threaded processing
459 459  * Synchronous API calls
460 460  * No caching
... ... @@ -462,6 +462,7 @@
462 462  * Console logging
463 463  
464 464  **Full AKEL (POC2+):**
501 +
465 465  * Multi-threaded processing
466 466  * Async API calls
467 467  * Evidence caching
... ... @@ -468,7 +468,6 @@
468 468  * Advanced error handling with retry
469 469  * Structured logging + monitoring
470 470  
471 -
472 472  == 9. POC Philosophy ==
473 473  
474 474  {{info}}
... ... @@ -477,47 +477,55 @@
477 477  
478 478  === 9.1 Core Principles ===
479 479  
480 -**1. Prove Concept, Not Production**
516 +*
517 +**
518 +**1. Prove Concept, Not Production
481 481  * POC validates AI can do the job
482 482  * Production quality comes in POC2 and Beta 0
483 483  * Focus on "does it work?" not "is it perfect?"
484 484  
485 485  **2. Implement Subset of Requirements**
524 +
486 486  * POC covers FR1-7, NFR11 (lite)
487 487  * All other requirements deferred
488 488  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
489 489  
490 490  **3. Quality Gates Validate Approach**
530 +
491 491  * 2 gates prove the concept
492 492  * Remaining 5 gates added in POC2
493 493  * Gates must demonstrably improve quality
494 494  
495 495  **4. Iterate Based on Results**
536 +
496 496  * POC results determine next steps
497 497  * Decision gate after POC1
498 498  * Flexibility to pivot if needed
499 499  
541 +=== 9.2 Success ===
500 500  
501 -=== 9.2 Success = Clear Path Forward ===
543 + Clear Path Forward ===
502 502  
503 503  POC succeeds if we can confidently answer:
504 504  
505 505  ✅ **Technical Feasibility:**
548 +
506 506  * Can AI extract claims reliably?
507 507  * Can AI find balanced evidence?
508 508  * Can AI compute reasonable verdicts?
509 509  
510 510  ✅ **Quality Approach:**
554 +
511 511  * Do quality gates improve output?
512 512  * Can we measure and track quality?
513 513  * Is the gate approach scalable?
514 514  
515 515  ✅ **Production Path:**
560 +
516 516  * Is the core architecture sound?
517 517  * What needs improvement for production?
518 518  * Is POC2 the right next step?
519 519  
520 -
521 521  == 10. Related Pages ==
522 522  
523 523  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ... @@ -526,11 +526,10 @@
526 526  * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
527 527  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
528 528  
529 -
530 530  **Document Owner:** Technical Team
531 531  **Review Frequency:** After each POC iteration
532 532  **Version History:**
576 +
533 533  * v1.0 - Initial POC requirements
534 534  * v2.0 - Updated after specification cross-check
535 535  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
536 -