Changes for page POC Requirements

Last modified by Robert Schaub on 2025/12/23 11:35

From version 1.1
edited by Robert Schaub
on 2025/12/23 11:20
Change comment: Imported from XAR
To version 1.3
edited by Robert Schaub
on 2025/12/23 11:35
Change comment: Renamed back-links.

Page properties
Content
... ... @@ -14,9 +14,11 @@
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 +
17 17  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18  
19 19  **What we're proving:**
21 +
20 20  * AI can identify factual claims from text
21 21  * AI can evaluate those claims with structured evidence
22 22  * Quality gates can filter unreliable outputs
... ... @@ -23,6 +23,7 @@
23 23  * The core workflow is technically feasible
24 24  
25 25  **What we're NOT proving:**
28 +
26 26  * Production-ready reliability (that's POC2)
27 27  * User-facing features (that's Beta 0)
28 28  * Full IFCN compliance (that's V1.0)
... ... @@ -32,22 +32,23 @@
32 32  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33  
34 34  **Scope Summary:**
38 +
35 35  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36  * **Partial:** 3 NFRs (simplified versions)
37 37  * **Out of Scope:** 19 requirements (deferred to later phases)
38 38  
39 -
40 40  == 2. Requirements Scope Matrix ==
41 41  
42 42  {{success}}
43 -**Authoritative Source:** See [[Requirements Roadmap Matrix>>Test.FactHarbor.Specification.Requirements-Roadmap-Matrix.WebHome]] for complete phase-to-requirement mapping across all phases.
46 +**Authoritative Source:** See [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.78.Specification.Requirements-Roadmap-Matrix.WebHome]] for complete phase-to-requirement mapping across all phases.
44 44  {{/success}}
45 45  
46 46  **POC1 Scope Summary:**
47 47  
48 -POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]:
51 +POC1 implements the following requirements from the [[Main Requirements>>Test.FactHarbor V0\.9\.78.Specification.Requirements.WebHome]]:
49 49  
50 50  **Full Implementation (8 requirements):**
54 +
51 51  * FR1: Claim Extraction
52 52  * FR2: Claim Context
53 53  * FR3: Multiple Scenarios
... ... @@ -58,11 +58,13 @@
58 58  * NFR11: AKEL Quality Assurance Framework (Basic - 2 quality gates)
59 59  
60 60  **Partial Implementation (3 requirements):**
65 +
61 61  * NFR1: Explainability (Basic explanations only)
62 62  * NFR2: Performance (Functional but not optimized)
63 63  * NFR3: Transparency (Basic transparency)
64 64  
65 65  **Deferred to Later Phases:**
71 +
66 66  * All other requirements (see Roadmap Matrix for phase assignments)
67 67  
68 68  **Detailed POC1 specifications continue below...**
... ... @@ -75,6 +75,7 @@
75 75  **Main Requirement:** AI extracts factual claims from input text
76 76  
77 77  **POC Implementation:**
84 +
78 78  * ✅ AKEL extracts claims using LLM
79 79  * ✅ Each claim includes original text reference
80 80  * ✅ Claims are identified as factual/non-factual
... ... @@ -81,16 +81,17 @@
81 81  * ❌ No advanced claim parsing (added in POC2)
82 82  
83 83  **Acceptance Criteria:**
91 +
84 84  * Extracts 3-5 claims from typical article
85 85  * Identifies factual vs non-factual claims
86 86  * Quality Gate 1 validates extraction
87 87  
88 -
89 89  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
90 90  
91 91  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
92 92  
93 93  **POC Implementation:**
101 +
94 94  * ✅ AKEL generates 2-3 scenarios per claim
95 95  * ✅ Scenarios capture different interpretations
96 96  * ✅ Each scenario is evaluated separately
... ... @@ -97,16 +97,17 @@
97 97  * ✅ Verdict considers all scenarios
98 98  
99 99  **Acceptance Criteria:**
108 +
100 100  * Generates 2+ scenarios for ambiguous claims
101 101  * Scenarios are meaningfully different
102 102  * All scenarios are evaluated
103 103  
104 -
105 105  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
106 106  
107 107  **Main Requirement:** Provide user-friendly summary of analysis
108 108  
109 109  **POC Implementation:**
118 +
110 110  * ✅ Simple text summary generated
111 111  * ❌ No rich formatting (added in Beta 0)
112 112  * ❌ No visual elements (added in Beta 0)
... ... @@ -124,10 +124,12 @@
124 124  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
125 125  
126 126  **Main Requirements:**
136 +
127 127  * FR5: Collect supporting and opposing evidence
128 128  * FR6: Evaluate evidence source reliability
129 129  
130 130  **POC Implementation:**
141 +
131 131  * ✅ AKEL searches for evidence (web/knowledge base)
132 132  * ✅ **Mandatory contradiction search** (finds opposing evidence)
133 133  * ✅ Source reliability scoring
... ... @@ -135,16 +135,17 @@
135 135  * ❌ No advanced source verification (added in POC2)
136 136  
137 137  **Acceptance Criteria:**
149 +
138 138  * Finds 2+ supporting evidence items
139 139  * Finds 1+ opposing evidence (if exists)
140 140  * Sources scored for reliability
141 141  
142 -
143 143  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
144 144  
145 145  **Main Requirement:** AI computes verdicts with uncertainty quantification
146 146  
147 147  **POC Implementation:**
159 +
148 148  * ✅ Probabilistic verdicts (0-100% confidence)
149 149  * ✅ Uncertainty explicitly stated
150 150  * ✅ Reasoning chain provided
... ... @@ -159,11 +159,11 @@
159 159  ```
160 160  
161 161  **Acceptance Criteria:**
174 +
162 162  * Verdicts include probability (0-100%)
163 163  * Uncertainty explicitly quantified
164 164  * Reasoning chain explains verdict
165 165  
166 -
167 167  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
168 168  
169 169  **Main Requirement:** Complete quality assurance with 7 quality gates
... ... @@ -171,11 +171,13 @@
171 171  **POC Implementation:** **2 gates only**
172 172  
173 173  **Quality Gate 1: Claim Validation**
186 +
174 174  * ✅ Validates claim is factual and verifiable
175 175  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
176 176  * ✅ Provides clear rejection reason
177 177  
178 178  **Quality Gate 4: Verdict Confidence Assessment**
192 +
179 179  * ✅ Validates ≥2 sources found
180 180  * ✅ Validates quality score ≥0.6
181 181  * ✅ Blocks low-confidence verdicts
... ... @@ -182,6 +182,7 @@
182 182  * ✅ Provides clear rejection reason
183 183  
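The Gate 4 rules listed above (at least 2 sources, quality score of at least 0.6, clear rejection reason) could be sketched roughly as follows. This is a minimal illustration, not the POC implementation: the type names, field names, and the `checkGate4` function are hypothetical, and the quality score is taken as an input because the page does not define how it is computed.

```typescript
// Hypothetical shapes: names are illustrative, not taken from the POC code.
interface EvidenceSource {
  url: string;
  reliability: number; // assumed to be on a 0.0-1.0 scale
}

interface GateResult {
  gate: string;
  passed: boolean;
  reason?: string; // set only when the gate blocks the verdict
}

const MIN_SOURCES = 2; // "Validates >=2 sources found"
const MIN_QUALITY_SCORE = 0.6; // "Validates quality score >=0.6"

// Blocks a verdict when either Gate 4 condition fails and says why.
function checkGate4(sources: EvidenceSource[], qualityScore: number): GateResult {
  if (sources.length < MIN_SOURCES) {
    return {
      gate: "gate4",
      passed: false,
      reason: `Only ${sources.length} source(s) found; at least ${MIN_SOURCES} required`,
    };
  }
  if (qualityScore < MIN_QUALITY_SCORE) {
    return {
      gate: "gate4",
      passed: false,
      reason: `Quality score ${qualityScore.toFixed(2)} is below ${MIN_QUALITY_SCORE}`,
    };
  }
  return { gate: "gate4", passed: true };
}
```

Checking the source count first means an under-sourced claim is rejected with the more actionable reason, even when its quality score is also low.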
184 184  **Out of Scope (POC2+):**
199 +
185 185  * ❌ Gate 2: Evidence Relevance
186 186  * ❌ Gate 3: Scenario Coherence
187 187  * ❌ Gate 5: Source Diversity
... ... @@ -194,11 +194,13 @@
194 194  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
195 195  
196 196  **Main Requirements:**
212 +
197 197  * NFR1: Response time < 30 seconds
198 198  * NFR2: Handle 1000+ concurrent users
199 199  * NFR3: 99.9% uptime
200 200  
201 201  **POC Implementation:**
218 +
202 202  * ⚠️ **Response time monitored** (not optimized)
203 203  * ⚠️ **Single-threaded processing** (no concurrency)
204 204  * ⚠️ **Basic error handling** (no advanced retry logic)
... ... @@ -206,11 +206,11 @@
206 206  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
207 207  
208 208  **POC Acceptance:**
226 +
209 209  * Analysis completes (no timeout requirement)
210 210  * Errors don't crash system
211 211  * Basic logging in place
212 212  
213 -
214 214  == 4. What's NOT in POC Scope ==
215 215  
216 216  === 4.1 User-Facing Features (Beta 0+) ===
... ... @@ -220,6 +220,7 @@
220 220  {{/warning}}
221 221  
222 222  **Out of Scope:**
240 +
223 223  * ❌ User accounts and authentication (FR8)
224 224  * ❌ User corrections system (FR9, FR45-46)
225 225  * ❌ Public publishing interface (FR10)
... ... @@ -233,6 +233,7 @@
233 233  === 4.2 Advanced Features (V1.0+) ===
234 234  
235 235  **Out of Scope:**
254 +
236 236  * ❌ IFCN compliance (FR47)
237 237  * ❌ ClaimReview schema (FR48)
238 238  * ❌ Archive.org integration (FR49)
... ... @@ -247,6 +247,7 @@
247 247  === 4.3 Production Requirements (POC2, Beta 0) ===
248 248  
249 249  **Out of Scope:**
269 +
250 250  * ❌ Security controls (NFR4, NFR12)
251 251  * ❌ Code maintainability (NFR5)
252 252  * ❌ System monitoring (NFR13)
... ... @@ -263,21 +263,26 @@
263 263  
264 264  For each analyzed claim, POC must produce:
265 265  
266 -**1. Claim**
286 +* \\
287 +** \\
288 +**1. Claim
267 267  * Original text
268 268  * Classification (factual/non-factual/ambiguous)
269 269  * If non-factual: Clear reason why
270 270  
271 271  **2. Scenarios** (if factual)
294 +
272 272  * 2-3 interpretation scenarios
273 273  * Each scenario clearly described
274 274  
275 275  **3. Evidence** (if factual)
299 +
276 276  * Supporting evidence (2+ items)
277 277  * Opposing evidence (if exists)
278 278  * Source URLs and reliability scores
279 279  
280 280  **4. Verdict** (if factual)
305 +
281 281  * Probability (0-100%)
282 282  * Uncertainty quantification
283 283  * Confidence level (LOW/MEDIUM/HIGH)
... ... @@ -284,10 +284,10 @@
284 284  * Reasoning chain
285 285  
286 286  **5. Quality Status**
312 +
287 287  * Which gates passed/failed
288 288  * If failed: Clear explanation why
289 289  
290 -
291 291  === 5.2 Example POC Output ===
292 292  
293 293  {{code language="json"}}
... ... @@ -339,6 +339,7 @@
339 339  POC is successful if:
340 340  
341 341  ✅ **FR1-FR7 Requirements Met:**
367 +
342 342  1. Extracts 3-5 factual claims from test articles
343 343  2. Generates 2-3 scenarios per ambiguous claim
344 344  3. Finds supporting AND opposing evidence
... ... @@ -346,19 +346,21 @@
346 346  5. Provides clear reasoning chains
347 347  
348 348  ✅ **Quality Gates Work:**
375 +
349 349  1. Gate 1 blocks non-factual claims (100% block rate)
350 350  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
351 351  3. Clear rejection reasons provided
352 352  
353 353  ✅ **NFR11 Met:**
381 +
354 354  1. Quality gates reduce hallucination rate
355 355  2. Blocked outputs have clear explanations
356 356  3. Quality metrics are logged
357 357  
358 -
359 359  === 6.2 Quality Thresholds ===
360 360  
361 361  **Minimum Acceptable:**
389 +
362 362  * ≥70% of test claims correctly classified (factual/non-factual)
363 363  * ≥60% of verdicts are reasonable (human evaluation)
364 364  * Gate 1 blocks 100% of non-factual claims
... ... @@ -365,16 +365,17 @@
365 365  * Gate 4 blocks verdicts with <2 sources
366 366  
367 367  **Target:**
396 +
368 368  * ≥80% claims correctly classified
369 369  * ≥75% verdicts are reasonable
370 370  * <10% false positives (blocking good claims)
371 371  
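As a rough sketch, the thresholds above could be checked against measured results like this. The `PocMetrics` fields and the `assessThresholds` function are hypothetical names, rates are assumed to be fractions between 0 and 1, and the sketch assumes that reaching "Target" also requires meeting every "Minimum Acceptable" condition.

```typescript
// Hypothetical metrics collected from a POC test run; names are illustrative.
interface PocMetrics {
  classificationAccuracy: number; // fraction of test claims classified correctly
  reasonableVerdictRate: number; // fraction of verdicts judged reasonable by humans
  gate1BlockRate: number; // fraction of non-factual claims blocked by Gate 1
  gate4BlockedAllUnderSourced: boolean; // every verdict with <2 sources was blocked
  falsePositiveRate: number; // fraction of good claims that were blocked
}

type ThresholdLevel = "below-minimum" | "minimum" | "target";

function assessThresholds(m: PocMetrics): ThresholdLevel {
  const meetsMinimum =
    m.classificationAccuracy >= 0.7 &&
    m.reasonableVerdictRate >= 0.6 &&
    m.gate1BlockRate === 1 &&
    m.gate4BlockedAllUnderSourced;
  if (!meetsMinimum) return "below-minimum";

  const meetsTarget =
    m.classificationAccuracy >= 0.8 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.1;
  return meetsTarget ? "target" : "minimum";
}
```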
372 -
373 373  === 6.3 POC Decision Gate ===
374 374  
375 375  **After POC1, we decide:**
376 376  
377 377  **✅ PROCEED to POC2** if:
406 +
378 378  * Success criteria met
379 379  * Quality gates demonstrably improve output
380 380  * Core workflow is technically sound
... ... @@ -381,65 +381,72 @@
381 381  * Clear path to production quality
382 382  
383 383  **⚠️ ITERATE POC1** if:
413 +
384 384  * Success criteria partially met
385 385  * Gates work but need tuning
386 386  * Core issues identified but fixable
387 387  
388 388  **❌ PIVOT APPROACH** if:
419 +
389 389  * Success criteria not met
390 390  * Fundamental AI limitations discovered
391 391  * Quality gates insufficient
392 392  * Alternative approach needed
393 393  
394 -
395 395  == 7. Test Cases ==
396 396  
397 397  === 7.1 Happy Path ===
398 398  
399 399  **Test 1: Simple Factual Claim**
430 +
400 400  * Input: "Paris is the capital of France"
401 -* Expected: Factual, 1 scenario, verdict ~95% true
432 +* Expected: Factual, 1 scenario, verdict 95% true
402 402  
403 403  **Test 2: Ambiguous Claim**
435 +
404 404  * Input: "Switzerland has the highest income in Europe"
405 405  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
406 406  
407 407  **Test 3: Statistical Claim**
440 +
408 408  * Input: "10% of people have condition X"
409 409  * Expected: Factual, evidence with numbers, probabilistic verdict
410 410  
411 -
412 412  === 7.2 Edge Cases ===
413 413  
414 414  **Test 4: Opinion**
447 +
415 415  * Input: "Paris is the best city"
416 416  * Expected: Non-factual (opinion), blocked by Gate 1
417 417  
418 418  **Test 5: Prediction**
452 +
419 419  * Input: "Bitcoin will reach $100,000 next year"
420 420  * Expected: Non-factual (prediction), blocked by Gate 1
421 421  
422 422  **Test 6: Insufficient Evidence**
457 +
423 423  * Input: Obscure factual claim with no sources
424 424  * Expected: Blocked by Gate 4 (<2 sources)
425 425  
426 -
427 427  === 7.3 Quality Gate Tests ===
428 428  
429 429  **Test 7: Gate 1 Effectiveness**
464 +
430 430  * Input: Mix of 10 factual + 10 non-factual claims
431 431  * Expected: Gate 1 blocks all 10 non-factual (100% block rate)
432 432  
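A minimal sketch of how Test 7 could be scored is shown below. It computes the share of non-factual claims that Gate 1 blocked and the share of factual claims it blocked by mistake (the false positives from 6.2). The `LabeledOutcome` shape and the `gate1Rates` function are hypothetical, and the handling of empty groups is an arbitrary convention chosen for this sketch.

```typescript
// Hypothetical record of one test claim; names are illustrative.
interface LabeledOutcome {
  expectedFactual: boolean; // human label from the test set
  blockedByGate1: boolean; // what Gate 1 actually did
}

function gate1Rates(outcomes: LabeledOutcome[]): { blockRate: number; falsePositiveRate: number } {
  const nonFactual = outcomes.filter((o) => !o.expectedFactual);
  const factual = outcomes.filter((o) => o.expectedFactual);
  const blockedNonFactual = nonFactual.filter((o) => o.blockedByGate1).length;
  const blockedFactual = factual.filter((o) => o.blockedByGate1).length;
  return {
    // Convention for this sketch: an empty group counts as a perfect result.
    blockRate: nonFactual.length === 0 ? 1 : blockedNonFactual / nonFactual.length,
    falsePositiveRate: factual.length === 0 ? 0 : blockedFactual / factual.length,
  };
}
```

For the 10 + 10 mix in Test 7, the expected result is a `blockRate` of 1. The `falsePositiveRate` is reported alongside it because a gate that blocks everything would also reach a block rate of 1.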
433 433  **Test 8: Gate 4 Effectiveness**
469 +
434 434  * Input: Claims with varying evidence availability
435 435  * Expected: Gate 4 blocks low-confidence verdicts
436 436  
437 -
438 438  == 8. Technical Architecture (POC) ==
439 439  
440 440  === 8.1 Simplified Architecture ===
441 441  
442 442  **POC Tech Stack:**
478 +
443 443  * **Frontend:** Simple web interface (Next.js + TypeScript)
444 444  * **Backend:** Single API endpoint
445 445  * **AI:** Claude API (Sonnet 4.5)
... ... @@ -452,6 +452,7 @@
452 452  === 8.2 AKEL Implementation ===
453 453  
454 454  **POC AKEL:**
491 +
455 455  * Single-threaded processing
456 456  * Synchronous API calls
457 457  * No caching
... ... @@ -459,6 +459,7 @@
459 459  * Console logging
460 460  
461 461  **Full AKEL (POC2+):**
499 +
462 462  * Multi-threaded processing
463 463  * Async API calls
464 464  * Evidence caching
... ... @@ -465,7 +465,6 @@
465 465  * Advanced error handling with retry
466 466  * Structured logging + monitoring
467 467  
468 -
469 469  == 9. POC Philosophy ==
470 470  
471 471  {{info}}
... ... @@ -474,47 +474,55 @@
474 474  
475 475  === 9.1 Core Principles ===
476 476  
477 -**1. Prove Concept, Not Production**
514 +* \\
515 +** \\
516 +**1. Prove Concept, Not Production
478 478  * POC validates AI can do the job
479 479  * Production quality comes in POC2 and Beta 0
480 480  * Focus on "does it work?" not "is it perfect?"
481 481  
482 482  **2. Implement Subset of Requirements**
522 +
483 483  * POC covers FR1-7, NFR11 (lite)
484 484  * All other requirements deferred
485 485  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
486 486  
487 487  **3. Quality Gates Validate Approach**
528 +
488 488  * 2 gates prove the concept
489 489  * Remaining 5 gates added in POC2
490 490  * Gates must demonstrably improve quality
491 491  
492 492  **4. Iterate Based on Results**
534 +
493 493  * POC results determine next steps
494 494  * Decision gate after POC1
495 495  * Flexibility to pivot if needed
496 496  
539 +=== 9.2 Success ===
497 497  
498 -=== 9.2 Success = Clear Path Forward ===
541 + Clear Path Forward ===
499 499  
500 500  POC succeeds if we can confidently answer:
501 501  
502 502  ✅ **Technical Feasibility:**
546 +
503 503  * Can AI extract claims reliably?
504 504  * Can AI find balanced evidence?
505 505  * Can AI compute reasonable verdicts?
506 506  
507 507  ✅ **Quality Approach:**
552 +
508 508  * Do quality gates improve output?
509 509  * Can we measure and track quality?
510 510  * Is the gate approach scalable?
511 511  
512 512  ✅ **Production Path:**
558 +
513 513  * Is the core architecture sound?
514 514  * What needs improvement for production?
515 515  * Is POC2 the right next step?
516 516  
517 -
518 518  == 10. Related Pages ==
519 519  
520 520  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
... ... @@ -523,11 +523,10 @@
523 523  * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
524 524  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
525 525  
526 -
527 527  **Document Owner:** Technical Team
528 528  **Review Frequency:** After each POC iteration
529 529  **Version History:**
574 +
530 530  * v1.0 - Initial POC requirements
531 531  * v2.0 - Updated after specification cross-check
532 532  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
533 -