Last modified by Robert Schaub on 2026/02/08 08:25

From version 1.1
edited by Robert Schaub
on 2025/12/23 16:52
Change comment: Imported from XAR
To version 1.3
edited by Robert Schaub
on 2026/01/20 20:23
Change comment: Renamed back-links.

Details

Page properties
Content
... ... @@ -14,9 +14,11 @@
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 +
17 17  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
18 18  
19 19  **What we're proving:**
21 +
20 20  * AI can identify factual claims from text
21 21  * AI can evaluate those claims with structured evidence
22 22  * Quality gates can filter unreliable outputs
... ... @@ -23,6 +23,7 @@
23 23  * The core workflow is technically feasible
24 24  
25 25  **What we're NOT proving:**
28 +
26 26  * Production-ready reliability (that's POC2)
27 27  * User-facing features (that's Beta 0)
28 28  * Full IFCN compliance (that's V1.0)
... ... @@ -32,15 +32,15 @@
32 32  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 33  
34 34  **Scope Summary:**
38 +
35 35  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 36  * **Partial:** 3 NFRs (simplified versions)
37 37  * **Out of Scope:** 19 requirements (deferred to later phases)
38 38  
39 -
40 40  == 2. POC1 Scope ==
41 41  
42 42  {{success}}
43 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
46 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
44 44  
45 45  The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
46 46  {{/success}}
... ... @@ -54,9 +54,8 @@
54 54  
55 55  **POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)
56 56  
57 -{{info}}
58 -**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
59 -{{/info}}:
60 +{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
61 +
60 60  * Claim extraction (FR1)
61 61  * Claim context (FR2)
62 62  * Multiple scenarios (FR3)
... ... @@ -67,6 +67,7 @@
67 67  * In-article highlighting (FR13) - deferred to Beta 0
68 68  
69 69  **Partial implementations:**
72 +
70 70  * NFR1 (Explainability) - Basic only
71 71  * NFR2 (Performance) - Functional but not optimized
72 72  * NFR3 (Transparency) - Basic only
... ... @@ -82,6 +82,7 @@
82 82  **Main Requirement:** AI extracts factual claims from input text
83 83  
84 84  **POC Implementation:**
88 +
85 85  * ✅ AKEL extracts claims using LLM
86 86  * ✅ Each claim includes original text reference
87 87  * ✅ Claims are identified as factual/non-factual
... ... @@ -88,16 +88,17 @@
88 88  * ❌ No advanced claim parsing (added in POC2)
89 89  
90 90  **Acceptance Criteria:**
95 +
91 91  * Extracts 3-5 claims from typical article
92 92  * Identifies factual vs non-factual claims
93 93  * Quality Gate 1 validates extraction
94 94  
95 -
96 96  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
97 97  
98 98  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
99 99  
100 100  **POC Implementation:**
105 +
101 101  * ✅ AKEL generates 2-3 scenarios per claim
102 102  * ✅ Scenarios capture different interpretations
103 103  * ✅ Each scenario is evaluated separately
... ... @@ -104,16 +104,17 @@
104 104  * ✅ Verdict considers all scenarios
105 105  
106 106  **Acceptance Criteria:**
112 +
107 107  * Generates 2+ scenarios for ambiguous claims
108 108  * Scenarios are meaningfully different
109 109  * All scenarios are evaluated
110 110  
111 -
112 112  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
113 113  
114 114  **Main Requirement:** Provide user-friendly summary of analysis
115 115  
116 116  **POC Implementation:**
122 +
117 117  * ✅ Simple text summary generated
118 118  * ❌ No rich formatting (added in Beta 0)
119 119  * ❌ No visual elements (added in Beta 0)
... ... @@ -131,10 +131,12 @@
131 131  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
132 132  
133 133  **Main Requirements:**
140 +
134 134  * FR5: Collect supporting and opposing evidence
135 135  * FR6: Evaluate evidence source reliability
136 136  
137 137  **POC Implementation:**
145 +
138 138  * ✅ AKEL searches for evidence (web/knowledge base)
139 139  * ✅ **Mandatory contradiction search** (finds opposing evidence)
140 140  * ✅ Source reliability scoring
... ... @@ -142,16 +142,17 @@
142 142  * ❌ No advanced source verification (added in POC2)
143 143  
144 144  **Acceptance Criteria:**
153 +
145 145  * Finds 2+ supporting evidence items
146 146  * Finds 1+ opposing evidence (if exists)
147 147  * Sources scored for reliability
148 148  
149 -
150 150  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
151 151  
152 152  **Main Requirement:** AI computes verdicts with uncertainty quantification
153 153  
154 154  **POC Implementation:**
163 +
155 155  * ✅ Probabilistic verdicts (0-100% confidence)
156 156  * ✅ Uncertainty explicitly stated
157 157  * ✅ Reasoning chain provided
... ... @@ -166,11 +166,11 @@
166 166  ```
167 167  
168 168  **Acceptance Criteria:**
178 +
169 169  * Verdicts include probability (0-100%)
170 170  * Uncertainty explicitly quantified
171 171  * Reasoning chain explains verdict
172 172  
173 -
174 174  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
175 175  
176 176  **Main Requirement:** Complete quality assurance with 7 quality gates
... ... @@ -178,11 +178,13 @@
178 178  **POC Implementation:** **2 gates only**
179 179  
180 180  **Quality Gate 1: Claim Validation**
190 +
181 181  * ✅ Validates claim is factual and verifiable
182 182  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
183 183  * ✅ Provides clear rejection reason
184 184  
185 185  **Quality Gate 4: Verdict Confidence Assessment**
196 +
186 186  * ✅ Validates ≥2 sources found
187 187  * ✅ Validates quality score ≥0.6
188 188  * ✅ Blocks low-confidence verdicts
... ... @@ -189,6 +189,7 @@
189 189  * ✅ Provides clear rejection reason
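
The two POC gates can be sketched as simple predicate functions. This is a minimal illustration, not the AKEL implementation: the names `GateResult`, `gate1_claim_validation`, and `gate4_verdict_confidence`, the classification labels, and the input shapes are all assumptions. Only the thresholds (at least 2 sources, quality score of at least 0.6) and the blocked categories (opinion/prediction/ambiguous) come from the gate descriptions above.

```python
from dataclasses import dataclass

# Thresholds taken from the Gate 4 description above.
MIN_SOURCES = 2
MIN_QUALITY_SCORE = 0.6

# Hypothetical classification labels; the labels AKEL actually emits may differ.
BLOCKED_CLASSIFICATIONS = {"opinion", "prediction", "ambiguous"}


@dataclass
class GateResult:
    passed: bool
    reason: str  # human-readable pass or rejection reason


def gate1_claim_validation(classification: str) -> GateResult:
    """Gate 1: let only factual, verifiable claims through."""
    if classification in BLOCKED_CLASSIFICATIONS:
        return GateResult(False, f"Claim rejected: classified as {classification}")
    if classification != "factual":
        return GateResult(False, f"Claim rejected: unknown classification '{classification}'")
    return GateResult(True, "Claim is factual and verifiable")


def gate4_verdict_confidence(source_count: int, quality_score: float) -> GateResult:
    """Gate 4: block verdicts with too few sources or too low a quality score."""
    if source_count < MIN_SOURCES:
        return GateResult(
            False,
            f"Verdict blocked: {source_count} source(s) found, need at least {MIN_SOURCES}",
        )
    if quality_score < MIN_QUALITY_SCORE:
        return GateResult(
            False,
            f"Verdict blocked: quality score {quality_score:.2f} is below {MIN_QUALITY_SCORE}",
        )
    return GateResult(True, "Verdict meets minimum confidence requirements")
```

Each function returns the rejection reason alongside the pass/fail flag, so the "clear rejection reason" requirement is met by construction rather than by a separate lookup.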
190 190  
191 191  **Out of Scope (POC2+):**
203 +
192 192  * ❌ Gate 2: Evidence Relevance
193 193  * ❌ Gate 3: Scenario Coherence
194 194  * ❌ Gate 5: Source Diversity
... ... @@ -201,11 +201,13 @@
201 201  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
202 202  
203 203  **Main Requirements:**
216 +
204 204  * NFR1: Response time < 30 seconds
205 205  * NFR2: Handle 1000+ concurrent users
206 206  * NFR3: 99.9% uptime
207 207  
208 208  **POC Implementation:**
222 +
209 209  * ⚠️ **Response time monitored** (not optimized)
210 210  * ⚠️ **Single-threaded processing** (no concurrency)
211 211  * ⚠️ **Basic error handling** (no advanced retry logic)
... ... @@ -213,11 +213,11 @@
213 213  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
214 214  
215 215  **POC Acceptance:**
230 +
216 216  * Analysis completes (no timeout requirement)
217 217  * Errors don't crash system
218 218  * Basic logging in place
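
The POC acceptance points above (analysis completes, errors don't crash the system, basic logging) could be covered by a thin wrapper around the analysis call. The sketch below is an assumption-laden illustration: `run_safely`, the logger name `poc`, and the `analyze` callable signature are hypothetical and not part of the specification.

```python
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("poc")  # hypothetical logger name


def run_safely(analyze: Callable[[str], dict], article_text: str) -> Optional[dict]:
    """Run one analysis; log failures instead of letting them crash the process.

    No timeout is enforced, matching the POC acceptance criteria.
    """
    logger.info("Analysis started (%d characters)", len(article_text))
    try:
        result = analyze(article_text)
    except Exception:
        # logger.exception records the message together with the stack trace.
        logger.exception("Analysis failed")
        return None
    logger.info("Analysis completed")
    return result
```

Returning `None` on failure keeps the sketch short; a caller would need to treat `None` as "no result" rather than as an empty analysis.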
219 219  
220 -
221 221  == 4. What's NOT in POC Scope ==
222 222  
223 223  === 4.1 User-Facing Features (Beta 0+) ===
... ... @@ -227,6 +227,7 @@
227 227  {{/warning}}
228 228  
229 229  **Out of Scope:**
244 +
230 230  * ❌ User accounts and authentication (FR8)
231 231  * ❌ User corrections system (FR9, FR45-46)
232 232  * ❌ Public publishing interface (FR10)
... ... @@ -240,6 +240,7 @@
240 240  === 4.2 Advanced Features (V1.0+) ===
241 241  
242 242  **Out of Scope:**
258 +
243 243  * ❌ IFCN compliance (FR47)
244 244  * ❌ ClaimReview schema (FR48)
245 245  * ❌ Archive.org integration (FR49)
... ... @@ -254,6 +254,7 @@
254 254  === 4.3 Production Requirements (POC2, Beta 0) ===
255 255  
256 256  **Out of Scope:**
273 +
257 257  * ❌ Security controls (NFR4, NFR12)
258 258  * ❌ Code maintainability (NFR5)
259 259  * ❌ System monitoring (NFR13)
... ... @@ -270,21 +270,26 @@
270 270  
271 271  For each analyzed claim, POC must produce:
272 272  
273 -**1. Claim**
290 +* \\
291 +** \\
292 +**1. Claim
274 274  * Original text
275 275  * Classification (factual/non-factual/ambiguous)
276 276  * If non-factual: Clear reason why
277 277  
278 278  **2. Scenarios** (if factual)
298 +
279 279  * 2-3 interpretation scenarios
280 280  * Each scenario clearly described
281 281  
282 282  **3. Evidence** (if factual)
303 +
283 283  * Supporting evidence (2+ items)
284 284  * Opposing evidence (if exists)
285 285  * Source URLs and reliability scores
286 286  
287 287  **4. Verdict** (if factual)
309 +
288 288  * Probability (0-100%)
289 289  * Uncertainty quantification
290 290  * Confidence level (LOW/MEDIUM/HIGH)
... ... @@ -291,10 +291,10 @@
291 291  * Reasoning chain
292 292  
293 293  **5. Quality Status**
316 +
294 294  * Which gates passed/failed
295 295  * If failed: Clear explanation why
296 296  
297 -
298 298  === 5.2 Example POC Output ===
299 299  
300 300  {{code language="json"}}
... ... @@ -346,6 +346,7 @@
346 346  POC is successful if:
347 347  
348 348  ✅ **FR1-FR7 Requirements Met:**
371 +
349 349  1. Extracts 3-5 factual claims from test articles
350 350  2. Generates 2-3 scenarios per ambiguous claim
351 351  3. Finds supporting AND opposing evidence
... ... @@ -353,19 +353,21 @@
353 353  5. Provides clear reasoning chains
354 354  
355 355  ✅ **Quality Gates Work:**
379 +
356 356  1. Gate 1 blocks non-factual claims (100% block rate)
357 357  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
358 358  3. Clear rejection reasons provided
359 359  
360 360  ✅ **NFR11 Met:**
385 +
361 361  1. Quality gates reduce hallucination rate
362 362  2. Blocked outputs have clear explanations
363 363  3. Quality metrics are logged
364 364  
365 -
366 366  === 6.2 Quality Thresholds ===
367 367  
368 368  **Minimum Acceptable:**
393 +
369 369  * ≥70% of test claims correctly classified (factual/non-factual)
370 370  * ≥60% of verdicts are reasonable (human evaluation)
371 371  * Gate 1 blocks 100% of non-factual claims
... ... @@ -372,16 +372,17 @@
372 372  * Gate 4 blocks verdicts with <2 sources
373 373  
374 374  **Target:**
400 +
375 375  * ≥80% claims correctly classified
376 376  * ≥75% verdicts are reasonable
377 377  * <10% false positives (blocking good claims)
378 378  
379 -
380 380  === 6.3 POC Decision Gate ===
381 381  
382 382  **After POC1, we decide:**
383 383  
384 384  **✅ PROCEED to POC2** if:
410 +
385 385  * Success criteria met
386 386  * Quality gates demonstrably improve output
387 387  * Core workflow is technically sound
... ... @@ -388,65 +388,72 @@
388 388  * Clear path to production quality
389 389  
390 390  **⚠️ ITERATE POC1** if:
417 +
391 391  * Success criteria partially met
392 392  * Gates work but need tuning
393 393  * Core issues identified but fixable
394 394  
395 395  **❌ PIVOT APPROACH** if:
423 +
396 396  * Success criteria not met
397 397  * Fundamental AI limitations discovered
398 398  * Quality gates insufficient
399 399  * Alternative approach needed
400 400  
401 -
402 402  == 7. Test Cases ==
403 403  
404 404  === 7.1 Happy Path ===
405 405  
406 406  **Test 1: Simple Factual Claim**
434 +
407 407  * Input: "Paris is the capital of France"
408 -* Expected: Factual, 1 scenario, verdict ~95% true
436 +* Expected: Factual, 1 scenario, verdict 95% true
409 409  
410 410  **Test 2: Ambiguous Claim**
439 +
411 411  * Input: "Switzerland has the highest income in Europe"
412 412  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
413 413  
414 414  **Test 3: Statistical Claim**
444 +
415 415  * Input: "10% of people have condition X"
416 416  * Expected: Factual, evidence with numbers, probabilistic verdict
417 417  
418 -
419 419  === 7.2 Edge Cases ===
420 420  
421 421  **Test 4: Opinion**
451 +
422 422  * Input: "Paris is the best city"
423 423  * Expected: Non-factual (opinion), blocked by Gate 1
424 424  
425 425  **Test 5: Prediction**
456 +
426 426  * Input: "Bitcoin will reach $100,000 next year"
427 427  * Expected: Non-factual (prediction), blocked by Gate 1
428 428  
429 429  **Test 6: Insufficient Evidence**
461 +
430 430  * Input: Obscure factual claim with no sources
431 431  * Expected: Blocked by Gate 4 (<2 sources)
432 432  
433 -
434 434  === 7.3 Quality Gate Tests ===
435 435  
436 436  **Test 7: Gate 1 Effectiveness**
468 +
437 437  * Input: Mix of 10 factual + 10 non-factual claims
438 438  * Expected: Gate 1 blocks all 10 non-factual (100% precision)
439 439  
440 440  **Test 8: Gate 4 Effectiveness**
473 +
441 441  * Input: Claims with varying evidence availability
442 442  * Expected: Gate 4 blocks low-confidence verdicts
443 443  
444 -
445 445  == 8. Technical Architecture (POC) ==
446 446  
447 447  === 8.1 Simplified Architecture ===
448 448  
449 449  **POC Tech Stack:**
482 +
450 450  * **Frontend:** Simple web interface (Next.js + TypeScript)
451 451  * **Backend:** Single API endpoint
452 452  * **AI:** Claude API (Sonnet 4.5)
... ... @@ -459,6 +459,7 @@
459 459  === 8.2 AKEL Implementation ===
460 460  
461 461  **POC AKEL:**
495 +
462 462  * Single-threaded processing
463 463  * Synchronous API calls
464 464  * No caching
... ... @@ -466,6 +466,7 @@
466 466  * Console logging
467 467  
468 468  **Full AKEL (POC2+):**
503 +
469 469  * Multi-threaded processing
470 470  * Async API calls
471 471  * Evidence caching
... ... @@ -472,7 +472,6 @@
472 472  * Advanced error handling with retry
473 473  * Structured logging + monitoring
474 474  
475 -
476 476  == 9. POC Philosophy ==
477 477  
478 478  {{info}}
... ... @@ -481,60 +481,67 @@
481 481  
482 482  === 9.1 Core Principles ===
483 483  
484 -**1. Prove Concept, Not Production**
518 +* \\
519 +** \\
520 +**1. Prove Concept, Not Production
485 485  * POC validates AI can do the job
486 486  * Production quality comes in POC2 and Beta 0
487 487  * Focus on "does it work?" not "is it perfect?"
488 488  
489 489  **2. Implement Subset of Requirements**
526 +
490 490  * POC covers FR1-7, NFR11 (lite)
491 491  * All other requirements deferred
492 492  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
493 493  
494 494  **3. Quality Gates Validate Approach**
532 +
495 495  * 2 gates prove the concept
496 496  * Remaining 5 gates added in POC2
497 497  * Gates must demonstrably improve quality
498 498  
499 499  **4. Iterate Based on Results**
538 +
500 500  * POC results determine next steps
501 501  * Decision gate after POC1
502 502  * Flexibility to pivot if needed
503 503  
543 +=== 9.2 Success ===
504 504  
505 -=== 9.2 Success = Clear Path Forward ===
545 + Clear Path Forward ===
506 506  
507 507  POC succeeds if we can confidently answer:
508 508  
509 509  ✅ **Technical Feasibility:**
550 +
510 510  * Can AI extract claims reliably?
511 511  * Can AI find balanced evidence?
512 512  * Can AI compute reasonable verdicts?
513 513  
514 514  ✅ **Quality Approach:**
556 +
515 515  * Do quality gates improve output?
516 516  * Can we measure and track quality?
517 517  * Is the gate approach scalable?
518 518  
519 519  ✅ **Production Path:**
562 +
520 520  * Is the core architecture sound?
521 521  * What needs improvement for production?
522 522  * Is POC2 the right next step?
523 523  
524 -
525 525  == 10. Related Pages ==
526 526  
527 527  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
528 528  * **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
529 529  * **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
530 -* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
572 +* **[[Implementation Roadmap>>Archive.FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
531 531  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
532 532  
533 -
534 534  **Document Owner:** Technical Team
535 535  **Review Frequency:** After each POC iteration
536 536  **Version History:**
578 +
537 537  * v1.0 - Initial POC requirements
538 538  * v2.0 - Updated after specification cross-check
539 539  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
540 -