Last modified by Robert Schaub on 2025/12/23 18:00

From version 2.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment: Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements
To version 1.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements (POC1 & POC2)
1 +POC Requirements
Parent
... ... @@ -1,1 +1,1 @@
1 -WebHome
1 +FactHarbor.Specification.POC.WebHome
Content
... ... @@ -1,581 +1,1359 @@
1 1  = POC Requirements =
2 2  
3 3  **Status:** ✅ Approved for Development
4 -**Version:** 3.0 (Aligned with Main Requirements)
4 +**Version:** 2.0 (Updated after Specification Cross-Check)
5 5  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 -{{info}}
8 -**Core Philosophy:** POC validates the [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] through simplified implementation. All POC features map to formal FR/NFR requirements.
9 -{{/info}}
7 +---
10 10  
11 -
12 12  == 1. POC Overview ==
13 13  
14 14  === 1.1 What POC Tests ===
15 15  
16 16  **Core Question:**
17 -
18 18  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
19 19  
20 20  **What we're proving:**
21 -
22 22  * AI can identify factual claims from text
23 -* AI can evaluate those claims with structured evidence
24 -* Quality gates can filter unreliable outputs
25 -* The core workflow is technically feasible
18 +* AI can evaluate those claims and produce verdicts
19 +* Output is comprehensible and useful
20 +* Fully automated approach is viable
26 26  
27 -**What we're NOT proving:**
22 +**What we're NOT testing:**
23 +* Scenario generation (deferred to POC2)
24 +* Evidence display (deferred to POC2)
25 +* Production scalability
26 +* Perfect accuracy
27 +* Complete feature set
28 28  
29 -* Production-ready reliability (that's POC2)
30 -* User-facing features (that's Beta 0)
31 -* Full IFCN compliance (that's V1.0)
29 +---
32 32  
33 -=== 1.2 Requirements Mapping ===
31 +=== 1.2 Scenarios Deferred to POC2 ===
34 34  
35 -POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
33 +**Intentional Simplification:**
36 36  
37 -**Scope Summary:**
35 +Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
38 38  
39 -* **In Scope:** 8 requirements (7 FRs + 1 NFR)
40 -* **Partial:** 3 NFRs (simplified versions)
41 -* **Out of Scope:** 19 requirements (deferred to later phases)
37 +**Rationale:**
38 +* **POC1 tests:** Can AI extract claims and generate verdicts?
39 +* **POC2 will add:** Scenario generation and management
40 +* **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
42 42  
43 -== 2. POC1 Scope ==
42 +**Design Decision:**
44 44  
45 -{{success}}
46 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.88 ex 2 new Org Pages.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
44 +Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
47 47  
48 -The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
49 -{{/success}}
46 +**No Risk:**
50 50  
51 -**POC1 implements these formal requirements:**
48 +Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49 +* Faster POC1 validation
50 +* Learning from POC1 to inform scenario design
51 +* Iterative approach: fail fast if basic AI doesn't work
52 +* Flexibility to adjust scenario architecture based on POC1 insights
52 52  
53 -|= Formal Req |= Implementation in POC1 |= Notes
54 -| **FR4** | Analysis Summary | Basic format; quality metadata deferred to POC2
55 -| **FR7** | Automated Verdicts | Full implementation with quality gates (NFR11)
56 -| **NFR11** | Quality Assurance Framework | 4 quality gates implemented
54 +**Full System Workflow (Future):**
55 +{{code}}
56 +Claims Scenarios Evidence → Verdicts
57 +{{/code}}
57 57  
58 -**POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)
59 +**POC1 Simplified Workflow:**
60 +{{code}}
61 +Claims → Verdicts (scenarios implicit in reasoning)
62 +{{/code}}
59 59  
60 -{{info}}**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.{{/info}}:
64 +---
61 61  
62 -* Claim extraction (FR1)
63 -* Claim context (FR2)
64 -* Multiple scenarios (FR3)
65 -* Evidence collection (FR5)
66 -* Source quality assessment (FR6)
67 -* Time evolution tracking (FR8) - deferred to POC2
68 -* Audit trail (FR11) - deferred to Beta 0
69 -* In-article highlighting (FR13) - deferred to Beta 0
66 +== 2. POC Output Specification ==
70 70  
71 -**Partial implementations:**
68 +=== 2.1 Component 1: ANALYSIS SUMMARY ===
72 72  
73 -* NFR1 (Explainability) - Basic only
74 -* NFR2 (Performance) - Functional but not optimized
75 -* NFR3 (Transparency) - Basic only
70 +**What:** Brief overview of findings
71 +**Length:** 3-5 sentences
72 +**Content:**
73 +* How many claims found
74 +* Distribution of verdicts
75 +* Overall assessment
76 76  
77 -**Detailed POC1 implementation specifications continue below...**
77 +**Example:**
78 +{{code}}
79 +This article makes 4 claims about coffee's health effects. We found
80 +2 claims are well-supported, 1 is uncertain, and 1 is refuted.
81 +Overall assessment: mostly accurate with some exaggeration.
82 +{{/code}}
78 78  
84 +---
79 79  
86 +=== 2.2 Component 2: CLAIMS IDENTIFICATION ===
80 80  
81 -== 3. POC Simplifications ==
88 +**What:** List of factual claims extracted from article
89 +**Format:** Numbered list
90 +**Quantity:** 3-5 claims
91 +**Requirements:**
92 +* Factual claims only (not opinions/questions)
93 +* Clearly stated
94 +* Automatically extracted by AI
82 82  
83 -=== 3.1 FR1: Claim Extraction (Full Implementation) ===
96 +**Example:**
97 +{{code}}
98 +CLAIMS IDENTIFIED:
84 84  
85 -**Main Requirement:** AI extracts factual claims from input text
100 +[1] Coffee reduces diabetes risk by 30%
101 +[2] Coffee improves heart health
102 +[3] Decaf has same benefits as regular
103 +[4] Coffee prevents Alzheimer's completely
104 +{{/code}}
86 86  
87 -**POC Implementation:**
106 +---
88 88  
89 -* ✅ AKEL extracts claims using LLM
90 -* ✅ Each claim includes original text reference
91 -* ✅ Claims are identified as factual/non-factual
92 -* ❌ No advanced claim parsing (added in POC2)
108 +=== 2.3 Component 3: CLAIMS VERDICTS ===
93 93  
94 -**Acceptance Criteria:**
110 +**What:** Verdict for each claim identified
111 +**Format:** Per claim structure
95 95  
96 -* Extracts 3-5 claims from typical article
97 -* Identifies factual vs non-factual claims
98 -* Quality Gate 1 validates extraction
113 +**Required Elements:**
114 +* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 +* **Confidence Score:** 0-100%
116 +* **Brief Reasoning:** 1-3 sentences explaining why
117 +* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
99 99  
100 -=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===
119 +**Example:**
120 +{{code}}
121 +VERDICTS:
101 101  
102 -**Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
123 +[1] WELL-SUPPORTED (85%) [Risk: C]
124 +Multiple studies confirm 25-30% risk reduction with regular consumption.
103 103  
126 +[2] UNCERTAIN (65%) [Risk: B]
127 +Evidence is mixed. Some studies show benefits, others show no effect.
128 +
129 +[3] PARTIALLY SUPPORTED (60%) [Risk: C]
130 +Some benefits overlap, but caffeine-related benefits are reduced in decaf.
131 +
132 +[4] REFUTED (90%) [Risk: B]
133 +No evidence for complete prevention. Claim is significantly overstated.
134 +{{/code}}
135 +
136 +**Risk Tier Display:**
137 +* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 +* **Tier C (Green):** Low Risk - Facts/Definitions/History
140 +
141 +**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
142 +
143 +---
144 +
145 +=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146 +
147 +**What:** Brief summary of original article content
148 +**Length:** 3-5 sentences
149 +**Tone:** Neutral (article's position, not FactHarbor's analysis)
150 +
151 +**Example:**
152 +{{code}}
153 +ARTICLE SUMMARY:
154 +
155 +Health News Today article discusses coffee benefits, citing studies
156 +on diabetes and Alzheimer's. Author highlights research linking coffee
157 +to disease prevention. Recommends 2-3 cups daily for optimal health.
158 +{{/code}}
159 +
160 +---
161 +
162 +=== 2.5 Total Output Size ===
163 +
164 +**Combined:** ~200-300 words
165 +* Analysis Summary: 50-70 words
166 +* Claims Identification: 30-50 words
167 +* Claims Verdicts: 100-150 words
168 +* Article Summary: 30-50 words (optional)
169 +
170 +---
171 +
172 +== 3. What's NOT in POC Scope ==
173 +
174 +=== 3.1 Feature Exclusions ===
175 +
176 +The following are **explicitly excluded** from POC:
177 +
178 +**Content Features:**
179 +* ❌ Scenarios (deferred to POC2)
180 +* ❌ Evidence display (supporting/opposing lists)
181 +* ❌ Source links (clickable references)
182 +* ❌ Detailed reasoning chains
183 +* ❌ Source quality ratings (shown but not detailed)
184 +* ❌ Contradiction detection (basic only)
185 +* ❌ Risk assessment (shown but not workflow-integrated)
186 +
187 +**Platform Features:**
188 +* ❌ User accounts / authentication
189 +* ❌ Saved history
190 +* ❌ Search functionality
191 +* ❌ Claim comparison
192 +* ❌ User contributions
193 +* ❌ Commenting system
194 +* ❌ Social sharing
195 +
196 +**Technical Features:**
197 +* ❌ Browser extensions
198 +* ❌ Mobile apps
199 +* ❌ API endpoints
200 +* ❌ Webhooks
201 +* ❌ Export features (PDF, CSV)
202 +
203 +**Quality Features:**
204 +* ❌ Accessibility (WCAG compliance)
205 +* ❌ Multilingual support
206 +* ❌ Mobile optimization
207 +* ❌ Media verification (images/videos)
208 +
209 +**Production Features:**
210 +* ❌ Security hardening
211 +* ❌ Privacy compliance (GDPR)
212 +* ❌ Terms of service
213 +* ❌ Monitoring/logging
214 +* ❌ Error tracking
215 +* ❌ Analytics
216 +* ❌ A/B testing
217 +
218 +---
219 +
220 +== 4. POC Simplifications vs. Full System ==
221 +
222 +=== 4.1 Architecture Comparison ===
223 +
224 +**POC Architecture (Simplified):**
225 +{{code}}
226 +User Input → Single AKEL Call → Output Display
227 + (all processing)
228 +{{/code}}
229 +
230 +**Full System Architecture:**
231 +{{code}}
232 +User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 +→ Evidence Summarizer → Contradiction Detector → Verdict Generator
234 +→ Quality Gates → Publication → Output Display
235 +{{/code}}
236 +
237 +**Key Differences:**
238 +
239 +|=Aspect|=POC1|=Full System
240 +|Processing|Single API call|Multi-component pipeline
241 +|Scenarios|None (implicit)|Explicit entities with versioning
242 +|Evidence|Basic retrieval|Comprehensive with quality scoring
243 +|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
244 +|Workflow|3 steps (input/process/output)|6 phases with gates
245 +|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 +|Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247 +
248 +---
249 +
250 +=== 4.2 Workflow Comparison ===
251 +
252 +**POC1 Workflow:**
253 +1. User submits text/URL
254 +2. Single AKEL call (all processing in one prompt)
255 +3. Display results
256 +**Total: 3 steps, ~10-18 seconds**
257 +
258 +**Full System Workflow:**
259 +1. **Claim Submission** (extraction, normalization, clustering)
260 +2. **Scenario Building** (definitions, assumptions, boundaries)
261 +3. **Evidence Handling** (retrieval, assessment, linking)
262 +4. **Verdict Creation** (synthesis, reasoning, approval)
263 +5. **Public Presentation** (summaries, landscapes, deep dives)
264 +6. **Time Evolution** (versioning, re-evaluation triggers)
265 +**Total: 6 phases with quality gates, ~10-30 seconds**
266 +
267 +---
268 +
269 +=== 4.3 Why POC is Simplified ===
270 +
271 +**Engineering Rationale:**
272 +
273 +1. **Test core capability first:** Can AI do basic fact-checking without humans?
274 +2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
275 +3. **Learn before building:** POC1 insights inform full architecture
276 +4. **Iterative approach:** Add complexity only after validating foundations
277 +5. **Resource efficiency:** Don't build full system if core concept fails
278 +
279 +**Acceptable Trade-offs:**
280 +
281 +* ✅ POC proves AI capability (most risky assumption)
282 +* ✅ POC validates user comprehension (can people understand output?)
283 +* ❌ POC doesn't validate full workflow (test in Beta)
284 +* ❌ POC doesn't validate scale (test in Beta)
285 +* ❌ POC doesn't validate scenario architecture (design in POC2)
286 +
287 +---
288 +
289 +=== 4.4 Gap Between POC1 and POC2/Beta ===
290 +
291 +**What needs to be built for POC2:**
292 +* Scenario generation component
293 +* Evidence Model structure (full)
294 +* Scenario-evidence linking
295 +* Multi-interpretation comparison
296 +* Truth landscape visualization
297 +
298 +**What needs to be built for Beta:**
299 +* Multi-component AKEL pipeline
300 +* Quality gate infrastructure
301 +* Review workflow system
302 +* Audit sampling framework
303 +* Production data model
304 +* Federation architecture (Release 1.0)
305 +
306 +**POC1 → POC2 is significant architectural expansion.**
307 +
308 +---
309 +
310 +== 5. Publication Mode & Labeling ==
311 +
312 +=== 5.1 POC Publication Mode ===
313 +
314 +**Mode:** Mode 2 (AI-Generated, No Prior Human Review)
315 +
316 +Per FactHarbor Specification Section 11 "POC v1 Behavior":
317 +* Produces public AI-generated output
318 +* No human approval gate
319 +* Clear AI-Generated labeling
320 +* All quality gates active (simplified)
321 +* Risk tier classification shown (demo)
322 +
323 +---
324 +
325 +=== 5.2 User-Facing Labels ===
326 +
327 +**Primary Label (top of analysis):**
328 +{{code}}
329 +╔════════════════════════════════════════════════════════════╗
330 +║ [AI-GENERATED - POC/DEMO] ║
331 +║ ║
332 +║ This analysis was produced entirely by AI and has not ║
333 +║ been human-reviewed. Use for demonstration purposes. ║
334 +║ ║
335 +║ Source: AI/AKEL v1.0 (POC) ║
336 +║ Review Status: Not Reviewed (Proof-of-Concept) ║
337 +║ Quality Gates: 4/4 Passed (Simplified) ║
338 +║ Last Updated: [timestamp] ║
339 +╚════════════════════════════════════════════════════════════╝
340 +{{/code}}
341 +
342 +**Per-Claim Risk Labels:**
343 +* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
344 +* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 +* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346 +
347 +---
348 +
349 +=== 5.3 Display Requirements ===
350 +
351 +**Must Show:**
352 +* AI-Generated status (prominent)
353 +* POC/Demo disclaimer
354 +* Risk tier per claim
355 +* Confidence scores (0-100%)
356 +* Quality gate status (passed/failed)
357 +* Timestamp
358 +
359 +**Must NOT Claim:**
360 +* Human review
361 +* Production quality
362 +* Medical/legal advice
363 +* Authoritative verdicts
364 +* Complete accuracy
365 +
366 +---
367 +
368 +=== 5.4 Mode 2 vs. Full System Publication ===
369 +
370 +|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
371 +|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
372 +|Review|None|None|Human-Reviewed
373 +|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
374 +|Audit|None (POC)|Sampling (5-50%)|Pre-publication
375 +|Risk Display|Demo only|Workflow-integrated|Validated
376 +|User Actions|View only|Flag for review|Trust rating
377 +
378 +---
379 +
380 +== 6. Quality Gates (Simplified Implementation) ==
381 +
382 +=== 6.1 Overview ===
383 +
384 +Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
385 +
386 +**Full System Has 4 Gates:**
387 +1. Source Quality
388 +2. Contradiction Search (MANDATORY)
389 +3. Uncertainty Quantification
390 +4. Structural Integrity
391 +
392 +**POC Implements Simplified Versions:**
393 +* Focus on demonstrating concept
394 +* Basic implementations sufficient
395 +* Failures displayed to user (not blocking)
396 +* Full system has comprehensive validation
397 +
398 +---
399 +
400 +=== 6.2 Gate 1: Source Quality (Basic) ===
401 +
402 +**Full System Requirements:**
403 +* Primary sources identified and accessible
404 +* Source reliability scored against whitelist
405 +* Citation completeness verified
406 +* Publication dates checked
407 +* Author credentials validated
408 +
104 104  **POC Implementation:**
410 +* ✅ At least 2 sources found
411 +* ✅ Sources accessible (URLs valid)
412 +* ❌ No whitelist checking
413 +* ❌ No credential validation
414 +* ❌ No comprehensive reliability scoring
105 105  
106 -* ✅ AKEL generates 2-3 scenarios per claim
107 -* ✅ Scenarios capture different interpretations
108 -* ✅ Each scenario is evaluated separately
109 -* ✅ Verdict considers all scenarios
416 +**Pass Criteria:** ≥2 accessible sources found
110 110  
111 -**Acceptance Criteria:**
418 +**Failure Handling:** Display error message, don't generate verdict
112 112  
113 -* Generates 2+ scenarios for ambiguous claims
114 -* Scenarios are meaningfully different
115 -* All scenarios are evaluated
420 +---
116 116  
117 -=== 3.3 FR4: Analysis Summary (Basic Implementation) ===
422 +=== 6.3 Gate 2: Contradiction Search (Basic) ===
118 118  
119 -**Main Requirement:** Provide user-friendly summary of analysis
424 +**Full System Requirements:**
425 +* Counter-evidence actively searched
426 +* Reservations and limitations identified
427 +* Alternative interpretations explored
428 +* Bubble detection (echo chambers, conspiracy theories)
429 +* Cross-cultural and international perspectives
430 +* Academic literature (supporting AND opposing)
120 120  
121 121  **POC Implementation:**
433 +* ✅ Basic search for counter-evidence
434 +* ✅ Identify obvious contradictions
435 +* ❌ No comprehensive academic search
436 +* ❌ No bubble detection
437 +* ❌ No systematic alternative interpretation search
438 +* ❌ No international perspective verification
122 122  
123 -* ✅ Simple text summary generated
124 -* ❌ No rich formatting (added in Beta 0)
125 -* ❌ No visual elements (added in Beta 0)
126 -* ❌ No interactive features (added in Beta 0)
440 +**Pass Criteria:** Basic contradiction search attempted
127 127  
128 -**POC Format:**
129 -```
130 -Claim: [extracted claim]
131 -Scenarios: [list of scenarios]
132 -Evidence: [supporting/opposing evidence]
133 -Verdict: [probability with uncertainty]
134 -```
442 +**Failure Handling:** Note "limited contradiction search" in output
135 135  
444 +---
136 136  
137 -=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
446 +=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===
138 138  
139 -**Main Requirements:**
448 +**Full System Requirements:**
449 +* Confidence scores calculated for all claims/verdicts
450 +* Limitations explicitly stated
451 +* Data gaps identified and disclosed
452 +* Strength of evidence assessed
453 +* Alternative scenarios considered
140 140  
141 -* FR5: Collect supporting and opposing evidence
142 -* FR6: Evaluate evidence source reliability
455 +**POC Implementation:**
456 +* ✅ Confidence scores (0-100%)
457 +* ✅ Basic uncertainty acknowledgment
458 +* ❌ No detailed limitation disclosure
459 +* ❌ No data gap identification
460 +* ❌ No alternative scenario consideration (deferred to POC2)
143 143  
462 +**Pass Criteria:** Confidence score assigned
463 +
464 +**Failure Handling:** Show "Confidence: Unknown" if calculation fails
465 +
466 +---
467 +
468 +=== 6.5 Gate 4: Structural Integrity (Basic) ===
469 +
470 +**Full System Requirements:**
471 +* No hallucinations detected (fact-checking against sources)
472 +* Logic chain valid and traceable
473 +* References accessible and verifiable
474 +* No circular reasoning
475 +* Premises clearly stated
476 +
144 144  **POC Implementation:**
478 +* ✅ Basic coherence check
479 +* ✅ References accessible
480 +* ❌ No comprehensive hallucination detection
481 +* ❌ No formal logic validation
482 +* ❌ No premise extraction and verification
145 145  
146 -* ✅ AKEL searches for evidence (web/knowledge base)
147 -* ✅ **Mandatory contradiction search** (finds opposing evidence)
148 -* ✅ Source reliability scoring
149 -* ❌ No evidence deduplication (added in POC2)
150 -* ❌ No advanced source verification (added in POC2)
484 +**Pass Criteria:** Output is coherent and references are accessible
151 151  
486 +**Failure Handling:** Display error message
487 +
488 +---
489 +
490 +=== 6.6 Quality Gate Display ===
491 +
492 +**POC shows simplified status:**
493 +{{code}}
494 +Quality Gates: 4/4 Passed (Simplified)
495 +✓ Source Quality: 3 sources found
496 +✓ Contradiction Search: Basic search completed
497 +✓ Uncertainty: Confidence scores assigned
498 +✓ Structural Integrity: Output coherent
499 +{{/code}}
500 +
501 +**If any gate fails:**
502 +{{code}}
503 +Quality Gates: 3/4 Passed (Simplified)
504 +✓ Source Quality: 3 sources found
505 +✗ Contradiction Search: Search failed - limited evidence
506 +✓ Uncertainty: Confidence scores assigned
507 +✓ Structural Integrity: Output coherent
508 +
509 +Note: This analysis has limited evidence. Use with caution.
510 +{{/code}}
511 +
512 +---
513 +
514 +=== 6.7 Simplified vs. Full System ===
515 +
516 +|=Gate|=POC (Simplified)|=Full System
517 +|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
518 +|Contradiction|Basic search|Systematic academic + media + international
519 +|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
520 +|Structural|Coherence check|Hallucination detection, logic validation, premise check
521 +
522 +**POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523 +
524 +---
525 +
526 +== 7. AKEL Architecture Comparison ==
527 +
528 +=== 7.1 POC AKEL (Simplified) ===
529 +
530 +**Implementation:**
531 +* Single Claude API call (Sonnet 4.5)
532 +* One comprehensive prompt
533 +* All processing in single request
534 +* No separate components
535 +* No orchestration layer
536 +
537 +**Prompt Structure:**
538 +{{code}}
539 +Task: Analyze this article and provide:
540 +
541 +1. Extract 3-5 factual claims
542 +2. For each claim:
543 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 + - Assign confidence score (0-100%)
545 + - Assign risk tier (A/B/C)
546 + - Write brief reasoning (1-3 sentences)
547 +3. Generate analysis summary (3-5 sentences)
548 +4. Generate article summary (3-5 sentences)
549 +5. Run basic quality checks
550 +
551 +Return as structured JSON.
552 +{{/code}}
553 +
554 +**Processing Time:** 10-18 seconds (estimate)
555 +
556 +---
557 +
558 +=== 7.2 Full System AKEL (Production) ===
559 +
560 +**Architecture:**
561 +{{code}}
562 +AKEL Orchestrator
563 +├── Claim Extractor
564 +├── Claim Classifier (with risk tier assignment)
565 +├── Scenario Generator
566 +├── Evidence Summarizer
567 +├── Contradiction Detector
568 +├── Quality Gate Validator
569 +├── Audit Sampling Scheduler
570 +└── Federation Sync Adapter (Release 1.0+)
571 +{{/code}}
572 +
573 +**Processing:**
574 +* Parallel processing where possible
575 +* Separate component calls
576 +* Quality gates between phases
577 +* Audit sampling selection
578 +* Cross-node coordination (federated mode)
579 +
580 +**Processing Time:** 10-30 seconds (full pipeline)
581 +
582 +---
583 +
584 +=== 7.3 Why POC Uses Single Call ===
585 +
586 +**Advantages:**
587 +* ✅ Simpler to implement
588 +* ✅ Faster POC development
589 +* ✅ Easier to debug
590 +* ✅ Proves AI capability
591 +* ✅ Good enough for concept validation
592 +
593 +**Limitations:**
594 +* ❌ No component reusability
595 +* ❌ No parallel processing
596 +* ❌ All-or-nothing (can't partially succeed)
597 +* ❌ Harder to improve individual components
598 +* ❌ No audit sampling
599 +
600 +**Acceptable Trade-off:**
601 +
602 +POC tests "Can AI do this?" not "How should we architect it?"
603 +
604 +Full component architecture comes in Beta after POC validates concept.
605 +
606 +---
607 +
608 +=== 7.4 Evolution Path ===
609 +
610 +**POC1:** Single prompt → Prove concept
611 +**POC2:** Add scenario component → Test full pipeline
612 +**Beta:** Multi-component AKEL → Production architecture
613 +**Release 1.0:** Full AKEL + Federation → Scale
614 +
615 +---
616 +
617 +== 8. Functional Requirements ==
618 +
619 +=== FR-POC-1: Article Input ===
620 +
621 +**Requirement:** User can submit article for analysis
622 +
623 +**Functionality:**
624 +* Text input field (paste article text, up to 5000 characters)
625 +* URL input field (paste article URL)
626 +* "Analyze" button to trigger processing
627 +* Loading indicator during analysis
628 +
629 +**Excluded:**
630 +* No user authentication
631 +* No claim history
632 +* No search functionality
633 +* No saved templates
634 +
152 152  **Acceptance Criteria:**
636 +* User can paste text from article
637 +* User can paste URL of article
638 +* System accepts input and triggers analysis
153 153  
154 -* Finds 2+ supporting evidence items
155 -* Finds 1+ opposing evidence (if exists)
156 -* Sources scored for reliability
640 +---
157 157  
158 -=== 3.5 FR7: Automated Verdicts (Full Implementation) ===
642 +=== FR-POC-2: Claim Extraction (Fully Automated) ===
159 159  
160 -**Main Requirement:** AI computes verdicts with uncertainty quantification
644 +**Requirement:** AI automatically extracts 3-5 factual claims
161 161  
162 -**POC Implementation:**
646 +**Functionality:**
647 +* AI reads article text
648 +* AI identifies factual claims (not opinions/questions)
649 +* AI extracts 3-5 most important claims
650 +* System displays numbered list
163 163  
164 -* ✅ Probabilistic verdicts (0-100% confidence)
165 -* ✅ Uncertainty explicitly stated
166 -* ✅ Reasoning chain provided
167 -* ✅ Quality Gate 4 validates verdict confidence
652 +**Critical:** NO MANUAL EDITING ALLOWED
653 +* AI selects which claims to extract
654 +* AI identifies factual vs. non-factual
655 +* System processes claims as extracted
656 +* No human curation or correction
168 168  
169 -**POC Output:**
170 -```
171 -Verdict: 70% likely true
172 -Uncertainty: ±15% (moderate confidence)
173 -Reasoning: Based on 3 high-quality sources...
174 -Confidence Level: MEDIUM
175 -```
658 +**Error Handling:**
659 +* If extraction fails: Display error message
660 +* User can retry with different input
661 +* No manual intervention to fix extraction
176 176  
177 177  **Acceptance Criteria:**
664 +* AI extracts 3-5 claims automatically
665 +* Claims are factual (not opinions)
666 +* Claims are clearly stated
667 +* No manual editing required
178 178  
179 -* Verdicts include probability (0-100%)
180 -* Uncertainty explicitly quantified
181 -* Reasoning chain explains verdict
669 +---
182 182  
183 -=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
671 +=== FR-POC-3: Verdict Generation (Fully Automated) ===
184 184  
185 -**Main Requirement:** Complete quality assurance with 7 quality gates
673 +**Requirement:** AI automatically generates verdict for each claim
186 186  
187 -**POC Implementation:** **2 gates only**
675 +**Functionality:**
676 +* For each claim, AI:
677 + * Evaluates claim based on available evidence/knowledge
678 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 + * Assigns confidence score (0-100%)
680 + * Assigns risk tier (A/B/C)
681 + * Writes brief reasoning (1-3 sentences)
682 +* System displays verdict for each claim
188 188  
189 -**Quality Gate 1: Claim Validation**
684 +**Critical:** NO MANUAL EDITING ALLOWED
685 +* AI computes verdicts based on evidence
686 +* AI generates confidence scores
687 +* AI writes reasoning
688 +* No human review or adjustment
190 190  
191 -* ✅ Validates claim is factual and verifiable
192 -* ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
193 -* ✅ Provides clear rejection reason
690 +**Error Handling:**
691 +* If verdict generation fails: Display error message
692 +* User can retry
693 +* No manual intervention to adjust verdicts
194 194  
195 -**Quality Gate 4: Verdict Confidence Assessment**
695 +**Acceptance Criteria:**
696 +* Each claim has a verdict
697 +* Confidence score is displayed (0-100%)
698 +* Risk tier is displayed (A/B/C)
699 +* Reasoning is understandable (1-3 sentences)
700 +* Verdict is defensible given reasoning
701 +* All generated automatically by AI
196 196  
197 -* ✅ Validates ≥2 sources found
198 -* ✅ Validates quality score ≥0.6
199 -* ✅ Blocks low-confidence verdicts
200 -* ✅ Provides clear rejection reason
703 +---
201 201  
202 -**Out of Scope (POC2+):**
705 +=== FR-POC-4: Analysis Summary (Fully Automated) ===
203 203  
204 -* ❌ Gate 2: Evidence Relevance
205 -* ❌ Gate 3: Scenario Coherence
206 -* ❌ Gate 5: Source Diversity
207 -* ❌ Gate 6: Reasoning Validity
208 -* ❌ Gate 7: Output Completeness
707 +**Requirement:** AI generates brief summary of analysis
209 209  
210 -**Rationale:** Prove gate concept works. Add remaining gates in POC2 after validating approach.
709 +**Functionality:**
710 +* AI summarizes findings in 3-5 sentences:
711 + * How many claims found
712 + * Distribution of verdicts
713 + * Overall assessment
714 +* System displays at top of results
211 211  
716 +**Critical:** NO MANUAL EDITING ALLOWED
212 212  
213 -=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
718 +**Acceptance Criteria:**
719 +* Summary is coherent
720 +* Accurately reflects analysis
721 +* 3-5 sentences
722 +* Automatically generated
214 214  
215 -**Main Requirements:**
724 +---
216 216  
217 -* NFR1: Response time < 30 seconds
218 -* NFR2: Handle 1000+ concurrent users
219 -* NFR3: 99.9% uptime
726 +=== FR-POC-5: Article Summary (Fully Automated, Optional) ===
220 220  
221 -**POC Implementation:**
728 +**Requirement:** AI generates brief summary of original article
222 222  
223 -* ⚠️ **Response time monitored** (not optimized)
224 -* ⚠️ **Single-threaded processing** (no concurrency)
225 -* ⚠️ **Basic error handling** (no advanced retry logic)
730 +**Functionality:**
731 +* AI summarizes article content (not FactHarbor's analysis)
732 +* 3-5 sentences
733 +* System displays
226 226  
227 -**Rationale:** POC proves functionality. Performance optimization happens in POC2.
735 +**Note:** Optional - can skip if time limited
228 228  
229 -**POC Acceptance:**
737 +**Critical:** NO MANUAL EDITING ALLOWED
230 230  
231 -* Analysis completes (no timeout requirement)
232 -* Errors don't crash system
233 -* Basic logging in place
739 +**Acceptance Criteria:**
740 +* Summary is neutral (article's position)
741 +* Accurately reflects article content
742 +* 3-5 sentences
743 +* Automatically generated
234 234  
235 -== 4. What's NOT in POC Scope ==
745 +---
236 236  
237 -=== 4.1 User-Facing Features (Beta 0+) ===
747 +=== FR-POC-6: Publication Mode Display ===
238 238  
239 -{{warning}}
240 -**Deferred to Beta 0:**
241 -{{/warning}}
749 +**Requirement:** Clear labeling of AI-generated content
242 242  
243 -**Out of Scope:**
751 +**Functionality:**
752 +* Display Mode 2 publication label
753 +* Show POC/Demo disclaimer
754 +* Display risk tiers per claim
755 +* Show quality gate status
756 +* Display timestamp
244 244  
245 -* ❌ User accounts and authentication (FR8)
246 -* ❌ User corrections system (FR9, FR45-46)
247 -* ❌ Public publishing interface (FR10)
248 -* ❌ Social sharing (FR11)
249 -* ❌ Email notifications (FR12)
250 -* ❌ API access (FR13)
758 +**Acceptance Criteria:**
759 +* Label is prominent and clear
760 +* User understands this is AI-generated POC output
761 +* Risk tiers are color-coded
762 +* Quality gate status is visible
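The labeling elements above can be combined per claim. The sketch below renders a plain-text label. The only color the document specifies is red for tier A (Test Case 6). The colors for tiers B and C, the label wording, and the names `ClaimDisplay` and `render_label` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Tier A is red per Test Case 6; the B and C colors are placeholder assumptions.
TIER_COLORS = {"A": "red", "B": "yellow", "C": "green"}


@dataclass
class ClaimDisplay:
    text: str
    risk_tier: str      # "A", "B" or "C"
    gates_passed: bool


def render_label(claim: ClaimDisplay, generated_at: datetime) -> str:
    """Build a plain-text Mode 2 label for one claim (hypothetical format)."""
    color = TIER_COLORS[claim.risk_tier]
    gate = "PASS" if claim.gates_passed else "FAIL"
    lines = [
        "[Mode 2: AI-generated] POC/Demo output, no human editing",
        f"Risk tier {claim.risk_tier} ({color}) | Quality gates: {gate}",
        f"Generated: {generated_at.isoformat(timespec='seconds')}",
    ]
    if claim.risk_tier == "A":
        # High-risk claims carry an explicit disclaimer (see Test Case 6).
        lines.append("Not medical, legal or safety advice.")
    return "\n".join(lines)
```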
251 251  
252 -**Rationale:** POC validates AI capabilities. User features added in Beta 0.
764 +---
253 253  
766 +=== FR-POC-7: Quality Gate Execution ===
254 254  
255 -=== 4.2 Advanced Features (V1.0+) ===
768 +**Requirement:** Execute simplified quality gates
256 256  
257 -**Out of Scope:**
770 +**Functionality:**
771 +* Check source quality (basic)
772 +* Attempt contradiction search (basic)
773 +* Calculate confidence scores
774 +* Verify structural integrity (basic)
775 +* Display gate results
258 258  
259 -* ❌ IFCN compliance (FR47)
260 -* ❌ ClaimReview schema (FR48)
261 -* ❌ Archive.org integration (FR49)
262 -* ❌ OSINT toolkit (FR50)
263 -* ❌ Video verification (FR51)
264 -* ❌ Deepfake detection (FR52)
265 -* ❌ Cross-org sharing (FR53)
777 +**Acceptance Criteria:**
778 +* All 4 gates attempted
779 +* Pass/fail status displayed
780 +* Failures explained to user
781 +* Gates don't block publication (POC mode)
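The four simplified gates can be sketched as plain checks that always run and never block publication. The sketch assumes that "source quality" means at least two sources, as in the prompt in 10.3. All class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALID_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}


@dataclass
class ClaimResult:
    verdict: Optional[str]
    confidence: Optional[float]           # 0-100
    sources: List[str] = field(default_factory=list)
    contradiction_search_done: bool = False


@dataclass
class GateResult:
    name: str
    passed: bool
    reason: str = ""                      # explanation shown to the user on failure


def _gate(name: str, ok: bool, why: str) -> GateResult:
    return GateResult(name, ok, "" if ok else why)


def run_gates(claim: ClaimResult) -> List[GateResult]:
    """Attempt all four simplified gates and report pass/fail for each."""
    n = len(claim.sources)
    conf = claim.confidence
    return [
        _gate("source_quality", n >= 2, f"only {n} source(s) found"),
        _gate("contradiction_search", claim.contradiction_search_done,
              "contradiction search was not attempted"),
        _gate("confidence", conf is not None and 0 <= conf <= 100,
              "confidence score missing or outside 0-100"),
        _gate("structural_integrity", claim.verdict in VALID_VERDICTS,
              f"unexpected verdict {claim.verdict!r}"),
    ]


def publish(claim: ClaimResult) -> dict:
    """POC mode: gate failures are displayed alongside the result but never block it."""
    return {"published": True, "gates": run_gates(claim)}
```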
266 266  
267 -**Rationale:** Advanced features require proven platform. Added post-V1.0.
783 +---
268 268  
785 +== 9. Non-Functional Requirements ==
269 269  
270 -=== 4.3 Production Requirements (POC2, Beta 0) ===
787 +=== NFR-POC-1: Fully Automated Processing ===
271 271  
272 -**Out of Scope:**
789 +**Requirement:** Complete AI automation with zero manual intervention
273 273  
274 -* ❌ Security controls (NFR4, NFR12)
275 -* ❌ Code maintainability (NFR5)
276 -* ❌ System monitoring (NFR13)
277 -* ❌ Evidence deduplication
278 -* ❌ Advanced source verification
279 -* ❌ Full 7-gate quality framework
791 +**Critical Rule:** NO MANUAL EDITING AT ANY STAGE
280 280  
281 -**Rationale:** POC proves concept. Production hardening happens in POC2 and Beta 0.
793 +**What this means:**
794 +* Claims: AI selects (no human curation)
795 +* Scenarios: N/A (deferred to POC2)
796 +* Evidence: AI evaluates (no human selection)
797 +* Verdicts: AI determines (no human adjustment)
798 +* Summaries: AI writes (no human editing)
282 282  
800 +**Pipeline:**
801 +{{code}}
802 +User Input → AKEL Processing → Output Display
803 + ↓
804 + ZERO human editing
805 +{{/code}}
283 283  
284 -== 5. POC Output Specification ==
807 +**If AI output is poor:**
808 +* ❌ Do NOT manually fix it
809 +* ✅ Document the failure
810 +* ✅ Improve prompts and retry
811 +* ✅ Accept that POC might fail
285 285  
286 -=== 5.1 Required Output Elements ===
813 +**Why this matters:**
814 +* Tests whether AI can do this without humans
815 +* Validates scalability (humans can't review every analysis)
816 +* Honest test of technical feasibility
287 287  
288 -For each analyzed claim, POC must produce:
818 +---
289 289  
290 -
291 -
292 -**1. Claim**
293 -* Original text
294 -* Classification (factual/non-factual/ambiguous)
295 -* If non-factual: Clear reason why
820 +=== NFR-POC-2: Performance ===
296 296  
297 -**2. Scenarios** (if factual)
822 +**Requirement:** Analysis completes in reasonable time
298 298  
299 -* 2-3 interpretation scenarios
300 -* Each scenario clearly described
824 +**Acceptable Performance:**
825 +* Processing time: 1-5 minutes (acceptable for POC)
826 +* Display loading indicator to user
827 +* Show progress if possible ("Extracting claims...", "Generating verdicts...")
301 301  
302 -**3. Evidence** (if factual)
829 +**Not Required:**
830 +* Production-level speed (< 30 seconds)
831 +* Optimization for scale
832 +* Caching
303 303  
304 -* Supporting evidence (2+ items)
305 -* Opposing evidence (if exists)
306 -* Source URLs and reliability scores
834 +**Acceptance Criteria:**
835 +* Analysis completes within 5 minutes
836 +* User sees loading indicator
837 +* No timeout errors
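Progress messages and the 5-minute budget can be handled by a small stage runner. The sketch below reports each stage label before the stage starts and flags overruns instead of aborting, so the user never sees a timeout error. `run_with_progress` and the injectable `clock` are illustrative names, not part of the specification.

```python
import time
from typing import Callable, List, Sequence, Tuple

TIME_BUDGET_S = 5 * 60  # NFR-POC-2: analysis should complete within 5 minutes


def run_with_progress(
    stages: Sequence[Tuple[str, Callable[[], object]]],
    report: Callable[[str], None],
    budget_s: float = TIME_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[object], float, bool]:
    """Run each (label, step) in order, reporting the label before the step starts.

    Returns the step results, the elapsed seconds, and whether the run stayed
    within the budget. Overruns are reported, not aborted.
    """
    start = clock()
    results: List[object] = []
    for label, step in stages:
        report(label)  # e.g. update the loading indicator text
        results.append(step())
    elapsed = clock() - start
    return results, elapsed, elapsed <= budget_s
```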
307 307  
308 -**4. Verdict** (if factual)
839 +---
309 309  
310 -* Probability (0-100%)
311 -* Uncertainty quantification
312 -* Confidence level (LOW/MEDIUM/HIGH)
313 -* Reasoning chain
841 +=== NFR-POC-3: Reliability ===
314 314  
315 -**5. Quality Status**
843 +**Requirement:** System works for manual testing sessions
316 316  
317 -* Which gates passed/failed
318 -* If failed: Clear explanation why
845 +**Acceptable:**
846 +* Occasional errors (< 20% failure rate)
847 +* Manual restart if needed
848 +* Display error messages clearly
319 319  
320 -=== 5.2 Example POC Output ===
850 +**Not Required:**
851 +* 99.9% uptime
852 +* Automatic error recovery
853 +* Production monitoring
321 321  
322 -{{code language="json"}}
323 -{
324 - "claim": {
325 - "text": "Switzerland has the highest life expectancy in Europe",
326 - "type": "factual",
327 - "gate1_status": "PASS"
328 - },
329 - "scenarios": [
330 - "Switzerland's overall life expectancy is highest",
331 - "Switzerland ranks highest for specific age groups"
332 - ],
333 - "evidence": {
334 - "supporting": [
335 - {
336 - "source": "WHO Report 2023",
337 - "reliability": 0.95,
338 - "excerpt": "Switzerland: 83.4 years average..."
339 - }
340 - ],
341 - "opposing": [
342 - {
343 - "source": "Eurostat 2024",
344 - "reliability": 0.90,
345 - "excerpt": "Spain leads at 83.5 years..."
346 - }
347 - ]
348 - },
349 - "verdict": {
350 - "probability": 0.65,
351 - "uncertainty": 0.15,
352 - "confidence": "MEDIUM",
353 - "reasoning": "WHO and Eurostat show similar but conflicting data...",
354 - "gate4_status": "PASS"
355 - }
356 -}
855 +**Acceptance Criteria:**
856 +* System works for test demonstrations
857 +* Errors are handled gracefully
858 +* User receives clear error messages
859 +
860 +---
861 +
862 +=== NFR-POC-4: Environment ===
863 +
864 +**Requirement:** Runs on simple infrastructure
865 +
866 +**Acceptable:**
867 +* Single machine or simple cloud setup
868 +* No distributed architecture
869 +* No load balancing
870 +* No redundancy
871 +* Local development environment viable
872 +
873 +**Not Required:**
874 +* Production infrastructure
875 +* Multi-region deployment
876 +* Auto-scaling
877 +* Disaster recovery
878 +
879 +---
880 +
881 +== 10. Technical Architecture ==
882 +
883 +=== 10.1 System Components ===
884 +
885 +**Frontend:**
886 +* Simple HTML form (text input + URL input + button)
887 +* Loading indicator
888 +* Results display page (single page, no tabs/navigation)
889 +
890 +**Backend:**
891 +* Single API endpoint
892 +* Calls Claude API (Sonnet 4.5 or latest)
893 +* Parses response
894 +* Returns JSON to frontend
895 +
896 +**Data Storage:**
897 +* None required (stateless POC)
898 +* Optional: Simple file storage or SQLite for demo examples
899 +
900 +**External Services:**
901 +* Claude API (Anthropic) - required
902 +* Optional: URL fetch service for article text extraction
903 +
904 +---
905 +
906 +=== 10.2 Processing Flow ===
907 +
908 +{{code}}
909 +1. User submits text or URL
910 + ↓
911 +2. Backend receives request
912 + ↓
913 +3. If URL: Fetch article text
914 + ↓
915 +4. Call Claude API with single prompt:
916 + "Extract claims, evaluate each, provide verdicts"
917 + ↓
918 +5. Claude API returns:
919 + - Analysis summary
920 + - Claims list
921 + - Verdicts for each claim (with risk tiers)
922 + - Article summary (optional)
923 + - Quality gate results
924 + ↓
925 +6. Backend parses response
926 + ↓
927 +7. Frontend displays results with Mode 2 labeling
357 357  {{/code}}
358 358  
930 +**Key Simplification:** Single API call does entire analysis
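Steps 1-6 of the flow can be sketched as one backend function. `fetch_text` and `call_model` are hypothetical injected callables: in a real setup, `call_model` would wrap the Claude API call through the Anthropic SDK, which is not shown here. The prompt string is abbreviated.

```python
import json
from typing import Callable, Optional

PROMPT = "Extract claims, evaluate each, provide verdicts. Return JSON.\n\nArticle:\n"


def analyze(
    text: Optional[str],
    url: Optional[str],
    fetch_text: Callable[[str], str],
    call_model: Callable[[str], str],
) -> dict:
    """Steps 1-6 of the processing flow; step 7 (display) is left to the frontend."""
    if not text and not url:
        raise ValueError("provide article text or a URL")
    # Step 3: fetch article text only when a URL was submitted.
    article = text if text else fetch_text(url)
    # Step 4: a single model call performs the entire analysis.
    raw = call_model(PROMPT + article)
    # Step 6: parse the JSON reply; json.JSONDecodeError (a ValueError) signals bad output.
    return json.loads(raw)
```

Malformed model output raises an error that the frontend can show as a clear message, as NFR-POC-3 requires, instead of being repaired by hand.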
359 359  
360 -== 6. Success Criteria ==
932 +---
361 361  
362 -{{success}}
363 -**POC Success Definition:** POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts with quality gates improving output quality.
364 -{{/success}}
934 +=== 10.3 AI Prompt Strategy ===
365 365  
366 -=== 6.1 Functional Success ===
936 +**Single Comprehensive Prompt:**
937 +{{code}}
938 +Task: Analyze this article and provide:
367 367  
368 -POC is successful if:
940 +1. Extract 3-5 factual claims from the article
941 +2. For each claim:
942 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 + - Assign confidence score (0-100%)
944 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 + - Write brief reasoning (1-3 sentences)
946 +3. Run quality gates:
947 + - Check: ≥2 sources found
948 + - Attempt: Basic contradiction search
949 + - Calculate: Confidence scores
950 + - Verify: Structural integrity
951 +4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 +5. Write article summary (3-5 sentences: neutral summary of article content)
369 369  
370 -✅ **FR1-FR7 Requirements Met:**
954 +Return as structured JSON with quality gate results.
955 +{{/code}}
371 371  
372 -1. Extracts 3-5 factual claims from test articles
373 -2. Generates 2-3 scenarios per ambiguous claim
374 -3. Finds supporting AND opposing evidence
375 -4. Computes probabilistic verdicts with uncertainty
376 -5. Provides clear reasoning chains
957 +**One prompt generates everything.**
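The prompt asks only for "structured JSON", so the backend should check the reply before display. The sketch below validates it against the constraints in the prompt: 3-5 claims, one of the four verdicts, confidence 0-100, tier A/B/C, and non-empty reasoning. The field names `claims`, `verdict`, `confidence`, `risk_tier` and `reasoning` are assumed, not defined by this document. Problems are reported and not corrected, in keeping with the no-manual-editing rule.

```python
import json
from typing import List, Tuple

VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


def validate_claim(c: dict) -> List[str]:
    """List schema problems in one claim object (field names are hypothetical)."""
    problems = []
    if c.get("verdict") not in VERDICTS:
        problems.append(f"bad verdict: {c.get('verdict')!r}")
    conf = c.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 100:
        problems.append(f"bad confidence: {conf!r}")
    if c.get("risk_tier") not in RISK_TIERS:
        problems.append(f"bad risk tier: {c.get('risk_tier')!r}")
    if not c.get("reasoning"):
        problems.append("missing reasoning")
    return problems


def parse_analysis(raw: str) -> Tuple[dict, List[str]]:
    """Parse the model reply and collect every problem found."""
    data = json.loads(raw)
    claims = data.get("claims", [])
    problems = []
    if not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")
    for i, c in enumerate(claims):
        problems += [f"claim {i}: {p}" for p in validate_claim(c)]
    return data, problems
```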
377 377  
378 -✅ **Quality Gates Work:**
959 +---
379 379  
380 -1. Gate 1 blocks non-factual claims (100% block rate)
381 -2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
382 -3. Clear rejection reasons provided
961 +=== 10.4 Technology Stack Suggestions ===
383 383  
384 -✅ **NFR11 Met:**
963 +**Frontend:**
964 +* HTML + CSS + JavaScript (minimal framework)
965 +* OR: Next.js (if team prefers)
966 +* Hosted: Local machine OR Vercel/Netlify free tier
385 385  
386 -1. Quality gates reduce hallucination rate
387 -2. Blocked outputs have clear explanations
388 -3. Quality metrics are logged
968 +**Backend:**
969 +* Python Flask/FastAPI (simple REST API)
970 +* OR: Next.js API routes (if using Next.js)
971 +* Hosted: Local machine OR Railway/Render free tier
389 389  
390 -=== 6.2 Quality Thresholds ===
973 +**AKEL Integration:**
974 +* Claude API via Anthropic SDK
975 +* Model: Claude Sonnet 4.5 or latest available
391 391  
392 -**Minimum Acceptable:**
977 +**Database:**
978 +* None (stateless acceptable)
979 +* OR: SQLite if want to store demo examples
980 +* OR: JSON files on disk
393 393  
394 -* ≥70% of test claims correctly classified (factual/non-factual)
395 -* ≥60% of verdicts are reasonable (human evaluation)
396 -* Gate 1 blocks 100% of non-factual claims
397 -* Gate 4 blocks verdicts with <2 sources
982 +**Deployment:**
983 +* Local development environment sufficient for POC
984 +* Optional: Deploy to cloud for remote demos
398 398  
399 -**Target:**
986 +---
400 400  
401 -* ≥80% claims correctly classified
402 -* ≥75% verdicts are reasonable
403 -* <10% false positives (blocking good claims)
988 +== 11. Success Criteria ==
404 404  
405 -=== 6.3 POC Decision Gate ===
990 +=== 11.1 Minimum Success (POC Passes) ===
406 406  
407 -**After POC1, we decide:**
992 +**Required for GO decision:**
993 +* ✅ AI extracts 3-5 factual claims automatically
994 +* ✅ AI provides verdict for each claim automatically
995 +* ✅ Verdicts are reasonable (≥70% make logical sense)
996 +* ✅ Analysis summary is coherent
997 +* ✅ Output is comprehensible to reviewers
998 +* ✅ Team/advisors understand the output
999 +* ✅ Team agrees approach has merit
1000 +* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
408 408  
409 -**✅ PROCEED to POC2** if:
1002 +**Quality Definition:**
1003 +* "Reasonable verdict" = Defensible given general knowledge
1004 +* "Coherent summary" = Logically structured, grammatically correct
1005 +* "Comprehensible" = Reviewers understand what the analysis means
410 410  
411 -* Success criteria met
412 -* Quality gates demonstrably improve output
413 -* Core workflow is technically sound
414 -* Clear path to production quality
1007 +---
415 415  
416 -**⚠️ ITERATE POC1** if:
1009 +=== 11.2 POC Fails If ===
417 417  
418 -* Success criteria partially met
419 -* Gates work but need tuning
420 -* Core issues identified but fixable
1011 +**Automatic NO-GO if any of these:**
1012 +* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1013 +* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1014 +* ❌ Output incomprehensible (reviewers can't understand analysis)
1015 +* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 +* ❌ Team loses confidence in AI-automated approach
421 421  
422 -**❌ PIVOT APPROACH** if:
1018 +---
423 423  
424 -* Success criteria not met
425 -* Fundamental AI limitations discovered
426 -* Quality gates insufficient
427 -* Alternative approach needed
1020 +=== 11.3 Quality Thresholds ===
428 428  
429 -== 7. Test Cases ==
1022 +**POC quality expectations:**
430 430  
431 -=== 7.1 Happy Path ===
1024 +|=Component|=Quality Threshold|=Definition
1025 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1026 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1027 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1028 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
432 432  
433 -**Test 1: Simple Factual Claim**
1030 +**Analogy:** "B student" quality (70-80%), not "A+" perfection yet
434 434  
435 -* Input: "Paris is the capital of France"
436 -* Expected: Factual, 1 scenario, verdict 95% true
1032 +**Not expecting:**
1033 +* 100% accuracy
1034 +* Perfect claim coverage
1035 +* Comprehensive evidence gathering
1036 +* Flawless verdicts
1037 +* Production polish
437 437  
438 -**Test 2: Ambiguous Claim**
1039 +**Expecting:**
1040 +* Reasonable claim extraction
1041 +* Defensible verdicts
1042 +* Understandable reasoning
1043 +* Useful output
439 439  
440 -* Input: "Switzerland has the highest income in Europe"
441 -* Expected: Factual, 2-3 scenarios, verdict with uncertainty
1045 +---
442 442  
443 -**Test 3: Statistical Claim**
1047 +== 12. Test Cases ==
444 444  
445 -* Input: "10% of people have condition X"
446 -* Expected: Factual, evidence with numbers, probabilistic verdict
1049 +=== 12.1 Test Case 1: Simple Factual Claim ===
447 447  
448 -=== 7.2 Edge Cases ===
1051 +**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
449 449  
450 -**Test 4: Opinion**
1053 +**Expected Output:**
1054 +* Extract claim correctly
1055 +* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1056 +* Confidence: 70-90%
1057 +* Risk tier: A (High - medical)
1058 +* Reasoning: Mentions studies or evidence
451 451  
452 -* Input: "Paris is the best city"
453 -* Expected: Non-factual (opinion), blocked by Gate 1
1060 +**Success:** Verdict is reasonable and reasoning makes sense
454 454  
455 -**Test 5: Prediction**
1062 +---
456 456  
457 -* Input: "Bitcoin will reach $100,000 next year"
458 -* Expected: Non-factual (prediction), blocked by Gate 1
1064 +=== 12.2 Test Case 2: Complex News Article ===
459 459  
460 -**Test 6: Insufficient Evidence**
1066 +**Input:** News article URL with multiple claims about politics/health/science
461 461  
462 -* Input: Obscure factual claim with no sources
463 -* Expected: Blocked by Gate 4 (<2 sources)
1068 +**Expected Output:**
1069 +* Extract 3-5 key claims
1070 +* Verdict for each (may vary: some supported, some uncertain, some refuted)
1071 +* Coherent analysis summary
1072 +* Article summary
1073 +* Risk tiers assigned appropriately
464 464  
465 -=== 7.3 Quality Gate Tests ===
1075 +**Success:** Claims identified are actually from article, verdicts are reasonable
466 466  
467 -**Test 7: Gate 1 Effectiveness**
1077 +---
468 468  
469 -* Input: Mix of 10 factual + 10 non-factual claims
470 -* Expected: Gate 1 blocks all 10 non-factual claims (100% recall)
1079 +=== 12.3 Test Case 3: Controversial Topic ===
471 471  
472 -**Test 8: Gate 4 Effectiveness**
1081 +**Input:** Article on contested political or scientific topic
473 473  
474 -* Input: Claims with varying evidence availability
475 -* Expected: Gate 4 blocks low-confidence verdicts
1083 +**Expected Output:**
1084 +* Balanced analysis
1085 +* Acknowledges uncertainty where appropriate
1086 +* Doesn't overstate confidence
1087 +* Reasoning shows awareness of complexity
476 476  
477 -== 8. Technical Architecture (POC) ==
1089 +**Success:** Analysis is fair and doesn't show obvious bias
478 478  
479 -=== 8.1 Simplified Architecture ===
1091 +---
480 480  
481 -**POC Tech Stack:**
1093 +=== 12.4 Test Case 4: Clearly False Claim ===
482 482  
483 -* **Frontend:** Simple web interface (Next.js + TypeScript)
484 -* **Backend:** Single API endpoint
485 -* **AI:** Claude API (Sonnet 4.5)
486 -* **Database:** Local JSON files (no database)
487 -* **Deployment:** Single server
1095 +**Input:** Article with obviously false claim (e.g., "The Earth is flat")
488 488  
489 -**Architecture Diagram:** See [[POC1 Specification>>FactHarbor.Specification.POC.Specification]]
1097 +**Expected Output:**
1098 +* Extract claim
1099 +* Verdict: REFUTED
1100 +* High confidence (> 90%)
1101 +* Risk tier: C (Low - established fact)
1102 +* Clear reasoning
490 490  
1104 +**Success:** AI correctly identifies false claim with high confidence
491 491  
492 -=== 8.2 AKEL Implementation ===
1106 +---
493 493  
494 -**POC AKEL:**
1108 +=== 12.5 Test Case 5: Genuinely Uncertain Claim ===
495 495  
496 -* Single-threaded processing
497 -* Synchronous API calls
498 -* No caching
499 -* Basic error handling
500 -* Console logging
1110 +**Input:** Article with claim where evidence is genuinely mixed
501 501  
502 -**Full AKEL (POC2+):**
1112 +**Expected Output:**
1113 +* Extract claim
1114 +* Verdict: UNCERTAIN
1115 +* Moderate confidence (40-60%)
1116 +* Reasoning explains why uncertain
503 503  
504 -* Multi-threaded processing
505 -* Async API calls
506 -* Evidence caching
507 -* Advanced error handling with retry
508 -* Structured logging + monitoring
1118 +**Success:** AI recognizes uncertainty and doesn't overstate confidence
509 509  
510 -== 9. POC Philosophy ==
1120 +---
511 511  
512 -{{info}}
513 -**Important:** POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases.
514 -{{/info}}
1122 +=== 12.6 Test Case 6: High-Risk Medical Claim ===
515 515  
516 -=== 9.1 Core Principles ===
1124 +**Input:** Article making medical claims
517 517  
518 -
519 -
520 -**1. Prove Concept, Not Production**
521 -* POC validates AI can do the job
522 -* Production quality comes in POC2 and Beta 0
523 -* Focus on "does it work?" not "is it perfect?"
1126 +**Expected Output:**
1127 +* Extract claim
1128 +* Verdict: [appropriate based on evidence]
1129 +* Risk tier: A (High - medical)
1130 +* Red label displayed
1131 +* Clear disclaimer about not being medical advice
524 524  
525 -**2. Implement Subset of Requirements**
1133 +**Success:** Risk tier correctly assigned, appropriate warnings shown
526 526  
527 -* POC covers FR1-7, NFR11 (lite)
528 -* All other requirements deferred
529 -* Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1135 +---
530 530  
531 -**3. Quality Gates Validate Approach**
1137 +== 13. POC Decision Gate ==
532 532  
533 -* 2 gates prove the concept
534 -* Remaining 5 gates added in POC2
535 -* Gates must demonstrably improve quality
1139 +=== 13.1 Decision Framework ===
536 536  
537 -**4. Iterate Based on Results**
1141 +After POC testing is complete, the team makes one of three decisions:
538 538  
539 -* POC results determine next steps
540 -* Decision gate after POC1
541 -* Flexibility to pivot if needed
1143 +**Option A: GO (Proceed to POC2)**
542 542  
543 -=== 9.2 Success = Clear Path Forward ===
1145 +**Conditions:**
1146 +* AI quality ≥70% without manual editing
1147 +* Basic claim → verdict pipeline validated
1148 +* Internal + advisor feedback positive
1149 +* Technical feasibility confirmed
1150 +* Team confident in direction
1151 +* Clear path to improving AI quality to ≥90%
544 544
545 -
1153 +**Next Steps:**
1154 +* Plan POC2 development (add scenarios)
1155 +* Design scenario architecture
1156 +* Expand to Evidence Model structure
1157 +* Test with more complex articles
546 546  
547 -POC succeeds if we can confidently answer:
1159 +---
548 548  
549 -**Technical Feasibility:**
1161 +**Option B: NO-GO (Pivot or Stop)**
550 550  
551 -* Can AI extract claims reliably?
552 -* Can AI find balanced evidence?
553 -* Can AI compute reasonable verdicts?
1163 +**Conditions:**
1164 +* AI quality < 60%
1165 +* Requires manual editing for most analyses (> 50%)
1166 +* Feedback indicates fundamental flaws
1167 +* Cost/effort not justified by value
1168 +* No clear path to improvement
554 554  
555 -✅ **Quality Approach:**
1170 +**Next Steps:**
1171 +* **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1172 +* **Stop:** Conclude approach not viable, revisit later
556 556  
557 -* Do quality gates improve output?
558 -* Can we measure and track quality?
559 -* Is the gate approach scalable?
1174 +---
560 560  
561 -**Production Path:**
1176 +**Option C: ITERATE (Improve POC)**
562 562  
563 -* Is the core architecture sound?
564 -* What needs improvement for production?
565 -* Is POC2 the right next step?
1178 +**Conditions:**
1179 +* Concept has merit but execution needs work
1180 +* Specific improvements identified
1181 +* Addressable with better prompts/approach
1182 +* AI quality between 60-70%
566 566  
567 -== 10. Related Pages ==
1184 +**Next Steps:**
1185 +* Improve AI prompts
1186 +* Test different approaches
1187 +* Re-run POC with improvements
1188 +* Then make GO/NO-GO decision
568 568  
569 -* **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
570 -* **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
571 -* **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
572 -* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
573 -* **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
1190 +---
574 574  
575 -**Document Owner:** Technical Team
576 -**Review Frequency:** After each POC iteration
577 -**Version History:**
1192 +=== 13.2 Decision Criteria Summary ===
578 578  
579 -* v1.0 - Initial POC requirements
580 -* v2.0 - Updated after specification cross-check
581 -* v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
1194 +{{code}}
1195 +AI Quality < 60% → NO-GO (approach doesn't work)
1196 +AI Quality 60-70% → ITERATE (improve and retry)
1197 +AI Quality ≥70% → GO (proceed to POC2)
1198 +{{/code}}
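The thresholds above can be combined with the manual-editing limits from 11.1 (GO requires < 30% of analyses to need intervention) and 11.2 (> 50% means NO-GO). The document does not specify the outcome for a 30-50% edit rate. The sketch below treats that band as ITERATE, which is an assumption. `poc_decision` is a hypothetical helper, and "acceptable" means judged reasonable by reviewers.

```python
def poc_decision(acceptable: int, total: int, manual_edits: int = 0) -> str:
    """Map reviewer results to GO / ITERATE / NO-GO using the 13.2 thresholds."""
    if total <= 0:
        raise ValueError("need at least one evaluated analysis")
    quality = 100 * acceptable / total       # % of analyses judged reasonable
    edit_rate = 100 * manual_edits / total   # % of analyses needing human correction
    if quality < 60 or edit_rate > 50:
        return "NO-GO"
    if quality >= 70 and edit_rate < 30:
        return "GO"
    # 60-70% quality, or a 30-50% edit rate (the latter band is an assumption).
    return "ITERATE"
```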
1199 +
1200 +---
1201 +
1202 +== 14. Key Risks & Mitigations ==
1203 +
1204 +=== 14.1 Risk: AI Quality Not Good Enough ===
1205 +
1206 +**Likelihood:** Medium-High
1207 +**Impact:** POC fails
1208 +
1209 +**Mitigation:**
1210 +* Extensive prompt engineering and testing
1211 +* Use best available AI models (Sonnet 4.5)
1212 +* Test with diverse article types
1213 +* Iterate on prompts based on results
1214 +
1215 +**Acceptance:** This is what the POC tests; be ready for failure
1216 +
1217 +---
1218 +
1219 +=== 14.2 Risk: AI Consistency Issues ===
1220 +
1221 +**Likelihood:** Medium
1222 +**Impact:** Works sometimes, fails other times
1223 +
1224 +**Mitigation:**
1225 +* Test with 10+ diverse articles
1226 +* Measure success rate honestly
1227 +* Improve prompts to increase consistency
1228 +
1229 +**Acceptance:** Some variability is acceptable if average quality is ≥70%
1230 +
1231 +---
1232 +
1233 +=== 14.3 Risk: Output Incomprehensible ===
1234 +
1235 +**Likelihood:** Low-Medium
1236 +**Impact:** Users can't understand analysis
1237 +
1238 +**Mitigation:**
1239 +* Create clear explainer document
1240 +* Iterate on output format
1241 +* Test with non-technical reviewers
1242 +* Simplify language if needed
1243 +
1244 +**Acceptance:** Iterate until comprehensible
1245 +
1246 +---
1247 +
1248 +=== 14.4 Risk: API Rate Limits / Costs ===
1249 +
1250 +**Likelihood:** Low
1251 +**Impact:** System slow or expensive
1252 +
1253 +**Mitigation:**
1254 +* Monitor API usage
1255 +* Implement retry logic
1256 +* Estimate costs before scaling
1257 +
1258 +**Acceptance:** POC can be slow and expensive (optimization later)
1259 +
1260 +---
1261 +
1262 +=== 14.5 Risk: Scope Creep ===
1263 +
1264 +**Likelihood:** Medium
1265 +**Impact:** POC becomes too complex
1266 +
1267 +**Mitigation:**
1268 +* Strict scope discipline
1269 +* Say NO to feature additions
1270 +* Keep focus on core question
1271 +
1272 +**Acceptance:** POC is minimal by design
1273 +
1274 +---
1275 +
1276 +== 15. POC Philosophy ==
1277 +
1278 +=== 15.1 Core Principles ===
1279 +
1280 +**1. Build Less, Learn More**
1281 +* Minimum features to test hypothesis
1282 +* Don't build unvalidated features
1283 +* Focus on core question only
1284 +
1285 +**2. Fail Fast**
1286 +* Quick test of hardest part (AI capability)
1287 +* Accept that POC might fail
1288 +* Better to discover issues early
1289 +* Honest assessment over optimistic hope
1290 +
1291 +**3. Test First, Build Second**
1292 +* Validate AI can do this before building platform
1293 +* Don't assume it will work
1294 +* Let results guide decisions
1295 +
1296 +**4. Automation First**
1297 +* No manual editing allowed
1298 +* Tests scalability, not just feasibility
1299 +* Proves approach can work at scale
1300 +
1301 +**5. Honest Assessment**
1302 +* Don't cherry-pick examples
1303 +* Don't manually fix bad outputs
1304 +* Document failures openly
1305 +* Make data-driven decisions
1306 +
1307 +---
1308 +
1309 +=== 15.2 What POC Is ===
1310 +
1311 +✅ Testing AI capability without humans
1312 +✅ Proving core technical concept
1313 +✅ Fast validation of approach
1314 +✅ Honest assessment of feasibility
1315 +
1316 +---
1317 +
1318 +=== 15.3 What POC Is NOT ===
1319 +
1320 +❌ Building a product
1321 +❌ Production-ready system
1322 +❌ Feature-complete platform
1323 +❌ Perfectly accurate analysis
1324 +❌ Polished user experience
1325 +
1326 +---
1327 +
1328 +== 16. Success = Clear Path Forward ==
1329 +
1330 +**If POC succeeds (≥70% AI quality):**
1331 +* ✅ Approach validated
1332 +* ✅ Proceed to POC2 (add scenarios)
1333 +* ✅ Design full Evidence Model structure
1334 +* ✅ Test multi-scenario comparison
1335 +* ✅ Focus on improving AI quality from 70% → 90%
1336 +
1337 +**If POC fails (< 60% AI quality):**
1338 +* ✅ Learn what doesn't work
1339 +* ✅ Pivot to different approach
1340 +* ✅ OR wait for better AI technology
1341 +* ✅ Avoid wasting resources on non-viable approach
1342 +
1343 +**Either way, POC provides clarity.**
1344 +
1345 +---
1346 +
1347 +== 17. Related Pages ==
1348 +
1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1352 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 +* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 +* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355 +
1356 +---
1357 +
1358 +**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359 +