Last modified by Robert Schaub on 2025/12/23 18:00

From version 2.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR
To version 1.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR

Details

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements (POC1 & POC2)
1 +POC Requirements
Parent
... ... @@ -1,1 +1,1 @@
1 -WebHome
1 +FactHarbor.Specification.POC.WebHome
Content
... ... @@ -1,14 +1,11 @@
1 1  = POC Requirements =
2 2  
3 3  **Status:** ✅ Approved for Development
4 -**Version:** 3.0 (Aligned with Main Requirements)
4 +**Version:** 2.0 (Updated after Specification Cross-Check)
5 5  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 -{{info}}
8 -**Core Philosophy:** POC validates the [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] through simplified implementation. All POC features map to formal FR/NFR requirements.
9 -{{/info}}
7 +---
10 10  
11 -
12 12  == 1. POC Overview ==
13 13  
14 14  === 1.1 What POC Tests ===
... ... @@ -18,523 +18,1345 @@
18 18  
19 19  **What we're proving:**
20 20  * AI can identify factual claims from text
21 -* AI can evaluate those claims with structured evidence
22 -* Quality gates can filter unreliable outputs
23 -* The core workflow is technically feasible
18 +* AI can evaluate those claims and produce verdicts
19 +* Output is comprehensible and useful
20 +* Fully automated approach is viable
24 24  
25 -**What we're NOT proving:**
26 -* Production-ready reliability (that's POC2)
27 -* User-facing features (that's Beta 0)
28 -* Full IFCN compliance (that's V1.0)
22 +**What we're NOT testing:**
23 +* Scenario generation (deferred to POC2)
24 +* Evidence display (deferred to POC2)
25 +* Production scalability
26 +* Perfect accuracy
27 +* Complete feature set
29 29  
30 -=== 1.2 Requirements Mapping ===
29 +---
31 31  
32 -POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
31 +=== 1.2 Scenarios Deferred to POC2 ===
33 33  
34 -**Scope Summary:**
35 -* **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 -* **Partial:** 3 NFRs (simplified versions)
37 -* **Out of Scope:** 19 requirements (deferred to later phases)
33 +**Intentional Simplification:**
38 38  
35 +Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
39 39  
40 -== 2. POC1 Scope ==
37 +**Rationale:**
38 +* **POC1 tests:** Can AI extract claims and generate verdicts?
39 +* **POC2 will add:** Scenario generation and management
40 +* **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
41 41  
42 -{{success}}
43 -**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
42 +**Design Decision:**
44 44  
45 -The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
46 -{{/success}}
44 +Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
47 47  
48 -**POC1 implements these formal requirements:**
46 +**No Risk:**
49 49  
50 -|= Formal Req |= Implementation in POC1 |= Notes
51 -| **FR4** | Analysis Summary | Basic format; quality metadata deferred to POC2
52 -| **FR7** | Automated Verdicts | Full implementation with quality gates (NFR11)
53 -| **NFR11** | Quality Assurance Framework | 4 quality gates implemented
48 +Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49 +* Faster POC1 validation
50 +* Learning from POC1 to inform scenario design
51 +* Iterative approach: fail fast if basic AI doesn't work
52 +* Flexibility to adjust scenario architecture based on POC1 insights
54 54  
55 -**POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below)
54 +**Full System Workflow (Future):**
55 +{{code}}
56 +Claims → Scenarios → Evidence → Verdicts
57 +{{/code}}
56 56  
57 -{{info}}
58 -**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
59 -{{/info}}:
60 -* Claim extraction (FR1)
61 -* Claim context (FR2)
62 -* Multiple scenarios (FR3)
63 -* Evidence collection (FR5)
64 -* Source quality assessment (FR6)
65 -* Time evolution tracking (FR8) - deferred to POC2
66 -* Audit trail (FR11) - deferred to Beta 0
67 -* In-article highlighting (FR13) - deferred to Beta 0
59 +**POC1 Simplified Workflow:**
60 +{{code}}
61 +Claims → Verdicts (scenarios implicit in reasoning)
62 +{{/code}}
68 68  
69 -**Partial implementations:**
70 -* NFR1 (Explainability) - Basic only
71 -* NFR2 (Performance) - Functional but not optimized
72 -* NFR3 (Transparency) - Basic only
64 +---
73 73  
74 -**Detailed POC1 implementation specifications continue below...**
66 +== 2. POC Output Specification ==
75 75  
68 +=== 2.1 Component 1: ANALYSIS SUMMARY ===
76 76  
70 +**What:** Brief overview of findings
71 +**Length:** 3-5 sentences
72 +**Content:**
73 +* How many claims found
74 +* Distribution of verdicts
75 +* Overall assessment
77 77  
78 -== 3. POC Simplifications ==
77 +**Example:**
78 +{{code}}
79 +This article makes 4 claims about coffee's health effects. We found
80 +1 claim is well-supported, 1 is partially supported, 1 is uncertain, and 1 is refuted.
81 +Overall assessment: mostly accurate with some exaggeration.
82 +{{/code}}
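The claim count and verdict distribution in an Analysis Summary can be derived mechanically from the per-claim verdict labels, which also gives a cheap check that the summary agrees with the verdict list. The following is a minimal sketch under that assumption; `VERDICT_ORDER` and `distribution_line` are hypothetical names, not part of the specification.

```python
from collections import Counter

# Verdict labels as listed in the POC output specification, in display order.
VERDICT_ORDER = ["WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"]


def distribution_line(labels: list[str]) -> str:
    """Summarize how many claims received each verdict label."""
    counts = Counter(labels)
    # Keep a fixed label order and skip labels that did not occur.
    parts = [
        f"{counts[label]} {label.lower()}"
        for label in VERDICT_ORDER
        if counts[label]
    ]
    return f"{len(labels)} claims: " + ", ".join(parts)


if __name__ == "__main__":
    print(distribution_line(
        ["WELL-SUPPORTED", "UNCERTAIN", "PARTIALLY SUPPORTED", "REFUTED"]
    ))
```

Comparing this line against the AI-written summary is one way to catch a summary whose counts drift from the verdicts it describes.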
79 79  
80 -=== 3.1 FR1: Claim Extraction (Full Implementation) ===
84 +---
81 81  
82 -**Main Requirement:** AI extracts factual claims from input text
86 +=== 2.2 Component 2: CLAIMS IDENTIFICATION ===
83 83  
88 +**What:** List of factual claims extracted from article
89 +**Format:** Numbered list
90 +**Quantity:** 3-5 claims
91 +**Requirements:**
92 +* Factual claims only (not opinions/questions)
93 +* Clearly stated
94 +* Automatically extracted by AI
95 +
96 +**Example:**
97 +{{code}}
98 +CLAIMS IDENTIFIED:
99 +
100 +[1] Coffee reduces diabetes risk by 30%
101 +[2] Coffee improves heart health
102 +[3] Decaf has same benefits as regular
103 +[4] Coffee prevents Alzheimer's completely
104 +{{/code}}
105 +
106 +---
107 +
108 +=== 2.3 Component 3: CLAIMS VERDICTS ===
109 +
110 +**What:** Verdict for each claim identified
111 +**Format:** Per claim structure
112 +
113 +**Required Elements:**
114 +* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 +* **Confidence Score:** 0-100%
116 +* **Brief Reasoning:** 1-3 sentences explaining why
117 +* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
118 +
119 +**Example:**
120 +{{code}}
121 +VERDICTS:
122 +
123 +[1] WELL-SUPPORTED (85%) [Risk: C]
124 +Multiple studies confirm 25-30% risk reduction with regular consumption.
125 +
126 +[2] UNCERTAIN (65%) [Risk: B]
127 +Evidence is mixed. Some studies show benefits, others show no effect.
128 +
129 +[3] PARTIALLY SUPPORTED (60%) [Risk: C]
130 +Some benefits overlap, but caffeine-related benefits are reduced in decaf.
131 +
132 +[4] REFUTED (90%) [Risk: B]
133 +No evidence for complete prevention. Claim is significantly overstated.
134 +{{/code}}
135 +
136 +**Risk Tier Display:**
137 +* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 +* **Tier C (Green):** Low Risk - Facts/Definitions/History
140 +
141 +**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
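The required verdict elements (label, confidence score, reasoning, risk tier) map naturally onto a small record type. This is an illustrative sketch only: the class and field names (`ClaimVerdict`, `claim_number`, and so on) are assumptions, and the specification does not prescribe a data model for POC1.

```python
from dataclasses import dataclass
from enum import Enum


class VerdictLabel(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"


class RiskTier(Enum):
    A = "High"    # Medical/Legal/Safety/Elections
    B = "Medium"  # Policy/Science/Causality
    C = "Low"     # Facts/Definitions/History


@dataclass(frozen=True)
class ClaimVerdict:
    claim_number: int
    label: VerdictLabel
    confidence: int  # percent, 0-100
    risk_tier: RiskTier
    reasoning: str   # 1-3 sentences

    def __post_init__(self) -> None:
        # Reject values outside the ranges the output spec allows.
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        if not self.reasoning.strip():
            raise ValueError("reasoning must not be empty")

    def header(self) -> str:
        """Render the first line of a verdict as in the example output."""
        return (
            f"[{self.claim_number}] {self.label.value} "
            f"({self.confidence}%) [Risk: {self.risk_tier.name}]"
        )


if __name__ == "__main__":
    verdict = ClaimVerdict(
        1, VerdictLabel.WELL_SUPPORTED, 85, RiskTier.C,
        "Multiple studies confirm 25-30% risk reduction with regular consumption.",
    )
    print(verdict.header())
    print(verdict.reasoning)
```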
142 +
143 +---
144 +
145 +=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146 +
147 +**What:** Brief summary of original article content
148 +**Length:** 3-5 sentences
149 +**Tone:** Neutral (article's position, not FactHarbor's analysis)
150 +
151 +**Example:**
152 +{{code}}
153 +ARTICLE SUMMARY:
154 +
155 +Health News Today article discusses coffee benefits, citing studies
156 +on diabetes and Alzheimer's. Author highlights research linking coffee
157 +to disease prevention. Recommends 2-3 cups daily for optimal health.
158 +{{/code}}
159 +
160 +---
161 +
162 +=== 2.5 Total Output Size ===
163 +
164 +**Combined:** ~200-300 words
165 +* Analysis Summary: 50-70 words
166 +* Claims Identification: 30-50 words
167 +* Claims Verdicts: 100-150 words
168 +* Article Summary: 30-50 words (optional)
169 +
170 +---
171 +
172 +== 3. What's NOT in POC Scope ==
173 +
174 +=== 3.1 Feature Exclusions ===
175 +
176 +The following are **explicitly excluded** from POC:
177 +
178 +**Content Features:**
179 +* ❌ Scenarios (deferred to POC2)
180 +* ❌ Evidence display (supporting/opposing lists)
181 +* ❌ Source links (clickable references)
182 +* ❌ Detailed reasoning chains
183 +* ❌ Source quality ratings (shown but not detailed)
184 +* ❌ Contradiction detection (basic only)
185 +* ❌ Risk assessment (shown but not workflow-integrated)
186 +
187 +**Platform Features:**
188 +* ❌ User accounts / authentication
189 +* ❌ Saved history
190 +* ❌ Search functionality
191 +* ❌ Claim comparison
192 +* ❌ User contributions
193 +* ❌ Commenting system
194 +* ❌ Social sharing
195 +
196 +**Technical Features:**
197 +* ❌ Browser extensions
198 +* ❌ Mobile apps
199 +* ❌ API endpoints
200 +* ❌ Webhooks
201 +* ❌ Export features (PDF, CSV)
202 +
203 +**Quality Features:**
204 +* ❌ Accessibility (WCAG compliance)
205 +* ❌ Multilingual support
206 +* ❌ Mobile optimization
207 +* ❌ Media verification (images/videos)
208 +
209 +**Production Features:**
210 +* ❌ Security hardening
211 +* ❌ Privacy compliance (GDPR)
212 +* ❌ Terms of service
213 +* ❌ Monitoring/logging
214 +* ❌ Error tracking
215 +* ❌ Analytics
216 +* ❌ A/B testing
217 +
218 +---
219 +
220 +== 4. POC Simplifications vs. Full System ==
221 +
222 +=== 4.1 Architecture Comparison ===
223 +
224 +**POC Architecture (Simplified):**
225 +{{code}}
226 +User Input → Single AKEL Call → Output Display
227 + (all processing)
228 +{{/code}}
229 +
230 +**Full System Architecture:**
231 +{{code}}
232 +User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 +→ Evidence Summarizer → Contradiction Detector → Verdict Generator
234 +→ Quality Gates → Publication → Output Display
235 +{{/code}}
236 +
237 +**Key Differences:**
238 +
239 +|=Aspect|=POC1|=Full System
240 +|Processing|Single API call|Multi-component pipeline
241 +|Scenarios|None (implicit)|Explicit entities with versioning
242 +|Evidence|Basic retrieval|Comprehensive with quality scoring
243 +|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
244 +|Workflow|3 steps (input/process/output)|6 phases with gates
245 +|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 +|Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247 +
248 +---
249 +
250 +=== 4.2 Workflow Comparison ===
251 +
252 +**POC1 Workflow:**
253 +1. User submits text/URL
254 +2. Single AKEL call (all processing in one prompt)
255 +3. Display results
256 +**Total: 3 steps, ~10-18 seconds**
257 +
258 +**Full System Workflow:**
259 +1. **Claim Submission** (extraction, normalization, clustering)
260 +2. **Scenario Building** (definitions, assumptions, boundaries)
261 +3. **Evidence Handling** (retrieval, assessment, linking)
262 +4. **Verdict Creation** (synthesis, reasoning, approval)
263 +5. **Public Presentation** (summaries, landscapes, deep dives)
264 +6. **Time Evolution** (versioning, re-evaluation triggers)
265 +**Total: 6 phases with quality gates, ~10-30 seconds**
266 +
267 +---
268 +
269 +=== 4.3 Why POC is Simplified ===
270 +
271 +**Engineering Rationale:**
272 +
273 +1. **Test core capability first:** Can AI do basic fact-checking without humans?
274 +2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
275 +3. **Learn before building:** POC1 insights inform full architecture
276 +4. **Iterative approach:** Add complexity only after validating foundations
277 +5. **Resource efficiency:** Don't build full system if core concept fails
278 +
279 +**Acceptable Trade-offs:**
280 +
281 +* ✅ POC proves AI capability (most risky assumption)
282 +* ✅ POC validates user comprehension (can people understand output?)
283 +* ❌ POC doesn't validate full workflow (test in Beta)
284 +* ❌ POC doesn't validate scale (test in Beta)
285 +* ❌ POC doesn't validate scenario architecture (design in POC2)
286 +
287 +---
288 +
289 +=== 4.4 Gap Between POC1 and POC2/Beta ===
290 +
291 +**What needs to be built for POC2:**
292 +* Scenario generation component
293 +* Evidence Model structure (full)
294 +* Scenario-evidence linking
295 +* Multi-interpretation comparison
296 +* Truth landscape visualization
297 +
298 +**What needs to be built for Beta:**
299 +* Multi-component AKEL pipeline
300 +* Quality gate infrastructure
301 +* Review workflow system
302 +* Audit sampling framework
303 +* Production data model
304 +* Federation architecture (Release 1.0)
305 +
306 +**POC1 → POC2 is a significant architectural expansion.**
307 +
308 +---
309 +
310 +== 5. Publication Mode & Labeling ==
311 +
312 +=== 5.1 POC Publication Mode ===
313 +
314 +**Mode:** Mode 2 (AI-Generated, No Prior Human Review)
315 +
316 +Per FactHarbor Specification Section 11 "POC v1 Behavior":
317 +* Produces public AI-generated output
318 +* No human approval gate
319 +* Clear AI-Generated labeling
320 +* All quality gates active (simplified)
321 +* Risk tier classification shown (demo)
322 +
323 +---
324 +
325 +=== 5.2 User-Facing Labels ===
326 +
327 +**Primary Label (top of analysis):**
328 +{{code}}
329 +╔════════════════════════════════════════════════════════════╗
330 +║ [AI-GENERATED - POC/DEMO] ║
331 +║ ║
332 +║ This analysis was produced entirely by AI and has not ║
333 +║ been human-reviewed. Use for demonstration purposes. ║
334 +║ ║
335 +║ Source: AI/AKEL v1.0 (POC) ║
336 +║ Review Status: Not Reviewed (Proof-of-Concept) ║
337 +║ Quality Gates: 4/4 Passed (Simplified) ║
338 +║ Last Updated: [timestamp] ║
339 +╚════════════════════════════════════════════════════════════╝
340 +{{/code}}
341 +
342 +**Per-Claim Risk Labels:**
343 +* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
344 +* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 +* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346 +
347 +---
348 +
349 +=== 5.3 Display Requirements ===
350 +
351 +**Must Show:**
352 +* AI-Generated status (prominent)
353 +* POC/Demo disclaimer
354 +* Risk tier per claim
355 +* Confidence scores (0-100%)
356 +* Quality gate status (passed/failed)
357 +* Timestamp
358 +
359 +**Must NOT Claim:**
360 +* Human review
361 +* Production quality
362 +* Medical/legal advice
363 +* Authoritative verdicts
364 +* Complete accuracy
365 +
366 +---
367 +
368 +=== 5.4 Mode 2 vs. Full System Publication ===
369 +
370 +|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
371 +|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
372 +|Review|None|None|Human-Reviewed
373 +|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
374 +|Audit|None (POC)|Sampling (5-50%)|Pre-publication
375 +|Risk Display|Demo only|Workflow-integrated|Validated
376 +|User Actions|View only|Flag for review|Trust rating
377 +
378 +---
379 +
380 +== 6. Quality Gates (Simplified Implementation) ==
381 +
382 +=== 6.1 Overview ===
383 +
384 +Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
385 +
386 +**Full System Has 4 Gates:**
387 +1. Source Quality
388 +2. Contradiction Search (MANDATORY)
389 +3. Uncertainty Quantification
390 +4. Structural Integrity
391 +
392 +**POC Implements Simplified Versions:**
393 +* Focus on demonstrating concept
394 +* Basic implementations sufficient
395 +* Failures displayed to user (not blocking)
396 +* Full system has comprehensive validation
397 +
398 +---
399 +
400 +=== 6.2 Gate 1: Source Quality (Basic) ===
401 +
402 +**Full System Requirements:**
403 +* Primary sources identified and accessible
404 +* Source reliability scored against whitelist
405 +* Citation completeness verified
406 +* Publication dates checked
407 +* Author credentials validated
408 +
84 84  **POC Implementation:**
85 -* ✅ AKEL extracts claims using LLM
86 -* ✅ Each claim includes original text reference
87 -* ✅ Claims are identified as factual/non-factual
88 -* ❌ No advanced claim parsing (added in POC2)
410 +* ✅ At least 2 sources found
411 +* ✅ Sources accessible (URLs valid)
412 +* ❌ No whitelist checking
413 +* ❌ No credential validation
414 +* ❌ No comprehensive reliability scoring
89 89  
90 -**Acceptance Criteria:**
91 -* Extracts 3-5 claims from typical article
92 -* Identifies factual vs non-factual claims
93 -* Quality Gate 1 validates extraction
416 +**Pass Criteria:** ≥2 accessible sources found
94 94  
418 +**Failure Handling:** Display error message, don't generate verdict
95 95  
96 -=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===
420 +---
97 97  
98 -**Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
422 +=== 6.3 Gate 2: Contradiction Search (Basic) ===
99 99  
424 +**Full System Requirements:**
425 +* Counter-evidence actively searched
426 +* Reservations and limitations identified
427 +* Alternative interpretations explored
428 +* Bubble detection (echo chambers, conspiracy theories)
429 +* Cross-cultural and international perspectives
430 +* Academic literature (supporting AND opposing)
431 +
100 100  **POC Implementation:**
101 -* ✅ AKEL generates 2-3 scenarios per claim
102 -* ✅ Scenarios capture different interpretations
103 -* ✅ Each scenario is evaluated separately
104 -* ✅ Verdict considers all scenarios
433 +* ✅ Basic search for counter-evidence
434 +* ✅ Identify obvious contradictions
435 +* ❌ No comprehensive academic search
436 +* ❌ No bubble detection
437 +* ❌ No systematic alternative interpretation search
438 +* ❌ No international perspective verification
105 105  
106 -**Acceptance Criteria:**
107 -* Generates 2+ scenarios for ambiguous claims
108 -* Scenarios are meaningfully different
109 -* All scenarios are evaluated
440 +**Pass Criteria:** Basic contradiction search attempted
110 110  
442 +**Failure Handling:** Note "limited contradiction search" in output
111 111  
112 -=== 3.3 FR4: Analysis Summary (Basic Implementation) ===
444 +---
113 113  
114 -**Main Requirement:** Provide user-friendly summary of analysis
446 +=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===
115 115  
448 +**Full System Requirements:**
449 +* Confidence scores calculated for all claims/verdicts
450 +* Limitations explicitly stated
451 +* Data gaps identified and disclosed
452 +* Strength of evidence assessed
453 +* Alternative scenarios considered
454 +
116 116  **POC Implementation:**
117 -* ✅ Simple text summary generated
118 -* ❌ No rich formatting (added in Beta 0)
119 -* ❌ No visual elements (added in Beta 0)
120 -* ❌ No interactive features (added in Beta 0)
456 +* ✅ Confidence scores (0-100%)
457 +* ✅ Basic uncertainty acknowledgment
458 +* ❌ No detailed limitation disclosure
459 +* ❌ No data gap identification
460 +* ❌ No alternative scenario consideration (deferred to POC2)
121 121  
122 -**POC Format:**
123 -```
124 -Claim: [extracted claim]
125 -Scenarios: [list of scenarios]
126 -Evidence: [supporting/opposing evidence]
127 -Verdict: [probability with uncertainty]
128 -```
462 +**Pass Criteria:** Confidence score assigned
129 129  
464 +**Failure Handling:** Show "Confidence: Unknown" if calculation fails
130 130  
131 -=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
466 +---
132 132  
133 -**Main Requirements:**
134 -* FR5: Collect supporting and opposing evidence
135 -* FR6: Evaluate evidence source reliability
468 +=== 6.5 Gate 4: Structural Integrity (Basic) ===
136 136  
470 +**Full System Requirements:**
471 +* No hallucinations detected (fact-checking against sources)
472 +* Logic chain valid and traceable
473 +* References accessible and verifiable
474 +* No circular reasoning
475 +* Premises clearly stated
476 +
137 137  **POC Implementation:**
138 -* ✅ AKEL searches for evidence (web/knowledge base)
139 -* ✅ **Mandatory contradiction search** (finds opposing evidence)
140 -* Source reliability scoring
141 -* ❌ No evidence deduplication (added in POC2)
142 -* ❌ No advanced source verification (added in POC2)
478 +* ✅ Basic coherence check
479 +* ✅ References accessible
480 +* ❌ No comprehensive hallucination detection
481 +* ❌ No formal logic validation
482 +* ❌ No premise extraction and verification
143 143  
484 +**Pass Criteria:** Output is coherent and references are accessible
485 +
486 +**Failure Handling:** Display error message
487 +
488 +---
489 +
490 +=== 6.6 Quality Gate Display ===
491 +
492 +**POC shows simplified status:**
493 +{{code}}
494 +Quality Gates: 4/4 Passed (Simplified)
495 +✓ Source Quality: 3 sources found
496 +✓ Contradiction Search: Basic search completed
497 +✓ Uncertainty: Confidence scores assigned
498 +✓ Structural Integrity: Output coherent
499 +{{/code}}
500 +
501 +**If any gate fails:**
502 +{{code}}
503 +Quality Gates: 3/4 Passed (Simplified)
504 +✓ Source Quality: 3 sources found
505 +✗ Contradiction Search: Search failed - limited evidence
506 +✓ Uncertainty: Confidence scores assigned
507 +✓ Structural Integrity: Output coherent
508 +
509 +Note: This analysis has limited evidence. Use with caution.
510 +{{/code}}
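The status block above can be produced from a list of per-gate results. The sketch below assumes a hypothetical `GateResult` record and shows only the Gate 1 pass criterion (≥2 accessible sources) as a check; the other gates are passed in as precomputed results. Attaching the cautionary note to any failing gate is also an assumption, since the specification shows it only for a contradiction-search failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str


def check_source_quality(accessible_source_count: int) -> GateResult:
    """Gate 1 (basic): pass when at least 2 accessible sources were found."""
    passed = accessible_source_count >= 2
    return GateResult(
        "Source Quality", passed, f"{accessible_source_count} sources found"
    )


def render_gate_status(results: list[GateResult]) -> str:
    """Format gate results in the simplified POC display layout."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        # Assumption: the same note is shown whichever gate failed.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_gate_status([
        check_source_quality(3),
        GateResult("Contradiction Search", False, "Search failed - limited evidence"),
        GateResult("Uncertainty", True, "Confidence scores assigned"),
        GateResult("Structural Integrity", True, "Output coherent"),
    ]))
```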
511 +
512 +---
513 +
514 +=== 6.7 Simplified vs. Full System ===
515 +
516 +|=Gate|=POC (Simplified)|=Full System
517 +|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
518 +|Contradiction|Basic search|Systematic academic + media + international
519 +|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
520 +|Structural|Coherence check|Hallucination detection, logic validation, premise check
521 +
522 +**POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523 +
524 +---
525 +
526 +== 7. AKEL Architecture Comparison ==
527 +
528 +=== 7.1 POC AKEL (Simplified) ===
529 +
530 +**Implementation:**
531 +* Single Claude API call (Sonnet 4.5)
532 +* One comprehensive prompt
533 +* All processing in single request
534 +* No separate components
535 +* No orchestration layer
536 +
537 +**Prompt Structure:**
538 +{{code}}
539 +Task: Analyze this article and provide:
540 +
541 +1. Extract 3-5 factual claims
542 +2. For each claim:
543 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 + - Assign confidence score (0-100%)
545 + - Assign risk tier (A/B/C)
546 + - Write brief reasoning (1-3 sentences)
547 +3. Generate analysis summary (3-5 sentences)
548 +4. Generate article summary (3-5 sentences)
549 +5. Run basic quality checks
550 +
551 +Return as structured JSON.
552 +{{/code}}
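The prompt asks for structured JSON but does not define a schema. As a sketch, the response could be validated against the limits stated elsewhere on this page (3-5 claims, four verdict labels, confidence 0-100, risk tier A/B/C). The key names `claims`, `verdict`, `confidence`, and `risk_tier` are hypothetical, and the full label `PARTIALLY SUPPORTED` from the output specification is assumed rather than the abbreviated `PARTIALLY` used in the prompt.

```python
import json

ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_RISK_TIERS = {"A", "B", "C"}


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and check it against the POC limits."""
    data = json.loads(raw)
    claims = data["claims"]
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        if claim["verdict"] not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {claim['verdict']!r}")
        if claim["risk_tier"] not in ALLOWED_RISK_TIERS:
            raise ValueError(f"unknown risk tier: {claim['risk_tier']!r}")
        if not 0 <= claim["confidence"] <= 100:
            raise ValueError(f"confidence out of range: {claim['confidence']!r}")
    return data


if __name__ == "__main__":
    reply = json.dumps({
        "claims": [
            {"text": "Coffee reduces diabetes risk by 30%",
             "verdict": "WELL-SUPPORTED", "confidence": 85, "risk_tier": "C"},
            {"text": "Coffee improves heart health",
             "verdict": "UNCERTAIN", "confidence": 65, "risk_tier": "B"},
            {"text": "Coffee prevents Alzheimer's completely",
             "verdict": "REFUTED", "confidence": 90, "risk_tier": "B"},
        ]
    })
    print(len(parse_analysis(reply)["claims"]), "claims validated")
```

Because POC1 allows no manual correction, a reply that fails these checks would surface as an error message and a retry rather than being repaired by hand.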
553 +
554 +**Processing Time:** 10-18 seconds (estimate)
555 +
556 +---
557 +
558 +=== 7.2 Full System AKEL (Production) ===
559 +
560 +**Architecture:**
561 +{{code}}
562 +AKEL Orchestrator
563 +├── Claim Extractor
564 +├── Claim Classifier (with risk tier assignment)
565 +├── Scenario Generator
566 +├── Evidence Summarizer
567 +├── Contradiction Detector
568 +├── Quality Gate Validator
569 +├── Audit Sampling Scheduler
570 +└── Federation Sync Adapter (Release 1.0+)
571 +{{/code}}
572 +
573 +**Processing:**
574 +* Parallel processing where possible
575 +* Separate component calls
576 +* Quality gates between phases
577 +* Audit sampling selection
578 +* Cross-node coordination (federated mode)
579 +
580 +**Processing Time:** 10-30 seconds (full pipeline)
581 +
582 +---
583 +
584 +=== 7.3 Why POC Uses Single Call ===
585 +
586 +**Advantages:**
587 +* ✅ Simpler to implement
588 +* ✅ Faster POC development
589 +* ✅ Easier to debug
590 +* ✅ Proves AI capability
591 +* ✅ Good enough for concept validation
592 +
593 +**Limitations:**
594 +* ❌ No component reusability
595 +* ❌ No parallel processing
596 +* ❌ All-or-nothing (can't partially succeed)
597 +* ❌ Harder to improve individual components
598 +* ❌ No audit sampling
599 +
600 +**Acceptable Trade-off:**
601 +
602 +POC tests "Can AI do this?" not "How should we architect it?"
603 +
604 +Full component architecture comes in Beta after POC validates concept.
605 +
606 +---
607 +
608 +=== 7.4 Evolution Path ===
609 +
610 +**POC1:** Single prompt → Prove concept
611 +**POC2:** Add scenario component → Test full pipeline
612 +**Beta:** Multi-component AKEL → Production architecture
613 +**Release 1.0:** Full AKEL + Federation → Scale
614 +
615 +---
616 +
617 +== 8. Functional Requirements ==
618 +
619 +=== FR-POC-1: Article Input ===
620 +
621 +**Requirement:** User can submit article for analysis
622 +
623 +**Functionality:**
624 +* Text input field (paste article text, up to 5000 characters)
625 +* URL input field (paste article URL)
626 +* "Analyze" button to trigger processing
627 +* Loading indicator during analysis
628 +
629 +**Excluded:**
630 +* No user authentication
631 +* No claim history
632 +* No search functionality
633 +* No saved templates
634 +
144 144  **Acceptance Criteria:**
145 -* Finds 2+ supporting evidence items
146 -* Finds 1+ opposing evidence (if exists)
147 -* Sources scored for reliability
636 +* User can paste text from article
637 +* User can paste URL of article
638 +* System accepts input and triggers analysis
148 148  
640 +---
149 149  
150 -=== 3.5 FR7: Automated Verdicts (Full Implementation) ===
642 +=== FR-POC-2: Claim Extraction (Fully Automated) ===
151 151  
152 -**Main Requirement:** AI computes verdicts with uncertainty quantification
644 +**Requirement:** AI automatically extracts 3-5 factual claims
153 153  
154 -**POC Implementation:**
155 -* Probabilistic verdicts (0-100% confidence)
156 -* Uncertainty explicitly stated
157 -* Reasoning chain provided
158 -* ✅ Quality Gate 4 validates verdict confidence
646 +**Functionality:**
647 +* AI reads article text
648 +* AI identifies factual claims (not opinions/questions)
649 +* AI extracts 3-5 most important claims
650 +* System displays numbered list
159 159  
160 -**POC Output:**
161 -```
162 -Verdict: 70% likely true
163 -Uncertainty: ±15% (moderate confidence)
164 -Reasoning: Based on 3 high-quality sources...
165 -Confidence Level: MEDIUM
166 -```
652 +**Critical:** NO MANUAL EDITING ALLOWED
653 +* AI selects which claims to extract
654 +* AI identifies factual vs. non-factual
655 +* System processes claims as extracted
656 +* No human curation or correction
167 167  
658 +**Error Handling:**
659 +* If extraction fails: Display error message
660 +* User can retry with different input
661 +* No manual intervention to fix extraction
662 +
168 168  **Acceptance Criteria:**
169 -* Verdicts include probability (0-100%)
170 -* Uncertainty explicitly quantified
171 -* Reasoning chain explains verdict
664 +* AI extracts 3-5 claims automatically
665 +* Claims are factual (not opinions)
666 +* Claims are clearly stated
667 +* No manual editing required
172 172  
669 +---
173 173  
174 -=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
671 +=== FR-POC-3: Verdict Generation (Fully Automated) ===
175 175  
176 -**Main Requirement:** Complete quality assurance with 7 quality gates
673 +**Requirement:** AI automatically generates verdict for each claim
177 177  
178 -**POC Implementation:** **2 gates only**
675 +**Functionality:**
676 +* For each claim, AI:
677 + * Evaluates claim based on available evidence/knowledge
678 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 + * Assigns confidence score (0-100%)
680 + * Assigns risk tier (A/B/C)
681 + * Writes brief reasoning (1-3 sentences)
682 +* System displays verdict for each claim
179 179  
180 -**Quality Gate 1: Claim Validation**
181 -* ✅ Validates claim is factual and verifiable
182 -* ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
183 -* ✅ Provides clear rejection reason
684 +**Critical:** NO MANUAL EDITING ALLOWED
685 +* AI computes verdicts based on evidence
686 +* AI generates confidence scores
687 +* AI writes reasoning
688 +* No human review or adjustment
184 184  
185 -**Quality Gate 4: Verdict Confidence Assessment**
186 -* ✅ Validates ≥2 sources found
187 -* ✅ Validates quality score ≥0.6
188 -* ✅ Blocks low-confidence verdicts
189 -* ✅ Provides clear rejection reason
690 +**Error Handling:**
691 +* If verdict generation fails: Display error message
692 +* User can retry
693 +* No manual intervention to adjust verdicts
190 190  
191 -**Out of Scope (POC2+):**
192 -* ❌ Gate 2: Evidence Relevance
193 -* ❌ Gate 3: Scenario Coherence
194 -* ❌ Gate 5: Source Diversity
195 -* ❌ Gate 6: Reasoning Validity
196 -* ❌ Gate 7: Output Completeness
695 +**Acceptance Criteria:**
696 +* Each claim has a verdict
697 +* Confidence score is displayed (0-100%)
698 +* Risk tier is displayed (A/B/C)
699 +* Reasoning is understandable (1-3 sentences)
700 +* Verdict is defensible given reasoning
701 +* All generated automatically by AI
197 197  
198 -**Rationale:** Prove gate concept works. Add remaining gates in POC2 after validating approach.
703 +---
199 199  
705 +=== FR-POC-4: Analysis Summary (Fully Automated) ===
200 200  
201 -=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
707 +**Requirement:** AI generates brief summary of analysis
202 202  
203 -**Main Requirements:**
204 -* NFR1: Response time < 30 seconds
205 -* NFR2: Handle 1000+ concurrent users
206 -* NFR3: 99.9% uptime
709 +**Functionality:**
710 +* AI summarizes findings in 3-5 sentences:
711 + * How many claims found
712 + * Distribution of verdicts
713 + * Overall assessment
714 +* System displays at top of results
207 207  
208 -**POC Implementation:**
209 -* ⚠️ **Response time monitored** (not optimized)
210 -* ⚠️ **Single-threaded processing** (no concurrency)
211 -* ⚠️ **Basic error handling** (no advanced retry logic)
716 +**Critical:** NO MANUAL EDITING ALLOWED
212 212  
213 -**Rationale:** POC proves functionality. Performance optimization happens in POC2.
718 +**Acceptance Criteria:**
719 +* Summary is coherent
720 +* Accurately reflects analysis
721 +* 3-5 sentences
722 +* Automatically generated
214 214  
215 -**POC Acceptance:**
216 -* Analysis completes (no timeout requirement)
217 -* Errors don't crash system
218 -* Basic logging in place
724 +---
219 219  
726 +=== FR-POC-5: Article Summary (Fully Automated, Optional) ===
220 220  
221 -== 4. What's NOT in POC Scope ==
728 +**Requirement:** AI generates brief summary of original article
222 222  
223 -=== 4.1 User-Facing Features (Beta 0+) ===
730 +**Functionality:**
731 +* AI summarizes article content (not FactHarbor's analysis)
732 +* 3-5 sentences
733 +* System displays
224 224  
225 -{{warning}}
226 -**Deferred to Beta 0:**
227 -{{/warning}}
735 +**Note:** Optional; can be skipped if time is limited
228 228  
229 -**Out of Scope:**
230 -* ❌ User accounts and authentication (FR8)
231 -* ❌ User corrections system (FR9, FR45-46)
232 -* ❌ Public publishing interface (FR10)
233 -* ❌ Social sharing (FR11)
234 -* ❌ Email notifications (FR12)
235 -* ❌ API access (FR13)
737 +**Critical:** NO MANUAL EDITING ALLOWED
236 236  
237 -**Rationale:** POC validates AI capabilities. User features added in Beta 0.
739 +**Acceptance Criteria:**
740 +* Summary is neutral (article's position)
741 +* Accurately reflects article content
742 +* 3-5 sentences
743 +* Automatically generated
238 238  
745 +---
239 239  
240 -=== 4.2 Advanced Features (V1.0+) ===
747 +=== FR-POC-6: Publication Mode Display ===
241 241  
242 -**Out of Scope:**
243 -* ❌ IFCN compliance (FR47)
244 -* ❌ ClaimReview schema (FR48)
245 -* ❌ Archive.org integration (FR49)
246 -* ❌ OSINT toolkit (FR50)
247 -* ❌ Video verification (FR51)
248 -* ❌ Deepfake detection (FR52)
249 -* ❌ Cross-org sharing (FR53)
749 +**Requirement:** Clear labeling of AI-generated content
250 250  
251 -**Rationale:** Advanced features require proven platform. Added post-V1.0.
751 +**Functionality:**
752 +* Display Mode 2 publication label
753 +* Show POC/Demo disclaimer
754 +* Display risk tiers per claim
755 +* Show quality gate status
756 +* Display timestamp
252 252  
758 +**Acceptance Criteria:**
759 +* Label is prominent and clear
760 +* User understands this is AI-generated POC output
761 +* Risk tiers are color-coded
762 +* Quality gate status is visible
253 253  
254 -=== 4.3 Production Requirements (POC2, Beta 0) ===
764 +---
255 255  
256 -**Out of Scope:**
257 -* ❌ Security controls (NFR4, NFR12)
258 -* ❌ Code maintainability (NFR5)
259 -* ❌ System monitoring (NFR13)
260 -* ❌ Evidence deduplication
261 -* ❌ Advanced source verification
262 -* ❌ Full 7-gate quality framework
766 +=== FR-POC-7: Quality Gate Execution ===
263 263  
264 -**Rationale:** POC proves concept. Production hardening happens in POC2 and Beta 0.
768 +**Requirement:** Execute simplified quality gates
265 265  
770 +**Functionality:**
771 +* Check source quality (basic)
772 +* Attempt contradiction search (basic)
773 +* Calculate confidence scores
774 +* Verify structural integrity (basic)
775 +* Display gate results
266 266  
267 -== 5. POC Output Specification ==
777 +**Acceptance Criteria:**
778 +* All 4 gates attempted
779 +* Pass/fail status displayed
780 +* Failures explained to user
781 +* Gates don't block publication (POC mode)
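A minimal Python sketch of the non-blocking gate behaviour described in FR-POC-7. The four gate categories and the "at least 2 sources" check come from this page; the `GateResult` type, the gate names, and the input field names (`sources`, `claims`, `contradiction_search_attempted`, `confidence`) are illustrative assumptions, not a defined schema.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    explanation: str  # shown to the user, mainly relevant on failure


def run_quality_gates(analysis: dict) -> dict:
    """Attempt all four simplified gates and report results without blocking."""
    sources = analysis.get("sources", [])
    claims = analysis.get("claims", [])
    searched = bool(analysis.get("contradiction_search_attempted", False))
    scored = bool(claims) and all(
        isinstance(c.get("confidence"), (int, float)) and 0 <= c["confidence"] <= 100
        for c in claims
    )
    structured = bool(claims) and all("text" in c and "verdict" in c for c in claims)

    gates = [
        GateResult("source_quality", len(sources) >= 2,
                   f"found {len(sources)} source(s), expected at least 2"),
        GateResult("contradiction_search", searched,
                   "contradiction search was attempted" if searched
                   else "contradiction search was not attempted"),
        GateResult("confidence_scores", scored,
                   "every claim has a 0-100 confidence" if scored
                   else "a claim is missing a valid 0-100 confidence"),
        GateResult("structural_integrity", structured,
                   "every claim has text and a verdict" if structured
                   else "a claim is missing its text or verdict"),
    ]
    return {
        "gates": gates,
        "all_passed": all(g.passed for g in gates),
        "publish": True,  # POC mode: failures are displayed but never block
    }
```

Keeping `publish` fixed at `True` makes the POC-mode rule explicit: a failed gate only changes what is displayed, not whether output is shown.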
268 268  
269 -=== 5.1 Required Output Elements ===
783 +---
270 270  
271 -For each analyzed claim, POC must produce:
785 +== 9. Non-Functional Requirements ==
272 272  
273 -**1. Claim**
274 -* Original text
275 -* Classification (factual/non-factual/ambiguous)
276 -* If non-factual: Clear reason why
787 +=== NFR-POC-1: Fully Automated Processing ===
277 277  
278 -**2. Scenarios** (if factual)
279 -* 2-3 interpretation scenarios
280 -* Each scenario clearly described
789 +**Requirement:** Complete AI automation with zero manual intervention
281 281  
282 -**3. Evidence** (if factual)
283 -* Supporting evidence (2+ items)
284 -* Opposing evidence (if exists)
285 -* Source URLs and reliability scores
791 +**Critical Rule:** NO MANUAL EDITING AT ANY STAGE
286 286  
287 -**4. Verdict** (if factual)
288 -* Probability (0-100%)
289 -* Uncertainty quantification
290 -* Confidence level (LOW/MEDIUM/HIGH)
291 -* Reasoning chain
793 +**What this means:**
794 +* Claims: AI selects (no human curation)
795 +* Scenarios: N/A (deferred to POC2)
796 +* Evidence: AI evaluates (no human selection)
797 +* Verdicts: AI determines (no human adjustment)
798 +* Summaries: AI writes (no human editing)
292 292  
293 -**5. Quality Status**
294 -* Which gates passed/failed
295 -* If failed: Clear explanation why
800 +**Pipeline:**
801 +{{code}}
802 +User Input → AKEL Processing → Output Display
803 + ↓
804 + ZERO human editing
805 +{{/code}}
296 296  
807 +**If AI output is poor:**
808 +* ❌ Do NOT manually fix it
809 +* ✅ Document the failure
810 +* ✅ Improve prompts and retry
811 +* ✅ Accept that POC might fail
297 297  
298 -=== 5.2 Example POC Output ===
813 +**Why this matters:**
814 +* Tests whether AI can do this without humans
815 +* Validates scalability (humans can't review every analysis)
816 +* Honest test of technical feasibility
299 299  
300 -{{code language="json"}}
301 -{
302 - "claim": {
303 - "text": "Switzerland has the highest life expectancy in Europe",
304 - "type": "factual",
305 - "gate1_status": "PASS"
306 - },
307 - "scenarios": [
308 - "Switzerland's overall life expectancy is highest",
309 - "Switzerland ranks highest for specific age groups"
310 - ],
311 - "evidence": {
312 - "supporting": [
313 - {
314 - "source": "WHO Report 2023",
315 - "reliability": 0.95,
316 - "excerpt": "Switzerland: 83.4 years average..."
317 - }
318 - ],
319 - "opposing": [
320 - {
321 - "source": "Eurostat 2024",
322 - "reliability": 0.90,
323 - "excerpt": "Spain leads at 83.5 years..."
324 - }
325 - ]
326 - },
327 - "verdict": {
328 - "probability": 0.65,
329 - "uncertainty": 0.15,
330 - "confidence": "MEDIUM",
331 - "reasoning": "WHO and Eurostat show similar but conflicting data...",
332 - "gate4_status": "PASS"
333 - }
334 -}
818 +---
819 +
820 +=== NFR-POC-2: Performance ===
821 +
822 +**Requirement:** Analysis completes in reasonable time
823 +
824 +**Acceptable Performance:**
825 +* Processing time: 1-5 minutes (acceptable for POC)
826 +* Display loading indicator to user
827 +* Show progress if possible ("Extracting claims...", "Generating verdicts...")
828 +
829 +**Not Required:**
830 +* Production-level speed (< 30 seconds)
831 +* Optimization for scale
832 +* Caching
833 +
834 +**Acceptance Criteria:**
835 +* Analysis completes within 5 minutes
836 +* User sees loading indicator
837 +* No timeout errors
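One possible way to combine the progress messages with the 5-minute limit, sketched in Python. The function name `run_with_progress`, the `(label, step)` stage format, and the injectable `clock` and `report` parameters are assumptions made for illustration. The budget is only checked between stages, so a stage that hangs is not interrupted.

```python
import time

MAX_SECONDS = 5 * 60  # upper bound taken from the acceptance criteria


def run_with_progress(stages, report=print, clock=time.monotonic, budget=MAX_SECONDS):
    """Run (label, step) stages in order, reporting progress and checking the budget.

    Each step receives the previous step's result (None for the first step).
    """
    started = clock()
    value = None
    for label, step in stages:
        report(label)  # e.g. "Extracting claims..."
        value = step(value)
        if clock() - started > budget:
            raise TimeoutError(f"analysis exceeded {budget}s after stage {label!r}")
    return value, clock() - started
```

Passing `report=messages.append` instead of `print` lets a frontend collect the progress labels for its loading indicator.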
838 +
839 +---
840 +
841 +=== NFR-POC-3: Reliability ===
842 +
843 +**Requirement:** System works for manual testing sessions
844 +
845 +**Acceptable:**
846 +* Occasional errors (< 20% failure rate)
847 +* Manual restart if needed
848 +* Display error messages clearly
849 +
850 +**Not Required:**
851 +* 99.9% uptime
852 +* Automatic error recovery
853 +* Production monitoring
854 +
855 +**Acceptance Criteria:**
856 +* System works for test demonstrations
857 +* Errors are handled gracefully
858 +* User receives clear error messages
859 +
860 +---
861 +
862 +=== NFR-POC-4: Environment ===
863 +
864 +**Requirement:** Runs on simple infrastructure
865 +
866 +**Acceptable:**
867 +* Single machine or simple cloud setup
868 +* No distributed architecture
869 +* No load balancing
870 +* No redundancy
871 +* Local development environment viable
872 +
873 +**Not Required:**
874 +* Production infrastructure
875 +* Multi-region deployment
876 +* Auto-scaling
877 +* Disaster recovery
878 +
879 +---
880 +
881 +== 10. Technical Architecture ==
882 +
883 +=== 10.1 System Components ===
884 +
885 +**Frontend:**
886 +* Simple HTML form (text input + URL input + button)
887 +* Loading indicator
888 +* Results display page (single page, no tabs/navigation)
889 +
890 +**Backend:**
891 +* Single API endpoint
892 +* Calls Claude API (Sonnet 4.5 or latest)
893 +* Parses response
894 +* Returns JSON to frontend
895 +
896 +**Data Storage:**
897 +* None required (stateless POC)
898 +* Optional: Simple file storage or SQLite for demo examples
899 +
900 +**External Services:**
901 +* Claude API (Anthropic) - required
902 +* Optional: URL fetch service for article text extraction
903 +
904 +---
905 +
906 +=== 10.2 Processing Flow ===
907 +
908 +{{code}}
909 +1. User submits text or URL
910 + ↓
911 +2. Backend receives request
912 + ↓
913 +3. If URL: Fetch article text
914 + ↓
915 +4. Call Claude API with single prompt:
916 + "Extract claims, evaluate each, provide verdicts"
917 + ↓
918 +5. Claude API returns:
919 + - Analysis summary
920 + - Claims list
921 + - Verdicts for each claim (with risk tiers)
922 + - Article summary (optional)
923 + - Quality gate results
924 + ↓
925 +6. Backend parses response
926 + ↓
927 +7. Frontend displays results with Mode 2 labeling
335 335  {{/code}}
336 336  
930 +**Key Simplification:** Single API call does entire analysis
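The seven steps can be sketched as one Python function. This is a sketch under stated assumptions: `fetch_article` and `call_model` are hypothetical callables injected by the caller (the real URL fetcher and Claude API client are not specified here), and the `publication_mode` key is an invented field name for the Mode 2 label.

```python
import json
from typing import Callable, Optional


def analyze(
    text: Optional[str],
    url: Optional[str],
    fetch_article: Callable[[str], str],  # hypothetical: URL -> article text
    call_model: Callable[[str], str],     # hypothetical: article text -> JSON string
) -> dict:
    """Steps 2-7 of the processing flow for a single request."""
    if url:                                   # step 3: fetch only when a URL is given
        text = fetch_article(url)
    if not text or not text.strip():
        raise ValueError("no article text to analyze")
    raw = call_model(text)                    # steps 4-5: one call does the whole analysis
    analysis = json.loads(raw)                # step 6: parse the structured response
    analysis["publication_mode"] = "Mode 2"   # step 7: label for the frontend
    return analysis
```

Injecting the two callables keeps the function stateless and lets it be exercised with stubs before any API key or fetch service exists.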
337 337  
338 -== 6. Success Criteria ==
932 +---
339 339  
340 -{{success}}
341 -**POC Success Definition:** POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts with quality gates improving output quality.
342 -{{/success}}
934 +=== 10.3 AI Prompt Strategy ===
343 343  
344 -=== 6.1 Functional Success ===
936 +**Single Comprehensive Prompt:**
937 +{{code}}
938 +Task: Analyze this article and provide:
345 345  
346 -POC is successful if:
940 +1. Extract 3-5 factual claims from the article
941 +2. For each claim:
942 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 + - Assign confidence score (0-100%)
944 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 + - Write brief reasoning (1-3 sentences)
946 +3. Run quality gates:
947 + - Check: ≥2 sources found
948 + - Attempt: Basic contradiction search
949 + - Calculate: Confidence scores
950 + - Verify: Structural integrity
951 +4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 +5. Write article summary (3-5 sentences: neutral summary of article content)
347 347  
348 -✅ **FR1-FR7 Requirements Met:**
349 -1. Extracts 3-5 factual claims from test articles
350 -2. Generates 2-3 scenarios per ambiguous claim
351 -3. Finds supporting AND opposing evidence
352 -4. Computes probabilistic verdicts with uncertainty
353 -5. Provides clear reasoning chains
954 +Return as structured JSON with quality gate results.
955 +{{/code}}
354 354  
355 -✅ **Quality Gates Work:**
356 -1. Gate 1 blocks non-factual claims (100% block rate)
357 -2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
358 -3. Clear rejection reasons provided
957 +**One prompt generates everything.**
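Because everything arrives in one response, the backend may want to check its shape before display. The Python sketch below assumes a JSON layout that this page does not define: the key names `claims`, `verdict`, `confidence`, `risk_tier`, `reasoning`, `analysis_summary` and `quality_gates` are illustrative only. The verdict labels, risk tiers, 0-100 range and 3-5 claim count come from the prompt above.

```python
import json
from typing import List

VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


def validate_analysis(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the shape is as requested."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    claims = data.get("claims")
    if not isinstance(claims, list):
        problems.append("claims is missing or not a list")
        claims = []
    elif not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")

    for i, claim in enumerate(claims):
        if not isinstance(claim, dict):
            problems.append(f"claim {i}: not an object")
            continue
        if claim.get("verdict") not in VERDICTS:
            problems.append(f"claim {i}: unknown verdict")
        conf = claim.get("confidence")
        if not isinstance(conf, (int, float)) or not 0 <= conf <= 100:
            problems.append(f"claim {i}: confidence must be 0-100")
        if claim.get("risk_tier") not in RISK_TIERS:
            problems.append(f"claim {i}: risk tier must be A, B or C")
        if not claim.get("reasoning"):
            problems.append(f"claim {i}: reasoning is missing")

    # The article summary is optional in this POC, so it is not required here.
    for key in ("analysis_summary", "quality_gates"):
        if key not in data:
            problems.append(f"missing {key}")
    return problems
```

In line with the no-manual-editing rule, a non-empty problem list would be logged as a documented failure rather than patched by hand.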
359 359  
360 -✅ **NFR11 Met:**
361 -1. Quality gates reduce hallucination rate
362 -2. Blocked outputs have clear explanations
363 -3. Quality metrics are logged
959 +---
364 364  
961 +=== 10.4 Technology Stack Suggestions ===
365 365  
366 -=== 6.2 Quality Thresholds ===
963 +**Frontend:**
964 +* HTML + CSS + JavaScript (minimal framework)
965 +* OR: Next.js (if team prefers)
966 +* Hosted: Local machine OR Vercel/Netlify free tier
367 367  
368 -**Minimum Acceptable:**
369 -* ≥70% of test claims correctly classified (factual/non-factual)
370 -* ≥60% of verdicts are reasonable (human evaluation)
371 -* Gate 1 blocks 100% of non-factual claims
372 -* Gate 4 blocks verdicts with <2 sources
968 +**Backend:**
969 +* Python Flask/FastAPI (simple REST API)
970 +* OR: Next.js API routes (if using Next.js)
971 +* Hosted: Local machine OR Railway/Render free tier
373 373  
374 -**Target:**
375 -* ≥80% claims correctly classified
376 -* ≥75% verdicts are reasonable
377 -* <10% false positives (blocking good claims)
973 +**AKEL Integration:**
974 +* Claude API via Anthropic SDK
975 +* Model: Claude Sonnet 4.5 or latest available
378 378  
977 +**Database:**
978 +* None (stateless acceptable)
979 +* OR: SQLite, if demo examples need to be stored
980 +* OR: JSON files on disk
379 379  
380 -=== 6.3 POC Decision Gate ===
982 +**Deployment:**
983 +* Local development environment sufficient for POC
984 +* Optional: Deploy to cloud for remote demos
381 381  
382 -**After POC1, we decide:**
986 +---
383 383  
384 -**✅ PROCEED to POC2** if:
385 -* Success criteria met
386 -* Quality gates demonstrably improve output
387 -* Core workflow is technically sound
388 -* Clear path to production quality
988 +== 11. Success Criteria ==
389 389  
390 -**⚠️ ITERATE POC1** if:
391 -* Success criteria partially met
392 -* Gates work but need tuning
393 -* Core issues identified but fixable
990 +=== 11.1 Minimum Success (POC Passes) ===
394 394  
395 -**❌ PIVOT APPROACH** if:
396 -* Success criteria not met
397 -* Fundamental AI limitations discovered
398 -* Quality gates insufficient
399 -* Alternative approach needed
992 +**Required for GO decision:**
993 +* ✅ AI extracts 3-5 factual claims automatically
994 +* ✅ AI provides verdict for each claim automatically
995 +* ✅ Verdicts are reasonable (≥70% make logical sense)
996 +* ✅ Analysis summary is coherent
997 +* ✅ Output is comprehensible to reviewers
998 +* ✅ Team/advisors understand the output
999 +* ✅ Team agrees approach has merit
1000 +* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
400 400  
1002 +**Quality Definition:**
1003 +* "Reasonable verdict" = Defensible given general knowledge
1004 +* "Coherent summary" = Logically structured, grammatically correct
1005 +* "Comprehensible" = Reviewers understand what analysis means
401 401  
402 -== 7. Test Cases ==
1007 +---
403 403  
404 -=== 7.1 Happy Path ===
1009 +=== 11.2 POC Fails If ===
405 405  
406 -**Test 1: Simple Factual Claim**
407 -* Input: "Paris is the capital of France"
408 -* Expected: Factual, 1 scenario, verdict ~95% true
1011 +**Automatic NO-GO if any of these:**
1012 +* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1013 +* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1014 +* ❌ Output incomprehensible (reviewers can't understand analysis)
1015 +* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 +* ❌ Team loses confidence in AI-automated approach
409 409  
410 -**Test 2: Ambiguous Claim**
411 -* Input: "Switzerland has the highest income in Europe"
412 -* Expected: Factual, 2-3 scenarios, verdict with uncertainty
1018 +---
413 413  
414 -**Test 3: Statistical Claim**
415 -* Input: "10% of people have condition X"
416 -* Expected: Factual, evidence with numbers, probabilistic verdict
1020 +=== 11.3 Quality Thresholds ===
417 417  
1022 +**POC quality expectations:**
418 418  
419 -=== 7.2 Edge Cases ===
1024 +|=Component|=Quality Threshold|=Definition
1025 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1026 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1027 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1028 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
420 420  
421 -**Test 4: Opinion**
422 -* Input: "Paris is the best city"
423 -* Expected: Non-factual (opinion), blocked by Gate 1
1030 +**Analogy:** "B student" quality (70-80%), not "A+" perfection yet
424 424  
425 -**Test 5: Prediction**
426 -* Input: "Bitcoin will reach $100,000 next year"
427 -* Expected: Non-factual (prediction), blocked by Gate 1
1032 +**Not expecting:**
1033 +* 100% accuracy
1034 +* Perfect claim coverage
1035 +* Comprehensive evidence gathering
1036 +* Flawless verdicts
1037 +* Production polish
428 428  
429 -**Test 6: Insufficient Evidence**
430 -* Input: Obscure factual claim with no sources
431 -* Expected: Blocked by Gate 4 (<2 sources)
1039 +**Expecting:**
1040 +* Reasonable claim extraction
1041 +* Defensible verdicts
1042 +* Understandable reasoning
1043 +* Useful output
432 432  
1045 +---
433 433  
434 -=== 7.3 Quality Gate Tests ===
1047 +== 12. Test Cases ==
435 435  
436 -**Test 7: Gate 1 Effectiveness**
437 -* Input: Mix of 10 factual + 10 non-factual claims
438 -* Expected: Gate 1 blocks all 10 non-factual (100% precision)
1049 +=== 12.1 Test Case 1: Simple Factual Claim ===
439 439  
440 -**Test 8: Gate 4 Effectiveness**
441 -* Input: Claims with varying evidence availability
442 -* Expected: Gate 4 blocks low-confidence verdicts
1051 +**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
443 443  
1053 +**Expected Output:**
1054 +* Extract claim correctly
1055 +* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1056 +* Confidence: 70-90%
1057 +* Risk tier: C (Low)
1058 +* Reasoning: Mentions studies or evidence
444 444  
445 -== 8. Technical Architecture (POC) ==
1060 +**Success:** Verdict is reasonable and reasoning makes sense
446 446  
447 -=== 8.1 Simplified Architecture ===
1062 +---
448 448  
449 -**POC Tech Stack:**
450 -* **Frontend:** Simple web interface (Next.js + TypeScript)
451 -* **Backend:** Single API endpoint
452 -* **AI:** Claude API (Sonnet 4.5)
453 -* **Database:** Local JSON files (no database)
454 -* **Deployment:** Single server
1064 +=== 12.2 Test Case 2: Complex News Article ===
455 455  
456 -**Architecture Diagram:** See [[POC1 Specification>>FactHarbor.Specification.POC.Specification]]
1066 +**Input:** News article URL with multiple claims about politics/health/science
457 457  
1068 +**Expected Output:**
1069 +* Extract 3-5 key claims
1070 +* Verdict for each (may vary: some supported, some uncertain, some refuted)
1071 +* Coherent analysis summary
1072 +* Article summary
1073 +* Risk tiers assigned appropriately
458 458  
459 -=== 8.2 AKEL Implementation ===
1075 +**Success:** Claims identified are actually from article, verdicts are reasonable
460 460  
461 -**POC AKEL:**
462 -* Single-threaded processing
463 -* Synchronous API calls
464 -* No caching
465 -* Basic error handling
466 -* Console logging
1077 +---
467 467  
468 -**Full AKEL (POC2+):**
469 -* Multi-threaded processing
470 -* Async API calls
471 -* Evidence caching
472 -* Advanced error handling with retry
473 -* Structured logging + monitoring
1079 +=== 12.3 Test Case 3: Controversial Topic ===
474 474  
1081 +**Input:** Article on contested political or scientific topic
475 475  
476 -== 9. POC Philosophy ==
1083 +**Expected Output:**
1084 +* Balanced analysis
1085 +* Acknowledges uncertainty where appropriate
1086 +* Doesn't overstate confidence
1087 +* Reasoning shows awareness of complexity
477 477  
478 -{{info}}
479 -**Important:** POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases.
480 -{{/info}}
1089 +**Success:** Analysis is fair and doesn't show obvious bias
481 481  
482 -=== 9.1 Core Principles ===
1091 +---
483 483  
484 -**1. Prove Concept, Not Production**
485 -* POC validates AI can do the job
486 -* Production quality comes in POC2 and Beta 0
487 -* Focus on "does it work?" not "is it perfect?"
1093 +=== 12.4 Test Case 4: Clearly False Claim ===
488 488  
489 -**2. Implement Subset of Requirements**
490 -* POC covers FR1-7, NFR11 (lite)
491 -* All other requirements deferred
492 -* Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1095 +**Input:** Article with obviously false claim (e.g., "The Earth is flat")
493 493  
494 -**3. Quality Gates Validate Approach**
495 -* 2 gates prove the concept
496 -* Remaining 5 gates added in POC2
497 -* Gates must demonstrably improve quality
1097 +**Expected Output:**
1098 +* Extract claim
1099 +* Verdict: REFUTED
1100 +* High confidence (> 90%)
1101 +* Risk tier: C (Low - established fact)
1102 +* Clear reasoning
498 498  
499 -**4. Iterate Based on Results**
500 -* POC results determine next steps
501 -* Decision gate after POC1
502 -* Flexibility to pivot if needed
1104 +**Success:** AI correctly identifies false claim with high confidence
503 503  
1106 +---
504 504  
505 -=== 9.2 Success = Clear Path Forward ===
1108 +=== 12.5 Test Case 5: Genuinely Uncertain Claim ===
506 506  
507 -POC succeeds if we can confidently answer:
1110 +**Input:** Article with claim where evidence is genuinely mixed
508 508  
509 -✅ **Technical Feasibility:**
510 -* Can AI extract claims reliably?
511 -* Can AI find balanced evidence?
512 -* Can AI compute reasonable verdicts?
1112 +**Expected Output:**
1113 +* Extract claim
1114 +* Verdict: UNCERTAIN
1115 +* Moderate confidence (40-60%)
1116 +* Reasoning explains why uncertain
513 513  
514 -✅ **Quality Approach:**
515 -* Do quality gates improve output?
516 -* Can we measure and track quality?
517 -* Is the gate approach scalable?
1118 +**Success:** AI recognizes uncertainty and doesn't overstate confidence
518 518  
519 -✅ **Production Path:**
520 -* Is the core architecture sound?
521 -* What needs improvement for production?
522 -* Is POC2 the right next step?
1120 +---
523 523  
1122 +=== 12.6 Test Case 6: High-Risk Medical Claim ===
524 524  
525 -== 10. Related Pages ==
1124 +**Input:** Article making medical claims
526 526  
527 -* **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
528 -* **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
529 -* **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
530 -* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
531 -* **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
1126 +**Expected Output:**
1127 +* Extract claim
1128 +* Verdict: [appropriate based on evidence]
1129 +* Risk tier: A (High - medical)
1130 +* Red label displayed
1131 +* Clear disclaimer about not being medical advice
532 532  
1133 +**Success:** Risk tier correctly assigned, appropriate warnings shown
533 533  
534 -**Document Owner:** Technical Team
535 -**Review Frequency:** After each POC iteration
536 -**Version History:**
537 -* v1.0 - Initial POC requirements
538 -* v2.0 - Updated after specification cross-check
539 -* v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
1135 +---
540 540  
1137 +== 13. POC Decision Gate ==
1138 +
1139 +=== 13.1 Decision Framework ===
1140 +
1141 +After POC testing is complete, the team makes one of three decisions:
1142 +
1143 +**Option A: GO (Proceed to POC2)**
1144 +
1145 +**Conditions:**
1146 +* AI quality ≥70% without manual editing
1147 +* Basic claim → verdict pipeline validated
1148 +* Internal + advisor feedback positive
1149 +* Technical feasibility confirmed
1150 +* Team confident in direction
1151 +* Clear path to improving AI quality to ≥90%
1152 +
1153 +**Next Steps:**
1154 +* Plan POC2 development (add scenarios)
1155 +* Design scenario architecture
1156 +* Expand to Evidence Model structure
1157 +* Test with more complex articles
1158 +
1159 +---
1160 +
1161 +**Option B: NO-GO (Pivot or Stop)**
1162 +
1163 +**Conditions:**
1164 +* AI quality < 60%
1165 +* Requires manual editing for most analyses (> 50%)
1166 +* Feedback indicates fundamental flaws
1167 +* Cost/effort not justified by value
1168 +* No clear path to improvement
1169 +
1170 +**Next Steps:**
1171 +* **Pivot:** Switch to a hybrid human-AI approach (accept that manual review is required)
1172 +* **Stop:** Conclude that the approach is not viable; revisit later
1173 +
1174 +---
1175 +
1176 +**Option C: ITERATE (Improve POC)**
1177 +
1178 +**Conditions:**
1179 +* Concept has merit but execution needs work
1180 +* Specific improvements identified
1181 +* Addressable with better prompts/approach
1182 +* AI quality between 60% and 70% (at least 60%, below 70%)
1183 +
1184 +**Next Steps:**
1185 +* Improve AI prompts
1186 +* Test different approaches
1187 +* Re-run POC with improvements
1188 +* Then make GO/NO-GO decision
1189 +
1190 +---
1191 +
1192 +=== 13.2 Decision Criteria Summary ===
1193 +
1194 +{{code}}
1195 +AI Quality < 60% → NO-GO (approach doesn't work)
1196 +AI Quality 60% to <70% → ITERATE (improve and retry)
1197 +AI Quality ≥70% → GO (proceed to POC2)
1198 +{{/code}}
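The decision rule can be written as a small Python function. The names `quality_from_reviews` and `poc_decision` are hypothetical, and measuring "AI quality" as the share of reviewer-approved outputs is an assumption; this page does not fix a single metric. The sketch assigns exactly 70% to GO, following the "≥70%" row.

```python
from typing import Sequence


def quality_from_reviews(reasonable: Sequence[bool]) -> float:
    """Percentage of reviewed outputs that reviewers judged reasonable."""
    if not reasonable:
        raise ValueError("at least one reviewed output is required")
    return 100.0 * sum(reasonable) / len(reasonable)


def poc_decision(quality_percent: float) -> str:
    """Map an AI quality percentage to NO-GO, ITERATE or GO."""
    if not 0 <= quality_percent <= 100:
        raise ValueError("quality must be between 0 and 100")
    if quality_percent < 60:
        return "NO-GO"
    if quality_percent < 70:
        return "ITERATE"
    return "GO"
```

For example, 7 reasonable outputs out of 10 reviewed gives 70%, which maps to GO, while 6 out of 10 gives 60% and maps to ITERATE.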
1199 +
1200 +---
1201 +
1202 +== 14. Key Risks & Mitigations ==
1203 +
1204 +=== 14.1 Risk: AI Quality Not Good Enough ===
1205 +
1206 +**Likelihood:** Medium-High
1207 +**Impact:** POC fails
1208 +
1209 +**Mitigation:**
1210 +* Extensive prompt engineering and testing
1211 +* Use best available AI models (Sonnet 4.5)
1212 +* Test with diverse article types
1213 +* Iterate on prompts based on results
1214 +
1215 +**Acceptance:** This is exactly what the POC tests; be prepared for failure
1216 +
1217 +---
1218 +
1219 +=== 14.2 Risk: AI Consistency Issues ===
1220 +
1221 +**Likelihood:** Medium
1222 +**Impact:** Works sometimes, fails other times
1223 +
1224 +**Mitigation:**
1225 +* Test with 10+ diverse articles
1226 +* Measure success rate honestly
1227 +* Improve prompts to increase consistency
1228 +
1229 +**Acceptance:** Some variability is acceptable if average quality is ≥70%
1230 +
1231 +---
1232 +
1233 +=== 14.3 Risk: Output Incomprehensible ===
1234 +
1235 +**Likelihood:** Low-Medium
1236 +**Impact:** Users can't understand analysis
1237 +
1238 +**Mitigation:**
1239 +* Create clear explainer document
1240 +* Iterate on output format
1241 +* Test with non-technical reviewers
1242 +* Simplify language if needed
1243 +
1244 +**Acceptance:** Iterate until comprehensible
1245 +
1246 +---
1247 +
1248 +=== 14.4 Risk: API Rate Limits / Costs ===
1249 +
1250 +**Likelihood:** Low
1251 +**Impact:** System slow or expensive
1252 +
1253 +**Mitigation:**
1254 +* Monitor API usage
1255 +* Implement retry logic
1256 +* Estimate costs before scaling
1257 +
1258 +**Acceptance:** POC can be slow and expensive (optimization later)
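The "retry logic" mitigation could be as simple as the following Python sketch. The helper name `call_with_retry`, the default of three attempts and the exponential backoff schedule are assumptions; no specific SDK exception types are referenced, so callers would pass the exceptions they consider retryable via `retry_on`.

```python
import time


def call_with_retry(call, attempts=3, base_delay=1.0, sleep=time.sleep,
                    retry_on=(Exception,)):
    """Call `call()` up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
```

Counting the retries alongside API usage keeps the cost impact visible, since each retry is another billed request.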
1259 +
1260 +---
1261 +
1262 +=== 14.5 Risk: Scope Creep ===
1263 +
1264 +**Likelihood:** Medium
1265 +**Impact:** POC becomes too complex
1266 +
1267 +**Mitigation:**
1268 +* Strict scope discipline
1269 +* Say NO to feature additions
1270 +* Keep focus on core question
1271 +
1272 +**Acceptance:** POC is minimal by design
1273 +
1274 +---
1275 +
1276 +== 15. POC Philosophy ==
1277 +
1278 +=== 15.1 Core Principles ===
1279 +
1280 +**1. Build Less, Learn More**
1281 +* Minimum features to test hypothesis
1282 +* Don't build unvalidated features
1283 +* Focus on core question only
1284 +
1285 +**2. Fail Fast**
1286 +* Quick test of hardest part (AI capability)
1287 +* Accept that POC might fail
1288 +* Better to discover issues early
1289 +* Honest assessment over optimistic hope
1290 +
1291 +**3. Test First, Build Second**
1292 +* Validate AI can do this before building platform
1293 +* Don't assume it will work
1294 +* Let results guide decisions
1295 +
1296 +**4. Automation First**
1297 +* No manual editing allowed
1298 +* Tests scalability, not just feasibility
1299 +* Proves approach can work at scale
1300 +
1301 +**5. Honest Assessment**
1302 +* Don't cherry-pick examples
1303 +* Don't manually fix bad outputs
1304 +* Document failures openly
1305 +* Make data-driven decisions
1306 +
1307 +---
1308 +
1309 +=== 15.2 What POC Is ===
1310 +
1311 +✅ Testing AI capability without humans
1312 +✅ Proving core technical concept
1313 +✅ Fast validation of approach
1314 +✅ Honest assessment of feasibility
1315 +
1316 +---
1317 +
1318 +=== 15.3 What POC Is NOT ===
1319 +
1320 +❌ Building a product
1321 +❌ Production-ready system
1322 +❌ Feature-complete platform
1323 +❌ Perfectly accurate analysis
1324 +❌ Polished user experience
1325 +
1326 +---
1327 +
1328 +== 16. Success = Clear Path Forward ==
1329 +
1330 +**If POC succeeds (≥70% AI quality):**
1331 +* ✅ Approach validated
1332 +* ✅ Proceed to POC2 (add scenarios)
1333 +* ✅ Design full Evidence Model structure
1334 +* ✅ Test multi-scenario comparison
1335 +* ✅ Focus on improving AI quality from 70% → 90%
1336 +
1337 +**If POC fails (< 60% AI quality):**
1338 +* ✅ Learn what doesn't work
1339 +* ✅ Pivot to different approach
1340 +* ✅ OR wait for better AI technology
1341 +* ✅ Avoid wasting resources on non-viable approach
1342 +
1343 +**Either way, POC provides clarity.**
1344 +
1345 +---
1346 +
1347 +== 17. Related Pages ==
1348 +
1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1352 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 +* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 +* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355 +
1356 +---
1357 +
1358 +**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359 +