Last modified by Robert Schaub on 2025/12/23 18:00

From version 1.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR
To version 2.1
edited by Robert Schaub
on 2025/12/23 17:44
Change comment: Imported from XAR

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements
1 +POC Requirements (POC1 & POC2)
Parent
... ... @@ -1,1 +1,1 @@
1 -FactHarbor.Specification.POC.WebHome
1 +WebHome
Content
... ... @@ -1,11 +1,14 @@
1 1  = POC Requirements =
2 2  
3 3  **Status:** ✅ Approved for Development
4 -**Version:** 2.0 (Updated after Specification Cross-Check)
4 +**Version:** 3.0 (Aligned with Main Requirements)
5 5  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 ----
7 +{{info}}
8 +**Core Philosophy:** POC validates the [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]] through simplified implementation. All POC features map to formal FR/NFR requirements.
9 +{{/info}}
8 8  
11 +
9 9  == 1. POC Overview ==
10 10  
11 11  === 1.1 What POC Tests ===
... ... @@ -15,1345 +15,523 @@
15 15  
16 16  **What we're proving:**
17 17  * AI can identify factual claims from text
18 -* AI can evaluate those claims and produce verdicts
19 -* Output is comprehensible and useful
20 -* Fully automated approach is viable
21 +* AI can evaluate those claims with structured evidence
22 +* Quality gates can filter unreliable outputs
23 +* The core workflow is technically feasible
21 21  
22 -**What we're NOT testing:**
23 -* Scenario generation (deferred to POC2)
24 -* Evidence display (deferred to POC2)
25 -* Production scalability
26 -* Perfect accuracy
27 -* Complete feature set
25 +**What we're NOT proving:**
26 +* Production-ready reliability (that's POC2)
27 +* User-facing features (that's Beta 0)
28 +* Full IFCN compliance (that's V1.0)
28 28  
29 ----
30 +=== 1.2 Requirements Mapping ===
30 30  
31 -=== 1.2 Scenarios Deferred to POC2 ===
32 +POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
32 32  
33 -**Intentional Simplification:**
34 +**Scope Summary:**
35 +* **In Scope:** 8 requirements (7 FRs + 1 NFR)
36 +* **Partial:** 3 NFRs (simplified versions)
37 +* **Out of Scope:** 19 requirements (deferred to later phases)
34 34  
35 -Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
36 36  
37 -**Rationale:**
38 -* **POC1 tests:** Can AI extract claims and generate verdicts?
39 -* **POC2 will add:** Scenario generation and management
40 -* **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
40 +== 2. POC1 Scope ==
41 41  
42 -**Design Decision:**
42 +{{success}}
43 +**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
43 43  
44 -Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
45 +The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
46 +{{/success}}
45 45  
46 -**No Risk:**
48 +**POC1 implements these formal requirements:**
47 47  
48 -Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49 -* Faster POC1 validation
50 -* Learning from POC1 to inform scenario design
51 -* Iterative approach: fail fast if basic AI doesn't work
52 -* Flexibility to adjust scenario architecture based on POC1 insights
50 +|= Formal Req |= Implementation in POC1 |= Notes
51 +| **FR4** | Analysis Summary | Basic format; quality metadata deferred to POC2
52 +| **FR7** | Automated Verdicts | Full implementation with quality gates (NFR11)
53 +| **NFR11** | Quality Assurance Framework | 2 of 7 quality gates implemented (Gates 1 and 4)
53 53  
54 -**Full System Workflow (Future):**
55 -{{code}}
56 -Claims → Scenarios → Evidence → Verdicts
57 -{{/code}}
55 +**POC1 also implements these workflow components** (detailed as FR1-FR6 in implementation sections below):
58 58  
59 -**POC1 Simplified Workflow:**
60 -{{code}}
61 -Claims → Verdicts (scenarios implicit in reasoning)
62 -{{/code}}
57 +{{info}}
58 +**Note:** FR11 (Audit Trail) and FR13 (In-Article Claim Highlighting) are deferred to Beta 0 for production readiness and user experience enhancement.
59 +{{/info}}
60 +* Claim extraction (FR1)
61 +* Claim context (FR2)
62 +* Multiple scenarios (FR3)
63 +* Evidence collection (FR5)
64 +* Source quality assessment (FR6)
65 +* Time evolution tracking (FR8) - deferred to POC2
66 +* Audit trail (FR11) - deferred to Beta 0
67 +* In-article highlighting (FR13) - deferred to Beta 0
63 63  
64 ----
69 +**Partial implementations:**
70 +* NFR1 (Explainability) - Basic only
71 +* NFR2 (Performance) - Functional but not optimized
72 +* NFR3 (Transparency) - Basic only
65 65  
66 -== 2. POC Output Specification ==
74 +**Detailed POC1 implementation specifications continue below...**
67 67  
68 -=== 2.1 Component 1: ANALYSIS SUMMARY ===
69 69  
70 -**What:** Brief overview of findings
71 -**Length:** 3-5 sentences
72 -**Content:**
73 -* How many claims found
74 -* Distribution of verdicts
75 -* Overall assessment
76 76  
77 -**Example:**
78 -{{code}}
79 -This article makes 4 claims about coffee's health effects. We found
80 -2 claims are well-supported, 1 is uncertain, and 1 is refuted.
81 -Overall assessment: mostly accurate with some exaggeration.
82 -{{/code}}
78 +== 3. POC Simplifications ==
83 83  
84 ----
80 +=== 3.1 FR1: Claim Extraction (Full Implementation) ===
85 85  
86 -=== 2.2 Component 2: CLAIMS IDENTIFICATION ===
82 +**Main Requirement:** AI extracts factual claims from input text
87 87  
88 -**What:** List of factual claims extracted from article
89 -**Format:** Numbered list
90 -**Quantity:** 3-5 claims
91 -**Requirements:**
92 -* Factual claims only (not opinions/questions)
93 -* Clearly stated
94 -* Automatically extracted by AI
95 -
96 -**Example:**
97 -{{code}}
98 -CLAIMS IDENTIFIED:
99 -
100 -[1] Coffee reduces diabetes risk by 30%
101 -[2] Coffee improves heart health
102 -[3] Decaf has same benefits as regular
103 -[4] Coffee prevents Alzheimer's completely
104 -{{/code}}
105 -
106 ----
107 -
108 -=== 2.3 Component 3: CLAIMS VERDICTS ===
109 -
110 -**What:** Verdict for each claim identified
111 -**Format:** Per claim structure
112 -
113 -**Required Elements:**
114 -* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 -* **Confidence Score:** 0-100%
116 -* **Brief Reasoning:** 1-3 sentences explaining why
117 -* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
118 -
119 -**Example:**
120 -{{code}}
121 -VERDICTS:
122 -
123 -[1] WELL-SUPPORTED (85%) [Risk: C]
124 -Multiple studies confirm 25-30% risk reduction with regular consumption.
125 -
126 -[2] UNCERTAIN (65%) [Risk: B]
127 -Evidence is mixed. Some studies show benefits, others show no effect.
128 -
129 -[3] PARTIALLY SUPPORTED (60%) [Risk: C]
130 -Some benefits overlap, but caffeine-related benefits are reduced in decaf.
131 -
132 -[4] REFUTED (90%) [Risk: B]
133 -No evidence for complete prevention. Claim is significantly overstated.
134 -{{/code}}
135 -
136 -**Risk Tier Display:**
137 -* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 -* **Tier C (Green):** Low Risk - Facts/Definitions/History
140 -
141 -**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
142 -
143 ----
144 -
145 -=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146 -
147 -**What:** Brief summary of original article content
148 -**Length:** 3-5 sentences
149 -**Tone:** Neutral (article's position, not FactHarbor's analysis)
150 -
151 -**Example:**
152 -{{code}}
153 -ARTICLE SUMMARY:
154 -
155 -Health News Today article discusses coffee benefits, citing studies
156 -on diabetes and Alzheimer's. Author highlights research linking coffee
157 -to disease prevention. Recommends 2-3 cups daily for optimal health.
158 -{{/code}}
159 -
160 ----
161 -
162 -=== 2.5 Total Output Size ===
163 -
164 -**Combined:** ~200-300 words
165 -* Analysis Summary: 50-70 words
166 -* Claims Identification: 30-50 words
167 -* Claims Verdicts: 100-150 words
168 -* Article Summary: 30-50 words (optional)
169 -
170 ----
171 -
172 -== 3. What's NOT in POC Scope ==
173 -
174 -=== 3.1 Feature Exclusions ===
175 -
176 -The following are **explicitly excluded** from POC:
177 -
178 -**Content Features:**
179 -* ❌ Scenarios (deferred to POC2)
180 -* ❌ Evidence display (supporting/opposing lists)
181 -* ❌ Source links (clickable references)
182 -* ❌ Detailed reasoning chains
183 -* ❌ Source quality ratings (shown but not detailed)
184 -* ❌ Contradiction detection (basic only)
185 -* ❌ Risk assessment (shown but not workflow-integrated)
186 -
187 -**Platform Features:**
188 -* ❌ User accounts / authentication
189 -* ❌ Saved history
190 -* ❌ Search functionality
191 -* ❌ Claim comparison
192 -* ❌ User contributions
193 -* ❌ Commenting system
194 -* ❌ Social sharing
195 -
196 -**Technical Features:**
197 -* ❌ Browser extensions
198 -* ❌ Mobile apps
199 -* ❌ API endpoints
200 -* ❌ Webhooks
201 -* ❌ Export features (PDF, CSV)
202 -
203 -**Quality Features:**
204 -* ❌ Accessibility (WCAG compliance)
205 -* ❌ Multilingual support
206 -* ❌ Mobile optimization
207 -* ❌ Media verification (images/videos)
208 -
209 -**Production Features:**
210 -* ❌ Security hardening
211 -* ❌ Privacy compliance (GDPR)
212 -* ❌ Terms of service
213 -* ❌ Monitoring/logging
214 -* ❌ Error tracking
215 -* ❌ Analytics
216 -* ❌ A/B testing
217 -
218 ----
219 -
220 -== 4. POC Simplifications vs. Full System ==
221 -
222 -=== 4.1 Architecture Comparison ===
223 -
224 -**POC Architecture (Simplified):**
225 -{{code}}
226 -User Input → Single AKEL Call → Output Display
227 - (all processing)
228 -{{/code}}
229 -
230 -**Full System Architecture:**
231 -{{code}}
232 -User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 -→ Evidence Summarizer → Contradiction Detector → Verdict Generator
234 -→ Quality Gates → Publication → Output Display
235 -{{/code}}
236 -
237 -**Key Differences:**
238 -
239 -|=Aspect|=POC1|=Full System
240 -|Processing|Single API call|Multi-component pipeline
241 -|Scenarios|None (implicit)|Explicit entities with versioning
242 -|Evidence|Basic retrieval|Comprehensive with quality scoring
243 -|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
244 -|Workflow|3 steps (input/process/output)|6 phases with gates
245 -|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 -|Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247 -
248 ----
249 -
250 -=== 4.2 Workflow Comparison ===
251 -
252 -**POC1 Workflow:**
253 -1. User submits text/URL
254 -2. Single AKEL call (all processing in one prompt)
255 -3. Display results
256 -**Total: 3 steps, ~10-18 seconds**
257 -
258 -**Full System Workflow:**
259 -1. **Claim Submission** (extraction, normalization, clustering)
260 -2. **Scenario Building** (definitions, assumptions, boundaries)
261 -3. **Evidence Handling** (retrieval, assessment, linking)
262 -4. **Verdict Creation** (synthesis, reasoning, approval)
263 -5. **Public Presentation** (summaries, landscapes, deep dives)
264 -6. **Time Evolution** (versioning, re-evaluation triggers)
265 -**Total: 6 phases with quality gates, ~10-30 seconds**
266 -
267 ----
268 -
269 -=== 4.3 Why POC is Simplified ===
270 -
271 -**Engineering Rationale:**
272 -
273 -1. **Test core capability first:** Can AI do basic fact-checking without humans?
274 -2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
275 -3. **Learn before building:** POC1 insights inform full architecture
276 -4. **Iterative approach:** Add complexity only after validating foundations
277 -5. **Resource efficiency:** Don't build full system if core concept fails
278 -
279 -**Acceptable Trade-offs:**
280 -
281 -* ✅ POC proves AI capability (most risky assumption)
282 -* ✅ POC validates user comprehension (can people understand output?)
283 -* ❌ POC doesn't validate full workflow (test in Beta)
284 -* ❌ POC doesn't validate scale (test in Beta)
285 -* ❌ POC doesn't validate scenario architecture (design in POC2)
286 -
287 ----
288 -
289 -=== 4.4 Gap Between POC1 and POC2/Beta ===
290 -
291 -**What needs to be built for POC2:**
292 -* Scenario generation component
293 -* Evidence Model structure (full)
294 -* Scenario-evidence linking
295 -* Multi-interpretation comparison
296 -* Truth landscape visualization
297 -
298 -**What needs to be built for Beta:**
299 -* Multi-component AKEL pipeline
300 -* Quality gate infrastructure
301 -* Review workflow system
302 -* Audit sampling framework
303 -* Production data model
304 -* Federation architecture (Release 1.0)
305 -
306 -**POC1 → POC2 is significant architectural expansion.**
307 -
308 ----
309 -
310 -== 5. Publication Mode & Labeling ==
311 -
312 -=== 5.1 POC Publication Mode ===
313 -
314 -**Mode:** Mode 2 (AI-Generated, No Prior Human Review)
315 -
316 -Per FactHarbor Specification Section 11 "POC v1 Behavior":
317 -* Produces public AI-generated output
318 -* No human approval gate
319 -* Clear AI-Generated labeling
320 -* All quality gates active (simplified)
321 -* Risk tier classification shown (demo)
322 -
323 ----
324 -
325 -=== 5.2 User-Facing Labels ===
326 -
327 -**Primary Label (top of analysis):**
328 -{{code}}
329 -╔════════════════════════════════════════════════════════════╗
330 -║ [AI-GENERATED - POC/DEMO] ║
331 -║ ║
332 -║ This analysis was produced entirely by AI and has not ║
333 -║ been human-reviewed. Use for demonstration purposes. ║
334 -║ ║
335 -║ Source: AI/AKEL v1.0 (POC) ║
336 -║ Review Status: Not Reviewed (Proof-of-Concept) ║
337 -║ Quality Gates: 4/4 Passed (Simplified) ║
338 -║ Last Updated: [timestamp] ║
339 -╚════════════════════════════════════════════════════════════╝
340 -{{/code}}
341 -
342 -**Per-Claim Risk Labels:**
343 -* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
344 -* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 -* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346 -
347 ----
348 -
349 -=== 5.3 Display Requirements ===
350 -
351 -**Must Show:**
352 -* AI-Generated status (prominent)
353 -* POC/Demo disclaimer
354 -* Risk tier per claim
355 -* Confidence scores (0-100%)
356 -* Quality gate status (passed/failed)
357 -* Timestamp
358 -
359 -**Must NOT Claim:**
360 -* Human review
361 -* Production quality
362 -* Medical/legal advice
363 -* Authoritative verdicts
364 -* Complete accuracy
365 -
366 ----
367 -
368 -=== 5.4 Mode 2 vs. Full System Publication ===
369 -
370 -|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
371 -|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
372 -|Review|None|None|Human-Reviewed
373 -|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
374 -|Audit|None (POC)|Sampling (5-50%)|Pre-publication
375 -|Risk Display|Demo only|Workflow-integrated|Validated
376 -|User Actions|View only|Flag for review|Trust rating
377 -
378 ----
379 -
380 -== 6. Quality Gates (Simplified Implementation) ==
381 -
382 -=== 6.1 Overview ===
383 -
384 -Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
385 -
386 -**Full System Has 4 Gates:**
387 -1. Source Quality
388 -2. Contradiction Search (MANDATORY)
389 -3. Uncertainty Quantification
390 -4. Structural Integrity
391 -
392 -**POC Implements Simplified Versions:**
393 -* Focus on demonstrating concept
394 -* Basic implementations sufficient
395 -* Failures displayed to user (not blocking)
396 -* Full system has comprehensive validation
397 -
398 ----
399 -
400 -=== 6.2 Gate 1: Source Quality (Basic) ===
401 -
402 -**Full System Requirements:**
403 -* Primary sources identified and accessible
404 -* Source reliability scored against whitelist
405 -* Citation completeness verified
406 -* Publication dates checked
407 -* Author credentials validated
408 -
409 409  **POC Implementation:**
410 -* ✅ At least 2 sources found
411 -* ✅ Sources accessible (URLs valid)
412 -* ❌ No whitelist checking
413 -* ❌ No credential validation
414 -* ❌ No comprehensive reliability scoring
85 +* ✅ AKEL extracts claims using LLM
86 +* ✅ Each claim includes original text reference
87 +* ✅ Claims are identified as factual/non-factual
88 +* ❌ No advanced claim parsing (added in POC2)
415 415  
416 -**Pass Criteria:** ≥2 accessible sources found
90 +**Acceptance Criteria:**
91 +* Extracts 3-5 claims from typical article
92 +* Identifies factual vs non-factual claims
93 +* Quality Gate 1 validates extraction
417 417  
418 -**Failure Handling:** Display error message, don't generate verdict
419 419  
420 ----
96 +=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===
421 421  
422 -=== 6.3 Gate 2: Contradiction Search (Basic) ===
98 +**Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
423 423  
424 -**Full System Requirements:**
425 -* Counter-evidence actively searched
426 -* Reservations and limitations identified
427 -* Alternative interpretations explored
428 -* Bubble detection (echo chambers, conspiracy theories)
429 -* Cross-cultural and international perspectives
430 -* Academic literature (supporting AND opposing)
431 -
432 432  **POC Implementation:**
433 -* ✅ Basic search for counter-evidence
434 -* ✅ Identify obvious contradictions
435 -* ❌ No comprehensive academic search
436 -* ❌ No bubble detection
437 -* ❌ No systematic alternative interpretation search
438 -* ❌ No international perspective verification
101 +* ✅ AKEL generates 2-3 scenarios per claim
102 +* ✅ Scenarios capture different interpretations
103 +* ✅ Each scenario is evaluated separately
104 +* ✅ Verdict considers all scenarios
439 439  
440 -**Pass Criteria:** Basic contradiction search attempted
106 +**Acceptance Criteria:**
107 +* Generates 2+ scenarios for ambiguous claims
108 +* Scenarios are meaningfully different
109 +* All scenarios are evaluated
441 441  
442 -**Failure Handling:** Note "limited contradiction search" in output
443 443  
444 ----
112 +=== 3.3 FR4: Analysis Summary (Basic Implementation) ===
445 445  
446 -=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===
114 +**Main Requirement:** Provide user-friendly summary of analysis
447 447  
448 -**Full System Requirements:**
449 -* Confidence scores calculated for all claims/verdicts
450 -* Limitations explicitly stated
451 -* Data gaps identified and disclosed
452 -* Strength of evidence assessed
453 -* Alternative scenarios considered
454 -
455 455  **POC Implementation:**
456 -* ✅ Confidence scores (0-100%)
457 -* ✅ Basic uncertainty acknowledgment
458 -* ❌ No detailed limitation disclosure
459 -* ❌ No data gap identification
460 -* ❌ No alternative scenario consideration (deferred to POC2)
117 +* ✅ Simple text summary generated
118 +* ❌ No rich formatting (added in Beta 0)
119 +* ❌ No visual elements (added in Beta 0)
120 +* ❌ No interactive features (added in Beta 0)
461 461  
462 -**Pass Criteria:** Confidence score assigned
122 +**POC Format:**
123 +```
124 +Claim: [extracted claim]
125 +Scenarios: [list of scenarios]
126 +Evidence: [supporting/opposing evidence]
127 +Verdict: [probability with uncertainty]
128 +```
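
As an illustration of the four-field format above, the following Python sketch renders one claim's summary as plain text. The `ClaimSummary` class and the `render_summary` function are hypothetical names introduced here and are not part of the FactHarbor specification; separating list items with "; " is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimSummary:
    """Hypothetical container for one analyzed claim (names are assumptions)."""
    claim: str
    scenarios: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    verdict: str = ""


def render_summary(summary: ClaimSummary) -> str:
    """Return the four-line plain-text summary in the POC format."""
    return "\n".join([
        f"Claim: {summary.claim}",
        f"Scenarios: {'; '.join(summary.scenarios)}",
        f"Evidence: {'; '.join(summary.evidence)}",
        f"Verdict: {summary.verdict}",
    ])
```

Keeping the renderer separate from the data record would let Beta 0 add rich formatting without changing how the analysis results are stored.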
463 463  
464 -**Failure Handling:** Show "Confidence: Unknown" if calculation fails
465 465  
466 ----
131 +=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
467 467  
468 -=== 6.5 Gate 4: Structural Integrity (Basic) ===
133 +**Main Requirements:**
134 +* FR5: Collect supporting and opposing evidence
135 +* FR6: Evaluate evidence source reliability
469 469  
470 -**Full System Requirements:**
471 -* No hallucinations detected (fact-checking against sources)
472 -* Logic chain valid and traceable
473 -* References accessible and verifiable
474 -* No circular reasoning
475 -* Premises clearly stated
476 -
477 477  **POC Implementation:**
478 -* ✅ Basic coherence check
479 -* ✅ References accessible
480 -* ❌ No comprehensive hallucination detection
481 -* ❌ No formal logic validation
482 -* ❌ No premise extraction and verification
138 +* ✅ AKEL searches for evidence (web/knowledge base)
139 +* ✅ **Mandatory contradiction search** (finds opposing evidence)
140 +* ✅ Source reliability scoring
141 +* ❌ No evidence deduplication (added in POC2)
142 +* ❌ No advanced source verification (added in POC2)
483 483  
484 -**Pass Criteria:** Output is coherent and references are accessible
485 -
486 -**Failure Handling:** Display error message
487 -
488 ----
489 -
490 -=== 6.6 Quality Gate Display ===
491 -
492 -**POC shows simplified status:**
493 -{{code}}
494 -Quality Gates: 4/4 Passed (Simplified)
495 -✓ Source Quality: 3 sources found
496 -✓ Contradiction Search: Basic search completed
497 -✓ Uncertainty: Confidence scores assigned
498 -✓ Structural Integrity: Output coherent
499 -{{/code}}
500 -
501 -**If any gate fails:**
502 -{{code}}
503 -Quality Gates: 3/4 Passed (Simplified)
504 -✓ Source Quality: 3 sources found
505 -✗ Contradiction Search: Search failed - limited evidence
506 -✓ Uncertainty: Confidence scores assigned
507 -✓ Structural Integrity: Output coherent
508 -
509 -Note: This analysis has limited evidence. Use with caution.
510 -{{/code}}
511 -
512 ----
513 -
514 -=== 6.7 Simplified vs. Full System ===
515 -
516 -|=Gate|=POC (Simplified)|=Full System
517 -|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
518 -|Contradiction|Basic search|Systematic academic + media + international
519 -|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
520 -|Structural|Coherence check|Hallucination detection, logic validation, premise check
521 -
522 -**POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523 -
524 ----
525 -
526 -== 7. AKEL Architecture Comparison ==
527 -
528 -=== 7.1 POC AKEL (Simplified) ===
529 -
530 -**Implementation:**
531 -* Single Claude API call (Sonnet 4.5)
532 -* One comprehensive prompt
533 -* All processing in single request
534 -* No separate components
535 -* No orchestration layer
536 -
537 -**Prompt Structure:**
538 -{{code}}
539 -Task: Analyze this article and provide:
540 -
541 -1. Extract 3-5 factual claims
542 -2. For each claim:
543 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 - - Assign confidence score (0-100%)
545 - - Assign risk tier (A/B/C)
546 - - Write brief reasoning (1-3 sentences)
547 -3. Generate analysis summary (3-5 sentences)
548 -4. Generate article summary (3-5 sentences)
549 -5. Run basic quality checks
550 -
551 -Return as structured JSON.
552 -{{/code}}
553 -
554 -**Processing Time:** 10-18 seconds (estimate)
555 -
556 ----
557 -
558 -=== 7.2 Full System AKEL (Production) ===
559 -
560 -**Architecture:**
561 -{{code}}
562 -AKEL Orchestrator
563 -├── Claim Extractor
564 -├── Claim Classifier (with risk tier assignment)
565 -├── Scenario Generator
566 -├── Evidence Summarizer
567 -├── Contradiction Detector
568 -├── Quality Gate Validator
569 -├── Audit Sampling Scheduler
570 -└── Federation Sync Adapter (Release 1.0+)
571 -{{/code}}
572 -
573 -**Processing:**
574 -* Parallel processing where possible
575 -* Separate component calls
576 -* Quality gates between phases
577 -* Audit sampling selection
578 -* Cross-node coordination (federated mode)
579 -
580 -**Processing Time:** 10-30 seconds (full pipeline)
581 -
582 ----
583 -
584 -=== 7.3 Why POC Uses Single Call ===
585 -
586 -**Advantages:**
587 -* ✅ Simpler to implement
588 -* ✅ Faster POC development
589 -* ✅ Easier to debug
590 -* ✅ Proves AI capability
591 -* ✅ Good enough for concept validation
592 -
593 -**Limitations:**
594 -* ❌ No component reusability
595 -* ❌ No parallel processing
596 -* ❌ All-or-nothing (can't partially succeed)
597 -* ❌ Harder to improve individual components
598 -* ❌ No audit sampling
599 -
600 -**Acceptable Trade-off:**
601 -
602 -POC tests "Can AI do this?" not "How should we architect it?"
603 -
604 -Full component architecture comes in Beta after POC validates concept.
605 -
606 ----
607 -
608 -=== 7.4 Evolution Path ===
609 -
610 -**POC1:** Single prompt → Prove concept
611 -**POC2:** Add scenario component → Test full pipeline
612 -**Beta:** Multi-component AKEL → Production architecture
613 -**Release 1.0:** Full AKEL + Federation → Scale
614 -
615 ----
616 -
617 -== 8. Functional Requirements ==
618 -
619 -=== FR-POC-1: Article Input ===
620 -
621 -**Requirement:** User can submit article for analysis
622 -
623 -**Functionality:**
624 -* Text input field (paste article text, up to 5000 characters)
625 -* URL input field (paste article URL)
626 -* "Analyze" button to trigger processing
627 -* Loading indicator during analysis
628 -
629 -**Excluded:**
630 -* No user authentication
631 -* No claim history
632 -* No search functionality
633 -* No saved templates
634 -
635 635  **Acceptance Criteria:**
636 -* User can paste text from article
637 -* User can paste URL of article
638 -* System accepts input and triggers analysis
145 +* Finds 2+ supporting evidence items
146 +* Finds 1+ opposing evidence (if exists)
147 +* Sources scored for reliability
639 639  
640 ----
641 641  
642 -=== FR-POC-2: Claim Extraction (Fully Automated) ===
150 +=== 3.5 FR7: Automated Verdicts (Full Implementation) ===
643 643  
644 -**Requirement:** AI automatically extracts 3-5 factual claims
152 +**Main Requirement:** AI computes verdicts with uncertainty quantification
645 645  
646 -**Functionality:**
647 -* AI reads article text
648 -* AI identifies factual claims (not opinions/questions)
649 -* AI extracts 3-5 most important claims
650 -* System displays numbered list
154 +**POC Implementation:**
155 +* ✅ Probabilistic verdicts (0-100% confidence)
156 +* ✅ Uncertainty explicitly stated
157 +* ✅ Reasoning chain provided
158 +* ✅ Quality Gate 4 validates verdict confidence
651 651  
652 -**Critical:** NO MANUAL EDITING ALLOWED
653 -* AI selects which claims to extract
654 -* AI identifies factual vs. non-factual
655 -* System processes claims as extracted
656 -* No human curation or correction
160 +**POC Output:**
161 +```
162 +Verdict: 70% likely true
163 +Uncertainty: ±15% (moderate confidence)
164 +Reasoning: Based on 3 high-quality sources...
165 +Confidence Level: MEDIUM
166 +```
657 657  
658 -**Error Handling:**
659 -* If extraction fails: Display error message
660 -* User can retry with different input
661 -* No manual intervention to fix extraction
662 -
663 663  **Acceptance Criteria:**
664 -* AI extracts 3-5 claims automatically
665 -* Claims are factual (not opinions)
666 -* Claims are clearly stated
667 -* No manual editing required
169 +* Verdicts include probability (0-100%)
170 +* Uncertainty explicitly quantified
171 +* Reasoning chain explains verdict
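
The acceptance criteria above can be checked mechanically before a verdict is displayed. The sketch below is one possible way to do that, not the AKEL implementation: the `Verdict` record, `confidence_level`, and `format_verdict` are hypothetical names, and the uncertainty thresholds (up to 10 points for HIGH, up to 20 for MEDIUM) are assumptions chosen only so that the ±15% example maps to MEDIUM.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical verdict record; field names are assumptions."""
    probability: int  # likelihood the claim is true, 0-100
    uncertainty: int  # plus/minus margin in percentage points
    reasoning: str    # short reasoning chain


def confidence_level(uncertainty: int) -> str:
    """Map the uncertainty margin to a label (thresholds are assumed)."""
    if uncertainty <= 10:
        return "HIGH"
    if uncertainty <= 20:
        return "MEDIUM"
    return "LOW"


def format_verdict(verdict: Verdict) -> str:
    """Check the three acceptance criteria, then render the POC output."""
    if not 0 <= verdict.probability <= 100:
        raise ValueError("probability must be between 0 and 100")
    if verdict.uncertainty < 0:
        raise ValueError("uncertainty must be non-negative")
    if not verdict.reasoning.strip():
        raise ValueError("a reasoning chain is required")
    return "\n".join([
        f"Verdict: {verdict.probability}% likely true",
        f"Uncertainty: ±{verdict.uncertainty}%",
        f"Reasoning: {verdict.reasoning}",
        f"Confidence Level: {confidence_level(verdict.uncertainty)}",
    ])
```

Raising an error on a missing reasoning chain, rather than printing an empty field, keeps an incomplete verdict from reaching the output.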
668 668  
669 ----
670 670  
671 -=== FR-POC-3: Verdict Generation (Fully Automated) ===
174 +=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
672 672  
673 -**Requirement:** AI automatically generates verdict for each claim
176 +**Main Requirement:** Complete quality assurance with 7 quality gates
674 674  
675 -**Functionality:**
676 -* For each claim, AI:
677 - * Evaluates claim based on available evidence/knowledge
678 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 - * Assigns confidence score (0-100%)
680 - * Assigns risk tier (A/B/C)
681 - * Writes brief reasoning (1-3 sentences)
682 -* System displays verdict for each claim
178 +**POC Implementation:** **2 gates only**
683 683  
684 -**Critical:** NO MANUAL EDITING ALLOWED
685 -* AI computes verdicts based on evidence
686 -* AI generates confidence scores
687 -* AI writes reasoning
688 -* No human review or adjustment
180 +**Quality Gate 1: Claim Validation**
181 +* ✅ Validates claim is factual and verifiable
182 +* ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
183 +* ✅ Provides clear rejection reason
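
A minimal sketch of Quality Gate 1, assuming the claim has already been classified (for example by the LLM) into one of four categories. `ClaimType`, `GateResult`, `gate1_claim_validation`, and the rejection messages are hypothetical and only illustrate the pass/block behaviour listed above.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimType(Enum):
    """Assumed classification labels for an extracted claim."""
    FACTUAL = "factual"
    OPINION = "opinion"
    PREDICTION = "prediction"
    AMBIGUOUS = "ambiguous"


@dataclass
class GateResult:
    passed: bool
    reason: str  # why the claim passed or was rejected


# Illustrative rejection messages; the wording is an assumption.
REJECTION_REASONS = {
    ClaimType.OPINION: "Claim is an opinion, not a verifiable statement of fact",
    ClaimType.PREDICTION: "Claim is a prediction and cannot be verified yet",
    ClaimType.AMBIGUOUS: "Claim is too ambiguous to verify",
}


def gate1_claim_validation(claim_type: ClaimType) -> GateResult:
    """Pass factual claims; block all others with a clear reason."""
    if claim_type is ClaimType.FACTUAL:
        return GateResult(True, "Claim is factual and verifiable")
    return GateResult(False, REJECTION_REASONS[claim_type])
```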
689 689  
690 -**Error Handling:**
691 -* If verdict generation fails: Display error message
692 -* User can retry
693 -* No manual intervention to adjust verdicts
185 +**Quality Gate 4: Verdict Confidence Assessment**
186 +* ✅ Validates ≥2 sources found
187 +* ✅ Validates quality score ≥0.6
188 +* ✅ Blocks low-confidence verdicts
189 +* ✅ Provides clear rejection reason
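
The two thresholds of Quality Gate 4 can be sketched as follows. This is an illustration under stated assumptions: the specification does not say how the quality score is computed, so the sketch takes it to be the mean of per-source reliability scores in the range 0 to 1. `GateResult` and `gate4_verdict_confidence` are hypothetical names.

```python
from dataclasses import dataclass

MIN_SOURCES = 2          # from the spec: at least 2 sources
MIN_QUALITY_SCORE = 0.6  # from the spec: quality score of at least 0.6


@dataclass
class GateResult:
    passed: bool
    reason: str  # why the verdict passed or was blocked


def gate4_verdict_confidence(source_scores: list[float]) -> GateResult:
    """Block verdicts with too few sources or too low a quality score.

    Assumption: the quality score is the mean of the source scores.
    """
    if len(source_scores) < MIN_SOURCES:
        return GateResult(
            False,
            f"Only {len(source_scores)} source(s) found; "
            f"at least {MIN_SOURCES} required",
        )
    quality = sum(source_scores) / len(source_scores)
    if quality < MIN_QUALITY_SCORE:
        return GateResult(
            False,
            f"Quality score {quality:.2f} is below {MIN_QUALITY_SCORE}",
        )
    return GateResult(True, "Verdict meets source and quality thresholds")
```

Checking the source count first means the rejection reason names the most basic failure, which is easier to act on than a low average computed from a single source.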
694 694  
695 -**Acceptance Criteria:**
696 -* Each claim has a verdict
697 -* Confidence score is displayed (0-100%)
698 -* Risk tier is displayed (A/B/C)
699 -* Reasoning is understandable (1-3 sentences)
700 -* Verdict is defensible given reasoning
701 -* All generated automatically by AI
191 +**Out of Scope (POC2+):**
192 +* ❌ Gate 2: Evidence Relevance
193 +* ❌ Gate 3: Scenario Coherence
194 +* ❌ Gate 5: Source Diversity
195 +* ❌ Gate 6: Reasoning Validity
196 +* ❌ Gate 7: Output Completeness
702 702  
703 ----
198 +**Rationale:** Prove gate concept works. Add remaining gates in POC2 after validating approach.
704 704  
705 -=== FR-POC-4: Analysis Summary (Fully Automated) ===
706 706  
707 -**Requirement:** AI generates brief summary of analysis
201 +=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
708 708  
709 -**Functionality:**
710 -* AI summarizes findings in 3-5 sentences:
711 - * How many claims found
712 - * Distribution of verdicts
713 - * Overall assessment
714 -* System displays at top of results
203 +**Main Requirements:**
204 +* NFR1: Response time < 30 seconds
205 +* NFR2: Handle 1000+ concurrent users
206 +* NFR3: 99.9% uptime
715 715  
716 -**Critical:** NO MANUAL EDITING ALLOWED
208 +**POC Implementation:**
209 +* ⚠️ **Response time monitored** (not optimized)
210 +* ⚠️ **Single-threaded processing** (no concurrency)
211 +* ⚠️ **Basic error handling** (no advanced retry logic)
717 717  
718 -**Acceptance Criteria:**
719 -* Summary is coherent
720 -* Accurately reflects analysis
721 -* 3-5 sentences
722 -* Automatically generated
213 +**Rationale:** POC proves functionality. Performance optimization happens in POC2.
723 723  
724 ----
215 +**POC Acceptance:**
216 +* Analysis completes (no timeout requirement)
217 +* Errors don't crash system
218 +* Basic logging in place
725 725  
726 -=== FR-POC-5: Article Summary (Fully Automated, Optional) ===
727 727  
728 -**Requirement:** AI generates brief summary of original article
221 +== 4. What's NOT in POC Scope ==
729 729  
730 -**Functionality:**
731 -* AI summarizes article content (not FactHarbor's analysis)
732 -* 3-5 sentences
733 -* System displays
223 +=== 4.1 User-Facing Features (Beta 0+) ===
734 734  
735 -**Note:** Optional - can skip if time limited
225 +{{warning}}
226 +**Deferred to Beta 0:**
227 +{{/warning}}
736 736  
737 -**Critical:** NO MANUAL EDITING ALLOWED
229 +**Out of Scope:**
230 +* ❌ User accounts and authentication (FR8)
231 +* ❌ User corrections system (FR9, FR45-46)
232 +* ❌ Public publishing interface (FR10)
233 +* ❌ Social sharing (FR11)
234 +* ❌ Email notifications (FR12)
235 +* ❌ API access (FR13)
738 738  
739 -**Acceptance Criteria:**
740 -* Summary is neutral (article's position)
741 -* Accurately reflects article content
742 -* 3-5 sentences
743 -* Automatically generated
237 +**Rationale:** POC validates AI capabilities. User features added in Beta 0.
744 744  
745 ----
746 746  
747 -=== FR-POC-6: Publication Mode Display ===
240 +=== 4.2 Advanced Features (V1.0+) ===
748 748  
749 -**Requirement:** Clear labeling of AI-generated content
242 +**Out of Scope:**
243 +* ❌ IFCN compliance (FR47)
244 +* ❌ ClaimReview schema (FR48)
245 +* ❌ Archive.org integration (FR49)
246 +* ❌ OSINT toolkit (FR50)
247 +* ❌ Video verification (FR51)
248 +* ❌ Deepfake detection (FR52)
249 +* ❌ Cross-org sharing (FR53)
750 750  
751 -**Functionality:**
752 -* Display Mode 2 publication label
753 -* Show POC/Demo disclaimer
754 -* Display risk tiers per claim
755 -* Show quality gate status
756 -* Display timestamp
251 +**Rationale:** Advanced features require a proven platform. They are added in V1.0 or later.
757 757  
758 -**Acceptance Criteria:**
759 -* Label is prominent and clear
760 -* User understands this is AI-generated POC output
761 -* Risk tiers are color-coded
762 -* Quality gate status is visible
763 763  
764 ----
254 +=== 4.3 Production Requirements (POC2, Beta 0) ===
765 765  
766 -=== FR-POC-7: Quality Gate Execution ===
256 +**Out of Scope:**
257 +* ❌ Security controls (NFR4, NFR12)
258 +* ❌ Code maintainability (NFR5)
259 +* ❌ System monitoring (NFR13)
260 +* ❌ Evidence deduplication
261 +* ❌ Advanced source verification
262 +* ❌ Full 7-gate quality framework
767 767  
768 -**Requirement:** Execute simplified quality gates
264 +**Rationale:** POC proves concept. Production hardening happens in POC2 and Beta 0.
769 769  
770 -**Functionality:**
771 -* Check source quality (basic)
772 -* Attempt contradiction search (basic)
773 -* Calculate confidence scores
774 -* Verify structural integrity (basic)
775 -* Display gate results
776 776  
777 -**Acceptance Criteria:**
778 -* All 4 gates attempted
779 -* Pass/fail status displayed
780 -* Failures explained to user
781 -* Gates don't block publication (POC mode)
267 +== 5. POC Output Specification ==
782 782  
783 ----
269 +=== 5.1 Required Output Elements ===
784 784  
785 -== 9. Non-Functional Requirements ==
271 +For each analyzed claim, POC must produce:
786 786  
787 -=== NFR-POC-1: Fully Automated Processing ===
273 +**1. Claim**
274 +* Original text
275 +* Classification (factual/non-factual/ambiguous)
276 +* If non-factual: Clear reason why
788 788  
789 -**Requirement:** Complete AI automation with zero manual intervention
278 +**2. Scenarios** (if factual)
279 +* 2-3 interpretation scenarios
280 +* Each scenario clearly described
790 790  
791 -**Critical Rule:** NO MANUAL EDITING AT ANY STAGE
282 +**3. Evidence** (if factual)
283 +* Supporting evidence (2+ items)
284 +* Opposing evidence (if exists)
285 +* Source URLs and reliability scores
792 792  
793 -**What this means:**
794 -* Claims: AI selects (no human curation)
795 -* Scenarios: N/A (deferred to POC2)
796 -* Evidence: AI evaluates (no human selection)
797 -* Verdicts: AI determines (no human adjustment)
798 -* Summaries: AI writes (no human editing)
287 +**4. Verdict** (if factual)
288 +* Probability (0-100%; the example output in 5.2 expresses it as a 0-1 fraction)
289 +* Uncertainty quantification
290 +* Confidence level (LOW/MEDIUM/HIGH)
291 +* Reasoning chain
799 799  
800 -**Pipeline:**
801 -{{code}}
802 -User Input → AKEL Processing → Output Display
803 - ↓
804 - ZERO human editing
805 -{{/code}}
293 +**5. Quality Status**
294 +* Which gates passed/failed
295 +* If failed: Clear explanation why
806 806  
807 -**If AI output is poor:**
808 -* ❌ Do NOT manually fix it
809 -* ✅ Document the failure
810 -* ✅ Improve prompts and retry
811 -* ✅ Accept that POC might fail
812 812  
813 -**Why this matters:**
814 -* Tests whether AI can do this without humans
815 -* Validates scalability (humans can't review every analysis)
816 -* Honest test of technical feasibility
298 +=== 5.2 Example POC Output ===
817 817  
818 ----
819 -
820 -=== NFR-POC-2: Performance ===
821 -
822 -**Requirement:** Analysis completes in reasonable time
823 -
824 -**Acceptable Performance:**
825 -* Processing time: 1-5 minutes (acceptable for POC)
826 -* Display loading indicator to user
827 -* Show progress if possible ("Extracting claims...", "Generating verdicts...")
828 -
829 -**Not Required:**
830 -* Production-level speed (< 30 seconds)
831 -* Optimization for scale
832 -* Caching
833 -
834 -**Acceptance Criteria:**
835 -* Analysis completes within 5 minutes
836 -* User sees loading indicator
837 -* No timeout errors
838 -
839 ----
840 -
841 -=== NFR-POC-3: Reliability ===
842 -
843 -**Requirement:** System works for manual testing sessions
844 -
845 -**Acceptable:**
846 -* Occasional errors (< 20% failure rate)
847 -* Manual restart if needed
848 -* Display error messages clearly
849 -
850 -**Not Required:**
851 -* 99.9% uptime
852 -* Automatic error recovery
853 -* Production monitoring
854 -
855 -**Acceptance Criteria:**
856 -* System works for test demonstrations
857 -* Errors are handled gracefully
858 -* User receives clear error messages
859 -
860 ----
861 -
862 -=== NFR-POC-4: Environment ===
863 -
864 -**Requirement:** Runs on simple infrastructure
865 -
866 -**Acceptable:**
867 -* Single machine or simple cloud setup
868 -* No distributed architecture
869 -* No load balancing
870 -* No redundancy
871 -* Local development environment viable
872 -
873 -**Not Required:**
874 -* Production infrastructure
875 -* Multi-region deployment
876 -* Auto-scaling
877 -* Disaster recovery
878 -
879 ----
880 -
881 -== 10. Technical Architecture ==
882 -
883 -=== 10.1 System Components ===
884 -
885 -**Frontend:**
886 -* Simple HTML form (text input + URL input + button)
887 -* Loading indicator
888 -* Results display page (single page, no tabs/navigation)
889 -
890 -**Backend:**
891 -* Single API endpoint
892 -* Calls Claude API (Sonnet 4.5 or latest)
893 -* Parses response
894 -* Returns JSON to frontend
895 -
896 -**Data Storage:**
897 -* None required (stateless POC)
898 -* Optional: Simple file storage or SQLite for demo examples
899 -
900 -**External Services:**
901 -* Claude API (Anthropic) - required
902 -* Optional: URL fetch service for article text extraction
903 -
904 ----
905 -
906 -=== 10.2 Processing Flow ===
907 -
908 -{{code}}
909 -1. User submits text or URL
910 - ↓
911 -2. Backend receives request
912 - ↓
913 -3. If URL: Fetch article text
914 - ↓
915 -4. Call Claude API with single prompt:
916 - "Extract claims, evaluate each, provide verdicts"
917 - ↓
918 -5. Claude API returns:
919 - - Analysis summary
920 - - Claims list
921 - - Verdicts for each claim (with risk tiers)
922 - - Article summary (optional)
923 - - Quality gate results
924 - ↓
925 -6. Backend parses response
926 - ↓
927 -7. Frontend displays results with Mode 2 labeling
300 +{{code language="json"}}
301 +{
302 + "claim": {
303 + "text": "Switzerland has the highest life expectancy in Europe",
304 + "type": "factual",
305 + "gate1_status": "PASS"
306 + },
307 + "scenarios": [
308 + "Switzerland's overall life expectancy is highest",
309 + "Switzerland ranks highest for specific age groups"
310 + ],
311 + "evidence": {
312 + "supporting": [
313 + {
314 + "source": "WHO Report 2023",
315 + "reliability": 0.95,
316 + "excerpt": "Switzerland: 83.4 years average..."
317 + }
318 + ],
319 + "opposing": [
320 + {
321 + "source": "Eurostat 2024",
322 + "reliability": 0.90,
323 + "excerpt": "Spain leads at 83.5 years..."
324 + }
325 + ]
326 + },
327 + "verdict": {
328 + "probability": 0.65,
329 + "uncertainty": 0.15,
330 + "confidence": "MEDIUM",
331 + "reasoning": "WHO and Eurostat show similar but conflicting data...",
332 + "gate4_status": "PASS"
333 + }
334 +}
928 928  {{/code}}
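The shape of the example output can be written down as types, which makes it easier to check that every required element is present. The following is a minimal sketch, not the project's actual schema: the type names and `checkPocOutput` are hypothetical, the field names mirror the example JSON, and it assumes the factual-claim case only, with `probability`, `uncertainty` and `reliability` on a 0-1 scale as in the example.

```typescript
// Hypothetical type names; field names follow the example JSON.
type GateStatus = "PASS" | "FAIL";
type Confidence = "LOW" | "MEDIUM" | "HIGH";

interface EvidenceItem {
  source: string;
  reliability: number; // assumed 0..1, as in the example
  excerpt: string;
}

interface PocOutput {
  claim: {
    text: string;
    type: "factual" | "non-factual" | "ambiguous";
    gate1_status: GateStatus;
  };
  scenarios: string[];
  evidence: { supporting: EvidenceItem[]; opposing: EvidenceItem[] };
  verdict: {
    probability: number; // assumed 0..1
    uncertainty: number; // assumed 0..1
    confidence: Confidence;
    reasoning: string;
    gate4_status: GateStatus;
  };
}

// Returns a list of problems; an empty list means the values look plausible.
function checkPocOutput(out: PocOutput): string[] {
  const problems: string[] = [];
  const inUnit = (x: number): boolean => x >= 0 && x <= 1;

  if (out.claim.text.trim() === "") problems.push("claim.text is empty");
  if (!inUnit(out.verdict.probability)) problems.push("verdict.probability outside 0..1");
  if (!inUnit(out.verdict.uncertainty)) problems.push("verdict.uncertainty outside 0..1");

  const allEvidence = out.evidence.supporting.concat(out.evidence.opposing);
  for (const item of allEvidence) {
    if (!inUnit(item.reliability)) {
      problems.push("reliability outside 0..1 for " + item.source);
    }
  }
  return problems;
}
```

A non-factual claim would need a looser type (no scenarios, evidence or verdict); that variant is left out of this sketch.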
929 929  
930 -**Key Simplification:** Single API call does entire analysis
931 931  
932 ----
338 +== 6. Success Criteria ==
933 933  
934 -=== 10.3 AI Prompt Strategy ===
340 +{{success}}
341 +**POC Success Definition:** POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts, and that quality gates measurably improve output quality.
342 +{{/success}}
935 935  
936 -**Single Comprehensive Prompt:**
937 -{{code}}
938 -Task: Analyze this article and provide:
344 +=== 6.1 Functional Success ===
939 939  
940 -1. Extract 3-5 factual claims from the article
941 -2. For each claim:
942 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 - - Assign confidence score (0-100%)
944 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 - - Write brief reasoning (1-3 sentences)
946 -3. Run quality gates:
947 - - Check: ≥2 sources found
948 - - Attempt: Basic contradiction search
949 - - Calculate: Confidence scores
950 - - Verify: Structural integrity
951 -4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 -5. Write article summary (3-5 sentences: neutral summary of article content)
346 +POC is successful if:
953 953  
954 -Return as structured JSON with quality gate results.
955 -{{/code}}
348 +✅ **FR1-FR7 Requirements Met:**
349 +1. Extracts 3-5 factual claims from test articles
350 +2. Generates 2-3 scenarios per ambiguous claim
351 +3. Finds supporting AND opposing evidence
352 +4. Computes probabilistic verdicts with uncertainty
353 +5. Provides clear reasoning chains
956 956  
957 -**One prompt generates everything.**
355 +✅ **Quality Gates Work:**
356 +1. Gate 1 blocks non-factual claims (100% block rate)
357 +2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
358 +3. Clear rejection reasons provided
958 958  
959 ----
360 +✅ **NFR11 Met:**
361 +1. Quality gates reduce hallucination rate
362 +2. Blocked outputs have clear explanations
363 +3. Quality metrics are logged
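The Gate 4 rule stated above (block if fewer than 2 sources or quality below 0.6, with a clear rejection reason) can be sketched as a small pure function. This is an illustrative sketch only: `runGate4`, `Gate4Input` and `Gate4Result` are hypothetical names, and how the quality score is computed is not specified here, so it is taken as a given number.

```typescript
// Hypothetical input: how qualityScore is derived is an assumption left open.
interface Gate4Input {
  sourceCount: number;
  qualityScore: number; // assumed 0..1
}

interface Gate4Result {
  status: "PASS" | "FAIL";
  reasons: string[]; // empty when the gate passes
}

const MIN_SOURCES = 2;   // from the requirement: block if <2 sources
const MIN_QUALITY = 0.6; // from the requirement: block if quality <0.6

function runGate4(input: Gate4Input): Gate4Result {
  const reasons: string[] = [];
  if (input.sourceCount < MIN_SOURCES) {
    reasons.push("only " + input.sourceCount + " source(s); at least " + MIN_SOURCES + " required");
  }
  if (input.qualityScore < MIN_QUALITY) {
    reasons.push("quality " + input.qualityScore + " is below " + MIN_QUALITY);
  }
  return { status: reasons.length === 0 ? "PASS" : "FAIL", reasons };
}
```

Collecting all failing reasons, rather than stopping at the first, keeps the rejection explanation complete when both conditions fail. A score of exactly 0.6 passes, since the rule blocks only values below it.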
960 960  
961 -=== 10.4 Technology Stack Suggestions ===
962 962  
963 -**Frontend:**
964 -* HTML + CSS + JavaScript (minimal framework)
965 -* OR: Next.js (if team prefers)
966 -* Hosted: Local machine OR Vercel/Netlify free tier
366 +=== 6.2 Quality Thresholds ===
967 967  
968 -**Backend:**
969 -* Python Flask/FastAPI (simple REST API)
970 -* OR: Next.js API routes (if using Next.js)
971 -* Hosted: Local machine OR Railway/Render free tier
368 +**Minimum Acceptable:**
369 +* ≥70% of test claims correctly classified (factual/non-factual)
370 +* ≥60% of verdicts are reasonable (human evaluation)
371 +* Gate 1 blocks 100% of non-factual claims
372 +* Gate 4 blocks verdicts with <2 sources
972 972  
973 -**AKEL Integration:**
974 -* Claude API via Anthropic SDK
975 -* Model: Claude Sonnet 4.5 or latest available
374 +**Target:**
375 +* ≥80% claims correctly classified
376 +* ≥75% verdicts are reasonable
377 +* <10% false positives (blocking good claims)
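The numeric thresholds above can be checked mechanically once the test run has been scored. The sketch below is one possible encoding, not a prescribed procedure: `PocMetrics` and `assessThresholds` are hypothetical names, all rates are fractions between 0 and 1, and it assumes the target level also requires the minimum level to be met. The behavioural check "Gate 4 blocks verdicts with <2 sources" is not a rate and is left out.

```typescript
// Hypothetical metric names; each value is a fraction in 0..1.
interface PocMetrics {
  classificationAccuracy: number; // claims correctly classified factual/non-factual
  reasonableVerdictRate: number;  // verdicts judged reasonable by human evaluation
  gate1BlockRate: number;         // non-factual claims blocked by Gate 1
  falsePositiveRate: number;      // good claims wrongly blocked
}

type ThresholdLevel = "TARGET" | "MINIMUM" | "BELOW_MINIMUM";

function assessThresholds(m: PocMetrics): ThresholdLevel {
  const meetsMinimum =
    m.classificationAccuracy >= 0.7 &&
    m.reasonableVerdictRate >= 0.6 &&
    m.gate1BlockRate === 1;
  if (!meetsMinimum) return "BELOW_MINIMUM";

  const meetsTarget =
    m.classificationAccuracy >= 0.8 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.1;
  return meetsTarget ? "TARGET" : "MINIMUM";
}
```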
976 976  
977 -**Database:**
978 -* None (stateless acceptable)
979 -* OR: SQLite if want to store demo examples
980 -* OR: JSON files on disk
981 981  
982 -**Deployment:**
983 -* Local development environment sufficient for POC
984 -* Optional: Deploy to cloud for remote demos
380 +=== 6.3 POC Decision Gate ===
985 985  
986 ----
382 +**After POC1, we decide:**
987 987  
988 -== 11. Success Criteria ==
384 +**✅ PROCEED to POC2** if:
385 +* Success criteria met
386 +* Quality gates demonstrably improve output
387 +* Core workflow is technically sound
388 +* Clear path to production quality
989 989  
990 -=== 11.1 Minimum Success (POC Passes) ===
390 +**⚠️ ITERATE POC1** if:
391 +* Success criteria partially met
392 +* Gates work but need tuning
393 +* Core issues identified but fixable
991 991  
992 -**Required for GO decision:**
993 -* ✅ AI extracts 3-5 factual claims automatically
994 -* ✅ AI provides verdict for each claim automatically
995 -* ✅ Verdicts are reasonable (≥70% make logical sense)
996 -* ✅ Analysis summary is coherent
997 -* ✅ Output is comprehensible to reviewers
998 -* ✅ Team/advisors understand the output
999 -* ✅ Team agrees approach has merit
1000 -* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
395 +**❌ PIVOT APPROACH** if:
396 +* Success criteria not met
397 +* Fundamental AI limitations discovered
398 +* Quality gates insufficient
399 +* Alternative approach needed
1001 1001  
1002 -**Quality Definition:**
1003 -* "Reasonable verdict" = Defensible given general knowledge
1004 -* "Coherent summary" = Logically structured, grammatically correct
1005 -* "Comprehensible" = Reviewers understand what analysis means
1006 1006  
1007 ----
402 +== 7. Test Cases ==
1008 1008  
1009 -=== 11.2 POC Fails If ===
404 +=== 7.1 Happy Path ===
1010 1010  
1011 -**Automatic NO-GO if any of these:**
1012 -* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1013 -* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1014 -* ❌ Output incomprehensible (reviewers can't understand analysis)
1015 -* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 -* ❌ Team loses confidence in AI-automated approach
406 +**Test 1: Simple Factual Claim**
407 +* Input: "Paris is the capital of France"
408 +* Expected: Factual, 1 scenario, verdict ~95% true
1017 1017  
1018 ----
410 +**Test 2: Ambiguous Claim**
411 +* Input: "Switzerland has the highest income in Europe"
412 +* Expected: Factual, 2-3 scenarios, verdict with uncertainty
1019 1019  
1020 -=== 11.3 Quality Thresholds ===
414 +**Test 3: Statistical Claim**
415 +* Input: "10% of people have condition X"
416 +* Expected: Factual, evidence with numbers, probabilistic verdict
1021 1021  
1022 -**POC quality expectations:**
1023 1023  
1024 -|=Component|=Quality Threshold|=Definition
1025 -|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1026 -|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1027 -|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1028 -|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
419 +=== 7.2 Edge Cases ===
1029 1029  
1030 -**Analogy:** "B student" quality (70-80%), not "A+" perfection yet
421 +**Test 4: Opinion**
422 +* Input: "Paris is the best city"
423 +* Expected: Non-factual (opinion), blocked by Gate 1
1031 1031  
1032 -**Not expecting:**
1033 -* 100% accuracy
1034 -* Perfect claim coverage
1035 -* Comprehensive evidence gathering
1036 -* Flawless verdicts
1037 -* Production polish
425 +**Test 5: Prediction**
426 +* Input: "Bitcoin will reach $100,000 next year"
427 +* Expected: Non-factual (prediction), blocked by Gate 1
1038 1038  
1039 -**Expecting:**
1040 -* Reasonable claim extraction
1041 -* Defensible verdicts
1042 -* Understandable reasoning
1043 -* Useful output
429 +**Test 6: Insufficient Evidence**
430 +* Input: Obscure factual claim with no sources
431 +* Expected: Blocked by Gate 4 (<2 sources)
1044 1044  
1045 ----
1046 1046  
1047 -== 12. Test Cases ==
434 +=== 7.3 Quality Gate Tests ===
1048 1048  
1049 -=== 12.1 Test Case 1: Simple Factual Claim ===
436 +**Test 7: Gate 1 Effectiveness**
437 +* Input: Mix of 10 factual + 10 non-factual claims
438 +* Expected: Gate 1 blocks all 10 non-factual claims (100% recall on non-factual claims)
1050 1050  
1051 -**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
440 +**Test 8: Gate 4 Effectiveness**
441 +* Input: Claims with varying evidence availability
442 +* Expected: Gate 4 blocks verdicts with <2 sources or quality <0.6
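A Gate 1 test over a labeled mix of claims can report two numbers: the share of non-factual claims that were blocked, and the share of factual claims that were blocked by mistake. The following sketch assumes a hand-labeled test set and takes the gate as a function argument; `LabeledClaim`, `Gate1Report` and `evaluateGate1` are hypothetical names, not part of the specification.

```typescript
// Hypothetical test-set entry with a hand-assigned ground-truth label.
interface LabeledClaim {
  text: string;
  isFactual: boolean;
}

interface Gate1Report {
  nonFactualBlockRate: number;   // should be 1.0 for Gate 1 to pass its test
  factualFalseBlockRate: number; // factual claims wrongly blocked
}

// `blocks` stands in for the real Gate 1 decision: true means the claim is blocked.
function evaluateGate1(
  claims: LabeledClaim[],
  blocks: (text: string) => boolean
): Gate1Report {
  let factual = 0;
  let factualBlocked = 0;
  let nonFactual = 0;
  let nonFactualBlocked = 0;

  for (const c of claims) {
    const blocked = blocks(c.text);
    if (c.isFactual) {
      factual++;
      if (blocked) factualBlocked++;
    } else {
      nonFactual++;
      if (blocked) nonFactualBlocked++;
    }
  }

  return {
    // With no claims of a kind, report the neutral value instead of dividing by zero.
    nonFactualBlockRate: nonFactual === 0 ? 1 : nonFactualBlocked / nonFactual,
    factualFalseBlockRate: factual === 0 ? 0 : factualBlocked / factual,
  };
}
```

Reporting both rates matters because a gate that blocks everything would trivially reach a 100% block rate on non-factual claims while rejecting every good claim.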
1052 1052  
1053 -**Expected Output:**
1054 -* Extract claim correctly
1055 -* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1056 -* Confidence: 70-90%
1057 -* Risk tier: C (Low)
1058 -* Reasoning: Mentions studies or evidence
1059 1059  
1060 -**Success:** Verdict is reasonable and reasoning makes sense
445 +== 8. Technical Architecture (POC) ==
1061 1061  
1062 ----
447 +=== 8.1 Simplified Architecture ===
1063 1063  
1064 -=== 12.2 Test Case 2: Complex News Article ===
449 +**POC Tech Stack:**
450 +* **Frontend:** Simple web interface (Next.js + TypeScript)
451 +* **Backend:** Single API endpoint
452 +* **AI:** Claude API (Sonnet 4.5)
453 +* **Database:** Local JSON files (no database)
454 +* **Deployment:** Single server
1065 1065  
1066 -**Input:** News article URL with multiple claims about politics/health/science
456 +**Architecture Diagram:** See [[POC1 Specification>>FactHarbor.Specification.POC.Specification]]
1067 1067  
1068 -**Expected Output:**
1069 -* Extract 3-5 key claims
1070 -* Verdict for each (may vary: some supported, some uncertain, some refuted)
1071 -* Coherent analysis summary
1072 -* Article summary
1073 -* Risk tiers assigned appropriately
1074 1074  
1075 -**Success:** Claims identified are actually from article, verdicts are reasonable
459 +=== 8.2 AKEL Implementation ===
1076 1076  
1077 ----
461 +**POC AKEL:**
462 +* Single-threaded processing
463 +* Synchronous API calls
464 +* No caching
465 +* Basic error handling
466 +* Console logging
1078 1078  
1079 -=== 12.3 Test Case 3: Controversial Topic ===
468 +**Full AKEL (POC2+):**
469 +* Multi-threaded processing
470 +* Async API calls
471 +* Evidence caching
472 +* Advanced error handling with retry
473 +* Structured logging + monitoring
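The POC AKEL behaviour listed above (one claim at a time, no caching, basic error handling, console logging) could look roughly like the loop below. This is a sketch under assumptions: `processSequentially` and `ClaimOutcome` are hypothetical names, and `analyze` is a placeholder for whatever function wraps the Claude API call, which is not defined here. Each call is awaited before the next starts, so there is no concurrency.

```typescript
// Placeholder for the real analysis call (e.g. a wrapper around the AI API).
type Analyze = (claim: string) => Promise<string>;

interface ClaimOutcome {
  claim: string;
  ok: boolean;
  result?: string;
  error?: string;
}

async function processSequentially(
  claims: string[],
  analyze: Analyze
): Promise<ClaimOutcome[]> {
  const outcomes: ClaimOutcome[] = [];
  for (const claim of claims) {
    try {
      // Await each call before moving on: one claim in flight at a time.
      const result = await analyze(claim);
      console.log("analysis complete: " + claim);
      outcomes.push({ claim, ok: true, result });
    } catch (err) {
      // Basic error handling: log, record the failure, continue. No retry.
      const message = err instanceof Error ? err.message : String(err);
      console.error("analysis failed: " + claim + ": " + message);
      outcomes.push({ claim, ok: false, error: message });
    }
  }
  return outcomes;
}
```

A failed claim is recorded and the loop continues, which matches the acceptance criterion that errors must not crash the system. Retry logic and parallel processing would be POC2 additions.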
1080 1080  
1081 -**Input:** Article on contested political or scientific topic
1082 1082  
1083 -**Expected Output:**
1084 -* Balanced analysis
1085 -* Acknowledges uncertainty where appropriate
1086 -* Doesn't overstate confidence
1087 -* Reasoning shows awareness of complexity
476 +== 9. POC Philosophy ==
1088 1088  
1089 -**Success:** Analysis is fair and doesn't show obvious bias
478 +{{info}}
479 +**Important:** POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases.
480 +{{/info}}
1090 1090  
1091 ----
482 +=== 9.1 Core Principles ===
1092 1092  
1093 -=== 12.4 Test Case 4: Clearly False Claim ===
484 +**1. Prove Concept, Not Production**
485 +* POC validates AI can do the job
486 +* Production quality comes in POC2 and Beta 0
487 +* Focus on "does it work?" not "is it perfect?"
1094 1094  
1095 -**Input:** Article with obviously false claim (e.g., "The Earth is flat")
489 +**2. Implement Subset of Requirements**
490 +* POC covers FR1-7, NFR11 (lite)
491 +* All other requirements deferred
492 +* Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1096 1096  
1097 -**Expected Output:**
1098 -* Extract claim
1099 -* Verdict: REFUTED
1100 -* High confidence (> 90%)
1101 -* Risk tier: C (Low - established fact)
1102 -* Clear reasoning
494 +**3. Quality Gates Validate Approach**
495 +* 2 gates prove the concept
496 +* Remaining 5 gates added in POC2
497 +* Gates must demonstrably improve quality
1103 1103  
1104 -**Success:** AI correctly identifies false claim with high confidence
499 +**4. Iterate Based on Results**
500 +* POC results determine next steps
501 +* Decision gate after POC1
502 +* Flexibility to pivot if needed
1105 1105  
1106 ----
1107 1107  
1108 -=== 12.5 Test Case 5: Genuinely Uncertain Claim ===
505 +=== 9.2 Success = Clear Path Forward ===
1109 1109  
1110 -**Input:** Article with claim where evidence is genuinely mixed
507 +POC succeeds if we can confidently answer:
1111 1111  
1112 -**Expected Output:**
1113 -* Extract claim
1114 -* Verdict: UNCERTAIN
1115 -* Moderate confidence (40-60%)
1116 -* Reasoning explains why uncertain
509 +✅ **Technical Feasibility:**
510 +* Can AI extract claims reliably?
511 +* Can AI find balanced evidence?
512 +* Can AI compute reasonable verdicts?
1117 1117  
1118 -**Success:** AI recognizes uncertainty and doesn't overstate confidence
514 +✅ **Quality Approach:**
515 +* Do quality gates improve output?
516 +* Can we measure and track quality?
517 +* Is the gate approach scalable?
1119 1119  
1120 ----
519 +✅ **Production Path:**
520 +* Is the core architecture sound?
521 +* What needs improvement for production?
522 +* Is POC2 the right next step?
1121 1121  
1122 -=== 12.6 Test Case 6: High-Risk Medical Claim ===
1123 1123  
1124 -**Input:** Article making medical claims
525 +== 10. Related Pages ==
1125 1125  
1126 -**Expected Output:**
1127 -* Extract claim
1128 -* Verdict: [appropriate based on evidence]
1129 -* Risk tier: A (High - medical)
1130 -* Red label displayed
1131 -* Clear disclaimer about not being medical advice
527 +* **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
528 +* **[[POC1 Specification (Detailed)>>FactHarbor.Specification.POC.Specification]]** - Detailed POC1 technical specs
529 +* **[[POC Summary>>FactHarbor.Specification.POC.Summary]]** - High-level POC overview
530 +* **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
531 +* **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
1132 1132  
1133 -**Success:** Risk tier correctly assigned, appropriate warnings shown
1134 1134  
1135 ----
534 +**Document Owner:** Technical Team
535 +**Review Frequency:** After each POC iteration
536 +**Version History:**
537 +* v1.0 - Initial POC requirements
538 +* v2.0 - Updated after specification cross-check
539 +* v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
1136 1136  
1137 -== 13. POC Decision Gate ==
1138 -
1139 -=== 13.1 Decision Framework ===
1140 -
1141 -After POC testing complete, team makes one of three decisions:
1142 -
1143 -**Option A: GO (Proceed to POC2)**
1144 -
1145 -**Conditions:**
1146 -* AI quality ≥70% without manual editing
1147 -* Basic claim → verdict pipeline validated
1148 -* Internal + advisor feedback positive
1149 -* Technical feasibility confirmed
1150 -* Team confident in direction
1151 -* Clear path to improving AI quality to ≥90%
1152 -
1153 -**Next Steps:**
1154 -* Plan POC2 development (add scenarios)
1155 -* Design scenario architecture
1156 -* Expand to Evidence Model structure
1157 -* Test with more complex articles
1158 -
1159 ----
1160 -
1161 -**Option B: NO-GO (Pivot or Stop)**
1162 -
1163 -**Conditions:**
1164 -* AI quality < 60%
1165 -* Requires manual editing for most analyses (> 50%)
1166 -* Feedback indicates fundamental flaws
1167 -* Cost/effort not justified by value
1168 -* No clear path to improvement
1169 -
1170 -**Next Steps:**
1171 -* **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1172 -* **Stop:** Conclude approach not viable, revisit later
1173 -
1174 ----
1175 -
1176 -**Option C: ITERATE (Improve POC)**
1177 -
1178 -**Conditions:**
1179 -* Concept has merit but execution needs work
1180 -* Specific improvements identified
1181 -* Addressable with better prompts/approach
1182 -* AI quality between 60-70%
1183 -
1184 -**Next Steps:**
1185 -* Improve AI prompts
1186 -* Test different approaches
1187 -* Re-run POC with improvements
1188 -* Then make GO/NO-GO decision
1189 -
1190 ----
1191 -
1192 -=== 13.2 Decision Criteria Summary ===
1193 -
1194 -{{code}}
1195 -AI Quality < 60% → NO-GO (approach doesn't work)
1196 -AI Quality 60-70% → ITERATE (improve and retry)
1197 -AI Quality ≥70% → GO (proceed to POC2)
1198 -{{/code}}
1199 -
1200 ----
1201 -
1202 -== 14. Key Risks & Mitigations ==
1203 -
1204 -=== 14.1 Risk: AI Quality Not Good Enough ===
1205 -
1206 -**Likelihood:** Medium-High
1207 -**Impact:** POC fails
1208 -
1209 -**Mitigation:**
1210 -* Extensive prompt engineering and testing
1211 -* Use best available AI models (Sonnet 4.5)
1212 -* Test with diverse article types
1213 -* Iterate on prompts based on results
1214 -
1215 -**Acceptance:** This is what POC tests - be ready for failure
1216 -
1217 ----
1218 -
1219 -=== 14.2 Risk: AI Consistency Issues ===
1220 -
1221 -**Likelihood:** Medium
1222 -**Impact:** Works sometimes, fails other times
1223 -
1224 -**Mitigation:**
1225 -* Test with 10+ diverse articles
1226 -* Measure success rate honestly
1227 -* Improve prompts to increase consistency
1228 -
1229 -**Acceptance:** Some variability OK if average quality ≥70%
1230 -
1231 ----
1232 -
1233 -=== 14.3 Risk: Output Incomprehensible ===
1234 -
1235 -**Likelihood:** Low-Medium
1236 -**Impact:** Users can't understand analysis
1237 -
1238 -**Mitigation:**
1239 -* Create clear explainer document
1240 -* Iterate on output format
1241 -* Test with non-technical reviewers
1242 -* Simplify language if needed
1243 -
1244 -**Acceptance:** Iterate until comprehensible
1245 -
1246 ----
1247 -
1248 -=== 14.4 Risk: API Rate Limits / Costs ===
1249 -
1250 -**Likelihood:** Low
1251 -**Impact:** System slow or expensive
1252 -
1253 -**Mitigation:**
1254 -* Monitor API usage
1255 -* Implement retry logic
1256 -* Estimate costs before scaling
1257 -
1258 -**Acceptance:** POC can be slow and expensive (optimization later)
1259 -
1260 ----
1261 -
1262 -=== 14.5 Risk: Scope Creep ===
1263 -
1264 -**Likelihood:** Medium
1265 -**Impact:** POC becomes too complex
1266 -
1267 -**Mitigation:**
1268 -* Strict scope discipline
1269 -* Say NO to feature additions
1270 -* Keep focus on core question
1271 -
1272 -**Acceptance:** POC is minimal by design
1273 -
1274 ----
1275 -
1276 -== 15. POC Philosophy ==
1277 -
1278 -=== 15.1 Core Principles ===
1279 -
1280 -**1. Build Less, Learn More**
1281 -* Minimum features to test hypothesis
1282 -* Don't build unvalidated features
1283 -* Focus on core question only
1284 -
1285 -**2. Fail Fast**
1286 -* Quick test of hardest part (AI capability)
1287 -* Accept that POC might fail
1288 -* Better to discover issues early
1289 -* Honest assessment over optimistic hope
1290 -
1291 -**3. Test First, Build Second**
1292 -* Validate AI can do this before building platform
1293 -* Don't assume it will work
1294 -* Let results guide decisions
1295 -
1296 -**4. Automation First**
1297 -* No manual editing allowed
1298 -* Tests scalability, not just feasibility
1299 -* Proves approach can work at scale
1300 -
1301 -**5. Honest Assessment**
1302 -* Don't cherry-pick examples
1303 -* Don't manually fix bad outputs
1304 -* Document failures openly
1305 -* Make data-driven decisions
1306 -
1307 ----
1308 -
1309 -=== 15.2 What POC Is ===
1310 -
1311 -✅ Testing AI capability without humans
1312 -✅ Proving core technical concept
1313 -✅ Fast validation of approach
1314 -✅ Honest assessment of feasibility
1315 -
1316 ----
1317 -
1318 -=== 15.3 What POC Is NOT ===
1319 -
1320 -❌ Building a product
1321 -❌ Production-ready system
1322 -❌ Feature-complete platform
1323 -❌ Perfectly accurate analysis
1324 -❌ Polished user experience
1325 -
1326 ----
1327 -
1328 -== 16. Success = Clear Path Forward ==
1329 -
1330 -**If POC succeeds (≥70% AI quality):**
1331 -* ✅ Approach validated
1332 -* ✅ Proceed to POC2 (add scenarios)
1333 -* ✅ Design full Evidence Model structure
1334 -* ✅ Test multi-scenario comparison
1335 -* ✅ Focus on improving AI quality from 70% → 90%
1336 -
1337 -**If POC fails (< 60% AI quality):**
1338 -* ✅ Learn what doesn't work
1339 -* ✅ Pivot to different approach
1340 -* ✅ OR wait for better AI technology
1341 -* ✅ Avoid wasting resources on non-viable approach
1342 -
1343 -**Either way, POC provides clarity.**
1344 -
1345 ----
1346 -
1347 -== 17. Related Pages ==
1348 -
1349 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 -* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 -* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1352 -* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 -* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 -* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355 -
1356 ----
1357 -
1358 -**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359 -