Last modified by Robert Schaub on 2025/12/24 21:53

= POC2: Robust Quality & Reliability =

**Phase Goal:** Prove AKEL produces high-quality outputs consistently at scale

**Success Metric:** <5% hallucination rate, all 4 quality gates operational

== 1. Overview ==

POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.

**Key Innovation:** Complete quality validation pipeline catches all categories of errors

**What We're Proving:**

* All 4 quality gates work together effectively
* Evidence deduplication prevents artificial inflation
* System maintains quality at larger scale
* Quality metrics dashboard provides actionable insights

== 2. New Requirements ==

=== 2.1 NFR11: Complete Quality Assurance Framework ===

**Add Gates 2 & 3** (POC1 had only Gates 1 & 4)

==== Gate 2: Evidence Relevance Validation ====

**Purpose:** Ensure AI-linked evidence actually relates to the claim

**Validation Checks:**

1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)

**Action if Failed:**

* Discard irrelevant evidence (don't count it)
* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
* Log discarded evidence for quality review

**Target:** 0% of cited evidence is off-topic
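The three checks above can be sketched as follows. This is a minimal illustration, not the production implementation: the embedding vectors, entity sets, and topic scores are assumed to be supplied by upstream components (embedding model, NER, topic classifier), and all function and field names here are hypothetical.

```python
import math

SIM_THRESHOLD = 0.6    # check 1: minimum claim-evidence cosine similarity
TOPIC_THRESHOLD = 0.5  # check 3: minimum topic-relevance score
MIN_EVIDENCE = 2       # below this -> "Insufficient Evidence" verdict

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def passes_gate2(claim, evidence):
    """Evidence passes Gate 2 only if all three checks succeed."""
    return (
        cosine_similarity(claim["embedding"], evidence["embedding"]) >= SIM_THRESHOLD
        and len(set(claim["entities"]) & set(evidence["entities"])) >= 1
        and evidence["topic_score"] >= TOPIC_THRESHOLD
    )

def filter_evidence(claim, evidence_items):
    """Apply the failure actions: discard, log, and downgrade the verdict."""
    relevant = [e for e in evidence_items if passes_gate2(claim, e)]
    discarded = [e for e in evidence_items if e not in relevant]  # kept for quality review
    verdict = "Insufficient Evidence" if len(relevant) < MIN_EVIDENCE else None
    return relevant, discarded, verdict
```

Note that the gate only filters; it never adds evidence, so it cannot itself introduce hallucinated support.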

==== Gate 3: Scenario Coherence Check ====

**Purpose:** Validate scenarios are logical, complete, and meaningfully different

**Validation Checks:**

1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score < 0.3)
3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity < 0.8)
4. **Minimum Detail:** At least 1 specific assumption per scenario

**Action if Failed:**

* Merge duplicate scenarios
* Flag contradictory assumptions for review
* Reduce confidence score by 20%
* Do not publish if <2 distinct scenarios

**Target:** 0% duplicate scenarios, all scenarios internally consistent
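A minimal sketch of how Gate 3's checks and failure actions could compose. All names are assumptions; the internal-consistency (contradiction) check is omitted because it needs an NLI-style model, and the similarity function is supplied by the caller (e.g. embedding cosine).

```python
REQUIRED_FIELDS = ("assumptions", "scope", "evidence_context")
MAX_SIMILARITY = 0.8  # scenarios at or above this are treated as duplicates

def is_complete(scenario):
    """Checks 1 & 4: required fields populated, >=1 specific assumption."""
    return (all(scenario.get(f) for f in REQUIRED_FIELDS)
            and len(scenario["assumptions"]) >= 1)

def deduplicate(scenarios, similarity):
    """Check 3: keep only scenarios distinct from everything kept so far."""
    distinct = []
    for s in scenarios:
        if all(similarity(s, d) < MAX_SIMILARITY for d in distinct):
            distinct.append(s)
    return distinct

def gate3(scenarios, similarity, confidence):
    """Apply Gate 3 actions: drop/merge, penalize confidence, block publishing."""
    complete = [s for s in scenarios if is_complete(s)]
    distinct = deduplicate(complete, similarity)
    if len(distinct) < len(scenarios):  # something was dropped or merged
        confidence *= 0.8               # reduce confidence score by 20%
    publishable = len(distinct) >= 2    # need >=2 distinct scenarios
    return distinct, confidence, publishable
```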

=== 2.2 FR54: Evidence Deduplication (NEW) ===

**Importance:** HIGH
**Fulfills:** Accurate evidence counting, prevents artificial inflation

**Purpose:** Prevent counting the same evidence multiple times when it is cited by different sources

**Problem:**

* Wire services (AP, Reuters) redistribute the same content
* Different sites cite the same original study
* Aggregators copy primary sources
* AKEL might count this as "5 sources" when it's really 1

**Solution: Content Fingerprinting**

* Generate SHA-256 hash of normalized text
* Detect near-duplicates (≥85% similarity) using fuzzy matching
* Track which sources cited each unique piece of evidence
* Display provenance chain to user

**Target:** Duplicate detection >95% accurate, evidence counts reflect reality
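The fingerprinting steps above can be sketched with the standard library alone. `difflib.SequenceMatcher` stands in here for whatever fuzzy matcher is ultimately chosen (a production system at scale would more likely use MinHash or SimHash); function names and the record shape are illustrative assumptions.

```python
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUP_THRESHOLD = 0.85  # >=85% similarity counts as the same evidence

def normalize(text):
    """Lowercase and collapse whitespace/punctuation before hashing."""
    return re.sub(r"\W+", " ", text.lower()).strip()

def fingerprint(text):
    """SHA-256 of the normalized text: catches exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

def dedupe_evidence(items):
    """Fold exact and near-duplicates together, tracking provenance.

    Each unique piece keeps the list of sources that cited it, so the UI
    can display the provenance chain and counts reflect reality.
    """
    unique = []  # list of {"hash", "text", "sources"}
    for item in items:
        h = fingerprint(item["text"])
        match = None
        for u in unique:
            if u["hash"] == h or SequenceMatcher(
                None, normalize(item["text"]), u["text"]
            ).ratio() >= NEAR_DUP_THRESHOLD:
                match = u
                break
        if match:
            match["sources"].append(item["source"])  # same evidence, new citer
        else:
            unique.append({"hash": h, "text": normalize(item["text"]),
                           "sources": [item["source"]]})
    return unique
```

With this in place, "5 sources" that all republished one wire story collapse into 1 piece of evidence with 5 entries in its provenance list.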

=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===

**Importance:** HIGH
**Fulfills:** Real-time quality monitoring during development

**Dashboard Metrics:**

* Claim processing statistics
* Gate performance (pass/fail rates for each gate)
* Evidence quality metrics
* Hallucination rate tracking
* Processing performance

**Target:** Dashboard functional, all metrics tracked, exportable
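As an illustration only (the names and structure are assumptions, not the specified design), a collector covering the metrics above can be very small, with `export()` providing the "exportable" snapshot for the dashboard:

```python
from collections import Counter

class QualityMetrics:
    """Minimal in-memory collector for the dashboard metrics above."""

    def __init__(self):
        self.claims_processed = 0
        self.gate_results = Counter()  # ("gate2", "pass") -> count
        self.hallucinations = 0

    def record_claim(self, gate_outcomes, hallucinated=False):
        """gate_outcomes maps gate name -> bool, e.g. {"gate1": True}."""
        self.claims_processed += 1
        for gate, passed in gate_outcomes.items():
            self.gate_results[(gate, "pass" if passed else "fail")] += 1
        if hallucinated:
            self.hallucinations += 1

    def export(self):
        """Snapshot suitable for the dashboard or offline analysis."""
        rate = (self.hallucinations / self.claims_processed
                if self.claims_processed else 0.0)
        return {
            "claims_processed": self.claims_processed,
            "gate_pass_fail": dict(self.gate_results),
            "hallucination_rate": rate,
        }
```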

== 3. Success Criteria ==

**✅ Quality:**

* Hallucination rate <5% (target: <3%)
* Average quality rating ≥8.0/10
* 0 critical failures (publishable falsities)
* Gates correctly identify >95% of low-quality outputs

**✅ All 4 Gates Operational:**

* Gate 1: Claim validation working
* Gate 2: Evidence relevance filtering working
* Gate 3: Scenario coherence checking working
* Gate 4: Verdict confidence assessment working

**✅ Evidence Deduplication:**

* Duplicate detection >95% accurate
* Evidence counts reflect reality
* Provenance tracked correctly

**✅ Metrics Dashboard:**

* All metrics implemented and tracking
* Dashboard functional and useful
* Alerts trigger appropriately

== 4. Architecture Notes ==

**POC2 Enhanced Architecture:**

{{code}}
Input → AKEL Processing → All 4 Quality Gates → Display
        (claims + scenarios   (1: Claim validation
         + evidence linking    2: Evidence relevance
         + verdicts)           3: Scenario coherence
                               4: Verdict confidence)
{{/code}}
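The gate chain in the diagram can be sketched as a sequential pipeline where the first hard failure stops publication. This is a hedged illustration only; the `(ok, result)` stage contract and all names are assumptions, not the specified interface.

```python
def run_pipeline(akel_output, stages):
    """Run AKEL output through the quality gates in order.

    Each stage is a (name, function) pair; the function returns
    (ok, result). The first failing gate blocks publication.
    """
    result = akel_output
    for name, stage in stages:
        ok, result = stage(result)
        if not ok:
            return {"published": False, "failed_gate": name}
    return {"published": True, "output": result}
```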

**Key Additions from POC1:**

* Scenario generation component
* Evidence deduplication system
* Gates 2 & 3 implementation
* Quality metrics collection

**Still Simplified vs. Full System:**

* Single AKEL orchestration (not a multi-component pipeline)
* No review queue
* No federation architecture

**See:** [[Architecture>>FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] for details

== 5. Context-Aware Analysis (Conditional Feature) ==

**Status:** Depends on POC1 experimental test results

**Background:**

POC1 tested context-aware analysis as an experimental feature using Approach 1 (Single-Pass Holistic Analysis). The goal is to detect when articles use accurate individual claims but reach misleading conclusions through faulty logic or selective presentation.

**See:** [[Article Verdict Problem>>FactHarbor.Specification.POC.Article-Verdict-Problem]] for the complete investigation

=== 5.1 POC2 Implementation Path ===

**Decision based on POC1 test results (30-article test set):**

==== If POC1 Accuracy ≥70% (Success) ====

**Action:** Implement as a standard feature (no longer experimental)

**Enhancement to FR4:**

* Context-aware analysis becomes part of the standard Analysis Summary
* Article verdict may differ from the simple claim average
* AI evaluates logical structure and reasoning quality

**Potential Upgrade to Approach 6 (Hybrid):**

* Add weighted claim importance (some claims are more central than others)
* Add rule-based fallacy detection alongside AI reasoning
* Combine AI judgment with heuristic checks for robustness

**Target:** Maintain ≥70% accuracy at detecting misleading articles

==== If POC1 Accuracy 50-70% (Promising) ====

**Action:** Implement alternative Approach 4 (Weighted Aggregation)

**Instead of holistic analysis:**

* AI assigns an importance weight (0-1) to each claim
* Weights are based on claim centrality, evidence strength, and logical role
* Article verdict = weighted average of claim verdicts
* More structured than pure AI reasoning

**Rationale:** If holistic reasoning proves inconsistent, structured weighting may work better
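The aggregation step itself is straightforward. A sketch, assuming claim verdicts are normalized to a 0-1 scale (the actual verdict representation is not specified here):

```python
def article_verdict(claims):
    """Approach 4: importance-weighted average of claim verdicts.

    Each claim is assumed to carry:
      "verdict" - float in [0, 1] (0 = false, 1 = true)
      "weight"  - float in [0, 1], the AI-assigned importance
    """
    total_weight = sum(c["weight"] for c in claims)
    if total_weight == 0:
        return None  # no weighted signal; defer to human review
    return sum(c["verdict"] * c["weight"] for c in claims) / total_weight
```

The effect is that a false but central claim drags the article verdict down far more than a false footnote would under a simple average.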

==== If POC1 Accuracy <50% (Insufficient) ====

**Action:** Defer context-aware analysis to post-POC2

**Fallback:**

* Focus on individual claim accuracy only
* Article verdict = simple average of claim verdicts
* Note limitation: may miss misleading articles built from accurate claims

**Future consideration:** Try Approach 7 (LLM-as-Judge) with better models in future releases

=== 5.2 Testing in POC2 ===

**If the context-aware feature is implemented:**

* Expand the test set from 30 to 100 articles
* Include more diverse article types (op-eds, news, analysis, advocacy)
* Track the false positive rate (flagging good articles as misleading)
* Validate with subject matter experts when possible

**Success Metrics:**

* ≥70% accuracy on misleading article detection
* <15% false positive rate
* Reasoning is comprehensible to users

=== 5.3 Architecture Notes ===

**Context-aware analysis adds NO additional API calls**

The enhanced analysis happens within the existing AKEL workflow:

{{code}}
Standard Flow:          Context-Aware Enhancement:
1. Extract claims       1. Extract claims + mark central claims
2. Find evidence        2. Find evidence
3. Generate verdicts    3. Generate verdicts
4. Write summary        4. Write context-aware summary
                           (evaluates article structure)
{{/code}}

**Cost:** $0 increase (same API calls, enhanced prompt only)

**See:** [[POC Requirements>>FactHarbor.Specification.POC.Requirements]] Component 1 for implementation details

== Related Pages ==

* [[POC1>>FactHarbor pre10 V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
* [[Beta 0>>FactHarbor pre10 V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
* [[Roadmap Overview>>FactHarbor pre10 V0\.9\.70.Roadmap.WebHome]]
* [[Architecture>>FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]]

**Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
**Version:** V0.9.70