Last modified by Robert Schaub on 2025/12/24 21:53

= POC1: Core Workflow with Quality Gates =

**Phase Goal:** Prove AKEL can produce credible, quality outputs without manual intervention

**Success Metric:** <10% hallucination rate, quality gates prevent low-confidence publications

== 1. Overview ==

POC1 validates that the core AKEL workflow (Article → Claims → Verdicts) can produce trustworthy fact-checking analyses automatically. This phase implements **2 critical quality gates** to prevent low-quality outputs from being published.

**Key Innovation:** Quality validation BEFORE publication, not after

**What We're Proving:**

* AKEL can reliably extract factual claims from articles
* AKEL can generate credible verdicts with proper evidence
* **AKEL can assess article credibility beyond simple claim averaging** (context-aware analysis)
* Quality gates prevent hallucinations and low-confidence outputs
* A fully automated approach is viable

== 2. Scope ==

=== In Scope ===

* Core AKEL workflow (claim extraction, verdict generation)
* **Gate 1:** Claim Validation (factual vs. opinion/prediction)
* **Gate 4:** Verdict Confidence Assessment (minimum 2 sources, quality thresholds)
* Basic UI to display results
* Manual quality tracking

=== Out of Scope (Deferred to POC2+) ===

* User accounts / authentication
* Corrections system
* Search engine optimization (ClaimReview schema)
* Image verification
* API endpoints
* Archive.org integration
* Security hardening
* A/B testing
* Gates 2 & 3 (Evidence relevance, Scenario coherence)

=== Experimental Features (POC1) ===

**Context-Aware Analysis** (Approach 1: Single-Pass Holistic)

**Goal:** Test whether AI can detect when an article's overall credibility differs from the average of its claim verdicts (e.g., accurate facts but a misleading conclusion).

**Implementation:**
* Enhanced AI prompt to evaluate logical structure
* AI identifies the article's main argument
* AI assesses whether the conclusion follows from the evidence
* Article verdict may differ from the claim average

**Testing:**
* 30-article test set (10 straightforward, 10 misleading, 10 complex)
* Success criterion: ≥70% accuracy on misleading articles
* Marked as experimental - doesn't block POC1 success

**See:** [[Article Verdict Problem>>FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete analysis

**Decision:**
* If ≥70% accuracy → ship in POC2
* If 50% to <70% → try weighted aggregation approach
* If <50% → defer to POC2 with different approach
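
The decision rule above can be written as a small function. This is only a sketch: the function name and the returned strings are illustrative, and it assumes that accuracy is measured on the misleading-article subset and that exactly 70% counts as meeting the bar.

```python
def context_analysis_decision(accuracy: float) -> str:
    """Map accuracy on the misleading-article subset (0.0-1.0) to a next step.

    Assumption: a value of exactly 0.70 falls into the "ship" branch.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    if accuracy >= 0.70:
        return "ship in POC2"
    if accuracy >= 0.50:
        return "try weighted aggregation approach"
    return "defer to POC2 with different approach"


# Example: 6 of the 10 misleading articles judged correctly.
print(context_analysis_decision(6 / 10))
```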

== 3. Requirements ==

=== 3.1 NFR11: Quality Assurance Framework (POC1 Lite Version) ===

**Importance:** CRITICAL - Core POC1 Requirement
**Fulfills:** AI safety, credibility, prevents embarrassing failures

**Specification:**

AKEL must validate outputs before displaying them to users. POC1 implements a **2-gate subset** of the full NFR11 framework.

==== Gate 1: Claim Validation ====

**Purpose:** Ensure extracted claims are factual assertions, not opinions or predictions

**Validation Checks:**

1. **Factual Statement Test:** Can this be verified with evidence?
2. **Opinion Detection:** Contains hedging or evaluative language? ("I think", "probably", "best", "worst")
3. **Specificity Score:** Contains concrete details? (names, numbers, dates, locations)
4. **Future Prediction Test:** Makes claims about future events?

**Pass Criteria:**
{{code}}- isFactual: true
- opinionScore: ≤ 0.3
- specificityScore: ≥ 0.3
- claimType: FACTUAL{{/code}}

**Action if Failed:**

* Flag as "Non-verifiable: Opinion/Prediction/Ambiguous"
* Do NOT generate scenarios or verdicts
* Display explanation to user

**Target:** 0% opinion statements processed as facts
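
A minimal sketch of how the Gate 1 pass criteria might be checked once the four scores exist. The `ClaimAssessment` class, the function names, and the claim type labels other than `FACTUAL` are hypothetical. How AKEL actually produces the scores is not specified here, so they are treated as given inputs.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the Pass Criteria above.
MAX_OPINION_SCORE = 0.3
MIN_SPECIFICITY_SCORE = 0.3


@dataclass(frozen=True)
class ClaimAssessment:
    """Hypothetical container for the Gate 1 results of one extracted claim."""
    is_factual: bool
    opinion_score: float      # 0.0 (no opinion markers) to 1.0
    specificity_score: float  # 0.0 (vague) to 1.0 (concrete details)
    claim_type: str           # "FACTUAL"; other labels are assumed


def gate1_failures(a: ClaimAssessment) -> List[str]:
    """Return the names of all failed criteria; an empty list means pass."""
    failures = []
    if not a.is_factual:
        failures.append("isFactual")
    if a.opinion_score > MAX_OPINION_SCORE:
        failures.append("opinionScore")
    if a.specificity_score < MIN_SPECIFICITY_SCORE:
        failures.append("specificityScore")
    if a.claim_type != "FACTUAL":
        failures.append("claimType")
    return failures


def passes_gate1(a: ClaimAssessment) -> bool:
    return not gate1_failures(a)


# Example: an opinionated, vague claim fails on every criterion.
print(gate1_failures(ClaimAssessment(False, 0.7, 0.2, "OPINION")))
```

Returning the list of failed criteria, rather than a bare boolean, gives the UI something to show in the explanation required when a claim is blocked.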

==== Gate 4: Verdict Confidence Assessment ====

**Purpose:** Only publish verdicts with sufficient evidence and confidence

**Validation Checks:**

1. **Evidence Count:** Minimum 2 independent sources
2. **Source Quality:** Average reliability ≥ 0.6 (on a 0-1 scale)
3. **Evidence Agreement:** Share of evidence supporting (vs. contradicting) the verdict ≥ 0.6 (60%)
4. **Uncertainty Factors:** Count of hedging statements in reasoning

**Confidence Tiers:**
{{code}}HIGH (80-100%):
- ≥3 sources
- ≥0.7 average quality
- ≥80% agreement

MEDIUM (50-79%):
- ≥2 sources
- ≥0.6 average quality
- ≥60% agreement

LOW (0-49%):
- ≥2 sources BUT low quality/agreement

INSUFFICIENT:
- <2 sources → DO NOT PUBLISH{{/code}}

**POC1 Publication Rule:**

* Minimum **MEDIUM** confidence required
* Blocked verdicts show "Insufficient Evidence" message

**Target:** 0% verdicts published with <2 sources
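
The tier table and publication rule can be sketched as a pair of functions. The function names are illustrative, and the sketch makes two simplifying assumptions: it ignores the uncertainty-factor count (its effect on the tier is not defined above), and it derives the tier from the three thresholds only, not from the percentage bands.

```python
def confidence_tier(source_count: int, avg_quality: float, agreement: float) -> str:
    """Classify a verdict using the source, quality, and agreement thresholds.

    avg_quality and agreement are on a 0-1 scale.
    """
    if source_count < 2:
        return "INSUFFICIENT"
    if source_count >= 3 and avg_quality >= 0.7 and agreement >= 0.8:
        return "HIGH"
    if avg_quality >= 0.6 and agreement >= 0.6:
        return "MEDIUM"
    return "LOW"  # at least 2 sources, but low quality or agreement


def is_publishable(tier: str) -> bool:
    """POC1 publication rule: MEDIUM confidence or better."""
    return tier in {"HIGH", "MEDIUM"}


# Example: two strong sources still cap out at MEDIUM, because HIGH needs three.
tier = confidence_tier(source_count=2, avg_quality=0.9, agreement=0.9)
print(tier, is_publishable(tier))
```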

=== 3.2 Modified FR7: Automated Verdicts (Enhanced) ===

**Enhancement for POC1:**

After AKEL generates a verdict, it must pass through the quality validation pipeline:

{{code}}
AKEL Workflow (POC1):

1. Extract claims from article

2. [GATE 1] Validate each claim is fact-checkable
↓ (pass claims only)
3. Generate verdicts for each claim

4. [GATE 4] Validate verdict has sufficient evidence
↓ (pass verdicts only)
5. Display to user

Failed claims/verdicts:
- Store in database with failure reason
- Display explanatory message to user
- Log for quality metrics tracking
{{/code}}

**Updated Verdict States:**

* PUBLISHED - Passed all gates
* INSUFFICIENT_EVIDENCE - Failed Gate 4
* NON_FACTUAL_CLAIM - Failed Gate 1
* PROCESSING - In progress
* ERROR - System failure
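
One way to picture how the workflow steps map onto the verdict states is the sketch below. It is not the AKEL implementation: the gate checks and the verdict generator are passed in as placeholder callables, `process_claim` is a hypothetical name, and storage and logging of failures are left out. `PROCESSING` is listed for completeness but never returned, since the sketch runs synchronously.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class VerdictState(Enum):
    PUBLISHED = "PUBLISHED"
    INSUFFICIENT_EVIDENCE = "INSUFFICIENT_EVIDENCE"
    NON_FACTUAL_CLAIM = "NON_FACTUAL_CLAIM"
    PROCESSING = "PROCESSING"
    ERROR = "ERROR"


def process_claim(
    claim: str,
    passes_gate1: Callable[[str], bool],
    generate_verdict: Callable[[str], dict],
    passes_gate4: Callable[[dict], bool],
) -> Tuple[VerdictState, Optional[dict]]:
    """Run one claim through Gate 1, verdict generation, and Gate 4."""
    try:
        if not passes_gate1(claim):
            # No verdict is generated for non-factual claims.
            return VerdictState.NON_FACTUAL_CLAIM, None
        verdict = generate_verdict(claim)
        if not passes_gate4(verdict):
            # The verdict is returned so the caller can store it with the failure reason.
            return VerdictState.INSUFFICIENT_EVIDENCE, verdict
        return VerdictState.PUBLISHED, verdict
    except Exception:
        return VerdictState.ERROR, None


# Example with stub components standing in for AKEL.
state, verdict = process_claim(
    "The bridge opened in 1932.",
    passes_gate1=lambda c: True,
    generate_verdict=lambda c: {"sources": 1},
    passes_gate4=lambda v: v["sources"] >= 2,
)
print(state.value)
```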

=== 3.3 Modified FR4: Analysis Summary (Enhanced) ===

**Enhancement for POC1:**

Analysis Summary must now display quality metadata:

{{code}}
Analysis Summary:
Total Claims Found: 5
Verifiable Claims: 3
Non-verifiable (Opinion): 1
Non-verifiable (Prediction): 1

Verdicts Generated: 3
High Confidence: 1
Medium Confidence: 2
Insufficient Evidence: 0

Evidence Sources: 12 total
Average Source Quality: 0.73

Quality Score: 8.5/10
{{/code}}
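
The count fields in this summary could be tallied from per-claim results, as in the sketch below. The `summarize` function and the `OPINION` / `PREDICTION` labels are assumptions. The evidence-source figures and the Quality Score are omitted because their calculation is not defined in this section.

```python
from collections import Counter
from typing import Dict, List


def summarize(claim_types: List[str], verdict_tiers: List[str]) -> Dict[str, int]:
    """Tally the count fields of the Analysis Summary.

    claim_types: one label per extracted claim ("FACTUAL", "OPINION", "PREDICTION").
    verdict_tiers: one confidence tier per generated verdict.
    """
    types = Counter(claim_types)
    tiers = Counter(verdict_tiers)
    return {
        "Total Claims Found": len(claim_types),
        "Verifiable Claims": types["FACTUAL"],
        "Non-verifiable (Opinion)": types["OPINION"],
        "Non-verifiable (Prediction)": types["PREDICTION"],
        "Verdicts Generated": len(verdict_tiers),
        "High Confidence": tiers["HIGH"],
        "Medium Confidence": tiers["MEDIUM"],
        "Insufficient Evidence": tiers["INSUFFICIENT"],
    }


# Inputs chosen to match the counts in the example summary above.
summary = summarize(
    ["FACTUAL", "FACTUAL", "FACTUAL", "OPINION", "PREDICTION"],
    ["HIGH", "MEDIUM", "MEDIUM"],
)
for label, value in summary.items():
    print(f"{label}: {value}")
```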

== 4. Success Criteria ==

POC1 is considered **SUCCESSFUL** if:

**✅ Functional:**

* Processes diverse test articles without crashes
* Generates verdicts for all factual claims
* Blocks all non-factual claims (0% pass through)
* Blocks all insufficient-evidence verdicts (0% with <2 sources)

**✅ Quality:**

* Hallucination rate <10% (manual verification)
* 0 verdicts with <2 sources published
* 0 opinion statements published as facts
* Average quality score ≥7.0/10

**✅ Performance:**

* Processing time reasonable for POC demonstration
* Quality gates execute efficiently
* UI displays results clearly

**✅ Learnings:**

* Identified prompt engineering improvements
* Documented AKEL strengths/weaknesses
* Validated threshold values
* Clear path to POC2 defined

== 5. Decision Gates ==

**POC1 → POC2 Decision:**

* **IF** hallucination rate ≥10% → Pause, improve prompts before POC2
* **IF** majority of claims non-processable → Rethink claim extraction approach
* **IF** quality gates too strict (excessive blocking) → Adjust thresholds
* **IF** quality gates too loose (hallucinations pass) → Tighten criteria

**Only proceed to POC2 if all success criteria are met**

== 6. Architecture Notes ==

**POC1 Simplified Architecture:**

{{code}}
User Input → AKEL Processing → Quality Gates → Display
             (claim extraction (Gates 1 & 4)
              + verdicts)
{{/code}}

**vs. Full System (Future):**

{{code}}
Input → Claim Extractor → Scenario Generator → Evidence Linker
      → Verdict Generator → All 4 Gates → Review Queue → Publication
{{/code}}

**POC1 Acceptable Simplifications:**

* Single AKEL call (not multi-component pipeline)
* No scenarios (implicit in verdicts)
* Basic evidence linking
* 2 gates instead of 4
* No review queue

**See:** [[Architecture>>FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] for details

== Related Pages ==

* [[Roadmap Overview>>FactHarbor pre10 V0\.9\.70.Roadmap.WebHome]] - All phases
* [[POC2 Requirements>>FactHarbor pre10 V0\.9\.70.Roadmap.POC2.WebHome]] - Next phase
* [[Requirements>>FactHarbor pre10 V0\.9\.70.Specification.Requirements.WebHome]] - Full system requirements
* [[Architecture>>FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] - System architecture
* [[NFR11 Full Specification>>FactHarbor.Specification.Requirements.WebHome#NFR11]] - Complete quality framework

**Document Status:** ✅ POC1 Specification Complete - Ready for Implementation
**Version:** V0.9.70