Last modified by Robert Schaub on 2025/12/24 11:47

= POC1: Core Workflow with Quality Gates =

**Phase Goal:** Prove AKEL can produce credible, quality outputs without manual intervention

**Success Metric:** <10% hallucination rate, quality gates prevent low-confidence publications

== 1. Overview ==

POC1 validates that the core AKEL workflow (Article → Claims → Verdicts) can produce trustworthy fact-checking analyses automatically. This phase implements **2 critical quality gates** to prevent low-quality outputs from being published.

**Key Innovation:** Quality validation BEFORE publication, not after

**What We're Proving:**

* AKEL can reliably extract factual claims from articles
* AKEL can generate credible verdicts with proper evidence
* **AKEL can assess article credibility beyond simple claim averaging** (context-aware analysis)
* Quality gates prevent hallucinations and low-confidence outputs
* A fully automated approach is viable

== 2. Scope ==

=== In Scope ===

* Core AKEL workflow (claim extraction, verdict generation)
* **Gate 1:** Claim Validation (factual vs. opinion/prediction)
* **Gate 4:** Verdict Confidence Assessment (minimum 2 sources, quality thresholds)
* Basic UI to display results
* Manual quality tracking

=== Out of Scope (Deferred to POC2+) ===

* User accounts / authentication
* Corrections system
* Search engine optimization (ClaimReview schema)
* Image verification
* API endpoints
* Archive.org integration
* Security hardening
* A/B testing
* Gates 2 & 3 (Evidence relevance, Scenario coherence)

=== Experimental Features (POC1) ===

**Context-Aware Analysis** (Approach 1: Single-Pass Holistic)

**Goal:** Test whether the AI can detect when an article's overall credibility differs from the average of its claim verdicts (e.g., accurate facts but a misleading conclusion).

**Implementation:**
* Enhanced AI prompt to evaluate logical structure
* AI identifies the article's main argument
* AI assesses whether the conclusion follows from the evidence
* Article verdict may differ from the claim average

**Testing:**
* 30-article test set (10 straightforward, 10 misleading, 10 complex)
* Success criterion: ≥70% accuracy on misleading articles
* Marked as experimental - doesn't block POC1 success

**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete analysis

**Decision:**
* If ≥70% accuracy → ship in POC2
* If 50% to <70% → try weighted aggregation approach
* If <50% → defer to POC2 with a different approach
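
The decision rule above can be sketched as a small function. This is only an illustration: the function name and the returned labels are hypothetical, and the middle band is assumed to include 50% and exclude 70%, since the specification does not say where the boundaries fall.

```python
def context_analysis_decision(correct: int, total: int) -> str:
    """Map accuracy on the misleading-article subset to a next step.

    Hypothetical helper; boundary handling (50% inclusive, 70% exclusive
    for the middle band) is an assumption, not part of the specification.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = correct / total
    if accuracy >= 0.70:
        return "ship in POC2"
    if accuracy >= 0.50:
        return "try weighted aggregation"
    return "defer to POC2 with a different approach"
```

With the 10 misleading articles in the test set, `context_analysis_decision(7, 10)` would fall into the first branch and `context_analysis_decision(4, 10)` into the last.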

== 3. Requirements ==

=== 3.1 NFR11: Quality Assurance Framework (POC1 Lite Version) ===

**Importance:** CRITICAL - Core POC1 Requirement
**Fulfills:** AI safety, credibility, prevents embarrassing failures

**Specification:**

AKEL must validate outputs before displaying to users. POC1 implements a **2-gate subset** of the full NFR11 framework.

==== Gate 1: Claim Validation ====

**Purpose:** Ensure extracted claims are factual assertions, not opinions or predictions

**Validation Checks:**

1. **Factual Statement Test:** Can this be verified with evidence?
2. **Opinion Detection:** Contains hedging or evaluative language? ("I think", "probably", "best", "worst")
3. **Specificity Score:** Contains concrete details? (names, numbers, dates, locations)
4. **Future Prediction Test:** Makes claims about future events?

**Pass Criteria:**
{{code}}
- isFactual: true
- opinionScore: ≤ 0.3
- specificityScore: ≥ 0.3
- claimType: FACTUAL
{{/code}}

**Action if Failed:**

* Flag as "Non-verifiable: Opinion/Prediction/Ambiguous"
* Do NOT generate scenarios or verdicts
* Display explanation to user

**Target:** 0% opinion statements processed as facts
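
A minimal sketch of the Gate 1 pass criteria as a predicate. The `ClaimAssessment` class and `passes_gate1` function are hypothetical; their fields are snake_case renderings of the criteria names above, and the scores are assumed to be produced upstream by AKEL on a 0-1 scale.

```python
from dataclasses import dataclass


@dataclass
class ClaimAssessment:
    """Hypothetical container for the per-claim values Gate 1 inspects."""
    is_factual: bool
    opinion_score: float      # assumed 0-1, higher means more opinion-like
    specificity_score: float  # assumed 0-1, higher means more concrete detail
    claim_type: str           # e.g. "FACTUAL", "OPINION", "PREDICTION"


def passes_gate1(a: ClaimAssessment) -> bool:
    """Return True only if all four pass criteria hold."""
    return (
        a.is_factual
        and a.opinion_score <= 0.3
        and a.specificity_score >= 0.3
        and a.claim_type == "FACTUAL"
    )
```

A claim that fails any single criterion is rejected, which matches the "Action if Failed" rule of never generating verdicts for flagged claims.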

==== Gate 4: Verdict Confidence Assessment ====

**Purpose:** Only publish verdicts with sufficient evidence and confidence

**Validation Checks:**

1. **Evidence Count:** Minimum 2 independent sources
2. **Source Quality:** Average reliability ≥ 0.6 (on 0-1 scale)
3. **Evidence Agreement:** Share of supporting (vs. contradicting) evidence ≥ 0.6 (i.e. 60%)
4. **Uncertainty Factors:** Count of hedging statements in reasoning

**Confidence Tiers:**
{{code}}
HIGH (80-100%):
- ≥3 sources
- ≥0.7 average quality
- ≥80% agreement

MEDIUM (50-79%):
- ≥2 sources
- ≥0.6 average quality
- ≥60% agreement

LOW (0-49%):
- ≥2 sources BUT quality or agreement below the MEDIUM thresholds

INSUFFICIENT:
- <2 sources → DO NOT PUBLISH
{{/code}}

**POC1 Publication Rule:**

* Minimum **MEDIUM** confidence required
* Blocked verdicts show "Insufficient Evidence" message

**Target:** 0% verdicts published with <2 sources
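
The tier table and publication rule can be read as the following sketch. The function names are hypothetical, agreement is assumed to be expressed as a fraction between 0 and 1, and the "Uncertainty Factors" check is left out because the specification does not say how it influences the tier.

```python
def confidence_tier(source_count: int, avg_quality: float, agreement: float) -> str:
    """Classify a verdict into a confidence tier (illustrative only).

    avg_quality and agreement are assumed to be fractions in [0, 1].
    """
    if source_count < 2:
        return "INSUFFICIENT"
    if source_count >= 3 and avg_quality >= 0.7 and agreement >= 0.8:
        return "HIGH"
    if avg_quality >= 0.6 and agreement >= 0.6:
        return "MEDIUM"
    # At least 2 sources, but quality or agreement misses the MEDIUM bar.
    return "LOW"


def may_publish(tier: str) -> bool:
    """POC1 publication rule: MEDIUM confidence or better."""
    return tier in ("HIGH", "MEDIUM")
```

Checking the source count first guarantees that a verdict with fewer than 2 sources can never reach a publishable tier, regardless of the other values.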

=== 3.2 Modified FR7: Automated Verdicts (Enhanced) ===

**Enhancement for POC1:**

After AKEL generates a verdict, it must pass through the quality validation pipeline:

{{code}}
AKEL Workflow (POC1):

1. Extract claims from article

2. [GATE 1] Validate each claim is fact-checkable
↓ (pass claims only)
3. Generate verdicts for each claim

4. [GATE 4] Validate verdict has sufficient evidence
↓ (pass verdicts only)
5. Display to user

Failed claims/verdicts:
- Store in database with failure reason
- Display explanatory message to user
- Log for quality metrics tracking
{{/code}}

**Updated Verdict States:**

* PUBLISHED - Passed all gates
* INSUFFICIENT_EVIDENCE - Failed Gate 4
* NON_FACTUAL_CLAIM - Failed Gate 1
* PROCESSING - In progress
* ERROR - System failure
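
One way the workflow and verdict states could fit together is sketched below. Everything except the state names is an assumption: `run_pipeline` is a hypothetical function, and the gates and the verdict generator are passed in as plain callables so the sketch stays independent of how AKEL is actually invoked. Storage and logging of failures are omitted.

```python
from enum import Enum
from typing import Callable, Iterable


class VerdictState(Enum):
    PUBLISHED = "PUBLISHED"
    INSUFFICIENT_EVIDENCE = "INSUFFICIENT_EVIDENCE"
    NON_FACTUAL_CLAIM = "NON_FACTUAL_CLAIM"
    PROCESSING = "PROCESSING"  # transient state, not produced by this sketch
    ERROR = "ERROR"


def run_pipeline(
    claims: Iterable[str],
    gate1: Callable[[str], bool],
    generate_verdict: Callable[[str], dict],
    gate4: Callable[[dict], bool],
) -> list[tuple[str, VerdictState]]:
    """Run each extracted claim through Gate 1, verdict generation, and Gate 4."""
    results = []
    for claim in claims:
        try:
            if not gate1(claim):
                # Failed Gate 1: no verdict is generated for this claim.
                results.append((claim, VerdictState.NON_FACTUAL_CLAIM))
                continue
            verdict = generate_verdict(claim)
            state = (
                VerdictState.PUBLISHED
                if gate4(verdict)
                else VerdictState.INSUFFICIENT_EVIDENCE
            )
            results.append((claim, state))
        except Exception:
            # Any failure in a gate or the generator maps to ERROR.
            results.append((claim, VerdictState.ERROR))
    return results
```

Passing the gates as callables keeps the control flow testable with stubs before any real model call exists.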

=== 3.3 Modified FR4: Analysis Summary (Enhanced) ===

**Enhancement for POC1:**

Analysis Summary must now display quality metadata:

{{code}}
Analysis Summary:
  Total Claims Found: 5
    Verifiable Claims: 3
    Non-verifiable (Opinion): 1
    Non-verifiable (Prediction): 1

  Verdicts Generated: 3
    High Confidence: 1
    Medium Confidence: 2
    Insufficient Evidence: 0

  Evidence Sources: 12 total
  Average Source Quality: 0.73

  Quality Score: 8.5/10
{{/code}}
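
The count fields of such a summary could be derived from per-claim results roughly as follows. This is a sketch under assumptions: `build_summary` and its dictionary keys are hypothetical, claim types are assumed to be the strings "FACTUAL", "OPINION", and "PREDICTION", and the Quality Score is omitted because the specification does not define how it is computed.

```python
from collections import Counter


def build_summary(claim_types: list[str], verdict_tiers: list[str]) -> dict[str, int]:
    """Aggregate claim types and verdict tiers into summary counts."""
    types = Counter(claim_types)
    tiers = Counter(verdict_tiers)
    return {
        "total_claims": len(claim_types),
        "verifiable": types["FACTUAL"],
        "non_verifiable_opinion": types["OPINION"],
        "non_verifiable_prediction": types["PREDICTION"],
        "verdicts_generated": len(verdict_tiers),
        "high_confidence": tiers["HIGH"],
        "medium_confidence": tiers["MEDIUM"],
        "insufficient_evidence": tiers["INSUFFICIENT"],
    }
```

Feeding it three factual claims, one opinion, and one prediction, with tiers HIGH, MEDIUM, MEDIUM, reproduces the counts in the example summary above.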

== 4. Success Criteria ==

POC1 is considered **SUCCESSFUL** if:

**✅ Functional:**

* Processes diverse test articles without crashes
* Generates verdicts for all factual claims
* Blocks all non-factual claims (0% pass through)
* Blocks all insufficient-evidence verdicts (0% with <2 sources)

**✅ Quality:**

* Hallucination rate <10% (manual verification)
* 0 verdicts with <2 sources published
* 0 opinion statements published as facts
* Average quality score ≥7.0/10
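
The four quality criteria can be combined into a single check, sketched below. The function and parameter names are hypothetical, and the hallucination rate is assumed to be the number of hallucinated verdicts divided by the number of manually reviewed verdicts, a denominator the specification does not spell out.

```python
def quality_criteria_met(
    hallucinated: int,
    reviewed: int,
    published_under_two_sources: int,
    opinions_published_as_facts: int,
    avg_quality_score: float,
) -> bool:
    """Return True if all four quality criteria hold (illustrative only)."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    # hallucinated / reviewed < 0.10, written with integers to avoid
    # floating-point edge cases at exactly 10%.
    rate_ok = hallucinated * 10 < reviewed
    return (
        rate_ok
        and published_under_two_sources == 0
        and opinions_published_as_facts == 0
        and avg_quality_score >= 7.0
    )
```

Note that a rate of exactly 10% fails the check, because the criterion is a strict "<10%".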

**✅ Performance:**

* Processing time reasonable for POC demonstration
* Quality gates execute efficiently
* UI displays results clearly

**✅ Learnings:**

* Identified prompt engineering improvements
* Documented AKEL strengths/weaknesses
* Validated threshold values
* Clear path to POC2 defined

== 5. Decision Gates ==

**POC1 → POC2 Decision:**

* **IF** hallucination rate ≥10% → Pause, improve prompts before POC2
* **IF** majority of claims non-processable → Rethink claim extraction approach
* **IF** quality gates too strict (excessive blocking) → Adjust thresholds
* **IF** quality gates too loose (hallucinations pass) → Tighten criteria

**Only proceed to POC2 if all success criteria are met**

== 6. Architecture Notes ==

**POC1 Simplified Architecture:**

{{code}}
User Input → AKEL Processing → Quality Gates → Display
             (claim extraction (Gates 1 & 4)
              + verdicts)
{{/code}}

**vs. Full System (Future):**

{{code}}
Input → Claim Extractor → Scenario Generator → Evidence Linker
  → Verdict Generator → All 4 Gates → Review Queue → Publication
{{/code}}

**POC1 Acceptable Simplifications:**

* Single AKEL call (not multi-component pipeline)
* No scenarios (implicit in verdicts)
* Basic evidence linking
* 2 gates instead of 4
* No review queue

**See:** [[Architecture>>Test.FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] for details

== Related Pages ==

* [[Roadmap Overview>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.WebHome]] - All phases
* [[POC2 Requirements>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.POC2.WebHome]] - Next phase
* [[Requirements>>Test.FactHarbor pre10 V0\.9\.70.Specification.Requirements.WebHome]] - Full system requirements
* [[Architecture>>Test.FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] - System architecture
* [[NFR11 Full Specification>>Test.FactHarbor.Specification.Requirements.WebHome#NFR11]] - Complete quality framework

**Document Status:** ✅ POC1 Specification Complete - Ready for Implementation
**Version:** V0.9.70