Last modified by Robert Schaub on 2025/12/23 11:02

= POC2: Robust Quality & Reliability =

**Phase Goal:** Prove that AKEL produces high-quality outputs consistently at scale

**Success Metric:** <5% hallucination rate, with all 4 quality gates operational

== 1. Overview ==

POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.

**Key Innovation:** A complete quality validation pipeline that catches all categories of errors

**What We're Proving:**

* All 4 quality gates work together effectively
* Evidence deduplication prevents artificial inflation of evidence counts
* The system maintains quality at a larger scale
* The quality metrics dashboard provides actionable insights

== 2. New Requirements ==

=== 2.1 NFR11: Complete Quality Assurance Framework ===

**Add Gates 2 & 3** (POC1 implemented only Gates 1 & 4)

==== Gate 2: Evidence Relevance Validation ====

**Purpose:** Ensure that AI-linked evidence actually relates to the claim

**Validation Checks:**

1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
2. **Entity Overlap:** At least 1 named entity shared between claim and evidence
3. **Topic Relevance:** Evidence discusses the claim's subject matter (topic relevance score ≥ 0.5)

**Action if Failed:**

* Discard irrelevant evidence (do not count it)
* If fewer than 2 relevant evidence items remain, return an "Insufficient Evidence" verdict
* Log discarded evidence for quality review

**Target:** 0% of cited evidence is off-topic
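
The three Gate 2 checks and the resulting actions can be sketched as a simple filter. This is a minimal Python sketch, not AKEL's implementation: it assumes that embeddings, named entities, and a topic relevance score are produced upstream, and the `Evidence` record and the function names are hypothetical. Only the thresholds come from the checks above.

{{code language="python"}}
import math
from dataclasses import dataclass

# Thresholds from the Gate 2 validation checks.
MIN_COSINE = 0.6
MIN_SHARED_ENTITIES = 1
MIN_TOPIC_SCORE = 0.5
MIN_RELEVANT_ITEMS = 2  # fewer than this -> "Insufficient Evidence"


@dataclass
class Evidence:
    # Hypothetical record; the fields are assumed outputs of an embedding
    # model, a named-entity recognizer, and a topic classifier.
    source_id: str
    embedding: list[float]
    entities: set[str]
    topic_score: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_relevant(claim_embedding: list[float], claim_entities: set[str], ev: Evidence) -> bool:
    # All three checks must pass for the evidence to count.
    return (
        cosine(claim_embedding, ev.embedding) >= MIN_COSINE
        and len(claim_entities & ev.entities) >= MIN_SHARED_ENTITIES
        and ev.topic_score >= MIN_TOPIC_SCORE
    )


def gate2(claim_embedding: list[float], claim_entities: set[str], evidence: list[Evidence]):
    """Split evidence into relevant and discarded items.

    Returns (relevant, discarded, insufficient); the caller logs `discarded`
    for quality review and issues an "Insufficient Evidence" verdict when
    `insufficient` is True.
    """
    relevant, discarded = [], []
    for ev in evidence:
        if is_relevant(claim_embedding, claim_entities, ev):
            relevant.append(ev)
        else:
            discarded.append(ev)
    return relevant, discarded, len(relevant) < MIN_RELEVANT_ITEMS
{{/code}}

Returning the discarded items rather than dropping them silently leaves the caller free to log them for quality review.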

==== Gate 3: Scenario Coherence Check ====

**Purpose:** Validate that scenarios are logical, complete, and meaningfully different

**Validation Checks:**

1. **Completeness:** All required fields are populated (assumptions, scope, evidence context)
2. **Internal Consistency:** Assumptions do not contradict each other (contradiction score <0.3)
3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity <0.8)
4. **Minimum Detail:** At least 1 specific assumption per scenario

**Action if Failed:**

* Merge duplicate scenarios
* Flag contradictory assumptions for review
* Reduce the confidence score by 20%
* Do not publish if fewer than 2 distinct scenarios remain

**Target:** 0% duplicate scenarios; all scenarios internally consistent
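
Gate 3 can be sketched the same way. Several details are assumptions rather than part of the specification: the `contradiction_score` is taken as given from some upstream model, Jaccard overlap of assumption sets stands in for the real similarity measure, incomplete scenarios are simply excluded, and the 20% reduction is read as multiplicative (confidence × 0.8), applied once per check run. All names are hypothetical.

{{code language="python"}}
from dataclasses import dataclass, replace

# Thresholds and actions from the Gate 3 checks.
MAX_CONTRADICTION = 0.3    # internal consistency requires a score below this
MAX_SIMILARITY = 0.8       # distinctiveness requires pairwise similarity below this
CONFIDENCE_PENALTY = 0.20  # assumed multiplicative: confidence * 0.8
MIN_DISTINCT = 2           # fewer distinct scenarios -> do not publish


@dataclass
class Scenario:
    name: str
    assumptions: list[str]
    scope: str
    evidence_context: str
    contradiction_score: float = 0.0  # hypothetical: from an upstream model


def is_complete(s: Scenario) -> bool:
    # Completeness and minimum detail: every field set, at least 1 assumption.
    return bool(s.assumptions) and bool(s.scope) and bool(s.evidence_context)


def similarity(a: Scenario, b: Scenario) -> float:
    # Stand-in measure: Jaccard overlap of the two assumption sets.
    sa, sb = set(a.assumptions), set(b.assumptions)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def gate3(scenarios: list[Scenario], confidence: float):
    """Return (distinct scenarios, flagged names, adjusted confidence, publishable)."""
    failed = False
    distinct: list[Scenario] = []
    for s in scenarios:
        if not is_complete(s):
            failed = True  # assumption: incomplete scenarios are excluded
            continue
        dup = next((d for d in distinct if similarity(s, d) >= MAX_SIMILARITY), None)
        if dup is None:
            # Copy so that merging never mutates the caller's objects.
            distinct.append(replace(s, assumptions=list(s.assumptions)))
        else:
            # Merge duplicate: keep the first scenario, absorb any new assumptions.
            dup.assumptions += [a for a in s.assumptions if a not in dup.assumptions]
            failed = True
    # Contradictory assumptions are flagged for review, not removed.
    flagged = [d.name for d in distinct if d.contradiction_score >= MAX_CONTRADICTION]
    if flagged:
        failed = True
    if failed:
        confidence *= 1 - CONFIDENCE_PENALTY
    return distinct, flagged, confidence, len(distinct) >= MIN_DISTINCT
{{/code}}

Merged duplicates and flagged contradictions both count as failures here, so either one triggers the confidence penalty; whether the real gate penalizes once or per failure is not specified above.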

=== 2.2 FR54: Evidence Deduplication (NEW) ===

**Priority:** HIGH
**Fulfills:** Accurate evidence counting; prevents artificial inflation

**Purpose:** Prevent the same evidence from being counted multiple times when it is cited by different sources

**Problem:**

* Wire services (AP, Reuters) redistribute the same content
* Different sites cite the same original study
* Aggregators copy primary sources
* AKEL might count such cases as "5 sources" when there is really only 1

**Solution: Content Fingerprinting**

* Generate a SHA-256 hash of the normalized text (catches exact duplicates)
* Detect near-duplicates (≥85% similarity) using fuzzy matching
* Track which sources cited each unique piece of evidence
* Display the provenance chain to the user

**Target:** Duplicate detection >95% accurate; evidence counts reflect reality
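
Content fingerprinting can be sketched with the Python standard library: `hashlib.sha256` for the exact fingerprint and `difflib.SequenceMatcher` as one possible fuzzy matcher. The normalization rules, the record layout, and the function names are assumptions for illustration.

{{code language="python"}}
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUPLICATE_RATIO = 0.85  # ≥85% similarity counts as the same evidence


def normalize(text: str) -> str:
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def fingerprint(normalized: str) -> str:
    """SHA-256 hex digest of already-normalized text."""
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items: list[tuple[str, str]]) -> list[dict]:
    """Group (source, text) pairs into unique evidence entries with provenance."""
    unique: list[dict] = []
    by_fingerprint: dict[str, dict] = {}
    for source, text in items:
        norm = normalize(text)
        digest = fingerprint(norm)
        entry = by_fingerprint.get(digest)
        if entry is None:
            # No exact match: fall back to fuzzy matching against known entries.
            entry = next(
                (u for u in unique
                 if SequenceMatcher(None, norm, u["normalized"]).ratio() >= NEAR_DUPLICATE_RATIO),
                None,
            )
        if entry is None:
            entry = {"fingerprint": digest, "normalized": norm, "sources": []}
            unique.append(entry)
        by_fingerprint[digest] = entry
        entry["sources"].append(source)  # provenance chain shown to the user
    return unique
{{/code}}

Exact matches are caught cheaply through the fingerprint lookup; the fuzzy fallback compares each new item with every known entry, which is quadratic and fine for a POC but would likely need a locality-sensitive technique such as MinHash or SimHash at larger scale.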

=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===

**Priority:** HIGH
**Fulfills:** Real-time quality monitoring during development

**Dashboard Metrics:**

* Claim processing statistics
* Gate performance (pass/fail rates for each gate)
* Evidence quality metrics
* Hallucination rate tracking
* Processing performance

**Target:** Dashboard functional, all metrics tracked, data exportable
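
A dashboard of this kind needs a collector underneath it. The hypothetical sketch below aggregates per-claim results into claim counts, per-gate pass rates, hallucination rate, and average processing time, and exports them as JSON. It assumes the hallucination flag comes from human review of each output, and it leaves evidence quality metrics out.

{{code language="python"}}
import json
from collections import Counter, defaultdict


class QualityMetrics:
    """Hypothetical in-memory collector behind the internal dashboard."""

    def __init__(self) -> None:
        self.claims_processed = 0
        self.gate_results = defaultdict(Counter)  # gate number -> {"pass": n, "fail": n}
        self.hallucinations = 0
        self.latencies_ms: list[float] = []

    def record_claim(self, gate_passed: dict[int, bool], hallucinated: bool, latency_ms: float) -> None:
        """Record one processed claim: per-gate outcome, review result, timing."""
        self.claims_processed += 1
        for gate, passed in gate_passed.items():
            self.gate_results[gate]["pass" if passed else "fail"] += 1
        self.hallucinations += int(hallucinated)
        self.latencies_ms.append(latency_ms)

    def snapshot(self) -> dict:
        n = self.claims_processed
        return {
            "claims_processed": n,
            "gate_pass_rates": {
                gate: c["pass"] / (c["pass"] + c["fail"])
                for gate, c in sorted(self.gate_results.items())
            },
            "hallucination_rate": self.hallucinations / n if n else 0.0,
            "avg_latency_ms": sum(self.latencies_ms) / n if n else 0.0,
        }

    def export_json(self) -> str:
        # JSON turns the integer gate keys into strings ("1", "2", ...).
        return json.dumps(self.snapshot(), indent=2)
{{/code}}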

== 3. Success Criteria ==

**✅ Quality:**

* Hallucination rate <5% (stretch target: <3%)
* Average quality rating ≥8.0/10
* 0 critical failures (publishable falsehoods)
* Gates correctly identify >95% of low-quality outputs

**✅ All 4 Gates Operational:**

* Gate 1: Claim validation working
* Gate 2: Evidence relevance filtering working
* Gate 3: Scenario coherence checking working
* Gate 4: Verdict confidence assessment working

**✅ Evidence Deduplication:**

* Duplicate detection >95% accurate
* Evidence counts reflect reality
* Provenance tracked correctly

**✅ Metrics Dashboard:**

* All metrics implemented and tracking
* Dashboard functional and providing actionable insights
* Alerts trigger appropriately
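
The numeric criteria above can be written down as an explicit check so that the POC2 exit decision is reproducible. A minimal sketch; the `Poc2Results` fields and function names are hypothetical and assume rates are stored as fractions (0.05 = 5%), with the <3% figure treated as a stretch target rather than a hard requirement.

{{code language="python"}}
from dataclasses import dataclass


@dataclass
class Poc2Results:
    # Hypothetical summary of a POC2 evaluation run; rates are fractions.
    hallucination_rate: float
    avg_quality_rating: float   # 0-10 scale
    critical_failures: int      # publishable falsehoods
    gate_detection_rate: float  # share of low-quality outputs the gates caught
    dedup_accuracy: float


def check_success(r: Poc2Results) -> dict[str, bool]:
    """Evaluate each numeric success criterion individually."""
    return {
        "hallucination rate < 5%": r.hallucination_rate < 0.05,
        "hallucination rate < 3% (target)": r.hallucination_rate < 0.03,
        "average quality rating >= 8.0": r.avg_quality_rating >= 8.0,
        "zero critical failures": r.critical_failures == 0,
        "gates catch > 95% of low-quality outputs": r.gate_detection_rate > 0.95,
        "duplicate detection > 95% accurate": r.dedup_accuracy > 0.95,
    }


def poc2_passed(r: Poc2Results) -> bool:
    # The <3% figure is treated as a stretch target, not a hard requirement.
    return all(ok for name, ok in check_success(r).items() if "(target)" not in name)
{{/code}}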

== 4. Architecture Notes ==

**POC2 Enhanced Architecture:**

{{code}}
Input → AKEL Processing       → All 4 Quality Gates → Display
        (claims + scenarios    (1: Claim validation
         + evidence linking     2: Evidence relevance
         + verdicts)            3: Scenario coherence
                                4: Verdict confidence)
{{/code}}
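
Read as code, the diagram runs AKEL's output through the gates in a fixed order. The sketch below is hypothetical: `AkelOutput`, both gate bodies, and the 0.5 confidence floor are placeholders rather than specified behavior, and Gates 2 and 3 would plug in as further functions of the same `Gate` type.

{{code language="python"}}
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AkelOutput:
    """Hypothetical bundle of what AKEL produces for one input article."""
    claims: list[str]
    scenarios: list[str]
    evidence: list[str]
    verdict: str
    confidence: float
    issues: list[str] = field(default_factory=list)


Gate = Callable[[AkelOutput], AkelOutput]


def claim_validation(o: AkelOutput) -> AkelOutput:
    # Gate 1 placeholder: drop blank claims and record if none remain.
    o.claims = [c for c in o.claims if c.strip()]
    if not o.claims:
        o.issues.append("gate1: no valid claims")
    return o


def verdict_confidence(o: AkelOutput, min_confidence: float = 0.5) -> AkelOutput:
    # Gate 4 placeholder: the 0.5 floor is an assumed value, not from the spec.
    if o.confidence < min_confidence:
        o.issues.append("gate4: low verdict confidence")
    return o


def run_gates(o: AkelOutput, gates: list[Gate]) -> AkelOutput:
    """Apply each gate in order; each may filter content or record issues."""
    for gate in gates:
        o = gate(o)
    return o
{{/code}}

Because every gate shares one signature, the gate order lives in a single list, which fits POC2's single-orchestration simplification.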

**Key Additions Beyond POC1:**

* Scenario generation component
* Evidence deduplication system
* Gates 2 & 3 implementation
* Quality metrics collection

**Still Simplified vs. Full System:**

* Single AKEL orchestration (not a multi-component pipeline)
* No review queue
* No federation architecture

**See:** [[Architecture>>Test.FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]] for details

== Related Pages ==

* [[POC1>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
* [[Beta 0>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
* [[Roadmap Overview>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.WebHome]]
* [[Architecture>>Test.FactHarbor pre10 V0\.9\.70.Specification.Architecture.WebHome]]

**Document Status:** ✅ POC2 Specification Complete - Awaiting POC1 Completion
**Version:** V0.9.70