= POC2: Robust Quality & Reliability =

**Phase Goal:** Prove AKEL produces high-quality outputs consistently at scale

**Success Metric:** <5% hallucination rate, all 4 quality gates operational

---

== 1. Overview ==

POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.

**Key Innovation:** Complete quality validation pipeline catches all categories of errors

**What We're Proving:**
* All 4 quality gates work together effectively
* Evidence deduplication prevents artificial inflation
* System maintains quality at larger scale
* Quality metrics dashboard provides actionable insights

---

== 2. New Requirements ==

=== 2.1 NFR11: Complete Quality Assurance Framework ===

**Add Gates 2 & 3** (POC1 had only Gates 1 & 4)

==== Gate 2: Evidence Relevance Validation ====

**Purpose:** Ensure AI-linked evidence actually relates to the claim

**Validation Checks:**
1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)

**Action if Failed:**
* Discard irrelevant evidence (don't count it)
* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
* Log discarded evidence for quality review

**Target:** 0% of evidence cited is off-topic
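
The checks and thresholds above could be combined into one filtering step. A minimal sketch in Python, assuming the three thresholds from the list; embed(), extract_entities() and topic_score() below are deliberately crude placeholders (hashed bag-of-words, capitalised tokens, token overlap) standing in for whatever embedding, NER and topic models AKEL actually uses:

{{code language="python"}}
# Gate 2 sketch: keep only evidence that is relevant to the claim.
# Only the threshold logic mirrors the validation checks listed above;
# the helper "models" are simplified stand-ins.
from dataclasses import dataclass
from typing import List, Set
import numpy as np

SIM_THRESHOLD = 0.6     # cosine similarity claim <-> evidence
TOPIC_THRESHOLD = 0.5   # topic-relevance score
MIN_EVIDENCE = 2        # fewer relevant items -> "Insufficient Evidence"

def embed(text: str) -> np.ndarray:
    # Placeholder: hashed bag-of-words instead of a real sentence embedder.
    vec = np.zeros(64)
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    return vec

def extract_entities(text: str) -> Set[str]:
    # Placeholder NER: capitalised tokens stand in for named entities.
    return {t.strip(".,;:") for t in text.split() if t[:1].isupper()}

def topic_score(claim: str, evidence: str) -> float:
    # Placeholder topic relevance: token overlap (Jaccard).
    a, b = set(claim.lower().split()), set(evidence.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

@dataclass
class Gate2Result:
    relevant: List[str]
    discarded: List[str]     # logged for quality review
    verdict_hint: str        # "ok" or "Insufficient Evidence"

def gate2_evidence_relevance(claim: str, evidence_items: List[str]) -> Gate2Result:
    claim_vec, claim_entities = embed(claim), extract_entities(claim)
    relevant, discarded = [], []
    for ev in evidence_items:
        passes = (
            cosine(claim_vec, embed(ev)) >= SIM_THRESHOLD
            and bool(claim_entities & extract_entities(ev))
            and topic_score(claim, ev) >= TOPIC_THRESHOLD
        )
        (relevant if passes else discarded).append(ev)
    hint = "ok" if len(relevant) >= MIN_EVIDENCE else "Insufficient Evidence"
    return Gate2Result(relevant, discarded, hint)
{{/code}}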

---

==== Gate 3: Scenario Coherence Check ====

**Purpose:** Validate that scenarios are logical, complete, and meaningfully different

**Validation Checks:**
1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score < 0.3)
3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity < 0.8)
4. **Minimum Detail:** At least 1 specific assumption per scenario

**Action if Failed:**
* Merge duplicate scenarios
* Flag contradictory assumptions for review
* Reduce confidence score by 20%
* Do not publish if <2 distinct scenarios

**Target:** 0% duplicate scenarios, all scenarios internally consistent
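
A companion sketch for Gate 3, assuming a Scenario record with the fields named above; the similarity() placeholder is plain token overlap, and contradiction detection between assumptions is left out because it depends on the underlying model:

{{code language="python"}}
# Gate 3 sketch: completeness, distinctiveness and minimum-detail checks.
# Field names and the merge strategy are assumptions, not the real AKEL schema.
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_SIMILARITY = 0.8      # at or above this, two scenarios count as duplicates
CONFIDENCE_PENALTY = 0.8  # "reduce confidence score by 20%"

@dataclass
class Scenario:
    assumptions: List[str] = field(default_factory=list)
    scope: str = ""
    evidence_context: str = ""

def similarity(a: Scenario, b: Scenario) -> float:
    # Placeholder: token overlap between the scenarios' assumption texts.
    ta = set(" ".join(a.assumptions).lower().split())
    tb = set(" ".join(b.assumptions).lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def gate3_scenario_coherence(scenarios: List[Scenario],
                             confidence: float) -> Tuple[List[Scenario], float, bool]:
    # Completeness + minimum detail: every field populated, >= 1 assumption.
    complete = [s for s in scenarios
                if s.assumptions and s.scope and s.evidence_context]
    # Distinctiveness: merge near-duplicates by keeping the first of each pair.
    distinct: List[Scenario] = []
    for s in complete:
        if all(similarity(s, kept) < MAX_SIMILARITY for kept in distinct):
            distinct.append(s)
    if len(distinct) < len(scenarios):
        confidence *= CONFIDENCE_PENALTY      # something failed the gate
    publishable = len(distinct) >= 2          # "do not publish if <2 distinct scenarios"
    return distinct, confidence, publishable
{{/code}}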

---

=== 2.2 FR54: Evidence Deduplication (NEW) ===

**Priority:** HIGH
**Fulfills:** Accurate evidence counting, prevents artificial inflation

**Purpose:** Prevent counting the same evidence multiple times when cited by different sources

**Problem:**
* Wire services (AP, Reuters) redistribute the same content
* Different sites cite the same original study
* Aggregators copy primary sources
* AKEL might count this as "5 sources" when it's really 1

**Solution: Content Fingerprinting**
* Generate SHA-256 hash of normalized text
* Detect near-duplicates (≥85% similarity) using fuzzy matching
* Track which sources cited each unique piece of evidence
* Display provenance chain to user

**Target:** Duplicate detection >95% accurate, evidence counts reflect reality
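
A minimal sketch of the fingerprinting idea, assuming SHA-256 over normalized text for exact duplicates and difflib's SequenceMatcher as a stand-in for the real fuzzy matcher at the 85% threshold:

{{code language="python"}}
# FR54 sketch: content fingerprinting for evidence deduplication.
# Exact duplicates collapse onto the same SHA-256 fingerprint; near-duplicates
# are folded into an existing record when similarity reaches the threshold.
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUP_THRESHOLD = 0.85

def normalize(text: str) -> str:
    # Lowercase, collapse whitespace, strip punctuation so trivial edits hash identically.
    return re.sub(r"[^\w\s]", "", re.sub(r"\s+", " ", text.strip().lower()))

def fingerprint(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

class EvidenceStore:
    """One record per unique piece of evidence, plus its provenance chain."""

    def __init__(self):
        self.records = {}   # fingerprint -> {"text": ..., "sources": [...]}

    def add(self, text: str, source: str) -> str:
        fp = fingerprint(text)
        if fp not in self.records:
            norm = normalize(text)
            # Near-duplicate check against evidence already stored.
            for existing_fp, rec in self.records.items():
                if SequenceMatcher(None, norm, normalize(rec["text"])).ratio() >= NEAR_DUP_THRESHOLD:
                    fp = existing_fp          # fold into the existing record
                    break
            else:
                self.records[fp] = {"text": text, "sources": []}
        self.records[fp]["sources"].append(source)    # provenance chain
        return fp

    def unique_count(self) -> int:
        return len(self.records)
{{/code}}

Five wire copies of the same AP story would then collapse onto a single record whose provenance chain lists all five citing sources, so the evidence count stays at 1.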

---

=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===

**Priority:** HIGH
**Fulfills:** Real-time quality monitoring during development

**Dashboard Metrics:**
* Claim processing statistics
* Gate performance (pass/fail rates for each gate)
* Evidence quality metrics
* Hallucination rate tracking
* Processing performance

**Target:** Dashboard functional, all metrics tracked, exportable
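
A sketch of what the collector behind such a dashboard might look like; the metric names follow the list above, while the per-claim granularity, the alert threshold and the JSON export format are assumptions:

{{code language="python"}}
# NFR13 sketch: an internal metrics collector feeding the dashboard.
import json
import time
from collections import defaultdict

class QualityMetrics:
    def __init__(self):
        self.claims_processed = 0
        self.gate_results = defaultdict(lambda: {"pass": 0, "fail": 0})  # per gate
        self.hallucinations_found = 0
        self.processing_seconds = []

    def record_claim(self, duration_s: float, hallucinated: bool = False):
        self.claims_processed += 1
        self.processing_seconds.append(duration_s)
        if hallucinated:
            self.hallucinations_found += 1

    def record_gate(self, gate: str, passed: bool):
        self.gate_results[gate]["pass" if passed else "fail"] += 1

    def snapshot(self) -> dict:
        total = max(self.claims_processed, 1)
        rate = self.hallucinations_found / total
        return {
            "timestamp": time.time(),
            "claims_processed": self.claims_processed,
            "hallucination_rate": rate,
            "hallucination_alert": rate >= 0.05,   # success metric: <5%
            "avg_processing_seconds": sum(self.processing_seconds) / total,
            "gate_pass_fail": {g: dict(c) for g, c in self.gate_results.items()},
        }

    def export(self, path: str):
        # "Exportable": dump the current snapshot as JSON for external tooling.
        with open(path, "w") as fh:
            json.dump(self.snapshot(), fh, indent=2)
{{/code}}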

---

== 3. Success Criteria ==

**✅ Quality:**
* Hallucination rate <5% (target: <3%)
* Average quality rating ≥8.0/10
* 0 critical failures (false claims reaching publication)
* Gates correctly identify >95% of low-quality outputs

**✅ All 4 Gates Operational:**
* Gate 1: Claim validation working
* Gate 2: Evidence relevance filtering working
* Gate 3: Scenario coherence checking working
* Gate 4: Verdict confidence assessment working

**✅ Evidence Deduplication:**
* Duplicate detection >95% accurate
* Evidence counts reflect reality
* Provenance tracked correctly

**✅ Metrics Dashboard:**
* All metrics implemented and tracking
* Dashboard functional and useful
* Alerts trigger appropriately

---

== 4. Architecture Notes ==

**POC2 Enhanced Architecture:**

{{code}}
Input → AKEL Processing         → All 4 Quality Gates      → Display
        (claims + scenarios        (1: Claim validation
         + evidence linking         2: Evidence relevance
         + verdicts)                3: Scenario coherence
                                    4: Verdict confidence)
{{/code}}
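
The same flow, expressed as a sketch of the single orchestration step; the gate interface (an analysis object in, an analysis object out, with a publishable flag) is an assumption about how the gates above would be wired together, not the actual AKEL API:

{{code language="python"}}
# Sketch of the POC2 flow: one AKEL pass, then the four gates in order.
# Gates are passed in as callables so the sketch stays independent of their real names.
from typing import Any, Callable, Dict, List

Analysis = Dict[str, Any]   # assumed shape: claims, scenarios, evidence, verdicts, publishable

def run_pipeline(article: str,
                 akel_process: Callable[[str], Analysis],
                 gates: List[Callable[[Analysis], Analysis]],
                 display: Callable[[Analysis], None]) -> Analysis:
    analysis = akel_process(article)      # claims + scenarios + evidence + verdicts
    for gate in gates:                    # Gate 1 .. Gate 4, applied in sequence
        analysis = gate(analysis)
        if not analysis.get("publishable", True):
            break                         # a hard failure stops further processing
    if analysis.get("publishable", True):
        display(analysis)                 # only gated output reaches the UI
    return analysis
{{/code}}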

**Key Additions over POC1:**
* Scenario generation component
* Evidence deduplication system
* Gates 2 & 3 implementation
* Quality metrics collection

**Still Simplified vs. Full System:**
* Single AKEL orchestration (not multi-component pipeline)
* No review queue
* No federation architecture

**See:** [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]] for details

---

== Related Pages ==

* [[POC1>>Test.FactHarbor.Roadmap.POC1.WebHome]] - Previous phase
* [[Beta 0>>Test.FactHarbor.Roadmap.Beta0.WebHome]] - Next phase
* [[Roadmap Overview>>Test.FactHarbor.Roadmap.WebHome]]
* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]

---

**Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
**Version:** V0.9.70