Last modified by Robert Schaub on 2025/12/22 13:49

From version 1.4
edited by Robert Schaub
on 2025/12/22 13:49
Change comment: Update document after refactoring.
To version 1.1
edited by Robert Schaub
on 2025/12/22 13:26
Change comment: Imported from XAR

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor pre10 V0\.9\.70.Roadmap.WebHome
1 +Test.FactHarbor.Roadmap.WebHome
Content
... ... @@ -12,12 +12,12 @@
12 12  **Key Innovation:** Complete quality validation pipeline catches all categories of errors
13 13  
14 14  **What We're Proving:**
15 -
16 16  * All 4 quality gates work together effectively
17 17  * Evidence deduplication prevents artificial inflation
18 18  * System maintains quality at larger scale
19 19  * Quality metrics dashboard provides actionable insights
20 20  
20 +
21 21  == 2. New Requirements ==
22 22  
23 23  === 2.1 NFR11: Complete Quality Assurance Framework ===
... ... @@ -29,13 +29,11 @@
29 29  **Purpose:** Ensure AI-linked evidence actually relates to the claim
30 30  
31 31  **Validation Checks:**
32 -
33 33  1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
34 34  2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
35 35  3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)
36 36  
37 37  **Action if Failed:**
38 -
39 39  * Discard irrelevant evidence (don't count it)
40 40  * If <2 relevant evidence items remain → "Insufficient Evidence" verdict
41 41  * Log discarded evidence for quality review
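
A minimal sketch of how the Gate 2 checks and failure actions above could fit together. The thresholds (0.6, 1 shared entity, 0.5, fewer than 2 items) come from the specification; everything else is an assumption made for illustration: the `Evidence` structure, the function names, the precomputed `topic_score`, and the reading that an evidence item must pass all three checks to count as relevant.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Thresholds taken from the specification text.
SIMILARITY_MIN = 0.6
MIN_SHARED_ENTITIES = 1
TOPIC_MIN = 0.5
MIN_RELEVANT_ITEMS = 2


@dataclass
class Evidence:
    """Hypothetical evidence record; field names are illustrative only."""
    text: str
    embedding: List[float]
    entities: Set[str]
    topic_score: float  # assumed to be computed upstream


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_relevant(claim_embedding: List[float], claim_entities: Set[str],
                ev: Evidence) -> bool:
    """Assumption: evidence is relevant only if all three checks pass."""
    return (
        cosine_similarity(claim_embedding, ev.embedding) >= SIMILARITY_MIN
        and len(claim_entities & ev.entities) >= MIN_SHARED_ENTITIES
        and ev.topic_score >= TOPIC_MIN
    )


def gate2(claim_embedding: List[float], claim_entities: Set[str],
          items: List[Evidence]) -> Tuple[List[Evidence], List[Evidence], Optional[str]]:
    """Split evidence into kept and discarded; return a verdict override if too little remains."""
    kept, discarded = [], []
    for ev in items:
        (kept if is_relevant(claim_embedding, claim_entities, ev) else discarded).append(ev)
    # The discarded list is returned so the caller can log it for quality review.
    verdict = "Insufficient Evidence" if len(kept) < MIN_RELEVANT_ITEMS else None
    return kept, discarded, verdict
```

Returning the discarded items instead of dropping them silently keeps the "log for quality review" step in the caller's hands.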
... ... @@ -48,7 +48,6 @@
48 48  **Purpose:** Validate scenarios are logical, complete, and meaningfully different
49 49  
50 50  **Validation Checks:**
51 -
52 52  1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
53 53  2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score <0.3)
54 54  3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity <0.8)
... ... @@ -55,7 +55,6 @@
55 55  4. **Minimum Detail:** At least 1 specific assumption per scenario
56 56  
57 57  **Action if Failed:**
58 -
59 59  * Merge duplicate scenarios
60 60  * Flag contradictory assumptions for review
61 61  * Reduce confidence score by 20%
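
The Gate 3 checks listed above could be sketched roughly as follows. Only the thresholds (0.3, 0.8, 20%) are from the specification. The `Scenario` fields, the precomputed `contradiction_score`, the use of `difflib.SequenceMatcher` as a stand-in similarity measure, and the multiplicative reading of "reduce confidence score by 20%" are all assumptions for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List

# Thresholds taken from the specification text.
CONTRADICTION_MAX = 0.3
SIMILARITY_MAX = 0.8
CONFIDENCE_PENALTY = 0.20


@dataclass
class Scenario:
    """Hypothetical scenario record; field names are illustrative only."""
    assumptions: List[str]
    scope: str
    evidence_context: str
    contradiction_score: float = 0.0  # assumed to come from an upstream check


def failed_checks(s: Scenario) -> List[str]:
    """Return the names of the per-scenario checks that fail."""
    failures = []
    stated = [a for a in s.assumptions if a.strip()]
    if not (stated and s.scope.strip() and s.evidence_context.strip()):
        failures.append("completeness")
    if s.contradiction_score >= CONTRADICTION_MAX:
        failures.append("internal_consistency")
    if len(stated) < 1:
        failures.append("minimum_detail")
    return failures


def scenario_text(s: Scenario) -> str:
    """Flatten a scenario into one lowercase string for comparison."""
    return " ".join(s.assumptions + [s.scope, s.evidence_context]).lower()


def are_duplicates(a: Scenario, b: Scenario) -> bool:
    """Stand-in distinctiveness check: character-level ratio, not embeddings."""
    ratio = SequenceMatcher(None, scenario_text(a), scenario_text(b)).ratio()
    return ratio >= SIMILARITY_MAX


def penalized_confidence(confidence: float, failures: List[str]) -> float:
    """Assumption: the 20% reduction is multiplicative and applies on any failure."""
    return confidence * (1 - CONFIDENCE_PENALTY) if failures else confidence
```

Pairs for which `are_duplicates` returns true would be candidates for merging; the merge itself is left out because the specification does not describe how merged scenarios are combined.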
... ... @@ -72,7 +72,6 @@
72 72  **Purpose:** Prevent counting the same evidence multiple times when cited by different sources
73 73  
74 74  **Problem:**
75 -
76 76  * Wire services (AP, Reuters) redistribute the same content
77 77  * Different sites cite the same original study
78 78  * Aggregators copy primary sources
... ... @@ -79,7 +79,6 @@
79 79  * AKEL might count this as "5 sources" when it's really 1
80 80  
81 81  **Solution: Content Fingerprinting**
82 -
83 83  * Generate SHA-256 hash of normalized text
84 84  * Detect near-duplicates (≥85% similarity) using fuzzy matching
85 85  * Track which sources cited each unique piece of evidence
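
A minimal sketch of the fingerprinting approach described above, assuming a simple normalization (lowercase, punctuation removed, whitespace collapsed) and `difflib.SequenceMatcher` as the fuzzy matcher. Neither choice is prescribed by the specification; the class and method names are hypothetical. The SHA-256 hash and the 85% threshold are from the text.

```python
import hashlib
import re
from difflib import SequenceMatcher
from typing import Dict, Set

NEAR_DUPLICATE_MIN = 0.85  # threshold from the specification


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    without_punct = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", without_punct).strip()


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class EvidenceIndex:
    """Hypothetical index mapping each unique evidence item to its citing sources."""

    def __init__(self) -> None:
        self._texts: Dict[str, str] = {}          # fingerprint -> normalized text
        self._sources: Dict[str, Set[str]] = {}   # fingerprint -> citing sources

    def add(self, text: str, source: str) -> str:
        """Register evidence and return the fingerprint of its unique entry."""
        norm = normalize(text)
        fp = fingerprint(text)
        if fp not in self._texts:
            # No exact match: look for a near-duplicate among known items.
            for known_fp, known_text in self._texts.items():
                if SequenceMatcher(None, norm, known_text).ratio() >= NEAR_DUPLICATE_MIN:
                    fp = known_fp
                    break
            else:
                self._texts[fp] = norm
        self._sources.setdefault(fp, set()).add(source)
        return fp

    def unique_count(self) -> int:
        """Number of distinct evidence items after deduplication."""
        return len(self._texts)

    def sources_for(self, fp: str) -> Set[str]:
        """Sources that cited the evidence item with this fingerprint."""
        return set(self._sources[fp])
```

The exact-hash lookup handles verbatim redistribution cheaply. The linear near-duplicate scan is only suitable for small evidence sets, so a larger system would likely need an approximate index instead.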
... ... @@ -94,7 +94,6 @@
94 94  **Fulfills:** Real-time quality monitoring during development
95 95  
96 96  **Dashboard Metrics:**
97 -
98 98  * Claim processing statistics
99 99  * Gate performance (pass/fail rates for each gate)
100 100  * Evidence quality metrics
... ... @@ -107,7 +107,6 @@
107 107  == 3. Success Criteria ==
108 108  
109 109  **✅ Quality:**
110 -
111 111  * Hallucination rate <5% (target: <3%)
112 112  * Average quality rating ≥8.0/10
113 113  * 0 critical failures (publishable falsehoods)
... ... @@ -114,7 +114,6 @@
114 114  * Gates correctly identify >95% of low-quality outputs
115 115  
116 116  **✅ All 4 Gates Operational:**
117 -
118 118  * Gate 1: Claim validation working
119 119  * Gate 2: Evidence relevance filtering working
120 120  * Gate 3: Scenario coherence checking working
... ... @@ -121,17 +121,16 @@
121 121  * Gate 4: Verdict confidence assessment working
122 122  
123 123  **✅ Evidence Deduplication:**
124 -
125 125  * Duplicate detection >95% accurate
126 126  * Evidence counts reflect reality
127 127  * Provenance tracked correctly
128 128  
129 129  **✅ Metrics Dashboard:**
130 -
131 131  * All metrics implemented and tracking
132 132  * Dashboard functional and useful
133 133  * Alerts trigger appropriately
134 134  
124 +
135 135  == 4. Architecture Notes ==
136 136  
137 137  **POC2 Enhanced Architecture:**
... ... @@ -145,7 +145,6 @@
145 145  {{/code}}
146 146  
147 147  **Key Additions Since POC1:**
148 -
149 149  * Scenario generation component
150 150  * Evidence deduplication system
151 151  * Gates 2 & 3 implementation
... ... @@ -152,7 +152,6 @@
152 152  * Quality metrics collection
153 153  
154 154  **Still Simplified vs. Full System:**
155 -
156 156  * Single AKEL orchestration (not multi-component pipeline)
157 157  * No review queue
158 158  * No federation architecture
... ... @@ -162,10 +162,12 @@
162 162  
163 163  == Related Pages ==
164 164  
165 -* [[POC1>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
166 -* [[Beta 0>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
153 +* [[POC1>>Test.FactHarbor.Roadmap.POC1.WebHome]] - Previous phase
154 +* [[Beta 0>>Test.FactHarbor.Roadmap.Beta0.WebHome]] - Next phase
167 167  * [[Roadmap Overview>>Test.FactHarbor.Roadmap.WebHome]]
168 168  * [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
169 169  
158 +
170 170  **Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
171 171  **Version:** V0.9.70
161 +