Last modified by Robert Schaub on 2025/12/22 13:49

From version 1.1
edited by Robert Schaub
on 2025/12/22 13:26
Change comment: Imported from XAR
To version 1.4
edited by Robert Schaub
on 2025/12/22 13:49
Change comment: Update document after refactoring.

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor.Roadmap.WebHome
1 +Test.FactHarbor pre10 V0\.9\.70.Roadmap.WebHome
Content
... ... @@ -12,12 +12,12 @@
12 12  **Key Innovation:** Complete quality validation pipeline catches all categories of errors
13 13  
14 14  **What We're Proving:**
15 +
15 15  * All 4 quality gates work together effectively
16 16  * Evidence deduplication prevents artificial inflation
17 17  * System maintains quality at larger scale
18 18  * Quality metrics dashboard provides actionable insights
19 19  
20 -
21 21  == 2. New Requirements ==
22 22  
23 23  === 2.1 NFR11: Complete Quality Assurance Framework ===
... ... @@ -29,11 +29,13 @@
29 29  **Purpose:** Ensure AI-linked evidence actually relates to the claim
30 30  
31 31  **Validation Checks:**
32 +
32 32  1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
33 33  2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
34 34  3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)
35 35  
36 36  **Action if Failed:**
38 +
37 37  * Discard irrelevant evidence (don't count it)
38 38  * If fewer than 2 relevant evidence items remain → "Insufficient Evidence" verdict
39 39  * Log discarded evidence for quality review
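The Gate 2 checks and failure actions above can be sketched as follows. This is a minimal illustration, not the AKEL implementation: the `Evidence` record, its field names, and `run_gate2` are hypothetical, embeddings, entities, and topic scores are assumed to be computed upstream, and the sketch assumes an item must pass all three checks to count as relevant (the specification does not say how the checks combine).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Thresholds taken from the validation checks listed above.
MIN_SIMILARITY = 0.6
MIN_SHARED_ENTITIES = 1
MIN_TOPIC_SCORE = 0.5
MIN_RELEVANT_ITEMS = 2


@dataclass
class Evidence:
    # Hypothetical record; all values are assumed to be produced upstream.
    text: str
    embedding: List[float]
    entities: Set[str]
    topic_score: float


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(claim_embedding: List[float], claim_entities: Set[str],
                ev: Evidence) -> bool:
    # Assumption: all three checks must pass.
    return (
        cosine_similarity(claim_embedding, ev.embedding) >= MIN_SIMILARITY
        and len(claim_entities & ev.entities) >= MIN_SHARED_ENTITIES
        and ev.topic_score >= MIN_TOPIC_SCORE
    )


def run_gate2(claim_embedding: List[float], claim_entities: Set[str],
              evidence: List[Evidence]
              ) -> Tuple[List[Evidence], List[Evidence], Optional[str]]:
    """Split evidence into kept and discarded; flag insufficient evidence."""
    kept = [e for e in evidence if is_relevant(claim_embedding, claim_entities, e)]
    discarded = [e for e in evidence if e not in kept]  # to be logged for review
    verdict = "Insufficient Evidence" if len(kept) < MIN_RELEVANT_ITEMS else None
    return kept, discarded, verdict
```

Returning the discarded items alongside the kept ones lets the caller log them for quality review without the gate itself depending on a particular logging setup.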
... ... @@ -46,6 +46,7 @@
46 46  **Purpose:** Validate scenarios are logical, complete, and meaningfully different
47 47  
48 48  **Validation Checks:**
51 +
49 49  1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
50 50  2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score <0.3)
51 51  3. **Distinctiveness:** Scenarios are meaningfully different (similarity <0.8)
... ... @@ -52,6 +52,7 @@
52 52  4. **Minimum Detail:** At least 1 specific assumption per scenario
53 53  
54 54  **Action if Failed:**
58 +
55 55  * Merge duplicate scenarios
56 56  * Flag contradictory assumptions for review
57 57  * Reduce confidence score by 20%
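A rough sketch of how the Gate 3 checks could be combined is shown below. Everything here is illustrative: the `Scenario` fields mirror the required fields named above, the `similarity` and `contradiction` scorers are hypothetical callables supplied by the caller (the specification does not say how those scores are computed), and "reduce confidence score by 20%" is assumed to mean multiplying the score by 0.8, applied once if any check fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Thresholds taken from the validation checks listed above.
MAX_CONTRADICTION = 0.3   # assumptions must score below this
MAX_SIMILARITY = 0.8      # scenario pairs must score below this
CONFIDENCE_PENALTY = 0.20  # assumed to be a relative (multiplicative) reduction


@dataclass
class Scenario:
    # Hypothetical record mirroring the required fields.
    assumptions: List[str]
    scope: str
    evidence_context: str


def is_complete(s: Scenario) -> bool:
    """Completeness and minimum detail: every field set, >= 1 assumption."""
    return bool(s.assumptions) and bool(s.scope.strip()) and bool(s.evidence_context.strip())


def check_scenarios(
    scenarios: List[Scenario],
    similarity: Callable[[Scenario, Scenario], float],
    contradiction: Callable[[Scenario], float],
    confidence: float,
) -> Tuple[List[str], float]:
    """Return a list of issues and the (possibly reduced) confidence score."""
    issues: List[str] = []
    for i, s in enumerate(scenarios):
        if not is_complete(s):
            issues.append(f"scenario {i}: incomplete")
        if contradiction(s) >= MAX_CONTRADICTION:
            issues.append(f"scenario {i}: contradictory assumptions")
    # Distinctiveness: compare every pair once; flagged pairs are merge candidates.
    for i in range(len(scenarios)):
        for j in range(i + 1, len(scenarios)):
            if similarity(scenarios[i], scenarios[j]) >= MAX_SIMILARITY:
                issues.append(f"scenarios {i} and {j}: near-duplicate")
    if issues:
        confidence *= 1 - CONFIDENCE_PENALTY
    return issues, confidence
```

The sketch only reports near-duplicate pairs; the actual merge step and the review flagging are left to the caller.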
... ... @@ -68,6 +68,7 @@
68 68  **Purpose:** Prevent counting the same evidence multiple times when cited by different sources
69 69  
70 70  **Problem:**
75 +
71 71  * Wire services (AP, Reuters) redistribute the same content
72 72  * Different sites cite the same original study
73 73  * Aggregators copy primary sources
... ... @@ -74,6 +74,7 @@
74 74  * AKEL might count one piece of evidence as "5 sources" when it's really 1
75 75  
76 76  **Solution: Content Fingerprinting**
82 +
77 77  * Generate SHA-256 hash of normalized text
78 78  * Detect near-duplicates (≥85% similarity) using fuzzy matching
79 79  * Track which sources cited each unique piece of evidence
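The fingerprinting approach above could look roughly like this. It is a sketch under stated assumptions: the normalization rules (lowercasing, stripping punctuation, collapsing whitespace) are not defined in the specification, `difflib.SequenceMatcher` from the standard library stands in for whatever fuzzy matcher is actually chosen, and `EvidenceIndex` is a hypothetical name.

```python
import hashlib
import re
from difflib import SequenceMatcher
from typing import Dict, Set

NEAR_DUPLICATE_THRESHOLD = 0.85  # from the ">=85% similarity" rule above


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class EvidenceIndex:
    """Tracks unique evidence items and the sources that cited each one."""

    def __init__(self) -> None:
        self._texts: Dict[str, str] = {}        # fingerprint -> normalized text
        self.sources: Dict[str, Set[str]] = {}  # fingerprint -> citing sources

    def add(self, text: str, source: str) -> str:
        norm = normalize(text)
        fp = fingerprint(text)
        if fp not in self._texts:
            # No exact match: fall back to a linear fuzzy scan of known items.
            for known_fp, known_text in self._texts.items():
                ratio = SequenceMatcher(None, norm, known_text).ratio()
                if ratio >= NEAR_DUPLICATE_THRESHOLD:
                    fp = known_fp
                    break
            else:
                self._texts[fp] = norm
        self.sources.setdefault(fp, set()).add(source)
        return fp

    def unique_count(self) -> int:
        return len(self._texts)


index = EvidenceIndex()
index.add("The study found a 12% drop in emissions.", "AP")
index.add("the study found a 12%  drop in emissions", "Reuters")
print(index.unique_count())  # 1: both texts normalize to the same string
```

The linear scan is quadratic over the whole corpus, which is acceptable for a sketch; at larger scale an approximate scheme such as MinHash would be a more realistic choice.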
... ... @@ -88,6 +88,7 @@
88 88  **Fulfills:** Real-time quality monitoring during development
89 89  
90 90  **Dashboard Metrics:**
97 +
91 91  * Claim processing statistics
92 92  * Gate performance (pass/fail rates for each gate)
93 93  * Evidence quality metrics
... ... @@ -100,6 +100,7 @@
100 100  == 3. Success Criteria ==
101 101  
102 102  **✅ Quality:**
110 +
103 103  * Hallucination rate <5% (target: <3%)
104 104  * Average quality rating ≥8.0/10
105 105  * 0 critical failures (publishable falsehoods)
... ... @@ -106,6 +106,7 @@
106 106  * Gates correctly identify >95% of low-quality outputs
107 107  
108 108  **✅ All 4 Gates Operational:**
117 +
109 109  * Gate 1: Claim validation working
110 110  * Gate 2: Evidence relevance filtering working
111 111  * Gate 3: Scenario coherence checking working
... ... @@ -112,16 +112,17 @@
112 112  * Gate 4: Verdict confidence assessment working
113 113  
114 114  **✅ Evidence Deduplication:**
124 +
115 115  * Duplicate detection >95% accurate
116 116  * Evidence counts reflect reality
117 117  * Provenance tracked correctly
118 118  
119 119  **✅ Metrics Dashboard:**
130 +
120 120  * All metrics implemented and tracking
121 121  * Dashboard functional and useful
122 122  * Alerts trigger appropriately
123 123  
124 -
125 125  == 4. Architecture Notes ==
126 126  
127 127  **POC2 Enhanced Architecture:**
... ... @@ -135,6 +135,7 @@
135 135  {{/code}}
136 136  
137 137  **Key Additions from POC1:**
148 +
138 138  * Scenario generation component
139 139  * Evidence deduplication system
140 140  * Gates 2 & 3 implementation
... ... @@ -141,6 +141,7 @@
141 141  * Quality metrics collection
142 142  
143 143  **Still Simplified vs. Full System:**
155 +
144 144  * Single AKEL orchestration (not multi-component pipeline)
145 145  * No review queue
146 146  * No federation architecture
... ... @@ -150,12 +150,10 @@
150 150  
151 151  == Related Pages ==
152 152  
153 -* [[POC1>>Test.FactHarbor.Roadmap.POC1.WebHome]] - Previous phase
154 -* [[Beta 0>>Test.FactHarbor.Roadmap.Beta0.WebHome]] - Next phase
165 +* [[POC1>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
166 +* [[Beta 0>>Test.FactHarbor pre10 V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
155 155  * [[Roadmap Overview>>Test.FactHarbor.Roadmap.WebHome]]
156 156  * [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
157 157  
158 -
159 159  **Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
160 160  **Version:** V0.9.70
161 -