Last modified by Robert Schaub on 2026/02/08 08:32

From version 1.1
edited by Robert Schaub
on 2026/01/20 21:40
Change comment: Imported from XAR
To version 1.2
edited by Robert Schaub
on 2026/02/08 08:29
Change comment: Renamed back-links.

Summary

Details

Page properties
Content
... ... @@ -1,25 +1,42 @@
1 1  = System Performance Metrics =
2 +
2 2  **What we monitor to ensure AKEL performs well.**
4 +
3 3  == 1. Purpose ==
6 +
4 4  These metrics tell us:
8 +
5 5  * ✅ Is AKEL performing within acceptable ranges?
6 6  * ✅ Where should we focus improvement efforts?
7 7  * ✅ When do humans need to intervene?
8 8  * ✅ Are our changes improving things?
9 9  **Principle**: Measure to improve, not to judge.
14 +
10 10  == 2. Metric Categories ==
16 +
11 11  === 2.1 AKEL Performance ===
18 +
12 12  **Processing speed and reliability**
20 +
13 13  === 2.2 Content Quality ===
22 +
14 14  **Output quality and user satisfaction**
24 +
15 15  === 2.3 System Health ===
26 +
16 16  **Infrastructure and operational metrics**
28 +
17 17  === 2.4 User Experience ===
30 +
18 18  **How users interact with the system**
32 +
19 19  == 3. AKEL Performance Metrics ==
34 +
20 20  === 3.1 Processing Time ===
36 +
21 21  **Metric**: Time from claim submission to verdict publication
22 22  **Measurements**:
39 +
23 23  * P50 (median): 50% of claims processed within X seconds
24 24  * P95: 95% of claims processed within Y seconds
25 25  * P99: 99% of claims processed within Z seconds
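The P50/P95/P99 measurements above can be sketched in a few lines. This is a minimal illustration, assuming the nearest-rank percentile method (the page does not specify an interpolation scheme); the sample times are invented for the example.

```python
# Hypothetical sketch: nearest-rank percentiles over recent processing times.
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Invented sample of claim-processing times, in seconds.
times = [2.1, 3.4, 3.9, 4.2, 5.0, 6.7, 7.3, 8.8, 12.5, 19.0]
p50, p95, p99 = (percentile(times, p) for p in (50, 95, 99))
```

In production one would compute these over a sliding window (e.g. the last hour) rather than a fixed list.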
... ... @@ -37,10 +37,13 @@
37 37  * Better caching
38 38  * Parallel processing
39 39  * Database query optimization
57 +
40 40  === 3.2 Success Rate ===
59 +
41 41  **Metric**: % of claims successfully processed without errors
42 42  **Target**: ≥ 99%
43 43  **Alert thresholds**:
63 +
44 44  * 98-99%: Monitor
45 45  * 95-98%: Investigate
46 46  * <95%: Emergency
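The three-tier thresholds above map naturally to a small classifier. A minimal sketch, assuming lower bounds are inclusive (the page does not say which side of each boundary belongs to which tier):

```python
def success_rate_status(rate_pct):
    """Map an hourly success rate (%) to the alert tiers above.
    Assumption: lower bounds are inclusive (exactly 98% -> monitor)."""
    if rate_pct >= 99:
        return "ok"          # at or above target
    if rate_pct >= 98:
        return "monitor"
    if rate_pct >= 95:
        return "investigate"
    return "emergency"
```

The same shape works for the other tiered metrics on this page; only the boundaries change.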
... ... @@ -50,11 +50,14 @@
50 50  * External API failure (source unavailable)
51 51  * Resource exhaustion (memory/CPU)
52 52  **Why it matters**: Errors frustrate users and reduce trust
73 +
53 53  === 3.3 Evidence Completeness ===
75 +
54 54  **Metric**: % of claims where AKEL found sufficient evidence
55 55  **Measurement**: Claims with ≥3 pieces of evidence from ≥2 distinct sources
56 56  **Target**: ≥ 80%
57 57  **Alert thresholds**:
80 +
58 58  * 75-80%: Monitor
59 59  * 70-75%: Investigate
60 60  * <70%: Intervention needed
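The completeness rule (≥3 pieces of evidence from ≥2 distinct sources) can be checked mechanically. A sketch, assuming evidence is available as hypothetical `(source_id, text)` pairs; the data shape is an illustration, not AKEL's actual schema:

```python
def is_complete(evidence):
    """Completeness rule above: >=3 evidence items from >=2 distinct sources.
    `evidence` is a hypothetical list of (source_id, text) pairs."""
    sources = {src for src, _ in evidence}
    return len(evidence) >= 3 and len(sources) >= 2

def completeness_rate(claims):
    """% of claims meeting the rule; `claims` is a list of evidence lists."""
    complete = sum(1 for ev in claims if is_complete(ev))
    return 100 * complete / len(claims)
```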
... ... @@ -63,51 +63,69 @@
63 63  * Better search algorithms
64 64  * More source integrations
65 65  * Improved relevance scoring
89 +
66 66  === 3.4 Source Diversity ===
91 +
67 67  **Metric**: Average number of distinct sources per claim
68 68  **Target**: ≥ 3.0 sources per claim
69 69  **Alert thresholds**:
95 +
70 70  * 2.5-3.0: Monitor
71 71  * 2.0-2.5: Investigate
72 72  * <2.0: Intervention needed
73 73  **Why it matters**: Multiple sources increase confidence and reduce bias
100 +
74 74  === 3.5 Scenario Coverage ===
102 +
75 75  **Metric**: % of claims with at least one scenario extracted
76 76  **Target**: ≥ 75%
77 77  **Why it matters**: Scenarios provide context for verdicts
106 +
78 78  == 4. Content Quality Metrics ==
108 +
79 79  === 4.1 Confidence Distribution ===
110 +
80 80  **Metric**: Distribution of confidence scores across claims
81 81  **Target**: Roughly normal distribution
82 -* ~10% very low confidence (0.0-0.3)
83 -* ~20% low confidence (0.3-0.5)
84 -* ~40% medium confidence (0.5-0.7)
85 -* ~20% high confidence (0.7-0.9)
86 -* ~10% very high confidence (0.9-1.0)
113 +
114 +* 10% very low confidence (0.0-0.3)
115 +* 20% low confidence (0.3-0.5)
116 +* 40% medium confidence (0.5-0.7)
117 +* 20% high confidence (0.7-0.9)
118 +* 10% very high confidence (0.9-1.0)
87 87  **Alert thresholds**:
88 88  * >30% very low confidence: Evidence extraction issues
89 89  * >30% very high confidence: Too aggressive/overconfident
90 90  * Heavily skewed distribution: Systematic bias
91 91  **Why it matters**: Confidence should reflect actual uncertainty
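The five bands and the tail alerts above can be computed directly from the scores. A minimal sketch, assuming band upper bounds are exclusive except the last (the page gives ranges like 0.0-0.3 without stating boundary ownership):

```python
def confidence_buckets(scores):
    """Count scores in the five bands above (upper bound exclusive, last band inclusive)."""
    bands = {"very_low": 0, "low": 0, "medium": 0, "high": 0, "very_high": 0}
    for s in scores:
        if s < 0.3:
            bands["very_low"] += 1
        elif s < 0.5:
            bands["low"] += 1
        elif s < 0.7:
            bands["medium"] += 1
        elif s < 0.9:
            bands["high"] += 1
        else:
            bands["very_high"] += 1
    return bands

def distribution_alerts(scores):
    """Raise the two tail alerts above when either extreme exceeds 30% of claims."""
    b = confidence_buckets(scores)
    n = len(scores)
    alerts = []
    if b["very_low"] / n > 0.30:
        alerts.append("evidence extraction issues")
    if b["very_high"] / n > 0.30:
        alerts.append("overconfident")
    return alerts
```

Detecting a "heavily skewed distribution" would need a statistical test on top of these counts and is left out here.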
124 +
92 92  === 4.2 Contradiction Rate ===
126 +
93 93  **Metric**: % of claims with internal contradictions detected
94 94  **Target**: ≤ 5%
95 95  **Alert thresholds**:
130 +
96 96  * 5-10%: Monitor
97 97  * 10-15%: Investigate
98 98  * >15%: Intervention needed
99 99  **Why it matters**: High contradiction rate suggests poor evidence quality or logic errors
135 +
100 100  === 4.3 User Feedback Ratio ===
137 +
101 101  **Metric**: Helpful vs unhelpful user ratings
102 102  **Target**: ≥ 70% helpful
103 103  **Alert thresholds**:
141 +
104 104  * 60-70%: Monitor
105 105  * 50-60%: Investigate
106 106  * <50%: Emergency
107 107  **Why it matters**: Direct measure of user satisfaction
146 +
108 108  === 4.4 False Positive/Negative Rate ===
148 +
109 109  **Metric**: When humans review flagged items, how often was AKEL right?
110 110  **Measurement**:
151 +
111 111  * False positive: AKEL flagged for review, but actually fine
112 112  * False negative: AKEL missed something that should have been flagged
113 113  **Target**:
... ... @@ -114,22 +114,31 @@
114 114  * False positive rate: ≤ 20%
115 115  * False negative rate: ≤ 5%
116 116  **Why it matters**: Balance between catching problems and not crying wolf
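From human-review outcomes, the two rates above reduce to ratios over confusion counts. A sketch under stated assumptions: the FP rate is taken over everything AKEL flagged, and the FN rate over everything that actually needed a flag; the page does not define the denominators, so both are assumptions.

```python
def review_rates(true_positives, false_positives, false_negatives):
    """FP/FN rates (%) from human-review counts.
    Assumed denominators: FP rate over all flagged items,
    FN rate over all items that should have been flagged."""
    flagged = true_positives + false_positives
    should_flag = true_positives + false_negatives
    fp_rate = 100 * false_positives / flagged if flagged else 0.0
    fn_rate = 100 * false_negatives / should_flag if should_flag else 0.0
    return fp_rate, fn_rate
```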
158 +
117 117  == 5. System Health Metrics ==
160 +
118 118  === 5.1 Uptime ===
162 +
119 119  **Metric**: % of time system is available and functional
120 120  **Target**: ≥ 99.9% (less than 45 minutes downtime per month)
121 121  **Alert**: Immediate notification on any downtime
122 122  **Why it matters**: Users expect 24/7 availability
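The "less than 45 minutes per month" figure follows from the target: 0.1% of a 30-day month is 43.2 minutes. A quick sketch of the arithmetic:

```python
def downtime_budget_minutes(target_pct, days=30):
    """Allowed downtime per period at a given uptime target.
    99.9% over 30 days -> 0.1% of 43,200 minutes = 43.2 minutes."""
    return (1 - target_pct / 100) * days * 24 * 60

budget = downtime_budget_minutes(99.9)
```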
167 +
123 123  === 5.2 Error Rate ===
169 +
124 124  **Metric**: Errors per 1000 requests
125 125  **Target**: ≤ 1 error per 1000 requests (0.1%)
126 126  **Alert thresholds**:
173 +
127 127  * 1-5 per 1000: Monitor
128 128  * 5-10 per 1000: Investigate
129 129  * >10 per 1000: Emergency
130 130  **Why it matters**: Errors disrupt user experience
178 +
131 131  === 5.3 Database Performance ===
180 +
132 132  **Metrics**:
182 +
133 133  * Query response time (P95)
134 134  * Connection pool utilization
135 135  * Slow query frequency
... ... @@ -138,12 +138,17 @@
138 138  * Connection pool: ≤ 80% utilized
139 139  * Slow queries (>1s): ≤ 10 per hour
140 140  **Why it matters**: Database bottlenecks slow entire system
191 +
141 141  === 5.4 Cache Hit Rate ===
193 +
142 142  **Metric**: % of requests served from cache vs. database
143 143  **Target**: ≥ 80%
144 144  **Why it matters**: Higher cache hit rate = faster responses, less DB load
197 +
145 145  === 5.5 Resource Utilization ===
199 +
146 146  **Metrics**:
201 +
147 147  * CPU utilization
148 148  * Memory utilization
149 149  * Disk I/O
... ... @@ -155,54 +155,81 @@
155 155  * Disk I/O: ≤ 70%
156 156  **Alert**: Any metric consistently >85%
157 157  **Why it matters**: Headroom for traffic spikes, prevents resource exhaustion
213 +
158 158  == 6. User Experience Metrics ==
215 +
159 159  === 6.1 Time to First Verdict ===
217 +
160 160  **Metric**: Time from user submitting claim to seeing initial verdict
161 161  **Target**: ≤ 15 seconds
162 162  **Why it matters**: User perception of speed
221 +
163 163  === 6.2 Claim Submission Rate ===
223 +
164 164  **Metric**: Claims submitted per day/hour
165 165  **Monitoring**: Track trends, detect anomalies
166 166  **Why it matters**: Understand usage patterns, capacity planning
227 +
167 167  === 6.3 User Retention ===
229 +
168 168  **Metric**: % of users who return after first visit
169 169  **Target**: ≥ 30% (1-week retention)
170 170  **Why it matters**: Indicates system usefulness
233 +
171 171  === 6.4 Feature Usage ===
235 +
172 172  **Metrics**:
237 +
173 173  * % of users who explore evidence
174 174  * % who check scenarios
175 175  * % who view source track records
176 176  **Why it matters**: Understand how users interact with system
242 +
177 177  == 7. Metric Dashboard ==
244 +
178 178  === 7.1 Real-Time Dashboard ===
246 +
179 179  **Always visible**:
248 +
180 180  * Current processing time (P95)
181 181  * Success rate (last hour)
182 182  * Error rate (last hour)
183 183  * System health status
184 184  **Update frequency**: Every 30 seconds
254 +
185 185  === 7.2 Daily Dashboard ===
256 +
186 186  **Reviewed daily**:
258 +
187 187  * All AKEL performance metrics
188 188  * Content quality metrics
189 189  * System health trends
190 190  * User feedback summary
263 +
191 191  === 7.3 Weekly Reports ===
265 +
192 192  **Reviewed weekly**:
267 +
193 193  * Trends over time
194 194  * Week-over-week comparisons
195 195  * Improvement priorities
196 196  * Outstanding issues
272 +
197 197  === 7.4 Monthly/Quarterly Reports ===
274 +
198 198  **Comprehensive analysis**:
276 +
199 199  * Long-term trends
200 200  * Seasonal patterns
201 201  * Strategic metrics
202 202  * Goal progress
281 +
203 203  == 8. Alert System ==
283 +
204 204  === 8.1 Alert Levels ===
285 +
205 205  **Info**: Metric outside target, but within acceptable range
287 +
206 206  * Action: Note in daily review
207 207  * Example: P95 processing time 19s (target 18s, acceptable <20s)
208 208  **Warning**: Metric outside acceptable range
... ... @@ -214,69 +214,98 @@
214 214  **Emergency**: System failure or severe degradation
215 215  * Action: Page on-call, all hands
216 216  * Example: Uptime <95%, P95 >30s
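Using the P95 processing-time example above, the four levels can be expressed as a single mapping. A minimal sketch: the target (18s), acceptable (<20s), and emergency (>30s) boundaries come from the examples on this page, while the intermediate critical boundary (here 25s) is a hypothetical stand-in.

```python
def alert_level(p95_seconds, target=18, acceptable=20, critical=25, emergency=30):
    """Map a P95 processing time to the alert levels above.
    The `critical=25` boundary is a hypothetical value; the others
    come from the examples on this page."""
    if p95_seconds <= target:
        return None          # within target: no alert
    if p95_seconds < acceptable:
        return "info"        # outside target, within acceptable range
    if p95_seconds < critical:
        return "warning"
    if p95_seconds < emergency:
        return "critical"
    return "emergency"
```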
299 +
217 217  === 8.2 Alert Channels ===
301 +
218 218  **Slack/Discord**: All alerts
219 219  **Email**: Warning and above
220 220  **SMS**: Critical and emergency only
221 221  **PagerDuty**: Emergency only
306 +
222 222  === 8.3 On-Call Rotation ===
308 +
223 223  **Technical Coordinator**: Primary on-call
224 224  **Backup**: Designated team member
225 225  **Responsibilities**:
312 +
226 226  * Respond to alerts within SLA
227 227  * Investigate and diagnose issues
228 228  * Implement fixes or escalate
229 229  * Document incidents
317 +
230 230  == 9. Metric-Driven Improvement ==
319 +
231 231  === 9.1 Prioritization ===
321 +
232 232  **Focus improvements on**:
323 +
233 233  * Metrics furthest from target
234 234  * Metrics with biggest user impact
235 235  * Metrics easiest to improve
236 236  * Strategic priorities
328 +
237 237  === 9.2 Success Criteria ===
330 +
238 238  **Every improvement project should**:
332 +
239 239  * Target specific metrics
240 240  * Set concrete improvement goals
241 241  * Measure before and after
242 242  * Document learnings
243 243  **Example**: "Reduce P95 processing time from 20s to 16s by optimizing evidence extraction"
338 +
244 244  === 9.3 A/B Testing ===
340 +
245 245  **When feasible**:
342 +
246 246  * Run two versions
247 247  * Measure metric differences
248 248  * Choose based on data
249 249  * Roll out winner
347 +
250 250  == 10. Bias and Fairness Metrics ==
349 +
251 251  === 10.1 Domain Balance ===
351 +
252 252  **Metric**: Confidence distribution by domain
253 253  **Target**: Similar distributions across domains
254 254  **Alert**: One domain consistently much lower/higher confidence
255 255  **Why it matters**: Ensure no systematic domain bias
356 +
256 256  === 10.2 Source Type Balance ===
358 +
257 257  **Metric**: Evidence distribution by source type
258 258  **Target**: Diverse source types represented
259 259  **Alert**: Over-reliance on one source type
260 260  **Why it matters**: Prevent source type bias
363 +
261 261  === 10.3 Geographic Balance ===
365 +
262 262  **Metric**: Source geographic distribution
263 263  **Target**: Multiple regions represented
264 264  **Alert**: Over-concentration in one region
265 265  **Why it matters**: Reduce geographic/cultural bias
370 +
266 266  == 11. Experimental Metrics ==
372 +
267 267  **New metrics to test**:
374 +
268 268  * User engagement time
269 269  * Evidence exploration depth
270 270  * Cross-reference usage
271 271  * Mobile vs desktop usage
272 272  **Process**:
380 +
273 273  1. Define metric hypothesis
274 274  2. Implement tracking
275 275  3. Collect data for 1 month
276 276  4. Evaluate usefulness
277 277  5. Add to standard set or discard
386 +
278 278  == 12. Anti-Patterns ==
388 +
279 279  **Don't**:
390 +
280 280  * ❌ Measure too many things (focus on what matters)
281 281  * ❌ Set unrealistic targets (demotivating)
282 282  * ❌ Ignore metrics when inconvenient
... ... @@ -290,9 +290,11 @@
290 290  * ✅ Continuously validate metrics still matter
291 291  * ✅ Use metrics for system improvement, not people evaluation
292 292  * ✅ Remember: metrics serve users, not the other way around
404 +
293 293  == 13. Related Pages ==
406 +
294 294  * [[Automation Philosophy>>FactHarbor.Organisation.Automation-Philosophy]] - Why we monitor systems, not outputs
295 295  * [[Continuous Improvement>>FactHarbor.Organisation.How-We-Work-Together.Continuous-Improvement]] - How we use metrics to improve
296 -* [[Governance>>FactHarbor.Organisation.Governance.WebHome]] - Quarterly performance reviews
409 +* [[Governance>>Archive.FactHarbor 2026\.02\.08.Organisation.Governance.WebHome]] - Quarterly performance reviews
297 297  ---
298 -**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.
298 -**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.
411 +**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.