Last modified by Robert Schaub on 2026/02/08 08:32

= System Performance Metrics =

**What we monitor to ensure AKEL performs well.**

== 1. Purpose ==

These metrics tell us:

* ✅ Is AKEL performing within acceptable ranges?
* ✅ Where should we focus improvement efforts?
* ✅ When do humans need to intervene?
* ✅ Are our changes improving things?

**Principle**: Measure to improve, not to judge.

== 2. Metric Categories ==

=== 2.1 AKEL Performance ===

**Processing speed and reliability**

=== 2.2 Content Quality ===

**Output quality and user satisfaction**

=== 2.3 System Health ===

**Infrastructure and operational metrics**

=== 2.4 User Experience ===

**How users interact with the system**

== 3. AKEL Performance Metrics ==

=== 3.1 Processing Time ===

**Metric**: Time from claim submission to verdict publication

**Measurements**:

* P50 (median): 50% of claims processed within X seconds
* P95: 95% of claims processed within Y seconds
* P99: 99% of claims processed within Z seconds

**Targets**:

* P50: ≤ 12 seconds
* P95: ≤ 18 seconds
* P99: ≤ 25 seconds

**Alert thresholds**:

* P95 > 20 seconds: Monitor closely
* P95 > 25 seconds: Investigate immediately
* P95 > 30 seconds: Emergency - intervention required

**Why it matters**: Slow processing = poor user experience

**Improvement ideas**:

* Optimize evidence extraction
* Better caching
* Parallel processing
* Database query optimization

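The percentile targets above can be checked with a simple nearest-rank computation; a minimal sketch in Python (the function names and report shape are illustrative, not an existing AKEL API):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample value such that
    at least pct% of samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(pct/100 * n) as an integer rank, clamped to at least 1
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]

def processing_time_report(durations_s):
    """Summarize claim-processing durations against the section 3.1 targets."""
    targets = {50: 12, 95: 18, 99: 25}
    report = {}
    for pct, target in targets.items():
        value = percentile(durations_s, pct)
        report[f"P{pct}"] = {"seconds": value, "within_target": value <= target}
    return report
```

Nearest-rank is the simplest percentile definition; a monitoring backend may interpolate instead, which gives slightly different values on small samples.
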
=== 3.2 Success Rate ===

**Metric**: % of claims successfully processed without errors

**Target**: ≥ 99%

**Alert thresholds**:

* 98-99%: Monitor
* 95-98%: Investigate
* <95%: Emergency

**Common failure causes**:

* Timeout (evidence extraction took too long)
* Parse error (claim text unparsable)
* External API failure (source unavailable)
* Resource exhaustion (memory/CPU)

**Why it matters**: Errors frustrate users and reduce trust

=== 3.3 Evidence Completeness ===

**Metric**: % of claims where AKEL found sufficient evidence

**Measurement**: Claims with ≥3 pieces of evidence from ≥2 distinct sources

**Target**: ≥ 80%

**Alert thresholds**:

* 75-80%: Monitor
* 70-75%: Investigate
* <70%: Intervention needed

**Why it matters**: Incomplete evidence = low confidence verdicts

**Improvement ideas**:

* Better search algorithms
* More source integrations
* Improved relevance scoring

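The completeness bar (≥3 pieces of evidence from ≥2 distinct sources) translates directly into code; a sketch, assuming each evidence item carries a `source` field (the field name is an assumption):

```python
def is_evidence_complete(evidence_items, min_items=3, min_sources=2):
    """True when a claim has at least min_items pieces of evidence
    drawn from at least min_sources distinct sources."""
    sources = {item["source"] for item in evidence_items}
    return len(evidence_items) >= min_items and len(sources) >= min_sources

def completeness_rate(claims_evidence):
    """% of claims meeting the completeness bar (target: >= 80%).
    Takes a list of per-claim evidence lists."""
    if not claims_evidence:
        return 0.0
    complete = sum(1 for ev in claims_evidence if is_evidence_complete(ev))
    return 100.0 * complete / len(claims_evidence)
```
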
=== 3.4 Source Diversity ===

**Metric**: Average number of distinct sources per claim

**Target**: ≥ 3.0 sources per claim

**Alert thresholds**:

* 2.5-3.0: Monitor
* 2.0-2.5: Investigate
* <2.0: Intervention needed

**Why it matters**: Multiple sources increase confidence and reduce bias

=== 3.5 Scenario Coverage ===

**Metric**: % of claims with at least one scenario extracted

**Target**: ≥ 75%

**Why it matters**: Scenarios provide context for verdicts

== 4. Content Quality Metrics ==

=== 4.1 Confidence Distribution ===

**Metric**: Distribution of confidence scores across claims

**Target**: Roughly normal distribution

* 10% very low confidence (0.0-0.3)
* 20% low confidence (0.3-0.5)
* 40% medium confidence (0.5-0.7)
* 20% high confidence (0.7-0.9)
* 10% very high confidence (0.9-1.0)

**Alert thresholds**:

* >30% very low confidence: Evidence extraction issues
* >30% very high confidence: Too aggressive/overconfident
* Heavily skewed distribution: Systematic bias

**Why it matters**: Confidence should reflect actual uncertainty

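The target distribution can be monitored by bucketing scores and checking the two threshold alerts; a minimal sketch (bucket names and alert strings are illustrative):

```python
# Bucket boundaries from the target distribution in section 4.1
BUCKETS = [
    ("very low", 0.0, 0.3),
    ("low", 0.3, 0.5),
    ("medium", 0.5, 0.7),
    ("high", 0.7, 0.9),
    ("very high", 0.9, 1.0),
]

def confidence_distribution(scores):
    """Share of claims (in %) per confidence bucket.
    Upper bounds are exclusive except for the last bucket."""
    counts = {name: 0 for name, _, _ in BUCKETS}
    for s in scores:
        for name, lo, hi in BUCKETS:
            if lo <= s < hi or (hi == 1.0 and s == 1.0):
                counts[name] += 1
                break
    total = len(scores) or 1
    return {name: 100.0 * n / total for name, n in counts.items()}

def distribution_alerts(dist):
    """Flag the two >30% alert conditions from section 4.1."""
    alerts = []
    if dist["very low"] > 30:
        alerts.append("Evidence extraction issues")
    if dist["very high"] > 30:
        alerts.append("Overconfident verdicts")
    return alerts
```
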
=== 4.2 Contradiction Rate ===

**Metric**: % of claims with internal contradictions detected

**Target**: ≤ 5%

**Alert thresholds**:

* 5-10%: Monitor
* 10-15%: Investigate
* >15%: Intervention needed

**Why it matters**: High contradiction rate suggests poor evidence quality or logic errors

=== 4.3 User Feedback Ratio ===

**Metric**: Helpful vs. unhelpful user ratings

**Target**: ≥ 70% helpful

**Alert thresholds**:

* 60-70%: Monitor
* 50-60%: Investigate
* <50%: Emergency

**Why it matters**: Direct measure of user satisfaction

=== 4.4 False Positive/Negative Rate ===

**Metric**: When humans review flagged items, how often was AKEL right?

**Measurement**:

* False positive: AKEL flagged for review, but actually fine
* False negative: Missed something that should have been flagged

**Targets**:

* False positive rate: ≤ 20%
* False negative rate: ≤ 5%

**Why it matters**: Balance between catching problems and not crying wolf

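Given human review outcomes, both rates reduce to counting; a sketch, assuming each review is recorded as a `(flagged_by_akel, actually_problematic)` pair of booleans (the data shape is an assumption):

```python
def review_error_rates(reviews):
    """Compute (false_positive_rate, false_negative_rate) in %.

    False positive rate: of items AKEL flagged, share a human judged fine.
    False negative rate: of truly problematic items, share AKEL missed.
    """
    flagged = [r for r in reviews if r[0]]
    problematic = [r for r in reviews if r[1]]
    fp = sum(1 for _, actual in flagged if not actual)
    fn = sum(1 for was_flagged, _ in problematic if not was_flagged)
    fp_rate = 100.0 * fp / len(flagged) if flagged else 0.0
    fn_rate = 100.0 * fn / len(problematic) if problematic else 0.0
    return fp_rate, fn_rate
```

Note the false negative rate can only be estimated from items that reached human review by some other route, since truly missed items are invisible by definition.
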
== 5. System Health Metrics ==

=== 5.1 Uptime ===

**Metric**: % of time system is available and functional

**Target**: ≥ 99.9% (less than 45 minutes downtime per month)

**Alert**: Immediate notification on any downtime

**Why it matters**: Users expect 24/7 availability

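The 45-minute figure follows directly from the target: 0.1% of a month (43.2 minutes for 30 days, 44.6 for 31). A quick check:

```python
def downtime_budget_minutes(uptime_target_pct, days=30):
    """Allowed downtime per period for a given uptime target.
    99.9% over a 30-day month leaves roughly 43 minutes."""
    return (100.0 - uptime_target_pct) / 100.0 * days * 24 * 60
```
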
=== 5.2 Error Rate ===

**Metric**: Errors per 1000 requests

**Target**: ≤ 1 error per 1000 requests (0.1%)

**Alert thresholds**:

* 1-5 per 1000: Monitor
* 5-10 per 1000: Investigate
* >10 per 1000: Emergency

**Why it matters**: Errors disrupt user experience

=== 5.3 Database Performance ===

**Metrics**:

* Query response time (P95)
* Connection pool utilization
* Slow query frequency

**Targets**:

* P95 query time: ≤ 50ms
* Connection pool: ≤ 80% utilized
* Slow queries (>1s): ≤ 10 per hour

**Why it matters**: Database bottlenecks slow the entire system

=== 5.4 Cache Hit Rate ===

**Metric**: % of requests served from cache vs. database

**Target**: ≥ 80%

**Why it matters**: Higher cache hit rate = faster responses, less DB load

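Both rate metrics above are simple ratios, sketched here for completeness (function names are illustrative):

```python
def errors_per_1000(errors, requests):
    """Error rate normalized per 1000 requests (target: <= 1)."""
    return 1000.0 * errors / requests if requests else 0.0

def cache_hit_rate(hits, misses):
    """% of requests served from cache (target: >= 80%)."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0
```
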
=== 5.5 Resource Utilization ===

**Metrics**:

* CPU utilization
* Memory utilization
* Disk I/O
* Network bandwidth

**Targets**:

* Average CPU: ≤ 60%
* Peak CPU: ≤ 85%
* Memory: ≤ 80%
* Disk I/O: ≤ 70%

**Alert**: Any metric consistently >85%

**Why it matters**: Headroom absorbs traffic spikes and prevents resource exhaustion

== 6. User Experience Metrics ==

=== 6.1 Time to First Verdict ===

**Metric**: Time from user submitting claim to seeing initial verdict

**Target**: ≤ 15 seconds

**Why it matters**: User perception of speed

=== 6.2 Claim Submission Rate ===

**Metric**: Claims submitted per day/hour

**Monitoring**: Track trends, detect anomalies

**Why it matters**: Understand usage patterns, capacity planning

=== 6.3 User Retention ===

**Metric**: % of users who return after first visit

**Target**: ≥ 30% (1-week retention)

**Why it matters**: Indicates system usefulness

=== 6.4 Feature Usage ===

**Metrics**:

* % of users who explore evidence
* % who check scenarios
* % who view source track records

**Why it matters**: Understand how users interact with the system

== 7. Metric Dashboard ==

=== 7.1 Real-Time Dashboard ===

**Always visible**:

* Current processing time (P95)
* Success rate (last hour)
* Error rate (last hour)
* System health status

**Update frequency**: Every 30 seconds

=== 7.2 Daily Dashboard ===

**Reviewed daily**:

* All AKEL performance metrics
* Content quality metrics
* System health trends
* User feedback summary

=== 7.3 Weekly Reports ===

**Reviewed weekly**:

* Trends over time
* Week-over-week comparisons
* Improvement priorities
* Outstanding issues

=== 7.4 Monthly/Quarterly Reports ===

**Comprehensive analysis**:

* Long-term trends
* Seasonal patterns
* Strategic metrics
* Goal progress

== 8. Alert System ==

=== 8.1 Alert Levels ===

**Info**: Metric outside target, but within acceptable range

* Action: Note in daily review
* Example: P95 processing time 19s (target 18s, acceptable <20s)

**Warning**: Metric outside acceptable range

* Action: Investigate within 24 hours
* Example: Success rate 97% (acceptable >98%)

**Critical**: Metric severely degraded

* Action: Investigate immediately
* Example: Error rate 2% (acceptable <0.5%)

**Emergency**: System failure or severe degradation

* Action: Page on-call, all hands
* Example: Uptime <95%, P95 >30s

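As one concrete reading of how the P95 thresholds from section 3.1 map onto these levels (the exact mapping is an assumption, not stated verbatim on this page):

```python
def p95_alert_level(p95_seconds):
    """Map P95 processing time to an alert level, combining the
    section 3.1 thresholds with the section 8.1 level definitions
    (target 18s, acceptable < 20s)."""
    if p95_seconds > 30:
        return "emergency"
    if p95_seconds > 25:
        return "critical"
    if p95_seconds > 20:
        return "warning"
    if p95_seconds > 18:
        return "info"
    return "ok"
```
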
=== 8.2 Alert Channels ===

**Slack/Discord**: All alerts

**Email**: Warning and above

**SMS**: Critical and emergency only

**PagerDuty**: Emergency only

=== 8.3 On-Call Rotation ===

**Technical Coordinator**: Primary on-call

**Backup**: Designated team member

**Responsibilities**:

* Respond to alerts within SLA
* Investigate and diagnose issues
* Implement fixes or escalate
* Document incidents

== 9. Metric-Driven Improvement ==

=== 9.1 Prioritization ===

**Focus improvements on**:

* Metrics furthest from target
* Metrics with biggest user impact
* Metrics easiest to improve
* Strategic priorities

=== 9.2 Success Criteria ===

**Every improvement project should**:

* Target specific metrics
* Set concrete improvement goals
* Measure before and after
* Document learnings

**Example**: "Reduce P95 processing time from 20s to 16s by optimizing evidence extraction"

=== 9.3 A/B Testing ===

**When feasible**:

* Run two versions
* Measure metric differences
* Choose based on data
* Roll out winner

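"Choose based on data" usually means testing whether the metric difference is statistically meaningful; a minimal two-proportion z-test sketch, e.g. for comparing helpful-rating ratios between variants A and B (the test choice is a common convention, not something this page prescribes):

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions,
    e.g. 'helpful' ratings under variants A and B.
    Returns (z, p_value); small p_value suggests a real difference."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

For small samples or strongly skewed proportions, an exact test or a longer collection window would be more appropriate.
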
== 10. Bias and Fairness Metrics ==

=== 10.1 Domain Balance ===

**Metric**: Confidence distribution by domain

**Target**: Similar distributions across domains

**Alert**: One domain consistently much lower/higher confidence

**Why it matters**: Ensure no systematic domain bias

=== 10.2 Source Type Balance ===

**Metric**: Evidence distribution by source type

**Target**: Diverse source types represented

**Alert**: Over-reliance on one source type

**Why it matters**: Prevent source type bias

=== 10.3 Geographic Balance ===

**Metric**: Source geographic distribution

**Target**: Multiple regions represented

**Alert**: Over-concentration in one region

**Why it matters**: Reduce geographic/cultural bias

== 11. Experimental Metrics ==

**New metrics to test**:

* User engagement time
* Evidence exploration depth
* Cross-reference usage
* Mobile vs desktop usage

**Process**:

1. Define metric hypothesis
2. Implement tracking
3. Collect data for 1 month
4. Evaluate usefulness
5. Add to standard set or discard

== 12. Anti-Patterns ==

**Don't**:

* ❌ Measure too many things (focus on what matters)
* ❌ Set unrealistic targets (demotivating)
* ❌ Ignore metrics when inconvenient
* ❌ Game metrics (destroys their value)
* ❌ Blame individuals for metric failures
* ❌ Let metrics become the goal (they're tools)

**Do**:

* ✅ Focus on actionable metrics
* ✅ Set ambitious but achievable targets
* ✅ Respond to metric signals
* ✅ Continuously validate metrics still matter
* ✅ Use metrics for system improvement, not people evaluation
* ✅ Remember: metrics serve users, not the other way around

== 13. Related Pages ==

* [[Automation Philosophy>>FactHarbor.Organisation.Automation-Philosophy]] - Why we monitor systems, not outputs
* [[Continuous Improvement>>FactHarbor.Organisation.How-We-Work-Together.Continuous-Improvement]] - How we use metrics to improve
* [[Governance>>Archive.FactHarbor 2026\.02\.08.Organisation.Governance.WebHome]] - Quarterly performance reviews

----

**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.