Changes for page System Performance Metrics
Last modified by Robert Schaub on 2026/02/08 08:32
To version 1.3, edited by Robert Schaub on 2026/02/08 08:32
Change comment: Update document after refactoring.
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

**Parent**: changed from FactHarbor.Specification.WebHome to WebHome

**Content** (updated version; passages unchanged by this edit are elided as "..."):
= System Performance Metrics =

**What we monitor to ensure AKEL performs well.**

== 1. Purpose ==

These metrics tell us:

* ✅ Is AKEL performing within acceptable ranges?
* ✅ Where should we focus improvement efforts?
* ✅ When do humans need to intervene?
* ✅ Are our changes improving things?

**Principle**: Measure to improve, not to judge.

== 2. Metric Categories ==

=== 2.1 AKEL Performance ===

**Processing speed and reliability**

=== 2.2 Content Quality ===

**Output quality and user satisfaction**

=== 2.3 System Health ===

**Infrastructure and operational metrics**

=== 2.4 User Experience ===

**How users interact with the system**

== 3. AKEL Performance Metrics ==

=== 3.1 Processing Time ===

**Metric**: Time from claim submission to verdict publication
**Measurements**:

* P50 (median): 50% of claims processed within X seconds
* P95: 95% of claims processed within Y seconds
* P99: 99% of claims processed within Z seconds

...

* Better caching
* Parallel processing
* Database query optimization

=== 3.2 Success Rate ===

**Metric**: % of claims successfully processed without errors
**Target**: ≥ 99%
**Alert thresholds**:

* 98-99%: Monitor
* 95-98%: Investigate
* <95%: Emergency

...
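As a minimal sketch of how the P50/P95/P99 measurements above could be computed from raw per-claim timings (standard library only; the function name is illustrative and this is not AKEL's actual monitoring code):

```python
from statistics import quantiles

def processing_time_percentiles(durations_s):
    """Return (p50, p95, p99) for a list of processing times in seconds."""
    if len(durations_s) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields the 1st..99th percentile cut points; pick the three we report.
    pct = quantiles(durations_s, n=100)
    return pct[49], pct[94], pct[98]

# Example: 90 fast claims and 10 slow ones.
p50, p95, p99 = processing_time_percentiles([1.0] * 90 + [10.0] * 10)
```

In practice these would be fed from per-request timing logs and recomputed over a sliding window for the dashboard.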
* External API failure (source unavailable)
* Resource exhaustion (memory/CPU)

**Why it matters**: Errors frustrate users and reduce trust

=== 3.3 Evidence Completeness ===

**Metric**: % of claims where AKEL found sufficient evidence
**Measurement**: Claims with ≥3 pieces of evidence from ≥2 distinct sources
**Target**: ≥ 80%
**Alert thresholds**:

* 75-80%: Monitor
* 70-75%: Investigate
* <70%: Intervention needed

...

* Better search algorithms
* More source integrations
* Improved relevance scoring

=== 3.4 Source Diversity ===

**Metric**: Average number of distinct sources per claim
**Target**: ≥ 3.0 sources per claim
**Alert thresholds**:

* 2.5-3.0: Monitor
* 2.0-2.5: Investigate
* <2.0: Intervention needed

**Why it matters**: Multiple sources increase confidence and reduce bias

=== 3.5 Scenario Coverage ===

**Metric**: % of claims with at least one scenario extracted
**Target**: ≥ 75%
**Why it matters**: Scenarios provide context for verdicts
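The §3.3 sufficiency rule (≥3 pieces of evidence from ≥2 distinct sources) could be checked roughly as below. The `(source_id, text)` evidence shape is an assumption for illustration, not AKEL's actual data model:

```python
def has_sufficient_evidence(evidence, min_items=3, min_sources=2):
    """evidence: iterable of (source_id, text) pairs for one claim."""
    items = list(evidence)
    distinct_sources = {src for src, _ in items}
    return len(items) >= min_items and len(distinct_sources) >= min_sources

def evidence_completeness(claims):
    """The §3.3 metric: % of claims meeting the sufficiency rule."""
    if not claims:
        return 0.0
    sufficient = sum(1 for ev in claims if has_sufficient_evidence(ev))
    return 100.0 * sufficient / len(claims)
```

Note that three quotes from a single source fail the rule: item count alone is not enough, which is what ties this metric to §3.4 source diversity.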
== 4. Content Quality Metrics ==

=== 4.1 Confidence Distribution ===

**Metric**: Distribution of confidence scores across claims
**Target**: Roughly normal distribution

* 10% very low confidence (0.0-0.3)
* 20% low confidence (0.3-0.5)
* 40% medium confidence (0.5-0.7)
* 20% high confidence (0.7-0.9)
* 10% very high confidence (0.9-1.0)

**Alert thresholds**:

* >30% very low confidence: Evidence extraction issues
* >30% very high confidence: Too aggressive/overconfident
* Heavily skewed distribution: Systematic bias

**Why it matters**: Confidence should reflect actual uncertainty

=== 4.2 Contradiction Rate ===

**Metric**: % of claims with internal contradictions detected
**Target**: ≤ 5%
**Alert thresholds**:

* 5-10%: Monitor
* 10-15%: Investigate
* >15%: Intervention needed

**Why it matters**: High contradiction rate suggests poor evidence quality or logic errors

=== 4.3 User Feedback Ratio ===

**Metric**: Helpful vs unhelpful user ratings
**Target**: ≥ 70% helpful
**Alert thresholds**:

* 60-70%: Monitor
* 50-60%: Investigate
* <50%: Emergency

**Why it matters**: Direct measure of user satisfaction

=== 4.4 False Positive/Negative Rate ===

**Metric**: When humans review flagged items, how often was AKEL right?
**Measurement**:

* False positive: AKEL flagged for review, but actually fine
* False negative: Missed something that should've been flagged

**Target**:

...
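The §4.1 confidence bands and their two >30% alert conditions can be sketched as follows; band edges and alert cutoffs are taken from this page, while the function names are illustrative:

```python
# Band edges from §4.1; the upper bound 1.01 keeps a score of exactly 1.0
# inside the "very high" band.
BANDS = [(0.0, 0.3, "very low"), (0.3, 0.5, "low"), (0.5, 0.7, "medium"),
         (0.7, 0.9, "high"), (0.9, 1.01, "very high")]

def confidence_distribution(scores):
    """Fraction of claims falling in each §4.1 confidence band."""
    counts = {name: 0 for _, _, name in BANDS}
    for s in scores:
        for lo, hi, name in BANDS:
            if lo <= s < hi:
                counts[name] += 1
                break
    total = max(len(scores), 1)
    return {name: counts[name] / total for name in counts}

def distribution_alerts(dist):
    """The two threshold alerts listed under §4.1."""
    alerts = []
    if dist["very low"] > 0.30:
        alerts.append("evidence extraction issues")
    if dist["very high"] > 0.30:
        alerts.append("overconfident")
    return alerts
```

Detecting the third condition (a heavily skewed distribution) would need a comparison against the target shape, e.g. a skewness statistic, which is left out here.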
* False positive rate: ≤ 20%
* False negative rate: ≤ 5%

**Why it matters**: Balance between catching problems and not crying wolf

== 5. System Health Metrics ==

=== 5.1 Uptime ===

**Metric**: % of time system is available and functional
**Target**: ≥ 99.9% (less than 45 minutes downtime per month)
**Alert**: Immediate notification on any downtime
**Why it matters**: Users expect 24/7 availability

=== 5.2 Error Rate ===

**Metric**: Errors per 1000 requests
**Target**: ≤ 1 error per 1000 requests (0.1%)
**Alert thresholds**:

* 1-5 per 1000: Monitor
* 5-10 per 1000: Investigate
* >10 per 1000: Emergency

**Why it matters**: Errors disrupt user experience

=== 5.3 Database Performance ===

**Metrics**:

* Query response time (P95)
* Connection pool utilization
* Slow query frequency

...

* Connection pool: ≤ 80% utilized
* Slow queries (>1s): ≤ 10 per hour

**Why it matters**: Database bottlenecks slow entire system

=== 5.4 Cache Hit Rate ===

**Metric**: % of requests served from cache vs. database
**Target**: ≥ 80%
**Why it matters**: Higher cache hit rate = faster responses, less DB load

=== 5.5 Resource Utilization ===

**Metrics**:

* CPU utilization
* Memory utilization
* Disk I/O

...

* Disk I/O: ≤ 70%

**Alert**: Any metric consistently >85%
**Why it matters**: Headroom for traffic spikes, prevents resource exhaustion
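The §5.2 thresholds map directly onto escalation levels; a minimal sketch (level names mirror this page, the function itself is illustrative):

```python
def error_rate_level(errors, requests):
    """Classify errors-per-1000-requests against the §5.2 thresholds."""
    per_1000 = 1000.0 * errors / max(requests, 1)
    if per_1000 <= 1:          # within the 0.1% target
        return "ok"
    if per_1000 <= 5:          # 1-5 per 1000
        return "monitor"
    if per_1000 <= 10:         # 5-10 per 1000
        return "investigate"
    return "emergency"         # >10 per 1000
```

Evaluating this over a short sliding window (e.g. the last hour, as the real-time dashboard in §7.1 does) reacts faster to incidents than a daily aggregate.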
== 6. User Experience Metrics ==

=== 6.1 Time to First Verdict ===

**Metric**: Time from user submitting claim to seeing initial verdict
**Target**: ≤ 15 seconds
**Why it matters**: User perception of speed

=== 6.2 Claim Submission Rate ===

**Metric**: Claims submitted per day/hour
**Monitoring**: Track trends, detect anomalies
**Why it matters**: Understand usage patterns, capacity planning

=== 6.3 User Retention ===

**Metric**: % of users who return after first visit
**Target**: ≥ 30% (1-week retention)
**Why it matters**: Indicates system usefulness

=== 6.4 Feature Usage ===

**Metrics**:

* % of users who explore evidence
* % who check scenarios
* % who view source track records

**Why it matters**: Understand how users interact with system

== 7. Metric Dashboard ==

=== 7.1 Real-Time Dashboard ===

**Always visible**:

* Current processing time (P95)
* Success rate (last hour)
* Error rate (last hour)
* System health status

**Update frequency**: Every 30 seconds

=== 7.2 Daily Dashboard ===

**Reviewed daily**:

* All AKEL performance metrics
* Content quality metrics
* System health trends
* User feedback summary

=== 7.3 Weekly Reports ===

**Reviewed weekly**:

* Trends over time
* Week-over-week comparisons
* Improvement priorities
* Outstanding issues

=== 7.4 Monthly/Quarterly Reports ===

**Comprehensive analysis**:

* Long-term trends
* Seasonal patterns
* Strategic metrics
* Goal progress
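Section 6.2 asks for anomaly detection on the claim submission rate without prescribing a method. One simple option is a 3-sigma check against recent history; the 3-sigma rule and the function shape are assumptions for illustration, not what this page mandates:

```python
from statistics import mean, stdev

def is_anomalous(history, today, n_sigma=3.0):
    """Flag today's submission count if it is more than n_sigma
    standard deviations from the mean of recent daily counts.
    history: daily counts, at least two days."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) > n_sigma * sigma
```

A seasonal baseline (weekday vs. weekend) would reduce false alarms once enough history exists.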
== 8. Alert System ==

=== 8.1 Alert Levels ===

**Info**: Metric outside target, but within acceptable range

* Action: Note in daily review
* Example: P95 processing time 19s (target 18s, acceptable <20s)

**Warning**: Metric outside acceptable range

...

**Emergency**: System failure or severe degradation

* Action: Page on-call, all hands
* Example: Uptime <95%, P95 >30s

=== 8.2 Alert Channels ===

**Slack/Discord**: All alerts
**Email**: Warning and above
**SMS**: Critical and emergency only
**PagerDuty**: Emergency only

=== 8.3 On-Call Rotation ===

**Technical Coordinator**: Primary on-call
**Backup**: Designated team member
**Responsibilities**:

* Respond to alerts within SLA
* Investigate and diagnose issues
* Implement fixes or escalate
* Document incidents

== 9. Metric-Driven Improvement ==

=== 9.1 Prioritization ===

**Focus improvements on**:

* Metrics furthest from target
* Metrics with biggest user impact
* Metrics easiest to improve
* Strategic priorities

=== 9.2 Success Criteria ===

**Every improvement project should**:

* Target specific metrics
* Set concrete improvement goals
* Measure before and after
* Document learnings

**Example**: "Reduce P95 processing time from 20s to 16s by optimizing evidence extraction"

=== 9.3 A/B Testing ===

**When feasible**:

* Run two versions
* Measure metric differences
* Choose based on data
* Roll out winner
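The §8.2 channel rules form a small routing table keyed on the §8.1 level ordering. A sketch, with the mapping taken from this page and the string representation of channels an assumption:

```python
# Severity order from §8.1/§8.2 (lowest to highest).
LEVELS = ["info", "warning", "critical", "emergency"]

def channels_for(level):
    """Channels that receive an alert of the given level (§8.2)."""
    i = LEVELS.index(level)
    chans = ["slack"]                      # Slack/Discord: all alerts
    if i >= LEVELS.index("warning"):
        chans.append("email")              # Email: warning and above
    if i >= LEVELS.index("critical"):
        chans.append("sms")                # SMS: critical and emergency only
    if i >= LEVELS.index("emergency"):
        chans.append("pagerduty")          # PagerDuty: emergency only
    return chans
```

Encoding the rules as "level X and above" rather than per-level lists means adding a new level only requires placing it correctly in `LEVELS`.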
== 10. Bias and Fairness Metrics ==

=== 10.1 Domain Balance ===

**Metric**: Confidence distribution by domain
**Target**: Similar distributions across domains
**Alert**: One domain consistently much lower/higher confidence
**Why it matters**: Ensure no systematic domain bias

=== 10.2 Source Type Balance ===

**Metric**: Evidence distribution by source type
**Target**: Diverse source types represented
**Alert**: Over-reliance on one source type
**Why it matters**: Prevent source type bias

=== 10.3 Geographic Balance ===

**Metric**: Source geographic distribution
**Target**: Multiple regions represented
**Alert**: Over-concentration in one region
**Why it matters**: Reduce geographic/cultural bias

== 11. Experimental Metrics ==

**New metrics to test**:

* User engagement time
* Evidence exploration depth
* Cross-reference usage
* Mobile vs desktop usage

**Process**:

1. Define metric hypothesis
2. Implement tracking
3. Collect data for 1 month
4. Evaluate usefulness
5. Add to standard set or discard

== 12. Anti-Patterns ==

**Don't**:

* ❌ Measure too many things (focus on what matters)
* ❌ Set unrealistic targets (demotivating)
* ❌ Ignore metrics when inconvenient

...

* ✅ Continuously validate metrics still matter
* ✅ Use metrics for system improvement, not people evaluation
* ✅ Remember: metrics serve users, not the other way around
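A rough check for the §10.1 domain-balance alert: compare each domain's mean confidence with the overall mean and flag outliers. The 0.15 gap threshold is an assumption for illustration; the page only asks for "similar distributions", and a real check would compare full distributions, not just means:

```python
from statistics import mean

def domain_balance_alerts(scores_by_domain, max_gap=0.15):
    """Return domains whose mean confidence deviates from the overall
    mean by more than max_gap (sketch of the §10.1 alert)."""
    overall = mean(s for scores in scores_by_domain.values() for s in scores)
    return [domain for domain, scores in scores_by_domain.items()
            if abs(mean(scores) - overall) > max_gap]
```

The same skeleton applies to §10.2 and §10.3 with source type or region as the grouping key.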
== 13. Related Pages ==

* [[Automation Philosophy>>FactHarbor.Organisation.Automation-Philosophy]] - Why we monitor systems, not outputs
* [[Continuous Improvement>>FactHarbor.Organisation.How-We-Work-Together.Continuous-Improvement]] - How we use metrics to improve
* [[Governance>>Archive.FactHarbor 2026\.02\.08.Organisation.Governance.WebHome]] - Quarterly performance reviews

---

**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.