Last modified by Robert Schaub on 2026/02/08 08:32

From version 1.1
edited by Robert Schaub
on 2026/01/20 21:40
Change comment: Imported from XAR
To version 1.3
edited by Robert Schaub
on 2026/02/08 08:32
Change comment: Update document after refactoring.

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -WebHome
1 +FactHarbor.Specification.WebHome
Content
... ... @@ -1,42 +1,25 @@
1 1  = System Performance Metrics =
2 -
3 3  **What we monitor to ensure AKEL performs well.**
4 -
5 5  == 1. Purpose ==
6 -
7 7  These metrics tell us:
8 -
9 9  * ✅ Is AKEL performing within acceptable ranges?
10 10  * ✅ Where should we focus improvement efforts?
11 11  * ✅ When do humans need to intervene?
12 12  * ✅ Are our changes improving things?
13 13  **Principle**: Measure to improve, not to judge.
14 -
15 15  == 2. Metric Categories ==
16 -
17 17  === 2.1 AKEL Performance ===
18 -
19 19  **Processing speed and reliability**
20 -
21 21  === 2.2 Content Quality ===
22 -
23 23  **Output quality and user satisfaction**
24 -
25 25  === 2.3 System Health ===
26 -
27 27  **Infrastructure and operational metrics**
28 -
29 29  === 2.4 User Experience ===
30 -
31 31  **How users interact with the system**
32 -
33 33  == 3. AKEL Performance Metrics ==
34 -
35 35  === 3.1 Processing Time ===
36 -
37 37  **Metric**: Time from claim submission to verdict publication
38 38  **Measurements**:
39 -
40 40  * P50 (median): 50% of claims processed within X seconds
41 41  * P95: 95% of claims processed within Y seconds
42 42  * P99: 99% of claims processed within Z seconds
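The P50/P95/P99 targets above can be computed directly from raw per-claim processing durations. A minimal illustrative sketch (function and variable names are not from the AKEL codebase; the placeholder X/Y/Z targets stay unspecified, as in the document):

```python
import statistics

def latency_percentiles(durations_s):
    """Return P50/P95/P99 of claim processing times in seconds.

    durations_s: list of per-claim durations; returns None if empty.
    """
    if not durations_s:
        return None
    # quantiles(n=100) returns 99 cut points; index 49 is the 50th
    # percentile, 94 the 95th, 98 the 99th.
    qs = statistics.quantiles(durations_s, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

The "inclusive" method treats the sample as the whole population, which is the usual choice when every processed claim is recorded rather than sampled.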
... ... @@ -54,13 +54,10 @@
54 54  * Better caching
55 55  * Parallel processing
56 56  * Database query optimization
57 -
58 58  === 3.2 Success Rate ===
59 -
60 60  **Metric**: % of claims successfully processed without errors
61 61  **Target**: ≥ 99%
62 62  **Alert thresholds**:
63 -
64 64  * 98-99%: Monitor
65 65  * 95-98%: Investigate
66 66  * <95%: Emergency
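The alert thresholds above amount to a simple banded mapping from success rate to response level; a hypothetical sketch (names illustrative, not AKEL code):

```python
def success_rate_alert(rate_pct):
    """Map success rate (%) to the alert bands listed above."""
    if rate_pct >= 99.0:
        return "ok"          # at or above target
    if rate_pct >= 98.0:
        return "monitor"     # 98-99%
    if rate_pct >= 95.0:
        return "investigate" # 95-98%
    return "emergency"       # below 95%
```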
... ... @@ -70,14 +70,11 @@
70 70  * External API failure (source unavailable)
71 71  * Resource exhaustion (memory/CPU)
72 72  **Why it matters**: Errors frustrate users and reduce trust
73 -
74 74  === 3.3 Evidence Completeness ===
75 -
76 76  **Metric**: % of claims where AKEL found sufficient evidence
77 77  **Measurement**: Claims with ≥3 pieces of evidence from ≥2 distinct sources
78 78  **Target**: ≥ 80%
79 79  **Alert thresholds**:
80 -
81 81  * 75-80%: Monitor
82 82  * 70-75%: Investigate
83 83  * <70%: Intervention needed
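The completeness rule (≥3 pieces of evidence from ≥2 distinct sources) is mechanical enough to sketch; this is an illustrative Python version with assumed data shapes, not the actual AKEL implementation:

```python
def is_complete(evidence):
    """evidence: list of (source_id, item) pairs for one claim.

    A claim counts as complete with >=3 pieces of evidence
    drawn from >=2 distinct sources.
    """
    distinct_sources = {src for src, _ in evidence}
    return len(evidence) >= 3 and len(distinct_sources) >= 2

def completeness_rate(claims):
    """% of claims meeting the completeness rule."""
    if not claims:
        return 0.0
    return 100.0 * sum(is_complete(ev) for ev in claims) / len(claims)
```

Note that three pieces of evidence from a single source fail the rule: diversity is required, not just volume.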
... ... @@ -86,69 +86,51 @@
86 86  * Better search algorithms
87 87  * More source integrations
88 88  * Improved relevance scoring
89 -
90 90  === 3.4 Source Diversity ===
91 -
92 92  **Metric**: Average number of distinct sources per claim
93 93  **Target**: ≥ 3.0 sources per claim
94 94  **Alert thresholds**:
95 -
96 96  * 2.5-3.0: Monitor
97 97  * 2.0-2.5: Investigate
98 98  * <2.0: Intervention needed
99 99  **Why it matters**: Multiple sources increase confidence and reduce bias
100 -
101 101  === 3.5 Scenario Coverage ===
102 -
103 103  **Metric**: % of claims with at least one scenario extracted
104 104  **Target**: ≥ 75%
105 105  **Why it matters**: Scenarios provide context for verdicts
106 -
107 107  == 4. Content Quality Metrics ==
108 -
109 109  === 4.1 Confidence Distribution ===
110 -
111 111  **Metric**: Distribution of confidence scores across claims
112 112  **Target**: Roughly normal distribution
113 -
114 -* 10% very low confidence (0.0-0.3)
115 -* 20% low confidence (0.3-0.5)
116 -* 40% medium confidence (0.5-0.7)
117 -* 20% high confidence (0.7-0.9)
118 -* 10% very high confidence (0.9-1.0)
82 +* ~10% very low confidence (0.0-0.3)
83 +* ~20% low confidence (0.3-0.5)
84 +* ~40% medium confidence (0.5-0.7)
85 +* ~20% high confidence (0.7-0.9)
86 +* ~10% very high confidence (0.9-1.0)
119 119  **Alert thresholds**:
120 120  * >30% very low confidence: Evidence extraction issues
121 121  * >30% very high confidence: Too aggressive/overconfident
122 122  * Heavily skewed distribution: Systematic bias
123 123  **Why it matters**: Confidence should reflect actual uncertainty
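Bucketing scores into the bands above and checking the two >30% alert conditions could look roughly like this (an illustrative sketch; names and thresholds mirror the text, not AKEL internals):

```python
def confidence_buckets(scores):
    """Count confidence scores per the target distribution bands."""
    bands = {"very_low": 0, "low": 0, "medium": 0, "high": 0, "very_high": 0}
    for s in scores:
        if s < 0.3:
            bands["very_low"] += 1
        elif s < 0.5:
            bands["low"] += 1
        elif s < 0.7:
            bands["medium"] += 1
        elif s < 0.9:
            bands["high"] += 1
        else:
            bands["very_high"] += 1
    return bands

def distribution_alerts(scores):
    """Flag the alert conditions listed above."""
    n = len(scores) or 1
    b = confidence_buckets(scores)
    alerts = []
    if b["very_low"] / n > 0.30:
        alerts.append("evidence extraction issues")
    if b["very_high"] / n > 0.30:
        alerts.append("overconfident")
    return alerts
```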
124 -
125 125  === 4.2 Contradiction Rate ===
126 -
127 127  **Metric**: % of claims with internal contradictions detected
128 128  **Target**: ≤ 5%
129 129  **Alert thresholds**:
130 -
131 131  * 5-10%: Monitor
132 132  * 10-15%: Investigate
133 133  * >15%: Intervention needed
134 134  **Why it matters**: High contradiction rate suggests poor evidence quality or logic errors
135 -
136 136  === 4.3 User Feedback Ratio ===
137 -
138 138  **Metric**: Helpful vs unhelpful user ratings
139 139  **Target**: ≥ 70% helpful
140 140  **Alert thresholds**:
141 -
142 142  * 60-70%: Monitor
143 143  * 50-60%: Investigate
144 144  * <50%: Emergency
145 145  **Why it matters**: Direct measure of user satisfaction
146 -
147 147  === 4.4 False Positive/Negative Rate ===
148 -
149 149  **Metric**: When humans review flagged items, how often was AKEL right?
150 150  **Measurement**:
151 -
152 152  * False positive: AKEL flagged for review, but actually fine
153 153  * False negative: AKEL missed something that should have been flagged
154 154  **Target**:
... ... @@ -155,31 +155,22 @@
155 155  * False positive rate: ≤ 20%
156 156  * False negative rate: ≤ 5%
157 157  **Why it matters**: Balance between catching problems and not crying wolf
158 -
159 159  == 5. System Health Metrics ==
160 -
161 161  === 5.1 Uptime ===
162 -
163 163  **Metric**: % of time system is available and functional
164 164  **Target**: ≥ 99.9% (less than 45 minutes downtime per month)
165 165  **Alert**: Immediate notification on any downtime
166 166  **Why it matters**: Users expect 24/7 availability
167 -
168 168  === 5.2 Error Rate ===
169 -
170 170  **Metric**: Errors per 1000 requests
171 171  **Target**: ≤ 1 error per 1000 requests (0.1%)
172 172  **Alert thresholds**:
173 -
174 174  * 1-5 per 1000: Monitor
175 175  * 5-10 per 1000: Investigate
176 176  * >10 per 1000: Emergency
177 177  **Why it matters**: Errors disrupt user experience
178 -
179 179  === 5.3 Database Performance ===
180 -
181 181  **Metrics**:
182 -
183 183  * Query response time (P95)
184 184  * Connection pool utilization
185 185  * Slow query frequency
... ... @@ -188,17 +188,12 @@
188 188  * Connection pool: ≤ 80% utilized
189 189  * Slow queries (>1s): ≤ 10 per hour
190 190  **Why it matters**: Database bottlenecks slow entire system
191 -
192 192  === 5.4 Cache Hit Rate ===
193 -
194 194  **Metric**: % of requests served from cache vs. database
195 195  **Target**: ≥ 80%
196 196  **Why it matters**: Higher cache hit rate = faster responses, less DB load
197 -
198 198  === 5.5 Resource Utilization ===
199 -
200 200  **Metrics**:
201 -
202 202  * CPU utilization
203 203  * Memory utilization
204 204  * Disk I/O
... ... @@ -210,81 +210,54 @@
210 210  * Disk I/O: ≤ 70%
211 211  **Alert**: Any metric consistently >85%
212 212  **Why it matters**: Headroom for traffic spikes, prevents resource exhaustion
213 -
214 214  == 6. User Experience Metrics ==
215 -
216 216  === 6.1 Time to First Verdict ===
217 -
218 218  **Metric**: Time from user submitting claim to seeing initial verdict
219 219  **Target**: ≤ 15 seconds
220 220  **Why it matters**: User perception of speed
221 -
222 222  === 6.2 Claim Submission Rate ===
223 -
224 224  **Metric**: Claims submitted per day/hour
225 225  **Monitoring**: Track trends, detect anomalies
226 226  **Why it matters**: Understand usage patterns, capacity planning
227 -
228 228  === 6.3 User Retention ===
229 -
230 230  **Metric**: % of users who return after first visit
231 231  **Target**: ≥ 30% (1-week retention)
232 232  **Why it matters**: Indicates system usefulness
233 -
234 234  === 6.4 Feature Usage ===
235 -
236 236  **Metrics**:
237 -
238 238  * % of users who explore evidence
239 239  * % who check scenarios
240 240  * % who view source track records
241 241  **Why it matters**: Understand how users interact with system
242 -
243 243  == 7. Metric Dashboard ==
244 -
245 245  === 7.1 Real-Time Dashboard ===
246 -
247 247  **Always visible**:
248 -
249 249  * Current processing time (P95)
250 250  * Success rate (last hour)
251 251  * Error rate (last hour)
252 252  * System health status
253 253  **Update frequency**: Every 30 seconds
254 -
255 255  === 7.2 Daily Dashboard ===
256 -
257 257  **Reviewed daily**:
258 -
259 259  * All AKEL performance metrics
260 260  * Content quality metrics
261 261  * System health trends
262 262  * User feedback summary
263 -
264 264  === 7.3 Weekly Reports ===
265 -
266 266  **Reviewed weekly**:
267 -
268 268  * Trends over time
269 269  * Week-over-week comparisons
270 270  * Improvement priorities
271 271  * Outstanding issues
272 -
273 273  === 7.4 Monthly/Quarterly Reports ===
274 -
275 275  **Comprehensive analysis**:
276 -
277 277  * Long-term trends
278 278  * Seasonal patterns
279 279  * Strategic metrics
280 280  * Goal progress
281 -
282 282  == 8. Alert System ==
283 -
284 284  === 8.1 Alert Levels ===
285 -
286 286  **Info**: Metric outside target, but within acceptable range
287 -
288 288  * Action: Note in daily review
289 289  * Example: P95 processing time 19s (target 18s, acceptable <20s)
290 290  **Warning**: Metric outside acceptable range
... ... @@ -296,98 +296,69 @@
296 296  **Emergency**: System failure or severe degradation
297 297  * Action: Page on-call, all hands
298 298  * Example: Uptime <95%, P95 >30s
299 -
300 300  === 8.2 Alert Channels ===
301 -
302 302  **Slack/Discord**: All alerts
303 303  **Email**: Warning and above
304 304  **SMS**: Critical and emergency only
305 305  **PagerDuty**: Emergency only
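The channel rules above form a strict cumulative hierarchy, which makes the routing a lookup table; a hypothetical sketch (channel keys are illustrative):

```python
# Each level is routed to the channels listed in 8.2:
# Slack/Discord gets everything, email from warning up,
# SMS from critical up, PagerDuty only for emergencies.
CHANNELS = {
    "info":      ["slack"],
    "warning":   ["slack", "email"],
    "critical":  ["slack", "email", "sms"],
    "emergency": ["slack", "email", "sms", "pagerduty"],
}

def route(level):
    """Return the notification channels for an alert level."""
    return CHANNELS[level]
```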
306 -
307 307  === 8.3 On-Call Rotation ===
308 -
309 309  **Technical Coordinator**: Primary on-call
310 310  **Backup**: Designated team member
311 311  **Responsibilities**:
312 -
313 313  * Respond to alerts within SLA
314 314  * Investigate and diagnose issues
315 315  * Implement fixes or escalate
316 316  * Document incidents
317 -
318 318  == 9. Metric-Driven Improvement ==
319 -
320 320  === 9.1 Prioritization ===
321 -
322 322  **Focus improvements on**:
323 -
324 324  * Metrics furthest from target
325 325  * Metrics with biggest user impact
326 326  * Metrics easiest to improve
327 327  * Strategic priorities
328 -
329 329  === 9.2 Success Criteria ===
330 -
331 331  **Every improvement project should**:
332 -
333 333  * Target specific metrics
334 334  * Set concrete improvement goals
335 335  * Measure before and after
336 336  * Document learnings
337 337  **Example**: "Reduce P95 processing time from 20s to 16s by optimizing evidence extraction"
338 -
339 339  === 9.3 A/B Testing ===
340 -
341 341  **When feasible**:
342 -
343 343  * Run two versions
344 344  * Measure metric differences
345 345  * Choose based on data
346 346  * Roll out winner
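The "measure metric differences, choose based on data" step can be sketched as a naive comparison of the two variants (illustrative only; a real rollout decision should also apply a significance test before declaring a winner):

```python
import statistics

def ab_compare(metric_a, metric_b):
    """Compare variants A and B on a metric where lower is better
    (e.g., P95 processing time)."""
    mean_a = statistics.fmean(metric_a)
    mean_b = statistics.fmean(metric_b)
    winner = "A" if mean_a < mean_b else "B"
    return {"mean_a": mean_a, "mean_b": mean_b, "winner": winner}
```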
347 -
348 348  == 10. Bias and Fairness Metrics ==
349 -
350 350  === 10.1 Domain Balance ===
351 -
352 352  **Metric**: Confidence distribution by domain
353 353  **Target**: Similar distributions across domains
354 354  **Alert**: One domain consistently much lower/higher confidence
355 355  **Why it matters**: Ensure no systematic domain bias
356 -
357 357  === 10.2 Source Type Balance ===
358 -
359 359  **Metric**: Evidence distribution by source type
360 360  **Target**: Diverse source types represented
361 361  **Alert**: Over-reliance on one source type
362 362  **Why it matters**: Prevent source type bias
363 -
364 364  === 10.3 Geographic Balance ===
365 -
366 366  **Metric**: Source geographic distribution
367 367  **Target**: Multiple regions represented
368 368  **Alert**: Over-concentration in one region
369 369  **Why it matters**: Reduce geographic/cultural bias
370 -
371 371  == 11. Experimental Metrics ==
372 -
373 373  **New metrics to test**:
374 -
375 375  * User engagement time
376 376  * Evidence exploration depth
377 377  * Cross-reference usage
378 378  * Mobile vs desktop usage
379 379  **Process**:
380 -
381 381  1. Define metric hypothesis
382 382  2. Implement tracking
383 383  3. Collect data for 1 month
384 384  4. Evaluate usefulness
385 385  5. Add to standard set or discard
386 -
387 387  == 12. Anti-Patterns ==
388 -
389 389  **Don't**:
390 -
391 391  * ❌ Measure too many things (focus on what matters)
392 392  * ❌ Set unrealistic targets (demotivating)
393 393  * ❌ Ignore metrics when inconvenient
... ... @@ -401,11 +401,9 @@
401 401  * ✅ Continuously validate metrics still matter
402 402  * ✅ Use metrics for system improvement, not people evaluation
403 403  * ✅ Remember: metrics serve users, not the other way around
404 -
405 405  == 13. Related Pages ==
406 -
407 407  * [[Automation Philosophy>>FactHarbor.Organisation.Automation-Philosophy]] - Why we monitor systems, not outputs
408 408  * [[Continuous Improvement>>FactHarbor.Organisation.How-We-Work-Together.Continuous-Improvement]] - How we use metrics to improve
409 -* [[Governance>>Archive.FactHarbor 2026\.02\.08.Organisation.Governance.WebHome]] - Quarterly performance reviews
296 +* [[Governance>>FactHarbor.Organisation.Governance.WebHome]] - Quarterly performance reviews
410 410  ---
411 -**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.--
298 +**Remember**: We measure the SYSTEM, not individual outputs. Metrics drive IMPROVEMENT, not judgment.