Wiki source code of When to Add Complexity
Last modified by Robert Schaub on 2026/02/08 08:32
= When to Add Complexity =

FactHarbor starts simple and adds complexity **only when metrics prove it's necessary**. This page defines clear triggers for adding deferred features.
**Philosophy**: Let data and user feedback drive complexity, not assumptions about future needs.

== 1. Add Elasticsearch ==

**Current**: PostgreSQL full-text search
**Add Elasticsearch when**:

* ✅ PostgreSQL search queries consistently >500ms
* ✅ Search accounts for >20% of total database load
* ✅ Users complain about search speed
* ✅ Search index size >50GB

**Metrics to monitor**:

* Search query response time (P95, P99)
* Database CPU usage during search
* User search abandonment rate
* Search result relevance scores

**Before adding**:

* Try PostgreSQL search optimization (indexes, pg_trgm, GIN indexes)
* Profile slow queries
* Consider query result caching
* Estimate Elasticsearch costs

**Implementation effort**:

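The measurable triggers above can be checked mechanically. The following is a minimal sketch, assuming "consistently >500ms" is read as the P95 of recent search latencies (an interpretation, not something this page specifies). The function names and dictionary keys are hypothetical. User complaints are left out because they are not a numeric metric.

```python
def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it (pct is an integer 1..100)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def elasticsearch_triggers(search_latencies_ms, search_load_share, index_size_gb):
    """Report which numeric triggers from this section currently fire.

    search_load_share is the fraction (0..1) of database load caused by search.
    """
    return {
        "p95_over_500ms": percentile(search_latencies_ms, 95) > 500,
        "search_over_20pct_of_db_load": search_load_share > 0.20,
        "index_over_50gb": index_size_gb > 50,
    }


# Example: 10% of queries are slow, search causes 25% of load, index is 12 GB.
status = elasticsearch_triggers([100] * 90 + [600] * 10, 0.25, 12)
```

The result is a per-trigger report rather than a single yes/no. Whether one trigger or all of them must fire is a judgment call for the team.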
== 2. Add TimescaleDB ==

**Current**: PostgreSQL with time-series data in regular tables
**Add TimescaleDB when**:

* ✅ Metrics queries consistently >1 second
* ✅ Metrics tables >100GB
* ✅ Need for time-series specific features (continuous aggregates, data retention policies)
* ✅ Dashboard loading noticeably slow

**Metrics to monitor**:

* Metrics query response time
* Metrics table size growth rate
* Dashboard load time
* Time-series query patterns

**Before adding**:

* Try PostgreSQL optimization (partitioning, materialized views)
* Implement query result caching
* Consider data aggregation strategies
* Profile slow metrics queries

**Implementation effort**:

== 3. Add Federation ==

**Current**: Single-node deployment with read replicas
**Add Federation when**:

* ✅ 10,000+ users on single node
* ✅ Users explicitly request the ability to run their own instances
* ✅ Geographic latency becomes a significant problem (>200ms)
* ✅ Censorship/control concerns emerge
* ✅ Community demands decentralization

**Metrics to monitor**:

* Total active users
* Geographic distribution of users
* Single-node performance limits
* User feature requests
* Community sentiment

**Before adding**:

* Exhaust vertical scaling options
* Add read replicas in multiple regions
* Implement CDN for static content
* Survey users about federation interest

**Implementation effort**: (major undertaking)

== 4. Add Complex Reputation System ==

**Current**: Simple manual roles (Reader, Contributor, Moderator, Admin)
**Add Complex Reputation when**:

* ✅ 100+ active contributors
* ✅ Manual role management becomes a bottleneck (>5 hours/week)
* ✅ Clear patterns of abuse require automated detection
* ✅ Community requests reputation visibility

**Metrics to monitor**:

* Number of active contributors
* Time spent on manual role management
* Abuse incident rate
* Contribution quality distribution
* Community feedback on roles

**Before adding**:

* Document current manual process thoroughly
* Identify most time-consuming tasks
* Prototype automated reputation algorithm
* Get community feedback on proposal

**Implementation effort**:

== 5. Add Many-to-Many Scenarios ==

**Current**: Scenarios belong to single claims (one-to-many)
**Add Many-to-Many Scenarios when**:

* ✅ Users request "apply this scenario to other claims"
* ✅ Clear use cases for scenario reuse emerge
* ✅ Scenario duplication becomes a significant storage issue
* ✅ Cross-claim scenario analysis requested

**Metrics to monitor**:

* Scenario duplication rate
* User feature requests
* Storage costs of scenarios
* Query patterns involving scenarios

**Before adding**:

* Analyze scenario duplication patterns
* Design junction table schema
* Plan data migration strategy
* Consider query performance impact

**Implementation effort**:

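To make the "junction table schema" step concrete, here is a minimal sketch. It uses an in-memory SQLite database only so that it runs without a server; production would be PostgreSQL. All table and column names (##claim##, ##scenario##, ##claim_scenario##) are hypothetical and are not taken from the FactHarbor data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE claim    (id INTEGER PRIMARY KEY, text  TEXT NOT NULL);
CREATE TABLE scenario (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Junction table: one row per (claim, scenario) pairing.
-- The composite primary key prevents duplicate pairings.
CREATE TABLE claim_scenario (
    claim_id    INTEGER NOT NULL REFERENCES claim(id)    ON DELETE CASCADE,
    scenario_id INTEGER NOT NULL REFERENCES scenario(id) ON DELETE CASCADE,
    PRIMARY KEY (claim_id, scenario_id)
);

-- The primary key serves lookups by claim; this index serves lookups by scenario.
CREATE INDEX idx_claim_scenario_scenario ON claim_scenario (scenario_id);
""")

conn.executemany("INSERT INTO claim (id, text) VALUES (?, ?)",
                 [(1, "Claim A"), (2, "Claim B")])
conn.execute("INSERT INTO scenario (id, title) VALUES (1, 'Shared scenario')")
# The same scenario is attached to both claims instead of being duplicated.
conn.executemany("INSERT INTO claim_scenario (claim_id, scenario_id) VALUES (?, ?)",
                 [(1, 1), (2, 1)])


def claims_for_scenario(conn, scenario_id):
    """Return the texts of all claims that reuse the given scenario."""
    rows = conn.execute(
        "SELECT c.text FROM claim c "
        "JOIN claim_scenario cs ON cs.claim_id = c.id "
        "WHERE cs.scenario_id = ? ORDER BY c.id",
        (scenario_id,),
    )
    return [row[0] for row in rows]
```

Every scenario read now goes through a join, which is the query performance impact the last bullet asks to consider.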
== 6. Add Full Versioning System ==

**Current**: Simple audit trail (before/after values, who/when/why)
**Add Full Versioning when**:

* ✅ Users request "see complete version history"
* ✅ Users request "restore to specific previous version"
* ✅ Need for branching and merging emerges
* ✅ Collaborative editing requires conflict resolution

**Metrics to monitor**:

* User feature requests for versioning
* Manual rollback frequency
* Edit conflict rate
* Storage costs of full history

**Before adding**:

* Design branching/merging strategy
* Plan storage optimization (delta compression)
* Consider UI/UX for version history
* Estimate storage and performance impact

**Implementation effort**:

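The storage optimization mentioned above can be illustrated with line-level delta encoding: each new version stores only what changed relative to the previous one. This is a minimal sketch using Python's standard ##difflib##. The delta format and the function names are hypothetical, and a real system would also need to decide how often to store full snapshots.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Encode new_lines as copy/add operations relative to old_lines."""
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only the index range into the old version.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # Replaced or inserted lines: store the new text itself.
            delta.append(("add", new_lines[j1:j2]))
        # Pure deletions need no entry: the deleted range is simply not copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new version from the old version plus a delta."""
    rebuilt = []
    for op in delta:
        if op[0] == "copy":
            rebuilt.extend(old_lines[op[1]:op[2]])
        else:
            rebuilt.extend(op[1])
    return rebuilt


v1 = ["Claim text", "Evidence A", "Evidence B"]
v2 = ["Claim text", "Evidence A (updated)", "Evidence B", "Evidence C"]
delta = make_delta(v1, v2)
restored = apply_delta(v1, delta)
```

Restoring an old version then means replaying deltas from the nearest full snapshot, which trades storage for read time.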
== 7. Add Graph Database ==

**Current**: Relational data model in PostgreSQL
**Add Graph Database when**:

* ✅ Complex relationship queries become common
* ✅ Need for multi-hop traversals (friend-of-friend, citation chains)
* ✅ PostgreSQL recursive queries too slow
* ✅ Graph algorithms needed (PageRank, community detection)

**Metrics to monitor**:

* Relationship query patterns
* Recursive query performance
* Use cases requiring graph traversals
* Query complexity growth

**Before adding**:

* Try PostgreSQL recursive CTEs
* Consider graph extensions for PostgreSQL
* Profile slow relationship queries
* Evaluate Neo4j vs alternatives

**Implementation effort**:

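Before evaluating a graph database, it helps to see what a multi-hop traversal looks like as a recursive CTE. This is a minimal sketch that follows a citation chain. It runs on in-memory SQLite so it needs no server; PostgreSQL also supports ##WITH RECURSIVE##. The ##citation## table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citation (source_id INTEGER NOT NULL, cited_id INTEGER NOT NULL);
-- 1 cites 2 and 5; 2 cites 3; 3 cites 4.
INSERT INTO citation VALUES (1, 2), (2, 3), (3, 4), (1, 5);
""")


def citation_chain(conn, start_id, max_depth=5):
    """Return (node_id, hops) for everything reachable from start_id.

    max_depth bounds the recursion, so cycles in the data cannot loop forever.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE chain(node_id, depth) AS (
            SELECT cited_id, 1 FROM citation WHERE source_id = ?
            UNION
            SELECT c.cited_id, chain.depth + 1
            FROM citation c
            JOIN chain ON c.source_id = chain.node_id
            WHERE chain.depth < ?
        )
        SELECT node_id, MIN(depth) AS hops
        FROM chain
        GROUP BY node_id
        ORDER BY hops, node_id
        """,
        (start_id, max_depth),
    )
    return rows.fetchall()
```

Profiling a query of this shape against real data is one way to test the "PostgreSQL recursive queries too slow" trigger before committing to a new datastore.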
== 8. Add Real-Time Collaboration ==

**Current**: Asynchronous edits with eventual consistency
**Add Real-Time Collaboration when**:

* ✅ Users request simultaneous editing
* ✅ Conflict resolution becomes a frequent issue
* ✅ Need for live updates during editing sessions
* ✅ Collaborative workflows common

**Metrics to monitor**:

* Edit conflict frequency
* User feature requests
* Collaborative editing patterns
* Average edit session duration

**Before adding**:

* Design conflict resolution strategy (Operational Transform or CRDT)
* Consider WebSocket infrastructure
* Plan UI/UX for real-time editing
* Estimate server resource requirements

**Implementation effort**:

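To give a feel for the CRDT option, the sketch below shows one of the simplest CRDTs, a last-writer-wins register per field. It is illustrative only: the field names and the tuple layout are hypothetical, and it does not merge concurrent edits inside a single text field, which would need a sequence CRDT or Operational Transform.

```python
def lww_merge(replica_a, replica_b):
    """Merge two replicas of a field map.

    Each value is (timestamp, replica_id, text). The entry with the larger
    (timestamp, replica_id) pair wins; replica_id breaks timestamp ties so
    that every replica picks the same winner.
    """
    merged = dict(replica_a)
    for field, entry in replica_b.items():
        if field not in merged or entry[:2] > merged[field][:2]:
            merged[field] = entry
    return merged


# Two editors changed the same record while disconnected from each other.
editor_1 = {"title": (1, "r1", "Old title"), "body": (5, "r1", "Body from editor 1")}
editor_2 = {"title": (2, "r2", "New title"), "body": (3, "r2", "Body from editor 2")}

merged_12 = lww_merge(editor_1, editor_2)
merged_21 = lww_merge(editor_2, editor_1)
```

Both merge orders give the same result, so replicas converge without a central coordinator. The cost is that the losing edit is silently discarded.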
== 9. Add Machine Learning Pipeline ==

**Current**: Rule-based quality scoring and LLM-based analysis
**Add ML Pipeline when**:

* ✅ Need for custom models beyond LLM APIs
* ✅ Opportunity for specialized fine-tuning
* ✅ Cost savings from specialized models
* ✅ Real-time learning from user feedback

**Metrics to monitor**:

* LLM API costs
* Need for domain-specific models
* Quality improvement opportunities
* User feedback patterns

**Before adding**:

* Collect training data (user feedback, corrections)
* Experiment with fine-tuning approaches
* Estimate cost savings vs infrastructure costs
* Consider model hosting options

**Implementation effort**:

== 10. Add Blockchain/Web3 Integration ==

**Current**: Traditional database with audit logs
**Add Blockchain when**:

* ✅ Need for immutable public audit trail
* ✅ Decentralized verification demanded
* ✅ Token economics would add value
* ✅ Community governance requires voting
* ✅ Cross-organization trust is critical

**Metrics to monitor**:

* User requests for blockchain features
* Need for external verification
* Governance participation rate
* Trust/verification requirements

**Before adding**:

* Evaluate real vs perceived benefits
* Consider costs (gas fees, infrastructure)
* Design token economics carefully
* Study successful Web3 content platforms

**Implementation effort**:

== Decision Framework ==

**For any complexity addition, ask**:

==== Do we have data? ====

* Metrics showing current system inadequate?
* User requests documenting need?
* Performance problems proven?

==== Have we exhausted simpler options? ====

* Optimization of current system?
* Configuration tuning?
* Simple workarounds?

==== Do we understand the cost? ====

* Implementation time realistic?
* Ongoing maintenance burden?
* Infrastructure costs?
* Technical debt implications?

==== Is the timing right? ====

* Core product stable?
* Team capacity available?
* User demand strong enough?

**If all four answers are YES**: Proceed with complexity addition
**If any answer is NO**: Defer and revisit later
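The four-question gate above is an all-or-nothing check, which the following minimal sketch makes explicit. The class and field names are hypothetical; the answers themselves still come from people reviewing the evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityReview:
    """One yes/no answer per question in the decision framework."""
    has_data: bool
    simpler_options_exhausted: bool
    cost_understood: bool
    timing_right: bool


def decide(review):
    """Return ("proceed", []) if every answer is yes,
    otherwise ("defer", [questions answered no])."""
    answers = {
        "Do we have data?": review.has_data,
        "Have we exhausted simpler options?": review.simpler_options_exhausted,
        "Do we understand the cost?": review.cost_understood,
        "Is the timing right?": review.timing_right,
    }
    unmet = [question for question, ok in answers.items() if not ok]
    return ("proceed", []) if not unmet else ("defer", unmet)


# Example: the data is there, but simpler options have not been tried yet.
outcome = decide(ComplexityReview(True, False, True, True))
```

Returning the unmet questions gives a deferred proposal a concrete list of what to revisit.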

== Monitoring Dashboard ==

**Recommended metrics to track**:
**Performance**:

* P95/P99 response times for all major operations
* Database query performance
* AKEL processing time
* Search performance

**Usage**:

* Active users (daily, weekly, monthly)
* Claims processed per day
* Search queries per day
* Contribution rate

**Costs**:

* Infrastructure costs per user
* LLM API costs per claim
* Storage costs per GB
* Total operational costs

**Quality**:

* Confidence score distribution
* Evidence completeness
* Source reliability trends
* User satisfaction (surveys)

**Community**:

* Active contributors
* Moderation workload
* Feature requests by category
* Abuse incident rate

== Quarterly Review Process ==

**Every quarter, review**:

1. **Metrics dashboard**: Are any triggers close to thresholds?
2. **User feedback**: What features are most requested?
3. **Performance**: What's slowing down?
4. **Costs**: What's most expensive?
5. **Team capacity**: Can we handle new complexity?

**Decision**: Prioritize complexity additions based on:

* Urgency (current pain vs future optimization)
* Impact (user benefit vs internal efficiency)
* Effort (quick wins vs major projects)
* Dependencies (prerequisites needed)
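The first review question, "Are any triggers close to thresholds?", can be answered with a small report over the numeric triggers defined earlier on this page. This is a minimal sketch. The metric names are hypothetical, the 80% warning ratio is an assumption rather than a FactHarbor rule, and a value at or above its threshold is treated as reached even where the page says "greater than".

```python
# Numeric thresholds taken from the trigger lists on this page.
THRESHOLDS = {
    "search_p95_ms": 500,
    "search_index_gb": 50,
    "metrics_tables_gb": 100,
    "users_on_single_node": 10_000,
    "active_contributors": 100,
    "role_management_hours_per_week": 5,
}


def triggers_to_watch(current, warn_ratio=0.8):
    """Classify each reported metric as "reached" or "approaching".

    Metrics below warn_ratio of their threshold are omitted from the report.
    """
    report = {}
    for name, limit in THRESHOLDS.items():
        if name not in current:
            continue  # Metric not collected this quarter.
        ratio = current[name] / limit
        if ratio >= 1:
            report[name] = "reached"
        elif ratio >= warn_ratio:
            report[name] = "approaching"
    return report


# Example quarter: search is getting slow and the contributor base has grown.
quarter = triggers_to_watch(
    {"search_p95_ms": 420, "search_index_gb": 12, "active_contributors": 130}
)
```

A reached trigger is a prompt to run the decision framework, not an automatic go-ahead.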

== Related Pages ==

* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]
* [[Data Model>>Archive.FactHarbor 2026\.02\.08.Specification.Data Model.WebHome]]

== Remember ==

**Build what you need now. Measure everything. Add complexity only when data proves it's necessary.**
The best architecture is the simplest one that works for current needs. 🎯