= When to Add Complexity =

FactHarbor starts simple and adds complexity **only when metrics prove it's necessary**. This page defines clear triggers for adding deferred features.

Philosophy: Let data and user feedback drive complexity, not assumptions about future needs.

== 1. Add Elasticsearch ==

Current: PostgreSQL full-text search
Add Elasticsearch when:

* ✅ PostgreSQL search queries consistently take >500ms
* ✅ Search accounts for >20% of total database load
* ✅ Users complain about search speed
* ✅ Search index size >50GB

Metrics to monitor:

* Search query response time (P95, P99)
* Database CPU usage during search
* User search abandonment rate
* Search result relevance scores

Before adding:

* Try PostgreSQL search optimization (GIN indexes, pg_trgm)
* Profile slow queries
* Consider query result caching
* Estimate Elasticsearch costs

Implementation effort:
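
The latency trigger above can be checked mechanically. The following is a minimal sketch, assuming search latencies are already collected as a list of millisecond samples for one monitoring window. The function names and the choice of P95 as the deciding statistic are illustrative assumptions, not part of the FactHarbor codebase.

```python
import statistics

# ">500ms" comes from the trigger list; using P95 as the deciding
# statistic is an assumption made for this sketch.
SEARCH_LATENCY_THRESHOLD_MS = 500


def percentile(samples_ms, pct):
    """Return the pct-th percentile (1..99) of the samples."""
    cut_points = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cut_points[pct - 1]


def search_latency_trigger(samples_ms, threshold_ms=SEARCH_LATENCY_THRESHOLD_MS):
    """Summarize one monitoring window and report whether P95 exceeds the threshold."""
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    return {"p95_ms": p95, "p99_ms": p99, "triggered": p95 > threshold_ms}


if __name__ == "__main__":
    window = [120] * 90 + [900] * 10  # hypothetical samples
    print(search_latency_trigger(window))
```

A single window over the threshold is not yet "consistently" slow. One option is to require the trigger to fire in several consecutive windows before acting on it.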

== 2. Add TimescaleDB ==

Current: PostgreSQL with time-series data in regular tables
Add TimescaleDB when:

* ✅ Metrics queries consistently >1 second
* ✅ Metrics tables >100GB
* ✅ Need for time-series specific features (continuous aggregates, data retention policies)
* ✅ Dashboard loading noticeably slow

Metrics to monitor:

* Metrics query response time
* Metrics table size growth rate
* Dashboard load time
* Time-series query patterns

Before adding:

* Try PostgreSQL optimization (partitioning, materialized views)
* Implement query result caching
* Consider data aggregation strategies
* Profile slow metrics queries

Implementation effort:

== 3. Add Federation ==

Current: Single-node deployment with read replicas
Add Federation when:

* ✅ 10,000+ users on a single node
* ✅ Users explicitly request the ability to run their own instances
* ✅ Geographic latency becomes a significant problem (>200ms)
* ✅ Censorship/control concerns emerge
* ✅ Community demands decentralization

Metrics to monitor:

* Total active users
* Geographic distribution of users
* Single-node performance limits
* User feature requests
* Community sentiment

Before adding:

* Exhaust vertical scaling options
* Add read replicas in multiple regions
* Implement a CDN for static content
* Survey users about federation interest

Implementation effort: (major undertaking)

== 4. Add Complex Reputation System ==

Current: Simple manual roles (Reader, Contributor, Moderator, Admin)
Add Complex Reputation when:

* ✅ 100+ active contributors
* ✅ Manual role management becomes a bottleneck (>5 hours/week)
* ✅ Clear patterns of abuse require automated detection
* ✅ Community requests reputation visibility

Metrics to monitor:

* Number of active contributors
* Time spent on manual role management
* Abuse incident rate
* Contribution quality distribution
* Community feedback on roles

Before adding:

* Document the current manual process thoroughly
* Identify the most time-consuming tasks
* Prototype an automated reputation algorithm
* Get community feedback on the proposal

Implementation effort:

== 5. Add Many-to-Many Scenarios ==

Current: Each scenario belongs to a single claim (one-to-many)
Add Many-to-Many Scenarios when:

* ✅ Users request "apply this scenario to other claims"
* ✅ Clear use cases for scenario reuse emerge
* ✅ Scenario duplication becomes a significant storage issue
* ✅ Cross-claim scenario analysis requested

Metrics to monitor:

* Scenario duplication rate
* User feature requests
* Storage costs of scenarios
* Query patterns involving scenarios

Before adding:

* Analyze scenario duplication patterns
* Design the junction table schema
* Plan the data migration strategy
* Consider query performance impact

Implementation effort:
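
To make the junction table step concrete, here is a minimal sketch of what such a schema could look like. It uses an in-memory SQLite database so it runs without a server, and the SQL is kept generic enough to port to PostgreSQL. The table and column names (`claim`, `scenario`, `claim_scenario`) are assumptions for illustration, not the actual FactHarbor data model.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE claim (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE scenario (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Junction table: one row per (claim, scenario) pairing.
CREATE TABLE claim_scenario (
    claim_id INTEGER NOT NULL REFERENCES claim(id),
    scenario_id INTEGER NOT NULL REFERENCES scenario(id),
    PRIMARY KEY (claim_id, scenario_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one scenario shared by two claims."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO claim (id, text) VALUES (?, ?)",
        [(1, "Claim A"), (2, "Claim B")],
    )
    conn.execute("INSERT INTO scenario (id, description) VALUES (1, 'Shared scenario')")
    conn.executemany("INSERT INTO claim_scenario VALUES (?, ?)", [(1, 1), (2, 1)])
    return conn


def claims_for_scenario(conn, scenario_id):
    """Return the text of every claim linked to the given scenario."""
    rows = conn.execute(
        "SELECT c.text FROM claim c "
        "JOIN claim_scenario cs ON cs.claim_id = c.id "
        "WHERE cs.scenario_id = ? ORDER BY c.id",
        (scenario_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(claims_for_scenario(build_demo_db(), 1))
```

The composite primary key prevents the same scenario from being attached to a claim twice, which is the duplication this change is meant to remove. Every scenario lookup now needs a join, so the query performance item above still applies.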

== 6. Add Full Versioning System ==

Current: Simple audit trail (before/after values, who/when/why)
Add Full Versioning when:

* ✅ Users request "see complete version history"
* ✅ Users request "restore to specific previous version"
* ✅ Need for branching and merging emerges
* ✅ Collaborative editing requires conflict resolution

Metrics to monitor:

* User feature requests for versioning
* Manual rollback frequency
* Edit conflict rate
* Storage costs of full history

Before adding:

* Design branching/merging strategy
* Plan storage optimization (delta compression)
* Consider UI/UX for version history
* Estimate storage and performance impact

Implementation effort:

== 7. Add Graph Database ==

Current: Relational data model in PostgreSQL
Add Graph Database when:

* ✅ Complex relationship queries become common
* ✅ Need for multi-hop traversals (friend-of-friend, citation chains)
* ✅ PostgreSQL recursive queries are too slow
* ✅ Graph algorithms needed (PageRank, community detection)

Metrics to monitor:

* Relationship query patterns
* Recursive query performance
* Use cases requiring graph traversals
* Query complexity growth

Before adding:

* Try PostgreSQL recursive CTEs
* Consider graph extensions for PostgreSQL
* Profile slow relationship queries
* Evaluate Neo4j vs alternatives

Implementation effort:
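
As an illustration of the recursive CTE option, the sketch below walks a citation chain several hops deep. It runs against an in-memory SQLite database for convenience. PostgreSQL accepts the same `WITH RECURSIVE` form, although parameter placeholders differ by driver. The `citation` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical table: citation(source_id, cited_id) means "source cites cited".
CHAIN_QUERY = """
WITH RECURSIVE chain(node_id, depth) AS (
    SELECT cited_id, 1 FROM citation WHERE source_id = :start
    UNION
    SELECT c.cited_id, chain.depth + 1
    FROM citation c JOIN chain ON c.source_id = chain.node_id
    WHERE chain.depth < :max_depth
)
SELECT node_id, MIN(depth) FROM chain GROUP BY node_id ORDER BY MIN(depth), node_id
"""


def build_citation_db(edges):
    """Create an in-memory database holding the given (source, cited) edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citation (source_id INTEGER, cited_id INTEGER)")
    conn.executemany("INSERT INTO citation VALUES (?, ?)", edges)
    return conn


def citation_chain(conn, start, max_depth=5):
    """Return (node_id, shortest_depth) for every node reachable from start."""
    params = {"start": start, "max_depth": max_depth}
    return conn.execute(CHAIN_QUERY, params).fetchall()


if __name__ == "__main__":
    db = build_citation_db([(1, 2), (2, 3), (3, 4), (1, 3)])
    print(citation_chain(db, start=1))
```

The `max_depth` bound together with `UNION` (which discards duplicate rows) keeps the traversal finite even when citations form a cycle. If queries of this shape stay fast at production size, the "recursive queries are too slow" trigger has not fired.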

== 8. Add Real-Time Collaboration ==

Current: Asynchronous edits with eventual consistency
Add Real-Time Collaboration when:

* ✅ Users request simultaneous editing
* ✅ Conflict resolution becomes a frequent issue
* ✅ Need for live updates during editing sessions
* ✅ Collaborative workflows become common

Metrics to monitor:

* Edit conflict frequency
* User feature requests
* Collaborative editing patterns
* Average edit session duration

Before adding:

* Design conflict resolution strategy (Operational Transform or CRDT)
* Consider WebSocket infrastructure
* Plan UI/UX for real-time editing
* Estimate server resource requirements

Implementation effort:
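
To give a feel for the CRDT option, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins (LWW) register holding a single field value. It is not a text-editing CRDT and not a proposal for FactHarbor's design. The class and field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """A single value tagged with the logical time and author of its last write."""

    value: str
    timestamp: int  # logical clock value; assumed to be supplied by the caller
    writer: str     # replica or user id, used only to break timestamp ties


def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    """Return the winning state: highest timestamp, then highest writer id."""
    if (a.timestamp, a.writer) >= (b.timestamp, b.writer):
        return a
    return b


if __name__ == "__main__":
    alice = LWWRegister("Title edited by Alice", timestamp=10, writer="alice")
    bob = LWWRegister("Title edited by Bob", timestamp=12, writer="bob")
    # Replicas converge regardless of the order in which they exchange state.
    print(merge(alice, bob) == merge(bob, alice))
```

Because `merge` is commutative, associative, and idempotent, replicas can exchange state in any order and still converge. The cost is that the losing concurrent write is silently dropped, which is acceptable for a short field but not for long-form text, where sequence CRDTs or Operational Transform are the usual choices.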

== 9. Add Machine Learning Pipeline ==

Current: Rule-based quality scoring and LLM-based analysis
Add ML Pipeline when:

* ✅ Need for custom models beyond LLM APIs
* ✅ Opportunity for specialized fine-tuning
* ✅ Cost savings from specialized models
* ✅ Real-time learning from user feedback

Metrics to monitor:

* LLM API costs
* Need for domain-specific models
* Quality improvement opportunities
* User feedback patterns

Before adding:

* Collect training data (user feedback, corrections)
* Experiment with fine-tuning approaches
* Estimate cost savings vs infrastructure costs
* Consider model hosting options

Implementation effort:
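
The cost comparison above reduces to a break-even calculation. The sketch below assumes a simple linear cost model: the LLM API charges per claim, while a self-hosted model has a fixed monthly cost plus a smaller per-claim cost. The function name and every figure in the example are hypothetical.

```python
def break_even_claims_per_month(api_cost_per_claim, hosted_cost_per_claim, hosted_fixed_monthly):
    """Return the monthly claim volume above which self-hosting is cheaper.

    Returns None when self-hosting never pays off, that is, when its
    per-claim cost is not lower than the API's per-claim cost.
    """
    saving_per_claim = api_cost_per_claim - hosted_cost_per_claim
    if saving_per_claim <= 0:
        return None
    return hosted_fixed_monthly / saving_per_claim


if __name__ == "__main__":
    # Hypothetical figures: 0.50 per claim via API, 0.25 self-hosted, 1000 fixed per month.
    print(break_even_claims_per_month(0.50, 0.25, 1000))
```

Comparing the result with the "Claims processed per day" metric shows how far current volume is from the point where a custom model would start saving money. The model ignores engineering time, which the decision framework asks about separately.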

== 10. Add Blockchain/Web3 Integration ==

Current: Traditional database with audit logs
Add Blockchain when:

* ✅ Need for immutable public audit trail
* ✅ Decentralized verification demanded
* ✅ Token economics would add value
* ✅ Community governance requires voting
* ✅ Cross-organization trust is critical

Metrics to monitor:

* User requests for blockchain features
* Need for external verification
* Governance participation rate
* Trust/verification requirements

Before adding:

* Evaluate real vs perceived benefits
* Consider costs (gas fees, infrastructure)
* Design token economics carefully
* Study successful Web3 content platforms

Implementation effort:

== Decision Framework ==

For any complexity addition, ask:

==== Do we have data? ====

* Do metrics show the current system is inadequate?
* Do user requests document the need?
* Are the performance problems proven?

==== Have we exhausted simpler options? ====

* Optimization of the current system?
* Configuration tuning?
* Simple workarounds?

==== Do we understand the cost? ====

* Is the implementation time realistic?
* What is the ongoing maintenance burden?
* What are the infrastructure costs?
* What are the technical debt implications?

==== Is the timing right? ====

* Is the core product stable?
* Is team capacity available?
* Is user demand strong enough?

If all four answers are YES: Proceed with the complexity addition.
If any answer is NO: Defer and revisit later.
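
The four-question rule can be written down as a small check, which is handy for recording review outcomes consistently. This is a minimal sketch. The `ComplexityProposal` class and its field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComplexityProposal:
    """Answers to the four framework questions for one proposed addition."""

    name: str
    has_data: bool
    simpler_options_exhausted: bool
    cost_understood: bool
    timing_right: bool


def decide(proposal):
    """Return ("proceed", []) if all answers are yes, else ("defer", failed_questions)."""
    answers = {
        "Do we have data?": proposal.has_data,
        "Have we exhausted simpler options?": proposal.simpler_options_exhausted,
        "Do we understand the cost?": proposal.cost_understood,
        "Is the timing right?": proposal.timing_right,
    }
    failed = [question for question, ok in answers.items() if not ok]
    return ("proceed", []) if not failed else ("defer", failed)


if __name__ == "__main__":
    proposal = ComplexityProposal("Elasticsearch", True, False, True, True)
    print(decide(proposal))
```

Returning the failed questions alongside the verdict records why a proposal was deferred, which gives the next review a concrete starting point.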

== Monitoring Dashboard ==

Recommended metrics to track:

Performance:

* P95/P99 response times for all major operations
* Database query performance
* AKEL processing time
* Search performance

Usage:

* Active users (daily, weekly, monthly)
* Claims processed per day
* Search queries per day
* Contribution rate

Costs:

* Infrastructure costs per user
* LLM API costs per claim
* Storage costs per GB
* Total operational costs

Quality:

* Confidence score distribution
* Evidence completeness
* Source reliability trends
* User satisfaction (surveys)

Community:

* Active contributors
* Moderation workload
* Feature requests by category
* Abuse incident rate

== Quarterly Review Process ==

Every quarter, review:

1. Metrics dashboard: Are any triggers close to thresholds?
1. User feedback: What features are most requested?
1. Performance: What's slowing down?
1. Costs: What's most expensive?
1. Team capacity: Can we handle new complexity?

Decision: Prioritize complexity additions based on:

* Urgency (current pain vs future optimization)
* Impact (user benefit vs internal efficiency)
* Effort (quick wins vs major projects)
* Dependencies (prerequisites needed)
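
The first review question, whether any trigger is close to its threshold, can be answered with a short script over the dashboard numbers. In the sketch below the threshold values come from the trigger lists on this page. The metric key names and the 80% warning ratio are assumptions for illustration.

```python
# Threshold values are taken from the trigger lists on this page;
# the dictionary keys are hypothetical metric names.
THRESHOLDS = {
    "search_p95_ms": 500,
    "search_index_gb": 50,
    "metrics_table_gb": 100,
    "users_on_node": 10_000,
    "active_contributors": 100,
}


def triggers_near_threshold(current, thresholds=THRESHOLDS, warn_ratio=0.8):
    """Return (metric, ratio) pairs at or above warn_ratio of their threshold.

    Results are sorted with the metric closest to (or furthest past) its
    threshold first. Metrics missing from `current` are skipped.
    """
    report = []
    for name, limit in thresholds.items():
        if name not in current:
            continue
        ratio = current[name] / limit
        if ratio >= warn_ratio:
            report.append((name, round(ratio, 2)))
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    snapshot = {"search_p95_ms": 450, "metrics_table_gb": 40, "active_contributors": 120}
    print(triggers_near_threshold(snapshot))
```

A ratio of 1.0 or more means the trigger has already fired. Anything between the warning ratio and 1.0 is a candidate to watch before the next review.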

== Related Pages ==

* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]
* [[Data Model>>Archive.FactHarbor 2026\.02\.08.Specification.Data Model.WebHome]]

## Remember
Build what you need now. Measure everything. Add complexity only when data proves it's necessary.
The best architecture is the simplest one that works for current needs. 🎯##
