= When to Add Complexity =

FactHarbor starts simple and adds complexity **only when metrics prove it's necessary**. This page defines clear triggers for adding deferred features.

**Philosophy**: Let data and user feedback drive complexity, not assumptions about future needs.

== 1. Add Elasticsearch ==

**Current**: PostgreSQL full-text search

**Add Elasticsearch when**:

* ✅ PostgreSQL search queries consistently take >500 ms
* ✅ Search accounts for >20% of total database load
* ✅ Users complain about search speed
* ✅ Search index size exceeds 50 GB

**Metrics to monitor**:

* Search query response time (P95, P99)
* Database CPU usage during search
* User search abandonment rate
* Search result relevance scores

**Before adding**:

* Try PostgreSQL search optimization (GIN indexes, the pg_trgm extension)
* Profile slow queries
* Consider query result caching
* Estimate Elasticsearch costs

**Implementation effort**: ~
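The triggers above lend themselves to an automated check. The following Python sketch assumes the metrics are already collected somewhere; the SearchMetrics class and its field names are hypothetical, and "consistently" is interpreted (as an assumption) as every recent daily P95 exceeding 500 ms. It only reports which triggers fire and leaves the decision to people.

```python
from dataclasses import dataclass


@dataclass
class SearchMetrics:
    """Hypothetical snapshot of search metrics; field names are illustrative."""
    daily_p95_ms: list[float]   # P95 search latency for each recent day
    search_load_share: float    # fraction of total DB load spent on search (0..1)
    users_complain: bool        # qualitative signal from support/feedback
    index_size_gb: float


def elasticsearch_triggers(m: SearchMetrics) -> list[str]:
    """Return the Elasticsearch triggers that currently fire."""
    fired = []
    # Assumption: "consistently" means every recent daily P95 exceeds 500 ms.
    if m.daily_p95_ms and all(p95 > 500 for p95 in m.daily_p95_ms):
        fired.append("search P95 > 500 ms")
    if m.search_load_share > 0.20:
        fired.append("search > 20% of DB load")
    if m.users_complain:
        fired.append("user complaints")
    if m.index_size_gb > 50:
        fired.append("index > 50 GB")
    return fired


metrics = SearchMetrics(daily_p95_ms=[620, 580, 710], search_load_share=0.12,
                        users_complain=False, index_size_gb=8)
print(elasticsearch_triggers(metrics))  # ['search P95 > 500 ms']
```

Whether one fired trigger is enough to act, or several are needed, is not specified on this page; the Decision Framework still applies either way.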

== 2. Add TimescaleDB ==

**Current**: PostgreSQL with time-series data in regular tables

**Add TimescaleDB when**:

* ✅ Metrics queries consistently take >1 second
* ✅ Metrics tables exceed 100 GB
* ✅ Time-series-specific features are needed (continuous aggregates, data retention policies)
* ✅ Dashboards load noticeably slowly

**Metrics to monitor**:

* Metrics query response time
* Metrics table size growth rate
* Dashboard load time
* Time-series query patterns

**Before adding**:

* Try PostgreSQL optimization (partitioning, materialized views)
* Implement query result caching
* Consider data aggregation strategies
* Profile slow metrics queries

**Implementation effort**: ~
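Tracking the growth rate of metrics tables makes the 100 GB trigger predictable rather than a surprise. A minimal sketch, assuming roughly linear growth (real growth may accelerate with usage); the function name and parameters are illustrative:

```python
import math


def months_until_threshold(current_gb: float, growth_gb_per_month: float,
                           threshold_gb: float = 100.0) -> float:
    """Project when metrics tables cross the 100 GB trigger, assuming linear growth.

    Returns 0.0 if the threshold is already reached and math.inf if the
    tables are not growing.
    """
    if current_gb >= threshold_gb:
        return 0.0
    if growth_gb_per_month <= 0:
        return math.inf
    return (threshold_gb - current_gb) / growth_gb_per_month


print(months_until_threshold(40, 5))  # 12.0
```

A projection that lands within the next quarterly review or two is a cue to start the **Before adding** work (partitioning, materialized views) early.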

== 3. Add Federation ==

**Current**: Single-node deployment with read replicas

**Add Federation when**:

* ✅ 10,000+ users on a single node
* ✅ Users explicitly request the ability to run their own instances
* ✅ Geographic latency becomes a significant problem (>200 ms)
* ✅ Censorship/control concerns emerge
* ✅ Community demands decentralization

**Metrics to monitor**:

* Total active users
* Geographic distribution of users
* Single-node performance limits
* User feature requests
* Community sentiment

**Before adding**:

* Exhaust vertical scaling options
* Add read replicas in multiple regions
* Implement a CDN for static content
* Survey users about federation interest

**Implementation effort**: ~ (major undertaking)
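Only two of the federation triggers are directly measurable: user count and geographic latency. A minimal sketch, assuming per-region latency figures (for example, median round-trip times) are already collected; the function and region names are hypothetical:

```python
def federation_signals(active_users: int, latency_ms_by_region: dict[str, float]) -> dict:
    """Summarize the two quantitative federation triggers."""
    # Regions whose users see more than the 200 ms latency trigger.
    slow_regions = sorted(r for r, ms in latency_ms_by_region.items() if ms > 200)
    return {
        "user_threshold_reached": active_users >= 10_000,  # "10,000+ users"
        "slow_regions": slow_regions,
    }


print(federation_signals(8_500, {"eu-west": 45, "ap-south": 310, "sa-east": 240}))
# {'user_threshold_reached': False, 'slow_regions': ['ap-south', 'sa-east']}
```

Regions flagged as slow are natural candidates for the regional read replicas and CDN listed under **Before adding**; the qualitative triggers (self-hosting requests, censorship concerns, community demand) still need human judgment.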

== 4. Add Complex Reputation System ==

**Current**: Simple manual roles (Reader, Contributor, Moderator, Admin)

**Add Complex Reputation when**:

* ✅ 100+ active contributors
* ✅ Manual role management becomes a bottleneck (>5 hours/week)
* ✅ Clear patterns of abuse require automated detection
* ✅ Community requests reputation visibility

**Metrics to monitor**:

* Number of active contributors
* Time spent on manual role management
* Abuse incident rate
* Contribution quality distribution
* Community feedback on roles

**Before adding**:

* Document the current manual process thoroughly
* Identify the most time-consuming tasks
* Prototype an automated reputation algorithm
* Get community feedback on the proposal

**Implementation effort**: ~
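The >5 hours/week trigger only works if moderators log the time they spend on role management. A minimal sketch, assuming a hypothetical log of (date, minutes) entries, that totals the time per ISO week and flags the weeks over the limit:

```python
from collections import defaultdict
from datetime import date


def weekly_role_management_hours(entries: list[tuple[date, int]]) -> dict[tuple[int, int], float]:
    """Sum logged minutes of manual role management per ISO week, in hours."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for day, minutes in entries:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += minutes / 60
    return dict(totals)


def bottleneck_weeks(entries: list[tuple[date, int]], limit_hours: float = 5.0) -> list[tuple[int, int]]:
    """ISO (year, week) pairs in which role management exceeded the weekly limit."""
    hours = weekly_role_management_hours(entries)
    return sorted(week for week, h in hours.items() if h > limit_hours)


log = [(date(2024, 3, 4), 120), (date(2024, 3, 6), 200), (date(2024, 3, 12), 90)]
print(bottleneck_weeks(log))  # [(2024, 10)]
```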

== 5. Add Many-to-Many Scenarios ==

**Current**: Each scenario belongs to exactly one claim (one-to-many)

**Add Many-to-Many Scenarios when**:

* ✅ Users request "apply this scenario to other claims"
* ✅ Clear use cases for scenario reuse emerge
* ✅ Scenario duplication becomes a significant storage issue
* ✅ Cross-claim scenario analysis is requested

**Metrics to monitor**:

* Scenario duplication rate
* User feature requests
* Storage costs of scenarios
* Query patterns involving scenarios

**Before adding**:

* Analyze scenario duplication patterns
* Design the junction table schema
* Plan a data migration strategy
* Consider the query performance impact

**Implementation effort**: ~
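For the junction table schema, a minimal sketch of the usual many-to-many design follows. It uses Python's built-in sqlite3 module so it runs without a database server; all table and column names are hypothetical, and the PostgreSQL DDL would be very similar. The composite primary key prevents attaching the same scenario to a claim twice, and the extra index supports the reverse lookup (all claims that use a scenario).

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not FactHarbor's.
SCHEMA = """
CREATE TABLE claims    (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE scenarios (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Junction table: one row per (claim, scenario) pairing.
CREATE TABLE claim_scenarios (
    claim_id    INTEGER NOT NULL REFERENCES claims(id) ON DELETE CASCADE,
    scenario_id INTEGER NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
    PRIMARY KEY (claim_id, scenario_id)
);
CREATE INDEX idx_claim_scenarios_scenario ON claim_scenarios(scenario_id);
"""


def claims_using_scenario(conn: sqlite3.Connection, scenario_id: int) -> list[str]:
    """Return the text of every claim linked to a scenario."""
    rows = conn.execute(
        """SELECT c.text FROM claims c
           JOIN claim_scenarios cs ON cs.claim_id = c.id
           WHERE cs.scenario_id = ? ORDER BY c.id""",
        (scenario_id,),
    )
    return [text for (text,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO claims VALUES (?, ?)", [(1, "Claim A"), (2, "Claim B")])
conn.execute("INSERT INTO scenarios VALUES (10, 'Assumes current policy continues')")
# The same scenario is attached to two claims instead of being duplicated.
conn.executemany("INSERT INTO claim_scenarios VALUES (?, ?)", [(1, 10), (2, 10)])
print(claims_using_scenario(conn, 10))  # ['Claim A', 'Claim B']
```

Under the current one-to-many model, one plausible migration (an assumption, not a documented procedure) is to copy each scenario's existing claim reference into the junction table first, then merge duplicate scenarios.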

== 6. Add Full Versioning System ==

**Current**: Simple audit trail (before/after values, who/when/why)

**Add Full Versioning when**:

* ✅ Users request "see complete version history"
* ✅ Users request "restore to specific previous version"
* ✅ A need for branching and merging emerges
* ✅ Collaborative editing requires conflict resolution

**Metrics to monitor**:

* User feature requests for versioning
* Manual rollback frequency
* Edit conflict rate
* Storage costs of full history

**Before adding**:

* Design a branching/merging strategy
* Plan storage optimization (delta compression)
* Consider UI/UX for version history
* Estimate storage and performance impact

**Implementation effort**: ~
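Delta compression stores one full version and then only the differences between consecutive versions. A minimal line-based sketch using Python's standard difflib.SequenceMatcher; the delta format (ranges copied from the previous version, plus newly added lines) is an illustrative choice, not a FactHarbor format:

```python
from difflib import SequenceMatcher


def make_delta(old: list[str], new: list[str]) -> list[tuple]:
    """Encode `new` relative to `old`: reference unchanged runs, store only new lines."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))   # store the new lines verbatim
        # "delete": nothing to store; those old lines are simply not copied
    return delta


def apply_delta(old: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the newer version from the older one plus its delta."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


v1 = ["Title", "Claim: X is true", "Source: A"]
v2 = ["Title", "Claim: X is mostly true", "Source: A", "Source: B"]
delta = make_delta(v1, v2)
print(apply_delta(v1, delta) == v2)  # True
```

Restoring an old version means replaying every delta since the last full copy, so a common design choice is to store a full snapshot every N versions to bound restore time.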

== 7. Add Graph Database ==

**Current**: Relational data model in PostgreSQL

**Add Graph Database when**:

* ✅ Complex relationship queries become common
* ✅ Multi-hop traversals are needed (friend-of-friend, citation chains)
* ✅ PostgreSQL recursive queries are too slow
* ✅ Graph algorithms are needed (PageRank, community detection)

**Metrics to monitor**:

* Relationship query patterns
* Recursive query performance
* Use cases requiring graph traversals
* Query complexity growth

**Before adding**:

* Try PostgreSQL recursive CTEs
* Consider graph extensions for PostgreSQL
* Profile slow relationship queries
* Evaluate Neo4j vs alternatives

**Implementation effort**: ~
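Before reaching for a graph database, a recursive CTE can follow citation chains. The sketch below uses SQLite through Python's sqlite3 module so it runs standalone; PostgreSQL accepts the same WITH RECURSIVE syntax. The citations table is hypothetical. Because the sample data contains a cycle (2 → 3 → 4 → 2), a depth limit keeps the recursion finite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: source_id cites cited_id.
conn.execute("CREATE TABLE citations (source_id INTEGER, cited_id INTEGER)")
conn.executemany("INSERT INTO citations VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (2, 5), (4, 2)])

# Follow citation chains outward from one source, up to a depth limit.
# The depth limit guarantees termination despite the 2 -> 3 -> 4 -> 2 cycle;
# GROUP BY keeps the shortest hop count for each reachable item.
CHAIN_QUERY = """
WITH RECURSIVE chain(id, depth) AS (
    SELECT cited_id, 1 FROM citations WHERE source_id = :start
    UNION
    SELECT c.cited_id, chain.depth + 1
    FROM citations c JOIN chain ON c.source_id = chain.id
    WHERE chain.depth < :max_depth
)
SELECT id, MIN(depth) AS hops FROM chain GROUP BY id ORDER BY hops, id
"""
rows = conn.execute(CHAIN_QUERY, {"start": 1, "max_depth": 5}).fetchall()
print(rows)  # [(2, 1), (3, 2), (5, 2), (4, 3)]
```

Running the PostgreSQL equivalent under EXPLAIN ANALYZE on real data is one way to profile these queries; PostgreSQL 14 and later also offer a CYCLE clause for explicit cycle detection.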

== 8. Add Real-Time Collaboration ==

**Current**: Asynchronous edits with eventual consistency

**Add Real-Time Collaboration when**:

* ✅ Users request simultaneous editing
* ✅ Conflict resolution becomes a frequent issue
* ✅ Live updates during editing sessions are needed
* ✅ Collaborative workflows become common

**Metrics to monitor**:

* Edit conflict frequency
* User feature requests
* Collaborative editing patterns
* Average edit session duration

**Before adding**:

* Design a conflict resolution strategy (Operational Transformation or CRDTs)
* Consider WebSocket infrastructure
* Plan UI/UX for real-time editing
* Estimate server resource requirements

**Implementation effort**: ~
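To make the CRDT option concrete, below is a last-writer-wins (LWW) register, one of the simplest state-based CRDTs. It is a sketch only: collaborative text editing needs sequence CRDTs or OT, and an LWW register silently discards one of two concurrent edits, which is exactly the trade-off a conflict resolution strategy has to decide on. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: a minimal state-based CRDT."""
    value: str
    timestamp: int      # e.g. a Lamport or hybrid logical clock value
    replica_id: str     # breaks ties when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger (timestamp, replica_id) pair.
        # This is commutative, associative and idempotent, so replicas
        # converge regardless of the order in which they exchange state.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self


a = LWWRegister("Draft", 1, "alice")
b = LWWRegister("Draft", 1, "alice")
a = a.merge(LWWRegister("Verdict: true", 2, "alice"))   # Alice edits offline
b = b.merge(LWWRegister("Verdict: false", 2, "bob"))    # Bob edits concurrently
# Exchanging state in either order gives the same result on both replicas.
print(a.merge(b) == b.merge(a), a.merge(b).value)  # True Verdict: false
```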

== 9. Add Machine Learning Pipeline ==

**Current**: Rule-based quality scoring and LLM-based analysis

**Add ML Pipeline when**:

* ✅ Custom models are needed beyond what LLM APIs offer
* ✅ Specialized fine-tuning offers a clear opportunity
* ✅ Specialized models would deliver cost savings
* ✅ Real-time learning from user feedback is required

**Metrics to monitor**:

* LLM API costs
* Need for domain-specific models
* Quality improvement opportunities
* User feedback patterns

**Before adding**:

* Collect training data (user feedback, corrections)
* Experiment with fine-tuning approaches
* Estimate cost savings vs infrastructure costs
* Consider model hosting options

**Implementation effort**: ~
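Estimating cost savings against infrastructure costs reduces, in its simplest form, to a break-even volume. A minimal sketch assuming a linear cost model: API cost per claim versus a fixed monthly hosting cost plus a smaller per-claim cost. All figures in the example are hypothetical placeholders, and the model ignores training, engineering time, and quality differences.

```python
from typing import Optional


def monthly_costs(claims_per_month: int, api_cost_per_claim: float,
                  hosting_fixed_per_month: float,
                  self_hosted_cost_per_claim: float) -> tuple[float, float]:
    """Return (LLM API cost, self-hosted model cost) for one month."""
    api = claims_per_month * api_cost_per_claim
    hosted = hosting_fixed_per_month + claims_per_month * self_hosted_cost_per_claim
    return api, hosted


def break_even_claims(api_cost_per_claim: float, hosting_fixed_per_month: float,
                      self_hosted_cost_per_claim: float) -> Optional[float]:
    """Monthly claim volume above which self-hosting is cheaper (None if never)."""
    saving_per_claim = api_cost_per_claim - self_hosted_cost_per_claim
    if saving_per_claim <= 0:
        return None
    return hosting_fixed_per_month / saving_per_claim


# Hypothetical: $0.05/claim via API vs $3,000/month hosting + $0.01/claim.
print(f"{break_even_claims(0.05, 3000, 0.01):,.0f}")  # 75,000
```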

== 10. Add Blockchain/Web3 Integration ==

**Current**: Traditional database with audit logs

**Add Blockchain when**:

* ✅ An immutable public audit trail is needed
* ✅ Decentralized verification is demanded
* ✅ Token economics would add value
* ✅ Community governance requires voting
* ✅ Cross-organization trust is critical

**Metrics to monitor**:

* User requests for blockchain features
* Need for external verification
* Governance participation rate
* Trust/verification requirements

**Before adding**:

* Evaluate real vs perceived benefits
* Consider costs (gas fees, infrastructure)
* Design token economics carefully
* Study successful Web3 content platforms

**Implementation effort**: ~
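When evaluating real vs perceived benefits, a useful baseline is a hash-chained audit log in the existing database: each entry's SHA-256 hash covers the previous entry's hash, so editing or reordering history is detectable. This sketch uses only the Python standard library, and the entry fields are hypothetical. It is tamper-evident, not tamper-proof: whoever controls the database can rebuild the whole chain, which is the gap a public blockchain (or periodically publishing the latest hash somewhere external) is meant to close.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, entry: dict) -> str:
    payload = json.dumps(entry, sort_keys=True)  # canonical serialization
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], entry: dict) -> None:
    """Append an audit entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append_entry(log, {"who": "alice", "field": "verdict", "before": "unverified", "after": "false"})
append_entry(log, {"who": "bob", "field": "verdict", "before": "false", "after": "misleading"})
print(verify(log))                 # True
log[0]["entry"]["after"] = "true"  # tamper with history
print(verify(log))                 # False
```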

== Decision Framework ==

**For any complexity addition, ask**:

=== Do we have data? ===

* Do metrics show that the current system is inadequate?
* Do user requests document the need?
* Are performance problems proven?

=== Have we exhausted simpler options? ===

* Has the current system been optimized?
* Has configuration tuning been tried?
* Have simple workarounds been tried?

=== Do we understand the cost? ===

* Is the implementation time estimate realistic?
* Is the ongoing maintenance burden known?
* Are infrastructure costs estimated?
* Are the technical debt implications understood?

=== Is the timing right? ===

* Is the core product stable?
* Is team capacity available?
* Is user demand strong enough?

**If all four answers are YES**: Proceed with the complexity addition

**If any answer is NO**: Defer and revisit later
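The four-question rule can be recorded explicitly so that a deferral also records what blocked it. A minimal sketch; the ComplexityReview class and its field names are illustrative, with each field summarizing one question group above:

```python
from dataclasses import dataclass, fields


@dataclass
class ComplexityReview:
    """One YES/NO answer per question group in the decision framework."""
    have_data: bool
    exhausted_simpler_options: bool
    understand_cost: bool
    timing_right: bool


def decide(review: ComplexityReview) -> str:
    """Proceed only if all four answers are YES; otherwise name what blocks it."""
    blockers = [f.name for f in fields(review) if not getattr(review, f.name)]
    if not blockers:
        return "proceed"
    return "defer: " + ", ".join(blockers)


print(decide(ComplexityReview(True, False, True, True)))
# defer: exhausted_simpler_options
```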

== Monitoring Dashboard ==

**Recommended metrics to track**:

**Performance**:

* P95/P99 response times for all major operations
* Database query performance
* AKEL processing time
* Search performance

**Usage**:

* Active users (daily, weekly, monthly)
* Claims processed per day
* Search queries per day
* Contribution rate

**Costs**:

* Infrastructure costs per user
* LLM API costs per claim
* Storage costs per GB
* Total operational costs

**Quality**:

* Confidence score distribution
* Evidence completeness
* Source reliability trends
* User satisfaction (surveys)

**Community**:

* Active contributors
* Moderation workload
* Feature requests by category
* Abuse incident rate
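P95/P99 figures depend on how percentiles are computed. A minimal sketch of the nearest-rank method in Python; monitoring tools often estimate percentiles from histograms instead, so their figures may differ slightly:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [120, 95, 310, 88, 102, 450, 99, 130, 101, 97]
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))  # 450 450
```

With only ten samples, P95 and P99 both equal the maximum; tail percentiles are only meaningful over large sample windows.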

== Quarterly Review Process ==

**Every quarter, review**:

1. **Metrics dashboard**: Are any triggers close to thresholds?
2. **User feedback**: What features are most requested?
3. **Performance**: What's slowing down?
4. **Costs**: What's most expensive?
5. **Team capacity**: Can we handle new complexity?

**Decision**: Prioritize complexity additions based on:

* Urgency (current pain vs future optimization)
* Impact (user benefit vs internal efficiency)
* Effort (quick wins vs major projects)
* Dependencies (prerequisites needed)
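For the first review question, a small helper can flag metrics approaching their triggers. A minimal sketch; the metric keys are hypothetical, the thresholds come from the sections above, and the 80% warning ratio is an assumption rather than a project rule:

```python
def near_thresholds(current: dict[str, float], thresholds: dict[str, float],
                    warn_ratio: float = 0.8) -> list[tuple[str, float]]:
    """List metrics at or above warn_ratio of their trigger, highest ratio first."""
    flagged = []
    for name, limit in thresholds.items():
        if name in current and limit > 0:
            ratio = current[name] / limit
            if ratio >= warn_ratio:
                flagged.append((name, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


thresholds = {"search_p95_ms": 500, "metrics_tables_gb": 100, "active_contributors": 100}
current = {"search_p95_ms": 430, "metrics_tables_gb": 35, "active_contributors": 104}
print(near_thresholds(current, thresholds))
# [('active_contributors', 1.04), ('search_p95_ms', 0.86)]
```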

== Related Pages ==

* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]

== Remember ==

**Build what you need now. Measure everything. Add complexity only when data proves it's necessary.**

The best architecture is the simplest one that works for current needs. 🎯
