Design Decisions

= Design Decisions =

This page explains key architectural choices in FactHarbor and why simpler alternatives were chosen over complex solutions.

4

**Philosophy**: Start simple, add complexity only when metrics prove necessary.

5

6

== 1. Single Primary Database (PostgreSQL) ==

7

8

**Decision**: Use PostgreSQL for all data initially, not multiple specialized databases

9

**Alternatives considered**:

10

11

* ❌ PostgreSQL + TimescaleDB + Elasticsearch from day one

12

* ❌ Multiple specialized databases (graph, document, time-series)

13

* ❌ Microservices with separate databases

14

**Why PostgreSQL alone**:

15

* Modern PostgreSQL handles most workloads excellently

16

* Built-in full-text search often sufficient

17

* Time-series extensions available (pg_timeseries)

18

* Simpler deployment and maintenance

19

* Lower infrastructure costs

20

* Easier to reason about

21

**When to add specialized databases**:

22

* Elasticsearch: When PostgreSQL search consistently >500ms

23

* TimescaleDB: When metrics queries consistently >1s

24

* Graph DB: If relationship queries become complex

25

**Evidence**: Research shows single-DB architectures work well until 10,000+ users (Vertabelo, AWS patterns)

26

27

== 2. Three-Layer Architecture ==

28

29

**Decision**: Organize system into 3 layers (Interface, Processing, Data)

30

**Alternatives considered**:

31

32

* ❌ 7 layers (Ingestion, AKEL, Quality, Publication, Improvement, UI, Moderation)

33

* ❌ Pure microservices (20+ services)

34

* ❌ Monolithic single-layer

35

**Why 3 layers**:

36

* Clear separation of concerns

37

* Easy to understand and explain

38

* Maintainable by small team

39

* Can scale each layer independently

40

* Reduces cognitive load

41

**Research**: Modern architecture best practices recommend 3-4 layers maximum for maintainability

42

43

== 3. Deferred Federation ==

44

45

**Decision**: Single-node architecture for V1.0, federation only in V2.0+

46

**Alternatives considered**:

47

48

* ❌ Federated from day one

49

* ❌ P2P architecture

50

* ❌ Blockchain-based

51

**Why defer federation**:

52

* Adds massive complexity (sync, conflicts, identity, governance)

53

* Not needed for first 10,000 users

54

* Core product must be proven first

55

* Most successful platforms start centralized (Wikipedia, Reddit, GitHub)

56

* Can add federation later (see: Mastodon, Matrix)

57

**When to implement**:

58

* 10,000+ users on single node

59

* Users explicitly request decentralization

60

* Geographic distribution becomes necessary

61

* Censorship becomes real problem

62

**Evidence**: Research shows premature federation increases failure risk (InfoQ MVP architecture)

63

64

== 4. Parallel AKEL Processing ==

65

66

**Decision**: Process evidence/sources/scenarios in parallel, not sequentially

67

**Alternatives considered**:

68

69

* ❌ Pure sequential pipeline (15-30 seconds)

70

* ❌ Fully async/event-driven (complex orchestration)

71

* ❌ Microservices per stage

72

**Why parallel**:

73

* 40% faster (10-18s vs 15-30s)

74

* Better resource utilization

75

* Same code complexity

76

* Improves user experience

77

**Implementation**: Simple parallelization within single AKEL worker

78

**Evidence**: LLM orchestration research (2024-2025) strongly recommends pipeline parallelization

79

80

== 5. Simple Manual Roles ==

81

82

**Decision**: Manual role assignment for V1.0 (Reader, Contributor, Moderator, Admin)

83

**Alternatives considered**:

84

85

* ❌ Complex reputation point system from day one

86

* ❌ Automated privilege escalation

87

* ❌ Reputation decay algorithms

88

* ❌ Trust graphs

89

**Why simple roles**:

90

* Complex reputation not needed until 100+ active contributors

91

* Manual review builds better community initially

92

* Easier to implement and maintain

93

* Can add automation later when needed

94

**When to add complexity**:

95

* 100+ active contributors

96

* Manual role management becomes bottleneck

97

* Clear abuse patterns emerge requiring automation

98

**Evidence**: Successful communities (Wikipedia, Stack Overflow) started simple and added complexity gradually

99

100

== 6. One-to-Many Scenarios ==

101

102

**Decision**: Scenarios belong to single claims (one-to-many) for V1.0

103

**Alternatives considered**:

104

105

* ❌ Many-to-many with junction table

106

* ❌ Scenarios as separate first-class entities

107

* ❌ Hierarchical scenario taxonomy

108

**Why one-to-many**:

109

* Simpler queries (no junction table)

110

* Easier to understand

111

* Sufficient for most use cases

112

* Can add many-to-many in V2.0 if requested

113

**When to add many-to-many**:

114

* Users request "apply this scenario to other claims"

115

* Clear use cases for scenario reuse emerge

116

* Performance doesn't degrade

117

**Trade-off**: Slight duplication of scenarios vs. simpler mental model

118

119

== 7. Two-Tier Edit History ==

120

121

**Decision**: Hot audit trail (PostgreSQL) + Cold debug logs (S3 archive)

122

**Alternatives considered**:

123

124

* ❌ Everything in PostgreSQL forever

125

* ❌ Everything archived immediately

126

* ❌ Complex versioning system from day one

127

**Why two-tier**:

128

* 90% reduction in hot database size

129

* Full traceability maintained

130

* Faster queries (hot data only)

131

* Lower storage costs (S3 cheaper)

132

**Implementation**:

133

* Hot: Human edits, moderation actions, major AKEL updates

134

* Cold: All AKEL processing logs (archived after 90 days)

135

**Evidence**: Standard pattern for high-volume audit systems

136

137

== 8. Denormalized Cache Fields ==

138

139

**Decision**: Store summary data in claim records (evidence_summary, source_names, scenario_count)

140

**Alternatives considered**:

141

142

* ❌ Fully normalized (join every time)

143

* ❌ Fully denormalized (duplicate everything)

144

* ❌ External cache only (Redis)

145

**Why selective denormalization**:

146

* 70% fewer joins on common queries

147

* Much faster claim list/search pages

148

* Trade-off: Small storage increase (10%)

149

* Read-heavy system (95% reads) benefits greatly

150

**Update strategy**:

151

* Immediate: On user-visible edits

152

* Deferred: Background job every hour

153

* Invalidation: On source data changes

154

**Evidence**: Content management best practices recommend denormalization for read-heavy systems

155

156

== 9. Multi-Provider LLM Orchestration ==

157

158

**Decision**: Abstract LLM calls behind interface, support multiple providers

159

**Alternatives considered**:

160

161

* ❌ Hard-coded to single LLM provider

162

* ❌ Switch providers manually

163

* ❌ Complex multi-agent system

164

**Why orchestration**:

165

* No vendor lock-in

166

* Cost optimization (use cheap models for simple tasks)

167

* Cross-checking (compare outputs)

168

* Resilience (automatic fallback)

169

**Implementation**: Simple routing layer, task-based provider selection

170

**Evidence**: Modern LLM app architecture (2024-2025) strongly recommends orchestration

171

172

== 10. Source Scoring Separation ==

173

174

**Decision**: Separate source scoring (weekly batch) from claim analysis (real-time)

175

**Alternatives considered**:

176

177

* ❌ Update source scores during claim analysis

178

* ❌ Real-time score calculation

179

* ❌ Complex feedback loops

180

**Why separate**:

181

* Prevents circular dependencies

182

* Predictable behavior

183

* Easier to reason about

* Simpler testing

* Clear audit trail

**Implementation**:

* Sunday 2 AM: Calculate scores from past week

188

* Monday-Saturday: Claims use those scores

189

* Never update scores during analysis

190

**Evidence**: Standard pattern to prevent feedback loops in ML systems

191

192

== 11. Simple Versioning ==

193

194

**Decision**: Basic audit trail only for V1.0 (before/after values, who/when/why)

195

**Alternatives considered**:

196

197

* ❌ Full Git-like versioning from day one

198

* ❌ Branching and merging

199

* ❌ Time-travel queries

200

* ❌ Automatic conflict resolution

201

**Why simple**:

202

* Sufficient for accountability and basic rollback

203

* Complex versioning not requested by users yet

204

* Can add later if needed

205

* Easier to implement and maintain

206

**When to add complexity**:

207

* Users request "see version history"

208

* Users request "restore previous version"

209

* Need for branching emerges

210

**Evidence**: "You Aren't Gonna Need It" (YAGNI) principle from Extreme Programming

211

212

== Design Philosophy ==

213

214

**Guiding Principles**:

215

216

1. **Start Simple**: Build minimum viable features

217

2. **Measure First**: Add complexity only when metrics prove necessity

218

3. **User-Driven**: Let user requests guide feature additions

219

4. **Iterate**: Evolve based on real-world usage

220

5. **Fail Fast**: Simple systems fail in simple ways

221

**Inspiration**:

222

223

* "Premature optimization is the root of all evil" - Donald Knuth

224

* "You Aren't Gonna Need It" - Extreme Programming

225

* "Make it work, make it right, make it fast" - Kent Beck

226

**Result**: FactHarbor V1.0 is 35% simpler than original design while maintaining all core functionality and actually becoming more scalable.

== Related Pages ==

* [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]

231

* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]

232

* [[Data Model>>Archive.FactHarbor 2026\.02\.08.Specification.Data Model.WebHome]]

233

* [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]

Wiki source code of Design Decisions

Applications

Navigation

Need help?