Wiki source code of Design Decisions

Version 1.2 by Robert Schaub on 2026/02/08 08:30

= Design Decisions =

This page explains key architectural choices in FactHarbor and why simpler alternatives were chosen over more complex solutions.
**Philosophy**: Start simple; add complexity only when metrics prove it necessary.

== 1. Single Primary Database (PostgreSQL) ==

**Decision**: Use PostgreSQL for all data initially, not multiple specialized databases.
**Alternatives considered**:

* ❌ PostgreSQL + TimescaleDB + Elasticsearch from day one
* ❌ Multiple specialized databases (graph, document, time-series)
* ❌ Microservices with separate databases
**Why PostgreSQL alone**:
* Modern PostgreSQL handles most workloads well
* Built-in full-text search is often sufficient
* Time-series extensions are available (pg_timeseries)
* Simpler deployment and maintenance
* Lower infrastructure costs
* Easier to reason about
**When to add specialized databases**:
* Elasticsearch: when PostgreSQL search queries consistently take longer than 500 ms
* TimescaleDB: when metrics queries consistently take longer than 1 s
* Graph DB: if relationship queries become too complex to express or run efficiently in SQL
**Evidence**: Research suggests single-DB architectures work well up to roughly 10,000 users (Vertabelo, AWS patterns)
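**Illustrative sketch**: The latency thresholds listed above only become actionable once "consistently" is given a definition. The Python sketch below shows one possible reading: the 95th-percentile latency must exceed the threshold on every one of the last seven days. The seven-day window, the use of p95, and all function names are assumptions made for illustration; they are not part of the FactHarbor specification.

```python
from statistics import quantiles

# Thresholds from the list above, in milliseconds.
SEARCH_THRESHOLD_MS = 500
METRICS_THRESHOLD_MS = 1000


def p95(latencies_ms: list[float]) -> float:
    """Return the 95th-percentile latency of a sample with at least two values."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(latencies_ms, n=20, method="inclusive")[-1]


def consistently_slow(
    daily_latencies_ms: list[list[float]],
    threshold_ms: float,
    min_days: int = 7,  # assumption: "consistently" means every day for a week
) -> bool:
    """True if p95 latency exceeded the threshold on each of the last min_days days."""
    if len(daily_latencies_ms) < min_days:
        return False  # not enough history to call the slowdown consistent
    recent_days = daily_latencies_ms[-min_days:]
    return all(p95(day) > threshold_ms for day in recent_days)
```

A check like this could run against query logs before anyone proposes adding Elasticsearch or TimescaleDB, so the decision rests on measurements rather than on a single slow query.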

== 2. Three-Layer Architecture ==

**Decision**: Organize the system into 3 layers (Interface, Processing, Data).
**Alternatives considered**:

* ❌ 7 layers (Ingestion, AKEL, Quality, Publication, Improvement, UI, Moderation)
* ❌ Pure microservices (20+ services)
* ❌ Monolithic single-layer
**Why 3 layers**:
* Clear separation of concerns
* Easy to understand and explain
* Maintainable by a small team
* Each layer can scale independently
* Reduces cognitive load
**Research**: Modern architecture best practices recommend 3-4 layers maximum for maintainability

== 3. Deferred Federation ==

**Decision**: Single-node architecture for V1.0; federation only in V2.0+.
**Alternatives considered**:

* ❌ Federated from day one
* ❌ P2P architecture
* ❌ Blockchain-based
**Why defer federation**:
* Adds massive complexity (sync, conflicts, identity, governance)
* Not needed for the first 10,000 users
* Core product must be proven first
* Most successful platforms start centralized (Wikipedia, Reddit, GitHub)
* Federation can be added later, building on established federated designs (see: Mastodon, Matrix)
**When to implement**:
* 10,000+ users on a single node
* Users explicitly request decentralization
* Geographic distribution becomes necessary
* Censorship becomes a real problem
**Evidence**: Research shows premature federation increases failure risk (InfoQ MVP architecture)

== 4. Parallel AKEL Processing ==

**Decision**: Process evidence, sources, and scenarios in parallel, not sequentially.
**Alternatives considered**:

* ❌ Pure sequential pipeline (15-30 seconds)
* ❌ Fully async/event-driven (complex orchestration)
* ❌ Microservices per stage
**Why parallel**:
* 33-40% lower latency (10-18 s vs 15-30 s)
* Better resource utilization
* Comparable code complexity
* Improves user experience
**Implementation**: Simple parallelization within a single AKEL worker
**Evidence**: LLM orchestration research (2024-2025) strongly recommends pipeline parallelization
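**Illustrative sketch**: Parallelization within a single worker can be as small as running the three independent stages concurrently and waiting for all of them. The Python sketch below uses `asyncio.gather` for this. The stage functions are hypothetical stand-ins (a short sleep replaces the real LLM or retrieval call), and the assumption that the three stages do not depend on each other's output is taken from the decision above, not verified against the AKEL implementation.

```python
import asyncio


async def analyze_evidence(claim: str) -> str:
    await asyncio.sleep(0.01)  # stands in for an LLM or retrieval call
    return f"evidence for {claim!r}"


async def analyze_sources(claim: str) -> str:
    await asyncio.sleep(0.01)  # stands in for an LLM or retrieval call
    return f"sources for {claim!r}"


async def analyze_scenarios(claim: str) -> str:
    await asyncio.sleep(0.01)  # stands in for an LLM or retrieval call
    return f"scenarios for {claim!r}"


async def process_claim(claim: str) -> dict[str, str]:
    """Run the three stages concurrently; total time is roughly the slowest stage."""
    evidence, sources, scenarios = await asyncio.gather(
        analyze_evidence(claim),
        analyze_sources(claim),
        analyze_scenarios(claim),
    )
    return {"evidence": evidence, "sources": sources, "scenarios": scenarios}
```

Calling `asyncio.run(process_claim("some claim"))` returns all three results in one dictionary. The sequential variant would simply `await` each stage in turn, which is why the code complexity stays comparable.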

== 5. Simple Manual Roles ==

**Decision**: Manual role assignment for V1.0 (Reader, Contributor, Moderator, Admin).
**Alternatives considered**:

* ❌ Complex reputation point system from day one
* ❌ Automated privilege escalation
* ❌ Reputation decay algorithms
* ❌ Trust graphs
**Why simple roles**:
* Complex reputation is not needed until there are 100+ active contributors
* Manual review builds a better community initially
* Easier to implement and maintain
* Automation can be added later when needed
**When to add complexity**:
* 100+ active contributors
* Manual role management becomes a bottleneck
* Clear abuse patterns emerge that require automation
**Evidence**: Successful communities (Wikipedia, Stack Overflow) started simple and added complexity gradually
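**Illustrative sketch**: A manual role model needs little more than an enumeration and a guarded assignment function. The Python sketch below assumes the four roles form a strict hierarchy and that only an Admin may assign roles; both points are assumptions made for illustration, since the page names the roles but does not define their permissions.

```python
from enum import IntEnum


class Role(IntEnum):
    """The four V1.0 roles; the numeric ordering is an assumption."""
    READER = 1
    CONTRIBUTOR = 2
    MODERATOR = 3
    ADMIN = 4


def has_at_least(roles: dict[str, Role], user: str, required: Role) -> bool:
    """Check a user's role against a minimum; unknown users default to READER."""
    return roles.get(user, Role.READER) >= required


def assign_role(roles: dict[str, Role], actor: str, target: str, new_role: Role) -> None:
    """Manual assignment: only an admin may change a user's role (assumed policy)."""
    if not has_at_least(roles, actor, Role.ADMIN):
        raise PermissionError(f"{actor} may not assign roles")
    roles[target] = new_role
```

There is no scoring, decay, or automatic promotion here; every role change is an explicit action by a person, which is the point of the decision.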

== 6. One-to-Many Scenarios ==

**Decision**: Each scenario belongs to exactly one claim, and a claim can have many scenarios (one-to-many), for V1.0.
**Alternatives considered**:

* ❌ Many-to-many with junction table
* ❌ Scenarios as separate first-class entities
* ❌ Hierarchical scenario taxonomy
**Why one-to-many**:
* Simpler queries (no junction table)
* Easier to understand
* Sufficient for most use cases
* Many-to-many can be added in V2.0 if requested
**When to add many-to-many**:
* Users request "apply this scenario to other claims"
* Clear use cases for scenario reuse emerge
* The additional junction-table joins do not degrade performance
**Trade-off**: Slight duplication of scenarios vs. simpler mental model
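**Illustrative sketch**: The one-to-many shape and its duplication trade-off can be shown with two small Python dataclasses. The field names are hypothetical and do not come from the FactHarbor data model; the only point carried over from the text is that a scenario stores a reference to a single owning claim.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    scenario_id: int
    claim_id: int  # the single owning claim; no junction table is needed
    description: str


@dataclass
class Claim:
    claim_id: int
    text: str
    scenarios: list[Scenario] = field(default_factory=list)

    def add_scenario(self, scenario_id: int, description: str) -> Scenario:
        """Create a scenario owned by this claim only."""
        scenario = Scenario(scenario_id, self.claim_id, description)
        self.scenarios.append(scenario)
        return scenario
```

If two claims need the same scenario text, `add_scenario` is called once per claim and two separate `Scenario` records result. That is the slight duplication accepted in exchange for the simpler model.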

== 7. Two-Tier Edit History ==

**Decision**: Hot audit trail (PostgreSQL) + cold debug logs (S3 archive).
**Alternatives considered**:

* ❌ Everything in PostgreSQL forever
* ❌ Everything archived immediately
* ❌ Complex versioning system from day one
**Why two-tier**:
* 90% reduction in hot database size
* Full traceability maintained
* Faster queries (hot data only)
* Lower storage costs (S3 is cheaper)
**Implementation**:
* Hot: Human edits, moderation actions, major AKEL updates
* Cold: All AKEL processing logs (archived after 90 days)
**Evidence**: Standard pattern for high-volume audit systems
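**Illustrative sketch**: The hot/cold split above amounts to a small routing rule. The Python sketch below reads "archived after 90 days" as: AKEL processing logs stay in PostgreSQL for 90 days and then move to the S3 archive, while audit-trail events never leave the hot tier. That reading, and the event-type names, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical event-type names for the "hot" categories listed above.
HOT_EVENT_TYPES = {"human_edit", "moderation_action", "major_akel_update"}
ARCHIVE_AFTER = timedelta(days=90)


def storage_tier(event_type: str, created_at: datetime, now: Optional[datetime] = None) -> str:
    """Return "hot" (PostgreSQL) or "cold" (S3 archive) for one history event."""
    now = now or datetime.now(timezone.utc)
    if event_type in HOT_EVENT_TYPES:
        return "hot"  # audit-trail entries stay queryable indefinitely
    if now - created_at >= ARCHIVE_AFTER:
        return "cold"  # old AKEL processing logs move to the archive
    return "hot"  # recent processing logs remain available for debugging
```

A periodic job could apply this function to processing-log rows and move those that return "cold", which keeps the hot database limited to the audit trail plus the most recent 90 days of logs.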

== 8. Denormalized Cache Fields ==

**Decision**: Store summary data in claim records (evidence_summary, source_names, scenario_count).
**Alternatives considered**:

* ❌ Fully normalized (join every time)
* ❌ Fully denormalized (duplicate everything)
* ❌ External cache only (Redis)
**Why selective denormalization**:
* 70% fewer joins on common queries
* Much faster claim list and search pages
* Trade-off: Small storage increase (about 10%)
* Read-heavy system (95% reads) benefits greatly
**Update strategy**:
* Immediate: On user-visible edits
* Deferred: Background job every hour
* Invalidation: On source data changes
**Evidence**: Content management best practices recommend denormalization for read-heavy systems
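**Illustrative sketch**: The three update paths above (immediate, deferred, invalidation) can share one rebuild function. The Python sketch below keeps the three cache field names from the decision and adds a hypothetical `cache_stale` flag to connect invalidation with the hourly job. The summary format and the loader callbacks are placeholders, not FactHarbor code.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ClaimRecord:
    claim_id: int
    # Denormalized cache fields named in the decision above.
    evidence_summary: str = ""
    source_names: list[str] = field(default_factory=list)
    scenario_count: int = 0
    cache_stale: bool = False  # hypothetical flag used for invalidation


def rebuild_cache(record: ClaimRecord, evidence: list[tuple[str, str]], scenario_count: int) -> None:
    """Immediate path: recompute cache fields from (source_name, text) pairs."""
    record.evidence_summary = f"{len(evidence)} evidence item(s)"
    record.source_names = sorted({source for source, _ in evidence})
    record.scenario_count = scenario_count
    record.cache_stale = False


def invalidate(record: ClaimRecord) -> None:
    """Invalidation path: source data changed, so the cached fields are suspect."""
    record.cache_stale = True


def hourly_refresh(
    records: Iterable[ClaimRecord],
    load_evidence: Callable[[int], list[tuple[str, str]]],
    load_scenario_count: Callable[[int], int],
) -> int:
    """Deferred path: rebuild only stale records and return how many were rebuilt."""
    rebuilt = 0
    for record in records:
        if record.cache_stale:
            rebuild_cache(record, load_evidence(record.claim_id), load_scenario_count(record.claim_id))
            rebuilt += 1
    return rebuilt
```

Claim list pages then read only `ClaimRecord` fields, and the joins against evidence and scenario tables happen on writes or in the background job instead of on every read.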

== 9. Multi-Provider LLM Orchestration ==

**Decision**: Abstract LLM calls behind an interface and support multiple providers.
**Alternatives considered**:

* ❌ Hard-coded to a single LLM provider
* ❌ Switch providers manually
* ❌ Complex multi-agent system
**Why orchestration**:
* No vendor lock-in
* Cost optimization (use cheap models for simple tasks)
* Cross-checking (compare outputs)
* Resilience (automatic fallback)
**Implementation**: Simple routing layer, task-based provider selection
**Evidence**: Modern LLM app architecture (2024-2025) strongly recommends orchestration
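**Illustrative sketch**: A routing layer with task-based selection and automatic fallback fits in a few lines. In the Python sketch below a provider is modeled as a plain callable that takes a prompt and returns text; real provider SDKs would be wrapped to fit that shape. The `LLMRouter` class and the task names are hypothetical and are not taken from the FactHarbor codebase.

```python
from typing import Callable

Provider = Callable[[str], str]  # takes a prompt, returns a completion


class LLMRouter:
    """Task-based routing with ordered fallback; task names are hypothetical."""

    def __init__(self, routes: dict[str, list[Provider]]) -> None:
        # Each task maps to providers in order of preference (for example cheapest first).
        self.routes = routes

    def complete(self, task: str, prompt: str) -> str:
        errors: list[Exception] = []
        for provider in self.routes[task]:
            try:
                return provider(prompt)
            except Exception as exc:  # any provider failure triggers fallback
                errors.append(exc)
        raise RuntimeError(f"all providers failed for task {task!r}: {errors}")
```

Putting a cheap model first for a simple task and a stronger model first for analysis covers cost optimization, while the ordered list gives the fallback behavior. Cross-checking would call several providers and compare results, which this sketch leaves out.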

== 10. Source Scoring Separation ==

**Decision**: Separate source scoring (weekly batch) from claim analysis (real-time).
**Alternatives considered**:

* ❌ Update source scores during claim analysis
* ❌ Real-time score calculation
* ❌ Complex feedback loops
**Why separate**:
* Prevents circular dependencies
* Predictable behavior
* Easier to reason about
* Simpler testing
* Clear audit trail
**Implementation**:
* Sunday 2 AM: Calculate scores from the past week
* Until the next batch run: Claim analysis reads those scores
* Never update scores during analysis
**Evidence**: Standard pattern to prevent feedback loops in ML systems
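**Illustrative sketch**: The rule "never update scores during analysis" can be enforced in code by handing the analysis step a read-only snapshot. The Python sketch below does this with `types.MappingProxyType`. The scoring formula (share of checks that held up) and the default score for unknown sources are placeholders chosen for illustration, not FactHarbor's actual scoring method.

```python
from types import MappingProxyType
from typing import Mapping


def compute_weekly_scores(outcomes: dict[str, list[bool]]) -> Mapping[str, float]:
    """Batch step: score each source as the share of its checks that held up.

    The formula is a placeholder; only the batch/read-only split matters here.
    """
    scores = {src: sum(results) / len(results) for src, results in outcomes.items() if results}
    return MappingProxyType(scores)  # read-only view: analysis cannot mutate it


def analyze_claim(source_ids: list[str], scores: Mapping[str, float], default: float = 0.5) -> float:
    """Real-time step: read scores only; unknown sources get an assumed default."""
    if not source_ids:
        return default
    return sum(scores.get(src, default) for src in source_ids) / len(source_ids)
```

Because the snapshot rejects item assignment, a claim analysis cannot feed its result back into the scores it just used, which is the circular dependency the decision is meant to prevent.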

== 11. Simple Versioning ==

**Decision**: Basic audit trail only for V1.0 (before/after values, who/when/why).
**Alternatives considered**:

* ❌ Full Git-like versioning from day one
* ❌ Branching and merging
* ❌ Time-travel queries
* ❌ Automatic conflict resolution
**Why simple**:
* Sufficient for accountability and basic rollback
* Complex versioning has not been requested by users yet
* Can be added later if needed
* Easier to implement and maintain
**When to add complexity**:
* Users request "see version history"
* Users request "restore previous version"
* Need for branching emerges
**Evidence**: "You Aren't Gonna Need It" (YAGNI) principle from Extreme Programming
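**Illustrative sketch**: A basic audit trail with before/after values and who/when/why maps directly onto one immutable record type. The Python sketch below also shows how basic rollback can be served from such a log without any versioning machinery. Field and function names are hypothetical, and storing values as strings is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    entity_id: int
    field_name: str
    before: str           # value before the change
    after: str            # value after the change
    changed_by: str       # who
    changed_at: datetime  # when
    reason: str           # why


def previous_value(log: list[AuditEntry], entity_id: int, field_name: str) -> Optional[str]:
    """Basic rollback: return the value before the most recent change, if any."""
    matching = [e for e in log if e.entity_id == entity_id and e.field_name == field_name]
    if not matching:
        return None
    latest = max(matching, key=lambda e: e.changed_at)
    return latest.before
```

Branching, merging, and time-travel queries would all need more structure than this, which is why they are deferred until users ask for them.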

== Design Philosophy ==

**Guiding Principles**:

1. **Start Simple**: Build minimum viable features
2. **Measure First**: Add complexity only when metrics prove necessity
3. **User-Driven**: Let user requests guide feature additions
4. **Iterate**: Evolve based on real-world usage
5. **Fail Fast**: Simple systems fail in simple ways
**Inspiration**:

* "Premature optimization is the root of all evil" - Donald Knuth
* "You Aren't Gonna Need It" - Extreme Programming
* "Make it work, make it right, make it fast" - Kent Beck
**Result**: FactHarbor V1.0 is 35% simpler than the original design while maintaining all core functionality and becoming more scalable.

== Related Pages ==

* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]