Changes for page Design Decisions
Last modified by Robert Schaub on 2026/02/08 08:31
Summary
- Page properties (1 modified, 0 added, 0 removed)

Details
- Page properties
- Content
@@ -1,13 +1,9 @@
 = Design Decisions =
-
 This page explains key architectural choices in FactHarbor and why simpler alternatives were chosen over complex solutions.
 **Philosophy**: Start simple, add complexity only when metrics prove necessary.
-
 == 1. Single Primary Database (PostgreSQL) ==
-
 **Decision**: Use PostgreSQL for all data initially, not multiple specialized databases
 **Alternatives considered**:
-
 * ❌ PostgreSQL + TimescaleDB + Elasticsearch from day one
 * ❌ Multiple specialized databases (graph, document, time-series)
 * ❌ Microservices with separate databases
@@ -23,12 +23,9 @@
 * TimescaleDB: When metrics queries consistently >1s
 * Graph DB: If relationship queries become complex
 **Evidence**: Research shows single-DB architectures work well until 10,000+ users (Vertabelo, AWS patterns)
-
 == 2. Three-Layer Architecture ==
-
 **Decision**: Organize system into 3 layers (Interface, Processing, Data)
 **Alternatives considered**:
-
 * ❌ 7 layers (Ingestion, AKEL, Quality, Publication, Improvement, UI, Moderation)
 * ❌ Pure microservices (20+ services)
 * ❌ Monolithic single-layer
@@ -39,12 +39,9 @@
 * Can scale each layer independently
 * Reduces cognitive load
 **Research**: Modern architecture best practices recommend 3-4 layers maximum for maintainability
-
 == 3. Deferred Federation ==
-
 **Decision**: Single-node architecture for V1.0, federation only in V2.0+
 **Alternatives considered**:
-
 * ❌ Federated from day one
 * ❌ P2P architecture
 * ❌ Blockchain-based
@@ -60,12 +60,9 @@
 * Geographic distribution becomes necessary
 * Censorship becomes real problem
 **Evidence**: Research shows premature federation increases failure risk (InfoQ MVP architecture)
-
 == 4. Parallel AKEL Processing ==
-
 **Decision**: Process evidence/sources/scenarios in parallel, not sequentially
 **Alternatives considered**:
-
 * ❌ Pure sequential pipeline (15-30 seconds)
 * ❌ Fully async/event-driven (complex orchestration)
 * ❌ Microservices per stage
@@ -76,12 +76,9 @@
 * Improves user experience
 **Implementation**: Simple parallelization within single AKEL worker
 **Evidence**: LLM orchestration research (2024-2025) strongly recommends pipeline parallelization
-
 == 5. Simple Manual Roles ==
-
 **Decision**: Manual role assignment for V1.0 (Reader, Contributor, Moderator, Admin)
 **Alternatives considered**:
-
 * ❌ Complex reputation point system from day one
 * ❌ Automated privilege escalation
 * ❌ Reputation decay algorithms
@@ -96,12 +96,9 @@
 * Manual role management becomes bottleneck
 * Clear abuse patterns emerge requiring automation
 **Evidence**: Successful communities (Wikipedia, Stack Overflow) started simple and added complexity gradually
-
 == 6. One-to-Many Scenarios ==
-
 **Decision**: Scenarios belong to single claims (one-to-many) for V1.0
 **Alternatives considered**:
-
 * ❌ Many-to-many with junction table
 * ❌ Scenarios as separate first-class entities
 * ❌ Hierarchical scenario taxonomy
@@ -115,12 +115,9 @@
 * Clear use cases for scenario reuse emerge
 * Performance doesn't degrade
 **Trade-off**: Slight duplication of scenarios vs. simpler mental model
-
 == 7. Two-Tier Edit History ==
-
 **Decision**: Hot audit trail (PostgreSQL) + Cold debug logs (S3 archive)
 **Alternatives considered**:
-
 * ❌ Everything in PostgreSQL forever
 * ❌ Everything archived immediately
 * ❌ Complex versioning system from day one
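Section 4 of the diffed page (Parallel AKEL Processing) calls for running the evidence, source, and scenario stages concurrently within a single worker. A minimal sketch of that idea in Python follows; the stage function names are illustrative placeholders, not FactHarbor's actual API, and `asyncio.sleep` stands in for real LLM calls:

```python
import asyncio

# Hypothetical stage functions; each stands in for an independent LLM call.
async def extract_evidence(claim: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"evidence for {claim}"]

async def rank_sources(claim: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"source for {claim}"]

async def generate_scenarios(claim: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"scenario for {claim}"]

async def analyze_claim(claim: str) -> dict:
    # Run the three independent stages concurrently instead of sequentially:
    # total latency becomes max(stage) rather than sum(stages).
    evidence, sources, scenarios = await asyncio.gather(
        extract_evidence(claim),
        rank_sources(claim),
        generate_scenarios(claim),
    )
    return {"evidence": evidence, "sources": sources, "scenarios": scenarios}

result = asyncio.run(analyze_claim("the sky is blue"))
```

Because the stages are independent, this needs no event bus or per-stage microservice, which matches the page's preference for simple parallelization inside one AKEL worker.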
@@ -133,12 +133,9 @@
 * Hot: Human edits, moderation actions, major AKEL updates
 * Cold: All AKEL processing logs (archived after 90 days)
 **Evidence**: Standard pattern for high-volume audit systems
-
 == 8. Denormalized Cache Fields ==
-
 **Decision**: Store summary data in claim records (evidence_summary, source_names, scenario_count)
 **Alternatives considered**:
-
 * ❌ Fully normalized (join every time)
 * ❌ Fully denormalized (duplicate everything)
 * ❌ External cache only (Redis)
@@ -145,7 +145,7 @@
 **Why selective denormalization**:
 * 70% fewer joins on common queries
 * Much faster claim list/search pages
-* Trade-off: Small storage increase (10%)
+* Trade-off: Small storage increase (~10%)
 * Read-heavy system (95% reads) benefits greatly
 **Update strategy**:
 * Immediate: On user-visible edits
@@ -152,12 +152,9 @@
 * Deferred: Background job every hour
 * Invalidation: On source data changes
 **Evidence**: Content management best practices recommend denormalization for read-heavy systems
-
 == 9. Multi-Provider LLM Orchestration ==
-
 **Decision**: Abstract LLM calls behind interface, support multiple providers
 **Alternatives considered**:
-
 * ❌ Hard-coded to single LLM provider
 * ❌ Switch providers manually
 * ❌ Complex multi-agent system
@@ -168,12 +168,9 @@
 * Resilience (automatic fallback)
 **Implementation**: Simple routing layer, task-based provider selection
 **Evidence**: Modern LLM app architecture (2024-2025) strongly recommends orchestration
-
 == 10. Source Scoring Separation ==
-
 **Decision**: Separate source scoring (weekly batch) from claim analysis (real-time)
 **Alternatives considered**:
-
 * ❌ Update source scores during claim analysis
 * ❌ Real-time score calculation
 * ❌ Complex feedback loops
@@ -188,12 +188,9 @@
 * Monday-Saturday: Claims use those scores
 * Never update scores during analysis
 **Evidence**: Standard pattern to prevent feedback loops in ML systems
-
 == 11. Simple Versioning ==
-
 **Decision**: Basic audit trail only for V1.0 (before/after values, who/when/why)
 **Alternatives considered**:
-
 * ❌ Full Git-like versioning from day one
 * ❌ Branching and merging
 * ❌ Time-travel queries
@@ -208,11 +208,8 @@
 * Users request "restore previous version"
 * Need for branching emerges
 **Evidence**: "You Aren't Gonna Need It" (YAGNI) principle from Extreme Programming
-
 == Design Philosophy ==
-
 **Guiding Principles**:
-
 1. **Start Simple**: Build minimum viable features
 2. **Measure First**: Add complexity only when metrics prove necessity
 3. **User-Driven**: Let user requests guide feature additions
@@ -219,15 +219,12 @@
 4. **Iterate**: Evolve based on real-world usage
 5. **Fail Fast**: Simple systems fail in simple ways
 **Inspiration**:
-
 * "Premature optimization is the root of all evil" - Donald Knuth
 * "You Aren't Gonna Need It" - Extreme Programming
 * "Make it work, make it right, make it fast" - Kent Beck
 **Result**: FactHarbor V1.0 is 35% simpler than original design while maintaining all core functionality and actually becoming more scalable.
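Section 9 of the diffed page (Multi-Provider LLM Orchestration) describes a simple routing layer with task-based provider selection and automatic fallback. A minimal sketch of that shape follows; the provider callables and route table are hypothetical stand-ins for real clients, not FactHarbor's implementation:

```python
from typing import Callable

# A "provider" is just a callable from prompt to completion in this sketch.
Provider = Callable[[str], str]

def cheap_provider(prompt: str) -> str:
    return f"[cheap] {prompt}"

def strong_provider(prompt: str) -> str:
    return f"[strong] {prompt}"

# Task-based selection: first entry is preferred, later entries are fallbacks.
ROUTES = {
    "summarize": [cheap_provider, strong_provider],
    "extract_evidence": [strong_provider, cheap_provider],
}

def complete(task: str, prompt: str) -> str:
    last_error = None
    for provider in ROUTES[task]:
        try:
            return provider(prompt)
        except Exception as err:  # provider outage: try the next one
            last_error = err
    raise RuntimeError("all providers failed") from last_error

print(complete("summarize", "claim text"))  # -> "[cheap] claim text"
```

Keeping the routing table declarative is what allows cost/quality optimization per task and resilience via fallback without a multi-agent framework.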
-
 == Related Pages ==
-
 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]
 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
-* [[AKEL>> Archive.FactHarbor2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
+* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
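Section 8 of the diffed page (Denormalized Cache Fields) keeps a cached `scenario_count` on each claim so list and search pages avoid joins, refreshed immediately on user-visible edits and hourly by a background job. A minimal sketch of the pattern, using SQLite in place of PostgreSQL and an illustrative schema (the table and column names beyond `scenario_count` are assumptions):

```python
import sqlite3

# In-memory database standing in for the primary PostgreSQL instance.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE claims (
        id INTEGER PRIMARY KEY,
        text TEXT,
        scenario_count INTEGER NOT NULL DEFAULT 0  -- denormalized cache field
    );
    CREATE TABLE scenarios (
        id INTEGER PRIMARY KEY,
        claim_id INTEGER REFERENCES claims(id),
        text TEXT
    );
""")
db.execute("INSERT INTO claims (id, text) VALUES (1, 'example claim')")
db.executemany("INSERT INTO scenarios (claim_id, text) VALUES (1, ?)",
               [("scenario A",), ("scenario B",)])

def refresh_scenario_count(claim_id: int) -> None:
    # Run immediately on user-visible edits; the same statement could be run
    # over all claims by an hourly background job for deferred consistency.
    db.execute("""
        UPDATE claims
        SET scenario_count = (SELECT COUNT(*) FROM scenarios
                              WHERE claim_id = ?)
        WHERE id = ?
    """, (claim_id, claim_id))

refresh_scenario_count(1)
count, = db.execute(
    "SELECT scenario_count FROM claims WHERE id = 1").fetchone()
```

List pages then read `scenario_count` directly from `claims`, trading a small storage increase for join-free reads in a 95%-read workload.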