Wiki source code of Design Decisions

Version 1.2 by Robert Schaub on 2026/02/08 08:30

version	line-number	content
1.1	1	= Design Decisions =
1.2	2
1.1	3	This page explains key architectural choices in FactHarbor and why simpler alternatives were chosen over complex solutions.
	4	Philosophy: Start simple, add complexity only when metrics prove necessary.
1.2	5
1.1	6	== 1. Single Primary Database (PostgreSQL) ==
1.2	7
1.1	8	Decision: Use PostgreSQL for all data initially, not multiple specialized databases
	9	Alternatives considered:
1.2	10
1.1	11	* ❌ PostgreSQL + TimescaleDB + Elasticsearch from day one
	12	* ❌ Multiple specialized databases (graph, document, time-series)
	13	* ❌ Microservices with separate databases
	14	Why PostgreSQL alone:
	15	* Modern PostgreSQL handles most workloads excellently
	16	* Built-in full-text search often sufficient
	17	* Time-series extensions available (pg_timeseries)
	18	* Simpler deployment and maintenance
	19	* Lower infrastructure costs
	20	* Easier to reason about
	21	When to add specialized databases:
	22	* Elasticsearch: When PostgreSQL search queries consistently take longer than 500 ms
	23	* TimescaleDB: When metrics queries consistently take longer than 1 s
	24	* Graph DB: If relationship queries become complex
	25	Evidence: Research shows single-DB architectures work well until 10,000+ users (Vertabelo, AWS patterns)
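A minimal sketch of how the latency triggers above could be checked. The 500 ms and 1 s thresholds come from the list; the definition of "consistently" (p95 latency above the threshold on each of the last seven days) and all function names are assumptions made for illustration.

```python
from statistics import quantiles

# Thresholds taken from the list above.
SEARCH_THRESHOLD_MS = 500
METRICS_THRESHOLD_MS = 1000


def p95(samples_ms):
    """Return the 95th percentile of a list of latency samples (ms)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples_ms, n=20)[-1]


def consistently_slow(daily_samples_ms, threshold_ms, min_days=7):
    """True if p95 latency exceeded the threshold on each of the last min_days days.

    daily_samples_ms is a list of per-day sample lists, oldest first.
    The seven-day window is an assumed reading of "consistently".
    """
    if len(daily_samples_ms) < min_days:
        return False
    recent = daily_samples_ms[-min_days:]
    return all(p95(day) > threshold_ms for day in recent)
```

With a rule like this, a single slow day does not trigger a migration; only a sustained pattern does.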
1.2	26
1.1	27	== 2. Three-Layer Architecture ==
1.2	28
1.1	29	Decision: Organize system into 3 layers (Interface, Processing, Data)
	30	Alternatives considered:
1.2	31
1.1	32	* ❌ 7 layers (Ingestion, AKEL, Quality, Publication, Improvement, UI, Moderation)
	33	* ❌ Pure microservices (20+ services)
	34	* ❌ Monolithic single-layer
	35	Why 3 layers:
	36	* Clear separation of concerns
	37	* Easy to understand and explain
	38	* Maintainable by a small team
	39	* Can scale each layer independently
	40	* Reduces cognitive load
	41	Evidence: Modern architecture best practices recommend 3-4 layers maximum for maintainability
1.2	42
1.1	43	== 3. Deferred Federation ==
1.2	44
1.1	45	Decision: Single-node architecture for V1.0, federation only in V2.0+
	46	Alternatives considered:
1.2	47
1.1	48	* ❌ Federated from day one
	49	* ❌ P2P architecture
	50	* ❌ Blockchain-based
	51	Why defer federation:
	52	* Adds massive complexity (sync, conflicts, identity, governance)
	53	* Not needed for the first 10,000 users
	54	* Core product must be proven first
	55	* Most successful platforms start centralized (Wikipedia, Reddit, GitHub)
	56	* Can add federation later, learning from established federated designs (see: Mastodon, Matrix)
	57	When to implement:
	58	* 10,000+ users on single node
	59	* Users explicitly request decentralization
	60	* Geographic distribution becomes necessary
	61	* Censorship becomes a real problem
	62	Evidence: Research shows premature federation increases failure risk (InfoQ MVP architecture)
1.2	63
1.1	64	== 4. Parallel AKEL Processing ==
1.2	65
1.1	66	Decision: Process evidence/sources/scenarios in parallel, not sequentially
	67	Alternatives considered:
1.2	68
1.1	69	* ❌ Pure sequential pipeline (15-30 seconds)
	70	* ❌ Fully async/event-driven (complex orchestration)
	71	* ❌ Microservices per stage
	72	Why parallel:
	73	* 33-40% less processing time (10-18s vs 15-30s)
	74	* Better resource utilization
	75	* Same code complexity
	76	* Improves user experience
	77	Implementation: Simple parallelization within a single AKEL worker
	78	Evidence: LLM orchestration research (2024-2025) strongly recommends pipeline parallelization
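The parallel step can be sketched as follows, assuming the three stages are independent I/O-bound calls. The stage functions `analyze_evidence`, `collect_sources`, and `build_scenarios` are hypothetical placeholders, not AKEL's actual API, and `asyncio.sleep` stands in for real LLM or retrieval calls.

```python
import asyncio


async def analyze_evidence(claim: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for an LLM or retrieval call
    return f"evidence for {claim}"


async def collect_sources(claim: str) -> str:
    await asyncio.sleep(0.01)  # placeholder
    return f"sources for {claim}"


async def build_scenarios(claim: str) -> str:
    await asyncio.sleep(0.01)  # placeholder
    return f"scenarios for {claim}"


async def process_claim(claim: str) -> dict:
    """Run the three stages concurrently inside one worker."""
    # gather() starts all three coroutines and returns results in argument order.
    evidence, sources, scenarios = await asyncio.gather(
        analyze_evidence(claim),
        collect_sources(claim),
        build_scenarios(claim),
    )
    return {"evidence": evidence, "sources": sources, "scenarios": scenarios}
```

Calling `asyncio.run(process_claim("some claim"))` runs all three stages at once, so total latency is roughly that of the slowest stage rather than the sum of all three.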
1.2	79
1.1	80	== 5. Simple Manual Roles ==
1.2	81
1.1	82	Decision: Manual role assignment for V1.0 (Reader, Contributor, Moderator, Admin)
	83	Alternatives considered:
1.2	84
1.1	85	* ❌ Complex reputation point system from day one
	86	* ❌ Automated privilege escalation
	87	* ❌ Reputation decay algorithms
	88	* ❌ Trust graphs
	89	Why simple roles:
	90	* Complex reputation not needed until 100+ active contributors
	91	* Manual review builds better community initially
	92	* Easier to implement and maintain
	93	* Can add automation later when needed
	94	When to add complexity:
	95	* 100+ active contributors
	96	* Manual role management becomes a bottleneck
	97	* Clear abuse patterns emerge requiring automation
	98	Evidence: Successful communities (Wikipedia, Stack Overflow) started simple and added complexity gradually
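A minimal sketch of manual role assignment. The four role names come from the decision above; the numeric ordering, the rule that only an Admin may assign roles, and the `assign_role` helper are assumptions for illustration.

```python
from enum import IntEnum


class Role(IntEnum):
    # Role names come from the decision above; the ordering is an assumption.
    READER = 1
    CONTRIBUTOR = 2
    MODERATOR = 3
    ADMIN = 4


def assign_role(roles, actor, target, new_role):
    """Manually assign a role; only an Admin may do so (assumed policy).

    roles maps user name -> Role; unknown users are treated as Readers.
    """
    if roles.get(actor, Role.READER) != Role.ADMIN:
        raise PermissionError(f"{actor} may not assign roles")
    roles[target] = new_role
    return roles
```

There is no scoring, decay, or automatic promotion here: every change is an explicit decision by a person.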
1.2	99
1.1	100	== 6. One-to-Many Scenarios ==
1.2	101
1.1	102	Decision: Each scenario belongs to exactly one claim, and a claim can have many scenarios (one-to-many), for V1.0
	103	Alternatives considered:
1.2	104
1.1	105	* ❌ Many-to-many with junction table
	106	* ❌ Scenarios as separate first-class entities
	107	* ❌ Hierarchical scenario taxonomy
	108	Why one-to-many:
	109	* Simpler queries (no junction table)
	110	* Easier to understand
	111	* Sufficient for most use cases
	112	* Can add many-to-many in V2.0 if requested
	113	When to add many-to-many:
	114	* Users request "apply this scenario to other claims"
	115	* Clear use cases for scenario reuse emerge
	116	* Benchmarks show the junction table does not degrade query performance
	117	Trade-off: Slight duplication of scenarios vs. simpler mental model
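The one-to-many shape can be sketched as a foreign key on the scenario row. This example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; the table and column names are assumptions, not the actual FactHarbor schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with claims and their scenarios."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE claims (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL
        );
        -- Each scenario points at exactly one claim: no junction table.
        CREATE TABLE scenarios (
            id INTEGER PRIMARY KEY,
            claim_id INTEGER NOT NULL REFERENCES claims(id),
            description TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO claims (id, text) VALUES (1, 'Example claim')")
    conn.executemany(
        "INSERT INTO scenarios (claim_id, description) VALUES (?, ?)",
        [(1, "Scenario A"), (1, "Scenario B")],
    )
    conn.commit()
    return conn


def scenarios_for(conn, claim_id):
    """Fetch a claim's scenarios with a single-table query."""
    rows = conn.execute(
        "SELECT description FROM scenarios WHERE claim_id = ? ORDER BY id",
        (claim_id,),
    )
    return [row[0] for row in rows]
```

Reusing a scenario on a second claim would mean copying the row, which is the duplication named in the trade-off above.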
1.2	118
1.1	119	== 7. Two-Tier Edit History ==
1.2	120
1.1	121	Decision: Hot audit trail (PostgreSQL) + Cold debug logs (S3 archive)
	122	Alternatives considered:
1.2	123
1.1	124	* ❌ Everything in PostgreSQL forever
	125	* ❌ Everything archived immediately
	126	* ❌ Complex versioning system from day one
	127	Why two-tier:
	128	* 90% reduction in hot database size
	129	* Full traceability maintained
	130	* Faster queries (hot data only)
	131	* Lower storage costs (S3 cheaper)
	132	Implementation:
	133	* Hot: Human edits, moderation actions, major AKEL updates
	134	* Cold: All AKEL processing logs (archived after 90 days)
	135	Evidence: Standard pattern for high-volume audit systems
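A minimal sketch of the archiving rule, under one reading of the implementation notes: audit entries stay in PostgreSQL, and AKEL processing logs move to the S3 archive once they are more than 90 days old. The `kind` labels, the `LogEntry` type, and `should_archive` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Entry kinds that stay in the hot audit trail (labels are assumptions).
AUDIT_KINDS = {"human_edit", "moderation_action", "major_akel_update"}
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class LogEntry:
    kind: str
    created_at: datetime


def should_archive(entry: LogEntry, now: datetime) -> bool:
    """True if the entry should be moved to cold storage."""
    if entry.kind in AUDIT_KINDS:
        return False  # audit trail entries are never archived
    # Everything else is treated as an AKEL processing log.
    return now - entry.created_at > ARCHIVE_AFTER
```

A periodic job could apply `should_archive` to each row and move the matching ones out of the hot database.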
1.2	136
1.1	137	== 8. Denormalized Cache Fields ==
1.2	138
1.1	139	Decision: Store summary data in claim records (evidence_summary, source_names, scenario_count)
	140	Alternatives considered:
1.2	141
1.1	142	* ❌ Fully normalized (join every time)
	143	* ❌ Fully denormalized (duplicate everything)
	144	* ❌ External cache only (Redis)
	145	Why selective denormalization:
	146	* 70% fewer joins on common queries
	147	* Much faster claim list/search pages
1.2	148	* Trade-off: Small storage increase (10%)
1.1	149	* Read-heavy system (95% reads) benefits greatly
	150	Update strategy:
	151	* Immediate: On user-visible edits
	152	* Deferred: Background job every hour
	153	* Invalidation: On source data changes
	154	Evidence: Content management best practices recommend denormalization for read-heavy systems
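The update strategy above can be sketched with in-memory objects. The cache field names come from the decision; the `Evidence` and `Claim` classes, the `cache_stale` flag, and the helper functions are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    source_name: str
    summary: str


@dataclass
class Claim:
    text: str
    evidence: List[Evidence] = field(default_factory=list)
    scenarios: List[str] = field(default_factory=list)
    # Denormalized cache fields named in the decision above.
    evidence_summary: str = ""
    source_names: List[str] = field(default_factory=list)
    scenario_count: int = 0
    cache_stale: bool = False  # hypothetical invalidation flag


def refresh_cache(claim: Claim) -> None:
    """Recompute the cache fields from the normalized data."""
    claim.evidence_summary = "; ".join(e.summary for e in claim.evidence)
    claim.source_names = sorted({e.source_name for e in claim.evidence})
    claim.scenario_count = len(claim.scenarios)
    claim.cache_stale = False


def add_evidence(claim: Claim, evidence: Evidence, user_visible: bool) -> None:
    claim.evidence.append(evidence)
    if user_visible:
        refresh_cache(claim)      # immediate: user-visible edit
    else:
        claim.cache_stale = True  # invalidated: left for the hourly job


def hourly_job(claims: List[Claim]) -> int:
    """Deferred refresh; returns how many claims were refreshed."""
    stale = [c for c in claims if c.cache_stale]
    for claim in stale:
        refresh_cache(claim)
    return len(stale)
```

List and search pages would read only the cache fields, so they need no joins against evidence or scenarios.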
1.2	155
1.1	156	== 9. Multi-Provider LLM Orchestration ==
1.2	157
1.1	158	Decision: Abstract LLM calls behind an interface and support multiple providers
	159	Alternatives considered:
1.2	160
1.1	161	* ❌ Hard-coded to a single LLM provider
	162	* ❌ Switch providers manually
	163	* ❌ Complex multi-agent system
	164	Why orchestration:
	165	* No vendor lock-in
	166	* Cost optimization (use cheap models for simple tasks)
	167	* Cross-checking (compare outputs)
	168	* Resilience (automatic fallback)
	169	Implementation: Simple routing layer, task-based provider selection
	170	Evidence: Modern LLM app architecture (2024-2025) strongly recommends orchestration
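A minimal sketch of a routing layer with task-based selection and automatic fallback. Providers are modeled as plain callables; `LLMRouter` and the task names are hypothetical, and no real provider SDK is used.

```python
from typing import Callable, Dict, List

# A provider is any callable that takes a prompt and returns text (assumed interface).
Provider = Callable[[str], str]


class LLMRouter:
    def __init__(self, routes: Dict[str, List[Provider]]):
        # routes maps a task name to providers in order of preference,
        # for example a cheap model first for simple tasks.
        self.routes = routes

    def complete(self, task: str, prompt: str) -> str:
        """Try each provider configured for the task until one succeeds."""
        if task not in self.routes:
            raise KeyError(f"no providers configured for task {task!r}")
        errors = []
        for provider in self.routes[task]:
            try:
                return provider(prompt)
            except Exception as exc:  # fall back to the next provider
                errors.append(exc)
        raise RuntimeError(f"all providers failed for task {task!r}: {errors}")
```

Callers depend only on `complete`, so swapping or reordering providers is a configuration change rather than a code change.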
1.2	171
1.1	172	== 10. Source Scoring Separation ==
1.2	173
1.1	174	Decision: Separate source scoring (weekly batch) from claim analysis (real-time)
	175	Alternatives considered:
1.2	176
1.1	177	* ❌ Update source scores during claim analysis
	178	* ❌ Real-time score calculation
	179	* ❌ Complex feedback loops
	180	Why separate:
	181	* Prevents circular dependencies
	182	* Predictable behavior
	183	* Easier to reason about
	184	* Simpler testing
	185	* Clear audit trail
	186	Implementation:
	187	* Sunday 2 AM: Calculate scores from past week
	188	* Monday-Saturday: Claims use those scores
	189	* Never update scores during analysis
	190	Evidence: Standard pattern to prevent feedback loops in ML systems
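The separation can be sketched as a read-only snapshot that only the weekly batch replaces. The averaging formula, the default score of 0.5, and the names `SourceScores` and `analyze_claim` are assumptions; the source does not specify how scores are computed.

```python
from collections import defaultdict
from types import MappingProxyType


class SourceScores:
    def __init__(self):
        # Read-only view; replaced wholesale by the weekly batch.
        self._snapshot = MappingProxyType({})

    def run_weekly_batch(self, observations):
        """Recompute scores from (source_name, rating) pairs for the past week."""
        ratings = defaultdict(list)
        for source, rating in observations:
            ratings[source].append(rating)
        averaged = {s: sum(r) / len(r) for s, r in ratings.items()}
        self._snapshot = MappingProxyType(averaged)

    def get(self, source, default=0.5):
        return self._snapshot.get(source, default)


def analyze_claim(source_names, scores):
    """Average the frozen scores of a claim's sources; never writes scores."""
    if not source_names:
        return None
    return sum(scores.get(s) for s in source_names) / len(source_names)
```

Because claim analysis can only read the snapshot, its output cannot feed back into the scores it depends on until the next batch run.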
1.2	191
1.1	192	== 11. Simple Versioning ==
1.2	193
1.1	194	Decision: Basic audit trail only for V1.0 (before/after values, who/when/why)
	195	Alternatives considered:
1.2	196
1.1	197	* ❌ Full Git-like versioning from day one
	198	* ❌ Branching and merging
	199	* ❌ Time-travel queries
	200	* ❌ Automatic conflict resolution
	201	Why simple:
	202	* Sufficient for accountability and basic rollback
	203	* Complex versioning not requested by users yet
	204	* Can add later if needed
	205	* Easier to implement and maintain
	206	When to add complexity:
	207	* Users request "see version history"
	208	* Users request "restore previous version"
	209	* Need for branching emerges
	210	Evidence: "You Aren't Gonna Need It" (YAGNI) principle from Extreme Programming
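A minimal sketch of a basic audit trail that records before/after values plus who, when, and why. Records are plain dictionaries here; `AuditEntry`, `apply_edit`, and `rollback` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    field_name: str
    before: Any
    after: Any
    changed_by: str       # who
    changed_at: datetime  # when
    reason: str           # why


def apply_edit(record, trail, field_name, new_value, user, reason):
    """Change one field and append the matching audit entry."""
    entry = AuditEntry(
        field_name=field_name,
        before=record.get(field_name),
        after=new_value,
        changed_by=user,
        changed_at=datetime.now(timezone.utc),
        reason=reason,
    )
    record[field_name] = new_value
    trail.append(entry)
    return entry


def rollback(record, entry):
    """Basic rollback: restore the value recorded before the edit."""
    record[entry.field_name] = entry.before
```

This supports accountability and single-step rollback without branching, merging, or time-travel queries.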
1.2	211
1.1	212	== Design Philosophy ==
1.2	213
1.1	214	Guiding Principles:
1.2	215
1.1	216	1. Start Simple: Build minimum viable features
	217	2. Measure First: Add complexity only when metrics prove necessity
	218	3. User-Driven: Let user requests guide feature additions
	219	4. Iterate: Evolve based on real-world usage
	220	5. Fail Fast: Simple systems fail in simple ways
	221	Inspiration:
1.2	222
1.1	223	* "Premature optimization is the root of all evil" - Donald Knuth
	224	* "You Aren't Gonna Need It" - Extreme Programming
	225	* "Make it work, make it right, make it fast" - Kent Beck
	226	Result: FactHarbor V1.0 is 35% simpler than the original design while retaining all core functionality and improving scalability.
1.2	227
1.1	228	== Related Pages ==
1.2	229
1.1	230	* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
	231	* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]
	232	* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
1.2	233	* [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
