Changes for page Design Decisions

Last modified by Robert Schaub on 2026/02/08 08:31

From 1.4 to 1.5

From version 1.1

edited by Robert Schaub
on 2026/01/20 21:40

Change comment: Imported from XAR

To version 1.4

edited by Robert Schaub
on 2026/02/08 08:31

Change comment: Renamed back-links.


Summary

Page properties (1 modified, 0 added, 0 removed)

Details

Page properties

Content

@@ -1,9 +1,13 @@
  = Design Decisions =
++
  This page explains key architectural choices in FactHarbor and why simpler alternatives were chosen over complex solutions.
  **Philosophy**: Start simple, add complexity only when metrics prove necessary.
++
  == 1. Single Primary Database (PostgreSQL) ==
++
  **Decision**: Use PostgreSQL for all data initially, not multiple specialized databases
  **Alternatives considered**:
++
  * ❌ PostgreSQL + TimescaleDB + Elasticsearch from day one
  * ❌ Multiple specialized databases (graph, document, time-series)
  * ❌ Microservices with separate databases
@@ -19,9 +19,12 @@
  * TimescaleDB: When metrics queries consistently >1s
  * Graph DB: If relationship queries become complex
  **Evidence**: Research shows single-DB architectures work well until 10,000+ users (Vertabelo, AWS patterns)
++
  == 2. Three-Layer Architecture ==
++
  **Decision**: Organize system into 3 layers (Interface, Processing, Data)
  **Alternatives considered**:
++
  * ❌ 7 layers (Ingestion, AKEL, Quality, Publication, Improvement, UI, Moderation)
  * ❌ Pure microservices (20+ services)
  * ❌ Monolithic single-layer
@@ -32,9 +32,12 @@
  * Can scale each layer independently
  * Reduces cognitive load
  **Research**: Modern architecture best practices recommend 3-4 layers maximum for maintainability
++
  == 3. Deferred Federation ==
++
  **Decision**: Single-node architecture for V1.0, federation only in V2.0+
  **Alternatives considered**:
++
  * ❌ Federated from day one
  * ❌ P2P architecture
  * ❌ Blockchain-based
@@ -50,9 +50,12 @@
  * Geographic distribution becomes necessary
  * Censorship becomes a real problem
  **Evidence**: Research shows premature federation increases failure risk (InfoQ MVP architecture)
++
  == 4. Parallel AKEL Processing ==
++
  **Decision**: Process evidence/sources/scenarios in parallel, not sequentially
  **Alternatives considered**:
++
  * ❌ Pure sequential pipeline (15-30 seconds)
  * ❌ Fully async/event-driven (complex orchestration)
  * ❌ Microservices per stage
@@ -63,9 +63,12 @@
  * Improves user experience
  **Implementation**: Simple parallelization within single AKEL worker
  **Evidence**: LLM orchestration research (2024-2025) strongly recommends pipeline parallelization
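A minimal sketch of parallelization inside a single worker, assuming the three stages are independent and I/O-bound. The stage functions (`analyze_evidence`, `analyze_sources`, `analyze_scenarios`) and `process_claim` are hypothetical placeholders, not FactHarbor APIs.

```python
import asyncio

# Hypothetical stage functions; real stages would call an LLM or a database.
async def analyze_evidence(claim: str) -> str:
    await asyncio.sleep(0.01)  # stands in for I/O latency
    return f"evidence for {claim}"

async def analyze_sources(claim: str) -> str:
    await asyncio.sleep(0.01)
    return f"sources for {claim}"

async def analyze_scenarios(claim: str) -> str:
    await asyncio.sleep(0.01)
    return f"scenarios for {claim}"

async def process_claim(claim: str) -> dict:
    # Run the three independent stages concurrently inside one worker.
    evidence, sources, scenarios = await asyncio.gather(
        analyze_evidence(claim),
        analyze_sources(claim),
        analyze_scenarios(claim),
    )
    return {"evidence": evidence, "sources": sources, "scenarios": scenarios}

result = asyncio.run(process_claim("claim-1"))
```

With `asyncio.gather`, total latency is roughly that of the slowest stage rather than the sum of all three, and no separate orchestration service is involved.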
++
  == 5. Simple Manual Roles ==
++
  **Decision**: Manual role assignment for V1.0 (Reader, Contributor, Moderator, Admin)
  **Alternatives considered**:
++
  * ❌ Complex reputation point system from day one
  * ❌ Automated privilege escalation
  * ❌ Reputation decay algorithms
@@ -80,9 +80,12 @@
  * Manual role management becomes a bottleneck
  * Clear abuse patterns emerge requiring automation
  **Evidence**: Successful communities (Wikipedia, Stack Overflow) started simple and added complexity gradually
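Manual role assignment can be sketched in a few lines. The four role names come from the decision above; the numeric ordering and the rule that only an Admin may assign roles are assumptions of this sketch, and `assign_role` is a hypothetical helper.

```python
from enum import IntEnum
from typing import Dict

class Role(IntEnum):
    # Ordering is an assumption; the source only lists the four names.
    READER = 1
    CONTRIBUTOR = 2
    MODERATOR = 3
    ADMIN = 4

def assign_role(roles: Dict[str, Role], acting_user: str,
                target_user: str, new_role: Role) -> None:
    """Manually assign a role; only an Admin may do so (assumed rule)."""
    if roles.get(acting_user, Role.READER) is not Role.ADMIN:
        raise PermissionError(f"{acting_user} may not assign roles")
    roles[target_user] = new_role

roles = {"root": Role.ADMIN}
assign_role(roles, "root", "alice", Role.MODERATOR)
```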
++
  == 6. One-to-Many Scenarios ==
++
  **Decision**: Scenarios belong to single claims (one-to-many) for V1.0
  **Alternatives considered**:
++
  * ❌ Many-to-many with junction table
  * ❌ Scenarios as separate first-class entities
  * ❌ Hierarchical scenario taxonomy
@@ -96,9 +96,12 @@
  * Clear use cases for scenario reuse emerge
  * Performance doesn't degrade
  **Trade-off**: Slight duplication of scenarios vs. simpler mental model
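The one-to-many shape can be illustrated with two small record types, where each scenario carries the identifier of its single owning claim. The class and field names here are illustrative assumptions, not the actual data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Scenario:
    scenario_id: int
    claim_id: int  # the "many" side holds the reference to its one claim
    description: str

@dataclass
class Claim:
    claim_id: int
    text: str
    scenarios: List[Scenario] = field(default_factory=list)

    def add_scenario(self, scenario_id: int, description: str) -> Scenario:
        # A scenario is always created under exactly one claim.
        scenario = Scenario(scenario_id, self.claim_id, description)
        self.scenarios.append(scenario)
        return scenario

claim = Claim(1, "Example claim")
claim.add_scenario(10, "Scenario A")
claim.add_scenario(11, "Scenario B")
```

Reusing a scenario under a second claim means copying it, which is the duplication named in the trade-off; no junction table is needed.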
++
  == 7. Two-Tier Edit History ==
++
  **Decision**: Hot audit trail (PostgreSQL) + Cold debug logs (S3 archive)
  **Alternatives considered**:
++
  * ❌ Everything in PostgreSQL forever
  * ❌ Everything archived immediately
  * ❌ Complex versioning system from day one
@@ -111,9 +111,12 @@
  * Hot: Human edits, moderation actions, major AKEL updates
  * Cold: All AKEL processing logs (archived after 90 days)
  **Evidence**: Standard pattern for high-volume audit systems
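One way to express the hot/cold split is a small predicate that decides when a record moves to the archive. The event-type labels and the function name `should_archive` are assumptions made for illustration; only the categories and the 90-day window come from the list above.

```python
from datetime import datetime, timedelta, timezone

# Assumed event-type labels; the source names these categories only in prose.
HOT_EVENT_TYPES = {"human_edit", "moderation_action", "major_akel_update"}
ARCHIVE_AFTER = timedelta(days=90)  # retention window stated above

def should_archive(event_type: str, created_at: datetime, now: datetime) -> bool:
    """Return True when a record should move to the cold archive."""
    if event_type in HOT_EVENT_TYPES:
        return False  # audit-trail entries stay in the hot store
    return now - created_at > ARCHIVE_AFTER

now = datetime(2026, 2, 8, tzinfo=timezone.utc)
old_log_time = now - timedelta(days=91)
recent_log_time = now - timedelta(days=10)
```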
++
  == 8. Denormalized Cache Fields ==
++
  **Decision**: Store summary data in claim records (evidence_summary, source_names, scenario_count)
  **Alternatives considered**:
++
  * ❌ Fully normalized (join every time)
  * ❌ Fully denormalized (duplicate everything)
  * ❌ External cache only (Redis)
@@ -120,7 +120,7 @@
  **Why selective denormalization**:
  * 70% fewer joins on common queries
  * Much faster claim list/search pages
--* Trade-off: Small storage increase (~10%)
++* Trade-off: Small storage increase (10%)
  * Read-heavy system (95% reads) benefits greatly
  **Update strategy**:
  * Immediate: On user-visible edits
@@ -127,9 +127,12 @@
  * Deferred: Background job every hour
  * Invalidation: On source data changes
  **Evidence**: Content management best practices recommend denormalization for read-heavy systems
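A rough sketch of how the cache fields might be refreshed and invalidated. The three field names come from the decision above; the `cache_stale` flag, the 200-character summary cap, and the helpers `refresh_cache` and `invalidate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    claim_id: int
    text: str
    # Denormalized cache fields named in the decision.
    evidence_summary: str = ""
    source_names: List[str] = field(default_factory=list)
    scenario_count: int = 0
    cache_stale: bool = True  # hypothetical invalidation flag

def refresh_cache(claim: Claim, evidence_texts: List[str],
                  source_names: List[str], scenario_ids: List[int]) -> None:
    """Recompute the summary fields from the normalized source data."""
    claim.evidence_summary = "; ".join(evidence_texts)[:200]  # assumed cap
    claim.source_names = sorted(set(source_names))
    claim.scenario_count = len(scenario_ids)
    claim.cache_stale = False

def invalidate(claim: Claim) -> None:
    """Mark the cache stale when the underlying source data changes."""
    claim.cache_stale = True

claim = Claim(1, "Example claim")
refresh_cache(claim, ["a", "b"], ["Reuters", "AP", "AP"], [10, 11, 12])
```

Under this sketch, a user-visible edit would call `refresh_cache` immediately, while the hourly background job would refresh only claims whose `cache_stale` flag is set.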
++
  == 9. Multi-Provider LLM Orchestration ==
++
  **Decision**: Abstract LLM calls behind interface, support multiple providers
  **Alternatives considered**:
++
  * ❌ Hard-coded to single LLM provider
  * ❌ Switch providers manually
  * ❌ Complex multi-agent system
@@ -140,9 +140,12 @@
  * Resilience (automatic fallback)
  **Implementation**: Simple routing layer, task-based provider selection
  **Evidence**: Modern LLM app architecture (2024-2025) strongly recommends orchestration
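The routing layer with fallback could be as small as the sketch below. `LLMRouter`, the provider callables, and the task name `summarize` are all hypothetical; a real provider would wrap a vendor SDK behind the same callable signature.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], str]  # takes a prompt, returns a completion

class LLMRouter:
    def __init__(self, providers: Dict[str, Provider],
                 routes: Dict[str, List[str]]) -> None:
        self.providers = providers
        self.routes = routes  # task name -> provider names in preference order

    def complete(self, task: str, prompt: str) -> str:
        errors = []
        for name in self.routes[task]:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # fall through to the next provider
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stand-in providers for demonstration only.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")

def backup_provider(prompt: str) -> str:
    return f"backup answered: {prompt}"

router = LLMRouter(
    providers={"primary": flaky_provider, "backup": backup_provider},
    routes={"summarize": ["primary", "backup"]},
)
answer = router.complete("summarize", "hello")
```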
++
  == 10. Source Scoring Separation ==
++
  **Decision**: Separate source scoring (weekly batch) from claim analysis (real-time)
  **Alternatives considered**:
++
  * ❌ Update source scores during claim analysis
  * ❌ Real-time score calculation
  * ❌ Complex feedback loops
@@ -157,9 +157,12 @@
  * Monday-Saturday: Claims use those scores
  * Never update scores during analysis
  **Evidence**: Standard pattern to prevent feedback loops in ML systems
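The "never update scores during analysis" rule can be enforced structurally by handing the analysis code a read-only snapshot. This is a sketch under assumptions: `run_weekly_batch`, `analyze_claim`, the default score of 0.5, and the plain averaging are placeholders, not the actual scoring logic.

```python
from types import MappingProxyType
from typing import Iterable, Mapping

def run_weekly_batch(raw_scores: dict) -> Mapping[str, float]:
    """Produce a read-only snapshot of source scores for the coming week."""
    return MappingProxyType(dict(raw_scores))  # copy, then freeze the view

def analyze_claim(source_ids: Iterable[str], snapshot: Mapping[str, float],
                  default: float = 0.5) -> float:
    """Read scores from the snapshot; averaging is a placeholder metric."""
    scores = [snapshot.get(source_id, default) for source_id in source_ids]
    if not scores:
        return default
    return sum(scores) / len(scores)

snapshot = run_weekly_batch({"a": 0.8, "b": 0.4})
claim_score = analyze_claim(["a", "b"], snapshot)
```

Because `MappingProxyType` rejects item assignment, any attempt to write a score from within claim analysis raises `TypeError`, so a feedback loop cannot be introduced by accident.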
++
  == 11. Simple Versioning ==
++
  **Decision**: Basic audit trail only for V1.0 (before/after values, who/when/why)
  **Alternatives considered**:
++
  * ❌ Full Git-like versioning from day one
  * ❌ Branching and merging
  * ❌ Time-travel queries
@@ -174,8 +174,11 @@
  * Users request "restore previous version"
  * Need for branching emerges
  **Evidence**: "You Aren't Gonna Need It" (YAGNI) principle from Extreme Programming
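A basic audit trail of this kind needs little more than an immutable record per change. The `AuditEntry` type and `record_change` helper below are hypothetical; the fields mirror the before/after values and who/when/why named in the decision.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List

@dataclass(frozen=True)
class AuditEntry:
    entity_id: str
    field_name: str
    before: Any
    after: Any
    who: str
    when: datetime
    why: str

def record_change(log: List[AuditEntry], record: dict, entity_id: str,
                  field_name: str, new_value: Any, who: str, why: str) -> AuditEntry:
    """Apply one field change and append the matching audit entry."""
    entry = AuditEntry(entity_id, field_name, record.get(field_name),
                       new_value, who, datetime.now(timezone.utc), why)
    record[field_name] = new_value
    log.append(entry)
    return entry

audit_log: List[AuditEntry] = []
claim_record = {"verdict": "unverified"}
entry = record_change(audit_log, claim_record, "claim-1", "verdict",
                      "supported", "alice", "new evidence added")
```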
++
  == Design Philosophy ==
++
  **Guiding Principles**:
++
  1. **Start Simple**: Build minimum viable features
  1. **Measure First**: Add complexity only when metrics prove necessity
  1. **User-Driven**: Let user requests guide feature additions
@@ -182,12 +182,15 @@
  1. **Iterate**: Evolve based on real-world usage
  1. **Fail Fast**: Simple systems fail in simple ways
  **Inspiration**:
++
  * "Premature optimization is the root of all evil" - Donald Knuth
  * "You Aren't Gonna Need It" - Extreme Programming
  * "Make it work, make it right, make it fast" - Kent Beck
  **Result**: FactHarbor V1.0 is 35% simpler than original design while maintaining all core functionality and actually becoming more scalable.
++
  == Related Pages ==
--* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
++
++* [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]
  * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]
--* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
--* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
++* [[Data Model>>Archive.FactHarbor 2026\.02\.08.Specification.Data Model.WebHome]]
++* [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
