When to Add Complexity
FactHarbor starts simple and adds complexity only when metrics prove it's necessary. This page defines clear triggers for adding deferred features.
Philosophy: Let data and user feedback drive complexity, not assumptions about future needs.
1. Add Elasticsearch
Current: PostgreSQL full-text search
Add Elasticsearch when:
- ✅ PostgreSQL search queries consistently >500ms
- ✅ Search accounts for >20% of total database load
- ✅ Users complain about search speed
- ✅ Search index size >50GB
Metrics to monitor:
- Search query response time (P95, P99)
- Database CPU usage during search
- User search abandonment rate
- Search result relevance scores
Before adding:
- Try PostgreSQL search optimization (indexes, pg_trgm, GIN indexes; see the sketch below)
- Profile slow queries
- Consider query result caching
- Estimate Elasticsearch costs
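As a concrete starting point for that optimization work, here is a minimal sketch of the pg_trgm and GIN/tsvector approach, assuming a psycopg2 connection and a hypothetical `claims` table with a `body` text column; none of these names come from the actual schema.

```python
# Sketch: tightening PostgreSQL search before reaching for Elasticsearch.
# Assumes psycopg2 and a hypothetical "claims" table with a "body" text column.
import psycopg2

conn = psycopg2.connect("dbname=factharbor")  # hypothetical DSN
cur = conn.cursor()

# Trigram index speeds up ILIKE / similarity searches.
cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
cur.execute("CREATE INDEX IF NOT EXISTS claims_body_trgm ON claims USING gin (body gin_trgm_ops);")

# GIN index over a tsvector speeds up full-text search.
cur.execute("""
    CREATE INDEX IF NOT EXISTS claims_body_fts
    ON claims USING gin (to_tsvector('english', body));
""")
conn.commit()

# Example query: full-text search ranked by relevance.
cur.execute("""
    SELECT id, ts_rank(to_tsvector('english', body), query) AS rank
    FROM claims, plainto_tsquery('english', %s) AS query
    WHERE to_tsvector('english', body) @@ query
    ORDER BY rank DESC
    LIMIT 20;
""", ("vaccine efficacy",))
print(cur.fetchall())
```

If P95 latency is still above the threshold after indexing and caching, that is the data point that justifies Elasticsearch.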
Implementation effort:
2. Add TimescaleDB
Current: PostgreSQL with time-series data in regular tables
Add TimescaleDB when:
- ✅ Metrics queries consistently >1 second
- ✅ Metrics tables >100GB
- ✅ Need for time-series-specific features (continuous aggregates, data retention policies)
- ✅ Dashboard loading is noticeably slow
Metrics to monitor:
- Metrics query response time
- Metrics table size growth rate
- Dashboard load time
- Time-series query patterns
Before adding:
- Try PostgreSQL optimization (partitioning, materialized views; see the sketch below)
- Implement query result caching
- Consider data aggregation strategies
- Profile slow metrics queries
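The partitioning and materialized-view options can be trialed with plain PostgreSQL before committing to TimescaleDB. The sketch below assumes a hypothetical `metrics` table layout; names and the monthly partition boundary are illustrative only.

```python
# Sketch: PostgreSQL-native time-series optimizations to try before TimescaleDB.
# The "metrics" table and its columns are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect("dbname=factharbor")  # hypothetical DSN
cur = conn.cursor()

# Declarative range partitioning by month keeps indexes small and enables pruning.
cur.execute("""
    CREATE TABLE IF NOT EXISTS metrics (
        recorded_at timestamptz NOT NULL,
        name        text NOT NULL,
        value       double precision NOT NULL
    ) PARTITION BY RANGE (recorded_at);
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS metrics_2025_01
    PARTITION OF metrics
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
""")

# A materialized view pre-aggregates dashboard data; refresh it on a schedule.
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_hourly AS
    SELECT date_trunc('hour', recorded_at) AS bucket, name, avg(value) AS avg_value
    FROM metrics
    GROUP BY 1, 2;
""")
cur.execute("REFRESH MATERIALIZED VIEW metrics_hourly;")
conn.commit()
```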
Implementation effort:
3. Add Federation
Current: Single-node deployment with read replicas
Add Federation when:
- ✅ 10,000+ users on single node
- ✅ Users explicitly request ability to run own instances
- ✅ Geographic latency becomes a significant problem (>200ms)
- ✅ Censorship/control concerns emerge
- ✅ Community demands decentralization
Metrics to monitor:
- Total active users
- Geographic distribution of users
- Single-node performance limits
- User feature requests
- Community sentiment
Before adding:
- Exhaust vertical scaling options
- Add read replicas in multiple regions (see the routing sketch below)
- Implement CDN for static content
- Survey users about federation interest
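Much of the geographic-latency pressure can be relieved with replicas alone. The sketch below shows one possible way to route reads by region; the replica hostnames and region names are hypothetical placeholders, not part of the current deployment.

```python
# Sketch: route reads to the nearest replica before committing to federation.
# All hosts and regions below are hypothetical.
REPLICAS = {
    "us-east": "replica-us-east.internal:5432",
    "eu-west": "replica-eu-west.internal:5432",
    "ap-south": "replica-ap-south.internal:5432",
}
PRIMARY = "primary.internal:5432"

def pick_host(user_region: str, is_write: bool) -> str:
    """Writes always go to the primary; reads go to the user's regional replica if one exists."""
    if is_write:
        return PRIMARY
    return REPLICAS.get(user_region, PRIMARY)

assert pick_host("eu-west", is_write=False) == "replica-eu-west.internal:5432"
assert pick_host("sa-east", is_write=True) == PRIMARY
```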
Implementation effort: (major undertaking)
4. Add Complex Reputation System
Current: Simple manual roles (Reader, Contributor, Moderator, Admin)
Add Complex Reputation when:
- ✅ 100+ active contributors
- ✅ Manual role management becomes a bottleneck (>5 hours/week)
- ✅ Clear patterns of abuse require automated detection
- ✅ Community requests reputation visibility
Metrics to monitor:
- Number of active contributors
- Time spent on manual role management
- Abuse incident rate
- Contribution quality distribution
- Community feedback on roles
Before adding:
- Document current manual process thoroughly
- Identify most time-consuming tasks
- Prototype an automated reputation algorithm (see the sketch below)
- Get community feedback on proposal
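A reputation prototype needs no new infrastructure: a scoring function run offline against existing contribution data is enough to check whether automation would agree with moderators' judgment. The weights, thresholds, and field names below are illustrative assumptions, not a decided design.

```python
# Sketch of a hypothetical reputation score; weights and thresholds are placeholders.
from dataclasses import dataclass

@dataclass
class ContributorStats:
    accepted_edits: int
    rejected_edits: int
    flags_upheld: int      # abuse reports against this user that moderators upheld
    account_age_days: int

def reputation(stats: ContributorStats) -> float:
    """Higher is better; negative signals subtract more than positive signals add."""
    score = 0.0
    score += 2.0 * stats.accepted_edits
    score -= 5.0 * stats.rejected_edits
    score -= 20.0 * stats.flags_upheld
    score += min(stats.account_age_days, 365) / 30.0  # capped tenure bonus
    return score

def suggested_role(score: float) -> str:
    # Thresholds are placeholders; a human moderator still confirms any promotion.
    if score >= 200:
        return "Moderator candidate"
    if score >= 50:
        return "Contributor"
    return "Reader"

print(suggested_role(reputation(ContributorStats(120, 4, 0, 400))))
```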
Implementation effort:
5. Add Many-to-Many Scenarios
Current: Scenarios belong to single claims (one-to-many)
Add Many-to-Many Scenarios when:
- ✅ Users request "apply this scenario to other claims"
- ✅ Clear use cases for scenario reuse emerge
- ✅ Scenario duplication becomes a significant storage issue
- ✅ Cross-claim scenario analysis requested
Metrics to monitor:
- Scenario duplication rate
- User feature requests
- Storage costs of scenarios
- Query patterns involving scenarios
Before adding:
- Analyze scenario duplication patterns
- Design the junction table schema (see the sketch below)
- Plan data migration strategy
- Consider query performance impact
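One plausible shape for the junction table, plus a one-time backfill from the current one-to-many link, is sketched below. The `claims`, `scenarios`, and `claim_scenarios` names are hypothetical and would need to match the real schema.

```python
# Sketch: many-to-many claim <-> scenario links via a junction table. Names are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=factharbor")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS claim_scenarios (
        claim_id    bigint NOT NULL REFERENCES claims(id)    ON DELETE CASCADE,
        scenario_id bigint NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
        linked_at   timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (claim_id, scenario_id)
    );
""")

# One-time backfill: copy the existing single-claim links into the junction table.
cur.execute("""
    INSERT INTO claim_scenarios (claim_id, scenario_id)
    SELECT claim_id, id FROM scenarios
    ON CONFLICT DO NOTHING;
""")
conn.commit()
```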
Implementation effort:
6. Add Full Versioning System
Current: Simple audit trail (before/after values, who/when/why)
Add Full Versioning when:
- ✅ Users request "see complete version history"
- ✅ Users request "restore to specific previous version"
- ✅ Need for branching and merging emerges
- ✅ Collaborative editing requires conflict resolution
Metrics to monitor:
- User feature requests for versioning
- Manual rollback frequency
- Edit conflict rate
- Storage costs of full history
Before adding:
- Design branching/merging strategy
- Plan storage optimization (delta compression; see the sketch below)
- Consider UI/UX for version history
- Estimate storage and performance impact
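The storage estimate can be grounded quickly by diffing real consecutive revisions. The sketch below shows the idea of storing unified diffs instead of full snapshots; the sample texts are made up, and savings only appear once revisions are larger than the diff headers.

```python
# Sketch: estimating delta-compression savings for version history. Illustrative only.
import difflib

def make_delta(old_text: str, new_text: str) -> str:
    """Store only the unified diff between consecutive versions."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
    ))

v1 = "The study found a 40% reduction.\nSample size: 1,000.\n"
v2 = "The study found a 42% reduction.\nSample size: 1,000.\nFunded by NIH.\n"

delta = make_delta(v1, v2)
print(f"snapshot: {len(v2)} bytes, delta: {len(delta)} bytes")
```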
Implementation effort:
7. Add Graph Database
Current: Relational data model in PostgreSQL
Add Graph Database when:
- ✅ Complex relationship queries become common
- ✅ Need for multi-hop traversals (friend-of-friend, citation chains)
- ✅ PostgreSQL recursive queries too slow
- ✅ Graph algorithms needed (PageRank, community detection)
Metrics to monitor:
- Relationship query patterns
- Recursive query performance
- Use cases requiring graph traversals
- Query complexity growth
Before adding:
- Try PostgreSQL recursive CTEs (see the sketch below)
- Consider graph extensions for PostgreSQL
- Profile slow relationship queries
- Evaluate Neo4j vs alternatives
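Multi-hop traversals such as citation chains can be expressed as a recursive CTE and profiled before adopting a graph database. The `claim_citations` table and its columns below are hypothetical placeholders.

```python
# Sketch: multi-hop traversal (citation chain) with a recursive CTE in plain PostgreSQL.
# Table and column names are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=factharbor")  # hypothetical DSN
cur = conn.cursor()

# Walk up to 4 hops of "claim A cites claim B" links starting from one claim.
cur.execute("""
    WITH RECURSIVE chain AS (
        SELECT citing_claim_id, cited_claim_id, 1 AS depth
        FROM claim_citations
        WHERE citing_claim_id = %s
        UNION ALL
        SELECT c.citing_claim_id, c.cited_claim_id, chain.depth + 1
        FROM claim_citations c
        JOIN chain ON c.citing_claim_id = chain.cited_claim_id
        WHERE chain.depth < 4
    )
    SELECT * FROM chain;
""", (42,))
print(cur.fetchall())
```

If this kind of query stays fast at realistic depths and data volumes, the graph database trigger has not been hit.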
Implementation effort:
8. Add Real-Time Collaboration
Current: Asynchronous edits with eventual consistency
Add Real-Time Collaboration when:
- ✅ Users request simultaneous editing
- ✅ Conflict resolution becomes a frequent issue
- ✅ Need for live updates during editing sessions
- ✅ Collaborative workflows common
Metrics to monitor:
- Edit conflict frequency
- User feature requests
- Collaborative editing patterns
- Average edit session duration
Before adding:
- Design a conflict resolution strategy (Operational Transform or CRDT; see the sketch below)
- Consider WebSocket infrastructure
- Plan UI/UX for real-time editing
- Estimate server resource requirements
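To make the conflict-resolution discussion concrete, here is a last-writer-wins register, one of the simplest CRDTs. A production text editor would more likely need a sequence CRDT or Operational Transform; this only illustrates the merge property the design needs.

```python
# Sketch: last-writer-wins (LWW) register, the simplest CRDT; illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: float   # logical or wall-clock time of the write
    node_id: str       # tie-breaker so merges are deterministic

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        """Merging is commutative, associative, and idempotent: the later write wins."""
        if (self.timestamp, self.node_id) >= (other.timestamp, other.node_id):
            return self
        return other

a = LWWRegister("claim text, edited by Alice", 10.0, "node-a")
b = LWWRegister("claim text, edited by Bob", 12.5, "node-b")
assert a.merge(b) == b.merge(a)   # merge order doesn't matter
print(a.merge(b).value)           # Bob's later edit wins
```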
Implementation effort:
9. Add Machine Learning Pipeline
Current: Rule-based quality scoring and LLM-based analysis
Add ML Pipeline when:
- ✅ Need for custom models beyond LLM APIs
- ✅ Opportunity for specialized fine-tuning
- ✅ Cost savings from specialized models
- ✅ Real-time learning from user feedback
Metrics to monitor:
- LLM API costs
- Need for domain-specific models
- Quality improvement opportunities
- User feedback patterns
Before adding:
- Collect training data (user feedback, corrections; see the sketch below)
- Experiment with fine-tuning approaches
- Estimate cost savings vs infrastructure costs
- Consider model hosting options
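Collecting training data can start long before any pipeline exists: appending user corrections to a JSONL file keeps fine-tuning open as an option later. The record fields and file path below are hypothetical, and no particular fine-tuning API is assumed.

```python
# Sketch: log user corrections as JSONL for possible future fine-tuning. Fields are placeholders.
import json
from datetime import datetime, timezone

def log_feedback(claim_id: int, model_output: str, user_correction: str,
                 path: str = "feedback.jsonl") -> None:
    record = {
        "claim_id": claim_id,
        "model_output": model_output,
        "user_correction": user_correction,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_feedback(42, "Confidence: high", "Confidence should be medium; only one source.")
```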
Implementation effort:
10. Add Blockchain/Web3 Integration
Current: Traditional database with audit logs
Add Blockchain when:
- ✅ Need for immutable public audit trail
- ✅ Decentralized verification demanded
- ✅ Token economics would add value
- ✅ Community governance requires voting
- ✅ Cross-organization trust is critical
Metrics to monitor:
- User requests for blockchain features
- Need for external verification
- Governance participation rate
- Trust/verification requirements
Before adding:
- Evaluate real vs perceived benefits
- Consider costs (gas fees, infrastructure)
- Design token economics carefully
- Study successful Web3 content platforms
Implementation effort:
Decision Framework
For any complexity addition, ask:
1. Do we have data?
- Metrics showing current system inadequate?
- User requests documenting need?
- Performance problems proven?
2. Have we exhausted simpler options?
- Optimization of current system?
- Configuration tuning?
- Simple workarounds?
3. Do we understand the cost?
- Implementation time realistic?
- Ongoing maintenance burden?
- Infrastructure costs?
- Technical debt implications?
4. Is the timing right?
- Core product stable?
- Team capacity available?
- User demand strong enough?
If all four answers are YES: Proceed with complexity addition
If any answer is NO: Defer and revisit later
Monitoring Dashboard
Recommended metrics to track:
Performance:
- P95/P99 response times for all major operations (see the percentile sketch below)
- Database query performance
- AKEL processing time
- Search performance
Usage:
- Active users (daily, weekly, monthly)
- Claims processed per day
- Search queries per day
- Contribution rate
Costs:
- Infrastructure costs per user
- LLM API costs per claim
- Storage costs per GB
- Total operational costs
Quality:
- Confidence score distribution
- Evidence completeness
- Source reliability trends
- User satisfaction (surveys)
Community:
- Active contributors
- Moderation workload
- Feature requests by category
- Abuse incident rate
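If a metrics backend isn't wired up yet, P95/P99 can be computed directly from raw response times with the standard library, as in the sketch below; the sample data is made up.

```python
# Sketch: computing P95/P99 from raw response times with only the standard library.
from statistics import quantiles

response_times_ms = [120, 95, 310, 80, 105, 2200, 130, 90, 115, 98, 101, 450]

# quantiles with n=100 returns the 1st..99th percentile cut points.
cuts = quantiles(response_times_ms, n=100)
p95, p99 = cuts[94], cuts[98]
print(f"P95 = {p95:.0f} ms, P99 = {p99:.0f} ms")
```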
Quarterly Review Process
Every quarter, review:
1. Metrics dashboard: Are any triggers close to thresholds?
2. User feedback: What features are most requested?
3. Performance: What's slowing down?
4. Costs: What's most expensive?
5. Team capacity: Can we handle new complexity?
Decision: Prioritize complexity additions based on:
- Urgency (current pain vs future optimization)
- Impact (user benefit vs internal efficiency)
- Effort (quick wins vs major projects)
- Dependencies (prerequisites needed)
Related Pages
- Design Decisions
- Architecture
- Data Model
Remember
Build what you need now. Measure everything. Add complexity only when data proves it's necessary.
The best architecture is the simplest one that works for current needs. 🎯