When to Add Complexity

FactHarbor starts simple and adds complexity only when metrics prove it's necessary. This page defines clear triggers for adding deferred features.
Philosophy: Let data and user feedback drive complexity, not assumptions about future needs.

1. Add Elasticsearch

Current: PostgreSQL full-text search
Add Elasticsearch when:

  • ✅ PostgreSQL search queries consistently >500ms
  • ✅ Search accounts for >20% of total database load
  • ✅ Users complain about search speed
  • ✅ Search index size >50GB

Metrics to monitor:

  • Search query response time (P95, P99)
  • Database CPU usage during search
  • User search abandonment rate
  • Search result relevance scores

Before adding:

  • Try PostgreSQL search optimization (indexes, pg_trgm, GIN indexes); see the sketch below
  • Profile slow queries
  • Consider query result caching
  • Estimate Elasticsearch costs

Implementation effort
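
Before reaching for Elasticsearch, the first option above is usually the cheapest. A minimal sketch of what that PostgreSQL tuning could look like, assuming a hypothetical claims table with a body text column (table and column names are illustrative, not the actual schema):

    # Illustrative PostgreSQL search tuning, expressed as the SQL a migration might run.
    # Assumes a hypothetical table: claims(id bigint, body text).

    CREATE_TRGM = """
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    -- GIN trigram index: speeds up ILIKE '%term%' and similarity() searches
    CREATE INDEX IF NOT EXISTS claims_body_trgm_idx
        ON claims USING gin (body gin_trgm_ops);
    """

    CREATE_FTS = """
    -- Maintain a tsvector column and index it for ranked full-text search
    ALTER TABLE claims ADD COLUMN IF NOT EXISTS search_vector tsvector
        GENERATED ALWAYS AS (to_tsvector('english', coalesce(body, ''))) STORED;
    CREATE INDEX IF NOT EXISTS claims_search_vector_idx
        ON claims USING gin (search_vector);
    """

    SEARCH = """
    -- Ranked search; EXPLAIN ANALYZE this to confirm the index is actually used
    SELECT id, ts_rank(search_vector, query) AS rank
    FROM claims, websearch_to_tsquery('english', %s) AS query
    WHERE search_vector @@ query
    ORDER BY rank DESC
    LIMIT 20;
    """

    if __name__ == "__main__":
        # Apply with psycopg2 against a real database, for example:
        #   psycopg2.connect("dbname=factharbor").cursor().execute(CREATE_TRGM)
        print(CREATE_TRGM, CREATE_FTS, SEARCH)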

2. Add TimescaleDB

Current: PostgreSQL with time-series data in regular tables
Add TimescaleDB when:

  • ✅ Metrics queries consistently >1 second
  • ✅ Metrics tables >100GB
  • ✅ Need for time-series specific features (continuous aggregates, data retention policies)
  • ✅ Dashboard loading noticeably slow

Metrics to monitor:

  • Metrics query response time
  • Metrics table size growth rate
  • Dashboard load time
  • Time-series query patterns

Before adding:

  • Try PostgreSQL optimization (partitioning, materialized views); see the sketch below
  • Implement query result caching
  • Consider data aggregation strategies
  • Profile slow metrics queries

Implementation effort
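
The plain-PostgreSQL route listed above can be made concrete: range partitioning by time plus a pre-aggregated materialized view for dashboards. The metrics table and its columns below are assumptions for illustration:

    # Plain-PostgreSQL alternatives to TimescaleDB. Schema is illustrative only.

    PARTITIONED_TABLE = """
    CREATE TABLE IF NOT EXISTS metrics (
        recorded_at timestamptz      NOT NULL,
        name        text             NOT NULL,
        value       double precision NOT NULL
    ) PARTITION BY RANGE (recorded_at);

    -- One partition per month; create future partitions ahead of time (e.g. from a cron job)
    CREATE TABLE IF NOT EXISTS metrics_2025_01 PARTITION OF metrics
        FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
    """

    DAILY_ROLLUP = """
    CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_daily AS
    SELECT date_trunc('day', recorded_at) AS day,
           name,
           avg(value) AS avg_value,
           max(value) AS max_value,
           count(*)   AS samples
    FROM metrics
    GROUP BY 1, 2;
    -- Dashboards read the rollup; refresh it periodically:
    --   REFRESH MATERIALIZED VIEW CONCURRENTLY metrics_daily;  (requires a unique index)
    """

    if __name__ == "__main__":
        print(PARTITIONED_TABLE, DAILY_ROLLUP)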

3. Add Federation

Current: Single-node deployment with read replicas
Add Federation when:

  • ✅ 10,000+ users on single node
  • ✅ Users explicitly request ability to run own instances
  • ✅ Geographic latency becomes a significant problem (>200ms)
  • ✅ Censorship/control concerns emerge
  • ✅ Community demands decentralization

Metrics to monitor:

  • Total active users
  • Geographic distribution of users
  • Single-node performance limits
  • User feature requests
  • Community sentiment

Before adding:

  • Exhaust vertical scaling options
  • Add read replicas in multiple regions
  • Implement CDN for static content
  • Survey users about federation interest

Implementation effort: (major undertaking)

4. Add Complex Reputation System

Current: Simple manual roles (Reader, Contributor, Moderator, Admin)
Add Complex Reputation when:

  • ✅ 100+ active contributors
  • ✅ Manual role management becomes bottleneck (>5 hours/week)
  • ✅ Clear patterns of abuse require automated detection
  • ✅ Community requests reputation visibility

Metrics to monitor:

  • Number of active contributors
  • Time spent on manual role management
  • Abuse incident rate
  • Contribution quality distribution
  • Community feedback on roles

Before adding:

  • Document current manual process thoroughly
  • Identify most time-consuming tasks
  • Prototype automated reputation algorithm (see the sketch below)
  • Get community feedback on proposal

Implementation effort
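
A prototype reputation algorithm could start as small as the sketch below. The signal names and weights are invented for illustration and would need community review before being used for anything:

    from dataclasses import dataclass

    @dataclass
    class ContributorStats:
        """Signals the existing audit trail could already supply (names are illustrative)."""
        accepted_edits: int
        rejected_edits: int
        flags_upheld: int      # moderation flags against this contributor that were upheld
        days_active: int

    def reputation_score(s: ContributorStats) -> float:
        """Toy weighted score in [0, 100]; the weights are placeholders, not a tested policy."""
        total_edits = s.accepted_edits + s.rejected_edits
        accept_rate = s.accepted_edits / total_edits if total_edits else 0.0
        tenure = min(s.days_active / 365, 1.0)          # cap the tenure bonus at one year
        penalty = min(s.flags_upheld * 0.1, 0.5)        # each upheld flag costs 10%, capped
        return round(100 * max(0.0, 0.6 * accept_rate + 0.4 * tenure - penalty), 1)

    if __name__ == "__main__":
        print(reputation_score(ContributorStats(accepted_edits=120, rejected_edits=8,
                                                flags_upheld=1, days_active=200)))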

5. Add Many-to-Many Scenarios

Current: Scenarios belong to single claims (one-to-many)
Add Many-to-Many Scenarios when:

  • ✅ Users request "apply this scenario to other claims"
  • ✅ Clear use cases for scenario reuse emerge
  • ✅ Scenario duplication becomes a significant storage issue
  • ✅ Cross-claim scenario analysis requested

Metrics to monitor:

  • Scenario duplication rate
  • User feature requests
  • Storage costs of scenarios
  • Query patterns involving scenarios

Before adding:

  • Analyze scenario duplication patterns
  • Design junction table schema (sketched below)
  • Plan data migration strategy
  • Consider query performance impact

Implementation effort
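
A sketch of the junction table and backfill step, assuming hypothetical claims and scenarios tables with a current scenarios.claim_id foreign key (all names are assumptions about the schema):

    # Illustrative migration from one-to-many (scenarios.claim_id) to many-to-many.

    CREATE_JUNCTION = """
    CREATE TABLE IF NOT EXISTS claim_scenarios (
        claim_id    bigint NOT NULL REFERENCES claims(id)    ON DELETE CASCADE,
        scenario_id bigint NOT NULL REFERENCES scenarios(id) ON DELETE CASCADE,
        PRIMARY KEY (claim_id, scenario_id)
    );
    """

    BACKFILL = """
    -- Preserve the existing one-to-many links before retiring scenarios.claim_id
    INSERT INTO claim_scenarios (claim_id, scenario_id)
    SELECT claim_id, id FROM scenarios WHERE claim_id IS NOT NULL
    ON CONFLICT DO NOTHING;
    """

    QUERY = """
    -- "Apply this scenario to other claims" becomes a simple join
    SELECT c.id, c.title
    FROM claims c
    JOIN claim_scenarios cs ON cs.claim_id = c.id
    WHERE cs.scenario_id = %s;
    """

    if __name__ == "__main__":
        print(CREATE_JUNCTION, BACKFILL, QUERY)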

6. Add Full Versioning System

Current: Simple audit trail (before/after values, who/when/why)
Add Full Versioning when:

  • ✅ Users request "see complete version history"
  • ✅ Users request "restore to specific previous version"
  • ✅ Need for branching and merging emerges
  • ✅ Collaborative editing requires conflict resolution

Metrics to monitor:

  • User feature requests for versioning
  • Manual rollback frequency
  • Edit conflict rate
  • Storage costs of full history

Before adding:

  • Design branching/merging strategy (see the schema sketch below)
  • Plan storage optimization (delta compression)
  • Consider UI/UX for version history
  • Estimate storage and performance impact

Implementation effort

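
For scoping, the gap between the current audit trail and full versioning can be made concrete with a schema sketch; the table and column names below are hypothetical:

    # Current approach (per the audit-trail description above): one row per change with
    # before/after values. Full versioning stores every revision and, if branching is ever
    # needed, a parent pointer. Names are illustrative only.

    AUDIT_ROW = """
    CREATE TABLE IF NOT EXISTS audit_log (
        id         bigserial PRIMARY KEY,
        entity     text   NOT NULL,          -- e.g. 'claim'
        entity_id  bigint NOT NULL,
        changed_by bigint NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now(),
        reason     text,
        before     jsonb,
        after      jsonb
    );
    """

    FULL_VERSIONING = """
    CREATE TABLE IF NOT EXISTS claim_revisions (
        id                 bigserial PRIMARY KEY,
        claim_id           bigint NOT NULL,
        parent_revision_id bigint REFERENCES claim_revisions(id),  -- enables branching/merging later
        content            jsonb  NOT NULL,                        -- or a delta against the parent
        created_by         bigint NOT NULL,
        created_at         timestamptz NOT NULL DEFAULT now()
    );
    -- "Restore to a previous version" = insert a new revision whose content copies
    -- the chosen older revision; the history itself stays immutable.
    """

    if __name__ == "__main__":
        print(AUDIT_ROW, FULL_VERSIONING)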

7. Add Graph Database

Current: Relational data model in PostgreSQL
Add Graph Database when:

  • ✅ Complex relationship queries become common
  • ✅ Need for multi-hop traversals (friend-of-friend, citation chains)
  • ✅ PostgreSQL recursive queries too slow
  • ✅ Graph algorithms needed (PageRank, community detection)

Metrics to monitor:

  • Relationship query patterns
  • Recursive query performance
  • Use cases requiring graph traversals
  • Query complexity growth

Before adding:

  • Try PostgreSQL recursive CTEs (example below)
  • Consider graph extensions for PostgreSQL
  • Profile slow relationship queries
  • Evaluate Neo4j vs alternatives

Implementation effort
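
A recursive CTE covers multi-hop traversals such as citation chains before a graph database is justified. The citations(source_claim_id, cited_claim_id) table below is an assumption for illustration:

    # Citation-chain traversal with a recursive CTE, bounded by a maximum depth.

    CITATION_CHAIN = """
    WITH RECURSIVE chain AS (
        SELECT cited_claim_id, 1 AS depth
        FROM citations
        WHERE source_claim_id = %(start_claim)s
      UNION
        SELECT c.cited_claim_id, chain.depth + 1
        FROM citations c
        JOIN chain ON c.source_claim_id = chain.cited_claim_id
        WHERE chain.depth < %(max_depth)s        -- bound the traversal
    )
    SELECT cited_claim_id, min(depth) AS hops
    FROM chain
    GROUP BY cited_claim_id
    ORDER BY hops;
    """

    # Usage with psycopg2 against a real database:
    #   cur.execute(CITATION_CHAIN, {"start_claim": 42, "max_depth": 4})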

8. Add Real-Time Collaboration

Current: Asynchronous edits with eventual consistency
Add Real-Time Collaboration when:

  • ✅ Users request simultaneous editing
  • ✅ Conflict resolution becomes a frequent issue
  • ✅ Need for live updates during editing sessions
  • ✅ Collaborative workflows common

Metrics to monitor:

  • Edit conflict frequency
  • User feature requests
  • Collaborative editing patterns
  • Average edit session duration

Before adding:

  • Design conflict resolution strategy (Operational Transform or CRDT; see the sketch below)
  • Consider WebSocket infrastructure
  • Plan UI/UX for real-time editing
  • Estimate server resource requirements

Implementation effort
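
The Operational Transform vs CRDT choice is the hard part of that list. A last-writer-wins register is about the smallest CRDT sketch that conveys the idea; field names are invented for illustration, and real collaborative text editing needs far more (OT or sequence CRDTs):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class LWWRegister:
        """Last-writer-wins register: a minimal CRDT for a single field."""
        value: str
        timestamp: float      # logical or wall-clock time of the last write
        node_id: str          # tiebreaker so concurrent writes converge identically

        def merge(self, other: "LWWRegister") -> "LWWRegister":
            # Every replica applies the same rule, so all replicas converge to the same value
            return max(self, other, key=lambda r: (r.timestamp, r.node_id))

    if __name__ == "__main__":
        a = LWWRegister("claim text, edited on node A", timestamp=10.0, node_id="a")
        b = LWWRegister("claim text, edited on node B", timestamp=10.5, node_id="b")
        assert a.merge(b) == b.merge(a)      # merge order does not matter
        print(a.merge(b).value)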

9. Add Machine Learning Pipeline

Current: Rule-based quality scoring and LLM-based analysis
Add ML Pipeline when:

  • ✅ Need for custom models beyond LLM APIs
  • ✅ Opportunity for specialized fine-tuning
  • ✅ Cost savings from specialized models
  • ✅ Real-time learning from user feedback

Metrics to monitor:

  • LLM API costs
  • Need for domain-specific models
  • Quality improvement opportunities
  • User feedback patterns

Before adding:

  • Collect training data (user feedback, corrections)
  • Experiment with fine-tuning approaches
  • Estimate cost savings vs infrastructure costs (worked example below)
  • Consider model hosting options

Implementation effort
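
The cost-savings estimate is mostly arithmetic. A toy break-even calculation follows; every number in it is a placeholder, not a real FactHarbor figure:

    def breakeven_claims_per_month(llm_cost_per_claim: float,
                                   finetuned_cost_per_claim: float,
                                   monthly_infra_cost: float) -> float:
        """Claims/month at which hosting a fine-tuned model beats paying the LLM API."""
        savings_per_claim = llm_cost_per_claim - finetuned_cost_per_claim
        if savings_per_claim <= 0:
            return float("inf")   # the specialized model never pays for itself
        return monthly_infra_cost / savings_per_claim

    if __name__ == "__main__":
        # Placeholder numbers: $0.05/claim via API, $0.01/claim self-hosted, $800/month for a GPU
        print(breakeven_claims_per_month(0.05, 0.01, 800.0))   # -> 20000.0 claims/month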

10. Add Blockchain/Web3 Integration

Current: Traditional database with audit logs
Add Blockchain when:

  • ✅ Need for immutable public audit trail
  • ✅ Decentralized verification demanded
  • ✅ Token economics would add value
  • ✅ Community governance requires voting
  • ✅ Cross-organization trust is critical

Metrics to monitor:

  • User requests for blockchain features
  • Need for external verification
  • Governance participation rate
  • Trust/verification requirements

Before adding:

  • Evaluate real vs perceived benefits (see the sketch below)
  • Consider costs (gas fees, infrastructure)
  • Design token economics carefully
  • Study successful Web3 content platforms

Implementation effort
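
Evaluating real vs perceived benefits often comes down to whether a tamper-evident log is enough without a blockchain. A hash-chained audit log in the existing database already gives tamper evidence, as this sketch shows (the record structure is illustrative):

    import hashlib, json

    def chain_entry(prev_hash: str, record: dict) -> dict:
        """Append-only, tamper-evident log entry: each entry commits to the previous hash.
        This gives verifiability without consensus, tokens, or gas fees."""
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        return {"prev_hash": prev_hash, "record": record, "hash": entry_hash}

    def verify(entries: list[dict]) -> bool:
        prev = "0" * 64
        for e in entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["prev_hash"] != prev or \
               e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

    if __name__ == "__main__":
        log, prev = [], "0" * 64
        for change in ({"claim": 1, "action": "edit"}, {"claim": 1, "action": "approve"}):
            entry = chain_entry(prev, change)
            log.append(entry)
            prev = entry["hash"]
        print(verify(log))   # True; tampering with any record breaks verification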

Decision Framework

For any complexity addition, ask:

Do we have data?

  • Metrics showing current system inadequate?
  • User requests documenting need?
  • Performance problems proven?

Have we exhausted simpler options?

  • Optimization of current system?
  • Configuration tuning?
  • Simple workarounds?

Do we understand the cost?

  • Implementation time realistic?
  • Ongoing maintenance burden?
  • Infrastructure costs?
  • Technical debt implications?

Is the timing right?

  • Core product stable?
  • Team capacity available?
  • User demand strong enough?

If all four answers are YES: Proceed with the complexity addition.
If any answer is NO: Defer and revisit later.
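
The four questions can also be encoded as an explicit checklist so every proposal has to answer them; a minimal sketch (the field names are ours, not an existing FactHarbor tool):

    from dataclasses import dataclass, fields

    @dataclass
    class ComplexityProposal:
        """One flag per question in the framework above; all must be True to proceed."""
        have_data: bool                  # metrics, user requests, proven performance problems
        simpler_options_exhausted: bool  # optimization, tuning, workarounds already tried
        cost_understood: bool            # implementation, maintenance, infra, tech debt
        timing_right: bool               # core stable, team capacity, demand strong enough

        def decision(self) -> str:
            answers = [getattr(self, f.name) for f in fields(self)]
            return "Proceed" if all(answers) else "Defer and revisit later"

    if __name__ == "__main__":
        print(ComplexityProposal(True, True, True, False).decision())   # Defer and revisit later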

Monitoring Dashboard

Recommended metrics to track:
Performance:

  • P95/P99 response times for all major operations
  • Database query performance
  • AKEL processing time
  • Search performance

Usage:

  • Active users (daily, weekly, monthly)
  • Claims processed per day
  • Search queries per day
  • Contribution rate

Costs:

  • Infrastructure costs per user
  • LLM API costs per claim
  • Storage costs per GB
  • Total operational costs

Quality:

  • Confidence score distribution
  • Evidence completeness
  • Source reliability trends
  • User satisfaction (surveys)

Community:

  • Active contributors
  • Moderation workload
  • Feature requests by category
  • Abuse incident rate
Quarterly Review Process

Every quarter, review:

  1. Metrics dashboard: Are any triggers close to thresholds?
  2. User feedback: What features are most requested?
  3. Performance: What's slowing down?
  4. Costs: What's most expensive?
  5. Team capacity: Can we handle new complexity?

Decision: Prioritize complexity additions based on:

  • Urgency (current pain vs future optimization)
  • Impact (user benefit vs internal efficiency)
  • Effort (quick wins vs major projects)
  • Dependencies (prerequisites needed)

Related Pages

  • Design Decisions
  • Architecture
  • Data Model

Remember

Build what you need now. Measure everything. Add complexity only when data proves it's necessary.
The best architecture is the simplest one that works for current needs. 🎯