Wiki source code of Architecture
Version 2.1 by Robert Schaub on 2025/12/18 12:54
| line-number | content | author | version |
|---|---|---|---|
| 1 | = Architecture = | ||
| 2 | FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**. | ||
| 3 | == 1. Core Principles == | ||
| 4 | * **AI-First**: AKEL (AI) is the primary system; humans supplement it | ||
| 5 | * **Publish by Default**: No centralized approval step (removed in V0.9.50); claims are published with confidence scores | ||
| 6 | * **System Over Data**: Fix algorithms, not individual outputs | ||
| 7 | * **Measure Everything**: Quality metrics drive improvements | ||
| 8 | * **Scale Through Automation**: Minimal human intervention | ||
| 9 | * **Start Simple**: Add complexity only when metrics prove necessary | ||
| 10 | == 2. High-Level Architecture == | ||
| 11 | {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}} | ||
| 12 | === 2.1 Three-Layer Architecture === | ||
| 13 | FactHarbor uses a clean three-layer architecture: | ||
| 14 | ==== Interface Layer ==== | ||
| 15 | Handles all user and system interactions: | ||
| 16 | * **Web UI**: Browse claims, view evidence, submit feedback | ||
| 17 | * **REST API**: Programmatic access for integrations | ||
| 18 | * **Authentication & Authorization**: User identity and permissions | ||
| 19 | * **Rate Limiting**: Protect against abuse | ||
| 20 | ==== Processing Layer ==== | ||
| 21 | Core business logic and AI processing: | ||
| 22 | * **AKEL Pipeline**: AI-driven claim analysis (parallel processing) | ||
| 23 | * Parse and extract claim components | ||
| 24 | * Gather evidence from multiple sources | ||
| 25 | * Check source track records | ||
| 26 | * Extract scenarios from evidence | ||
| 27 | * Synthesize verdicts | ||
| 28 | * Calculate risk scores | ||
| 29 | * **Background Jobs**: Automated maintenance tasks | ||
| 30 | * Source track record updates (weekly) | ||
| 31 | * Cache warming and invalidation | ||
| 32 | * Metrics aggregation | ||
| 33 | * Data archival | ||
| 34 | * **Quality Monitoring**: Automated quality checks | ||
| 35 | * Anomaly detection | ||
| 36 | * Contradiction detection | ||
| 37 | * Completeness validation | ||
| 38 | * **Moderation Detection**: Automated abuse detection | ||
| 39 | * Spam identification | ||
| 40 | * Manipulation detection | ||
| 41 | * Flag suspicious activity | ||
| 42 | ==== Data & Storage Layer ==== | ||
| 43 | Persistent data storage and caching: | ||
| 44 | * **PostgreSQL**: Primary database for all core data | ||
| 45 | * Claims, evidence, sources, users | ||
| 46 | * Scenarios, edits, audit logs | ||
| 47 | * Built-in full-text search | ||
| 48 | * Time-series capabilities for metrics | ||
| 49 | * **Redis**: High-speed caching layer | ||
| 50 | * Session data | ||
| 51 | * Frequently accessed claims | ||
| 52 | * API rate limiting | ||
| 53 | * **S3 Storage**: Long-term archival | ||
| 54 | * Edit history older than 90 days | ||
| 55 | * AKEL processing logs | ||
| 56 | * Backup snapshots | ||
| 57 | **Optional future additions** (add only when metrics prove necessary): | ||
| 58 | * **Elasticsearch**: If PostgreSQL full-text search becomes slow | ||
| 59 | * **TimescaleDB**: If metrics queries become a bottleneck | ||
| 60 | === 2.2 Design Philosophy === | ||
| 61 | **Start Simple, Evolve Based on Metrics** | ||
| 62 | The architecture deliberately starts simple: | ||
| 63 | * Single primary database (PostgreSQL handles most workloads initially) | ||
| 64 | * Three clear layers (easy to understand and maintain) | ||
| 65 | * Automated operations (minimal human intervention) | ||
| 66 | * Measure before optimizing (add complexity only when proven necessary) | ||
| 67 | See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale. | ||
| 68 | == 3. AKEL Architecture == | ||
| 69 | {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}} | ||
| 70 | See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information. | ||
| 71 | == 4. Storage Architecture == | ||
| 72 | {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}} | ||
| 73 | See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information. | ||
| 74 | == 4.5 Versioning Architecture == | ||
| 75 | {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}} | ||
| 76 | == 5. Automated Systems in Detail == | ||
| 77 | FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works: | ||
| 78 | === 5.1 AKEL (AI Knowledge Extraction Layer) === | ||
| 79 | **What it does**: Primary AI processing engine that analyzes claims automatically | ||
| 80 | **Inputs**: | ||
| 81 | * User-submitted claim text | ||
| 82 | * Existing evidence and sources | ||
| 83 | * Source track record database | ||
| 84 | **Processing steps**: | ||
| 85 | 1. **Parse & Extract**: Identify key components, entities, assertions | ||
| 86 | 2. **Gather Evidence**: Search web and database for relevant sources | ||
| 87 | 3. **Check Sources**: Evaluate source reliability using track records | ||
| 88 | 4. **Extract Scenarios**: Identify different contexts from evidence | ||
| 89 | 5. **Synthesize Verdict**: Compile evidence assessment per scenario | ||
| 90 | 6. **Calculate Risk**: Assess potential harm and controversy | ||
| 91 | **Outputs**: | ||
| 92 | * Structured claim record | ||
| 93 | * Evidence links with relevance scores | ||
| 94 | * Scenarios with context descriptions | ||
| 95 | * Verdict summary per scenario | ||
| 96 | * Overall confidence score | ||
| 97 | * Risk assessment | ||
| 98 | **Timing**: 10-18 seconds total (parallel processing) | ||
| 99 | === 5.2 Background Jobs === | ||
| 100 | **Source Track Record Updates** (Weekly): | ||
| 101 | * Analyze claim outcomes from past week | ||
| 102 | * Calculate source accuracy and reliability | ||
| 103 | * Update source_track_record table | ||
| 104 | * Never triggered by individual claims (prevents circular dependencies) | ||
| 105 | **Cache Management** (Continuous): | ||
| 106 | * Warm cache for popular claims | ||
| 107 | * Invalidate cache on claim updates | ||
| 108 | * Monitor cache hit rates | ||
| 109 | **Metrics Aggregation** (Hourly): | ||
| 110 | * Roll up detailed metrics | ||
| 111 | * Calculate system health indicators | ||
| 112 | * Generate performance reports | ||
| 113 | **Data Archival** (Daily): | ||
| 114 | * Move AKEL logs older than 90 days to S3 | ||
| 115 | * Archive old edit history | ||
| 116 | * Compress and backup data | ||
| 117 | === 5.3 Quality Monitoring === | ||
| 118 | **Automated checks run continuously**: | ||
| 119 | * **Anomaly Detection**: Flag unusual patterns | ||
| 120 | * Sudden confidence score changes | ||
| 121 | * Unusual evidence distributions | ||
| 122 | * Suspicious source patterns | ||
| 123 | * **Contradiction Detection**: Identify conflicts | ||
| 124 | * Evidence that contradicts other evidence | ||
| 125 | * Claims with internal contradictions | ||
| 126 | * Source track record anomalies | ||
| 127 | * **Completeness Validation**: Ensure thoroughness | ||
| 128 | * Sufficient evidence gathered | ||
| 129 | * Multiple source types represented | ||
| 130 | * Key scenarios identified | ||
| 131 | === 5.4 Moderation Detection === | ||
| 132 | **Automated abuse detection**: | ||
| 133 | * **Spam Identification**: Pattern matching for spam claims | ||
| 134 | * **Manipulation Detection**: Identify coordinated editing | ||
| 135 | * **Gaming Detection**: Flag attempts to game source scores | ||
| 136 | * **Suspicious Activity**: Log unusual behavior patterns | ||
| 137 | **Human Review**: Moderators review flagged items, and the system learns from their decisions | ||
| 138 | == 6. Scalability Strategy == | ||
| 139 | === 6.1 Horizontal Scaling === | ||
| 140 | Components scale independently: | ||
| 141 | * **AKEL Workers**: Add more processing workers as claim volume grows | ||
| 142 | * **Database Read Replicas**: Add replicas for read-heavy workloads | ||
| 143 | * **Cache Layer**: Redis cluster for distributed caching | ||
| 144 | * **API Servers**: Load-balanced API instances | ||
| 145 | === 6.2 Vertical Scaling === | ||
| 146 | Individual components can be upgraded: | ||
| 147 | * **Database Server**: Increase CPU/RAM for PostgreSQL | ||
| 148 | * **Cache Memory**: Expand Redis memory | ||
| 149 | * **Worker Resources**: More powerful AKEL worker machines | ||
| 150 | === 6.3 Performance Optimization === | ||
| 151 | Built-in optimizations: | ||
| 152 | * **Denormalized Data**: Cache summary data in claim records (70% fewer joins) | ||
| 153 | * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster) | ||
| 154 | * **Intelligent Caching**: Redis caches frequently accessed data | ||
| 155 | * **Background Processing**: Non-urgent tasks run asynchronously | ||
| 156 | == 7. Monitoring & Observability == | ||
| 157 | === 7.1 Key Metrics === | ||
| 158 | The system tracks: | ||
| 159 | * **Performance**: AKEL processing time, API response time, cache hit rate | ||
| 160 | * **Quality**: Confidence score distribution, evidence completeness, contradiction rate | ||
| 161 | * **Usage**: Claims per day, active users, API requests | ||
| 162 | * **Errors**: Failed AKEL runs, API errors, database issues | ||
| 163 | === 7.2 Alerts === | ||
| 164 | Automated alerts for: | ||
| 165 | * Processing time >30 seconds (threshold breach) | ||
| 166 | * Error rate >1% (quality issue) | ||
| 167 | * Cache hit rate <80% (cache problem) | ||
| 168 | * Database connections >80% capacity (scaling needed) | ||
| 169 | === 7.3 Dashboards === | ||
| 170 | Real-time monitoring: | ||
| 171 | * **System Health**: Overall status and key metrics | ||
| 172 | * **AKEL Performance**: Processing time breakdown | ||
| 173 | * **Quality Metrics**: Confidence scores, completeness | ||
| 174 | * **User Activity**: Usage patterns, peak times | ||
| 175 | == 8. Security Architecture == | ||
| 176 | === 8.1 Authentication & Authorization === | ||
| 177 | * **User Authentication**: Secure login with password hashing | ||
| 178 | * **Role-Based Access**: Reader, Contributor, Moderator, Admin | ||
| 179 | * **API Keys**: For programmatic access | ||
| 180 | * **Rate Limiting**: Prevent abuse | ||
| 181 | === 8.2 Data Security === | ||
| 182 | * **Encryption**: TLS for transport, encrypted storage for sensitive data | ||
| 183 | * **Audit Logging**: Track all significant changes | ||
| 184 | * **Input Validation**: Sanitize all user inputs | ||
| 185 | * **SQL Injection Protection**: Parameterized queries | ||
| 186 | === 8.3 Abuse Prevention === | ||
| 187 | * **Rate Limiting**: Prevent flooding and DDoS | ||
| 188 | * **Automated Detection**: Flag suspicious patterns | ||
| 189 | * **Human Review**: Moderators investigate flagged content | ||
| 190 | * **Ban Mechanisms**: Block abusive users/IPs | ||
| 191 | == 9. Deployment Architecture == | ||
| 192 | === 9.1 Production Environment === | ||
| 193 | **Components**: | ||
| 194 | * Load Balancer (HAProxy or cloud LB) | ||
| 195 | * Multiple API servers (stateless) | ||
| 196 | * AKEL worker pool (auto-scaling) | ||
| 197 | * PostgreSQL primary + read replicas | ||
| 198 | * Redis cluster | ||
| 199 | * S3-compatible storage | ||
| 200 | **Regions**: Single region for V1.0, multi-region when needed | ||
| 201 | === 9.2 Development & Staging === | ||
| 202 | **Development**: Local Docker Compose setup | ||
| 203 | **Staging**: Scaled-down production replica | ||
| 204 | **CI/CD**: Automated testing and deployment | ||
| 205 | === 9.3 Disaster Recovery === | ||
| 206 | * **Database Backups**: Daily automated backups to S3 | ||
| 207 | * **Point-in-Time Recovery**: Transaction log archival | ||
| 208 | * **Replication**: Real-time replication to standby | ||
| 209 | * **Recovery Time Objective**: <4 hours | ||
| 210 | |||
| 211 | === 9.4 Federation Architecture Diagram === | ||
| 212 | |||
| 213 | {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}} | ||
| 214 | |||
| 215 | == 10. Future Architecture Evolution == | ||
| 216 | === 10.1 When to Add Complexity === | ||
| 217 | See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers. | ||
| 218 | * **Elasticsearch**: When PostgreSQL search latency consistently exceeds 500 ms | ||
| 219 | * **TimescaleDB**: When metrics queries consistently take longer than 1 s | ||
| 220 | * **Federation**: When the platform has 10,000+ users and there is explicit demand | ||
| 221 | * **Complex Reputation**: When there are 100+ active contributors | ||
| 222 | === 10.2 Federation (V2.0+) === | ||
| 223 | **Deferred until**: | ||
| 224 | * Core product proven with 10,000+ users | ||
| 225 | * User demand for decentralization | ||
| 226 | * Single-node limits reached | ||
| 227 | See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans. | ||
| 228 | == 11. Technology Stack Summary == | ||
| 229 | **Backend**: | ||
| 230 | * Python (FastAPI or Django) | ||
| 231 | * PostgreSQL (primary database) | ||
| 232 | * Redis (caching) | ||
| 233 | **Frontend**: | ||
| 234 | * Modern JavaScript framework (React, Vue, or Svelte) | ||
| 235 | * Server-side rendering for SEO | ||
| 236 | **AI/LLM**: | ||
| 237 | * Multi-provider orchestration (Claude, GPT-4, local models) | ||
| 238 | * Fallback and cross-checking support | ||
| 239 | **Infrastructure**: | ||
| 240 | * Docker containers | ||
| 241 | * Kubernetes or cloud platform auto-scaling | ||
| 242 | * S3-compatible object storage | ||
| 243 | **Monitoring**: | ||
| 244 | * Prometheus + Grafana | ||
| 245 | * Structured logging (ELK or cloud logging) | ||
| 246 | * Error tracking (Sentry) | ||
| 247 | == 12. Related Pages == | ||
| 248 | * [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] | ||
| 249 | * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] | ||
| 250 | * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]] | ||
| 251 | * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]] | ||
| 252 | * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] | ||
| 253 | * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] |
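
The alert thresholds listed in section 7.2 can be read as a small rule table. The following is a minimal sketch in Python (the backend language named in section 11), not FactHarbor's actual alerting code. The names `MetricsSnapshot`, `ALERT_RULES`, and `check_alerts` are hypothetical, and the metric field names and the use of fractions for percentages are assumptions. Only the four threshold values come from this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class MetricsSnapshot:
    """Hypothetical point-in-time metrics; field names are assumptions."""
    processing_time_s: float    # AKEL processing time, in seconds
    error_rate: float           # fraction of failed requests, 0.0 to 1.0
    cache_hit_rate: float       # fraction of cache hits, 0.0 to 1.0
    db_connection_usage: float  # fraction of connection capacity in use


# (alert label, predicate) pairs. Threshold values are taken from section 7.2;
# comparisons are strict, matching the ">" and "<" used there.
ALERT_RULES: List[Tuple[str, Callable[[MetricsSnapshot], bool]]] = [
    ("Processing time >30 seconds", lambda m: m.processing_time_s > 30),
    ("Error rate >1%", lambda m: m.error_rate > 0.01),
    ("Cache hit rate <80%", lambda m: m.cache_hit_rate < 0.80),
    ("Database connections >80% capacity", lambda m: m.db_connection_usage > 0.80),
]


def check_alerts(snapshot: MetricsSnapshot) -> List[str]:
    """Return the label of every rule that the snapshot breaches."""
    return [label for label, breached in ALERT_RULES if breached(snapshot)]


if __name__ == "__main__":
    # Illustrative values only: slow processing and a cold cache.
    degraded = MetricsSnapshot(
        processing_time_s=42.0,
        error_rate=0.002,
        cache_hit_rate=0.60,
        db_connection_usage=0.45,
    )
    for label in check_alerts(degraded):
        print("ALERT:", label)
```

Keeping the rules in one list means a threshold change touches a single line, which fits the "System Over Data" principle. In a deployment that uses Prometheus + Grafana, as section 11 suggests, the same conditions would more likely live in the monitoring system's own alert configuration than in application code.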