Wiki source code of Architecture
Last modified by Robert Schaub on 2026/02/08 08:23
= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); publish with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove it necessary
== 2. High-Level Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.High-Level Architecture.WebHome"/}}
=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:
==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse
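Rate limiting at the interface layer can be sketched as a per-client token bucket: each client gets a burst allowance that refills over time. This is an illustrative in-memory sketch, not FactHarbor's actual limiter; `TokenBucket`, its parameters, and the chosen limits are all assumptions (production would typically keep the counters in Redis, as noted in the storage layer below).

```python
import time

class TokenBucket:
    """Per-client token bucket: up to `capacity` requests in a burst,
    refilled at `rate` tokens per second. Hypothetical sketch only."""

    def __init__(self, capacity: int = 10, rate: float = 5.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of 5 requests against a capacity of 3 (slow refill):
bucket = TokenBucket(capacity=3, rate=0.1)
results = [bucket.allow() for _ in range(5)]
```

The first three requests pass and the remaining two are rejected until the bucket refills; a server would map rejections to HTTP 429.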
==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity
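The AKEL pipeline steps above can be sketched as an async flow in which independent stages run concurrently. All function names and return shapes here are illustrative stand-ins (the real stages call LLMs and search backends), not FactHarbor's actual API:

```python
import asyncio

# Stub stages standing in for LLM and search calls.
async def parse_claim(text):       return {"claim": text, "entities": []}
async def gather_evidence(parsed): return [{"source": "example.org", "snippet": "..."}]
async def check_sources(ev):       return [{**e, "reliability": 0.8} for e in ev]
async def extract_scenarios(ev):   return [{"context": "general", "evidence": ev}]
async def synthesize_verdict(sc):  return {"verdict": "supported", "scenarios": sc}
async def calculate_risk(parsed):  return {"risk": "low"}

async def akel_pipeline(text):
    parsed = await parse_claim(text)
    # Evidence gathering and risk scoring don't depend on each other,
    # so they run concurrently -- the source of the pipeline's speedup.
    evidence, risk = await asyncio.gather(
        gather_evidence(parsed), calculate_risk(parsed))
    scored = await check_sources(evidence)
    scenarios = await extract_scenarios(scored)
    verdict = await synthesize_verdict(scenarios)
    return {**verdict, **risk}

result = asyncio.run(akel_pipeline("Water boils at 100 C at sea level."))
```

Stages with a data dependency (sources depend on evidence, verdicts on scenarios) stay sequential; only independent stages are gathered.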
==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots

**Optional future additions** (add only when metrics prove them necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck
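"Frequently accessed claims" in Redis implies a cache-aside pattern: read the cache first, fall back to PostgreSQL on a miss, and populate the cache with a TTL. The sketch below is a minimal illustration under assumptions — `FakeRedis` is an in-memory stand-in (mimicking Redis `SETEX`/`GET`) so it runs without a server, and the key format and TTL are invented:

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis setex/get with TTL."""
    def __init__(self):
        self._store = {}
    def setex(self, key, ttl, value):
        self._store[key] = (value, time.monotonic() + ttl)
    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.monotonic() > expires:
            del self._store[key]   # lazy expiry
            return None
        return value

cache = FakeRedis()
db_reads = 0

def load_claim_from_db(claim_id):
    global db_reads
    db_reads += 1                  # count trips to the "database"
    return f"claim-record-{claim_id}"

def get_claim(claim_id, ttl=300):
    key = f"claim:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached              # cache hit
    record = load_claim_from_db(claim_id)
    cache.setex(key, ttl, record)  # populate on miss
    return record

first = get_claim(42)   # miss -> reads the database
second = get_claim(42)  # hit  -> served from cache
```

The second read never touches the database, which is the behavior the cache hit rate metric (Section 7) measures.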
=== 2.2 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**

The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: evidence, scenarios, sources, verdict, risk
* Parallelized across all claims
* Results aggregated for presentation
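The two-phase POC flow can be sketched as extract-then-fan-out. Both phase functions are stand-ins for LLM calls (the real Phase 1 does far more than sentence splitting), and the return shapes are assumptions:

```python
import asyncio

async def extract_claims(text):
    # Phase 1 stand-in: the real system uses an LLM; here we naively
    # treat each sentence as one claim.
    return [s.strip() for s in text.split(".") if s.strip()]

async def analyze_claim(claim):
    # Phase 2 stand-in for the single LLM call per claim.
    return {"claim": claim, "verdict": "unverified", "confidence": 0.5}

async def process_submission(text):
    claims = await extract_claims(text)                       # Phase 1
    # Phase 2: every claim analyzed concurrently, results in order.
    return await asyncio.gather(*map(analyze_claim, claims))

results = asyncio.run(process_submission(
    "The Earth is round. The Moon is made of cheese."))
```

Because `asyncio.gather` fans out all per-claim calls at once, total latency is bounded by the slowest claim rather than the sum of all claims.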
**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review
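The "low-confidence cases routed to human review" step in Phase 3 amounts to a threshold check at the end of the pipeline. The cutoff value and field names below are illustrative assumptions, not documented FactHarbor values:

```python
REVIEW_THRESHOLD = 0.6  # illustrative cutoff, not a documented value

def route_verdict(verdict: dict) -> str:
    """Send low-confidence verdicts to human review; publish the rest."""
    if verdict["confidence"] < REVIEW_THRESHOLD:
        return "human_review"
    return "publish"

routes = [route_verdict(v) for v in (
    {"claim": "A", "confidence": 0.92},
    {"claim": "B", "confidence": 0.41},
)]
```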
=== Architectural Benefits ===

**Scalability:**

* Process 100 claims with 3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points
== 4. Storage Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Storage Architecture.WebHome"/}}

See [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] for detailed information.

== 4.5 Versioning Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Versioning Architecture.WebHome"/}}
== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here is how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===
**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
4. **Extract Scenarios**: Identify different contexts from evidence
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)
=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
* Never triggered by individual claims (prevents circular dependencies)

**Cache Management** (Continuous):

* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates

**Metrics Aggregation** (Hourly):

* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports

**Data Archival** (Daily):

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data
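The daily archival job's core decision is the 90-day cutoff. A minimal sketch of that selection step, assuming log records carry a `created_at` timestamp (the record shape and function names are illustrative; the real job would then upload the `archive` set to S3):

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER_DAYS = 90

def partition_logs(logs, now=None):
    """Split logs into (keep, archive) around the 90-day cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=ARCHIVE_AFTER_DAYS)
    keep = [l for l in logs if l["created_at"] >= cutoff]
    archive = [l for l in logs if l["created_at"] < cutoff]
    return keep, archive

now = datetime(2026, 2, 8, tzinfo=timezone.utc)
logs = [
    {"id": 1, "created_at": now - timedelta(days=10)},   # recent
    {"id": 2, "created_at": now - timedelta(days=120)},  # past cutoff
]
keep, archive = partition_logs(logs, now=now)
```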
=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified
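The completeness validation above can be sketched as a rule list that returns named failures for a processed claim. The thresholds, field names, and failure labels are illustrative assumptions, not documented values:

```python
def completeness_check(claim, min_evidence=3, min_source_types=2):
    """Return the list of completeness failures for a processed claim."""
    problems = []
    if len(claim["evidence"]) < min_evidence:
        problems.append("insufficient_evidence")        # too little evidence
    if len({e["source_type"] for e in claim["evidence"]}) < min_source_types:
        problems.append("too_few_source_types")         # one-sided sourcing
    if not claim["scenarios"]:
        problems.append("no_scenarios")                 # no contexts identified
    return problems

claim = {
    "evidence": [{"source_type": "news"}, {"source_type": "news"}],
    "scenarios": [],
}
issues = completeness_check(claim)
```

An empty result means the claim passes; any non-empty result could block publication or trigger reprocessing.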
=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items, and the system learns from their decisions.
== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances
=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines
=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes steps in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously
== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

The system tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues
=== 7.2 Alerts ===

Automated alerts for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% of capacity (scaling needed)
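The four alert rules above reduce to a threshold table plus a comparison direction (some metrics alert when too high, cache hit rate when too low). A minimal evaluation sketch, assuming these metric names (the real deployment would express the same rules in Prometheus alerting syntax):

```python
# (direction, limit) per metric; thresholds taken from the list above.
THRESHOLDS = {
    "processing_seconds":  ("gt", 30),
    "error_rate":          ("gt", 0.01),
    "cache_hit_rate":      ("lt", 0.80),
    "db_connection_usage": ("gt", 0.80),
}

def check_alerts(metrics: dict) -> list:
    """Return the names of metrics that breach their thresholds."""
    fired = []
    for name, (op, limit) in THRESHOLDS.items():
        value = metrics[name]
        if (op == "gt" and value > limit) or (op == "lt" and value < limit):
            fired.append(name)
    return fired

alerts = check_alerts({
    "processing_seconds": 12,
    "error_rate": 0.03,          # breaches the 1% limit
    "cache_hit_rate": 0.91,
    "db_connection_usage": 0.55,
})
```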
=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times
== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse
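If the four roles form a strict hierarchy (an assumption — the source lists the roles but not their exact relationship), a permission check can be a rank comparison:

```python
# Assumed hierarchy: each role inherits the permissions below it.
ROLE_RANK = {"reader": 0, "contributor": 1, "moderator": 2, "admin": 3}

def can(user_role: str, required_role: str) -> bool:
    """True if user_role ranks at or above required_role."""
    return ROLE_RANK[user_role] >= ROLE_RANK[required_role]

moderator_can_edit = can("moderator", "contributor")
reader_can_moderate = can("reader", "moderator")
```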
=== 8.2 Data Security ===

* **Encryption**: TLS in transit, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
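Parameterized queries bind user input as data rather than splicing it into SQL text. A self-contained sketch using `sqlite3` as a stand-in for PostgreSQL so it runs anywhere (with `psycopg` the placeholder style would be `%s` instead of `?`; table and function names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO claims (text) VALUES (?)", ("Water is wet",))

def find_claims(conn, user_input):
    # The driver binds user_input as a value, never as SQL, so a payload
    # like "' OR 1=1 --" cannot alter the query's structure.
    cur = conn.execute("SELECT id, text FROM claims WHERE text = ?",
                       (user_input,))
    return cur.fetchall()

safe = find_claims(conn, "Water is wet")
injected = find_claims(conn, "' OR 1=1 --")  # matches nothing
```

String-formatted SQL (`f"... WHERE text = '{user_input}'"`) would have returned every row for the injection payload; the bound version returns none.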
=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs
== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed
=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment
=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to a standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Federation Architecture.WebHome"/}}
== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

**Elasticsearch**: When PostgreSQL search is consistently >500ms
**TimescaleDB**: When metrics queries are consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>Archive.FactHarbor 2026\.01\.20.Specification.Federation & Decentralization.WebHome]] for future plans.
== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)

**Frontend**:

* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO

**AI/LLM**:

* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support

**Infrastructure**:

* Docker containers
* Kubernetes or cloud-platform auto-scaling
* S3-compatible object storage

**Monitoring**:

* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)
1.1 | 403 | == 12. Related Pages == |
| |
1.3 | 404 | |
| |
1.11 | 405 | * [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] |
| |
1.12 | 406 | * [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] |
| |
1.13 | 407 | * [[Data Model>>Archive.FactHarbor 2026\.01\.20.Specification.Data Model.WebHome]] |
| |
1.12 | 408 | * [[API Layer>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] |
| |
1.1 | 409 | * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] |
| 410 | * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] |