= Architecture =
FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
== 1. Core Principles ==
* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); claims are published with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove necessary
== 2. High-Level Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
=== 2.1 Three-Layer Architecture ===
FactHarbor uses a clean three-layer architecture:
==== Interface Layer ====
Handles all user and system interactions:
* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse
==== Processing Layer ====
Core business logic and AI processing:
* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
* Parse and extract claim components
* Gather evidence from multiple sources
* Check source track records
* Extract scenarios from evidence
* Synthesize verdicts
* Calculate risk scores

* **LLM Abstraction Layer**: Provider-agnostic AI access
* Multi-provider support (Anthropic, OpenAI, Google, local models)
* Automatic failover and rate limit handling
* Per-stage model configuration
* Cost optimization through provider selection
* No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
* Source track record updates (weekly)
* Cache warming and invalidation
* Metrics aggregation
* Data archival
* **Quality Monitoring**: Automated quality checks
* Anomaly detection
* Contradiction detection
* Completeness validation
* **Moderation Detection**: Automated abuse detection
* Spam identification
* Manipulation detection
* Flag suspicious activity
==== Data & Storage Layer ====
Persistent data storage and caching:
* **PostgreSQL**: Primary database for all core data
* Claims, evidence, sources, users
* Scenarios, edits, audit logs
* Built-in full-text search
* Time-series capabilities for metrics
* **Redis**: High-speed caching layer
* Session data
* Frequently accessed claims
* API rate limiting
* **S3 Storage**: Long-term archival
* Old edit history (90+ days)
* AKEL processing logs
* Backup snapshots
**Optional future additions** (add only when metrics prove necessary):
* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 LLM Abstraction Layer ===

{{include reference="FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
| 72 | |||
| 73 | **Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection. | ||
| 74 | |||
| 75 | **Multi-Provider Support:** | ||
| 76 | * **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis) | ||
| 77 | * **Secondary:** OpenAI GPT API (automatic failover) | ||
| 78 | * **Tertiary:** Google Vertex AI / Gemini | ||
| 79 | * **Future:** Local models (Llama, Mistral) for on-premises deployments | ||
| 80 | |||
| 81 | **Provider Interface:** | ||
| 82 | * Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods | ||
| 83 | * Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet) | ||
| 84 | * Environment variable and database configuration | ||
| 85 | * Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider) | ||
| 86 | |||
**Configuration:**
* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
* Support for rate limit handling and cost tracking

**Failover Strategy:**
* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checking and provider availability monitoring
* Graceful degradation when all providers are unavailable

**Cost Optimization:**
* Track and compare costs across providers per request
* Enable A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost-efficiency
* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at a 0% cache hit rate

**Architecture Pattern:**

{{code}}
AKEL Stages          LLM Abstraction          Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→ Provider Interface ──→ Anthropic (PRIMARY)
Stage 2 Analyze  ──→ Configuration      ──→ OpenAI (SECONDARY)
Stage 3 Holistic ──→ Failover Handler   ──→ Google (TERTIARY)
                                         └→ Local Models (FUTURE)
{{/code}}

**Benefits:**
* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers for data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===
**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:
* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)
See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
| 142 | |||
| 143 | == 3.5 Claim Processing Architecture == | ||
| 144 | |||
| 145 | FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently. | ||
| 146 | |||
| 147 | === Multi-Claim Handling === | ||
| 148 | |||
| 149 | Users often submit: | ||
| 150 | * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims | ||
| 151 | * **Web pages**: URLs that are analyzed to extract all verifiable claims | ||
| 152 | * **Single claims**: Simple, direct factual statements | ||
| 153 | |||
| 154 | The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content. | ||
| 155 | |||
| 156 | === Processing Phases === | ||
| 157 | |||
| 158 | **POC Implementation (Two-Phase):** | ||
| 159 | |||
| 160 | Phase 1 - Claim Extraction: | ||
| 161 | * LLM analyzes submitted content | ||
| 162 | * Extracts all distinct, verifiable claims | ||
| 163 | * Returns structured list of claims with context | ||
| 164 | |||
| 165 | Phase 2 - Parallel Analysis: | ||
| 166 | * Each claim processed independently by LLM | ||
| 167 | * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk | ||
| 168 | * Parallelized across all claims | ||
| 169 | * Results aggregated for presentation | ||
| 170 | |||
**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:
* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):
* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):
* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**
* Process 100 claims with ~3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count
**Quality:**
* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**
* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
== 4.5 Versioning Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
== 5. Automated Systems in Detail ==
FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
=== 5.1 AKEL (AI Knowledge Extraction Layer) ===
**What it does**: Primary AI processing engine that analyzes claims automatically
**Inputs**:
* User-submitted claim text
* Existing evidence and sources
* Source track record database
**Processing steps**:
1. **Parse & Extract**: Identify key components, entities, assertions
1. **Gather Evidence**: Search web and database for relevant sources
1. **Check Sources**: Evaluate source reliability using track records
1. **Extract Scenarios**: Identify different contexts from evidence
1. **Synthesize Verdict**: Compile evidence assessment per scenario
1. **Calculate Risk**: Assess potential harm and controversy
**Outputs**:
* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment
**Timing**: 10-18 seconds total (parallel processing)
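
For illustration, the outputs above could travel through the pipeline as a single record like the following; the field names are hypothetical, not the actual schema (see the Data Model page for that).

{{code language="python"}}
from dataclasses import dataclass, field

@dataclass
class AkelResult:
    claim_id: str
    evidence: list[dict] = field(default_factory=list)       # links with relevance scores
    scenarios: list[dict] = field(default_factory=list)      # contexts extracted from evidence
    verdict_by_scenario: dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0                                   # overall confidence score
    risk: str = "unassessed"                                  # risk assessment
{{/code}}
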
=== 5.2 Background Jobs ===
**Source Track Record Updates** (Weekly):
* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
* Never triggered by individual claims (prevents circular dependencies)
**Cache Management** (Continuous):
* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates
**Metrics Aggregation** (Hourly):
* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports
**Data Archival** (Daily):
* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data
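
If a scheduler such as Celery beat were used (this page does not prescribe one), the cadences above could be declared roughly as follows; task names and run times are assumptions.

{{code language="python"}}
from celery import Celery
from celery.schedules import crontab

app = Celery("factharbor")

app.conf.beat_schedule = {
    "source-track-record-update": {      # weekly; never triggered by individual claims
        "task": "jobs.update_source_track_records",
        "schedule": crontab(day_of_week="sunday", hour=3, minute=0),
    },
    "metrics-aggregation": {             # hourly roll-up
        "task": "jobs.aggregate_metrics",
        "schedule": crontab(minute=0),
    },
    "data-archival": {                   # daily; moves 90+ day data to S3
        "task": "jobs.archive_old_data",
        "schedule": crontab(hour=2, minute=30),
    },
}
{{/code}}
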
=== 5.3 Quality Monitoring ===
**Automated checks run continuously**:
* **Anomaly Detection**: Flag unusual patterns
* Sudden confidence score changes (see the example below)
* Unusual evidence distributions
* Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
* Evidence that contradicts other evidence
* Claims with internal contradictions
* Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
* Sufficient evidence gathered
* Multiple source types represented
* Key scenarios identified
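
A toy version of the "sudden confidence score changes" check referenced above; the 0.3 jump threshold is an arbitrary placeholder.

{{code language="python"}}
def confidence_jump_anomaly(previous: float, current: float, max_jump: float = 0.3) -> bool:
    """Flag a claim whose overall confidence moved more than max_jump in a single update."""
    return abs(current - previous) > max_jump
{{/code}}
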
=== 5.4 Moderation Detection ===
**Automated abuse detection**:
* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns
**Human Review**: Moderators review flagged items; the system learns from their decisions
== 6. Scalability Strategy ==
=== 6.1 Horizontal Scaling ===
Components scale independently:
* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances
=== 6.2 Vertical Scaling ===
Individual components can be upgraded:
* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines
=== 6.3 Performance Optimization ===
Built-in optimizations:
* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data (see the sketch below)
* **Background Processing**: Non-urgent tasks run asynchronously
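
The intelligent-caching item above follows a standard cache-aside pattern. A sketch using redis-py, where the key format and one-hour TTL are illustrative choices:

{{code language="python"}}
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
CLAIM_TTL_SECONDS = 3600  # illustrative TTL

def get_claim(claim_id: str, load_from_db) -> dict:
    """Serve a claim from Redis when cached, otherwise load it from PostgreSQL and cache it."""
    key = f"claim:{claim_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    claim = load_from_db(claim_id)
    r.setex(key, CLAIM_TTL_SECONDS, json.dumps(claim))
    return claim
{{/code}}

Invalidation on claim updates (handled by the cache-management background job) would simply delete the `claim:{claim_id}` key.
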
== 7. Monitoring & Observability ==
=== 7.1 Key Metrics ===
The system tracks:
* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues
=== 7.2 Alerts ===
Automated alerts for:
* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)
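
Expressed as code, those thresholds amount to a small rule table; the metric names here are illustrative and would map onto whatever the monitoring stack exports.

{{code language="python"}}
ALERT_RULES = {
    "akel_processing_seconds": lambda v: v > 30,        # processing time threshold breach
    "error_rate": lambda v: v > 0.01,                   # error rate above 1%
    "cache_hit_rate": lambda v: v < 0.80,               # cache hit rate below 80%
    "db_connection_utilization": lambda v: v > 0.80,    # >80% of connection capacity
}

def breached(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics currently violating an alert rule."""
    return [name for name, rule in ALERT_RULES.items()
            if name in metrics and rule(metrics[name])]
{{/code}}
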
=== 7.3 Dashboards ===
Real-time monitoring:
* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times
== 8. Security Architecture ==
=== 8.1 Authentication & Authorization ===
* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse
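
Rate limiting can be backed by the existing Redis layer. A minimal fixed-window sketch using INCR and EXPIRE, where the 100 requests/minute quota is an illustrative value, not a specified one:

{{code language="python"}}
import redis

r = redis.Redis()

def allow_request(api_key: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Reject requests once a key exceeds `limit` calls in the current window."""
    key = f"ratelimit:{api_key}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit
{{/code}}
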
=== 8.2 Data Security ===
* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
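
For the last point, a parameterized query with psycopg2 keeps user input out of the SQL string entirely; the table and column names are illustrative.

{{code language="python"}}
import psycopg2

search_term = "measles vaccine"  # e.g. a user-supplied search string from the API layer

conn = psycopg2.connect("dbname=factharbor")
with conn.cursor() as cur:
    cur.execute(
        "SELECT id, text, confidence FROM claims WHERE text ILIKE %s LIMIT 20",
        (f"%{search_term}%",),  # bound parameter: the driver escapes it, so it cannot inject SQL
    )
    rows = cur.fetchall()
{{/code}}
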
=== 8.3 Abuse Prevention ===
* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs
== 9. Deployment Architecture ==
=== 9.1 Production Environment ===
**Components**:
* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage
**Regions**: Single region for V1.0, multi-region when needed
=== 9.2 Development & Staging ===
**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment
=== 9.3 Disaster Recovery ===
* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==
=== 10.1 When to Add Complexity ===
See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
**Elasticsearch**: When PostgreSQL search consistently >500ms
**TimescaleDB**: When metrics queries consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===
**Deferred until**:
* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached
See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
== 11. Technology Stack Summary ==
**Backend**:
* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)
**Frontend**:
* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO
**AI/LLM**:
* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support
**Infrastructure**:
* Docker containers
* Kubernetes or cloud platform auto-scaling
* S3-compatible object storage
**Monitoring**:
* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)
== 12. Related Pages ==
* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]