Wiki source code of Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.

== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); claims are published with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove it necessary

== 2. High-Level Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.High-Level Architecture.WebHome"/}}

=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:

==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse

==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity

==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots

**Optional future additions** (add only when metrics prove them necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.

== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.

== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.

=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
* Parallelized across all claims
* Results aggregated for presentation

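The two POC phases can be sketched roughly as follows. This is an illustrative outline only: `extract_claims`, `analyze_claim`, `process_submission`, and `ClaimAnalysis` are hypothetical names, and both LLM calls are replaced by trivial placeholders so the control flow (extract once, then fan out one call per claim) stays visible.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ClaimAnalysis:
    claim: str
    verdict: str
    confidence: float


async def extract_claims(content: str) -> list[str]:
    # Placeholder for the Phase 1 LLM call: treat each sentence as one claim.
    return [s.strip() for s in content.split(".") if s.strip()]


async def analyze_claim(claim: str) -> ClaimAnalysis:
    # Placeholder for the single Phase 2 LLM call made per claim.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return ClaimAnalysis(claim=claim, verdict="unverified", confidence=0.0)


async def process_submission(content: str) -> list[ClaimAnalysis]:
    claims = await extract_claims(content)  # Phase 1: extraction
    # Phase 2: all claims are analyzed concurrently; gather keeps input order.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))
```

Calling `asyncio.run(process_submission(text))` returns one `ClaimAnalysis` per extracted claim, in the order the claims appeared.
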
**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

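A minimal sketch of the gates between the three production phases might look like this. Every name and threshold here is an assumption made for illustration: the specification does not define `REVIEW_THRESHOLD`, the minimum evidence count, or how vagueness is measured (a word count is used as a crude stand-in).

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.6  # assumed cut-off; the specification gives no value


@dataclass
class Verdict:
    claim: str
    confidence: float
    needs_human_review: bool


def validate_claims(claims: list[str]) -> list[str]:
    """Phase 1 gate: drop duplicates and claims too short to be verifiable."""
    seen: set[str] = set()
    kept: list[str] = []
    for claim in claims:
        key = claim.strip().lower()
        if len(key.split()) < 3 or key in seen:
            continue  # vague (very short) or duplicate claim
        seen.add(key)
        kept.append(claim.strip())
    return kept


def evidence_gate(evidence: list[str], minimum: int = 2) -> bool:
    """Phase 2 gate: only claims with enough evidence advance."""
    return len(evidence) >= minimum


def make_verdict(claim: str, confidence: float) -> Verdict:
    """Phase 3: flag low-confidence verdicts for human review."""
    return Verdict(claim, confidence, confidence < REVIEW_THRESHOLD)
```

Keeping each gate a small pure function makes it straightforward to measure how many claims each phase rejects.
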
=== Architectural Benefits ===

**Scalability:**

* Process 100 claims at roughly 3x the latency of a single claim
* Parallel processing across independent claims
* Cost scales linearly with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Storage Architecture.WebHome"/}}

See [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] for detailed information.

== 4.5 Versioning Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Versioning Architecture.WebHome"/}}

== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===

**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
4. **Extract Scenarios**: Identify different contexts from evidence
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)

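One possible shape for the AKEL outputs listed above is sketched here as plain dataclasses. The class and field names (`AkelResult`, `EvidenceLink`, `Scenario`, `top_evidence`) are hypothetical, and the assumption that relevance and confidence lie in the range 0 to 1 is not stated in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceLink:
    url: str
    relevance: float  # relevance score, assumed to lie in [0, 1]


@dataclass
class Scenario:
    context: str          # context description
    verdict_summary: str  # verdict summary for this scenario


@dataclass
class AkelResult:
    claim_text: str
    evidence: list[EvidenceLink] = field(default_factory=list)
    scenarios: list[Scenario] = field(default_factory=list)
    confidence: float = 0.0  # overall confidence score
    risk: str = "unknown"    # risk assessment label

    def top_evidence(self, n: int = 3) -> list[EvidenceLink]:
        """Return the n evidence links with the highest relevance."""
        return sorted(self.evidence, key=lambda e: e.relevance, reverse=True)[:n]
```
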
=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
* Never triggered by individual claims (prevents circular dependencies)

**Cache Management** (Continuous):

* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates

**Metrics Aggregation** (Hourly):

* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports

**Data Archival** (Daily):

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data

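The weekly source track record update can be illustrated as a batch aggregation over the past week's outcomes. This is a sketch under assumptions: `weekly_source_accuracy` is a hypothetical function, and "accuracy" is taken to mean the share of a source's evidence that was borne out, which the specification does not define. Because the function only ever sees a completed batch, it is never triggered by an individual claim.

```python
from collections import defaultdict


def weekly_source_accuracy(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate (source_id, was_supported) pairs into an accuracy per source."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for source_id, was_supported in outcomes:
        totals[source_id] += 1
        correct[source_id] += int(was_supported)
    # Every source in totals has at least one outcome, so no division by zero.
    return {source: correct[source] / totals[source] for source in totals}
```

The resulting mapping is what a scheduled job would write to the `source_track_record` table.
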
=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified

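Two of the checks above, sudden confidence score changes and completeness validation, can be sketched as simple functions. The function names and all thresholds (`max_jump`, the minimum counts) are assumptions chosen for illustration, not values from the specification.

```python
def sudden_confidence_changes(history: list[float], max_jump: float = 0.3) -> list[int]:
    """Return indices where confidence moved by more than max_jump
    compared with the previous recorded value."""
    return [
        i
        for i in range(1, len(history))
        if abs(history[i] - history[i - 1]) > max_jump
    ]


def is_complete(evidence_count: int, source_types: set[str], scenario_count: int) -> bool:
    """Completeness check: enough evidence, more than one source type,
    and at least one scenario (all minimums are assumed)."""
    return evidence_count >= 3 and len(source_types) >= 2 and scenario_count >= 1
```
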
=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items, and the system learns from their decisions

== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances

=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines

=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously

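The caching behaviour described above (serve frequently accessed claims from cache, invalidate on update, monitor the hit rate) corresponds to a cache-aside pattern. The sketch below is an assumption-laden illustration: `ClaimCache` is a hypothetical class, an in-memory dict stands in for Redis, and `load_from_db` stands in for a PostgreSQL query.

```python
from typing import Callable


class ClaimCache:
    """Cache-aside wrapper; a dict stands in for Redis."""

    def __init__(self, load_from_db: Callable[[str], dict]):
        self._load_from_db = load_from_db
        self._store: dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def get(self, claim_id: str) -> dict:
        if claim_id in self._store:
            self.hits += 1
            return self._store[claim_id]
        # Cache miss: read from the database and remember the result.
        self.misses += 1
        record = self._load_from_db(claim_id)
        self._store[claim_id] = record
        return record

    def invalidate(self, claim_id: str) -> None:
        """Call whenever a claim is updated so stale data is not served."""
        self._store.pop(claim_id, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```
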
== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

System tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues

=== 7.2 Alerts ===

Automated alerts for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)

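The four alert thresholds above can be expressed as a small rule table. The thresholds are taken from the list; the metric names and the `breached_alerts` function are hypothetical, and percentages are assumed to be stored as fractions (1% as 0.01).

```python
# (comparison, limit) per metric; limits come from the alert list above.
ALERT_RULES = {
    "processing_time_s": (">", 30.0),
    "error_rate": (">", 0.01),
    "cache_hit_rate": ("<", 0.80),
    "db_connection_usage": (">", 0.80),
}


def breached_alerts(metrics: dict[str, float]) -> list[str]:
    """Return the names of all metrics that cross their alert threshold."""
    alerts = []
    for name, (op, limit) in ALERT_RULES.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported in this sample
        if (op == ">" and value > limit) or (op == "<" and value < limit):
            alerts.append(name)
    return alerts
```
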
=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times

== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse

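Role-based access with the four roles above could be modelled as an ordered enum. This sketch assumes the roles form a strict hierarchy (each role includes the permissions of the ones before it), which the specification does not state; the action names in `REQUIRED_ROLE` are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    READER = 1
    CONTRIBUTOR = 2
    MODERATOR = 3
    ADMIN = 4


# Minimum role per action; the action names are illustrative only.
REQUIRED_ROLE = {
    "view_claim": Role.READER,
    "submit_feedback": Role.CONTRIBUTOR,
    "review_flagged": Role.MODERATOR,
    "ban_user": Role.ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """Allow an action only if it is known and the role is high enough."""
    required = REQUIRED_ROLE.get(action)
    return required is not None and role >= required
```

Unknown actions are denied by default, so adding a new action requires an explicit entry in the table.
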
=== 8.2 Data Security ===

* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries

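To illustrate parameterized queries, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server. The `claims` table, its columns, and `find_claims` are assumptions for the example; PostgreSQL drivers use their own placeholder style, but the principle is the same.

```python
import sqlite3


def find_claims(conn: sqlite3.Connection, search_text: str) -> list[str]:
    """Search claims by substring using a bound parameter."""
    # The "?" placeholder keeps user input out of the SQL text itself,
    # so quotes or SQL keywords in search_text are treated as plain data.
    rows = conn.execute(
        "SELECT text FROM claims WHERE text LIKE ? ORDER BY id",
        (f"%{search_text}%",),
    )
    return [row[0] for row in rows]
```

An input such as `'; DROP TABLE claims; --` is matched literally against claim text and has no effect on the table.
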
=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs

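Rate limiting can be implemented in several ways; the specification does not name an algorithm. The sketch below shows one common choice, a sliding-window limiter, kept in process memory for simplicity (the architecture above places rate-limit state in Redis). `SlidingWindowLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._requests: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        timestamps = self._requests[client_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```
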
== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed

=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment

=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

* **Elasticsearch**: When PostgreSQL search consistently takes >500ms
* **TimescaleDB**: When metrics queries consistently take >1s
* **Federation**: When there are 10,000+ users and explicit demand
* **Complex Reputation**: When there are 100+ active contributors

=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>Archive.FactHarbor 2026\.01\.20.Specification.Federation & Decentralization.WebHome]] for future plans.

== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)

**Frontend**:

* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO

**AI/LLM**:

* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support

**Infrastructure**:

* Docker containers
* Kubernetes or cloud platform auto-scaling
* S3-compatible object storage

**Monitoring**:

* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)

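The fallback behaviour listed under AI/LLM above can be sketched as trying providers in order until one succeeds. This is a generic illustration, not any vendor's client API: `complete_with_fallback` and the `Provider` type are hypothetical, and each provider is modelled as a plain function from prompt to response text.

```python
from typing import Callable

Provider = Callable[[str], str]  # prompt in, response text out


def complete_with_fallback(
    prompt: str, providers: list[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response)
    from the first one that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response makes it possible to record which model produced each result.
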
== 12. Related Pages ==

* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
* [[Data Model>>Archive.FactHarbor 2026\.01\.20.Specification.Data Model.WebHome]]
* [[API Layer>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]