Wiki source code of Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

1 = Architecture =
2
3 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4
5 == 1. Core Principles ==
6
7 * **AI-First**: AKEL (AI) is the primary system; humans supplement it
8 * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
9 * **System Over Data**: Fix algorithms, not individual outputs
10 * **Measure Everything**: Quality metrics drive improvements
11 * **Scale Through Automation**: Minimal human intervention
12 * **Start Simple**: Add complexity only when metrics prove necessary
13
14 == 2. High-Level Architecture ==
15
16 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17
18 === 2.1 Three-Layer Architecture ===
19
20 FactHarbor uses a clean three-layer architecture:
21
22 ==== Interface Layer ====
23
24 Handles all user and system interactions:
25
26 * **Web UI**: Browse claims, view evidence, submit feedback
27 * **REST API**: Programmatic access for integrations
28 * **Authentication & Authorization**: User identity and permissions
29 * **Rate Limiting**: Protect against abuse
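As an illustration of the rate-limiting concern above, here is a minimal fixed-window limiter sketch. The class name and window scheme are illustrative assumptions; in production the counters would live in Redis (see the Data & Storage Layer) rather than in an in-process dict.

```python
import time

class FixedWindowRateLimiter:
    """Minimal fixed-window rate limiter; a dict stands in for Redis here."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_start) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # All requests in the same window share one counter bucket.
        window_start = int(now // self.window) * self.window
        bucket = (key, window_start)
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False  # over the limit for this window
        self.counters[bucket] = count + 1
        return True
```

The same pattern maps directly onto Redis `INCR` with an expiry on the bucket key.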
30
31 ==== Processing Layer ====
32
33 Core business logic and AI processing:
34
35 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
36 ** Parse and extract claim components
37 ** Gather evidence from multiple sources
38 ** Check source track records
39 ** Extract scenarios from evidence
40 ** Synthesize verdicts
41 ** Calculate risk scores
42 * **Background Jobs**: Automated maintenance tasks
43 ** Source track record updates (weekly)
44 ** Cache warming and invalidation
45 ** Metrics aggregation
46 ** Data archival
47 * **Quality Monitoring**: Automated quality checks
48 ** Anomaly detection
49 ** Contradiction detection
50 ** Completeness validation
51 * **Moderation Detection**: Automated abuse detection
52 ** Spam identification
53 ** Manipulation detection
54 ** Flag suspicious activity
55
56 ==== Data & Storage Layer ====
57
58 Persistent data storage and caching:
59
60 * **PostgreSQL**: Primary database for all core data
61 ** Claims, evidence, sources, users
62 ** Scenarios, edits, audit logs
63 ** Built-in full-text search
64 ** Time-series capabilities for metrics
65 * **Redis**: High-speed caching layer
66 ** Session data
67 ** Frequently accessed claims
68 ** API rate limiting
69 * **S3 Storage**: Long-term archival
70 ** Old edit history (90+ days)
71 ** AKEL processing logs
72 ** Backup snapshots
73 **Optional future additions** (add only when metrics prove necessary):
74 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
75 * **TimescaleDB**: If metrics queries become a bottleneck
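PostgreSQL's built-in full-text search, mentioned above, can be sketched as a parameterized query. The table and column names (`claims`, `search_vector`, `claim_text`) are hypothetical stand-ins; the actual schema is defined in the Data Model page.

```python
# Hypothetical table/column names; FactHarbor's actual schema may differ.
# `search_vector` would be a tsvector column maintained alongside claim_text.
CLAIM_SEARCH_SQL = """
SELECT id, claim_text,
       ts_rank(search_vector, websearch_to_tsquery('english', %(query)s)) AS rank
FROM claims
WHERE search_vector @@ websearch_to_tsquery('english', %(query)s)
ORDER BY rank DESC
LIMIT %(limit)s;
"""

def search_params(query, limit=20):
    """Bind parameters for the query; parameterization also guards against injection."""
    return {"query": query, "limit": limit}
```

If this kind of query consistently exceeds the latency trigger above, that is the point at which Elasticsearch becomes worth evaluating.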
76
77 === 2.2 Design Philosophy ===
78
79 **Start Simple, Evolve Based on Metrics**
80 The architecture deliberately starts simple:
81
82 * Single primary database (PostgreSQL handles most workloads initially)
83 * Three clear layers (easy to understand and maintain)
84 * Automated operations (minimal human intervention)
85 * Measure before optimizing (add complexity only when proven necessary)
86 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
87
88 == 3. AKEL Architecture ==
89
90 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
91 See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
92
93 == 3.5 Claim Processing Architecture ==
94
95 FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
96
97 === Multi-Claim Handling ===
98
99 Users often submit:
100
101 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
102 * **Web pages**: URLs that are analyzed to extract all verifiable claims
103 * **Single claims**: Simple, direct factual statements
104
105 The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
106
107 === Processing Phases ===
108
109 **POC Implementation (Two-Phase):**
110
111 Phase 1 - Claim Extraction:
112
113 * LLM analyzes submitted content
114 * Extracts all distinct, verifiable claims
115 * Returns structured list of claims with context
116
117 Phase 2 - Parallel Analysis:
118
119 * Each claim processed independently by LLM
120 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
121 * Parallelized across all claims
122 * Results aggregated for presentation
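The two-phase POC flow can be sketched as follows. The LLM calls are stubbed out (the real phases would call the multi-provider orchestration layer); only the extract-then-fan-out structure is the point here.

```python
import asyncio

async def extract_claims(text):
    # Phase 1: in production an LLM call; here a trivial stand-in that
    # treats each sentence as a candidate claim.
    return [s.strip() for s in text.split(".") if s.strip()]

async def analyze_claim(claim):
    # Phase 2: a single LLM call per claim would return evidence, scenarios,
    # sources, verdict, and risk; stubbed here.
    return {"claim": claim, "verdict": "unverified", "confidence": 0.0}

async def process_submission(text):
    claims = await extract_claims(text)
    # Claims are independent, so they are analyzed concurrently.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))
```

Because the per-claim calls run concurrently, wall-clock time is dominated by the slowest single claim rather than the claim count.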
123
124 **Production Implementation (Three-Phase):**
125
126 Phase 1 - Extraction + Validation:
127
128 * Extract claims from content
129 * Validate clarity and uniqueness
130 * Filter vague or duplicate claims
131
132 Phase 2 - Evidence Gathering (Parallel):
133
134 * Independent evidence gathering per claim
135 * Source validation and scenario generation
136 * Quality gates prevent poor data from advancing
137
138 Phase 3 - Verdict Generation (Parallel):
139
140 * Generate verdict from validated evidence
141 * Confidence scoring and risk assessment
142 * Low-confidence cases routed to human review
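The routing decision at the end of Phase 3 might look like the sketch below. The 0.6 threshold is an illustrative assumption, not a specified value; the spec only says low-confidence cases go to human review while everything publishes by default.

```python
# Illustrative threshold; the actual cutoff would be tuned from quality metrics.
REVIEW_THRESHOLD = 0.6

def route_verdict(verdict):
    """Publish by default; flag low-confidence verdicts for human review."""
    needs_review = verdict["confidence"] < REVIEW_THRESHOLD
    return {**verdict, "published": True, "needs_human_review": needs_review}
```

Note that flagging does not block publication: this mirrors the "Publish by Default" principle, with the confidence score surfaced to readers.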
143
144 === Architectural Benefits ===
145
146 **Scalability:**
147
148 * Process 100 claims with roughly 3x the latency of a single claim
149 * Parallel processing across independent claims
150 * Linear cost scaling with claim count
151
152 **Quality:**
153
154 * Validation gates between phases
155 * Errors isolated to individual claims
156 * Clear observability per processing step
157
158 **Flexibility:**
159
160 * Each phase optimizable independently
161 * Can use different model sizes per phase
162 * Easy to add human review at decision points
163
164 == 4. Storage Architecture ==
165
166 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Storage Architecture.WebHome"/}}
167 See [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] for detailed information.
168
169 == 4.5 Versioning Architecture ==
170
171 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Versioning Architecture.WebHome"/}}
172
173 == 5. Automated Systems in Detail ==
174
175 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
176
177 === 5.1 AKEL (AI Knowledge Extraction Layer) ===
178
179 **What it does**: Primary AI processing engine that analyzes claims automatically
180 **Inputs**:
181
182 * User-submitted claim text
183 * Existing evidence and sources
184 * Source track record database
185 **Processing steps**:
186
187 1. **Parse & Extract**: Identify key components, entities, assertions
188 2. **Gather Evidence**: Search web and database for relevant sources
189 3. **Check Sources**: Evaluate source reliability using track records
190 4. **Extract Scenarios**: Identify different contexts from evidence
191 5. **Synthesize Verdict**: Compile evidence assessment per scenario
192 6. **Calculate Risk**: Assess potential harm and controversy
193 **Outputs**:
194
195 * Structured claim record
196 * Evidence links with relevance scores
197 * Scenarios with context descriptions
198 * Verdict summary per scenario
199 * Overall confidence score
200 * Risk assessment
201 **Timing**: 10-18 seconds total (parallel processing)
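The outputs listed above can be pictured as a structured record. The field names below are illustrative assumptions; the authoritative shapes live in the Data Model page.

```python
from dataclasses import dataclass

# Field names are illustrative; the actual schema is defined in the Data Model.

@dataclass
class EvidenceLink:
    url: str
    relevance: float  # relevance score, 0.0 - 1.0

@dataclass
class Scenario:
    context: str          # context description extracted from evidence
    verdict_summary: str  # verdict compiled for this scenario

@dataclass
class AkelResult:
    claim_text: str
    evidence: list        # list[EvidenceLink]
    scenarios: list       # list[Scenario]
    confidence: float     # overall confidence score
    risk_score: float     # potential harm / controversy assessment
```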
202
203 === 5.2 Background Jobs ===
204
205 **Source Track Record Updates** (Weekly):
206
207 * Analyze claim outcomes from past week
208 * Calculate source accuracy and reliability
209 * Update source_track_record table
210 * Never triggered by individual claims (prevents circular dependencies)
211 **Cache Management** (Continuous):
212 * Warm cache for popular claims
213 * Invalidate cache on claim updates
214 * Monitor cache hit rates
215 **Metrics Aggregation** (Hourly):
216 * Roll up detailed metrics
217 * Calculate system health indicators
218 * Generate performance reports
219 **Data Archival** (Daily):
220 * Move old AKEL logs to S3 (90+ days)
221 * Archive old edit history
222 * Compress and backup data
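The weekly source track record update reduces to a batch accuracy computation, sketched below. The `(source_id, was_accurate)` outcome shape is an assumption for illustration; the real job reads claim outcomes from PostgreSQL and writes back to `source_track_record`.

```python
def update_track_record(outcomes):
    """Weekly batch: fraction of each source's cited claims whose verdicts held up.

    `outcomes` is a list of (source_id, was_accurate) pairs from the past week.
    Running this only as a batch job, never per claim, avoids the circular
    dependency noted above.
    """
    totals, correct = {}, {}
    for source_id, was_accurate in outcomes:
        totals[source_id] = totals.get(source_id, 0) + 1
        if was_accurate:
            correct[source_id] = correct.get(source_id, 0) + 1
    return {s: correct.get(s, 0) / totals[s] for s in totals}
```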
223
224 === 5.3 Quality Monitoring ===
225
226 **Automated checks run continuously**:
227
228 * **Anomaly Detection**: Flag unusual patterns
229 ** Sudden confidence score changes
230 ** Unusual evidence distributions
231 ** Suspicious source patterns
232 * **Contradiction Detection**: Identify conflicts
233 ** Evidence that contradicts other evidence
234 ** Claims with internal contradictions
235 ** Source track record anomalies
236 * **Completeness Validation**: Ensure thoroughness
237 ** Sufficient evidence gathered
238 ** Multiple source types represented
239 ** Key scenarios identified
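A minimal version of the "sudden confidence score changes" check could be a simple delta test; the 0.3 threshold below is an illustrative assumption, and a production check would likely use a statistical baseline instead.

```python
def sudden_confidence_change(history, threshold=0.3):
    """Flag when the latest confidence score jumps by more than `threshold`
    relative to the previous value (threshold is illustrative)."""
    if len(history) < 2:
        return False  # not enough history to compare
    return abs(history[-1] - history[-2]) > threshold
```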
240
241 === 5.4 Moderation Detection ===
242
243 **Automated abuse detection**:
244
245 * **Spam Identification**: Pattern matching for spam claims
246 * **Manipulation Detection**: Identify coordinated editing
247 * **Gaming Detection**: Flag attempts to game source scores
248 * **Suspicious Activity**: Log unusual behavior patterns
249 **Human Review**: Moderators review flagged items; the system learns from their decisions
250
251 == 6. Scalability Strategy ==
252
253 === 6.1 Horizontal Scaling ===
254
255 Components scale independently:
256
257 * **AKEL Workers**: Add more processing workers as claim volume grows
258 * **Database Read Replicas**: Add replicas for read-heavy workloads
259 * **Cache Layer**: Redis cluster for distributed caching
260 * **API Servers**: Load-balanced API instances
261
262 === 6.2 Vertical Scaling ===
263
264 Individual components can be upgraded:
265
266 * **Database Server**: Increase CPU/RAM for PostgreSQL
267 * **Cache Memory**: Expand Redis memory
268 * **Worker Resources**: More powerful AKEL worker machines
269
270 === 6.3 Performance Optimization ===
271
272 Built-in optimizations:
273
274 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
275 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
276 * **Intelligent Caching**: Redis caches frequently accessed data
277 * **Background Processing**: Non-urgent tasks run asynchronously
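The "intelligent caching" bullet above amounts to a cache-aside pattern in front of PostgreSQL, sketched here with an in-process dict standing in for Redis. Names and the TTL are illustrative assumptions.

```python
import time

class ClaimCache:
    """Cache-aside sketch; a TTL dict stands in for Redis here."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # claim_id -> (expires_at, value)

    def get_or_load(self, claim_id, loader, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(claim_id)
        if entry and entry[0] > now:
            return entry[1]            # cache hit
        value = loader(claim_id)       # cache miss: fall through to the database
        self.store[claim_id] = (now + self.ttl, value)
        return value
```

Invalidation on claim updates (listed under Background Jobs) would simply delete the corresponding key.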
278
279 == 7. Monitoring & Observability ==
280
281 === 7.1 Key Metrics ===
282
283 System tracks:
284
285 * **Performance**: AKEL processing time, API response time, cache hit rate
286 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
287 * **Usage**: Claims per day, active users, API requests
288 * **Errors**: Failed AKEL runs, API errors, database issues
289
290 === 7.2 Alerts ===
291
292 Automated alerts for:
293
294 * Processing time >30 seconds (threshold breach)
295 * Error rate >1% (quality issue)
296 * Cache hit rate <80% (cache problem)
297 * Database connections >80% capacity (scaling needed)
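The alert rules above can be expressed as a small threshold table; in practice these would live in Prometheus alerting rules rather than application code, so this is only an illustrative sketch with assumed metric names.

```python
# Metric names are illustrative; thresholds mirror the alert rules above.
ALERT_RULES = {
    "akel_processing_seconds": lambda v: v > 30,    # threshold breach
    "error_rate": lambda v: v > 0.01,               # quality issue
    "cache_hit_rate": lambda v: v < 0.80,           # cache problem
    "db_connection_utilization": lambda v: v > 0.80,  # scaling needed
}

def firing_alerts(metrics):
    """Return the names of all rules breached by the current metric snapshot."""
    return [name for name, breached in ALERT_RULES.items()
            if name in metrics and breached(metrics[name])]
```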
298
299 === 7.3 Dashboards ===
300
301 Real-time monitoring:
302
303 * **System Health**: Overall status and key metrics
304 * **AKEL Performance**: Processing time breakdown
305 * **Quality Metrics**: Confidence scores, completeness
306 * **User Activity**: Usage patterns, peak times
307
308 == 8. Security Architecture ==
309
310 === 8.1 Authentication & Authorization ===
311
312 * **User Authentication**: Secure login with password hashing
313 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
314 * **API Keys**: For programmatic access
315 * **Rate Limiting**: Prevent abuse
316
317 === 8.2 Data Security ===
318
319 * **Encryption**: TLS for transport, encrypted storage for sensitive data
320 * **Audit Logging**: Track all significant changes
321 * **Input Validation**: Sanitize all user inputs
322 * **SQL Injection Protection**: Parameterized queries
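Parameterized queries, as required above, keep user input out of the SQL text entirely. The sketch uses `sqlite3` as a runnable stand-in for PostgreSQL (with psycopg the placeholder would be `%s` instead of `?`); the table name is hypothetical.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; table/column names are illustrative.
def find_claims(conn, submitted_text):
    # User input is passed as a bound parameter, never interpolated into SQL,
    # so injection payloads are treated as plain data.
    cur = conn.execute(
        "SELECT id FROM claims WHERE claim_text = ?", (submitted_text,)
    )
    return [row[0] for row in cur.fetchall()]
```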
323
324 === 8.3 Abuse Prevention ===
325
326 * **Rate Limiting**: Prevent flooding and DDoS
327 * **Automated Detection**: Flag suspicious patterns
328 * **Human Review**: Moderators investigate flagged content
329 * **Ban Mechanisms**: Block abusive users/IPs
330
331 == 9. Deployment Architecture ==
332
333 === 9.1 Production Environment ===
334
335 **Components**:
336
337 * Load Balancer (HAProxy or cloud LB)
338 * Multiple API servers (stateless)
339 * AKEL worker pool (auto-scaling)
340 * PostgreSQL primary + read replicas
341 * Redis cluster
342 * S3-compatible storage
343 **Regions**: Single region for V1.0, multi-region when needed
344
345 === 9.2 Development & Staging ===
346
347 **Development**: Local Docker Compose setup
348 **Staging**: Scaled-down production replica
349 **CI/CD**: Automated testing and deployment
350
351 === 9.3 Disaster Recovery ===
352
353 * **Database Backups**: Daily automated backups to S3
354 * **Point-in-Time Recovery**: Transaction log archival
355 * **Replication**: Real-time replication to standby
356 * **Recovery Time Objective**: <4 hours
357
358 === 9.5 Federation Architecture Diagram ===
359
360 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Federation Architecture.WebHome"/}}
361
362 == 10. Future Architecture Evolution ==
363
364 === 10.1 When to Add Complexity ===
365
366 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
367 **Elasticsearch**: When PostgreSQL search consistently >500ms
368 **TimescaleDB**: When metrics queries consistently >1s
369 **Federation**: When 10,000+ users and explicit demand
370 **Complex Reputation**: When 100+ active contributors
371
372 === 10.2 Federation (V2.0+) ===
373
374 **Deferred until**:
375
376 * Core product proven with 10,000+ users
377 * User demand for decentralization
378 * Single-node limits reached
379 See [[Federation & Decentralization>>Archive.FactHarbor 2026\.01\.20.Specification.Federation & Decentralization.WebHome]] for future plans.
380
381 == 11. Technology Stack Summary ==
382
383 **Backend**:
384
385 * Python (FastAPI or Django)
386 * PostgreSQL (primary database)
387 * Redis (caching)
388 **Frontend**:
389 * Modern JavaScript framework (React, Vue, or Svelte)
390 * Server-side rendering for SEO
391 **AI/LLM**:
392 * Multi-provider orchestration (Claude, GPT-4, local models)
393 * Fallback and cross-checking support
394 **Infrastructure**:
395 * Docker containers
396 * Kubernetes or cloud platform auto-scaling
397 * S3-compatible object storage
398 **Monitoring**:
399 * Prometheus + Grafana
400 * Structured logging (ELK or cloud logging)
401 * Error tracking (Sentry)
402
403 == 12. Related Pages ==
404
405 * [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
406 * [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
407 * [[Data Model>>Archive.FactHarbor 2026\.01\.20.Specification.Data Model.WebHome]]
408 * [[API Layer>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
409 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
410 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]