= Architecture =
FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
== 1. Core Principles ==
* **AI-First**: AKEL (the AI Knowledge Extraction Layer) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); content is published with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove it necessary
== 2. High-Level Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
=== 2.1 Three-Layer Architecture ===
FactHarbor uses a clean three-layer architecture:
==== Interface Layer ====
Handles all user and system interactions:
* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse
==== Processing Layer ====
Core business logic and AI processing:
* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity
==== Data & Storage Layer ====
Persistent data storage and caching:
* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots
**Optional future additions** (add only when metrics prove them necessary):
* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck
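To make the start-simple point concrete, here is a minimal sketch of claim search built only on PostgreSQL's built-in full-text search, the capability Elasticsearch would later replace if it ever proves too slow. The ##claims## table, its ##search_vector## column, and the psycopg2 driver are illustrative assumptions, not the actual schema.
{{code language="python"}}
# Minimal sketch: claim search on PostgreSQL full-text search via psycopg2.
# Assumes a hypothetical "claims" table with an indexed tsvector column, e.g.:
#   ALTER TABLE claims ADD COLUMN search_vector tsvector
#     GENERATED ALWAYS AS (to_tsvector('english', claim_text)) STORED;
#   CREATE INDEX claims_search_idx ON claims USING GIN (search_vector);
import psycopg2

def search_claims(conn, query: str, limit: int = 20):
    """Return the best-matching claims for a free-text query."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, claim_text,
                   ts_rank(search_vector, plainto_tsquery('english', %s)) AS rank
            FROM claims
            WHERE search_vector @@ plainto_tsquery('english', %s)
            ORDER BY rank DESC
            LIMIT %s
            """,
            (query, query, limit),
        )
        return cur.fetchall()
{{/code}}
Only if queries like this consistently exceed the ~500ms trigger listed in section 10.1 would Elasticsearch be added.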
=== 2.2 Design Philosophy ===
**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:
* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)
See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
== 4. Storage Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
== 4.5 Versioning Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
== 5. Automated Systems in Detail ==
FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
=== 5.1 AKEL (AI Knowledge Extraction Layer) ===
**What it does**: Primary AI processing engine that analyzes claims automatically
**Inputs**:
* User-submitted claim text
* Existing evidence and sources
* Source track record database
**Processing steps**:
1. **Parse & Extract**: Identify key components, entities, and assertions
2. **Gather Evidence**: Search the web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
4. **Extract Scenarios**: Identify different contexts from evidence
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy
**Outputs**:
* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment
**Timing**: 10-18 seconds total (parallel processing)
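A minimal asyncio sketch of that fan-out is shown below. Every function name is a placeholder for a real pipeline stage, and which stages can truly overlap depends on their data dependencies; this only illustrates the shape of the parallelism.
{{code language="python"}}
import asyncio

# Hypothetical stage functions; each would wrap an LLM call or a search query.
async def parse_claim(text): ...
async def gather_evidence(parsed): ...
async def check_sources(parsed): ...
async def extract_scenarios(evidence): ...
async def synthesize_verdict(scenarios, evidence, source_scores): ...
async def calculate_risk(parsed, verdict): ...

async def run_akel_pipeline(claim_text: str) -> dict:
    """Run the six AKEL stages, overlapping the independent ones."""
    parsed = await parse_claim(claim_text)
    # Evidence gathering and source checks do not depend on each other,
    # so they run concurrently -- part of the parallelism behind the
    # 10-18 second total.
    evidence, source_scores = await asyncio.gather(
        gather_evidence(parsed),
        check_sources(parsed),
    )
    scenarios = await extract_scenarios(evidence)
    verdict = await synthesize_verdict(scenarios, evidence, source_scores)
    risk = await calculate_risk(parsed, verdict)
    return {"claim": parsed, "evidence": evidence, "scenarios": scenarios,
            "verdict": verdict, "risk": risk}

# Usage: asyncio.run(run_akel_pipeline("some claim text"))
{{/code}}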
=== 5.2 Background Jobs ===
**Source Track Record Updates** (Weekly):
* Analyze claim outcomes from the past week
* Calculate source accuracy and reliability
* Update the source_track_record table
* Never triggered by individual claims (prevents circular dependencies)
**Cache Management** (Continuous):
* Warm the cache for popular claims
* Invalidate the cache on claim updates
* Monitor cache hit rates
**Metrics Aggregation** (Hourly):
* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports
**Data Archival** (Daily):
* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and back up data
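One possible wiring of these cadences, sketched here with APScheduler as an assumed scheduler (any job queue with cron-style triggers would do); the job bodies and exact times are placeholders.
{{code language="python"}}
# Possible wiring of the job cadences above with APScheduler.
from apscheduler.schedulers.background import BackgroundScheduler

def update_source_track_records(): ...  # weekly batch; never run per claim
def manage_cache(): ...                 # warming, invalidation, hit-rate checks
def aggregate_metrics(): ...            # hourly roll-ups
def archive_old_data(): ...             # daily move of 90+ day data to S3

scheduler = BackgroundScheduler()
scheduler.add_job(update_source_track_records, "cron", day_of_week="mon", hour=3)
scheduler.add_job(manage_cache, "interval", minutes=1)   # "continuous" approximated
scheduler.add_job(aggregate_metrics, "cron", minute=0)   # top of every hour
scheduler.add_job(archive_old_data, "cron", hour=4)      # daily at 04:00
scheduler.start()
{{/code}}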
=== 5.3 Quality Monitoring ===
**Automated checks run continuously**:
* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified
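As an illustration of the first check, here is a toy heuristic that flags a sudden confidence score change as a statistical outlier against the claim's own history. The window size and z-score threshold are invented values, not specified behavior.
{{code language="python"}}
import statistics

# Toy heuristic for "sudden confidence score changes": flag a claim whose
# newest score is a statistical outlier versus its own recent history.
def confidence_anomaly(history: list[float], latest: float,
                       z_threshold: float = 3.0) -> bool:
    if len(history) < 5:      # too little history to judge
        return False
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold
{{/code}}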
=== 5.4 Moderation Detection ===
**Automated abuse detection**:
* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns
**Human Review**: Moderators review flagged items; the system learns from their decisions
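A toy sketch of the spam-identification stage follows; real detection would combine learned models (retrained on moderator decisions) with cheap pattern checks like these, and the patterns and limits here are invented.
{{code language="python"}}
import re

# Invented patterns purely for illustration of a cheap first-pass screen.
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)
REPEAT_RE = re.compile(r"(.)\1{9,}")   # 10+ identical characters in a row

def looks_like_spam(claim_text: str, max_links: int = 3) -> bool:
    """Cheap spam screen run before deeper checks and human review."""
    if len(LINK_RE.findall(claim_text)) > max_links:
        return True
    return REPEAT_RE.search(claim_text) is not None
{{/code}}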
== 6. Scalability Strategy ==
=== 6.1 Horizontal Scaling ===
Components scale independently:
* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances
=== 6.2 Vertical Scaling ===
Individual components can be upgraded:
* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines
=== 6.3 Performance Optimization ===
Built-in optimizations:
* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously
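A minimal read-through cache sketch for frequently accessed claims, using redis-py; the key scheme and TTL are assumptions.
{{code language="python"}}
import json
import redis

r = redis.Redis()              # connection details assumed
CLAIM_TTL_SECONDS = 3600       # illustrative TTL

def get_claim(claim_id: int, load_from_db) -> dict:
    """Read-through cache for claim records (key scheme is assumed)."""
    key = f"claim:{claim_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    claim = load_from_db(claim_id)            # fall back to PostgreSQL
    r.setex(key, CLAIM_TTL_SECONDS, json.dumps(claim))
    return claim

def invalidate_claim(claim_id: int) -> None:
    """Called on claim updates, per the cache-invalidation job in 5.2."""
    r.delete(f"claim:{claim_id}")
{{/code}}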
== 7. Monitoring & Observability ==
=== 7.1 Key Metrics ===
The system tracks:
* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues
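A sketch of instrumenting two of these metrics with the prometheus_client library (Prometheus is named in the stack summary below); the metric names are illustrative, not an agreed naming scheme.
{{code language="python"}}
from prometheus_client import Counter, Histogram, start_http_server

AKEL_SECONDS = Histogram("akel_processing_seconds",
                         "Wall-clock time of a full AKEL run")
AKEL_FAILURES = Counter("akel_failed_runs_total",
                        "AKEL runs that raised an error")

@AKEL_SECONDS.time()           # observes the duration of each call
def process_claim(claim_text: str):
    try:
        ...                    # the real pipeline would run here
    except Exception:
        AKEL_FAILURES.inc()
        raise

start_http_server(9100)        # expose /metrics for Prometheus to scrape
{{/code}}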
=== 7.2 Alerts ===
Automated alerts for:
* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)
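In a Prometheus + Grafana setup these conditions would live in alerting rules; the plain-Python restatement below simply makes the four thresholds explicit. The snapshot keys are assumed names.
{{code language="python"}}
# Plain-Python restatement of the four alert conditions above.
def evaluate_alerts(snapshot: dict) -> list[str]:
    """`snapshot` is an assumed dict of current gauge values."""
    alerts = []
    if snapshot["akel_processing_seconds"] > 30:
        alerts.append("AKEL processing time above 30 s")
    if snapshot["error_rate"] > 0.01:
        alerts.append("Error rate above 1%")
    if snapshot["cache_hit_rate"] < 0.80:
        alerts.append("Cache hit rate below 80%")
    if snapshot["db_connection_usage"] > 0.80:
        alerts.append("Database connections above 80% of capacity")
    return alerts
{{/code}}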
=== 7.3 Dashboards ===
Real-time monitoring:
* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times
== 8. Security Architecture ==
=== 8.1 Authentication & Authorization ===
* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse
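A standard-library sketch of the first two items: PBKDF2 password hashing and an ordered role check. The iteration count is an assumption, and the role ordering follows the list above.
{{code language="python"}}
import hashlib
import hmac
import os

ROLES = ("reader", "contributor", "moderator", "admin")   # ordered low to high

def hash_password(password: str) -> tuple[bytes, bytes]:
    """PBKDF2 hashing; the salt is stored alongside the digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(digest, expected)   # constant-time comparison

def has_role(user_role: str, minimum: str) -> bool:
    """Role-based access check over the ordered role list."""
    return ROLES.index(user_role) >= ROLES.index(minimum)
{{/code}}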
=== 8.2 Data Security ===
* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
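The SQL injection rule in practice: user input is always passed as a bound parameter, never interpolated into the SQL string. The table and column names below are assumptions.
{{code language="python"}}
import psycopg2

def find_claims_by_author(conn, author_name: str):
    """Safe lookup: %s placeholders are bound by the driver."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, claim_text FROM claims WHERE author = %s",
            (author_name,),
        )
        return cur.fetchall()

# Never do this -- string interpolation reintroduces SQL injection:
#   cur.execute(f"SELECT ... WHERE author = '{author_name}'")
{{/code}}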
=== 8.3 Abuse Prevention ===
* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs
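A minimal fixed-window rate limiter sketch backed by Redis, matching the rate-limiting items above; the limits and key scheme are assumptions. Keeping the counters in Redis means all load-balanced API servers share one view of each client.
{{code language="python"}}
import time
import redis

r = redis.Redis()   # shared state so every API server sees one count

def allow_request(client_id: str, limit: int = 60, window_s: int = 60) -> bool:
    """Allow at most `limit` requests per `window_s` seconds per client."""
    window = int(time.time() // window_s)
    key = f"ratelimit:{client_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)   # first hit in the window sets the TTL
    return count <= limit
{{/code}}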
== 9. Deployment Architecture ==
=== 9.1 Production Environment ===
**Components**:
* Load balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage
**Regions**: Single region for V1.0, multi-region when needed
=== 9.2 Development & Staging ===
**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment
=== 9.3 Disaster Recovery ===
* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to a standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==
=== 10.1 When to Add Complexity ===
See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
**Elasticsearch**: When PostgreSQL search is consistently >500ms
**TimescaleDB**: When metrics queries are consistently >1s
**Federation**: When there are 10,000+ users and explicit demand
**Complex Reputation**: When there are 100+ active contributors
=== 10.2 Federation (V2.0+) ===
**Deferred until**:
* The core product is proven with 10,000+ users
* There is user demand for decentralization
* Single-node limits are reached
See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
== 11. Technology Stack Summary ==
**Backend**:
* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)
**Frontend**:
* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO
**AI/LLM**:
* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support
**Infrastructure**:
* Docker containers
* Kubernetes or cloud platform auto-scaling
* S3-compatible object storage
**Monitoring**:
* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)
== 12. Related Pages ==
* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]