1 = Architecture =
2 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
3 == 1. Core Principles ==
4 * **AI-First**: AKEL (AI) is the primary system; humans supplement it
5 * **Publish by Default**: No centralized approval (removed in V0.9.50); content is published with confidence scores
6 * **System Over Data**: Fix algorithms, not individual outputs
7 * **Measure Everything**: Quality metrics drive improvements
8 * **Scale Through Automation**: Minimal human intervention
9 * **Start Simple**: Add complexity only when metrics prove necessary
10 == 2. High-Level Architecture ==
11 {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
12 === 2.1 Three-Layer Architecture ===
13 FactHarbor uses a clean three-layer architecture:
14 ==== Interface Layer ====
15 Handles all user and system interactions:
16 * **Web UI**: Browse claims, view evidence, submit feedback
17 * **REST API**: Programmatic access for integrations
18 * **Authentication & Authorization**: User identity and permissions
19 * **Rate Limiting**: Protect against abuse
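A minimal sketch of the claim-submission entry point implied by the Web UI and REST API bullets above, assuming the FastAPI option from the technology stack (Section 11); the endpoint path, request model, and rate-limit dependency are illustrative assumptions, not part of the specification.
{{code language="python"}}
# Illustrative only: path, models, and the rate-limit dependency are assumptions.
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI(title="FactHarbor API")

class ClaimSubmission(BaseModel):
    text: str                       # claim text, article excerpt, or URL
    submitted_by: str | None = None

def rate_limit() -> None:
    """Placeholder for the rate-limiting dependency (e.g. a Redis counter)."""
    return None

async def enqueue_for_akel(text: str) -> str:
    """Hypothetical helper: queue the text for the AKEL pipeline, return an id."""
    return "claim-123"

@app.post("/claims", status_code=202)
async def submit_claim(submission: ClaimSubmission, _: None = Depends(rate_limit)):
    # Hand the submission to the Processing Layer (AKEL pipeline) and return
    # an identifier the client can poll for the finished analysis.
    claim_id = await enqueue_for_akel(submission.text)
    return {"claim_id": claim_id, "status": "processing"}
{{/code}}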
20 ==== Processing Layer ====
21 Core business logic and AI processing:
22 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
23 ** Parse and extract claim components
24 ** Gather evidence from multiple sources
25 ** Check source track records
26 ** Extract scenarios from evidence
27 ** Synthesize verdicts
28 ** Calculate risk scores
29 * **Background Jobs**: Automated maintenance tasks
30 ** Source track record updates (weekly)
31 ** Cache warming and invalidation
32 ** Metrics aggregation
33 ** Data archival
34 * **Quality Monitoring**: Automated quality checks
35 ** Anomaly detection
36 ** Contradiction detection
37 ** Completeness validation
38 * **Moderation Detection**: Automated abuse detection
39 ** Spam identification
40 ** Manipulation detection
41 ** Flag suspicious activity
42 ==== Data & Storage Layer ====
43 Persistent data storage and caching:
44 * **PostgreSQL**: Primary database for all core data
45 ** Claims, evidence, sources, users
46 ** Scenarios, edits, audit logs
47 ** Built-in full-text search
48 ** Time-series capabilities for metrics
49 * **Redis**: High-speed caching layer
50 ** Session data
51 ** Frequently accessed claims
52 ** API rate limiting
53 * **S3 Storage**: Long-term archival
54 ** Old edit history (90+ days)
55 ** AKEL processing logs
56 ** Backup snapshots
57 **Optional future additions** (add only when metrics prove necessary):
58 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
59 * **TimescaleDB**: If metrics queries become a bottleneck
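To illustrate the built-in full-text search mentioned for PostgreSQL above, a hedged sketch of a claim search query using psycopg2; the table and column names are assumptions, and a stored tsvector column with a GIN index would normally back this in production.
{{code language="python"}}
# Illustrative only: table/column names and the DSN format are assumptions.
import psycopg2

def search_claims(dsn: str, query: str, limit: int = 20):
    """Full-text search over claim text using PostgreSQL's built-in tsvector support."""
    sql = """
        SELECT id, claim_text,
               ts_rank(to_tsvector('english', claim_text),
                       plainto_tsquery('english', %s)) AS rank
        FROM claims
        WHERE to_tsvector('english', claim_text) @@ plainto_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT %s
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, (query, query, limit))
        return cur.fetchall()
{{/code}}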
60 === 2.2 Design Philosophy ===
61 **Start Simple, Evolve Based on Metrics**
62 The architecture deliberately starts simple:
63 * Single primary database (PostgreSQL handles most workloads initially)
64 * Three clear layers (easy to understand and maintain)
65 * Automated operations (minimal human intervention)
66 * Measure before optimizing (add complexity only when proven necessary)
67 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
68 == 3. AKEL Architecture ==
69 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
70 See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
71
72 == 3.5 Claim Processing Architecture ==
73
74 FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
75
76 === Multi-Claim Handling ===
77
78 Users often submit:
79 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
80 * **Web pages**: URLs that are analyzed to extract all verifiable claims
81 * **Single claims**: Simple, direct factual statements
82
83 The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
84
85 === Processing Phases ===
86
87 **POC Implementation (Two-Phase):**
88
89 Phase 1 - Claim Extraction:
90 * LLM analyzes submitted content
91 * Extracts all distinct, verifiable claims
92 * Returns structured list of claims with context
93
94 Phase 2 - Parallel Analysis:
95 * Each claim processed independently by LLM
96 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
97 * Parallelized across all claims
98 * Results aggregated for presentation
99
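A minimal sketch of this two-phase flow, assuming an async LLM client; the function names, prompts, and return shapes are illustrative, not specified.
{{code language="python"}}
# Illustrative only: extract_claims() and analyze_claim() stand in for the two
# LLM calls described above.
import asyncio

async def extract_claims(content: str) -> list[str]:
    """Phase 1: a single LLM call that returns the distinct, verifiable claims."""
    return [content]          # placeholder: the real call returns every claim found

async def analyze_claim(claim: str) -> dict:
    """Phase 2: one LLM call per claim returning evidence, scenarios, sources,
    verdict, and risk."""
    return {"claim": claim}   # placeholder for the structured analysis record

async def process_submission(content: str) -> list[dict]:
    claims = await extract_claims(content)
    # Fan out: each claim is analyzed independently and in parallel, so total
    # latency grows far more slowly than the number of claims.
    results = await asyncio.gather(*(analyze_claim(c) for c in claims))
    return list(results)
{{/code}}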
100 **Production Implementation (Three-Phase):**
101
102 Phase 1 - Extraction + Validation:
103 * Extract claims from content
104 * Validate clarity and uniqueness
105 * Filter vague or duplicate claims
106
107 Phase 2 - Evidence Gathering (Parallel):
108 * Independent evidence gathering per claim
109 * Source validation and scenario generation
110 * Quality gates prevent poor data from advancing
111
112 Phase 3 - Verdict Generation (Parallel):
113 * Generate verdict from validated evidence
114 * Confidence scoring and risk assessment
115 * Low-confidence cases routed to human review
116
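A hedged sketch of the quality gates and low-confidence routing between the production phases described above; the threshold value, minimum counts, and field names are assumptions.
{{code language="python"}}
# Illustrative only: the 0.7 threshold, minimum counts, and field names are assumptions.
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.7  # verdicts below this confidence are routed to human review

@dataclass
class EvidenceBundle:
    claim: str
    evidence: list
    sources: list

def passes_quality_gate(bundle: EvidenceBundle) -> bool:
    """Gate between Phase 2 and Phase 3: keep claims with too little evidence or
    no validated sources from advancing to verdict generation."""
    return len(bundle.evidence) >= 2 and len(bundle.sources) >= 1

def route_verdict(verdict: dict) -> str:
    """After Phase 3: publish confident verdicts, queue the rest for human review."""
    return "publish" if verdict.get("confidence", 0.0) >= REVIEW_THRESHOLD else "human_review"
{{/code}}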
117 === Architectural Benefits ===
118
119 **Scalability:**
120 * Process 100 claims with roughly 3x the latency of a single claim
121 * Parallel processing across independent claims
122 * Linear cost scaling with claim count
123
124 **Quality:**
125 * Validation gates between phases
126 * Errors isolated to individual claims
127 * Clear observability per processing step
128
129 **Flexibility:**
130 * Each phase optimizable independently
131 * Can use different model sizes per phase
132 * Easy to add human review at decision points
133
134
135 == 4. Storage Architecture ==
136 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
137 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
138 == 4.5 Versioning Architecture ==
139 {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
140 == 5. Automated Systems in Detail ==
141 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
142 === 5.1 AKEL (AI Knowledge Extraction Layer) ===
143 **What it does**: Primary AI processing engine that analyzes claims automatically
144 **Inputs**:
145 * User-submitted claim text
146 * Existing evidence and sources
147 * Source track record database
148 **Processing steps**:
149 1. **Parse & Extract**: Identify key components, entities, assertions
150 2. **Gather Evidence**: Search web and database for relevant sources
151 3. **Check Sources**: Evaluate source reliability using track records
152 4. **Extract Scenarios**: Identify different contexts from evidence
153 5. **Synthesize Verdict**: Compile evidence assessment per scenario
154 6. **Calculate Risk**: Assess potential harm and controversy
155 **Outputs**:
156 * Structured claim record
157 * Evidence links with relevance scores
158 * Scenarios with context descriptions
159 * Verdict summary per scenario
160 * Overall confidence score
161 * Risk assessment
162 **Timing**: 10-18 seconds total (parallel processing)
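A sketch of the structured output record, with field names chosen to mirror the outputs listed above; this is illustrative, not a normative schema.
{{code language="python"}}
# Illustrative only: field names mirror the outputs above but are not normative.
from dataclasses import dataclass, field

@dataclass
class EvidenceLink:
    url: str
    relevance: float                  # relevance score assigned by AKEL

@dataclass
class Scenario:
    context: str                      # context this verdict applies to
    verdict_summary: str

@dataclass
class AkelResult:
    claim_id: str
    evidence: list[EvidenceLink] = field(default_factory=list)
    scenarios: list[Scenario] = field(default_factory=list)
    confidence: float = 0.0           # overall confidence score
    risk: float = 0.0                 # harm/controversy assessment
    processing_seconds: float = 0.0   # expected 10-18 s with parallel steps
{{/code}}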
163 === 5.2 Background Jobs ===
164 **Source Track Record Updates** (Weekly):
165 * Analyze claim outcomes from past week
166 * Calculate source accuracy and reliability
167 * Update source_track_record table
168 * Never triggered by individual claims (prevents circular dependencies)
169 **Cache Management** (Continuous):
170 * Warm cache for popular claims
171 * Invalidate cache on claim updates
172 * Monitor cache hit rates
173 **Metrics Aggregation** (Hourly):
174 * Roll up detailed metrics
175 * Calculate system health indicators
176 * Generate performance reports
177 **Data Archival** (Daily):
178 * Move old AKEL logs to S3 (90+ days)
179 * Archive old edit history
180 * Compress and backup data
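A hedged sketch of how these schedules could be declared, assuming Celery beat as the job runner (the specification does not mandate one); the task names, times, and broker URL are hypothetical.
{{code language="python"}}
# Illustrative only: task names, times, and the broker URL are assumptions.
from celery import Celery
from celery.schedules import crontab

app = Celery("factharbor", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "update-source-track-records": {           # weekly, never triggered per claim
        "task": "jobs.update_source_track_records",
        "schedule": crontab(minute=0, hour=3, day_of_week="sunday"),
    },
    "aggregate-metrics": {                     # hourly roll-up
        "task": "jobs.aggregate_metrics",
        "schedule": crontab(minute=0),
    },
    "archive-old-data": {                      # daily: 90+ day logs and edits to S3
        "task": "jobs.archive_old_data",
        "schedule": crontab(minute=30, hour=2),
    },
}
{{/code}}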
181 === 5.3 Quality Monitoring ===
182 **Automated checks run continuously**:
183 * **Anomaly Detection**: Flag unusual patterns
184 ** Sudden confidence score changes
185 ** Unusual evidence distributions
186 ** Suspicious source patterns
187 * **Contradiction Detection**: Identify conflicts
188 ** Evidence that contradicts other evidence
189 ** Claims with internal contradictions
190 ** Source track record anomalies
191 * **Completeness Validation**: Ensure thoroughness
192 ** Sufficient evidence gathered
193 ** Multiple source types represented
194 ** Key scenarios identified
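A minimal sketch of a completeness check along these lines; the specific minimum counts are assumptions.
{{code language="python"}}
# Illustrative only: the minimum counts are assumptions, not specified values.
def completeness_issues(claim: dict) -> list[str]:
    """Return a list of completeness problems; an empty list means the claim
    record looks thorough enough to publish."""
    issues = []
    if len(claim.get("evidence", [])) < 3:
        issues.append("insufficient evidence gathered")
    source_types = {s.get("type") for s in claim.get("sources", [])}
    if len(source_types) < 2:
        issues.append("too few source types represented")
    if not claim.get("scenarios"):
        issues.append("no scenarios identified")
    return issues
{{/code}}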
195 === 5.4 Moderation Detection ===
196 **Automated abuse detection**:
197 * **Spam Identification**: Pattern matching for spam claims
198 * **Manipulation Detection**: Identify coordinated editing
199 * **Gaming Detection**: Flag attempts to game source scores
200 * **Suspicious Activity**: Log unusual behavior patterns
201 **Human Review**: Moderators review flagged items, and the system learns from their decisions
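A hedged sketch of what the automated flagging could look like; the spam patterns and rate threshold are placeholder heuristics, not specified behaviour.
{{code language="python"}}
# Illustrative only: patterns and the rate threshold are placeholder heuristics.
import re
from collections import Counter

SPAM_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"buy now", r"limited offer")]

def flag_submissions(submissions: list[dict]) -> list[dict]:
    """Flag spam-looking text and unusually high per-user submission rates;
    flagged items land in the moderator review queue."""
    flagged = []
    per_user = Counter(s["user_id"] for s in submissions)
    for s in submissions:
        if any(p.search(s["text"]) for p in SPAM_PATTERNS):
            flagged.append({**s, "reason": "spam pattern"})
        elif per_user[s["user_id"]] > 50:      # suspicious volume in one window
            flagged.append({**s, "reason": "unusual submission rate"})
    return flagged
{{/code}}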
202 == 6. Scalability Strategy ==
203 === 6.1 Horizontal Scaling ===
204 Components scale independently:
205 * **AKEL Workers**: Add more processing workers as claim volume grows
206 * **Database Read Replicas**: Add replicas for read-heavy workloads
207 * **Cache Layer**: Redis cluster for distributed caching
208 * **API Servers**: Load-balanced API instances
209 === 6.2 Vertical Scaling ===
210 Individual components can be upgraded:
211 * **Database Server**: Increase CPU/RAM for PostgreSQL
212 * **Cache Memory**: Expand Redis memory
213 * **Worker Resources**: More powerful AKEL worker machines
214 === 6.3 Performance Optimization ===
215 Built-in optimizations:
216 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
217 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
218 * **Intelligent Caching**: Redis caches frequently accessed data
219 * **Background Processing**: Non-urgent tasks run asynchronously
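A sketch of the cache-aside pattern behind the intelligent caching bullet above, using redis-py; the key format, TTL, and database loader are assumptions.
{{code language="python"}}
# Illustrative only: key format, TTL, and load_claim_from_db are assumptions.
import json
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def load_claim_from_db(claim_id: str) -> dict:
    """Hypothetical loader reading the denormalized claim record from PostgreSQL."""
    return {"id": claim_id}

def get_claim(claim_id: str) -> dict:
    key = f"claim:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit
    claim = load_claim_from_db(claim_id)           # cache miss: read from PostgreSQL
    cache.setex(key, 3600, json.dumps(claim))      # keep hot claims for an hour
    return claim
{{/code}}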
220 == 7. Monitoring & Observability ==
221 === 7.1 Key Metrics ===
222 System tracks:
223 * **Performance**: AKEL processing time, API response time, cache hit rate
224 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
225 * **Usage**: Claims per day, active users, API requests
226 * **Errors**: Failed AKEL runs, API errors, database issues
227 === 7.2 Alerts ===
228 Automated alerts for:
229 * Processing time >30 seconds (threshold breach)
230 * Error rate >1% (quality issue)
231 * Cache hit rate <80% (cache problem)
232 * Database connections >80% capacity (scaling needed)
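A minimal sketch of these threshold checks; the thresholds are the ones listed above, while the metric field names are assumptions.
{{code language="python"}}
# Thresholds come from the list above; metric field names are assumptions.
def check_alerts(metrics: dict) -> list[str]:
    alerts = []
    if metrics.get("akel_processing_seconds", 0) > 30:
        alerts.append("AKEL processing time above 30 seconds")
    if metrics.get("error_rate", 0.0) > 0.01:
        alerts.append("Error rate above 1%")
    if metrics.get("cache_hit_rate", 1.0) < 0.80:
        alerts.append("Cache hit rate below 80%")
    if metrics.get("db_connection_utilization", 0.0) > 0.80:
        alerts.append("Database connections above 80% of capacity")
    return alerts
{{/code}}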
233 === 7.3 Dashboards ===
234 Real-time monitoring:
235 * **System Health**: Overall status and key metrics
236 * **AKEL Performance**: Processing time breakdown
237 * **Quality Metrics**: Confidence scores, completeness
238 * **User Activity**: Usage patterns, peak times
239 == 8. Security Architecture ==
240 === 8.1 Authentication & Authorization ===
241 * **User Authentication**: Secure login with password hashing
242 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
243 * **API Keys**: For programmatic access
244 * **Rate Limiting**: Prevent abuse
245 === 8.2 Data Security ===
246 * **Encryption**: TLS for transport, encrypted storage for sensitive data
247 * **Audit Logging**: Track all significant changes
248 * **Input Validation**: Sanitize all user inputs
249 * **SQL Injection Protection**: Parameterized queries
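To illustrate the SQL-injection protection above, a sketch of a parameterized query with psycopg2; the table and column names are assumptions. User input is always passed as a bound parameter, never concatenated into the SQL string.
{{code language="python"}}
# Parameterized query: the driver binds and escapes the value safely.
import psycopg2

def find_claims_by_submitter(dsn: str, user_id: str):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, claim_text FROM claims WHERE submitted_by = %s",
            (user_id,),
        )
        return cur.fetchall()
{{/code}}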
250 === 8.3 Abuse Prevention ===
251 * **Rate Limiting**: Prevent flooding and DDoS
252 * **Automated Detection**: Flag suspicious patterns
253 * **Human Review**: Moderators investigate flagged content
254 * **Ban Mechanisms**: Block abusive users/IPs
255 == 9. Deployment Architecture ==
256 === 9.1 Production Environment ===
257 **Components**:
258 * Load Balancer (HAProxy or cloud LB)
259 * Multiple API servers (stateless)
260 * AKEL worker pool (auto-scaling)
261 * PostgreSQL primary + read replicas
262 * Redis cluster
263 * S3-compatible storage
264 **Regions**: Single region for V1.0, multi-region when needed
265 === 9.2 Development & Staging ===
266 **Development**: Local Docker Compose setup
267 **Staging**: Scaled-down production replica
268 **CI/CD**: Automated testing and deployment
269 === 9.3 Disaster Recovery ===
270 * **Database Backups**: Daily automated backups to S3
271 * **Point-in-Time Recovery**: Transaction log archival
272 * **Replication**: Real-time replication to standby
273 * **Recovery Time Objective**: <4 hours
274
275 === 9.5 Federation Architecture Diagram ===
276
277 {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
278
279 == 10. Future Architecture Evolution ==
280 === 10.1 When to Add Complexity ===
281 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
282 **Elasticsearch**: When PostgreSQL search consistently >500ms
283 **TimescaleDB**: When metrics queries consistently >1s
284 **Federation**: When 10,000+ users and explicit demand
285 **Complex Reputation**: When 100+ active contributors
286 === 10.2 Federation (V2.0+) ===
287 **Deferred until**:
288 * Core product proven with 10,000+ users
289 * User demand for decentralization
290 * Single-node limits reached
291 See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
292 == 11. Technology Stack Summary ==
293 **Backend**:
294 * Python (FastAPI or Django)
295 * PostgreSQL (primary database)
296 * Redis (caching)
297 **Frontend**:
298 * Modern JavaScript framework (React, Vue, or Svelte)
299 * Server-side rendering for SEO
300 **AI/LLM**:
301 * Multi-provider orchestration (Claude, GPT-4, local models)
302 * Fallback and cross-checking support
303 **Infrastructure**:
304 * Docker containers
305 * Kubernetes or cloud platform auto-scaling
306 * S3-compatible object storage
307 **Monitoring**:
308 * Prometheus + Grafana
309 * Structured logging (ELK or cloud logging)
310 * Error tracking (Sentry)
311 == 12. Related Pages ==
312 * [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
313 * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
314 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
315 * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
316 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
317 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]