Changes for page Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

From version 1.2
edited by Robert Schaub
on 2025/12/23 18:00
Change comment: Update document after refactoring.
To version 1.3
edited by Robert Schaub
on 2026/01/20 20:24
Change comment: Renamed back-links.

Page properties
Content
... ... @@ -1,6 +1,9 @@
1 1  = Architecture =
2 +
2 2  FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4 +
3 3  == 1. Core Principles ==
6 +
4 4  * **AI-First**: AKEL (AI) is the primary system, humans supplement
5 5  * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
6 6  * **System Over Data**: Fix algorithms, not individual outputs
... ... @@ -7,67 +7,85 @@
7 7  * **Measure Everything**: Quality metrics drive improvements
8 8  * **Scale Through Automation**: Minimal human intervention
9 9  * **Start Simple**: Add complexity only when metrics prove necessary
13 +
10 10  == 2. High-Level Architecture ==
15 +
11 11  {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17 +
12 12  === 2.1 Three-Layer Architecture ===
19 +
13 13  FactHarbor uses a clean three-layer architecture:
21 +
14 14  ==== Interface Layer ====
23 +
15 15  Handles all user and system interactions:
25 +
16 16  * **Web UI**: Browse claims, view evidence, submit feedback
17 17  * **REST API**: Programmatic access for integrations
18 18  * **Authentication & Authorization**: User identity and permissions
19 19  * **Rate Limiting**: Protect against abuse
30 +
20 20  ==== Processing Layer ====
32 +
21 21  Core business logic and AI processing:
34 +
22 22  * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
23 - * Parse and extract claim components
24 - * Gather evidence from multiple sources
25 - * Check source track records
26 - * Extract scenarios from evidence
27 - * Synthesize verdicts
28 - * Calculate risk scores
36 +* Parse and extract claim components
37 +* Gather evidence from multiple sources
38 +* Check source track records
39 +* Extract scenarios from evidence
40 +* Synthesize verdicts
41 +* Calculate risk scores
29 29  * **Background Jobs**: Automated maintenance tasks
30 - * Source track record updates (weekly)
31 - * Cache warming and invalidation
32 - * Metrics aggregation
33 - * Data archival
43 +* Source track record updates (weekly)
44 +* Cache warming and invalidation
45 +* Metrics aggregation
46 +* Data archival
34 34  * **Quality Monitoring**: Automated quality checks
35 - * Anomaly detection
36 - * Contradiction detection
37 - * Completeness validation
48 +* Anomaly detection
49 +* Contradiction detection
50 +* Completeness validation
38 38  * **Moderation Detection**: Automated abuse detection
39 - * Spam identification
40 - * Manipulation detection
41 - * Flag suspicious activity
52 +* Spam identification
53 +* Manipulation detection
54 +* Flag suspicious activity
55 +
42 42  ==== Data & Storage Layer ====
57 +
43 43  Persistent data storage and caching:
59 +
44 44  * **PostgreSQL**: Primary database for all core data
45 - * Claims, evidence, sources, users
46 - * Scenarios, edits, audit logs
47 - * Built-in full-text search
48 - * Time-series capabilities for metrics
61 +* Claims, evidence, sources, users
62 +* Scenarios, edits, audit logs
63 +* Built-in full-text search
64 +* Time-series capabilities for metrics
49 49  * **Redis**: High-speed caching layer
50 - * Session data
51 - * Frequently accessed claims
52 - * API rate limiting
66 +* Session data
67 +* Frequently accessed claims
68 +* API rate limiting
53 53  * **S3 Storage**: Long-term archival
54 - * Old edit history (90+ days)
55 - * AKEL processing logs
56 - * Backup snapshots
70 +* Old edit history (90+ days)
71 +* AKEL processing logs
72 +* Backup snapshots
57 57  **Optional future additions** (add only when metrics prove necessary):
58 58  * **Elasticsearch**: If PostgreSQL full-text search becomes slow
59 59  * **TimescaleDB**: If metrics queries become a bottleneck
76 +
60 60  === 2.2 Design Philosophy ===
78 +
61 61  **Start Simple, Evolve Based on Metrics**
62 62  The architecture deliberately starts simple:
81 +
63 63  * Single primary database (PostgreSQL handles most workloads initially)
64 64  * Three clear layers (easy to understand and maintain)
65 65  * Automated operations (minimal human intervention)
66 66  * Measure before optimizing (add complexity only when proven necessary)
67 67  See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
87 +
68 68  == 3. AKEL Architecture ==
89 +
69 69  {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
70 -See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
91 +See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
71 71  
72 72  == 3.5 Claim Processing Architecture ==
73 73  
... ... @@ -76,6 +76,7 @@
76 76  === Multi-Claim Handling ===
77 77  
78 78  Users often submit:
100 +
79 79  * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
80 80  * **Web pages**: URLs that are analyzed to extract all verifiable claims
81 81  * **Single claims**: Simple, direct factual statements
... ... @@ -87,11 +87,13 @@
87 87  **POC Implementation (Two-Phase):**
88 88  
89 89  Phase 1 - Claim Extraction:
112 +
90 90  * LLM analyzes submitted content
91 91  * Extracts all distinct, verifiable claims
92 92  * Returns structured list of claims with context
93 93  
94 94  Phase 2 - Parallel Analysis:
118 +
95 95  * Each claim processed independently by LLM
96 96  * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
97 97  * Parallelized across all claims
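
The two-phase POC flow above can be sketched as follows. This is a minimal illustration, not the project's implementation: `extract_claims`, `analyze_claim`, `process_submission`, and `ClaimAnalysis` are hypothetical names, and both LLM calls are replaced by placeholders.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ClaimAnalysis:
    claim: str
    verdict: str


async def extract_claims(content: str) -> list[str]:
    # Phase 1 stand-in: a real system would call an LLM here.
    # Splitting on periods is only a placeholder for illustration.
    return [part.strip() for part in content.split(".") if part.strip()]


async def analyze_claim(claim: str) -> ClaimAnalysis:
    # Phase 2 stand-in for the single LLM call per claim.
    await asyncio.sleep(0)  # yield control, as a network call would
    return ClaimAnalysis(claim=claim, verdict="unverified")


async def process_submission(content: str) -> list[ClaimAnalysis]:
    claims = await extract_claims(content)
    # Analyze all claims concurrently; gather preserves input order.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))
```

Because each claim is analyzed independently, total latency is governed by the slowest single analysis rather than by the number of claims, provided enough concurrent capacity is available.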
... ... @@ -100,16 +100,19 @@
100 100  **Production Implementation (Three-Phase):**
101 101  
102 102  Phase 1 - Extraction + Validation:
127 +
103 103  * Extract claims from content
104 104  * Validate clarity and uniqueness
105 105  * Filter vague or duplicate claims
106 106  
107 107  Phase 2 - Evidence Gathering (Parallel):
133 +
108 108  * Independent evidence gathering per claim
109 109  * Source validation and scenario generation
110 110  * Quality gates prevent poor data from advancing
111 111  
112 112  Phase 3 - Verdict Generation (Parallel):
139 +
113 113  * Generate verdict from validated evidence
114 114  * Confidence scoring and risk assessment
115 115  * Low-confidence cases routed to human review
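
The gates in the three-phase flow above could look roughly like this. It is a sketch under stated assumptions: the `REVIEW_THRESHOLD` value, the word-count vagueness check, and the names `filter_claims`, `route`, and `Verdict` are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass

# Hypothetical threshold; the source does not state a value.
REVIEW_THRESHOLD = 0.6


@dataclass
class Verdict:
    claim: str
    confidence: float  # assumed range: 0.0 to 1.0


def filter_claims(claims: list[str]) -> list[str]:
    """Phase 1 gate: drop duplicates and very short (vague) claims."""
    seen: set[str] = set()
    kept = []
    for claim in claims:
        key = claim.strip().lower()
        # "Fewer than three words" is a crude stand-in for a vagueness check.
        if len(key.split()) < 3 or key in seen:
            continue
        seen.add(key)
        kept.append(claim.strip())
    return kept


def route(verdict: Verdict) -> str:
    """Phase 3 gate: send low-confidence verdicts to human review."""
    return "human_review" if verdict.confidence < REVIEW_THRESHOLD else "publish"
```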
... ... @@ -117,35 +117,46 @@
117 117  === Architectural Benefits ===
118 118  
119 119  **Scalability:**
120 -* Process 100 claims with ~3x latency of single claim
147 +
148 +* Process 100 claims with 3x latency of single claim
121 121  * Parallel processing across independent claims
122 122  * Linear cost scaling with claim count
123 123  
124 124  **Quality:**
153 +
125 125  * Validation gates between phases
126 126  * Errors isolated to individual claims
127 127  * Clear observability per processing step
128 128  
129 129  **Flexibility:**
159 +
130 130  * Each phase optimizable independently
131 131  * Can use different model sizes per phase
132 132  * Easy to add human review at decision points
133 133  
134 -
135 135  == 4. Storage Architecture ==
165 +
136 136  {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
137 137  See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
168 +
138 138  == 4.5 Versioning Architecture ==
170 +
139 139  {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
172 +
140 140  == 5. Automated Systems in Detail ==
174 +
141 141  FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
176 +
142 142  === 5.1 AKEL (AI Knowledge Extraction Layer) ===
178 +
143 143  **What it does**: Primary AI processing engine that analyzes claims automatically
144 144  **Inputs**:
181 +
145 145  * User-submitted claim text
146 146  * Existing evidence and sources
147 147  * Source track record database
148 148  **Processing steps**:
186 +
149 149  1. **Parse & Extract**: Identify key components, entities, assertions
150 150  2. **Gather Evidence**: Search web and database for relevant sources
151 151  3. **Check Sources**: Evaluate source reliability using track records
... ... @@ -153,6 +153,7 @@
153 153  5. **Synthesize Verdict**: Compile evidence assessment per scenario
154 154  6. **Calculate Risk**: Assess potential harm and controversy
155 155  **Outputs**:
194 +
156 156  * Structured claim record
157 157  * Evidence links with relevance scores
158 158  * Scenarios with context descriptions
... ... @@ -160,8 +160,11 @@
160 160  * Overall confidence score
161 161  * Risk assessment
162 162  **Timing**: 10-18 seconds total (parallel processing)
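
One possible shape for the AKEL outputs listed above is sketched below. The class and field names (`AkelResult`, `EvidenceLink`, `Scenario`, `top_evidence`) are illustrative assumptions, and only the outputs visible in this section are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceLink:
    url: str
    relevance: float  # assumed range: 0.0 to 1.0


@dataclass
class Scenario:
    description: str


@dataclass
class AkelResult:
    claim_text: str
    evidence: list[EvidenceLink] = field(default_factory=list)
    scenarios: list[Scenario] = field(default_factory=list)
    confidence: float = 0.0
    risk: str = "unknown"

    def top_evidence(self, n: int = 3) -> list[EvidenceLink]:
        """Return the n most relevant evidence links, highest first."""
        return sorted(self.evidence, key=lambda e: e.relevance, reverse=True)[:n]
```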
202 +
163 163  === 5.2 Background Jobs ===
204 +
164 164  **Source Track Record Updates** (Weekly):
206 +
165 165  * Analyze claim outcomes from past week
166 166  * Calculate source accuracy and reliability
167 167  * Update source_track_record table
... ... @@ -178,83 +178,120 @@
178 178  * Move old AKEL logs to S3 (90+ days)
179 179  * Archive old edit history
180 180  * Compress and backup data
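
The "90+ days" archival rule above amounts to a cutoff comparison. The sketch below assumes each log record is a dict with a timezone-aware `created_at` field; that record shape and the name `select_for_archive` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=90)  # from the "90+ days" rule above


def select_for_archive(logs: list[dict], now: datetime) -> list[dict]:
    """Return log records old enough to move to long-term storage."""
    cutoff = now - ARCHIVE_AFTER
    # A record exactly 90 days old counts as "90+ days".
    return [log for log in logs if log["created_at"] <= cutoff]
```

Passing `now` in explicitly keeps the job deterministic and easy to test.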
223 +
181 181  === 5.3 Quality Monitoring ===
225 +
182 182  **Automated checks run continuously**:
227 +
183 183  * **Anomaly Detection**: Flag unusual patterns
184 - * Sudden confidence score changes
185 - * Unusual evidence distributions
186 - * Suspicious source patterns
229 +* Sudden confidence score changes
230 +* Unusual evidence distributions
231 +* Suspicious source patterns
187 187  * **Contradiction Detection**: Identify conflicts
188 - * Evidence that contradicts other evidence
189 - * Claims with internal contradictions
190 - * Source track record anomalies
233 +* Evidence that contradicts other evidence
234 +* Claims with internal contradictions
235 +* Source track record anomalies
191 191  * **Completeness Validation**: Ensure thoroughness
192 - * Sufficient evidence gathered
193 - * Multiple source types represented
194 - * Key scenarios identified
237 +* Sufficient evidence gathered
238 +* Multiple source types represented
239 +* Key scenarios identified
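
As an illustration of the "sudden confidence score changes" check above, the sketch below flags jumps between consecutive scores. The function name and the `max_delta` default of 0.3 are assumptions; the specification does not define a threshold or a detection method.

```python
def sudden_changes(scores: list[float], max_delta: float = 0.3) -> list[int]:
    """Return indexes where the score moved by more than max_delta."""
    return [
        i
        for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) > max_delta
    ]
```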
240 +
195 195  === 5.4 Moderation Detection ===
242 +
196 196  **Automated abuse detection**:
244 +
197 197  * **Spam Identification**: Pattern matching for spam claims
198 198  * **Manipulation Detection**: Identify coordinated editing
199 199  * **Gaming Detection**: Flag attempts to game source scores
200 200  * **Suspicious Activity**: Log unusual behavior patterns
201 201  **Human Review**: Moderators review flagged items, system learns from decisions
250 +
202 202  == 6. Scalability Strategy ==
252 +
203 203  === 6.1 Horizontal Scaling ===
254 +
204 204  Components scale independently:
256 +
205 205  * **AKEL Workers**: Add more processing workers as claim volume grows
206 206  * **Database Read Replicas**: Add replicas for read-heavy workloads
207 207  * **Cache Layer**: Redis cluster for distributed caching
208 208  * **API Servers**: Load-balanced API instances
261 +
209 209  === 6.2 Vertical Scaling ===
263 +
210 210  Individual components can be upgraded:
265 +
211 211  * **Database Server**: Increase CPU/RAM for PostgreSQL
212 212  * **Cache Memory**: Expand Redis memory
213 213  * **Worker Resources**: More powerful AKEL worker machines
269 +
214 214  === 6.3 Performance Optimization ===
271 +
215 215  Built-in optimizations:
273 +
216 216  * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
217 217  * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
218 218  * **Intelligent Caching**: Redis caches frequently accessed data
219 219  * **Background Processing**: Non-urgent tasks run asynchronously
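
The caching idea above is commonly implemented with a cache-aside pattern. The sketch below uses an in-memory dict as a stand-in for Redis, so `TtlCache` and `get_or_load` are hypothetical names rather than part of any Redis client.

```python
import time
from typing import Callable


class TtlCache:
    """In-memory stand-in for Redis, used here only for illustration."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_load(self, key: str, loader: Callable[[], object]) -> object:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit: entry has not expired yet
        value = loader()  # cache miss: fall back to the database
        self._store[key] = (self._clock() + self._ttl, value)
        return value
```

The injectable `clock` exists only so that expiry can be exercised without waiting for real time to pass.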
278 +
220 220  == 7. Monitoring & Observability ==
280 +
221 221  === 7.1 Key Metrics ===
282 +
222 222  System tracks:
284 +
223 223  * **Performance**: AKEL processing time, API response time, cache hit rate
224 224  * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
225 225  * **Usage**: Claims per day, active users, API requests
226 226  * **Errors**: Failed AKEL runs, API errors, database issues
289 +
227 227  === 7.2 Alerts ===
291 +
228 228  Automated alerts for:
293 +
229 229  * Processing time >30 seconds (threshold breach)
230 230  * Error rate >1% (quality issue)
231 231  * Cache hit rate <80% (cache problem)
232 232  * Database connections >80% capacity (scaling needed)
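
The four alert thresholds above can be expressed as a single check. The `Snapshot` fields and alert names below are illustrative assumptions; only the threshold values come from the list.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    processing_seconds: float
    error_rate: float           # fraction, e.g. 0.01 means 1%
    cache_hit_rate: float       # fraction
    db_connection_usage: float  # fraction of capacity


def active_alerts(s: Snapshot) -> list[str]:
    """Return the names of all thresholds the snapshot breaches."""
    alerts = []
    if s.processing_seconds > 30:
        alerts.append("processing_time")
    if s.error_rate > 0.01:
        alerts.append("error_rate")
    if s.cache_hit_rate < 0.80:
        alerts.append("cache_hit_rate")
    if s.db_connection_usage > 0.80:
        alerts.append("db_connections")
    return alerts
```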
298 +
233 233  === 7.3 Dashboards ===
300 +
234 234  Real-time monitoring:
302 +
235 235  * **System Health**: Overall status and key metrics
236 236  * **AKEL Performance**: Processing time breakdown
237 237  * **Quality Metrics**: Confidence scores, completeness
238 238  * **User Activity**: Usage patterns, peak times
307 +
239 239  == 8. Security Architecture ==
309 +
240 240  === 8.1 Authentication & Authorization ===
311 +
241 241  * **User Authentication**: Secure login with password hashing
242 242  * **Role-Based Access**: Reader, Contributor, Moderator, Admin
243 243  * **API Keys**: For programmatic access
244 244  * **Rate Limiting**: Prevent abuse
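
The four roles above could be modelled as an ordered enumeration. Treating them as a strict hierarchy (each role includes the permissions of the ones before it) is an assumption made for this sketch; the specification only lists the role names.

```python
from enum import IntEnum


class Role(IntEnum):
    READER = 1
    CONTRIBUTOR = 2
    MODERATOR = 3
    ADMIN = 4


def can(role: Role, required: Role) -> bool:
    """True if `role` is at least as privileged as `required`."""
    return role >= required
```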
316 +
245 245  === 8.2 Data Security ===
318 +
246 246  * **Encryption**: TLS for transport, encrypted storage for sensitive data
247 247  * **Audit Logging**: Track all significant changes
248 248  * **Input Validation**: Sanitize all user inputs
249 249  * **SQL Injection Protection**: Parameterized queries
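
To illustrate the parameterized-query point above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The `claims` table and the `find_claims` helper are hypothetical; PostgreSQL drivers use a different placeholder style (for example `%s` in psycopg), but the principle is the same.

```python
import sqlite3


def find_claims(conn: sqlite3.Connection, text: str) -> list[tuple]:
    # The "?" placeholder keeps user input out of the SQL string itself.
    cursor = conn.execute("SELECT id, text FROM claims WHERE text = ?", (text,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO claims (text) VALUES (?)", ("The sky is blue",))
```

An injection attempt such as `x' OR '1'='1` is treated as a literal string to match, not as SQL, so it returns no rows.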
323 +
250 250  === 8.3 Abuse Prevention ===
325 +
251 251  * **Rate Limiting**: Prevent flooding and DDoS
252 252  * **Automated Detection**: Flag suspicious patterns
253 253  * **Human Review**: Moderators investigate flagged content
254 254  * **Ban Mechanisms**: Block abusive users/IPs
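
Rate limiting as listed above is often built on a token bucket. The specification does not name an algorithm, so the sketch below is one common option, with hypothetical names and an in-memory state that a real deployment would likely keep in Redis.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```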
330 +
255 255  == 9. Deployment Architecture ==
332 +
256 256  === 9.1 Production Environment ===
334 +
257 257  **Components**:
336 +
258 258  * Load Balancer (HAProxy or cloud LB)
259 259  * Multiple API servers (stateless)
260 260  * AKEL worker pool (auto-scaling)
... ... @@ -262,11 +262,15 @@
262 262  * Redis cluster
263 263  * S3-compatible storage
264 264  **Regions**: Single region for V1.0, multi-region when needed
344 +
265 265  === 9.2 Development & Staging ===
346 +
266 266  **Development**: Local Docker Compose setup
267 267  **Staging**: Scaled-down production replica
268 268  **CI/CD**: Automated testing and deployment
350 +
269 269  === 9.3 Disaster Recovery ===
352 +
270 270  * **Database Backups**: Daily automated backups to S3
271 271  * **Point-in-Time Recovery**: Transaction log archival
272 272  * **Replication**: Real-time replication to standby
... ... @@ -277,20 +277,28 @@
277 277  {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
278 278  
279 279  == 10. Future Architecture Evolution ==
363 +
280 280  === 10.1 When to Add Complexity ===
365 +
281 281  See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
282 282  **Elasticsearch**: When PostgreSQL search consistently >500ms
283 283  **TimescaleDB**: When metrics queries consistently >1s
284 284  **Federation**: When 10,000+ users and explicit demand
285 285  **Complex Reputation**: When 100+ active contributors
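
The triggers above can be read as a simple decision function. In the sketch below, "consistently" is interpreted as "every recent sample exceeds the limit", which is an assumption; the function and parameter names are likewise hypothetical.

```python
def consistently_above(samples: list[float], limit: float) -> bool:
    """True only if there is data and every sample exceeds the limit."""
    return bool(samples) and all(value > limit for value in samples)


def complexity_triggers(search_ms: list[float], metrics_query_s: list[float],
                        users: int, contributors: int,
                        federation_demand: bool) -> list[str]:
    """Return the optional components whose trigger condition is met."""
    triggers = []
    if consistently_above(search_ms, 500):
        triggers.append("Elasticsearch")
    if consistently_above(metrics_query_s, 1.0):
        triggers.append("TimescaleDB")
    if users >= 10_000 and federation_demand:
        triggers.append("Federation")
    if contributors >= 100:
        triggers.append("Complex Reputation")
    return triggers
```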
371 +
286 286  === 10.2 Federation (V2.0+) ===
373 +
287 287  **Deferred until**:
375 +
288 288  * Core product proven with 10,000+ users
289 289  * User demand for decentralization
290 290  * Single-node limits reached
291 291  See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
380 +
292 292  == 11. Technology Stack Summary ==
382 +
293 293  **Backend**:
384 +
294 294  * Python (FastAPI or Django)
295 295  * PostgreSQL (primary database)
296 296  * Redis (caching)
... ... @@ -308,8 +308,10 @@
308 308  * Prometheus + Grafana
309 309  * Structured logging (ELK or cloud logging)
310 310  * Error tracking (Sentry)
402 +
311 311  == 12. Related Pages ==
312 -* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
404 +
405 +* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
313 313  * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
314 314  * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
315 315  * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]