Changes for page Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

From version 1.1
edited by Robert Schaub
on 2025/12/23 18:19
Change comment: Imported from XAR
To version 1.3
edited by Robert Schaub
on 2026/01/20 20:24
Change comment: Renamed back-links.

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor V0\.9\.100 incremental.Specification.WebHome
1 +FactHarbor.Specification.WebHome
Content
... ... @@ -1,9 +1,6 @@
1 1  = Architecture =
2 -
3 3  FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4 -
5 5  == 1. Core Principles ==
6 -
7 7  * **AI-First**: AKEL (AI) is the primary system; humans supplement it
8 8  * **Publish by Default**: No centralized approval (removed in V0.9.50); content is published with confidence scores
9 9  * **System Over Data**: Fix algorithms, not individual outputs
... ... @@ -10,85 +10,67 @@
10 10  * **Measure Everything**: Quality metrics drive improvements
11 11  * **Scale Through Automation**: Minimal human intervention
12 12  * **Start Simple**: Add complexity only when metrics prove necessary
13 -
14 14  == 2. High-Level Architecture ==
15 -
16 16  {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17 -
18 18  === 2.1 Three-Layer Architecture ===
19 -
20 20  FactHarbor uses a clean three-layer architecture:
21 -
22 22  ==== Interface Layer ====
23 -
24 24  Handles all user and system interactions:
25 -
26 26  * **Web UI**: Browse claims, view evidence, submit feedback
27 27  * **REST API**: Programmatic access for integrations
28 28  * **Authentication & Authorization**: User identity and permissions
29 29  * **Rate Limiting**: Protect against abuse
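The rate-limiting bullet above is commonly implemented as a token bucket per client. A minimal sketch, assuming illustrative defaults (the class name and limits are not from the specification):

```python
import time


class TokenBucket:
    """Token-bucket rate limiter sketch: each client holds up to
    `capacity` tokens that refill at `rate` tokens per second; a
    request is allowed only when a whole token is available."""

    def __init__(self, capacity: int = 10, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In production the bucket state would live in the shared cache layer rather than in process memory, so all API servers enforce one limit per client.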
30 -
31 31  ==== Processing Layer ====
32 -
33 33  Core business logic and AI processing:
34 -
35 35  * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
36 -* Parse and extract claim components
37 -* Gather evidence from multiple sources
38 -* Check source track records
39 -* Extract scenarios from evidence
40 -* Synthesize verdicts
41 -* Calculate risk scores
23 + * Parse and extract claim components
24 + * Gather evidence from multiple sources
25 + * Check source track records
26 + * Extract scenarios from evidence
27 + * Synthesize verdicts
28 + * Calculate risk scores
42 42  * **Background Jobs**: Automated maintenance tasks
43 -* Source track record updates (weekly)
44 -* Cache warming and invalidation
45 -* Metrics aggregation
46 -* Data archival
30 + * Source track record updates (weekly)
31 + * Cache warming and invalidation
32 + * Metrics aggregation
33 + * Data archival
47 47  * **Quality Monitoring**: Automated quality checks
48 -* Anomaly detection
49 -* Contradiction detection
50 -* Completeness validation
35 + * Anomaly detection
36 + * Contradiction detection
37 + * Completeness validation
51 51  * **Moderation Detection**: Automated abuse detection
52 -* Spam identification
53 -* Manipulation detection
54 -* Flag suspicious activity
55 -
39 + * Spam identification
40 + * Manipulation detection
41 + * Flag suspicious activity
56 56  ==== Data & Storage Layer ====
57 -
58 58  Persistent data storage and caching:
59 -
60 60  * **PostgreSQL**: Primary database for all core data
61 -* Claims, evidence, sources, users
62 -* Scenarios, edits, audit logs
63 -* Built-in full-text search
64 -* Time-series capabilities for metrics
45 + * Claims, evidence, sources, users
46 + * Scenarios, edits, audit logs
47 + * Built-in full-text search
48 + * Time-series capabilities for metrics
65 65  * **Redis**: High-speed caching layer
66 -* Session data
67 -* Frequently accessed claims
68 -* API rate limiting
50 + * Session data
51 + * Frequently accessed claims
52 + * API rate limiting
69 69  * **S3 Storage**: Long-term archival
70 -* Old edit history (90+ days)
71 -* AKEL processing logs
72 -* Backup snapshots
54 + * Old edit history (90+ days)
55 + * AKEL processing logs
56 + * Backup snapshots
73 73  **Optional future additions** (add only when metrics prove necessary):
74 74  * **Elasticsearch**: If PostgreSQL full-text search becomes slow
75 75  * **TimescaleDB**: If metrics queries become a bottleneck
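The Redis layer described above typically follows a cache-aside pattern: read through the cache, fall back to PostgreSQL on a miss, then populate the cache for later readers. A sketch with an in-memory stand-in for the Redis client (all names and the TTL are illustrative):

```python
from typing import Callable, Optional


class InMemoryCache:
    """Stand-in for a Redis client; mirrors get/setex, TTL ignored here."""

    def __init__(self):
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = value  # a real Redis would expire after ttl_seconds


def get_claim_json(claim_id: str, cache, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the database,
    then populate the cache for subsequent readers."""
    key = f"claim:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(claim_id)
    cache.setex(key, 300, value)  # 5-minute TTL, an assumed value
    return value
```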
76 -
77 77  === 2.2 Design Philosophy ===
78 -
79 79  **Start Simple, Evolve Based on Metrics**
80 80  The architecture deliberately starts simple:
81 -
82 82  * Single primary database (PostgreSQL handles most workloads initially)
83 83  * Three clear layers (easy to understand and maintain)
84 84  * Automated operations (minimal human intervention)
85 85  * Measure before optimizing (add complexity only when proven necessary)
86 86  See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
87 -
88 88  == 3. AKEL Architecture ==
89 -
90 90  {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
91 -See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
70 +See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
92 92  
93 93  == 3.5 Claim Processing Architecture ==
94 94  
... ... @@ -97,7 +97,6 @@
97 97  === Multi-Claim Handling ===
98 98  
99 99  Users often submit:
100 -
101 101  * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
102 102  * **Web pages**: URLs that are analyzed to extract all verifiable claims
103 103  * **Single claims**: Simple, direct factual statements
... ... @@ -109,13 +109,11 @@
109 109  **POC Implementation (Two-Phase):**
110 110  
111 111  Phase 1 - Claim Extraction:
112 -
113 113  * LLM analyzes submitted content
114 114  * Extracts all distinct, verifiable claims
115 115  * Returns structured list of claims with context
116 116  
117 117  Phase 2 - Parallel Analysis:
118 -
119 119  * Each claim processed independently by LLM
120 120  * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
121 121  * Parallelized across all claims
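The two-phase POC flow above can be sketched with `asyncio`, with both LLM calls injected as plain functions; this is an illustrative shape, not the actual AKEL implementation:

```python
import asyncio
from typing import Callable, List


async def analyze_submission(
    text: str,
    extract_claims: Callable[[str], List[str]],
    analyze_claim: Callable[[str], dict],
) -> List[dict]:
    """Phase 1: one call extracts distinct claims; Phase 2: each claim
    is analyzed independently and concurrently."""
    claims = extract_claims(text)

    async def analyze(claim: str) -> dict:
        # Run the (blocking) per-claim analysis off the event loop.
        return await asyncio.to_thread(analyze_claim, claim)

    return list(await asyncio.gather(*(analyze(c) for c in claims)))
```

Because the per-claim calls run concurrently, total latency is dominated by the slowest single claim plus the extraction call, which is what keeps a 100-claim submission near single-claim latency.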
... ... @@ -124,19 +124,16 @@
124 124  **Production Implementation (Three-Phase):**
125 125  
126 126  Phase 1 - Extraction + Validation:
127 -
128 128  * Extract claims from content
129 129  * Validate clarity and uniqueness
130 130  * Filter vague or duplicate claims
131 131  
132 132  Phase 2 - Evidence Gathering (Parallel):
133 -
134 134  * Independent evidence gathering per claim
135 135  * Source validation and scenario generation
136 136  * Quality gates prevent poor data from advancing
137 137  
138 138  Phase 3 - Verdict Generation (Parallel):
139 -
140 140  * Generate verdict from validated evidence
141 141  * Confidence scoring and risk assessment
142 142  * Low-confidence cases routed to human review
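The routing step at the end of Phase 3 might look like this sketch; the 0.7 confidence floor is an assumed illustrative threshold, not a value from the specification:

```python
def route_verdict(verdict: dict, confidence_floor: float = 0.7) -> str:
    """Publish verdicts that clear the confidence floor; everything
    else (including verdicts with no score at all) goes to human review."""
    if verdict.get("confidence", 0.0) >= confidence_floor:
        return "publish"
    return "human_review"
```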
... ... @@ -144,46 +144,35 @@
144 144  === Architectural Benefits ===
145 145  
146 146  **Scalability:**
147 -
148 -* Process 100 claims with 3x latency of single claim
120 +* Process 100 claims with ~3x latency of single claim
149 149  * Parallel processing across independent claims
150 150  * Linear cost scaling with claim count
151 151  
152 152  **Quality:**
153 -
154 154  * Validation gates between phases
155 155  * Errors isolated to individual claims
156 156  * Clear observability per processing step
157 157  
158 158  **Flexibility:**
159 -
160 160  * Each phase optimizable independently
161 161  * Can use different model sizes per phase
162 162  * Easy to add human review at decision points
163 163  
164 -== 4. Storage Architecture ==
165 165  
135 +== 4. Storage Architecture ==
166 166  {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
167 167  See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
168 -
169 169  == 4.5 Versioning Architecture ==
170 -
171 171  {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
172 -
173 173  == 5. Automated Systems in Detail ==
174 -
175 175  FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
176 -
177 177  === 5.1 AKEL (AI Knowledge Extraction Layer) ===
178 -
179 179  **What it does**: Primary AI processing engine that analyzes claims automatically
180 180  **Inputs**:
181 -
182 182  * User-submitted claim text
183 183  * Existing evidence and sources
184 184  * Source track record database
185 185  **Processing steps**:
186 -
187 187  1. **Parse & Extract**: Identify key components, entities, assertions
188 188  2. **Gather Evidence**: Search web and database for relevant sources
189 189  3. **Check Sources**: Evaluate source reliability using track records
... ... @@ -191,7 +191,6 @@
191 191  5. **Synthesize Verdict**: Compile evidence assessment per scenario
192 192  6. **Calculate Risk**: Assess potential harm and controversy
193 193  **Outputs**:
194 -
195 195  * Structured claim record
196 196  * Evidence links with relevance scores
197 197  * Scenarios with context descriptions
... ... @@ -199,11 +199,8 @@
199 199  * Overall confidence score
200 200  * Risk assessment
201 201  **Timing**: 10-18 seconds total (parallel processing)
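The six steps above could be driven by a small pipeline runner that records per-step timings for observability. This sequential sketch omits the parallelism the real pipeline uses, and every step function is an injected placeholder:

```python
import time
from typing import Callable, List, Tuple


def run_akel_pipeline(
    claim_text: str,
    steps: List[Tuple[str, Callable[[dict], dict]]],
) -> dict:
    """Apply named AKEL steps in order over a shared context dict,
    timing each step; real steps would call models and search backends."""
    context = {"claim": claim_text, "timings_ms": {}}
    for name, step in steps:
        start = time.perf_counter()
        context = step(context)
        context["timings_ms"][name] = (time.perf_counter() - start) * 1000.0
    return context
```

Keeping one timing entry per named step is what makes the "clear observability per processing step" benefit cheap to get.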
202 -
203 203  === 5.2 Background Jobs ===
204 -
205 205  **Source Track Record Updates** (Weekly):
206 -
207 207  * Analyze claim outcomes from past week
208 208  * Calculate source accuracy and reliability
209 209  * Update source_track_record table
... ... @@ -220,120 +220,83 @@
220 220  * Move old AKEL logs to S3 (90+ days)
221 221  * Archive old edit history
222 222  * Compress and backup data
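The weekly track-record job above reduces a week of claim outcomes to a per-source accuracy score before writing it back to the source_track_record table. A sketch with illustrative field names:

```python
from collections import defaultdict


def weekly_source_accuracy(outcomes):
    """Compute per-source accuracy from a week of claim outcomes.
    Each outcome is {"source": str, "supported": bool}; the real job
    would persist the result rather than return it."""
    stats = defaultdict(lambda: {"total": 0, "supported": 0})
    for o in outcomes:
        s = stats[o["source"]]
        s["total"] += 1
        if o["supported"]:
            s["supported"] += 1
    # Accuracy = fraction of a source's cited evidence that held up.
    return {src: s["supported"] / s["total"] for src, s in stats.items()}
```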
223 -
224 224  === 5.3 Quality Monitoring ===
225 -
226 226  **Automated checks run continuously**:
227 -
228 228  * **Anomaly Detection**: Flag unusual patterns
229 -* Sudden confidence score changes
230 -* Unusual evidence distributions
231 -* Suspicious source patterns
184 + * Sudden confidence score changes
185 + * Unusual evidence distributions
186 + * Suspicious source patterns
232 232  * **Contradiction Detection**: Identify conflicts
233 -* Evidence that contradicts other evidence
234 -* Claims with internal contradictions
235 -* Source track record anomalies
188 + * Evidence that contradicts other evidence
189 + * Claims with internal contradictions
190 + * Source track record anomalies
236 236  * **Completeness Validation**: Ensure thoroughness
237 -* Sufficient evidence gathered
238 -* Multiple source types represented
239 -* Key scenarios identified
240 -
192 + * Sufficient evidence gathered
193 + * Multiple source types represented
194 + * Key scenarios identified
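One of the anomaly checks above, flagging sudden confidence-score changes, might be sketched like this; the 0.3 delta threshold is an assumption for illustration:

```python
def flag_confidence_jumps(history, max_delta=0.3):
    """Flag claims whose confidence moved by more than `max_delta`
    between consecutive recomputations. `history` maps claim id to its
    ordered list of confidence scores."""
    flags = []
    for claim_id, scores in history.items():
        for prev, curr in zip(scores, scores[1:]):
            if abs(curr - prev) > max_delta:
                flags.append(claim_id)
                break  # one flag per claim is enough
    return flags
```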
241 241  === 5.4 Moderation Detection ===
242 -
243 243  **Automated abuse detection**:
244 -
245 245  * **Spam Identification**: Pattern matching for spam claims
246 246  * **Manipulation Detection**: Identify coordinated editing
247 247  * **Gaming Detection**: Flag attempts to game source scores
248 248  * **Suspicious Activity**: Log unusual behavior patterns
249 249  **Human Review**: Moderators review flagged items, and the system learns from their decisions
250 -
251 251  == 6. Scalability Strategy ==
252 -
253 253  === 6.1 Horizontal Scaling ===
254 -
255 255  Components scale independently:
256 -
257 257  * **AKEL Workers**: Add more processing workers as claim volume grows
258 258  * **Database Read Replicas**: Add replicas for read-heavy workloads
259 259  * **Cache Layer**: Redis cluster for distributed caching
260 260  * **API Servers**: Load-balanced API instances
261 -
262 262  === 6.2 Vertical Scaling ===
263 -
264 264  Individual components can be upgraded:
265 -
266 266  * **Database Server**: Increase CPU/RAM for PostgreSQL
267 267  * **Cache Memory**: Expand Redis memory
268 268  * **Worker Resources**: More powerful AKEL worker machines
269 -
270 270  === 6.3 Performance Optimization ===
271 -
272 272  Built-in optimizations:
273 -
274 274  * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
275 275  * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
276 276  * **Intelligent Caching**: Redis caches frequently accessed data
277 277  * **Background Processing**: Non-urgent tasks run asynchronously
278 -
279 279  == 7. Monitoring & Observability ==
280 -
281 281  === 7.1 Key Metrics ===
282 -
283 283  System tracks:
284 -
285 285  * **Performance**: AKEL processing time, API response time, cache hit rate
286 286  * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
287 287  * **Usage**: Claims per day, active users, API requests
288 288  * **Errors**: Failed AKEL runs, API errors, database issues
289 -
290 290  === 7.2 Alerts ===
291 -
292 292  Automated alerts for:
293 -
294 294  * Processing time >30 seconds (threshold breach)
295 295  * Error rate >1% (quality issue)
296 296  * Cache hit rate <80% (cache problem)
297 297  * Database connections >80% capacity (scaling needed)
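The four alert rules above translate directly into a threshold check over a metrics snapshot; a sketch with illustrative metric names:

```python
def check_alerts(metrics):
    """Return the names of breached alerts for one metrics snapshot.
    Missing metrics default to safe values so a partial snapshot
    does not fire spurious alerts."""
    alerts = []
    if metrics.get("akel_processing_seconds", 0) > 30:
        alerts.append("processing_time")
    if metrics.get("error_rate", 0) > 0.01:
        alerts.append("error_rate")
    if metrics.get("cache_hit_rate", 1.0) < 0.80:
        alerts.append("cache_hit_rate")
    if metrics.get("db_connection_utilization", 0) > 0.80:
        alerts.append("db_connections")
    return alerts
```

In practice these rules would live in Prometheus alerting configuration rather than application code; the sketch just makes the thresholds concrete.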
298 -
299 299  === 7.3 Dashboards ===
300 -
301 301  Real-time monitoring:
302 -
303 303  * **System Health**: Overall status and key metrics
304 304  * **AKEL Performance**: Processing time breakdown
305 305  * **Quality Metrics**: Confidence scores, completeness
306 306  * **User Activity**: Usage patterns, peak times
307 -
308 308  == 8. Security Architecture ==
309 -
310 310  === 8.1 Authentication & Authorization ===
311 -
312 312  * **User Authentication**: Secure login with password hashing
313 313  * **Role-Based Access**: Reader, Contributor, Moderator, Admin
314 314  * **API Keys**: For programmatic access
315 315  * **Rate Limiting**: Prevent abuse
316 -
317 317  === 8.2 Data Security ===
318 -
319 319  * **Encryption**: TLS for transport, encrypted storage for sensitive data
320 320  * **Audit Logging**: Track all significant changes
321 321  * **Input Validation**: Sanitize all user inputs
322 322  * **SQL Injection Protection**: Parameterized queries
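The SQL-injection item above relies on binding values through the driver instead of interpolating them into SQL strings. This sketch uses the stdlib `sqlite3` to show the principle; production would use PostgreSQL with the same placeholder discipline, and the table and column names are illustrative:

```python
import sqlite3


def find_claims_by_author(conn: sqlite3.Connection, author: str):
    """Parameterized query: the author value is bound by the driver,
    never spliced into the SQL text, so hostile input is treated as
    data rather than as SQL."""
    cur = conn.execute(
        "SELECT id, text FROM claims WHERE author = ?",  # ? = bound parameter
        (author,),
    )
    return cur.fetchall()
```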
323 -
324 324  === 8.3 Abuse Prevention ===
325 -
326 326  * **Rate Limiting**: Prevent flooding and DDoS
327 327  * **Automated Detection**: Flag suspicious patterns
328 328  * **Human Review**: Moderators investigate flagged content
329 329  * **Ban Mechanisms**: Block abusive users/IPs
330 -
331 331  == 9. Deployment Architecture ==
332 -
333 333  === 9.1 Production Environment ===
334 -
335 335  **Components**:
336 -
337 337  * Load Balancer (HAProxy or cloud LB)
338 338  * Multiple API servers (stateless)
339 339  * AKEL worker pool (auto-scaling)
... ... @@ -341,15 +341,11 @@
341 341  * Redis cluster
342 342  * S3-compatible storage
343 343  **Regions**: Single region for V1.0, multi-region when needed
344 -
345 345  === 9.2 Development & Staging ===
346 -
347 347  **Development**: Local Docker Compose setup
348 348  **Staging**: Scaled-down production replica
349 349  **CI/CD**: Automated testing and deployment
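The local Docker Compose setup mentioned above might look roughly like this; the service names, images, and worker entry point are illustrative assumptions, not taken from the specification:

```yaml
services:
  api:
    build: .
    ports: ["8000:8000"]
    depends_on: [db, cache]
  akel-worker:
    build: .
    command: python -m akel.worker   # assumed worker entry point
    depends_on: [db, cache]
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev-only-password   # never reuse in production
  cache:
    image: redis:7
```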
350 -
351 351  === 9.3 Disaster Recovery ===
352 -
353 353  * **Database Backups**: Daily automated backups to S3
354 354  * **Point-in-Time Recovery**: Transaction log archival
355 355  * **Replication**: Real-time replication to standby
... ... @@ -360,28 +360,20 @@
360 360  {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
361 361  
362 362  == 10. Future Architecture Evolution ==
363 -
364 364  === 10.1 When to Add Complexity ===
365 -
366 366  See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
367 367  **Elasticsearch**: When PostgreSQL search consistently >500ms
368 368  **TimescaleDB**: When metrics queries consistently >1s
369 369  **Federation**: When 10,000+ users and explicit demand
370 370  **Complex Reputation**: When 100+ active contributors
371 -
372 372  === 10.2 Federation (V2.0+) ===
373 -
374 374  **Deferred until**:
375 -
376 376  * Core product proven with 10,000+ users
377 377  * User demand for decentralization
378 378  * Single-node limits reached
379 379  See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
380 -
381 381  == 11. Technology Stack Summary ==
382 -
383 383  **Backend**:
384 -
385 385  * Python (FastAPI or Django)
386 386  * PostgreSQL (primary database)
387 387  * Redis (caching)
... ... @@ -399,10 +399,8 @@
399 399  * Prometheus + Grafana
400 400  * Structured logging (ELK or cloud logging)
401 401  * Error tracking (Sentry)
402 -
403 403  == 12. Related Pages ==
404 -
405 -* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
312 +* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
406 406  * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
407 407  * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
408 408  * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]