Changes for page Architecture
Last modified by Robert Schaub on 2026/02/08 08:23
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
Page properties
Parent
@@ -1,1 +1,1 @@
-Test.FactHarborV0\.9\.100 incremental.Specification.WebHome
+FactHarbor.Specification.WebHome
Content
@@ -1,9 +1,6 @@
 = Architecture =
-
 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
-
 == 1. Core Principles ==
-
 * **AI-First**: AKEL (AI) is the primary system, humans supplement
 * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
 * **System Over Data**: Fix algorithms, not individual outputs
@@ -10,85 +10,67 @@
 * **Measure Everything**: Quality metrics drive improvements
 * **Scale Through Automation**: Minimal human intervention
 * **Start Simple**: Add complexity only when metrics prove necessary
-
 == 2. High-Level Architecture ==
-
 {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
-
 === 2.1 Three-Layer Architecture ===
-
 FactHarbor uses a clean three-layer architecture:
-
 ==== Interface Layer ====
-
 Handles all user and system interactions:
-
 * **Web UI**: Browse claims, view evidence, submit feedback
 * **REST API**: Programmatic access for integrations
 * **Authentication & Authorization**: User identity and permissions
 * **Rate Limiting**: Protect against abuse
-
 ==== Processing Layer ====
-
 Core business logic and AI processing:
-
 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
-* Parse and extract claim components
-* Gather evidence from multiple sources
-* Check source track records
-* Extract scenarios from evidence
-* Synthesize verdicts
-* Calculate risk scores
+ * Parse and extract claim components
+ * Gather evidence from multiple sources
+ * Check source track records
+ * Extract scenarios from evidence
+ * Synthesize verdicts
+ * Calculate risk scores
 * **Background Jobs**: Automated maintenance tasks
-* Source track record updates (weekly)
-* Cache warming and invalidation
-* Metrics aggregation
-* Data archival
+ * Source track record updates (weekly)
+ * Cache warming and invalidation
+ * Metrics aggregation
+ * Data archival
 * **Quality Monitoring**: Automated quality checks
-* Anomaly detection
-* Contradiction detection
-* Completeness validation
+ * Anomaly detection
+ * Contradiction detection
+ * Completeness validation
 * **Moderation Detection**: Automated abuse detection
-* Spam identification
-* Manipulation detection
-* Flag suspicious activity
-
+ * Spam identification
+ * Manipulation detection
+ * Flag suspicious activity
 ==== Data & Storage Layer ====
-
 Persistent data storage and caching:
-
 * **PostgreSQL**: Primary database for all core data
-* Claims, evidence, sources, users
-* Scenarios, edits, audit logs
-* Built-in full-text search
-* Time-series capabilities for metrics
+ * Claims, evidence, sources, users
+ * Scenarios, edits, audit logs
+ * Built-in full-text search
+ * Time-series capabilities for metrics
 * **Redis**: High-speed caching layer
-* Session data
-* Frequently accessed claims
-* API rate limiting
+ * Session data
+ * Frequently accessed claims
+ * API rate limiting
 * **S3 Storage**: Long-term archival
-* Old edit history (90+ days)
-* AKEL processing logs
-* Backup snapshots
+ * Old edit history (90+ days)
+ * AKEL processing logs
+ * Backup snapshots
 **Optional future additions** (add only when metrics prove necessary):
 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
 * **TimescaleDB**: If metrics queries become a bottleneck
-
 === 2.2 Design Philosophy ===
-
 **Start Simple, Evolve Based on Metrics**
 The architecture deliberately starts simple:
-
 * Single primary database (PostgreSQL handles most workloads initially)
 * Three clear layers (easy to understand and maintain)
 * Automated operations (minimal human intervention)
 * Measure before optimizing (add complexity only when proven necessary)
 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
-
 == 3. AKEL Architecture ==
-
 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
-See [[AI Knowledge Extraction Layer (AKEL)>> Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
+See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
 
 == 3.5 Claim Processing Architecture ==
 
@@ -97,7 +97,6 @@
 === Multi-Claim Handling ===
 
 Users often submit:
-
 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
 * **Web pages**: URLs that are analyzed to extract all verifiable claims
 * **Single claims**: Simple, direct factual statements
@@ -109,13 +109,11 @@
 **POC Implementation (Two-Phase):**
 
 Phase 1 - Claim Extraction:
-
 * LLM analyzes submitted content
 * Extracts all distinct, verifiable claims
 * Returns structured list of claims with context
 
 Phase 2 - Parallel Analysis:
-
 * Each claim processed independently by LLM
 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
 * Parallelized across all claims
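The POC flow in the hunk above makes one extraction call and then one independent analysis call per claim, run in parallel. It can be sketched with Python's asyncio, since Python is the backend language listed in the page's technology stack. This is a minimal sketch under stated assumptions, not FactHarbor's implementation. `extract_claims`, `analyze_claim` and `ClaimResult` are hypothetical names. The sentence splitting stands in for LLM claim extraction, and the sleep stands in for LLM latency.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ClaimResult:
    """Hypothetical per-claim output (the page lists evidence, scenarios,
    sources, verdict and risk; only a verdict and confidence are modeled)."""
    claim: str
    verdict: str
    confidence: float


async def extract_claims(content: str) -> list[str]:
    """Phase 1 (hypothetical stand-in for an LLM call).

    A real implementation would ask the LLM for distinct, verifiable claims;
    here each non-empty sentence is treated as one claim.
    """
    await asyncio.sleep(0)  # placeholder for the extraction call
    return [s.strip() for s in content.split(".") if s.strip()]


async def analyze_claim(claim: str) -> ClaimResult:
    """Phase 2 (hypothetical): a single call per claim producing its analysis."""
    await asyncio.sleep(0.1)  # simulated per-claim LLM latency
    return ClaimResult(claim=claim, verdict="unverified", confidence=0.5)


async def process_submission(content: str) -> list[ClaimResult]:
    claims = await extract_claims(content)
    # Claims are independent, so Phase 2 awaits all analyses concurrently.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))


if __name__ == "__main__":
    text = "The Eiffel Tower is in Paris. Water boils at 100 C at sea level."
    for result in asyncio.run(process_submission(text)):
        print(result.claim, "->", result.verdict)
```

Because `asyncio.gather` awaits the per-claim calls together, Phase 2 wall-clock time is bounded by the slowest single claim rather than the sum over all claims. A real deployment would likely also need a concurrency cap to stay within LLM provider rate limits.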
@@ -124,19 +124,16 @@
 **Production Implementation (Three-Phase):**
 
 Phase 1 - Extraction + Validation:
-
 * Extract claims from content
 * Validate clarity and uniqueness
 * Filter vague or duplicate claims
 
 Phase 2 - Evidence Gathering (Parallel):
-
 * Independent evidence gathering per claim
 * Source validation and scenario generation
 * Quality gates prevent poor data from advancing
 
 Phase 3 - Verdict Generation (Parallel):
-
 * Generate verdict from validated evidence
 * Confidence scoring and risk assessment
 * Low-confidence cases routed to human review
@@ -144,46 +144,35 @@
 === Architectural Benefits ===
 
 **Scalability:**
-
-* Process 100 claims with 3x latency of single claim
+* Process 100 claims with ~3x latency of single claim
 * Parallel processing across independent claims
 * Linear cost scaling with claim count
 
 **Quality:**
-
 * Validation gates between phases
 * Errors isolated to individual claims
 * Clear observability per processing step
 
 **Flexibility:**
-
 * Each phase optimizable independently
 * Can use different model sizes per phase
 * Easy to add human review at decision points
 
-== 4. Storage Architecture ==
 
+== 4. Storage Architecture ==
 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
-
 == 4.5 Versioning Architecture ==
-
 {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
-
 == 5. Automated Systems in Detail ==
-
 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
-
 === 5.1 AKEL (AI Knowledge Evaluation Layer) ===
-
 **What it does**: Primary AI processing engine that analyzes claims automatically
 **Inputs**:
-
 * User-submitted claim text
 * Existing evidence and sources
 * Source track record database
 **Processing steps**:
-
 1. **Parse & Extract**: Identify key components, entities, assertions
 2. **Gather Evidence**: Search web and database for relevant sources
 3. **Check Sources**: Evaluate source reliability using track records
@@ -191,7 +191,6 @@
 5. **Synthesize Verdict**: Compile evidence assessment per scenario
 6. **Calculate Risk**: Assess potential harm and controversy
 **Outputs**:
-
 * Structured claim record
 * Evidence links with relevance scores
 * Scenarios with context descriptions
@@ -199,11 +199,8 @@
 * Overall confidence score
 * Risk assessment
 **Timing**: 10-18 seconds total (parallel processing)
-
 === 5.2 Background Jobs ===
-
 **Source Track Record Updates** (Weekly):
-
 * Analyze claim outcomes from past week
 * Calculate source accuracy and reliability
 * Update source_track_record table
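The weekly Source Track Record Updates job in the hunk above can be treated as a windowed aggregation. For each source, count how many of the past week's resolved claims its evidence agreed with. This is a minimal sketch under stated assumptions:

* `Outcome` is a hypothetical record.
* Accuracy is assumed to be the fraction of matches; the page does not specify a formula.
* The `min_samples` guard is also an assumption.
* The final write to the `source_track_record` table is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outcome:
    """Hypothetical record: a source cited for a claim, and whether the
    source's position matched the claim's final verdict."""
    source_id: str
    matched_verdict: bool
    resolved_at: datetime


def weekly_source_accuracy(outcomes: list[Outcome], now: datetime,
                           min_samples: int = 5) -> dict[str, float]:
    """Return match rate per source over the 7 days before `now`.

    Sources with fewer than `min_samples` outcomes in the window are skipped
    (an assumed guard against noisy scores from tiny samples).
    """
    cutoff = now - timedelta(days=7)
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for o in outcomes:
        if o.resolved_at < cutoff:
            continue  # outside the weekly window
        totals[o.source_id] += 1
        hits[o.source_id] += int(o.matched_verdict)
    return {src: hits[src] / n for src, n in totals.items() if n >= min_samples}
```

The returned mapping is what a hypothetical job would then upsert into `source_track_record`. Keeping the aggregation free of database access makes it easy to test in isolation.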
@@ -220,120 +220,83 @@
 * Move old AKEL logs to S3 (90+ days)
 * Archive old edit history
 * Compress and backup data
-
 === 5.3 Quality Monitoring ===
-
 **Automated checks run continuously**:
-
 * **Anomaly Detection**: Flag unusual patterns
-* Sudden confidence score changes
-* Unusual evidence distributions
-* Suspicious source patterns
+ * Sudden confidence score changes
+ * Unusual evidence distributions
+ * Suspicious source patterns
 * **Contradiction Detection**: Identify conflicts
-* Evidence that contradicts other evidence
-* Claims with internal contradictions
-* Source track record anomalies
+ * Evidence that contradicts other evidence
+ * Claims with internal contradictions
+ * Source track record anomalies
 * **Completeness Validation**: Ensure thoroughness
-* Sufficient evidence gathered
-* Multiple source types represented
-* Key scenarios identified
-
+ * Sufficient evidence gathered
+ * Multiple source types represented
+ * Key scenarios identified
 === 5.4 Moderation Detection ===
-
 **Automated abuse detection**:
-
 * **Spam Identification**: Pattern matching for spam claims
 * **Manipulation Detection**: Identify coordinated editing
 * **Gaming Detection**: Flag attempts to game source scores
 * **Suspicious Activity**: Log unusual behavior patterns
 **Human Review**: Moderators review flagged items, system learns from decisions
-
 == 6. Scalability Strategy ==
-
 === 6.1 Horizontal Scaling ===
-
 Components scale independently:
-
 * **AKEL Workers**: Add more processing workers as claim volume grows
 * **Database Read Replicas**: Add replicas for read-heavy workloads
 * **Cache Layer**: Redis cluster for distributed caching
 * **API Servers**: Load-balanced API instances
-
 === 6.2 Vertical Scaling ===
-
 Individual components can be upgraded:
-
 * **Database Server**: Increase CPU/RAM for PostgreSQL
 * **Cache Memory**: Expand Redis memory
 * **Worker Resources**: More powerful AKEL worker machines
-
 === 6.3 Performance Optimization ===
-
 Built-in optimizations:
-
 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
 * **Intelligent Caching**: Redis caches frequently accessed data
 * **Background Processing**: Non-urgent tasks run asynchronously
-
 == 7. Monitoring & Observability ==
-
 === 7.1 Key Metrics ===
-
 System tracks:
-
 * **Performance**: AKEL processing time, API response time, cache hit rate
 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
 * **Usage**: Claims per day, active users, API requests
 * **Errors**: Failed AKEL runs, API errors, database issues
-
 === 7.2 Alerts ===
-
 Automated alerts for:
-
 * Processing time >30 seconds (threshold breach)
 * Error rate >1% (quality issue)
 * Cache hit rate <80% (cache problem)
 * Database connections >80% capacity (scaling needed)
-
 === 7.3 Dashboards ===
-
 Real-time monitoring:
-
 * **System Health**: Overall status and key metrics
 * **AKEL Performance**: Processing time breakdown
 * **Quality Metrics**: Confidence scores, completeness
 * **User Activity**: Usage patterns, peak times
-
 == 8. Security Architecture ==
-
 === 8.1 Authentication & Authorization ===
-
 * **User Authentication**: Secure login with password hashing
 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
 * **API Keys**: For programmatic access
 * **Rate Limiting**: Prevent abuse
-
 === 8.2 Data Security ===
-
 * **Encryption**: TLS for transport, encrypted storage for sensitive data
 * **Audit Logging**: Track all significant changes
 * **Input Validation**: Sanitize all user inputs
 * **SQL Injection Protection**: Parameterized queries
-
 === 8.3 Abuse Prevention ===
-
 * **Rate Limiting**: Prevent flooding and DDoS
 * **Automated Detection**: Flag suspicious patterns
 * **Human Review**: Moderators investigate flagged content
 * **Ban Mechanisms**: Block abusive users/IPs
-
 == 9. Deployment Architecture ==
-
 === 9.1 Production Environment ===
-
 **Components**:
-
 * Load Balancer (HAProxy or cloud LB)
 * Multiple API servers (stateless)
 * AKEL worker pool (auto-scaling)
@@ -341,15 +341,11 @@
 * Redis cluster
 * S3-compatible storage
 **Regions**: Single region for V1.0, multi-region when needed
-
 === 9.2 Development & Staging ===
-
 **Development**: Local Docker Compose setup
 **Staging**: Scaled-down production replica
 **CI/CD**: Automated testing and deployment
-
 === 9.3 Disaster Recovery ===
-
 * **Database Backups**: Daily automated backups to S3
 * **Point-in-Time Recovery**: Transaction log archival
 * **Replication**: Real-time replication to standby
@@ -360,28 +360,20 @@
 {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
 
 == 10. Future Architecture Evolution ==
-
 === 10.1 When to Add Complexity ===
-
 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
 **Elasticsearch**: When PostgreSQL search consistently >500ms
 **TimescaleDB**: When metrics queries consistently >1s
 **Federation**: When 10,000+ users and explicit demand
 **Complex Reputation**: When 100+ active contributors
-
 === 10.2 Federation (V2.0+) ===
-
 **Deferred until**:
-
 * Core product proven with 10,000+ users
 * User demand for decentralization
 * Single-node limits reached
 See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
-
 == 11. Technology Stack Summary ==
-
 **Backend**:
-
 * Python (FastAPI or Django)
 * PostgreSQL (primary database)
 * Redis (caching)
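The alert thresholds under 7.2 in the hunk above can be encoded as a small rule table and evaluated against a metrics snapshot. This is a minimal sketch under stated assumptions. The metric key names and the snapshot dictionary are hypothetical, and each threshold is read as a strict comparison. The page's monitoring stack (Prometheus + Grafana) could express the same thresholds as alerting rules rather than application code.

```python
from dataclasses import dataclass
from typing import Callable

Metrics = dict[str, float]


@dataclass(frozen=True)
class AlertRule:
    name: str
    reason: str
    breached: Callable[[Metrics], bool]


# Thresholds taken from section 7.2; the metric key names are hypothetical.
RULES = [
    AlertRule("processing_time", "threshold breach",
              lambda m: m["akel_processing_seconds"] > 30),
    AlertRule("error_rate", "quality issue",
              lambda m: m["error_rate"] > 0.01),
    AlertRule("cache_hit_rate", "cache problem",
              lambda m: m["cache_hit_rate"] < 0.80),
    AlertRule("db_connections", "scaling needed",
              lambda m: m["db_connections_used"] / m["db_connections_max"] > 0.80),
]


def evaluate_alerts(metrics: Metrics) -> list[str]:
    """Return 'name: reason' for every rule the snapshot breaches, in rule order."""
    return [f"{r.name}: {r.reason}" for r in RULES if r.breached(metrics)]


snapshot = {
    "akel_processing_seconds": 12.0,
    "error_rate": 0.02,
    "cache_hit_rate": 0.75,
    "db_connections_used": 50,
    "db_connections_max": 100,
}
print(evaluate_alerts(snapshot))  # ['error_rate: quality issue', 'cache_hit_rate: cache problem']
```

Keeping the thresholds in one table makes each of them a single reviewable value, in line with the page's "measure everything" principle.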
@@ -399,10 +399,8 @@
 * Prometheus + Grafana
 * Structured logging (ELK or cloud logging)
 * Error tracking (Sentry)
-
 == 12. Related Pages ==
-
-* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
+* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
 * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
 * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]