Changes for page Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Parent

-FactHarbor.Specification.WebHome
+Archive.FactHarbor V0\.9\.50 Plus (Prev Rel).Specification.WebHome

Content
= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.

== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system, humans supplement
* **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove necessary

== 2. High-Level Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}

=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:

==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse

==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity

==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots

**Optional future additions** (add only when metrics prove necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**

The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.

== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.

== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
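A minimal sketch of the Claim Extraction step's output shape (the names `ExtractedClaim` and `extract_claims` are illustrative assumptions, not FactHarbor's actual API; the naive sentence splitter merely stands in for the LLM call described above):

```python
from dataclasses import dataclass

@dataclass
class ExtractedClaim:
    text: str     # the isolated, verifiable claim
    context: str  # surrounding content preserved for later analysis

def extract_claims(content: str) -> list[ExtractedClaim]:
    """Stand-in for LLM-based claim extraction: one claim per sentence."""
    sentences = [s.strip() for s in content.split(".") if s.strip()]
    return [ExtractedClaim(text=s, context=content) for s in sentences]

claims = extract_claims(
    "The Eiffel Tower is 330 m tall. Paris is the capital of France."
)
print(len(claims))  # 2 candidate claims
```

The point of the structure is that each extracted claim carries its own context, so every downstream phase can operate on claims independently.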
=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
* Parallelized across all claims
* Results aggregated for presentation

**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**

* Process 100 claims with ~3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}

See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
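The parallel per-claim fan-out described for Phase 2 can be sketched with `asyncio` (a hedged sketch: the names, result fields, and `asyncio.sleep(0)` placeholder for the per-claim LLM call are all assumptions, not FactHarbor's implementation):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ClaimResult:
    claim: str
    verdict: str
    confidence: float

async def analyze_claim(claim: str) -> ClaimResult:
    # Placeholder for the single LLM call per claim that would generate
    # evidence, scenarios, sources, verdict, and risk.
    await asyncio.sleep(0)  # simulate an I/O-bound model call
    return ClaimResult(claim=claim, verdict="supported", confidence=0.9)

async def analyze_all(claims: list[str]) -> list[ClaimResult]:
    # Claims are independent, so they run concurrently: total latency
    # tracks the slowest claim rather than the claim count.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))

results = asyncio.run(analyze_all(["claim A", "claim B", "claim C"]))
print([r.claim for r in results])  # order is preserved by gather
```

Because `asyncio.gather` preserves input order, results can be aggregated for presentation without re-matching them to their source claims.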
== 4.5 Versioning Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}

== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===

**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
...
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
...
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)

=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table

...

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data

=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified

=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items; the system learns from their decisions

== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances

=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines

=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously

== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

System tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues

=== 7.2 Alerts ===

Automated alerts for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)

=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times

== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse

=== 8.2 Data Security ===

* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries

=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs

== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
...
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed

=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment

=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby

...

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

**Elasticsearch**: When PostgreSQL search consistently >500ms
**TimescaleDB**: When metrics queries consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors

=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.

== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)

...

* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)

== 12. Related Pages ==

* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
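As a worked illustration of the automated alerts listed in section 7.2, the threshold checks could look like the sketch below (field and function names are illustrative assumptions; only the threshold values come from this page):

```python
from dataclasses import dataclass

@dataclass
class SystemMetrics:
    processing_seconds: float   # AKEL end-to-end processing time
    error_rate: float           # fraction of failed requests
    cache_hit_rate: float       # Redis cache hit ratio
    db_connection_usage: float  # fraction of connection pool in use

def active_alerts(m: SystemMetrics) -> list[str]:
    """Evaluate the section 7.2 thresholds against current metrics."""
    alerts = []
    if m.processing_seconds > 30:
        alerts.append("processing time >30s (threshold breach)")
    if m.error_rate > 0.01:
        alerts.append("error rate >1% (quality issue)")
    if m.cache_hit_rate < 0.80:
        alerts.append("cache hit rate <80% (cache problem)")
    if m.db_connection_usage > 0.80:
        alerts.append("database connections >80% capacity (scaling needed)")
    return alerts

print(active_alerts(SystemMetrics(12.0, 0.002, 0.75, 0.5)))
# flags only the cache hit rate
```

Keeping every threshold in one pure function makes the alerting rules trivially testable and easy to export to a monitoring stack such as Prometheus.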