Changes for page Architecture
Last modified by Robert Schaub on 2026/02/08 08:30
To version 4.3
edited by Robert Schaub
on 2026/01/20 20:25
Change comment:
Update document after refactoring.
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

- Parent: changed from FactHarbor.Specification.WebHome to Archive.FactHarbor.Specification.WebHome
- Content:
= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.

== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system, humans supplement
* **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove necessary

== 2. High-Level Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}

=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:

==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse

==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
* Parse and extract claim components
* Gather evidence from multiple sources
* Check source track records
* Extract scenarios from evidence
* Synthesize verdicts
* Calculate risk scores

* **LLM Abstraction Layer**: Provider-agnostic AI access
* Multi-provider support (Anthropic, OpenAI, Google, local models)
* Automatic failover and rate limit handling
* Per-stage model configuration
* Cost optimization through provider selection
* No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
* Source track record updates (weekly)
* Cache warming and invalidation
* Metrics aggregation
* Data archival
* **Quality Monitoring**: Automated quality checks
* Anomaly detection
* Contradiction detection
* Completeness validation
* **Moderation Detection**: Automated abuse detection
* Spam identification
* Manipulation detection
* Flag suspicious activity

==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
* Claims, evidence, sources, users
* Scenarios, edits, audit logs
* Built-in full-text search
* Time-series capabilities for metrics
* **Redis**: High-speed caching layer
* Session data
* Frequently accessed claims
* API rate limiting
* **S3 Storage**: Long-term archival
* Old edit history (90+ days)
* AKEL processing logs
* Backup snapshots

**Optional future additions** (add only when metrics prove necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck
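The API rate limiting listed above (enforced in the Interface Layer, with counters kept in Redis) is commonly implemented as a fixed-window counter, the pattern Redis supports with INCR plus EXPIRE. Below is a minimal in-process sketch of that logic; the class name and the limits are illustrative assumptions, and a plain dict stands in for the shared Redis counters:

```python
import time

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window_s`-second window.

    In production the counters would live in Redis (INCR + EXPIRE) so
    that all API servers share state; a local dict stands in here.
    """

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}  # (client_id, window index) -> request count

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_s)  # index of the current window
        key = (client_id, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit

# Illustrative limits: 3 requests per 60-second window.
limiter = FixedWindowRateLimiter(limit=3, window_s=60)
results = [limiter.allow("api-key-1", now=1000.0) for _ in range(4)]
# First three calls are allowed; the fourth in the same window is rejected.
```

A new window starts when `now // window_s` changes, so a client blocked at the end of one window is admitted again at the start of the next.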
=== 2.2 LLM Abstraction Layer ===

{{include reference="Test.FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}

**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.

**Multi-Provider Support:**

* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
* **Secondary:** OpenAI GPT API (automatic failover)
* **Tertiary:** Google Vertex AI / Gemini
* **Future:** Local models (Llama, Mistral) for on-premises deployments

**Provider Interface:**

* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
* Per-stage model configuration (Stage 1: Haiku, Stages 2 & 3: Sonnet)
* Environment variable and database configuration
* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)

**Configuration:**

* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
* Support for rate limit handling and cost tracking

**Failover Strategy:**

* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checking and provider availability monitoring
* Graceful degradation when all providers are unavailable

**Cost Optimization:**

* Track and compare costs across providers per request
* Enable A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost-efficiency
* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache

**Architecture Pattern:**

{{code}}
AKEL Stages          LLM Abstraction          Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→ Provider Interface ──→ Anthropic (PRIMARY)
Stage 2 Analyze  ──→ Configuration      ──→ OpenAI (SECONDARY)
Stage 3 Holistic ──→ Failover Handler   ──→ Google (TERTIARY)
                                         └→ Local Models (FUTURE)
{{/code}}

**Benefits:**

* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use the optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers for data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**

* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**

The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.

== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.

== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
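This extract-then-analyze flow can be sketched as follows. The extractor and analyzer below are stubs standing in for the LLM calls (naive sentence splitting, fixed verdicts), and the thread pool is one plausible way to parallelize per-claim analysis; none of this is FactHarbor's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_claims(text):
    """Claim Extraction stand-in: in FactHarbor an LLM returns a structured
    list of distinct, verifiable claims; here we split on sentence ends."""
    return [s.strip() for s in text.split(".") if s.strip()]

def analyze_claim(claim):
    """Per-claim analysis stand-in: one LLM call per claim would produce
    evidence, scenarios, sources, verdict, and risk (stubbed values here)."""
    return {"claim": claim, "verdict": "unverified", "confidence": 0.0}

def process_submission(text):
    claims = extract_claims(text)          # step 1: claim extraction
    with ThreadPoolExecutor() as pool:     # step 2: independent, parallel analysis
        return list(pool.map(analyze_claim, claims))

results = process_submission("The Earth orbits the Sun. Water boils at 100 C.")
# Two claims are extracted and analyzed independently.
```

Because each claim is analyzed independently, errors stay isolated to individual claims and the work parallelizes naturally, which is the property the phases below rely on.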
=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
* Parallelized across all claims
* Results aggregated for presentation

**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**

* Process 100 claims with 3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.

== 4.5 Versioning Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}

== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===

**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
...
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
...
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)

=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
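At its core, the weekly track-record job above is a per-source accuracy aggregation over the week's claim outcomes. A minimal sketch, assuming a hypothetical outcome schema (`source`, `correct`); the real job would read from and write back to the source_track_record table in PostgreSQL rather than return a dict:

```python
def update_track_record(outcomes):
    """Fold a week of claim outcomes into per-source accuracy figures.

    `outcomes` is a list of {"source": str, "correct": bool} records;
    this schema is an illustrative assumption, not the actual table layout.
    """
    totals = {}  # source -> (correct count, total count)
    for o in outcomes:
        c, t = totals.get(o["source"], (0, 0))
        totals[o["source"]] = (c + int(o["correct"]), t + 1)
    return {src: round(c / t, 3) for src, (c, t) in totals.items()}

week = [
    {"source": "example.org", "correct": True},
    {"source": "example.org", "correct": False},
    {"source": "example.com", "correct": True},
]
accuracy = update_track_record(week)
# {"example.org": 0.5, "example.com": 1.0}
```

These accuracy figures are what the AKEL "Check Sources" step consults when weighing evidence.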
...

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data

=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
* Sudden confidence score changes
* Unusual evidence distributions
* Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
* Evidence that contradicts other evidence
* Claims with internal contradictions
* Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
* Sufficient evidence gathered
* Multiple source types represented
* Key scenarios identified

=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items; the system learns from their decisions

== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances

=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines

=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously

== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

The system tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues

=== 7.2 Alerts ===

Automated alerts for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)

=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times

== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse

=== 8.2 Data Security ===

* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries

=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs

== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
...
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed

=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment

=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
...

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

**Elasticsearch**: When PostgreSQL search is consistently >500ms
**TimescaleDB**: When metrics queries are consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors

=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.

== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)
...
* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)

== 12. Related Pages ==

* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]