Changes for page Architecture
Last modified by Robert Schaub on 2025/12/24 18:26
From version 3.2
edited by Robert Schaub
on 2025/12/24 18:26
Change comment:
Update document after refactoring.
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Parent
@@ -1,1 +1,1 @@
-Test.FactHarbor V0\.9\.103.Specification.WebHome
+Test.FactHarbor.Specification.WebHome

- Content
@@ -20,114 +20,43 @@
 ==== Processing Layer ====
 Core business logic and AI processing:
 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
- * Parse and extract claim components
- * Gather evidence from multiple sources
- * Check source track records
- * Extract scenarios from evidence
- * Synthesize verdicts
- * Calculate risk scores
-
-* **LLM Abstraction Layer**: Provider-agnostic AI access
- * Multi-provider support (Anthropic, OpenAI, Google, local models)
- * Automatic failover and rate limit handling
- * Per-stage model configuration
- * Cost optimization through provider selection
- * No vendor lock-in
+ * Parse and extract claim components
+ * Gather evidence from multiple sources
+ * Check source track records
+ * Extract scenarios from evidence
+ * Synthesize verdicts
+ * Calculate risk scores
 * **Background Jobs**: Automated maintenance tasks
- * Source track record updates (weekly)
- * Cache warming and invalidation
- * Metrics aggregation
- * Data archival
+ * Source track record updates (weekly)
+ * Cache warming and invalidation
+ * Metrics aggregation
+ * Data archival
 * **Quality Monitoring**: Automated quality checks
- * Anomaly detection
- * Contradiction detection
- * Completeness validation
+ * Anomaly detection
+ * Contradiction detection
+ * Completeness validation
 * **Moderation Detection**: Automated abuse detection
- * Spam identification
- * Manipulation detection
- * Flag suspicious activity
+ * Spam identification
+ * Manipulation detection
+ * Flag suspicious activity
 ==== Data & Storage Layer ====
 Persistent data storage and caching:
 * **PostgreSQL**: Primary database for all core data
- * Claims, evidence, sources, users
- * Scenarios, edits, audit logs
- * Built-in full-text search
- * Time-series capabilities for metrics
+ * Claims, evidence, sources, users
+ * Scenarios, edits, audit logs
+ * Built-in full-text search
+ * Time-series capabilities for metrics
 * **Redis**: High-speed caching layer
- * Session data
- * Frequently accessed claims
- * API rate limiting
+ * Session data
+ * Frequently accessed claims
+ * API rate limiting
 * **S3 Storage**: Long-term archival
- * Old edit history (90+ days)
- * AKEL processing logs
- * Backup snapshots
+ * Old edit history (90+ days)
+ * AKEL processing logs
+ * Backup snapshots
 **Optional future additions** (add only when metrics prove necessary):
 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
 * **TimescaleDB**: If metrics queries become a bottleneck
-
-
-=== 2.2 LLM Abstraction Layer ===
-
-{{include reference="Test.FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
-
-**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
-
-**Multi-Provider Support:**
-* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
-* **Secondary:** OpenAI GPT API (automatic failover)
-* **Tertiary:** Google Vertex AI / Gemini
-* **Future:** Local models (Llama, Mistral) for on-premises deployments
-
-**Provider Interface:**
-* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
-* Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet)
-* Environment variable and database configuration
-* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
-
-**Configuration:**
-* Runtime provider switching without code changes
-* Admin API for provider management (`POST /admin/v1/llm/configure`)
-* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
-* Support for rate limit handling and cost tracking
-
-**Failover Strategy:**
-* Automatic fallback: Primary → Secondary → Tertiary
-* Circuit breaker pattern for unavailable providers
-* Health checking and provider availability monitoring
-* Graceful degradation when all providers unavailable
-
-**Cost Optimization:**
-* Track and compare costs across providers per request
-* Enable A/B testing of different models for quality/cost tradeoffs
-* Per-stage provider selection for optimal cost-efficiency
-* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
-
-**Architecture Pattern:**
-
-{{code}}
-AKEL Stages            LLM Abstraction          Providers
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Stage 1 Extract  ──→   Provider Interface  ──→  Anthropic (PRIMARY)
-Stage 2 Analyze  ──→   Configuration       ──→  OpenAI (SECONDARY)
-Stage 3 Holistic ──→   Failover Handler    ──→  Google (TERTIARY)
-                                           └→   Local Models (FUTURE)
-{{/code}}
-
-**Benefits:**
-* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
-* **Resilience:** Automatic failover ensures service continuity during provider outages
-* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
-* **Quality Assurance:** Cross-provider output verification for critical claims
-* **Regulatory Compliance:** Use specific providers for data residency requirements
-* **Future-Proofing:** Easy integration of new models as they become available
-
-**Cross-References:**
-* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
-* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
-* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
-* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)
-
-
 === 2.2 Design Philosophy ===
 **Start Simple, Evolve Based on Metrics**
 The architecture deliberately starts simple:
@@ -191,7 +191,7 @@
 * Process 100 claims with ~3x latency of single claim
 * Parallel processing across independent claims
 * Linear cost scaling with claim count
- === 2.3 Design Philosophy ===
+
 **Quality:**
 * Validation gates between phases
 * Errors isolated to individual claims
@@ -202,6 +202,7 @@
 * Can use different model sizes per phase
 * Easy to add human review at decision points
 
+
 == 4. Storage Architecture ==
 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
@@ -251,17 +251,17 @@
 === 5.3 Quality Monitoring ===
 **Automated checks run continuously**:
 * **Anomaly Detection**: Flag unusual patterns
- * Sudden confidence score changes
- * Unusual evidence distributions
- * Suspicious source patterns
+ * Sudden confidence score changes
+ * Unusual evidence distributions
+ * Suspicious source patterns
 * **Contradiction Detection**: Identify conflicts
- * Evidence that contradicts other evidence
- * Claims with internal contradictions
- * Source track record anomalies
+ * Evidence that contradicts other evidence
+ * Claims with internal contradictions
+ * Source track record anomalies
 * **Completeness Validation**: Ensure thoroughness
- * Sufficient evidence gathered
- * Multiple source types represented
- * Key scenarios identified
+ * Sufficient evidence gathered
+ * Multiple source types represented
+ * Key scenarios identified
 === 5.4 Moderation Detection ===
 **Automated abuse detection**:
 * **Spam Identification**: Pattern matching for spam claims
@@ -350,7 +350,7 @@
 === 10.1 When to Add Complexity ===
 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
 **Elasticsearch**: When PostgreSQL search consistently >500ms
-**TimescaleDB**: When metrics queries consistently >1s
+**TimescaleDB**: When metrics queries consistently >1s
 **Federation**: When 10,000+ users and explicit demand
 **Complex Reputation**: When 100+ active contributors
 === 10.2 Federation (V2.0+) ===
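
Reviewer note on the section 10.1 hunk: the four triggers listed there can be read as plain threshold checks over observed metrics. The following is a minimal sketch of that reading, not part of the page. The thresholds come from section 10.1. The metric names (`pg_search_ms`, `metrics_query_ms`, `user_count`, `active_contributors`) and the interpretation of "consistently" as "every sample in the observation window" are assumptions made for illustration. The "explicit demand" half of the Federation trigger is a human judgment and is not modelled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trigger:
    """One 'add complexity' trigger from section 10.1."""
    addition: str     # component that would be added
    metric: str       # hypothetical metric name, not from the spec
    threshold: float  # value taken from section 10.1
    inclusive: bool   # True for "N+" triggers, False for ">N" triggers


TRIGGERS = (
    Trigger("Elasticsearch", "pg_search_ms", 500, inclusive=False),
    Trigger("TimescaleDB", "metrics_query_ms", 1000, inclusive=False),
    # Only the user-count condition is checked; "explicit demand" is not.
    Trigger("Federation", "user_count", 10_000, inclusive=True),
    Trigger("Complex Reputation", "active_contributors", 100, inclusive=True),
)


def consistently_met(trigger: Trigger, samples: Sequence[float]) -> bool:
    """Assumption: 'consistently' means every sample in the window qualifies."""
    if not samples:
        return False  # no data is never a reason to add complexity
    if trigger.inclusive:
        return all(s >= trigger.threshold for s in samples)
    return all(s > trigger.threshold for s in samples)


def additions_due(windows: dict[str, Sequence[float]]) -> list[str]:
    """Return the additions whose trigger holds for the whole window."""
    return [
        t.addition
        for t in TRIGGERS
        if consistently_met(t, windows.get(t.metric, ()))
    ]
```

A caller would pass one window of recent samples per metric, for example `additions_due({"pg_search_ms": [620, 540, 710]})`. Requiring every sample to qualify is deliberately conservative; a percentile-based rule would be an equally plausible reading of "consistently".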