Changes for page Architecture
Last modified by Robert Schaub on 2025/12/24 21:53
Summary: Page properties (1 modified, 0 added, 0 removed); modified property: Content
... ... @@ -20,114 +20,43 @@ 20 20 ==== Processing Layer ==== 21 21 Core business logic and AI processing: 22 22 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing) 23 - * Parse and extract claim components 24 - * Gather evidence from multiple sources 25 - * Check source track records 26 - * Extract scenarios from evidence 27 - * Synthesize verdicts 28 - * Calculate risk scores 29 - 30 -* **LLM Abstraction Layer**: Provider-agnostic AI access 31 - * Multi-provider support (Anthropic, OpenAI, Google, local models) 32 - * Automatic failover and rate limit handling 33 - * Per-stage model configuration 34 - * Cost optimization through provider selection 35 - * No vendor lock-in 23 + * Parse and extract claim components 24 + * Gather evidence from multiple sources 25 + * Check source track records 26 + * Extract scenarios from evidence 27 + * Synthesize verdicts 28 + * Calculate risk scores 36 36 * **Background Jobs**: Automated maintenance tasks 37 - * Source track record updates (weekly) 38 - * Cache warming and invalidation 39 - * Metrics aggregation 40 - * Data archival 30 + * Source track record updates (weekly) 31 + * Cache warming and invalidation 32 + * Metrics aggregation 33 + * Data archival 41 41 * **Quality Monitoring**: Automated quality checks 42 - * Anomaly detection 43 - * Contradiction detection 44 - * Completeness validation 35 + * Anomaly detection 36 + * Contradiction detection 37 + * Completeness validation 45 45 * **Moderation Detection**: Automated abuse detection 46 - * Spam identification 47 - * Manipulation detection 48 - * Flag suspicious activity 39 + * Spam identification 40 + * Manipulation detection 41 + * Flag suspicious activity 49 49 ==== Data & Storage Layer ==== 50 50 Persistent data storage and caching: 51 51 * **PostgreSQL**: Primary database for all core data 52 - * Claims, evidence, sources, users 53 - * Scenarios, edits, audit logs 54 - * Built-in full-text search 55 - * Time-series capabilities for metrics 45 + * 
Claims, evidence, sources, users 46 + * Scenarios, edits, audit logs 47 + * Built-in full-text search 48 + * Time-series capabilities for metrics 56 56 * **Redis**: High-speed caching layer 57 - * Session data 58 - * Frequently accessed claims 59 - * API rate limiting 50 + * Session data 51 + * Frequently accessed claims 52 + * API rate limiting 60 60 * **S3 Storage**: Long-term archival 61 - * Old edit history (90+ days) 62 - * AKEL processing logs 63 - * Backup snapshots 54 + * Old edit history (90+ days) 55 + * AKEL processing logs 56 + * Backup snapshots 64 64 **Optional future additions** (add only when metrics prove necessary): 65 65 * **Elasticsearch**: If PostgreSQL full-text search becomes slow 66 66 * **TimescaleDB**: If metrics queries become a bottleneck 67 - 68 - 69 -=== 2.2 LLM Abstraction Layer === 70 - 71 -{{include reference="Test.FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}} 72 - 73 -**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection. 
74 - 75 -**Multi-Provider Support:** 76 -* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis) 77 -* **Secondary:** OpenAI GPT API (automatic failover) 78 -* **Tertiary:** Google Vertex AI / Gemini 79 -* **Future:** Local models (Llama, Mistral) for on-premises deployments 80 - 81 -**Provider Interface:** 82 -* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods 83 -* Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet) 84 -* Environment variable and database configuration 85 -* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider) 86 - 87 -**Configuration:** 88 -* Runtime provider switching without code changes 89 -* Admin API for provider management (`POST /admin/v1/llm/configure`) 90 -* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis) 91 -* Support for rate limit handling and cost tracking 92 - 93 -**Failover Strategy:** 94 -* Automatic fallback: Primary → Secondary → Tertiary 95 -* Circuit breaker pattern for unavailable providers 96 -* Health checking and provider availability monitoring 97 -* Graceful degradation when all providers unavailable 98 - 99 -**Cost Optimization:** 100 -* Track and compare costs across providers per request 101 -* Enable A/B testing of different models for quality/cost tradeoffs 102 -* Per-stage provider selection for optimal cost-efficiency 103 -* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache 104 - 105 -**Architecture Pattern:** 106 - 107 -{{code}} 108 -AKEL Stages LLM Abstraction Providers 109 -━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 110 -Stage 1 Extract ──→ Provider Interface ──→ Anthropic (PRIMARY) 111 -Stage 2 Analyze ──→ Configuration ──→ OpenAI (SECONDARY) 112 -Stage 3 Holistic ──→ Failover Handler ──→ Google (TERTIARY) 113 - └→ Local Models (FUTURE) 114 -{{/code}} 115 - 116 
-**Benefits:** 117 -* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes 118 -* **Resilience:** Automatic failover ensures service continuity during provider outages 119 -* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis) 120 -* **Quality Assurance:** Cross-provider output verification for critical claims 121 -* **Regulatory Compliance:** Use specific providers for data residency requirements 122 -* **Future-Proofing:** Easy integration of new models as they become available 123 - 124 -**Cross-References:** 125 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement) 126 -* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation) 127 -* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details) 128 -* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale) 129 - 130 - 131 131 === 2.2 Design Philosophy === 132 132 **Start Simple, Evolve Based on Metrics** 133 133 The architecture deliberately starts simple: ... ... @@ -137,71 +137,8 @@ 137 137 * Measure before optimizing (add complexity only when proven necessary) 138 138 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale. 139 139 == 3. AKEL Architecture == 140 -{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}69 +{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}} 141 141 See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information. 
142 - 143 -== 3.5 Claim Processing Architecture == 144 - 145 -FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently. 146 - 147 -=== Multi-Claim Handling === 148 - 149 -Users often submit: 150 -* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims 151 -* **Web pages**: URLs that are analyzed to extract all verifiable claims 152 -* **Single claims**: Simple, direct factual statements 153 - 154 -The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content. 155 - 156 -=== Processing Phases === 157 - 158 -**POC Implementation (Two-Phase):** 159 - 160 -Phase 1 - Claim Extraction: 161 -* LLM analyzes submitted content 162 -* Extracts all distinct, verifiable claims 163 -* Returns structured list of claims with context 164 - 165 -Phase 2 - Parallel Analysis: 166 -* Each claim processed independently by LLM 167 -* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk 168 -* Parallelized across all claims 169 -* Results aggregated for presentation 170 - 171 -**Production Implementation (Three-Phase):** 172 - 173 -Phase 1 - Extraction + Validation: 174 -* Extract claims from content 175 -* Validate clarity and uniqueness 176 -* Filter vague or duplicate claims 177 - 178 -Phase 2 - Evidence Gathering (Parallel): 179 -* Independent evidence gathering per claim 180 -* Source validation and scenario generation 181 -* Quality gates prevent poor data from advancing 182 - 183 -Phase 3 - Verdict Generation (Parallel): 184 -* Generate verdict from validated evidence 185 -* Confidence scoring and risk assessment 186 -* Low-confidence cases routed to human review 187 - 188 -=== Architectural Benefits === 189 - 190 -**Scalability:** 191 -* Process 100 claims with ~3x latency of single claim 192 -* Parallel processing across independent claims 193 -* Linear cost 
scaling with claim count 194 -=== 2.3 Design Philosophy === 195 -**Quality:** 196 -* Validation gates between phases 197 -* Errors isolated to individual claims 198 -* Clear observability per processing step 199 - 200 -**Flexibility:** 201 -* Each phase optimizable independently 202 -* Can use different model sizes per phase 203 -* Easy to add human review at decision points 204 - 205 205 == 4. Storage Architecture == 206 206 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}} 207 207 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information. ... ... @@ -251,17 +251,17 @@ 251 251 === 5.3 Quality Monitoring === 252 252 **Automated checks run continuously**: 253 253 * **Anomaly Detection**: Flag unusual patterns 254 - * Sudden confidence score changes 255 - * Unusual evidence distributions 256 - * Suspicious source patterns 120 + * Sudden confidence score changes 121 + * Unusual evidence distributions 122 + * Suspicious source patterns 257 257 * **Contradiction Detection**: Identify conflicts 258 - * Evidence that contradicts other evidence 259 - * Claims with internal contradictions 260 - * Source track record anomalies 124 + * Evidence that contradicts other evidence 125 + * Claims with internal contradictions 126 + * Source track record anomalies 261 261 * **Completeness Validation**: Ensure thoroughness 262 - * Sufficient evidence gathered 263 - * Multiple source types represented 264 - * Key scenarios identified 128 + * Sufficient evidence gathered 129 + * Multiple source types represented 130 + * Key scenarios identified 265 265 === 5.4 Moderation Detection === 266 266 **Automated abuse detection**: 267 267 * **Spam Identification**: Pattern matching for spam claims ... ... @@ -350,7 +350,7 @@ 350 350 === 10.1 When to Add Complexity === 351 351 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers. 
352 352 **Elasticsearch**: When PostgreSQL search consistently >500ms 353 -**TimescaleDB**: When metrics queries consistently >1s 219 +**TimescaleDB**: When metrics queries consistently >1s 354 354 **Federation**: When 10,000+ users and explicit demand 355 355 **Complex Reputation**: When 100+ active contributors 356 356 === 10.2 Federation (V2.0+) ===
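
The section removed in this revision described an abstract `LLMProvider` interface (`complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()`) with adapter implementations and automatic Primary → Secondary → Tertiary failover. A minimal Python sketch of that pattern, for reference while reviewing the change; the `StubProvider` class and snake_case method names are illustrative assumptions, not the actual FactHarbor implementation:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Provider-agnostic interface, mirroring the methods named in the spec."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...
    @abstractmethod
    def get_name(self) -> str: ...
    @abstractmethod
    def get_cost_per_1k_tokens(self) -> float: ...
    @abstractmethod
    def is_available(self) -> bool: ...

class StubProvider(LLMProvider):
    """Hypothetical stand-in for the AnthropicProvider/OpenAIProvider/GoogleProvider adapters."""

    def __init__(self, name: str, cost: float, available: bool = True):
        self._name, self._cost, self._available = name, cost, available

    def complete(self, prompt: str) -> str:
        if not self._available:
            raise RuntimeError(f"{self._name} unavailable")
        return f"[{self._name}] response"

    def get_name(self) -> str:
        return self._name

    def get_cost_per_1k_tokens(self) -> float:
        return self._cost

    def is_available(self) -> bool:
        return self._available

def complete_with_failover(providers: list[LLMProvider], prompt: str) -> tuple[str, str]:
    """Automatic fallback: Primary -> Secondary -> Tertiary."""
    for provider in providers:
        if not provider.is_available():  # health check; a circuit breaker would hook in here
            continue
        try:
            return provider.get_name(), provider.complete(prompt)
        except RuntimeError:
            continue
    raise RuntimeError("all providers unavailable")  # graceful-degradation point

# Primary (Anthropic) is down; failover selects the secondary (OpenAI).
chain = [
    StubProvider("anthropic", 0.114, available=False),
    StubProvider("openai", 0.065),
    StubProvider("google", 0.072),
]
name, _ = complete_with_failover(chain, "extract claims")
print(name)  # -> openai
```

Runtime provider switching then reduces to reordering `chain` from configuration, which is what makes the admin API (`POST /admin/v1/llm/configure`) possible without code changes.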
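
The removed "Claim Processing Architecture" section described the POC's two-phase flow: extract distinct claims first, then analyze each claim independently in parallel. A minimal sketch of that shape, with stubbed functions standing in for the LLM calls (the splitting heuristic and return fields here are placeholders, not the real pipeline):

```python
from concurrent.futures import ThreadPoolExecutor

def extract_claims(text: str) -> list[str]:
    # Phase 1: an LLM would identify distinct, verifiable claims; stubbed as sentence splitting.
    return [s.strip() for s in text.split(".") if s.strip()]

def analyze_claim(claim: str) -> dict:
    # Phase 2: one LLM call per claim yields evidence, scenarios, sources, verdict, risk.
    return {"claim": claim, "verdict": "stub", "risk": 0.0}

def process_submission(text: str) -> list[dict]:
    claims = extract_claims(text)
    # Claims are independent, so they are analyzed in parallel and the results aggregated.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(analyze_claim, claims))

results = process_submission("Claim one. Claim two. Claim three.")
print(len(results))  # -> 3
```

Because each claim is isolated in its own task, a failure affects only that claim, and the production three-phase variant can slot validation gates between the phases without changing this overall structure.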