Changes for page Architecture
Last modified by Robert Schaub on 2025/12/24 21:53
Summary: Page properties (1 modified, 0 added, 0 removed)

Details: the modified property is the page content.
@@ -20,43 +20,114 @@

==== Processing Layer ====
Core business logic and AI processing:
* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
  * Parse and extract claim components
  * Gather evidence from multiple sources
  * Check source track records
  * Extract scenarios from evidence
  * Synthesize verdicts
  * Calculate risk scores
* **LLM Abstraction Layer**: Provider-agnostic AI access
  * Multi-provider support (Anthropic, OpenAI, Google, local models)
  * Automatic failover and rate limit handling
  * Per-stage model configuration
  * Cost optimization through provider selection
  * No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
  * Source track record updates (weekly)
  * Cache warming and invalidation
  * Metrics aggregation
  * Data archival
* **Quality Monitoring**: Automated quality checks
  * Anomaly detection
  * Contradiction detection
  * Completeness validation
* **Moderation Detection**: Automated abuse detection
  * Spam identification
  * Manipulation detection
  * Suspicious-activity flagging

==== Data & Storage Layer ====
Persistent data storage and caching:
* **PostgreSQL**: Primary database for all core data
  * Claims, evidence, sources, users
  * Scenarios, edits, audit logs
  * Built-in full-text search
  * Time-series capabilities for metrics
* **Redis**: High-speed caching layer
  * Session data
  * Frequently accessed claims
  * API rate limiting
* **S3 Storage**: Long-term archival
  * Old edit history (90+ days)
  * AKEL processing logs
  * Backup snapshots

**Optional future additions** (add only when metrics prove necessary):
* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 LLM Abstraction Layer ===

{{include reference="Test.FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}

**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
**Multi-Provider Support:**
* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
* **Secondary:** OpenAI GPT API (automatic failover)
* **Tertiary:** Google Vertex AI / Gemini
* **Future:** Local models (Llama, Mistral) for on-premises deployments

**Provider Interface:**
* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, and `isAvailable()` methods
* Per-stage model configuration (Stage 1: Haiku; Stages 2 and 3: Sonnet)
* Configuration via environment variables and the database
* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)

**Configuration:**
* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (cheaper models for extraction, quality models for analysis)
* Rate limit handling and cost tracking

**Failover Strategy:**
* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checks and provider availability monitoring
* Graceful degradation when all providers are unavailable

**Cost Optimization:**
* Track and compare costs across providers per request
* A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost efficiency
* Cost comparison (per article at 0% cache hit rate): Anthropic $0.114, OpenAI $0.065, Google $0.072

**Architecture Pattern:**

{{code}}
AKEL Stages            LLM Abstraction          Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→  Provider Interface  ──→  Anthropic (PRIMARY)
Stage 2 Analyze  ──→  Configuration       ──→  OpenAI (SECONDARY)
Stage 3 Holistic ──→  Failover Handler    ──→  Google (TERTIARY)
                                           └─→ Local Models (FUTURE)
{{/code}}
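As a concrete illustration, the provider interface and failover chain above might be sketched as follows. This is a minimal sketch, not FactHarbor's actual implementation: the `CompletionRequest` shape, the error handling, and the provider ordering are all illustrative assumptions; only the `LLMProvider` method names come from the specification.

```typescript
// Illustrative sketch of the LLMProvider abstraction. Method names follow
// the spec; CompletionRequest and the failover logic are assumptions.

interface CompletionRequest {
  prompt: string;
  maxTokens: number;
}

interface LLMProvider {
  getName(): string;
  getCostPer1kTokens(): number;
  isAvailable(): Promise<boolean>;
  complete(req: CompletionRequest): Promise<string>;
  stream(req: CompletionRequest): AsyncIterable<string>;
}

// Failover chain: try Primary → Secondary → Tertiary in order,
// skipping providers that report themselves unavailable.
async function completeWithFailover(
  providers: LLMProvider[],
  req: CompletionRequest
): Promise<string> {
  for (const provider of providers) {
    if (!(await provider.isAvailable())) continue;
    try {
      return await provider.complete(req);
    } catch {
      // Provider failed mid-request; fall through to the next one.
    }
  }
  throw new Error("All LLM providers unavailable");
}
```

A production version would additionally wire in the circuit breaker, health monitoring, and per-request cost tracking listed under Failover Strategy and Cost Optimization.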
**Benefits:**
* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use the optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers to meet data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===
**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:

@@ -66,8 +66,71 @@

* Measure before optimizing (add complexity only when proven necessary)
See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.

== 3. AKEL Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
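AKEL's stages map onto the per-stage model configuration from Section 2.2 (a cheap model for Stage 1 extraction, quality models for Stages 2 and 3). A minimal sketch of that mapping follows; the stage keys, model names, and per-token prices are illustrative assumptions, not the project's real configuration keys or current pricing.

```typescript
// Hypothetical per-stage configuration following Section 2.2's pattern.
// All identifiers and prices are illustrative assumptions.

type Stage = "extract" | "analyze" | "holistic";

interface StageConfig {
  provider: string;        // e.g. "anthropic", "openai", "google"
  model: string;           // provider-specific model identifier
  costPer1kTokens: number; // used for cost tracking and comparison
}

const stageConfigs: Record<Stage, StageConfig> = {
  extract:  { provider: "anthropic", model: "claude-haiku",  costPer1kTokens: 0.00025 },
  analyze:  { provider: "anthropic", model: "claude-sonnet", costPer1kTokens: 0.003 },
  holistic: { provider: "anthropic", model: "claude-sonnet", costPer1kTokens: 0.003 },
};

// Estimate the cost of one pipeline run given token counts per stage.
function estimateCost(tokensByStage: Record<Stage, number>): number {
  return (Object.keys(stageConfigs) as Stage[]).reduce(
    (sum, stage) =>
      sum + (tokensByStage[stage] / 1000) * stageConfigs[stage].costPer1kTokens,
    0
  );
}
```

Keeping this mapping in configuration rather than code is what enables the runtime provider switching and per-stage cost optimization described above.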
== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:
* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.

=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:
* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns a structured list of claims with context

Phase 2 - Parallel Analysis:
* Each claim is processed independently by the LLM
* A single call per claim generates evidence, scenarios, sources, verdict, and risk
* Parallelized across all claims
* Results aggregated for presentation

**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:
* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):
* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):
* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**
* Process 100 claims with ~3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**
* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**
* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.

@@ -117,17 +117,17 @@

=== 5.3 Quality Monitoring ===
**Automated checks run continuously**:
* **Anomaly Detection**: Flag unusual patterns
  * Sudden confidence score changes
  * Unusual evidence distributions
  * Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
  * Evidence that contradicts other evidence
  * Claims with internal contradictions
  * Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
  * Sufficient evidence gathered
  * Multiple source types represented
  * Key scenarios identified
=== 5.4 Moderation Detection ===
**Automated abuse detection**:
* **Spam Identification**: Pattern matching for spam claims
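As an illustration of the Section 5.3 anomaly checks, a sketch of the "sudden confidence score change" detector follows. The `ClaimSnapshot` shape and the 0.3 threshold are assumptions made for this sketch, not values from the specification.

```typescript
// Illustrative anomaly check: flag a claim whose confidence score changes
// by more than a threshold between successive evaluations.
// ClaimSnapshot and the threshold value are assumptions for this sketch.

interface ClaimSnapshot {
  claimId: string;
  confidence: number; // 0.0 .. 1.0
  evaluatedAt: Date;
}

const SUDDEN_CHANGE_THRESHOLD = 0.3; // assumed tuning parameter

function flagSuddenConfidenceChanges(history: ClaimSnapshot[]): string[] {
  const flags: string[] = [];
  // Order evaluations chronologically, then compare each with its predecessor.
  const sorted = [...history].sort(
    (a, b) => a.evaluatedAt.getTime() - b.evaluatedAt.getTime()
  );
  for (let i = 1; i < sorted.length; i++) {
    const delta = Math.abs(sorted[i].confidence - sorted[i - 1].confidence);
    if (delta > SUDDEN_CHANGE_THRESHOLD) {
      flags.push(`${sorted[i].claimId}: confidence jumped ${delta.toFixed(2)}`);
    }
  }
  return flags;
}
```

In the running system such checks would emit flags for human review rather than acting automatically, consistent with the monitoring role described above.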
@@ -207,11 +207,16 @@

* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==
=== 10.1 When to Add Complexity ===
See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
**Elasticsearch**: When PostgreSQL search consistently >500ms
**TimescaleDB**: When metrics queries consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===