Changes for page Architecture

Last modified by Robert Schaub on 2025/12/24 21:53

From version 4.1
edited by Robert Schaub
on 2025/12/24 21:53
Change comment: Imported from XAR
To version 2.1
edited by Robert Schaub
on 2025/12/18 12:54
Change comment: Imported from XAR

Page properties
Content
... ... @@ -20,114 +20,43 @@
20 20  ==== Processing Layer ====
21 21  Core business logic and AI processing:
22 22  * **AKEL Pipeline**: AI-driven claim analysis (parallel processing; a sketch follows this list)
23 - * Parse and extract claim components
24 - * Gather evidence from multiple sources
25 - * Check source track records
26 - * Extract scenarios from evidence
27 - * Synthesize verdicts
28 - * Calculate risk scores
29 -
30 -* **LLM Abstraction Layer**: Provider-agnostic AI access
31 - * Multi-provider support (Anthropic, OpenAI, Google, local models)
32 - * Automatic failover and rate limit handling
33 - * Per-stage model configuration
34 - * Cost optimization through provider selection
35 - * No vendor lock-in
23 + * Parse and extract claim components
24 + * Gather evidence from multiple sources
25 + * Check source track records
26 + * Extract scenarios from evidence
27 + * Synthesize verdicts
28 + * Calculate risk scores
36 36  * **Background Jobs**: Automated maintenance tasks
37 - * Source track record updates (weekly)
38 - * Cache warming and invalidation
39 - * Metrics aggregation
40 - * Data archival
30 + * Source track record updates (weekly)
31 + * Cache warming and invalidation
32 + * Metrics aggregation
33 + * Data archival
41 41  * **Quality Monitoring**: Automated quality checks
42 - * Anomaly detection
43 - * Contradiction detection
44 - * Completeness validation
35 + * Anomaly detection
36 + * Contradiction detection
37 + * Completeness validation
45 45  * **Moderation Detection**: Automated abuse detection
46 - * Spam identification
47 - * Manipulation detection
48 - * Flag suspicious activity
39 + * Spam identification
40 + * Manipulation detection
41 + * Flag suspicious activity
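
The sketch below shows how the AKEL Pipeline stages listed above could compose per claim and fan out across a submission's claims in parallel. It is a minimal sketch: all types and stage bodies are hypothetical stubs, not FactHarbor's actual implementation.

{{code language="typescript"}}
type Components = { subject: string; assertion: string };
type Evidence = { source: string; excerpt: string; supports: boolean };
type Verdict = { label: string; confidence: number; riskScore: number };

// Stage stubs: real implementations would call the configured LLM and
// the evidence sources.
async function parseClaim(text: string): Promise<Components> {
  return { subject: text.split(" ")[0] ?? "", assertion: text };
}
async function gatherEvidence(c: Components): Promise<Evidence[]> {
  return [];
}
function calculateRisk(evidence: Evidence[]): number {
  return evidence.length === 0 ? 1.0 : 0.2; // sparse evidence => higher risk
}

async function runAkelPipeline(claimText: string): Promise<Verdict> {
  const components = await parseClaim(claimText);
  const evidence = await gatherEvidence(components);
  // Track-record checks, scenario extraction, and verdict synthesis would
  // slot in here as further awaited stages.
  const supporting = evidence.filter((e) => e.supports).length;
  const confidence = supporting / Math.max(evidence.length, 1);
  return {
    label: confidence > 0.5 ? "supported" : "unverified",
    confidence,
    riskScore: calculateRisk(evidence),
  };
}

// Claims are independent, so a submission's claims can run in parallel:
const analyzeAll = (claims: string[]) => Promise.all(claims.map(runAkelPipeline));
{{/code}}
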
49 49  ==== Data & Storage Layer ====
50 50  Persistent data storage and caching:
51 51  * **PostgreSQL**: Primary database for all core data
52 - * Claims, evidence, sources, users
53 - * Scenarios, edits, audit logs
54 - * Built-in full-text search
55 - * Time-series capabilities for metrics
45 + * Claims, evidence, sources, users
46 + * Scenarios, edits, audit logs
47 + * Built-in full-text search
48 + * Time-series capabilities for metrics
56 56  * **Redis**: High-speed caching layer (see the cache-aside sketch after this list)
57 - * Session data
58 - * Frequently accessed claims
59 - * API rate limiting
50 + * Session data
51 + * Frequently accessed claims
52 + * API rate limiting
60 60  * **S3 Storage**: Long-term archival
61 - * Old edit history (90+ days)
62 - * AKEL processing logs
63 - * Backup snapshots
54 + * Old edit history (90+ days)
55 + * AKEL processing logs
56 + * Backup snapshots
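
As an illustration of the Redis caching bullet above, here is a minimal cache-aside sketch: Redis sits in front of PostgreSQL, which remains the source of truth. The ioredis and node-postgres calls are standard, but the key scheme, `claims` table, and 300-second TTL are assumptions, not the project's actual code.

{{code language="typescript"}}
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis(); // session data, hot claims, rate-limit counters
const pool = new Pool();   // PostgreSQL: source of truth

async function getClaim(id: string): Promise<unknown> {
  const cached = await redis.get(`claim:${id}`);
  if (cached !== null) return JSON.parse(cached); // cache hit

  const { rows } = await pool.query("SELECT * FROM claims WHERE id = $1", [id]);
  if (rows.length === 0) return null;

  // Short TTL keeps hot claims fast while bounding staleness.
  await redis.set(`claim:${id}`, JSON.stringify(rows[0]), "EX", 300);
  return rows[0];
}
{{/code}}
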
64 64  **Optional future additions** (add only when metrics prove necessary):
65 65  * **Elasticsearch**: If PostgreSQL full-text search becomes slow
66 66  * **TimescaleDB**: If metrics queries become a bottleneck
67 -
68 -
69 -=== 2.2 LLM Abstraction Layer ===
70 -
71 -{{include reference="FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
72 -
73 -**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
74 -
75 -**Multi-Provider Support:**
76 -* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
77 -* **Secondary:** OpenAI GPT API (automatic failover)
78 -* **Tertiary:** Google Vertex AI / Gemini
79 -* **Future:** Local models (Llama, Mistral) for on-premises deployments
80 -
81 -**Provider Interface:**
82 -* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
83 -* Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet)
84 -* Environment variable and database configuration
85 -* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
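
A minimal sketch of the interface and one adapter follows. The five method names are taken from this section; the request shape, return types, and cost figure are illustrative assumptions.

{{code language="typescript"}}
interface LLMRequest {
  prompt: string;
  model?: string;
  maxTokens?: number;
}

interface LLMProvider {
  complete(req: LLMRequest): Promise<string>;
  stream(req: LLMRequest): AsyncIterable<string>;
  getName(): string;
  getCostPer1kTokens(): number;
  isAvailable(): Promise<boolean>;
}

// Adapter pattern: each vendor SDK is wrapped behind the same interface.
class AnthropicProvider implements LLMProvider {
  getName() { return "anthropic"; }
  getCostPer1kTokens() { return 0.003; } // illustrative figure only
  async isAvailable() { return true; }   // a real adapter would health-check
  async complete(req: LLMRequest) {
    // The Anthropic SDK call would go here.
    return `stub completion for: ${req.prompt}`;
  }
  async *stream(req: LLMRequest) {
    yield await this.complete(req); // real adapters stream token chunks
  }
}
{{/code}}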
86 -
87 -**Configuration:**
88 -* Runtime provider switching without code changes
89 -* Admin API for provider management (`POST /admin/v1/llm/configure`)
90 -* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
91 -* Support for rate limit handling and cost tracking
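
To make runtime switching concrete, a hypothetical admin call might look like this. The endpoint path comes from this section; the host, payload fields, and model identifiers are assumed for illustration.

{{code language="typescript"}}
async function configureLlm(): Promise<void> {
  // Host is a placeholder; only the path is specified by this document.
  await fetch("https://factharbor.example/admin/v1/llm/configure", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      stages: {
        extraction: { provider: "anthropic", model: "claude-haiku" },
        analysis: { provider: "anthropic", model: "claude-sonnet" },
        holistic: { provider: "anthropic", model: "claude-sonnet" },
      },
      failoverOrder: ["anthropic", "openai", "google"],
    }),
  });
}
{{/code}}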
92 -
93 -**Failover Strategy:**
94 -* Automatic fallback: Primary → Secondary → Tertiary
95 -* Circuit breaker pattern for unavailable providers
96 -* Health checking and provider availability monitoring
97 -* Graceful degradation when all providers unavailable
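
A minimal failover sketch over the `LLMProvider` interface above; the circuit breaker is reduced to an availability check, whereas a real implementation would also track open/half-open state.

{{code language="typescript"}}
async function completeWithFailover(
  providers: LLMProvider[], // ordered Primary → Secondary → Tertiary
  req: LLMRequest,
): Promise<string> {
  for (const provider of providers) {
    if (!(await provider.isAvailable())) continue; // circuit open / unhealthy
    try {
      return await provider.complete(req);
    } catch {
      // Record the failure so health checking can open the circuit,
      // then fall through to the next provider.
    }
  }
  // All providers down: callers degrade gracefully (e.g., queue for retry).
  throw new Error("All LLM providers unavailable");
}
{{/code}}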
98 -
99 -**Cost Optimization:**
100 -* Track and compare costs across providers per request
101 -* Enable A/B testing of different models for quality/cost tradeoffs
102 -* Per-stage provider selection for optimal cost-efficiency
103 -* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at a 0% cache hit rate
104 -
105 -**Architecture Pattern:**
106 -
107 -{{code}}
108 -AKEL Stages          LLM Abstraction         Providers
109 -━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
110 -Stage 1 Extract  ──→ Provider Interface ──→ Anthropic (PRIMARY)
111 -Stage 2 Analyze  ──→ Configuration      ──→ OpenAI (SECONDARY)
112 -Stage 3 Holistic ──→ Failover Handler   ──→ Google (TERTIARY)
113 -                                         └─→ Local Models (FUTURE)
114 -{{/code}}
115 -
116 -**Benefits:**
117 -* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
118 -* **Resilience:** Automatic failover ensures service continuity during provider outages
119 -* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
120 -* **Quality Assurance:** Cross-provider output verification for critical claims
121 -* **Regulatory Compliance:** Use specific providers for data residency requirements
122 -* **Future-Proofing:** Easy integration of new models as they become available
123 -
124 -**Cross-References:**
125 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
126 -* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
127 -* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
128 -* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)
129 -
130 -
131 131  === 2.2 Design Philosophy ===
132 132  **Start Simple, Evolve Based on Metrics**
133 133  The architecture deliberately starts simple:
... ... @@ -137,71 +137,8 @@
137 137  * Measure before optimizing (add complexity only when proven necessary)
138 138  See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
139 139  == 3. AKEL Architecture ==
140 -{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
69 +{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
141 141  See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
142 -
143 -== 3.5 Claim Processing Architecture ==
144 -
145 -FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
146 -
147 -=== Multi-Claim Handling ===
148 -
149 -Users often submit:
150 -* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
151 -* **Web pages**: URLs that are analyzed to extract all verifiable claims
152 -* **Single claims**: Simple, direct factual statements
153 -
154 -The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
155 -
156 -=== Processing Phases ===
157 -
158 -**POC Implementation (Two-Phase):**
159 -
160 -Phase 1 - Claim Extraction:
161 -* LLM analyzes submitted content
162 -* Extracts all distinct, verifiable claims
163 -* Returns structured list of claims with context
164 -
165 -Phase 2 - Parallel Analysis:
166 -* Each claim processed independently by LLM
167 -* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
168 -* Parallelized across all claims
169 -* Results aggregated for presentation
170 -
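A minimal sketch of this two-phase flow; `llm()` is a stub standing in for a call through the LLM abstraction layer, and the prompts and result shapes are illustrative only.

{{code language="typescript"}}
type ClaimAnalysis = { claim: string; verdict: string; risk: number };

async function llm(prompt: string): Promise<string> {
  return "[]"; // stub: a real call goes through the configured provider
}

async function analyzeSubmission(content: string): Promise<ClaimAnalysis[]> {
  // Phase 1 - Claim Extraction: one call returns a structured claim list.
  const claims: string[] = JSON.parse(
    await llm(`Extract all distinct, verifiable claims as a JSON array:\n${content}`)
  );

  // Phase 2 - Parallel Analysis: one call per claim, aggregated at the end.
  return Promise.all(
    claims.map(async (claim) => {
      const result = JSON.parse(
        await llm(`Return JSON {verdict, risk, evidence, scenarios, sources} for: ${claim}`)
      );
      return { claim, verdict: result.verdict, risk: result.risk };
    })
  );
}
{{/code}}
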
171 -**Production Implementation (Three-Phase):**
172 -
173 -Phase 1 - Extraction + Validation:
174 -* Extract claims from content
175 -* Validate clarity and uniqueness
176 -* Filter vague or duplicate claims
177 -
178 -Phase 2 - Evidence Gathering (Parallel):
179 -* Independent evidence gathering per claim
180 -* Source validation and scenario generation
181 -* Quality gates prevent poor data from advancing
182 -
183 -Phase 3 - Verdict Generation (Parallel):
184 -* Generate verdict from validated evidence
185 -* Confidence scoring and risk assessment
186 -* Low-confidence cases routed to human review
187 -
188 -=== Architectural Benefits ===
189 -
190 -**Scalability:**
191 -* Process 100 claims with ~3x the latency of a single claim
192 -* Parallel processing across independent claims
193 -* Linear cost scaling with claim count
195 -**Quality:**
196 -* Validation gates between phases
197 -* Errors isolated to individual claims
198 -* Clear observability per processing step
199 -
200 -**Flexibility:**
201 -* Each phase optimizable independently
202 -* Can use different model sizes per phase
203 -* Easy to add human review at decision points
204 -
205 205  == 4. Storage Architecture ==
206 206  {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
207 207  See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
... ... @@ -251,17 +251,17 @@
251 251  === 5.3 Quality Monitoring ===
252 252  **Automated checks run continuously** (one check is sketched after this list):
253 253  * **Anomaly Detection**: Flag unusual patterns
254 - * Sudden confidence score changes
255 - * Unusual evidence distributions
256 - * Suspicious source patterns
120 + * Sudden confidence score changes
121 + * Unusual evidence distributions
122 + * Suspicious source patterns
257 257  * **Contradiction Detection**: Identify conflicts
258 - * Evidence that contradicts other evidence
259 - * Claims with internal contradictions
260 - * Source track record anomalies
124 + * Evidence that contradicts other evidence
125 + * Claims with internal contradictions
126 + * Source track record anomalies
261 261  * **Completeness Validation**: Ensure thoroughness
262 - * Sufficient evidence gathered
263 - * Multiple source types represented
264 - * Key scenarios identified
128 + * Sufficient evidence gathered
129 + * Multiple source types represented
130 + * Key scenarios identified
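
As one concrete example of these checks, an anomaly detector for sudden confidence score changes might look like the sketch below; the 0.4 threshold and data shapes are assumptions for illustration.

{{code language="typescript"}}
type VerdictSnapshot = { claimId: string; confidence: number; at: Date };

// Flags a claim whose confidence jumped more than `maxJump` between
// successive verdicts; flagged claims would be routed to review.
function hasConfidenceAnomaly(
  history: VerdictSnapshot[], // ordered oldest → newest, same claim
  maxJump = 0.4,              // assumed threshold
): boolean {
  for (let i = 1; i < history.length; i++) {
    if (Math.abs(history[i].confidence - history[i - 1].confidence) > maxJump) {
      return true;
    }
  }
  return false;
}
{{/code}}
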
265 265  === 5.4 Moderation Detection ===
266 266  **Automated abuse detection**:
267 267  * **Spam Identification**: Pattern matching for spam claims
... ... @@ -350,7 +350,7 @@
350 350  === 10.1 When to Add Complexity ===
351 351  See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
352 352  **Elasticsearch**: When PostgreSQL search consistently >500ms
353 -**TimescaleDB**: When metrics queries consistently >1s
219 +**TimescaleDB**: When metrics queries consistently >1s
354 354  **Federation**: When 10,000+ users and explicit demand
355 355  **Complex Reputation**: When 100+ active contributors
356 356  === 10.2 Federation (V2.0+) ===