Changes for page Architecture

Last modified by Robert Schaub on 2025/12/24 18:26

From version 2.1
edited by Robert Schaub
on 2025/12/24 13:58
Change comment: Imported from XAR
To version 3.3
edited by Robert Schaub
on 2025/12/24 18:26
Change comment: Renamed back-links.

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor.Specification.WebHome
1 +Test.FactHarbor V0\.9\.103.Specification.WebHome
Content
... ... @@ -1,6 +1,9 @@
1 1  = Architecture =
2 +
2 2  FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4 +
3 3  == 1. Core Principles ==
6 +
4 4  * **AI-First**: AKEL (AI) is the primary system, humans supplement
5 5  * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
6 6  * **System Over Data**: Fix algorithms, not individual outputs
... ... @@ -7,65 +7,158 @@
7 7  * **Measure Everything**: Quality metrics drive improvements
8 8  * **Scale Through Automation**: Minimal human intervention
9 9  * **Start Simple**: Add complexity only when metrics prove necessary
13 +
10 10  == 2. High-Level Architecture ==
15 +
11 11  {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17 +
12 12  === 2.1 Three-Layer Architecture ===
19 +
13 13  FactHarbor uses a clean three-layer architecture:
21 +
14 14  ==== Interface Layer ====
23 +
15 15  Handles all user and system interactions:
25 +
16 16  * **Web UI**: Browse claims, view evidence, submit feedback
17 17  * **REST API**: Programmatic access for integrations
18 18  * **Authentication & Authorization**: User identity and permissions
19 19  * **Rate Limiting**: Protect against abuse
30 +
20 20  ==== Processing Layer ====
32 +
21 21  Core business logic and AI processing:
34 +
22 22  * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
23 - * Parse and extract claim components
24 - * Gather evidence from multiple sources
25 - * Check source track records
26 - * Extract scenarios from evidence
27 - * Synthesize verdicts
28 - * Calculate risk scores
36 +* Parse and extract claim components
37 +* Gather evidence from multiple sources
38 +* Check source track records
39 +* Extract scenarios from evidence
40 +* Synthesize verdicts
41 +* Calculate risk scores
42 +
43 +* **LLM Abstraction Layer**: Provider-agnostic AI access
44 +* Multi-provider support (Anthropic, OpenAI, Google, local models)
45 +* Automatic failover and rate limit handling
46 +* Per-stage model configuration
47 +* Cost optimization through provider selection
48 +* No vendor lock-in
29 29  * **Background Jobs**: Automated maintenance tasks
30 - * Source track record updates (weekly)
31 - * Cache warming and invalidation
32 - * Metrics aggregation
33 - * Data archival
50 +* Source track record updates (weekly)
51 +* Cache warming and invalidation
52 +* Metrics aggregation
53 +* Data archival
34 34  * **Quality Monitoring**: Automated quality checks
35 - * Anomaly detection
36 - * Contradiction detection
37 - * Completeness validation
55 +* Anomaly detection
56 +* Contradiction detection
57 +* Completeness validation
38 38  * **Moderation Detection**: Automated abuse detection
39 - * Spam identification
40 - * Manipulation detection
41 - * Flag suspicious activity
59 +* Spam identification
60 +* Manipulation detection
61 +* Flag suspicious activity
62 +
42 42  ==== Data & Storage Layer ====
64 +
43 43  Persistent data storage and caching:
66 +
44 44  * **PostgreSQL**: Primary database for all core data
45 - * Claims, evidence, sources, users
46 - * Scenarios, edits, audit logs
47 - * Built-in full-text search
48 - * Time-series capabilities for metrics
68 +* Claims, evidence, sources, users
69 +* Scenarios, edits, audit logs
70 +* Built-in full-text search
71 +* Time-series capabilities for metrics
49 49  * **Redis**: High-speed caching layer
50 - * Session data
51 - * Frequently accessed claims
52 - * API rate limiting
73 +* Session data
74 +* Frequently accessed claims
75 +* API rate limiting
53 53  * **S3 Storage**: Long-term archival
54 - * Old edit history (90+ days)
55 - * AKEL processing logs
56 - * Backup snapshots
77 +* Old edit history (90+ days)
78 +* AKEL processing logs
79 +* Backup snapshots
57 57  **Optional future additions** (add only when metrics prove necessary):
58 58  * **Elasticsearch**: If PostgreSQL full-text search becomes slow
59 59  * **TimescaleDB**: If metrics queries become a bottleneck
83 +
84 +=== 2.2 LLM Abstraction Layer ===
85 +
86 +{{include reference="Test.FactHarbor V0\.9\.103.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
87 +
88 +**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
89 +
90 +**Multi-Provider Support:**
91 +
92 +* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
93 +* **Secondary:** OpenAI GPT API (automatic failover)
94 +* **Tertiary:** Google Vertex AI / Gemini
95 +* **Future:** Local models (Llama, Mistral) for on-premises deployments
96 +
97 +**Provider Interface:**
98 +
99 +* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
100 +* Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet)
101 +* Environment variable and database configuration
102 +* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider), as sketched below
103 +
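A minimal sketch of this interface in Python follows. The method names (`complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()`) come from the list above; the parameter lists, type hints, and the `AnthropicProvider` stub are illustrative assumptions rather than the actual implementation.

{{code language="python"}}
from abc import ABC, abstractmethod
from typing import Iterator

class LLMProvider(ABC):
    """Provider-agnostic interface; concrete adapters wrap one vendor API."""

    @abstractmethod
    def complete(self, prompt: str, model: str, max_tokens: int = 1024) -> str:
        """Return a single completion for the prompt."""

    @abstractmethod
    def stream(self, prompt: str, model: str) -> Iterator[str]:
        """Yield completion tokens incrementally."""

    @abstractmethod
    def getName(self) -> str:
        """Stable provider name, e.g. 'anthropic'."""

    @abstractmethod
    def getCostPer1kTokens(self, model: str) -> float:
        """USD cost per 1,000 tokens for the given model."""

    @abstractmethod
    def isAvailable(self) -> bool:
        """Health check consulted by the failover handler."""

class AnthropicProvider(LLMProvider):
    """Adapter for the primary provider (stub only).

    The real adapter would implement complete(), stream(), getCostPer1kTokens()
    and isAvailable() by wrapping the vendor SDK; omitted here.
    """

    def getName(self) -> str:
        return "anthropic"
{{/code}}
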
104 +**Configuration:**
105 +
106 +* Runtime provider switching without code changes
107 +* Admin API for provider management (`POST /admin/v1/llm/configure`), with an example request below
108 +* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
109 +* Support for rate limit handling and cost tracking
110 +
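For illustration, a runtime reconfiguration request could look like the following. The endpoint path is the one listed above; the payload fields, model identifiers, host name, and authentication header are assumptions for this sketch, not the defined admin schema.

{{code language="python"}}
import requests

# Hypothetical payload: field names and model identifiers are illustrative only.
payload = {
    "stage1": {"provider": "anthropic", "model": "haiku"},   # extraction
    "stage2": {"provider": "anthropic", "model": "sonnet"},  # analysis
    "stage3": {"provider": "anthropic", "model": "sonnet"},  # holistic
    "failover_order": ["anthropic", "openai", "google"],
}

response = requests.post(
    "https://factharbor.example/admin/v1/llm/configure",  # assumed host
    json=payload,
    headers={"Authorization": "Bearer <admin-api-key>"},
    timeout=10,
)
response.raise_for_status()
{{/code}}
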
111 +**Failover Strategy:**
112 +
113 +* Automatic fallback: Primary → Secondary → Tertiary
114 +* Circuit breaker pattern for unavailable providers
115 +* Health checking and provider availability monitoring
116 +* Graceful degradation when all providers are unavailable (see the sketch below)
117 +
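A sketch of the fallback loop, assuming the `LLMProvider` interface sketched above; the circuit breaker is reduced here to the provider's own availability check.

{{code language="python"}}
from typing import Sequence

class AllProvidersUnavailable(Exception):
    """Raised so callers can degrade gracefully, e.g. queue the claim for a later retry."""

def complete_with_failover(providers: Sequence["LLMProvider"],
                           prompt: str,
                           model_for: dict[str, str]) -> str:
    """Try providers in configured order: primary, then secondary, then tertiary."""
    for provider in providers:
        if not provider.isAvailable():    # skip providers the health check marks as down
            continue
        try:
            return provider.complete(prompt, model=model_for[provider.getName()])
        except Exception:                 # rate limit or outage: fall through to the next provider
            continue
    raise AllProvidersUnavailable("no LLM provider responded")
{{/code}}
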
118 +**Cost Optimization:**
119 +
120 +* Track and compare costs across providers per request
121 +* Enable A/B testing of different models for quality/cost tradeoffs
122 +* Per-stage provider selection for optimal cost-efficiency
123 +* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
124 +
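To make the per-article figures concrete: at 10,000 articles per month and a 0% cache hit rate, those rates correspond to roughly $1,140 per month on Anthropic, $650 on OpenAI, and $720 on Google, before any per-stage mixing or caching savings.
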
125 +**Architecture Pattern:**
126 +
127 +{{code}}
128 +AKEL Stages           LLM Abstraction          Providers
129 +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
130 +Stage 1 Extract  ──→  Provider Interface  ──→  Anthropic (PRIMARY)
131 +Stage 2 Analyze  ──→  Configuration       ──→  OpenAI (SECONDARY)
132 +Stage 3 Holistic ──→  Failover Handler    ──→  Google (TERTIARY)
133 +                                           └→  Local Models (FUTURE)
134 +{{/code}}
135 +
136 +**Benefits:**
137 +
138 +* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
139 +* **Resilience:** Automatic failover ensures service continuity during provider outages
140 +* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
141 +* **Quality Assurance:** Cross-provider output verification for critical claims
142 +* **Regulatory Compliance:** Use specific providers for data residency requirements
143 +* **Future-Proofing:** Easy integration of new models as they become available
144 +
145 +**Cross-References:**
146 +
147 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
148 +* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
149 +* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
150 +* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)
151 +
60 60  === 2.3 Design Philosophy ===
153 +
61 61  **Start Simple, Evolve Based on Metrics**
62 62  The architecture deliberately starts simple:
156 +
63 63  * Single primary database (PostgreSQL handles most workloads initially)
64 64  * Three clear layers (easy to understand and maintain)
65 65  * Automated operations (minimal human intervention)
66 66  * Measure before optimizing (add complexity only when proven necessary)
67 67  See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
162 +
68 68  == 3. AKEL Architecture ==
164 +
69 69  {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
70 70  See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
71 71  
... ... @@ -76,6 +76,7 @@
76 76  === Multi-Claim Handling ===
77 77  
78 78  Users often submit:
175 +
79 79  * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
80 80  * **Web pages**: URLs that are analyzed to extract all verifiable claims
81 81  * **Single claims**: Simple, direct factual statements
... ... @@ -87,11 +87,13 @@
87 87  **POC Implementation (Two-Phase):**
88 88  
89 89  Phase 1 - Claim Extraction:
187 +
90 90  * LLM analyzes submitted content
91 91  * Extracts all distinct, verifiable claims
92 92  * Returns structured list of claims with context
93 93  
94 94  Phase 2 - Parallel Analysis:
193 +
95 95  * Each claim processed independently by LLM
96 96  * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
97 97  * Parallelized across all claims
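
As an illustration of this fan-out, Phase 2 can be run concurrently along the following lines (a sketch assuming an async LLM client; `analyze_claim` and its result shape are placeholders, not the actual POC code):

{{code language="python"}}
import asyncio

async def analyze_claim(claim: str) -> dict:
    """Placeholder for the single LLM call that returns evidence, scenarios,
    sources, verdict and risk for one claim."""
    ...

async def analyze_all(claims: list[str]) -> list[dict]:
    # One independent task per extracted claim: total latency is bounded by the
    # slowest claim rather than by the sum over all claims.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))

# results = asyncio.run(analyze_all(extracted_claims))
{{/code}}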
... ... @@ -100,16 +100,19 @@
100 100  **Production Implementation (Three-Phase):**
101 101  
102 102  Phase 1 - Extraction + Validation:
202 +
103 103  * Extract claims from content
104 104  * Validate clarity and uniqueness
105 105  * Filter vague or duplicate claims
106 106  
107 107  Phase 2 - Evidence Gathering (Parallel):
208 +
108 108  * Independent evidence gathering per claim
109 109  * Source validation and scenario generation
110 110  * Quality gates prevent poor data from advancing
111 111  
112 112  Phase 3 - Verdict Generation (Parallel):
214 +
113 113  * Generate verdict from validated evidence
114 114  * Confidence scoring and risk assessment
115 115  * Low-confidence cases routed to human review
... ... @@ -117,35 +117,48 @@
117 117  === Architectural Benefits ===
118 118  
119 119  **Scalability:**
120 -* Process 100 claims with ~3x latency of single claim
222 +
223 +* Process 100 claims with roughly 3x the latency of a single claim
121 121  * Parallel processing across independent claims
122 122  * Linear cost scaling with claim count
123 123  
124 124  **Quality:**
230 +
125 125  * Validation gates between phases
126 126  * Errors isolated to individual claims
127 127  * Clear observability per processing step
128 128  
129 129  **Flexibility:**
236 +
130 130  * Each phase optimizable independently
131 131  * Can use different model sizes per phase
132 132  * Easy to add human review at decision points
133 133  
134 -
135 135  == 4. Storage Architecture ==
242 +
136 136  {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
137 137  See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
245 +
138 138  == 4.5 Versioning Architecture ==
247 +
139 139  {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
249 +
140 140  == 5. Automated Systems in Detail ==
251 +
141 141  FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
253 +
142 142  === 5.1 AKEL (AI Knowledge Extraction Layer) ===
255 +
143 143  **What it does**: Primary AI processing engine that analyzes claims automatically
144 144  **Inputs**:
258 +
145 145  * User-submitted claim text
146 146  * Existing evidence and sources
147 147  * Source track record database
148 148  **Processing steps**:
263 +
149 149  1. **Parse & Extract**: Identify key components, entities, assertions
150 150  2. **Gather Evidence**: Search web and database for relevant sources
151 151  3. **Check Sources**: Evaluate source reliability using track records
... ... @@ -153,6 +153,7 @@
153 153  5. **Synthesize Verdict**: Compile evidence assessment per scenario
154 154  6. **Calculate Risk**: Assess potential harm and controversy
155 155  **Outputs**:
271 +
156 156  * Structured claim record
157 157  * Evidence links with relevance scores
158 158  * Scenarios with context descriptions
... ... @@ -160,8 +160,11 @@
160 160  * Overall confidence score
161 161  * Risk assessment
162 162  **Timing**: 10-18 seconds total (parallel processing)
279 +
163 163  === 5.2 Background Jobs ===
281 +
164 164  **Source Track Record Updates** (Weekly):
283 +
165 165  * Analyze claim outcomes from past week
166 166  * Calculate source accuracy and reliability
167 167  * Update source_track_record table
... ... @@ -178,83 +178,120 @@
178 178  * Move old AKEL logs to S3 (90+ days)
179 179  * Archive old edit history
180 180  * Compress and backup data
300 +
181 181  === 5.3 Quality Monitoring ===
302 +
182 182  **Automated checks run continuously**:
304 +
183 183  * **Anomaly Detection**: Flag unusual patterns
184 - * Sudden confidence score changes
185 - * Unusual evidence distributions
186 - * Suspicious source patterns
306 +* Sudden confidence score changes (see the example after this list)
307 +* Unusual evidence distributions
308 +* Suspicious source patterns
187 187  * **Contradiction Detection**: Identify conflicts
188 - * Evidence that contradicts other evidence
189 - * Claims with internal contradictions
190 - * Source track record anomalies
310 +* Evidence that contradicts other evidence
311 +* Claims with internal contradictions
312 +* Source track record anomalies
191 191  * **Completeness Validation**: Ensure thoroughness
192 - * Sufficient evidence gathered
193 - * Multiple source types represented
194 - * Key scenarios identified
314 +* Sufficient evidence gathered
315 +* Multiple source types represented
316 +* Key scenarios identified
317 +
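As one example of what these checks look like in practice, a sudden confidence-score change can be flagged with a simple threshold rule (a sketch; the 0.25 threshold and the idea of comparing consecutive AKEL runs are illustrative assumptions, not tuned production values):

{{code language="python"}}
def confidence_jump_anomaly(previous_score: float,
                            current_score: float,
                            threshold: float = 0.25) -> bool:
    """Flag a claim whose overall confidence moved more than `threshold`
    between two consecutive AKEL runs."""
    return abs(current_score - previous_score) > threshold

# Example: a re-run that drops confidence from 0.86 to 0.41 is flagged
# for the quality-monitoring review queue.
assert confidence_jump_anomaly(0.86, 0.41) is True
{{/code}}
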
195 195  === 5.4 Moderation Detection ===
319 +
196 196  **Automated abuse detection**:
321 +
197 197  * **Spam Identification**: Pattern matching for spam claims
198 198  * **Manipulation Detection**: Identify coordinated editing
199 199  * **Gaming Detection**: Flag attempts to game source scores
200 200  * **Suspicious Activity**: Log unusual behavior patterns
201 201  **Human Review**: Moderators review flagged items, system learns from decisions
327 +
202 202  == 6. Scalability Strategy ==
329 +
203 203  === 6.1 Horizontal Scaling ===
331 +
204 204  Components scale independently:
333 +
205 205  * **AKEL Workers**: Add more processing workers as claim volume grows
206 206  * **Database Read Replicas**: Add replicas for read-heavy workloads
207 207  * **Cache Layer**: Redis cluster for distributed caching
208 208  * **API Servers**: Load-balanced API instances
338 +
209 209  === 6.2 Vertical Scaling ===
340 +
210 210  Individual components can be upgraded:
342 +
211 211  * **Database Server**: Increase CPU/RAM for PostgreSQL
212 212  * **Cache Memory**: Expand Redis memory
213 213  * **Worker Resources**: More powerful AKEL worker machines
346 +
214 214  === 6.3 Performance Optimization ===
348 +
215 215  Built-in optimizations:
350 +
216 216  * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
217 217  * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
218 218  * **Intelligent Caching**: Redis caches frequently accessed data
219 219  * **Background Processing**: Non-urgent tasks run asynchronously
355 +
220 220  == 7. Monitoring & Observability ==
357 +
221 221  === 7.1 Key Metrics ===
359 +
222 222  System tracks:
361 +
223 223  * **Performance**: AKEL processing time, API response time, cache hit rate
224 224  * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
225 225  * **Usage**: Claims per day, active users, API requests
226 226  * **Errors**: Failed AKEL runs, API errors, database issues
366 +
227 227  === 7.2 Alerts ===
368 +
228 228  Automated alerts for:
370 +
229 229  * Processing time >30 seconds (threshold breach)
230 230  * Error rate >1% (quality issue)
231 231  * Cache hit rate <80% (cache problem)
232 232  * Database connections >80% capacity (scaling needed)
375 +
233 233  === 7.3 Dashboards ===
377 +
234 234  Real-time monitoring:
379 +
235 235  * **System Health**: Overall status and key metrics
236 236  * **AKEL Performance**: Processing time breakdown
237 237  * **Quality Metrics**: Confidence scores, completeness
238 238  * **User Activity**: Usage patterns, peak times
384 +
239 239  == 8. Security Architecture ==
386 +
240 240  === 8.1 Authentication & Authorization ===
388 +
241 241  * **User Authentication**: Secure login with password hashing
242 242  * **Role-Based Access**: Reader, Contributor, Moderator, Admin
243 243  * **API Keys**: For programmatic access
244 244  * **Rate Limiting**: Prevent abuse
393 +
245 245  === 8.2 Data Security ===
395 +
246 246  * **Encryption**: TLS for transport, encrypted storage for sensitive data
247 247  * **Audit Logging**: Track all significant changes
248 248  * **Input Validation**: Sanitize all user inputs
249 249  * **SQL Injection Protection**: Parameterized queries
400 +
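As a brief illustration of the last point, SQL statements are executed with bound parameters rather than string interpolation. The snippet below is a sketch assuming an open database cursor (e.g. psycopg); the table and column names are illustrative.

{{code language="python"}}
# Unsafe: user input concatenated into the SQL string (injection risk).
# cursor.execute(f"SELECT * FROM claims WHERE id = '{claim_id}'")

# Safe: the value is sent separately from the statement and never parsed as SQL.
cursor.execute("SELECT * FROM claims WHERE id = %s", (claim_id,))
{{/code}}
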
250 250  === 8.3 Abuse Prevention ===
402 +
251 251  * **Rate Limiting**: Prevent flooding and DDoS
252 252  * **Automated Detection**: Flag suspicious patterns
253 253  * **Human Review**: Moderators investigate flagged content
254 254  * **Ban Mechanisms**: Block abusive users/IPs
407 +
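A minimal sketch of the Redis-backed rate limiting referenced here and in the Interface Layer, using a fixed-window counter (the key naming and the 100-requests-per-minute limit are illustrative assumptions):

{{code language="python"}}
import redis

r = redis.Redis()

def allow_request(client_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window counter: at most `limit` requests per client per window."""
    key = f"ratelimit:{client_id}"
    count = r.incr(key)                # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit
{{/code}}
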
255 255  == 9. Deployment Architecture ==
409 +
256 256  === 9.1 Production Environment ===
411 +
257 257  **Components**:
413 +
258 258  * Load Balancer (HAProxy or cloud LB)
259 259  * Multiple API servers (stateless)
260 260  * AKEL worker pool (auto-scaling)
... ... @@ -262,11 +262,15 @@
262 262  * Redis cluster
263 263  * S3-compatible storage
264 264  **Regions**: Single region for V1.0, multi-region when needed
421 +
265 265  === 9.2 Development & Staging ===
423 +
266 266  **Development**: Local Docker Compose setup
267 267  **Staging**: Scaled-down production replica
268 268  **CI/CD**: Automated testing and deployment
427 +
269 269  === 9.3 Disaster Recovery ===
429 +
270 270  * **Database Backups**: Daily automated backups to S3
271 271  * **Point-in-Time Recovery**: Transaction log archival
272 272  * **Replication**: Real-time replication to standby
... ... @@ -277,20 +277,28 @@
277 277  {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
278 278  
279 279  == 10. Future Architecture Evolution ==
440 +
280 280  === 10.1 When to Add Complexity ===
442 +
281 281  See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
282 282  **Elasticsearch**: When PostgreSQL search consistently >500ms
283 -**TimescaleDB**: When metrics queries consistently >1s
445 +**TimescaleDB**: When metrics queries consistently >1s
284 284  **Federation**: When 10,000+ users and explicit demand
285 285  **Complex Reputation**: When 100+ active contributors
448 +
286 286  === 10.2 Federation (V2.0+) ===
450 +
287 287  **Deferred until**:
452 +
288 288  * Core product proven with 10,000+ users
289 289  * User demand for decentralization
290 290  * Single-node limits reached
291 291  See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
457 +
292 292  == 11. Technology Stack Summary ==
459 +
293 293  **Backend**:
461 +
294 294  * Python (FastAPI or Django)
295 295  * PostgreSQL (primary database)
296 296  * Redis (caching)
... ... @@ -308,7 +308,9 @@
308 308  * Prometheus + Grafana
309 309  * Structured logging (ELK or cloud logging)
310 310  * Error tracking (Sentry)
479 +
311 311  == 12. Related Pages ==
481 +
312 312  * [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
313 313  * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
314 314  * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]