Changes for page Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

From version 2.1
edited by Robert Schaub
on 2025/12/18 12:54
Change comment: Imported from XAR
To version 3.3
edited by Robert Schaub
on 2026/01/20 20:24
Change comment: Renamed back-links.

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -FactHarbor.Specification.WebHome
1 +Archive.FactHarbor V0\.9\.50 Plus (Prev Rel).Specification.WebHome
Content
... ... @@ -1,6 +1,9 @@
1 1  = Architecture =
2 +
2 2  FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4 +
3 3  == 1. Core Principles ==
6 +
4 4  * **AI-First**: AKEL (AI) is the primary system, humans supplement
5 5  * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
6 6  * **System Over Data**: Fix algorithms, not individual outputs
... ... @@ -7,81 +7,180 @@
7 7  * **Measure Everything**: Quality metrics drive improvements
8 8  * **Scale Through Automation**: Minimal human intervention
9 9  * **Start Simple**: Add complexity only when metrics prove necessary
13 +
10 10  == 2. High-Level Architecture ==
15 +
11 11  {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17 +
12 12  === 2.1 Three-Layer Architecture ===
19 +
13 13  FactHarbor uses a clean three-layer architecture:
21 +
14 14  ==== Interface Layer ====
23 +
15 15  Handles all user and system interactions:
25 +
16 16  * **Web UI**: Browse claims, view evidence, submit feedback
17 17  * **REST API**: Programmatic access for integrations
18 18  * **Authentication & Authorization**: User identity and permissions
19 19  * **Rate Limiting**: Protect against abuse
30 +
20 20  ==== Processing Layer ====
32 +
21 21  Core business logic and AI processing:
34 +
22 22  * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
23 - * Parse and extract claim components
24 - * Gather evidence from multiple sources
25 - * Check source track records
26 - * Extract scenarios from evidence
27 - * Synthesize verdicts
28 - * Calculate risk scores
36 +* Parse and extract claim components
37 +* Gather evidence from multiple sources
38 +* Check source track records
39 +* Extract scenarios from evidence
40 +* Synthesize verdicts
41 +* Calculate risk scores
29 29  * **Background Jobs**: Automated maintenance tasks
30 - * Source track record updates (weekly)
31 - * Cache warming and invalidation
32 - * Metrics aggregation
33 - * Data archival
43 +* Source track record updates (weekly)
44 +* Cache warming and invalidation
45 +* Metrics aggregation
46 +* Data archival
34 34  * **Quality Monitoring**: Automated quality checks
35 - * Anomaly detection
36 - * Contradiction detection
37 - * Completeness validation
48 +* Anomaly detection
49 +* Contradiction detection
50 +* Completeness validation
38 38  * **Moderation Detection**: Automated abuse detection
39 - * Spam identification
40 - * Manipulation detection
41 - * Flag suspicious activity
52 +* Spam identification
53 +* Manipulation detection
54 +* Flag suspicious activity
55 +
42 42  ==== Data & Storage Layer ====
57 +
43 43  Persistent data storage and caching:
59 +
44 44  * **PostgreSQL**: Primary database for all core data
45 - * Claims, evidence, sources, users
46 - * Scenarios, edits, audit logs
47 - * Built-in full-text search
48 - * Time-series capabilities for metrics
61 +* Claims, evidence, sources, users
62 +* Scenarios, edits, audit logs
63 +* Built-in full-text search
64 +* Time-series capabilities for metrics
49 49  * **Redis**: High-speed caching layer
50 - * Session data
51 - * Frequently accessed claims
52 - * API rate limiting
66 +* Session data
67 +* Frequently accessed claims
68 +* API rate limiting
53 53  * **S3 Storage**: Long-term archival
54 - * Old edit history (90+ days)
55 - * AKEL processing logs
56 - * Backup snapshots
70 +* Old edit history (90+ days)
71 +* AKEL processing logs
72 +* Backup snapshots
57 57  **Optional future additions** (add only when metrics prove necessary):
58 58  * **Elasticsearch**: If PostgreSQL full-text search becomes slow
59 59  * **TimescaleDB**: If metrics queries become a bottleneck
76 +
60 60  === 2.2 Design Philosophy ===
78 +
61 61  **Start Simple, Evolve Based on Metrics**
62 62  The architecture deliberately starts simple:
81 +
63 63  * Single primary database (PostgreSQL handles most workloads initially)
64 64  * Three clear layers (easy to understand and maintain)
65 65  * Automated operations (minimal human intervention)
66 66  * Measure before optimizing (add complexity only when proven necessary)
67 67  See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
87 +
68 68  == 3. AKEL Architecture ==
89 +
69 69  {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
70 -See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
91 +See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
92 +
93 +== 3.5 Claim Processing Architecture ==
94 +
95 +FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
96 +
97 +=== Multi-Claim Handling ===
98 +
99 +Users often submit:
100 +
101 +* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
102 +* **Web pages**: URLs that are analyzed to extract all verifiable claims
103 +* **Single claims**: Simple, direct factual statements
104 +
105 +The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
106 +
107 +=== Processing Phases ===
108 +
109 +**POC Implementation (Two-Phase):**
110 +
111 +Phase 1 - Claim Extraction:
112 +
113 +* LLM analyzes submitted content
114 +* Extracts all distinct, verifiable claims
115 +* Returns structured list of claims with context
116 +
117 +Phase 2 - Parallel Analysis:
118 +
119 +* Each claim processed independently by LLM
120 +* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
121 +* Parallelized across all claims
122 +* Results aggregated for presentation
123 +
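The POC two-phase flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual implementation: the naive sentence split stands in for the real extraction prompt, and `llm_call` is a hypothetical placeholder for whatever LLM client the POC uses.

```python
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    # Hypothetical stand-in for the real LLM client; returns a canned string.
    return f"response to: {prompt[:40]}"

def extract_claims(text: str) -> list[str]:
    """Phase 1: isolate distinct, verifiable claims from submitted content.
    A naive sentence split stands in for the extraction LLM call."""
    return [s.strip() for s in text.split(".") if s.strip()]

def analyze_claim(claim: str) -> dict:
    """Phase 2: a single call per claim yielding evidence, scenarios,
    sources, verdict, and risk in one structured response (stubbed here)."""
    return {"claim": claim, "analysis": llm_call(claim)}

def process_submission(text: str) -> list[dict]:
    claims = extract_claims(text)           # Phase 1 runs once, sequentially
    with ThreadPoolExecutor() as pool:      # Phase 2 fans out across claims
        return list(pool.map(analyze_claim, claims))
```

Because Phase 2 calls are independent, wall-clock latency stays close to that of a single claim regardless of how many claims were extracted.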
124 +**Production Implementation (Three-Phase):**
125 +
126 +Phase 1 - Extraction + Validation:
127 +
128 +* Extract claims from content
129 +* Validate clarity and uniqueness
130 +* Filter vague or duplicate claims
131 +
132 +Phase 2 - Evidence Gathering (Parallel):
133 +
134 +* Independent evidence gathering per claim
135 +* Source validation and scenario generation
136 +* Quality gates prevent poor data from advancing
137 +
138 +Phase 3 - Verdict Generation (Parallel):
139 +
140 +* Generate verdict from validated evidence
141 +* Confidence scoring and risk assessment
142 +* Low-confidence cases routed to human review
143 +
144 +=== Architectural Benefits ===
145 +
146 +**Scalability:**
147 +
148 +* Process 100 claims at roughly 3x the latency of a single claim
149 +* Parallel processing across independent claims
150 +* Linear cost scaling with claim count
151 +
152 +**Quality:**
153 +
154 +* Validation gates between phases
155 +* Errors isolated to individual claims
156 +* Clear observability per processing step
157 +
158 +**Flexibility:**
159 +
160 +* Each phase optimizable independently
161 +* Can use different model sizes per phase
162 +* Easy to add human review at decision points
163 +
71 71  == 4. Storage Architecture ==
165 +
72 72  {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
73 73  See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
168 +
74 74  == 4.5 Versioning Architecture ==
170 +
75 75  {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
172 +
76 76  == 5. Automated Systems in Detail ==
174 +
77 77  FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
176 +
78 78  === 5.1 AKEL (AI Knowledge Extraction Layer) ===
178 +
79 79  **What it does**: Primary AI processing engine that analyzes claims automatically
80 80  **Inputs**:
181 +
81 81  * User-submitted claim text
82 82  * Existing evidence and sources
83 83  * Source track record database
84 84  **Processing steps**:
186 +
85 85  1. **Parse & Extract**: Identify key components, entities, assertions
86 86  2. **Gather Evidence**: Search web and database for relevant sources
87 87  3. **Check Sources**: Evaluate source reliability using track records
... ... @@ -89,6 +89,7 @@
89 89  5. **Synthesize Verdict**: Compile evidence assessment per scenario
90 90  6. **Calculate Risk**: Assess potential harm and controversy
91 91  **Outputs**:
194 +
92 92  * Structured claim record
93 93  * Evidence links with relevance scores
94 94  * Scenarios with context descriptions
... ... @@ -96,8 +96,11 @@
96 96  * Overall confidence score
97 97  * Risk assessment
98 98  **Timing**: 10-18 seconds total (parallel processing)
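The six-step pipeline above, with independent steps fanned out in parallel, can be sketched as follows. Every step body is a stub for illustration; the real pipeline calls LLMs, search backends, and the track-record database.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubbed pipeline steps; real implementations call external services.
def parse(claim): return {"claim": claim, "entities": claim.split()}
def gather_evidence(parsed): return ["evidence-1", "evidence-2"]
def check_sources(parsed): return {"source-a": 0.9}
def extract_scenarios(evidence): return ["scenario-1"]
def synthesize_verdict(evidence, scenarios, sources): return "likely true"
def calculate_risk(parsed): return 0.2

def run_akel(claim: str) -> dict:
    parsed = parse(claim)                        # step 1: parse & extract
    with ThreadPoolExecutor() as pool:           # steps 2, 3, 6 are independent
        ev = pool.submit(gather_evidence, parsed)
        src = pool.submit(check_sources, parsed)
        risk = pool.submit(calculate_risk, parsed)
        evidence, sources = ev.result(), src.result()
    scenarios = extract_scenarios(evidence)      # step 4 needs evidence
    verdict = synthesize_verdict(evidence, scenarios, sources)  # step 5
    return {"verdict": verdict, "risk": risk.result(), "evidence": evidence}
```

Running the independent steps concurrently is what keeps total processing inside the 10-18 second budget.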
202 +
99 99  === 5.2 Background Jobs ===
204 +
100 100  **Source Track Record Updates** (Weekly):
206 +
101 101  * Analyze claim outcomes from past week
102 102  * Calculate source accuracy and reliability
103 103  * Update source_track_record table
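A weekly accuracy refresh of this shape could look like the sketch below. The outcome record fields (`source`, `correct`) are illustrative, not the actual schema.

```python
def update_track_record(outcomes: list[dict]) -> dict[str, float]:
    """Weekly job: recompute each source's accuracy from the past
    week's claim outcomes. Returns {source: accuracy in [0, 1]}."""
    totals: dict[str, list[int]] = {}
    for o in outcomes:
        correct, total = totals.setdefault(o["source"], [0, 0])
        totals[o["source"]] = [correct + (1 if o["correct"] else 0), total + 1]
    return {src: c / t for src, (c, t) in totals.items()}
```

In production the result would be written back to the source_track_record table rather than returned.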
... ... @@ -114,83 +114,120 @@
114 114  * Move old AKEL logs to S3 (90+ days)
115 115  * Archive old edit history
116 116  * Compress and backup data
223 +
117 117  === 5.3 Quality Monitoring ===
225 +
118 118  **Automated checks run continuously**:
227 +
119 119  * **Anomaly Detection**: Flag unusual patterns
120 - * Sudden confidence score changes
121 - * Unusual evidence distributions
122 - * Suspicious source patterns
229 +* Sudden confidence score changes
230 +* Unusual evidence distributions
231 +* Suspicious source patterns
123 123  * **Contradiction Detection**: Identify conflicts
124 - * Evidence that contradicts other evidence
125 - * Claims with internal contradictions
126 - * Source track record anomalies
233 +* Evidence that contradicts other evidence
234 +* Claims with internal contradictions
235 +* Source track record anomalies
127 127  * **Completeness Validation**: Ensure thoroughness
128 - * Sufficient evidence gathered
129 - * Multiple source types represented
130 - * Key scenarios identified
237 +* Sufficient evidence gathered
238 +* Multiple source types represented
239 +* Key scenarios identified
240 +
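A completeness gate of the kind described above could be sketched like this; the field names and thresholds are illustrative rather than taken from the spec.

```python
def validate_completeness(claim: dict, min_evidence: int = 3,
                          min_source_types: int = 2) -> list[str]:
    """Return a list of completeness problems (empty list = claim passes)."""
    problems = []
    # Sufficient evidence gathered?
    if len(claim.get("evidence", [])) < min_evidence:
        problems.append("insufficient evidence")
    # Multiple source types represented?
    source_types = {s["type"] for s in claim.get("sources", [])}
    if len(source_types) < min_source_types:
        problems.append("too few source types")
    # Key scenarios identified?
    if not claim.get("scenarios"):
        problems.append("no scenarios identified")
    return problems
```

Claims with a non-empty problem list would be flagged for reprocessing or human review rather than published.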
131 131  === 5.4 Moderation Detection ===
242 +
132 132  **Automated abuse detection**:
244 +
133 133  * **Spam Identification**: Pattern matching for spam claims
134 134  * **Manipulation Detection**: Identify coordinated editing
135 135  * **Gaming Detection**: Flag attempts to game source scores
136 136  * **Suspicious Activity**: Log unusual behavior patterns
137 137  **Human Review**: Moderators review flagged items, and the system learns from their decisions
250 +
138 138  == 6. Scalability Strategy ==
252 +
139 139  === 6.1 Horizontal Scaling ===
254 +
140 140  Components scale independently:
256 +
141 141  * **AKEL Workers**: Add more processing workers as claim volume grows
142 142  * **Database Read Replicas**: Add replicas for read-heavy workloads
143 143  * **Cache Layer**: Redis cluster for distributed caching
144 144  * **API Servers**: Load-balanced API instances
261 +
145 145  === 6.2 Vertical Scaling ===
263 +
146 146  Individual components can be upgraded:
265 +
147 147  * **Database Server**: Increase CPU/RAM for PostgreSQL
148 148  * **Cache Memory**: Expand Redis memory
149 149  * **Worker Resources**: More powerful AKEL worker machines
269 +
150 150  === 6.3 Performance Optimization ===
271 +
151 151  Built-in optimizations:
273 +
152 152  * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
153 153  * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
154 154  * **Intelligent Caching**: Redis caches frequently accessed data
155 155  * **Background Processing**: Non-urgent tasks run asynchronously
278 +
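The intelligent-caching optimization above is a classic cache-aside read. A minimal sketch follows, using an in-process dict where production would use Redis; the TTL value is illustrative.

```python
import json
import time

# Stand-in for Redis: claim_id -> (expiry timestamp, serialized record).
cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300

def get_claim(claim_id: str, db_fetch) -> dict:
    """Cache-aside read: serve hot claims from cache, fall back to the
    database on a miss, then populate the cache with a TTL."""
    entry = cache.get(claim_id)
    if entry and entry[0] > time.time():
        return json.loads(entry[1])            # cache hit
    record = db_fetch(claim_id)                # miss: hit PostgreSQL
    cache[claim_id] = (time.time() + TTL_SECONDS, json.dumps(record))
    return record
```

Repeated reads of the same claim within the TTL never touch the database, which is where the cache-hit-rate metric in section 7 comes from.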
156 156  == 7. Monitoring & Observability ==
280 +
157 157  === 7.1 Key Metrics ===
282 +
158 158  System tracks:
284 +
159 159  * **Performance**: AKEL processing time, API response time, cache hit rate
160 160  * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
161 161  * **Usage**: Claims per day, active users, API requests
162 162  * **Errors**: Failed AKEL runs, API errors, database issues
289 +
163 163  === 7.2 Alerts ===
291 +
164 164  Automated alerts for:
293 +
165 165  * Processing time >30 seconds (threshold breach)
166 166  * Error rate >1% (quality issue)
167 167  * Cache hit rate <80% (cache problem)
168 168  * Database connections >80% capacity (scaling needed)
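The four alert rules above can be expressed as a small threshold table plus one check function; the metric names here are illustrative, not the actual monitoring schema.

```python
# (direction, limit) per metric, mirroring the alert rules above.
THRESHOLDS = {
    "akel_processing_seconds": ("gt", 30),
    "error_rate": ("gt", 0.01),
    "cache_hit_rate": ("lt", 0.80),
    "db_connection_utilization": ("gt", 0.80),
}

def check_alerts(metrics: dict[str, float]) -> list[str]:
    """Return one alert string per breached threshold; missing metrics
    are skipped rather than treated as breaches."""
    alerts = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        if (direction == "gt" and value > limit) or \
           (direction == "lt" and value < limit):
            alerts.append(f"{name}={value} breaches {direction} {limit}")
    return alerts
```

In practice these rules would live in Prometheus alerting configuration rather than application code.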
298 +
169 169  === 7.3 Dashboards ===
300 +
170 170  Real-time monitoring:
302 +
171 171  * **System Health**: Overall status and key metrics
172 172  * **AKEL Performance**: Processing time breakdown
173 173  * **Quality Metrics**: Confidence scores, completeness
174 174  * **User Activity**: Usage patterns, peak times
307 +
175 175  == 8. Security Architecture ==
309 +
176 176  === 8.1 Authentication & Authorization ===
311 +
177 177  * **User Authentication**: Secure login with password hashing
178 178  * **Role-Based Access**: Reader, Contributor, Moderator, Admin
179 179  * **API Keys**: For programmatic access
180 180  * **Rate Limiting**: Prevent abuse
316 +
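The rate limiting described above can be sketched as a fixed-window counter. An in-memory stand-in is shown; production would keep the counters in Redis (e.g. INCR with an expiry) so all API servers share them. The limit and window values are illustrative.

```python
import time

# api_key -> (window_start, request_count); Redis in production.
_windows: dict[str, tuple[int, int]] = {}

def allow_request(api_key: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` requests per `window` seconds."""
    now = int(time.time())
    start, count = _windows.get(api_key, (now, 0))
    if now - start >= window:
        start, count = now, 0          # window elapsed: reset the counter
    if count >= limit:
        return False                   # over the limit: reject
    _windows[api_key] = (start, count + 1)
    return True
```

Fixed windows are the simplest variant; a sliding-window or token-bucket scheme smooths the burst at window boundaries if that ever matters.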
181 181  === 8.2 Data Security ===
318 +
182 182  * **Encryption**: TLS for transport, encrypted storage for sensitive data
183 183  * **Audit Logging**: Track all significant changes
184 184  * **Input Validation**: Sanitize all user inputs
185 185  * **SQL Injection Protection**: Parameterized queries
323 +
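The SQL injection protection above relies on parameterized queries: the driver binds values separately from the SQL text, so malicious input cannot alter the statement. A minimal sketch, with SQLite standing in for PostgreSQL and an illustrative `claims` table:

```python
import sqlite3

def find_claims(conn: sqlite3.Connection, author: str) -> list[tuple]:
    """Parameterized lookup: `author` is bound by the driver, so input
    like "x' OR '1'='1" is matched literally, never executed as SQL."""
    cur = conn.execute(
        "SELECT id, text FROM claims WHERE author = ?", (author,))
    return cur.fetchall()
```

The same pattern applies with psycopg against PostgreSQL (placeholder `%s` instead of `?`); the rule is simply to never build SQL via string concatenation with user input.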
186 186  === 8.3 Abuse Prevention ===
325 +
187 187  * **Rate Limiting**: Prevent flooding and DDoS
188 188  * **Automated Detection**: Flag suspicious patterns
189 189  * **Human Review**: Moderators investigate flagged content
190 190  * **Ban Mechanisms**: Block abusive users/IPs
330 +
191 191  == 9. Deployment Architecture ==
332 +
192 192  === 9.1 Production Environment ===
334 +
193 193  **Components**:
336 +
194 194  * Load Balancer (HAProxy or cloud LB)
195 195  * Multiple API servers (stateless)
196 196  * AKEL worker pool (auto-scaling)
... ... @@ -198,11 +198,15 @@
198 198  * Redis cluster
199 199  * S3-compatible storage
200 200  **Regions**: Single region for V1.0, multi-region when needed
344 +
201 201  === 9.2 Development & Staging ===
346 +
202 202  **Development**: Local Docker Compose setup
203 203  **Staging**: Scaled-down production replica
204 204  **CI/CD**: Automated testing and deployment
350 +
205 205  === 9.3 Disaster Recovery ===
352 +
206 206  * **Database Backups**: Daily automated backups to S3
207 207  * **Point-in-Time Recovery**: Transaction log archival
208 208  * **Replication**: Real-time replication to standby
... ... @@ -213,20 +213,28 @@
213 213  {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
214 214  
215 215  == 10. Future Architecture Evolution ==
363 +
216 216  === 10.1 When to Add Complexity ===
365 +
217 217  See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
218 218  **Elasticsearch**: When PostgreSQL search consistently >500ms
219 219  **TimescaleDB**: When metrics queries consistently >1s
220 220  **Federation**: When 10,000+ users and explicit demand
221 221  **Complex Reputation**: When 100+ active contributors
371 +
222 222  === 10.2 Federation (V2.0+) ===
373 +
223 223  **Deferred until**:
375 +
224 224  * Core product proven with 10,000+ users
225 225  * User demand for decentralization
226 226  * Single-node limits reached
227 227  See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
380 +
228 228  == 11. Technology Stack Summary ==
382 +
229 229  **Backend**:
384 +
230 230  * Python (FastAPI or Django)
231 231  * PostgreSQL (primary database)
232 232  * Redis (caching)
... ... @@ -244,8 +244,10 @@
244 244  * Prometheus + Grafana
245 245  * Structured logging (ELK or cloud logging)
246 246  * Error tracking (Sentry)
402 +
247 247  == 12. Related Pages ==
248 -* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
404 +
405 +* [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
249 249  * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
250 250  * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
251 251  * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]