Wiki source code of Architecture

Last modified by Robert Schaub on 2026/02/08 08:23

1 = Architecture =
2
3 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4
5 == 1. Core Principles ==
6
7 * **AI-First**: AKEL (AI) is the primary system; humans supplement it
8 * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
9 * **System Over Data**: Fix algorithms, not individual outputs
10 * **Measure Everything**: Quality metrics drive improvements
11 * **Scale Through Automation**: Minimal human intervention
12 * **Start Simple**: Add complexity only when metrics prove necessary
13
14 == 2. High-Level Architecture ==
15
16 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17
18 === 2.1 Three-Layer Architecture ===
19
20 FactHarbor uses a clean three-layer architecture:
21
22 ==== Interface Layer ====
23
24 Handles all user and system interactions:
25
26 * **Web UI**: Browse claims, view evidence, submit feedback
27 * **REST API**: Programmatic access for integrations
28 * **Authentication & Authorization**: User identity and permissions
29 * **Rate Limiting**: Protect against abuse
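As an illustration of the rate-limiting concern above, here is a minimal fixed-window limiter sketch. The class name and window scheme are illustrative assumptions; in production the counters would live in Redis (see the Data & Storage Layer) rather than in an in-process dict.

```python
import time

class FixedWindowRateLimiter:
    """Minimal fixed-window rate limiter; a dict stands in for Redis here."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_start) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # All requests in the same window share one counter bucket.
        window_start = int(now // self.window) * self.window
        bucket = (key, window_start)
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False  # over the limit for this window
        self.counters[bucket] = count + 1
        return True
```

The same pattern maps directly onto Redis `INCR` with an expiry on the bucket key.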
30
31 ==== Processing Layer ====
32
33 Core business logic and AI processing:
34
35 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
36 ** Parse and extract claim components
37 ** Gather evidence from multiple sources
38 ** Check source track records
39 ** Extract scenarios from evidence
40 ** Synthesize verdicts
41 ** Calculate risk scores
42 * **Background Jobs**: Automated maintenance tasks
43 ** Source track record updates (weekly)
44 ** Cache warming and invalidation
45 ** Metrics aggregation
46 ** Data archival
47 * **Quality Monitoring**: Automated quality checks
48 ** Anomaly detection
49 ** Contradiction detection
50 ** Completeness validation
51 * **Moderation Detection**: Automated abuse detection
52 ** Spam identification
53 ** Manipulation detection
54 ** Flag suspicious activity
55
56 ==== Data & Storage Layer ====
57
58 Persistent data storage and caching:
59
60 * **PostgreSQL**: Primary database for all core data
61 ** Claims, evidence, sources, users
62 ** Scenarios, edits, audit logs
63 ** Built-in full-text search
64 ** Time-series capabilities for metrics
65 * **Redis**: High-speed caching layer
66 ** Session data
67 ** Frequently accessed claims
68 ** API rate limiting
69 * **S3 Storage**: Long-term archival
70 ** Old edit history (90+ days)
71 ** AKEL processing logs
72 ** Backup snapshots
73 **Optional future additions** (add only when metrics prove necessary):
74 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
75 * **TimescaleDB**: If metrics queries become a bottleneck
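PostgreSQL's built-in full-text search, mentioned above, can be sketched as a parameterized query. The table and column names (`claims`, `search_vector`, `claim_text`) are hypothetical stand-ins; the actual schema is defined in the Data Model page.

```python
# Hypothetical table/column names; FactHarbor's actual schema may differ.
# `search_vector` would be a tsvector column maintained alongside claim_text.
CLAIM_SEARCH_SQL = """
SELECT id, claim_text,
       ts_rank(search_vector, websearch_to_tsquery('english', %(query)s)) AS rank
FROM claims
WHERE search_vector @@ websearch_to_tsquery('english', %(query)s)
ORDER BY rank DESC
LIMIT %(limit)s;
"""

def search_params(query, limit=20):
    """Bind parameters for the query; parameterization also guards against injection."""
    return {"query": query, "limit": limit}
```

If this kind of query consistently exceeds the latency trigger above, that is the point at which Elasticsearch becomes worth evaluating.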
76
77 === 2.2 Design Philosophy ===
78
79 **Start Simple, Evolve Based on Metrics**
80 The architecture deliberately starts simple:
81
82 * Single primary database (PostgreSQL handles most workloads initially)
83 * Three clear layers (easy to understand and maintain)
84 * Automated operations (minimal human intervention)
85 * Measure before optimizing (add complexity only when proven necessary)
86 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
87
88 == 3. AKEL Architecture ==
89
90 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
91 See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
92
93 == 3.5 Claim Processing Architecture ==
94
95 FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
96
97 === Multi-Claim Handling ===
98
99 Users often submit:
100
101 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
102 * **Web pages**: URLs that are analyzed to extract all verifiable claims
103 * **Single claims**: Simple, direct factual statements
104
105 The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
106
107 === Processing Phases ===
108
109 **POC Implementation (Two-Phase):**
110
111 Phase 1 - Claim Extraction:
112
113 * LLM analyzes submitted content
114 * Extracts all distinct, verifiable claims
115 * Returns structured list of claims with context
116
117 Phase 2 - Parallel Analysis:
118
119 * Each claim processed independently by LLM
120 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
121 * Parallelized across all claims
122 * Results aggregated for presentation
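The two-phase POC flow can be sketched as follows. The LLM calls are stubbed out (the real phases would call the multi-provider orchestration layer); only the extract-then-fan-out structure is the point here.

```python
import asyncio

async def extract_claims(text):
    # Phase 1: in production an LLM call; here a trivial stand-in that
    # treats each sentence as a candidate claim.
    return [s.strip() for s in text.split(".") if s.strip()]

async def analyze_claim(claim):
    # Phase 2: a single LLM call per claim would return evidence, scenarios,
    # sources, verdict, and risk; stubbed here.
    return {"claim": claim, "verdict": "unverified", "confidence": 0.0}

async def process_submission(text):
    claims = await extract_claims(text)
    # Claims are independent, so they are analyzed concurrently.
    return await asyncio.gather(*(analyze_claim(c) for c in claims))
```

Because the per-claim calls run concurrently, wall-clock time is dominated by the slowest single claim rather than the claim count.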
123
124 **Production Implementation (Three-Phase):**
125
126 Phase 1 - Extraction + Validation:
127
128 * Extract claims from content
129 * Validate clarity and uniqueness
130 * Filter vague or duplicate claims
131
132 Phase 2 - Evidence Gathering (Parallel):
133
134 * Independent evidence gathering per claim
135 * Source validation and scenario generation
136 * Quality gates prevent poor data from advancing
137
138 Phase 3 - Verdict Generation (Parallel):
139
140 * Generate verdict from validated evidence
141 * Confidence scoring and risk assessment
142 * Low-confidence cases routed to human review
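The routing decision at the end of Phase 3 might look like the sketch below. The 0.6 threshold is an illustrative assumption, not a specified value; the spec only says low-confidence cases go to human review while everything publishes by default.

```python
# Illustrative threshold; the actual cutoff would be tuned from quality metrics.
REVIEW_THRESHOLD = 0.6

def route_verdict(verdict):
    """Publish by default; flag low-confidence verdicts for human review."""
    needs_review = verdict["confidence"] < REVIEW_THRESHOLD
    return {**verdict, "published": True, "needs_human_review": needs_review}
```

Note that flagging does not block publication: this mirrors the "Publish by Default" principle, with the confidence score surfaced to readers.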
143
144 === Architectural Benefits ===
145
146 **Scalability:**
147
148 * Process 100 claims with roughly 3x the latency of a single claim
149 * Parallel processing across independent claims
150 * Linear cost scaling with claim count
151
152 **Quality:**
153
154 * Validation gates between phases
155 * Errors isolated to individual claims
156 * Clear observability per processing step
157
158 **Flexibility:**
159
160 * Each phase optimizable independently
161 * Can use different model sizes per phase
162 * Easy to add human review at decision points
163
164 == 4. Storage Architecture ==
165
166 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Storage Architecture.WebHome"/}}
167 See [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] for detailed information.
168
169 == 4.5 Versioning Architecture ==
170
171 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Versioning Architecture.WebHome"/}}
172
173 == 5. Automated Systems in Detail ==
174
175 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
176
177 === 5.1 AKEL (AI Knowledge Extraction Layer) ===
178
179 **What it does**: Primary AI processing engine that analyzes claims automatically
180 **Inputs**:
181
182 * User-submitted claim text
183 * Existing evidence and sources
184 * Source track record database
185 **Processing steps**:
186
187 1. **Parse & Extract**: Identify key components, entities, assertions
188 2. **Gather Evidence**: Search web and database for relevant sources
189 3. **Check Sources**: Evaluate source reliability using track records
190 4. **Extract Scenarios**: Identify different contexts from evidence
191 5. **Synthesize Verdict**: Compile evidence assessment per scenario
192 6. **Calculate Risk**: Assess potential harm and controversy
193 **Outputs**:
194
195 * Structured claim record
196 * Evidence links with relevance scores
197 * Scenarios with context descriptions
198 * Verdict summary per scenario
199 * Overall confidence score
200 * Risk assessment
201 **Timing**: 10-18 seconds total (parallel processing)
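The outputs listed above can be pictured as a structured record. The field names below are illustrative assumptions; the authoritative shapes live in the Data Model page.

```python
from dataclasses import dataclass

# Field names are illustrative; the actual schema is defined in the Data Model.

@dataclass
class EvidenceLink:
    url: str
    relevance: float  # relevance score, 0.0 - 1.0

@dataclass
class Scenario:
    context: str          # context description extracted from evidence
    verdict_summary: str  # verdict compiled for this scenario

@dataclass
class AkelResult:
    claim_text: str
    evidence: list        # list[EvidenceLink]
    scenarios: list       # list[Scenario]
    confidence: float     # overall confidence score
    risk_score: float     # potential harm / controversy assessment
```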
202
203 === 5.2 Background Jobs ===
204
205 **Source Track Record Updates** (Weekly):
206
207 * Analyze claim outcomes from past week
208 * Calculate source accuracy and reliability
209 * Update source_track_record table
210 * Never triggered by individual claims (prevents circular dependencies)
211 **Cache Management** (Continuous):
212 * Warm cache for popular claims
213 * Invalidate cache on claim updates
214 * Monitor cache hit rates
215 **Metrics Aggregation** (Hourly):
216 * Roll up detailed metrics
217 * Calculate system health indicators
218 * Generate performance reports
219 **Data Archival** (Daily):
220 * Move old AKEL logs to S3 (90+ days)
221 * Archive old edit history
222 * Compress and backup data
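The weekly source track record update reduces to a batch accuracy computation, sketched below. The `(source_id, was_accurate)` outcome shape is an assumption for illustration; the real job reads claim outcomes from PostgreSQL and writes back to `source_track_record`.

```python
def update_track_record(outcomes):
    """Weekly batch: fraction of each source's cited claims whose verdicts held up.

    `outcomes` is a list of (source_id, was_accurate) pairs from the past week.
    Running this only as a batch job, never per claim, avoids the circular
    dependency noted above.
    """
    totals, correct = {}, {}
    for source_id, was_accurate in outcomes:
        totals[source_id] = totals.get(source_id, 0) + 1
        if was_accurate:
            correct[source_id] = correct.get(source_id, 0) + 1
    return {s: correct.get(s, 0) / totals[s] for s in totals}
```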
223
224 === 5.3 Quality Monitoring ===
225
226 **Automated checks run continuously**:
227
228 * **Anomaly Detection**: Flag unusual patterns
229 ** Sudden confidence score changes
230 ** Unusual evidence distributions
231 ** Suspicious source patterns
232 * **Contradiction Detection**: Identify conflicts
233 ** Evidence that contradicts other evidence
234 ** Claims with internal contradictions
235 ** Source track record anomalies
236 * **Completeness Validation**: Ensure thoroughness
237 ** Sufficient evidence gathered
238 ** Multiple source types represented
239 ** Key scenarios identified
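A minimal version of the "sudden confidence score changes" check could be a simple delta test; the 0.3 threshold below is an illustrative assumption, and a production check would likely use a statistical baseline instead.

```python
def sudden_confidence_change(history, threshold=0.3):
    """Flag when the latest confidence score jumps by more than `threshold`
    relative to the previous value (threshold is illustrative)."""
    if len(history) < 2:
        return False  # not enough history to compare
    return abs(history[-1] - history[-2]) > threshold
```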
240
241 === 5.4 Moderation Detection ===
242
243 **Automated abuse detection**:
244
245 * **Spam Identification**: Pattern matching for spam claims
246 * **Manipulation Detection**: Identify coordinated editing
247 * **Gaming Detection**: Flag attempts to game source scores
248 * **Suspicious Activity**: Log unusual behavior patterns
249 **Human Review**: Moderators review flagged items; the system learns from their decisions
250
251 == 6. Scalability Strategy ==
252
253 === 6.1 Horizontal Scaling ===
254
255 Components scale independently:
256
257 * **AKEL Workers**: Add more processing workers as claim volume grows
258 * **Database Read Replicas**: Add replicas for read-heavy workloads
259 * **Cache Layer**: Redis cluster for distributed caching
260 * **API Servers**: Load-balanced API instances
261
262 === 6.2 Vertical Scaling ===
263
264 Individual components can be upgraded:
265
266 * **Database Server**: Increase CPU/RAM for PostgreSQL
267 * **Cache Memory**: Expand Redis memory
268 * **Worker Resources**: More powerful AKEL worker machines
269
270 === 6.3 Performance Optimization ===
271
272 Built-in optimizations:
273
274 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
275 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
276 * **Intelligent Caching**: Redis caches frequently accessed data
277 * **Background Processing**: Non-urgent tasks run asynchronously
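The "intelligent caching" bullet above amounts to a cache-aside pattern in front of PostgreSQL, sketched here with an in-process dict standing in for Redis. Names and the TTL are illustrative assumptions.

```python
import time

class ClaimCache:
    """Cache-aside sketch; a TTL dict stands in for Redis here."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # claim_id -> (expires_at, value)

    def get_or_load(self, claim_id, loader, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(claim_id)
        if entry and entry[0] > now:
            return entry[1]            # cache hit
        value = loader(claim_id)       # cache miss: fall through to the database
        self.store[claim_id] = (now + self.ttl, value)
        return value
```

Invalidation on claim updates (listed under Background Jobs) would simply delete the corresponding key.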
278
279 == 7. Monitoring & Observability ==
280
281 === 7.1 Key Metrics ===
282
283 System tracks:
284
285 * **Performance**: AKEL processing time, API response time, cache hit rate
286 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
287 * **Usage**: Claims per day, active users, API requests
288 * **Errors**: Failed AKEL runs, API errors, database issues
289
290 === 7.2 Alerts ===
291
292 Automated alerts for:
293
294 * Processing time >30 seconds (threshold breach)
295 * Error rate >1% (quality issue)
296 * Cache hit rate <80% (cache problem)
297 * Database connections >80% capacity (scaling needed)
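The alert rules above can be expressed as a small threshold table; in practice these would live in Prometheus alerting rules rather than application code, so this is only an illustrative sketch with assumed metric names.

```python
# Metric names are illustrative; thresholds mirror the alert rules above.
ALERT_RULES = {
    "akel_processing_seconds": lambda v: v > 30,    # threshold breach
    "error_rate": lambda v: v > 0.01,               # quality issue
    "cache_hit_rate": lambda v: v < 0.80,           # cache problem
    "db_connection_utilization": lambda v: v > 0.80,  # scaling needed
}

def firing_alerts(metrics):
    """Return the names of all rules breached by the current metric snapshot."""
    return [name for name, breached in ALERT_RULES.items()
            if name in metrics and breached(metrics[name])]
```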
298
299 === 7.3 Dashboards ===
300
301 Real-time monitoring:
302
303 * **System Health**: Overall status and key metrics
304 * **AKEL Performance**: Processing time breakdown
305 * **Quality Metrics**: Confidence scores, completeness
306 * **User Activity**: Usage patterns, peak times
307
308 == 8. Security Architecture ==
309
310 === 8.1 Authentication & Authorization ===
311
312 * **User Authentication**: Secure login with password hashing
313 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
314 * **API Keys**: For programmatic access
315 * **Rate Limiting**: Prevent abuse
316
317 === 8.2 Data Security ===
318
319 * **Encryption**: TLS for transport, encrypted storage for sensitive data
320 * **Audit Logging**: Track all significant changes
321 * **Input Validation**: Sanitize all user inputs
322 * **SQL Injection Protection**: Parameterized queries
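Parameterized queries, as required above, keep user input out of the SQL text entirely. The sketch uses `sqlite3` as a runnable stand-in for PostgreSQL (with psycopg the placeholder would be `%s` instead of `?`); the table name is hypothetical.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; table/column names are illustrative.
def find_claims(conn, submitted_text):
    # User input is passed as a bound parameter, never interpolated into SQL,
    # so injection payloads are treated as plain data.
    cur = conn.execute(
        "SELECT id FROM claims WHERE claim_text = ?", (submitted_text,)
    )
    return [row[0] for row in cur.fetchall()]
```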
323
324 === 8.3 Abuse Prevention ===
325
326 * **Rate Limiting**: Prevent flooding and DDoS
327 * **Automated Detection**: Flag suspicious patterns
328 * **Human Review**: Moderators investigate flagged content
329 * **Ban Mechanisms**: Block abusive users/IPs
330
331 == 9. Deployment Architecture ==
332
333 === 9.1 Production Environment ===
334
335 **Components**:
336
337 * Load Balancer (HAProxy or cloud LB)
338 * Multiple API servers (stateless)
339 * AKEL worker pool (auto-scaling)
340 * PostgreSQL primary + read replicas
341 * Redis cluster
342 * S3-compatible storage
343 **Regions**: Single region for V1.0, multi-region when needed
344
345 === 9.2 Development & Staging ===
346
347 **Development**: Local Docker Compose setup
348 **Staging**: Scaled-down production replica
349 **CI/CD**: Automated testing and deployment
350
351 === 9.3 Disaster Recovery ===
352
353 * **Database Backups**: Daily automated backups to S3
354 * **Point-in-Time Recovery**: Transaction log archival
355 * **Replication**: Real-time replication to standby
356 * **Recovery Time Objective**: <4 hours
357
358 === 9.5 Federation Architecture Diagram ===
359
360 {{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Federation Architecture.WebHome"/}}
361
362 == 10. Future Architecture Evolution ==
363
364 === 10.1 When to Add Complexity ===
365
366 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
367 **Elasticsearch**: When PostgreSQL search consistently >500ms
368 **TimescaleDB**: When metrics queries consistently >1s
369 **Federation**: When 10,000+ users and explicit demand
370 **Complex Reputation**: When 100+ active contributors
371
372 === 10.2 Federation (V2.0+) ===
373
374 **Deferred until**:
375
376 * Core product proven with 10,000+ users
377 * User demand for decentralization
378 * Single-node limits reached
379 See [[Federation & Decentralization>>Archive.FactHarbor 2026\.01\.20.Specification.Federation & Decentralization.WebHome]] for future plans.
380
381 == 11. Technology Stack Summary ==
382
383 **Backend**:
384
385 * Python (FastAPI or Django)
386 * PostgreSQL (primary database)
387 * Redis (caching)
388 **Frontend**:
389 * Modern JavaScript framework (React, Vue, or Svelte)
390 * Server-side rendering for SEO
391 **AI/LLM**:
392 * Multi-provider orchestration (Claude, GPT-4, local models)
393 * Fallback and cross-checking support
394 **Infrastructure**:
395 * Docker containers
396 * Kubernetes or cloud platform auto-scaling
397 * S3-compatible object storage
398 **Monitoring**:
399 * Prometheus + Grafana
400 * Structured logging (ELK or cloud logging)
401 * Error tracking (Sentry)
402
403 == 12. Related Pages ==
404
405 * [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
406 * [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
407 * [[Data Model>>Archive.FactHarbor 2026\.01\.20.Specification.Data Model.WebHome]]
408 * [[API Layer>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
409 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
410 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]