
Last modified by Robert Schaub on 2025/12/24 18:26

1 = Architecture =
2
3 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
4
5 == 1. Core Principles ==
6
7 * **AI-First**: AKEL (AI) is the primary system, humans supplement
8 * **Publish by Default**: No centralized approval (removed in V0.9.50), publish with confidence scores
9 * **System Over Data**: Fix algorithms, not individual outputs
10 * **Measure Everything**: Quality metrics drive improvements
11 * **Scale Through Automation**: Minimal human intervention
12 * **Start Simple**: Add complexity only when metrics prove necessary
13
14 == 2. High-Level Architecture ==
15
16 {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
17
18 === 2.1 Three-Layer Architecture ===
19
20 FactHarbor uses a clean three-layer architecture:
21
22 ==== Interface Layer ====
23
24 Handles all user and system interactions:
25
26 * **Web UI**: Browse claims, view evidence, submit feedback
27 * **REST API**: Programmatic access for integrations
28 * **Authentication & Authorization**: User identity and permissions
29 * **Rate Limiting**: Protect against abuse
30
31 ==== Processing Layer ====
32
33 Core business logic and AI processing:
34
35 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
36 ** Parse and extract claim components
37 ** Gather evidence from multiple sources
38 ** Check source track records
39 ** Extract scenarios from evidence
40 ** Synthesize verdicts
41 ** Calculate risk scores
43 * **LLM Abstraction Layer**: Provider-agnostic AI access
44 ** Multi-provider support (Anthropic, OpenAI, Google, local models)
45 ** Automatic failover and rate limit handling
46 ** Per-stage model configuration
47 ** Cost optimization through provider selection
48 ** No vendor lock-in
49 * **Background Jobs**: Automated maintenance tasks
50 ** Source track record updates (weekly)
51 ** Cache warming and invalidation
52 ** Metrics aggregation
53 ** Data archival
54 * **Quality Monitoring**: Automated quality checks
55 ** Anomaly detection
56 ** Contradiction detection
57 ** Completeness validation
58 * **Moderation Detection**: Automated abuse detection
59 ** Spam identification
60 ** Manipulation detection
61 ** Flag suspicious activity
62
63 ==== Data & Storage Layer ====
64
65 Persistent data storage and caching:
66
67 * **PostgreSQL**: Primary database for all core data
68 ** Claims, evidence, sources, users
69 ** Scenarios, edits, audit logs
70 ** Built-in full-text search
71 ** Time-series capabilities for metrics
72 * **Redis**: High-speed caching layer
73 ** Session data
74 ** Frequently accessed claims
75 ** API rate limiting
76 * **S3 Storage**: Long-term archival
77 ** Old edit history (90+ days)
78 ** AKEL processing logs
79 ** Backup snapshots

80 **Optional future additions** (add only when metrics prove necessary):

81 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
82 * **TimescaleDB**: If metrics queries become a bottleneck
83
84 === 2.2 LLM Abstraction Layer ===
85
86 {{include reference="FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
87
88 **Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
89
90 **Multi-Provider Support:**
91
92 * **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
93 * **Secondary:** OpenAI GPT API (automatic failover)
94 * **Tertiary:** Google Vertex AI / Gemini
95 * **Future:** Local models (Llama, Mistral) for on-premises deployments
96
97 **Provider Interface:**
98
99 * Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
100 * Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet)
101 * Environment variable and database configuration
102 * Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
103
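As an illustrative sketch, the interface could look as follows in Python (method names snake_cased for Python style; the stub return values and prices are placeholders, not real pricing):

```python
from abc import ABC, abstractmethod
from typing import Iterator


class LLMProvider(ABC):
    """Provider-agnostic interface; concrete adapters wrap vendor SDKs."""

    @abstractmethod
    def complete(self, prompt: str, model: str) -> str:
        """Return a single completion for the prompt."""

    @abstractmethod
    def stream(self, prompt: str, model: str) -> Iterator[str]:
        """Yield completion chunks as they arrive."""

    @abstractmethod
    def get_name(self) -> str:
        """Stable provider identifier, e.g. 'anthropic'."""

    @abstractmethod
    def get_cost_per_1k_tokens(self, model: str) -> float:
        """Cost estimate used for per-stage provider selection."""

    @abstractmethod
    def is_available(self) -> bool:
        """Health signal consumed by the failover handler."""


class AnthropicProvider(LLMProvider):
    """Adapter sketch; a real implementation would call the Anthropic SDK."""

    def complete(self, prompt, model):
        return f"[{model}] stub answer"

    def stream(self, prompt, model):
        yield self.complete(prompt, model)

    def get_name(self):
        return "anthropic"

    def get_cost_per_1k_tokens(self, model):
        return 0.25 if "haiku" in model else 3.0  # illustrative numbers

    def is_available(self):
        return True
```

The adapter pattern keeps vendor SDK details out of the pipeline code: AKEL stages only ever see `LLMProvider`.
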
104 **Configuration:**
105
106 * Runtime provider switching without code changes
107 * Admin API for provider management (`POST /admin/v1/llm/configure`)
108 * Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
109 * Support for rate limit handling and cost tracking
110
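To make per-stage configuration concrete, here is a minimal sketch; the `FH_LLM_*` environment variable names and default model identifiers are illustrative assumptions, and a real deployment would also consult the database-stored configuration:

```python
import os

# Illustrative defaults matching the per-stage policy above
# (Stage 1 extraction on a cheap model, analysis on a quality model).
DEFAULT_STAGE_MODELS = {
    "extract": ("anthropic", "claude-haiku"),
    "analyze": ("anthropic", "claude-sonnet"),
    "holistic": ("anthropic", "claude-sonnet"),
}


def stage_config(stage: str) -> tuple:
    """Resolve (provider, model) for a pipeline stage.

    Environment overrides allow runtime switching without code changes,
    e.g. FH_LLM_ANALYZE=openai:gpt-4o.
    """
    override = os.environ.get(f"FH_LLM_{stage.upper()}")
    if override:
        provider, _, model = override.partition(":")
        return provider, model
    return DEFAULT_STAGE_MODELS[stage]
```
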
111 **Failover Strategy:**
112
113 * Automatic fallback: Primary → Secondary → Tertiary
114 * Circuit breaker pattern for unavailable providers
115 * Health checking and provider availability monitoring
116 * Graceful degradation when all providers unavailable
117
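The fallback chain can be sketched as follows; providers are modelled as plain callables, and the cool-down/half-open part of a full circuit breaker is elided for brevity:

```python
class ProviderUnavailable(Exception):
    """Raised when every configured provider fails (graceful degradation point)."""


class FailoverChain:
    """Try providers in priority order; skip those whose circuit is open."""

    def __init__(self, providers, max_failures: int = 3):
        self.providers = providers                  # e.g. [anthropic, openai, google]
        self.max_failures = max_failures
        self.failures = {p: 0 for p in providers}   # consecutive failures per provider

    def complete(self, prompt: str) -> str:
        for provider in self.providers:
            if self.failures[provider] >= self.max_failures:
                continue  # circuit open: skip without calling
            try:
                result = provider(prompt)
                self.failures[provider] = 0  # success closes the circuit
                return result
            except Exception:
                self.failures[provider] += 1
        raise ProviderUnavailable("all providers failed or circuit-open")
```
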
118 **Cost Optimization:**
119
120 * Track and compare costs across providers per request
121 * Enable A/B testing of different models for quality/cost tradeoffs
122 * Per-stage provider selection for optimal cost-efficiency
123 * Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
124
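The per-article figures above can be combined with an expected cache hit rate to estimate effective cost. This sketch assumes a cache hit costs approximately nothing, which is a simplification (real hits still pay storage and lookup overhead):

```python
# Per-article LLM cost at 0% cache, from the comparison above (USD).
COST_PER_ARTICLE = {"anthropic": 0.114, "openai": 0.065, "google": 0.072}


def blended_cost(provider: str, cache_hit_rate: float) -> float:
    """Expected cost per article, assuming cache hits cost ~zero."""
    return COST_PER_ARTICLE[provider] * (1.0 - cache_hit_rate)
```
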
125 **Architecture Pattern:**
126
127 {{code}}
128 AKEL Stages           LLM Abstraction           Providers
129 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
130 Stage 1 Extract  ──→  Provider Interface   ──→  Anthropic (PRIMARY)
131 Stage 2 Analyze  ──→  Configuration        ──→  OpenAI (SECONDARY)
132 Stage 3 Holistic ──→  Failover Handler     ──→  Google (TERTIARY)
133                                             └→  Local Models (FUTURE)
134 {{/code}}
135
136 **Benefits:**
137
138 * **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
139 * **Resilience:** Automatic failover ensures service continuity during provider outages
140 * **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
141 * **Quality Assurance:** Cross-provider output verification for critical claims
142 * **Regulatory Compliance:** Use specific providers for data residency requirements
143 * **Future-Proofing:** Easy integration of new models as they become available
144
145 **Cross-References:**
146
147 * [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
148 * [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
149 * [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
150 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)
151
152 === 2.3 Design Philosophy ===
153
154 **Start Simple, Evolve Based on Metrics**

155 The architecture deliberately starts simple:
156
157 * Single primary database (PostgreSQL handles most workloads initially)
158 * Three clear layers (easy to understand and maintain)
159 * Automated operations (minimal human intervention)
160 * Measure before optimizing (add complexity only when proven necessary)

161 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
162
163 == 3. AKEL Architecture ==
164
165 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

166 See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
167
168 == 3.5 Claim Processing Architecture ==
169
170 FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
171
172 === Multi-Claim Handling ===
173
174 Users often submit:
175
176 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
177 * **Web pages**: URLs that are analyzed to extract all verifiable claims
178 * **Single claims**: Simple, direct factual statements
179
180 The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
181
182 === Processing Phases ===
183
184 **POC Implementation (Two-Phase):**
185
186 Phase 1 - Claim Extraction:
187
188 * LLM analyzes submitted content
189 * Extracts all distinct, verifiable claims
190 * Returns structured list of claims with context
191
192 Phase 2 - Parallel Analysis:
193
194 * Each claim processed independently by LLM
195 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
196 * Parallelized across all claims
197 * Results aggregated for presentation
198
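The two POC phases can be sketched with asyncio. The LLM calls are replaced by trivial stubs (splitting on sentences is not the real extraction logic), so only the fan-out structure is meaningful:

```python
import asyncio


async def extract_claims(content: str) -> list:
    """Phase 1 stub: a real implementation would prompt the LLM here."""
    return [s.strip() for s in content.split(".") if s.strip()]


async def analyze_claim(claim: str) -> dict:
    """Phase 2 stub: one LLM call per claim would return evidence,
    scenarios, sources, verdict, and risk in a single response."""
    return {"claim": claim, "verdict": "unverified", "confidence": 0.5}


async def process_submission(content: str) -> list:
    claims = await extract_claims(content)  # Phase 1: claim extraction
    # Phase 2: independent analyses fan out in parallel
    return list(await asyncio.gather(*(analyze_claim(c) for c in claims)))
```

Because the per-claim analyses are independent, total latency is dominated by the slowest single claim rather than the sum of all claims.
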
199 **Production Implementation (Three-Phase):**
200
201 Phase 1 - Extraction + Validation:
202
203 * Extract claims from content
204 * Validate clarity and uniqueness
205 * Filter vague or duplicate claims
206
207 Phase 2 - Evidence Gathering (Parallel):
208
209 * Independent evidence gathering per claim
210 * Source validation and scenario generation
211 * Quality gates prevent poor data from advancing
212
213 Phase 3 - Verdict Generation (Parallel):
214
215 * Generate verdict from validated evidence
216 * Confidence scoring and risk assessment
217 * Low-confidence cases routed to human review
218
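A minimal sketch of the Phase 3 routing decision (the 0.6 confidence floor is an illustrative value, not a specified threshold):

```python
CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a specified value


def route_verdict(verdict: dict, review_queue: list) -> dict:
    """Phase 3 gate: publish high-confidence verdicts, queue the rest.

    Mirrors the publish-by-default principle: nothing blocks on approval,
    but low-confidence results are flagged for human review.
    """
    if verdict["confidence"] < CONFIDENCE_FLOOR:
        verdict = {**verdict, "status": "needs_review"}
        review_queue.append(verdict)
    else:
        verdict = {**verdict, "status": "published"}
    return verdict
```
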
219 === Architectural Benefits ===
220
221 **Scalability:**
222
223 * Process 100 claims in roughly 3x the latency of a single claim
224 * Parallel processing across independent claims
225 * Linear cost scaling with claim count
226
229 **Quality:**
230
231 * Validation gates between phases
232 * Errors isolated to individual claims
233 * Clear observability per processing step
234
235 **Flexibility:**
236
237 * Each phase optimizable independently
238 * Can use different model sizes per phase
239 * Easy to add human review at decision points
240
241 == 4. Storage Architecture ==
242
243 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}

244 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
245
246 == 4.5 Versioning Architecture ==
247
248 {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
249
250 == 5. Automated Systems in Detail ==
251
252 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
253
254 === 5.1 AKEL (AI Knowledge Extraction Layer) ===
255
256 **What it does**: Primary AI processing engine that analyzes claims automatically

257 **Inputs**:
258
259 * User-submitted claim text
260 * Existing evidence and sources
261 * Source track record database

262 **Processing steps**:
263
264 1. **Parse & Extract**: Identify key components, entities, assertions
265 2. **Gather Evidence**: Search web and database for relevant sources
266 3. **Check Sources**: Evaluate source reliability using track records
267 4. **Extract Scenarios**: Identify different contexts from evidence
268 5. **Synthesize Verdict**: Compile evidence assessment per scenario
269 6. **Calculate Risk**: Assess potential harm and controversy

270 **Outputs**:
271
272 * Structured claim record
273 * Evidence links with relevance scores
274 * Scenarios with context descriptions
275 * Verdict summary per scenario
276 * Overall confidence score
277 * Risk assessment

278 **Timing**: 10-18 seconds total (parallel processing)
279
280 === 5.2 Background Jobs ===
281
282 **Source Track Record Updates** (Weekly):
283
284 * Analyze claim outcomes from past week
285 * Calculate source accuracy and reliability
286 * Update source_track_record table
287 * Never triggered by individual claims (prevents circular dependencies)

288 **Cache Management** (Continuous):

289 * Warm cache for popular claims
290 * Invalidate cache on claim updates
291 * Monitor cache hit rates

292 **Metrics Aggregation** (Hourly):

293 * Roll up detailed metrics
294 * Calculate system health indicators
295 * Generate performance reports

296 **Data Archival** (Daily):

297 * Move old AKEL logs to S3 (90+ days)
298 * Archive old edit history
299 * Compress and backup data
300
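The weekly source track record job can be sketched as a pure aggregation step; the outcome record shape shown here is an assumption, and the real job would read from and write back to the source_track_record table:

```python
from collections import defaultdict


def update_track_record(outcomes: list) -> dict:
    """Fold one week of claim outcomes into per-source accuracy.

    Each outcome is assumed to look like
    {"source": "example.org", "supported_correct_verdict": True}.
    Running this as a scheduled batch, never per claim, avoids the
    circular dependency noted above.
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [correct, total]
    for outcome in outcomes:
        tally = counts[outcome["source"]]
        tally[0] += int(outcome["supported_correct_verdict"])
        tally[1] += 1
    return {source: correct / total for source, (correct, total) in counts.items()}
```
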
301 === 5.3 Quality Monitoring ===
302
303 **Automated checks run continuously**:
304
305 * **Anomaly Detection**: Flag unusual patterns
306 ** Sudden confidence score changes
307 ** Unusual evidence distributions
308 ** Suspicious source patterns
309 * **Contradiction Detection**: Identify conflicts
310 ** Evidence that contradicts other evidence
311 ** Claims with internal contradictions
312 ** Source track record anomalies
313 * **Completeness Validation**: Ensure thoroughness
314 ** Sufficient evidence gathered
315 ** Multiple source types represented
316 ** Key scenarios identified
317
318 === 5.4 Moderation Detection ===
319
320 **Automated abuse detection**:
321
322 * **Spam Identification**: Pattern matching for spam claims
323 * **Manipulation Detection**: Identify coordinated editing
324 * **Gaming Detection**: Flag attempts to game source scores
325 * **Suspicious Activity**: Log unusual behavior patterns

326 **Human Review**: Moderators review flagged items; the system learns from their decisions
327
328 == 6. Scalability Strategy ==
329
330 === 6.1 Horizontal Scaling ===
331
332 Components scale independently:
333
334 * **AKEL Workers**: Add more processing workers as claim volume grows
335 * **Database Read Replicas**: Add replicas for read-heavy workloads
336 * **Cache Layer**: Redis cluster for distributed caching
337 * **API Servers**: Load-balanced API instances
338
339 === 6.2 Vertical Scaling ===
340
341 Individual components can be upgraded:
342
343 * **Database Server**: Increase CPU/RAM for PostgreSQL
344 * **Cache Memory**: Expand Redis memory
345 * **Worker Resources**: More powerful AKEL worker machines
346
347 === 6.3 Performance Optimization ===
348
349 Built-in optimizations:
350
351 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
352 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
353 * **Intelligent Caching**: Redis caches frequently accessed data
354 * **Background Processing**: Non-urgent tasks run asynchronously
355
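The caching pattern behind several of these optimizations is cache-aside. A minimal in-process sketch follows; production would back this with Redis, as described in the storage layer:

```python
import time


class TTLCache:
    """Cache-aside sketch: read-through on miss, expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}  # key -> (stored_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]              # cache hit
        value = loader(key)              # miss: load from the database
        self._store[key] = (now, value)
        return value

    def invalidate(self, key):
        """Called on claim updates so readers never see stale verdicts."""
        self._store.pop(key, None)
```
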
356 == 7. Monitoring & Observability ==
357
358 === 7.1 Key Metrics ===
359
360 System tracks:
361
362 * **Performance**: AKEL processing time, API response time, cache hit rate
363 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
364 * **Usage**: Claims per day, active users, API requests
365 * **Errors**: Failed AKEL runs, API errors, database issues
366
367 === 7.2 Alerts ===
368
369 Automated alerts for:
370
371 * Processing time >30 seconds (threshold breach)
372 * Error rate >1% (quality issue)
373 * Cache hit rate <80% (cache problem)
374 * Database connections >80% capacity (scaling needed)
375
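The four alert rules above reduce to simple threshold checks; the metric names here are illustrative, and a real deployment would express these as Prometheus alerting rules:

```python
# Alert thresholds from the list above: (metric, limit, breach direction).
ALERT_RULES = [
    ("akel_processing_seconds", 30.0, "above"),
    ("error_rate", 0.01, "above"),
    ("cache_hit_rate", 0.80, "below"),
    ("db_connection_usage", 0.80, "above"),
]


def breached(metrics: dict) -> list:
    """Return the names of all metrics currently in breach."""
    alerts = []
    for name, limit, direction in ALERT_RULES:
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported this interval
        if (direction == "above" and value > limit) or \
           (direction == "below" and value < limit):
            alerts.append(name)
    return alerts
```
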
376 === 7.3 Dashboards ===
377
378 Real-time monitoring:
379
380 * **System Health**: Overall status and key metrics
381 * **AKEL Performance**: Processing time breakdown
382 * **Quality Metrics**: Confidence scores, completeness
383 * **User Activity**: Usage patterns, peak times
384
385 == 8. Security Architecture ==
386
387 === 8.1 Authentication & Authorization ===
388
389 * **User Authentication**: Secure login with password hashing
390 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
391 * **API Keys**: For programmatic access
392 * **Rate Limiting**: Prevent abuse
393
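Rate limiting of this kind is commonly implemented as a token bucket. A minimal single-process sketch; production would keep the counters in Redis so the limit is shared across all API servers:

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity`, refills at `rate`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate               # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```
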
394 === 8.2 Data Security ===
395
396 * **Encryption**: TLS for transport, encrypted storage for sensitive data
397 * **Audit Logging**: Track all significant changes
398 * **Input Validation**: Sanitize all user inputs
399 * **SQL Injection Protection**: Parameterized queries
400
401 === 8.3 Abuse Prevention ===
402
403 * **Rate Limiting**: Prevent flooding and DDoS
404 * **Automated Detection**: Flag suspicious patterns
405 * **Human Review**: Moderators investigate flagged content
406 * **Ban Mechanisms**: Block abusive users/IPs
407
408 == 9. Deployment Architecture ==
409
410 === 9.1 Production Environment ===
411
412 **Components**:
413
414 * Load Balancer (HAProxy or cloud LB)
415 * Multiple API servers (stateless)
416 * AKEL worker pool (auto-scaling)
417 * PostgreSQL primary + read replicas
418 * Redis cluster
419 * S3-compatible storage

420 **Regions**: Single region for V1.0, multi-region when needed
421
422 === 9.2 Development & Staging ===
423
424 **Development**: Local Docker Compose setup

425 **Staging**: Scaled-down production replica

426 **CI/CD**: Automated testing and deployment
427
428 === 9.3 Disaster Recovery ===
429
430 * **Database Backups**: Daily automated backups to S3
431 * **Point-in-Time Recovery**: Transaction log archival
432 * **Replication**: Real-time replication to standby
433 * **Recovery Time Objective**: <4 hours
434
435 === 9.5 Federation Architecture Diagram ===
436
437 {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
438
439 == 10. Future Architecture Evolution ==
440
441 === 10.1 When to Add Complexity ===
442
443 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

444 * **Elasticsearch**: When PostgreSQL search consistently >500ms
445 * **TimescaleDB**: When metrics queries consistently >1s
446 * **Federation**: When 10,000+ users and explicit demand
447 * **Complex Reputation**: When 100+ active contributors
448
449 === 10.2 Federation (V2.0+) ===
450
451 **Deferred until**:
452
453 * Core product proven with 10,000+ users
454 * User demand for decentralization
455 * Single-node limits reached

456 See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
457
458 == 11. Technology Stack Summary ==
459
460 **Backend**:
461
462 * Python (FastAPI or Django)
463 * PostgreSQL (primary database)
464 * Redis (caching)

465 **Frontend**:

466 * Modern JavaScript framework (React, Vue, or Svelte)
467 * Server-side rendering for SEO

468 **AI/LLM**:

469 * Multi-provider orchestration (Claude, GPT-4, local models)
470 * Fallback and cross-checking support

471 **Infrastructure**:

472 * Docker containers
473 * Kubernetes or cloud platform auto-scaling
474 * S3-compatible object storage

475 **Monitoring**:

476 * Prometheus + Grafana
477 * Structured logging (ELK or cloud logging)
478 * Error tracking (Sentry)
479
480 == 12. Related Pages ==
481
482 * [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
483 * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
484 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
485 * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
486 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
487 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]