= Architecture =
FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
== 1. Core Principles ==
* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); claims are published with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove necessary
== 2. High-Level Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
=== 2.1 Three-Layer Architecture ===
FactHarbor uses a clean three-layer architecture:
==== Interface Layer ====
Handles all user and system interactions:
* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse
==== Processing Layer ====
Core business logic and AI processing:
* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
* Parse and extract claim components
* Gather evidence from multiple sources
* Check source track records
* Extract scenarios from evidence
* Synthesize verdicts
* Calculate risk scores

* **LLM Abstraction Layer**: Provider-agnostic AI access
* Multi-provider support (Anthropic, OpenAI, Google, local models)
* Automatic failover and rate limit handling
* Per-stage model configuration
* Cost optimization through provider selection
* No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
* Source track record updates (weekly)
* Cache warming and invalidation
* Metrics aggregation
* Data archival
* **Quality Monitoring**: Automated quality checks
* Anomaly detection
* Contradiction detection
* Completeness validation
* **Moderation Detection**: Automated abuse detection
* Spam identification
* Manipulation detection
* Flag suspicious activity
==== Data & Storage Layer ====
Persistent data storage and caching:
* **PostgreSQL**: Primary database for all core data
* Claims, evidence, sources, users
* Scenarios, edits, audit logs
* Built-in full-text search
* Time-series capabilities for metrics
* **Redis**: High-speed caching layer
* Session data
* Frequently accessed claims
* API rate limiting
* **S3 Storage**: Long-term archival
* Old edit history (90+ days)
* AKEL processing logs
* Backup snapshots
**Optional future additions** (add only when metrics prove necessary):
* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 LLM Abstraction Layer ===

{{include reference="FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}
| 72 | |||
| 73 | **Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection. | ||
| 74 | |||
| 75 | **Multi-Provider Support:** | ||
| 76 | * **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis) | ||
| 77 | * **Secondary:** OpenAI GPT API (automatic failover) | ||
| 78 | * **Tertiary:** Google Vertex AI / Gemini | ||
| 79 | * **Future:** Local models (Llama, Mistral) for on-premises deployments | ||
| 80 | |||
| 81 | **Provider Interface:** | ||
| 82 | * Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods | ||
| 83 | * Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet) | ||
| 84 | * Environment variable and database configuration | ||
| 85 | * Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider) | ||
| 86 | |||
**Configuration:**
* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
* Support for rate limit handling and cost tracking

**Failover Strategy:**
* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checking and provider availability monitoring
* Graceful degradation when all providers are unavailable

**Cost Optimization:**
* Track and compare costs across providers per request
* Enable A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost-efficiency
* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at a 0% cache hit rate

**Architecture Pattern:**

{{code}}
AKEL Stages          LLM Abstraction          Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→ Provider Interface ──→ Anthropic (PRIMARY)
Stage 2 Analyze  ──→ Configuration      ──→ OpenAI (SECONDARY)
Stage 3 Holistic ──→ Failover Handler   ──→ Google (TERTIARY)
                                         └→ Local Models (FUTURE)
{{/code}}

**Benefits:**
* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers for data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===
**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:
* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)
See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
| 142 | |||
| 143 | == 3.5 Claim Processing Architecture == | ||
| 144 | |||
| 145 | FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently. | ||
| 146 | |||
| 147 | === Multi-Claim Handling === | ||
| 148 | |||
| 149 | Users often submit: | ||
| 150 | * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims | ||
| 151 | * **Web pages**: URLs that are analyzed to extract all verifiable claims | ||
| 152 | * **Single claims**: Simple, direct factual statements | ||
| 153 | |||
| 154 | The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content. | ||
| 155 | |||
| 156 | === Processing Phases === | ||
| 157 | |||
| 158 | **POC Implementation (Two-Phase):** | ||
| 159 | |||
| 160 | Phase 1 - Claim Extraction: | ||
| 161 | * LLM analyzes submitted content | ||
| 162 | * Extracts all distinct, verifiable claims | ||
| 163 | * Returns structured list of claims with context | ||
| 164 | |||
| 165 | Phase 2 - Parallel Analysis: | ||
| 166 | * Each claim processed independently by LLM | ||
| 167 | * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk | ||
| 168 | * Parallelized across all claims | ||
| 169 | * Results aggregated for presentation | ||
| 170 | |||
**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:
* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):
* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):
* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**
* Process 100 claims with ~3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count
**Quality:**
* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**
* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
== 4.5 Versioning Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
== 5. Automated Systems in Detail ==
FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
=== 5.1 AKEL (AI Knowledge Extraction Layer) ===
**What it does**: Primary AI processing engine that analyzes claims automatically
**Inputs**:
* User-submitted claim text
* Existing evidence and sources
* Source track record database
**Processing steps**:
1. **Parse & Extract**: Identify key components, entities, assertions
1. **Gather Evidence**: Search web and database for relevant sources
1. **Check Sources**: Evaluate source reliability using track records
1. **Extract Scenarios**: Identify different contexts from evidence
1. **Synthesize Verdict**: Compile evidence assessment per scenario
1. **Calculate Risk**: Assess potential harm and controversy
**Outputs**:
* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment
**Timing**: 10-18 seconds total (parallel processing)
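
For illustration, the outputs above could travel through the pipeline as a single record like the following; the field names are hypothetical, not the actual schema (see the Data Model page for that).

{{code language="python"}}
from dataclasses import dataclass, field

@dataclass
class AkelResult:
    claim_id: str
    evidence: list[dict] = field(default_factory=list)       # links with relevance scores
    scenarios: list[dict] = field(default_factory=list)      # contexts extracted from evidence
    verdict_by_scenario: dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0                                   # overall confidence score
    risk: str = "unassessed"                                  # risk assessment
{{/code}}
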
=== 5.2 Background Jobs ===
**Source Track Record Updates** (Weekly):
* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
* Never triggered by individual claims (prevents circular dependencies)
**Cache Management** (Continuous):
* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates
**Metrics Aggregation** (Hourly):
* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports
**Data Archival** (Daily):
* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data
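
If a scheduler such as Celery beat were used (this page does not prescribe one), the cadences above could be declared roughly as follows; task names and run times are assumptions.

{{code language="python"}}
from celery import Celery
from celery.schedules import crontab

app = Celery("factharbor")

app.conf.beat_schedule = {
    "source-track-record-update": {      # weekly; never triggered by individual claims
        "task": "jobs.update_source_track_records",
        "schedule": crontab(day_of_week="sunday", hour=3, minute=0),
    },
    "metrics-aggregation": {             # hourly roll-up
        "task": "jobs.aggregate_metrics",
        "schedule": crontab(minute=0),
    },
    "data-archival": {                   # daily; moves 90+ day data to S3
        "task": "jobs.archive_old_data",
        "schedule": crontab(hour=2, minute=30),
    },
}
{{/code}}
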
=== 5.3 Quality Monitoring ===
**Automated checks run continuously**:
* **Anomaly Detection**: Flag unusual patterns
* Sudden confidence score changes (see the example below)
* Unusual evidence distributions
* Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
* Evidence that contradicts other evidence
* Claims with internal contradictions
* Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
* Sufficient evidence gathered
* Multiple source types represented
* Key scenarios identified
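
A toy version of the "sudden confidence score changes" check referenced above; the 0.3 jump threshold is an arbitrary placeholder.

{{code language="python"}}
def confidence_jump_anomaly(previous: float, current: float, max_jump: float = 0.3) -> bool:
    """Flag a claim whose overall confidence moved more than max_jump in a single update."""
    return abs(current - previous) > max_jump
{{/code}}
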
=== 5.4 Moderation Detection ===
**Automated abuse detection**:
* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns
**Human Review**: Moderators review flagged items; the system learns from their decisions
== 6. Scalability Strategy ==
=== 6.1 Horizontal Scaling ===
Components scale independently:
* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances
=== 6.2 Vertical Scaling ===
Individual components can be upgraded:
* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines
=== 6.3 Performance Optimization ===
Built-in optimizations:
* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data (see the sketch below)
* **Background Processing**: Non-urgent tasks run asynchronously
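
The intelligent-caching item above follows a standard cache-aside pattern. A sketch using redis-py, where the key format and one-hour TTL are illustrative choices:

{{code language="python"}}
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
CLAIM_TTL_SECONDS = 3600  # illustrative TTL

def get_claim(claim_id: str, load_from_db) -> dict:
    """Serve a claim from Redis when cached, otherwise load it from PostgreSQL and cache it."""
    key = f"claim:{claim_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    claim = load_from_db(claim_id)
    r.setex(key, CLAIM_TTL_SECONDS, json.dumps(claim))
    return claim
{{/code}}

Invalidation on claim updates (handled by the cache-management background job) would simply delete the `claim:{claim_id}` key.
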
== 7. Monitoring & Observability ==
=== 7.1 Key Metrics ===
The system tracks:
* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues
=== 7.2 Alerts ===
Automated alerts for:
* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% capacity (scaling needed)
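
Expressed as code, those thresholds amount to a small rule table; the metric names here are illustrative and would map onto whatever the monitoring stack exports.

{{code language="python"}}
ALERT_RULES = {
    "akel_processing_seconds": lambda v: v > 30,        # processing time threshold breach
    "error_rate": lambda v: v > 0.01,                   # error rate above 1%
    "cache_hit_rate": lambda v: v < 0.80,               # cache hit rate below 80%
    "db_connection_utilization": lambda v: v > 0.80,    # >80% of connection capacity
}

def breached(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics currently violating an alert rule."""
    return [name for name, rule in ALERT_RULES.items()
            if name in metrics and rule(metrics[name])]
{{/code}}
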
=== 7.3 Dashboards ===
Real-time monitoring:
* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times
== 8. Security Architecture ==
=== 8.1 Authentication & Authorization ===
* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse
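
Rate limiting can be backed by the existing Redis layer. A minimal fixed-window sketch using INCR and EXPIRE, where the 100 requests/minute quota is an illustrative value, not a specified one:

{{code language="python"}}
import redis

r = redis.Redis()

def allow_request(api_key: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Reject requests once a key exceeds `limit` calls in the current window."""
    key = f"ratelimit:{api_key}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit
{{/code}}
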
=== 8.2 Data Security ===
* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
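
For the last point, a parameterized query with psycopg2 keeps user input out of the SQL string entirely; the table and column names are illustrative.

{{code language="python"}}
import psycopg2

search_term = "measles vaccine"  # e.g. a user-supplied search string from the API layer

conn = psycopg2.connect("dbname=factharbor")
with conn.cursor() as cur:
    cur.execute(
        "SELECT id, text, confidence FROM claims WHERE text ILIKE %s LIMIT 20",
        (f"%{search_term}%",),  # bound parameter: the driver escapes it, so it cannot inject SQL
    )
    rows = cur.fetchall()
{{/code}}
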
=== 8.3 Abuse Prevention ===
* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs
== 9. Deployment Architecture ==
=== 9.1 Production Environment ===
**Components**:
* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage
**Regions**: Single region for V1.0, multi-region when needed
=== 9.2 Development & Staging ===
**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment
=== 9.3 Disaster Recovery ===
* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==
=== 10.1 When to Add Complexity ===
See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
**Elasticsearch**: When PostgreSQL search consistently >500ms
**TimescaleDB**: When metrics queries consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===
**Deferred until**:
* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached
See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
== 11. Technology Stack Summary ==
**Backend**:
* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)
**Frontend**:
* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO
**AI/LLM**:
* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support
**Infrastructure**:
* Docker containers
* Kubernetes or cloud platform auto-scaling
* S3-compatible object storage
**Monitoring**:
* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)
== 12. Related Pages ==
* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]