1 = Architecture =
2 FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
3 == 1. Core Principles ==
4 * **AI-First**: AKEL (AI) is the primary system; humans supplement it
5 * **Publish by Default**: No centralized approval (removed in V0.9.50); content is published with confidence scores
6 * **System Over Data**: Fix algorithms, not individual outputs
7 * **Measure Everything**: Quality metrics drive improvements
8 * **Scale Through Automation**: Minimal human intervention
9 * **Start Simple**: Add complexity only when metrics prove necessary
10 == 2. High-Level Architecture ==
11 {{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}
12 === 2.1 Three-Layer Architecture ===
13 FactHarbor uses a clean three-layer architecture:
14 ==== Interface Layer ====
15 Handles all user and system interactions:
16 * **Web UI**: Browse claims, view evidence, submit feedback
17 * **REST API**: Programmatic access for integrations
18 * **Authentication & Authorization**: User identity and permissions
19 * **Rate Limiting**: Protect against abuse
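A minimal sketch of the claim-submission entry point implied by the Web UI and REST API bullets above, assuming the FastAPI option from the technology stack (Section 11); the endpoint path, request model, and rate-limit dependency are illustrative assumptions, not part of the specification.
{{code language="python"}}
# Illustrative only: path, models, and the rate-limit dependency are assumptions.
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI(title="FactHarbor API")

class ClaimSubmission(BaseModel):
    text: str                       # claim text, article excerpt, or URL
    submitted_by: str | None = None

def rate_limit() -> None:
    """Placeholder for the rate-limiting dependency (e.g. a Redis counter)."""
    return None

async def enqueue_for_akel(text: str) -> str:
    """Hypothetical helper: queue the text for the AKEL pipeline, return an id."""
    return "claim-123"

@app.post("/claims", status_code=202)
async def submit_claim(submission: ClaimSubmission, _: None = Depends(rate_limit)):
    # Hand the submission to the Processing Layer (AKEL pipeline) and return
    # an identifier the client can poll for the finished analysis.
    claim_id = await enqueue_for_akel(submission.text)
    return {"claim_id": claim_id, "status": "processing"}
{{/code}}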
20 ==== Processing Layer ====
21 Core business logic and AI processing:
22 * **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
23 ** Parse and extract claim components
24 ** Gather evidence from multiple sources
25 ** Check source track records
26 ** Extract scenarios from evidence
27 ** Synthesize verdicts
28 ** Calculate risk scores
29 * **Background Jobs**: Automated maintenance tasks
30 ** Source track record updates (weekly)
31 ** Cache warming and invalidation
32 ** Metrics aggregation
33 ** Data archival
34 * **Quality Monitoring**: Automated quality checks
35 ** Anomaly detection
36 ** Contradiction detection
37 ** Completeness validation
38 * **Moderation Detection**: Automated abuse detection
39 ** Spam identification
40 ** Manipulation detection
41 ** Flag suspicious activity
42 ==== Data & Storage Layer ====
43 Persistent data storage and caching:
44 * **PostgreSQL**: Primary database for all core data
45 ** Claims, evidence, sources, users
46 ** Scenarios, edits, audit logs
47 ** Built-in full-text search
48 ** Time-series capabilities for metrics
49 * **Redis**: High-speed caching layer
50 ** Session data
51 ** Frequently accessed claims
52 ** API rate limiting
53 * **S3 Storage**: Long-term archival
54 ** Old edit history (90+ days)
55 ** AKEL processing logs
56 ** Backup snapshots
57 **Optional future additions** (add only when metrics prove necessary):
58 * **Elasticsearch**: If PostgreSQL full-text search becomes slow
59 * **TimescaleDB**: If metrics queries become a bottleneck
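To illustrate the built-in full-text search mentioned for PostgreSQL above, a hedged sketch of a claim search query using psycopg2; the table and column names are assumptions, and a stored tsvector column with a GIN index would normally back this in production.
{{code language="python"}}
# Illustrative only: table/column names and the DSN format are assumptions.
import psycopg2

def search_claims(dsn: str, query: str, limit: int = 20):
    """Full-text search over claim text using PostgreSQL's built-in tsvector support."""
    sql = """
        SELECT id, claim_text,
               ts_rank(to_tsvector('english', claim_text),
                       plainto_tsquery('english', %s)) AS rank
        FROM claims
        WHERE to_tsvector('english', claim_text) @@ plainto_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT %s
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, (query, query, limit))
        return cur.fetchall()
{{/code}}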
60 === 2.2 Design Philosophy ===
61 **Start Simple, Evolve Based on Metrics**
62 The architecture deliberately starts simple:
63 * Single primary database (PostgreSQL handles most workloads initially)
64 * Three clear layers (easy to understand and maintain)
65 * Automated operations (minimal human intervention)
66 * Measure before optimizing (add complexity only when proven necessary)
67 See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
68 == 3. AKEL Architecture ==
69 {{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}
70 See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
71
72 == 3.5 Claim Processing Architecture ==
73
74 FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
75
76 === Multi-Claim Handling ===
77
78 Users often submit:
79 * **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
80 * **Web pages**: URLs that are analyzed to extract all verifiable claims
81 * **Single claims**: Simple, direct factual statements
82
83 The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
84
85 === Processing Phases ===
86
87 **POC Implementation (Two-Phase):**
88
89 Phase 1 - Claim Extraction:
90 * LLM analyzes submitted content
91 * Extracts all distinct, verifiable claims
92 * Returns structured list of claims with context
93
94 Phase 2 - Parallel Analysis:
95 * Each claim processed independently by LLM
96 * Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
97 * Parallelized across all claims
98 * Results aggregated for presentation
99
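A minimal sketch of this two-phase flow, assuming an async LLM client; the function names, prompts, and return shapes are illustrative, not specified.
{{code language="python"}}
# Illustrative only: extract_claims() and analyze_claim() stand in for the two
# LLM calls described above.
import asyncio

async def extract_claims(content: str) -> list[str]:
    """Phase 1: a single LLM call that returns the distinct, verifiable claims."""
    return [content]          # placeholder: the real call returns every claim found

async def analyze_claim(claim: str) -> dict:
    """Phase 2: one LLM call per claim returning evidence, scenarios, sources,
    verdict, and risk."""
    return {"claim": claim}   # placeholder for the structured analysis record

async def process_submission(content: str) -> list[dict]:
    claims = await extract_claims(content)
    # Fan out: each claim is analyzed independently and in parallel, so total
    # latency grows far more slowly than the number of claims.
    results = await asyncio.gather(*(analyze_claim(c) for c in claims))
    return list(results)
{{/code}}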
100 **Production Implementation (Three-Phase):**
101
102 Phase 1 - Extraction + Validation:
103 * Extract claims from content
104 * Validate clarity and uniqueness
105 * Filter vague or duplicate claims
106
107 Phase 2 - Evidence Gathering (Parallel):
108 * Independent evidence gathering per claim
109 * Source validation and scenario generation
110 * Quality gates prevent poor data from advancing
111
112 Phase 3 - Verdict Generation (Parallel):
113 * Generate verdict from validated evidence
114 * Confidence scoring and risk assessment
115 * Low-confidence cases routed to human review
116
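A hedged sketch of the quality gates and low-confidence routing between the production phases described above; the threshold value, minimum counts, and field names are assumptions.
{{code language="python"}}
# Illustrative only: the 0.7 threshold, minimum counts, and field names are assumptions.
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.7  # verdicts below this confidence are routed to human review

@dataclass
class EvidenceBundle:
    claim: str
    evidence: list
    sources: list

def passes_quality_gate(bundle: EvidenceBundle) -> bool:
    """Gate between Phase 2 and Phase 3: keep claims with too little evidence or
    no validated sources from advancing to verdict generation."""
    return len(bundle.evidence) >= 2 and len(bundle.sources) >= 1

def route_verdict(verdict: dict) -> str:
    """After Phase 3: publish confident verdicts, queue the rest for human review."""
    return "publish" if verdict.get("confidence", 0.0) >= REVIEW_THRESHOLD else "human_review"
{{/code}}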
117 === Architectural Benefits ===
118
119 **Scalability:**
120 * Process 100 claims with roughly 3x the latency of a single claim
121 * Parallel processing across independent claims
122 * Linear cost scaling with claim count
123
124 **Quality:**
125 * Validation gates between phases
126 * Errors isolated to individual claims
127 * Clear observability per processing step
128
129 **Flexibility:**
130 * Each phase optimizable independently
131 * Can use different model sizes per phase
132 * Easy to add human review at decision points
133
134
135 == 4. Storage Architecture ==
136 {{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
137 See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
138 == 4.5 Versioning Architecture ==
139 {{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}
140 == 5. Automated Systems in Detail ==
141 FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
142 === 5.1 AKEL (AI Knowledge Extraction Layer) ===
143 **What it does**: Primary AI processing engine that analyzes claims automatically
144 **Inputs**:
145 * User-submitted claim text
146 * Existing evidence and sources
147 * Source track record database
148 **Processing steps**:
149 1. **Parse & Extract**: Identify key components, entities, assertions
150 2. **Gather Evidence**: Search web and database for relevant sources
151 3. **Check Sources**: Evaluate source reliability using track records
152 4. **Extract Scenarios**: Identify different contexts from evidence
153 5. **Synthesize Verdict**: Compile evidence assessment per scenario
154 6. **Calculate Risk**: Assess potential harm and controversy
155 **Outputs**:
156 * Structured claim record
157 * Evidence links with relevance scores
158 * Scenarios with context descriptions
159 * Verdict summary per scenario
160 * Overall confidence score
161 * Risk assessment
162 **Timing**: 10-18 seconds total (parallel processing)
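A sketch of the structured output record, with field names chosen to mirror the outputs listed above; this is illustrative, not a normative schema.
{{code language="python"}}
# Illustrative only: field names mirror the outputs above but are not normative.
from dataclasses import dataclass, field

@dataclass
class EvidenceLink:
    url: str
    relevance: float                  # relevance score assigned by AKEL

@dataclass
class Scenario:
    context: str                      # context this verdict applies to
    verdict_summary: str

@dataclass
class AkelResult:
    claim_id: str
    evidence: list[EvidenceLink] = field(default_factory=list)
    scenarios: list[Scenario] = field(default_factory=list)
    confidence: float = 0.0           # overall confidence score
    risk: float = 0.0                 # harm/controversy assessment
    processing_seconds: float = 0.0   # expected 10-18 s with parallel steps
{{/code}}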
163 === 5.2 Background Jobs ===
164 **Source Track Record Updates** (Weekly):
165 * Analyze claim outcomes from past week
166 * Calculate source accuracy and reliability
167 * Update source_track_record table
168 * Never triggered by individual claims (prevents circular dependencies)
169 **Cache Management** (Continuous):
170 * Warm cache for popular claims
171 * Invalidate cache on claim updates
172 * Monitor cache hit rates
173 **Metrics Aggregation** (Hourly):
174 * Roll up detailed metrics
175 * Calculate system health indicators
176 * Generate performance reports
177 **Data Archival** (Daily):
178 * Move old AKEL logs to S3 (90+ days)
179 * Archive old edit history
180 * Compress and backup data
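A hedged sketch of how these schedules could be declared, assuming Celery beat as the job runner (the specification does not mandate one); the task names, times, and broker URL are hypothetical.
{{code language="python"}}
# Illustrative only: task names, times, and the broker URL are assumptions.
from celery import Celery
from celery.schedules import crontab

app = Celery("factharbor", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "update-source-track-records": {           # weekly, never triggered per claim
        "task": "jobs.update_source_track_records",
        "schedule": crontab(minute=0, hour=3, day_of_week="sunday"),
    },
    "aggregate-metrics": {                     # hourly roll-up
        "task": "jobs.aggregate_metrics",
        "schedule": crontab(minute=0),
    },
    "archive-old-data": {                      # daily: 90+ day logs and edits to S3
        "task": "jobs.archive_old_data",
        "schedule": crontab(minute=30, hour=2),
    },
}
{{/code}}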
181 === 5.3 Quality Monitoring ===
182 **Automated checks run continuously**:
183 * **Anomaly Detection**: Flag unusual patterns
184 ** Sudden confidence score changes
185 ** Unusual evidence distributions
186 ** Suspicious source patterns
187 * **Contradiction Detection**: Identify conflicts
188 ** Evidence that contradicts other evidence
189 ** Claims with internal contradictions
190 ** Source track record anomalies
191 * **Completeness Validation**: Ensure thoroughness
192 ** Sufficient evidence gathered
193 ** Multiple source types represented
194 ** Key scenarios identified
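A minimal sketch of a completeness check along these lines; the specific minimum counts are assumptions.
{{code language="python"}}
# Illustrative only: the minimum counts are assumptions, not specified values.
def completeness_issues(claim: dict) -> list[str]:
    """Return a list of completeness problems; an empty list means the claim
    record looks thorough enough to publish."""
    issues = []
    if len(claim.get("evidence", [])) < 3:
        issues.append("insufficient evidence gathered")
    source_types = {s.get("type") for s in claim.get("sources", [])}
    if len(source_types) < 2:
        issues.append("too few source types represented")
    if not claim.get("scenarios"):
        issues.append("no scenarios identified")
    return issues
{{/code}}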
195 === 5.4 Moderation Detection ===
196 **Automated abuse detection**:
197 * **Spam Identification**: Pattern matching for spam claims
198 * **Manipulation Detection**: Identify coordinated editing
199 * **Gaming Detection**: Flag attempts to game source scores
200 * **Suspicious Activity**: Log unusual behavior patterns
201 **Human Review**: Moderators review flagged items, and the system learns from their decisions
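A hedged sketch of what the automated flagging could look like; the spam patterns and rate threshold are placeholder heuristics, not specified behaviour.
{{code language="python"}}
# Illustrative only: patterns and the rate threshold are placeholder heuristics.
import re
from collections import Counter

SPAM_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"buy now", r"limited offer")]

def flag_submissions(submissions: list[dict]) -> list[dict]:
    """Flag spam-looking text and unusually high per-user submission rates;
    flagged items land in the moderator review queue."""
    flagged = []
    per_user = Counter(s["user_id"] for s in submissions)
    for s in submissions:
        if any(p.search(s["text"]) for p in SPAM_PATTERNS):
            flagged.append({**s, "reason": "spam pattern"})
        elif per_user[s["user_id"]] > 50:      # suspicious volume in one window
            flagged.append({**s, "reason": "unusual submission rate"})
    return flagged
{{/code}}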
202 == 6. Scalability Strategy ==
203 === 6.1 Horizontal Scaling ===
204 Components scale independently:
205 * **AKEL Workers**: Add more processing workers as claim volume grows
206 * **Database Read Replicas**: Add replicas for read-heavy workloads
207 * **Cache Layer**: Redis cluster for distributed caching
208 * **API Servers**: Load-balanced API instances
209 === 6.2 Vertical Scaling ===
210 Individual components can be upgraded:
211 * **Database Server**: Increase CPU/RAM for PostgreSQL
212 * **Cache Memory**: Expand Redis memory
213 * **Worker Resources**: More powerful AKEL worker machines
214 === 6.3 Performance Optimization ===
215 Built-in optimizations:
216 * **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
217 * **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
218 * **Intelligent Caching**: Redis caches frequently accessed data
219 * **Background Processing**: Non-urgent tasks run asynchronously
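A sketch of the cache-aside pattern behind the intelligent caching bullet above, using redis-py; the key format, TTL, and database loader are assumptions.
{{code language="python"}}
# Illustrative only: key format, TTL, and load_claim_from_db are assumptions.
import json
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def load_claim_from_db(claim_id: str) -> dict:
    """Hypothetical loader reading the denormalized claim record from PostgreSQL."""
    return {"id": claim_id}

def get_claim(claim_id: str) -> dict:
    key = f"claim:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit
    claim = load_claim_from_db(claim_id)           # cache miss: read from PostgreSQL
    cache.setex(key, 3600, json.dumps(claim))      # keep hot claims for an hour
    return claim
{{/code}}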
220 == 7. Monitoring & Observability ==
221 === 7.1 Key Metrics ===
222 System tracks:
223 * **Performance**: AKEL processing time, API response time, cache hit rate
224 * **Quality**: Confidence score distribution, evidence completeness, contradiction rate
225 * **Usage**: Claims per day, active users, API requests
226 * **Errors**: Failed AKEL runs, API errors, database issues
227 === 7.2 Alerts ===
228 Automated alerts for:
229 * Processing time >30 seconds (threshold breach)
230 * Error rate >1% (quality issue)
231 * Cache hit rate <80% (cache problem)
232 * Database connections >80% capacity (scaling needed)
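A minimal sketch of these threshold checks; the thresholds are the ones listed above, while the metric field names are assumptions.
{{code language="python"}}
# Thresholds come from the list above; metric field names are assumptions.
def check_alerts(metrics: dict) -> list[str]:
    alerts = []
    if metrics.get("akel_processing_seconds", 0) > 30:
        alerts.append("AKEL processing time above 30 seconds")
    if metrics.get("error_rate", 0.0) > 0.01:
        alerts.append("Error rate above 1%")
    if metrics.get("cache_hit_rate", 1.0) < 0.80:
        alerts.append("Cache hit rate below 80%")
    if metrics.get("db_connection_utilization", 0.0) > 0.80:
        alerts.append("Database connections above 80% of capacity")
    return alerts
{{/code}}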
233 === 7.3 Dashboards ===
234 Real-time monitoring:
235 * **System Health**: Overall status and key metrics
236 * **AKEL Performance**: Processing time breakdown
237 * **Quality Metrics**: Confidence scores, completeness
238 * **User Activity**: Usage patterns, peak times
239 == 8. Security Architecture ==
240 === 8.1 Authentication & Authorization ===
241 * **User Authentication**: Secure login with password hashing
242 * **Role-Based Access**: Reader, Contributor, Moderator, Admin
243 * **API Keys**: For programmatic access
244 * **Rate Limiting**: Prevent abuse
245 === 8.2 Data Security ===
246 * **Encryption**: TLS for transport, encrypted storage for sensitive data
247 * **Audit Logging**: Track all significant changes
248 * **Input Validation**: Sanitize all user inputs
249 * **SQL Injection Protection**: Parameterized queries
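To illustrate the SQL-injection protection above, a sketch of a parameterized query with psycopg2; the table and column names are assumptions. User input is always passed as a bound parameter, never concatenated into the SQL string.
{{code language="python"}}
# Parameterized query: the driver binds and escapes the value safely.
import psycopg2

def find_claims_by_submitter(dsn: str, user_id: str):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, claim_text FROM claims WHERE submitted_by = %s",
            (user_id,),
        )
        return cur.fetchall()
{{/code}}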
250 === 8.3 Abuse Prevention ===
251 * **Rate Limiting**: Prevent flooding and DDoS
252 * **Automated Detection**: Flag suspicious patterns
253 * **Human Review**: Moderators investigate flagged content
254 * **Ban Mechanisms**: Block abusive users/IPs
255 == 9. Deployment Architecture ==
256 === 9.1 Production Environment ===
257 **Components**:
258 * Load Balancer (HAProxy or cloud LB)
259 * Multiple API servers (stateless)
260 * AKEL worker pool (auto-scaling)
261 * PostgreSQL primary + read replicas
262 * Redis cluster
263 * S3-compatible storage
264 **Regions**: Single region for V1.0, multi-region when needed
265 === 9.2 Development & Staging ===
266 **Development**: Local Docker Compose setup
267 **Staging**: Scaled-down production replica
268 **CI/CD**: Automated testing and deployment
269 === 9.3 Disaster Recovery ===
270 * **Database Backups**: Daily automated backups to S3
271 * **Point-in-Time Recovery**: Transaction log archival
272 * **Replication**: Real-time replication to standby
273 * **Recovery Time Objective**: <4 hours
274
275 === 9.5 Federation Architecture Diagram ===
276
277 {{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}
278
279 == 10. Future Architecture Evolution ==
280 === 10.1 When to Add Complexity ===
281 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
282 **Elasticsearch**: When PostgreSQL search consistently >500ms
283 **TimescaleDB**: When metrics queries consistently >1s
284 **Federation**: When 10,000+ users and explicit demand
285 **Complex Reputation**: When 100+ active contributors
286 === 10.2 Federation (V2.0+) ===
287 **Deferred until**:
288 * Core product proven with 10,000+ users
289 * User demand for decentralization
290 * Single-node limits reached
291 See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.
292 == 11. Technology Stack Summary ==
293 **Backend**:
294 * Python (FastAPI or Django)
295 * PostgreSQL (primary database)
296 * Redis (caching)
297 **Frontend**:
298 * Modern JavaScript framework (React, Vue, or Svelte)
299 * Server-side rendering for SEO
300 **AI/LLM**:
301 * Multi-provider orchestration (Claude, GPT-4, local models)
302 * Fallback and cross-checking support
303 **Infrastructure**:
304 * Docker containers
305 * Kubernetes or cloud platform auto-scaling
306 * S3-compatible object storage
307 **Monitoring**:
308 * Prometheus + Grafana
309 * Structured logging (ELK or cloud logging)
310 * Error tracking (Sentry)
311 == 12. Related Pages ==
312 * [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
313 * [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
314 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
315 * [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
316 * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
317 * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]