Architecture
FactHarbor's architecture is designed for simplicity, automation, and continuous improvement.
1. Core Principles
- AI-First: AKEL, the AI pipeline, is the primary system; humans supplement it
- Publish by Default: No centralized approval (removed in V0.9.50); claims publish immediately with confidence scores
- System Over Data: Fix algorithms, not individual outputs
- Measure Everything: Quality metrics drive improvements
- Scale Through Automation: Minimal human intervention
- Start Simple: Add complexity only when metrics prove necessary
2. High-Level Architecture
```mermaid
graph TB
    subgraph Interface_Layer["🖥️ Interface Layer"]
        UI[Web UI<br/>Browse & Submit]
        API[REST API<br/>Programmatic Access]
        AUTH[Authentication<br/>& Authorization]
    end
    subgraph Processing_Layer["⚙️ Processing Layer"]
        AKEL[AKEL Pipeline<br/>Parallel Processing<br/>10-18 seconds]
        LLM[LLM Abstraction Layer<br/>Multi-Provider Support<br/>Anthropic OpenAI Google]
        BG[Background Jobs<br/>Source Scoring,<br/>Cache, Archival]
        QM[Quality Monitoring<br/>Automated Checks]
    end
    subgraph Data_Layer["💾 Data & Storage Layer"]
        PG[(PostgreSQL<br/>Primary Database<br/>All Core Data)]
        REDIS[(Redis<br/>Cache & LLM Config)]
        S3[(S3<br/>Archives)]
    end
    UI --> AUTH
    API --> AUTH
    AUTH --> AKEL
    AUTH --> QM
    AKEL --> LLM
    LLM --> PG
    LLM --> REDIS
    AKEL --> PG
    AKEL --> REDIS
    BG --> PG
    BG --> S3
    QM --> PG
    REDIS --> PG
    style Interface_Layer fill:#e1f5ff
    style Processing_Layer fill:#fff4e1
    style Data_Layer fill:#f0f0f0
    style AKEL fill:#ffcccc
    style LLM fill:#ccffcc
    style PG fill:#9999ff
```
Three-Layer Architecture - Clean separation with LLM abstraction: Interface Layer (user interactions), Processing Layer (AKEL + LLM Abstraction + background jobs), Data Layer (PostgreSQL primary + Redis cache/config + S3 archives). LLM Abstraction Layer provides provider-agnostic access to Anthropic, OpenAI, Google, and local models with automatic failover.
2.1 Three-Layer Architecture
FactHarbor uses a clean three-layer architecture:
Interface Layer
Handles all user and system interactions:
- Web UI: Browse claims, view evidence, submit feedback
- REST API: Programmatic access for integrations
- Authentication & Authorization: User identity and permissions
- Rate Limiting: Protect against abuse
Processing Layer
Core business logic and AI processing:
- AKEL Pipeline: AI-driven claim analysis (parallel processing)
  - Parse and extract claim components
  - Gather evidence from multiple sources
  - Check source track records
  - Extract scenarios from evidence
  - Synthesize verdicts
  - Calculate risk scores
- LLM Abstraction Layer: Provider-agnostic AI access
  - Multi-provider support (Anthropic, OpenAI, Google, local models)
  - Automatic failover and rate limit handling
  - Per-stage model configuration
  - Cost optimization through provider selection
  - No vendor lock-in
- Background Jobs: Automated maintenance tasks
  - Source track record updates (weekly)
  - Cache warming and invalidation
  - Metrics aggregation
  - Data archival
- Quality Monitoring: Automated quality checks
  - Anomaly detection
  - Contradiction detection
  - Completeness validation
- Moderation Detection: Automated abuse detection
  - Spam identification
  - Manipulation detection
  - Flag suspicious activity
Data & Storage Layer
Persistent data storage and caching:
- PostgreSQL: Primary database for all core data
  - Claims, evidence, sources, users
  - Scenarios, edits, audit logs
  - Built-in full-text search
  - Time-series capabilities for metrics
- Redis: High-speed caching layer
  - Session data
  - Frequently accessed claims
  - API rate limiting
- S3 Storage: Long-term archival
  - Old edit history (90+ days)
  - AKEL processing logs
  - Backup snapshots
Optional future additions (add only when metrics prove necessary):
- Elasticsearch: If PostgreSQL full-text search becomes slow
- TimescaleDB: If metrics queries become a bottleneck
2.2 LLM Abstraction Layer
```mermaid
graph LR
    subgraph AKEL["AKEL Pipeline"]
        S1[Stage 1<br/>Extract Claims]
        S2[Stage 2<br/>Analyze Claims]
        S3[Stage 3<br/>Holistic Assessment]
    end
    subgraph LLM["LLM Abstraction Layer"]
        INT[Provider Interface]
        CFG[Configuration<br/>Registry]
        FAIL[Failover<br/>Handler]
    end
    subgraph Providers["LLM Providers"]
        ANT[Anthropic<br/>Claude API<br/>PRIMARY]
        OAI[OpenAI<br/>GPT API<br/>SECONDARY]
        GOO[Google<br/>Gemini API<br/>TERTIARY]
        LOC[Local Models<br/>Llama/Mistral<br/>FUTURE]
    end
    S1 --> INT
    S2 --> INT
    S3 --> INT
    INT --> CFG
    INT --> FAIL
    CFG --> ANT
    FAIL --> ANT
    FAIL --> OAI
    FAIL --> GOO
    ANT -.fallback.-> OAI
    OAI -.fallback.-> GOO
    style AKEL fill:#ffcccc
    style LLM fill:#ccffcc
    style Providers fill:#e1f5ff
    style ANT fill:#ff9999
    style OAI fill:#99ccff
    style GOO fill:#99ff99
    style LOC fill:#cccccc
```
LLM Abstraction Architecture - AKEL stages call through provider interface. Configuration registry selects provider per stage. Failover handler implements automatic fallback chain.
POC1 Implementation:
- PRIMARY: Anthropic Claude API (Haiku for Stage 1 extraction, Sonnet for Stages 2 & 3 analysis)
- Failover: Basic error handling with cache fallback
Future (POC2/Beta):
- SECONDARY: OpenAI GPT API (automatic failover)
- TERTIARY: Google Gemini API (tertiary fallback)
- FUTURE: Local models (Llama/Mistral for on-premises deployments)
Architecture Benefits:
- Prevents vendor lock-in
- Ensures resilience through automatic failover
- Enables cost optimization per stage
- Supports regulatory compliance (provider selection for data residency)
Description: Shows how AKEL stages interact with multiple LLM providers through an abstraction layer. POC1 uses Anthropic Claude as primary provider (Haiku 4.5 for extraction, Sonnet 4.5 for analysis). OpenAI, Google, and local models are shown as future expansion options (POC2/Beta).
Purpose: FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.
Multi-Provider Support:
- Primary: Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
- Secondary: OpenAI GPT API (automatic failover)
- Tertiary: Google Vertex AI / Gemini
- Future: Local models (Llama, Mistral) for on-premises deployments
Provider Interface:
- Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
- Per-stage model configuration (Stage 1: Haiku, Stage 2 & 3: Sonnet)
- Environment variable and database configuration
- Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
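A minimal Python sketch of this interface, using the method names listed above (the `async` signatures and parameter shapes are assumptions, not the actual implementation):

```python
from abc import ABC, abstractmethod
from typing import AsyncIterator

class LLMProvider(ABC):
    """Provider-agnostic interface; one adapter subclass per vendor
    (AnthropicProvider, OpenAIProvider, GoogleProvider)."""

    @abstractmethod
    async def complete(self, prompt: str, model: str) -> str:
        """Return a single completion for the prompt."""

    @abstractmethod
    def stream(self, prompt: str, model: str) -> AsyncIterator[str]:
        """Yield completion tokens incrementally."""

    @abstractmethod
    def getName(self) -> str:
        """Stable provider identifier, e.g. 'anthropic'."""

    @abstractmethod
    def getCostPer1kTokens(self, model: str) -> float:
        """Approximate cost, used for per-stage provider selection."""

    @abstractmethod
    def isAvailable(self) -> bool:
        """Health check consulted by the failover handler."""
```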
Configuration:
- Runtime provider switching without code changes
- Admin API for provider management (`POST /admin/v1/llm/configure`)
- Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
- Support for rate limit handling and cost tracking
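For illustration, a runtime provider switch through the admin endpoint named above might look like this (the host, JSON field names, and token handling are hypothetical):

```python
import requests

# Hypothetical request: route Stage 2 (claim analysis) to OpenAI at runtime.
response = requests.post(
    "https://factharbor.example/admin/v1/llm/configure",
    json={"stage": "analyze_claims", "provider": "openai", "model": "gpt-4o"},
    headers={"Authorization": "Bearer <admin-api-key>"},
    timeout=10,
)
response.raise_for_status()
```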
Failover Strategy:
- Automatic fallback: Primary → Secondary → Tertiary
- Circuit breaker pattern for unavailable providers
- Health checking and provider availability monitoring
- Graceful degradation when all providers unavailable
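A sketch of the fallback chain under these rules, building on the `LLMProvider` interface above (a simplified stand-in for a full circuit breaker):

```python
class FailoverError(RuntimeError):
    """Raised when every provider in the chain has failed."""

class FailoverHandler:
    def __init__(self, providers: list[LLMProvider]):
        # Ordered by priority: primary, secondary, tertiary.
        self.providers = providers

    async def complete(self, prompt: str, model: str) -> str:
        last_error: Exception | None = None
        for provider in self.providers:
            if not provider.isAvailable():  # skip providers with an open circuit
                continue
            try:
                return await provider.complete(prompt, model)
            except Exception as err:  # rate limit, timeout, outage: try the next one
                last_error = err
        raise FailoverError("All LLM providers unavailable") from last_error
```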
Cost Optimization:
- Track and compare costs across providers per request
- Enable A/B testing of different models for quality/cost tradeoffs
- Per-stage provider selection for optimal cost-efficiency
- Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
Architecture Pattern:
```
Stage 1 Extract  ──→  Provider Interface  ──→  Anthropic (PRIMARY)
Stage 2 Analyze  ──→  Configuration       ──→  OpenAI (SECONDARY)
Stage 3 Holistic ──→  Failover Handler    ──→  Google (TERTIARY)
                                           └→  Local Models (FUTURE)
```
Benefits:
- No Vendor Lock-In: Switch providers based on cost, quality, or availability without code changes
- Resilience: Automatic failover ensures service continuity during provider outages
- Cost Efficiency: Use optimal provider per task (cheap for extraction, quality for analysis)
- Quality Assurance: Cross-provider output verification for critical claims
- Regulatory Compliance: Use specific providers for data residency requirements
- Future-Proofing: Easy integration of new models as they become available
Cross-References:
- Requirements: NFR-14 (formal requirement)
- POC Requirements: NFR-POC-11 (POC1 implementation)
- API Specification: Section 6 (implementation details)
- Design Decisions: Section 9 (design rationale)
2.3 Design Philosophy
Start Simple, Evolve Based on Metrics
The architecture deliberately starts simple:
- Single primary database (PostgreSQL handles most workloads initially)
- Three clear layers (easy to understand and maintain)
- Automated operations (minimal human intervention)
- Measure before optimizing (add complexity only when proven necessary)
See Design Decisions and When to Add Complexity for detailed rationale.
3. AKEL Architecture
```mermaid
graph TB
    User[User Submits Content<br/>Text/URL/Single Claim]
    Extract[Claim Extraction<br/>LLM identifies distinct claims]
    AKEL[AKEL Core Processing<br/>Per Claim]
    Evidence[Evidence Gathering]
    Scenario[Scenario Generation]
    Verdict[Verdict Generation]
    Storage[(Storage Layer<br/>PostgreSQL + S3)]
    Queue[Processing Queue<br/>Parallel Claims]
    User --> Extract
    Extract -->|Multiple Claims| Queue
    Extract -->|Single Claim| AKEL
    Queue -->|Process Each| AKEL
    AKEL --> Evidence
    AKEL --> Scenario
    Evidence --> Verdict
    Scenario --> Verdict
    Verdict --> Storage
    style Extract fill:#e1f5ff
    style Queue fill:#fff4e1
    style AKEL fill:#f0f0f0
```
See AI Knowledge Extraction Layer (AKEL) for detailed information.
3.5 Claim Processing Architecture
FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.
Multi-Claim Handling
Users often submit:
- Text with multiple claims: Articles, statements, or paragraphs containing several distinct factual claims
- Web pages: URLs that are analyzed to extract all verifiable claims
- Single claims: Simple, direct factual statements
The first processing step is always Claim Extraction: identifying and isolating individual verifiable claims from submitted content.
Processing Phases
POC Implementation (Two-Phase):
Phase 1 - Claim Extraction:
- LLM analyzes submitted content
- Extracts all distinct, verifiable claims
- Returns structured list of claims with context
Phase 2 - Parallel Analysis:
- Each claim processed independently by LLM
- Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
- Parallelized across all claims
- Results aggregated for presentation
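A condensed sketch of this two-phase flow; the two stub functions stand in for the actual LLM calls:

```python
import asyncio

async def extract_claims(content: str) -> list[str]:
    ...  # Phase 1 LLM call (stub): returns distinct, verifiable claims

async def analyze_claim(claim: str) -> dict:
    ...  # Phase 2 LLM call (stub): evidence, scenarios, sources, verdict, risk

async def analyze_submission(content: str) -> list[dict]:
    claims = await extract_claims(content)
    # Analyze every claim independently and in parallel, then aggregate.
    return list(await asyncio.gather(*(analyze_claim(c) for c in claims)))
```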
Production Implementation (Three-Phase):
Phase 1 - Extraction + Validation:
- Extract claims from content
- Validate clarity and uniqueness
- Filter vague or duplicate claims
Phase 2 - Evidence Gathering (Parallel):
- Independent evidence gathering per claim
- Source validation and scenario generation
- Quality gates prevent poor data from advancing
Phase 3 - Verdict Generation (Parallel):
- Generate verdict from validated evidence
- Confidence scoring and risk assessment
- Low-confidence cases routed to human review
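The Phase 3 routing rule can be as simple as a confidence gate; the 0.6 threshold below is an assumed value to be tuned from metrics, not a documented one:

```python
CONFIDENCE_GATE = 0.6  # assumed threshold, tune from quality metrics

def route_verdict(verdict: dict) -> str:
    # Low-confidence verdicts go to human review instead of auto-publish.
    if verdict["confidence"] >= CONFIDENCE_GATE:
        return "publish"
    return "human_review"
```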
Architectural Benefits
Scalability:
- Process 100 claims at roughly 3x the latency of a single claim
- Parallel processing across independent claims
- Linear cost scaling with claim count
Quality:
- Validation gates between phases
- Errors isolated to individual claims
- Clear observability per processing step
Flexibility:
- Each phase optimizable independently
- Can use different model sizes per phase
- Easy to add human review at decision points
4. Storage Architecture
```mermaid
graph TB
    APP[Application<br/>API + AKEL] --> REDIS[Redis Cache<br/>Hot data, sessions,<br/>rate limiting]
    REDIS --> PG[(PostgreSQL<br/>Primary Database<br/>All core data:<br/>Claims, Evidence,<br/>Sources, Users)]
    APP --> PG
    PG -->|Backups &<br/>Archives| S3[(S3 Storage<br/>Old logs,<br/>Backups)]
    BG[Background<br/>Scheduler] --> PG
    BG --> S3
    subgraph V10["✅ V1.0 Core (3 systems)"]
        PG
        REDIS
        S3
    end
    subgraph Future["🔮 Optional Future (Add if metrics show need)"]
        PG -.->|If search slow<br/>>500ms| ES[(Elasticsearch<br/>Full-text search)]
        PG -.->|If metrics slow<br/>>1s queries| TS[TimescaleDB<br/>Time-series]
    end
    style PG fill:#9999ff
    style REDIS fill:#ff9999
    style S3 fill:#ff99ff
    style ES fill:#cccccc
    style TS fill:#cccccc
    style V10 fill:#e8f5e9
    style Future fill:#fff3e0
```
Simplified Storage - PostgreSQL as single primary database for all core data (claims, evidence, sources, users, metrics). Redis for caching, S3 for archives. Elasticsearch and TimescaleDB are optional additions only if performance metrics prove necessary. Start with 3 systems, not 5.
See Storage Strategy for detailed information.
4.5 Versioning Architecture
```mermaid
graph LR
    CLAIM[Claim] -->|edited| EDIT[Edit Record]
    EDIT -->|stores| BEFORE[Before State]
    EDIT -->|stores| AFTER[After State]
    EDIT -->|tracks| WHO[Who Changed]
    EDIT -->|tracks| WHEN[When Changed]
    EDIT -->|tracks| WHY[Why Changed]
    EDIT -->|if needed| RESTORE[Manual Restore]
    RESTORE -->|create new| CLAIM
    style EDIT fill:#ffcccc
    style RESTORE fill:#ccffcc
```
Versioning Architecture - Simple audit trail for V1.0: Track who, what, when, why for each change. Store before/after values in edits table. Manual restore if needed (create new edit with old values). Full versioning system (branching, merging, automatic rollback) deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.
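A sketch of the edit record this implies; the field names are illustrative, and restore is just a new edit with the before/after values swapped:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class EditRecord:
    claim_id: int
    field: str           # what changed
    before: str          # value prior to the edit
    after: str           # value after the edit
    editor_id: int       # who
    edited_at: datetime  # when
    reason: str          # why

def restore(edit: EditRecord, editor_id: int, reason: str) -> EditRecord:
    """Manual restore: create a new edit that reapplies the old value."""
    return EditRecord(edit.claim_id, edit.field, edit.after, edit.before,
                      editor_id, datetime.now(timezone.utc), reason)
```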
5. Automated Systems in Detail
FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:
5.1 AKEL (AI Knowledge Extraction Layer)
What it does: Primary AI processing engine that analyzes claims automatically
Inputs:
- User-submitted claim text
- Existing evidence and sources
- Source track record database
Processing steps:
1. Parse & Extract: Identify key components, entities, assertions
2. Gather Evidence: Search web and database for relevant sources
3. Check Sources: Evaluate source reliability using track records
4. Extract Scenarios: Identify different contexts from evidence
5. Synthesize Verdict: Compile evidence assessment per scenario
6. Calculate Risk: Assess potential harm and controversy
Outputs:
- Structured claim record
- Evidence links with relevance scores
- Scenarios with context descriptions
- Verdict summary per scenario
- Overall confidence score
- Risk assessment
Timing: 10-18 seconds total (parallel processing)
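A sketch of how these steps compose. One plausible dependency structure (an assumption, not the documented one) lets steps 3 and 4 run concurrently, since both depend only on the gathered evidence; the stub functions are illustrative:

```python
import asyncio

# Stubs standing in for the real pipeline steps (names are illustrative).
async def parse_and_extract(text): ...
async def gather_evidence(parsed): ...
async def check_sources(evidence): ...
async def extract_scenarios(evidence): ...
async def synthesize_verdict(scenarios, evidence, records): ...
async def calculate_risk(parsed, verdict): ...

async def akel_process(claim_text: str) -> dict:
    parsed = await parse_and_extract(claim_text)              # step 1
    evidence = await gather_evidence(parsed)                  # step 2
    track_records, scenarios = await asyncio.gather(          # steps 3 & 4
        check_sources(evidence), extract_scenarios(evidence))
    verdict = await synthesize_verdict(scenarios, evidence, track_records)  # step 5
    risk = await calculate_risk(parsed, verdict)              # step 6
    return {"scenarios": scenarios, "verdict": verdict, "risk": risk}
```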
5.2 Background Jobs
Source Track Record Updates (Weekly):
- Analyze claim outcomes from past week
- Calculate source accuracy and reliability
- Update source_track_record table
- Never triggered by individual claims (prevents circular dependencies)
Cache Management (Continuous):
- Warm cache for popular claims
- Invalidate cache on claim updates
- Monitor cache hit rates
Metrics Aggregation (Hourly):
- Roll up detailed metrics
- Calculate system health indicators
- Generate performance reports
Data Archival (Daily):
- Move old AKEL logs to S3 (90+ days)
- Archive old edit history
- Compress and backup data
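These schedules map directly onto a job scheduler; a sketch using APScheduler (the library choice and job function names are assumptions):

```python
from apscheduler.schedulers.background import BackgroundScheduler

# Stub job functions; real implementations live in the jobs module.
def update_source_track_records(): ...
def aggregate_metrics(): ...
def archive_old_data(): ...

scheduler = BackgroundScheduler()
# Weekly: recompute source accuracy from the past week's claim outcomes.
scheduler.add_job(update_source_track_records, "cron", day_of_week="sun", hour=2)
# Hourly: roll up detailed metrics into system health indicators.
scheduler.add_job(aggregate_metrics, "cron", minute=0)
# Daily: move 90+ day AKEL logs and old edit history to S3.
scheduler.add_job(archive_old_data, "cron", hour=3)
scheduler.start()
```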
5.3 Quality Monitoring
Automated checks run continuously:
- Anomaly Detection: Flag unusual patterns
  - Sudden confidence score changes
  - Unusual evidence distributions
  - Suspicious source patterns
- Contradiction Detection: Identify conflicts
  - Evidence that contradicts other evidence
  - Claims with internal contradictions
  - Source track record anomalies
- Completeness Validation: Ensure thoroughness
  - Sufficient evidence gathered
  - Multiple source types represented
  - Key scenarios identified
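The simplest of these checks, a sudden confidence change between consecutive AKEL runs, reduces to a threshold comparison (the 0.3 delta is an assumed value):

```python
def confidence_anomaly(previous: float, current: float,
                       max_delta: float = 0.3) -> bool:
    """Flag a claim whose confidence score moved more than max_delta
    between two consecutive AKEL runs."""
    return abs(current - previous) > max_delta
```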
5.4 Moderation Detection
Automated abuse detection:
- Spam Identification: Pattern matching for spam claims
- Manipulation Detection: Identify coordinated editing
- Gaming Detection: Flag attempts to game source scores
- Suspicious Activity: Log unusual behavior patterns
Human Review: Moderators review flagged items, and the system learns from their decisions
6. Scalability Strategy
6.1 Horizontal Scaling
Components scale independently:
- AKEL Workers: Add more processing workers as claim volume grows
- Database Read Replicas: Add replicas for read-heavy workloads
- Cache Layer: Redis cluster for distributed caching
- API Servers: Load-balanced API instances
6.2 Vertical Scaling
Individual components can be upgraded:
- Database Server: Increase CPU/RAM for PostgreSQL
- Cache Memory: Expand Redis memory
- Worker Resources: More powerful AKEL worker machines
6.3 Performance Optimization
Built-in optimizations:
- Denormalized Data: Cache summary data in claim records (70% fewer joins)
- Parallel Processing: AKEL pipeline processes in parallel (40% faster)
- Intelligent Caching: Redis caches frequently accessed data
- Background Processing: Non-urgent tasks run asynchronously
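The caching strategy is the standard cache-aside pattern; a minimal sketch with redis-py (key naming, the 1-hour TTL, and the DB helper are assumptions):

```python
import json
import redis

r = redis.Redis()

def fetch_claim_from_db(claim_id: int) -> dict:
    ...  # stub: real code queries PostgreSQL

def get_claim(claim_id: int) -> dict:
    # Cache-aside: try Redis first, fall back to PostgreSQL on a miss.
    cached = r.get(f"claim:{claim_id}")
    if cached is not None:
        return json.loads(cached)
    claim = fetch_claim_from_db(claim_id)
    r.setex(f"claim:{claim_id}", 3600, json.dumps(claim))  # cache for 1 hour
    return claim
```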
7. Monitoring & Observability
7.1 Key Metrics
System tracks:
- Performance: AKEL processing time, API response time, cache hit rate
- Quality: Confidence score distribution, evidence completeness, contradiction rate
- Usage: Claims per day, active users, API requests
- Errors: Failed AKEL runs, API errors, database issues
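With the Prometheus stack listed in Section 11, these metrics would typically be exposed through instrumented counters and histograms; a small sketch using prometheus_client (metric names are illustrative):

```python
from prometheus_client import Counter, Histogram

AKEL_SECONDS = Histogram("akel_processing_seconds", "AKEL end-to-end time")
AKEL_FAILURES = Counter("akel_failures_total", "Failed AKEL runs")

def run_akel_pipeline(claim_text: str) -> dict:
    ...  # stub for the real pipeline entry point

@AKEL_SECONDS.time()  # observes processing time per claim
def process_claim(claim_text: str) -> dict:
    try:
        return run_akel_pipeline(claim_text)
    except Exception:
        AKEL_FAILURES.inc()  # feeds the >1% error-rate alert
        raise
```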
7.2 Alerts
Automated alerts for:
- Processing time >30 seconds (threshold breach)
- Error rate >1% (quality issue)
- Cache hit rate <80% (cache problem)
- Database connections >80% capacity (scaling needed)
7.3 Dashboards
Real-time monitoring:
- System Health: Overall status and key metrics
- AKEL Performance: Processing time breakdown
- Quality Metrics: Confidence scores, completeness
- User Activity: Usage patterns, peak times
8. Security Architecture
8.1 Authentication & Authorization
- User Authentication: Secure login with password hashing
- Role-Based Access: Reader, Contributor, Moderator, Admin
- API Keys: For programmatic access
- Rate Limiting: Prevent abuse
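Rate limiting can ride on the Redis layer already in place; a fixed-window counter is the simplest variant (the limit and window below are assumed values):

```python
import time
import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int = 100, window_s: int = 60) -> bool:
    # Fixed-window counter: at most `limit` requests per window per user.
    key = f"rate:{user_id}:{int(time.time() // window_s)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)  # drop the counter once the window ends
    return count <= limit
```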
8.2 Data Security
- Encryption: TLS for transport, encrypted storage for sensitive data
- Audit Logging: Track all significant changes
- Input Validation: Sanitize all user inputs
- SQL Injection Protection: Parameterized queries
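The last two items combine naturally: validate the input, then bind it through a parameterized query. A psycopg2 sketch (the table and column names are illustrative):

```python
import psycopg2  # noqa: F401 (connection setup omitted)

def load_claim(conn, raw_id: str):
    claim_id = int(raw_id)  # input validation: reject non-integer IDs early
    with conn.cursor() as cur:
        # Parameterized query: the driver binds claim_id, so user input
        # can never alter the shape of the SQL statement itself.
        cur.execute("SELECT id, text FROM claims WHERE id = %s", (claim_id,))
        return cur.fetchone()
```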
8.3 Abuse Prevention
- Rate Limiting: Prevent flooding and DDoS
- Automated Detection: Flag suspicious patterns
- Human Review: Moderators investigate flagged content
- Ban Mechanisms: Block abusive users/IPs
9. Deployment Architecture
9.1 Production Environment
Components:
- Load Balancer (HAProxy or cloud LB)
- Multiple API servers (stateless)
- AKEL worker pool (auto-scaling)
- PostgreSQL primary + read replicas
- Redis cluster
- S3-compatible storage
Regions: Single region for V1.0, multi-region when needed
9.2 Development & Staging
Development: Local Docker Compose setup
Staging: Scaled-down production replica
CI/CD: Automated testing and deployment
9.3 Disaster Recovery
- Database Backups: Daily automated backups to S3
- Point-in-Time Recovery: Transaction log archival
- Replication: Real-time replication to standby
- Recovery Time Objective: <4 hours
9.5 Federation Architecture Diagram
This diagram shows independent FactHarbor instances and the claim-sync links between them.

```mermaid
graph LR
    FH1[FactHarbor<br/>Instance 1]
    FH2[FactHarbor<br/>Instance 2]
    FH3[FactHarbor<br/>Instance 3]
    FH1 -.->|V1.0+:<br/>Sync claims| FH2
    FH2 -.->|V1.0+:<br/>Sync claims| FH3
    FH3 -.->|V1.0+:<br/>Sync claims| FH1
    U1[Users] --> FH1
    U2[Users] --> FH2
    U3[Users] --> FH3
    style FH1 fill:#e1f5ff
    style FH2 fill:#e1f5ff
    style FH3 fill:#e1f5ff
```
Federation Architecture - Future (V1.0+): Independent FactHarbor instances can sync claims for broader reach while maintaining local control.
10. Future Architecture Evolution
10.1 When to Add Complexity
See When to Add Complexity for specific triggers.
- Elasticsearch: When PostgreSQL search consistently >500ms
- TimescaleDB: When metrics queries consistently >1s
- Federation: When 10,000+ users and explicit demand
- Complex Reputation: When 100+ active contributors
10.2 Federation (V2.0+)
Deferred until:
- Core product proven with 10,000+ users
- User demand for decentralization
- Single-node limits reached
See Federation & Decentralization for future plans.
11. Technology Stack Summary
Backend:
- Python (FastAPI or Django)
- PostgreSQL (primary database)
- Redis (caching)
Frontend:
- Modern JavaScript framework (React, Vue, or Svelte)
- Server-side rendering for SEO
AI/LLM:
- Multi-provider orchestration (Claude, GPT-4, local models)
- Fallback and cross-checking support
Infrastructure:
- Docker containers
- Kubernetes or cloud platform auto-scaling
- S3-compatible object storage
Monitoring:
- Prometheus + Grafana
- Structured logging (ELK or cloud logging)
- Error tracking (Sentry)