Architecture

Last modified by Robert Schaub on 2025/12/24 20:30

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. All logic is transparent and deterministic.

High-Level System Architecture

FactHarbor is composed of the following major modules:

  • UI Frontend
  • REST API Layer
  • Core Logic Layer
    • Claim Processing 
    • Scenario Engine 
    • Evidence Repository 
    • Verdict Engine 
    • Re-evaluation Engine 
    • Roles / Identity / Reputation
  • AKEL (AI Knowledge Extraction Layer)
  • Federation Layer
  • Workers & Background Jobs
  • Storage Layer (Postgres + VectorDB + ObjectStore)

High-Level Architecture

graph TB
  subgraph Interface_Layer["🖥️ Interface Layer"]
    UI[Web UI<br/>Browse & Submit]
    API[REST API<br/>Programmatic Access]
    AUTH[Authentication<br/>& Authorization]
  end
  subgraph Processing_Layer["⚙️ Processing Layer"]
    AKEL[AKEL Pipeline<br/>Parallel Processing<br/>10-18 seconds]
    LLM[LLM Abstraction Layer<br/>Multi-Provider Support<br/>Anthropic OpenAI Google]
    BG[Background Jobs<br/>Source Scoring,<br/>Cache, Archival]
    QM[Quality Monitoring<br/>Automated Checks]
  end
  subgraph Data_Layer["💾 Data & Storage Layer"]
    PG[(PostgreSQL<br/>Primary Database<br/>All Core Data)]
    REDIS[(Redis<br/>Cache & LLM Config)]
    S3[(S3<br/>Archives)]
  end
  UI --> AUTH
  API --> AUTH
  AUTH --> AKEL
  AUTH --> QM
  AKEL --> LLM
  LLM --> PG
  LLM --> REDIS
  AKEL --> PG
  AKEL --> REDIS
  BG --> PG
  BG --> S3
  QM --> PG
  REDIS --> PG
  style Interface_Layer fill:#e1f5ff
  style Processing_Layer fill:#fff4e1
  style Data_Layer fill:#f0f0f0
  style AKEL fill:#ffcccc
  style LLM fill:#ccffcc
  style PG fill:#9999ff

Three-Layer Architecture - Clean separation with LLM abstraction: Interface Layer (user interactions), Processing Layer (AKEL + LLM Abstraction + background jobs), Data Layer (PostgreSQL primary + Redis cache/config + S3 archives). LLM Abstraction Layer provides provider-agnostic access to Anthropic, OpenAI, Google, and local models with automatic failover.

Key ideas:

  • Core logic is deterministic, auditable, and versioned  
  • AKEL drafts structured outputs but never publishes directly  
  • Workers run long or asynchronous tasks  
  • Storage is separated for scalability and clarity  
  • Federation Layer provides optional distributed operation  

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

  • PostgreSQL — canonical structured entities, all versioning, lineage, signatures  
  • Vector DB (Qdrant or pgvector) — semantic search, duplication detection, cluster mapping  
  • Object Storage — PDFs, datasets, raw evidence, transcripts  
  • Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage  

Storage Architecture

graph TB
  APP[Application<br/>API + AKEL]
  REDIS[Redis Cache<br/>Hot data, sessions,<br/>rate limiting]
  PG[(PostgreSQL<br/>Primary Database<br/>**All core data**<br/>Claims, Evidence,<br/>Sources, Users)]
  S3[(S3 Storage<br/>Old logs,<br/>Backups)]
  BG[Background<br/>Scheduler]
  APP --> REDIS
  REDIS --> PG
  APP --> PG
  PG -->|Backups &<br/>Archives| S3
  BG --> PG
  BG --> S3
  subgraph V10["✅ V1.0 Core (3 systems)"]
    PG
    REDIS
    S3
  end
  subgraph Future["🔮 Optional Future (Add if metrics show need)"]
    PG -.->|If search slow<br/>>500ms| ES[(Elasticsearch<br/>Full-text search)]
    PG -.->|If metrics slow<br/>>1s queries| TS[TimescaleDB<br/>Time-series]
  end
  style PG fill:#9999ff
  style REDIS fill:#ff9999
  style S3 fill:#ff99ff
  style ES fill:#cccccc
  style TS fill:#cccccc
  style V10 fill:#e8f5e9
  style Future fill:#fff3e0

Simplified Storage - PostgreSQL as single primary database for all core data (claims, evidence, sources, users, metrics). Redis for caching, S3 for archives. Elasticsearch and TimescaleDB are optional additions only if performance metrics prove necessary. Start with 3 systems, not 5.

Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

1. Claim Processing Module

Responsibilities:

  • Ingest text, URLs, documents, transcripts, federated input  
  • Extract claims (AKEL-assisted)  
  • Normalize structure  
  • Classify (type, domain, evaluability, safety)  
  • Deduplicate via embeddings  
  • Assign to claim clusters  

Flow:
Ingest → Normalize → Classify → Deduplicate → Cluster

2. Scenario Engine

Responsibilities:

  • Create and validate scenarios  
  • Enforce required fields (definitions, assumptions, boundaries...)  
  • Perform safety checks (AKEL-assisted)  
  • Manage versioning and lifecycle  
  • Provide contextual evaluation settings to the Verdict Engine  

Flow:
Create → Validate → Version → Lifecycle → Safety
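The validation step's enforcement of required fields might look like the following sketch. The field list here is an illustrative subset, not the full scenario schema.

```python
# Illustrative subset of the required scenario fields.
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")

def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the scenario passes."""
    return [f"missing required field: {name}"
            for name in REQUIRED_FIELDS
            if not scenario.get(name)]
```

In the real engine this check would run before versioning, and AKEL-assisted safety checks would follow it.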

3. Evidence Repository

Responsibilities:

  • Store metadata + files (object store)  
  • Classify evidence  
  • Compute preliminary reliability  
  • Maintain version history  
  • Detect retractions or disputes  
  • Provide structured metadata to the Verdict Engine  

Flow:
Store → Classify → Score → Version → Update/Retract
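The preliminary reliability step can be illustrated as below. The source-type weights are placeholders invented for this sketch, not FactHarbor's actual scoring model; note how a detected retraction overrides the base score.

```python
def preliminary_reliability(source_type: str, retracted: bool) -> float:
    """Illustrative reliability heuristic; the weights are placeholders."""
    weights = {"peer_reviewed": 0.9, "news": 0.6, "blog": 0.3}
    if retracted:
        return 0.0  # retracted evidence carries no weight
    return weights.get(source_type, 0.2)  # unknown source types default low
```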

4. Verdict Engine

Responsibilities:

  • Aggregate scenario-linked evidence  
  • Compute likelihood ranges per scenario
  • Generate reasoning chain  
  • Track uncertainty factors  
  • Maintain verdict version timelines  

Flow:
Aggregate → Compute → Explain → Version → Timeline
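One way the aggregate-and-compute steps could produce a likelihood range is sketched below. The weighting scheme is hypothetical: it takes a reliability-weighted mean of evidence support, then widens the range when average reliability is low.

```python
def likelihood_range(evidence: list[tuple[float, float]]) -> tuple[float, float]:
    """
    evidence: (support, reliability) pairs, both in [0, 1].
    Returns (low, high) likelihood bounds for one scenario: the
    reliability-weighted mean support, widened by residual uncertainty.
    Illustrative only; not FactHarbor's actual aggregation model.
    """
    if not evidence:
        return (0.0, 1.0)  # no evidence: maximal uncertainty
    total_w = sum(r for _, r in evidence)
    mean = sum(s * r for s, r in evidence) / total_w
    uncertainty = 1.0 - total_w / len(evidence)  # low avg reliability -> wide range
    half = uncertainty / 2
    return (max(0.0, mean - half), min(1.0, mean + half))
```

A single fully reliable, fully supporting item yields the tight range (1.0, 1.0); halving its reliability widens the range to (0.75, 1.0).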

5. Re-evaluation Engine

Responsibilities:

  • Listen for upstream changes  
  • Trigger partial or full recomputation  
  • Update verdicts + summary views  
  • Maintain consistency across federated nodes  

Triggers include:

  • Evidence updated or retracted  
  • Scenario definition or assumption changes  
  • Claim type or evaluability changes  
  • Contradiction detection  
  • Federation sync updates  

Flow:
Trigger → Impact Analysis → Recompute → Publish Update
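The trigger and impact-analysis steps can be sketched as a simple event-to-scope mapping. The event names and scopes here are illustrative, not the engine's real taxonomy.

```python
# Hypothetical routing from upstream change events to recomputation scope.
IMPACT = {
    "evidence_updated": "partial",
    "evidence_retracted": "full",
    "scenario_changed": "partial",
    "contradiction_detected": "full",
    "federation_sync": "partial",
}

def plan_reevaluation(event: str) -> dict:
    """Decide how much to recompute for a given upstream change event."""
    scope = IMPACT.get(event, "none")
    return {"event": event, "scope": scope, "publish_update": scope != "none"}
```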

AKEL Integration Summary

AKEL is fully documented in its own chapter; this section covers only its architectural integration:

  • Receives raw input for claims  
  • Proposes scenario drafts  
  • Extracts and summarizes evidence  
  • Gives reliability hints  
  • Suggests draft verdicts  
  • Monitors contradictions  
  • Syncs metadata with trusted nodes  

AKEL runs in parallel to human review — never overrides it.

AKEL Architecture

graph TB
  User[User Submits Content<br/>Text/URL/Single Claim]
  Extract[Claim Extraction<br/>LLM identifies distinct claims]
  AKEL[AKEL Core Processing<br/>Per Claim]
  Evidence[Evidence Gathering]
  Scenario[Scenario Generation]
  Verdict[Verdict Generation]
  Storage[(Storage Layer<br/>PostgreSQL + S3)]
  Queue[Processing Queue<br/>Parallel Claims]
  User --> Extract
  Extract -->|Multiple Claims| Queue
  Extract -->|Single Claim| AKEL
  Queue -->|Process Each| AKEL
  AKEL --> Evidence
  AKEL --> Scenario
  Evidence --> Verdict
  Scenario --> Verdict
  Verdict --> Storage
  style Extract fill:#e1f5ff
  style Queue fill:#fff4e1
  style AKEL fill:#f0f0f0

Federated Architecture

Each FactHarbor node:

  • Has its own dataset (claims, scenarios, evidence, verdicts)  
  • Runs its own AKEL  
  • Maintains local governance and reviewer rules  
  • May partially mirror global or domain-specific data  
  • Contributes to global knowledge clusters  

Nodes synchronize via:

  • Signed version bundles  
  • Merkle-tree lineage structures  
  • Optionally IPFS for evidence  
  • Trust-weighted acceptance  
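The Merkle-tree lineage structures referenced above can be sketched as follows: each version hash becomes a leaf, and peers compare roots to detect divergent lineages cheaply. This is a minimal textbook construction, not the exact tree layout FactHarbor uses.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over version hashes, duplicating the last
    node on odd-sized levels. Two nodes agree on a lineage iff roots match."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```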

Benefits:

  • Community independence  
  • Scalability  
  • Resilience  
  • Domain specialization  

Federation Architecture
This diagram shows independent FactHarbor nodes and the synchronization links between them.

graph LR
  FH1[FactHarbor<br/>Instance 1]
  FH2[FactHarbor<br/>Instance 2]
  FH3[FactHarbor<br/>Instance 3]
  FH1 -.->|V1.0+:<br/>Sync claims| FH2
  FH2 -.->|V1.0+:<br/>Sync claims| FH3
  FH3 -.->|V1.0+:<br/>Sync claims| FH1
  U1[Users] --> FH1
  U2[Users] --> FH2
  U3[Users] --> FH3
  style FH1 fill:#e1f5ff
  style FH2 fill:#e1f5ff
  style FH3 fill:#e1f5ff

Federation Architecture - Future (V1.0+): Independent FactHarbor instances can sync claims for broader reach while maintaining local control.

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
 → (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
 → Summary View → UI Frontend → User

Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
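The build-bundle and validate-signature steps can be sketched as below. For brevity this uses a shared HMAC secret; real nodes would sign with asymmetric keys, and the payload format here is invented for illustration.

```python
import hashlib
import hmac
import json

NODE_KEY = b"shared-secret"  # illustrative; production would use asymmetric keys

def build_bundle(changes: list[dict]) -> dict:
    """Serialize local changes deterministically and attach a signature."""
    payload = json.dumps(changes, sort_keys=True).encode()
    sig = hmac.new(NODE_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "signature": sig}

def validate_bundle(bundle: dict) -> bool:
    """Peers recompute the signature before merging; mismatch means reject."""
    expected = hmac.new(NODE_KEY, bundle["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

A bundle that fails validation is rejected outright; a valid bundle proceeds to the merge-or-fork decision and may trigger re-evaluation.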

Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

  • VersionID  
  • ParentVersionID  
  • Timestamp  
  • AuthorType (Human, AI, ExternalNode)  
  • ChangeReason  
  • Signature (optional POC, required in 1.0)  
graph LR
 CLAIM[Claim] -->|edited| EDIT[Edit Record]
 EDIT -->|stores| BEFORE[Before State]
 EDIT -->|stores| AFTER[After State]
 EDIT -->|tracks| WHO[Who Changed]
 EDIT -->|tracks| WHEN[When Changed]
 EDIT -->|tracks| WHY[Why Changed]
 EDIT -->|if needed| RESTORE[Manual Restore]
 RESTORE -->|create new| CLAIM
 style EDIT fill:#ffcccc
 style RESTORE fill:#ccffcc

Versioning Architecture - Simple audit trail for V1.0: Track who, what, when, why for each change. Store before/after values in edits table. Manual restore if needed (create new edit with old values). Full versioning system (branching, merging, automatic rollback) deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.