Architecture

Version 2.4 by Robert Schaub on 2025/12/11 21:34

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. All logic is transparent and deterministic.


High-Level System Architecture

FactHarbor is composed of the following major modules:

  • UI Frontend
  • REST API Layer
  • Core Logic Layer
    – Claim Processing  
    – Scenario Engine  
    – Evidence Repository  
    – Verdict Engine  
    – Re-evaluation Engine  
    – Roles / Identity / Reputation
  • AKEL (AI Knowledge Extraction Layer)
  • Federation Layer
  • Workers & Background Jobs
  • Storage Layer (Postgres + VectorDB + ObjectStore)

Key ideas:

  • Core logic is deterministic, auditable, and versioned 
  • AKEL drafts structured outputs but never publishes directly 
  • Workers run long or asynchronous tasks 
  • Storage is separated for scalability and clarity 
  • Federation Layer provides optional distributed operation 

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

  • PostgreSQL — canonical structured entities, all versioning, lineage, signatures 
  • Vector DB (Qdrant or pgvector) — semantic search, duplication detection, cluster mapping 
  • Object Storage — PDFs, datasets, raw evidence, transcripts 
  • Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage 

Storage Architecture

graph TB
 APP[Application<br/>API + AKEL] --> REDIS[Redis Cache<br/>Hot data, sessions,<br/>rate limiting]
 REDIS --> PG[(PostgreSQL<br/>Primary Database<br/>**All core data**<br/>Claims, Evidence,<br/>Sources, Users)]
 APP --> PG
 PG -->|Backups &<br/>Archives| S3[(S3 Storage<br/>Old logs,<br/>Backups)]
 BG[Background<br/>Scheduler] --> PG
 BG --> S3
 subgraph V10["✅ V1.0 Core (3 systems)"]
 PG
 REDIS
 S3
 end
 subgraph Future["🔮 Optional Future (Add if metrics show need)"]
 PG -.->|If search slow<br/>>500ms| ES[(Elasticsearch<br/>Full-text search)]
 PG -.->|If metrics slow<br/>>1s queries| TS[TimescaleDB<br/>Time-series]
 end
 style PG fill:#9999ff
 style REDIS fill:#ff9999
 style S3 fill:#ff99ff
 style ES fill:#cccccc
 style TS fill:#cccccc
 style V10 fill:#e8f5e9
 style Future fill:#fff3e0

Simplified Storage: PostgreSQL is the single primary database for all core data (claims, evidence, sources, users, metrics). Redis handles caching and S3 holds archives. Elasticsearch and TimescaleDB are optional additions, to be introduced only if performance metrics show they are needed. Start with 3 systems, not 5.


Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

Claim Processing Module

Responsibilities:

  • Ingest text, URLs, documents, transcripts, federated input 
  • Extract claims (AKEL-assisted) 
  • Normalize structure 
  • Classify (type, domain, evaluability, safety) 
  • Deduplicate via embeddings 
  • Assign to claim clusters 

Flow:  
Ingest → Normalize → Classify → Deduplicate → Cluster
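
The last two stages of this flow (Deduplicate → Cluster) can be illustrated with a minimal sketch. It assumes claims have already been turned into embedding vectors and that each cluster is represented by a single vector. The function names, the cosine-similarity measure, and the 0.9 threshold are hypothetical choices for illustration, not part of the FactHarbor specification.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_cluster(embedding, clusters, threshold=0.9):
    """Assign a claim embedding to a cluster.

    `clusters` is a list of representative vectors (assumed structure).
    Returns (cluster_index, is_duplicate). If no representative reaches
    the threshold, a new cluster is appended and is_duplicate is False.
    """
    best_idx, best_sim = None, -1.0
    for idx, representative in enumerate(clusters):
        sim = cosine_similarity(embedding, representative)
        if sim >= threshold and sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx is None:
        clusters.append(list(embedding))
        return len(clusters) - 1, False
    return best_idx, True
```

In a real deployment the linear scan would presumably be replaced by a nearest-neighbour query against the vector store; the sketch only shows the decision rule.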


Scenario Engine

Responsibilities:

  • Create and validate scenarios 
  • Enforce required fields (definitions, assumptions, boundaries...) 
  • Perform safety checks (AKEL-assisted) 
  • Manage versioning and lifecycle 
  • Provide contextual evaluation settings to the Verdict Engine 

Flow:  
Create → Validate → Version → Lifecycle → Safety
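
As a rough illustration of the Validate step, the sketch below checks a scenario for required fields. It assumes a scenario is a plain dictionary. Only the three fields named above are checked; the full list of required fields is not given here, so `REQUIRED_FIELDS` is deliberately incomplete and the function name is hypothetical.

```python
# Only the fields named in this section; the real list is longer (assumption).
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")


def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of validation problems; an empty list means valid.

    A field counts as missing when it is absent or empty.
    """
    errors = []
    for field in REQUIRED_FIELDS:
        if not scenario.get(field):
            errors.append(f"missing required field: {field}")
    return errors
```

Returning a list of problems rather than raising on the first one lets the UI show every missing field at once.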


Evidence Repository

Responsibilities:

  • Store metadata + files (object store) 
  • Classify evidence 
  • Compute preliminary reliability 
  • Maintain version history 
  • Detect retractions or disputes 
  • Provide structured metadata to the Verdict Engine 

Flow:  
Store → Classify → Score → Version → Update/Retract


Verdict Engine

Responsibilities:

  • Aggregate scenario-linked evidence 
  • Compute likelihood ranges 
  • Generate reasoning chain 
  • Track uncertainty factors 
  • Maintain verdict version timelines 

Flow:  
Aggregate → Compute → Explain → Version → Timeline
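
The Aggregate → Compute steps could look like the following sketch. The scoring model is purely an assumption made for illustration: each evidence item carries a `support` score and a `reliability` weight, the point estimate is the reliability-weighted mean, and the range is that mean plus or minus the weighted standard deviation. The actual Verdict Engine formula is not specified in this document.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    evidence_id: str
    support: float      # hypothetical scale: 0.0 contradicts, 1.0 supports
    reliability: float  # hypothetical weight in [0, 1]


def likelihood_range(items):
    """Return (low, high) as a reliability-weighted mean +/- weighted std dev.

    Items are sorted by id first so that the result does not depend on
    input order (keeping the computation deterministic). With no usable
    evidence the range is the maximally uncertain (0.0, 1.0).
    """
    ordered = sorted(items, key=lambda e: e.evidence_id)
    total = sum(e.reliability for e in ordered)
    if total == 0:
        return (0.0, 1.0)
    mean = sum(e.support * e.reliability for e in ordered) / total
    variance = sum(e.reliability * (e.support - mean) ** 2 for e in ordered) / total
    spread = math.sqrt(variance)
    return (max(0.0, mean - spread), min(1.0, mean + spread))
```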


Re-evaluation Engine

Responsibilities:

  • Listen for upstream changes 
  • Trigger partial or full recomputation 
  • Update verdicts + summary views 
  • Maintain consistency across federated nodes 

Triggers include:

  • Evidence updated or retracted 
  • Scenario definition or assumption changes 
  • Claim type or evaluability changes 
  • Contradiction detection 
  • Federation sync updates 

Flow:  
Trigger → Impact Analysis → Recompute → Publish Update
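
The Impact Analysis step, which decides between partial and full recomputation, can be sketched as a breadth-first walk over a dependency map. The map structure (`dependents`, mapping an entity id to the ids that depend on it) and the id format are assumptions for illustration.

```python
from collections import deque


def impacted_entities(changed_id, dependents):
    """Return every entity downstream of `changed_id`, in breadth-first order.

    `dependents` maps an entity id to the ids that depend on it directly
    (assumed structure). The changed entity itself is not included.
    """
    seen = {changed_id}
    order = []
    queue = deque([changed_id])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order
```

Only the returned entities need recomputing, so a change to one piece of evidence does not force every verdict to be rebuilt.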


AKEL Integration Summary

AKEL is fully documented in its own chapter.
This section only summarizes how it integrates with the rest of the architecture:

  • Receives raw input for claims 
  • Proposes scenario drafts 
  • Extracts and summarizes evidence 
  • Gives reliability hints 
  • Suggests draft verdicts 
  • Monitors contradictions 
  • Syncs metadata with trusted nodes 

AKEL runs in parallel to human review — never overrides it.


Federated Architecture

Each FactHarbor node:

  • Has its own dataset (claims, scenarios, evidence, verdicts) 
  • Runs its own AKEL 
  • Maintains local governance and reviewer rules 
  • May partially mirror global or domain-specific data 
  • Contributes to global knowledge clusters 

Nodes synchronize via:

  • Signed version bundles 
  • Merkle-tree lineage structures 
  • Optionally IPFS for evidence 
  • Trust-weighted acceptance 

Benefits:

  • Community independence 
  • Scalability 
  • Resilience 
  • Domain specialization 

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
      → (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
      → Summary View → UI Frontend → User


Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
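
The Build Signed Bundle and Validate Signature steps might be sketched as follows. To stay within the Python standard library, the sketch uses an HMAC with a shared secret as a stand-in; a federated deployment would more plausibly use asymmetric signatures so that peers never share a secret. The bundle layout and function names are hypothetical.

```python
import hashlib
import hmac
import json


def build_signed_bundle(changes, secret: bytes) -> dict:
    """Serialize changes canonically and attach an HMAC-SHA256 signature."""
    # Sorted keys and fixed separators give a stable byte representation.
    payload = json.dumps(changes, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def validate_bundle(bundle: dict, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(
        secret, bundle["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

A receiving node would call `validate_bundle` before deciding whether to merge or fork, and reject the bundle outright if validation fails.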


Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

  • VersionID 
  • ParentVersionID 
  • Timestamp 
  • AuthorType (Human, AI, ExternalNode) 
  • ChangeReason 
  • Signature (optional POC, required in 1.0) 
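
The fields above could be modelled as an immutable record, as in the sketch below. Deriving `version_id` from a SHA-256 hash of the record contents is an assumption made here so that a version cannot be altered without changing its id; the document does not say how VersionIDs are generated. The `content` field and helper names are also hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a version cannot be modified after creation
class Version:
    version_id: str
    parent_version_id: Optional[str]
    timestamp: str
    author_type: str  # "Human", "AI", or "ExternalNode"
    change_reason: str
    content: str      # hypothetical payload field
    signature: Optional[str] = None  # optional in the POC


def new_version(content, parent, timestamp, author_type, change_reason):
    """Create a version whose id is a hash over its parent id and contents."""
    parent_id = parent.version_id if parent else None
    material = json.dumps([parent_id, timestamp, author_type, change_reason, content])
    version_id = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return Version(version_id, parent_id, timestamp, author_type, change_reason, content)


def lineage(version, store):
    """Walk ParentVersionID links back to the root; newest id first."""
    chain = []
    current = version
    while current is not None:
        chain.append(current.version_id)
        parent_id = current.parent_version_id
        current = store.get(parent_id) if parent_id else None
    return chain
```

Here `store` is any mapping from version id to `Version`; in practice it would be a database lookup.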

graph LR
 CLAIM[Claim] -->|edited| EDIT[Edit Record]
 EDIT -->|stores| BEFORE[Before State]
 EDIT -->|stores| AFTER[After State]
 EDIT -->|tracks| WHO[Who Changed]
 EDIT -->|tracks| WHEN[When Changed]
 EDIT -->|tracks| WHY[Why Changed]
 EDIT -->|if needed| RESTORE[Manual Restore]
 RESTORE -->|create new| CLAIM
 style EDIT fill:#ffcccc
 style RESTORE fill:#ccffcc

Versioning Architecture: V1.0 uses a simple audit trail that tracks who, what, when, and why for each change. Before/after values are stored in an edits table. Restores are manual: create a new edit containing the old values. A full versioning system (branching, merging, automatic rollback) is deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history, sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.
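
The V1.0 audit trail described above can be sketched in a few lines. An in-memory list stands in for the edits table, and the class and field names are hypothetical. The key point is that a restore never rewrites history: it appends a new edit whose "after" state is an earlier "before" state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRecord:
    entity_id: str
    before: dict  # state prior to the edit
    after: dict   # state after the edit
    who: str
    when: str
    why: str


class EditLog:
    """In-memory stand-in for the edits table (assumption for illustration)."""

    def __init__(self):
        self.edits = []    # append-only list of EditRecord
        self.current = {}  # entity_id -> current state

    def apply_edit(self, entity_id, new_state, who, when, why):
        """Record the before/after states and update the current state."""
        before = dict(self.current.get(entity_id, {}))
        record = EditRecord(entity_id, before, dict(new_state), who, when, why)
        self.edits.append(record)
        self.current[entity_id] = dict(new_state)
        return record

    def restore(self, edit_index, who, when, why):
        """Manual restore: create a new edit carrying the old values."""
        old = self.edits[edit_index]
        return self.apply_edit(old.entity_id, old.before, who, when, why)
```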