Architecture

Version 2.5 by Robert Schaub on 2025/12/11 21:34

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. All core logic is transparent and deterministic.


High-Level System Architecture

FactHarbor is composed of the following major modules:

  • UI Frontend
  • REST API Layer
  • Core Logic Layer
    – Claim Processing  
    – Scenario Engine  
    – Evidence Repository  
    – Verdict Engine  
    – Re-evaluation Engine  
    – Roles / Identity / Reputation
  • AKEL (AI Knowledge Extraction Layer)
  • Federation Layer
  • Workers & Background Jobs
  • Storage Layer (Postgres + VectorDB + ObjectStore)

Key ideas:

  • Core logic is deterministic, auditable, and versioned 
  • AKEL drafts structured outputs but never publishes directly 
  • Workers run long or asynchronous tasks 
  • Storage is separated for scalability and clarity 
  • Federation Layer provides optional distributed operation 

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

  • PostgreSQL — canonical structured entities, all versioning, lineage, signatures 
  • Vector DB (Qdrant or pgvector) — semantic search, duplicate detection, cluster mapping
  • Object Storage — PDFs, datasets, raw evidence, transcripts 
  • Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage 

Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

Claim Processing Module

Responsibilities:

  • Ingest text, URLs, documents, transcripts, federated input 
  • Extract claims (AKEL-assisted) 
  • Normalize structure 
  • Classify (type, domain, evaluability, safety) 
  • Deduplicate via embeddings 
  • Assign to claim clusters 

Flow:  
Ingest → Normalize → Classify → Deduplicate → Cluster
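The flow above can be sketched as a single pass over raw inputs. This is a minimal illustration under stated assumptions, not the FactHarbor implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model (in practice the vector DB would provide similarity search), `classify` uses a placeholder rule, and the 0.9 similarity threshold is an assumption. Near-duplicates are assigned to the cluster of the claim they match.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass


def embed(text: str) -> dict[str, float]:
    # HYPOTHETICAL stand-in for an embedding model: bag-of-words token counts.
    counts: dict[str, float] = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Claim:  # hypothetical record
    text: str
    claim_type: str
    cluster_id: int


def normalize(raw: str) -> str:
    # Collapse runs of whitespace and drop trailing sentence punctuation.
    return re.sub(r"\s+", " ", raw).strip().rstrip(".!?")


def classify(text: str) -> str:
    # Placeholder rule (ASSUMPTION): claims containing a digit count as quantitative.
    return "quantitative" if re.search(r"\d", text) else "qualitative"


def process(raw_inputs: list[str], threshold: float = 0.9) -> list[Claim]:
    representatives: list[dict[str, float]] = []  # first embedding seen per cluster
    claims: list[Claim] = []
    for raw in raw_inputs:                              # Ingest
        text = normalize(raw)                           # Normalize
        claim_type = classify(text)                     # Classify
        vec = embed(text)
        for cluster_id, rep in enumerate(representatives):  # Deduplicate
            if cosine(vec, rep) >= threshold:
                break                                   # near-duplicate: reuse this cluster
        else:                                           # no match: open a new cluster
            representatives.append(vec)
            cluster_id = len(representatives) - 1
        claims.append(Claim(text, claim_type, cluster_id))  # Cluster
    return claims
```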


Scenario Engine

Responsibilities:

  • Create and validate scenarios 
  • Enforce required fields (definitions, assumptions, boundaries...) 
  • Perform safety checks (AKEL-assisted) 
  • Manage versioning and lifecycle 
  • Provide contextual evaluation settings to the Verdict Engine 

Flow:  
Create → Validate → Version → Lifecycle → Safety
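Required-field enforcement can be illustrated with a small validator. The field list is an assumption drawn from the examples in the text (the full list is not given here), and `ScenarioDraft`, `missing_fields`, and `validate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# ASSUMPTION: required fields taken from the examples in the text; the real list may be longer.
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")


@dataclass
class ScenarioDraft:  # hypothetical draft record
    claim_id: str
    fields: dict[str, str] = field(default_factory=dict)


def missing_fields(draft: ScenarioDraft) -> list[str]:
    """Return the required fields that are absent or contain only whitespace."""
    return [name for name in REQUIRED_FIELDS if not draft.fields.get(name, "").strip()]


def validate(draft: ScenarioDraft) -> None:
    """Raise ValueError if any required field is missing; otherwise return silently."""
    missing = missing_fields(draft)
    if missing:
        raise ValueError(f"scenario for claim {draft.claim_id} is missing: {', '.join(missing)}")
```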


Evidence Repository

Responsibilities:

  • Store metadata (PostgreSQL) and files (object storage)
  • Classify evidence 
  • Compute preliminary reliability 
  • Maintain version history 
  • Detect retractions or disputes 
  • Provide structured metadata to the Verdict Engine 

Flow:  
Store → Classify → Score → Version → Update/Retract
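The Version and Update/Retract steps imply that evidence is never edited in place. A minimal sketch, assuming an append-only history per evidence item; the class names, field names, and status values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvidenceVersion:  # hypothetical record; field names are illustrative
    evidence_id: str
    version: int
    object_key: str         # key of the file in object storage
    category: str           # classification label, e.g. "peer_reviewed"
    reliability: float      # preliminary reliability score in [0, 1]
    status: str = "active"  # "active", "disputed" or "retracted"


class EvidenceHistory:
    """Append-only version history for a single evidence item."""

    def __init__(self, first: EvidenceVersion) -> None:
        self.versions: list[EvidenceVersion] = [first]

    @property
    def current(self) -> EvidenceVersion:
        return self.versions[-1]

    def update(self, **changes: object) -> EvidenceVersion:
        # Never mutate an existing version: derive a new one with version + 1.
        new = replace(self.current, version=self.current.version + 1, **changes)
        self.versions.append(new)
        return new

    def retract(self) -> EvidenceVersion:
        # Retraction is just another version; earlier versions stay readable.
        return self.update(status="retracted")
```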


Verdict Engine

Responsibilities:

  • Aggregate scenario-linked evidence 
  • Compute likelihood ranges 
  • Generate reasoning chain 
  • Track uncertainty factors 
  • Maintain verdict version timelines 

Flow:  
Aggregate → Compute → Explain → Version → Timeline
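This chapter does not specify how likelihood ranges are computed. Purely as an illustration, the function below takes the reliability-weighted share of supporting evidence as a point estimate and widens it into a range whose width shrinks as total evidence weight grows. Both the weighting and the `0.5 / sqrt(1 + total)` width are assumptions, not FactHarbor's method, and `EvidenceSignal` is a hypothetical input type.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSignal:   # hypothetical input: one scenario-linked evidence item
    supports: bool      # True if the evidence supports the claim in this scenario
    reliability: float  # weight in (0, 1]


def likelihood_range(signals: list[EvidenceSignal]) -> tuple[float, float]:
    """Illustrative heuristic: weighted support share +/- a width that shrinks with total weight."""
    total = sum(s.reliability for s in signals)
    if total == 0:
        return (0.0, 1.0)  # no usable evidence: report maximal uncertainty
    point = sum(s.reliability for s in signals if s.supports) / total
    half_width = 0.5 / math.sqrt(1.0 + total)  # ASSUMPTION: arbitrary uncertainty model
    return (max(0.0, point - half_width), min(1.0, point + half_width))
```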


Re-evaluation Engine

Responsibilities:

  • Listen for upstream changes 
  • Trigger partial or full recomputation 
  • Update verdicts + summary views 
  • Maintain consistency across federated nodes 

Triggers include:

  • Evidence updated or retracted 
  • Scenario definition or assumption changes 
  • Claim type or evaluability changes 
  • Contradiction detection 
  • Federation sync updates 

Flow:  
Trigger → Impact Analysis → Recompute → Publish Update
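Impact analysis can be modeled as a walk over a dependency graph from the changed entity to every downstream verdict. A minimal sketch, assuming a hypothetical adjacency map `DEPENDENTS` whose keys and edges are illustrative only:

```python
from __future__ import annotations

from collections import deque

# HYPOTHETICAL dependency edges: upstream entity -> entities that depend on it.
DEPENDENTS: dict[str, list[str]] = {
    "evidence:e1": ["scenario:s1", "scenario:s2"],
    "evidence:e2": ["scenario:s2"],
    "scenario:s1": ["verdict:v1"],
    "scenario:s2": ["verdict:v2"],
    "claim:c1": ["scenario:s1", "scenario:s2"],
}


def impact_analysis(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every verdict downstream of a changed entity."""
    seen = {changed}
    queue = deque([changed])
    affected: list[str] = []
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:  # visit each entity once, even with shared dependents
                seen.add(dep)
                queue.append(dep)
                if dep.startswith("verdict:"):
                    affected.append(dep)
    return affected
```

Recomputing only the returned verdicts corresponds to a partial recomputation; a full recomputation would skip the analysis and rebuild every verdict.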


AKEL Integration Summary

AKEL is fully documented in its own chapter.
This section gives only its architectural integration summary:

  • Receives raw input for claims 
  • Proposes scenario drafts 
  • Extracts and summarizes evidence 
  • Gives reliability hints 
  • Suggests draft verdicts 
  • Monitors contradictions 
  • Syncs metadata with trusted nodes 

AKEL runs in parallel with human review and never overrides it.
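The rule that AKEL drafts but never publishes can be enforced as a gate at publish time. A minimal sketch, assuming a hypothetical `Draft` record that reuses the AuthorType values from the Versioning Architecture section; `approve` and `publish` are illustrative names, not an existing API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Draft:  # hypothetical record
    entity_id: str
    author_type: str               # "Human", "AI" or "ExternalNode"
    approved_by: str | None = None
    published: bool = False


def approve(draft: Draft, reviewer: str) -> None:
    # Record the human reviewer who accepted the draft.
    draft.approved_by = reviewer


def publish(draft: Draft) -> None:
    # AKEL output (author_type "AI") can only be published after human approval.
    if draft.author_type == "AI" and draft.approved_by is None:
        raise PermissionError(f"{draft.entity_id}: AI draft requires human review")
    draft.published = True
```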


Federated Architecture

Each FactHarbor node:

  • Has its own dataset (claims, scenarios, evidence, verdicts) 
  • Runs its own AKEL 
  • Maintains local governance and reviewer rules 
  • May partially mirror global or domain-specific data 
  • Contributes to global knowledge clusters 

Nodes synchronize via:

  • Signed version bundles 
  • Merkle-tree lineage structures 
  • IPFS for evidence (optional)
  • Trust-weighted acceptance 
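Merkle-tree lineage lets two nodes compare whole version histories by exchanging a single root hash. The sketch below is a simplified construction (SHA-256, last node duplicated on odd levels, no leaf/node domain separation); FactHarbor's actual tree layout and record serialization are not specified here, and a production design would add domain separation and handle odd levels so that, for example, `[a, b, c]` and `[a, b, c, c]` cannot share a root.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root over serialized version records (simplified; see lead-in)."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Hash adjacent pairs to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two nodes with equal roots hold identical histories; differing roots tell them to descend the tree (or exchange bundles) to find where the lineages diverge.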

Benefits:

  • Community independence 
  • Scalability 
  • Resilience 
  • Domain specialization 

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
      → (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
      → Summary View → UI Frontend → User


Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
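The sync sequence can be sketched with the Python standard library. Important assumption: HMAC-SHA256 with a shared key stands in for real node signatures, which in a federated deployment would more likely be asymmetric (for example Ed25519) so peers can verify bundles without sharing secrets. The bundle layout, the `parent_version_id` field, and the merge rule (merge if the incoming versions build on the local head, otherwise fork) are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def build_bundle(node_id: str, versions: list[dict], key: bytes) -> dict:
    # Canonical JSON so that signer and verifier hash identical bytes.
    payload = json.dumps({"node": node_id, "versions": versions}, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def validate_bundle(bundle: dict, key: bytes) -> dict | None:
    expected = hmac.new(key, bundle["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["signature"]):
        return None  # reject: signature mismatch
    return json.loads(bundle["payload"])


def merge_or_fork(local_head: str, incoming: dict) -> str:
    # Merge if the incoming chain builds on our head; otherwise record a fork.
    parents = {v["parent_version_id"] for v in incoming["versions"]}
    return "merge" if local_head in parents else "fork"
```

Either outcome would then feed the Re-evaluation Engine as a federation sync trigger.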


Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

  • VersionID 
  • ParentVersionID 
  • Timestamp 
  • AuthorType (Human, AI, ExternalNode) 
  • ChangeReason 
  • Signature (optional in POC, required in 1.0)
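The version-chain fields map naturally onto an immutable record. A minimal sketch: the Python names are illustrative renderings of the fields listed above, and `new_version` and `lineage` are hypothetical helpers (the second walks ParentVersionID links back to the first version).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


@dataclass(frozen=True)  # frozen: a version cannot be modified after creation
class Version:
    version_id: str
    parent_version_id: str | None  # None only for the first version of an entity
    timestamp: datetime
    author_type: AuthorType
    change_reason: str
    signature: str | None = None   # optional in POC, required in 1.0


def new_version(version_id: str, parent: str | None, author: AuthorType, reason: str) -> Version:
    # Stamp the version with the current UTC time.
    return Version(version_id, parent, datetime.now(timezone.utc), author, reason)


def lineage(head: str, store: dict[str, Version]) -> list[str]:
    """Follow ParentVersionID links from `head` back to the first version."""
    chain: list[str] = []
    current = store.get(head)
    while current is not None:
        chain.append(current.version_id)
        parent = current.parent_version_id
        current = store.get(parent) if parent is not None else None
    return chain
```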

UCM Configuration Versioning Architecture


```mermaid
graph LR
    ADMIN[UCM Administrator] -->|creates| BLOB[Config Blob - immutable]
    BLOB -->|content-addressed| STORE[(config_blobs)]
    ADMIN -->|activates| ACTIVE[config_active]
    ACTIVE -->|points to| BLOB
    JOB[Analysis Job] -->|snapshots at start| USAGE[config_usage]
    USAGE -->|references| BLOB
    REPORT[Analysis Report] -->|cites| USAGE
```

How UCM Config Versioning Works

| Concept | Description |
| --- | --- |
| config_blobs | Immutable, content-addressed config versions. Each change creates a new blob; old blobs are never deleted. |
| config_active | Pointer to the currently active config blob per config type. Changing this activates a new config version. |
| config_usage | Links each analysis job to the exact config snapshot used. Enables reproducibility. |
| Immutability | Analysis outputs are never edited. To improve results, update UCM config and re-analyse. |
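The three tables can be sketched with SQLite, which the current implementation uses for config.db. The table names follow the diagram; the column layouts and function names are assumptions. Hashing a canonical JSON serialization makes equal configs map to the same blob.

```python
import hashlib
import json
import sqlite3

# Table names follow the diagram; the column layouts are ASSUMPTIONS.
SCHEMA = """
CREATE TABLE config_blobs  (hash TEXT PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE config_active (config_type TEXT PRIMARY KEY,
                            hash TEXT NOT NULL REFERENCES config_blobs(hash));
CREATE TABLE config_usage  (job_id TEXT NOT NULL, config_type TEXT NOT NULL,
                            hash TEXT NOT NULL REFERENCES config_blobs(hash));
"""


def save_blob(db: sqlite3.Connection, config: dict) -> str:
    # Canonical JSON so that equal configs always produce the same hash.
    content = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Existing blobs are never overwritten or deleted; a repeat save is a no-op.
    db.execute("INSERT OR IGNORE INTO config_blobs (hash, content) VALUES (?, ?)", (digest, content))
    return digest


def activate(db: sqlite3.Connection, config_type: str, digest: str) -> None:
    # Only the pointer moves; the previously active blob stays in config_blobs.
    db.execute("INSERT OR REPLACE INTO config_active (config_type, hash) VALUES (?, ?)",
               (config_type, digest))


def snapshot_for_job(db: sqlite3.Connection, job_id: str, config_type: str) -> str:
    # Record, at job start, exactly which blob the job runs with.
    row = db.execute("SELECT hash FROM config_active WHERE config_type = ?", (config_type,)).fetchone()
    if row is None:
        raise LookupError(f"no active config for {config_type!r}")
    db.execute("INSERT INTO config_usage (job_id, config_type, hash) VALUES (?, ?, ?)",
               (job_id, config_type, row[0]))
    return row[0]
```

Because an earlier job's usage row keeps pointing at its blob after the active pointer moves on, that job can be re-run with exactly the same configuration.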

Current Implementation (v2.10.2)

| Feature | Status |
| --- | --- |
| UCM config storage | Implemented (config.db SQLite) |
| Config hot-reload | Implemented (60s TTL) |
| Per-job config snapshots | Implemented (job_config_snapshots) |
| Content-addressed blobs | Implemented (hash-based deduplication) |
| Config activation tracking | Implemented (config_active table) |
| Admin UI for config management | Not yet implemented (CLI/direct DB) |
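Hot-reload with a TTL can be sketched as a cache that re-reads the active config once the cached copy is older than the TTL. `TtlConfigCache` and its injectable clock (which makes the behavior testable) are hypothetical; only the 60 s TTL comes from the table.

```python
from __future__ import annotations

import time
from typing import Callable


class TtlConfigCache:
    """Re-reads the active config at most once per `ttl_seconds`."""

    def __init__(self, loader: Callable[[], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # e.g. a function reading config_active + config_blobs
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._value: dict | None = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Reload on first use or once the cached copy has reached the TTL.
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A job would still snapshot the config it received at start, so a reload mid-job does not change what that job records.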

Design Principles

  • Every config change creates a new immutable blob — no in-place mutation
  • Every analysis job records the config snapshot used at time of execution
  • Reports can be reproduced by re-running with the same config snapshot
  • Config history is the audit trail — who changed what, when, and why
  • Analysis data is never edited — "improve the system, not the data"