Architecture

Version 2.5 by Robert Schaub on 2025/12/11 21:34

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. Core logic is transparent and deterministic.

High-Level System Architecture

FactHarbor is composed of the following major modules:

UI Frontend
REST API Layer
Core Logic Layer
– Claim Processing
– Scenario Engine
– Evidence Repository
– Verdict Engine
– Re-evaluation Engine
– Roles / Identity / Reputation
AKEL (AI Knowledge Extraction Layer)
Federation Layer
Workers & Background Jobs
Storage Layer (Postgres + VectorDB + ObjectStore)

Key ideas:

Core logic is deterministic, auditable, and versioned
AKEL drafts structured outputs but never publishes directly
Workers run long or asynchronous tasks
Storage is separated for scalability and clarity
Federation Layer provides optional distributed operation

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

PostgreSQL — canonical structured entities, all versioning, lineage, signatures
Vector DB (Qdrant or pgvector) — semantic search, duplication detection, cluster mapping
Object Storage — PDFs, datasets, raw evidence, transcripts
Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage

Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

Claim Processing Module

Responsibilities:

Ingest text, URLs, documents, transcripts, federated input
Extract claims (AKEL-assisted)
Normalize structure
Classify (type, domain, evaluability, safety)
Deduplicate via embeddings
Assign to claim clusters

Flow:
Ingest → Normalize → Classify → Deduplicate → Cluster
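The Deduplicate step can be illustrated with a small similarity check. This is a minimal sketch, not the FactHarbor implementation: the function names, the in-memory dictionary of known embeddings, and the 0.9 threshold are all assumptions made for illustration. A real deployment would query the Vector DB instead of looping in Python.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(
    candidate: list[float],
    known: dict[str, list[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the source
) -> Optional[str]:
    """Return the ID of the most similar known claim at or above the threshold."""
    best_id, best_score = None, threshold
    for claim_id, embedding in known.items():
        score = cosine_similarity(candidate, embedding)
        if score >= best_score:
            best_id, best_score = claim_id, score
    return best_id
```

A claim whose embedding matches no known claim returns None and would continue to the Cluster step as a new entry.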

Scenario Engine

Responsibilities:

Create and validate scenarios
Enforce required fields (definitions, assumptions, boundaries...)
Perform safety checks (AKEL-assisted)
Manage versioning and lifecycle
Provide contextual evaluation settings to the Verdict Engine

Flow:
Create → Validate → Version → Lifecycle → Safety
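The Validate step can be sketched as a required-field check. The field names below come from the three examples named in the responsibilities list; that list is open-ended in the source, so the real set of required fields is likely larger. The ScenarioDraft class and missing_fields function are hypothetical names.

```python
from dataclasses import dataclass, field

# Only the three fields the text names; the source list is open-ended.
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")


@dataclass
class ScenarioDraft:
    claim_id: str
    definitions: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    boundaries: list[str] = field(default_factory=list)


def missing_fields(draft: ScenarioDraft) -> list[str]:
    """Return the names of required fields that are still empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(draft, name)]
```

A draft would only move on to versioning once missing_fields returns an empty list.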

Evidence Repository

Responsibilities:

Store metadata + files (object store)
Classify evidence
Compute preliminary reliability
Maintain version history
Detect retractions or disputes
Provide structured metadata to the Verdict Engine

Flow:
Store → Classify → Score → Version → Update/Retract

Verdict Engine

Responsibilities:

Aggregate scenario-linked evidence
Compute likelihood ranges
Generate reasoning chain
Track uncertainty factors
Maintain verdict version timelines

Flow:
Aggregate → Compute → Explain → Version → Timeline
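The source does not specify how likelihood ranges are computed. Purely as an illustration of the Aggregate and Compute steps, the sketch below takes a reliability-weighted mean of per-evidence support scores and widens it by the weighted standard deviation. The scoring scale, the formula, and all names are assumptions, not the Verdict Engine's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    support: float      # assumed scale: 0.0 contradicts, 1.0 fully supports
    reliability: float  # assumed weight between 0.0 and 1.0


def likelihood_range(items: list[EvidenceItem]) -> tuple[float, float]:
    """Return (low, high) as weighted mean -/+ weighted std dev, clamped to [0, 1]."""
    total_weight = sum(item.reliability for item in items)
    if total_weight == 0.0:
        # No usable evidence: the range is maximally uncertain.
        return (0.0, 1.0)
    mean = sum(i.support * i.reliability for i in items) / total_weight
    variance = sum(i.reliability * (i.support - mean) ** 2 for i in items) / total_weight
    spread = math.sqrt(variance)
    return (max(0.0, mean - spread), min(1.0, mean + spread))
```

Because the function is pure, the same evidence set always yields the same range, which is in line with the deterministic core logic described earlier.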

Re-evaluation Engine

Responsibilities:

Listen for upstream changes
Trigger partial or full recomputation
Update verdicts + summary views
Maintain consistency across federated nodes

Triggers include:

Evidence updated or retracted
Scenario definition or assumption changes
Claim type or evaluability changes
Contradiction detection
Federation sync updates

Flow:
Trigger → Impact Analysis → Recompute → Publish Update
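Impact Analysis can be pictured as a walk over a dependency graph: starting from the changed entity, collect everything downstream that needs recomputation. The sketch below assumes such a graph is available as a plain dictionary; the identifier format and the function name are hypothetical.

```python
from collections import deque


def impacted_entities(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Breadth-first list of entities downstream of `changed`, each listed once."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

For example, with dependents = {"evidence:e1": ["verdict:v1"], "verdict:v1": ["summary:c1"]}, a change to "evidence:e1" yields ["verdict:v1", "summary:c1"], which is the set a partial recomputation would cover.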

AKEL Integration Summary

AKEL is fully documented in its own chapter.
This section summarizes only how it integrates with the rest of the architecture:

Receives raw input for claims
Proposes scenario drafts
Extracts and summarizes evidence
Gives reliability hints
Suggests draft verdicts
Monitors contradictions
Syncs metadata with trusted nodes

AKEL runs in parallel to human review — never overrides it.
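The rule that AKEL drafts but never publishes directly can be expressed as a guard on the publish step. This is a minimal sketch under assumed names (Draft, Status, publish); the actual review workflow is described in the AKEL chapter and may involve more states.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"


@dataclass
class Draft:
    content: str
    author_type: str  # "AI" or "Human", mirroring AuthorType in the versioning model
    status: Status = Status.DRAFT


def publish(draft: Draft, approved_by_human: bool) -> Draft:
    """Publish a draft; AI-authored drafts are rejected without human approval."""
    if draft.author_type == "AI" and not approved_by_human:
        raise PermissionError("AI-authored drafts need human approval before publishing")
    draft.status = Status.PUBLISHED
    return draft
```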

Federated Architecture

Each FactHarbor node:

Has its own dataset (claims, scenarios, evidence, verdicts)
Runs its own AKEL
Maintains local governance and reviewer rules
May partially mirror global or domain-specific data
Contributes to global knowledge clusters

Nodes synchronize via:

Signed version bundles
Merkle-tree lineage structures
Optionally IPFS for evidence
Trust-weighted acceptance
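A Merkle-tree lineage structure lets two nodes compare a whole version history by exchanging a single root hash. The sketch below shows one common way to compute such a root with SHA-256. The hash choice, the duplication of the last node on odd levels, and the function name are assumptions; the source does not specify the tree layout.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> str:
    """Hex SHA-256 Merkle root; an odd level duplicates its last node."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

If two nodes compute the same root over their serialized versions of an entity, their histories match; any differing version changes the root.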

Benefits:

Community independence
Scalability
Resilience
Domain specialization

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
→ (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
→ Summary View → UI Frontend → User

Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
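The Build Signed Bundle and Validate Signature steps can be sketched with the Python standard library. To stay self-contained, the example uses an HMAC over a shared secret as a stand-in; a federated deployment would more plausibly use asymmetric signatures so that peers do not share keys. The bundle layout and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    """Serialize deterministically so sender and receiver hash identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_signed_bundle(payload: dict, key: bytes) -> dict:
    """Wrap a payload with an HMAC-SHA256 signature (stand-in for a real signature)."""
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_bundle(bundle: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(bundle["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

A receiving node would run verify_bundle before the Merge or Fork decision and discard bundles that fail the check.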

Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

VersionID
ParentVersionID
Timestamp
AuthorType (Human, AI, ExternalNode)
ChangeReason
Signature (optional in POC, required in Release 1.0)
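An immutable version chain with these fields might look like the sketch below. Deriving VersionID from a hash of the record is an assumption made for illustration, as are the content field and the new_version helper; the source only lists the field names.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Version:
    version_id: str
    parent_version_id: Optional[str]
    timestamp: str
    author_type: str          # "Human", "AI", or "ExternalNode"
    change_reason: str
    content: str              # hypothetical payload field
    signature: Optional[str] = None


def new_version(
    content: str,
    author_type: str,
    change_reason: str,
    parent: Optional[Version] = None,
) -> Version:
    """Create a version linked to its parent; the ID is a hash of the record (assumption)."""
    parent_id = parent.version_id if parent else None
    timestamp = datetime.now(timezone.utc).isoformat()
    record = json.dumps([parent_id, timestamp, author_type, change_reason, content])
    version_id = hashlib.sha256(record.encode("utf-8")).hexdigest()
    return Version(version_id, parent_id, timestamp, author_type, change_reason, content)
```

An edit never modifies an existing Version; it calls new_version with the previous version as parent, so the full history can be walked back through ParentVersionID.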

UCM Configuration Versioning Architecture


graph LR
    ADMIN[UCM Administrator] -->|creates| BLOB[Config Blob - immutable]
    BLOB -->|content-addressed| STORE[(config_blobs)]
    ADMIN -->|activates| ACTIVE[config_active]
    ACTIVE -->|points to| BLOB
    JOB[Analysis Job] -->|snapshots at start| USAGE[config_usage]
    USAGE -->|references| BLOB
    REPORT[Analysis Report] -->|cites| USAGE

How UCM Config Versioning Works

Concept	Description
config_blobs	Immutable, content-addressed config versions. Each change creates a new blob; old blobs are never deleted.
config_active	Pointer to the currently active config blob per config type. Changing this activates a new config version.
config_usage	Links each analysis job to the exact config snapshot used. Enables reproducibility.
Immutability	Analysis outputs are never edited. To improve results, update UCM config and re-analyse.
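The interplay of the three tables can be sketched in memory. The class below mirrors the table names from the description, but the method names, the SHA-256 hash, and the JSON canonicalization are assumptions; the real storage is a SQLite database and may differ in detail.

```python
import hashlib
import json


class ConfigStore:
    """In-memory stand-in for config_blobs, config_active, and config_usage."""

    def __init__(self) -> None:
        self.config_blobs: dict[str, str] = {}   # blob hash -> canonical JSON
        self.config_active: dict[str, str] = {}  # config type -> blob hash
        self.config_usage: dict[str, str] = {}   # job ID -> blob hash

    def put_blob(self, config: dict) -> str:
        """Store a config under its content hash; identical content is deduplicated."""
        canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
        blob_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self.config_blobs.setdefault(blob_hash, canonical)
        return blob_hash

    def activate(self, config_type: str, blob_hash: str) -> None:
        """Point the active config for a type at an existing blob."""
        if blob_hash not in self.config_blobs:
            raise KeyError(f"unknown blob: {blob_hash}")
        self.config_active[config_type] = blob_hash

    def snapshot_for_job(self, job_id: str, config_type: str) -> dict:
        """Record which blob a job starts with and return that config."""
        blob_hash = self.config_active[config_type]
        self.config_usage[job_id] = blob_hash
        return json.loads(self.config_blobs[blob_hash])

    def config_used_by(self, job_id: str) -> dict:
        """Look up the exact config a past job ran with."""
        return json.loads(self.config_blobs[self.config_usage[job_id]])
```

Activating a newer blob after a job has started does not change what config_used_by returns for that job, which is what makes a report reproducible.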

Current Implementation (v2.10.2)

Feature	Status
UCM config storage	Implemented (config.db SQLite)
Config hot-reload	Implemented (60s TTL)
Per-job config snapshots	Implemented (job_config_snapshots)
Content-addressed blobs	Implemented (hash-based deduplication)
Config activation tracking	Implemented (config_active table)
Admin UI for config management	Not yet implemented (config is managed via CLI or direct DB access)

Design Principles

Every config change creates a new immutable blob — no in-place mutation
Every analysis job records the config snapshot used at time of execution
Reports can be reproduced by re-running with the same config snapshot
Config history is the audit trail — who changed what, when, and why
Analysis data is never edited — "improve the system, not the data"