Architecture

Version 2.4 by Robert Schaub on 2025/12/11 21:34

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. All logic is transparent and deterministic.

High-Level System Architecture

FactHarbor is composed of the following major modules:

UI Frontend
REST API Layer
Core Logic Layer
– Claim Processing
– Scenario Engine
– Evidence Repository
– Verdict Engine
– Re-evaluation Engine
– Roles / Identity / Reputation
AKEL (AI Knowledge Extraction Layer)
Federation Layer
Workers & Background Jobs
Storage Layer (Postgres + VectorDB + ObjectStore)

Key ideas:

Core logic is deterministic, auditable, and versioned
AKEL drafts structured outputs but never publishes directly
Workers run long or asynchronous tasks
Storage is separated for scalability and clarity
Federation Layer provides optional distributed operation

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

PostgreSQL: canonical structured entities, all versioning, lineage, signatures
Vector DB (Qdrant or pgvector): semantic search, duplicate detection, cluster mapping
Object Storage: PDFs, datasets, raw evidence, transcripts
Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage

graph TB
 APP[Application<br/>API + AKEL] --> REDIS[Redis Cache<br/>Hot data, sessions,<br/>rate limiting]
 REDIS --> PG[(PostgreSQL<br/>Primary Database<br/>**All core data**<br/>Claims, Evidence,<br/>Sources, Users)]
 APP --> PG
 PG -->|Backups &<br/>Archives| S3[(S3 Storage<br/>Old logs,<br/>Backups)]
 BG[Background<br/>Scheduler] --> PG
 BG --> S3
 subgraph V10["✅ V1.0 Core (3 systems)"]
 PG
 REDIS
 S3
 end
 subgraph Future["🔮 Optional Future (Add if metrics show need)"]
 PG -.->|If search slow<br/>>500ms| ES[(Elasticsearch<br/>Full-text search)]
 PG -.->|If metrics slow<br/>>1s queries| TS[TimescaleDB<br/>Time-series]
 end
 style PG fill:#9999ff
 style REDIS fill:#ff9999
 style S3 fill:#ff99ff
 style ES fill:#cccccc
 style TS fill:#cccccc
 style V10 fill:#e8f5e9
 style Future fill:#fff3e0

Simplified Storage - PostgreSQL is the single primary database for all core data (claims, evidence, sources, users, metrics). Redis handles caching, and S3 holds backups and archives. Elasticsearch and TimescaleDB are optional additions, to be introduced only if performance metrics show they are needed. Start with 3 systems, not 5.

Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

Claim Processing Module

Responsibilities:

Ingest text, URLs, documents, transcripts, federated input
Extract claims (AKEL-assisted)
Normalize structure
Classify (type, domain, evaluability, safety)
Deduplicate via embeddings
Assign to claim clusters

Flow:
Ingest → Normalize → Classify → Deduplicate → Cluster
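The deduplicate and cluster steps of this flow can be illustrated with a minimal sketch. It assumes each claim already carries an embedding vector from some external model; the `Claim` class, the `assign_cluster` helper, and the 0.9 similarity threshold are hypothetical names and values chosen for illustration, not part of FactHarbor.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Claim:
    text: str
    embedding: List[float]            # assumed to come from an external embedding model
    cluster_id: Optional[int] = None


def normalize(text: str) -> str:
    """Collapse runs of whitespace (a stand-in for real normalization)."""
    return " ".join(text.split())


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_cluster(claim: Claim, clusters: List[List[Claim]],
                   threshold: float = 0.9) -> int:
    """Join the first cluster whose first member is similar enough, else open a new one."""
    for cluster_id, members in enumerate(clusters):
        if cosine(claim.embedding, members[0].embedding) >= threshold:
            claim.cluster_id = cluster_id
            members.append(claim)
            return cluster_id
    clusters.append([claim])
    claim.cluster_id = len(clusters) - 1
    return claim.cluster_id
```

Comparing against one representative per cluster keeps the sketch short; a production system would more plausibly query the Vector DB for nearest neighbours.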

Scenario Engine

Responsibilities:

Create and validate scenarios
Enforce required fields (definitions, assumptions, boundaries...)
Perform safety checks (AKEL-assisted)
Manage versioning and lifecycle
Provide contextual evaluation settings to the Verdict Engine

Flow:
Create → Validate → Version → Lifecycle → Safety
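The validate step can be pictured as a required-field check. This is a sketch only: the three field names come from the list above, which is open-ended, and `missing_fields` is a hypothetical helper.

```python
from typing import List

# The source names these three fields and then trails off; the real set may be longer.
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")


def missing_fields(scenario: dict) -> List[str]:
    """Return the required fields that are absent or empty in a scenario draft."""
    return [name for name in REQUIRED_FIELDS if not scenario.get(name)]
```

A draft would pass validation only when `missing_fields(draft)` returns an empty list.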

Evidence Repository

Responsibilities:

Store metadata + files (object store)
Classify evidence
Compute preliminary reliability
Maintain version history
Detect retractions or disputes
Provide structured metadata to the Verdict Engine

Flow:
Store → Classify → Score → Version → Update/Retract

Verdict Engine

Responsibilities:

Aggregate scenario-linked evidence
Compute likelihood ranges
Generate reasoning chain
Track uncertainty factors
Maintain verdict version timelines

Flow:
Aggregate → Compute → Explain → Version → Timeline
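The document does not specify how likelihood ranges are computed. As one possible illustration, the sketch below assumes each piece of evidence is reduced to a (support, reliability) pair in [0, 1] and reports the reliability-weighted mean plus or minus the weighted standard deviation. Both the representation and the formula are assumptions, not the Verdict Engine's actual method.

```python
from math import sqrt
from typing import List, Tuple


def likelihood_range(evidence: List[Tuple[float, float]]) -> Tuple[float, float]:
    """evidence holds hypothetical (support, reliability) pairs, both in [0, 1]."""
    total = sum(weight for _, weight in evidence)
    if total <= 0:
        return (0.0, 1.0)  # nothing usable: report maximal uncertainty
    mean = sum(support * weight for support, weight in evidence) / total
    spread = sqrt(sum(weight * (support - mean) ** 2
                      for support, weight in evidence) / total)
    # Clamp to the valid probability interval.
    return (max(0.0, mean - spread), min(1.0, mean + spread))
```

Evidence that disagrees widens the range, which gives the "track uncertainty factors" responsibility something concrete to report.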

Re-evaluation Engine

Responsibilities:

Listen for upstream changes
Trigger partial or full recomputation
Update verdicts + summary views
Maintain consistency across federated nodes

Triggers include:

Evidence updated or retracted
Scenario definition or assumption changes
Claim type or evaluability changes
Contradiction detection
Federation sync updates

Flow:
Trigger → Impact Analysis → Recompute → Publish Update
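The impact-analysis step can be sketched as a breadth-first walk over a dependency map. The sketch assumes the system records which entities depend on which; the `dependents` mapping and the ID strings are hypothetical.

```python
from collections import deque
from typing import Dict, List


def impacted(changed: str, dependents: Dict[str, List[str]]) -> List[str]:
    """Return every entity downstream of `changed`, in breadth-first order."""
    seen = {changed}
    order: List[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:       # visit each entity once, even if reachable twice
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Recomputing only the returned entities corresponds to partial recomputation; a full recomputation would ignore the map and rebuild everything.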

AKEL Integration Summary

AKEL is fully documented in its own chapter.
This section summarizes only its architectural integration:

Receives raw input for claims
Proposes scenario drafts
Extracts and summarizes evidence
Gives reliability hints
Suggests draft verdicts
Monitors contradictions
Syncs metadata with trusted nodes

AKEL runs in parallel with human review and never overrides it.

Federated Architecture

Each FactHarbor node:

Has its own dataset (claims, scenarios, evidence, verdicts)
Runs its own AKEL
Maintains local governance and reviewer rules
May partially mirror global or domain-specific data
Contributes to global knowledge clusters

Nodes synchronize via:

Signed version bundles
Merkle-tree lineage structures
Optionally IPFS for evidence
Trust-weighted acceptance
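To make the Merkle-tree lineage idea concrete, here is a sketch of a standard Merkle root over a list of version IDs. How FactHarbor actually orders or hashes lineage entries is not stated here, so the SHA-256 choice and the duplicate-last-leaf rule for odd levels are assumptions.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(version_ids: List[str]) -> str:
    """Hash the leaves, then hash adjacent pairs upward until one root remains."""
    if not version_ids:
        raise ValueError("at least one version ID is required")
    level = [_h(v.encode()) for v in version_ids]
    while len(level) > 1:
        if len(level) % 2:              # odd count: duplicate the last node
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```

Two nodes holding the same lineage compute the same root, so a single hash comparison can show whether their histories have diverged.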

Benefits:

Community independence
Scalability
Resilience
Domain specialization

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
→ (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
→ Summary View → UI Frontend → User

Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
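The build, validate, and merge-or-fork steps might look like the sketch below. It uses an HMAC with a shared key purely as a stand-in for a signature; a federated deployment would more plausibly use asymmetric signatures. The bundle layout and the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def build_bundle(changes: List[dict], node_id: str, key: bytes) -> Dict[str, str]:
    """Serialize the changes deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps({"node": node_id, "changes": changes}, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def validate_bundle(bundle: Dict[str, str], key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, bundle["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])


def merge_or_fork(local_head: str, bundle_parent: str) -> str:
    """Merge when the bundle extends the local head; otherwise keep it as a fork."""
    return "merge" if bundle_parent == local_head else "fork"
```

A receiving peer would call `validate_bundle` first and only then decide between merging and forking, after which re-evaluation is triggered.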

Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

VersionID
ParentVersionID
Timestamp
AuthorType (Human, AI, ExternalNode)
ChangeReason
Signature (optional in POC, required in Release 1.0)
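A minimal sketch of such a version record, assuming a frozen dataclass is an acceptable way to model immutability. Deriving `version_id` from a content hash is an illustrative choice, and the `content` field and `new_version` helper are hypothetical additions to the fields listed above.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


@dataclass(frozen=True)               # frozen: fields cannot be reassigned after creation
class Version:
    version_id: str
    parent_version_id: Optional[str]
    timestamp: str
    author_type: AuthorType
    change_reason: str
    content: str                      # hypothetical payload field
    signature: Optional[str] = None   # optional in POC


def new_version(content: str, parent: Optional[Version], author_type: AuthorType,
                change_reason: str, timestamp: Optional[str] = None) -> Version:
    """Append to the chain by creating a new record that points at its parent."""
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    parent_id = parent.version_id if parent else None
    digest = hashlib.sha256(json.dumps([parent_id, ts, content]).encode()).hexdigest()
    return Version(digest, parent_id, ts, author_type, change_reason, content)
```

Because each ID covers the parent ID, altering an earlier version would change every ID after it, which is what makes the chain auditable.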

graph LR
 CLAIM[Claim] -->|edited| EDIT[Edit Record]
 EDIT -->|stores| BEFORE[Before State]
 EDIT -->|stores| AFTER[After State]
 EDIT -->|tracks| WHO[Who Changed]
 EDIT -->|tracks| WHEN[When Changed]
 EDIT -->|tracks| WHY[Why Changed]
 EDIT -->|if needed| RESTORE[Manual Restore]
 RESTORE -->|create new| CLAIM
 style EDIT fill:#ffcccc
 style RESTORE fill:#ccffcc

Versioning Architecture - Simple audit trail for V1.0: track who, what, when, and why for each change. Before/after values are stored in an edits table. Restores are manual (create a new edit with the old values). A full versioning system (branching, merging, automatic rollback) is deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history is sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.
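The V1.0 audit trail can be sketched with an in-memory store. `EditRecord` and `ClaimStore` are hypothetical names; a real implementation would persist edits in the PostgreSQL edits table rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class EditRecord:
    claim_id: int
    before: str
    after: str
    who: str
    when: str
    why: str


class ClaimStore:
    def __init__(self) -> None:
        self.claims: Dict[int, str] = {}
        self.edits: List[EditRecord] = []

    def edit(self, claim_id: int, new_text: str, who: str, why: str) -> EditRecord:
        """Apply a change and record its before/after state."""
        record = EditRecord(claim_id, self.claims.get(claim_id, ""), new_text,
                            who, datetime.now(timezone.utc).isoformat(), why)
        self.edits.append(record)
        self.claims[claim_id] = new_text
        return record

    def restore(self, edit: EditRecord, who: str, why: str) -> EditRecord:
        """Manual restore: write the old value back as a new edit, keeping history intact."""
        return self.edit(edit.claim_id, edit.before, who, why)
```

A restore never deletes history; it appends one more record, so the trail still shows both the bad edit and its reversal.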