Architecture

Last modified by Robert Schaub on 2025/12/24 20:30

FactHarbor uses a modular-monolith architecture (POC → Beta 0) designed to evolve into a distributed, federated, multi-node system (Release 1.0+).
Modules are strongly separated, versioned, and auditable. All logic is transparent and deterministic.

High-Level System Architecture

FactHarbor is composed of the following major modules:

  • UI Frontend
  • REST API Layer
  • Core Logic Layer
    • Claim Processing 
    • Scenario Engine 
    • Evidence Repository 
    • Verdict Engine 
    • Re-evaluation Engine 
    • Roles / Identity / Reputation
  • AKEL (AI Knowledge Extraction Layer)
  • Federation Layer
  • Workers & Background Jobs
  • Storage Layer (Postgres + VectorDB + ObjectStore)

High-Level Architecture

graph TB
  subgraph Interface_Layer["🖥️ Interface Layer"]
    UI[Web UI<br/>Browse & Submit]
    API[REST API<br/>Programmatic Access]
    AUTH[Authentication<br/>& Authorization]
  end
  subgraph Processing_Layer["⚙️ Processing Layer"]
    AKEL[AKEL Pipeline<br/>Parallel Processing<br/>10-18 seconds]
    LLM[LLM Abstraction Layer<br/>Multi-Provider Support<br/>Anthropic OpenAI Google]
    BG[Background Jobs<br/>Source Scoring,<br/>Cache, Archival]
    QM[Quality Monitoring<br/>Automated Checks]
  end
  subgraph Data_Layer["💾 Data & Storage Layer"]
    PG[(PostgreSQL<br/>Primary Database<br/>All Core Data)]
    REDIS[(Redis<br/>Cache & LLM Config)]
    S3[(S3<br/>Archives)]
  end
  UI --> AUTH
  API --> AUTH
  AUTH --> AKEL
  AUTH --> QM
  AKEL --> LLM
  LLM --> PG
  LLM --> REDIS
  AKEL --> PG
  AKEL --> REDIS
  BG --> PG
  BG --> S3
  QM --> PG
  REDIS --> PG
  style Interface_Layer fill:#e1f5ff
  style Processing_Layer fill:#fff4e1
  style Data_Layer fill:#f0f0f0
  style AKEL fill:#ffcccc
  style LLM fill:#ccffcc
  style PG fill:#9999ff

Three-Layer Architecture - Clean separation with LLM abstraction: Interface Layer (user interactions), Processing Layer (AKEL + LLM Abstraction + background jobs), Data Layer (PostgreSQL primary + Redis cache/config + S3 archives). LLM Abstraction Layer provides provider-agnostic access to Anthropic, OpenAI, Google, and local models with automatic failover.

Key ideas:

  • Core logic is deterministic, auditable, and versioned  
  • AKEL drafts structured outputs but never publishes directly  
  • Workers run long or asynchronous tasks  
  • Storage is separated for scalability and clarity  
  • Federation Layer provides optional distributed operation  

Storage Architecture

FactHarbor separates structured data, embeddings, and evidence files:

  • PostgreSQL — canonical structured entities, all versioning, lineage, signatures  
  • Vector DB (Qdrant or pgvector) — semantic search, duplication detection, cluster mapping  
  • Object Storage — PDFs, datasets, raw evidence, transcripts  
  • Optional (Release 1.0): Redis for caching, IPFS for decentralized object storage  

Storage Architecture

graph TB
  APP[Application<br/>API + AKEL]
  REDIS[Redis Cache<br/>Hot data, sessions,<br/>rate limiting]
  PG[(PostgreSQL<br/>Primary Database<br/>**All core data**<br/>Claims, Evidence,<br/>Sources, Users)]
  S3[(S3 Storage<br/>Old logs,<br/>Backups)]
  BG[Background<br/>Scheduler]
  APP --> REDIS
  REDIS --> PG
  APP --> PG
  PG -->|Backups &<br/>Archives| S3
  BG --> PG
  BG --> S3
  subgraph V10["✅ V1.0 Core (3 systems)"]
    PG
    REDIS
    S3
  end
  subgraph Future["🔮 Optional Future (Add if metrics show need)"]
    PG -.->|If search slow<br/>>500ms| ES[(Elasticsearch<br/>Full-text search)]
    PG -.->|If metrics slow<br/>>1s queries| TS[TimescaleDB<br/>Time-series]
  end
  style PG fill:#9999ff
  style REDIS fill:#ff9999
  style S3 fill:#ff99ff
  style ES fill:#cccccc
  style TS fill:#cccccc
  style V10 fill:#e8f5e9
  style Future fill:#fff3e0

Simplified Storage - PostgreSQL as single primary database for all core data (claims, evidence, sources, users, metrics). Redis for caching, S3 for archives. Elasticsearch and TimescaleDB are optional additions only if performance metrics prove necessary. Start with 3 systems, not 5.

Core Backend Module Architecture

Each module has a clear responsibility and versioned boundaries to allow future extraction into microservices.

1. Claim Processing Module

Responsibilities:

  • Ingest text, URLs, documents, transcripts, federated input  
  • Extract claims (AKEL-assisted)  
  • Normalize structure  
  • Classify (type, domain, evaluability, safety)  
  • Deduplicate via embeddings  
  • Assign to claim clusters  

Flow:
Ingest → Normalize → Classify → Deduplicate → Cluster

2. Scenario Engine

Responsibilities:

  • Create and validate scenarios  
  • Enforce required fields (definitions, assumptions, boundaries...)  
  • Perform safety checks (AKEL-assisted)  
  • Manage versioning and lifecycle  
  • Provide contextual evaluation settings to the Verdict Engine  

Flow:
Create → Validate → Version → Lifecycle → Safety
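The validation step's enforcement of required fields might look like the following sketch. The field list here is an illustrative subset, not the full scenario schema.

```python
# Illustrative subset of the required scenario fields.
REQUIRED_FIELDS = ("definitions", "assumptions", "boundaries")

def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the scenario passes."""
    return [f"missing required field: {name}"
            for name in REQUIRED_FIELDS
            if not scenario.get(name)]
```

In the real engine this check would run before versioning, and AKEL-assisted safety checks would follow it.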

3. Evidence Repository

Responsibilities:

  • Store metadata + files (object store)  
  • Classify evidence  
  • Compute preliminary reliability  
  • Maintain version history  
  • Detect retractions or disputes  
  • Provide structured metadata to the Verdict Engine  

Flow:
Store → Classify → Score → Version → Update/Retract
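The preliminary reliability step can be illustrated as below. The source-type weights are placeholders invented for this sketch, not FactHarbor's actual scoring model; note how a detected retraction overrides the base score.

```python
def preliminary_reliability(source_type: str, retracted: bool) -> float:
    """Illustrative reliability heuristic; the weights are placeholders."""
    weights = {"peer_reviewed": 0.9, "news": 0.6, "blog": 0.3}
    if retracted:
        return 0.0  # retracted evidence carries no weight
    return weights.get(source_type, 0.2)  # unknown source types default low
```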

4. Verdict Engine

Responsibilities:

  • Aggregate scenario-linked evidence  
  • Compute likelihood ranges per scenario
  • Generate reasoning chain  
  • Track uncertainty factors  
  • Maintain verdict version timelines  

Flow:
Aggregate → Compute → Explain → Version → Timeline
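One way the aggregate-and-compute steps could produce a likelihood range is sketched below. The weighting scheme is hypothetical: it takes a reliability-weighted mean of evidence support, then widens the range when average reliability is low.

```python
def likelihood_range(evidence: list[tuple[float, float]]) -> tuple[float, float]:
    """
    evidence: (support, reliability) pairs, both in [0, 1].
    Returns (low, high) likelihood bounds for one scenario: the
    reliability-weighted mean support, widened by residual uncertainty.
    Illustrative only; not FactHarbor's actual aggregation model.
    """
    if not evidence:
        return (0.0, 1.0)  # no evidence: maximal uncertainty
    total_w = sum(r for _, r in evidence)
    mean = sum(s * r for s, r in evidence) / total_w
    uncertainty = 1.0 - total_w / len(evidence)  # low avg reliability -> wide range
    half = uncertainty / 2
    return (max(0.0, mean - half), min(1.0, mean + half))
```

A single fully reliable, fully supporting item yields the tight range (1.0, 1.0); halving its reliability widens the range to (0.75, 1.0).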

5. Re-evaluation Engine

Responsibilities:

  • Listen for upstream changes  
  • Trigger partial or full recomputation  
  • Update verdicts + summary views  
  • Maintain consistency across federated nodes  

Triggers include:

  • Evidence updated or retracted  
  • Scenario definition or assumption changes  
  • Claim type or evaluability changes  
  • Contradiction detection  
  • Federation sync updates  

Flow:
Trigger → Impact Analysis → Recompute → Publish Update
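The trigger and impact-analysis steps can be sketched as a simple event-to-scope mapping. The event names and scopes here are illustrative, not the engine's real taxonomy.

```python
# Hypothetical routing from upstream change events to recomputation scope.
IMPACT = {
    "evidence_updated": "partial",
    "evidence_retracted": "full",
    "scenario_changed": "partial",
    "contradiction_detected": "full",
    "federation_sync": "partial",
}

def plan_reevaluation(event: str) -> dict:
    """Decide how much to recompute for a given upstream change event."""
    scope = IMPACT.get(event, "none")
    return {"event": event, "scope": scope, "publish_update": scope != "none"}
```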

AKEL Integration Summary

AKEL is fully documented in its own chapter; this section covers only its architectural integration:

  • Receives raw input for claims  
  • Proposes scenario drafts  
  • Extracts and summarizes evidence  
  • Gives reliability hints  
  • Suggests draft verdicts  
  • Monitors contradictions  
  • Syncs metadata with trusted nodes  

AKEL runs in parallel to human review — never overrides it.

AKEL Architecture

graph TB
  User[User Submits Content<br/>Text/URL/Single Claim]
  Extract[Claim Extraction<br/>LLM identifies distinct claims]
  AKEL[AKEL Core Processing<br/>Per Claim]
  Evidence[Evidence Gathering]
  Scenario[Scenario Generation]
  Verdict[Verdict Generation]
  Storage[(Storage Layer<br/>PostgreSQL + S3)]
  Queue[Processing Queue<br/>Parallel Claims]
  User --> Extract
  Extract -->|Multiple Claims| Queue
  Extract -->|Single Claim| AKEL
  Queue -->|Process Each| AKEL
  AKEL --> Evidence
  AKEL --> Scenario
  Evidence --> Verdict
  Scenario --> Verdict
  Verdict --> Storage
  style Extract fill:#e1f5ff
  style Queue fill:#fff4e1
  style AKEL fill:#f0f0f0

Federated Architecture

Each FactHarbor node:

  • Has its own dataset (claims, scenarios, evidence, verdicts)  
  • Runs its own AKEL  
  • Maintains local governance and reviewer rules  
  • May partially mirror global or domain-specific data  
  • Contributes to global knowledge clusters  

Nodes synchronize via:

  • Signed version bundles  
  • Merkle-tree lineage structures  
  • Optionally IPFS for evidence  
  • Trust-weighted acceptance  
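The Merkle-tree lineage structures referenced above can be sketched as follows: each version hash becomes a leaf, and peers compare roots to detect divergent lineages cheaply. This is a minimal textbook construction, not the exact tree layout FactHarbor uses.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over version hashes, duplicating the last
    node on odd-sized levels. Two nodes agree on a lineage iff roots match."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```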

Benefits:

  • Community independence  
  • Scalability  
  • Resilience  
  • Domain specialization  

Federation Architecture
This diagram shows independent FactHarbor nodes and the synchronization links between them.

graph LR
  FH1[FactHarbor<br/>Instance 1]
  FH2[FactHarbor<br/>Instance 2]
  FH3[FactHarbor<br/>Instance 3]
  FH1 -.->|V1.0+:<br/>Sync claims| FH2
  FH2 -.->|V1.0+:<br/>Sync claims| FH3
  FH3 -.->|V1.0+:<br/>Sync claims| FH1
  U1[Users] --> FH1
  U2[Users] --> FH2
  U3[Users] --> FH3
  style FH1 fill:#e1f5ff
  style FH2 fill:#e1f5ff
  style FH3 fill:#e1f5ff

Federation Architecture - Future (V1.0+): Independent FactHarbor instances can sync claims for broader reach while maintaining local control.

Request → Verdict Flow

Simple end-to-end flow:

User → UI Frontend → REST API → FactHarbor Core
 → (Claim Processing → Scenario Engine → Evidence Repository → Verdict Engine)
 → Summary View → UI Frontend → User

Federation Sync Workflow

Sequence:

Detect Local Change → Build Signed Bundle → Push to Peers → Validate Signature → Merge or Fork → Trigger Re-evaluation
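The build-bundle and validate-signature steps can be sketched as below. For brevity this uses a shared HMAC secret; real nodes would sign with asymmetric keys, and the payload format here is invented for illustration.

```python
import hashlib
import hmac
import json

NODE_KEY = b"shared-secret"  # illustrative; production would use asymmetric keys

def build_bundle(changes: list[dict]) -> dict:
    """Serialize local changes deterministically and attach a signature."""
    payload = json.dumps(changes, sort_keys=True).encode()
    sig = hmac.new(NODE_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "signature": sig}

def validate_bundle(bundle: dict) -> bool:
    """Peers recompute the signature before merging; mismatch means reject."""
    expected = hmac.new(NODE_KEY, bundle["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

A bundle that fails validation is rejected outright; a valid bundle proceeds to the merge-or-fork decision and may trigger re-evaluation.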

Versioning Architecture

All entities (Claim, Scenario, Evidence, Verdict) use immutable version chains:

  • VersionID  
  • ParentVersionID  
  • Timestamp  
  • AuthorType (Human, AI, ExternalNode)  
  • ChangeReason  
  • Signature (optional POC, required in 1.0)  
graph LR
 CLAIM[Claim] -->|edited| EDIT[Edit Record]
 EDIT -->|stores| BEFORE[Before State]
 EDIT -->|stores| AFTER[After State]
 EDIT -->|tracks| WHO[Who Changed]
 EDIT -->|tracks| WHEN[When Changed]
 EDIT -->|tracks| WHY[Why Changed]
 EDIT -->|if needed| RESTORE[Manual Restore]
 RESTORE -->|create new| CLAIM
 style EDIT fill:#ffcccc
 style RESTORE fill:#ccffcc

Versioning Architecture - Simple audit trail for V1.0: Track who, what, when, why for each change. Store before/after values in edits table. Manual restore if needed (create new edit with old values). Full versioning system (branching, merging, automatic rollback) deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.