Federation & Decentralization

Version 1.1 by Robert Schaub on 2025/12/16 21:42

FactHarbor is designed to operate as a federated network of nodes rather than a single central server.

Decentralization provides:

  • Resilience against censorship or political pressure
  • Autonomy for local governance and moderation
  • Scalability across many independent communities
  • Trust without centralized control
  • Domain specialization (health-focused nodes, energy-focused nodes, etc.)

FactHarbor draws inspiration from the Fediverse but uses stronger structure, versioning, and integrity guarantees.

1. Federation Architecture Diagram

The following diagram shows the complete federated architecture with node components and communication layers.

```mermaid
graph LR
    FH1[FactHarbor<br/>Instance 1]
    FH2[FactHarbor<br/>Instance 2]
    FH3[FactHarbor<br/>Instance 3]
    FH1 -.->|V1.0+: Sync claims| FH2
    FH2 -.->|V1.0+: Sync claims| FH3
    FH3 -.->|V1.0+: Sync claims| FH1
    U1[Users] --> FH1
    U2[Users] --> FH2
    U3[Users] --> FH3
    style FH1 fill:#e1f5ff
    style FH2 fill:#e1f5ff
    style FH3 fill:#e1f5ff
```

Federation Architecture - Future (V1.0+): Independent FactHarbor instances can sync claims for broader reach while maintaining local control.

2. Federated FactHarbor Nodes

Each FactHarbor instance ("node") maintains:

  • Its own database
  • Its own AKEL instance
  • Its own reviewers, experts, and contributors
  • Its own governance rules

Nodes exchange structured information:

  • Claims
  • Scenarios
  • Evidence metadata (not necessarily full files)
  • Verdicts (optional)
  • Hashes and signatures for integrity

Nodes choose which external nodes they trust.

3. Global Identifiers

Every entity receives a globally unique, linkable identifier.

Format:  
`factharbor://node_url/type/local_id`

Example:  
`factharbor://factharbor.energy/claim/CLM-55812`

Supported types:

  • `claim`
  • `scenario`
  • `evidence`
  • `verdict`
  • `user` (optional)
  • `cluster`

Properties:

  • Globally consistent
  • Human-readable
  • Hash-derived
  • Independent of database internals
  • URL-resolvable (future enhancement)

This allows cross-node references and prevents identifier collisions in federated environments.
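
To make the format concrete, here is a minimal Python sketch that builds and parses these identifiers. The `GlobalID` class and function names are illustrative, not part of the specification.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

SUPPORTED_TYPES = {"claim", "scenario", "evidence", "verdict", "user", "cluster"}

@dataclass(frozen=True)
class GlobalID:
    node_url: str   # e.g. "factharbor.energy"
    type: str       # one of SUPPORTED_TYPES
    local_id: str   # e.g. "CLM-55812"

    def __str__(self) -> str:
        return f"factharbor://{self.node_url}/{self.type}/{self.local_id}"

def parse_global_id(uri: str) -> GlobalID:
    parsed = urlparse(uri)
    if parsed.scheme != "factharbor":
        raise ValueError(f"not a factharbor URI: {uri}")
    type_, _, local_id = parsed.path.lstrip("/").partition("/")
    if type_ not in SUPPORTED_TYPES or not local_id:
        raise ValueError(f"malformed factharbor URI: {uri}")
    return GlobalID(parsed.netloc, type_, local_id)

# Round-trips the example from above.
gid = parse_global_id("factharbor://factharbor.energy/claim/CLM-55812")
assert str(gid) == "factharbor://factharbor.energy/claim/CLM-55812"
```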

4. Trust Model

Each node maintains a trust table defining relationships with other nodes:

4.1 Trust Levels

Trusted Nodes:

  • Claims auto-imported
  • Scenarios accepted without re-review
  • Evidence considered valid
  • Verdicts displayed to users
  • High synchronization priority

Neutral Nodes:

  • Claims imported but flagged for review
  • Scenarios require local validation
  • Evidence requires re-assessment
  • Verdicts shown with "external node" disclaimer
  • Normal synchronization priority

Untrusted Nodes:

  • Claims quarantined, manual import only
  • Scenarios rejected by default
  • Evidence not accepted
  • Verdicts not displayed
  • No automatic synchronization

4.2 What Trust Affects

  • Auto-import: Whether claims/scenarios are automatically added
  • Re-review requirements: Whether local reviewers must validate
  • Verdict display: Whether external verdicts are shown to users
  • Synchronization frequency: How often data is exchanged
  • Reputation signals: How external reputation is interpreted

4.3 Local Trust Authority

Each node's governance team decides:

  • Which nodes to trust
  • Trust level criteria
  • Trust escalation/degradation rules
  • Dispute resolution with partner nodes

Trust is local and autonomous - no global trust registry exists.
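
As a rough illustration, a node-local trust table and its policy effects could look like the following Python sketch. The three level names come from section 4.1; the schema, defaults, and partner node are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class TrustLevel(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"

@dataclass(frozen=True)
class TrustPolicy:
    auto_import: bool          # claims/scenarios added without manual import
    requires_re_review: bool   # local reviewers must validate
    show_verdicts: bool        # external verdicts shown to users
    sync_priority: int         # 0 = no automatic synchronization

POLICIES = {
    TrustLevel.TRUSTED:   TrustPolicy(True,  False, True,  2),
    TrustLevel.NEUTRAL:   TrustPolicy(True,  True,  True,  1),  # shown with disclaimer
    TrustLevel.UNTRUSTED: TrustPolicy(False, True,  False, 0),
}

# Each node keeps its own table; there is no global registry.
trust_table: dict[str, TrustLevel] = {
    "factharbor.energy": TrustLevel.TRUSTED,   # hypothetical partner
}

def policy_for(node_url: str) -> TrustPolicy:
    # Unknown nodes default to untrusted.
    return POLICIES[trust_table.get(node_url, TrustLevel.UNTRUSTED)]
```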

5. Data Sharing Model

5.1 What Nodes Share

Always shared (if federation enabled):

  • Claims and claim clusters
  • Scenario structures
  • Evidence metadata and content hashes
  • Integrity signatures

Optionally shared:

  • Full evidence files (large documents)
  • Verdicts (nodes may choose to keep verdicts local)
  • Vector embeddings
  • Scenario templates
  • AKEL distilled knowledge

Never shared:

  • Internal user lists
  • Reviewer comments and internal discussions
  • Governance decisions and meeting notes
  • Access control data
  • Private or sensitive content marked as local-only

5.2 Large Evidence Files

Evidence files are (see the sketch after this list):

  • Stored locally by default
  • Referenced via a global content hash
  • Optionally served through IPFS
  • Accessible via direct peer-to-peer transfer
  • Optionally stored in S3-compatible object storage
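
A minimal sketch of the content-hash reference, assuming SHA-256 (the model requires a global content hash but does not name an algorithm):

```python
import hashlib

def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large evidence never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# The hash travels in federation bundles; the bytes stay local (or in
# IPFS / S3) and are fetched on demand by peers that want the full file.
```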

6. Synchronization Protocol

Nodes exchange data using multiple synchronization methods:

6.1 Push-Based Synchronization

Mechanism: Webhooks

When local content changes:

  1. Node builds signed bundle
  2. Sends webhook notification to subscribed nodes
  3. Remote nodes fetch bundle
  4. Remote nodes validate and import

Use case: Real-time updates for trusted partners
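
A sketch of the push path under stated assumptions: the webhook endpoint paths, payload fields, and `SELF_URL` are hypothetical; only the mechanism (a notification followed by a fetch of the signed bundle) comes from the steps above.

```python
import json
import urllib.request

SELF_URL = "factharbor.example"  # this node's public address (assumption)

def notify_subscriber(subscriber_url: str, bundle_id: str, bundle_hash: str) -> int:
    # The receiver fetches and validates the bundle itself (steps 3-4 above).
    payload = json.dumps({
        "event": "bundle.created",
        "bundle_id": bundle_id,
        "bundle_hash": bundle_hash,  # lets the receiver verify its fetch
        "fetch_url": f"https://{SELF_URL}/federation/bundles/{bundle_id}",
    }).encode()
    req = urllib.request.Request(
        f"https://{subscriber_url}/federation/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```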

6.2 Pull-Based Synchronization

Mechanism: Scheduled polling

Nodes periodically:

  1. Query partner nodes for updates
  2. Fetch changed entities since last sync
  3. Validate and import
  4. Store sync checkpoint

Use case: Regular batch updates, lower trust nodes
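
A sketch of one poll cycle, assuming a hypothetical `/federation/changes` endpoint that accepts a `since` parameter and returns bundles plus a `next_checkpoint`:

```python
import json
import urllib.request

def pull_updates(partner_url: str, checkpoint: str | None) -> tuple[list[dict], str | None]:
    """Fetch bundles changed since `checkpoint`; the caller validates and imports."""
    query = f"?since={checkpoint}" if checkpoint else ""
    with urllib.request.urlopen(
        f"https://{partner_url}/federation/changes{query}", timeout=30
    ) as resp:
        changes = json.load(resp)
    # Persist next_checkpoint only after a successful import pass, so the
    # next scheduled poll resumes exactly where this one stopped.
    return changes["bundles"], changes.get("next_checkpoint", checkpoint)
```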

6.3 Subscription-Based Synchronization

Mechanism: WebSub-like protocol

Nodes subscribe to:

  • Specific claim clusters
  • Specific domains (medical, energy, etc.)
  • Specific scenario types
  • Verdict updates

Publisher pushes updates only to subscribers.

Use case: Selective federation, domain specialization
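
Publisher-side topic matching might look like this sketch; the filters (domains and entity kinds) mirror the subscription list above, and all endpoint URLs are hypothetical.

```python
# Subscriptions keyed by callback URL; each filter mirrors the list above.
SUBSCRIPTIONS = {
    "https://factharbor.health/federation/webhook": {
        "domains": {"medical"}, "kinds": {"claim", "verdict"},
    },
    "https://factharbor.energy/federation/webhook": {
        "domains": {"energy"}, "kinds": {"claim", "scenario", "verdict"},
    },
}

def subscribers_for(update: dict) -> list[str]:
    return [
        callback
        for callback, flt in SUBSCRIPTIONS.items()
        if update["domain"] in flt["domains"] and update["kind"] in flt["kinds"]
    ]

# Example: a new energy-domain verdict is pushed only to the energy node.
targets = subscribers_for({"domain": "energy", "kind": "verdict"})
```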

6.4 Large Asset Transfer

For files >10MB:

  • S3-compatible object storage
  • IPFS (content-addressed)
  • Direct peer-to-peer transfer
  • Chunked HTTP transfer with resume support (sketched below)
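
For the chunked-HTTP option, a minimal resume sketch using a standard `Range` request (the other transports have their own APIs):

```python
import os
import urllib.request

def fetch_resumable(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    # Resume from whatever was already downloaded.
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        mode = "ab" if resp.status == 206 else "wb"  # server may ignore Range
        with open(dest, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```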

7. Federation Sync Workflow

Complete synchronization sequence for creating and sharing new content:

7.1 Step 1: Local Node Creates New Versions

User or AKEL creates:

  • New claim version
  • New scenario version
  • New evidence version
  • New verdict version

All changes tracked with:

  • VersionID
  • ParentVersionID
  • AuthorType
  • Timestamp
  • JustificationText

7.2 Step 2: Federation Layer Builds Signed Bundle

Federation layer packages:

  • Entity data (claim, scenario, evidence metadata, verdict)
  • Version lineage (ParentVersionID chain)
  • Cryptographic signatures
  • Node provenance information
  • Trust metadata

Bundle format (see the sketch after this list):

  • JSON-LD for structured data
  • Content-addressed hashes
  • Digital signatures for integrity
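
A sketch of the bundle builder under stated assumptions: SHA-256 for the content address and Ed25519 (via the `cryptography` package) for the signature. The workflow fixes the ingredients, not the algorithms or field names.

```python
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def build_bundle(entity: dict, lineage: list[str], node_url: str,
                 key: Ed25519PrivateKey) -> dict:
    payload = {
        "@context": "https://factharbor.example/ns/federation",  # hypothetical context
        "entity": entity,        # claim / scenario / evidence metadata / verdict
        "lineage": lineage,      # ParentVersionID chain, oldest first
        "origin": node_url,      # node provenance
    }
    # Canonical serialization so sender and receiver hash identical bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return {
        **payload,
        "hash": hashlib.sha256(canonical).hexdigest(),  # content address
        "signature": key.sign(canonical).hex(),         # proof of origin
    }
```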

7.3 Step 3: Bundle Includes Required Data

Each bundle contains:

  • Claims: Full claim text, classification, domain
  • Scenarios: Definitions, assumptions, boundaries
  • Evidence metadata: Source URLs, hashes, reliability scores (not always full files)
  • Verdicts: Likelihood ranges, uncertainty, reasoning chains
  • Lineage: Version history, parent relationships
  • Signatures: Cryptographic proof of origin

7.4 Step 4: Bundle Pushed to Trusted Neighbor Nodes

Based on trust table:

  • Push to trusted nodes immediately
  • Queue for neutral nodes (batched)
  • Skip untrusted nodes

Push methods:

  • Webhook notification
  • Direct API call
  • Pub/Sub message queue

7.5 Step 5: Remote Nodes Validate Lineage and Signatures

Receiving node:

  1. Verifies cryptographic signatures
  2. Validates version lineage (ParentVersionID chain)
  3. Checks for conflicts with local data
  4. Validates data structure and required fields
  5. Applies local trust policies

Validation failures → reject or quarantine bundle
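
The receiving side of the same sketch verifies the content address and signature first; the lineage, conflict, schema, and trust-policy checks (steps 2-5) would follow before import.

```python
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def validate_bundle(bundle: dict, sender_key: Ed25519PublicKey) -> bool:
    payload = {k: v for k, v in bundle.items() if k not in ("hash", "signature")}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    if hashlib.sha256(canonical).hexdigest() != bundle["hash"]:
        return False                                  # content address mismatch
    try:
        sender_key.verify(bytes.fromhex(bundle["signature"]), canonical)
    except InvalidSignature:
        return False                                  # forged or corrupted bundle
    # Lineage walk, conflict detection, schema checks, and local trust
    # policies (steps 2-5) would run here before the bundle is accepted.
    return True
```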

7.6 Step 6: Accept or Branch Versions

Accept (if validation passes):

  • Import new versions
  • Maintain provenance metadata
  • Link to local related entities
  • Update local indices

Branch (if conflict detected):

  • Create parallel version tree
  • Mark as "external branch"
  • Allow local reviewers to merge or reject
  • Preserve both version histories

Reject (if validation fails):

  • Log rejection reason
  • Notify source node (optional)
  • Quarantine for manual review (optional)
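
A minimal sketch of this three-way decision, assuming `local_heads` maps an entity's global ID to the VersionID at the tip of its local version tree:

```python
from enum import Enum

class SyncOutcome(Enum):
    ACCEPT = "accept"   # lineage extends the local version tree
    BRANCH = "branch"   # diverges: keep both histories as an "external branch"
    REJECT = "reject"   # failed validation: log, optionally quarantine

def classify(bundle: dict, valid: bool, local_heads: dict[str, str]) -> SyncOutcome:
    if not valid:
        return SyncOutcome.REJECT
    parent = bundle["lineage"][-1] if bundle["lineage"] else None
    local_head = local_heads.get(bundle["entity"]["global_id"])
    if local_head is None or parent == local_head:
        return SyncOutcome.ACCEPT   # new entity, or a clean fast-forward
    return SyncOutcome.BRANCH       # conflict: reviewers later merge or reject
```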

7.7 Step 7: Local Re-evaluation Runs if Required

After import, local node checks:

  • Does new evidence affect existing verdicts?
  • Do new scenarios require re-assessment?
  • Are there contradictions with local content?

If yes:

  • Trigger AKEL re-evaluation
  • Queue for reviewer attention
  • Update affected verdicts
  • Notify users following related content

8. Cross-Node AI Knowledge Exchange

Each node runs its own AKEL instance and may exchange AI-derived knowledge:

8.1 What Can Be Shared

Vector embeddings:

  • For cross-node claim clustering
  • For semantic search alignment
  • Never includes training data

Canonical claim forms:

  • Normalized claim text
  • Standard phrasing templates
  • Domain-specific formulations

Scenario templates:

  • Reusable scenario structures
  • Common assumption patterns
  • Evaluation method templates

Contradiction alerts:

  • Detected conflicts between claims
  • Evidence conflicts across nodes
  • Scenario incompatibilities

Metadata and insights:

  • Aggregate quality metrics
  • Reliability signal extraction
  • Bubble detection patterns

8.2 What Can NEVER Be Shared

Model weights: No sharing of trained model parameters

Training data: No sharing of full training datasets

Local governance overrides: AKEL suggestions can be overridden locally

User behavior data: No cross-node tracking

Internal review discussions: Private content stays private

8.3 Benefits of AI Knowledge Exchange

  • Reduced duplication across nodes
  • Improved claim clustering accuracy
  • Faster contradiction detection
  • Shared scenario libraries
  • Cross-node quality improvements

8.4 Local Control Maintained

  • Nodes accept or reject shared AI knowledge
  • Human reviewers can override any AKEL suggestion
  • Local governance always has final authority
  • No external AI control over local content
  • Privacy-preserving knowledge exchange

9. Decentralized Processing

Each node independently performs:

  • AKEL processing
  • Scenario drafting and validation
  • Evidence review
  • Verdict calculation
  • Truth landscape summarization

Nodes can specialize:

  • Health-focused node with medical experts
  • Energy-focused node with domain knowledge
  • Small node delegating scenario libraries to partners
  • Regional node with language/culture specialization

Optional data sharing includes:

  • Embeddings for clustering
  • Claim clusters for alignment
  • Scenario templates for efficiency
  • Verdict comparison metadata

10. Scaling to Thousands of Users

Nodes scale independently through:

  • Horizontally scalable API servers
  • Worker pools for AKEL tasks
  • Hybrid storage (local + S3/IPFS)
  • Redis caching for performance
  • Sharded or partitioned databases

Federation allows effectively unlimited horizontal scaling by adding new nodes.

Communities may form:

  • Domain-specific nodes (epidemiology, energy, climate)
  • Language or region-based nodes
  • NGO or institutional nodes
  • Private organizational nodes
  • Academic research nodes

Nodes cooperate through:

  • Scenario library sharing
  • Shared or overlapping claim clusters
  • Expert delegation between nodes
  • Distributed AKEL task support
  • Cross-node quality audits

11. Federation and Release 1.0

POC: Single node, optional federation experiments

Beta 0: 2-3 nodes, basic federation protocol

Release 1.0: Full federation support with:

  • Robust synchronization protocol
  • Trust model implementation
  • Cross-node AI knowledge exchange
  • Federated search and discovery
  • Distributed audit collaboration
  • Inter-node expert consultation

12. Related Pages