Federation & Decentralization

Version 1.1 by Robert Schaub on 2025/12/16 21:42

FactHarbor is designed to operate as a federated network of nodes rather than a single central server.

Decentralization provides:

  • Resilience against censorship or political pressure
  • Autonomy for local governance and moderation
  • Scalability across many independent communities
  • Trust without centralized control
  • Domain specialization (health-focused nodes, energy-focused nodes, etc.)

FactHarbor draws inspiration from the Fediverse but uses stronger structure, versioning, and integrity guarantees.

1. Federation Architecture Diagram

The following diagram shows the complete federated architecture with node components and communication layers.

```mermaid
graph LR
    FH1[FactHarbor<br/>Instance 1]
    FH2[FactHarbor<br/>Instance 2]
    FH3[FactHarbor<br/>Instance 3]
    FH1 -.->|V1.0+: Sync claims| FH2
    FH2 -.->|V1.0+: Sync claims| FH3
    FH3 -.->|V1.0+: Sync claims| FH1
    U1[Users] --> FH1
    U2[Users] --> FH2
    U3[Users] --> FH3
    style FH1 fill:#e1f5ff
    style FH2 fill:#e1f5ff
    style FH3 fill:#e1f5ff
```

Federation Architecture - Future (V1.0+): Independent FactHarbor instances can sync claims for broader reach while maintaining local control.

2. Federated FactHarbor Nodes

Each FactHarbor instance ("node") maintains:

  • Its own database
  • Its own AKEL instance
  • Its own reviewers, experts, and contributors
  • Its own governance rules

Nodes exchange structured information:

  • Claims
  • Scenarios
  • Evidence metadata (not necessarily full files)
  • Verdicts (optional)
  • Hashes and signatures for integrity

Nodes choose which external nodes they trust.

3. Global Identifiers

Every entity receives a globally unique, linkable identifier.

Format:  
`factharbor://node_url/type/local_id`

Example:  
`factharbor://factharbor.energy/claim/CLM-55812`

Supported types:

  • `claim`
  • `scenario`
  • `evidence`
  • `verdict`
  • `user` (optional)
  • `cluster`

Properties:

  • Globally consistent
  • Human-readable
  • Hash-derived
  • Independent of database internals
  • URL-resolvable (future enhancement)

This allows cross-node references and prevents identifier collisions in federated environments.
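
To make the format concrete, here is a minimal Python sketch that builds and parses these identifiers. The `GlobalID` class and function names are illustrative, not part of the specification.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

SUPPORTED_TYPES = {"claim", "scenario", "evidence", "verdict", "user", "cluster"}

@dataclass(frozen=True)
class GlobalID:
    node_url: str   # e.g. "factharbor.energy"
    type: str       # one of SUPPORTED_TYPES
    local_id: str   # e.g. "CLM-55812"

    def __str__(self) -> str:
        return f"factharbor://{self.node_url}/{self.type}/{self.local_id}"

def parse_global_id(uri: str) -> GlobalID:
    parsed = urlparse(uri)
    if parsed.scheme != "factharbor":
        raise ValueError(f"not a factharbor URI: {uri}")
    type_, _, local_id = parsed.path.lstrip("/").partition("/")
    if type_ not in SUPPORTED_TYPES or not local_id:
        raise ValueError(f"malformed factharbor URI: {uri}")
    return GlobalID(parsed.netloc, type_, local_id)

# Round-trips the example from above.
gid = parse_global_id("factharbor://factharbor.energy/claim/CLM-55812")
assert str(gid) == "factharbor://factharbor.energy/claim/CLM-55812"
```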

4. Trust Model

Each node maintains a trust table defining relationships with other nodes:

4.1 Trust Levels

Trusted Nodes:

  • Claims auto-imported
  • Scenarios accepted without re-review
  • Evidence considered valid
  • Verdicts displayed to users
  • High synchronization priority

Neutral Nodes:

  • Claims imported but flagged for review
  • Scenarios require local validation
  • Evidence requires re-assessment
  • Verdicts shown with "external node" disclaimer
  • Normal synchronization priority

Untrusted Nodes:

  • Claims quarantined, manual import only
  • Scenarios rejected by default
  • Evidence not accepted
  • Verdicts not displayed
  • No automatic synchronization

4.2 What Trust Affects

  • Auto-import: Whether claims/scenarios are automatically added
  • Re-review requirements: Whether local reviewers must validate
  • Verdict display: Whether external verdicts are shown to users
  • Synchronization frequency: How often data is exchanged
  • Reputation signals: How external reputation is interpreted

4.3 Local Trust Authority

Each node's governance team decides:

  • Which nodes to trust
  • Trust level criteria
  • Trust escalation/degradation rules
  • Dispute resolution with partner nodes

Trust is local and autonomous - no global trust registry exists.
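
As a rough illustration, a node-local trust table and its policy effects could look like the following Python sketch. The three level names come from section 4.1; the schema, defaults, and partner node are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class TrustLevel(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"

@dataclass(frozen=True)
class TrustPolicy:
    auto_import: bool          # claims/scenarios added without manual import
    requires_re_review: bool   # local reviewers must validate
    show_verdicts: bool        # external verdicts shown to users
    sync_priority: int         # 0 = no automatic synchronization

POLICIES = {
    TrustLevel.TRUSTED:   TrustPolicy(True,  False, True,  2),
    TrustLevel.NEUTRAL:   TrustPolicy(True,  True,  True,  1),  # shown with disclaimer
    TrustLevel.UNTRUSTED: TrustPolicy(False, True,  False, 0),
}

# Each node keeps its own table; there is no global registry.
trust_table: dict[str, TrustLevel] = {
    "factharbor.energy": TrustLevel.TRUSTED,   # hypothetical partner
}

def policy_for(node_url: str) -> TrustPolicy:
    # Unknown nodes default to untrusted.
    return POLICIES[trust_table.get(node_url, TrustLevel.UNTRUSTED)]
```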

5. Data Sharing Model

5.1 What Nodes Share

Always shared (if federation enabled):

  • Claims and claim clusters
  • Scenario structures
  • Evidence metadata and content hashes
  • Integrity signatures

Optionally shared:

  • Full evidence files (large documents)
  • Verdicts (nodes may choose to keep verdicts local)
  • Vector embeddings
  • Scenario templates
  • AKEL distilled knowledge

Never shared:

  • Internal user lists
  • Reviewer comments and internal discussions
  • Governance decisions and meeting notes
  • Access control data
  • Private or sensitive content marked as local-only

5.2 Large Evidence Files

Evidence files are (see the sketch after this list):

  • Stored locally by default
  • Referenced via a global content hash
  • Optionally served through IPFS
  • Accessible via direct peer-to-peer transfer
  • Optionally stored in S3-compatible object storage
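
A minimal sketch of the content-hash reference, assuming SHA-256 (the model requires a global content hash but does not name an algorithm):

```python
import hashlib

def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large evidence never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# The hash travels in federation bundles; the bytes stay local (or in
# IPFS / S3) and are fetched on demand by peers that want the full file.
```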

6. Synchronization Protocol

Nodes exchange data using multiple synchronization methods:

6.1 Push-Based Synchronization

Mechanism: Webhooks

When local content changes:

  1. Node builds signed bundle
  2. Sends webhook notification to subscribed nodes
  3. Remote nodes fetch bundle
  4. Remote nodes validate and import

Use case: Real-time updates for trusted partners
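
A sketch of the push path under stated assumptions: the webhook endpoint paths, payload fields, and `SELF_URL` are hypothetical; only the mechanism (a notification followed by a fetch of the signed bundle) comes from the steps above.

```python
import json
import urllib.request

SELF_URL = "factharbor.example"  # this node's public address (assumption)

def notify_subscriber(subscriber_url: str, bundle_id: str, bundle_hash: str) -> int:
    # The receiver fetches and validates the bundle itself (steps 3-4 above).
    payload = json.dumps({
        "event": "bundle.created",
        "bundle_id": bundle_id,
        "bundle_hash": bundle_hash,  # lets the receiver verify its fetch
        "fetch_url": f"https://{SELF_URL}/federation/bundles/{bundle_id}",
    }).encode()
    req = urllib.request.Request(
        f"https://{subscriber_url}/federation/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```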

6.2 Pull-Based Synchronization

Mechanism: Scheduled polling

Nodes periodically:

  1. Query partner nodes for updates
  2. Fetch changed entities since last sync
  3. Validate and import
  4. Store sync checkpoint

Use case: Regular batch updates, lower trust nodes
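
A sketch of one poll cycle, assuming a hypothetical `/federation/changes` endpoint that accepts a `since` parameter and returns bundles plus a `next_checkpoint`:

```python
import json
import urllib.request

def pull_updates(partner_url: str, checkpoint: str | None) -> tuple[list[dict], str | None]:
    """Fetch bundles changed since `checkpoint`; the caller validates and imports."""
    query = f"?since={checkpoint}" if checkpoint else ""
    with urllib.request.urlopen(
        f"https://{partner_url}/federation/changes{query}", timeout=30
    ) as resp:
        changes = json.load(resp)
    # Persist next_checkpoint only after a successful import pass, so the
    # next scheduled poll resumes exactly where this one stopped.
    return changes["bundles"], changes.get("next_checkpoint", checkpoint)
```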

6.3 Subscription-Based Synchronization

Mechanism: WebSub-like protocol

Nodes subscribe to:

  • Specific claim clusters
  • Specific domains (medical, energy, etc.)
  • Specific scenario types
  • Verdict updates

Publisher pushes updates only to subscribers.

Use case: Selective federation, domain specialization
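
Publisher-side topic matching might look like this sketch; the filters (domains and entity kinds) mirror the subscription list above, and all endpoint URLs are hypothetical.

```python
# Subscriptions keyed by callback URL; each filter mirrors the list above.
SUBSCRIPTIONS = {
    "https://factharbor.health/federation/webhook": {
        "domains": {"medical"}, "kinds": {"claim", "verdict"},
    },
    "https://factharbor.energy/federation/webhook": {
        "domains": {"energy"}, "kinds": {"claim", "scenario", "verdict"},
    },
}

def subscribers_for(update: dict) -> list[str]:
    return [
        callback
        for callback, flt in SUBSCRIPTIONS.items()
        if update["domain"] in flt["domains"] and update["kind"] in flt["kinds"]
    ]

# Example: a new energy-domain verdict is pushed only to the energy node.
targets = subscribers_for({"domain": "energy", "kind": "verdict"})
```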

6.4 Large Asset Transfer

For files >10MB:

  • S3-compatible object storage
  • IPFS (content-addressed)
  • Direct peer-to-peer transfer
  • Chunked HTTP transfer with resume support (sketched below)
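
For the chunked-HTTP option, a minimal resume sketch using a standard `Range` request (the other transports have their own APIs):

```python
import os
import urllib.request

def fetch_resumable(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    # Resume from whatever was already downloaded.
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        mode = "ab" if resp.status == 206 else "wb"  # server may ignore Range
        with open(dest, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```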

7. Federation Sync Workflow

Complete synchronization sequence for creating and sharing new content:

7.1 Step 1: Local Node Creates New Versions

User or AKEL creates:

  • New claim version
  • New scenario version
  • New evidence version
  • New verdict version

All changes tracked with:

  • VersionID
  • ParentVersionID
  • AuthorType
  • Timestamp
  • JustificationText

7.2 Step 2: Federation Layer Builds Signed Bundle

Federation layer packages:

  • Entity data (claim, scenario, evidence metadata, verdict)
  • Version lineage (ParentVersionID chain)
  • Cryptographic signatures
  • Node provenance information
  • Trust metadata

Bundle format (see the sketch after this list):

  • JSON-LD for structured data
  • Content-addressed hashes
  • Digital signatures for integrity
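
A sketch of the bundle builder under stated assumptions: SHA-256 for the content address and Ed25519 (via the `cryptography` package) for the signature. The workflow fixes the ingredients, not the algorithms or field names.

```python
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def build_bundle(entity: dict, lineage: list[str], node_url: str,
                 key: Ed25519PrivateKey) -> dict:
    payload = {
        "@context": "https://factharbor.example/ns/federation",  # hypothetical context
        "entity": entity,        # claim / scenario / evidence metadata / verdict
        "lineage": lineage,      # ParentVersionID chain, oldest first
        "origin": node_url,      # node provenance
    }
    # Canonical serialization so sender and receiver hash identical bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return {
        **payload,
        "hash": hashlib.sha256(canonical).hexdigest(),  # content address
        "signature": key.sign(canonical).hex(),         # proof of origin
    }
```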

7.3 Step 3: Bundle Includes Required Data

Each bundle contains:

  • Claims: Full claim text, classification, domain
  • Scenarios: Definitions, assumptions, boundaries
  • Evidence metadata: Source URLs, hashes, reliability scores (not always full files)
  • Verdicts: Likelihood ranges, uncertainty, reasoning chains
  • Lineage: Version history, parent relationships
  • Signatures: Cryptographic proof of origin

7.4 Step 4: Bundle Pushed to Trusted Neighbor Nodes

Based on trust table:

  • Push to trusted nodes immediately
  • Queue for neutral nodes (batched)
  • Skip untrusted nodes

Push methods:

  • Webhook notification
  • Direct API call
  • Pub/Sub message queue

7.5 Step 5: Remote Nodes Validate Lineage and Signatures

Receiving node:

  1. Verifies cryptographic signatures
  2. Validates version lineage (ParentVersionID chain)
  3. Checks for conflicts with local data
  4. Validates data structure and required fields
  5. Applies local trust policies

Validation failures → reject or quarantine bundle
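
The receiving side of the same sketch verifies the content address and signature first; the lineage, conflict, schema, and trust-policy checks (steps 2-5) would follow before import.

```python
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def validate_bundle(bundle: dict, sender_key: Ed25519PublicKey) -> bool:
    payload = {k: v for k, v in bundle.items() if k not in ("hash", "signature")}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    if hashlib.sha256(canonical).hexdigest() != bundle["hash"]:
        return False                                  # content address mismatch
    try:
        sender_key.verify(bytes.fromhex(bundle["signature"]), canonical)
    except InvalidSignature:
        return False                                  # forged or corrupted bundle
    # Lineage walk, conflict detection, schema checks, and local trust
    # policies (steps 2-5) would run here before the bundle is accepted.
    return True
```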

7.6 Step 6: Accept or Branch Versions

Accept (if validation passes):

  • Import new versions
  • Maintain provenance metadata
  • Link to local related entities
  • Update local indices

Branch (if conflict detected):

  • Create parallel version tree
  • Mark as "external branch"
  • Allow local reviewers to merge or reject
  • Preserve both version histories

Reject (if validation fails):

  • Log rejection reason
  • Notify source node (optional)
  • Quarantine for manual review (optional)
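
A minimal sketch of this three-way decision, assuming `local_heads` maps an entity's global ID to the VersionID at the tip of its local version tree:

```python
from enum import Enum

class SyncOutcome(Enum):
    ACCEPT = "accept"   # lineage extends the local version tree
    BRANCH = "branch"   # diverges: keep both histories as an "external branch"
    REJECT = "reject"   # failed validation: log, optionally quarantine

def classify(bundle: dict, valid: bool, local_heads: dict[str, str]) -> SyncOutcome:
    if not valid:
        return SyncOutcome.REJECT
    parent = bundle["lineage"][-1] if bundle["lineage"] else None
    local_head = local_heads.get(bundle["entity"]["global_id"])
    if local_head is None or parent == local_head:
        return SyncOutcome.ACCEPT   # new entity, or a clean fast-forward
    return SyncOutcome.BRANCH       # conflict: reviewers later merge or reject
```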

7.7 Step 7: Local Re-evaluation Runs if Required

After import, local node checks:

  • Does new evidence affect existing verdicts?
  • Do new scenarios require re-assessment?
  • Are there contradictions with local content?

If yes:

  • Trigger AKEL re-evaluation
  • Queue for reviewer attention
  • Update affected verdicts
  • Notify users following related content

8. Cross-Node AI Knowledge Exchange

Each node runs its own AKEL instance and may exchange AI-derived knowledge:

8.1 What Can Be Shared

Vector embeddings:

  • For cross-node claim clustering
  • For semantic search alignment
  • Never includes training data

Canonical claim forms:

  • Normalized claim text
  • Standard phrasing templates
  • Domain-specific formulations

Scenario templates:

  • Reusable scenario structures
  • Common assumption patterns
  • Evaluation method templates

Contradiction alerts:

  • Detected conflicts between claims
  • Evidence conflicts across nodes
  • Scenario incompatibilities

Metadata and insights:

  • Aggregate quality metrics
  • Reliability signal extraction
  • Bubble detection patterns

8.2 What Can NEVER Be Shared

Model weights: No sharing of trained model parameters

Training data: No sharing of full training datasets

Local governance overrides: AKEL suggestions can be overridden locally

User behavior data: No cross-node tracking

Internal review discussions: Private content stays private

8.3 Benefits of AI Knowledge Exchange

  • Reduced duplication across nodes
  • Improved claim clustering accuracy
  • Faster contradiction detection
  • Shared scenario libraries
  • Cross-node quality improvements

8.4 Local Control Maintained

  • Nodes accept or reject shared AI knowledge
  • Human reviewers can override any AKEL suggestion
  • Local governance always has final authority
  • No external AI control over local content
  • Privacy-preserving knowledge exchange

9. Decentralized Processing

Each node independently performs:

  • AKEL processing
  • Scenario drafting and validation
  • Evidence review
  • Verdict calculation
  • Truth landscape summarization

Nodes can specialize:

  • Health-focused node with medical experts
  • Energy-focused node with domain knowledge
  • Small node delegating scenario libraries to partners
  • Regional node with language/culture specialization

Optional data sharing includes:

  • Embeddings for clustering
  • Claim clusters for alignment
  • Scenario templates for efficiency
  • Verdict comparison metadata

10. Scaling to Thousands of Users

Nodes scale independently through:

  • Horizontally scalable API servers
  • Worker pools for AKEL tasks
  • Hybrid storage (local + S3/IPFS)
  • Redis caching for performance
  • Sharded or partitioned databases

Federation allows effectively unlimited horizontal scaling by adding new nodes.

Communities may form:

  • Domain-specific nodes (epidemiology, energy, climate)
  • Language or region-based nodes
  • NGO or institutional nodes
  • Private organizational nodes
  • Academic research nodes

Nodes cooperate through:

  • Scenario library sharing
  • Shared or overlapping claim clusters
  • Expert delegation between nodes
  • Distributed AKEL task support
  • Cross-node quality audits

11. Federation and Release 1.0

POC: Single node, optional federation experiments

Beta 0: 2-3 nodes, basic federation protocol

Release 1.0: Full federation support with:

  • Robust synchronization protocol
  • Trust model implementation
  • Cross-node AI knowledge exchange
  • Federated search and discovery
  • Distributed audit collaboration
  • Inter-node expert consultation

12. Related Pages