Data Model

Last modified by Robert Schaub on 2025/12/24 20:34

This page describes the current data model for FactHarbor v0.9.1.

Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

  • Complete auditability
  • Ability to reconstruct historical state
  • Federation-compatible lineage tracking
  • Transparent evolution of claims, scenarios, and verdicts

Core Versioning Principles

Immutability:

  • Each version is stored independently
  • Versions cannot be deleted, only superseded
  • Historical versions remain accessible
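The immutability rules can be illustrated with a minimal in-memory sketch. `Version` and `AppendOnlyVersionStore` are hypothetical names, and the fields are trimmed for illustration; this is not FactHarbor's storage layer. Records are frozen, an existing `VersionID` can never be overwritten, and deletion is refused outright, so the only way to change content is to add a superseding version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """Minimal immutable version record (fields trimmed for illustration)."""
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    text: str


class AppendOnlyVersionStore:
    """Hypothetical in-memory store: versions are added and read, never changed or deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, Version] = {}

    def add(self, version: Version) -> None:
        if version.version_id in self._versions:
            raise ValueError(f"{version.version_id} already exists; create a new version instead")
        parent = version.parent_version_id
        if parent is not None and parent not in self._versions:
            raise ValueError(f"unknown parent version {parent}")
        self._versions[version.version_id] = version

    def get(self, version_id: str) -> Version:
        return self._versions[version_id]

    def delete(self, version_id: str) -> None:
        # Deletion is never allowed: a change is recorded by adding a superseding version.
        raise PermissionError("versions cannot be deleted, only superseded")
```

Because `Version` is a frozen dataclass, assigning to a field of a stored record raises an error, so historical versions stay readable exactly as they were written.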

Lineage:

  • Each version links to its parent via `ParentVersionID`
  • Versions form a directed acyclic graph (DAG) of changes
  • Supports branching in federated environments
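A sketch of how lineage could be read back, assuming versions are available as a plain mapping from `VersionID` to `ParentVersionID` (the function names are hypothetical). With a single `ParentVersionID` per version, each version's history is one path back to the root, and a branch shows up as a parent with more than one child.

```python
from typing import Optional


def lineage(versions: dict[str, Optional[str]], version_id: str) -> list[str]:
    """Return the chain from version_id back to the root version.

    `versions` maps VersionID -> ParentVersionID (None for the first version).
    """
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")  # lineage must stay acyclic
        seen.add(current)
        chain.append(current)
        current = versions[current]
    return chain


def branch_points(versions: dict[str, Optional[str]]) -> set[str]:
    """Return versions with more than one child, i.e. where the history branches."""
    child_counts: dict[str, int] = {}
    for parent in versions.values():
        if parent is not None:
            child_counts[parent] = child_counts.get(parent, 0) + 1
    return {vid for vid, count in child_counts.items() if count > 1}
```

For example, with `{"v1": None, "v2": "v1", "v3": "v2", "v3b": "v2"}`, the lineage of `v3` is `v3 → v2 → v1`, and `v2` is the branch point.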

Provenance:

  • Every version timestamped (`CreatedAt`)
  • Author type recorded (`AuthorType`: Human, AI, ExternalNode)
  • Justification captured (`JustificationText`)
  • Digital signatures for integrity (`SignatureHash` in Release 1.0)

Federation Support:

  • Versions can originate from remote nodes
  • Conflict detection via lineage comparison
  • Parallel version trees for branching scenarios
  • Cross-node version synchronization
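Conflict detection via lineage comparison could work roughly as follows. This is a sketch that assumes the local and remote version histories have already been merged into one `VersionID → ParentVersionID` map; `compare_lineage` and its result labels are hypothetical. If one tip is an ancestor of the other, one node can simply catch up; otherwise the nearest common ancestor marks where the parallel branches split.

```python
from typing import Optional


def ancestors(parents: dict[str, Optional[str]], version_id: str) -> list[str]:
    """Walk ParentVersionID links from version_id to the root (inclusive)."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def compare_lineage(parents: dict[str, Optional[str]],
                    local: str, remote: str) -> tuple[str, Optional[str]]:
    """Classify two version tips and return (classification, nearest common ancestor).

    Classifications: 'same', 'local_behind', 'remote_behind', 'diverged', 'unrelated'.
    """
    if local == remote:
        return "same", local
    local_chain = ancestors(parents, local)
    remote_chain = ancestors(parents, remote)
    if local in remote_chain:
        return "local_behind", local    # remote extends local: local can fast-forward
    if remote in local_chain:
        return "remote_behind", remote  # local extends remote
    local_set = set(local_chain)
    for vid in remote_chain:            # first shared version walking back from remote
        if vid in local_set:
            return "diverged", vid      # parallel branches: a conflict to resolve
    return "unrelated", None
```

Returning the common ancestor, not just a flag, gives a reviewer the base version against which both branches can be compared.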

Common Version Fields

All versioned entities include:

  • VersionID: Unique identifier for this specific version
  • ParentVersionID: Link to previous version (null for first version)
  • CreatedAt: Timestamp (ISO 8601, UTC)
  • AuthorType: Human | AI | ExternalNode
  • CreatedBy: Foreign key to User or TechnicalUser
  • JustificationText: Brief explanation of changes
  • PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
  • ReviewStatus: Workflow state (draft|in_review|approved|rejected; Evidence and Verdict use entity-specific value sets)
  • NodeOrigin: Node ID where version was created (for federation)
  • SignatureHash: Cryptographic signature (Release 1.0)
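The common fields map naturally onto a typed record. The sketch below is one possible shape, not a published schema: Python-style snake_case names mirror the field list, `review_status` is left as a plain string because its allowed values differ per entity, and `CreatedAt` is enforced to be a UTC timestamp so it serializes as ISO 8601.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


@dataclass(frozen=True)
class VersionMeta:
    """Common fields carried by every versioned entity."""
    version_id: str
    parent_version_id: Optional[str]      # None for the first version
    created_at: datetime                  # timezone-aware, UTC
    author_type: AuthorType
    created_by: str                       # USER.UserID or TECHNICAL_USER.SystemID
    justification_text: str
    publication_mode: PublicationMode
    review_status: str                    # allowed values differ per entity
    node_origin: str                      # node where the version was created
    signature_hash: Optional[str] = None  # expected from Release 1.0

    def __post_init__(self) -> None:
        # Naive datetimes have utcoffset() == None, so they are rejected too.
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("created_at must be a UTC timestamp")

    def created_at_iso(self) -> str:
        return self.created_at.isoformat()


def utc_now() -> datetime:
    """Current time as a timezone-aware UTC datetime, suitable for CreatedAt."""
    return datetime.now(timezone.utc)
```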

Core Entity Definitions

User Entities

USER (base user table):

  • ``UserID`` (PK)
  • ``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
  • ``DisplayName``
  • ``Email`` (for Contributors and above)
  • ``RegisteredAt``
  • ``LastActive``
  • ``Status`` (active|suspended|banned)
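The "Contributors and above" note implies an ordering of user types. The sketch below assumes that the rank follows the order in which `UserType` values are listed, and it reads the note as "Email is required from Contributor upward"; both are interpretations, not confirmed rules.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class UserType(IntEnum):
    # ASSUMPTION: rank follows the order in which the types are listed on this page.
    READER = 1
    CONTRIBUTOR = 2
    REVIEWER = 3
    AUDITOR = 4
    EXPERT = 5
    MODERATOR = 6
    MAINTAINER = 7


@dataclass
class User:
    user_id: str
    user_type: UserType
    display_name: str
    email: Optional[str] = None
    status: str = "active"  # active | suspended | banned

    def __post_init__(self) -> None:
        if self.status not in {"active", "suspended", "banned"}:
            raise ValueError(f"invalid status {self.status!r}")
        # ASSUMPTION: Email is mandatory (not merely allowed) for Contributors and above.
        if self.user_type >= UserType.CONTRIBUTOR and not self.email:
            raise ValueError("Email is required for Contributors and above")
```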

TECHNICAL_USER (system processes):

  • ``SystemID`` (PK)
  • ``SystemName``
  • ``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
  • ``CreatedBy`` (FK to Maintainer who created this system user)
  • ``CreatedAt``
  • ``Status`` (active|paused|deprecated)
  • ``ApiKey`` (encrypted)
  • ``Permissions`` (JSON - authorized operations)

Examples of Technical Users:

  • AKEL instances (AI processing)
  • Federation sync bots
  • Scheduled audit tasks
  • Backup services
  • Monitoring systems
  • External API integrations
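A minimal authorization check for technical users could look like this. The page only says `Permissions` is JSON listing authorized operations, so the `{"operations": [...]}` shape and operation names such as `claim:create` are assumptions for illustration. The check denies by default: paused or deprecated system users, and malformed permission documents, get no access.

```python
import json
from dataclasses import dataclass


@dataclass
class TechnicalUser:
    system_id: str
    system_name: str
    purpose: str      # AKEL | FederationSync | BackupService | Monitor | Audit
    status: str       # active | paused | deprecated
    permissions: str  # JSON; ASSUMED shape: {"operations": ["claim:create", ...]}


def is_authorized(user: TechnicalUser, operation: str) -> bool:
    """Allow an operation only for active system users whose Permissions list it."""
    if user.status != "active":
        return False
    try:
        allowed = json.loads(user.permissions).get("operations", [])
    except (json.JSONDecodeError, AttributeError):
        return False  # malformed or non-object JSON: deny by default
    if not isinstance(allowed, list):
        return False
    return operation in allowed
```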

Content Entities

The system relies on the following versioned core entities:

CLAIM_CLUSTER:

  • ``ClusterID`` (PK)
  • ``EmbeddingVectorRef``
  • ``Theme``
  • Groups related claims into topical clusters
  • One Cluster has many Claims
  • A Claim belongs to exactly one primary cluster
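One way to give each claim exactly one primary cluster is to pick the cluster whose embedding is most similar to the claim's embedding. This sketch assumes each cluster's `EmbeddingVectorRef` has already been resolved into a plain vector, and uses cosine similarity as the metric; the page does not specify the clustering method.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_primary_cluster(claim_embedding: list[float],
                           cluster_vectors: dict[str, list[float]]) -> str:
    """Return the ClusterID whose vector is most similar to the claim embedding."""
    if not cluster_vectors:
        raise ValueError("no clusters available")
    return max(cluster_vectors, key=lambda cid: cosine(claim_embedding, cluster_vectors[cid]))
```

Taking the single best match, rather than every cluster above a threshold, is what keeps the "exactly one primary cluster" rule true by construction.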

CLAIM / CLAIM_VERSION:

  • ``CLAIM`` is the long-lived anchor for a real-world claim
  • ``CLAIM_VERSION`` is an immutable snapshot that includes:
      • ``VersionID`` (PK)
      • ``ClaimID`` (FK to CLAIM)
      • ``ParentVersionID`` (FK to prior version, nullable)
      • ``Text``
      • ``Domain``
      • ``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
      • ``Evaluability`` (empirical|subjective|non-falsifiable)
      • ``RiskTier`` (A|B|C) - replaced SafetyCategory for consistency
      • ``PublicationMode`` (Mode1|Mode2|Mode3)
      • ``ReviewStatus`` (draft|in_review|approved|rejected)
      • ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
      • ``NodeOrigin``, ``SignatureHash``
      • ``Status`` (active|superseded|merged)
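The enumerated `CLAIM_VERSION` fields can be validated with plain enums. This is a sketch; `parse_claim_fields` is a hypothetical helper that takes raw field values keyed by the column names above. Constructing an `Enum` from an unknown value raises `ValueError`, so invalid tiers or claim types are rejected at the boundary.

```python
from enum import Enum


class ClaimType(Enum):
    LITERAL = "literal"
    METAPHORICAL = "metaphorical"
    RHETORICAL = "rhetorical"
    SUPERNATURAL = "supernatural"


class Evaluability(Enum):
    EMPIRICAL = "empirical"
    SUBJECTIVE = "subjective"
    NON_FALSIFIABLE = "non-falsifiable"


class RiskTier(Enum):
    A = "A"
    B = "B"
    C = "C"


def parse_claim_fields(raw: dict[str, str]) -> tuple[ClaimType, Evaluability, RiskTier]:
    """Validate the enumerated CLAIM_VERSION fields; unknown values raise ValueError."""
    return (
        ClaimType(raw["ClaimType"]),
        Evaluability(raw["Evaluability"]),
        RiskTier(raw["RiskTier"]),
    )
```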

SCENARIO / SCENARIO_VERSION:

  • ``SCENARIO`` is the anchor for a scenario across time
  • ``SCENARIO_VERSION`` is an immutable snapshot:
      • ``VersionID`` (PK)
      • ``ScenarioID`` (FK to SCENARIO)
      • ``ParentVersionID``
      • ``ClaimID`` (FK to CLAIM)
      • ``Definitions`` (JSON)
      • ``Boundaries`` (JSON)
      • ``Assumptions`` (JSON)
      • ``Context`` (text)
      • ``EvaluationMethod`` (text)
      • ``PublicationMode`` (Mode1|Mode2|Mode3)
      • ``ReviewStatus`` (draft|in_review|approved|rejected)
      • ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
      • ``NodeOrigin``, ``SignatureHash``
      • ``Status`` (active|superseded|deprecated)

Note: `SafetyClass` has been removed from Scenario; the risk tier is set at the claim level (`RiskTier`).

EVIDENCE / EVIDENCE_VERSION:

  • ``EVIDENCE`` is the anchor
  • ``EVIDENCE_VERSION`` is the versioned snapshot:
      • ``VersionID`` (PK)
      • ``EvidenceID`` (FK to EVIDENCE)
      • ``ParentVersionID``
      • ``Type`` (paper|dataset|report|transcript|expert|media)
      • ``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
      • ``Reliability`` (low|medium|high)
      • ``Provenance`` (URL, DOI, source metadata)
      • ``ExtractionMethod`` (manual|OCR|API|AKEL)
      • ``ContentHash`` (SHA256 of evidence content)
      • ``PublicationMode`` (Mode1|Mode2|Mode3)
      • ``ReviewStatus`` (draft|verified|disputed|retracted)
      • ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
      • ``NodeOrigin``, ``SignatureHash``
      • ``Status`` (active|superseded)
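`ContentHash` can be computed and checked with the standard library. This sketch assumes the hash is stored as a lowercase hexadecimal string of the raw evidence bytes; the page specifies SHA256 but not the encoding.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """SHA-256 of the raw evidence bytes, as a lowercase hex string."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_hash: str) -> bool:
    """True if the stored ContentHash still matches the evidence content."""
    return content_hash(content) == expected_hash.lower()
```

For example, `content_hash(b"abc")` yields the standard SHA-256 test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`; any change to the bytes produces a different digest, which is what makes the field useful for integrity checks.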

VERDICT / VERDICT_VERSION:

  • ``VERDICT`` is the anchor
  • ``VERDICT_VERSION`` is the snapshot:
      • ``VersionID`` (PK)
      • ``VerdictID`` (FK to VERDICT)
      • ``ParentVersionID``
      • ``ClaimID`` (FK to CLAIM)
      • ``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
      • ``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
      • ``LikelihoodRange`` (0–1, with uncertainty bounds)
      • ``ExplanationChain`` (JSON)
      • ``UncertaintyFactors`` (JSON)
      • ``PublicationMode`` (Mode1|Mode2|Mode3)
      • ``ReviewStatus`` (draft|in_review|approved|retracted)
      • ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
      • ``NodeOrigin``, ``SignatureHash``
      • ``Status`` (current|outdated|superseded|retracted)
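`LikelihoodRange` is described only as "0–1, with uncertainty bounds". One plausible representation, assumed here, is a point estimate with lower and upper bounds, validated so that all three lie in [0, 1] and stay ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    """ASSUMED representation: point estimate with lower/upper uncertainty bounds in [0, 1]."""
    low: float
    estimate: float
    high: float

    def __post_init__(self) -> None:
        if not (0.0 <= self.low <= self.estimate <= self.high <= 1.0):
            raise ValueError(f"expected 0 <= low <= estimate <= high <= 1, got {self}")

    @property
    def width(self) -> float:
        """Width of the uncertainty interval; a wider interval means less certainty."""
        return self.high - self.low
```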

Many-to-Many Linking Tables

ScenarioEvidenceLink:

  • Links scenario versions to evidence versions with relevance scoring
  • ``ScenarioID``, ``ScenarioVersionID``
  • ``EvidenceID``, ``EvidenceVersionID``
  • ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
  • ``LinkJustification`` - Brief explanation of relevance

Purpose:

  • Evidence can be used by multiple scenarios
  • Scenarios can draw from multiple pieces of evidence
  • Relevance scoring helps prioritize evidence
  • Version-specific linking preserves historical accuracy
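Relevance-based prioritization could be as simple as filtering links to one scenario version and sorting by `RelevanceScore`. The dataclass mirrors the `ScenarioEvidenceLink` fields; `ranked_evidence` and its `min_score` cutoff are hypothetical additions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0-1
    link_justification: str


def ranked_evidence(links: list[ScenarioEvidenceLink], scenario_version_id: str,
                    min_score: float = 0.0) -> list[ScenarioEvidenceLink]:
    """Evidence linked to one scenario version, most relevant first."""
    selected = [link for link in links
                if link.scenario_version_id == scenario_version_id
                and link.relevance_score >= min_score]
    return sorted(selected, key=lambda link: link.relevance_score, reverse=True)
```

Filtering on `scenario_version_id` rather than `scenario_id` keeps the ranking tied to the exact scenario version, matching the version-specific linking described above.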

ClaimCluster:

  • Semantic clustering of similar claims
  • ``ClusterID`` (PK)
  • ``EmbeddingVector`` - Vector representation for semantic search
  • ``MemberList`` - List of ClaimIDs in this cluster
  • ``Theme`` - Human-readable theme description

Key Changes in v0.9.1

Updated Field Names:

  • `SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
  • `SafetyClass` removed from Scenario (redundant with claim-level RiskTier)

Added Fields to All Version Entities:

  • `PublicationMode` - Track Mode 1/2/3 status
  • `ReviewStatus` - Track workflow state
  • `NodeOrigin` - Federation provenance
  • `CreatedBy` - FK to User/TechnicalUser (clarified)

New Entity:

  • `TECHNICAL_USER` - Separate system processes from human users

Clarifications:

  • `ScenarioVersionID` in Verdict (not just `ScenarioID`): each verdict links to a specific scenario version
  • `ContentHash` in Evidence - SHA256 for integrity checking

Data Model Behavior

Late-Arriving Evidence

When new evidence versions appear:

  1. Existing verdicts are marked as outdated
  2. Scenario relevance must be re-evaluated
  3. The re-evaluation engine triggers verdict recomputation
  4. New verdict versions are created
  5. Users are notified of updates
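Step 1 can be sketched by checking each verdict's `EvidenceVersionSet` against the evidence versions that a late-arriving version has superseded. This treats `Status` as workflow metadata that may change after creation, which is an assumption given that version content is otherwise immutable; `mark_outdated` is a hypothetical helper, and the returned IDs are what steps 2 to 4 would then process.

```python
from dataclasses import dataclass, field


@dataclass
class VerdictVersion:
    version_id: str
    evidence_version_set: list[str] = field(default_factory=list)  # Evidence VersionIDs used
    status: str = "current"  # current | outdated | superseded | retracted


def mark_outdated(verdicts: list[VerdictVersion], superseded_evidence: set[str]) -> list[str]:
    """Flag current verdicts that relied on now-superseded evidence versions.

    Returns the VersionIDs of the flagged verdicts, for re-evaluation.
    """
    flagged: list[str] = []
    for verdict in verdicts:
        if verdict.status == "current" and superseded_evidence.intersection(verdict.evidence_version_set):
            verdict.status = "outdated"
            flagged.append(verdict.version_id)
    return flagged
```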

Scenario Evolution

When a scenario's assumptions or definitions change:

  • A new scenario version is created (no in-place update)
  • All dependent verdicts must be recalculated
  • Previous scenario versions remain accessible
  • Version lineage preserved
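A sketch of scenario evolution under these rules: a revision produces a new frozen record whose `ParentVersionID` points at the previous version, and dependent verdicts are found through their `ScenarioVersionID`. Representing `Assumptions` and `Definitions` as tuples of strings (instead of JSON) and the helper names are simplifications for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    scenario_id: str
    parent_version_id: Optional[str]
    assumptions: tuple[str, ...]  # simplified stand-in for the JSON field
    definitions: tuple[str, ...]  # simplified stand-in for the JSON field


def revise(prev: ScenarioVersion, new_id: str, **changes) -> ScenarioVersion:
    """Create a new version linked to prev instead of updating prev in place."""
    return replace(prev, version_id=new_id, parent_version_id=prev.version_id, **changes)


def verdicts_to_recalculate(verdict_scenario_refs: dict[str, str], old_version_id: str) -> list[str]:
    """Verdict VersionIDs whose ScenarioVersionID points at the superseded scenario version.

    `verdict_scenario_refs` maps verdict VersionID -> ScenarioVersionID.
    """
    return sorted(vid for vid, svid in verdict_scenario_refs.items() if svid == old_version_id)
```

Because `revise` returns a new object and never mutates `prev`, the previous scenario version remains accessible and its lineage is preserved.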

Federated Nodes

Each node may share partial data:

  • Claims and scenarios shared if relevant
  • Evidence metadata is shared; full files are not always included
  • Version synchronization via NodeOrigin tracking
  • Branching allowed for divergent interpretations
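Sharing evidence metadata without the full file might look like the sketch below. The set of shareable keys and the `Content` key are illustrative assumptions, not a defined FactHarbor federation schema. Keeping `ContentHash` and `NodeOrigin` in the payload lets the receiving node verify a file fetched later and trace where the version came from.

```python
from typing import Any

# ASSUMPTION: which keys count as shareable metadata is illustrative only.
SHAREABLE_EVIDENCE_KEYS = {
    "VersionID", "EvidenceID", "ParentVersionID", "Type", "Category",
    "Reliability", "Provenance", "ContentHash", "NodeOrigin", "CreatedAt",
}


def federation_payload(evidence_version: dict[str, Any],
                       include_content: bool = False) -> dict[str, Any]:
    """Build the payload sent to a peer node: metadata always, the file only if allowed."""
    payload = {k: v for k, v in evidence_version.items() if k in SHAREABLE_EVIDENCE_KEYS}
    if include_content and "Content" in evidence_version:  # "Content" is a hypothetical key
        payload["Content"] = evidence_version["Content"]
    return payload
```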

Visual Diagrams

The following diagrams provide visual representations of the data model structure and relationships.

Core Data Model ERD

```mermaid
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    SCENARIO_EVIDENCE_VERSION_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions
    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions
    EVIDENCE ||--o{ EVIDENCE_VERSION : versions
    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : linked
    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
```

User Roles Structure

Content Workflow