Data Model

Last modified by Robert Schaub on 2025/12/24 20:34

This page describes the current data model for FactHarbor v0.9.1.

Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

Complete auditability
Ability to reconstruct historical state
Federation-compatible lineage tracking
Transparent evolution of claims, scenarios, and verdicts

Core Versioning Principles

Immutability:

Each version is stored independently
Versions cannot be deleted, only superseded
Historical versions remain accessible
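A minimal sketch of the immutability rule. It assumes an in-memory store keyed by `VersionID`. `Version`, `VersionStore`, `add`, `supersede` and `get` are hypothetical names used for illustration, not FactHarbor's API. Writes are insert-only: reusing an existing `VersionID` is rejected, there is no delete operation, and superseding a version means adding a new one that points back to it.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class Version:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    payload: Mapping[str, object]


class VersionStore:
    """Hypothetical append-only store: versions can be added and read, never edited or deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, Version] = {}

    def add(self, version: Version) -> None:
        if version.version_id in self._versions:
            raise ValueError(f"version {version.version_id} already exists and cannot be overwritten")
        if version.parent_version_id is not None and version.parent_version_id not in self._versions:
            raise KeyError(f"unknown parent version {version.parent_version_id}")
        self._versions[version.version_id] = version

    def supersede(self, old_id: str, new_id: str, payload: Mapping[str, object]) -> Version:
        # The old version is kept; the new one simply records it as its parent.
        # MappingProxyType gives a read-only view of a private copy of the payload.
        new = Version(new_id, old_id, MappingProxyType(dict(payload)))
        self.add(new)
        return new

    def get(self, version_id: str) -> Version:
        # Superseded versions stay retrievable by their VersionID.
        return self._versions[version_id]
```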

Lineage:

Each version links to its parent via `ParentVersionID`
Forms a directed acyclic graph (DAG) of changes
Supports branching in federated environments
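Each version stores a single `ParentVersionID`. So the history of any version can be read by following parent links back to the first version, and a branch shows up as several versions that share one parent. A minimal sketch, assuming the links are available as a plain `VersionID → ParentVersionID` dictionary; `lineage` and `branches` are hypothetical names.

```python
from typing import Optional


def lineage(version_id: str, parents: dict[str, Optional[str]]) -> list[str]:
    """Return [version_id, parent, grandparent, ..., first version] by following ParentVersionID."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            # A cycle would violate the DAG invariant, so reject the data instead of looping forever.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain


def branches(parents: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Map each parent that has more than one child version to its (sorted) children."""
    children: dict[str, list[str]] = {}
    for vid, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(vid)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}
```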

Provenance:

Every version is timestamped (`CreatedAt`)
Author type is recorded (`AuthorType`: Human, AI, ExternalNode)
Justification is captured (`JustificationText`)
Digital signatures protect integrity (`SignatureHash`, from Release 1.0)

Federation Support:

Versions can originate from remote nodes
Conflict detection via lineage comparison
Parallel version trees for branching scenarios
Cross-node version synchronization
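The sketch below shows one way to do conflict detection via lineage comparison. It assumes both nodes' versions have already been merged into a single `ParentVersionID` map. If one head is an ancestor of the other, the newer head can simply be adopted. Otherwise the trees have diverged, and the fork point is the first shared ancestor. `compare_heads` and its result labels are hypothetical.

```python
from typing import Optional


def ancestors(version_id: str, parents: dict[str, Optional[str]]) -> list[str]:
    """Follow ParentVersionID links from version_id back to the first version (inclusive)."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def compare_heads(local: str, remote: str,
                  parents: dict[str, Optional[str]]) -> tuple[str, Optional[str]]:
    """Classify two node heads; returns (relation, common ancestor or None)."""
    if local == remote:
        return "same", local
    local_chain = ancestors(local, parents)
    remote_chain = ancestors(remote, parents)
    if remote in local_chain:
        return "local_ahead", remote   # remote head is older; nothing to merge
    if local in remote_chain:
        return "remote_ahead", local   # local head is older; adopt the remote head
    remote_set = set(remote_chain)
    # Walking back from the local head, the first version the remote also has is the fork point.
    fork = next((v for v in local_chain if v in remote_set), None)
    return "diverged", fork
```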

Common Version Fields

All versioned entities include:

VersionID: Unique identifier for this specific version
ParentVersionID: Link to previous version (null for first version)
CreatedAt: Timestamp (ISO 8601, UTC)
AuthorType: Human | AI | ExternalNode
CreatedBy: Foreign key to User or TechnicalUser
JustificationText: Brief explanation of changes
PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
ReviewStatus: Workflow state; allowed values vary by entity (e.g. draft|in_review|approved|rejected for claims and scenarios)
NodeOrigin: Node ID where version was created (for federation)
SignatureHash: Cryptographic signature (from Release 1.0)
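A minimal sketch of the common fields as an immutable Python record. It assumes string IDs and one enum per fixed value set. All type and field names are hypothetical. `ReviewStatus` is left as a plain string because its allowed values differ between entities.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]      # None for the first version
    created_at: datetime                  # timezone-aware, UTC
    author_type: AuthorType
    created_by: str                       # FK to User or TechnicalUser
    justification_text: str
    publication_mode: PublicationMode
    review_status: str                    # allowed values depend on the entity type
    node_origin: str                      # node where this version was created
    signature_hash: Optional[str] = None  # empty until signatures are in use

    def __post_init__(self) -> None:
        # Naive datetimes return None here, so they are rejected along with non-UTC offsets.
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("CreatedAt must be a UTC timestamp")

    def created_at_iso(self) -> str:
        """CreatedAt rendered as ISO 8601."""
        return self.created_at.isoformat()
```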

Core Entity Definitions

User Entities

USER (base user table):

``UserID`` (PK)
``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
``DisplayName``
``Email`` (for Contributors and above)
``RegisteredAt``
``LastActive``
``Status`` (active|suspended|banned)

TECHNICAL_USER (system processes):

``SystemID`` (PK)
``SystemName``
``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
``CreatedBy`` (FK to Maintainer who created this system user)
``CreatedAt``
``Status`` (active|paused|deprecated)
``ApiKey`` (encrypted)
``Permissions`` (JSON - authorized operations)

Examples of Technical Users:

AKEL instances (AI processing)
Federation sync bots
Scheduled audit tasks
Backup services
Monitoring systems
External API integrations
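A sketch of how a system process could be checked against its `Permissions` before acting. It assumes the JSON is a plain array of operation names and that only `active` system users may act. The operation names and `is_authorized` are hypothetical; this page does not define the permission format.

```python
import json
from dataclasses import dataclass


@dataclass
class TechnicalUser:
    system_id: str
    purpose: str       # AKEL | FederationSync | BackupService | Monitor | Audit
    status: str        # active | paused | deprecated
    permissions: str   # JSON text; assumed here to be an array of operation names


def is_authorized(user: TechnicalUser, operation: str) -> bool:
    """Allow an operation only for active system users that list it in Permissions."""
    if user.status != "active":
        # Paused or deprecated system users are denied regardless of their permissions.
        return False
    try:
        allowed = json.loads(user.permissions)
    except json.JSONDecodeError:
        return False  # fail closed on malformed permission data
    return isinstance(allowed, list) and operation in allowed
```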

Content Entities

The system relies on the following versioned core entities:

CLAIM_CLUSTER:

``ClusterID`` (PK)
``EmbeddingVectorRef``
``Theme``
Groups related claims into topical clusters
One Cluster has many Claims
A Claim belongs to exactly one primary cluster

CLAIM / CLAIM_VERSION:

``CLAIM`` is the long-lived anchor for a real-world claim
``CLAIM_VERSION`` is an immutable snapshot that includes:
``VersionID`` (PK)
``ClaimID`` (FK to CLAIM)
``ParentVersionID`` (FK to prior version, nullable)
``Text``
``Domain``
``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
``Evaluability`` (empirical|subjective|non-falsifiable)
``RiskTier`` (A|B|C); replaces the former ``SafetyCategory`` for consistency with the risk tier system
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|merged)

SCENARIO / SCENARIO_VERSION:

``SCENARIO`` is the anchor for a scenario across time
``SCENARIO_VERSION`` is an immutable snapshot:
``VersionID`` (PK)
``ScenarioID`` (FK to SCENARIO)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``Definitions`` (JSON)
``Boundaries`` (JSON)
``Assumptions`` (JSON)
``Context`` (text)
``EvaluationMethod`` (text)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|deprecated)

Note: `SafetyClass` has been removed from Scenario; the risk tier is assigned at the claim level.

EVIDENCE / EVIDENCE_VERSION:

``EVIDENCE`` is the anchor
``EVIDENCE_VERSION`` is the versioned snapshot:
``VersionID`` (PK)
``EvidenceID`` (FK to EVIDENCE)
``ParentVersionID``
``Type`` (paper|dataset|report|transcript|expert|media)
``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
``Reliability`` (low|medium|high)
``Provenance`` (URL, DOI, source metadata)
``ExtractionMethod`` (manual|OCR|API|AKEL)
``ContentHash`` (SHA256 of evidence content)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|verified|disputed|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded)
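A minimal sketch of computing and checking `ContentHash`. It assumes the hash is the hex-encoded SHA-256 of the raw evidence bytes. This page does not specify the encoding or exactly which bytes are hashed, and the function names are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """SHA-256 of the raw evidence content, hex-encoded (64 characters)."""
    return hashlib.sha256(content).hexdigest()


def verify_evidence(content: bytes, stored_hash: str) -> bool:
    # Recompute and compare; a mismatch means the content changed after the version was recorded.
    return content_hash(content) == stored_hash.lower()
```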

VERDICT / VERDICT_VERSION:

``VERDICT`` is the anchor
``VERDICT_VERSION`` is the snapshot:
``VersionID`` (PK)
``VerdictID`` (FK to VERDICT)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
``LikelihoodRange`` (0–1, with uncertainty bounds)
``ExplanationChain`` (JSON)
``UncertaintyFactors`` (JSON)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (current|outdated|superseded|retracted)
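A sketch of validating two `VERDICT_VERSION` fields. It assumes `LikelihoodRange` is stored as a lower and an upper bound inside 0–1, and that `EvidenceVersionSet` is a JSON array of VersionID strings. Type and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    low: float   # lower uncertainty bound
    high: float  # upper uncertainty bound

    def __post_init__(self) -> None:
        if not (0.0 <= self.low <= self.high <= 1.0):
            raise ValueError(f"invalid likelihood range [{self.low}, {self.high}]")

    @property
    def width(self) -> float:
        # A wider range signals more uncertainty.
        return self.high - self.low


def parse_evidence_version_set(raw: str) -> list[str]:
    """Parse EvidenceVersionSet, assumed to be a JSON array of Evidence VersionIDs."""
    ids = json.loads(raw)
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError("EvidenceVersionSet must be a JSON array of strings")
    if len(set(ids)) != len(ids):
        raise ValueError("EvidenceVersionSet contains duplicate VersionIDs")
    return ids
```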

Many-to-Many Linking Tables

ScenarioEvidenceLink:

Links scenario versions to evidence versions with relevance scoring
``ScenarioID``, ``ScenarioVersionID``
``EvidenceID``, ``EvidenceVersionID``
``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
``LinkJustification`` - Brief explanation of relevance

Purpose:

Evidence can be used by multiple scenarios
Scenarios can draw from multiple pieces of evidence
Relevance scoring helps prioritize evidence
Version-specific linking preserves historical accuracy
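A minimal sketch of using `RelevanceScore` to prioritize the evidence for one scenario version. The record mirrors the link fields above. `evidence_for` and its optional `min_score` cut-off are hypothetical. Filtering on `ScenarioVersionID` rather than `ScenarioID` keeps the selection version-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0–1
    link_justification: str = ""


def evidence_for(links: list[ScenarioEvidenceLink], scenario_version_id: str,
                 min_score: float = 0.0) -> list[ScenarioEvidenceLink]:
    """Links for one scenario version, most relevant first."""
    matching = [link for link in links
                if link.scenario_version_id == scenario_version_id
                and link.relevance_score >= min_score]
    # sorted() is stable even with reverse=True, so equal scores keep their original order.
    return sorted(matching, key=lambda link: link.relevance_score, reverse=True)
```

Because links point at evidence versions, an older scenario version keeps returning the evidence versions it was originally built on.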

ClaimCluster:

Semantic clustering of similar claims
``ClusterID`` (PK)
``EmbeddingVector`` - Vector representation for semantic search
``MemberList`` - List of ClaimIDs in this cluster
``Theme`` - Human-readable theme description
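This page does not say how claims are matched to clusters. A common choice, assumed here, is cosine similarity between a claim's embedding and each cluster's `EmbeddingVector`. The function names and toy vectors are hypothetical, and a production system would more likely use a vector index than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_cluster(query: list[float], clusters: dict[str, list[float]]) -> tuple[str, float]:
    """Return (ClusterID, similarity) of the cluster whose embedding best matches the query."""
    if not clusters:
        raise ValueError("no clusters to search")
    best_id = max(clusters, key=lambda cid: cosine(query, clusters[cid]))
    return best_id, cosine(query, clusters[best_id])
```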

Key Changes in v0.9.1

Updated Field Names:

`SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
`SafetyClass` removed from Scenario (redundant with claim-level RiskTier)

Added Fields to All Version Entities:

`PublicationMode` - Track Mode 1/2/3 status
`ReviewStatus` - Track workflow state
`NodeOrigin` - Federation provenance
`CreatedBy` - FK to User/TechnicalUser (clarified)

New Entity:

`TECHNICAL_USER` - Separate system processes from human users

Clarifications:

`ScenarioVersionID` in Verdict (not just ScenarioID) - links to a specific scenario version
`ContentHash` in Evidence - SHA256 for integrity checking

Data Model Behavior

Late-Arriving Evidence

When new evidence versions appear:

1. Existing verdicts are marked as outdated
2. Scenario relevance must be re-evaluated
3. The re-evaluation engine triggers verdict recomputation
4. New verdict versions are created
5. Users are notified of updates
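A sketch of the first step. It assumes `Status` can be updated on a verdict version record and that each Evidence VersionID can be mapped to its `EvidenceID` anchor. It only covers a new version of evidence that a verdict already used; entirely new evidence would enter through the scenario relevance check in step 2. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerdictVersion:
    version_id: str
    evidence_version_set: list[str]  # Evidence VersionIDs used
    status: str = "current"          # current | outdated | superseded | retracted


def mark_outdated(verdicts: list[VerdictVersion], new_evidence_version: str,
                  evidence_of: dict[str, str]) -> list[str]:
    """Flag current verdicts that used an older version of the same evidence.

    evidence_of maps each Evidence VersionID to its EvidenceID anchor.
    Returns the verdict VersionIDs flagged for re-evaluation.
    """
    evidence_id = evidence_of[new_evidence_version]
    flagged: list[str] = []
    for verdict in verdicts:
        if verdict.status != "current":
            continue  # superseded or retracted verdicts are left alone
        used = {evidence_of[v] for v in verdict.evidence_version_set}
        if evidence_id in used and new_evidence_version not in verdict.evidence_version_set:
            verdict.status = "outdated"
            flagged.append(verdict.version_id)
    return flagged
```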

Scenario Evolution

When a scenario's assumptions or definitions change:

A new scenario version is created (no in-place update)
All dependent verdicts must be recalculated
Previous scenario versions remain accessible
Version lineage is preserved
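A minimal sketch of evolving a scenario without in-place updates. It uses `dataclasses.replace` to copy the current version with changed fields and a new `ParentVersionID`. The verdict lookup assumes a map from verdict VersionIDs to their `ScenarioVersionID`. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    scenario_id: str
    parent_version_id: Optional[str]
    assumptions: tuple[str, ...] = ()
    definitions: tuple[str, ...] = ()


def evolve(current: ScenarioVersion, new_version_id: str, **changes) -> ScenarioVersion:
    """Create a new version instead of editing in place; the current one is left untouched."""
    return replace(current, version_id=new_version_id,
                   parent_version_id=current.version_id, **changes)


def dependent_verdicts(verdict_scenario: dict[str, str], scenario_version_id: str) -> list[str]:
    """Verdict VersionIDs whose ScenarioVersionID points at the given scenario version."""
    return sorted(v for v, s in verdict_scenario.items() if s == scenario_version_id)
```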

Federated Nodes

Each node may share partial data:

Claims and scenarios shared if relevant
Evidence metadata shared, but not always the full files
Version synchronization via NodeOrigin tracking
Branching allowed for divergent interpretations
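A sketch of sharing an evidence version with another node. It assumes records are dictionaries keyed by the field names above, plus a hypothetical `Content` key that holds the file. Metadata and `ContentHash` always travel, the file only on request, and `NodeOrigin` is preserved when the record already came from another node.

```python
def share_evidence(version: dict[str, object], local_node: str,
                   include_content: bool = False) -> dict[str, object]:
    """Prepare an EVIDENCE_VERSION record for another node without mutating the original.

    ContentHash travels with the metadata, so the receiver can verify the file
    if it fetches it later.
    """
    shared = {k: v for k, v in version.items() if k != "Content"}
    if include_content and "Content" in version:
        shared["Content"] = version["Content"]
    # Keep the original node if this version was itself received from elsewhere.
    shared.setdefault("NodeOrigin", local_node)
    return shared
```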

Visual Diagrams

The following diagrams provide visual representations of the data model structure and relationships.

Core Data Model ERD

```mermaid
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    SCENARIO_EVIDENCE_VERSION_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions
    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions
    EVIDENCE ||--o{ EVIDENCE_VERSION : versions
    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : linked
    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
```
