Data Model (From Specification Chat)

Version 1.1 by Robert Schaub on 2025/11/27 12:02

5. Data Model

The FactHarbor data model centers on four fully versioned entities, each with immutable versions:

  • Claim
  • Scenario
  • Evidence
  • Verdict

These entities form the structured “truth landscape” for each claim.  
The model is explicitly versioned, traceable, and federation-ready.

To keep the system auditable and explainable, FactHarbor uses a consistent
identity vs. version pattern:

  • Identity entities (e.g. CLAIM, SCENARIO)  
      define *what* something is in a stable sense.
  • Version entities (e.g. CLAIM_VERSION, SCENARIO_VERSION)  
      define *how that thing looked at a given point in time*.

All reasoning (e.g. verdicts, review actions) is attached to versions, never to
mutable identities.
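
A minimal sketch of the identity vs. version pattern, assuming Python dataclasses as the representation. The class names, the snake_case field names, the "open" status value, and the sample claim texts are illustrative and not part of the specification.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass
class Claim:
    """Identity entity: a stable handle for what is being argued about."""
    claim_id: str
    claim_cluster_id: str
    status: str  # hypothetical value set; the spec only names the field


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: how the claim looked at one point in time."""
    claim_version_id: str
    claim_id: str  # foreign key to Claim
    text: str
    created_at: datetime


claim = Claim("claim-1", "cluster-1", "open")
v1 = ClaimVersion("cv-1", claim.claim_id, "Coffee is healthy.",
                  datetime.now(timezone.utc))
# A rephrasing produces a new version; the identity stays the same.
v2 = ClaimVersion("cv-2", claim.claim_id, "Moderate coffee intake is healthy.",
                  datetime.now(timezone.utc))

frozen = False
try:
    v1.text = "edited"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    frozen = True
```

Reasoning such as verdicts or review actions would then reference `claim_version_id`, not `claim_id`.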


5.1 Core entities and versioning pattern

 | Logical concept | Identity entity | Version entity | Notes |
 |---|---|---|---|
 | Claim (what people argue about) | CLAIM | CLAIM_VERSION | Claim text, phrasing, and metadata live in CLAIM_VERSION. The identity CLAIM stays stable across rephrasings. |
 | Scenario (interpretive frame) | SCENARIO | SCENARIO_VERSION | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries. |
 | Evidence (source / datapoint) | EVIDENCE | EVIDENCE_VERSION | Identity of a source vs. specific extractions / updates over time. |
 | Verdict (assessment) | VERDICT | VERDICT_VERSION | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments. |
 | Scenario–Evidence link | SCENARIO_EVIDENCE_LINK | SCENARIO_EVIDENCE_LINK_VERSION | Links bind scenario versions to evidence versions with relevance & direction. |
 | Claim cluster (semantic group) | CLAIM_CLUSTER | – | Groups semantically related claims; mainly for discovery and navigation. |

Key design decisions:

  • A CLAIM belongs to exactly one CLAIM_CLUSTER.
  • A SCENARIO belongs to exactly one CLAIM  
      (scenarios live at the *claim* level, not per individual phrasing).
  • Verdicts and Scenario–Evidence links are always attached to versions:
      • SCENARIO_VERSION + EVIDENCE_VERSION → SCENARIO_EVIDENCE_LINK_VERSION
      • SCENARIO_VERSION → VERDICT_VERSION

This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.
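
The following sketch illustrates why attaching links and verdicts to versions preserves history. It assumes frozen Python dataclasses; the class names, ids, and numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    link_version_id: str
    scenario_version_id: str   # points at a SCENARIO_VERSION, not a SCENARIO
    evidence_version_id: str   # points at an EVIDENCE_VERSION, not an EVIDENCE
    relevance: float
    direction: str             # SUPPORTS / CONTRADICTS / MIXED / CONTEXT


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str
    probability: float
    confidence: float


link_v1 = ScenarioEvidenceLinkVersion("slv-1", "sv-1", "ev-1", 0.8, "SUPPORTS")
verdict_v1 = VerdictVersion("vv-1", "sv-1", 0.7, 0.6)

# The scenario is revised: a new scenario version "sv-2" gets its own verdict.
verdict_v2 = VerdictVersion("vv-2", "sv-2", 0.55, 0.5)

# Both assessments coexist; the old one still describes "sv-1" exactly.
history = [verdict_v1, verdict_v2]
by_scenario_version = {v.scenario_version_id: v for v in history}
```

Because `verdict_v1` names `sv-1` explicitly, it stays a valid record of what was assessed even after `sv-2` exists.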


5.2 Core Data Model ERD (expanded, versioned)

The following Mermaid ER diagram shows the main entities and their relationships.
The convention is that each entity's own Id field (e.g. claimId in CLAIM) is
its primary key, and fields ending in IdFk are foreign keys.

erDiagram

    CLAIM_CLUSTER {
        string claimClusterId
        string theme
        string embeddingVectorRef
        string language
        datetime createdAt
    }

    CLAIM {
        string claimId
        string claimClusterIdFk
        string status
        datetime createdAt
    }

    CLAIM_VERSION {
        string claimVersionId
        string claimIdFk
        string text
        string language
        string claimType
        string domain
        string authorType
        datetime createdAt
    }

    SCENARIO {
        string scenarioId
        string claimIdFk
        string key
        string title
        boolean isDeprecated
    }

    SCENARIO_VERSION {
        string scenarioVersionId
        string scenarioIdFk
        string versionTag
        string definitionsJson
        string assumptionsJson
        string boundariesJson
        string notes
        datetime createdAt
    }

    EVIDENCE {
        string evidenceId
        string canonicalSourceId
        string mainUrl
        string evidenceType
        string language
    }

    EVIDENCE_VERSION {
        string evidenceVersionId
        string evidenceIdFk
        string snapshotLocation
        string extractionSummary
        string reliabilityModel
        datetime collectedAt
        datetime createdAt
    }

    SCENARIO_EVIDENCE_LINK {
        string scenarioEvidenceLinkId
        string scenarioIdFk
        string evidenceIdFk
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string scenarioEvidenceLinkVersionId
        string scenarioEvidenceLinkIdFk
        string scenarioVersionIdFk
        string evidenceVersionIdFk
        float relevance
        string direction "SUPPORTS / CONTRADICTS / MIXED / CONTEXT"
        string rationale
        datetime createdAt
    }

    VERDICT {
        string verdictId
        string scenarioIdFk
        string verdictType "e.g. likelihood, classification"
    }

    VERDICT_VERSION {
        string verdictVersionId
        string verdictIdFk
        string scenarioVersionIdFk
        float probability
        float confidence
        string reasoningSummary
        string uncertaintyFactorsJson
        datetime createdAt
    }

    %% Relationships

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : has_versions
    CLAIM ||--o{ SCENARIO : has_scenarios
    SCENARIO ||--o{ SCENARIO_VERSION : has_versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : has_versions

    SCENARIO ||--o{ SCENARIO_EVIDENCE_LINK : may_link
    EVIDENCE ||--o{ SCENARIO_EVIDENCE_LINK : may_link

    SCENARIO_EVIDENCE_LINK ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : has_versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : uses_evidence
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : is_used_in

    SCENARIO ||--o{ VERDICT : has_verdicts
    VERDICT ||--o{ VERDICT_VERSION : has_versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : assessed_in

Important points:

  • Scenarios and Evidence are linked via their versions  
      (SCENARIO_VERSION and EVIDENCE_VERSION).
  • Each VERDICT_VERSION assesses exactly one SCENARIO_VERSION; the VERDICT
      identity itself belongs to the SCENARIO.
  • The version entities are shared across diagrams; they are defined here and
      referenced in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
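
One way to enforce "never changed, only superseded" is an append-only store. The sketch below is an assumption about how that could look, not a description of FactHarbor's storage layer; `VersionStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class VersionStore:
    """Append-only store: versions can be added and read, never updated."""

    def __init__(self):
        self._by_identity = defaultdict(list)  # identity id -> ordered version ids
        self._payloads = {}                    # version id -> payload

    def append(self, identity_id, version_id, payload):
        # Reusing a version id would amount to an in-place change, so refuse it.
        if version_id in self._payloads:
            raise ValueError(f"version {version_id} already exists")
        self._payloads[version_id] = payload
        self._by_identity[identity_id].append(version_id)

    def latest(self, identity_id):
        """Return the id of the newest version of an identity."""
        return self._by_identity[identity_id][-1]

    def history(self, identity_id):
        """Return all version ids of an identity, oldest first."""
        return list(self._by_identity[identity_id])

    def get(self, version_id):
        return self._payloads[version_id]


store = VersionStore()
store.append("evidence-1", "ev-1", {"extractionSummary": "first extraction"})
store.append("evidence-1", "ev-2", {"extractionSummary": "corrected extraction"})
```

Here `ev-2` supersedes `ev-1` as the latest version, while `ev-1` remains readable for audits.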


5.3 Data Use & Review ERD (expanded, versioned)

The Data Use model captures who does what with which versioned data:

  • Users (including technical users)
  • Roles and role assignments
  • Review actions on versioned entities

erDiagram

    USER {
        string userId
        string displayName
        string email
        string userType "human or technical"
        datetime createdAt
    }

    TECHNICAL_USER {
        string technicalUserId
        string userIdFk
        string description
        string systemIdentifier
    }

    ROLE {
        string roleId
        string code "e.g. READER, CONTRIBUTOR, REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN"
        string description
        boolean isFederationRole
    }

    USER_ROLE_MEMBERSHIP {
        string membershipId
        string userIdFk
        string roleIdFk
        datetime grantedAt
        string grantedByUserIdFk
    }

    REVIEW_ACTION {
        string reviewActionId
        string subjectType "e.g. CLAIM_VERSION, SCENARIO_VERSION..."
        string subjectVersionId
        string actionType "APPROVE, REJECT, FLAG, COMMENT, REQUEST_CHANGES..."
        string outcome "ACCEPTED, REJECTED, ESCALATED..."
        string comment
        string createdByUserIdFk
        datetime createdAt
    }

    %% Versioned data entities (references from the core model)

    CLAIM_VERSION {
        string claimVersionId
    }

    SCENARIO_VERSION {
        string scenarioVersionId
    }

    EVIDENCE_VERSION {
        string evidenceVersionId
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string scenarioEvidenceLinkVersionId
    }

    VERDICT_VERSION {
        string verdictVersionId
    }

    %% Relationships

    USER ||--o{ TECHNICAL_USER : may_be
    USER ||--o{ USER_ROLE_MEMBERSHIP : has_role
    ROLE ||--o{ USER_ROLE_MEMBERSHIP : assigned_to

    USER ||--o{ REVIEW_ACTION : performs

    CLAIM_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    EVIDENCE_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_EVIDENCE_LINK_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    VERDICT_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in

Notes:

  • Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR,
      SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows
      in ROLE.
  • TECHNICAL_USER captures strictly technical accounts (API keys,
      node-to-node federation agents, batch jobs). All other roles can, in principle,
      be held by both human and technical users where appropriate.
  • A READER normally does not perform REVIEW_ACTIONs, while
      roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles
      do.
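
A hedged sketch of recording a REVIEW_ACTION against a versioned subject. Which roles may review is a policy decision; the `REVIEWING_ROLES` set below is an assumption based on the roles named in the notes above, and `record_review` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: these roles may review; federation roles are left out for brevity.
REVIEWING_ROLES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}

# Subject types taken from the version entities in the diagram above.
SUBJECT_TYPES = {
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    subject_type: str
    subject_version_id: str
    action_type: str
    created_by_user_id: str
    created_at: datetime


def record_review(action_id, user_id, user_roles, subject_type,
                  subject_version_id, action_type):
    """Create a review action if the subject is versioned and the user may review."""
    if subject_type not in SUBJECT_TYPES:
        raise ValueError(f"not a versioned subject type: {subject_type}")
    if not REVIEWING_ROLES & set(user_roles):
        raise PermissionError(f"user {user_id} holds no reviewing role")
    return ReviewAction(action_id, subject_type, subject_version_id,
                        action_type, user_id, datetime.now(timezone.utc))


action = record_review("ra-1", "user-7", ["REVIEWER"],
                       "VERDICT_VERSION", "vv-1", "APPROVE")
```

The pair `subject_type` plus `subject_version_id` mirrors the polymorphic reference in REVIEW_ACTION, so one table can point at any version entity.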

5.4 Versioning and re-evaluation behavior

This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).

  • When a new EVIDENCE_VERSION is created:
      • All related SCENARIO_EVIDENCE_LINK_VERSION entries referencing
        that evidence version are candidates for re-assessment.
      • Related VERDICT_VERSION entries may become outdated and
        are queued for re-evaluation.
  • When a new SCENARIO_VERSION is created:
      • It may inherit some links from earlier scenario versions, or start
        empty, depending on the change classification (cosmetic vs. conceptual).
      • All verdicts for that scenario are recalculated and stored as new
        VERDICT_VERSION entries.
  • REVIEW_ACTIONs are always attached to the exact version that was seen by
      the reviewer. This preserves a faithful audit trail if data later changes.
  • In a federated environment, nodes can choose:
      • which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
      • which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs,
        only EVIDENCE_VERSIONs above a reliability threshold, etc.)
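
The evidence-driven rule in the list above can be sketched as a lookup over link and verdict versions. This sketch assumes that "related" means link versions pointing at earlier versions of the same EVIDENCE identity; the function name and the dictionary-based records are hypothetical.

```python
def reevaluation_candidates(new_evidence_version_id, evidence_of,
                            link_versions, verdict_versions):
    """Return (link_version_ids, verdict_version_ids) affected by new evidence.

    evidence_of maps each evidence version id to its evidence identity id.
    """
    evidence_id = evidence_of[new_evidence_version_id]

    # Link versions that reference an earlier version of the same evidence.
    affected_links = [
        link for link in link_versions
        if evidence_of[link["evidence_version_id"]] == evidence_id
        and link["evidence_version_id"] != new_evidence_version_id
    ]

    # Verdict versions that assessed a scenario version using those links.
    scenario_version_ids = {link["scenario_version_id"] for link in affected_links}
    outdated_verdicts = [
        verdict["verdict_version_id"] for verdict in verdict_versions
        if verdict["scenario_version_id"] in scenario_version_ids
    ]
    return [link["link_version_id"] for link in affected_links], outdated_verdicts


evidence_of = {"ev-1": "e-1", "ev-2": "e-1", "ev-9": "e-2"}
link_versions = [
    {"link_version_id": "slv-1", "scenario_version_id": "sv-1", "evidence_version_id": "ev-1"},
    {"link_version_id": "slv-2", "scenario_version_id": "sv-2", "evidence_version_id": "ev-9"},
]
verdict_versions = [
    {"verdict_version_id": "vv-1", "scenario_version_id": "sv-1"},
    {"verdict_version_id": "vv-2", "scenario_version_id": "sv-2"},
]

links, verdicts = reevaluation_candidates("ev-2", evidence_of,
                                          link_versions, verdict_versions)
```

Only the records tied to evidence `e-1` are returned; nothing is modified, since re-assessment produces new versions.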

5.5 Behavioral Notes

5.5.1 Late-Arriving Evidence

New evidence versions can make existing verdicts outdated and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).

5.5.2 Scenario Evolution

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.
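
A minimal sketch of link inheritance on scenario change, assuming a simple policy: cosmetic changes carry all links forward as new link versions, conceptual changes start empty. The model only says links may be inherited, so this policy, the function name, and the id scheme are hypothetical.

```python
def links_for_new_scenario_version(old_links, new_scenario_version_id, change_class):
    """Derive the initial link versions for a newly created scenario version."""
    if change_class == "conceptual":
        # Meaning changed: every link has to be re-established by assessment.
        return []
    if change_class != "cosmetic":
        raise ValueError(f"unknown change classification: {change_class}")
    # Cosmetic change: copy each link into a NEW link version for the new
    # scenario version; the old link versions are left untouched.
    return [
        {
            **link,
            "link_version_id": f'{link["link_version_id"]}+{new_scenario_version_id}',
            "scenario_version_id": new_scenario_version_id,
        }
        for link in old_links
    ]


old_links = [
    {"link_version_id": "slv-1", "scenario_version_id": "sv-1",
     "evidence_version_id": "ev-1", "relevance": 0.8, "direction": "SUPPORTS"},
]

inherited = links_for_new_scenario_version(old_links, "sv-2", "cosmetic")
fresh = links_for_new_scenario_version(old_links, "sv-2", "conceptual")
```

In both cases the links of `sv-1` stay as they were, which keeps old verdicts reproducible.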

5.5.3 Federation

Federated nodes can replicate subsets of the graph, including:

  • Claims and Scenarios of local interest
  • Evidence metadata (without full content)
  • Verdict lineages used for local decision-making

Federation-specific entities (such as FEDERATION_NODE,
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.
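
As an illustration of subset replication, the sketch below selects accepted verdict versions and sufficiently reliable evidence versions. It assumes acceptance is read from REVIEW_ACTION outcomes and that a numeric `reliability_score` is available; the core model defines only a `reliabilityModel` string, so that score and the function name are hypothetical.

```python
def select_for_replication(verdict_versions, evidence_versions,
                           review_actions, min_reliability):
    """Pick the verdict and evidence versions a node would replicate."""
    # Verdict versions with at least one ACCEPTED review outcome.
    accepted_ids = {
        action["subject_version_id"] for action in review_actions
        if action["subject_type"] == "VERDICT_VERSION"
        and action["outcome"] == "ACCEPTED"
    }
    verdicts = [v for v in verdict_versions
                if v["verdict_version_id"] in accepted_ids]
    # Evidence versions at or above the node's reliability threshold.
    evidence = [e for e in evidence_versions
                if e["reliability_score"] >= min_reliability]
    return verdicts, evidence


verdict_versions = [{"verdict_version_id": "vv-1"}, {"verdict_version_id": "vv-2"}]
evidence_versions = [
    {"evidence_version_id": "ev-1", "reliability_score": 0.9},
    {"evidence_version_id": "ev-2", "reliability_score": 0.4},
]
review_actions = [
    {"subject_type": "VERDICT_VERSION", "subject_version_id": "vv-1", "outcome": "ACCEPTED"},
    {"subject_type": "VERDICT_VERSION", "subject_version_id": "vv-2", "outcome": "REJECTED"},
]

verdicts, evidence = select_for_replication(
    verdict_versions, evidence_versions, review_actions, min_reliability=0.5)
```

Each node could apply its own threshold and acceptance rule, which is what makes the replicated graph a subset and not a full copy.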