Data Model

Version 5.1 by Robert Schaub on 2025/12/14 22:27

This page describes the current data model for FactHarbor.

Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

  • Complete auditability
  • Ability to reconstruct historical state
  • Federation-compatible lineage tracking
  • Transparent evolution of claims, scenarios, and verdicts

Core Versioning Principles

Immutability:

  • Each version is stored independently
  • Versions cannot be deleted, only superseded
  • Historical versions remain accessible
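To make the immutability rules concrete, here is a minimal sketch. It assumes an in-memory store, and `Version`, `AppendOnlyStore` and `is_superseded` are hypothetical names rather than FactHarbor's actual API. Records are frozen dataclasses, deletion is refused, and "superseded" is derived from another version naming the record as its parent, so nothing is ever edited in place.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Version:
    """Illustrative version record; frozen=True blocks field reassignment after creation."""
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    payload: str


class AppendOnlyStore:
    """Hypothetical store: versions can be added and read, never overwritten or deleted."""

    def __init__(self) -> None:
        self._versions: Dict[str, Version] = {}

    def add(self, version: Version) -> None:
        if version.version_id in self._versions:
            raise ValueError(f"version {version.version_id} already exists")
        parent = version.parent_version_id
        if parent is not None and parent not in self._versions:
            raise ValueError(f"unknown parent version {parent}")
        self._versions[version.version_id] = version

    def delete(self, version_id: str) -> None:
        # Deletion is refused outright: a version can only be superseded by a child.
        raise PermissionError("versions cannot be deleted, only superseded")

    def get(self, version_id: str) -> Version:
        return self._versions[version_id]

    def is_superseded(self, version_id: str) -> bool:
        # A version counts as superseded once another version names it as its parent.
        return any(v.parent_version_id == version_id for v in self._versions.values())


store = AppendOnlyStore()
store.add(Version("v1", None, "original text"))
store.add(Version("v2", "v1", "revised text"))
print(store.is_superseded("v1"))  # True
print(store.get("v1").payload)    # original text
```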

Lineage:

  • Each version links to its parent via `ParentVersionID`
  • Forms a directed acyclic graph (DAG) of changes
  • Supports branching in federated environments
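Because every version carries a single `ParentVersionID`, the lineage of any version can be recovered by following parent links back to the root. The sketch below assumes lineage is available as a plain `VersionID -> ParentVersionID` dictionary. The `lineage` helper is hypothetical, and it also rejects cycles, which would violate the acyclic property.

```python
from typing import Dict, List, Optional

# Hypothetical lineage map: VersionID -> ParentVersionID (None for the first version).
parents: Dict[str, Optional[str]] = {
    "v1": None,
    "v2": "v1",
    "v3": "v2",
}


def lineage(version_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow ParentVersionID links from a version back to the root version."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}; lineage must be acyclic")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain


print(lineage("v3", parents))  # ['v3', 'v2', 'v1']
```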

Provenance:

  • Every version is timestamped (`CreatedAt`)
  • Author type is recorded (`AuthorType`: Human, AI, ExternalNode)
  • Justification is captured (`JustificationText`)
  • Digital signatures for integrity (`SignatureHash` in Release 1.0)
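This page does not specify how `SignatureHash` is computed. One common building block is a digest over a canonical encoding of the version's fields, which a node would then sign with its own key. The sketch below shows only the digest step, assuming SHA-256 over sorted-key JSON. `content_digest` is a hypothetical helper, and the signing step itself is not shown.

```python
import hashlib
import json


def content_digest(version: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a version's fields.

    SignatureHash itself is excluded so the digest can be stored next to the data.
    """
    fields = {k: v for k, v in version.items() if k != "SignatureHash"}
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


version = {
    "VersionID": "v2",
    "ParentVersionID": "v1",
    "CreatedAt": "2025-12-14T22:27:00Z",
    "AuthorType": "Human",
    "JustificationText": "Clarified wording",
}
version["SignatureHash"] = content_digest(version)

# Any later edit to a field produces a different digest.
tampered = dict(version, JustificationText="Edited after the fact")
print(content_digest(tampered) == version["SignatureHash"])  # False
```

Sorting keys and fixing separators makes the digest independent of field order, so two nodes that hold the same version compute the same value.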

Federation Support:

  • Versions can originate from remote nodes
  • Conflict detection via lineage comparison
  • Parallel version trees for branching scenarios
  • Cross-node version synchronization

Common Version Fields

All versioned entities include:

  • VersionID: Unique identifier for this specific version
  • ParentVersionID: Link to previous version (null for first version)
  • CreatedAt: Timestamp (ISO 8601, UTC)
  • AuthorType: Human | AI | ExternalNode
  • JustificationText: Brief explanation of changes
  • SignatureHash: Cryptographic signature (Release 1.0)
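The common fields above map naturally onto a typed record. The sketch below is one possible mapping: the snake_case names and the `VersionFields` and `utc_now_iso` helpers are assumptions made for illustration. `AuthorType` is restricted to the three listed values, and `CreatedAt` defaults to a UTC ISO 8601 timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


def utc_now_iso() -> str:
    """Current time as an ISO 8601 string in UTC, e.g. 2025-12-14T22:27:00.123456+00:00."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    author_type: AuthorType
    justification_text: str
    created_at: str = field(default_factory=utc_now_iso)
    signature_hash: Optional[str] = None  # filled in from Release 1.0 onward


first = VersionFields("v1", None, AuthorType.AI, "Initial extraction")
print(first.created_at.endswith("+00:00"))  # True
```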

Core Data Model Refinements

The system relies on the following versioned core entities:

  • CLAIM_CLUSTER
    • ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
    • Groups related claims into topical clusters.
    • One Cluster has many Claims.
    • A Claim belongs to exactly one primary cluster.
  • CLAIM / CLAIM_VERSION
    • ``CLAIM`` is the long‑lived anchor for a real‑world claim.
    • ``CLAIM_VERSION`` is an immutable snapshot that includes:
      • ``ClaimID`` (FK to CLAIM)
      • ``VersionID`` (PK)
      • ``ParentVersionID`` (FK to prior version, nullable)
      • ``Text``
      • ``Domain``
      • ``ClaimType`` (literal, metaphorical, rhetorical, supernatural...)
      • ``Evaluability`` (empirical, subjective, non-falsifiable)
      • ``SafetyCategory`` (low, medium, high)
      • ``CreatedAt``, ``AuthorType``, ``JustificationText``
      • ``Status`` (active, superseded, merged)
  • SCENARIO / SCENARIO_VERSION
    • ``SCENARIO`` is the anchor for a scenario across time.
    • ``SCENARIO_VERSION`` is an immutable snapshot:
      • ``ScenarioID`` (FK to SCENARIO)
      • ``VersionID`` (PK)
      • ``ParentVersionID``
      • ``ClaimID`` (FK to CLAIM)
      • ``Definitions``
      • ``Boundaries``
      • ``Assumptions``
      • ``Context``
      • ``EvaluationMethod``
      • ``SafetyClass``
      • ``CreatedAt``, ``AuthorType``, ``JustificationText``
      • ``Status`` (active, superseded, deprecated)
  • EVIDENCE / EVIDENCE_VERSION
    • ``EVIDENCE`` is the anchor.
    • ``EVIDENCE_VERSION`` is the versioned snapshot:
      • ``EvidenceID`` (FK to EVIDENCE)
      • ``VersionID`` (PK)
      • ``ParentVersionID``
      • ``Type`` (paper, dataset, report, transcript, expert...)
      • ``Category`` (empirical, historical, rhetorical, dataset, meta-analysis...)
      • ``Reliability`` (low/med/high)
      • ``Provenance`` (URL, DOI, source metadata)
      • ``ExtractionMethod`` (manual, OCR, API, AKEL)
      • ``CreatedAt``, ``AuthorType``, ``JustificationText``
      • ``Status`` (verified, updated, disputed, retracted, superseded)
  • VERDICT / VERDICT_VERSION
    • ``VERDICT`` is the anchor.
    • ``VERDICT_VERSION`` is the snapshot:
      • ``VerdictID`` (FK to VERDICT)
      • ``VersionID`` (PK)
      • ``ParentVersionID``
      • ``ClaimID`` (FK to CLAIM)
      • ``ScenarioID`` (FK to SCENARIO)
      • ``EvidenceVersionSet`` (list of evidence version IDs used)
      • ``LikelihoodRange`` (0–1, with uncertainty bounds)
      • ``ExplanationChain``
      • ``UncertaintyFactors``
      • ``CreatedAt``, ``AuthorType``, ``JustificationText``
      • ``Status`` (current, outdated, superseded, retracted)
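Each entity above follows the same anchor/snapshot pattern. The sketch below illustrates it for CLAIM / CLAIM_VERSION with a reduced field set. `Claim`, `ClaimVersion` and `revise` are hypothetical names, and the sketch keeps history linear, so federated branching is not modeled here. A revision copies the current snapshot with `dataclasses.replace`, assigns a new `VersionID`, and points `ParentVersionID` at the previous snapshot. Earlier snapshots are left untouched.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import List, Optional


class Evaluability(Enum):
    EMPIRICAL = "empirical"
    SUBJECTIVE = "subjective"
    NON_FALSIFIABLE = "non-falsifiable"


@dataclass(frozen=True)
class ClaimVersion:
    """Immutable snapshot (reduced field set for illustration)."""
    claim_id: str
    version_id: str
    parent_version_id: Optional[str]
    text: str
    domain: str
    evaluability: Evaluability


@dataclass
class Claim:
    """Long-lived anchor that owns the ordered history of snapshots."""
    claim_id: str
    versions: List[ClaimVersion] = field(default_factory=list)

    def current(self) -> ClaimVersion:
        return self.versions[-1]

    def revise(self, new_version_id: str, **changes) -> ClaimVersion:
        """Append a new snapshot whose parent is the current one."""
        parent = self.current()
        new = replace(parent, version_id=new_version_id,
                      parent_version_id=parent.version_id, **changes)
        self.versions.append(new)
        return new


claim = Claim("c1")
claim.versions.append(
    ClaimVersion("c1", "c1-v1", None, "Example claim text", "health", Evaluability.EMPIRICAL)
)
claim.revise("c1-v2", text="Example claim text, narrowed")
print(claim.current().parent_version_id)  # c1-v1
print(claim.versions[0].text)             # Example claim text
```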

Many-to-Many Linking Tables

ScenarioEvidenceLink

Links scenario versions to evidence versions, with a relevance score for each link.

Fields:

  • ``ScenarioID``
  • ``ScenarioVersionID``
  • ``EvidenceID``
  • ``EvidenceVersionID``
  • ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
  • ``LinkJustification`` - Brief explanation of relevance

Purpose:

  • Evidence can be used by multiple scenarios
  • Scenarios can draw from multiple pieces of evidence
  • Relevance scoring helps prioritize evidence
  • Version-specific linking preserves historical accuracy
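Here is a minimal sketch of the link record and one way to use the relevance score to prioritize evidence. It assumes the links are held in a plain list, and the helper `evidence_for` is hypothetical. The score is validated against its 0 to 1 range on construction, and evidence for a given scenario version is returned most relevant first.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # expected range: 0 to 1
    link_justification: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("relevance_score must be within [0, 1]")


def evidence_for(links: List[ScenarioEvidenceLink], scenario_version_id: str) -> List[ScenarioEvidenceLink]:
    """Links for one scenario version, ordered from most to least relevant."""
    matching = [link for link in links if link.scenario_version_id == scenario_version_id]
    return sorted(matching, key=lambda link: link.relevance_score, reverse=True)


links = [
    ScenarioEvidenceLink("s1", "s1-v1", "e1", "e1-v1", 0.4, "Indirect support"),
    ScenarioEvidenceLink("s1", "s1-v1", "e2", "e2-v3", 0.9, "Directly measures the claim"),
    ScenarioEvidenceLink("s2", "s2-v1", "e1", "e1-v1", 0.7, "Same evidence, other scenario"),
]
print([link.evidence_version_id for link in evidence_for(links, "s1-v1")])  # ['e2-v3', 'e1-v1']
```

Evidence `e1` appears under two scenarios, which is the many-to-many case the table exists for.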

ClaimCluster

Semantic clustering of similar claims.

Fields:

  • ``ClusterID`` (PK)
  • ``EmbeddingVector`` - Vector representation for semantic search
  • ``MemberList`` - List of ClaimIDs in this cluster
  • ``Theme`` - Human-readable theme description

Purpose:

  • Groups semantically similar claims
  • Enables efficient search and discovery
  • Supports cross-node claim alignment
  • Reduces duplication
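The page does not name a similarity measure for `EmbeddingVector`. A common choice is cosine similarity. The sketch below assigns a new claim to the closest cluster only if the similarity clears a cutoff, and otherwise returns None, signalling that a new cluster may be needed. `nearest_cluster`, the 0.8 threshold and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_cluster(claim_vec: Vector, clusters: Dict[str, Vector], threshold: float = 0.8) -> Optional[str]:
    """ClusterID with the most similar embedding, or None if nothing reaches the threshold."""
    best_id: Optional[str] = None
    best_sim = threshold
    for cluster_id, embedding in clusters.items():
        sim = cosine(claim_vec, embedding)
        if sim >= best_sim:
            best_id, best_sim = cluster_id, sim
    return best_id


clusters = {"cl-energy": [0.9, 0.1, 0.0], "cl-health": [0.0, 0.2, 0.98]}
print(nearest_cluster([0.85, 0.15, 0.05], clusters))  # cl-energy
print(nearest_cluster([0.5, 0.5, 0.5], clusters))     # None
```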

Data Model Behavior

Late-Arriving Evidence

When new evidence versions appear:

  1. Existing verdicts marked as outdated
  2. Scenario relevance must be re-evaluated
  3. Re-evaluation engine triggers verdict recomputation
  4. New verdict versions created
  5. Users notified of updates

Process:

  • New EvidenceVersion imported
  • System scans related ScenarioEvidenceLinks
  • Checks if evidence affects existing verdicts
  • Queues affected verdicts for re-evaluation
  • AKEL or reviewer creates new VerdictVersion
  • Old verdicts remain accessible (historical record)
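The scan-and-queue steps above can be sketched as follows. Two assumptions apply. First, verdict versions are tied to scenario versions, as in the relationship overview further down this page. Second, links are matched on `EvidenceID`, because a freshly imported evidence version has no version-specific link yet. `VerdictVersion` and `queue_affected_verdicts` are hypothetical names. The sketch only builds the queue. Creating the new VerdictVersion is left to AKEL or a reviewer, and existing verdicts are not modified.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, FrozenSet, List, Tuple


@dataclass(frozen=True)
class VerdictVersion:
    verdict_id: str
    version_id: str
    scenario_version_id: str
    evidence_version_set: FrozenSet[str]  # evidence version IDs the verdict was based on
    status: str = "current"


def queue_affected_verdicts(
    evidence_id: str,
    new_evidence_version_id: str,
    links: List[Tuple[str, str]],  # (scenario_version_id, evidence_id) pairs
    verdicts: List[VerdictVersion],
) -> Deque[VerdictVersion]:
    """Queue current verdicts whose scenario links to the updated evidence."""
    linked_scenarios = {sv_id for sv_id, ev_id in links if ev_id == evidence_id}
    queue: Deque[VerdictVersion] = deque()
    for verdict in verdicts:
        if (
            verdict.status == "current"
            and verdict.scenario_version_id in linked_scenarios
            and new_evidence_version_id not in verdict.evidence_version_set
        ):
            queue.append(verdict)
    return queue


links = [("s1-v1", "e1"), ("s2-v1", "e2")]
verdicts = [
    VerdictVersion("vd1", "vd1-v1", "s1-v1", frozenset({"e1-v1"})),
    VerdictVersion("vd2", "vd2-v1", "s2-v1", frozenset({"e2-v1"})),
]
print([v.verdict_id for v in queue_affected_verdicts("e1", "e1-v2", links, verdicts)])  # ['vd1']
```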

Scenario Evolution

When a scenario's assumptions or definitions change, the system creates a new scenario version rather than updating the existing one in place:

  • New ScenarioVersion with updated fields
  • ParentVersionID points to previous version
  • All dependent verdicts must be recalculated
  • Previous scenario versions remain accessible

Triggers:

  • Refined definitions
  • Changed assumptions
  • Expanded or narrowed boundaries
  • Updated evaluation methods
  • Safety classification changes

Impact:

  • Verdicts based on the old scenario version remain valid as historical records
  • New verdicts required for new scenario version
  • Users can compare old vs new scenarios
  • Evidence links may need re-assessment
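Here is a minimal sketch of this evolution step, using a reduced set of scenario fields. `evolve_scenario`, `changed_fields` and `verdicts_to_recompute` are hypothetical helpers, and verdicts are represented simply as (VerdictID, ScenarioVersionID) pairs. The new version records its parent, a field-level diff supports old-versus-new comparison, and verdicts tied to the old version are listed for recalculation. Those old verdicts are not altered.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_id: str
    version_id: str
    parent_version_id: Optional[str]
    assumptions: Tuple[str, ...]
    boundaries: str


LINEAGE_FIELDS = {"version_id", "parent_version_id"}


def evolve_scenario(current: ScenarioVersion, new_version_id: str, **changes) -> ScenarioVersion:
    """Create the next version; `current` itself is never modified."""
    return replace(current, version_id=new_version_id,
                   parent_version_id=current.version_id, **changes)


def changed_fields(old: ScenarioVersion, new: ScenarioVersion) -> Dict[str, Tuple[object, object]]:
    """Field-level diff as (old value, new value), ignoring lineage bookkeeping."""
    diff: Dict[str, Tuple[object, object]] = {}
    for f in fields(old):
        before, after = getattr(old, f.name), getattr(new, f.name)
        if f.name not in LINEAGE_FIELDS and before != after:
            diff[f.name] = (before, after)
    return diff


def verdicts_to_recompute(verdict_scenarios: List[Tuple[str, str]], old_version_id: str) -> List[str]:
    """VerdictIDs whose (VerdictID, ScenarioVersionID) pair points at the old version."""
    return [verdict_id for verdict_id, sv_id in verdict_scenarios if sv_id == old_version_id]


v1 = ScenarioVersion("s1", "s1-v1", None, ("adults only",), "region A, 2010-2020")
v2 = evolve_scenario(v1, "s1-v2", boundaries="region A, 2010-2024")
print(changed_fields(v1, v2))
# {'boundaries': ('region A, 2010-2020', 'region A, 2010-2024')}
print(verdicts_to_recompute([("vd1", "s1-v1"), ("vd2", "s9-v1")], v1.version_id))  # ['vd1']
```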

Federated Nodes

Each node may share partial data:

  • Claims and scenarios: shared if relevant to the node's domain
  • Evidence metadata: shared, though full evidence files are not always included
  • Verdict lineage: shared only if not locally overridden

Version synchronization:

  • Remote versions imported with provenance metadata
  • Conflicts detected via ParentVersionID comparison
  • Branching allowed for divergent interpretations
  • Local node retains authority over local versions
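Conflict detection by `ParentVersionID` comparison can be reduced to a small classification step. The sketch below assumes the local lineage of one entity is available as a `VersionID -> ParentVersionID` map along with the current local head. `SyncResult` and `classify_remote_version` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional


class SyncResult(Enum):
    DUPLICATE = "duplicate"            # already present locally
    FAST_FORWARD = "fast-forward"      # parent is the local head: a plain continuation
    BRANCH = "branch"                  # parent is an older local version: divergent line
    UNKNOWN_PARENT = "unknown-parent"  # parent missing locally: fetch its lineage first


def classify_remote_version(
    version_id: str,
    parent_version_id: Optional[str],
    local_parents: Dict[str, Optional[str]],
    local_head: str,
) -> SyncResult:
    """Compare a remote version's ParentVersionID with the local lineage of the same entity."""
    if version_id in local_parents:
        return SyncResult.DUPLICATE
    if parent_version_id == local_head:
        return SyncResult.FAST_FORWARD
    if parent_version_id is None or parent_version_id in local_parents:
        # A second root, or a child of a non-head version, forks the history.
        return SyncResult.BRANCH
    return SyncResult.UNKNOWN_PARENT


local_parents = {"v1": None, "v2": "v1"}
print(classify_remote_version("v3", "v2", local_parents, "v2"))   # SyncResult.FAST_FORWARD
print(classify_remote_version("v3b", "v1", local_parents, "v2"))  # SyncResult.BRANCH
print(classify_remote_version("v9", "v8", local_parents, "v2"))   # SyncResult.UNKNOWN_PARENT
```

Classifying a version as `BRANCH` does not change the local head. The divergent line is kept alongside it, which matches the rule that the local node retains authority over its own versions.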

Trust and acceptance:

  • Trusted nodes: auto-import versions
  • Neutral nodes: import but flag for review
  • Untrusted nodes: manual import only
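The three trust tiers amount to a lookup table from node trust level to import action. In this sketch, `TrustLevel`, `ImportAction` and `import_action` are hypothetical names. Treating unknown nodes as untrusted is also an assumption, since the page does not say how unlisted nodes are handled.

```python
from enum import Enum
from typing import Dict


class TrustLevel(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"


class ImportAction(Enum):
    AUTO_IMPORT = "auto-import"
    IMPORT_AND_FLAG = "import-and-flag-for-review"
    MANUAL_ONLY = "manual-import-only"


POLICY: Dict[TrustLevel, ImportAction] = {
    TrustLevel.TRUSTED: ImportAction.AUTO_IMPORT,
    TrustLevel.NEUTRAL: ImportAction.IMPORT_AND_FLAG,
    TrustLevel.UNTRUSTED: ImportAction.MANUAL_ONLY,
}


def import_action(node_trust: Dict[str, TrustLevel], node_id: str) -> ImportAction:
    """Look up the action for a node; unknown nodes get the most restrictive one (assumption)."""
    level = node_trust.get(node_id, TrustLevel.UNTRUSTED)
    return POLICY[level]


node_trust = {"node-a": TrustLevel.TRUSTED, "node-b": TrustLevel.NEUTRAL}
print(import_action(node_trust, "node-b").value)  # import-and-flag-for-review
print(import_action(node_trust, "node-z").value)  # manual-import-only
```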

Entity-Relationship Overview

Core relationships:

```
CLAIM_CLUSTER (1) ──< (N) CLAIM
CLAIM (1) ──< (N) CLAIM_VERSION
CLAIM (1) ──< (N) SCENARIO
SCENARIO (1) ──< (N) SCENARIO_VERSION
SCENARIO_VERSION (N) ──< (N) EVIDENCE_VERSION [via ScenarioEvidenceLink]
SCENARIO_VERSION (1) ──< (N) VERDICT_VERSION
VERDICT_VERSION references specific EvidenceVersionSet
```

Version chains:

Each entity has a version DAG:
```
Version 1 (ParentVersionID=null)
    ↓
Version 2 (ParentVersionID=1)
    ↓
Version 3 (ParentVersionID=2)
```

In federated environments, branching may occur:
```
Version 1
    ↓
Version 2
   /    \
V3a      V3b  (parallel branches from different nodes)
```
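Branches like the one above can be detected by looking for branch heads, meaning versions that no other version names as its parent. A linear chain has exactly one head, and a fork has more than one. This is a minimal sketch over a `VersionID -> ParentVersionID` map, and `branch_heads` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def branch_heads(parents: Dict[str, Optional[str]]) -> List[str]:
    """Versions that are nobody's parent, i.e. the tip of each branch, sorted by ID."""
    referenced = {p for p in parents.values() if p is not None}
    return sorted(v for v in parents if v not in referenced)


linear = {"V1": None, "V2": "V1", "V3": "V2"}
forked = {"V1": None, "V2": "V1", "V3a": "V2", "V3b": "V2"}
print(branch_heads(linear))  # ['V3']
print(branch_heads(forked))  # ['V3a', 'V3b']
print(len(branch_heads(forked)) > 1)  # True: the history has diverged
```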


Related Pages