Data Model

Version 5.1 by Robert Schaub on 2025/12/14 22:27

This page describes the current data model for FactHarbor.

Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

Complete auditability
Ability to reconstruct historical state
Federation-compatible lineage tracking
Transparent evolution of claims, scenarios, and verdicts

Core Versioning Principles

Immutability:

Each version is stored independently
Versions cannot be deleted, only superseded
Historical versions remain accessible

Lineage:

Each version links to its parent via `ParentVersionID`
Versions form a directed acyclic graph (DAG) of changes
Supports branching in federated environments
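Because each version stores only its immediate parent, reconstructing a version's history means following `ParentVersionID` links back to the first version. The following is a minimal sketch. It assumes versions are indexed in an in-memory dict keyed by `VersionID`. The `Parents` alias, the `lineage` helper, and the sample IDs are hypothetical, not part of the FactHarbor model.

```python
from typing import Optional

# Hypothetical in-memory index: VersionID -> ParentVersionID (None for the first version).
Parents = dict[str, Optional[str]]


def lineage(parents: Parents, version_id: str) -> list[str]:
    """Follow ParentVersionID links from version_id back to the first version (newest first)."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            # A repeated ID would mean a cycle, which the version DAG must never contain.
            raise ValueError(f"cycle detected at version {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]  # KeyError if a parent version is missing locally
    return chain


parents: Parents = {"v1": None, "v2": "v1", "v3": "v2"}
print(lineage(parents, "v3"))  # ['v3', 'v2', 'v1']
```

The cycle check is defensive. It turns a corrupted index into an error instead of an infinite loop.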

Provenance:

Every version is timestamped (`CreatedAt`)
The author type is recorded (`AuthorType`: Human, AI, ExternalNode)
The justification for each change is captured (`JustificationText`)
Digital signatures for integrity (`SignatureHash` in Release 1.0)
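This page does not specify the scheme behind `SignatureHash`. One possible starting point is a content digest over a canonical serialization of the version's fields, sketched below. A true digital signature would additionally sign this digest with the author's or node's private key, which is omitted here. The `content_digest` function, the choice of SHA-256, and the canonical-JSON encoding are all assumptions.

```python
import hashlib
import json


def content_digest(fields: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of a version's fields."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


version = {
    "VersionID": "v2",
    "ParentVersionID": "v1",
    "CreatedAt": "2025-12-14T22:27:00Z",
    "AuthorType": "Human",
    "JustificationText": "Clarified claim wording",
}
digest = content_digest(version)
print(len(digest))  # 64 hex characters
```

Canonicalization matters because two nodes must derive the same digest from the same content, regardless of how each one orders fields internally.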

Federation Support:

Versions can originate from remote nodes
Conflict detection via lineage comparison
Parallel version trees for branching scenarios
Cross-node version synchronization
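Conflict detection via lineage comparison can be sketched as follows. If one version appears in the other's ancestry, the newer one simply extends the older one. If the two share only an earlier ancestor, they are parallel branches. This sketch assumes local and imported versions have been merged into one `VersionID -> ParentVersionID` index. The `ancestors` and `compare` helpers and the relation labels are hypothetical.

```python
from typing import Optional

# Hypothetical merged index of local and imported versions: VersionID -> ParentVersionID.
Parents = dict[str, Optional[str]]


def ancestors(parents: Parents, version_id: str) -> list[str]:
    """Version IDs from version_id back to the first version, newest first."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def compare(parents: Parents, local: str, remote: str) -> tuple[str, Optional[str]]:
    """Classify a remote version against the local one by comparing lineages.

    Returns (relation, common_ancestor); relation is "same", "remote_ahead",
    "local_ahead", "diverged" or "unrelated".
    """
    if local == remote:
        return "same", local
    local_chain = ancestors(parents, local)
    remote_chain = ancestors(parents, remote)
    if local in remote_chain:
        return "remote_ahead", local  # remote descends from local: no conflict
    if remote in local_chain:
        return "local_ahead", remote  # remote is an older version we already have
    local_set = set(local_chain)
    for version_id in remote_chain:
        if version_id in local_set:
            return "diverged", version_id  # parallel branches: a conflict to resolve
    return "unrelated", None


parents: Parents = {"v1": None, "v2": "v1", "v3a": "v2", "v3b": "v2"}
print(compare(parents, "v3a", "v3b"))  # ('diverged', 'v2')
```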

Common Version Fields

All versioned entities include:

VersionID: Unique identifier for this specific version
ParentVersionID: Link to previous version (null for first version)
CreatedAt: Timestamp (ISO 8601, UTC)
AuthorType: Human | AI | ExternalNode
JustificationText: Brief explanation of changes
SignatureHash: Cryptographic signature (Release 1.0)
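The common fields map naturally onto an immutable record type. The following sketch uses a frozen Python dataclass so that a version's fields cannot be reassigned after creation. It also enforces UTC on `CreatedAt`. The class and attribute names are illustrative snake_case renderings of the fields above, not an existing FactHarbor API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned once a version exists
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: datetime  # must be timezone-aware and in UTC
    author_type: AuthorType
    justification_text: str
    signature_hash: Optional[str] = None  # Release 1.0 field; optional in this sketch

    def __post_init__(self) -> None:
        # Naive datetimes return None here, so they are rejected as well.
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("CreatedAt must be a UTC timestamp")

    def created_at_iso(self) -> str:
        """ISO 8601 rendering, e.g. 2025-12-14T22:27:00+00:00."""
        return self.created_at.isoformat()


first = VersionFields(
    version_id="v1",
    parent_version_id=None,
    created_at=datetime(2025, 12, 14, 22, 27, tzinfo=timezone.utc),
    author_type=AuthorType.HUMAN,
    justification_text="Initial version",
)
print(first.created_at_iso())  # 2025-12-14T22:27:00+00:00
```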

Core Data Model Refinements

The system relies on the following versioned core entities:

CLAIM_CLUSTER
- ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
- Groups related claims into topical clusters.
- One Cluster has many Claims.
- A Claim belongs to exactly one primary cluster.

CLAIM / CLAIM_VERSION
- ``CLAIM`` is the long‑lived anchor for a real‑world claim.
- ``CLAIM_VERSION`` is an immutable snapshot that includes:
  - ``ClaimID`` (FK to CLAIM)
  - ``VersionID`` (PK)
  - ``ParentVersionID`` (FK to prior version, nullable)
  - ``Text``
  - ``Domain``
  - ``ClaimType`` (literal, metaphorical, rhetorical, supernatural...)
  - ``Evaluability`` (empirical, subjective, non-falsifiable)
  - ``SafetyCategory`` (low, medium, high)
  - ``CreatedAt``, ``AuthorType``, ``JustificationText``
  - ``Status`` (active, superseded, merged)
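The anchor/version split can be illustrated with a small append-only store. `CLAIM` is reduced to a pointer to its latest version, and every edit appends a new `CLAIM_VERSION` whose parent is the previous head. This is a sketch under stated assumptions: only a linear local history is modeled, with no federated branches and no `merged` status. The `ClaimStore` class and its methods are hypothetical. The page lists `Status` as a stored field; this sketch derives it instead, so existing snapshots never need editing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClaimVersion:
    claim_id: str  # FK to the long-lived CLAIM anchor
    version_id: str
    parent_version_id: Optional[str]
    text: str


class ClaimStore:
    """Append-only store: versions are added, never edited or deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, ClaimVersion] = {}
        self._head: dict[str, str] = {}  # claim_id -> latest version_id

    def add_version(self, claim_id: str, version_id: str, text: str) -> ClaimVersion:
        if version_id in self._versions:
            raise ValueError(f"version {version_id} already exists")
        parent = self._head.get(claim_id)  # None for the first version of a claim
        version = ClaimVersion(claim_id, version_id, parent, text)
        self._versions[version_id] = version
        self._head[claim_id] = version_id  # the previous head is now superseded
        return version

    def current(self, claim_id: str) -> ClaimVersion:
        return self._versions[self._head[claim_id]]

    def get(self, version_id: str) -> ClaimVersion:
        # Superseded versions stay retrievable by ID.
        return self._versions[version_id]

    def status(self, version_id: str) -> str:
        # Derived from the head pointer rather than stored on the snapshot.
        version = self._versions[version_id]
        return "active" if self._head[version.claim_id] == version_id else "superseded"


store = ClaimStore()
store.add_version("c1", "v1", "Original wording")
store.add_version("c1", "v2", "Refined wording")
print(store.current("c1").parent_version_id, store.status("v1"))  # v1 superseded
```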

SCENARIO / SCENARIO_VERSION
- ``SCENARIO`` is the anchor for a scenario across time.
- ``SCENARIO_VERSION`` is an immutable snapshot:
  - ``ScenarioID`` (FK to SCENARIO)
  - ``VersionID`` (PK)
  - ``ParentVersionID``
  - ``ClaimID`` (FK to CLAIM)
  - ``Definitions``
  - ``Boundaries``
  - ``Assumptions``
  - ``Context``
  - ``EvaluationMethod``
  - ``SafetyClass``
  - ``CreatedAt``, ``AuthorType``, ``JustificationText``
  - ``Status`` (active, superseded, deprecated)

EVIDENCE / EVIDENCE_VERSION
- ``EVIDENCE`` is the anchor.
- ``EVIDENCE_VERSION`` is the versioned snapshot:
  - ``EvidenceID`` (FK to EVIDENCE)
  - ``VersionID`` (PK)
  - ``ParentVersionID``
  - ``Type`` (paper, dataset, report, transcript, expert...)
  - ``Category`` (empirical, historical, rhetorical, dataset, meta-analysis...)
  - ``Reliability`` (low, medium, high)
  - ``Provenance`` (URL, DOI, source metadata)
  - ``ExtractionMethod`` (manual, OCR, API, AKEL)
  - ``CreatedAt``, ``AuthorType``, ``JustificationText``
  - ``Status`` (verified, updated, disputed, retracted, superseded)

VERDICT / VERDICT_VERSION
- ``VERDICT`` is the anchor.
- ``VERDICT_VERSION`` is the snapshot:
  - ``VerdictID`` (FK to VERDICT)
  - ``VersionID`` (PK)
  - ``ParentVersionID``
  - ``ClaimID`` (FK to CLAIM)
  - ``ScenarioID`` (FK to SCENARIO)
  - ``EvidenceVersionSet`` (list of evidence version IDs used)
  - ``LikelihoodRange`` (0–1, with uncertainty bounds)
  - ``ExplanationChain``
  - ``UncertaintyFactors``
  - ``CreatedAt``, ``AuthorType``, ``JustificationText``
  - ``Status`` (current, outdated, superseded, retracted)
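This page describes `LikelihoodRange` only as "0–1, with uncertainty bounds". The sketch below assumes a lower bound, a point estimate, and an upper bound, and validates that they are ordered within [0, 1]. It also shows `EvidenceVersionSet` as a frozen set of evidence version IDs. The field layout and class names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    """Assumed layout: a point estimate in [0, 1] with lower and upper bounds."""

    lower: float
    estimate: float
    upper: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.lower <= self.estimate <= self.upper <= 1.0:
            raise ValueError(
                "expected 0 <= lower <= estimate <= upper <= 1, got "
                f"{self.lower}, {self.estimate}, {self.upper}"
            )

    @property
    def width(self) -> float:
        # A wider range signals more uncertainty.
        return self.upper - self.lower


@dataclass(frozen=True)
class VerdictVersion:
    verdict_id: str
    version_id: str
    claim_id: str
    scenario_id: str
    evidence_version_set: frozenset[str]  # the exact evidence versions used
    likelihood: LikelihoodRange


verdict = VerdictVersion("d1", "d1v1", "c1", "s1", frozenset({"e1v1", "e2v1"}),
                         LikelihoodRange(0.6, 0.7, 0.85))
print(round(verdict.likelihood.width, 2))  # 0.25
```

Pinning evidence *versions* rather than evidence IDs is what lets an old verdict remain reproducible after its evidence is updated.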

Many-to-Many Linking Tables

ScenarioEvidenceLink

Links scenario versions to evidence versions with relevance scoring.

Fields:

``ScenarioID``
``ScenarioVersionID``
``EvidenceID``
``EvidenceVersionID``
``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
``LinkJustification`` - Brief explanation of relevance

Purpose:

Evidence can be used by multiple scenarios
Scenarios can draw from multiple pieces of evidence
Relevance scoring helps prioritize evidence
Version-specific linking preserves historical accuracy
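As an illustration of how relevance scoring might prioritize evidence, the sketch below selects the links for one scenario version and orders them by `RelevanceScore`. Because links reference scenario *versions*, a later scenario version has its own link set, and the older links stay untouched. The `evidence_for` helper and its `min_score` cutoff are hypothetical, and so is the sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0 to 1
    link_justification: str = ""

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("RelevanceScore must be between 0 and 1")


def evidence_for(links: list[ScenarioEvidenceLink], scenario_version_id: str,
                 min_score: float = 0.0) -> list[ScenarioEvidenceLink]:
    """Links for one scenario version, most relevant first."""
    selected = [
        link for link in links
        if link.scenario_version_id == scenario_version_id and link.relevance_score >= min_score
    ]
    return sorted(selected, key=lambda link: link.relevance_score, reverse=True)


links = [
    ScenarioEvidenceLink("s1", "s1v1", "e1", "e1v1", 0.4, "Background context"),
    ScenarioEvidenceLink("s1", "s1v1", "e2", "e2v1", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("s1", "s1v2", "e1", "e1v2", 0.7, "Updated dataset"),
]
print([link.evidence_version_id for link in evidence_for(links, "s1v1")])  # ['e2v1', 'e1v1']
```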

ClaimCluster

Semantic clustering of similar claims.

Fields:

``ClusterID`` (PK)
``EmbeddingVector`` - Vector representation for semantic search
``MemberList`` - List of ClaimIDs in this cluster
``Theme`` - Human-readable theme description

Purpose:

Groups semantically similar claims
Enables efficient search and discovery
Supports cross-node claim alignment
Reduces duplication
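This page does not say how claims are assigned to clusters. One plausible approach is nearest-centroid assignment by cosine similarity over the embedding vectors, with a threshold below which the claim would start a new cluster. The sketch below follows that approach. The `assign_cluster` helper, the 0.8 threshold, and the example themes and vectors are all assumptions.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_cluster(claim_vec: list[float], centroids: dict[str, list[float]],
                   threshold: float = 0.8) -> Optional[str]:
    """ClusterID of the most similar cluster, or None if no cluster reaches the threshold."""
    best_id: Optional[str] = None
    best_score = -math.inf
    for cluster_id, centroid in centroids.items():
        score = cosine(claim_vec, centroid)
        if score > best_score:
            best_id, best_score = cluster_id, score
    return best_id if best_score >= threshold else None


# Hypothetical two-dimensional embeddings for illustration only.
centroids = {"energy-prices": [1.0, 0.0], "vaccine-safety": [0.0, 1.0]}
print(assign_cluster([0.9, 0.1], centroids))  # energy-prices
print(assign_cluster([0.7, 0.7], centroids))  # None
```

Returning a single best match fits the rule that a claim belongs to exactly one primary cluster. A `None` result would signal that a new cluster is needed.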

Data Model Behavior

Late-Arriving Evidence

When new evidence versions appear:

1. Existing verdicts are marked as outdated
2. Scenario relevance must be re-evaluated
3. The re-evaluation engine triggers verdict recomputation
4. New verdict versions are created
5. Users are notified of updates

Process:

A new EvidenceVersion is imported
The system scans the related ScenarioEvidenceLinks
It checks whether the evidence affects existing verdicts
Affected verdicts are queued for re-evaluation
AKEL or a reviewer creates a new VerdictVersion
Old verdicts remain accessible (historical record)
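The scan-and-queue steps can be sketched as follows. A current verdict is treated as a candidate for re-evaluation when its scenario is linked to the updated evidence and its `EvidenceVersionSet` does not yet contain the new version. Whether the new evidence actually changes the outcome is left to AKEL or a reviewer, as the process describes. The `Link` and `Verdict` shapes and the `affected_verdicts` helper are simplified, hypothetical stand-ins for the entities above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    scenario_id: str
    evidence_id: str
    evidence_version_id: str


@dataclass(frozen=True)
class Verdict:
    verdict_id: str
    version_id: str
    scenario_id: str
    evidence_version_set: frozenset[str]
    status: str = "current"


def affected_verdicts(evidence_id: str, new_evidence_version_id: str,
                      links: list[Link], verdicts: list[Verdict]) -> list[Verdict]:
    """Current verdicts whose scenario links to this evidence but predate the new version."""
    linked_scenarios = {link.scenario_id for link in links if link.evidence_id == evidence_id}
    return [
        v for v in verdicts
        if v.status == "current"
        and v.scenario_id in linked_scenarios
        and new_evidence_version_id not in v.evidence_version_set
    ]


links = [Link("s1", "e1", "e1v1"), Link("s2", "e2", "e2v1")]
verdicts = [
    Verdict("d1", "d1v1", "s1", frozenset({"e1v1"})),
    Verdict("d2", "d2v1", "s2", frozenset({"e2v1"})),
]
# Evidence e1 receives a new version e1v2; queue the verdicts it may affect.
queue = deque(v.verdict_id for v in affected_verdicts("e1", "e1v2", links, verdicts))
print(list(queue))  # ['d1']
```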

Scenario Evolution

When a scenario's assumptions or definitions change:

Each change creates a new scenario version rather than updating the existing one in place:

New ScenarioVersion with updated fields
ParentVersionID points to previous version
All dependent verdicts must be recalculated against the new version
Previous scenario versions remain accessible

Triggers:

Refined definitions
Changed assumptions
Expanded or narrowed boundaries
Updated evaluation methods
Safety classification changes

Impact:

Verdicts based on the old scenario version remain valid as historical records
New verdicts are required for the new scenario version
Users can compare old vs new scenarios
Evidence links may need re-assessment
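A minimal sketch of scenario evolution copies the previous version with the changed fields and a new `VersionID`. Its `ParentVersionID` points back to the previous version. It then lists the verdicts that were computed against the old version so that new verdicts can be produced for the new one. Only a subset of the `SCENARIO_VERSION` fields is modeled here. The `revise` and `verdicts_to_recompute` helpers and the verdict-to-scenario-version mapping are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_id: str
    version_id: str
    parent_version_id: Optional[str]
    claim_id: str
    assumptions: tuple[str, ...]
    boundaries: str


def revise(prev: ScenarioVersion, new_version_id: str, **changes) -> ScenarioVersion:
    """Return a new version derived from prev; prev itself is left unchanged."""
    return replace(prev, version_id=new_version_id, parent_version_id=prev.version_id, **changes)


def verdicts_to_recompute(verdict_to_scenario_version: dict[str, str],
                          old_version_id: str) -> list[str]:
    """Verdicts computed against the old scenario version (kept as history, redone for the new one)."""
    return sorted(v for v, sv in verdict_to_scenario_version.items() if sv == old_version_id)


v1 = ScenarioVersion("s1", "s1v1", None, "c1", ("Prices measured in EUR",), "EU only")
v2 = revise(v1, "s1v2", boundaries="EU and UK")
print(v2.parent_version_id, v1.boundaries, "->", v2.boundaries)  # s1v1 EU only -> EU and UK
print(verdicts_to_recompute({"d1": "s1v1", "d2": "s2v1", "d3": "s1v1"}, "s1v1"))  # ['d1', 'd3']
```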

Federated Nodes

Each node may share partial data:

Claims and scenarios: Shared if relevant to the node's domain

Evidence metadata: Shared, but not always full evidence files

Verdict lineage: Shared only if not locally overridden

Version synchronization:

Remote versions imported with provenance metadata
Conflicts detected via ParentVersionID comparison
Branching allowed for divergent interpretations
Local node retains authority over local versions

Trust and acceptance:

Trusted nodes: auto-import versions
Neutral nodes: import but flag for review
Untrusted nodes: manual import only
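The three trust levels translate directly into an import policy table, sketched below. Treating nodes with no recorded trust level as untrusted is an assumption, not something this page states. The enum names and the `import_action` helper are hypothetical.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"


class ImportAction(Enum):
    AUTO_IMPORT = "auto_import"
    IMPORT_FLAGGED = "import_flagged_for_review"
    MANUAL_ONLY = "manual_import_only"


POLICY = {
    Trust.TRUSTED: ImportAction.AUTO_IMPORT,
    Trust.NEUTRAL: ImportAction.IMPORT_FLAGGED,
    Trust.UNTRUSTED: ImportAction.MANUAL_ONLY,
}


def import_action(node_trust: dict[str, Trust], node_id: str) -> ImportAction:
    # Unknown nodes fall back to the most restrictive policy (an assumption).
    return POLICY[node_trust.get(node_id, Trust.UNTRUSTED)]


node_trust = {"node-a": Trust.TRUSTED, "node-b": Trust.NEUTRAL}
print(import_action(node_trust, "node-b").value)  # import_flagged_for_review
```

Keeping the policy in a lookup table makes trust decisions auditable and easy to change without touching the import logic.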

Entity-Relationship Overview

Core relationships:

```
CLAIM_CLUSTER (1) ──< (N) CLAIM
CLAIM (1) ──< (N) CLAIM_VERSION
CLAIM (1) ──< (N) SCENARIO
SCENARIO (1) ──< (N) SCENARIO_VERSION
SCENARIO_VERSION (N) ──< (N) EVIDENCE_VERSION [via ScenarioEvidenceLink]
SCENARIO_VERSION (1) ──< (N) VERDICT_VERSION
VERDICT_VERSION references specific EvidenceVersionSet
```

Version chains:

Each entity has a version DAG:
```
Version 1 (ParentVersionID=null)
↓
Version 2 (ParentVersionID=1)
↓
Version 3 (ParentVersionID=2)
```

In federated environments, branching may occur:
```
Version 1
↓
Version 2
↙      ↘
V3a    V3b
(parallel branches from different nodes)
```
