Data Model
This page describes the current data model for FactHarbor.
Versioning Strategy
Every entity in FactHarbor has a full immutable version history. This ensures:
- Complete auditability
- Ability to reconstruct historical state
- Federation-compatible lineage tracking
- Transparent evolution of claims, scenarios, and verdicts
Core Versioning Principles
Immutability:
- Each version is stored independently
- Versions cannot be deleted, only superseded
- Historical versions remain accessible
Lineage:
- Each version links to its parent via `ParentVersionID`
- Forms a directed acyclic graph (DAG) of changes
- Supports branching in federated environments
Provenance:
- Every version timestamped (`CreatedAt`)
- Author type recorded (`AuthorType`: Human, AI, ExternalNode)
- Justification captured (`JustificationText`)
- Digital signatures for integrity (`SignatureHash` in Release 1.0)
Federation Support:
- Versions can originate from remote nodes
- Conflict detection via lineage comparison
- Parallel version trees for branching scenarios
- Cross-node version synchronization
Common Version Fields
All versioned entities include:
- VersionID: Unique identifier for this specific version
- ParentVersionID: Link to previous version (null for first version)
- CreatedAt: Timestamp (ISO 8601, UTC)
- AuthorType: Human | AI | ExternalNode
- JustificationText: Brief explanation of changes
- SignatureHash: Cryptographic signature (Release 1.0)
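The common fields can be pictured as a small immutable record. The following is a minimal sketch, not FactHarbor's actual schema: the `VersionRecord` class, the `new_version` helper, and the use of UUIDs for `VersionID` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
import uuid


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


@dataclass(frozen=True)  # frozen: a stored version is never mutated
class VersionRecord:
    version_id: str
    parent_version_id: Optional[str]      # None for the first version
    created_at: str                       # ISO 8601, UTC
    author_type: AuthorType
    justification_text: str
    signature_hash: Optional[str] = None  # planned for Release 1.0


def new_version(parent: Optional[VersionRecord],
                author_type: AuthorType,
                justification: str) -> VersionRecord:
    """Create a version that links back to its parent (hypothetical helper)."""
    return VersionRecord(
        version_id=str(uuid.uuid4()),     # assumption: UUIDs as identifiers
        parent_version_id=parent.version_id if parent else None,
        created_at=datetime.now(timezone.utc).isoformat(),
        author_type=author_type,
        justification_text=justification,
    )


first = new_version(None, AuthorType.HUMAN, "Initial version")
second = new_version(first, AuthorType.AI, "Refined wording")
```

Because the record is frozen, a change is always expressed as a new `VersionRecord` whose `parent_version_id` points at the version it supersedes.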
Core Data Model Refinements
The system relies on the following versioned core entities:
- CLAIM_CLUSTER
  - ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
  - Groups related claims into topical clusters.
  - One Cluster has many Claims.
  - A Claim belongs to exactly one primary cluster.
- CLAIM / CLAIM_VERSION
  - ``CLAIM`` is the long‑lived anchor for a real‑world claim.
  - ``CLAIM_VERSION`` is an immutable snapshot that includes:
    - ``ClaimID`` (FK to CLAIM)
    - ``VersionID`` (PK)
    - ``ParentVersionID`` (FK to prior version, nullable)
    - ``Text``
    - ``Domain``
    - ``ClaimType`` (literal, metaphorical, rhetorical, supernatural...)
    - ``Evaluability`` (empirical, subjective, non-falsifiable)
    - ``SafetyCategory`` (low, medium, high)
    - ``CreatedAt``, ``AuthorType``, ``JustificationText``
    - ``Status`` (active, superseded, merged)
- SCENARIO / SCENARIO_VERSION
  - ``SCENARIO`` is the anchor for a scenario across time.
  - ``SCENARIO_VERSION`` is an immutable snapshot:
    - ``ScenarioID`` (FK to SCENARIO)
    - ``VersionID`` (PK)
    - ``ParentVersionID``
    - ``ClaimID`` (FK to CLAIM)
    - ``Definitions``
    - ``Boundaries``
    - ``Assumptions``
    - ``Context``
    - ``EvaluationMethod``
    - ``SafetyClass``
    - ``CreatedAt``, ``AuthorType``, ``JustificationText``
    - ``Status`` (active, superseded, deprecated)
- EVIDENCE / EVIDENCE_VERSION
  - ``EVIDENCE`` is the anchor.
  - ``EVIDENCE_VERSION`` is the versioned snapshot:
    - ``EvidenceID`` (FK to EVIDENCE)
    - ``VersionID`` (PK)
    - ``ParentVersionID``
    - ``Type`` (paper, dataset, report, transcript, expert...)
    - ``Category`` (empirical, historical, rhetorical, dataset, meta-analysis...)
    - ``Reliability`` (low/med/high)
    - ``Provenance`` (URL, DOI, source metadata)
    - ``ExtractionMethod`` (manual, OCR, API, AKEL)
    - ``CreatedAt``, ``AuthorType``, ``JustificationText``
    - ``Status`` (verified, updated, disputed, retracted, superseded)
- VERDICT / VERDICT_VERSION
  - ``VERDICT`` is the anchor.
  - ``VERDICT_VERSION`` is the snapshot:
    - ``VerdictID`` (FK to VERDICT)
    - ``VersionID`` (PK)
    - ``ParentVersionID``
    - ``ClaimID`` (FK to CLAIM)
    - ``ScenarioID`` (FK to SCENARIO)
    - ``EvidenceVersionSet`` (list of evidence version IDs used)
    - ``LikelihoodRange`` (0–1, with uncertainty bounds)
    - ``ExplanationChain``
    - ``UncertaintyFactors``
    - ``CreatedAt``, ``AuthorType``, ``JustificationText``
    - ``Status`` (current, outdated, superseded, retracted)
Many-to-Many Linking Tables
ScenarioEvidenceLink
Links scenario versions to evidence versions with relevance scoring.
Fields:
- ``ScenarioID``
- ``ScenarioVersionID``
- ``EvidenceID``
- ``EvidenceVersionID``
- ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
- ``LinkJustification`` - Brief explanation of relevance
Purpose:
- Evidence can be used by multiple scenarios
- Scenarios can draw from multiple pieces of evidence
- Relevance scoring helps prioritize evidence
- Version-specific linking preserves historical accuracy
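As an illustration of how relevance scoring might be used to prioritize evidence, here is a minimal sketch. The `ScenarioEvidenceLink` dataclass mirrors the fields listed above; the `evidence_for` helper and the in-memory list of links are assumptions, not part of the documented model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float      # documented range: 0 to 1
    link_justification: str

    def __post_init__(self):
        # Reject scores outside the documented 0-1 range.
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("RelevanceScore must be between 0 and 1")


def evidence_for(links: List[ScenarioEvidenceLink],
                 scenario_version_id: str) -> List[ScenarioEvidenceLink]:
    """Return links for one scenario version, most relevant first (hypothetical helper)."""
    matching = [l for l in links if l.scenario_version_id == scenario_version_id]
    return sorted(matching, key=lambda l: l.relevance_score, reverse=True)


links = [
    ScenarioEvidenceLink("s1", "sv1", "e1", "ev1", 0.4, "Background only"),
    ScenarioEvidenceLink("s1", "sv1", "e2", "ev7", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("s2", "sv3", "e1", "ev1", 0.7, "Same dataset, other scope"),
]
ranked = evidence_for(links, "sv1")
```

Keying the lookup on `scenario_version_id` rather than `scenario_id` is what keeps historical scenario versions tied to the evidence versions they originally used.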
ClaimCluster
Semantic clustering of similar claims.
Fields:
- ``ClusterID`` (PK)
- ``EmbeddingVector`` - Vector representation for semantic search
- ``MemberList`` - List of ClaimIDs in this cluster
- ``Theme`` - Human-readable theme description
Purpose:
- Groups semantically similar claims
- Enables efficient search and discovery
- Supports cross-node claim alignment
- Reduces duplication
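One way a new claim could be matched to an existing cluster is by comparing embedding vectors. The sketch below assumes cosine similarity and a fixed threshold of 0.8; the source specifies neither the similarity measure nor a threshold, and `nearest_cluster` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_cluster(claim_vector: List[float],
                    clusters: Dict[str, List[float]],
                    min_similarity: float = 0.8) -> Optional[str]:
    """Return the ClusterID of the most similar cluster, or None if none is close enough."""
    best_id, best_sim = None, min_similarity
    for cluster_id, vector in clusters.items():
        sim = cosine_similarity(claim_vector, vector)
        if sim >= best_sim:
            best_id, best_sim = cluster_id, sim
    return best_id


clusters = {"c1": [1.0, 0.0], "c2": [0.0, 1.0]}   # ClusterID -> EmbeddingVector
match = nearest_cluster([0.9, 0.1], clusters)
no_match = nearest_cluster([1.0, 1.0], clusters)
```

A `None` result would be the point at which a new cluster is created instead of adding the claim to an existing `MemberList`.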
Data Model Behavior
Late-Arriving Evidence
When new evidence versions appear:
1. Existing verdicts marked as outdated
2. Scenario relevance must be re-evaluated
3. Re-evaluation engine triggers verdict recomputation
4. New verdict versions created
5. Users notified of updates
Process:
- New EvidenceVersion imported
- System scans related ScenarioEvidenceLinks
- Checks if evidence affects existing verdicts
- Queues affected verdicts for re-evaluation
- AKEL or reviewer creates new VerdictVersion
- Old verdicts remain accessible (historical record)
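The scan-and-queue part of this process can be sketched as follows. This is a minimal illustration under stated assumptions: links and verdicts are plain dictionaries keyed by the field names above, matching is done on `ScenarioID`, and `queue_affected_verdicts` is a hypothetical function, not the actual re-evaluation engine.

```python
from collections import deque
from typing import Deque, Dict, List


def queue_affected_verdicts(evidence_id: str,
                            links: List[Dict[str, str]],
                            verdicts: List[Dict[str, str]]) -> Deque[str]:
    """Mark current verdicts touched by new evidence as outdated and queue them."""
    # Scenarios that have ever linked to this piece of evidence.
    affected_scenarios = {
        link["ScenarioID"] for link in links if link["EvidenceID"] == evidence_id
    }
    queue: Deque[str] = deque()
    for verdict in verdicts:
        if verdict["ScenarioID"] in affected_scenarios and verdict["Status"] == "current":
            verdict["Status"] = "outdated"      # the old verdict stays stored
            queue.append(verdict["VersionID"])  # to be re-evaluated later
    return queue


links = [
    {"ScenarioID": "s1", "EvidenceID": "e1"},
    {"ScenarioID": "s2", "EvidenceID": "e2"},
]
verdicts = [
    {"VersionID": "v1", "ScenarioID": "s1", "Status": "current"},
    {"VersionID": "v2", "ScenarioID": "s2", "Status": "current"},
    {"VersionID": "v0", "ScenarioID": "s1", "Status": "superseded"},
]
pending = queue_affected_verdicts("e1", links, verdicts)
```

Only the status flag changes here; creating the replacement VerdictVersion is left to AKEL or a reviewer, as described above.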
Scenario Evolution
When a scenario's assumptions or definitions change:
The system creates a new scenario version rather than updating the existing one in place:
- New ScenarioVersion with updated fields
- ParentVersionID points to previous version
- All dependent verdicts must be recalculated
- Previous scenario versions remain accessible
Triggers:
- Refined definitions
- Changed assumptions
- Expanded or narrowed boundaries
- Updated evaluation methods
- Safety classification changes
Impact:
- Verdicts based on old scenario version remain valid (historical)
- New verdicts required for new scenario version
- Users can compare old vs new scenarios
- Evidence links may need re-assessment
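A new scenario version can be derived from its predecessor without touching the original. The sketch below is an assumption-laden illustration: `ScenarioVersion` carries only a subset of the documented fields, and `evolve` is a hypothetical helper built on the standard library's `dataclasses.replace`.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    parent_version_id: Optional[str]
    scenario_id: str
    assumptions: Tuple[str, ...]
    status: str = "active"


def evolve(previous: ScenarioVersion, new_version_id: str, **changes) -> ScenarioVersion:
    """Copy the previous version with changes applied and lineage set (hypothetical helper)."""
    return replace(
        previous,
        version_id=new_version_id,
        parent_version_id=previous.version_id,  # lineage back to the old version
        status="active",
        **changes,
    )


sv1 = ScenarioVersion("sv1", None, "s1", ("Sample is representative",))
sv2 = evolve(sv1, "sv2", assumptions=("Sample is representative", "Data after 2020 only"))
```

`sv1` is left exactly as it was, which is what lets verdicts based on the old scenario version remain valid as historical records.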
Federated Nodes
Each node may share partial data:
- Claims and scenarios: Shared if relevant to node's domain
- Evidence metadata: Shared, but not always full evidence files
- Verdict lineage: Shared only if not locally overridden
Version synchronization:
- Remote versions imported with provenance metadata
- Conflicts detected via ParentVersionID comparison
- Branching allowed for divergent interpretations
- Local node retains authority over local versions
Trust and acceptance:
- Trusted nodes: auto-import versions
- Neutral nodes: import but flag for review
- Untrusted nodes: manual import only
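The trust rules above amount to a lookup from a node's trust level to an import action. A minimal sketch, assuming the enum names `TrustLevel` and `ImportAction` and the helper `import_action`, none of which appear in the source:

```python
from enum import Enum


class TrustLevel(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"


class ImportAction(Enum):
    AUTO_IMPORT = "auto-import"
    IMPORT_AND_FLAG = "import, flag for review"
    MANUAL_ONLY = "manual import only"


# One entry per rule in the list above.
IMPORT_POLICY = {
    TrustLevel.TRUSTED: ImportAction.AUTO_IMPORT,
    TrustLevel.NEUTRAL: ImportAction.IMPORT_AND_FLAG,
    TrustLevel.UNTRUSTED: ImportAction.MANUAL_ONLY,
}


def import_action(trust: TrustLevel) -> ImportAction:
    """Decide how a remote version from a node with this trust level is handled."""
    return IMPORT_POLICY[trust]


action = import_action(TrustLevel.NEUTRAL)
```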
Entity-Relationship Overview
Core relationships:
```
CLAIM_CLUSTER (1) ──< (N) CLAIM
CLAIM (1) ──< (N) CLAIM_VERSION
CLAIM (1) ──< (N) SCENARIO
SCENARIO (1) ──< (N) SCENARIO_VERSION
SCENARIO_VERSION (N) >──< (N) EVIDENCE_VERSION [via ScenarioEvidenceLink]
SCENARIO_VERSION (1) ──< (N) VERDICT_VERSION
VERDICT_VERSION references specific EvidenceVersionSet
```
Version chains:
Each entity has a version DAG:
```
Version 1 (ParentVersionID=null)
↓
Version 2 (ParentVersionID=1)
↓
Version 3 (ParentVersionID=2)
```
In federated environments, branching may occur:
```
    Version 1
        ↓
    Version 2
     ↙     ↘
  V3a       V3b   (parallel branches from different nodes)
```
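Branches of this kind can be found by grouping versions by their `ParentVersionID`: any parent with more than one child is a branch point. The sketch below assumes versions are available as a simple mapping from `VersionID` to `ParentVersionID`; `find_branch_points` is a hypothetical helper, not a documented API.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_branch_points(versions: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Map each parent that has several children to its sorted child VersionIDs."""
    children: Dict[str, List[str]] = defaultdict(list)
    for version_id, parent_id in versions.items():
        if parent_id is not None:          # the first version has no parent
            children[parent_id].append(version_id)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}


# VersionID -> ParentVersionID, matching the branching diagram above.
versions = {"1": None, "2": "1", "3a": "2", "3b": "2"}
branches = find_branch_points(versions)
```

For the diagram above, the only branch point is version 2, with children `3a` and `3b`. A linear chain yields an empty result.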
Related Pages