Data Model
This page describes the current data model for FactHarbor v0.9.1.
Versioning Strategy
Every entity in FactHarbor has a full immutable version history. This ensures:
- Complete auditability
- Ability to reconstruct historical state
- Federation-compatible lineage tracking
- Transparent evolution of claims, scenarios, and verdicts
Core Versioning Principles
Immutability:
- Each version is stored independently
- Versions cannot be deleted, only superseded
- Historical versions remain accessible
Lineage:
- Each version links to its parent via `ParentVersionID`
- The links form a directed acyclic graph (DAG) of changes
- Supports branching in federated environments
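A minimal sketch of walking a lineage, assuming versions are held in an in-memory mapping from `VersionID` to `ParentVersionID`. The `parents` dictionary, the version identifiers, and the `lineage` function are hypothetical; they only illustrate the parent links described above and are not part of the FactHarbor schema.

```python
from typing import Dict, List, Optional

# Hypothetical store: VersionID -> ParentVersionID (None for a first version).
parents: Dict[str, Optional[str]] = {
    "v1": None,
    "v2": "v1",
    "v3": "v2",
    "v3-remote": "v2",  # a branch created on another node
}


def lineage(version_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from version_id back to the first version."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            # A cycle would violate the acyclic property of the lineage graph.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain
```

Both `v3` and `v3-remote` resolve to the same ancestors here, which is what a branch in a federated environment looks like in this representation.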
Provenance:
- Every version timestamped (`CreatedAt`)
- Author type recorded (`AuthorType`: Human, AI, ExternalNode)
- Justification captured (`JustificationText`)
- Digital signatures for integrity (`SignatureHash` in Release 1.0)
Federation Support:
- Versions can originate from remote nodes
- Conflict detection via lineage comparison
- Parallel version trees for branching scenarios
- Cross-node version synchronization
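Conflict detection via lineage comparison could be sketched as follows. This page does not define what counts as a conflict, so the rule used here is an assumption: two version heads conflict when neither is an ancestor of the other. The function names and sample identifiers are hypothetical.

```python
from typing import Dict, List, Optional


def ancestors(version_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return version_id followed by its ancestors, newest first."""
    chain: List[str] = []
    current: Optional[str] = version_id
    while current is not None and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return chain


def common_ancestor(a: str, b: str, parents: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the most recent version shared by both lineages, if any."""
    seen = set(ancestors(a, parents))
    for candidate in ancestors(b, parents):
        if candidate in seen:
            return candidate
    return None


def has_conflict(local_head: str, remote_head: str,
                 parents: Dict[str, Optional[str]]) -> bool:
    """Assumed rule: conflict when neither head descends from the other."""
    base = common_ancestor(local_head, remote_head, parents)
    return base != local_head and base != remote_head


# Hypothetical lineage with one local and one remote branch from v2.
parents = {"v1": None, "v2": "v1", "v3-local": "v2", "v3-remote": "v2"}
```

With this rule, `v3-local` and `v3-remote` conflict because they diverge at `v2`, while `v2` and `v3-local` do not because one simply extends the other.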
Common Version Fields
All versioned entities include:
- VersionID: Unique identifier for this specific version
- ParentVersionID: Link to previous version (null for first version)
- CreatedAt: Timestamp (ISO 8601, UTC)
- AuthorType: Human | AI | ExternalNode
- CreatedBy: Foreign key to User or TechnicalUser
- JustificationText: Brief explanation of changes
- PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
- ReviewStatus: Workflow state (draft|in_review|approved|rejected)
- NodeOrigin: Node ID where version was created (for federation)
- SignatureHash: Cryptographic signature (Release 1.0)
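The common fields can be pictured as one immutable record. The sketch below is illustrative only: the class names, the snake_case attribute names, and the sample values are assumptions, and the page does not prescribe a storage format or programming language.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


@dataclass(frozen=True)  # frozen: a stored version is never edited in place
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: datetime              # UTC; serialised as ISO 8601
    author_type: AuthorType
    created_by: str                   # UserID or SystemID
    justification_text: str
    publication_mode: PublicationMode
    review_status: str                # draft|in_review|approved|rejected
    node_origin: str
    signature_hash: Optional[str] = None  # marked "Release 1.0" in the field list


# Hypothetical first version of some entity.
first = VersionFields(
    version_id="cv-0001",
    parent_version_id=None,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    author_type=AuthorType.AI,
    created_by="akel-01",
    justification_text="Initial extraction",
    publication_mode=PublicationMode.MODE1,
    review_status="draft",
    node_origin="node-a",
)
```

Using a frozen dataclass mirrors the immutability principle: a change produces a new record whose `parent_version_id` points at the old one.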
Core Entity Definitions
User Entities
USER (base user table):
- ``UserID`` (PK)
- ``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
- ``DisplayName``
- ``Email`` (for Contributors and above)
- ``RegisteredAt``
- ``LastActive``
- ``Status`` (active|suspended|banned)
TECHNICAL_USER (system processes):
- ``SystemID`` (PK)
- ``SystemName``
- ``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
- ``CreatedBy`` (FK to Maintainer who created this system user)
- ``CreatedAt``
- ``Status`` (active|paused|deprecated)
- ``ApiKey`` (encrypted)
- ``Permissions`` (JSON - authorized operations)
Examples of Technical Users:
- AKEL instances (AI processing)
- Federation sync bots
- Scheduled audit tasks
- Backup services
- Monitoring systems
- External API integrations
Content Entities
The system relies on the following versioned core entities:
CLAIM_CLUSTER:
- ``ClusterID`` (PK)
- ``EmbeddingVectorRef``
- ``Theme``
- Groups related claims into topical clusters
- One Cluster has many Claims
- A Claim belongs to exactly one primary cluster
CLAIM / CLAIM_VERSION:
- ``CLAIM`` is the long-lived anchor for a real-world claim
- ``CLAIM_VERSION`` is an immutable snapshot that includes:
- ``VersionID`` (PK)
- ``ClaimID`` (FK to CLAIM)
- ``ParentVersionID`` (FK to prior version, nullable)
- ``Text``
- ``Domain``
- ``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
- ``Evaluability`` (empirical|subjective|non-falsifiable)
- ``RiskTier`` (A|B|C) - replaces ``SafetyCategory`` for consistency with the risk tier system
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|merged)
SCENARIO / SCENARIO_VERSION:
- ``SCENARIO`` is the anchor for a scenario across time
- ``SCENARIO_VERSION`` is an immutable snapshot:
- ``VersionID`` (PK)
- ``ScenarioID`` (FK to SCENARIO)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``Definitions`` (JSON)
- ``Boundaries`` (JSON)
- ``Assumptions`` (JSON)
- ``Context`` (text)
- ``EvaluationMethod`` (text)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|deprecated)
Note: ``SafetyClass`` was removed from Scenario; the risk tier is set at the claim level.
EVIDENCE / EVIDENCE_VERSION:
- ``EVIDENCE`` is the anchor
- ``EVIDENCE_VERSION`` is the versioned snapshot:
- ``VersionID`` (PK)
- ``EvidenceID`` (FK to EVIDENCE)
- ``ParentVersionID``
- ``Type`` (paper|dataset|report|transcript|expert|media)
- ``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
- ``Reliability`` (low|medium|high)
- ``Provenance`` (URL, DOI, source metadata)
- ``ExtractionMethod`` (manual|OCR|API|AKEL)
- ``ContentHash`` (SHA256 of evidence content)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|verified|disputed|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded)
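A minimal sketch of computing and checking ``ContentHash`` with Python's standard `hashlib`. The page only states that the hash is a SHA256 of the evidence content; hashing the raw bytes and storing the result as a lowercase hex string are assumptions, as are the function names.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return the SHA-256 digest of the evidence content as lowercase hex."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_hash: str) -> bool:
    """Check stored content against the ContentHash recorded for it."""
    return content_hash(content) == expected_hash.lower()


# Hypothetical evidence payload.
payload = b"abc"
stored_hash = content_hash(payload)
```

A mismatch from `verify_content` would indicate that the stored or transferred content differs from what was hashed when the evidence version was created.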
VERDICT / VERDICT_VERSION:
- ``VERDICT`` is the anchor
- ``VERDICT_VERSION`` is the snapshot:
- ``VersionID`` (PK)
- ``VerdictID`` (FK to VERDICT)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
- ``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
- ``LikelihoodRange`` (0–1, with uncertainty bounds)
- ``ExplanationChain`` (JSON)
- ``UncertaintyFactors`` (JSON)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (current|outdated|superseded|retracted)
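``LikelihoodRange`` is described only as a value in 0–1 with uncertainty bounds. One possible shape, shown purely as an assumption, is a point estimate with a lower and an upper bound; the class and attribute names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    lower: float     # assumed: lower uncertainty bound
    estimate: float  # assumed: point estimate
    upper: float     # assumed: upper uncertainty bound

    def __post_init__(self) -> None:
        # All three values must lie in [0, 1] and be ordered.
        if not (0.0 <= self.lower <= self.estimate <= self.upper <= 1.0):
            raise ValueError("expected 0 <= lower <= estimate <= upper <= 1")

    @property
    def width(self) -> float:
        """Size of the uncertainty interval."""
        return self.upper - self.lower
```

Validating at construction time keeps malformed ranges out of stored verdict versions, which matters because versions cannot be corrected in place afterwards.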
Many-to-Many Linking Tables
ScenarioEvidenceLink:
- Links scenario versions to evidence versions with relevance scoring
- ``ScenarioID``, ``ScenarioVersionID``
- ``EvidenceID``, ``EvidenceVersionID``
- ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
- ``LinkJustification`` - Brief explanation of relevance
Purpose:
- Evidence can be used by multiple scenarios
- Scenarios can draw from multiple pieces of evidence
- Relevance scoring helps prioritize evidence
- Version-specific linking preserves historical accuracy
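The sketch below illustrates how version-specific links and relevance scores might be used to prioritize evidence for one scenario version. The record layout follows the fields listed above; the class name, function name, and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float
    link_justification: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("relevance_score must be within 0 and 1")


def evidence_for(scenario_version_id: str,
                 links: List[ScenarioEvidenceLink]) -> List[ScenarioEvidenceLink]:
    """Return links for one scenario version, most relevant first."""
    matching = [l for l in links if l.scenario_version_id == scenario_version_id]
    return sorted(matching, key=lambda l: l.relevance_score, reverse=True)


# Hypothetical links: the same evidence version serves two scenario versions.
links = [
    ScenarioEvidenceLink("s1", "s1-v2", "e1", "e1-v1", 0.4, "Background data"),
    ScenarioEvidenceLink("s1", "s1-v2", "e2", "e2-v3", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("s2", "s2-v1", "e1", "e1-v1", 0.7, "Same dataset, other scope"),
]
```

Because links are keyed on version identifiers rather than on the anchors, a later evidence or scenario version does not silently change what an older scenario version was linked to.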
ClaimCluster:
- Semantic clustering of similar claims
- ``ClusterID`` (PK)
- ``EmbeddingVector`` - Vector representation for semantic search
- ``MemberList`` - List of ClaimIDs in this cluster
- ``Theme`` - Human-readable theme description
Key Changes in v0.9.1
Updated Field Names:
- `SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
- `SafetyClass` removed from Scenario (redundant with claim-level RiskTier)
Added Fields to All Version Entities:
- `PublicationMode` - Track Mode 1/2/3 status
- `ReviewStatus` - Track workflow state
- `NodeOrigin` - Federation provenance
- `CreatedBy` - FK to User/TechnicalUser (clarified)
New Entity:
- `TECHNICAL_USER` - Separate system processes from human users
Clarifications:
- `ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
- `ContentHash` in Evidence - SHA256 for integrity checking
Data Model Behavior
Late-Arriving Evidence
When new evidence versions appear:
1. Existing verdicts are marked as outdated
2. Scenario relevance must be re-evaluated
3. Re-evaluation engine triggers verdict recomputation
4. New verdict versions created
5. Users notified of updates
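The first step of this flow can be sketched as below. How affected verdicts are selected is not specified on this page; the sketch assumes a verdict is affected when its ``EvidenceVersionSet`` contains the evidence version that has just been superseded, and it treats ``Status`` as the one field that changes on an otherwise immutable record. All names are hypothetical, and relevance re-evaluation, recomputation, and notification are out of scope.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class VerdictVersion:
    version_id: str
    verdict_id: str
    evidence_version_set: Tuple[str, ...]
    status: str = "current"  # current|outdated|superseded|retracted


def mark_outdated(verdicts: List[VerdictVersion],
                  superseded_evidence_version_id: str
                  ) -> Tuple[List[VerdictVersion], List[str]]:
    """Return the updated verdict list and the VerdictIDs to recompute."""
    updated: List[VerdictVersion] = []
    to_recompute: List[str] = []
    for verdict in verdicts:
        affected = (
            verdict.status == "current"
            and superseded_evidence_version_id in verdict.evidence_version_set
        )
        if affected:
            updated.append(replace(verdict, status="outdated"))
            to_recompute.append(verdict.verdict_id)
        else:
            updated.append(verdict)
    return updated, to_recompute


# Hypothetical verdicts; only the first relies on evidence version "e1-v1".
verdicts = [
    VerdictVersion("vv-1", "verdict-1", ("e1-v1", "e2-v1")),
    VerdictVersion("vv-2", "verdict-2", ("e3-v1",)),
]
updated, queue = mark_outdated(verdicts, "e1-v1")
```

The returned queue of VerdictIDs is what a re-evaluation engine would consume in the later steps.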
Scenario Evolution
When a scenario's assumptions or definitions change:
- A new scenario version is created (not an in-place update)
- All dependent verdicts must be recalculated
- Previous scenario versions remain accessible
- Version lineage preserved
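A minimal sketch of this behavior, assuming scenario versions are immutable records and that dependent verdicts are found through their ``ScenarioVersionID``. The class names, the `revise` and `dependent_verdicts` functions, and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    scenario_id: str
    parent_version_id: Optional[str]
    assumptions: Tuple[str, ...]
    status: str = "active"  # active|superseded|deprecated


def revise(previous: ScenarioVersion, new_version_id: str,
           assumptions: Tuple[str, ...]) -> Tuple[ScenarioVersion, ScenarioVersion]:
    """Create a new version linked to its parent; keep the old one accessible."""
    new = ScenarioVersion(
        version_id=new_version_id,
        scenario_id=previous.scenario_id,
        parent_version_id=previous.version_id,  # preserves lineage
        assumptions=assumptions,
    )
    return replace(previous, status="superseded"), new


def dependent_verdicts(scenario_version_id: str,
                       verdict_to_scenario: Dict[str, str]) -> List[str]:
    """Return verdict VersionIDs pinned to the given scenario version."""
    return [v for v, s in verdict_to_scenario.items() if s == scenario_version_id]


# Hypothetical data: two verdict versions pinned to scenario versions.
v1 = ScenarioVersion("s1-v1", "s1", None, ("sea level baseline 1990",))
old, v2 = revise(v1, "s1-v2", ("sea level baseline 2000",))
pinned = {"vv-1": "s1-v1", "vv-2": "s9-v4"}
stale = dependent_verdicts(old.version_id, pinned)
```

Here `stale` lists the verdict versions that would need recalculation against the new scenario version, while `old` remains available with its original assumptions.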
Federated Nodes
Each node may share partial data:
- Claims and scenarios shared if relevant
- Evidence metadata shared, not always full files
- Version synchronization via NodeOrigin tracking
- Branching allowed for divergent interpretations
Visual Diagrams
The following diagrams provide visual representations of the data model structure and relationships.