Data Model
This page describes the current data model for FactHarbor v0.9.1.
1. Versioning Strategy
Every entity in FactHarbor has a full immutable version history. This ensures:
- Complete auditability
- Ability to reconstruct historical state
- Federation-compatible lineage tracking
- Transparent evolution of claims, scenarios, and verdicts
1.1 Core Versioning Principles
Immutability:
- Each version is stored independently
- Versions cannot be deleted, only superseded
- Historical versions remain accessible
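A minimal sketch of these immutability rules, assuming an in-memory store keyed by `VersionID`. The names `Version` and `VersionStore` are hypothetical and not part of FactHarbor. Versions are write-once, no delete operation exists, and a version counts as superseded once another version names it as its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """Immutable snapshot: frozen=True rejects attribute assignment."""
    version_id: str
    parent_version_id: Optional[str]
    payload: str


class VersionStore:
    """Hypothetical append-only store: versions are added and read, never changed or deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, Version] = {}

    def add(self, version: Version) -> None:
        # Write-once: an existing VersionID is never overwritten.
        if version.version_id in self._versions:
            raise ValueError(f"version {version.version_id} already exists")
        # The parent must already be stored, so lineage links never dangle.
        parent = version.parent_version_id
        if parent is not None and parent not in self._versions:
            raise ValueError(f"unknown parent version {parent}")
        self._versions[version.version_id] = version

    def get(self, version_id: str) -> Version:
        # Historical versions stay readable after being superseded.
        return self._versions[version_id]

    def is_superseded(self, version_id: str) -> bool:
        # Superseded means some later version names this one as its parent.
        return any(v.parent_version_id == version_id for v in self._versions.values())
```

Deriving "superseded" from the lineage, instead of editing the old record, keeps every stored version byte-for-byte unchanged.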
Lineage:
- Each version links to its parent via `ParentVersionID`
- Forms a directed acyclic graph (DAG) of changes
- Supports branching in federated environments
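The lineage rules can be illustrated with a small sketch, assuming the versions of one entity are available as a map from `VersionID` to `ParentVersionID`. With a single parent per version, each history is a tree, which is a special case of a DAG. The function names `lineage` and `branch_points` are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def lineage(parents: dict[str, Optional[str]], version_id: str) -> list[str]:
    """Return the chain from version_id back to its root version."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            # A repeated ID would mean a cycle, which a DAG must not contain.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain


def branch_points(parents: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Return each parent that has more than one child, i.e. where history forks."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}
```

For `{"v1": None, "v2": "v1", "v3": "v2", "v3b": "v2"}`, the lineage of `v3` is `["v3", "v2", "v1"]` and the only branch point is `v2`.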
Provenance:
- Every version timestamped (`CreatedAt`)
- Author type recorded (`AuthorType`: Human, AI, ExternalNode)
- Justification captured (`JustificationText`)
- Digital signatures for integrity (`SignatureHash` in Release 1.0)
Federation Support:
- Versions can originate from remote nodes
- Conflict detection via lineage comparison
- Parallel version trees for branching scenarios
- Cross-node version synchronization
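Conflict detection via lineage comparison could work roughly as follows. This sketch assumes the receiving node already holds both version chains in a `VersionID` to `ParentVersionID` map; the function `compare_heads` and its status labels are hypothetical. If one head is an ancestor of the other, the newer head simply extends the older one; otherwise the heads have diverged and the nearest shared ancestor marks the fork.

```python
from typing import Optional


def ancestors(parents: dict[str, Optional[str]], version_id: str) -> list[str]:
    """Chain from version_id back to the root, version_id included."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def compare_heads(parents: dict[str, Optional[str]],
                  local: str, remote: str) -> tuple[str, Optional[str]]:
    """Classify two head versions and return the relevant shared ancestor."""
    if local == remote:
        return ("identical", local)
    local_chain = ancestors(parents, local)
    remote_chain = ancestors(parents, remote)
    if local in remote_chain:
        return ("remote_ahead", local)    # remote extends local
    if remote in local_chain:
        return ("local_ahead", remote)    # local extends remote
    # Neither head contains the other: find the nearest shared ancestor.
    remote_set = set(remote_chain)
    common = next((v for v in local_chain if v in remote_set), None)
    return ("diverged", common)
```

A `diverged` result is where a node could keep parallel version trees rather than silently overwriting either side.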
1.2 Common Version Fields
All versioned entities include:
- VersionID: Unique identifier for this specific version
- ParentVersionID: Link to previous version (null for first version)
- CreatedAt: Timestamp (ISO 8601, UTC)
- AuthorType: Human | AI | ExternalNode
- CreatedBy: Foreign key to User or TechnicalUser
- JustificationText: Brief explanation of changes
- PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
- ReviewStatus: Workflow state; allowed values vary by entity (e.g. draft|in_review|approved|rejected for claims and scenarios)
- NodeOrigin: Node ID where version was created (for federation)
- SignatureHash: Cryptographic signature (Release 1.0)
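The common fields could be mirrored in application code roughly as below. This is a sketch under assumptions: the class names are hypothetical, `ReviewStatus` is kept as a plain string because its values differ per entity, and the UTC check is one way to enforce the ISO 8601 UTC rule for `CreatedAt`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


@dataclass(frozen=True)
class VersionFields:
    """Hypothetical Python mirror of the common version fields."""
    version_id: str
    parent_version_id: Optional[str]       # None for the first version
    created_at: datetime
    author_type: AuthorType
    created_by: str                        # User or TechnicalUser ID
    justification_text: str
    publication_mode: PublicationMode
    review_status: str                     # allowed values differ per entity
    node_origin: str
    signature_hash: Optional[str] = None   # Release 1.0

    def __post_init__(self) -> None:
        # Naive datetimes have utcoffset() None; non-UTC zones have a nonzero offset.
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("created_at must be timezone-aware UTC")

    def created_at_iso(self) -> str:
        """CreatedAt rendered as an ISO 8601 string."""
        return self.created_at.isoformat()
```

Rejecting naive timestamps at construction time avoids ambiguous ordering when versions from different nodes are compared.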
2. Core Entity Definitions
2.1 User Entities
USER (base user table):
- ``UserID`` (PK)
- ``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
- ``DisplayName``
- ``Email`` (for Contributors and above)
- ``RegisteredAt``
- ``LastActive``
- ``Status`` (active|suspended|banned)
TECHNICAL_USER (system processes):
- ``SystemID`` (PK)
- ``SystemName``
- ``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
- ``CreatedBy`` (FK to Maintainer who created this system user)
- ``CreatedAt``
- ``Status`` (active|paused|deprecated)
- ``ApiKey`` (encrypted)
- ``Permissions`` (JSON - authorized operations)
Examples of Technical Users:
- AKEL instances (AI processing)
- Federation sync bots
- Scheduled audit tasks
- Backup services
- Monitoring systems
- External API integrations
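The format of the `Permissions` JSON is not specified here. The sketch below assumes it is a flat list of operation names; the `can_perform` helper and the operation names in the example are hypothetical. It only allows an operation when the system user is active and lists that operation explicitly.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalUser:
    system_id: str
    purpose: str        # AKEL | FederationSync | BackupService | Monitor | Audit
    status: str         # active | paused | deprecated
    permissions: str    # JSON; assumed here to be a list of operation names


def can_perform(user: TechnicalUser, operation: str) -> bool:
    """Allow an operation only for active system users that list it explicitly."""
    if user.status != "active":
        return False
    allowed = json.loads(user.permissions)
    return operation in allowed
```

For example, an AKEL instance with `'["create_claim_version", "create_verdict_version"]'` may create verdict versions but nothing outside that list, and a paused sync bot is refused regardless of its permissions.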
2.2 Content Entities
The system relies on the following versioned core entities:
CLAIM_CLUSTER:
- ``ClusterID`` (PK)
- ``EmbeddingVectorRef``
- ``Theme``
- Groups related claims into topical clusters
- One Cluster has many Claims
- A Claim belongs to exactly one primary cluster
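How a claim is assigned to its primary cluster is not stated here. One plausible approach, assumed for illustration only, is to resolve each `EmbeddingVectorRef` to a vector and pick the cluster with the highest cosine similarity to the claim's embedding. The function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def primary_cluster(claim_vec: list[float], clusters: dict[str, list[float]]) -> str:
    """Return the single ClusterID whose vector is most similar to the claim."""
    return max(clusters, key=lambda cid: cosine(claim_vec, clusters[cid]))
```

Taking a single `max` enforces the "exactly one primary cluster" rule; secondary memberships, if ever needed, would require a separate relation.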
CLAIM / CLAIM_VERSION:
- ``CLAIM`` is the long-lived anchor for a real-world claim
- ``CLAIM_VERSION`` is an immutable snapshot that includes:
- ``VersionID`` (PK)
- ``ClaimID`` (FK to CLAIM)
- ``ParentVersionID`` (FK to prior version, nullable)
- ``Text``
- ``Domain``
- ``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
- ``Evaluability`` (empirical|subjective|non-falsifiable)
- ``RiskTier`` (A|B|C); replaces the former SafetyCategory for consistency with the risk tier system
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|merged)
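The anchor/snapshot split means an edit never touches an existing `CLAIM_VERSION`; it produces a child snapshot that keeps the same `ClaimID`. A minimal sketch with a subset of the fields follows; the `revise` helper and the use of random UUIDs for `VersionID` are assumptions.

```python
import uuid
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ClaimVersion:
    """Subset of CLAIM_VERSION fields, enough to show the anchor/snapshot split."""
    version_id: str
    claim_id: str                      # stable CLAIM anchor shared by every version
    parent_version_id: Optional[str]
    text: str
    risk_tier: str                     # A | B | C
    justification_text: str
    created_at: datetime


def revise(parent: ClaimVersion, justification: str, **changes: str) -> ClaimVersion:
    """Create a child snapshot; the parent object is left untouched."""
    if "claim_id" in changes:
        raise ValueError("a revision must keep the same ClaimID anchor")
    return replace(
        parent,
        version_id=str(uuid.uuid4()),
        parent_version_id=parent.version_id,
        justification_text=justification,
        created_at=datetime.now(timezone.utc),
        **changes,
    )
```

`dataclasses.replace` builds a new instance, so unchanged fields such as `risk_tier` carry over while the parent snapshot stays intact.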
SCENARIO / SCENARIO_VERSION:
- ``SCENARIO`` is the anchor for a scenario across time
- ``SCENARIO_VERSION`` is an immutable snapshot:
- ``VersionID`` (PK)
- ``ScenarioID`` (FK to SCENARIO)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``Definitions`` (JSON)
- ``Boundaries`` (JSON)
- ``Assumptions`` (JSON)
- ``Context`` (text)
- ``EvaluationMethod`` (text)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|deprecated)
Note: SafetyClass has been removed from Scenario; the risk tier is set at the claim level.
EVIDENCE / EVIDENCE_VERSION:
- ``EVIDENCE`` is the anchor
- ``EVIDENCE_VERSION`` is the versioned snapshot:
- ``VersionID`` (PK)
- ``EvidenceID`` (FK to EVIDENCE)
- ``ParentVersionID``
- ``Type`` (paper|dataset|report|transcript|expert|media)
- ``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
- ``Reliability`` (low|medium|high)
- ``Provenance`` (URL, DOI, source metadata)
- ``ExtractionMethod`` (manual|OCR|API|AKEL)
- ``ContentHash`` (SHA256 of evidence content)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|verified|disputed|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded)
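`ContentHash` can be computed and checked with Python's standard `hashlib` module. This sketch assumes the hash is stored as a lowercase hexadecimal SHA-256 digest of the raw evidence bytes; the helper names are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """SHA-256 hex digest of the evidence content, used as ContentHash."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_hash: str) -> bool:
    """True if fetched content still matches the stored ContentHash."""
    return content_hash(content) == expected_hash
```

Because the digest depends only on the bytes, a node that receives evidence metadata first can later fetch the file from any source and confirm it is the same content.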
VERDICT / VERDICT_VERSION:
- ``VERDICT`` is the anchor
- ``VERDICT_VERSION`` is the snapshot:
- ``VersionID`` (PK)
- ``VerdictID`` (FK to VERDICT)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
- ``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
- ``LikelihoodRange`` (0–1, with uncertainty bounds)
- ``ExplanationChain`` (JSON)
- ``UncertaintyFactors`` (JSON)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (current|outdated|superseded|retracted)
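The exact shape of `LikelihoodRange` is not defined here. The sketch below assumes a point estimate with lower and upper uncertainty bounds, all in the range 0 to 1; the class and its `width` property are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    """Hypothetical shape: a point estimate with lower/upper uncertainty bounds."""
    lower: float
    estimate: float
    upper: float

    def __post_init__(self) -> None:
        # Values are probabilities and must nest: 0 <= lower <= estimate <= upper <= 1.
        if not 0.0 <= self.lower <= self.estimate <= self.upper <= 1.0:
            raise ValueError("expected 0 <= lower <= estimate <= upper <= 1")

    @property
    def width(self) -> float:
        """Spread of the bounds; a wider range signals more uncertainty."""
        return self.upper - self.lower
```

For instance, `LikelihoodRange(0.6, 0.7, 0.85)` has a width of about 0.25, while bounds that do not enclose the estimate are rejected at construction.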
3. Many-to-Many Linking Tables
ScenarioEvidenceLink:
- Links scenario versions to evidence versions with relevance scoring
- ``ScenarioID``, ``ScenarioVersionID``
- ``EvidenceID``, ``EvidenceVersionID``
- ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
- ``LinkJustification`` - Brief explanation of relevance
Purpose:
- Evidence can be used by multiple scenarios
- Scenarios can draw from multiple pieces of evidence
- Relevance scoring helps prioritize evidence
- Version-specific linking preserves historical accuracy
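Relevance-based prioritization with version-specific links might look like the sketch below. The `ranked_evidence` function and its `min_score` threshold are hypothetical; filtering on `ScenarioVersionID` is what keeps an older scenario version's evidence selection unchanged when newer versions add or drop links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float     # 0 to 1
    link_justification: str


def ranked_evidence(links: list[ScenarioEvidenceLink],
                    scenario_version_id: str,
                    min_score: float = 0.0) -> list[str]:
    """EvidenceVersionIDs linked to one scenario version, most relevant first."""
    matching = [link for link in links
                if link.scenario_version_id == scenario_version_id
                and link.relevance_score >= min_score]
    matching.sort(key=lambda link: link.relevance_score, reverse=True)
    return [link.evidence_version_id for link in matching]
```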
ClaimCluster:
- Semantic clustering of similar claims
- ``ClusterID`` (PK)
- ``EmbeddingVector`` - Vector representation for semantic search
- ``MemberList`` - List of ClaimIDs in this cluster
- ``Theme`` - Human-readable theme description
4. Key Changes in v0.9.1
Updated Field Names:
- `SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
- `SafetyClass` removed from Scenario (redundant with claim-level RiskTier)
Added Fields to All Version Entities:
- `PublicationMode` - Track Mode 1/2/3 status
- `ReviewStatus` - Track workflow state
- `NodeOrigin` - Federation provenance
- `CreatedBy` - FK to User/TechnicalUser (clarified)
New Entity:
- `TECHNICAL_USER` - Separate system processes from human users
Clarifications:
- `ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
- `ContentHash` in Evidence - SHA256 for integrity checking
5. Data Model Behavior
5.1 Late-Arriving Evidence
When new evidence versions appear:
1. Existing verdicts are marked as outdated
2. Scenario relevance must be re-evaluated
3. The re-evaluation engine triggers verdict recomputation
4. New verdict versions are created
5. Users are notified of the updates
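Finding which verdicts are affected (steps 1, 3 and 4 above) could be sketched as follows, assuming the trigger is a new version of evidence already cited by some verdicts and that a map from Evidence `VersionID` to `EvidenceID` is available. The names are hypothetical; re-evaluating scenario relevance (step 2) and notifying users (step 5) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictVersionRef:
    """Hypothetical slice of VERDICT_VERSION needed for this check."""
    verdict_id: str
    version_id: str
    evidence_version_set: tuple[str, ...]   # Evidence VersionIDs used
    status: str                             # current | outdated | superseded | retracted


def verdicts_to_refresh(verdicts: list[VerdictVersionRef],
                        evidence_of: dict[str, str],
                        evidence_id: str) -> list[str]:
    """VerdictIDs whose current version relied on any version of evidence_id."""
    flagged: list[str] = []
    for verdict in verdicts:
        if verdict.status != "current":
            continue
        # Map each Evidence VersionID back to its stable EvidenceID anchor.
        used = {evidence_of[ev] for ev in verdict.evidence_version_set}
        if evidence_id in used:
            flagged.append(verdict.verdict_id)
    return flagged
```

Matching on the `EvidenceID` anchor rather than the exact `VersionID` is what catches verdicts built on an older version of the same evidence.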
5.2 Scenario Evolution
When a scenario's assumptions or definitions change:
- A new scenario version is created (no in-place update)
- All dependent verdicts must be recalculated
- Previous scenario versions remain accessible
- Version lineage preserved
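Because each verdict version points at a specific `ScenarioVersionID`, the verdicts that need recalculation are those still bound to an older version of the changed scenario. A minimal sketch, assuming a map from `ScenarioVersionID` to `ScenarioID`; the names `VerdictBinding` and `stale_verdicts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictBinding:
    verdict_id: str
    scenario_version_id: str   # VERDICT_VERSION.ScenarioVersionID


def stale_verdicts(bindings: list[VerdictBinding],
                   scenario_of: dict[str, str],
                   new_scenario_version_id: str) -> list[str]:
    """VerdictIDs still bound to an older version of the changed scenario."""
    scenario_id = scenario_of[new_scenario_version_id]
    return sorted({
        b.verdict_id for b in bindings
        if scenario_of[b.scenario_version_id] == scenario_id
        and b.scenario_version_id != new_scenario_version_id
    })
```

The old verdict versions keep referencing the old scenario version, so history stays consistent; recalculation adds new verdict versions bound to the new one.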
5.3 Federated Nodes
Each node may share partial data:
- Claims and scenarios shared if relevant
- Evidence metadata shared, not always full files
- Version synchronization via NodeOrigin tracking
- Branching allowed for divergent interpretations
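Sharing evidence metadata without the full file could be done by copying an allow-listed set of fields. In this sketch the field allow-list, the `Content` key holding the raw file, and the helper name are all assumptions; keeping `ContentHash` and `NodeOrigin` lets the peer verify a later download and trace where the version came from.

```python
SHAREABLE_EVIDENCE_FIELDS = {
    "VersionID", "EvidenceID", "ParentVersionID", "Type", "Category",
    "Reliability", "Provenance", "ContentHash", "NodeOrigin", "CreatedAt",
}


def evidence_metadata_for_peer(record: dict, include_content: bool = False) -> dict:
    """Copy an EVIDENCE_VERSION record for a peer node, omitting the full file by default."""
    shared = {k: v for k, v in record.items() if k in SHAREABLE_EVIDENCE_FIELDS}
    if include_content and "Content" in record:
        shared["Content"] = record["Content"]
    return shared
```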
6. Visual Diagrams
The following diagrams provide visual representations of the data model structure and relationships.
6.1 Core Data Model ERD
erDiagram
CLAIM_CLUSTER {
string ClusterID PK
string EmbeddingVectorRef
string Theme
}
CLAIM {
string ClaimID PK
string ClusterID FK
string Status
datetime CreatedAt
}
CLAIM_VERSION {
string ClaimVersionID PK
string ClaimID FK
string Text
string ClaimType
string Domain
datetime CreatedAt
}
SCENARIO {
string ScenarioID PK
string ClaimID FK
string Name
datetime CreatedAt
}
SCENARIO_VERSION {
string ScenarioVersionID PK
string ScenarioID FK
string Definitions
string Assumptions
string Boundaries
datetime CreatedAt
}
EVIDENCE {
string EvidenceID PK
string SourceType
string URL
float ReliabilityScore
}
EVIDENCE_VERSION {
string EvidenceVersionID PK
string EvidenceID FK
string Summary
float ReliabilityScore
datetime CreatedAt
}
SCENARIO_EVIDENCE_VERSION_LINK {
string LinkID PK
string ScenarioVersionID FK
string EvidenceVersionID FK
float Relevance
string Direction
}
VERDICT {
string VerdictID PK
string ScenarioID FK
}
VERDICT_VERSION {
string VerdictVersionID PK
string VerdictID FK
float Verdict
float Confidence
string Reasoning
datetime CreatedAt
}
CLAIM_CLUSTER ||--o{ CLAIM : contains
CLAIM ||--o{ CLAIM_VERSION : versions
CLAIM ||--o{ SCENARIO : has
SCENARIO ||--o{ SCENARIO_VERSION : versions
EVIDENCE ||--o{ EVIDENCE_VERSION : versions
SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : links
EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_VERSION_LINK : linked
SCENARIO ||--o{ VERDICT : assessed
VERDICT ||--o{ VERDICT_VERSION : versions