Data Model (From Specification Chat)
5. Data Model
The FactHarbor data model centers on four core entities, each fully versioned through immutable version records:
- Claim
- Scenario
- Evidence
- Verdict
These entities form the structured “truth landscape” for each claim.
The model is explicitly versioned, traceable, and federation-ready.
To keep the system auditable and explainable, FactHarbor uses a consistent
identity vs. version pattern:
- Identity entities (e.g. CLAIM, SCENARIO) define *what* something is in a stable sense.
- Version entities (e.g. CLAIM_VERSION, SCENARIO_VERSION) define *how that thing looked at a given point in time*.
All reasoning (e.g. verdicts, review actions) is attached to versions, never to
mutable identities.
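A minimal Python sketch of the identity vs. version pattern, assuming an in-memory store. The class and method names (`Claim`, `ClaimVersion`, `add_version`, `current`) are hypothetical and not defined by the specification. The identity keeps a stable `claim_id`, while each rephrasing is appended as a new immutable version.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimVersion:
    """Immutable snapshot of how a claim looked at one point in time."""
    claim_version_id: str
    claim_id: str          # FK to the stable CLAIM identity
    text: str
    created_at: datetime


@dataclass
class Claim:
    """Stable identity: survives every rephrasing of the claim."""
    claim_id: str
    versions: list[ClaimVersion] = field(default_factory=list)

    def add_version(self, text: str) -> ClaimVersion:
        # Rephrasing never edits an existing version; it appends a new one.
        version = ClaimVersion(
            claim_version_id=str(uuid.uuid4()),
            claim_id=self.claim_id,
            text=text,
            created_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    def current(self) -> ClaimVersion:
        # The latest version is the default view; older versions stay in history.
        return self.versions[-1]


claim = Claim(claim_id="claim-1")
v1 = claim.add_version("Coffee raises blood pressure.")
v2 = claim.add_version("Drinking coffee raises blood pressure in adults.")
```

Both versions share `claim_id`, so anything that needs a stable handle (clusters, scenarios) points at the identity, while anything that reasons about concrete content points at a version.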
5.1 Core entities and versioning pattern
| Logical concept | Identity entity | Version entity | Notes |
|---|---|---|---|
| Claim (what people argue about) | CLAIM | CLAIM_VERSION | Claim text, phrasing, and metadata live in CLAIM_VERSION. The identity CLAIM stays stable across rephrasings. |
| Scenario (interpretive frame) | SCENARIO | SCENARIO_VERSION | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries. |
| Evidence (source / datapoint) | EVIDENCE | EVIDENCE_VERSION | Identity of a source vs. specific extractions / updates over time. |
| Verdict (assessment) | VERDICT | VERDICT_VERSION | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments. |
| Scenario–Evidence link | SCENARIO_EVIDENCE_LINK | SCENARIO_EVIDENCE_LINK_VERSION | Links bind scenario versions to evidence versions with relevance & direction. |
| Claim cluster (semantic group) | CLAIM_CLUSTER | – | Groups semantically related claims; mainly for discovery and navigation. |
Key design decisions:
- A CLAIM belongs to exactly one CLAIM_CLUSTER.
- A SCENARIO belongs to exactly one CLAIM (scenarios live at the *claim* level, not per individual phrasing).
- Verdicts and Scenario–Evidence links are always attached to versions:
  - SCENARIO_VERSION + EVIDENCE_VERSION → SCENARIO_EVIDENCE_LINK_VERSION
  - SCENARIO_VERSION → VERDICT_VERSION
This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.
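The rule that reasoning attaches to versions can be sketched as follows, assuming verdict versions are indexed by the SCENARIO_VERSION they assessed. `VerdictVersion`, `VerdictStore`, `record` and `for_scenario_version` are hypothetical names, and the numeric values used for `verdict` and `confidence` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    verdict_id: str            # stable VERDICT identity (one per SCENARIO)
    scenario_version_id: str   # the exact SCENARIO_VERSION that was assessed
    verdict: float
    confidence: float


class VerdictStore:
    """Append-only store: verdicts are recorded against scenario versions."""

    def __init__(self) -> None:
        self._by_scenario_version: dict[str, list[VerdictVersion]] = defaultdict(list)

    def record(self, vv: VerdictVersion) -> None:
        self._by_scenario_version[vv.scenario_version_id].append(vv)

    def for_scenario_version(self, scenario_version_id: str) -> list[VerdictVersion]:
        # Return a copy so callers cannot alter the stored history.
        return list(self._by_scenario_version.get(scenario_version_id, []))


store = VerdictStore()
# The scenario is first assessed in version sv-1 ...
store.record(VerdictVersion("vv-1", "verdict-S1", "sv-1", verdict=0.7, confidence=0.6))
# ... then changes to sv-2 and is re-assessed; the sv-1 verdict is left untouched.
store.record(VerdictVersion("vv-2", "verdict-S1", "sv-2", verdict=0.4, confidence=0.8))
```

Because nothing is overwritten, asking "what did we conclude about sv-1?" still returns the original assessment after the scenario has moved on.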
5.2 Core Data Model ERD (expanded, versioned)
The following Mermaid ER diagram shows the main entities and their relationships.
By convention, identifier fields end in ID; primary keys are marked PK and
foreign keys are marked FK.
Core Data Model ERD (Versioned)
This diagram shows the full core data model with all versioned entities.
```mermaid
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }
    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions
    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions
    EVIDENCE ||--o{ EVIDENCE_VERSION : versions
    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked
    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed in"
```
Important points:
- Scenarios and Evidence are linked via their versions (SCENARIO_VERSION and EVIDENCE_VERSION).
- Verdicts are per SCENARIO_VERSION and stored in VERDICT_VERSION.
- CLAIM_CLUSTER is shared across diagrams; it is shown here and in the Data Use / Review model.
All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
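One way to enforce "never changed, only superseded" in Python is a frozen dataclass: in-place edits raise `dataclasses.FrozenInstanceError`, and `dataclasses.replace` builds a new object that can be stored as the next version. This is a sketch only; the `EvidenceVersion` fields mirror the ERD, but the `supersede` helper and its ID scheme are hypothetical.

```python
import dataclasses
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str          # FK to the EVIDENCE identity
    summary: str
    reliability_score: float
    created_at: datetime


def supersede(old: EvidenceVersion, new_id: str, **changes) -> EvidenceVersion:
    """Create the next version from an old one; the old object is left untouched."""
    return dataclasses.replace(
        old,
        evidence_version_id=new_id,
        created_at=datetime.now(timezone.utc),
        **changes,
    )


ev1 = EvidenceVersion("ev-1", "E1", "Initial extraction", 0.6,
                      datetime.now(timezone.utc))
try:
    ev1.summary = "Edited in place"  # forbidden for version entities
except dataclasses.FrozenInstanceError:
    print("ev-1 is immutable")

ev2 = supersede(ev1, "ev-2", summary="Corrected extraction", reliability_score=0.8)
```

Application-level immutability like this complements, but does not replace, database-level protections such as revoking UPDATE rights on version tables.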
5.3 Data Use & Review ERD
The Data Use model captures who does what with which versioned data:
- Users (including technical users)
- Roles and role assignments
- Review actions on versioned entities
Data Use ERD (Roles, Review & Versioned Entities)
This diagram shows how users, roles, and review actions relate to the
versioned core entities.
```mermaid
erDiagram
    %% Core entities shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }
    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }
    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }
    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }
    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }
    REVIEWER {
        string UserID PK
        string Domain
    }
    EXPERT {
        string UserID PK
        string ExpertiseArea
    }
    FEDERATION_NODE {
        string NodeID PK
        string Region
    }
    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }
    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }
    %% Inheritance / specialization (modelled as relationships)
    USER ||--o{ TECHNICAL_USER : "is a"
    USER ||--o{ CONTRIBUTING_USER : "is a"
    CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o{ REVIEWER : "subset"
    CONTRIBUTING_USER ||--o{ EXPERT : "subset"
    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"
    %% Review actions on versioned entities
    %% Each REVIEW_ACTION targets exactly one version, selected by TargetEntityType
    USER ||--o{ REVIEW_ACTION : performs
    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews
```
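Because REVIEW_ACTION stores its target as a `TargetEntityType` plus a `TargetEntityVersionID`, the application has to check that the pair actually points at an existing version. A minimal sketch, assuming the four version types shown in the diagram and an in-memory index of known version IDs; `ReviewAction`, `VALID_TARGET_TYPES` and `validate_review_action` are hypothetical names, and the action types are example values only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Version entities a REVIEW_ACTION may target, per the Data Use ERD.
VALID_TARGET_TYPES = {"CLAIM_VERSION", "SCENARIO_VERSION",
                      "EVIDENCE_VERSION", "VERDICT_VERSION"}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str      # e.g. "approve", "flag" (example values only)
    comment: str
    timestamp: datetime


def validate_review_action(action: ReviewAction,
                           known_versions: dict[str, set[str]]) -> None:
    """Raise ValueError unless the action targets an existing version entity."""
    if action.target_entity_type not in VALID_TARGET_TYPES:
        raise ValueError(f"unsupported target type: {action.target_entity_type}")
    ids = known_versions.get(action.target_entity_type, set())
    if action.target_entity_version_id not in ids:
        raise ValueError(f"unknown version id: {action.target_entity_version_id}")


known = {"SCENARIO_VERSION": {"sv-1", "sv-2"}}
ok = ReviewAction("ra-1", "u-42", "SCENARIO_VERSION", "sv-1", "approve",
                  "Definitions look complete.", datetime.now(timezone.utc))
validate_review_action(ok, known)  # passes silently
```

Targeting an identity such as `"CLAIM"` is rejected, which keeps every review tied to the exact version the reviewer saw.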
Notes:
- Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR,
  SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows
  in a ROLE table (ROLE itself is not drawn in the diagram above).
- TECHNICAL_USER captures strictly technical accounts (API keys,
  node-to-node federation agents, batch jobs). All other roles can, in principle,
  be held by both human and technical users where appropriate.
- A READER normally does not perform REVIEW_ACTIONs, while roles like REVIEWER,
  TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
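The notes above (roles as rows, shared by human and technical users) could look like this in application code. This is a sketch under stated assumptions: the `memberships` mapping stands in for USER_ROLE_MEMBERSHIP, and the roles allowed to review (REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR) are taken from the last note, with the unspecified "some federation roles" deliberately left out. `REVIEW_ROLES` and `can_review` are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    # Rows of ROLE, as listed in the notes above.
    READER = "READER"
    CONTRIBUTOR = "CONTRIBUTOR"
    TRUSTED_CONTRIBUTOR = "TRUSTED_CONTRIBUTOR"
    REVIEWER = "REVIEWER"
    MODERATOR = "MODERATOR"
    SYSTEM_ADMIN = "SYSTEM_ADMIN"
    FEDERATION_OPERATOR = "FEDERATION_OPERATOR"
    FEDERATION_ADMIN = "FEDERATION_ADMIN"


# Roles that perform REVIEW_ACTIONs; federation roles are omitted because
# the notes only say that "some" of them do.
REVIEW_ROLES = {Role.REVIEWER, Role.TRUSTED_CONTRIBUTOR, Role.MODERATOR}

# Stand-in for USER_ROLE_MEMBERSHIP: user ID -> roles held.
# Human and technical users go through the same mechanism.
memberships: dict[str, set[Role]] = {
    "alice": {Role.READER},
    "bob": {Role.CONTRIBUTOR, Role.REVIEWER},
    "ingest-bot": {Role.CONTRIBUTOR},  # hypothetical technical user
}


def can_review(user_id: str) -> bool:
    """True if the user holds at least one role that may create REVIEW_ACTIONs."""
    return bool(memberships.get(user_id, set()) & REVIEW_ROLES)
```

Keeping roles as data rather than subclasses means adding a role is a row insert, not a schema change.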
5.4 Versioning and re-evaluation behavior
This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).
- When a new EVIDENCE_VERSION is created:
  - All SCENARIO_EVIDENCE_LINK_VERSION entries referencing earlier versions of
    the same EVIDENCE are candidates for re-assessment.
  - Related VERDICT_VERSION entries may become outdated and are queued for
    re-evaluation.
- When a new SCENARIO_VERSION is created:
  - It may inherit some links from earlier versions of the scenario, or start
    empty, depending on the change classification (cosmetic vs. conceptual).
  - All verdicts for that scenario are recalculated and stored as new
    VERDICT_VERSION entries.
- REVIEW_ACTIONs are always attached to the exact version that was seen by
  the reviewer. This preserves a faithful audit trail if data later changes.
- In a federated environment, nodes can choose:
  - which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
  - which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs,
    only EVIDENCE_VERSIONs above a reliability threshold, etc.)
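The two triggers above can be sketched as pure functions over link records, assuming each link also carries the scenario and evidence identity IDs so that earlier versions of the same EVIDENCE can be found. `Link`, `on_new_evidence_version`, `links_for_new_scenario_version` and the `"cosmetic"` / `"conceptual"` labels are hypothetical; the actual change classification is defined in the Versioning chapter.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Link:
    """Simplified SCENARIO_EVIDENCE_LINK_VERSION record."""
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # e.g. "supports" / "contradicts" (illustrative values)


def on_new_evidence_version(links: list[Link], evidence_id: str,
                            new_evidence_version_id: str) -> set[str]:
    """Scenario versions whose verdicts should be queued for re-evaluation."""
    return {
        link.scenario_version_id
        for link in links
        if link.evidence_id == evidence_id
        and link.evidence_version_id != new_evidence_version_id
    }


def links_for_new_scenario_version(links: list[Link], old_sv: str, new_sv: str,
                                   change_kind: str) -> list[Link]:
    """Cosmetic changes inherit the old links; conceptual changes start empty."""
    if change_kind == "conceptual":
        return []
    if change_kind != "cosmetic":
        raise ValueError(f"unknown change classification: {change_kind}")
    # New link versions bound to the new scenario version; old links stay as history.
    return [replace(link, scenario_version_id=new_sv)
            for link in links if link.scenario_version_id == old_sv]


links = [
    Link("S1", "sv-1", "E1", "ev-1", 0.9, "supports"),
    Link("S2", "sv-7", "E1", "ev-1", 0.4, "contradicts"),
    Link("S2", "sv-7", "E2", "ev-3", 0.6, "supports"),
]
queued = on_new_evidence_version(links, "E1", "ev-2")
```

Returning the affected scenario versions, rather than recomputing verdicts inline, leaves scheduling of the re-evaluation cascade to the automation layer.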
5.5 Behavioral Notes
5.5.1 Late-Arriving Evidence
New evidence versions can make existing verdicts outdated and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).
5.5.2 Scenario Evolution
Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.
5.5.3 Federation
Federated nodes can replicate subsets of the graph, including:
- Claims and Scenarios of local interest
- Evidence metadata (without full content)
- Verdict lineages used for local decision-making
Federation-specific entities (such as FEDERATION_NODE,
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.
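A node's replication choices (see 5.4 and the list above) can be expressed as filters over versioned records. The sketch assumes a hypothetical `ReplicationPolicy` with a reliability threshold, an `accepted` flag on verdict versions, and a `full_content` field on evidence versions; none of these fields is defined in the core model, and the real rules belong to the Federation & Decentralization chapter.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvidenceVersionRecord:
    evidence_version_id: str
    evidence_id: str
    summary: str
    reliability_score: float
    full_content: Optional[str] = None  # hypothetical: full text, never replicated


@dataclass(frozen=True)
class VerdictVersionRecord:
    verdict_version_id: str
    scenario_version_id: str
    accepted: bool  # hypothetical review status


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-node choices about which versions to replicate."""
    min_reliability: float = 0.7
    accepted_verdicts_only: bool = True


def select_evidence(policy: ReplicationPolicy,
                    versions: list[EvidenceVersionRecord]) -> list[EvidenceVersionRecord]:
    # Keep versions at or above the threshold; ship metadata without full content.
    return [replace(v, full_content=None)
            for v in versions
            if v.reliability_score >= policy.min_reliability]


def select_verdicts(policy: ReplicationPolicy,
                    versions: list[VerdictVersionRecord]) -> list[VerdictVersionRecord]:
    return [v for v in versions
            if v.accepted or not policy.accepted_verdicts_only]


policy = ReplicationPolicy(min_reliability=0.7)
evidence = [
    EvidenceVersionRecord("ev-1", "E1", "Peer-reviewed study", 0.9, "full text of study"),
    EvidenceVersionRecord("ev-2", "E2", "Anonymous blog post", 0.3, "full text of post"),
]
verdicts = [
    VerdictVersionRecord("vv-1", "sv-1", accepted=True),
    VerdictVersionRecord("vv-2", "sv-1", accepted=False),
]
outgoing_evidence = select_evidence(policy, evidence)
outgoing_verdicts = select_verdicts(policy, verdicts)
```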
USER
├── TECHNICAL_USER
│   ├── FEDERATION_ADMIN
│   └── AKEL_AGENT (optional future)
READER
└── CONTRIBUTING_USER
    ├── TRUSTED_CONTRIBUTOR
    ├── REVIEWER
    ├── EXPERT
    └── MODERATOR
ADMIN
FEDERATION_ADMIN (administrative, but human)
1. Overall analysis & review of the data model
1.1 Strengths of the current design
Identity vs. version pattern
Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way for knowledge systems to handle:
- auditability
- time evolution
- re-evaluation triggers
- federation and partial replication
Scenario-centric reasoning
Separating Claim (what people argue about) from Scenario (interpretive frame) aligns well with “truth landscape” style systems:
- Scenarios explain why people disagree.
- Verdicts are tied to specific scenario versions → avoids mixing incompatible assumptions.
Evidence and verdicts as first-class entities
Evidence is explicit and linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
Cluster level (CLAIM_CLUSTER)
Grouping related claims avoids duplication and lets you:
- reuse scenarios across paraphrases
- share embeddings / semantic search
- keep the system scalable as the corpus grows.
Explicit review layer (REVIEW_ACTION, roles, etc.)
Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:
- governance
- permissions
- audit trails
- future trust scoring per user / role.
1.2 Design decisions I’m locking in (based on our discussions)
To make the model consistent and “state-of-the-art”, I will assume the following as the current intended design:
Claims vs Scenarios
- CLAIM is the stable identity for “what people argue about”.
- CLAIM_VERSION entries are individual phrasings / formulations / metadata.
- SCENARIO belongs to a CLAIM, not to a specific CLAIM_VERSION.
Rationale:
- Many different phrasings share the same scenario.
- You avoid duplicating scenarios per wording.
- SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
Version-specific reasoning
- Verdicts are always attached to SCENARIO_VERSION (not base SCENARIO).
- Evidence links are between SCENARIO_VERSION and EVIDENCE_VERSION.
→ This is what we agreed when we said “SCENARIO_EVIDENCE_LINK should link the respective versions instead”.
Clusters
- CLAIM_CLUSTER groups Claims (semantically close claims).
- It is visible in both diagrams (Core Data Model and Data Use).
Review vs data
All review happens on versioned entities:
- CLAIM_VERSION
- SCENARIO_VERSION
- EVIDENCE_VERSION
- SCENARIO_EVIDENCE_LINK_VERSION
- VERDICT_VERSION
- REVIEW_ACTION is the generic log of who did what on which version.
Users & roles
- USER has an attribute (or a linked entity) that distinguishes technical users from normal accounts.
- We keep TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
- All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
Roles include:
- READER
- CONTRIBUTOR
- TRUSTED_CONTRIBUTOR
- REVIEWER
- MODERATOR
- SYSTEM_ADMIN / MAINTAINER
- FEDERATION_OPERATOR
- FEDERATION_ADMIN
(all intended to be stored as rows of ROLE rather than as separate entities).
1.3 Gaps / potential problems
These are the main issues & missing areas I see:
Versioning text in chapter 5 is currently too thin (‘…’ placeholders)
The spec does not yet verbally spell out:
- the identity vs version pattern, systematically
- how re-evaluation triggers are derived from version changes
- how this aligns with federation (which versions are replicated where).
No explicit “provenance granularity” in the model
EVIDENCE is a single entity. For more advanced use cases, you may later want:
- EVIDENCE_SOURCE (the whole article/report/video)
- EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
- For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
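If the EVIDENCE_SOURCE / EVIDENCE_FRAGMENT split is adopted later, it might look roughly like this. Everything here is hypothetical: the field names, the `locator` field, and especially the fallback rule in `effective_reliability` (use the fragment's own score if assessed, otherwise the source's) are illustrations of the idea, not decisions made in this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceSource:
    """The whole article / report / video."""
    source_id: str
    source_type: str   # e.g. "article", "report", "video"
    url: str


@dataclass(frozen=True)
class EvidenceFragment:
    """A specific paragraph or clip inside a source."""
    fragment_id: str
    source_id: str                              # FK to EvidenceSource
    locator: str                                # hypothetical: page, paragraph, timestamp range
    quote: str
    reliability_score: Optional[float] = None   # own score, if assessed


def effective_reliability(fragment: EvidenceFragment,
                          source_scores: dict[str, float]) -> float:
    # Hypothetical rule: prefer the fragment's own score, else inherit the source's.
    if fragment.reliability_score is not None:
        return fragment.reliability_score
    return source_scores[fragment.source_id]


report = EvidenceSource("src-1", "report", "https://example.org/report")
assessed = EvidenceFragment("fr-1", "src-1", "p. 4", "Quoted finding", 0.9)
unassessed = EvidenceFragment("fr-2", "src-1", "p. 7", "Another passage")
```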
Review target polymorphism
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:
- CLAIM_VERSION → REVIEW_ACTION
- SCENARIO_VERSION → REVIEW_ACTION
- etc.
- A more “pure” relational modeling would use a generic “subjectType + subjectId” or an intermediate “REVIEW_TARGET” table.
- For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
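The intermediate REVIEW_TARGET table mentioned above can be sketched with Python's built-in `sqlite3` module. The schema is an assumption for illustration: one nullable foreign-key column per version type (only two of the four version types are shown, for brevity), plus a CHECK constraint so each target points at exactly one version. Table and column names beyond those in the ERD are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE claim_version    (claim_version_id    TEXT PRIMARY KEY);
CREATE TABLE scenario_version (scenario_version_id TEXT PRIMARY KEY);

-- One row per reviewable target; exactly one FK column may be set.
CREATE TABLE review_target (
    review_target_id    TEXT PRIMARY KEY,
    claim_version_id    TEXT REFERENCES claim_version(claim_version_id),
    scenario_version_id TEXT REFERENCES scenario_version(scenario_version_id),
    CHECK ((claim_version_id IS NOT NULL) + (scenario_version_id IS NOT NULL) = 1)
);

CREATE TABLE review_action (
    review_action_id TEXT PRIMARY KEY,
    user_id          TEXT NOT NULL,
    review_target_id TEXT NOT NULL REFERENCES review_target(review_target_id),
    action_type      TEXT NOT NULL
);
""")

conn.execute("INSERT INTO scenario_version VALUES ('sv-1')")
conn.execute("INSERT INTO review_target VALUES ('rt-1', NULL, 'sv-1')")
conn.execute("INSERT INTO review_action VALUES ('ra-1', 'u-42', 'rt-1', 'approve')")

try:
    # Rejected by the CHECK constraint: a target must reference exactly one version.
    conn.execute("INSERT INTO review_target VALUES ('rt-2', NULL, NULL)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Compared with a bare `TargetEntityType` + `TargetEntityVersionID` pair, this trades a slightly wider table for real foreign-key enforcement by the database.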
Federation details missing from core ERD
- There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
- This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
Automation / AKEL artifacts left implicit
The Data Model chapter currently doesn’t describe:
- AKEL task queues
- extraction runs
- model versions
- That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.