Data Model (From Specification Chat)

Evidence is explicit, linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.

* (((

**Cluster level (CLAIM_CLUSTER)**

Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases

* share embeddings / semantic search

* keep the system scalable as the corpus grows.

)))

* (((

**Explicit review layer (REVIEW_ACTION, roles, etc.)**

Separating “data” from “who reviewed what” keeps the model clean and is exactly what you want for:

* governance

* permissions

* audit trails

* future trust scoring per user / role.

)))
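To make the “share embeddings / semantic search” point concrete, here is a minimal Python sketch of assigning a new phrasing to its nearest claim cluster. It assumes, hypothetically, that each cluster's {{code}}EmbeddingVectorRef{{/code}} has already been resolved to a plain list of floats and that cosine similarity with a cut-off of 0.8 decides membership; the function names and the threshold are illustrative, not part of the spec.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_cluster(
    embedding: list[float],
    cluster_vectors: dict[str, list[float]],  # hypothetical: ClusterID -> resolved embedding
    threshold: float = 0.8,                   # illustrative cut-off, not from the spec
) -> Optional[str]:
    """Return the ClusterID of the most similar cluster, or None if none is close enough."""
    best_id, best_score = None, threshold
    for cluster_id, vector in cluster_vectors.items():
        score = cosine(embedding, vector)
        if score >= best_score:
            best_id, best_score = cluster_id, score
    return best_id


clusters = {"c-vaccines": [0.9, 0.1, 0.0], "c-climate": [0.0, 0.2, 0.95]}
print(nearest_cluster([0.85, 0.15, 0.05], clusters))  # c-vaccines
print(nearest_cluster([0.5, 0.5, 0.5], clusters))     # None
```

Returning {{code}}None{{/code}} leaves room for opening a new cluster instead of forcing a poor match; where the threshold should sit is a tuning question the model itself does not answer.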

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((

**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.

* CLAIM_VERSION records hold individual phrasings, formulations, and metadata.

* (((

SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.

Rationale:

* Many different phrasings share the //same// scenario.

* You avoid duplicating scenarios per wording.

)))

* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.

)))

1. (((

**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).

* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.

→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.

)))

1. (((

**Clusters**

* CLAIM_CLUSTER groups semantically close CLAIMs.

* It is visible in **both diagrams** (Core Data Model and Data Use).

)))

1. (((

**Review vs data**

* (((

All review happens **on versioned entities**:

* CLAIM_VERSION

* SCENARIO_VERSION

* EVIDENCE_VERSION

* SCENARIO_EVIDENCE_LINK_VERSION

* VERDICT_VERSION

)))

* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.

)))

1. (((

**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.

* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).

* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.

* (((

Roles include:

* READER

* CONTRIBUTOR

* TRUSTED_CONTRIBUTOR

* REVIEWER

* MODERATOR

* SYSTEM_ADMIN / MAINTAINER

* FEDERATION_OPERATOR

* FEDERATION_ADMIN

(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).

)))

)))
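The users-and-roles decisions above can be sketched as plain records: one USER type carrying a technical-account flag (the "attribute" option rather than a linked entity), roles as values of a ROLE table, and USER_ROLE_MEMBERSHIP as (user, role) pairs. This is a minimal Python illustration under those assumptions; {{code}}Role{{/code}}, {{code}}User{{/code}}, {{code}}RoleMembership{{/code}}, {{code}}roles_of{{/code}} and the {{code}}is_technical{{/code}} flag are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Rows of the ROLE table, following the list in the design decisions.
    READER = "READER"
    CONTRIBUTOR = "CONTRIBUTOR"
    TRUSTED_CONTRIBUTOR = "TRUSTED_CONTRIBUTOR"
    REVIEWER = "REVIEWER"
    MODERATOR = "MODERATOR"
    SYSTEM_ADMIN = "SYSTEM_ADMIN"  # "SYSTEM_ADMIN / MAINTAINER" in the list
    FEDERATION_OPERATOR = "FEDERATION_OPERATOR"
    FEDERATION_ADMIN = "FEDERATION_ADMIN"


@dataclass(frozen=True)
class User:
    user_id: str
    handle: str
    is_technical: bool = False  # hypothetical flag marking a TECHNICAL_USER


@dataclass(frozen=True)
class RoleMembership:
    # One row of USER_ROLE_MEMBERSHIP.
    user_id: str
    role: Role


def roles_of(user: User, memberships: list[RoleMembership]) -> set[Role]:
    """Collect every role a user holds; human and technical accounts are treated alike."""
    return {m.role for m in memberships if m.user_id == user.user_id}


alice = User("u-1", "alice")
sync_agent = User("u-2", "node-sync", is_technical=True)
memberships = [
    RoleMembership("u-1", Role.REVIEWER),
    RoleMembership("u-1", Role.CONTRIBUTOR),
    RoleMembership("u-2", Role.FEDERATION_OPERATOR),
]
print(sorted(r.value for r in roles_of(alice, memberships)))  # ['CONTRIBUTOR', 'REVIEWER']
print(roles_of(sync_agent, memberships))
```

Because roles are data rather than subtypes, granting a technical account a new role is a new membership row, not a schema change.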

----

=== 1.3 Gaps / potential problems ===

These are the main issues & missing areas I see:

1. (((

**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((

The spec does not yet spell out //in prose//:

* the identity vs version pattern, systematically

* how re-evaluation triggers are derived from version changes

* how this aligns with federation (which versions are replicated where).

)))

)))

1. (((

**No explicit “provenance granularity” in the model**

* (((

EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)

* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)

)))

* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.

)))

1. (((

**Review target polymorphism**

* (((

REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION

* SCENARIO_VERSION → REVIEW_ACTION

* etc.

)))

* A more “pure” relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.

* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.

)))

1. (((

**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.

* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.

)))

1. (((

**Automation / AKEL artifacts left implicit**

* (((

The Data Model chapter currently doesn’t describe:

* AKEL task queues

* extraction runs

* model versions

)))

* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.

)))
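The “subjectType + subjectId” alternative from the review target polymorphism point can be sketched as follows; the Data Use ERD's {{code}}TargetEntityType{{/code}} / {{code}}TargetEntityVersionID{{/code}} fields on REVIEW_ACTION already point in this direction. This is an illustration under assumptions: {{code}}TargetType{{/code}} mirrors the versioned entities listed in the design decisions, the "does this version exist?" check uses in-memory sets instead of tables, and the {{code}}"APPROVE"{{/code}} action type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class TargetType(Enum):
    # Only versioned entities can be reviewed.
    CLAIM_VERSION = "CLAIM_VERSION"
    SCENARIO_VERSION = "SCENARIO_VERSION"
    EVIDENCE_VERSION = "EVIDENCE_VERSION"
    SCENARIO_EVIDENCE_LINK_VERSION = "SCENARIO_EVIDENCE_LINK_VERSION"
    VERDICT_VERSION = "VERDICT_VERSION"


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_type: TargetType   # the "subjectType" half of the pair
    target_version_id: str    # the "subjectId" half of the pair
    action_type: str
    timestamp: datetime


def record_review(log: list[ReviewAction], known_ids: dict[TargetType, set[str]],
                  action_id: str, user_id: str, target_type: TargetType,
                  target_version_id: str, action_type: str) -> ReviewAction:
    """Append a review action after checking that the target version exists.

    A (type, id) pair cannot be covered by a single SQL foreign key, so this
    existence check has to live in application code or in triggers.
    """
    if target_version_id not in known_ids.get(target_type, set()):
        raise ValueError(f"unknown {target_type.value}: {target_version_id}")
    action = ReviewAction(action_id, user_id, target_type, target_version_id,
                          action_type, datetime.now(timezone.utc))
    log.append(action)
    return action


known = {TargetType.CLAIM_VERSION: {"cv-1"}, TargetType.VERDICT_VERSION: {"vv-7"}}
log: list[ReviewAction] = []
record_review(log, known, "ra-1", "u-1", TargetType.VERDICT_VERSION, "vv-7", "APPROVE")
try:
    record_review(log, known, "ra-2", "u-1", TargetType.SCENARIO_VERSION, "sv-9", "APPROVE")
except ValueError as err:
    print(err)  # unknown SCENARIO_VERSION: sv-9
print(len(log))  # 1
```

The trade-off is the one named above: the generic pair keeps a single REVIEW_ACTION table, at the cost of losing database-enforced referential integrity.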

= 5. Data Model =

The FactHarbor data model centers on four fully versioned, immutable entities:

* **Claim**

* **Scenario**

* **Evidence**

* **Verdict**

These entities form the structured **“truth landscape”** for each claim.

The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent **identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define //what// something is in a stable sense.

* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define //how that thing looked at a given point in time//.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to mutable identities.
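As a minimal sketch of the identity vs. version pattern (in Python, with an in-memory store standing in for the database), identity records are created once, every edit appends a new frozen version, and the “current” state is simply the newest version. {{code}}ClaimStore{{/code}} and its methods are hypothetical names, not part of the FactHarbor spec.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone

_next_id = itertools.count(1)  # simple id generator for the sketch


@dataclass(frozen=True)
class Claim:
    claim_id: str  # stable identity: "what people argue about"


@dataclass(frozen=True)
class ClaimVersion:
    claim_version_id: str
    claim_id: str        # back-reference to the identity
    text: str            # the phrasing lives only on the version
    created_at: datetime


class ClaimStore:
    """In-memory stand-in for the CLAIM and CLAIM_VERSION tables."""

    def __init__(self) -> None:
        self.claims: dict[str, Claim] = {}
        self.versions: dict[str, list[ClaimVersion]] = {}

    def create(self, text: str) -> Claim:
        claim = Claim(f"claim-{next(_next_id)}")
        self.claims[claim.claim_id] = claim
        self.versions[claim.claim_id] = []
        self.add_version(claim.claim_id, text)
        return claim

    def add_version(self, claim_id: str, text: str) -> ClaimVersion:
        # Edits never touch existing versions; they append a new one.
        version = ClaimVersion(f"cv-{next(_next_id)}", claim_id, text,
                               datetime.now(timezone.utc))
        self.versions[claim_id].append(version)
        return version

    def current(self, claim_id: str) -> ClaimVersion:
        return self.versions[claim_id][-1]


store = ClaimStore()
claim = store.create("Coffee improves memory")
store.add_version(claim.claim_id, "Regular coffee consumption improves short-term memory")
print(store.current(claim.claim_id).text)      # Regular coffee consumption improves short-term memory
print(store.versions[claim.claim_id][0].text)  # Coffee improves memory
```

Anything that reasons about the claim (a verdict, a review) would store a {{code}}claim_version_id{{/code}}, so it keeps pointing at the exact wording it was made against.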

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
|=Logical concept|=Identity entity|=Version entity|=Notes
|Claim (what people argue about)|{{code}}CLAIM{{/code}}|{{code}}CLAIM_VERSION{{/code}}|Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
|Scenario (interpretive frame)|{{code}}SCENARIO{{/code}}|{{code}}SCENARIO_VERSION{{/code}}|A {{code}}SCENARIO{{/code}} belongs to a {{code}}CLAIM{{/code}}. Its versions capture evolving definitions, assumptions, and boundaries.
|Evidence (source / data point)|{{code}}EVIDENCE{{/code}}|{{code}}EVIDENCE_VERSION{{/code}}|Identity of a source vs. specific extractions / updates over time.
|Verdict (assessment)|{{code}}VERDICT{{/code}}|{{code}}VERDICT_VERSION{{/code}}|A {{code}}VERDICT{{/code}} is defined per {{code}}SCENARIO{{/code}}; {{code}}VERDICT_VERSION{{/code}} captures the history of assessments.
|Scenario–Evidence link|{{code}}SCENARIO_EVIDENCE_LINK{{/code}}|{{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}|Links bind scenario versions to evidence versions with relevance and direction.
|Claim cluster (semantic group)|{{code}}CLAIM_CLUSTER{{/code}}|–|Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.

* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).

* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
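The key design decisions can be checked with a small Python sketch. Assumptions: scenario membership is held in plain dictionaries, link and verdict records are frozen dataclasses keyed by version ids, and a scenario revision here simply appends a new SCENARIO_VERSION with a fresh verdict and no links. It shows two things: every phrasing of a claim reaches the same scenarios, and records made against an old scenario version are left untouched by a revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkVersion:
    # SCENARIO_VERSION + EVIDENCE_VERSION -> SCENARIO_EVIDENCE_LINK_VERSION
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # illustrative values: "supports" / "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    # SCENARIO_VERSION -> VERDICT_VERSION
    scenario_version_id: str
    verdict: float
    confidence: float


# Scenarios hang off the CLAIM identity, so every phrasing (CLAIM_VERSION) reaches the same ones.
claim_of_version = {"cv-1": "claim-1", "cv-2": "claim-1"}  # CLAIM_VERSION -> CLAIM
scenarios_of_claim = {"claim-1": ["scn-1"]}                # CLAIM -> SCENARIOs
scenario_versions = {"scn-1": ["sv-1"]}                    # SCENARIO -> its versions

links = [LinkVersion("sv-1", "ev-1", 0.9, "supports")]
verdicts = [VerdictVersion("sv-1", 0.7, 0.6)]

# Revising the scenario appends a version and a new verdict; records for sv-1 are not touched.
scenario_versions["scn-1"].append("sv-2")
verdicts.append(VerdictVersion("sv-2", 0.5, 0.7))


def records_for(scenario_version_id: str) -> tuple[list[LinkVersion], list[VerdictVersion]]:
    """All links and verdicts recorded against one exact scenario version."""
    return ([link for link in links if link.scenario_version_id == scenario_version_id],
            [v for v in verdicts if v.scenario_version_id == scenario_version_id])


for phrasing in ("cv-1", "cv-2"):
    print(phrasing, scenarios_of_claim[claim_of_version[phrasing]])  # both list ['scn-1']
print(records_for("sv-1"))
```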

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. Fields marked {{code}}PK{{/code}} are primary keys, and fields marked {{code}}FK{{/code}} are foreign keys.

erDiagram

    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed by"
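To make the verdict part of the diagram concrete, here is a sketch of the corresponding tables in SQLite, created through Python's standard {{code}}sqlite3{{/code}} module. Assumptions: column types are simplified to TEXT/REAL, only the columns needed for the example are included, and VERDICT_VERSION carries a {{code}}ScenarioVersionID{{/code}} foreign key so that each verdict version points at the exact scenario version it assessed. Table and column names follow the diagram; the sample rows are illustrative.

```python
import sqlite3

DDL = """
CREATE TABLE SCENARIO (
    ScenarioID TEXT PRIMARY KEY,
    ClaimID    TEXT NOT NULL,
    Name       TEXT
);
CREATE TABLE SCENARIO_VERSION (
    ScenarioVersionID TEXT PRIMARY KEY,
    ScenarioID        TEXT NOT NULL REFERENCES SCENARIO(ScenarioID),
    Definitions       TEXT,
    CreatedAt         TEXT NOT NULL
);
CREATE TABLE VERDICT (
    VerdictID  TEXT PRIMARY KEY,
    ScenarioID TEXT NOT NULL REFERENCES SCENARIO(ScenarioID)
);
CREATE TABLE VERDICT_VERSION (
    VerdictVersionID  TEXT PRIMARY KEY,
    VerdictID         TEXT NOT NULL REFERENCES VERDICT(VerdictID),
    ScenarioVersionID TEXT NOT NULL REFERENCES SCENARIO_VERSION(ScenarioVersionID),
    Verdict           REAL,
    Confidence        REAL,
    CreatedAt         TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript(DDL)

conn.execute("INSERT INTO SCENARIO VALUES ('scn-1', 'claim-1', 'Short-term memory')")
conn.executemany(
    "INSERT INTO SCENARIO_VERSION VALUES (?, 'scn-1', ?, ?)",
    [("sv-1", "recall score", "2024-01-01"), ("sv-2", "recall and recognition", "2024-03-01")],
)
conn.execute("INSERT INTO VERDICT VALUES ('vd-1', 'scn-1')")
conn.executemany(
    "INSERT INTO VERDICT_VERSION VALUES (?, 'vd-1', ?, ?, ?, ?)",
    [("vv-1", "sv-1", 0.7, 0.6, "2024-01-02"), ("vv-2", "sv-2", 0.5, 0.7, "2024-03-02")],
)

# Each verdict version resolves to the exact scenario version it assessed.
rows = conn.execute(
    "SELECT vv.VerdictVersionID, sv.Definitions "
    "FROM VERDICT_VERSION AS vv "
    "JOIN SCENARIO_VERSION AS sv ON sv.ScenarioVersionID = vv.ScenarioVersionID "
    "ORDER BY vv.CreatedAt"
).fetchall()
print(rows)  # [('vv-1', 'recall score'), ('vv-2', 'recall and recognition')]

# A verdict version pointing at a non-existent scenario version is rejected.
try:
    conn.execute("INSERT INTO VERDICT_VERSION VALUES "
                 "('vv-3', 'vd-1', 'sv-404', 0.1, 0.1, '2024-04-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Keeping both {{code}}VerdictID{{/code}} and {{code}}ScenarioVersionID{{/code}} lets a verdict lineage be followed per scenario while each step stays pinned to one scenario version.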

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).

* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.

* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
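One way to enforce “never changed, only superseded” at the storage layer, sketched with SQLite triggers through Python's {{code}}sqlite3{{/code}} module. The trigger names, the reduced EVIDENCE_VERSION columns, and the choice to block DELETE as well as UPDATE are assumptions for illustration; other databases offer equivalent mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EVIDENCE_VERSION (
    EvidenceVersionID TEXT PRIMARY KEY,
    EvidenceID        TEXT NOT NULL,
    Summary           TEXT,
    ReliabilityScore  REAL,
    CreatedAt         TEXT NOT NULL
);
-- Version rows are append-only: reject any UPDATE or DELETE.
CREATE TRIGGER evidence_version_no_update
BEFORE UPDATE ON EVIDENCE_VERSION
BEGIN
    SELECT RAISE(ABORT, 'EVIDENCE_VERSION rows are immutable');
END;
CREATE TRIGGER evidence_version_no_delete
BEFORE DELETE ON EVIDENCE_VERSION
BEGIN
    SELECT RAISE(ABORT, 'EVIDENCE_VERSION rows are immutable');
END;
""")

conn.execute("INSERT INTO EVIDENCE_VERSION VALUES "
             "('ev-1', 'e-1', 'Initial summary', 0.6, '2024-01-01')")


def is_blocked(sql: str) -> bool:
    """Run a statement and report whether the immutability triggers rejected it."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError as err:  # RAISE(ABORT, ...) surfaces as a constraint error
        print("rejected:", err)
        return True
    return False


update_blocked = is_blocked(
    "UPDATE EVIDENCE_VERSION SET ReliabilityScore = 0.9 WHERE EvidenceVersionID = 'ev-1'")
delete_blocked = is_blocked("DELETE FROM EVIDENCE_VERSION WHERE EvidenceVersionID = 'ev-1'")

# The supported way to "change" evidence: insert a superseding version.
conn.execute("INSERT INTO EVIDENCE_VERSION VALUES "
             "('ev-2', 'e-1', 'Revised summary', 0.9, '2024-02-01')")
latest = conn.execute(
    "SELECT EvidenceVersionID, ReliabilityScore FROM EVIDENCE_VERSION "
    "WHERE EvidenceID = 'e-1' ORDER BY CreatedAt DESC LIMIT 1"
).fetchone()
print(latest)  # ('ev-2', 0.9)
```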

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)

* Roles and role assignments

* Review actions on versioned entities

erDiagram

    %% Core clusters shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships)
    USER ||--o{ TECHNICAL_USER : "is a"
    USER ||--o{ CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o{ REVIEWER : "subset"
    CONTRIBUTING_USER ||--o{ EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--|| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--|| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--|| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--|| VERDICT_VERSION : reviews

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}.

* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.

* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
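The note on who performs REVIEW_ACTIONs could be enforced with a simple guard before a review is recorded. This Python sketch assumes a hypothetical policy set {{code}}CAN_REVIEW{{/code}}; the model does not say which federation roles may review, so FEDERATION_ADMIN is included purely as an example, and the enum lists only the roles relevant here.

```python
from enum import Enum


class Role(Enum):
    # Subset of ROLE rows relevant to review permissions.
    READER = "READER"
    CONTRIBUTOR = "CONTRIBUTOR"
    TRUSTED_CONTRIBUTOR = "TRUSTED_CONTRIBUTOR"
    REVIEWER = "REVIEWER"
    MODERATOR = "MODERATOR"
    FEDERATION_OPERATOR = "FEDERATION_OPERATOR"
    FEDERATION_ADMIN = "FEDERATION_ADMIN"


# Hypothetical policy: roles allowed to create REVIEW_ACTION rows.
# FEDERATION_ADMIN stands in for "some federation roles"; the spec leaves this open.
CAN_REVIEW = {Role.REVIEWER, Role.TRUSTED_CONTRIBUTOR, Role.MODERATOR, Role.FEDERATION_ADMIN}


def may_review(user_roles: set[Role]) -> bool:
    """A user may review if at least one of their roles grants review permission."""
    return not CAN_REVIEW.isdisjoint(user_roles)


def require_review_permission(user_id: str, user_roles: set[Role]) -> None:
    """Raise before a REVIEW_ACTION is stored for a user without a reviewing role."""
    if not may_review(user_roles):
        raise PermissionError(f"user {user_id} holds no role that may perform REVIEW_ACTIONs")


print(may_review({Role.READER}))                 # False
print(may_review({Role.READER, Role.REVIEWER}))  # True
```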

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference earlier versions of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.

* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from the previous scenario version, or start with no links, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.

* REVIEW_ACTIONs are always attached to the **exact version** the reviewer saw. This preserves a faithful audit trail if data later changes.

* In a federated environment, nodes can choose:
** which identity entities to replicate ({{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}, {{code}}EVIDENCE{{/code}}, {{code}}VERDICT{{/code}})
** which versioned entities to replicate (e.g. only accepted {{code}}VERDICT_VERSION{{/code}} entries, or only {{code}}EVIDENCE_VERSION{{/code}} entries above a reliability threshold).
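Node-side replication choices like these can be expressed as a filter over version records. The Python sketch assumes a hypothetical per-node {{code}}ReplicationPolicy{{/code}} with the two knobs from the examples above (accepted VERDICT_VERSIONs only; EVIDENCE_VERSIONs at or above a reliability threshold); the {{code}}status{{/code}} field and the policy shape are illustrative, not part of the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRecord:
    entity_type: str          # e.g. "VERDICT_VERSION", "EVIDENCE_VERSION"
    version_id: str
    status: str = "accepted"  # hypothetical review status
    reliability: float = 1.0  # only meaningful for EVIDENCE_VERSION


@dataclass
class ReplicationPolicy:
    entity_types: set[str]             # which versioned entities this node replicates
    accepted_verdicts_only: bool = True
    min_evidence_reliability: float = 0.0

    def admits(self, rec: VersionRecord) -> bool:
        """Decide whether one version record is replicated to this node."""
        if rec.entity_type not in self.entity_types:
            return False
        if rec.entity_type == "VERDICT_VERSION" and self.accepted_verdicts_only:
            return rec.status == "accepted"
        if rec.entity_type == "EVIDENCE_VERSION":
            return rec.reliability >= self.min_evidence_reliability
        return True


policy = ReplicationPolicy({"VERDICT_VERSION", "EVIDENCE_VERSION"}, min_evidence_reliability=0.7)
records = [
    VersionRecord("VERDICT_VERSION", "vv-1", status="accepted"),
    VersionRecord("VERDICT_VERSION", "vv-2", status="draft"),
    VersionRecord("EVIDENCE_VERSION", "ev-1", reliability=0.9),
    VersionRecord("EVIDENCE_VERSION", "ev-2", reliability=0.4),
    VersionRecord("CLAIM_VERSION", "cv-1"),
]
print([r.version_id for r in records if policy.admits(r)])  # ['vv-1', 'ev-1']
```

Because versions are immutable, a record admitted once never needs to be re-checked; only newly created versions pass through the filter.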

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).
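A minimal Python sketch of the first step of such a cascade: when a new EVIDENCE_VERSION arrives, verdicts whose scenario versions relied on earlier versions of that evidence are flagged and queued. It assumes hypothetical in-memory indexes (evidence versions per evidence, link pairs, latest verdict per scenario version) standing in for table queries; the actual re-evaluation is left to the automation layer.

```python
from collections import deque

# Hypothetical indexes; in a real system these would be queries on the link and verdict tables.
evidence_versions = {"e-1": ["ev-1"]}                           # EvidenceID -> versions, oldest first
links = [("sv-1", "ev-1"), ("sv-3", "ev-1"), ("sv-2", "ev-9")]  # (ScenarioVersionID, EvidenceVersionID)
latest_verdict = {"sv-1": "vv-1", "sv-2": "vv-2", "sv-3": "vv-3"}  # ScenarioVersionID -> VerdictVersionID

reevaluation_queue: deque[str] = deque()
outdated: set[str] = set()


def on_new_evidence_version(evidence_id: str, new_version_id: str) -> None:
    """Register a new EVIDENCE_VERSION and queue verdicts that relied on earlier versions."""
    previous = set(evidence_versions.get(evidence_id, []))
    evidence_versions.setdefault(evidence_id, []).append(new_version_id)
    for scenario_version_id, evidence_version_id in links:
        if evidence_version_id in previous:
            verdict_id = latest_verdict.get(scenario_version_id)
            if verdict_id and verdict_id not in outdated:
                outdated.add(verdict_id)  # flag only; the verdict record itself is never edited
                reevaluation_queue.append(verdict_id)


on_new_evidence_version("e-1", "ev-2")
print(list(reevaluation_queue))  # ['vv-1', 'vv-3']
```

A re-evaluation would then write new VERDICT_VERSION rows; if those feed further dependencies, the same flag-and-queue step repeats, which is where the cascade comes from.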

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
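Section 5.4 states that a new SCENARIO_VERSION may inherit links or start empty depending on whether the change is cosmetic or conceptual. A Python sketch of that rule, assuming a hypothetical {{code}}ChangeKind{{/code}} classification supplied by a reviewer or by automation, and assuming inherited links are copied as new records bound to the new scenario version so the old records stay untouched.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"      # wording fix, same meaning
    CONCEPTUAL = "conceptual"  # definitions, assumptions, or boundaries changed


@dataclass(frozen=True)
class Link:
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str


def links_for_new_version(old_links: list[Link], new_version_id: str,
                          kind: ChangeKind) -> list[Link]:
    """Return link records for the new scenario version; old records are never modified."""
    if kind is ChangeKind.CONCEPTUAL:
        return []  # start empty: every link must be re-assessed against the new framing
    # Cosmetic change: carry links over as new records bound to the new version.
    return [replace(link, scenario_version_id=new_version_id) for link in old_links]


old = [Link("sv-1", "ev-1", 0.8, "supports"), Link("sv-1", "ev-2", 0.5, "contradicts")]
print(len(links_for_new_version(old, "sv-2", ChangeKind.COSMETIC)))  # 2
print(links_for_new_version(old, "sv-3", ChangeKind.CONCEPTUAL))     # []
print(old[0].scenario_version_id)                                    # sv-1
```

{{code}}dataclasses.replace{{/code}} builds a new frozen instance, which mirrors the rule that link versions are superseded, not edited.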

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest

* Evidence metadata (without full content)

* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
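The “evidence metadata (without full content)” option can be sketched as a projection applied before a version record leaves its origin node. In this Python sketch the field split (what counts as metadata vs. content), the {{code}}full_text{{/code}} field, and the JSON payload shape are all assumptions for illustration; the actual wire format belongs to the Federation chapter.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    source_type: str
    url: str
    summary: str
    reliability_score: float
    full_text: str  # hypothetical field holding the complete source content


# Assumed split: everything except the full content counts as replicable metadata.
METADATA_FIELDS = ("evidence_version_id", "evidence_id", "source_type",
                   "url", "summary", "reliability_score")


def to_federation_payload(version: EvidenceVersion) -> str:
    """Serialize only the metadata fields; the full content stays on the origin node."""
    record = asdict(version)
    return json.dumps({name: record[name] for name in METADATA_FIELDS}, sort_keys=True)


ev = EvidenceVersion("ev-7", "e-7", "report", "https://example.org/report.pdf",
                     "Example summary", 0.8, "Full document text goes here.")
payload = to_federation_payload(ev)
print(payload)
```

Since the payload keeps the version id, a receiving node can still fetch the full content from the origin later if its trust rules allow it.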
