Data Model (From Specification Chat)

Evidence is explicit and linked to scenarios, and verdicts are given per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.

* (((
**Cluster level (CLAIM_CLUSTER)**

Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**

Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION holds the individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.

Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not to the base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.

→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups semantically close claims.
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human and technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN

(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))
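
As a rough illustration of the users & roles decision, here is a minimal Python sketch. It assumes that ROLE rows can be modeled as plain strings and that USER carries a boolean {{code}}is_technical{{/code}} flag standing in for the TECHNICAL_USER specialisation; the names {{code}}User{{/code}}, {{code}}UserRoleMembership{{/code}}, {{code}}ROLE_ROWS{{/code}} and {{code}}roles_of{{/code}} are hypothetical and not taken from the Data Use ERD.

```python
from dataclasses import dataclass

# Hypothetical ROLE rows: roles are data, not separate entity types.
# SYSTEM_ADMIN stands in for "SYSTEM_ADMIN / MAINTAINER".
ROLE_ROWS = frozenset({
    "READER", "CONTRIBUTOR", "TRUSTED_CONTRIBUTOR", "REVIEWER", "MODERATOR",
    "SYSTEM_ADMIN", "FEDERATION_OPERATOR", "FEDERATION_ADMIN",
})


@dataclass(frozen=True)
class User:
    user_id: str
    is_technical: bool = False  # assumed flag for TECHNICAL_USER accounts


@dataclass(frozen=True)
class UserRoleMembership:
    user_id_fk: str
    role: str

    def __post_init__(self) -> None:
        # Only existing ROLE rows may be assigned.
        if self.role not in ROLE_ROWS:
            raise ValueError(f"unknown role: {self.role}")


def roles_of(user: User, memberships: list[UserRoleMembership]) -> set[str]:
    """All roles held by a user; human and technical accounts are treated alike."""
    return {m.role for m in memberships if m.user_id_fk == user.user_id}


alice = User("u1")
sync_agent = User("u2", is_technical=True)  # e.g. a node-to-node federation agent
memberships = [
    UserRoleMembership("u1", "READER"),
    UserRoleMembership("u1", "REVIEWER"),
    UserRoleMembership("u2", "FEDERATION_OPERATOR"),
]
print(sorted(roles_of(alice, memberships)))  # ['READER', 'REVIEWER']
```

Because roles are rows, adding a role is a data change (one more entry in ROLE), not a schema change.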

----

=== 1.3 Gaps / potential problems ===

These are the main issues and missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet //verbally// spell out:

* the identity vs. version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (a specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in the text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe:

* AKEL task queues
* extraction runs
* model versions
)))
* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
)))
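
For comparison with the multi-edge diagram, the “subjectType + subjectId” alternative from item 3 could look roughly like this in Python. This is a hypothetical sketch, not the chosen design; {{code}}ReviewAction{{/code}}, {{code}}REVIEWABLE_TYPES{{/code}}, {{code}}reviews_of{{/code}} and the action values are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Versioned entity types that may be reviewed ("Review vs data" decision above).
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})


@dataclass(frozen=True)
class ReviewAction:
    """A REVIEW_ACTION row whose target is a generic (subject_type, subject_id) pair."""
    review_action_id: str
    user_id_fk: str
    action: str        # hypothetical values such as "APPROVE" or "COMMENT"
    subject_type: str  # table the reviewed version lives in
    subject_id: str    # primary key within that table
    created_at: datetime

    def __post_init__(self) -> None:
        # No single foreign key can enforce this, so the application has to.
        if self.subject_type not in REVIEWABLE_TYPES:
            raise ValueError(f"not a reviewable version type: {self.subject_type}")


def reviews_of(log: list[ReviewAction], subject_type: str, subject_id: str) -> list[ReviewAction]:
    """Audit trail for one exact version, oldest first."""
    hits = [a for a in log if (a.subject_type, a.subject_id) == (subject_type, subject_id)]
    return sorted(hits, key=lambda a: a.created_at)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
log = [
    ReviewAction("r2", "u1", "APPROVE", "VERDICT_VERSION", "vv-7", t1),
    ReviewAction("r1", "u2", "COMMENT", "VERDICT_VERSION", "vv-7", t0),
    ReviewAction("r3", "u1", "APPROVE", "CLAIM_VERSION", "cv-3", t0),
]
print([a.review_action_id for a in reviews_of(log, "VERDICT_VERSION", "vv-7")])  # ['r1', 'r2']
```

The trade-off is that REVIEW_ACTION stays a single table, but a plain foreign key can no longer guarantee that the target row exists, so that check moves into application code or constraints; an intermediate REVIEW_TARGET table is one way to restore referential integrity.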

= 5. Data Model =

The FactHarbor data model centers on four fully versioned, immutable entities:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim.

The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent **identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define //what// something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define //how that thing looked at a given point in time//.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to mutable identities.
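
A minimal Python sketch of the identity vs. version pattern, assuming frozen dataclasses as a stand-in for immutable version rows. The field {{code}}supersedes_id_fk{{/code}} and the helper {{code}}new_version{{/code}} are hypothetical; they only illustrate that a change produces a new version while the old one stays untouched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """Identity entity: what people argue about, stable across rephrasings."""
    claim_id: str


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: an immutable snapshot of the claim at one point in time."""
    claim_version_id: str
    claim_id_fk: str
    version_no: int
    text: str
    supersedes_id_fk: Optional[str] = None  # hypothetical link to the previous version


def new_version(prev: ClaimVersion, new_id: str, **changes) -> ClaimVersion:
    """Return a successor version; `prev` is left exactly as it was."""
    return replace(
        prev,
        claim_version_id=new_id,
        version_no=prev.version_no + 1,
        supersedes_id_fk=prev.claim_version_id,
        **changes
    )


claim = Claim("c1")
v1 = ClaimVersion("cv1", claim.claim_id, 1, "First phrasing of the claim.")
v2 = new_version(v1, "cv2", text="Second phrasing of the claim.")
print(v2.version_no, v2.supersedes_id_fk)  # 2 cv1

try:
    v1.text = "edited in place"
except FrozenInstanceError:
    print("v1 cannot be changed, only superseded")
```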

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
|= Logical concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance and direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
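
The key design decisions can be sketched in Python with an append-only store: each link and verdict row points at one exact {{code}}SCENARIO_VERSION{{/code}}, so redefining a scenario adds rows instead of overwriting them. All class and field names, the {{code}}direction{{/code}}/{{code}}relevance{{/code}} value ranges, and the verdict labels are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    claim_id_fk: str  # exactly one CLAIM, never a CLAIM_VERSION


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_version_id: str
    scenario_id_fk: str
    version_no: int


@dataclass(frozen=True)
class LinkVersion:
    """SCENARIO_EVIDENCE_LINK_VERSION: one scenario version + one evidence version."""
    link_version_id: str
    scenario_version_id_fk: str
    evidence_version_id_fk: str
    direction: str    # assumed values: "SUPPORTS" or "CONTRADICTS"
    relevance: float  # assumed scale 0.0 to 1.0


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id_fk: str
    label: str  # hypothetical verdict label


@dataclass
class AppendOnlyGraph:
    """Rows are only ever appended, so every historical version stays queryable."""
    links: list[LinkVersion] = field(default_factory=list)
    verdicts: list[VerdictVersion] = field(default_factory=list)

    def links_for(self, scenario_version_id: str) -> list[LinkVersion]:
        return [link for link in self.links if link.scenario_version_id_fk == scenario_version_id]

    def verdicts_for(self, scenario_version_id: str) -> list[VerdictVersion]:
        return [v for v in self.verdicts if v.scenario_version_id_fk == scenario_version_id]


g = AppendOnlyGraph()
s1 = Scenario("s1", claim_id_fk="c1")
sv1 = ScenarioVersion("sv1", s1.scenario_id, 1)
g.links.append(LinkVersion("l1", sv1.scenario_version_id, "ev1", "SUPPORTS", 0.8))
g.verdicts.append(VerdictVersion("vd1", sv1.scenario_version_id, "LIKELY_TRUE"))

# The scenario is redefined: the new version gets its own link and verdict rows.
sv2 = ScenarioVersion("sv2", s1.scenario_id, 2)
g.links.append(LinkVersion("l2", sv2.scenario_version_id, "ev1", "SUPPORTS", 0.5))
g.verdicts.append(VerdictVersion("vd2", sv2.scenario_version_id, "UNCERTAIN"))

print([v.label for v in g.verdicts_for("sv1")])  # ['LIKELY_TRUE']
```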

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. By convention, fields ending in {{code}}Id{{/code}} are primary keys, and fields ending in {{code}}...IdFk{{/code}} are foreign keys.

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}
{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome"/}}

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per scenario version** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}
{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome"/}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
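
A small Python sketch of the review-permission note, assuming review rights are granted by role alone. The spec leaves open which federation roles may review; {{code}}FEDERATION_ADMIN{{/code}} is included in the hypothetical {{code}}REVIEWING_ROLES{{/code}} set purely as an example, and {{code}}may_review{{/code}} is an illustrative name.

```python
# Roles allowed to create REVIEW_ACTIONs. The spec says "some federation roles"
# review without naming them; FEDERATION_ADMIN is an assumption for illustration.
REVIEWING_ROLES = frozenset({"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR", "FEDERATION_ADMIN"})


def may_review(held_roles: set[str]) -> bool:
    """True if at least one held role grants review rights; READER alone never does."""
    return not REVIEWING_ROLES.isdisjoint(held_roles)


print(may_review({"READER"}))              # False
print(may_review({"READER", "REVIEWER"}))  # True
```

The check looks only at roles, not at whether the account is human or technical, matching the note that roles can in principle be held by both.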

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing earlier versions of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** that the reviewer saw. This preserves a faithful audit trail if data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
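
The evidence-driven trigger can be sketched in Python as follows. It assumes that a plain {{code}}deque{{/code}} stands in for the automation queue and that only links to //earlier// versions of the same evidence are affected (a brand-new version has no links yet); {{code}}on_new_evidence_version{{/code}} and the trimmed-down classes are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id_fk: str  # the EVIDENCE identity this version belongs to


@dataclass(frozen=True)
class LinkVersion:
    scenario_version_id_fk: str
    evidence_version_id_fk: str


def on_new_evidence_version(
    new_ev: EvidenceVersion,
    evidence_versions: list[EvidenceVersion],
    links: list[LinkVersion],
    queue: deque,
) -> list[LinkVersion]:
    """Return links that are candidates for re-assessment and queue affected scenario versions."""
    # Earlier versions of the same EVIDENCE identity.
    earlier = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id_fk == new_ev.evidence_id_fk
        and ev.evidence_version_id != new_ev.evidence_version_id
    }
    candidates = [link for link in links if link.evidence_version_id_fk in earlier]
    # Verdicts on these scenario versions may now be outdated; queue each once.
    for sv_id in sorted({link.scenario_version_id_fk for link in candidates}):
        if sv_id not in queue:
            queue.append(sv_id)
    return candidates


evidence_versions = [EvidenceVersion("ev1", "e1"), EvidenceVersion("ev9", "e2")]
links = [LinkVersion("sv1", "ev1"), LinkVersion("sv2", "ev1"), LinkVersion("sv3", "ev9")]
queue: deque = deque()

new_ev = EvidenceVersion("ev2", "e1")
evidence_versions.append(new_ev)
candidates = on_new_evidence_version(new_ev, evidence_versions, links, queue)
print(list(queue))  # ['sv1', 'sv2']
```

The queue holds scenario versions whose verdicts need re-evaluation; recalculating them and writing new {{code}}VERDICT_VERSION{{/code}} rows is left to the automation layer.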

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
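
One simple reading of the cosmetic vs. conceptual classification is sketched below in Python: a cosmetic change copies the previous links onto the new scenario version, while a conceptual change starts with no links. That rule, {{code}}ChangeKind{{/code}} and {{code}}seed_links{{/code}} are assumptions; the spec only says links //may// be inherited.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"      # wording only; meaning unchanged
    CONCEPTUAL = "conceptual"  # definitions, assumptions or boundaries changed


@dataclass(frozen=True)
class LinkVersion:
    scenario_version_id_fk: str
    evidence_version_id_fk: str
    direction: str


def seed_links(prev_links: list[LinkVersion], new_sv_id: str, kind: ChangeKind) -> list[LinkVersion]:
    """Initial links for a new SCENARIO_VERSION (assumed rule: cosmetic inherits, conceptual starts empty)."""
    if kind is ChangeKind.CONCEPTUAL:
        # The meaning changed, so evidence has to be re-linked from scratch.
        return []
    # Cosmetic change: copy each link as a NEW row pointing at the new version;
    # the old rows are not touched.
    return [LinkVersion(new_sv_id, link.evidence_version_id_fk, link.direction) for link in prev_links]


old_links = [LinkVersion("sv1", "ev1", "SUPPORTS"), LinkVersion("sv1", "ev2", "CONTRADICTS")]
print(len(seed_links(old_links, "sv2", ChangeKind.COSMETIC)))    # 2
print(len(seed_links(old_links, "sv2", ChangeKind.CONCEPTUAL)))  # 0
```

Because inherited links are new rows, the links of the old version survive unchanged for historical comparison; recalculating verdicts for the new version would follow as a separate step.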

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
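
A sketch of a per-node replication filter in Python, assuming the two example rules from section 5.4 (accepted verdict versions only, evidence above a reliability threshold) and that “metadata only” means dropping the content field. The 0.7 threshold, the status values, and {{code}}ReplicationPolicy{{/code}} itself are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    reliability: float             # assumed scale 0.0 to 1.0
    content: Optional[str] = None  # full content, not replicated


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    status: str  # hypothetical values such as "ACCEPTED" or "DRAFT"


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-node policy; real federation rules belong to the Federation chapter."""
    min_reliability: float = 0.7
    verdict_statuses: frozenset = frozenset({"ACCEPTED"})

    def evidence_to_send(self, evs: list[EvidenceVersion]) -> list[EvidenceVersion]:
        # Keep only reliable evidence and strip content so only metadata leaves the node.
        return [replace(ev, content=None) for ev in evs if ev.reliability >= self.min_reliability]

    def verdicts_to_send(self, vvs: list[VerdictVersion]) -> list[VerdictVersion]:
        return [vv for vv in vvs if vv.status in self.verdict_statuses]


policy = ReplicationPolicy()
evs = [EvidenceVersion("ev1", 0.9, "full article text"), EvidenceVersion("ev2", 0.4, "other text")]
sent = policy.evidence_to_send(evs)
print([(ev.evidence_version_id, ev.content) for ev in sent])  # [('ev1', None)]
```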
