Data Model (From Specification Chat)

|= Concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}; the {{code}}CLAIM{{/code}} identity stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A {{code}}SCENARIO{{/code}} belongs to a {{code}}CLAIM{{/code}}; its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Separates the identity of a source from specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A {{code}}VERDICT{{/code}} is defined per {{code}}SCENARIO{{/code}}; {{code}}VERDICT_VERSION{{/code}} captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance and direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly used for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
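
To make the version-level attachment concrete, here is a minimal Python sketch of the identity/version split. The class and field names ({{code}}claim_id{{/code}}, {{code}}version_no{{/code}}, {{code}}relevance{{/code}}, {{code}}direction{{/code}}, {{code}}assessment{{/code}}) are hypothetical stand-ins, not the actual ERD columns. Links and verdicts hold references to specific version records, so creating a newer scenario version leaves existing links and verdicts pointing at the old one.

```python
from dataclasses import dataclass

# Hypothetical field names; the real ERD columns may differ.


@dataclass(frozen=True)
class Claim:
    claim_id: str
    cluster_id: str  # each CLAIM belongs to exactly one CLAIM_CLUSTER


@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    claim_id: str  # scenarios hang off the CLAIM, not a CLAIM_VERSION


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_id: str
    version_no: int
    assumptions: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_id: str
    version_no: int
    summary: str


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    scenario_version: ScenarioVersion  # exact scenario version ...
    evidence_version: EvidenceVersion  # ... bound to an exact evidence version
    relevance: float
    direction: str  # e.g. "supports" or "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    scenario_version: ScenarioVersion  # verdicts are per scenario version
    assessment: str


# Usage: a revised scenario gets a new version; existing records are untouched.
claim = Claim("claim-1", "cluster-1")
scenario = Scenario("scn-1", claim.claim_id)
sv1 = ScenarioVersion(scenario.scenario_id, 1, "narrow definition")
ev1 = EvidenceVersion("ev-1", 1, "2023 survey extract")
link = ScenarioEvidenceLinkVersion(sv1, ev1, 0.8, "supports")
verdict = VerdictVersion(sv1, "mostly supported")
sv2 = ScenarioVersion(scenario.scenario_id, 2, "broader definition")
```

After {{code}}sv2{{/code}} is created, {{code}}link{{/code}} and {{code}}verdict{{/code}} still refer to {{code}}sv1{{/code}}, which is exactly the historical record the text describes.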

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. By convention, fields ending in {{code}}Id{{/code}} are primary keys and fields ending in {{code}}IdFk{{/code}} are foreign keys.

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}


{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}


**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
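
One way to honour this rule in application code is an append-only history per identity entity. The minimal Python sketch below assumes a hypothetical {{code}}VersionHistory{{/code}} helper: version records are frozen dataclasses, so an in-place edit raises {{code}}FrozenInstanceError{{/code}}, and {{code}}supersede{{/code}} appends a new record instead of modifying the current one.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class VersionRecord:
    entity_id: str  # identity entity, e.g. an EVIDENCE id
    version_no: int
    payload: str


class VersionHistory:
    """Hypothetical append-only store for the versions of one entity."""

    def __init__(self, entity_id: str, payload: str) -> None:
        self._versions = [VersionRecord(entity_id, 1, payload)]

    @property
    def current(self) -> VersionRecord:
        return self._versions[-1]

    def supersede(self, payload: str) -> VersionRecord:
        # Never edit the current record; derive and append the next version.
        nxt = replace(self.current, version_no=self.current.version_no + 1, payload=payload)
        self._versions.append(nxt)
        return nxt

    def all_versions(self) -> Tuple[VersionRecord, ...]:
        return tuple(self._versions)


history = VersionHistory("ev-1", "initial extraction")
v1 = history.current
v2 = history.supersede("corrected extraction")

try:
    v1.payload = "edited in place"  # rejected: version records are frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

In a database the same guarantee would typically come from insert-only permissions on version tables; the sketch only shows the in-memory shape of the rule.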

----

= 5.3 Data Use & Review ERD =


The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}


{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). Roles themselves are not tied to account type: in principle, they can be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, whereas roles such as REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
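
A minimal Python sketch of how these notes could translate into a permission check. The {{code}}Role{{/code}} values mirror the roles listed above; {{code}}User.roles{{/code}} stands in for {{code}}USER_ROLE_MEMBERSHIP{{/code}} rows and {{code}}is_technical{{/code}} for the {{code}}TECHNICAL_USER{{/code}} specialisation. The exact set of review-capable roles is an assumption: federation roles are left out because the text only says that //some// of them review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Role(Enum):
    READER = "READER"
    CONTRIBUTOR = "CONTRIBUTOR"
    TRUSTED_CONTRIBUTOR = "TRUSTED_CONTRIBUTOR"
    REVIEWER = "REVIEWER"
    MODERATOR = "MODERATOR"
    SYSTEM_ADMIN = "SYSTEM_ADMIN"
    FEDERATION_OPERATOR = "FEDERATION_OPERATOR"
    FEDERATION_ADMIN = "FEDERATION_ADMIN"


# Assumed policy: roles allowed to create REVIEW_ACTIONs.
# Add federation roles here if a deployment lets them review.
REVIEW_ROLES = frozenset({Role.REVIEWER, Role.TRUSTED_CONTRIBUTOR, Role.MODERATOR})


@dataclass
class User:
    user_id: str
    is_technical: bool = False  # stand-in for the TECHNICAL_USER specialisation
    roles: Set[Role] = field(default_factory=set)  # USER_ROLE_MEMBERSHIP rows


def can_review(user: User) -> bool:
    # Roles are independent of account type: human and technical
    # users are checked the same way.
    return bool(user.roles & REVIEW_ROLES)


reader = User("alice", roles={Role.READER})
review_bot = User("review-bot", is_technical=True, roles={Role.REVIEWER})
```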

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference earlier versions of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** the reviewer saw. This preserves a faithful audit trail if the data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
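
The evidence-driven trigger can be sketched in a few lines of Python. This is a minimal illustration, not the automation architecture itself: the record fields, the {{code}}on_new_evidence_version{{/code}} function, and the separate {{code}}outdated{{/code}} set and queue are hypothetical. Keeping the //outdated// flag outside the version records respects the rule that versions are immutable.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Set


@dataclass(frozen=True)
class LinkVersion:
    link_version_id: str
    scenario_version_id: str
    evidence_id: str  # identity of the evidence
    evidence_version_no: int  # the exact evidence version the link binds to


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str


def on_new_evidence_version(
    evidence_id: str,
    new_version_no: int,
    links: List[LinkVersion],
    verdicts: List[VerdictVersion],
    outdated: Set[str],
    queue: Deque[str],
) -> List[LinkVersion]:
    """Flag links and verdicts affected by a new EVIDENCE_VERSION (hypothetical)."""
    # 1. Links that bind an earlier version of the same evidence are candidates.
    candidates = [
        link
        for link in links
        if link.evidence_id == evidence_id and link.evidence_version_no < new_version_no
    ]
    affected_scenarios = {link.scenario_version_id for link in candidates}
    # 2. Verdicts on those scenario versions become outdated and are queued once.
    for verdict in verdicts:
        if verdict.scenario_version_id in affected_scenarios and verdict.verdict_version_id not in outdated:
            outdated.add(verdict.verdict_version_id)
            queue.append(verdict.verdict_version_id)
    return candidates


links = [LinkVersion("lnk-1", "sv-1", "ev-1", 1), LinkVersion("lnk-2", "sv-2", "ev-2", 1)]
verdicts = [VerdictVersion("vv-1", "sv-1"), VerdictVersion("vv-2", "sv-2")]
outdated: Set[str] = set()
queue: Deque[str] = deque()
candidates = on_new_evidence_version("ev-1", 2, links, verdicts, outdated, queue)
```

Here only {{code}}lnk-1{{/code}} binds the superseded evidence, so only {{code}}vv-1{{/code}} is marked outdated and queued; {{code}}vv-2{{/code}} is left alone.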

----

= 5.5 Behavioral Notes =


== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).


== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
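
The link-inheritance rule from 5.4 (cosmetic vs. conceptual changes) can be illustrated with a minimal Python sketch. Treating //every// link as inherited on a cosmetic change and //none// on a conceptual one is a simplifying assumption, since the text only says a new version //may// inherit some links; {{code}}ChangeKind{{/code}} and {{code}}links_for_new_scenario_version{{/code}} are hypothetical names. Inherited links are new link-version records, so the old ones stay attached to the old scenario version.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List


class ChangeKind(Enum):
    COSMETIC = "cosmetic"
    CONCEPTUAL = "conceptual"


@dataclass(frozen=True)
class LinkVersion:
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str


def links_for_new_scenario_version(
    old_links: List[LinkVersion], new_scenario_version_id: str, kind: ChangeKind
) -> List[LinkVersion]:
    """Assumed policy: cosmetic edits carry links over, conceptual edits start empty."""
    if kind is ChangeKind.CONCEPTUAL:
        return []  # links must be re-assessed from scratch
    # Copy each link onto the new scenario version; the originals are unchanged.
    return [replace(link, scenario_version_id=new_scenario_version_id) for link in old_links]


old_links = [
    LinkVersion("sv-1", "ev-1", 0.9, "supports"),
    LinkVersion("sv-1", "ev-2", 0.4, "contradicts"),
]
carried = links_for_new_scenario_version(old_links, "sv-2", ChangeKind.COSMETIC)
fresh = links_for_new_scenario_version(old_links, "sv-2", ChangeKind.CONCEPTUAL)
```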


== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
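
A node's replication choices (see also 5.4) can be expressed as a filter over versioned records. The minimal Python sketch below is illustrative only: {{code}}ReplicationPolicy{{/code}}, the {{code}}reliability{{/code}} and {{code}}review_status{{/code}} fields, and the 0.7 threshold are hypothetical. It keeps evidence versions at or above a reliability threshold with their full content stripped (metadata only) and, optionally, only accepted verdict versions.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    title: str
    reliability: float  # hypothetical 0..1 score
    content: Optional[str]  # full content; not shipped to other nodes


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str
    review_status: str  # hypothetical values: "accepted", "pending"


@dataclass(frozen=True)
class ReplicationPolicy:
    min_reliability: float = 0.7
    accepted_verdicts_only: bool = True


def select_for_replication(
    evidence: List[EvidenceVersion],
    verdicts: List[VerdictVersion],
    policy: ReplicationPolicy,
) -> Tuple[List[EvidenceVersion], List[VerdictVersion]]:
    # Evidence: at or above the threshold, metadata only (content stripped).
    shipped_evidence = [
        replace(ev, content=None) for ev in evidence if ev.reliability >= policy.min_reliability
    ]
    # Verdicts: optionally only accepted versions.
    shipped_verdicts = [
        v for v in verdicts if not policy.accepted_verdicts_only or v.review_status == "accepted"
    ]
    return shipped_evidence, shipped_verdicts


evidence = [
    EvidenceVersion("ev-1", "Agency report", 0.9, "Full report text"),
    EvidenceVersion("ev-2", "Anonymous blog post", 0.3, "Full post text"),
]
verdicts = [VerdictVersion("vv-1", "sv-1", "accepted"), VerdictVersion("vv-2", "sv-1", "pending")]
ev_out, vv_out = select_for_replication(evidence, verdicts, ReplicationPolicy())
```

Because {{code}}replace{{/code}} builds new records, the local versions keep their full content; only the outgoing copies are stripped.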

----

== 1. Overall analysis & review of the data model ==


=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**
Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way to support:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**
Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns closely with “truth landscape”-style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions, which avoids mixing incompatible assumptions.
)))
* (((
**Evidence and verdicts as first-class entities**
Evidence is explicit and linked to scenarios, and verdicts are defined per scenario. This matches good practice in fact-checking, scientific assessment panels, and trust graphs.
)))
* (((
**Cluster level (CLAIM_CLUSTER)**
Grouping related claims avoids duplication and makes it possible to:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**
Separating “data” from “who reviewed what” keeps the model clean and supports:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as the //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION entries hold individual phrasings / formulations and their metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not to the base SCENARIO).
* **Evidence links** connect SCENARIO_VERSION and EVIDENCE_VERSION.

→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human and technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN

(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))
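
The “generic log of who did what on which version” can be sketched with the //subjectType + subjectId// variant mentioned under review target polymorphism in 1.3. In this minimal Python sketch, {{code}}SubjectType{{/code}}, {{code}}ReviewAction{{/code}}, the action strings, and {{code}}reviews_of{{/code}} are hypothetical names; each review points at one exact version record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List


class SubjectType(Enum):
    CLAIM_VERSION = "CLAIM_VERSION"
    SCENARIO_VERSION = "SCENARIO_VERSION"
    EVIDENCE_VERSION = "EVIDENCE_VERSION"
    SCENARIO_EVIDENCE_LINK_VERSION = "SCENARIO_EVIDENCE_LINK_VERSION"
    VERDICT_VERSION = "VERDICT_VERSION"


@dataclass(frozen=True)
class ReviewAction:
    reviewer_id: str
    action: str  # hypothetical values: "approve", "reject", "flag"
    subject_type: SubjectType  # which versioned entity type ...
    subject_id: str  # ... and which exact version row
    reviewed_at: datetime


def reviews_of(log: List[ReviewAction], subject_type: SubjectType, subject_id: str) -> List[ReviewAction]:
    """All review actions recorded against one specific version."""
    return [r for r in log if r.subject_type is subject_type and r.subject_id == subject_id]


now = datetime.now(timezone.utc)
log = [
    ReviewAction("u-1", "approve", SubjectType.VERDICT_VERSION, "vv-1", now),
    ReviewAction("u-2", "flag", SubjectType.EVIDENCE_VERSION, "ev-1", now),
    ReviewAction("u-3", "reject", SubjectType.VERDICT_VERSION, "vv-2", now),
]
```

Compared with the multi-edge diagram, a single type/id pair keeps one review table but gives up database-enforced foreign keys; an intermediate REVIEW_TARGET table is a common way to recover them.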

----

=== 1.3 Gaps / potential problems ===

These are the main issues and missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet spell out in prose:

* the identity vs. version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram, this appears as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((

**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe: