Data Model (From Specification Chat)

|= Concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | {{code}}EVIDENCE{{/code}} is the identity of a source; {{code}}EVIDENCE_VERSION{{/code}} captures specific extractions or updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance and direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
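
A minimal Python sketch of the identity/version split above, assuming a purely in-memory model. The class and field names ({{code}}Claim{{/code}}, {{code}}ClaimVersion{{/code}}, {{code}}ScenarioVersion{{/code}}, {{code}}VerdictVersion{{/code}}, etc.) are hypothetical mirrors of the entities, not a normative schema. It shows two phrasings sharing one scenario, and two verdict versions pinned to two different scenario versions.

{{code language="python"}}
from dataclasses import dataclass

# Hypothetical in-memory mirror of the entities; names are illustrative only.


@dataclass(frozen=True)
class Claim:               # CLAIM: stable identity
    claim_id: str
    cluster_id: str        # exactly one CLAIM_CLUSTER


@dataclass(frozen=True)
class ClaimVersion:        # CLAIM_VERSION: one phrasing of the claim
    claim_version_id: str
    claim_id: str
    text: str


@dataclass(frozen=True)
class Scenario:            # SCENARIO: belongs to the CLAIM, not to a phrasing
    scenario_id: str
    claim_id: str


@dataclass(frozen=True)
class ScenarioVersion:     # SCENARIO_VERSION: definitions/assumptions/boundaries
    scenario_version_id: str
    scenario_id: str
    assumptions: str


@dataclass(frozen=True)
class VerdictVersion:      # VERDICT_VERSION: pinned to one SCENARIO_VERSION
    verdict_version_id: str
    scenario_version_id: str
    verdict: float
    confidence: float


claim = Claim("C1", cluster_id="K1")
phrasings = [
    ClaimVersion("CV1", "C1", "Phrasing A"),
    ClaimVersion("CV2", "C1", "Phrasing B"),
]
scenario = Scenario("S1", claim_id="C1")   # one scenario serves both phrasings
sv1 = ScenarioVersion("SV1", "S1", "original assumptions")
sv2 = ScenarioVersion("SV2", "S1", "revised assumptions")

# Each assessment references the exact scenario version it was made under,
# so VV1 remains valid history after SV2 appears.
verdicts = [
    VerdictVersion("VV1", sv1.scenario_version_id, verdict=0.2, confidence=0.7),
    VerdictVersion("VV2", sv2.scenario_version_id, verdict=0.1, confidence=0.8),
]
{{/code}}

Because every class is frozen, "changing" a scenario means creating a new {{code}}ScenarioVersion{{/code}}; the earlier verdict keeps pointing at the version it actually assessed.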

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. The convention is that fields ending in {{code}}Id{{/code}} are primary keys, and fields ending in {{code}}...IdFk{{/code}} are foreign keys.

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}


{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}


**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
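
The immutability rule can be sketched as an append-only store. This is a hypothetical in-memory illustration ({{code}}VersionStore{{/code}}, {{code}}add{{/code}}, {{code}}latest{{/code}} and {{code}}history{{/code}} are invented names), not a description of the actual persistence layer. New versions are appended, existing ones are frozen, and "superseded" simply means "no longer the latest".

{{code language="python"}}
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    entity_id: str      # identity, e.g. a ScenarioID
    version_no: int     # 1, 2, 3, ... in creation order
    payload: str        # the versioned content


@dataclass
class VersionStore:
    """Append-only: versions are added, never edited or deleted."""
    _by_entity: dict[str, list[Version]] = field(default_factory=dict)

    def add(self, entity_id: str, payload: str) -> Version:
        history = self._by_entity.setdefault(entity_id, [])
        version = Version(entity_id, len(history) + 1, payload)
        history.append(version)
        return version

    def latest(self, entity_id: str) -> Version:
        # The newest version supersedes earlier ones; those remain stored.
        return self._by_entity[entity_id][-1]

    def history(self, entity_id: str) -> tuple[Version, ...]:
        # A tuple copy, so callers cannot alter the stored history.
        return tuple(self._by_entity[entity_id])


store = VersionStore()
store.add("S1", "definitions v1")
store.add("S1", "definitions v2")
{{/code}}

After the two calls, {{code}}store.latest("S1"){{/code}} returns version 2, while {{code}}store.history("S1"){{/code}} still contains version 1 unchanged.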

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}


{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}


= Data Use ERD (Roles, Review & Versioned Entities) =

This diagram shows how users, roles, and review actions relate to the versioned core entities.

{{code language="none"}}
erDiagram
    %% Core entities shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }
    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }
    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }
    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }
    REVIEWER {
        string UserID PK
        string Domain
    }
    EXPERT {
        string UserID PK
        string ExpertiseArea
    }
    FEDERATION_NODE {
        string NodeID PK
        string Region
    }
    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }
    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as one-to-zero-or-one relationships)
    USER ||--o| TECHNICAL_USER : "is a"
    USER ||--o| CONTRIBUTING_USER : "is a"
    CONTRIBUTING_USER ||--o| TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o| REVIEWER : "subset"
    CONTRIBUTING_USER ||--o| EXPERT : "subset"
    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities (each action targets exactly one version)
    USER ||--o{ REVIEW_ACTION : performs
    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews
{{/code}}

This diagram focuses on //who// uses and reviews //which// versioned entities.

USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations. Other roles (REVIEWER, EXPERT, TRUSTED_CONTRIBUTOR, FEDERATION_ADMIN, FEDERATION_NODE) are modelled as specializations or technical subtypes.

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
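
To make the review rules concrete, here is a hedged Python sketch of recording a REVIEW_ACTION. The set of reviewing roles is an assumption drawn from the last note (federation roles are left out because the note does not say which ones review), and {{code}}record_review{{/code}} is a hypothetical helper. The target is stored as a type plus a version ID, mirroring {{code}}TargetEntityType{{/code}} and {{code}}TargetEntityVersionID{{/code}} in the diagram.

{{code language="python"}}
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed role names, taken from the notes above; not an exhaustive policy.
REVIEWING_ROLES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}

# Only versioned entities may be reviewed.
REVIEWABLE_TYPES = {
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str   # the exact version the reviewer saw
    action_type: str
    comment: str
    timestamp: datetime


def record_review(action_id: str, user_id: str, roles: set[str],
                  target_type: str, target_version_id: str,
                  action_type: str, comment: str = "") -> ReviewAction:
    # A READER-only account has no overlap with the reviewing roles.
    if not roles & REVIEWING_ROLES:
        raise PermissionError(f"{user_id} holds no reviewing role")
    if target_type not in REVIEWABLE_TYPES:
        raise ValueError(f"{target_type} is not a reviewable versioned entity")
    return ReviewAction(action_id, user_id, target_type, target_version_id,
                        action_type, comment, datetime.now(timezone.utc))
{{/code}}

For example, a user holding only READER gets a {{code}}PermissionError{{/code}}, while a REVIEWER reviewing verdict version {{code}}VV2{{/code}} gets a new, immutable {{code}}ReviewAction{{/code}}.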

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference an earlier version of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit links from the previous scenario version or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** that the reviewer saw. This preserves a faithful audit trail if the data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
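
The evidence trigger can be sketched as a lookup: find links that point at earlier versions of the same evidence, then collect the verdicts of the affected scenario versions. This is a minimal Python illustration under stated assumptions: all names are hypothetical, data is held in plain lists, and {{code}}current_verdict{{/code}} maps each scenario version to its latest verdict version ID.

{{code language="python"}}
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str             # identity shared by all versions


@dataclass(frozen=True)
class LinkVersion:               # SCENARIO_EVIDENCE_LINK_VERSION
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str


def on_new_evidence_version(new_ev, evidence_versions, links, current_verdict):
    """Return (links to re-assess, verdict version IDs to queue)."""
    # Earlier versions of the same EVIDENCE identity.
    older = {ev.evidence_version_id for ev in evidence_versions
             if ev.evidence_id == new_ev.evidence_id
             and ev.evidence_version_id != new_ev.evidence_version_id}
    candidates = [link for link in links if link.evidence_version_id in older]
    affected = {link.scenario_version_id for link in candidates}
    queued = sorted(current_verdict[sv] for sv in affected if sv in current_verdict)
    return candidates, queued


evs = [EvidenceVersion("EV1", "E1"), EvidenceVersion("EV2", "E1"),
       EvidenceVersion("EV3", "E2")]
links = [LinkVersion("L1", "SV1", "EV1"), LinkVersion("L2", "SV2", "EV3"),
         LinkVersion("L3", "SV3", "EV1")]
current_verdict = {"SV1": "VV1", "SV2": "VV2", "SV3": "VV3"}

# EV2 is a new version of evidence E1.
to_reassess, to_queue = on_new_evidence_version(evs[1], evs, links, current_verdict)
{{/code}}

Here the new version {{code}}EV2{{/code}} of evidence {{code}}E1{{/code}} flags links {{code}}L1{{/code}} and {{code}}L3{{/code}} and queues {{code}}VV1{{/code}} and {{code}}VV3{{/code}}; nothing is modified in place.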

----

= 5.5 Behavioral Notes =


== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).


== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
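
Scenario evolution with the cosmetic vs. conceptual classification from section 5.4 could look roughly like this. It is a hedged Python sketch: {{code}}ChangeKind{{/code}}, {{code}}evolve_scenario{{/code}} and the link ID scheme are hypothetical, and carrying links forward is modelled as creating new link versions that point at the new scenario version, so the old links stay untouched.

{{code language="python"}}
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"       # wording fix; meaning unchanged
    CONCEPTUAL = "conceptual"   # definitions, assumptions or boundaries changed


@dataclass(frozen=True)
class LinkVersion:              # SCENARIO_EVIDENCE_LINK_VERSION
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str              # e.g. "supports" / "refutes" (assumed vocabulary)


def evolve_scenario(old_sv_id, new_sv_id, kind, links):
    """Return (link versions for the new scenario version, verdict job)."""
    carried = []
    if kind is ChangeKind.COSMETIC:
        # Copy links forward as new versions; old link versions are unchanged.
        carried = [
            LinkVersion(f"{link.link_version_id}@{new_sv_id}", new_sv_id,
                        link.evidence_version_id, link.relevance, link.direction)
            for link in links if link.scenario_version_id == old_sv_id
        ]
    # A conceptual change starts with no links. Either way, the new
    # scenario version needs a freshly computed VERDICT_VERSION.
    return carried, ("recompute_verdict", new_sv_id)


links = [LinkVersion("L1", "SV1", "EV1", 0.9, "supports"),
         LinkVersion("L2", "SV1", "EV3", 0.4, "refutes")]
{{/code}}

Returning a job tuple instead of computing the verdict inline keeps the sketch aligned with the queue-based re-evaluation described in the Automation chapters.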


== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
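
A per-node replication filter along the lines of sections 5.4 and 5.5.3 might look like the sketch below. Everything here is assumed for illustration: the {{code}}ReplicationPolicy{{/code}} fields, records as plain dicts, and a {{code}}status{{/code}} field on verdict versions (the ERDs do not define one). Records are shipped without a {{code}}content{{/code}} field, to match "Evidence metadata (without full content)".

{{code language="python"}}
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:                     # hypothetical per-node settings
    entity_types: frozenset[str]             # which entity kinds to pull
    min_evidence_reliability: float = 0.0
    accepted_verdicts_only: bool = False


def should_replicate(record: dict, policy: ReplicationPolicy) -> bool:
    kind = record["type"]
    if kind not in policy.entity_types:
        return False
    if kind == "EVIDENCE_VERSION":
        return record.get("reliability_score", 0.0) >= policy.min_evidence_reliability
    if kind == "VERDICT_VERSION" and policy.accepted_verdicts_only:
        return record.get("status") == "accepted"   # assumed field
    return True


def replicate(records: list[dict], policy: ReplicationPolicy) -> list[dict]:
    # Keep matching records; drop full content so only metadata travels.
    return [{k: v for k, v in r.items() if k != "content"}
            for r in records if should_replicate(r, policy)]


policy = ReplicationPolicy(
    entity_types=frozenset({"CLAIM", "EVIDENCE_VERSION", "VERDICT_VERSION"}),
    min_evidence_reliability=0.6,
    accepted_verdicts_only=True,
)
records = [
    {"type": "CLAIM", "id": "C1"},
    {"type": "SCENARIO", "id": "S1"},
    {"type": "EVIDENCE_VERSION", "id": "EV1", "reliability_score": 0.8, "content": "full text"},
    {"type": "EVIDENCE_VERSION", "id": "EV2", "reliability_score": 0.3},
    {"type": "VERDICT_VERSION", "id": "VV1", "status": "accepted"},
    {"type": "VERDICT_VERSION", "id": "VV2", "status": "draft"},
]
{{/code}}

With this policy, {{code}}replicate(records, policy){{/code}} keeps {{code}}C1{{/code}}, {{code}}EV1{{/code}} (minus its content) and {{code}}VV1{{/code}}; the scenario, the low-reliability evidence version and the draft verdict stay local.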

----

== 1. Overall analysis & review of the data model ==


=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**

Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way for knowledge systems to handle:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**

Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns well with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions, which avoids mixing incompatible assumptions.
)))
* (((
**Evidence and verdicts as first-class entities**

Evidence is explicit and linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
)))
* (((
**Cluster level (CLAIM_CLUSTER)**

Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**

Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION entries hold individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.

Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.

→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN

(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))

----

=== 1.3 Gaps / potential problems ===

These are the main issues and missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet spell out in prose:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows up as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational design would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe: