Data Model (From Specification Chat)

|= Concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
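The version-attachment rule can be illustrated with a minimal Python sketch. It is not the specified schema: the class and field names are illustrative, the {{code}}direction{{/code}} value is a made-up example, and attaching the verdict version directly to a scenario version follows the design decision stated in the text rather than the ERD columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_version_id: str
    scenario_id: str  # points at the stable SCENARIO identity
    definitions: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str  # points at the stable EVIDENCE identity
    summary: str


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    scenario_version_id: str  # a version, never the base SCENARIO
    evidence_version_id: str  # a version, never the base EVIDENCE
    relevance: float
    direction: str  # illustrative value such as "supports"


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str  # assumption: attached per the stated design decision
    verdict: float
    confidence: float


sv1 = ScenarioVersion("sv-1", "s-1", "original definitions")
ev1 = EvidenceVersion("ev-1", "e-1", "first extraction")
link1 = ScenarioEvidenceLinkVersion("sv-1", "ev-1", 0.8, "supports")
verdict1 = VerdictVersion("vv-1", "sv-1", 0.7, 0.6)

# The scenario evolves: a new version is created under the same identity.
sv2 = ScenarioVersion("sv-2", "s-1", "tightened definitions")

# The old link and verdict still point at sv-1 and remain valid history.
print(link1.scenario_version_id, verdict1.scenario_version_id, sv2.scenario_id)
```

Because the link and the verdict reference version IDs, creating {{code}}sv-2{{/code}} does not touch them; new links and verdicts for {{code}}sv-2{{/code}} would be added alongside.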

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. Key fields end in {{code}}ID{{/code}}; the diagram marks primary keys with {{code}}PK{{/code}} and foreign keys with {{code}}FK{{/code}}.

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}


{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}


= Core Data Model ERD (Versioned) =


This diagram shows the full core data model with all versioned entities.

erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions

All key entities are explicitly versioned here (…VERSION tables). This reflects the versioning requirements in the textual Data Model chapter.

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
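The "never changed, only superseded" rule can be sketched as an append-only history of frozen records. This is a minimal illustration under assumptions: {{code}}VersionHistory{{/code}} and the Python field names are hypothetical helpers, not part of the specification.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ClaimVersion:
    claim_version_id: str
    claim_id: str
    text: str


class VersionHistory:
    """Append-only list of versions for one identity (hypothetical helper)."""

    def __init__(self):
        self._versions = []

    def append(self, version):
        # New versions are only ever added; nothing is updated or deleted.
        self._versions.append(version)

    def latest(self):
        return self._versions[-1]

    def all(self):
        return tuple(self._versions)


history = VersionHistory()
history.append(ClaimVersion("cv-1", "c-1", "Original phrasing"))
history.append(ClaimVersion("cv-2", "c-1", "Rephrased claim"))

# Attempting an in-place edit of a stored version is rejected.
try:
    history.latest().text = "edited in place"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated, [v.claim_version_id for v in history.all()])
```

A rephrasing therefore always shows up as an additional row, which is what lets older verdicts and review actions keep pointing at the exact text they were based on.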

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}


{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}


= Data Use ERD (Roles, Review & Versioned Entities) =

This diagram shows how users, roles, and review actions relate to the versioned core entities.

erDiagram
    %% Core clusters shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }
    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }
    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }
    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }
    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }
    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }
    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }
    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }
    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }
    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }
    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }
    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }
    REVIEWER {
        string UserID PK
        string Domain
    }
    EXPERT {
        string UserID PK
        string ExpertiseArea
    }
    FEDERATION_NODE {
        string NodeID PK
        string Region
    }
    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }
    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships)
    USER ||--o{ TECHNICAL_USER : "is a"
    USER ||--o{ CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o{ REVIEWER : "subset"
    CONTRIBUTING_USER ||--o{ EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--|| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--|| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--|| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--|| VERDICT_VERSION : reviews

This diagram focuses on //who// uses and reviews //which// versioned entities. USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations. Other roles (REVIEWER, EXPERT, TRUSTED_CONTRIBUTOR, FEDERATION_ADMIN, FEDERATION_NODE) are modelled as specializations or technical subtypes.

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
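The notes above can be made concrete with a small Python sketch of roles held as membership rows. Everything here is an assumption for illustration: the membership tuples, the {{code}}REVIEWING_ROLES{{/code}} set, and the function names are hypothetical, and the unnamed "some federation roles" are deliberately left out because the text does not say which ones qualify.

```python
# Roles that perform REVIEW_ACTIONs according to the notes (federation roles omitted).
REVIEWING_ROLES = frozenset({"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"})

# (user_id, role_name) rows; both human and technical accounts may appear.
memberships = [
    ("u-1", "READER"),
    ("u-2", "READER"),
    ("u-2", "REVIEWER"),
    ("bot-1", "TRUSTED_CONTRIBUTOR"),  # a technical account holding a role
]


def roles_of(user_id):
    """Return the set of role names held by one user."""
    return {role for uid, role in memberships if uid == user_id}


def may_review(user_id):
    """A user may review if at least one held role is a reviewing role."""
    return bool(roles_of(user_id) & REVIEWING_ROLES)


for uid in ("u-1", "u-2", "bot-1"):
    print(uid, sorted(roles_of(uid)), may_review(uid))
```

Keeping roles as rows rather than as subtype tables means a new role is a data change, not a schema change.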

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All related {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing that evidence version are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** that was seen by the reviewer. This preserves a faithful audit trail if data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
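The evidence-driven trigger can be sketched in Python as follows. This is one possible reading, not the specified algorithm: it assumes that "related" links are those pointing at earlier versions of the same EVIDENCE, that a verdict version can be traced to a scenario version, and that a simple in-memory queue stands in for the automation layer. All names and sample IDs are hypothetical.

```python
from collections import deque

# evidence_version_id -> evidence_id (identity)
evidence_of = {"ev-1": "e-1", "ev-2": "e-2"}
# link versions as (scenario_version_id, evidence_version_id)
links = [("sv-1", "ev-1"), ("sv-2", "ev-2"), ("sv-3", "ev-1")]
# verdict_version_id -> scenario_version_id
verdict_scenario = {"vv-1": "sv-1", "vv-2": "sv-2", "vv-3": "sv-3"}


def on_new_evidence_version(new_version_id, evidence_id, queue):
    """Register a new evidence version and queue outdated verdict versions."""
    # Links that reference any existing version of the same evidence
    # become candidates for re-assessment.
    candidates = [(sv, ev) for sv, ev in links if evidence_of[ev] == evidence_id]
    evidence_of[new_version_id] = evidence_id

    # Verdict versions on the affected scenario versions may be outdated.
    affected = {sv for sv, _ in candidates}
    for verdict_version_id, sv in sorted(verdict_scenario.items()):
        if sv in affected:
            queue.append(verdict_version_id)
    return candidates


queue = deque()
candidates = on_new_evidence_version("ev-3", "e-1", queue)
print(candidates, list(queue))
```

Nothing is overwritten in this flow: the queued verdict versions stay in place as history, and re-evaluation would add new versions next to them.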

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
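How links might carry over to a new scenario version can be sketched in Python. The rule used here is an assumption chosen for illustration: a cosmetic change inherits all links and a conceptual change starts empty. The specification only says that inheritance depends on the change classification, so the actual policy may be finer-grained. {{code}}ChangeKind{{/code}} and {{code}}links_for_new_version{{/code}} are hypothetical names.

```python
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"
    CONCEPTUAL = "conceptual"


def links_for_new_version(previous_links, new_scenario_version_id, kind):
    """Return the initial links of a new scenario version.

    previous_links holds (scenario_version_id, evidence_version_id) pairs
    of the preceding version; they are never modified.
    """
    if kind is ChangeKind.COSMETIC:
        # Same meaning, new wording: re-create each link for the new version.
        return [(new_scenario_version_id, ev) for _, ev in previous_links]
    # Conceptual change: evidence must be re-linked deliberately.
    return []


old_links = [("sv-1", "ev-1"), ("sv-1", "ev-2")]
cosmetic = links_for_new_version(old_links, "sv-2", ChangeKind.COSMETIC)
conceptual = links_for_new_version(old_links, "sv-2", ChangeKind.CONCEPTUAL)
print(cosmetic, conceptual)
```

In both cases the links of {{code}}sv-1{{/code}} stay untouched, which keeps the old version reproducible.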

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
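A node-side replication filter could look like the following Python sketch. It is an illustration under assumptions: {{code}}ReplicationPolicy{{/code}} is a hypothetical structure, and the {{code}}status{{/code}} field on verdict versions is invented here to express "accepted", since the ERD does not define such a column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    min_reliability: float        # threshold for EVIDENCE_VERSION.ReliabilityScore
    accepted_verdicts_only: bool  # replicate only accepted VERDICT_VERSIONs


def select_for_replication(evidence_versions, verdict_versions, policy):
    """Return the (evidence, verdict) versions a node chooses to replicate."""
    evidence = [
        e for e in evidence_versions
        if e["reliability_score"] >= policy.min_reliability
    ]
    verdicts = [
        v for v in verdict_versions
        if not policy.accepted_verdicts_only or v["status"] == "accepted"
    ]
    return evidence, verdicts


evidence_versions = [
    {"id": "ev-1", "reliability_score": 0.9},
    {"id": "ev-2", "reliability_score": 0.4},
]
verdict_versions = [
    {"id": "vv-1", "status": "accepted"},  # "status" is a hypothetical field
    {"id": "vv-2", "status": "draft"},
]

policy = ReplicationPolicy(min_reliability=0.5, accepted_verdicts_only=True)
evidence, verdicts = select_for_replication(evidence_versions, verdict_versions, policy)
print([e["id"] for e in evidence], [v["id"] for v in verdicts])
```

Because versions are immutable, a node that replicates only a subset never has to reconcile in-place edits; it either holds a given version or it does not.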

----

== 1. Overall analysis & review of the data model ==


=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**

Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is exactly how modern knowledge systems handle:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**

Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) is very aligned with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions → avoids mixing incompatible assumptions.
)))
* **Evidence and verdicts as first-class entities**
Evidence is explicit, linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
* (((
**Cluster level (CLAIM_CLUSTER)**

Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**

Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION records hold the individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.

→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN

(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))
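The "generic log of who did what on which version" can be sketched in Python as a single record type with a type tag and a version ID, mirroring the {{code}}TargetEntityType{{/code}} and {{code}}TargetEntityVersionID{{/code}} columns of REVIEW_ACTION in the Data Use ERD. The validation step, the Python field names, and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Versioned entity types that review actions may target (from the list above).
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str        # which kind of versioned entity
    target_entity_version_id: str  # the exact version the reviewer saw
    action_type: str
    comment: str
    timestamp: datetime

    def __post_init__(self):
        # Reject base identities such as CLAIM: reviews target versions only.
        if self.target_entity_type not in REVIEWABLE_TYPES:
            raise ValueError(f"not a reviewable version type: {self.target_entity_type}")


def try_create(target_type):
    """Return True if a review action on this target type is accepted."""
    try:
        ReviewAction("ra-1", "u-2", target_type, "x-1", "approve", "",
                     datetime.now(timezone.utc))
        return True
    except ValueError:
        return False


print(try_create("CLAIM_VERSION"), try_create("CLAIM"))
```

The trade-off of this polymorphic reference is that the database cannot enforce it with an ordinary foreign key, so a check like the one in {{code}}__post_init__{{/code}} has to live in application code or in a constraint.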

----

=== 1.3 Gaps / potential problems ===

These are the main issues & missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet //verbally// spell out:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational modeling would use a generic “subjectType + subjectId” or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is ok for “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))

571

1. (((

572

**Automation / AKEL artifacts left implicit**

573

574

* (((

575

The Data Model chapter currently doesn’t describe:

576