Last modified by Robert Schaub on 2025/12/24 20:35

= 5. Data Model =

The FactHarbor data model centers on four core entities, each fully versioned with immutable versions:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim.
The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent
**identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}})
define //what// something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}})
define //how that thing looked at a given point in time//.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to
mutable identities.
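The identity vs. version split can be illustrated with a minimal Python sketch. It assumes plain dataclasses as the storage model and UUIDs as identifiers. The class names, field names, and the {{code}}add_version{{/code}} helper are hypothetical and only mirror the CLAIM / CLAIM_VERSION pair. The identity stays mutable (for example, its status), while every version is a frozen snapshot.

```python
import dataclasses
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Claim:
    """Identity entity (CLAIM): stable ID; lifecycle fields such as status may change."""
    claim_id: str
    cluster_id: str
    status: str = "active"  # hypothetical default value


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity (CLAIM_VERSION): an immutable snapshot of one phrasing."""
    claim_version_id: str
    claim_id: str  # foreign key to the stable identity
    text: str
    created_at: datetime


def add_version(claim: Claim, text: str) -> ClaimVersion:
    """Record a new phrasing as a new version; earlier versions are left untouched."""
    return ClaimVersion(
        claim_version_id=str(uuid.uuid4()),  # UUIDs are an assumption, not a spec requirement
        claim_id=claim.claim_id,
        text=text,
        created_at=datetime.now(timezone.utc),
    )


claim = Claim(claim_id="claim-1", cluster_id="cluster-1")
v1 = add_version(claim, "Phrasing A")
v2 = add_version(claim, "Phrasing B")

claim.status = "disputed"  # identities may change ...
try:
    v1.text = "edited"  # ... but versions may not
except dataclasses.FrozenInstanceError:
    print("versions are immutable")
```

Rephrasing a claim therefore never edits an existing version. It appends a new version that shares the same {{code}}claim_id{{/code}}.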

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
| **Logical concept** | **Identity entity** | **Version entity** | **Notes**
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | The identity represents the source; versions capture specific extractions and updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance and direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}}
(scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.
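To show why attaching reasoning to versions keeps history intact, here is a small Python sketch. All names are hypothetical. The {{code}}scenario_version_id{{/code}} field on {{code}}VerdictVersion{{/code}} is how the sketch expresses SCENARIO_VERSION → VERDICT_VERSION. That field name and the direction values are assumptions, not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_version_id: str
    scenario_id: str  # SCENARIO, which belongs to exactly one CLAIM
    assumptions: str


@dataclass(frozen=True)
class LinkVersion:
    """SCENARIO_EVIDENCE_LINK_VERSION: binds one scenario version to one evidence version."""
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # e.g. "supports" or "contradicts" (values are assumptions)


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    verdict_id: str
    scenario_version_id: str  # the exact scenario version that was assessed
    verdict: float
    confidence: float


def history_for(scenario_version_id, links, verdicts):
    """Return the links and verdicts recorded against one scenario version."""
    return (
        [link for link in links if link.scenario_version_id == scenario_version_id],
        [v for v in verdicts if v.scenario_version_id == scenario_version_id],
    )


sv1 = ScenarioVersion("sv-1", "scenario-1", "baseline assumptions")
links = [LinkVersion("sv-1", "ev-1", 0.9, "supports")]
verdicts = [VerdictVersion("vv-1", "verdict-1", "sv-1", 0.8, 0.6)]

# Revising the scenario adds a new version and a new verdict version;
# nothing recorded against sv-1 is rewritten.
sv2 = ScenarioVersion("sv-2", "scenario-1", "revised assumptions")
verdicts.append(VerdictVersion("vv-2", "verdict-1", "sv-2", 0.4, 0.7))

old_links, old_verdicts = history_for("sv-1", links, verdicts)
```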

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships.
Primary keys are marked {{code}}PK{{/code}} and foreign keys {{code}}FK{{/code}};
key columns are named after their entity with an {{code}}ID{{/code}} suffix (e.g. {{code}}ClaimID{{/code}}).

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}
{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="Test.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}

== Core Data Model ERD (Versioned) ==

This diagram shows the full core data model with all versioned entities.

{{mermaid}}
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed in"
{{/mermaid}}

{{info}}
All key entities are explicitly versioned here (…VERSION tables).
Scenario–Evidence links are drawn as a single {{code}}SCENARIO_EVIDENCE_LINK{{/code}} entity that binds a
{{code}}SCENARIO_VERSION{{/code}} to an {{code}}EVIDENCE_VERSION{{/code}}; the separate
{{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entity listed in 5.1 is not drawn.
This reflects the versioning requirements in the textual Data Model chapter.
{{/info}}

**Important points:**

* Scenarios and Evidence are **linked via their versions**
({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per scenario version** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
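Here is a minimal sketch of how the immutability rule could be enforced at the storage layer, using Python's standard {{code}}sqlite3{{/code}} module. Only part of the core ERD is mapped. SQLite, the column types, and the trigger names are assumptions chosen for illustration. Triggers reject UPDATE and DELETE on the version tables, so a change can only be recorded by inserting a newer version.

```python
import sqlite3

# A subset of the core ERD; column types are SQLite storage classes (an assumption).
SCHEMA = """
CREATE TABLE SCENARIO (
    ScenarioID TEXT PRIMARY KEY,
    ClaimID    TEXT NOT NULL,
    Name       TEXT,
    CreatedAt  TEXT NOT NULL
);
CREATE TABLE SCENARIO_VERSION (
    ScenarioVersionID TEXT PRIMARY KEY,
    ScenarioID        TEXT NOT NULL REFERENCES SCENARIO(ScenarioID),
    Definitions       TEXT,
    Assumptions       TEXT,
    Boundaries        TEXT,
    CreatedAt         TEXT NOT NULL
);
CREATE TABLE EVIDENCE (
    EvidenceID       TEXT PRIMARY KEY,
    SourceType       TEXT,
    URL              TEXT,
    ReliabilityScore REAL
);
CREATE TABLE EVIDENCE_VERSION (
    EvidenceVersionID TEXT PRIMARY KEY,
    EvidenceID        TEXT NOT NULL REFERENCES EVIDENCE(EvidenceID),
    Summary           TEXT,
    ReliabilityScore  REAL,
    CreatedAt         TEXT NOT NULL
);
CREATE TABLE SCENARIO_EVIDENCE_LINK (
    LinkID            TEXT PRIMARY KEY,
    ScenarioVersionID TEXT NOT NULL REFERENCES SCENARIO_VERSION(ScenarioVersionID),
    EvidenceVersionID TEXT NOT NULL REFERENCES EVIDENCE_VERSION(EvidenceVersionID),
    Relevance         REAL,
    Direction         TEXT
);
"""

VERSION_TABLES = ("SCENARIO_VERSION", "EVIDENCE_VERSION")


def immutability_triggers(table: str) -> str:
    """Triggers that abort any UPDATE or DELETE on a version table."""
    return f"""
CREATE TRIGGER {table}_no_update BEFORE UPDATE ON {table}
BEGIN SELECT RAISE(ABORT, '{table} rows are immutable'); END;
CREATE TRIGGER {table}_no_delete BEFORE DELETE ON {table}
BEGIN SELECT RAISE(ABORT, '{table} rows are immutable'); END;
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    for table in VERSION_TABLES:
        conn.executescript(immutability_triggers(table))


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO SCENARIO VALUES ('s-1', 'claim-1', 'Baseline', '2025-01-01')")
conn.execute(
    "INSERT INTO SCENARIO_VERSION VALUES ('sv-1', 's-1', 'defs', 'assumptions', 'bounds', '2025-01-01')"
)
```

Enforcing immutability in the database, and not only in application code, keeps the audit trail intact even if a client misbehaves. A production system might choose a different mechanism.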

----

= 5.3 Data Use & Review ERD =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}
{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="Test.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}

== Data Use ERD (Roles, Review & Versioned Entities) ==

This diagram shows how users, roles, and review actions relate to the
versioned core entities.

{{mermaid}}
erDiagram
    %% Core entities shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships)
    USER ||--o{ TECHNICAL_USER : "is a"
    USER ||--o{ CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o{ REVIEWER : "subset"
    CONTRIBUTING_USER ||--o{ EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities (each action targets exactly one version)
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews
{{/mermaid}}

{{info}}
This diagram focuses on //who// uses and reviews //which// versioned entities.
USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations.
Other roles (REVIEWER, EXPERT, TRUSTED_CONTRIBUTOR, FEDERATION_ADMIN, FEDERATION_NODE)
are modelled as specializations or technical subtypes.
{{/info}}

Notes:

* In the intended design, most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR,
SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows
in {{code}}ROLE{{/code}} and assigned via {{code}}USER_ROLE_MEMBERSHIP{{/code}}; the diagram above
draws some of them as subtype entities for readability.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys,
node-to-node federation agents, batch jobs). All other roles can, in principle,
be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while
roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles
do.
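The notes above could translate into a permission check along these lines. This Python sketch assumes role memberships are available as a mapping from UserID to role names (standing in for USER_ROLE_MEMBERSHIP). It treats REVIEWER, TRUSTED_CONTRIBUTOR, and MODERATOR as reviewing roles. The function name and this role set are hypothetical. The review target is stored as a type plus a version ID, mirroring the TargetEntityType and TargetEntityVersionID columns of REVIEW_ACTION.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical: roles named above as performing reviews. Some federation roles
# may also qualify; they are left out of this sketch.
REVIEWING_ROLES = frozenset({"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"})

# Version entities that a review may target.
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str        # which version entity the target belongs to
    target_entity_version_id: str  # the exact version the reviewer saw
    action_type: str
    comment: str
    timestamp: datetime


def record_review(
    memberships: dict[str, set[str]],  # USER_ROLE_MEMBERSHIP: UserID -> role names
    review_action_id: str,
    user_id: str,
    target_type: str,
    target_version_id: str,
    action_type: str,
    comment: str = "",
) -> ReviewAction:
    """Create a REVIEW_ACTION if the user holds a reviewing role and the target is a version."""
    if not memberships.get(user_id, set()) & REVIEWING_ROLES:
        raise PermissionError(f"user {user_id!r} holds no reviewing role")
    if target_type not in REVIEWABLE_TYPES:
        raise ValueError(f"{target_type!r} is not a reviewable version entity")
    return ReviewAction(
        review_action_id, user_id, target_type, target_version_id,
        action_type, comment, datetime.now(timezone.utc),
    )
```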

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference earlier versions
of the same evidence are candidates for re-assessment.
** {{code}}VERDICT_VERSION{{/code}} entries for the affected scenario versions may become **outdated**
and are queued for re-evaluation.

* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit links from the previous scenario version, or start empty, depending
on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new
{{code}}VERDICT_VERSION{{/code}} entries.

* REVIEW_ACTIONs are always attached to the **exact version** that the reviewer saw.
This preserves a faithful audit trail if data later changes.

* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs,
only EVIDENCE_VERSIONs above a reliability threshold, etc.)
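Here is one possible reading of the evidence rule above, sketched in Python. When a new evidence version arrives, links that still point at an older version of the same evidence are flagged. Each affected scenario version is then queued once so that its verdicts can be re-evaluated. The in-memory {{code}}deque{{/code}} stands in for whatever queue the Automation chapter specifies, and all names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str  # FK to the EVIDENCE identity


@dataclass(frozen=True)
class LinkVersion:
    """Simplified SCENARIO_EVIDENCE_LINK_VERSION."""
    scenario_version_id: str
    evidence_version_id: str


def on_new_evidence_version(
    new: EvidenceVersion,
    known_versions: dict[str, EvidenceVersion],
    links: list[LinkVersion],
    queue: deque[str],
) -> list[LinkVersion]:
    """Flag links that point at older versions of the same evidence.

    Each affected scenario version is queued (once) so that its
    VERDICT_VERSION entries can be re-evaluated. Returns the flagged links.
    """
    stale = [
        link
        for link in links
        if link.evidence_version_id != new.evidence_version_id
        and known_versions[link.evidence_version_id].evidence_id == new.evidence_id
    ]
    for link in stale:
        if link.scenario_version_id not in queue:
            queue.append(link.scenario_version_id)
    return stale
```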

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.
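The link-inheritance choice for scenario evolution (cosmetic vs. conceptual, see 5.4) could be sketched as follows. The two-value {{code}}ChangeKind{{/code}} enum, the copy-on-cosmetic rule, and the link ID scheme are assumptions for illustration. The specification only states that links may be inherited or start empty depending on the classification.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"      # wording only; the interpretation is unchanged
    CONCEPTUAL = "conceptual"  # definitions, assumptions, or boundaries changed


@dataclass(frozen=True)
class LinkVersion:
    link_id: str
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str


def links_for_new_version(previous: list[LinkVersion], new_scenario_version_id: str,
                          kind: ChangeKind) -> list[LinkVersion]:
    """Build link versions for a new SCENARIO_VERSION.

    Cosmetic changes copy the previous links onto the new version; conceptual
    changes start empty so every piece of evidence is re-assessed. The previous
    links are never modified and remain available as history.
    """
    if kind is ChangeKind.CONCEPTUAL:
        return []
    return [
        replace(link,
                link_id=f"{link.link_id}:{new_scenario_version_id}",  # hypothetical ID scheme
                scenario_version_id=new_scenario_version_id)
        for link in previous
    ]
```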

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}},
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.
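A federated node's choice of what to replicate could be expressed as a simple policy filter. In this Python sketch the {{code}}ReplicationPolicy{{/code}} fields and the record format (a dict keyed by column name) are hypothetical. The acceptance status for verdict versions is also an assumption, since the core ERD defines no such column.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-node replication policy; all field names are illustrative."""
    entity_types: frozenset[str]     # identity and version entity types this node accepts
    min_evidence_reliability: float  # threshold applied to EVIDENCE_VERSION.ReliabilityScore
    accepted_verdicts_only: bool     # replicate only accepted VERDICT_VERSIONs


def should_replicate(policy: ReplicationPolicy, entity_type: str, record: dict) -> bool:
    """Decide whether one record is sent to a federated node under the given policy."""
    if entity_type not in policy.entity_types:
        return False
    if entity_type == "EVIDENCE_VERSION":
        return record.get("ReliabilityScore", 0.0) >= policy.min_evidence_reliability
    if entity_type == "VERDICT_VERSION" and policy.accepted_verdicts_only:
        # Assumption: an acceptance status is available; the core ERD has no such column.
        return record.get("Status") == "accepted"
    return True


policy = ReplicationPolicy(
    entity_types=frozenset({"CLAIM", "SCENARIO", "EVIDENCE_VERSION", "VERDICT_VERSION"}),
    min_evidence_reliability=0.7,
    accepted_verdicts_only=True,
)
```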

{{code language="none"}}
USER
├── TECHNICAL_USER
│   ├── FEDERATION_ADMIN
│   └── AKEL_AGENT (optional future)

READER
└── CONTRIBUTING_USER
    ├── TRUSTED_CONTRIBUTOR
    ├── REVIEWER
    ├── EXPERT
    └── MODERATOR

ADMIN

FEDERATION_ADMIN (administrative, but human)
{{/code}}

== 1. Overall analysis & review of the data model ==

=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**
Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established pattern in knowledge systems for handling:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**
Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns well with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions, which avoids mixing incompatible assumptions.
)))
* **Evidence and verdicts as first-class entities**
Evidence is explicit and linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
* (((
**Cluster level (CLAIM_CLUSTER)**
Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**
Separating “data” from “who reviewed what” keeps the model clean and supports:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION entries hold individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.
→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialization of USER (strictly technical accounts).
* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN
(intended as rows of ROLE rather than as separate entities).
)))
)))

----

=== 1.3 Gaps / potential problems ===

These are the main issues & missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet spell out in prose:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this appears as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational modeling would use a generic “subjectType + subjectId” or an intermediate “REVIEW_TARGET” table. REVIEW_ACTION already carries TargetEntityType + TargetEntityVersionID, which corresponds to the first option.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the core ERD (FEDERATION_NODE appears only in the Data Use ERD).
* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe:

* AKEL task queues
* extraction runs
* model versions
)))
* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
)))