Version 7.1 by Robert Schaub on 2025/11/27 12:41

(((

)))

= 5. Data Model =

The FactHarbor data model centers on four fully versioned entities, each with immutable versions:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim.
The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent
**identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define //what// something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define //how that thing looked at a given point in time//.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to
mutable identities.
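
The identity vs. version split can be sketched in a few lines of Python. This is a minimal illustration, not the FactHarbor implementation: the class names mirror {{code}}CLAIM{{/code}} and {{code}}CLAIM_VERSION{{/code}}, while the field names, ID formats, and sample texts are assumptions made for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass
class Claim:
    """Identity entity: says what the thing is and stays stable over time."""
    claim_id: str
    status: str  # identities may carry mutable bookkeeping such as a status


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: how the claim looked at one point in time (immutable)."""
    claim_version_id: str
    claim_id: str  # foreign key to the identity
    text: str
    created_at: datetime


claim = Claim(claim_id="claim-1", status="open")
v1 = ClaimVersion("claim-1-v1", claim.claim_id, "Coffee is healthy.",
                  datetime(2025, 1, 1, tzinfo=timezone.utc))
# A rephrasing creates a new version; v1 is left untouched.
v2 = ClaimVersion("claim-1-v2", claim.claim_id,
                  "Moderate coffee intake is healthy.",
                  datetime(2025, 2, 1, tzinfo=timezone.utc))

try:
    v1.text = "edited in place"  # rejected: the dataclass is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Both versions point at the same identity, so anything attached to {{code}}v1{{/code}} keeps referring to the exact wording that was assessed.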

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
| **Logical concept** | **Identity entity** | **Version entity** | **Notes**
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.
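
As a rough sketch of why version-level attachment preserves history, the example below keys links and verdicts by scenario version ID. The {{code}}scenario_version_id{{/code}} field on the verdict, the relevance scale, and the direction labels are assumptions chosen for illustration; the chapter only states that links carry relevance and direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    link_version_id: str
    scenario_version_id: str  # points at a SCENARIO_VERSION, not a SCENARIO
    evidence_version_id: str  # points at an EVIDENCE_VERSION, not an EVIDENCE
    relevance: float          # assumed scale: 0.0 to 1.0
    direction: str            # assumed labels: "supports" / "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str  # assumed field tying the verdict to a version
    verdict: float
    confidence: float


links = [
    ScenarioEvidenceLinkVersion("l1", "scen-1-v1", "ev-1-v1", 0.9, "supports"),
    ScenarioEvidenceLinkVersion("l2", "scen-1-v2", "ev-1-v2", 0.7, "contradicts"),
]
verdicts = [
    VerdictVersion("vd-1-v1", "scen-1-v1", 0.8, 0.6),
    VerdictVersion("vd-1-v2", "scen-1-v2", 0.3, 0.7),
]


def history_for(scenario_version_id: str):
    """Return the links and verdicts recorded against one scenario version."""
    return (
        [l for l in links if l.scenario_version_id == scenario_version_id],
        [v for v in verdicts if v.scenario_version_id == scenario_version_id],
    )


# The first scenario version still resolves to its original link and verdict,
# even though a second version with different evidence exists.
old_links, old_verdicts = history_for("scen-1-v1")
```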

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships.
The convention is that fields ending in {{code}}Id{{/code}} are primary keys,
and fields with {{code}}...IdFk{{/code}} are foreign keys.

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}
{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
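
One way to picture "never changed, only superseded" is an append-only store. The sketch below is an assumption-laden illustration: {{code}}VersionStore{{/code}} and its methods are hypothetical names, and a real deployment would presumably enforce the same rule at the database level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    summary: str


class VersionStore:
    """Append-only store: versions are added, never updated or deleted."""

    def __init__(self) -> None:
        self._by_identity: dict[str, list[EvidenceVersion]] = {}

    def append(self, version: EvidenceVersion) -> None:
        history = self._by_identity.setdefault(version.evidence_id, [])
        if any(v.evidence_version_id == version.evidence_version_id
               for v in history):
            raise ValueError("version IDs are write-once")
        history.append(version)

    def latest(self, evidence_id: str) -> EvidenceVersion:
        """The newest version supersedes, but does not replace, older ones."""
        return self._by_identity[evidence_id][-1]

    def history(self, evidence_id: str) -> tuple[EvidenceVersion, ...]:
        return tuple(self._by_identity[evidence_id])


store = VersionStore()
store.append(EvidenceVersion("ev-1-v1", "ev-1", "Preliminary figures"))
store.append(EvidenceVersion("ev-1-v2", "ev-1", "Revised figures"))
```

Readers that want the current state call {{code}}latest{{/code}}; audits and reproducibility checks walk {{code}}history{{/code}}.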

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}
{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}

== Data Use ERD (Roles, Review & Versioned Entities) ==

This diagram shows how users, roles, and review actions relate to the
versioned core entities.

{{mermaid}}
erDiagram
    %% Core clusters shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships).
    %% Subtypes share the UserID key, so each side is at most one row.
    USER ||--o| TECHNICAL_USER : "is a"
    USER ||--o| CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o| TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o| REVIEWER : "subset"
    CONTRIBUTING_USER ||--o| EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities.
    %% A review targets one entity type (TargetEntityType), so each
    %% individual relationship is optional on the target side.
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews
{{/mermaid}}

{{info}}
This diagram focuses on //who// uses and reviews //which// versioned entities.
USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations.
Other roles (REVIEWER, EXPERT, TRUSTED_CONTRIBUTOR, FEDERATION_ADMIN, FEDERATION_NODE)
are modelled as specializations or technical subtypes.
{{/info}}

Notes:

* In the intended design, most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}} and assigned through USER_ROLE_MEMBERSHIP, even where the diagram above draws them as separate entities.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
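
The review log and the role rule from the notes above can be sketched as follows. Field names follow the REVIEW_ACTION entity in the diagram; the set of reviewing roles, the {{code}}"approve"{{/code}} action type, and the ID format are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Versioned entity types that REVIEW_ACTION relates to in the diagram.
REVIEWABLE_TYPES = {
    "CLAIM_VERSION", "SCENARIO_VERSION", "EVIDENCE_VERSION", "VERDICT_VERSION",
}
# Assumption: roles allowed to review. READER is deliberately absent.
REVIEWING_ROLES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str
    comment: str
    timestamp: datetime


def record_review(user_id, roles, target_type, target_version_id,
                  action_type, comment=""):
    """Create a ReviewAction if the target is reviewable and the user may review."""
    if target_type not in REVIEWABLE_TYPES:
        raise ValueError(f"not a reviewable entity type: {target_type}")
    if not REVIEWING_ROLES & set(roles):
        raise PermissionError("user holds no role that may review")
    return ReviewAction(
        review_action_id=f"ra-{user_id}-{target_version_id}",  # illustrative ID
        user_id=user_id,
        target_entity_type=target_type,
        target_entity_version_id=target_version_id,
        action_type=action_type,
        comment=comment,
        timestamp=datetime.now(timezone.utc),
    )


action = record_review("u1", ["REVIEWER"], "VERDICT_VERSION", "vd-1-v2", "approve")
```

Storing the target as a type plus a version ID keeps REVIEW_ACTION generic while still pinning every review to one exact version.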

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All related {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing earlier versions of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the same scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** that was seen by the reviewer. This preserves a faithful audit trail if data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
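
The evidence-driven trigger could look roughly like the sketch below. It reads "related" links as links that reference an earlier version of the same evidence identity, which is an interpretation rather than something the chapter spells out. The lookup tables stand in for database queries, and {{code}}on_new_evidence_version{{/code}} is a hypothetical function name.

```python
from collections import deque

# evidence_version_id -> evidence_id (the identity it belongs to)
evidence_of = {"ev-1-v1": "ev-1", "ev-1-v2": "ev-1", "ev-2-v1": "ev-2"}
# Link versions as (scenario_version_id, evidence_version_id) pairs
links = [
    ("scen-1-v1", "ev-1-v1"),
    ("scen-2-v1", "ev-2-v1"),
    ("scen-3-v1", "ev-1-v1"),
]
# verdict_version_id -> scenario_version_id it assesses
verdict_for = {
    "vd-1-v1": "scen-1-v1",
    "vd-2-v1": "scen-2-v1",
    "vd-3-v1": "scen-3-v1",
}


def on_new_evidence_version(new_evidence_version_id, reevaluation_queue):
    """Queue every verdict whose scenario version is linked to an older
    version of the same evidence identity."""
    evidence_id = evidence_of[new_evidence_version_id]
    affected_scenarios = {
        scenario_vid
        for scenario_vid, evidence_vid in links
        if evidence_of[evidence_vid] == evidence_id
        and evidence_vid != new_evidence_version_id
    }
    # Existing verdict versions are not modified; they are only queued.
    for verdict_vid, scenario_vid in sorted(verdict_for.items()):
        if scenario_vid in affected_scenarios:
            reevaluation_queue.append(verdict_vid)
    return affected_scenarios


queue = deque()
affected = on_new_evidence_version("ev-1-v2", queue)
```

Verdicts tied only to unrelated evidence (here {{code}}vd-2-v1{{/code}}) stay out of the queue, which keeps a re-evaluation cascade limited to what the new version can actually affect.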

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.
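
How a new scenario version obtains its initial links depends on the change classification described in 5.4. The sketch below assumes the simplest possible policy, which the specification does not mandate: a cosmetic change carries all links over as new link versions, and a conceptual change starts empty. The function name is hypothetical.

```python
def links_for_new_scenario_version(previous_evidence_version_ids,
                                   new_scenario_version_id,
                                   change_kind):
    """Return the (scenario_version_id, evidence_version_id) links a new
    scenario version starts with.

    change_kind is "cosmetic" or "conceptual"; the mapping to behavior
    below is an assumed policy for illustration.
    """
    if change_kind == "cosmetic":
        # Wording-only change: re-create each link against the new version.
        return [(new_scenario_version_id, ev)
                for ev in previous_evidence_version_ids]
    if change_kind == "conceptual":
        # Meaning changed: start empty so every link is re-assessed.
        return []
    raise ValueError(f"unknown change kind: {change_kind}")


inherited = links_for_new_scenario_version(
    ["ev-1-v1", "ev-2-v1"], "scen-1-v2", "cosmetic")
fresh = links_for_new_scenario_version(
    ["ev-1-v1", "ev-2-v1"], "scen-1-v3", "conceptual")
```

In both cases the links of the previous scenario version are left as they were, so the old version stays reproducible.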

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}},
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.
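
A node-level replication filter along the lines of the examples in 5.4 (only accepted verdict versions, only evidence versions above a reliability threshold) might be sketched like this. {{code}}ReplicationPolicy{{/code}}, its fields, and the {{code}}status{{/code}} attribute on verdict versions are hypothetical; the actual federation entities are defined in the Federation & Decentralization chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-node policy; field names are illustrative."""
    min_evidence_reliability: float = 0.0
    accepted_verdicts_only: bool = False


def select_for_replication(policy, evidence_versions, verdict_versions):
    """Return the evidence and verdict versions this node would replicate."""
    evidence = [
        e for e in evidence_versions
        if e["reliability_score"] >= policy.min_evidence_reliability
    ]
    verdicts = [
        v for v in verdict_versions
        if not policy.accepted_verdicts_only or v["status"] == "accepted"
    ]
    return evidence, verdicts


evidence_versions = [
    {"id": "ev-1-v2", "reliability_score": 0.9},
    {"id": "ev-2-v1", "reliability_score": 0.4},
]
verdict_versions = [
    {"id": "vd-1-v1", "status": "accepted"},  # "status" is an assumed field
    {"id": "vd-1-v2", "status": "pending"},
]
policy = ReplicationPolicy(min_evidence_reliability=0.5,
                           accepted_verdicts_only=True)
evidence, verdicts = select_for_replication(
    policy, evidence_versions, verdict_versions)
```

Because versions are immutable, a node that replicates only a subset never has to reconcile in-place edits; it either holds a given version or it does not.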

----

== 1. Overall analysis & review of the data model ==

=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**
Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way for knowledge systems to handle:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**
Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns well with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions → avoids mixing incompatible assumptions.
)))
* **Evidence and verdicts as first-class entities**
Evidence is explicit, linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
* (((
**Cluster level (CLAIM_CLUSTER)**
Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**
Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as the //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION entries are the individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.
→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN
(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))

----

=== 1.3 Gaps / potential problems ===

These are the main issues & missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet //verbally// spell out:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational modeling would use a generic “subjectType + subjectId” or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is ok for “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe:

* AKEL task queues
* extraction runs
* model versions
)))
* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
)))