Version 4.1 by Robert Schaub on 2025/11/27 12:11

== 1. Overall analysis & review of the data model ==

=== 1.1 Strengths of the current design ===

* (((
**Identity vs. version pattern**
Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way for knowledge systems to handle:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
**Scenario-centric reasoning**
Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns well with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions, which avoids mixing incompatible assumptions.
)))
* **Evidence and verdicts as first-class entities**
Evidence is explicit, linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
* (((
**Cluster level (CLAIM_CLUSTER)**
Grouping related claims avoids duplication and lets you:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
**Explicit review layer (REVIEW_ACTION, roles, etc.)**
Separating “data” from “who reviewed what” keeps the model clean and supports:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as the //current intended design//:

1. (((
**Claims vs Scenarios**

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION holds the individual phrasings / formulations / metadata.
* (((
SCENARIO belongs to a **CLAIM**, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
**Version-specific reasoning**

* **Verdicts** are always attached to SCENARIO_VERSION (not base SCENARIO).
* **Evidence links** are between SCENARIO_VERSION and EVIDENCE_VERSION.
→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
**Clusters**

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in **both diagrams** (Core Data Model and Data Use).
)))
1. (((
**Review vs data**

* (((
All review happens **on versioned entities**:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
**Users & roles**

* USER has an attribute (or a linked entity) that distinguishes **technical users** from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN
(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
)))
)))

----

=== 1.3 Gaps / potential problems ===

These are the main issues & missing areas I see:

1. (((
**Versioning text in chapter 5 is currently too thin (‘…’ placeholders)**

* (((
The spec does not yet //verbally// spell out:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
**No explicit “provenance granularity” in the model**

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
**Review target polymorphism**

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more “pure” relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
**Federation details missing from core ERD**

* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
* This is fine for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
**Automation / AKEL artifacts left implicit**

* (((
The Data Model chapter currently doesn’t describe:

* AKEL task queues
* extraction runs
* model versions
)))
* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
)))

= 5. Data Model =

The FactHarbor data model centers on four fully versioned core entities:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim.
The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent
**identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define //what// something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define //how that thing looked at a given point in time//. A version is immutable once created.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to
mutable identities.
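
The identity vs. version pattern can be illustrated with a minimal Python sketch. It is not the FactHarbor implementation: the class names, the field names, and the {{code}}-v1{{/code}} ID scheme are assumptions made for illustration. The identity object only accumulates versions, and each version is a frozen snapshot.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimVersion:
    """Immutable snapshot of a claim's wording at one point in time."""
    claim_version_id: str
    claim_id: str
    text: str
    created_at: datetime


@dataclass
class Claim:
    """Stable identity; it never stores text itself, only its versions."""
    claim_id: str
    versions: list[ClaimVersion] = field(default_factory=list)

    def add_version(self, text: str) -> ClaimVersion:
        # Hypothetical ID scheme: identity ID plus a running version number.
        version = ClaimVersion(
            claim_version_id=f"{self.claim_id}-v{len(self.versions) + 1}",
            claim_id=self.claim_id,
            text=text,
            created_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    @property
    def latest(self) -> ClaimVersion:
        return self.versions[-1]


claim = Claim("c1")
first = claim.add_version("Coffee is healthy")
second = claim.add_version("Moderate coffee consumption is healthy")

try:
    first.text = "edited in place"  # rejected: versions are frozen
except FrozenInstanceError:
    pass
```

A rephrasing adds a second version; the first one stays readable and unchanged, which is what lets verdicts and review actions keep pointing at exactly what was assessed.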

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
| **Logical concept** | **Identity entity** | **Version entity** | **Notes**
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; each VERDICT_VERSION is tied to a specific SCENARIO_VERSION, and together they capture the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships.
The convention is that fields marked {{code}}PK{{/code}} are primary keys
and fields marked {{code}}FK{{/code}} are foreign keys.

{{mermaid}}
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed in"

{{/mermaid}}

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
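
One possible way to enforce this immutability at the storage layer is an append-only table guarded by triggers. The sketch below uses Python's built-in {{code}}sqlite3{{/code}} module purely as an illustration; the choice of SQLite, the snake_case table and column names, and the helper functions are assumptions, not part of the FactHarbor specification.

```python
import sqlite3

SCHEMA = """
CREATE TABLE scenario (
    scenario_id TEXT PRIMARY KEY
);
CREATE TABLE scenario_version (
    scenario_version_id TEXT PRIMARY KEY,
    scenario_id         TEXT NOT NULL REFERENCES scenario(scenario_id),
    assumptions         TEXT NOT NULL,
    created_at          TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Version rows are append-only: reject in-place edits and deletes.
CREATE TRIGGER scenario_version_no_update
BEFORE UPDATE ON scenario_version
BEGIN
    SELECT RAISE(ABORT, 'scenario_version rows are immutable');
END;
CREATE TRIGGER scenario_version_no_delete
BEFORE DELETE ON scenario_version
BEGIN
    SELECT RAISE(ABORT, 'scenario_version rows are immutable');
END;
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_scenario_version(conn, scenario_version_id, scenario_id, assumptions):
    """Supersede earlier versions by inserting a new row."""
    conn.execute(
        "INSERT INTO scenario_version "
        "(scenario_version_id, scenario_id, assumptions) VALUES (?, ?, ?)",
        (scenario_version_id, scenario_id, assumptions),
    )


def try_update(conn, scenario_version_id, assumptions) -> bool:
    """Attempt an in-place edit; returns False when the trigger blocks it."""
    try:
        conn.execute(
            "UPDATE scenario_version SET assumptions = ? "
            "WHERE scenario_version_id = ?",
            (assumptions, scenario_version_id),
        )
    except sqlite3.DatabaseError:
        return False
    return True
```

With this layout, a change to a scenario is expressed only by calling {{code}}add_scenario_version{{/code}} again with a new ID; the earlier row stays available for historical comparison.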

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{mermaid}}
erDiagram
    %% Core clusters shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships)
    %% A specialization row shares its UserID with at most one parent row.
    USER ||--o| TECHNICAL_USER : "is a"
    USER ||--o| CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o| TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o| REVIEWER : "subset"
    CONTRIBUTING_USER ||--o| EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities
    %% Each action targets one version, identified by TargetEntityType + TargetEntityVersionID.
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews

{{/mermaid}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}} and assigned through {{code}}USER_ROLE_MEMBERSHIP{{/code}} (see 1.2); neither entity is drawn in the diagram above.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
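
The polymorphic target of {{code}}REVIEW_ACTION{{/code}} (type plus version ID) and the role restriction in the last note can be sketched as follows. This is an illustration under stated assumptions: the Python names, the {{code}}ra-N{{/code}} ID scheme, and the exact set of reviewing roles are hypothetical, and the reviewable types are taken from the list of versioned entities in 1.2.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Versioned entities that a review action may target (list from section 1.2).
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})

# Assumption for illustration: roles allowed to record review actions.
# The notes above also mention "some federation roles", omitted here.
REVIEWING_ROLES = frozenset({"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"})


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str
    comment: str
    timestamp: datetime


def record_review(log, user_id, user_roles, target_entity_type,
                  target_entity_version_id, action_type, comment=""):
    """Append a review action to `log` after validating target and role."""
    if target_entity_type not in REVIEWABLE_TYPES:
        # Identities such as CLAIM are rejected: reviews attach to versions.
        raise ValueError(f"not a reviewable version type: {target_entity_type}")
    if not REVIEWING_ROLES & set(user_roles):
        raise PermissionError(f"user {user_id} holds no reviewing role")
    action = ReviewAction(
        review_action_id=f"ra-{len(log) + 1}",  # hypothetical ID scheme
        user_id=user_id,
        target_entity_type=target_entity_type,
        target_entity_version_id=target_entity_version_id,
        action_type=action_type,
        comment=comment,
        timestamp=datetime.now(timezone.utc),
    )
    log.append(action)
    return action
```

Storing the type next to the version ID keeps a single log table for all reviewable entities, at the cost of not having a database-level foreign key on the target.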

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference an earlier version of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the same scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** that was seen by the reviewer. This preserves a faithful audit trail if data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
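
The evidence-driven trigger can be sketched as a small lookup. The sketch assumes that "related" verdicts are those attached to a scenario version whose links point at an earlier version of the same evidence, and that each verdict version records its scenario version; the class and function names are hypothetical, and the real trigger logic lives in the Versioning and Automation chapters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str


@dataclass(frozen=True)
class LinkVersion:
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str


def verdicts_to_requeue(new_version, evidence_versions, links, verdicts):
    """Return sorted IDs of verdict versions that may be outdated.

    A verdict is a candidate when its scenario version is linked to an
    earlier version of the evidence that `new_version` supersedes.
    """
    earlier_ids = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id == new_version.evidence_id
        and ev.evidence_version_id != new_version.evidence_version_id
    }
    affected_scenarios = {
        link.scenario_version_id
        for link in links
        if link.evidence_version_id in earlier_ids
    }
    return sorted(
        v.verdict_version_id
        for v in verdicts
        if v.scenario_version_id in affected_scenarios
    )
```

The function only selects candidates; nothing is modified, which matches the rule that existing verdict versions stay intact and re-evaluation produces new ones.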

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making
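
As a rough illustration of subset replication, the sketch below selects evidence versions above a reliability threshold and ships only their metadata. The policy object, the {{code}}full_content{{/code}} field, and the payload shape are assumptions made for this example; the actual replication entities are defined in the Federation & Decentralization chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersionRecord:
    evidence_version_id: str
    summary: str
    reliability_score: float
    full_content: str  # hypothetical field holding the source text


@dataclass(frozen=True)
class ReplicationPolicy:
    """Per-node choice of what to replicate (illustrative only)."""
    min_reliability: float = 0.0
    include_full_content: bool = False


def select_evidence_for_replication(records, policy):
    """Build replication payloads for evidence versions that pass the policy."""
    payloads = []
    for record in records:
        if record.reliability_score < policy.min_reliability:
            continue  # below this node's reliability threshold
        payload = {
            "evidence_version_id": record.evidence_version_id,
            "summary": record.summary,
            "reliability_score": record.reliability_score,
        }
        if policy.include_full_content:
            payload["full_content"] = record.full_content
        payloads.append(payload)
    return payloads
```

A node that wants metadata only keeps the default {{code}}include_full_content=False{{/code}}, so the payload carries the version ID, summary, and score but not the source text.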

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}},
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.