Version 3.1 by Robert Schaub on 2025/11/27 12:08

= 5. Data Model =

The FactHarbor data model centers on four core entities, each fully versioned through immutable version records:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim. The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent **identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define *what* something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define *how that thing looked at a given point in time*.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to mutable identities.
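
A minimal Python sketch of the identity vs. version pattern, assuming an in-memory representation: {{code}}Claim{{/code}} stands in for the stable identity and a frozen dataclass stands in for {{code}}CLAIM_VERSION{{/code}}. The class names, the snake_case fields (a subset of the ERD fields in 5.2), and the {{code}}new_id{{/code}} helper are hypothetical illustrations, not FactHarbor's actual implementation.

```python
import uuid
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


def new_id(prefix: str) -> str:
    """Hypothetical helper: generate an opaque, unique identifier."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


@dataclass
class Claim:
    """Identity entity: stable across rephrasings; holds no claim text itself."""
    claim_id: str
    status: str = "active"


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: an immutable snapshot of the claim at one point in time."""
    claim_version_id: str
    claim_id: str  # foreign key to the stable identity
    text: str
    created_at: datetime


claim = Claim(claim_id=new_id("claim"))
v1 = ClaimVersion(new_id("cv"), claim.claim_id, "Unemployment fell in 2024.", datetime.now(timezone.utc))
# A rephrasing creates a new version; the identity (claim_id) is unchanged.
v2 = ClaimVersion(new_id("cv"), claim.claim_id,
                  "The unemployment rate was lower at the end of 2024 than at its start.",
                  datetime.now(timezone.utc))

try:
    v1.text = "edited in place"  # rejected: frozen dataclasses cannot be mutated
except FrozenInstanceError:
    print("versions cannot be modified; create a new version instead")
```

Because {{code}}ClaimVersion{{/code}} is frozen, every edit has to go through creating a new version, which keeps anything that references an older version valid.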

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
|= Logical concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | {{code}}EVIDENCE{{/code}} identifies a source; {{code}}EVIDENCE_VERSION{{/code}} captures specific extractions and updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with a relevance score and a direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the *claim* level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.
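
The following Python sketch illustrates the design decisions above under simplifying assumptions: plain string IDs, in-memory objects, and a hypothetical {{code}}direction{{/code}} vocabulary ("supports" / "contradicts"). Each {{code}}VerdictVersion{{/code}} records the scenario version it assesses, as the decisions above require. Refining a scenario produces a new scenario version and a new verdict version, while the old verdict keeps pointing at the scenario version it was made for. All class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_version_id: str
    scenario_id: str
    definitions: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    summary: str


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    """Binds one scenario *version* to one evidence *version*."""
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # assumed vocabulary: "supports" or "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    verdict_id: str
    scenario_version_id: str  # the exact scenario version this assessment refers to
    verdict: float
    confidence: float


sv1 = ScenarioVersion("sv1", "s1", "original definitions")
ev1 = EvidenceVersion("ev1", "e1", "summary of a study")
link1 = ScenarioEvidenceLink(sv1.scenario_version_id, ev1.evidence_version_id, 0.8, "supports")
vv1 = VerdictVersion("vv1", "vd1", sv1.scenario_version_id, verdict=0.7, confidence=0.6)

# The scenario is refined: a new version is created instead of editing sv1 ...
sv2 = ScenarioVersion("sv2", sv1.scenario_id, "narrowed definitions")
# ... and the re-assessment becomes a new version of the same verdict.
vv2 = VerdictVersion("vv2", vv1.verdict_id, sv2.scenario_version_id, verdict=0.4, confidence=0.7)

verdict_history = [vv1, vv2]
```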

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. Primary keys are marked {{code}}PK{{/code}} and foreign keys {{code}}FK{{/code}}; all key fields end in {{code}}ID{{/code}}.

{{mermaid}}
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed in"

{{/mermaid}}

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per scenario version**: each {{code}}VERDICT{{/code}} belongs to a {{code}}SCENARIO{{/code}}, and each {{code}}VERDICT_VERSION{{/code}} records the assessment of one specific {{code}}SCENARIO_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it appears both here and in the Data Use & Review model (5.3).

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
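
One way to enforce this rule is an append-only store in which the "current" version of an entity is simply its newest version. The Python sketch below assumes an in-memory store of {{code}}EVIDENCE_VERSION{{/code}} records ordered by {{code}}CreatedAt{{/code}}; {{code}}VersionStore{{/code}} and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    summary: str
    reliability_score: float
    created_at: datetime


class VersionStore:
    """Append-only store: versions can be added and read, but never updated or deleted."""

    def __init__(self) -> None:
        self._by_id: Dict[str, EvidenceVersion] = {}
        self._by_identity: Dict[str, List[EvidenceVersion]] = {}

    def append(self, version: EvidenceVersion) -> None:
        # Re-using an existing version ID would amount to overwriting history.
        if version.evidence_version_id in self._by_id:
            raise ValueError(f"{version.evidence_version_id} already exists; versions are immutable")
        self._by_id[version.evidence_version_id] = version
        self._by_identity.setdefault(version.evidence_id, []).append(version)

    def get(self, evidence_version_id: str) -> EvidenceVersion:
        """Fetch an exact version, e.g. the one an old verdict or review refers to."""
        return self._by_id[evidence_version_id]

    def current(self, evidence_id: str) -> EvidenceVersion:
        """The newest version supersedes older ones without modifying them."""
        return max(self._by_identity[evidence_id], key=lambda v: v.created_at)

    def history(self, evidence_id: str) -> List[EvidenceVersion]:
        return sorted(self._by_identity[evidence_id], key=lambda v: v.created_at)


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
store = VersionStore()
store.append(EvidenceVersion("ev1", "e1", "first extraction", 0.6, t0))
store.append(EvidenceVersion("ev2", "e1", "corrected extraction", 0.8, t0 + timedelta(days=3)))
```

Here {{code}}store.current("e1"){{/code}} returns the corrected extraction, while {{code}}store.get("ev1"){{/code}} still returns the original one for any record that was based on it.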

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{mermaid}}
erDiagram
    %% Core entities repeated for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships)
    USER ||--o| TECHNICAL_USER : "is a"
    USER ||--o| CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o| TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o| REVIEWER : "subset"
    CONTRIBUTING_USER ||--o| EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities (each action targets exactly one version)
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews

{{/mermaid}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in a {{code}}ROLE{{/code}} table, which is not shown in the diagram above.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
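
Because {{code}}REVIEW_ACTION{{/code}} refers to its target through {{code}}TargetEntityType{{/code}} and {{code}}TargetEntityVersionID{{/code}} rather than a single foreign key, the application has to check that the target really is an existing version entity. The Python sketch below shows one way to do that together with a role check. The set of reviewing roles is a hypothetical policy derived from the notes above (federation roles are left out because the notes do not say which ones review), and the function name and the "approve" action type are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Set, Tuple

# Version entities a REVIEW_ACTION may target (from the diagram above).
REVIEWABLE_TYPES = {"CLAIM_VERSION", "SCENARIO_VERSION", "EVIDENCE_VERSION", "VERDICT_VERSION"}

# Hypothetical policy: roles allowed to review. READER is deliberately absent.
REVIEWING_ROLES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str
    comment: str
    timestamp: datetime


def record_review(
    review_action_id: str,
    user_id: str,
    user_roles: Set[str],
    target: Tuple[str, str],                 # (entity type, version ID) the reviewer saw
    existing_versions: Set[Tuple[str, str]],
    action_type: str,
    comment: str = "",
) -> ReviewAction:
    """Create a review action bound to one exact, existing version (sketch)."""
    entity_type, version_id = target
    if not user_roles & REVIEWING_ROLES:
        raise PermissionError(f"user {user_id} holds no reviewing role")
    if entity_type not in REVIEWABLE_TYPES:
        raise ValueError(f"{entity_type} is not a reviewable version entity")
    if target not in existing_versions:
        raise LookupError(f"unknown version {entity_type}/{version_id}")
    return ReviewAction(review_action_id, user_id, entity_type, version_id,
                        action_type, comment, datetime.now(timezone.utc))


existing = {("VERDICT_VERSION", "vv1"), ("EVIDENCE_VERSION", "ev2")}
review = record_review("ra1", "u42", {"REVIEWER"}, ("VERDICT_VERSION", "vv1"), existing, "approve")
```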

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference earlier versions of the same evidence are candidates for re-assessment.
** {{code}}VERDICT_VERSION{{/code}} entries that depend on those links may become **outdated** and are queued for re-evaluation.

* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** Depending on the change classification (cosmetic vs. conceptual), it may inherit some links from earlier versions of the scenario or start with no links.
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.

* REVIEW_ACTIONs are always attached to the **exact version** the reviewer saw. This preserves a faithful audit trail if the data later changes.

* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which version entities to replicate (e.g. only accepted VERDICT_VERSIONs or only EVIDENCE_VERSIONs above a reliability threshold)
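
The evidence-driven part of this behavior can be sketched as a small Python routine: given a new {{code}}EVIDENCE_VERSION{{/code}}, it finds the links that still point at older versions of the same evidence and queues the affected scenario versions so their verdicts can be re-evaluated. The sketch assumes in-memory data, a plain FIFO queue, and the hypothetical names {{code}}links_to_reassess{{/code}} and {{code}}enqueue_reevaluation{{/code}}; it is not the trigger and automation architecture itself.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List


@dataclass(frozen=True)
class Link:
    """A scenario-evidence link: binds one scenario version to one evidence version."""
    scenario_version_id: str
    evidence_version_id: str


def links_to_reassess(
    evidence_id: str,
    new_version_id: str,
    version_owner: Dict[str, str],  # evidence_version_id -> evidence_id
    links: Iterable[Link],
) -> Dict[str, List[Link]]:
    """Group, by scenario version, the links that reference superseded versions of this evidence."""
    candidates: Dict[str, List[Link]] = {}
    for link in links:
        same_evidence = version_owner.get(link.evidence_version_id) == evidence_id
        if same_evidence and link.evidence_version_id != new_version_id:
            candidates.setdefault(link.scenario_version_id, []).append(link)
    return candidates


def enqueue_reevaluation(candidates: Dict[str, List[Link]], queue: Deque[str]) -> None:
    """Queue each affected scenario version once; its verdicts are then re-evaluated."""
    for scenario_version_id in sorted(candidates):
        if scenario_version_id not in queue:
            queue.append(scenario_version_id)


version_owner = {"ev1": "e1", "ev9": "e9", "ev2": "e1"}  # ev2 is the new version of e1
links = [Link("sv1", "ev1"), Link("sv2", "ev1"), Link("sv3", "ev9")]

queue: Deque[str] = deque()
enqueue_reevaluation(links_to_reassess("e1", "ev2", version_owner, links), queue)
```

Only {{code}}sv1{{/code}} and {{code}}sv2{{/code}} end up in the queue; {{code}}sv3{{/code}} is untouched because its link belongs to different evidence.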

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
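
How links carry over to a new scenario version depends on the cosmetic vs. conceptual change classification from 5.4. The Python sketch below assumes the simplest possible policy, in which a cosmetic change copies every existing link onto the new version and a conceptual change starts with no links; the {{code}}ChangeKind{{/code}} enum and {{code}}carry_over_links{{/code}} function are hypothetical, and real rules may inherit only some links.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List


class ChangeKind(Enum):
    COSMETIC = "cosmetic"      # e.g. wording fixes; meaning unchanged
    CONCEPTUAL = "conceptual"  # definitions, assumptions, or boundaries changed


@dataclass(frozen=True)
class Link:
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str


def carry_over_links(old_links: List[Link], new_scenario_version_id: str,
                     change: ChangeKind) -> List[Link]:
    """Derive the starting links for a new scenario version.

    Old links are never modified; inherited links are new objects bound to the new version.
    """
    if change is ChangeKind.CONCEPTUAL:
        return []  # evidence must be re-linked against the new framing
    return [replace(link, scenario_version_id=new_scenario_version_id) for link in old_links]


old = [Link("sv1", "ev1", 0.8, "supports"), Link("sv1", "ev2", 0.5, "contradicts")]
cosmetic = carry_over_links(old, "sv2", ChangeKind.COSMETIC)
conceptual = carry_over_links(old, "sv3", ChangeKind.CONCEPTUAL)
```

Using {{code}}dataclasses.replace{{/code}} on frozen instances creates copies, so the links of the old scenario version stay available for historical comparison.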

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
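
A node's replication choices can be expressed as a filter over outgoing records. The Python sketch below is a hypothetical policy, not a defined FactHarbor interface. The ERDs above define {{code}}ReliabilityScore{{/code}}; everything else is an assumption: each record is a dict with a {{code}}type{{/code}} field, {{code}}VERDICT_VERSION{{/code}} records carry a {{code}}status{{/code}} such as "accepted", and full evidence text sits in a {{code}}content{{/code}} field.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterable, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-node replication rules."""
    verdict_statuses: FrozenSet[str] = frozenset({"accepted"})
    min_evidence_reliability: float = 0.7
    include_evidence_content: bool = False


def select_for_replication(records: Iterable[Record], policy: ReplicationPolicy) -> List[Record]:
    """Return the records this node would replicate, stripping evidence content if configured."""
    selected: List[Record] = []
    for record in records:
        kind = record["type"]
        if kind == "VERDICT_VERSION" and record.get("status") not in policy.verdict_statuses:
            continue  # e.g. replicate only accepted verdict versions
        if kind == "EVIDENCE_VERSION":
            if record.get("reliability_score", 0.0) < policy.min_evidence_reliability:
                continue  # below this node's reliability threshold
            if not policy.include_evidence_content:
                # Replicate metadata only, without full content (copy; the source is not mutated).
                record = {k: v for k, v in record.items() if k != "content"}
        selected.append(record)
    return selected


records = [
    {"type": "VERDICT_VERSION", "id": "vv1", "status": "accepted"},
    {"type": "VERDICT_VERSION", "id": "vv2", "status": "draft"},
    {"type": "EVIDENCE_VERSION", "id": "ev1", "reliability_score": 0.9, "content": "full text"},
    {"type": "EVIDENCE_VERSION", "id": "ev2", "reliability_score": 0.3},
    {"type": "CLAIM_VERSION", "id": "cv1"},
]
replicated = select_for_replication(records, ReplicationPolicy())
```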