Version 1.1 by Robert Schaub on 2025/11/27 12:02

= 5. Data Model =

The FactHarbor data model centers on four fully versioned entities whose versions are immutable:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim.
The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent
**identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}})
define //what// something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}})
define //how that thing looked at a given point in time//.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to
mutable identities.
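
The identity vs. version pattern can be illustrated with a minimal Python sketch. It is not FactHarbor code: the class names mirror {{code}}CLAIM{{/code}} and {{code}}CLAIM_VERSION{{/code}}, the snake_case field names and the sample status value are assumptions, and immutability is modeled here with frozen dataclasses.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass
class Claim:
    """Identity entity: a stable handle whose status may change over time."""
    claim_id: str
    status: str = "active"  # assumed status value, not defined in this chapter


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: an immutable snapshot of how the claim was phrased."""
    claim_version_id: str
    claim_id_fk: str
    text: str
    created_at: datetime


claim = Claim(claim_id="claim-1")
v1 = ClaimVersion("cv-1", claim.claim_id, "Coffee is healthy.",
                  datetime(2025, 1, 1, tzinfo=timezone.utc))
v2 = ClaimVersion("cv-2", claim.claim_id, "Moderate coffee intake is healthy.",
                  datetime(2025, 2, 1, tzinfo=timezone.utc))

# The identity can be updated in place ...
claim.status = "archived"

# ... but a version cannot: a rephrasing is a new version, never an edit.
try:
    v1.text = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Both versions point at the same identity, so a verdict or review attached to {{code}}cv-1{{/code}} keeps referring to exactly the text that was assessed.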

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
| **Logical concept** | **Identity entity** | **Version entity** | **Notes**
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}}
(scenarios live at the //claim// level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links
remain intact as historical records and can be revisited.
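
A short Python sketch of why attaching links and verdicts to versions preserves history. The in-memory lists stand in for database tables, and the ids, snake_case field names, and numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    scenario_version_id_fk: str
    evidence_version_id_fk: str
    relevance: float
    direction: str  # SUPPORTS / CONTRADICTS / MIXED / CONTEXT


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id_fk: str
    probability: float


# Append-only collections stand in for database tables.
links = [ScenarioEvidenceLinkVersion("sv-1", "ev-1", 0.8, "SUPPORTS")]
verdicts = [VerdictVersion("vv-1", "sv-1", 0.70)]

# The scenario is redefined as "sv-2": new rows are appended, and the rows
# that reference "sv-1" are left untouched.
links.append(ScenarioEvidenceLinkVersion("sv-2", "ev-1", 0.5, "MIXED"))
verdicts.append(VerdictVersion("vv-2", "sv-2", 0.55))


def verdicts_for(scenario_version_id: str) -> list[VerdictVersion]:
    """Return every verdict version recorded against one scenario version."""
    return [v for v in verdicts if v.scenario_version_id_fk == scenario_version_id]
```

Querying {{code}}verdicts_for("sv-1"){{/code}} still returns the original verdict after the scenario has changed, which is the historical record the design decisions above aim to keep.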

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships.
By convention, the first field of each entity, which ends in {{code}}Id{{/code}}, is its primary key,
and fields ending in {{code}}...IdFk{{/code}} are foreign keys.

{{mermaid}}
erDiagram

    CLAIM_CLUSTER {
        string claimClusterId
        string theme
        string embeddingVectorRef
        string language
        datetime createdAt
    }

    CLAIM {
        string claimId
        string claimClusterIdFk
        string status
        datetime createdAt
    }

    CLAIM_VERSION {
        string claimVersionId
        string claimIdFk
        string text
        string language
        string claimType
        string domain
        string authorType
        datetime createdAt
    }

    SCENARIO {
        string scenarioId
        string claimIdFk
        string key
        string title
        boolean isDeprecated
    }

    SCENARIO_VERSION {
        string scenarioVersionId
        string scenarioIdFk
        string versionTag
        string definitionsJson
        string assumptionsJson
        string boundariesJson
        string notes
        datetime createdAt
    }

    EVIDENCE {
        string evidenceId
        string canonicalSourceId
        string mainUrl
        string evidenceType
        string language
    }

    EVIDENCE_VERSION {
        string evidenceVersionId
        string evidenceIdFk
        string snapshotLocation
        string extractionSummary
        string reliabilityModel
        datetime collectedAt
        datetime createdAt
    }

    SCENARIO_EVIDENCE_LINK {
        string scenarioEvidenceLinkId
        string scenarioIdFk
        string evidenceIdFk
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string scenarioEvidenceLinkVersionId
        string scenarioEvidenceLinkIdFk
        string scenarioVersionIdFk
        string evidenceVersionIdFk
        float relevance
        string direction %% SUPPORTS / CONTRADICTS / MIXED / CONTEXT
        string rationale
        datetime createdAt
    }

    VERDICT {
        string verdictId
        string scenarioIdFk
        string verdictType %% e.g. likelihood, classification
    }

    VERDICT_VERSION {
        string verdictVersionId
        string verdictIdFk
        string scenarioVersionIdFk
        float probability
        float confidence
        string reasoningSummary
        string uncertaintyFactorsJson
        datetime createdAt
    }

    %% Relationships

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : has_versions
    CLAIM ||--o{ SCENARIO : has_scenarios
    SCENARIO ||--o{ SCENARIO_VERSION : has_versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : has_versions

    SCENARIO ||--o{ SCENARIO_EVIDENCE_LINK : may_link
    EVIDENCE ||--o{ SCENARIO_EVIDENCE_LINK : may_link

    SCENARIO_EVIDENCE_LINK ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : has_versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : uses_evidence
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : is_used_in

    SCENARIO ||--o{ VERDICT : has_verdicts
    VERDICT ||--o{ VERDICT_VERSION : has_versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : assessed_in
{{/mermaid}}

**Important points:**

* Scenarios and Evidence are **linked via their versions**
({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* The version entities are shared across diagrams: they are defined here and reappear as reference stubs in the Data Use / Review model. {{code}}CLAIM_CLUSTER{{/code}} is shown in this diagram only.

All version entities are immutable: once created, they are never changed, only
superseded by newer versions.
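
One way to picture "never changed, only superseded" is an append-only store. The sketch below is an illustration under assumptions, not the FactHarbor persistence layer: {{code}}EvidenceVersionStore{{/code}} is a hypothetical name, and it assumes that the version with the newest {{code}}createdAt{{/code}} is the one that supersedes the others.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id_fk: str
    extraction_summary: str
    created_at: datetime


class EvidenceVersionStore:
    """Append-only store: versions can be added and read, never edited."""

    def __init__(self) -> None:
        self._rows: list[EvidenceVersion] = []

    def append(self, version: EvidenceVersion) -> None:
        # Reusing an id would amount to overwriting an existing version.
        if any(r.evidence_version_id == version.evidence_version_id for r in self._rows):
            raise ValueError(f"{version.evidence_version_id} already exists")
        self._rows.append(version)

    def history(self, evidence_id: str) -> list[EvidenceVersion]:
        """All versions of one evidence identity, oldest first."""
        rows = [r for r in self._rows if r.evidence_id_fk == evidence_id]
        return sorted(rows, key=lambda r: r.created_at)

    def current(self, evidence_id: str) -> EvidenceVersion:
        """Assumption: the newest createdAt marks the superseding version."""
        return self.history(evidence_id)[-1]


store = EvidenceVersionStore()
store.append(EvidenceVersion("ev-1", "e-1", "Initial extraction",
                             datetime(2025, 1, 1, tzinfo=timezone.utc)))
store.append(EvidenceVersion("ev-2", "e-1", "Corrected extraction",
                             datetime(2025, 3, 1, tzinfo=timezone.utc)))

print(store.current("e-1").evidence_version_id)                 # ev-2
print([v.evidence_version_id for v in store.history("e-1")])    # ['ev-1', 'ev-2']
```

The store exposes no update or delete method, so the only way to change what readers see as current is to append a newer version.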

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{mermaid}}
erDiagram

    USER {
        string userId
        string displayName
        string email
        string userType %% "human" or "technical"
        datetime createdAt
    }

    TECHNICAL_USER {
        string technicalUserId
        string userIdFk
        string description
        string systemIdentifier
    }

    ROLE {
        string roleId
        string code %% e.g. READER, CONTRIBUTOR, REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN
        string description
        boolean isFederationRole
    }

    USER_ROLE_MEMBERSHIP {
        string membershipId
        string userIdFk
        string roleIdFk
        datetime grantedAt
        string grantedByUserIdFk
    }

    REVIEW_ACTION {
        string reviewActionId
        string subjectType %% e.g. CLAIM_VERSION, SCENARIO_VERSION...
        string subjectVersionId
        string actionType %% APPROVE, REJECT, FLAG, COMMENT, REQUEST_CHANGES...
        string outcome %% ACCEPTED, REJECTED, ESCALATED...
        string comment
        string createdByUserIdFk
        datetime createdAt
    }

    %% Versioned data entities (references from the core model)

    CLAIM_VERSION {
        string claimVersionId
    }

    SCENARIO_VERSION {
        string scenarioVersionId
    }

    EVIDENCE_VERSION {
        string evidenceVersionId
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string scenarioEvidenceLinkVersionId
    }

    VERDICT_VERSION {
        string verdictVersionId
    }

    %% Relationships

    USER ||--o{ TECHNICAL_USER : may_be
    USER ||--o{ USER_ROLE_MEMBERSHIP : has_role
    ROLE ||--o{ USER_ROLE_MEMBERSHIP : assigned_to

    USER ||--o{ REVIEW_ACTION : performs

    CLAIM_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    EVIDENCE_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_EVIDENCE_LINK_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    VERDICT_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
{{/mermaid}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR,
SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows
in {{code}}ROLE{{/code}}.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys,
node-to-node federation agents, batch jobs). All other roles can, in principle,
be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while
roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles
do.
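
The following Python sketch shows how a {{code}}REVIEW_ACTION{{/code}} could reference any versioned entity through the {{code}}subjectType{{/code}} / {{code}}subjectVersionId{{/code}} pair, combined with a simple role check. The function {{code}}record_review{{/code}} and the set of reviewing role codes are assumptions made for illustration; which roles may review is a policy decision that this chapter does not fix.

```python
from dataclasses import dataclass

# Subject types taken from the Data Use & Review ERD.
REVIEWABLE_SUBJECT_TYPES = {
    "CLAIM_VERSION", "SCENARIO_VERSION", "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION", "VERDICT_VERSION",
}

# Assumption: an example policy. READER is deliberately absent; federation
# roles are left out because the chapter only says "some" of them review.
REVIEWING_ROLE_CODES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    subject_type: str
    subject_version_id: str
    action_type: str
    created_by_user_id_fk: str


def record_review(action_id: str, user_id: str, user_role_codes: set[str],
                  subject_type: str, subject_version_id: str,
                  action_type: str) -> ReviewAction:
    """Create a review action, rejecting non-version subjects and non-reviewers."""
    if subject_type not in REVIEWABLE_SUBJECT_TYPES:
        raise ValueError(f"{subject_type} is not a versioned entity")
    if not REVIEWING_ROLE_CODES & user_role_codes:
        raise PermissionError("user holds no reviewing role")
    return ReviewAction(action_id, subject_type, subject_version_id,
                        action_type, user_id)


action = record_review("ra-1", "u-1", {"READER", "REVIEWER"},
                       "VERDICT_VERSION", "vv-1", "APPROVE")
```

Passing {{code}}"CLAIM"{{/code}} as the subject type raises a {{code}}ValueError{{/code}}, which mirrors the rule that reviews attach to versions, never to identities.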

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic
(described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing
earlier versions of the same evidence are candidates for re-assessment.
** Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and
are queued for re-evaluation.

* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the same scenario, or start empty,
depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new
{{code}}VERDICT_VERSION{{/code}} entries.

* REVIEW_ACTIONs are always attached to the **exact version** that was seen by
the reviewer. This preserves a faithful audit trail if data later changes.

* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs,
only EVIDENCE_VERSIONs above a reliability threshold, etc.)
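
The evidence-driven re-evaluation step can be sketched in Python as follows. This is a simplified illustration, not the trigger architecture from the Versioning and Automation chapters. It assumes that "related" links are those pointing at earlier versions of the same evidence, and {{code}}plan_reevaluation{{/code}} is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id_fk: str


@dataclass(frozen=True)
class LinkVersion:
    link_version_id: str
    scenario_version_id_fk: str
    evidence_version_id_fk: str


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id_fk: str


def plan_reevaluation(new_version: EvidenceVersion,
                      evidence_versions: list[EvidenceVersion],
                      links: list[LinkVersion],
                      verdicts: list[VerdictVersion]) -> tuple[list[str], list[str]]:
    """Return (link version ids to re-assess, verdict version ids to queue)."""
    # Earlier versions of the same evidence identity.
    superseded = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id_fk == new_version.evidence_id_fk
        and ev.evidence_version_id != new_version.evidence_version_id
    }
    affected_links = [l for l in links if l.evidence_version_id_fk in superseded]
    affected_scenarios = {l.scenario_version_id_fk for l in affected_links}
    queued = [v.verdict_version_id for v in verdicts
              if v.scenario_version_id_fk in affected_scenarios]
    return [l.link_version_id for l in affected_links], queued


existing = [EvidenceVersion("ev-1", "e-1"), EvidenceVersion("ev-9", "e-2")]
links = [LinkVersion("l-1", "sv-1", "ev-1"), LinkVersion("l-2", "sv-2", "ev-9")]
verdicts = [VerdictVersion("vv-1", "sv-1"), VerdictVersion("vv-2", "sv-2")]

to_reassess, to_queue = plan_reevaluation(
    EvidenceVersion("ev-2", "e-1"), existing, links, verdicts)
print(to_reassess, to_queue)  # ['l-1'] ['vv-1']
```

Nothing is modified in this step: the function only identifies candidates, and any new assessments would be stored as additional link and verdict versions.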

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger
re-evaluation cascades. This is handled by the global trigger and automation
architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
Scenario–Evidence links are re-assessed. Old versions remain available for
historical comparison and reproducibility.

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making
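
As a rough illustration of subset replication, the Python sketch below filters claim versions by domain and reduces an evidence version to its metadata. It makes two assumptions that the chapter does not state: "local interest" is approximated by the {{code}}domain{{/code}} field of {{code}}CLAIM_VERSION{{/code}}, and "full content" is taken to mean the snapshot referenced by {{code}}snapshotLocation{{/code}}. The helper names are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClaimVersion:
    claim_version_id: str
    claim_id_fk: str
    domain: str
    language: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id_fk: str
    snapshot_location: str
    extraction_summary: str


# Assumption: only the snapshot pointer counts as "full content".
CONTENT_FIELDS = ("snapshot_location",)


def claims_of_local_interest(versions: list[ClaimVersion],
                             domains: set[str]) -> list[ClaimVersion]:
    """Keep only claim versions whose domain the local node cares about."""
    return [v for v in versions if v.domain in domains]


def evidence_metadata(version: EvidenceVersion) -> dict[str, str]:
    """Project an evidence version to the fields that may leave the origin node."""
    return {k: v for k, v in asdict(version).items() if k not in CONTENT_FIELDS}


claim_versions = [
    ClaimVersion("cv-1", "claim-1", "health", "en"),
    ClaimVersion("cv-2", "claim-2", "energy", "de"),
]
selected = claims_of_local_interest(claim_versions, {"health"})
meta = evidence_metadata(
    EvidenceVersion("ev-1", "e-1", "snapshots/ev-1.html", "Summary of the study"))
```

Because the ids of the replicated rows are unchanged, a receiving node can still resolve references between the versions it holds.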

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}},
replication logs, and trust rules) are described in the Federation &
Decentralization chapter and build on top of the core data model defined here.