Version 2.1 by Robert Schaub on 2025/11/27 12:05

= 5. Data Model =

The FactHarbor data model centers on four core entities. Each is fully versioned, and every version is immutable:

* **Claim**
* **Scenario**
* **Evidence**
* **Verdict**

These entities form the structured **“truth landscape”** for each claim. The model is explicitly **versioned**, **traceable**, and **federation-ready**.

To keep the system auditable and explainable, FactHarbor uses a consistent **identity vs. version** pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define *what* something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define *how that thing looked at a given point in time*.

All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to mutable identities.
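
To make the identity vs. version split concrete, here is a minimal Python sketch for claims. It assumes in-memory objects; the class names, the snake_case fields (mirroring the ERD in 5.2), the UUID identifiers, and the `rephrase` helper are illustrative, not part of FactHarbor. Version objects are frozen dataclasses, so any attempt to edit one raises an error.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def new_id() -> str:
    """Opaque identifier; UUID4 is an illustrative choice, not a FactHarbor spec."""
    return str(uuid.uuid4())


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: one immutable snapshot of a claim's phrasing."""
    claim_version_id: str
    claim_id: str  # points back to the stable identity
    text: str
    claim_type: str
    domain: str
    created_at: datetime


@dataclass
class Claim:
    """Identity entity: stable across rephrasings, carries no claim text."""
    claim_id: str
    cluster_id: str
    status: str = "active"  # hypothetical status value
    # In-memory history for illustration only; a database would rely on
    # the ClaimID foreign key in CLAIM_VERSION instead.
    versions: list[ClaimVersion] = field(default_factory=list)

    def rephrase(self, text: str, claim_type: str, domain: str) -> ClaimVersion:
        """Append a new immutable version; earlier versions stay unchanged."""
        version = ClaimVersion(
            claim_version_id=new_id(),
            claim_id=self.claim_id,
            text=text,
            claim_type=claim_type,
            domain=domain,
            created_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    @property
    def current(self) -> ClaimVersion:
        """The newest version (raises IndexError if none exists yet)."""
        return self.versions[-1]


if __name__ == "__main__":
    claim = Claim(claim_id=new_id(), cluster_id="cluster-1")
    first = claim.rephrase("Coffee causes cancer", "causal", "health")
    claim.rephrase("Drinking coffee raises cancer risk", "causal", "health")
    print(claim.current.text)  # newest phrasing
    print(first.text)          # the original phrasing is still available
```

Rephrasing never touches the `Claim` identity's ID, so anything attached to a specific `ClaimVersion` keeps pointing at the exact wording it was based on.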

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
|=Logical concept|=Identity entity|=Version entity|=Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A {{code}}SCENARIO{{/code}} belongs to one {{code}}CLAIM{{/code}}. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | {{code}}EVIDENCE{{/code}} identifies a source; its versions record specific extractions and updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A {{code}}VERDICT{{/code}} is defined per {{code}}SCENARIO{{/code}}; each {{code}}VERDICT_VERSION{{/code}} records one assessment of a specific {{code}}SCENARIO_VERSION{{/code}}, so together they form the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions, with a relevance score and a direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims, mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the *claim* level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to **versions**:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence item changes, old verdicts and links remain intact as historical records and can be revisited.
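
A minimal Python sketch of these two version-level bindings, assuming in-memory storage. The class and method names (`VersionedHistory`, `add_verdict`, `links_for`, and so on) and the direction values are hypothetical; the point is that every link and verdict record carries version IDs, so a new scenario version adds records instead of rewriting old ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    """Binds one scenario version to one evidence version."""
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str
    relevance: float  # assumed to lie in 0.0..1.0
    direction: str    # hypothetical values: "supports" or "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    """One assessment of one specific scenario version."""
    verdict_version_id: str
    verdict_id: str
    scenario_version_id: str
    verdict: float
    confidence: float


class VersionedHistory:
    """Append-only store: it offers no update or delete operation."""

    def __init__(self) -> None:
        self._links: list[ScenarioEvidenceLinkVersion] = []
        self._verdicts: list[VerdictVersion] = []

    def add_link(self, link: ScenarioEvidenceLinkVersion) -> None:
        self._links.append(link)

    def add_verdict(self, verdict: VerdictVersion) -> None:
        self._verdicts.append(verdict)

    def links_for(self, scenario_version_id: str) -> list[ScenarioEvidenceLinkVersion]:
        return [link for link in self._links if link.scenario_version_id == scenario_version_id]

    def verdicts_for(self, scenario_version_id: str) -> list[VerdictVersion]:
        return [v for v in self._verdicts if v.scenario_version_id == scenario_version_id]


history = VersionedHistory()
history.add_link(ScenarioEvidenceLinkVersion("lv1", "sv1", "ev1", 0.8, "supports"))
history.add_verdict(VerdictVersion("vv1", "v1", "sv1", verdict=0.7, confidence=0.6))
# The scenario changes: sv2 gets its own verdict, while sv1's records stay as they were.
history.add_verdict(VerdictVersion("vv2", "v1", "sv2", verdict=0.4, confidence=0.7))
```

Both verdict versions share the same `verdict_id` (the identity), yet each stays tied to the scenario version it assessed.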

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. In this diagram, primary keys are marked {{code}}PK{{/code}} and foreign keys {{code}}FK{{/code}}; a foreign-key field reuses the name of the key it references (e.g. {{code}}ClaimID{{/code}} in {{code}}CLAIM_VERSION{{/code}}).

{{mermaid}}
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : assessed_in
{{/mermaid}}

**Important points:**

* Scenarios and Evidence are **linked via their versions** ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}). The diagram draws {{code}}SCENARIO_EVIDENCE_LINK{{/code}} directly between them and omits its version entity {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} (see 5.1).
* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} has no version entity and is shown only here, not in the Data Use & Review diagram (5.3).

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
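
Immutability can also be enforced by the storage layer rather than left to application code. The sketch below uses Python's built-in `sqlite3` module and SQLite triggers on a reduced claim schema; the table and column names are illustrative, and FactHarbor's actual storage engine is not specified here. Identity rows (`claim`) stay updatable, while version rows can only be inserted.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE claim (
    claim_id   TEXT PRIMARY KEY,
    cluster_id TEXT NOT NULL,
    status     TEXT NOT NULL        -- identity rows may change
);

CREATE TABLE claim_version (
    claim_version_id TEXT PRIMARY KEY,
    claim_id         TEXT NOT NULL REFERENCES claim(claim_id),
    claim_text       TEXT NOT NULL,
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Version rows are append-only: reject every UPDATE and DELETE.
CREATE TRIGGER claim_version_no_update BEFORE UPDATE ON claim_version
BEGIN
    SELECT RAISE(ABORT, 'claim_version rows are immutable');
END;

CREATE TRIGGER claim_version_no_delete BEFORE DELETE ON claim_version
BEGIN
    SELECT RAISE(ABORT, 'claim_version rows are immutable');
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_version(conn: sqlite3.Connection, version_id: str, claim_id: str, text: str) -> None:
    """Superseding a claim version means inserting a new row."""
    conn.execute(
        "INSERT INTO claim_version (claim_version_id, claim_id, claim_text) VALUES (?, ?, ?)",
        (version_id, claim_id, text),
    )


def latest_text(conn: sqlite3.Connection, claim_id: str) -> Optional[str]:
    """Newest version wins; ties within the same second fall back to rowid order."""
    row = conn.execute(
        "SELECT claim_text FROM claim_version WHERE claim_id = ? "
        "ORDER BY created_at DESC, rowid DESC LIMIT 1",
        (claim_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO claim VALUES ('c1', 'k1', 'active')")
    add_version(db, "cv1", "c1", "Coffee causes cancer")
    add_version(db, "cv2", "c1", "Drinking coffee raises cancer risk")
    print(latest_text(db, "c1"))
```

Any `UPDATE` or `DELETE` on `claim_version` fails with a database error, so "changing" a claim can only mean adding a superseding row.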

----

= 5.3 Data Use & Review ERD (expanded, versioned) =

The **Data Use** model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{mermaid}}
erDiagram

    USER {
        string userId PK
        string displayName
        string email
        string userType "human or technical"
        datetime createdAt
    }

    TECHNICAL_USER {
        string technicalUserId PK
        string userIdFk FK
        string description
        string systemIdentifier
    }

    ROLE {
        string roleId PK
        string code "e.g. READER, CONTRIBUTOR, REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN"
        string description
        boolean isFederationRole
    }

    USER_ROLE_MEMBERSHIP {
        string membershipId PK
        string userIdFk FK
        string roleIdFk FK
        datetime grantedAt
        string grantedByUserIdFk FK
    }

    REVIEW_ACTION {
        string reviewActionId PK
        string subjectType "e.g. CLAIM_VERSION, SCENARIO_VERSION, ..."
        string subjectVersionId
        string actionType "APPROVE, REJECT, FLAG, COMMENT, REQUEST_CHANGES, ..."
        string outcome "ACCEPTED, REJECTED, ESCALATED, ..."
        string comment
        string createdByUserIdFk FK
        datetime createdAt
    }

    %% Versioned data entities (references from the core model)

    CLAIM_VERSION {
        string claimVersionId PK
    }

    SCENARIO_VERSION {
        string scenarioVersionId PK
    }

    EVIDENCE_VERSION {
        string evidenceVersionId PK
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string scenarioEvidenceLinkVersionId PK
    }

    VERDICT_VERSION {
        string verdictVersionId PK
    }

    %% Relationships

    USER ||--o| TECHNICAL_USER : may_be
    USER ||--o{ USER_ROLE_MEMBERSHIP : has_role
    ROLE ||--o{ USER_ROLE_MEMBERSHIP : assigned_to

    USER ||--o{ REVIEW_ACTION : performs

    CLAIM_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    EVIDENCE_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    SCENARIO_EVIDENCE_LINK_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
    VERDICT_VERSION ||--o{ REVIEW_ACTION : is_reviewed_in
{{/mermaid}}

Notes:

* Roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}} and granted to users through {{code}}USER_ROLE_MEMBERSHIP{{/code}}.
* Technical accounts are the exception: {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs) instead of a {{code}}ROLE{{/code}} row. The roles in {{code}}ROLE{{/code}} can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
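
The diagram does not itself encode which roles may review; one option is to check this in an application layer. A minimal Python sketch, assuming a hypothetical `record_review` function and a hypothetical set of reviewing roles (the notes only say that READER normally does not review): it checks that the user holds a reviewing role and that the subject is one of the five version entities, then records the exact version ID.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# A REVIEW_ACTION may only target the five version entities of the core model.
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})

# Hypothetical policy: role codes allowed to review. Federation roles that
# review could be added here; READER is deliberately absent.
REVIEWING_ROLES = frozenset({"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"})


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    subject_type: str
    subject_version_id: str  # the exact version the reviewer saw
    action_type: str
    outcome: str
    comment: str
    created_by_user_id: str
    created_at: datetime


def record_review(
    review_action_id: str,
    user_id: str,
    user_roles: set[str],
    subject_type: str,
    subject_version_id: str,
    action_type: str,
    outcome: str,
    comment: str = "",
) -> ReviewAction:
    """Create a review action after checking the user's roles and the subject type."""
    if not (user_roles & REVIEWING_ROLES):
        raise PermissionError(f"user {user_id} holds no reviewing role")
    if subject_type not in REVIEWABLE_TYPES:
        raise ValueError(f"{subject_type} is not a reviewable version entity")
    return ReviewAction(
        review_action_id=review_action_id,
        subject_type=subject_type,
        subject_version_id=subject_version_id,
        action_type=action_type,
        outcome=outcome,
        comment=comment,
        created_by_user_id=user_id,
        created_at=datetime.now(timezone.utc),
    )
```

Identity entities such as `CLAIM` are rejected as subjects, which keeps every review tied to a concrete, immutable version.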

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic described in more detail in the Versioning and Automation chapters.

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference an earlier version of the same {{code}}EVIDENCE{{/code}} become candidates for re-assessment.
** {{code}}VERDICT_VERSION{{/code}} entries for the scenario versions behind those links may become **outdated** and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from earlier versions of the same scenario, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the **exact version** the reviewer saw. This preserves a faithful audit trail if the data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, only EVIDENCE_VERSIONs above a reliability threshold, etc.)
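
The evidence-driven part of this behavior amounts to a lookup followed by an enqueue. A minimal Python sketch, assuming in-memory lists and a plain `deque` as the re-evaluation queue; the function name and the `("re-evaluate-verdict", …)` work item are hypothetical, and the actual trigger architecture is described in the Versioning and Automation chapters. For simplicity it treats links to *any* earlier version of the evidence as stale.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str


@dataclass(frozen=True)
class LinkVersion:
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str


def on_new_evidence_version(
    new_version: EvidenceVersion,
    evidence_versions: list[EvidenceVersion],
    links: list[LinkVersion],
    queue: deque,
) -> tuple[list[LinkVersion], list[str]]:
    """Find links made stale by new_version and queue verdict re-evaluation.

    Returns the links to re-assess and the affected scenario version IDs.
    """
    # Earlier versions of the same EVIDENCE identity.
    earlier = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id == new_version.evidence_id
        and ev.evidence_version_id != new_version.evidence_version_id
    }
    stale_links = [link for link in links if link.evidence_version_id in earlier]
    # Deduplicate scenario versions while keeping first-seen order.
    affected = list(dict.fromkeys(link.scenario_version_id for link in stale_links))
    for scenario_version_id in affected:
        queue.append(("re-evaluate-verdict", scenario_version_id))
    return stale_links, affected
```

Nothing is modified in place: the stale links and verdicts stay in the history, and re-evaluation later produces new link and verdict versions.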

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts **outdated** and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
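
The link-inheritance rule for new scenario versions (5.4) can be sketched in Python as follows. The source only says that inheritance depends on whether a change is cosmetic or conceptual; the concrete rule here (copy every link for cosmetic changes, start empty for conceptual ones), the function name, and the UUID link IDs are assumptions for illustration. Verdict recalculation is left out.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinkVersion:
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str


def links_for_new_scenario_version(
    old_scenario_version_id: str,
    new_scenario_version_id: str,
    change: str,
    links: list[LinkVersion],
) -> list[LinkVersion]:
    """Return link versions for the new scenario version.

    Hypothetical rule: "cosmetic" copies every link of the old version,
    "conceptual" starts empty. Existing links are never modified.
    """
    if change == "conceptual":
        return []
    if change != "cosmetic":
        raise ValueError(f"unknown change classification: {change!r}")
    return [
        # replace() builds a new frozen object; the original link is untouched.
        replace(
            link,
            link_version_id=str(uuid.uuid4()),
            scenario_version_id=new_scenario_version_id,
        )
        for link in links
        if link.scenario_version_id == old_scenario_version_id
    ]
```

Because the copies are new link versions, the old scenario version keeps its original links for historical comparison.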

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
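
Both the replication choices in 5.4 and the subsets listed here can be expressed as a filter over versioned records. A minimal Python sketch, assuming a hypothetical `ReplicationPolicy`, a 0.7 reliability threshold, a `content` field standing in for full evidence content, and that "accepted" means an ACCEPTED review action on the verdict version; none of these are defined by the data model itself.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    reliability_score: float
    summary: str
    content: Optional[str] = None  # hypothetical full-content field


@dataclass(frozen=True)
class ReviewAction:
    subject_type: str
    subject_version_id: str
    outcome: str


@dataclass(frozen=True)
class ReplicationPolicy:
    min_reliability: float = 0.7  # hypothetical threshold
    accepted_verdicts_only: bool = True
    include_evidence_content: bool = False


def accepted_verdict_ids(reviews: list[ReviewAction]) -> set[str]:
    """Assumption: a verdict version counts as accepted if any review accepted it."""
    return {
        r.subject_version_id
        for r in reviews
        if r.subject_type == "VERDICT_VERSION" and r.outcome == "ACCEPTED"
    }


def select_for_replication(
    policy: ReplicationPolicy,
    evidence: list[EvidenceVersion],
    verdict_version_ids: list[str],
    reviews: list[ReviewAction],
) -> tuple[list[EvidenceVersion], list[str]]:
    """Pick the versioned records a node would send to a peer."""
    evidence_out = [
        # Strip full content unless the policy allows it (metadata only).
        ev if policy.include_evidence_content else replace(ev, content=None)
        for ev in evidence
        if ev.reliability_score > policy.min_reliability  # strictly above
    ]
    accepted = accepted_verdict_ids(reviews)
    verdicts_out = [
        vid for vid in verdict_version_ids
        if vid in accepted or not policy.accepted_verdicts_only
    ]
    return evidence_out, verdicts_out
```

The filter works on version records only, so a receiving node can still trace each replicated verdict back to the exact version it was accepted in.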