Wiki source code of Data Model

Version 5.1 by Robert Schaub on 2025/12/14 22:27

= Data Model =

This page describes the current data model for FactHarbor.

== Versioning Strategy ==

Every entity in FactHarbor has a full immutable version history. This ensures:
* Complete auditability
* Ability to reconstruct historical state
* Federation-compatible lineage tracking
* Transparent evolution of claims, scenarios, and verdicts

=== Core Versioning Principles ===

**Immutability**:
* Each version is stored independently
* Versions cannot be deleted, only superseded
* Historical versions remain accessible

**Lineage**:
* Each version links to its parent via `ParentVersionID`
* Versions form a directed acyclic graph (DAG) of changes
* Supports branching in federated environments
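
Because every version stores its `ParentVersionID`, the full history of an entity can be recovered by following parent links back to the first version. The sketch below is illustrative only: it assumes the versions of one entity are held in a plain dict mapping `VersionID` to `ParentVersionID`, and the function name `lineage` is hypothetical.

```python
from typing import Optional


def lineage(parents: dict[str, Optional[str]], version_id: str) -> list[str]:
    """Walk from `version_id` back to the first version of the entity.

    `parents` maps each VersionID to its ParentVersionID (None for the
    first version). A repeated ID means the parent links contain a cycle,
    which the DAG property forbids, so it is reported as an error.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents[current]  # KeyError if a parent was never imported
    return chain


history = {"v1": None, "v2": "v1", "v3": "v2"}
print(lineage(history, "v3"))  # ['v3', 'v2', 'v1']
```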

**Provenance**:
* Every version timestamped (`CreatedAt`)
* Author type recorded (`AuthorType`: Human, AI, ExternalNode)
* Justification captured (`JustificationText`)
* Digital signatures for integrity (`SignatureHash` in Release 1.0)
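
This page does not define how `SignatureHash` is derived. One plausible building block, shown here purely as an assumption, is a SHA-256 digest over a canonical JSON serialization of the version's fields. Sorting the keys makes the digest independent of field order, so two nodes hashing the same version agree. A true signature would additionally sign this digest with the authoring node's private key, which this sketch omits. The name `content_digest` is hypothetical.

```python
import hashlib
import json


def content_digest(fields: dict) -> str:
    """Hex SHA-256 digest of a version's fields in canonical JSON form.

    sort_keys and fixed separators remove formatting differences, so the
    same logical content always yields the same digest.
    """
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


version = {
    "VersionID": "v2",
    "ParentVersionID": "v1",
    "CreatedAt": "2025-01-01T00:00:00Z",
    "AuthorType": "Human",
    "JustificationText": "Narrowed the claim wording",
}
digest = content_digest(version)
print(len(digest))  # 64
```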

**Federation Support**:
* Versions can originate from remote nodes
* Conflict detection via lineage comparison
* Parallel version trees for branching scenarios
* Cross-node version synchronization

=== Common Version Fields ===

All versioned entities include:

* **VersionID**: Unique identifier for this specific version
* **ParentVersionID**: Link to previous version (null for first version)
* **CreatedAt**: Timestamp (ISO 8601, UTC)
* **AuthorType**: Human | AI | ExternalNode
* **JustificationText**: Brief explanation of changes
* **SignatureHash**: Cryptographic signature (Release 1.0)
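
The common fields map naturally onto an immutable record type. The Python sketch below is an assumption about representation, not FactHarbor's actual schema. A frozen dataclass stands in for "versions cannot be modified", and construction rejects timestamps that are not in UTC. The snake_case attribute names correspond to the fields listed above. `VersionMeta` is a hypothetical name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class VersionMeta:
    version_id: str
    parent_version_id: Optional[str]       # None for the first version
    created_at: datetime                   # timezone-aware, UTC
    author_type: AuthorType
    justification_text: str
    signature_hash: Optional[str] = None   # only populated from Release 1.0

    def __post_init__(self) -> None:
        # utcoffset() is None for naive datetimes and non-zero for other zones
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("created_at must be a UTC timestamp")


first = VersionMeta("v1", None, datetime.now(timezone.utc), AuthorType.HUMAN, "Initial import")
print(first.created_at.isoformat())  # ISO 8601 with a +00:00 offset
```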

----

== Core Data Model Refinements ==

The system relies on the following versioned core entities:

* **CLAIM_CLUSTER**
** ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
** Groups related claims into topical clusters.
** One Cluster has many Claims.
** A Claim belongs to exactly one primary cluster.

* **CLAIM / CLAIM_VERSION**
** ``CLAIM`` is the long-lived anchor for a real-world claim.
** ``CLAIM_VERSION`` is an immutable snapshot that includes:
*** ``ClaimID`` (FK to CLAIM)
*** ``VersionID`` (PK)
*** ``ParentVersionID`` (FK to prior version, nullable)
*** ``Text``
*** ``Domain``
*** ``ClaimType`` (literal, metaphorical, rhetorical, supernatural...)
*** ``Evaluability`` (empirical, subjective, non-falsifiable)
*** ``SafetyCategory`` (low, medium, high)
*** ``CreatedAt``, ``AuthorType``, ``JustificationText``
*** ``Status`` (active, superseded, merged)

* **SCENARIO / SCENARIO_VERSION**
** ``SCENARIO`` is the anchor for a scenario across time.
** ``SCENARIO_VERSION`` is an immutable snapshot:
*** ``ScenarioID`` (FK to SCENARIO)
*** ``VersionID`` (PK)
*** ``ParentVersionID``
*** ``ClaimID`` (FK to CLAIM)
*** ``Definitions``
*** ``Boundaries``
*** ``Assumptions``
*** ``Context``
*** ``EvaluationMethod``
*** ``SafetyClass``
*** ``CreatedAt``, ``AuthorType``, ``JustificationText``
*** ``Status`` (active, superseded, deprecated)

* **EVIDENCE / EVIDENCE_VERSION**
** ``EVIDENCE`` is the anchor.
** ``EVIDENCE_VERSION`` is the versioned snapshot:
*** ``EvidenceID`` (FK to EVIDENCE)
*** ``VersionID`` (PK)
*** ``ParentVersionID``
*** ``Type`` (paper, dataset, report, transcript, expert...)
*** ``Category`` (empirical, historical, rhetorical, dataset, meta-analysis...)
*** ``Reliability`` (low, medium, high)
*** ``Provenance`` (URL, DOI, source metadata)
*** ``ExtractionMethod`` (manual, OCR, API, AKEL)
*** ``CreatedAt``, ``AuthorType``, ``JustificationText``
*** ``Status`` (verified, updated, disputed, retracted, superseded)

* **VERDICT / VERDICT_VERSION**
** ``VERDICT`` is the anchor.
** ``VERDICT_VERSION`` is the snapshot:
*** ``VerdictID`` (FK to VERDICT)
*** ``VersionID`` (PK)
*** ``ParentVersionID``
*** ``ClaimID`` (FK to CLAIM)
*** ``ScenarioID`` (FK to SCENARIO)
*** ``EvidenceVersionSet`` (list of evidence version IDs used)
*** ``LikelihoodRange`` (0–1, with uncertainty bounds)
*** ``ExplanationChain``
*** ``UncertaintyFactors``
*** ``CreatedAt``, ``AuthorType``, ``JustificationText``
*** ``Status`` (current, outdated, superseded, retracted)
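
`LikelihoodRange` is described only as "0–1, with uncertainty bounds". The following minimal sketch assumes it is stored as a lower bound, a point estimate and an upper bound, a representation chosen here for illustration. It shows how that constraint can be enforced when a verdict version is created. It also shows that pinning `EvidenceVersionSet` to exact evidence version IDs lets anyone check later which evidence a verdict actually used. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    low: float       # lower uncertainty bound
    estimate: float  # point estimate
    high: float      # upper uncertainty bound

    def __post_init__(self) -> None:
        if not 0.0 <= self.low <= self.estimate <= self.high <= 1.0:
            raise ValueError("expected 0 <= low <= estimate <= high <= 1")


@dataclass(frozen=True)
class VerdictVersion:
    verdict_id: str
    version_id: str
    claim_id: str
    scenario_id: str
    evidence_version_set: frozenset[str]  # exact evidence versions, not anchors
    likelihood: LikelihoodRange

    def used(self, evidence_version_id: str) -> bool:
        """True if this verdict was computed from that evidence version."""
        return evidence_version_id in self.evidence_version_set


verdict = VerdictVersion(
    "d1", "d1-v1", "c1", "s1",
    frozenset({"e1-v2", "e7-v1"}),
    LikelihoodRange(0.55, 0.70, 0.80),
)
print(verdict.used("e1-v2"), verdict.used("e1-v3"))  # True False
```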

----

== Many-to-Many Linking Tables ==

=== ScenarioEvidenceLink ===

Links scenario versions to evidence versions with relevance scoring.

**Fields**:
* ``ScenarioID``
* ``ScenarioVersionID``
* ``EvidenceID``
* ``EvidenceVersionID``
* ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
* ``LinkJustification`` - Brief explanation of relevance

**Purpose**:
* Evidence can be used by multiple scenarios
* Scenarios can draw from multiple pieces of evidence
* Relevance scoring helps prioritize evidence
* Version-specific linking preserves historical accuracy
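
A link row could be modelled as follows. This is a sketch under assumptions. The range check on `RelevanceScore` and the helper `prioritized_evidence` are illustrations of "relevance scoring helps prioritize evidence", not a documented FactHarbor API. `prioritized_evidence` orders one scenario version's evidence by relevance. Filtering on `ScenarioVersionID` rather than `ScenarioID` keeps links made for older scenario versions out of the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0-1
    link_justification: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("relevance_score must be between 0 and 1")


def prioritized_evidence(links: list[ScenarioEvidenceLink], scenario_version_id: str) -> list[str]:
    """Evidence version IDs linked to one scenario version, most relevant first."""
    matching = [link for link in links if link.scenario_version_id == scenario_version_id]
    matching.sort(key=lambda link: link.relevance_score, reverse=True)
    return [link.evidence_version_id for link in matching]


links = [
    ScenarioEvidenceLink("s1", "s1-v2", "e1", "e1-v1", 0.4, "Indirect indicator"),
    ScenarioEvidenceLink("s1", "s1-v2", "e2", "e2-v3", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("s1", "s1-v1", "e3", "e3-v1", 0.8, "Linked to an older scenario version"),
]
print(prioritized_evidence(links, "s1-v2"))  # ['e2-v3', 'e1-v1']
```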

=== ClaimCluster ===

Semantic clustering of similar claims.

**Fields**:
* ``ClusterID`` (PK)
* ``EmbeddingVector`` - Vector representation for semantic search
* ``MemberList`` - List of ClaimIDs in this cluster
* ``Theme`` - Human-readable theme description

**Purpose**:
* Groups semantically similar claims
* Enables efficient search and discovery
* Supports cross-node claim alignment
* Reduces duplication
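
The page stores an embedding per cluster but does not say how a claim is assigned to its primary cluster. One common approach, assumed here, is nearest-centroid assignment by cosine similarity, treating each cluster's embedding as its centroid. The 0.8 threshold and the function names are hypothetical. Returning a single best match is one way to honour "a Claim belongs to exactly one primary cluster". A `None` result signals that no existing cluster is close enough, so a new one might be created.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def primary_cluster(
    claim_vec: list[float],
    centroids: dict[str, list[float]],
    threshold: float = 0.8,  # hypothetical cut-off
) -> Optional[str]:
    """ClusterID of the most similar cluster, or None if none reaches threshold."""
    best_id: Optional[str] = None
    best_sim = threshold
    for cluster_id, centroid in centroids.items():
        sim = cosine(claim_vec, centroid)
        if sim >= best_sim:
            best_id, best_sim = cluster_id, sim
    return best_id


centroids = {"health": [1.0, 0.0], "climate": [0.0, 1.0]}
print(primary_cluster([0.9, 0.1], centroids))  # health
print(primary_cluster([0.7, 0.7], centroids))  # None
```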

----

== Data Model Behavior ==

=== Late-Arriving Evidence ===

When new evidence versions appear:

1. Existing verdicts that depend on the evidence are marked as **outdated**
2. The evidence's relevance to linked scenarios must be re-evaluated
3. The re-evaluation engine triggers verdict recomputation
4. New verdict versions are created
5. Users are notified of the updates

**Process**:
* A new EvidenceVersion is imported
* The system scans related ScenarioEvidenceLinks
* It checks whether the evidence affects existing verdicts
* Affected verdicts are queued for re-evaluation
* AKEL or a reviewer creates a new VerdictVersion
* Old verdicts remain accessible (historical record)
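
The process above can be sketched in Python under a few assumptions. Links are matched on `EvidenceID` (the anchor) rather than on a specific evidence version, so links made against an older version of the evidence still count. "Affects" is approximated as "the verdict's scenario version is linked to this evidence". Verdict status is treated as the mutable field that this page describes being set to **outdated**. The types and the function `on_new_evidence_version` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:  # the subset of ScenarioEvidenceLink this sketch needs
    scenario_version_id: str
    evidence_id: str


@dataclass
class VerdictRecord:
    verdict_version_id: str
    scenario_version_id: str
    status: str = "current"


def on_new_evidence_version(
    evidence_id: str,
    links: list[Link],
    verdicts: list[VerdictRecord],
    queue: deque,
) -> None:
    """Mark current verdicts that rely on this evidence as outdated and queue them."""
    affected_scenarios = {
        link.scenario_version_id for link in links if link.evidence_id == evidence_id
    }
    for verdict in verdicts:
        if verdict.status == "current" and verdict.scenario_version_id in affected_scenarios:
            verdict.status = "outdated"                # the record itself is kept
            queue.append(verdict.verdict_version_id)  # for AKEL or a reviewer


links = [Link("s1-v2", "e1"), Link("s2-v1", "e2")]
verdicts = [VerdictRecord("d1-v1", "s1-v2"), VerdictRecord("d2-v1", "s2-v1")]
queue: deque = deque()
on_new_evidence_version("e1", links, verdicts, queue)
print(list(queue), [v.status for v in verdicts])  # ['d1-v1'] ['outdated', 'current']
```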

=== Scenario Evolution ===

When a scenario's assumptions or definitions change:

**A new scenario version is created** (not an in-place update):
* New ScenarioVersion with updated fields
* ParentVersionID points to the previous version
* All dependent verdicts must be recalculated
* Previous scenario versions remain accessible
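
A scenario change can be sketched as creating a child version with `dataclasses.replace`, which copies a frozen record with selected fields changed and leaves the original untouched. The index from scenario version to the verdicts computed against it (`verdicts_by_scenario_version`) and the function `evolve` are hypothetical. The point is that existing verdicts are not edited, only listed for recalculation against the new version.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScenarioVersion:
    scenario_id: str
    version_id: str
    parent_version_id: Optional[str]
    assumptions: tuple[str, ...]
    boundaries: str


def evolve(
    prev: ScenarioVersion,
    new_version_id: str,
    verdicts_by_scenario_version: dict[str, list[str]],
    **changes,
) -> tuple[ScenarioVersion, list[str]]:
    """Return the child version and the verdicts that need recalculation.

    Verdicts computed against `prev` are not modified; they remain valid
    as historical records for the old scenario version.
    """
    child = replace(prev, version_id=new_version_id, parent_version_id=prev.version_id, **changes)
    to_recompute = sorted(verdicts_by_scenario_version.get(prev.version_id, []))
    return child, to_recompute


v1 = ScenarioVersion("s1", "s1-v1", None, ("adults only",), "2010-2020")
v2, todo = evolve(v1, "s1-v2", {"s1-v1": ["d2", "d1"]}, boundaries="2010-2024")
print(v2.parent_version_id, v2.boundaries, todo)  # s1-v1 2010-2024 ['d1', 'd2']
print(v1.boundaries)  # 2010-2020
```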

**Triggers**:
* Refined definitions
* Changed assumptions
* Expanded or narrowed boundaries
* Updated evaluation methods
* Safety classification changes

**Impact**:
* Verdicts based on the old scenario version remain valid as historical records
* New verdicts are required for the new scenario version
* Users can compare old and new scenario versions
* Evidence links may need re-assessment

=== Federated Nodes ===

Each node may share partial data:

**Claims and scenarios**: Shared if relevant to the node's domain

**Evidence metadata**: Shared, though the full evidence files are not always included

**Verdict lineage**: Shared only if not locally overridden

**Version synchronization**:
* Remote versions are imported with provenance metadata
* Conflicts are detected via ParentVersionID comparison
* Branching is allowed for divergent interpretations
* The local node retains authority over local versions
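
Conflict detection by `ParentVersionID` comparison can be illustrated with a small classifier for an incoming remote version. This is a sketch under assumptions. The local history is a dict of `VersionID` to `ParentVersionID` for one entity, and `local_head` is the version the local node currently considers latest. The outcome labels are hypothetical names rather than FactHarbor terminology.

```python
from typing import Optional


def classify_remote(
    local: dict[str, Optional[str]],
    local_head: Optional[str],
    remote_id: str,
    remote_parent: Optional[str],
) -> str:
    """Compare an incoming version's parent with the local version history."""
    if remote_id in local:
        return "duplicate"         # already imported
    if remote_parent == local_head:
        return "fast-forward"      # extends the local head; no conflict
    if remote_parent is None:
        return "conflicting-root"  # a second first version for the same entity
    if remote_parent in local:
        return "branch"            # diverges from an older version: keep both
    return "missing-parent"        # parent not synchronized yet; fetch it first


local = {"v1": None, "v2": "v1"}
print(classify_remote(local, "v2", "v3", "v2"))   # fast-forward
print(classify_remote(local, "v2", "v3b", "v1"))  # branch
```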

**Trust and acceptance**:
* Trusted nodes: auto-import versions
* Neutral nodes: import but flag for review
* Untrusted nodes: manual import only
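
The trust levels translate directly into an import policy table. In this sketch, the registry of node trust levels and the action names are hypothetical. Nodes absent from the registry are treated as untrusted. This fail-closed default is an assumption chosen here, not something stated on this page.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"


# One import behaviour per trust level, mirroring the list above.
IMPORT_POLICY = {
    Trust.TRUSTED: "auto-import",
    Trust.NEUTRAL: "import-and-flag-for-review",
    Trust.UNTRUSTED: "manual-import-only",
}


def import_action(node_id: str, registry: dict[str, Trust]) -> str:
    # Unknown nodes fall back to UNTRUSTED (fail closed).
    return IMPORT_POLICY[registry.get(node_id, Trust.UNTRUSTED)]


registry = {"node-a": Trust.TRUSTED, "node-b": Trust.NEUTRAL}
print(import_action("node-b", registry))  # import-and-flag-for-review
print(import_action("node-z", registry))  # manual-import-only
```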

----

== Entity-Relationship Overview ==

**Core relationships**:

```
CLAIM_CLUSTER (1) ──< (N) CLAIM
CLAIM (1) ──< (N) CLAIM_VERSION
CLAIM (1) ──< (N) SCENARIO
SCENARIO (1) ──< (N) SCENARIO_VERSION
SCENARIO_VERSION (N) >──< (N) EVIDENCE_VERSION [via ScenarioEvidenceLink]
SCENARIO_VERSION (1) ──< (N) VERDICT_VERSION
VERDICT_VERSION references specific EvidenceVersionSet
```

**Version chains**:

Each entity has a version DAG:
```
Version 1 (ParentVersionID=null)
    ↓
Version 2 (ParentVersionID=1)
    ↓
Version 3 (ParentVersionID=2)
```

In federated environments, branching may occur:
```
Version 1
    ↓
Version 2
  ↙    ↘
V3a    V3b   (parallel branches from different nodes)
```
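
Whether a version DAG has branched can be read off its heads, which are the versions that no other version names as its parent. A linear chain has exactly one head, while the branching case above has two. The following minimal sketch assumes one entity's versions are given as a dict mapping `VersionID` to `ParentVersionID`. The function name `heads` is hypothetical.

```python
from typing import Optional


def heads(parents: dict[str, Optional[str]]) -> list[str]:
    """Versions that no other version lists as its parent (the branch tips)."""
    referenced = {p for p in parents.values() if p is not None}
    return sorted(v for v in parents if v not in referenced)


linear = {"V1": None, "V2": "V1", "V3": "V2"}
branched = {"V1": None, "V2": "V1", "V3a": "V2", "V3b": "V2"}
print(heads(linear))    # ['V3']
print(heads(branched))  # ['V3a', 'V3b']
```

More than one head could serve as the signal that divergent versions exist and may need review under the node's trust rules.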

----

== Related Pages ==

* [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]]
* [[AKEL (AI Knowledge Extraction Layer)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]