= Data Model =

This page describes the current data model for FactHarbor v0.9.1.

== 1. Versioning Strategy ==

Every entity in FactHarbor has a full immutable version history. This ensures:

* Complete auditability
* Ability to reconstruct historical state
* Federation-compatible lineage tracking
* Transparent evolution of claims, scenarios, and verdicts

=== 1.1 Core Versioning Principles ===

**Immutability**:
* Each version is stored independently
* Versions cannot be deleted, only superseded
* Historical versions remain accessible

**Lineage**:
* Each version links to its parent via `ParentVersionID`
* Versions form a directed acyclic graph (DAG) of changes
* Branching is supported in federated environments

**Provenance**:
* Every version is timestamped (`CreatedAt`)
* The author type is recorded (`AuthorType`: Human, AI, ExternalNode)
* A justification is captured (`JustificationText`)
* Digital signatures protect integrity (`SignatureHash` in Release 1.0)

**Federation Support**:
* Versions can originate from remote nodes
* Conflicts are detected via lineage comparison
* Parallel version trees support branching scenarios
* Versions are synchronized across nodes
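
The lineage and conflict-detection principles above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory index that maps each `VersionID` to its `ParentVersionID`; the function names and the rule "two versions conflict when neither is an ancestor of the other" are illustrative assumptions, not part of the specification.

```python
from typing import Dict, List, Optional

# Hypothetical index: VersionID -> ParentVersionID (None for the first version).
ParentIndex = Dict[str, Optional[str]]


def lineage(version_id: str, parents: ParentIndex) -> List[str]:
    """Return the chain from version_id back to the first version."""
    chain: List[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def common_ancestor(a: str, b: str, parents: ParentIndex) -> Optional[str]:
    """Return the nearest version that both a and b descend from, if any."""
    seen = set(lineage(a, parents))
    for candidate in lineage(b, parents):
        if candidate in seen:
            return candidate
    return None


def is_conflict(a: str, b: str, parents: ParentIndex) -> bool:
    """Assumed rule: a and b conflict when neither is an ancestor of the other."""
    return common_ancestor(a, b, parents) not in (a, b)


# Two nodes each created a successor of v2, so the history has branched.
parents: ParentIndex = {
    "v1": None,
    "v2": "v1",
    "v3-local": "v2",
    "v3-remote": "v2",
}
branched = is_conflict("v3-local", "v3-remote", parents)
linear = is_conflict("v2", "v3-local", parents)
```

In this sketch a branch is reported as a conflict, while a plain successor is not. How a node resolves the conflict (merge, keep both branches, or prefer one origin) is left open here.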

=== 1.2 Common Version Fields ===

All versioned entities include:

* **VersionID**: Unique identifier for this specific version
* **ParentVersionID**: Link to previous version (null for first version)
* **CreatedAt**: Timestamp (ISO 8601, UTC)
* **AuthorType**: Human | AI | ExternalNode
* **CreatedBy**: Foreign key to User or TechnicalUser
* **JustificationText**: Brief explanation of changes
* **PublicationMode**: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
* **ReviewStatus**: Workflow state (draft|in_review|approved|rejected)
* **NodeOrigin**: Node ID where version was created (for federation)
* **SignatureHash**: Cryptographic signature (Release 1.0)
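
As an illustration, the common fields could be modelled as a frozen Python dataclass. This is a sketch under assumptions: the class name `VersionFields`, the snake_case attribute names, the string-typed identifiers, and the UTC check are all hypothetical choices, and `review_status` is kept as a plain string because the allowed values differ between entity types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


@dataclass(frozen=True)  # frozen mirrors the immutability principle
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: datetime
    author_type: AuthorType
    created_by: str  # key of a User or TechnicalUser
    justification_text: str
    publication_mode: PublicationMode
    review_status: str
    node_origin: str
    signature_hash: Optional[str] = None  # Release 1.0

    def __post_init__(self) -> None:
        # Assumed check: CreatedAt must be timezone-aware and in UTC.
        offset = self.created_at.utcoffset()
        if offset is None or offset != timedelta(0):
            raise ValueError("CreatedAt must be a UTC timestamp")


first = VersionFields(
    version_id="cv-0001",
    parent_version_id=None,
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    author_type=AuthorType.HUMAN,
    created_by="user-42",
    justification_text="Initial version",
    publication_mode=PublicationMode.MODE1,
    review_status="draft",
    node_origin="node-a",
)
```

Because the dataclass is frozen, a change is expressed by creating a new instance whose `parent_version_id` points at the previous `version_id`, rather than by editing the existing record.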

== 2. Core Entity Definitions ==

=== 2.1 User Entities ===

**USER** (base user table):
* ``UserID`` (PK)
* ``DisplayName``
* ``Email`` (for Contributors and above)
* ``RegisteredAt``
* ``LastActive``
* ``Status`` (active|suspended|banned)

**TECHNICAL_USER** (system processes):
* ``SystemID`` (PK)
* ``SystemName``
* ``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
* ``CreatedBy`` (FK to Maintainer who created this system user)
* ``CreatedAt``
* ``Status`` (active|paused|deprecated)
* ``ApiKey`` (encrypted)
* ``Permissions`` (JSON - authorized operations)

**Examples of Technical Users**:
* AKEL instances (AI processing)
* Federation sync bots
* Scheduled audit tasks
* Backup services
* Monitoring systems
* External API integrations

=== 2.2 Content Entities ===

The system relies on the following versioned core entities:

**CLAIM_CLUSTER**:
* ``ClusterID`` (PK)
* ``EmbeddingVectorRef``
* ``Theme``
* Groups related claims into topical clusters
* One cluster has many claims
* A claim belongs to exactly one primary cluster

**CLAIM / CLAIM_VERSION**:
* ``CLAIM`` is the long-lived anchor for a real-world claim
* ``CLAIM_VERSION`` is an immutable snapshot that includes:
** ``VersionID`` (PK)
** ``ClaimID`` (FK to CLAIM)
** ``ParentVersionID`` (FK to prior version, nullable)
** ``Text``
** ``Domain``
** ``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
** ``Evaluability`` (empirical|subjective|non-falsifiable)
** ``RiskTier`` (A|B|C) - replaces SafetyCategory for consistency
** ``PublicationMode`` (Mode1|Mode2|Mode3)
** ``ReviewStatus`` (draft|in_review|approved|rejected)
** ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
** ``NodeOrigin``, ``SignatureHash``
** ``Status`` (active|superseded|merged)

**SCENARIO / SCENARIO_VERSION**:
* ``SCENARIO`` is the anchor for a scenario across time
* ``SCENARIO_VERSION`` is an immutable snapshot that includes:
** ``VersionID`` (PK)
** ``ScenarioID`` (FK to SCENARIO)
** ``ParentVersionID``
** ``ClaimID`` (FK to CLAIM)
** ``Definitions`` (JSON)
** ``Boundaries`` (JSON)
** ``Assumptions`` (JSON)
** ``Context`` (text)
** ``EvaluationMethod`` (text)
** ``PublicationMode`` (Mode1|Mode2|Mode3)
** ``ReviewStatus`` (draft|in_review|approved|rejected)
** ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
** ``NodeOrigin``, ``SignatureHash``
** ``Status`` (active|superseded|deprecated)

**Note**: SafetyClass was removed from Scenario because the risk tier is set at claim level.

**EVIDENCE / EVIDENCE_VERSION**:
* ``EVIDENCE`` is the anchor
* ``EVIDENCE_VERSION`` is the versioned snapshot that includes:
** ``VersionID`` (PK)
** ``EvidenceID`` (FK to EVIDENCE)
** ``ParentVersionID``
** ``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
** ``Reliability`` (low|medium|high)
** ``Provenance`` (URL, DOI, source metadata)
** ``ExtractionMethod`` (manual|OCR|API|AKEL)
** ``ContentHash`` (SHA256 of evidence content)
** ``PublicationMode`` (Mode1|Mode2|Mode3)
** ``ReviewStatus`` (draft|verified|disputed|retracted)
** ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
** ``NodeOrigin``, ``SignatureHash``
** ``Status`` (active|superseded)
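
The ``ContentHash`` field can be computed and checked with Python's standard `hashlib` module. The sketch below assumes the digest is stored as a lowercase hexadecimal string and that the evidence content is available as bytes; the specification does not state the encoding, and the function names are hypothetical.

```python
import hashlib
import hmac


def content_hash(content: bytes) -> str:
    """Return the SHA-256 digest of the evidence content as lowercase hex."""
    return hashlib.sha256(content).hexdigest()


def matches_stored_hash(content: bytes, stored_hash: str) -> bool:
    """Check content against a previously stored ContentHash value."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(content_hash(content), stored_hash.lower())


original = b"Measured sea level rise, 1993-2020 dataset export"
stored = content_hash(original)

unchanged = matches_stored_hash(original, stored)
tampered = matches_stored_hash(original + b" (edited)", stored)
```

A mismatch indicates that the stored content no longer corresponds to the version that was hashed, which is the integrity check that ``ContentHash`` is meant to support.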

**VERDICT / VERDICT_VERSION**:
* ``VERDICT`` is the anchor
* ``VERDICT_VERSION`` is the versioned snapshot that includes:
** ``VersionID`` (PK)
** ``VerdictID`` (FK to VERDICT)
** ``ParentVersionID``
** ``ClaimID`` (FK to CLAIM)
** ``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
** ``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
** ``LikelihoodRange`` (0–1, with uncertainty bounds)
** ``ExplanationChain`` (JSON)
** ``UncertaintyFactors`` (JSON)
** ``PublicationMode`` (Mode1|Mode2|Mode3)
** ``ReviewStatus`` (draft|in_review|approved|retracted)
** ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
** ``NodeOrigin``, ``SignatureHash``
** ``Status`` (current|outdated|superseded|retracted)
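
The specification describes ``LikelihoodRange`` only as a value between 0 and 1 with uncertainty bounds. One possible representation, shown here purely as an assumption, is a point estimate with a lower and an upper bound; the attribute names `lower`, `estimate`, and `upper` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    """Assumed shape: a point estimate bracketed by uncertainty bounds."""

    lower: float
    estimate: float
    upper: float

    def __post_init__(self) -> None:
        # All three values must lie in [0, 1] and be ordered.
        if not (0.0 <= self.lower <= self.estimate <= self.upper <= 1.0):
            raise ValueError(
                "expected 0 <= lower <= estimate <= upper <= 1, "
                f"got ({self.lower}, {self.estimate}, {self.upper})"
            )

    @property
    def width(self) -> float:
        """Width of the uncertainty interval."""
        return self.upper - self.lower


likely = LikelihoodRange(lower=0.6, estimate=0.75, upper=0.85)
```

Validating the range at construction time keeps malformed values out of a verdict version before it is stored.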

== 3. Many-to-Many Linking Tables ==

**ScenarioEvidenceLink**:
* Links scenario versions to evidence versions with relevance scoring
* ``ScenarioID``, ``ScenarioVersionID``
* ``EvidenceID``, ``EvidenceVersionID``
* ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
* ``LinkJustification`` - Brief explanation of relevance

**Purpose**:
* Evidence can be used by multiple scenarios
* Scenarios can draw from multiple pieces of evidence
* Relevance scoring helps prioritize evidence
* Version-specific linking preserves historical accuracy
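
A short Python sketch shows how version-specific links and relevance scores might be used together. The dataclass mirrors the fields listed above; the helper `evidence_for` and its `min_score` threshold are hypothetical and only illustrate prioritizing evidence for one scenario version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0-1
    link_justification: str


def evidence_for(
    links: List[ScenarioEvidenceLink],
    scenario_version_id: str,
    min_score: float = 0.0,
) -> List[ScenarioEvidenceLink]:
    """Return links for one scenario version, most relevant first."""
    matching = [
        link
        for link in links
        if link.scenario_version_id == scenario_version_id
        and link.relevance_score >= min_score
    ]
    return sorted(matching, key=lambda link: link.relevance_score, reverse=True)


links = [
    ScenarioEvidenceLink("s-1", "sv-1", "e-1", "ev-1", 0.4, "Background data"),
    ScenarioEvidenceLink("s-1", "sv-1", "e-2", "ev-5", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("s-2", "sv-7", "e-2", "ev-5", 0.7, "Supports assumption"),
]
ranked = evidence_for(links, "sv-1")
strong_only = evidence_for(links, "sv-1", min_score=0.5)
```

Here the evidence version `ev-5` is linked to two scenario versions, and each link carries its own score and justification, which is the many-to-many relationship the table is meant to capture.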

**ClaimCluster**:
* Semantic clustering of similar claims
* ``ClusterID`` (PK)
* ``EmbeddingVector`` - Vector representation for semantic search
* ``MemberList`` - List of ClaimIDs in this cluster
* ``Theme`` - Human-readable theme description

== 4. Key Changes in v0.9.1 ==

**Updated Field Names**:
* `SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
* `SafetyClass` removed from Scenario (redundant with claim-level RiskTier)

**Added Fields to All Version Entities**:
* `PublicationMode` - Track Mode 1/2/3 status
* `ReviewStatus` - Track workflow state
* `NodeOrigin` - Federation provenance
* `CreatedBy` - FK to User/TechnicalUser (clarified)

**New Entity**:
* `TECHNICAL_USER` - Separates system processes from human users

**Clarifications**:
* `ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
* `ContentHash` in Evidence - SHA256 for integrity checking

== 5. Data Model Behavior ==

=== 5.1 Late-Arriving Evidence ===

When new evidence versions appear:
1. Existing verdicts are marked as **outdated**
1. Scenario relevance is re-evaluated
1. The re-evaluation engine triggers verdict recomputation
1. New verdict versions are created
1. Users are notified of updates
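
The first step of this flow can be sketched in Python. The sketch assumes that a verdict is affected when its ``EvidenceVersionSet`` contains the evidence version that has just been superseded, and that marking a verdict as outdated produces an updated record rather than editing the existing one; both are assumptions, and `VerdictRecord` and `flag_outdated` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class VerdictRecord:
    version_id: str
    evidence_version_set: FrozenSet[str]
    status: str = "current"  # current|outdated|superseded|retracted


def flag_outdated(
    verdicts: List[VerdictRecord],
    superseded_evidence_version_id: str,
) -> Tuple[List[VerdictRecord], List[str]]:
    """Mark affected current verdicts as outdated and list them for recomputation."""
    updated: List[VerdictRecord] = []
    to_recompute: List[str] = []
    for verdict in verdicts:
        affected = (
            verdict.status == "current"
            and superseded_evidence_version_id in verdict.evidence_version_set
        )
        if affected:
            updated.append(replace(verdict, status="outdated"))
            to_recompute.append(verdict.version_id)
        else:
            updated.append(verdict)
    return updated, to_recompute


verdicts = [
    VerdictRecord("vv-1", frozenset({"ev-1", "ev-2"})),
    VerdictRecord("vv-2", frozenset({"ev-3"})),
]
# A new version of the evidence behind ev-2 has arrived.
updated, to_recompute = flag_outdated(verdicts, "ev-2")
```

The returned list of verdict version IDs is what a re-evaluation engine would pick up in the later steps; the recomputation and notification steps are not modelled here.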

=== 5.2 Scenario Evolution ===

When a scenario's assumptions or definitions change:
* A new scenario version is created (no in-place update)
* All dependent verdicts must be recalculated
* Previous scenario versions remain accessible
* Version lineage is preserved
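
A minimal Python sketch of this behavior follows. It assumes that dependent verdicts are found through their ``ScenarioVersionID`` and that the previous scenario version is kept with a superseded status; the names `ScenarioVersion`, `evolve`, and `verdicts_to_recalculate`, as well as the reduced field set, are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    scenario_id: str
    parent_version_id: Optional[str]
    assumptions: Tuple[str, ...]
    status: str = "active"  # active|superseded|deprecated


def evolve(
    previous: ScenarioVersion,
    new_assumptions: List[str],
    new_version_id: str,
) -> Tuple[ScenarioVersion, ScenarioVersion]:
    """Create a successor version and return (superseded previous, new version)."""
    superseded = replace(previous, status="superseded")
    successor = ScenarioVersion(
        version_id=new_version_id,
        scenario_id=previous.scenario_id,
        parent_version_id=previous.version_id,  # preserves lineage
        assumptions=tuple(new_assumptions),
    )
    return superseded, successor


def verdicts_to_recalculate(
    verdict_scenario_versions: Dict[str, str],
    old_scenario_version_id: str,
) -> List[str]:
    """List verdict versions that were computed against the old scenario version."""
    return sorted(
        verdict_id
        for verdict_id, scenario_version_id in verdict_scenario_versions.items()
        if scenario_version_id == old_scenario_version_id
    )


sv1 = ScenarioVersion("sv-1", "s-1", None, ("Population is urban",))
old, sv2 = evolve(sv1, ["Population is urban and rural"], "sv-2")

# Hypothetical mapping: verdict VersionID -> ScenarioVersionID it was computed for.
dependents = verdicts_to_recalculate({"vv-1": "sv-1", "vv-2": "sv-9", "vv-3": "sv-1"}, "sv-1")
```

Because verdicts reference a specific scenario version, the verdicts that need recalculation can be identified without touching verdicts that belong to other scenario versions.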

=== 5.3 Federated Nodes ===

Each node may share partial data:
* Claims and scenarios are shared if relevant
* Evidence metadata is shared, but not always the full files
* Versions are synchronized via NodeOrigin tracking
* Branching is allowed for divergent interpretations

== 6. Visual Diagrams ==

The following diagrams provide visual representations of the data model structure and relationships.

=== 6.1 Core Data Model ERD ===

{{include reference="Test.FactHarborV09.Specification.Diagrams.Core Data Model ERD.WebHome"}}

=== 6.2 User Roles Structure ===

{{include reference="Test.FactHarborV09.Specification.Diagrams.User Roles ERD.WebHome"}}

=== 6.3 Content Workflow ===

{{include reference="Test.FactHarborV09.Specification.Diagrams.Content Workflow ERD.WebHome"}}

== 7. Related Pages ==

* [[Federation & Decentralization>>Test.FactHarborV09.Specification.Federation & Decentralization.WebHome]]
* [[AKEL (AI Knowledge Extraction Layer)>>Test.FactHarborV09.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Architecture>>Test.FactHarborV09.Specification.Architecture.WebHome]]
