Changes for page Data Model (From Specification Chat)
Last modified by Robert Schaub on 2025/12/24 20:35
From version 4.1
edited by Robert Schaub
on 2025/11/27 12:11
Change comment:
There is no comment for this version
To version 8.1
edited by Robert Schaub
on 2025/11/27 12:55
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ... @@ -1,3 +1,162 @@ 1 +((( 2 + 3 +))) 4 + 5 += 5. Data Model = 6 + 7 +The FactHarbor data model centers on four fully versioned, immutable entities: 8 + 9 +* **Claim** 10 +* **Scenario** 11 +* **Evidence** 12 +* **Verdict** 13 + 14 +These entities form the structured **“truth landscape”** for each claim. 15 +The model is explicitly **versioned**, **traceable**, and **federation-ready**. 16 + 17 +To keep the system auditable and explainable, FactHarbor uses a consistent 18 +**identity vs. version** pattern: 19 + 20 +* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) 21 + define *what* something is in a stable sense. 22 +* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) 23 + define *how that thing looked at a given point in time*. 24 + 25 +All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to 26 +mutable identities. 27 + 28 +---- 29 + 30 += 5.1 Core entities and versioning pattern = 31 + 32 +(% class="wikitable" %) 33 +| **Logical concept** | **Identity entity** | **Version entity** | **Notes** 34 +| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings. 35 +| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries. 36 +| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time. 37 +| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments. 
38 +| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction. 39 +| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation. 40 + 41 +Key design decisions: 42 + 43 +* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}. 44 +* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} 45 + (scenarios live at the *claim* level, not per individual phrasing). 46 +* Verdicts and Scenario–Evidence links are always attached to **versions**: 47 +* {{code}}SCENARIO_VERSION{{/code}} + 48 +{{code}}EVIDENCE_VERSION{{/code}} → 49 +{{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} 50 +* {{code}}SCENARIO_VERSION{{/code}} → 51 +{{code}}VERDICT_VERSION{{/code}} 52 + 53 +This ensures that when a Scenario or Evidence changes, old verdicts and links 54 +remain intact as historical records and can be revisited. 55 + 56 +---- 57 + 58 += 5.2 Core Data Model ERD (expanded, versioned) = 59 + 60 +The following Mermaid ER diagram shows the main entities and their relationships. 61 +The convention is that fields ending in {{code}}Id{{/code}} are primary keys, 62 +and fields with {{code}}...IdFk{{/code}} are foreign keys. 63 + 64 +{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}} 65 +{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}} 66 + 67 +**Important points:** 68 + 69 +* Scenarios and Evidence are **linked via their versions** 70 + ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}). 71 +* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}. 
72 +* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model. 73 + 74 +All version entities are immutable: once created, they are never changed, only 75 +superseded by newer versions. 76 + 77 +---- 78 + 79 += 5.3 Data Use & Review ERD = 80 + 81 +The **Data Use** model captures who does what with which versioned data: 82 + 83 +* Users (including technical users) 84 +* Roles and role assignments 85 +* Review actions on versioned entities 86 + 87 +{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}} 88 +{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}} 89 + 90 + 91 +Notes: 92 + 93 +* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, 94 + SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows 95 + in {{code}}ROLE{{/code}}. 96 +* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, 97 + node-to-node federation agents, batch jobs). All other roles can, in principle, 98 + be held by both human and technical users where appropriate. 99 +* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while 100 + roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles 101 + do. 102 + 103 +---- 104 + 105 += 5.4 Versioning and re-evaluation behavior = 106 + 107 +This section ties the data model to the re-evaluation logic 108 +(described in more detail in the Versioning and Automation chapters). 109 + 110 +* When a new {{code}}EVIDENCE_VERSION{{/code}} is created: 111 +* All related {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing 112 + that evidence version are candidates for re-assessment. 113 +* Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and 114 + are queued for re-evaluation. 
115 + 116 +* When a new {{code}}SCENARIO_VERSION{{/code}} is created: 117 +* It may inherit some links from earlier scenarios, or start empty depending 118 + on the change classification (cosmetic vs. conceptual). 119 +* All verdicts for that scenario are recalculated and stored as new 120 +{{code}}VERDICT_VERSION{{/code}} entries. 121 + 122 +* REVIEW_ACTIONs are always attached to the **exact version** that was seen by 123 + the reviewer. This preserves a faithful audit trail if data later changes. 124 + 125 +* In a federated environment, nodes can choose: 126 +* which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT) 127 +* which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, 128 + only EVIDENCE_VERSIONs above a reliability threshold, etc.) 129 + 130 +---- 131 + 132 += 5.5 Behavioral Notes = 133 + 134 +== 5.5.1 Late-Arriving Evidence == 135 + 136 +New evidence versions can make existing verdicts **outdated** and may trigger 137 +re-evaluation cascades. This is handled by the global trigger and automation 138 +architecture (see the Versioning & Automation chapters). 139 + 140 +== 5.5.2 Scenario Evolution == 141 + 142 +Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and 143 +Scenario–Evidence links are re-assessed. Old versions remain available for 144 +historical comparison and reproducibility. 145 + 146 +== 5.5.3 Federation == 147 + 148 +Federated nodes can replicate subsets of the graph, including: 149 + 150 +* Claims and Scenarios of local interest 151 +* Evidence metadata (without full content) 152 +* Verdict lineages used for local decision-making 153 + 154 +Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, 155 +replication logs, and trust rules) are described in the Federation & 156 +Decentralization chapter and build on top of the core data model defined here. 157 + 158 +---- 159 + 1 1 == 1. 
Overall analysis & review of the data model == 2 2 3 3 === 1.1 Strengths of the current design === ... ... @@ -165,385 +165,3 @@ 165 165 ))) 166 166 * That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model. 167 167 ))) 168 - 169 -= 5. Data Model = 170 - 171 -The FactHarbor data model centers on four fully versioned, immutable entities: 172 - 173 -* **Claim** 174 -* **Scenario** 175 -* **Evidence** 176 -* **Verdict** 177 - 178 -These entities form the structured **“truth landscape”** for each claim. 179 -The model is explicitly **versioned**, **traceable**, and **federation-ready**. 180 - 181 -To keep the system auditable and explainable, FactHarbor uses a consistent 182 -**identity vs. version** pattern: 183 - 184 -* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) 185 - define *what* something is in a stable sense. 186 -* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) 187 - define *how that thing looked at a given point in time*. 188 - 189 -All reasoning (e.g. verdicts, review actions) is attached to **versions**, never to 190 -mutable identities. 191 - 192 ----- 193 - 194 -= 5.1 Core entities and versioning pattern = 195 - 196 -(% class="wikitable" %) 197 -| **Logical concept** | **Identity entity** | **Version entity** | **Notes** 198 -| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings. 199 -| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries. 200 -| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. 
specific extractions / updates over time. 201 -| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments. 202 -| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction. 203 -| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation. 204 - 205 -Key design decisions: 206 - 207 -* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}. 208 -* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} 209 - (scenarios live at the *claim* level, not per individual phrasing). 210 -* Verdicts and Scenario–Evidence links are always attached to **versions**: 211 -* {{code}}SCENARIO_VERSION{{/code}} + 212 -{{code}}EVIDENCE_VERSION{{/code}} → 213 -{{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} 214 -* {{code}}SCENARIO_VERSION{{/code}} → 215 -{{code}}VERDICT_VERSION{{/code}} 216 - 217 -This ensures that when a Scenario or Evidence changes, old verdicts and links 218 -remain intact as historical records and can be revisited. 219 - 220 ----- 221 - 222 -= 5.2 Core Data Model ERD (expanded, versioned) = 223 - 224 -The following Mermaid ER diagram shows the main entities and their relationships. 225 -The convention is that fields ending in {{code}}Id{{/code}} are primary keys, 226 -and fields with {{code}}...IdFk{{/code}} are foreign keys. 
227 - 228 -{{mermaid}} 229 -erDiagram 230 - CLAIM_CLUSTER { 231 - string ClusterID PK 232 - string EmbeddingVectorRef 233 - string Theme 234 - } 235 - 236 - CLAIM { 237 - string ClaimID PK 238 - string ClusterID FK 239 - string Status 240 - datetime CreatedAt 241 - } 242 - 243 - CLAIM_VERSION { 244 - string ClaimVersionID PK 245 - string ClaimID FK 246 - string Text 247 - string ClaimType 248 - string Domain 249 - datetime CreatedAt 250 - } 251 - 252 - SCENARIO { 253 - string ScenarioID PK 254 - string ClaimID FK 255 - string Name 256 - datetime CreatedAt 257 - } 258 - 259 - SCENARIO_VERSION { 260 - string ScenarioVersionID PK 261 - string ScenarioID FK 262 - string Definitions 263 - string Assumptions 264 - string Boundaries 265 - datetime CreatedAt 266 - } 267 - 268 - EVIDENCE { 269 - string EvidenceID PK 270 - string SourceType 271 - string URL 272 - float ReliabilityScore 273 - } 274 - 275 - EVIDENCE_VERSION { 276 - string EvidenceVersionID PK 277 - string EvidenceID FK 278 - string Summary 279 - float ReliabilityScore 280 - datetime CreatedAt 281 - } 282 - 283 - SCENARIO_EVIDENCE_LINK { 284 - string LinkID PK 285 - string ScenarioVersionID FK 286 - string EvidenceVersionID FK 287 - float Relevance 288 - string Direction 289 - } 290 - 291 - VERDICT { 292 - string VerdictID PK 293 - string ScenarioID FK 294 - } 295 - 296 - VERDICT_VERSION { 297 - string VerdictVersionID PK 298 - string VerdictID FK 299 - float Verdict 300 - float Confidence 301 - string Reasoning 302 - datetime CreatedAt 303 - } 304 - 305 - CLAIM_CLUSTER ||--o{ CLAIM : contains 306 - CLAIM ||--o{ CLAIM_VERSION : versions 307 - 308 - CLAIM ||--o{ SCENARIO : has 309 - SCENARIO ||--o{ SCENARIO_VERSION : versions 310 - 311 - EVIDENCE ||--o{ EVIDENCE_VERSION : versions 312 - 313 - SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links 314 - EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked 315 - 316 - SCENARIO ||--o{ VERDICT : assessed 317 - VERDICT ||--o{ VERDICT_VERSION : versions 318 - 319 
-{{/mermaid}} 320 - 321 -**Important points:** 322 - 323 -* Scenarios and Evidence are **linked via their versions** 324 - ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}). 325 -* Verdicts are **per ScenarioVersion** and stored in {{code}}VERDICT_VERSION{{/code}}. 326 -* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model. 327 - 328 -All version entities are immutable: once created, they are never changed, only 329 -superseded by newer versions. 330 - 331 ----- 332 - 333 -= 5.3 Data Use & Review ERD (expanded, versioned) = 334 - 335 -The **Data Use** model captures who does what with which versioned data: 336 - 337 -* Users (including technical users) 338 -* Roles and role assignments 339 -* Review actions on versioned entities 340 - 341 -{{mermaid}} 342 -erDiagram 343 - %% Core clusters shown for context 344 - CLAIM_CLUSTER { 345 - string ClusterID PK 346 - string EmbeddingVectorRef 347 - string Theme 348 - } 349 - 350 - CLAIM { 351 - string ClaimID PK 352 - string ClusterID FK 353 - string Status 354 - datetime CreatedAt 355 - } 356 - 357 - CLAIM_VERSION { 358 - string ClaimVersionID PK 359 - string ClaimID FK 360 - string Text 361 - string ClaimType 362 - string Domain 363 - datetime CreatedAt 364 - } 365 - 366 - SCENARIO { 367 - string ScenarioID PK 368 - string ClaimID FK 369 - string Name 370 - datetime CreatedAt 371 - } 372 - 373 - SCENARIO_VERSION { 374 - string ScenarioVersionID PK 375 - string ScenarioID FK 376 - string Definitions 377 - string Assumptions 378 - string Boundaries 379 - datetime CreatedAt 380 - } 381 - 382 - EVIDENCE { 383 - string EvidenceID PK 384 - string SourceType 385 - string URL 386 - float ReliabilityScore 387 - } 388 - 389 - EVIDENCE_VERSION { 390 - string EvidenceVersionID PK 391 - string EvidenceID FK 392 - string Summary 393 - float ReliabilityScore 394 - datetime CreatedAt 395 - } 396 - 397 - VERDICT { 398 - string VerdictID PK 399 - string 
ScenarioID FK 400 - } 401 - 402 - VERDICT_VERSION { 403 - string VerdictVersionID PK 404 - string VerdictID FK 405 - float Verdict 406 - float Confidence 407 - string Reasoning 408 - datetime CreatedAt 409 - } 410 - 411 - %% Users and roles 412 - USER { 413 - string UserID PK 414 - string Handle 415 - string Email 416 - } 417 - 418 - TECHNICAL_USER { 419 - string UserID PK 420 - string SystemName 421 - } 422 - 423 - CONTRIBUTING_USER { 424 - string UserID PK 425 - string DisplayName 426 - } 427 - 428 - TRUSTED_CONTRIBUTOR { 429 - string UserID PK 430 - string TrustLevel 431 - } 432 - 433 - REVIEWER { 434 - string UserID PK 435 - string Domain 436 - } 437 - 438 - EXPERT { 439 - string UserID PK 440 - string ExpertiseArea 441 - } 442 - 443 - FEDERATION_NODE { 444 - string NodeID PK 445 - string Region 446 - } 447 - 448 - FEDERATION_ADMIN { 449 - string UserID PK 450 - string Permissions 451 - } 452 - 453 - REVIEW_ACTION { 454 - string ReviewActionID PK 455 - string UserID FK 456 - string TargetEntityType 457 - string TargetEntityVersionID 458 - string ActionType 459 - string Comment 460 - datetime Timestamp 461 - } 462 - 463 - %% Inheritance / specialization (modelled as relationships) 464 - USER ||--o{ TECHNICAL_USER : "is a" 465 - USER ||--o{ CONTRIBUTING_USER : "is a" 466 - 467 - CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset" 468 - CONTRIBUTING_USER ||--o{ REVIEWER : "subset" 469 - CONTRIBUTING_USER ||--o{ EXPERT : "subset" 470 - 471 - TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates" 472 - TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers" 473 - 474 - %% Review actions on versioned entities 475 - USER ||--o{ REVIEW_ACTION : performs 476 - 477 - REVIEW_ACTION }o--|| CLAIM_VERSION : reviews 478 - REVIEW_ACTION }o--|| SCENARIO_VERSION : reviews 479 - REVIEW_ACTION }o--|| EVIDENCE_VERSION : reviews 480 - REVIEW_ACTION }o--|| VERDICT_VERSION : reviews 481 - 482 -{{/mermaid}} 483 - 484 -Notes: 485 - 486 -* Most roles (READER, CONTRIBUTOR, 
TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, 487 - SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows 488 - in {{code}}ROLE{{/code}}. 489 -* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, 490 - node-to-node federation agents, batch jobs). All other roles can, in principle, 491 - be held by both human and technical users where appropriate. 492 -* A {{code}}READER{{/code}} normally does **not** perform REVIEW_ACTIONs, while 493 - roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles 494 - do. 495 - 496 ----- 497 - 498 -= 5.4 Versioning and re-evaluation behavior = 499 - 500 -This section ties the data model to the re-evaluation logic 501 -(described in more detail in the Versioning and Automation chapters). 502 - 503 -* When a new {{code}}EVIDENCE_VERSION{{/code}} is created: 504 -* All related {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing 505 - that evidence version are candidates for re-assessment. 506 -* Related {{code}}VERDICT_VERSION{{/code}} entries may become **outdated** and 507 - are queued for re-evaluation. 508 - 509 -* When a new {{code}}SCENARIO_VERSION{{/code}} is created: 510 -* It may inherit some links from earlier scenarios, or start empty depending 511 - on the change classification (cosmetic vs. conceptual). 512 -* All verdicts for that scenario are recalculated and stored as new 513 -{{code}}VERDICT_VERSION{{/code}} entries. 514 - 515 -* REVIEW_ACTIONs are always attached to the **exact version** that was seen by 516 - the reviewer. This preserves a faithful audit trail if data later changes. 517 - 518 -* In a federated environment, nodes can choose: 519 -* which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT) 520 -* which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs, 521 - only EVIDENCE_VERSIONs above a reliability threshold, etc.) 
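
The re-evaluation behavior described in section 5.4 can be illustrated with a minimal Python sketch. It is not the FactHarbor implementation. It reads "related links" as links that reference an earlier version of the same evidence identity, which is an interpretation of the text. The function name {{code}}find_outdated_verdicts{{/code}} and the {{code}}scenario_version_id{{/code}} field on the verdict version are hypothetical; the ERD only shows VERDICT_VERSION → VERDICT → SCENARIO.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)  # frozen mirrors "version entities are immutable"
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    reliability_score: float


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    link_id: str
    scenario_version_id: str
    evidence_version_id: str


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    # Assumption: a direct reference to the scenario version that was assessed.
    scenario_version_id: str


def find_outdated_verdicts(
    new_version: EvidenceVersion,
    evidence_versions: Iterable[EvidenceVersion],
    links: Iterable[ScenarioEvidenceLink],
    verdict_versions: Iterable[VerdictVersion],
) -> List[str]:
    """Return the ids of verdict versions to queue for re-evaluation."""
    # Earlier versions of the same evidence identity.
    earlier_ids = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id == new_version.evidence_id
        and ev.evidence_version_id != new_version.evidence_version_id
    }
    # Scenario versions whose links reference one of those earlier versions.
    affected_scenarios = {
        link.scenario_version_id
        for link in links
        if link.evidence_version_id in earlier_ids
    }
    # Existing verdict versions are never edited, only flagged for a new run.
    return sorted(
        v.verdict_version_id
        for v in verdict_versions
        if v.scenario_version_id in affected_scenarios
    )


# Hypothetical sample data.
old = EvidenceVersion("ev-1-v1", "ev-1", 0.6)
new = EvidenceVersion("ev-1-v2", "ev-1", 0.8)
links = [
    ScenarioEvidenceLink("l1", "sc-1-v1", "ev-1-v1"),
    ScenarioEvidenceLink("l2", "sc-2-v1", "ev-9-v1"),
]
verdicts = [
    VerdictVersion("vd-1-v1", "sc-1-v1"),
    VerdictVersion("vd-2-v1", "sc-2-v1"),
]
queue = find_outdated_verdicts(new, [old, new], links, verdicts)
print(queue)
```

The function only reads existing rows and returns ids. A real re-evaluation would then write new {{code}}VERDICT_VERSION{{/code}} entries and leave the old ones in place as historical records.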
522 - 523 ----- 524 - 525 -= 5.5 Behavioral Notes = 526 - 527 -== 5.5.1 Late-Arriving Evidence == 528 - 529 -New evidence versions can make existing verdicts **outdated** and may trigger 530 -re-evaluation cascades. This is handled by the global trigger and automation 531 -architecture (see the Versioning & Automation chapters). 532 - 533 -== 5.5.2 Scenario Evolution == 534 - 535 -Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and 536 -Scenario–Evidence links are re-assessed. Old versions remain available for 537 -historical comparison and reproducibility. 538 - 539 -== 5.5.3 Federation == 540 - 541 -Federated nodes can replicate subsets of the graph, including: 542 - 543 -* Claims and Scenarios of local interest 544 -* Evidence metadata (without full content) 545 -* Verdict lineages used for local decision-making 546 - 547 -Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, 548 -replication logs, and trust rules) are described in the Federation & 549 -Decentralization chapter and build on top of the core data model defined here.
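
The subset replication described in 5.5.3 can be illustrated with a small Python sketch of how a node might select evidence metadata to replicate. It is illustrative only. The function name {{code}}select_evidence_metadata{{/code}}, the inclusive threshold, and the choice to treat {{code}}Summary{{/code}} as the "full content" that stays behind are assumptions, not part of the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)  # version rows are immutable once created
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    summary: str
    reliability_score: float


def select_evidence_metadata(
    versions: Iterable[EvidenceVersion],
    min_reliability: float,
) -> List[Dict[str, Union[str, float]]]:
    """Return metadata-only records for versions at or above the threshold."""
    return [
        {
            "evidence_version_id": v.evidence_version_id,
            "evidence_id": v.evidence_id,
            "reliability_score": v.reliability_score,
            # Assumption: the summary is deliberately left out, so only
            # metadata is replicated to the other node.
        }
        for v in versions
        if v.reliability_score >= min_reliability  # inclusive bound is an assumption
    ]


# Hypothetical sample data.
local_versions = [
    EvidenceVersion("e1-v1", "e1", "full extracted text", 0.9),
    EvidenceVersion("e2-v1", "e2", "another extraction", 0.4),
]
replicated = select_evidence_metadata(local_versions, min_reliability=0.7)
print(replicated)
```

Because the records carry the version id rather than only the identity id, a receiving node can still point at the exact {{code}}EVIDENCE_VERSION{{/code}} a verdict lineage was based on.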