Wiki source code of Data Model (From Specification Chat)

Version 8.2 by Robert Schaub on 2025/11/27 23:15

version	line-number	content
5.2	1	(((
8.2	2
5.2	3	)))
	4
	5	= 5. Data Model =
	6
	7	The FactHarbor data model centers on four fully versioned entities, each with immutable versions:
	8
	9	* Claim
	10	* Scenario
	11	* Evidence
	12	* Verdict
	13
	14	These entities form the structured “truth landscape” for each claim.
	15	The model is explicitly versioned, traceable, and federation-ready.
	16
	17	To keep the system auditable and explainable, FactHarbor uses a consistent
	18	identity vs. version pattern:
	19
	20	* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}})
	21	define what something is in a stable sense.
	22	* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}})
	23	define how that thing looked at a given point in time.
	24
	25	All reasoning (e.g. verdicts, review actions) is attached to versions, never to
	26	mutable identities.
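
The identity vs. version pattern can be illustrated with a minimal Python sketch. It is not FactHarbor's implementation: the class names, the snake_case field names, and the sample claim texts are assumptions made for illustration, loosely following the entity names used in this chapter.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass
class Claim:
    """Identity entity: what the claim is, stable across rephrasings."""
    claim_id: str
    cluster_id: str
    status: str


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: how the claim looked at one point in time (immutable)."""
    claim_version_id: str
    claim_id: str  # reference back to the Claim identity
    text: str
    created_at: datetime


claim = Claim(claim_id="c1", cluster_id="k1", status="open")
now = datetime.now(timezone.utc)
v1 = ClaimVersion("c1-v1", claim.claim_id, "Coffee is healthy.", now)
v2 = ClaimVersion("c1-v2", claim.claim_id, "Drinking coffee is good for health.", now)

# A rephrasing adds a version; the identity it points to is unchanged.
same_identity = v1.claim_id == v2.claim_id == claim.claim_id

# Editing an existing version in place is rejected by the frozen dataclass.
try:
    v1.text = "edited"
    version_was_mutable = True
except FrozenInstanceError:
    version_was_mutable = False
```

Reasoning such as a verdict or a review action would then store a version ID like `c1-v1`, not the identity ID `c1`.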
	27
	28	----
	29
	30	= 5.1 Core entities and versioning pattern =
	31
	32	(% class="wikitable" %)
	33	| Logical concept | Identity entity | Version entity | Notes
	34	| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
	35	| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A SCENARIO belongs to a CLAIM. Its versions capture evolving definitions, assumptions, and boundaries.
	36	| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | Identity of a source vs. specific extractions / updates over time.
	37	| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A VERDICT is defined per SCENARIO; VERDICT_VERSION captures the history of assessments.
	38	| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Links bind scenario versions to evidence versions with relevance & direction.
	39	| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.
	40
	41	Key design decisions:
	42
	43	* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
	44	* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}}
	45	(scenarios live at the claim level, not per individual phrasing).
	46	* Verdicts and Scenario–Evidence links are always attached to versions:
	47	* {{code}}SCENARIO_VERSION{{/code}} +
	48	{{code}}EVIDENCE_VERSION{{/code}} →
	49	{{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
	50	* {{code}}SCENARIO_VERSION{{/code}} →
	51	{{code}}VERDICT_VERSION{{/code}}
	52
	53	This ensures that when a Scenario or Evidence changes, old verdicts and links
	54	remain intact as historical records and can be revisited.
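
A small Python sketch of why attaching links and verdicts to versions keeps history intact. All names and numbers are illustrative assumptions; in particular, the `scenario_version_id` field on the verdict version is assumed here to express the SCENARIO_VERSION → VERDICT_VERSION attachment, and the direction value "supports" is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLinkVersion:
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # illustrative value set, e.g. "supports" / "contradicts"


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    scenario_version_id: str  # assumed field: the scenario version that was assessed
    verdict: float
    confidence: float


links = [ScenarioEvidenceLinkVersion("s1-v1", "e1-v1", 0.8, "supports")]
verdicts = [VerdictVersion("vd1-v1", "s1-v1", 0.7, 0.6)]

# The scenario changes: the new version "s1-v2" gets its own link and verdict.
# Nothing that refers to "s1-v1" is touched.
links.append(ScenarioEvidenceLinkVersion("s1-v2", "e1-v1", 0.5, "supports"))
verdicts.append(VerdictVersion("vd1-v2", "s1-v2", 0.55, 0.5))


def verdicts_for(scenario_version_id: str) -> list[VerdictVersion]:
    """Return the verdict versions recorded against one scenario version."""
    return [v for v in verdicts if v.scenario_version_id == scenario_version_id]


old = verdicts_for("s1-v1")
new = verdicts_for("s1-v2")
```

The old verdict can still be looked up by the scenario version it was made against, which is what allows it to be revisited later.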
	55
	56	----
	57
	58	= 5.2 Core Data Model ERD (expanded, versioned) =
	59
	60	The following Mermaid ER diagram shows the main entities and their relationships.
	61	In the diagram, primary keys are marked {{code}}PK{{/code}} and foreign keys are
	62	marked {{code}}FK{{/code}}; a foreign key carries the same name as the primary key it references.
	63
	64	{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}
6.2	65	{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}
5.2	66
8.2	67	= Core Data Model ERD (Versioned) =
	68
	69	This diagram shows the full core data model with all versioned entities.
	70
	71	{{mermaid}}
	72	erDiagram
	73	CLAIM_CLUSTER {
	74	string ClusterID PK
	75	string EmbeddingVectorRef
	76	string Theme
	77	}
	78
	79	CLAIM {
	80	string ClaimID PK
	81	string ClusterID FK
	82	string Status
	83	datetime CreatedAt
	84	}
	85
	86	CLAIM_VERSION {
	87	string ClaimVersionID PK
	88	string ClaimID FK
	89	string Text
	90	string ClaimType
	91	string Domain
	92	datetime CreatedAt
	93	}
	94
	95	SCENARIO {
	96	string ScenarioID PK
	97	string ClaimID FK
	98	string Name
	99	datetime CreatedAt
	100	}
	101
	102	SCENARIO_VERSION {
	103	string ScenarioVersionID PK
	104	string ScenarioID FK
	105	string Definitions
	106	string Assumptions
	107	string Boundaries
	108	datetime CreatedAt
	109	}
	110
	111	EVIDENCE {
	112	string EvidenceID PK
	113	string SourceType
	114	string URL
	115	float ReliabilityScore
	116	}
	117
	118	EVIDENCE_VERSION {
	119	string EvidenceVersionID PK
	120	string EvidenceID FK
	121	string Summary
	122	float ReliabilityScore
	123	datetime CreatedAt
	124	}
	125
	126	SCENARIO_EVIDENCE_LINK {
	127	string LinkID PK
	128	string ScenarioVersionID FK
	129	string EvidenceVersionID FK
	130	float Relevance
	131	string Direction
	132	}
	133
	134	VERDICT {
	135	string VerdictID PK
	136	string ScenarioID FK
	137	}
	138
	139	VERDICT_VERSION {
	140	string VerdictVersionID PK
	141	string VerdictID FK
	142	float Verdict
	143	float Confidence
	144	string Reasoning
	145	datetime CreatedAt
	146	}
	147
	148	CLAIM_CLUSTER ||--o{ CLAIM : contains
	149	CLAIM ||--o{ CLAIM_VERSION : versions
	150
	151	CLAIM ||--o{ SCENARIO : has
	152	SCENARIO ||--o{ SCENARIO_VERSION : versions
	153
	154	EVIDENCE ||--o{ EVIDENCE_VERSION : versions
	155
	156	SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : links
	157	EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK : linked
	158
	159	SCENARIO ||--o{ VERDICT : assessed
	160	VERDICT ||--o{ VERDICT_VERSION : versions
	161	{{/mermaid}}
	162
	163	{{info}}
	164	All key entities are explicitly versioned here (…VERSION tables).
	165	This reflects the versioning requirements in the textual Data Model chapter.
	166	{{/info}}
	167
	168
5.2	169	Important points:
	170
	171	* Scenarios and Evidence are linked via their versions
	172	({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
	173	* Verdicts are per ScenarioVersion and stored in {{code}}VERDICT_VERSION{{/code}}.
	174	* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it is shown here and in the Data Use / Review model.
	175
	176	All version entities are immutable: once created, they are never changed, only
	177	superseded by newer versions.
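
The "never changed, only superseded" rule can be sketched as an append-only log. This is a minimal illustration under assumptions: the `VersionLog` class, its method names, and the `-vN` ID scheme are hypothetical and not part of the specification.

```python
from collections import defaultdict
from types import MappingProxyType


class VersionLog:
    """Append-only history per identity: versions are added, never edited."""

    def __init__(self):
        self._history = defaultdict(list)

    def append(self, identity_id, fields):
        """Create the next version; earlier versions stay untouched."""
        number = len(self._history[identity_id]) + 1
        version = MappingProxyType(
            {"version_id": f"{identity_id}-v{number}", **fields}
        )
        self._history[identity_id].append(version)
        return version

    def current(self, identity_id):
        """The newest version, which supersedes all earlier ones."""
        return self._history[identity_id][-1]

    def history(self, identity_id):
        """All versions, oldest first, as a read-only tuple."""
        return tuple(self._history[identity_id])


log = VersionLog()
first = log.append("s1", {"Definitions": "first draft"})
log.append("s1", {"Definitions": "refined wording"})

# In-place edits are rejected because each version is a read-only mapping.
try:
    first["Definitions"] = "edited in place"
    edit_rejected = False
except TypeError:
    edit_rejected = True
```

A change is always expressed as a new `append`, so the full lineage remains available through `history`.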
	178
	179	----
	180
8.1	181	= 5.3 Data Use & Review ERD =
5.2	182
	183	The Data Use model captures who does what with which versioned data:
	184
	185	* Users (including technical users)
	186	* Roles and role assignments
	187	* Review actions on versioned entities
	188
	189	{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}
6.3	190	{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="FactHarbor.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}
5.2	191
8.2	192	= Data Use ERD (Roles, Review & Versioned Entities) =
5.2	193
8.2	194	This diagram shows how users, roles, and review actions relate to the
	195	versioned core entities.
	196
	197	{{mermaid}}
	198	erDiagram
	199	%% Core clusters shown for context
	200	CLAIM_CLUSTER {
	201	string ClusterID PK
	202	string EmbeddingVectorRef
	203	string Theme
	204	}
	205
	206	CLAIM {
	207	string ClaimID PK
	208	string ClusterID FK
	209	string Status
	210	datetime CreatedAt
	211	}
	212
	213	CLAIM_VERSION {
	214	string ClaimVersionID PK
	215	string ClaimID FK
	216	string Text
	217	string ClaimType
	218	string Domain
	219	datetime CreatedAt
	220	}
	221
	222	SCENARIO {
	223	string ScenarioID PK
	224	string ClaimID FK
	225	string Name
	226	datetime CreatedAt
	227	}
	228
	229	SCENARIO_VERSION {
	230	string ScenarioVersionID PK
	231	string ScenarioID FK
	232	string Definitions
	233	string Assumptions
	234	string Boundaries
	235	datetime CreatedAt
	236	}
	237
	238	EVIDENCE {
	239	string EvidenceID PK
	240	string SourceType
	241	string URL
	242	float ReliabilityScore
	243	}
	244
	245	EVIDENCE_VERSION {
	246	string EvidenceVersionID PK
	247	string EvidenceID FK
	248	string Summary
	249	float ReliabilityScore
	250	datetime CreatedAt
	251	}
	252
	253	VERDICT {
	254	string VerdictID PK
	255	string ScenarioID FK
	256	}
	257
	258	VERDICT_VERSION {
	259	string VerdictVersionID PK
	260	string VerdictID FK
	261	float Verdict
	262	float Confidence
	263	string Reasoning
	264	datetime CreatedAt
	265	}
	266
	267	%% Users and roles
	268	USER {
	269	string UserID PK
	270	string Handle
	271	string Email
	272	}
	273
	274	TECHNICAL_USER {
	275	string UserID PK
	276	string SystemName
	277	}
	278
	279	CONTRIBUTING_USER {
	280	string UserID PK
	281	string DisplayName
	282	}
	283
	284	TRUSTED_CONTRIBUTOR {
	285	string UserID PK
	286	string TrustLevel
	287	}
	288
	289	REVIEWER {
	290	string UserID PK
	291	string Domain
	292	}
	293
	294	EXPERT {
	295	string UserID PK
	296	string ExpertiseArea
	297	}
	298
	299	FEDERATION_NODE {
	300	string NodeID PK
	301	string Region
	302	}
	303
	304	FEDERATION_ADMIN {
	305	string UserID PK
	306	string Permissions
	307	}
	308
	309	REVIEW_ACTION {
	310	string ReviewActionID PK
	311	string UserID FK
	312	string TargetEntityType
	313	string TargetEntityVersionID
	314	string ActionType
	315	string Comment
	316	datetime Timestamp
	317	}
	318
	319	%% Inheritance / specialization (modelled as relationships)
	320	USER ||--o{ TECHNICAL_USER : "is a"
	321	USER ||--o{ CONTRIBUTING_USER : "is a"
	322
	323	CONTRIBUTING_USER ||--o{ TRUSTED_CONTRIBUTOR : "subset"
	324	CONTRIBUTING_USER ||--o{ REVIEWER : "subset"
	325	CONTRIBUTING_USER ||--o{ EXPERT : "subset"
	326
	327	TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
	328	TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"
	329
	330	%% Review actions on versioned entities
	331	USER ||--o{ REVIEW_ACTION : performs
	332
	333	REVIEW_ACTION }o--|| CLAIM_VERSION : reviews
	334	REVIEW_ACTION }o--|| SCENARIO_VERSION : reviews
	335	REVIEW_ACTION }o--|| EVIDENCE_VERSION : reviews
	336	REVIEW_ACTION }o--|| VERDICT_VERSION : reviews
	337	{{/mermaid}}
	338
	339	{{info}}
	340	This diagram focuses on who uses and reviews which versioned entities.
	341	USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations.
	342	Other roles (REVIEWER, EXPERT, TRUSTED_CONTRIBUTOR, FEDERATION_ADMIN, FEDERATION_NODE)
	343	are modelled as specializations or technical subtypes.
	344	{{/info}}
	345
	346
	347
5.2	348	Notes:
	349
	350	* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR,
	351	SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows
	352	in {{code}}ROLE{{/code}}.
	353	* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys,
	354	node-to-node federation agents, batch jobs). All other roles can, in principle,
	355	be held by both human and technical users where appropriate.
	356	* A {{code}}READER{{/code}} normally does not perform REVIEW_ACTIONs, while
	357	roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles
	358	do.
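
The role notes above can be sketched as a simple membership lookup in Python. The user names, the tuple representation of role memberships, and the exact set of reviewing roles are assumptions for illustration; which federation roles may review is left open in the text and is therefore not modelled here.

```python
# Roles that the notes describe as performing REVIEW_ACTIONs.
# Federation roles are omitted because the text only says "some" of them do.
REVIEWING_ROLES = {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR"}

# Role memberships modelled as (user_id, role_name) rows (hypothetical data).
memberships = {
    ("alice", "READER"),
    ("alice", "REVIEWER"),
    ("bob", "READER"),
    ("batch-job-1", "TRUSTED_CONTRIBUTOR"),  # a technical user holding a role
}


def roles_of(user_id: str) -> set[str]:
    """Collect all role names held by one user."""
    return {role for uid, role in memberships if uid == user_id}


def may_review(user_id: str) -> bool:
    """True if the user holds at least one role that performs review actions."""
    return bool(roles_of(user_id) & REVIEWING_ROLES)
```

Because roles are plain rows, adding a new role does not require a new entity type, and the same check works for human and technical users.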
	359
	360	----
	361
	362	= 5.4 Versioning and re-evaluation behavior =
	363
	364	This section ties the data model to the re-evaluation logic
	365	(described in more detail in the Versioning and Automation chapters).
	366
	367	* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
	368	* All related {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries referencing
	369	that evidence version are candidates for re-assessment.
	370	* Related {{code}}VERDICT_VERSION{{/code}} entries may become outdated and
	371	are queued for re-evaluation.
	372
	373	* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
	374	* It may inherit some links from earlier versions of the scenario, or start empty,
	375	depending on the change classification (cosmetic vs. conceptual).
	376	* All verdicts for that scenario are recalculated and stored as new
	377	{{code}}VERDICT_VERSION{{/code}} entries.
	378
	379	* REVIEW_ACTIONs are always attached to the exact version that was seen by
	380	the reviewer. This preserves a faithful audit trail if data later changes.
	381
	382	* In a federated environment, nodes can choose:
	383	* which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
	384	* which versioned entities to replicate (e.g. only accepted VERDICT_VERSIONs,
	385	only EVIDENCE_VERSIONs above a reliability threshold, etc.)
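
The evidence-driven re-evaluation described in this section might look roughly like the following Python sketch. It rests on assumptions: a new evidence version is taken to affect links that reference any earlier version of the same EVIDENCE, verdict versions are assumed to record the scenario version they assessed, and the in-memory tables and the function name are hypothetical.

```python
from collections import deque

# EVIDENCE_VERSION id -> EVIDENCE id (sample data)
evidence_of = {"e1-v1": "e1", "e2-v1": "e2"}
# Link rows as (ScenarioVersionID, EvidenceVersionID)
links = [("s1-v1", "e1-v1"), ("s2-v1", "e1-v1"), ("s3-v1", "e2-v1")]
# VerdictVersionID -> ScenarioVersionID it was made against (assumed mapping)
verdict_scenario = {"vd1-v1": "s1-v1", "vd2-v1": "s2-v1", "vd3-v1": "s3-v1"}

reevaluation_queue = deque()


def on_new_evidence_version(evidence_version_id, evidence_id):
    """Register a new evidence version and queue possibly outdated verdicts."""
    evidence_of[evidence_version_id] = evidence_id
    # Links to any version of the same evidence are candidates for re-assessment.
    candidates = [link for link in links if evidence_of[link[1]] == evidence_id]
    affected_scenarios = {scenario_version for scenario_version, _ in candidates}
    # Verdicts on the affected scenario versions are queued, not modified.
    for verdict_version_id, scenario_version_id in sorted(verdict_scenario.items()):
        if scenario_version_id in affected_scenarios:
            reevaluation_queue.append(verdict_version_id)
    return candidates


candidates = on_new_evidence_version("e1-v2", "e1")
```

Existing verdict versions are only queued; any re-evaluation result would be stored as a new VERDICT_VERSION.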
	386
	387	----
	388
	389	= 5.5 Behavioral Notes =
	390
	391	== 5.5.1 Late-Arriving Evidence ==
	392
	393	New evidence versions can make existing verdicts outdated and may trigger
	394	re-evaluation cascades. This is handled by the global trigger and automation
	395	architecture (see the Versioning & Automation chapters).
	396
	397	== 5.5.2 Scenario Evolution ==
	398
	399	Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and
	400	Scenario–Evidence links are re-assessed. Old versions remain available for
	401	historical comparison and reproducibility.
	402
	403	== 5.5.3 Federation ==
	404
	405	Federated nodes can replicate subsets of the graph, including:
	406
	407	* Claims and Scenarios of local interest
	408	* Evidence metadata (without full content)
	409	* Verdict lineages used for local decision-making
	410
	411	Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}},
	412	replication logs, and trust rules) are described in the Federation &
	413	Decentralization chapter and build on top of the core data model defined here.
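
As an illustration of subset replication, the following Python sketch selects evidence versions by a reliability threshold, one of the criteria mentioned in section 5.4. The class, the function name, the sample scores, and the threshold value are assumptions; actual replication rules belong to the Federation & Decentralization chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str
    reliability_score: float


def select_for_replication(versions, min_reliability):
    """Keep only evidence versions at or above the node's reliability threshold."""
    return [v for v in versions if v.reliability_score >= min_reliability]


local_versions = [
    EvidenceVersion("e1-v1", "e1", 0.9),
    EvidenceVersion("e2-v1", "e2", 0.4),
    EvidenceVersion("e3-v1", "e3", 0.7),
]

# 0.7 is an arbitrary example threshold chosen by the replicating node.
replicated = select_for_replication(local_versions, min_reliability=0.7)
replicated_ids = [v.evidence_version_id for v in replicated]
```

Because versions are immutable, a replicated version never needs to be updated on the receiving node; newer versions simply arrive as additional rows.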
	414
	415
8.2	416
	417	* USER
	418	** TECHNICAL_USER
	419	*** FEDERATION_ADMIN
	420	*** AKEL_AGENT (optional future)
	421
	422	* READER
	423	** CONTRIBUTING_USER
	424	*** TRUSTED_CONTRIBUTOR
	425	*** REVIEWER
	426	*** EXPERT
	427	*** MODERATOR
	428
	429
	430	* ADMIN
	431
	432	* FEDERATION_ADMIN (administrative, but human)
	433
	434
4.1	435	== 1. Overall analysis & review of the data model ==
	436
	437	=== 1.1 Strengths of the current design ===
	438
	439	* (((
	440	Identity vs. version pattern
	441	Using base entities plus version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established way to handle:
	442
	443	* auditability
	444	* time evolution
	445	* re-evaluation triggers
	446	* federation and partial replication
	447	)))
	448	* (((
	449	Scenario-centric reasoning
	450	Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) is very aligned with “truth landscape” style systems:
	451
	452	* Scenarios explain //why people disagree//.
	453	* Verdicts are tied to specific scenario versions → avoids mixing incompatible assumptions.
	454	)))
	455	* Evidence and verdicts as first-class entities
	456	Evidence is explicit, linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
	457	* (((
	458	Cluster level (CLAIM_CLUSTER)
	459	Grouping related claims avoids duplication and lets you:
	460
	461	* reuse scenarios across paraphrases
	462	* share embeddings / semantic search
	463	* keep the system scalable as the corpus grows.
	464	)))
	465	* (((
	466	Explicit review layer (REVIEW_ACTION, roles, etc.)
	467	Separating “data” from “who reviewed what” keeps the model clean, and is exactly what you want for:
	468
	469	* governance
	470	* permissions
	471	* audit trails
	472	* future trust scoring per user / role.
	473	)))
	474
	475	----
	476
	477	=== 1.2 Design decisions I’m locking in (based on our discussions) ===
	478
	479	To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:
	480
	481	1. (((
	482	Claims vs Scenarios
	483
	484	* CLAIM is the stable identity for “what people argue about”.
	485	* CLAIM_VERSION entries are the individual phrasings / formulations, together with their metadata.
	486	* (((
	487	SCENARIO belongs to a CLAIM, not to a specific CLAIM_VERSION.
	488	Rationale:
	489
	490	* Many different phrasings share the //same// scenario.
	491	* You avoid duplicating scenarios per wording.
	492	)))
	493	* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
	494	)))
	495	1. (((
	496	Version-specific reasoning
	497
	498	* Verdicts are always attached to SCENARIO_VERSION (not base SCENARIO).
	499	* Evidence links are between SCENARIO_VERSION and EVIDENCE_VERSION.
	500	→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
	501	)))
	502	1. (((
	503	Clusters
	504
	505	* CLAIM_CLUSTER groups Claims (semantically close claims).
	506	* It is visible in both diagrams (Core Data Model and Data Use).
	507	)))
	508	1. (((
	509	Review vs data
	510
	511	* (((
	512	All review happens on versioned entities:
	513
	514	* CLAIM_VERSION
	515	* SCENARIO_VERSION
	516	* EVIDENCE_VERSION
	517	* SCENARIO_EVIDENCE_LINK_VERSION
	518	* VERDICT_VERSION
	519	)))
	520	* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
	521	)))
	522	1. (((
	523	Users & roles
	524
	525	* USER has an attribute (or a linked entity) that distinguishes technical users from normal accounts.
	526	* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
	527	* All human & technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
	528	* (((
	529	Roles include:
	530
	531	* READER
	532	* CONTRIBUTOR
	533	* TRUSTED_CONTRIBUTOR
	534	* REVIEWER
	535	* MODERATOR
	536	* SYSTEM_ADMIN / MAINTAINER
	537	* FEDERATION_OPERATOR
	538	* FEDERATION_ADMIN
	539	(all present in the Data Use ERD, but as rows of ROLE rather than separate entities).
	540	)))
	541	)))
	542
	543	----
	544
	545	=== 1.3 Gaps / potential problems ===
	546
	547	These are the main issues & missing areas I see:
	548
	549	1. (((
	550	Versioning text in chapter 5 is currently too thin (‘…’ placeholders)
	551
	552	* (((
	553	The spec does not yet //verbally// spell out:
	554
	555	* the identity vs version pattern, systematically
	556	* how re-evaluation triggers are derived from version changes
	557	* how this aligns with federation (which versions are replicated where).
	558	)))
	559	)))
	560	1. (((
	561	No explicit “provenance granularity” in the model
	562
	563	* (((
	564	EVIDENCE is a single entity. For more advanced use cases, you may later want:
	565
	566	* EVIDENCE_SOURCE (the whole article/report/video)
	567	* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
	568	)))
	569	* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
	570	)))
	571	1. (((
	572	Review target polymorphism
	573
	574	* (((
	575	REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:
	576
	577	* CLAIM_VERSION → REVIEW_ACTION
	578	* SCENARIO_VERSION → REVIEW_ACTION
	579	* etc.
	580	)))
	581	* A more “pure” relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table.
	582	* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
	583	)))
	584	1. (((
	585	Federation details missing from core ERD
	586
	587	* There is no explicit FEDERATION_NODE / REPLICATION_LOG in the Data Model chapter.
	588	* This is ok for “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
	589	)))
	590	1. (((
	591	Automation / AKEL artifacts left implicit
	592
	593	* (((
	594	The Data Model chapter currently doesn’t describe:
	595
	596	* AKEL task queues
	597	* extraction runs
	598	* model versions
	599	)))
	600	* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
	601	)))
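
The review target polymorphism noted in item 3 can be sketched in Python using a type-plus-ID pair, in the spirit of the TargetEntityType and TargetEntityVersionID fields of REVIEW_ACTION. The validation rule, the `resolve_target` helper, and the in-memory tables are assumptions for illustration only.

```python
from dataclasses import dataclass

# Versioned entity types that may be reviewed, as listed in section 1.2.
ALLOWED_TARGET_TYPES = {
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
}


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str

    def __post_init__(self):
        # Reject identity entities: reviews must point at a version.
        if self.target_entity_type not in ALLOWED_TARGET_TYPES:
            raise ValueError(f"not a reviewable version type: {self.target_entity_type}")


def resolve_target(action, tables):
    """Look up the reviewed version using the (type, version ID) pair."""
    return tables[action.target_entity_type][action.target_entity_version_id]


# Hypothetical in-memory tables keyed by entity type, then by version ID.
tables = {"CLAIM_VERSION": {"c1-v1": {"Text": "Coffee is healthy."}}}

action = ReviewAction("r1", "alice", "CLAIM_VERSION", "c1-v1", "approve")
target = resolve_target(action, tables)

try:
    ReviewAction("r2", "alice", "CLAIM", "c1", "approve")
    identity_review_rejected = False
except ValueError:
    identity_review_rejected = True
```

The trade-off of this generic form is that the database cannot enforce the reference with an ordinary foreign key, which is one reason an intermediate table is sometimes preferred.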
