Wiki source code of Data Model (From Specification Chat)

Last modified by Robert Schaub on 2025/12/24 20:35

= 5. Data Model =

The FactHarbor data model centers on four core entities, each fully versioned with immutable versions:

* Claim
* Scenario
* Evidence
* Verdict

These entities form the structured “truth landscape” for each claim. The model is explicitly versioned, traceable, and federation-ready.

To keep the system auditable and explainable, FactHarbor uses a consistent identity vs. version pattern:

* Identity entities (e.g. {{code}}CLAIM{{/code}}, {{code}}SCENARIO{{/code}}) define what something is in a stable sense.
* Version entities (e.g. {{code}}CLAIM_VERSION{{/code}}, {{code}}SCENARIO_VERSION{{/code}}) define how that thing looked at a given point in time.

All reasoning (e.g. verdicts, review actions) is attached to versions, never to the identities themselves, whose state (such as status) can change over time.
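
The pattern can be illustrated with a minimal Python sketch, assuming frozen dataclasses as the in-memory representation. {{code}}Claim{{/code}}, {{code}}ClaimVersion{{/code}}, and {{code}}rephrase{{/code}} are hypothetical names that mirror {{code}}CLAIM{{/code}} and {{code}}CLAIM_VERSION{{/code}}; they are not an existing FactHarbor API. A rephrasing produces a new version that shares the claim's identity, and an attempt to edit a version in place fails.

{{code language="python"}}
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
import uuid


@dataclass(frozen=True)
class Claim:
    """Identity entity: stays stable across rephrasings."""
    claim_id: str
    cluster_id: str


@dataclass(frozen=True)
class ClaimVersion:
    """Version entity: one immutable snapshot of a claim's text."""
    claim_version_id: str
    claim_id: str  # points back to the stable identity
    text: str
    created_at: datetime


def rephrase(claim: Claim, text: str) -> ClaimVersion:
    """Record a new phrasing as a new version; earlier versions are untouched."""
    return ClaimVersion(
        claim_version_id=uuid.uuid4().hex,
        claim_id=claim.claim_id,
        text=text,
        created_at=datetime.now(timezone.utc),
    )


claim = Claim(claim_id="claim-1", cluster_id="cluster-1")
v1 = rephrase(claim, "Coffee raises blood pressure.")
v2 = rephrase(claim, "Drinking coffee increases blood pressure.")
assert v1.claim_id == v2.claim_id  # same identity, two versions

try:
    v1.text = "edited in place"  # frozen dataclass: in-place edits are rejected
    raise AssertionError("versions must be immutable")
except FrozenInstanceError:
    pass
{{/code}}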

----

= 5.1 Core entities and versioning pattern =

(% class="wikitable" %)
|= Logical concept |= Identity entity |= Version entity |= Notes
| Claim (what people argue about) | {{code}}CLAIM{{/code}} | {{code}}CLAIM_VERSION{{/code}} | Claim text, phrasing, and metadata live in {{code}}CLAIM_VERSION{{/code}}. The identity {{code}}CLAIM{{/code}} stays stable across rephrasings.
| Scenario (interpretive frame) | {{code}}SCENARIO{{/code}} | {{code}}SCENARIO_VERSION{{/code}} | A {{code}}SCENARIO{{/code}} belongs to a {{code}}CLAIM{{/code}}. Its versions capture evolving definitions, assumptions, and boundaries.
| Evidence (source / datapoint) | {{code}}EVIDENCE{{/code}} | {{code}}EVIDENCE_VERSION{{/code}} | {{code}}EVIDENCE{{/code}} identifies a source; its versions capture specific extractions and updates over time.
| Verdict (assessment) | {{code}}VERDICT{{/code}} | {{code}}VERDICT_VERSION{{/code}} | A {{code}}VERDICT{{/code}} is defined per {{code}}SCENARIO{{/code}}; {{code}}VERDICT_VERSION{{/code}} captures the history of assessments.
| Scenario–Evidence link | {{code}}SCENARIO_EVIDENCE_LINK{{/code}} | {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} | Link versions bind scenario versions to evidence versions, with relevance and direction.
| Claim cluster (semantic group) | {{code}}CLAIM_CLUSTER{{/code}} | – | Groups semantically related claims; mainly for discovery and navigation.

Key design decisions:

* A {{code}}CLAIM{{/code}} belongs to exactly one {{code}}CLAIM_CLUSTER{{/code}}.
* A {{code}}SCENARIO{{/code}} belongs to exactly one {{code}}CLAIM{{/code}} (scenarios live at the claim level, not per individual phrasing).
* Verdicts and Scenario–Evidence links are always attached to versions:
** {{code}}SCENARIO_VERSION{{/code}} + {{code}}EVIDENCE_VERSION{{/code}} → {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}}
** {{code}}SCENARIO_VERSION{{/code}} → {{code}}VERDICT_VERSION{{/code}}

This ensures that when a Scenario or Evidence changes, old verdicts and links remain intact as historical records and can be revisited.

----

= 5.2 Core Data Model ERD (expanded, versioned) =

The following Mermaid ER diagram shows the main entities and their relationships. Primary and foreign keys are marked with Mermaid's {{code}}PK{{/code}} and {{code}}FK{{/code}} attribute keys, and key fields end in {{code}}ID{{/code}} (e.g. {{code}}ClaimID{{/code}}, {{code}}ClaimVersionID{{/code}}).

{{comment}} Core Data Model ERD (Mermaid, from /Specification/Diagrams/Data Model) {{/comment}}
{{include document="FactHarbor.Playground.Core Data Model ERD Page (from Specification chat).WebHome" reference="Test.Playground.data.Core Data Model ERD Page (from Specification chat).WebHome"/}}

== Core Data Model ERD (Versioned) ==

This diagram shows the full core data model with all versioned entities.

{{mermaid}}
erDiagram
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    SCENARIO_EVIDENCE_LINK {
        string LinkID PK
    }

    SCENARIO_EVIDENCE_LINK_VERSION {
        string LinkVersionID PK
        string LinkID FK
        string ScenarioVersionID FK
        string EvidenceVersionID FK
        float Relevance
        string Direction
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    CLAIM_CLUSTER ||--o{ CLAIM : contains
    CLAIM ||--o{ CLAIM_VERSION : versions

    CLAIM ||--o{ SCENARIO : has
    SCENARIO ||--o{ SCENARIO_VERSION : versions

    EVIDENCE ||--o{ EVIDENCE_VERSION : versions

    SCENARIO_EVIDENCE_LINK ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : versions
    SCENARIO_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : links
    EVIDENCE_VERSION ||--o{ SCENARIO_EVIDENCE_LINK_VERSION : linked

    SCENARIO ||--o{ VERDICT : assessed
    VERDICT ||--o{ VERDICT_VERSION : versions
    SCENARIO_VERSION ||--o{ VERDICT_VERSION : "assessed in"
{{/mermaid}}

{{info}}
All core entities are explicitly versioned here (the {{code}}*_VERSION{{/code}} tables).
This reflects the versioning requirements described in the text of this chapter.
{{/info}}

Important points:

* Scenarios and Evidence are linked via their versions ({{code}}SCENARIO_VERSION{{/code}} and {{code}}EVIDENCE_VERSION{{/code}}).
* Verdicts are per {{code}}SCENARIO_VERSION{{/code}} and stored in {{code}}VERDICT_VERSION{{/code}}.
* {{code}}CLAIM_CLUSTER{{/code}} is shared across diagrams; it appears here and in the Data Use / Review model.

All version entities are immutable: once created, they are never changed, only superseded by newer versions.
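
One way to enforce this immutability at the storage layer is sketched below, assuming a relational store (SQLite via Python's standard {{code}}sqlite3{{/code}} module). Only {{code}}CLAIM{{/code}} and {{code}}CLAIM_VERSION{{/code}} are shown; the snake_case table, column, and trigger names are illustrative and not a FactHarbor schema. Triggers reject {{code}}UPDATE{{/code}} and {{code}}DELETE{{/code}} on version rows, so a change can only be recorded by inserting a new version.

{{code language="python"}}
import sqlite3

# Illustrative subset of the core ERD: CLAIM and CLAIM_VERSION only.
SCHEMA = """
CREATE TABLE claim (
    claim_id   TEXT PRIMARY KEY,
    cluster_id TEXT,
    status     TEXT,
    created_at TEXT NOT NULL
);

CREATE TABLE claim_version (
    claim_version_id TEXT PRIMARY KEY,
    claim_id         TEXT NOT NULL REFERENCES claim(claim_id),
    text             TEXT NOT NULL,
    created_at       TEXT NOT NULL  -- ISO 8601, so text order is time order
);

-- Version rows are append-only: reject in-place edits and deletions.
CREATE TRIGGER claim_version_no_update BEFORE UPDATE ON claim_version
BEGIN
    SELECT RAISE(ABORT, 'claim_version rows are immutable');
END;

CREATE TRIGGER claim_version_no_delete BEFORE DELETE ON claim_version
BEGIN
    SELECT RAISE(ABORT, 'claim_version rows are immutable');
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce claim_version -> claim
    conn.executescript(SCHEMA)
    return conn


def latest_version(conn: sqlite3.Connection, claim_id: str):
    """Return (claim_version_id, text) of the newest version, or None."""
    return conn.execute(
        "SELECT claim_version_id, text FROM claim_version "
        "WHERE claim_id = ? ORDER BY created_at DESC LIMIT 1",
        (claim_id,),
    ).fetchone()


conn = open_db()
conn.execute("INSERT INTO claim VALUES ('c1', NULL, 'open', '2025-01-01T00:00:00')")
conn.execute("INSERT INTO claim_version VALUES "
             "('cv1', 'c1', 'First wording', '2025-01-01T10:00:00')")
# A rephrasing is a new row; cv1 is superseded, not overwritten.
conn.execute("INSERT INTO claim_version VALUES "
             "('cv2', 'c1', 'Second wording', '2025-01-02T10:00:00')")
assert latest_version(conn, "c1") == ("cv2", "Second wording")
{{/code}}

A design note: the earlier row stays queryable for audits and for verdicts or reviews that referenced it, which is the property the identity vs. version pattern relies on.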

----

= 5.3 Data Use & Review ERD =

The Data Use model captures who does what with which versioned data:

* Users (including technical users)
* Roles and role assignments
* Review actions on versioned entities

{{comment}} Data Use ERD (Mermaid, from /Specification/Diagrams/Data Use ERD) {{/comment}}
{{include document="FactHarbor.Playground.Data Use ERD Page (from Specification chat).WebHome" reference="Test.Playground.data.Data Use ERD Page (from Specification chat).WebHome"/}}

== Data Use ERD (Roles, Review & Versioned Entities) ==

This diagram shows how users, roles, and review actions relate to the versioned core entities.

{{mermaid}}
erDiagram
    %% Core entities shown for context
    CLAIM_CLUSTER {
        string ClusterID PK
        string EmbeddingVectorRef
        string Theme
    }

    CLAIM {
        string ClaimID PK
        string ClusterID FK
        string Status
        datetime CreatedAt
    }

    CLAIM_VERSION {
        string ClaimVersionID PK
        string ClaimID FK
        string Text
        string ClaimType
        string Domain
        datetime CreatedAt
    }

    SCENARIO {
        string ScenarioID PK
        string ClaimID FK
        string Name
        datetime CreatedAt
    }

    SCENARIO_VERSION {
        string ScenarioVersionID PK
        string ScenarioID FK
        string Definitions
        string Assumptions
        string Boundaries
        datetime CreatedAt
    }

    EVIDENCE {
        string EvidenceID PK
        string SourceType
        string URL
        float ReliabilityScore
    }

    EVIDENCE_VERSION {
        string EvidenceVersionID PK
        string EvidenceID FK
        string Summary
        float ReliabilityScore
        datetime CreatedAt
    }

    VERDICT {
        string VerdictID PK
        string ScenarioID FK
    }

    VERDICT_VERSION {
        string VerdictVersionID PK
        string VerdictID FK
        string ScenarioVersionID FK
        float Verdict
        float Confidence
        string Reasoning
        datetime CreatedAt
    }

    %% Users and roles
    USER {
        string UserID PK
        string Handle
        string Email
    }

    TECHNICAL_USER {
        string UserID PK
        string SystemName
    }

    CONTRIBUTING_USER {
        string UserID PK
        string DisplayName
    }

    TRUSTED_CONTRIBUTOR {
        string UserID PK
        string TrustLevel
    }

    REVIEWER {
        string UserID PK
        string Domain
    }

    EXPERT {
        string UserID PK
        string ExpertiseArea
    }

    FEDERATION_NODE {
        string NodeID PK
        string Region
    }

    FEDERATION_ADMIN {
        string UserID PK
        string Permissions
    }

    REVIEW_ACTION {
        string ReviewActionID PK
        string UserID FK
        string TargetEntityType
        string TargetEntityVersionID
        string ActionType
        string Comment
        datetime Timestamp
    }

    %% Inheritance / specialization (modelled as relationships; a subtype row shares its parent's UserID, so at most one per user)
    USER ||--o| TECHNICAL_USER : "is a"
    USER ||--o| CONTRIBUTING_USER : "is a"

    CONTRIBUTING_USER ||--o| TRUSTED_CONTRIBUTOR : "subset"
    CONTRIBUTING_USER ||--o| REVIEWER : "subset"
    CONTRIBUTING_USER ||--o| EXPERT : "subset"

    TECHNICAL_USER ||--o{ FEDERATION_NODE : "operates"
    TECHNICAL_USER ||--o{ FEDERATION_ADMIN : "administers"

    %% Review actions on versioned entities (each action targets exactly one version)
    USER ||--o{ REVIEW_ACTION : performs

    REVIEW_ACTION }o--o| CLAIM_VERSION : reviews
    REVIEW_ACTION }o--o| SCENARIO_VERSION : reviews
    REVIEW_ACTION }o--o| EVIDENCE_VERSION : reviews
    REVIEW_ACTION }o--o| VERDICT_VERSION : reviews
{{/mermaid}}

{{info}}
This diagram focuses on who uses and reviews which versioned entities.
USER is the base type; TECHNICAL_USER and CONTRIBUTING_USER are specializations.
TRUSTED_CONTRIBUTOR, REVIEWER, and EXPERT are modelled as subtypes of CONTRIBUTING_USER; FEDERATION_NODE and FEDERATION_ADMIN are attached to TECHNICAL_USER.
{{/info}}

Notes:

* Most roles (READER, CONTRIBUTOR, TRUSTED_CONTRIBUTOR, REVIEWER, MODERATOR, SYSTEM_ADMIN, FEDERATION_OPERATOR, FEDERATION_ADMIN, …) are represented as rows in {{code}}ROLE{{/code}}; the {{code}}ROLE{{/code}} table itself is not shown in the diagram above.
* {{code}}TECHNICAL_USER{{/code}} captures strictly technical accounts (API keys, node-to-node federation agents, batch jobs). All other roles can, in principle, be held by both human and technical users where appropriate.
* A {{code}}READER{{/code}} normally does not perform REVIEW_ACTIONs, while roles like REVIEWER, TRUSTED_CONTRIBUTOR, MODERATOR, and some federation roles do.
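
A minimal sketch of roles held as data rather than as separate entities, assuming a {{code}}USER_ROLE_MEMBERSHIP{{/code}} table (named in the design decisions of section 1.2) and a hypothetical {{code}}REVIEWING_ROLES{{/code}} set. Which federation roles may review is not specified here; including {{code}}FEDERATION_ADMIN{{/code}} is an assumption.

{{code language="python"}}
from dataclasses import dataclass

# Assumption: roles allowed to create REVIEW_ACTIONs. READER is deliberately
# absent; FEDERATION_ADMIN stands in for "some federation roles".
REVIEWING_ROLES = frozenset(
    {"REVIEWER", "TRUSTED_CONTRIBUTOR", "MODERATOR", "FEDERATION_ADMIN"}
)


@dataclass(frozen=True)
class UserRoleMembership:
    """One row linking a USER (human or technical) to one ROLE."""
    user_id: str
    role: str


def roles_of(user_id: str, memberships: list[UserRoleMembership]) -> set[str]:
    return {m.role for m in memberships if m.user_id == user_id}


def can_review(user_id: str, memberships: list[UserRoleMembership]) -> bool:
    """True if any role the user holds may record REVIEW_ACTIONs."""
    return not roles_of(user_id, memberships).isdisjoint(REVIEWING_ROLES)


memberships = [
    UserRoleMembership("alice", "READER"),
    UserRoleMembership("bob", "READER"),
    UserRoleMembership("bob", "REVIEWER"),
    UserRoleMembership("ingest-bot", "MODERATOR"),  # a technical user holding a role
]
assert not can_review("alice", memberships)
assert can_review("bob", memberships)
{{/code}}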

----

= 5.4 Versioning and re-evaluation behavior =

This section ties the data model to the re-evaluation logic (described in more detail in the Versioning and Automation chapters).

* When a new {{code}}EVIDENCE_VERSION{{/code}} is created:
** All {{code}}SCENARIO_EVIDENCE_LINK_VERSION{{/code}} entries that reference earlier versions of the same evidence are candidates for re-assessment.
** {{code}}VERDICT_VERSION{{/code}} entries for the affected scenario versions may become outdated and are queued for re-evaluation.
* When a new {{code}}SCENARIO_VERSION{{/code}} is created:
** It may inherit some links from the previous scenario version, or start empty, depending on the change classification (cosmetic vs. conceptual).
** All verdicts for that scenario are recalculated and stored as new {{code}}VERDICT_VERSION{{/code}} entries.
* REVIEW_ACTIONs are always attached to the exact version the reviewer saw. This preserves a faithful audit trail if the data later changes.
* In a federated environment, nodes can choose:
** which identity entities to replicate (CLAIM, SCENARIO, EVIDENCE, VERDICT)
** which version entities to replicate (e.g. only accepted VERDICT_VERSIONs, or only EVIDENCE_VERSIONs above a reliability threshold).
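
The evidence-triggered part of this behavior can be sketched as a pure function, assuming in-memory lists of version records. The record types and {{code}}on_new_evidence_version{{/code}} are hypothetical; a real implementation would run inside the trigger and automation architecture described in the Versioning and Automation chapters. Because verdict versions are tied to scenario versions, the function returns the scenario versions whose verdicts should be queued.

{{code language="python"}}
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    evidence_id: str  # stable identity shared by all versions of one source


@dataclass(frozen=True)
class LinkVersion:
    """A SCENARIO_EVIDENCE_LINK_VERSION row (relevance/direction omitted)."""
    link_version_id: str
    scenario_version_id: str
    evidence_version_id: str


def on_new_evidence_version(new, evidence_versions, links):
    """Return (links to re-assess, scenario versions whose verdicts to queue)."""
    # Earlier versions of the same evidence identity.
    earlier = {
        ev.evidence_version_id
        for ev in evidence_versions
        if ev.evidence_id == new.evidence_id
        and ev.evidence_version_id != new.evidence_version_id
    }
    # Links still pointing at an earlier version are re-assessment candidates.
    stale_links = [link for link in links if link.evidence_version_id in earlier]
    # Verdicts hang off scenario versions, so queue each affected one once.
    to_queue = sorted({link.scenario_version_id for link in stale_links})
    return stale_links, to_queue


versions = [
    EvidenceVersion("e1v1", "e1"),
    EvidenceVersion("e1v2", "e1"),  # the newly created version
    EvidenceVersion("e2v1", "e2"),
]
links = [
    LinkVersion("l1", "s1v1", "e1v1"),
    LinkVersion("l2", "s2v1", "e2v1"),
    LinkVersion("l3", "s3v1", "e1v1"),
]
stale, queue = on_new_evidence_version(versions[1], versions, links)
{{/code}}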

----

= 5.5 Behavioral Notes =

== 5.5.1 Late-Arriving Evidence ==

New evidence versions can make existing verdicts outdated and may trigger re-evaluation cascades. This is handled by the global trigger and automation architecture (see the Versioning & Automation chapters).

== 5.5.2 Scenario Evolution ==

Scenario changes create new SCENARIO_VERSIONs; dependent verdicts and Scenario–Evidence links are re-assessed. Old versions remain available for historical comparison and reproducibility.
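
A sketch of how link inheritance might depend on the change classification, assuming two classes ({{code}}COSMETIC{{/code}}, {{code}}CONCEPTUAL{{/code}}) and the simplest possible policy: a cosmetic change carries every link over to the new scenario version, and a conceptual change starts empty. The names, the direction values, and the policy itself are assumptions; section 5.4 only states that links may be inherited or not.

{{code language="python"}}
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    COSMETIC = "cosmetic"      # wording or typo fix; meaning unchanged
    CONCEPTUAL = "conceptual"  # definitions, assumptions or boundaries changed


@dataclass(frozen=True)
class LinkVersion:
    scenario_version_id: str
    evidence_version_id: str
    relevance: float
    direction: str  # hypothetical values, e.g. "supports" or "contradicts"


def links_for_new_scenario_version(old_links, new_scenario_version_id, kind):
    """Build the initial links of a new SCENARIO_VERSION.

    Old link versions are never modified; inherited links are new records
    that point at the new scenario version.
    """
    if kind is ChangeKind.CONCEPTUAL:
        return []  # start empty; evidence must be re-linked and re-assessed
    return [
        LinkVersion(new_scenario_version_id, link.evidence_version_id,
                    link.relevance, link.direction)
        for link in old_links
    ]
{{/code}}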

== 5.5.3 Federation ==

Federated nodes can replicate subsets of the graph, including:

* Claims and Scenarios of local interest
* Evidence metadata (without full content)
* Verdict lineages used for local decision-making

Federation-specific entities (such as {{code}}FEDERATION_NODE{{/code}}, replication logs, and trust rules) are described in the Federation & Decentralization chapter and build on top of the core data model defined here.
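
The subset a node replicates can be expressed as a small policy object. This sketch assumes a hypothetical {{code}}status{{/code}} field on verdict versions (the ERD has none) and uses the {{code}}ReliabilityScore{{/code}} attribute of {{code}}EVIDENCE_VERSION{{/code}} for the threshold; the default values are placeholders, not recommended settings.

{{code language="python"}}
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictVersion:
    verdict_version_id: str
    status: str  # hypothetical review status, e.g. "accepted"


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str
    reliability_score: float  # mirrors EVIDENCE_VERSION.ReliabilityScore


@dataclass(frozen=True)
class ReplicationPolicy:
    """Per-node choice of which version entities to replicate."""
    verdict_statuses: frozenset[str] = frozenset({"accepted"})
    min_reliability: float = 0.7  # placeholder threshold


def select_for_replication(policy, verdicts, evidence):
    """Return the verdict and evidence versions this node should replicate."""
    chosen_verdicts = [v for v in verdicts if v.status in policy.verdict_statuses]
    chosen_evidence = [e for e in evidence
                       if e.reliability_score >= policy.min_reliability]
    return chosen_verdicts, chosen_evidence
{{/code}}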

Role hierarchy sketch:

{{code language="none"}}
USER
├── TECHNICAL_USER
│   ├── FEDERATION_ADMIN
│   └── AKEL_AGENT (optional future)

READER
└── CONTRIBUTING_USER
    ├── TRUSTED_CONTRIBUTOR
    ├── REVIEWER
    ├── EXPERT
    └── MODERATOR

ADMIN

FEDERATION_ADMIN (administrative, but human)
{{/code}}

== 1. Overall analysis & review of the data model ==

=== 1.1 Strengths of the current design ===

* (((
Identity vs. version pattern
Pairing base entities with version entities (CLAIM + CLAIM_VERSION, SCENARIO + SCENARIO_VERSION, etc.) is a well-established pattern for handling:

* auditability
* time evolution
* re-evaluation triggers
* federation and partial replication
)))
* (((
Scenario-centric reasoning
Separating //Claim// (what people argue about) from //Scenario// (interpretive frame) aligns closely with “truth landscape” style systems:

* Scenarios explain //why people disagree//.
* Verdicts are tied to specific scenario versions, which avoids mixing incompatible assumptions.
)))
* Evidence and verdicts as first-class entities
Evidence is explicit and linked to scenarios, and verdicts are per scenario. This matches good practice from fact-checking, scientific assessment panels, and trust graphs.
* (((
Cluster level (CLAIM_CLUSTER)
Grouping related claims avoids duplication and makes it possible to:

* reuse scenarios across paraphrases
* share embeddings / semantic search
* keep the system scalable as the corpus grows.
)))
* (((
Explicit review layer (REVIEW_ACTION, roles, etc.)
Separating “data” from “who reviewed what” keeps the model clean and supports:

* governance
* permissions
* audit trails
* future trust scoring per user / role.
)))
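
To illustrate the cluster level, here is a sketch of assigning a new claim to the nearest existing {{code}}CLAIM_CLUSTER{{/code}} by cosine similarity of embeddings. The document does not specify how clusters are formed; the centroid-per-cluster representation, the cosine metric, and the 0.8 threshold are all assumptions. In the ERD, such vectors could be resolved via {{code}}EmbeddingVectorRef{{/code}}.

{{code language="python"}}
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_cluster(embedding, centroids, threshold=0.8):
    """Return the id of the most similar cluster, or None if none reaches threshold.

    None signals that the caller should create a new CLAIM_CLUSTER.
    """
    best_id, best_sim = None, threshold
    for cluster_id, centroid in centroids.items():
        sim = cosine(embedding, centroid)
        if sim >= best_sim:
            best_id, best_sim = cluster_id, sim
    return best_id
{{/code}}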

----

=== 1.2 Design decisions I’m locking in (based on our discussions) ===

To make the model consistent and “state-of-the-art”, I will assume the following as //current intended design//:

1. (((
Claims vs Scenarios

* CLAIM is the stable identity for “what people argue about”.
* CLAIM_VERSION entries hold individual phrasings, formulations, and metadata.
* (((
SCENARIO belongs to a CLAIM, not to a specific CLAIM_VERSION.
Rationale:

* Many different phrasings share the //same// scenario.
* You avoid duplicating scenarios per wording.
)))
* SCENARIO_VERSION holds detailed definitions, assumptions, boundaries, etc.
)))
1. (((
Version-specific reasoning

* Verdicts are always attached to SCENARIO_VERSION (not base SCENARIO).
* Evidence links are between SCENARIO_VERSION and EVIDENCE_VERSION.
→ This is what we agreed when we said //“SCENARIO_EVIDENCE_LINK should link the respective versions instead”//.
)))
1. (((
Clusters

* CLAIM_CLUSTER groups Claims (semantically close claims).
* It is visible in both diagrams (Core Data Model and Data Use).
)))
1. (((
Review vs data

* (((
All review happens on versioned entities:

* CLAIM_VERSION
* SCENARIO_VERSION
* EVIDENCE_VERSION
* SCENARIO_EVIDENCE_LINK_VERSION
* VERDICT_VERSION
)))
* REVIEW_ACTION is the generic log of //who// did //what// on //which version//.
)))
1. (((
Users & roles

* USER has an attribute (or a linked entity) that distinguishes technical users from normal accounts.
* We //keep// TECHNICAL_USER as a specialisation of USER (strictly technical accounts).
* All human and technical accounts can hold roles via USER_ROLE_MEMBERSHIP.
* (((
Roles include:

* READER
* CONTRIBUTOR
* TRUSTED_CONTRIBUTOR
* REVIEWER
* MODERATOR
* SYSTEM_ADMIN / MAINTAINER
* FEDERATION_OPERATOR
* FEDERATION_ADMIN

(intended as rows of ROLE in the Data Use model rather than as separate entities).
)))
)))

----

=== 1.3 Gaps / potential problems ===

These are the main issues and missing areas I see:

1. (((
Versioning text in chapter 5 is currently too thin (‘…’ placeholders)

* (((
The spec does not yet spell out in prose:

* the identity vs version pattern, systematically
* how re-evaluation triggers are derived from version changes
* how this aligns with federation (which versions are replicated where).
)))
)))
1. (((
No explicit “provenance granularity” in the model

* (((
EVIDENCE is a single entity. For more advanced use cases, you may later want:

* EVIDENCE_SOURCE (the whole article/report/video)
* EVIDENCE_FRAGMENT (specific paragraph/clip with its own reliability, quote, etc.)
)))
* For now, I’ll keep EVIDENCE/EVIDENCE_VERSION as is, but I’ll mention this as a possible extension.
)))
1. (((
Review target polymorphism

* (((
REVIEW_ACTION can apply to multiple entity types. In the diagram this shows as multiple relationships:

* CLAIM_VERSION → REVIEW_ACTION
* SCENARIO_VERSION → REVIEW_ACTION
* etc.
)))
* A more normalized relational model would use a generic “subjectType + subjectId” pair or an intermediate “REVIEW_TARGET” table. REVIEW_ACTION already carries TargetEntityType and TargetEntityVersionID for this purpose.
* For readability, I’ll keep the simpler multi-edge representation and mention the polymorphism in text.
)))
1. (((
Federation details missing from core ERD

* The core ERD has no explicit REPLICATION_LOG, and FEDERATION_NODE appears only in the Data Use ERD.
* This is acceptable for a “core logical data model”, but I’ll add a short note that federation metadata is handled in the Federation chapter and via additional entities.
)))
1. (((
Automation / AKEL artifacts left implicit

* (((
The Data Model chapter currently doesn’t describe:

* AKEL task queues
* extraction runs
* model versions
)))
* That’s fine for now; I’ll just clarify that those belong to a “Processing / AKEL” submodel, not the core logical data model.
)))
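
For the review target polymorphism point, a sketch of the generic “subjectType + subjectId” approach, using the {{code}}TargetEntityType{{/code}} and {{code}}TargetEntityVersionID{{/code}} fields that {{code}}REVIEW_ACTION{{/code}} carries in the Data Use ERD. Because a polymorphic reference cannot be a plain foreign key, the check that the target exists moves into application code. The {{code}}existing{{/code}} index and {{code}}record_review{{/code}} are hypothetical.

{{code language="python"}}
from dataclasses import dataclass
from datetime import datetime, timezone

# Version entity types that may be reviewed (the "Review vs data" list in 1.2).
REVIEWABLE_TYPES = frozenset({
    "CLAIM_VERSION",
    "SCENARIO_VERSION",
    "EVIDENCE_VERSION",
    "SCENARIO_EVIDENCE_LINK_VERSION",
    "VERDICT_VERSION",
})


@dataclass(frozen=True)
class ReviewAction:
    review_action_id: str
    user_id: str
    target_entity_type: str
    target_entity_version_id: str
    action_type: str
    comment: str
    timestamp: datetime


def record_review(existing, review_action_id, user_id, target_type,
                  target_version_id, action_type, comment=""):
    """Create a REVIEW_ACTION after checking its polymorphic target.

    `existing` maps an entity type to the set of version ids that exist for it.
    """
    if target_type not in REVIEWABLE_TYPES:
        raise ValueError(f"not a reviewable version type: {target_type}")
    if target_version_id not in existing.get(target_type, set()):
        raise LookupError(f"{target_type} {target_version_id} does not exist")
    return ReviewAction(review_action_id, user_id, target_type,
                        target_version_id, action_type, comment,
                        datetime.now(timezone.utc))
{{/code}}

The trade-off noted above remains: the database alone can no longer guarantee that a review target exists, so a check like this has to run on every write path.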
