Data Model

Last modified by Robert Schaub on 2025/12/24 20:30

This page describes the current data model for FactHarbor 0.9.25.

1. Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

Complete auditability
Ability to reconstruct historical state
Federation-compatible lineage tracking
Transparent evolution of claims, scenarios, and verdicts

1.1 Core Versioning Principles

Immutability:

Each version is stored independently
Versions cannot be deleted, only superseded
Historical versions remain accessible

Lineage:

Each version links to its parent via `ParentVersionID`
Forms a directed acyclic graph (DAG) of changes
Supports branching in federated environments

Provenance:

Every version is timestamped (`CreatedAt`)
Author type recorded (`AuthorType`: Human, AI, ExternalNode)
Justification captured (`JustificationText`)
Digital signatures for integrity (`SignatureHash` in Release 1.0)

Federation Support:

Versions can originate from remote nodes
Conflict detection via lineage comparison
Parallel version trees for branching scenarios
Cross-node version synchronization
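
The lineage comparison mentioned above can be sketched as follows. This is a minimal illustration, not the FactHarbor implementation: it assumes each version has a single `ParentVersionID`, and the in-memory `Lineage` map and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical in-memory lineage map: VersionID -> ParentVersionID
# (None for the first version of an entity).
Lineage = dict[str, Optional[str]]


def ancestors(lineage: Lineage, version_id: str) -> list[str]:
    """Return version_id followed by its ancestors, newest first."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = lineage[current]
    return chain


def common_ancestor(lineage: Lineage, a: str, b: str) -> Optional[str]:
    """Return the nearest version that both a and b descend from, if any."""
    seen = set(ancestors(lineage, a))
    for version_id in ancestors(lineage, b):
        if version_id in seen:
            return version_id
    return None


def has_diverged(lineage: Lineage, a: str, b: str) -> bool:
    """Two heads conflict when neither one is an ancestor of the other.

    Heads with no shared ancestor at all are also reported as diverged.
    """
    return common_ancestor(lineage, a, b) not in (a, b)
```

With a lineage such as `{"v1": None, "v2": "v1", "v3a": "v2", "v3b": "v2"}`, the heads `v3a` and `v3b` share the ancestor `v2` and are reported as diverged, while `v2` and `v3a` are not, because one simply supersedes the other.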

1.2 Common Version Fields

All versioned entities include:

VersionID: Unique identifier for this specific version
ParentVersionID: Link to previous version (null for first version)
CreatedAt: Timestamp (ISO 8601, UTC)
AuthorType: Human | AI | ExternalNode
CreatedBy: Foreign key to User or TechnicalUser
JustificationText: Brief explanation of changes
PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
ReviewStatus: Workflow state (draft|in_review|approved|rejected)
NodeOrigin: Node ID where version was created (for federation)
SignatureHash: Cryptographic signature (Release 1.0)
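
As an illustration, the common fields could be modelled as an immutable record. The sketch below is one possible shape under stated assumptions: the Python names, the string-typed identifiers, and the UTC check are not part of the specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AuthorType(Enum):
    HUMAN = "Human"
    AI = "AI"
    EXTERNAL_NODE = "ExternalNode"


class PublicationMode(Enum):
    MODE1 = "Mode1"  # draft
    MODE2 = "Mode2"  # AI-published
    MODE3 = "Mode3"  # human-reviewed


class ReviewStatus(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class VersionFields:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: datetime
    author_type: AuthorType
    created_by: str  # assumed to hold a UserID or SystemID
    justification_text: str
    publication_mode: PublicationMode
    review_status: ReviewStatus
    node_origin: str
    signature_hash: Optional[str] = None  # planned for Release 1.0

    def __post_init__(self) -> None:
        # Naive datetimes return None from utcoffset() and are rejected too.
        if self.created_at.utcoffset() != timedelta(0):
            raise ValueError("CreatedAt must be a timezone-aware UTC timestamp")
```

Freezing the dataclass mirrors the immutability principle at the object level: a change is expressed by constructing a new record whose `parent_version_id` points at the old one.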

1.3 Versioning Architecture Diagram

graph LR
 CLAIM[Claim] -->|edited| EDIT[Edit Record]
 EDIT -->|stores| BEFORE[Before State]
 EDIT -->|stores| AFTER[After State]
 EDIT -->|tracks| WHO[Who Changed]
 EDIT -->|tracks| WHEN[When Changed]
 EDIT -->|tracks| WHY[Why Changed]
 EDIT -->|if needed| RESTORE[Manual Restore]
 RESTORE -->|create new| CLAIM
 style EDIT fill:#ffcccc
 style RESTORE fill:#ccffcc

Versioning Architecture: simple audit trail for V1.0. Each change records who, what, when, and why, with before/after values stored in an edits table. A manual restore, if needed, creates a new edit containing the old values. A full versioning system (branching, merging, automatic rollback) is deferred to V2.0+ unless users explicitly request it.
V1.0: Simple edit history is sufficient for accountability and basic rollback.
V2.0+: Add complex versioning if users request "see version history" or "restore previous version" features.
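
The V1.0 audit trail described above can be sketched as an append-only edit log. This is a hedged illustration only: `EditRecord`, `record_edit`, and `manual_restore` are hypothetical names, and the log is a plain Python list standing in for the edits table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EditRecord:
    entity_id: str
    before: dict          # state before the change
    after: dict           # state after the change
    changed_by: str       # who
    changed_at: datetime  # when
    reason: str           # why


def record_edit(log: list, entity_id: str, before: dict, after: dict,
                changed_by: str, reason: str) -> dict:
    """Append an edit record to the log and return the new state."""
    log.append(EditRecord(entity_id, dict(before), dict(after), changed_by,
                          datetime.now(timezone.utc), reason))
    return dict(after)


def manual_restore(log: list, current: dict, edit_index: int,
                   changed_by: str, reason: str) -> dict:
    """Restore the 'before' state of an earlier edit by writing a new edit."""
    target = log[edit_index]
    return record_edit(log, target.entity_id, current, target.before,
                       changed_by, reason)
```

A restore never rewrites history: the log grows by one record whose `after` state equals the old values, so the restore itself stays auditable.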

2. Core Entity Definitions

2.1 User Entities

USER (base user table):

``UserID`` (PK)
``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
``DisplayName``
``Email`` (for Contributors and above)
``RegisteredAt``
``LastActive``
``Status`` (active|suspended|banned)

TECHNICAL_USER (system processes):

``SystemID`` (PK)
``SystemName``
``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
``CreatedBy`` (FK to Maintainer who created this system user)
``CreatedAt``
``Status`` (active|paused|deprecated)
``ApiKey`` (encrypted)
``Permissions`` (JSON - authorized operations)

Examples of Technical Users:

AKEL instances (AI processing)
Federation sync bots
Scheduled audit tasks
Backup services
Monitoring systems
External API integrations
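
A permission check for a technical user might look like the sketch below. The format of `Permissions` is not specified beyond "JSON - authorized operations", so this assumes a JSON array of operation names; the operation names and the `is_authorized` helper are hypothetical.

```python
import json


def is_authorized(technical_user: dict, operation: str) -> bool:
    """Allow an operation only for active technical users that list it.

    Assumes Permissions is a JSON array of operation names.
    """
    if technical_user["Status"] != "active":
        return False
    allowed = json.loads(technical_user["Permissions"])
    return operation in allowed


# Hypothetical AKEL instance record using the TECHNICAL_USER fields.
akel = {
    "SystemID": "sys-001",
    "SystemName": "akel-primary",
    "Purpose": "AKEL",
    "Status": "active",
    "Permissions": json.dumps(["claim.create_version", "verdict.create_version"]),
}
```

Checking `Status` first means a paused or deprecated system user is refused regardless of what its `Permissions` field still lists.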

2.2 Content Entities

The system relies on the following versioned core entities:

CLAIM_CLUSTER:

``ClusterID`` (PK)
``EmbeddingVectorRef``
``Theme``
Groups related claims into topical clusters
One Cluster has many Claims
A Claim belongs to exactly one primary cluster

CLAIM / CLAIM_VERSION:

``CLAIM`` is the long-lived anchor for a real-world claim
``CLAIM_VERSION`` is an immutable snapshot that includes:
``VersionID`` (PK)
``ClaimID`` (FK to CLAIM)
``ParentVersionID`` (FK to prior version, nullable)
``Text``
``Domain``
``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
``Evaluability`` (empirical|subjective|non-falsifiable)
``RiskTier`` (A|B|C) - replaced SafetyCategory for consistency
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|merged)

SCENARIO / SCENARIO_VERSION:

``SCENARIO`` is the anchor for a scenario across time
``SCENARIO_VERSION`` is an immutable snapshot:
``VersionID`` (PK)
``ScenarioID`` (FK to SCENARIO)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``Definitions`` (JSON)
``Boundaries`` (JSON)
``Assumptions`` (JSON)
``Context`` (text)
``EvaluationMethod`` (text)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|deprecated)

Note: SafetyClass removed from Scenario - risk tier is at claim level

EVIDENCE / EVIDENCE_VERSION:

``EVIDENCE`` is the anchor
``EVIDENCE_VERSION`` is the versioned snapshot:
``VersionID`` (PK)
``EvidenceID`` (FK to EVIDENCE)
``ParentVersionID``
``Type`` (paper|dataset|report|transcript|expert|media)
``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
``Reliability`` (low|medium|high)
``Provenance`` (URL, DOI, source metadata)
``ExtractionMethod`` (manual|OCR|API|AKEL)
``ContentHash`` (SHA256 of evidence content)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|verified|disputed|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded)
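
Computing and checking `ContentHash` can be sketched with Python's standard `hashlib`. The helper names are illustrative, and the sketch assumes the hash is stored as a lowercase hex digest of the raw evidence bytes.

```python
import hashlib
import hmac


def content_hash(content: bytes) -> str:
    """Return the SHA-256 hex digest of the evidence content."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_hash: str) -> bool:
    """Check stored content against the ContentHash recorded for its version."""
    return hmac.compare_digest(content_hash(content), expected_hash.lower())
```

Because each evidence version is immutable, a mismatch between stored content and its recorded hash indicates corruption or tampering rather than a legitimate edit.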

VERDICT / VERDICT_VERSION:

``VERDICT`` is the anchor
``VERDICT_VERSION`` is the snapshot:
``VersionID`` (PK)
``VerdictID`` (FK to VERDICT)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
``LikelihoodRange`` (0–1, with uncertainty bounds)
``ExplanationChain`` (JSON)
``UncertaintyFactors`` (JSON)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (current|outdated|superseded|retracted)
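
The `LikelihoodRange` field could be validated with a small value type. The representation below (a lower and an upper bound) is an assumption; the specification only states that the range lies within 0–1 and carries uncertainty bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    lower: float
    upper: float

    def __post_init__(self) -> None:
        # Both bounds must lie in [0, 1] and be correctly ordered.
        if not (0.0 <= self.lower <= self.upper <= 1.0):
            raise ValueError("expected 0 <= lower <= upper <= 1")

    @property
    def width(self) -> float:
        """Width of the range, usable as a rough measure of uncertainty."""
        return self.upper - self.lower
```

For example, `LikelihoodRange(0.40, 0.65)` is accepted, whereas `LikelihoodRange(0.7, 0.6)` raises `ValueError`.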

3. Many-to-Many Linking Tables

ScenarioEvidenceLink:

Links scenario versions to evidence versions with relevance scoring
``ScenarioID``, ``ScenarioVersionID``
``EvidenceID``, ``EvidenceVersionID``
``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
``LinkJustification`` - Brief explanation of relevance

Purpose:

Evidence can be used by multiple scenarios
Scenarios can draw from multiple pieces of evidence
Relevance scoring helps prioritize evidence
Version-specific linking preserves historical accuracy
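
Prioritizing evidence by relevance can be sketched as a filter and sort over link rows. The dataclass mirrors the fields listed above; the `evidence_for` helper and its `min_relevance` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_id: str
    scenario_version_id: str
    evidence_id: str
    evidence_version_id: str
    relevance_score: float  # 0-1
    link_justification: str


def evidence_for(links: list, scenario_version_id: str,
                 min_relevance: float = 0.0) -> list:
    """Return links for one scenario version, most relevant first."""
    matching = [
        link for link in links
        if link.scenario_version_id == scenario_version_id
        and link.relevance_score >= min_relevance
    ]
    return sorted(matching, key=lambda link: link.relevance_score, reverse=True)
```

Filtering on `scenario_version_id` rather than `scenario_id` reflects the version-specific linking: a later scenario version does not silently inherit links made for an earlier one.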

ClaimCluster:

Semantic clustering of similar claims
``ClusterID`` (PK)
``EmbeddingVector`` - Vector representation for semantic search
``MemberList`` - List of ClaimIDs in this cluster
``Theme`` - Human-readable theme description

4. Key Changes Since 0.9.1

Updated Field Names:

`SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
`SafetyClass` removed from Scenario (redundant with claim-level RiskTier)

Added Fields to All Version Entities:

`PublicationMode` - Track Mode 1/2/3 status
`ReviewStatus` - Track workflow state
`NodeOrigin` - Federation provenance
`CreatedBy` - FK to User/TechnicalUser (clarified)

New Entity:

`TECHNICAL_USER` - Separate system processes from human users

Clarifications:

`ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
`ContentHash` in Evidence - SHA256 for integrity checking

5. Data Model Behavior

5.1 Late-Arriving Evidence

When new evidence versions appear:

1. Existing verdicts marked as outdated
2. Scenario relevance must be re-evaluated
3. Re-evaluation engine triggers verdict recomputation
4. New verdict versions created
5. Users notified of updates
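
The first step of this sequence can be sketched as follows. The sketch makes several assumptions: verdict versions are plain dictionaries using the field names from section 2.2, `Status` is treated as mutable metadata, and only the case of a revised evidence version (one with a `ParentVersionID`) is handled. The function name is hypothetical.

```python
def mark_outdated_verdicts(verdicts: list, new_evidence_version: dict) -> list:
    """Mark current verdicts that used the superseded evidence version.

    Returns the VersionIDs of the verdicts that now need recomputation.
    """
    superseded = new_evidence_version["ParentVersionID"]
    affected = []
    for verdict in verdicts:
        uses_old_version = superseded in verdict["EvidenceVersionSet"]
        if verdict["Status"] == "current" and uses_old_version:
            verdict["Status"] = "outdated"
            affected.append(verdict["VersionID"])
    return affected
```

The returned list would feed the re-evaluation engine in step 3. Brand-new evidence with no parent version matches no verdict under this rule and would need a separate relevance check against scenarios (step 2).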

5.2 Scenario Evolution

When a scenario's assumptions or definitions change:

Creates new scenario version (not in-place update)
All dependent verdicts must be recalculated
Previous scenario versions remain accessible
Version lineage preserved
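
A scenario revision that creates a new version instead of updating in place might look like this sketch. Dictionaries stand in for `SCENARIO_VERSION` and `VERDICT_VERSION` rows, and the function names and the use of random UUIDs for `VersionID` are assumptions.

```python
import uuid


def revise_scenario(current: dict, changes: dict, justification: str) -> dict:
    """Build a new scenario version; the current version is left untouched."""
    return {
        **current,
        **changes,
        "VersionID": str(uuid.uuid4()),
        "ParentVersionID": current["VersionID"],  # preserves lineage
        "JustificationText": justification,
    }


def dependent_verdicts(verdicts: list, scenario_version_id: str) -> list:
    """Return verdict versions pinned to the given scenario version."""
    return [v for v in verdicts if v["ScenarioVersionID"] == scenario_version_id]
```

Because verdicts reference a specific `ScenarioVersionID`, the verdicts returned for the old version are exactly those that must be recalculated against the new one.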

5.3 Federated Nodes

Each node may share partial data:

Claims and scenarios shared if relevant
Evidence metadata shared, not always full files
Version synchronization via NodeOrigin tracking
Branching allowed for divergent interpretations

6. Visual Diagrams

The following diagrams provide visual representations of the data model structure and relationships.

6.1 Core Data Model ERD

erDiagram
 USER ||--o{ CLAIM : creates
 CLAIM ||--o{ EVIDENCE : has
 CLAIM ||--o{ SCENARIO : has
 SCENARIO ||--o{ VERDICT : assessed
 EVIDENCE }o--|| SOURCE : from
 USER {
 uuid id PK
 text name
 text email
 text role "reader|contributor|moderator|admin"
 int contributions_count "cached"
 timestamp created_at
 }
 CLAIM {
 uuid id PK
 uuid user_id FK
 text text
 decimal confidence "0-1"
 jsonb evidence_summary "cached: top 5 evidence"
 text_array source_names "cached: for display"
 int scenario_count "cached: count"
 timestamp cache_updated_at "cache freshness"
 timestamp created_at
 timestamp updated_at
 }
 EVIDENCE {
 uuid id PK
 uuid claim_id FK
 uuid source_id FK
 text content
 decimal relevance "0-1"
 text url
 timestamp created_at
 }
 SOURCE {
 uuid id PK
 text name
 text domain
 decimal track_record_score "0-1"
 int total_citations
 timestamp last_updated
 timestamp created_at
 }
 SCENARIO {
 uuid id PK
 uuid claim_id FK "belongs to claim"
 uuid extracted_from "references evidence_id that provided context"
 text description
 jsonb assumptions
 timestamp created_at
 timestamp updated_at
 }
 VERDICT {
 uuid id PK
 uuid scenario_id FK "assessed scenario"
 text likelihood_range "e.g. 0.40-0.65 (uncertain)"
 decimal confidence "0-1"
 text explanation_summary "verdict reasoning"
 text_array uncertainty_factors "factors affecting confidence"
 timestamp created_at
 timestamp updated_at
 }

Core Data Model ERD - Shows primary business entities and their relationships. Claims have Evidence (sources supporting/refuting) and Scenarios (different contexts for evaluation). Each Scenario is assessed by Verdicts (conclusion about the claim in that scenario context). Evidence comes from Sources (with track records). Verdicts track changes through the Edit entity like all other entities. Claims include denormalized cache fields for performance. Most entities created/edited by AKEL automatically. See Audit Trail ERD for edit tracking relationships.

6.2 User Roles Structure

classDiagram
 class User {
 +UUID id
 +String username
 +String email
 +Role role
 +Int reputation
 +Timestamp created_at
 +contribute()
 +flag_issue()
 +earn_reputation()
 }
 class Reader {
 <>
 +browse()
 +search()
 +flag_content()
 }
 class Contributor {
 <>
 +edit_claims()
 +add_evidence()
 +suggest_improvements()
 +requires: reputation sufficient
 }
 class Moderator {
 <>
 +review_flags()
 +hide_content()
 +resolve_disputes()
 +requires: appointed by Governing Team
 }
 User --> Reader : default role
 User --> Contributor : registers + earns reputation
 User --> Moderator : appointed
 note for User "Reputation system unlocks permissions progressively"
 note for Contributor "Reputation sufficient: Full edit access"
 note for Contributor "Reputation sufficient: Can approve changes"

Simplified flat role structure:

Three roles only: Reader (default), Contributor (earned), Moderator (appointed)
Reputation system replaces role hierarchy
Progressive permissions based on reputation, not titles

6.3 Content Workflow

Claim & Scenario Workflow
This diagram shows how Claims are submitted and Scenarios are created and reviewed.

graph TB
 Start[User Submission<br/>Text/URL/Single Claim]
 Extract{Claim Extraction<br/>LLM Analysis}
 ValidateClaims{Validate Claims<br/>Clear & Distinct?}
 Single[Single Claim]
 Multi[Multiple Claims]
 Queue[Parallel Processing]
 
 Process[Process Claim<br/>AKEL Analysis]
 Evidence[Gather Evidence<br/>LLM + Sources]
 Scenarios[Generate Scenarios<br/>LLM Analysis]
 CrossRef[Cross-Reference<br/>Evidence & Scenarios]
 Verdict[Generate Verdict<br/>Confidence + Risk]
 Review{Confidence<br/>Check}
 Publish[Publish Verdict]
 HumanReview[Human Review Queue]
 
 Start --> Extract
 Extract --> ValidateClaims
 ValidateClaims -->|Valid| Single
 ValidateClaims -->|Valid| Multi
 ValidateClaims -->|Invalid| Start
 
 Single --> Process
 Multi --> Queue
 Queue -->|Each Claim| Process
 
 Process --> Evidence
 Process --> Scenarios
 Evidence --> CrossRef
 Scenarios --> CrossRef
 CrossRef --> Verdict
 
 Verdict --> Review
 Review -->|High Confidence| Publish
 Review -->|Low Confidence| HumanReview
 HumanReview --> Publish
 
 style Extract fill:#e1f5ff
 style Queue fill:#fff4e1
 style Process fill:#f0f0f0
 style HumanReview fill:#ffe1e1
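
The confidence check at the end of the workflow can be sketched as a simple routing function. The threshold value of 0.8 and the function name are assumptions for illustration; the page does not define what counts as high confidence.

```python
def route_verdict(confidence: float, threshold: float = 0.8) -> str:
    """Route a generated verdict to publication or to the human review queue.

    The default threshold is a hypothetical value, not taken from the data model.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "publish" if confidence >= threshold else "human_review"
```

Verdicts routed to `human_review` still reach publication in the diagram, but only after passing through the Human Review Queue.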
