Data Model

Version 1.1 by Robert Schaub on 2025/12/16 21:42

This page describes the current data model for FactHarbor v0.9.1.

1. Versioning Strategy

Every entity in FactHarbor has a full immutable version history. This ensures:

Complete auditability
Ability to reconstruct historical state
Federation-compatible lineage tracking
Transparent evolution of claims, scenarios, and verdicts

1.1 Core Versioning Principles

Immutability:

Each version is stored independently
Versions cannot be deleted, only superseded
Historical versions remain accessible

Lineage:

Each version links to its parent via `ParentVersionID`
Versions form a directed acyclic graph (DAG) of changes
Supports branching in federated environments

Provenance:

Every version is timestamped (`CreatedAt`)
Author type recorded (`AuthorType`: Human, AI, ExternalNode)
Justification captured (`JustificationText`)
Digital signatures for integrity (`SignatureHash` in Release 1.0)

Federation Support:

Versions can originate from remote nodes
Conflict detection via lineage comparison
Parallel version trees for branching scenarios
Cross-node version synchronization

1.2 Common Version Fields

All versioned entities include:

VersionID: Unique identifier for this specific version
ParentVersionID: Link to previous version (null for first version)
CreatedAt: Timestamp (ISO 8601, UTC)
AuthorType: Human | AI | ExternalNode
CreatedBy: Foreign key to User or TechnicalUser
JustificationText: Brief explanation of changes
PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
ReviewStatus: Workflow state (draft|in_review|approved|rejected)
NodeOrigin: Node ID where version was created (for federation)
SignatureHash: Cryptographic signature (Release 1.0)
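
As a minimal sketch of how these common fields could be carried on a version record, assuming plain Python dataclasses rather than any particular storage layer: the `VersionRecord` class, the `new_version` helper, the UUID-based `VersionID`, and the example node and author IDs are all illustrative names, not part of FactHarbor. `SignatureHash` is left out because it is planned for Release 1.0.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a version is never modified after creation
class VersionRecord:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: str                   # ISO 8601, UTC
    author_type: str                  # "Human" | "AI" | "ExternalNode"
    created_by: str
    justification_text: str
    publication_mode: str
    review_status: str
    node_origin: str


def new_version(parent, author_type, created_by, justification, node_origin):
    """Create a version record; pass parent=None for the first version."""
    if author_type not in {"Human", "AI", "ExternalNode"}:
        raise ValueError(f"unknown AuthorType: {author_type}")
    return VersionRecord(
        version_id=str(uuid.uuid4()),  # assumption: UUIDs as version IDs
        parent_version_id=parent.version_id if parent else None,
        created_at=datetime.now(timezone.utc).isoformat(),
        author_type=author_type,
        created_by=created_by,
        justification_text=justification,
        publication_mode="Mode1",      # assumption: new versions start as drafts
        review_status="draft",
        node_origin=node_origin,
    )


first = new_version(None, "AI", "akel-1", "Initial extraction", "node-a")
second = new_version(first, "Human", "user-7", "Clarified wording", "node-a")
```

Here `second.parent_version_id` equals `first.version_id`, which is the link that the lineage graph is built from.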

1.3 Versioning Architecture Diagram

UCM Configuration Versioning Architecture


graph LR
    ADMIN[UCM Administrator] -->|creates| BLOB[Config Blob - immutable]
    BLOB -->|content-addressed| STORE[(config_blobs)]
    ADMIN -->|activates| ACTIVE[config_active]
    ACTIVE -->|points to| BLOB
    JOB[Analysis Job] -->|snapshots at start| USAGE[config_usage]
    USAGE -->|references| BLOB
    REPORT[Analysis Report] -->|cites| USAGE

How UCM Config Versioning Works

Concept	Description
config_blobs	Immutable, content-addressed config versions. Each change creates a new blob; old blobs are never deleted.
config_active	Pointer to the currently active config blob per config type. Changing this activates a new config version.
config_usage	Links each analysis job to the exact config snapshot used. Enables reproducibility.
Immutability	Analysis outputs are never edited. To improve results, update UCM config and re-analyse.
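
The three tables above can be illustrated with a small in-memory SQLite sketch. The table names come from this page; the column names, the `put_blob`, `activate`, and `snapshot_for_job` helpers, the `maxSources` key, and the choice of SHA-256 over canonical JSON as the content address are assumptions made for illustration only.

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE config_blobs  (hash TEXT PRIMARY KEY, content TEXT NOT NULL);
    CREATE TABLE config_active (config_type TEXT PRIMARY KEY, hash TEXT NOT NULL);
    CREATE TABLE config_usage  (job_id TEXT, config_type TEXT, hash TEXT NOT NULL);
""")


def put_blob(config: dict) -> str:
    """Store a config under the hash of its canonical JSON; identical content is stored once."""
    content = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    db.execute("INSERT OR IGNORE INTO config_blobs VALUES (?, ?)", (digest, content))
    return digest


def activate(config_type: str, digest: str) -> None:
    """Move the active pointer; existing blobs are left untouched."""
    db.execute("INSERT OR REPLACE INTO config_active VALUES (?, ?)", (config_type, digest))


def snapshot_for_job(job_id: str, config_type: str) -> str:
    """Record which blob was active when the job started."""
    (digest,) = db.execute(
        "SELECT hash FROM config_active WHERE config_type = ?", (config_type,)
    ).fetchone()
    db.execute("INSERT INTO config_usage VALUES (?, ?, ?)", (job_id, config_type, digest))
    return digest


h1 = put_blob({"maxSources": 5})   # hypothetical config key
activate("pipeline", h1)
used = snapshot_for_job("job-1", "pipeline")
h2 = put_blob({"maxSources": 8})
activate("pipeline", h2)           # job-1 still references h1
```

After the second activation, the `config_usage` row for `job-1` still points at the first blob, which is what makes re-running a report against its original configuration possible.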

Current Implementation (v2.10.2)

Feature	Status
UCM config storage	Implemented (config.db SQLite)
Config hot-reload	Implemented (60s TTL)
Per-job config snapshots	Implemented (job_config_snapshots)
Content-addressed blobs	Implemented (hash-based deduplication)
Config activation tracking	Implemented (config_active table)
Admin UI for config management	Not yet implemented (CLI/direct DB)
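
One plausible reading of "hot-reload (60s TTL)" is a read-through cache that reloads the active config once the cached copy is older than 60 seconds. The sketch below assumes that mechanism; `TtlConfigCache`, `load_from_db`, and the injectable clock are hypothetical names, and the real implementation may differ.

```python
import time
from typing import Callable, Optional


class TtlConfigCache:
    """Cache a loaded config and reload it once the TTL has elapsed."""

    def __init__(self, load: Callable[[], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> dict:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._load()
            self._loaded_at = now
        return self._value


# Usage with a fake clock so the expiry can be observed without waiting.
fake_now = [0.0]
loads = []


def load_from_db() -> dict:
    loads.append(fake_now[0])
    return {"loadedAt": fake_now[0]}


cache = TtlConfigCache(load_from_db, ttl_seconds=60.0, clock=lambda: fake_now[0])
first = cache.get()    # loads at t=0
fake_now[0] = 59.0
second = cache.get()   # still within the TTL, served from cache
fake_now[0] = 60.0
third = cache.get()    # TTL elapsed, reloads
```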

Design Principles

Every config change creates a new immutable blob — no in-place mutation
Every analysis job records the config snapshot used at time of execution
Reports can be reproduced by re-running with the same config snapshot
Config history is the audit trail — who changed what, when, and why
Analysis data is never edited — "improve the system, not the data"

2. Core Entity Definitions

2.1 User Entities

USER (base user table):

``UserID`` (PK)
``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
``DisplayName``
``Email`` (for Contributors and above)
``RegisteredAt``
``LastActive``
``Status`` (active|suspended|banned)

TECHNICAL_USER (system processes):

``SystemID`` (PK)
``SystemName``
``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
``CreatedBy`` (FK to Maintainer who created this system user)
``CreatedAt``
``Status`` (active|paused|deprecated)
``ApiKey`` (encrypted)
``Permissions`` (JSON - authorized operations)

Examples of Technical Users:

AKEL instances (AI processing)
Federation sync bots
Scheduled audit tasks
Backup services
Monitoring systems
External API integrations

2.2 Content Entities

The system relies on the following versioned core entities:

CLAIM_CLUSTER:

``ClusterID`` (PK)
``EmbeddingVectorRef``
``Theme``
Groups related claims into topical clusters
One Cluster has many Claims
A Claim belongs to exactly one primary cluster

CLAIM / CLAIM_VERSION:

``CLAIM`` is the long-lived anchor for a real-world claim
``CLAIM_VERSION`` is an immutable snapshot that includes:
``VersionID`` (PK)
``ClaimID`` (FK to CLAIM)
``ParentVersionID`` (FK to prior version, nullable)
``Text``
``Domain``
``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
``Evaluability`` (empirical|subjective|non-falsifiable)
``RiskTier`` (A|B|C) - replaced SafetyCategory for consistency
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|merged)

SCENARIO / SCENARIO_VERSION:

``SCENARIO`` is the anchor for a scenario across time
``SCENARIO_VERSION`` is an immutable snapshot:
``VersionID`` (PK)
``ScenarioID`` (FK to SCENARIO)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``Definitions`` (JSON)
``Boundaries`` (JSON)
``Assumptions`` (JSON)
``Context`` (text)
``EvaluationMethod`` (text)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|rejected)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded|deprecated)

Note: SafetyClass was removed from Scenario; the risk tier is set at the claim level.

EVIDENCE / EVIDENCE_VERSION:

``EVIDENCE`` is the anchor
``EVIDENCE_VERSION`` is the versioned snapshot:
``VersionID`` (PK)
``EvidenceID`` (FK to EVIDENCE)
``ParentVersionID``
``Type`` (paper|dataset|report|transcript|expert|media)
``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
``Reliability`` (low|medium|high)
``Provenance`` (URL, DOI, source metadata)
``ExtractionMethod`` (manual|OCR|API|AKEL)
``ContentHash`` (SHA256 of evidence content)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|verified|disputed|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (active|superseded)

VERDICT / VERDICT_VERSION:

``VERDICT`` is the anchor
``VERDICT_VERSION`` is the snapshot:
``VersionID`` (PK)
``VerdictID`` (FK to VERDICT)
``ParentVersionID``
``ClaimID`` (FK to CLAIM)
``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
``LikelihoodRange`` (0–1, with uncertainty bounds)
``ExplanationChain`` (JSON)
``UncertaintyFactors`` (JSON)
``PublicationMode`` (Mode1|Mode2|Mode3)
``ReviewStatus`` (draft|in_review|approved|retracted)
``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
``NodeOrigin``, ``SignatureHash``
``Status`` (current|outdated|superseded|retracted)
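
To make the version pinning concrete, here is a hedged sketch of a verdict version that references one specific scenario version and a fixed set of evidence versions. The three-value shape of `LikelihoodRange` (lower, estimate, upper) is an assumption, since this page only states "0–1, with uncertainty bounds"; the class names, `to_json`, and the example IDs are illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LikelihoodRange:
    lower: float
    estimate: float
    upper: float

    def __post_init__(self) -> None:
        # All three values must lie in [0, 1] and be ordered.
        if not (0.0 <= self.lower <= self.estimate <= self.upper <= 1.0):
            raise ValueError("expected 0 <= lower <= estimate <= upper <= 1")


@dataclass(frozen=True)
class VerdictVersion:
    version_id: str
    verdict_id: str
    claim_id: str
    scenario_version_id: str               # a specific SCENARIO_VERSION
    evidence_version_set: tuple[str, ...]  # Evidence VersionIDs used
    likelihood: LikelihoodRange

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


vv = VerdictVersion("vv-2", "verdict-1", "claim-9", "sv-4",
                    ("ev-11", "ev-12"), LikelihoodRange(0.55, 0.7, 0.8))
encoded = json.loads(vv.to_json())
```

Because the verdict stores `ScenarioVersionID` and evidence version IDs rather than anchor IDs, a later scenario or evidence revision does not silently change what an existing verdict refers to.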

3. Many-to-Many Linking Tables

ScenarioEvidenceLink:

Links scenario versions to evidence versions with relevance scoring
``ScenarioID``, ``ScenarioVersionID``
``EvidenceID``, ``EvidenceVersionID``
``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
``LinkJustification`` - Brief explanation of relevance

Purpose:

Evidence can be used by multiple scenarios
Scenarios can draw from multiple pieces of evidence
Relevance scoring helps prioritize evidence
Version-specific linking preserves historical accuracy
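
A minimal sketch of how relevance scoring could be used to prioritize evidence for one scenario version, assuming the links are held in memory. The `evidence_for` helper, the `min_score` parameter, and the example IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_version_id: str
    evidence_version_id: str
    relevance_score: float   # 0 to 1
    link_justification: str


def evidence_for(links, scenario_version_id, min_score=0.0):
    """Return evidence version IDs for one scenario version, most relevant first."""
    matching = [
        link for link in links
        if link.scenario_version_id == scenario_version_id
        and link.relevance_score >= min_score
    ]
    matching.sort(key=lambda link: link.relevance_score, reverse=True)
    return [link.evidence_version_id for link in matching]


links = [
    ScenarioEvidenceLink("sv-1", "ev-1", 0.4, "Background context"),
    ScenarioEvidenceLink("sv-1", "ev-2", 0.9, "Direct measurement"),
    ScenarioEvidenceLink("sv-2", "ev-2", 0.6, "Same dataset, different boundary"),
]
ranked = evidence_for(links, "sv-1")
```

In this example the same evidence version `ev-2` is linked to two scenario versions with different scores, which is the many-to-many case the table exists for.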

ClaimCluster:

Semantic clustering of similar claims
``ClusterID`` (PK)
``EmbeddingVector`` - Vector representation for semantic search
``MemberList`` - List of ClaimIDs in this cluster
``Theme`` - Human-readable theme description

4. Key Changes in v0.9.1

Updated Field Names:

`SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
`SafetyClass` removed from Scenario (redundant with claim-level RiskTier)

Added Fields to All Version Entities:

`PublicationMode` - Track Mode 1/2/3 status
`ReviewStatus` - Track workflow state
`NodeOrigin` - Federation provenance
`CreatedBy` - FK to User/TechnicalUser (clarified)

New Entity:

`TECHNICAL_USER` - Separate system processes from human users

Clarifications:

`ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
`ContentHash` in Evidence - SHA256 for integrity checking

5. Data Model Behavior

5.1 Late-Arriving Evidence

When new evidence versions appear:

1. Existing verdicts marked as outdated
2. Scenario relevance must be re-evaluated
3. Re-evaluation engine triggers verdict recomputation
4. New verdict versions created
5. Users notified of updates
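
The sequence above can be sketched as follows. This is an illustration under stated assumptions, not the re-evaluation engine itself: it assumes a verdict is affected when it cites an earlier version of the same evidence, it reduces relevance re-evaluation and recomputation to swapping in the new evidence version, and the `Verdict` class, `on_new_evidence_version`, and the `+1` suffix for successor IDs are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    version_id: str
    parent_version_id: Optional[str]
    evidence_version_ids: frozenset
    status: str = "current"


def on_new_evidence_version(store, old_ev_id, new_ev_id, notify):
    """Mark verdicts citing old_ev_id as outdated and create successor versions."""
    created = []
    for verdict in list(store.values()):
        if verdict.status != "current" or old_ev_id not in verdict.evidence_version_ids:
            continue
        # Step 1: mark the existing verdict as outdated (it is kept, not deleted).
        store[verdict.version_id] = replace(verdict, status="outdated")
        # Steps 2-4: re-evaluation and recomputation are stubbed out here; the
        # successor simply swaps the old evidence version for the new one.
        successor = Verdict(
            version_id=f"{verdict.version_id}+1",
            parent_version_id=verdict.version_id,
            evidence_version_ids=(verdict.evidence_version_ids - {old_ev_id}) | {new_ev_id},
        )
        store[successor.version_id] = successor
        created.append(successor.version_id)
    # Step 5: notify users about each new verdict version.
    for version_id in created:
        notify(version_id)
    return created


store = {
    "vv-1": Verdict("vv-1", None, frozenset({"ev-1a", "ev-2a"})),
    "vv-2": Verdict("vv-2", None, frozenset({"ev-3a"})),
}
notified = []
created = on_new_evidence_version(store, "ev-1a", "ev-1b", notified.append)
```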

5.2 Scenario Evolution

When a scenario's assumptions or definitions change:

Creates new scenario version (not in-place update)
All dependent verdicts must be recalculated
Previous scenario versions remain accessible
Version lineage preserved
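
A hedged sketch of this behavior: revising a scenario appends a new version linked to its parent, keeps the previous version, and returns the verdicts that were pinned to the previous version and therefore need recalculation. `ScenarioVersion`, `revise_scenario`, the `verdict_pins` mapping, and the example assumptions are illustrative names and data.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScenarioVersion:
    version_id: str
    scenario_id: str
    parent_version_id: Optional[str]
    assumptions: tuple
    status: str = "active"


def revise_scenario(history, verdict_pins, new_version_id, assumptions):
    """Append a new scenario version and list verdicts that need recalculation.

    history: versions of one scenario, oldest first.
    verdict_pins: verdict version ID -> ScenarioVersionID it was computed against.
    """
    previous = history[-1]
    history[-1] = replace(previous, status="superseded")  # kept, not deleted
    history.append(ScenarioVersion(
        version_id=new_version_id,
        scenario_id=previous.scenario_id,
        parent_version_id=previous.version_id,
        assumptions=tuple(assumptions),
    ))
    return sorted(
        verdict_id for verdict_id, pinned in verdict_pins.items()
        if pinned == previous.version_id
    )


history = [ScenarioVersion("sv-1", "s-1", None, ("baseline year 2000",))]
stale = revise_scenario(history, {"vv-1": "sv-1", "vv-2": "sv-0"},
                        "sv-2", ["baseline year 1990"])
```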

5.3 Federated Nodes

Each node may share partial data:

Claims and scenarios shared if relevant
Evidence metadata shared, not always full files
Version synchronization via NodeOrigin tracking
Branching allowed for divergent interpretations
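
Conflict detection via lineage comparison can be illustrated by checking whether one head version is an ancestor of the other. The sketch assumes each node knows the `ParentVersionID` of every version involved; `ancestors`, `compare_heads`, and the result labels are hypothetical names.

```python
from typing import Optional


def ancestors(parents: dict, version_id: str) -> list:
    """Follow ParentVersionID links from a version back to the first version."""
    chain = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def compare_heads(parents: dict, local_head: str, remote_head: str) -> str:
    """Classify two heads as 'same', 'remote_ahead', 'local_ahead', or 'diverged'."""
    if local_head == remote_head:
        return "same"
    if local_head in ancestors(parents, remote_head):
        return "remote_ahead"   # remote extends local; safe to adopt
    if remote_head in ancestors(parents, local_head):
        return "local_ahead"    # local already contains the remote version
    return "diverged"           # parallel branches; keep both


# v3a was created locally and v3b on a remote node, both from v2.
parents = {"v1": None, "v2": "v1", "v3a": "v2", "v3b": "v2"}
result = compare_heads(parents, "v3a", "v3b")
```

In the diverged case neither version supersedes the other, so both branches are retained as parallel version trees.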

6. Visual Diagrams

The following diagrams provide visual representations of the data model structure and relationships.

6.1 Core Data Model ERD

This diagram shows the data model as currently implemented. Storage is JSON blobs in SQLite. AnalysisContexts (bounded analytical frames) and KeyFactors (decomposition questions with contestation tracking) are embedded in the result JSON rather than stored as separate entities. No user system is implemented.

Updated 2026-02-08 per documentation audit report.

Current Implementation Data Model


erDiagram
    ARTICLE ||--o{ CLAIM : contains
    ARTICLE ||--|| ARTICLE_VERDICT : has
    CLAIM ||--|| CLAIM_VERDICT : has
    CLAIM ||--o{ CLAIM : depends_on
    CLAIM_VERDICT }o--o{ EVIDENCE_ITEM : supported_by
    SOURCE ||--o{ EVIDENCE_ITEM : provides
    ARTICLE ||--o{ ANALYSIS_CONTEXT : has

    ARTICLE {
        string id_PK
        string inputType
        string inputValue
        string articleThesis
        string detectedInputType
        boolean requiresSeparateAnalysis
        json analysisContexts
        string schemaVersion
    }

    CLAIM {
        string id_PK
        string articleId_FK
        string text
        string type
        string claimRole
        string_array dependsOn
        string keyFactorId
        boolean isCentral
        string contextId
    }

    CLAIM_VERDICT {
        string claimId_FK
        number verdict
        number truthPercentage
        number confidence
        string reasoning
        string_array supportingEvidenceIds
        string ratingConfirmation
        boolean isContested
        string contestedBy
        string factualBasis
    }

    ARTICLE_VERDICT {
        string articleId_FK
        string verdict
        int truthPercentage
        int confidence
        string summary
    }

    EVIDENCE_ITEM {
        string id_PK
        string sourceId_FK
        string statement
        string sourceExcerpt
        string category
        string claimDirection
        string contextId
        string sourceAuthority
        string probativeValue
        string evidenceBasis
        number extractionConfidence
    }

    SOURCE {
        string id_PK
        string url
        string title
        float trackRecordScore
        float trackRecordConfidence
        boolean trackRecordConsensus
        string category
        boolean fetchSuccess
    }

    ANALYSIS_CONTEXT {
        string id_PK
        string name
        string shortName
        string subject
        string temporal
        string status
        string outcome
        string assessedStatement
        json metadata
    }

Key Implementation Notes

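7-Point Verdict Scale (seven percentage bands; the 43-57% band is reported as MIXED or UNVERIFIED depending on confidence):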

TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)

ratingConfirmation (v2.8.4): LLM-provided verdict direction confirmation ("claim_supported" | "claim_refuted" | "mixed"). Used for direction mismatch validation.
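
This page does not spell out the mismatch rule, so the following is only one possible reading: a mismatch is flagged when the confirmed direction and the truth percentage fall on opposite sides of the 43-57% middle band. The `direction_mismatch` function and that threshold choice are assumptions.

```python
def direction_mismatch(rating_confirmation: str, truth_percentage: int) -> bool:
    """Return True when the LLM-confirmed direction contradicts the percentage."""
    if rating_confirmation == "claim_supported":
        return truth_percentage < 43   # percentage landed on the false side
    if rating_confirmation == "claim_refuted":
        return truth_percentage > 57   # percentage landed on the true side
    if rating_confirmation == "mixed":
        return False                   # assumption: "mixed" never conflicts
    raise ValueError(f"unknown ratingConfirmation: {rating_confirmation}")


flagged = direction_mismatch("claim_supported", 20)
```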

KeyFactors: Optional decomposition questions discovered during analysis - not stored as separate entities.

Storage: All data stored as JSON blob in SQLite ResultJson field.
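
As a rough illustration of the JSON-blob storage approach, the sketch below writes one result document into a `ResultJson` column and reads a claim verdict back out. Only the `ResultJson` field name and the entity field names come from this page; the `Jobs` table, the `JobId` column, the top-level keys `claims` and `claimVerdicts`, and all values are made up for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table; only the ResultJson column name is taken from this page.
db.execute("CREATE TABLE Jobs (JobId TEXT PRIMARY KEY, ResultJson TEXT NOT NULL)")

result = {
    "claims": [
        {"id": "c1", "text": "Example claim", "isCentral": True},
        {"id": "c2", "text": "Supporting claim", "isCentral": False},
    ],
    "claimVerdicts": [
        {"claimId": "c1", "truthPercentage": 78, "confidence": 70},
        {"claimId": "c2", "truthPercentage": 35, "confidence": 55},
    ],
}
db.execute("INSERT INTO Jobs VALUES (?, ?)", ("job-1", json.dumps(result)))

# Reading back: the whole document is parsed, then navigated in application code.
(raw,) = db.execute("SELECT ResultJson FROM Jobs WHERE JobId = ?", ("job-1",)).fetchone()
loaded = json.loads(raw)
central_ids = {c["id"] for c in loaded["claims"] if c["isCentral"]}
central_verdicts = [v for v in loaded["claimVerdicts"] if v["claimId"] in central_ids]
```

The trade-off of this layout is that claims, verdicts, and evidence cannot be queried relationally without parsing each document.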

See Also: Target Data Model for normalized design.

6.2 User Roles Structure

User Class Diagram


classDiagram
    class BaseUser {
        +view_results()
        +browse()
        +search()
    }
    class Reader {
        <>
        +browse()
        +search()
        +view_results()
    }
    class RegisteredUser {
        +UUID id
        +String username
        +Role role
        +Timestamp created_at
        +submit_url()
        +flag_issue()
        +view_submission_history()
    }
    class UCMAdministrator {
        +manage_config()
        +view_audit_trail()
        +activate_config_version()
        +trigger_reanalysis()
        +view_system_metrics()
    }
    class Moderator {
        +review_flags()
        +hide_content()
        +ban_user()
    }
    BaseUser <|-- Reader : anonymous
    BaseUser <|-- RegisteredUser : logged in
    RegisteredUser <|-- UCMAdministrator : appointed
    RegisteredUser <|-- Moderator : appointed

Role Permissions

Role	Capabilities	Requirements
Reader (Guest)	Browse, search, view results	No login required
User (Registered)	Everything Reader can + submit URLs/text (rate-limited), flag content	Free account required
UCM Administrator	Everything User can + manage UCM config, view audit trail, trigger re-analysis	Appointed by Governing Team
Moderator	Everything User can + review flags, hide content, ban users	Appointed by Governing Team
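
The cumulative structure of the table ("Everything Reader can +", "Everything User can +") can be expressed as set unions. This is a sketch only; the action names, role keys, and `can` helper are hypothetical identifiers derived from the table wording.

```python
READER = frozenset({"browse", "search", "view_results"})
USER = READER | {"submit", "flag_content"}
UCM_ADMINISTRATOR = USER | {"manage_config", "view_audit_trail", "trigger_reanalysis"}
MODERATOR = USER | {"review_flags", "hide_content", "ban_users"}

ROLES = {
    "reader": READER,
    "user": USER,
    "ucm_administrator": UCM_ADMINISTRATOR,
    "moderator": MODERATOR,
}


def can(role: str, action: str) -> bool:
    """Return True when the role's capability set includes the action."""
    return action in ROLES[role]


guest_can_submit = can("reader", "submit")
moderator_can_submit = can("moderator", "submit")
```

Note that UCM Administrator and Moderator both extend User but not each other, so a moderator cannot manage config and an administrator cannot ban users.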

Current Implementation

All users are anonymous Readers (no authentication system yet)
UCM config management via CLI/direct DB access
No moderator tooling
No rate limiting (single-user development mode)

Design Principles

No data editing roles — analysis outputs are immutable
UCM Administrator improves the system through configuration, not by editing individual outputs
Submission requires login — LLM inference and web search are not free; rate limits control costs
Four roles: Reader (guest), User (registered), UCM Administrator (appointed), Moderator (appointed)

6.3 Content Workflow

Current Implementation (v2.10.2) — The pipeline uses AnalysisContexts (bounded analytical frames) and KeyFactors (decomposition questions with contestation tracking), discovered during the understanding phase.

Claim Analysis Workflow


graph TB
    Start[User Submission]

    subgraph Step1[Step 1 Understand]
        Extract{understandClaim LLM Analysis}
        Gate1{Gate 1 Claim Validation}
        DetectType[Detect Input Type]
        DetectContexts[Detect Contexts]
        KeyFactors[Discover KeyFactors]
    end

    subgraph Step2[Step 2 Research]
        Decide[decideNextResearch]
        Search[Web Search]
        Fetch[Fetch Sources]
        Facts[extractEvidence]
    end

    subgraph Step3[Step 3 Verdict]
        Verdict[generateVerdicts]
        Gate4{Gate 4 Confidence Check}
    end

    subgraph Output[Output]
        Publish[Publish Result]
        LowConf[Low Confidence Flag]
    end

    Start --> Extract
    Extract --> Gate1
    Gate1 -->|Pass Factual| DetectType
    Gate1 -->|Fail Opinion| Exclude[Exclude from analysis]
    DetectType --> DetectContexts
    DetectContexts --> KeyFactors
    KeyFactors --> Decide
    Decide --> Search
    Search --> Fetch
    Fetch --> Facts
    Facts -->|More research needed| Decide
    Facts -->|Sufficient evidence| Verdict
    Verdict --> Gate4
    Gate4 -->|High or Medium confidence| Publish
    Gate4 -->|Low or Insufficient| LowConf

Quality Gates (Implemented)

Gate	Name	Purpose	Pass Criteria
Gate 1	Claim Validation	Filter non-factual claims	Factual, opinion score 0.3 or less, specificity 0.3 or more
Gate 4	Verdict Confidence	Ensure sufficient evidence	2 or more sources, avg quality 0.6 or more, agreement 60% or more

Gates 2 (Contradiction Search) and 3 (Uncertainty Quantification) are not yet implemented.
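
The pass criteria in the table translate into two simple predicates. The thresholds are taken from the table; the function names, parameter names, and the representation of agreement as a fraction between 0 and 1 are assumptions.

```python
def gate1_claim_validation(is_factual: bool, opinion_score: float,
                           specificity: float) -> bool:
    """Gate 1: factual, opinion score 0.3 or less, specificity 0.3 or more."""
    return is_factual and opinion_score <= 0.3 and specificity >= 0.3


def gate4_verdict_confidence(source_count: int, avg_quality: float,
                             agreement: float) -> bool:
    """Gate 4: 2 or more sources, average quality 0.6 or more, agreement 60% or more."""
    return source_count >= 2 and avg_quality >= 0.6 and agreement >= 0.60


passes_gate1 = gate1_claim_validation(True, opinion_score=0.1, specificity=0.7)
passes_gate4 = gate4_verdict_confidence(source_count=1, avg_quality=0.9, agreement=1.0)
```

In this example the claim passes Gate 1, but the verdict fails Gate 4 because it rests on a single source.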

KeyFactors (Replaces Scenarios)

KeyFactors are optional decomposition questions discovered during the understanding phase:

Not stored as separate entities
Help break down complex claims into checkable sub-questions
See KeyFactors Design for design rationale

7-Point Verdict Scale

TRUE (86-100%) - Claim is well-supported by evidence
MOSTLY-TRUE (72-85%) - Largely accurate with minor caveats
LEANING-TRUE (58-71%) - More evidence supports than contradicts
MIXED (43-57%, high confidence) - Roughly equal evidence both ways
UNVERIFIED (43-57%, low confidence) - Insufficient evidence to determine
LEANING-FALSE (29-42%) - More evidence contradicts than supports
MOSTLY-FALSE (15-28%) - Largely inaccurate
FALSE (0-14%) - Claim is refuted by evidence
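
The bands above map to labels as in the following sketch. The percentage boundaries come from the list; the confidence cutoff that separates MIXED from UNVERIFIED is not stated on this page, so `confidence_threshold` and its default of 50 are placeholders, as is the function name `verdict_label`.

```python
def verdict_label(truth_percentage: int, confidence: int,
                  confidence_threshold: int = 50) -> str:
    """Map a truth percentage (0-100) and confidence to a verdict label."""
    if not 0 <= truth_percentage <= 100:
        raise ValueError("truth_percentage must be between 0 and 100")
    if truth_percentage >= 86:
        return "TRUE"
    if truth_percentage >= 72:
        return "MOSTLY-TRUE"
    if truth_percentage >= 58:
        return "LEANING-TRUE"
    if truth_percentage >= 43:
        # Middle band: the label depends on confidence (threshold is assumed).
        return "MIXED" if confidence >= confidence_threshold else "UNVERIFIED"
    if truth_percentage >= 29:
        return "LEANING-FALSE"
    if truth_percentage >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"


label = verdict_label(50, confidence=80)
```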
