Data Model

Version 4.1 by Robert Schaub on 2025/12/12 08:32

This page describes the current data model for FactHarbor.

Core Data Model Refinements

The system relies on the following versioned core entities:

CLAIM_CLUSTER
- ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
- Groups related claims into topical clusters.
- One Cluster has many Claims.
- A Claim belongs to exactly one primary cluster.

CLAIM / CLAIM_VERSION
- ``CLAIM`` is the long‑lived anchor for a real‑world claim.
- ``CLAIM_VERSION`` is an immutable snapshot of wording + basic metadata.
- Note: Verdicts are NEVER attached directly to a Claim. They are attached to Scenarios.
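
The anchor/snapshot split could be sketched as follows. This is a minimal illustration, not the FactHarbor implementation: the class layout, the ``revise`` helper, and every field name (``version_no``, ``wording``, ``created_at``) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimVersion:
    """Immutable snapshot of a claim's wording (field names are assumed)."""
    claim_id: str
    version_no: int
    wording: str
    created_at: datetime


@dataclass
class Claim:
    """Long-lived anchor. It carries versions but deliberately no verdict field."""
    claim_id: str
    versions: list = field(default_factory=list)

    def revise(self, wording: str) -> ClaimVersion:
        # A rewording appends a new snapshot; earlier snapshots are never edited.
        version = ClaimVersion(
            claim_id=self.claim_id,
            version_no=len(self.versions) + 1,
            wording=wording,
            created_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version


claim = Claim("claim-1")
first = claim.revise("original wording")
second = claim.revise("revised wording")
```

Freezing the snapshot class means an attempt to change ``first.wording`` raises an error, which is one simple way to enforce immutability in application code.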

SCENARIO / SCENARIO_VERSION
- ``SCENARIO`` represents a stable interpretive context for a claim.
- ``SCENARIO_VERSION`` is an immutable snapshot of that context (definitions, assumptions, boundaries).
- A single Claim may have multiple Scenarios.

EVIDENCE / EVIDENCE_VERSION
- ``EVIDENCE`` is the logical source (report, article, dataset…).
- ``EVIDENCE_VERSION`` is the extracted/processed snapshot (summary, reliability score, extraction method).

VERDICT / VERDICT_VERSION
- ``VERDICT`` represents the assertion "this claim is assessed under this specific scenario."
- ``VERDICT_VERSION`` is an immutable snapshot of the evaluation (likelihood, confidence, reasoning, timestamp).
- Cardinality: one Scenario has one active Verdict (with many Verdict versions over time). Because a Claim may have several Scenarios, one Claim can have N Verdicts.
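
The cardinality rule can be illustrated with a small in-memory sketch. ``VerdictStore`` and its methods are hypothetical helpers, and the numeric ranges used for ``likelihood`` and ``confidence`` are assumptions; the source only names the fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class VerdictVersion:
    """Immutable evaluation snapshot; the value ranges are assumptions."""
    likelihood: float
    confidence: float
    reasoning: str


@dataclass
class Verdict:
    """The single verdict slot of one scenario, with its version history."""
    scenario_id: str
    versions: list = field(default_factory=list)

    @property
    def current(self) -> Optional[VerdictVersion]:
        return self.versions[-1] if self.versions else None


class VerdictStore:
    """Keeps at most one Verdict per scenario (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_scenario: dict = {}

    def record(self, scenario_id: str, version: VerdictVersion) -> Verdict:
        # Reuse the scenario's existing verdict; only its history grows.
        verdict = self._by_scenario.get(scenario_id)
        if verdict is None:
            verdict = Verdict(scenario_id)
            self._by_scenario[scenario_id] = verdict
        verdict.versions.append(version)
        return verdict

    def verdicts_for(self, scenario_ids: list) -> list:
        # A claim with N scenarios yields up to N verdicts.
        return [self._by_scenario[s] for s in scenario_ids if s in self._by_scenario]


store = VerdictStore()
store.record("scenario-a", VerdictVersion(0.70, 0.60, "initial assessment"))
store.record("scenario-a", VerdictVersion(0.80, 0.75, "re-assessed"))
store.record("scenario-b", VerdictVersion(0.20, 0.50, "initial assessment"))
claim_verdicts = store.verdicts_for(["scenario-a", "scenario-b"])
```

Here a claim with two scenarios ends up with two verdicts, and re-assessing ``scenario-a`` adds a version rather than a second verdict.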

SCENARIO_EVIDENCE_VERSION_LINK
- Connects ``ScenarioVersion`` ↔ ``EvidenceVersion`` (many‑to‑many).
- Fields: ``LinkID``, ``Relevance``, ``Direction`` (SUPPORTS / CONTRADICTS / NEUTRAL / MIXED).
- Rule: The link always targets specific VERSIONS of entities, never the base tables, to ensure auditability.
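
A possible shape for the link record is sketched below. The four ``Direction`` values and the ``LinkID``, ``Relevance``, and ``Direction`` fields come from the list above; the Python names, the numeric type of ``relevance``, and the ``evidence_for`` helper are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    SUPPORTS = "SUPPORTS"
    CONTRADICTS = "CONTRADICTS"
    NEUTRAL = "NEUTRAL"
    MIXED = "MIXED"


@dataclass(frozen=True)
class ScenarioEvidenceVersionLink:
    link_id: str
    scenario_version_id: str   # a SCENARIO_VERSION id, never a SCENARIO id
    evidence_version_id: str   # an EVIDENCE_VERSION id, never an EVIDENCE id
    relevance: float           # assumed numeric; the source gives no type
    direction: Direction


def evidence_for(links, scenario_version_id, direction):
    """Return evidence-version ids linked to one scenario version in one direction."""
    return [
        link.evidence_version_id
        for link in links
        if link.scenario_version_id == scenario_version_id
        and link.direction is direction
    ]


links = [
    ScenarioEvidenceVersionLink("l1", "sv-1", "ev-1", 0.9, Direction.SUPPORTS),
    ScenarioEvidenceVersionLink("l2", "sv-1", "ev-2", 0.4, Direction.CONTRADICTS),
    ScenarioEvidenceVersionLink("l3", "sv-2", "ev-1", 0.7, Direction.SUPPORTS),
]
supporting = evidence_for(links, "sv-1", Direction.SUPPORTS)
```

Because both foreign keys point at version rows, a later scenario or evidence revision leaves existing links untouched, which is what keeps past assessments auditable.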

Core Data Model ERD

This diagram shows the data model as currently implemented. All data is stored as JSON blobs in SQLite. AnalysisContexts (bounded analytical frames) and KeyFactors (decomposition questions with contestation tracking) are embedded in the result JSON rather than stored as separate entities. No user system is implemented.

Updated 2026-02-08 per documentation audit report.

Current Implementation Data Model


erDiagram
    ARTICLE ||--o{ CLAIM : contains
    ARTICLE ||--|| ARTICLE_VERDICT : has
    CLAIM ||--|| CLAIM_VERDICT : has
    CLAIM ||--o{ CLAIM : depends_on
    CLAIM_VERDICT }o--o{ EVIDENCE_ITEM : supported_by
    SOURCE ||--o{ EVIDENCE_ITEM : provides
    ARTICLE ||--o{ ANALYSIS_CONTEXT : has

    ARTICLE {
        string id_PK
        string inputType
        string inputValue
        string articleThesis
        string detectedInputType
        boolean requiresSeparateAnalysis
        json analysisContexts
        string schemaVersion
    }

    CLAIM {
        string id_PK
        string articleId_FK
        string text
        string type
        string claimRole
        string_array dependsOn
        string keyFactorId
        boolean isCentral
        string contextId
    }

    CLAIM_VERDICT {
        string claimId_FK
        number verdict
        number truthPercentage
        number confidence
        string reasoning
        string_array supportingEvidenceIds
        string ratingConfirmation
        boolean isContested
        string contestedBy
        string factualBasis
    }

    ARTICLE_VERDICT {
        string articleId_FK
        string verdict
        int truthPercentage
        int confidence
        string summary
    }

    EVIDENCE_ITEM {
        string id_PK
        string sourceId_FK
        string statement
        string sourceExcerpt
        string category
        string claimDirection
        string contextId
        string sourceAuthority
        string probativeValue
        string evidenceBasis
        number extractionConfidence
    }

    SOURCE {
        string id_PK
        string url
        string title
        float trackRecordScore
        float trackRecordConfidence
        boolean trackRecordConsensus
        string category
        boolean fetchSuccess
    }

    ANALYSIS_CONTEXT {
        string id_PK
        string name
        string shortName
        string subject
        string temporal
        string status
        string outcome
        string assessedStatement
        json metadata
    }
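
As a rough illustration of how the relationships in the diagram might be followed in a parsed result, the sketch below pairs the evidence cited by a claim verdict with its source. The top-level keys (``claimVerdicts``, ``evidenceItems``, ``sources``) and the assumption that the ``_PK``/``_FK`` suffixes are diagram notation rather than part of the field names are not confirmed by this page.

```python
def evidence_with_sources(result: dict, claim_id: str) -> list:
    """Pair each evidence item cited by a claim's verdict with its source URL."""
    # Assumed layout: three top-level lists inside the parsed result JSON.
    verdict = next(v for v in result["claimVerdicts"] if v["claimId"] == claim_id)
    evidence_by_id = {e["id"]: e for e in result["evidenceItems"]}
    sources_by_id = {s["id"]: s for s in result["sources"]}

    pairs = []
    for evidence_id in verdict["supportingEvidenceIds"]:
        item = evidence_by_id[evidence_id]
        source = sources_by_id[item["sourceId"]]
        pairs.append((item["statement"], source["url"]))
    return pairs


# Placeholder data for illustration only.
sample_result = {
    "claimVerdicts": [{"claimId": "claim-1", "supportingEvidenceIds": ["ev-1"]}],
    "evidenceItems": [{"id": "ev-1", "sourceId": "src-1", "statement": "statement text"}],
    "sources": [{"id": "src-1", "url": "https://example.org/report"}],
}
pairs = evidence_with_sources(sample_result, "claim-1")
```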

Key Implementation Notes

7-Point Verdict Scale (seven truth-percentage bands; the middle band takes one of two labels, depending on confidence):

TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
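
The bands can be expressed as a lookup function. This is a sketch: the function name is hypothetical, the page does not state where "high" confidence begins (so ``confidence_threshold`` must be supplied by the caller), and treating fractional percentages as belonging to the band below the next integer bound is an assumption.

```python
def verdict_label(truth_percentage: float, confidence: float, *,
                  confidence_threshold: float) -> str:
    """Map a truth percentage (0-100) to a label on the verdict scale.

    confidence_threshold separates MIXED from UNVERIFIED in the middle band;
    its value is not given in the source, so callers must choose one.
    """
    if not 0 <= truth_percentage <= 100:
        raise ValueError("truth_percentage must be between 0 and 100")
    if truth_percentage >= 86:
        return "TRUE"
    if truth_percentage >= 72:
        return "MOSTLY-TRUE"
    if truth_percentage >= 58:
        return "LEANING-TRUE"
    if truth_percentage >= 43:
        # Same band, two labels: confidence decides.
        return "MIXED" if confidence >= confidence_threshold else "UNVERIFIED"
    if truth_percentage >= 29:
        return "LEANING-FALSE"
    if truth_percentage >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"


# The threshold of 60 is an illustrative value, not taken from the source.
label_high = verdict_label(50, 80, confidence_threshold=60)
label_low = verdict_label(50, 30, confidence_threshold=60)
```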

ratingConfirmation (v2.8.4): The verdict direction as confirmed by the LLM ("claim_supported" | "claim_refuted" | "mixed"). Used to detect mismatches between the confirmed direction and the verdict.
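
One way such a direction check might look is sketched below. The actual validation rule is not described on this page; the function name and the rule itself (flag only clear contradictions, never flag "mixed") are assumptions that reuse the band boundaries of the verdict scale.

```python
VALID_CONFIRMATIONS = {"claim_supported", "claim_refuted", "mixed"}


def has_direction_mismatch(truth_percentage: float, rating_confirmation: str) -> bool:
    """Return True when the confirmed direction contradicts the truth percentage.

    Assumed rule: "claim_supported" conflicts with a percentage in the false
    bands (below 43), "claim_refuted" conflicts with one in the true bands
    (58 and above), and "mixed" is never flagged.
    """
    if rating_confirmation not in VALID_CONFIRMATIONS:
        raise ValueError(f"unknown ratingConfirmation: {rating_confirmation!r}")
    if rating_confirmation == "claim_supported":
        return truth_percentage < 43
    if rating_confirmation == "claim_refuted":
        return truth_percentage >= 58
    return False


mismatch = has_direction_mismatch(20, "claim_supported")
consistent = has_direction_mismatch(20, "claim_refuted")
```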

KeyFactors: Optional decomposition questions discovered during analysis. They are not stored as separate entities.

Storage: All data is stored as a JSON blob in the SQLite ResultJson field.
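
The storage approach can be sketched with Python's standard ``sqlite3`` and ``json`` modules. Only the ``ResultJson`` column name comes from this page; the ``Jobs`` table, the ``JobId`` column, and the contents of the sample result are hypothetical.

```python
import json
import sqlite3

# In-memory database for illustration; table and key column are assumed names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (JobId TEXT PRIMARY KEY, ResultJson TEXT NOT NULL)")

# The whole analysis result is serialized into one text value.
result = {
    "schemaVersion": "example",
    "claims": [{"id": "claim-1", "text": "claim text", "isCentral": True}],
    "analysisContexts": [],
}
conn.execute(
    "INSERT INTO Jobs (JobId, ResultJson) VALUES (?, ?)",
    ("job-1", json.dumps(result)),
)
conn.commit()

# Reading it back means parsing the blob; no per-entity tables are involved.
row = conn.execute(
    "SELECT ResultJson FROM Jobs WHERE JobId = ?", ("job-1",)
).fetchone()
loaded = json.loads(row[0])
conn.close()
```

The trade-off of this layout is that entities such as claims or evidence items cannot be queried relationally without first parsing the blob.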

See Also: Target Data Model for normalized design.