Data Model
Version 4.1 by Robert Schaub on 2025/12/12 08:32
This page describes the current data model for FactHarbor.
Core Data Model Refinements
The system relies on the following versioned core entities:
- CLAIM_CLUSTER
  - Fields: ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
  - Groups related claims into topical clusters.
  - One Cluster has many Claims.
  - A Claim belongs to exactly one primary cluster.
- CLAIM / CLAIM_VERSION
  - ``CLAIM`` is the long‑lived anchor for a real‑world claim.
  - ``CLAIM_VERSION`` is an immutable snapshot of the claim's wording and basic metadata.
  - Note: Verdicts are NEVER attached directly to a Claim. They are attached to Scenarios.
- SCENARIO / SCENARIO_VERSION
  - ``SCENARIO`` represents a stable interpretive context for a claim.
  - ``SCENARIO_VERSION`` is an immutable snapshot of that context (definitions, assumptions, boundaries).
  - A single Claim may have multiple Scenarios.
- EVIDENCE / EVIDENCE_VERSION
  - ``EVIDENCE`` is the logical source (report, article, dataset…).
  - ``EVIDENCE_VERSION`` is the extracted/processed snapshot (summary, reliability score, extraction method).
- VERDICT / VERDICT_VERSION
  - ``VERDICT`` represents the assertion "this claim is assessed under this specific scenario."
  - ``VERDICT_VERSION`` is an immutable snapshot of the evaluation (likelihood, confidence, reasoning, timestamp).
  - Cardinality: one Scenario has one active Verdict (with many Verdict versions over time). Because a Claim may have several Scenarios, one Claim has N Verdicts.
- SCENARIO_EVIDENCE_VERSION_LINK
  - Connects ``ScenarioVersion`` ↔ ``EvidenceVersion`` (many‑to‑many).
  - Fields: ``LinkID``, ``Relevance``, ``Direction`` (SUPPORTS / CONTRADICTS / NEUTRAL / MIXED).
  - Rule: The link always targets specific VERSIONS of entities, never the base tables, to ensure auditability.
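The anchor/version split and the version-only link rule can be sketched in a few lines of Python. This is a minimal illustration, not the FactHarbor implementation: the ``version`` counter, the numeric type of ``relevance``, the helper ``next_version``, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Direction(Enum):
    """Allowed values of the link's Direction field."""
    SUPPORTS = "SUPPORTS"
    CONTRADICTS = "CONTRADICTS"
    NEUTRAL = "NEUTRAL"
    MIXED = "MIXED"


@dataclass(frozen=True)  # frozen: a version is an immutable snapshot
class ScenarioVersion:
    scenario_id: str   # the long-lived SCENARIO anchor
    version: int       # hypothetical version counter
    definitions: str
    assumptions: str
    boundaries: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_id: str   # the logical EVIDENCE source
    version: int       # hypothetical version counter
    summary: str
    reliability_score: float
    extraction_method: str


@dataclass(frozen=True)
class ScenarioEvidenceVersionLink:
    link_id: str
    scenario_version: ScenarioVersion  # a version, never the base SCENARIO
    evidence_version: EvidenceVersion  # a version, never the base EVIDENCE
    relevance: float                   # assumed to be numeric
    direction: Direction


def next_version(prev: ScenarioVersion, **changes: str) -> ScenarioVersion:
    """Return a new snapshot; the previous one stays untouched."""
    return replace(prev, version=prev.version + 1, **changes)


# Illustrative values only.
v1 = ScenarioVersion("scn-1", 1, "GDP in real terms", "2020 baseline", "EU only")
ev = EvidenceVersion("evd-1", 1, "Annual report summary", 0.8, "manual")
link = ScenarioEvidenceVersionLink("lnk-1", v1, ev, 0.9, Direction.SUPPORTS)
v2 = next_version(v1, boundaries="EU and UK")
```

Because the link holds ``v1`` itself, creating ``v2`` does not change what the link points at, which is the auditability property the rule is after.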
Core Data Model ERD
Current Implementation Data Model
erDiagram
    ARTICLE ||--o{ CLAIM : contains
    ARTICLE ||--|| ARTICLE_VERDICT : has
    CLAIM ||--|| CLAIM_VERDICT : has
    CLAIM ||--o{ CLAIM : depends_on
    CLAIM_VERDICT }o--o{ EVIDENCE_ITEM : supported_by
    SOURCE ||--o{ EVIDENCE_ITEM : provides
    ARTICLE ||--o{ ANALYSIS_CONTEXT : has
    ARTICLE {
        string id_PK
        string inputType
        string inputValue
        string articleThesis
        string detectedInputType
        boolean requiresSeparateAnalysis
        json analysisContexts
        string schemaVersion
    }
    CLAIM {
        string id_PK
        string articleId_FK
        string text
        string type
        string claimRole
        string_array dependsOn
        string keyFactorId
        boolean isCentral
        string contextId
    }
    CLAIM_VERDICT {
        string claimId_FK
        number verdict
        number truthPercentage
        number confidence
        string reasoning
        string_array supportingEvidenceIds
        string ratingConfirmation
        boolean isContested
        string contestedBy
        string factualBasis
    }
    ARTICLE_VERDICT {
        string articleId_FK
        string verdict
        int truthPercentage
        int confidence
        string summary
    }
    EVIDENCE_ITEM {
        string id_PK
        string sourceId_FK
        string statement
        string sourceExcerpt
        string category
        string claimDirection
        string contextId
        string sourceAuthority
        string probativeValue
        string evidenceBasis
        number extractionConfidence
    }
    SOURCE {
        string id_PK
        string url
        string title
        float trackRecordScore
        float trackRecordConfidence
        boolean trackRecordConsensus
        string category
        boolean fetchSuccess
    }
    ANALYSIS_CONTEXT {
        string id_PK
        string name
        string shortName
        string subject
        string temporal
        string status
        string outcome
        string assessedStatement
        json metadata
    }
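To make the ``supported_by`` relationship concrete, the sketch below resolves a verdict's ``supportingEvidenceIds`` to the matching ``EVIDENCE_ITEM`` records. It assumes the JSON keys match the attribute names in the diagram without the ``_PK``/``_FK`` suffixes; the helper ``resolve_supporting_evidence`` and the sample records are hypothetical, and only a subset of fields is shown.

```python
from typing import List, TypedDict


class EvidenceItem(TypedDict):
    id: str
    sourceId: str
    statement: str


class ClaimVerdict(TypedDict):
    claimId: str
    truthPercentage: float
    supportingEvidenceIds: List[str]


def resolve_supporting_evidence(
    verdict: ClaimVerdict, evidence_items: List[EvidenceItem]
) -> List[EvidenceItem]:
    """Return the evidence items referenced by the verdict, in reference order.

    IDs with no matching item are skipped rather than raising an error.
    """
    by_id = {item["id"]: item for item in evidence_items}
    return [by_id[eid] for eid in verdict["supportingEvidenceIds"] if eid in by_id]


# Illustrative records only.
evidence: List[EvidenceItem] = [
    {"id": "E1", "sourceId": "S1", "statement": "First extracted statement"},
    {"id": "E2", "sourceId": "S1", "statement": "Second extracted statement"},
    {"id": "E3", "sourceId": "S2", "statement": "Third extracted statement"},
]
verdict: ClaimVerdict = {
    "claimId": "C1",
    "truthPercentage": 74,
    "supportingEvidenceIds": ["E3", "E1", "E9"],
}
resolved = resolve_supporting_evidence(verdict, evidence)
```

Skipping unknown IDs is a design choice for the sketch; a stricter variant could raise instead so that dangling references are noticed early.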
Key Implementation Notes
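7-Point Verdict Scale (seven ``truthPercentage`` bands; the middle band carries one of two labels, depending on confidence):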
- TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
- MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
- LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
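The bands above translate directly into a lookup function. The following is a sketch under stated assumptions: the cutoff between "high" and "low" confidence is not given on this page, so ``high_confidence_threshold`` and its default of 60 are placeholders, and fractional percentages are assumed to fall into the lower band (85.5 is treated as MOSTLY-TRUE).

```python
def verdict_label(
    truth_percentage: float,
    confidence: float,
    high_confidence_threshold: float = 60,  # placeholder, not from the spec
) -> str:
    """Map a truth percentage (0-100) and a confidence value to a scale label."""
    if not 0 <= truth_percentage <= 100:
        raise ValueError("truth_percentage must be between 0 and 100")
    if truth_percentage >= 86:
        return "TRUE"
    if truth_percentage >= 72:
        return "MOSTLY-TRUE"
    if truth_percentage >= 58:
        return "LEANING-TRUE"
    if truth_percentage >= 43:
        # Middle band: the label depends on confidence, not on the percentage.
        if confidence >= high_confidence_threshold:
            return "MIXED"
        return "UNVERIFIED"
    if truth_percentage >= 29:
        return "LEANING-FALSE"
    if truth_percentage >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"
```

For example, ``verdict_label(50, 80)`` yields ``"MIXED"`` while ``verdict_label(50, 20)`` yields ``"UNVERIFIED"``, given the placeholder threshold.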
ratingConfirmation (v2.8.4): LLM-provided verdict direction confirmation ("claim_supported" | "claim_refuted" | "mixed"). Used for direction mismatch validation.
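One way a direction mismatch check could work is sketched below. The actual validation rule is not described on this page; the sketch assumes the expected direction is derived from the ``truthPercentage`` bands of the verdict scale (58 and above supports, 42 and below refutes, anything in between is mixed). Both function names are hypothetical.

```python
VALID_CONFIRMATIONS = {"claim_supported", "claim_refuted", "mixed"}


def expected_confirmation(truth_percentage: float) -> str:
    """Derive the direction implied by the numeric verdict (assumed rule)."""
    if truth_percentage >= 58:
        return "claim_supported"
    if truth_percentage <= 42:
        return "claim_refuted"
    return "mixed"


def has_direction_mismatch(truth_percentage: float, rating_confirmation: str) -> bool:
    """Return True when the LLM-provided direction disagrees with the percentage."""
    if rating_confirmation not in VALID_CONFIRMATIONS:
        raise ValueError(f"unknown ratingConfirmation: {rating_confirmation!r}")
    return expected_confirmation(truth_percentage) != rating_confirmation
```

Under these assumptions a verdict of 80% tagged ``claim_refuted`` is flagged, while 80% tagged ``claim_supported`` passes.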
KeyFactors: Optional decomposition questions discovered during analysis. They are not stored as separate entities.
Storage: All data is stored as a single JSON blob in the SQLite ``ResultJson`` field.
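The JSON-blob approach can be sketched with Python's standard ``sqlite3`` and ``json`` modules. Only the ``ResultJson`` column name comes from this page; the table name ``Jobs``, the key column ``JobId``, the helper functions, and the sample payload are assumptions for the example.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def save_result(conn: sqlite3.Connection, job_id: str, result: Dict[str, Any]) -> None:
    """Serialize the whole analysis result into the ResultJson column."""
    conn.execute(
        "INSERT OR REPLACE INTO Jobs (JobId, ResultJson) VALUES (?, ?)",
        (job_id, json.dumps(result)),
    )
    conn.commit()


def load_result(conn: sqlite3.Connection, job_id: str) -> Optional[Dict[str, Any]]:
    """Read the blob back and parse it; returns None if the job is unknown."""
    row = conn.execute(
        "SELECT ResultJson FROM Jobs WHERE JobId = ?", (job_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute("CREATE TABLE Jobs (JobId TEXT PRIMARY KEY, ResultJson TEXT NOT NULL)")

# Illustrative payload only.
result = {
    "schemaVersion": "example",
    "claims": [{"id": "C1", "text": "Example claim", "dependsOn": []}],
    "claimVerdicts": [{"claimId": "C1", "truthPercentage": 74, "confidence": 80}],
}
save_result(conn, "job-1", result)
loaded = load_result(conn, "job-1")
```

Storing one blob keeps writes simple, at the cost that any query over individual claims or verdicts has to parse the JSON first.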
See Also: Target Data Model for normalized design.