Data Model
Version 3.1 by Robert Schaub on 2025/12/11 21:35
This page describes the current data model for FactHarbor.
Core Data Model Refinements
The system relies on the following versioned core entities:
- CLAIM_CLUSTER
  - ``ClusterID`` (PK), ``EmbeddingVectorRef``, ``Theme``
  - Groups related claims into topical clusters.
- CLAIM / CLAIM_VERSION
  - ``CLAIM`` is the long‑lived anchor for a real‑world claim.
  - ``CLAIM_VERSION`` is an immutable snapshot of wording + basic metadata.
  - Verdicts are NOT attached to ClaimVersion but to Scenario.
- SCENARIO / SCENARIO_VERSION
  - ``SCENARIO`` represents a stable interpretive context for a claim.
  - ``SCENARIO_VERSION`` is an immutable snapshot of that context (definitions, assumptions, boundaries).
  - Verdicts are attached to SCENARIO, with verdict history in VERDICT_VERSION.
- EVIDENCE / EVIDENCE_VERSION
  - ``EVIDENCE`` is the logical source (report, article, dataset…).
  - ``EVIDENCE_VERSION`` is the extracted/processed snapshot (summary, reliability, etc.).
- VERDICT / VERDICT_VERSION
  - ``VERDICT`` represents “this scenario is evaluated for this claim.”
  - ``VERDICT_VERSION`` is an immutable snapshot of a concrete evaluation (likelihood, confidence, reasoning, timestamp).
- SCENARIO_EVIDENCE_VERSION_LINK
  - Connects ``ScenarioVersion`` ↔ ``EvidenceVersion`` (many‑to‑many).
  - Fields: Relevance, Direction (SUPPORTS / CONTRADICTS / NEUTRAL).
  - Rule: The link always targets VERSIONED entities, never the base tables.
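The versioning rule above can be illustrated with a minimal Python sketch. It is not the FactHarbor implementation: the key names (``scenario_version_id``, ``evidence_version_id``, etc.) and the numeric types for relevance and reliability are assumptions made for illustration. Only the entity names, the listed fields, and the three Direction values come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Direction values listed for SCENARIO_EVIDENCE_VERSION_LINK."""
    SUPPORTS = "SUPPORTS"
    CONTRADICTS = "CONTRADICTS"
    NEUTRAL = "NEUTRAL"


@dataclass(frozen=True)  # frozen mirrors "immutable snapshot"
class ScenarioVersion:
    scenario_version_id: str  # hypothetical key name
    scenario_id: str          # points back to the long-lived SCENARIO
    definitions: str
    assumptions: str
    boundaries: str


@dataclass(frozen=True)
class EvidenceVersion:
    evidence_version_id: str  # hypothetical key name
    evidence_id: str          # points back to the logical EVIDENCE source
    summary: str
    reliability: float        # assumed to be numeric


@dataclass(frozen=True)
class ScenarioEvidenceVersionLink:
    # Both keys reference *_VERSION rows, never SCENARIO or EVIDENCE directly.
    scenario_version_id: str
    evidence_version_id: str
    relevance: float          # assumed to be numeric
    direction: Direction
```

Because the link stores version keys only, a later rewording of a scenario or a re-extraction of evidence creates new version rows and leaves existing links pointing at the snapshots they were originally evaluated against.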
Core Data Model ERD
Current Implementation Data Model
erDiagram
ARTICLE ||--o{ CLAIM : contains
ARTICLE ||--|| ARTICLE_VERDICT : has
CLAIM ||--|| CLAIM_VERDICT : has
CLAIM ||--o{ CLAIM : depends_on
CLAIM_VERDICT }o--o{ EVIDENCE_ITEM : supported_by
SOURCE ||--o{ EVIDENCE_ITEM : provides
ARTICLE ||--o{ ANALYSIS_CONTEXT : has
ARTICLE {
string id_PK
string inputType
string inputValue
string articleThesis
string detectedInputType
boolean requiresSeparateAnalysis
json analysisContexts
string schemaVersion
}
CLAIM {
string id_PK
string articleId_FK
string text
string type
string claimRole
string_array dependsOn
string keyFactorId
boolean isCentral
string contextId
}
CLAIM_VERDICT {
string claimId_FK
number verdict
number truthPercentage
number confidence
string reasoning
string_array supportingEvidenceIds
string ratingConfirmation
boolean isContested
string contestedBy
string factualBasis
}
ARTICLE_VERDICT {
string articleId_FK
string verdict
int truthPercentage
int confidence
string summary
}
EVIDENCE_ITEM {
string id_PK
string sourceId_FK
string statement
string sourceExcerpt
string category
string claimDirection
string contextId
string sourceAuthority
string probativeValue
string evidenceBasis
number extractionConfidence
}
SOURCE {
string id_PK
string url
string title
float trackRecordScore
float trackRecordConfidence
boolean trackRecordConsensus
string category
boolean fetchSuccess
}
ANALYSIS_CONTEXT {
string id_PK
string name
string shortName
string subject
string temporal
string status
string outcome
string assessedStatement
json metadata
}
Key Implementation Notes
7-Point Verdict Scale (seven percentage bands; the 43-57% band carries two labels, chosen by confidence):
- TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
- MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
- LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
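The bands above can be expressed as a small lookup function. The following is a sketch only: the function name and the ``confidence_threshold`` parameter are hypothetical, and its default of 50 is an assumption, since this page does not state where "high" confidence ends and "low" begins.

```python
def verdict_label(truth_percentage: int, confidence: int,
                  confidence_threshold: int = 50) -> str:
    """Map a truth percentage (0-100) to a label on the verdict scale.

    confidence_threshold is an assumed cut-off between "high" and "low"
    confidence; the real value is not given in this document.
    """
    if not 0 <= truth_percentage <= 100:
        raise ValueError("truth_percentage must be between 0 and 100")
    if truth_percentage >= 86:
        return "TRUE"
    if truth_percentage >= 72:
        return "MOSTLY-TRUE"
    if truth_percentage >= 58:
        return "LEANING-TRUE"
    if truth_percentage >= 43:
        # Middle band: the label depends on confidence, not on the percentage.
        return "MIXED" if confidence >= confidence_threshold else "UNVERIFIED"
    if truth_percentage >= 29:
        return "LEANING-FALSE"
    if truth_percentage >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"
```

For example, ``verdict_label(50, 80)`` yields ``MIXED`` while ``verdict_label(50, 20)`` yields ``UNVERIFIED`` under the assumed threshold.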
ratingConfirmation (v2.8.4): LLM-provided verdict direction confirmation ("claim_supported" | "claim_refuted" | "mixed"). Used for direction mismatch validation.
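One way a direction mismatch check could work is sketched below. This page does not describe the actual validation logic, so the mapping from ``truthPercentage`` to an expected direction (58% and above as supported, 42% and below as refuted, anything between as mixed) is an assumption derived from the verdict bands, and both function names are hypothetical.

```python
VALID_CONFIRMATIONS = {"claim_supported", "claim_refuted", "mixed"}


def expected_confirmation(truth_percentage: int) -> str:
    """Direction implied by the percentage alone (assumed mapping)."""
    if truth_percentage >= 58:
        return "claim_supported"
    if truth_percentage <= 42:
        return "claim_refuted"
    return "mixed"


def has_direction_mismatch(truth_percentage: int, rating_confirmation: str) -> bool:
    """Return True when the LLM's stated direction disagrees with the percentage."""
    if rating_confirmation not in VALID_CONFIRMATIONS:
        raise ValueError(f"unknown ratingConfirmation: {rating_confirmation!r}")
    return expected_confirmation(truth_percentage) != rating_confirmation
```

A verdict of 90% accompanied by ``"claim_refuted"`` would be flagged under this sketch; how a flagged verdict is then handled is outside what this page specifies.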
KeyFactors: Optional decomposition questions discovered during analysis. They are not stored as separate entities.
Storage: All data is stored as a JSON blob in the SQLite ResultJson field.
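The JSON-blob storage approach can be sketched with Python's standard ``sqlite3`` and ``json`` modules. Only the ``ResultJson`` column name comes from this page; the table name ``AnalysisResult``, the ``Id`` column, and the helper functions are hypothetical.

```python
import json
import sqlite3

# In-memory database for illustration; table and Id column are assumed names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AnalysisResult (Id TEXT PRIMARY KEY, ResultJson TEXT NOT NULL)"
)


def save_result(conn: sqlite3.Connection, result_id: str, result: dict) -> None:
    """Serialize the whole analysis result into the ResultJson column."""
    conn.execute(
        "INSERT OR REPLACE INTO AnalysisResult (Id, ResultJson) VALUES (?, ?)",
        (result_id, json.dumps(result)),
    )
    conn.commit()


def load_result(conn: sqlite3.Connection, result_id: str):
    """Return the deserialized result, or None if the id is unknown."""
    row = conn.execute(
        "SELECT ResultJson FROM AnalysisResult WHERE Id = ?", (result_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

With this layout, articles, claims, verdicts, and evidence items travel together in one document, so reading a result is a single row fetch, at the cost of not being able to query individual claims or sources with plain SQL joins.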
See Also: Target Data Model for the normalized design.