Data Model
This page describes the current data model for FactHarbor v0.9.1.
1. Versioning Strategy
Every entity in FactHarbor has a full immutable version history. This ensures:
- Complete auditability
- Ability to reconstruct historical state
- Federation-compatible lineage tracking
- Transparent evolution of claims, scenarios, and verdicts
1.1 Core Versioning Principles
Immutability:
- Each version is stored independently
- Versions cannot be deleted, only superseded
- Historical versions remain accessible
Lineage:
- Each version links to its parent via `ParentVersionID`
- Version links form a directed acyclic graph (DAG) of changes
- Supports branching in federated environments
Provenance:
- Every version timestamped (`CreatedAt`)
- Author type recorded (`AuthorType`: Human, AI, ExternalNode)
- Justification captured (`JustificationText`)
- Digital signatures for integrity (`SignatureHash` in Release 1.0)
Federation Support:
- Versions can originate from remote nodes
- Conflict detection via lineage comparison
- Parallel version trees for branching scenarios
- Cross-node version synchronization
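Conflict detection via lineage comparison can be illustrated with a minimal sketch. It assumes each version has a single `ParentVersionID` and that the lineage is available as an in-memory mapping; the function names and the mapping are hypothetical and not part of FactHarbor.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory lineage: VersionID -> ParentVersionID (None for the first version).
Lineage = Dict[str, Optional[str]]


def ancestors(lineage: Lineage, version_id: str) -> List[str]:
    """Return version_id followed by its ancestors, newest first.

    Assumes the lineage is acyclic, as the versioning principles require.
    """
    chain: List[str] = []
    current: Optional[str] = version_id
    while current is not None:
        chain.append(current)
        current = lineage.get(current)
    return chain


def common_ancestor(lineage: Lineage, a: str, b: str) -> Optional[str]:
    """Return the nearest version that both a and b descend from, if any."""
    seen = set(ancestors(lineage, a))
    for candidate in ancestors(lineage, b):
        if candidate in seen:
            return candidate
    return None


def is_conflict(lineage: Lineage, a: str, b: str) -> bool:
    """Two versions conflict when neither is an ancestor of the other.

    Versions with no shared ancestor at all are also reported as conflicting.
    """
    return common_ancestor(lineage, a, b) not in (a, b)


lineage = {"v1": None, "v2": "v1", "v3-nodeA": "v2", "v3-nodeB": "v2"}
print(is_conflict(lineage, "v3-nodeA", "v3-nodeB"))  # branches diverged at v2
print(is_conflict(lineage, "v2", "v3-nodeA"))        # plain succession
```

In this sketch a linear succession is never a conflict; only versions on parallel branches are flagged for reconciliation.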
1.2 Common Version Fields
All versioned entities include:
- VersionID: Unique identifier for this specific version
- ParentVersionID: Link to previous version (null for first version)
- CreatedAt: Timestamp (ISO 8601, UTC)
- AuthorType: Human | AI | ExternalNode
- CreatedBy: Foreign key to User or TechnicalUser
- JustificationText: Brief explanation of changes
- PublicationMode: Mode1 (draft) | Mode2 (AI-published) | Mode3 (human-reviewed)
- ReviewStatus: Workflow state (draft|in_review|approved|rejected)
- NodeOrigin: Node ID where version was created (for federation)
- SignatureHash: Cryptographic signature (Release 1.0)
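As a rough illustration, the common version fields could be carried by an immutable record like the one below. This is a sketch only: the class name, the `new_version` helper, the use of UUIDs for `VersionID`, and the default values are assumptions, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid

AUTHOR_TYPES = {"Human", "AI", "ExternalNode"}


@dataclass(frozen=True)  # frozen: a version cannot be modified after creation
class VersionRecord:
    version_id: str
    parent_version_id: Optional[str]  # None for the first version
    created_at: str                   # ISO 8601, UTC
    author_type: str                  # Human | AI | ExternalNode
    created_by: str                   # FK to User or TechnicalUser
    justification_text: str
    publication_mode: str             # Mode1 | Mode2 | Mode3
    review_status: str                # draft | in_review | approved | rejected
    node_origin: str
    signature_hash: Optional[str] = None  # planned for Release 1.0


def new_version(parent: Optional[VersionRecord], *, author_type: str,
                created_by: str, justification_text: str, node_origin: str,
                publication_mode: str = "Mode1",
                review_status: str = "draft") -> VersionRecord:
    """Create a successor of `parent`; the parent itself is left untouched."""
    if author_type not in AUTHOR_TYPES:
        raise ValueError(f"unknown AuthorType: {author_type}")
    return VersionRecord(
        version_id=str(uuid.uuid4()),  # assumption: UUIDs as version identifiers
        parent_version_id=parent.version_id if parent else None,
        created_at=datetime.now(timezone.utc).isoformat(),
        author_type=author_type,
        created_by=created_by,
        justification_text=justification_text,
        publication_mode=publication_mode,
        review_status=review_status,
        node_origin=node_origin,
    )


first = new_version(None, author_type="AI", created_by="akel-1",
                    justification_text="Initial extraction", node_origin="node-a")
second = new_version(first, author_type="Human", created_by="user-42",
                     justification_text="Tightened wording", node_origin="node-a")
```

Superseding a version here always means creating a new record that points at the old one, which mirrors the "superseded, never deleted" principle.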
1.3 Versioning Architecture Diagram
UCM Configuration Versioning Architecture
graph LR
ADMIN[UCM Administrator] -->|creates| BLOB[Config Blob - immutable]
BLOB -->|content-addressed| STORE[(config_blobs)]
ADMIN -->|activates| ACTIVE[config_active]
ACTIVE -->|points to| BLOB
JOB[Analysis Job] -->|snapshots at start| USAGE[config_usage]
USAGE -->|references| BLOB
REPORT[Analysis Report] -->|cites| USAGE
How UCM Config Versioning Works
| Concept | Description |
|---|---|
| config_blobs | Immutable, content-addressed config versions. Each change creates a new blob; old blobs are never deleted. |
| config_active | Pointer to the currently active config blob per config type. Changing this activates a new config version. |
| config_usage | Links each analysis job to the exact config snapshot used. Enables reproducibility. |
| Immutability | Analysis outputs are never edited. To improve results, update UCM config and re-analyse. |
Current Implementation (v2.10.2)
| Feature | Status |
|---|---|
| UCM config storage | Implemented (config.db SQLite) |
| Config hot-reload | Implemented (60s TTL) |
| Per-job config snapshots | Implemented (job_config_snapshots) |
| Content-addressed blobs | Implemented (hash-based deduplication) |
| Config activation tracking | Implemented (config_active table) |
| Admin UI for config management | Not yet implemented (CLI/direct DB) |
Design Principles
- Every config change creates a new immutable blob — no in-place mutation
- Every analysis job records the config snapshot used at time of execution
- Reports can be reproduced by re-running with the same config snapshot
- Config history is the audit trail — who changed what, when, and why
- Analysis data is never edited — "improve the system, not the data"
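The interplay of `config_blobs`, `config_active`, and `config_usage` can be sketched with an in-memory SQLite database. The table names come from the description above, but the column names, the canonical-JSON hashing scheme, and the helper functions are assumptions made for illustration only.

```python
import hashlib
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; the column layout is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE config_blobs (blob_hash TEXT PRIMARY KEY, content TEXT NOT NULL);
        CREATE TABLE config_active (config_type TEXT PRIMARY KEY, blob_hash TEXT NOT NULL);
        CREATE TABLE config_usage (job_id TEXT, config_type TEXT, blob_hash TEXT NOT NULL,
                                   PRIMARY KEY (job_id, config_type));
    """)
    return db


def put_blob(db: sqlite3.Connection, config: dict) -> str:
    """Store a config as a content-addressed blob and return its hash."""
    # Canonical JSON so that equal configs always hash to the same value.
    content = json.dumps(config, sort_keys=True, separators=(",", ":"))
    blob_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # INSERT OR IGNORE: an existing blob is never overwritten (deduplication).
    db.execute("INSERT OR IGNORE INTO config_blobs VALUES (?, ?)", (blob_hash, content))
    return blob_hash


def activate(db: sqlite3.Connection, config_type: str, blob_hash: str) -> None:
    """Move the active pointer; blobs themselves stay untouched."""
    db.execute("INSERT OR REPLACE INTO config_active VALUES (?, ?)", (config_type, blob_hash))


def snapshot_for_job(db: sqlite3.Connection, job_id: str, config_type: str) -> str:
    """Record which blob was active when the job started."""
    row = db.execute("SELECT blob_hash FROM config_active WHERE config_type = ?",
                     (config_type,)).fetchone()
    if row is None:
        raise LookupError(f"no active config for {config_type}")
    db.execute("INSERT INTO config_usage VALUES (?, ?, ?)", (job_id, config_type, row[0]))
    return row[0]


db = open_store()
old = put_blob(db, {"maxSources": 5})
activate(db, "pipeline", old)
snapshot_for_job(db, "job-1", "pipeline")
new = put_blob(db, {"maxSources": 8})
activate(db, "pipeline", new)  # job-1 still references the old blob
```

Because a job only stores a hash, re-running it with the same snapshot means loading the blob for that hash rather than whatever is active at the moment.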
2. Core Entity Definitions
2.1 User Entities
USER (base user table):
- ``UserID`` (PK)
- ``UserType`` (Reader|Contributor|Reviewer|Auditor|Expert|Moderator|Maintainer)
- ``DisplayName``
- ``Email`` (for Contributors and above)
- ``RegisteredAt``
- ``LastActive``
- ``Status`` (active|suspended|banned)
TECHNICAL_USER (system processes):
- ``SystemID`` (PK)
- ``SystemName``
- ``Purpose`` (AKEL|FederationSync|BackupService|Monitor|Audit)
- ``CreatedBy`` (FK to Maintainer who created this system user)
- ``CreatedAt``
- ``Status`` (active|paused|deprecated)
- ``ApiKey`` (encrypted)
- ``Permissions`` (JSON - authorized operations)
Examples of Technical Users:
- AKEL instances (AI processing)
- Federation sync bots
- Scheduled audit tasks
- Backup services
- Monitoring systems
- External API integrations
2.2 Content Entities
The system relies on the following versioned core entities:
CLAIM_CLUSTER:
- ``ClusterID`` (PK)
- ``EmbeddingVectorRef``
- ``Theme``
- Groups related claims into topical clusters
- One Cluster has many Claims
- A Claim belongs to exactly one primary cluster
CLAIM / CLAIM_VERSION:
- ``CLAIM`` is the long-lived anchor for a real-world claim
- ``CLAIM_VERSION`` is an immutable snapshot that includes:
- ``VersionID`` (PK)
- ``ClaimID`` (FK to CLAIM)
- ``ParentVersionID`` (FK to prior version, nullable)
- ``Text``
- ``Domain``
- ``ClaimType`` (literal|metaphorical|rhetorical|supernatural)
- ``Evaluability`` (empirical|subjective|non-falsifiable)
- ``RiskTier`` (A|B|C) - replaces ``SafetyCategory`` for consistency with the risk tier system
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|merged)
SCENARIO / SCENARIO_VERSION:
- ``SCENARIO`` is the anchor for a scenario across time
- ``SCENARIO_VERSION`` is an immutable snapshot:
- ``VersionID`` (PK)
- ``ScenarioID`` (FK to SCENARIO)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``Definitions`` (JSON)
- ``Boundaries`` (JSON)
- ``Assumptions`` (JSON)
- ``Context`` (text)
- ``EvaluationMethod`` (text)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|rejected)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded|deprecated)
Note: SafetyClass has been removed from Scenario; the risk tier is set at the claim level.
EVIDENCE / EVIDENCE_VERSION:
- ``EVIDENCE`` is the anchor
- ``EVIDENCE_VERSION`` is the versioned snapshot:
- ``VersionID`` (PK)
- ``EvidenceID`` (FK to EVIDENCE)
- ``ParentVersionID``
- ``Type`` (paper|dataset|report|transcript|expert|media)
- ``Category`` (empirical|historical|rhetorical|dataset|meta-analysis)
- ``Reliability`` (low|medium|high)
- ``Provenance`` (URL, DOI, source metadata)
- ``ExtractionMethod`` (manual|OCR|API|AKEL)
- ``ContentHash`` (SHA256 of evidence content)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|verified|disputed|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (active|superseded)
VERDICT / VERDICT_VERSION:
- ``VERDICT`` is the anchor
- ``VERDICT_VERSION`` is the snapshot:
- ``VersionID`` (PK)
- ``VerdictID`` (FK to VERDICT)
- ``ParentVersionID``
- ``ClaimID`` (FK to CLAIM)
- ``ScenarioVersionID`` (FK to specific SCENARIO_VERSION)
- ``EvidenceVersionSet`` (JSON array of Evidence VersionIDs used)
- ``LikelihoodRange`` (0–1, with uncertainty bounds)
- ``ExplanationChain`` (JSON)
- ``UncertaintyFactors`` (JSON)
- ``PublicationMode`` (Mode1|Mode2|Mode3)
- ``ReviewStatus`` (draft|in_review|approved|retracted)
- ``CreatedAt``, ``AuthorType``, ``CreatedBy``, ``JustificationText``
- ``NodeOrigin``, ``SignatureHash``
- ``Status`` (current|outdated|superseded|retracted)
3. Many-to-Many Linking Tables
ScenarioEvidenceLink:
- Links scenario versions to evidence versions with relevance scoring
- ``ScenarioID``, ``ScenarioVersionID``
- ``EvidenceID``, ``EvidenceVersionID``
- ``RelevanceScore`` (0–1) - How relevant this evidence is to this scenario
- ``LinkJustification`` - Brief explanation of relevance
Purpose:
- Evidence can be used by multiple scenarios
- Scenarios can draw from multiple pieces of evidence
- Relevance scoring helps prioritize evidence
- Version-specific linking preserves historical accuracy
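A small sketch of how relevance scoring might be used to prioritize evidence for one scenario version. The record below keeps only the version-level IDs for brevity (the anchor IDs `ScenarioID` and `EvidenceID` are omitted), and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScenarioEvidenceLink:
    scenario_version_id: str
    evidence_version_id: str
    relevance_score: float  # 0-1
    link_justification: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance_score <= 1.0:
            raise ValueError("RelevanceScore must be between 0 and 1")


def evidence_for(links: Iterable[ScenarioEvidenceLink],
                 scenario_version_id: str) -> List[str]:
    """Return evidence version IDs for one scenario version, most relevant first."""
    matching = [l for l in links if l.scenario_version_id == scenario_version_id]
    matching.sort(key=lambda l: l.relevance_score, reverse=True)
    return [l.evidence_version_id for l in matching]


links = [
    ScenarioEvidenceLink("sv-1", "ev-10", 0.4, "Background statistics"),
    ScenarioEvidenceLink("sv-1", "ev-11", 0.9, "Directly measures the claimed effect"),
    ScenarioEvidenceLink("sv-2", "ev-10", 0.7, "Same dataset, different boundary"),
]
print(evidence_for(links, "sv-1"))
```

The same evidence version (`ev-10`) appears under two scenario versions with different scores, which is the many-to-many relationship the linking table exists to express.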
ClaimCluster:
- Semantic clustering of similar claims
- ``ClusterID`` (PK)
- ``EmbeddingVector`` - Vector representation for semantic search
- ``MemberList`` - List of ClaimIDs in this cluster
- ``Theme`` - Human-readable theme description
4. Key Changes in v0.9.1
Updated Field Names:
- `SafetyCategory` → `RiskTier` (consistency with risk tier system A/B/C)
- `SafetyClass` removed from Scenario (redundant with claim-level RiskTier)
Added Fields to All Version Entities:
- `PublicationMode` - Track Mode 1/2/3 status
- `ReviewStatus` - Track workflow state
- `NodeOrigin` - Federation provenance
- `CreatedBy` - FK to User/TechnicalUser (clarified)
New Entity:
- `TECHNICAL_USER` - Separate system processes from human users
Clarifications:
- `ScenarioVersionID` in Verdict (not just ScenarioID) - links to specific version
- `ContentHash` in Evidence - SHA256 for integrity checking
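For illustration, an integrity check based on `ContentHash` might look like the sketch below. It assumes the hash is stored as a hex-encoded SHA-256 digest of the raw evidence bytes; the function names are hypothetical.

```python
import hashlib
import hmac


def content_hash(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the evidence content."""
    return hashlib.sha256(content).hexdigest()


def matches_content_hash(content: bytes, stored_hash: str) -> bool:
    """Check fetched content against the hash recorded in the evidence version."""
    return hmac.compare_digest(content_hash(content), stored_hash.lower())


original = b"Table 3: measured values for 2021"
stored = content_hash(original)
print(matches_content_hash(original, stored))               # content unchanged
print(matches_content_hash(original + b" (edited)", stored))  # content altered
```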
5. Data Model Behavior
5.1 Late-Arriving Evidence
When new evidence versions appear:
1. Existing verdicts marked as outdated
2. Scenario relevance must be re-evaluated
3. Re-evaluation engine triggers verdict recomputation
4. New verdict versions created
5. Users notified of updates
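The first step of this sequence can be sketched as follows. The sketch assumes that a verdict is affected when its `EvidenceVersionSet` contains the evidence version that the new one supersedes, and that `Status` is mutable bookkeeping outside the immutable snapshot content; both are assumptions, and the remaining steps (re-evaluation, recomputation, notification) are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VerdictVersion:
    version_id: str
    evidence_version_set: List[str]  # Evidence VersionIDs used by this verdict
    status: str = "current"          # current | outdated | superseded | retracted


def mark_outdated(verdicts: Iterable[VerdictVersion],
                  superseded_evidence_version_id: str) -> List[str]:
    """Flag current verdicts that relied on a now-superseded evidence version.

    Returns the VersionIDs of the verdicts that need recomputation.
    """
    affected: List[str] = []
    for verdict in verdicts:
        if (verdict.status == "current"
                and superseded_evidence_version_id in verdict.evidence_version_set):
            verdict.status = "outdated"
            affected.append(verdict.version_id)
    return affected


verdicts = [
    VerdictVersion("vv-1", ["ev-10", "ev-11"]),
    VerdictVersion("vv-2", ["ev-12"]),
]
to_recompute = mark_outdated(verdicts, "ev-10")
print(to_recompute)
```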
5.2 Scenario Evolution
When a scenario's assumptions or definitions change:
- Creates new scenario version (not in-place update)
- All dependent verdicts must be recalculated
- Previous scenario versions remain accessible
- Version lineage preserved
5.3 Federated Nodes
Each node may share partial data:
- Claims and scenarios shared if relevant
- Evidence metadata shared, not always full files
- Version synchronization via NodeOrigin tracking
- Branching allowed for divergent interpretations
6. Visual Diagrams
The following diagrams provide visual representations of the data model structure and relationships.
6.1 Core Data Model ERD
Current Implementation Data Model
erDiagram
ARTICLE ||--o{ CLAIM : contains
ARTICLE ||--|| ARTICLE_VERDICT : has
CLAIM ||--|| CLAIM_VERDICT : has
CLAIM ||--o{ CLAIM : depends_on
CLAIM_VERDICT }o--o{ EVIDENCE_ITEM : supported_by
SOURCE ||--o{ EVIDENCE_ITEM : provides
ARTICLE ||--o{ ANALYSIS_CONTEXT : has
ARTICLE {
string id_PK
string inputType
string inputValue
string articleThesis
string detectedInputType
boolean requiresSeparateAnalysis
json analysisContexts
string schemaVersion
}
CLAIM {
string id_PK
string articleId_FK
string text
string type
string claimRole
string_array dependsOn
string keyFactorId
boolean isCentral
string contextId
}
CLAIM_VERDICT {
string claimId_FK
number verdict
number truthPercentage
number confidence
string reasoning
string_array supportingEvidenceIds
string ratingConfirmation
boolean isContested
string contestedBy
string factualBasis
}
ARTICLE_VERDICT {
string articleId_FK
string verdict
int truthPercentage
int confidence
string summary
}
EVIDENCE_ITEM {
string id_PK
string sourceId_FK
string statement
string sourceExcerpt
string category
string claimDirection
string contextId
string sourceAuthority
string probativeValue
string evidenceBasis
number extractionConfidence
}
SOURCE {
string id_PK
string url
string title
float trackRecordScore
float trackRecordConfidence
boolean trackRecordConsensus
string category
boolean fetchSuccess
}
ANALYSIS_CONTEXT {
string id_PK
string name
string shortName
string subject
string temporal
string status
string outcome
string assessedStatement
json metadata
}
Key Implementation Notes
7-Point Verdict Scale (seven percentage bands; the middle band is labeled MIXED or UNVERIFIED depending on confidence):
- TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
- MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
- LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
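The bands above can be expressed as a simple lookup. This is a sketch, not the implemented mapping: the function name is hypothetical, and because the cut-off between "high" and "low" confidence is not specified here, the `high_confidence_threshold` default is a placeholder assumption.

```python
def verdict_label(truth_percentage: float, confidence: float,
                  high_confidence_threshold: float = 60.0) -> str:
    """Map a truth percentage (0-100) to a verdict label.

    The threshold separating MIXED from UNVERIFIED is an assumed placeholder.
    Fractional percentages fall into the band of the lower bound they have reached.
    """
    if not 0 <= truth_percentage <= 100:
        raise ValueError("truth_percentage must be between 0 and 100")
    if truth_percentage >= 86:
        return "TRUE"
    if truth_percentage >= 72:
        return "MOSTLY-TRUE"
    if truth_percentage >= 58:
        return "LEANING-TRUE"
    if truth_percentage >= 43:
        # Same band, two labels: confidence decides which one applies.
        return "MIXED" if confidence >= high_confidence_threshold else "UNVERIFIED"
    if truth_percentage >= 29:
        return "LEANING-FALSE"
    if truth_percentage >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"


for pct, conf in [(91, 80), (50, 85), (50, 20), (12, 70)]:
    print(pct, conf, verdict_label(pct, conf))
```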
ratingConfirmation (v2.8.4): LLM-provided verdict direction confirmation ("claim_supported" | "claim_refuted" | "mixed"). Used for direction mismatch validation.
KeyFactors: Optional decomposition questions discovered during analysis - not stored as separate entities.
Storage: All data is stored as a single JSON blob in the SQLite `ResultJson` field.
See Also: Target Data Model for normalized design.
6.2 User Roles Structure
User Class Diagram
classDiagram
class BaseUser {
+view_results()
+browse()
+search()
}
class Reader {
<>
+browse()
+search()
+view_results()
}
class RegisteredUser {
+UUID id
+String username
+Role role
+Timestamp created_at
+submit_url()
+flag_issue()
+view_submission_history()
}
class UCMAdministrator {
+manage_config()
+view_audit_trail()
+activate_config_version()
+trigger_reanalysis()
+view_system_metrics()
}
class Moderator {
+review_flags()
+hide_content()
+ban_user()
}
BaseUser <|-- Reader : anonymous
BaseUser <|-- RegisteredUser : logged in
RegisteredUser <|-- UCMAdministrator : appointed
RegisteredUser <|-- Moderator : appointed
Role Permissions
| Role | Capabilities | Requirements |
|---|---|---|
| Reader (Guest) | Browse, search, view results | No login required |
| User (Registered) | Everything Reader can + submit URLs/text (rate-limited), flag content | Free account required |
| UCM Administrator | Everything User can + manage UCM config, view audit trail, trigger re-analysis | Appointed by Governing Team |
| Moderator | Everything User can + review flags, hide content, ban users | Appointed by Governing Team |
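The additive structure of the table ("Everything X can + ...") can be sketched as capability sets. The capability strings and the `can` helper are hypothetical names chosen for illustration; no authentication system exists yet, so this is not implemented behavior.

```python
# Capability sets built additively, mirroring the Role Permissions table.
READER = frozenset({"browse", "search", "view_results"})
USER = READER | {"submit", "flag_content"}
UCM_ADMINISTRATOR = USER | {"manage_config", "view_audit_trail", "trigger_reanalysis"}
MODERATOR = USER | {"review_flags", "hide_content", "ban_users"}

ROLE_CAPABILITIES = {
    "Reader": READER,
    "User": USER,
    "UCM Administrator": UCM_ADMINISTRATOR,
    "Moderator": MODERATOR,
}


def can(role: str, capability: str) -> bool:
    """Return True if the role includes the capability; unknown roles get nothing."""
    return capability in ROLE_CAPABILITIES.get(role, frozenset())


print(can("Reader", "submit"))                # guests cannot submit
print(can("Moderator", "submit"))             # inherited from User
print(can("UCM Administrator", "ban_users"))  # moderation is a separate role
```

Note that no role carries an "edit output" capability, in line with the principle that analysis outputs are immutable.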
Current Implementation
- All users are anonymous Readers (no authentication system yet)
- UCM config management via CLI/direct DB access
- No moderator tooling
- No rate limiting (single-user development mode)
Design Principles
- No data editing roles — analysis outputs are immutable
- UCM Administrator improves the system through configuration, not by editing individual outputs
- Submission requires login — LLM inference and web search are not free; rate limits control costs
- Four roles: Reader (guest), User (registered), UCM Administrator (appointed), Moderator (appointed)
6.3 Content Workflow
Claim Analysis Workflow
graph TB
Start[User Submission]
subgraph Step1[Step 1 Understand]
Extract{understandClaim LLM Analysis}
Gate1{Gate 1 Claim Validation}
DetectType[Detect Input Type]
DetectContexts[Detect Contexts]
KeyFactors[Discover KeyFactors]
end
subgraph Step2[Step 2 Research]
Decide[decideNextResearch]
Search[Web Search]
Fetch[Fetch Sources]
Facts[extractEvidence]
end
subgraph Step3[Step 3 Verdict]
Verdict[generateVerdicts]
Gate4{Gate 4 Confidence Check}
end
subgraph Output[Output]
Publish[Publish Result]
LowConf[Low Confidence Flag]
end
Start --> Extract
Extract --> Gate1
Gate1 -->|Pass Factual| DetectType
Gate1 -->|Fail Opinion| Exclude[Exclude from analysis]
DetectType --> DetectContexts
DetectContexts --> KeyFactors
KeyFactors --> Decide
Decide --> Search
Search --> Fetch
Fetch --> Facts
Facts -->|More research needed| Decide
Facts -->|Sufficient evidence| Verdict
Verdict --> Gate4
Gate4 -->|High or Medium confidence| Publish
Gate4 -->|Low or Insufficient| LowConf
Quality Gates (Implemented)
| Gate | Name | Purpose | Pass Criteria |
|---|---|---|---|
| Gate 1 | Claim Validation | Filter non-factual claims | Factual, opinion score 0.3 or less, specificity 0.3 or more |
| Gate 4 | Verdict Confidence | Ensure sufficient evidence | 2 or more sources, avg quality 0.6 or more, agreement 60% or more |
Gates 2 (Contradiction Search) and 3 (Uncertainty Quantification) are not yet implemented.
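The pass criteria in the table can be written as two predicates. This is a sketch under stated assumptions: the function and parameter names are hypothetical, all scores are taken to be on a 0-1 scale, and agreement is expressed as a fraction (60% becomes 0.6).

```python
def passes_gate1(is_factual: bool, opinion_score: float, specificity: float) -> bool:
    """Gate 1 (Claim Validation): factual, opinion score <= 0.3, specificity >= 0.3."""
    return is_factual and opinion_score <= 0.3 and specificity >= 0.3


def passes_gate4(source_count: int, avg_quality: float, agreement: float) -> bool:
    """Gate 4 (Verdict Confidence): >= 2 sources, avg quality >= 0.6, agreement >= 60%."""
    return source_count >= 2 and avg_quality >= 0.6 and agreement >= 0.6


# A specific factual claim backed by three reasonably good, mostly agreeing sources.
print(passes_gate1(is_factual=True, opinion_score=0.1, specificity=0.8))
print(passes_gate4(source_count=3, avg_quality=0.7, agreement=0.67))

# A single source is not enough, however good it is.
print(passes_gate4(source_count=1, avg_quality=0.95, agreement=1.0))
```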
KeyFactors (Replaces Scenarios)
KeyFactors are optional decomposition questions discovered during the understanding phase:
- Not stored as separate entities
- Help break down complex claims into checkable sub-questions
- See KeyFactors Design for design rationale
7-Point Verdict Scale
- TRUE (86-100%) - Claim is well-supported by evidence
- MOSTLY-TRUE (72-85%) - Largely accurate with minor caveats
- LEANING-TRUE (58-71%) - More evidence supports than contradicts
- MIXED (43-57%, high confidence) - Roughly equal evidence both ways
- UNVERIFIED (43-57%, low confidence) - Insufficient evidence to determine
- LEANING-FALSE (29-42%) - More evidence contradicts than supports
- MOSTLY-FALSE (15-28%) - Largely inaccurate
- FALSE (0-14%) - Claim is refuted by evidence