1 = Data Model =
FactHarbor's data model is **simple, focused, and designed for automated processing**.
3 == 1. Core Entities ==
4 === 1.1 Claim ===
5 Fields: id, assertion, domain, **status** (Published/Hidden only), **confidence_score**, **risk_score**, completeness_score, version, views, edit_count
6 ==== Performance Optimization: Denormalized Fields ====
**Rationale**: The claims system is 95% reads, 5% writes. Denormalizing commonly read data reduces joins and improves query performance by ~70%.
8 **Additional cached fields in claims table**:
* **evidence_summary** (JSONB): Top 5 most relevant evidence snippets with scores
** Avoids joining the evidence table for listing/preview
** Updated when evidence is added/removed
** Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
* **source_names** (TEXT[]): Array of source names for quick display
** Avoids joining through evidence to sources
** Updated when sources change
** Format: `["New York Times", "Nature Journal", ...]`
* **scenario_count** (INTEGER): Number of scenarios for this claim
** Quick metric without counting rows
** Updated when scenarios are added/removed
* **cache_updated_at** (TIMESTAMP): When denormalized data was last refreshed
** Helps invalidate stale caches
** Triggers a background refresh if too old
23 **Update Strategy**:
24 * **Immediate**: Update on claim edit (user-facing)
25 * **Deferred**: Update via background job every hour (non-critical)
26 * **Invalidation**: Clear cache when source data changes significantly
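A minimal sketch of this strategy, assuming hypothetical data-access helpers (`fetch_top_evidence`, `count_scenarios`, `save`) and a dict-like claim row; the real refresh would live in the background worker:

{{code language="python"}}
from datetime import datetime, timedelta, timezone

CACHE_MAX_AGE = timedelta(hours=1)  # matches the hourly deferred job

def refresh_denormalized_fields(claim, db):
    """Rebuild the cached fields on one claim row."""
    top_evidence = db.fetch_top_evidence(claim["id"], limit=5)
    claim["evidence_summary"] = [
        {"text": e["excerpt"], "source": e["source_name"], "relevance": e["relevance_score"]}
        for e in top_evidence
    ]
    claim["source_names"] = sorted({e["source_name"] for e in top_evidence})
    claim["scenario_count"] = db.count_scenarios(claim["id"])
    claim["cache_updated_at"] = datetime.now(timezone.utc)
    db.save(claim)

def refresh_if_stale(claim, db):
    """Deferred path: refresh only when cache_updated_at is too old."""
    if datetime.now(timezone.utc) - claim["cache_updated_at"] > CACHE_MAX_AGE:
        refresh_denormalized_fields(claim, db)
{{/code}}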
27 **Trade-offs**:
28 * ✅ 70% fewer joins on common queries
29 * ✅ Much faster claim list/search pages
30 * ✅ Better user experience
31 * ⚠️ Small storage increase (~10%)
32 * ⚠️ Need to keep caches in sync
33 === 1.2 Evidence ===
34 Fields: claim_id, source_id, excerpt, url, relevance_score, supports
35 === 1.3 Source ===
36 **Purpose**: Track reliability of information sources over time
37 **Fields**:
38 * **id** (UUID): Unique identifier
39 * **name** (text): Source name (e.g., "New York Times", "Nature Journal")
40 * **domain** (text): Website domain (e.g., "nytimes.com")
41 * **type** (enum): NewsOutlet, AcademicJournal, GovernmentAgency, etc.
42 * **track_record_score** (0-100): Overall reliability score
43 * **accuracy_history** (JSON): Historical accuracy data
44 * **correction_frequency** (float): How often source publishes corrections
45 * **last_updated** (timestamp): When track record last recalculated
46 **How It Works**:
47 * Initial score based on source type (70 for academic journals, 30 for unknown)
48 * Updated daily by background scheduler
49 * Formula: accuracy_rate (50%) + correction_policy (20%) + editorial_standards (15%) + bias_transparency (10%) + longevity (5%)
50 * Track Record Check in AKEL pipeline: Adjusts evidence confidence based on source quality
51 * Quality thresholds: 90+=Exceptional, 70-89=Reliable, 50-69=Acceptable, 30-49=Questionable, <30=Unreliable
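A minimal sketch of the weighting formula and quality thresholds above, assuming each component is already normalized to 0-100:

{{code language="python"}}
WEIGHTS = {
    "accuracy_rate": 0.50,
    "correction_policy": 0.20,
    "editorial_standards": 0.15,
    "bias_transparency": 0.10,
    "longevity": 0.05,
}

def track_record_score(components):
    """Combine 0-100 component scores into the overall 0-100 score."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def quality_band(score):
    """Map a score onto the quality thresholds listed above."""
    if score >= 90: return "Exceptional"
    if score >= 70: return "Reliable"
    if score >= 50: return "Acceptable"
    if score >= 30: return "Questionable"
    return "Unreliable"

# Example: a new academic journal starting from its type-based prior of 70
score = track_record_score({
    "accuracy_rate": 70, "correction_policy": 70, "editorial_standards": 70,
    "bias_transparency": 70, "longevity": 70,
})
print(score, quality_band(score))  # 70.0 Reliable
{{/code}}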
52 **See**: SOURCE Track Record System documentation for complete details on calculation, updates, and usage
54 **Key**: Automated source reliability tracking
55 ==== Source Scoring Process (Separation of Concerns) ====
56 **Critical design principle**: Prevent circular dependencies between source scoring and claim analysis.
57 **The Problem**:
58 * Source scores should influence claim verdicts
59 * Claim verdicts should update source scores
60 * But: Direct feedback creates circular dependency and potential feedback loops
61 **The Solution**: Temporal separation
62 ==== Weekly Background Job (Source Scoring) ====
63 Runs independently of claim analysis:
{{code language="python"}}
def update_source_scores_weekly():
    """
    Background job: calculate source reliability.
    Never triggered by individual claim analysis.
    """
    # Analyze all claims from past week
    claims = get_claims_from_past_week()
    for source in get_all_sources():
        # Calculate accuracy metrics
        correct_verdicts = count_correct_verdicts_citing(source, claims)
        total_citations = count_total_citations(source, claims)
        accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
        # Weight accuracy by claim importance
        weighted_score = calculate_weighted_score(source, claims, accuracy)
        # Update source record
        source.track_record_score = weighted_score
        source.total_citations = total_citations
        source.last_updated = now()
        source.save()

# Job runs: Sunday 2 AM UTC
# Never during claim processing
{{/code}}
87 ==== Real-Time Claim Analysis (AKEL) ====
88 Uses source scores but never updates them:
{{code language="python"}}
def analyze_claim(claim_text):
    """
    Real-time: analyze claim using current source scores.
    READ source scores, never UPDATE them.
    """
    # Gather evidence
    evidence_list = gather_evidence(claim_text)
    for evidence in evidence_list:
        # READ source score (snapshot from last weekly update)
        source = get_source(evidence.source_id)
        source_score = source.track_record_score / 100  # normalize 0-100 score to a 0-1 weight
        # Use score to weight evidence
        evidence.weighted_relevance = evidence.relevance * source_score
    # Generate verdict using weighted evidence
    verdict = synthesize_verdict(evidence_list)
    # NEVER update source scores here
    # That happens in the weekly background job
    return verdict
{{/code}}
109 ==== Monthly Audit (Quality Assurance) ====
110 Moderator review of flagged source scores:
111 * Verify scores make sense
112 * Detect gaming attempts
113 * Identify systematic biases
114 * Manual adjustments if needed
115 **Key Principles**:
116 ✅ **Scoring and analysis are temporally separated**
117 * Source scoring: Weekly batch job
118 * Claim analysis: Real-time processing
119 * Never update scores during analysis
120 ✅ **One-way data flow during processing**
121 * Claims READ source scores
122 * Claims NEVER WRITE source scores
123 * Updates happen in background only
124 ✅ **Predictable update cycle**
125 * Sources update every Sunday 2 AM
126 * Claims always use last week's scores
127 * No mid-week score changes
128 ✅ **Audit trail**
129 * Log all score changes
130 * Track score history
131 * Explainable calculations
132 **Benefits**:
133 * No circular dependencies
134 * Predictable behavior
135 * Easier to reason about
136 * Simpler testing
137 * Clear audit trail
138 **Example Timeline**:
```
Sunday 2 AM: Calculate source scores for past week
  → NYT score: 87 (up from 85)
  → Blog X score: 52 (down from 61)
Monday-Saturday: Claims processed using these scores
  → All claims this week use NYT=87
  → All claims this week use Blog X=52
Next Sunday 2 AM: Recalculate scores including this week's claims
  → NYT score: 89 (trending up)
  → Blog X score: 48 (trending down)
```
150 === 1.4 Scenario ===
151 **Purpose**: Different interpretations or contexts for evaluating claims
152 **Key Concept**: Scenarios are extracted from evidence, not generated arbitrarily. Each scenario represents a specific context, assumption set, or condition under which a claim should be evaluated.
153 **Relationship**: One-to-many with Claims (**simplified for V1.0**: scenario belongs to single claim)
154 **Fields**:
155 * **id** (UUID): Unique identifier
156 * **claim_id** (UUID): Foreign key to claim (one-to-many)
157 * **description** (text): Human-readable description of the scenario
158 * **assumptions** (JSONB): Key assumptions that define this scenario context
159 * **extracted_from** (UUID): Reference to evidence that this scenario was extracted from
160 * **verdict_summary** (text): Compiled verdict for this scenario
161 * **confidence** (decimal 0-1): Confidence level for verdict in this scenario
162 * **created_at** (timestamp): When scenario was created
163 * **updated_at** (timestamp): Last modification
164 **How Found**: Evidence search → Extract context → Create scenario → Link to claim
165 **Example**:
166 For claim "Vaccines reduce hospitalization":
167 * Scenario 1: "Clinical trials (healthy adults 18-65, original strain)" from trial paper
168 * Scenario 2: "Real-world data (diverse population, Omicron variant)" from hospital data
169 * Scenario 3: "Immunocompromised patients" from specialist study
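A minimal sketch of the extraction flow ("Evidence search → Extract context → Create scenario → Link to claim"), assuming dict-based records and treating `extract_context` as a stand-in for the AKEL step that pulls population, conditions, and assumptions out of a single evidence record:

{{code language="python"}}
import uuid
from datetime import datetime, timezone

def create_scenarios_for_claim(claim, evidence_list, extract_context):
    """Turn each evidence record's context into a scenario linked to the claim."""
    scenarios = []
    for evidence in evidence_list:
        # e.g. {"description": "Clinical trials ...", "assumptions": {...}}
        context = extract_context(evidence)
        scenarios.append({
            "id": str(uuid.uuid4()),
            "claim_id": claim["id"],
            "description": context["description"],
            "assumptions": context["assumptions"],  # stored as JSONB
            "extracted_from": evidence["id"],
            "verdict_summary": None,                # filled in by verdict synthesis
            "confidence": None,
            "created_at": datetime.now(timezone.utc),
        })
    return scenarios
{{/code}}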
170 **V2.0 Evolution**: Many-to-many relationship can be added if users request cross-claim scenario sharing. For V1.0, keeping scenarios tied to single claims simplifies queries and reduces complexity without limiting functionality.
171 === 1.5 User ===
Fields: username, email, **role** (Reader/Contributor/Moderator/Admin), **reputation**, contributions_count
=== User Reputation System ===
174 **V1.0 Approach**: Simple manual role assignment
175 **Rationale**: Complex reputation systems aren't needed until 100+ active contributors demonstrate the need for automated reputation management. Start simple, add complexity when metrics prove necessary.
176 === Roles (Manual Assignment) ===
177 **reader** (default):
178 * View published claims and evidence
179 * Browse and search content
180 * No editing permissions
181 **contributor**:
182 * Submit new claims
183 * Suggest edits to existing content
184 * Add evidence
185 * Requires manual promotion by moderator/admin
186 **moderator**:
187 * Approve/reject contributor suggestions
188 * Flag inappropriate content
189 * Handle abuse reports
190 * Assigned by admins based on trust
191 **admin**:
192 * Manage users and roles
193 * System configuration
194 * Access to all features
195 * Founder-appointed initially
196 === Contribution Tracking (Simple) ===
197 **Basic metrics only**:
198 * `contributions_count`: Total number of contributions
199 * `created_at`: Account age
200 * `last_active`: Recent activity
201 **No complex calculations**:
202 * No point systems
203 * No automated privilege escalation
204 * No reputation decay
205 * No threshold-based promotions
206 === Promotion Process ===
207 **Manual review by moderators/admins**:
208 1. User demonstrates value through contributions
209 2. Moderator reviews user's contribution history
210 3. Moderator promotes user to contributor role
211 4. Admin promotes trusted contributors to moderator
212 **Criteria** (guidelines, not automated):
213 * Quality of contributions
214 * Consistency over time
215 * Collaborative behavior
216 * Understanding of project goals
217 === V2.0+ Evolution ===
218 **Add complex reputation when**:
219 * 100+ active contributors
220 * Manual role management becomes bottleneck
221 * Clear patterns of abuse emerge requiring automation
222 **Future features may include**:
223 * Automated point calculations
224 * Threshold-based promotions
225 * Reputation decay for inactive users
226 * Track record scoring for contributors
227 See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
228 === 1.6 Edit ===
229 **Fields**: entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at
230 **Purpose**: Complete audit trail for all content changes
231 === Edit History Details ===
232 **What Gets Edited**:
233 * **Claims** (20% edited): assertion, domain, status, scores, analysis
234 * **Evidence** (10% edited): excerpt, relevance_score, supports
235 * **Scenarios** (5% edited): description, assumptions, confidence
236 * **Sources**: NOT versioned (continuous updates, not editorial decisions)
237 **Who Edits**:
* **Contributors** (sufficient reputation): Corrections, additions
* **Trusted contributors** (sufficient reputation): Major improvements, approvals
240 * **Moderators**: Abuse handling, dispute resolution
241 * **System (AKEL)**: Re-analysis, automated improvements (user_id = NULL)
242 **Edit Types**:
243 * `CONTENT_CORRECTION`: User fixes factual error
244 * `CLARIFICATION`: Improved wording
245 * `SYSTEM_REANALYSIS`: AKEL re-processed claim
246 * `MODERATION_ACTION`: Hide/unhide for abuse
247 * `REVERT`: Rollback to previous version
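For illustration, a hypothetical edit record (UUIDs abbreviated) for the "sky is blue" example used in the Versioning Strategy section below:

{{code language="python"}}
example_edit = {
    "entity_type": "Claim",
    "entity_id": "3f9a...",   # hypothetical claim UUID
    "user_id": "7c21...",     # NULL when AKEL makes the edit
    "before_state": {"assertion": "The sky is blue"},
    "after_state": {"assertion": "The sky is blue during daytime"},
    "edit_type": "CLARIFICATION",
    "reason": "Added missing daytime qualifier",
    "created_at": "2025-12-17T10:32:00Z",
}
{{/code}}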
248 **Retention Policy** (5 years total):
249 1. **Hot storage** (3 months): PostgreSQL, instant access
250 2. **Warm storage** (2 years): Partitioned, slower queries
251 3. **Cold storage** (3 years): S3 compressed, download required
252 4. **Deletion**: After 5 years (except legal holds)
253 **Storage per 1M claims**: ~400 MB (20% edited × 2 KB per edit)
254 **Use Cases**:
255 * View claim history timeline
256 * Detect vandalism patterns
257 * Learn from user corrections (system improvement)
258 * Legal compliance (audit trail)
259 * Rollback capability
See **Edit History Documentation** for complete details on what gets edited by whom, the retention policy, and use cases.
261 === 1.7 Flag ===
262 Fields: entity_id, reported_by, issue_type, status, resolution_note
263 === 1.8 QualityMetric ===
264 **Fields**: metric_type, category, value, target, timestamp
265 **Purpose**: Time-series quality tracking
266 **Usage**:
267 * **Continuous monitoring**: Hourly calculation of error rates, confidence scores, processing times
268 * **Quality dashboard**: Real-time display with trend charts
269 * **Alerting**: Automatic alerts when metrics exceed thresholds
270 * **A/B testing**: Compare control vs treatment metrics
271 * **Improvement validation**: Measure before/after changes
272 **Example**: `{type: "ErrorRate", category: "Politics", value: 0.12, target: 0.10, timestamp: "2025-12-17"}`
273 === 1.9 ErrorPattern ===
274 **Fields**: error_category, claim_id, description, root_cause, frequency, status
275 **Purpose**: Capture errors to trigger system improvements
276 **Usage**:
277 * **Error capture**: When users flag issues or system detects problems
278 * **Pattern analysis**: Weekly grouping by category and frequency
279 * **Improvement workflow**: Analyze → Fix → Test → Deploy → Re-process → Monitor
280 * **Metrics**: Track error rate reduction over time
281 **Example**: `{category: "WrongSource", description: "Unreliable tabloid cited", root_cause: "No quality check", frequency: 23, status: "Fixed"}`
282
=== 1.10 Core Data Model ERD ===
284
285 {{include reference="FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
286
=== 1.11 User Class Diagram ===
288 {{include reference="FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
289 == 2. Versioning Strategy ==
290 **All Content Entities Are Versioned**:
291 * **Claim**: Every edit creates new version (V1→V2→V3...)
292 * **Evidence**: Changes tracked in edit history
293 * **Scenario**: Modifications versioned
294 **How Versioning Works**:
295 * Entity table stores **current state only**
296 * Edit table stores **all historical states** (before_state, after_state as JSON)
297 * Version number increments with each edit
* Complete audit trail maintained for the full 5-year retention period (see Edit History Details)
299 **Unversioned Entities** (current state only, no history):
300 * **Source**: Track record continuously updated (not versioned history, just current score)
301 * **User**: Account state (reputation accumulated, not versioned)
302 * **QualityMetric**: Time-series data (each record is a point in time, not a version)
303 * **ErrorPattern**: System improvement queue (status tracked, not versioned)
304 **Example**:
```
Claim V1: "The sky is blue"
  → User edits →
Claim V2: "The sky is blue during daytime"
  → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
```
== 3. Storage vs Computation Strategy ==
312 **Critical architectural decision**: What to persist in databases vs compute dynamically?
313 **Trade-off**:
314 * **Store more**: Better reproducibility, faster, lower LLM costs | Higher storage/maintenance costs
315 * **Compute more**: Lower storage/maintenance costs | Slower, higher LLM costs, less reproducible
316 === Recommendation: Hybrid Approach ===
317 **STORE (in PostgreSQL):**
318 ==== Claims (Current State + History) ====
319 * **What**: assertion, domain, status, created_at, updated_at, version
320 * **Why**: Core entity, must be persistent
321 * **Also store**: confidence_score (computed once, then cached)
322 * **Size**: ~1 KB per claim
323 * **Growth**: Linear with claims
324 * **Decision**: ✅ STORE - Essential
325 ==== Evidence (All Records) ====
326 * **What**: claim_id, source_id, excerpt, url, relevance_score, supports, extracted_at
327 * **Why**: Hard to re-gather, user contributions, reproducibility
328 * **Size**: ~2 KB per evidence (with excerpt)
329 * **Growth**: 3-10 evidence per claim
330 * **Decision**: ✅ STORE - Essential for reproducibility
331 ==== Sources (Track Records) ====
332 * **What**: name, domain, track_record_score, accuracy_history, correction_frequency
333 * **Why**: Continuously updated, expensive to recompute
334 * **Size**: ~500 bytes per source
335 * **Growth**: Slow (limited number of sources)
336 * **Decision**: ✅ STORE - Essential for quality
337 ==== Edit History (All Versions) ====
338 * **What**: before_state, after_state, user_id, reason, timestamp
339 * **Why**: Audit trail, legal requirement, reproducibility
340 * **Size**: ~2 KB per edit
* **Growth**: Linear with edits (~20% of claims get edited)
342 * **Retention**: Hot storage 3 months → Warm storage 2 years → Archive to S3 3 years → Delete after 5 years total
343 * **Decision**: ✅ STORE - Essential for accountability
344 ==== Flags (User Reports) ====
345 * **What**: entity_id, reported_by, issue_type, description, status
346 * **Why**: Error detection, system improvement triggers
347 * **Size**: ~500 bytes per flag
* **Growth**: 5-10% of claims get flagged
349 * **Decision**: ✅ STORE - Essential for improvement
350 ==== ErrorPatterns (System Improvement) ====
351 * **What**: error_category, claim_id, description, root_cause, frequency, status
352 * **Why**: Learning loop, prevent recurring errors
353 * **Size**: ~1 KB per pattern
354 * **Growth**: Slow (limited patterns, many fixed)
355 * **Decision**: ✅ STORE - Essential for learning
356 ==== QualityMetrics (Time Series) ====
357 * **What**: metric_type, category, value, target, timestamp
358 * **Why**: Trend analysis, cannot recreate historical metrics
359 * **Size**: ~200 bytes per metric
360 * **Growth**: Hourly = 8,760 per year per metric type
361 * **Retention**: 2 years hot, then aggregate and archive
362 * **Decision**: ✅ STORE - Essential for monitoring
363 **STORE (Computed Once, Then Cached):**
364 ==== Analysis Summary ====
365 * **What**: Neutral text summary of claim analysis (200-500 words)
366 * **Computed**: Once by AKEL when claim first analyzed
367 * **Stored in**: Claim table (text field)
368 * **Recomputed**: Only when system significantly improves OR claim edited
369 * **Why store**: Expensive to regenerate ($0.01-0.05 per analysis), doesn't change often
370 * **Size**: ~2 KB per claim
371 * **Decision**: ✅ STORE (cached) - Cost-effective
372 ==== Confidence Score ====
373 * **What**: 0-100 score of analysis confidence
374 * **Computed**: Once by AKEL
375 * **Stored in**: Claim table (integer field)
376 * **Recomputed**: When evidence added, source track record changes significantly, or system improves
377 * **Why store**: Cheap to store, expensive to compute, users need it fast
378 * **Size**: 4 bytes per claim
379 * **Decision**: ✅ STORE (cached) - Performance critical
380 ==== Risk Score ====
381 * **What**: 0-100 score of claim risk level
382 * **Computed**: Once by AKEL
383 * **Stored in**: Claim table (integer field)
384 * **Recomputed**: When domain changes, evidence changes, or controversy detected
385 * **Why store**: Same as confidence score
386 * **Size**: 4 bytes per claim
387 * **Decision**: ✅ STORE (cached) - Performance critical
**COMPUTE DYNAMICALLY (candidates, evaluated case by case below):**
==== Scenarios (⚠️ Critical Decision) ====
390 * **What**: 2-5 possible interpretations of claim with assumptions
391 * **Current design**: Stored in Scenario table
392 * **Alternative**: Compute on-demand when user views claim details
393 * **Storage cost**: ~1 KB per scenario × 3 scenarios average = ~3 KB per claim
394 * **Compute cost**: $0.005-0.01 per request (LLM API call)
395 * **Frequency**: Viewed in detail by ~20% of users
396 * **Trade-off analysis**:
** IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
** IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
399 * **Reproducibility**: Scenarios may improve as AI improves (good to recompute)
400 * **Speed**: Computed = 5-8 seconds delay, Stored = instant
401 * **Decision**: ✅ STORE (hybrid approach below)
402 **Scenario Strategy** (APPROVED):
403 1. **Store scenarios** initially when claim analyzed
404 2. **Mark as stale** when system improves significantly
405 3. **Recompute on next view** if marked stale
406 4. **Cache for 30 days** if frequently accessed
407 5. **Result**: Best of both worlds - speed + freshness
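A minimal sketch of steps 2-4, assuming a hypothetical `scenarios_stale` flag on the claim row and treating `recompute` as the AKEL scenario-extraction call ($0.005-0.01 per request), so the LLM runs only when needed:

{{code language="python"}}
from datetime import datetime, timedelta, timezone

CACHE_WINDOW = timedelta(days=30)

def get_scenarios(claim, db, recompute):
    """Serve stored scenarios; recompute on view only when marked stale."""
    scenarios = db.load_scenarios(claim["id"])
    if claim.get("scenarios_stale"):
        scenarios = recompute(claim)  # lazy refresh on next view
        db.replace_scenarios(claim["id"], scenarios)
        claim["scenarios_stale"] = False
        claim["scenarios_cached_until"] = datetime.now(timezone.utc) + CACHE_WINDOW
        db.save(claim)
    return scenarios
{{/code}}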
408 ==== Verdict Synthesis ====
409 * **What**: Final conclusion text synthesizing all scenarios
410 * **Compute cost**: $0.002-0.005 per request
411 * **Frequency**: Every time claim viewed
412 * **Why not store**: Changes as evidence/scenarios change, users want fresh analysis
413 * **Speed**: 2-3 seconds (acceptable)
414 **Alternative**: Store "last verdict" as cached field, recompute only if claim edited or marked stale
415 * **Recommendation**: ✅ STORE cached version, mark stale when changes occur
416 ==== Search Results ====
417 * **What**: Lists of claims matching search query
418 * **Compute from**: Elasticsearch index
419 * **Cache**: 15 minutes in Redis for popular queries
420 * **Why not store permanently**: Constantly changing, infinite possible queries
421 ==== Aggregated Statistics ====
422 * **What**: "Total claims: 1,234,567", "Average confidence: 78%", etc.
423 * **Compute from**: Database queries
424 * **Cache**: 1 hour in Redis
425 * **Why not store**: Can be derived, relatively cheap to compute
426 ==== User Reputation ====
427 * **What**: Score based on contributions
428 * **Current design**: Stored in User table
429 * **Alternative**: Compute from Edit table
430 * **Trade-off**:
** Stored: Fast, simple
** Computed: Always accurate, no denormalization
433 * **Frequency**: Read on every user action
434 * **Compute cost**: Simple COUNT query, milliseconds
435 * **Decision**: ✅ STORE - Performance critical, read-heavy
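For illustration, the two alternatives side by side, assuming psycopg-style connections and hypothetical `edit` / `app_user` table names:

{{code language="python"}}
def contributions_computed(conn, user_id):
    """The 'compute' alternative: derive the count from the Edit table."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM edit WHERE user_id = %s", (user_id,))
        return cur.fetchone()[0]

def contributions_stored(conn, user_id):
    """The chosen design: read the denormalized counter on the user row."""
    with conn.cursor() as cur:
        cur.execute("SELECT contributions_count FROM app_user WHERE id = %s", (user_id,))
        return cur.fetchone()[0]
{{/code}}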
436 === Summary Table ===
|=Data Type|=Storage|=Compute|=Size per Claim|=Decision|=Rationale
|Claim core|✅|-|1 KB|STORE|Essential
|Evidence|✅|-|2 KB × 5 = 10 KB|STORE|Reproducibility
|Sources|✅|-|500 B (shared)|STORE|Track record
|Edit history|✅|-|2 KB × 20% = 400 B avg|STORE|Audit
|Analysis summary|✅|Once|2 KB|STORE (cached)|Cost-effective
|Confidence score|✅|Once|4 B|STORE (cached)|Fast access
|Risk score|✅|Once|4 B|STORE (cached)|Fast access
|Scenarios|✅|When stale|3 KB|STORE (hybrid)|Balance cost/speed
|Verdict|✅|When stale|1 KB|STORE (cached)|Fast access
|Flags|✅|-|500 B × 10% = 50 B avg|STORE|Improvement
|ErrorPatterns|✅|-|1 KB (global)|STORE|Learning
|QualityMetrics|✅|-|200 B (time series)|STORE|Trending
|Search results|-|✅|-|COMPUTE + 15 min cache|Dynamic
|Aggregations|-|✅|-|COMPUTE + 1 hr cache|Derivable
453 **Total storage per claim**: ~18 KB (without edits and flags)
454 **For 1 million claims**:
455 * **Storage**: ~18 GB (manageable)
456 * **PostgreSQL**: ~$50/month (standard instance)
457 * **Redis cache**: ~$20/month (1 GB instance)
458 * **S3 archives**: ~$5/month (old edits)
459 * **Total**: ~$75/month infrastructure
460 **LLM cost savings by caching**:
461 * Analysis summary stored: Save $0.03 per claim = $30K per 1M claims
462 * Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
463 * Verdict stored: Save $0.003 per claim = $3K per 1M claims
464 * **Total savings**: ~$35K per 1M claims vs recomputing every time
465 === Recomputation Triggers ===
466 **When to mark cached data as stale and recompute:**
467 1. **User edits claim** → Recompute: all (analysis, scenarios, verdict, scores)
468 2. **Evidence added** → Recompute: scenarios, verdict, confidence score
469 3. **Source track record changes >10 points** → Recompute: confidence score, verdict
470 4. **System improvement deployed** → Mark affected claims stale, recompute on next view
471 5. **Controversy detected** (high flag rate) → Recompute: risk score
472 **Recomputation strategy**:
473 * **Eager**: Immediately recompute (for user edits)
474 * **Lazy**: Recompute on next view (for system improvements)
475 * **Batch**: Nightly re-evaluation of stale claims (if <1000)
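A minimal sketch of dispatching these triggers, with hypothetical trigger names and a `mark_stale` helper; only user edits take the eager path:

{{code language="python"}}
# Hypothetical mapping of triggers to the cached fields they invalidate
TRIGGERS = {
    "claim_edited":         {"analysis", "scenarios", "verdict", "confidence", "risk"},
    "evidence_added":       {"scenarios", "verdict", "confidence"},
    "source_score_shift":   {"confidence", "verdict"},  # only when change > 10 points
    "system_improvement":   {"analysis", "scenarios", "verdict"},
    "controversy_detected": {"risk"},
}

def handle_trigger(claim, trigger, db, recompute_now):
    """Mark affected fields stale; recompute eagerly only for user edits."""
    stale_fields = TRIGGERS[trigger]
    db.mark_stale(claim["id"], stale_fields)
    if trigger == "claim_edited":
        recompute_now(claim, stale_fields)  # eager path: the user is waiting
    # Otherwise lazy: recomputed on next view, or by the nightly batch job
{{/code}}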
476 === Database Size Projection ===
477 **Year 1**: 10K claims
478 * Storage: 180 MB
479 * Cost: $10/month
480 **Year 3**: 100K claims
481 * Storage: 1.8 GB
482 * Cost: $30/month
483 **Year 5**: 1M claims
484 * Storage: 18 GB
485 * Cost: $75/month
486 **Year 10**: 10M claims
487 * Storage: 180 GB
488 * Cost: $300/month
489 * Optimization: Archive old claims to S3 ($5/TB/month)
490 **Conclusion**: Storage costs are manageable, LLM cost savings are substantial.
== 4. Key Simplifications ==
* **Two content states only**: Published, Hidden
* **Four user roles only**: Reader, Contributor, Moderator, Admin
* **No complex versioning**: Linear edit history
* **Manual role assignment (V1.0)**: No point-based reputation or role hierarchy
* **Source track records**: Continuous evaluation
== 5. What Gets Stored in the Database ==
=== 5.1 Primary Storage (PostgreSQL) ===
499 **Claims Table**:
500 * Current state only (latest version)
501 * Fields: id, assertion, domain, status, confidence_score, risk_score, completeness_score, version, created_at, updated_at
502 **Evidence Table**:
503 * All evidence records
504 * Fields: id, claim_id, source_id, excerpt, url, relevance_score, supports, extracted_at, archived
505 **Scenario Table**:
506 * All scenarios for each claim
* Fields: id, claim_id, description, assumptions (JSONB), extracted_from, verdict_summary, confidence, created_at, updated_at
508 **Source Table**:
509 * Track record database (continuously updated)
510 * Fields: id, name, domain, type, track_record_score, accuracy_history (JSON), correction_frequency, last_updated, claim_count, corrections_count
511 **User Table**:
512 * All user accounts
* Fields: id, username, email, role (Reader/Contributor/Moderator/Admin), reputation, created_at, last_active, contributions_count, flags_submitted, flags_accepted
514 **Edit Table**:
515 * Complete version history
516 * Fields: id, entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at
517 **Flag Table**:
518 * User-reported issues
519 * Fields: id, entity_type, entity_id, reported_by, issue_type, description, status, resolved_by, resolution_note, created_at, resolved_at
520 **ErrorPattern Table**:
521 * System improvement queue
522 * Fields: id, error_category, claim_id, description, root_cause, frequency, status, created_at, fixed_at
523 **QualityMetric Table**:
524 * Time-series quality data
525 * Fields: id, metric_type, metric_category, value, target, timestamp
=== 5.2 What's NOT Stored Permanently (Computed or Refreshed on Demand) ===
* **Verdicts**: Synthesized from evidence + scenarios, cached on the claim, recomputed when marked stale
* **Risk scores**: Cached on the claim, recalculated when recomputation triggers fire
* **Aggregated statistics**: Computed from base data
* **Search results**: Generated from Elasticsearch index
=== 5.3 Cache Layer (Redis) ===
532 **Cached for performance**:
533 * Frequently accessed claims (TTL: 1 hour)
534 * Search results (TTL: 15 minutes)
535 * User sessions (TTL: 24 hours)
536 * Source track records (TTL: 1 hour)
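A minimal cache-aside sketch for the first item, assuming a local Redis instance and a `load_from_db` callback; the other TTLs follow the same pattern:

{{code language="python"}}
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

TTL_SECONDS = {"claim": 3600, "search": 900, "session": 86400, "source": 3600}

def get_claim_cached(claim_id, load_from_db):
    """Cache-aside read for a frequently accessed claim (TTL: 1 hour)."""
    key = f"claim:{claim_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    claim = load_from_db(claim_id)
    r.setex(key, TTL_SECONDS["claim"], json.dumps(claim, default=str))
    return claim
{{/code}}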
=== 5.4 File Storage (S3) ===
538 **Archived content**:
539 * Old edit history (>3 months)
540 * Evidence documents (archived copies)
541 * Database backups
542 * Export files
=== 5.5 Search Index (Elasticsearch) ===
544 **Indexed for search**:
545 * Claim assertions (full-text)
546 * Evidence excerpts (full-text)
547 * Scenario descriptions (full-text)
548 * Source names (autocomplete)
549 Synchronized from PostgreSQL via change data capture or periodic sync.
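A minimal sketch of the periodic-sync option, assuming a local Elasticsearch cluster, a `claims` index, and a hypothetical `claims_updated_since` query helper:

{{code language="python"}}
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def sync_claims(db, since):
    """Periodic sync: bulk-index claims changed since the last run."""
    actions = (
        {
            "_index": "claims",
            "_id": row["id"],
            "_source": {"assertion": row["assertion"], "domain": row["domain"]},
        }
        for row in db.claims_updated_since(since)
    )
    helpers.bulk(es, actions)
{{/code}}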
== 6. Related Pages ==
551 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
552 * [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
553 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]