Changes for page Data Model
Last modified by Robert Schaub on 2026/02/08 08:27
Summary
- Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
  - Content
@@ -7,19 +7,19 @@
 **Rationale**: Claims system is 95% reads, 5% writes. Denormalizing common data reduces joins and improves query performance by 70%.
 **Additional cached fields in claims table**:
 * **evidence_summary** (JSONB): Top 5 most relevant evidence snippets with scores
-* Avoids joining evidence table for listing/preview
-* Updated when evidence is added/removed
-* Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
+* Avoids joining evidence table for listing/preview
+* Updated when evidence is added/removed
+* Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
 * **source_names** (TEXT[]): Array of source names for quick display
-* Avoids joining through evidence to sources
-* Updated when sources change
-* Format: `["New York Times", "Nature Journal", ...]`
+* Avoids joining through evidence to sources
+* Updated when sources change
+* Format: `["New York Times", "Nature Journal", ...]`
 * **scenario_count** (INTEGER): Number of scenarios for this claim
-* Quick metric without counting rows
-* Updated when scenarios added/removed
+* Quick metric without counting rows
+* Updated when scenarios added/removed
 * **cache_updated_at** (TIMESTAMP): When denormalized data was last refreshed
-* Helps invalidate stale caches
-* Triggers background refresh if too old
+* Helps invalidate stale caches
+* Triggers background refresh if too old
 **Update Strategy**:
 * **Immediate**: Update on claim edit (user-facing)
 * **Deferred**: Update via background job every hour (non-critical)
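The cached fields in the hunk above pair a JSONB shape (`evidence_summary`) with a staleness marker (`cache_updated_at`). A minimal sketch of both, assuming illustrative helper names and a one-hour staleness threshold to match the hourly deferred job; neither is fixed by the spec:

```python
from datetime import datetime, timedelta, timezone

# Assumption: staleness threshold mirrors the hourly deferred refresh job
CACHE_MAX_AGE = timedelta(hours=1)

def build_evidence_summary(evidence_rows, top_n=5):
    """Produce the JSONB evidence_summary value: top-N snippets by
    relevance, in the documented [{"text", "source", "relevance"}] shape."""
    ranked = sorted(evidence_rows, key=lambda e: e["relevance"], reverse=True)
    return [{"text": e["text"], "source": e["source"], "relevance": e["relevance"]}
            for e in ranked[:top_n]]

def is_cache_stale(cache_updated_at, now=None):
    """True when cache_updated_at is old enough to trigger a background refresh."""
    now = now or datetime.now(timezone.utc)
    return now - cache_updated_at > CACHE_MAX_AGE
```

A listing view would read `evidence_summary` directly and, when `is_cache_stale` returns True, enqueue a refresh rather than join through the evidence table.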
@@ -63,48 +63,48 @@
 Runs independently of claim analysis:
 {{code language="python"}}
 def update_source_scores_weekly():
-    """
-    Background job: Calculate source reliability
-    Never triggered by individual claim analysis
-    """
-    # Analyze all claims from past week
-    claims = get_claims_from_past_week()
-    for source in get_all_sources():
-        # Calculate accuracy metrics
-        correct_verdicts = count_correct_verdicts_citing(source, claims)
-        total_citations = count_total_citations(source, claims)
-        accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
-        # Weight by claim importance
-        weighted_score = calculate_weighted_score(source, claims)
-        # Update source record
-        source.track_record_score = weighted_score
-        source.total_citations = total_citations
-        source.last_updated = now()
-        source.save()
-# Job runs: Sunday 2 AM UTC
-# Never during claim processing
+    """
+    Background job: Calculate source reliability
+    Never triggered by individual claim analysis
+    """
+    # Analyze all claims from past week
+    claims = get_claims_from_past_week()
+    for source in get_all_sources():
+        # Calculate accuracy metrics
+        correct_verdicts = count_correct_verdicts_citing(source, claims)
+        total_citations = count_total_citations(source, claims)
+        accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
+        # Weight by claim importance
+        weighted_score = calculate_weighted_score(source, claims)
+        # Update source record
+        source.track_record_score = weighted_score
+        source.total_citations = total_citations
+        source.last_updated = now()
+        source.save()
+# Job runs: Sunday 2 AM UTC
+# Never during claim processing
 {{/code}}
 ==== Real-Time Claim Analysis (AKEL) ====
 Uses source scores but never updates them:
 {{code language="python"}}
 def analyze_claim(claim_text):
-    """
-    Real-time: Analyze claim using current source scores
-    READ source scores, never UPDATE them
-    """
-    # Gather evidence
-    evidence_list = gather_evidence(claim_text)
-    for evidence in evidence_list:
-        # READ source score (snapshot from last weekly update)
-        source = get_source(evidence.source_id)
-        source_score = source.track_record_score
-        # Use score to weight evidence
-        evidence.weighted_relevance = evidence.relevance * source_score
-    # Generate verdict using weighted evidence
-    verdict = synthesize_verdict(evidence_list)
-    # NEVER update source scores here
-    # That happens in weekly background job
-    return verdict
+    """
+    Real-time: Analyze claim using current source scores
+    READ source scores, never UPDATE them
+    """
+    # Gather evidence
+    evidence_list = gather_evidence(claim_text)
+    for evidence in evidence_list:
+        # READ source score (snapshot from last weekly update)
+        source = get_source(evidence.source_id)
+        source_score = source.track_record_score
+        # Use score to weight evidence
+        evidence.weighted_relevance = evidence.relevance * source_score
+    # Generate verdict using weighted evidence
+    verdict = synthesize_verdict(evidence_list)
+    # NEVER update source scores here
+    # That happens in weekly background job
+    return verdict
 {{/code}}
 ==== Monthly Audit (Quality Assurance) ====
 Moderator review of flagged source scores:
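The two code paths in this hunk rest on two small pieces of arithmetic: the accuracy ratio with a neutral 0.5 prior for sources that were never cited, and relevance weighting by track-record score. A runnable sketch of just that arithmetic; the function names are illustrative, and the spec's own helpers such as calculate_weighted_score remain abstract:

```python
def source_accuracy(correct_verdicts, total_citations):
    """Weekly-job metric: fraction of correct verdicts among citations,
    defaulting to a neutral 0.5 when a source was never cited."""
    return correct_verdicts / total_citations if total_citations > 0 else 0.5

def weight_evidence(evidence_list, track_record_scores):
    """Real-time path: scale each item's relevance by its source's
    track-record score; scores are read here, never updated."""
    for ev in evidence_list:
        ev["weighted_relevance"] = ev["relevance"] * track_record_scores[ev["source_id"]]
    return evidence_list
```

The 0.5 default matters: a brand-new source neither boosts nor penalizes the evidence it backs until the weekly job has data on it.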
@@ -138,14 +138,14 @@
 **Example Timeline**:
 ```
 Sunday 2 AM: Calculate source scores for past week
-  → NYT score: 0.87 (up from 0.85)
-  → Blog X score: 0.52 (down from 0.61)
+  → NYT score: 0.87 (up from 0.85)
+  → Blog X score: 0.52 (down from 0.61)
 Monday-Saturday: Claims processed using these scores
-  → All claims this week use NYT=0.87
-  → All claims this week use Blog X=0.52
+  → All claims this week use NYT=0.87
+  → All claims this week use Blog X=0.52
 Next Sunday 2 AM: Recalculate scores including this week's claims
-  → NYT score: 0.89 (trending up)
-  → Blog X score: 0.48 (trending down)
+  → NYT score: 0.89 (trending up)
+  → Blog X score: 0.48 (trending down)
 ```
 === 1.4 Scenario ===
 **Purpose**: Different interpretations or contexts for evaluating claims
@@ -174,23 +174,24 @@
 **Core Fields**:
 * **id** (UUID): Primary key
 * **scenario_id** (UUID FK): The scenario being assessed
-* **created_at** (timestamp): When verdict was first created
-
-**Versioned via VERDICT_VERSION**: Verdicts evolve as new evidence emerges or analysis improves. Each version captures:
 * **likelihood_range** (text): Probabilistic assessment (e.g., "0.40-0.65 (uncertain)", "0.75-0.85 (likely true)")
 * **confidence** (decimal 0-1): How confident we are in this assessment
 * **explanation_summary** (text): Human-readable reasoning explaining the verdict
 * **uncertainty_factors** (text array): Specific factors limiting confidence (e.g., "Small sample sizes", "Lifestyle confounds", "Long-term effects unknown")
-* **created_at** (timestamp): When this version was generated
+* **created_at** (timestamp): When verdict was created
+* **updated_at** (timestamp): Last modification
 
-**Relationship**: Each Scenario has multiple Verdicts over time (as understanding evolves). Each Verdict has multiple versions.
+**Change Tracking**: Like all entities, verdict changes are tracked through the Edit entity (section 1.7), not through separate version tables. Each edit records before/after states.
 
+**Relationship**: Each Scenario has one Verdict. When understanding evolves, the verdict is updated and the change is logged in the Edit entity.
+
 **Example**:
 For claim "Exercise improves mental health" in scenario "Clinical trials (healthy adults, structured programs)":
-* Initial verdict (v1): likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
-* Updated verdict (v2): likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
+* Initial state: likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
+* After new evidence: likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
+* Edit entity records the complete before/after change with timestamp and reason
 
-**Key Design**: Separating Verdict from Scenario allows tracking how our understanding evolves without losing history.
+**Key Design**: Verdicts are mutable entities tracked through the centralized Edit entity, consistent with Claims, Evidence, and Scenarios.
 
 === 1.6 User ===
 Fields: username, email, **role** (Reader/Contributor/Moderator), **reputation**, contributions_count
@@ -248,7 +248,7 @@
 * Threshold-based promotions
 * Reputation decay for inactive users
 * Track record scoring for contributors
-See [[When to Add Complexity>>Test.FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
+See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
 === 1.7 Edit ===
 **Fields**: entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at
 **Purpose**: Complete audit trail for all content changes
@@ -284,7 +284,7 @@
 See **Edit History Documentation** for complete details on what gets edited by whom, retention policy, and use cases
 === 1.8 Flag ===
 Fields: entity_id, reported_by, issue_type, status, resolution_note
-=== 1.9 QualityMetric ===
+=== 1.9 QualityMetric ===
 **Fields**: metric_type, category, value, target, timestamp
 **Purpose**: Time-series quality tracking
 **Usage**:
@@ -294,7 +294,7 @@
 * **A/B testing**: Compare control vs treatment metrics
 * **Improvement validation**: Measure before/after changes
 **Example**: `{type: "ErrorRate", category: "Politics", value: 0.12, target: 0.10, timestamp: "2025-12-17"}`
-=== 1.10 ErrorPattern ===
+=== 1.10 ErrorPattern ===
 **Fields**: error_category, claim_id, description, root_cause, frequency, status
 **Purpose**: Capture errors to trigger system improvements
 **Usage**:
@@ -306,10 +306,10 @@
 
 == 1.4 Core Data Model ERD ==
 
-{{include reference="Test.FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
+{{include reference="FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
 
 == 1.5 User Class Diagram ==
-{{include reference="Test.FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
+{{include reference="FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
 == 2. Versioning Strategy ==
 **All Content Entities Are Versioned**:
 * **Claim**: Every edit creates new version (V1→V2→V3...)
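The Edit entity of section 1.7, which the revised verdict design above now relies on, amounts to one row per change with both states serialized. A sketch of assembling such a row; the function name and the "update" edit_type literal are assumptions for illustration, taken only from the field list:

```python
import json
from datetime import datetime, timezone

def record_edit(entity_type, entity_id, user_id, before, after, reason,
                edit_type="update"):
    """Build an Edit audit row: full before/after states serialized as
    JSON, matching the fields listed in section 1.7."""
    return {
        "entity_type": entity_type,
        "entity_id": entity_id,
        "user_id": user_id,
        "before_state": json.dumps(before),
        "after_state": json.dumps(after),
        "edit_type": edit_type,
        "reason": reason,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because both states are stored whole, any entity's history can be replayed from its Edit rows alone, which is what lets verdicts stay mutable without a separate version table.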
@@ -328,9 +328,9 @@
 **Example**:
 ```
 Claim V1: "The sky is blue"
-  → User edits →
+  → User edits →
 Claim V2: "The sky is blue during daytime"
-  → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
+  → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
 ```
 == 2.5. Storage vs Computation Strategy ==
 **Critical architectural decision**: What to persist in databases vs compute dynamically?
@@ -418,8 +418,8 @@
 * **Compute cost**: $0.005-0.01 per request (LLM API call)
 * **Frequency**: Viewed in detail by ~20% of users
 * **Trade-off analysis**:
-  - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
-  - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
+  - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
+  - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
 * **Reproducibility**: Scenarios may improve as AI improves (good to recompute)
 * **Speed**: Computed = 5-8 seconds delay, Stored = instant
 * **Decision**: ✅ STORE (hybrid approach below)
@@ -452,8 +452,8 @@
 * **Current design**: Stored in User table
 * **Alternative**: Compute from Edit table
 * **Trade-off**:
-  - Stored: Fast, simple
-  - Computed: Always accurate, no denormalization
+  - Stored: Fast, simple
+  - Computed: Always accurate, no denormalization
 * **Frequency**: Read on every user action
 * **Compute cost**: Simple COUNT query, milliseconds
 * **Decision**: ✅ STORE - Performance critical, read-heavy
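The IF STORED / IF COMPUTED analysis in the scenario hunk above reduces to two one-line cost models. A sketch; the per-GB rate default is an assumption back-derived from the hunk's "3 GB ≈ $0.05/month" figure, not a quoted price:

```python
def monthly_compute_cost(claims, view_rate, cost_per_request):
    """LLM spend if scenarios are regenerated on every detailed view."""
    return claims * view_rate * cost_per_request

def monthly_storage_cost(claims, kb_per_claim, dollars_per_gb=0.0167):
    """Storage spend if scenarios are persisted once (decimal GB)."""
    return claims * kb_per_claim / 1_000_000 * dollars_per_gb
```

With the hunk's numbers, recomputing costs 1M × 20% × $0.01 = $2,000/month while storing 3 GB costs about five cents, which is the arithmetic behind the ✅ STORE decision.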
@@ -483,7 +483,7 @@
 * **Total**: ~$75/month infrastructure
 **LLM cost savings by caching**:
 * Analysis summary stored: Save $0.03 per claim = $30K per 1M claims
-* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
+* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
 * Verdict stored: Save $0.003 per claim = $3K per 1M claims
 * **Total savings**: ~$35K per 1M claims vs recomputing every time
 === Recomputation Triggers ===
@@ -501,11 +501,11 @@
 **Year 1**: 10K claims
 * Storage: 180 MB
 * Cost: $10/month
-**Year 3**: 100K claims
+**Year 3**: 100K claims
 * Storage: 1.8 GB
 * Cost: $30/month
 **Year 5**: 1M claims
-* Storage: 18 GB
+* Storage: 18 GB
 * Cost: $75/month
 **Year 10**: 10M claims
 * Storage: 180 GB
@@ -572,6 +572,6 @@
 * Source names (autocomplete)
 Synchronized from PostgreSQL via change data capture or periodic sync.
 == 4. Related Pages ==
-* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
-* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
-* [[Workflows>>Test.FactHarbor.Specification.Workflows.WebHome]]
+* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
+* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
+* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
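The year-by-year storage projections in the hunks above follow a linear model of roughly 18 KB per claim (180 MB / 10K claims), stated here as a one-line sketch; the function is illustrative and uses decimal megabytes, as the projections themselves appear to:

```python
def projected_storage_mb(claims, kb_per_claim=18):
    """Linear growth model implied by the projections:
    10K claims -> 180 MB, 1M claims -> 18 GB (decimal units)."""
    return claims * kb_per_claim / 1000
```

The same model reproduces the Year 10 figure: 10M claims × 18 KB ≈ 180 GB.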