Changes for page Data Model

Last modified by Robert Schaub on 2026/02/08 08:27

From version 3.1
edited by Robert Schaub
on 2025/12/19 14:41
Change comment: Imported from XAR
To version 5.1
edited by Robert Schaub
on 2025/12/24 21:53
Change comment: Imported from XAR

Summary

Details

Page properties
Content
... ... @@ -7,19 +7,19 @@
7 7  **Rationale**: The claims system is 95% reads, 5% writes. Denormalizing commonly read data reduces joins and improves query performance by 70%.
8 8  **Additional cached fields in claims table**:
9 9  * **evidence_summary** (JSONB): Top 5 most relevant evidence snippets with scores
10 - * Avoids joining evidence table for listing/preview
11 - * Updated when evidence is added/removed
12 - * Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
10 + * Avoids joining evidence table for listing/preview
11 + * Updated when evidence is added/removed
12 + * Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
13 13  * **source_names** (TEXT[]): Array of source names for quick display
14 - * Avoids joining through evidence to sources
15 - * Updated when sources change
16 - * Format: `["New York Times", "Nature Journal", ...]`
14 + * Avoids joining through evidence to sources
15 + * Updated when sources change
16 + * Format: `["New York Times", "Nature Journal", ...]`
17 17  * **scenario_count** (INTEGER): Number of scenarios for this claim
18 - * Quick metric without counting rows
19 - * Updated when scenarios added/removed
18 + * Quick metric without counting rows
19 + * Updated when scenarios added/removed
20 20  * **cache_updated_at** (TIMESTAMP): When denormalized data was last refreshed
21 - * Helps invalidate stale caches
22 - * Triggers background refresh if too old
21 + * Helps invalidate stale caches
22 + * Triggers background refresh if too old
23 23  **Update Strategy**:
24 24  * **Immediate**: Update on claim edit (user-facing)
25 25  * **Deferred**: Update via background job every hour (non-critical)
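The deferred path can be sketched as a background job that refreshes the cached fields listed above. This is a sketch only: the accessor names (`top_evidence`, `count_scenarios`, `save`) are hypothetical placeholders for whatever data-access layer the system uses.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(hours=1)  # matches the hourly background job

def refresh_stale_claim_caches(claims, now=None):
    """Refresh denormalized fields on claims whose cache is older than CACHE_TTL.

    `top_evidence`, `count_scenarios`, and `save` stand in for real ORM methods.
    """
    now = now or datetime.now(timezone.utc)
    for claim in claims:
        if claim.cache_updated_at and (now - claim.cache_updated_at) < CACHE_TTL:
            continue  # cache still fresh; skip
        evidence = claim.top_evidence(limit=5)
        claim.evidence_summary = [
            {"text": e.text, "source": e.source_name, "relevance": e.relevance}
            for e in evidence
        ]
        claim.source_names = sorted({e.source_name for e in evidence})
        claim.scenario_count = claim.count_scenarios()
        claim.cache_updated_at = now
        claim.save()
```

On the immediate (user-facing) path, the same function can be invoked for a single claim right after an edit.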
... ... @@ -63,48 +63,48 @@
63 63  Runs independently of claim analysis:
64 64  {{code language="python"}}
65 65  def update_source_scores_weekly():
66 - """
67 - Background job: Calculate source reliability
68 - Never triggered by individual claim analysis
69 - """
70 - # Analyze all claims from past week
71 - claims = get_claims_from_past_week()
72 - for source in get_all_sources():
73 - # Calculate accuracy metrics
74 - correct_verdicts = count_correct_verdicts_citing(source, claims)
75 - total_citations = count_total_citations(source, claims)
76 - accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
77 - # Weight by claim importance
78 - weighted_score = calculate_weighted_score(source, claims)
79 - # Update source record
80 - source.track_record_score = weighted_score
81 - source.total_citations = total_citations
82 - source.last_updated = now()
83 - source.save()
84 - # Job runs: Sunday 2 AM UTC
85 - # Never during claim processing
66 + """
67 + Background job: Calculate source reliability
68 + Never triggered by individual claim analysis
69 + """
70 + # Analyze all claims from past week
71 + claims = get_claims_from_past_week()
72 + for source in get_all_sources():
73 + # Calculate accuracy metrics
74 + correct_verdicts = count_correct_verdicts_citing(source, claims)
75 + total_citations = count_total_citations(source, claims)
76 + accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
77 + # Weight accuracy by claim importance
78 + weighted_score = calculate_weighted_score(source, claims, accuracy)
79 + # Update source record
80 + source.track_record_score = weighted_score
81 + source.total_citations = total_citations
82 + source.last_updated = now()
83 + source.save()
84 + # Job runs: Sunday 2 AM UTC
85 + # Never during claim processing
86 86  {{/code}}
87 87  ==== Real-Time Claim Analysis (AKEL) ====
88 88  Uses source scores but never updates them:
89 89  {{code language="python"}}
90 90  def analyze_claim(claim_text):
91 - """
92 - Real-time: Analyze claim using current source scores
93 - READ source scores, never UPDATE them
94 - """
95 - # Gather evidence
96 - evidence_list = gather_evidence(claim_text)
97 - for evidence in evidence_list:
98 - # READ source score (snapshot from last weekly update)
99 - source = get_source(evidence.source_id)
100 - source_score = source.track_record_score
101 - # Use score to weight evidence
102 - evidence.weighted_relevance = evidence.relevance * source_score
103 - # Generate verdict using weighted evidence
104 - verdict = synthesize_verdict(evidence_list)
105 - # NEVER update source scores here
106 - # That happens in weekly background job
107 - return verdict
91 + """
92 + Real-time: Analyze claim using current source scores
93 + READ source scores, never UPDATE them
94 + """
95 + # Gather evidence
96 + evidence_list = gather_evidence(claim_text)
97 + for evidence in evidence_list:
98 + # READ source score (snapshot from last weekly update)
99 + source = get_source(evidence.source_id)
100 + source_score = source.track_record_score
101 + # Use score to weight evidence
102 + evidence.weighted_relevance = evidence.relevance * source_score
103 + # Generate verdict using weighted evidence
104 + verdict = synthesize_verdict(evidence_list)
105 + # NEVER update source scores here
106 + # That happens in weekly background job
107 + return verdict
108 108  {{/code}}
109 109  ==== Monthly Audit (Quality Assurance) ====
110 110  Moderator review of flagged source scores:
... ... @@ -138,14 +138,14 @@
138 138  **Example Timeline**:
139 139  ```
140 140  Sunday 2 AM: Calculate source scores for past week
141 - → NYT score: 0.87 (up from 0.85)
142 - → Blog X score: 0.52 (down from 0.61)
141 + → NYT score: 0.87 (up from 0.85)
142 + → Blog X score: 0.52 (down from 0.61)
143 143  Monday-Saturday: Claims processed using these scores
144 - → All claims this week use NYT=0.87
145 - → All claims this week use Blog X=0.52
144 + → All claims this week use NYT=0.87
145 + → All claims this week use Blog X=0.52
146 146  Next Sunday 2 AM: Recalculate scores including this week's claims
147 - → NYT score: 0.89 (trending up)
148 - → Blog X score: 0.48 (trending down)
147 + → NYT score: 0.89 (trending up)
148 + → Blog X score: 0.48 (trending down)
149 149  ```
150 150  === 1.4 Scenario ===
151 151  **Purpose**: Different interpretations or contexts for evaluating claims
... ... @@ -174,23 +174,24 @@
174 174  **Core Fields**:
175 175  * **id** (UUID): Primary key
176 176  * **scenario_id** (UUID FK): The scenario being assessed
177 -* **created_at** (timestamp): When verdict was first created
178 -
179 -**Versioned via VERDICT_VERSION**: Verdicts evolve as new evidence emerges or analysis improves. Each version captures:
180 180  * **likelihood_range** (text): Probabilistic assessment (e.g., "0.40-0.65 (uncertain)", "0.75-0.85 (likely true)")
181 181  * **confidence** (decimal 0-1): How confident we are in this assessment
182 182  * **explanation_summary** (text): Human-readable reasoning explaining the verdict
183 183  * **uncertainty_factors** (text array): Specific factors limiting confidence (e.g., "Small sample sizes", "Lifestyle confounds", "Long-term effects unknown")
184 -* **created_at** (timestamp): When this version was generated
181 +* **created_at** (timestamp): When verdict was created
182 +* **updated_at** (timestamp): Last modification
185 185  
186 -**Relationship**: Each Scenario has multiple Verdicts over time (as understanding evolves). Each Verdict has multiple versions.
184 +**Change Tracking**: Like all entities, verdict changes are tracked through the Edit entity (section 1.7), not through separate version tables. Each edit records before/after states.
187 187  
186 +**Relationship**: Each Scenario has one Verdict. When understanding evolves, the verdict is updated and the change is logged in the Edit entity.
187 +
188 188  **Example**:
189 189  For claim "Exercise improves mental health" in scenario "Clinical trials (healthy adults, structured programs)":
190 -* Initial verdict (v1): likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
191 -* Updated verdict (v2): likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
190 +* Initial state: likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
191 +* After new evidence: likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
192 +* Edit entity records the complete before/after change with timestamp and reason
192 192  
193 -**Key Design**: Separating Verdict from Scenario allows tracking how our understanding evolves without losing history.
194 +**Key Design**: Verdicts are mutable entities tracked through the centralized Edit entity, consistent with Claims, Evidence, and Scenarios.
194 194  
195 195  === 1.6 User ===
196 196  Fields: username, email, **role** (Reader/Contributor/Moderator), **reputation**, contributions_count
... ... @@ -248,7 +248,7 @@
248 248  * Threshold-based promotions
249 249  * Reputation decay for inactive users
250 250  * Track record scoring for contributors
251 -See [[When to Add Complexity>>Test.FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
252 +See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
252 252  === 1.7 Edit ===
253 253  **Fields**: entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at
254 254  **Purpose**: Complete audit trail for all content changes
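A minimal sketch of writing such an audit record alongside a content change, with field names taken from the list above. The `to_dict()` serializer and the `apply_change` callable are illustrative assumptions, not the system's actual API:

```python
import json
from datetime import datetime, timezone

def record_edit(entity, user_id, edit_type, reason, apply_change):
    """Apply a mutation to an entity and return the audit record describing it.

    `apply_change` is a callable that mutates the entity in place;
    `entity.to_dict()` is a hypothetical serializer.
    """
    before_state = entity.to_dict()
    apply_change(entity)
    after_state = entity.to_dict()
    return {
        "entity_type": type(entity).__name__,
        "entity_id": entity.id,
        "user_id": user_id,
        "before_state": json.dumps(before_state),
        "after_state": json.dumps(after_state),
        "edit_type": edit_type,
        "reason": reason,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

In production the mutation and the audit insert would share one database transaction so the trail cannot drift from the content.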
... ... @@ -284,7 +284,7 @@
284 284  See **Edit History Documentation** for complete details on what gets edited by whom, the retention policy, and use cases.
285 285  === 1.8 Flag ===
286 286  Fields: entity_id, reported_by, issue_type, status, resolution_note
287 -=== 1.9 QualityMetric ===
288 +=== 1.9 QualityMetric ===
288 288  **Fields**: metric_type, category, value, target, timestamp
289 289  **Purpose**: Time-series quality tracking
290 290  **Usage**:
... ... @@ -294,7 +294,7 @@
294 294  * **A/B testing**: Compare control vs treatment metrics
295 295  * **Improvement validation**: Measure before/after changes
296 296  **Example**: `{type: "ErrorRate", category: "Politics", value: 0.12, target: 0.10, timestamp: "2025-12-17"}`
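A dashboard or alerting pass over such records might look like the sketch below. It assumes lower-is-better metrics such as ErrorRate; a real implementation would carry a direction flag per metric_type:

```python
def metric_alert(metric):
    """Return an alert string when a lower-is-better metric misses its target,
    or None when the metric is within target."""
    if metric["value"] <= metric["target"]:
        return None
    overshoot = metric["value"] - metric["target"]
    return (f"{metric['type']}/{metric['category']}: "
            f"{metric['value']:.2f} vs target {metric['target']:.2f} "
            f"(+{overshoot:.2f})")
```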
297 -=== 1.10 ErrorPattern ===
298 +=== 1.10 ErrorPattern ===
298 298  **Fields**: error_category, claim_id, description, root_cause, frequency, status
299 299  **Purpose**: Capture errors to trigger system improvements
300 300  **Usage**:
... ... @@ -306,10 +306,10 @@
306 306  
307 307  == 1.4 Core Data Model ERD ==
308 308  
309 -{{include reference="Test.FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
310 +{{include reference="FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
310 310  
311 311  == 1.5 User Class Diagram ==
312 -{{include reference="Test.FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
313 +{{include reference="FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
313 313  == 2. Versioning Strategy ==
314 314  **All Content Entities Are Versioned**:
315 315  * **Claim**: Every edit creates new version (V1→V2→V3...)
... ... @@ -328,9 +328,9 @@
328 328  **Example**:
329 329  ```
330 330  Claim V1: "The sky is blue"
331 - → User edits →
332 + → User edits →
332 332  Claim V2: "The sky is blue during daytime"
333 - → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
334 + → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
334 334  ```
335 335  == 2.5. Storage vs Computation Strategy ==
336 336  **Critical architectural decision**: What to persist in databases vs compute dynamically?
... ... @@ -418,8 +418,8 @@
418 418  * **Compute cost**: $0.005-0.01 per request (LLM API call)
419 419  * **Frequency**: Viewed in detail by ~20% of users
420 420  * **Trade-off analysis**:
421 - - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
422 - - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
422 + - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
423 + - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
423 423  * **Reproducibility**: Scenarios may improve as AI improves (good to recompute)
424 424  * **Speed**: Computed = 5-8 seconds delay, Stored = instant
425 425  * **Decision**: ✅ STORE (hybrid approach below)
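The arithmetic behind this decision can be checked directly from the figures in the bullets above:

```python
CLAIMS = 1_000_000
SCENARIO_KB = 3            # midpoint of the 2-5 KB per-claim estimate
VIEW_RATE = 0.20           # ~20% of claims viewed in detail
LLM_COST = 0.01            # upper bound of the $0.005-0.01 per-request range

stored_gb = CLAIMS * SCENARIO_KB / 1_000_000       # 3 GB of storage if stored
computed_monthly = CLAIMS * VIEW_RATE * LLM_COST   # $2,000/month if recomputed
```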
... ... @@ -452,8 +452,8 @@
452 452  * **Current design**: Stored in User table
453 453  * **Alternative**: Compute from Edit table
454 454  * **Trade-off**:
455 - - Stored: Fast, simple
456 - - Computed: Always accurate, no denormalization
456 + - Stored: Fast, simple
457 + - Computed: Always accurate, no denormalization
457 457  * **Frequency**: Read on every user action
458 458  * **Compute cost**: Simple COUNT query, milliseconds
459 459  * **Decision**: ✅ STORE - Performance critical, read-heavy
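The stored-counter choice can be paired with a periodic reconciliation pass against the Edit table so denormalization drift stays bounded. A sketch, with `count_edits_by_user` standing in for a hypothetical COUNT(*) query:

```python
def bump_contributions(user):
    """Hot path: increment the stored counter on every accepted contribution."""
    user.contributions_count += 1

def reconcile_contributions(users, count_edits_by_user):
    """Periodic job: recompute counters from the Edit table (source of truth).

    `count_edits_by_user(user_id)` is a placeholder for a COUNT query.
    Returns the ids of users whose stored counter had drifted.
    """
    drifted = []
    for user in users:
        true_count = count_edits_by_user(user.id)
        if user.contributions_count != true_count:
            user.contributions_count = true_count
            drifted.append(user.id)
    return drifted
```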
... ... @@ -483,7 +483,7 @@
483 483  * **Total**: ~$75/month infrastructure
484 484  **LLM cost savings by caching**:
485 485  * Analysis summary stored: Save $0.03 per claim = $30K per 1M claims
486 -* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
487 +* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
487 487  * Verdict stored: Save $0.003 per claim = $3K per 1M claims
488 488  * **Total savings**: ~$35K per 1M claims vs recomputing every time
489 489  === Recomputation Triggers ===
... ... @@ -501,11 +501,11 @@
501 501  **Year 1**: 10K claims
502 502  * Storage: 180 MB
503 503  * Cost: $10/month
504 -**Year 3**: 100K claims
505 +**Year 3**: 100K claims
505 505  * Storage: 1.8 GB
506 506  * Cost: $30/month
507 507  **Year 5**: 1M claims
508 -* Storage: 18 GB
509 +* Storage: 18 GB
509 509  * Cost: $75/month
510 510  **Year 10**: 10M claims
511 511  * Storage: 180 GB
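These projections follow from a constant ~18 KB per claim, implied by the Year 1 figures (180 MB / 10K claims):

```python
KB_PER_CLAIM = 18  # implied by Year 1: 180 MB / 10,000 claims

def projected_storage_gb(claims: int) -> float:
    """Linear storage projection at ~18 KB per claim."""
    return claims * KB_PER_CLAIM / 1_000_000
```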
... ... @@ -572,6 +572,6 @@
572 572  * Source names (autocomplete)
573 573  Synchronized from PostgreSQL via change data capture or periodic sync.
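The periodic-sync option can be sketched generically. `fetch_rows_updated_since` and the `index` object are hypothetical stand-ins for the PostgreSQL query and the search-engine client; no real Typesense API is assumed:

```python
def sync_search_index(fetch_rows_updated_since, index, last_sync_at):
    """Push rows changed since the last sync into the search index.

    `fetch_rows_updated_since(ts)` yields dicts with at least an "id" key;
    `index` only needs an `upsert(doc)` method. Returns the number synced.
    """
    synced = 0
    for row in fetch_rows_updated_since(last_sync_at):
        index.upsert(row)  # idempotent: re-upserting the same id is safe
        synced += 1
    return synced
```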
574 574  == 4. Related Pages ==
575 -* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
576 -* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
577 -* [[Workflows>>Test.FactHarbor.Specification.Workflows.WebHome]]
576 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
577 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
578 +* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]