Changes for page Data Model

Last modified by Robert Schaub on 2026/02/08 08:27

From version 3.1
edited by Robert Schaub
on 2025/12/19 14:41
Change comment: Imported from XAR
To version 5.1
edited by Robert Schaub
on 2025/12/24 21:53
Change comment: Imported from XAR

Summary

Details

Page properties
Content
... ... @@ -7,19 +7,19 @@
7 7  **Rationale**: The claims system is 95% reads, 5% writes. Denormalizing commonly read data reduces joins and improves query performance by 70%.
8 8  **Additional cached fields in claims table**:
9 9  * **evidence_summary** (JSONB): Top 5 most relevant evidence snippets with scores
10 - * Avoids joining evidence table for listing/preview
11 - * Updated when evidence is added/removed
12 - * Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
10 + * Avoids joining evidence table for listing/preview
11 + * Updated when evidence is added/removed
12 + * Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
13 13  * **source_names** (TEXT[]): Array of source names for quick display
14 - * Avoids joining through evidence to sources
15 - * Updated when sources change
16 - * Format: `["New York Times", "Nature Journal", ...]`
14 + * Avoids joining through evidence to sources
15 + * Updated when sources change
16 + * Format: `["New York Times", "Nature Journal", ...]`
17 17  * **scenario_count** (INTEGER): Number of scenarios for this claim
18 - * Quick metric without counting rows
19 - * Updated when scenarios added/removed
18 + * Quick metric without counting rows
19 + * Updated when scenarios added/removed
20 20  * **cache_updated_at** (TIMESTAMP): When denormalized data was last refreshed
21 - * Helps invalidate stale caches
22 - * Triggers background refresh if too old
21 + * Helps invalidate stale caches
22 + * Triggers background refresh if too old
23 23  **Update Strategy**:
24 24  * **Immediate**: Update on claim edit (user-facing)
25 25  * **Deferred**: Update via background job every hour (non-critical)
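The deferred path can be sketched as a background job that refreshes the cached fields listed above. This is a sketch only: the accessor names (`top_evidence`, `count_scenarios`, `save`) are hypothetical placeholders for whatever data-access layer the system uses.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(hours=1)  # matches the hourly background job

def refresh_stale_claim_caches(claims, now=None):
    """Refresh denormalized fields on claims whose cache is older than CACHE_TTL.

    `top_evidence`, `count_scenarios`, and `save` stand in for real ORM methods.
    """
    now = now or datetime.now(timezone.utc)
    for claim in claims:
        if claim.cache_updated_at and (now - claim.cache_updated_at) < CACHE_TTL:
            continue  # cache still fresh; skip
        evidence = claim.top_evidence(limit=5)
        claim.evidence_summary = [
            {"text": e.text, "source": e.source_name, "relevance": e.relevance}
            for e in evidence
        ]
        claim.source_names = sorted({e.source_name for e in evidence})
        claim.scenario_count = claim.count_scenarios()
        claim.cache_updated_at = now
        claim.save()
```

On the immediate (user-facing) path, the same function can be invoked for a single claim right after an edit.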
... ... @@ -63,48 +63,48 @@
63 63  Runs independently of claim analysis:
64 64  {{code language="python"}}
65 65  def update_source_scores_weekly():
66 - """
67 - Background job: Calculate source reliability
68 - Never triggered by individual claim analysis
69 - """
70 - # Analyze all claims from past week
71 - claims = get_claims_from_past_week()
72 - for source in get_all_sources():
73 - # Calculate accuracy metrics
74 - correct_verdicts = count_correct_verdicts_citing(source, claims)
75 - total_citations = count_total_citations(source, claims)
76 - accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
77 - # Weight by claim importance
78 - weighted_score = calculate_weighted_score(source, claims)
79 - # Update source record
80 - source.track_record_score = weighted_score
81 - source.total_citations = total_citations
82 - source.last_updated = now()
83 - source.save()
84 - # Job runs: Sunday 2 AM UTC
85 - # Never during claim processing
66 + """
67 + Background job: Calculate source reliability
68 + Never triggered by individual claim analysis
69 + """
70 + # Analyze all claims from past week
71 + claims = get_claims_from_past_week()
72 + for source in get_all_sources():
73 + # Calculate accuracy metrics
74 + correct_verdicts = count_correct_verdicts_citing(source, claims)
75 + total_citations = count_total_citations(source, claims)
76 + accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
77 + # Weight accuracy by claim importance
78 + weighted_score = calculate_weighted_score(source, claims, accuracy)
79 + # Update source record
80 + source.track_record_score = weighted_score
81 + source.total_citations = total_citations
82 + source.last_updated = now()
83 + source.save()
84 + # Job runs: Sunday 2 AM UTC
85 + # Never during claim processing
86 86  {{/code}}
87 87  ==== Real-Time Claim Analysis (AKEL) ====
88 88  Uses source scores but never updates them:
89 89  {{code language="python"}}
90 90  def analyze_claim(claim_text):
91 - """
92 - Real-time: Analyze claim using current source scores
93 - READ source scores, never UPDATE them
94 - """
95 - # Gather evidence
96 - evidence_list = gather_evidence(claim_text)
97 - for evidence in evidence_list:
98 - # READ source score (snapshot from last weekly update)
99 - source = get_source(evidence.source_id)
100 - source_score = source.track_record_score
101 - # Use score to weight evidence
102 - evidence.weighted_relevance = evidence.relevance * source_score
103 - # Generate verdict using weighted evidence
104 - verdict = synthesize_verdict(evidence_list)
105 - # NEVER update source scores here
106 - # That happens in weekly background job
107 - return verdict
91 + """
92 + Real-time: Analyze claim using current source scores
93 + READ source scores, never UPDATE them
94 + """
95 + # Gather evidence
96 + evidence_list = gather_evidence(claim_text)
97 + for evidence in evidence_list:
98 + # READ source score (snapshot from last weekly update)
99 + source = get_source(evidence.source_id)
100 + source_score = source.track_record_score
101 + # Use score to weight evidence
102 + evidence.weighted_relevance = evidence.relevance * source_score
103 + # Generate verdict using weighted evidence
104 + verdict = synthesize_verdict(evidence_list)
105 + # NEVER update source scores here
106 + # That happens in weekly background job
107 + return verdict
108 108  {{/code}}
109 109  ==== Monthly Audit (Quality Assurance) ====
110 110  Moderator review of flagged source scores:
... ... @@ -138,14 +138,14 @@
138 138  **Example Timeline**:
139 139  ```
140 140  Sunday 2 AM: Calculate source scores for past week
141 - → NYT score: 0.87 (up from 0.85)
142 - → Blog X score: 0.52 (down from 0.61)
141 + → NYT score: 0.87 (up from 0.85)
142 + → Blog X score: 0.52 (down from 0.61)
143 143  Monday-Saturday: Claims processed using these scores
144 - → All claims this week use NYT=0.87
145 - → All claims this week use Blog X=0.52
144 + → All claims this week use NYT=0.87
145 + → All claims this week use Blog X=0.52
146 146  Next Sunday 2 AM: Recalculate scores including this week's claims
147 - → NYT score: 0.89 (trending up)
148 - → Blog X score: 0.48 (trending down)
147 + → NYT score: 0.89 (trending up)
148 + → Blog X score: 0.48 (trending down)
149 149  ```
150 150  === 1.4 Scenario ===
151 151  **Purpose**: Different interpretations or contexts for evaluating claims
... ... @@ -174,23 +174,24 @@
174 174  **Core Fields**:
175 175  * **id** (UUID): Primary key
176 176  * **scenario_id** (UUID FK): The scenario being assessed
177 -* **created_at** (timestamp): When verdict was first created
178 -
179 -**Versioned via VERDICT_VERSION**: Verdicts evolve as new evidence emerges or analysis improves. Each version captures:
180 180  * **likelihood_range** (text): Probabilistic assessment (e.g., "0.40-0.65 (uncertain)", "0.75-0.85 (likely true)")
181 181  * **confidence** (decimal 0-1): How confident we are in this assessment
182 182  * **explanation_summary** (text): Human-readable reasoning explaining the verdict
183 183  * **uncertainty_factors** (text array): Specific factors limiting confidence (e.g., "Small sample sizes", "Lifestyle confounds", "Long-term effects unknown")
184 -* **created_at** (timestamp): When this version was generated
181 +* **created_at** (timestamp): When verdict was created
182 +* **updated_at** (timestamp): Last modification
185 185  
186 -**Relationship**: Each Scenario has multiple Verdicts over time (as understanding evolves). Each Verdict has multiple versions.
184 +**Change Tracking**: Like all entities, verdict changes are tracked through the Edit entity (section 1.7), not through separate version tables. Each edit records before/after states.
187 187  
186 +**Relationship**: Each Scenario has one Verdict. When understanding evolves, the verdict is updated and the change is logged in the Edit entity.
187 +
188 188  **Example**:
189 189  For claim "Exercise improves mental health" in scenario "Clinical trials (healthy adults, structured programs)":
190 -* Initial verdict (v1): likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
191 -* Updated verdict (v2): likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
190 +* Initial state: likelihood_range="0.40-0.65 (uncertain)", uncertainty_factors=["Small sample sizes", "Short-term studies only"]
191 +* After new evidence: likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
192 +* Edit entity records the complete before/after change with timestamp and reason
192 192  
193 -**Key Design**: Separating Verdict from Scenario allows tracking how our understanding evolves without losing history.
194 +**Key Design**: Verdicts are mutable entities tracked through the centralized Edit entity, consistent with Claims, Evidence, and Scenarios.
194 194  
195 195  === 1.6 User ===
196 196  Fields: username, email, **role** (Reader/Contributor/Moderator), **reputation**, contributions_count
... ... @@ -248,7 +248,7 @@
248 248  * Threshold-based promotions
249 249  * Reputation decay for inactive users
250 250  * Track record scoring for contributors
251 -See [[When to Add Complexity>>Test.FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
252 +See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for triggers.
252 252  === 1.7 Edit ===
253 253  **Fields**: entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at
254 254  **Purpose**: Complete audit trail for all content changes
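A minimal sketch of writing such an audit record alongside a content change, with field names taken from the list above. The `to_dict()` serializer and the `apply_change` callable are illustrative assumptions, not the system's actual API:

```python
import json
from datetime import datetime, timezone

def record_edit(entity, user_id, edit_type, reason, apply_change):
    """Apply a mutation to an entity and return the audit record describing it.

    `apply_change` is a callable that mutates the entity in place;
    `entity.to_dict()` is a hypothetical serializer.
    """
    before_state = entity.to_dict()
    apply_change(entity)
    after_state = entity.to_dict()
    return {
        "entity_type": type(entity).__name__,
        "entity_id": entity.id,
        "user_id": user_id,
        "before_state": json.dumps(before_state),
        "after_state": json.dumps(after_state),
        "edit_type": edit_type,
        "reason": reason,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

In production the mutation and the audit insert would share one database transaction so the trail cannot drift from the content.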
... ... @@ -284,7 +284,7 @@
284 284  See **Edit History Documentation** for complete details on what gets edited by whom, the retention policy, and use cases.
285 285  === 1.8 Flag ===
286 286  Fields: entity_id, reported_by, issue_type, status, resolution_note
287 -=== 1.9 QualityMetric ===
288 +=== 1.9 QualityMetric ===
288 288  **Fields**: metric_type, category, value, target, timestamp
289 289  **Purpose**: Time-series quality tracking
290 290  **Usage**:
... ... @@ -294,7 +294,7 @@
294 294  * **A/B testing**: Compare control vs treatment metrics
295 295  * **Improvement validation**: Measure before/after changes
296 296  **Example**: `{type: "ErrorRate", category: "Politics", value: 0.12, target: 0.10, timestamp: "2025-12-17"}`
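A dashboard or alerting pass over such records might look like the sketch below. It assumes lower-is-better metrics such as ErrorRate; a real implementation would carry a direction flag per metric_type:

```python
def metric_alert(metric):
    """Return an alert string when a lower-is-better metric misses its target,
    or None when the metric is within target."""
    if metric["value"] <= metric["target"]:
        return None
    overshoot = metric["value"] - metric["target"]
    return (f"{metric['type']}/{metric['category']}: "
            f"{metric['value']:.2f} vs target {metric['target']:.2f} "
            f"(+{overshoot:.2f})")
```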
297 -=== 1.10 ErrorPattern ===
298 +=== 1.10 ErrorPattern ===
298 298  **Fields**: error_category, claim_id, description, root_cause, frequency, status
299 299  **Purpose**: Capture errors to trigger system improvements
300 300  **Usage**:
... ... @@ -306,10 +306,10 @@
306 306  
307 307  == 1.4 Core Data Model ERD ==
308 308  
309 -{{include reference="Test.FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
310 +{{include reference="FactHarbor.Specification.Diagrams.Core Data Model ERD.WebHome"/}}
310 310  
311 311  == 1.5 User Class Diagram ==
312 -{{include reference="Test.FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
313 +{{include reference="FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}
313 313  == 2. Versioning Strategy ==
314 314  **All Content Entities Are Versioned**:
315 315  * **Claim**: Every edit creates new version (V1→V2→V3...)
... ... @@ -328,9 +328,9 @@
328 328  **Example**:
329 329  ```
330 330  Claim V1: "The sky is blue"
331 - → User edits →
332 + → User edits →
332 332  Claim V2: "The sky is blue during daytime"
333 - → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
334 + → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
334 334  ```
335 335  == 2.5. Storage vs Computation Strategy ==
336 336  **Critical architectural decision**: What to persist in databases vs compute dynamically?
... ... @@ -418,8 +418,8 @@
418 418  * **Compute cost**: $0.005-0.01 per request (LLM API call)
419 419  * **Frequency**: Viewed in detail by ~20% of users
420 420  * **Trade-off analysis**:
421 - - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
422 - - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
422 + - IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
423 + - IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
423 423  * **Reproducibility**: Scenarios may improve as AI improves (good to recompute)
424 424  * **Speed**: Computed = 5-8 seconds delay, Stored = instant
425 425  * **Decision**: ✅ STORE (hybrid approach below)
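The arithmetic behind this decision can be checked directly from the figures in the bullets above:

```python
CLAIMS = 1_000_000
SCENARIO_KB = 3            # midpoint of the 2-5 KB per-claim estimate
VIEW_RATE = 0.20           # ~20% of claims viewed in detail
LLM_COST = 0.01            # upper bound of the $0.005-0.01 per-request range

stored_gb = CLAIMS * SCENARIO_KB / 1_000_000       # 3 GB of storage if stored
computed_monthly = CLAIMS * VIEW_RATE * LLM_COST   # $2,000/month if recomputed
```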
... ... @@ -452,8 +452,8 @@
452 452  * **Current design**: Stored in User table
453 453  * **Alternative**: Compute from Edit table
454 454  * **Trade-off**:
455 - - Stored: Fast, simple
456 - - Computed: Always accurate, no denormalization
456 + - Stored: Fast, simple
457 + - Computed: Always accurate, no denormalization
457 457  * **Frequency**: Read on every user action
458 458  * **Compute cost**: Simple COUNT query, milliseconds
459 459  * **Decision**: ✅ STORE - Performance critical, read-heavy
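The stored-counter choice can be paired with a periodic reconciliation pass against the Edit table so denormalization drift stays bounded. A sketch, with `count_edits_by_user` standing in for a hypothetical COUNT(*) query:

```python
def bump_contributions(user):
    """Hot path: increment the stored counter on every accepted contribution."""
    user.contributions_count += 1

def reconcile_contributions(users, count_edits_by_user):
    """Periodic job: recompute counters from the Edit table (source of truth).

    `count_edits_by_user(user_id)` is a placeholder for a COUNT query.
    Returns the ids of users whose stored counter had drifted.
    """
    drifted = []
    for user in users:
        true_count = count_edits_by_user(user.id)
        if user.contributions_count != true_count:
            user.contributions_count = true_count
            drifted.append(user.id)
    return drifted
```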
... ... @@ -483,7 +483,7 @@
483 483  * **Total**: ~$75/month infrastructure
484 484  **LLM cost savings by caching**:
485 485  * Analysis summary stored: Save $0.03 per claim = $30K per 1M claims
486 -* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
487 +* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
487 487  * Verdict stored: Save $0.003 per claim = $3K per 1M claims
488 488  * **Total savings**: ~$35K per 1M claims vs recomputing every time
489 489  === Recomputation Triggers ===
... ... @@ -501,11 +501,11 @@
501 501  **Year 1**: 10K claims
502 502  * Storage: 180 MB
503 503  * Cost: $10/month
504 -**Year 3**: 100K claims
505 +**Year 3**: 100K claims
505 505  * Storage: 1.8 GB
506 506  * Cost: $30/month
507 507  **Year 5**: 1M claims
508 -* Storage: 18 GB
509 +* Storage: 18 GB
509 509  * Cost: $75/month
510 510  **Year 10**: 10M claims
511 511  * Storage: 180 GB
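These projections follow from a constant ~18 KB per claim, implied by the Year 1 figures (180 MB / 10K claims):

```python
KB_PER_CLAIM = 18  # implied by Year 1: 180 MB / 10,000 claims

def projected_storage_gb(claims: int) -> float:
    """Linear storage projection at ~18 KB per claim."""
    return claims * KB_PER_CLAIM / 1_000_000
```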
... ... @@ -572,6 +572,6 @@
572 572  * Source names (autocomplete)
573 573  Synchronized from PostgreSQL via change data capture or periodic sync.
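The periodic-sync option can be sketched generically. `fetch_rows_updated_since` and the `index` object are hypothetical stand-ins for the PostgreSQL query and the search-engine client; no real Typesense API is assumed:

```python
def sync_search_index(fetch_rows_updated_since, index, last_sync_at):
    """Push rows changed since the last sync into the search index.

    `fetch_rows_updated_since(ts)` yields dicts with at least an "id" key;
    `index` only needs an `upsert(doc)` method. Returns the number synced.
    """
    synced = 0
    for row in fetch_rows_updated_since(last_sync_at):
        index.upsert(row)  # idempotent: re-upserting the same id is safe
        synced += 1
    return synced
```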
574 574  == 4. Related Pages ==
575 -* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
576 -* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
577 -* [[Workflows>>Test.FactHarbor.Specification.Workflows.WebHome]]
576 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
577 +* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
578 +* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]