Changes for page Data Model
Last modified by Robert Schaub on 2025/12/22 14:16
Summary

* Page properties (2 modified, 0 added, 0 removed)

Details

Page properties:

* **Parent**: changed from Test.FactHarbor.Specification.WebHome to Test.FactHarbor pre11 V0\.9\.70.Specification.WebHome
* **Content**: updated; the new version is shown below, with unchanged regions elided as "..."
= Data Model =

FactHarbor's data model is **simple, focused, designed for automated processing**.

== 1. Core Entities ==

=== 1.1 Claim ===

Fields: id, assertion, domain, **status** (Published/Hidden only), **confidence_score**, **risk_score**, completeness_score, version, views, edit_count

==== Performance Optimization: Denormalized Fields ====

**Rationale**: Claims system is 95% reads, 5% writes. Denormalizing common data reduces joins and improves query performance by 70%.

**Additional cached fields in claims table**:

* **evidence_summary** (JSONB): Top 5 most relevant evidence snippets with scores. Avoids joining the evidence table for listing/preview. Updated when evidence is added/removed. Format: `[{"text": "...", "source": "...", "relevance": 0.95}, ...]`
* **source_names** (TEXT[]): Array of source names for quick display. Avoids joining through evidence to sources. Updated when sources change. Format: `["New York Times", "Nature Journal", ...]`
* **scenario_count** (INTEGER): Number of scenarios for this claim. Quick metric without counting rows. Updated when scenarios are added/removed.

...

* ✅ 70% fewer joins on common queries
* ✅ Much faster claim list/search pages
* ✅ Better user experience
* ⚠️ Small storage increase (10%)
* ⚠️ Need to keep caches in sync (see the sketch below)
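To make the sync obligation concrete, here is a minimal sketch of refreshing the cached fields whenever evidence changes. The field names follow the list above; the ORM-style helpers (`get_claim`, `get_evidence_for_claim`, `get_source`, `count_scenarios`) are hypothetical:

{{code language="python"}}
# Hypothetical sketch: refresh denormalized claim caches when evidence changes.
# get_claim / get_evidence_for_claim / get_source / count_scenarios are assumed helpers.

def refresh_claim_caches(claim_id):
    claim = get_claim(claim_id)
    evidence = get_evidence_for_claim(claim_id)

    # evidence_summary: top 5 snippets by relevance, stored as JSONB
    top5 = sorted(evidence, key=lambda e: e.relevance_score, reverse=True)[:5]
    claim.evidence_summary = [
        {"text": e.excerpt,
         "source": get_source(e.source_id).name,
         "relevance": e.relevance_score}
        for e in top5
    ]

    # source_names: distinct names for quick display, no joins at read time
    claim.source_names = sorted({get_source(e.source_id).name for e in evidence})

    # scenario_count: cheap metric without counting rows at read time
    claim.scenario_count = count_scenarios(claim_id)
    claim.save()
{{/code}}

Calling something like this from the evidence add/remove path is what keeps the "70% fewer joins" read path trustworthy.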
=== 1.2 Evidence ===

Fields: claim_id, source_id, excerpt, url, relevance_score, supports

=== 1.3 Source ===

**Purpose**: Track reliability of information sources over time

**Fields**:

* **id** (UUID): Unique identifier
* **name** (text): Source name (e.g., "New York Times", "Nature Journal")
* **domain** (text): Website domain (e.g., "nytimes.com")

...

**See**: SOURCE Track Record System documentation for complete details on calculation, updates, and usage

Fields: id, name, domain, **track_record_score**, **accuracy_history**, **correction_frequency**

**Key**: Automated source reliability tracking

==== Source Scoring Process (Separation of Concerns) ====

**Critical design principle**: Prevent circular dependencies between source scoring and claim analysis.

**The Problem**:

* Source scores should influence claim verdicts
* Claim verdicts should update source scores
* But: Direct feedback creates a circular dependency and potential feedback loops

**The Solution**: Temporal separation

==== Weekly Background Job (Source Scoring) ====

Runs independently of claim analysis:

{{code language="python"}}
def update_source_scores_weekly():
    """
    Background job: Calculate source reliability
    Never triggered by individual claim analysis
    """
    # Analyze all claims from past week
    claims = get_claims_from_past_week()
    for source in get_all_sources():
        # Calculate accuracy metrics
        correct_verdicts = count_correct_verdicts_citing(source, claims)
        total_citations = count_total_citations(source, claims)
        accuracy = correct_verdicts / total_citations if total_citations > 0 else 0.5
        # Weight by claim importance
        weighted_score = calculate_weighted_score(source, claims)
        # Update source record
        source.track_record_score = weighted_score
        source.total_citations = total_citations
        source.last_updated = now()
        source.save()

# Job runs: Sunday 2 AM UTC
# Never during claim processing
{{/code}}

==== Real-Time Claim Analysis (AKEL) ====

Uses source scores but never updates them:

{{code language="python"}}
def analyze_claim(claim_text):
    """
    Real-time: Analyze claim using current source scores
    READ source scores, never UPDATE them
    """
    # Gather evidence
    evidence_list = gather_evidence(claim_text)
    for evidence in evidence_list:
        # READ source score (snapshot from last weekly update)
        source = get_source(evidence.source_id)
        source_score = source.track_record_score
        # Use score to weight evidence
        evidence.weighted_relevance = evidence.relevance * source_score
    # Generate verdict using weighted evidence
    verdict = synthesize_verdict(evidence_list)
    # NEVER update source scores here
    # That happens in weekly background job
    return verdict
{{/code}}

==== Monthly Audit (Quality Assurance) ====

Moderator review of flagged source scores:
* Verify scores make sense
* Detect gaming attempts
* Identify systematic biases

...

```
Monday-Saturday: Claims processed using these scores → All claims this week use NYT=0.87 → All claims this week use Blog X=0.52
Next Sunday 2 AM: Recalculate scores including this week's claims → NYT score: 0.89 (trending up) → Blog X score: 0.48 (trending down)
```

=== 1.4 Scenario ===

**Purpose**: Different interpretations or contexts for evaluating claims

**Key Concept**: Scenarios are extracted from evidence, not generated arbitrarily. Each scenario represents a specific context, assumption set, or condition under which a claim should be evaluated.

**Relationship**: One-to-many with Claims (**simplified for V1.0**: scenario belongs to a single claim)

**Fields**:

* **id** (UUID): Unique identifier
* **claim_id** (UUID): Foreign key to claim (one-to-many)
* **description** (text): Human-readable description of the scenario

...

* After new evidence: likelihood_range="0.70-0.85 (likely true)", uncertainty_factors=["Lifestyle confounds remain"]
* Edit entity records the complete before/after change with timestamp and reason

**Key Design**: Verdicts are mutable entities tracked through the centralized Edit entity, consistent with Claims, Evidence, and Scenarios.

=== 1.6 User ===

Fields: username, email, **role** (Reader/Contributor/Moderator), **reputation**, contributions_count

=== User Reputation System ===

**V1.0 Approach**: Simple manual role assignment

**Rationale**: Complex reputation systems aren't needed until 100+ active contributors demonstrate the need for automated reputation management. Start simple, add complexity when metrics prove necessary.

=== Roles (Manual Assignment) ===

**reader** (default):

* View published claims and evidence
* Browse and search content
* No editing permissions

...

* System configuration
* Access to all features
* Founder-appointed initially

=== Contribution Tracking (Simple) ===

**Basic metrics only**:

* `contributions_count`: Total number of contributions
* `created_at`: Account age
* `last_active`: Recent activity

...

* No automated privilege escalation
* No reputation decay
* No threshold-based promotions

=== Promotion Process ===

**Manual review by moderators/admins**:

1. User demonstrates value through contributions
2. Moderator reviews user's contribution history
3. Moderator promotes user to contributor role
4. Admin promotes trusted contributors to moderator

**Criteria** (guidelines, not automated):

* Quality of contributions
* Consistency over time
* Collaborative behavior
* Understanding of project goals
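The manual model above reduces to a small permission table plus an explicitly human-triggered promotion step. A minimal sketch, assuming a `User` record with the `role` field from 1.6; the helper names and the exact permission sets are illustrative assumptions:

{{code language="python"}}
# Hypothetical sketch: manual roles, no automated escalation.
ROLE_PERMISSIONS = {
    "reader": {"view", "search"},
    "contributor": {"view", "search", "edit"},
    "moderator": {"view", "search", "edit", "hide", "promote_to_contributor"},
}

def can(user, action):
    # Permission gating is a pure lookup; reputation never changes roles.
    return action in ROLE_PERMISSIONS.get(user.role, set())

def promote(actor, target, new_role):
    # Promotion is always a human decision, never threshold-based:
    # moderators promote to contributor, admins promote to moderator.
    moderator_may_grant = {"contributor"}
    if actor.role == "admin" or (actor.role == "moderator" and new_role in moderator_may_grant):
        target.role = new_role
        target.save()
{{/code}}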
=== V2.0+ Evolution ===

**Add complex reputation when**:

* 100+ active contributors
* Manual role management becomes a bottleneck
* Clear patterns of abuse emerge requiring automation

...

* Reputation decay for inactive users
* Track record scoring for contributors

See [[When to Add Complexity>>Test.FactHarbor.Specification.When-to-Add-Complexity]] for triggers.

=== 1.7 Edit ===

**Fields**: entity_type, entity_id, user_id, before_state (JSON), after_state (JSON), edit_type, reason, created_at

**Purpose**: Complete audit trail for all content changes

=== Edit History Details ===

**What Gets Edited**:

* **Claims** (20% edited): assertion, domain, status, scores, analysis
* **Evidence** (10% edited): excerpt, relevance_score, supports
* **Scenarios** (5% edited): description, assumptions, confidence

...

* `MODERATION_ACTION`: Hide/unhide for abuse
* `REVERT`: Rollback to previous version

**Retention Policy** (5 years total):

1. **Hot storage** (3 months): PostgreSQL, instant access
2. **Warm storage** (2 years): Partitioned, slower queries
3. **Cold storage** (3 years): S3 compressed, download required
4. **Deletion**: After 5 years (except legal holds)

**Storage per 1M claims**: 400 MB (20% edited × 2 KB per edit)

**Use Cases**:

* View claim history timeline
* Detect vandalism patterns
* Learn from user corrections (system improvement)
* Legal compliance (audit trail)
* Rollback capability

See **Edit History Documentation** for complete details on what gets edited by whom, retention policy, and use cases.
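Because every content change flows through this one entity, the audit write can be sketched generically. A minimal illustration using the 1.7 fields; `db_insert` is a hypothetical persistence helper:

{{code language="python"}}
import json
from datetime import datetime, timezone

# Hypothetical sketch: one audit-trail writer for Claims, Evidence, Scenarios, Verdicts.
def record_edit(entity_type, entity_id, user_id, before, after, edit_type, reason):
    db_insert("edit", {
        "entity_type": entity_type,          # e.g. "claim"
        "entity_id": entity_id,
        "user_id": user_id,
        "before_state": json.dumps(before),  # full snapshot enables REVERT
        "after_state": json.dumps(after),
        "edit_type": edit_type,              # e.g. "MODERATION_ACTION", "REVERT"
        "reason": reason,
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
{{/code}}

A `REVERT` is then simply a new edit whose after_state is a previous before_state, keeping the history linear.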
=== 1.8 Flag ===

Fields: entity_id, reported_by, issue_type, status, resolution_note

=== 1.9 QualityMetric ===

**Fields**: metric_type, category, value, target, timestamp

**Purpose**: Time-series quality tracking

**Usage**:

* **Continuous monitoring**: Hourly calculation of error rates, confidence scores, processing times
* **Quality dashboard**: Real-time display with trend charts
* **Alerting**: Automatic alerts when metrics exceed thresholds
* **A/B testing**: Compare control vs treatment metrics
* **Improvement validation**: Measure before/after changes

**Example**: `{type: "ErrorRate", category: "Politics", value: 0.12, target: 0.10, timestamp: "2025-12-17"}`

=== 1.10 ErrorPattern ===

**Fields**: error_category, claim_id, description, root_cause, frequency, status

**Purpose**: Capture errors to trigger system improvements

**Usage**:

* **Error capture**: When users flag issues or system detects problems
* **Pattern analysis**: Weekly grouping by category and frequency
* **Improvement workflow**: Analyze → Fix → Test → Deploy → Re-process → Monitor
* **Metrics**: Track error rate reduction over time

**Example**: `{category: "WrongSource", description: "Unreliable tabloid cited", root_cause: "No quality check", frequency: 23, status: "Fixed"}`

== 1.11 Core Data Model ERD ==

{{include reference="Test.FactHarbor pre11 V0\.9\.70.Specification.Diagrams.Core Data Model ERD.WebHome"/}}

== 1.12 User Class Diagram ==

{{include reference="Test.FactHarbor.Specification.Diagrams.User Class Diagram.WebHome"/}}

== 2. Versioning Strategy ==

**All Content Entities Are Versioned**:

* **Claim**: Every edit creates new version (V1→V2→V3...)
* **Evidence**: Changes tracked in edit history
* **Scenario**: Modifications versioned

...

```
Claim V1: "The sky is blue" → User edits → Claim V2: "The sky is blue during daytime" → EDIT table stores: {before: "The sky is blue", after: "The sky is blue during daytime"}
```
== 2.5 Storage vs Computation Strategy ==

**Critical architectural decision**: What to persist in databases vs compute dynamically?

**Trade-off**:

* **Store more**: Better reproducibility, faster, lower LLM costs | Higher storage/maintenance costs
* **Compute more**: Lower storage/maintenance costs | Slower, higher LLM costs, less reproducible

=== Recommendation: Hybrid Approach ===

**STORE (in PostgreSQL):**

==== Claims (Current State + History) ====

* **What**: assertion, domain, status, created_at, updated_at, version
* **Why**: Core entity, must be persistent
* **Also store**: confidence_score (computed once, then cached)
* **Size**: 1 KB per claim
* **Growth**: Linear with claims
* **Decision**: ✅ STORE - Essential

==== Evidence (All Records) ====

* **What**: claim_id, source_id, excerpt, url, relevance_score, supports, extracted_at
* **Why**: Hard to re-gather, user contributions, reproducibility
* **Size**: 2 KB per evidence (with excerpt)
* **Growth**: 3-10 evidence per claim
* **Decision**: ✅ STORE - Essential for reproducibility

==== Sources (Track Records) ====

* **What**: name, domain, track_record_score, accuracy_history, correction_frequency
* **Why**: Continuously updated, expensive to recompute
* **Size**: 500 bytes per source
* **Growth**: Slow (limited number of sources)
* **Decision**: ✅ STORE - Essential for quality

==== Edit History (All Versions) ====

* **What**: before_state, after_state, user_id, reason, timestamp
* **Why**: Audit trail, legal requirement, reproducibility
* **Size**: 2 KB per edit
* **Growth**: Linear with edits (about 20% of claims get edited)
* **Retention**: Hot storage 3 months → Warm storage 2 years → Archive to S3 3 years → Delete after 5 years total
* **Decision**: ✅ STORE - Essential for accountability

==== Flags (User Reports) ====

* **What**: entity_id, reported_by, issue_type, description, status
* **Why**: Error detection, system improvement triggers
* **Size**: 500 bytes per flag
* **Growth**: 5-10% of claims get flagged
* **Decision**: ✅ STORE - Essential for improvement

==== ErrorPatterns (System Improvement) ====

* **What**: error_category, claim_id, description, root_cause, frequency, status
* **Why**: Learning loop, prevent recurring errors
* **Size**: 1 KB per pattern
* **Growth**: Slow (limited patterns, many fixed)
* **Decision**: ✅ STORE - Essential for learning

==== QualityMetrics (Time Series) ====

* **What**: metric_type, category, value, target, timestamp
* **Why**: Trend analysis, cannot recreate historical metrics
* **Size**: 200 bytes per metric
* **Growth**: Hourly = 8,760 per year per metric type
* **Retention**: 2 years hot, then aggregate and archive
* **Decision**: ✅ STORE - Essential for monitoring
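As a compact illustration of the stored shapes, here are the two highest-volume rows (about 1 KB and 2 KB each) as Python dataclasses. Field names follow sections 1.1-1.2; the exact types and the dataclass framing are assumptions:

{{code language="python"}}
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID

# Hypothetical sketch of the two highest-volume stored rows.

@dataclass
class Claim:
    id: UUID
    assertion: str
    domain: str
    status: str                # "Published" or "Hidden" only
    confidence_score: int      # computed once by AKEL, then cached
    risk_score: int            # computed once by AKEL, then cached
    completeness_score: int
    version: int               # bumped on every edit (linear history)
    created_at: datetime
    updated_at: datetime

@dataclass
class Evidence:
    claim_id: UUID
    source_id: UUID
    excerpt: str               # stored for reproducibility; hard to re-gather
    url: str
    relevance_score: float
    supports: bool
    extracted_at: datetime
{{/code}}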
**STORE (Computed Once, Then Cached):**

==== Analysis Summary ====

* **What**: Neutral text summary of claim analysis (200-500 words)
* **Computed**: Once by AKEL when claim first analyzed
* **Stored in**: Claim table (text field)
* **Recomputed**: Only when system significantly improves OR claim edited
* **Why store**: Expensive to regenerate ($0.01-0.05 per analysis), doesn't change often
* **Size**: 2 KB per claim
* **Decision**: ✅ STORE (cached) - Cost-effective

==== Confidence Score ====

* **What**: 0-100 score of analysis confidence
* **Computed**: Once by AKEL
* **Stored in**: Claim table (integer field)

...

* **Why store**: Cheap to store, expensive to compute, users need it fast
* **Size**: 4 bytes per claim
* **Decision**: ✅ STORE (cached) - Performance critical

==== Risk Score ====

* **What**: 0-100 score of claim risk level
* **Computed**: Once by AKEL
* **Stored in**: Claim table (integer field)

...

* **Size**: 4 bytes per claim
* **Decision**: ✅ STORE (cached) - Performance critical

**COMPUTE DYNAMICALLY (Never Store):**

==== Scenarios ====

⚠️ CRITICAL DECISION

* **What**: 2-5 possible interpretations of claim with assumptions
* **Current design**: Stored in Scenario table
* **Alternative**: Compute on-demand when user views claim details
* **Storage cost**: 1 KB per scenario × 3 scenarios average = 3 KB per claim
* **Compute cost**: $0.005-0.01 per request (LLM API call)
* **Frequency**: Viewed in detail by 20% of users
* **Trade-off analysis**:
** IF STORED: 1M claims × 3 KB = 3 GB storage, $0.05/month, fast access
** IF COMPUTED: 1M claims × 20% views × $0.01 = $2,000/month in LLM costs
* **Reproducibility**: Scenarios may improve as AI improves (good to recompute)
* **Speed**: Computed = 5-8 seconds delay, Stored = instant
* **Decision**: ✅ STORE (hybrid approach below)

**Scenario Strategy** (APPROVED):

1. **Store scenarios** initially when claim analyzed
2. **Mark as stale** when system improves significantly
3. **Recompute on next view** if marked stale
4. **Cache for 30 days** if frequently accessed
5. **Result**: Best of both worlds - speed + freshness (see the sketch below)
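Steps 1-4 amount to a stale flag plus lazy recomputation on the read path. A minimal sketch, assuming a `scenarios_stale` flag on the claim and hypothetical helpers (`generate_scenarios`, `save_scenarios`, `load_scenarios`):

{{code language="python"}}
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: lazy recomputation of stored scenarios.
def get_scenarios(claim):
    if claim.scenarios_stale:
        # Recompute on next view (LLM call, $0.005-0.01), then re-cache
        scenarios = generate_scenarios(claim.assertion)
        save_scenarios(claim, scenarios)
        claim.scenarios_stale = False
        claim.scenarios_cached_until = datetime.now(timezone.utc) + timedelta(days=30)
        claim.save()
        return scenarios
    # Fresh cache: serve stored scenarios instantly
    return load_scenarios(claim.id)
{{/code}}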
==== Verdict Synthesis ====

* **What**: Final conclusion text synthesizing all scenarios
* **Compute cost**: $0.002-0.005 per request
* **Frequency**: Every time claim viewed
* **Why not store**: Changes as evidence/scenarios change, users want fresh analysis
* **Speed**: 2-3 seconds (acceptable)

**Alternative**: Store "last verdict" as cached field, recompute only if claim edited or marked stale

* **Recommendation**: ✅ STORE cached version, mark stale when changes occur

==== Search Results ====

* **What**: Lists of claims matching search query
* **Compute from**: Elasticsearch index
* **Cache**: 15 minutes in Redis for popular queries
* **Why not store permanently**: Constantly changing, infinite possible queries

==== Aggregated Statistics ====

* **What**: "Total claims: 1,234,567", "Average confidence: 78%", etc.
* **Compute from**: Database queries
* **Cache**: 1 hour in Redis
* **Why not store**: Can be derived, relatively cheap to compute

==== User Reputation ====

* **What**: Score based on contributions
* **Current design**: Stored in User table
* **Alternative**: Compute from Edit table

...

* **Frequency**: Read on every user action
* **Compute cost**: Simple COUNT query, milliseconds
* **Decision**: ✅ STORE - Performance critical, read-heavy

=== Summary Table ===

| Data Type | Storage | Compute | Size per Claim | Decision | Rationale |
|-----------|---------|---------|----------------|----------|-----------|
| Claim core | ✅ | - | 1 KB | STORE | Essential |
| Evidence | ✅ | - | 2 KB × 5 = 10 KB | STORE | Reproducibility |
| Sources | ✅ | - | 500 B (shared) | STORE | Track record |
| Edit history | ✅ | - | 2 KB × 20% = 400 B avg | STORE | Audit |
| Analysis summary | ✅ | Once | 2 KB | STORE (cached) | Cost-effective |
| Confidence score | ✅ | Once | 4 B | STORE (cached) | Fast access |
| Risk score | ✅ | Once | 4 B | STORE (cached) | Fast access |
| Scenarios | ✅ | When stale | 3 KB | STORE (hybrid) | Balance cost/speed |
| Verdict | ✅ | When stale | 1 KB | STORE (cached) | Fast access |
| Flags | ✅ | - | 500 B × 10% = 50 B avg | STORE | Improvement |
| ErrorPatterns | ✅ | - | 1 KB (global) | STORE | Learning |
| QualityMetrics | ✅ | - | 200 B (time series) | STORE | Trending |
| Search results | - | ✅ | - | COMPUTE + 15min cache | Dynamic |
| Aggregations | - | ✅ | - | COMPUTE + 1hr cache | Derivable |
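The two COMPUTE rows share a get-or-compute pattern. A minimal sketch using redis-py; the key scheme and the `run_search` helper are hypothetical:

{{code language="python"}}
import json
import redis

r = redis.Redis()

# Hypothetical sketch: compute dynamically, cache briefly (15 min for search).
def cached_search(query, ttl_seconds=900):
    key = f"search:{query}"
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)           # popular query: served from Redis
    results = run_search(query)          # Elasticsearch query (never stored)
    r.setex(key, ttl_seconds, json.dumps(results))
    return results
{{/code}}

The same pattern with a 3600-second TTL covers the aggregations row.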
**Total storage per claim**: 18 KB (without edits and flags)

**For 1 million claims**:

* **Storage**: 18 GB (manageable)
* **PostgreSQL**: $50/month (standard instance)
* **Redis cache**: $20/month (1 GB instance)
* **S3 archives**: $5/month (old edits)
* **Total**: $75/month infrastructure

**LLM cost savings by caching**:

* Analysis summary stored: Save $0.03 per claim = $30K per 1M claims
* Scenarios stored: Save $0.01 per claim × 20% views = $2K per 1M claims
* Verdict stored: Save $0.003 per claim = $3K per 1M claims
* **Total savings**: $35K per 1M claims vs recomputing every time

=== Recomputation Triggers ===

**When to mark cached data as stale and recompute:**

1. **User edits claim** → Recompute: all (analysis, scenarios, verdict, scores)
2. **Evidence added** → Recompute: scenarios, verdict, confidence score
3. **Source track record changes >10 points** → Recompute: confidence score, verdict
4. **System improvement deployed** → Mark affected claims stale, recompute on next view
5. **Controversy detected** (high flag rate) → Recompute: risk score

**Recomputation strategy**:

* **Eager**: Immediately recompute (for user edits)
* **Lazy**: Recompute on next view (for system improvements)
* **Batch**: Nightly re-evaluation of stale claims (if <1000)
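A minimal sketch of how these triggers might map onto the eager/lazy split; the event names and helpers are hypothetical:

{{code language="python"}}
# Hypothetical sketch: map change events to eager vs lazy recomputation.
EAGER = {"user_edit"}                    # recompute immediately
LAZY = {"system_improvement"}            # recompute on next view (or nightly batch)

def on_event(event, claim):
    if event in EAGER:
        recompute_all(claim)             # analysis, scenarios, verdict, scores
    elif event in LAZY:
        claim.stale = True               # picked up on next view or nightly batch
        claim.save()
{{/code}}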
=== Database Size Projection ===

**Year 1**: 10K claims

* Storage: 180 MB
* Cost: $10/month

**Year 3**: 100K claims

* Storage: 1.8 GB

...

* Cost: $300/month
* Optimization: Archive old claims to S3 ($5/TB/month)

**Conclusion**: Storage costs are manageable, LLM cost savings are substantial.

== 3. Key Simplifications ==

* **Two content states only**: Published, Hidden
* **Three user roles only**: Reader, Contributor, Moderator
* **No complex versioning**: Linear edit history
* **Reputation-based permissions**: Not role hierarchy
* **Source track records**: Continuous evaluation

== 4. What Gets Stored in the Database ==

=== 4.1 Primary Storage (PostgreSQL) ===

**Claims Table**:

* Current state only (latest version)
* Fields: id, assertion, domain, status, confidence_score, risk_score, completeness_score, version, created_at, updated_at

**Evidence Table**:

...

**QualityMetric Table**:

* Time-series quality data
* Fields: id, metric_type, metric_category, value, target, timestamp

=== 4.2 What's NOT Stored (Computed on-the-fly) ===

* **Verdicts**: Synthesized from evidence + scenarios when requested
* **Risk scores**: Recalculated based on current factors
* **Aggregated statistics**: Computed from base data
* **Search results**: Generated from Elasticsearch index

=== 4.3 Cache Layer (Redis) ===

**Cached for performance**:

* Frequently accessed claims (TTL: 1 hour)
* Search results (TTL: 15 minutes)
* User sessions (TTL: 24 hours)
* Source track records (TTL: 1 hour)

=== 4.4 File Storage (S3) ===

**Archived content**:

* Old edit history (>3 months)
* Evidence documents (archived copies)
* Database backups
* Export files

=== 4.5 Search Index (Elasticsearch) ===

**Indexed for search**:

* Claim assertions (full-text)
* Evidence excerpts (full-text)
* Scenario descriptions (full-text)
* Source names (autocomplete)

Synchronized from PostgreSQL via change data capture or periodic sync.

== 5. Related Pages ==

* [[Architecture>>Test.FactHarbor pre11 V0\.9\.70.Specification.Architecture.WebHome]]
* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>Test.FactHarbor.Specification.Workflows.WebHome]]