Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 20:16
From version 2.2
edited by Robert Schaub
on 2025/12/24 20:16
Change comment:
Update document after refactoring.
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Parent:

... ...
@@ -1,1 +1,1 @@
-Test.FactHarbor V0\.9\.105.Specification.POC.WebHome
+Test.FactHarbor.Specification.POC.WebHome

Content:
... ...
@@ -58,7 +58,7 @@
 
 * **Input:** Article text
 * **Output:** 5 canonical claims (normalized, deduplicated)
-* **Model:** Claude Haiku 4.5.5 (default, configurable via LLM abstraction layer)
+* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
 * **Cost:** $0.003 per article
 * **Cache strategy:** No caching (article-specific)
 
... ...
@@ -66,7 +66,7 @@
 
 * **Input:** Single canonical claim
 * **Output:** Scenarios + Evidence + Verdicts
-* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
+* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
 * **Cost:** $0.081 per NEW claim
 * **Cache strategy:** Redis, 90-day TTL
 * **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
 
... ...
@@ -75,7 +75,7 @@
 
 * **Input:** Article + Claim verdicts (from cache or Stage 2)
 * **Output:** Article verdict + Fallacies + Logic quality
-* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
+* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
 * **Cost:** $0.030 per article
 * **Cache strategy:** No caching (article-specific)
 
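For illustration, the Stage 2 cache key above can be derived as in the following sketch (a minimal, non-normative example using Python's standard hashlib; the helper name stage2_cache_key is hypothetical, and normalize_claim_v1 refers to the canonicalizer specified in section 5.1 further down this diff):

{{code language="python"}}
import hashlib

CANONICALIZER_VERSION = "v1norm1"  # canonicalizer version, used as the cache namespace

def stage2_cache_key(canonical_claim: str, language: str = "en") -> str:
    # Hypothetical helper: hex-encoded SHA-256 over the already-canonicalized claim text
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"
{{/code}}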
... ...
@@ -115,336 +115,6 @@
 
 When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
 
-
-
-==== Stage 3: Holistic Assessment - Complete Specification ====
-
-===== 3.3.1 Overview =====
-
-**Purpose:** Synthesize individual claim analyses into an overall article assessment, identifying logical fallacies, reasoning quality, and publication readiness.
-
-**Approach:** **Single-Pass Holistic Analysis** (Approach 1 from Comparison Matrix)
-
-**Why This Approach for POC1:**
-* ✅ **1 API call** (vs 2 for Two-Pass or Judge)
-* ✅ **Low cost** ($0.030 per article)
-* ✅ **Fast** (4-6 seconds)
-* ✅ **Low complexity** (simple implementation)
-* ⚠️ **Medium reliability** (acceptable for POC1, will improve in POC2/Production)
-
-**Alternative Approaches Considered:**
-
-|= Approach |= API Calls |= Cost |= Speed |= Complexity |= Reliability |= Best For
-| **1. Single-Pass** ⭐ | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | **POC1**
-| 2. Two-Pass | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | POC2/Prod
-| 3. Structured | 1 | 💰 Low | ⚡ Fast | 🟡 Med | ✅ High | POC1 (alternative)
-| 4. Weighted | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | POC1 (alternative)
-| 5. Heuristics | 1 | 💰 Lowest | ⚡⚡ Fastest | 🟡 Med | ⚠️ Medium | Any
-| 6. Hybrid | 1 | 💰 Low | ⚡ Fast | 🔴 Med-High | ✅ High | POC2
-| 7. Judge | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | Production
-
-**POC1 Choice:** Approach 1 (Single-Pass) for speed and simplicity. Will upgrade to Approach 2 (Two-Pass) or 6 (Hybrid) in POC2 for higher reliability.
-
-===== 3.3.2 What Stage 3 Evaluates =====
-
-Stage 3 performs **integrated holistic analysis** considering:
-
-**1. Claim-Level Aggregation:** (see the aggregation sketch after this list)
-* Verdict distribution (how many TRUE vs FALSE vs DISPUTED)
-* Average confidence across all claims
-* Claim interdependencies (do claims support/contradict each other?)
-* Critical claim identification (which claims are most important?)
-
-**2. Contextual Factors:**
-* **Source credibility**: Is the article from a reputable publisher?
-* **Author expertise**: Does the author have relevant credentials?
-* **Publication date**: Is information current or outdated?
-* **Claim coherence**: Do claims form a logical narrative?
-* **Missing context**: Are important caveats or qualifications missing?
-
-**3. Logical Fallacies:**
-* **Cherry-picking**: Selective evidence presentation
-* **False equivalence**: Treating unequal things as equal
-* **Straw man**: Misrepresenting opposing arguments
-* **Ad hominem**: Attacking person instead of argument
-* **Slippery slope**: Assuming extreme consequences without justification
-* **Circular reasoning**: Conclusion assumes premise
-* **False dichotomy**: Presenting only two options when more exist
-
-**4. Reasoning Quality:**
-* **Evidence strength**: Quality and quantity of supporting evidence
-* **Logical coherence**: Arguments follow logically
-* **Transparency**: Assumptions and limitations acknowledged
-* **Nuance**: Complexity and uncertainty appropriately addressed
-
-**5. Publication Readiness:**
-* **Risk tier assignment**: A (high risk), B (medium), or C (low risk)
-* **Publication mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
-* **Required disclaimers**: What warnings should accompany this content?
-
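As a sketch of the claim-level aggregation described in item 1 above (a hypothetical helper, not normative; field names follow the ClaimAnalysis and ArticleAssessment schemas shown later in this diff):

{{code language="python"}}
from collections import Counter

def aggregate_verdicts(claim_analyses: list) -> dict:
    # Summarize each claim via its recommended scenario's verdict
    verdicts = []
    for c in claim_analyses:
        rec = c["recommended_scenario"]
        scenario = next(s for s in c["scenarios"] if s["scenario_id"] == rec)
        verdicts.append(scenario["verdict"])
    labels = [v["label"] for v in verdicts]
    confidences = [v["confidence"] for v in verdicts]
    return {
        "total_claims": len(claim_analyses),
        "verdict_distribution": dict(Counter(labels)),
        "avg_confidence": sum(confidences) / len(confidences) if confidences else 0.0,
    }
{{/code}}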
-===== 3.3.3 Implementation: Single-Pass Approach =====
-
-**Input:**
-* Original article text (full content)
-* Stage 2 claim analyses (array of ClaimAnalysis objects)
-* Article metadata (URL, title, author, date, source)
-
-**Processing:**
-
-{{code language="python"}}
-# Pseudo-code for Stage 3 (Single-Pass)
-
-def stage3_holistic_assessment(article, claim_analyses, metadata):
-    """
-    Single-pass holistic assessment using Claude Sonnet 4.5.
-
-    Approach 1: One comprehensive prompt that asks the LLM to:
-    1. Review all claim verdicts
-    2. Identify patterns and dependencies
-    3. Detect logical fallacies
-    4. Assess reasoning quality
-    5. Determine credibility score and risk tier
-    6. Generate publication recommendations
-    """
-
-    # Construct comprehensive prompt
-    prompt = f"""
-You are analyzing an article for factual accuracy and logical reasoning.
-
-ARTICLE METADATA:
-- Title: {metadata['title']}
-- Source: {metadata['source']}
-- Date: {metadata['date']}
-- Author: {metadata['author']}
-
-ARTICLE TEXT:
-{article}
-
-INDIVIDUAL CLAIM ANALYSES:
-{format_claim_analyses(claim_analyses)}
-
-YOUR TASK:
-Perform a holistic assessment considering:
-
-1. CLAIM AGGREGATION:
-   - Review the verdict for each claim
-   - Identify any interdependencies between claims
-   - Determine which claims are most critical to the article's thesis
-
-2. CONTEXTUAL EVALUATION:
-   - Assess source credibility
-   - Evaluate author expertise
-   - Consider publication timeliness
-   - Identify missing context or important caveats
-
-3. LOGICAL FALLACIES:
-   - Identify any logical fallacies present
-   - For each fallacy, provide:
-     * Type of fallacy
-     * Where it occurs in the article
-     * Why it's problematic
-     * Severity (minor/moderate/severe)
-
-4. REASONING QUALITY:
-   - Evaluate evidence strength
-   - Assess logical coherence
-   - Check for transparency in assumptions
-   - Evaluate handling of nuance and uncertainty
-
-5. CREDIBILITY SCORING:
-   - Calculate overall credibility score (0.0-1.0)
-   - Assign risk tier:
-     * A (high risk): ≤0.5 credibility OR severe fallacies
-     * B (medium risk): 0.5-0.8 credibility OR moderate issues
-     * C (low risk): >0.8 credibility AND no significant issues
-
-6. PUBLICATION RECOMMENDATIONS:
-   - Determine publication mode:
-     * DRAFT_ONLY: Tier A, multiple severe issues
-     * AI_GENERATED: Tier B/C, acceptable quality with disclaimers
-     * HUMAN_REVIEWED: Complex or borderline cases
-   - List required disclaimers
-   - Explain decision rationale
-
-OUTPUT FORMAT:
-Return a JSON object matching the ArticleAssessment schema.
-"""
-
-    # Call LLM
-    response = llm_client.complete(
-        model="claude-sonnet-4-5-20250929",
-        prompt=prompt,
-        max_tokens=4000,
-        response_format="json"
-    )
-
-    # Parse and validate response
-    assessment = parse_json(response.content)
-    validate_article_assessment_schema(assessment)
-
-    return assessment
-{{/code}}
-
-**Prompt Engineering Notes:**
-
-1. **Structured Instructions**: Break down task into 6 clear sections
-2. **Context-Rich**: Provide article + all claim analyses + metadata
-3. **Explicit Criteria**: Define credibility scoring and risk tiers precisely
-4. **JSON Schema**: Request structured output matching ArticleAssessment schema
-5. **Examples** (in production): Include 2-3 example assessments for consistency
-
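The pseudo-code above calls a format_claim_analyses helper that is not defined on this page; the following is a minimal sketch of what it might look like (an assumption, not the specified implementation; it expects ClaimAnalysis dicts shaped like the Stage 2 output schema shown later in this diff):

{{code language="python"}}
def format_claim_analyses(claim_analyses: list) -> str:
    # Hypothetical helper: render Stage 2 results as plain text for the Stage 3 prompt
    lines = []
    for i, c in enumerate(claim_analyses, start=1):
        rec = c["recommended_scenario"]
        scenario = next(s for s in c["scenarios"] if s["scenario_id"] == rec)
        verdict = scenario["verdict"]
        lines.append(
            f"Claim {i}: {c['claim_text']}\n"
            f"  Verdict: {verdict['label']} (confidence {verdict['confidence']:.2f})\n"
            f"  Explanation: {verdict['explanation']}"
        )
    return "\n\n".join(lines)
{{/code}}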
-===== 3.3.4 Credibility Scoring Algorithm =====
-
-**Base Score Calculation:**
-
-{{code language="python"}}
-def calculate_credibility_score(claim_analyses, fallacies, contextual_factors):
-    """
-    Calculate overall credibility score (0.0-1.0).
-
-    This is a GUIDELINE for the LLM, not strict code.
-    The LLM has flexibility to adjust based on context.
-    """
-
-    # 1. Claim Verdict Score (60% weight)
-    verdict_weights = {
-        "TRUE": 1.0,
-        "PARTIALLY_TRUE": 0.7,
-        "DISPUTED": 0.5,
-        "UNSUPPORTED": 0.3,
-        "FALSE": 0.0,
-        "UNVERIFIABLE": 0.4
-    }
-
-    claim_scores = [
-        verdict_weights[c.verdict.label] * c.verdict.confidence
-        for c in claim_analyses
-    ]
-    avg_claim_score = sum(claim_scores) / len(claim_scores)
-    claim_component = avg_claim_score * 0.6
-
-    # 2. Fallacy Penalty (20% weight)
-    fallacy_penalties = {
-        "minor": -0.05,
-        "moderate": -0.15,
-        "severe": -0.30
-    }
-
-    fallacy_score = 1.0
-    for fallacy in fallacies:
-        fallacy_score += fallacy_penalties[fallacy.severity]
-
-    fallacy_score = max(0.0, min(1.0, fallacy_score))
-    fallacy_component = fallacy_score * 0.2
-
-    # 3. Contextual Factors (20% weight)
-    context_adjustments = {
-        "source_credibility": {"positive": +0.1, "neutral": 0, "negative": -0.1},
-        "author_expertise": {"positive": +0.1, "neutral": 0, "negative": -0.1},
-        "timeliness": {"positive": +0.05, "neutral": 0, "negative": -0.05},
-        "transparency": {"positive": +0.05, "neutral": 0, "negative": -0.05}
-    }
-
-    context_score = 1.0
-    for factor in contextual_factors:
-        adjustment = context_adjustments.get(factor.factor, {}).get(factor.impact, 0)
-        context_score += adjustment
-
-    context_score = max(0.0, min(1.0, context_score))
-    context_component = context_score * 0.2
-
-    # 4. Combine components
-    final_score = claim_component + fallacy_component + context_component
-
-    # 5. Apply confidence modifier
-    avg_confidence = sum(c.verdict.confidence for c in claim_analyses) / len(claim_analyses)
-    final_score = final_score * (0.8 + 0.2 * avg_confidence)
-
-    return max(0.0, min(1.0, final_score))
-{{/code}}
-
-**Note:** This algorithm is a **guideline** provided to the LLM in the system prompt. The LLM has flexibility to adjust based on specific article context, but should generally follow this structure for consistency.
-
-===== 3.3.5 Risk Tier Assignment =====
-
-**Automatic Risk Tier Rules:**
-
-{{code}}
-Risk Tier A (High Risk - Requires Review):
-- Credibility score ≤ 0.5, OR
-- Any severe fallacies detected, OR
-- Multiple (3+) moderate fallacies, OR
-- 50%+ of claims are FALSE or UNSUPPORTED
-
-Risk Tier B (Medium Risk - May Publish with Disclaimers):
-- Credibility score 0.5-0.8, OR
-- 1-2 moderate fallacies, OR
-- 20-49% of claims are DISPUTED or PARTIALLY_TRUE
-
-Risk Tier C (Low Risk - Safe to Publish):
-- Credibility score > 0.8, AND
-- No severe or moderate fallacies, AND
-- <20% disputed/problematic claims, AND
-- No critical missing context
-{{/code}}
-
-===== 3.3.6 Output: ArticleAssessment Schema =====
-
-(See Stage 3 Output Schema section above for complete JSON schema)
-
-===== 3.3.7 Performance Metrics =====
-
-**POC1 Targets:**
-* **Processing time**: 4-6 seconds per article
-* **Cost**: $0.030 per article (Sonnet 4.5 tokens)
-* **Quality**: 70-80% agreement with human reviewers (acceptable for POC)
-* **API calls**: 1 per article
-
-**Future Improvements (POC2/Production):**
-* Upgrade to Two-Pass (Approach 2): +15% accuracy, +$0.020 cost
-* Add human review sampling: 10% of Tier B articles
-* Implement Judge approach (Approach 7) for Tier A: Highest quality
-
-===== 3.3.8 Example Stage 3 Execution =====
-
-**Input:**
-* Article: "Biden won the 2020 election"
-* Claim analyses: [{claim: "Biden won", verdict: "TRUE", confidence: 0.95}]
-
-**Stage 3 Processing:**
-1. Analyzes single claim with high confidence
-2. Checks for contextual factors (source credibility)
-3. Searches for logical fallacies (none found)
-4. Calculates credibility: 0.6 * 0.95 + 0.2 * 1.0 + 0.2 * 1.0 = 0.97
-5. Assigns risk tier: C (low risk)
-6. Recommends: AI_GENERATED publication mode
-
-**Output:**
-```json
-{
-  "article_id": "a1",
-  "overall_assessment": {
-    "credibility_score": 0.97,
-    "risk_tier": "C",
-    "summary": "Article makes single verifiable claim with strong evidence support",
-    "confidence": 0.95
-  },
-  "claim_aggregation": {
-    "total_claims": 1,
-    "verdict_distribution": {"TRUE": 1},
-    "avg_confidence": 0.95
-  },
-  "contextual_factors": [
-    {"factor": "source_credibility", "impact": "positive", "description": "Reputable news source"}
-  ],
-  "recommendations": {
-    "publication_mode": "AI_GENERATED",
-    "requires_review": false,
-    "suggested_disclaimers": []
-  }
-}
-```
-
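For illustration, the automatic tier rules from 3.3.5 above could be applied in code as follows (a sketch under stated assumptions: fallacies carry a severity field as in 3.3.4, and verdict labels follow the Stage 2 schema; the helper name is hypothetical):

{{code language="python"}}
def assign_risk_tier(credibility: float, fallacies: list, verdict_labels: list) -> str:
    # Tier rules per section 3.3.5; A takes precedence over B, B over C
    severe = sum(1 for f in fallacies if f["severity"] == "severe")
    moderate = sum(1 for f in fallacies if f["severity"] == "moderate")
    problematic = sum(1 for v in verdict_labels if v in ("FALSE", "UNSUPPORTED"))
    disputed = sum(1 for v in verdict_labels if v in ("DISPUTED", "PARTIALLY_TRUE"))
    total = max(len(verdict_labels), 1)

    if credibility <= 0.5 or severe >= 1 or moderate >= 3 or problematic / total >= 0.5:
        return "A"
    if credibility <= 0.8 or moderate >= 1 or 0.2 <= disputed / total < 0.5:
        return "B"
    return "C"
{{/code}}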
 
 ==== What Cache-Only Mode Provides: ====
 
 ✅ **Claim Extraction (Platform-Funded):**
... ...
@@ -566,7 +566,7 @@
 
 **Primary Provider (Default):**
 
 * **Anthropic Claude API**
-  * Models: Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4
+  * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
   * Used by default in POC1
   * Best quality for holistic analysis
 
... ...
@@ -603,9 +603,9 @@
 LLM_STAGE1_PROVIDER=anthropic
 LLM_STAGE1_MODEL=claude-haiku-4
 LLM_STAGE2_PROVIDER=anthropic
-LLM_STAGE2_MODEL=claude-sonnet-4-5-20250929
+LLM_STAGE2_MODEL=claude-sonnet-3-5
 LLM_STAGE3_PROVIDER=anthropic
-LLM_STAGE3_MODEL=claude-sonnet-4-5-20250929
+LLM_STAGE3_MODEL=claude-sonnet-3-5
 
 # Cost limits
 LLM_MAX_COST_PER_REQUEST=1.00
... ...
@@ -632,19 +632,19 @@
 "stage_config": {
   "stage1": {
     "provider": "anthropic",
-    "model": "claude-haiku-4-5-20251001",
+    "model": "claude-haiku-4",
     "max_tokens": 4096,
    "temperature": 0.0
   },
   "stage2": {
     "provider": "anthropic",
-    "model": "claude-sonnet-4-5-20250929",
+    "model": "claude-sonnet-3-5",
     "max_tokens": 16384,
     "temperature": 0.3
   },
   "stage3": {
     "provider": "anthropic",
-    "model": "claude-sonnet-4-5-20250929",
+    "model": "claude-sonnet-3-5",
     "max_tokens": 8192,
     "temperature": 0.2
   }
... ...
@@ -658,7 +658,7 @@
 
 **Stage 1: Claim Extraction**
 
-* **Default:** Anthropic Claude Haiku 4.5
+* **Default:** Anthropic Claude Haiku 4
 * **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
 * **Rationale:** Fast, cheap, simple task
 * **Cost:** ~$0.003 per article
... ...
@@ -665,7 +665,7 @@
 
 **Stage 2: Claim Analysis** (CACHEABLE)
 
-* **Default:** Anthropic Claude Sonnet 4.5
+* **Default:** Anthropic Claude Sonnet 3.5
 * **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
 * **Rationale:** High-quality analysis, cached 90 days
 * **Cost:** ~$0.081 per NEW claim
... ...
@@ -672,7 +672,7 @@
 
 **Stage 3: Holistic Assessment**
 
-* **Default:** Anthropic Claude Sonnet 4.5
+* **Default:** Anthropic Claude Sonnet 3.5
 * **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
 * **Rationale:** Complex reasoning, logical fallacy detection
 * **Cost:** ~$0.030 per article
... ...
@@ -680,9 +680,9 @@
 **Cost Comparison (Example):**
 
 |=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
-|Stage 1|Claude Haiku 4.5.5 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
-|Stage 2|Claude Sonnet 4.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
-|Stage 3|Claude Sonnet 4.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
+|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
+|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
+|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
 |**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072**
 
 **Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2.
... ...
@@ -743,7 +743,7 @@
 "stage": "stage2",
 "previous": {
   "provider": "anthropic",
-  "model": "claude-sonnet-4-5-20250929"
+  "model": "claude-sonnet-3-5"
 },
 "current": {
   "provider": "openai",
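For illustration, the per-stage environment variables shown above could be loaded into a stage_config-like structure as in this sketch (a non-normative example; assumes Python's os.environ and the variable names from the configuration hunk above):

{{code language="python"}}
import os

def load_stage_config() -> dict:
    # Build a per-stage config dict from the LLM_STAGE* environment variables;
    # raises KeyError if a stage's model variable is unset
    config = {}
    for stage in ("stage1", "stage2", "stage3"):
        prefix = f"LLM_{stage.upper()}"
        config[stage] = {
            "provider": os.environ.get(f"{prefix}_PROVIDER", "anthropic"),
            "model": os.environ[f"{prefix}_MODEL"],
        }
    return config
{{/code}}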
... ...
@@ -769,17 +769,17 @@
 "stages": {
   "stage1": {
     "provider": "anthropic",
-    "model": "claude-haiku-4-5-20251001",
+    "model": "claude-haiku-4",
     "cost_per_request": 0.003
   },
   "stage2": {
     "provider": "anthropic",
-    "model": "claude-sonnet-4-5-20250929",
+    "model": "claude-sonnet-3-5",
     "cost_per_new_claim": 0.081
   },
   "stage3": {
     "provider": "anthropic",
-    "model": "claude-sonnet-4-5-20250929",
+    "model": "claude-sonnet-3-5",
     "cost_per_request": 0.030
   }
 }
... ...
@@ -796,7 +796,7 @@
 class AnthropicProvider implements LLMProvider {
   async complete(prompt: string, options: CompletionOptions) {
     const response = await anthropic.messages.create({
-      model: options.model || 'claude-sonnet-4-5-20250929',
+      model: options.model || 'claude-sonnet-3-5',
       max_tokens: options.maxTokens || 4096,
       messages: [{ role: 'user', content: prompt }],
       system: options.systemPrompt
... ...
@@ -862,178 +862,6 @@
 
 ----
 
-
-
-==== Stage 2 Output Schema: ClaimAnalysis ====
-
-**Complete schema for each claim's analysis result:**
-
-{{code language="json"}}
-{
-  "claim_id": "claim_abc123",
-  "claim_text": "Biden won the 2020 election",
-  "scenarios": [
-    {
-      "scenario_id": "scenario_1",
-      "description": "Interpreting 'won' as Electoral College victory",
-      "verdict": {
-        "label": "TRUE",
-        "confidence": 0.95,
-        "explanation": "Joe Biden won 306 electoral votes vs Trump's 232"
-      },
-      "evidence": {
-        "supporting": [
-          {
-            "text": "Biden certified with 306 electoral votes",
-            "source_url": "https://www.archives.gov/electoral-college/2020",
-            "source_title": "2020 Electoral College Results",
-            "credibility_score": 0.98
-          }
-        ],
-        "opposing": []
-      }
-    }
-  ],
-  "recommended_scenario": "scenario_1",
-  "metadata": {
-    "analysis_timestamp": "2024-12-24T18:00:00Z",
-    "model_used": "claude-sonnet-4-5-20250929",
-    "processing_time_seconds": 8.5
-  }
-}
-{{/code}}
-
-**Required Fields:**
-* **claim_id**: Unique identifier matching Stage 1 output
-* **claim_text**: The exact claim being analyzed
-* **scenarios**: Array of interpretation scenarios (minimum 1)
-  * **scenario_id**: Unique ID for this scenario
-  * **description**: Clear interpretation of the claim
-  * **verdict**: Verdict object with label, confidence, explanation
-  * **evidence**: Supporting and opposing evidence arrays
-* **recommended_scenario**: ID of the primary/recommended scenario
-* **metadata**: Processing metadata (timestamp, model, timing)
-
-**Optional Fields:**
-* Additional context, warnings, or quality scores
-
-**Minimum Viable Example:**
-
-{{code language="json"}}
-{
-  "claim_id": "c1",
-  "claim_text": "The sky is blue",
-  "scenarios": [{
-    "scenario_id": "s1",
-    "description": "Under clear daytime conditions",
-    "verdict": {"label": "TRUE", "confidence": 0.99, "explanation": "Rayleigh scattering"},
-    "evidence": {"supporting": [], "opposing": []}
-  }],
-  "recommended_scenario": "s1",
-  "metadata": {"analysis_timestamp": "2024-12-24T18:00:00Z"}
-}
-{{/code}}
-
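A lightweight sketch of validating the required ClaimAnalysis fields listed above (plain Python with no external schema library; the helper name is hypothetical, and a production implementation would likely use a real JSON Schema validator):

{{code language="python"}}
def validate_claim_analysis(doc: dict) -> None:
    # Required top-level fields per the ClaimAnalysis schema above
    for field in ("claim_id", "claim_text", "scenarios", "recommended_scenario", "metadata"):
        if field not in doc:
            raise ValueError(f"ClaimAnalysis missing required field: {field}")
    if not doc["scenarios"]:
        raise ValueError("ClaimAnalysis requires at least one scenario")
    for s in doc["scenarios"]:
        for field in ("scenario_id", "description", "verdict", "evidence"):
            if field not in s:
                raise ValueError(f"Scenario missing required field: {field}")
    # The recommended scenario must point at one of the listed scenarios
    if doc["recommended_scenario"] not in {s["scenario_id"] for s in doc["scenarios"]}:
        raise ValueError("recommended_scenario must reference an existing scenario_id")
{{/code}}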
-
-
-==== Stage 3 Output Schema: ArticleAssessment ====
-
-**Complete schema for holistic article-level assessment:**
-
-{{code language="json"}}
-{
-  "article_id": "article_xyz789",
-  "overall_assessment": {
-    "credibility_score": 0.72,
-    "risk_tier": "B",
-    "summary": "Article contains mostly accurate claims with one disputed claim requiring expert review",
-    "confidence": 0.85
-  },
-  "claim_aggregation": {
-    "total_claims": 5,
-    "verdict_distribution": {
-      "TRUE": 3,
-      "PARTIALLY_TRUE": 1,
-      "DISPUTED": 1,
-      "FALSE": 0,
-      "UNSUPPORTED": 0,
-      "UNVERIFIABLE": 0
-    },
-    "avg_confidence": 0.82
-  },
-  "contextual_factors": [
-    {
-      "factor": "Source credibility",
-      "impact": "positive",
-      "description": "Published by reputable news organization"
-    },
-    {
-      "factor": "Claim interdependence",
-      "impact": "neutral",
-      "description": "Claims are independent; no logical chains"
-    }
-  ],
-  "recommendations": {
-    "publication_mode": "AI_GENERATED",
-    "requires_review": false,
-    "review_reason": null,
-    "suggested_disclaimers": [
-      "One claim (Claim 4) has conflicting expert opinions"
-    ]
-  },
-  "metadata": {
-    "holistic_timestamp": "2024-12-24T18:00:10Z",
-    "model_used": "claude-sonnet-4-5-20250929",
-    "processing_time_seconds": 4.2,
-    "cache_used": false
-  }
-}
-{{/code}}
-
-**Required Fields:**
-* **article_id**: Unique identifier for this article
-* **overall_assessment**: Top-level assessment
-  * **credibility_score**: 0.0-1.0 composite score
-  * **risk_tier**: A, B, or C (per AKEL quality gates)
-  * **summary**: Human-readable assessment
-  * **confidence**: How confident the holistic assessment is
-* **claim_aggregation**: Statistics across all claims
-  * **total_claims**: Count of claims analyzed
-  * **verdict_distribution**: Count per verdict label
-  * **avg_confidence**: Average confidence across verdicts
-* **contextual_factors**: Array of contextual considerations
-* **recommendations**: Publication decision support
-  * **publication_mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
-  * **requires_review**: Boolean flag
-  * **suggested_disclaimers**: Array of disclaimer texts
-* **metadata**: Processing metadata
-
-**Minimum Viable Example:**
-
-{{code language="json"}}
-{
-  "article_id": "a1",
-  "overall_assessment": {
-    "credibility_score": 0.95,
-    "risk_tier": "C",
-    "summary": "All claims verified as true",
-    "confidence": 0.98
-  },
-  "claim_aggregation": {
-    "total_claims": 1,
-    "verdict_distribution": {"TRUE": 1},
-    "avg_confidence": 0.99
-  },
-  "contextual_factors": [],
-  "recommendations": {
-    "publication_mode": "AI_GENERATED",
-    "requires_review": false,
-    "suggested_disclaimers": []
-  },
-  "metadata": {"holistic_timestamp": "2024-12-24T18:00:00Z"}
-}
-{{/code}}
-
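As a sketch of how the recommendations block above might be derived from the risk tier (an assumption based on the tier-to-mode guidance in the Stage 3 prompt; the helper name is hypothetical, and borderline cases mapping to HUMAN_REVIEWED are omitted for brevity):

{{code language="python"}}
def recommend_publication(risk_tier: str, disclaimers: list) -> dict:
    # Tier A stays in draft; Tier B/C may publish as AI_GENERATED with disclaimers
    if risk_tier == "A":
        mode, review = "DRAFT_ONLY", True
    else:
        mode, review = "AI_GENERATED", False
    return {
        "publication_mode": mode,
        "requires_review": review,
        "suggested_disclaimers": disclaimers,
    }
{{/code}}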
 === 3.2 Create Analysis Job (3-Stage) ===
 
 **Endpoint:** POST /v1/analyze
... ...
@@ -1085,20 +1085,6 @@
 "browsing": "on",
 "depth": "standard",
 "max_claims": 5,
-
-* **cache_preference** (optional): Cache usage preference
-  * **Type:** string
-  * **Enum:** {{code}}["prefer_cache", "allow_partial", "skip_cache"]{{/code}}
-  * **Default:** {{code}}"prefer_cache"{{/code}}
-  * **Semantics:**
-    * {{code}}"prefer_cache"{{/code}}: Use full cache if available, otherwise run all stages
-    * {{code}}"allow_partial"{{/code}}: Use cached Stage 2 results if available, rerun only Stage 3
-    * {{code}}"skip_cache"{{/code}}: Always rerun all stages (ignore cache)
-  * **Behavior:** When set to {{code}}"allow_partial"{{/code}} and Stage 2 cached results exist:
-    * Stage 1 & 2 are skipped
-    * Stage 3 (holistic assessment) runs fresh with cached claim analyses
-    * Response includes {{code}}"cache_used": true{{/code}} and {{code}}"stages_cached": ["stage1", "stage2"]{{/code}}
-
 "scenarios_per_claim": 2,
 "max_evidence_per_scenario": 6,
 "context_aware_analysis": true
... ...
@@ -1286,78 +1286,80 @@
 
 **Algorithm: Canonical Claim Normalization v1**
 
+{{{
+def normalize_claim_v1(claim_text: str, language: str) -> str:
+    """
+    Normalizes claim to canonical form for cache key generation.
+    Version: v1norm1 (POC1)
+    """
+    import re
+    import unicodedata
+
+    # Step 1: Unicode normalization (NFC)
+    text = unicodedata.normalize('NFC', claim_text)
+
+    # Step 2: Lowercase
+    text = text.lower()
+
+    # Step 3: Numeric normalization (before punctuation removal, so '%' survives)
+    text = text.replace('%', ' percent')
+    # Spell out single-digit numbers
+    num_to_word = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',
+                   '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
+                   '8': 'eight', '9': 'nine'}
+    for num, word in num_to_word.items():
+        text = re.sub(rf'\b{num}\b', word, text)
+
+    # Step 4: Common abbreviations (English only in v1; before '.' is stripped)
+    if language == 'en':
+        text = text.replace('covid-19', 'covid')
+        text = text.replace('u.s.', 'us')
+        text = text.replace('u.k.', 'uk')
+
+    # Step 5: Remove punctuation (keep hyphens)
+    text = re.sub(r'[^\w\s-]', '', text)
+
+    # Step 6: Normalize whitespace (collapse multiple spaces)
+    text = re.sub(r'\s+', ' ', text).strip()
+
+    # Step 7: NO entity normalization in v1
+    # (Trump vs Donald Trump vs President Trump remain distinct)
+
+    return text
+
+# Version identifier (include in cache namespace)
+CANONICALIZER_VERSION = "v1norm1"
+}}}
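A quick check of the v1 canonicalizer against the spec's own COVID-19 example (a sketch; assumes the normalize_claim_v1 definition from the added block above):

{{code language="python"}}
# '%' becomes 'percent' and the covid-19 abbreviation collapses
assert normalize_claim_v1("COVID-19 vaccines are 95% effective", "en") == \
    "covid vaccines are 95 percent effective"
# Single-digit numbers are spelled out; trailing punctuation is dropped
assert normalize_claim_v1("There are 7 continents.", "en") == \
    "there are seven continents"
{{/code}}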
-**Normative Algorithm:**
-
-{{code language="python"}}
-def normalize_claim(text: str) -> str:
-    """
-    Canonical claim normalization for deduplication.
-    MUST follow this algorithm exactly.
-
-    Version: v1norm1
-    """
-    import re
-    import unicodedata
-
-    # 1. Unicode normalization (NFD)
-    text = unicodedata.normalize('NFD', text)
-
-    # 2. Lowercase
-    text = text.lower()
-
-    # 3. Remove diacritics
-    text = ''.join(c for c in text if unicodedata.category(c) != 'Mn')
-
-    # 4. Normalize whitespace
-    text = re.sub(r'\s+', ' ', text)
-    text = text.strip()
-
-    # 5. Remove punctuation except apostrophes in contractions
-    text = re.sub(r"[^\w\s']", '', text)
-
-    # 6. Normalize common contractions
-    contractions = {
-        "don't": "do not",
-        "doesn't": "does not",
-        "didn't": "did not",
-        "can't": "cannot",
-        "won't": "will not",
-        "shouldn't": "should not",
-        "wouldn't": "would not",
-        "isn't": "is not",
-        "aren't": "are not",
-        "wasn't": "was not",
-        "weren't": "were not",
-        "haven't": "have not",
-        "hasn't": "has not",
-        "hadn't": "had not"
-    }
-
-    for contraction, expansion in contractions.items():
-        text = re.sub(r'\b' + contraction + r'\b', expansion, text)
-
-    # 7. Remove remaining apostrophes
-    text = text.replace("'", "")
-
-    # 8. Final whitespace normalization
-    text = re.sub(r'\s+', ' ', text)
-    text = text.strip()
-
-    return text
-{{/code}}
-
-**Normalization Examples:**
-
-|= Input |= Normalized Output
-| "Biden won the 2020 election" | {{code}}biden won the 2020 election{{/code}}
-| "Biden won the 2020 election!" | {{code}}biden won the 2020 election{{/code}}
-| "Biden  won  the  2020  election" | {{code}}biden won the 2020 election{{/code}}
-| "Biden didn't win the 2020 election" | {{code}}biden did not win the 2020 election{{/code}}
-| "BIDEN WON THE 2020 ELECTION" | {{code}}biden won the 2020 election{{/code}}
-
-**Versioning:** Algorithm version is {{code}}v1norm1{{/code}}. Changes to the algorithm require a new version identifier.
-
+**Cache Key Formula (Updated):**
+
+{{{
+language = "en"
+canonical = normalize_claim_v1(claim_text, language)
+cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
+
+Example:
+  claim: "COVID-19 vaccines are 95% effective"
+  canonical: "covid vaccines are 95 percent effective"
+  sha256: abc123...def456
+  key: "claim:v1norm1:en:abc123...def456"
+}}}
+
+**Cache Metadata MUST Include:**
+
+{{{
+{
+  "canonical_claim": "covid vaccines are 95 percent effective",
+  "canonicalizer_version": "v1norm1",
+  "language": "en",
+  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
+}
+}}}
+
+**Version Upgrade Path:**
+
+* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
+* v1normN → v2norm1: Major version bump, invalidate all v1 caches
+
+----
+
 === 5.1.2 Copyright & Data Retention Policy ===
 
 **Evidence Excerpt Storage:**