Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 20:16
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ... @@ -58,7 +58,7 @@ 58 58 59 59 * **Input:** Article text 60 60 * **Output:** 5 canonical claims (normalized, deduplicated) 61 -* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer) 61 +* **Model:** Claude Haiku 4.5 (default, configurable via LLM abstraction layer) 62 62 * **Cost:** $0.003 per article 63 63 * **Cache strategy:** No caching (article-specific) 64 64 ... ... @@ -66,7 +66,7 @@ 66 66 67 67 * **Input:** Single canonical claim 68 68 * **Output:** Scenarios + Evidence + Verdicts 69 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer) 69 +* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer) 70 70 * **Cost:** $0.081 per NEW claim 71 71 * **Cache strategy:** Redis, 90-day TTL 72 72 * **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)} ... ... @@ -75,7 +75,7 @@ 75 75 76 76 * **Input:** Article + Claim verdicts (from cache or Stage 2) 77 77 * **Output:** Article verdict + Fallacies + Logic quality 78 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer) 78 +* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer) 79 79 * **Cost:** $0.030 per article 80 80 * **Cache strategy:** No caching (article-specific) 81 81 ... ... @@ -115,6 +115,336 @@ 115 115 116 116 When free users reach their $10 monthly limit, they enter **Cache-Only Mode**: 117 117 118 + 119 + 120 +==== Stage 3: Holistic Assessment - Complete Specification ==== 121 + 122 +===== 3.3.1 Overview ===== 123 + 124 +**Purpose:** Synthesize individual claim analyses into an overall article assessment, identifying logical fallacies, reasoning quality, and publication readiness. 125 + 126 +**Approach:** **Single-Pass Holistic Analysis** (Approach 1 from Comparison Matrix) 127 + 128 +**Why This Approach for POC1:** 129 +* ✅ **1 API call** (vs 2 for Two-Pass or Judge) 130 +* ✅ **Low cost** ($0.030 per article) 131 +* ✅ **Fast** (4-6 seconds) 132 +* ✅ **Low complexity** (simple implementation) 133 +* ⚠️ **Medium reliability** (acceptable for POC1, will improve in POC2/Production) 134 + 135 +**Alternative Approaches Considered:** 136 + 137 +|= Approach |= API Calls |= Cost |= Speed |= Complexity |= Reliability |= Best For 138 +| **1. Single-Pass** ⭐ | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | **POC1** 139 +| 2. Two-Pass | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | POC2/Prod 140 +| 3. Structured | 1 | 💰 Low | ⚡ Fast | 🟡 Med | ✅ High | POC1 (alternative) 141 +| 4. Weighted | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | POC1 (alternative) 142 +| 5. Heuristics | 1 | 💰 Lowest | ⚡⚡ Fastest | 🟡 Med | ⚠️ Medium | Any 143 +| 6. Hybrid | 1 | 💰 Low | ⚡ Fast | 🔴 Med-High | ✅ High | POC2 144 +| 7. Judge | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | Production 145 + 146 +**POC1 Choice:** Approach 1 (Single-Pass) for speed and simplicity. Will upgrade to Approach 2 (Two-Pass) or 6 (Hybrid) in POC2 for higher reliability. 147 + 148 +===== 3.3.2 What Stage 3 Evaluates ===== 149 + 150 +Stage 3 performs **integrated holistic analysis** considering: 151 + 152 +**1. Claim-Level Aggregation:** 153 +* Verdict distribution (how many TRUE vs FALSE vs DISPUTED) 154 +* Average confidence across all claims 155 +* Claim interdependencies (do claims support/contradict each other?) 156 +* Critical claim identification (which claims are most important?) 157 + 158 +**2. Contextual Factors:** 159 +* **Source credibility**: Is the article from a reputable publisher? 160 +* **Author expertise**: Does the author have relevant credentials?
161 +* **Publication date**: Is information current or outdated? 162 +* **Claim coherence**: Do claims form a logical narrative? 163 +* **Missing context**: Are important caveats or qualifications missing? 164 + 165 +**3. Logical Fallacies:** 166 +* **Cherry-picking**: Selective evidence presentation 167 +* **False equivalence**: Treating unequal things as equal 168 +* **Straw man**: Misrepresenting opposing arguments 169 +* **Ad hominem**: Attacking person instead of argument 170 +* **Slippery slope**: Assuming extreme consequences without justification 171 +* **Circular reasoning**: Conclusion assumes premise 172 +* **False dichotomy**: Presenting only two options when more exist 173 + 174 +**4. Reasoning Quality:** 175 +* **Evidence strength**: Quality and quantity of supporting evidence 176 +* **Logical coherence**: Arguments follow logically 177 +* **Transparency**: Assumptions and limitations acknowledged 178 +* **Nuance**: Complexity and uncertainty appropriately addressed 179 + 180 +**5. Publication Readiness:** 181 +* **Risk tier assignment**: A (high risk), B (medium), or C (low risk) 182 +* **Publication mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED 183 +* **Required disclaimers**: What warnings should accompany this content? 184 + 185 +===== 3.3.3 Implementation: Single-Pass Approach ===== 186 + 187 +**Input:** 188 +* Original article text (full content) 189 +* Stage 2 claim analyses (array of ClaimAnalysis objects) 190 +* Article metadata (URL, title, author, date, source) 191 + 192 +**Processing:** 193 + 194 +{{code language="python"}} 195 +# Pseudo-code for Stage 3 (Single-Pass) 196 + 197 +def stage3_holistic_assessment(article, claim_analyses, metadata): 198 + """ 199 + Single-pass holistic assessment using Claude Sonnet 4.5. 200 + 201 + Approach 1: One comprehensive prompt that asks the LLM to: 202 + 1. Review all claim verdicts 203 + 2. Identify patterns and dependencies 204 + 3. Detect logical fallacies 205 + 4. Assess reasoning quality 206 + 5. Determine credibility score and risk tier 207 + 6. Generate publication recommendations 208 + """ 209 + 210 + # Construct comprehensive prompt 211 + prompt = f""" 212 +You are analyzing an article for factual accuracy and logical reasoning. 213 + 214 +ARTICLE METADATA: 215 +- Title: {metadata['title']} 216 +- Source: {metadata['source']} 217 +- Date: {metadata['date']} 218 +- Author: {metadata['author']} 219 + 220 +ARTICLE TEXT: 221 +{article} 222 + 223 +INDIVIDUAL CLAIM ANALYSES: 224 +{format_claim_analyses(claim_analyses)} 225 + 226 +YOUR TASK: 227 +Perform a holistic assessment considering: 228 + 229 +1. CLAIM AGGREGATION: 230 + - Review the verdict for each claim 231 + - Identify any interdependencies between claims 232 + - Determine which claims are most critical to the article's thesis 233 + 234 +2. CONTEXTUAL EVALUATION: 235 + - Assess source credibility 236 + - Evaluate author expertise 237 + - Consider publication timeliness 238 + - Identify missing context or important caveats 239 + 240 +3. LOGICAL FALLACIES: 241 + - Identify any logical fallacies present 242 + - For each fallacy, provide: 243 + * Type of fallacy 244 + * Where it occurs in the article 245 + * Why it's problematic 246 + * Severity (minor/moderate/severe) 247 + 248 +4. REASONING QUALITY: 249 + - Evaluate evidence strength 250 + - Assess logical coherence 251 + - Check for transparency in assumptions 252 + - Evaluate handling of nuance and uncertainty 253 + 254 +5. 
CREDIBILITY SCORING: 254 + - Calculate overall credibility score (0.0-1.0) 255 + - Assign risk tier: 256 + * A (high risk): ≤0.5 credibility OR severe fallacies 257 + * B (medium risk): >0.5 and ≤0.8 credibility OR moderate issues 258 + * C (low risk): >0.8 credibility AND no significant issues 259 + 260 + 261 +6. PUBLICATION RECOMMENDATIONS: 262 + - Determine publication mode: 263 + * DRAFT_ONLY: Tier A, multiple severe issues 264 + * AI_GENERATED: Tier B/C, acceptable quality with disclaimers 265 + * HUMAN_REVIEWED: Complex or borderline cases 266 + - List required disclaimers 267 + - Explain decision rationale 268 + 269 +OUTPUT FORMAT: 270 +Return a JSON object matching the ArticleAssessment schema. 271 +""" 272 + 273 + # Call LLM 274 + response = llm_client.complete( 275 + model="claude-sonnet-4-5-20250929", 276 + prompt=prompt, 277 + max_tokens=4000, 278 + response_format="json" 279 + ) 280 + 281 + # Parse and validate response 282 + assessment = parse_json(response.content) 283 + validate_article_assessment_schema(assessment) 284 + 285 + return assessment 286 +{{/code}} 287 + 288 +**Prompt Engineering Notes:** 289 + 290 +1. **Structured Instructions**: Break down task into 6 clear sections 291 +2. **Context-Rich**: Provide article + all claim analyses + metadata 292 +3. **Explicit Criteria**: Define credibility scoring and risk tiers precisely 293 +4. **JSON Schema**: Request structured output matching ArticleAssessment schema 294 +5. **Examples** (in production): Include 2-3 example assessments for consistency 295 + 296 +===== 3.3.4 Credibility Scoring Algorithm ===== 297 + 298 +**Base Score Calculation:** 299 + 300 +{{code language="python"}} 301 +def calculate_credibility_score(claim_analyses, fallacies, contextual_factors): 302 + """ 303 + Calculate overall credibility score (0.0-1.0). 304 + 305 + This is a GUIDELINE for the LLM, not strict code. 306 + The LLM has flexibility to adjust based on context. 307 + """ 308 + 309 + # 1. Claim Verdict Score (60% weight) 310 + verdict_weights = { 311 + "TRUE": 1.0, 312 + "PARTIALLY_TRUE": 0.7, 313 + "DISPUTED": 0.5, 314 + "UNSUPPORTED": 0.3, 315 + "FALSE": 0.0, 316 + "UNVERIFIABLE": 0.4 317 + } 318 + 319 + claim_scores = [ 320 + verdict_weights[c.verdict.label] * c.verdict.confidence 321 + for c in claim_analyses 322 + ] 323 + avg_claim_score = sum(claim_scores) / len(claim_scores) 324 + claim_component = avg_claim_score * 0.6 325 + 326 + # 2. Fallacy Penalty (20% weight) 327 + fallacy_penalties = { 328 + "minor": -0.05, 329 + "moderate": -0.15, 330 + "severe": -0.30 331 + } 332 + 333 + fallacy_score = 1.0 334 + for fallacy in fallacies: 335 + fallacy_score += fallacy_penalties[fallacy.severity] 336 + 337 + fallacy_score = max(0.0, min(1.0, fallacy_score)) 338 + fallacy_component = fallacy_score * 0.2 339 + 340 + # 3. Contextual Factors (20% weight) 341 + context_adjustments = { 342 + "source_credibility": {"positive": +0.1, "neutral": 0, "negative": -0.1}, 343 + "author_expertise": {"positive": +0.1, "neutral": 0, "negative": -0.1}, 344 + "timeliness": {"positive": +0.05, "neutral": 0, "negative": -0.05}, 345 + "transparency": {"positive": +0.05, "neutral": 0, "negative": -0.05} 346 + } 347 + 348 + context_score = 1.0 349 + for factor in contextual_factors: 350 + adjustment = context_adjustments.get(factor.factor, {}).get(factor.impact, 0) 351 + context_score += adjustment 352 + 353 + context_score = max(0.0, min(1.0, context_score)) 354 + context_component = context_score * 0.2 355 + 356 + # 4.
Combine components 357 + final_score = claim_component + fallacy_component + context_component 358 + 359 + # 5. Apply confidence modifier 360 + avg_confidence = sum(c.verdict.confidence for c in claim_analyses) / len(claim_analyses) 361 + final_score = final_score * (0.8 + 0.2 * avg_confidence) 362 + 363 + return max(0.0, min(1.0, final_score)) 364 +{{/code}} 365 + 366 +**Note:** This algorithm is a **guideline** provided to the LLM in the system prompt. The LLM has flexibility to adjust based on specific article context, but should generally follow this structure for consistency. 367 + 368 +===== 3.3.5 Risk Tier Assignment ===== 369 + 370 +**Automatic Risk Tier Rules:** 371 + 372 +{{code}} 373 +Risk Tier A (High Risk - Requires Review): 374 +- Credibility score ≤ 0.5, OR 375 +- Any severe fallacies detected, OR 376 +- Multiple (3+) moderate fallacies, OR 377 +- 50%+ of claims are FALSE or UNSUPPORTED 378 + 379 +Risk Tier B (Medium Risk - May Publish with Disclaimers): 380 +- Credibility score > 0.5 and ≤ 0.8, OR 381 +- 1-2 moderate fallacies, OR 382 +- 20-49% of claims are DISPUTED or PARTIALLY_TRUE 383 + 384 +Risk Tier C (Low Risk - Safe to Publish): 385 +- Credibility score > 0.8, AND 386 +- No severe or moderate fallacies, AND 387 +- <20% disputed/problematic claims, AND 388 +- No critical missing context 389 +{{/code}} (A Python sketch of these rules appears below.) 390 + 391 +===== 3.3.6 Output: ArticleAssessment Schema ===== 392 + 393 +(See Stage 3 Output Schema section below for complete JSON schema) 394 + 395 +===== 3.3.7 Performance Metrics ===== 396 + 397 +**POC1 Targets:** 398 +* **Processing time**: 4-6 seconds per article 399 +* **Cost**: $0.030 per article (Sonnet 4.5 tokens) 400 +* **Quality**: 70-80% agreement with human reviewers (acceptable for POC) 401 +* **API calls**: 1 per article 402 + 403 +**Future Improvements (POC2/Production):** 404 +* Upgrade to Two-Pass (Approach 2): +15% accuracy, +$0.020 cost 405 +* Add human review sampling: 10% of Tier B articles 406 +* Implement Judge approach (Approach 7) for Tier A: Highest quality 407 + 408 +===== 3.3.8 Example Stage 3 Execution ===== 409 + 410 +**Input:** 411 +* Article: "Biden won the 2020 election" 412 +* Claim analyses: [{claim: "Biden won", verdict: "TRUE", confidence: 0.95}] 413 + 414 +**Stage 3 Processing:** 415 +1. Analyzes single claim with high confidence 416 +2. Checks for contextual factors (source credibility) 417 +3. Searches for logical fallacies (none found) 418 +4. Calculates base credibility: 0.6 * 0.95 + 0.2 * 1.0 + 0.2 * 1.0 = 0.97, then applies the confidence modifier: 0.97 * (0.8 + 0.2 * 0.95) ≈ 0.96 419 +5. Assigns risk tier: C (low risk) 420 +6. Recommends: AI_GENERATED publication mode 421 + 422 +**Output:** 423 +{{code language="json"}} 424 +{ 425 + "article_id": "a1", 426 + "overall_assessment": { 427 + "credibility_score": 0.96, 428 + "risk_tier": "C", 429 + "summary": "Article makes single verifiable claim with strong evidence support", 430 + "confidence": 0.95 431 + }, 432 + "claim_aggregation": { 433 + "total_claims": 1, 434 + "verdict_distribution": {"TRUE": 1}, 435 + "avg_confidence": 0.95 436 + }, 437 + "contextual_factors": [ 438 + {"factor": "source_credibility", "impact": "positive", "description": "Reputable news source"} 439 + ], 440 + "recommendations": { 441 + "publication_mode": "AI_GENERATED", 442 + "requires_review": false, 443 + "suggested_disclaimers": [] 444 + } 445 +} 446 +{{/code}} 447 + 118 118 ==== What Cache-Only Mode Provides: ==== 119 119 120 120 ✅ **Claim Extraction (Platform-Funded):** ... ...
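For illustration, the Tier A/B/C rules from section 3.3.5 above can be sketched in Python. The input shapes are assumptions (a list of fallacy severity strings and a list of verdict labels), and the Tier C "no critical missing context" check is omitted; this is a sketch of the rules, not a normative implementation.

{{code language="python"}}
def assign_risk_tier(credibility: float, fallacy_severities: list, verdicts: list) -> str:
    """Sketch of the automatic Tier A/B/C rules (illustrative, not normative)."""
    severe = fallacy_severities.count("severe")
    moderate = fallacy_severities.count("moderate")
    n = max(len(verdicts), 1)  # guard against an empty claim list
    false_or_unsupported = sum(v in ("FALSE", "UNSUPPORTED") for v in verdicts) / n
    disputed_or_partial = sum(v in ("DISPUTED", "PARTIALLY_TRUE") for v in verdicts) / n

    # Tier A: high risk, requires review
    if credibility <= 0.5 or severe > 0 or moderate >= 3 or false_or_unsupported >= 0.5:
        return "A"
    # Tier C: low risk (severe fallacies are already excluded by the Tier A branch)
    if credibility > 0.8 and moderate == 0 and disputed_or_partial < 0.2:
        return "C"
    # Tier B: everything in between
    return "B"
{{/code}}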
@@ -236,7 +236,7 @@ 236 236 **Primary Provider (Default):** 237 237 238 238 * **Anthropic Claude API** 239 - * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4 569 + * Models: Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4 240 240 * Used by default in POC1 241 241 * Best quality for holistic analysis 242 242 ... ... @@ -273,9 +273,9 @@ 273 273 LLM_STAGE1_PROVIDER=anthropic 274 274 LLM_STAGE1_MODEL=claude-haiku-4 275 275 LLM_STAGE2_PROVIDER=anthropic 276 -LLM_STAGE2_MODEL=claude-sonnet-3-5 606 +LLM_STAGE2_MODEL=claude-sonnet-4-5-20250929 277 277 LLM_STAGE3_PROVIDER=anthropic 278 -LLM_STAGE3_MODEL=claude-sonnet-3-5 608 +LLM_STAGE3_MODEL=claude-sonnet-4-5-20250929 279 279 280 280 # Cost limits 281 281 LLM_MAX_COST_PER_REQUEST=1.00 ... ... @@ -302,19 +302,19 @@ 302 302 "stage_config": { 303 303 "stage1": { 304 304 "provider": "anthropic", 305 - "model": "claude-haiku-4", 635 + "model": "claude-haiku-4-5-20251001", 306 306 "max_tokens": 4096, 307 307 "temperature": 0.0 308 308 }, 309 309 "stage2": { 310 310 "provider": "anthropic", 311 - "model": "claude-sonnet-3-5", 641 + "model": "claude-sonnet-4-5-20250929", 312 312 "max_tokens": 16384, 313 313 "temperature": 0.3 314 314 }, 315 315 "stage3": { 316 316 "provider": "anthropic", 317 - "model": "claude-sonnet-3-5", 647 + "model": "claude-sonnet-4-5-20250929", 318 318 "max_tokens": 8192, 319 319 "temperature": 0.2 320 320 } ... ... @@ -328,7 +328,7 @@ 328 328 329 329 **Stage 1: Claim Extraction** 330 330 331 -* **Default:** Anthropic Claude Haiku 4 661 +* **Default:** Anthropic Claude Haiku 4.5 332 332 * **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash 333 333 * **Rationale:** Fast, cheap, simple task 334 334 * **Cost:** ~$0.003 per article ... ... @@ -335,7 +335,7 @@ 335 335 336 336 **Stage 2: Claim Analysis** (CACHEABLE) 337 337 338 -* **Default:** Anthropic Claude Sonnet 3.5 668 +* **Default:** Anthropic Claude Sonnet 4.5 339 339 * **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro 340 340 * **Rationale:** High-quality analysis, cached 90 days 341 341 * **Cost:** ~$0.081 per NEW claim ... ... @@ -342,7 +342,7 @@ 342 342 343 343 **Stage 3: Holistic Assessment** 344 344 345 -* **Default:** Anthropic Claude Sonnet 3.5 675 +* **Default:** Anthropic Claude Sonnet 4.5 346 346 * **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes) 347 347 * **Rationale:** Complex reasoning, logical fallacy detection 348 348 * **Cost:** ~$0.030 per article ... ... @@ -350,9 +350,9 @@ 350 350 **Cost Comparison (Example):** 351 351 352 352 |=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative 353 -|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002) 354 -|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050) 355 -|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020) 683 +|Stage 1|Claude Haiku 4.5 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002) 684 +|Stage 2|Claude Sonnet 4.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050) 685 +|Stage 3|Claude Sonnet 4.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020) 356 356 |**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072** 357 357 358 358 **Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2. ... ... @@ -413,7 +413,7 @@ 413 413 "stage": "stage2", 414 414 "previous": { 415 415 "provider": "anthropic", 416 - "model": "claude-sonnet-3-5" 746 + "model": "claude-sonnet-4-5-20250929" 417 417 }, 418 418 "current": { 419 419 "provider": "openai", ... ...
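As a sketch of how the per-stage environment variables shown above might be resolved at startup, assuming the POC1 defaults from the stage_config JSON as fallbacks (the loader function name is an assumption, not part of the spec):

{{code language="python"}}
import os

# Fallback defaults mirroring the POC1 stage_config JSON above (assumed).
POC1_DEFAULTS = {
    "stage1": ("anthropic", "claude-haiku-4-5-20251001"),
    "stage2": ("anthropic", "claude-sonnet-4-5-20250929"),
    "stage3": ("anthropic", "claude-sonnet-4-5-20250929"),
}

def load_stage_config() -> dict:
    """Resolve LLM_STAGE<n>_PROVIDER / LLM_STAGE<n>_MODEL env vars with POC1 defaults."""
    config = {}
    for stage, (provider, model) in POC1_DEFAULTS.items():
        prefix = f"LLM_{stage.upper()}"  # e.g. LLM_STAGE2
        config[stage] = {
            "provider": os.environ.get(f"{prefix}_PROVIDER", provider),
            "model": os.environ.get(f"{prefix}_MODEL", model),
        }
    return config
{{/code}}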
@@ -439,17 +439,17 @@ 439 439 "stages": { 440 440 "stage1": { 441 441 "provider": "anthropic", 442 - "model": "claude-haiku-4", 772 + "model": "claude-haiku-4-5-20251001", 443 443 "cost_per_request": 0.003 444 444 }, 445 445 "stage2": { 446 446 "provider": "anthropic", 447 - "model": "claude-sonnet-3-5", 777 + "model": "claude-sonnet-4-5-20250929", 448 448 "cost_per_new_claim": 0.081 449 449 }, 450 450 "stage3": { 451 451 "provider": "anthropic", 452 - "model": "claude-sonnet-3-5", 782 + "model": "claude-sonnet-4-5-20250929", 453 453 "cost_per_request": 0.030 454 454 } 455 455 } ... ... @@ -466,7 +466,7 @@ 466 466 class AnthropicProvider implements LLMProvider { 467 467 async complete(prompt: string, options: CompletionOptions) { 468 468 const response = await anthropic.messages.create({ 469 - model: options.model || 'claude-sonnet-3-5', 799 + model: options.model || 'claude-sonnet-4-5-20250929', 470 470 max_tokens: options.maxTokens || 4096, 471 471 messages: [{ role: 'user', content: prompt }], 472 472 system: options.systemPrompt ... ... @@ -532,6 +532,178 @@ 532 532 533 533 ---- 534 534 865 + 866 + 867 +==== Stage 2 Output Schema: ClaimAnalysis ==== 868 + 869 +**Complete schema for each claim's analysis result:** 870 + 871 +{{code language="json"}} 872 +{ 873 + "claim_id": "claim_abc123", 874 + "claim_text": "Biden won the 2020 election", 875 + "scenarios": [ 876 + { 877 + "scenario_id": "scenario_1", 878 + "description": "Interpreting 'won' as Electoral College victory", 879 + "verdict": { 880 + "label": "TRUE", 881 + "confidence": 0.95, 882 + "explanation": "Joe Biden won 306 electoral votes vs Trump's 232" 883 + }, 884 + "evidence": { 885 + "supporting": [ 886 + { 887 + "text": "Biden certified with 306 electoral votes", 888 + "source_url": "https://www.archives.gov/electoral-college/2020", 889 + "source_title": "2020 Electoral College Results", 890 + "credibility_score": 0.98 891 + } 892 + ], 893 + "opposing": [] 894 + } 895 + } 896 + ], 897 + "recommended_scenario": "scenario_1", 898 + "metadata": { 899 + "analysis_timestamp": "2024-12-24T18:00:00Z", 900 + "model_used": "claude-sonnet-4-5-20250929", 901 + "processing_time_seconds": 8.5 902 + } 903 +} 904 +{{/code}} 905 + 906 +**Required Fields:** 907 +* **claim_id**: Unique identifier matching Stage 1 output 908 +* **claim_text**: The exact claim being analyzed 909 +* **scenarios**: Array of interpretation scenarios (minimum 1) 910 + * **scenario_id**: Unique ID for this scenario 911 + * **description**: Clear interpretation of the claim 912 + * **verdict**: Verdict object with label, confidence, explanation 913 + * **evidence**: Supporting and opposing evidence arrays 914 +* **recommended_scenario**: ID of the primary/recommended scenario 915 +* **metadata**: Processing metadata (timestamp, model, timing) 916 + 917 +**Optional Fields:** 918 +* Additional context, warnings, or quality scores 919 + 920 +**Minimum Viable Example:** 921 + 922 +{{code language="json"}} 923 +{ 924 + "claim_id": "c1", 925 + "claim_text": "The sky is blue", 926 + "scenarios": [{ 927 + "scenario_id": "s1", 928 + "description": "Under clear daytime conditions", 929 + "verdict": {"label": "TRUE", "confidence": 0.99, "explanation": "Rayleigh scattering"}, 930 + "evidence": {"supporting": [], "opposing": []} 931 + }], 932 + "recommended_scenario": "s1", 933 + "metadata": {"analysis_timestamp": "2024-12-24T18:00:00Z"} 934 +} 935 +{{/code}} 936 + 937 + 938 + 939 +==== Stage 3 Output Schema: ArticleAssessment ==== 940 + 941 +**Complete schema for holistic
article-level assessment:** 942 + 943 +{{code language="json"}} 944 +{ 945 + "article_id": "article_xyz789", 946 + "overall_assessment": { 947 + "credibility_score": 0.72, 948 + "risk_tier": "B", 949 + "summary": "Article contains mostly accurate claims with one disputed claim requiring expert review", 950 + "confidence": 0.85 951 + }, 952 + "claim_aggregation": { 953 + "total_claims": 5, 954 + "verdict_distribution": { 955 + "TRUE": 3, 956 + "PARTIALLY_TRUE": 1, 957 + "DISPUTED": 1, 958 + "FALSE": 0, 959 + "UNSUPPORTED": 0, 960 + "UNVERIFIABLE": 0 961 + }, 962 + "avg_confidence": 0.82 963 + }, 964 + "contextual_factors": [ 965 + { 966 + "factor": "source_credibility", 967 + "impact": "positive", 968 + "description": "Published by reputable news organization" 969 + }, 970 + { 971 + "factor": "claim_interdependence", 972 + "impact": "neutral", 973 + "description": "Claims are independent; no logical chains" 974 + } 975 + ], 976 + "recommendations": { 977 + "publication_mode": "AI_GENERATED", 978 + "requires_review": false, 979 + "review_reason": null, 980 + "suggested_disclaimers": [ 981 + "One claim (Claim 4) has conflicting expert opinions" 982 + ] 983 + }, 984 + "metadata": { 985 + "holistic_timestamp": "2024-12-24T18:00:10Z", 986 + "model_used": "claude-sonnet-4-5-20250929", 987 + "processing_time_seconds": 4.2, 988 + "cache_used": false 989 + } 990 +} 991 +{{/code}} 992 + 993 +**Required Fields:** 994 +* **article_id**: Unique identifier for this article 995 +* **overall_assessment**: Top-level assessment 996 + * **credibility_score**: 0.0-1.0 composite score 997 + * **risk_tier**: A, B, or C (per AKEL quality gates) 998 + * **summary**: Human-readable assessment 999 + * **confidence**: How confident the holistic assessment is 1000 +* **claim_aggregation**: Statistics across all claims 1001 + * **total_claims**: Count of claims analyzed 1002 + * **verdict_distribution**: Count per verdict label 1003 + * **avg_confidence**: Average confidence across verdicts 1004 +* **contextual_factors**: Array of contextual considerations 1005 +* **recommendations**: Publication decision support 1006 + * **publication_mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED 1007 + * **requires_review**: Boolean flag 1008 + * **suggested_disclaimers**: Array of disclaimer texts 1009 +* **metadata**: Processing metadata 1010 + 1011 +**Minimum Viable Example:** 1012 + 1013 +{{code language="json"}} 1014 +{ 1015 + "article_id": "a1", 1016 + "overall_assessment": { 1017 + "credibility_score": 0.95, 1018 + "risk_tier": "C", 1019 + "summary": "All claims verified as true", 1020 + "confidence": 0.98 1021 + }, 1022 + "claim_aggregation": { 1023 + "total_claims": 1, 1024 + "verdict_distribution": {"TRUE": 1}, 1025 + "avg_confidence": 0.99 1026 + }, 1027 + "contextual_factors": [], 1028 + "recommendations": { 1029 + "publication_mode": "AI_GENERATED", 1030 + "requires_review": false, 1031 + "suggested_disclaimers": [] 1032 + }, 1033 + "metadata": {"holistic_timestamp": "2024-12-24T18:00:00Z"} 1034 +} 1035 +{{/code}} 1036 + 535 535 === 3.2 Create Analysis Job (3-Stage) === 536 536 537 537 **Endpoint:** POST /v1/analyze ... ...
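Before full JSON Schema validation, the required-field lists above allow a cheap top-level sanity check on Stage 2 and Stage 3 outputs. A minimal sketch (the constant and function names are illustrative assumptions, not part of the spec):

{{code language="python"}}
# Top-level required fields, taken from the schema descriptions above.
CLAIM_ANALYSIS_REQUIRED = {
    "claim_id", "claim_text", "scenarios", "recommended_scenario", "metadata",
}
ARTICLE_ASSESSMENT_REQUIRED = {
    "article_id", "overall_assessment", "claim_aggregation",
    "contextual_factors", "recommendations", "metadata",
}

def missing_fields(payload: dict, required: set) -> list:
    """Return the missing top-level fields, sorted (an empty list means the check passes)."""
    return sorted(required - payload.keys())
{{/code}}

Both "Minimum Viable" examples above pass this check; deeper constraints (verdict labels, score ranges, scenario sub-fields) would still belong in full schema validation.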
@@ -583,6 +583,20 @@ 583 583 "browsing": "on", 584 584 "depth": "standard", 585 585 "max_claims": 5, 1088 + 1089 +* **cache_preference** (optional): Cache usage preference 1090 + * **Type:** string 1091 + * **Enum:** {{code}}["prefer_cache", "allow_partial", "skip_cache"]{{/code}} 1092 + * **Default:** {{code}}"prefer_cache"{{/code}} 1093 + * **Semantics:** 1094 + * {{code}}"prefer_cache"{{/code}}: Use full cache if available, otherwise run all stages 1095 + * {{code}}"allow_partial"{{/code}}: Use cached Stage 2 results if available, rerun only Stage 3 1096 + * {{code}}"skip_cache"{{/code}}: Always rerun all stages (ignore cache) 1097 + * **Behavior:** When set to {{code}}"allow_partial"{{/code}} and Stage 2 cached results exist: 1098 + * Stage 1 & 2 are skipped 1099 + * Stage 3 (holistic assessment) runs fresh with cached claim analyses 1100 + * Response includes {{code}}"cache_used": true{{/code}} and {{code}}"stages_cached": ["stage1", "stage2"]{{/code}} 1101 + 586 586 "scenarios_per_claim": 2, 587 587 "max_evidence_per_scenario": 6, 588 588 "context_aware_analysis": true ... ... @@ -770,80 +770,78 @@ 770 770 771 771 **Algorithm: Canonical Claim Normalization v1** 772 772 773 -{{{def normalize_claim_v1(claim_text: str, language: str) -> str: 774 - """ 775 - Normalizes claim to canonical form for cache key generation. 776 - Version: v1norm1 (POC1) 777 - """ 778 - import re 779 - import unicodedata 780 - 781 - # Step 1: Unicode normalization (NFC) 782 - text = unicodedata.normalize('NFC', claim_text) 783 - 784 - # Step 2: Lowercase 785 - text = text.lower() 786 - 787 - # Step 3: Remove punctuation (except hyphens in words) 788 - text = re.sub(r'[^\w\s-]', '', text) 789 - 790 - # Step 4: Normalize whitespace (collapse multiple spaces) 791 - text = re.sub(r'\s+', ' ', text).strip() 792 - 793 - # Step 5: Numeric normalization 794 - text = text.replace('%', ' percent') 795 - # Spell out single-digit numbers 796 - num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three', 797 - '4':'four', '5':'five', '6':'six', '7':'seven', 798 - '8':'eight', '9':'nine'} 799 - for num, word in num_to_word.items(): 800 - text = re.sub(rf'\b{num}\b', word, text) 801 - 802 - # Step 6: Common abbreviations (English only in v1) 803 - if language == 'en': 804 - text = text.replace('covid-19', 'covid') 805 - text = text.replace('u.s.', 'us') 806 - text = text.replace('u.k.', 'uk') 807 - 808 - # Step 7: NO entity normalization in v1 809 - # (Trump vs Donald Trump vs President Trump remain distinct) 810 - 811 - return text 812 812 813 -# Version identifier (include in cache namespace) 814 -CANONICALIZER_VERSION = "v1norm1" 815 -}}} 1290 +**Normative Algorithm:** 816 816 817 -**Cache Key Formula (Updated):** 1292 +{{code language="python"}} 1293 +def normalize_claim(text: str) -> str: 1294 + """ 1295 + Canonical claim normalization for deduplication. 1296 + MUST follow this algorithm exactly. 1297 + 1298 + Version: v1norm1 1299 + """ 1300 + import re 1301 + import unicodedata 1302 + 1303 + # 1. Unicode normalization (NFD) 1304 + text = unicodedata.normalize('NFD', text) 1305 + 1306 + # 2. Lowercase 1307 + text = text.lower() 1308 + 1309 + # 3. Remove diacritics 1310 + text = ''.join(c for c in text if unicodedata.category(c) != 'Mn') 1311 + 1312 + # 4. Normalize whitespace 1313 + text = re.sub(r'\s+', ' ', text) 1314 + text = text.strip() 1315 + 1316 + # 5. Remove punctuation except apostrophes in contractions 1317 + text = re.sub(r"[^\w\s']", '', text) 1318 + 1319 + # 6. 
Normalize common contractions 1320 + contractions = { 1321 + "don't": "do not", 1322 + "doesn't": "does not", 1323 + "didn't": "did not", 1324 + "can't": "cannot", 1325 + "won't": "will not", 1326 + "shouldn't": "should not", 1327 + "wouldn't": "would not", 1328 + "isn't": "is not", 1329 + "aren't": "are not", 1330 + "wasn't": "was not", 1331 + "weren't": "were not", 1332 + "haven't": "have not", 1333 + "hasn't": "has not", 1334 + "hadn't": "had not" 1335 + } 1336 + 1337 + for contraction, expansion in contractions.items(): 1338 + text = re.sub(r'\b' + contraction + r'\b', expansion, text) 1339 + 1340 + # 7. Remove remaining apostrophes 1341 + text = text.replace("'", "") 1342 + 1343 + # 8. Final whitespace normalization 1344 + text = re.sub(r'\s+', ' ', text) 1345 + text = text.strip() 1346 + 1347 + return text 1348 +{{/code}} 818 818 819 -{{{language = "en" 820 -canonical = normalize_claim_v1(claim_text, language) 821 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}" 1350 +**Normalization Examples:** 822 822 823 - Example: 824 - claim: "COVID-19 vaccines are 95% effective" 825 - canonical: "covid vaccines are 95 percent effective" 826 - sha256: abc123...def456 827 - key: "claim:v1norm1:en:abc123...def456" 828 -}}} 1352 +|= Input |= Normalized Output 1353 +| "Biden won the 2020 election" | {{code}}biden won the 2020 election{{/code}} 1354 +| "Biden won the 2020 election!" | {{code}}biden won the 2020 election{{/code}} 1355 +| "Bíden wón the 2020 election" | {{code}}biden won the 2020 election{{/code}} 1356 +| "Biden didn't win the 2020 election" | {{code}}biden did not win the 2020 election{{/code}} 1357 +| "BIDEN WON THE 2020 ELECTION" | {{code}}biden won the 2020 election{{/code}} 829 829 830 -**Cache Metadata MUST Include:** 1359 +**Versioning:** Algorithm version is {{code}}v1norm1{{/code}}. Changes to the algorithm require a new version identifier. 831 831 832 -{{{{ 833 - "canonical_claim": "covid vaccines are 95 percent effective", 834 - "canonicalizer_version": "v1norm1", 835 - "language": "en", 836 - "original_claim_samples": ["COVID-19 vaccines are 95% effective"] 837 -} 838 -}}} 839 - 840 -**Version Upgrade Path:** 841 - 842 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL 843 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches 844 - 845 ----- 846 - 847 847 === 5.1.2 Copyright & Data Retention Policy === 848 848 849 849 **Evidence Excerpt Storage:**
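For reference, the v1norm1 algorithm in 5.1.1 above feeds directly into the Stage 2 cache key format claim:v1norm1:{language}:{sha256(canonical_claim)} given in the stage overview. A minimal sketch (the helper name is an assumption; normalize_claim is the normative function above):

{{code language="python"}}
import hashlib

CANONICALIZER_VERSION = "v1norm1"

def stage2_cache_key(claim_text: str, language: str = "en") -> str:
    """Build the Stage 2 cache key from a raw claim string."""
    canonical = normalize_claim(claim_text)  # normative v1norm1 algorithm above
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"
{{/code}}

Because "Biden won the 2020 election!" and "BIDEN WON THE 2020 ELECTION" normalize to the same canonical string, they map to the same cache entry.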