Last modified by Robert Schaub on 2025/12/24 20:16

From version 1.1
edited by Robert Schaub
on 2025/12/24 19:45
Change comment: Imported from XAR
To version 2.1
edited by Robert Schaub
on 2025/12/24 19:51
Change comment: Imported from XAR

Summary

Details

Page properties
Content
... ... @@ -58,7 +58,7 @@
58 58  
59 59  * **Input:** Article text
60 60  * **Output:** 5 canonical claims (normalized, deduplicated)
61 -* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
61 +* **Model:** Claude Haiku 4.5 (default, configurable via LLM abstraction layer)
62 62  * **Cost:** $0.003 per article
63 63  * **Cache strategy:** No caching (article-specific)
64 64  
... ... @@ -66,7 +66,7 @@
66 66  
67 67  * **Input:** Single canonical claim
68 68  * **Output:** Scenarios + Evidence + Verdicts
69 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
69 +* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
70 70  * **Cost:** $0.081 per NEW claim
71 71  * **Cache strategy:** Redis, 90-day TTL
72 72  * **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
... ... @@ -75,7 +75,7 @@
75 75  
76 76  * **Input:** Article + Claim verdicts (from cache or Stage 2)
77 77  * **Output:** Article verdict + Fallacies + Logic quality
78 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
78 +* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
79 79  * **Cost:** $0.030 per article
80 80  * **Cache strategy:** No caching (article-specific)
81 81  
... ... @@ -115,6 +115,336 @@
115 115  
116 116  When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
117 117  
118 +
119 +
120 +==== Stage 3: Holistic Assessment - Complete Specification ====
121 +
122 +===== 3.3.1 Overview =====
123 +
124 +**Purpose:** Synthesize individual claim analyses into an overall article assessment, identifying logical fallacies, reasoning quality, and publication readiness.
125 +
126 +**Approach:** **Single-Pass Holistic Analysis** (Approach 1 from Comparison Matrix)
127 +
128 +**Why This Approach for POC1:**
129 +* ✅ **1 API call** (vs 2 for Two-Pass or Judge)
130 +* ✅ **Low cost** ($0.030 per article)
131 +* ✅ **Fast** (4-6 seconds)
132 +* ✅ **Low complexity** (simple implementation)
133 +* ⚠️ **Medium reliability** (acceptable for POC1, will improve in POC2/Production)
134 +
135 +**Alternative Approaches Considered:**
136 +
137 +|= Approach |= API Calls |= Cost |= Speed |= Complexity |= Reliability |= Best For
138 +| **1. Single-Pass** ⭐ | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | **POC1**
139 +| 2. Two-Pass | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | POC2/Prod
140 +| 3. Structured | 1 | 💰 Low | ⚡ Fast | 🟡 Med | ✅ High | POC1 (alternative)
141 +| 4. Weighted | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | POC1 (alternative)
142 +| 5. Heuristics | 1 | 💰 Lowest | ⚡⚡ Fastest | 🟡 Med | ⚠️ Medium | Any
143 +| 6. Hybrid | 1 | 💰 Low | ⚡ Fast | 🔴 Med-High | ✅ High | POC2
144 +| 7. Judge | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | Production
145 +
146 +**POC1 Choice:** Approach 1 (Single-Pass) for speed and simplicity. Will upgrade to Approach 2 (Two-Pass) or 6 (Hybrid) in POC2 for higher reliability.
147 +
148 +===== 3.3.2 What Stage 3 Evaluates =====
149 +
150 +Stage 3 performs **integrated holistic analysis** considering:
151 +
152 +**1. Claim-Level Aggregation:**
153 +* Verdict distribution (how many TRUE vs FALSE vs DISPUTED)
154 +* Average confidence across all claims
155 +* Claim interdependencies (do claims support/contradict each other?)
156 +* Critical claim identification (which claims are most important?)
157 +
158 +**2. Contextual Factors:**
159 +* **Source credibility**: Is the article from a reputable publisher?
160 +* **Author expertise**: Does the author have relevant credentials?
161 +* **Publication date**: Is information current or outdated?
162 +* **Claim coherence**: Do claims form a logical narrative?
163 +* **Missing context**: Are important caveats or qualifications missing?
164 +
165 +**3. Logical Fallacies:**
166 +* **Cherry-picking**: Selective evidence presentation
167 +* **False equivalence**: Treating unequal things as equal
168 +* **Straw man**: Misrepresenting opposing arguments
169 +* **Ad hominem**: Attacking person instead of argument
170 +* **Slippery slope**: Assuming extreme consequences without justification
171 +* **Circular reasoning**: Conclusion assumes premise
172 +* **False dichotomy**: Presenting only two options when more exist
173 +
174 +**4. Reasoning Quality:**
175 +* **Evidence strength**: Quality and quantity of supporting evidence
176 +* **Logical coherence**: Arguments follow logically
177 +* **Transparency**: Assumptions and limitations acknowledged
178 +* **Nuance**: Complexity and uncertainty appropriately addressed
179 +
180 +**5. Publication Readiness:**
181 +* **Risk tier assignment**: A (high risk), B (medium), or C (low risk)
182 +* **Publication mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
183 +* **Required disclaimers**: What warnings should accompany this content?
184 +
185 +===== 3.3.3 Implementation: Single-Pass Approach =====
186 +
187 +**Input:**
188 +* Original article text (full content)
189 +* Stage 2 claim analyses (array of ClaimAnalysis objects)
190 +* Article metadata (URL, title, author, date, source)
191 +
192 +**Processing:**
193 +
194 +{{code language="python"}}
195 +# Pseudo-code for Stage 3 (Single-Pass)
196 +
197 +def stage3_holistic_assessment(article, claim_analyses, metadata):
198 + """
199 + Single-pass holistic assessment using Claude Sonnet 4.5.
200 +
201 + Approach 1: One comprehensive prompt that asks the LLM to:
202 + 1. Review all claim verdicts
203 + 2. Identify patterns and dependencies
204 + 3. Detect logical fallacies
205 + 4. Assess reasoning quality
206 + 5. Determine credibility score and risk tier
207 + 6. Generate publication recommendations
208 + """
209 +
210 + # Construct comprehensive prompt
211 + prompt = f"""
212 +You are analyzing an article for factual accuracy and logical reasoning.
213 +
214 +ARTICLE METADATA:
215 +- Title: {metadata['title']}
216 +- Source: {metadata['source']}
217 +- Date: {metadata['date']}
218 +- Author: {metadata['author']}
219 +
220 +ARTICLE TEXT:
221 +{article}
222 +
223 +INDIVIDUAL CLAIM ANALYSES:
224 +{format_claim_analyses(claim_analyses)}
225 +
226 +YOUR TASK:
227 +Perform a holistic assessment considering:
228 +
229 +1. CLAIM AGGREGATION:
230 + - Review the verdict for each claim
231 + - Identify any interdependencies between claims
232 + - Determine which claims are most critical to the article's thesis
233 +
234 +2. CONTEXTUAL EVALUATION:
235 + - Assess source credibility
236 + - Evaluate author expertise
237 + - Consider publication timeliness
238 + - Identify missing context or important caveats
239 +
240 +3. LOGICAL FALLACIES:
241 + - Identify any logical fallacies present
242 + - For each fallacy, provide:
243 + * Type of fallacy
244 + * Where it occurs in the article
245 + * Why it's problematic
246 + * Severity (minor/moderate/severe)
247 +
248 +4. REASONING QUALITY:
249 + - Evaluate evidence strength
250 + - Assess logical coherence
251 + - Check for transparency in assumptions
252 + - Evaluate handling of nuance and uncertainty
253 +
254 +5. CREDIBILITY SCORING:
255 + - Calculate overall credibility score (0.0-1.0)
256 + - Assign risk tier:
257 + * A (high risk): ≤0.5 credibility OR severe fallacies
258 + * B (medium risk): credibility > 0.5 and ≤ 0.8 OR moderate issues
259 + * C (low risk): >0.8 credibility AND no significant issues
260 +
261 +6. PUBLICATION RECOMMENDATIONS:
262 + - Determine publication mode:
263 + * DRAFT_ONLY: Tier A, multiple severe issues
264 + * AI_GENERATED: Tier B/C, acceptable quality with disclaimers
265 + * HUMAN_REVIEWED: Complex or borderline cases
266 + - List required disclaimers
267 + - Explain decision rationale
268 +
269 +OUTPUT FORMAT:
270 +Return a JSON object matching the ArticleAssessment schema.
271 +"""
272 +
273 + # Call LLM
274 + response = llm_client.complete(
275 + model="claude-sonnet-4-5-20250929",
276 + prompt=prompt,
277 + max_tokens=4000,
278 + response_format="json"
279 + )
280 +
281 + # Parse and validate response
282 + assessment = parse_json(response.content)
283 + validate_article_assessment_schema(assessment)
284 +
285 + return assessment
286 +{{/code}}
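The pseudo-code above calls a {{code}}format_claim_analyses{{/code}} helper that is not defined anywhere in this section. A plausible sketch is shown below; the helper name comes from the pseudo-code, but the flat {{code}}claim_text{{/code}}/{{code}}verdict{{/code}} shape it assumes is illustrative (the full Stage 2 schema nests verdicts under scenarios), so treat this as one possible rendering, not a fixed API:

```python
# Hypothetical sketch of the format_claim_analyses helper referenced in the
# Stage 3 prompt. Any rendering that preserves each claim's verdict label,
# confidence, and explanation would serve the same purpose.

def format_claim_analyses(claim_analyses: list[dict]) -> str:
    lines = []
    for i, c in enumerate(claim_analyses, 1):
        v = c["verdict"]
        lines.append(
            f"Claim {i}: {c['claim_text']}\n"
            f"  Verdict: {v['label']} (confidence {v['confidence']:.2f})\n"
            f"  Explanation: {v['explanation']}"
        )
    return "\n\n".join(lines)
```

A compact text rendering like this keeps the prompt short while still giving the LLM every verdict it needs for aggregation.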
287 +
288 +**Prompt Engineering Notes:**
289 +
290 +1. **Structured Instructions**: Break down task into 6 clear sections
291 +2. **Context-Rich**: Provide article + all claim analyses + metadata
292 +3. **Explicit Criteria**: Define credibility scoring and risk tiers precisely
293 +4. **JSON Schema**: Request structured output matching ArticleAssessment schema
294 +5. **Examples** (in production): Include 2-3 example assessments for consistency
295 +
296 +===== 3.3.4 Credibility Scoring Algorithm =====
297 +
298 +**Base Score Calculation:**
299 +
300 +{{code language="python"}}
301 +def calculate_credibility_score(claim_analyses, fallacies, contextual_factors):
302 + """
303 + Calculate overall credibility score (0.0-1.0).
304 +
305 + This is a GUIDELINE for the LLM, not strict code.
306 + The LLM has flexibility to adjust based on context.
307 + """
308 +
309 + # 1. Claim Verdict Score (60% weight)
310 + verdict_weights = {
311 + "TRUE": 1.0,
312 + "PARTIALLY_TRUE": 0.7,
313 + "DISPUTED": 0.5,
314 + "UNSUPPORTED": 0.3,
315 + "FALSE": 0.0,
316 + "UNVERIFIABLE": 0.4
317 + }
318 +
319 + claim_scores = [
320 + verdict_weights[c.verdict.label] * c.verdict.confidence
321 + for c in claim_analyses
322 + ]
323 + avg_claim_score = sum(claim_scores) / len(claim_scores)
324 + claim_component = avg_claim_score * 0.6
325 +
326 + # 2. Fallacy Penalty (20% weight)
327 + fallacy_penalties = {
328 + "minor": -0.05,
329 + "moderate": -0.15,
330 + "severe": -0.30
331 + }
332 +
333 + fallacy_score = 1.0
334 + for fallacy in fallacies:
335 + fallacy_score += fallacy_penalties[fallacy.severity]
336 +
337 + fallacy_score = max(0.0, min(1.0, fallacy_score))
338 + fallacy_component = fallacy_score * 0.2
339 +
340 + # 3. Contextual Factors (20% weight)
341 + context_adjustments = {
342 + "source_credibility": {"positive": +0.1, "neutral": 0, "negative": -0.1},
343 + "author_expertise": {"positive": +0.1, "neutral": 0, "negative": -0.1},
344 + "timeliness": {"positive": +0.05, "neutral": 0, "negative": -0.05},
345 + "transparency": {"positive": +0.05, "neutral": 0, "negative": -0.05}
346 + }
347 +
348 + context_score = 1.0
349 + for factor in contextual_factors:
350 + adjustment = context_adjustments.get(factor.factor, {}).get(factor.impact, 0)
351 + context_score += adjustment
352 +
353 + context_score = max(0.0, min(1.0, context_score))
354 + context_component = context_score * 0.2
355 +
356 + # 4. Combine components
357 + final_score = claim_component + fallacy_component + context_component
358 +
359 + # 5. Apply confidence modifier
360 + avg_confidence = sum(c.verdict.confidence for c in claim_analyses) / len(claim_analyses)
361 + final_score = final_score * (0.8 + 0.2 * avg_confidence)
362 +
363 + return max(0.0, min(1.0, final_score))
364 +{{/code}}
365 +
366 +**Note:** This algorithm is a **guideline** provided to the LLM in the system prompt. The LLM has flexibility to adjust based on specific article context, but should generally follow this structure for consistency.
367 +
368 +===== 3.3.5 Risk Tier Assignment =====
369 +
370 +**Automatic Risk Tier Rules:**
371 +
372 +{{code}}
373 +Risk Tier A (High Risk - Requires Review):
374 +- Credibility score ≤ 0.5, OR
375 +- Any severe fallacies detected, OR
376 +- Multiple (3+) moderate fallacies, OR
377 +- 50%+ of claims are FALSE or UNSUPPORTED
378 +
379 +Risk Tier B (Medium Risk - May Publish with Disclaimers):
380 +- Credibility score > 0.5 and ≤ 0.8, OR
381 +- 1-2 moderate fallacies, OR
382 +- 20-49% of claims are DISPUTED or PARTIALLY_TRUE
383 +
384 +Risk Tier C (Low Risk - Safe to Publish):
385 +- Credibility score > 0.8, AND
386 +- No severe or moderate fallacies, AND
387 +- <20% disputed/problematic claims, AND
388 +- No critical missing context
389 +{{/code}}
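The tier rules above can be sketched as a small decision function. This is a guideline sketch, not normative code: the {{code}}severity{{/code}} field and verdict labels come from this specification, while the function and argument names are illustrative:

```python
# Sketch of the automatic risk-tier rules above. `fallacies` is a list of
# dicts with a "severity" field; `verdicts` is a list of verdict labels.

def assign_risk_tier(credibility_score: float, fallacies: list, verdicts: list) -> str:
    severe = sum(1 for f in fallacies if f["severity"] == "severe")
    moderate = sum(1 for f in fallacies if f["severity"] == "moderate")
    total = max(len(verdicts), 1)
    false_unsupported = sum(1 for v in verdicts if v in ("FALSE", "UNSUPPORTED")) / total
    disputed_partial = sum(1 for v in verdicts if v in ("DISPUTED", "PARTIALLY_TRUE")) / total

    # Tier A: any high-risk trigger fires
    if (credibility_score <= 0.5 or severe >= 1 or moderate >= 3
            or false_unsupported >= 0.5):
        return "A"
    # Tier B: medium-risk triggers
    if credibility_score <= 0.8 or moderate >= 1 or disputed_partial >= 0.2:
        return "B"
    # Tier C: everything else (score > 0.8, no significant issues)
    return "C"
```

Evaluating Tier A conditions first makes the "OR"-combined rules unambiguous: an article with a severe fallacy lands in Tier A even if its credibility score alone would qualify for Tier C.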
390 +
391 +===== 3.3.6 Output: ArticleAssessment Schema =====
392 +
393 +(See Stage 3 Output Schema section above for complete JSON schema)
394 +
395 +===== 3.3.7 Performance Metrics =====
396 +
397 +**POC1 Targets:**
398 +* **Processing time**: 4-6 seconds per article
399 +* **Cost**: $0.030 per article (Sonnet 4.5 tokens)
400 +* **Quality**: 70-80% agreement with human reviewers (acceptable for POC)
401 +* **API calls**: 1 per article
402 +
403 +**Future Improvements (POC2/Production):**
404 +* Upgrade to Two-Pass (Approach 2): +15% accuracy, +$0.020 cost
405 +* Add human review sampling: 10% of Tier B articles
406 +* Implement Judge approach (Approach 7) for Tier A: Highest quality
407 +
408 +===== 3.3.8 Example Stage 3 Execution =====
409 +
410 +**Input:**
411 +* Article: "Biden won the 2020 election"
412 +* Claim analyses: [{claim: "Biden won", verdict: "TRUE", confidence: 0.95}]
413 +
414 +**Stage 3 Processing:**
415 +1. Analyzes single claim with high confidence
416 +2. Checks for contextual factors (source credibility)
417 +3. Searches for logical fallacies (none found)
418 +4. Calculates credibility: 0.6 * 0.95 + 0.2 * 1.0 + 0.2 * 1.0 = 0.97
419 +5. Assigns risk tier: C (low risk)
420 +6. Recommends: AI_GENERATED publication mode
421 +
422 +**Output:**
423 +```json
424 +{
425 + "article_id": "a1",
426 + "overall_assessment": {
427 + "credibility_score": 0.97,
428 + "risk_tier": "C",
429 + "summary": "Article makes single verifiable claim with strong evidence support",
430 + "confidence": 0.95
431 + },
432 + "claim_aggregation": {
433 + "total_claims": 1,
434 + "verdict_distribution": {"TRUE": 1},
435 + "avg_confidence": 0.95
436 + },
437 + "contextual_factors": [
438 + {"factor": "source_credibility", "impact": "positive", "description": "Reputable news source"}
439 + ],
440 + "recommendations": {
441 + "publication_mode": "AI_GENERATED",
442 + "requires_review": false,
443 + "suggested_disclaimers": []
444 + }
445 +}
446 +```
447 +
118 118  ==== What Cache-Only Mode Provides: ====
119 119  
120 120  ✅ **Claim Extraction (Platform-Funded):**
... ... @@ -236,7 +236,7 @@
236 236  **Primary Provider (Default):**
237 237  
238 238  * **Anthropic Claude API**
239 - * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
569 + * Models: Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4
240 240   * Used by default in POC1
241 241   * Best quality for holistic analysis
242 242  
... ... @@ -273,9 +273,9 @@
273 273  LLM_STAGE1_PROVIDER=anthropic
274 -LLM_STAGE1_MODEL=claude-haiku-4
604 +LLM_STAGE1_MODEL=claude-haiku-4-5-20251001
275 275  LLM_STAGE2_PROVIDER=anthropic
276 -LLM_STAGE2_MODEL=claude-sonnet-3-5
606 +LLM_STAGE2_MODEL=claude-sonnet-4-5-20250929
277 277  LLM_STAGE3_PROVIDER=anthropic
278 -LLM_STAGE3_MODEL=claude-sonnet-3-5
608 +LLM_STAGE3_MODEL=claude-sonnet-4-5-20250929
279 279  
280 280  # Cost limits
281 281  LLM_MAX_COST_PER_REQUEST=1.00
... ... @@ -302,19 +302,19 @@
302 302   "stage_config": {
303 303   "stage1": {
304 304   "provider": "anthropic",
305 - "model": "claude-haiku-4",
635 + "model": "claude-haiku-4-5-20251001",
306 306   "max_tokens": 4096,
307 307   "temperature": 0.0
308 308   },
309 309   "stage2": {
310 310   "provider": "anthropic",
311 - "model": "claude-sonnet-3-5",
641 + "model": "claude-sonnet-4-5-20250929",
312 312   "max_tokens": 16384,
313 313   "temperature": 0.3
314 314   },
315 315   "stage3": {
316 316   "provider": "anthropic",
317 - "model": "claude-sonnet-3-5",
647 + "model": "claude-sonnet-4-5-20250929",
318 318   "max_tokens": 8192,
319 319   "temperature": 0.2
320 320   }
... ... @@ -328,7 +328,7 @@
328 328  
329 329  **Stage 1: Claim Extraction**
330 330  
331 -* **Default:** Anthropic Claude Haiku 4
661 +* **Default:** Anthropic Claude Haiku 4.5
332 332  * **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
333 333  * **Rationale:** Fast, cheap, simple task
334 334  * **Cost:** ~$0.003 per article
... ... @@ -335,7 +335,7 @@
335 335  
336 336  **Stage 2: Claim Analysis** (CACHEABLE)
337 337  
338 -* **Default:** Anthropic Claude Sonnet 3.5
668 +* **Default:** Anthropic Claude Sonnet 4.5
339 339  * **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
340 340  * **Rationale:** High-quality analysis, cached 90 days
341 341  * **Cost:** ~$0.081 per NEW claim
... ... @@ -342,7 +342,7 @@
342 342  
343 343  **Stage 3: Holistic Assessment**
344 344  
345 -* **Default:** Anthropic Claude Sonnet 3.5
675 +* **Default:** Anthropic Claude Sonnet 4.5
346 346  * **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
347 347  * **Rationale:** Complex reasoning, logical fallacy detection
348 348  * **Cost:** ~$0.030 per article
... ... @@ -350,9 +350,9 @@
350 350  **Cost Comparison (Example):**
351 351  
352 352  |=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
353 -|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
354 -|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
355 -|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
683 +|Stage 1|Claude Haiku 4.5 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
684 +|Stage 2|Claude Sonnet 4.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
685 +|Stage 3|Claude Sonnet 4.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
356 356  |**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072**
357 357  
358 358  **Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2.
... ... @@ -413,7 +413,7 @@
413 413   "stage": "stage2",
414 414   "previous": {
415 415   "provider": "anthropic",
416 - "model": "claude-sonnet-3-5"
746 + "model": "claude-sonnet-4-5-20250929"
417 417   },
418 418   "current": {
419 419   "provider": "openai",
... ... @@ -439,17 +439,17 @@
439 439   "stages": {
440 440   "stage1": {
441 441   "provider": "anthropic",
442 - "model": "claude-haiku-4",
772 + "model": "claude-haiku-4-5-20251001",
443 443   "cost_per_request": 0.003
444 444   },
445 445   "stage2": {
446 446   "provider": "anthropic",
447 - "model": "claude-sonnet-3-5",
777 + "model": "claude-sonnet-4-5-20250929",
448 448   "cost_per_new_claim": 0.081
449 449   },
450 450   "stage3": {
451 451   "provider": "anthropic",
452 - "model": "claude-sonnet-3-5",
782 + "model": "claude-sonnet-4-5-20250929",
453 453   "cost_per_request": 0.030
454 454   }
455 455   }
... ... @@ -466,7 +466,7 @@
466 466  class AnthropicProvider implements LLMProvider {
467 467   async complete(prompt: string, options: CompletionOptions) {
468 468   const response = await anthropic.messages.create({
469 - model: options.model || 'claude-sonnet-3-5',
799 + model: options.model || 'claude-sonnet-4-5-20250929',
470 470   max_tokens: options.maxTokens || 4096,
471 471   messages: [{ role: 'user', content: prompt }],
472 472   system: options.systemPrompt
... ... @@ -532,6 +532,178 @@
532 532  
533 533  ----
534 534  
865 +
866 +
867 +==== Stage 2 Output Schema: ClaimAnalysis ====
868 +
869 +**Complete schema for each claim's analysis result:**
870 +
871 +{{code language="json"}}
872 +{
873 + "claim_id": "claim_abc123",
874 + "claim_text": "Biden won the 2020 election",
875 + "scenarios": [
876 + {
877 + "scenario_id": "scenario_1",
878 + "description": "Interpreting 'won' as Electoral College victory",
879 + "verdict": {
880 + "label": "TRUE",
881 + "confidence": 0.95,
882 + "explanation": "Joe Biden won 306 electoral votes vs Trump's 232"
883 + },
884 + "evidence": {
885 + "supporting": [
886 + {
887 + "text": "Biden certified with 306 electoral votes",
888 + "source_url": "https://www.archives.gov/electoral-college/2020",
889 + "source_title": "2020 Electoral College Results",
890 + "credibility_score": 0.98
891 + }
892 + ],
893 + "opposing": []
894 + }
895 + }
896 + ],
897 + "recommended_scenario": "scenario_1",
898 + "metadata": {
899 + "analysis_timestamp": "2024-12-24T18:00:00Z",
900 + "model_used": "claude-sonnet-4-5-20250929",
901 + "processing_time_seconds": 8.5
902 + }
903 +}
904 +{{/code}}
905 +
906 +**Required Fields:**
907 +* **claim_id**: Unique identifier matching Stage 1 output
908 +* **claim_text**: The exact claim being analyzed
909 +* **scenarios**: Array of interpretation scenarios (minimum 1)
910 + * **scenario_id**: Unique ID for this scenario
911 + * **description**: Clear interpretation of the claim
912 + * **verdict**: Verdict object with label, confidence, explanation
913 + * **evidence**: Supporting and opposing evidence arrays
914 +* **recommended_scenario**: ID of the primary/recommended scenario
915 +* **metadata**: Processing metadata (timestamp, model, timing)
916 +
917 +**Optional Fields:**
918 +* Additional context, warnings, or quality scores
919 +
920 +**Minimum Viable Example:**
921 +
922 +{{code language="json"}}
923 +{
924 + "claim_id": "c1",
925 + "claim_text": "The sky is blue",
926 + "scenarios": [{
927 + "scenario_id": "s1",
928 + "description": "Under clear daytime conditions",
929 + "verdict": {"label": "TRUE", "confidence": 0.99, "explanation": "Rayleigh scattering"},
930 + "evidence": {"supporting": [], "opposing": []}
931 + }],
932 + "recommended_scenario": "s1",
933 + "metadata": {"analysis_timestamp": "2024-12-24T18:00:00Z"}
934 +}
935 +{{/code}}
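A minimal structural check for the required fields listed above might look like the following. The field names come from this schema; the validator itself is a sketch (a real implementation would likely use a JSON Schema library):

```python
# Sketch: validate the required ClaimAnalysis fields described above.
# Raises ValueError on the first missing field; not a full JSON Schema check.

def validate_claim_analysis(doc: dict) -> None:
    for field in ("claim_id", "claim_text", "scenarios",
                  "recommended_scenario", "metadata"):
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
    if not doc["scenarios"]:
        raise ValueError("scenarios must contain at least one entry")
    for s in doc["scenarios"]:
        for field in ("scenario_id", "description", "verdict", "evidence"):
            if field not in s:
                raise ValueError(f"scenario missing required field: {field}")
    ids = {s["scenario_id"] for s in doc["scenarios"]}
    if doc["recommended_scenario"] not in ids:
        raise ValueError("recommended_scenario must reference a listed scenario")
```

The minimum viable example above passes this check; a document missing any required field fails fast with a descriptive error.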
936 +
937 +
938 +
939 +==== Stage 3 Output Schema: ArticleAssessment ====
940 +
941 +**Complete schema for holistic article-level assessment:**
942 +
943 +{{code language="json"}}
944 +{
945 + "article_id": "article_xyz789",
946 + "overall_assessment": {
947 + "credibility_score": 0.72,
948 + "risk_tier": "B",
949 + "summary": "Article contains mostly accurate claims with one disputed claim requiring expert review",
950 + "confidence": 0.85
951 + },
952 + "claim_aggregation": {
953 + "total_claims": 5,
954 + "verdict_distribution": {
955 + "TRUE": 3,
956 + "PARTIALLY_TRUE": 1,
957 + "DISPUTED": 1,
958 + "FALSE": 0,
959 + "UNSUPPORTED": 0,
960 + "UNVERIFIABLE": 0
961 + },
962 + "avg_confidence": 0.82
963 + },
964 + "contextual_factors": [
965 + {
966 + "factor": "Source credibility",
967 + "impact": "positive",
968 + "description": "Published by reputable news organization"
969 + },
970 + {
971 + "factor": "Claim interdependence",
972 + "impact": "neutral",
973 + "description": "Claims are independent; no logical chains"
974 + }
975 + ],
976 + "recommendations": {
977 + "publication_mode": "AI_GENERATED",
978 + "requires_review": false,
979 + "review_reason": null,
980 + "suggested_disclaimers": [
981 + "One claim (Claim 4) has conflicting expert opinions"
982 + ]
983 + },
984 + "metadata": {
985 + "holistic_timestamp": "2024-12-24T18:00:10Z",
986 + "model_used": "claude-sonnet-4-5-20250929",
987 + "processing_time_seconds": 4.2,
988 + "cache_used": false
989 + }
990 +}
991 +{{/code}}
992 +
993 +**Required Fields:**
994 +* **article_id**: Unique identifier for this article
995 +* **overall_assessment**: Top-level assessment
996 + * **credibility_score**: 0.0-1.0 composite score
997 + * **risk_tier**: A, B, or C (per AKEL quality gates)
998 + * **summary**: Human-readable assessment
999 + * **confidence**: How confident the holistic assessment is
1000 +* **claim_aggregation**: Statistics across all claims
1001 + * **total_claims**: Count of claims analyzed
1002 + * **verdict_distribution**: Count per verdict label
1003 + * **avg_confidence**: Average confidence across verdicts
1004 +* **contextual_factors**: Array of contextual considerations
1005 +* **recommendations**: Publication decision support
1006 + * **publication_mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
1007 + * **requires_review**: Boolean flag
1008 + * **suggested_disclaimers**: Array of disclaimer texts
1009 +* **metadata**: Processing metadata
1010 +
1011 +**Minimum Viable Example:**
1012 +
1013 +{{code language="json"}}
1014 +{
1015 + "article_id": "a1",
1016 + "overall_assessment": {
1017 + "credibility_score": 0.95,
1018 + "risk_tier": "C",
1019 + "summary": "All claims verified as true",
1020 + "confidence": 0.98
1021 + },
1022 + "claim_aggregation": {
1023 + "total_claims": 1,
1024 + "verdict_distribution": {"TRUE": 1},
1025 + "avg_confidence": 0.99
1026 + },
1027 + "contextual_factors": [],
1028 + "recommendations": {
1029 + "publication_mode": "AI_GENERATED",
1030 + "requires_review": false,
1031 + "suggested_disclaimers": []
1032 + },
1033 + "metadata": {"holistic_timestamp": "2024-12-24T18:00:00Z"}
1034 +}
1035 +{{/code}}
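The {{code}}claim_aggregation{{/code}} block can be derived mechanically from Stage 2 output rather than asked of the LLM. A sketch, assuming each verdict is a dict with {{code}}label{{/code}} and {{code}}confidence{{/code}} as in the Stage 2 schema (the function name is illustrative):

```python
from collections import Counter

# Sketch: build the claim_aggregation object from a list of per-claim
# verdicts, each a dict with "label" and "confidence" fields.

def aggregate_claims(verdicts: list[dict]) -> dict:
    distribution = Counter(v["label"] for v in verdicts)
    avg_conf = (sum(v["confidence"] for v in verdicts) / len(verdicts)
                if verdicts else 0.0)
    return {
        "total_claims": len(verdicts),
        "verdict_distribution": dict(distribution),
        "avg_confidence": round(avg_conf, 2),
    }
```

Computing these statistics deterministically keeps the LLM's holistic prompt focused on judgment calls (fallacies, context) rather than arithmetic it can get wrong.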
1036 +
535 535  === 3.2 Create Analysis Job (3-Stage) ===
536 536  
537 537  **Endpoint:** POST /v1/analyze
... ... @@ -583,6 +583,20 @@
583 583   "browsing": "on",
584 584   "depth": "standard",
585 585   "max_claims": 5,
1088 + "cache_preference": "prefer_cache",
 586 586   "scenarios_per_claim": 2,
 587 587   "max_evidence_per_scenario": 6,
 588 588   "context_aware_analysis": true
1089 +
1090 +* **cache_preference** (optional): Cache usage preference
1091 + * **Type:** string
1092 + * **Enum:** {{code}}["prefer_cache", "allow_partial", "skip_cache"]{{/code}}
1093 + * **Default:** {{code}}"prefer_cache"{{/code}}
1094 + * **Semantics:**
1095 + * {{code}}"prefer_cache"{{/code}}: Use the full cached result if available; otherwise run all stages
1096 + * {{code}}"allow_partial"{{/code}}: Use cached Stage 2 results if available and rerun only Stage 3
1097 + * {{code}}"skip_cache"{{/code}}: Always rerun all stages (ignore cache)
1098 + * **Behavior:** When set to {{code}}"allow_partial"{{/code}} and cached Stage 2 results exist:
1099 + * Stages 1 and 2 are skipped
1100 + * Stage 3 (holistic assessment) runs fresh with the cached claim analyses
1101 + * The response includes {{code}}"cache_used": true{{/code}} and {{code}}"stages_cached": ["stage1", "stage2"]{{/code}}
1102 +
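The cache_preference semantics described above reduce to a small stage-planning decision. A sketch, with hypothetical argument names ({{code}}full_result_cached{{/code}} and {{code}}stage2_cached{{/code}} stand in for whatever lookups the cache layer provides; only the three enum values are from the spec):

```python
# Sketch: decide which pipeline stages to run for a given cache_preference.
# Returns the stages to execute and the stages served from cache.

def plan_stages(cache_preference: str, full_result_cached: bool,
                stage2_cached: bool) -> dict:
    all_stages = ["stage1", "stage2", "stage3"]
    if cache_preference == "skip_cache":
        # Always rerun everything, ignoring any cached results.
        return {"run": all_stages, "stages_cached": []}
    if cache_preference == "prefer_cache":
        # Serve the full cached result if present; otherwise run all stages.
        if full_result_cached:
            return {"run": [], "stages_cached": all_stages}
        return {"run": all_stages, "stages_cached": []}
    if cache_preference == "allow_partial":
        # Reuse cached claim analyses but always rerun the holistic pass.
        if stage2_cached:
            return {"run": ["stage3"], "stages_cached": ["stage1", "stage2"]}
        return {"run": all_stages, "stages_cached": []}
    raise ValueError(f"unknown cache_preference: {cache_preference}")
```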
... ... @@ -770,80 +770,78 @@
770 770  
771 771  **Algorithm: Canonical Claim Normalization v1**
772 772  
773 -{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
774 - """
775 - Normalizes claim to canonical form for cache key generation.
776 - Version: v1norm1 (POC1)
777 - """
778 - import re
779 - import unicodedata
780 -
781 - # Step 1: Unicode normalization (NFC)
782 - text = unicodedata.normalize('NFC', claim_text)
783 -
784 - # Step 2: Lowercase
785 - text = text.lower()
786 -
787 - # Step 3: Remove punctuation (except hyphens in words)
788 - text = re.sub(r'[^\w\s-]', '', text)
789 -
790 - # Step 4: Normalize whitespace (collapse multiple spaces)
791 - text = re.sub(r'\s+', ' ', text).strip()
792 -
793 - # Step 5: Numeric normalization
794 - text = text.replace('%', ' percent')
795 - # Spell out single-digit numbers
796 - num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
797 - '4':'four', '5':'five', '6':'six', '7':'seven',
798 - '8':'eight', '9':'nine'}
799 - for num, word in num_to_word.items():
800 - text = re.sub(rf'\b{num}\b', word, text)
801 -
802 - # Step 6: Common abbreviations (English only in v1)
803 - if language == 'en':
804 - text = text.replace('covid-19', 'covid')
805 - text = text.replace('u.s.', 'us')
806 - text = text.replace('u.k.', 'uk')
807 -
808 - # Step 7: NO entity normalization in v1
809 - # (Trump vs Donald Trump vs President Trump remain distinct)
810 -
811 - return text
812 812  
813 -# Version identifier (include in cache namespace)
814 -CANONICALIZER_VERSION = "v1norm1"
815 -}}}
1290 +**Normative Algorithm:**
816 816  
817 -**Cache Key Formula (Updated):**
1292 +{{code language="python"}}
1293 +def normalize_claim(text: str) -> str:
1294 + """
1295 + Canonical claim normalization for deduplication.
1296 + MUST follow this algorithm exactly.
1297 +
1298 + Version: v1norm1
1299 + """
1300 + import re
1301 + import unicodedata
1302 +
1303 + # 1. Unicode normalization (NFD)
1304 + text = unicodedata.normalize('NFD', text)
1305 +
1306 + # 2. Lowercase
1307 + text = text.lower()
1308 +
1309 + # 3. Remove diacritics
1310 + text = ''.join(c for c in text if unicodedata.category(c) != 'Mn')
1311 +
1312 + # 4. Normalize whitespace
1313 + text = re.sub(r'\s+', ' ', text)
1314 + text = text.strip()
1315 +
1316 + # 5. Remove punctuation except apostrophes in contractions
1317 + text = re.sub(r"[^\w\s']", '', text)
1318 +
1319 + # 6. Normalize common contractions
1320 + contractions = {
1321 + "don't": "do not",
1322 + "doesn't": "does not",
1323 + "didn't": "did not",
1324 + "can't": "cannot",
1325 + "won't": "will not",
1326 + "shouldn't": "should not",
1327 + "wouldn't": "would not",
1328 + "isn't": "is not",
1329 + "aren't": "are not",
1330 + "wasn't": "was not",
1331 + "weren't": "were not",
1332 + "haven't": "have not",
1333 + "hasn't": "has not",
1334 + "hadn't": "had not"
1335 + }
1336 +
1337 + for contraction, expansion in contractions.items():
1338 + text = re.sub(r'\b' + contraction + r'\b', expansion, text)
1339 +
1340 + # 7. Remove remaining apostrophes
1341 + text = text.replace("'", "")
1342 +
1343 + # 8. Final whitespace normalization
1344 + text = re.sub(r'\s+', ' ', text)
1345 + text = text.strip()
1346 +
1347 + return text
1348 +{{/code}}
818 818  
819 -{{{language = "en"
820 -canonical = normalize_claim_v1(claim_text, language)
821 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
1350 +**Normalization Examples:**
822 822  
823 -Example:
824 - claim: "COVID-19 vaccines are 95% effective"
825 - canonical: "covid vaccines are 95 percent effective"
826 - sha256: abc123...def456
827 - key: "claim:v1norm1:en:abc123...def456"
828 -}}}
1352 +|= Input |= Normalized Output
1353 +| "Biden won the 2020 election" | {{code}}biden won the 2020 election{{/code}}
1354 +| "Biden won the 2020 election!" | {{code}}biden won the 2020 election{{/code}}
1355 +| "Biden   won  the 2020  election" | {{code}}biden won the 2020 election{{/code}}
1356 +| "Biden didn't win the 2020 election" | {{code}}biden did not win the 2020 election{{/code}}
1357 +| "BIDEN WON THE 2020 ELECTION" | {{code}}biden won the 2020 election{{/code}}
829 829  
830 -**Cache Metadata MUST Include:**
1359 +**Versioning:** Algorithm version is {{code}}v1norm1{{/code}}. Changes to the algorithm require a new version identifier.
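Combined with the Stage 2 cache key formula defined earlier ({{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}}), key construction from a normalized claim can be sketched as follows (the function name is illustrative; the formula and version string are from this specification):

```python
import hashlib

CANONICALIZER_VERSION = "v1norm1"

# Sketch: build the Stage 2 cache key from an already-normalized claim,
# following the formula claim:{version}:{language}:{sha256(canonical_claim)}.

def claim_cache_key(canonical_claim: str, language: str = "en") -> str:
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"
```

Because the version identifier is embedded in every key, bumping {{code}}v1norm1{{/code}} to {{code}}v1norm2{{/code}} naturally namespaces new entries away from keys produced by the old algorithm.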
831 831  
832 -{{{{
833 - "canonical_claim": "covid vaccines are 95 percent effective",
834 - "canonicalizer_version": "v1norm1",
835 - "language": "en",
836 - "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
837 -}
838 -}}}
839 -
840 -**Version Upgrade Path:**
841 -
842 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
843 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches
844 -
845 -----
846 -
847 847  === 5.1.2 Copyright & Data Retention Policy ===
848 848  
849 849  **Evidence Excerpt Storage:**