Last modified by Robert Schaub on 2025/12/24 20:16

From version 2.2
edited by Robert Schaub
on 2025/12/24 20:16
Change comment: Update document after refactoring.
To version 1.1
edited by Robert Schaub
on 2025/12/24 19:45
Change comment: Imported from XAR

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor V0\.9\.105.Specification.POC.WebHome
1 +Test.FactHarbor.Specification.POC.WebHome
Content
... ... @@ -58,7 +58,7 @@
58 58  
59 59  * **Input:** Article text
60 60  * **Output:** 5 canonical claims (normalized, deduplicated)
61 -* **Model:** Claude Haiku 4.5 (default, configurable via LLM abstraction layer)
61 +* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
62 62  * **Cost:** $0.003 per article
63 63  * **Cache strategy:** No caching (article-specific)
64 64  
... ... @@ -66,7 +66,7 @@
66 66  
67 67  * **Input:** Single canonical claim
68 68  * **Output:** Scenarios + Evidence + Verdicts
69 -* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
69 +* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
70 70  * **Cost:** $0.081 per NEW claim
71 71  * **Cache strategy:** Redis, 90-day TTL
72 72  * **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
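The cache key formula above can be sketched directly. This is a minimal illustration, not the actual implementation; the function name is hypothetical, and the version segment is hard-coded to the `v1norm1` identifier used throughout this spec:

```python
import hashlib

def stage2_cache_key(canonical_claim: str, language: str = "en") -> str:
    # claim:v1norm1:{language}:{sha256(canonical_claim)}, per the formula above
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:v1norm1:{language}:{digest}"

key = stage2_cache_key("biden won the 2020 election")
# key = "claim:v1norm1:en:" followed by a 64-character hex digest
```

The claim text must already be in canonical form (see the normalization algorithm in Section 5), otherwise trivially different phrasings would miss the cache.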
... ... @@ -75,7 +75,7 @@
75 75  
76 76  * **Input:** Article + Claim verdicts (from cache or Stage 2)
77 77  * **Output:** Article verdict + Fallacies + Logic quality
78 -* **Model:** Claude Sonnet 4.5 (default, configurable via LLM abstraction layer)
78 +* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
79 79  * **Cost:** $0.030 per article
80 80  * **Cache strategy:** No caching (article-specific)
81 81  
... ... @@ -115,336 +115,6 @@
115 115  
116 116  When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
117 117  
118 -
119 -
120 -==== Stage 3: Holistic Assessment - Complete Specification ====
121 -
122 -===== 3.3.1 Overview =====
123 -
124 -**Purpose:** Synthesize individual claim analyses into an overall article assessment, identifying logical fallacies, reasoning quality, and publication readiness.
125 -
126 -**Approach:** **Single-Pass Holistic Analysis** (Approach 1 from Comparison Matrix)
127 -
128 -**Why This Approach for POC1:**
129 -* ✅ **1 API call** (vs 2 for Two-Pass or Judge)
130 -* ✅ **Low cost** ($0.030 per article)
131 -* ✅ **Fast** (4-6 seconds)
132 -* ✅ **Low complexity** (simple implementation)
133 -* ⚠️ **Medium reliability** (acceptable for POC1, will improve in POC2/Production)
134 -
135 -**Alternative Approaches Considered:**
136 -
137 -|= Approach |= API Calls |= Cost |= Speed |= Complexity |= Reliability |= Best For
138 -| **1. Single-Pass** ⭐ | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | **POC1**
139 -| 2. Two-Pass | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | POC2/Prod
140 -| 3. Structured | 1 | 💰 Low | ⚡ Fast | 🟡 Med | ✅ High | POC1 (alternative)
141 -| 4. Weighted | 1 | 💰 Low | ⚡ Fast | 🟢 Low | ⚠️ Medium | POC1 (alternative)
142 -| 5. Heuristics | 1 | 💰 Lowest | ⚡⚡ Fastest | 🟡 Med | ⚠️ Medium | Any
143 -| 6. Hybrid | 1 | 💰 Low | ⚡ Fast | 🔴 Med-High | ✅ High | POC2
144 -| 7. Judge | 2 | 💰💰 Med | 🐢 Slow | 🟡 Med | ✅ High | Production
145 -
146 -**POC1 Choice:** Approach 1 (Single-Pass) for speed and simplicity. Will upgrade to Approach 2 (Two-Pass) or 6 (Hybrid) in POC2 for higher reliability.
147 -
148 -===== 3.3.2 What Stage 3 Evaluates =====
149 -
150 -Stage 3 performs **integrated holistic analysis** considering:
151 -
152 -**1. Claim-Level Aggregation:**
153 -* Verdict distribution (how many TRUE vs FALSE vs DISPUTED)
154 -* Average confidence across all claims
155 -* Claim interdependencies (do claims support/contradict each other?)
156 -* Critical claim identification (which claims are most important?)
157 -
158 -**2. Contextual Factors:**
159 -* **Source credibility**: Is the article from a reputable publisher?
160 -* **Author expertise**: Does the author have relevant credentials?
161 -* **Publication date**: Is information current or outdated?
162 -* **Claim coherence**: Do claims form a logical narrative?
163 -* **Missing context**: Are important caveats or qualifications missing?
164 -
165 -**3. Logical Fallacies:**
166 -* **Cherry-picking**: Selective evidence presentation
167 -* **False equivalence**: Treating unequal things as equal
168 -* **Straw man**: Misrepresenting opposing arguments
169 -* **Ad hominem**: Attacking person instead of argument
170 -* **Slippery slope**: Assuming extreme consequences without justification
171 -* **Circular reasoning**: Conclusion assumes premise
172 -* **False dichotomy**: Presenting only two options when more exist
173 -
174 -**4. Reasoning Quality:**
175 -* **Evidence strength**: Quality and quantity of supporting evidence
176 -* **Logical coherence**: Arguments follow logically
177 -* **Transparency**: Assumptions and limitations acknowledged
178 -* **Nuance**: Complexity and uncertainty appropriately addressed
179 -
180 -**5. Publication Readiness:**
181 -* **Risk tier assignment**: A (high risk), B (medium), or C (low risk)
182 -* **Publication mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
183 -* **Required disclaimers**: What warnings should accompany this content?
184 -
185 -===== 3.3.3 Implementation: Single-Pass Approach =====
186 -
187 -**Input:**
188 -* Original article text (full content)
189 -* Stage 2 claim analyses (array of ClaimAnalysis objects)
190 -* Article metadata (URL, title, author, date, source)
191 -
192 -**Processing:**
193 -
194 -{{code language="python"}}
195 -# Pseudo-code for Stage 3 (Single-Pass)
196 -
197 -def stage3_holistic_assessment(article, claim_analyses, metadata):
198 - """
199 - Single-pass holistic assessment using Claude Sonnet 4.5.
200 -
201 - Approach 1: One comprehensive prompt that asks the LLM to:
202 - 1. Review all claim verdicts
203 - 2. Identify patterns and dependencies
204 - 3. Detect logical fallacies
205 - 4. Assess reasoning quality
206 - 5. Determine credibility score and risk tier
207 - 6. Generate publication recommendations
208 - """
209 -
210 - # Construct comprehensive prompt
211 - prompt = f"""
212 -You are analyzing an article for factual accuracy and logical reasoning.
213 -
214 -ARTICLE METADATA:
215 -- Title: {metadata['title']}
216 -- Source: {metadata['source']}
217 -- Date: {metadata['date']}
218 -- Author: {metadata['author']}
219 -
220 -ARTICLE TEXT:
221 -{article}
222 -
223 -INDIVIDUAL CLAIM ANALYSES:
224 -{format_claim_analyses(claim_analyses)}
225 -
226 -YOUR TASK:
227 -Perform a holistic assessment considering:
228 -
229 -1. CLAIM AGGREGATION:
230 - - Review the verdict for each claim
231 - - Identify any interdependencies between claims
232 - - Determine which claims are most critical to the article's thesis
233 -
234 -2. CONTEXTUAL EVALUATION:
235 - - Assess source credibility
236 - - Evaluate author expertise
237 - - Consider publication timeliness
238 - - Identify missing context or important caveats
239 -
240 -3. LOGICAL FALLACIES:
241 - - Identify any logical fallacies present
242 - - For each fallacy, provide:
243 - * Type of fallacy
244 - * Where it occurs in the article
245 - * Why it's problematic
246 - * Severity (minor/moderate/severe)
247 -
248 -4. REASONING QUALITY:
249 - - Evaluate evidence strength
250 - - Assess logical coherence
251 - - Check for transparency in assumptions
252 - - Evaluate handling of nuance and uncertainty
253 -
254 -5. CREDIBILITY SCORING:
255 - - Calculate overall credibility score (0.0-1.0)
256 - - Assign risk tier:
257 - * A (high risk): ≤0.5 credibility OR severe fallacies
258 - * B (medium risk): >0.5 and ≤0.8 credibility OR moderate issues
259 - * C (low risk): >0.8 credibility AND no significant issues
260 -
261 -6. PUBLICATION RECOMMENDATIONS:
262 - - Determine publication mode:
263 - * DRAFT_ONLY: Tier A, multiple severe issues
264 - * AI_GENERATED: Tier B/C, acceptable quality with disclaimers
265 - * HUMAN_REVIEWED: Complex or borderline cases
266 - - List required disclaimers
267 - - Explain decision rationale
268 -
269 -OUTPUT FORMAT:
270 -Return a JSON object matching the ArticleAssessment schema.
271 -"""
272 -
273 - # Call LLM
274 - response = llm_client.complete(
275 - model="claude-sonnet-4-5-20250929",
276 - prompt=prompt,
277 - max_tokens=4000,
278 - response_format="json"
279 - )
280 -
281 - # Parse and validate response
282 - assessment = parse_json(response.content)
283 - validate_article_assessment_schema(assessment)
284 -
285 - return assessment
286 -{{/code}}
287 -
288 -**Prompt Engineering Notes:**
289 -
290 -1. **Structured Instructions**: Break down task into 6 clear sections
291 -2. **Context-Rich**: Provide article + all claim analyses + metadata
292 -3. **Explicit Criteria**: Define credibility scoring and risk tiers precisely
293 -4. **JSON Schema**: Request structured output matching ArticleAssessment schema
294 -5. **Examples** (in production): Include 2-3 example assessments for consistency
295 -
296 -===== 3.3.4 Credibility Scoring Algorithm =====
297 -
298 -**Base Score Calculation:**
299 -
300 -{{code language="python"}}
301 -def calculate_credibility_score(claim_analyses, fallacies, contextual_factors):
302 - """
303 - Calculate overall credibility score (0.0-1.0).
304 -
305 - This is a GUIDELINE for the LLM, not strict code.
306 - The LLM has flexibility to adjust based on context.
307 - """
308 -
309 - # 1. Claim Verdict Score (60% weight)
310 - verdict_weights = {
311 - "TRUE": 1.0,
312 - "PARTIALLY_TRUE": 0.7,
313 - "DISPUTED": 0.5,
314 - "UNSUPPORTED": 0.3,
315 - "FALSE": 0.0,
316 - "UNVERIFIABLE": 0.4
317 - }
318 -
319 - claim_scores = [
320 - verdict_weights[c.verdict.label] * c.verdict.confidence
321 - for c in claim_analyses
322 - ]
323 - avg_claim_score = sum(claim_scores) / len(claim_scores)
324 - claim_component = avg_claim_score * 0.6
325 -
326 - # 2. Fallacy Penalty (20% weight)
327 - fallacy_penalties = {
328 - "minor": -0.05,
329 - "moderate": -0.15,
330 - "severe": -0.30
331 - }
332 -
333 - fallacy_score = 1.0
334 - for fallacy in fallacies:
335 - fallacy_score += fallacy_penalties[fallacy.severity]
336 -
337 - fallacy_score = max(0.0, min(1.0, fallacy_score))
338 - fallacy_component = fallacy_score * 0.2
339 -
340 - # 3. Contextual Factors (20% weight)
341 - context_adjustments = {
342 - "source_credibility": {"positive": +0.1, "neutral": 0, "negative": -0.1},
343 - "author_expertise": {"positive": +0.1, "neutral": 0, "negative": -0.1},
344 - "timeliness": {"positive": +0.05, "neutral": 0, "negative": -0.05},
345 - "transparency": {"positive": +0.05, "neutral": 0, "negative": -0.05}
346 - }
347 -
348 - context_score = 1.0
349 - for factor in contextual_factors:
350 - adjustment = context_adjustments.get(factor.factor, {}).get(factor.impact, 0)
351 - context_score += adjustment
352 -
353 - context_score = max(0.0, min(1.0, context_score))
354 - context_component = context_score * 0.2
355 -
356 - # 4. Combine components
357 - final_score = claim_component + fallacy_component + context_component
358 -
359 - # 5. Apply confidence modifier
360 - avg_confidence = sum(c.verdict.confidence for c in claim_analyses) / len(claim_analyses)
361 - final_score = final_score * (0.8 + 0.2 * avg_confidence)
362 -
363 - return max(0.0, min(1.0, final_score))
364 -{{/code}}
365 -
366 -**Note:** This algorithm is a **guideline** provided to the LLM in the system prompt. The LLM has flexibility to adjust based on specific article context, but should generally follow this structure for consistency.
367 -
368 -===== 3.3.5 Risk Tier Assignment =====
369 -
370 -**Automatic Risk Tier Rules:**
371 -
372 -{{code}}
373 -Risk Tier A (High Risk - Requires Review):
374 -- Credibility score ≤ 0.5, OR
375 -- Any severe fallacies detected, OR
376 -- Multiple (3+) moderate fallacies, OR
377 -- 50%+ of claims are FALSE or UNSUPPORTED
378 -
379 -Risk Tier B (Medium Risk - May Publish with Disclaimers):
380 -- Credibility score >0.5 and ≤0.8, OR
381 -- 1-2 moderate fallacies, OR
382 -- 20-49% of claims are DISPUTED or PARTIALLY_TRUE
383 -
384 -Risk Tier C (Low Risk - Safe to Publish):
385 -- Credibility score > 0.8, AND
386 -- No severe or moderate fallacies, AND
387 -- <20% disputed/problematic claims, AND
388 -- No critical missing context
389 -{{/code}}
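As a deterministic cross-check of the rules above, the tier assignment can be sketched as plain code. This is a guideline-level sketch only (the LLM makes the final call per 3.3.3); the function and argument names are illustrative, and the tier-B credibility band is read as >0.5 and ≤0.8:

```python
def assign_risk_tier(credibility: float, fallacy_severities: list[str],
                     claim_verdicts: list[str]) -> str:
    """Map credibility, fallacies, and verdict mix to risk tier A/B/C."""
    n = max(len(claim_verdicts), 1)
    severe = sum(1 for s in fallacy_severities if s == "severe")
    moderate = sum(1 for s in fallacy_severities if s == "moderate")
    false_frac = sum(1 for v in claim_verdicts
                     if v in ("FALSE", "UNSUPPORTED")) / n
    disputed_frac = sum(1 for v in claim_verdicts
                        if v in ("DISPUTED", "PARTIALLY_TRUE")) / n

    # Tier A: low credibility, any severe fallacy, 3+ moderate, or 50%+ bad claims
    if credibility <= 0.5 or severe >= 1 or moderate >= 3 or false_frac >= 0.5:
        return "A"
    # Tier B: mid credibility, 1-2 moderate fallacies, or 20%+ disputed claims
    if credibility <= 0.8 or moderate >= 1 or disputed_frac >= 0.2:
        return "B"
    return "C"
```

A tier-C result additionally presumes no critical missing context, which this numeric sketch cannot detect.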
390 -
391 -===== 3.3.6 Output: ArticleAssessment Schema =====
392 -
393 -(See Stage 3 Output Schema section above for complete JSON schema)
394 -
395 -===== 3.3.7 Performance Metrics =====
396 -
397 -**POC1 Targets:**
398 -* **Processing time**: 4-6 seconds per article
399 -* **Cost**: $0.030 per article (Sonnet 4.5 tokens)
400 -* **Quality**: 70-80% agreement with human reviewers (acceptable for POC)
401 -* **API calls**: 1 per article
402 -
403 -**Future Improvements (POC2/Production):**
404 -* Upgrade to Two-Pass (Approach 2): +15% accuracy, +$0.020 cost
405 -* Add human review sampling: 10% of Tier B articles
406 -* Implement Judge approach (Approach 7) for Tier A: Highest quality
407 -
408 -===== 3.3.8 Example Stage 3 Execution =====
409 -
410 -**Input:**
411 -* Article: "Biden won the 2020 election"
412 -* Claim analyses: [{claim: "Biden won", verdict: "TRUE", confidence: 0.95}]
413 -
414 -**Stage 3 Processing:**
415 -1. Analyzes single claim with high confidence
416 -2. Checks for contextual factors (source credibility)
417 -3. Searches for logical fallacies (none found)
418 -4. Calculates credibility: 0.6 * 0.95 + 0.2 * 1.0 + 0.2 * 1.0 = 0.97
419 -5. Assigns risk tier: C (low risk)
420 -6. Recommends: AI_GENERATED publication mode
421 -
422 -**Output:**
423 -```json
424 -{
425 - "article_id": "a1",
426 - "overall_assessment": {
427 - "credibility_score": 0.97,
428 - "risk_tier": "C",
429 - "summary": "Article makes single verifiable claim with strong evidence support",
430 - "confidence": 0.95
431 - },
432 - "claim_aggregation": {
433 - "total_claims": 1,
434 - "verdict_distribution": {"TRUE": 1},
435 - "avg_confidence": 0.95
436 - },
437 - "contextual_factors": [
438 - {"factor": "source_credibility", "impact": "positive", "description": "Reputable news source"}
439 - ],
440 - "recommendations": {
441 - "publication_mode": "AI_GENERATED",
442 - "requires_review": false,
443 - "suggested_disclaimers": []
444 - }
445 -}
446 -```
447 -
448 448  ==== What Cache-Only Mode Provides: ====
449 449  
450 450  ✅ **Claim Extraction (Platform-Funded):**
... ... @@ -566,7 +566,7 @@
566 566  **Primary Provider (Default):**
567 567  
568 568  * **Anthropic Claude API**
569 - * Models: Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4
239 + * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
570 570   * Used by default in POC1
571 571   * Best quality for holistic analysis
572 572  
... ... @@ -603,9 +603,9 @@
603 603  LLM_STAGE1_PROVIDER=anthropic
604 604  LLM_STAGE1_MODEL=claude-haiku-4
605 605  LLM_STAGE2_PROVIDER=anthropic
606 -LLM_STAGE2_MODEL=claude-sonnet-4-5-20250929
276 +LLM_STAGE2_MODEL=claude-sonnet-3-5
607 607  LLM_STAGE3_PROVIDER=anthropic
608 -LLM_STAGE3_MODEL=claude-sonnet-4-5-20250929
278 +LLM_STAGE3_MODEL=claude-sonnet-3-5
609 609  
610 610  # Cost limits
611 611  LLM_MAX_COST_PER_REQUEST=1.00
... ... @@ -632,19 +632,19 @@
632 632   "stage_config": {
633 633   "stage1": {
634 634   "provider": "anthropic",
635 - "model": "claude-haiku-4-5-20251001",
305 + "model": "claude-haiku-4",
636 636   "max_tokens": 4096,
637 637   "temperature": 0.0
638 638   },
639 639   "stage2": {
640 640   "provider": "anthropic",
641 - "model": "claude-sonnet-4-5-20250929",
311 + "model": "claude-sonnet-3-5",
642 642   "max_tokens": 16384,
643 643   "temperature": 0.3
644 644   },
645 645   "stage3": {
646 646   "provider": "anthropic",
647 - "model": "claude-sonnet-4-5-20250929",
317 + "model": "claude-sonnet-3-5",
648 648   "max_tokens": 8192,
649 649   "temperature": 0.2
650 650   }
... ... @@ -658,7 +658,7 @@
658 658  
659 659  **Stage 1: Claim Extraction**
660 660  
661 -* **Default:** Anthropic Claude Haiku 4.5
331 +* **Default:** Anthropic Claude Haiku 4
662 662  * **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
663 663  * **Rationale:** Fast, cheap, simple task
664 664  * **Cost:** ~$0.003 per article
... ... @@ -665,7 +665,7 @@
665 665  
666 666  **Stage 2: Claim Analysis** (CACHEABLE)
667 667  
668 -* **Default:** Anthropic Claude Sonnet 4.5
338 +* **Default:** Anthropic Claude Sonnet 3.5
669 669  * **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
670 670  * **Rationale:** High-quality analysis, cached 90 days
671 671  * **Cost:** ~$0.081 per NEW claim
... ... @@ -672,7 +672,7 @@
672 672  
673 673  **Stage 3: Holistic Assessment**
674 674  
675 -* **Default:** Anthropic Claude Sonnet 4.5
345 +* **Default:** Anthropic Claude Sonnet 3.5
676 676  * **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
677 677  * **Rationale:** Complex reasoning, logical fallacy detection
678 678  * **Cost:** ~$0.030 per article
... ... @@ -680,9 +680,9 @@
680 680  **Cost Comparison (Example):**
681 681  
682 682  |=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
683 -|Stage 1|Claude Haiku 4.5 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
684 -|Stage 2|Claude Sonnet 4.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
685 -|Stage 3|Claude Sonnet 4.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
353 +|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
354 +|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
355 +|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
686 686  |**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072**
687 687  
688 688  **Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2.
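Because Stage 2 is the only cacheable stage, the per-article cost falls with the cache hit rate. A minimal sketch, treating the table's Stage 2 figure as the per-article cost at 0% cache (function name and the 50% rate are illustrative):

```python
def cost_per_article(hit_rate: float, stage1: float = 0.003,
                     stage2_new: float = 0.081, stage3: float = 0.030) -> float:
    """Expected cost per article when a fraction hit_rate of Stage 2 work is cached."""
    return stage1 + (1.0 - hit_rate) * stage2_new + stage3

print(round(cost_per_article(0.0), 3))  # 0.114 (no cache; matches the table)
print(cost_per_article(0.5))            # blended cost at a 50% cache hit rate
```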
... ... @@ -743,7 +743,7 @@
743 743   "stage": "stage2",
744 744   "previous": {
745 745   "provider": "anthropic",
746 - "model": "claude-sonnet-4-5-20250929"
416 + "model": "claude-sonnet-3-5"
747 747   },
748 748   "current": {
749 749   "provider": "openai",
... ... @@ -769,17 +769,17 @@
769 769   "stages": {
770 770   "stage1": {
771 771   "provider": "anthropic",
772 - "model": "claude-haiku-4-5-20251001",
442 + "model": "claude-haiku-4",
773 773   "cost_per_request": 0.003
774 774   },
775 775   "stage2": {
776 776   "provider": "anthropic",
777 - "model": "claude-sonnet-4-5-20250929",
447 + "model": "claude-sonnet-3-5",
778 778   "cost_per_new_claim": 0.081
779 779   },
780 780   "stage3": {
781 781   "provider": "anthropic",
782 - "model": "claude-sonnet-4-5-20250929",
452 + "model": "claude-sonnet-3-5",
783 783   "cost_per_request": 0.030
784 784   }
785 785   }
... ... @@ -796,7 +796,7 @@
796 796  class AnthropicProvider implements LLMProvider {
797 797   async complete(prompt: string, options: CompletionOptions) {
798 798   const response = await anthropic.messages.create({
799 - model: options.model || 'claude-sonnet-4-5-20250929',
469 + model: options.model || 'claude-sonnet-3-5',
800 800   max_tokens: options.maxTokens || 4096,
801 801   messages: [{ role: 'user', content: prompt }],
802 802   system: options.systemPrompt
... ... @@ -862,178 +862,6 @@
862 862  
863 863  ----
864 864  
865 -
866 -
867 -==== Stage 2 Output Schema: ClaimAnalysis ====
868 -
869 -**Complete schema for each claim's analysis result:**
870 -
871 -{{code language="json"}}
872 -{
873 - "claim_id": "claim_abc123",
874 - "claim_text": "Biden won the 2020 election",
875 - "scenarios": [
876 - {
877 - "scenario_id": "scenario_1",
878 - "description": "Interpreting 'won' as Electoral College victory",
879 - "verdict": {
880 - "label": "TRUE",
881 - "confidence": 0.95,
882 - "explanation": "Joe Biden won 306 electoral votes vs Trump's 232"
883 - },
884 - "evidence": {
885 - "supporting": [
886 - {
887 - "text": "Biden certified with 306 electoral votes",
888 - "source_url": "https://www.archives.gov/electoral-college/2020",
889 - "source_title": "2020 Electoral College Results",
890 - "credibility_score": 0.98
891 - }
892 - ],
893 - "opposing": []
894 - }
895 - }
896 - ],
897 - "recommended_scenario": "scenario_1",
898 - "metadata": {
899 - "analysis_timestamp": "2024-12-24T18:00:00Z",
900 - "model_used": "claude-sonnet-4-5-20250929",
901 - "processing_time_seconds": 8.5
902 - }
903 -}
904 -{{/code}}
905 -
906 -**Required Fields:**
907 -* **claim_id**: Unique identifier matching Stage 1 output
908 -* **claim_text**: The exact claim being analyzed
909 -* **scenarios**: Array of interpretation scenarios (minimum 1)
910 - * **scenario_id**: Unique ID for this scenario
911 - * **description**: Clear interpretation of the claim
912 - * **verdict**: Verdict object with label, confidence, explanation
913 - * **evidence**: Supporting and opposing evidence arrays
914 -* **recommended_scenario**: ID of the primary/recommended scenario
915 -* **metadata**: Processing metadata (timestamp, model, timing)
916 -
917 -**Optional Fields:**
918 -* Additional context, warnings, or quality scores
919 -
920 -**Minimum Viable Example:**
921 -
922 -{{code language="json"}}
923 -{
924 - "claim_id": "c1",
925 - "claim_text": "The sky is blue",
926 - "scenarios": [{
927 - "scenario_id": "s1",
928 - "description": "Under clear daytime conditions",
929 - "verdict": {"label": "TRUE", "confidence": 0.99, "explanation": "Rayleigh scattering"},
930 - "evidence": {"supporting": [], "opposing": []}
931 - }],
932 - "recommended_scenario": "s1",
933 - "metadata": {"analysis_timestamp": "2024-12-24T18:00:00Z"}
934 -}
935 -{{/code}}
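A lightweight structural check for the required fields listed above can be sketched as follows. This is illustrative only, not the full JSON Schema validation implied by `validate_article_assessment_schema`; the function name is hypothetical:

```python
REQUIRED_TOP = ("claim_id", "claim_text", "scenarios",
                "recommended_scenario", "metadata")
REQUIRED_SCENARIO = ("scenario_id", "description", "verdict", "evidence")

def validate_claim_analysis(doc: dict) -> list[str]:
    """Return a list of human-readable problems (empty list means valid)."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP if k not in doc]
    scenarios = doc.get("scenarios") or []
    if not scenarios:
        problems.append("scenarios must contain at least one entry")
    for i, s in enumerate(scenarios):
        problems += [f"scenario {i}: missing {k}"
                     for k in REQUIRED_SCENARIO if k not in s]
    # recommended_scenario must point at an existing scenario_id
    if doc.get("recommended_scenario") not in {s.get("scenario_id") for s in scenarios}:
        problems.append("recommended_scenario does not match any scenario_id")
    return problems
```

Run against the minimum viable example above, this returns an empty list; dropping any required field produces a corresponding problem message.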
936 -
937 -
938 -
939 -==== Stage 3 Output Schema: ArticleAssessment ====
940 -
941 -**Complete schema for holistic article-level assessment:**
942 -
943 -{{code language="json"}}
944 -{
945 - "article_id": "article_xyz789",
946 - "overall_assessment": {
947 - "credibility_score": 0.72,
948 - "risk_tier": "B",
949 - "summary": "Article contains mostly accurate claims with one disputed claim requiring expert review",
950 - "confidence": 0.85
951 - },
952 - "claim_aggregation": {
953 - "total_claims": 5,
954 - "verdict_distribution": {
955 - "TRUE": 3,
956 - "PARTIALLY_TRUE": 1,
957 - "DISPUTED": 1,
958 - "FALSE": 0,
959 - "UNSUPPORTED": 0,
960 - "UNVERIFIABLE": 0
961 - },
962 - "avg_confidence": 0.82
963 - },
964 - "contextual_factors": [
965 - {
966 - "factor": "Source credibility",
967 - "impact": "positive",
968 - "description": "Published by reputable news organization"
969 - },
970 - {
971 - "factor": "Claim interdependence",
972 - "impact": "neutral",
973 - "description": "Claims are independent; no logical chains"
974 - }
975 - ],
976 - "recommendations": {
977 - "publication_mode": "AI_GENERATED",
978 - "requires_review": false,
979 - "review_reason": null,
980 - "suggested_disclaimers": [
981 - "One claim (Claim 4) has conflicting expert opinions"
982 - ]
983 - },
984 - "metadata": {
985 - "holistic_timestamp": "2024-12-24T18:00:10Z",
986 - "model_used": "claude-sonnet-4-5-20250929",
987 - "processing_time_seconds": 4.2,
988 - "cache_used": false
989 - }
990 -}
991 -{{/code}}
992 -
993 -**Required Fields:**
994 -* **article_id**: Unique identifier for this article
995 -* **overall_assessment**: Top-level assessment
996 - * **credibility_score**: 0.0-1.0 composite score
997 - * **risk_tier**: A, B, or C (per AKEL quality gates)
998 - * **summary**: Human-readable assessment
999 - * **confidence**: How confident the holistic assessment is
1000 -* **claim_aggregation**: Statistics across all claims
1001 - * **total_claims**: Count of claims analyzed
1002 - * **verdict_distribution**: Count per verdict label
1003 - * **avg_confidence**: Average confidence across verdicts
1004 -* **contextual_factors**: Array of contextual considerations
1005 -* **recommendations**: Publication decision support
1006 - * **publication_mode**: DRAFT_ONLY, AI_GENERATED, or HUMAN_REVIEWED
1007 - * **requires_review**: Boolean flag
1008 - * **suggested_disclaimers**: Array of disclaimer texts
1009 -* **metadata**: Processing metadata
1010 -
1011 -**Minimum Viable Example:**
1012 -
1013 -{{code language="json"}}
1014 -{
1015 - "article_id": "a1",
1016 - "overall_assessment": {
1017 - "credibility_score": 0.95,
1018 - "risk_tier": "C",
1019 - "summary": "All claims verified as true",
1020 - "confidence": 0.98
1021 - },
1022 - "claim_aggregation": {
1023 - "total_claims": 1,
1024 - "verdict_distribution": {"TRUE": 1},
1025 - "avg_confidence": 0.99
1026 - },
1027 - "contextual_factors": [],
1028 - "recommendations": {
1029 - "publication_mode": "AI_GENERATED",
1030 - "requires_review": false,
1031 - "suggested_disclaimers": []
1032 - },
1033 - "metadata": {"holistic_timestamp": "2024-12-24T18:00:00Z"}
1034 -}
1035 -{{/code}}
1036 -
1037 1037  === 3.2 Create Analysis Job (3-Stage) ===
1038 1038  
1039 1039  **Endpoint:** POST /v1/analyze
... ... @@ -1085,20 +1085,6 @@
1085 1085   "browsing": "on",
1086 1086   "depth": "standard",
1087 1087   "max_claims": 5,
1088 -
1089 -* **cache_preference** (optional): Cache usage preference
1090 - * **Type:** string
1091 - * **Enum:** {{code}}["prefer_cache", "allow_partial", "skip_cache"]{{/code}}
1092 - * **Default:** {{code}}"prefer_cache"{{/code}}
1093 - * **Semantics:**
1094 - * {{code}}"prefer_cache"{{/code}}: Use full cache if available, otherwise run all stages
1095 - * {{code}}"allow_partial"{{/code}}: Use cached Stage 2 results if available, rerun only Stage 3
1096 - * {{code}}"skip_cache"{{/code}}: Always rerun all stages (ignore cache)
1097 - * **Behavior:** When set to {{code}}"allow_partial"{{/code}} and Stage 2 cached results exist:
1098 - * Stage 1 & 2 are skipped
1099 - * Stage 3 (holistic assessment) runs fresh with cached claim analyses
1100 - * Response includes {{code}}"cache_used": true{{/code}} and {{code}}"stages_cached": ["stage1", "stage2"]{{/code}}
1101 -
1102 1102   "scenarios_per_claim": 2,
1103 1103   "max_evidence_per_scenario": 6,
1104 1104   "context_aware_analysis": true
... ... @@ -1286,78 +1286,80 @@
1286 1286  
1287 1287  **Algorithm: Canonical Claim Normalization v1**
1288 1288  
773 +{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
774 + """
775 + Normalizes claim to canonical form for cache key generation.
776 + Version: v1norm1 (POC1)
777 + """
778 + import re
779 + import unicodedata
780 +
781 + # Step 1: Unicode normalization (NFC)
782 + text = unicodedata.normalize('NFC', claim_text)
783 +
784 + # Step 2: Lowercase
785 + text = text.lower()
786 +
787 +    # Step 3: Remove punctuation (keep in-word hyphens; keep '%' so
787 +    # Step 5 can expand it to ' percent' instead of silently dropping it)
788 +    text = re.sub(r'[^\w\s%-]', '', text)
789 +
790 + # Step 4: Normalize whitespace (collapse multiple spaces)
791 + text = re.sub(r'\s+', ' ', text).strip()
792 +
793 + # Step 5: Numeric normalization
794 + text = text.replace('%', ' percent')
795 + # Spell out single-digit numbers
796 + num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
797 + '4':'four', '5':'five', '6':'six', '7':'seven',
798 + '8':'eight', '9':'nine'}
799 + for num, word in num_to_word.items():
800 + text = re.sub(rf'\b{num}\b', word, text)
801 +
802 + # Step 6: Common abbreviations (English only in v1)
803 + if language == 'en':
804 + text = text.replace('covid-19', 'covid')
805 + text = text.replace('u.s.', 'us')
806 + text = text.replace('u.k.', 'uk')
807 +
808 + # Step 7: NO entity normalization in v1
809 + # (Trump vs Donald Trump vs President Trump remain distinct)
810 +
811 + return text
1289 1289  
1290 -**Normative Algorithm:**
813 +# Version identifier (include in cache namespace)
814 +CANONICALIZER_VERSION = "v1norm1"
815 +}}}
1291 1291  
1292 -{{code language="python"}}
1293 -def normalize_claim(text: str) -> str:
1294 - """
1295 - Canonical claim normalization for deduplication.
1296 - MUST follow this algorithm exactly.
1297 -
1298 - Version: v1norm1
1299 - """
1300 - import re
1301 - import unicodedata
1302 -
1303 - # 1. Unicode normalization (NFD)
1304 - text = unicodedata.normalize('NFD', text)
1305 -
1306 - # 2. Lowercase
1307 - text = text.lower()
1308 -
1309 - # 3. Remove diacritics
1310 - text = ''.join(c for c in text if unicodedata.category(c) != 'Mn')
1311 -
1312 - # 4. Normalize whitespace
1313 - text = re.sub(r'\s+', ' ', text)
1314 - text = text.strip()
1315 -
1316 - # 5. Remove punctuation except apostrophes in contractions
1317 - text = re.sub(r"[^\w\s']", '', text)
1318 -
1319 - # 6. Normalize common contractions
1320 - contractions = {
1321 - "don't": "do not",
1322 - "doesn't": "does not",
1323 - "didn't": "did not",
1324 - "can't": "cannot",
1325 - "won't": "will not",
1326 - "shouldn't": "should not",
1327 - "wouldn't": "would not",
1328 - "isn't": "is not",
1329 - "aren't": "are not",
1330 - "wasn't": "was not",
1331 - "weren't": "were not",
1332 - "haven't": "have not",
1333 - "hasn't": "has not",
1334 - "hadn't": "had not"
1335 - }
1336 -
1337 - for contraction, expansion in contractions.items():
1338 - text = re.sub(r'\b' + contraction + r'\b', expansion, text)
1339 -
1340 - # 7. Remove remaining apostrophes
1341 - text = text.replace("'", "")
1342 -
1343 - # 8. Final whitespace normalization
1344 - text = re.sub(r'\s+', ' ', text)
1345 - text = text.strip()
1346 -
1347 - return text
1348 -{{/code}}
817 +**Cache Key Formula (Updated):**
1349 1349  
1350 -**Normalization Examples:**
819 +{{{language = "en"
820 +canonical = normalize_claim_v1(claim_text, language)
821 +cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
1351 1351  
1352 -|= Input |= Normalized Output
1353 -| "Biden won the 2020 election" | {{code}}biden won the 2020 election{{/code}}
1354 -| "Biden won the 2020 election!" | {{code}}biden won the 2020 election{{/code}}
1355 -| "Biden won the 2020 election" | {{code}}biden won the 2020 election{{/code}}
1356 -| "Biden didn't win the 2020 election" | {{code}}biden did not win the 2020 election{{/code}}
1357 -| "BIDEN WON THE 2020 ELECTION" | {{code}}biden won the 2020 election{{/code}}
823 +Example:
824 + claim: "COVID-19 vaccines are 95% effective"
825 + canonical: "covid vaccines are 95 percent effective"
826 + sha256: abc123...def456
827 + key: "claim:v1norm1:en:abc123...def456"
828 +}}}
1358 1358  
1359 -**Versioning:** Algorithm version is {{code}}v1norm1{{/code}}. Changes to the algorithm require a new version identifier.
830 +**Cache Metadata MUST Include:**
1360 1360  
832 +{{{{
833 + "canonical_claim": "covid vaccines are 95 percent effective",
834 + "canonicalizer_version": "v1norm1",
835 + "language": "en",
836 + "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
837 +}
838 +}}}
839 +
840 +**Version Upgrade Path:**
841 +
842 +* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
843 +* v1normN → v2norm1: Major version bump, invalidate all v1 caches
844 +
845 +----
846 +
1361 1361  === 5.1.2 Copyright & Data Retention Policy ===
1362 1362  
1363 1363  **Evidence Excerpt Storage:**