Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Title
@@ -1,1 +1,1 @@
-POC1 API & Schemas Specification v0.4.1
+POC1 API & Schemas Specification

Content
... ...

@@ -1,44 +1,25 @@
-# FactHarbor POC1 — API & Schemas Specification
+= POC1 API & Schemas Specification =
 
-**Version:** 0.4.1 (POC1 - 3-Stage Caching Architecture)
-**Namespace:** FactHarbor.*
-**Syntax:** xWiki 2.1
-**Last Updated:** 2025-12-24
+----
 
----
-
 == Version History ==
 
 |=Version|=Date|=Changes
 |0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
 |0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
-|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints, chain-of-thought, evidence citation, Jina safety, gate numbering
-|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references
-|0.2|2025-12-24|Initial rebased version with holistic assessment
-|0.1|2025-12-24|Original specification
+|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
+|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details
 
----
----
+----
 
-== File Format Notice ==
-
-**⚠️ Important:** This file is stored as {{code}}.md{{/code}} for transport/versioning, but the content is **xWiki 2.1 syntax** (not Markdown).
-
-**When importing to xWiki:**
-* Use "Import as XWiki content" (not "Import as Markdown")
-* The xWiki parser will correctly interpret {{code}}== =={{/code}} headers, {{code}}{{{ }}}{{/code}} blocks, etc.
-
-**Alternate naming:** If your workflow supports it, rename to {{code}}.xwiki.txt{{/code}} to avoid ambiguity.
-
----
-
 == 1. Core Objective (POC1) ==
 
-The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability:
+The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.
 
-The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).
+The system must prove that AI can identify an article's **Main Thesis** and determine if supporting claims logically support that thesis without committing fallacies.
 
-**Success Criteria:**
+=== Success Criteria: ===
+
 * Test with 30 diverse articles
 * Target: ≥70% accuracy detecting misleading articles
 * Cost: <$0.25 per NEW analysis (uncached)

... ...

@@ -46,14 +46,13 @@
 * Cache hit rate: ≥50% after 1,000 articles
 * Processing time: <2 minutes (standard depth)
 
-**Economic Model:**
-* Free tier: $10 credit per month (~40-140 articles depending on cache hits)
-* After limit: Cache-only mode (instant, free access to cached claims)
-* Paid tier: Unlimited new analyses
+=== Economic Model: ===
 
-**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches.
+* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
+* **After limit:** Cache-only mode (instant, free access to cached claims)
+* **Paid tier:** Unlimited new analyses
 
----
+----
 
 == 2. Architecture Overview ==
 
... ...
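The free-tier economics recorded above follow directly from the per-stage costs quoted elsewhere in the spec ($0.003 extraction, $0.081 per new claim, $0.030 holistic). A minimal TypeScript sketch of that arithmetic — the constants are the spec's quoted estimates, and the helper names are illustrative, not part of any API:

```typescript
// Cost model for the 3-stage pipeline, using the spec's quoted estimates
// (Stage 1 extraction $0.003, Stage 2 $0.081 per uncached claim, Stage 3
// holistic $0.030). Helper names are illustrative.
const EXTRACTION_COST = 0.003; // Stage 1 (Haiku) — never cached
const NEW_CLAIM_COST = 0.081;  // Stage 2 (Sonnet) — charged on cache miss only
const HOLISTIC_COST = 0.030;   // Stage 3 (Sonnet) — never cached

function costPerArticle(newClaims: number): number {
  return EXTRACTION_COST + newClaims * NEW_CLAIM_COST + HOLISTIC_COST;
}

// Rough articles-per-month for a given credit and cache hit rate,
// assuming the default 5 claims per article.
function articlesPerCredit(credit: number, cacheHitRate: number, claimsPerArticle = 5): number {
  const newClaims = claimsPerArticle * (1 - cacheHitRate);
  return Math.floor(credit / costPerArticle(newClaims));
}

console.log(costPerArticle(0).toFixed(3)); // "0.033" — 100% cache hit
console.log(costPerArticle(5).toFixed(3)); // "0.438" — 0% cache hit
console.log(articlesPerCredit(10, 0.7));   // roughly 64 articles on the $10 free tier
```

At a 0% hit rate the $10 credit covers about 22 articles and at 100% about 300, which brackets the "40-140 articles" range the spec quotes for typical hit rates.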
@@ -61,52 +61,61 @@
 
 FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
 
-{{code language="mermaid"}}
+{{mermaid}}
 graph TD
-A[Article Input] --> B[Stage 1: Extract Claims]
-B --> C{For Each Claim}
-C --> D[Check Cache]
-D -->|Cache HIT| E[Return Cached Verdict]
-D -->|Cache MISS| F[Stage 2: Analyze Claim]
-F --> G[Store in Cache]
-G --> E
-E --> H[Stage 3: Holistic Assessment]
-H --> I[Final Report]
-{{/code}}
+    A[Article Input] --> B[Stage 1: Extract Claims]
+    B --> C{For Each Claim}
+    C --> D[Check Cache]
+    D -->|Cache HIT| E[Return Cached Verdict]
+    D -->|Cache MISS| F[Stage 2: Analyze Claim]
+    F --> G[Store in Cache]
+    G --> E
+    E --> H[Stage 3: Holistic Assessment]
+    H --> I[Final Report]
+{{/mermaid}}
 
-**Stage 1: Claim Extraction** (Haiku, no cache)
-* Input: Article text
-* Output: 5 canonical claims (normalized, deduplicated)
-* Model: Claude Haiku 4
-* Cost: $0.003 per article
-* Cache strategy: No caching (article-specific)
+==== Stage 1: Claim Extraction (Haiku, no cache) ====
 
-**Stage 2: Claim Analysis** (Sonnet, CACHED)
-* Input: Single canonical claim
-* Output: Scenarios + Evidence + Verdicts
-* Model: Claude Sonnet 3.5
-* Cost: $0.081 per NEW claim
-* Cache strategy: **Redis, 90-day TTL**
-* Cache key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}}
+* **Input:** Article text
+* **Output:** 5 canonical claims (normalized, deduplicated)
+* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
+* **Cost:** $0.003 per article
+* **Cache strategy:** No caching (article-specific)
 
-**Stage 3: Holistic Assessment** (Sonnet, no cache)
-* Input: Article + Claim verdicts (from cache or Stage 2)
-* Output: Article verdict + Fallacies + Logic quality
-* Model: Claude Sonnet 3.5
-* Cost: $0.030 per article
-* Cache strategy: No caching (article-specific)
+==== Stage 2: Claim Analysis (Sonnet, CACHED) ====
 
+* **Input:** Single canonical claim
+* **Output:** Scenarios + Evidence + Verdicts
+* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
+* **Cost:** $0.081 per NEW claim
+* **Cache strategy:** Redis, 90-day TTL
+* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
+
+==== Stage 3: Holistic Assessment (Sonnet, no cache) ====
+
+* **Input:** Article + Claim verdicts (from cache or Stage 2)
+* **Output:** Article verdict + Fallacies + Logic quality
+* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
+* **Cost:** $0.030 per article
+* **Cache strategy:** No caching (article-specific)
+
+**Note:** Stage 3 implements **Approach 1 (Single-Pass Holistic Analysis)** from the [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]. While claim analysis (Stage 2) is cached for efficiency, the holistic assessment maintains the integrated evaluation philosophy of Approach 1.
 
-**Total Cost Formula:**
-{{code}}
-Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
+=== Total Cost Formula: ===
+
+{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
 
 Examples:
 - 0 new claims (100% cache hit): $0.033
 - 1 new claim (80% cache hit): $0.114
 - 3 new claims (40% cache hit): $0.276
 - 5 new claims (0% cache hit): $0.438
-{{/code}}
+}}}
 
+----
+
 === 2.2 User Tier System ===
 
 |=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
... ...
@@ -115,17 +115,21 @@
 |**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full
 
 **Free Tier Economics:**
+
 * $10 credit = 40-140 articles analyzed (depending on cache hit rate)
 * Average 70 articles/month at 70% cache hit rate
-* After limit: Cache-only mode (see Section 2.3)
+* After limit: Cache-only mode
 
+----
+
 === 2.3 Cache-Only Mode (Free Tier Feature) ===
 
 When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
 
-**What Cache-Only Mode Provides:**
+==== What Cache-Only Mode Provides: ====
 
 ✅ **Claim Extraction (Platform-Funded):**
+
 * Stage 1 extraction runs at $0.003 per article
 * **Cost: Absorbed by platform** (not charged to user credit)
 * Rationale: Extraction is necessary to check cache, and cost is negligible

@@ -132,628 +132,560 @@
 * Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)
 
 ✅ **Instant Access to Cached Claims:**
+
 * Any claim that exists in cache → Full verdict returned
 * Cost: $0 (no LLM calls)
 * Response time: <100ms
 
 ✅ **Partial Article Analysis:**
+
 * Check each claim against cache
 * Return verdicts for ALL cached claims
-* For uncached claims: Return {{code}}"status": "cache_miss"{{/code}}
+* For uncached claims: Return "status": "cache_miss"
 
 ✅ **Cache Coverage Report:**
+
 * "3 of 5 claims available in cache (60% coverage)"
 * Links to cached analyses
 * Estimated cost to complete: $0.162 (2 new claims)
 
 ❌ **Not Available in Cache-Only Mode:**
+
 * New claim analysis (Stage 2 LLM calls blocked)
 * Full holistic assessment (Stage 3 blocked if any claims missing)
 
-**User Experience:**
-{{code language="json"}}
-{
-  "status": "cache_only_mode",
-  "message": "Monthly credit limit reached. Showing cached results only.",
-  "cache_coverage": {
-    "claims_total": 5,
-    "claims_cached": 3,
-    "claims_missing": 2,
-    "coverage_percent": 60
-  },
-  "cached_claims": [
-    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
-    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
-    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
-  ],
-  "missing_claims": [
-    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
-    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
-  ],
-  "upgrade_options": {
-    "top_up": "$5 for 20-70 more articles",
-    "pro_tier": "$50/month unlimited"
-  }
+==== User Experience Example: ====
+
+{{{{
+  "status": "cache_only_mode",
+  "message": "Monthly credit limit reached. Showing cached results only.",
+  "cache_coverage": {
+    "claims_total": 5,
+    "claims_cached": 3,
+    "claims_missing": 2,
+    "coverage_percent": 60
+  },
+  "cached_claims": [
+    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
+    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
+    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
+  ],
+  "missing_claims": [
+    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
+    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
+  ],
+  "upgrade_options": {
+    "top_up": "$5 for 20-70 more articles",
+    "pro_tier": "$50/month unlimited"
+  }
 }
-{{/code}}
+}}}
 
 **Design Rationale:**
+
 * Free users still get value (cached claims often answer their question)
 * Demonstrates FactHarbor's value (partial results encourage upgrade)
 * Sustainable for platform (no additional cost)
 * Fair to all users (everyone contributes to cache)
 
----
+----
 
-==
3. REST API Contract == 189 189 190 -=== 3.1 User Credit Tracking === 191 191 192 - **Endpoint:**{{code}}GET/v1/user/credit{{/code}}188 +== 6. LLM Abstraction Layer == 193 193 194 - **Response:**{{code}}200OK{{/code}}190 +=== 6.1 Design Principle === 195 195 196 -{{code language="json"}} 197 -{ 198 - "user_id": "user_abc123", 199 - "tier": "free", 200 - "credit_limit": 10.00, 201 - "credit_used": 7.42, 202 - "credit_remaining": 2.58, 203 - "reset_date": "2025-02-01T00:00:00Z", 204 - "cache_only_mode": false, 205 - "usage_stats": { 206 - "articles_analyzed": 67, 207 - "claims_from_cache": 189, 208 - "claims_newly_analyzed": 113, 209 - "cache_hit_rate": 0.626 210 - } 192 +**FactHarbor uses provider-agnostic LLM abstraction** to avoid vendor lock-in and enable: 193 + 194 +* **Provider switching:** Change LLM providers without code changes 195 +* **Cost optimization:** Use different providers for different stages 196 +* **Resilience:** Automatic fallback if primary provider fails 197 +* **Cross-checking:** Compare outputs from multiple providers 198 +* **A/B testing:** Test new models without deployment changes 199 + 200 +**Implementation:** All LLM calls go through an abstraction layer that routes to configured providers. 
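A minimal sketch of such a routing layer, assuming the `LLMProvider` interface and the per-stage environment variables (`LLM_STAGE1_PROVIDER`, `LLM_STAGE1_MODEL`, …) defined later in this section; the registry shape and fallback defaults here are illustrative, not part of the spec:

```typescript
// Sketch of a stage-aware routing layer: every LLM call resolves a
// provider/model pair from configuration instead of hard-coding a vendor.
// Env-var names mirror section 6.4; the registry itself is illustrative.
interface LLMProvider {
  complete(prompt: string, options?: { model?: string }): Promise<string>;
}

type Stage = "stage1" | "stage2" | "stage3";

// Adapters (Anthropic, OpenAI, …) register themselves here at startup.
const registry = new Map<string, LLMProvider>();

function stageConfigFromEnv(stage: Stage): { provider: string; model: string } {
  const key = stage.toUpperCase(); // e.g. LLM_STAGE2_PROVIDER / LLM_STAGE2_MODEL
  return {
    provider: process.env[`LLM_${key}_PROVIDER`] ?? process.env.LLM_PRIMARY_PROVIDER ?? "anthropic",
    model: process.env[`LLM_${key}_MODEL`] ?? "claude-sonnet-3-5",
  };
}

async function completeForStage(stage: Stage, prompt: string): Promise<string> {
  const cfg = stageConfigFromEnv(stage);
  const provider = registry.get(cfg.provider);
  if (!provider) throw new Error(`No provider registered for "${cfg.provider}"`);
  return provider.complete(prompt, { model: cfg.model });
}
```

With this shape, switching Stage 2 to another vendor is a configuration change (e.g. `LLM_STAGE2_PROVIDER=openai`) rather than a code change, which is the property the admin endpoint in 6.7 relies on.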
201 + 202 +---- 203 + 204 +=== 6.2 LLM Provider Interface === 205 + 206 +**Abstract Interface:** 207 + 208 +{{{ 209 +interface LLMProvider { 210 + // Core methods 211 + complete(prompt: string, options: CompletionOptions): Promise<CompletionResponse> 212 + stream(prompt: string, options: CompletionOptions): AsyncIterator<StreamChunk> 213 + 214 + // Provider metadata 215 + getName(): string 216 + getMaxTokens(): number 217 + getCostPer1kTokens(): { input: number, output: number } 218 + 219 + // Health check 220 + isAvailable(): Promise<boolean> 211 211 } 212 -{{/code}} 213 213 214 ---- 223 +interface CompletionOptions { 224 + model?: string 225 + maxTokens?: number 226 + temperature?: number 227 + stopSequences?: string[] 228 + systemPrompt?: string 229 +} 230 +}}} 215 215 216 - === 3.2 Create Analysis Job (3-Stage) ===232 +---- 217 217 218 - **Endpoint:**{{code}}POST /v1/analyze{{/code}}234 +=== 6.3 Supported Providers (POC1) === 219 219 220 -** RequestBody:**236 +**Primary Provider (Default):** 221 221 238 +* **Anthropic Claude API** 239 + * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4 240 + * Used by default in POC1 241 + * Best quality for holistic analysis 222 222 223 -** IdempotencySupport:**243 +**Secondary Providers (Future):** 224 224 225 -To prevent duplicate job creation on network retries, clients SHOULD include: 245 +* **OpenAI API** 246 + * Models: GPT-4o, GPT-4o-mini 247 + * For cost comparison 248 + 249 +* **Google Vertex AI** 250 + * Models: Gemini 1.5 Pro, Gemini 1.5 Flash 251 + * For diversity in evidence gathering 226 226 227 -{{code language="http"}} 228 -POST /v1/analyze 229 -Idempotency-Key: {client-generated-uuid} 230 -{{/code}} 253 +* **Local Models** (Post-POC) 254 + * Models: Llama 3.1, Mistral 255 + * For privacy-sensitive deployments 231 231 232 - OR use the {{code}}client.request_id{{/code}} field:257 +---- 233 233 234 -{{code language="json"}} 235 -{ 236 - "input_url": "...", 237 - "client": { 238 - "request_id": 
"client-uuid-12345", 239 - "source_label": "optional" 240 - } 241 -} 242 -{{/code}} 259 +=== 6.4 Provider Configuration === 243 243 244 -**Server Behavior:** 245 -* If {{code}}Idempotency-Key{{/code}} or {{code}}request_id{{/code}} seen before (within 24 hours): 246 - - Return existing job ({{code}}200 OK{{/code}}, not {{code}}202 Accepted{{/code}}) 247 - - Do NOT create duplicate job or charge twice 248 -* Idempotency keys expire after 24 hours (matches job retention) 261 +**Environment Variables:** 249 249 250 -**Example Response (Idempotent):** 251 -{{code language="json"}} 252 -{ 253 - "job_id": "01J...ULID", 254 - "status": "RUNNING", 255 - "idempotent": true, 256 - "original_request_at": "2025-12-24T10:31:00Z", 257 - "message": "Returning existing job (idempotency key matched)" 258 -} 259 -{{/code}} 263 +{{{ 264 +# Primary provider 265 +LLM_PRIMARY_PROVIDER=anthropic 266 +ANTHROPIC_API_KEY=sk-ant-... 260 260 268 +# Fallback provider 269 +LLM_FALLBACK_PROVIDER=openai 270 +OPENAI_API_KEY=sk-... 
261 261 262 -{{code language="json"}} 263 -{ 264 - "input_type": "url", 265 - "input_url": "https://example.com/medical-report-01", 266 - "input_text": null, 267 - "options": { 268 - "browsing": "on", 269 - "depth": "standard", 270 - "max_claims": 5, 271 - "context_aware_analysis": true, 272 - "cache_preference": "prefer_cache" 273 - }, 274 - "client": { 275 - "request_id": "optional-client-tracking-id", 276 - "source_label": "optional" 277 - } 278 -} 279 -{{/code}} 272 +# Provider selection per stage 273 +LLM_STAGE1_PROVIDER=anthropic 274 +LLM_STAGE1_MODEL=claude-haiku-4 275 +LLM_STAGE2_PROVIDER=anthropic 276 +LLM_STAGE2_MODEL=claude-sonnet-3-5 277 +LLM_STAGE3_PROVIDER=anthropic 278 +LLM_STAGE3_MODEL=claude-sonnet-3-5 280 280 281 -**Options:** 282 -* {{code}}cache_preference{{/code}}: {{code}}prefer_cache{{/code}} | {{code}}require_fresh{{/code}} | {{code}}allow_partial{{/code}} 283 - - {{code}}prefer_cache{{/code}}: Use cache when available, analyze new claims (default) 284 - - {{code}}require_fresh{{/code}}: Force re-analysis of all claims (ignores cache, costs more) 285 - - {{code}}allow_partial{{/code}}: Return partial results if some claims uncached (for free tier cache-only mode) 280 +# Cost limits 281 +LLM_MAX_COST_PER_REQUEST=1.00 282 +}}} 286 286 287 -** Response:**{{code}}202Accepted{{/code}}284 +**Database Configuration (Alternative):** 288 288 289 -{{ code language="json"}}286 +{{{{ 290 290 { 291 - "job_id": "01J...ULID", 292 - "status": "QUEUED", 293 - "created_at": "2025-12-24T10:31:00Z", 294 - "estimated_cost": 0.114, 295 - "cost_breakdown": { 296 - "stage1_extraction": 0.003, 297 - "stage2_new_claims": 0.081, 298 - "stage2_cached_claims": 0.000, 299 - "stage3_holistic": 0.030 300 - }, 301 - "cache_info": { 302 - "claims_to_extract": 5, 303 - "estimated_cache_hits": 4, 304 - "estimated_new_claims": 1 305 - }, 306 - "links": { 307 - "self": "/v1/jobs/01J...ULID", 308 - "result": "/v1/jobs/01J...ULID/result", 309 - "report": 
"/v1/jobs/01J...ULID/report", 310 - "events": "/v1/jobs/01J...ULID/events" 288 + "providers": [ 289 + { 290 + "name": "anthropic", 291 + "api_key_ref": "vault://anthropic-api-key", 292 + "enabled": true, 293 + "priority": 1 294 + }, 295 + { 296 + "name": "openai", 297 + "api_key_ref": "vault://openai-api-key", 298 + "enabled": true, 299 + "priority": 2 300 + } 301 + ], 302 + "stage_config": { 303 + "stage1": { 304 + "provider": "anthropic", 305 + "model": "claude-haiku-4", 306 + "max_tokens": 4096, 307 + "temperature": 0.0 308 + }, 309 + "stage2": { 310 + "provider": "anthropic", 311 + "model": "claude-sonnet-3-5", 312 + "max_tokens": 16384, 313 + "temperature": 0.3 314 + }, 315 + "stage3": { 316 + "provider": "anthropic", 317 + "model": "claude-sonnet-3-5", 318 + "max_tokens": 8192, 319 + "temperature": 0.2 320 + } 311 311 } 312 312 } 313 - {{/code}}323 +}}} 314 314 315 - **Error Responses:**325 +---- 316 316 317 -{{code}}402 Payment Required{{/code}} - Free tier limit reached, cache-only mode 318 -{{code language="json"}} 319 -{ 320 - "error": "credit_limit_reached", 321 - "message": "Monthly credit limit reached. 
Entering cache-only mode.", 322 - "cache_only_mode": true, 323 - "credit_remaining": 0.00, 324 - "reset_date": "2025-02-01T00:00:00Z", 325 - "action": "Resubmit with cache_preference=allow_partial for cached results" 326 -} 327 -{{/code}} 327 +=== 6.5 Stage-Specific Models (POC1 Defaults) === 328 328 329 - ---329 +**Stage 1: Claim Extraction** 330 330 331 -=== 3.3 Get Job Status === 331 +* **Default:** Anthropic Claude Haiku 4 332 +* **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash 333 +* **Rationale:** Fast, cheap, simple task 334 +* **Cost:** ~$0.003 per article 332 332 333 -** Endpoint:**{{code}}GET/v1/jobs/{job_id}{{/code}}336 +**Stage 2: Claim Analysis** (CACHEABLE) 334 334 335 -**Response:** {{code}}200 OK{{/code}} 338 +* **Default:** Anthropic Claude Sonnet 3.5 339 +* **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro 340 +* **Rationale:** High-quality analysis, cached 90 days 341 +* **Cost:** ~$0.081 per NEW claim 336 336 337 -{{code language="json"}} 338 -{ 339 - "job_id": "01J...ULID", 340 - "status": "RUNNING", 341 - "created_at": "2025-12-24T10:31:00Z", 342 - "updated_at": "2025-12-24T10:31:22Z", 343 - "progress": { 344 - "stage": "stage2_claim_analysis", 345 - "percent": 65, 346 - "message": "Analyzing claim 3 of 5 (2 from cache)", 347 - "current_claim_id": "C3", 348 - "cache_hits": 2, 349 - "cache_misses": 1 350 - }, 351 - "actual_cost": 0.084, 352 - "cost_breakdown": { 353 - "stage1_extraction": 0.003, 354 - "stage2_new_claims": 0.081, 355 - "stage2_cached_claims": 0.000, 356 - "stage3_holistic": null 357 - }, 358 - "input_echo": { 359 - "input_type": "url", 360 - "input_url": "https://example.com/medical-report-01" 361 - }, 362 - "links": { 363 - "self": "/v1/jobs/01J...ULID", 364 - "result": "/v1/jobs/01J...ULID/result", 365 - "report": "/v1/jobs/01J...ULID/report" 366 - }, 367 - "error": null 343 +**Stage 3: Holistic Assessment** 344 + 345 +* **Default:** Anthropic Claude Sonnet 3.5 346 +* **Alternative:** OpenAI GPT-4o, Claude Opus 
4 (for high-stakes) 347 +* **Rationale:** Complex reasoning, logical fallacy detection 348 +* **Cost:** ~$0.030 per article 349 + 350 +**Cost Comparison (Example):** 351 + 352 +|=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative 353 +|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002) 354 +|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050) 355 +|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020) 356 +|**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072** 357 + 358 +**Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2. 359 + 360 +---- 361 + 362 +=== 6.6 Failover Strategy === 363 + 364 +**Automatic Failover:** 365 + 366 +{{{ 367 +async function completeLLM(stage: string, prompt: string): Promise<string> { 368 + const primaryProvider = getProviderForStage(stage) 369 + const fallbackProvider = getFallbackProvider() 370 + 371 + try { 372 + return await primaryProvider.complete(prompt) 373 + } catch (error) { 374 + if (error.type === 'rate_limit' || error.type === 'service_unavailable') { 375 + logger.warn(`Primary provider failed, using fallback`) 376 + return await fallbackProvider.complete(prompt) 377 + } 378 + throw error 379 + } 368 368 } 369 - {{/code}}381 +}}} 370 370 371 - ---383 +**Fallback Priority:** 372 372 373 -=== 3.4 Get Analysis Result === 385 +1. **Primary:** Configured provider for stage 386 +2. **Secondary:** Fallback provider (if configured) 387 +3. **Cache:** Return cached result (if available for Stage 2) 388 +4. 
**Error:** Return 503 Service Unavailable 374 374 375 - **Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}}390 +---- 376 376 377 - **Response:**{{code}}200OK{{/code}}392 +=== 6.7 Provider Selection API === 378 378 379 - Returns complete**AnalysisResult**schema (seeSection4).394 +**Admin Endpoint:** POST /admin/v1/llm/configure 380 380 381 -** Cache-OnlyModeResponse:** {{code}}206 PartialContent{{/code}}396 +**Update provider for specific stage:** 382 382 383 -{{ code language="json"}}398 +{{{{ 384 384 { 385 - "cache_only_mode": true, 386 - "cache_coverage": { 387 - "claims_total": 5, 388 - "claims_cached": 3, 389 - "claims_missing": 2, 390 - "coverage_percent": 60 400 + "stage": "stage2", 401 + "provider": "openai", 402 + "model": "gpt-4o", 403 + "max_tokens": 16384, 404 + "temperature": 0.3 405 +} 406 +}}} 407 + 408 +**Response:** 200 OK 409 + 410 +{{{{ 411 +{ 412 + "message": "LLM configuration updated", 413 + "stage": "stage2", 414 + "previous": { 415 + "provider": "anthropic", 416 + "model": "claude-sonnet-3-5" 391 391 }, 392 - "partial_result": { 393 - "metadata": { 394 - "job_id": "01J...ULID", 395 - "timestamp_utc": "2025-12-24T10:31:30Z", 396 - "engine_version": "POC1-v0.4", 397 - "cache_only": true 418 + "current": { 419 + "provider": "openai", 420 + "model": "gpt-4o" 421 + }, 422 + "cost_impact": { 423 + "previous_cost_per_claim": 0.081, 424 + "new_cost_per_claim": 0.045, 425 + "savings_percent": 44 426 + } 427 +} 428 +}}} 429 + 430 +**Get current configuration:** 431 + 432 +GET /admin/v1/llm/config 433 + 434 +{{{{ 435 +{ 436 + "providers": ["anthropic", "openai"], 437 + "primary": "anthropic", 438 + "fallback": "openai", 439 + "stages": { 440 + "stage1": { 441 + "provider": "anthropic", 442 + "model": "claude-haiku-4", 443 + "cost_per_request": 0.003 398 398 }, 399 - "claims": [ 400 - { 401 - "claim_id": "C1", 402 - "claim_text": "...", 403 - "canonical_claim": "...", 404 - "source": "cache", 405 - "cached_at": "2025-12-20T15:30:00Z", 406 - 
"cache_hit_count": 47, 407 - "scenarios": [...] 408 - }, 409 - { 410 - "claim_id": "C3", 411 - "claim_text": "...", 412 - "canonical_claim": "...", 413 - "source": "not_analyzed", 414 - "status": "cache_miss", 415 - "estimated_cost": 0.081 416 - } 417 - ], 418 - "article_holistic_assessment": null, 419 - "upgrade_prompt": { 420 - "message": "Upgrade to Pro for full analysis of all claims", 421 - "missing_claims": 2, 422 - "cost_to_complete": 0.192 445 + "stage2": { 446 + "provider": "anthropic", 447 + "model": "claude-sonnet-3-5", 448 + "cost_per_new_claim": 0.081 449 + }, 450 + "stage3": { 451 + "provider": "anthropic", 452 + "model": "claude-sonnet-3-5", 453 + "cost_per_request": 0.030 423 423 } 424 424 } 425 425 } 426 - {{/code}}457 +}}} 427 427 428 -**Other Responses:** 429 -* {{code}}409 Conflict{{/code}} - Job not finished yet 430 -* {{code}}404 Not Found{{/code}} - Job ID unknown 459 +---- 431 431 432 - ---461 +=== 6.8 Implementation Notes === 433 433 434 - === 3.5 Stage-Specific Endpoints(Optional,Advanced)===463 +**Provider Adapter Pattern:** 435 435 436 -For direct stage access (useful for cache debugging, custom workflows): 465 +{{{ 466 +class AnthropicProvider implements LLMProvider { 467 + async complete(prompt: string, options: CompletionOptions) { 468 + const response = await anthropic.messages.create({ 469 + model: options.model || 'claude-sonnet-3-5', 470 + max_tokens: options.maxTokens || 4096, 471 + messages: [{ role: 'user', content: prompt }], 472 + system: options.systemPrompt 473 + }) 474 + return response.content[0].text 475 + } 476 +} 437 437 438 -**Extract Claims Only:** 439 -{{code}}POST /v1/analyze/extract-claims{{/code}} 478 +class OpenAIProvider implements LLMProvider { 479 + async complete(prompt: string, options: CompletionOptions) { 480 + const response = await openai.chat.completions.create({ 481 + model: options.model || 'gpt-4o', 482 + max_tokens: options.maxTokens || 4096, 483 + messages: [ 484 + { role: 'system', content: 
options.systemPrompt }, 485 + { role: 'user', content: prompt } 486 + ] 487 + }) 488 + return response.choices[0].message.content 489 + } 490 +} 491 +}}} 440 440 441 -**Analyze Single Claim:** 442 -{{code}}POST /v1/analyze/claim{{/code}} 493 +**Provider Registry:** 443 443 444 -**Assess Article (with claim verdicts):** 445 -{{code}}POST /v1/analyze/assess-article{{/code}} 495 +{{{ 496 +const providers = new Map<string, LLMProvider>() 497 +providers.set('anthropic', new AnthropicProvider()) 498 +providers.set('openai', new OpenAIProvider()) 499 +providers.set('google', new GoogleProvider()) 446 446 447 -**Check Claim Cache:** 448 -{{code}}GET /v1/cache/claim/{claim_hash}{{/code}} 501 +function getProvider(name: string): LLMProvider { 502 + return providers.get(name) || providers.get(config.primaryProvider) 503 +} 504 +}}} 449 449 450 -**Cache Statistics:** 451 -{{code}}GET /v1/cache/stats{{/code}} 506 +---- 452 452 453 - ---508 +== 3. REST API Contract == 454 454 455 -=== 3. 6DownloadMarkdownReport===510 +=== 3.1 User Credit Tracking === 456 456 457 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}}512 +**Endpoint:** GET /v1/user/credit 458 458 459 -**Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content514 +**Response:** 200 OK 460 460 461 -**Headers:** 462 -* {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}} 516 +{{{{ 517 + "user_id": "user_abc123", 518 + "tier": "free", 519 + "credit_limit": 10.00, 520 + "credit_used": 7.42, 521 + "credit_remaining": 2.58, 522 + "reset_date": "2025-02-01T00:00:00Z", 523 + "cache_only_mode": false, 524 + "usage_stats": { 525 + "articles_analyzed": 67, 526 + "claims_from_cache": 189, 527 + "claims_newly_analyzed": 113, 528 + "cache_hit_rate": 0.626 529 + } 530 +} 531 +}}} 463 463 464 - **Cache-Only Mode:** Report includes "Partial Analysis" watermark and upgrade prompt.533 +---- 465 465 466 -- --535 +=== 3.2 Create Analysis Job (3-Stage) === 
467 467 468 - ===3.7StreamJob Events (Backend Progress) ===537 +**Endpoint:** POST /v1/analyze 469 469 470 - **Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}}539 +==== Idempotency Support: ==== 471 471 472 - **Response:**Server-SentEvents(SSE)stream541 +To prevent duplicate job creation on network retries, clients SHOULD include: 473 473 474 -**Event Types:** 475 -* {{code}}progress{{/code}} - Backend progress (e.g., "Stage 1: Extracting claims") 476 -* {{code}}cache_hit{{/code}} - Claim found in cache 477 -* {{code}}cache_miss{{/code}} - Claim requires new analysis 478 -* {{code}}stage_complete{{/code}} - Stage 1/2/3 finished 479 -* {{code}}complete{{/code}} - Job finished 480 -* {{code}}error{{/code}} - Error occurred 481 -* {{code}}credit_warning{{/code}} - User approaching limit 543 +{{{POST /v1/analyze 544 +Idempotency-Key: {client-generated-uuid} 545 +}}} 482 482 483 - ---547 +OR use the client.request_id field: 484 484 485 -=== 3.8 Cancel Job === 549 +{{{{ 550 + "input_url": "...", 551 + "client": { 552 + "request_id": "client-uuid-12345", 553 + "source_label": "optional" 554 + } 555 +} 556 +}}} 486 486 487 -** Endpoint:** {{code}}DELETE/v1/jobs/{job_id}{{/code}}558 +**Server Behavior:** 488 488 489 -**Note:** If job is mid-stage (e.g., analyzing claim 3 of 5), user is charged for completed work only. 
* If Idempotency-Key or request_id was seen before (within 24 hours):
** Return the existing job (200 OK, not 202 Accepted)
** Do NOT create a duplicate job or charge twice
* Idempotency keys expire after 24 hours (matches job retention)

**Example Response (Idempotent):**

{{{
{
  "job_id": "01J...ULID",
  "status": "RUNNING",
  "idempotent": true,
  "original_request_at": "2025-12-24T10:31:00Z",
  "message": "Returning existing job (idempotency key matched)"
}
}}}

==== Request Body ====

{{{
{
  "input_type": "url",
  "input_url": "https://example.com/medical-report-01",
  "input_text": null,
  "options": {
    "browsing": "on",
    "depth": "standard",
    "max_claims": 5,
    "scenarios_per_claim": 2,
    "max_evidence_per_scenario": 6,
    "context_aware_analysis": true
  },
  "client": {
    "request_id": "optional-client-tracking-id",
    "source_label": "optional"
  }
}
}}}

**Options:**

* browsing: on | off (retrieve web sources, or only output search queries)
* depth: standard | deep (evidence thoroughness)
* max_claims: 1-10 (default: **5** for cost control)
* scenarios_per_claim: 1-5 (default: **2** for cost control)
* max_evidence_per_scenario: 3-10 (default: **6**)
* context_aware_analysis: true | false (experimental)

**Response:** 202 Accepted

{{{
{
  "job_id": "01J...ULID",
  "status": "QUEUED",
  "created_at": "2025-12-24T10:31:00Z",
  "estimated_cost": 0.114,
  "cost_breakdown": {
    "stage1_extraction": 0.003,
    "stage2_new_claims": 0.081,
    "stage2_cached_claims": 0.000,
    "stage3_holistic": 0.030
  },
  "cache_info": {
    "claims_to_extract": 5,
    "estimated_cache_hits": 4,
    "estimated_new_claims": 1
  },
  "links": {
    "self": "/v1/jobs/01J...ULID",
    "result": "/v1/jobs/01J...ULID/result",
    "report": "/v1/jobs/01J...ULID/report",
    "events": "/v1/jobs/01J...ULID/events"
  }
}
}}}

**Error Responses:**

**402 Payment Required** - Free tier limit reached (cache-only mode):

{{{
{
  "error": "credit_limit_reached",
  "message": "Monthly credit limit reached. Entering cache-only mode.",
  "cache_only_mode": true,
  "credit_remaining": 0.00,
  "reset_date": "2026-01-01T00:00:00Z",
  "action": "Resubmit with cache_preference=allow_partial for cached results"
}
}}}
----

== 4. Data Schemas ==

=== 4.1 Stage 1 Output: ClaimExtraction ===

{{{
{
  "job_id": "01J...ULID",
  "stage": "stage1_extraction",
  "article_metadata": {
    "title": "Article title",
    "source_url": "https://example.com/article",
    "extracted_text_length": 5234,
    "language": "en"
  },
  "claims": [
    {
      "claim_id": "C1",
      "claim_text": "Original claim text from article",
      "canonical_claim": "Normalized, deduplicated phrasing",
      "claim_hash": "sha256:abc123...",
      "is_central_to_thesis": true,
      "claim_type": "causal",
      "evaluability": "evaluable",
      "risk_tier": "B",
      "domain": "public_health"
    }
  ],
  "article_thesis": "Main argument detected",
  "cost": 0.003
}
}}}

----

=== 4.5 Verdict Label Taxonomy ===

FactHarbor uses
**three distinct verdict taxonomies** depending on analysis level:

==== 4.5.1 Scenario Verdict Labels ====

Used for individual scenario verdicts within a claim.

**Enum Values:**

* Highly Likely - Probability 0.85-1.0, high confidence
* Likely - Probability 0.65-0.84, moderate-high confidence
* Unclear - Probability 0.35-0.64, or low confidence
* Unlikely - Probability 0.16-0.34, moderate-high confidence
* Highly Unlikely - Probability 0.0-0.15, high confidence
* Unsubstantiated - Insufficient evidence to determine probability

==== 4.5.2 Claim Verdict Labels (Rollup) ====

Used when summarizing a claim across all scenarios.

**Enum Values:**

* Supported - Majority of scenarios are Likely or Highly Likely
* Refuted - Majority of scenarios are Unlikely or Highly Unlikely
* Inconclusive - Mixed scenarios or majority Unclear/Unsubstantiated

**Mapping Logic:**

* If ≥60% of scenarios are (Highly Likely | Likely) → Supported
* If ≥60% of scenarios are (Highly Unlikely | Unlikely) → Refuted
* Otherwise → Inconclusive
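The mapping logic above can be sketched as follows. This is a minimal sketch assuming the exact label strings listed earlier and treating the ≥60% thresholds as fractions of all scenarios; it looks only at labels, not at the underlying confidence values.

```python
def rollup_verdict(scenario_labels: list[str]) -> str:
    """Roll scenario verdict labels up to a claim verdict (>=60% rules)."""
    if not scenario_labels:
        return "Inconclusive"
    n = len(scenario_labels)
    likely = sum(label in ("Highly Likely", "Likely") for label in scenario_labels)
    unlikely = sum(label in ("Highly Unlikely", "Unlikely") for label in scenario_labels)
    if likely / n >= 0.6:
        return "Supported"
    if unlikely / n >= 0.6:
        return "Refuted"
    return "Inconclusive"
```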
==== 4.5.3 Article Verdict Labels ====

Used for holistic article-level assessment.

**Enum Values:**

* WELL-SUPPORTED - Article thesis logically follows from supported claims
* MISLEADING - Claims may be true but the article commits logical fallacies
* REFUTED - Central claims are refuted, invalidating the thesis
* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts

**Note:** The article verdict considers **claim centrality** (central claims override supporting claims).

==== 4.5.4 API Field Mapping ====

|=Level|=API Field|=Enum Name
|Scenario|scenarios[].verdict.label|scenario_verdict_label
|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
|Article|article_holistic_assessment.overall_verdict|article_verdict_label

----
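A non-normative sketch of the three taxonomies as enumerations. The class names are illustrative; only the string values (and the API fields noted in the comments) are fixed by this section.

```python
from enum import Enum

class ScenarioVerdictLabel(str, Enum):
    # API field: scenarios[].verdict.label
    HIGHLY_LIKELY = "Highly Likely"
    LIKELY = "Likely"
    UNCLEAR = "Unclear"
    UNLIKELY = "Unlikely"
    HIGHLY_UNLIKELY = "Highly Unlikely"
    UNSUBSTANTIATED = "Unsubstantiated"

class ClaimVerdictLabel(str, Enum):
    # API field: claims[].rollup_verdict (optional)
    SUPPORTED = "Supported"
    REFUTED = "Refuted"
    INCONCLUSIVE = "Inconclusive"

class ArticleVerdictLabel(str, Enum):
    # API field: article_holistic_assessment.overall_verdict
    WELL_SUPPORTED = "WELL-SUPPORTED"
    MISLEADING = "MISLEADING"
    REFUTED = "REFUTED"
    UNCERTAIN = "UNCERTAIN"
```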
== 5. Cache Architecture ==

=== 5.1 Redis Cache Design ===

**Technology:** Redis 7.0+ (in-memory key-value store)

**Cache Key Schema:**

{{{
claim:v1norm1:{language}:{sha256(canonical_claim)}
}}}

**Example:**

{{{
Claim (English): "COVID vaccines are 95% effective"
Canonical: "covid vaccines are 95 percent effective"
Language: "en"
SHA256: abc123...def456
Key: claim:v1norm1:en:abc123...def456
}}}

**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.

**Data Structure:**

{{{
SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
EXPIRE claim:v1norm1:en:abc123...def456 7776000  # 90 days
}}}
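A minimal read-through sketch of the SET/EXPIRE pattern above. {{{DictStore}}} is a stand-in for a Redis client exposing {{{get}}}/{{{setex}}} (TTL is ignored by the stub), and {{{analyze_fn}}} represents a hypothetical Stage 2 call; neither name is part of the API.

```python
import json

CLAIM_TTL_SECONDS = 7776000  # 90 days, matching the EXPIRE above

def get_or_analyze(store, key: str, analyze_fn):
    """Read-through cache: return the cached ClaimAnalysis, or analyze and store it."""
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached), True      # cache hit
    analysis = analyze_fn()                  # cache miss: run Stage 2
    store.setex(key, CLAIM_TTL_SECONDS, json.dumps(analysis))
    return analysis, False

class DictStore:
    """In-memory stand-in for a Redis client (get/setex shape; TTL not enforced)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def setex(self, key, ttl, value):
        self._data[key] = value
```

With a real redis-py client, {{{store}}} would be a {{{redis.Redis(...)}}} instance, whose {{{get}}}/{{{setex}}} calls have the same shape.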
----

=== 5.1.1 Canonical Claim Normalization (v1) ===

The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.

**Algorithm: Canonical Claim Normalization v1**

{{{
def normalize_claim_v1(claim_text: str, language: str) -> str:
    """
    Normalizes claim to canonical form for cache key generation.
    Version: v1norm1 (POC1)
    """
    import re
    import unicodedata

    # Step 1: Unicode normalization (NFC)
    text = unicodedata.normalize('NFC', claim_text)

    # Step 2: Lowercase
    text = text.lower()

    # Step 3: Expand '%' (must precede punctuation removal, which strips it)
    text = text.replace('%', ' percent')

    # Step 4: Remove punctuation (except hyphens in words)
    text = re.sub(r'[^\w\s-]', '', text)

    # Step 5: Normalize whitespace (collapse multiple spaces)
    text = re.sub(r'\s+', ' ', text).strip()

    # Step 6: Spell out single-digit numbers
    num_to_word = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',
                   '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
                   '8': 'eight', '9': 'nine'}
    for num, word in num_to_word.items():
        text = re.sub(rf'\b{num}\b', word, text)

    # Step 7: Common abbreviations (English only in v1)
    if language == 'en':
        text = text.replace('covid-19', 'covid')
        text = text.replace('u.s.', 'us')
        text = text.replace('u.k.', 'uk')

    # Step 8: NO entity normalization in v1
    # (Trump vs Donald Trump vs President Trump remain distinct)

    return text

# Version identifier (include in cache namespace)
CANONICALIZER_VERSION = "v1norm1"
}}}

**Cache Key Formula:**

{{{
language = "en"
canonical = normalize_claim_v1(claim_text, language)
cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"

Example:
  claim: "COVID-19 vaccines are 95% effective"
  canonical: "covid vaccines are 95 percent effective"
  sha256: abc123...def456
  key: "claim:v1norm1:en:abc123...def456"
}}}

**Cache Metadata MUST Include:**

{{{
{
  "canonical_claim": "covid vaccines are 95 percent effective",
  "canonicalizer_version": "v1norm1",
  "language": "en",
  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
}
}}}

**Version Upgrade Path:**

* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
* v1normN → v2norm1: Major version bump, invalidate all v1 caches

----

=== 5.1.2 Copyright & Data Retention Policy ===

**Evidence Excerpt Storage:**

To comply with copyright law and fair use principles:

**What We Store:**

* **Metadata only:** Title, author, publisher, URL, publication date
* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
* **Summaries:** AI-generated bullet points (not verbatim text)
* **No full articles:** Never store complete article text beyond job processing

**Total per Cached Claim:**

* Scenarios: 2 per claim
* Evidence items: 6 per scenario (12 total)
* Quotes: 3 per evidence × 25 words = 75 words per item
* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)

**Retention:**

* Cache TTL: 90 days
* Job outputs: 24 hours (then archived or deleted)
* No persistent full-text article storage

**Rationale:**

* Short excerpts for citation = fair use
* Summaries are transformative (not copyrightable)
* Limited retention (90 days max)
* No commercial republication of excerpts

**DMCA Compliance:**

* Cache invalidation endpoint available for rights holders
* Contact: dmca@factharbor.org
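A minimal sketch of enforcing the excerpt limits above before anything is written to the cache. {{{clamp_excerpts}}} is a hypothetical helper, not part of the API surface; only the numeric limits come from the policy.

```python
MAX_WORDS_PER_QUOTE = 25      # per the copyright policy above
MAX_QUOTES_PER_EVIDENCE = 3

def clamp_excerpts(quotes: list[str]) -> list[str]:
    """Keep at most 3 quotes per evidence item, each truncated to 25 words."""
    clamped = []
    for quote in quotes[:MAX_QUOTES_PER_EVIDENCE]:
        words = quote.split()
        clamped.append(" ".join(words[:MAX_WORDS_PER_QUOTE]))
    return clamped
```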
1048 1048 1049 -**Event-Based (Manual):** 1050 -* Admin can flag claims for invalidation 1051 -* Example: "Major study retracts findings" 1052 -* Tool: {{code}}DELETE /v1/cache/claim/{claim_hash}?reason=retraction{{/code}} 891 +**Full specification includes:** 1053 1053 1054 -**Version-Based (Automatic):** 1055 -* AKEL v2.0 release → Invalidate all v1.0 caches 1056 -* Cache keys include version: {{code}}claim:v1:*{{/code}} vs {{code}}claim:v2:*{{/code}} 893 +* Complete API endpoints (7 total) 894 +* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete) 895 +* Quality gates & validation rules 896 +* LLM configuration for all 3 stages 897 +* Implementation notes with code samples 898 +* Testing strategy 899 +* Cross-references to other pages 1057 1057 1058 -**Long-Lived Historical Claims:** 1059 -* Historical claims about completed events generally have stable verdicts 1060 -* Example: "2024 US presidential election results" 1061 -* **Policy:** Extended TTL (365-3,650 days) instead of "never invalidate" 1062 -* **Reason:** Even historical data gets revisions (updated counts, corrections) 1063 -* **Mechanism:** Admin can still manually invalidate if major correction issued 1064 -* **Flag:** {{code}}is_historical=true{{/code}} in cache metadata → longer TTL 901 +**The complete specification is available in:** 1065 1065 1066 -=== 5.3 Cache Warming Strategy === 1067 - 1068 -**Proactive Cache Building (Future):** 1069 - 1070 -**Trending Topics:** 1071 -* Monitor news APIs for trending topics 1072 -* Pre-analyze top 20 common claims 1073 -* Example: New health study published → Pre-cache related claims 1074 - 1075 -**Predictable Events:** 1076 -* Elections, sporting events, earnings reports 1077 -* Pre-cache expected claims before event 1078 -* Reduces load during traffic spikes 1079 - 1080 -**User Patterns:** 1081 -* Analyze query logs 1082 -* Identify frequently requested claims 1083 -* Prioritize cache warming for these 1084 - 1085 ---- 1086 - 
1087 -== 6. Quality Gates & Validation Rules == 1088 - 1089 -=== 6.1 Quality Gate Overview === 1090 - 1091 -|=Gate|=Name|=POC1 Status|=Applies To|=Notes 1092 -|**Gate 1**|Claim Validation|✅ Hard gate|Stage 1: Extraction|Filters opinions, compound claims 1093 -|**Gate 2**|Contradiction Search|✅ Mandatory rule|Stage 2: Analysis|Enforced per cached claim 1094 -|**Gate 3**|Uncertainty Disclosure|⚠️ Soft guidance|Stage 2: Analysis|Best practice 1095 -|**Gate 4**|Verdict Confidence|✅ Hard gate|Stage 2: Analysis|Confidence ≥ 0.5 required 1096 - 1097 -**Hard Gate Failures:** 1098 -* Gate 1 fail → Claim excluded from analysis 1099 -* Gate 4 fail → Claim marked "Unsubstantiated" but included 1100 - 1101 -=== 6.2 Validation Rules === 1102 - 1103 -|=Rule|=Requirement 1104 -|**Mandatory Contradiction**|Stage 2 MUST search for "undermines" evidence. If none found, reasoning must state: "No counter-evidence found despite targeted search." 1105 -|**Context-Aware Logic**|Stage 3 must prioritize central claims. If {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, article cannot be WELL-SUPPORTED. 1106 -|**Cache Consistency**|Cached claims must match current AKEL version. Version mismatch → cache miss. 1107 -|**Author Identification**|All outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}}. 1108 - 1109 ---- 1110 - 1111 -== 7. Deterministic Markdown Template == 1112 - 1113 -Report generation uses **fixed template** (not LLM-generated). 1114 - 1115 -**Cache-Only Mode Template:** 1116 -{{code language="markdown"}} 1117 -# FactHarbor Analysis Report: PARTIAL ANALYSIS 1118 - 1119 -**Job ID:** {job_id} | **Generated:** {timestamp_utc} 1120 -**Mode:** Cache-Only (Free Tier) 1121 - 1122 ---- 1123 - 1124 -## ⚠️ Partial Analysis Notice 1125 - 1126 -This is a **cache-only analysis** based on previously analyzed claims. 1127 -{cache_coverage_percent}% of claims were available in cache. 
1128 - 1129 -**What's Included:** 1130 -* {claims_cached} of {claims_total} claims analyzed 1131 -* Evidence and verdicts from cache (last updated: {oldest_cache_date}) 1132 - 1133 -**What's Missing:** 1134 -* {claims_missing} claims require new analysis 1135 -* Full article holistic assessment unavailable 1136 -* Estimated cost to complete: ${cost_to_complete} 1137 - 1138 -**[Upgrade to Pro]** for complete analysis 1139 - 1140 ---- 1141 - 1142 -## Cached Claims 1143 - 1144 -### [C1] {claim_text} ✅ From Cache 1145 -* **Cached:** {cached_at} ({cache_age} ago) 1146 -* **Times Used:** {hit_count} articles 1147 -* **Verdict:** {verdict} (Confidence: {confidence}) 1148 -* **Evidence:** {evidence_count} sources 1149 - 1150 -[Full claim details...] 1151 - 1152 -### [C3] {claim_text} ⚠️ Not In Cache 1153 -* **Status:** Requires new analysis 1154 -* **Cost:** $0.081 1155 -* **Upgrade to analyze this claim** 1156 - 1157 ---- 1158 - 1159 -**Powered by FactHarbor POC1-v0.4** | [Upgrade](https://factharbor.org/upgrade) 1160 -{{/code}} 1161 - 1162 ---- 1163 - 1164 -== 8. LLM Configuration (3-Stage) == 1165 - 1166 -=== 8.1 Stage 1: Claim Extraction (Haiku) === 1167 - 1168 -|=Parameter|=Value|=Notes 1169 -|**Model**|{{code}}claude-haiku-4-20250108{{/code}}|Fast, cheap, sufficient for extraction 1170 -|**Input Tokens**|~10K|Article text after URL extraction 1171 -|**Output Tokens**|~500|5 claims @ ~100 tokens each 1172 -|**Cost**|$0.003 per article|($0.25/M input + $1.25/M output) 1173 -|**Temperature**|0.0|Deterministic 1174 -|**Max Tokens**|1000|Generous buffer 1175 - 1176 -**Prompt Strategy:** 1177 -* Extract 5 verifiable factual claims 1178 -* Mark central vs. 
supporting claims 1179 -* Canonicalize (normalize phrasing) 1180 -* Deduplicate similar claims 1181 -* Output structured JSON only 1182 - 1183 -=== 8.2 Stage 2: Claim Analysis (Sonnet, CACHED) === 1184 - 1185 -|=Parameter|=Value|=Notes 1186 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|High quality for verdicts 1187 -|**Input Tokens**|~2K|Single claim + prompt + context 1188 -|**Output Tokens**|~5K|2 scenarios × ~2.5K tokens 1189 -|**Cost**|$0.081 per NEW claim|($3/M input + $15/M output) 1190 -|**Temperature**|0.0|Deterministic (cache consistency) 1191 -|**Max Tokens**|8000|Sufficient for 2 scenarios 1192 -|**Cache Strategy**|Redis, 90-day TTL|Key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}} 1193 - 1194 -**Prompt Strategy:** 1195 -* Generate 2 scenario interpretations 1196 -* Search for supporting AND undermining evidence (mandatory) 1197 -* 6 evidence items per scenario maximum 1198 -* Compute verdict with reasoning chain (3-4 bullets) 1199 -* Output structured JSON only 1200 - 1201 -**Output Constraints (Cost Control):** 1202 -* Scenarios: Max 2 per claim 1203 -* Evidence: Max 6 per scenario 1204 -* Evidence summary: Max 3 bullets 1205 -* Reasoning chain: Max 4 bullets 1206 - 1207 -=== 8.3 Stage 3: Holistic Assessment (Sonnet) === 1208 - 1209 -|=Parameter|=Value|=Notes 1210 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Context-aware analysis 1211 -|**Input Tokens**|~5K|Article + claim verdicts 1212 -|**Output Tokens**|~1K|Article verdict + fallacies 1213 -|**Cost**|$0.030 per article|($3/M input + $15/M output) 1214 -|**Temperature**|0.0|Deterministic 1215 -|**Max Tokens**|2000|Sufficient for assessment 1216 - 1217 -**Prompt Strategy:** 1218 -* Detect main thesis 1219 -* Evaluate logical coherence (claim verdicts → thesis) 1220 -* Identify fallacies (correlation-causation, cherry-picking, etc.) 
1221 -* Compute logic_quality_score 1222 -* Explain article verdict reasoning (3-4 bullets) 1223 -* Output structured JSON only 1224 - 1225 -=== 8.4 Cost Projections by Cache Hit Rate === 1226 - 1227 -|=Cache Hit Rate|=Cost per Article|=10K Articles Cost|=100K Articles Cost 1228 -|0% (cold start)|$0.438|$4,380|$43,800 1229 -|20%|$0.357|$3,570|$35,700 1230 -|40%|$0.276|$2,760|$27,600 1231 -|**60%**|**$0.195**|**$1,950**|**$19,500** 1232 -|**70%** (target)|**$0.155**|**$1,550**|**$15,500** 1233 -|**80%**|**$0.114**|**$1,140**|**$11,400** 1234 -|**90%**|**$0.073**|**$730**|**$7,300** 1235 -|95%|$0.053|$530|$5,300 1236 - 1237 -**Break-Even Analysis:** 1238 -* Monolithic (v0.3.1): $0.15 per article constant 1239 -* 3-stage breaks even at **70% cache hit rate** 1240 -* Expected after ~1,500 articles in same domain 1241 - 1242 ---- 1243 - 1244 -== 9. Implementation Notes == 1245 - 1246 -=== 9.1 Recommended Tech Stack === 1247 - 1248 -* **Framework:** Next.js 14+ with App Router (TypeScript) 1249 -* **Cache:** Redis 7.0+ (managed: AWS ElastiCache, Redis Cloud, Upstash) 1250 -* **Storage:** Filesystem JSON for jobs + S3/R2 for archival 1251 -* **Queue:** BullMQ with Redis (for 3-stage pipeline orchestration) 1252 -* **LLM Client:** Anthropic Python SDK or TypeScript SDK 1253 -* **Cost Tracking:** PostgreSQL for user credit ledger 1254 -* **Deployment:** Vercel (frontend + API) + Redis Cloud 1255 - 1256 -=== 9.2 3-Stage Pipeline Implementation === 1257 - 1258 -**Job Queue Flow (Conceptual):** 1259 - 1260 -{{code language="typescript"}} 1261 -// Stage 1: Extract Claims 1262 -const stage1Job = await queue.add('stage1-extract-claims', { 1263 - jobId: 'job123', 1264 - articleUrl: 'https://example.com/article' 1265 -}); 1266 - 1267 -// On Stage 1 completion → enqueue Stage 2 jobs 1268 -stage1Job.on('completed', async (result) => { 1269 - const { claims } = result; 1270 - 1271 - // Stage 2: Analyze each claim (with cache check) 1272 - const stage2Jobs = await Promise.all( 1273 - 
claims.map(claim => 1274 - queue.add('stage2-analyze-claim', { 1275 - jobId: 'job123', 1276 - claimId: claim.claim_id, 1277 - canonicalClaim: claim.canonical_claim, 1278 - checkCache: true 1279 - }) 1280 - ) 1281 - ); 1282 - 1283 - // On all Stage 2 completions → enqueue Stage 3 1284 - await Promise.all(stage2Jobs.map(j => j.waitUntilFinished())); 1285 - 1286 - const claimVerdicts = await gatherStage2Results('job123'); 1287 - 1288 - await queue.add('stage3-holistic', { 1289 - jobId: 'job123', 1290 - articleUrl: 'https://example.com/article', 1291 - claimVerdicts: claimVerdicts 1292 - }); 1293 -}); 1294 -{{/code}} 1295 - 1296 -**Note:** This is a conceptual sketch. Actual implementation may use BullMQ Flow API or custom orchestration. 1297 - 1298 -**Cache Check Logic:** 1299 -{{code language="typescript"}} 1300 -async function analyzeClaimWithCache(claim: string): Promise<ClaimAnalysis> { 1301 - const canonicalClaim = normalizeClaim(claim); 1302 - const claimHash = sha256(canonicalClaim); 1303 - const cacheKey = `claim:v1:${claimHash}`; 1304 - 1305 - // Check cache 1306 - const cached = await redis.get(cacheKey); 1307 - if (cached) { 1308 - await redis.incr(`claim:stats:hit_count:${claimHash}`); 1309 - return JSON.parse(cached); 1310 - } 1311 - 1312 - // Cache miss - analyze with LLM 1313 - const analysis = await analyzeClaim_Stage2(canonicalClaim); 1314 - 1315 - // Store in cache 1316 - await redis.set(cacheKey, JSON.stringify(analysis), 'EX', 7776000); // 90 days 1317 - 1318 - return analysis; 1319 -} 1320 -{{/code}} 1321 - 1322 -=== 9.3 User Credit Management === 1323 - 1324 -**PostgreSQL Schema:** 1325 -{{code language="sql"}} 1326 -CREATE TABLE user_credits ( 1327 - user_id UUID PRIMARY KEY, 1328 - tier VARCHAR(20) DEFAULT 'free', 1329 - credit_limit DECIMAL(10,2) DEFAULT 10.00, 1330 - credit_used DECIMAL(10,2) DEFAULT 0.00, 1331 - reset_date TIMESTAMP, 1332 - cache_only_mode BOOLEAN DEFAULT false, 1333 - created_at TIMESTAMP DEFAULT NOW() 1334 -); 1335 - 1336 
CREATE TABLE usage_log (
  id SERIAL PRIMARY KEY,
  user_id UUID REFERENCES user_credits(user_id),
  job_id VARCHAR(50),
  stage VARCHAR(20),
  cost DECIMAL(10,4),
  cache_hit BOOLEAN,
  created_at TIMESTAMP DEFAULT NOW()
);
{{/code}}

**Credit Deduction Logic:**
{{code language="typescript"}}
async function deductCredit(userId: string, cost: number): Promise<boolean> {
  // node-postgres returns rows in result.rows; DECIMAL columns arrive as strings
  const { rows: [user] } = await db.query(
    'SELECT * FROM user_credits WHERE user_id = $1',
    [userId]
  );

  const newUsed = Number(user.credit_used) + cost;

  if (newUsed > Number(user.credit_limit) && user.tier === 'free') {
    // Trigger cache-only mode
    await db.query(
      'UPDATE user_credits SET cache_only_mode = true WHERE user_id = $1',
      [userId]
    );
    throw new Error('CREDIT_LIMIT_REACHED');
  }

  // NOTE: a production version would read and update atomically
  // (e.g., a single UPDATE ... RETURNING) to avoid lost updates
  await db.query(
    'UPDATE user_credits SET credit_used = $1 WHERE user_id = $2',
    [newUsed, userId]
  );

  return true;
}
{{/code}}

=== 9.4 Cache-Only Mode Implementation ===

**Middleware:**
{{code language="typescript"}}
async function checkCacheOnlyMode(req, res, next) {
  const user = await getUserCredit(req.userId);

  if (user.cache_only_mode) {
    // Allow only cache reads
    if (req.body.options?.cache_preference !== 'allow_partial') {
      return res.status(402).json({
        error: 'credit_limit_reached',
        message: 'Resubmit with cache_preference=allow_partial',
        cache_only_mode: true
      });
    }

    // Mark the request so Stage 2 is skipped for uncached claims
    req.cacheOnlyMode = true;
  }

  next();
}
{{/code}}

=== 9.5 Estimated Timeline ===

**POC1 with 3-Stage Architecture:**
* Week 1: Stage 1 (Haiku extraction) + Redis setup
* Week 2: Stage 2 (Sonnet analysis + caching)
* Week 3: Stage 3 (holistic assessment) + pipeline orchestration
* Week 4: User credit system + cache-only mode
* Week 5: Testing with 100 articles (measure cache hit rate)
* Week 6: Optimization + bug fixes
* **Total: 6-8 weeks**

**Manual coding:** 12-16 weeks

----

== 10. Testing Strategy ==

=== 10.1 Cache Performance Testing ===

**Test Scenarios:**

**Scenario 1: Cold Start (empty cache)**
* Analyze 100 diverse articles
* Measure: cost per article, cache growth rate
* Expected: $0.35-0.40 avg, ~400 unique claims cached

**Scenario 2: Warm Cache (Overlapping Domain)**
* Analyze 100 articles on the SAME topic (e.g., "2024 election")
* Measure: cache hit rate growth
* Expected: hit rate grows from 20% to 60% by article 100

**Scenario 3: Mature Cache (1,000 articles)**
* Analyze the next 100 articles (diverse topics)
* Measure: steady-state cache hit rate
* Expected: 60-70% hit rate, $0.15-0.18 avg cost

**Scenario 4: Cache-Only Mode**
* Free user reaches the $10 limit (~67 articles at a 70% hit rate)
* Submit 10 more articles with {{code}}cache_preference=allow_partial{{/code}}
* Measure: coverage %, user satisfaction
* Expected: 60-70% coverage, instant results

=== 10.2 Success Metrics ===

**Cache Performance:**
* Week 1: 5-10% hit rate
* Week 2: 15-25% hit rate
* Week 3: 30-40% hit rate
* Week 4: 45-55% hit rate
* Target: ≥50% by 1,000 articles

**Cost Targets:**
* Articles 1-100: $0.35-0.40 avg ⚠️ (expected)
* Articles 100-500: $0.25-0.30 avg
* Articles 500-1,000: $0.18-0.22 avg
* Articles 1,000+: $0.12-0.15 avg ✅

**Quality Metrics (same as v0.3.1):**
* Hallucination rate: <5%
* Context-aware accuracy: ≥70%
* False positive rate: <15%
* Mandatory contradiction search: 100% compliance

=== 10.3 Free Tier Economics Validation ===

**Test with 1,000 simulated users:**
* Each user: $10 credit
* 70% cache hit rate
* Avg 70 articles/user/month

**Projected Costs:**
* Total credits: 1,000 × $10 = $10,000
* Actual LLM costs: ~$9,000 (after cache savings)
* Margin: 10%

**Sustainability Check:**
* If margin <5% → reduce the free tier limit
* If margin >20% → consider increasing the free tier

----

== 11. Cross-References ==

This API specification implements requirements from:

* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
** FR-POC-1 through FR-POC-6 (3-stage architecture)
** NFR-POC-1 through NFR-POC-3 (quality gates, caching)
** NEW: FR-POC-7 (claim-level caching)
** NEW: FR-POC-8 (user credit system)
** NEW: FR-POC-9 (cache-only mode)

* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
** Approach 1 implemented in Stage 3
** Context-aware holistic assessment

* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
** FR4 (Analysis Summary) - enhanced with caching
** FR7 (Verdict Calculation) - cached per claim
** NFR11 (Quality Gates) - enforced across stages
** NEW: NFR19 (Cost Efficiency via Caching)
** NEW: NFR20 (Free Tier Sustainability)

* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
** POC1 3-stage pipeline architecture
** Redis cache layer
** User credit system

* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
** Claim structure (cacheable unit)
** Evidence structure
** Scenario boundaries

----

**End of Specification - FactHarbor POC1 API v0.4**
**3-stage caching architecture with free tier cache-only mode. Ready for sustainable, scalable implementation!** 🚀

**Related files:**
* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
* Export files (TEST/PRODUCTION) for xWiki import