Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Title
... ... @@ -1,1 +1,1 @@ 1 -POC1 API & Schemas Specification 1 +POC1 API & Schemas Specification v0.4.1
- Content
... ... @@ -1,6 +1,6 @@ 1 1 # FactHarbor POC1 — API & Schemas Specification 2 2 3 -**Version:** 0.3 (POC1 - Production Ready) 3 +**Version:** 0.4.1 (POC1 - 3-Stage Caching Architecture) 4 4 **Namespace:** FactHarbor.* 5 5 **Syntax:** xWiki 2.1 6 6 **Last Updated:** 2025-12-24 ... ... @@ -10,15 +10,31 @@ 10 10 == Version History == 11 11 12 12 |=Version|=Date|=Changes 13 +|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy 14 +|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture 15 +|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints, chain-of-thought, evidence citation, Jina safety, gate numbering 13 13 |0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references 14 14 |0.2|2025-12-24|Initial rebased version with holistic assessment 15 15 |0.1|2025-12-24|Original specification 16 16 17 17 --- 21 +--- 18 18 23 +== File Format Notice == 24 + 25 +**⚠️ Important:** This file is stored as {{code}}.md{{/code}} for transport/versioning, but the content is **xWiki 2.1 syntax** (not Markdown). 26 + 27 +**When importing to xWiki:** 28 +* Use "Import as XWiki content" (not "Import as Markdown") 29 +* The xWiki parser will correctly interpret {{code}}=={{/code}} headers, code macro blocks, etc. 30 + 31 +**Alternate naming:** If your workflow supports it, rename to {{code}}.xwiki.txt{{/code}} to avoid ambiguity. 32 + 33 +--- 34 + 19 19 == 1. 
Core Objective (POC1) == 20 20 21 -The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**: 37 +The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability: 22 22 23 23 The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization). 24 24 ... ... @@ -25,69 +25,226 @@ 25 25 **Success Criteria:** 26 26 * Test with 30 diverse articles 27 27 * Target: ≥70% accuracy detecting misleading articles 28 -* Cost: <$0.35 per analysis 44 +* Cost: <$0.25 per NEW analysis (uncached) 45 +* Cost: $0.00 for cached claim reuse 46 +* Cache hit rate: ≥50% after 1,000 articles 29 29 * Processing time: <2 minutes (standard depth) 30 30 49 +**Economic Model:** 50 +* Free tier: $10 credit per month (~40-140 articles depending on cache hits) 51 +* After limit: Cache-only mode (instant, free access to cached claims) 52 +* Paid tier: Unlimited new analyses 53 + 31 31 **See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches. 32 32 33 33 --- 34 34 35 -== 2. Runtime Model & Job States == 58 +== 2. Architecture Overview == 36 36 37 -=== 2.1 Pipeline Steps === 60 +=== 2.1 3-Stage Pipeline with Caching === 38 38 39 -For progress reporting via API, the pipeline follows these stages: 62 +FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency: 40 40 41 -# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization. 42 -# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting. 43 -# **SCENARIOS**: Generating context interpretations for each claim. 
44 -# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search). 45 -# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario. 46 -# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic). 47 -# **REPORT**: Generating final Markdown and JSON outputs. 64 +{{code language="mermaid"}} 65 +graph TD 66 + A[Article Input] --> B[Stage 1: Extract Claims] 67 + B --> C{For Each Claim} 68 + C --> D[Check Cache] 69 + D -->|Cache HIT| E[Return Cached Verdict] 70 + D -->|Cache MISS| F[Stage 2: Analyze Claim] 71 + F --> G[Store in Cache] 72 + G --> E 73 + E --> H[Stage 3: Holistic Assessment] 74 + H --> I[Final Report] 75 +{{/code}} 48 48 49 -=== 2.1.1 URL Extraction Strategy === 77 +**Stage 1: Claim Extraction** (Haiku, no cache) 78 +* Input: Article text 79 +* Output: 5 canonical claims (normalized, deduplicated) 80 +* Model: Claude Haiku 4 81 +* Cost: $0.003 per article 82 +* Cache strategy: No caching (article-specific) 50 50 51 -**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}}) 52 -* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient 53 -* **Fallback:** Trafilatura (Python library) for simple static HTML 84 +**Stage 2: Claim Analysis** (Sonnet, CACHED) 85 +* Input: Single canonical claim 86 +* Output: Scenarios + Evidence + Verdicts 87 +* Model: Claude Sonnet 3.5 88 +* Cost: $0.081 per NEW claim 89 +* Cache strategy: **Redis, 90-day TTL** 90 +* Cache key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}} 54 54 55 -**Error Handling:** 92 +**Stage 3: Holistic Assessment** (Sonnet, no cache) 93 +* Input: Article + Claim verdicts (from cache or Stage 2) 94 +* Output: Article verdict + Fallacies + Logic quality 95 +* Model: Claude Sonnet 3.5 96 +* Cost: $0.030 per article 97 +* Cache strategy: No caching (article-specific) 56 56 57 -|=Error Code|=Trigger|=Action 58 -|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest 
text paste 59 -|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail 60 -|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately 61 -|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason 99 +**Total Cost Formula:** 100 +{{code}} 101 +Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic) 62 62 63 -**Supported URL Patterns:** 64 -* ✅ News articles, blog posts, Wikipedia 65 -* ✅ Academic preprints (arXiv) 66 -* ❌ Social media posts (Twitter, Facebook) - not in POC1 67 -* ❌ Video platforms (YouTube, TikTok) - not in POC1 68 -* ❌ PDF files - deferred to Beta 0 103 +Examples: 104 +- 0 new claims (100% cache hit): $0.033 105 +- 1 new claim (80% cache hit): $0.114 106 +- 3 new claims (40% cache hit): $0.276 107 +- 5 new claims (0% cache hit): $0.438 108 +{{/code}} 69 69 70 -=== 2.2 Job Status Enumeration === 110 +=== 2.2 User Tier System === 71 71 72 -((( 73 -* **QUEUED** - Job accepted, waiting in queue 74 -* **RUNNING** - Processing in progress 75 -* **SUCCEEDED** - Analysis complete, results available 76 -* **FAILED** - Error occurred, see error details 77 -* **CANCELLED** - User cancelled via DELETE endpoint 78 -))) 112 +|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics 113 +|**Free**|$10|Cache-only mode|✅ Full|Basic 114 +|**Pro** (future)|$50|Continues|✅ Full|Advanced 115 +|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full 79 79 117 +**Free Tier Economics:** 118 +* $10 credit = 40-140 articles analyzed (depending on cache hit rate) 119 +* Average 70 articles/month at 70% cache hit rate 120 +* After limit: Cache-only mode (see Section 2.3) 121 + 122 +=== 2.3 Cache-Only Mode (Free Tier Feature) === 123 + 124 +When free users reach their $10 monthly limit, they enter **Cache-Only Mode**: 125 + 126 +**What Cache-Only Mode Provides:** 127 + 128 +✅ **Claim Extraction (Platform-Funded):** 129 +* Stage 1 extraction runs at $0.003 per article 130 +* **Cost: Absorbed by platform** (not 
charged to user credit) 131 +* Rationale: Extraction is necessary to check cache, and cost is negligible 132 +* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse) 133 + 134 +✅ **Instant Access to Cached Claims:** 135 +* Any claim that exists in cache → Full verdict returned 136 +* Cost: $0 (no LLM calls) 137 +* Response time: <100ms 138 + 139 +✅ **Partial Article Analysis:** 140 +* Check each claim against cache 141 +* Return verdicts for ALL cached claims 142 +* For uncached claims: Return {{code}}"status": "cache_miss"{{/code}} 143 + 144 +✅ **Cache Coverage Report:** 145 +* "3 of 5 claims available in cache (60% coverage)" 146 +* Links to cached analyses 147 +* Estimated cost to complete: $0.162 (2 new claims) 148 + 149 +❌ **Not Available in Cache-Only Mode:** 150 +* New claim analysis (Stage 2 LLM calls blocked) 151 +* Full holistic assessment (Stage 3 blocked if any claims missing) 152 + 153 +**User Experience:** 154 +{{code language="json"}} 155 +{ 156 + "status": "cache_only_mode", 157 + "message": "Monthly credit limit reached. 
Showing cached results only.", 158 + "cache_coverage": { 159 + "claims_total": 5, 160 + "claims_cached": 3, 161 + "claims_missing": 2, 162 + "coverage_percent": 60 163 + }, 164 + "cached_claims": [ 165 + {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82}, 166 + {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91}, 167 + {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55} 168 + ], 169 + "missing_claims": [ 170 + {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"}, 171 + {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"} 172 + ], 173 + "upgrade_options": { 174 + "top_up": "$5 for 20-70 more articles", 175 + "pro_tier": "$50/month unlimited" 176 + } 177 +} 178 +{{/code}} 179 + 180 +**Design Rationale:** 181 +* Free users still get value (cached claims often answer their question) 182 +* Demonstrates FactHarbor's value (partial results encourage upgrade) 183 +* Sustainable for platform (no additional cost) 184 +* Fair to all users (everyone contributes to cache) 185 + 80 80 --- 81 81 82 82 == 3. 
REST API Contract == 83 83 84 -=== 3.1 Create Analysis Job === 190 +=== 3.1 User Credit Tracking === 85 85 192 +**Endpoint:** {{code}}GET /v1/user/credit{{/code}} 193 + 194 +**Response:** {{code}}200 OK{{/code}} 195 + 196 +{{code language="json"}} 197 +{ 198 + "user_id": "user_abc123", 199 + "tier": "free", 200 + "credit_limit": 10.00, 201 + "credit_used": 7.42, 202 + "credit_remaining": 2.58, 203 + "reset_date": "2025-02-01T00:00:00Z", 204 + "cache_only_mode": false, 205 + "usage_stats": { 206 + "articles_analyzed": 67, 207 + "claims_from_cache": 189, 208 + "claims_newly_analyzed": 113, 209 + "cache_hit_rate": 0.626 210 + } 211 +} 212 +{{/code}} 213 + 214 +--- 215 + 216 +=== 3.2 Create Analysis Job (3-Stage) === 217 + 86 86 **Endpoint:** {{code}}POST /v1/analyze{{/code}} 87 87 88 -**Request Body Example:** 220 +**Request Body:** 221 + 222 + 223 +**Idempotency Support:** 224 + 225 +To prevent duplicate job creation on network retries, clients SHOULD include: 226 + 227 +{{code language="http"}} 228 +POST /v1/analyze 229 +Idempotency-Key: {client-generated-uuid} 230 +{{/code}} 231 + 232 +OR use the {{code}}client.request_id{{/code}} field: 233 + 89 89 {{code language="json"}} 90 90 { 236 + "input_url": "...", 237 + "client": { 238 + "request_id": "client-uuid-12345", 239 + "source_label": "optional" 240 + } 241 +} 242 +{{/code}} 243 + 244 +**Server Behavior:** 245 +* If {{code}}Idempotency-Key{{/code}} or {{code}}request_id{{/code}} seen before (within 24 hours): 246 + - Return existing job ({{code}}200 OK{{/code}}, not {{code}}202 Accepted{{/code}}) 247 + - Do NOT create duplicate job or charge twice 248 +* Idempotency keys expire after 24 hours (matches job retention) 249 + 250 +**Example Response (Idempotent):** 251 +{{code language="json"}} 252 +{ 253 + "job_id": "01J...ULID", 254 + "status": "RUNNING", 255 + "idempotent": true, 256 + "original_request_at": "2025-12-24T10:31:00Z", 257 + "message": "Returning existing job (idempotency key matched)" 258 +} 259 
+{{/code}} 260 + 261 + 262 +{{code language="json"}} 263 +{ 91 91 "input_type": "url", 92 92 "input_url": "https://example.com/medical-report-01", 93 93 "input_text": null, ... ... @@ -95,7 +95,8 @@ 95 95 "browsing": "on", 96 96 "depth": "standard", 97 97 "max_claims": 5, 98 - "context_aware_analysis": true 271 + "context_aware_analysis": true, 272 + "cache_preference": "prefer_cache" 99 99 }, 100 100 "client": { 101 101 "request_id": "optional-client-tracking-id", ... ... @@ -105,10 +105,10 @@ 105 105 {{/code}} 106 106 107 107 **Options:** 108 -* {{code}}browsing{{/code}}: {{code}}on{{/code}} | {{code}}off{{/code}} (retrieve web sources or just output queries) 109 -* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness) 110 -* {{code}}max_claims{{/code}}: 1-50 (default: 10) 111 -* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental) 282 +* {{code}}cache_preference{{/code}}: {{code}}prefer_cache{{/code}} | {{code}}require_fresh{{/code}} | {{code}}allow_partial{{/code}} 283 + - {{code}}prefer_cache{{/code}}: Use cache when available, analyze new claims (default) 284 + - {{code}}require_fresh{{/code}}: Force re-analysis of all claims (ignores cache, costs more) 285 + - {{code}}allow_partial{{/code}}: Return partial results if some claims uncached (for free tier cache-only mode) 112 112 113 113 **Response:** {{code}}202 Accepted{{/code}} 114 114 ... ... 
@@ -117,6 +117,18 @@ 117 117 "job_id": "01J...ULID", 118 118 "status": "QUEUED", 119 119 "created_at": "2025-12-24T10:31:00Z", 294 + "estimated_cost": 0.114, 295 + "cost_breakdown": { 296 + "stage1_extraction": 0.003, 297 + "stage2_new_claims": 0.081, 298 + "stage2_cached_claims": 0.000, 299 + "stage3_holistic": 0.030 300 + }, 301 + "cache_info": { 302 + "claims_to_extract": 5, 303 + "estimated_cache_hits": 4, 304 + "estimated_new_claims": 1 305 + }, 120 120 "links": { 121 121 "self": "/v1/jobs/01J...ULID", 122 122 "result": "/v1/jobs/01J...ULID/result", ... ... @@ -126,9 +126,23 @@ 126 126 } 127 127 {{/code}} 128 128 315 +**Error Responses:** 316 + 317 +{{code}}402 Payment Required{{/code}} - Free tier limit reached, cache-only mode 318 +{{code language="json"}} 319 +{ 320 + "error": "credit_limit_reached", 321 + "message": "Monthly credit limit reached. Entering cache-only mode.", 322 + "cache_only_mode": true, 323 + "credit_remaining": 0.00, 324 + "reset_date": "2025-02-01T00:00:00Z", 325 + "action": "Resubmit with cache_preference=allow_partial for cached results" 326 +} 327 +{{/code}} 328 + 129 129 --- 130 130 131 -=== 3.2 Get Job Status === 331 +=== 3.3 Get Job Status === 132 132 133 133 **Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}} 134 134 ... ... 
@@ -141,12 +141,20 @@ 141 141 "created_at": "2025-12-24T10:31:00Z", 142 142 "updated_at": "2025-12-24T10:31:22Z", 143 143 "progress": { 144 - "step": "RETRIEVAL", 145 - "percent": 60, 146 - "message": "Gathering evidence for C2-S1", 147 - "current_claim_id": "C2", 148 - "current_scenario_id": "C2-S1" 344 + "stage": "stage2_claim_analysis", 345 + "percent": 65, 346 + "message": "Analyzing claim 3 of 5 (2 from cache)", 347 + "current_claim_id": "C3", 348 + "cache_hits": 2, 349 + "cache_misses": 1 149 149 }, 351 + "actual_cost": 0.084, 352 + "cost_breakdown": { 353 + "stage1_extraction": 0.003, 354 + "stage2_new_claims": 0.081, 355 + "stage2_cached_claims": 0.000, 356 + "stage3_holistic": null 357 + }, 150 150 "input_echo": { 151 151 "input_type": "url", 152 152 "input_url": "https://example.com/medical-report-01" ... ... @@ -162,12 +162,61 @@ 162 162 163 163 --- 164 164 165 -=== 3.3 Get JSON Result === 373 +=== 3.4 Get Analysis Result === 166 166 167 167 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}} 168 168 169 -**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4) 377 +**Response:** {{code}}200 OK{{/code}} 170 170 379 +Returns complete **AnalysisResult** schema (see Section 4). 380 + 381 +**Cache-Only Mode Response:** {{code}}206 Partial Content{{/code}} 382 + 383 +{{code language="json"}} 384 +{ 385 + "cache_only_mode": true, 386 + "cache_coverage": { 387 + "claims_total": 5, 388 + "claims_cached": 3, 389 + "claims_missing": 2, 390 + "coverage_percent": 60 391 + }, 392 + "partial_result": { 393 + "metadata": { 394 + "job_id": "01J...ULID", 395 + "timestamp_utc": "2025-12-24T10:31:30Z", 396 + "engine_version": "POC1-v0.4", 397 + "cache_only": true 398 + }, 399 + "claims": [ 400 + { 401 + "claim_id": "C1", 402 + "claim_text": "...", 403 + "canonical_claim": "...", 404 + "source": "cache", 405 + "cached_at": "2025-12-20T15:30:00Z", 406 + "cache_hit_count": 47, 407 + "scenarios": [...] 
408 + }, 409 + { 410 + "claim_id": "C3", 411 + "claim_text": "...", 412 + "canonical_claim": "...", 413 + "source": "not_analyzed", 414 + "status": "cache_miss", 415 + "estimated_cost": 0.081 416 + } 417 + ], 418 + "article_holistic_assessment": null, 419 + "upgrade_prompt": { 420 + "message": "Upgrade to Pro for full analysis of all claims", 421 + "missing_claims": 2, 422 + "cost_to_complete": 0.192 423 + } 424 + } 425 +} 426 +{{/code}} 427 + 171 171 **Other Responses:** 172 172 * {{code}}409 Conflict{{/code}} - Job not finished yet 173 173 * {{code}}404 Not Found{{/code}} - Job ID unknown ... ... @@ -174,8 +174,29 @@ 174 174 175 175 --- 176 176 177 -=== 3.4 Download Markdown Report === 434 +=== 3.5 Stage-Specific Endpoints (Optional, Advanced) === 178 178 436 +For direct stage access (useful for cache debugging, custom workflows): 437 + 438 +**Extract Claims Only:** 439 +{{code}}POST /v1/analyze/extract-claims{{/code}} 440 + 441 +**Analyze Single Claim:** 442 +{{code}}POST /v1/analyze/claim{{/code}} 443 + 444 +**Assess Article (with claim verdicts):** 445 +{{code}}POST /v1/analyze/assess-article{{/code}} 446 + 447 +**Check Claim Cache:** 448 +{{code}}GET /v1/cache/claim/{claim_hash}{{/code}} 449 + 450 +**Cache Statistics:** 451 +{{code}}GET /v1/cache/stats{{/code}} 452 + 453 +--- 454 + 455 +=== 3.6 Download Markdown Report === 456 + 179 179 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}} 180 180 181 181 **Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content ... ... @@ -183,13 +183,11 @@ 183 183 **Headers:** 184 184 * {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}} 185 185 186 -**Other Responses:** 187 -* {{code}}409 Conflict{{/code}} - Job not finished 188 -* {{code}}404 Not Found{{/code}} - Job unknown 464 +**Cache-Only Mode:** Report includes "Partial Analysis" watermark and upgrade prompt. 189 189 190 190 --- 191 191 
192 -=== 3.5 Stream Job Events (Optional, Recommended) === 468 +=== 3.7 Stream Job Events (Backend Progress) === 193 193 194 194 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}} 195 195 ... ... @@ -196,478 +196,1044 @@ 196 196 **Response:** Server-Sent Events (SSE) stream 197 197 198 198 **Event Types:** 199 -* {{code}}progress{{/code}} - Progress update 200 -* {{code}}claim_extracted{{/code}} - Claim identified 201 -* {{code}}verdict_computed{{/code}} - Scenario verdict complete 475 +* {{code}}progress{{/code}} - Backend progress (e.g., "Stage 1: Extracting claims") 476 +* {{code}}cache_hit{{/code}} - Claim found in cache 477 +* {{code}}cache_miss{{/code}} - Claim requires new analysis 478 +* {{code}}stage_complete{{/code}} - Stage 1/2/3 finished 202 202 * {{code}}complete{{/code}} - Job finished 203 203 * {{code}}error{{/code}} - Error occurred 481 +* {{code}}credit_warning{{/code}} - User approaching limit 204 204 205 205 --- 206 206 207 -=== 3.6 Cancel Job === 485 +=== 3.8 Cancel Job === 208 208 209 209 **Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}} 210 210 211 -Attempts to cancel a queued or running job. 489 +**Note:** If job is mid-stage (e.g., analyzing claim 3 of 5), user is charged for completed work only. 212 212 213 -**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED) 214 - 215 -**Note:** Already-completed jobs cannot be cancelled. 216 - 217 217 --- 218 218 
219 -=== 3.7 Health Check === 493 +=== 3.9 Health Check === 220 220 221 221 **Endpoint:** {{code}}GET /v1/health{{/code}} 222 222 223 -**Response:** {{code}}200 OK{{/code}} 224 - 225 225 {{code language="json"}} 226 226 { 227 227 "status": "ok", 228 - "version": "POC1-v0.3", 229 - "model": "claude-3-5-sonnet-20241022" 500 + "version": "POC1-v0.4", 501 + "model_stage1": "claude-haiku-4", 502 + "model_stage2": "claude-3-5-sonnet-20241022", 503 + "model_stage3": "claude-3-5-sonnet-20241022", 504 + "cache": { 505 + "status": "connected", 506 + "total_claims": 12847, 507 + "avg_hit_rate_24h": 0.73 508 + } 230 230 } 231 231 {{/code}} 232 232 233 233 --- 234 234 235 -== 4. AnalysisResult Schema (Context-Aware) == 514 +== 4. Data Schemas == 236 236 237 -This schema implements the **Context-Aware Analysis** required by the POC1 specification. 516 +=== 4.1 Stage 1 Output: ClaimExtraction === 238 238 239 239 {{code language="json"}} 240 240 { 520 + "job_id": "01J...ULID", 521 + "stage": "stage1_extraction", 522 + "article_metadata": { 523 + "title": "Article title", 524 + "source_url": "https://example.com/article", 525 + "extracted_text_length": 5234, 526 + "language": "en" 527 + }, 528 + "claims": [ 529 + { 530 + "claim_id": "C1", 531 + "claim_text": "Original claim text from article", 532 + "canonical_claim": "Normalized, deduplicated phrasing", 533 + "claim_hash": "sha256:abc123...", 534 + "is_central_to_thesis": true, 535 + "claim_type": "causal", 536 + "evaluability": "evaluable", 537 + "risk_tier": "B", 538 + "domain": "public_health" 539 + } 540 + ], 541 + "article_thesis": "Main argument detected", 542 + "cost": 0.003 543 +} 544 +{{/code}} 545 + 546 +=== 4.2 Stage 2 Output: ClaimAnalysis (CACHED) === 547 + 548 +This is the CACHEABLE unit. Stored in Redis with 90-day TTL. 
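As a rough illustration of how a Stage 2 entry might be keyed, the sketch below builds a key in the {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}} format given in this specification and derives the 90-day TTL in seconds. It is a sketch under stated assumptions, not the service's implementation: the names {{code}}claim_cache_key{{/code}} and {{code}}CACHE_TTL_SECONDS{{/code}} are hypothetical, the claim is assumed to be canonicalized already, and rendering the digest as lowercase hex is an assumption. No Redis call is shown.

```python
import hashlib

# 90-day TTL from the spec, expressed in seconds (90 * 86,400 = 7,776,000).
CACHE_TTL_SECONDS = 90 * 24 * 60 * 60


def claim_cache_key(canonical_claim: str, language: str) -> str:
    """Build a cache key for one Stage 2 ClaimAnalysis entry.

    Hypothetical helper. Assumes `canonical_claim` has already been
    normalized upstream, and that the SHA-256 digest is written as
    lowercase hex (an assumption; the spec only says sha256(...)).
    """
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:v1norm1:{language}:{digest}"


if __name__ == "__main__":
    # The same canonical claim in two languages yields two distinct keys.
    print(claim_cache_key("COVID vaccines are 95% effective", "en"))
    print(claim_cache_key("COVID vaccines are 95% effective", "de"))
    print(CACHE_TTL_SECONDS)
```

Keeping the language code outside the hash means entries for different languages never collide, even when the canonical text happens to be identical.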
549 + 550 +{{code language="json"}} 551 +{ 552 + "claim_hash": "sha256:abc123...", 553 + "canonical_claim": "COVID vaccines are 95% effective", 554 + "language": "en", 555 + "domain": "public_health", 556 + "analysis_version": "v1.0", 557 + "scenarios": [ 558 + { 559 + "scenario_id": "S1", 560 + "scenario_title": "mRNA vaccines (Pfizer/Moderna) in clinical trials", 561 + "definitions": {"95% effective": "95% reduction in symptomatic infection"}, 562 + "assumptions": ["Based on phase 3 trial data", "Against original strain"], 563 + "boundaries": { 564 + "time": "2020-2021 trials", 565 + "geography": "Multi-country trials", 566 + "population": "Adult population (16+)", 567 + "conditions": "Before widespread variants" 568 + }, 569 + "verdict": { 570 + "label": "Highly Likely", 571 + "probability_range": [0.88, 0.97], 572 + "confidence": 0.92, 573 + "reasoning_chain": [ 574 + "Pfizer/BioNTech trial: 95% efficacy (n=43,548)", 575 + "Moderna trial: 94.1% efficacy (n=30,420)", 576 + "Peer-reviewed publications in NEJM", 577 + "FDA independent analysis confirmed" 578 + ], 579 + "key_supporting_evidence_ids": ["E1", "E2"], 580 + "key_counter_evidence_ids": ["E3"], 581 + "uncertainty_factors": [ 582 + "Limited data on long-term effectiveness", 583 + "Variant-specific performance not yet measured" 584 + ] 585 + }, 586 + "evidence": [ 587 + { 588 + "evidence_id": "E1", 589 + "stance": "supports", 590 + "relevance_to_scenario": 0.98, 591 + "evidence_summary": [ 592 + "Pfizer trial showed 170 cases in placebo vs 8 in vaccine group", 593 + "Follow-up period median 2 months post-dose 2", 594 + "Efficacy consistent across age, sex, race, ethnicity" 595 + ], 596 + "citation": { 597 + "title": "Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine", 598 + "author_or_org": "Polack et al.", 599 + "publication_date": "2020-12-31", 600 + "url": "https://nejm.org/doi/full/10.1056/NEJMoa2034577", 601 + "publisher": "New England Journal of Medicine", 602 + "retrieved_at_utc": 
"2025-12-20T15:30:00Z" 603 + }, 604 + "excerpt": ["The vaccine was 95% effective in preventing Covid-19"], 605 + "excerpt_word_count": 9, 606 + "source_reliability_score": 0.95, 607 + "reliability_justification": "Peer-reviewed, high-impact journal, large RCT", 608 + "limitations_and_reservations": [ 609 + "Short follow-up period (2 months)", 610 + "Primarily measures symptomatic infection, not transmission" 611 + ], 612 + "retraction_or_dispute_signal": "none" 613 + } 614 + ] 615 + } 616 + ], 617 + "cache_metadata": { 618 + "first_analyzed": "2025-12-01T10:00:00Z", 619 + "last_updated": "2025-12-20T15:30:00Z", 620 + "hit_count": 47, 621 + "version": "v1.0", 622 + "ttl_expires": "2026-03-20T15:30:00Z" 623 + }, 624 + "cost": 0.081 625 +} 626 +{{/code}} 627 + 628 +**Cache Key Structure:** 629 +{{code}} 630 +Redis Key: claim:v1norm1:{language}:{sha256(canonical_claim)} 631 +TTL: 90 days (7,776,000 seconds) 632 +Size: ~15KB JSON (compressed: ~5KB) 633 +{{/code}} 634 + 635 +=== 4.3 Stage 3 Output: HolisticAssessment === 636 + 637 +{{code language="json"}} 638 +{ 639 + "job_id": "01J...ULID", 640 + "stage": "stage3_holistic", 641 + "article_metadata": { 642 + "title": "...", 643 + "main_thesis": "...", 644 + "source_url": "..." 
645 + }, 646 + "article_holistic_assessment": { 647 + "overall_verdict": "MISLEADING", 648 + "logic_quality_score": 0.42, 649 + "fallacies_detected": [ 650 + "correlation-causation", 651 + "cherry-picking" 652 + ], 653 + "verdict_reasoning": [ 654 + "Central claim C1 is REFUTED by multiple systematic reviews", 655 + "Supporting claims C2-C4 are TRUE but do not support the thesis", 656 + "Article commits correlation-causation fallacy", 657 + "Selective citation of evidence (cherry-picking detected)" 658 + ], 659 + "experimental_feature": true 660 + }, 661 + "claims_summary": [ 662 + { 663 + "claim_id": "C1", 664 + "is_central_to_thesis": true, 665 + "verdict": "Refuted", 666 + "confidence": 0.89, 667 + "source": "cache", 668 + "cache_hit": true 669 + }, 670 + { 671 + "claim_id": "C2", 672 + "is_central_to_thesis": false, 673 + "verdict": "Highly Likely", 674 + "confidence": 0.91, 675 + "source": "new_analysis", 676 + "cache_hit": false 677 + } 678 + ], 679 + "quality_gates": { 680 + "gate1_claim_validation": "pass", 681 + "gate4_verdict_confidence": "pass", 682 + "passed_all": true 683 + }, 684 + "cost": 0.030, 685 + "total_job_cost": 0.114 686 +} 687 +{{/code}} 688 + 689 +=== 4.4 Complete AnalysisResult (All 3 Stages Combined) === 690 + 691 +{{code language="json"}} 692 +{ 241 241 "metadata": { 242 - "job_id": "string (ULID)", 243 - "timestamp_utc": "ISO8601", 244 - "engine_version": "POC1-v0.3", 245 - "llm_provider": "anthropic", 246 - "llm_model": "claude-3-5-sonnet-20241022", 694 + "job_id": "01J...ULID", 695 + "timestamp_utc": "2025-12-24T10:31:30Z", 696 + "engine_version": "POC1-v0.4", 697 + "llm_stage1": "claude-haiku-4", 698 + "llm_stage2": "claude-3-5-sonnet-20241022", 699 + "llm_stage3": "claude-3-5-sonnet-20241022", 247 247 "usage_stats": { 248 - "input_tokens": "integer", 249 - "output_tokens": "integer", 250 - "estimated_cost_usd": "float", 251 - "response_time_sec": "float" 701 + "stage1_tokens": {"input": 10000, "output": 500}, 702 + "stage2_tokens": 
{"input": 2000, "output": 5000}, 703 + "stage3_tokens": {"input": 5000, "output": 1000}, 704 + "total_input_tokens": 17000, 705 + "total_output_tokens": 6500, 706 + "estimated_cost_usd": 0.114, 707 + "response_time_sec": 45.2 708 + }, 709 + "cache_stats": { 710 + "claims_total": 5, 711 + "claims_from_cache": 4, 712 + "claims_new_analysis": 1, 713 + "cache_hit_rate": 0.80, 714 + "cache_savings_usd": 0.324 252 252 } 253 253 }, 254 254 "article_holistic_assessment": { 255 - "main_thesis": "string (The core argument detected)", 256 - "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN", 257 - "logic_quality_score": "float (0-1)", 258 - "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"], 259 - "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)", 718 + "main_thesis": "...", 719 + "overall_verdict": "MISLEADING", 720 + "logic_quality_score": 0.42, 721 + "fallacies_detected": ["correlation-causation", "cherry-picking"], 722 + "verdict_reasoning": ["...", "...", "..."], 260 260 "experimental_feature": true 261 261 }, 262 262 "claims": [ 263 263 { 264 264 "claim_id": "C1", 265 - "is_central_to_thesis": "boolean", 266 - "claim_text": "string", 267 - "canonical_form": "string", 268 - "claim_type": "descriptive | causal | predictive | normative | definitional", 269 - "evaluability": "evaluable | partly_evaluable | not_evaluable", 270 - "risk_tier": "A | B | C", 271 - "risk_tier_justification": "string", 272 - "domain": "string (e.g., 'public health', 'economics')", 273 - "key_terms": ["term1", "term2"], 274 - "entities": ["Person X", "Org Y"], 275 - "time_scope_detected": "2020-2024", 276 - "geography_scope_detected": "Brazil", 277 - "scenarios": [ 278 - { 279 - "scenario_id": "C1-S1", 280 - "context_title": "string", 281 - "definitions": {"key_term": "definition"}, 282 - "assumptions": ["Assumption 1", "Assumption 2"], 283 - "boundaries": { 284 - "time": "as of 2025-01", 285 - 
"geography": "Brazil", 286 - "population": "adult population", 287 - "conditions": "excludes X; includes Y" 288 - }, 289 - "scope_of_evidence": "What counts as evidence for this scenario", 290 - "scenario_questions": ["Question that decides the verdict"], 291 - "verdict": { 292 - "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated", 293 - "probability_range": [0.0, 1.0], 294 - "confidence": "float (0-1)", 295 - "reasoning": "string", 296 - "key_supporting_evidence_ids": ["E1", "E3"], 297 - "key_counter_evidence_ids": ["E2"], 298 - "uncertainty_factors": ["Data gap", "Method disagreement"], 299 - "what_would_change_my_mind": ["Specific new study", "Updated dataset"] 300 - }, 301 - "evidence": [ 302 - { 303 - "evidence_id": "E1", 304 - "stance": "supports | undermines | mixed | context_dependent", 305 - "relevance_to_scenario": "float (0-1)", 306 - "evidence_summary": ["Bullet fact 1", "Bullet fact 2"], 307 - "citation": { 308 - "title": "Source title", 309 - "author_or_org": "Org/Author", 310 - "publication_date": "2024-05-01", 311 - "url": "https://source.example", 312 - "publisher": "Publisher/Domain" 313 - }, 314 - "excerpt": ["Short quote ≤25 words (optional)"], 315 - "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT", 316 - "reliability_justification": "Why high/medium/low", 317 - "limitations_and_reservations": ["Limitation 1", "Limitation 2"], 318 - "retraction_or_dispute_signal": "none | correction | retraction | disputed", 319 - "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED" 320 - } 321 - ] 322 - } 323 - ] 728 + "is_central_to_thesis": true, 729 + "claim_text": "...", 730 + "canonical_claim": "...", 731 + "claim_hash": "sha256:abc123...", 732 + "claim_type": "causal", 733 + "evaluability": "evaluable", 734 + "risk_tier": "B", 735 + "source": "cache", 736 + "cached_at": "2025-12-20T15:30:00Z", 737 + "cache_hit_count": 47, 738 + "scenarios": [...] 
739 + }, 740 + { 741 + "claim_id": "C2", 742 + "source": "new_analysis", 743 + "analyzed_at": "2025-12-24T10:31:15Z", 744 + "scenarios": [...] 324 324 } 325 325 ], 326 326 "quality_gates": { 327 - "gate1_claim_validation": "pass | fail", 328 - "gate4_verdict_confidence": "pass | fail", 329 - "passed_all": "boolean", 330 - "gate_fail_reasons": [ 331 - { 332 - "gate": "gate1_claim_validation", 333 - "claim_id": "C1", 334 - "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE", 335 - "explanation": "Human-readable explanation" 336 - } 337 - ] 338 - }, 339 - "global_notes": { 340 - "limitations": ["System limitation 1", "Limitation 2"], 341 - "safety_or_policy_notes": ["Note 1"] 748 + "gate1_claim_validation": "pass", 749 + "gate4_verdict_confidence": "pass", 750 + "passed_all": true 342 342 } 343 343 } 344 344 {{/code}} 345 345 346 -=== 4.1 Risk Tier Definitions === 347 347 348 -|=Tier|=Impact|=Examples|=Actions 349 -|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required) 350 -|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated) 351 -|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated) 352 352 353 -=== 4. 
2 Source Reliability (Read-Only Snapshots) === 757 +=== 4.5 Verdict Label Taxonomy === 354 354 355 -**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job. 759 +FactHarbor uses **three distinct verdict taxonomies** depending on analysis level: 356 356 357 -* POC1 treats these scores as **read-only** (no modification during analysis) 358 -* **Prevents circular dependency:** scoring → affects retrieval → affects scoring 359 -* Full Source Track Record System is a **separate service** (not part of POC1) 360 -* **Temporal separation:** Scoring runs weekly; analysis uses snapshots 761 +==== 4.5.1 Scenario Verdict Labels (Stage 2) ==== 361 361 362 -**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm. 763 +Used for individual scenario verdicts within a claim. 363 363 364 -=== 4.3 Quality Gate Reason Codes === 765 +**Enum Values:** 766 +* {{code}}Highly Likely{{/code}} - Probability 0.85-1.0, high confidence 767 +* {{code}}Likely{{/code}} - Probability 0.65-0.84, moderate-high confidence 768 +* {{code}}Unclear{{/code}} - Probability 0.35-0.64, or low confidence 769 +* {{code}}Unlikely{{/code}} - Probability 0.16-0.34, moderate-high confidence 770 +* {{code}}Highly Unlikely{{/code}} - Probability 0.0-0.15, high confidence 771 +* {{code}}Unsubstantiated{{/code}} - Insufficient evidence to determine probability 365 365 366 -**Gate 1 (Claim Validation):** 367 -* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor 368 -* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement 369 -* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact 370 -* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation 773 +==== 4.5.2 Claim Verdict Labels (Rollup) ==== 371 371 372 -**Gate 4 (Verdict Confidence):** 373 -* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5) 
374 -* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict 375 -* {{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution 376 -* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed 775 +Used when summarizing a claim across all scenarios. 377 377 378 -**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve) 777 +**Enum Values:** 778 +* {{code}}Supported{{/code}} - Majority of scenarios are Likely or Highly Likely 779 +* {{code}}Refuted{{/code}} - Majority of scenarios are Unlikely or Highly Unlikely 780 +* {{code}}Inconclusive{{/code}} - Mixed scenarios or majority Unclear/Unsubstantiated 379 379 380 ---- 782 +**Mapping Logic:** 783 +* If ≥60% scenarios are (Highly Likely | Likely) → Supported 784 +* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted 785 +* Otherwise → Inconclusive 381 381 382 -== 5. Validation Rules(POC1 Enforcement) ==787 +==== 4.5.3 Article Verdict Labels (Stage 3) ==== 383 383 384 -|=Rule|=Requirement 385 -|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}. 386 -|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging. 387 -|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content. 388 -|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. 
If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions. 789 +Used for holistic article-level assessment. 389 389 390 ---- 791 +**Enum Values:** 792 +* {{code}}WELL-SUPPORTED{{/code}} - Article thesis logically follows from supported claims 793 +* {{code}}MISLEADING{{/code}} - Claims may be true but article commits logical fallacies 794 +* {{code}}REFUTED{{/code}} - Central claims are refuted, invalidating thesis 795 +* {{code}}UNCERTAIN{{/code}} - Insufficient evidence or highly mixed claim verdicts 391 391 392 -== 6. Deterministic Markdown Template == 797 +**Note:** Article verdict considers **claim centrality** (central claims override supporting claims). 393 393 394 -The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM). 799 +==== 4.5.4 API Field Mapping ==== 395 395 396 -{{code language="markdown"}} 397 -# FactHarbor Analysis Report: {overall_verdict} 801 +|=Level|=API Field|=Enum Name 802 +|Scenario|{{code}}scenarios[].verdict.label{{/code}}|scenario_verdict_label 803 +|Claim|{{code}}claims[].rollup_verdict{{/code}} (optional)|claim_verdict_label 804 +|Article|{{code}}article_holistic_assessment.overall_verdict{{/code}}|article_verdict_label 398 398 399 -**Job ID:** {job_id} | **Generated:** {timestamp_utc} 400 -**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s 401 401 402 402 --- 403 403 404 -## 1. Holistic Assessment (Experimental) 809 +== 5. 
Cache Architecture == 405 405 406 -**Main Thesis:** {main_thesis} 811 +=== 5.1 Redis Cache Design === 407 407 408 -**Overall Verdict:** {overall_verdict} 813 +**Technology:** Redis 7.0+ (in-memory key-value store) 409 409 410 -**Logic Quality Score:** {logic_quality_score}/1.0 815 +**Cache Key Schema:** 816 +{{code}} 817 +claim:v1norm1:{language}:{sha256(canonical_claim)} 818 +{{/code}} 411 411 412 -**Fallacies Detected:** {fallacies_detected} 820 +**Example:** 821 +{{code}} 822 +Claim (English): "COVID vaccines are 95% effective" 823 +Canonical: "covid vaccines are 95 percent effective" 824 +Language: "en" 825 +SHA256: abc123...def456 826 +Key: claim:v1norm1:en:abc123...def456 827 +{{/code}} 413 413 414 -**Reasoning:** {verdict_reasoning} 829 +**Rationale:** Prevents cross-language collisions and enables per-language cache analytics. 415 415 416 ---- 831 +**Data Structure:** 832 +{{code language="redis"}} 833 +SET claim:v1:abc123...def456 '{...ClaimAnalysis JSON...}' 834 +EXPIRE claim:v1:abc123...def456 7776000 # 90 days 835 +{{/code}} 417 417 418 -## 2. Key Claims Analysis 837 +**Additional Keys:** 838 +{{code}} 419 419 420 -### [C1] {claim_text} 421 -* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"} 422 -* **Risk Tier:** {risk_tier} ({risk_tier_justification}) 423 -* **Evaluability:** {evaluability} 840 +==== 5.1.1 Canonical Claim Normalization (v1) ==== 424 424 425 -**Scenarios Explored:** {scenarios.length} 842 +The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly. 
426 426 427 -#### Scenario: {scenario.context_title} 428 -* **Verdict:** {verdict.label} (Confidence: {verdict.confidence}) 429 -* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]} 430 -* **Reasoning:** {verdict.reasoning} 844 +**Algorithm: Canonical Claim Normalization v1** 431 431 432 -**Evidence:** 433 -* Supporting: {evidence.filter(e => e.stance == "supports").length} sources 434 -* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources 435 -* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources 846 +{{code language="python"}} 847 +def normalize_claim_v1(claim_text: str, language: str) -> str: 848 + """ 849 + Normalizes claim to canonical form for cache key generation. 850 + Version: v1norm1 (POC1) 851 + """ 852 + import re 853 + import unicodedata 854 + 855 + # Step 1: Unicode normalization (NFC) 856 + text = unicodedata.normalize('NFC', claim_text) 857 + 858 + # Step 2: Lowercase 859 + text = text.lower() 860 + 861 + # Step 3: Remove punctuation (except hyphens in words) 862 + text = re.sub(r'[^\w\s-]', '', text) 863 + 864 + # Step 4: Normalize whitespace (collapse multiple spaces) 865 + text = re.sub(r'\s+', ' ', text).strip() 866 + 867 + # Step 5: Numeric normalization 868 + text = text.replace('%', ' percent') 869 + # Spell out single-digit numbers 870 + num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three', 871 + '4':'four', '5':'five', '6':'six', '7':'seven', 872 + '8':'eight', '9':'nine'} 873 + for num, word in num_to_word.items(): 874 + text = re.sub(rf'\b{num}\b', word, text) 875 + 876 + # Step 6: Common abbreviations (English only in v1) 877 + if language == 'en': 878 + text = text.replace('covid-19', 'covid') 879 + text = text.replace('u.s.', 'us') 880 + text = text.replace('u.k.', 'uk') 881 + 882 + # Step 7: NO entity normalization in v1 883 + # (Trump vs Donald Trump vs President Trump remain distinct) 884 + 885 + return text 436 436 437 -**Key Evidence:** 438 -* 
[{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance} 887 +# Version identifier (include in cache namespace) 888 +CANONICALIZER_VERSION = "v1norm1" 889 +{{/code}} 439 439 440 - ---891 +**Cache Key Formula (Updated):** 441 441 442 -## 3. Quality Assessment 893 +{{code}} 894 +language = "en" 895 +canonical = normalize_claim_v1(claim_text, language) 896 +cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}" 443 443 444 -**Quality Gates:** 445 -* Gate 1 (Claim Validation): {gate1_claim_validation} 446 -* Gate 4 (Verdict Confidence): {gate4_verdict_confidence} 447 -* Overall: {passed_all ? "PASS" : "FAIL"} 898 +Example: 899 + claim: "COVID-19 vaccines are 95% effective" 900 + canonical: "covid vaccines are 95 percent effective" 901 + sha256: abc123...def456 902 + key: "claim:v1norm1:en:abc123...def456" 903 +{{/code}} 448 448 449 -{if gate_fail_reasons.length > 0} 450 -**Failed Gates:** 451 -{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)} 452 -{/if} 905 +**Cache Metadata MUST Include:** 453 453 907 +{{code language="json"}} 908 +{ 909 + "canonical_claim": "covid vaccines are 95 percent effective", 910 + "canonicalizer_version": "v1norm1", 911 + "language": "en", 912 + "original_claim_samples": ["COVID-19 vaccines are 95% effective"] 913 +} 914 +{{/code}} 915 + 916 +**Version Upgrade Path:** 917 +* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL 918 +* v1normN → v2norm1: Major version bump, invalidate all v1 caches 919 + 920 + 921 +claim:stats:hit_count:{claim_hash} # Counter 922 +claim:index:domain:{domain} # Set of claim hashes by domain 923 +claim:index:language:{lang} # Set of claim hashes by language 924 +{{/code}} 925 + 926 + 927 +=== 5.1.1 Canonical Claim Normalization (v1) === 928 + 929 +The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly. 
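A point worth checking when implementing the v1 steps of this section in their listed order: the punctuation filter in Step 3 also removes "%" and ".", so the later "%" and "u.s."/"u.k." replacements would have nothing left to match, and the documented example would come out as "95 effective" rather than "95 percent effective". The sketch below shows one possible reordering that reproduces the documented canonical form. It is an assumption for illustration only, not the normative v1norm1 algorithm, and the names normalize_claim_reordered and build_cache_key are hypothetical.

```python
import hashlib
import re
import unicodedata

CANONICALIZER_VERSION = "v1norm1"

# Single-digit spellings, as in Step 5 of the documented algorithm.
_DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# English-only abbreviations, as in Step 6 of the documented algorithm.
_EN_ABBREVIATIONS = (("covid-19", "covid"), ("u.s.", "us"), ("u.k.", "uk"))


def normalize_claim_reordered(claim_text: str, language: str) -> str:
    """Hypothetical reordering: symbol and abbreviation rewrites run
    BEFORE punctuation is stripped, so they still have text to match."""
    text = unicodedata.normalize("NFC", claim_text).lower()

    # Rewrite '%' while it is still present.
    text = text.replace("%", " percent")

    # Rewrite dotted abbreviations while the dots are still present.
    if language == "en":
        for abbreviation, replacement in _EN_ABBREVIATIONS:
            text = text.replace(abbreviation, replacement)

    # Strip remaining punctuation (hyphens are kept).
    text = re.sub(r"[^\w\s-]", "", text)

    # Spell out standalone single digits.
    text = re.sub(r"\b([0-9])\b", lambda m: _DIGIT_WORDS[m.group(1)], text)

    # Collapse whitespace last, after all insertions.
    return re.sub(r"\s+", " ", text).strip()


def build_cache_key(claim_text: str, language: str) -> str:
    """Builds a key in the documented claim:{version}:{language}:{sha256} shape."""
    canonical = normalize_claim_reordered(claim_text, language)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"
```

Any change of step order changes the canonical strings and therefore the cache keys, so a reordering like this would presumably need its own canonicalizer version identifier rather than reusing v1norm1.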
930 + 931 +**Algorithm: Canonical Claim Normalization v1** 932 + 933 +{{code language="python"}} 934 +def normalize_claim_v1(claim_text: str, language: str) -> str: 935 + """ 936 + Normalizes claim to canonical form for cache key generation. 937 + Version: v1norm1 (POC1) 938 + """ 939 + import re 940 + import unicodedata 941 + 942 + # Step 1: Unicode normalization (NFC) 943 + text = unicodedata.normalize('NFC', claim_text) 944 + 945 + # Step 2: Lowercase 946 + text = text.lower() 947 + 948 + # Step 3: Remove punctuation (except hyphens in words) 949 + text = re.sub(r'[^\w\s-]', '', text) 950 + 951 + # Step 4: Normalize whitespace (collapse multiple spaces) 952 + text = re.sub(r'\s+', ' ', text).strip() 953 + 954 + # Step 5: Numeric normalization 955 + text = text.replace('%', ' percent') 956 + # Spell out single-digit numbers 957 + num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three', 958 + '4':'four', '5':'five', '6':'six', '7':'seven', 959 + '8':'eight', '9':'nine'} 960 + for num, word in num_to_word.items(): 961 + text = re.sub(rf'\b{num}\b', word, text) 962 + 963 + # Step 6: Common abbreviations (English only in v1) 964 + if language == 'en': 965 + text = text.replace('covid-19', 'covid') 966 + text = text.replace('u.s.', 'us') 967 + text = text.replace('u.k.', 'uk') 968 + 969 + # Step 7: NO entity normalization in v1 970 + # (Trump vs Donald Trump vs President Trump remain distinct) 971 + 972 + return text 973 + 974 +# Version identifier (include in cache namespace) 975 +CANONICALIZER_VERSION = "v1norm1" 976 +{{/code}} 977 + 978 +**Cache Key Formula (Updated):** 979 + 980 +{{code}} 981 +language = "en" 982 +canonical = normalize_claim_v1(claim_text, language) 983 +cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}" 984 + 985 +Example: 986 + claim: "COVID-19 vaccines are 95% effective" 987 + canonical: "covid vaccines are 95 percent effective" 988 + sha256: abc123...def456 989 + key: "claim:v1norm1:en:abc123...def456" 990 
+{{/code}} 991 + 992 +**Cache Metadata MUST Include:** 993 + 994 +{{code language="json"}} 995 +{ 996 + "canonical_claim": "covid vaccines are 95 percent effective", 997 + "canonicalizer_version": "v1norm1", 998 + "language": "en", 999 + "original_claim_samples": ["COVID-19 vaccines are 95% effective"] 1000 +} 1001 +{{/code}} 1002 + 1003 +**Version Upgrade Path:** 1004 +* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL 1005 +* v1normN → v2norm1: Major version bump, invalidate all v1 caches 1006 + 1007 + 1008 + 1009 +=== 5.1.2 Copyright & Data Retention Policy === 1010 + 1011 +**Evidence Excerpt Storage:** 1012 + 1013 +To comply with copyright law and fair use principles: 1014 + 1015 +**What We Store:** 1016 +* **Metadata only:** Title, author, publisher, URL, publication date 1017 +* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item 1018 +* **Summaries:** AI-generated bullet points (not verbatim text) 1019 +* **No full articles:** Never store complete article text beyond job processing 1020 + 1021 +**Total per Cached Claim:** 1022 +* Scenarios: 2 per claim 1023 +* Evidence items: 6 per scenario (12 total) 1024 +* Quotes: 3 per evidence × 25 words = 75 words per item 1025 +* **Maximum stored verbatim text:** ~900 words per claim (12 × 75) 1026 + 1027 +**Retention:** 1028 +* Cache TTL: 90 days 1029 +* Job outputs: 24 hours (then archived or deleted) 1030 +* No persistent full-text article storage 1031 + 1032 +**Rationale:** 1033 +* Short excerpts for citation = fair use 1034 +* Summaries are transformative (not copyrightable) 1035 +* Limited retention (90 days max) 1036 +* No commercial republication of excerpts 1037 + 1038 +**DMCA Compliance:** 1039 +* Cache invalidation endpoint available for rights holders 1040 +* Contact: dmca@factharbor.org 1041 + 1042 + 1043 +=== 5.2 Cache Invalidation Strategy === 1044 + 1045 +**Time-Based (Primary):** 1046 +* TTL: 90 days for most claims 1047 +* Reasoning: Evidence 
freshness, news cycles 1048 + 1049 +**Event-Based (Manual):** 1050 +* Admin can flag claims for invalidation 1051 +* Example: "Major study retracts findings" 1052 +* Tool: {{code}}DELETE /v1/cache/claim/{claim_hash}?reason=retraction{{/code}} 1053 + 1054 +**Version-Based (Automatic):** 1055 +* AKEL v2.0 release → Invalidate all v1.0 caches 1056 +* Cache keys include version: {{code}}claim:v1:*{{/code}} vs {{code}}claim:v2:*{{/code}} 1057 + 1058 +**Long-Lived Historical Claims:** 1059 +* Historical claims about completed events generally have stable verdicts 1060 +* Example: "2024 US presidential election results" 1061 +* **Policy:** Extended TTL (365-3,650 days) instead of "never invalidate" 1062 +* **Reason:** Even historical data gets revisions (updated counts, corrections) 1063 +* **Mechanism:** Admin can still manually invalidate if major correction issued 1064 +* **Flag:** {{code}}is_historical=true{{/code}} in cache metadata → longer TTL 1065 + 1066 +=== 5.3 Cache Warming Strategy === 1067 + 1068 +**Proactive Cache Building (Future):** 1069 + 1070 +**Trending Topics:** 1071 +* Monitor news APIs for trending topics 1072 +* Pre-analyze top 20 common claims 1073 +* Example: New health study published → Pre-cache related claims 1074 + 1075 +**Predictable Events:** 1076 +* Elections, sporting events, earnings reports 1077 +* Pre-cache expected claims before event 1078 +* Reduces load during traffic spikes 1079 + 1080 +**User Patterns:** 1081 +* Analyze query logs 1082 +* Identify frequently requested claims 1083 +* Prioritize cache warming for these 1084 + 454 454 --- 455 455 456 - ##4.Limitations &Disclaimers1087 +== 6. 
Quality Gates & Validation Rules == 457 457 458 -**System Limitations:** 459 -{limitations.map(l => `* ${l}`)} 1089 +=== 6.1 Quality Gate Overview === 460 460 461 -**Important Notes:** 462 -* This analysis is AI-generated and experimental (POC1) 463 -* Context-aware article verdict is being tested for accuracy 464 -* Human review recommended for high-risk claims (Tier A) 465 -* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens} 1091 +|=Gate|=Name|=POC1 Status|=Applies To|=Notes 1092 +|**Gate 1**|Claim Validation|✅ Hard gate|Stage 1: Extraction|Filters opinions, compound claims 1093 +|**Gate 2**|Contradiction Search|✅ Mandatory rule|Stage 2: Analysis|Enforced per cached claim 1094 +|**Gate 3**|Uncertainty Disclosure|⚠️ Soft guidance|Stage 2: Analysis|Best practice 1095 +|**Gate 4**|Verdict Confidence|✅ Hard gate|Stage 2: Analysis|Confidence ≥ 0.5 required 466 466 467 -**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis. 1097 +**Hard Gate Failures:** 1098 +* Gate 1 fail → Claim excluded from analysis 1099 +* Gate 4 fail → Claim marked "Unsubstantiated" but included 468 468 1101 +=== 6.2 Validation Rules === 1102 + 1103 +|=Rule|=Requirement 1104 +|**Mandatory Contradiction**|Stage 2 MUST search for "undermines" evidence. If none found, reasoning must state: "No counter-evidence found despite targeted search." 1105 +|**Context-Aware Logic**|Stage 3 must prioritize central claims. If {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, article cannot be WELL-SUPPORTED. 1106 +|**Cache Consistency**|Cached claims must match current AKEL version. Version mismatch → cache miss. 1107 +|**Author Identification**|All outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}}. 1108 + 469 469 --- 470 470 471 -*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)* 472 -{{/code}} 1111 +== 7. 
Deterministic Markdown Template == 473 473 474 -**Target Report Size:** 220-350 words (optimized for 2-minute read) 1113 +Report generation uses **fixed template** (not LLM-generated). 475 475 1115 +**Cache-Only Mode Template:** 1116 +{{code language="markdown"}} 1117 +# FactHarbor Analysis Report: PARTIAL ANALYSIS 1118 + 1119 +**Job ID:** {job_id} | **Generated:** {timestamp_utc} 1120 +**Mode:** Cache-Only (Free Tier) 1121 + 476 476 --- 477 477 478 -== 7. LLM Configuration (POC1) == 1124 +## ⚠️ Partial Analysis Notice 479 479 480 -|=Parameter|=Value|=Notes 481 -|**Provider**|Anthropic|Primary provider for POC1 482 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model 483 -|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports) 484 -|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length) 485 -|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output) 486 -|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation 487 -|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment 488 -|**Few-Shot Examples**|Yes|For claim extraction and scenario generation 1126 +This is a **cache-only analysis** based on previously analyzed claims. 1127 +{cache_coverage_percent}% of claims were available in cache. 
489 489 490 -=== 7.1 Token Budgets by Stage === 1129 +**What's Included:** 1130 +* {claims_cached} of {claims_total} claims analyzed 1131 +* Evidence and verdicts from cache (last updated: {oldest_cache_date}) 491 491 492 -|=Stage|=Approximate Output Tokens 493 -|Claim Extraction|~4,000 (10 claims × ~400 tokens) 494 -|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens) 495 -|Evidence Synthesis|~2,000 per scenario 496 -|Verdict Generation|~1,000 per scenario 497 -|Holistic Assessment|~500 (context-aware summary) 1133 +**What's Missing:** 1134 +* {claims_missing} claims require new analysis 1135 +* Full article holistic assessment unavailable 1136 +* Estimated cost to complete: ${cost_to_complete} 498 498 499 -**Total:** 50K-80K tokens per article (input + output) 1138 +**[Upgrade to Pro]** for complete analysis 500 500 501 -=== 7.2 API Integration === 1140 +--- 502 502 503 -**Anthropic Messages API:** 504 -* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}} 505 -* Authentication: API key via {{code}}x-api-key{{/code}} header 506 -* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}} 507 -* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage) 1142 +## Cached Claims 508 508 509 -**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice. 1144 +### [C1] {claim_text} ✅ From Cache 1145 +* **Cached:** {cached_at} ({cache_age} ago) 1146 +* **Times Used:** {hit_count} articles 1147 +* **Verdict:** {verdict} (Confidence: {confidence}) 1148 +* **Evidence:** {evidence_count} sources 510 510 1150 +[Full claim details...] 1151 + 1152 +### [C3] {claim_text} ⚠️ Not In Cache 1153 +* **Status:** Requires new analysis 1154 +* **Cost:** $0.081 1155 +* **Upgrade to analyze this claim** 1156 + 511 511 --- 512 512 513 -== 8. 
Cross-References (xWiki) == 1159 +**Powered by FactHarbor POC1-v0.4** | [Upgrade](https://factharbor.org/upgrade) 1160 +{{/code}} 514 514 515 - This API specification implements requirements from:1162 +--- 516 516 517 -* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]** 518 -** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements) 519 -** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only) 520 -** Section 2.1: Analysis Summary (Context-Aware) component specification 521 -** Section 10.3: Prompt structure for claim extraction and verdict synthesis 1164 +== 8. LLM Configuration (3-Stage) == 522 522 523 -* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]** 524 -** Complete investigation of 7 approaches to article-level verdicts 525 -** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1 526 -** Experimental feature testing plan (30 articles, ≥70% accuracy target) 527 -** Decision framework for POC2 implementation 1166 +=== 8.1 Stage 1: Claim Extraction (Haiku) === 528 528 529 -* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]** 530 -** FR4 (Analysis Summary) - enhanced with context-aware capability 531 -** FR7 (Verdict Calculation) - probability ranges + confidence scores 532 -** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2 1168 +|=Parameter|=Value|=Notes 1169 +|**Model**|{{code}}claude-haiku-4-20250108{{/code}}|Fast, cheap, sufficient for extraction 1170 +|**Input Tokens**|~10K|Article text after URL extraction 1171 +|**Output Tokens**|~500|5 claims @ ~100 tokens each 1172 +|**Cost**|$0.003 per article|($0.25/M input + $1.25/M output) 1173 +|**Temperature**|0.0|Deterministic 1174 +|**Max Tokens**|1000|Generous buffer 533 533 534 -* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]** 535 -** POC1 simplified architecture (stateless, single AKEL orchestration call) 536 -** Data persistence minimized (job 
outputs only, no database required) 537 -** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify) 1176 +**Prompt Strategy:** 1177 +* Extract 5 verifiable factual claims 1178 +* Mark central vs. supporting claims 1179 +* Canonicalize (normalize phrasing) 1180 +* Deduplicate similar claims 1181 +* Output structured JSON only 538 538 539 -* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]** 540 -** Evidence structure (source, stance, reliability rating) 541 -** Scenario boundaries (time, geography, population, conditions) 542 -** Claim types and evaluability taxonomy 543 -** Source Track Record System (Section 1.3) - temporal separation 1183 +=== 8.2 Stage 2: Claim Analysis (Sonnet, CACHED) === 544 544 545 -* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]** 546 -** POC1 requirement mappings and phase assignments 547 -** Context-aware analysis as POC1 experimental feature 548 -** POC2 enhancement path (Gates 2 & 3, evidence deduplication) 1185 +|=Parameter|=Value|=Notes 1186 +|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|High quality for verdicts 1187 +|**Input Tokens**|~2K|Single claim + prompt + context 1188 +|**Output Tokens**|~5K|2 scenarios × ~2.5K tokens 1189 +|**Cost**|$0.081 per NEW claim|($3/M input + $15/M output) 1190 +|**Temperature**|0.0|Deterministic (cache consistency) 1191 +|**Max Tokens**|8000|Sufficient for 2 scenarios 1192 +|**Cache Strategy**|Redis, 90-day TTL|Key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}} 549 549 1194 +**Prompt Strategy:** 1195 +* Generate 2 scenario interpretations 1196 +* Search for supporting AND undermining evidence (mandatory) 1197 +* 6 evidence items per scenario maximum 1198 +* Compute verdict with reasoning chain (3-4 bullets) 1199 +* Output structured JSON only 1200 + 1201 +**Output Constraints (Cost Control):** 1202 +* Scenarios: Max 2 per claim 1203 +* Evidence: Max 6 per scenario 1204 +* 
Evidence summary: Max 3 bullets 1205 +* Reasoning chain: Max 4 bullets 1206 + 1207 +=== 8.3 Stage 3: Holistic Assessment (Sonnet) === 1208 + 1209 +|=Parameter|=Value|=Notes 1210 +|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Context-aware analysis 1211 +|**Input Tokens**|~5K|Article + claim verdicts 1212 +|**Output Tokens**|~1K|Article verdict + fallacies 1213 +|**Cost**|$0.030 per article|($3/M input + $15/M output) 1214 +|**Temperature**|0.0|Deterministic 1215 +|**Max Tokens**|2000|Sufficient for assessment 1216 + 1217 +**Prompt Strategy:** 1218 +* Detect main thesis 1219 +* Evaluate logical coherence (claim verdicts → thesis) 1220 +* Identify fallacies (correlation-causation, cherry-picking, etc.) 1221 +* Compute logic_quality_score 1222 +* Explain article verdict reasoning (3-4 bullets) 1223 +* Output structured JSON only 1224 + 1225 +=== 8.4 Cost Projections by Cache Hit Rate === 1226 + 1227 +|=Cache Hit Rate|=Cost per Article|=10K Articles Cost|=100K Articles Cost 1228 +|0% (cold start)|$0.438|$4,380|$43,800 1229 +|20%|$0.357|$3,570|$35,700 1230 +|40%|$0.276|$2,760|$27,600 1231 +|**60%**|**$0.195**|**$1,950**|**$19,500** 1232 +|**70%** (target)|**$0.155**|**$1,550**|**$15,500** 1233 +|**80%**|**$0.114**|**$1,140**|**$11,400** 1234 +|**90%**|**$0.073**|**$730**|**$7,300** 1235 +|95%|$0.053|$530|$5,300 1236 + 1237 +**Break-Even Analysis:** 1238 +* Monolithic (v0.3.1): $0.15 per article constant 1239 +* 3-stage breaks even at **70% cache hit rate** 1240 +* Expected after ~1,500 articles in same domain 1241 + 550 550 --- 551 551 552 -== 9. Implementation Notes (POC1)==1244 +== 9. 
Implementation Notes == 553 553 554 554 === 9.1 Recommended Tech Stack === 555 555 556 -* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase 557 -* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure 558 -* **Storage:** Filesystem JSON files (no database needed for POC1) 559 -* **Queue:** In-memory queue or Redis (optional for concurrency) 560 -* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback) 561 -* **Deployment:** Vercel, AWS Lambda, or similar serverless 1248 +* **Framework:** Next.js 14+ with App Router (TypeScript) 1249 +* **Cache:** Redis 7.0+ (managed: AWS ElastiCache, Redis Cloud, Upstash) 1250 +* **Storage:** Filesystem JSON for jobs + S3/R2 for archival 1251 +* **Queue:** BullMQ with Redis (for 3-stage pipeline orchestration) 1252 +* **LLM Client:** Anthropic Python SDK or TypeScript SDK 1253 +* **Cost Tracking:** PostgreSQL for user credit ledger 1254 +* **Deployment:** Vercel (frontend + API) + Redis Cloud 562 562 563 -=== 9.2 POC1Simplifications===1256 +=== 9.2 3-Stage Pipeline Implementation === 564 564 565 -* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}}) 566 -* **No user authentication:** Optional API key validation only (env var: {{code}}FACTHARBOR_API_KEY{{/code}}) 567 -* **Single-instance deployment:** No distributed processing, no worker pools 568 -* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning) 569 -* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}}) 570 -* **Rate limiting:** Simple IP-based (optional) - no complex billing 1258 +**Job Queue Flow (Conceptual):** 571 571 572 -=== 9.3 Estimated Costs (Per Analysis) === 1260 +{{code language="typescript"}} 1261 +// Stage 1: Extract Claims 1262 +const stage1Job = await queue.add('stage1-extract-claims', { 1263 + jobId: 
'job123', 1264 + articleUrl: 'https://example.com/article' 1265 +}); 573 573 574 -**LLM API costs (Claude 3.5 Sonnet):** 575 -* Input: $3.00 per million tokens 576 -* Output: $15.00 per million tokens 577 -* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical) 1267 +// On Stage 1 completion → enqueue Stage 2 jobs 1268 +stage1Job.on('completed', async (result) => { 1269 + const { claims } = result; 1270 + 1271 + // Stage 2: Analyze each claim (with cache check) 1272 + const stage2Jobs = await Promise.all( 1273 + claims.map(claim => 1274 + queue.add('stage2-analyze-claim', { 1275 + jobId: 'job123', 1276 + claimId: claim.claim_id, 1277 + canonicalClaim: claim.canonical_claim, 1278 + checkCache: true 1279 + }) 1280 + ) 1281 + ); 1282 + 1283 + // On all Stage 2 completions → enqueue Stage 3 1284 + await Promise.all(stage2Jobs.map(j => j.waitUntilFinished())); 1285 + 1286 + const claimVerdicts = await gatherStage2Results('job123'); 1287 + 1288 + await queue.add('stage3-holistic', { 1289 + jobId: 'job123', 1290 + articleUrl: 'https://example.com/article', 1291 + claimVerdicts: claimVerdicts 1292 + }); 1293 +}); 1294 +{{/code}} 578 578 579 -**Web search costs (optional):** 580 -* Using external search API (Tavily, Brave): $0.01-0.05 per analysis 581 -* POC1 can use free search APIs initially 1296 +**Note:** This is a conceptual sketch. Actual implementation may use BullMQ Flow API or custom orchestration. 
582 582 583 -**Infrastructure costs:** 584 -* Vercel hobby tier: Free for POC 585 -* AWS Lambda: ~$0.001 per request 586 -* **Total infra:** <$0.01 per analysis 1298 +**Cache Check Logic:** 1299 +{{code language="typescript"}} 1300 +async function analyzeClaimWithCache(claim: string): Promise<ClaimAnalysis> { 1301 + const canonicalClaim = normalizeClaim(claim); 1302 + const claimHash = sha256(canonicalClaim); 1303 + const cacheKey = `claim:v1:${claimHash}`; 1304 + 1305 + // Check cache 1306 + const cached = await redis.get(cacheKey); 1307 + if (cached) { 1308 + await redis.incr(`claim:stats:hit_count:${claimHash}`); 1309 + return JSON.parse(cached); 1310 + } 1311 + 1312 + // Cache miss - analyze with LLM 1313 + const analysis = await analyzeClaim_Stage2(canonicalClaim); 1314 + 1315 + // Store in cache 1316 + await redis.set(cacheKey, JSON.stringify(analysis), 'EX', 7776000); // 90 days 1317 + 1318 + return analysis; 1319 +} 1320 +{{/code}} 587 587 588 - **Totalestimated cost:** ~$0.15-0.35peranalysis ✅ Meets<$0.35 target1322 +=== 9.3 User Credit Management === 589 589 590 -=== 9.4 Estimated Timeline (AI-Assisted) === 1324 +**PostgreSQL Schema:** 1325 +{{code language="sql"}} 1326 +CREATE TABLE user_credits ( 1327 + user_id UUID PRIMARY KEY, 1328 + tier VARCHAR(20) DEFAULT 'free', 1329 + credit_limit DECIMAL(10,2) DEFAULT 10.00, 1330 + credit_used DECIMAL(10,2) DEFAULT 0.00, 1331 + reset_date TIMESTAMP, 1332 + cache_only_mode BOOLEAN DEFAULT false, 1333 + created_at TIMESTAMP DEFAULT NOW() 1334 +); 591 591 592 -**With Cursor IDE + Claude API:** 593 -* Day 1-2: API scaffolding + job queue 594 -* Day 3-4: LLM integration + prompt engineering 595 -* Day 5-6: Evidence retrieval + contradiction search 596 -* Day 7: Report templates + testing with 30 articles 597 -* **Total:** 5-7 days for working POC1 1336 +CREATE TABLE usage_log ( 1337 + id SERIAL PRIMARY KEY, 1338 + user_id UUID REFERENCES user_credits(user_id), 1339 + job_id VARCHAR(50), 1340 + stage VARCHAR(20), 1341 
+  cost DECIMAL(10,4),
+  cache_hit BOOLEAN,
+  created_at TIMESTAMP DEFAULT NOW()
+);
+{{/code}}

-**Manual coding (no AI assistance):**
-* Estimate: 15-20 days
+**Credit Deduction Logic:**
+{{code language="typescript"}}
+async function deductCredit(userId: string, cost: number): Promise<boolean> {
+  const user = await db.query('SELECT * FROM user_credits WHERE user_id = $1', [userId]);
+
+  const newUsed = user.credit_used + cost;
+
+  if (newUsed > user.credit_limit && user.tier === 'free') {
+    // Trigger cache-only mode
+    await db.query(
+      'UPDATE user_credits SET cache_only_mode = true WHERE user_id = $1',
+      [userId]
+    );
+    throw new Error('CREDIT_LIMIT_REACHED');
+  }
+
+  await db.query(
+    'UPDATE user_credits SET credit_used = $1 WHERE user_id = $2',
+    [newUsed, userId]
+  );
+
+  return true;
+}
+{{/code}}

-=== 9.5 First Prompt for AI Code Generation ===
+=== 9.4 Cache-Only Mode Implementation ===

-{{code}}
-Based on the FactHarbor POC1 API & Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with:
+**Middleware:**
+{{code language="typescript"}}
+async function checkCacheOnlyMode(req, res, next) {
+  const user = await getUserCredit(req.userId);
+
+  if (user.cache_only_mode) {
+    // Allow only cache reads
+    if (req.body.options?.cache_preference !== 'allow_partial') {
+      return res.status(402).json({
+        error: 'credit_limit_reached',
+        message: 'Resubmit with cache_preference=allow_partial',
+        cache_only_mode: true
+      });
+    }
+
+    // Modify request to skip Stage 2 for uncached claims
+    req.cacheOnlyMode = true;
+  }
+
+  next();
+}
+{{/code}}

-1. API routes implementing the 7 endpoints specified in Section 3
-2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5
-3. Anthropic Claude 3.5 Sonnet integration for:
-   - Claim extraction (with central/supporting marking)
-   - Scenario generation
-   - Evidence synthesis (with mandatory contradiction search)
-   - Verdict generation
-   - Holistic assessment (article-level credibility)
-4. Job-based async execution with progress tracking (7 pipeline stages)
-5. Quality Gates 1 & 4 from NFR11 implementation
-6. Mandatory contradiction search enforcement (Section 5)
-7. Context-aware analysis (experimental) as specified
-8. Filesystem-based job storage (no database)
-9. Markdown report generation from JSON templates (Section 6)
+=== 9.5 Estimated Timeline ===

-Use the validation rules from Section 5 and error codes from Section 2.1.1.
-Target: <$0.35 per analysis, <2 minutes processing time.
-{{/code}}
+**POC1 with 3-Stage Architecture:**
+* Week 1: Stage 1 (Haiku extraction) + Redis setup
+* Week 2: Stage 2 (Sonnet analysis + caching)
+* Week 3: Stage 3 (Holistic assessment) + pipeline orchestration
+* Week 4: User credit system + cache-only mode
+* Week 5: Testing with 100 articles (measure cache hit rate)
+* Week 6: Optimization + bug fixes
+* **Total: 6-8 weeks**

+**Manual coding:** 12-16 weeks
+
 ---

-== 10. Testing Strategy (POC1) ==
+== 10. Testing Strategy ==

-=== 10.1 Test Dataset (30 Articles) ===
+=== 10.1 Cache Performance Testing ===

-**Category 1: Straightforward Factual (10 articles)**
-* Purpose: Baseline accuracy
-* Example: "WHO report on global vaccination rates"
-* Expected: High claim accuracy, straightforward verdict
+**Test Scenarios:**

-**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test**
-* Purpose: Test holistic assessment capability
-* Example: "Coffee cures cancer" (true premises, false conclusion)
-* Expected: Individual claims TRUE, article verdict MISLEADING
+**Scenario 1: Cold Start (0 cache)**
+* Analyze 100 diverse articles
+* Measure: Cost per article, cache growth rate
+* Expected: $0.35-0.40 avg, ~400 unique claims cached

-**Category 3: Mixed Accuracy (5 articles)**
-* Purpose: Test nuance handling
-* Example: Articles with some true, some false claims
-* Expected: Scenario-level differentiation
+**Scenario 2: Warm Cache (Overlapping Domain)**
+* Analyze 100 articles on SAME topic (e.g., "2024 election")
+* Measure: Cache hit rate growth
+* Expected: Hit rate 20% → 60% by article 100

-**Category 4: Low-Quality Claims (5 articles)**
-* Purpose: Test quality gates
-* Example: Opinion pieces, compound claims
-* Expected: Gate 1 failures, rejection or draft-only mode
+**Scenario 3: Mature Cache (1,000 articles)**
+* Analyze next 100 articles (diverse topics)
+* Measure: Steady-state cache hit rate
+* Expected: 60-70% hit rate, $0.15-0.18 avg cost

+**Scenario 4: Cache-Only Mode**
+* Free user reaches $10 limit (67 articles at 70% hit rate)
+* Submit 10 more articles with {{code}}cache_preference=allow_partial{{/code}}
+* Measure: Coverage %, user satisfaction
+* Expected: 60-70% coverage, instant results
+
 === 10.2 Success Metrics ===

-**Quality Metrics:**
-* Hallucination rate: <5% (target: <3%)
-* Context-aware accuracy: ≥70% (experimental - key POC1 goal)
+**Cache Performance:**
+* Week 1: 5-10% hit rate
+* Week 2: 15-25% hit rate
+* Week 3: 30-40% hit rate
+* Week 4: 45-55% hit rate
+* Target: ≥50% by 1,000 articles
+
+**Cost Targets:**
+* Articles 1-100: $0.35-0.40 avg ⚠️ (expected)
+* Articles 100-500: $0.25-0.30 avg
+* Articles 500-1,000: $0.18-0.22 avg
+* Articles 1,000+: $0.12-0.15 avg ✅
+
+**Quality Metrics (same as v0.3.1):**
+* Hallucination rate: <5%
+* Context-aware accuracy: ≥70%
 * False positive rate: <15%
 * Mandatory contradiction search: 100% compliance

-**Performance Metrics:**
-* Processing time: <2 minutes per article (standard depth)
-* Cost per analysis: <$0.35
-* API uptime: >99%
-* LLM API error rate: <1%
+=== 10.3 Free Tier Economics Validation ===

-**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for complete success criteria and testing methodology.
+**Test with simulated 1,000 users:**
+* Each user: $10 credit
+* 70% cache hit rate
+* Avg 70 articles/user/month

+**Projected Costs:**
+* Total credits: 1,000 × $10 = $10,000
+* Actual LLM costs: ~$9,000 (cache savings)
+* Margin: 10%
+
+**Sustainability Check:**
+* If margin <5% → Reduce free tier limit
+* If margin >20% → Consider increasing free tier
+
 ---

-**End of Specification - FactHarbor POC1 API v0.3**
+== 11. Cross-References ==

-**Ready for xWiki import and AI-assisted implementation!** 🚀
+This API specification implements requirements from:

+* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
+** FR-POC-1 through FR-POC-6 (3-stage architecture)
+** NFR-POC-1 through NFR-POC-3 (quality gates, caching)
+** NEW: FR-POC-7 (Claim-level caching)
+** NEW: FR-POC-8 (User credit system)
+** NEW: FR-POC-9 (Cache-only mode)
+
+* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
+** Approach 1 implemented in Stage 3
+** Context-aware holistic assessment
+
+* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
+** FR4 (Analysis Summary) - enhanced with caching
+** FR7 (Verdict Calculation) - cached per claim
+** NFR11 (Quality Gates) - enforced across stages
+** NEW: NFR19 (Cost Efficiency via Caching)
+** NEW: NFR20 (Free Tier Sustainability)
+
+* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
+** POC1 3-stage pipeline architecture
+** Redis cache layer
+** User credit system
+
+* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
+** Claim structure (cacheable unit)
+** Evidence structure
+** Scenario boundaries
+
+---
+
+**End of Specification - FactHarbor POC1 API v0.4.1**
+
+**3-stage caching architecture with free tier cache-only mode. Ready for sustainable, scalable implementation!** 🚀
+
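The free tier figures quoted in Section 10.1 (Scenario 4) and Section 10.3 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not part of the specification: the helper names are hypothetical, and the $0.15 average cost per article is an assumption, taken from the low end of the mature-cache range because it reproduces the "67 articles" figure.

```typescript
// Illustrative sketch only: helper names and the $0.15 per-article cost
// are assumptions, not part of the FactHarbor specification.

/** Article count at which cumulative spend first reaches the credit limit. */
function articlesUntilLimit(creditLimit: number, avgCostPerArticle: number): number {
  if (avgCostPerArticle <= 0) {
    throw new RangeError("avgCostPerArticle must be positive");
  }
  return Math.ceil(creditLimit / avgCostPerArticle);
}

/** Margin as a fraction of issued credits: (credits - actual cost) / credits. */
function margin(totalCredits: number, actualCost: number): number {
  if (totalCredits <= 0) {
    throw new RangeError("totalCredits must be positive");
  }
  return (totalCredits - actualCost) / totalCredits;
}

type TierAction = "reduce_free_tier" | "keep_free_tier" | "consider_increase";

/** Applies the Section 10.3 thresholds: below 5% reduce, above 20% consider increasing. */
function sustainabilityCheck(marginFraction: number): TierAction {
  if (marginFraction < 0.05) return "reduce_free_tier";
  if (marginFraction > 0.20) return "consider_increase";
  return "keep_free_tier";
}

// Scenario 4: a $10 credit at an assumed $0.15 per article.
const articles = articlesUntilLimit(10, 0.15);

// Section 10.3: 1,000 users x $10 credit against ~$9,000 of LLM spend.
const projectedMargin = margin(1000 * 10, 9000);

console.log(articles, projectedMargin, sustainabilityCheck(projectedMargin));
// prints: 67 0.1 keep_free_tier
```

With these inputs the projected 10% margin falls between the two thresholds, so neither adjustment to the free tier is triggered. A different per-article cost changes the article count accordingly.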