Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Title
@@ -1,1 +1,1 @@
-POC1 API & Schemas Specification v0.4.1
+POC1 API & Schemas Specification
- Content
@@ -1,6 +1,6 @@
 # FactHarbor POC1 — API & Schemas Specification

-**Version:** 0.4.1 (POC1 - 3-Stage Caching Architecture)
+**Version:** 0.3 (POC1 - Production Ready)
 **Namespace:** FactHarbor.*
 **Syntax:** xWiki 2.1
 **Last Updated:** 2025-12-24
@@ -10,31 +10,15 @@
 == Version History ==

 |=Version|=Date|=Changes
-|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
-|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
-|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints, chain-of-thought, evidence citation, Jina safety, gate numbering
 |0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references
 |0.2|2025-12-24|Initial rebased version with holistic assessment
 |0.1|2025-12-24|Original specification

 ---
----

-== File Format Notice ==
-
-**⚠️ Important:** This file is stored as {{code}}.md{{/code}} for transport/versioning, but the content is **xWiki 2.1 syntax** (not Markdown).
-
-**When importing to xWiki:**
-* Use "Import as XWiki content" (not "Import as Markdown")
-* The xWiki parser will correctly interpret {{code}}==}} headers, {{{{code}}}}}} blocks, etc.
-
-**Alternate naming:** If your workflow supports it, rename to {{code}}.xwiki.txt{{/code}} to avoid ambiguity.
-
----
-
 == 1. Core Objective (POC1) ==

-The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability:
+The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**:

 The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).

@@ -41,226 +41,69 @@
 **Success Criteria:**
 * Test with 30 diverse articles
 * Target: ≥70% accuracy detecting misleading articles
-* Cost: <$0.25 per NEW analysis (uncached)
-* Cost: $0.00 for cached claim reuse
-* Cache hit rate: ≥50% after 1,000 articles
+* Cost: <$0.35 per analysis
 * Processing time: <2 minutes (standard depth)

-**Economic Model:**
-* Free tier: $10 credit per month (~40-140 articles depending on cache hits)
-* After limit: Cache-only mode (instant, free access to cached claims)
-* Paid tier: Unlimited new analyses
-
 **See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches.

 ---

-== 2. Architecture Overview ==
+== 2. Runtime Model & Job States ==

-=== 2.1 3-Stage Pipeline with Caching ===
+=== 2.1 Pipeline Steps ===

-FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
+For progress reporting via API, the pipeline follows these stages:

-{{code language="mermaid"}}
-graph TD
-    A[Article Input] --> B[Stage 1: Extract Claims]
-    B --> C{For Each Claim}
-    C --> D[Check Cache]
-    D -->|Cache HIT| E[Return Cached Verdict]
-    D -->|Cache MISS| F[Stage 2: Analyze Claim]
-    F --> G[Store in Cache]
-    G --> E
-    E --> H[Stage 3: Holistic Assessment]
-    H --> I[Final Report]
-{{/code}}
+# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization.
+# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting.
+# **SCENARIOS**: Generating context interpretations for each claim.
+# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search).
+# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario.
+# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic).
+# **REPORT**: Generating final Markdown and JSON outputs.
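The seven pipeline steps added above could be modelled as an ordered enumeration. The following is only a minimal sketch: the names `PipelineStep`, `progress_percent`, and `build_progress` are hypothetical, and the equal-weight percentage is an assumption, since the specification does not say how `percent` is derived.

```python
from enum import Enum


class PipelineStep(Enum):
    """Pipeline steps in the order listed in Section 2.1."""
    INGEST = "INGEST"
    EXTRACT_CLAIMS = "EXTRACT_CLAIMS"
    SCENARIOS = "SCENARIOS"
    RETRIEVAL = "RETRIEVAL"
    VERDICTS = "VERDICTS"
    HOLISTIC_ASSESSMENT = "HOLISTIC_ASSESSMENT"
    REPORT = "REPORT"


def progress_percent(step: PipelineStep) -> int:
    """Hypothetical helper: percent complete at the moment `step` starts.

    Assumes every step carries equal weight; the specification does not
    define how `percent` is computed.
    """
    ordered = list(PipelineStep)  # Enum iteration follows definition order
    return 100 * ordered.index(step) // len(ordered)


def build_progress(step: PipelineStep, message: str) -> dict:
    """Build a `progress` object shaped like the job status example."""
    return {
        "step": step.value,
        "percent": progress_percent(step),
        "message": message,
    }
```

A real implementation would probably weight RETRIEVAL and VERDICTS more heavily, since they loop over claims and scenarios.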

-**Stage 1: Claim Extraction** (Haiku, no cache)
-* Input: Article text
-* Output: 5 canonical claims (normalized, deduplicated)
-* Model: Claude Haiku 4
-* Cost: $0.003 per article
-* Cache strategy: No caching (article-specific)
+=== 2.1.1 URL Extraction Strategy ===

-**Stage 2: Claim Analysis** (Sonnet, CACHED)
-* Input: Single canonical claim
-* Output: Scenarios + Evidence + Verdicts
-* Model: Claude Sonnet 3.5
-* Cost: $0.081 per NEW claim
-* Cache strategy: **Redis, 90-day TTL**
-* Cache key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}}
+**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}})
+* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient
+* **Fallback:** Trafilatura (Python library) for simple static HTML

-**Stage 3: Holistic Assessment** (Sonnet, no cache)
-* Input: Article + Claim verdicts (from cache or Stage 2)
-* Output: Article verdict + Fallacies + Logic quality
-* Model: Claude Sonnet 3.5
-* Cost: $0.030 per article
-* Cache strategy: No caching (article-specific)
+**Error Handling:**

-**Total Cost Formula:**
-{{code}}
-Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
+|=Error Code|=Trigger|=Action
+|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest text paste
+|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail
+|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately
+|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason

-Examples:
-- 0 new claims (100% cache hit): $0.033
-- 1 new claim (80% cache hit): $0.114
-- 3 new claims (40% cache hit): $0.276
-- 5 new claims (0% cache hit): $0.438
-{{/code}}
+**Supported URL Patterns:**
+* ✅ News articles, blog posts, Wikipedia
+* ✅ Academic preprints (arXiv)
+* ❌ Social media posts (Twitter, Facebook) - not in POC1
+* ❌ Video platforms (YouTube, TikTok) - not in POC1
+* ❌ PDF files - deferred to Beta 0

-=== 2.2 User Tier System ===
+=== 2.2 Job Status Enumeration ===

-|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
-|**Free**|$10|Cache-only mode|✅ Full|Basic
-|**Pro** (future)|$50|Continues|✅ Full|Advanced
-|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full
+(((
+* **QUEUED** - Job accepted, waiting in queue
+* **RUNNING** - Processing in progress
+* **SUCCEEDED** - Analysis complete, results available
+* **FAILED** - Error occurred, see error details
+* **CANCELLED** - User cancelled via DELETE endpoint
+)))

-**Free Tier Economics:**
-* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
-* Average 70 articles/month at 70% cache hit rate
-* After limit: Cache-only mode (see Section 2.3)
-
-=== 2.3 Cache-Only Mode (Free Tier Feature) ===
-
-When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
-
-**What Cache-Only Mode Provides:**
-
-✅ **Claim Extraction (Platform-Funded):**
-* Stage 1 extraction runs at $0.003 per article
-* **Cost: Absorbed by platform** (not charged to user credit)
-* Rationale: Extraction is necessary to check cache, and cost is negligible
-* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)
-
-✅ **Instant Access to Cached Claims:**
-* Any claim that exists in cache → Full verdict returned
-* Cost: $0 (no LLM calls)
-* Response time: <100ms
-
-✅ **Partial Article Analysis:**
-* Check each claim against cache
-* Return verdicts for ALL cached claims
-* For uncached claims: Return {{code}}"status": "cache_miss"{{/code}}
-
-✅ **Cache Coverage Report:**
-* "3 of 5 claims available in cache (60% coverage)"
-* Links to cached analyses
-* Estimated cost to complete: $0.162 (2 new claims)
-
-❌ **Not Available in Cache-Only Mode:**
-* New claim analysis (Stage 2 LLM calls blocked)
-* Full holistic assessment (Stage 3 blocked if any claims missing)
-
-**User Experience:**
-{{code language="json"}}
-{
-  "status": "cache_only_mode",
-  "message": "Monthly credit limit reached. Showing cached results only.",
-  "cache_coverage": {
-    "claims_total": 5,
-    "claims_cached": 3,
-    "claims_missing": 2,
-    "coverage_percent": 60
-  },
-  "cached_claims": [
-    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
-    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
-    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
-  ],
-  "missing_claims": [
-    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
-    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
-  ],
-  "upgrade_options": {
-    "top_up": "$5 for 20-70 more articles",
-    "pro_tier": "$50/month unlimited"
-  }
-}
-{{/code}}
-
-**Design Rationale:**
-* Free users still get value (cached claims often answer their question)
-* Demonstrates FactHarbor's value (partial results encourage upgrade)
-* Sustainable for platform (no additional cost)
-* Fair to all users (everyone contributes to cache)
-
 ---

 == 3. REST API Contract ==

-=== 3.1 User Credit Tracking ===
+=== 3.1 Create Analysis Job ===

-**Endpoint:** {{code}}GET /v1/user/credit{{/code}}
-
-**Response:** {{code}}200 OK{{/code}}
-
-{{code language="json"}}
-{
-  "user_id": "user_abc123",
-  "tier": "free",
-  "credit_limit": 10.00,
-  "credit_used": 7.42,
-  "credit_remaining": 2.58,
-  "reset_date": "2025-02-01T00:00:00Z",
-  "cache_only_mode": false,
-  "usage_stats": {
-    "articles_analyzed": 67,
-    "claims_from_cache": 189,
-    "claims_newly_analyzed": 113,
-    "cache_hit_rate": 0.626
-  }
-}
-{{/code}}
-
----
-
-=== 3.2 Create Analysis Job (3-Stage) ===
-
 **Endpoint:** {{code}}POST /v1/analyze{{/code}}

-**Request Body:**
-
-
-**Idempotency Support:**
-
-To prevent duplicate job creation on network retries, clients SHOULD include:
-
-{{code language="http"}}
-POST /v1/analyze
-Idempotency-Key: {client-generated-uuid}
-{{/code}}
-
-OR use the {{code}}client.request_id{{/code}} field:
-
+**Request Body Example:**
 {{code language="json"}}
 {
-  "input_url": "...",
-  "client": {
-    "request_id": "client-uuid-12345",
-    "source_label": "optional"
-  }
-}
-{{/code}}
-
-**Server Behavior:**
-* If {{code}}Idempotency-Key{{/code}} or {{code}}request_id{{/code}} seen before (within 24 hours):
-  - Return existing job ({{code}}200 OK{{/code}}, not {{code}}202 Accepted{{/code}})
-  - Do NOT create duplicate job or charge twice
-* Idempotency keys expire after 24 hours (matches job retention)
-
-**Example Response (Idempotent):**
-{{code language="json"}}
-{
-  "job_id": "01J...ULID",
-  "status": "RUNNING",
-  "idempotent": true,
-  "original_request_at": "2025-12-24T10:31:00Z",
-  "message": "Returning existing job (idempotency key matched)"
-}
-{{/code}}
-
-
-{{code language="json"}}
-{
   "input_type": "url",
   "input_url": "https://example.com/medical-report-01",
   "input_text": null,
@@ -268,8 +268,7 @@
     "browsing": "on",
     "depth": "standard",
     "max_claims": 5,
-    "context_aware_analysis": true,
-    "cache_preference": "prefer_cache"
+    "context_aware_analysis": true
   },
   "client": {
     "request_id": "optional-client-tracking-id",
@@ -279,10 +279,10 @@
 {{/code}}

 **Options:**
-* {{code}}cache_preference{{/code}}: {{code}}prefer_cache{{/code}} | {{code}}require_fresh{{/code}} | {{code}}allow_partial{{/code}}
-  - {{code}}prefer_cache{{/code}}: Use cache when available, analyze new claims (default)
-  - {{code}}require_fresh{{/code}}: Force re-analysis of all claims (ignores cache, costs more)
-  - {{code}}allow_partial{{/code}}: Return partial results if some claims uncached (for free tier cache-only mode)
+* {{code}}browsing{{/code}}: {{code}}on{{/code}} | {{code}}off{{/code}} (retrieve web sources or just output queries)
+* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness)
+* {{code}}max_claims{{/code}}: 1-50 (default: 10)
+* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental)

 **Response:** {{code}}202 Accepted{{/code}}

@@ -291,18 +291,6 @@
   "job_id": "01J...ULID",
   "status": "QUEUED",
   "created_at": "2025-12-24T10:31:00Z",
-  "estimated_cost": 0.114,
-  "cost_breakdown": {
-    "stage1_extraction": 0.003,
-    "stage2_new_claims": 0.081,
-    "stage2_cached_claims": 0.000,
-    "stage3_holistic": 0.030
-  },
-  "cache_info": {
-    "claims_to_extract": 5,
-    "estimated_cache_hits": 4,
-    "estimated_new_claims": 1
-  },
   "links": {
     "self": "/v1/jobs/01J...ULID",
     "result": "/v1/jobs/01J...ULID/result",
@@ -312,23 +312,9 @@
 }
 {{/code}}

-**Error Responses:**
-
-{{code}}402 Payment Required{{/code}} - Free tier limit reached, cache-only mode
-{{code language="json"}}
-{
-  "error": "credit_limit_reached",
-  "message": "Monthly credit limit reached. Entering cache-only mode.",
-  "cache_only_mode": true,
-  "credit_remaining": 0.00,
-  "reset_date": "2025-02-01T00:00:00Z",
-  "action": "Resubmit with cache_preference=allow_partial for cached results"
-}
-{{/code}}
-
 ---

-=== 3.3 Get Job Status ===
+=== 3.2 Get Job Status ===

 **Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}}

@@ -341,20 +341,12 @@
   "created_at": "2025-12-24T10:31:00Z",
   "updated_at": "2025-12-24T10:31:22Z",
   "progress": {
-    "stage": "stage2_claim_analysis",
-    "percent": 65,
-    "message": "Analyzing claim 3 of 5 (2 from cache)",
-    "current_claim_id": "C3",
-    "cache_hits": 2,
-    "cache_misses": 1
+    "step": "RETRIEVAL",
+    "percent": 60,
+    "message": "Gathering evidence for C2-S1",
+    "current_claim_id": "C2",
+    "current_scenario_id": "C2-S1"
   },
-  "actual_cost": 0.084,
-  "cost_breakdown": {
-    "stage1_extraction": 0.003,
-    "stage2_new_claims": 0.081,
-    "stage2_cached_claims": 0.000,
-    "stage3_holistic": null
-  },
   "input_echo": {
     "input_type": "url",
     "input_url": "https://example.com/medical-report-01"
@@ -370,61 +370,12 @@

 ---

-=== 3.4 Get Analysis Result ===
+=== 3.3 Get JSON Result ===

 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}}

-**Response:** {{code}}200 OK{{/code}}
+**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4)

-Returns complete **AnalysisResult** schema (see Section 4).
-
-**Cache-Only Mode Response:** {{code}}206 Partial Content{{/code}}
-
-{{code language="json"}}
-{
-  "cache_only_mode": true,
-  "cache_coverage": {
-    "claims_total": 5,
-    "claims_cached": 3,
-    "claims_missing": 2,
-    "coverage_percent": 60
-  },
-  "partial_result": {
-    "metadata": {
-      "job_id": "01J...ULID",
-      "timestamp_utc": "2025-12-24T10:31:30Z",
-      "engine_version": "POC1-v0.4",
-      "cache_only": true
-    },
-    "claims": [
-      {
-        "claim_id": "C1",
-        "claim_text": "...",
-        "canonical_claim": "...",
-        "source": "cache",
-        "cached_at": "2025-12-20T15:30:00Z",
-        "cache_hit_count": 47,
-        "scenarios": [...]
-      },
-      {
-        "claim_id": "C3",
-        "claim_text": "...",
-        "canonical_claim": "...",
-        "source": "not_analyzed",
-        "status": "cache_miss",
-        "estimated_cost": 0.081
-      }
-    ],
-    "article_holistic_assessment": null,
-    "upgrade_prompt": {
-      "message": "Upgrade to Pro for full analysis of all claims",
-      "missing_claims": 2,
-      "cost_to_complete": 0.192
-    }
-  }
-}
-{{/code}}
-
 **Other Responses:**
 * {{code}}409 Conflict{{/code}} - Job not finished yet
 * {{code}}404 Not Found{{/code}} - Job ID unknown
@@ -431,29 +431,8 @@

 ---

-=== 3.5 Stage-Specific Endpoints (Optional, Advanced) ===
+=== 3.4 Download Markdown Report ===

-For direct stage access (useful for cache debugging, custom workflows):
-
-**Extract Claims Only:**
-{{code}}POST /v1/analyze/extract-claims{{/code}}
-
-**Analyze Single Claim:**
-{{code}}POST /v1/analyze/claim{{/code}}
-
-**Assess Article (with claim verdicts):**
-{{code}}POST /v1/analyze/assess-article{{/code}}
-
-**Check Claim Cache:**
-{{code}}GET /v1/cache/claim/{claim_hash}{{/code}}
-
-**Cache Statistics:**
-{{code}}GET /v1/cache/stats{{/code}}
-
----
-
-=== 3.6 Download Markdown Report ===
-
 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}}

 **Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content
@@ -461,11 +461,13 @@
 **Headers:**
 * {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}}

-**Cache-Only Mode:** Report includes "Partial Analysis" watermark and upgrade prompt.
+**Other Responses:**
+* {{code}}409 Conflict{{/code}} - Job not finished
+* {{code}}404 Not Found{{/code}} - Job unknown

 ---

-=== 3.7 Stream Job Events (Backend Progress) ===
+=== 3.5 Stream Job Events (Optional, Recommended) ===

 **Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}}

@@ -472,1044 +472,478 @@
 **Response:** Server-Sent Events (SSE) stream

 **Event Types:**
-* {{code}}progress{{/code}} - Backend progress (e.g., "Stage 1: Extracting claims")
-* {{code}}cache_hit{{/code}} - Claim found in cache
-* {{code}}cache_miss{{/code}} - Claim requires new analysis
-* {{code}}stage_complete{{/code}} - Stage 1/2/3 finished
+* {{code}}progress{{/code}} - Progress update
+* {{code}}claim_extracted{{/code}} - Claim identified
+* {{code}}verdict_computed{{/code}} - Scenario verdict complete
 * {{code}}complete{{/code}} - Job finished
 * {{code}}error{{/code}} - Error occurred
-* {{code}}credit_warning{{/code}} - User approaching limit

 ---

-=== 3.8 Cancel Job ===
+=== 3.6 Cancel Job ===

 **Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}}

-**Note:** If job is mid-stage (e.g., analyzing claim 3 of 5), user is charged for completed work only.
+Attempts to cancel a queued or running job.

+**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED)
+
+**Note:** Already-completed jobs cannot be cancelled.
+
 ---

-=== 3.9 Health Check ===
+=== 3.7 Health Check ===

 **Endpoint:** {{code}}GET /v1/health{{/code}}

+**Response:** {{code}}200 OK{{/code}}
+
 {{code language="json"}}
 {
   "status": "ok",
-  "version": "POC1-v0.4",
-  "model_stage1": "claude-haiku-4",
-  "model_stage2": "claude-3-5-sonnet-20241022",
-  "model_stage3": "claude-3-5-sonnet-20241022",
-  "cache": {
-    "status": "connected",
-    "total_claims": 12847,
-    "avg_hit_rate_24h": 0.73
-  }
+  "version": "POC1-v0.3",
+  "model": "claude-3-5-sonnet-20241022"
 }
 {{/code}}

 ---

-== 4. Data Schemas ==
+== 4. AnalysisResult Schema (Context-Aware) ==

-=== 4.1 Stage 1 Output: ClaimExtraction ===
+This schema implements the **Context-Aware Analysis** required by the POC1 specification.

 {{code language="json"}}
 {
-  "job_id": "01J...ULID",
-  "stage": "stage1_extraction",
-  "article_metadata": {
-    "title": "Article title",
-    "source_url": "https://example.com/article",
-    "extracted_text_length": 5234,
-    "language": "en"
-  },
-  "claims": [
-    {
-      "claim_id": "C1",
-      "claim_text": "Original claim text from article",
-      "canonical_claim": "Normalized, deduplicated phrasing",
-      "claim_hash": "sha256:abc123...",
-      "is_central_to_thesis": true,
-      "claim_type": "causal",
-      "evaluability": "evaluable",
-      "risk_tier": "B",
-      "domain": "public_health"
-    }
-  ],
-  "article_thesis": "Main argument detected",
-  "cost": 0.003
-}
-{{/code}}
-
-=== 4.2 Stage 2 Output: ClaimAnalysis (CACHED) ===
-
-This is the CACHEABLE unit. Stored in Redis with 90-day TTL.
-
-{{code language="json"}}
-{
-  "claim_hash": "sha256:abc123...",
-  "canonical_claim": "COVID vaccines are 95% effective",
-  "language": "en",
-  "domain": "public_health",
-  "analysis_version": "v1.0",
-  "scenarios": [
-    {
-      "scenario_id": "S1",
-      "scenario_title": "mRNA vaccines (Pfizer/Moderna) in clinical trials",
-      "definitions": {"95% effective": "95% reduction in symptomatic infection"},
-      "assumptions": ["Based on phase 3 trial data", "Against original strain"],
-      "boundaries": {
-        "time": "2020-2021 trials",
-        "geography": "Multi-country trials",
-        "population": "Adult population (16+)",
-        "conditions": "Before widespread variants"
-      },
-      "verdict": {
-        "label": "Highly Likely",
-        "probability_range": [0.88, 0.97],
-        "confidence": 0.92,
-        "reasoning_chain": [
-          "Pfizer/BioNTech trial: 95% efficacy (n=43,548)",
-          "Moderna trial: 94.1% efficacy (n=30,420)",
-          "Peer-reviewed publications in NEJM",
-          "FDA independent analysis confirmed"
-        ],
-        "key_supporting_evidence_ids": ["E1", "E2"],
-        "key_counter_evidence_ids": ["E3"],
-        "uncertainty_factors": [
-          "Limited data on long-term effectiveness",
-          "Variant-specific performance not yet measured"
-        ]
-      },
-      "evidence": [
-        {
-          "evidence_id": "E1",
-          "stance": "supports",
-          "relevance_to_scenario": 0.98,
-          "evidence_summary": [
-            "Pfizer trial showed 170 cases in placebo vs 8 in vaccine group",
-            "Follow-up period median 2 months post-dose 2",
-            "Efficacy consistent across age, sex, race, ethnicity"
-          ],
-          "citation": {
-            "title": "Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine",
-            "author_or_org": "Polack et al.",
-            "publication_date": "2020-12-31",
-            "url": "https://nejm.org/doi/full/10.1056/NEJMoa2034577",
-            "publisher": "New England Journal of Medicine",
-            "retrieved_at_utc": "2025-12-20T15:30:00Z"
-          },
-          "excerpt": ["The vaccine was 95% effective in preventing Covid-19"],
-          "excerpt_word_count": 9,
-          "source_reliability_score": 0.95,
-          "reliability_justification": "Peer-reviewed, high-impact journal, large RCT",
-          "limitations_and_reservations": [
-            "Short follow-up period (2 months)",
-            "Primarily measures symptomatic infection, not transmission"
-          ],
-          "retraction_or_dispute_signal": "none"
-        }
-      ]
-    }
-  ],
-  "cache_metadata": {
-    "first_analyzed": "2025-12-01T10:00:00Z",
-    "last_updated": "2025-12-20T15:30:00Z",
-    "hit_count": 47,
-    "version": "v1.0",
-    "ttl_expires": "2026-03-20T15:30:00Z"
-  },
-  "cost": 0.081
-}
-{{/code}}
-
-**Cache Key Structure:**
-{{code}}
-Redis Key: claim:v1norm1:{language}:{sha256(canonical_claim)}
-TTL: 90 days (7,776,000 seconds)
-Size: ~15KB JSON (compressed: ~5KB)
-{{/code}}
-
-=== 4.3 Stage 3 Output: HolisticAssessment ===
-
-{{code language="json"}}
-{
-  "job_id": "01J...ULID",
-  "stage": "stage3_holistic",
-  "article_metadata": {
-    "title": "...",
-    "main_thesis": "...",
-    "source_url": "..."
-  },
-  "article_holistic_assessment": {
-    "overall_verdict": "MISLEADING",
-    "logic_quality_score": 0.42,
-    "fallacies_detected": [
-      "correlation-causation",
-      "cherry-picking"
-    ],
-    "verdict_reasoning": [
-      "Central claim C1 is REFUTED by multiple systematic reviews",
-      "Supporting claims C2-C4 are TRUE but do not support the thesis",
-      "Article commits correlation-causation fallacy",
-      "Selective citation of evidence (cherry-picking detected)"
-    ],
-    "experimental_feature": true
-  },
-  "claims_summary": [
-    {
-      "claim_id": "C1",
-      "is_central_to_thesis": true,
-      "verdict": "Refuted",
-      "confidence": 0.89,
-      "source": "cache",
-      "cache_hit": true
-    },
-    {
-      "claim_id": "C2",
-      "is_central_to_thesis": false,
-      "verdict": "Highly Likely",
-      "confidence": 0.91,
-      "source": "new_analysis",
-      "cache_hit": false
-    }
-  ],
-  "quality_gates": {
-    "gate1_claim_validation": "pass",
-    "gate4_verdict_confidence": "pass",
-    "passed_all": true
-  },
-  "cost": 0.030,
-  "total_job_cost": 0.114
-}
-{{/code}}
-
-=== 4.4 Complete AnalysisResult (All 3 Stages Combined) ===
-
-{{code language="json"}}
-{
   "metadata": {
-    "job_id": "01J...ULID",
-    "timestamp_utc": "2025-12-24T10:31:30Z",
-    "engine_version": "POC1-v0.4",
-    "llm_stage1": "claude-haiku-4",
-    "llm_stage2": "claude-3-5-sonnet-20241022",
-    "llm_stage3": "claude-3-5-sonnet-20241022",
+    "job_id": "string (ULID)",
+    "timestamp_utc": "ISO8601",
+    "engine_version": "POC1-v0.3",
+    "llm_provider": "anthropic",
+    "llm_model": "claude-3-5-sonnet-20241022",
     "usage_stats": {
-      "stage1_tokens": {"input": 10000, "output": 500},
-      "stage2_tokens": {"input": 2000, "output": 5000},
-      "stage3_tokens": {"input": 5000, "output": 1000},
-      "total_input_tokens": 17000,
-      "total_output_tokens": 6500,
-      "estimated_cost_usd": 0.114,
-      "response_time_sec": 45.2
-    },
-    "cache_stats": {
-      "claims_total": 5,
-      "claims_from_cache": 4,
-      "claims_new_analysis": 1,
-      "cache_hit_rate": 0.80,
-      "cache_savings_usd": 0.324
+      "input_tokens": "integer",
+      "output_tokens": "integer",
+      "estimated_cost_usd": "float",
+      "response_time_sec": "float"
     }
   },
   "article_holistic_assessment": {
-    "main_thesis": "...",
-    "overall_verdict": "MISLEADING",
-    "logic_quality_score": 0.42,
-    "fallacies_detected": ["correlation-causation", "cherry-picking"],
-    "verdict_reasoning": ["...", "...", "..."],
+    "main_thesis": "string (The core argument detected)",
+    "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN",
+    "logic_quality_score": "float (0-1)",
+    "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"],
+    "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)",
     "experimental_feature": true
   },
   "claims": [
     {
       "claim_id": "C1",
-      "is_central_to_thesis": true,
-      "claim_text": "...",
-      "canonical_claim": "...",
-      "claim_hash": "sha256:abc123...",
-      "claim_type": "causal",
-      "evaluability": "evaluable",
-      "risk_tier": "B",
-      "source": "cache",
-      "cached_at": "2025-12-20T15:30:00Z",
-      "cache_hit_count": 47,
-      "scenarios": [...]
-    },
-    {
-      "claim_id": "C2",
-      "source": "new_analysis",
-      "analyzed_at": "2025-12-24T10:31:15Z",
-      "scenarios": [...]
265 + "is_central_to_thesis": "boolean", 266 + "claim_text": "string", 267 + "canonical_form": "string", 268 + "claim_type": "descriptive | causal | predictive | normative | definitional", 269 + "evaluability": "evaluable | partly_evaluable | not_evaluable", 270 + "risk_tier": "A | B | C", 271 + "risk_tier_justification": "string", 272 + "domain": "string (e.g., 'public health', 'economics')", 273 + "key_terms": ["term1", "term2"], 274 + "entities": ["Person X", "Org Y"], 275 + "time_scope_detected": "2020-2024", 276 + "geography_scope_detected": "Brazil", 277 + "scenarios": [ 278 + { 279 + "scenario_id": "C1-S1", 280 + "context_title": "string", 281 + "definitions": {"key_term": "definition"}, 282 + "assumptions": ["Assumption 1", "Assumption 2"], 283 + "boundaries": { 284 + "time": "as of 2025-01", 285 + "geography": "Brazil", 286 + "population": "adult population", 287 + "conditions": "excludes X; includes Y" 288 + }, 289 + "scope_of_evidence": "What counts as evidence for this scenario", 290 + "scenario_questions": ["Question that decides the verdict"], 291 + "verdict": { 292 + "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated", 293 + "probability_range": [0.0, 1.0], 294 + "confidence": "float (0-1)", 295 + "reasoning": "string", 296 + "key_supporting_evidence_ids": ["E1", "E3"], 297 + "key_counter_evidence_ids": ["E2"], 298 + "uncertainty_factors": ["Data gap", "Method disagreement"], 299 + "what_would_change_my_mind": ["Specific new study", "Updated dataset"] 300 + }, 301 + "evidence": [ 302 + { 303 + "evidence_id": "E1", 304 + "stance": "supports | undermines | mixed | context_dependent", 305 + "relevance_to_scenario": "float (0-1)", 306 + "evidence_summary": ["Bullet fact 1", "Bullet fact 2"], 307 + "citation": { 308 + "title": "Source title", 309 + "author_or_org": "Org/Author", 310 + "publication_date": "2024-05-01", 311 + "url": "https://source.example", 312 + "publisher": "Publisher/Domain" 313 + }, 314 + "excerpt": 
["Short quote ≤25 words (optional)"], 315 + "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT", 316 + "reliability_justification": "Why high/medium/low", 317 + "limitations_and_reservations": ["Limitation 1", "Limitation 2"], 318 + "retraction_or_dispute_signal": "none | correction | retraction | disputed", 319 + "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED" 320 + } 321 + ] 322 + } 323 + ] 745 745 } 746 746 ], 747 747 "quality_gates": { 748 - "gate1_claim_validation": "pass", 749 - "gate4_verdict_confidence": "pass", 750 - "passed_all": true 327 + "gate1_claim_validation": "pass | fail", 328 + "gate4_verdict_confidence": "pass | fail", 329 + "passed_all": "boolean", 330 + "gate_fail_reasons": [ 331 + { 332 + "gate": "gate1_claim_validation", 333 + "claim_id": "C1", 334 + "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE", 335 + "explanation": "Human-readable explanation" 336 + } 337 + ] 338 + }, 339 + "global_notes": { 340 + "limitations": ["System limitation 1", "Limitation 2"], 341 + "safety_or_policy_notes": ["Note 1"] 751 751 } 752 752 } 753 753 {{/code}} 754 754 346 +=== 4.1 Risk Tier Definitions === 755 755 348 +|=Tier|=Impact|=Examples|=Actions 349 +|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required) 350 +|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated) 351 +|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated) 756 756 757 -=== 4. 
5 Verdict Label Taxonomy === 353 +=== 4.2 Source Reliability (Read-Only Snapshots) === 758 758 759 -FactHarbor uses **three distinct verdict taxonomies** depending on analysis level: 355 +**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job. 760 760 761 -==== 4.5.1 Scenario Verdict Labels (Stage 2) ==== 357 +* POC1 treats these scores as **read-only** (no modification during analysis) 358 +* **Prevents circular dependency:** scoring → affects retrieval → affects scoring 359 +* Full Source Track Record System is a **separate service** (not part of POC1) 360 +* **Temporal separation:** Scoring runs weekly; analysis uses snapshots 762 762 763 -Used for individual scenario verdicts within a claim. 362 +**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm. 764 764 765 -**Enum Values:** 766 -* {{code}}Highly Likely{{/code}} - Probability 0.85-1.0, high confidence 767 -* {{code}}Likely{{/code}} - Probability 0.65-0.84, moderate-high confidence 768 -* {{code}}Unclear{{/code}} - Probability 0.35-0.64, or low confidence 769 -* {{code}}Unlikely{{/code}} - Probability 0.16-0.34, moderate-high confidence 770 -* {{code}}Highly Unlikely{{/code}} - Probability 0.0-0.15, high confidence 771 -* {{code}}Unsubstantiated{{/code}} - Insufficient evidence to determine probability 364 +=== 4.3 Quality Gate Reason Codes === 772 772 773 -==== 4.5.2 Claim Verdict Labels (Rollup) ==== 366 +**Gate 1 (Claim Validation):** 367 +* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor 368 +* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement 369 +* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact 370 +* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation 774 774 775 -Used when summarizing a claim across all scenarios.
372 +**Gate 4 (Verdict Confidence):** 373 +* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5) 374 +* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict 375 +* {{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution 376 +* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed 776 776 777 -**Enum Values:** 778 -* {{code}}Supported{{/code}} - Majority of scenarios are Likely or Highly Likely 779 -* {{code}}Refuted{{/code}} - Majority of scenarios are Unlikely or Highly Unlikely 780 -* {{code}}Inconclusive{{/code}} - Mixed scenarios or majority Unclear/Unsubstantiated 378 +**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve) 781 781 782 -**Mapping Logic:** 783 -* If ≥60% scenarios are (Highly Likely | Likely) → Supported 784 -* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted 785 -* Otherwise → Inconclusive 380 +--- 786 786 787 -==== 4.5.3 Article Verdict Labels (Stage 3) ==== 382 +== 5. Validation Rules (POC1 Enforcement) == 788 788 789 -Used for holistic article-level assessment. 384 +|=Rule|=Requirement 385 +|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}. 386 +|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging. 387 +|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content.
388 +|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions. 790 790 791 -**Enum Values:** 792 -* {{code}}WELL-SUPPORTED{{/code}} - Article thesis logically follows from supported claims 793 -* {{code}}MISLEADING{{/code}} - Claims may be true but article commits logical fallacies 794 -* {{code}}REFUTED{{/code}} - Central claims are refuted, invalidating thesis 795 -* {{code}}UNCERTAIN{{/code}} - Insufficient evidence or highly mixed claim verdicts 390 +--- 796 796 797 -**Note:** Article verdict considers **claim centrality** (central claims override supporting claims). 392 +== 6. Deterministic Markdown Template == 798 798 799 -==== 4.5.4 API Field Mapping ==== 394 +The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM). 800 800 801 -|=Level|=API Field|=Enum Name 802 -|Scenario|{{code}}scenarios[].verdict.label{{/code}}|scenario_verdict_label 803 -|Claim|{{code}}claims[].rollup_verdict{{/code}} (optional)|claim_verdict_label 804 -|Article|{{code}}article_holistic_assessment.overall_verdict{{/code}}|article_verdict_label 396 +{{code language="markdown"}} 397 +# FactHarbor Analysis Report: {overall_verdict} 805 805 399 +**Job ID:** {job_id} | **Generated:** {timestamp_utc} 400 +**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s 806 806 807 807 --- 808 808 809 -== 5. Cache Architecture == 404 +## 1. 
Holistic Assessment (Experimental) 810 810 811 -=== 5.1 Redis Cache Design === 406 +**Main Thesis:** {main_thesis} 812 812 813 -**Technology:** Redis 7.0+ (in-memory key-value store) 408 +**Overall Verdict:** {overall_verdict} 814 814 815 -**Cache Key Schema:** 816 -{{code}} 817 -claim:v1norm1:{language}:{sha256(canonical_claim)} 818 -{{/code}} 410 +**Logic Quality Score:** {logic_quality_score}/1.0 819 819 820 -**Example:** 821 -{{code}} 822 -Claim (English): "COVID vaccines are 95% effective" 823 -Canonical: "covid vaccines are 95 percent effective" 824 -Language: "en" 825 -SHA256: abc123...def456 826 -Key: claim:v1norm1:en:abc123...def456 827 -{{/code}} 412 +**Fallacies Detected:** {fallacies_detected} 828 828 829 -**Rationale:** Prevents cross-language collisions and enables per-language cache analytics. 414 +**Reasoning:** {verdict_reasoning} 830 830 831 -**Data Structure:** 832 -{{code language="redis"}} 833 -SET claim:v1:abc123...def456 '{...ClaimAnalysis JSON...}' 834 -EXPIRE claim:v1:abc123...def456 7776000 # 90 days 835 -{{/code}} 836 - 837 -**Additional Keys:** 838 -{{code}} 839 - 840 -==== 5.1.1 Canonical Claim Normalization (v1) ==== 841 - 842 -The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly. 843 - 844 -**Algorithm: Canonical Claim Normalization v1** 845 - 846 -{{code language="python"}} 847 -def normalize_claim_v1(claim_text: str, language: str) -> str: 848 - """ 849 - Normalizes claim to canonical form for cache key generation. 
850 - Version: v1norm1 (POC1) 851 - """ 852 - import re 853 - import unicodedata 854 - 855 - # Step 1: Unicode normalization (NFC) 856 - text = unicodedata.normalize('NFC', claim_text) 857 - 858 - # Step 2: Lowercase 859 - text = text.lower() 860 - 861 - # Step 3: Remove punctuation (except hyphens in words) 862 - text = re.sub(r'[^\w\s-]', '', text) 863 - 864 - # Step 4: Normalize whitespace (collapse multiple spaces) 865 - text = re.sub(r'\s+', ' ', text).strip() 866 - 867 - # Step 5: Numeric normalization 868 - text = text.replace('%', ' percent') 869 - # Spell out single-digit numbers 870 - num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three', 871 - '4':'four', '5':'five', '6':'six', '7':'seven', 872 - '8':'eight', '9':'nine'} 873 - for num, word in num_to_word.items(): 874 - text = re.sub(rf'\b{num}\b', word, text) 875 - 876 - # Step 6: Common abbreviations (English only in v1) 877 - if language == 'en': 878 - text = text.replace('covid-19', 'covid') 879 - text = text.replace('u.s.', 'us') 880 - text = text.replace('u.k.', 'uk') 881 - 882 - # Step 7: NO entity normalization in v1 883 - # (Trump vs Donald Trump vs President Trump remain distinct) 884 - 885 - return text 886 - 887 -# Version identifier (include in cache namespace) 888 -CANONICALIZER_VERSION = "v1norm1" 889 -{{/code}} 890 - 891 -**Cache Key Formula (Updated):** 892 - 893 -{{code}} 894 -language = "en" 895 -canonical = normalize_claim_v1(claim_text, language) 896 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}" 897 - 898 -Example: 899 - claim: "COVID-19 vaccines are 95% effective" 900 - canonical: "covid vaccines are 95 percent effective" 901 - sha256: abc123...def456 902 - key: "claim:v1norm1:en:abc123...def456" 903 -{{/code}} 904 - 905 -**Cache Metadata MUST Include:** 906 - 907 -{{code language="json"}} 908 -{ 909 - "canonical_claim": "covid vaccines are 95 percent effective", 910 - "canonicalizer_version": "v1norm1", 911 - "language": "en", 912 - 
"original_claim_samples": ["COVID-19 vaccines are 95% effective"] 913 -} 914 -{{/code}} 915 - 916 -**Version Upgrade Path:** 917 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL 918 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches 919 - 920 - 921 -claim:stats:hit_count:{claim_hash} # Counter 922 -claim:index:domain:{domain} # Set of claim hashes by domain 923 -claim:index:language:{lang} # Set of claim hashes by language 924 -{{/code}} 925 - 926 - 927 -=== 5.1.1 Canonical Claim Normalization (v1) === 928 - 929 -The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly. 930 - 931 -**Algorithm: Canonical Claim Normalization v1** 932 - 933 -{{code language="python"}} 934 -def normalize_claim_v1(claim_text: str, language: str) -> str: 935 - """ 936 - Normalizes claim to canonical form for cache key generation. 937 - Version: v1norm1 (POC1) 938 - """ 939 - import re 940 - import unicodedata 941 - 942 - # Step 1: Unicode normalization (NFC) 943 - text = unicodedata.normalize('NFC', claim_text) 944 - 945 - # Step 2: Lowercase 946 - text = text.lower() 947 - 948 - # Step 3: Remove punctuation (except hyphens in words) 949 - text = re.sub(r'[^\w\s-]', '', text) 950 - 951 - # Step 4: Normalize whitespace (collapse multiple spaces) 952 - text = re.sub(r'\s+', ' ', text).strip() 953 - 954 - # Step 5: Numeric normalization 955 - text = text.replace('%', ' percent') 956 - # Spell out single-digit numbers 957 - num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three', 958 - '4':'four', '5':'five', '6':'six', '7':'seven', 959 - '8':'eight', '9':'nine'} 960 - for num, word in num_to_word.items(): 961 - text = re.sub(rf'\b{num}\b', word, text) 962 - 963 - # Step 6: Common abbreviations (English only in v1) 964 - if language == 'en': 965 - text = text.replace('covid-19', 'covid') 966 - text = text.replace('u.s.', 'us') 967 - text = text.replace('u.k.', 'uk') 968 - 969 - # Step 
7: NO entity normalization in v1 970 - # (Trump vs Donald Trump vs President Trump remain distinct) 971 - 972 - return text 973 - 974 -# Version identifier (include in cache namespace) 975 -CANONICALIZER_VERSION = "v1norm1" 976 -{{/code}} 977 - 978 -**Cache Key Formula (Updated):** 979 - 980 -{{code}} 981 -language = "en" 982 -canonical = normalize_claim_v1(claim_text, language) 983 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}" 984 - 985 -Example: 986 - claim: "COVID-19 vaccines are 95% effective" 987 - canonical: "covid vaccines are 95 percent effective" 988 - sha256: abc123...def456 989 - key: "claim:v1norm1:en:abc123...def456" 990 -{{/code}} 991 - 992 -**Cache Metadata MUST Include:** 993 - 994 -{{code language="json"}} 995 -{ 996 - "canonical_claim": "covid vaccines are 95 percent effective", 997 - "canonicalizer_version": "v1norm1", 998 - "language": "en", 999 - "original_claim_samples": ["COVID-19 vaccines are 95% effective"] 1000 -} 1001 -{{/code}} 1002 - 1003 -**Version Upgrade Path:** 1004 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL 1005 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches 1006 - 1007 - 1008 - 1009 -=== 5.1.2 Copyright & Data Retention Policy === 1010 - 1011 -**Evidence Excerpt Storage:** 1012 - 1013 -To comply with copyright law and fair use principles: 1014 - 1015 -**What We Store:** 1016 -* **Metadata only:** Title, author, publisher, URL, publication date 1017 -* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item 1018 -* **Summaries:** AI-generated bullet points (not verbatim text) 1019 -* **No full articles:** Never store complete article text beyond job processing 1020 - 1021 -**Total per Cached Claim:** 1022 -* Scenarios: 2 per claim 1023 -* Evidence items: 6 per scenario (12 total) 1024 -* Quotes: 3 per evidence × 25 words = 75 words per item 1025 -* **Maximum stored verbatim text:** ~900 words per claim (12 × 75) 1026 - 1027 
-**Retention:** 1028 -* Cache TTL: 90 days 1029 -* Job outputs: 24 hours (then archived or deleted) 1030 -* No persistent full-text article storage 1031 - 1032 -**Rationale:** 1033 -* Short excerpts for citation = fair use 1034 -* Summaries are transformative (not copyrightable) 1035 -* Limited retention (90 days max) 1036 -* No commercial republication of excerpts 1037 - 1038 -**DMCA Compliance:** 1039 -* Cache invalidation endpoint available for rights holders 1040 -* Contact: dmca@factharbor.org 1041 - 1042 - 1043 -=== 5.2 Cache Invalidation Strategy === 1044 - 1045 -**Time-Based (Primary):** 1046 -* TTL: 90 days for most claims 1047 -* Reasoning: Evidence freshness, news cycles 1048 - 1049 -**Event-Based (Manual):** 1050 -* Admin can flag claims for invalidation 1051 -* Example: "Major study retracts findings" 1052 -* Tool: {{code}}DELETE /v1/cache/claim/{claim_hash}?reason=retraction{{/code}} 1053 - 1054 -**Version-Based (Automatic):** 1055 -* AKEL v2.0 release → Invalidate all v1.0 caches 1056 -* Cache keys include version: {{code}}claim:v1:*{{/code}} vs {{code}}claim:v2:*{{/code}} 1057 - 1058 -**Long-Lived Historical Claims:** 1059 -* Historical claims about completed events generally have stable verdicts 1060 -* Example: "2024 US presidential election results" 1061 -* **Policy:** Extended TTL (365-3,650 days) instead of "never invalidate" 1062 -* **Reason:** Even historical data gets revisions (updated counts, corrections) 1063 -* **Mechanism:** Admin can still manually invalidate if major correction issued 1064 -* **Flag:** {{code}}is_historical=true{{/code}} in cache metadata → longer TTL 1065 - 1066 -=== 5.3 Cache Warming Strategy === 1067 - 1068 -**Proactive Cache Building (Future):** 1069 - 1070 -**Trending Topics:** 1071 -* Monitor news APIs for trending topics 1072 -* Pre-analyze top 20 common claims 1073 -* Example: New health study published → Pre-cache related claims 1074 - 1075 -**Predictable Events:** 1076 -* Elections, sporting events, earnings 
reports 1077 -* Pre-cache expected claims before event 1078 -* Reduces load during traffic spikes 1079 - 1080 -**User Patterns:** 1081 -* Analyze query logs 1082 -* Identify frequently requested claims 1083 -* Prioritize cache warming for these 1084 - 1085 1085 --- 1086 1086 1087 -== 6. Quality Gates & Validation Rules == 418 +## 2. Key Claims Analysis 1088 1088 1089 -=== 6.1 Quality Gate Overview === 420 +### [C1] {claim_text} 421 +* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"} 422 +* **Risk Tier:** {risk_tier} ({risk_tier_justification}) 423 +* **Evaluability:** {evaluability} 1090 1090 1091 -|=Gate|=Name|=POC1 Status|=Applies To|=Notes 1092 -|**Gate 1**|Claim Validation|✅ Hard gate|Stage 1: Extraction|Filters opinions, compound claims 1093 -|**Gate 2**|Contradiction Search|✅ Mandatory rule|Stage 2: Analysis|Enforced per cached claim 1094 -|**Gate 3**|Uncertainty Disclosure|⚠️ Soft guidance|Stage 2: Analysis|Best practice 1095 -|**Gate 4**|Verdict Confidence|✅ Hard gate|Stage 2: Analysis|Confidence ≥ 0.5 required 425 +**Scenarios Explored:** {scenarios.length} 1096 1096 1097 -**Hard Gate Failures:** 1098 -* Gate 1 fail → Claim excluded from analysis 1099 -* Gate 4 fail → Claim marked "Unsubstantiated" but included 427 +#### Scenario: {scenario.context_title} 428 +* **Verdict:** {verdict.label} (Confidence: {verdict.confidence}) 429 +* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]} 430 +* **Reasoning:** {verdict.reasoning} 1100 1100 1101 -=== 6.2 Validation Rules === 432 +**Evidence:** 433 +* Supporting: {evidence.filter(e => e.stance == "supports").length} sources 434 +* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources 435 +* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources 1102 1102 1103 -|=Rule|=Requirement 1104 -|**Mandatory Contradiction**|Stage 2 MUST search for "undermines" evidence. 
If none found, reasoning must state: "No counter-evidence found despite targeted search." 1105 -|**Context-Aware Logic**|Stage 3 must prioritize central claims. If {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, article cannot be WELL-SUPPORTED. 1106 -|**Cache Consistency**|Cached claims must match current AKEL version. Version mismatch → cache miss. 1107 -|**Author Identification**|All outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}}. 437 +**Key Evidence:** 438 +* [{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance} 1108 1108 1109 1109 --- 1110 1110 1111 -== 7. Deterministic Markdown Template == 442 +## 3. Quality Assessment 1112 1112 1113 -Report generation uses **fixed template** (not LLM-generated). 444 +**Quality Gates:** 445 +* Gate 1 (Claim Validation): {gate1_claim_validation} 446 +* Gate 4 (Verdict Confidence): {gate4_verdict_confidence} 447 +* Overall: {passed_all ? "PASS" : "FAIL"} 1114 1114 1115 -**Cache-Only Mode Template:** 1116 -{{code language="markdown"}} 1117 -# FactHarbor Analysis Report: PARTIAL ANALYSIS 449 +{if gate_fail_reasons.length > 0} 450 +**Failed Gates:** 451 +{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)} 452 +{/if} 1118 1118 1119 -**Job ID:** {job_id} | **Generated:** {timestamp_utc} 1120 -**Mode:** Cache-Only (Free Tier) 1121 - 1122 1122 --- 1123 1123 1124 -## ⚠️ Partial Analysis Notice 456 +## 4. 
Limitations & Disclaimers 1125 1125 1126 -This is a **cache-only analysis** based on previously analyzed claims. 1127 -{cache_coverage_percent}% of claims were available in cache. 458 +**System Limitations:** 459 +{limitations.map(l => `* ${l}`)} 1128 1128 1129 -**What's Included:** 1130 -* {claims_cached} of {claims_total} claims analyzed 1131 -* Evidence and verdicts from cache (last updated: {oldest_cache_date}) 461 +**Important Notes:** 462 +* This analysis is AI-generated and experimental (POC1) 463 +* Context-aware article verdict is being tested for accuracy 464 +* Human review recommended for high-risk claims (Tier A) 465 +* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens} 1132 1132 1133 -**What's Missing:** 1134 -* {claims_missing} claims require new analysis 1135 -* Full article holistic assessment unavailable 1136 -* Estimated cost to complete: ${cost_to_complete} 467 +**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis. 1137 1137 1138 -**[Upgrade to Pro]** for complete analysis 1139 - 1140 1140 --- 1141 1141 1142 -## Cached Claims 471 +*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)* 472 +{{/code}} 1143 1143 1144 -### [C1] {claim_text} ✅ From Cache 1145 -* **Cached:** {cached_at} ({cache_age} ago) 1146 -* **Times Used:** {hit_count} articles 1147 -* **Verdict:** {verdict} (Confidence: {confidence}) 1148 -* **Evidence:** {evidence_count} sources 474 +**Target Report Size:** 220-350 words (optimized for 2-minute read) 1149 1149 1150 -[Full claim details...] 1151 - 1152 -### [C3] {claim_text} ⚠️ Not In Cache 1153 -* **Status:** Requires new analysis 1154 -* **Cost:** $0.081 1155 -* **Upgrade to analyze this claim** 1156 - 1157 1157 --- 1158 1158 1159 -**Powered by FactHarbor POC1-v0.4** | [Upgrade](https://factharbor.org/upgrade) 1160 -{{/code}} 478 +== 7. 
LLM Configuration (POC1) == 1161 1161 1162 ---- 480 +|=Parameter|=Value|=Notes 481 +|**Provider**|Anthropic|Primary provider for POC1 482 +|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model 483 +|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports) 484 +|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length) 485 +|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output) 486 +|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation 487 +|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment 488 +|**Few-Shot Examples**|Yes|For claim extraction and scenario generation 1163 1163 1164 -== 8. LLM Configuration (3-Stage) == 490 +=== 7.1 Token Budgets by Stage === 1165 1165 1166 -=== 8.1 Stage 1: Claim Extraction (Haiku) === 492 +|=Stage|=Approximate Output Tokens 493 +|Claim Extraction|~4,000 (10 claims × ~400 tokens) 494 +|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens) 495 +|Evidence Synthesis|~2,000 per scenario 496 +|Verdict Generation|~1,000 per scenario 497 +|Holistic Assessment|~500 (context-aware summary) 1167 1167 1168 -|=Parameter|=Value|=Notes 1169 -|**Model**|{{code}}claude-haiku-4-20250108{{/code}}|Fast, cheap, sufficient for extraction 1170 -|**Input Tokens**|~10K|Article text after URL extraction 1171 -|**Output Tokens**|~500|5 claims @ ~100 tokens each 1172 -|**Cost**|$0.003 per article|($0.25/M input + $1.25/M output) 1173 -|**Temperature**|0.0|Deterministic 1174 -|**Max Tokens**|1000|Generous buffer 499 +**Total:** 50K-80K tokens per article (input + output) 1175 1175 1176 -**Prompt Strategy:** 1177 -* Extract 5 verifiable factual claims 1178 -* Mark central vs. 
supporting claims 1179 -* Canonicalize (normalize phrasing) 1180 -* Deduplicate similar claims 1181 -* Output structured JSON only 501 +=== 7.2 API Integration === 1182 1182 1183 -=== 8.2 Stage 2: Claim Analysis (Sonnet, CACHED) === 503 +**Anthropic Messages API:** 504 +* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}} 505 +* Authentication: API key via {{code}}x-api-key{{/code}} header 506 +* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}} 507 +* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage) 1184 1184 1185 -|=Parameter|=Value|=Notes 1186 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|High quality for verdicts 1187 -|**Input Tokens**|~2K|Single claim + prompt + context 1188 -|**Output Tokens**|~5K|2 scenarios × ~2.5K tokens 1189 -|**Cost**|$0.081 per NEW claim|($3/M input + $15/M output) 1190 -|**Temperature**|0.0|Deterministic (cache consistency) 1191 -|**Max Tokens**|8000|Sufficient for 2 scenarios 1192 -|**Cache Strategy**|Redis, 90-day TTL|Key: {{code}}claim:v1norm1:{language}:{sha256(canonical_claim)}{{/code}} 509 +**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice. 1193 1193 1194 -**Prompt Strategy:** 1195 -* Generate 2 scenario interpretations 1196 -* Search for supporting AND undermining evidence (mandatory) 1197 -* 6 evidence items per scenario maximum 1198 -* Compute verdict with reasoning chain (3-4 bullets) 1199 -* Output structured JSON only 511 +--- 1200 1200 1201 -**Output Constraints (Cost Control):** 1202 -* Scenarios: Max 2 per claim 1203 -* Evidence: Max 6 per scenario 1204 -* Evidence summary: Max 3 bullets 1205 -* Reasoning chain: Max 4 bullets 513 +== 8. 
Cross-References (xWiki) == 1206 1206 1207 -=== 8.3 Stage 3: Holistic Assessment (Sonnet) === 515 +This API specification implements requirements from: 1208 1208 1209 -|=Parameter|=Value|=Notes 1210 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Context-aware analysis 1211 -|**Input Tokens**|~5K|Article + claim verdicts 1212 -|**Output Tokens**|~1K|Article verdict + fallacies 1213 -|**Cost**|$0.030 per article|($3/M input + $15/M output) 1214 -|**Temperature**|0.0|Deterministic 1215 -|**Max Tokens**|2000|Sufficient for assessment 517 +* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]** 518 +** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements) 519 +** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only) 520 +** Section 2.1: Analysis Summary (Context-Aware) component specification 521 +** Section 10.3: Prompt structure for claim extraction and verdict synthesis 1216 1216 1217 -**Prompt Strategy:** 1218 -* Detect main thesis 1219 -* Evaluate logical coherence (claim verdicts → thesis) 1220 -* Identify fallacies (correlation-causation, cherry-picking, etc.) 
1221 -* Compute logic_quality_score 1222 -* Explain article verdict reasoning (3-4 bullets) 1223 -* Output structured JSON only 523 +* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]** 524 +** Complete investigation of 7 approaches to article-level verdicts 525 +** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1 526 +** Experimental feature testing plan (30 articles, ≥70% accuracy target) 527 +** Decision framework for POC2 implementation 1224 1224 1225 -=== 8.4 Cost Projections by Cache Hit Rate === 529 +* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]** 530 +** FR4 (Analysis Summary) - enhanced with context-aware capability 531 +** FR7 (Verdict Calculation) - probability ranges + confidence scores 532 +** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2 1226 1226 1227 -|=Cache Hit Rate|=Cost per Article|=10K Articles Cost|=100K Articles Cost 1228 -|0% (cold start)|$0.438|$4,380|$43,800 1229 -|20%|$0.357|$3,570|$35,700 1230 -|40%|$0.276|$2,760|$27,600 1231 -|**60%**|**$0.195**|**$1,950**|**$19,500** 1232 -|**70%** (target)|**$0.155**|**$1,550**|**$15,500** 1233 -|**80%**|**$0.114**|**$1,140**|**$11,400** 1234 -|**90%**|**$0.073**|**$730**|**$7,300** 1235 -|95%|$0.053|$530|$5,300 534 +* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]** 535 +** POC1 simplified architecture (stateless, single AKEL orchestration call) 536 +** Data persistence minimized (job outputs only, no database required) 537 +** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify) 1236 1236 1237 -**Break-Even Analysis:** 1238 -* Monolithic (v0.3.1): $0.15 per article constant 1239 -* 3-stage breaks even at **70% cache hit rate** 1240 -* Expected after ~1,500 articles in same domain 539 +* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]** 540 +** Evidence structure (source, stance, reliability rating) 541 +** Scenario 
boundaries (time, geography, population, conditions) 542 +** Claim types and evaluability taxonomy 543 +** Source Track Record System (Section 1.3) - temporal separation 1241 1241 545 +* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]** 546 +** POC1 requirement mappings and phase assignments 547 +** Context-aware analysis as POC1 experimental feature 548 +** POC2 enhancement path (Gates 2 & 3, evidence deduplication) 549 + 1242 1242 --- 1243 1243 1244 -== 9. Implementation Notes == 552 +== 9. Implementation Notes (POC1) == 1245 1245 1246 1246 === 9.1 Recommended Tech Stack === 1247 1247 1248 -* **Framework:** Next.js 14+ with App Router (TypeScript) 1249 -* **Cache:** Redis 7.0+ (managed: AWS ElastiCache, Redis Cloud, Upstash) 1250 -* **Storage:** Filesystem JSON for jobs + S3/R2 for archival 1251 -* **Queue:** BullMQ with Redis (for 3-stage pipeline orchestration) 1252 -* **LLM Client:** Anthropic Python SDK or TypeScript SDK 1253 -* **Cost Tracking:** PostgreSQL for user credit ledger 1254 -* **Deployment:** Vercel (frontend + API) + Redis Cloud 556 +* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase 557 +* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure 558 +* **Storage:** Filesystem JSON files (no database needed for POC1) 559 +* **Queue:** In-memory queue or Redis (optional for concurrency) 560 +* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback) 561 +* **Deployment:** Vercel, AWS Lambda, or similar serverless 1255 1255 1256 -=== 9.2 3-Stage Pipeline Implementation === 563 +=== 9.2 POC1 Simplifications === 1257 1257 1258 -**Job Queue Flow (Conceptual):** 565 +* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}}) 566 +* **No user authentication:** Optional API key validation only (env var: 
{{code}}FACTHARBOR_API_KEY{{/code}}) 567 +* **Single-instance deployment:** No distributed processing, no worker pools 568 +* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning) 569 +* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}}) 570 +* **Rate limiting:** Simple IP-based (optional) - no complex billing 1259 1259 1260 -{{code language="typescript"}} 1261 -// Stage 1: Extract Claims 1262 -const stage1Job = await queue.add('stage1-extract-claims', { 1263 - jobId: 'job123', 1264 - articleUrl: 'https://example.com/article' 1265 -}); 572 +=== 9.3 Estimated Costs (Per Analysis) === 1266 1266 1267 -// On Stage 1 completion → enqueue Stage 2 jobs 1268 -stage1Job.on('completed', async (result) => { 1269 - const { claims } = result; 1270 - 1271 - // Stage 2: Analyze each claim (with cache check) 1272 - const stage2Jobs = await Promise.all( 1273 - claims.map(claim => 1274 - queue.add('stage2-analyze-claim', { 1275 - jobId: 'job123', 1276 - claimId: claim.claim_id, 1277 - canonicalClaim: claim.canonical_claim, 1278 - checkCache: true 1279 - }) 1280 - ) 1281 - ); 1282 - 1283 - // On all Stage 2 completions → enqueue Stage 3 1284 - await Promise.all(stage2Jobs.map(j => j.waitUntilFinished())); 1285 - 1286 - const claimVerdicts = await gatherStage2Results('job123'); 1287 - 1288 - await queue.add('stage3-holistic', { 1289 - jobId: 'job123', 1290 - articleUrl: 'https://example.com/article', 1291 - claimVerdicts: claimVerdicts 1292 - }); 1293 -}); 1294 -{{/code}} 574 +**LLM API costs (Claude 3.5 Sonnet):** 575 +* Input: $3.00 per million tokens 576 +* Output: $15.00 per million tokens 577 +* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical) 1295 1295 1296 -**Note:** This is a conceptual sketch. Actual implementation may use BullMQ Flow API or custom orchestration. 
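The per-token prices listed for Claude 3.5 Sonnet turn into a per-article figure by simple arithmetic. The following is a minimal sketch of that calculation, not part of the specification: the helper name `estimate_llm_cost_usd` is hypothetical, and the 40K input / 10K output split is an assumed example at the low end of the 50K-80K token budget.

```python
# Prices quoted in the spec for Claude 3.5 Sonnet (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_llm_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated LLM cost in USD for one analysis.

    Hypothetical helper for illustration; not part of the FactHarbor API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    # Round to a hundredth of a cent to avoid floating-point noise.
    return round(input_cost + output_cost, 4)


if __name__ == "__main__":
    # Assumed split: 40K input + 10K output tokens (50K total).
    print(estimate_llm_cost_usd(40_000, 10_000))
```

With that assumed split the estimate is about $0.27, inside the $0.10-0.30 range quoted per article. Output tokens account for more than half of the cost although they are only a fifth of the volume, so the per-stage output limits are the main lever on cost.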
579 +**Web search costs (optional):** 580 +* Using external search API (Tavily, Brave): $0.01-0.05 per analysis 581 +* POC1 can use free search APIs initially 1297 1297 1298 -**Cache Check Logic:** 1299 -{{code language="typescript"}} 1300 -async function analyzeClaimWithCache(claim: string): Promise<ClaimAnalysis> { 1301 - const canonicalClaim = normalizeClaim(claim); 1302 - const claimHash = sha256(canonicalClaim); 1303 - const cacheKey = `claim:v1:${claimHash}`; 1304 - 1305 - // Check cache 1306 - const cached = await redis.get(cacheKey); 1307 - if (cached) { 1308 - await redis.incr(`claim:stats:hit_count:${claimHash}`); 1309 - return JSON.parse(cached); 1310 - } 1311 - 1312 - // Cache miss - analyze with LLM 1313 - const analysis = await analyzeClaim_Stage2(canonicalClaim); 1314 - 1315 - // Store in cache 1316 - await redis.set(cacheKey, JSON.stringify(analysis), 'EX', 7776000); // 90 days 1317 - 1318 - return analysis; 1319 -} 1320 -{{/code}} 583 +**Infrastructure costs:** 584 +* Vercel hobby tier: Free for POC 585 +* AWS Lambda: ~$0.001 per request 586 +* **Total infra:** <$0.01 per analysis 1321 1321 1322 -=== 9.3 User Credit Management === 588 +**Total estimated cost:** ~$0.15-0.35 per analysis ✅ Meets <$0.35 target 1323 1323 1324 -**PostgreSQL Schema:** 1325 -{{code language="sql"}} 1326 -CREATE TABLE user_credits ( 1327 - user_id UUID PRIMARY KEY, 1328 - tier VARCHAR(20) DEFAULT 'free', 1329 - credit_limit DECIMAL(10,2) DEFAULT 10.00, 1330 - credit_used DECIMAL(10,2) DEFAULT 0.00, 1331 - reset_date TIMESTAMP, 1332 - cache_only_mode BOOLEAN DEFAULT false, 1333 - created_at TIMESTAMP DEFAULT NOW() 1334 -); 590 +=== 9.4 Estimated Timeline (AI-Assisted) === 1335 1335 1336 -CREATE TABLE usage_log ( 1337 - id SERIAL PRIMARY KEY, 1338 - user_id UUID REFERENCES user_credits(user_id), 1339 - job_id VARCHAR(50), 1340 - stage VARCHAR(20), 1341 - cost DECIMAL(10,4), 1342 - cache_hit BOOLEAN, 1343 - created_at TIMESTAMP DEFAULT NOW() 1344 -); 1345 -{{/code}} 592 +**With 
Cursor IDE + Claude API:** 593 +* Day 1-2: API scaffolding + job queue 594 +* Day 3-4: LLM integration + prompt engineering 595 +* Day 5-6: Evidence retrieval + contradiction search 596 +* Day 7: Report templates + testing with 30 articles 597 +* **Total:** 5-7 days for working POC1 1346 1346 1347 -**Credit Deduction Logic:** 1348 -{{code language="typescript"}} 1349 -async function deductCredit(userId: string, cost: number): Promise<boolean> { 1350 - const user = await db.query('SELECT * FROM user_credits WHERE user_id = $1', [userId]); 1351 - 1352 - const newUsed = user.credit_used + cost; 1353 - 1354 - if (newUsed > user.credit_limit && user.tier === 'free') { 1355 - // Trigger cache-only mode 1356 - await db.query( 1357 - 'UPDATE user_credits SET cache_only_mode = true WHERE user_id = $1', 1358 - [userId] 1359 - ); 1360 - throw new Error('CREDIT_LIMIT_REACHED'); 1361 - } 1362 - 1363 - await db.query( 1364 - 'UPDATE user_credits SET credit_used = $1 WHERE user_id = $2', 1365 - [newUsed, userId] 1366 - ); 1367 - 1368 - return true; 1369 -} 1370 -{{/code}} 599 +**Manual coding (no AI assistance):** 600 +* Estimate: 15-20 days 1371 1371 1372 -=== 9. 
4Cache-OnlyModeImplementation ===602 +=== 9.5 First Prompt for AI Code Generation === 1373 1373 1374 -**Middleware:** 1375 -{{code language="typescript"}} 1376 -async function checkCacheOnlyMode(req, res, next) { 1377 - const user = await getUserCredit(req.userId); 1378 - 1379 - if (user.cache_only_mode) { 1380 - // Allow only cache reads 1381 - if (req.body.options?.cache_preference !== 'allow_partial') { 1382 - return res.status(402).json({ 1383 - error: 'credit_limit_reached', 1384 - message: 'Resubmit with cache_preference=allow_partial', 1385 - cache_only_mode: true 1386 - }); 1387 - } 1388 - 1389 - // Modify request to skip Stage 2 for uncached claims 1390 - req.cacheOnlyMode = true; 1391 - } 1392 - 1393 - next(); 1394 -} 1395 -{{/code}} 604 +{{code}} 605 +Based on the FactHarbor POC1 API & Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with: 1396 1396 1397 -=== 9.5 Estimated Timeline === 607 +1. API routes implementing the 7 endpoints specified in Section 3 608 +2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5 609 +3. Anthropic Claude 3.5 Sonnet integration for: 610 + - Claim extraction (with central/supporting marking) 611 + - Scenario generation 612 + - Evidence synthesis (with mandatory contradiction search) 613 + - Verdict generation 614 + - Holistic assessment (article-level credibility) 615 +4. Job-based async execution with progress tracking (7 pipeline stages) 616 +5. Quality Gates 1 & 4 from NFR11 implementation 617 +6. Mandatory contradiction search enforcement (Section 5) 618 +7. Context-aware analysis (experimental) as specified 619 +8. Filesystem-based job storage (no database) 620 +9. 
Markdown report generation from JSON templates (Section 6) 1398 1398 1399 -**POC1 with 3-Stage Architecture:** 1400 -* Week 1: Stage 1 (Haiku extraction) + Redis setup 1401 -* Week 2: Stage 2 (Sonnet analysis + caching) 1402 -* Week 3: Stage 3 (Holistic assessment) + pipeline orchestration 1403 -* Week 4: User credit system + cache-only mode 1404 -* Week 5: Testing with 100 articles (measure cache hit rate) 1405 -* Week 6: Optimization + bug fixes 1406 -* **Total: 6-8 weeks** 622 +Use the validation rules from Section 5 and error codes from Section 2.1.1. 623 +Target: <$0.35 per analysis, <2 minutes processing time. 624 +{{/code}} 1407 1407 1408 -**Manual coding:** 12-16 weeks 1409 - 1410 1410 --- 1411 1411 1412 -== 10. Testing Strategy == 628 +== 10. Testing Strategy (POC1) == 1413 1413 1414 -=== 10.1 CachePerformanceTesting===630 +=== 10.1 Test Dataset (30 Articles) === 1415 1415 1416 -**Test Scenarios:** 632 +**Category 1: Straightforward Factual (10 articles)** 633 +* Purpose: Baseline accuracy 634 +* Example: "WHO report on global vaccination rates" 635 +* Expected: High claim accuracy, straightforward verdict 1417 1417 1418 -** Scenario1: ColdStart(0cache)**1419 -* Analyze100diverse articles1420 -* Measure: Cost perarticle,cachegrowthrate1421 -* Expected: $0.35-0.40 avg, ~400 uniqueclaimscached637 +**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test** 638 +* Purpose: Test holistic assessment capability 639 +* Example: "Coffee cures cancer" (true premises, false conclusion) 640 +* Expected: Individual claims TRUE, article verdict MISLEADING 1422 1422 1423 -** Scenario2:WarmCache(OverlappingDomain)**1424 -* Analyze100 articlesonSAME topic(e.g.,"2024 election")1425 -* Measure:Cachehitrategrowth1426 -* Expected: Hit rate20% → 60% byarticle100642 +**Category 3: Mixed Accuracy (5 articles)** 643 +* Purpose: Test nuance handling 644 +* Example: Articles with some true, some false claims 645 +* Expected: Scenario-level 
differentiation 1427 1427 1428 -** Scenario3:MatureCache(1,000articles)**1429 -* Analyzenext100articles (diversetopics)1430 -* Measure:Steady-state cachehit rate1431 -* Expected: 60-70% hit rate,$0.15-0.18avgcost647 +**Category 4: Low-Quality Claims (5 articles)** 648 +* Purpose: Test quality gates 649 +* Example: Opinion pieces, compound claims 650 +* Expected: Gate 1 failures, rejection or draft-only mode 1432 1432 1433 -**Scenario 4: Cache-Only Mode** 1434 -* Free user reaches $10 limit (67 articles at 70% hit rate) 1435 -* Submit 10 more articles with {{code}}cache_preference=allow_partial{{/code}} 1436 -* Measure: Coverage %, user satisfaction 1437 -* Expected: 60-70% coverage, instant results 1438 - 1439 1439 === 10.2 Success Metrics === 1440 1440 1441 -**Cache Performance:** 1442 -* Week 1: 5-10% hit rate 1443 -* Week 2: 15-25% hit rate 1444 -* Week 3: 30-40% hit rate 1445 -* Week 4: 45-55% hit rate 1446 -* Target: ≥50% by 1,000 articles 1447 - 1448 -**Cost Targets:** 1449 -* Articles 1-100: $0.35-0.40 avg ⚠️ (expected) 1450 -* Articles 100-500: $0.25-0.30 avg 1451 -* Articles 500-1,000: $0.18-0.22 avg 1452 -* Articles 1,000+: $0.12-0.15 avg ✅ 1453 - 1454 -**Quality Metrics (same as v0.3.1):** 1455 -* Hallucination rate: <5% 1456 -* Context-aware accuracy: ≥70% 654 +**Quality Metrics:** 655 +* Hallucination rate: <5% (target: <3%) 656 +* Context-aware accuracy: ≥70% (experimental - key POC1 goal) 1457 1457 * False positive rate: <15% 1458 1458 * Mandatory contradiction search: 100% compliance 1459 1459 1460 -=== 10.3 Free Tier Economics Validation === 660 +**Performance Metrics:** 661 +* Processing time: <2 minutes per article (standard depth) 662 +* Cost per analysis: <$0.35 663 +* API uptime: >99% 664 +* LLM API error rate: <1% 1461 1461 1462 -**Test with simulated 1,000 users:** 1463 -* Each user: $10 credit 1464 -* 70% cache hit rate 1465 -* Avg 70 articles/user/month 666 +**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for 
complete success criteria and testing methodology. 1466 1466 1467 -**Projected Costs:** 1468 -* Total credits: 1,000 × $10 = $10,000 1469 -* Actual LLM costs: ~$9,000 (cache savings) 1470 -* Margin: 10% 1471 - 1472 -**Sustainability Check:** 1473 -* If margin <5% → Reduce free tier limit 1474 -* If margin >20% → Consider increasing free tier 1475 - 1476 1476 --- 1477 1477 1478 - ==11. Cross-References==670 +**End of Specification - FactHarbor POC1 API v0.3** 1479 1479 1480 - ThisAPIspecification implements requirementsfrom:672 +**Ready for xWiki import and AI-assisted implementation!** 🚀 1481 1481 1482 -* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]** 1483 -** FR-POC-1 through FR-POC-6 (3-stage architecture) 1484 -** NFR-POC-1 through NFR-POC-3 (quality gates, caching) 1485 -** NEW: FR-POC-7 (Claim-level caching) 1486 -** NEW: FR-POC-8 (User credit system) 1487 -** NEW: FR-POC-9 (Cache-only mode) 1488 - 1489 -* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]** 1490 -** Approach 1 implemented in Stage 3 1491 -** Context-aware holistic assessment 1492 - 1493 -* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]** 1494 -** FR4 (Analysis Summary) - enhanced with caching 1495 -** FR7 (Verdict Calculation) - cached per claim 1496 -** NFR11 (Quality Gates) - enforced across stages 1497 -** NEW: NFR19 (Cost Efficiency via Caching) 1498 -** NEW: NFR20 (Free Tier Sustainability) 1499 - 1500 -* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]** 1501 -** POC1 3-stage pipeline architecture 1502 -** Redis cache layer 1503 -** User credit system 1504 - 1505 -* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]** 1506 -** Claim structure (cacheable unit) 1507 -** Evidence structure 1508 -** Scenario boundaries 1509 - 1510 ---- 1511 - 1512 -**End of Specification - FactHarbor POC1 API v0.4** 1513 - 1514 -**3-stage caching architecture with free tier 
cache-only mode. Ready for sustainable, scalable implementation!** 🚀 1515 -
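The quality and performance thresholds in the added Section 10.2 can be checked mechanically once a test run has produced measurements. The following is a minimal sketch under stated assumptions: the {{code}}Poc1Metrics{{/code}} interface, its field names, the {{code}}failedCriteria{{/code}} function, and the sample values are all hypothetical and do not come from the specification; only the threshold values are taken from Section 10.2.

{{code language="typescript"}}
// Hypothetical shape for measured POC1 results (field names are illustrative).
// Rates and percentages are expressed as fractions, e.g. 0.05 means 5%.
interface Poc1Metrics {
  hallucinationRate: number;
  contextAwareAccuracy: number;
  falsePositiveRate: number;
  contradictionSearchCompliance: number;
  processingSeconds: number;
  costPerAnalysisUsd: number;
  apiUptime: number;
  llmErrorRate: number;
}

// Returns the Section 10.2 criteria that the given measurements do not meet.
function failedCriteria(m: Poc1Metrics): string[] {
  const checks: Array<[string, boolean]> = [
    ["hallucination rate < 5%", m.hallucinationRate < 0.05],
    ["context-aware accuracy >= 70%", m.contextAwareAccuracy >= 0.7],
    ["false positive rate < 15%", m.falsePositiveRate < 0.15],
    ["contradiction search = 100%", m.contradictionSearchCompliance === 1],
    ["processing time < 2 minutes", m.processingSeconds < 120],
    ["cost per analysis < $0.35", m.costPerAnalysisUsd < 0.35],
    ["API uptime > 99%", m.apiUptime > 0.99],
    ["LLM API error rate < 1%", m.llmErrorRate < 0.01],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

// Made-up measurements for illustration; not results from any real run.
const sample: Poc1Metrics = {
  hallucinationRate: 0.04,
  contextAwareAccuracy: 0.65,
  falsePositiveRate: 0.1,
  contradictionSearchCompliance: 1,
  processingSeconds: 95,
  costPerAnalysisUsd: 0.36,
  apiUptime: 0.995,
  llmErrorRate: 0.005,
};

console.log(failedCriteria(sample));
// prints [ 'context-aware accuracy >= 70%', 'cost per analysis < $0.35' ]
{{/code}}

Returning the list of missed criteria, rather than a single pass/fail flag, makes it easier to see which of the 30-article test categories needs attention.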