Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ... @@ -1,904 +1,673 @@ 1 -=POC1 API & Schemas Specification= 1 +# FactHarbor POC1 — API & Schemas Specification 2 2 3 ----- 3 +**Version:** 0.3 (POC1 - Production Ready) 4 +**Namespace:** FactHarbor.* 5 +**Syntax:** xWiki 2.1 6 +**Last Updated:** 2025-12-24 4 4 8 +--- 9 + 5 5 == Version History == 6 6 7 7 |=Version|=Date|=Changes 8 -|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy 9 -|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture 10 -|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints 11 -|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details 13 +|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references 14 +|0.2|2025-12-24|Initial rebased version with holistic assessment 15 +|0.1|2025-12-24|Original specification 12 12 13 ----- 17 +--- 14 14 15 15 == 1. Core Objective (POC1) == 16 16 17 -The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability. 21 +The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**: 18 18 19 -The system must prove that AI can identify an article's **Main Thesis** and determine if supporting claims logically support that thesis without committing fallacies. 23 +The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).
20 20 21 -=== Success Criteria: === 22 - 25 +**Success Criteria:** 23 23 * Test with 30 diverse articles 24 24 * Target: ≥70% accuracy detecting misleading articles 25 -* Cost: <$0.25 per NEW analysis (uncached) 26 -* Cost: $0.00 for cached claim reuse 27 -* Cache hit rate: ≥50% after 1,000 articles 28 +* Cost: <$0.35 per analysis 28 28 * Processing time: <2 minutes (standard depth) 29 29 30 -=== Economic Model: === 31 +**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches. 31 31 32 -* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits) 33 -* **After limit:** Cache-only mode (instant, free access to cached claims) 34 -* **Paid tier:** Unlimited new analyses 33 +--- 35 35 36 ----- 35 +== 2. Runtime Model & Job States == 37 37 38 -== 2. Architecture Overview == 37 +=== 2.1 Pipeline Steps === 39 39 40 -=== 2.1 3-Stage Pipeline with Caching === 39 +For progress reporting via API, the pipeline follows these stages: 41 41 42 -FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency: 41 +# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization. 42 +# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting. 43 +# **SCENARIOS**: Generating context interpretations for each claim. 44 +# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search). 45 +# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario. 46 +# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic). 47 +# **REPORT**: Generating final Markdown and JSON outputs.
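The seven pipeline steps listed in the new section 2.1 could be modelled as an ordered constant, so that progress reporting and step sequencing share one source of truth. The sketch below is illustrative only: the names `PIPELINE_STEPS`, `PipelineStep` and `nextStep` are assumptions, not part of the specification, and the spec does not say how `progress.percent` is derived, so no percentage mapping is attempted.

```typescript
// Ordered pipeline steps, copied from section 2.1 of the new revision.
// The constant and helper names are hypothetical, not defined by the spec.
const PIPELINE_STEPS = [
  "INGEST",
  "EXTRACT_CLAIMS",
  "SCENARIOS",
  "RETRIEVAL",
  "VERDICTS",
  "HOLISTIC_ASSESSMENT",
  "REPORT",
] as const;

type PipelineStep = (typeof PIPELINE_STEPS)[number];

// Returns the step that follows `step`, or null once REPORT is reached.
function nextStep(step: PipelineStep): PipelineStep | null {
  const index = PIPELINE_STEPS.indexOf(step);
  return PIPELINE_STEPS[index + 1] ?? null;
}
```

Keeping the order in a single tuple means a `progress.step` value can be checked at compile time against the same list that drives sequencing.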
43 43 44 -{{mermaid}} 45 -graph TD 46 - A[Article Input] --> B[Stage 1: Extract Claims] 47 - B --> C{For Each Claim} 48 - C --> D[Check Cache] 49 - D -->|Cache HIT| E[Return Cached Verdict] 50 - D -->|Cache MISS| F[Stage 2: Analyze Claim] 51 - F --> G[Store in Cache] 52 - G --> E 53 - E --> H[Stage 3: Holistic Assessment] 54 - H --> I[Final Report] 55 -{{/mermaid}} 49 +=== 2.1.1 URL Extraction Strategy === 56 56 57 -==== Stage 1: Claim Extraction (Haiku, no cache) ==== 51 +**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}}) 52 +* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient 53 +* **Fallback:** Trafilatura (Python library) for simple static HTML 58 58 59 -* **Input:** Article text 60 -* **Output:** 5 canonical claims (normalized, deduplicated) 61 -* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer) 62 -* **Cost:** $0.003 per article 63 -* **Cache strategy:** No caching (article-specific) 55 +**Error Handling:** 64 64 65 -==== Stage 2: Claim Analysis (Sonnet, CACHED) ==== 57 +|=Error Code|=Trigger|=Action 58 +|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest text paste 59 +|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail 60 +|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately 61 +|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason 66 66 67 -* **Input:** Single canonical claim 68 -* **Output:** Scenarios + Evidence + Verdicts 69 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer) 70 -* **Cost:** $0.081 per NEW claim 71 -* **Cache strategy:** Redis, 90-day TTL 72 -* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)} 63 +**Supported URL Patterns:** 64 +* ✅ News articles, blog posts, Wikipedia 65 +* ✅ Academic preprints (arXiv) 66 +* ❌ Social media posts (Twitter, Facebook) - not in POC1 67 +* ❌ Video platforms (YouTube, TikTok) - not in POC1 68 +* ❌ PDF files - 
deferred to Beta 0 73 73 74 -==== Stage 3: Holistic Assessment (Sonnet, no cache) ==== 70 +=== 2.2 Job Status Enumeration === 75 75 76 -* **Input:** Article + Claim verdicts (from cache or Stage 2) 77 -* **Output:** Article verdict + Fallacies + Logic quality 78 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer) 79 -* **Cost:** $0.030 per article 80 -* **Cache strategy:** No caching (article-specific) 72 +((( 73 +* **QUEUED** - Job accepted, waiting in queue 74 +* **RUNNING** - Processing in progress 75 +* **SUCCEEDED** - Analysis complete, results available 76 +* **FAILED** - Error occurred, see error details 77 +* **CANCELLED** - User cancelled via DELETE endpoint 78 +))) 81 81 80 +--- 82 82 82 +== 3. REST API Contract == 83 83 84 -**Note:** Stage 3 implements **Approach 1 (Single-Pass Holistic Analysis)** from the [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]. While claim analysis (Stage 2) is cached for efficiency, the holistic assessment maintains the integrated evaluation philosophy of Approach 1. 84 +=== 3.1 Create Analysis Job === 85 85 86 -=== Total Cost Formula: === 86 +**Endpoint:** {{code}}POST /v1/analyze{{/code}} 87 87 88 -{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic) 88 +**Request Body Example:** 89 +{{code language="json"}} 90 +{ 91 + "input_type": "url", 92 + "input_url": "https://example.com/medical-report-01", 93 + "input_text": null, 94 + "options": { 95 + "browsing": "on", 96 + "depth": "standard", 97 + "max_claims": 5, 98 + "context_aware_analysis": true 99 + }, 100 + "client": { 101 + "request_id": "optional-client-tracking-id", 102 + "source_label": "optional" 103 + } 104 +} 105 +{{/code}} 89 89 90 -Examples: 91 -- 0 new claims (100% cache hit): $0.033 92 -- 1 new claim (80% cache hit): $0.114 93 -- 3 new claims (40% cache hit): $0.276 94 -- 5 new claims (0% cache hit): $0.438 95 -}}} 107 +**Options:** 108 +* {{code}}browsing{{/code}}: {{code}}on{{/code}} 
| {{code}}off{{/code}} (retrieve web sources or just output queries) 109 +* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness) 110 +* {{code}}max_claims{{/code}}: 1-50 (default: 10) 111 +* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental) 96 96 97 ----- 113 +**Response:** {{code}}202 Accepted{{/code}} 98 98 99 -=== 2.2 User Tier System === 115 +{{code language="json"}} 116 +{ 117 + "job_id": "01J...ULID", 118 + "status": "QUEUED", 119 + "created_at": "2025-12-24T10:31:00Z", 120 + "links": { 121 + "self": "/v1/jobs/01J...ULID", 122 + "result": "/v1/jobs/01J...ULID/result", 123 + "report": "/v1/jobs/01J...ULID/report", 124 + "events": "/v1/jobs/01J...ULID/events" 125 + } 126 +} 127 +{{/code}} 100 100 101 -|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics 102 -|**Free**|$10|Cache-only mode|✅ Full|Basic 103 -|**Pro** (future)|$50|Continues|✅ Full|Advanced 104 -|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full 129 +--- 105 105 106 -**Free Tier Economics:** 131 +=== 3.2 Get Job Status === 107 107 108 -* $10 credit = 40-140 articles analyzed (depending on cache hit rate) 109 -* Average 70 articles/month at 70% cache hit rate 110 -* After limit: Cache-only mode 133 +**Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}} 111 111 112 ----- 135 +**Response:** {{code}}200 OK{{/code}} 113 113 114 -=== 2.3 Cache-Only Mode (Free Tier Feature) === 115 - 116 -When free users reach their $10 monthly limit, they enter **Cache-Only Mode**: 117 - 118 -==== What Cache-Only Mode Provides: ==== 119 - 120 -✅ **Claim Extraction (Platform-Funded):** 121 - 122 -* Stage 1 extraction runs at $0.003 per article 123 -* **Cost: Absorbed by platform** (not charged to user credit) 124 -* Rationale: Extraction is necessary to check cache, and cost is negligible 125 -* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse) 126 - 127 -✅ **Instant Access to Cached 
Claims:** 128 - 129 -* Any claim that exists in cache → Full verdict returned 130 -* Cost: $0 (no LLM calls) 131 -* Response time: <100ms 132 - 133 -✅ **Partial Article Analysis:** 134 - 135 -* Check each claim against cache 136 -* Return verdicts for ALL cached claims 137 -* For uncached claims: Return "status": "cache_miss" 138 - 139 -✅ **Cache Coverage Report:** 140 - 141 -* "3 of 5 claims available in cache (60% coverage)" 142 -* Links to cached analyses 143 -* Estimated cost to complete: $0.162 (2 new claims) 144 - 145 -❌ **Not Available in Cache-Only Mode:** 146 - 147 -* New claim analysis (Stage 2 LLM calls blocked) 148 -* Full holistic assessment (Stage 3 blocked if any claims missing) 149 - 150 -==== User Experience Example: ==== 151 - 152 -{{{{ 153 - "status": "cache_only_mode", 154 - "message": "Monthly credit limit reached. Showing cached results only.", 155 - "cache_coverage": { 156 - "claims_total": 5, 157 - "claims_cached": 3, 158 - "claims_missing": 2, 159 - "coverage_percent": 60 160 - }, 161 - "cached_claims": [ 162 - {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82}, 163 - {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91}, 164 - {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55} 165 - ], 166 - "missing_claims": [ 167 - {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"}, 168 - {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"} 169 - ], 170 - "upgrade_options": { 171 - "top_up": "$5 for 20-70 more articles", 172 - "pro_tier": "$50/month unlimited" 173 - } 137 +{{code language="json"}} 138 +{ 139 + "job_id": "01J...ULID", 140 + "status": "RUNNING", 141 + "created_at": "2025-12-24T10:31:00Z", 142 + "updated_at": "2025-12-24T10:31:22Z", 143 + "progress": { 144 + "step": "RETRIEVAL", 145 + "percent": 60, 146 + "message": "Gathering evidence for C2-S1", 147 + "current_claim_id": "C2", 148 + "current_scenario_id": "C2-S1" 149 + }, 150 + "input_echo": { 151 + "input_type": "url", 152 + 
"input_url": "https://example.com/medical-report-01" 153 + }, 154 + "links": { 155 + "self": "/v1/jobs/01J...ULID", 156 + "result": "/v1/jobs/01J...ULID/result", 157 + "report": "/v1/jobs/01J...ULID/report" 158 + }, 159 + "error": null 174 174 } 175 -}}} 161 +{{/code}} 176 176 177 -**Design Rationale:** 163 +--- 178 178 179 -* Free users still get value (cached claims often answer their question) 180 -* Demonstrates FactHarbor's value (partial results encourage upgrade) 181 -* Sustainable for platform (no additional cost) 182 -* Fair to all users (everyone contributes to cache) 165 +=== 3.3 Get JSON Result === 183 183 184 ----- 167 +**Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}} 185 185 169 +**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4) 186 186 171 +**Other Responses:** 172 +* {{code}}409 Conflict{{/code}} - Job not finished yet 173 +* {{code}}404 Not Found{{/code}} - Job ID unknown 187 187 188 -== 6. LLM Abstraction Layer == 175 +--- 189 189 190 -=== 6.1 Design Principle === 177 +=== 3.4 Download Markdown Report === 191 191 192 -**FactHarbor uses provider-agnostic LLM abstraction** to avoid vendor lock-in and enable: 179 +**Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}} 193 193 194 -* **Provider switching:** Change LLM providers without code changes 195 -* **Cost optimization:** Use different providers for different stages 196 -* **Resilience:** Automatic fallback if primary provider fails 197 -* **Cross-checking:** Compare outputs from multiple providers 198 -* **A/B testing:** Test new models without deployment changes 181 +**Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content 199 199 200 -**Implementation:** All LLM calls go through an abstraction layer that routes to configured providers. 
183 +**Headers:** 184 +* {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}} 201 201 202 ----- 186 +**Other Responses:** 187 +* {{code}}409 Conflict{{/code}} - Job not finished 188 +* {{code}}404 Not Found{{/code}} - Job unknown 203 203 204 -=== 6.2 LLM Provider Interface === 190 +--- 205 205 206 -**Abstract Interface:** 192 +=== 3.5 Stream Job Events (Optional, Recommended) === 207 207 208 -{{{ 209 -interface LLMProvider { 210 - // Core methods 211 - complete(prompt: string, options: CompletionOptions): Promise<CompletionResponse> 212 - stream(prompt: string, options: CompletionOptions): AsyncIterator<StreamChunk> 213 - 214 - // Provider metadata 215 - getName(): string 216 - getMaxTokens(): number 217 - getCostPer1kTokens(): { input: number, output: number } 218 - 219 - // Health check 220 - isAvailable(): Promise<boolean> 221 -} 194 +**Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}} 222 222 223 -interface CompletionOptions { 224 - model?: string 225 - maxTokens?: number 226 - temperature?: number 227 - stopSequences?: string[] 228 - systemPrompt?: string 229 -} 230 -}}} 196 +**Response:** Server-Sent Events (SSE) stream 231 231 232 ----- 198 +**Event Types:** 199 +* {{code}}progress{{/code}} - Progress update 200 +* {{code}}claim_extracted{{/code}} - Claim identified 201 +* {{code}}verdict_computed{{/code}} - Scenario verdict complete 202 +* {{code}}complete{{/code}} - Job finished 203 +* {{code}}error{{/code}} - Error occurred 233 233 234 -=== 6.3 Supported Providers (POC1) === 205 +--- 235 235 236 -**Primary Provider (Default):** 207 +=== 3.6 Cancel Job === 237 237 238 -* **Anthropic Claude API** 239 - * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4 240 - * Used by default in POC1 241 - * Best quality for holistic analysis 209 +**Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}} 242 242 243 -**Secondary Providers (Future):** 211 +Attempts to cancel a queued or running job. 
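The cancellation rule for `DELETE /v1/jobs/{job_id}` can be sketched as a small guard over the job statuses from section 2.2. This is a minimal illustration under stated assumptions: the `Job` shape is reduced to two fields, `canCancel` and `cancelJob` are hypothetical helper names, and returning `null` for a job that is no longer cancellable is a placeholder, since the specification does not say which HTTP status the server sends in that case.

```typescript
// Job statuses as enumerated in section 2.2 of the new revision.
type JobStatus = "QUEUED" | "RUNNING" | "SUCCEEDED" | "FAILED" | "CANCELLED";

// Reduced, hypothetical job shape; the real Job object carries more fields.
interface Job {
  job_id: string;
  status: JobStatus;
}

// Only queued or running jobs are eligible for cancellation.
function canCancel(status: JobStatus): boolean {
  return status === "QUEUED" || status === "RUNNING";
}

// Returns a copy of the job with status CANCELLED, or null when the job
// has already reached a final state (assumed behaviour, not from the spec).
function cancelJob(job: Job): Job | null {
  if (!canCancel(job.status)) {
    return null;
  }
  return { ...job, status: "CANCELLED" };
}
```

Returning a new object rather than mutating the input keeps the sketch side-effect free; a real handler would also need to signal the worker that is processing the job.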
244 244 245 -* **OpenAI API** 246 - * Models: GPT-4o, GPT-4o-mini 247 - * For cost comparison 248 - 249 -* **Google Vertex AI** 250 - * Models: Gemini 1.5 Pro, Gemini 1.5 Flash 251 - * For diversity in evidence gathering 213 +**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED) 252 252 253 -* **Local Models** (Post-POC) 254 - * Models: Llama 3.1, Mistral 255 - * For privacy-sensitive deployments 215 +**Note:** Already-completed jobs cannot be cancelled. 256 256 257 ----- 217 +--- 258 258 259 -=== 6.4 Provider Configuration === 219 +=== 3.7 Health Check === 260 260 261 -**Environment Variables:** 221 +**Endpoint:** {{code}}GET /v1/health{{/code}} 262 262 263 -{{{ 264 -# Primary provider 265 -LLM_PRIMARY_PROVIDER=anthropic 266 -ANTHROPIC_API_KEY=sk-ant-... 223 +**Response:** {{code}}200 OK{{/code}} 267 267 268 -# Fallback provider 269 -LLM_FALLBACK_PROVIDER=openai 270 -OPENAI_API_KEY=sk-... 271 - 272 -# Provider selection per stage 273 -LLM_STAGE1_PROVIDER=anthropic 274 -LLM_STAGE1_MODEL=claude-haiku-4 275 -LLM_STAGE2_PROVIDER=anthropic 276 -LLM_STAGE2_MODEL=claude-sonnet-3-5 277 -LLM_STAGE3_PROVIDER=anthropic 278 -LLM_STAGE3_MODEL=claude-sonnet-3-5 279 - 280 -# Cost limits 281 -LLM_MAX_COST_PER_REQUEST=1.00 282 -}}} 283 - 284 -**Database Configuration (Alternative):** 285 - 286 -{{{{ 225 +{{code language="json"}} 287 287 { 288 - "providers": [ 289 - { 290 - "name": "anthropic", 291 - "api_key_ref": "vault://anthropic-api-key", 292 - "enabled": true, 293 - "priority": 1 294 - }, 295 - { 296 - "name": "openai", 297 - "api_key_ref": "vault://openai-api-key", 298 - "enabled": true, 299 - "priority": 2 300 - } 301 - ], 302 - "stage_config": { 303 - "stage1": { 304 - "provider": "anthropic", 305 - "model": "claude-haiku-4", 306 - "max_tokens": 4096, 307 - "temperature": 0.0 308 - }, 309 - "stage2": { 310 - "provider": "anthropic", 311 - "model": "claude-sonnet-3-5", 312 - "max_tokens": 16384, 313 - "temperature": 0.3 314 - }, 315 - "stage3": { 316 
- "provider": "anthropic", 317 - "model": "claude-sonnet-3-5", 318 - "max_tokens": 8192, 319 - "temperature": 0.2 320 - } 321 - } 227 + "status": "ok", 228 + "version": "POC1-v0.3", 229 + "model": "claude-3-5-sonnet-20241022" 322 322 } 323 -}}} 231 +{{/code}} 324 324 325 ----- 233 +--- 326 326 327 -=== 6.5 Stage-Specific Models (POC1 Defaults) === 235 +== 4. AnalysisResult Schema (Context-Aware) == 328 328 329 -**Stage 1: Claim Extraction** 237 +This schema implements the **Context-Aware Analysis** required by the POC1 specification. 330 330 331 -* **Default:** Anthropic Claude Haiku 4 332 -* **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash 333 -* **Rationale:** Fast, cheap, simple task 334 -* **Cost:** ~$0.003 per article 335 - 336 -**Stage 2: Claim Analysis** (CACHEABLE) 337 - 338 -* **Default:** Anthropic Claude Sonnet 3.5 339 -* **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro 340 -* **Rationale:** High-quality analysis, cached 90 days 341 -* **Cost:** ~$0.081 per NEW claim 342 - 343 -**Stage 3: Holistic Assessment** 344 - 345 -* **Default:** Anthropic Claude Sonnet 3.5 346 -* **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes) 347 -* **Rationale:** Complex reasoning, logical fallacy detection 348 -* **Cost:** ~$0.030 per article 349 - 350 -**Cost Comparison (Example):** 351 - 352 -|=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative 353 -|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002) 354 -|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050) 355 -|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020) 356 -|**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072** 357 - 358 -**Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2. 
359 - 360 ----- 361 - 362 -=== 6.6 Failover Strategy === 363 - 364 -**Automatic Failover:** 365 - 366 -{{{ 367 -async function completeLLM(stage: string, prompt: string): Promise<string> { 368 - const primaryProvider = getProviderForStage(stage) 369 - const fallbackProvider = getFallbackProvider() 370 - 371 - try { 372 - return await primaryProvider.complete(prompt) 373 - } catch (error) { 374 - if (error.type === 'rate_limit' || error.type === 'service_unavailable') { 375 - logger.warn(`Primary provider failed, using fallback`) 376 - return await fallbackProvider.complete(prompt) 377 - } 378 - throw error 379 - } 380 -} 381 -}}} 382 - 383 -**Fallback Priority:** 384 - 385 -1. **Primary:** Configured provider for stage 386 -2. **Secondary:** Fallback provider (if configured) 387 -3. **Cache:** Return cached result (if available for Stage 2) 388 -4. **Error:** Return 503 Service Unavailable 389 - 390 ----- 391 - 392 -=== 6.7 Provider Selection API === 393 - 394 -**Admin Endpoint:** POST /admin/v1/llm/configure 395 - 396 -**Update provider for specific stage:** 397 - 398 -{{{{ 239 +{{code language="json"}} 399 399 { 400 - "stage": "stage2", 401 - "provider": "openai", 402 - "model": "gpt-4o", 403 - "max_tokens": 16384, 404 - "temperature": 0.3 405 -} 406 -}}} 407 - 408 -**Response:** 200 OK 409 - 410 -{{{{ 411 -{ 412 - "message": "LLM configuration updated", 413 - "stage": "stage2", 414 - "previous": { 415 - "provider": "anthropic", 416 - "model": "claude-sonnet-3-5" 241 + "metadata": { 242 + "job_id": "string (ULID)", 243 + "timestamp_utc": "ISO8601", 244 + "engine_version": "POC1-v0.3", 245 + "llm_provider": "anthropic", 246 + "llm_model": "claude-3-5-sonnet-20241022", 247 + "usage_stats": { 248 + "input_tokens": "integer", 249 + "output_tokens": "integer", 250 + "estimated_cost_usd": "float", 251 + "response_time_sec": "float" 252 + } 417 417 }, 418 - "current": { 419 - "provider": "openai", 420 - "model": "gpt-4o" 254 + "article_holistic_assessment": { 255 + 
"main_thesis": "string (The core argument detected)", 256 + "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN", 257 + "logic_quality_score": "float (0-1)", 258 + "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"], 259 + "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)", 260 + "experimental_feature": true 421 421 }, 422 - "cost_impact": { 423 - "previous_cost_per_claim": 0.081, 424 - "new_cost_per_claim": 0.045, 425 - "savings_percent": 44 426 - } 427 -} 428 -}}} 429 - 430 -**Get current configuration:** 431 - 432 -GET /admin/v1/llm/config 433 - 434 -{{{{ 435 -{ 436 - "providers": ["anthropic", "openai"], 437 - "primary": "anthropic", 438 - "fallback": "openai", 439 - "stages": { 440 - "stage1": { 441 - "provider": "anthropic", 442 - "model": "claude-haiku-4", 443 - "cost_per_request": 0.003 444 - }, 445 - "stage2": { 446 - "provider": "anthropic", 447 - "model": "claude-sonnet-3-5", 448 - "cost_per_new_claim": 0.081 449 - }, 450 - "stage3": { 451 - "provider": "anthropic", 452 - "model": "claude-sonnet-3-5", 453 - "cost_per_request": 0.030 262 + "claims": [ 263 + { 264 + "claim_id": "C1", 265 + "is_central_to_thesis": "boolean", 266 + "claim_text": "string", 267 + "canonical_form": "string", 268 + "claim_type": "descriptive | causal | predictive | normative | definitional", 269 + "evaluability": "evaluable | partly_evaluable | not_evaluable", 270 + "risk_tier": "A | B | C", 271 + "risk_tier_justification": "string", 272 + "domain": "string (e.g., 'public health', 'economics')", 273 + "key_terms": ["term1", "term2"], 274 + "entities": ["Person X", "Org Y"], 275 + "time_scope_detected": "2020-2024", 276 + "geography_scope_detected": "Brazil", 277 + "scenarios": [ 278 + { 279 + "scenario_id": "C1-S1", 280 + "context_title": "string", 281 + "definitions": {"key_term": "definition"}, 282 + "assumptions": ["Assumption 1", "Assumption 2"], 283 + "boundaries": { 284 + 
"time": "as of 2025-01", 285 + "geography": "Brazil", 286 + "population": "adult population", 287 + "conditions": "excludes X; includes Y" 288 + }, 289 + "scope_of_evidence": "What counts as evidence for this scenario", 290 + "scenario_questions": ["Question that decides the verdict"], 291 + "verdict": { 292 + "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated", 293 + "probability_range": [0.0, 1.0], 294 + "confidence": "float (0-1)", 295 + "reasoning": "string", 296 + "key_supporting_evidence_ids": ["E1", "E3"], 297 + "key_counter_evidence_ids": ["E2"], 298 + "uncertainty_factors": ["Data gap", "Method disagreement"], 299 + "what_would_change_my_mind": ["Specific new study", "Updated dataset"] 300 + }, 301 + "evidence": [ 302 + { 303 + "evidence_id": "E1", 304 + "stance": "supports | undermines | mixed | context_dependent", 305 + "relevance_to_scenario": "float (0-1)", 306 + "evidence_summary": ["Bullet fact 1", "Bullet fact 2"], 307 + "citation": { 308 + "title": "Source title", 309 + "author_or_org": "Org/Author", 310 + "publication_date": "2024-05-01", 311 + "url": "https://source.example", 312 + "publisher": "Publisher/Domain" 313 + }, 314 + "excerpt": ["Short quote ≤25 words (optional)"], 315 + "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT", 316 + "reliability_justification": "Why high/medium/low", 317 + "limitations_and_reservations": ["Limitation 1", "Limitation 2"], 318 + "retraction_or_dispute_signal": "none | correction | retraction | disputed", 319 + "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED" 320 + } 321 + ] 322 + } 323 + ] 454 454 } 325 + ], 326 + "quality_gates": { 327 + "gate1_claim_validation": "pass | fail", 328 + "gate4_verdict_confidence": "pass | fail", 329 + "passed_all": "boolean", 330 + "gate_fail_reasons": [ 331 + { 332 + "gate": "gate1_claim_validation", 333 + "claim_id": "C1", 334 + "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE", 335 + "explanation": 
"Human-readable explanation" 336 + } 337 + ] 338 + }, 339 + "global_notes": { 340 + "limitations": ["System limitation 1", "Limitation 2"], 341 + "safety_or_policy_notes": ["Note 1"] 455 455 } 456 456 } 457 -}} }344 +{{/code}} 458 458 459 - ----346 +=== 4.1 Risk Tier Definitions === 460 460 461 -=== 6.8 Implementation Notes === 348 +|=Tier|=Impact|=Examples|=Actions 349 +|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required) 350 +|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated) 351 +|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated) 462 462 463 - **ProviderAdapterPattern:**353 +=== 4.2 Source Reliability (Read-Only Snapshots) === 464 464 465 -{{{ 466 -class AnthropicProvider implements LLMProvider { 467 - async complete(prompt: string, options: CompletionOptions) { 468 - const response = await anthropic.messages.create({ 469 - model: options.model || 'claude-sonnet-3-5', 470 - max_tokens: options.maxTokens || 4096, 471 - messages: [{ role: 'user', content: prompt }], 472 - system: options.systemPrompt 473 - }) 474 - return response.content[0].text 475 - } 476 -} 355 +**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job. 
477 477 478 -class OpenAIProvider implements LLMProvider { 479 - async complete(prompt: string, options: CompletionOptions) { 480 - const response = await openai.chat.completions.create({ 481 - model: options.model || 'gpt-4o', 482 - max_tokens: options.maxTokens || 4096, 483 - messages: [ 484 - { role: 'system', content: options.systemPrompt }, 485 - { role: 'user', content: prompt } 486 - ] 487 - }) 488 - return response.choices[0].message.content 489 - } 490 -} 491 -}}} 357 +* POC1 treats these scores as **read-only** (no modification during analysis) 358 +* **Prevents circular dependency:** scoring → affects retrieval → affects scoring 359 +* Full Source Track Record System is a **separate service** (not part of POC1) 360 +* **Temporal separation:** Scoring runs weekly; analysis uses snapshots 492 492 493 -**Provider Registry:** 362 +**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm. 494 494 495 -{{{ 496 -const providers = new Map<string, LLMProvider>() 497 -providers.set('anthropic', new AnthropicProvider()) 498 -providers.set('openai', new OpenAIProvider()) 499 -providers.set('google', new GoogleProvider()) 364 +=== 4.3 Quality Gate Reason Codes === 500 500 501 -function getProvider(name: string): LLMProvider { 502 - return providers.get(name) || providers.get(config.primaryProvider) 503 -} 504 -}}} 366 +**Gate 1 (Claim Validation):** 367 +* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor 368 +* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement 369 +* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact 370 +* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation 505 505 506 ----- 372 +**Gate 4 (Verdict Confidence):** 373 +* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5) 374 +* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict 375 +* 
{{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution 376 +* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed 507 507 508 -== 3. REST API Contract == 378 +**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve) 509 509 510 -=== 3.1 User Credit Tracking === 380 +--- 511 511 512 -**Endpoint:** GET /v1/user/credit 382 +== 5. Validation Rules (POC1 Enforcement) == 513 513 514 -**Response:** 200 OK 384 +|=Rule|=Requirement 385 +|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}. 386 +|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging. 387 +|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content. 388 +|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions. 515 515 516 -{{{{ 517 - "user_id": "user_abc123", 518 - "tier": "free", 519 - "credit_limit": 10.00, 520 - "credit_used": 7.42, 521 - "credit_remaining": 2.58, 522 - "reset_date": "2025-02-01T00:00:00Z", 523 - "cache_only_mode": false, 524 - "usage_stats": { 525 - "articles_analyzed": 67, 526 - "claims_from_cache": 189, 527 - "claims_newly_analyzed": 113, 528 - "cache_hit_rate": 0.626 529 - } 530 -} 531 -}}} 390 +--- 532 532 533 ----- 392 +== 6. 
Deterministic Markdown Template == 534 534 535 -=== 3.2 Create Analysis Job (3-Stage) === 394 +The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM). 536 536 537 -**Endpoint:** POST /v1/analyze 396 +{{code language="markdown"}} 397 +# FactHarbor Analysis Report: {overall_verdict} 538 538 539 -==== Idempotency Support: ==== 399 +**Job ID:** {job_id} | **Generated:** {timestamp_utc} 400 +**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s 540 540 541 -To prevent duplicate job creation on network retries, clients SHOULD include: 402 +--- 542 542 543 -{{{POST /v1/analyze 544 -Idempotency-Key: {client-generated-uuid} 545 -}}} 404 +## 1. Holistic Assessment (Experimental) 546 546 547 -OR use the client.request_id field: 406 +**Main Thesis:** {main_thesis} 548 548 549 -{{{{ 550 - "input_url": "...", 551 - "client": { 552 - "request_id": "client-uuid-12345", 553 - "source_label": "optional" 554 - } 555 -} 556 -}}} 408 +**Overall Verdict:** {overall_verdict} 557 557 558 -**Server Behavior:** 410 +**Logic Quality Score:** {logic_quality_score}/1.0 559 559 560 -* If Idempotency-Key or request_id seen before (within 24 hours): 561 -** Return existing job (200 OK, not 202 Accepted) 562 -** Do NOT create duplicate job or charge twice 563 -* Idempotency keys expire after 24 hours (matches job retention) 412 +**Fallacies Detected:** {fallacies_detected} 564 564 565 -**Example Response (Idempotent):** 414 +**Reasoning:** {verdict_reasoning} 566 566 567 -{{{{ 568 - "job_id": "01J...ULID", 569 - "status": "RUNNING", 570 - "idempotent": true, 571 - "original_request_at": "2025-12-24T10:31:00Z", 572 - "message": "Returning existing job (idempotency key matched)" 573 -} 574 -}}} 416 +--- 575 575 576 -==== Request Body: ==== 418 +## 2. 
Key Claims Analysis
577 577
578 -{{{{
579 -  "input_type": "url",
580 -  "input_url": "https://example.com/medical-report-01",
581 -  "input_text": null,
582 -  "options": {
583 -    "browsing": "on",
584 -    "depth": "standard",
585 -    "max_claims": 5,
586 -    "scenarios_per_claim": 2,
587 -    "max_evidence_per_scenario": 6,
588 -    "context_aware_analysis": true
589 -  },
590 -  "client": {
591 -    "request_id": "optional-client-tracking-id",
592 -    "source_label": "optional"
593 -  }
594 -}
595 -}}}
420 +### [C1] {claim_text}
421 +* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"}
422 +* **Risk Tier:** {risk_tier} ({risk_tier_justification})
423 +* **Evaluability:** {evaluability}
596 596
597 -**Options:**
425 +**Scenarios Explored:** {scenarios.length}
598 598
599 -* browsing: on | off (retrieve web sources or just output queries)
600 -* depth: standard | deep (evidence thoroughness)
601 -* max_claims: 1-10 (default: **5** for cost control)
602 -* scenarios_per_claim: 1-5 (default: **2** for cost control)
603 -* max_evidence_per_scenario: 3-10 (default: **6**)
604 -* context_aware_analysis: true | false (experimental)
427 +#### Scenario: {scenario.context_title}
428 +* **Verdict:** {verdict.label} (Confidence: {verdict.confidence})
429 +* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]}
430 +* **Reasoning:** {verdict.reasoning}
605 605
606 -**Response:** 202 Accepted
432 +**Evidence:**
433 +* Supporting: {evidence.filter(e => e.stance == "supports").length} sources
434 +* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources
435 +* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources
607 607
608 -{{{{
609 -  "job_id": "01J...ULID",
610 -  "status": "QUEUED",
611 -  "created_at": "2025-12-24T10:31:00Z",
612 -  "estimated_cost": 0.114,
613 -  "cost_breakdown": {
614 -    "stage1_extraction": 0.003,
615 -    "stage2_new_claims": 0.081,
616 -    "stage2_cached_claims": 0.000,
617 -    "stage3_holistic":
0.030
618 -  },
619 -  "cache_info": {
620 -    "claims_to_extract": 5,
621 -    "estimated_cache_hits": 4,
622 -    "estimated_new_claims": 1
623 -  },
624 -  "links": {
625 -    "self": "/v1/jobs/01J...ULID",
626 -    "result": "/v1/jobs/01J...ULID/result",
627 -    "report": "/v1/jobs/01J...ULID/report",
628 -    "events": "/v1/jobs/01J...ULID/events"
629 -  }
630 -}
631 -}}}
437 +**Key Evidence:**
438 +* [{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance}
632 632
633 -**Error Responses:**
440 +---
634 634
635 -402 Payment Required - Free tier limit reached, cache-only mode
442 +## 3. Quality Assessment
636 636
637 -{{{{
638 -  "error": "credit_limit_reached",
639 -  "message": "Monthly credit limit reached. Entering cache-only mode.",
640 -  "cache_only_mode": true,
641 -  "credit_remaining": 0.00,
642 -  "reset_date": "2025-02-01T00:00:00Z",
643 -  "action": "Resubmit with cache_preference=allow_partial for cached results"
644 -}
645 -}}}
444 +**Quality Gates:**
445 +* Gate 1 (Claim Validation): {gate1_claim_validation}
446 +* Gate 4 (Verdict Confidence): {gate4_verdict_confidence}
447 +* Overall: {passed_all ? "PASS" : "FAIL"}
646 646
647 -----
449 +{if gate_fail_reasons.length > 0}
450 +**Failed Gates:**
451 +{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)}
452 +{/if}
648 648
649 -== 4. Data Schemas ==
454 +---
650 650
651 -=== 4.1 Stage 1 Output: ClaimExtraction ===
456 +## 4.
Limitations & Disclaimers
652 652
653 -{{{{
654 -  "job_id": "01J...ULID",
655 -  "stage": "stage1_extraction",
656 -  "article_metadata": {
657 -    "title": "Article title",
658 -    "source_url": "https://example.com/article",
659 -    "extracted_text_length": 5234,
660 -    "language": "en"
661 -  },
662 -  "claims": [
663 -    {
664 -      "claim_id": "C1",
665 -      "claim_text": "Original claim text from article",
666 -      "canonical_claim": "Normalized, deduplicated phrasing",
667 -      "claim_hash": "sha256:abc123...",
668 -      "is_central_to_thesis": true,
669 -      "claim_type": "causal",
670 -      "evaluability": "evaluable",
671 -      "risk_tier": "B",
672 -      "domain": "public_health"
673 -    }
674 -  ],
675 -  "article_thesis": "Main argument detected",
676 -  "cost": 0.003
677 -}
678 -}}}
458 +**System Limitations:**
459 +{limitations.map(l => `* ${l}`)}
679 679
680 -----
461 +**Important Notes:**
462 +* This analysis is AI-generated and experimental (POC1)
463 +* Context-aware article verdict is being tested for accuracy
464 +* Human review recommended for high-risk claims (Tier A)
465 +* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens}
681 681
682 -=== 4.5 Verdict Label Taxonomy ===
467 +**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis.
683 683
684 -FactHarbor uses **three distinct verdict taxonomies** depending on analysis level:
469 +---
685 685
686 -==== 4.5.1 Scenario Verdict Labels (Stage 2) ====
471 +*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)*
472 +{{/code}}
687 687
688 -Used for individual scenario verdicts within a claim.
474 +**Target Report Size:** 220-350 words (optimized for 2-minute read)
689 689
690 -**Enum Values:**
476 +---
691 691
692 -* Highly Likely - Probability 0.85-1.0, high confidence
693 -* Likely - Probability 0.65-0.84, moderate-high confidence
694 -* Unclear - Probability 0.35-0.64, or low confidence
695 -* Unlikely - Probability 0.16-0.34, moderate-high confidence
696 -* Highly Unlikely - Probability 0.0-0.15, high confidence
697 -* Unsubstantiated - Insufficient evidence to determine probability
478 +== 7. LLM Configuration (POC1) ==
698 698
699 -==== 4.5.2 Claim Verdict Labels (Rollup) ====
480 +|=Parameter|=Value|=Notes
481 +|**Provider**|Anthropic|Primary provider for POC1
482 +|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model
483 +|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports)
484 +|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length)
485 +|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output)
486 +|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation
487 +|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment
488 +|**Few-Shot Examples**|Yes|For claim extraction and scenario generation
700 700
701 -Used when summarizing a claim across all scenarios.
490 +=== 7.1 Token Budgets by Stage ===
702 702
703 -**Enum Values:**
492 +|=Stage|=Approximate Output Tokens
493 +|Claim Extraction|~4,000 (10 claims × ~400 tokens)
494 +|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens)
495 +|Evidence
Synthesis|~2,000 per scenario
496 +|Verdict Generation|~1,000 per scenario
497 +|Holistic Assessment|~500 (context-aware summary)
704 704
705 -* Supported - Majority of scenarios are Likely or Highly Likely
706 -* Refuted - Majority of scenarios are Unlikely or Highly Unlikely
707 -* Inconclusive - Mixed scenarios or majority Unclear/Unsubstantiated
499 +**Total:** 50K-80K tokens per article (input + output)
708 708
709 -**Mapping Logic:**
501 +=== 7.2 API Integration ===
710 710
711 -* If ≥60% scenarios are (Highly Likely | Likely) → Supported
712 -* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted
713 -* Otherwise → Inconclusive
503 +**Anthropic Messages API:**
504 +* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}}
505 +* Authentication: API key via {{code}}x-api-key{{/code}} header
506 +* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}}
507 +* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage)
714 714
715 -==== 4.5.3 Article Verdict Labels (Stage 3) ====
509 +**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice.
716 716
717 -Used for holistic article-level assessment.
511 +---
718 718
719 -**Enum Values:**
513 +== 8. Cross-References (xWiki) ==
720 720
721 -* WELL-SUPPORTED - Article thesis logically follows from supported claims
722 -* MISLEADING - Claims may be true but article commits logical fallacies
723 -* REFUTED - Central claims are refuted, invalidating thesis
724 -* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts
515 +This API specification implements requirements from:
725 725
726 -**Note:** Article verdict considers **claim centrality** (central claims override supporting claims).
517 +* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
518 +** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements)
519 +** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only)
520 +** Section 2.1: Analysis Summary (Context-Aware) component specification
521 +** Section 10.3: Prompt structure for claim extraction and verdict synthesis
727 727
728 -==== 4.5.4 API Field Mapping ====
523 +* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
524 +** Complete investigation of 7 approaches to article-level verdicts
525 +** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1
526 +** Experimental feature testing plan (30 articles, ≥70% accuracy target)
527 +** Decision framework for POC2 implementation
729 729
730 -|=Level|=API Field|=Enum Name
731 -|Scenario|scenarios[].verdict.label|scenario_verdict_label
732 -|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
733 -|Article|article_holistic_assessment.overall_verdict|article_verdict_label
529 +* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
530 +** FR4 (Analysis Summary) - enhanced with context-aware capability
531 +** FR7 (Verdict Calculation) - probability ranges + confidence scores
532 +** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2
734 734
735 -----
534 +* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
535 +** POC1 simplified architecture (stateless, single AKEL orchestration call)
536 +** Data persistence minimized (job outputs only, no database required)
537 +** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify)
736 736
737 -== 5.
Cache Architecture ==
539 +* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
540 +** Evidence structure (source, stance, reliability rating)
541 +** Scenario boundaries (time, geography, population, conditions)
542 +** Claim types and evaluability taxonomy
543 +** Source Track Record System (Section 1.3) - temporal separation
738 738
739 -=== 5.1 Redis Cache Design ===
545 +* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]**
546 +** POC1 requirement mappings and phase assignments
547 +** Context-aware analysis as POC1 experimental feature
548 +** POC2 enhancement path (Gates 2 & 3, evidence deduplication)
740 740
741 -**Technology:** Redis 7.0+ (in-memory key-value store)
550 +---
742 742
743 -**Cache Key Schema:**
552 +== 9. Implementation Notes (POC1) ==
744 744
745 -{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
746 -}}}
554 +=== 9.1 Recommended Tech Stack ===
747 747
748 -**Example:**
556 +* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase
557 +* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure
558 +* **Storage:** Filesystem JSON files (no database needed for POC1)
559 +* **Queue:** In-memory queue or Redis (optional for concurrency)
560 +* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback)
561 +* **Deployment:** Vercel, AWS Lambda, or similar serverless
749 749
750 -{{{Claim (English): "COVID vaccines are 95% effective"
751 -Canonical: "covid vaccines are 95 percent effective"
752 -Language: "en"
753 -SHA256: abc123...def456
754 -Key: claim:v1norm1:en:abc123...def456
755 -}}}
563 +=== 9.2 POC1 Simplifications ===
756 756
757 -**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.
565 +* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}})
566 +* **No user authentication:** Optional API key validation only (env var: {{code}}FACTHARBOR_API_KEY{{/code}})
567 +* **Single-instance deployment:** No distributed processing, no worker pools
568 +* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning)
569 +* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}})
570 +* **Rate limiting:** Simple IP-based (optional) - no complex billing
758 758
759 -**Data Structure:**
572 +=== 9.3 Estimated Costs (Per Analysis) ===
760 760
761 -{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
762 -EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
763 -}}}
574 +**LLM API costs (Claude 3.5 Sonnet):**
575 +* Input: $3.00 per million tokens
576 +* Output: $15.00 per million tokens
577 +* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical)
764 764
765 -----
579 +**Web search costs (optional):**
580 +* Using external search API (Tavily, Brave): $0.01-0.05 per analysis
581 +* POC1 can use free search APIs initially
766 766
767 -=== 5.1.1 Canonical Claim Normalization (v1) ===
583 +**Infrastructure costs:**
584 +* Vercel hobby tier: Free for POC
585 +* AWS Lambda: ~$0.001 per request
586 +* **Total infra:** <$0.01 per analysis
768 768
769 -The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.
588 +**Total estimated cost:** ~$0.15-0.35 per analysis ✅ Meets <$0.35 target
770 770
771 -**Algorithm: Canonical Claim Normalization v1**
590 +=== 9.4 Estimated Timeline (AI-Assisted) ===
772 772
773 -{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
774 -    """
775 -    Normalizes claim to canonical form for cache key generation.
776 -    Version: v1norm1 (POC1)
777 -    """
778 -    import re
779 -    import unicodedata
780 -
781 -    # Step 1: Unicode normalization (NFC)
782 -    text = unicodedata.normalize('NFC', claim_text)
783 -
784 -    # Step 2: Lowercase
785 -    text = text.lower()
786 -
787 -    # Step 3: Remove punctuation (except hyphens in words)
788 -    text = re.sub(r'[^\w\s-]', '', text)
789 -
790 -    # Step 4: Normalize whitespace (collapse multiple spaces)
791 -    text = re.sub(r'\s+', ' ', text).strip()
792 -
793 -    # Step 5: Numeric normalization
794 -    text = text.replace('%', ' percent')
795 -    # Spell out single-digit numbers
796 -    num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
797 -                   '4':'four', '5':'five', '6':'six', '7':'seven',
798 -                   '8':'eight', '9':'nine'}
799 -    for num, word in num_to_word.items():
800 -        text = re.sub(rf'\b{num}\b', word, text)
801 -
802 -    # Step 6: Common abbreviations (English only in v1)
803 -    if language == 'en':
804 -        text = text.replace('covid-19', 'covid')
805 -        text = text.replace('u.s.', 'us')
806 -        text = text.replace('u.k.', 'uk')
807 -
808 -    # Step 7: NO entity normalization in v1
809 -    # (Trump vs Donald Trump vs President Trump remain distinct)
810 -
811 -    return text
592 +**With Cursor IDE + Claude API:**
593 +* Day 1-2: API scaffolding + job queue
594 +* Day 3-4: LLM integration + prompt engineering
595 +* Day 5-6: Evidence retrieval + contradiction search
596 +* Day 7: Report templates + testing with 30 articles
597 +* **Total:** 5-7 days for working POC1
812 812
813 -# Version identifier (include in cache namespace)
814 -CANONICALIZER_VERSION = "v1norm1"
815 -}}}
599 +**Manual coding (no AI assistance):**
600 +* Estimate: 15-20 days
816 816
817 -**Cache Key Formula (Updated):**
602 +=== 9.5 First Prompt for AI Code Generation ===
818 818
819 -{{{language = "en"
820 -canonical = normalize_claim_v1(claim_text, language)
821 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
604 +{{code}}
605 +Based on the FactHarbor POC1 API &
Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with:
822 822
823 -Example:
824 -  claim: "COVID-19 vaccines are 95% effective"
825 -  canonical: "covid vaccines are 95 percent effective"
826 -  sha256: abc123...def456
827 -  key: "claim:v1norm1:en:abc123...def456"
828 -}}}
607 +1. API routes implementing the 7 endpoints specified in Section 3
608 +2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5
609 +3. Anthropic Claude 3.5 Sonnet integration for:
610 +   - Claim extraction (with central/supporting marking)
611 +   - Scenario generation
612 +   - Evidence synthesis (with mandatory contradiction search)
613 +   - Verdict generation
614 +   - Holistic assessment (article-level credibility)
615 +4. Job-based async execution with progress tracking (7 pipeline stages)
616 +5. Quality Gates 1 & 4 from NFR11 implementation
617 +6. Mandatory contradiction search enforcement (Section 5)
618 +7. Context-aware analysis (experimental) as specified
619 +8. Filesystem-based job storage (no database)
620 +9. Markdown report generation from JSON templates (Section 6)
829 829
830 -**Cache Metadata MUST Include:**
622 +Use the validation rules from Section 5 and error codes from Section 2.1.1.
623 +Target: <$0.35 per analysis, <2 minutes processing time.
624 +{{/code}}
831 831
832 -{{{{
833 -  "canonical_claim": "covid vaccines are 95 percent effective",
834 -  "canonicalizer_version": "v1norm1",
835 -  "language": "en",
836 -  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
837 -}
838 -}}}
626 +---
839 839
840 -**Version Upgrade Path:**
628 +== 10.
Testing Strategy (POC1) ==
841 841
842 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
843 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches
630 +=== 10.1 Test Dataset (30 Articles) ===
844 844
845 -----
632 +**Category 1: Straightforward Factual (10 articles)**
633 +* Purpose: Baseline accuracy
634 +* Example: "WHO report on global vaccination rates"
635 +* Expected: High claim accuracy, straightforward verdict
846 846
847 -=== 5.1.2 Copyright & Data Retention Policy ===
637 +**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test**
638 +* Purpose: Test holistic assessment capability
639 +* Example: "Coffee cures cancer" (true premises, false conclusion)
640 +* Expected: Individual claims TRUE, article verdict MISLEADING
848 848
849 -**Evidence Excerpt Storage:**
642 +**Category 3: Mixed Accuracy (5 articles)**
643 +* Purpose: Test nuance handling
644 +* Example: Articles with some true, some false claims
645 +* Expected: Scenario-level differentiation
850 850
851 -To comply with copyright law and fair use principles:
647 +**Category 4: Low-Quality Claims (5 articles)**
648 +* Purpose: Test quality gates
649 +* Example: Opinion pieces, compound claims
650 +* Expected: Gate 1 failures, rejection or draft-only mode
852 852
853 -**What We Store:**
652 +=== 10.2 Success Metrics ===
854 854
855 -* **Metadata only:** Title, author, publisher, URL, publication date
856 -* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
857 -* **Summaries:** AI-generated bullet points (not verbatim text)
858 -* **No full articles:** Never store complete article text beyond job processing
654 +**Quality Metrics:**
655 +* Hallucination rate: <5% (target: <3%)
656 +* Context-aware accuracy: ≥70% (experimental - key POC1 goal)
657 +* False positive rate: <15%
658 +* Mandatory contradiction search: 100% compliance
859 859
860 -**Total per Cached Claim:**
660 +**Performance Metrics:**
661 +* Processing time: <2 minutes per article (standard depth)
662 +* Cost per analysis: <$0.35
663 +* API uptime: >99%
664 +* LLM API error rate: <1%
861 861
862 -* Scenarios: 2 per claim
863 -* Evidence items: 6 per scenario (12 total)
864 -* Quotes: 3 per evidence × 25 words = 75 words per item
865 -* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)
666 +**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for complete success criteria and testing methodology.
866 866
867 -**Retention:**
668 +---
868 868
869 -* Cache TTL: 90 days
870 -* Job outputs: 24 hours (then archived or deleted)
871 -* No persistent full-text article storage
670 +**End of Specification - FactHarbor POC1 API v0.3**
872 872
873 -**Rationale:**
672 +**Ready for xWiki import and AI-assisted implementation!** 🚀
874 874
875 -* Short excerpts for citation = fair use
876 -* Summaries are transformative (not copyrightable)
877 -* Limited retention (90 days max)
878 -* No commercial republication of excerpts
879 -
880 -**DMCA Compliance:**
881 -
882 -* Cache invalidation endpoint available for rights holders
883 -* Contact: dmca@factharbor.org
884 -
885 -----
886 -
887 -== Summary ==
888 -
889 -This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification.
890 -
891 -**Full specification includes:**
892 -
893 -* Complete API endpoints (7 total)
894 -* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete)
895 -* Quality gates & validation rules
896 -* LLM configuration for all 3 stages
897 -* Implementation notes with code samples
898 -* Testing strategy
899 -* Cross-references to other pages
900 -
901 -**The complete specification is available in:**
902 -
903 -* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
904 -* Export files (TEST/PRODUCTION) for xWiki import
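
The two machine-checkable rules in Section 5 (Mandatory Contradiction and Context-Aware Logic) could be enforced by a small post-processing check along the following lines. This is a minimal sketch, not the specified implementation: the `Claim` container, its claim-level `verdict` field, the flattened `evidence_stances` list, and the `validate_result` function are hypothetical names chosen for illustration, and any non-empty `uncertainty_factors` list is assumed to count as the "explicit note" the rule asks for.

```python
from dataclasses import dataclass, field
from typing import List

# Stances that show a contradiction search produced something (Section 5).
CONTRADICTION_STANCES = {"undermines", "mixed", "context_dependent"}


@dataclass
class Claim:
    """Hypothetical, simplified view of one analyzed claim."""
    claim_id: str
    is_central_to_thesis: bool
    verdict: str  # assumed claim-level label, e.g. "REFUTED"
    evidence_stances: List[str] = field(default_factory=list)
    uncertainty_factors: List[str] = field(default_factory=list)


def validate_result(overall_verdict: str, claims: List[Claim]) -> List[str]:
    """Return rule violations as readable strings; an empty list means valid."""
    violations = []
    for claim in claims:
        # Mandatory Contradiction: need counter-evidence OR an explicit note.
        has_counter = any(s in CONTRADICTION_STANCES for s in claim.evidence_stances)
        if not has_counter and not claim.uncertainty_factors:
            violations.append(f"{claim.claim_id}: no contradiction search recorded")
        # Context-Aware Logic: a refuted central claim rules out WELL-SUPPORTED.
        if (claim.is_central_to_thesis
                and claim.verdict == "REFUTED"
                and overall_verdict == "WELL-SUPPORTED"):
            violations.append(f"{claim.claim_id}: refuted central claim forbids WELL-SUPPORTED")
    return violations
```

Returning a list of violations rather than raising on the first one keeps the check usable for the quality-gate logging the specification describes, since every failed rule can be reported for the same job.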