Last modified by Robert Schaub on 2025/12/24 18:26

From version 1.1
edited by Robert Schaub
on 2025/12/24 11:54
Change comment: Imported from XAR
To version 5.1
edited by Robert Schaub
on 2025/12/24 17:59
Change comment: Imported from XAR

... ... @@ -1,673 +1,904 @@
1 -# FactHarbor POC1 API & Schemas Specification
1 += POC1 API & Schemas Specification =
2 2  
3 -**Version:** 0.3 (POC1 - Production Ready)
4 -**Namespace:** FactHarbor.*
5 -**Syntax:** xWiki 2.1
6 -**Last Updated:** 2025-12-24
3 +----
7 7  
8 ----
9 -
10 10  == Version History ==
11 11  
12 12  |=Version|=Date|=Changes
13 -|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references
14 -|0.2|2025-12-24|Initial rebased version with holistic assessment
15 -|0.1|2025-12-24|Original specification
8 +|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
9 +|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
10 +|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
11 +|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details
16 16  
17 ----
13 +----
18 18  
19 19  == 1. Core Objective (POC1) ==
20 20  
21 -The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**:
17 +The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.
22 22  
23 -The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).
19 +The system must prove that AI can identify an article's **Main Thesis** and determine if supporting claims logically support that thesis without committing fallacies.
24 24  
25 -**Success Criteria:**
21 +=== Success Criteria: ===
22 +
26 26  * Test with 30 diverse articles
27 27  * Target: ≥70% accuracy detecting misleading articles
28 -* Cost: <$0.35 per analysis
25 +* Cost: <$0.25 per NEW analysis (uncached)
26 +* Cost: $0.00 for cached claim reuse
27 +* Cache hit rate: ≥50% after 1,000 articles
29 29  * Processing time: <2 minutes (standard depth)
30 30  
31 -**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches.
30 +=== Economic Model: ===
32 32  
33 ----
32 +* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
33 +* **After limit:** Cache-only mode (instant, free access to cached claims)
34 +* **Paid tier:** Unlimited new analyses
34 34  
35 -== 2. Runtime Model & Job States ==
36 +----
36 36  
37 -=== 2.1 Pipeline Steps ===
38 +== 2. Architecture Overview ==
38 38  
39 -For progress reporting via API, the pipeline follows these stages:
40 +=== 2.1 3-Stage Pipeline with Caching ===
40 40  
41 -# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization.
42 -# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting.
43 -# **SCENARIOS**: Generating context interpretations for each claim.
44 -# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search).
45 -# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario.
46 -# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic).
47 -# **REPORT**: Generating final Markdown and JSON outputs.
42 +FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
48 48  
49 -=== 2.1.1 URL Extraction Strategy ===
44 +{{mermaid}}
45 +graph TD
46 + A[Article Input] --> B[Stage 1: Extract Claims]
47 + B --> C{For Each Claim}
48 + C --> D[Check Cache]
49 + D -->|Cache HIT| E[Return Cached Verdict]
50 + D -->|Cache MISS| F[Stage 2: Analyze Claim]
51 + F --> G[Store in Cache]
52 + G --> E
53 + E --> H[Stage 3: Holistic Assessment]
54 + H --> I[Final Report]
55 +{{/mermaid}}
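The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `runPipeline`, `PipelineDeps`, and `Verdict` are hypothetical names, the three stage functions are injected so that no real LLM call is implied, and an in-memory `Map` keyed by the claim text stands in for the Redis cache.

```typescript
// Hypothetical shape of a cached claim verdict (not defined in this spec).
type Verdict = { claim: string; label: string };

// The three stages are injected; their real signatures are an assumption.
interface PipelineDeps {
  extractClaims(article: string): Promise<string[]>; // Stage 1
  analyzeClaim(claim: string): Promise<Verdict>; // Stage 2
  assessHolistically(article: string, verdicts: Verdict[]): Promise<string>; // Stage 3
}

async function runPipeline(
  article: string,
  cache: Map<string, Verdict>, // stand-in for the Redis claim cache
  deps: PipelineDeps,
): Promise<{ report: string; cacheHits: number; cacheMisses: number }> {
  const claims = await deps.extractClaims(article);
  const verdicts: Verdict[] = [];
  let cacheHits = 0;
  let cacheMisses = 0;

  for (const claim of claims) {
    const cached = cache.get(claim);
    if (cached !== undefined) {
      // Cache HIT: reuse the stored verdict, no Stage 2 call.
      cacheHits++;
      verdicts.push(cached);
    } else {
      // Cache MISS: run Stage 2, then store the result for later articles.
      cacheMisses++;
      const verdict = await deps.analyzeClaim(claim);
      cache.set(claim, verdict);
      verdicts.push(verdict);
    }
  }

  // Stage 3 always runs on the full set of verdicts, cached or fresh.
  const report = await deps.assessHolistically(article, verdicts);
  return { report, cacheHits, cacheMisses };
}
```

Running the same article twice against the same cache should produce only cache hits on the second pass, which is the behaviour the claim-level cache is meant to provide.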
50 50  
51 -**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}})
52 -* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient
53 -* **Fallback:** Trafilatura (Python library) for simple static HTML
57 +==== Stage 1: Claim Extraction (Haiku, no cache) ====
54 54  
55 -**Error Handling:**
59 +* **Input:** Article text
60 +* **Output:** 5 canonical claims (normalized, deduplicated)
61 +* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
62 +* **Cost:** $0.003 per article
63 +* **Cache strategy:** No caching (article-specific)
56 56  
57 -|=Error Code|=Trigger|=Action
58 -|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest text paste
59 -|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail
60 -|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately
61 -|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason
65 +==== Stage 2: Claim Analysis (Sonnet, CACHED) ====
62 62  
63 -**Supported URL Patterns:**
64 -* News articles, blog posts, Wikipedia
65 -* Academic preprints (arXiv)
66 -* ❌ Social media posts (Twitter, Facebook) - not in POC1
67 -* ❌ Video platforms (YouTube, TikTok) - not in POC1
68 -* PDF files - deferred to Beta 0
67 +* **Input:** Single canonical claim
68 +* **Output:** Scenarios + Evidence + Verdicts
69 +* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
70 +* **Cost:** $0.081 per NEW claim
71 +* **Cache strategy:** Redis, 90-day TTL
72 +* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
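As an illustration of the cache key format, the sketch below builds a key with Node's built-in `crypto` module. It assumes the claim has already been canonicalized (the canonicalization algorithm is not part of this section) and that the SHA-256 digest is hex-encoded, which the key format does not state explicitly. `claimCacheKey` and `CACHE_TTL_SECONDS` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// 90-day TTL from the Stage 2 cache strategy, expressed in seconds.
const CACHE_TTL_SECONDS = 90 * 24 * 60 * 60;

// Hypothetical helper: builds claim:v1norm1:{language}:{sha256(canonical_claim)}.
// Assumes `canonicalClaim` is already normalized and that the digest is hex.
function claimCacheKey(canonicalClaim: string, language: string): string {
  const digest = createHash("sha256").update(canonicalClaim, "utf8").digest("hex");
  return `claim:v1norm1:${language}:${digest}`;
}

console.log(claimCacheKey("the earth orbits the sun", "en"), CACHE_TTL_SECONDS);
```

Because the language is part of the key, the same canonical text in two languages maps to two separate cache entries.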
69 69  
70 -=== 2.2 Job Status Enumeration ===
74 +==== Stage 3: Holistic Assessment (Sonnet, no cache) ====
71 71  
72 -(((
73 -* **QUEUED** - Job accepted, waiting in queue
74 -* **RUNNING** - Processing in progress
75 -* **SUCCEEDED** - Analysis complete, results available
76 -* **FAILED** - Error occurred, see error details
77 -* **CANCELLED** - User cancelled via DELETE endpoint
78 -)))
76 +* **Input:** Article + Claim verdicts (from cache or Stage 2)
77 +* **Output:** Article verdict + Fallacies + Logic quality
78 +* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
79 +* **Cost:** $0.030 per article
80 +* **Cache strategy:** No caching (article-specific)
79 79  
80 ----
81 81  
82 -== 3. REST API Contract ==
83 83  
84 -=== 3.1 Create Analysis Job ===
84 +**Note:** Stage 3 implements **Approach 1 (Single-Pass Holistic Analysis)** from the [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]. While claim analysis (Stage 2) is cached for efficiency, the holistic assessment maintains the integrated evaluation philosophy of Approach 1.
85 85  
86 -**Endpoint:** {{code}}POST /v1/analyze{{/code}}
86 +=== Total Cost Formula: ===
87 87  
88 -**Request Body Example:**
89 -{{code language="json"}}
90 -{
91 - "input_type": "url",
92 - "input_url": "https://example.com/medical-report-01",
93 - "input_text": null,
94 - "options": {
95 - "browsing": "on",
96 - "depth": "standard",
97 - "max_claims": 5,
98 - "context_aware_analysis": true
99 - },
100 - "client": {
101 - "request_id": "optional-client-tracking-id",
102 - "source_label": "optional"
103 - }
104 -}
105 -{{/code}}
88 +{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
106 106  
107 -**Options:**
108 -* {{code}}browsing{{/code}}: {{code}}on{{/code}} | {{code}}off{{/code}} (retrieve web sources or just output queries)
109 -* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness)
110 -* {{code}}max_claims{{/code}}: 1-50 (default: 10)
111 -* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental)
90 +Examples:
91 +- 0 new claims (100% cache hit): $0.033
92 +- 1 new claim (80% cache hit): $0.114
93 +- 3 new claims (40% cache hit): $0.276
94 +- 5 new claims (0% cache hit): $0.438
95 +}}}
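The formula can be checked with a few lines of TypeScript. This is a sketch only: the constants are copied from the formula above, while `estimateArticleCost` is a hypothetical helper name.

```typescript
// Per-stage costs in USD, copied from the cost formula.
const EXTRACTION_COST = 0.003; // Stage 1, per article
const CLAIM_ANALYSIS_COST = 0.081; // Stage 2, per NEW (uncached) claim
const HOLISTIC_COST = 0.03; // Stage 3, per article

// Hypothetical helper: estimated cost of one article, given how many of
// its claims miss the cache.
function estimateArticleCost(newClaims: number): number {
  if (!Number.isInteger(newClaims) || newClaims < 0) {
    throw new RangeError("newClaims must be a non-negative integer");
  }
  const cost = EXTRACTION_COST + newClaims * CLAIM_ANALYSIS_COST + HOLISTIC_COST;
  // Round to tenths of a cent to hide floating-point noise.
  return Math.round(cost * 1000) / 1000;
}

for (const n of [0, 1, 3, 5]) {
  console.log(`${n} new claims: $${estimateArticleCost(n).toFixed(3)}`);
}
```

With five new claims the estimate is $0.438, so the per-article figure depends strongly on how many claims are already cached.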
112 112  
113 -**Response:** {{code}}202 Accepted{{/code}}
97 +----
114 114  
115 -{{code language="json"}}
116 -{
117 - "job_id": "01J...ULID",
118 - "status": "QUEUED",
119 - "created_at": "2025-12-24T10:31:00Z",
120 - "links": {
121 - "self": "/v1/jobs/01J...ULID",
122 - "result": "/v1/jobs/01J...ULID/result",
123 - "report": "/v1/jobs/01J...ULID/report",
124 - "events": "/v1/jobs/01J...ULID/events"
125 - }
126 -}
127 -{{/code}}
99 +=== 2.2 User Tier System ===
128 128  
129 ----
101 +|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
102 +|**Free**|$10|Cache-only mode|✅ Full|Basic
103 +|**Pro** (future)|$50|Continues|✅ Full|Advanced
104 +|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full
130 130  
131 -=== 3.2 Get Job Status ===
106 +**Free Tier Economics:**
132 132  
133 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}}
108 +* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
109 +* Average 70 articles/month at 70% cache hit rate
110 +* After limit: Cache-only mode
134 134  
135 -**Response:** {{code}}200 OK{{/code}}
112 +----
136 136  
137 -{{code language="json"}}
138 -{
139 - "job_id": "01J...ULID",
140 - "status": "RUNNING",
141 - "created_at": "2025-12-24T10:31:00Z",
142 - "updated_at": "2025-12-24T10:31:22Z",
143 - "progress": {
144 - "step": "RETRIEVAL",
145 - "percent": 60,
146 - "message": "Gathering evidence for C2-S1",
147 - "current_claim_id": "C2",
148 - "current_scenario_id": "C2-S1"
149 - },
150 - "input_echo": {
151 - "input_type": "url",
152 - "input_url": "https://example.com/medical-report-01"
153 - },
154 - "links": {
155 - "self": "/v1/jobs/01J...ULID",
156 - "result": "/v1/jobs/01J...ULID/result",
157 - "report": "/v1/jobs/01J...ULID/report"
158 - },
159 - "error": null
114 +=== 2.3 Cache-Only Mode (Free Tier Feature) ===
115 +
116 +When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
117 +
118 +==== What Cache-Only Mode Provides: ====
119 +
120 +✅ **Claim Extraction (Platform-Funded):**
121 +
122 +* Stage 1 extraction runs at $0.003 per article
123 +* **Cost: Absorbed by platform** (not charged to user credit)
124 +* Rationale: Extraction is necessary to check cache, and cost is negligible
125 +* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)
126 +
127 +✅ **Instant Access to Cached Claims:**
128 +
129 +* Any claim that exists in cache → Full verdict returned
130 +* Cost: $0 (no LLM calls)
131 +* Response time: <100ms
132 +
133 +✅ **Partial Article Analysis:**
134 +
135 +* Check each claim against cache
136 +* Return verdicts for ALL cached claims
137 +* For uncached claims: Return "status": "cache_miss"
138 +
139 +✅ **Cache Coverage Report:**
140 +
141 +* "3 of 5 claims available in cache (60% coverage)"
142 +* Links to cached analyses
143 +* Estimated cost to complete: $0.162 (2 new claims)
144 +
145 +❌ **Not Available in Cache-Only Mode:**
146 +
147 +* New claim analysis (Stage 2 LLM calls blocked)
148 +* Full holistic assessment (Stage 3 blocked if any claims missing)
149 +
150 +==== User Experience Example: ====
151 +
152 +{{{{
153 + "status": "cache_only_mode",
154 + "message": "Monthly credit limit reached. Showing cached results only.",
155 + "cache_coverage": {
156 + "claims_total": 5,
157 + "claims_cached": 3,
158 + "claims_missing": 2,
159 + "coverage_percent": 60
160 + },
161 + "cached_claims": [
162 + {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
163 + {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
164 + {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
165 + ],
166 + "missing_claims": [
167 + {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
168 + {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
169 + ],
170 + "upgrade_options": {
171 + "top_up": "$5 for 20-70 more articles",
172 + "pro_tier": "$50/month unlimited"
173 + }
160 160  }
161 -{{/code}}
175 +}}}
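The `cache_coverage` object in this example could be derived along the following lines. This is a sketch under assumptions: `buildCacheCoverage` is a hypothetical helper, and `estimated_cost_to_complete_usd` is an illustrative extra field that aggregates the per-claim `estimated_cost` values; it is not part of the response shown above.

```typescript
// Stage 2 cost per uncached claim, taken from the pipeline description.
const COST_PER_NEW_CLAIM_USD = 0.081;

interface CacheCoverage {
  claims_total: number;
  claims_cached: number;
  claims_missing: number;
  coverage_percent: number;
  estimated_cost_to_complete_usd: number; // illustrative field, an assumption
}

// Hypothetical helper: compares the extracted claim IDs with the set of IDs
// that were found in the cache.
function buildCacheCoverage(claimIds: string[], cachedIds: Set<string>): CacheCoverage {
  const total = claimIds.length;
  const cached = claimIds.filter((id) => cachedIds.has(id)).length;
  const missing = total - cached;
  return {
    claims_total: total,
    claims_cached: cached,
    claims_missing: missing,
    coverage_percent: total === 0 ? 0 : Math.round((cached / total) * 100),
    estimated_cost_to_complete_usd: Math.round(missing * COST_PER_NEW_CLAIM_USD * 1000) / 1000,
  };
}

console.log(buildCacheCoverage(["C1", "C2", "C3", "C4", "C5"], new Set(["C1", "C2", "C4"])));
```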
162 162  
163 ----
177 +**Design Rationale:**
164 164  
165 -=== 3.3 Get JSON Result ===
179 +* Free users still get value (cached claims often answer their question)
180 +* Demonstrates FactHarbor's value (partial results encourage upgrade)
181 +* Sustainable for platform (no additional cost)
182 +* Fair to all users (everyone contributes to cache)
166 166  
167 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}}
184 +----
168 168  
169 -**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4)
170 170  
171 -**Other Responses:**
172 -* {{code}}409 Conflict{{/code}} - Job not finished yet
173 -* {{code}}404 Not Found{{/code}} - Job ID unknown
174 174  
175 ----
188 +== 6. LLM Abstraction Layer ==
176 176  
177 -=== 3.4 Download Markdown Report ===
190 +=== 6.1 Design Principle ===
178 178  
179 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}}
192 +**FactHarbor uses a provider-agnostic LLM abstraction layer** to avoid vendor lock-in and enable:
180 180  
181 -**Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content
194 +* **Provider switching:** Change LLM providers without code changes
195 +* **Cost optimization:** Use different providers for different stages
196 +* **Resilience:** Automatic fallback if primary provider fails
197 +* **Cross-checking:** Compare outputs from multiple providers
198 +* **A/B testing:** Test new models without deployment changes
182 182  
183 -**Headers:**
184 -* {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}}
200 +**Implementation:** All LLM calls go through an abstraction layer that routes to configured providers.
185 185  
186 -**Other Responses:**
187 -* {{code}}409 Conflict{{/code}} - Job not finished
188 -* {{code}}404 Not Found{{/code}} - Job unknown
202 +----
189 189  
190 ----
204 +=== 6.2 LLM Provider Interface ===
191 191  
192 -=== 3.5 Stream Job Events (Optional, Recommended) ===
206 +**Abstract Interface:**
193 193  
194 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}}
208 +{{{
209 +interface LLMProvider {
210 + // Core methods
211 + complete(prompt: string, options: CompletionOptions): Promise<CompletionResponse>
212 + stream(prompt: string, options: CompletionOptions): AsyncIterator<StreamChunk>
213 +
214 + // Provider metadata
215 + getName(): string
216 + getMaxTokens(): number
217 + getCostPer1kTokens(): { input: number, output: number }
218 +
219 + // Health check
220 + isAvailable(): Promise<boolean>
221 +}
195 195  
196 -**Response:** Server-Sent Events (SSE) stream
223 +interface CompletionOptions {
224 + model?: string
225 + maxTokens?: number
226 + temperature?: number
227 + stopSequences?: string[]
228 + systemPrompt?: string
229 +}
230 +}}}
197 197  
198 -**Event Types:**
199 -* {{code}}progress{{/code}} - Progress update
200 -* {{code}}claim_extracted{{/code}} - Claim identified
201 -* {{code}}verdict_computed{{/code}} - Scenario verdict complete
202 -* {{code}}complete{{/code}} - Job finished
203 -* {{code}}error{{/code}} - Error occurred
232 +----
204 204  
205 ----
234 +=== 6.3 Supported Providers (POC1) ===
206 206  
207 -=== 3.6 Cancel Job ===
236 +**Primary Provider (Default):**
208 208  
209 -**Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}}
238 +* **Anthropic Claude API**
239 + * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
240 + * Used by default in POC1
241 + * Best quality for holistic analysis
210 210  
211 -Attempts to cancel a queued or running job.
243 +**Secondary Providers (Future):**
212 212  
213 -**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED)
245 +* **OpenAI API**
246 + * Models: GPT-4o, GPT-4o-mini
247 + * For cost comparison
248 +
249 +* **Google Vertex AI**
250 + * Models: Gemini 1.5 Pro, Gemini 1.5 Flash
251 + * For diversity in evidence gathering
214 214  
215 -**Note:** Already-completed jobs cannot be cancelled.
253 +* **Local Models** (Post-POC)
254 + * Models: Llama 3.1, Mistral
255 + * For privacy-sensitive deployments
216 216  
217 ----
257 +----
218 218  
219 -=== 3.7 Health Check ===
259 +=== 6.4 Provider Configuration ===
220 220  
221 -**Endpoint:** {{code}}GET /v1/health{{/code}}
261 +**Environment Variables:**
222 222  
223 -**Response:** {{code}}200 OK{{/code}}
263 +{{{
264 +# Primary provider
265 +LLM_PRIMARY_PROVIDER=anthropic
266 +ANTHROPIC_API_KEY=sk-ant-...
224 224  
225 -{{code language="json"}}
268 +# Fallback provider
269 +LLM_FALLBACK_PROVIDER=openai
270 +OPENAI_API_KEY=sk-...
271 +
272 +# Provider selection per stage
273 +LLM_STAGE1_PROVIDER=anthropic
274 +LLM_STAGE1_MODEL=claude-haiku-4
275 +LLM_STAGE2_PROVIDER=anthropic
276 +LLM_STAGE2_MODEL=claude-sonnet-3-5
277 +LLM_STAGE3_PROVIDER=anthropic
278 +LLM_STAGE3_MODEL=claude-sonnet-3-5
279 +
280 +# Cost limits
281 +LLM_MAX_COST_PER_REQUEST=1.00
282 +}}}
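One way these variables might be read at startup is sketched below. `loadStageConfig` is a hypothetical helper, and falling back to `LLM_PRIMARY_PROVIDER` when a stage has no explicit provider is an assumption that the spec does not state. The environment is passed in as a plain object so the function can be exercised without touching `process.env`.

```typescript
type Stage = "stage1" | "stage2" | "stage3";

interface StageModel {
  provider: string;
  model: string;
}

// Hypothetical loader for the per-stage variables listed above.
function loadStageConfig(env: Record<string, string | undefined>): Record<Stage, StageModel> {
  const primary = env.LLM_PRIMARY_PROVIDER;

  const load = (n: 1 | 2 | 3): StageModel => {
    // Assumption: a stage without its own provider uses the primary provider.
    const provider = env[`LLM_STAGE${n}_PROVIDER`] ?? primary;
    const model = env[`LLM_STAGE${n}_MODEL`];
    if (!provider || !model) {
      throw new Error(`Missing LLM configuration for stage ${n}`);
    }
    return { provider, model };
  };

  return { stage1: load(1), stage2: load(2), stage3: load(3) };
}
```

Failing fast on a missing model name keeps a misconfigured deployment from silently running a stage on an unintended default.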
283 +
284 +**Database Configuration (Alternative):**
285 +
286 +{{{
226 226  {
227 - "status": "ok",
228 - "version": "POC1-v0.3",
229 - "model": "claude-3-5-sonnet-20241022"
288 + "providers": [
289 + {
290 + "name": "anthropic",
291 + "api_key_ref": "vault://anthropic-api-key",
292 + "enabled": true,
293 + "priority": 1
294 + },
295 + {
296 + "name": "openai",
297 + "api_key_ref": "vault://openai-api-key",
298 + "enabled": true,
299 + "priority": 2
300 + }
301 + ],
302 + "stage_config": {
303 + "stage1": {
304 + "provider": "anthropic",
305 + "model": "claude-haiku-4",
306 + "max_tokens": 4096,
307 + "temperature": 0.0
308 + },
309 + "stage2": {
310 + "provider": "anthropic",
311 + "model": "claude-sonnet-3-5",
312 + "max_tokens": 16384,
313 + "temperature": 0.3
314 + },
315 + "stage3": {
316 + "provider": "anthropic",
317 + "model": "claude-sonnet-3-5",
318 + "max_tokens": 8192,
319 + "temperature": 0.2
320 + }
321 + }
230 230  }
231 -{{/code}}
323 +}}}
232 232  
233 ----
325 +----
234 234  
235 -== 4. AnalysisResult Schema (Context-Aware) ==
327 +=== 6.5 Stage-Specific Models (POC1 Defaults) ===
236 236  
237 -This schema implements the **Context-Aware Analysis** required by the POC1 specification.
329 +**Stage 1: Claim Extraction**
238 238  
239 -{{code language="json"}}
240 -{
241 - "metadata": {
242 - "job_id": "string (ULID)",
243 - "timestamp_utc": "ISO8601",
244 - "engine_version": "POC1-v0.3",
245 - "llm_provider": "anthropic",
246 - "llm_model": "claude-3-5-sonnet-20241022",
247 - "usage_stats": {
248 - "input_tokens": "integer",
249 - "output_tokens": "integer",
250 - "estimated_cost_usd": "float",
251 - "response_time_sec": "float"
331 +* **Default:** Anthropic Claude Haiku 4
332 +* **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
333 +* **Rationale:** Fast, cheap, simple task
334 +* **Cost:** ~$0.003 per article
335 +
336 +**Stage 2: Claim Analysis** (CACHEABLE)
337 +
338 +* **Default:** Anthropic Claude Sonnet 3.5
339 +* **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
340 +* **Rationale:** High-quality analysis, cached 90 days
341 +* **Cost:** ~$0.081 per NEW claim
342 +
343 +**Stage 3: Holistic Assessment**
344 +
345 +* **Default:** Anthropic Claude Sonnet 3.5
346 +* **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
347 +* **Rationale:** Complex reasoning, logical fallacy detection
348 +* **Cost:** ~$0.030 per article
349 +
350 +**Cost Comparison (Example):**
351 +
352 +|=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
353 +|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
354 +|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
355 +|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
356 +|**Total (1 new claim)**|**$0.114**|**$0.065**|**$0.072**
357 +
358 +**Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support is planned for POC2.
359 +
360 +----
361 +
362 +=== 6.6 Failover Strategy ===
363 +
364 +**Automatic Failover:**
365 +
366 +{{{
367 +async function completeLLM(stage: string, prompt: string, options: CompletionOptions = {}): Promise<string> {
368 + const primaryProvider = getProviderForStage(stage)
369 + const fallbackProvider = getFallbackProvider()
370 +
371 + try {
372 + return await primaryProvider.complete(prompt, options)
373 + } catch (error: any) {
374 + if (error.type === 'rate_limit' || error.type === 'service_unavailable') {
375 + logger.warn(`Primary provider failed for ${stage}, using fallback`)
376 + return await fallbackProvider.complete(prompt, options)
252 252   }
378 + throw error
379 + }
380 +}
381 +}}}
382 +
383 +**Fallback Priority:**
384 +
385 +1. **Primary:** Configured provider for stage
386 +2. **Secondary:** Fallback provider (if configured)
387 +3. **Cache:** Return cached result (if available for Stage 2)
388 +4. **Error:** Return 503 Service Unavailable
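The four-step priority list can be sketched as a generic helper. This is an illustration under assumptions: `withFallbackChain` and `ServiceUnavailableError` are hypothetical names, and for brevity the sketch moves to the next step on any error, whereas the failover function above only falls back on rate-limit and service-unavailable errors.

```typescript
type Attempt<T> = () => Promise<T>;

// Hypothetical error type that an API layer could map to a 503 response.
class ServiceUnavailableError extends Error {
  readonly status = 503;
  constructor() {
    super("503 Service Unavailable");
  }
}

async function withFallbackChain<T>(
  primary: Attempt<T>, // 1. configured provider for the stage
  secondary?: Attempt<T>, // 2. fallback provider, if configured
  cachedResult?: () => T | undefined, // 3. cached result (Stage 2 only)
): Promise<T> {
  for (const attempt of [primary, secondary]) {
    if (!attempt) continue;
    try {
      return await attempt();
    } catch {
      // Simplification: any failure moves on to the next step.
    }
  }
  const cached = cachedResult?.();
  if (cached !== undefined) return cached;
  // 4. Nothing worked: surface a 503 to the caller.
  throw new ServiceUnavailableError();
}
```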
389 +
390 +----
391 +
392 +=== 6.7 Provider Selection API ===
393 +
394 +**Admin Endpoint:** POST /admin/v1/llm/configure
395 +
396 +**Update provider for specific stage:**
397 +
398 +{{{
399 +{
400 + "stage": "stage2",
401 + "provider": "openai",
402 + "model": "gpt-4o",
403 + "max_tokens": 16384,
404 + "temperature": 0.3
405 +}
406 +}}}
407 +
408 +**Response:** 200 OK
409 +
410 +{{{
411 +{
412 + "message": "LLM configuration updated",
413 + "stage": "stage2",
414 + "previous": {
415 + "provider": "anthropic",
416 + "model": "claude-sonnet-3-5"
253 253   },
254 - "article_holistic_assessment": {
255 - "main_thesis": "string (The core argument detected)",
256 - "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN",
257 - "logic_quality_score": "float (0-1)",
258 - "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"],
259 - "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)",
260 - "experimental_feature": true
418 + "current": {
419 + "provider": "openai",
420 + "model": "gpt-4o"
261 261   },
262 - "claims": [
263 - {
264 - "claim_id": "C1",
265 - "is_central_to_thesis": "boolean",
266 - "claim_text": "string",
267 - "canonical_form": "string",
268 - "claim_type": "descriptive | causal | predictive | normative | definitional",
269 - "evaluability": "evaluable | partly_evaluable | not_evaluable",
270 - "risk_tier": "A | B | C",
271 - "risk_tier_justification": "string",
272 - "domain": "string (e.g., 'public health', 'economics')",
273 - "key_terms": ["term1", "term2"],
274 - "entities": ["Person X", "Org Y"],
275 - "time_scope_detected": "2020-2024",
276 - "geography_scope_detected": "Brazil",
277 - "scenarios": [
278 - {
279 - "scenario_id": "C1-S1",
280 - "context_title": "string",
281 - "definitions": {"key_term": "definition"},
282 - "assumptions": ["Assumption 1", "Assumption 2"],
283 - "boundaries": {
284 - "time": "as of 2025-01",
285 - "geography": "Brazil",
286 - "population": "adult population",
287 - "conditions": "excludes X; includes Y"
288 - },
289 - "scope_of_evidence": "What counts as evidence for this scenario",
290 - "scenario_questions": ["Question that decides the verdict"],
291 - "verdict": {
292 - "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated",
293 - "probability_range": [0.0, 1.0],
294 - "confidence": "float (0-1)",
295 - "reasoning": "string",
296 - "key_supporting_evidence_ids": ["E1", "E3"],
297 - "key_counter_evidence_ids": ["E2"],
298 - "uncertainty_factors": ["Data gap", "Method disagreement"],
299 - "what_would_change_my_mind": ["Specific new study", "Updated dataset"]
300 - },
301 - "evidence": [
302 - {
303 - "evidence_id": "E1",
304 - "stance": "supports | undermines | mixed | context_dependent",
305 - "relevance_to_scenario": "float (0-1)",
306 - "evidence_summary": ["Bullet fact 1", "Bullet fact 2"],
307 - "citation": {
308 - "title": "Source title",
309 - "author_or_org": "Org/Author",
310 - "publication_date": "2024-05-01",
311 - "url": "https://source.example",
312 - "publisher": "Publisher/Domain"
313 - },
314 - "excerpt": ["Short quote ≤25 words (optional)"],
315 - "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT",
316 - "reliability_justification": "Why high/medium/low",
317 - "limitations_and_reservations": ["Limitation 1", "Limitation 2"],
318 - "retraction_or_dispute_signal": "none | correction | retraction | disputed",
319 - "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED"
320 - }
321 - ]
322 - }
323 - ]
422 + "cost_impact": {
423 + "previous_cost_per_claim": 0.081,
424 + "new_cost_per_claim": 0.045,
425 + "savings_percent": 44
426 + }
427 +}
428 +}}}
429 +
430 +**Get current configuration:**
431 +
432 +GET /admin/v1/llm/config
433 +
434 +{{{
435 +{
436 + "providers": ["anthropic", "openai"],
437 + "primary": "anthropic",
438 + "fallback": "openai",
439 + "stages": {
440 + "stage1": {
441 + "provider": "anthropic",
442 + "model": "claude-haiku-4",
443 + "cost_per_request": 0.003
444 + },
445 + "stage2": {
446 + "provider": "anthropic",
447 + "model": "claude-sonnet-3-5",
448 + "cost_per_new_claim": 0.081
449 + },
450 + "stage3": {
451 + "provider": "anthropic",
452 + "model": "claude-sonnet-3-5",
453 + "cost_per_request": 0.030
324 324   }
325 - ],
326 - "quality_gates": {
327 - "gate1_claim_validation": "pass | fail",
328 - "gate4_verdict_confidence": "pass | fail",
329 - "passed_all": "boolean",
330 - "gate_fail_reasons": [
331 - {
332 - "gate": "gate1_claim_validation",
333 - "claim_id": "C1",
334 - "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE",
335 - "explanation": "Human-readable explanation"
336 - }
337 - ]
338 - },
339 - "global_notes": {
340 - "limitations": ["System limitation 1", "Limitation 2"],
341 - "safety_or_policy_notes": ["Note 1"]
342 342   }
343 343  }
344 -{{/code}}
457 +}}}
345 345  
346 -=== 4.1 Risk Tier Definitions ===
459 +----
347 347  
348 -|=Tier|=Impact|=Examples|=Actions
349 -|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required)
350 -|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated)
351 -|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated)
461 +=== 6.8 Implementation Notes ===
352 352  
353 -=== 4.2 Source Reliability (Read-Only Snapshots) ===
463 +**Provider Adapter Pattern:**
354 354  
355 -**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job.
465 +{{{
466 +class AnthropicProvider implements LLMProvider {
467 + async complete(prompt: string, options: CompletionOptions) {
468 + const response = await anthropic.messages.create({
469 + model: options.model || 'claude-sonnet-3-5',
470 + max_tokens: options.maxTokens || 4096,
471 + messages: [{ role: 'user', content: prompt }],
472 + system: options.systemPrompt
473 + })
474 + return response.content[0].text
475 + }
476 +}
356 356  
357 -* POC1 treats these scores as **read-only** (no modification during analysis)
358 -* **Prevents circular dependency:** scoring → affects retrieval → affects scoring
359 -* Full Source Track Record System is a **separate service** (not part of POC1)
360 -* **Temporal separation:** Scoring runs weekly; analysis uses snapshots
478 +class OpenAIProvider implements LLMProvider {
479 + async complete(prompt: string, options: CompletionOptions) {
480 + const response = await openai.chat.completions.create({
481 + model: options.model || 'gpt-4o',
482 + max_tokens: options.maxTokens || 4096,
483 + messages: [
484 + ...(options.systemPrompt ? [{ role: 'system' as const, content: options.systemPrompt }] : []),
485 + { role: 'user', content: prompt }
486 + ]
487 + })
488 + return response.choices[0].message.content
489 + }
490 +}
491 +}}}
361 361  
362 -**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm.
493 +**Provider Registry:**
363 363  
364 -=== 4.3 Quality Gate Reason Codes ===
495 +{{{
496 +const providers = new Map<string, LLMProvider>()
497 +providers.set('anthropic', new AnthropicProvider())
498 +providers.set('openai', new OpenAIProvider())
499 +providers.set('google', new GoogleProvider())
365 365  
366 -**Gate 1 (Claim Validation):**
367 -* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor
368 -* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement
369 -* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact
370 -* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation
501 +function getProvider(name: string): LLMProvider {
502 + return providers.get(name) || providers.get(config.primaryProvider)
503 +}
504 +}}}
371 371  
372 -**Gate 4 (Verdict Confidence):**
373 -* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5)
374 -* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict
375 -* {{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution
376 -* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed
506 +----
377 377  
378 -**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve)
508 +== 3. REST API Contract ==
379 379  
380 ----
510 +=== 3.1 User Credit Tracking ===
381 381  
382 -== 5. Validation Rules (POC1 Enforcement) ==
512 +**Endpoint:** GET /v1/user/credit
383 383  
384 -|=Rule|=Requirement
385 -|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}.
386 -|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging.
387 -|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content.
388 -|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions.
514 +**Response:** 200 OK
389 389  
390 ----
516 +{{{{
517 + "user_id": "user_abc123",
518 + "tier": "free",
519 + "credit_limit": 10.00,
520 + "credit_used": 7.42,
521 + "credit_remaining": 2.58,
522 + "reset_date": "2025-02-01T00:00:00Z",
523 + "cache_only_mode": false,
524 + "usage_stats": {
525 + "articles_analyzed": 67,
526 + "claims_from_cache": 189,
527 + "claims_newly_analyzed": 113,
528 + "cache_hit_rate": 0.626
529 + }
530 +}
531 +}}}
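The derived fields in this response can be reproduced from the raw counters. The sketch below is illustrative only: `summarizeCredit` is a hypothetical helper, and treating "credit used up on the free tier" as the trigger for `cache_only_mode` is an interpretation of the tier table rather than a stated rule.

```typescript
interface UsageCounters {
  claims_from_cache: number;
  claims_newly_analyzed: number;
}

// Hypothetical helper: derives credit_remaining, cache_only_mode and
// cache_hit_rate from the stored limit, usage and claim counters.
function summarizeCredit(
  tier: string,
  creditLimit: number,
  creditUsed: number,
  usage: UsageCounters,
) {
  const totalClaims = usage.claims_from_cache + usage.claims_newly_analyzed;
  // Round to whole cents and never report a negative balance.
  const remaining = Math.max(0, Math.round((creditLimit - creditUsed) * 100) / 100);
  return {
    credit_remaining: remaining,
    // Assumption: only the free tier drops into cache-only mode.
    cache_only_mode: tier === "free" && remaining <= 0,
    cache_hit_rate:
      totalClaims === 0
        ? 0
        : Math.round((usage.claims_from_cache / totalClaims) * 1000) / 1000,
  };
}

console.log(summarizeCredit("free", 10.0, 7.42, { claims_from_cache: 189, claims_newly_analyzed: 113 }));
```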
391 391  
392 -== 6. Deterministic Markdown Template ==
533 +----
393 393  
394 -The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM).
535 +=== 3.2 Create Analysis Job (3-Stage) ===
395 395  
396 -{{code language="markdown"}}
397 -# FactHarbor Analysis Report: {overall_verdict}
537 +**Endpoint:** POST /v1/analyze
398 398  
399 -**Job ID:** {job_id} | **Generated:** {timestamp_utc}
400 -**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s
539 +==== Idempotency Support: ====
401 401  
402 ----
541 +To prevent duplicate job creation on network retries, clients SHOULD include:
403 403  
404 -## 1. Holistic Assessment (Experimental)
543 +{{{POST /v1/analyze
544 +Idempotency-Key: {client-generated-uuid}
545 +}}}
405 405  
406 -**Main Thesis:** {main_thesis}
547 +OR use the client.request_id field:
407 407  
408 -**Overall Verdict:** {overall_verdict}
549 +{{{{
550 + "input_url": "...",
551 + "client": {
552 + "request_id": "client-uuid-12345",
553 + "source_label": "optional"
554 + }
555 +}
556 +}}}
409 409  
410 -**Logic Quality Score:** {logic_quality_score}/1.0
558 +**Server Behavior:**
411 411  
412 -**Fallacies Detected:** {fallacies_detected}
560 +* If the Idempotency-Key or request_id has been seen before (within 24 hours):
561 +** Return existing job (200 OK, not 202 Accepted)
562 +** Do NOT create duplicate job or charge twice
563 +* Idempotency keys expire after 24 hours (matches job retention)
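The server behavior above can be sketched as follows. This is a minimal in-memory illustration, not the service's implementation: `IdempotencyStore` and `submit` are hypothetical names, a real deployment would presumably use a shared store, and the job ID here is a random hex string rather than a ULID.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # keys expire after 24 hours


@dataclass
class IdempotencyStore:
    """In-memory stand-in for the real key store (hypothetical)."""
    _jobs: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def submit(self, key: Optional[str], now: Optional[float] = None) -> Tuple[int, dict]:
        """Return (http_status, body) for one POST /v1/analyze call."""
        now = time.time() if now is None else now
        if key is not None:
            hit = self._jobs.get(key)
            if hit is not None and now - hit[1] < IDEMPOTENCY_TTL_SECONDS:
                # Known key inside the 24-hour window: return the existing job,
                # do not create a duplicate and do not charge again.
                return 200, {"job_id": hit[0], "idempotent": True}
        job_id = uuid.uuid4().hex  # placeholder ID; the spec uses ULIDs
        if key is not None:
            self._jobs[key] = (job_id, now)
        return 202, {"job_id": job_id, "status": "QUEUED"}
```

A retried request with the same key inside the window receives 200 OK and the original `job_id`; after the window has passed, the same key creates a new job.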
413 413  
414 -**Reasoning:** {verdict_reasoning}
565 +**Example Response (Idempotent):**
415 415  
416 ----
567 +{{{{
568 + "job_id": "01J...ULID",
569 + "status": "RUNNING",
570 + "idempotent": true,
571 + "original_request_at": "2025-12-24T10:31:00Z",
572 + "message": "Returning existing job (idempotency key matched)"
573 +}
574 +}}}
417 417  
418 -## 2. Key Claims Analysis
576 +==== Request Body: ====
419 419  
420 -### [C1] {claim_text}
421 -* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"}
422 -* **Risk Tier:** {risk_tier} ({risk_tier_justification})
423 -* **Evaluability:** {evaluability}
578 +{{{{
579 + "input_type": "url",
580 + "input_url": "https://example.com/medical-report-01",
581 + "input_text": null,
582 + "options": {
583 + "browsing": "on",
584 + "depth": "standard",
585 + "max_claims": 5,
586 + "scenarios_per_claim": 2,
587 + "max_evidence_per_scenario": 6,
588 + "context_aware_analysis": true
589 + },
590 + "client": {
591 + "request_id": "optional-client-tracking-id",
592 + "source_label": "optional"
593 + }
594 +}
595 +}}}
424 424  
425 -**Scenarios Explored:** {scenarios.length}
597 +**Options:**
426 426  
427 -#### Scenario: {scenario.context_title}
428 -* **Verdict:** {verdict.label} (Confidence: {verdict.confidence})
429 -* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]}
430 -* **Reasoning:** {verdict.reasoning}
599 +* browsing: on | off (retrieve web sources or just output queries)
600 +* depth: standard | deep (evidence thoroughness)
601 +* max_claims: 1-10 (default: **5** for cost control)
602 +* scenarios_per_claim: 1-5 (default: **2** for cost control)
603 +* max_evidence_per_scenario: 3-10 (default: **6**)
604 +* context_aware_analysis: true | false (experimental)
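A possible way to apply these ranges and defaults on the server is sketched below. `resolve_options` is a hypothetical helper. The list above gives no defaults for browsing, depth, or context_aware_analysis, so the sketch only validates those fields when the client sends them.

```python
# (min, max, default) taken from the option list above
INT_OPTIONS = {
    "max_claims": (1, 10, 5),
    "scenarios_per_claim": (1, 5, 2),
    "max_evidence_per_scenario": (3, 10, 6),
}
ENUM_OPTIONS = {
    "browsing": ("on", "off"),
    "depth": ("standard", "deep"),
}


def resolve_options(options: dict) -> dict:
    """Validate client options and fill in the documented defaults."""
    resolved = {}
    for name, (low, high, default) in INT_OPTIONS.items():
        value = options.get(name, default)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name} must be an integer in [{low}, {high}]")
        resolved[name] = value
    for name, allowed in ENUM_OPTIONS.items():
        if name in options:
            if options[name] not in allowed:
                raise ValueError(f"{name} must be one of {allowed}")
            resolved[name] = options[name]
    if "context_aware_analysis" in options:
        if not isinstance(options["context_aware_analysis"], bool):
            raise ValueError("context_aware_analysis must be true or false")
        resolved["context_aware_analysis"] = options["context_aware_analysis"]
    return resolved
```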
431 431  
432 -**Evidence:**
433 -* Supporting: {evidence.filter(e => e.stance == "supports").length} sources
434 -* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources
435 -* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources
606 +**Response:** 202 Accepted
436 436  
437 -**Key Evidence:**
438 -* [{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance}
608 +{{{{
609 + "job_id": "01J...ULID",
610 + "status": "QUEUED",
611 + "created_at": "2025-12-24T10:31:00Z",
612 + "estimated_cost": 0.114,
613 + "cost_breakdown": {
614 + "stage1_extraction": 0.003,
615 + "stage2_new_claims": 0.081,
616 + "stage2_cached_claims": 0.000,
617 + "stage3_holistic": 0.030
618 + },
619 + "cache_info": {
620 + "claims_to_extract": 5,
621 + "estimated_cache_hits": 4,
622 + "estimated_new_claims": 1
623 + },
624 + "links": {
625 + "self": "/v1/jobs/01J...ULID",
626 + "result": "/v1/jobs/01J...ULID/result",
627 + "report": "/v1/jobs/01J...ULID/report",
628 + "events": "/v1/jobs/01J...ULID/events"
629 + }
630 +}
631 +}}}
439 439  
440 ----
633 +**Error Responses:**
441 441  
442 -## 3. Quality Assessment
635 +402 Payment Required - Free tier limit reached, cache-only mode
443 443  
444 -**Quality Gates:**
445 -* Gate 1 (Claim Validation): {gate1_claim_validation}
446 -* Gate 4 (Verdict Confidence): {gate4_verdict_confidence}
447 -* Overall: {passed_all ? "PASS" : "FAIL"}
637 +{{{{
638 + "error": "credit_limit_reached",
639 + "message": "Monthly credit limit reached. Entering cache-only mode.",
640 + "cache_only_mode": true,
641 + "credit_remaining": 0.00,
642 + "reset_date": "2025-02-01T00:00:00Z",
643 + "action": "Resubmit with cache_preference=allow_partial for cached results"
644 +}
645 +}}}
448 448  
449 -{if gate_fail_reasons.length > 0}
450 -**Failed Gates:**
451 -{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)}
452 -{/if}
647 +----
453 453  
454 ----
649 +== 4. Data Schemas ==
455 455  
456 -## 4. Limitations & Disclaimers
651 +=== 4.1 Stage 1 Output: ClaimExtraction ===
457 457  
458 -**System Limitations:**
459 -{limitations.map(l => `* ${l}`)}
653 +{{{{
654 + "job_id": "01J...ULID",
655 + "stage": "stage1_extraction",
656 + "article_metadata": {
657 + "title": "Article title",
658 + "source_url": "https://example.com/article",
659 + "extracted_text_length": 5234,
660 + "language": "en"
661 + },
662 + "claims": [
663 + {
664 + "claim_id": "C1",
665 + "claim_text": "Original claim text from article",
666 + "canonical_claim": "Normalized, deduplicated phrasing",
667 + "claim_hash": "sha256:abc123...",
668 + "is_central_to_thesis": true,
669 + "claim_type": "causal",
670 + "evaluability": "evaluable",
671 + "risk_tier": "B",
672 + "domain": "public_health"
673 + }
674 + ],
675 + "article_thesis": "Main argument detected",
676 + "cost": 0.003
677 +}
678 +}}}
460 460  
461 -**Important Notes:**
462 -* This analysis is AI-generated and experimental (POC1)
463 -* Context-aware article verdict is being tested for accuracy
464 -* Human review recommended for high-risk claims (Tier A)
465 -* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens}
680 +----
466 466  
467 -**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis.
682 +=== 4.5 Verdict Label Taxonomy ===
468 468  
469 ----
684 +FactHarbor uses **three distinct verdict taxonomies** depending on analysis level:
470 470  
471 -*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)*
472 -{{/code}}
686 +==== 4.5.1 Scenario Verdict Labels (Stage 2) ====
473 473  
474 -**Target Report Size:** 220-350 words (optimized for 2-minute read)
688 +Used for individual scenario verdicts within a claim.
475 475  
476 ----
690 +**Enum Values:**
477 477  
478 -== 7. LLM Configuration (POC1) ==
692 +* Highly Likely - Probability 0.85-1.0, high confidence
693 +* Likely - Probability 0.65-0.84, moderate-high confidence
694 +* Unclear - Probability 0.35-0.64, or low confidence
695 +* Unlikely - Probability 0.16-0.34, moderate-high confidence
696 +* Highly Unlikely - Probability 0.0-0.15, high confidence
697 +* Unsubstantiated - Insufficient evidence to determine probability
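The probability bands above could be turned into a label roughly as follows. This is a sketch under stated assumptions: `scenario_label` is a hypothetical name, `None` stands for "insufficient evidence", the `low_confidence` flag is decided elsewhere, and probabilities are rounded to two decimals so that values between the listed bands (for example 0.845) still map to exactly one label.

```python
from typing import Optional


def scenario_label(probability: Optional[float], low_confidence: bool = False) -> str:
    """Map a scenario probability to a Stage 2 verdict label."""
    if probability is None:
        return "Unsubstantiated"  # assumption: None means no usable evidence
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0.0, 1.0]")
    if low_confidence:
        return "Unclear"  # low confidence overrides the probability band
    p = round(probability, 2)  # assumption: bands are defined on two decimals
    if p >= 0.85:
        return "Highly Likely"
    if p >= 0.65:
        return "Likely"
    if p >= 0.35:
        return "Unclear"
    if p >= 0.16:
        return "Unlikely"
    return "Highly Unlikely"
```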
479 479  
480 -|=Parameter|=Value|=Notes
481 -|**Provider**|Anthropic|Primary provider for POC1
482 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model
483 -|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports)
484 -|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length)
485 -|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output)
486 -|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation
487 -|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment
488 -|**Few-Shot Examples**|Yes|For claim extraction and scenario generation
699 +==== 4.5.2 Claim Verdict Labels (Rollup) ====
489 489  
490 -=== 7.1 Token Budgets by Stage ===
701 +Used when summarizing a claim across all scenarios.
491 491  
492 -|=Stage|=Approximate Output Tokens
493 -|Claim Extraction|~4,000 (10 claims × ~400 tokens)
494 -|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens)
495 -|Evidence Synthesis|~2,000 per scenario
496 -|Verdict Generation|~1,000 per scenario
497 -|Holistic Assessment|~500 (context-aware summary)
703 +**Enum Values:**
498 498  
499 -**Total:** 50K-80K tokens per article (input + output)
705 +* Supported - At least 60% of scenarios are Likely or Highly Likely
706 +* Refuted - At least 60% of scenarios are Unlikely or Highly Unlikely
707 +* Inconclusive - Neither threshold is met (mixed scenarios, or mostly Unclear/Unsubstantiated)
500 500  
501 -=== 7.2 API Integration ===
709 +**Mapping Logic:**
502 502  
503 -**Anthropic Messages API:**
504 -* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}}
505 -* Authentication: API key via {{code}}x-api-key{{/code}} header
506 -* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}}
507 -* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage)
711 +* If ≥60% scenarios are (Highly Likely | Likely) → Supported
712 +* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted
713 +* Otherwise → Inconclusive
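The mapping logic can be sketched as a small function. `rollup_verdict` is a hypothetical name, and treating an empty scenario list as Inconclusive is an assumption; the 60% thresholds come from the rules above.

```python
from typing import Sequence

POSITIVE_LABELS = {"Highly Likely", "Likely"}
NEGATIVE_LABELS = {"Highly Unlikely", "Unlikely"}


def rollup_verdict(scenario_labels: Sequence[str]) -> str:
    """Roll scenario verdict labels up into a claim verdict label."""
    total = len(scenario_labels)
    if total == 0:
        return "Inconclusive"  # assumption: no scenarios means no verdict
    positive = sum(label in POSITIVE_LABELS for label in scenario_labels)
    negative = sum(label in NEGATIVE_LABELS for label in scenario_labels)
    # count / total >= 0.6 rewritten in integers to avoid float rounding
    if 5 * positive >= 3 * total:
        return "Supported"
    if 5 * negative >= 3 * total:
        return "Refuted"
    return "Inconclusive"
```

With the default of two scenarios per claim, both scenarios must agree for a claim to be Supported or Refuted, because one out of two is only 50%.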
508 508  
509 -**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice.
715 +==== 4.5.3 Article Verdict Labels (Stage 3) ====
510 510  
511 ----
717 +Used for holistic article-level assessment.
512 512  
513 -== 8. Cross-References (xWiki) ==
719 +**Enum Values:**
514 514  
515 -This API specification implements requirements from:
721 +* WELL-SUPPORTED - Article thesis logically follows from supported claims
722 +* MISLEADING - Claims may be true but article commits logical fallacies
723 +* REFUTED - Central claims are refuted, invalidating thesis
724 +* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts
516 516  
517 -* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
518 -** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements)
519 -** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only)
520 -** Section 2.1: Analysis Summary (Context-Aware) component specification
521 -** Section 10.3: Prompt structure for claim extraction and verdict synthesis
726 +**Note:** Article verdict considers **claim centrality** (central claims override supporting claims).
522 522  
523 -* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
524 -** Complete investigation of 7 approaches to article-level verdicts
525 -** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1
526 -** Experimental feature testing plan (30 articles, ≥70% accuracy target)
527 -** Decision framework for POC2 implementation
728 +==== 4.5.4 API Field Mapping ====
528 528  
529 -* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
530 -** FR4 (Analysis Summary) - enhanced with context-aware capability
531 -** FR7 (Verdict Calculation) - probability ranges + confidence scores
532 -** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2
730 +|=Level|=API Field|=Enum Name
731 +|Scenario|scenarios[].verdict.label|scenario_verdict_label
732 +|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
733 +|Article|article_holistic_assessment.overall_verdict|article_verdict_label
533 533  
534 -* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
535 -** POC1 simplified architecture (stateless, single AKEL orchestration call)
536 -** Data persistence minimized (job outputs only, no database required)
537 -** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify)
735 +----
538 538  
539 -* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
540 -** Evidence structure (source, stance, reliability rating)
541 -** Scenario boundaries (time, geography, population, conditions)
542 -** Claim types and evaluability taxonomy
543 -** Source Track Record System (Section 1.3) - temporal separation
737 +== 5. Cache Architecture ==
544 544  
545 -* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]**
546 -** POC1 requirement mappings and phase assignments
547 -** Context-aware analysis as POC1 experimental feature
548 -** POC2 enhancement path (Gates 2 & 3, evidence deduplication)
739 +=== 5.1 Redis Cache Design ===
549 549  
550 ----
741 +**Technology:** Redis 7.0+ (in-memory key-value store)
551 551  
552 -== 9. Implementation Notes (POC1) ==
743 +**Cache Key Schema:**
553 553  
554 -=== 9.1 Recommended Tech Stack ===
745 +{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
746 +}}}
555 555  
556 -* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase
557 -* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure
558 -* **Storage:** Filesystem JSON files (no database needed for POC1)
559 -* **Queue:** In-memory queue or Redis (optional for concurrency)
560 -* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback)
561 -* **Deployment:** Vercel, AWS Lambda, or similar serverless
748 +**Example:**
562 562  
563 -=== 9.2 POC1 Simplifications ===
750 +{{{Claim (English): "COVID vaccines are 95% effective"
751 +Canonical: "covid vaccines are 95 percent effective"
752 +Language: "en"
753 +SHA256: abc123...def456
754 +Key: claim:v1norm1:en:abc123...def456
755 +}}}
564 564  
565 -* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}})
566 -* **No user authentication:** Optional API key validation only (env var: {{code}}FACTHARBOR_API_KEY{{/code}})
567 -* **Single-instance deployment:** No distributed processing, no worker pools
568 -* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning)
569 -* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}})
570 -* **Rate limiting:** Simple IP-based (optional) - no complex billing
757 +**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.
571 571  
572 -=== 9.3 Estimated Costs (Per Analysis) ===
759 +**Data Structure:**
573 573  
574 -**LLM API costs (Claude 3.5 Sonnet):**
575 -* Input: $3.00 per million tokens
576 -* Output: $15.00 per million tokens
577 -* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical)
761 +{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
762 +EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
763 +}}}
578 578  
579 -**Web search costs (optional):**
580 -* Using external search API (Tavily, Brave): $0.01-0.05 per analysis
581 -* POC1 can use free search APIs initially
765 +----
582 582  
583 -**Infrastructure costs:**
584 -* Vercel hobby tier: Free for POC
585 -* AWS Lambda: ~$0.001 per request
586 -* **Total infra:** <$0.01 per analysis
767 +==== 5.1.1 Canonical Claim Normalization (v1) ====
587 587  
588 -**Total estimated cost:** ~$0.15-0.35 per analysis (meets the <$0.35 target)
769 +The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.
589 589  
590 -=== 9.4 Estimated Timeline (AI-Assisted) ===
771 +**Algorithm: Canonical Claim Normalization v1**
591 591  
592 -**With Cursor IDE + Claude API:**
593 -* Day 1-2: API scaffolding + job queue
594 -* Day 3-4: LLM integration + prompt engineering
595 -* Day 5-6: Evidence retrieval + contradiction search
596 -* Day 7: Report templates + testing with 30 articles
597 -* **Total:** 5-7 days for working POC1
773 +{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
774 + """
775 + Normalizes claim to canonical form for cache key generation.
776 + Version: v1norm1 (POC1)
777 + """
778 + import re
779 + import unicodedata
780 +
781 + # Step 1: Unicode normalization (NFC)
782 + text = unicodedata.normalize('NFC', claim_text)
783 +
784 + # Step 2: Lowercase
785 + text = text.lower()
786 +
787 + # Step 3: Expand percent signs (must run before punctuation is removed)
788 + text = text.replace('%', ' percent')
789 +
790 + # Step 4: Remove punctuation (except hyphens in words)
791 + text = re.sub(r'[^\w\s-]', '', text)
792 +
793 + # Step 5: Normalize whitespace (collapse multiple spaces), then numbers
794 + text = re.sub(r'\s+', ' ', text).strip()
795 + # Spell out single-digit numbers
796 + num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
797 + '4':'four', '5':'five', '6':'six', '7':'seven',
798 + '8':'eight', '9':'nine'}
799 + for num, word in num_to_word.items():
800 + text = re.sub(rf'\b{num}\b', word, text)
801 +
802 + # Step 6: Common abbreviations (English only in v1)
803 + if language == 'en':
804 + text = text.replace('covid-19', 'covid')
805 + text = text.replace('u.s.', 'us')
806 + text = text.replace('u.k.', 'uk')
807 +
808 + # Step 7: NO entity normalization in v1
809 + # (Trump vs Donald Trump vs President Trump remain distinct)
810 +
811 + return text
598 598  
599 -**Manual coding (no AI assistance):**
600 -* Estimate: 15-20 days
813 +# Version identifier (include in cache namespace)
814 +CANONICALIZER_VERSION = "v1norm1"
815 +}}}
601 601  
602 -=== 9.5 First Prompt for AI Code Generation ===
817 +**Cache Key Formula (Updated):**
603 603  
604 -{{code}}
605 -Based on the FactHarbor POC1 API & Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with:
819 +{{{language = "en"
820 +canonical = normalize_claim_v1(claim_text, language)
821 +cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
606 606  
607 -1. API routes implementing the 7 endpoints specified in Section 3
608 -2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5
609 -3. Anthropic Claude 3.5 Sonnet integration for:
610 - - Claim extraction (with central/supporting marking)
611 - - Scenario generation
612 - - Evidence synthesis (with mandatory contradiction search)
613 - - Verdict generation
614 - - Holistic assessment (article-level credibility)
615 -4. Job-based async execution with progress tracking (7 pipeline stages)
616 -5. Quality Gates 1 & 4 from NFR11 implementation
617 -6. Mandatory contradiction search enforcement (Section 5)
618 -7. Context-aware analysis (experimental) as specified
619 -8. Filesystem-based job storage (no database)
620 -9. Markdown report generation from JSON templates (Section 6)
823 +Example:
824 + claim: "COVID-19 vaccines are 95% effective"
825 + canonical: "covid vaccines are 95 percent effective"
826 + sha256: abc123...def456
827 + key: "claim:v1norm1:en:abc123...def456"
828 +}}}
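The formula leaves `sha256(...)` abstract. One way to make it concrete with the Python standard library is sketched below; UTF-8 encoding and a lowercase hex digest are assumptions, since the specification does not state either, and `build_cache_key` is a hypothetical helper that expects an already normalized claim.

```python
import hashlib

CANONICALIZER_VERSION = "v1norm1"
CACHE_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days, the TTL used for cached claims


def build_cache_key(canonical_claim: str, language: str) -> str:
    """Build the Redis key for a claim that has already been canonicalized."""
    # Assumption: hash the UTF-8 bytes and use the lowercase hex digest
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"


key = build_cache_key("covid vaccines are 95 percent effective", "en")
```

Because the language is part of the key, the same canonical string submitted under two language codes yields two distinct cache entries.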
621 621  
622 -Use the validation rules from Section 5 and error codes from Section 2.1.1.
623 -Target: <$0.35 per analysis, <2 minutes processing time.
624 -{{/code}}
830 +**Cache Metadata MUST Include:**
625 625  
626 ----
832 +{{{{
833 + "canonical_claim": "covid vaccines are 95 percent effective",
834 + "canonicalizer_version": "v1norm1",
835 + "language": "en",
836 + "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
837 +}
838 +}}}
627 627  
628 -== 10. Testing Strategy (POC1) ==
840 +**Version Upgrade Path:**
629 629  
630 -=== 10.1 Test Dataset (30 Articles) ===
842 +* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
843 +* v1normN → v2norm1: Major version bump, invalidate all v1 caches
631 631  
632 -**Category 1: Straightforward Factual (10 articles)**
633 -* Purpose: Baseline accuracy
634 -* Example: "WHO report on global vaccination rates"
635 -* Expected: High claim accuracy, straightforward verdict
845 +----
636 636  
637 -**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test**
638 -* Purpose: Test holistic assessment capability
639 -* Example: "Coffee cures cancer" (true premises, false conclusion)
640 -* Expected: Individual claims TRUE, article verdict MISLEADING
847 +==== 5.1.2 Copyright & Data Retention Policy ====
641 641  
642 -**Category 3: Mixed Accuracy (5 articles)**
643 -* Purpose: Test nuance handling
644 -* Example: Articles with some true, some false claims
645 -* Expected: Scenario-level differentiation
849 +**Evidence Excerpt Storage:**
646 646  
647 -**Category 4: Low-Quality Claims (5 articles)**
648 -* Purpose: Test quality gates
649 -* Example: Opinion pieces, compound claims
650 -* Expected: Gate 1 failures, rejection or draft-only mode
851 +To comply with copyright law and fair use principles:
651 651  
652 -=== 10.2 Success Metrics ===
853 +**What We Store:**
653 653  
654 -**Quality Metrics:**
655 -* Hallucination rate: <5% (target: <3%)
656 -* Context-aware accuracy: ≥70% (experimental - key POC1 goal)
657 -* False positive rate: <15%
658 -* Mandatory contradiction search: 100% compliance
855 +* **Metadata only:** Title, author, publisher, URL, publication date
856 +* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
857 +* **Summaries:** AI-generated bullet points (not verbatim text)
858 +* **No full articles:** Never store complete article text beyond job processing
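The excerpt limits could be enforced at write time with a helper like the one below. This is only a sketch: `clamp_quotes` is a hypothetical name, words are assumed to be whitespace-delimited, and truncating over-long quotes (rather than rejecting them) is a design choice the specification does not prescribe.

```python
from typing import List

MAX_WORDS_PER_QUOTE = 25
MAX_QUOTES_PER_EVIDENCE = 3


def clamp_quotes(quotes: List[str]) -> List[str]:
    """Keep at most 3 quotes per evidence item, each cut to at most 25 words."""
    kept = []
    for quote in quotes[:MAX_QUOTES_PER_EVIDENCE]:
        words = quote.split()  # assumption: whitespace-delimited words
        kept.append(" ".join(words[:MAX_WORDS_PER_QUOTE]))
    return kept
```

Applied to every evidence item, this caps stored verbatim text at 75 words per item.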
659 659  
660 -**Performance Metrics:**
661 -* Processing time: <2 minutes per article (standard depth)
662 -* Cost per analysis: <$0.35
663 -* API uptime: >99%
664 -* LLM API error rate: <1%
860 +**Total per Cached Claim:**
665 665  
666 -**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for complete success criteria and testing methodology.
862 +* Scenarios: 2 per claim
863 +* Evidence items: 6 per scenario (12 total)
864 +* Quotes: 3 per evidence × 25 words = 75 words per item
865 +* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)
667 667  
668 ----
867 +**Retention:**
669 669  
670 -**End of Specification - FactHarbor POC1 API v0.3**
869 +* Cache TTL: 90 days
870 +* Job outputs: 24 hours (then archived or deleted)
871 +* No persistent full-text article storage
671 671  
672 -**Ready for xWiki import and AI-assisted implementation!** 🚀
873 +**Rationale:**
673 673  
875 +* Short excerpts for citation = fair use
876 +* Summaries are transformative (they paraphrase rather than reproduce source text)
877 +* Limited retention (90 days max)
878 +* No commercial republication of excerpts
879 +
880 +**DMCA Compliance:**
881 +
882 +* Cache invalidation endpoint available for rights holders
883 +* Contact: dmca@factharbor.org
884 +
885 +----
886 +
887 +== Summary ==
888 +
889 +This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification.
890 +
891 +**Full specification includes:**
892 +
893 +* Complete API endpoints (7 total)
894 +* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete)
895 +* Quality gates & validation rules
896 +* LLM configuration for all 3 stages
897 +* Implementation notes with code samples
898 +* Testing strategy
899 +* Cross-references to other pages
900 +
901 +**The complete specification is available in:**
902 +
903 +* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
904 +* Export files (TEST/PRODUCTION) for xWiki import