Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
From version 3.1, edited by Robert Schaub on 2025/12/24 16:32
Change comment: There is no comment for this version.
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -1,578 +1,673 @@
-=POC1 API & Schemas Specification=
+# FactHarbor POC1 — API & Schemas Specification

------
+**Version:** 0.3 (POC1 - Production Ready)
+**Namespace:** FactHarbor.*
+**Syntax:** xWiki 2.1
+**Last Updated:** 2025-12-24

+---
+
 == Version History ==

 |=Version|=Date|=Changes
-|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
-|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
-|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
-|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details
+|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references
+|0.2|2025-12-24|Initial rebased version with holistic assessment
+|0.1|2025-12-24|Original specification

------
+---

 == 1. Core Objective (POC1) ==

-The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.
+The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**:

-The system must prove that AI can identify an article's **Main Thesis** and determine if supporting claims logically support that thesis without committing fallacies.
+The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).

-=== Success Criteria: ===
-
+**Success Criteria:**
 * Test with 30 diverse articles
 * Target: ≥70% accuracy detecting misleading articles
-* Cost: <$0.25 per NEW analysis (uncached)
-* Cost: $0.00 for cached claim reuse
-* Cache hit rate: ≥50% after 1,000 articles
+* Cost: <$0.35 per analysis
 * Processing time: <2 minutes (standard depth)

-=== Economic Model: ===
+**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches.

-* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
-* **After limit:** Cache-only mode (instant, free access to cached claims)
-* **Paid tier:** Unlimited new analyses
+---

------
+== 2. Runtime Model & Job States ==

-== 2. Architecture Overview ==
+=== 2.1 Pipeline Steps ===

-=== 2.1 3-Stage Pipeline with Caching ===
+For progress reporting via API, the pipeline follows these stages:

-FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
+# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization.
+# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting.
+# **SCENARIOS**: Generating context interpretations for each claim.
+# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search).
+# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario.
+# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic).
+# **REPORT**: Generating final Markdown and JSON outputs.

-{{mermaid}}
-graph TD
-    A[Article Input] --> B[Stage 1: Extract Claims]
-    B --> C{For Each Claim}
-    C --> D[Check Cache]
-    D -->|Cache HIT| E[Return Cached Verdict]
-    D -->|Cache MISS| F[Stage 2: Analyze Claim]
-    F --> G[Store in Cache]
-    G --> E
-    E --> H[Stage 3: Holistic Assessment]
-    H --> I[Final Report]
-{{/mermaid}}
+=== 2.1.1 URL Extraction Strategy ===

-==== Stage 1: Claim Extraction (Haiku, no cache) ====
+**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}})
+* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient
+* **Fallback:** Trafilatura (Python library) for simple static HTML

-* **Input:** Article text
-* **Output:** 5 canonical claims (normalized, deduplicated)
-* **Model:** Claude Haiku 4
-* **Cost:** $0.003 per article
-* **Cache strategy:** No caching (article-specific)
+**Error Handling:**

-==== Stage 2: Claim Analysis (Sonnet, CACHED) ====
+|=Error Code|=Trigger|=Action
+|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest text paste
+|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail
+|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately
+|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason

-* **Input:** Single canonical claim
-* **Output:** Scenarios + Evidence + Verdicts
-* **Model:** Claude Sonnet 3.5
-* **Cost:** $0.081 per NEW claim
-* **Cache strategy:** Redis, 90-day TTL
-* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
+**Supported URL Patterns:**
+* ✅ News articles, blog posts, Wikipedia
+* ✅ Academic preprints (arXiv)
+* ❌ Social media posts (Twitter, Facebook) - not in POC1
+* ❌ Video platforms (YouTube, TikTok) - not in POC1
+* ❌ PDF files - deferred to Beta 0

-==== Stage 3: Holistic Assessment (Sonnet, no cache) ====
+=== 2.2 Job Status Enumeration ===

-* **Input:** Article + Claim verdicts (from cache or Stage 2)
-* **Output:** Article verdict + Fallacies + Logic quality
-* **Model:** Claude Sonnet 3.5
-* **Cost:** $0.030 per article
-* **Cache strategy:** No caching (article-specific)
+(((
+* **QUEUED** - Job accepted, waiting in queue
+* **RUNNING** - Processing in progress
+* **SUCCEEDED** - Analysis complete, results available
+* **FAILED** - Error occurred, see error details
+* **CANCELLED** - User cancelled via DELETE endpoint
+)))

-=== Total Cost Formula: ===
+---

-{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
+== 3. REST API Contract ==

-Examples:
-- 0 new claims (100% cache hit): $0.033
-- 1 new claim (80% cache hit): $0.114
-- 3 new claims (40% cache hit): $0.276
-- 5 new claims (0% cache hit): $0.438
-}}}
+=== 3.1 Create Analysis Job ===

------
+**Endpoint:** {{code}}POST /v1/analyze{{/code}}

-=== 2.2 User Tier System ===
+**Request Body Example:**
+{{code language="json"}}
+{
+  "input_type": "url",
+  "input_url": "https://example.com/medical-report-01",
+  "input_text": null,
+  "options": {
+    "browsing": "on",
+    "depth": "standard",
+    "max_claims": 5,
+    "context_aware_analysis": true
+  },
+  "client": {
+    "request_id": "optional-client-tracking-id",
+    "source_label": "optional"
+  }
+}
+{{/code}}

-|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
-|**Free**|$10|Cache-only mode|✅ Full|Basic
-|**Pro** (future)|$50|Continues|✅ Full|Advanced
-|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full
+**Options:**
+* {{code}}browsing{{/code}}: {{code}}on{{/code}} | {{code}}off{{/code}} (retrieve web sources or just output queries)
+* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness)
+* {{code}}max_claims{{/code}}: 1-50 (default: 10)
+* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental)

-**Free Tier Economics:**
+**Response:** {{code}}202 Accepted{{/code}}

-* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
-* Average 70 articles/month at 70% cache hit rate
-* After limit: Cache-only mode
+{{code language="json"}}
+{
+  "job_id": "01J...ULID",
+  "status": "QUEUED",
+  "created_at": "2025-12-24T10:31:00Z",
+  "links": {
+    "self": "/v1/jobs/01J...ULID",
+    "result": "/v1/jobs/01J...ULID/result",
+    "report": "/v1/jobs/01J...ULID/report",
+    "events": "/v1/jobs/01J...ULID/events"
+  }
+}
+{{/code}}

------
+---

-=== 2.3 Cache-Only Mode (Free Tier Feature) ===
+=== 3.2 Get Job Status ===

-When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
+**Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}}

-==== What Cache-Only Mode Provides: ====
+**Response:** {{code}}200 OK{{/code}}

-✅ **Claim Extraction (Platform-Funded):**
+{{code language="json"}}
+{
+  "job_id": "01J...ULID",
+  "status": "RUNNING",
+  "created_at": "2025-12-24T10:31:00Z",
+  "updated_at": "2025-12-24T10:31:22Z",
+  "progress": {
+    "step": "RETRIEVAL",
+    "percent": 60,
+    "message": "Gathering evidence for C2-S1",
+    "current_claim_id": "C2",
+    "current_scenario_id": "C2-S1"
+  },
+  "input_echo": {
+    "input_type": "url",
+    "input_url": "https://example.com/medical-report-01"
+  },
+  "links": {
+    "self": "/v1/jobs/01J...ULID",
+    "result": "/v1/jobs/01J...ULID/result",
+    "report": "/v1/jobs/01J...ULID/report"
+  },
+  "error": null
+}
+{{/code}}

-* Stage 1 extraction runs at $0.003 per article
-* **Cost: Absorbed by platform** (not charged to user credit)
-* Rationale: Extraction is necessary to check cache, and cost is negligible
-* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)
+---

-✅ **Instant Access to Cached Claims:**
+=== 3.3 Get JSON Result ===

-* Any claim that exists in cache → Full verdict returned
-* Cost: $0 (no LLM calls)
-* Response time: <100ms
+**Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}}

-✅ **Partial Article Analysis:**
+**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4)

-* Check each claim against cache
-* Return verdicts for ALL cached claims
-* For uncached claims: Return "status": "cache_miss"
+**Other Responses:**
+* {{code}}409 Conflict{{/code}} - Job not finished yet
+* {{code}}404 Not Found{{/code}} - Job ID unknown

-✅ **Cache Coverage Report:**
+---

-* "3 of 5 claims available in cache (60% coverage)"
-* Links to cached analyses
-* Estimated cost to complete: $0.162 (2 new claims)
+=== 3.4 Download Markdown Report ===

-❌ **Not Available in Cache-Only Mode:**
+**Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}}

-* New claim analysis (Stage 2 LLM calls blocked)
-* Full holistic assessment (Stage 3 blocked if any claims missing)
+**Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content

-==== User Experience Example: ====
+**Headers:**
+* {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}}

-{{{{
-  "status": "cache_only_mode",
-  "message": "Monthly credit limit reached. Showing cached results only.",
-  "cache_coverage": {
-    "claims_total": 5,
-    "claims_cached": 3,
-    "claims_missing": 2,
-    "coverage_percent": 60
-  },
-  "cached_claims": [
-    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
-    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
-    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
-  ],
-  "missing_claims": [
-    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
-    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
-  ],
-  "upgrade_options": {
-    "top_up": "$5 for 20-70 more articles",
-    "pro_tier": "$50/month unlimited"
-  }
-}
-}}}
+**Other Responses:**
+* {{code}}409 Conflict{{/code}} - Job not finished
+* {{code}}404 Not Found{{/code}} - Job unknown

-**Design Rationale:**
+---

-* Free users still get value (cached claims often answer their question)
-* Demonstrates FactHarbor's value (partial results encourage upgrade)
-* Sustainable for platform (no additional cost)
-* Fair to all users (everyone contributes to cache)
+=== 3.5 Stream Job Events (Optional, Recommended) ===

------
+**Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}}

-== 3. REST API Contract ==
+**Response:** Server-Sent Events (SSE) stream

-=== 3.1 User Credit Tracking ===
+**Event Types:**
+* {{code}}progress{{/code}} - Progress update
+* {{code}}claim_extracted{{/code}} - Claim identified
+* {{code}}verdict_computed{{/code}} - Scenario verdict complete
+* {{code}}complete{{/code}} - Job finished
+* {{code}}error{{/code}} - Error occurred

-**Endpoint:** GET /v1/user/credit
+---

-**Response:** 200 OK
+=== 3.6 Cancel Job ===

-{{{{
-  "user_id": "user_abc123",
-  "tier": "free",
-  "credit_limit": 10.00,
-  "credit_used": 7.42,
-  "credit_remaining": 2.58,
-  "reset_date": "2025-02-01T00:00:00Z",
-  "cache_only_mode": false,
-  "usage_stats": {
-    "articles_analyzed": 67,
-    "claims_from_cache": 189,
-    "claims_newly_analyzed": 113,
-    "cache_hit_rate": 0.626
-  }
-}
-}}}
+**Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}}

------
+Attempts to cancel a queued or running job.

-=== 3.2 Create Analysis Job (3-Stage) ===
+**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED)

-**Endpoint:** POST /v1/analyze
+**Note:** Already-completed jobs cannot be cancelled.

-==== Idempotency Support: ====
+---

-To prevent duplicate job creation on network retries, clients SHOULD include:
+=== 3.7 Health Check ===

-{{{POST /v1/analyze
-Idempotency-Key: {client-generated-uuid}
-}}}
+**Endpoint:** {{code}}GET /v1/health{{/code}}

-OR use the client.request_id field:
+**Response:** {{code}}200 OK{{/code}}

-{{{{
-  "input_url": "...",
-  "client": {
-    "request_id": "client-uuid-12345",
-    "source_label": "optional"
-  }
+{{code language="json"}}
+{
+  "status": "ok",
+  "version": "POC1-v0.3",
+  "model": "claude-3-5-sonnet-20241022"
 }
-}}}
+{{/code}}

-**Server Behavior:**
+---

-* If Idempotency-Key or request_id seen before (within 24 hours):
-** Return existing job (200 OK, not 202 Accepted)
-** Do NOT create duplicate job or charge twice
-* Idempotency keys expire after 24 hours (matches job retention)
+== 4. AnalysisResult Schema (Context-Aware) ==

-**Example Response (Idempotent):**
+This schema implements the **Context-Aware Analysis** required by the POC1 specification.

-{{{{
-  "job_id": "01J...ULID",
-  "status": "RUNNING",
-  "idempotent": true,
-  "original_request_at": "2025-12-24T10:31:00Z",
-  "message": "Returning existing job (idempotency key matched)"
-}
-}}}
-
-==== Request Body: ====
-
-{{{{
-  "input_type": "url",
-  "input_url": "https://example.com/medical-report-01",
-  "input_text": null,
-  "options": {
-    "browsing": "on",
-    "depth": "standard",
-    "max_claims": 5,
-    "scenarios_per_claim": 2,
-    "max_evidence_per_scenario": 6,
-    "context_aware_analysis": true
+{{code language="json"}}
+{
+  "metadata": {
+    "job_id": "string (ULID)",
+    "timestamp_utc": "ISO8601",
+    "engine_version": "POC1-v0.3",
+    "llm_provider": "anthropic",
+    "llm_model": "claude-3-5-sonnet-20241022",
+    "usage_stats": {
+      "input_tokens": "integer",
+      "output_tokens": "integer",
+      "estimated_cost_usd": "float",
+      "response_time_sec": "float"
+    }
   },
-  "client": {
-    "request_id": "optional-client-tracking-id",
-    "source_label": "optional"
+  "article_holistic_assessment": {
+    "main_thesis": "string (The core argument detected)",
+    "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN",
+    "logic_quality_score": "float (0-1)",
+    "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"],
+    "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)",
+    "experimental_feature": true
+  },
+  "claims": [
+    {
+      "claim_id": "C1",
+      "is_central_to_thesis": "boolean",
+      "claim_text": "string",
+      "canonical_form": "string",
+      "claim_type": "descriptive | causal | predictive | normative | definitional",
+      "evaluability": "evaluable | partly_evaluable | not_evaluable",
+      "risk_tier": "A | B | C",
+      "risk_tier_justification": "string",
+      "domain": "string (e.g., 'public health', 'economics')",
+      "key_terms": ["term1", "term2"],
+      "entities": ["Person X", "Org Y"],
+      "time_scope_detected": "2020-2024",
+      "geography_scope_detected": "Brazil",
+      "scenarios": [
+        {
+          "scenario_id": "C1-S1",
+          "context_title": "string",
+          "definitions": {"key_term": "definition"},
+          "assumptions": ["Assumption 1", "Assumption 2"],
+          "boundaries": {
+            "time": "as of 2025-01",
+            "geography": "Brazil",
+            "population": "adult population",
+            "conditions": "excludes X; includes Y"
+          },
+          "scope_of_evidence": "What counts as evidence for this scenario",
+          "scenario_questions": ["Question that decides the verdict"],
+          "verdict": {
+            "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated",
+            "probability_range": [0.0, 1.0],
+            "confidence": "float (0-1)",
+            "reasoning": "string",
+            "key_supporting_evidence_ids": ["E1", "E3"],
+            "key_counter_evidence_ids": ["E2"],
+            "uncertainty_factors": ["Data gap", "Method disagreement"],
+            "what_would_change_my_mind": ["Specific new study", "Updated dataset"]
+          },
+          "evidence": [
+            {
+              "evidence_id": "E1",
+              "stance": "supports | undermines | mixed | context_dependent",
+              "relevance_to_scenario": "float (0-1)",
+              "evidence_summary": ["Bullet fact 1", "Bullet fact 2"],
+              "citation": {
+                "title": "Source title",
+                "author_or_org": "Org/Author",
+                "publication_date": "2024-05-01",
+                "url": "https://source.example",
+                "publisher": "Publisher/Domain"
+              },
+              "excerpt": ["Short quote ≤25 words (optional)"],
+              "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT",
+              "reliability_justification": "Why high/medium/low",
+              "limitations_and_reservations": ["Limitation 1", "Limitation 2"],
+              "retraction_or_dispute_signal": "none | correction | retraction | disputed",
+              "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED"
+            }
+          ]
+        }
+      ]
+    }
+  ],
+  "quality_gates": {
+    "gate1_claim_validation": "pass | fail",
+    "gate4_verdict_confidence": "pass | fail",
+    "passed_all": "boolean",
+    "gate_fail_reasons": [
+      {
+        "gate": "gate1_claim_validation",
+        "claim_id": "C1",
+        "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE",
+        "explanation": "Human-readable explanation"
+      }
+    ]
+  },
+  "global_notes": {
+    "limitations": ["System limitation 1", "Limitation 2"],
+    "safety_or_policy_notes": ["Note 1"]
   }
 }
-}}}
+{{/code}}

-**Options:**
+=== 4.1 Risk Tier Definitions ===

-* browsing: on | off (retrieve web sources or just output queries)
-* depth: standard | deep (evidence thoroughness)
-* max_claims: 1-10 (default: **5** for cost control)
-* scenarios_per_claim: 1-5 (default: **2** for cost control)
-* max_evidence_per_scenario: 3-10 (default: **6**)
-* context_aware_analysis: true | false (experimental)
+|=Tier|=Impact|=Examples|=Actions
+|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required)
+|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated)
+|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated)

-**Response:** 202 Accepted
+=== 4.2 Source Reliability (Read-Only Snapshots) ===

-{{{{
-  "job_id": "01J...ULID",
-  "status": "QUEUED",
-  "created_at": "2025-12-24T10:31:00Z",
-  "estimated_cost": 0.114,
-  "cost_breakdown": {
-    "stage1_extraction": 0.003,
-    "stage2_new_claims": 0.081,
-    "stage2_cached_claims": 0.000,
-    "stage3_holistic": 0.030
-  },
-  "cache_info": {
-    "claims_to_extract": 5,
-    "estimated_cache_hits": 4,
-    "estimated_new_claims": 1
-  },
-  "links": {
-    "self": "/v1/jobs/01J...ULID",
-    "result": "/v1/jobs/01J...ULID/result",
-    "report": "/v1/jobs/01J...ULID/report",
-    "events": "/v1/jobs/01J...ULID/events"
-  }
-}
-}}}
+**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job.

-**Error Responses:**
+* POC1 treats these scores as **read-only** (no modification during analysis)
+* **Prevents circular dependency:** scoring → affects retrieval → affects scoring
+* Full Source Track Record System is a **separate service** (not part of POC1)
+* **Temporal separation:** Scoring runs weekly; analysis uses snapshots

-402 Payment Required - Free tier limit reached, cache-only mode
+**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm.

-{{{{
-  "error": "credit_limit_reached",
-  "message": "Monthly credit limit reached. Entering cache-only mode.",
-  "cache_only_mode": true,
-  "credit_remaining": 0.00,
-  "reset_date": "2025-02-01T00:00:00Z",
-  "action": "Resubmit with cache_preference=allow_partial for cached results"
-}
-}}}
+=== 4.3 Quality Gate Reason Codes ===

------
+**Gate 1 (Claim Validation):**
+* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor
+* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement
+* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact
+* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation

-== 4. Data Schemas ==
+**Gate 4 (Verdict Confidence):**
+* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5)
+* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict
+* {{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution
+* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed

-=== 4.1 Stage 1 Output: Claim Extraction ===
+**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve)

-{{{{
-  "job_id": "01J...ULID",
-  "stage": "stage1_extraction",
-  "article_metadata": {
-    "title": "Article title",
-    "source_url": "https://example.com/article",
-    "extracted_text_length": 5234,
-    "language": "en"
-  },
-  "claims": [
-    {
-      "claim_id": "C1",
-      "claim_text": "Original claim text from article",
-      "canonical_claim": "Normalized, deduplicated phrasing",
-      "claim_hash": "sha256:abc123...",
-      "is_central_to_thesis": true,
-      "claim_type": "causal",
-      "evaluability": "evaluable",
-      "risk_tier": "B",
-      "domain": "public_health"
-    }
-  ],
-  "article_thesis": "Main argument detected",
-  "cost": 0.003
-}
-}}}
+---

------
+== 5. Validation Rules (POC1 Enforcement) ==

-=== 4.5 Verdict Label Taxonomy ===
+|=Rule|=Requirement
+|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}.
+|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging.
+|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content.
+|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions.

-FactHarbor uses **three distinct verdict taxonomies** depending on analysis level:
+---

-==== 4.5.1 Scenario Verdict Labels (Stage 2) ====
+== 6. Deterministic Markdown Template ==

-Used for individual scenario verdicts within a claim.
+The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM).

-**Enum Values:**
+{{code language="markdown"}}
+# FactHarbor Analysis Report: {overall_verdict}

-* Highly Likely - Probability 0.85-1.0, high confidence
-* Likely - Probability 0.65-0.84, moderate-high confidence
-* Unclear - Probability 0.35-0.64, or low confidence
-* Unlikely - Probability 0.16-0.34, moderate-high confidence
-* Highly Unlikely - Probability 0.0-0.15, high confidence
-* Unsubstantiated - Insufficient evidence to determine probability
+**Job ID:** {job_id} | **Generated:** {timestamp_utc}
+**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s

-==== 4.5.2 Claim Verdict Labels (Rollup) ====
+---

-Used when summarizing a claim across all scenarios.
+## 1. Holistic Assessment (Experimental)

-**Enum Values:**
+**Main Thesis:** {main_thesis}

-* Supported - Majority of scenarios are Likely or Highly Likely
-* Refuted - Majority of scenarios are Unlikely or Highly Unlikely
-* Inconclusive - Mixed scenarios or majority Unclear/Unsubstantiated
+**Overall Verdict:** {overall_verdict}

-**Mapping Logic:**
+**Logic Quality Score:** {logic_quality_score}/1.0

-* If ≥60% scenarios are (Highly Likely | Likely) → Supported
-* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted
-* Otherwise → Inconclusive
+**Fallacies Detected:** {fallacies_detected}

-==== 4.5.3 Article Verdict Labels (Stage 3) ====
+**Reasoning:** {verdict_reasoning}

-Used for holistic article-level assessment.
+---

-**Enum Values:**
+## 2. Key Claims Analysis

-* WELL-SUPPORTED - Article thesis logically follows from supported claims
-* MISLEADING - Claims may be true but article commits logical fallacies
-* REFUTED - Central claims are refuted, invalidating thesis
-* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts
+### [C1] {claim_text}
+* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"}
+* **Risk Tier:** {risk_tier} ({risk_tier_justification})
+* **Evaluability:** {evaluability}

-**Note:** Article verdict considers **claim centrality** (central claims override supporting claims).
+**Scenarios Explored:** {scenarios.length}

-==== 4.5.4 API Field Mapping ====
+#### Scenario: {scenario.context_title}
+* **Verdict:** {verdict.label} (Confidence: {verdict.confidence})
+* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]}
+* **Reasoning:** {verdict.reasoning}

-|=Level|=API Field|=Enum Name
-|Scenario|scenarios[].verdict.label|scenario_verdict_label
-|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
-|Article|article_holistic_assessment.overall_verdict|article_verdict_label
+**Evidence:**
+* Supporting: {evidence.filter(e => e.stance == "supports").length} sources
+* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources
+* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources

------
+**Key Evidence:**
+* [{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance}

-== 5. Cache Architecture ==
+---

-=== 5.1 Redis Cache Design ===
+## 3. Quality Assessment

-**Technology:** Redis 7.0+ (in-memory key-value store)
+**Quality Gates:**
+* Gate 1 (Claim Validation): {gate1_claim_validation}
+* Gate 4 (Verdict Confidence): {gate4_verdict_confidence}
+* Overall: {passed_all ? "PASS" : "FAIL"}

-**Cache Key Schema:**
+{if gate_fail_reasons.length > 0}
+**Failed Gates:**
+{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)}
+{/if}

-{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
-}}}
+---

-**Example:**
+## 4. Limitations & Disclaimers

-{{{Claim (English): "COVID vaccines are 95% effective"
-Canonical: "covid vaccines are 95 percent effective"
-Language: "en"
-SHA256: abc123...def456
-Key: claim:v1norm1:en:abc123...def456
-}}}
+**System Limitations:**
+{limitations.map(l => `* ${l}`)}

-**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.
+**Important Notes:**
+* This analysis is AI-generated and experimental (POC1)
+* Context-aware article verdict is being tested for accuracy
+* Human review recommended for high-risk claims (Tier A)
+* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens}

-**Data Structure:**
+**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis.

-{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
-EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
-}}}
+---

------
+*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)*
+{{/code}}

-=== 5.1.1 Canonical Claim Normalization (v1) ===
+**Target Report Size:** 220-350 words (optimized for 2-minute read)

-The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.
+---

-**Algorithm: Canonical Claim Normalization v1**
+== 7. LLM Configuration (POC1) ==

-{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
-    """
-    Normalizes claim to canonical form for cache key generation.
450 -    Version: v1norm1 (POC1)
451 -    """
452 -    import re
453 -    import unicodedata
454 -
455 -    # Step 1: Unicode normalization (NFC)
456 -    text = unicodedata.normalize('NFC', claim_text)
457 -
458 -    # Step 2: Lowercase
459 -    text = text.lower()
460 -
461 -    # Step 3: Remove punctuation (except hyphens in words)
462 -    text = re.sub(r'[^\w\s-]', '', text)
463 -
464 -    # Step 4: Normalize whitespace (collapse multiple spaces)
465 -    text = re.sub(r'\s+', ' ', text).strip()
466 -
467 -    # Step 5: Numeric normalization
468 -    text = text.replace('%', ' percent')
469 -    # Spell out single-digit numbers
470 -    num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
471 -                   '4':'four', '5':'five', '6':'six', '7':'seven',
472 -                   '8':'eight', '9':'nine'}
473 -    for num, word in num_to_word.items():
474 -        text = re.sub(rf'\b{num}\b', word, text)
475 -
476 -    # Step 6: Common abbreviations (English only in v1)
477 -    if language == 'en':
478 -        text = text.replace('covid-19', 'covid')
479 -        text = text.replace('u.s.', 'us')
480 -        text = text.replace('u.k.', 'uk')
481 -
482 -    # Step 7: NO entity normalization in v1
483 -    # (Trump vs Donald Trump vs President Trump remain distinct)
484 -
485 -    return text
480 +|=Parameter|=Value|=Notes
481 +|**Provider**|Anthropic|Primary provider for POC1
482 +|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model
483 +|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports)
484 +|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length)
485 +|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output)
486 +|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation
487 +|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment
488 +|**Few-Shot Examples**|Yes|For claim extraction and scenario generation
486 486
487 -# Version identifier (include in cache namespace)
488 -CANONICALIZER_VERSION = "v1norm1"
489 -}}}
490 +=== 7.1 Token Budgets by Stage ===
490 490
491 -**Cache Key Formula (Updated):**
492 +|=Stage|=Approximate Output Tokens
493 +|Claim Extraction|~4,000 (10 claims × ~400 tokens)
494 +|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens)
495 +|Evidence Synthesis|~2,000 per scenario
496 +|Verdict Generation|~1,000 per scenario
497 +|Holistic Assessment|~500 (context-aware summary)
492 492
493 -{{{language = "en"
494 -canonical = normalize_claim_v1(claim_text, language)
495 -cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
499 +**Total:** 50K-80K tokens per article (input + output)
496 496
497 -Example:
498 -  claim: "COVID-19 vaccines are 95% effective"
499 -  canonical: "covid vaccines are 95 percent effective"
500 -  sha256: abc123...def456
501 -  key: "claim:v1norm1:en:abc123...def456"
502 -}}}
501 +=== 7.2 API Integration ===
503 503
504 -**Cache Metadata MUST Include:**
503 +**Anthropic Messages API:**
504 +* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}}
505 +* Authentication: API key via {{code}}x-api-key{{/code}} header
506 +* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}}
507 +* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage)
505 505
506 -{{{{
507 -  "canonical_claim": "covid vaccines are 95 percent effective",
508 -  "canonicalizer_version": "v1norm1",
509 -  "language": "en",
510 -  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
511 -}
512 -}}}
509 +**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice.
513 513
514 -**Version Upgrade Path:**
511 +---
515 515
516 -* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
517 -* v1normN → v2norm1: Major version bump, invalidate all v1 caches
513 +== 8. Cross-References (xWiki) ==
518 518
519 -----
515 +This API specification implements requirements from:
520 520
521 -=== 5.1.2 Copyright & Data Retention Policy ===
517 +* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
518 +** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements)
519 +** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only)
520 +** Section 2.1: Analysis Summary (Context-Aware) component specification
521 +** Section 10.3: Prompt structure for claim extraction and verdict synthesis
522 522
523 -**Evidence Excerpt Storage:**
523 +* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
524 +** Complete investigation of 7 approaches to article-level verdicts
525 +** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1
526 +** Experimental feature testing plan (30 articles, ≥70% accuracy target)
527 +** Decision framework for POC2 implementation
524 524
525 -To comply with copyright law and fair use principles:
529 +* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
530 +** FR4 (Analysis Summary) - enhanced with context-aware capability
531 +** FR7 (Verdict Calculation) - probability ranges + confidence scores
532 +** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2
526 526
527 -**What We Store:**
534 +* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
535 +** POC1 simplified architecture (stateless, single AKEL orchestration call)
536 +** Data persistence minimized (job outputs only, no database required)
537 +** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify)
528 528
529 -* **Metadata only:** Title, author, publisher, URL, publication date
530 -* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
531 -* **Summaries:** AI-generated bullet points (not verbatim text)
532 -* **No full articles:** Never store complete article text beyond job processing
539 +* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
540 +** Evidence structure (source, stance, reliability rating)
541 +** Scenario boundaries (time, geography, population, conditions)
542 +** Claim types and evaluability taxonomy
543 +** Source Track Record System (Section 1.3) - temporal separation
533 533
534 -**Total per Cached Claim:**
545 +* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]**
546 +** POC1 requirement mappings and phase assignments
547 +** Context-aware analysis as POC1 experimental feature
548 +** POC2 enhancement path (Gates 2 & 3, evidence deduplication)
535 535
536 -* Scenarios: 2 per claim
537 -* Evidence items: 6 per scenario (12 total)
538 -* Quotes: 3 per evidence × 25 words = 75 words per item
539 -* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)
550 +---
540 540
541 -**Retention:**
552 +== 9. Implementation Notes (POC1) ==
542 542
543 -* Cache TTL: 90 days
544 -* Job outputs: 24 hours (then archived or deleted)
545 -* No persistent full-text article storage
554 +=== 9.1 Recommended Tech Stack ===
546 546
547 -**Rationale:**
556 +* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase
557 +* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure
558 +* **Storage:** Filesystem JSON files (no database needed for POC1)
559 +* **Queue:** In-memory queue or Redis (optional for concurrency)
560 +* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback)
561 +* **Deployment:** Vercel, AWS Lambda, or similar serverless
548 548
549 -* Short excerpts for citation = fair use
550 -* Summaries are transformative (not copyrightable)
551 -* Limited retention (90 days max)
552 -* No commercial republication of excerpts
563 +=== 9.2 POC1 Simplifications ===
553 553
554 -**DMCA Compliance:**
565 +* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}})
566 +* **No user authentication:** Optional API key validation only (env var: {{code}}FACTHARBOR_API_KEY{{/code}})
567 +* **Single-instance deployment:** No distributed processing, no worker pools
568 +* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning)
569 +* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}})
570 +* **Rate limiting:** Simple IP-based (optional) - no complex billing
555 555
556 -* Cache invalidation endpoint available for rights holders
557 -* Contact: dmca@factharbor.org
572 +=== 9.3 Estimated Costs (Per Analysis) ===
558 558
559 -----
574 +**LLM API costs (Claude 3.5 Sonnet):**
575 +* Input: $3.00 per million tokens
576 +* Output: $15.00 per million tokens
577 +* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical)
560 560
561 -== Summary ==
579 +**Web search costs (optional):**
580 +* Using external search API (Tavily, Brave): $0.01-0.05 per analysis
581 +* POC1 can use free search APIs initially
562 562
563 -This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification.
583 +**Infrastructure costs:**
584 +* Vercel hobby tier: Free for POC
585 +* AWS Lambda: ~$0.001 per request
586 +* **Total infra:** <$0.01 per analysis
564 564
565 -**Full specification includes:**
588 +**Total estimated cost:** ~$0.15-0.35 per analysis ✅ Meets <$0.35 target
566 566
567 -* Complete API endpoints (7 total)
568 -* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete)
569 -* Quality gates & validation rules
570 -* LLM configuration for all 3 stages
571 -* Implementation notes with code samples
572 -* Testing strategy
573 -* Cross-references to other pages
590 +=== 9.4 Estimated Timeline (AI-Assisted) ===
574 574
575 -**The complete specification is available in:**
592 +**With Cursor IDE + Claude API:**
593 +* Day 1-2: API scaffolding + job queue
594 +* Day 3-4: LLM integration + prompt engineering
595 +* Day 5-6: Evidence retrieval + contradiction search
596 +* Day 7: Report templates + testing with 30 articles
597 +* **Total:** 5-7 days for working POC1
576 576
577 -* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
578 -* Export files (TEST/PRODUCTION) for xWiki import
599 +**Manual coding (no AI assistance):**
600 +* Estimate: 15-20 days
601 +
602 +=== 9.5 First Prompt for AI Code Generation ===
603 +
604 +{{code}}
605 +Based on the FactHarbor POC1 API & Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with:
606 +
607 +1. API routes implementing the 7 endpoints specified in Section 3
608 +2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5
609 +3. Anthropic Claude 3.5 Sonnet integration for:
610 +   - Claim extraction (with central/supporting marking)
611 +   - Scenario generation
612 +   - Evidence synthesis (with mandatory contradiction search)
613 +   - Verdict generation
614 +   - Holistic assessment (article-level credibility)
615 +4. Job-based async execution with progress tracking (7 pipeline stages)
616 +5. Quality Gates 1 & 4 from NFR11 implementation
617 +6. Mandatory contradiction search enforcement (Section 5)
618 +7. Context-aware analysis (experimental) as specified
619 +8. Filesystem-based job storage (no database)
620 +9. Markdown report generation from JSON templates (Section 6)
621 +
622 +Use the validation rules from Section 5 and error codes from Section 2.1.1.
623 +Target: <$0.35 per analysis, <2 minutes processing time.
624 +{{/code}}
625 +
626 +---
627 +
628 +== 10. Testing Strategy (POC1) ==
629 +
630 +=== 10.1 Test Dataset (30 Articles) ===
631 +
632 +**Category 1: Straightforward Factual (10 articles)**
633 +* Purpose: Baseline accuracy
634 +* Example: "WHO report on global vaccination rates"
635 +* Expected: High claim accuracy, straightforward verdict
636 +
637 +**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test**
638 +* Purpose: Test holistic assessment capability
639 +* Example: "Coffee cures cancer" (true premises, false conclusion)
640 +* Expected: Individual claims TRUE, article verdict MISLEADING
641 +
642 +**Category 3: Mixed Accuracy (5 articles)**
643 +* Purpose: Test nuance handling
644 +* Example: Articles with some true, some false claims
645 +* Expected: Scenario-level differentiation
646 +
647 +**Category 4: Low-Quality Claims (5 articles)**
648 +* Purpose: Test quality gates
649 +* Example: Opinion pieces, compound claims
650 +* Expected: Gate 1 failures, rejection or draft-only mode
651 +
652 +=== 10.2 Success Metrics ===
653 +
654 +**Quality Metrics:**
655 +* Hallucination rate: <5% (target: <3%)
656 +* Context-aware accuracy: ≥70% (experimental - key POC1 goal)
657 +* False positive rate: <15%
658 +* Mandatory contradiction search: 100% compliance
659 +
660 +**Performance Metrics:**
661 +* Processing time: <2 minutes per article (standard depth)
662 +* Cost per analysis: <$0.35
663 +* API uptime: >99%
664 +* LLM API error rate: <1%
665 +
666 +**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for complete success criteria and testing methodology.
667 +
668 +---
669 +
670 +**End of Specification - FactHarbor POC1 API v0.3**
671 +
672 +**Ready for xWiki import and AI-assisted implementation!** 🚀
673 +
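A note on the `normalize_claim_v1` listing in the removed Section 5.1.1 (v0.4.1): Step 3 strips punctuation before Step 5 runs. By the time `text.replace('%', ' percent')` executes, the `%` sign is already gone. As written, "COVID vaccines are 95% effective" canonicalizes to "covid vaccines are 95 effective", not to the "95 percent" form shown in the worked examples.

The Python sketch below is an illustrative assumption, not part of either spec version. It applies the percent and abbreviation substitutions before stripping punctuation, then builds the `claim:{version}:{language}:{sha256}` key with `hashlib`. The function names `normalize_claim` and `claim_cache_key` are hypothetical. The sketch reproduces the documented examples, but it differs from the literal listing for any claim containing `%`.

```python
import hashlib
import re
import unicodedata

# Assumption: reuses the "v1norm1" label from the v0.4.1 cache key schema.
CANONICALIZER_VERSION = "v1norm1"

# Step 5 of the spec: spell out standalone single-digit numbers.
_NUM_TO_WORD = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
                "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def normalize_claim(claim_text: str, language: str) -> str:
    """Hypothetical reordering of normalize_claim_v1 (percent/abbreviations first)."""
    # Steps 1-2: Unicode NFC normalization, then lowercase.
    text = unicodedata.normalize("NFC", claim_text).lower()

    # Handle '%' and English abbreviations while '%' and '.' are still present,
    # so the punctuation-stripping step cannot erase them first.
    text = text.replace("%", " percent")
    if language == "en":
        for abbrev, short in (("covid-19", "covid"), ("u.s.", "us"), ("u.k.", "uk")):
            text = text.replace(abbrev, short)

    # Step 3: drop punctuation except hyphens. Step 4: collapse whitespace.
    text = re.sub(r"[^\w\s-]", "", text)
    text = re.sub(r"\s+", " ", text).strip()

    # Step 5: match ASCII digits only, so the dictionary lookup cannot fail.
    return re.sub(r"\b[0-9]\b", lambda m: _NUM_TO_WORD[m.group()], text)


def claim_cache_key(claim_text: str, language: str) -> str:
    """Build claim:{version}:{language}:{sha256(canonical)} per the v0.4.1 schema."""
    canonical = normalize_claim(claim_text, language)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"


print(normalize_claim("COVID-19 vaccines are 95% effective", "en"))
# covid vaccines are 95 percent effective
```

Because the language code is part of the key, the same English text submitted as `"de"` lands in a different cache namespace. Any change to the canonical output, including this reordering, also changes the keys, which is why the removed text ties canonicalizer versions to the cache namespace.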