Changes for page POC1 API & Schemas Specification
Last modified by Robert Schaub on 2025/12/24 18:26
To version 3.1
edited by Robert Schaub
on 2025/12/24 16:32
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
... ... @@ -1,673 +1,578 @@
1 -# FactHarbor POC1 — API & Schemas Specification
1 += POC1 API & Schemas Specification =
2 2
3 -**Version:** 0.3 (POC1 - Production Ready)
4 -**Namespace:** FactHarbor.*
5 -**Syntax:** xWiki 2.1
6 -**Last Updated:** 2025-12-24
3 +----
7 7
8 ----
9 -
10 10 == Version History ==
11 11
12 12 |=Version|=Date|=Changes
13 -|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details, quality gate logging, temporal separation note, cross-references
14 -|0.2|2025-12-24|Initial rebased version with holistic assessment
15 -|0.1|2025-12-24|Original specification
8 +|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
9 +|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
10 +|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
11 +|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details
16 16
17 ----
13 +----
18 18
19 19 == 1. Core Objective (POC1) ==
20 20
21 -The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)**:
17 +The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.
22 22
23 -The system must prove that AI can identify an article's **Main Thesis** and determine if the supporting claims (even if individually accurate) logically support that thesis without committing fallacies (e.g., correlation vs. causation, cherry-picking, hasty generalization).
19 +The system must prove that AI can identify an article's **Main Thesis** and determine if supporting claims logically support that thesis without committing fallacies.
24 24
25 -**Success Criteria:**
21 +=== Success Criteria: ===
22 +
26 26 * Test with 30 diverse articles
27 27 * Target: ≥70% accuracy detecting misleading articles
28 -* Cost: <$0.35 per analysis
25 +* Cost: <$0.25 per NEW analysis (uncached)
26 +* Cost: $0.00 for cached claim reuse
27 +* Cache hit rate: ≥50% after 1,000 articles
29 29 * Processing time: <2 minutes (standard depth)
30 30
31 -**See:** [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]] for complete investigation of 7 approaches.
30 +=== Economic Model: ===
32 32
33 ----
32 +* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
33 +* **After limit:** Cache-only mode (instant, free access to cached claims)
34 +* **Paid tier:** Unlimited new analyses
34 34
35 -== 2. Runtime Model & Job States ==
36 +----
36 37
37 -=== 2.1 Pipeline Steps ===
38 +== 2. Architecture Overview ==
38 39
39 -For progress reporting via API, the pipeline follows these stages:
40 +=== 2.1 3-Stage Pipeline with Caching ===
40 41
41 -# **INGEST**: URL scraping (Jina Reader / Trafilatura) or text normalization.
42 -# **EXTRACT_CLAIMS**: Identifying 3-5 verifiable factual claims + marking central vs. supporting.
43 -# **SCENARIOS**: Generating context interpretations for each claim.
44 -# **RETRIEVAL**: Evidence gathering (Search API + mandatory contradiction search).
45 -# **VERDICTS**: Assigning likelihoods, confidence, and uncertainty per scenario.
46 -# **HOLISTIC_ASSESSMENT**: Evaluating article-level credibility (Thesis vs. Claims logic).
47 -# **REPORT**: Generating final Markdown and JSON outputs.
42 +FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
48 48
49 -=== 2.1.1 URL Extraction Strategy ===
44 +{{mermaid}}
45 +graph TD
46 +    A[Article Input] --> B[Stage 1: Extract Claims]
47 +    B --> C{For Each Claim}
48 +    C --> D[Check Cache]
49 +    D -->|Cache HIT| E[Return Cached Verdict]
50 +    D -->|Cache MISS| F[Stage 2: Analyze Claim]
51 +    F --> G[Store in Cache]
52 +    G --> E
53 +    E --> H[Stage 3: Holistic Assessment]
54 +    H --> I[Final Report]
55 +{{/mermaid}}
50 50
51 -**Primary:** Jina AI Reader ({{code}}https://r.jina.ai/{url}{{/code}})
52 -* **Rationale:** Clean markdown, handles JS rendering, free tier sufficient
53 -* **Fallback:** Trafilatura (Python library) for simple static HTML
57 +==== Stage 1: Claim Extraction (Haiku, no cache) ====
54 54
55 -**Error Handling:**
59 +* **Input:** Article text
60 +* **Output:** 5 canonical claims (normalized, deduplicated)
61 +* **Model:** Claude Haiku 4
62 +* **Cost:** $0.003 per article
63 +* **Cache strategy:** No caching (article-specific)
56 56
57 -|=Error Code|=Trigger|=Action
58 -|{{code}}URL_BLOCKED{{/code}}|403/401/Paywall detected|Return error, suggest text paste
59 -|{{code}}URL_UNREACHABLE{{/code}}|Network/DNS failure|Retry once, then fail
60 -|{{code}}URL_NOT_FOUND{{/code}}|404 Not Found|Return error immediately
61 -|{{code}}EXTRACTION_FAILED{{/code}}|Content <50 words or unreadable|Return error with reason
65 +==== Stage 2: Claim Analysis (Sonnet, CACHED) ====
62 62
63 -**Supported URL Patterns:**
64 -* ✅ News articles, blog posts, Wikipedia
65 -* ✅ Academic preprints (arXiv)
66 -* ❌ Social media posts (Twitter, Facebook) - not in POC1
67 -* ❌ Video platforms (YouTube, TikTok) - not in POC1
68 -* ❌ PDF files - deferred to Beta 0
67 +* **Input:** Single canonical claim
68 +* **Output:** Scenarios + Evidence + Verdicts
69 +* **Model:** Claude Sonnet 3.5
70 +* **Cost:** $0.081 per NEW claim
71 +* **Cache strategy:** Redis, 90-day TTL
72 +* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
69 69
70 -=== 2.2 Job Status Enumeration ===
74 +==== Stage 3: Holistic Assessment (Sonnet, no cache) ====
71 71
72 -(((
73 -* **QUEUED** - Job accepted, waiting in queue
74 -* **RUNNING** - Processing in progress
75 -* **SUCCEEDED** - Analysis complete, results available
76 -* **FAILED** - Error occurred, see error details
77 -* **CANCELLED** - User cancelled via DELETE endpoint
78 -)))
76 +* **Input:** Article + Claim verdicts (from cache or Stage 2)
77 +* **Output:** Article verdict + Fallacies + Logic quality
78 +* **Model:** Claude Sonnet 3.5
79 +* **Cost:** $0.030 per article
80 +* **Cache strategy:** No caching (article-specific)
79 79
80 ----
82 +=== Total Cost Formula: ===
81 81
82 -== 3. REST API Contract ==
84 +{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
83 83
84 -=== 3.1 Create Analysis Job ===
86 +Examples:
87 +- 0 new claims (100% cache hit): $0.033
88 +- 1 new claim (80% cache hit): $0.114
89 +- 3 new claims (40% cache hit): $0.276
90 +- 5 new claims (0% cache hit): $0.438
91 +}}}
85 85
86 -**Endpoint:** {{code}}POST /v1/analyze{{/code}}
93 +----
87 87
88 -**Request Body Example:**
89 -{{code language="json"}}
90 -{
91 -  "input_type": "url",
92 -  "input_url": "https://example.com/medical-report-01",
93 -  "input_text": null,
94 -  "options": {
95 -    "browsing": "on",
96 -    "depth": "standard",
97 -    "max_claims": 5,
98 -    "context_aware_analysis": true
99 -  },
100 -  "client": {
101 -    "request_id": "optional-client-tracking-id",
102 -    "source_label": "optional"
103 -  }
104 -}
105 -{{/code}}
95 +=== 2.2 User Tier System ===
106 106
107 -**Options:**
108 -* {{code}}browsing{{/code}}: {{code}}on{{/code}} | {{code}}off{{/code}} (retrieve web sources or just output queries)
109 -* {{code}}depth{{/code}}: {{code}}standard{{/code}} | {{code}}deep{{/code}} (evidence thoroughness)
110 -* {{code}}max_claims{{/code}}: 1-50 (default: 10)
111 -* {{code}}context_aware_analysis{{/code}}: {{code}}true{{/code}} | {{code}}false{{/code}} (experimental)
97 +|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
98 +|**Free**|$10|Cache-only mode|✅ Full|Basic
99 +|**Pro** (future)|$50|Continues|✅ Full|Advanced
100 +|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full
112 112
113 -**Response:** {{code}}202 Accepted{{/code}}
102 +**Free Tier Economics:**
114 114
115 -{{code language="json"}}
116 -{
117 -  "job_id": "01J...ULID",
118 -  "status": "QUEUED",
119 -  "created_at": "2025-12-24T10:31:00Z",
120 -  "links": {
121 -    "self": "/v1/jobs/01J...ULID",
122 -    "result": "/v1/jobs/01J...ULID/result",
123 -    "report": "/v1/jobs/01J...ULID/report",
124 -    "events": "/v1/jobs/01J...ULID/events"
125 -  }
126 -}
127 -{{/code}}
104 +* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
105 +* Average 70 articles/month at 70% cache hit rate
106 +* After limit: Cache-only mode
128 128
129 ----
108 +----
130 130
131 -=== 3.2 Get Job Status ===
110 +=== 2.3 Cache-Only Mode (Free Tier Feature) ===
132 132
133 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}{{/code}}
112 +When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:
134 134
135 -**Response:** {{code}}200 OK{{/code}}
114 +==== What Cache-Only Mode Provides: ====
136 136
137 -{{code language="json"}}
138 -{
139 -  "job_id": "01J...ULID",
140 -  "status": "RUNNING",
141 -  "created_at": "2025-12-24T10:31:00Z",
142 -  "updated_at": "2025-12-24T10:31:22Z",
143 -  "progress": {
144 -    "step": "RETRIEVAL",
145 -    "percent": 60,
146 -    "message": "Gathering evidence for C2-S1",
147 -    "current_claim_id": "C2",
148 -    "current_scenario_id": "C2-S1"
149 -  },
150 -  "input_echo": {
151 -    "input_type": "url",
152 -    "input_url": "https://example.com/medical-report-01"
153 -  },
154 -  "links": {
155 -    "self": "/v1/jobs/01J...ULID",
156 -    "result": "/v1/jobs/01J...ULID/result",
157 -    "report": "/v1/jobs/01J...ULID/report"
158 -  },
159 -  "error": null
160 -}
161 -{{/code}}
116 +✅ **Claim Extraction (Platform-Funded):**
162 162
163 ----
118 +* Stage 1 extraction runs at $0.003 per article
119 +* **Cost: Absorbed by platform** (not charged to user credit)
120 +* Rationale: Extraction is necessary to check cache, and cost is negligible
121 +* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)
164 164
165 -=== 3.3 Get JSON Result ===
123 +✅ **Instant Access to Cached Claims:**
166 166
167 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/result{{/code}}
125 +* Any claim that exists in cache → Full verdict returned
126 +* Cost: $0 (no LLM calls)
127 +* Response time: <100ms
168 168
169 -**Response:** {{code}}200 OK{{/code}} (Returns the **AnalysisResult** schema - see Section 4)
129 +✅ **Partial Article Analysis:**
170 170
171 -**Other Responses:**
172 -* {{code}}409 Conflict{{/code}} - Job not finished yet
173 -* {{code}}404 Not Found{{/code}} - Job ID unknown
131 +* Check each claim against cache
132 +* Return verdicts for ALL cached claims
133 +* For uncached claims: Return "status": "cache_miss"
174 174
175 ----
135 +✅ **Cache Coverage Report:**
176 176
177 -=== 3.4 Download Markdown Report ===
137 +* "3 of 5 claims available in cache (60% coverage)"
138 +* Links to cached analyses
139 +* Estimated cost to complete: $0.162 (2 new claims)
178 178
179 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/report{{/code}}
141 +❌ **Not Available in Cache-Only Mode:**
180 180
181 -**Response:** {{code}}200 OK{{/code}} with {{code}}text/markdown; charset=utf-8{{/code}} content
143 +* New claim analysis (Stage 2 LLM calls blocked)
144 +* Full holistic assessment (Stage 3 blocked if any claims missing)
182 182
183 -**Headers:**
184 -* {{code}}Content-Disposition: attachment; filename="factharbor_poc1_{job_id}.md"{{/code}}
146 +==== User Experience Example: ====
185 185
186 -**Other Responses:**
187 -* {{code}}409 Conflict{{/code}} - Job not finished
188 -* {{code}}404 Not Found{{/code}} - Job unknown
148 +{{{{
149 +  "status": "cache_only_mode",
150 +  "message": "Monthly credit limit reached. Showing cached results only.",
151 +  "cache_coverage": {
152 +    "claims_total": 5,
153 +    "claims_cached": 3,
154 +    "claims_missing": 2,
155 +    "coverage_percent": 60
156 +  },
157 +  "cached_claims": [
158 +    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
159 +    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
160 +    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
161 +  ],
162 +  "missing_claims": [
163 +    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
164 +    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
165 +  ],
166 +  "upgrade_options": {
167 +    "top_up": "$5 for 20-70 more articles",
168 +    "pro_tier": "$50/month unlimited"
169 +  }
170 +}
171 +}}}
189 189
190 ----
173 +**Design Rationale:**
191 191
192 -=== 3.5 Stream Job Events (Optional, Recommended) ===
175 +* Free users still get value (cached claims often answer their question)
176 +* Demonstrates FactHarbor's value (partial results encourage upgrade)
177 +* Sustainable for platform (no additional cost)
178 +* Fair to all users (everyone contributes to cache)
193 193
194 -**Endpoint:** {{code}}GET /v1/jobs/{job_id}/events{{/code}}
180 +----
195 195
196 -**Response:** Server-Sent Events (SSE) stream
182 +== 3. REST API Contract ==
197 197
198 -**Event Types:**
199 -* {{code}}progress{{/code}} - Progress update
200 -* {{code}}claim_extracted{{/code}} - Claim identified
201 -* {{code}}verdict_computed{{/code}} - Scenario verdict complete
202 -* {{code}}complete{{/code}} - Job finished
203 -* {{code}}error{{/code}} - Error occurred
184 +=== 3.1 User Credit Tracking ===
204 204
205 ----
186 +**Endpoint:** GET /v1/user/credit
206 206
207 -=== 3.6 Cancel Job ===
188 +**Response:** 200 OK
208 208
209 -**Endpoint:** {{code}}DELETE /v1/jobs/{job_id}{{/code}}
190 +{{{{
191 +  "user_id": "user_abc123",
192 +  "tier": "free",
193 +  "credit_limit": 10.00,
194 +  "credit_used": 7.42,
195 +  "credit_remaining": 2.58,
196 +  "reset_date": "2025-02-01T00:00:00Z",
197 +  "cache_only_mode": false,
198 +  "usage_stats": {
199 +    "articles_analyzed": 67,
200 +    "claims_from_cache": 189,
201 +    "claims_newly_analyzed": 113,
202 +    "cache_hit_rate": 0.626
203 +  }
204 +}
205 +}}}
210 210
211 -Attempts to cancel a queued or running job.
207 +----
212 212
213 -**Response:** {{code}}200 OK{{/code}} with updated Job object (status: CANCELLED)
209 +=== 3.2 Create Analysis Job (3-Stage) ===
214 214
215 -**Note:** Already-completed jobs cannot be cancelled.
211 +**Endpoint:** POST /v1/analyze
216 216
217 ----
213 +==== Idempotency Support: ====
218 218
219 -=== 3.7 Health Check ===
215 +To prevent duplicate job creation on network retries, clients SHOULD include:
220 220
221 -**Endpoint:** {{code}}GET /v1/health{{/code}}
217 +{{{POST /v1/analyze
218 +Idempotency-Key: {client-generated-uuid}
219 +}}}
222 222
223 -**Response:** {{code}}200 OK{{/code}}
221 +OR use the client.request_id field:
224 224
225 -{{code language="json"}}
226 -{
227 -  "status": "ok",
228 -  "version": "POC1-v0.3",
229 -  "model": "claude-3-5-sonnet-20241022"
223 +{{{{
224 +  "input_url": "...",
225 +  "client": {
226 +    "request_id": "client-uuid-12345",
227 +    "source_label": "optional"
228 +  }
230 230 }
231 -{{/code}}
230 +}}}
232 232
233 ----
232 +**Server Behavior:**
234 234
235 -== 4. AnalysisResult Schema (Context-Aware) ==
234 +* If Idempotency-Key or request_id seen before (within 24 hours):
235 +** Return existing job (200 OK, not 202 Accepted)
236 +** Do NOT create duplicate job or charge twice
237 +* Idempotency keys expire after 24 hours (matches job retention)
236 236
237 -This schema implements the **Context-Aware Analysis** required by the POC1 specification.
239 +**Example Response (Idempotent):**
238 238
239 -{{code language="json"}}
240 -{
241 -  "metadata": {
242 -    "job_id": "string (ULID)",
243 -    "timestamp_utc": "ISO8601",
244 -    "engine_version": "POC1-v0.3",
245 -    "llm_provider": "anthropic",
246 -    "llm_model": "claude-3-5-sonnet-20241022",
247 -    "usage_stats": {
248 -      "input_tokens": "integer",
249 -      "output_tokens": "integer",
250 -      "estimated_cost_usd": "float",
251 -      "response_time_sec": "float"
252 -    }
241 +{{{{
242 +  "job_id": "01J...ULID",
243 +  "status": "RUNNING",
244 +  "idempotent": true,
245 +  "original_request_at": "2025-12-24T10:31:00Z",
246 +  "message": "Returning existing job (idempotency key matched)"
247 +}
248 +}}}
249 +
250 +==== Request Body: ====
251 +
252 +{{{{
253 +  "input_type": "url",
254 +  "input_url": "https://example.com/medical-report-01",
255 +  "input_text": null,
256 +  "options": {
257 +    "browsing": "on",
258 +    "depth": "standard",
259 +    "max_claims": 5,
260 +    "scenarios_per_claim": 2,
261 +    "max_evidence_per_scenario": 6,
262 +    "context_aware_analysis": true
253 253   },
254 -  "article_holistic_assessment": {
255 -    "main_thesis": "string (The core argument detected)",
256 -    "overall_verdict": "WELL-SUPPORTED | MISLEADING | REFUTED | UNCERTAIN",
257 -    "logic_quality_score": "float (0-1)",
258 -    "fallacies_detected": ["correlation-causation", "cherry-picking", "hasty-generalization"],
259 -    "verdict_reasoning": "string (Explanation of why article credibility differs from claim average)",
260 -    "experimental_feature": true
261 -  },
262 -  "claims": [
263 -    {
264 -      "claim_id": "C1",
265 -      "is_central_to_thesis": "boolean",
266 -      "claim_text": "string",
267 -      "canonical_form": "string",
268 -      "claim_type": "descriptive | causal | predictive | normative | definitional",
269 -      "evaluability": "evaluable | partly_evaluable | not_evaluable",
270 -      "risk_tier": "A | B | C",
271 -      "risk_tier_justification": "string",
272 -      "domain": "string (e.g., 'public health', 'economics')",
273 -      "key_terms": ["term1", "term2"],
274 -      "entities": ["Person X", "Org Y"],
275 -      "time_scope_detected": "2020-2024",
276 -      "geography_scope_detected": "Brazil",
277 -      "scenarios": [
278 -        {
279 -          "scenario_id": "C1-S1",
280 -          "context_title": "string",
281 -          "definitions": {"key_term": "definition"},
282 -          "assumptions": ["Assumption 1", "Assumption 2"],
283 -          "boundaries": {
284 -            "time": "as of 2025-01",
285 -            "geography": "Brazil",
286 -            "population": "adult population",
287 -            "conditions": "excludes X; includes Y"
288 -          },
289 -          "scope_of_evidence": "What counts as evidence for this scenario",
290 -          "scenario_questions": ["Question that decides the verdict"],
291 -          "verdict": {
292 -            "label": "Highly Likely | Likely | Unclear | Unlikely | Refuted | Unsubstantiated",
293 -            "probability_range": [0.0, 1.0],
294 -            "confidence": "float (0-1)",
295 -            "reasoning": "string",
296 -            "key_supporting_evidence_ids": ["E1", "E3"],
297 -            "key_counter_evidence_ids": ["E2"],
298 -            "uncertainty_factors": ["Data gap", "Method disagreement"],
299 -            "what_would_change_my_mind": ["Specific new study", "Updated dataset"]
300 -          },
301 -          "evidence": [
302 -            {
303 -              "evidence_id": "E1",
304 -              "stance": "supports | undermines | mixed | context_dependent",
305 -              "relevance_to_scenario": "float (0-1)",
306 -              "evidence_summary": ["Bullet fact 1", "Bullet fact 2"],
307 -              "citation": {
308 -                "title": "Source title",
309 -                "author_or_org": "Org/Author",
310 -                "publication_date": "2024-05-01",
311 -                "url": "https://source.example",
312 -                "publisher": "Publisher/Domain"
313 -              },
314 -              "excerpt": ["Short quote ≤25 words (optional)"],
315 -              "source_reliability_score": "float (0-1) - READ-ONLY SNAPSHOT",
316 -              "reliability_justification": "Why high/medium/low",
317 -              "limitations_and_reservations": ["Limitation 1", "Limitation 2"],
318 -              "retraction_or_dispute_signal": "none | correction | retraction | disputed",
319 -              "retrieval_status": "OK | NEEDS_RETRIEVAL | FAILED"
320 -            }
321 -          ]
322 -        }
323 -      ]
324 -    }
325 -  ],
326 -  "quality_gates": {
327 -    "gate1_claim_validation": "pass | fail",
328 -    "gate4_verdict_confidence": "pass | fail",
329 -    "passed_all": "boolean",
330 -    "gate_fail_reasons": [
331 -      {
332 -        "gate": "gate1_claim_validation",
333 -        "claim_id": "C1",
334 -        "reason_code": "OPINION_DETECTED | COMPOUND_CLAIM | SUBJECTIVE | TOO_VAGUE",
335 -        "explanation": "Human-readable explanation"
336 -      }
337 -    ]
338 -  },
339 -  "global_notes": {
340 -    "limitations": ["System limitation 1", "Limitation 2"],
341 -    "safety_or_policy_notes": ["Note 1"]
264 +  "client": {
265 +    "request_id": "optional-client-tracking-id",
266 +    "source_label": "optional"
342 342   }
343 343 }
344 -{{/code}}
269 +}}}
345 345
346 -=== 4.1 Risk Tier Definitions ===
271 +**Options:**
347 347
348 -|=Tier|=Impact|=Examples|=Actions
349 -|**A (High)**|High real-world impact if wrong|Health claims, safety information, financial advice, medical procedures|Human review recommended (Mode3_Human_Reviewed_Required)
350 -|**B (Medium)**|Moderate impact, contested topics|Political claims, social issues, scientific debates, economic predictions|Enhanced contradiction search, AI-generated publication OK (Mode2_AI_Generated)
351 -|**C (Low)**|Low impact, easily verifiable|Historical facts, basic statistics, biographical data, geographic information|Standard processing, AI-generated publication OK (Mode2_AI_Generated)
273 +* browsing: on | off (retrieve web sources or just output queries)
274 +* depth: standard | deep (evidence thoroughness)
275 +* max_claims: 1-10 (default: **5** for cost control)
276 +* scenarios_per_claim: 1-5 (default: **2** for cost control)
277 +* max_evidence_per_scenario: 3-10 (default: **6**)
278 +* context_aware_analysis: true | false (experimental)
352 352
353 -=== 4.2 Source Reliability (Read-Only Snapshots) ===
280 +**Response:** 202 Accepted
354 354
355 -**IMPORTANT:** The {{code}}source_reliability_score{{/code}} in each evidence item is a **historical snapshot** from the weekly background scoring job.
282 +{{{{
283 +  "job_id": "01J...ULID",
284 +  "status": "QUEUED",
285 +  "created_at": "2025-12-24T10:31:00Z",
286 +  "estimated_cost": 0.114,
287 +  "cost_breakdown": {
288 +    "stage1_extraction": 0.003,
289 +    "stage2_new_claims": 0.081,
290 +    "stage2_cached_claims": 0.000,
291 +    "stage3_holistic": 0.030
292 +  },
293 +  "cache_info": {
294 +    "claims_to_extract": 5,
295 +    "estimated_cache_hits": 4,
296 +    "estimated_new_claims": 1
297 +  },
298 +  "links": {
299 +    "self": "/v1/jobs/01J...ULID",
300 +    "result": "/v1/jobs/01J...ULID/result",
301 +    "report": "/v1/jobs/01J...ULID/report",
302 +    "events": "/v1/jobs/01J...ULID/events"
303 +  }
304 +}
305 +}}}
356 356
357 -* POC1 treats these scores as **read-only** (no modification during analysis)
358 -* **Prevents circular dependency:** scoring → affects retrieval → affects scoring
359 -* Full Source Track Record System is a **separate service** (not part of POC1)
360 -* **Temporal separation:** Scoring runs weekly; analysis uses snapshots
307 +**Error Responses:**
361 361
362 -**See:** [[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]] Section 1.3 (Source Track Record System) for scoring algorithm.
309 +402 Payment Required - Free tier limit reached, cache-only mode
363 363
364 -=== 4.3 Quality Gate Reason Codes ===
311 +{{{{
312 +  "error": "credit_limit_reached",
313 +  "message": "Monthly credit limit reached. Entering cache-only mode.",
314 +  "cache_only_mode": true,
315 +  "credit_remaining": 0.00,
316 +  "reset_date": "2025-02-01T00:00:00Z",
317 +  "action": "Resubmit with cache_preference=allow_partial for cached results"
318 +}
319 +}}}
365 365
366 -**Gate 1 (Claim Validation):**
367 -* {{code}}OPINION_DETECTED{{/code}} - Subjective judgment without factual anchor
368 -* {{code}}COMPOUND_CLAIM{{/code}} - Multiple claims in one statement
369 -* {{code}}SUBJECTIVE{{/code}} - Value judgment, not verifiable fact
370 -* {{code}}TOO_VAGUE{{/code}} - Lacks specificity for evaluation
321 +----
371 371
372 -**Gate 4 (Verdict Confidence):**
373 -* {{code}}LOW_CONFIDENCE{{/code}} - Confidence below threshold (<0.5)
374 -* {{code}}INSUFFICIENT_EVIDENCE{{/code}} - Too few sources to reach verdict
375 -* {{code}}CONTRADICTORY_EVIDENCE{{/code}} - Evidence conflicts without resolution
376 -* {{code}}NO_COUNTER_EVIDENCE{{/code}} - Contradiction search failed
323 +== 4. Data Schemas ==
377 377
378 -**Purpose:** Enable system improvement workflow (Observe → Analyze → Improve)
325 +=== 4.1 Stage 1 Output: ClaimExtraction ===
379 379
380 ----
327 +{{{{
328 +  "job_id": "01J...ULID",
329 +  "stage": "stage1_extraction",
330 +  "article_metadata": {
331 +    "title": "Article title",
332 +    "source_url": "https://example.com/article",
333 +    "extracted_text_length": 5234,
334 +    "language": "en"
335 +  },
336 +  "claims": [
337 +    {
338 +      "claim_id": "C1",
339 +      "claim_text": "Original claim text from article",
340 +      "canonical_claim": "Normalized, deduplicated phrasing",
341 +      "claim_hash": "sha256:abc123...",
342 +      "is_central_to_thesis": true,
343 +      "claim_type": "causal",
344 +      "evaluability": "evaluable",
345 +      "risk_tier": "B",
346 +      "domain": "public_health"
347 +    }
348 +  ],
349 +  "article_thesis": "Main argument detected",
350 +  "cost": 0.003
351 +}
352 +}}}
381 381
382 -== 5. Validation Rules (POC1 Enforcement) ==
354 +----
383 383
384 -|=Rule|=Requirement
385 -|**Mandatory Contradiction**|For every claim, the engine MUST search for "undermines" evidence. If none found, reasoning must explicitly state: "No counter-evidence found despite targeted search." Evidence must include at least 1 item with {{code}}stance ∈ {undermines, mixed, context_dependent}{{/code}} OR explicit note in {{code}}uncertainty_factors{{/code}}.
386 -|**Context-Aware Logic**|The {{code}}overall_verdict{{/code}} must prioritize central claims. If a {{code}}is_central_to_thesis=true{{/code}} claim is REFUTED, the overall article cannot be WELL-SUPPORTED. Central claims override verdict averaging.
387 -|**Author Identification**|All automated outputs MUST include {{code}}author_type: "AI/AKEL"{{/code}} or equivalent marker to distinguish AI-generated from human-reviewed content.
388 -|**Claim-to-Scenario Lifecycle**|In stateless POC1, Scenarios are **strictly children** of a specific Claim version. If a Claim's text changes, child Scenarios are part of that version's "snapshot." No scenario migration across versions.
356 +=== 4.5 Verdict Label Taxonomy ===
389 389
390 ----
358 +FactHarbor uses **three distinct verdict taxonomies** depending on analysis level:
391 391
392 -== 6. Deterministic Markdown Template ==
360 +==== 4.5.1 Scenario Verdict Labels (Stage 2) ====
393 393
394 -The system renders {{code}}report.md{{/code}} using a **fixed template** based on the JSON result (NOT generated by LLM).
362 +Used for individual scenario verdicts within a claim.
395 395
396 -{{code language="markdown"}}
397 -# FactHarbor Analysis Report: {overall_verdict}
364 +**Enum Values:**
398 398
399 -**Job ID:** {job_id} | **Generated:** {timestamp_utc}
400 -**Model:** {llm_model} | **Cost:** ${estimated_cost_usd} | **Time:** {response_time_sec}s
366 +* Highly Likely - Probability 0.85-1.0, high confidence
367 +* Likely - Probability 0.65-0.84, moderate-high confidence
368 +* Unclear - Probability 0.35-0.64, or low confidence
369 +* Unlikely - Probability 0.16-0.34, moderate-high confidence
370 +* Highly Unlikely - Probability 0.0-0.15, high confidence
371 +* Unsubstantiated - Insufficient evidence to determine probability
401 401
402 ----
373 +==== 4.5.2 Claim Verdict Labels (Rollup) ====
403 403
404 -## 1. Holistic Assessment (Experimental)
375 +Used when summarizing a claim across all scenarios.
405 405
406 -**Main Thesis:** {main_thesis}
377 +**Enum Values:**
407 407
408 -**Overall Verdict:** {overall_verdict}
379 +* Supported - Majority of scenarios are Likely or Highly Likely
380 +* Refuted - Majority of scenarios are Unlikely or Highly Unlikely
381 +* Inconclusive - Mixed scenarios or majority Unclear/Unsubstantiated
409 409
410 -**Logic Quality Score:** {logic_quality_score}/1.0
383 +**Mapping Logic:**
411 411
412 -**Fallacies Detected:** {fallacies_detected}
385 +* If ≥60% scenarios are (Highly Likely | Likely) → Supported
386 +* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted
387 +* Otherwise → Inconclusive
413 413
414 -**Reasoning:** {verdict_reasoning}
389 +==== 4.5.3 Article Verdict Labels (Stage 3) ====
415 415
416 ----
391 +Used for holistic article-level assessment.
417 417
418 -## 2. Key Claims Analysis
393 +**Enum Values:**
419 419
420 -### [C1] {claim_text}
421 -* **Role:** {is_central_to_thesis ? "Central to thesis" : "Supporting claim"}
422 -* **Risk Tier:** {risk_tier} ({risk_tier_justification})
423 -* **Evaluability:** {evaluability}
395 +* WELL-SUPPORTED - Article thesis logically follows from supported claims
396 +* MISLEADING - Claims may be true but article commits logical fallacies
397 +* REFUTED - Central claims are refuted, invalidating thesis
398 +* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts
424 424
425 -**Scenarios Explored:** {scenarios.length}
400 +**Note:** Article verdict considers **claim centrality** (central claims override supporting claims).
426 426
427 -#### Scenario: {scenario.context_title}
428 -* **Verdict:** {verdict.label} (Confidence: {verdict.confidence})
429 -* **Probability Range:** {verdict.probability_range[0]} - {verdict.probability_range[1]}
430 -* **Reasoning:** {verdict.reasoning}
402 +==== 4.5.4 API Field Mapping ====
431 431
432 -**Evidence:**
433 -* Supporting: {evidence.filter(e => e.stance == "supports").length} sources
434 -* Undermining: {evidence.filter(e => e.stance == "undermines").length} sources
435 -* Mixed: {evidence.filter(e => e.stance == "mixed").length} sources
404 +|=Level|=API Field|=Enum Name
405 +|Scenario|scenarios[].verdict.label|scenario_verdict_label
406 +|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
407 +|Article|article_holistic_assessment.overall_verdict|article_verdict_label
436 436
437 -**Key Evidence:**
438 -* [{evidence[0].citation.title}]({evidence[0].citation.url}) - {evidence[0].stance}
409 +----
439 439
440 ----
411 +== 5. Cache Architecture ==
441 441
442 -## 3. Quality Assessment
413 +=== 5.1 Redis Cache Design ===
443 443
444 -**Quality Gates:**
445 -* Gate 1 (Claim Validation): {gate1_claim_validation}
446 -* Gate 4 (Verdict Confidence): {gate4_verdict_confidence}
447 -* Overall: {passed_all ? "PASS" : "FAIL"}
415 +**Technology:** Redis 7.0+ (in-memory key-value store)
448 448
449 -{if gate_fail_reasons.length > 0}
450 -**Failed Gates:**
451 -{gate_fail_reasons.map(r => `* ${r.gate}: ${r.explanation}`)}
452 -{/if}
417 +**Cache Key Schema:**
453 453
454 ----
419 +{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
420 +}}}
455 455
456 -## 4. Limitations & Disclaimers
422 +**Example:**
457 457
458 -**System Limitations:**
459 -{limitations.map(l => `* ${l}`)}
424 +{{{Claim (English): "COVID vaccines are 95% effective"
425 +Canonical: "covid vaccines are 95 percent effective"
426 +Language: "en"
427 +SHA256: abc123...def456
428 +Key: claim:v1norm1:en:abc123...def456
429 +}}}
460 460
461 -**Important Notes:**
462 -* This analysis is AI-generated and experimental (POC1)
463 -* Context-aware article verdict is being tested for accuracy
464 -* Human review recommended for high-risk claims (Tier A)
465 -* Cost: ${estimated_cost_usd} | Tokens: {input_tokens + output_tokens}
431 +**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.
466 466
467 -**Methodology:** FactHarbor uses Claude 3.5 Sonnet to extract claims, generate scenarios, gather evidence (with mandatory contradiction search), and assess logical coherence between claims and article thesis.
433 +**Data Structure:**
468 468
469 ----
435 +{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
436 +EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
437 +}}}
470 470
471 -*Generated by FactHarbor POC1-v0.3 | [About FactHarbor](https://factharbor.org)*
472 -{{/code}}
439 +----
473 473
474 -**Target Report Size:** 220-350 words (optimized for 2-minute read)
441 +=== 5.1.1 Canonical Claim Normalization (v1) ===
475 475
476 ----
443 +The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.
477 477
478 -== 7. LLM Configuration (POC1) ==
445 +**Algorithm: Canonical Claim Normalization v1**
479 479
480 -|=Parameter|=Value|=Notes
481 -|**Provider**|Anthropic|Primary provider for POC1
482 -|**Model**|{{code}}claude-3-5-sonnet-20241022{{/code}}|Current production model
483 -|**Future Model**|{{code}}claude-sonnet-4-20250514{{/code}}|When available (architecture supports)
484 -|**Token Budget**|50K-80K per analysis|Input + output combined (varies by article length)
485 -|**Estimated Cost**|$0.10-0.30 per article|Based on Sonnet 3.5 pricing ($3/M input, $15/M output)
486 -|**Prompt Strategy**|Single-pass per stage|Not multi-turn; structured JSON output with schema validation
487 -|**Chain-of-Thought**|Yes|For verdict reasoning and holistic assessment
488 -|**Few-Shot Examples**|Yes|For claim extraction and scenario generation
447 +{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
448 +    """
449 +    Normalizes claim to canonical form for cache key generation.
450 +    Version: v1norm1 (POC1)
451 +    """
452 +    import re
453 +    import unicodedata
454 +
455 +    # Step 1: Unicode normalization (NFC)
456 +    text = unicodedata.normalize('NFC', claim_text)
457 +
458 +    # Step 2: Lowercase
459 +    text = text.lower()
460 +
461 +    # Step 3: Remove punctuation (except hyphens in words)
462 +    text = re.sub(r'[^\w\s-]', '', text)
463 +
464 +    # Step 4: Normalize whitespace (collapse multiple spaces)
465 +    text = re.sub(r'\s+', ' ', text).strip()
466 +
467 +    # Step 5: Numeric normalization
468 +    text = text.replace('%', ' percent')
469 +    # Spell out single-digit numbers
470 +    num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
471 +                   '4':'four', '5':'five', '6':'six', '7':'seven',
472 +                   '8':'eight', '9':'nine'}
473 +    for num, word in num_to_word.items():
474 +        text = re.sub(rf'\b{num}\b', word, text)
475 +
476 +    # Step 6: Common abbreviations (English only in v1)
477 +    if language == 'en':
478 +        text = text.replace('covid-19', 'covid')
479 +        text = text.replace('u.s.', 'us')
480 +        text = text.replace('u.k.', 'uk')
481 +
482 +    # Step 7: NO entity normalization in v1
483 +    # (Trump vs Donald Trump vs President Trump remain distinct)
484 +
485 +    return text
489 489
490 -=== 7.1 Token Budgets by Stage ===
487 +# Version identifier (include in cache namespace)
488 +CANONICALIZER_VERSION = "v1norm1"
489 +}}}
491 491
492 -|=Stage|=Approximate Output Tokens
493 -|Claim Extraction|~4,000 (10 claims × ~400 tokens)
494 -|Scenario Generation|~3,000 per claim (3 scenarios × ~1,000 tokens)
495 -|Evidence Synthesis|~2,000 per scenario
496 -|Verdict Generation|~1,000 per scenario
497 -|Holistic Assessment|~500 (context-aware summary)
491 +**Cache Key Formula (Updated):**
498 498
499 -**Total:** 50K-80K tokens per article (input + output)
493 +{{{language = "en"
494 +canonical = normalize_claim_v1(claim_text, language)
495 +cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
500 500
501 -=== 7.2 API Integration ===
497 +Example:
498 +  claim: "COVID-19 vaccines are 95% effective"
499 +  canonical: "covid vaccines are 95 percent effective"
500 +  sha256: abc123...def456
501 +  key: "claim:v1norm1:en:abc123...def456"
502 +}}}
502 502
503 -**Anthropic Messages API:**
504 -* Endpoint: {{code}}https://api.anthropic.com/v1/messages{{/code}}
505 -* Authentication: API key via {{code}}x-api-key{{/code}} header
506 -* Model parameter: {{code}}"model": "claude-3-5-sonnet-20241022"{{/code}}
507 -* Max tokens: {{code}}"max_tokens": 4096{{/code}} (per stage)
504 +**Cache Metadata MUST Include:**
508 508
509 -**No LangChain/LangGraph needed** for POC1 simplicity - direct SDK calls suffice.
506 +{{{{
507 +  "canonical_claim": "covid vaccines are 95 percent effective",
508 +  "canonicalizer_version": "v1norm1",
509 +  "language": "en",
510 +  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
511 +}
512 +}}}
510 510
511 ----
514 +**Version Upgrade Path:**
512 512
513 -== 8. Cross-References (xWiki) ==
516 +* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
517 +* v1normN → v2norm1: Major version bump, invalidate all v1 caches
514 514
515 -This API specification implements requirements from:
519 +----
516 516
517 -* **[[POC Requirements>>Test.FactHarbor.Specification.POC.Requirements]]**
518 -** FR-POC-1 through FR-POC-6 (POC1-specific functional requirements)
519 -** NFR-POC-1 through NFR-POC-3 (quality gates lite: Gates 1 & 4 only)
520 -** Section 2.1: Analysis Summary (Context-Aware) component specification
521 -** Section 10.3: Prompt structure for claim extraction and verdict synthesis
521 +=== 5.1.2 Copyright & Data Retention Policy ===
522 522
523 -* **[[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]**
524 -** Complete investigation of 7 approaches to article-level verdicts
525 -** Approach 1 (Single-Pass Holistic Analysis) chosen for POC1
526 -** Experimental feature testing plan (30 articles, ≥70% accuracy target)
527 -** Decision framework for POC2 implementation
523 +**Evidence Excerpt Storage:**
528 528
529 -* **[[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]**
530 -** FR4 (Analysis Summary) - enhanced with context-aware capability
531 -** FR7 (Verdict Calculation) - probability ranges + confidence scores
532 -** NFR11 (Quality Gates) - POC1 implements Gates 1 & 4; Gates 2 & 3 in POC2
525 +To comply with copyright law and fair use principles:
533 533
534 -* **[[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]**
535 -** POC1 simplified architecture (stateless, single AKEL orchestration call)
536 -** Data persistence minimized (job outputs only, no database required)
537 -** Deferred complexity (no Elasticsearch, TimescaleDB, Federation until metrics justify)
527 +**What We Store:**
538 538
539 -* **[[Data Model>>Test.FactHarbor.Specification.Data Model.WebHome]]**
540 -** Evidence structure (source, stance, reliability
rating) 541 -** Scenario boundaries (time, geography, population, conditions) 542 -** Claim types and evaluability taxonomy 543 -** Source Track Record System (Section 1.3) - temporal separation 529 +* **Metadata only:** Title, author, publisher, URL, publication date 530 +* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item 531 +* **Summaries:** AI-generated bullet points (not verbatim text) 532 +* **No full articles:** Never store complete article text beyond job processing 544 544 545 -* **[[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]** 546 -** POC1 requirement mappings and phase assignments 547 -** Context-aware analysis as POC1 experimental feature 548 -** POC2 enhancement path (Gates 2 & 3, evidence deduplication) 534 +**Total per Cached Claim:** 549 549 550 ---- 536 +* Scenarios: 2 per claim 537 +* Evidence items: 6 per scenario (12 total) 538 +* Quotes: 3 per evidence × 25 words = 75 words per item 539 +* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75) 551 551 552 - == 9. 
ImplementationNotes (POC1) ==541 +**Retention:** 553 553 554 -=== 9.1 Recommended Tech Stack === 543 +* Cache TTL: 90 days 544 +* Job outputs: 24 hours (then archived or deleted) 545 +* No persistent full-text article storage 555 555 556 -* **Framework:** Next.js 14+ with App Router (TypeScript) - Full-stack in one codebase 557 -* **Rationale:** API routes + React UI unified, Vercel deployment-ready, similar to C# in structure 558 -* **Storage:** Filesystem JSON files (no database needed for POC1) 559 -* **Queue:** In-memory queue or Redis (optional for concurrency) 560 -* **URL Extraction:** Jina AI Reader API (primary), trafilatura (fallback) 561 -* **Deployment:** Vercel, AWS Lambda, or similar serverless 547 +**Rationale:** 562 562 563 -=== 9.2 POC1 Simplifications === 549 +* Short excerpts for citation = fair use 550 +* Summaries are transformative (not copyrightable) 551 +* Limited retention (90 days max) 552 +* No commercial republication of excerpts 564 564 565 -* **No database required:** Job metadata + outputs stored as JSON files ({{code}}jobs/{job_id}.json{{/code}}, {{code}}results/{job_id}.json{{/code}}) 566 -* **No user authentication:** Optional API key validation only (env var: {{code}}FACTHARBOR_API_KEY{{/code}}) 567 -* **Single-instance deployment:** No distributed processing, no worker pools 568 -* **Synchronous LLM calls:** No streaming in POC1 (entire response before returning) 569 -* **Job retention:** 24 hours default (configurable: {{code}}JOB_RETENTION_HOURS{{/code}}) 570 -* **Rate limiting:** Simple IP-based (optional) - no complex billing 554 +**DMCA Compliance:** 571 571 572 -=== 9.3 Estimated Costs (Per Analysis) === 556 +* Cache invalidation endpoint available for rights holders 557 +* Contact: dmca@factharbor.org 573 573 574 -**LLM API costs (Claude 3.5 Sonnet):** 575 -* Input: $3.00 per million tokens 576 -* Output: $15.00 per million tokens 577 -* **Per article:** $0.10-0.30 (varies by length, 5-10 claims typical) 559 +---- 578 578 
579 -**Web search costs (optional):** 580 -* Using external search API (Tavily, Brave): $0.01-0.05 per analysis 581 -* POC1 can use free search APIs initially 561 +== Summary == 582 582 583 -**Infrastructure costs:** 584 -* Vercel hobby tier: Free for POC 585 -* AWS Lambda: ~$0.001 per request 586 -* **Total infra:** <$0.01 per analysis 563 +This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification. 587 587 588 -** Totalestimated cost:**~$0.15-0.35 per analysis ✅ Meets<$0.35 target565 +**Full specification includes:** 589 589 590 -=== 9.4 Estimated Timeline (AI-Assisted) === 567 +* Complete API endpoints (7 total) 568 +* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete) 569 +* Quality gates & validation rules 570 +* LLM configuration for all 3 stages 571 +* Implementation notes with code samples 572 +* Testing strategy 573 +* Cross-references to other pages 591 591 592 -**With Cursor IDE + Claude API:** 593 -* Day 1-2: API scaffolding + job queue 594 -* Day 3-4: LLM integration + prompt engineering 595 -* Day 5-6: Evidence retrieval + contradiction search 596 -* Day 7: Report templates + testing with 30 articles 597 -* **Total:** 5-7 days for working POC1 575 +**The complete specification is available in:** 598 598 599 -**Manual coding (no AI assistance):** 600 -* Estimate: 15-20 days 601 - 602 -=== 9.5 First Prompt for AI Code Generation === 603 - 604 -{{code}} 605 -Based on the FactHarbor POC1 API & Schemas Specification (v0.3), generate a Next.js 14 TypeScript application with: 606 - 607 -1. API routes implementing the 7 endpoints specified in Section 3 608 -2. AnalyzeRequest/AnalysisResult types matching schemas in Sections 4-5 609 -3. 
Anthropic Claude 3.5 Sonnet integration for: 610 - - Claim extraction (with central/supporting marking) 611 - - Scenario generation 612 - - Evidence synthesis (with mandatory contradiction search) 613 - - Verdict generation 614 - - Holistic assessment (article-level credibility) 615 -4. Job-based async execution with progress tracking (7 pipeline stages) 616 -5. Quality Gates 1 & 4 from NFR11 implementation 617 -6. Mandatory contradiction search enforcement (Section 5) 618 -7. Context-aware analysis (experimental) as specified 619 -8. Filesystem-based job storage (no database) 620 -9. Markdown report generation from JSON templates (Section 6) 621 - 622 -Use the validation rules from Section 5 and error codes from Section 2.1.1. 623 -Target: <$0.35 per analysis, <2 minutes processing time. 624 -{{/code}} 625 - 626 ---- 627 - 628 -== 10. Testing Strategy (POC1) == 629 - 630 -=== 10.1 Test Dataset (30 Articles) === 631 - 632 -**Category 1: Straightforward Factual (10 articles)** 633 -* Purpose: Baseline accuracy 634 -* Example: "WHO report on global vaccination rates" 635 -* Expected: High claim accuracy, straightforward verdict 636 - 637 -**Category 2: Accurate Claims, Questionable Conclusions (10 articles)** ⭐ **Context-Aware Test** 638 -* Purpose: Test holistic assessment capability 639 -* Example: "Coffee cures cancer" (true premises, false conclusion) 640 -* Expected: Individual claims TRUE, article verdict MISLEADING 641 - 642 -**Category 3: Mixed Accuracy (5 articles)** 643 -* Purpose: Test nuance handling 644 -* Example: Articles with some true, some false claims 645 -* Expected: Scenario-level differentiation 646 - 647 -**Category 4: Low-Quality Claims (5 articles)** 648 -* Purpose: Test quality gates 649 -* Example: Opinion pieces, compound claims 650 -* Expected: Gate 1 failures, rejection or draft-only mode 651 - 652 -=== 10.2 Success Metrics === 653 - 654 -**Quality Metrics:** 655 -* Hallucination rate: <5% (target: <3%) 656 -* Context-aware accuracy: 
≥70% (experimental - key POC1 goal) 657 -* False positive rate: <15% 658 -* Mandatory contradiction search: 100% compliance 659 - 660 -**Performance Metrics:** 661 -* Processing time: <2 minutes per article (standard depth) 662 -* Cost per analysis: <$0.35 663 -* API uptime: >99% 664 -* LLM API error rate: <1% 665 - 666 -**See:** [[POC1 Roadmap>>Test.FactHarbor.Roadmap.POC1.WebHome]] Section 11 for complete success criteria and testing methodology. 667 - 668 ---- 669 - 670 -**End of Specification - FactHarbor POC1 API v0.3** 671 - 672 -**Ready for xWiki import and AI-assisted implementation!** 🚀 673 - 577 +* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone) 578 +* Export files (TEST/PRODUCTION) for xWiki import
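
The Cache Key Formula added in this revision writes `sha256(canonical)` as shorthand. The following is a minimal sketch of how the key and the required cache metadata might be built in Python. The helper names `build_cache_key` and `build_cache_metadata`, the UTF-8 encoding, and the hex digest format are assumptions made for illustration; the specification only fixes the `claim:{version}:{language}:{hash}` layout and the four metadata fields.

```python
import hashlib

# Value taken from the specification; bump it whenever normalization rules change.
CANONICALIZER_VERSION = "v1norm1"


def build_cache_key(canonical: str, language: str) -> str:
    """Build 'claim:<version>:<language>:<sha256>' from an already-normalized claim.

    Assumption: the hash is the hex SHA-256 digest of the UTF-8 encoded text.
    """
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"


def build_cache_metadata(canonical: str, language: str, original: str) -> dict:
    """Assemble the metadata fields the specification lists as mandatory."""
    return {
        "canonical_claim": canonical,
        "canonicalizer_version": CANONICALIZER_VERSION,
        "language": language,
        "original_claim_samples": [original],
    }


canonical = "covid vaccines are 95 percent effective"
key = build_cache_key(canonical, "en")
meta = build_cache_metadata(canonical, "en", "COVID-19 vaccines are 95% effective")
```

Because the version and the language are both part of the key, the same canonical text yields different keys across languages and across canonicalizer versions, which is what lets old entries simply expire by TTL after a minor version change.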
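
The excerpt limits in the Copyright & Data Retention Policy (25 words per quote, 3 quotes per evidence item, 6 evidence items per scenario, 2 scenarios per claim) can be checked with a few lines of arithmetic. The sketch below is illustrative only: `clamp_quotes` and `max_verbatim_words_per_claim` are hypothetical helpers, and truncating over-long quotes (rather than rejecting them) is an assumption, since the policy does not say how the limit is enforced.

```python
# Limits quoted from the policy section.
MAX_WORDS_PER_QUOTE = 25
MAX_QUOTES_PER_EVIDENCE = 3
EVIDENCE_PER_SCENARIO = 6
SCENARIOS_PER_CLAIM = 2


def clamp_quotes(quotes: list) -> list:
    """Keep at most 3 quotes and cut each one to at most 25 words.

    Assumption: truncation is acceptable; a real pipeline might reject instead.
    """
    kept = quotes[:MAX_QUOTES_PER_EVIDENCE]
    return [" ".join(q.split()[:MAX_WORDS_PER_QUOTE]) for q in kept]


def max_verbatim_words_per_claim() -> int:
    """Upper bound on stored verbatim words for one cached claim."""
    evidence_items = SCENARIOS_PER_CLAIM * EVIDENCE_PER_SCENARIO  # 12 items
    words_per_item = MAX_QUOTES_PER_EVIDENCE * MAX_WORDS_PER_QUOTE  # 75 words
    return evidence_items * words_per_item


long_quote = " ".join(["word"] * 40)
clamped = clamp_quotes([long_quote, "short quote", "another one", "dropped"])
```

The bound works out to 12 × 75 = 900 words, matching the figure stated in the policy.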