Wiki source code of POC1 API & Schemas Specification
Version 2.1 by Robert Schaub on 2025/12/24 21:08
= POC1 API & Schemas Specification =
Version: 0.4.1 (POC1)
Scope: POC1 (Test) — URL/Text → Stage 1 Claim Extraction → Stage 2 Claim Analysis (cached) → Stage 3 Article Assessment → result.json + report.md

----

== Version History ==

|=Version|=Date|=Changes
|0.4.1|2025-12-24|Codegen-ready canonical contract: locked enums + mapping, deterministic normalization (v1norm1), idempotency, cache_preference semantics, minimal OpenAPI 3.1, URL extraction + SSRF rules, evidence excerpt policy
|0.4|2025-12-24|3-stage pipeline with claim-level caching; cache-only mode for free users; Redis cache architecture
|0.3.1|2025-12-24|Reduced schema naming drift; clarified SSE (progress only); added cost knobs and constraints
|0.3|2025-12-24|Initial endpoint set + draft schemas

----

== POC1 Codegen Contract (Canonical) ==

{{info}}
This section is the **authoritative, code-generation-ready contract** for POC1.
If any other page conflicts with this section, **this section wins**.

**Format note:** This page is authored in **xWiki 2.1** syntax. If exchanged as a transport ``.md`` file, import it as **xWiki content**, not Markdown.
{{/info}}

=== 1) Goals ===
POC1 provides:
* A REST API to submit **either a URL or raw text**
* Asynchronous execution (job-based) with progress/status and optional SSE events
* Deterministic outputs:
** ``result.json`` (machine-readable; schema-validated)
** ``report.md`` (human-readable; rendered from JSON via template)
* Mandatory counter-evidence attempt per scenario (or explicit “not found despite search” uncertainty note)
* Claim-level caching to validate cost sustainability

=== 2) Non-goals (POC1) ===
* Full editorial workflow / human review UI
* Full provenance ledger beyond minimal job metadata
* Complex billing/rate-limit systems (simple caps are fine)
* Multi-language i18n beyond best-effort language detection

=== 3) Pipeline model (locked) ===
* **Stage 1 — Claim Extraction**
** Input: text extracted from the submitted URL, or pasted text
** Output: claim candidates, each with its canonical claim text and ``claim_hash``
* **Stage 2 — Claim Analysis (cached by claim_hash)**
** Input: canonical claim(s)
** Output: scenarios, evidence/counter-evidence, scenario verdicts, plus a claim-level verdict summary
* **Stage 3 — Article Assessment**
** Input: article text + Stage 2 analyses (from cache and/or freshly computed)
** Output: article-level assessment (main thesis, reasoning quality, key risks, context)

=== 4) Locked verdict taxonomies (two enums + mapping) ===

**Scenario verdict enum** (``ScenarioVerdict.verdict_label``):
* ``Highly likely`` | ``Likely`` | ``Unclear`` | ``Unlikely`` | ``Highly unlikely`` | ``Unsubstantiated``

**Claim verdict enum** (``ClaimVerdict.verdict_label``):
* ``Supported`` | ``Refuted`` | ``Inconclusive``

**Mapping rule (for summaries):**
* If the claim’s primary-interpretation scenario verdict is:
** ``Highly likely`` / ``Likely`` ⇒ ``Supported``
** ``Highly unlikely`` / ``Unlikely`` ⇒ ``Refuted``
** ``Unclear`` / ``Unsubstantiated`` ⇒ ``Inconclusive``
* If scenarios materially disagree (different assumptions lead to different outcomes) ⇒ ``Inconclusive``, and explain why.
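
Below is a minimal Python sketch of the mapping rule. It makes two assumptions:
* the first label passed in belongs to the primary-interpretation scenario;
* "materially disagree" means the scenarios map to more than one distinct claim label.

Both readings, and the name ``claim_verdict_from_scenarios``, are illustrative and not part of the locked contract.

```python
from typing import Sequence

# Locked mapping from ScenarioVerdict labels to ClaimVerdict labels.
SCENARIO_TO_CLAIM = {
    "Highly likely": "Supported",
    "Likely": "Supported",
    "Highly unlikely": "Refuted",
    "Unlikely": "Refuted",
    "Unclear": "Inconclusive",
    "Unsubstantiated": "Inconclusive",
}


def claim_verdict_from_scenarios(scenario_labels: Sequence[str]) -> str:
    """Summarize scenario verdicts into one ClaimVerdict label.

    Assumptions (illustrative): scenario_labels[0] is the primary-interpretation
    scenario, and "material disagreement" means the scenarios map to more than
    one distinct claim label.
    """
    if not scenario_labels:
        raise ValueError("at least one scenario verdict is required")
    unknown = [s for s in scenario_labels if s not in SCENARIO_TO_CLAIM]
    if unknown:
        raise ValueError(f"unknown scenario verdict label(s): {unknown}")

    mapped = {SCENARIO_TO_CLAIM[s] for s in scenario_labels}
    if len(mapped) > 1:
        # Scenarios disagree: summarize as Inconclusive (the report must say why).
        return "Inconclusive"
    return SCENARIO_TO_CLAIM[scenario_labels[0]]
```

Under this reading, ``["Likely", "Highly likely"]`` summarizes to ``Supported``, while ``["Likely", "Unlikely"]`` becomes ``Inconclusive``.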

=== 5) Deterministic canonical claim normalization (locked) ===
Normalization version: ``v1norm1``
Cache key namespace: ``claim:v1norm1:{language}:{sha256(canonical_claim_text)}``

Rules (MUST be implemented exactly; any change requires bumping normalization version):
1. Unicode normalize: NFD
2. Lowercase
3. Strip diacritics
4. Normalize apostrophes: ``’`` and ``‘`` → ``'``
5. Normalize percent: ``%`` → `` percent``
6. Normalize whitespace (collapse)
7. Remove punctuation except apostrophes
8. Expand contractions (fixed list for v1norm1)
9. Normalize whitespace again

Contractions list (v1norm1):
* don't→do not, doesn't→does not, didn't→did not, can't→cannot, won't→will not,
* shouldn't→should not, wouldn't→would not, isn't→is not, aren't→are not,
* wasn't→was not, weren't→were not

**Reference implementation (normative):**
{{code language="python"}}
import re
import unicodedata

# Canonical claim normalization for deduplication.
# Version: v1norm1
#
# IMPORTANT:
# - Any change to these rules REQUIRES a new normalization version.
# - Cache keys MUST include the normalization version to avoid collisions.

CONTRACTIONS_V1NORM1 = {
    "don't": "do not",
    "doesn't": "does not",
    "didn't": "did not",
    "can't": "cannot",
    "won't": "will not",
    "shouldn't": "should not",
    "wouldn't": "would not",
    "isn't": "is not",
    "aren't": "are not",
    "wasn't": "was not",
    "weren't": "were not",
}


def normalize_claim(text: str) -> str:
    """Canonical claim normalization (v1norm1)."""
    if text is None:
        return ""

    # 1) Unicode normalization
    text = unicodedata.normalize("NFD", text)

    # 2) Lowercase
    text = text.lower()

    # 3) Remove diacritics
    text = "".join(c for c in text if unicodedata.category(c) != "Mn")

    # 4) Normalize apostrophes (common unicode variants)
    text = text.replace("’", "'").replace("‘", "'")

    # 5) Normalize percent sign
    text = text.replace("%", " percent")

    # 6) Normalize whitespace
    text = re.sub(r"\s+", " ", text).strip()

    # 7) Remove punctuation except apostrophes
    text = re.sub(r"[^\w\s']", "", text)

    # 8) Expand contractions (fixed list for v1norm1)
    for k, v in CONTRACTIONS_V1NORM1.items():
        text = re.sub(rf"\b{re.escape(k)}\b", v, text)

    # 9) Final whitespace normalization
    text = re.sub(r"\s+", " ", text).strip()

    return text
{{/code}}
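
Below is a sketch of cache-key construction under the namespace pattern above. It makes two assumptions:
* the text has already been passed through ``normalize_claim``;
* ``sha256(...)`` means the lowercase hex digest of the UTF-8 bytes.

The function name ``claim_cache_key`` is hypothetical. The sketch covers only the cache key, not the exact ``claim_hash`` input layout.

```python
import hashlib

NORMALIZATION_VERSION = "v1norm1"


def claim_cache_key(language: str, canonical_claim_text: str) -> str:
    """Return claim:v1norm1:{language}:{sha256(canonical_claim_text)}.

    Assumes canonical_claim_text is already v1norm1-normalized and that the
    digest is lowercase hex over its UTF-8 bytes.
    """
    digest = hashlib.sha256(canonical_claim_text.encode("utf-8")).hexdigest()
    return f"claim:{NORMALIZATION_VERSION}:{language}:{digest}"
```

The language segment keeps identical wording in different languages in separate namespaces. Bumping the normalization version changes every key, so entries produced under older rules are never read by mistake.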

=== 6) Report generation (locked) ===
* ``report.md`` MUST be generated from ``result.json`` using a deterministic template (server-side).
* The LLM MUST NOT be the sole renderer of report markdown.
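
To illustrate what a deterministic template means in practice, here is a minimal sketch. It renders a small subset of ``result.json`` into markdown with plain string formatting:
* job id;
* main thesis;
* claim verdicts and their rationale bullets.

The function name, heading layout and chosen fields are hypothetical. The point is that the same JSON always yields byte-identical output, with no LLM call involved.

```python
from typing import Any, Dict, List


def render_report(result: Dict[str, Any]) -> str:
    """Render a minimal report.md from a result.json dict (illustrative subset)."""
    lines: List[str] = [f"# Analysis report for job {result['job_id']}", ""]

    thesis = result.get("article_assessment", {}).get("main_thesis")
    if thesis:
        lines += [f"**Main thesis:** {thesis}", ""]

    lines.append("## Claims")
    # Iterate in result.json order so identical input gives identical output.
    for analysis in result.get("claim_analyses", []):
        verdict = analysis["claim_verdict"]
        lines += [
            "",
            f"### Claim {analysis['claim_hash'][:12]}",
            f"Verdict: **{verdict['verdict_label']}** (confidence {verdict['confidence']:.2f})",
        ]
        lines += [f"- {bullet}" for bullet in verdict.get("rationale_bullets", [])]

    return "\n".join(lines) + "\n"
```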

=== 7) No chain-of-thought storage/exposure (locked) ===
* Store/expose only short, user-visible rationale bullets.
* Never store or expose internal reasoning traces.

----

== REST API (POC1) ==

=== 1) API Versioning ===
Base path: ``/v1``

=== 2) Content types ===
* Requests: ``application/json``
* JSON responses: ``application/json``
* Report download: ``text/markdown; charset=utf-8``
* SSE events: ``text/event-stream``

=== 3) Time & IDs ===
* All timestamps: ISO 8601 UTC (e.g., ``2025-12-24T19:31:30Z``)
* ``job_id``: ULID (string)
* ``claim_hash``: sha256 hex string (lowercase), computed over the canonical claim text, normalization version and language, as specified under "Deterministic canonical claim normalization (locked)" above
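
These formats can be checked with small validators. The sketch assumes:
* the standard ULID text encoding: 26 Crockford base32 characters, first character 0 to 7;
* second-precision timestamps with a ``Z`` suffix, as in the example above.

The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# ULID text form: 26 Crockford base32 characters (no I, L, O, U). 26 characters
# carry 130 bits but a ULID has 128, so the first character is at most 7.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")
# claim_hash: lowercase hex SHA-256 digest (64 characters).
CLAIM_HASH_RE = re.compile(r"[0-9a-f]{64}")


def is_ulid(value: str) -> bool:
    return ULID_RE.fullmatch(value) is not None


def is_claim_hash(value: str) -> bool:
    return CLAIM_HASH_RE.fullmatch(value) is not None


def utc_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2025-12-24T19:31:30Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```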

=== 4) Authentication ===
All ``/v1`` endpoints require:
* Header: ``Authorization: Bearer <API_KEY>``
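
For illustration, a Python client might prepare an authenticated ``POST /v1/analyze`` call as shown below. The sketch uses only the standard library's ``urllib.request.Request`` and sends nothing. The base URL, API key and ``build_analyze_request`` are placeholders. The optional ``Idempotency-Key`` header relates to the idempotency rules later on this page.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_analyze_request(
    base_url: str,
    api_key: str,
    body: Dict[str, Any],
    idempotency_key: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST /v1/analyze request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to ``urllib.request.urlopen`` would send it. ``urllib`` stores header names via ``str.capitalize()``, so on the request object the idempotency header is looked up as ``Idempotency-key``. HTTP header names are case-insensitive on the wire.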

=== 5) Error Envelope (all non-2xx) ===
{{code language="json"}}
{
  "error": {
    "code": "CACHE_MISS | VALIDATION_ERROR | UNAUTHORIZED | FORBIDDEN | NOT_FOUND | RATE_LIMITED | UPSTREAM_FETCH_ERROR | INTERNAL_ERROR",
    "message": "Human readable message",
    "details": {
      "field_errors": [
        {"field": "input_url", "issue": "Must be a valid https URL"}
      ]
    }
  }
}
{{/code}}

----

== Endpoints ==

=== 1) POST /v1/analyze ===
Creates an asynchronous job. Exactly one of ``input_url`` or ``input_text`` MUST be provided.

**Request (AnalyzeRequest)**
{{code language="json"}}
{
  "input_url": "https://example.org/article",
  "input_text": null,
  "options": {
    "max_claims": 5,
    "cache_preference": "prefer_cache",
    "browsing": "on",
    "output_report": true
  },
  "client": {
    "request_id": "optional-idempotency-key"
  }
}
{{/code}}

**Options**
* ``max_claims``: integer, 1..50 (default 5)
* ``cache_preference``: ``prefer_cache`` | ``allow_partial`` | ``cache_only`` | ``skip_cache`` (default ``prefer_cache``)
** ``prefer_cache``: use the cache when available; otherwise run the full pipeline
** ``allow_partial``: if cached Stage 2 analyses exist, the server MAY skip Stages 1 and 2 and rerun only Stage 3 using those cached analyses
** ``cache_only``: do not run Stage 2 for uncached claims; fail with ``CACHE_MISS`` (402) if required claim analyses are missing
** ``skip_cache``: ignore the cache and recompute
* ``browsing``: ``on`` | ``off`` (default ``on``)
** ``off``: do not retrieve evidence; mark evidence items as ``NEEDS_RETRIEVAL`` and output retrieval queries
* ``output_report``: boolean (default true)
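
The request rules above (exactly one input, option ranges and enums) can be checked before a job is created. This minimal sketch reports failures through the error envelope with ``VALIDATION_ERROR``. It makes three assumptions:
* each problem gets its own ``field_errors`` entry;
* dotted field paths such as ``options.max_claims`` are used;
* the "valid https URL" check means an ``https`` scheme plus a host.

``validate_analyze_request`` is a hypothetical helper. It returns the envelope, or ``None`` when the request is valid.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

CACHE_PREFERENCES = {"prefer_cache", "allow_partial", "cache_only", "skip_cache"}
BROWSING_MODES = {"on", "off"}


def validate_analyze_request(req: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return an error envelope if the AnalyzeRequest is invalid, else None."""
    errors: List[Dict[str, str]] = []
    url, text = req.get("input_url"), req.get("input_text")

    # Exactly one of input_url / input_text must be provided.
    if (url is None) == (text is None):
        errors.append({"field": "input_url", "issue": "Provide exactly one of input_url or input_text"})
    elif url is not None:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            errors.append({"field": "input_url", "issue": "Must be a valid https URL"})

    opts = req.get("options") or {}
    max_claims = opts.get("max_claims", 5)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_claims, bool) or not isinstance(max_claims, int) or not 1 <= max_claims <= 50:
        errors.append({"field": "options.max_claims", "issue": "Must be an integer in 1..50"})
    if opts.get("cache_preference", "prefer_cache") not in CACHE_PREFERENCES:
        errors.append({"field": "options.cache_preference", "issue": "Unknown value"})
    if opts.get("browsing", "on") not in BROWSING_MODES:
        errors.append({"field": "options.browsing", "issue": "Must be 'on' or 'off'"})
    if not isinstance(opts.get("output_report", True), bool):
        errors.append({"field": "options.output_report", "issue": "Must be a boolean"})

    if not errors:
        return None
    return {
        "error": {
            "code": "VALIDATION_ERROR",
            "message": "Request validation failed.",
            "details": {"field_errors": errors},
        }
    }
```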

**Response: 202 Accepted (JobCreated)**
{{code language="json"}}
{
  "job_id": "01J8Y9K6M2Q1J0JZ7E5P8H7Y9C",
  "status": "QUEUED",
  "created_at": "2025-12-24T19:30:00Z",
  "estimated_cost": {
    "credits": 420,
    "explain": "Estimate may change after Stage 1 (claim count)."
  },
  "links": {
    "self": "/v1/jobs/01J8Y9K6M2Q1J0JZ7E5P8H7Y9C",
    "events": "/v1/jobs/01J8Y9K6M2Q1J0JZ7E5P8H7Y9C/events",
    "result": "/v1/jobs/01J8Y9K6M2Q1J0JZ7E5P8H7Y9C/result",
    "report": "/v1/jobs/01J8Y9K6M2Q1J0JZ7E5P8H7Y9C/report"
  }
}
{{/code}}

**Cache-only failure: 402 (CACHE_MISS)**
{{code language="json"}}
{
  "error": {
    "code": "CACHE_MISS",
    "message": "cache_only requested but a required claim analysis is not cached.",
    "details": {
      "missing_claim_hash": "9f2a...c01",
      "normalization_version": "v1norm1"
    }
  }
}
{{/code}}
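
The interaction between ``cache_preference`` and the Stage 2 cache can be sketched as a planning step over the Stage 1 claim hashes. The sketch assumes:
* a plain mapping keyed by ``claim_hash`` stands in for the Redis cache;
* ``allow_partial`` behaves like ``prefer_cache`` at this level, since its Stage 3-only shortcut is out of scope.

``plan_stage2`` and ``CacheMiss`` are hypothetical names. The raised exception carries the ``CACHE_MISS`` envelope shown above.

```python
from typing import Dict, List, Mapping, Tuple

PREFERENCES = {"prefer_cache", "allow_partial", "cache_only", "skip_cache"}


class CacheMiss(Exception):
    """cache_only was requested but a claim analysis is missing (HTTP 402)."""

    def __init__(self, claim_hash: str, normalization_version: str = "v1norm1") -> None:
        super().__init__(f"claim analysis not cached: {claim_hash}")
        self.claim_hash = claim_hash
        self.normalization_version = normalization_version

    def envelope(self) -> Dict[str, object]:
        return {
            "error": {
                "code": "CACHE_MISS",
                "message": "cache_only requested but a required claim analysis is not cached.",
                "details": {
                    "missing_claim_hash": self.claim_hash,
                    "normalization_version": self.normalization_version,
                },
            }
        }


def plan_stage2(
    claim_hashes: List[str], cache: Mapping[str, object], preference: str
) -> Tuple[Dict[str, object], List[str]]:
    """Split Stage 1 claims into (cached analyses to reuse, claim hashes to analyze)."""
    if preference not in PREFERENCES:
        raise ValueError(f"unknown cache_preference: {preference}")
    if preference == "skip_cache":
        return {}, list(claim_hashes)
    reused = {h: cache[h] for h in claim_hashes if h in cache}
    missing = [h for h in claim_hashes if h not in cache]
    if preference == "cache_only" and missing:
        # Stage 2 must not run; report the first missing claim.
        raise CacheMiss(missing[0])
    return reused, missing
```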

=== 2) GET /v1/jobs/{job_id} ===
Returns job status and progress.

**Response: 200 OK (Job)**
{{code language="json"}}
{
  "job_id": "01J...",
  "status": "RUNNING",
  "created_at": "2025-12-24T19:30:00Z",
  "updated_at": "2025-12-24T19:31:12Z",
  "progress": {
    "stage": "STAGE2_CLAIM_ANALYSIS",
    "stage_progress": 0.4,
    "message": "Analyzing claim 3/8"
  },
  "links": {
    "events": "/v1/jobs/01J.../events",
    "result": "/v1/jobs/01J.../result",
    "report": "/v1/jobs/01J.../report"
  }
}
{{/code}}

Statuses:
* ``QUEUED`` → ``RUNNING`` → ``SUCCEEDED`` | ``FAILED`` | ``CANCELED``

Stages:
* ``STAGE1_CLAIM_EXTRACT``
* ``STAGE2_CLAIM_ANALYSIS``
* ``STAGE3_ARTICLE_ASSESSMENT``
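
The status line reads as a small state machine, encoded in the sketch below. The sketch assumes, though the line above does not state it, that a ``QUEUED`` job can also move straight to ``CANCELED`` through a best-effort ``DELETE``. ``TRANSITIONS`` and ``advance`` are hypothetical names.

```python
from typing import Dict, FrozenSet

# Allowed job status transitions. QUEUED -> CANCELED is an assumption
# (best-effort DELETE before work starts); the rest follow the status line.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "QUEUED": frozenset({"RUNNING", "CANCELED"}),
    "RUNNING": frozenset({"SUCCEEDED", "FAILED", "CANCELED"}),
    "SUCCEEDED": frozenset(),
    "FAILED": frozenset(),
    "CANCELED": frozenset(),
}
# Terminal statuses are those with no outgoing transitions.
TERMINAL = frozenset(status for status, nxt in TRANSITIONS.items() if not nxt)


def advance(current: str, new: str) -> str:
    """Return `new` if current -> new is allowed; otherwise raise ValueError."""
    if new not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal status transition: {current} -> {new}")
    return new
```

A worker could call ``advance`` before persisting each status change, so that ``GET /v1/jobs/{job_id}`` never reports an out-of-order status.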

=== 3) GET /v1/jobs/{job_id}/events ===
Server-Sent Events (SSE) for job progress reporting (no token streaming).

Event types (minimum):
* ``job.created``
* ``stage.started``
* ``stage.progress``
* ``stage.completed``
* ``job.succeeded``
* ``job.failed``
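
A client reading this stream has to split ``text/event-stream`` input into events. The sketch follows the standard SSE line format:
* ``event:`` and ``data:`` fields carry the event;
* a blank line dispatches it;
* lines starting with ``:`` are comments.

It also assumes each ``data`` payload is JSON, which this page does not fix. ``parse_sse`` is a hypothetical name.

```python
import json
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_type, payload) pairs from text/event-stream lines.

    Assumes every data payload is JSON; the contract does not fix this.
    """
    event_type: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data:
                yield event_type or "message", json.loads("\n".join(data))
            event_type, data = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # one optional space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
```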

=== 4) GET /v1/jobs/{job_id}/result ===
Returns JSON result.
* ``200`` if job ``SUCCEEDED``
* ``409`` if not finished

=== 5) GET /v1/jobs/{job_id}/report ===
Returns ``text/markdown`` report.
* ``200`` if job ``SUCCEEDED``
* ``409`` if not finished
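
Clients that skip SSE can poll instead and treat ``409`` as "not finished yet". Here is a minimal sketch with a fixed polling interval. The injected ``fetch`` callable is hypothetical: it stands in for an HTTP GET that returns the status code and parsed body.

```python
import time
from typing import Any, Callable, Tuple

# Hypothetical transport: takes a path, returns (HTTP status, parsed JSON body).
Fetch = Callable[[str], Tuple[int, Any]]


def wait_for_result(
    fetch: Fetch, job_id: str, interval_s: float = 2.0, max_attempts: int = 150
) -> Any:
    """Poll GET /v1/jobs/{job_id}/result until it returns 200."""
    path = f"/v1/jobs/{job_id}/result"
    for _ in range(max_attempts):
        status, body = fetch(path)
        if status == 200:
            return body
        if status != 409:
            # Other non-2xx statuses carry the error envelope; surface it.
            raise RuntimeError(f"GET {path} returned {status}: {body}")
        time.sleep(interval_s)  # 409: job not finished yet
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```

The contract does not say what ``/result`` returns for a ``FAILED`` or ``CANCELED`` job. A real client would therefore likely also check ``GET /v1/jobs/{job_id}`` so it can stop polling early.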

=== 6) DELETE /v1/jobs/{job_id} ===
Cancels a job (best-effort) and deletes stored artifacts if supported.
* ``204`` success
* ``404`` if not found

=== 7) GET /v1/health ===
**Response: 200 OK**
{{code language="json"}}
{
  "status": "ok",
  "service": "factharbor-poc1",
  "version": "v0.9.105",
  "time": "2025-12-24T19:31:30Z"
}
{{/code}}

----

== Idempotency (Required) ==

Clients SHOULD send one of:
* Header: ``Idempotency-Key: <string>`` (preferred)
* Body: ``client.request_id``

Server behavior:
* Same key + same request body → return existing job (``200``) with:
** ``idempotent=true``
** ``original_request_at``
* Same key + different request body → ``409`` with ``VALIDATION_ERROR`` and mismatch details

Idempotency TTL: 24 hours (matches minimum job retention)
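
Server-side, these rules amount to a keyed lookup plus a fingerprint of the request body. The sketch assumes:
* an in-memory dict stands in for the store (Redis with a 24-hour expiry would be the likelier production choice);
* bodies are fingerprinted as SHA-256 over canonical JSON with sorted keys;
* ``original_request_at`` is taken from the job's ``created_at``.

The class, method names and mismatch ``details`` are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

IDEMPOTENCY_TTL_S = 24 * 60 * 60  # 24 hours


def fingerprint(body: Dict[str, Any]) -> str:
    """Hash a request body so key order and whitespace do not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory stand-in for an idempotency table (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (body fingerprint, job payload, time first seen)
        self._entries: Dict[str, Tuple[str, Dict[str, Any], float]] = {}

    def lookup(
        self, key: str, body: Dict[str, Any]
    ) -> Tuple[Optional[int], Optional[Dict[str, Any]]]:
        """Return (200, job) for a replay, (409, error) for a mismatch, (None, None) if new."""
        entry = self._entries.get(key)
        if entry is None:
            return None, None
        fp, job, seen_at = entry
        if self._clock() - seen_at > IDEMPOTENCY_TTL_S:
            del self._entries[key]  # expired: treat as a new request
            return None, None
        if fp == fingerprint(body):
            # Assumption: original_request_at is the job's created_at.
            return 200, {**job, "idempotent": True, "original_request_at": job["created_at"]}
        return 409, {
            "error": {
                "code": "VALIDATION_ERROR",
                "message": "Idempotency key reused with a different request body.",
                "details": {"idempotency_key": key},
            }
        }

    def remember(self, key: str, body: Dict[str, Any], job: Dict[str, Any]) -> None:
        self._entries[key] = (fingerprint(body), job, self._clock())
```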

----

== Schemas (result.json) ==

=== Top-level AnalysisResult ===
{{code language="json"}}
{
  "job_id": "01J...",
  "input": {
    "source_type": "url|text",
    "source": "https://example.org/article",
    "language": "en",
    "retrieved_at_utc": "2025-12-24T19:30:12Z",
    "extraction": {
      "method": "jina|trafilatura|manual",
      "word_count": 1234
    }
  },
  "claim_extraction": {
    "normalization_version": "v1norm1",
    "claims": [
      {
        "claim_hash": "sha256hex...",
        "claim_text": "as found in source",
        "canonical_claim_text": "normalized claim",
        "confidence": 0.72
      }
    ]
  },
  "claim_analyses": [
    {
      "claim_hash": "sha256hex...",
      "claim_verdict": {
        "verdict_label": "Supported|Refuted|Inconclusive",
        "confidence": 0.63,
        "rationale_bullets": [
          "Short bullet 1",
          "Short bullet 2"
        ]
      },
      "scenarios": [
        {
          "scenario_id": "01J...ULID",
          "scenario_title": "Interpretation / boundary",
          "definitions": { "term": "definition" },
          "assumptions": ["..."],
          "boundaries": { "time": "...", "geography": "...", "population": "...", "conditions": "..." },
          "retrieval_plan": {
            "queries": [
              {"q": "support query", "purpose": "support"},
              {"q": "counter query", "purpose": "counter"}
            ]
          },
          "evidence": [
            {
              "evidence_id": "01J...ULID",
              "stance": "supports|undermines|mixed|context_dependent",
              "relevance": 0.0,
              "summary_bullets": ["..."],
              "citation": {
                "title": "Source title",
                "publisher": "Publisher",
                "author_or_org": "Org/Author",
                "publication_date": "YYYY-MM-DD",
                "url": "https://...",
                "retrieved_at_utc": "2025-12-24T19:31:01Z"
              },
              "excerpt": "optional short quote (max 25 words)",
              "reliability_rating": "high|medium|low",
              "limitations": ["..."],
              "retrieval_status": "OK|NEEDS_RETRIEVAL|FAILED"
            }
          ],
          "verdict": {
            "verdict_label": "Highly likely|Likely|Unclear|Unlikely|Highly unlikely|Unsubstantiated",
            "probability_range": [0.0, 1.0],
            "confidence": 0.0,
            "rationale_bullets": ["..."],
            "key_supporting_evidence_ids": ["E..."],
            "key_counter_evidence_ids": ["E..."],
            "uncertainty_factors": ["..."],
            "what_would_change_my_mind": ["..."]
          }
        }
      ],
      "quality_gates": {
        "gate1_claim_validation": "pass|partial|fail",
        "gate2_contradiction_search": "pass|partial|fail",
        "gate3_uncertainty_disclosure": "pass|partial|fail",
        "gate4_verdict_confidence": "pass|partial|fail",
        "fail_reasons": []
      }
    }
  ],
  "article_assessment": {
    "main_thesis": "string",
    "thesis_support": "supported|challenged|mixed|unclear",
    "overall_reasoning_quality": "high|medium|low",
    "summary": "string",
    "key_risks": [
      "missing evidence",
      "cherry-picking",
      "correlation/causation",
      "time window mismatch"
    ],
    "how_claims_connect_to_thesis": [
      "Short bullets connecting claim analyses to the thesis assessment"
    ]
  },
  "global_notes": {
    "limitations": ["..."],
    "policy_notes": []
  }
}
{{/code}}

=== Mandatory counter-evidence rule (POC1) ===
For each scenario, attempt to include at least one evidence item with:
* ``stance`` ∈ {``undermines``, ``mixed``, ``context_dependent``}

If not found:
* include explicit “not found despite targeted search” note in ``uncertainty_factors``
* and/or include evidence items with ``retrieval_status=FAILED`` indicating the attempted search
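
The rule can be checked mechanically for each scenario in ``result.json``. This minimal sketch assumes:
* the explicit note is an ``uncertainty_factors`` entry containing the phrase "not found despite targeted search" (the exact wording is not fixed here);
* any ``FAILED`` evidence item documents an attempted search.

``scenario_meets_counter_rule`` is hypothetical.

```python
from typing import Any, Dict

COUNTER_STANCES = {"undermines", "mixed", "context_dependent"}
NOT_FOUND_PHRASE = "not found despite targeted search"


def scenario_meets_counter_rule(scenario: Dict[str, Any]) -> bool:
    """True if a scenario has counter-evidence or documents the failed attempt."""
    evidence = scenario.get("evidence", [])
    if any(item.get("stance") in COUNTER_STANCES for item in evidence):
        return True
    factors = scenario.get("verdict", {}).get("uncertainty_factors", [])
    noted = any(NOT_FOUND_PHRASE in factor.lower() for factor in factors)
    # Assumption: any FAILED item records an attempted (counter) search.
    attempted = any(item.get("retrieval_status") == "FAILED" for item in evidence)
    return noted or attempted
```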

----

== URL Extraction Rules (POC1) ==

* Primary extraction: Jina Reader (if enabled)
* Fallback: Trafilatura (or equivalent)

**SSRF protections (required):**
* Block loopback, private and link-local IP addresses (including cloud metadata endpoints such as ``169.254.169.254``), non-HTTP(S) URL schemes such as ``file://``, and internal hostnames
* If blocked: return ``UPSTREAM_FETCH_ERROR`` with the reason
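
Below is a sketch of the SSRF screen using Python's standard ``ipaddress`` module. The sketch:
* assumes only ``https`` URLs are fetched, matching the ``Must be a valid https URL`` example in the error envelope;
* checks every resolved address rather than the hostname alone;
* takes the resolver as a parameter so it can run offline.

``check_fetch_url`` and the blocked-name lists are hypothetical. Redirect targets and DNS rebinding would need the same check at connect time, which this sketch does not cover.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse

Resolver = Callable[[str], List[str]]

# Hypothetical deny-list of internal names; extend per deployment.
BLOCKED_HOSTNAMES = {"localhost"}
BLOCKED_SUFFIXES = (".localhost", ".local", ".internal")


def system_resolver(host: str) -> List[str]:
    """All IPv4/IPv6 addresses the host resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def _is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_fetch_url(url: str, resolve: Resolver = system_resolver) -> str:
    """Return "" if the URL may be fetched, else a short block reason."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return f"scheme not allowed: {parsed.scheme or '(none)'}"
    host = (parsed.hostname or "").lower()
    if not host:
        return "missing host"
    if host in BLOCKED_HOSTNAMES or host.endswith(BLOCKED_SUFFIXES):
        return f"internal hostname: {host}"
    try:
        addresses = [host] if _is_ip_literal(host) else resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"resolution failed: {exc}"
    if not addresses:
        return "no addresses"
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop an IPv6 zone id
        # Loopback, private and link-local (incl. 169.254.169.254) are non-global.
        if not ip.is_global or ip.is_multicast:
            return f"non-public address: {ip}"
    return ""
```

A non-empty return value is the reason to put in the ``UPSTREAM_FETCH_ERROR`` envelope.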

**Copyright/ToS safe storage policy (POC1):**
* Store only metadata + short excerpts
* Excerpts: max 25 words per quote, and cap total quotes per source
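
The 25-word limit can be enforced when evidence is stored. This sketch counts whitespace-separated words and marks truncation with ``...``. No per-source number is given above, so the quote cap is a parameter. ``max_quotes`` and both function names are hypothetical.

```python
from typing import List

MAX_EXCERPT_WORDS = 25


def cap_excerpt(text: str, max_words: int = MAX_EXCERPT_WORDS) -> str:
    """Collapse whitespace and truncate a quote to at most max_words words."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    # The marker is glued to the last kept word so the word count stays at max_words.
    return " ".join(words[:max_words]) + "..."


def cap_quotes(quotes: List[str], max_quotes: int) -> List[str]:
    """Keep at most max_quotes excerpts for one source, each within the word cap."""
    return [cap_excerpt(q) for q in quotes[:max_quotes]]
```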

----

== Cost Control Knobs (POC1 defaults) ==

Defaults:
* ``max_claims = 5``
* ``scenarios_per_claim = 2..3`` (internal Stage 2 policy)
* Cap evidence items per scenario (recommended: 6 total; at least 1 counter)
* Keep rationales concise (bullets)
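
One way to apply the evidence cap while keeping at least one counter item:
1. Rank items by ``relevance``.
2. If no counter item made the cut, swap the least relevant selected item for the best remaining counter item.

The ranking criterion and ``select_evidence`` are assumptions. Only the cap (recommended 6) and the counter requirement come from this page.

```python
from typing import Any, Dict, List

COUNTER_STANCES = {"undermines", "mixed", "context_dependent"}


def select_evidence(items: List[Dict[str, Any]], cap: int = 6) -> List[Dict[str, Any]]:
    """Keep at most `cap` items by relevance, reserving a slot for a counter item."""
    if cap < 1:
        raise ValueError("cap must be at least 1")
    ranked = sorted(items, key=lambda e: e.get("relevance", 0.0), reverse=True)
    selected = ranked[:cap]
    if any(e.get("stance") in COUNTER_STANCES for e in selected):
        return selected
    counters = [e for e in ranked[cap:] if e.get("stance") in COUNTER_STANCES]
    if not counters:
        # No counter item at all: the "not found despite targeted search" note applies.
        return selected
    # Swap the least relevant selected item for the best remaining counter item.
    return selected[: cap - 1] + [counters[0]]
```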

----

== Minimal OpenAPI 3.1 (POC1) ==

{{code language="yaml"}}
openapi: 3.1.0
info:
  title: FactHarbor POC1 API
  version: 0.9.105
security:
  - bearerAuth: []
paths:
  /v1/analyze:
    post:
      summary: Create analysis job
      parameters:
        - in: header
          name: Idempotency-Key
          required: false
          schema: { type: string }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AnalyzeRequest'
      responses:
        '200':
          description: Existing job returned (idempotent replay)
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobCreated'
        '202':
          description: Accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobCreated'
        '4XX':
          description: Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorEnvelope'
  /v1/jobs/{job_id}:
    get:
      summary: Get job status
      parameters:
        - in: path
          name: job_id
          required: true
          schema: { type: string }
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Job'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorEnvelope'
    delete:
      summary: Cancel job (best-effort) and delete artifacts
      parameters:
        - in: path
          name: job_id
          required: true
          schema: { type: string }
      responses:
        '204': { description: No Content }
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorEnvelope'
  /v1/jobs/{job_id}/events:
    get:
      summary: Job progress via SSE
      parameters:
        - in: path
          name: job_id
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Progress event stream
          content:
            text/event-stream:
              schema: { type: string }
  /v1/jobs/{job_id}/result:
    get:
      summary: Get final JSON result
      parameters:
        - in: path
          name: job_id
          required: true
          schema: { type: string }
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AnalysisResult'
        '409':
          description: Not ready
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorEnvelope'
  /v1/jobs/{job_id}/report:
    get:
      summary: Download report (markdown)
      parameters:
        - in: path
          name: job_id
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Markdown report
          content:
            text/markdown:
              schema: { type: string }
        '409':
          description: Not ready
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorEnvelope'
  /v1/health:
    get:
      summary: Health check
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  status: { type: string }
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
  schemas:
    AnalyzeRequest:
      type: object
      required: [options]
      properties:
        input_url: { type: ['string', 'null'] }
        input_text: { type: ['string', 'null'] }
        options:
          $ref: '#/components/schemas/AnalyzeOptions'
        client:
          type: object
          properties:
            request_id: { type: string }
    AnalyzeOptions:
      type: object
      properties:
        max_claims: { type: integer, minimum: 1, maximum: 50, default: 5 }
        cache_preference:
          type: string
          enum: [prefer_cache, allow_partial, cache_only, skip_cache]
          default: prefer_cache
        browsing:
          type: string
          # Quoted: YAML 1.1 parsers read bare on/off as booleans.
          enum: ['on', 'off']
          default: 'on'
        output_report: { type: boolean, default: true }
    JobCreated:
      type: object
      required: [job_id, status, created_at, links]
      properties:
        job_id: { type: string }
        status: { type: string }
        created_at: { type: string }
        estimated_cost:
          type: object
          properties:
            credits: { type: integer }
            explain: { type: string }
        links:
          type: object
          properties:
            self: { type: string }
            events: { type: string }
            result: { type: string }
            report: { type: string }
    Job:
      type: object
      required: [job_id, status, created_at, updated_at, links]
      properties:
        job_id: { type: string }
        status: { type: string, enum: [QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELED] }
        progress:
          type: object
          properties:
            stage:
              type: string
              enum: [STAGE1_CLAIM_EXTRACT, STAGE2_CLAIM_ANALYSIS, STAGE3_ARTICLE_ASSESSMENT]
            stage_progress: { type: number, minimum: 0, maximum: 1 }
            message: { type: string }
        created_at: { type: string }
        updated_at: { type: string }
        links:
          type: object
          properties:
            events: { type: string }
            result: { type: string }
            report: { type: string }
    AnalysisResult:
      type: object
      required: [job_id, claim_extraction, claim_analyses, article_assessment]
      properties:
        job_id: { type: string }
        claim_extraction:
          type: object
          properties:
            normalization_version: { type: string }
            claims:
              type: array
              items:
                type: object
                properties:
                  claim_hash: { type: string }
                  claim_text: { type: string }
                  canonical_claim_text: { type: string }
                  confidence: { type: number }
        claim_analyses:
          type: array
          items:
            type: object
            properties:
              claim_hash: { type: string }
              scenarios:
                type: array
                items:
                  type: object
                  properties:
                    scenario_id: { type: string }
                    scenario_title: { type: string }
                    verdict:
                      type: object
                      properties:
                        verdict_label:
                          type: string
                          enum: ['Highly likely', 'Likely', 'Unclear', 'Unlikely', 'Highly unlikely', 'Unsubstantiated']
                        confidence: { type: number }
                        rationale_bullets:
                          type: array
                          items: { type: string }
        article_assessment:
          type: object
          properties:
            overall_reasoning_quality: { type: string, enum: [high, medium, low] }
            summary: { type: string }
            key_risks:
              type: array
              items: { type: string }
    ErrorEnvelope:
      type: object
      properties:
        error:
          type: object
          properties:
            code: { type: string }
            message: { type: string }
            details: { type: object }
{{/code}}

----
End of page.