Version 4.1 by Robert Schaub on 2025/12/24 16:55

= POC1 API & Schemas Specification =

----

== Version History ==

|=Version|=Date|=Changes
|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details

----

== 1. Core Objective (POC1) ==

The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.

The system must prove that AI can identify an article's **Main Thesis** and determine whether the supporting claims logically support that thesis without committing fallacies.

=== Success Criteria: ===

* Test with 30 diverse articles
* Target: ≥70% accuracy detecting misleading articles
* Cost: <$0.25 per NEW analysis (uncached)
* Cost: $0.00 for cached claim reuse
* Cache hit rate: ≥50% after 1,000 articles
* Processing time: <2 minutes (standard depth)

=== Economic Model: ===

* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
* **After limit:** Cache-only mode (instant, free access to cached claims)
* **Paid tier:** Unlimited new analyses

----

== 2. Architecture Overview ==

=== 2.1 3-Stage Pipeline with Caching ===

FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:

{{mermaid}}
graph TD
  A[Article Input] --> B[Stage 1: Extract Claims]
  B --> C{For Each Claim}
  C --> D[Check Cache]
  D -->|Cache HIT| E[Return Cached Verdict]
  D -->|Cache MISS| F[Stage 2: Analyze Claim]
  F --> G[Store in Cache]
  G --> E
  E --> H[Stage 3: Holistic Assessment]
  H --> I[Final Report]
{{/mermaid}}

==== Stage 1: Claim Extraction (Haiku, no cache) ====

* **Input:** Article text
* **Output:** 5 canonical claims (normalized, deduplicated)
* **Model:** Claude Haiku 4
* **Cost:** $0.003 per article
* **Cache strategy:** No caching (article-specific)

==== Stage 2: Claim Analysis (Sonnet, CACHED) ====

* **Input:** Single canonical claim
* **Output:** Scenarios + Evidence + Verdicts
* **Model:** Claude Sonnet 3.5
* **Cost:** $0.081 per NEW claim
* **Cache strategy:** Redis, 90-day TTL
* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}

==== Stage 3: Holistic Assessment (Sonnet, no cache) ====

* **Input:** Article + Claim verdicts (from cache or Stage 2)
* **Output:** Article verdict + Fallacies + Logic quality
* **Model:** Claude Sonnet 3.5
* **Cost:** $0.030 per article
* **Cache strategy:** No caching (article-specific)

=== Total Cost Formula: ===

{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)

Examples:
- 0 new claims (100% cache hit): $0.033
- 1 new claim (80% cache hit): $0.114
- 3 new claims (40% cache hit): $0.276
- 5 new claims (0% cache hit): $0.438
}}}
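The formula above can be sketched as a small helper. This is a minimal illustration, not part of the specification: the function name estimate_article_cost and the constant names are hypothetical, and Decimal is used only to keep the dollar amounts exact.

```python
from decimal import Decimal

# Per-stage prices taken from the stage descriptions in section 2.1.
STAGE1_EXTRACTION = Decimal("0.003")
STAGE2_PER_NEW_CLAIM = Decimal("0.081")
STAGE3_HOLISTIC = Decimal("0.030")


def estimate_article_cost(new_claims: int) -> Decimal:
    """Estimated USD cost of one article with `new_claims` cache misses.

    Hypothetical helper: cached claims contribute nothing to the total.
    """
    if new_claims < 0:
        raise ValueError("new_claims must be non-negative")
    return STAGE1_EXTRACTION + new_claims * STAGE2_PER_NEW_CLAIM + STAGE3_HOLISTIC


if __name__ == "__main__":
    # Reproduces the four examples listed with the formula.
    for n in (0, 1, 3, 5):
        print(f"{n} new claims: ${estimate_article_cost(n)}")
```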

----

=== 2.2 User Tier System ===

|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
|**Free**|$10|Cache-only mode|✅ Full|Basic
|**Pro** (future)|$50|Continues|✅ Full|Advanced
|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full

**Free Tier Economics:**

* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
* Average 70 articles/month at 70% cache hit rate
* After limit: Cache-only mode

----

=== 2.3 Cache-Only Mode (Free Tier Feature) ===

When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:

==== What Cache-Only Mode Provides: ====

✅ **Claim Extraction (Platform-Funded):**

* Stage 1 extraction runs at $0.003 per article
* **Cost: Absorbed by platform** (not charged to user credit)
* Rationale: Extraction is necessary to check cache, and cost is negligible
* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)

✅ **Instant Access to Cached Claims:**

* Any claim that exists in cache → Full verdict returned
* Cost: $0 (no LLM calls)
* Response time: <100ms

✅ **Partial Article Analysis:**

* Check each claim against cache
* Return verdicts for ALL cached claims
* For uncached claims: Return "status": "cache_miss"

✅ **Cache Coverage Report:**

* "3 of 5 claims available in cache (60% coverage)"
* Links to cached analyses
* Estimated cost to complete: $0.162 (2 new claims)

❌ **Not Available in Cache-Only Mode:**

* New claim analysis (Stage 2 LLM calls blocked)
* Full holistic assessment (Stage 3 blocked if any claims missing)

==== User Experience Example: ====

{{{{
  "status": "cache_only_mode",
  "message": "Monthly credit limit reached. Showing cached results only.",
  "cache_coverage": {
    "claims_total": 5,
    "claims_cached": 3,
    "claims_missing": 2,
    "coverage_percent": 60
  },
  "cached_claims": [
    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
  ],
  "missing_claims": [
    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
  ],
  "upgrade_options": {
    "top_up": "$5 for 20-70 more articles",
    "pro_tier": "$50/month unlimited"
  }
}
}}}
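A minimal sketch of how the coverage part of such a response could be assembled, assuming the cache behaves like a mapping from cache key to a stored verdict. The function name build_coverage_report, the (claim_id, cache_key) input shape, and the estimated_cost_to_complete field are hypothetical and only illustrate the arithmetic.

```python
from decimal import Decimal

STAGE2_PER_NEW_CLAIM = Decimal("0.081")  # price per uncached claim (section 2.1)


def build_coverage_report(claims, cache):
    """Split claims into cached and missing and summarize coverage.

    claims: list of (claim_id, cache_key) pairs (assumed input shape).
    cache:  mapping of cache_key -> verdict dict (stand-in for Redis).
    """
    cached, missing = [], []
    for claim_id, key in claims:
        verdict = cache.get(key)
        if verdict is not None:
            cached.append({"claim_id": claim_id, **verdict})
        else:
            missing.append({"claim_id": claim_id, "status": "cache_miss"})

    total = len(claims)
    return {
        "status": "cache_only_mode",
        "cache_coverage": {
            "claims_total": total,
            "claims_cached": len(cached),
            "claims_missing": len(missing),
            "coverage_percent": round(100 * len(cached) / total) if total else 0,
        },
        "cached_claims": cached,
        "missing_claims": missing,
        # Hypothetical field: cost of analyzing every missing claim in Stage 2.
        "estimated_cost_to_complete": str(len(missing) * STAGE2_PER_NEW_CLAIM),
    }
```

With three of five claims present in the cache, this yields 60 percent coverage and an estimated completion cost of 0.162, matching the coverage report described above.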

**Design Rationale:**

* Free users still get value (cached claims often answer their question)
* Demonstrates FactHarbor's value (partial results encourage upgrade)
* Sustainable for platform (no additional cost)
* Fair to all users (everyone contributes to cache)

----

== 3. REST API Contract ==

=== 3.1 User Credit Tracking ===

**Endpoint:** GET /v1/user/credit

**Response:** 200 OK

{{{{
  "user_id": "user_abc123",
  "tier": "free",
  "credit_limit": 10.00,
  "credit_used": 7.42,
  "credit_remaining": 2.58,
  "reset_date": "2025-02-01T00:00:00Z",
  "cache_only_mode": false,
  "usage_stats": {
    "articles_analyzed": 67,
    "claims_from_cache": 189,
    "claims_newly_analyzed": 113,
    "cache_hit_rate": 0.626
  }
}
}}}
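The derived fields in this response follow from the raw counters. The sketch below shows one plausible derivation; the function name credit_status is hypothetical, and the rule that cache-only mode starts exactly when the remaining credit reaches zero is an assumption based on section 2.3.

```python
from decimal import Decimal


def credit_status(credit_limit: Decimal, credit_used: Decimal,
                  claims_from_cache: int, claims_newly_analyzed: int) -> dict:
    """Derive credit_remaining, cache_only_mode and cache_hit_rate.

    Assumption: cache-only mode applies once no credit remains.
    """
    remaining = max(credit_limit - credit_used, Decimal("0"))
    total_claims = claims_from_cache + claims_newly_analyzed
    hit_rate = round(claims_from_cache / total_claims, 3) if total_claims else 0.0
    return {
        "credit_remaining": remaining,
        "cache_only_mode": remaining == 0,
        "cache_hit_rate": hit_rate,
    }


if __name__ == "__main__":
    # Values from the example response above.
    print(credit_status(Decimal("10.00"), Decimal("7.42"), 189, 113))
```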

----

=== 3.2 Create Analysis Job (3-Stage) ===

**Endpoint:** POST /v1/analyze

==== Idempotency Support: ====

To prevent duplicate job creation on network retries, clients SHOULD include an Idempotency-Key header:

{{{POST /v1/analyze
Idempotency-Key: {client-generated-uuid}
}}}

Alternatively, clients can set the client.request_id field in the request body:

{{{{
  "input_url": "...",
  "client": {
    "request_id": "client-uuid-12345",
    "source_label": "optional"
  }
}
}}}

**Server Behavior:**

* If Idempotency-Key or request_id seen before (within 24 hours):
** Return existing job (200 OK, not 202 Accepted)
** Do NOT create duplicate job or charge twice
* Idempotency keys expire after 24 hours (matches job retention)

**Example Response (Idempotent):**

{{{{
  "job_id": "01J...ULID",
  "status": "RUNNING",
  "idempotent": true,
  "original_request_at": "2025-12-24T10:31:00Z",
  "message": "Returning existing job (idempotency key matched)"
}
}}}
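The server behavior described above can be sketched with an in-memory store. This is an illustration under stated assumptions, not the actual implementation: the class name IdempotentJobStore is hypothetical, a UUID stands in for the ULID job IDs used by the API, and a real deployment would presumably keep the keys in a shared store rather than a local dictionary.

```python
import time
import uuid

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # keys expire after 24 hours


class IdempotentJobStore:
    """In-memory sketch of idempotent job creation (hypothetical class)."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock, so expiry can be tested
        self._jobs = {}       # idempotency key -> (job_id, created_at)

    def submit(self, idempotency_key: str):
        """Return (http_status, body) for a job submission."""
        now = self._clock()
        entry = self._jobs.get(idempotency_key)
        if entry is not None and now - entry[1] < IDEMPOTENCY_TTL_SECONDS:
            # Key seen within 24 hours: return the existing job, do not charge again.
            return 200, {"job_id": entry[0], "idempotent": True}
        # New or expired key: create a fresh job.
        job_id = uuid.uuid4().hex  # stand-in for a ULID
        self._jobs[idempotency_key] = (job_id, now)
        return 202, {"job_id": job_id, "status": "QUEUED"}
```

A retry with the same key inside the 24-hour window returns 200 with the original job ID; after the window a new job is created with 202.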

==== Request Body: ====

{{{{
  "input_type": "url",
  "input_url": "https://example.com/medical-report-01",
  "input_text": null,
  "options": {
    "browsing": "on",
    "depth": "standard",
    "max_claims": 5,
    "scenarios_per_claim": 2,
    "max_evidence_per_scenario": 6,
    "context_aware_analysis": true
  },
  "client": {
    "request_id": "optional-client-tracking-id",
    "source_label": "optional"
  }
}
}}}

**Options:**

* browsing: on | off (retrieve web sources or just output queries)
* depth: standard | deep (evidence thoroughness)
* max_claims: 1-10 (default: **5** for cost control)
* scenarios_per_claim: 1-5 (default: **2** for cost control)
* max_evidence_per_scenario: 3-10 (default: **6**)
* context_aware_analysis: true | false (experimental)
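The option ranges and defaults listed above could be enforced with a small validator along these lines. The function name resolve_options is hypothetical. The defaults for max_claims, scenarios_per_claim and max_evidence_per_scenario come from the list; the defaults for browsing, depth and context_aware_analysis are assumptions copied from the example request body.

```python
DEFAULTS = {
    "browsing": "on",                  # assumption: taken from the example request
    "depth": "standard",               # assumption: taken from the example request
    "max_claims": 5,
    "scenarios_per_claim": 2,
    "max_evidence_per_scenario": 6,
    "context_aware_analysis": True,    # assumption: taken from the example request
}
ENUMS = {"browsing": {"on", "off"}, "depth": {"standard", "deep"}}
RANGES = {
    "max_claims": (1, 10),
    "scenarios_per_claim": (1, 5),
    "max_evidence_per_scenario": (3, 10),
}


def resolve_options(options=None):
    """Merge client options over the defaults and validate them."""
    resolved = {**DEFAULTS, **(options or {})}
    unknown = set(resolved) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    for name, allowed in ENUMS.items():
        if resolved[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    for name, (low, high) in RANGES.items():
        value = resolved[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name} must be an integer in {low}-{high}")
    if not isinstance(resolved["context_aware_analysis"], bool):
        raise ValueError("context_aware_analysis must be true or false")
    return resolved
```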

**Response:** 202 Accepted

{{{{
  "job_id": "01J...ULID",
  "status": "QUEUED",
  "created_at": "2025-12-24T10:31:00Z",
  "estimated_cost": 0.114,
  "cost_breakdown": {
    "stage1_extraction": 0.003,
    "stage2_new_claims": 0.081,
    "stage2_cached_claims": 0.000,
    "stage3_holistic": 0.030
  },
  "cache_info": {
    "claims_to_extract": 5,
    "estimated_cache_hits": 4,
    "estimated_new_claims": 1
  },
  "links": {
    "self": "/v1/jobs/01J...ULID",
    "result": "/v1/jobs/01J...ULID/result",
    "report": "/v1/jobs/01J...ULID/report",
    "events": "/v1/jobs/01J...ULID/events"
  }
}
}}}

**Error Responses:**

402 Payment Required - Free tier limit reached, cache-only mode

{{{{
  "error": "credit_limit_reached",
  "message": "Monthly credit limit reached. Entering cache-only mode.",
  "cache_only_mode": true,
  "credit_remaining": 0.00,
  "reset_date": "2025-02-01T00:00:00Z",
  "action": "Resubmit with cache_preference=allow_partial for cached results"
}
}}}

----

== 4. Data Schemas ==

=== 4.1 Stage 1 Output: ClaimExtraction ===

{{{{
  "job_id": "01J...ULID",
  "stage": "stage1_extraction",
  "article_metadata": {
    "title": "Article title",
    "source_url": "https://example.com/article",
    "extracted_text_length": 5234,
    "language": "en"
  },
  "claims": [
    {
      "claim_id": "C1",
      "claim_text": "Original claim text from article",
      "canonical_claim": "Normalized, deduplicated phrasing",
      "claim_hash": "sha256:abc123...",
      "is_central_to_thesis": true,
      "claim_type": "causal",
      "evaluability": "evaluable",
      "risk_tier": "B",
      "domain": "public_health"
    }
  ],
  "article_thesis": "Main argument detected",
  "cost": 0.003
}
}}}

----

=== 4.5 Verdict Label Taxonomy ===

FactHarbor uses **three distinct verdict taxonomies** depending on analysis level:

==== 4.5.1 Scenario Verdict Labels (Stage 2) ====

Used for individual scenario verdicts within a claim.

**Enum Values:**

* Highly Likely - Probability 0.85-1.0, high confidence
* Likely - Probability 0.65-0.84, moderate-high confidence
* Unclear - Probability 0.35-0.64, or low confidence
* Unlikely - Probability 0.16-0.34, moderate-high confidence
* Highly Unlikely - Probability 0.0-0.15, high confidence
* Unsubstantiated - Insufficient evidence to determine probability
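One way to map a probability onto these labels is sketched below. It rests on assumptions the list does not spell out: each band is treated as starting at its lower bound (so a value such as 0.845 falls into Likely), a missing probability is represented as None and maps to Unsubstantiated, and the "low confidence" route to Unclear is left out because no confidence threshold is given. The function name scenario_verdict_label is hypothetical.

```python
from typing import Optional


def scenario_verdict_label(probability: Optional[float]) -> str:
    """Map a scenario probability to a verdict label (illustrative only)."""
    if probability is None:
        # Assumption: None means there was not enough evidence to estimate.
        return "Unsubstantiated"
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    if probability >= 0.85:
        return "Highly Likely"
    if probability >= 0.65:
        return "Likely"
    if probability >= 0.35:
        return "Unclear"
    if probability >= 0.16:
        return "Unlikely"
    return "Highly Unlikely"
```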

==== 4.5.2 Claim Verdict Labels (Rollup) ====

Used when summarizing a claim across all scenarios.

**Enum Values:**

* Supported - At least 60% of scenarios are Likely or Highly Likely
* Refuted - At least 60% of scenarios are Unlikely or Highly Unlikely
* Inconclusive - Mixed scenarios, or mostly Unclear/Unsubstantiated

**Mapping Logic:**

* If ≥60% scenarios are (Highly Likely | Likely) → Supported
* If ≥60% scenarios are (Highly Unlikely | Unlikely) → Refuted
* Otherwise → Inconclusive
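The mapping logic above translates directly into code. In this sketch the function name rollup_claim_verdict is hypothetical, and returning Inconclusive for a claim with no scenarios is an assumption. The 60% threshold is checked with integer arithmetic to avoid floating-point edge cases.

```python
POSITIVE = {"Highly Likely", "Likely"}
NEGATIVE = {"Highly Unlikely", "Unlikely"}


def rollup_claim_verdict(scenario_labels):
    """Roll scenario verdict labels up into a claim verdict label."""
    total = len(scenario_labels)
    if total == 0:
        return "Inconclusive"  # assumption: nothing to roll up
    positive = sum(label in POSITIVE for label in scenario_labels)
    negative = sum(label in NEGATIVE for label in scenario_labels)
    # count / total >= 0.60 is equivalent to 5 * count >= 3 * total.
    if 5 * positive >= 3 * total:
        return "Supported"
    if 5 * negative >= 3 * total:
        return "Refuted"
    return "Inconclusive"
```

With the default of two scenarios per claim, one Likely and one Unclear scenario give 50%, which stays below the threshold and rolls up to Inconclusive.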

==== 4.5.3 Article Verdict Labels (Stage 3) ====

Used for holistic article-level assessment.

**Enum Values:**

* WELL-SUPPORTED - Article thesis logically follows from supported claims
* MISLEADING - Claims may be true but article commits logical fallacies
* REFUTED - Central claims are refuted, invalidating thesis
* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts

**Note:** Article verdict considers **claim centrality** (central claims override supporting claims).

==== 4.5.4 API Field Mapping ====

|=Level|=API Field|=Enum Name
|Scenario|scenarios[].verdict.label|scenario_verdict_label
|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
|Article|article_holistic_assessment.overall_verdict|article_verdict_label

----

== 5. Cache Architecture ==

=== 5.1 Redis Cache Design ===

**Technology:** Redis 7.0+ (in-memory key-value store)

**Cache Key Schema:**

{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
}}}

**Example:**

{{{Claim (English): "COVID vaccines are 95% effective"
Canonical: "covid vaccines are 95 percent effective"
Language: "en"
SHA256: abc123...def456
Key: claim:v1norm1:en:abc123...def456
}}}

**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.

**Data Structure:**

{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
}}}
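From application code, the same write can be issued as a single SET with an expiry instead of separate SET and EXPIRE commands. The sketch below assumes a client object with redis-py style set(name, value, ex=...) and get(name) methods; the helper names store_claim_analysis and load_claim_analysis are hypothetical.

```python
import json

CACHE_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days = 7,776,000 seconds


def store_claim_analysis(client, cache_key: str, analysis: dict) -> None:
    """Write a ClaimAnalysis document with a 90-day TTL.

    `client` is assumed to expose set(name, value, ex=seconds),
    as redis-py's Redis client does.
    """
    client.set(cache_key, json.dumps(analysis, sort_keys=True), ex=CACHE_TTL_SECONDS)


def load_claim_analysis(client, cache_key: str):
    """Return the cached ClaimAnalysis dict, or None on a cache miss."""
    raw = client.get(cache_key)
    return None if raw is None else json.loads(raw)
```

Setting the value and the TTL in one command avoids a window in which a key exists without an expiry if the process stops between the two calls.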

----

=== 5.1.1 Canonical Claim Normalization (v1) ===

The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.

**Algorithm: Canonical Claim Normalization v1**

{{{import re
import unicodedata

# Version identifier (include in cache namespace)
CANONICALIZER_VERSION = "v1norm1"


def normalize_claim_v1(claim_text: str, language: str) -> str:
    """
    Normalizes claim to canonical form for cache key generation.
    Version: v1norm1 (POC1)
    """
    # Step 1: Unicode normalization (NFC)
    text = unicodedata.normalize('NFC', claim_text)

    # Step 2: Lowercase
    text = text.lower()

    # Step 3: Expand percent signs. This must run before punctuation
    # removal, which would otherwise strip '%' and lose the unit.
    text = text.replace('%', ' percent')

    # Step 4: Common abbreviations (English only in v1). Runs before
    # punctuation removal so the dotted forms still match.
    if language == 'en':
        text = text.replace('covid-19', 'covid')
        text = text.replace('u.s.', 'us')
        text = text.replace('u.k.', 'uk')

    # Step 5: Remove punctuation (except hyphens in words)
    text = re.sub(r'[^\w\s-]', '', text)

    # Step 6: Normalize whitespace (collapse multiple spaces)
    text = re.sub(r'\s+', ' ', text).strip()

    # Step 7: Spell out single-digit numbers
    num_to_word = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',
                   '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
                   '8': 'eight', '9': 'nine'}
    for num, word in num_to_word.items():
        text = re.sub(rf'\b{num}\b', word, text)

    # Step 8: NO entity normalization in v1
    # (Trump vs Donald Trump vs President Trump remain distinct)

    return text
}}}

**Cache Key Formula (Updated):**

{{{import hashlib

claim_text = "COVID-19 vaccines are 95% effective"
language = "en"
canonical = normalize_claim_v1(claim_text, language)
digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{digest}"

# Example:
#   claim:     "COVID-19 vaccines are 95% effective"
#   canonical: "covid vaccines are 95 percent effective"
#   sha256:    abc123...def456
#   key:       "claim:v1norm1:en:abc123...def456"
}}}

**Cache Metadata MUST Include:**

{{{{
  "canonical_claim": "covid vaccines are 95 percent effective",
  "canonicalizer_version": "v1norm1",
  "language": "en",
  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
}
}}}

**Version Upgrade Path:**

* v1norm1 → v1norm2: Cache namespace changes, old keys remain valid until TTL
* v1normN → v2norm1: Major version bump, invalidate all v1 caches

----

=== 5.1.2 Copyright & Data Retention Policy ===

**Evidence Excerpt Storage:**

To comply with copyright law and fair use principles:

**What We Store:**

* **Source metadata:** Title, author, publisher, URL, publication date
* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
* **Summaries:** AI-generated bullet points (not verbatim text)
* **No full articles:** Never store complete article text beyond job processing

**Total per Cached Claim:**

* Scenarios: 2 per claim
* Evidence items: 6 per scenario (12 total)
* Quotes: 3 per evidence × 25 words = 75 words per item
* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)
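The excerpt limits and the 900-word ceiling can be checked mechanically. This is a sketch only: the helper names limit_quotes and max_verbatim_words are hypothetical, and truncating over-long quotes (rather than rejecting them) is an assumption about how the limits would be applied.

```python
MAX_WORDS_PER_QUOTE = 25
MAX_QUOTES_PER_EVIDENCE = 3


def limit_quotes(quotes):
    """Keep at most 3 quotes per evidence item, each cut to 25 words."""
    limited = []
    for quote in quotes[:MAX_QUOTES_PER_EVIDENCE]:
        words = quote.split()
        limited.append(" ".join(words[:MAX_WORDS_PER_QUOTE]))
    return limited


def max_verbatim_words(scenarios: int = 2, evidence_per_scenario: int = 6) -> int:
    """Upper bound on verbatim words stored for one cached claim."""
    evidence_items = scenarios * evidence_per_scenario
    return evidence_items * MAX_QUOTES_PER_EVIDENCE * MAX_WORDS_PER_QUOTE


if __name__ == "__main__":
    print(max_verbatim_words())  # 2 scenarios x 6 items x 3 quotes x 25 words
```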

**Retention:**

* Cache TTL: 90 days
* Job outputs: 24 hours (then archived or deleted)
* No persistent full-text article storage

**Rationale:**

* Short excerpts for citation = fair use
* Summaries are transformative (not copyrightable)
* Limited retention (90 days max)
* No commercial republication of excerpts

**DMCA Compliance:**

* Cache invalidation endpoint available for rights holders
* Contact: dmca@factharbor.org

----

== Summary ==

This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification.

**Full specification includes:**

* Complete API endpoints (7 total)
* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete)
* Quality gates & validation rules
* LLM configuration for all 3 stages
* Implementation notes with code samples
* Testing strategy
* Cross-references to other pages

**The complete specification is available in:**

* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
* Export files (TEST/PRODUCTION) for xWiki import