Last modified by Robert Schaub on 2025/12/24 18:26

From version 2.2
edited by Robert Schaub
on 2025/12/24 16:28
Change comment: There is no comment for this version
To version 5.2
edited by Robert Schaub
on 2025/12/24 18:26
Change comment: Update document after refactoring.

Summary

Details

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -Test.FactHarbor V0\.9\.103.Specification.POC.WebHome
1 +Test.FactHarbor.Specification.POC.WebHome
Content
... ... @@ -41,24 +41,23 @@
41 41  
42 42  FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:
43 43  
44 -{{mermaid}}
45 -graph TD
46 - A[Article Input] --> B[Stage 1: Extract Claims]
47 - B --> C{For Each Claim}
48 - C --> D[Check Cache]
49 - D -->|Cache HIT| E[Return Cached Verdict]
50 - D -->|Cache MISS| F[Stage 2: Analyze Claim]
51 - F --> G[Store in Cache]
52 - G --> E
53 - E --> H[Stage 3: Holistic Assessment]
54 - H --> I[Final Report]
55 -{{/mermaid}}
44 +{{{graph TD
45 + A[Article Input] --> B[Stage 1: Extract Claims]
46 + B --> C{For Each Claim}
47 + C --> D[Check Cache]
48 + D -->|Cache HIT| E[Return Cached Verdict]
49 + D -->|Cache MISS| F[Stage 2: Analyze Claim]
50 + F --> G[Store in Cache]
51 + G --> E
52 + E --> H[Stage 3: Holistic Assessment]
53 + H --> I[Final Report]
54 +}}}
56 56  
57 57  ==== Stage 1: Claim Extraction (Haiku, no cache) ====
58 58  
59 59  * **Input:** Article text
60 60  * **Output:** 5 canonical claims (normalized, deduplicated)
61 -* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
60 +* **Model:** Claude Haiku 4
62 62  * **Cost:** $0.003 per article
63 63  * **Cache strategy:** No caching (article-specific)
64 64  
... ... @@ -66,7 +66,7 @@
66 66  
67 67  * **Input:** Single canonical claim
68 68  * **Output:** Scenarios + Evidence + Verdicts
69 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
68 +* **Model:** Claude Sonnet 3.5
70 70  * **Cost:** $0.081 per NEW claim
71 71  * **Cache strategy:** Redis, 90-day TTL
72 72  * **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}
... ... @@ -75,14 +75,10 @@
75 75  
76 76  * **Input:** Article + Claim verdicts (from cache or Stage 2)
77 77  * **Output:** Article verdict + Fallacies + Logic quality
78 -* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
77 +* **Model:** Claude Sonnet 3.5
79 79  * **Cost:** $0.030 per article
80 80  * **Cache strategy:** No caching (article-specific)
81 81  
82 -
83 -
84 -**Note:** Stage 3 implements **Approach 1 (Single-Pass Holistic Analysis)** from the [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]. While claim analysis (Stage 2) is cached for efficiency, the holistic assessment maintains the integrated evaluation philosophy of Approach 1.
85 -
86 86  === Total Cost Formula: ===
87 87  
88 88  {{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)
... ... @@ -150,27 +150,27 @@
150 150  ==== User Experience Example: ====
151 151  
152 152  {{{{
153 - "status": "cache_only_mode",
154 - "message": "Monthly credit limit reached. Showing cached results only.",
155 - "cache_coverage": {
156 - "claims_total": 5,
157 - "claims_cached": 3,
158 - "claims_missing": 2,
159 - "coverage_percent": 60
160 - },
161 - "cached_claims": [
162 - {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
163 - {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
164 - {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
165 - ],
166 - "missing_claims": [
167 - {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
168 - {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
169 - ],
170 - "upgrade_options": {
171 - "top_up": "$5 for 20-70 more articles",
172 - "pro_tier": "$50/month unlimited"
173 - }
148 + "status": "cache_only_mode",
149 + "message": "Monthly credit limit reached. Showing cached results only.",
150 + "cache_coverage": {
151 + "claims_total": 5,
152 + "claims_cached": 3,
153 + "claims_missing": 2,
154 + "coverage_percent": 60
155 + },
156 + "cached_claims": [
157 + {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
158 + {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
159 + {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
160 + ],
161 + "missing_claims": [
162 + {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
163 + {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
164 + ],
165 + "upgrade_options": {
166 + "top_up": "$5 for 20-70 more articles",
167 + "pro_tier": "$50/month unlimited"
168 + }
174 174  }
175 175  }}}
176 176  
... ... @@ -183,328 +183,6 @@
183 183  
184 184  ----
185 185  
186 -
187 -
188 -== 6. LLM Abstraction Layer ==
189 -
190 -=== 6.1 Design Principle ===
191 -
192 -**FactHarbor uses provider-agnostic LLM abstraction** to avoid vendor lock-in and enable:
193 -
194 -* **Provider switching:** Change LLM providers without code changes
195 -* **Cost optimization:** Use different providers for different stages
196 -* **Resilience:** Automatic fallback if primary provider fails
197 -* **Cross-checking:** Compare outputs from multiple providers
198 -* **A/B testing:** Test new models without deployment changes
199 -
200 -**Implementation:** All LLM calls go through an abstraction layer that routes to configured providers.
201 -
202 -----
203 -
204 -=== 6.2 LLM Provider Interface ===
205 -
206 -**Abstract Interface:**
207 -
208 -{{{
209 -interface LLMProvider {
210 - // Core methods
211 - complete(prompt: string, options: CompletionOptions): Promise<CompletionResponse>
212 - stream(prompt: string, options: CompletionOptions): AsyncIterator<StreamChunk>
213 -
214 - // Provider metadata
215 - getName(): string
216 - getMaxTokens(): number
217 - getCostPer1kTokens(): { input: number, output: number }
218 -
219 - // Health check
220 - isAvailable(): Promise<boolean>
221 -}
222 -
223 -interface CompletionOptions {
224 - model?: string
225 - maxTokens?: number
226 - temperature?: number
227 - stopSequences?: string[]
228 - systemPrompt?: string
229 -}
230 -}}}
231 -
232 -----
233 -
234 -=== 6.3 Supported Providers (POC1) ===
235 -
236 -**Primary Provider (Default):**
237 -
238 -* **Anthropic Claude API**
239 - * Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
240 - * Used by default in POC1
241 - * Best quality for holistic analysis
242 -
243 -**Secondary Providers (Future):**
244 -
245 -* **OpenAI API**
246 - * Models: GPT-4o, GPT-4o-mini
247 - * For cost comparison
248 -
249 -* **Google Vertex AI**
250 - * Models: Gemini 1.5 Pro, Gemini 1.5 Flash
251 - * For diversity in evidence gathering
252 -
253 -* **Local Models** (Post-POC)
254 - * Models: Llama 3.1, Mistral
255 - * For privacy-sensitive deployments
256 -
257 -----
258 -
259 -=== 6.4 Provider Configuration ===
260 -
261 -**Environment Variables:**
262 -
263 -{{{
264 -# Primary provider
265 -LLM_PRIMARY_PROVIDER=anthropic
266 -ANTHROPIC_API_KEY=sk-ant-...
267 -
268 -# Fallback provider
269 -LLM_FALLBACK_PROVIDER=openai
270 -OPENAI_API_KEY=sk-...
271 -
272 -# Provider selection per stage
273 -LLM_STAGE1_PROVIDER=anthropic
274 -LLM_STAGE1_MODEL=claude-haiku-4
275 -LLM_STAGE2_PROVIDER=anthropic
276 -LLM_STAGE2_MODEL=claude-sonnet-3-5
277 -LLM_STAGE3_PROVIDER=anthropic
278 -LLM_STAGE3_MODEL=claude-sonnet-3-5
279 -
280 -# Cost limits
281 -LLM_MAX_COST_PER_REQUEST=1.00
282 -}}}
283 -
284 -**Database Configuration (Alternative):**
285 -
286 -{{{{
287 -{
288 - "providers": [
289 - {
290 - "name": "anthropic",
291 - "api_key_ref": "vault://anthropic-api-key",
292 - "enabled": true,
293 - "priority": 1
294 - },
295 - {
296 - "name": "openai",
297 - "api_key_ref": "vault://openai-api-key",
298 - "enabled": true,
299 - "priority": 2
300 - }
301 - ],
302 - "stage_config": {
303 - "stage1": {
304 - "provider": "anthropic",
305 - "model": "claude-haiku-4",
306 - "max_tokens": 4096,
307 - "temperature": 0.0
308 - },
309 - "stage2": {
310 - "provider": "anthropic",
311 - "model": "claude-sonnet-3-5",
312 - "max_tokens": 16384,
313 - "temperature": 0.3
314 - },
315 - "stage3": {
316 - "provider": "anthropic",
317 - "model": "claude-sonnet-3-5",
318 - "max_tokens": 8192,
319 - "temperature": 0.2
320 - }
321 - }
322 -}
323 -}}}
324 -
325 -----
326 -
327 -=== 6.5 Stage-Specific Models (POC1 Defaults) ===
328 -
329 -**Stage 1: Claim Extraction**
330 -
331 -* **Default:** Anthropic Claude Haiku 4
332 -* **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
333 -* **Rationale:** Fast, cheap, simple task
334 -* **Cost:** ~$0.003 per article
335 -
336 -**Stage 2: Claim Analysis** (CACHEABLE)
337 -
338 -* **Default:** Anthropic Claude Sonnet 3.5
339 -* **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
340 -* **Rationale:** High-quality analysis, cached 90 days
341 -* **Cost:** ~$0.081 per NEW claim
342 -
343 -**Stage 3: Holistic Assessment**
344 -
345 -* **Default:** Anthropic Claude Sonnet 3.5
346 -* **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
347 -* **Rationale:** Complex reasoning, logical fallacy detection
348 -* **Cost:** ~$0.030 per article
349 -
350 -**Cost Comparison (Example):**
351 -
352 -|=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
353 -|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
354 -|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
355 -|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
356 -|**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072**
357 -
358 -**Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support planned for POC2.
359 -
360 -----
361 -
362 -=== 6.6 Failover Strategy ===
363 -
364 -**Automatic Failover:**
365 -
366 -{{{
367 -async function completeLLM(stage: string, prompt: string): Promise<string> {
368 - const primaryProvider = getProviderForStage(stage)
369 - const fallbackProvider = getFallbackProvider()
370 -
371 - try {
372 - return await primaryProvider.complete(prompt)
373 - } catch (error) {
374 - if (error.type === 'rate_limit' || error.type === 'service_unavailable') {
375 - logger.warn(`Primary provider failed, using fallback`)
376 - return await fallbackProvider.complete(prompt)
377 - }
378 - throw error
379 - }
380 -}
381 -}}}
382 -
383 -**Fallback Priority:**
384 -
385 -1. **Primary:** Configured provider for stage
386 -2. **Secondary:** Fallback provider (if configured)
387 -3. **Cache:** Return cached result (if available for Stage 2)
388 -4. **Error:** Return 503 Service Unavailable
389 -
390 -----
391 -
392 -=== 6.7 Provider Selection API ===
393 -
394 -**Admin Endpoint:** POST /admin/v1/llm/configure
395 -
396 -**Update provider for specific stage:**
397 -
398 -{{{{
399 -{
400 - "stage": "stage2",
401 - "provider": "openai",
402 - "model": "gpt-4o",
403 - "max_tokens": 16384,
404 - "temperature": 0.3
405 -}
406 -}}}
407 -
408 -**Response:** 200 OK
409 -
410 -{{{{
411 -{
412 - "message": "LLM configuration updated",
413 - "stage": "stage2",
414 - "previous": {
415 - "provider": "anthropic",
416 - "model": "claude-sonnet-3-5"
417 - },
418 - "current": {
419 - "provider": "openai",
420 - "model": "gpt-4o"
421 - },
422 - "cost_impact": {
423 - "previous_cost_per_claim": 0.081,
424 - "new_cost_per_claim": 0.045,
425 - "savings_percent": 44
426 - }
427 -}
428 -}}}
429 -
430 -**Get current configuration:**
431 -
432 -GET /admin/v1/llm/config
433 -
434 -{{{{
435 -{
436 - "providers": ["anthropic", "openai"],
437 - "primary": "anthropic",
438 - "fallback": "openai",
439 - "stages": {
440 - "stage1": {
441 - "provider": "anthropic",
442 - "model": "claude-haiku-4",
443 - "cost_per_request": 0.003
444 - },
445 - "stage2": {
446 - "provider": "anthropic",
447 - "model": "claude-sonnet-3-5",
448 - "cost_per_new_claim": 0.081
449 - },
450 - "stage3": {
451 - "provider": "anthropic",
452 - "model": "claude-sonnet-3-5",
453 - "cost_per_request": 0.030
454 - }
455 - }
456 -}
457 -}}}
458 -
459 -----
460 -
461 -=== 6.8 Implementation Notes ===
462 -
463 -**Provider Adapter Pattern:**
464 -
465 -{{{
466 -class AnthropicProvider implements LLMProvider {
467 - async complete(prompt: string, options: CompletionOptions) {
468 - const response = await anthropic.messages.create({
469 - model: options.model || 'claude-sonnet-3-5',
470 - max_tokens: options.maxTokens || 4096,
471 - messages: [{ role: 'user', content: prompt }],
472 - system: options.systemPrompt
473 - })
474 - return response.content[0].text
475 - }
476 -}
477 -
478 -class OpenAIProvider implements LLMProvider {
479 - async complete(prompt: string, options: CompletionOptions) {
480 - const response = await openai.chat.completions.create({
481 - model: options.model || 'gpt-4o',
482 - max_tokens: options.maxTokens || 4096,
483 - messages: [
484 - { role: 'system', content: options.systemPrompt },
485 - { role: 'user', content: prompt }
486 - ]
487 - })
488 - return response.choices[0].message.content
489 - }
490 -}
491 -}}}
492 -
493 -**Provider Registry:**
494 -
495 -{{{
496 -const providers = new Map<string, LLMProvider>()
497 -providers.set('anthropic', new AnthropicProvider())
498 -providers.set('openai', new OpenAIProvider())
499 -providers.set('google', new GoogleProvider())
500 -
501 -function getProvider(name: string): LLMProvider {
502 - return providers.get(name) || providers.get(config.primaryProvider)
503 -}
504 -}}}
505 -
506 -----
507 -
508 508  == 3. REST API Contract ==
509 509  
510 510  === 3.1 User Credit Tracking ===
... ... @@ -514,19 +514,19 @@
514 514  **Response:** 200 OK
515 515  
516 516  {{{{
517 - "user_id": "user_abc123",
518 - "tier": "free",
519 - "credit_limit": 10.00,
520 - "credit_used": 7.42,
521 - "credit_remaining": 2.58,
522 - "reset_date": "2025-02-01T00:00:00Z",
523 - "cache_only_mode": false,
524 - "usage_stats": {
525 - "articles_analyzed": 67,
526 - "claims_from_cache": 189,
527 - "claims_newly_analyzed": 113,
528 - "cache_hit_rate": 0.626
529 - }
190 + "user_id": "user_abc123",
191 + "tier": "free",
192 + "credit_limit": 10.00,
193 + "credit_used": 7.42,
194 + "credit_remaining": 2.58,
195 + "reset_date": "2025-02-01T00:00:00Z",
196 + "cache_only_mode": false,
197 + "usage_stats": {
198 + "articles_analyzed": 67,
199 + "claims_from_cache": 189,
200 + "claims_newly_analyzed": 113,
201 + "cache_hit_rate": 0.626
202 + }
530 530  }
531 531  }}}
532 532  
... ... @@ -547,11 +547,11 @@
547 547  OR use the client.request_id field:
548 548  
549 549  {{{{
550 - "input_url": "...",
551 - "client": {
552 - "request_id": "client-uuid-12345",
553 - "source_label": "optional"
554 - }
223 + "input_url": "...",
224 + "client": {
225 + "request_id": "client-uuid-12345",
226 + "source_label": "optional"
227 + }
555 555  }
556 556  }}}
557 557  
... ... @@ -565,11 +565,11 @@
565 565  **Example Response (Idempotent):**
566 566  
567 567  {{{{
568 - "job_id": "01J...ULID",
569 - "status": "RUNNING",
570 - "idempotent": true,
571 - "original_request_at": "2025-12-24T10:31:00Z",
572 - "message": "Returning existing job (idempotency key matched)"
241 + "job_id": "01J...ULID",
242 + "status": "RUNNING",
243 + "idempotent": true,
244 + "original_request_at": "2025-12-24T10:31:00Z",
245 + "message": "Returning existing job (idempotency key matched)"
573 573  }
574 574  }}}
575 575  
... ... @@ -576,21 +576,21 @@
576 576  ==== Request Body: ====
577 577  
578 578  {{{{
579 - "input_type": "url",
580 - "input_url": "https://example.com/medical-report-01",
581 - "input_text": null,
582 - "options": {
583 - "browsing": "on",
584 - "depth": "standard",
585 - "max_claims": 5,
586 - "scenarios_per_claim": 2,
587 - "max_evidence_per_scenario": 6,
588 - "context_aware_analysis": true
589 - },
590 - "client": {
591 - "request_id": "optional-client-tracking-id",
592 - "source_label": "optional"
593 - }
252 + "input_type": "url",
253 + "input_url": "https://example.com/medical-report-01",
254 + "input_text": null,
255 + "options": {
256 + "browsing": "on",
257 + "depth": "standard",
258 + "max_claims": 5,
259 + "scenarios_per_claim": 2,
260 + "max_evidence_per_scenario": 6,
261 + "context_aware_analysis": true
262 + },
263 + "client": {
264 + "request_id": "optional-client-tracking-id",
265 + "source_label": "optional"
266 + }
594 594  }
595 595  }}}
596 596  
... ... @@ -606,27 +606,27 @@
606 606  **Response:** 202 Accepted
607 607  
608 608  {{{{
609 - "job_id": "01J...ULID",
610 - "status": "QUEUED",
611 - "created_at": "2025-12-24T10:31:00Z",
612 - "estimated_cost": 0.114,
613 - "cost_breakdown": {
614 - "stage1_extraction": 0.003,
615 - "stage2_new_claims": 0.081,
616 - "stage2_cached_claims": 0.000,
617 - "stage3_holistic": 0.030
618 - },
619 - "cache_info": {
620 - "claims_to_extract": 5,
621 - "estimated_cache_hits": 4,
622 - "estimated_new_claims": 1
623 - },
624 - "links": {
625 - "self": "/v1/jobs/01J...ULID",
626 - "result": "/v1/jobs/01J...ULID/result",
627 - "report": "/v1/jobs/01J...ULID/report",
628 - "events": "/v1/jobs/01J...ULID/events"
629 - }
282 + "job_id": "01J...ULID",
283 + "status": "QUEUED",
284 + "created_at": "2025-12-24T10:31:00Z",
285 + "estimated_cost": 0.114,
286 + "cost_breakdown": {
287 + "stage1_extraction": 0.003,
288 + "stage2_new_claims": 0.081,
289 + "stage2_cached_claims": 0.000,
290 + "stage3_holistic": 0.030
291 + },
292 + "cache_info": {
293 + "claims_to_extract": 5,
294 + "estimated_cache_hits": 4,
295 + "estimated_new_claims": 1
296 + },
297 + "links": {
298 + "self": "/v1/jobs/01J...ULID",
299 + "result": "/v1/jobs/01J...ULID/result",
300 + "report": "/v1/jobs/01J...ULID/report",
301 + "events": "/v1/jobs/01J...ULID/events"
302 + }
630 630  }
631 631  }}}
632 632  
... ... @@ -635,12 +635,12 @@
635 635  402 Payment Required - Free tier limit reached, cache-only mode
636 636  
637 637  {{{{
638 - "error": "credit_limit_reached",
639 - "message": "Monthly credit limit reached. Entering cache-only mode.",
640 - "cache_only_mode": true,
641 - "credit_remaining": 0.00,
642 - "reset_date": "2025-02-01T00:00:00Z",
643 - "action": "Resubmit with cache_preference=allow_partial for cached results"
311 + "error": "credit_limit_reached",
312 + "message": "Monthly credit limit reached. Entering cache-only mode.",
313 + "cache_only_mode": true,
314 + "credit_remaining": 0.00,
315 + "reset_date": "2025-02-01T00:00:00Z",
316 + "action": "Resubmit with cache_preference=allow_partial for cached results"
644 644  }
645 645  }}}
646 646  
... ... @@ -651,29 +651,29 @@
651 651  === 4.1 Stage 1 Output: ClaimExtraction ===
652 652  
653 653  {{{{
654 - "job_id": "01J...ULID",
655 - "stage": "stage1_extraction",
656 - "article_metadata": {
657 - "title": "Article title",
658 - "source_url": "https://example.com/article",
659 - "extracted_text_length": 5234,
660 - "language": "en"
661 - },
662 - "claims": [
663 - {
664 - "claim_id": "C1",
665 - "claim_text": "Original claim text from article",
666 - "canonical_claim": "Normalized, deduplicated phrasing",
667 - "claim_hash": "sha256:abc123...",
668 - "is_central_to_thesis": true,
669 - "claim_type": "causal",
670 - "evaluability": "evaluable",
671 - "risk_tier": "B",
672 - "domain": "public_health"
673 - }
674 - ],
675 - "article_thesis": "Main argument detected",
676 - "cost": 0.003
327 + "job_id": "01J...ULID",
328 + "stage": "stage1_extraction",
329 + "article_metadata": {
330 + "title": "Article title",
331 + "source_url": "https://example.com/article",
332 + "extracted_text_length": 5234,
333 + "language": "en"
334 + },
335 + "claims": [
336 + {
337 + "claim_id": "C1",
338 + "claim_text": "Original claim text from article",
339 + "canonical_claim": "Normalized, deduplicated phrasing",
340 + "claim_hash": "sha256:abc123...",
341 + "is_central_to_thesis": true,
342 + "claim_type": "causal",
343 + "evaluability": "evaluable",
344 + "risk_tier": "B",
345 + "domain": "public_health"
346 + }
347 + ],
348 + "article_thesis": "Main argument detected",
349 + "cost": 0.003
677 677  }
678 678  }}}
679 679  
... ... @@ -759,7 +759,7 @@
759 759  **Data Structure:**
760 760  
761 761  {{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
762 -EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
435 +EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
763 763  }}}
764 764  
765 765  ----
... ... @@ -771,44 +771,44 @@
771 771  **Algorithm: Canonical Claim Normalization v1**
772 772  
773 773  {{{def normalize_claim_v1(claim_text: str, language: str) -> str:
774 - """
775 - Normalizes claim to canonical form for cache key generation.
776 - Version: v1norm1 (POC1)
777 - """
778 - import re
779 - import unicodedata
780 -
781 - # Step 1: Unicode normalization (NFC)
782 - text = unicodedata.normalize('NFC', claim_text)
783 -
784 - # Step 2: Lowercase
785 - text = text.lower()
786 -
787 - # Step 3: Remove punctuation (except hyphens in words)
788 - text = re.sub(r'[^\w\s-]', '', text)
789 -
790 - # Step 4: Normalize whitespace (collapse multiple spaces)
791 - text = re.sub(r'\s+', ' ', text).strip()
792 -
793 - # Step 5: Numeric normalization
794 - text = text.replace('%', ' percent')
795 - # Spell out single-digit numbers
796 - num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
797 - '4':'four', '5':'five', '6':'six', '7':'seven',
798 - '8':'eight', '9':'nine'}
799 - for num, word in num_to_word.items():
800 - text = re.sub(rf'\b{num}\b', word, text)
801 -
802 - # Step 6: Common abbreviations (English only in v1)
803 - if language == 'en':
804 - text = text.replace('covid-19', 'covid')
805 - text = text.replace('u.s.', 'us')
806 - text = text.replace('u.k.', 'uk')
807 -
808 - # Step 7: NO entity normalization in v1
809 - # (Trump vs Donald Trump vs President Trump remain distinct)
810 -
811 - return text
447 + """
448 + Normalizes claim to canonical form for cache key generation.
449 + Version: v1norm1 (POC1)
450 + """
451 + import re
452 + import unicodedata
453 +
454 + # Step 1: Unicode normalization (NFC)
455 + text = unicodedata.normalize('NFC', claim_text)
456 +
457 + # Step 2: Lowercase
458 + text = text.lower()
459 +
460 + # Step 3: Remove punctuation (except hyphens in words)
461 + text = re.sub(r'[^\w\s-]', '', text)
462 +
463 + # Step 4: Normalize whitespace (collapse multiple spaces)
464 + text = re.sub(r'\s+', ' ', text).strip()
465 +
466 + # Step 5: Numeric normalization
467 + text = text.replace('%', ' percent')
468 + # Spell out single-digit numbers
469 + num_to_word = {'0':'zero', '1':'one', '2':'two', '3':'three',
470 + '4':'four', '5':'five', '6':'six', '7':'seven',
471 + '8':'eight', '9':'nine'}
472 + for num, word in num_to_word.items():
473 + text = re.sub(rf'\b{num}\b', word, text)
474 +
475 + # Step 6: Common abbreviations (English only in v1)
476 + if language == 'en':
477 + text = text.replace('covid-19', 'covid')
478 + text = text.replace('u.s.', 'us')
479 + text = text.replace('u.k.', 'uk')
480 +
481 + # Step 7: NO entity normalization in v1
482 + # (Trump vs Donald Trump vs President Trump remain distinct)
483 +
484 + return text
812 812  
813 813  # Version identifier (include in cache namespace)
814 814  CANONICALIZER_VERSION = "v1norm1"
... ... @@ -821,19 +821,19 @@
821 821  cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"
822 822  
823 823  Example:
824 - claim: "COVID-19 vaccines are 95% effective"
825 - canonical: "covid vaccines are 95 percent effective"
826 - sha256: abc123...def456
827 - key: "claim:v1norm1:en:abc123...def456"
497 + claim: "COVID-19 vaccines are 95% effective"
498 + canonical: "covid vaccines are 95 percent effective"
499 + sha256: abc123...def456
500 + key: "claim:v1norm1:en:abc123...def456"
828 828  }}}
829 829  
830 830  **Cache Metadata MUST Include:**
831 831  
832 832  {{{{
833 - "canonical_claim": "covid vaccines are 95 percent effective",
834 - "canonicalizer_version": "v1norm1",
835 - "language": "en",
836 - "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
506 + "canonical_claim": "covid vaccines are 95 percent effective",
507 + "canonicalizer_version": "v1norm1",
508 + "language": "en",
509 + "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
837 837  }
838 838  }}}
839 839