FactHarbor POC1 Architecture Analysis 1.Jan.26

Last modified by Robert Schaub on 2026/02/08 08:12

Version: 2.6.17
Analysis Date: January 2026
Document Purpose: Technical diagrams, gap analysis, and optimization recommendations

1. AKEL Flow Diagram (with LLM and WebSearch Interactions)

flowchart TB
    subgraph Input["📥 Input Layer"]
        URL[URL Input]
        TEXT[Text Input]
    end

    subgraph Retrieval["🔍 Content Retrieval"]
        FETCH[extractTextFromUrl]
        PDF[PDF Parser<br/>pdf-parse v1]
        HTML[HTML Parser<br/>cheerio]
    end

    subgraph AKEL["🧠 AKEL Pipeline"]
        direction TB
        
        subgraph Step1["Step 1: Understand"]
            UNDERSTAND[understandClaim<br/>━━━━━━━━━━━━━<br/>• Detect input type<br/>• Extract claims<br/>• Identify dependencies<br/>• Assign risk tiers]
            LLM1[("🤖 LLM Call #1<br/>Claude/GPT/Gemini")]
        end
        
        subgraph Step2["Step 2: Research (Iterative)"]
            DECIDE[decideNextResearch<br/>━━━━━━━━━━━━━<br/>• Generate queries<br/>• Focus areas]
            
            SEARCH[("🌐 Web Search<br/>Google CSE / SerpAPI")]
            
            FETCHSRC[fetchSourceContent<br/>━━━━━━━━━━━━━<br/>• Parallel fetching<br/>• Timeout handling]
            
            EXTRACT[extractFacts<br/>━━━━━━━━━━━━━<br/>• Parse sources<br/>• Extract facts]
            LLM2[("🤖 LLM Call #2-N<br/>Per source")]
        end
        
        subgraph Step3["Step 3: Verdict Generation"]
            VERDICT[generateVerdicts<br/>━━━━━━━━━━━━━<br/>• Claim verdicts<br/>• Article verdict<br/>• Dependency propagation]
            LLM3[("🤖 LLM Call #N+1<br/>Final synthesis")]
        end
        
        subgraph Step4["Step 4: Report"]
            REPORT[buildTwoPanelSummary<br/>━━━━━━━━━━━━━<br/>• Format results<br/>• Generate markdown]
        end
    end

    subgraph Output["📤 Output"]
        RESULT[AnalysisResult JSON]
        MARKDOWN[Report Markdown]
    end

    %% Flow connections
    URL --> FETCH
    TEXT --> UNDERSTAND
    FETCH --> PDF
    FETCH --> HTML
    PDF --> UNDERSTAND
    HTML --> UNDERSTAND
    
    UNDERSTAND --> LLM1
    LLM1 --> DECIDE
    
    DECIDE --> SEARCH
    SEARCH --> FETCHSRC
    FETCHSRC --> EXTRACT
    EXTRACT --> LLM2
    LLM2 --> DECIDE
    
    DECIDE -->|"Research Complete"| VERDICT
    VERDICT --> LLM3
    LLM3 --> REPORT
    
    REPORT --> RESULT
    REPORT --> MARKDOWN

    %% Styling
    classDef llm fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef search fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef step fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    
    class LLM1,LLM2,LLM3 llm
    class SEARCH search
    class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
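
The control flow in the diagram can be read as a loop around the research step. The following is a minimal sketch under stated assumptions: the step names come from the diagram, but every signature, the `searchWeb` helper, the data shapes, and the `maxIterations` default are hypothetical and may not match `analyzer.ts`.

```typescript
// Hypothetical data shapes; the real types in analyzer.ts may differ.
interface Claim { id: string; text: string }
interface Fact { id: string; fact: string }
interface ResearchDecision { complete: boolean; queries: string[] }

// Steps are injected so the control flow can be read and tested in isolation.
interface PipelineSteps {
  understandClaim(input: string): Promise<Claim[]>; // LLM call #1
  decideNextResearch(claims: Claim[], facts: Fact[]): Promise<ResearchDecision>;
  searchWeb(query: string): Promise<string[]>; // assumed helper, returns URLs
  fetchSourceContent(urls: string[]): Promise<string[]>; // returns page texts
  extractFacts(sourceText: string, claims: Claim[]): Promise<Fact[]>; // one LLM call per source
  generateVerdicts(claims: Claim[], facts: Fact[]): Promise<string>; // final LLM call
  buildTwoPanelSummary(verdicts: string): string;
}

interface PipelineResult { report: string; llmCalls: number; iterations: number }

async function runPipelineSketch(
  input: string,
  steps: PipelineSteps,
  maxIterations = 3, // assumed default
): Promise<PipelineResult> {
  let llmCalls = 0;
  const claims = await steps.understandClaim(input);
  llmCalls += 1;

  const facts: Fact[] = [];
  let iterations = 0;
  while (iterations < maxIterations) {
    const decision = await steps.decideNextResearch(claims, facts);
    if (decision.complete) break; // the "Research Complete" edge
    iterations += 1;

    const urlLists = await Promise.all(decision.queries.map((q) => steps.searchWeb(q)));
    const urls = urlLists.reduce<string[]>((all, list) => all.concat(list), []);
    const texts = await steps.fetchSourceContent(urls);
    for (const text of texts) {
      facts.push(...(await steps.extractFacts(text, claims)));
      llmCalls += 1;
    }
  }

  const verdicts = await steps.generateVerdicts(claims, facts);
  llmCalls += 1;
  return { report: steps.buildTwoPanelSummary(verdicts), llmCalls, iterations };
}
```

Under these assumptions the LLM call count is 2 plus the number of sources processed, which is why the research step dominates cost.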

2. ERD Data Model (Current POC1 Implementation)

Data Objects ERD

erDiagram
    ARTICLE ||--o{ CLAIM : "contains"
    ARTICLE ||--|| ARTICLE_VERDICT : "has"
    CLAIM ||--|| CLAIM_VERDICT : "has"
    CLAIM ||--o{ CLAIM : "depends on"
    CLAIM_VERDICT }o--o{ EVIDENCE : "supported by"
    SOURCE ||--o{ EVIDENCE : "provides"
    ARTICLE ||--o{ SOURCE : "references"

    ARTICLE {
        string id PK "Unique identifier (job ID)"
        string inputType "text | url"
        string inputValue "Original URL or text"
        string articleThesis "Main argument/thesis"
        string detectedInputType "question | claim | article"
        boolean isQuestion "True if input is a question"
        datetime createdAt "Analysis timestamp"
        datetime updatedAt "Last update"
        json distinctProceedings "Legal proceedings if any"
        boolean hasMultipleProceedings "Multi-proceeding flag"
        string proceedingContext "Context for proceedings"
        json logicalFallacies "Detected fallacies array"
        boolean isPseudoscience "Pseudoscience detection"
        string_array pseudoscienceCategories "Categories if detected"
        int llmCalls "Total LLM API calls"
        json searchQueries "All search queries performed"
        string schemaVersion "e.g. 2.6.17"
    }

    CLAIM {
        string id PK "SC1, SC2, C1, etc."
        string articleId FK "Parent article"
        string text "The claim statement"
        string type "legal | procedural | factual | evaluative"
        string claimRole "attribution | source | timing | core"
        string_array dependsOn "IDs of prerequisite claims"
        string_array keyEntities "Named entities in claim"
        boolean isCentral "Is this a central claim?"
        string relatedProceedingId "Linked proceeding if any"
        int startOffset "Position in original text"
        int endOffset "End position in original text"
        string approximatePosition "Descriptive position"
    }

    CLAIM_VERDICT {
        string id PK "Same as claim ID"
        string claimId FK "Reference to claim"
        string llmVerdict "WELL-SUPPORTED | PARTIALLY-SUPPORTED | UNCERTAIN | REFUTED"
        string verdict "True | Mostly True | Leaning True | Unverified | Leaning False | Mostly False | False"
        int confidence "0-100 LLM confidence"
        int truthPercentage "0-100 calibrated truth score"
        string riskTier "A (high) | B (medium) | C (low)"
        string reasoning "Explanation of verdict"
        string_array supportingFactIds "Evidence IDs supporting this"
        boolean dependencyFailed "True if prerequisite failed"
        string_array failedDependencies "Which deps failed"
        string highlightColor "green | light-green | yellow | orange | dark-orange | red | dark-red"
        boolean isPseudoscience "Pseudoscience flag"
        string escalationReason "Why verdict was escalated"
    }

    ARTICLE_VERDICT {
        string id PK "Same as article ID"
        string articleId FK "Reference to article"
        string llmArticleVerdict "Original LLM verdict"
        int llmArticleConfidence "Original LLM confidence"
        string articleVerdict "True | Mostly True | Leaning True | Unverified | Leaning False | Mostly False | False"
        int articleTruthPercentage "0-100 calibrated score"
        string articleVerdictReason "Why verdict differs from claims avg"
        int claimsAverageTruthPercentage "Average of claim verdicts"
        string claimsAverageVerdict "7-point average verdict"
        int claimsTotal "Total claims analyzed"
        int claimsSupported "Claims with truth >= 72%"
        int claimsUncertain "Claims with truth 43-71%"
        int claimsRefuted "Claims with truth < 43%"
        int centralClaimsTotal "Number of central claims"
        int centralClaimsSupported "Central claims supported"
    }

    EVIDENCE {
        string id PK "S1-F1, S1-F2 format"
        string sourceId FK "Reference to source"
        string claimId FK "Optional: specific claim this supports"
        string fact "The factual statement extracted"
        string category "legal_provision | evidence | expert_quote | statistic | event | criticism"
        string specificity "high | medium"
        string sourceExcerpt "Original text excerpt"
        string relatedProceedingId "Linked proceeding if any"
        boolean isContestedClaim "Is this a contested assertion"
        string claimSource "Who made contested claim"
    }

    SOURCE {
        string id PK "S1, S2, etc."
        string articleId FK "Parent article"
        string url "Full URL"
        string title "Page/document title"
        string domain "Extracted domain"
        int trackRecordScore "0-100 reliability score or null"
        string fullText "Extracted content"
        datetime fetchedAt "When content was fetched"
        string category "news | academic | government | legal"
        boolean fetchSuccess "True if fetch succeeded"
        string searchQuery "Which query found this"
        string mimeType "text/html | application/pdf"
    }
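
The `dependsOn`, `dependencyFailed`, and `failedDependencies` fields suggest how dependency propagation might work. The sketch below assumes a prerequisite counts as "failed" when its `truthPercentage` falls below 43 (the refuted threshold used in `ARTICLE_VERDICT`) and checks direct dependencies only; both choices are assumptions, not confirmed behaviour.

```typescript
// Subset of the CLAIM_VERDICT fields relevant to dependency propagation.
interface ClaimVerdict {
  claimId: string;
  dependsOn: string[];
  truthPercentage: number;
  dependencyFailed: boolean;
  failedDependencies: string[];
}

// Assumption: a prerequisite "fails" when it lands in the refuted band.
const REFUTED_BELOW = 43;

function propagateDependencyFailures(verdicts: ClaimVerdict[]): ClaimVerdict[] {
  const byId = new Map(verdicts.map((v) => [v.claimId, v] as const));
  return verdicts.map((v) => {
    // Collect the direct prerequisites that exist and are refuted.
    const failed = v.dependsOn.filter((id) => {
      const dep = byId.get(id);
      return dep !== undefined && dep.truthPercentage < REFUTED_BELOW;
    });
    return { ...v, dependencyFailed: failed.length > 0, failedDependencies: failed };
  });
}
```

A claim such as SC2 that depends on a refuted SC1 would then carry `dependencyFailed: true` even if its own evidence looks strong.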

Data Usage ERD

erDiagram
    JOB ||--o{ JOB_EVENT : "has"
    JOB ||--|| ANALYSIS_RESULT : "produces"
    ANALYSIS_RESULT ||--o{ CLAIM_VERDICT : "contains"
    ANALYSIS_RESULT ||--o{ FETCHED_SOURCE : "references"
    ANALYSIS_RESULT ||--o{ EXTRACTED_FACT : "contains"
    CLAIM_VERDICT }o--o{ EXTRACTED_FACT : "supported by"
    FETCHED_SOURCE ||--o{ EXTRACTED_FACT : "provides"
    CLAIM_VERDICT ||--o{ CLAIM_VERDICT : "depends on"

    JOB {
        string JobId PK "GUID"
        string Status "QUEUED|RUNNING|COMPLETE|FAILED"
        int Progress "0-100"
        datetime CreatedUtc
        datetime UpdatedUtc
        string InputType "text|url"
        string InputValue "URL or text content"
        string InputPreview "First 100 chars"
        json ResultJson "Full analysis result"
        string ReportMarkdown "Formatted report"
    }

    JOB_EVENT {
        long Id PK
        string JobId FK
        datetime TsUtc
        string Level "info|warn|error"
        string Message
    }

    ANALYSIS_RESULT {
        string schemaVersion "2.6.17"
        string inputType "question|claim|article"
        boolean isQuestion
        string articleThesis
        int articleTruthPercentage "0-100"
        string articleVerdict "7-point scale"
        json claimPattern "total/supported/uncertain/refuted"
        boolean isPseudoscience
        int llmCalls "Total LLM invocations"
        json searchQueries "All search queries"
    }

    CLAIM_VERDICT {
        string claimId PK "SC1, SC2, etc."
        string claimText
        boolean isCentral
        string claimRole "attribution|source|timing|core"
        string_array dependsOn "Prerequisite claim IDs"
        boolean dependencyFailed
        string llmVerdict "WELL-SUPPORTED|PARTIALLY-SUPPORTED|UNCERTAIN|REFUTED"
        string verdict "7-point: True to False"
        int confidence "0-100"
        int truthPercentage "0-100"
        string riskTier "A|B|C"
        string reasoning
        string_array supportingFactIds
        string highlightColor "green to dark-red"
    }

    FETCHED_SOURCE {
        string id PK "S1, S2, etc."
        string url
        string title
        int trackRecordScore "0-100 or null"
        string fullText "Extracted content"
        datetime fetchedAt
        string category "legal|news|academic"
        boolean fetchSuccess
        string searchQuery "Which query found this"
    }

    EXTRACTED_FACT {
        string id PK "S1-F1, S1-F2, etc."
        string fact "The factual statement"
        string category "legal_provision|evidence|expert_quote|statistic|event|criticism"
        string specificity "high|medium"
        string sourceId FK
        string sourceUrl
        string sourceTitle
        string sourceExcerpt
        string relatedProceedingId
        boolean isContestedClaim
        string claimSource
    }
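
The `claimPattern` field (total/supported/uncertain/refuted) can be derived from the per-claim `truthPercentage` values. A minimal sketch, assuming the bands given for `ARTICLE_VERDICT` (supported at 72 and above, uncertain from 43 to 71, refuted below 43); the function and constant names are hypothetical.

```typescript
interface ClaimPattern {
  total: number;
  supported: number;
  uncertain: number;
  refuted: number;
}

const SUPPORTED_MIN = 72; // truth >= 72% counts as supported
const UNCERTAIN_MIN = 43; // truth 43-71% counts as uncertain, below that refuted

function buildClaimPattern(truthPercentages: number[]): ClaimPattern {
  const pattern: ClaimPattern = {
    total: truthPercentages.length,
    supported: 0,
    uncertain: 0,
    refuted: 0,
  };
  for (const p of truthPercentages) {
    if (p >= SUPPORTED_MIN) pattern.supported += 1;
    else if (p >= UNCERTAIN_MIN) pattern.uncertain += 1;
    else pattern.refuted += 1;
  }
  return pattern;
}
```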

3. Overall Architecture with Interactions

flowchart TB
    subgraph Client["🖥️ Client Layer"]
        BROWSER[Web Browser]
        ANALYZE_PAGE["/analyze page<br/>React + TailwindCSS"]
        JOBS_PAGE["/jobs page<br/>Job history & status"]
    end

    subgraph NextJS["⚡ Next.js Web App (apps/web)"]
        direction TB
        
        subgraph API_Routes["API Routes"]
            ANALYZE_API["/api/fh/analyze<br/>━━━━━━━━━━━━━<br/>POST: Create job"]
            JOBS_API["/api/fh/jobs<br/>━━━━━━━━━━━━━<br/>GET: List jobs<br/>POST: Create job"]
            JOB_API["/api/fh/jobs/[id]<br/>━━━━━━━━━━━━━<br/>GET: Job status"]
            EVENTS_API["/api/fh/jobs/[id]/events<br/>━━━━━━━━━━━━━<br/>GET: Job events (SSE)"]
            RUN_JOB["/api/internal/run-job<br/>━━━━━━━━━━━━━<br/>POST: Execute analysis"]
        end
        
        subgraph Lib["Core Libraries"]
            ANALYZER["analyzer.ts<br/>━━━━━━━━━━━━━<br/>AKEL Pipeline<br/>2918 lines"]
            RETRIEVAL["retrieval.ts<br/>━━━━━━━━━━━━━<br/>URL content extraction"]
            WEBSEARCH["web-search.ts<br/>━━━━━━━━━━━━━<br/>Search abstraction"]
            MBFC["mbfc-loader.ts<br/>━━━━━━━━━━━━━<br/>Source reliability"]
        end
    end

    subgraph DotNet["🔧 .NET API (apps/api)"]
        DOTNET_API["FactHarbor.Api<br/>ASP.NET Core"]
        
        subgraph Controllers["Controllers"]
            ANALYZE_CTRL["AnalyzeController"]
            JOBS_CTRL["JobsController"]
            INTERNAL_CTRL["InternalJobsController"]
        end
        
        subgraph Services["Services"]
            JOB_SVC["JobService<br/>━━━━━━━━━━━━━<br/>Job CRUD operations"]
            RUNNER_CLIENT["RunnerClient<br/>━━━━━━━━━━━━━<br/>Calls Next.js runner"]
        end
        
        DB[(SQLite Database<br/>━━━━━━━━━━━━━<br/>JobEntity<br/>JobEventEntity)]
    end

    subgraph External["🌐 External Services"]
        LLM_PROVIDERS["LLM Providers<br/>━━━━━━━━━━━━━<br/>• Anthropic Claude<br/>• OpenAI GPT<br/>• Google Gemini<br/>• Mistral"]
        SEARCH_PROVIDERS["Search Providers<br/>━━━━━━━━━━━━━<br/>• Google CSE<br/>• SerpAPI<br/>• Brave<br/>• Tavily"]
        WEB["Web Content<br/>━━━━━━━━━━━━━<br/>• News sites<br/>• PDFs<br/>• Academic sources"]
    end

    %% Client interactions
    BROWSER --> ANALYZE_PAGE
    BROWSER --> JOBS_PAGE
    ANALYZE_PAGE --> ANALYZE_API
    JOBS_PAGE --> JOBS_API
    
    %% Next.js internal
    ANALYZE_API --> JOBS_API
    JOBS_API -->|"Proxy"| DOTNET_API
    JOB_API -->|"Proxy"| DOTNET_API
    EVENTS_API -->|"Proxy"| DOTNET_API
    
    %% .NET flow
    DOTNET_API --> ANALYZE_CTRL
    DOTNET_API --> JOBS_CTRL
    DOTNET_API --> INTERNAL_CTRL
    ANALYZE_CTRL --> JOB_SVC
    JOBS_CTRL --> JOB_SVC
    JOB_SVC --> DB
    JOB_SVC --> RUNNER_CLIENT
    RUNNER_CLIENT -->|"HTTP POST"| RUN_JOB
    
    %% Analysis execution
    RUN_JOB --> ANALYZER
    ANALYZER --> RETRIEVAL
    ANALYZER --> WEBSEARCH
    ANALYZER --> MBFC
    
    %% External calls
    ANALYZER -->|"AI SDK"| LLM_PROVIDERS
    WEBSEARCH --> SEARCH_PROVIDERS
    RETRIEVAL --> WEB

    %% Styling
    classDef external fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    classDef api fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    
    class LLM_PROVIDERS,SEARCH_PROVIDERS,WEB external
    class ANALYZER,RETRIEVAL,WEBSEARCH,MBFC core
    class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
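
The job flow in this diagram (job created through the API, executed through `RunnerClient` and `/api/internal/run-job`, result written back) implies a small status lifecycle. The sketch below uses the four `Status` values from the `JOB` entity; the allowed transitions and the `advance` helper are assumptions for illustration, since the source lists the states but not the moves between them.

```typescript
type JobStatus = "QUEUED" | "RUNNING" | "COMPLETE" | "FAILED";

// Assumed transitions: terminal states cannot be left.
const ALLOWED_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  QUEUED: ["RUNNING", "FAILED"],
  RUNNING: ["COMPLETE", "FAILED"],
  COMPLETE: [],
  FAILED: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

interface Job {
  jobId: string;
  status: JobStatus;
  progress: number; // 0-100
}

// Returns an updated copy of the job, or throws on an illegal transition.
function advance(job: Job, to: JobStatus, progress: number): Job {
  if (!canTransition(job.status, to)) {
    throw new Error("Illegal transition " + job.status + " -> " + to);
  }
  const clamped = Math.max(0, Math.min(100, progress));
  return { ...job, status: to, progress: clamped };
}
```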

4. Specification vs Implementation Gap Analysis

4.1 Data Model Gaps

Specification Entity	POC1 Status	Gap Description
-	-	-
Claim	⚠️ Partial	No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count`
Evidence	⚠️ Partial	Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score`
Source	⚠️ Partial	`FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler
Scenario	❌ Missing	Not implemented. Claims are evaluated directly without scenario contexts
Verdict	⚠️ Partial	`ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary`
User	❌ Missing	No user authentication or role system
Edit	❌ Missing	No audit trail for changes

4.2 AKEL Component Gaps

Spec Component	POC1 Status	Gap Description
-	-	-
AKEL Orchestrator	✅ Implemented	`runAnalysis()` function serves this role
Claim Extractor	✅ Implemented	`understandClaim()` with claim role/dependency tracking
Claim Classifier	⚠️ Partial	Risk tier (A/B/C) assigned, but no domain classification
Scenario Generator	❌ Missing	Claims evaluated without scenario extraction
Evidence Summarizer	✅ Implemented	`extractFacts()` function
Contradiction Detector	⚠️ Partial	`isContestedClaim` flag exists but no active contradiction search
Quality Gate Validator	❌ Missing	No source quality gates, no mandatory checks
Audit Sampling Scheduler	❌ Missing	No audit system
Embedding Handler	❌ Missing	Not needed for POC
Federation Sync	❌ Missing	Not needed for POC

4.3 Architecture Gaps

Spec Requirement	POC1 Status	Gap Description
-	-	-
Three-Layer Architecture	✅ Implemented	Interface (Next.js) → Processing (AKEL) → Data (SQLite)
LLM Abstraction Layer	✅ Implemented	AI SDK supports multiple providers with failover
PostgreSQL Primary DB	⚠️ Different	Using SQLite for simplicity (acceptable for POC)
Redis Caching	❌ Missing	No caching layer
S3 Archival	❌ Missing	No long-term storage
Background Jobs	❌ Missing	No scheduler for source updates, cache warming
Quality Monitoring	⚠️ Partial	LLM call counting exists, but no anomaly detection

4.4 Publication & Review Gaps

Spec Feature	POC1 Status	Gap Description
-	-	-
Risk Tier Publication Rules	❌ Missing	All results published immediately regardless of tier
Human Review Queue	❌ Missing	No review workflow
AI-Generated Labeling	⚠️ Partial	Results show "AI analysis" but no formal labeling system
Audit Rate Sampling	❌ Missing	No sampling audits

5. Optimization Recommendations

5.1 Cost Optimizations

pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25

Optimization	Estimated Savings	Implementation Effort
-	-	-
Cache claim understanding	30-50% on repeated claims	Medium
Use Haiku for fact extraction	40% on Step 2 costs	Low (config change)
Batch fact extraction	20% fewer API calls	Medium
Skip search for known claims	50%+ for cached claims	High (needs claim DB)
Reduce max iterations	Linear reduction	Low (config change)
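
The savings in this table are relative to a single step, so the effect on the whole analysis depends on that step's share of total cost. A small worked example, using the estimated shares from the pie chart; the helper name is hypothetical.

```typescript
// Estimated cost share per step, in percent of one analysis (from the pie chart).
const costSharePct = { understand: 15, research: 60, verdicts: 25 };

// Saving on the whole analysis when one step becomes cheaper by stepSavingPct.
function overallSavingPct(stepSharePct: number, stepSavingPct: number): number {
  return (stepSharePct * stepSavingPct) / 100;
}

// "Use Haiku for fact extraction": 40% saving on Step 2, which is 60% of total cost.
const haikuOverallSaving = overallSavingPct(costSharePct.research, 40); // 24
```

Under these estimates, a 40% saving on Step 2 corresponds to roughly 24% of the total LLM cost per analysis.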

5.2 Timing Optimizations

gantt
    title Current Analysis Timeline (Typical)
    dateFormat ss
    axisFormat %S sec
    
    section Current Flow
    URL Fetch           :a1, 00, 2s
    Step 1 Understand   :a2, after a1, 15s
    Search Iteration 1  :a3, after a2, 8s
    Fetch Sources 1     :a4, after a3, 10s
    Extract Facts 1     :a5, after a4, 12s
    Search Iteration 2  :a6, after a5, 8s
    Fetch Sources 2     :a7, after a6, 10s
    Extract Facts 2     :a8, after a7, 12s
    Generate Verdicts   :a9, after a8, 15s
    
    section Optimized Flow
    URL Fetch           :b1, 00, 2s
    Step 1 Understand   :b2, after b1, 10s
    Search + Fetch (parallel) :b3, after b2, 12s
    Extract Facts (batched)   :b4, after b3, 8s
    Generate Verdicts   :b5, after b4, 10s

Optimization	Time Savings	Notes
-	-	-
Parallel source fetching	Already implemented	Currently fetches 3 sources in parallel
Streaming LLM responses	20-30% perceived	User sees progress faster
Search query batching	10-15%	Send multiple queries to search API
Reduce prompt size	5-10% per call	Optimize system prompts
Use faster models for extraction	30-40% on Step 2	Claude Haiku vs Sonnet
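
Summing the task durations from the Gantt chart gives the end-to-end effect of the optimized flow. The durations below are copied from the chart; they are typical estimates, not measurements.

```typescript
// Task durations in seconds, in chart order.
const currentFlow = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlow = [2, 10, 12, 8, 10];

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

const currentTotal = sum(currentFlow); // 92 seconds
const optimizedTotal = sum(optimizedFlow); // 42 seconds
const reductionPct = Math.round(((currentTotal - optimizedTotal) / currentTotal) * 100); // 54
```

On these figures the optimized flow takes about 42 seconds instead of 92, a reduction of roughly 54%.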

5.3 Priority Recommendations

1. HIGH PRIORITY - Implement Claim Caching
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)

2. MEDIUM PRIORITY - Use Tiered Models
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)

3. LOW PRIORITY - Add Redis Cache
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
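
The tiered-model and cache recommendations could be captured in a small configuration object. This is a sketch only: the step keys, tier labels, and constant names are hypothetical, and the tiers would still have to be mapped to concrete provider model IDs.

```typescript
type PipelineStep = "understand" | "extract" | "verdict";
type ModelTier = "sonnet" | "haiku";

// Tier per step, as recommended above.
const MODEL_TIER_BY_STEP: Record<PipelineStep, ModelTier> = {
  understand: "sonnet", // needs reasoning
  extract: "haiku", // simple extraction
  verdict: "sonnet", // needs synthesis
};

// Cache lifetimes in seconds, as recommended above.
const CACHE_TTL_SECONDS = {
  sourceContent: 24 * 60 * 60, // 24h
  searchResults: 60 * 60, // 1h
};

function modelTierFor(step: PipelineStep): ModelTier {
  return MODEL_TIER_BY_STEP[step];
}
```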

6. Separated Verdict Architecture Proposal

6.1 Current Architecture

flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end

Issues:
- Every analysis re-processes all claims
- No caching of individual claim verdicts
- Article verdict tightly coupled to claim extraction

6.2 Proposed Separated Architecture

flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE[Article/Text Input]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[(Claim Cache<br/>━━━━━━━━━━━━━<br/>Key: claim_hash<br/>TTL: 7 days)]
        
        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]
        
        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT[Apply Article Context<br/>━━━━━━━━━━━━━<br/>• Claim relationships<br/>• Logical structure<br/>• Author intent]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3
    
    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3
    
    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE
    
    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE
    
    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
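
The two layers in this diagram can be sketched as one function: claim verdicts are taken from the cache where possible, while the article verdict is recomputed on every run. Everything here is an assumption for illustration: the function is synchronous for brevity (a real analysis would be asynchronous), `analyzeClaim` stands in for research plus LLM calls, and `applyArticleContext` stands in for the context step.

```typescript
interface ClaimVerdict { claimHash: string; truthPercentage: number }

interface ArticleResult {
  verdicts: ClaimVerdict[];
  cacheHits: number;
  cacheMisses: number;
  articleTruthPercentage: number;
}

function analyzeArticleSketch(
  claimHashes: string[],
  cache: Map<string, ClaimVerdict>,
  analyzeClaim: (claimHash: string) => ClaimVerdict,
  applyArticleContext: (averageTruth: number, verdicts: ClaimVerdict[]) => number,
): ArticleResult {
  let cacheHits = 0;
  let cacheMisses = 0;

  // Claim layer: reuse cached verdicts, compute and store the rest.
  const verdicts = claimHashes.map((hash) => {
    const cached = cache.get(hash);
    if (cached !== undefined) {
      cacheHits += 1;
      return cached;
    }
    cacheMisses += 1;
    const fresh = analyzeClaim(hash);
    cache.set(hash, fresh);
    return fresh;
  });

  // Article layer: always computed, never read from the cache.
  const total = verdicts.reduce((s, v) => s + v.truthPercentage, 0);
  const average = verdicts.length === 0 ? 0 : total / verdicts.length;
  return {
    verdicts,
    cacheHits,
    cacheMisses,
    articleTruthPercentage: applyArticleContext(average, verdicts),
  };
}
```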

6.3 Benefits Analysis

Benefit	Impact	Rationale
-	-	-
Cost Reduction	40-70% for repeated claims	Many articles share common claims (e.g., "COVID vaccines are safe")
Faster Analysis	50%+ for cached claims	Skip research + LLM calls for known claims
Consistency	High	Same claim always gets same verdict (until cache expires)
Freshness Control	Configurable TTL	Balance consistency vs. new evidence
Scalability	Linear improvement	More users = higher cache hit rate

6.4 Implementation Considerations

Claim Hashing Strategy:
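import { createHash } from "node:crypto";

// Normalize: lowercase, strip punctuation, collapse whitespace.
// Stemming is left out here because it needs an external library.
function normalize(claim: string): string {
  return claim.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").replace(/\s+/g, " ").trim();
}

function getClaimHash(claim: string): string {
  // Hash for cache key: first 16 hex characters (64 bits) of the SHA-256 digest
  return createHash("sha256").update(normalize(claim)).digest("hex").slice(0, 16);
}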

Cache Invalidation Triggers:
- TTL expiration (default 7 days)
- Major news event related to claim topic
- Significant change in a source's track record
- Manual invalidation by moderator
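
Three of these triggers (TTL expiry, topic-related news events, manual invalidation) can be sketched with an in-memory cache. The class, its method names, and the `topic` field are hypothetical, and the clock is injected so expiry can be exercised without waiting; a production version would more likely sit on Redis or a database.

```typescript
interface CachedVerdict { verdict: string; truthPercentage: number; topic: string }
interface CacheEntry { value: CachedVerdict; expiresAt: number }

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000; // default TTL

class ClaimVerdictCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = SEVEN_DAYS_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(claimHash: string, value: CachedVerdict): void {
    this.entries.set(claimHash, { value, expiresAt: this.now() + this.ttlMs });
  }

  // TTL expiration: expired entries are dropped on read.
  get(claimHash: string): CachedVerdict | undefined {
    const entry = this.entries.get(claimHash);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(claimHash);
      return undefined;
    }
    return entry.value;
  }

  // Manual invalidation, e.g. by a moderator.
  invalidate(claimHash: string): boolean {
    return this.entries.delete(claimHash);
  }

  // Topic-wide invalidation, e.g. after a major news event.
  invalidateTopic(topic: string): number {
    let removed = 0;
    for (const [hash, entry] of Array.from(this.entries)) {
      if (entry.value.topic === topic) {
        this.entries.delete(hash);
        removed += 1;
      }
    }
    return removed;
  }
}
```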

Article Verdict Considerations:
- Article verdict should ALWAYS be dynamic (never cached)
- Same claims in different article contexts may yield different article verdicts
- Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true

6.5 Recommendation

Yes, separating claim verdicts from article verdicts is beneficial, with the following caveats:

1. Claim verdicts should be cached with semantic similarity matching (not just exact match)
2. Article verdicts should always be dynamic to account for:
   - Claim relationships and logical structure
   - Author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions

3. Implementation phases:
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
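
Phase 2 could be approached with cosine similarity over claim embeddings. A minimal sketch, assuming each cached claim already has an embedding vector from some embedding model (not specified here); the function names, the linear scan, and the 0.9 threshold are illustrative assumptions.

```typescript
interface EmbeddedClaim { claimHash: string; embedding: number[] }

// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the hash of the most similar cached claim at or above the threshold, if any.
function findSimilarClaim(
  query: number[],
  cached: EmbeddedClaim[],
  threshold = 0.9, // assumed cut-off; would need tuning on real data
): string | undefined {
  let bestHash: string | undefined;
  let bestScore = threshold;
  for (const entry of cached) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      bestScore = score;
      bestHash = entry.claimHash;
    }
  }
  return bestHash;
}
```

A linear scan is fine for a small cache; a larger claim store would likely need a vector index instead.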

7. Summary

Current State

- POC1 implements core AKEL pipeline successfully
- Claim dependency tracking is implemented
- Multiple LLM providers supported
- No persistent claim storage or caching

Key Gaps from Specification

- No scenario extraction
- No user/role system
- No audit trail
- No source track record updates
- No review queue

Recommended Next Steps

1. Implement claim caching layer
2. Separate claim vs article verdict generation
3. Add Redis for source/search caching
4. Implement tiered model selection
5. Add basic audit logging