Changes for page Architecture

Last modified by Robert Schaub on 2025/12/24 21:53

From version 1.1, edited by Robert Schaub on 2025/12/18 12:03 (change comment: Imported from XAR)
To version 4.1, edited by Robert Schaub on 2025/12/24 21:53 (change comment: Imported from XAR)

==== Processing Layer ====
Core business logic and AI processing:
* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
  * Parse and extract claim components
  * Gather evidence from multiple sources
  * Check source track records
  * Extract scenarios from evidence
  * Synthesize verdicts
  * Calculate risk scores
* **LLM Abstraction Layer**: Provider-agnostic AI access
  * Multi-provider support (Anthropic, OpenAI, Google, local models)
  * Automatic failover and rate limit handling
  * Per-stage model configuration
  * Cost optimization through provider selection
  * No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
  * Source track record updates (weekly)
  * Cache warming and invalidation
  * Metrics aggregation
  * Data archival
* **Quality Monitoring**: Automated quality checks
  * Anomaly detection
  * Contradiction detection
  * Completeness validation
* **Moderation Detection**: Automated abuse detection
  * Spam identification
  * Manipulation detection
  * Flag suspicious activity
==== Data & Storage Layer ====
Persistent data storage and caching:
* **PostgreSQL**: Primary database for all core data
  * Claims, evidence, sources, users
  * Scenarios, edits, audit logs
  * Built-in full-text search
  * Time-series capabilities for metrics
* **Redis**: High-speed caching layer
  * Session data
  * Frequently accessed claims
  * API rate limiting
* **S3 Storage**: Long-term archival
  * Old edit history (90+ days)
  * AKEL processing logs
  * Backup snapshots
**Optional future additions** (add only when metrics prove necessary):
* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 LLM Abstraction Layer ===

{{include reference="FactHarbor.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}

**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.

**Multi-Provider Support:**
* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
* **Secondary:** OpenAI GPT API (automatic failover)
* **Tertiary:** Google Vertex AI / Gemini
* **Future:** Local models (Llama, Mistral) for on-premises deployments

**Provider Interface:**
* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
* Per-stage model configuration (Stage 1: Haiku, Stages 2 & 3: Sonnet)
* Environment variable and database configuration
* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
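
The interface and adapter pattern above can be sketched as follows; the method names come from this list, while the request/response shapes and the example cost rate are illustrative assumptions:

{{code language="typescript"}}
// Sketch only: request/response shapes and cost figures are assumptions.
interface LLMRequest { prompt: string; maxTokens?: number; }
interface LLMResponse { text: string; inputTokens: number; outputTokens: number; }

interface LLMProvider {
  complete(req: LLMRequest): Promise<LLMResponse>;
  stream(req: LLMRequest): AsyncIterable<string>;
  getName(): string;
  getCostPer1kTokens(): number;
  isAvailable(): Promise<boolean>;
}

// Adapter pattern: each vendor SDK is wrapped behind the same interface.
class AnthropicProvider implements LLMProvider {
  constructor(private apiKey: string, private model: string) {}
  async complete(req: LLMRequest): Promise<LLMResponse> {
    // A real adapter would call the vendor SDK here and map its response.
    throw new Error("sketch only - wire up the vendor SDK here");
  }
  async *stream(req: LLMRequest): AsyncIterable<string> {
    const res = await this.complete(req);
    yield res.text;
  }
  getName(): string { return "anthropic"; }
  getCostPer1kTokens(): number { return 0.003; } // illustrative rate only
  async isAvailable(): Promise<boolean> { return true; }
}
{{/code}}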

**Configuration:**
* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
* Support for rate limit handling and cost tracking
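
For illustration, a hypothetical call to the admin endpoint above; the path comes from this spec, but the payload field names and model identifiers are assumptions, not the actual schema:

{{code language="typescript"}}
// Hypothetical payload: field names and model IDs are illustrative only.
await fetch("/admin/v1/llm/configure", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    stage1: { provider: "anthropic", model: "claude-haiku" },
    stage2: { provider: "anthropic", model: "claude-sonnet" },
    stage3: { provider: "anthropic", model: "claude-sonnet" },
    fallbackOrder: ["anthropic", "openai", "google"],
  }),
});
{{/code}}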

**Failover Strategy:**
* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checking and provider availability monitoring
* Graceful degradation when all providers are unavailable
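
A minimal sketch of the fallback chain, assuming the `LLMProvider` interface sketched earlier; the circuit breaker is reduced to a simple availability check:

{{code language="typescript"}}
// Try providers in priority order; skip any whose health check fails.
async function completeWithFailover(
  providers: LLMProvider[], // [primary, secondary, tertiary]
  req: LLMRequest,
): Promise<LLMResponse> {
  for (const provider of providers) {
    if (!(await provider.isAvailable())) continue; // circuit-breaker stand-in
    try {
      return await provider.complete(req);
    } catch (err) {
      console.warn(`${provider.getName()} failed, trying next provider`, err);
    }
  }
  // Graceful degradation: the caller decides how to queue or surface the outage.
  throw new Error("All LLM providers unavailable");
}
{{/code}}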

**Cost Optimization:**
* Track and compare costs across providers per request
* Enable A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost-efficiency
* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
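
As a sketch of cost-aware selection, a naive helper that picks the cheapest currently available provider; a real policy would also weigh per-stage quality requirements:

{{code language="typescript"}}
// Naive cost-based selection using the interface methods sketched above.
async function cheapestAvailableProvider(providers: LLMProvider[]): Promise<LLMProvider> {
  const available: LLMProvider[] = [];
  for (const p of providers) {
    if (await p.isAvailable()) available.push(p);
  }
  if (available.length === 0) throw new Error("No LLM provider available");
  return available.reduce((a, b) =>
    a.getCostPer1kTokens() <= b.getCostPer1kTokens() ? a : b);
}
{{/code}}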

**Architecture Pattern:**

{{code}}
AKEL Stages          LLM Abstraction           Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→ Provider Interface  ──→  Anthropic (PRIMARY)
Stage 2 Analyze  ──→ Configuration       ──→  OpenAI (SECONDARY)
Stage 3 Holistic ──→ Failover Handler    ──→  Google (TERTIARY)
                                          └─→ Local Models (FUTURE)
{{/code}}

**Benefits:**
* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use the optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers for data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===
**Start Simple, Evolve Based on Metrics**
The architecture deliberately starts simple:
* Measure before optimizing (add complexity only when proven necessary)
See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.

== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:
* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.

=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:
* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis (sketched below):
* Each claim processed independently by LLM
* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
* Parallelized across all claims
* Results aggregated for presentation
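
A compact sketch of this two-phase flow; `extractClaims` and `analyzeClaim` are hypothetical helpers, each representing one LLM call through the abstraction layer:

{{code language="typescript"}}
interface ClaimAnalysis {
  claim: string;
  evidence: string[];
  scenarios: string[];
  sources: string[];
  verdict: string;
  risk: number;
}

// Hypothetical helpers, each backed by one LLM call via the provider abstraction.
declare function extractClaims(content: string): Promise<string[]>;
declare function analyzeClaim(claim: string): Promise<ClaimAnalysis>;

async function processSubmission(content: string): Promise<ClaimAnalysis[]> {
  // Phase 1: extract all distinct, verifiable claims from the submission.
  const claims = await extractClaims(content);
  // Phase 2: analyze each claim independently, in parallel.
  return Promise.all(claims.map((claim) => analyzeClaim(claim)));
}
{{/code}}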

**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:
* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):
* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing (sketched below)

Phase 3 - Verdict Generation (Parallel):
* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review
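
A sketch of how a quality gate and human-review routing might sit between these phases; all names, the gate policy, and the 0.6 threshold are illustrative assumptions:

{{code language="typescript"}}
interface Evidence { items: string[]; sources: string[]; }
interface Verdict { label: string; confidence: number; needsHumanReview: boolean; }

// Hypothetical stage functions for the production pipeline.
declare function gatherEvidence(claim: string): Promise<Evidence>;
declare function generateVerdict(claim: string, ev: Evidence): Promise<Verdict>;

// Illustrative gate policy: require evidence drawn from at least two sources.
function passesQualityGate(ev: Evidence): boolean {
  return ev.items.length > 0 && new Set(ev.sources).size >= 2;
}

async function processClaimProduction(claim: string): Promise<Verdict | null> {
  const evidence = await gatherEvidence(claim);             // Phase 2
  if (!passesQualityGate(evidence)) return null;            // gate blocks poor data
  const verdict = await generateVerdict(claim, evidence);   // Phase 3
  // Illustrative threshold: low-confidence results are routed to human review.
  return verdict.confidence < 0.6
    ? { ...verdict, needsHumanReview: true }
    : verdict;
}
{{/code}}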

=== Architectural Benefits ===

**Scalability:**
* Process 100 claims with ~3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**
* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**
* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==
{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}
See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.
=== 5.3 Quality Monitoring ===
**Automated checks run continuously**:
* **Anomaly Detection**: Flag unusual patterns (a minimal check is sketched after this list)
  * Sudden confidence score changes
  * Unusual evidence distributions
  * Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
  * Evidence that contradicts other evidence
  * Claims with internal contradictions
  * Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
  * Sufficient evidence gathered
  * Multiple source types represented
  * Key scenarios identified
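
A minimal sketch of one such check, flagging sudden confidence score changes between processing runs; the type, function name, and 0.3 default threshold are illustrative assumptions:

{{code language="typescript"}}
interface ScoreSample { claimId: string; confidence: number; at: Date; }

// Flag a claim whose confidence moved more than `threshold` between
// consecutive runs; flagged claims would be queued for review.
function hasSuddenConfidenceChange(
  history: ScoreSample[], // samples for one claim, oldest first
  threshold = 0.3,
): boolean {
  for (let i = 1; i < history.length; i++) {
    if (Math.abs(history[i].confidence - history[i - 1].confidence) > threshold) {
      return true;
    }
  }
  return false;
}
{{/code}}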
=== 5.4 Moderation Detection ===
**Automated abuse detection**:
* **Spam Identification**: Pattern matching for spam claims
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==
=== 10.1 When to Add Complexity ===
See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.
**Elasticsearch**: When PostgreSQL search consistently >500ms
**TimescaleDB**: When metrics queries consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===