Wiki source code of Architecture
Last modified by Robert Schaub on 2025/12/24 20:16
= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.

== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); publish with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove necessary

== 2. High-Level Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.High-Level Architecture.WebHome"/}}

=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:

==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse

==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **LLM Abstraction Layer**: Provider-agnostic AI access
** Multi-provider support (Anthropic, OpenAI, Google, local models)
** Automatic failover and rate limit handling
** Per-stage model configuration
** Cost optimization through provider selection
** No vendor lock-in
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity

==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots
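The Redis layer described above typically follows the cache-aside pattern: check the cache, fall back to the database on a miss, populate the cache, and invalidate on claim updates. A minimal sketch, with all names hypothetical and a dict standing in for Redis:

```python
class ClaimCache:
    """Cache-aside sketch: try the cache, fall back to the DB, then populate."""

    def __init__(self, db_lookup):
        self.store: dict[str, dict] = {}   # stands in for Redis
        self.db_lookup = db_lookup         # e.g. a PostgreSQL query function
        self.hits = 0
        self.misses = 0

    def get_claim(self, claim_id: str) -> dict:
        if claim_id in self.store:
            self.hits += 1
            return self.store[claim_id]
        self.misses += 1
        claim = self.db_lookup(claim_id)
        self.store[claim_id] = claim       # warm the cache for next time
        return claim

    def invalidate(self, claim_id: str) -> None:
        self.store.pop(claim_id, None)     # drop stale entry on claim update

fake_db = lambda cid: {"id": cid, "verdict": "supported"}
cache = ClaimCache(fake_db)
cache.get_claim("c1")
cache.get_claim("c1")
print(cache.hits, cache.misses)  # 1 1
```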

**Optional future additions** (add only when metrics prove necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck

=== 2.2 LLM Abstraction Layer ===

{{include reference="Test.FactHarbor V0\.9\.105.Specification.Diagrams.LLM Abstraction Architecture.WebHome"/}}

**Purpose:** FactHarbor uses a provider-agnostic abstraction layer for all AI interactions, avoiding vendor lock-in and enabling flexible provider selection.

**Multi-Provider Support:**

* **Primary:** Anthropic Claude API (Haiku for extraction, Sonnet for analysis)
* **Secondary:** OpenAI GPT API (automatic failover)
* **Tertiary:** Google Vertex AI / Gemini
* **Future:** Local models (Llama, Mistral) for on-premises deployments

**Provider Interface:**

* Abstract `LLMProvider` interface with `complete()`, `stream()`, `getName()`, `getCostPer1kTokens()`, `isAvailable()` methods
* Per-stage model configuration (Stage 1: Haiku, Stages 2 & 3: Sonnet)
* Environment variable and database configuration
* Adapter pattern implementation (AnthropicProvider, OpenAIProvider, GoogleProvider)
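The interface above could be sketched as a Python abstract base class. The method names come from the list above; everything else (return types, the stub adapter body, the illustrative cost figure) is an assumption:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Provider-agnostic interface; concrete adapters wrap each vendor SDK."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

    @abstractmethod
    def stream(self, prompt: str): ...

    @abstractmethod
    def getName(self) -> str: ...

    @abstractmethod
    def getCostPer1kTokens(self) -> float: ...

    @abstractmethod
    def isAvailable(self) -> bool: ...

class AnthropicProvider(LLMProvider):
    """Adapter sketch; a real implementation would call the Anthropic SDK."""

    def complete(self, prompt: str) -> str:
        return f"[anthropic completion for: {prompt}]"

    def stream(self, prompt: str):
        yield self.complete(prompt)  # real code would stream token chunks

    def getName(self) -> str:
        return "anthropic"

    def getCostPer1kTokens(self) -> float:
        return 0.003  # illustrative figure, not a quoted price

    def isAvailable(self) -> bool:
        return True

provider: LLMProvider = AnthropicProvider()
print(provider.getName(), provider.isAvailable())
```

Because callers only see `LLMProvider`, swapping `AnthropicProvider` for an `OpenAIProvider` or `GoogleProvider` adapter needs no changes in pipeline code.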

**Configuration:**

* Runtime provider switching without code changes
* Admin API for provider management (`POST /admin/v1/llm/configure`)
* Per-stage cost optimization (use cheaper models for extraction, quality models for analysis)
* Support for rate limit handling and cost tracking

**Failover Strategy:**

* Automatic fallback: Primary → Secondary → Tertiary
* Circuit breaker pattern for unavailable providers
* Health checking and provider availability monitoring
* Graceful degradation when all providers are unavailable
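The fallback chain and circuit breaker above could look roughly like this. This is a sketch under stated assumptions, not FactHarbor's implementation; provider calls are stubbed and all names are hypothetical:

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures, skipping the provider."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1

def complete_with_failover(prompt, providers, breakers):
    """Try providers in priority order: Primary -> Secondary -> Tertiary."""
    for name, call in providers:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if breaker.open:
            continue  # provider recently failing; skip without calling it
        try:
            result = call(prompt)
            breaker.record(True)
            return name, result
        except Exception:
            breaker.record(False)  # fall through to the next provider
    raise RuntimeError("all providers unavailable")  # graceful-degradation point

def failing(prompt):  # stands in for an SDK call during an outage
    raise TimeoutError

providers = [("anthropic", failing), ("openai", lambda p: f"ok: {p}")]
breakers = {}
print(complete_with_failover("check claim", providers, breakers))
```

Once the primary's breaker opens, subsequent requests go straight to the secondary without paying the timeout cost.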

**Cost Optimization:**

* Track and compare costs across providers per request
* Enable A/B testing of different models for quality/cost tradeoffs
* Per-stage provider selection for optimal cost-efficiency
* Cost comparison: Anthropic ($0.114), OpenAI ($0.065), Google ($0.072) per article at 0% cache
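With the per-article figures quoted above, cost-driven selection reduces to picking the cheapest currently healthy provider. A sketch (the function name is hypothetical):

```python
# Per-article costs at 0% cache, from the comparison above.
COST_PER_ARTICLE = {"anthropic": 0.114, "openai": 0.065, "google": 0.072}

def cheapest_available(available: set[str]) -> str:
    """Pick the lowest-cost provider among those currently healthy."""
    candidates = {p: c for p, c in COST_PER_ARTICLE.items() if p in available}
    if not candidates:
        raise RuntimeError("no providers available")
    return min(candidates, key=candidates.get)

print(cheapest_available({"anthropic", "openai", "google"}))  # openai
print(cheapest_available({"anthropic", "google"}))            # google
```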

**Architecture Pattern:**

{{code}}
AKEL Stages           LLM Abstraction          Providers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1 Extract  ──→  Provider Interface  ──→  Anthropic (PRIMARY)
Stage 2 Analyze  ──→  Configuration       ──→  OpenAI (SECONDARY)
Stage 3 Holistic ──→  Failover Handler    ──→  Google (TERTIARY)
                                           └→  Local Models (FUTURE)
{{/code}}

**Benefits:**

* **No Vendor Lock-In:** Switch providers based on cost, quality, or availability without code changes
* **Resilience:** Automatic failover ensures service continuity during provider outages
* **Cost Efficiency:** Use the optimal provider per task (cheap for extraction, quality for analysis)
* **Quality Assurance:** Cross-provider output verification for critical claims
* **Regulatory Compliance:** Use specific providers for data residency requirements
* **Future-Proofing:** Easy integration of new models as they become available

**Cross-References:**

* [[Requirements>>FactHarbor.Specification.Requirements.WebHome#NFR-14]]: NFR-14 (formal requirement)
* [[POC Requirements>>FactHarbor.Specification.POC.Requirements#NFR-POC-11]]: NFR-POC-11 (POC1 implementation)
* [[API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome#Section-6]]: Section 6 (implementation details)
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions#Section-9]]: Section 9 (design rationale)

=== 2.3 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**

The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.

== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

See [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.

== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.

=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: Evidence, Scenarios, Sources, Verdict, Risk
* Parallelized across all claims
* Results aggregated for presentation
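The two-phase flow above can be sketched with `asyncio`, which gives the parallel fan-out across claims. Both phase functions are stand-ins for real LLM calls, and all names are hypothetical:

```python
import asyncio

async def extract_claims(text: str) -> list[str]:
    """Phase 1 stand-in: a real system would ask the LLM to split out claims."""
    return [c.strip() for c in text.split(".") if c.strip()]

async def analyze_claim(claim: str) -> dict:
    """Phase 2 stand-in: one LLM call per claim returning verdict and risk."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"claim": claim, "verdict": "needs-review", "risk": "low"}

async def process_submission(text: str) -> list[dict]:
    claims = await extract_claims(text)                       # Phase 1
    return await asyncio.gather(*map(analyze_claim, claims))  # Phase 2, parallel

results = asyncio.run(process_submission("The sky is blue. Water boils at 100C."))
print([r["claim"] for r in results])
```

Because each `analyze_claim` call is independent, wall-clock time is bounded by the slowest single claim rather than the sum of all claims.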

**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review

=== Architectural Benefits ===

**Scalability:**

* Process 100 claims with roughly 3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points

== 4. Storage Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Storage Architecture.WebHome"/}}

See [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]] for detailed information.

== 4.5 Versioning Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.Versioning Architecture.WebHome"/}}

== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here's how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===

**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
4. **Extract Scenarios**: Identify different contexts from evidence
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy
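The six steps above can be sketched as a single pipeline function. Only the step order comes from the list; every step body here is a stand-in stub, and all field names are hypothetical:

```python
def akel_pipeline(claim_text: str) -> dict:
    """Sequential sketch of the six AKEL steps with stand-in stubs."""
    # 1. Parse & Extract: identify components, entities, assertions
    components = {"entities": [claim_text.split()[0]], "assertion": claim_text}
    # 2. Gather Evidence: search web and database for relevant sources
    evidence = [{"source": "example.org", "relevance": 0.8}]
    # 3. Check Sources: evaluate reliability using track records
    checked = [e for e in evidence if e["relevance"] >= 0.5]
    # 4. Extract Scenarios: identify different contexts from evidence
    scenarios = [{"context": "general", "evidence": checked}]
    # 5. Synthesize Verdict: compile evidence assessment per scenario
    verdicts = [{"scenario": s["context"], "assessment": "supported"}
                for s in scenarios]
    # 6. Calculate Risk: assess potential harm and controversy
    risk = "low" if len(checked) == len(evidence) else "elevated"
    return {"components": components, "scenarios": scenarios,
            "verdicts": verdicts, "risk": risk}

record = akel_pipeline("Water boils at 100C at sea level")
print(record["verdicts"][0]["assessment"], record["risk"])
```

In the real pipeline several of these steps run in parallel (see the timing note below the outputs list); the sketch keeps them sequential for readability.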

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)

=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from the past week
* Calculate source accuracy and reliability
* Update the source_track_record table
* Never triggered by individual claims (prevents circular dependencies)

**Cache Management** (Continuous):

* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates

**Metrics Aggregation** (Hourly):

* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports

**Data Archival** (Daily):

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and back up data

=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified

=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items; the system learns from their decisions

== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances

=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines

=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously

== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

The system tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues

=== 7.2 Alerts ===

Automated alerts fire for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% of capacity (scaling needed)
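The thresholds above reduce to a simple evaluation function; a sketch with hypothetical metric names:

```python
def check_alerts(metrics: dict) -> list[str]:
    """Evaluate current metrics against the alert thresholds listed above."""
    alerts = []
    if metrics["processing_seconds"] > 30:
        alerts.append("processing time threshold breach")
    if metrics["error_rate"] > 0.01:
        alerts.append("error rate quality issue")
    if metrics["cache_hit_rate"] < 0.80:
        alerts.append("cache problem")
    if metrics["db_connection_usage"] > 0.80:
        alerts.append("database scaling needed")
    return alerts

healthy = {"processing_seconds": 12, "error_rate": 0.002,
           "cache_hit_rate": 0.93, "db_connection_usage": 0.40}
degraded = dict(healthy, processing_seconds=45, cache_hit_rate=0.70)
print(check_alerts(healthy))   # []
print(check_alerts(degraded))
```

In practice this logic would live in Prometheus alerting rules rather than application code; the function just makes the thresholds concrete.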

=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times

== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse

=== 8.2 Data Security ===

* **Encryption**: TLS for transport, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
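Parameterized queries keep hostile input as inert data rather than executable SQL. A self-contained illustration using `sqlite3` as a stand-in for PostgreSQL (whose drivers use `%s` or `$1` placeholders instead of `?`, but the principle is identical):

```python
import sqlite3

# In-memory SQLite stands in for the real PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO claims (text) VALUES (?)", ("The sky is blue",))

user_input = "blue'; DROP TABLE claims; --"  # hostile input stays inert data
rows = conn.execute(
    "SELECT id, text FROM claims WHERE text LIKE ?",  # parameterized, never f-strings
    (f"%{user_input}%",),
).fetchall()
print(rows)  # [] — the injection attempt matched nothing and executed nothing

remaining = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
print(remaining)  # table still intact
```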

=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs

== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load Balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed

=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment

=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="FactHarbor.Specification.Diagrams.Federation Architecture.WebHome"/}}

== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

* **Elasticsearch**: When PostgreSQL search consistently >500ms
* **TimescaleDB**: When metrics queries consistently >1s
* **Federation**: When 10,000+ users and explicit demand
* **Complex Reputation**: When 100+ active contributors

=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>FactHarbor.Specification.Federation & Decentralization.WebHome]] for future plans.

== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)

**Frontend**:

* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO

**AI/LLM**:

* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support

**Infrastructure**:

* Docker containers
* Kubernetes or cloud platform auto-scaling
* S3-compatible object storage

**Monitoring**:

* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)

== 12. Related Pages ==

* [[AI Knowledge Extraction Layer (AKEL)>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Storage Strategy>>FactHarbor.Specification.Architecture.WebHome]]
* [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
* [[API Layer>>FactHarbor.Specification.Architecture.WebHome]]
* [[Design Decisions>>FactHarbor.Specification.Design-Decisions]]
* [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]]