Wiki source code of Architecture
Last modified by Robert Schaub on 2026/02/08 08:23
= Architecture =

FactHarbor's architecture is designed for **simplicity, automation, and continuous improvement**.
== 1. Core Principles ==

* **AI-First**: AKEL (AI) is the primary system; humans supplement it
* **Publish by Default**: No centralized approval (removed in V0.9.50); publish with confidence scores
* **System Over Data**: Fix algorithms, not individual outputs
* **Measure Everything**: Quality metrics drive improvements
* **Scale Through Automation**: Minimal human intervention
* **Start Simple**: Add complexity only when metrics prove it necessary
== 2. High-Level Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.High-Level Architecture.WebHome"/}}
=== 2.1 Three-Layer Architecture ===

FactHarbor uses a clean three-layer architecture:
==== Interface Layer ====

Handles all user and system interactions:

* **Web UI**: Browse claims, view evidence, submit feedback
* **REST API**: Programmatic access for integrations
* **Authentication & Authorization**: User identity and permissions
* **Rate Limiting**: Protect against abuse
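Rate limiting at the interface layer can be sketched as a per-client token bucket: each client gets a burst allowance that refills over time. This is an illustrative in-memory sketch, not FactHarbor's actual limiter; `TokenBucket`, its parameters, and the chosen limits are all assumptions (production would typically keep the counters in Redis, as noted in the storage layer below).

```python
import time

class TokenBucket:
    """Per-client token bucket: up to `capacity` requests in a burst,
    refilled at `rate` tokens per second. Hypothetical sketch only."""

    def __init__(self, capacity: int = 10, rate: float = 5.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of 5 requests against a capacity of 3 (slow refill):
bucket = TokenBucket(capacity=3, rate=0.1)
results = [bucket.allow() for _ in range(5)]
```

The first three requests pass and the remaining two are rejected until the bucket refills; a server would map rejections to HTTP 429.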
==== Processing Layer ====

Core business logic and AI processing:

* **AKEL Pipeline**: AI-driven claim analysis (parallel processing)
** Parse and extract claim components
** Gather evidence from multiple sources
** Check source track records
** Extract scenarios from evidence
** Synthesize verdicts
** Calculate risk scores
* **Background Jobs**: Automated maintenance tasks
** Source track record updates (weekly)
** Cache warming and invalidation
** Metrics aggregation
** Data archival
* **Quality Monitoring**: Automated quality checks
** Anomaly detection
** Contradiction detection
** Completeness validation
* **Moderation Detection**: Automated abuse detection
** Spam identification
** Manipulation detection
** Flag suspicious activity
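The AKEL pipeline steps above can be sketched as an async flow in which independent stages run concurrently. All function names and return shapes here are illustrative stand-ins (the real stages call LLMs and search backends), not FactHarbor's actual API:

```python
import asyncio

# Stub stages standing in for LLM and search calls.
async def parse_claim(text):       return {"claim": text, "entities": []}
async def gather_evidence(parsed): return [{"source": "example.org", "snippet": "..."}]
async def check_sources(ev):       return [{**e, "reliability": 0.8} for e in ev]
async def extract_scenarios(ev):   return [{"context": "general", "evidence": ev}]
async def synthesize_verdict(sc):  return {"verdict": "supported", "scenarios": sc}
async def calculate_risk(parsed):  return {"risk": "low"}

async def akel_pipeline(text):
    parsed = await parse_claim(text)
    # Evidence gathering and risk scoring don't depend on each other,
    # so they run concurrently -- the source of the pipeline's speedup.
    evidence, risk = await asyncio.gather(
        gather_evidence(parsed), calculate_risk(parsed))
    scored = await check_sources(evidence)
    scenarios = await extract_scenarios(scored)
    verdict = await synthesize_verdict(scenarios)
    return {**verdict, **risk}

result = asyncio.run(akel_pipeline("Water boils at 100 C at sea level."))
```

Stages with a data dependency (sources depend on evidence, verdicts on scenarios) stay sequential; only independent stages are gathered.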
==== Data & Storage Layer ====

Persistent data storage and caching:

* **PostgreSQL**: Primary database for all core data
** Claims, evidence, sources, users
** Scenarios, edits, audit logs
** Built-in full-text search
** Time-series capabilities for metrics
* **Redis**: High-speed caching layer
** Session data
** Frequently accessed claims
** API rate limiting
* **S3 Storage**: Long-term archival
** Old edit history (90+ days)
** AKEL processing logs
** Backup snapshots

**Optional future additions** (add only when metrics prove them necessary):

* **Elasticsearch**: If PostgreSQL full-text search becomes slow
* **TimescaleDB**: If metrics queries become a bottleneck
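"Frequently accessed claims" in Redis implies a cache-aside pattern: read the cache first, fall back to PostgreSQL on a miss, and populate the cache with a TTL. The sketch below is a minimal illustration under assumptions — `FakeRedis` is an in-memory stand-in (mimicking Redis `SETEX`/`GET`) so it runs without a server, and the key format and TTL are invented:

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis setex/get with TTL."""
    def __init__(self):
        self._store = {}
    def setex(self, key, ttl, value):
        self._store[key] = (value, time.monotonic() + ttl)
    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.monotonic() > expires:
            del self._store[key]   # lazy expiry
            return None
        return value

cache = FakeRedis()
db_reads = 0

def load_claim_from_db(claim_id):
    global db_reads
    db_reads += 1                  # count trips to the "database"
    return f"claim-record-{claim_id}"

def get_claim(claim_id, ttl=300):
    key = f"claim:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached              # cache hit
    record = load_claim_from_db(claim_id)
    cache.setex(key, ttl, record)  # populate on miss
    return record

first = get_claim(42)   # miss -> reads the database
second = get_claim(42)  # hit  -> served from cache
```

The second read never touches the database, which is the behavior the cache hit rate metric (Section 7) measures.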
=== 2.2 Design Philosophy ===

**Start Simple, Evolve Based on Metrics**

The architecture deliberately starts simple:

* Single primary database (PostgreSQL handles most workloads initially)
* Three clear layers (easy to understand and maintain)
* Automated operations (minimal human intervention)
* Measure before optimizing (add complexity only when proven necessary)

See [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] and [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for detailed rationale.
== 3. AKEL Architecture ==

{{include reference="FactHarbor.Specification.Diagrams.AKEL_Architecture.WebHome"/}}

See [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] for detailed information.
== 3.5 Claim Processing Architecture ==

FactHarbor's claim processing architecture is designed to handle both single-claim and multi-claim submissions efficiently.

=== Multi-Claim Handling ===

Users often submit:

* **Text with multiple claims**: Articles, statements, or paragraphs containing several distinct factual claims
* **Web pages**: URLs that are analyzed to extract all verifiable claims
* **Single claims**: Simple, direct factual statements

The first processing step is always **Claim Extraction**: identifying and isolating individual verifiable claims from submitted content.
=== Processing Phases ===

**POC Implementation (Two-Phase):**

Phase 1 - Claim Extraction:

* LLM analyzes submitted content
* Extracts all distinct, verifiable claims
* Returns structured list of claims with context

Phase 2 - Parallel Analysis:

* Each claim processed independently by LLM
* Single call per claim generates: evidence, scenarios, sources, verdict, risk
* Parallelized across all claims
* Results aggregated for presentation
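The two-phase POC flow can be sketched as extract-then-fan-out. Both phase functions are stand-ins for LLM calls (the real Phase 1 does far more than sentence splitting), and the return shapes are assumptions:

```python
import asyncio

async def extract_claims(text):
    # Phase 1 stand-in: the real system uses an LLM; here we naively
    # treat each sentence as one claim.
    return [s.strip() for s in text.split(".") if s.strip()]

async def analyze_claim(claim):
    # Phase 2 stand-in for the single LLM call per claim.
    return {"claim": claim, "verdict": "unverified", "confidence": 0.5}

async def process_submission(text):
    claims = await extract_claims(text)                       # Phase 1
    # Phase 2: every claim analyzed concurrently, results in order.
    return await asyncio.gather(*map(analyze_claim, claims))

results = asyncio.run(process_submission(
    "The Earth is round. The Moon is made of cheese."))
```

Because `asyncio.gather` fans out all per-claim calls at once, total latency is bounded by the slowest claim rather than the sum of all claims.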
**Production Implementation (Three-Phase):**

Phase 1 - Extraction + Validation:

* Extract claims from content
* Validate clarity and uniqueness
* Filter vague or duplicate claims

Phase 2 - Evidence Gathering (Parallel):

* Independent evidence gathering per claim
* Source validation and scenario generation
* Quality gates prevent poor data from advancing

Phase 3 - Verdict Generation (Parallel):

* Generate verdict from validated evidence
* Confidence scoring and risk assessment
* Low-confidence cases routed to human review
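The "low-confidence cases routed to human review" step in Phase 3 amounts to a threshold check at the end of the pipeline. The cutoff value and field names below are illustrative assumptions, not documented FactHarbor values:

```python
REVIEW_THRESHOLD = 0.6  # illustrative cutoff, not a documented value

def route_verdict(verdict: dict) -> str:
    """Send low-confidence verdicts to human review; publish the rest."""
    if verdict["confidence"] < REVIEW_THRESHOLD:
        return "human_review"
    return "publish"

routes = [route_verdict(v) for v in (
    {"claim": "A", "confidence": 0.92},
    {"claim": "B", "confidence": 0.41},
)]
```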
=== Architectural Benefits ===

**Scalability:**

* Process 100 claims with 3x the latency of a single claim
* Parallel processing across independent claims
* Linear cost scaling with claim count

**Quality:**

* Validation gates between phases
* Errors isolated to individual claims
* Clear observability per processing step

**Flexibility:**

* Each phase optimizable independently
* Can use different model sizes per phase
* Easy to add human review at decision points
== 4. Storage Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Storage Architecture.WebHome"/}}

See [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] for detailed information.

== 4.5 Versioning Architecture ==

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Versioning Architecture.WebHome"/}}
== 5. Automated Systems in Detail ==

FactHarbor relies heavily on automation to achieve scale and quality. Here is how each automated system works:

=== 5.1 AKEL (AI Knowledge Extraction Layer) ===
**What it does**: Primary AI processing engine that analyzes claims automatically

**Inputs**:

* User-submitted claim text
* Existing evidence and sources
* Source track record database

**Processing steps**:

1. **Parse & Extract**: Identify key components, entities, assertions
2. **Gather Evidence**: Search web and database for relevant sources
3. **Check Sources**: Evaluate source reliability using track records
4. **Extract Scenarios**: Identify different contexts from evidence
5. **Synthesize Verdict**: Compile evidence assessment per scenario
6. **Calculate Risk**: Assess potential harm and controversy

**Outputs**:

* Structured claim record
* Evidence links with relevance scores
* Scenarios with context descriptions
* Verdict summary per scenario
* Overall confidence score
* Risk assessment

**Timing**: 10-18 seconds total (parallel processing)
=== 5.2 Background Jobs ===

**Source Track Record Updates** (Weekly):

* Analyze claim outcomes from past week
* Calculate source accuracy and reliability
* Update source_track_record table
* Never triggered by individual claims (prevents circular dependencies)

**Cache Management** (Continuous):

* Warm cache for popular claims
* Invalidate cache on claim updates
* Monitor cache hit rates

**Metrics Aggregation** (Hourly):

* Roll up detailed metrics
* Calculate system health indicators
* Generate performance reports

**Data Archival** (Daily):

* Move old AKEL logs to S3 (90+ days)
* Archive old edit history
* Compress and backup data
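The daily archival job's core decision is the 90-day cutoff. A minimal sketch of that selection step, assuming log records carry a `created_at` timestamp (the record shape and function names are illustrative; the real job would then upload the `archive` set to S3):

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER_DAYS = 90

def partition_logs(logs, now=None):
    """Split logs into (keep, archive) around the 90-day cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=ARCHIVE_AFTER_DAYS)
    keep = [l for l in logs if l["created_at"] >= cutoff]
    archive = [l for l in logs if l["created_at"] < cutoff]
    return keep, archive

now = datetime(2026, 2, 8, tzinfo=timezone.utc)
logs = [
    {"id": 1, "created_at": now - timedelta(days=10)},   # recent
    {"id": 2, "created_at": now - timedelta(days=120)},  # past cutoff
]
keep, archive = partition_logs(logs, now=now)
```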
=== 5.3 Quality Monitoring ===

**Automated checks run continuously**:

* **Anomaly Detection**: Flag unusual patterns
** Sudden confidence score changes
** Unusual evidence distributions
** Suspicious source patterns
* **Contradiction Detection**: Identify conflicts
** Evidence that contradicts other evidence
** Claims with internal contradictions
** Source track record anomalies
* **Completeness Validation**: Ensure thoroughness
** Sufficient evidence gathered
** Multiple source types represented
** Key scenarios identified
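The completeness validation above can be sketched as a rule list that returns named failures for a processed claim. The thresholds, field names, and failure labels are illustrative assumptions, not documented values:

```python
def completeness_check(claim, min_evidence=3, min_source_types=2):
    """Return the list of completeness failures for a processed claim."""
    problems = []
    if len(claim["evidence"]) < min_evidence:
        problems.append("insufficient_evidence")        # too little evidence
    if len({e["source_type"] for e in claim["evidence"]}) < min_source_types:
        problems.append("too_few_source_types")         # one-sided sourcing
    if not claim["scenarios"]:
        problems.append("no_scenarios")                 # no contexts identified
    return problems

claim = {
    "evidence": [{"source_type": "news"}, {"source_type": "news"}],
    "scenarios": [],
}
issues = completeness_check(claim)
```

An empty result means the claim passes; any non-empty result could block publication or trigger reprocessing.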
=== 5.4 Moderation Detection ===

**Automated abuse detection**:

* **Spam Identification**: Pattern matching for spam claims
* **Manipulation Detection**: Identify coordinated editing
* **Gaming Detection**: Flag attempts to game source scores
* **Suspicious Activity**: Log unusual behavior patterns

**Human Review**: Moderators review flagged items, and the system learns from their decisions.
== 6. Scalability Strategy ==

=== 6.1 Horizontal Scaling ===

Components scale independently:

* **AKEL Workers**: Add more processing workers as claim volume grows
* **Database Read Replicas**: Add replicas for read-heavy workloads
* **Cache Layer**: Redis cluster for distributed caching
* **API Servers**: Load-balanced API instances
=== 6.2 Vertical Scaling ===

Individual components can be upgraded:

* **Database Server**: Increase CPU/RAM for PostgreSQL
* **Cache Memory**: Expand Redis memory
* **Worker Resources**: More powerful AKEL worker machines
=== 6.3 Performance Optimization ===

Built-in optimizations:

* **Denormalized Data**: Cache summary data in claim records (70% fewer joins)
* **Parallel Processing**: AKEL pipeline processes steps in parallel (40% faster)
* **Intelligent Caching**: Redis caches frequently accessed data
* **Background Processing**: Non-urgent tasks run asynchronously
== 7. Monitoring & Observability ==

=== 7.1 Key Metrics ===

The system tracks:

* **Performance**: AKEL processing time, API response time, cache hit rate
* **Quality**: Confidence score distribution, evidence completeness, contradiction rate
* **Usage**: Claims per day, active users, API requests
* **Errors**: Failed AKEL runs, API errors, database issues
=== 7.2 Alerts ===

Automated alerts for:

* Processing time >30 seconds (threshold breach)
* Error rate >1% (quality issue)
* Cache hit rate <80% (cache problem)
* Database connections >80% of capacity (scaling needed)
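The four alert rules above reduce to a threshold table plus a comparison direction (some metrics alert when too high, cache hit rate when too low). A minimal evaluation sketch, assuming these metric names (the real deployment would express the same rules in Prometheus alerting syntax):

```python
# (direction, limit) per metric; thresholds taken from the list above.
THRESHOLDS = {
    "processing_seconds":  ("gt", 30),
    "error_rate":          ("gt", 0.01),
    "cache_hit_rate":      ("lt", 0.80),
    "db_connection_usage": ("gt", 0.80),
}

def check_alerts(metrics: dict) -> list:
    """Return the names of metrics that breach their thresholds."""
    fired = []
    for name, (op, limit) in THRESHOLDS.items():
        value = metrics[name]
        if (op == "gt" and value > limit) or (op == "lt" and value < limit):
            fired.append(name)
    return fired

alerts = check_alerts({
    "processing_seconds": 12,
    "error_rate": 0.03,          # breaches the 1% limit
    "cache_hit_rate": 0.91,
    "db_connection_usage": 0.55,
})
```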
=== 7.3 Dashboards ===

Real-time monitoring:

* **System Health**: Overall status and key metrics
* **AKEL Performance**: Processing time breakdown
* **Quality Metrics**: Confidence scores, completeness
* **User Activity**: Usage patterns, peak times
== 8. Security Architecture ==

=== 8.1 Authentication & Authorization ===

* **User Authentication**: Secure login with password hashing
* **Role-Based Access**: Reader, Contributor, Moderator, Admin
* **API Keys**: For programmatic access
* **Rate Limiting**: Prevent abuse
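If the four roles form a strict hierarchy (an assumption — the source lists the roles but not their exact relationship), a permission check can be a rank comparison:

```python
# Assumed hierarchy: each role inherits the permissions below it.
ROLE_RANK = {"reader": 0, "contributor": 1, "moderator": 2, "admin": 3}

def can(user_role: str, required_role: str) -> bool:
    """True if user_role ranks at or above required_role."""
    return ROLE_RANK[user_role] >= ROLE_RANK[required_role]

moderator_can_edit = can("moderator", "contributor")
reader_can_moderate = can("reader", "moderator")
```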
=== 8.2 Data Security ===

* **Encryption**: TLS in transit, encrypted storage for sensitive data
* **Audit Logging**: Track all significant changes
* **Input Validation**: Sanitize all user inputs
* **SQL Injection Protection**: Parameterized queries
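Parameterized queries bind user input as data rather than splicing it into SQL text. A self-contained sketch using `sqlite3` as a stand-in for PostgreSQL so it runs anywhere (with `psycopg` the placeholder style would be `%s` instead of `?`; table and function names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO claims (text) VALUES (?)", ("Water is wet",))

def find_claims(conn, user_input):
    # The driver binds user_input as a value, never as SQL, so a payload
    # like "' OR 1=1 --" cannot alter the query's structure.
    cur = conn.execute("SELECT id, text FROM claims WHERE text = ?",
                       (user_input,))
    return cur.fetchall()

safe = find_claims(conn, "Water is wet")
injected = find_claims(conn, "' OR 1=1 --")  # matches nothing
```

String-formatted SQL (`f"... WHERE text = '{user_input}'"`) would have returned every row for the injection payload; the bound version returns none.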
=== 8.3 Abuse Prevention ===

* **Rate Limiting**: Prevent flooding and DDoS
* **Automated Detection**: Flag suspicious patterns
* **Human Review**: Moderators investigate flagged content
* **Ban Mechanisms**: Block abusive users/IPs
== 9. Deployment Architecture ==

=== 9.1 Production Environment ===

**Components**:

* Load balancer (HAProxy or cloud LB)
* Multiple API servers (stateless)
* AKEL worker pool (auto-scaling)
* PostgreSQL primary + read replicas
* Redis cluster
* S3-compatible storage

**Regions**: Single region for V1.0, multi-region when needed
=== 9.2 Development & Staging ===

**Development**: Local Docker Compose setup
**Staging**: Scaled-down production replica
**CI/CD**: Automated testing and deployment
=== 9.3 Disaster Recovery ===

* **Database Backups**: Daily automated backups to S3
* **Point-in-Time Recovery**: Transaction log archival
* **Replication**: Real-time replication to a standby
* **Recovery Time Objective**: <4 hours

=== 9.5 Federation Architecture Diagram ===

{{include reference="Archive.FactHarbor 2026\.01\.20.Specification.Diagrams.Federation Architecture.WebHome"/}}
== 10. Future Architecture Evolution ==

=== 10.1 When to Add Complexity ===

See [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] for specific triggers.

**Elasticsearch**: When PostgreSQL search is consistently >500ms
**TimescaleDB**: When metrics queries are consistently >1s
**Federation**: When 10,000+ users and explicit demand
**Complex Reputation**: When 100+ active contributors
=== 10.2 Federation (V2.0+) ===

**Deferred until**:

* Core product proven with 10,000+ users
* User demand for decentralization
* Single-node limits reached

See [[Federation & Decentralization>>Archive.FactHarbor 2026\.01\.20.Specification.Federation & Decentralization.WebHome]] for future plans.
== 11. Technology Stack Summary ==

**Backend**:

* Python (FastAPI or Django)
* PostgreSQL (primary database)
* Redis (caching)

**Frontend**:

* Modern JavaScript framework (React, Vue, or Svelte)
* Server-side rendering for SEO

**AI/LLM**:

* Multi-provider orchestration (Claude, GPT-4, local models)
* Fallback and cross-checking support

**Infrastructure**:

* Docker containers
* Kubernetes or cloud-platform auto-scaling
* S3-compatible object storage

**Monitoring**:

* Prometheus + Grafana
* Structured logging (ELK or cloud logging)
* Error tracking (Sentry)
1.1 | 403 | == 12. Related Pages == |
| |
1.3 | 404 | |
| |
1.11 | 405 | * [[AI Knowledge Extraction Layer (AKEL)>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] |
| |
1.12 | 406 | * [[Storage Strategy>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] |
| |
1.13 | 407 | * [[Data Model>>Archive.FactHarbor 2026\.01\.20.Specification.Data Model.WebHome]] |
| |
1.12 | 408 | * [[API Layer>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]] |
| |
1.1 | 409 | * [[Design Decisions>>FactHarbor.Specification.Design-Decisions]] |
| 410 | * [[When to Add Complexity>>FactHarbor.Specification.When-to-Add-Complexity]] |