= POC2: Robust Quality & Reliability =

**Phase Goal:** Prove AKEL produces high-quality outputs consistently at scale

**Success Metric:** <5% hallucination rate, all 4 quality gates operational

---

== 1. Overview ==

POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.

**Key Innovation:** Complete quality validation pipeline catches all categories of errors

**What We're Proving:**
* All 4 quality gates work together effectively
* Evidence deduplication prevents artificial inflation
* System maintains quality at larger scale
* Quality metrics dashboard provides actionable insights

---

== 2. New Requirements ==

=== 2.1 NFR11: Complete Quality Assurance Framework ===

**Add Gates 2 & 3** (POC1 had only Gates 1 & 4)

==== Gate 2: Evidence Relevance Validation ====

**Purpose:** Ensure AI-linked evidence actually relates to the claim

**Validation Checks:**
1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)

**Action if Failed:**
* Discard irrelevant evidence (don't count it)
* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
* Log discarded evidence for quality review

**Target:** 0% of evidence cited is off-topic
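
The checks and thresholds above could be combined into one filtering step. A minimal sketch in Python, assuming the three thresholds from the list; embed(), extract_entities() and topic_score() below are deliberately crude placeholders (hashed bag-of-words, capitalised tokens, token overlap) standing in for whatever embedding, NER and topic models AKEL actually uses:

{{code language="python"}}
# Gate 2 sketch: keep only evidence that is relevant to the claim.
# Only the threshold logic mirrors the validation checks listed above;
# the helper "models" are simplified stand-ins.
from dataclasses import dataclass
from typing import List, Set
import numpy as np

SIM_THRESHOLD = 0.6     # cosine similarity claim <-> evidence
TOPIC_THRESHOLD = 0.5   # topic-relevance score
MIN_EVIDENCE = 2        # fewer relevant items -> "Insufficient Evidence"

def embed(text: str) -> np.ndarray:
    # Placeholder: hashed bag-of-words instead of a real sentence embedder.
    vec = np.zeros(64)
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    return vec

def extract_entities(text: str) -> Set[str]:
    # Placeholder NER: capitalised tokens stand in for named entities.
    return {t.strip(".,;:") for t in text.split() if t[:1].isupper()}

def topic_score(claim: str, evidence: str) -> float:
    # Placeholder topic relevance: token overlap (Jaccard).
    a, b = set(claim.lower().split()), set(evidence.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

@dataclass
class Gate2Result:
    relevant: List[str]
    discarded: List[str]     # logged for quality review
    verdict_hint: str        # "ok" or "Insufficient Evidence"

def gate2_evidence_relevance(claim: str, evidence_items: List[str]) -> Gate2Result:
    claim_vec, claim_entities = embed(claim), extract_entities(claim)
    relevant, discarded = [], []
    for ev in evidence_items:
        passes = (
            cosine(claim_vec, embed(ev)) >= SIM_THRESHOLD
            and bool(claim_entities & extract_entities(ev))
            and topic_score(claim, ev) >= TOPIC_THRESHOLD
        )
        (relevant if passes else discarded).append(ev)
    hint = "ok" if len(relevant) >= MIN_EVIDENCE else "Insufficient Evidence"
    return Gate2Result(relevant, discarded, hint)
{{/code}}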

---

==== Gate 3: Scenario Coherence Check ====

**Purpose:** Validate that scenarios are logical, complete, and meaningfully different

**Validation Checks:**
1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score < 0.3)
3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity < 0.8)
4. **Minimum Detail:** At least 1 specific assumption per scenario

**Action if Failed:**
* Merge duplicate scenarios
* Flag contradictory assumptions for review
* Reduce confidence score by 20%
* Do not publish if <2 distinct scenarios

**Target:** 0% duplicate scenarios, all scenarios internally consistent
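
A companion sketch for Gate 3, assuming a Scenario record with the fields named above; the similarity() placeholder is plain token overlap, and contradiction detection between assumptions is left out because it depends on the underlying model:

{{code language="python"}}
# Gate 3 sketch: completeness, distinctiveness and minimum-detail checks.
# Field names and the merge strategy are assumptions, not the real AKEL schema.
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_SIMILARITY = 0.8      # at or above this, two scenarios count as duplicates
CONFIDENCE_PENALTY = 0.8  # "reduce confidence score by 20%"

@dataclass
class Scenario:
    assumptions: List[str] = field(default_factory=list)
    scope: str = ""
    evidence_context: str = ""

def similarity(a: Scenario, b: Scenario) -> float:
    # Placeholder: token overlap between the scenarios' assumption texts.
    ta = set(" ".join(a.assumptions).lower().split())
    tb = set(" ".join(b.assumptions).lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def gate3_scenario_coherence(scenarios: List[Scenario],
                             confidence: float) -> Tuple[List[Scenario], float, bool]:
    # Completeness + minimum detail: every field populated, >= 1 assumption.
    complete = [s for s in scenarios
                if s.assumptions and s.scope and s.evidence_context]
    # Distinctiveness: merge near-duplicates by keeping the first of each pair.
    distinct: List[Scenario] = []
    for s in complete:
        if all(similarity(s, kept) < MAX_SIMILARITY for kept in distinct):
            distinct.append(s)
    if len(distinct) < len(scenarios):
        confidence *= CONFIDENCE_PENALTY      # something failed the gate
    publishable = len(distinct) >= 2          # "do not publish if <2 distinct scenarios"
    return distinct, confidence, publishable
{{/code}}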

---

=== 2.2 FR54: Evidence Deduplication (NEW) ===

**Priority:** HIGH
**Fulfills:** Accurate evidence counting, prevents artificial inflation

**Purpose:** Prevent counting the same evidence multiple times when cited by different sources

**Problem:**
* Wire services (AP, Reuters) redistribute the same content
* Different sites cite the same original study
* Aggregators copy primary sources
* AKEL might count this as "5 sources" when it's really 1

**Solution: Content Fingerprinting**
* Generate SHA-256 hash of normalized text
* Detect near-duplicates (≥85% similarity) using fuzzy matching
* Track which sources cited each unique piece of evidence
* Display provenance chain to user

**Target:** Duplicate detection >95% accurate, evidence counts reflect reality
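
A minimal sketch of the fingerprinting idea, assuming SHA-256 over normalized text for exact duplicates and difflib's SequenceMatcher as a stand-in for the real fuzzy matcher at the 85% threshold:

{{code language="python"}}
# FR54 sketch: content fingerprinting for evidence deduplication.
# Exact duplicates collapse onto the same SHA-256 fingerprint; near-duplicates
# are folded into an existing record when similarity reaches the threshold.
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUP_THRESHOLD = 0.85

def normalize(text: str) -> str:
    # Lowercase, collapse whitespace, strip punctuation so trivial edits hash identically.
    return re.sub(r"[^\w\s]", "", re.sub(r"\s+", " ", text.strip().lower()))

def fingerprint(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

class EvidenceStore:
    """One record per unique piece of evidence, plus its provenance chain."""

    def __init__(self):
        self.records = {}   # fingerprint -> {"text": ..., "sources": [...]}

    def add(self, text: str, source: str) -> str:
        fp = fingerprint(text)
        if fp not in self.records:
            norm = normalize(text)
            # Near-duplicate check against evidence already stored.
            for existing_fp, rec in self.records.items():
                if SequenceMatcher(None, norm, normalize(rec["text"])).ratio() >= NEAR_DUP_THRESHOLD:
                    fp = existing_fp          # fold into the existing record
                    break
            else:
                self.records[fp] = {"text": text, "sources": []}
        self.records[fp]["sources"].append(source)    # provenance chain
        return fp

    def unique_count(self) -> int:
        return len(self.records)
{{/code}}

Five wire copies of the same AP story would then collapse onto a single record whose provenance chain lists all five citing sources, so the evidence count stays at 1.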

---

=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===

**Priority:** HIGH
**Fulfills:** Real-time quality monitoring during development

**Dashboard Metrics:**
* Claim processing statistics
* Gate performance (pass/fail rates for each gate)
* Evidence quality metrics
* Hallucination rate tracking
* Processing performance

**Target:** Dashboard functional, all metrics tracked, exportable
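
A sketch of what the collector behind such a dashboard might look like; the metric names follow the list above, while the per-claim granularity, the alert threshold and the JSON export format are assumptions:

{{code language="python"}}
# NFR13 sketch: an internal metrics collector feeding the dashboard.
import json
import time
from collections import defaultdict

class QualityMetrics:
    def __init__(self):
        self.claims_processed = 0
        self.gate_results = defaultdict(lambda: {"pass": 0, "fail": 0})  # per gate
        self.hallucinations_found = 0
        self.processing_seconds = []

    def record_claim(self, duration_s: float, hallucinated: bool = False):
        self.claims_processed += 1
        self.processing_seconds.append(duration_s)
        if hallucinated:
            self.hallucinations_found += 1

    def record_gate(self, gate: str, passed: bool):
        self.gate_results[gate]["pass" if passed else "fail"] += 1

    def snapshot(self) -> dict:
        total = max(self.claims_processed, 1)
        rate = self.hallucinations_found / total
        return {
            "timestamp": time.time(),
            "claims_processed": self.claims_processed,
            "hallucination_rate": rate,
            "hallucination_alert": rate >= 0.05,   # success metric: <5%
            "avg_processing_seconds": sum(self.processing_seconds) / total,
            "gate_pass_fail": {g: dict(c) for g, c in self.gate_results.items()},
        }

    def export(self, path: str):
        # "Exportable": dump the current snapshot as JSON for external tooling.
        with open(path, "w") as fh:
            json.dump(self.snapshot(), fh, indent=2)
{{/code}}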

---

== 3. Success Criteria ==

**✅ Quality:**
* Hallucination rate <5% (target: <3%)
* Average quality rating ≥8.0/10
* 0 critical failures (false claims reaching publication)
* Gates correctly identify >95% of low-quality outputs

**✅ All 4 Gates Operational:**
* Gate 1: Claim validation working
* Gate 2: Evidence relevance filtering working
* Gate 3: Scenario coherence checking working
* Gate 4: Verdict confidence assessment working

**✅ Evidence Deduplication:**
* Duplicate detection >95% accurate
* Evidence counts reflect reality
* Provenance tracked correctly

**✅ Metrics Dashboard:**
* All metrics implemented and tracking
* Dashboard functional and useful
* Alerts trigger appropriately

---

== 4. Architecture Notes ==

**POC2 Enhanced Architecture:**

{{code}}
Input → AKEL Processing         → All 4 Quality Gates      → Display
        (claims + scenarios        (1: Claim validation
         + evidence linking         2: Evidence relevance
         + verdicts)                3: Scenario coherence
                                    4: Verdict confidence)
{{/code}}
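
The same flow, expressed as a sketch of the single orchestration step; the gate interface (an analysis object in, an analysis object out, with a publishable flag) is an assumption about how the gates above would be wired together, not the actual AKEL API:

{{code language="python"}}
# Sketch of the POC2 flow: one AKEL pass, then the four gates in order.
# Gates are passed in as callables so the sketch stays independent of their real names.
from typing import Any, Callable, Dict, List

Analysis = Dict[str, Any]   # assumed shape: claims, scenarios, evidence, verdicts, publishable

def run_pipeline(article: str,
                 akel_process: Callable[[str], Analysis],
                 gates: List[Callable[[Analysis], Analysis]],
                 display: Callable[[Analysis], None]) -> Analysis:
    analysis = akel_process(article)      # claims + scenarios + evidence + verdicts
    for gate in gates:                    # Gate 1 .. Gate 4, applied in sequence
        analysis = gate(analysis)
        if not analysis.get("publishable", True):
            break                         # a hard failure stops further processing
    if analysis.get("publishable", True):
        display(analysis)                 # only gated output reaches the UI
    return analysis
{{/code}}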

**Key Additions over POC1:**
* Scenario generation component
* Evidence deduplication system
* Gates 2 & 3 implementation
* Quality metrics collection

**Still Simplified vs. Full System:**
* Single AKEL orchestration (not multi-component pipeline)
* No review queue
* No federation architecture

**See:** [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]] for details

---

== Related Pages ==

* [[POC1>>Test.FactHarbor.Roadmap.POC1.WebHome]] - Previous phase
* [[Beta 0>>Test.FactHarbor.Roadmap.Beta0.WebHome]] - Next phase
* [[Roadmap Overview>>Test.FactHarbor.Roadmap.WebHome]]
* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]

---

**Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
**Version:** V0.9.70