
Wiki source code of POC2: Robust Quality & Reliability

Version 1.1 by Robert Schaub on 2025/12/22 13:26


		1	= POC2: Robust Quality & Reliability =
		2
		3	Phase Goal: Prove AKEL produces high-quality outputs consistently at scale
		4
		5	Success Metric: <5% hallucination rate, all 4 quality gates operational
		6
		7
		8	== 1. Overview ==
		9
		10	POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.
		11
		12	Key Innovation: A complete quality validation pipeline in which all 4 gates check every output, covering claim validity, evidence relevance, scenario coherence, and verdict confidence
		13
		14	What We're Proving:
		15	* All 4 quality gates work together effectively
		16	* Evidence deduplication prevents artificial inflation
		17	* System maintains quality at larger scale
		18	* Quality metrics dashboard provides actionable insights
		19
		20
		21	== 2. New Requirements ==
		22
		23	=== 2.1 NFR11: Complete Quality Assurance Framework ===
		24
		25	Add Gates 2 & 3 (POC1 had only Gates 1 & 4)
		26
		27	==== Gate 2: Evidence Relevance Validation ====
		28
		29	Purpose: Ensure AI-linked evidence actually relates to the claim
		30
		31	Validation Checks:
		32	1. Semantic Similarity: Cosine similarity between claim and evidence embeddings ≥ 0.6
		33	2. Entity Overlap: At least 1 shared named entity between claim and evidence
		34	3. Topic Relevance: Evidence discusses the claim's subject matter (topic relevance score ≥ 0.5)
		35
		36	Action if Failed:
		37	* Discard irrelevant evidence (don't count it)
		38	* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
		39	* Log discarded evidence for quality review
		40
		41	Target: 0% of evidence cited is off-topic
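The Gate 2 checks and failure actions above can be sketched as follows. This is a minimal illustration, not the AKEL implementation: the `Evidence` record, the function names, and the rule that all three checks must pass together are assumptions, and the embeddings, entities, and topic scores are assumed to come from upstream components that this page does not specify.

```python
import math
from dataclasses import dataclass

# Thresholds taken from the Gate 2 validation checks.
MIN_COSINE = 0.6
MIN_SHARED_ENTITIES = 1
MIN_TOPIC_SCORE = 0.5
MIN_RELEVANT_ITEMS = 2


@dataclass
class Evidence:
    # Hypothetical record; all fields are assumed to be precomputed upstream.
    text: str
    embedding: list
    entities: set
    topic_score: float


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(claim_embedding, claim_entities, ev):
    # Assumption: evidence must pass all three checks to count as relevant.
    return (
        cosine_similarity(claim_embedding, ev.embedding) >= MIN_COSINE
        and len(claim_entities & ev.entities) >= MIN_SHARED_ENTITIES
        and ev.topic_score >= MIN_TOPIC_SCORE
    )


def gate2(claim_embedding, claim_entities, evidence):
    """Return (kept, discarded, verdict); verdict is None when enough remains."""
    kept, discarded = [], []
    for ev in evidence:
        if is_relevant(claim_embedding, claim_entities, ev):
            kept.append(ev)
        else:
            discarded.append(ev)  # the caller would log these for quality review
    verdict = "Insufficient Evidence" if len(kept) < MIN_RELEVANT_ITEMS else None
    return kept, discarded, verdict
```

Returning the discarded items instead of dropping them silently keeps the "log discarded evidence" action in the caller's hands.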
		42
		43
		44	==== Gate 3: Scenario Coherence Check ====
		45
		46	Purpose: Validate scenarios are logical, complete, and meaningfully different
		47
		48	Validation Checks:
		49	1. Completeness: All required fields populated (assumptions, scope, evidence context)
		50	2. Internal Consistency: Assumptions don't contradict each other (contradiction score <0.3)
		51	3. Distinctiveness: Scenarios are meaningfully different (pairwise similarity <0.8)
		52	4. Minimum Detail: At least 1 specific assumption per scenario
		53
		54	Action if Failed:
		55	* Merge duplicate scenarios
		56	* Flag contradictory assumptions for review
		57	* Reduce confidence score by 20%
		58	* Do not publish if fewer than 2 distinct scenarios remain
		59
		60	Target: 0% duplicate scenarios, all scenarios internally consistent
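A rough sketch of how the Gate 3 checks and actions could fit together is shown below. Several points are assumptions made for illustration: the `Scenario` fields, the precomputed `contradiction_score`, the word-overlap (Jaccard) similarity standing in for whatever similarity measure AKEL uses, "merging" duplicates by keeping the first occurrence, and reading "reduce by 20%" as multiplying the confidence by 0.8.

```python
from dataclasses import dataclass

# Thresholds taken from the Gate 3 validation checks.
MAX_CONTRADICTION = 0.3
MAX_SIMILARITY = 0.8
MIN_DISTINCT_SCENARIOS = 2


@dataclass
class Scenario:
    # Hypothetical record; contradiction_score is assumed to be computed upstream.
    assumptions: list
    scope: str
    evidence_context: str
    contradiction_score: float


@dataclass
class Gate3Result:
    distinct: list
    flagged_for_review: list
    confidence: float
    publishable: bool


def is_complete(s):
    # Completeness plus minimum detail: all fields set, at least 1 assumption.
    return bool(s.assumptions) and bool(s.scope.strip()) and bool(s.evidence_context.strip())


def scenario_text(s):
    return " ".join(s.assumptions + [s.scope])


def jaccard(a, b):
    # Stand-in similarity (word overlap); an embedding-based measure could replace it.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def gate3(scenarios, confidence, similarity=jaccard):
    distinct, flagged = [], []
    any_failure = False
    for s in scenarios:
        if not is_complete(s):
            any_failure = True
        if s.contradiction_score >= MAX_CONTRADICTION:
            flagged.append(s)  # contradictory assumptions go to review
            any_failure = True
        if any(similarity(scenario_text(s), scenario_text(d)) >= MAX_SIMILARITY for d in distinct):
            any_failure = True  # duplicate: folded into the earlier scenario
            continue
        distinct.append(s)
    if any_failure:
        confidence *= 0.8  # assumed reading of "reduce confidence score by 20%"
    publishable = len(distinct) >= MIN_DISTINCT_SCENARIOS
    return Gate3Result(distinct, flagged, confidence, publishable)
```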
		61
		62
		63	=== 2.2 FR54: Evidence Deduplication (NEW) ===
		64
		65	Priority: HIGH
		66	Fulfills: Accurate evidence counting, prevents artificial inflation
		67
		68	Purpose: Prevent counting the same evidence multiple times when cited by different sources
		69
		70	Problem:
		71	* Wire services (AP, Reuters) redistribute the same content
		72	* Different sites cite the same original study
		73	* Aggregators copy primary sources
		74	* AKEL might count this as "5 sources" when it's really 1
		75
		76	Solution: Content Fingerprinting
		77	* Generate a SHA-256 hash of the normalized text to catch exact duplicates
		78	* Detect near-duplicates (≥85% similarity) using fuzzy matching
		79	* Track which sources cited each unique piece of evidence
		80	* Display provenance chain to user
		81
		82	Target: Duplicate detection >95% accurate, evidence counts reflect reality
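The fingerprinting approach above might look like the following minimal sketch. The normalization rules, the `EvidenceIndex` class, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are assumptions chosen for illustration; the specification names only SHA-256, the 85% threshold, and provenance tracking. The source names in the usage example are placeholders.

```python
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUPLICATE_THRESHOLD = 0.85  # from the "≥85% similarity" rule


def normalize(text):
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def fingerprint(text):
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class EvidenceIndex:
    """Tracks unique evidence items and the sources that cited each one."""

    def __init__(self):
        self._items = {}  # fingerprint -> {"text": normalized text, "sources": [...]}

    def add(self, text, source):
        norm = normalize(text)
        key = fingerprint(text)
        if key not in self._items:
            # No exact match: fall back to a linear scan for near-duplicates.
            for other_key, item in self._items.items():
                ratio = SequenceMatcher(None, norm, item["text"]).ratio()
                if ratio >= NEAR_DUPLICATE_THRESHOLD:
                    key = other_key
                    break
            else:
                self._items[key] = {"text": norm, "sources": []}
        if source not in self._items[key]["sources"]:
            self._items[key]["sources"].append(source)
        return key

    def unique_count(self):
        return len(self._items)

    def provenance(self, key):
        return list(self._items[key]["sources"])


index = EvidenceIndex()
k1 = index.add("The study found a 12% drop in emissions.", "wire-a.example")
k2 = index.add("the study found a 12% drop in emissions", "wire-b.example")
k3 = index.add("The study found a 12% drop in emissions last year.", "aggregator.example")
k4 = index.add("Unemployment rose in March.", "other.example")
```

Here the first three citations collapse into one evidence item with three sources, while the fourth stays separate. The linear scan is quadratic in the number of items, so a larger corpus would likely need an approximate technique such as MinHash or SimHash instead.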
		83
		84
		85	=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===
		86
		87	Priority: HIGH
		88	Fulfills: Real-time quality monitoring during development
		89
		90	Dashboard Metrics:
		91	* Claim processing statistics
		92	* Gate performance (pass/fail rates for each gate)
		93	* Evidence quality metrics
		94	* Hallucination rate tracking
		95	* Processing performance
		96
		97	Target: Dashboard functional, all metrics tracked, exportable
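One possible shape for the metrics collection behind such a dashboard is sketched below. The `QualityMetrics` class, its method names, and the JSON export format are hypothetical; the specification only requires that the metrics be tracked and exportable.

```python
import json
from collections import Counter


class QualityMetrics:
    """In-memory counters for gate pass/fail rates and hallucination rate."""

    def __init__(self, gates=(1, 2, 3, 4)):
        self.claims_processed = 0
        self.hallucinations = 0
        self.gate_results = {g: Counter() for g in gates}

    def record_claim(self, hallucinated):
        self.claims_processed += 1
        self.hallucinations += int(hallucinated)

    def record_gate(self, gate, passed):
        self.gate_results[gate]["pass" if passed else "fail"] += 1

    def snapshot(self):
        data = {
            "claims_processed": self.claims_processed,
            "hallucination_rate": (
                self.hallucinations / self.claims_processed
                if self.claims_processed else None
            ),
        }
        for gate, counts in self.gate_results.items():
            total = counts["pass"] + counts["fail"]
            # None marks a gate that has not seen any input yet.
            data[f"gate_{gate}_pass_rate"] = counts["pass"] / total if total else None
        return data

    def export_json(self):
        return json.dumps(self.snapshot(), indent=2)
```

A caller would invoke `record_gate` after each gate decision and `record_claim` once a claim has been reviewed, then serve or save `export_json()`.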
		98
		99
		100	== 3. Success Criteria ==
		101
		102	✅ Quality:
		103	* Hallucination rate <5% (target: <3%)
		104	* Average quality rating ≥8.0/10
		105	* 0 critical failures (publishable falsehoods)
		106	* Gates correctly identify >95% of low-quality outputs
		107
		108	✅ All 4 Gates Operational:
		109	* Gate 1: Claim validation working
		110	* Gate 2: Evidence relevance filtering working
		111	* Gate 3: Scenario coherence checking working
		112	* Gate 4: Verdict confidence assessment working
		113
		114	✅ Evidence Deduplication:
		115	* Duplicate detection >95% accurate
		116	* Evidence counts reflect reality
		117	* Provenance tracked correctly
		118
		119	✅ Metrics Dashboard:
		120	* All metrics implemented and tracking
		121	* Dashboard functional and useful
		122	* Alerts trigger appropriately
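The numeric criteria above could be evaluated mechanically at the end of a test run. The sketch below assumes a hypothetical `Poc2Results` container whose field names are illustrative; rates are expressed as fractions rather than percentages.

```python
from dataclasses import dataclass


@dataclass
class Poc2Results:
    # Hypothetical container; field names are not part of the specification.
    hallucination_rate: float   # fraction, e.g. 0.04 for 4%
    avg_quality_rating: float   # on the 10-point scale
    critical_failures: int
    gate_detection_rate: float  # share of low-quality outputs the gates caught
    dedup_accuracy: float


def check_success(r):
    """Map each numeric success criterion to True or False."""
    return {
        "hallucination_rate_below_5pct": r.hallucination_rate < 0.05,
        "hallucination_rate_below_3pct_target": r.hallucination_rate < 0.03,
        "avg_quality_at_least_8": r.avg_quality_rating >= 8.0,
        "no_critical_failures": r.critical_failures == 0,
        "gates_catch_over_95pct": r.gate_detection_rate > 0.95,
        "dedup_accuracy_over_95pct": r.dedup_accuracy > 0.95,
    }
```

Criteria such as "dashboard functional and useful" are qualitative and are deliberately left out of this check.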
		123
		124
		125	== 4. Architecture Notes ==
		126
		127	POC2 Enhanced Architecture:
		128
		129	{{code}}
		130	Input → AKEL Processing → All 4 Quality Gates → Display
		131	        (claims + scenarios   (1: Claim validation
		132	         + evidence linking    2: Evidence relevance
		133	         + verdicts)           3: Scenario coherence
		134	                               4: Verdict confidence)
		135	{{/code}}
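The flow in the diagram can be read as a single orchestration function that runs AKEL processing and then applies the gates before anything is displayed. The sketch below is one possible reading, not the actual design: `run_pipeline`, the gate callables, and the choice to run gates in numeric order and stop at the first failure are all assumptions.

```python
def run_pipeline(article, akel_process, gates):
    """Process an article, then apply each quality gate in order.

    akel_process: callable producing the AKEL output (claims, scenarios,
                  evidence links, verdicts) for one article.
    gates:        list of (name, callable) pairs; each callable returns
                  True when the output passes that gate.
    """
    output = akel_process(article)
    for name, gate in gates:
        if not gate(output):
            # Assumption: a failed gate blocks display of this output.
            return {"displayed": False, "failed_gate": name, "output": output}
    return {"displayed": True, "failed_gate": None, "output": output}
```

Keeping the gates as plain callables matches the "single AKEL orchestration" simplification: each gate can be developed and tested on its own, then passed in as a list.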
		136
		137	Key Additions since POC1:
		138	* Scenario generation component
		139	* Evidence deduplication system
		140	* Gates 2 & 3 implementation
		141	* Quality metrics collection
		142
		143	Still Simplified vs. Full System:
		144	* Single AKEL orchestration (not multi-component pipeline)
		145	* No review queue
		146	* No federation architecture
		147
		148	See: [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]] for details
		149
		150
		151	== Related Pages ==
		152
		153	* [[POC1>>Test.FactHarbor.Roadmap.POC1.WebHome]] - Previous phase
		154	* [[Beta 0>>Test.FactHarbor.Roadmap.Beta0.WebHome]] - Next phase
		155	* [[Roadmap Overview>>Test.FactHarbor.Roadmap.WebHome]]
		156	* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
		157
		158
		159	Document Status: ✅ POC2 Specification Complete - Waiting for POC1 Completion
		160	Version: V0.9.70