= POC2: Robust Quality & Reliability =

**Phase Goal:** Prove AKEL produces high-quality outputs consistently at scale

**Success Metric:** <5% hallucination rate, all 4 quality gates operational

----

== 1. Overview ==

POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.

**Key Innovation:** Complete quality validation pipeline catches all categories of errors

**What We're Proving:**

* All 4 quality gates work together effectively
* Evidence deduplication prevents artificial inflation of evidence counts
* System maintains quality at larger scale
* Quality metrics dashboard provides actionable insights

----

== 2. New Requirements ==

=== 2.1 NFR11: Complete Quality Assurance Framework ===

**Adds Gates 2 & 3** (POC1 implemented only Gates 1 & 4)

==== Gate 2: Evidence Relevance Validation ====

**Purpose:** Ensure AI-linked evidence actually relates to the claim

**Validation Checks:**

1. **Semantic Similarity:** Cosine similarity between claim and evidence embeddings ≥ 0.6
2. **Entity Overlap:** At least 1 shared named entity between claim and evidence
3. **Topic Relevance:** Evidence discusses the claim's subject matter (score ≥ 0.5)

**Action if Failed:**

* Discard irrelevant evidence (don't count it)
* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
* Log discarded evidence for quality review

**Target:** 0% of cited evidence is off-topic

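The snippet below is a minimal Python sketch of how these three checks and the failure actions could fit together. It assumes the embeddings, named-entity sets, and topic scores are produced upstream by AKEL; every name in it (EvidenceItem, gate2_filter, and so on) is illustrative rather than part of the specification.

{{code language="python"}}
from dataclasses import dataclass
from math import sqrt


@dataclass
class EvidenceItem:
    text: str
    embedding: list[float]   # produced upstream; embedding model not specified here
    entities: set[str]       # named entities found in the evidence text
    topic_score: float       # topic-relevance score from an upstream model (assumed)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gate2_filter(claim_embedding, claim_entities, evidence):
    """Split evidence into relevant vs. discarded; the caller logs the discarded items."""
    relevant, discarded = [], []
    for item in evidence:
        checks = (
            cosine_similarity(claim_embedding, item.embedding) >= 0.6,  # semantic similarity
            len(claim_entities & item.entities) >= 1,                   # entity overlap
            item.topic_score >= 0.5,                                    # topic relevance
        )
        (relevant if all(checks) else discarded).append(item)
    return relevant, discarded


def gate2_verdict_guard(relevant):
    # Fewer than 2 relevant evidence items left -> "Insufficient Evidence" verdict
    return "Insufficient Evidence" if len(relevant) < 2 else "proceed"
{{/code}}

Only evidence that passes all three checks counts toward the claim; discarded items are retained solely for the quality-review log.
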
----

==== Gate 3: Scenario Coherence Check ====

**Purpose:** Validate that scenarios are logical, complete, and meaningfully different

**Validation Checks:**

1. **Completeness:** All required fields populated (assumptions, scope, evidence context)
2. **Internal Consistency:** Assumptions don't contradict each other (contradiction score < 0.3)
3. **Distinctiveness:** Scenarios are meaningfully different (pairwise similarity < 0.8)
4. **Minimum Detail:** At least 1 specific assumption per scenario

**Action if Failed:**

* Merge duplicate scenarios
* Flag contradictory assumptions for review
* Reduce confidence score by 20%
* Do not publish if <2 distinct scenarios remain

**Target:** 0% duplicate scenarios, all scenarios internally consistent

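A sketch of the Gate 3 logic under similar assumptions: the contradiction score and the pairwise similarity function are stand-ins for upstream scoring models that this page does not specify, and the names are illustrative.

{{code language="python"}}
from dataclasses import dataclass, field


@dataclass
class Scenario:
    assumptions: list[str]
    scope: str
    evidence_context: str
    contradiction_score: float = 0.0              # 0 = consistent, 1 = contradictory (assumed scale)
    flags: list[str] = field(default_factory=list)


def is_complete(s: Scenario) -> bool:
    return bool(s.assumptions) and bool(s.scope) and bool(s.evidence_context)


def gate3_check(scenarios, pairwise_similarity, confidence: float):
    """pairwise_similarity(a, b) -> float in [0, 1]; assumed to be provided upstream."""
    failed = False

    for s in scenarios:
        if not is_complete(s) or len(s.assumptions) < 1:
            s.flags.append("incomplete")                  # completeness / minimum detail
            failed = True
        if s.contradiction_score >= 0.3:
            s.flags.append("contradictory_assumptions")   # surfaced for review
            failed = True

    # Merge scenarios that are not meaningfully different (similarity >= 0.8)
    distinct: list[Scenario] = []
    for s in scenarios:
        if any(pairwise_similarity(s, kept) >= 0.8 for kept in distinct):
            failed = True
            continue
        distinct.append(s)

    if failed:
        confidence *= 0.8                   # reduce confidence score by 20%
    publishable = len(distinct) >= 2        # do not publish with fewer than 2 distinct scenarios
    return distinct, confidence, publishable
{{/code}}
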
----

=== 2.2 FR54: Evidence Deduplication (NEW) ===

**Priority:** HIGH
**Fulfills:** Accurate evidence counting; prevents artificial inflation

**Purpose:** Prevent counting the same evidence multiple times when it is cited by different sources

**Problem:**

* Wire services (AP, Reuters) redistribute the same content
* Different sites cite the same original study
* Aggregators copy primary sources
* AKEL might count this as "5 sources" when it's really 1

**Solution: Content Fingerprinting**

* Generate a SHA-256 hash of the normalized text
* Detect near-duplicates (≥85% similarity) using fuzzy matching
* Track which sources cited each unique piece of evidence
* Display the provenance chain to the user

**Target:** Duplicate detection >95% accurate, evidence counts reflect reality

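A self-contained sketch of the fingerprinting idea using only the Python standard library: hashlib provides the SHA-256 hash and difflib.SequenceMatcher stands in for the ≥85% fuzzy matcher, which in the real system may be a different algorithm.

{{code language="python"}}
import hashlib
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse punctuation/whitespace so trivial edits produce the same hash
    return re.sub(r"\W+", " ", text.lower()).strip()


class EvidenceIndex:
    """Maps each unique piece of evidence to every source that cited it."""

    def __init__(self):
        self.by_hash: dict[str, dict] = {}   # SHA-256 fingerprint -> {"norm", "text", "sources"}

    def add(self, text: str, source: str) -> str:
        norm = normalize(text)
        fp = hashlib.sha256(norm.encode("utf-8")).hexdigest()

        if fp not in self.by_hash:
            # No exact match: fall back to fuzzy matching against known evidence (>= 85% similar)
            for existing_fp, entry in self.by_hash.items():
                if SequenceMatcher(None, norm, entry["norm"]).ratio() >= 0.85:
                    fp = existing_fp
                    break
            else:
                self.by_hash[fp] = {"norm": norm, "text": text, "sources": []}

        self.by_hash[fp]["sources"].append(source)   # provenance chain shown to the user
        return fp

    def unique_count(self) -> int:
        return len(self.by_hash)


# Wire copy redistributed by a second outlet counts as one piece of evidence with two sources.
index = EvidenceIndex()
index.add("The study reported a drop in emissions.", "AP")
index.add("The study reported a drop in emissions!", "Outlet B (AP syndication)")
assert index.unique_count() == 1
{{/code}}
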
----

=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===

**Priority:** HIGH
**Fulfills:** Real-time quality monitoring during development

**Dashboard Metrics:**

* Claim processing statistics
* Gate performance (pass/fail rates for each gate)
* Evidence quality metrics
* Hallucination rate tracking
* Processing performance

**Target:** Dashboard functional, all metrics tracked, exportable

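Purely as an illustration, the tracked values could live in a structure like the one below and be exported as JSON; the actual schema and metric names are decided when the dashboard is built.

{{code language="python"}}
import json
from dataclasses import dataclass, asdict, field


@dataclass
class QualityMetrics:
    claims_processed: int = 0
    gate_results: dict = field(default_factory=lambda: {
        f"gate{i}": {"passed": 0, "failed": 0} for i in range(1, 5)   # Gates 1-4 pass/fail counts
    })
    duplicate_evidence_merged: int = 0        # from FR54 deduplication
    hallucinations_detected: int = 0
    avg_processing_seconds: float = 0.0       # processing performance

    @property
    def hallucination_rate(self) -> float:
        return (self.hallucinations_detected / self.claims_processed
                if self.claims_processed else 0.0)

    def export(self) -> str:
        # "Exportable" requirement: dump everything, including the derived rate, as JSON
        data = asdict(self)
        data["hallucination_rate"] = self.hallucination_rate
        return json.dumps(data, indent=2)
{{/code}}
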
----

== 3. Success Criteria ==

**✅ Quality:**

* Hallucination rate <5% (target: <3%)
* Average quality rating ≥8.0/10
* 0 critical failures (no publishable falsehoods)
* Gates correctly identify >95% of low-quality outputs

**✅ All 4 Gates Operational:**

* Gate 1: Claim validation working
* Gate 2: Evidence relevance filtering working
* Gate 3: Scenario coherence checking working
* Gate 4: Verdict confidence assessment working

**✅ Evidence Deduplication:**

* Duplicate detection >95% accurate
* Evidence counts reflect reality
* Provenance tracked correctly

**✅ Metrics Dashboard:**

* All metrics implemented and tracked
* Dashboard functional and provides actionable insights
* Alerts trigger appropriately

----

== 4. Architecture Notes ==

**POC2 Enhanced Architecture:**

{{code}}
Input → AKEL Processing → All 4 Quality Gates → Display
        (claims + scenarios    (1: Claim validation
         + evidence linking     2: Evidence relevance
         + verdicts)            3: Scenario coherence
                                4: Verdict confidence)
{{/code}}

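Expressed as code, the simplified single-orchestration flow is just AKEL followed by the four gates in sequence. The sketch below only illustrates the control flow; the gate functions stand in for the checks described in section 2, and all names are illustrative.

{{code language="python"}}
def run_pipeline(article, akel_process, gates):
    """akel_process(article) -> result; each (name, gate) applies gate(result) -> (ok, result)."""
    result = akel_process(article)   # claims + scenarios + evidence links + verdicts
    for name, gate in gates:         # Gate 1: claims, 2: evidence, 3: scenarios, 4: confidence
        ok, result = gate(result)
        if not ok:
            return {"status": "withheld", "failed_gate": name, "result": result}
    return {"status": "display", "result": result}
{{/code}}
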
**Key Additions from POC1:**

* Scenario generation component
* Evidence deduplication system
* Gates 2 & 3 implementation
* Quality metrics collection

**Still Simplified vs. Full System:**

* Single AKEL orchestration (not multi-component pipeline)
* No review queue
* No federation architecture

**See:** [[Architecture>>Archive.FactHarbor delta for V0\.9\.70.Specification.Architecture.WebHome]] for details

----

== Related Pages ==

* [[POC1>>Archive.FactHarbor delta for V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
* [[Beta 0>>Archive.FactHarbor delta for V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
* [[Roadmap Overview>>Archive.FactHarbor delta for V0\.9\.70.Roadmap.WebHome]]
* [[Architecture>>Archive.FactHarbor delta for V0\.9\.70.Specification.Architecture.WebHome]]

----

**Document Status:** ✅ POC2 Specification Complete - Waiting for POC1 Completion
**Version:** V0.9.70