Wiki source code of POC2: Robust Quality & Reliability

Version 1.5 by Robert Schaub on 2025/12/21 13:38

version	line-number	content
1.1	1	= POC2: Robust Quality & Reliability =
	2
	3	Phase Goal: Prove AKEL produces high-quality outputs consistently at scale
	4
	5	Success Metric: <5% hallucination rate, all 4 quality gates operational
	6
1.3	7	----
1.1	8
	9	== 1. Overview ==
	10
	11	POC2 extends POC1 by implementing the full quality assurance framework (all 4 gates), adding evidence deduplication, and processing significantly more test articles to validate system reliability at scale.
	12
	13	Key Innovation: A complete quality validation pipeline that checks claims, evidence, scenarios, and verdicts before anything is displayed
	14
	15	What We're Proving:
1.2	16
1.1	17	* All 4 quality gates work together effectively
	18	* Evidence deduplication prevents artificial inflation
	19	* System maintains quality at larger scale
	20	* Quality metrics dashboard provides actionable insights
	21
1.3	22	----
1.1	23
	24	== 2. New Requirements ==
	25
	26	=== 2.1 NFR11: Complete Quality Assurance Framework ===
	27
	28	Add Gates 2 & 3 (POC1 had only Gates 1 & 4)
	29
	30	==== Gate 2: Evidence Relevance Validation ====
	31
	32	Purpose: Ensure AI-linked evidence actually relates to the claim
	33
	34	Validation Checks:
1.2	35
1.1	36	1. Semantic Similarity: Cosine similarity between claim and evidence embeddings ≥ 0.6
	37	2. Entity Overlap: At least 1 shared named entity between claim and evidence
	38	3. Topic Relevance: Evidence discusses the claim's subject matter (score ≥ 0.5)
	39
	40	Action if Failed:
1.2	41
1.1	42	* Discard irrelevant evidence (don't count it)
	43	* If <2 relevant evidence items remain → "Insufficient Evidence" verdict
	44	* Log discarded evidence for quality review
	45
	46	Target: 0% of evidence cited is off-topic
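
The three checks and the failure actions above can be sketched as a single filter. This is a minimal illustration, not AKEL's implementation: the ##Evidence## fields and the ##gate2_filter## function are hypothetical, embeddings, entities, and topic scores are assumed to be computed upstream, and the sketch assumes an item must pass all three checks to count as relevant. Only the thresholds (0.6, 1 shared entity, 0.5, minimum of 2 items) come from this specification.

{{code language="python"}}
import math
from dataclasses import dataclass

SIMILARITY_MIN = 0.6   # cosine similarity threshold (from the spec)
TOPIC_MIN = 0.5        # topic relevance threshold (from the spec)
MIN_RELEVANT = 2       # fewer relevant items -> "Insufficient Evidence"


@dataclass
class Evidence:
    # Hypothetical record; all fields are assumed to be filled upstream.
    text: str
    embedding: list[float]
    entities: set[str]
    topic_score: float


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gate2_filter(claim_embedding, claim_entities, evidence_items):
    """Split evidence into kept/discarded and derive the Gate 2 verdict."""
    kept, discarded = [], []
    for ev in evidence_items:
        relevant = (
            cosine(claim_embedding, ev.embedding) >= SIMILARITY_MIN
            and len(claim_entities & ev.entities) >= 1
            and ev.topic_score >= TOPIC_MIN
        )
        (kept if relevant else discarded).append(ev)
    # The discarded list is returned so the caller can log it for quality review.
    verdict = None if len(kept) >= MIN_RELEVANT else "Insufficient Evidence"
    return kept, discarded, verdict
{{/code}}

Returning the discarded items alongside the kept ones keeps the "log discarded evidence" action outside the gate itself, so the filter stays a pure function that is easy to test.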
	47
1.3	48	----
1.1	49
	50	==== Gate 3: Scenario Coherence Check ====
	51
	52	Purpose: Validate scenarios are logical, complete, and meaningfully different
	53
	54	Validation Checks:
1.2	55
1.1	56	1. Completeness: All required fields populated (assumptions, scope, evidence context)
	57	2. Internal Consistency: Assumptions don't contradict each other (contradiction score <0.3)
	58	3. Distinctiveness: Scenarios are meaningfully different (similarity <0.8)
	59	4. Minimum Detail: At least 1 specific assumption per scenario
	60
	61	Action if Failed:
1.2	62
1.1	63	* Merge duplicate scenarios
	64	* Flag contradictory assumptions for review
	65	* Reduce confidence score by 20%
	66	* Do not publish if <2 distinct scenarios
	67
	68	Target: 0% duplicate scenarios, all scenarios internally consistent
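
A rough sketch of how the Gate 3 checks and actions might fit together. The ##Scenario## fields, the ##gate3_check## function, and the caller-supplied ##similarity## function are hypothetical. The sketch also makes three assumptions the specification does not settle: the contradiction score is computed upstream, "merge" is simplified to keeping the first scenario of each duplicate group, and "reduce confidence by 20%" is read as a relative reduction.

{{code language="python"}}
from dataclasses import dataclass

CONTRADICTION_MAX = 0.3    # assumptions are consistent below this score (spec)
SIMILARITY_MAX = 0.8       # scenarios are distinct below this similarity (spec)
CONFIDENCE_PENALTY = 0.20  # assumed to be a relative reduction
MIN_DISTINCT = 2           # do not publish with fewer distinct scenarios (spec)


@dataclass
class Scenario:
    # Hypothetical record; contradiction_score is assumed to come from upstream.
    assumptions: list[str]
    scope: str
    evidence_context: str
    contradiction_score: float


def is_complete(s):
    """Completeness and minimum detail: all fields set, at least 1 assumption."""
    return bool(s.assumptions) and bool(s.scope) and bool(s.evidence_context)


def gate3_check(scenarios, similarity, confidence):
    """Return (distinct scenarios, flagged scenarios, confidence, publishable)."""
    distinct = []
    for s in scenarios:
        # Keep a scenario only if it differs enough from every one kept so far.
        if all(similarity(s, kept) < SIMILARITY_MAX for kept in distinct):
            distinct.append(s)
    flagged = [
        s for s in distinct
        if not is_complete(s) or s.contradiction_score >= CONTRADICTION_MAX
    ]
    # Any failed check (duplicates found or scenarios flagged) lowers confidence.
    if flagged or len(distinct) < len(scenarios):
        confidence *= 1 - CONFIDENCE_PENALTY
    publishable = len(distinct) >= MIN_DISTINCT
    return distinct, flagged, confidence, publishable
{{/code}}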
	69
1.3	70	----
1.1	71
	72	=== 2.2 FR54: Evidence Deduplication (NEW) ===
	73
	74	Priority: HIGH
	75	Fulfills: Accurate evidence counting, prevents artificial inflation
	76
	77	Purpose: Prevent counting the same evidence multiple times when cited by different sources
	78
	79	Problem:
1.2	80
1.1	81	* Wire services (AP, Reuters) redistribute same content
	82	* Different sites cite the same original study
	83	* Aggregators copy primary sources
	84	* AKEL might count this as "5 sources" when it's really 1
	85
	86	Solution: Content Fingerprinting
1.2	87
1.1	88	* Generate SHA-256 hash of normalized text to catch exact duplicates
	89	* Detect near-duplicates (≥85% similarity) using fuzzy matching, since an exact hash misses lightly edited copies
	90	* Track which sources cited each unique piece of evidence
	91	* Display provenance chain to user
	92
	93	Target: Duplicate detection >95% accurate, evidence counts reflect reality
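
The fingerprinting steps above could look roughly like the following. This is a sketch under stated assumptions: the normalization rules, the ##EvidenceIndex## class, and the use of ##difflib.SequenceMatcher## as the fuzzy matcher are illustrative choices, not something this specification prescribes. Only SHA-256 and the 85% threshold come from the text.

{{code language="python"}}
import hashlib
import re
from difflib import SequenceMatcher

NEAR_DUPLICATE_MIN = 0.85  # similarity threshold from the spec


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def fingerprint(text):
    """SHA-256 hex digest of the normalized text (matches exact duplicates only)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class EvidenceIndex:
    """Tracks unique evidence and the sources that cited each item."""

    def __init__(self):
        self.items = {}  # fingerprint -> {"text": normalized text, "sources": [...]}

    def add(self, text, source):
        norm = normalize(text)
        key = fingerprint(text)
        if key not in self.items:
            # No exact match: fall back to a fuzzy comparison with known items.
            for other_key, item in self.items.items():
                ratio = SequenceMatcher(None, norm, item["text"]).ratio()
                if ratio >= NEAR_DUPLICATE_MIN:
                    key = other_key
                    break
            else:
                self.items[key] = {"text": norm, "sources": []}
        if source not in self.items[key]["sources"]:
            self.items[key]["sources"].append(source)
        return key

    def unique_count(self):
        return len(self.items)
{{/code}}

The per-item ##sources## list doubles as a simple provenance chain. The linear fuzzy scan is fine for a proof of concept but compares each new item against every stored one, so a larger corpus would likely need an indexed approach.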
	94
1.3	95	----
1.1	96
	97	=== 2.3 NFR13: Quality Metrics Dashboard (Internal) ===
	98
	99	Priority: HIGH
	100	Fulfills: Real-time quality monitoring during development
	101
	102	Dashboard Metrics:
1.2	103
1.1	104	* Claim processing statistics
	105	* Gate performance (pass/fail rates for each gate)
	106	* Evidence quality metrics
	107	* Hallucination rate tracking
	108	* Processing performance
	109
	110	Target: Dashboard functional, all metrics tracked, exportable
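
A minimal sketch of the collection side of such a dashboard, covering gate pass/fail rates, hallucination rate, and export. The ##QualityMetrics## class and its method names are hypothetical, and the hallucination rate is assumed to be measured per processed claim, which this specification does not define.

{{code language="python"}}
import json
from collections import Counter


class QualityMetrics:
    """In-memory counters for gate outcomes and hallucination tracking."""

    def __init__(self):
        self.gate_results = Counter()  # (gate number, "pass" | "fail") -> count
        self.claims_processed = 0
        self.hallucinations = 0

    def record_gate(self, gate, passed):
        self.gate_results[(gate, "pass" if passed else "fail")] += 1

    def record_claim(self, hallucinated):
        self.claims_processed += 1
        self.hallucinations += int(hallucinated)

    def gate_pass_rate(self, gate):
        """Pass rate for one gate, or None if the gate has no recorded results."""
        passed = self.gate_results[(gate, "pass")]
        total = passed + self.gate_results[(gate, "fail")]
        return passed / total if total else None

    def hallucination_rate(self):
        if not self.claims_processed:
            return None
        return self.hallucinations / self.claims_processed

    def export_json(self):
        """Serialize the current metrics so they can be exported."""
        return json.dumps({
            "claims_processed": self.claims_processed,
            "hallucination_rate": self.hallucination_rate(),
            "gate_pass_rates": {
                f"gate_{g}": self.gate_pass_rate(g) for g in (1, 2, 3, 4)
            },
        }, indent=2)
{{/code}}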
	111
1.3	112	----
1.1	113
	114	== 3. Success Criteria ==
	115
	116	✅ Quality:
1.2	117
1.1	118	* Hallucination rate <5% (target: <3%)
	119	* Average quality rating ≥8.0/10
	120	* 0 critical failures (publishable falsities)
	121	* Gates correctly identify >95% of low-quality outputs
	122
	123	✅ All 4 Gates Operational:
1.2	124
1.1	125	* Gate 1: Claim validation working
	126	* Gate 2: Evidence relevance filtering working
	127	* Gate 3: Scenario coherence checking working
	128	* Gate 4: Verdict confidence assessment working
	129
	130	✅ Evidence Deduplication:
1.2	131
1.1	132	* Duplicate detection >95% accurate
	133	* Evidence counts reflect reality
	134	* Provenance tracked correctly
	135
	136	✅ Metrics Dashboard:
1.2	137
1.1	138	* All metrics implemented and tracking
	139	* Dashboard functional and useful
	140	* Alerts trigger appropriately
	141
1.3	142	----
1.1	143
	144	== 4. Architecture Notes ==
	145
	146	POC2 Enhanced Architecture:
	147
	148	{{code}}
	149	Input → AKEL Processing       → All 4 Quality Gates      → Display
	150	        (claims + scenarios     (1: Claim validation
	151	         + evidence linking      2: Evidence relevance
	152	         + verdicts)             3: Scenario coherence
	153	                                 4: Verdict confidence)
	154	{{/code}}
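
The flow in the diagram can be sketched as a small runner that applies each gate to the AKEL output and records the result. The ##run_quality_gates## function, the gate names, and the rule that any failed gate blocks display are assumptions made for illustration; the specification defines per-gate actions but no overall blocking rule.

{{code language="python"}}
from typing import Callable

# A gate takes the AKEL analysis and returns (passed, note).
GateResult = tuple[bool, str]
Gate = Callable[[dict], GateResult]


def run_quality_gates(analysis: dict, gates: list[tuple[str, Gate]]) -> dict:
    """Run every gate in order and report whether the output may be displayed."""
    report = {"display": True, "results": {}}
    for name, gate in gates:
        passed, note = gate(analysis)
        report["results"][name] = {"passed": passed, "note": note}
        if not passed:
            # Assumption: a single failed gate is enough to withhold display.
            report["display"] = False
    return report
{{/code}}

Running all gates even after a failure, rather than stopping at the first one, means the per-gate pass/fail rates stay complete for the metrics dashboard.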
	155
	156	Key Additions over POC1:
1.2	157
1.1	158	* Scenario generation component
	159	* Evidence deduplication system
	160	* Gates 2 & 3 implementation
	161	* Quality metrics collection
	162
	163	Still Simplified vs. Full System:
1.2	164
1.1	165	* Single AKEL orchestration (not multi-component pipeline)
	166	* No review queue
	167	* No federation architecture
	168
	169	See: [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]] for details
	170
1.3	171	----
1.1	172
	173	== Related Pages ==
	174
1.3	175	* [[POC1>>FactHarbor.Archive.FactHarbor delta for V0\.9\.70.Roadmap.POC1.WebHome]] - Previous phase
1.2	176	* [[Beta 0>>FactHarbor.Archive.FactHarbor delta for V0\.9\.70.Roadmap.Beta0.WebHome]] - Next phase
1.5	177	* [[Roadmap Overview>>FactHarbor.Archive.FactHarbor delta for V0\.9\.70.Roadmap.WebHome]]
1.1	178	* [[Architecture>>Test.FactHarbor.Specification.Architecture.WebHome]]
	179
1.3	180	----
1.1	181
	182	Document Status: ✅ POC2 Specification Complete - Waiting for POC1 Completion
	183	Version: V0.9.70
