Last modified by Robert Schaub on 2025/12/24 20:33

= AKEL — AI Knowledge Extraction Layer =

AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**.

AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below).

AKEL operates in two modes:

* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)

Human reviewers, experts, and moderators always retain final authority over content marked as "Human-Reviewed."

== 1. Purpose and Role ==

AKEL transforms unstructured inputs into structured, publication-ready content.

Core responsibilities:

* Claim extraction from arbitrary text
* Claim classification (domain, type, evaluability, safety, **risk tier**)
* Scenario generation (definitions, boundaries, assumptions, methodology)
* Evidence summarization and metadata extraction
* **Contradiction detection and counter-evidence search**
* **Reservation and limitation identification**
* **Bubble detection** (echo chambers, conspiracy theories, isolated sources)
* Re-evaluation proposal generation
* Cross-node embedding exchange (Release 1.0+)

== 2. Components ==

* **AKEL Orchestrator** – central coordinator
* **Claim Extractor**
* **Claim Classifier** (with risk tier assignment)
* **Scenario Generator**
* **Evidence Summarizer**
* **Contradiction Detector** (enhanced with counter-evidence search)
* **Quality Gate Validator**
* **Audit Sampling Scheduler**
* **Embedding Handler** (Release 1.0+)
* **Federation Sync Adapter** (Release 1.0+)

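The coordination among these components can be sketched in code. The following Python is illustrative only: the class and method names are assumptions, not part of the FactHarbor specification, and each component is reduced to a simple callable that transforms the pipeline payload.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    """Illustrative container for what one AKEL run might produce."""
    claims: list = field(default_factory=list)
    gate_failures: list = field(default_factory=list)

class Orchestrator:
    """Hypothetical central coordinator: runs each component in order,
    then applies the mandatory quality gates to the result."""

    def __init__(self, extractor, classifier, scenario_gen, summarizer,
                 contradiction_detector, gate_validator):
        self.steps = [extractor, classifier, scenario_gen,
                      summarizer, contradiction_detector]
        self.gate_validator = gate_validator  # returns a list of failed gates

    def run(self, raw_input: str) -> PipelineResult:
        result = PipelineResult()
        payload = raw_input
        for step in self.steps:        # each component transforms the payload
            payload = step(payload)
        result.claims = payload
        result.gate_failures = self.gate_validator(payload)
        return result
```

In a real deployment each step would be a full service (extraction model, classifier, retrieval stack); the point here is only the ordering: extraction and classification happen before contradiction detection, and the Quality Gate Validator always runs last.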
== 3. Inputs and Outputs ==

=== 3.1 Inputs ===

* User-submitted claims or evidence
* Uploaded documents
* URLs or citations
* External LLM API (optional)
* Embeddings (from local or federated peers)

=== 3.2 Outputs (publication mode varies by risk tier) ===

* ClaimVersion (draft or AI-generated)
* ScenarioVersion (draft or AI-generated)
* EvidenceVersion (summary + metadata, draft or AI-generated)
* VerdictVersion (draft, AI-generated, or human-reviewed)
* Contradiction alerts
* Reservation and limitation notices
* Re-evaluation proposals
* Updated embeddings

== 4. Publication Modes ==

AKEL content is published in one of three modes:

=== 4.1 Mode 1: Draft-Only (Never Public) ===

**Used for:**

* Failed quality gate checks
* Sensitive topics flagged for expert review
* Unclear scope or missing critical sources
* High reputational risk content

**Visibility:** Internal review queue only

=== 4.2 Mode 2: Published as AI-Generated (No Prior Human Review) ===

**Requirements:**

* All automated quality gates passed (see below)
* Risk tier permits AI-draft publication (Tier B or C)
* Contradiction search completed successfully
* Clear labeling as "AI-Generated, Awaiting Human Review"

**Label shown to users:**
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```

**User actions:**

* Browse and read content
* Request human review (escalates to review queue)
* Flag for expert attention

=== 4.3 Mode 3: Published as Human-Reviewed ===

**Requirements:**

* Validated by a human reviewer or domain expert
* All quality gates passed
* Visible "Human-Reviewed" mark with reviewer role and timestamp

**Label shown to users:**
```
[Human-Reviewed] This content has been validated by human reviewers.
Source: AI+Human | Review Status: Approved | Reviewed by: [Role] on [timestamp]
Risk Tier: [A/B/C] | Contradiction Search: Completed
```

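The mode selection described in this section can be summarized as a small decision function. This is a hedged sketch, not the FactHarbor implementation; it follows the Mode 2 requirement that only Tier B or C content may be published without prior human review, and the Mode 1 rule that any gate failure keeps content internal.

```python
def publication_mode(gates_passed: bool, risk_tier: str,
                     human_reviewed: bool = False) -> str:
    """Return 'draft-only', 'ai-generated', or 'human-reviewed'.

    Illustrative only: the function shape and return strings are
    assumptions; the decision rules come from Publication Modes 1-3.
    """
    if not gates_passed:
        return "draft-only"        # Mode 1: failed quality gate checks
    if human_reviewed:
        return "human-reviewed"    # Mode 3: validated by reviewer/expert
    if risk_tier in ("B", "C"):
        return "ai-generated"      # Mode 2: labeled AI-draft, no prior review
    return "draft-only"            # Tier A without review stays internal
```

For example, a Tier C claim that passes all gates would publish immediately as an AI-draft, while the same claim with a failed contradiction search would stay in the internal review queue.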
== 5. Risk Tiers ==

AKEL assigns a risk tier to all content to determine the appropriate review requirements:

=== 5.1 Tier A — High Risk / High Impact ===

**Domains:** Medical, legal, elections, safety/security, major reputational harm

**Publication policy:**

* Human review REQUIRED before "Human-Reviewed" status
* AI-generated content MAY be published but:
** Clearly flagged as AI-draft with prominent disclaimer
** May have limited visibility
** Auto-escalated to expert review queue
** User warnings displayed

**Audit rate:** Recommended 30-50% of published AI-drafts sampled in the first 6 months

=== 5.2 Tier B — Medium Risk ===

**Domains:** Contested public policy, complex science, causality claims, significant financial impact

**Publication policy:**

* AI-drafts CAN be published immediately with clear labeling
* Sampling audits conducted (see Audit System below)
* High-engagement items auto-escalated to expert review
* Users can request human review

**Audit rate:** Recommended 10-20% of published AI-drafts sampled

=== 5.3 Tier C — Low Risk ===

**Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus

**Publication policy:**

* AI-draft is the default publication mode
* Sampling audits sufficient
* Community flagging available
* Human review on request

**Audit rate:** Recommended 5-10% of published AI-drafts sampled

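The recommended audit-rate ranges above can be captured as a configuration table. The numbers mirror this section's recommendations; the dictionary layout and helper function are only an illustrative choice, not a specified interface.

```python
# Recommended sampling ranges per risk tier (fractions of published AI-drafts).
AUDIT_RATE_RANGES = {
    "A": (0.30, 0.50),   # high risk: 30-50% in the first 6 months
    "B": (0.10, 0.20),   # medium risk: 10-20%
    "C": (0.05, 0.10),   # low risk: 5-10%
}

def audit_rate(tier: str, conservative: bool = True) -> float:
    """Pick the upper (conservative) or lower bound of a tier's range."""
    low, high = AUDIT_RATE_RANGES[tier]
    return high if conservative else low
```

A new deployment might start at the conservative upper bound and relax toward the lower bound as audit pass rates stabilize.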
== 6. Quality Gates (Mandatory Before AI-Draft Publication) ==

All AI-generated content must pass these automated checks before Mode 2 publication:

=== 6.1 Gate 1: Source Quality ===

* Primary sources identified and accessible
* Source reliability scored against whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

=== 6.2 Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions, applicability constraints
* **Alternative interpretations** – Different framings, definitions, contextual variations
* **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources

**Search coverage requirements:**

* Academic literature (BOTH supporting AND opposing views)
* Reputable media across diverse political/ideological perspectives
* Official contradictions (retractions, corrections, updates, amendments)
* Domain-specific skeptics, critics, and alternative expert opinions
* Cross-cultural and international perspectives

**The search must actively avoid algorithmic bubbles:**

* Deliberately seek opposing viewpoints
* Check for echo chamber patterns in source clusters
* Identify tribal or ideological source clustering
* Flag when the search space appears artificially constrained
* Verify diversity of perspectives represented

**Outcomes:**

* **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode
* **Significant uncertainty detected** → Require uncertainty disclosure in verdict
* **Bubble indicators present** → Flag for expert review and human validation
* **Limited perspective diversity** → Expand search or flag for human review

=== 6.3 Gate 3: Uncertainty Quantification ===

* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

=== 6.4 Gate 4: Structural Integrity ===

* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If any gate fails:**

* Content remains in draft-only mode
* Failure reason logged
* Human review required before publication
* Failure patterns analyzed for system improvement

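The gate flow above reduces to a simple pattern: run every gate, collect every failure, and publish only when the failure list is empty. A minimal sketch, with placeholder gate functions standing in for the much richer real checks:

```python
def run_quality_gates(content, gates):
    """Run all gates; any failure keeps the content in draft-only mode.

    gates: ordered list of (name, check) pairs, where check(content) -> bool.
    Both the dict shape and the gate names used by callers are illustrative.
    """
    failures = [name for name, check in gates if not check(content)]
    if failures:
        # Draft-only: every failure reason is recorded so failure
        # patterns can be analyzed for system improvement.
        return {"publishable": False, "failed_gates": failures}
    return {"publishable": True, "failed_gates": []}
```

Collecting all failures (rather than stopping at the first) matters for the improvement loop: the logged failure patterns in Section 7.3 are only useful if every failing gate is recorded, not just the first one hit.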
== 7. Audit System (Sampling-Based Quality Assurance) ==

Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:

=== 7.1 Sampling Strategy ===

Audits prioritize:

* **Risk tier** (higher tiers get more frequent audits)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

=== 7.2 Audit Process ===

1. System selects content for audit based on the sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Auditor validates or corrects:
* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement

=== 7.3 Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

=== 7.4 Audit Transparency ===

* Audit statistics published regularly
* Accuracy rates by risk tier tracked and reported
* System improvements documented
* Community can view aggregate audit performance

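One way to combine these prioritization signals into a single sampling score is an additive weighting: higher-scoring items get audited more often. This is an assumption for illustration, not the specified algorithm, and the weights are arbitrary placeholders.

```python
# Illustrative weight per risk tier (higher tier -> more frequent audits).
TIER_WEIGHT = {"A": 1.0, "B": 0.6, "C": 0.3}

def audit_priority(tier: str, ai_confidence: float,
                   engagement: float, is_novel: bool,
                   flag_count: int) -> float:
    """Combine the five signals from Section 7.1 into one score.

    ai_confidence and engagement are assumed normalized to [0, 1].
    """
    score = TIER_WEIGHT[tier]
    score += 1.0 - ai_confidence         # low confidence -> higher sampling
    score += 0.5 * engagement            # high-visibility content audited more
    score += 0.5 if is_novel else 0.0    # new claim types, domains, topics
    score += 0.25 * min(flag_count, 4)   # user flags / contradiction alerts, capped
    return score
```

In practice the score could feed a weighted random sampler so that even low-priority Tier C content retains a nonzero chance of audit.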
== 8. Architecture Overview ==

{{include reference="Archive.FactHarbor V0\.9\.23 Lost Data.Specification.Diagrams.AKEL Architecture.WebHome"}}

== 9. AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends and receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require reviewer approval
* Untrusted nodes: fully manual import

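The per-node trust policy above is essentially a lookup from trust level to import behavior. A minimal sketch, where the policy names come from this section but the function shape and return strings are assumptions:

```python
# Trust level -> how AKEL-related data from that node is imported.
TRUST_POLICY = {
    "trusted": "auto-merge",         # embeddings + templates merged automatically
    "neutral": "reviewer-approval",  # imports held until a reviewer approves
    "untrusted": "manual-import",    # every item imported by hand
}

def import_action(node_trust: str) -> str:
    """Nodes with an unknown trust level default to the most restrictive handling."""
    return TRUST_POLICY.get(node_trust, "manual-import")
```

Defaulting unknown nodes to fully manual import is a deliberate safe-by-default choice (an assumption here, but consistent with the rule that federation never overrides local governance).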
== 10. Human Review Workflow (Mode 3 Publication) ==

For content requiring human validation before "Human-Reviewed" status:

1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
2. Reviewers inspect content in the review queue
3. Reviewers validate that the quality gates were correctly applied
4. Experts validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "Human-Reviewed" publication
6. Version numbers increment; full history preserved

**Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.

== 11. POC v1 Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* Produces public AI-generated output (Mode 2)
* No human data sources required
* No human approval gate
* Clear "AI-Generated - POC/Demo" labeling
* All quality gates active (including contradiction search)
* Users understand this demonstrates AI reasoning capabilities
* Risk tier classification shown (demo purposes)

== 12. Related Pages ==

* [[Automation>>Test.FactHarborV09.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>Test.FactHarborV09.Specification.Requirements.WebHome]]
* [[Workflows>>Test.FactHarborV09.Specification.Workflows.WebHome]]
* [[Governance>>Test.FactHarborV09.Organisation.Governance]]
{{/include}}