= AKEL — AI Knowledge Extraction Layer =

AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**.

AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below).

AKEL operates in two modes:

* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)

Human reviewers, experts, and moderators always retain final authority over content marked as "Human-Reviewed."

----

== Purpose and Role ==

AKEL transforms unstructured inputs into structured, publication-ready content.

Core responsibilities:

* Claim extraction from arbitrary text
* Claim classification (domain, type, evaluability, safety, **risk tier**)
* Scenario generation (definitions, boundaries, assumptions, methodology)
* Evidence summarization and metadata extraction
* **Contradiction detection and counter-evidence search**
* **Reservation and limitation identification**
* **Bubble detection** (echo chambers, conspiracy theories, isolated sources)
* Re-evaluation proposal generation
* Cross-node embedding exchange (Release 1.0+)

----

== Components ==

* **AKEL Orchestrator** – central coordinator
* **Claim Extractor**
* **Claim Classifier** (with risk tier assignment)
* **Scenario Generator**
* **Evidence Summarizer**
* **Contradiction Detector** (enhanced with counter-evidence search)
* **Quality Gate Validator**
* **Audit Sampling Scheduler**
* **Embedding Handler** (Release 1.0+)
* **Federation Sync Adapter** (Release 1.0+)

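To make the orchestration concrete, the following Python sketch shows one way the orchestrator could wire these components together. All class and method names are illustrative assumptions, not the actual FactHarbor API.

```
# Illustrative sketch only: component and method names are assumptions,
# not the actual FactHarbor/AKEL API.
from dataclasses import dataclass, field


@dataclass
class AkelResult:
    claims: list = field(default_factory=list)
    scenarios: list = field(default_factory=list)
    contradictions: list = field(default_factory=list)
    gate_report: dict = field(default_factory=dict)


class AkelOrchestrator:
    """Central coordinator: runs the extraction pipeline in order."""

    def __init__(self, extractor, classifier, scenario_gen, summarizer,
                 contradiction_detector, gate_validator):
        self.extractor = extractor
        self.classifier = classifier
        self.scenario_gen = scenario_gen
        self.summarizer = summarizer
        self.contradiction_detector = contradiction_detector
        self.gate_validator = gate_validator

    def process(self, raw_input: str) -> AkelResult:
        result = AkelResult()
        # 1. Extract candidate claims from the raw text.
        result.claims = self.extractor.extract(raw_input)
        for claim in result.claims:
            # 2. Classify domain, type, evaluability, safety and risk tier.
            claim["classification"] = self.classifier.classify(claim)
            # 3. Generate scenarios (definitions, boundaries, assumptions).
            result.scenarios.append(self.scenario_gen.generate(claim))
            # 4. Summarize evidence and search for counter-evidence.
            claim["evidence"] = self.summarizer.summarize(claim)
            result.contradictions += self.contradiction_detector.search(claim)
        # 5. Run the quality gates; the report decides the publication mode.
        result.gate_report = self.gate_validator.validate(result)
        return result
```
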
----

== Inputs and Outputs ==

=== Inputs ===

* User-submitted claims or evidence
* Uploaded documents
* URLs or citations
* External LLM API (optional)
* Embeddings (from local or federated peers)

=== Outputs (publication mode varies by risk tier) ===

* ClaimVersion (draft or AI-generated)
* ScenarioVersion (draft or AI-generated)
* EvidenceVersion (summary + metadata, draft or AI-generated)
* VerdictVersion (draft, AI-generated, or human-reviewed)
* Contradiction alerts
* Reservation and limitation notices
* Re-evaluation proposals
* Updated embeddings

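As a rough illustration of how an output version could carry the **AuthorType = AI** marker and its review status, here is a minimal Python sketch; the field names and enum values are assumptions derived from this page, not the real data model.

```
# Illustrative sketch: field names and enum values are assumptions based on
# this page, not the actual FactHarbor data model.
from dataclasses import dataclass
from enum import Enum


class AuthorType(Enum):
    AI = "AI"
    HUMAN = "Human"
    AI_PLUS_HUMAN = "AI+Human"


class ReviewStatus(Enum):
    DRAFT_ONLY = "Draft-Only"          # Mode 1: never public
    AI_GENERATED = "AI-Generated"      # Mode 2: published, pending review
    HUMAN_REVIEWED = "Human-Reviewed"  # Mode 3: validated by humans


@dataclass
class ClaimVersion:
    claim_id: str
    text: str
    risk_tier: str                     # "A", "B" or "C"
    author_type: AuthorType = AuthorType.AI
    review_status: ReviewStatus = ReviewStatus.DRAFT_ONLY
    contradiction_search_completed: bool = False
```
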
----

== Publication Modes ==

AKEL content is handled in one of three publication modes:

=== Mode 1: Draft-Only (Never Public) ===

**Used for:**

* Failed quality gate checks
* Sensitive topics flagged for expert review
* Unclear scope or missing critical sources
* High reputational risk content

**Visibility:** Internal review queue only

=== Mode 2: Published as AI-Generated (No Prior Human Review) ===

**Requirements:**

* All automated quality gates passed (see below)
* Risk tier permits AI-draft publication (Tier B or C)
* Contradiction search completed successfully
* Clear labeling as "AI-Generated, Awaiting Human Review"

**Label shown to users:**
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```

**User actions:**

* Browse and read content
* Request human review (escalates to review queue)
* Flag for expert attention

=== Mode 3: Published as Human-Reviewed ===

**Requirements:**

* Validated by a human reviewer or domain expert
* All quality gates passed
* Visible "Human-Reviewed" mark with reviewer role and timestamp

**Label shown to users:**
```
[Human-Reviewed] This content has been validated by human reviewers.
Source: AI+Human | Review Status: Approved | Reviewed by: [Role] on [timestamp]
Risk Tier: [A/B/C] | Contradiction Search: Completed
```

----

== Risk Tiers ==

AKEL assigns a risk tier to all content to determine the appropriate review requirements:

=== Tier A — High Risk / High Impact ===

**Domains:** Medical, legal, elections, safety/security, major reputational harm

**Publication policy:**

* Human review REQUIRED before "Human-Reviewed" status
* AI-generated content MAY be published but:
** Clearly flagged as AI-draft with prominent disclaimer
** May have limited visibility
** Auto-escalated to expert review queue
** User warnings displayed

**Audit rate:** Recommended 30-50% of published AI-drafts sampled in the first 6 months

=== Tier B — Medium Risk ===

**Domains:** Contested public policy, complex science, causality claims, significant financial impact

**Publication policy:**

* AI-drafts CAN be published immediately with clear labeling
* Sampling audits conducted (see Audit System below)
* High-engagement items auto-escalated to expert review
* Users can request human review

**Audit rate:** Recommended 10-20% of published AI-drafts sampled

=== Tier C — Low Risk ===

**Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus

**Publication policy:**

* AI-draft is the default publication mode
* Sampling audits sufficient
* Community flagging available
* Human review on request

**Audit rate:** Recommended 5-10% of published AI-drafts sampled

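The tier policies above can be read as a simple decision rule combining risk tier and quality-gate results. The Python sketch below illustrates that rule; the function name, return strings, and audit-rate constants are assumptions for illustration only.

```
# Illustrative decision sketch combining risk tier and quality-gate results.
# The function name and exact rules are assumptions drawn from this page.

def select_publication_mode(risk_tier: str, gates_passed: bool,
                            contradiction_search_done: bool) -> str:
    """Return the publication mode an AKEL output would start in."""
    if not gates_passed or not contradiction_search_done:
        return "Mode 1: Draft-Only (failed gates stay internal)"
    if risk_tier == "A":
        # Tier A may publish an AI draft, but it is auto-escalated and
        # only reaches "Human-Reviewed" after expert validation.
        return "Mode 2: AI-Generated (auto-escalated to expert review)"
    if risk_tier in ("B", "C"):
        return "Mode 2: AI-Generated (sampling audits apply)"
    return "Mode 1: Draft-Only (unknown tier, treat conservatively)"


# Recommended audit sampling rates per tier (midpoints of the ranges above).
AUDIT_RATE_RECOMMENDATION = {"A": 0.40, "B": 0.15, "C": 0.075}
```
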
----

== Quality Gates (Mandatory Before AI-Draft Publication) ==

All AI-generated content must pass these automated checks before Mode 2 publication:

=== Gate 1: Source Quality ===

* Primary sources identified and accessible
* Source reliability scored against whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

=== Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions, applicability constraints
* **Alternative interpretations** – Different framings, definitions, contextual variations
* **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources

**Search coverage requirements:**

* Academic literature (BOTH supporting AND opposing views)
* Reputable media across diverse political/ideological perspectives
* Official contradictions (retractions, corrections, updates, amendments)
* Domain-specific skeptics, critics, and alternative expert opinions
* Cross-cultural and international perspectives

**Search must actively avoid algorithmic bubbles:**

* Deliberately seek opposing viewpoints
* Check for echo chamber patterns in source clusters
* Identify tribal or ideological source clustering
* Flag when search space appears artificially constrained
* Verify diversity of perspectives represented

**Outcomes:**

* **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode
* **Significant uncertainty detected** → Require uncertainty disclosure in verdict
* **Bubble indicators present** → Flag for expert review and human validation
* **Limited perspective diversity** → Expand search or flag for human review

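As an illustration, the outcome rules above could be expressed as a small routing function; the signal names and the diversity threshold in this Python sketch are assumptions, not specified values.

```
# Illustrative routing of Gate 2 outcomes to actions; the signal names and
# the diversity threshold are assumptions, not specified values.

def route_contradiction_findings(findings: dict) -> list[str]:
    """Map contradiction-search signals to the escalation actions above."""
    actions = []
    if findings.get("strong_counter_evidence"):
        actions.append("escalate: raise risk tier or keep in draft-only mode")
    if findings.get("significant_uncertainty"):
        actions.append("require: uncertainty disclosure in the verdict")
    if findings.get("bubble_indicators"):
        actions.append("flag: expert review and human validation")
    if findings.get("perspective_diversity", 1.0) < 0.5:  # assumed threshold
        actions.append("expand: widen search or flag for human review")
    return actions or ["proceed: no blocking contradiction signals"]
```
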
=== Gate 3: Uncertainty Quantification ===

* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

=== Gate 4: Structural Integrity ===

* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If any gate fails:**

* Content remains in draft-only mode
* Failure reason logged
* Human review required before publication
* Failure patterns analyzed for system improvement

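A minimal Python sketch of a gate runner that applies the four gates and records failure reasons for draft-only handling follows; the check functions and report shape are assumptions.

```
# Illustrative gate runner: gate names mirror this page, but the check
# functions and the report shape are assumptions.

def run_quality_gates(content, checks: dict) -> dict:
    """Run all four gates; any failure keeps the content in draft-only mode."""
    report = {"passed": True, "failures": []}
    for gate_name in ("source_quality", "contradiction_search",
                      "uncertainty_quantification", "structural_integrity"):
        ok, reason = checks[gate_name](content)
        if not ok:
            report["passed"] = False
            # The failure reason is logged and later analyzed for improvement.
            report["failures"].append({"gate": gate_name, "reason": reason})
    return report
```
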
----

== Audit System (Sampling-Based Quality Assurance) ==

Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:

=== Sampling Strategy ===

Audits prioritize:

* **Risk tier** (higher tiers get more frequent audits)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

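One possible way to combine these prioritization signals into a sampling probability is sketched below in Python; the weights and the 0-1 feature scaling are illustrative assumptions, not calibrated values.

```
# Illustrative sampling-priority score; the weights and 0-1 feature scaling
# are assumptions, not calibrated values.

BASE_RATE_BY_TIER = {"A": 0.40, "B": 0.15, "C": 0.075}  # tier recommendations above

def audit_sampling_probability(risk_tier: str, ai_confidence: float,
                               engagement: float, novelty: float,
                               disagreement: float) -> float:
    """Combine the five prioritization signals into a sampling probability."""
    base = BASE_RATE_BY_TIER.get(risk_tier, 0.15)
    boost = (
        0.3 * (1.0 - ai_confidence)   # low confidence -> sample more
        + 0.2 * engagement            # high-visibility content audited more
        + 0.2 * novelty               # new claim types / domains / topics
        + 0.3 * disagreement          # user flags, contradiction alerts
    )
    return min(1.0, base * (1.0 + boost))
```
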
=== Audit Process ===

1. System selects content for audit based on sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Auditor validates or corrects:

* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement

=== Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

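The following Python sketch shows how a recorded audit outcome might be routed into the improvement targets listed above; the issue keys and tuner methods are hypothetical names used only for illustration.

```
# Illustrative audit record feeding the feedback loop; the field names,
# issue keys, and tuner methods are hypothetical.
from dataclasses import dataclass, field


@dataclass
class AuditOutcome:
    content_id: str
    passed: bool
    feedback: dict = field(default_factory=dict)  # per-criterion notes


def apply_feedback(outcome: AuditOutcome, tuner) -> None:
    """Route a failed audit's feedback to the subsystem it should improve."""
    if outcome.passed:
        return
    routes = {
        "missed_evidence": tuner.refine_query_templates,
        "unreliable_source": tuner.adjust_source_weights,
        "missed_counter_evidence": tuner.tune_contradiction_heuristics,
        "extraction_error": tuner.tune_prompts_and_rules,
        "wrong_risk_tier": tuner.recalibrate_risk_tiers,
        "echo_chamber_missed": tuner.improve_bubble_detection,
    }
    for issue, details in outcome.feedback.items():
        if issue in routes:
            routes[issue](details)
```
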
=== Audit Transparency ===

* Audit statistics published regularly
* Accuracy rates by risk tier tracked and reported
* System improvements documented
* Community can view aggregate audit performance

----

== Architecture Overview ==

{{include reference="Archive.FactHarbor V0\.9\.23 Lost Data.Specification.Diagrams.AKEL Architecture.WebHome"/}}

----

== AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends + receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require reviewer approval
* Untrusted nodes: fully manual import

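A minimal Python sketch of how a node might apply these trust levels to incoming AKEL data follows; the function and trust-level strings are assumptions, and model weights are never part of the exchanged payload.

```
# Illustrative handling of federated AKEL data by peer trust level;
# names are assumptions, and model weights are never exchanged.

def handle_federated_payload(peer_trust: str, payload: dict,
                             review_queue: list, local_store: dict) -> str:
    """Apply the node's trust policy to incoming embeddings/templates."""
    if peer_trust == "trusted":
        local_store.update(payload)       # auto-merge embeddings + templates
        return "auto-merged"
    if peer_trust == "neutral":
        review_queue.append(payload)      # reviewer approval required
        return "queued for reviewer approval"
    return "held for fully manual import"  # untrusted peers
```
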
----

== Human Review Workflow (Mode 3 Publication) ==

For content requiring human validation before "Human-Reviewed" status:

1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
2. Reviewers inspect content in the review queue
3. Reviewers validate that quality gates were correctly applied
4. Experts validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "Human-Reviewed" publication
6. Version numbers increment, full history preserved

**Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.

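The workflow above implies a small set of review-status transitions. The Python sketch below captures one plausible transition table; the allowed moves are assumptions based on the modes described on this page.

```
# Illustrative status transitions for the review workflow; the allowed-move
# table is an assumption based on the publication modes above.

ALLOWED_TRANSITIONS = {
    "Draft-Only": {"AI-Generated", "Human-Reviewed"},  # gates pass / reviewed
    "AI-Generated": {"Human-Reviewed", "Draft-Only"},  # reviewed, or pulled back
    "Human-Reviewed": {"Draft-Only"},                  # e.g. re-evaluation proposal
}

def transition(status: str, target: str) -> str:
    """Move content to a new review status if the transition is allowed."""
    if target not in ALLOWED_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {target}")
    return target
```
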
----

== POC v1 Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* Produces public AI-generated output (Mode 2)
* No human data sources required
* No human approval gate
* Clear "AI-Generated - POC/Demo" labeling
* All quality gates active (including contradiction search)
* Users understand this demonstrates AI reasoning capabilities
* Risk tier classification shown (demo purposes)

----

== Related Pages ==

* [[Automation>>Archive.FactHarbor V0\.9\.18 copy.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>Archive.FactHarbor V0\.9\.18 copy.Specification.Requirements.WebHome]]
* [[Workflows>>Archive.FactHarbor V0\.9\.18 copy.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance]]