Last modified by Robert Schaub on 2025/12/24 18:26

= AKEL — AI Knowledge Extraction Layer =
AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**.
AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below).
AKEL operates in two modes:
* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)
== 1. Purpose and Role ==
AKEL transforms unstructured inputs into structured, publication-ready content.
Core responsibilities:
* Claim extraction from arbitrary text
* Claim classification (domain, type, evaluability, safety, **risk tier**)
* Scenario generation (definitions, boundaries, assumptions, methodology)
* Evidence summarization and metadata extraction
* **Contradiction detection and counter-evidence search**
* **Reservation and limitation identification**
* **Bubble detection** (echo chambers, conspiracy theories, isolated sources)
* Re-evaluation proposal generation
* Cross-node embedding exchange (Release 1.0+)
== 2. Components ==
* **AKEL Orchestrator** – central coordinator
* **Claim Extractor**
* **Claim Classifier** (with risk tier assignment)
* **Scenario Generator**
* **Evidence Summarizer**
* **Contradiction Detector** (enhanced with counter-evidence search)
* **Quality Gate Validator**
* **Audit Sampling Scheduler**
* **Embedding Handler** (Release 1.0+)
* **Federation Sync Adapter** (Release 1.0+)
== 3. Inputs and Outputs ==
=== 3.1 Inputs ===
* User-submitted claims or evidence
* Uploaded documents
* URLs or citations
* External LLM API (optional)
* Embeddings (from local or federated peers)
=== 3.2 Outputs (publication mode varies by risk tier) ===
* ClaimVersion (draft or AI-generated)
* ScenarioVersion (draft or AI-generated)
* EvidenceVersion (summary + metadata, draft or AI-generated)
* VerdictVersion (draft, AI-generated, or human-reviewed)
* Contradiction alerts
* Reservation and limitation notices
* Re-evaluation proposals
* Updated embeddings
== 4. Publication Modes ==
AKEL content is published in one of three modes. Modes 1 and 2 are described below; Mode 3 (human-reviewed publication) is covered in Section 10.
=== 4.1 Mode 1: Draft-Only (Never Public) ===
**Used for:**
* Failed quality gate checks
* Sensitive topics flagged for expert review
* Unclear scope or missing critical sources
* High reputational risk content
**Visibility:** Internal review queue only
=== 4.2 Mode 2: Published as AI-Generated (No Prior Human Review) ===
**Requirements:**
* All automated quality gates passed (see below)
* Risk tier permits AI-draft publication (Tier B or C)
* Contradiction search completed successfully
* Clear labeling as "AI-Generated"
**Label shown to users:**
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```
**User actions:**
* Browse and read content
* Request human review (escalates to review queue)
* Flag for expert attention
== 5. Risk Tiers ==
AKEL assigns a risk tier to all content to determine the appropriate review requirements:
=== 5.1 Tier A — High Risk / High Impact ===
**Domains:** Medical, legal, elections, safety/security, major reputational harm
**Publication policy:**
* Human review REQUIRED before "AKEL-Generated" status
* AI-generated content MAY be published, but:
** Clearly flagged as AI-draft with a prominent disclaimer
** May have limited visibility
** Auto-escalated to the expert review queue
** User warnings displayed
**Audit rate (recommended):** 30-50% of published AI-drafts sampled during the first 6 months
=== 5.2 Tier B — Medium Risk ===
**Domains:** Contested public policy, complex science, causality claims, significant financial impact
**Publication policy:**
* AI-drafts CAN be published immediately with clear labeling
* Sampling audits conducted (see Audit System below)
* High-engagement items auto-escalated to expert review
* Users can report issues for moderator review
**Audit rate (recommended):** 10-20% of published AI-drafts sampled
=== 5.3 Tier C — Low Risk ===
**Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus
**Publication policy:**
* AI-draft is the default publication mode
* Sampling audits sufficient
* Community flagging available
* Human review on request
**Audit rate (recommended):** 5-10% of published AI-drafts sampled
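The tier policies above can be sketched as a simple lookup. This is an illustrative sketch only: the function and field names are assumptions, and the audit-rate ranges simply restate the recommendations given above.

```python
# Sketch of the risk-tier publication policies described above.
# All names and structures here are illustrative, not normative.

RISK_TIERS = {
    "A": {"human_review_required": True,  "audit_rate": (0.30, 0.50)},
    "B": {"human_review_required": False, "audit_rate": (0.10, 0.20)},
    "C": {"human_review_required": False, "audit_rate": (0.05, 0.10)},
}

def publication_mode(tier: str, gates_passed: bool) -> str:
    """Return the publication mode for a freshly generated AI draft."""
    if not gates_passed:
        return "draft-only"            # Mode 1: failed quality gates
    if RISK_TIERS[tier]["human_review_required"]:
        return "ai-draft-escalated"    # Tier A: published flagged, expert queue
    return "ai-generated"              # Mode 2: published with AI label
```

Note that even Tier A drafts are not silently published here; they come back with an escalation status, matching the auto-escalation rule above.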
== 6. Quality Gates (Mandatory Before AI-Draft Publication) ==
All AI-generated content must pass these automated checks before Mode 2 publication:
=== 6.1 Gate 1: Source Quality ===
* Primary sources identified and accessible
* Source reliability scored against whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)
=== 6.2 Gate 2: Contradiction Search (MANDATORY) ===
**The system MUST actively search for:**
* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions, applicability constraints
* **Alternative interpretations** – Different framings, definitions, contextual variations
* **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources
**Search coverage requirements:**
* Academic literature (BOTH supporting AND opposing views)
* Reputable media across diverse political/ideological perspectives
* Official contradictions (retractions, corrections, updates, amendments)
* Domain-specific skeptics, critics, and alternative expert opinions
* Cross-cultural and international perspectives
**Search must actively avoid algorithmic bubbles:**
* Deliberately seek opposing viewpoints
* Check for echo chamber patterns in source clusters
* Identify tribal or ideological source clustering
* Flag when the search space appears artificially constrained
* Verify diversity of perspectives represented
**Outcomes:**
* **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode
* **Significant uncertainty detected** → Require uncertainty disclosure in the verdict
* **Bubble indicators present** → Flag for expert review and human validation
* **Limited perspective diversity** → Expand the search or flag for human review
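The outcome rules above are a straightforward findings-to-actions mapping. A minimal sketch, assuming hypothetical flag and action names (none of these identifiers are specified by the document):

```python
# Sketch of the Gate 2 outcome rules listed above; flag names are illustrative.

def contradiction_outcome(findings: dict) -> list[str]:
    """Map contradiction-search findings to the follow-up actions above."""
    actions = []
    if findings.get("strong_counter_evidence"):
        actions.append("escalate_tier_or_draft_only")
    if findings.get("significant_uncertainty"):
        actions.append("require_uncertainty_disclosure")
    if findings.get("bubble_indicators"):
        actions.append("flag_for_expert_review")
    if findings.get("low_perspective_diversity"):
        actions.append("expand_search_or_human_review")
    return actions
```

Several findings can apply at once, so the sketch returns a list of actions rather than a single decision.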
=== 6.3 Gate 3: Uncertainty Quantification ===
* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered
=== 6.4 Gate 4: Structural Integrity ===
* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated
**If any gate fails:**
* Content remains in draft-only mode
* Failure reason logged
* Human review required before publication
* Failure patterns analyzed for system improvement
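The four gates and the fail-closed rule above can be sketched as a sequential pipeline. Gate names follow Sections 6.1-6.4; the function signature, the check-callable convention, and the logging call are assumptions for illustration:

```python
# Minimal sketch of the quality-gate pipeline: run every gate, log each
# failure, and report whether the content may leave draft-only mode.
import logging

GATES = ["source_quality", "contradiction_search",
         "uncertainty_quantification", "structural_integrity"]

def run_quality_gates(content, checks: dict) -> tuple[bool, list[str]]:
    """Run all gates; any single failure keeps the content in draft-only mode."""
    failures = [gate for gate in GATES if not checks[gate](content)]
    for gate in failures:
        logging.info("quality gate failed: %s", gate)  # failure reason logged
    return (not failures, failures)
```

Running all gates (instead of stopping at the first failure) supports the failure-pattern analysis mentioned above, since every failing gate is recorded.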
== 7. Audit System (Sampling-Based Quality Assurance) ==
Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:
=== 7.1 Sampling Strategy ===
Audits prioritize:
* **Risk tier** (higher tiers get more frequent audits)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)
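One way to combine these prioritization signals is a weighted sampling probability per item. The base rates below are the midpoints of the recommended audit ranges in Section 5; all multipliers are invented for illustration and would need calibration:

```python
# Illustrative stratified-sampling weight combining the signals above.
# Base rates use the midpoints of the recommended per-tier audit ranges;
# every multiplier is an assumed, uncalibrated example value.

BASE_RATE = {"A": 0.40, "B": 0.15, "C": 0.075}

def audit_probability(tier: str, ai_confidence: float,
                      high_engagement: bool, novel: bool,
                      disagreement_flags: int) -> float:
    p = BASE_RATE[tier]
    p *= 1.0 + (1.0 - ai_confidence)      # low confidence -> sample more often
    if high_engagement:
        p *= 1.5                          # high-visibility content audited more
    if novel:
        p *= 1.5                          # new claim types, domains, topics
    p *= 1.0 + 0.25 * disagreement_flags  # user flags, contradiction alerts
    return min(p, 1.0)                    # cap at certainty
```

A fully confident, low-engagement Tier C item stays at its base rate, while a low-confidence, heavily flagged Tier A item is audited with certainty.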
=== 7.2 Audit Process ===
1. System selects content for audit based on the sampling strategy
2. A human auditor reviews the AI-generated content against quality standards
3. The moderator validates or corrects:
* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement
=== 7.3 Feedback Loop (Continuous Improvement) ===
Audit outcomes systematically improve:
* **Query templates** – Refined based on missed-evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers
=== 7.4 Audit Transparency ===
* Audit statistics published regularly
* Accuracy rates by risk tier tracked and reported
* System improvements documented
* Community can view aggregate audit performance
== 8. Architecture Overview ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
== 9. AKEL and Federation ==
In Release 1.0+, AKEL participates in cross-node knowledge alignment:
* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends and receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance
Nodes may choose trust levels for AKEL-related data:
* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require additional verification
* Untrusted nodes: fully manual import
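The three trust levels above reduce to a per-peer import policy. A minimal sketch, assuming hypothetical policy names (the three-way split mirrors the list above; everything else is invented):

```python
# Sketch of trust-level handling for incoming AKEL federation data
# (embeddings, claim forms, scenario templates). Policy names are illustrative.

def import_policy(trust_level: str) -> str:
    """Decide how data from a peer node is imported, per the node's trust level."""
    return {
        "trusted":   "auto_merge",         # auto-merge embeddings + templates
        "neutral":   "verify_then_merge",  # require additional verification
        "untrusted": "manual_import",      # fully manual import
    }[trust_level]
```

An unknown trust level raises a `KeyError` here, which is a deliberately conservative default: unclassified peers never get an automatic merge.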
== 10. Human Review Workflow (Mode 3 Publication) ==
For content requiring human validation before "AKEL-Generated" status:
1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
2. Contributors inspect the content in the review queue
3. Contributors validate that the quality gates were correctly applied
4. Trusted Contributors validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "AKEL-Generated" publication
6. Version numbers increment; full history preserved
**Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.
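The workflow above implies a small set of status transitions between the three publication modes. A sketch, assuming hypothetical status strings; the transition table itself is an interpretation of the workflow, not something the document specifies explicitly:

```python
# Sketch of version-status transitions implied by the Mode 1/2/3 workflow.
# Status names mirror the document's labels; the table is an assumption.

ALLOWED = {
    ("draft", "ai-generated"),           # Mode 1 -> Mode 2 once gates pass
    ("draft", "akel-generated"),         # Mode 1 -> Mode 3 via human review
    ("ai-generated", "akel-generated"),  # Mode 2 -> Mode 3 after validation
    ("ai-generated", "draft"),           # demoted after a failed audit
}

def transition(status: str, new_status: str) -> str:
    """Apply a status change, rejecting transitions the workflow does not allow."""
    if (status, new_status) not in ALLOWED:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```

Notably there is no transition out of "akel-generated" here: under full versioning, a correction would produce a new version rather than demote the reviewed one.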
== 11. POC v1 Behavior ==
The POC explicitly demonstrates AI-generated content publication:
* Produces public AI-generated output (Mode 2)
* No human data sources required
* No human approval gate
* Clear "AI-Generated - POC/Demo" labeling
* All quality gates active (including contradiction search)
* Users understand this demonstrates AI reasoning capabilities
* Risk tier classification shown (demo purposes)
== 12. Related Pages ==
* [[Automation>>FactHarbor.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance.WebHome]]