Wiki source code of AI Knowledge Extraction Layer (AKEL)
Last modified by Robert Schaub on 2025/12/24 21:46
= AKEL — AI Knowledge Extraction Layer =
AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**.
AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below).
AKEL operates in two modes:
* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)
== 1. Purpose and Role ==
AKEL transforms unstructured inputs into structured, publication-ready content.
Core responsibilities:
* Claim extraction from arbitrary text
* Claim classification (domain, type, evaluability, safety, **risk tier**)
* Scenario generation (definitions, boundaries, assumptions, methodology)
* Evidence summarization and metadata extraction
* **Contradiction detection and counter-evidence search**
* **Reservation and limitation identification**
* **Bubble detection** (echo chambers, conspiracy theories, isolated sources)
* Re-evaluation proposal generation
* Cross-node embedding exchange (Release 1.0+)
== 2. Components ==
* **AKEL Orchestrator** – central coordinator
* **Claim Extractor**
* **Claim Classifier** (with risk tier assignment)
* **Scenario Generator**
* **Evidence Summarizer**
* **Contradiction Detector** (enhanced with counter-evidence search)
* **Quality Gate Validator**
* **Audit Sampling Scheduler**
* **Embedding Handler** (Release 1.0+)
* **Federation Sync Adapter** (Release 1.0+)
== 3. Inputs and Outputs ==
=== 3.1 Inputs ===
* User-submitted claims or evidence
* Uploaded documents
* URLs or citations
* External LLM API (optional)
* Embeddings (from local or federated peers)
=== 3.2 Outputs (publication mode varies by risk tier) ===
* ClaimVersion (draft or AI-generated)
* ScenarioVersion (draft or AI-generated)
* EvidenceVersion (summary + metadata, draft or AI-generated)
* VerdictVersion (draft, AI-generated, or human-reviewed)
* Contradiction alerts
* Reservation and limitation notices
* Re-evaluation proposals
* Updated embeddings
== 4. Publication Modes ==
AKEL content is published according to three modes. Modes 1 and 2 are described below; Mode 3 (human-reviewed publication) is covered in the Human Review Workflow section.
=== 4.1 Mode 1: Draft-Only (Never Public) ===
**Used for:**
* Failed quality gate checks
* Sensitive topics flagged for expert review
* Unclear scope or missing critical sources
* High reputational risk content
**Visibility:** Internal review queue only
=== 4.2 Mode 2: Published as AI-Generated (No Prior Human Review) ===
**Requirements:**
* All automated quality gates passed (see below)
* Risk tier permits AI-draft publication (Tier B or C)
* Contradiction search completed successfully
* Clear labeling as "AI-Generated"
**Label shown to users:**
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```
**User actions:**
* Browse and read content
* Request human review (escalates to review queue)
* Flag for expert attention
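The Mode 1 vs. Mode 2 decision described above can be sketched as a small function. This is a minimal illustration, not FactHarbor code; the function and parameter names (`choose_publication_mode`, `gates_passed`, `risk_tier`) are assumptions for the sketch.

```python
def choose_publication_mode(gates_passed: bool, risk_tier: str) -> str:
    """Return 'mode2' (publish with the [AI-Generated] label) only when all
    quality gates passed and the risk tier permits AI-draft publication;
    everything else stays in 'mode1' (draft-only, internal review queue)."""
    if gates_passed and risk_tier in ("B", "C"):
        return "mode2"
    return "mode1"
```

For example, a Tier A item is routed to the draft-only queue even when every gate passes, matching the Tier A policy below.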
== 5. Risk Tiers ==
AKEL assigns a risk tier to all content to determine the appropriate review requirements:
=== 5.1 Tier A — High Risk / High Impact ===
**Domains:** Medical, legal, elections, safety/security, major reputational harm
**Publication policy:**
* Human review REQUIRED before "AKEL-Generated" status
* AI-generated content MAY be published, but:
** Clearly flagged as AI-draft with prominent disclaimer
** May have limited visibility
** Auto-escalated to expert review queue
** User warnings displayed
**Audit rate:** Recommended 30-50% of published AI-drafts sampled in the first 6 months
=== 5.2 Tier B — Medium Risk ===
**Domains:** Contested public policy, complex science, causality claims, significant financial impact
**Publication policy:**
* AI-drafts CAN be published immediately with clear labeling
* Sampling audits conducted (see Audit System below)
* High-engagement items auto-escalated to expert review
* Users can report issues for moderator review
**Audit rate:** Recommended 10-20% of published AI-drafts sampled
=== 5.3 Tier C — Low Risk ===
**Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus
**Publication policy:**
* AI-draft is the default publication mode
* Sampling audits sufficient
* Community flagging available
* Human review on request
**Audit rate:** Recommended 5-10% of published AI-drafts sampled
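The three tier policies above amount to a lookup table. The sketch below encodes them for illustration; the names (`TIER_POLICY`, `audit_rate`) and the midpoint convention are assumptions, not part of the specification.

```python
# Tier policies from Section 5, encoded as data for illustration.
# audit_rate holds the recommended sampling range as (low, high) fractions.
TIER_POLICY = {
    "A": {"ai_draft_publish": False, "audit_rate": (0.30, 0.50)},  # human review required
    "B": {"ai_draft_publish": True,  "audit_rate": (0.10, 0.20)},
    "C": {"ai_draft_publish": True,  "audit_rate": (0.05, 0.10)},
}

def audit_rate(tier: str) -> float:
    """Midpoint of the recommended sampling range for a tier."""
    lo, hi = TIER_POLICY[tier]["audit_rate"]
    return (lo + hi) / 2
```

Keeping the policy in data rather than branching logic makes it easy to recalibrate tiers from audit error patterns, as Section 7.3 requires.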
== 6. Quality Gates (Mandatory Before AI-Draft Publication) ==
All AI-generated content must pass these automated checks before Mode 2 publication:
=== 6.1 Gate 1: Source Quality ===
* Primary sources identified and accessible
* Source reliability scored against a whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)
=== 6.2 Gate 2: Contradiction Search (MANDATORY) ===
**The system MUST actively search for:**
* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions, applicability constraints
* **Alternative interpretations** – Different framings, definitions, contextual variations
* **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources
**Search coverage requirements:**
* Academic literature (BOTH supporting AND opposing views)
* Reputable media across diverse political/ideological perspectives
* Official contradictions (retractions, corrections, updates, amendments)
* Domain-specific skeptics, critics, and alternative expert opinions
* Cross-cultural and international perspectives
**Search must actively avoid algorithmic bubbles:**
* Deliberately seek opposing viewpoints
* Check for echo chamber patterns in source clusters
* Identify tribal or ideological source clustering
* Flag when the search space appears artificially constrained
* Verify diversity of perspectives represented
**Outcomes:**
* **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode
* **Significant uncertainty detected** → Require uncertainty disclosure in verdict
* **Bubble indicators present** → Flag for expert review and human validation
* **Limited perspective diversity** → Expand search or flag for human review
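The four Gate 2 outcomes above are a straightforward findings-to-actions mapping. The sketch below illustrates one possible shape; all field and action names are invented for the example.

```python
def handle_contradiction_outcome(findings: dict) -> list[str]:
    """Map Gate 2 findings to the escalation actions listed above.
    Several findings can apply at once, so all matching actions are returned."""
    actions = []
    if findings.get("strong_counter_evidence"):
        actions.append("escalate_tier_or_draft_only")
    if findings.get("significant_uncertainty"):
        actions.append("require_uncertainty_disclosure")
    if findings.get("bubble_indicators"):
        actions.append("flag_for_expert_review")
    if findings.get("low_perspective_diversity"):
        actions.append("expand_search_or_human_review")
    return actions
```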
=== 6.3 Gate 3: Uncertainty Quantification ===
* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered
=== 6.4 Gate 4: Structural Integrity ===
* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated
**If any gate fails:**
* Content remains in draft-only mode
* Failure reason logged
* Human review required before publication
* Failure patterns analyzed for system improvement
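The gate sequence and failure handling described above can be sketched as a short pipeline. This is a hypothetical illustration; the gate callables here are stand-ins, and `run_quality_gates` is not an actual FactHarbor API.

```python
def run_quality_gates(content, gates):
    """Run gates in order; stop at the first failure.
    Returns (True, None) when all gates pass, or (False, reason) so the
    failure reason can be logged and the content kept in draft-only mode."""
    for name, gate in gates:
        passed, reason = gate(content)
        if not passed:
            return False, f"{name}: {reason}"
    return True, None

# Stand-in gates for the four checks in Sections 6.1-6.4.
gates = [
    ("source_quality",       lambda c: (True, None)),
    ("contradiction_search", lambda c: (False, "counter-evidence not searched")),
    ("uncertainty",          lambda c: (True, None)),
    ("structural_integrity", lambda c: (True, None)),
]
ok, why = run_quality_gates({}, gates)
# ok is False here because the stand-in contradiction search gate fails.
```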
== 7. Audit System (Sampling-Based Quality Assurance) ==
Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:
=== 7.1 Sampling Strategy ===
Audits prioritize:
* **Risk tier** (higher tiers get more frequent audits)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)
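One way to combine the five sampling signals above is a weighted priority score, where higher-scoring items are audited first. The weights and field names below are invented purely for illustration; the specification does not prescribe a formula.

```python
def audit_priority(item: dict) -> float:
    """Hypothetical priority score over the Section 7.1 signals.
    engagement, novelty, and disagreement_signals are assumed normalized to 0..1."""
    tier_weight = {"A": 1.0, "B": 0.5, "C": 0.2}[item["risk_tier"]]
    return (
        2.0 * tier_weight
        + 1.5 * (1.0 - item["ai_confidence"])   # low confidence -> more audits
        + 1.0 * item["engagement"]
        + 0.5 * item["novelty"]
        + 2.0 * item["disagreement_signals"]
    )
```

Under these assumed weights, a high-confidence Tier C item with no flags scores far lower than a Tier A item, reproducing the tiered audit rates in Section 5.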
=== 7.2 Audit Process ===
1. System selects content for audit based on the sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Moderator validates or corrects:
* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement
=== 7.3 Feedback Loop (Continuous Improvement) ===
Audit outcomes systematically improve:
* **Query templates** – Refined based on missed evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers
=== 7.4 Audit Transparency ===
* Audit statistics published regularly
* Accuracy rates by risk tier tracked and reported
* System improvements documented
* Community can view aggregate audit performance
== 8. Architecture Overview ==
{{include reference="FactHarbor.Specification.Diagrams.AKEL Architecture.WebHome"/}}
== 9. AKEL and Federation ==
In Release 1.0+, AKEL participates in cross-node knowledge alignment:
* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends and receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance
Nodes may choose trust levels for AKEL-related data:
* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require additional verification
* Untrusted nodes: fully manual import
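The three trust levels above map directly onto import actions. The sketch below shows that mapping; the function and action names are illustrative assumptions, not a defined federation API.

```python
def import_action(trust_level: str) -> str:
    """Map a peer node's trust level to the import handling from Section 9."""
    return {
        "trusted":   "auto_merge",         # embeddings + templates merged automatically
        "neutral":   "verify_then_merge",  # additional verification required first
        "untrusted": "manual_import",      # fully manual review of incoming data
    }[trust_level]
```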
== 10. Human Review Workflow (Mode 3 Publication) ==
For content requiring human validation before "AKEL-Generated" status:
1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
2. Contributors inspect the content in the review queue
3. Contributors validate that the quality gates were correctly applied
4. Trusted Contributors validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "AKEL-Generated" publication
6. Version numbers increment, full history preserved
**Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.
== 11. POC v1 Behavior ==
The POC explicitly demonstrates AI-generated content publication:
* Produces public AI-generated output (Mode 2)
* No human data sources required
* No human approval gate
* Clear "AI-Generated - POC/Demo" labeling
* All quality gates active (including contradiction search)
* Users understand this demonstrates AI reasoning capabilities
* Risk tier classification shown (for demo purposes)
== 12. Related Pages ==
* [[Automation>>FactHarbor.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance.WebHome]]