= AKEL – AI Knowledge Extraction Layer =

**Version:** 0.9.70
**Last Updated:** December 21, 2025
**Status:** CORRECTED - Automation Philosophy Consistent

AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing.

AKEL outputs are marked with **AuthorType = AI** and published according to risk-based policies (see Publication Modes below).

AKEL operates in two deployment modes:
* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)


== 1. Core Philosophy: Automation First ==

**V0.9.50+ Philosophy Shift:**

FactHarbor follows an **"Improve the system, not the data"** approach:

* ✅ **Automated Publication:** AI-generated content publishes immediately after passing quality gates
* ✅ **Quality Gates:** Automated checks (not human approval)
* ✅ **Sampling Audits:** Humans analyze patterns for system improvement (not individual approval)
* ❌ **NO approval workflows:** No review queues, no moderator gatekeeping for content quality
* ❌ **NO manual fixes:** If an output is wrong, improve the algorithm/prompts

**Why This Matters:**

Traditional approach: A human reviews every output → bottleneck, inconsistent results
FactHarbor approach: Automated quality gates + pattern-based improvement → scalable, consistent results


== 2. Publication Modes ==

**V0.9.70 CLARIFICATION:** FactHarbor uses **TWO publication modes** (not three):

=== Mode 1: Draft-Only ===

**Status:** Not visible to public

**When Used:**
* Quality gates failed
* Confidence below threshold
* Structural integrity issues
* Insufficient evidence

**What Happens:**
* Content remains private
* System logs failure reasons
* Prompts/algorithms improved based on patterns
* Content may be re-processed after improvements

Draft-only content is **NOT "pending human approval"**; it is blocked because it does not meet automated quality standards.


=== Mode 2: AI-Generated (Public) ===

**Status:** Published and visible to all users

**When Used:**
* Quality gates passed
* Confidence ≥ threshold
* Meets structural requirements
* Sufficient evidence found

**Includes:**
* Confidence score displayed (0-100%)
* Risk tier badge (A/B/C)
* Quality indicators
* Clear "AI-Generated" labeling
* Sampling audit status

**Labels by Risk Tier** (see the sketch below):
* **Tier A (High Risk):** "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
* **Tier B (Medium Risk):** "🤖 AI-Generated - May Contain Errors"
* **Tier C (Low Risk):** "🤖 AI-Generated"
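
The mode decision and labeling can be expressed as a small deterministic mapping. The following is a minimal sketch in Python; the names (`GateResult`, `choose_mode`) and the 0.70 confidence threshold are illustrative assumptions, not the specified implementation:

```
# Minimal sketch of the two-mode publication decision (Python).
# GateResult, choose_mode, and the 0.70 threshold are illustrative assumptions.
from dataclasses import dataclass, field

TIER_LABELS = {
    "A": "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice",
    "B": "🤖 AI-Generated - May Contain Errors",
    "C": "🤖 AI-Generated",
}

@dataclass
class GateResult:
    all_gates_passed: bool
    confidence: float                                    # 0.0 - 1.0
    failure_reasons: list[str] = field(default_factory=list)

def choose_mode(result: GateResult, risk_tier: str, threshold: float = 0.70):
    """Return (publication_mode, warning_label)."""
    if result.all_gates_passed and result.confidence >= threshold:
        return "ai-generated-public", TIER_LABELS[risk_tier]   # Mode 2
    return "draft-only", None   # Mode 1: blocked; failure reasons feed system improvement
```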


=== REMOVED: "Mode 3: Human-Reviewed" ===

**V0.9.50 Decision:** No centralized approval workflow.

**Rationale:**
* Defeats automation purpose
* Creates bottleneck
* Inconsistent quality
* Not scalable

**What Replaced It:**
* Better quality gates
* Sampling audits for system improvement
* Transparent confidence scoring
* Risk-based warnings


== 3. Risk Tiers (A/B/C) ==

Risk classification determines WARNING LABELS and AUDIT FREQUENCY, NOT approval requirements.

=== Tier A: High-Stakes Claims ===

**Examples:** Medical advice, legal interpretations, financial recommendations, safety information

**Impact:**
* ✅ Publish immediately (if quality gates pass)
* ✅ Prominent warning labels
* ✅ Higher sampling audit frequency (50% audited)
* ✅ Explicit disclaimers ("Seek professional advice")
* ❌ NOT held for moderator approval

**Philosophy:** Publish with strong warnings, monitor closely


=== Tier B: Moderate-Stakes Claims ===

**Examples:** Political claims, controversial topics, scientific debates

**Impact:**
* ✅ Publish immediately (if quality gates pass)
* ✅ Standard warning labels
* ✅ Medium sampling audit frequency (20% audited)
* ❌ NOT held for moderator approval


=== Tier C: Low-Stakes Claims ===

**Examples:** Entertainment facts, sports statistics, general knowledge

**Impact:**
* ✅ Publish immediately (if quality gates pass)
* ✅ Minimal warning labels
* ✅ Low sampling audit frequency (5% audited)


== 4. Quality Gates (Automated, Not Human) ==

All AI-generated content must pass these **AUTOMATED checks** before publication:

=== Gate 1: Source Quality ===

**Automated Checks:**
* Primary sources identified and accessible
* Source reliability scored against database
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

**If Failed:** Block publication, log pattern, improve source detection
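
A minimal sketch of how an automated source-quality check of this kind might be structured; the data fields, the 0.6 reliability threshold, and the function name are illustrative assumptions rather than the specified implementation:

```
# Illustrative sketch of an automated source-quality gate (Python).
# Field names, the 0.6 reliability threshold, and GateOutcome are assumptions.
from dataclasses import dataclass, field

@dataclass
class Source:
    url: str
    reliability_score: float      # e.g. looked up from a source-reliability database
    has_publication_date: bool
    citation_complete: bool

@dataclass
class GateOutcome:
    passed: bool
    reasons: list[str] = field(default_factory=list)

def source_quality_gate(sources: list[Source], min_reliability: float = 0.6) -> GateOutcome:
    reasons = []
    if not sources:
        reasons.append("no primary sources identified")
    for s in sources:
        if s.reliability_score < min_reliability:
            reasons.append(f"low source reliability: {s.url}")
        if not s.citation_complete:
            reasons.append(f"incomplete citation: {s.url}")
        if not s.has_publication_date:
            reasons.append(f"missing publication date: {s.url}")
    # A failure blocks publication; the reasons are logged for pattern analysis.
    return GateOutcome(passed=not reasons, reasons=reasons)
```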


=== Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions
* **Alternative interpretations** – Different framings, definitions
* **Bubble detection** – Echo chambers, ideologically isolated sources

**Search Coverage Requirements:**
* Academic literature (BOTH supporting AND opposing views)
* Diverse media across political/ideological perspectives
* Official contradictions (retractions, corrections, amendments)
* Cross-cultural and international perspectives

**Search Must Avoid Algorithmic Bubbles:**
* Deliberately seek opposing viewpoints
* Check for echo chamber patterns
* Identify tribal source clustering
* Flag artificially constrained search space
* Verify diversity of perspectives

**Outcomes:**
* Strong counter-evidence → Auto-escalate to Tier B or draft-only
* Significant uncertainty → Require uncertainty disclosure in verdict
* Bubble indicators → Flag for sampling audit
* Limited perspective diversity → Expand search or flag

**If Failed:** Block publication, improve search algorithms
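
The outcome rules listed above could be applied as a small post-processing step after the search itself. A sketch under assumed names and one possible reading of the escalation rule:

```
# Sketch of mapping contradiction-search findings to the outcomes listed above (Python).
# The findings dict, function name, and escalation details are illustrative assumptions.
def apply_contradiction_outcomes(findings: dict, risk_tier: str) -> dict:
    decision = {
        "risk_tier": risk_tier,
        "publish": True,                    # still subject to the other gates
        "require_uncertainty_note": False,
        "flag_for_audit": False,
    }
    if findings.get("strong_counter_evidence"):
        # "Auto-escalate to Tier B or draft-only" – one possible interpretation:
        if decision["risk_tier"] == "C":
            decision["risk_tier"] = "B"
        else:
            decision["publish"] = False     # remain in draft-only mode
    if findings.get("significant_uncertainty"):
        decision["require_uncertainty_note"] = True
    if findings.get("bubble_indicators") or findings.get("limited_perspective_diversity"):
        decision["flag_for_audit"] = True
    return decision
```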


=== Gate 3: Uncertainty Quantification ===

**Automated Checks:**
* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

**If Failed:** Block publication, improve confidence scoring


=== Gate 4: Structural Integrity ===

**Automated Checks:**
* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If Failed:** Block publication, improve hallucination detection


**CRITICAL:** If any gate fails:
* ✅ Content remains in draft-only mode
* ✅ Failure reason logged
* ✅ Failure patterns analyzed for system improvement
* ❌ **NOT "sent for human review"**
* ❌ **NOT "manually overridden"**

**Philosophy:** Fix the system that generated bad output, don't manually fix individual outputs.
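
Taken together, the four gates form a short, all-or-nothing pipeline in front of publication. A minimal sketch, assuming each gate is a callable returning a pass/fail result plus reasons (all names here are illustrative):

```
# Sketch of the overall gate pipeline: any failure blocks publication and is logged
# for pattern analysis; nothing is routed to a human reviewer. Names are assumptions.
import logging

logger = logging.getLogger("akel.quality_gates")

def run_quality_gates(content, gates) -> bool:
    """gates: list of (name, gate_fn) pairs; each gate_fn returns (passed, reasons)."""
    for name, gate_fn in gates:
        passed, reasons = gate_fn(content)
        if not passed:
            logger.info("quality gate failed: %s (%s)", name, "; ".join(reasons))
            return False   # content stays in Mode 1 (draft-only)
    return True            # eligible for Mode 2 (AI-Generated, public)
```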


== 5. Sampling Audit System ==

**Purpose:** Improve the system through pattern analysis (NOT approval of individual outputs)

=== 5.1 How Sampling Works ===

**Stratified Sampling Strategy:**

Audits prioritize:
* **Risk tier** (Tier A: 50%, Tier B: 20%, Tier C: 5%)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

**NOT:** A review queue for approval
**IS:** Statistical sampling for quality monitoring
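
One way to combine these signals is a base audit rate per risk tier, adjusted by the other factors. A sketch with illustrative adjustment weights; only the tier base rates come from this page, everything else is an assumption:

```
# Sketch of a stratified sampling decision (Python). Base rates follow the risk tiers
# (A: 50%, B: 20%, C: 5%); all adjustment factors are illustrative assumptions.
import random

BASE_AUDIT_RATE = {"A": 0.50, "B": 0.20, "C": 0.05}

def audit_probability(risk_tier: str, confidence: float, views: int,
                      is_novel: bool, disagreement_signals: int) -> float:
    p = BASE_AUDIT_RATE[risk_tier]
    if confidence < 0.6:
        p *= 1.5                              # low confidence → sample more often
    if views > 10_000:
        p *= 1.3                              # high-visibility content audited more
    if is_novel:
        p *= 1.5                              # new claim types, domains, emerging topics
    p *= 1.0 + 0.2 * min(disagreement_signals, 5)  # flags, alerts, community reports
    return min(p, 1.0)

# Example: a Tier B item with low confidence and moderate traffic
p = audit_probability("B", confidence=0.55, views=25_000, is_novel=False,
                      disagreement_signals=1)
selected_for_audit = random.random() < p
```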


=== 5.2 Audit Process ===

1. **System selects** content for audit based on sampling strategy
2. **Human auditor** reviews AI-generated content against quality standards
3. **Auditor validates or identifies issues:**
* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness
4. **Audit outcome recorded** (pass/fail + detailed feedback)
5. **Failed audits trigger:**
* Analysis of failure pattern
* System improvement tasks
* Algorithm/prompt adjustments
6. **Audit results feed back** into system improvement

**CRITICAL:** Auditors analyze PATTERNS; they do not fix individual outputs.
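
A sketch of the kind of structured audit record that could drive this pattern analysis and the feedback loop in 5.3; the schema and category strings are illustrative assumptions, not a specified data model:

```
# Sketch of an audit record used for pattern analysis, not individual fixes (Python).
# The schema and category strings are illustrative assumptions.
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class AuditRecord:
    content_id: str
    risk_tier: str
    passed: bool
    issue_categories: list[str] = field(default_factory=list)  # e.g. "missed counter-evidence"

def failure_patterns(records: list[AuditRecord]) -> Counter:
    """Aggregate issue categories across failed audits to prioritize system improvements."""
    return Counter(cat for r in records if not r.passed for cat in r.issue_categories)
```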


=== 5.3 Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

**Philosophy:** "Improve the system, not the data"


=== 5.4 Audit Transparency ===

**Publicly Published:**
* Audit statistics (monthly)
* Accuracy rates by risk tier
* System improvements made
* Aggregate audit performance

**Enables:**
* Public accountability
* System trust
* Continuous improvement visibility


== 6. Human Intervention Criteria ==

**From Organisation.Decision-Processes:**

**LEGITIMATE reasons to intervene:**

* ✅ AKEL explicitly flags an item for sampling audit
* ✅ System metrics show performance degradation
* ✅ Legal/safety issue requires immediate action
* ✅ User reports reveal a systematic bias pattern

**ILLEGITIMATE reasons** (system improvement needed instead):

* ❌ "I disagree with this verdict" → Improve the algorithm
* ❌ "This source should rank higher" → Improve the scoring rules
* ❌ "Manual quality gate before publication" → Defeats the purpose of automation
* ❌ "I know better than the algorithm" → Then improve the algorithm

**Philosophy:** If you disagree with an output, improve the system that generated it.


== 7. Architecture Overview ==

=== POC Architecture (POC1, POC2) ===

**Simple, Single-Call Approach:**

```
User submits article/claim

Single AI API call

Returns complete analysis

Quality gates validate

PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```

**Components in Single Call:**
1. Extract 3-5 factual claims
2. For each claim: verdict + confidence + risk tier + reasoning
3. Generate analysis summary
4. Generate article summary
5. Run basic quality checks

**Processing Time:** 10-18 seconds

**Advantages:** Simple, fast POC development, proves AI capability
**Limitations:** No component reusability, all-or-nothing
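
A minimal sketch of the single-call flow; the `call_ai_api` parameter, prompt wording, and response schema are assumptions and not tied to any particular AI provider:

```
# Sketch of the POC single-call approach (Python). The call_ai_api parameter,
# prompt wording, and response schema are illustrative assumptions.
import json

PROMPT_TEMPLATE = """Analyze the following article.
Return JSON with: claims (3-5 items, each with text, verdict, confidence 0-100,
risk_tier A/B/C, reasoning), analysis_summary, article_summary.

Article:
{article}
"""

def analyze_article(article: str, call_ai_api) -> dict:
    """call_ai_api: any callable that sends a prompt string and returns the model's text."""
    raw = call_ai_api(PROMPT_TEMPLATE.format(article=article))
    return json.loads(raw)   # complete analysis; quality gates validate it afterwards
```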


=== Full System Architecture (Beta 0, Release 1.0) ===

**Multi-Component Pipeline:**

```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```

**Processing:**
* Parallel processing where possible
* Separate component calls
* Quality gates between phases
* Audit sampling selection
* Cross-node coordination (federated mode)

**Processing Time:** 10-30 seconds (full pipeline)
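
A structural sketch of how such an orchestrator could wire these components together; the class and method names, and the choice of what to parallelize, are illustrative assumptions rather than specified interfaces:

```
# Structural sketch of the multi-component pipeline (Python). Component interfaces,
# class names, and the parallelization choice are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

class AkelOrchestrator:
    def __init__(self, extractor, classifier, scenario_gen, evidence_summarizer,
                 contradiction_detector, gate_validator, audit_scheduler):
        self.extractor = extractor
        self.classifier = classifier
        self.scenario_gen = scenario_gen
        self.evidence_summarizer = evidence_summarizer
        self.contradiction_detector = contradiction_detector
        self.gate_validator = gate_validator
        self.audit_scheduler = audit_scheduler

    def process(self, article):
        claims = self.extractor.extract(article)
        classified = [self.classifier.classify(c) for c in claims]   # risk tier per claim
        # Evidence summarization and contradiction search can run in parallel per claim.
        with ThreadPoolExecutor() as pool:
            evidence = list(pool.map(self.evidence_summarizer.summarize, classified))
            contradictions = list(pool.map(self.contradiction_detector.search, classified))
        scenarios = self.scenario_gen.generate(classified, evidence)
        result = {"claims": classified, "evidence": evidence,
                  "contradictions": contradictions, "scenarios": scenarios}
        if self.gate_validator.passes(result):
            self.audit_scheduler.maybe_select(result)   # sampling for audit, not approval
            return "ai-generated-public", result
        return "draft-only", result
```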


=== Evolution Path ===

**POC1:** Single prompt → Prove concept
**POC2:** Add scenario component → Test full pipeline
**Beta 0:** Multi-component AKEL → Production architecture
**Release 1.0:** Full AKEL + Federation → Scale


== 8. AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends + receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Nodes may choose trust levels for AKEL-related data (see the sketch below):
* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require additional verification
* Untrusted nodes: fully manual import
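
A sketch of how these trust levels could gate the handling of incoming AKEL data; the function, parameter, and queue names are illustrative assumptions:

```
# Sketch of trust-level handling for incoming AKEL data from other nodes (Python).
# Function, parameter, and queue names are illustrative assumptions.
def handle_incoming(payload: dict, node_trust: str, store, verification_queue, manual_queue):
    """node_trust: 'trusted' | 'neutral' | 'untrusted', set by local governance."""
    if node_trust == "trusted":
        store.merge(payload)                 # auto-merge embeddings + templates
    elif node_trust == "neutral":
        verification_queue.put(payload)      # additional verification before merging
    else:
        manual_queue.put(payload)            # fully manual import decision
```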


== 9. POC Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* ✅ Produces public AI-generated output (Mode 2)
* ✅ No human data sources required
* ✅ No human approval gate
* ✅ Clear "AI-Generated - POC/Demo" labeling
* ✅ All quality gates active (including contradiction search)
* ✅ Users understand this demonstrates AI reasoning capabilities
* ✅ Risk tier classification shown (for demo purposes)

**Philosophy Validation:** The POC proves the automation-first approach works.


== 10. Related Pages ==

* [[Automation>>FactHarbor.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance.WebHome]]
* [[Decision Processes>>FactHarbor.Organisation.Decision-Processes.WebHome]]


**V0.9.70 CHANGES:**
* ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
* ❌ REMOVED: All references to "Mode 3"
* ❌ REMOVED: "Human review required before publication"
* ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
* ✅ CLARIFIED: Quality gate failures → Block + improve system
* ✅ CLARIFIED: Sampling audits for improvement, NOT approval
* ✅ CLARIFIED: Risk tiers affect warnings/audits, NOT approval gates
* ✅ ENHANCED: Gate 2 (Contradiction Search) specification
* ✅ ADDED: Clear human intervention criteria
* ✅ ADDED: Detailed audit system explanation