Version 1.5 by Robert Schaub on 2026/01/20 20:30

= AKEL – AI Knowledge Extraction Layer =

**Version:** 0.9.70
**Last Updated:** December 21, 2025
**Status:** CORRECTED - Automation Philosophy Consistent

AKEL is FactHarbor's automated intelligence subsystem. Its purpose is to reduce human workload, improve consistency, and enable scalable knowledge processing. AKEL outputs are marked with **AuthorType = AI** and published according to risk-based policies (see Publication Modes below).

AKEL operates in two modes:
* **Single-node mode** (POC and Beta 0)
* **Federated multi-node mode** (Release 1.0+)

== 1. Core Philosophy: Automation First ==

**V0.9.50+ Philosophy Shift:** FactHarbor follows an **"improve the system, not the data"** approach:

* ✅ **Automated Publication:** AI-generated content publishes immediately after passing quality gates
* ✅ **Quality Gates:** Automated checks (not human approval)
* ✅ **Sampling Audits:** Humans analyze patterns for system improvement (not individual approval)
* ❌ **NO approval workflows:** No review queues, no moderator gatekeeping for content quality
* ❌ **NO manual fixes:** If output is wrong, improve the algorithm/prompts

**Why This Matters:**

* Traditional approach: a human reviews every output → bottleneck, inconsistent results
* FactHarbor approach: automated quality gates plus pattern-based improvement → scalable, consistent results

== 2. Publication Modes ==

**V0.9.70 CLARIFICATION:** FactHarbor uses **two publication modes** (not three).

=== Mode 1: Draft-Only ===

**Status:** Not visible to the public

**When Used:**
* Quality gates failed
* Confidence below threshold
* Structural integrity issues
* Insufficient evidence

**What Happens:**
* Content remains private
* System logs the failure reasons
* Prompts/algorithms are improved based on patterns
* Content may be re-processed after improvements

This is **not "pending human approval"**: the content is blocked because it does not meet automated quality standards.

=== Mode 2: AI-Generated (Public) ===

**Status:** Published and visible to all users

**When Used:**
* Quality gates passed
* Confidence ≥ threshold
* Meets structural requirements
* Sufficient evidence found

**Includes:**
* Confidence score displayed (0-100%)
* Risk tier badge (A/B/C)
* Quality indicators
* Clear "AI-Generated" labeling
* Sampling audit status

**Labels by Risk Tier:**
* **Tier A (High Risk):** "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
* **Tier B (Medium Risk):** "🤖 AI-Generated - May Contain Errors"
* **Tier C (Low Risk):** "🤖 AI-Generated"

=== REMOVED: "Mode 3: Human-Reviewed" ===

**V0.9.50 Decision:** No centralized approval workflow.

**Rationale:**
* Defeats the purpose of automation
* Creates a bottleneck
* Produces inconsistent quality
* Does not scale

Several mechanisms replaced the removed workflow.
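The two-mode policy and the tier labels described in this section amount to a small piece of logic. A minimal sketch follows; the function names and the numeric confidence threshold are illustrative assumptions (the spec only requires "confidence ≥ threshold"):

```python
# Sketch of the two-mode publication decision and tier labels.
# CONFIDENCE_THRESHOLD is an assumed value, not from the spec.
CONFIDENCE_THRESHOLD = 0.7

TIER_LABELS = {
    "A": "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice",
    "B": "🤖 AI-Generated - May Contain Errors",
    "C": "🤖 AI-Generated",
}

def publication_mode(gates_passed: bool, confidence: float) -> str:
    """Return 'ai-generated-public' (Mode 2) or 'draft-only' (Mode 1)."""
    if gates_passed and confidence >= CONFIDENCE_THRESHOLD:
        return "ai-generated-public"
    return "draft-only"   # blocked, never queued for human approval

def display_label(tier: str) -> str:
    return TIER_LABELS[tier]
```

Note there is deliberately no third return value: every outcome is either public or private, with no review-queue state.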
**In its place:**

* Better quality gates
* Sampling audits for system improvement
* Transparent confidence scoring
* Risk-based warnings

== 3. Risk Tiers (A/B/C) ==

Risk classification determines **warning labels** and **audit frequency**, not approval requirements.

=== Tier A: High-Stakes Claims ===

**Examples:** Medical advice, legal interpretations, financial recommendations, safety information

**Impact:**
* ✅ Publish immediately (if gates pass)
* ✅ Prominent warning labels
* ✅ High sampling audit frequency (50% audited)
* ✅ Explicit disclaimers ("Seek professional advice")
* ❌ NOT held for moderator approval

**Philosophy:** Publish with strong warnings, monitor closely.

=== Tier B: Moderate-Stakes Claims ===

**Examples:** Political claims, controversial topics, scientific debates

**Impact:**
* ✅ Publish immediately (if gates pass)
* ✅ Standard warning labels
* ✅ Medium sampling audit frequency (20% audited)
* ❌ NOT held for moderator approval

=== Tier C: Low-Stakes Claims ===

**Examples:** Entertainment facts, sports statistics, general knowledge

**Impact:**
* ✅ Publish immediately (if gates pass)
* ✅ Minimal warning labels
* ✅ Low sampling audit frequency (5% audited)

== 4. Quality Gates (Automated, Not Human) ==

All AI-generated content must pass a series of **automated checks** before publication; no step requires human approval.

=== Gate 1: Source Quality ===

Gate 1 verifies that cited sources are real, reliable, and complete.
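A hypothetical sketch of how such a gate might be structured; the `Source` fields and the reliability threshold are illustrative assumptions, not part of the spec:

```python
# Illustrative Gate 1 check: all sources must clear the bar.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    url: str
    reliability: float            # scored against a source-reliability database
    citation_complete: bool
    publication_date: Optional[str] = None

def gate1_source_quality(sources: list, min_reliability: float = 0.5) -> bool:
    """An empty source list fails the gate; so does any weak source."""
    if not sources:
        return False
    return all(
        s.reliability >= min_reliability
        and s.citation_complete
        and s.publication_date is not None
        for s in sources
    )
```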
**Checks:**

* Primary sources identified and accessible
* Source reliability scored against a reliability database
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

**If Failed:** Block publication, log the pattern, improve source detection.

=== Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions
* **Alternative interpretations** – Different framings, definitions
* **Bubble detection** – Echo chambers, ideologically isolated sources

**Search Coverage Requirements:**
* Academic literature (both supporting and opposing views)
* Diverse media across political/ideological perspectives
* Official contradictions (retractions, corrections, amendments)
* Cross-cultural and international perspectives

**Search Must Avoid Algorithmic Bubbles:**
* Deliberately seek opposing viewpoints
* Check for echo chamber patterns
* Identify tribal source clustering
* Flag artificially constrained search spaces
* Verify diversity of perspectives

**Outcomes:**
* Strong counter-evidence → Auto-escalate to Tier B or draft-only
* Significant uncertainty → Require uncertainty disclosure in the verdict
* Bubble indicators → Flag for sampling audit
* Limited perspective diversity → Expand the search or flag

**If Failed:** Block publication, improve the search algorithms.

=== Gate 3: Uncertainty Quantification ===

Gate 3 ensures that every claim and verdict carries an explicit confidence score and that limitations are disclosed.
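One way such a score might be computed is sketched below. The aggregation formula is entirely an assumption; the spec only requires that a 0-100% score be produced and that data gaps be disclosed:

```python
# Illustrative confidence quantification: average per-evidence strength,
# penalized by the number of disclosed data gaps.
def confidence_score(evidence_strengths: list, data_gaps: int) -> int:
    """Return a confidence score in 0..100."""
    if not evidence_strengths:
        return 0                                   # no evidence, no confidence
    base = sum(evidence_strengths) / len(evidence_strengths)   # 0.0 - 1.0
    penalty = 0.1 * data_gaps                      # each gap costs 10 points
    return max(0, min(100, round((base - penalty) * 100)))
```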
**Checks:**

* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

**If Failed:** Block publication, improve confidence scoring.

=== Gate 4: Structural Integrity ===

**Automated Checks:**
* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If Failed:** Block publication, improve hallucination detection.

**CRITICAL:** A failure at any gate always has the same consequence: the content stays private and the system is improved.
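The all-gates-must-pass rule can be sketched as a simple sequential runner; the gate names and callables below are placeholders for the four checks described above:

```python
# Sketch: gates run in sequence, the first failure blocks publication
# and is logged as input for pattern analysis and system improvement.
def run_quality_gates(content, gates, failure_log):
    """Return 'publish' or 'draft-only'; failures never enter a review queue."""
    for name, check in gates:
        if not check(content):
            failure_log.append(name)   # feeds system-improvement analysis
            return "draft-only"
    return "publish"
```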
When a gate fails:

* ✅ Content remains in draft-only mode
* ✅ The failure reason is logged
* ✅ Failure patterns are analyzed for system improvement
* ❌ **NOT "sent for human review"**
* ❌ **NOT "manually overridden"**

**Philosophy:** Fix the system that generated the bad output; don't manually fix individual outputs.

== 5. Sampling Audit System ==

**Purpose:** Improve the system through pattern analysis, not approve individual outputs.

=== 5.1 How Sampling Works ===

Audits use a **stratified sampling strategy** rather than auditing uniformly at random.
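A minimal sketch of stratified selection, assuming the per-tier base rates from the risk-tier section; the boost factors for low confidence and disagreement flags are assumptions:

```python
# Illustrative stratified audit sampling. Tier base rates come from the
# spec (A: 50%, B: 20%, C: 5%); the multipliers are invented for the sketch.
import random

BASE_AUDIT_RATE = {"A": 0.50, "B": 0.20, "C": 0.05}

def audit_probability(tier: str, confidence: float, flags: int) -> float:
    p = BASE_AUDIT_RATE[tier]
    if confidence < 0.6:          # low confidence → higher sampling rate
        p *= 1.5
    p += 0.05 * flags             # user flags / contradiction alerts
    return min(p, 1.0)

def select_for_audit(tier, confidence, flags, rng=random.random):
    """Bernoulli draw against the computed probability."""
    return rng() < audit_probability(tier, confidence, flags)
```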
Sampling weight is driven by:

* **Risk tier** (Tier A: 50%, Tier B: 20%, Tier C: 5%)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content is audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

This is **not** a review queue for approval; it **is** statistical sampling for quality monitoring.

=== 5.2 Audit Process ===

1. **System selects** content for audit based on the sampling strategy
2. **Human auditor** reviews the AI-generated content against quality standards
3. **Auditor validates or identifies issues:**
** Claim extraction accuracy
** Scenario appropriateness
** Evidence relevance and interpretation
** Verdict reasoning
** Contradiction search completeness
4. **Audit outcome recorded** (pass/fail plus detailed feedback)
5. **Failed audits trigger:**
** Analysis of the failure pattern
** System improvement tasks
** Algorithm/prompt adjustments
6. **Audit results feed back** into system improvement

**CRITICAL:** Auditors analyze patterns; they do not fix individual outputs.

=== 5.3 Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed-evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

**Philosophy:** "Improve the system, not the data."

=== 5.4 Audit Transparency ===

**Publicly Published:**
* Audit statistics (monthly)
* Accuracy rates by risk tier
* System improvements made
* Aggregate audit performance

**Enables:**
* Public accountability
* System trust
* Continuous improvement visibility

== 6. Human Intervention Criteria ==

**From Organisation.Decision-Processes:**

**LEGITIMATE reasons to intervene:**

* ✅ AKEL explicitly flags an item for sampling audit
* ✅ System metrics show performance degradation
* ✅ A legal or safety issue requires immediate action
* ✅ User reports reveal a systematic bias pattern

**ILLEGITIMATE reasons** (system improvement is needed instead):

* ❌ "I disagree with this verdict" → Improve the algorithm
* ❌ "This source should rank higher" → Improve the scoring rules
* ❌ "Manual quality gate before publication" → Defeats the purpose of automation
* ❌ "I know better than the algorithm" → Then improve the algorithm

**Philosophy:** If you disagree with an output, improve the system that generated it.

== 7. Architecture Overview ==

=== POC Architecture (POC1, POC2) ===

**Simple, Single-Call Approach:**

```
User submits article/claim
    ↓
Single AI API call
    ↓
Returns complete analysis
    ↓
Quality gates validate
    ↓
PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```

The single API call bundles claim extraction, verdict generation, summaries, and basic checks.
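The flow above reduces to a few lines of orchestration. In this sketch the AI client, the shape of its response, and the gate function are all illustrative assumptions:

```python
# Sketch of the single-call POC flow: one API call, then gate-based routing.
def process_article_poc(text, ai_call, gates_pass):
    """One call returns the full analysis; gates decide the publication mode."""
    analysis = ai_call(text)   # claims, verdicts, confidences, summaries
    analysis["mode"] = "ai-generated-public" if gates_pass(analysis) else "draft-only"
    return analysis
```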
**Inside the single call:**

1. Extract 3-5 factual claims
2. For each claim: verdict + confidence + risk tier + reasoning
3. Generate an analysis summary
4. Generate an article summary
5. Run basic quality checks

**Processing Time:** 10-18 seconds

**Advantages:** Simple, fast POC development, proves AI capability

**Limitations:** No component reusability, all-or-nothing processing

=== Full System Architecture (Beta 0, Release 1.0) ===

**Multi-Component Pipeline:**

```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```

**Processing:**
* Parallel processing where possible
* Separate component calls
* Quality gates between phases
* Audit sampling selection
* Cross-node coordination (federated mode)

**Processing Time:** 10-30 seconds (full pipeline)

=== Evolution Path ===

* **POC1:** Single prompt → Prove the concept
* **POC2:** Add scenario component → Test the full pipeline
* **Beta 0:** Multi-component AKEL → Production architecture
* **Release 1.0:** Full AKEL + Federation → Scale

== 8. AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends and receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Each node chooses a trust level that governs how it imports AKEL-related data from peers.
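A node's trust levels might be modeled as a small enum; the level names follow the spec, while the policy strings and function name are assumptions for illustration:

```python
# Sketch of per-node trust levels for AKEL-related data.
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"       # auto-merge embeddings and templates
    NEUTRAL = "neutral"       # require additional verification
    UNTRUSTED = "untrusted"   # fully manual import

def import_policy(trust: Trust) -> str:
    return {
        Trust.TRUSTED: "auto-merge",
        Trust.NEUTRAL: "verify-then-merge",
        Trust.UNTRUSTED: "manual-import",
    }[trust]
```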
* Trusted nodes: auto-merge embeddings and templates
* Neutral nodes: require additional verification
* Untrusted nodes: fully manual import

== 9. POC Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* ✅ Produces public AI-generated output (Mode 2)
* ✅ No human data sources required
* ✅ No human approval gate
* ✅ Clear "AI-Generated - POC/Demo" labeling
* ✅ All quality gates active (including contradiction search)
* ✅ Users understand this demonstrates AI reasoning capabilities
* ✅ Risk tier classification shown (for demo purposes)

**Philosophy Validation:** The POC proves that the automation-first approach works.

== 10. Related Pages ==

* [[Automation>>Archive.FactHarbor.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>Archive.FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
* [[Governance>>Archive.FactHarbor.Organisation.Governance.WebHome]]
* [[Decision Processes>>FactHarbor.Organisation.Decision-Processes.WebHome]]

**V0.9.70 CHANGES:**
- ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
- ❌ REMOVED: All references to "Mode 3"
- ❌ REMOVED: "Human review required before publication"
- ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
- ✅ CLARIFIED: Quality gate failures → Block + improve the system
- ✅ CLARIFIED: Sampling audits are for improvement, not approval
- ✅ CLARIFIED: Risk tiers affect warnings/audits, not approval gates
- ✅ ENHANCED: Gate 2 (Contradiction Search) specification
- ✅ ADDED: Clear human intervention criteria
- ✅ ADDED: Detailed audit system explanation