AI Knowledge Extraction Layer (AKEL)


Version: 0.9.70  
Last Updated: December 21, 2025  
Status: CORRECTED - Automation Philosophy Consistent

AKEL is FactHarbor's automated intelligence subsystem.  
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing.

AKEL outputs are marked with AuthorType = AI and published according to risk-based policies (see Publication Modes below).

AKEL operates in two modes:

  • Single-node mode (POC & Beta 0)
  • Federated multi-node mode (Release 1.0+)

1. Core Philosophy: Automation First

V0.9.50+ Philosophy Shift:

FactHarbor follows an "Improve the system, not the data" approach:

  • Automated Publication: AI-generated content publishes immediately after passing quality gates
  • Quality Gates: Automated checks (not human approval)
  • Sampling Audits: Humans analyze patterns for system improvement (not individual approval)
  • NO approval workflows: No review queues, no moderator gatekeeping for content quality
  • NO manual fixes: If output is wrong, improve the algorithm/prompts

Why This Matters:

Traditional approach: a human reviews every output → bottleneck, inconsistent results  
FactHarbor approach: automated quality gates + pattern-based improvement → scalable, consistent results

2. Publication Modes

V0.9.70 CLARIFICATION: FactHarbor uses TWO publication modes (not three):

Mode 1: Draft-Only

Status: Not visible to public

When Used:

  • Quality gates failed
  • Confidence below threshold
  • Structural integrity issues
  • Insufficient evidence

What Happens:

  • Content remains private
  • System logs failure reasons
  • Prompts/algorithms improved based on patterns
  • Content may be re-processed after improvements

Draft-only is NOT "pending human approval": the content is blocked because it does not meet automated quality standards.

Mode 2: AI-Generated (Public)

Status: Published and visible to all users

When Used:

  • Quality gates passed
  • Confidence ≥ threshold
  • Meets structural requirements
  • Sufficient evidence found

Includes:

  • Confidence score displayed (0-100%)
  • Risk tier badge (A/B/C)
  • Quality indicators
  • Clear "AI-Generated" labeling
  • Sampling audit status

Labels by Risk Tier:

  • Tier A (High Risk): "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
  • Tier B (Medium Risk): "🤖 AI-Generated - May Contain Errors"
  • Tier C (Low Risk): "🤖 AI-Generated"
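
The mode decision and the tier labels above can be expressed compactly. The following Python sketch is illustrative only: the enum, the threshold parameter, and the function name are assumptions, not FactHarbor's actual code.

```
from enum import Enum

class PublicationMode(Enum):
    DRAFT_ONLY = "draft_only"      # Mode 1: not visible to the public
    AI_GENERATED = "ai_generated"  # Mode 2: published with AI-Generated labeling

# Tier-to-label mapping taken from the list above; structure is illustrative.
TIER_LABELS = {
    "A": "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice",
    "B": "🤖 AI-Generated - May Contain Errors",
    "C": "🤖 AI-Generated",
}

def decide_publication(gates_passed: bool, confidence: float,
                       threshold: float, risk_tier: str):
    """Return (mode, public label). Draft-only whenever any automated
    quality gate fails or confidence is below the configured threshold."""
    if not gates_passed or confidence < threshold:
        return PublicationMode.DRAFT_ONLY, None
    return PublicationMode.AI_GENERATED, TIER_LABELS[risk_tier]
```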

REMOVED: "Mode 3: Human-Reviewed"

V0.9.50 Decision: No centralized approval workflow.

Rationale:

  • Defeats the purpose of automation
  • Creates a bottleneck
  • Produces inconsistent quality
  • Does not scale

What Replaced It:

  • Better quality gates
  • Sampling audits for system improvement
  • Transparent confidence scoring
  • Risk-based warnings

3. Risk Tiers (A/B/C)

Risk classification determines WARNING LABELS and AUDIT FREQUENCY, NOT approval requirements.

Tier A: High-Stakes Claims

Examples: Medical advice, legal interpretations, financial recommendations, safety information

Impact:

  • ✅ Publish immediately (if passes gates)
  • ✅ Prominent warning labels
  • ✅ Higher sampling audit frequency (50% audited)
  • ✅ Explicit disclaimers ("Seek professional advice")
  • ❌ NOT held for moderator approval

Philosophy: Publish with strong warnings, monitor closely

Tier B: Moderate-Stakes Claims

Examples: Political claims, controversial topics, scientific debates

Impact:

  • ✅ Publish immediately (if passes gates)
  • ✅ Standard warning labels
  • ✅ Medium sampling audit frequency (20% audited)
  • ❌ NOT held for moderator approval

Tier C: Low-Stakes Claims

Examples: Entertainment facts, sports statistics, general knowledge

Impact:

  • ✅ Publish immediately (if passes gates)
  • ✅ Minimal warning labels
  • ✅ Low sampling audit frequency (5% audited)
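
As a minimal sketch, the tier policies above (audit rates, warning prominence, disclaimers) can live in a single lookup table that labeling and audit components consult; the dataclass and field names are hypothetical, not FactHarbor's real configuration.

```
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    audit_rate: float        # fraction of published items selected for sampling audit
    warning_level: str       # prominence of the AI-generated warning
    disclaimer: str | None   # extra disclaimer text, if any

# Rates and wording follow the tier descriptions above.
TIER_POLICIES = {
    "A": TierPolicy(0.50, "prominent", "Seek professional advice"),
    "B": TierPolicy(0.20, "standard", None),
    "C": TierPolicy(0.05, "minimal", None),
}
```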

4. Quality Gates (Automated, Not Human)

All AI-generated content must pass these AUTOMATED checks before publication:

Gate 1: Source Quality

Automated Checks:

  • Primary sources identified and accessible
  • Source reliability scored against database
  • Citation completeness verified
  • Publication dates checked
  • Author credentials validated (where applicable)

If Failed: Block publication, log pattern, improve source detection
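
Each gate can be modeled as an independent automated check that returns pass/fail plus machine-readable failure reasons for later pattern analysis. A sketch under assumed names (the `GateResult` shape and the analysis fields are not FactHarbor's real interface):

```
from dataclasses import dataclass, field

@dataclass
class GateResult:
    passed: bool
    gate: str
    reasons: list[str] = field(default_factory=list)  # logged for pattern analysis

class SourceQualityGate:
    """Gate 1 sketch: primary sources present, reliability scored, citations complete."""
    name = "source_quality"

    def check(self, analysis) -> GateResult:
        reasons = []
        if not analysis.primary_sources:
            reasons.append("no_primary_sources")
        if any(s.reliability_score is None for s in analysis.sources):
            reasons.append("unscored_source")
        if any(not c.is_complete for c in analysis.citations):
            reasons.append("incomplete_citation")
        return GateResult(passed=not reasons, gate=self.name, reasons=reasons)
```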

Gate 2: Contradiction Search (MANDATORY)

The system MUST actively search for:

  • Counter-evidence – Rebuttals, conflicting results, contradictory studies
  • Reservations – Caveats, limitations, boundary conditions
  • Alternative interpretations – Different framings, definitions
  • Bubble detection – Echo chambers, ideologically isolated sources

Search Coverage Requirements:

  • Academic literature (BOTH supporting AND opposing views)
  • Diverse media across political/ideological perspectives
  • Official contradictions (retractions, corrections, amendments)
  • Cross-cultural and international perspectives

Search Must Avoid Algorithmic Bubbles:

  • Deliberately seek opposing viewpoints
  • Check for echo chamber patterns
  • Identify tribal source clustering
  • Flag artificially constrained search space
  • Verify diversity of perspectives

Outcomes:

  • Strong counter-evidence → Auto-escalate to Tier B or draft-only
  • Significant uncertainty → Require uncertainty disclosure in verdict
  • Bubble indicators → Flag for sampling audit
  • Limited perspective diversity → Expand search or flag

If Failed: Block publication, improve search algorithms
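
The outcome rules above reduce to a small escalation step. A hedged sketch, assuming the contradiction search returns simple boolean findings (the field names are invented for the example):

```
def apply_gate2_outcomes(search, claim):
    """Apply the Gate 2 outcomes: escalate, disclose uncertainty, or flag for audit."""
    if search.strong_counter_evidence:
        if claim.risk_tier == "C":
            claim.risk_tier = "B"             # auto-escalate low-stakes claims
        else:
            claim.publication_blocked = True  # keep higher-stakes claims in draft-only
    if search.significant_uncertainty:
        claim.require_uncertainty_disclosure = True
    if search.bubble_indicators:
        claim.flag_for_sampling_audit = True
    if search.low_perspective_diversity:
        claim.needs_expanded_search = True
    return claim
```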

Gate 3: Uncertainty Quantification

Automated Checks:

  • Confidence scores calculated for all claims and verdicts
  • Limitations explicitly stated
  • Data gaps identified and disclosed
  • Strength of evidence assessed
  • Alternative scenarios considered

If Failed: Block publication, improve confidence scoring

Gate 4: Structural Integrity

Automated Checks:

  • No hallucinations detected (fact-checking against sources)
  • Logic chain valid and traceable
  • References accessible and verifiable
  • No circular reasoning
  • Premises clearly stated

If Failed: Block publication, improve hallucination detection

CRITICAL: If any gate fails:

  • ✅ Content remains in draft-only mode
  • ✅ Failure reason logged
  • ✅ Failure patterns analyzed for system improvement
  • ❌ NOT "sent for human review"
  • ❌ NOT "manually overridden"

Philosophy: Fix the system that generated bad output, don't manually fix individual outputs.
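
Put together, the four gates form a fail-closed pipeline: the first failure keeps the content in draft-only mode (Mode 1) and records the reason for pattern analysis rather than routing the item to a reviewer. A sketch reusing the `PublicationMode` and `GateResult` shapes from the earlier sketches:

```
def run_quality_gates(analysis, gates, failure_log):
    """Run all automated gates in order; any gate failure means draft-only (Mode 1)."""
    for gate in gates:  # source quality, contradiction search, uncertainty, structural integrity
        result = gate.check(analysis)
        if not result.passed:
            # Log the failure pattern for system improvement; never hand off to a review queue.
            failure_log.record(gate=result.gate, reasons=result.reasons)
            return PublicationMode.DRAFT_ONLY
    return PublicationMode.AI_GENERATED
```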

5. Sampling Audit System

Purpose: Improve the system through pattern analysis (NOT approve individual outputs)

5.1 How Sampling Works

Stratified Sampling Strategy:

Audits prioritize:

  • Risk tier (Tier A: 50%, Tier B: 20%, Tier C: 5%)
  • AI confidence score (low confidence → higher sampling rate)
  • Traffic and engagement (high-visibility content audited more)
  • Novelty (new claim types, new domains, emerging topics)
  • Disagreement signals (user flags, contradiction alerts, community reports)

NOT: Review queue for approval  
IS: Statistical sampling for quality monitoring
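
A stratified sampler along these lines starts from the tier's base rate and raises it for low confidence, high visibility, novelty, or disagreement signals. The multipliers, thresholds, and item fields below are invented for illustration:

```
import random

BASE_AUDIT_RATES = {"A": 0.50, "B": 0.20, "C": 0.05}  # from the risk tiers above

def should_audit(item) -> bool:
    """Decide whether a published item is selected for a sampling audit."""
    rate = BASE_AUDIT_RATES[item.risk_tier]
    if item.confidence < 0.7:                 # low AI confidence -> sample more often
        rate *= 1.5
    if item.views > 10_000:                   # high-visibility content audited more
        rate *= 1.5
    if item.is_novel_claim_type or item.user_flags > 0:
        rate *= 2.0                           # novelty and disagreement signals
    return random.random() < min(rate, 1.0)
```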

5.2 Audit Process

  1. System selects content for audit based on the sampling strategy
  2. Human auditor reviews the AI-generated content against quality standards
  3. Auditor validates or identifies issues:
     • Claim extraction accuracy
     • Scenario appropriateness
     • Evidence relevance and interpretation
     • Verdict reasoning
     • Contradiction search completeness
  4. Audit outcome recorded (pass/fail + detailed feedback)
  5. Failed audits trigger:
     • Analysis of the failure pattern
     • System improvement tasks
     • Algorithm/prompt adjustments
  6. Audit results feed back into system improvement

CRITICAL: Auditors analyze PATTERNS, not fix individual outputs.
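
One way to make that concrete is to record every audit as structured data whose output is an improvement task, never an edit to the audited content. The record shape below is a hypothetical sketch:

```
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AuditRecord:
    """One sampling-audit outcome; feeds pattern analysis, not content fixes."""
    content_id: str
    risk_tier: str
    audited_on: date
    passed: bool
    issue_categories: list[str] = field(default_factory=list)
    # e.g. "missed_counter_evidence", "claim_extraction_error", "verdict_reasoning"
    improvement_task: str = ""   # proposed system change (prompt, heuristic, source weights)
```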

5.3 Feedback Loop (Continuous Improvement)

Audit outcomes systematically improve:

  • Query templates – Refined based on missed evidence patterns
  • Retrieval source weights – Adjusted for accuracy and reliability
  • Contradiction detection heuristics – Enhanced to catch missed counter-evidence
  • Model prompts and extraction rules – Tuned for better claim extraction
  • Risk tier assignments – Recalibrated based on error patterns
  • Bubble detection algorithms – Improved to identify echo chambers

Philosophy: "Improve the system, not the data"

5.4 Audit Transparency

Publicly Published:

  • Audit statistics (monthly)
  • Accuracy rates by risk tier
  • System improvements made
  • Aggregate audit performance

Enables:

  • Public accountability
  • System trust
  • Continuous improvement visibility

6. Human Intervention Criteria

From Organisation.Decision-Processes:

LEGITIMATE reasons to intervene:

  • ✅ AKEL explicitly flags item for sampling audit
  • ✅ System metrics show performance degradation
  • ✅ Legal/safety issue requires immediate action
  • ✅ User reports reveal systematic bias pattern

ILLEGITIMATE reasons (system improvement needed instead):

  • ❌ "I disagree with this verdict" → Improve algorithm
  • ❌ "This source should rank higher" → Improve scoring rules
  • ❌ "Manual quality gate before publication" → Defeats purpose of automation
  • ❌ "I know better than the algorithm" → Then improve the algorithm

Philosophy: If you disagree with output, improve the system that generated it.

7. Architecture Overview

POC Architecture (POC1, POC2)

Simple, Single-Call Approach:

```
User submits article/claim
    ↓
Single AI API call
    ↓
Returns complete analysis
    ↓
Quality gates validate
    ↓
PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```

Components in Single Call:

  1. Extract 3-5 factual claims
  2. For each claim: verdict + confidence + risk tier + reasoning
  3. Generate analysis summary
  4. Generate article summary
  5. Run basic quality checks

Processing Time: 10-18 seconds

Advantages: Simple, fast POC development, proves AI capability  
Limitations: No component reusability, all-or-nothing
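
A minimal sketch of the single-call approach, assuming a generic `call_llm` callable stands in for whatever model client the POC actually uses; the prompt wording and JSON keys are illustrative, not the real POC prompt:

```
import json

POC_PROMPT = """Extract 3-5 factual claims from the article below.
For each claim return: verdict, confidence (0-100), risk_tier (A/B/C), and reasoning.
Also return an analysis summary and an article summary. Respond as JSON.

ARTICLE:
{article}
"""

def analyze_article(article_text: str, call_llm) -> dict:
    """Single-call POC sketch: one prompt in, one structured analysis out,
    then the quality gates decide Mode 1 vs Mode 2."""
    raw = call_llm(POC_PROMPT.format(article=article_text))
    return json.loads(raw)  # expected keys: claims, analysis_summary, article_summary
```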

Full System Architecture (Beta 0, Release 1.0)

Multi-Component Pipeline:

```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```

Processing:

  • Parallel processing where possible
  • Separate component calls
  • Quality gates between phases
  • Audit sampling selection
  • Cross-node coordination (federated mode)

Processing Time: 10-30 seconds (full pipeline)
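
A sketch of how the orchestrator might wire these components together; the component interfaces are placeholders matching the tree above, not the actual Beta 0 classes:

```
class AkelOrchestrator:
    """Illustrative multi-component pipeline; every dependency is a placeholder."""

    def __init__(self, extractor, classifier, scenario_generator, evidence_summarizer,
                 contradiction_detector, gate_validator, audit_scheduler):
        self.extractor = extractor
        self.classifier = classifier
        self.scenario_generator = scenario_generator
        self.evidence_summarizer = evidence_summarizer
        self.contradiction_detector = contradiction_detector
        self.gate_validator = gate_validator
        self.audit_scheduler = audit_scheduler

    def process(self, article):
        claims = self.extractor.extract(article)
        claims = [self.classifier.classify(c) for c in claims]        # adds risk tier
        scenarios = self.scenario_generator.generate(claims)
        evidence = self.evidence_summarizer.summarize(claims, scenarios)
        contradictions = self.contradiction_detector.search(claims, evidence)
        analysis = {"claims": claims, "scenarios": scenarios,
                    "evidence": evidence, "contradictions": contradictions}
        mode = self.gate_validator.validate(analysis)                  # quality gates between phases
        if mode == "ai_generated":
            self.audit_scheduler.maybe_select(analysis)                # sampling audit selection
        return mode, analysis
```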

Evolution Path

POC1: Single prompt → Prove concept  
POC2: Add scenario component → Test full pipeline  
Beta 0: Multi-component AKEL → Production architecture  
Release 1.0: Full AKEL + Federation → Scale

8. AKEL and Federation

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

  • Shares embeddings
  • Exchanges canonicalized claim forms
  • Exchanges scenario templates
  • Sends + receives contradiction alerts
  • Shares audit findings (with privacy controls)
  • Never shares model weights
  • Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

  • Trusted nodes: auto-merge embeddings + templates
  • Neutral nodes: require additional verification
  • Untrusted nodes: fully manual import
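
A small routing sketch for incoming federation data, using made-up action names; the only hard rule taken from above is that model weights are never accepted:

```
TRUST_ACTIONS = {
    "trusted": "auto_merge",       # embeddings + templates merged automatically
    "neutral": "verify_first",     # held until additional verification passes
    "untrusted": "manual_import",  # operator must import explicitly
}

def route_federation_payload(sender_trust: str, payload: dict) -> str:
    """Return how the local node handles incoming AKEL data
    (embeddings, canonical claim forms, scenario templates, contradiction alerts)."""
    if "model_weights" in payload:
        raise ValueError("model weights are never shared between nodes")
    return TRUST_ACTIONS.get(sender_trust, "manual_import")  # fail safe to manual import
```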

9. POC Behavior

The POC explicitly demonstrates AI-generated content publication:

  • ✅ Produces public AI-generated output (Mode 2)
  • ✅ No human data sources required
  • ✅ No human approval gate
  • ✅ Clear "AI-Generated - POC/Demo" labeling
  • ✅ All quality gates active (including contradiction search)
  • ✅ Users understand this demonstrates AI reasoning capabilities
  • ✅ Risk tier classification shown (demo purposes)

Philosophy Validation: POC proves automation-first approach works.

10. Change History

V0.9.70 CHANGES:
- ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
- ❌ REMOVED: All references to "Mode 3"
- ❌ REMOVED: "Human review required before publication"
- ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
- ✅ CLARIFIED: Quality gate failures → Block + improve system
- ✅ CLARIFIED: Sampling audits for improvement, NOT approval
- ✅ CLARIFIED: Risk tiers affect warnings/audits, NOT approval gates
- ✅ ENHANCED: Gate 2 (Contradiction Search) specification
- ✅ ADDED: Clear human intervention criteria
- ✅ ADDED: Detailed audit system explanation