AKEL – AI Knowledge Extraction Layer
Version: 0.9.70
Last Updated: December 21, 2025
Status: CORRECTED - Automation Philosophy Consistent
AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing.
AKEL outputs are marked with AuthorType = AI and published according to risk-based policies (see Publication Modes below).
AKEL operates in two modes:
- Single-node mode (POC & Beta 0)
- Federated multi-node mode (Release 1.0+)
1. Core Philosophy: Automation First
V0.9.50+ Philosophy Shift:
FactHarbor follows an "Improve the system, not the data" approach:
- ✅ Automated Publication: AI-generated content publishes immediately after passing quality gates
- ✅ Quality Gates: Automated checks (not human approval)
- ✅ Sampling Audits: Humans analyze patterns for system improvement (not individual approval)
- ❌ NO approval workflows: No review queues, no moderator gatekeeping for content quality
- ❌ NO manual fixes: If output is wrong, improve the algorithm/prompts
Why This Matters:
Traditional approach: Human reviews every output → Bottleneck, inconsistent
FactHarbor approach: Automated quality gates + pattern-based improvement → Scalable, consistent
2. Publication Modes
V0.9.70 CLARIFICATION: FactHarbor uses TWO publication modes (not three):
Mode 1: Draft-Only
Status: Not visible to public
When Used:
- Quality gates failed
- Confidence below threshold
- Structural integrity issues
- Insufficient evidence
What Happens:
- Content remains private
- System logs failure reasons
- Prompts/algorithms improved based on patterns
- Content may be re-processed after improvements
NOT "pending human approval" - it's blocked because it doesn't meet automated quality standards.
Mode 2: AI-Generated (Public)
Status: Published and visible to all users
When Used:
- Quality gates passed
- Confidence ≥ threshold
- Meets structural requirements
- Sufficient evidence found
Includes:
- Confidence score displayed (0-100%)
- Risk tier badge (A/B/C)
- Quality indicators
- Clear "AI-Generated" labeling
- Sampling audit status
Labels by Risk Tier:
- Tier A (High Risk): "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
- Tier B (Medium Risk): "🤖 AI-Generated - May Contain Errors"
- Tier C (Low Risk): "🤖 AI-Generated"
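The choice between the two modes is purely mechanical. A minimal Python sketch of that decision (the `GateResult` type, `decide_mode` function, and threshold value are illustrative assumptions, not the actual AKEL interfaces):
```
# Hypothetical sketch of the two-mode publication decision.
# Names (GateResult, decide_mode, CONFIDENCE_THRESHOLD) are illustrative, not the AKEL API.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # assumed value; real thresholds may vary per risk tier

@dataclass
class GateResult:
    name: str
    passed: bool
    reasons: list

def decide_mode(gate_results: list, confidence: float) -> str:
    """Return 'AI-Generated (public)' or 'Draft-Only' based on automated checks only."""
    if all(g.passed for g in gate_results) and confidence >= CONFIDENCE_THRESHOLD:
        return "AI-Generated (public)"   # Mode 2: publish immediately
    return "Draft-Only"                  # Mode 1: blocked, failure reasons logged

# Example: one failed gate keeps the content private regardless of confidence.
results = [GateResult("source_quality", True, []),
           GateResult("contradiction_search", False, ["no opposing sources found"])]
print(decide_mode(results, confidence=0.82))  # -> Draft-Only
```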
REMOVED: "Mode 3: Human-Reviewed"
V0.9.50 Decision: No centralized approval workflow.
Rationale:
- Defeats the purpose of automation
- Creates a bottleneck
- Produces inconsistent quality
- Does not scale
What Replaced It:
- Better quality gates
- Sampling audits for system improvement
- Transparent confidence scoring
- Risk-based warnings
3. Risk Tiers (A/B/C)
Risk classification determines WARNING LABELS and AUDIT FREQUENCY, NOT approval requirements.
Tier A: High-Stakes Claims
Examples: Medical advice, legal interpretations, financial recommendations, safety information
Impact:
- ✅ Publish immediately (if passes gates)
- ✅ Prominent warning labels
- ✅ Higher sampling audit frequency (50% audited)
- ✅ Explicit disclaimers ("Seek professional advice")
- ❌ NOT held for moderator approval
Philosophy: Publish with strong warnings, monitor closely
Tier B: Moderate-Stakes Claims
Examples: Political claims, controversial topics, scientific debates
Impact:
- ✅ Publish immediately (if passes gates)
- ✅ Standard warning labels
- ✅ Medium sampling audit frequency (20% audited)
- ❌ NOT held for moderator approval
Tier C: Low-Stakes Claims
Examples: Entertainment facts, sports statistics, general knowledge
Impact:
- ✅ Publish immediately (if passes gates)
- ✅ Minimal warning labels
- ✅ Low sampling audit frequency (5% audited)
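The tier differences above amount to configuration, not workflow. A hedged sketch of such a policy table (labels and audit rates are taken from this page; the structure and field names are assumptions):
```
# Illustrative configuration only; field names are assumptions, not the AKEL schema.
TIER_POLICY = {
    "A": {  # high-stakes: medical, legal, financial, safety
        "label": "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice",
        "audit_rate": 0.50,   # 50% of Tier A items are sampled for audit
    },
    "B": {  # moderate-stakes: political, controversial, scientific debates
        "label": "🤖 AI-Generated - May Contain Errors",
        "audit_rate": 0.20,
    },
    "C": {  # low-stakes: entertainment, sports, general knowledge
        "label": "🤖 AI-Generated",
        "audit_rate": 0.05,
    },
}
```
No tier carries an approval requirement: every tier publishes immediately once the automated gates pass.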
4. Quality Gates (Automated, Not Human)
All AI-generated content must pass these AUTOMATED checks before publication:
Gate 1: Source Quality
Automated Checks:
- Primary sources identified and accessible
- Source reliability scored against database
- Citation completeness verified
- Publication dates checked
- Author credentials validated (where applicable)
If Failed: Block publication, log pattern, improve source detection
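A minimal sketch of what Gate 1's automated checks might look like (the `Source` fields, reliability scale, and threshold are illustrative assumptions, not the actual implementation):
```
# Sketch of Gate 1 checks; the reliability database lookup, score scale, and
# field names are assumptions for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    url: str
    reliability: float          # 0.0-1.0, looked up in a reliability database
    is_primary: bool
    accessible: bool
    publication_date: Optional[str] = None

def gate_source_quality(sources: list, min_reliability: float = 0.6):
    """Return (passed, failure_reasons) for the source-quality gate."""
    reasons = []
    if not any(s.is_primary and s.accessible for s in sources):
        reasons.append("no accessible primary source")
    if sources and sum(s.reliability for s in sources) / len(sources) < min_reliability:
        reasons.append("average source reliability below threshold")
    if any(s.publication_date is None for s in sources):
        reasons.append("missing publication date")
    return (not reasons, reasons)
```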
Gate 2: Contradiction Search (MANDATORY)
The system MUST actively search for:
- Counter-evidence – Rebuttals, conflicting results, contradictory studies
- Reservations – Caveats, limitations, boundary conditions
- Alternative interpretations – Different framings, definitions
- Bubble detection – Echo chambers, ideologically isolated sources
Search Coverage Requirements:
- Academic literature (BOTH supporting AND opposing views)
- Diverse media across political/ideological perspectives
- Official contradictions (retractions, corrections, amendments)
- Cross-cultural and international perspectives
Search Must Avoid Algorithmic Bubbles:
- Deliberately seek opposing viewpoints
- Check for echo chamber patterns
- Identify tribal source clustering
- Flag artificially constrained search space
- Verify diversity of perspectives
Outcomes:
- Strong counter-evidence → Auto-escalate to Tier B or draft-only
- Significant uncertainty → Require uncertainty disclosure in verdict
- Bubble indicators → Flag for sampling audit
- Limited perspective diversity → Expand search or flag
If Failed: Block publication, improve search algorithms
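A sketch of how the outcomes above could be applied mechanically (type names, thresholds, and the per-tier escalation rule are illustrative assumptions):
```
# Hypothetical mapping from contradiction-search signals to the outcomes listed
# above; names and thresholds are illustrative, not the AKEL implementation.
from dataclasses import dataclass

@dataclass
class ContradictionFindings:
    strong_counter_evidence: bool
    uncertainty_high: bool
    bubble_indicators: bool
    perspective_diversity: float   # 0.0 (single viewpoint) to 1.0 (broad coverage)

def apply_contradiction_outcomes(findings: ContradictionFindings, tier: str) -> dict:
    """Translate Gate 2 findings into publication actions (no human approval step)."""
    actions = {"tier": tier, "draft_only": False, "disclose_uncertainty": False,
               "flag_for_audit": False, "expand_search": False}
    if findings.strong_counter_evidence:
        # Assumed rule: escalate low-stakes content to Tier B, keep higher tiers in draft-only.
        actions["tier"] = "B" if tier == "C" else tier
        actions["draft_only"] = tier in ("A", "B")
    if findings.uncertainty_high:
        actions["disclose_uncertainty"] = True
    if findings.bubble_indicators:
        actions["flag_for_audit"] = True
    if findings.perspective_diversity < 0.3:
        actions["expand_search"] = True
    return actions
```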
Gate 3: Uncertainty Quantification
Automated Checks:
- Confidence scores calculated for all claims and verdicts
- Limitations explicitly stated
- Data gaps identified and disclosed
- Strength of evidence assessed
- Alternative scenarios considered
If Failed: Block publication, improve confidence scoring
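A minimal sketch of confidence aggregation and disclosure checking (the weighting and penalty scheme are assumptions; AKEL's real scoring is not specified here):
```
# Minimal sketch of verdict confidence aggregation and disclosure checks.
# The averaging and gap penalty are assumptions, not the defined scoring method.
def verdict_confidence(evidence_strengths: list, data_gaps: int) -> float:
    """Combine per-evidence strengths (0-1) into a verdict confidence (0-1),
    penalising each identified data gap; displayed to users as 0-100%."""
    if not evidence_strengths:
        return 0.0
    base = sum(evidence_strengths) / len(evidence_strengths)
    penalty = min(0.1 * data_gaps, 0.5)     # each disclosed gap lowers confidence
    return max(base - penalty, 0.0)

def gate_uncertainty(confidence: float, limitations_stated: bool, gaps_disclosed: bool) -> bool:
    """Gate 3 passes only if uncertainty is both quantified and disclosed."""
    return confidence > 0.0 and limitations_stated and gaps_disclosed
```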
Gate 4: Structural Integrity
Automated Checks:
- No hallucinations detected (fact-checking against sources)
- Logic chain valid and traceable
- References accessible and verifiable
- No circular reasoning
- Premises clearly stated
If Failed: Block publication, improve hallucination detection
CRITICAL: If any gate fails:
- ✅ Content remains in draft-only mode
- ✅ Failure reason logged
- ✅ Failure patterns analyzed for system improvement
- ❌ NOT "sent for human review"
- ❌ NOT "manually overridden"
Philosophy: Fix the system that generated bad output, don't manually fix individual outputs.
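A sketch of the failure-pattern loop this implies (function and field names are assumptions): failures are logged and aggregated so the generating system can be improved, never manually overridden:
```
# Sketch of the failure-pattern loop: failed gates are logged and aggregated so
# the system (prompts, search, scoring) gets improved. Names are illustrative.
from collections import Counter

failure_log = []

def record_gate_failure(content_id: str, gate: str, reasons: list) -> None:
    """Log why content stayed in draft-only mode; there is no manual override path."""
    failure_log.append({"content_id": content_id, "gate": gate, "reasons": reasons})

def failure_patterns(min_count: int = 10):
    """Return recurring (gate, reason) pairs that warrant an algorithm/prompt fix."""
    counts = Counter((f["gate"], r) for f in failure_log for r in f["reasons"])
    return [(f"{gate}: {reason}", n)
            for (gate, reason), n in counts.most_common() if n >= min_count]
```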
5. Sampling Audit System
Purpose: Improve the system through pattern analysis (NOT approve individual outputs)
5.1 How Sampling Works
Stratified Sampling Strategy:
Audits prioritize:
- Risk tier (Tier A: 50%, Tier B: 20%, Tier C: 5%)
- AI confidence score (low confidence → higher sampling rate)
- Traffic and engagement (high-visibility content audited more)
- Novelty (new claim types, new domains, emerging topics)
- Disagreement signals (user flags, contradiction alerts, community reports)
NOT: Review queue for approval
IS: Statistical sampling for quality monitoring
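A hedged sketch of how such stratified sampling could be computed (base rates come from the risk tiers above; boost factors and parameter names are assumptions):
```
# Illustrative stratified-sampling sketch; base rates come from the risk tiers,
# boost factors and field names are assumptions.
import random

BASE_AUDIT_RATE = {"A": 0.50, "B": 0.20, "C": 0.05}

def audit_probability(tier: str, confidence: float, views: int,
                      is_novel: bool, user_flags: int) -> float:
    p = BASE_AUDIT_RATE[tier]
    if confidence < 0.6:
        p *= 1.5             # low-confidence output is sampled more often
    if views > 10_000:
        p *= 1.5             # high-visibility content is sampled more often
    if is_novel:
        p *= 2.0             # new claim types / domains get extra scrutiny
    p += 0.05 * user_flags   # disagreement signals raise the rate further
    return min(p, 1.0)

def select_for_audit(tier: str, confidence: float, views: int,
                     is_novel: bool, user_flags: int) -> bool:
    return random.random() < audit_probability(tier, confidence, views, is_novel, user_flags)
```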
5.2 Audit Process
1. System selects content for audit based on the sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Auditor validates or identifies issues:
   - Claim extraction accuracy
   - Scenario appropriateness
   - Evidence relevance and interpretation
   - Verdict reasoning
   - Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger:
   - Analysis of the failure pattern
   - System improvement tasks
   - Algorithm/prompt adjustments
6. Audit results feed back into system improvement
CRITICAL: Auditors analyze PATTERNS, not fix individual outputs.
5.3 Feedback Loop (Continuous Improvement)
Audit outcomes systematically improve:
- Query templates – Refined based on missed evidence patterns
- Retrieval source weights – Adjusted for accuracy and reliability
- Contradiction detection heuristics – Enhanced to catch missed counter-evidence
- Model prompts and extraction rules – Tuned for better claim extraction
- Risk tier assignments – Recalibrated based on error patterns
- Bubble detection algorithms – Improved to identify echo chambers
Philosophy: "Improve the system, not the data"
5.4 Audit Transparency
Publicly Published:
- Audit statistics (monthly)
- Accuracy rates by risk tier
- System improvements made
- Aggregate audit performance
Enables:
- Public accountability
- System trust
- Continuous improvement visibility
6. Human Intervention Criteria
From Organisation.Decision-Processes:
LEGITIMATE reasons to intervene:
- ✅ AKEL explicitly flags item for sampling audit
- ✅ System metrics show performance degradation
- ✅ Legal/safety issue requires immediate action
- ✅ User reports reveal systematic bias pattern
ILLEGITIMATE reasons (system improvement needed instead):
- ❌ "I disagree with this verdict" → Improve algorithm
- ❌ "This source should rank higher" → Improve scoring rules
- ❌ "Manual quality gate before publication" → Defeats purpose of automation
- ❌ "I know better than the algorithm" → Then improve the algorithm
Philosophy: If you disagree with output, improve the system that generated it.
7. Architecture Overview
POC Architecture (POC1, POC2)
Simple, Single-Call Approach:
```
User submits article/claim
↓
Single AI API call
↓
Returns complete analysis
↓
Quality gates validate
↓
PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```
Components in Single Call:
1. Extract 3-5 factual claims
2. For each claim: verdict + confidence + risk tier + reasoning
3. Generate analysis summary
4. Generate article summary
5. Run basic quality checks
Processing Time: 10-18 seconds
Advantages: Simple, fast POC development, proves AI capability
Limitations: No component reusability, all-or-nothing processing
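An illustrative shape for the single-call response, consistent with the component list above (keys and example values are assumptions, not a fixed schema):
```
# Illustrative shape of the single-call POC response; keys and values are
# assumptions consistent with the component list above, not a fixed schema.
poc_response = {
    "claims": [
        {
            "text": "Example extracted claim",
            "verdict": "supported",
            "confidence": 0.78,          # displayed to users as 78%
            "risk_tier": "C",
            "reasoning": "Short traceable reasoning chain with cited sources",
        },
        # ... 3-5 claims in total
    ],
    "analysis_summary": "One-paragraph summary of the overall analysis",
    "article_summary": "One-paragraph summary of the submitted article",
    "quality_checks": {"passed": True, "failed_gates": []},
}
```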
Full System Architecture (Beta 0, Release 1.0)
Multi-Component Pipeline:
```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```
Processing:
- Parallel processing where possible
- Separate component calls
- Quality gates between phases
- Audit sampling selection
- Cross-node coordination (federated mode)
Processing Time: 10-30 seconds (full pipeline)
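A minimal orchestration sketch of this pipeline (the component functions are trivial stubs standing in for the real AKEL components, and the concurrency model is an assumption):
```
# Minimal orchestration sketch. The component functions are trivial stubs standing
# in for the real AKEL components; the concurrency model is an assumption.
from concurrent.futures import ThreadPoolExecutor

def extract_claims(article: str) -> list:
    return [s.strip() for s in article.split(".") if s.strip()][:5]   # stub: sentences as "claims"

def classify_claim(claim: str) -> dict:
    return {"claim": claim, "risk_tier": "C"}                         # stub classifier

def summarize_evidence(claim: dict) -> dict:
    return {"claim": claim["claim"], "evidence": []}                  # stub evidence summariser

def search_contradictions(claim: dict) -> dict:
    return {"claim": claim["claim"], "counter_evidence": []}          # stub contradiction search

def run_akel_pipeline(article: str) -> dict:
    claims = [classify_claim(c) for c in extract_claims(article)]
    # Evidence summarisation and contradiction search run in parallel per claim.
    with ThreadPoolExecutor() as pool:
        evidence = list(pool.map(summarize_evidence, claims))
        contradictions = list(pool.map(search_contradictions, claims))
    return {"claims": claims, "evidence": evidence, "contradictions": contradictions}
```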
Evolution Path
POC1: Single prompt → Prove concept
POC2: Add scenario component → Test full pipeline
Beta 0: Multi-component AKEL → Production architecture
Release 1.0: Full AKEL + Federation → Scale
8. AKEL and Federation
In Release 1.0+, AKEL participates in cross-node knowledge alignment:
- Shares embeddings
- Exchanges canonicalized claim forms
- Exchanges scenario templates
- Sends + receives contradiction alerts
- Shares audit findings (with privacy controls)
- Never shares model weights
- Never overrides local governance
Nodes may choose trust levels for AKEL-related data:
- Trusted nodes: auto-merge embeddings + templates
- Neutral nodes: require additional verification
- Untrusted nodes: fully manual import
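A sketch of applying these trust levels to incoming AKEL data (the function name and payload shape are assumptions):
```
# Illustrative handling of incoming federated AKEL data by trust level;
# the function name and payload shape are assumptions.
def handle_federated_payload(payload: dict, trust_level: str) -> str:
    """Decide how to import AKEL data (embeddings, templates, alerts) from a peer
    node. Model weights are never accepted; local governance is never overridden."""
    if "model_weights" in payload:
        return "rejected"                    # never shared, never accepted
    if trust_level == "trusted":
        return "auto-merged"                 # embeddings + templates merged automatically
    if trust_level == "neutral":
        return "queued-for-verification"     # additional verification required
    return "manual-import-only"              # untrusted nodes: fully manual import
```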
9. POC Behavior
The POC explicitly demonstrates AI-generated content publication:
- ✅ Produces public AI-generated output (Mode 2)
- ✅ No human data sources required
- ✅ No human approval gate
- ✅ Clear "AI-Generated - POC/Demo" labeling
- ✅ All quality gates active (including contradiction search)
- ✅ Users understand this demonstrates AI reasoning capabilities
- ✅ Risk tier classification shown (demo purposes)
Philosophy Validation: POC proves automation-first approach works.
10. Related Pages
V0.9.70 CHANGES:
- ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
- ❌ REMOVED: All references to "Mode 3"
- ❌ REMOVED: "Human review required before publication"
- ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
- ✅ CLARIFIED: Quality gate failures → Block + improve system
- ✅ CLARIFIED: Sampling audits for improvement, NOT approval
- ✅ CLARIFIED: Risk tiers affect warnings/audits, NOT approval gates
- ✅ ENHANCED: Gate 2 (Contradiction Search) specification
- ✅ ADDED: Clear human intervention criteria
- ✅ ADDED: Detailed audit system explanation