AI Knowledge Extraction Layer (AKEL)



Version: 0.9.70
Last Updated: December 21, 2025
Status: CORRECTED - Automation Philosophy Consistent

AKEL is FactHarbor's automated intelligence subsystem. Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing. AKEL outputs are marked with AuthorType = AI and published according to risk-based policies (see Publication Modes below).

AKEL operates in two modes:

  • Single-node mode (POC & Beta 0)
  • Federated multi-node mode (Release 1.0+)

== 1. Core Philosophy: Automation First ==

V0.9.50+ Philosophy Shift: FactHarbor uses an "Improve the system, not the data" approach:

  • ✅ Automated Publication: AI-generated content publishes immediately after passing quality gates
  • Quality Gates: Automated checks (not human approval)
  • Sampling Audits: Humans analyze patterns for system improvement (not individual approval)
  • NO approval workflows: No review queues, no moderator gatekeeping for content quality
  • NO manual fixes: If output is wrong, improve the algorithm/prompts

Why This Matters:

  • Traditional approach: Human reviews every output → bottleneck, inconsistent
  • FactHarbor approach: Automated quality gates + pattern-based improvement → scalable, consistent

== 2. Publication Modes ==

V0.9.70 CLARIFICATION: FactHarbor uses TWO publication modes (not three):

=== Mode 1: Draft-Only ===

Status: Not visible to public

When Used:
  • Quality gates failed
  • Confidence below threshold
  • Structural integrity issues
  • Insufficient evidence

What Happens:
  • Content remains private
  • System logs failure reasons
  • Prompts/algorithms improved based on patterns
  • Content may be re-processed after improvements

Draft-only content is NOT "pending human approval"; it is blocked because it does not meet automated quality standards.

=== Mode 2: AI-Generated (Public) ===

Status: Published and visible to all users

When Used:
  • Quality gates passed
  • Confidence ≥ threshold
  • Meets structural requirements
  • Sufficient evidence found

Includes:
  • Confidence score displayed (0-100%)
  • Risk tier badge (A/B/C)
  • Quality indicators
  • Clear "AI-Generated" labeling
  • Sampling audit status

Labels by Risk Tier:
  • Tier A (High Risk): "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
  • Tier B (Medium Risk): "🤖 AI-Generated - May Contain Errors"
  • Tier C (Low Risk): "🤖 AI-Generated"
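Taken together, the two modes reduce to a single automated decision. The sketch below illustrates it; the function, the gate representation, and the 0.7 threshold are assumptions for illustration, not the shipped implementation:

```
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    DRAFT_ONLY = 1    # Mode 1: blocked, not publicly visible
    AI_GENERATED = 2  # Mode 2: published with labels and confidence score

@dataclass
class GateResult:
    name: str      # e.g. "source_quality", "contradiction_search"
    passed: bool
    reason: str = ""

def decide_mode(gates: list[GateResult], confidence: float,
                threshold: float = 0.7) -> Mode:
    """Publish only if every automated gate passed and confidence meets
    the threshold; there is no human-approval branch."""
    if all(g.passed for g in gates) and confidence >= threshold:
        return Mode.AI_GENERATED
    return Mode.DRAFT_ONLY
```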
=== REMOVED: "Mode 3: Human-Reviewed" ===

V0.9.50 Decision: No centralized approval workflow.

Rationale:

  • Defeats automation purpose
  • Creates bottleneck
  • Inconsistent quality
  • Not scalable

What Replaced It:
  • Better quality gates
  • Sampling audits for system improvement
  • Transparent confidence scoring
  • Risk-based warnings

== 3. Risk Tiers (A/B/C) ==

Risk classification determines WARNING LABELS and AUDIT FREQUENCY, NOT approval requirements.

=== Tier A: High-Stakes Claims ===

Examples: Medical advice, legal interpretations, financial recommendations, safety information

Impact:
  • ✅ Publish immediately (if passes gates)
  • ✅ Prominent warning labels
  • ✅ Higher sampling audit frequency (50% audited)
  • ✅ Explicit disclaimers ("Seek professional advice")
  • ❌ NOT held for moderator approval

Philosophy: Publish with strong warnings, monitor closely

=== Tier B: Moderate-Stakes Claims ===

Examples: Political claims, controversial topics, scientific debates

Impact:
  • ✅ Publish immediately (if passes gates)
  • ✅ Standard warning labels
  • ✅ Medium sampling audit frequency (20% audited)
  • ❌ NOT held for moderator approval

=== Tier C: Low-Stakes Claims ===

Examples: Entertainment facts, sports statistics, general knowledge

Impact:
  • ✅ Publish immediately (if passes gates)
  • ✅ Minimal warning labels
  • Low sampling audit frequency (5% audited)
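The tier table lends itself to a small declarative mapping. A minimal sketch: the audit rates and labels are the ones stated above, while the data structure itself is an assumption:

```
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    warning_label: str  # shown on published content (see Section 2)
    audit_rate: float   # fraction selected for sampling audits

# Risk tiers drive warnings and audit frequency, never an approval gate.
TIER_POLICIES: dict[str, TierPolicy] = {
    "A": TierPolicy("⚠️ AI-Generated - High Impact Topic - Seek Professional Advice", 0.50),
    "B": TierPolicy("🤖 AI-Generated - May Contain Errors", 0.20),
    "C": TierPolicy("🤖 AI-Generated", 0.05),
}
```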
== 4. Quality Gates (Automated, Not Human) ==

All AI-generated content must pass these AUTOMATED checks before publication:

=== Gate 1: Source Quality ===

Automated Checks:

  • Primary sources identified and accessible
  • Source reliability scored against database
  • Citation completeness verified
  • Publication dates checked
  • Author credentials validated (where applicable)

If Failed: Block publication, log pattern, improve source detection

=== Gate 2: Contradiction Search (MANDATORY) ===

The system MUST actively search for:

  • Counter-evidence – Rebuttals, conflicting results, contradictory studies
  • Reservations – Caveats, limitations, boundary conditions
  • Alternative interpretations – Different framings, definitions
  • Bubble detection – Echo chambers, ideologically isolated sources

Search Coverage Requirements:
  • Academic literature (BOTH supporting AND opposing views)
  • Diverse media across political/ideological perspectives
  • Official contradictions (retractions, corrections, amendments)
  • Cross-cultural and international perspectives

Search Must Avoid Algorithmic Bubbles:
  • Deliberately seek opposing viewpoints
  • Check for echo chamber patterns
  • Identify tribal source clustering
  • Flag artificially constrained search space
  • Verify diversity of perspectives

Outcomes:
  • Strong counter-evidence → Auto-escalate to Tier B or draft-only
  • Significant uncertainty → Require uncertainty disclosure in verdict
  • Bubble indicators → Flag for sampling audit
  • Limited perspective diversity → Expand search or flag

If Failed: Block publication, improve search algorithms
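The outcome rules above map naturally onto a small dispatch function. A sketch under stated assumptions: the report fields, the escalation interpretation, and the 0.5 diversity threshold are illustrative, not specified by this document:

```
from dataclasses import dataclass, field

@dataclass
class Gate2Report:
    strong_counter_evidence: bool
    significant_uncertainty: bool
    bubble_indicators: bool
    perspective_diversity: float  # 0..1, share of distinct viewpoints found
    flags: list[str] = field(default_factory=list)

def apply_gate2_outcomes(report: Gate2Report, tier: str) -> tuple[str, list[str]]:
    """Apply the Outcomes list above; thresholds are assumptions."""
    if report.strong_counter_evidence:
        if tier == "C":
            tier = "B"                         # auto-escalate to Tier B...
        else:
            report.flags.append("draft_only")  # ...or fall back to draft-only
    if report.significant_uncertainty:
        report.flags.append("require_uncertainty_disclosure")
    if report.bubble_indicators:
        report.flags.append("sampling_audit")
    if report.perspective_diversity < 0.5:     # diversity threshold assumed
        report.flags.append("expand_search")
    return tier, report.flags
```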
=== Gate 3: Uncertainty Quantification ===

Automated Checks:

  • Confidence scores calculated for all claims and verdicts
  • Limitations explicitly stated
  • Data gaps identified and disclosed
  • Strength of evidence assessed
  • Alternative scenarios considered

If Failed: Block publication, improve confidence scoring
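The document requires a 0-100% confidence score but leaves the formula open. One plausible aggregation, purely as an illustration:

```
def confidence_score(evidence_strength: float, source_reliability: float,
                     coverage: float) -> float:
    """All inputs in [0, 1]. The weights below are assumptions; only the
    0-100% output scale comes from this document."""
    raw = 0.5 * evidence_strength + 0.3 * source_reliability + 0.2 * coverage
    return round(100 * max(0.0, min(1.0, raw)), 1)
```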
=== Gate 4: Structural Integrity ===

Automated Checks:

  • No hallucinations detected (fact-checking against sources)
  • Logic chain valid and traceable
  • References accessible and verifiable
  • No circular reasoning
  • Premises clearly stated

If Failed: Block publication, improve hallucination detection

CRITICAL: If any gate fails:
  • ✅ Content remains in draft-only mode
  • ✅ Failure reason logged
  • ✅ Failure patterns analyzed for system improvement
  • NOT "sent for human review"
  • NOT "manually overridden" Philosophy: Fix the system that generated bad output, don't manually fix individual outputs. == 5. Sampling Audit System == Purpose: Improve the system through pattern analysis (NOT approve individual outputs) === 5.1 How Sampling Works === Stratified Sampling Strategy: Audits prioritize:
== 5. Sampling Audit System ==

Purpose: Improve the system through pattern analysis (NOT approve individual outputs)

=== 5.1 How Sampling Works ===

Stratified Sampling Strategy: Audits prioritize:

  • Risk tier (Tier A: 50%, Tier B: 20%, Tier C: 5%)
  • AI confidence score (low confidence → higher sampling rate)
  • Traffic and engagement (high-visibility content audited more)
  • Novelty (new claim types, new domains, emerging topics)
  • Disagreement signals (user flags, contradiction alerts, community reports)

NOT: Review queue for approval
IS: Statistical sampling for quality monitoring
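One way to express this stratification in code is a base rate per tier with multipliers for the other signals. In the sketch below, only the base rates come from this document; the multipliers and the 0.8 confidence cutoff are illustrative assumptions:

```
import random

BASE_RATES = {"A": 0.50, "B": 0.20, "C": 0.05}  # per-tier rates from 5.1

def audit_probability(tier: str, confidence: float, novel: bool,
                      flagged: bool, high_traffic: bool) -> float:
    """Start from the tier's base rate, then boost for the other signals."""
    p = BASE_RATES[tier]
    if confidence < 0.8:  # low AI confidence -> sample more often
        p *= 1.5
    if novel:             # new claim types, new domains, emerging topics
        p *= 1.5
    if flagged:           # user flags, contradiction alerts, reports
        p *= 2.0
    if high_traffic:      # high-visibility content audited more
        p *= 1.5
    return min(p, 1.0)

def selected_for_audit(tier, confidence, novel, flagged, high_traffic) -> bool:
    return random.random() < audit_probability(
        tier, confidence, novel, flagged, high_traffic)
```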
=== 5.2 Audit Process ===

  1. System selects content for audit based on sampling strategy
  2. Human auditor reviews AI-generated content against quality standards
  3. Auditor validates or identifies issues:
    • Claim extraction accuracy
    • Scenario appropriateness
    • Evidence relevance and interpretation
    • Verdict reasoning
    • Contradiction search completeness
  4. Audit outcome recorded (pass/fail + detailed feedback)
  5. Failed audits trigger:
    • Analysis of failure pattern
    • System improvement tasks
    • Algorithm/prompt adjustments
  6. Audit results feed back into system improvement

CRITICAL: Auditors analyze PATTERNS; they do not fix individual outputs.

=== 5.3 Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

  • Query templates – Refined based on missed evidence patterns
  • Retrieval source weights – Adjusted for accuracy and reliability
  • Contradiction detection heuristics – Enhanced to catch missed counter-evidence
  • Model prompts and extraction rules – Tuned for better claim extraction
  • Risk tier assignments – Recalibrated based on error patterns
  • Bubble detection algorithms – Improved to identify echo chambers

Philosophy: "Improve the system, not the data"

=== 5.4 Audit Transparency ===

Publicly Published:
  • Audit statistics (monthly)
  • Accuracy rates by risk tier
  • System improvements made
  • Aggregate audit performance

Enables:
  • Public accountability
  • System trust
  • Continuous improvement visibility

== 6. Human Intervention Criteria ==

From Organisation.Decision-Processes:

LEGITIMATE reasons to intervene:

  • ✅ AKEL explicitly flags item for sampling audit
  • ✅ System metrics show performance degradation
  • ✅ Legal/safety issue requires immediate action
  • ✅ User reports reveal systematic bias pattern

ILLEGITIMATE reasons (system improvement needed instead):

  • ❌ "I disagree with this verdict" → Improve algorithm
  • ❌ "This source should rank higher" → Improve scoring rules
  • ❌ "Manual quality gate before publication" → Defeats purpose of automation
  • ❌ "I know better than the algorithm" → Then improve the algorithm Philosophy: If you disagree with output, improve the system that generated it. == 7. Architecture Overview == === POC Architecture (POC1, POC2) === Simple, Single-Call Approach: ```
User submits article/claim
        ↓
Single AI API call
        ↓
Returns complete analysis
        ↓
Quality gates validate
        ↓
PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```

Components in Single Call:
  1. Extract 3-5 factual claims
  2. For each claim: verdict + confidence + risk tier + reasoning
  3. Generate analysis summary
  4. Generate article summary
  5. Run basic quality checks

Processing Time: 10-18 seconds

Advantages: Simple, fast POC development, proves AI capability

Limitations: No component reusability, all-or-nothing
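In code, the POC flow is essentially one prompt and one parse. A minimal sketch, where call_model stands in for whatever AI API the POC uses, the prompt wording is invented for illustration, and the 70-point gate threshold is an assumption:

```
import json

PROMPT = """Extract 3-5 factual claims from the article below. For each claim
return a verdict, a confidence score (0-100), a risk tier (A/B/C), and the
reasoning. Also return an analysis summary and an article summary, as JSON.

ARTICLE:
{article}
"""

def passes_basic_gates(analysis: dict) -> bool:
    # Stand-in for the Section 4 gates; the threshold is an assumption.
    return all(c.get("confidence", 0) >= 70 for c in analysis.get("claims", []))

def analyze_article(article: str, call_model) -> dict:
    """Single AI API call returns the complete analysis (POC1/POC2 flow)."""
    analysis = json.loads(call_model(PROMPT.format(article=article)))
    analysis["mode"] = "AI_GENERATED" if passes_basic_gates(analysis) else "DRAFT_ONLY"
    return analysis
```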
=== Full System Architecture (Beta 0, Release 1.0) ===

Multi-Component Pipeline:

```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```
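A sketch of how the orchestrator might wire these components together, running independent steps in parallel; the component interfaces are assumptions, with names mirroring the diagram above:

```
import asyncio

async def run_pipeline(article: str, c: dict) -> dict:
    """`c` maps component names to async callables, except the gate
    validator, which is a plain function in this sketch."""
    claims = await c["claim_extractor"](article)
    # Classification and scenario generation don't depend on each other,
    # so they run concurrently.
    classified, scenarios = await asyncio.gather(
        c["claim_classifier"](claims),
        c["scenario_generator"](claims),
    )
    evidence = await c["evidence_summarizer"](classified)
    contradictions = await c["contradiction_detector"](evidence)
    result = {"claims": classified, "scenarios": scenarios,
              "evidence": evidence, "contradictions": contradictions}
    # Quality gates sit between the pipeline and publication.
    result["published"] = c["quality_gate_validator"](result)
    return result
```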
Processing:

  • Parallel processing where possible
  • Separate component calls
  • Quality gates between phases
  • Audit sampling selection
  • Cross-node coordination (federated mode)

Processing Time: 10-30 seconds (full pipeline)

=== Evolution Path ===

  • POC1: Single prompt → Prove concept
  • POC2: Add scenario component → Test full pipeline
  • Beta 0: Multi-component AKEL → Production architecture
  • Release 1.0: Full AKEL + Federation → Scale

== 8. AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

  • Shares embeddings
  • Exchanges canonicalized claim forms
  • Exchanges scenario templates
  • Sends + receives contradiction alerts
  • Shares audit findings (with privacy controls)
  • Never shares model weights
  • Never overrides local governance

Nodes may choose trust levels for AKEL-related data:
  • Trusted nodes: auto-merge embeddings + templates
  • Neutral nodes: require additional verification
  • Untrusted nodes: fully manual import
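These three trust levels amount to a dispatch on the source node's standing. A sketch; the handler functions are hypothetical placeholders:

```
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"

# Placeholder handlers; real ones would merge embeddings/templates,
# run extra verification, or park the payload for manual import.
def auto_merge(payload): ...
def verify_then_merge(payload): ...
def queue_for_manual_import(payload): ...

def handle_incoming(payload, source_trust: Trust):
    """Route incoming AKEL data by the receiving node's trust setting."""
    if source_trust is Trust.TRUSTED:
        return auto_merge(payload)           # embeddings + templates auto-merge
    if source_trust is Trust.NEUTRAL:
        return verify_then_merge(payload)    # additional verification required
    return queue_for_manual_import(payload)  # untrusted: fully manual import
```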
== 9. POC Behavior ==

The POC explicitly demonstrates AI-generated content publication:

  • ✅ Produces public AI-generated output (Mode 2)
  • ✅ No human data sources required
  • ✅ No human approval gate
  • ✅ Clear "AI-Generated - POC/Demo" labeling
  • ✅ All quality gates active (including contradiction search)
  • ✅ Users understand this demonstrates AI reasoning capabilities
  • ✅ Risk tier classification shown (demo purposes)

Philosophy Validation: The POC proves that the automation-first approach works.

== 10. Related Pages ==

  • Automation
  • Requirements (Roles)
  • Workflows
  • Governance
  • Decision Processes

V0.9.70 CHANGES:
    - ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
    - ❌ REMOVED: All references to "Mode 3"
    - ❌ REMOVED: "Human review required before publication"
    - ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
    - ✅ CLARIFIED: Quality gate failures → Block + improve system
    - ✅ CLARIFIED: Sampling audits for improvement, NOT approval
    - ✅ CLARIFIED: Risk tiers affect warnings/audits, NOT approval gates
    - ✅ ENHANCED: Gate 2 (Contradiction Search) specification
    - ✅ ADDED: Clear human intervention criteria
    - ✅ ADDED: Detailed audit system explanation