Wiki source code of AI Knowledge Extraction Layer (AKEL)
Last modified by Robert Schaub on 2025/12/22 14:32
= AKEL – AI Knowledge Extraction Layer =

**Version:** 0.9.70
**Last Updated:** December 21, 2025
**Status:** CORRECTED - Automation Philosophy Consistent

AKEL is FactHarbor's automated intelligence subsystem. Its purpose is to reduce human workload, improve consistency, and enable scalable knowledge processing. AKEL outputs are marked with **AuthorType = AI** and published according to risk-based policies (see Publication Modes below).

AKEL operates in two modes:

* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)

== 1. Core Philosophy: Automation First ==

**V0.9.50+ Philosophy Shift:** FactHarbor follows an **"Improve the system, not the data"** approach:

* ✅ **Automated Publication:** AI-generated content publishes immediately after passing quality gates
* ✅ **Quality Gates:** Automated checks (not human approval)
* ✅ **Sampling Audits:** Humans analyze patterns for system improvement (not individual approval)
* ❌ **NO approval workflows:** No review queues, no moderator gatekeeping for content quality
* ❌ **NO manual fixes:** If output is wrong, improve the algorithm or prompts

**Why This Matters:**

* Traditional approach: a human reviews every output → bottleneck, inconsistent
* FactHarbor approach: automated quality gates plus pattern-based improvement → scalable, consistent

== 2. Publication Modes ==

**V0.9.70 CLARIFICATION:** FactHarbor uses **two** publication modes (not three):

=== Mode 1: Draft-Only ===

**Status:** Not visible to the public

**When Used:**
* Quality gates failed
* Confidence below threshold
* Structural integrity issues
* Insufficient evidence

**What Happens:**

* Content remains private
* System logs the failure reasons
* Prompts/algorithms improved based on patterns
* Content may be re-processed after improvements

This is **NOT "pending human approval"** – the content is blocked because it does not meet automated quality standards.

=== Mode 2: AI-Generated (Public) ===

**Status:** Published and visible to all users

**When Used:**
* Quality gates passed
* Confidence ≥ threshold
* Meets structural requirements
* Sufficient evidence found

**Includes:**

* Confidence score displayed (0–100%)
* Risk tier badge (A/B/C)
* Quality indicators
* Clear "AI-Generated" labeling
* Sampling audit status

**Labels by Risk Tier:**

* **Tier A (High Risk):** "⚠️ AI-Generated - High Impact Topic - Seek Professional Advice"
* **Tier B (Medium Risk):** "🤖 AI-Generated - May Contain Errors"
* **Tier C (Low Risk):** "🤖 AI-Generated"

=== REMOVED: "Mode 3: Human-Reviewed" ===

**V0.9.50 Decision:** No centralized approval workflow.

**Rationale:**
* Defeats the purpose of automation
* Creates a bottleneck
* Produces inconsistent quality
* Does not scale

**What Replaced It:**

* Better quality gates
* Sampling audits for system improvement
* Transparent confidence scoring
* Risk-based warnings

== 3. Risk Tiers (A/B/C) ==

Risk classification determines **warning labels** and **audit frequency**, NOT approval requirements.

=== Tier A: High-Stakes Claims ===

**Examples:** Medical advice, legal interpretations, financial recommendations, safety information

**Impact:**

* ✅ Publish immediately (if gates pass)
* ✅ Prominent warning labels
* ✅ Higher sampling audit frequency (50% audited)
* ✅ Explicit disclaimers ("Seek professional advice")
* ❌ NOT held for moderator approval

**Philosophy:** Publish with strong warnings, monitor closely

=== Tier B: Moderate-Stakes Claims ===

**Examples:** Political claims, controversial topics, scientific debates

**Impact:**
* ✅ Publish immediately (if gates pass)
* ✅ Standard warning labels
* ✅ Medium sampling audit frequency (20% audited)
* ❌ NOT held for moderator approval

=== Tier C: Low-Stakes Claims ===

**Examples:** Entertainment facts, sports statistics, general knowledge

**Impact:**

* ✅ Publish immediately (if gates pass)
* ✅ Minimal warning labels
* ✅ Low sampling audit frequency (5% audited)

== 4. Quality Gates (Automated, Not Human) ==

All AI-generated content must pass these **automated checks** before publication:

=== Gate 1: Source Quality ===

**Automated Checks:**

* Primary sources identified and accessible
* Source reliability scored against a database
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

**If Failed:** Block publication, log the pattern, improve source detection

=== Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions
* **Alternative interpretations** – Different framings, definitions
* **Bubble detection** – Echo chambers, ideologically isolated sources

**Search Coverage Requirements:**

* Academic literature (both supporting and opposing views)
* Diverse media across political and ideological perspectives
* Official contradictions (retractions, corrections, amendments)
* Cross-cultural and international perspectives

**Search Must Avoid Algorithmic Bubbles:**

* Deliberately seek opposing viewpoints
* Check for echo-chamber patterns
* Identify tribal source clustering
* Flag an artificially constrained search space
* Verify diversity of perspectives

**Outcomes:**

* Strong counter-evidence → Auto-escalate to Tier B or draft-only
* Significant uncertainty → Require uncertainty disclosure in the verdict
* Bubble indicators → Flag for sampling audit
* Limited perspective diversity → Expand the search or flag

**If Failed:** Block publication, improve search algorithms

=== Gate 3: Uncertainty Quantification ===

**Automated Checks:**
* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

**If Failed:** Block publication, improve confidence scoring

=== Gate 4: Structural Integrity ===

**Automated Checks:**

* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If Failed:** Block publication, improve hallucination detection

**CRITICAL:** If any gate fails:

* ✅ Content remains in draft-only mode
* ✅ Failure reason is logged
* ✅ Failure patterns are analyzed for system improvement
* ❌ **NOT "sent for human review"**
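The fail-closed behavior of these gates can be sketched in a few lines: gates run in order, and the first failure blocks publication (Mode 1) and records the reason for pattern analysis. Everything below is an illustrative assumption — the function names, the record fields, and the 0.7 confidence threshold are not FactHarbor's actual implementation.

```python
# Hypothetical sketch of the quality-gate chain: first failure blocks
# publication and logs the reason; there is no human-review branch.

def run_quality_gates(content, gates):
    """Return ("publish", None) if every gate passes, else ("draft-only", reason)."""
    for gate in gates:
        ok, reason = gate(content)
        if not ok:
            # Mode 1: block and record the failure pattern for system improvement
            return "draft-only", reason
    return "publish", None  # Mode 2: AI-Generated (Public)

# Illustrative gates; real checks would query sources, search for
# contradictions, and score confidence.
def source_quality(c):  return (bool(c.get("sources")), "no primary sources")
def contradiction(c):   return (c.get("counter_search_done", False), "contradiction search missing")
def uncertainty(c):     return (c.get("confidence", 0.0) >= 0.7, "confidence below threshold")
def structural(c):      return (not c.get("hallucination_flags"), "hallucination detected")

gates = [source_quality, contradiction, uncertainty, structural]

decision, why = run_quality_gates(
    {"sources": ["study-1"], "counter_search_done": True, "confidence": 0.55},
    gates,
)
# confidence 0.55 < 0.7, so this example is blocked at Gate 3
```

Note the design point the section insists on: a failed gate produces a logged reason for improving the generator, never a ticket in a review queue.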
* ❌ **NOT "manually overridden"**

**Philosophy:** Fix the system that generated the bad output; don't manually fix individual outputs.

== 5. Sampling Audit System ==

**Purpose:** Improve the system through pattern analysis (NOT approve individual outputs)

=== 5.1 How Sampling Works ===

**Stratified Sampling Strategy:** Audits prioritize:

* **Risk tier** (Tier A: 50%, Tier B: 20%, Tier C: 5%)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content is audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

**NOT:** A review queue for approval
**IS:** Statistical sampling for quality monitoring

=== 5.2 Audit Process ===

1. **System selects** content for audit based on the sampling strategy
2. **Human auditor** reviews the AI-generated content against quality standards
3. **Auditor validates or identifies issues:**
   * Claim extraction accuracy
   * Scenario appropriateness
   * Evidence relevance and interpretation
   * Verdict reasoning
   * Contradiction search completeness
4. **Audit outcome recorded** (pass/fail plus detailed feedback)
5. **Failed audits trigger:**
   * Analysis of the failure pattern
   * System improvement tasks
   * Algorithm and prompt adjustments
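The stratified rates in 5.1 can be sketched as a selection probability. The tier base rates (50%/20%/5%) come from the text above; the low-confidence cutoff and the doubling factor are illustrative assumptions, since the spec only says "low confidence → higher sampling rate".

```python
import random

# Hypothetical sketch of stratified audit sampling: the base rate comes
# from the risk tier and is raised for low-confidence output.

BASE_RATES = {"A": 0.50, "B": 0.20, "C": 0.05}  # rates from Section 5.1

def audit_probability(tier, confidence):
    """Probability that a published item is selected for a sampling audit."""
    rate = BASE_RATES[tier]
    if confidence < 0.8:           # assumed low-confidence cutoff
        rate = min(1.0, rate * 2)  # assumed boost factor
    return rate

def select_for_audit(tier, confidence, rng=random.random):
    return rng() < audit_probability(tier, confidence)

# Under these assumptions a high-confidence Tier C item is audited ~5% of
# the time, while a low-confidence Tier A item is always audited (0.50 * 2).
```

The point of the sketch: selection is probabilistic and risk-weighted, not a queue that every item must clear.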
6. **Audit results feed back** into system improvement

**CRITICAL:** Auditors analyze **patterns**; they do not fix individual outputs.

=== 5.3 Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed-evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

**Philosophy:** "Improve the system, not the data"

=== 5.4 Audit Transparency ===

**Publicly Published:**

* Audit statistics (monthly)
* Accuracy rates by risk tier
* System improvements made
* Aggregate audit performance

**Enables:**

* Public accountability
* System trust
* Continuous-improvement visibility

== 6. Human Intervention Criteria ==

**From Organisation.Decision-Processes:**

**LEGITIMATE reasons to intervene:**

* ✅ AKEL explicitly flags an item for sampling audit
* ✅ System metrics show performance degradation
* ✅ A legal or safety issue requires immediate action
* ✅ User reports reveal a systematic bias pattern

**ILLEGITIMATE reasons** (system improvement needed instead):

* ❌ "I disagree with this verdict" → Improve the algorithm
* ❌ "This source should rank higher" → Improve the scoring rules
* ❌ "Manual quality gate before publication" → Defeats the purpose of automation
* ❌ "I know better than the algorithm" → Then improve the algorithm

**Philosophy:** If you disagree with an output, improve the system that generated it.

== 7. Architecture Overview ==

=== POC Architecture (POC1, POC2) ===

**Simple, Single-Call Approach:**

```
User submits article/claim
    ↓
Single AI API call
    ↓
Returns complete analysis
    ↓
Quality gates validate
    ↓
PASS → Publish (Mode 2)
FAIL → Block (Mode 1)
```

**Components in Single Call:**

1. Extract 3-5 factual claims
2. For each claim: verdict + confidence + risk tier + reasoning
3. Generate analysis summary
4. Generate article summary
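The single-call contract above can be sketched as a structural validator over the AI response. The field names (`claims`, `risk_tier`, `analysis_summary`, …) are assumptions about the response shape for illustration, not the actual POC code.

```python
# Hypothetical sketch: check that one single-call analysis result has the
# shape described above (3-5 claims, each with verdict/confidence/tier/
# reasoning, plus two summaries).

REQUIRED_CLAIM_FIELDS = {"text", "verdict", "confidence", "risk_tier", "reasoning"}

def validate_poc_response(resp):
    """Basic structural check on a single-call analysis result."""
    claims = resp.get("claims", [])
    if not 3 <= len(claims) <= 5:
        return False
    for claim in claims:
        if not REQUIRED_CLAIM_FIELDS <= claim.keys():
            return False
        if claim["risk_tier"] not in {"A", "B", "C"}:
            return False
    return all(key in resp for key in ("analysis_summary", "article_summary"))

resp = {
    "claims": [
        {"text": "...", "verdict": "supported", "confidence": 0.8,
         "risk_tier": "C", "reasoning": "..."}
    ] * 3,
    "analysis_summary": "...",
    "article_summary": "...",
}
```

A check like this would belong to the "basic quality checks" step: a malformed response is treated as a gate failure, not something a human repairs by hand.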
5. Run basic quality checks

**Processing Time:** 10-18 seconds

**Advantages:** Simple, fast POC development, proves AI capability

**Limitations:** No component reusability; all-or-nothing

=== Full System Architecture (Beta 0, Release 1.0) ===

**Multi-Component Pipeline:**

```
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
```

**Processing:**

* Parallel processing where possible
* Separate component calls
* Quality gates between phases
* Audit sampling selection
* Cross-node coordination (federated mode)

**Processing Time:** 10-30 seconds (full pipeline)

=== Evolution Path ===

* **POC1:** Single prompt → Prove concept
* **POC2:** Add scenario component → Test full pipeline
* **Beta 0:** Multi-component AKEL → Production architecture
* **Release 1.0:** Full AKEL + Federation → Scale

== 8. AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends and receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

* **Trusted nodes:** auto-merge embeddings and templates
* **Neutral nodes:** require additional verification
* **Untrusted nodes:** fully manual import

== 9. POC Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* ✅ Produces public AI-generated output (Mode 2)
* ✅ No human data sources required
* ✅ No human approval gate
* ✅ Clear "AI-Generated - POC/Demo" labeling
* ✅ All quality gates active (including contradiction search)
* ✅ Users understand this demonstrates AI reasoning capabilities
* ✅ Risk tier classification shown (for demo purposes)

**Philosophy Validation:** The POC proves the automation-first approach works.

== 10. Related Pages ==

* [[Automation>>FactHarbor.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>FactHarbor.Specification.Requirements.WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance.WebHome]]
* [[Decision Processes>>FactHarbor.Organisation.Decision-Processes.WebHome]]

**V0.9.70 CHANGES:**
- ❌ REMOVED: Section "Human Review Workflow (Mode 3 Publication)"
- ❌ REMOVED: All references to "Mode 3"
- ❌ REMOVED: "Human review required before publication"
- ✅ CLARIFIED: Two modes only (AI-Generated / Draft-Only)
- ✅ CLARIFIED: Quality gate failures → Block + improve system
- ✅ CLARIFIED: Sampling audits for improvement, NOT approval
- ✅ CLARIFIED: Risk tiers affect warnings/audits, NOT approval gates
- ✅ ENHANCED: Gate 2 (Contradiction Search) specification
- ✅ ADDED: Clear human intervention criteria
- ✅ ADDED: Detailed audit system explanation