Last modified by Robert Schaub on 2025/12/24 20:33
= AKEL — AI Knowledge Extraction Layer =

AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**.

AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below).

AKEL operates in two modes:

* **Single-node mode** (POC & Beta 0)
* **Federated multi-node mode** (Release 1.0+)

Human reviewers, experts, and moderators always retain final authority over content marked as "Human-Reviewed."

----

== Purpose and Role ==

AKEL transforms unstructured inputs into structured, publication-ready content.

Core responsibilities:

* Claim extraction from arbitrary text
* Claim classification (domain, type, evaluability, safety, **risk tier**)
* Scenario generation (definitions, boundaries, assumptions, methodology)
* Evidence summarization and metadata extraction
* **Contradiction detection and counter-evidence search**
* **Reservation and limitation identification**
* **Bubble detection** (echo chambers, conspiracy theories, isolated sources)
* Re-evaluation proposal generation
* Cross-node embedding exchange (Release 1.0+)

----

== Components ==

* **AKEL Orchestrator** – central coordinator
* **Claim Extractor**
* **Claim Classifier** (with risk tier assignment)
* **Scenario Generator**
* **Evidence Summarizer**
* **Contradiction Detector** (enhanced with counter-evidence search)
* **Quality Gate Validator**
* **Audit Sampling Scheduler**
* **Embedding Handler** (Release 1.0+)
* **Federation Sync Adapter** (Release 1.0+)

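To make the division of labor concrete, the sketch below shows one plausible way the AKEL Orchestrator could chain the components listed above. This is illustrative only: the class and method names are invented here and are not part of the FactHarbor codebase.

```python
# Hypothetical sketch of the AKEL Orchestrator wiring; every identifier
# below is illustrative, not a real FactHarbor API.

class AkelOrchestrator:
    """Coordinates the AKEL pipeline for a single input document."""

    def __init__(self, extractor, classifier, scenario_gen, summarizer,
                 contradiction_detector, quality_gate):
        self.extractor = extractor                      # Claim Extractor
        self.classifier = classifier                    # Claim Classifier
        self.scenario_gen = scenario_gen                # Scenario Generator
        self.summarizer = summarizer                    # Evidence Summarizer
        self.contradiction_detector = contradiction_detector
        self.quality_gate = quality_gate                # Quality Gate Validator

    def process(self, text):
        """Run each extracted claim through the downstream components."""
        results = []
        for claim in self.extractor(text):
            claim["risk_tier"] = self.classifier(claim)      # A, B, or C
            claim["scenario"] = self.scenario_gen(claim)
            claim["evidence"] = self.summarizer(claim)
            claim["contradictions"] = self.contradiction_detector(claim)
            claim["gates_passed"] = self.quality_gate(claim)
            results.append(claim)
        return results
```

Because each component is injected, single-node and federated deployments could reuse the same orchestration loop with different component implementations.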
----

== Inputs and Outputs ==

=== Inputs ===

* User-submitted claims or evidence
* Uploaded documents
* URLs or citations
* External LLM API (optional)
* Embeddings (from local or federated peers)

=== Outputs (publication mode varies by risk tier) ===

* ClaimVersion (draft or AI-generated)
* ScenarioVersion (draft or AI-generated)
* EvidenceVersion (summary + metadata, draft or AI-generated)
* VerdictVersion (draft, AI-generated, or human-reviewed)
* Contradiction alerts
* Reservation and limitation notices
* Re-evaluation proposals
* Updated embeddings

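The version outputs above share common provenance fields (AuthorType, review state, risk tier). A minimal sketch of those shared fields, assuming field names such as `review_status` that this page does not prescribe:

```python
from dataclasses import dataclass

# Illustrative provenance fields shared by AKEL output versions
# (ClaimVersion, ScenarioVersion, EvidenceVersion, VerdictVersion).
# The exact schema is not specified on this page; names are assumptions.

@dataclass
class AkelVersion:
    content: str
    author_type: str = "AI"            # AKEL outputs carry AuthorType = AI
    risk_tier: str = "C"               # A, B, or C
    review_status: str = "Pending"     # Pending / Approved
    contradiction_search_done: bool = False

    def publishable_as_ai_draft(self) -> bool:
        # Mode 2 requires a completed contradiction search and Tier B or C.
        return self.contradiction_search_done and self.risk_tier in ("B", "C")
```

A VerdictVersion approved by a reviewer would flip `review_status` to "Approved" while keeping `author_type` traceable to its AI origin.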
----

== Publication Modes ==

AKEL content is published under one of three modes:

=== Mode 1: Draft-Only (Never Public) ===

**Used for:**

* Failed quality gate checks
* Sensitive topics flagged for expert review
* Unclear scope or missing critical sources
* High reputational risk content

**Visibility:** Internal review queue only

=== Mode 2: Published as AI-Generated (No Prior Human Review) ===

**Requirements:**

* All automated quality gates passed (see below)
* Risk tier permits AI-draft publication (Tier B or C)
* Contradiction search completed successfully
* Clear labeling as "AI-Generated, Awaiting Human Review"

**Label shown to users:**
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```

**User actions:**

* Browse and read content
* Request human review (escalates to review queue)
* Flag for expert attention

=== Mode 3: Published as Human-Reviewed ===

**Requirements:**

* Validated by a human reviewer or domain expert
* All quality gates passed
* Visible "Human-Reviewed" mark with reviewer role and timestamp

**Label shown to users:**
```
[Human-Reviewed] This content has been validated by human reviewers.
Source: AI+Human | Review Status: Approved | Reviewed by: [Role] on [timestamp]
Risk Tier: [A/B/C] | Contradiction Search: Completed
```

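The two labels above differ only in their review state, so they could be rendered from content metadata by a single function. A sketch, assuming a hypothetical `render_label` helper (the label wording mirrors this page; the function itself is not part of any specified API):

```python
# Hypothetical renderer for the user-facing labels shown above.

def render_label(review_status, risk_tier,
                 reviewer_role=None, timestamp="[timestamp]"):
    """Return the Mode 2 or Mode 3 label for a piece of AKEL content."""
    if review_status == "Approved":
        # Mode 3: Human-Reviewed label
        return (
            "[Human-Reviewed] This content has been validated by human reviewers.\n"
            f"Source: AI+Human | Review Status: Approved | "
            f"Reviewed by: {reviewer_role} on {timestamp}\n"
            f"Risk Tier: {risk_tier} | Contradiction Search: Completed"
        )
    # Mode 2: AI-Generated label
    return (
        "[AI-Generated] This content was produced by AI and has not yet "
        "been human-reviewed.\n"
        f"Source: AI | Review Status: Pending | Risk Tier: {risk_tier}\n"
        f"Contradiction Search: Completed | Last Updated: {timestamp}"
    )
```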
----

== Risk Tiers ==

AKEL assigns a risk tier to all content to determine the appropriate review requirements:

=== Tier A — High Risk / High Impact ===

**Domains:** Medical, legal, elections, safety/security, major reputational harm

**Publication policy:**

* Human review REQUIRED before "Human-Reviewed" status
* AI-generated content MAY be published, but:
** Clearly flagged as AI-draft with prominent disclaimer
** May have limited visibility
** Auto-escalated to expert review queue
** User warnings displayed

**Audit rate:** Recommended 30-50% of published AI-drafts sampled in the first 6 months

=== Tier B — Medium Risk ===

**Domains:** Contested public policy, complex science, causality claims, significant financial impact

**Publication policy:**

* AI-drafts CAN be published immediately with clear labeling
* Sampling audits conducted (see Audit System below)
* High-engagement items auto-escalated to expert review
* Users can request human review

**Audit rate:** Recommended 10-20% of published AI-drafts sampled

=== Tier C — Low Risk ===

**Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus

**Publication policy:**

* AI-draft is the default publication mode
* Sampling audits sufficient
* Community flagging available
* Human review on request

**Audit rate:** Recommended 5-10% of published AI-drafts sampled

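The tier policies above amount to a lookup table plus a small decision rule combining tier, gate results, and review state. A sketch of that rule as data; the audit-rate ranges are the recommendations from this page, and all identifiers are illustrative:

```python
# Tier policy table from this page, expressed as data (names invented).
TIER_POLICY = {
    "A": {"auto_escalate": True,  "audit_rate": (0.30, 0.50)},
    "B": {"auto_escalate": False, "audit_rate": (0.10, 0.20)},
    "C": {"auto_escalate": False, "audit_rate": (0.05, 0.10)},
}

def publication_mode(tier, gates_passed, human_reviewed):
    """Map risk tier and review state to one of the three publication modes."""
    if not gates_passed:
        return "draft-only"            # Mode 1: any gate failure blocks publication
    if human_reviewed:
        return "human-reviewed"        # Mode 3
    # Mode 2: all tiers MAY publish AI-drafts, but Tier A content is
    # prominently flagged and auto-escalated to the expert review queue.
    return "ai-generated"

def auto_escalates(tier):
    return TIER_POLICY[tier]["auto_escalate"]
```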
----

== Quality Gates (Mandatory Before AI-Draft Publication) ==

All AI-generated content must pass these automated checks before Mode 2 publication:

=== Gate 1: Source Quality ===

* Primary sources identified and accessible
* Source reliability scored against whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated (where applicable)

=== Gate 2: Contradiction Search (MANDATORY) ===

**The system MUST actively search for:**

* **Counter-evidence** – Rebuttals, conflicting results, contradictory studies
* **Reservations** – Caveats, limitations, boundary conditions, applicability constraints
* **Alternative interpretations** – Different framings, definitions, contextual variations
* **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources

**Search coverage requirements:**

* Academic literature (BOTH supporting AND opposing views)
* Reputable media across diverse political/ideological perspectives
* Official contradictions (retractions, corrections, updates, amendments)
* Domain-specific skeptics, critics, and alternative expert opinions
* Cross-cultural and international perspectives

**Search must actively avoid algorithmic bubbles:**

* Deliberately seek opposing viewpoints
* Check for echo chamber patterns in source clusters
* Identify tribal or ideological source clustering
* Flag when search space appears artificially constrained
* Verify diversity of perspectives represented

**Outcomes:**

* **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode
* **Significant uncertainty detected** → Require uncertainty disclosure in verdict
* **Bubble indicators present** → Flag for expert review and human validation
* **Limited perspective diversity** → Expand search or flag for human review

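The four outcome rules above are independent of each other: several can fire at once, and only a clean result leaves content eligible for publication. A sketch of that routing, with invented flag and action names:

```python
# Illustrative routing of Gate 2 outcomes; flag names are invented.

def route_contradiction_result(strong_counter_evidence, high_uncertainty,
                               bubble_indicators, low_diversity):
    """Return the list of required actions; empty input means publishable."""
    actions = []
    if strong_counter_evidence:
        actions.append("escalate-tier-or-draft-only")
    if high_uncertainty:
        actions.append("require-uncertainty-disclosure")
    if bubble_indicators:
        actions.append("flag-for-expert-review")
    if low_diversity:
        actions.append("expand-search-or-human-review")
    return actions or ["publishable"]
```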
=== Gate 3: Uncertainty Quantification ===

* Confidence scores calculated for all claims and verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

=== Gate 4: Structural Integrity ===

* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**If any gate fails:**

* Content remains in draft-only mode
* Failure reason logged
* Human review required before publication
* Failure patterns analyzed for system improvement

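A minimal sketch of running the four gates in sequence, recording every failure rather than stopping at the first so that failure patterns can be analyzed. The gate checks themselves are placeholders; only the control flow is illustrated:

```python
# Sketch of the quality-gate pipeline; gate implementations are stubs.

def run_quality_gates(content, gates):
    """gates: ordered list of (name, check) pairs; check(content) -> bool.

    Returns the resulting publication state plus every failed gate name,
    so failure reasons can be logged and analyzed.
    """
    failures = [name for name, check in gates if not check(content)]
    if failures:
        return {"mode": "draft-only", "failed_gates": failures}
    return {"mode": "ai-draft-eligible", "failed_gates": []}
```

Collecting all failures in one pass (instead of short-circuiting) matches the requirement that failure patterns feed system improvement.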
----

== Audit System (Sampling-Based Quality Assurance) ==

Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:

=== Sampling Strategy ===

Audits prioritize:

* **Risk tier** (higher tiers get more frequent audits)
* **AI confidence score** (low confidence → higher sampling rate)
* **Traffic and engagement** (high-visibility content audited more)
* **Novelty** (new claim types, new domains, emerging topics)
* **Disagreement signals** (user flags, contradiction alerts, community reports)

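One way to combine these priorities is a single sampling score per item, with the audit budget spent on the highest-scoring content. The weights below are invented for illustration; only the shape of the calculation reflects this page:

```python
# Sketch of a stratified sampling priority score; weights are assumptions.

TIER_WEIGHT = {"A": 1.0, "B": 0.5, "C": 0.2}

def sampling_score(tier, ai_confidence, engagement, is_novel, flags):
    score = TIER_WEIGHT[tier]                 # higher tiers audited more
    score += 1.0 - ai_confidence              # low confidence -> more audits
    score += min(engagement / 1000.0, 1.0)    # high-visibility content
    if is_novel:
        score += 0.5                          # new claim types / domains
    score += 0.25 * flags                     # user flags, contradiction alerts
    return score

def select_for_audit(items, budget):
    """Pick the `budget` highest-priority items for human audit."""
    return sorted(items,
                  key=lambda it: sampling_score(**it["signals"]),
                  reverse=True)[:budget]
```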
=== Audit Process ===

1. System selects content for audit based on sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Auditor validates or corrects:

* Claim extraction accuracy
* Scenario appropriateness
* Evidence relevance and interpretation
* Verdict reasoning
* Contradiction search completeness

4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement

=== Feedback Loop (Continuous Improvement) ===

Audit outcomes systematically improve:

* **Query templates** – Refined based on missed evidence patterns
* **Retrieval source weights** – Adjusted for accuracy and reliability
* **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence
* **Model prompts and extraction rules** – Tuned for better claim extraction
* **Risk tier assignments** – Recalibrated based on error patterns
* **Bubble detection algorithms** – Improved to identify echo chambers

=== Audit Transparency ===

* Audit statistics published regularly
* Accuracy rates by risk tier tracked and reported
* System improvements documented
* Community can view aggregate audit performance

----

== Architecture Overview ==

{{include reference="Archive.FactHarbor V0\.9\.23 Lost Data.Specification.Diagrams.AKEL Architecture.WebHome"/}}

----

== AKEL and Federation ==

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

* Shares embeddings
* Exchanges canonicalized claim forms
* Exchanges scenario templates
* Sends + receives contradiction alerts
* Shares audit findings (with privacy controls)
* Never shares model weights
* Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

* Trusted nodes: auto-merge embeddings + templates
* Neutral nodes: require reviewer approval
* Untrusted nodes: fully manual import

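The trust levels above can be read as a dispatch table from a peer's trust level to an import action, with the "never shares model weights" invariant checked on every payload. A sketch under those assumptions; all names are illustrative:

```python
# Illustrative per-node trust policy for federated AKEL data.

def import_policy(trust_level):
    """Map a peer's trust level to the required import handling."""
    return {
        "trusted": "auto-merge",          # embeddings + templates merged automatically
        "neutral": "reviewer-approval",   # queued for reviewer approval
        "untrusted": "manual-import",     # fully manual import
    }[trust_level]

def handle_federated_payload(payload, trust_level):
    """Route an incoming payload according to local trust settings."""
    if "model_weights" in payload:
        # Model weights are never shared between nodes.
        raise ValueError("model weights must never cross node boundaries")
    return {"payload": payload, "action": import_policy(trust_level)}
```

Keeping the policy local to each node reflects the rule that federation never overrides local governance.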
----

== Human Review Workflow (Mode 3 Publication) ==

For content requiring human validation before "Human-Reviewed" status:

1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
2. Reviewers inspect content in the review queue
3. Reviewers validate that quality gates were correctly applied
4. Experts validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "Human-Reviewed" publication
6. Version numbers increment, full history preserved

**Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.

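Step 6 above implies append-only versioning: approval adds a new version entry rather than overwriting the old one. A minimal sketch of that behavior; the record fields are assumptions, not a specified schema:

```python
# Sketch of append-only versioning on review approval; names invented.

def approve_review(history, new_content, reviewer_role):
    """history: list of version dicts, oldest first. Returns a NEW history
    with an incremented version appended; earlier versions are untouched."""
    next_version = history[-1]["version"] + 1 if history else 1
    entry = {
        "version": next_version,
        "content": new_content,
        "review_status": "Approved",
        "reviewed_by": reviewer_role,
    }
    return history + [entry]   # full history preserved
```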
----

== POC v1 Behavior ==

The POC explicitly demonstrates AI-generated content publication:

* Produces public AI-generated output (Mode 2)
* No human data sources required
* No human approval gate
* Clear "AI-Generated - POC/Demo" labeling
* All quality gates active (including contradiction search)
* Users understand this demonstrates AI reasoning capabilities
* Risk tier classification shown (demo purposes)

----

== Related Pages ==

* [[Automation>>Archive.FactHarbor V0\.9\.18 copy.Specification.Automation.WebHome]]
* [[Requirements (Roles)>>Archive.FactHarbor V0\.9\.18 copy.Specification.Requirements.WebHome]]
* [[Workflows>>Archive.FactHarbor V0\.9\.18 copy.Specification.Workflows.WebHome]]
* [[Governance>>FactHarbor.Organisation.Governance]]