Wiki source code of AI Knowledge Extraction Layer (AKEL)
Last modified by Robert Schaub on 2025/12/24 20:33
| author | version | line-number | content |
|---|---|---|---|
| |
1.1 | 1 | = AKEL — AI Knowledge Extraction Layer = |
| 2 | |||
| |
5.1 | 3 | AKEL is FactHarbor's automated intelligence subsystem. |
| |
1.1 | 4 | Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — **without ever replacing human judgment**. |
| 5 | |||
| |
5.1 | 6 | AKEL outputs are marked with **AuthorType = AI** and published according to risk-based review policies (see Publication Modes below). |
| |
1.1 | 7 | |
| 8 | AKEL operates in two modes: | ||
| 9 | |||
| 10 | * **Single-node mode** (POC & Beta 0) | ||
| 11 | * **Federated multi-node mode** (Release 1.0+) | ||
| 12 | |||
| |
5.1 | 13 | Human reviewers, experts, and moderators always retain final authority over content marked as "Human-Reviewed." |
| |
1.1 | 14 | |
| |
4.1 | 15 | ---- |
| 16 | |||
| |
1.1 | 17 | == Purpose and Role == |
| 18 | |||
| |
5.1 | 19 | AKEL transforms unstructured inputs into structured, publication-ready content. |
| |
1.1 | 20 | |
| 21 | Core responsibilities: | ||
| 22 | |||
| 23 | * Claim extraction from arbitrary text | ||
| |
5.1 | 24 | * Claim classification (domain, type, evaluability, safety, **risk tier**) |
| |
1.1 | 25 | * Scenario generation (definitions, boundaries, assumptions, methodology) |
| 26 | * Evidence summarization and metadata extraction | ||
| |
5.1 | 27 | * **Contradiction detection and counter-evidence search** |
| 28 | * **Reservation and limitation identification** | ||
| 29 | * **Bubble detection** (echo chambers, conspiracy theories, isolated sources) | ||
| |
1.1 | 30 | * Re-evaluation proposal generation |
| 31 | * Cross-node embedding exchange (Release 1.0+) | ||
| 32 | |||
| |
4.1 | 33 | ---- |
| 34 | |||
| |
1.1 | 35 | == Components == |
| 36 | |||
| 37 | * **AKEL Orchestrator** – central coordinator | ||
| 38 | * **Claim Extractor** | ||
| |
5.1 | 39 | * **Claim Classifier** (with risk tier assignment) |
| |
1.1 | 40 | * **Scenario Generator** |
| 41 | * **Evidence Summarizer** | ||
| |
5.1 | 42 | * **Contradiction Detector** (enhanced with counter-evidence search) |
| 43 | * **Quality Gate Validator** | ||
| 44 | * **Audit Sampling Scheduler** | ||
| |
1.1 | 45 | * **Embedding Handler** (Release 1.0+) |
| 46 | * **Federation Sync Adapter** (Release 1.0+) | ||
| 47 | |||
| |
4.1 | 48 | ---- |
| 49 | |||
| |
1.1 | 50 | == Inputs and Outputs == |
| 51 | |||
| 52 | === Inputs === | ||
| |
6.4 | 53 | |
| |
4.1 | 54 | * User-submitted claims or evidence |
| 55 | * Uploaded documents | ||
| 56 | * URLs or citations | ||
| 57 | * External LLM API (optional) | ||
| |
1.1 | 58 | * Embeddings (from local or federated peers) |
| 59 | |||
| |
5.1 | 60 | === Outputs (publication mode varies by risk tier) === |
| |
6.4 | 61 | |
| |
5.1 | 62 | * ClaimVersion (draft or AI-generated) |
| 63 | * ScenarioVersion (draft or AI-generated) | ||
| 64 | * EvidenceVersion (summary + metadata, draft or AI-generated) | ||
| 65 | * VerdictVersion (draft, AI-generated, or human-reviewed) | ||
| |
4.1 | 66 | * Contradiction alerts |
| |
5.1 | 67 | * Reservation and limitation notices |
| |
4.1 | 68 | * Re-evaluation proposals |
| |
1.1 | 69 | * Updated embeddings |
| 70 | |||
| |
4.1 | 71 | ---- |
| 72 | |||
| |
5.1 | 73 | == Publication Modes == |
| 74 | |||
| 75 | AKEL content is published according to three modes: | ||
| 76 | |||
| 77 | === Mode 1: Draft-Only (Never Public) === | ||
| 78 | |||
| 79 | **Used for:** | ||
| |
6.4 | 80 | |
| |
5.1 | 81 | * Failed quality gate checks |
| 82 | * Sensitive topics flagged for expert review | ||
| 83 | * Unclear scope or missing critical sources | ||
| 84 | * High reputational risk content | ||
| 85 | |||
| 86 | **Visibility:** Internal review queue only | ||
| 87 | |||
| 88 | === Mode 2: Published as AI-Generated (No Prior Human Review) === | ||
| 89 | |||
| 90 | **Requirements:** | ||
| |
6.4 | 91 | |
| |
5.1 | 92 | * All automated quality gates passed (see below) |
| 93 | * Risk tier permits AI-draft publication (Tier B or C) | ||
| 94 | * Contradiction search completed successfully | ||
| 95 | * Clear labeling as "AI-Generated, Awaiting Human Review" | ||
| 96 | |||
| 97 | **Label shown to users:** | ||
| 98 | ``` | ||
| 99 | [AI-Generated] This content was produced by AI and has not yet been human-reviewed. | ||
| 100 | Source: AI | Review Status: Pending | Risk Tier: [B/C] | ||
| 101 | Contradiction Search: Completed | Last Updated: [timestamp] | ||
| 102 | ``` | ||
| 103 | |||
| 104 | **User actions:** | ||
| |
6.4 | 105 | |
| |
5.1 | 106 | * Browse and read content |
| 107 | * Request human review (escalates to review queue) | ||
| 108 | * Flag for expert attention | ||
| 109 | |||
| 110 | === Mode 3: Published as Human-Reviewed === | ||
| 111 | |||
| 112 | **Requirements:** | ||
| |
6.4 | 113 | |
| |
5.1 | 114 | * Validated by a human reviewer or domain expert |
| 115 | * All quality gates passed | ||
| 116 | * Visible "Human-Reviewed" mark with reviewer role and timestamp | ||
| 117 | |||
| 118 | **Label shown to users:** | ||
| 119 | ``` | ||
| 120 | [Human-Reviewed] This content has been validated by human reviewers. | ||
| 121 | Source: AI+Human | Review Status: Approved | Reviewed by: [Role] on [timestamp] | ||
| 122 | Risk Tier: [A/B/C] | Contradiction Search: Completed | ||
| 123 | ``` | ||
| 124 | |||
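The three modes above can be read as a small decision rule. The following is a minimal sketch, not FactHarbor's implementation: the names `Mode`, `Content`, and `select_mode` are hypothetical. It follows the Mode 2 requirement list literally (Tier B or C only), so it keeps un-reviewed Tier A content in draft-only mode; that is an assumption, since the Tier A policy also allows limited AI-draft publication.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DRAFT_ONLY = 1       # Mode 1: internal review queue only
    AI_GENERATED = 2     # Mode 2: published without prior human review
    HUMAN_REVIEWED = 3   # Mode 3: published after human validation


@dataclass
class Content:
    risk_tier: str                  # "A", "B", or "C"
    gates_passed: bool              # all automated quality gates passed
    contradiction_search_done: bool
    human_validated: bool = False


def select_mode(c: Content) -> Mode:
    """Pick a publication mode from the requirements listed for each mode."""
    if not c.gates_passed:
        return Mode.DRAFT_ONLY      # failed quality gate checks are never public
    if c.human_validated:
        return Mode.HUMAN_REVIEWED
    if c.risk_tier in ("B", "C") and c.contradiction_search_done:
        return Mode.AI_GENERATED
    return Mode.DRAFT_ONLY          # assumption: everything else waits for review
```

For example, `select_mode(Content("C", True, True))` yields `Mode.AI_GENERATED`, while the same content with a failed gate stays in `Mode.DRAFT_ONLY`.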
| 125 | ---- | ||
| 126 | |||
| 127 | == Risk Tiers == | ||
| 128 | |||
| 129 | AKEL assigns risk tiers to all content to determine appropriate review requirements: | ||
| 130 | |||
| 131 | === Tier A — High Risk / High Impact === | ||
| 132 | |||
| 133 | **Domains:** Medical, legal, elections, safety/security, major reputational harm | ||
| 134 | |||
| 135 | **Publication policy:** | ||
| |
6.4 | 136 | |
| |
5.1 | 137 | * Human review REQUIRED before "Human-Reviewed" status |
| 138 | * AI-generated content MAY be published but: | ||
| |
6.4 | 139 | ** Clearly flagged as AI-draft with prominent disclaimer |
| 140 | ** May have limited visibility | ||
| 141 | ** Auto-escalated to expert review queue | ||
| 142 | ** User warnings displayed | ||
| |
5.1 | 143 | |
| 144 | **Recommended audit rate:** 30-50% of published AI-drafts sampled in the first 6 months | ||
| 145 | |||
| 146 | === Tier B — Medium Risk === | ||
| 147 | |||
| 148 | **Domains:** Contested public policy, complex science, causality claims, significant financial impact | ||
| 149 | |||
| 150 | **Publication policy:** | ||
| |
6.4 | 151 | |
| |
5.1 | 152 | * AI-draft CAN be published immediately with clear labeling |
| 153 | * Sampling audits conducted (see Audit System below) | ||
| 154 | * High-engagement items auto-escalated to expert review | ||
| 155 | * Users can request human review | ||
| 156 | |||
| 157 | **Recommended audit rate:** 10-20% of published AI-drafts sampled | ||
| 158 | |||
| 159 | === Tier C — Low Risk === | ||
| 160 | |||
| 161 | **Domains:** Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus | ||
| 162 | |||
| 163 | **Publication policy:** | ||
| |
6.4 | 164 | |
| |
5.1 | 165 | * AI-draft is the default publication mode |
| 166 | * Sampling audits sufficient | ||
| 167 | * Community flagging available | ||
| 168 | * Human review on request | ||
| 169 | |||
| 170 | **Recommended audit rate:** 5-10% of published AI-drafts sampled | ||
| 171 | |||
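To make the recommended audit rates concrete, the sketch below turns them into a minimum and maximum number of items to sample for a given count of published AI-drafts. The table `AUDIT_RATE_PERCENT` and the function `audit_sample_bounds` are hypothetical names. Rounding up is an assumption chosen so that small batches still get at least one audit, and the 6-month window attached to the Tier A rate is not modelled.

```python
# Recommended audit-rate ranges per tier, as whole percentages (low, high).
AUDIT_RATE_PERCENT = {"A": (30, 50), "B": (10, 20), "C": (5, 10)}


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def audit_sample_bounds(published_count: int, tier: str) -> tuple[int, int]:
    """Return (min, max) number of published AI-drafts to audit for a tier."""
    low, high = AUDIT_RATE_PERCENT[tier]
    return (_ceil_div(published_count * low, 100),
            _ceil_div(published_count * high, 100))
```

With 200 published Tier B drafts, this gives between 20 and 40 audits; with 10 Tier A drafts, between 3 and 5.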
| 172 | ---- | ||
| 173 | |||
| 174 | == Quality Gates (Mandatory Before AI-Draft Publication) == | ||
| 175 | |||
| 176 | All AI-generated content must pass these automated checks before Mode 2 publication: | ||
| 177 | |||
| 178 | === Gate 1: Source Quality === | ||
| |
6.4 | 179 | |
| |
5.1 | 180 | * Primary sources identified and accessible |
| 181 | * Source reliability scored against whitelist | ||
| 182 | * Citation completeness verified | ||
| 183 | * Publication dates checked | ||
| 184 | * Author credentials validated (where applicable) | ||
| 185 | |||
| 186 | === Gate 2: Contradiction Search (MANDATORY) === | ||
| 187 | |||
| 188 | **The system MUST actively search for:** | ||
| 189 | |||
| 190 | * **Counter-evidence** – Rebuttals, conflicting results, contradictory studies | ||
| 191 | * **Reservations** – Caveats, limitations, boundary conditions, applicability constraints | ||
| 192 | * **Alternative interpretations** – Different framings, definitions, contextual variations | ||
| 193 | * **Bubble detection** – Conspiracy theories, echo chambers, ideologically isolated sources | ||
| 194 | |||
| 195 | **Search coverage requirements:** | ||
| |
6.4 | 196 | |
| |
5.1 | 197 | * Academic literature (BOTH supporting AND opposing views) |
| 198 | * Reputable media across diverse political/ideological perspectives | ||
| 199 | * Official contradictions (retractions, corrections, updates, amendments) | ||
| 200 | * Domain-specific skeptics, critics, and alternative expert opinions | ||
| 201 | * Cross-cultural and international perspectives | ||
| 202 | |||
| 203 | **Search must actively avoid algorithmic bubbles:** | ||
| |
6.4 | 204 | |
| |
5.1 | 205 | * Deliberately seek opposing viewpoints |
| 206 | * Check for echo chamber patterns in source clusters | ||
| 207 | * Identify tribal or ideological source clustering | ||
| 208 | * Flag when search space appears artificially constrained | ||
| 209 | * Verify diversity of perspectives represented | ||
| 210 | |||
| 211 | **Outcomes:** | ||
| |
6.4 | 212 | |
| |
5.1 | 213 | * **Strong counter-evidence found** → Auto-escalate to Tier B or draft-only mode |
| 214 | * **Significant uncertainty detected** → Require uncertainty disclosure in verdict | ||
| 215 | * **Bubble indicators present** → Flag for expert review and human validation | ||
| 216 | * **Limited perspective diversity** → Expand search or flag for human review | ||
| 217 | |||
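The Gate 2 outcomes amount to a lookup from search finding to follow-up action. A minimal sketch, assuming the contradiction search reports its findings as a set of flags; the `Finding` enum and `actions_for` helper are hypothetical names, and the action strings simply restate the list above.

```python
from enum import Enum, auto


class Finding(Enum):
    STRONG_COUNTER_EVIDENCE = auto()
    SIGNIFICANT_UNCERTAINTY = auto()
    BUBBLE_INDICATORS = auto()
    LIMITED_PERSPECTIVE_DIVERSITY = auto()


# Follow-up action for each finding, as listed under "Outcomes".
ACTIONS = {
    Finding.STRONG_COUNTER_EVIDENCE: "auto-escalate to Tier B or draft-only mode",
    Finding.SIGNIFICANT_UNCERTAINTY: "require uncertainty disclosure in verdict",
    Finding.BUBBLE_INDICATORS: "flag for expert review and human validation",
    Finding.LIMITED_PERSPECTIVE_DIVERSITY: "expand search or flag for human review",
}


def actions_for(findings: set) -> list:
    """Return the follow-up actions for a set of findings, in the order listed."""
    return [ACTIONS[f] for f in sorted(findings, key=lambda f: f.value)]
```

An empty set of findings produces no follow-up actions, which corresponds to the search completing without escalation.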
| 218 | === Gate 3: Uncertainty Quantification === | ||
| |
6.4 | 219 | |
| |
5.1 | 220 | * Confidence scores calculated for all claims and verdicts |
| 221 | * Limitations explicitly stated | ||
| 222 | * Data gaps identified and disclosed | ||
| 223 | * Strength of evidence assessed | ||
| 224 | * Alternative scenarios considered | ||
| 225 | |||
| 226 | === Gate 4: Structural Integrity === | ||
| |
6.4 | 227 | |
| |
5.1 | 228 | * No hallucinations detected (fact-checking against sources) |
| 229 | * Logic chain valid and traceable | ||
| 230 | * References accessible and verifiable | ||
| 231 | * No circular reasoning | ||
| 232 | * Premises clearly stated | ||
| 233 | |||
| 234 | **If any gate fails:** | ||
| |
6.4 | 235 | |
| |
5.1 | 236 | * Content remains in draft-only mode |
| 237 | * Failure reason logged | ||
| 238 | * Human review required before publication | ||
| 239 | * Failure patterns analyzed for system improvement | ||
| 240 | |||
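The failure handling above suggests running every gate rather than stopping at the first failure, so that each failure reason can be logged. The sketch below illustrates that idea only. `GateReport`, `run_gates`, and the checks in `EXAMPLE_GATES` are hypothetical placeholders; the real gates involve far more than these one-line predicates.

```python
from typing import Callable, Dict, List, NamedTuple


class GateReport(NamedTuple):
    publishable: bool     # True only if every gate passed
    failures: List[str]   # names of failed gates, for logging


def run_gates(content: dict, gates: Dict[str, Callable[[dict], bool]]) -> GateReport:
    """Run all gates (no short-circuit) and collect every failure."""
    failures = [name for name, check in gates.items() if not check(content)]
    return GateReport(publishable=not failures, failures=failures)


# Placeholder checks keyed by gate name; the field names are assumptions.
EXAMPLE_GATES = {
    "source_quality": lambda c: bool(c.get("primary_sources")),
    "contradiction_search": lambda c: c.get("contradiction_search_done", False),
    "uncertainty_quantification": lambda c: "confidence" in c,
    "structural_integrity": lambda c: not c.get("unsupported_statements"),
}
```

Content with a non-empty `failures` list would remain in draft-only mode until a human reviews it.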
| 241 | ---- | ||
| 242 | |||
| 243 | == Audit System (Sampling-Based Quality Assurance) == | ||
| 244 | |||
| 245 | Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits: | ||
| 246 | |||
| 247 | === Sampling Strategy === | ||
| 248 | |||
| 249 | Audits prioritize: | ||
| |
6.4 | 250 | |
| |
5.1 | 251 | * **Risk tier** (higher tiers get more frequent audits) |
| 252 | * **AI confidence score** (low confidence → higher sampling rate) | ||
| 253 | * **Traffic and engagement** (high-visibility content audited more) | ||
| 254 | * **Novelty** (new claim types, new domains, emerging topics) | ||
| 255 | * **Disagreement signals** (user flags, contradiction alerts, community reports) | ||
| 256 | |||
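One way to combine the prioritization factors above is a per-item audit probability that starts from the tier's base rate and rises with each risk signal. This is a sketch under stated assumptions: the additive weights are illustrative and not taken from the specification, the base rates are the lower bounds of the recommended ranges, and `Item`, `audit_probability`, and `select_for_audit` are hypothetical names.

```python
import random
from dataclasses import dataclass

# Lower bounds of the recommended audit ranges, used as base rates.
BASE_RATE = {"A": 0.30, "B": 0.10, "C": 0.05}


@dataclass
class Item:
    tier: str
    confidence: float            # AI confidence score in [0, 1]
    high_traffic: bool = False
    novel: bool = False
    flagged: bool = False        # user flags, contradiction alerts, reports


def audit_probability(item: Item) -> float:
    """Base rate by tier, raised by each signal; weights are illustrative."""
    p = BASE_RATE[item.tier]
    p += 0.20 * (1.0 - item.confidence)   # low confidence -> higher sampling
    p += 0.10 if item.high_traffic else 0.0
    p += 0.10 if item.novel else 0.0
    p += 0.30 if item.flagged else 0.0
    return min(p, 1.0)


def select_for_audit(items: list, rng: random.Random) -> list:
    """Independently sample each item with its own audit probability."""
    return [i for i in items if rng.random() < audit_probability(i)]
```

Passing a seeded `random.Random` instance makes a sampling run reproducible, which may help when audit selections themselves need to be reviewed.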
| 257 | === Audit Process === | ||
| 258 | |||
| 259 | 1. System selects content for audit based on sampling strategy | ||
| 260 | 2. Human auditor reviews AI-generated content against quality standards | ||
| 261 | 3. Auditor validates or corrects: | ||
| |
6.4 | 262 | |
| 263 | * Claim extraction accuracy | ||
| 264 | * Scenario appropriateness | ||
| 265 | * Evidence relevance and interpretation | ||
| 266 | * Verdict reasoning | ||
| 267 | * Contradiction search completeness | ||
| |
5.1 | 268 | 4. Audit outcome recorded (pass/fail + detailed feedback) |
| 269 | 5. Failed audits trigger immediate content review | ||
| 270 | 6. Audit results feed back into system improvement | ||
| 271 | |||
| 272 | === Feedback Loop (Continuous Improvement) === | ||
| 273 | |||
| 274 | Audit outcomes systematically improve: | ||
| |
6.4 | 275 | |
| |
5.1 | 276 | * **Query templates** – Refined based on missed evidence patterns |
| 277 | * **Retrieval source weights** – Adjusted for accuracy and reliability | ||
| 278 | * **Contradiction detection heuristics** – Enhanced to catch missed counter-evidence | ||
| 279 | * **Model prompts and extraction rules** – Tuned for better claim extraction | ||
| 280 | * **Risk tier assignments** – Recalibrated based on error patterns | ||
| 281 | * **Bubble detection algorithms** – Improved to identify echo chambers | ||
| 282 | |||
| 283 | === Audit Transparency === | ||
| 284 | |||
| 285 | * Audit statistics published regularly | ||
| 286 | * Accuracy rates by risk tier tracked and reported | ||
| 287 | * System improvements documented | ||
| 288 | * Community can view aggregate audit performance | ||
| 289 | |||
| 290 | ---- | ||
| 291 | |||
| |
1.1 | 292 | == Architecture Overview == |
| 293 | |||
| |
6.15 | 294 | {{include reference="Archive.FactHarbor V0\.9\.23 Lost Data.Specification.Diagrams.AKEL Architecture.WebHome"/}} |
| |
1.1 | 295 | |
| |
4.1 | 296 | ---- |
| 297 | |||
| |
1.1 | 298 | == AKEL and Federation == |
| 299 | |||
| 300 | In Release 1.0+, AKEL participates in cross-node knowledge alignment: | ||
| 301 | |||
| |
4.1 | 302 | * Shares embeddings |
| 303 | * Exchanges canonicalized claim forms | ||
| 304 | * Exchanges scenario templates | ||
| 305 | * Sends and receives contradiction alerts | ||
| |
5.1 | 306 | * Shares audit findings (with privacy controls) |
| |
4.1 | 307 | * Never shares model weights |
| |
1.1 | 308 | * Never overrides local governance |
| 309 | |||
| 310 | Nodes may choose trust levels for AKEL-related data: | ||
| 311 | |||
| |
4.1 | 312 | * Trusted nodes: auto-merge embeddings + templates |
| 313 | * Neutral nodes: require reviewer approval | ||
| |
1.1 | 314 | * Untrusted nodes: fully manual import |
| 315 | |||
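The trust levels above map directly to an import policy for incoming embeddings and templates. A minimal sketch with hypothetical names (`Trust`, `ImportPolicy`, `policy_for`); treating an unknown node as untrusted is an assumption, not something the specification states.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"


class ImportPolicy(Enum):
    AUTO_MERGE = "auto-merge embeddings + templates"
    REVIEWER_APPROVAL = "require reviewer approval"
    MANUAL_IMPORT = "fully manual import"


POLICY = {
    Trust.TRUSTED: ImportPolicy.AUTO_MERGE,
    Trust.NEUTRAL: ImportPolicy.REVIEWER_APPROVAL,
    Trust.UNTRUSTED: ImportPolicy.MANUAL_IMPORT,
}


def policy_for(node_id: str, trust_registry: dict) -> ImportPolicy:
    """Look up a peer's trust level; unknown peers default to untrusted."""
    trust = trust_registry.get(node_id, Trust.UNTRUSTED)
    return POLICY[trust]
```

Because the registry is local to each node, this keeps the decision under local governance, in line with the rule that federation never overrides it.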
| |
4.1 | 316 | ---- |
| 317 | |||
| |
5.1 | 318 | == Human Review Workflow (Mode 3 Publication) == |
| |
1.1 | 319 | |
| |
5.1 | 320 | For content requiring human validation before "Human-Reviewed" status: |
| |
1.1 | 321 | |
| |
5.1 | 322 | 1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1) |
| 323 | 2. Reviewers inspect content in review queue | ||
| 324 | 3. Reviewers validate that quality gates were correctly applied | ||
| 325 | 4. Experts validate high-risk (Tier A) or domain-specific outputs | ||
| 326 | 5. Moderators finalize "Human-Reviewed" publication | ||
| 327 | 6. Version numbers increment, full history preserved | ||
| |
4.1 | 328 | |
| |
5.1 | 329 | **Note:** Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues. |
| 330 | |||
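The review workflow can be sketched as a checklist that depends on the risk tier, followed by a version bump when a moderator finalizes. The names `ReviewItem`, `required_steps`, and `finalize` are hypothetical, and storing only the superseded version numbers is a simplification of "full history preserved".

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    tier: str                        # "A", "B", or "C"
    domain_specific: bool = False
    version: int = 1
    history: list = field(default_factory=list)   # superseded version numbers


def required_steps(item: ReviewItem) -> list:
    """Steps that must be completed before 'Human-Reviewed' publication."""
    steps = ["reviewer_inspection", "gate_validation"]
    if item.tier == "A" or item.domain_specific:
        steps.append("expert_validation")
    steps.append("moderator_finalization")
    return steps


def finalize(item: ReviewItem, completed: list) -> None:
    """Increment the version once every required step is done."""
    missing = [s for s in required_steps(item) if s not in completed]
    if missing:
        raise ValueError(f"cannot finalize, missing steps: {missing}")
    item.history.append(item.version)   # keep the previous version on record
    item.version += 1
```

A Tier C item skips the expert step, while a Tier A item cannot be finalized until `expert_validation` appears in the completed list.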
| |
4.1 | 331 | ---- |
| |
5.1 | 332 | |
| 333 | == POC v1 Behavior == | ||
| 334 | |||
| 335 | The POC explicitly demonstrates AI-generated content publication: | ||
| 336 | |||
| 337 | * Produces public AI-generated output (Mode 2) | ||
| 338 | * No human data sources required | ||
| 339 | * No human approval gate | ||
| 340 | * Clear "AI-Generated - POC/Demo" labeling | ||
| 341 | * All quality gates active (including contradiction search) | ||
| 342 | * Users understand this demonstrates AI reasoning capabilities | ||
| 343 | * Risk tier classification shown (demo purposes) | ||
| 344 | |||
| 345 | ---- | ||
| 346 | |||
| 347 | == Related Pages == | ||
| 348 | |||
| |
6.12 | 349 | * [[Automation>>Archive.FactHarbor V0\.9\.18 copy.Specification.Automation.WebHome]] |
| |
6.13 | 350 | * [[Requirements (Roles)>>Archive.FactHarbor V0\.9\.18 copy.Specification.Requirements.WebHome]] |
| |
6.14 | 351 | * [[Workflows>>Archive.FactHarbor V0\.9\.18 copy.Specification.Workflows.WebHome]] |
| |
5.1 | 352 | * [[Governance>>FactHarbor.Organisation.Governance]] |