AI Knowledge Extraction Layer (AKEL)

Version 1.1 by Robert Schaub on 2025/12/16 21:42

AKEL is FactHarbor's automated intelligence subsystem.  
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — without ever replacing human judgment.

AKEL outputs are marked with AuthorType = AI and published according to risk-based review policies (see Publication Modes below).

AKEL operates in two modes:

  • Single-node mode (POC & Beta 0)
  • Federated multi-node mode (Release 1.0+)

Human reviewers, experts, and moderators always retain final authority over content marked as "Human-Reviewed."

1. Purpose and Role

AKEL transforms unstructured inputs into structured, publication-ready content.

Core responsibilities:

  • Claim extraction from arbitrary text
  • Claim classification (domain, type, evaluability, safety, risk tier)
  • Scenario generation (definitions, boundaries, assumptions, methodology)
  • Evidence summarization and metadata extraction
  • Contradiction detection and counter-evidence search
  • Reservation and limitation identification
  • Bubble detection (echo chambers, conspiracy theories, isolated sources)
  • Re-evaluation proposal generation
  • Cross-node embedding exchange (Release 1.0+)

2. Components

  • AKEL Orchestrator – central coordinator
  • Claim Extractor
  • Claim Classifier (with risk tier assignment)
  • Scenario Generator
  • Evidence Summarizer
  • Contradiction Detector (enhanced with counter-evidence search)
  • Quality Gate Validator
  • Audit Sampling Scheduler
  • Embedding Handler (Release 1.0+)
  • Federation Sync Adapter (Release 1.0+)

3. Inputs and Outputs

3.1 Inputs

  • User-submitted claims or evidence  
  • Uploaded documents  
  • URLs or citations  
  • External LLM API (optional)  
  • Embeddings (from local or federated peers)

3.2 Outputs (publication mode varies by risk tier)

  • ClaimVersion (draft or AI-generated)  
  • ScenarioVersion (draft or AI-generated)  
  • EvidenceVersion (summary + metadata, draft or AI-generated)  
  • VerdictVersion (draft, AI-generated, or human-reviewed)  
  • Contradiction alerts  
  • Reservation and limitation notices
  • Re-evaluation proposals  
  • Updated embeddings

4. Publication Modes

AKEL content is published according to three modes:

4.1 Mode 1: Draft-Only (Never Public)

Used for:

  • Failed quality gate checks
  • Sensitive topics flagged for expert review
  • Unclear scope or missing critical sources
  • High reputational risk content

Visibility: Internal review queue only

4.2 Mode 2: Published as AI-Generated (No Prior Human Review)

Requirements:

  • All automated quality gates passed (see below)
  • Risk tier permits AI-draft publication (Tier B or C)
  • Contradiction search completed successfully
  • Clear labeling as "AI-Generated, Awaiting Human Review"

Label shown to users:
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```

User actions:

  • Browse and read content
  • Request human review (escalates to review queue)
  • Flag for expert attention

4.3 Mode 3: Published as Human-Reviewed

Requirements:

  • Human reviewer or domain expert validated
  • All quality gates passed
  • Visible "Human-Reviewed" mark with reviewer role and timestamp

Label shown to users:
```
[Human-Reviewed] This content has been validated by human reviewers.
Source: AI+Human | Review Status: Approved | Reviewed by: [Role] on [timestamp]
Risk Tier: [A/B/C] | Contradiction Search: Completed
```
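The two user-facing labels above can be rendered from a small content-metadata record. The following sketch is illustrative only; the type and field names (`ContentLabel`, `reviewedBy`, and so on) are assumptions, not the actual FactHarbor schema.

```typescript
// Publication-mode labeling sketch (names and fields are illustrative).
type PublicationMode = "draft_only" | "ai_generated" | "human_reviewed";
type RiskTier = "A" | "B" | "C";

interface ContentLabel {
  mode: PublicationMode;
  riskTier: RiskTier;
  contradictionSearchDone: boolean;
  reviewedBy?: string; // reviewer role, only set for human-reviewed content
  lastUpdated: string; // ISO timestamp
}

// Render the user-facing label block for Mode 2 and Mode 3 content.
function renderLabel(label: ContentLabel): string {
  if (label.mode === "human_reviewed") {
    return [
      "[Human-Reviewed] This content has been validated by human reviewers.",
      `Source: AI+Human | Review Status: Approved | Reviewed by: ${label.reviewedBy} on ${label.lastUpdated}`,
      `Risk Tier: ${label.riskTier} | Contradiction Search: ${label.contradictionSearchDone ? "Completed" : "Pending"}`,
    ].join("\n");
  }
  return [
    "[AI-Generated] This content was produced by AI and has not yet been human-reviewed.",
    `Source: AI | Review Status: Pending | Risk Tier: ${label.riskTier}`,
    `Contradiction Search: ${label.contradictionSearchDone ? "Completed" : "Pending"} | Last Updated: ${label.lastUpdated}`,
  ].join("\n");
}
```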

5. Risk Tiers

AKEL assigns risk tiers to all content to determine appropriate review requirements:

5.1 Tier A — High Risk / High Impact

Domains: Medical, legal, elections, safety/security, major reputational harm

Publication policy:

  • Human review REQUIRED before "Human-Reviewed" status
  • AI-generated content MAY be published but:
    • Clearly flagged as AI-draft with prominent disclaimer
    • May have limited visibility
    • Auto-escalated to expert review queue
    • User warnings displayed

Audit rate (recommended): 30-50% of published AI-drafts sampled in the first 6 months

5.2 Tier B — Medium Risk

Domains: Contested public policy, complex science, causality claims, significant financial impact

Publication policy:

  • AI-draft CAN publish immediately with clear labeling
  • Sampling audits conducted (see Audit System below)
  • High-engagement items auto-escalated to expert review
  • Users can request human review

Audit rate (recommended): 10-20% of published AI-drafts sampled

5.3 Tier C — Low Risk

Domains: Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus

Publication policy:

  • AI-draft default publication mode
  • Sampling audits sufficient
  • Community flagging available
  • Human review on request

Audit rate (recommended): 5-10% of published AI-drafts sampled
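The three tier policies above can be summarized as a configuration table. This is a minimal sketch assuming illustrative names (`TierPolicy`, `TIER_POLICIES`); the sampling rates are the document's recommended ranges expressed as `[min, max]` fractions.

```typescript
// Illustrative risk-tier policy table matching the recommendations above.
type RiskTier = "A" | "B" | "C";

interface TierPolicy {
  aiDraftPublishable: boolean;       // may Mode 2 content go live without prior review?
  autoEscalateToExperts: boolean;    // Tier A drafts always enter the expert review queue
  auditSampleRate: [number, number]; // recommended fraction of published AI-drafts audited
}

const TIER_POLICIES: Record<RiskTier, TierPolicy> = {
  A: { aiDraftPublishable: true,  autoEscalateToExperts: true,  auditSampleRate: [0.30, 0.50] },
  B: { aiDraftPublishable: true,  autoEscalateToExperts: false, auditSampleRate: [0.10, 0.20] },
  C: { aiDraftPublishable: true,  autoEscalateToExperts: false, auditSampleRate: [0.05, 0.10] },
};
```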

6. Quality Gates (Mandatory Before AI-Draft Publication)

All AI-generated content must pass these automated checks before Mode 2 publication:

6.1 Gate 1: Source Quality

  • Primary sources identified and accessible
  • Source reliability scored against whitelist
  • Citation completeness verified
  • Publication dates checked
  • Author credentials validated (where applicable)

6.2 Gate 2: Contradiction Search (MANDATORY)

The system MUST actively search for:

  • Counter-evidence – Rebuttals, conflicting results, contradictory studies
  • Reservations – Caveats, limitations, boundary conditions, applicability constraints
  • Alternative interpretations – Different framings, definitions, contextual variations
  • Bubble detection – Conspiracy theories, echo chambers, ideologically isolated sources

Search coverage requirements:

  • Academic literature (BOTH supporting AND opposing views)
  • Reputable media across diverse political/ideological perspectives
  • Official contradictions (retractions, corrections, updates, amendments)
  • Domain-specific skeptics, critics, and alternative expert opinions
  • Cross-cultural and international perspectives

Search must actively avoid algorithmic bubbles:

  • Deliberately seek opposing viewpoints
  • Check for echo chamber patterns in source clusters
  • Identify tribal or ideological source clustering
  • Flag when search space appears artificially constrained
  • Verify diversity of perspectives represented

Outcomes:

  • Strong counter-evidence found → Auto-escalate to Tier B or draft-only mode
  • Significant uncertainty detected → Require uncertainty disclosure in verdict
  • Bubble indicators present → Flag for expert review and human validation
  • Limited perspective diversity → Expand search or flag for human review
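The outcome rules above map contradiction-search findings to gate actions. The sketch below is a hypothetical encoding: the signal names, the 0.5 diversity threshold, and the action identifiers are assumptions for illustration, not the real AKEL API.

```typescript
// Hypothetical mapping from contradiction-search findings to gate outcomes.
interface ContradictionFindings {
  strongCounterEvidence: boolean;
  significantUncertainty: boolean;
  bubbleIndicators: boolean;
  perspectiveDiversity: number; // 0..1, fraction of viewpoint clusters covered (assumed metric)
}

type GateAction =
  | "publish_ok"
  | "escalate_or_draft_only"          // strong counter-evidence found
  | "require_uncertainty_disclosure"  // significant uncertainty detected
  | "flag_for_expert_review"          // bubble indicators present
  | "expand_search";                  // limited perspective diversity

function contradictionGateActions(f: ContradictionFindings): GateAction[] {
  const actions: GateAction[] = [];
  if (f.strongCounterEvidence) actions.push("escalate_or_draft_only");
  if (f.significantUncertainty) actions.push("require_uncertainty_disclosure");
  if (f.bubbleIndicators) actions.push("flag_for_expert_review");
  if (f.perspectiveDiversity < 0.5) actions.push("expand_search");
  return actions.length > 0 ? actions : ["publish_ok"];
}
```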

6.3 Gate 3: Uncertainty Quantification

  • Confidence scores calculated for all claims and verdicts
  • Limitations explicitly stated
  • Data gaps identified and disclosed
  • Strength of evidence assessed
  • Alternative scenarios considered

6.4 Gate 4: Structural Integrity

  • No hallucinations detected (fact-checking against sources)
  • Logic chain valid and traceable
  • References accessible and verifiable
  • No circular reasoning
  • Premises clearly stated

If any gate fails:

  • Content remains in draft-only mode
  • Failure reason logged
  • Human review required before publication
  • Failure patterns analyzed for system improvement
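The gate-then-fail-closed behavior can be sketched as a small runner: content is publishable only if every gate passes, and each failure carries a logged reason. Gate implementations are stubs here; only the control flow reflects the section above.

```typescript
// Minimal sketch of running the four quality gates; fails closed on any gate failure.
type GateName = "source_quality" | "contradiction_search" | "uncertainty" | "structural_integrity";

interface GateResult { gate: GateName; passed: boolean; reason?: string }

// Each gate is a predicate over the candidate content (typed loosely for the sketch).
type Gate = (content: unknown) => GateResult;

// If any gate fails, content stays draft-only and the failure reasons are returned for logging.
function runQualityGates(content: unknown, gates: Gate[]): { publishable: boolean; failures: GateResult[] } {
  const results = gates.map((g) => g(content));
  const failures = results.filter((r) => !r.passed);
  return { publishable: failures.length === 0, failures };
}
```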

7. Audit System (Sampling-Based Quality Assurance)

Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:

7.1 Sampling Strategy

Audits prioritize:

  • Risk tier (higher tiers get more frequent audits)
  • AI confidence score (low confidence → higher sampling rate)
  • Traffic and engagement (high-visibility content audited more)
  • Novelty (new claim types, new domains, emerging topics)
  • Disagreement signals (user flags, contradiction alerts, community reports)
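The prioritization signals above can be combined into a single sampling score. The weights below are purely illustrative, not calibrated FactHarbor values; the point is that tier, low confidence, engagement, novelty, and disagreement all raise audit priority.

```typescript
// Hypothetical stratified-sampling score combining the five signals listed above.
interface AuditSignals {
  riskTier: "A" | "B" | "C";
  aiConfidence: number;      // 0..1; low confidence raises sampling priority
  engagement: number;        // 0..1 normalized traffic/visibility
  novelty: number;           // 0..1; new claim types and domains score high
  disagreementFlags: number; // count of user flags / contradiction alerts / reports
}

function auditPriority(s: AuditSignals): number {
  const tierWeight = { A: 1.0, B: 0.6, C: 0.3 }[s.riskTier];
  return (
    tierWeight +
    (1 - s.aiConfidence) * 0.8 +
    s.engagement * 0.5 +
    s.novelty * 0.4 +
    Math.min(s.disagreementFlags, 5) * 0.2 // cap so flag floods don't dominate
  );
}
```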

7.2 Audit Process

  1. System selects content for audit based on sampling strategy
  2. Human auditor reviews AI-generated content against quality standards
  3. Auditor validates or corrects:
     • Claim extraction accuracy
     • Scenario appropriateness
     • Evidence relevance and interpretation
     • Verdict reasoning
     • Contradiction search completeness
  4. Audit outcome recorded (pass/fail + detailed feedback)
  5. Failed audits trigger immediate content review
  6. Audit results feed back into system improvement

7.3 Feedback Loop (Continuous Improvement)

Audit outcomes systematically improve:

  • Query templates – Refined based on missed evidence patterns
  • Retrieval source weights – Adjusted for accuracy and reliability
  • Contradiction detection heuristics – Enhanced to catch missed counter-evidence
  • Model prompts and extraction rules – Tuned for better claim extraction
  • Risk tier assignments – Recalibrated based on error patterns
  • Bubble detection algorithms – Improved to identify echo chambers

7.4 Audit Transparency

  • Audit statistics published regularly
  • Accuracy rates by risk tier tracked and reported
  • System improvements documented
  • Community can view aggregate audit performance

8. Architecture Overview

Note: The current implementation is a triple-path pipeline architecture. Three pipeline variants share common modules for AnalysisContext detection, aggregation, claim processing, evidence filtering, verdict corrections, and source reliability.

Updated 2026-02-08 per documentation audit report.

Triple-Path Pipeline Architecture


```mermaid
graph TB
    subgraph Input[User Input]
        URL[URL Input]
        TEXT[Text Input]
    end

    subgraph Shared[Shared Modules]
        CONTEXTS[analysis-contexts.ts Context Detection]
        AGG[aggregation.ts Verdict Aggregation]
        CLAIM_D[claim-decomposition.ts]
        EF[evidence-filter.ts ~330 lines]
        QG[quality-gates.ts ~410 lines]
        SR[source-reliability.ts ~620 lines]
        VC[verdict-corrections.ts ~310 lines]
        TS[truth-scale.ts ~280 lines]
        BU[budgets.ts ~250 lines]
    end

    subgraph Dispatch[Pipeline Dispatch]
        SELECT{Select Pipeline}
    end

    subgraph Pipelines[Pipeline Implementations]
        ORCH[Orchestrated Pipeline]
        CANON[Monolithic Canonical]
        DYN[Monolithic Dynamic]
    end

    subgraph LLM[LLM Layer]
        PROVIDER[AI SDK Provider]
    end

    subgraph Output[Result]
        RESULT[AnalysisResult JSON]
        REPORT[Markdown Report]
    end

    URL --> SELECT
    TEXT --> SELECT
    SELECT -->|orchestrated| ORCH
    SELECT -->|monolithic_canonical| CANON
    SELECT -->|monolithic_dynamic| DYN
    CONTEXTS --> ORCH
    CONTEXTS --> CANON
    AGG --> ORCH
    AGG --> CANON
    CLAIM_D --> ORCH
    CLAIM_D --> CANON
    EF --> ORCH
    QG --> ORCH
    SR --> ORCH
    SR --> CANON
    SR --> DYN
    VC --> ORCH
    TS --> CANON
    TS --> DYN
    BU --> ORCH
    BU --> CANON
    BU --> DYN
    ORCH --> PROVIDER
    CANON --> PROVIDER
    DYN --> PROVIDER
    ORCH --> RESULT
    CANON --> RESULT
    DYN --> RESULT
    RESULT --> REPORT
```

Pipeline Variants

| Variant | File | Lines | Approach | Output Schema |
|---|---|---|---|---|
| Orchestrated | orchestrated.ts | 13,300 | Multi-step workflow with explicit stages | Canonical (structured) |
| Monolithic Canonical | monolithic-canonical.ts | 1,500 | Single LLM tool-loop call | Canonical (structured) |
| Monolithic Dynamic | monolithic-dynamic.ts | 735 | Single LLM tool-loop call | Dynamic (flexible) |
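The dispatch step in the diagram selects one of these three variants by name. The sketch below is a minimal illustration; the variant identifiers match the diagram's edge labels, but the `Pipeline` signature and `dispatch` function are assumptions, not the actual code in the repository.

```typescript
// Sketch of pipeline dispatch: select a variant by name and run it.
type PipelineVariant = "orchestrated" | "monolithic_canonical" | "monolithic_dynamic";

interface AnalysisInput { kind: "url" | "text"; value: string }
interface AnalysisResult { variant: PipelineVariant; input: AnalysisInput }

type Pipeline = (input: AnalysisInput) => AnalysisResult;

// Look up the requested variant; fail loudly on an unknown name at runtime.
function dispatch(variant: PipelineVariant, pipelines: Record<PipelineVariant, Pipeline>): Pipeline {
  const pipeline = pipelines[variant];
  if (!pipeline) throw new Error(`unknown pipeline variant: ${variant}`);
  return pipeline;
}
```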

Shared Modules

| Module | Lines | Used By | Purpose |
|---|---|---|---|
| analysis-contexts.ts | | Orch, Canon | Heuristic context pre-detection before LLM |
| aggregation.ts | | Orch, Canon | Verdict weighting, contestation validation |
| claim-decomposition.ts | | Orch, Canon | Claim text parsing and normalization |
| evidence-filter.ts | 330 | Orch | Probative value filtering, false positive rate calculation |
| quality-gates.ts | 410 | Orch | Gate 1 (claim validation) and Gate 4 (verdict confidence) |
| source-reliability.ts | 620 | Orch, Canon, Dyn | LLM-based source reliability evaluation with cache |
| verdict-corrections.ts | 310 | Orch | Post-hoc verdict direction mismatch corrections |
| truth-scale.ts | 280 | Canon, Dyn | Percentage-to-verdict label mapping |
| budgets.ts | 250 | Orch, Canon, Dyn | Token/cost budget tracking and enforcement |

Orchestrated Pipeline Steps

  1. Understand – Detect input type, extract claims, identify dependencies
  2. Research (iterative) – Generate queries, fetch sources, extract evidence
  3. Verdict Generation – Generate claim and article verdicts
  4. Summary – Build two-panel summary
  5. Report – Generate markdown report
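The five steps above can be sketched as an async sequence. The step bodies here are placeholders (the real orchestrated.ts stages are far richer); only the ordering and the data handed between stages reflect the list above.

```typescript
// Skeleton of the orchestrated pipeline's five stages; step internals are stubbed.
interface Claim { text: string }
interface Evidence { claim: Claim; sources: string[] }
interface Verdict { claim: Claim; label: string }

async function runOrchestrated(input: string): Promise<string> {
  // 1. Understand: detect input type, extract claims, identify dependencies
  const claims: Claim[] = [{ text: input }];
  // 2. Research (iterative): generate queries, fetch sources, extract evidence
  const evidence: Evidence[] = claims.map((c) => ({ claim: c, sources: [] }));
  // 3. Verdict generation: one verdict per claim, plus the article-level verdict
  const verdicts: Verdict[] = evidence.map((e) => ({ claim: e.claim, label: "unverified" }));
  // 4. Summary: build the two-panel summary
  const summary = `claims: ${claims.length}, verdicts: ${verdicts.length}`;
  // 5. Report: render the markdown report
  return `# Report\n\n${summary}`;
}
```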

Detailed Pipeline Diagrams

For internal implementation details of each pipeline variant, see the dedicated pipeline diagram pages.

9. AKEL and Federation

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

  • Shares embeddings  
  • Exchanges canonicalized claim forms  
  • Exchanges scenario templates  
  • Sends + receives contradiction alerts  
  • Shares audit findings (with privacy controls)
  • Never shares model weights  
  • Never overrides local governance

Nodes may choose trust levels for AKEL-related data:

  • Trusted nodes: auto-merge embeddings + templates  
  • Neutral nodes: require reviewer approval  
  • Untrusted nodes: fully manual import
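The three trust levels map directly to import behavior. A minimal sketch, assuming illustrative identifiers (`TrustLevel`, `importPolicy`):

```typescript
// Per-node trust handling for AKEL federation data (names are illustrative).
type TrustLevel = "trusted" | "neutral" | "untrusted";
type ImportAction = "auto_merge" | "queue_for_reviewer" | "manual_import";

function importPolicy(trust: TrustLevel): ImportAction {
  switch (trust) {
    case "trusted": return "auto_merge";         // embeddings + templates merge automatically
    case "neutral": return "queue_for_reviewer"; // reviewer approval required
    case "untrusted": return "manual_import";    // fully manual import
  }
}
```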

10. Human Review Workflow (Mode 3 Publication)

For content requiring human validation before "Human-Reviewed" status:

  1. AKEL generates content and publishes it as an AI-draft (Mode 2) or keeps it as a draft (Mode 1)
  2. Reviewers inspect content in the review queue
  3. Reviewers validate that quality gates were correctly applied
  4. Experts validate high-risk (Tier A) or domain-specific outputs
  5. Moderators finalize "Human-Reviewed" publication
  6. Version numbers increment; full history is preserved

Note: Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.

11. POC v1 Behavior

The POC explicitly demonstrates AI-generated content publication:

  • Produces public AI-generated output (Mode 2)
  • No human data sources required
  • No human approval gate
  • Clear "AI-Generated - POC/Demo" labeling
  • All quality gates active (including contradiction search)
  • Users understand this demonstrates AI reasoning capabilities
  • Risk tier classification shown (demo purposes)

12. Related Pages