AI Knowledge Extraction Layer (AKEL)

Version 2.1 by Robert Schaub on 2025/12/11 21:35


AKEL is FactHarbor’s automated intelligence subsystem.
Its purpose is to reduce human workload, improve consistency, and make knowledge processing scalable, without ever replacing human judgment.

All AKEL outputs are marked with AuthorType = AI and require human approval before publication.

AKEL operates in two modes:

Single-node mode (POC & Beta 0)
Federated multi-node mode (Release 1.0+)

Human reviewers, experts, and moderators always retain final authority.

Purpose and Role

AKEL transforms unstructured inputs into structured, review-ready drafts.

Core responsibilities:

Claim extraction from arbitrary text
Claim classification (domain, type, evaluability, safety)
Scenario generation (definitions, boundaries, assumptions, methodology)
Evidence summarization and metadata extraction
Contradiction detection
Re-evaluation proposal generation
Cross-node embedding exchange (Release 1.0+)

Components

AKEL Orchestrator – central coordinator
Claim Extractor
Claim Classifier
Scenario Generator
Evidence Summarizer
Contradiction Detector
Embedding Handler (Release 1.0+)
Federation Sync Adapter (Release 1.0+)
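The component list reads as a central coordinator that delegates to single-purpose stages and never publishes anything itself. The TypeScript sketch below shows that shape under stated assumptions: every interface and method name (`ClaimExtractor.extract`, `ClaimClassifier.classify`, `ContradictionDetector.detect`, `AkelOrchestrator.run`) is hypothetical, and the stub implementations are toy heuristics, not AKEL's actual extraction or detection logic.

```typescript
// Hypothetical component interfaces; names are illustrative, not AKEL's real API.
type AuthorType = "AI" | "Human";

interface DraftClaim {
  text: string;
  domain?: string;
  authorType: AuthorType;
}

interface ClaimExtractor {
  extract(input: string): DraftClaim[];
}

interface ClaimClassifier {
  classify(claim: DraftClaim): DraftClaim;
}

interface ContradictionDetector {
  // Returns pairs of claim indices that appear to contradict each other.
  detect(claims: DraftClaim[]): Array<[number, number]>;
}

// The orchestrator only sequences components; it has no publish step.
class AkelOrchestrator {
  constructor(
    private extractor: ClaimExtractor,
    private classifier: ClaimClassifier,
    private detector: ContradictionDetector,
  ) {}

  run(input: string): { drafts: DraftClaim[]; contradictions: Array<[number, number]> } {
    const drafts = this.extractor.extract(input).map((c) => this.classifier.classify(c));
    return { drafts, contradictions: this.detector.detect(drafts) };
  }
}

// Toy extractor: one AI-authored draft claim per sentence.
const splitExtractor: ClaimExtractor = {
  extract: (input) =>
    input
      .split(".")
      .map((s) => s.trim())
      .filter((s) => s.length > 0)
      .map((text) => ({ text, authorType: "AI" as const })),
};

// Toy classifier: claims containing digits are tagged "quantitative".
const naiveClassifier: ClaimClassifier = {
  classify: (c) => ({ ...c, domain: /\d/.test(c.text) ? "quantitative" : "general" }),
};

// Toy detector: flags a pair when one claim equals the other with " not " inserted.
const negationDetector: ContradictionDetector = {
  detect: (claims) => {
    const pairs: Array<[number, number]> = [];
    claims.forEach((a, i) =>
      claims.forEach((b, j) => {
        const negated = a.text.replace(" not ", " ") === b.text || b.text.replace(" not ", " ") === a.text;
        if (i < j && negated) pairs.push([i, j]);
      }),
    );
    return pairs;
  },
};

const akel = new AkelOrchestrator(splitExtractor, naiveClassifier, negationDetector);
const out = akel.run("The bridge is safe. The bridge is not safe. It opened in 1932.");
```

Injecting the components through the constructor keeps each one replaceable, which matters once the Embedding Handler and Federation Sync Adapter join in Release 1.0+.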

Inputs and Outputs

Inputs

User-submitted claims or evidence
Uploaded documents
URLs or citations
External LLM API (optional)
Embeddings (from local or federated peers)

Outputs (all require human approval)

ClaimVersion (draft)
ScenarioVersion (draft)
EvidenceVersion (summary + metadata draft)
VerdictVersion (draft; internal only)
Contradiction alerts
Re-evaluation proposals
Updated embeddings
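One way to encode "all outputs require human approval" is a discriminated union of output kinds, each wrapped in a draft that carries `authorType: "AI"` and starts with no approver, plus a guard that refuses publication until a human approves. This is a sketch: the payload fields, `approvedBy`, `makeDraft`, and `canPublish` are hypothetical, and treating VerdictVersion drafts as never publishable is an interpretation of "internal only".

```typescript
// Hypothetical payload shapes for the draft outputs listed above.
type AkelOutput =
  | { kind: "ClaimVersion"; text: string }
  | { kind: "ScenarioVersion"; claimText: string; assumptions: string[] }
  | { kind: "EvidenceVersion"; summary: string; sourceUrl?: string }
  | { kind: "VerdictVersion"; label: string }
  | { kind: "ContradictionAlert"; claimTexts: [string, string] }
  | { kind: "ReevaluationProposal"; reason: string }
  | { kind: "EmbeddingUpdate"; vector: number[] };

// Every AKEL output is wrapped as an AI-authored draft with no approver yet.
interface Draft {
  authorType: "AI";
  payload: AkelOutput;
  approvedBy: string | null; // set only when a human approves the draft
}

function makeDraft(payload: AkelOutput): Draft {
  return { authorType: "AI", payload, approvedBy: null };
}

// Publication needs a human approver. Assumption: "internal only" means a
// VerdictVersion draft is never published through this path, even if approved.
function canPublish(draft: Draft): boolean {
  if (draft.approvedBy === null) return false;
  return draft.payload.kind !== "VerdictVersion";
}

// Usage
const claim = makeDraft({ kind: "ClaimVersion", text: "Example claim" });
const beforeReview = canPublish(claim); // false: no human approver yet
const afterReview = canPublish({ ...claim, approvedBy: "reviewer-1" }); // true
```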

Architecture Overview

Current implementation: a triple-path pipeline architecture. Three pipeline variants share common modules for AnalysisContext detection, verdict aggregation, claim decomposition, evidence filtering, quality gates, verdict corrections, source reliability, truth-scale mapping, and budget tracking; not every variant uses every module.

Updated 2026-02-08 per documentation audit report.

Triple-Path Pipeline Architecture


```mermaid
graph TB
    subgraph Input[User Input]
        URL[URL Input]
        TEXT[Text Input]
    end

    subgraph Shared[Shared Modules]
        CONTEXTS[analysis-contexts.ts Context Detection]
        AGG[aggregation.ts Verdict Aggregation]
        CLAIM_D[claim-decomposition.ts]
        EF[evidence-filter.ts ~330 lines]
        QG[quality-gates.ts ~410 lines]
        SR[source-reliability.ts ~620 lines]
        VC[verdict-corrections.ts ~310 lines]
        TS[truth-scale.ts ~280 lines]
        BU[budgets.ts ~250 lines]
    end

    subgraph Dispatch[Pipeline Dispatch]
        SELECT{Select Pipeline}
    end

    subgraph Pipelines[Pipeline Implementations]
        ORCH[Orchestrated Pipeline]
        CANON[Monolithic Canonical]
        DYN[Monolithic Dynamic]
    end

    subgraph LLM[LLM Layer]
        PROVIDER[AI SDK Provider]
    end

    subgraph Output[Result]
        RESULT[AnalysisResult JSON]
        REPORT[Markdown Report]
    end

    URL --> SELECT
    TEXT --> SELECT
    SELECT -->|orchestrated| ORCH
    SELECT -->|monolithic_canonical| CANON
    SELECT -->|monolithic_dynamic| DYN
    CONTEXTS --> ORCH
    CONTEXTS --> CANON
    AGG --> ORCH
    AGG --> CANON
    CLAIM_D --> ORCH
    CLAIM_D --> CANON
    EF --> ORCH
    QG --> ORCH
    SR --> ORCH
    SR --> CANON
    SR --> DYN
    VC --> ORCH
    TS --> CANON
    TS --> DYN
    BU --> ORCH
    BU --> CANON
    BU --> DYN
    ORCH --> PROVIDER
    CANON --> PROVIDER
    DYN --> PROVIDER
    ORCH --> RESULT
    CANON --> RESULT
    DYN --> RESULT
    RESULT --> REPORT
```

Pipeline Variants

Variant	File	Lines	Approach	Output Schema
Orchestrated	orchestrated.ts	13,300	Multi-step workflow with explicit stages	Canonical (structured)
Monolithic Canonical	monolithic-canonical.ts	1,500	Single LLM tool-loop call	Canonical (structured)
Monolithic Dynamic	monolithic-dynamic.ts	735	Single LLM tool-loop call	Dynamic (flexible)
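The Select Pipeline node routes both URL and text input to one of three variants by a string key (`orchestrated`, `monolithic_canonical`, `monolithic_dynamic`). A minimal sketch of that dispatch follows; the runner functions and the `AnalysisResult` shape are placeholders, not the real exports of `orchestrated.ts`, `monolithic-canonical.ts`, or `monolithic-dynamic.ts`.

```typescript
type PipelineVariant = "orchestrated" | "monolithic_canonical" | "monolithic_dynamic";

// Placeholder result shape; the real AnalysisResult JSON is far richer.
interface AnalysisResult {
  pipeline: PipelineVariant;
  schema: "canonical" | "dynamic";
  input: string;
}

type PipelineRunner = (input: string) => AnalysisResult;

// Hypothetical runners standing in for the three pipeline files.
const runners: Record<PipelineVariant, PipelineRunner> = {
  orchestrated: (input) => ({ pipeline: "orchestrated", schema: "canonical", input }),
  monolithic_canonical: (input) => ({ pipeline: "monolithic_canonical", schema: "canonical", input }),
  monolithic_dynamic: (input) => ({ pipeline: "monolithic_dynamic", schema: "dynamic", input }),
};

// Own-property check, so inherited keys such as "toString" are rejected.
function isPipelineVariant(value: string): value is PipelineVariant {
  return Object.prototype.hasOwnProperty.call(runners, value);
}

// Single dispatch point shared by URL and text input.
function dispatch(variant: string, input: string): AnalysisResult {
  if (!isPipelineVariant(variant)) {
    throw new Error(`Unknown pipeline variant: ${variant}`);
  }
  return runners[variant](input);
}
```

Keying a `Record` by the variant union means the compiler reports a missing runner if a fourth variant is ever added to the type.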

Shared Modules

Module	Lines	Used By	Purpose
analysis-contexts.ts		Orch, Canon	Heuristic context pre-detection before LLM
aggregation.ts		Orch, Canon	Verdict weighting, contestation validation
claim-decomposition.ts		Orch, Canon	Claim text parsing and normalization
evidence-filter.ts	330	Orch	Probative value filtering, false positive rate calculation
quality-gates.ts	410	Orch	Gate 1 (claim validation) and Gate 4 (verdict confidence)
source-reliability.ts	620	Orch, Canon, Dyn	LLM-based source reliability evaluation with cache
verdict-corrections.ts	310	Orch	Post-hoc verdict direction mismatch corrections
truth-scale.ts	280	Canon, Dyn	Percentage-to-verdict label mapping
budgets.ts	250	Orch, Canon, Dyn	Token/cost budget tracking and enforcement
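`budgets.ts` is described only as token/cost budget tracking and enforcement. The sketch below shows one common shape for that idea: a tracker that accumulates usage and refuses any call that would exceed either limit. The `BudgetTracker` class, its limits, and the check-before-spend policy are assumptions for illustration, not the module's actual design.

```typescript
interface BudgetLimits {
  maxTokens: number;
  maxCostUsd: number;
}

// Hypothetical tracker: accumulates token and cost usage against fixed limits.
class BudgetTracker {
  private tokensUsed = 0;
  private costUsd = 0;

  constructor(private readonly limits: BudgetLimits) {}

  // True if a call with this estimated usage fits in the remaining budget.
  canSpend(tokens: number, costUsd: number): boolean {
    return (
      this.tokensUsed + tokens <= this.limits.maxTokens &&
      this.costUsd + costUsd <= this.limits.maxCostUsd
    );
  }

  // Records usage; throws instead of silently overrunning either limit.
  spend(tokens: number, costUsd: number): void {
    if (!this.canSpend(tokens, costUsd)) {
      const total = (this.costUsd + costUsd).toFixed(4);
      throw new Error(`Budget exceeded: ${this.tokensUsed + tokens} tokens, $${total}`);
    }
    this.tokensUsed += tokens;
    this.costUsd += costUsd;
  }

  remaining(): { tokens: number; costUsd: number } {
    return {
      tokens: this.limits.maxTokens - this.tokensUsed,
      costUsd: this.limits.maxCostUsd - this.costUsd,
    };
  }
}
```

Checking before spending means a rejected call leaves the recorded totals unchanged, so a pipeline can catch the error and fall back to a cheaper step.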

Orchestrated Pipeline Steps

1. Understand - Detect input type, extract claims, identify dependencies
2. Research (iterative) - Generate queries, fetch sources, extract evidence
3. Verdict Generation - Generate claim and article verdicts
4. Summary - Build two-panel summary
5. Report - Generate markdown report
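The five stages run in sequence, and only Research repeats. The sketch below captures that control flow under stated assumptions: the stage functions, the `Researcher` search hook, the `maxRounds` cap, and the rule "stop when a round adds no new evidence" are all hypothetical, and the verdict and summary logic are toy stand-ins for the real LLM-driven steps.

```typescript
interface Evidence {
  claim: string;
  source: string;
}

interface ClaimVerdict {
  claim: string;
  supported: boolean;
}

interface Summary {
  supported: number;
  total: number;
}

// Hypothetical search hook: returns evidence found for a claim in a given round.
type Researcher = (claim: string, round: number) => Evidence[];

// 1. Understand: toy claim extraction that splits on semicolons.
function understand(input: string): string[] {
  return input.split(";").map((s) => s.trim()).filter((s) => s.length > 0);
}

// 2. Research: repeat rounds until one adds no new evidence or the cap is hit.
function research(claims: string[], search: Researcher, maxRounds = 3): Evidence[] {
  const seenKeys: string[] = [];
  const evidence: Evidence[] = [];
  for (let round = 0; round < maxRounds; round++) {
    let added = 0;
    for (const claim of claims) {
      for (const e of search(claim, round)) {
        const key = `${e.claim}|${e.source}`;
        if (seenKeys.indexOf(key) === -1) {
          seenKeys.push(key);
          evidence.push(e);
          added++;
        }
      }
    }
    if (added === 0) break;
  }
  return evidence;
}

// 3. Verdict generation: toy rule, "supported" means any evidence was found.
function generateVerdicts(claims: string[], evidence: Evidence[]): ClaimVerdict[] {
  return claims.map((claim) => ({
    claim,
    supported: evidence.some((e) => e.claim === claim),
  }));
}

// 4. Summary: stand-in counts; the real two-panel summary is richer.
function summarize(verdicts: ClaimVerdict[]): Summary {
  return { supported: verdicts.filter((v) => v.supported).length, total: verdicts.length };
}

// 5. Report: render a small markdown report.
function report(verdicts: ClaimVerdict[], summary: Summary): string {
  const lines = verdicts.map((v) => `- ${v.claim}: ${v.supported ? "evidence found" : "no evidence"}`);
  return ["# Analysis report", `${summary.supported}/${summary.total} claims with evidence`, ...lines].join("\n");
}

function runOrchestrated(input: string, search: Researcher): string {
  const claims = understand(input);
  const evidence = research(claims, search);
  const verdicts = generateVerdicts(claims, evidence);
  return report(verdicts, summarize(verdicts));
}
```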

Detailed Pipeline Diagrams

For internal implementation details of each pipeline variant:

Orchestrated Pipeline Internal - 7-step staged workflow (13,300 lines)
Monolithic Canonical Internal - Single-context canonical output (1,500 lines)
Monolithic Dynamic Internal - Flexible experimental output (735 lines)

AKEL and Federation

In Release 1.0+, AKEL participates in cross-node knowledge alignment:

Shares embeddings
Exchanges canonicalized claim forms
Exchanges scenario templates
Sends and receives contradiction alerts
Never shares model weights
Never overrides local governance
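The sharing rules above amount to an outbound allowlist: four kinds of AKEL data may cross node boundaries, and anything else, model weights included, stays local. A sketch of such a filter follows; the `FederationMessage` shape and the `kind` strings are hypothetical.

```typescript
// Kinds of AKEL data a node may share with peers (per the rules above).
const SHAREABLE_KINDS: ReadonlyArray<string> = [
  "embedding",
  "canonical_claim",
  "scenario_template",
  "contradiction_alert",
];

interface FederationMessage {
  kind: string;
  body: unknown;
}

// Splits an outbound batch into messages to send and messages to keep local.
// Model weights, or any kind not on the allowlist, are never sent.
function partitionOutbound(messages: FederationMessage[]): {
  send: FederationMessage[];
  dropped: FederationMessage[];
} {
  const send: FederationMessage[] = [];
  const dropped: FederationMessage[] = [];
  for (const m of messages) {
    if (SHAREABLE_KINDS.indexOf(m.kind) !== -1) {
      send.push(m);
    } else {
      dropped.push(m);
    }
  }
  return { send, dropped };
}
```

An allowlist rather than a blocklist means a new, unanticipated data kind is withheld by default until someone decides it is safe to share.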

Each node may assign a trust level to its peers, which determines how AKEL-related data from those peers is imported:

Trusted nodes: auto-merge embeddings and scenario templates
Neutral nodes: require reviewer approval
Untrusted nodes: fully manual import
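The three trust levels map onto an import decision for each incoming item. The sketch below encodes that mapping; `TrustLevel`, `ImportAction`, and `importAction` are hypothetical names, and because the list only says trusted nodes auto-merge embeddings and templates, it assumes other data kinds from trusted nodes fall back to reviewer approval.

```typescript
type TrustLevel = "trusted" | "neutral" | "untrusted";
type ImportAction = "auto_merge" | "reviewer_approval" | "manual_import";
type IncomingKind = "embedding" | "scenario_template" | "canonical_claim" | "contradiction_alert";

// Decides how an item received from a peer is imported locally.
// The receiving node's own trust setting is the only policy input.
function importAction(peerTrust: TrustLevel, kind: IncomingKind): ImportAction {
  switch (peerTrust) {
    case "trusted":
      // Only embeddings and templates auto-merge; other kinds are assumed to need review.
      return kind === "embedding" || kind === "scenario_template" ? "auto_merge" : "reviewer_approval";
    case "neutral":
      return "reviewer_approval";
    case "untrusted":
      return "manual_import";
    default: {
      // Exhaustiveness check: fails to compile if a new trust level is added.
      const unreachable: never = peerTrust;
      throw new Error(`Unknown trust level: ${unreachable}`);
    }
  }
}
```

Nothing a peer sends can change the outcome, which is consistent with AKEL never overriding local governance.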

Human Approval Workflow

1. AKEL generates draft outputs (AuthorType = AI)
2. Reviewers inspect and approve/moderate the drafts
3. Experts validate high-risk or domain-specific outputs
4. Moderators finalize publication
5. Version numbers increment, history preserved

No AKEL output is ever published automatically.
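The approval workflow behaves like a small state machine in which only human roles can move a draft forward. The sketch below is one way to model it; the state names, the `highRisk` flag that routes items through expert validation, and incrementing the version on publication are assumptions drawn from the numbered list, not FactHarbor's actual code. Note that `Role` has no AI member, so no transition exists that AKEL itself could trigger.

```typescript
type Role = "reviewer" | "expert" | "moderator";
type State = "ai_draft" | "reviewed" | "expert_validated" | "published";

interface Item {
  state: State;
  highRisk: boolean;
  version: number;
  history: Array<{ state: State; version: number; by: string }>;
}

// Which human role may move an item out of each state, and where it goes.
function nextState(item: Item, role: Role): State {
  if (item.state === "ai_draft" && role === "reviewer") return "reviewed";
  if (item.state === "reviewed" && item.highRisk && role === "expert") return "expert_validated";
  const readyForModerator =
    (item.state === "reviewed" && !item.highRisk) || item.state === "expert_validated";
  if (readyForModerator && role === "moderator") return "published";
  throw new Error(`${role} cannot advance an item in state ${item.state}`);
}

// Applies a human action. The previous state is appended to history, never
// overwritten; the version increments on publication (an assumption).
function advance(item: Item, role: Role, by: string): Item {
  const state = nextState(item, role);
  return {
    ...item,
    state,
    version: state === "published" ? item.version + 1 : item.version,
    history: [...item.history, { state: item.state, version: item.version, by }],
  };
}
```

Returning a new `Item` instead of mutating the old one keeps every earlier state available for audit.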