AKEL — AI Knowledge Extraction Layer
AKEL is FactHarbor's automated intelligence subsystem.
Its purpose is to reduce human workload, enhance consistency, and enable scalable knowledge processing — without ever replacing human judgment.
AKEL outputs are marked with AuthorType = AI and published according to risk-based review policies (see Publication Modes below).
AKEL operates in two deployment modes:
- Single-node mode (POC & Beta 0)
- Federated multi-node mode (Release 1.0+)
1. Purpose and Role
AKEL transforms unstructured inputs into structured, publication-ready content.
Core responsibilities:
- Claim extraction from arbitrary text
- Claim classification (domain, type, evaluability, safety, risk tier)
- Scenario generation (definitions, boundaries, assumptions, methodology)
- Evidence summarization and metadata extraction
- Contradiction detection and counter-evidence search
- Reservation and limitation identification
- Bubble detection (echo chambers, conspiracy theories, isolated sources)
- Re-evaluation proposal generation
- Cross-node embedding exchange (Release 1.0+)
2. Components
- AKEL Orchestrator – central coordinator
- Claim Extractor
- Claim Classifier (with risk tier assignment)
- Scenario Generator
- Evidence Summarizer
- Contradiction Detector (enhanced with counter-evidence search)
- Quality Gate Validator
- Audit Sampling Scheduler
- Embedding Handler (Release 1.0+)
- Federation Sync Adapter (Release 1.0+)
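As a rough illustration, the Orchestrator can be thought of as a pipeline runner that invokes each component in turn. The Python sketch below is illustrative only; the class, function, and field names are assumptions, not FactHarbor's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical component signature: each stage receives a work item as a
# dict and returns an updated dict. Real components would use richer types.
Step = Callable[[dict], dict]

@dataclass
class AkelOrchestrator:
    """Central coordinator: runs each registered component in order."""
    steps: list[Step] = field(default_factory=list)

    def register(self, step: Step) -> None:
        self.steps.append(step)

    def process(self, item: dict) -> dict:
        for step in self.steps:
            item = step(item)
        return item

def claim_extractor(item: dict) -> dict:
    # Placeholder: a real extractor would call an LLM to split the input
    # text into distinct claims.
    item["claims"] = [item["text"]]
    return item

orchestrator = AkelOrchestrator()
orchestrator.register(claim_extractor)
result = orchestrator.process({"text": "Example submission."})
```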
3. Inputs and Outputs
3.1 Inputs
- User-submitted claims or evidence
- Uploaded documents
- URLs or citations
- External LLM API (optional)
- Embeddings (from local or federated peers)
3.2 Outputs (publication mode varies by risk tier)
- ClaimVersion (draft or AI-generated)
- ScenarioVersion (draft or AI-generated)
- EvidenceVersion (summary + metadata, draft or AI-generated)
- VerdictVersion (draft, AI-generated, or human-reviewed)
- Contradiction alerts
- Reservation and limitation notices
- Re-evaluation proposals
- Updated embeddings
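A hedged sketch of what an AKEL output record might look like, combining AuthorType = AI with the review status and risk tier used by the publication modes below. All field and enum names here are illustrative, not a normative schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class AuthorType(Enum):
    AI = "AI"
    HUMAN = "HUMAN"

class ReviewStatus(Enum):
    DRAFT = "draft"                    # Mode 1: internal review queue only
    AI_GENERATED = "ai_generated"      # Mode 2: public, review pending
    HUMAN_REVIEWED = "human_reviewed"  # Mode 3: "AKEL-Generated" status

@dataclass
class ClaimVersion:
    claim_id: str
    version: int
    text: str
    author_type: AuthorType    # AKEL outputs always carry AuthorType.AI
    review_status: ReviewStatus
    risk_tier: str             # "A", "B", or "C" (see Risk Tiers below)
    created_at: datetime

draft = ClaimVersion(
    claim_id="c-001", version=1, text="...",
    author_type=AuthorType.AI, review_status=ReviewStatus.DRAFT,
    risk_tier="B", created_at=datetime.now(timezone.utc),
)
```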
4. Publication Modes
AKEL content is published according to three modes. Modes 1 and 2 are described below; Mode 3 (human-reviewed publication as "AKEL-Generated") is covered in Section 10.
4.1 Mode 1: Draft-Only (Never Public)
Used for:
- Failed quality gate checks
- Sensitive topics flagged for expert review
- Unclear scope or missing critical sources
- High reputational risk content
Visibility: Internal review queue only
4.2 Mode 2: Published as AI-Generated (No Prior Human Review)
Requirements:
- All automated quality gates passed (see below)
- Risk tier permits AI-draft publication (Tier B or C)
- Contradiction search completed successfully
- Clear labeling as "AI-Generated" (the "AKEL-Generated" status requires human review; see Section 10)
Label shown to users:
```
[AI-Generated] This content was produced by AI and has not yet been human-reviewed.
Source: AI | Review Status: Pending | Risk Tier: [B/C]
Contradiction Search: Completed | Last Updated: [timestamp]
```
User actions:
- Browse and read content
- Request human review (escalates to review queue)
- Flag for expert attention
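For illustration, the label above could be rendered from stored metadata along these lines; the function and parameter names are assumptions, not an existing FactHarbor API.

```python
from datetime import datetime, timezone

def render_ai_label(risk_tier: str, last_updated: datetime) -> str:
    """Render the Mode 2 user-facing label shown above."""
    return (
        "[AI-Generated] This content was produced by AI and has not yet "
        "been human-reviewed.\n"
        f"Source: AI | Review Status: Pending | Risk Tier: {risk_tier}\n"
        f"Contradiction Search: Completed | "
        f"Last Updated: {last_updated.isoformat()}"
    )

print(render_ai_label("B", datetime.now(timezone.utc)))
```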
5. Risk Tiers
AKEL assigns risk tiers to all content to determine appropriate review requirements:
5.1 Tier A — High Risk / High Impact
Domains: Medical, legal, elections, safety/security, major reputational harm
Publication policy:
- Human review REQUIRED before "AKEL-Generated" status
- AI-generated content MAY be published but:
  - Clearly flagged as AI-draft with prominent disclaimer
  - May have limited visibility
  - Auto-escalated to expert review queue
  - User warnings displayed
Audit rate (recommended): 30-50% of published AI-drafts sampled in the first 6 months
5.2 Tier B — Medium Risk
Domains: Contested public policy, complex science, causality claims, significant financial impact
Publication policy:
- AI-draft CAN publish immediately with clear labeling
- Sampling audits conducted (see Audit System below)
- High-engagement items auto-escalated to expert review
- Users can report issue for moderator review
Audit rate (recommended): 10-20% of published AI-drafts sampled
5.3 Tier C — Low Risk
Domains: Definitions, simple factual lookups with strong primary sources, historical facts, established scientific consensus
Publication policy:
- AI-draft default publication mode
- Sampling audits sufficient
- Community flagging available
- Human review on request
Audit rate (recommended): 5-10% of published AI-drafts sampled
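The three tier policies can be summarized in a lookup table. The sketch below encodes the recommendations above; the structure and names are illustrative, not a normative schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    human_review_required: bool      # before "AKEL-Generated" status
    ai_draft_can_publish: bool       # Mode 2 allowed without prior review
    audit_rate: tuple[float, float]  # recommended sampling range

TIER_POLICIES: dict[str, TierPolicy] = {
    # Tier A: AI-drafts may appear, but human review gates full status.
    "A": TierPolicy(True, True, (0.30, 0.50)),
    # Tier B: immediate AI-draft publication with sampling audits.
    "B": TierPolicy(False, True, (0.10, 0.20)),
    # Tier C: AI-draft is the default publication mode.
    "C": TierPolicy(False, True, (0.05, 0.10)),
}
```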
6. Quality Gates (Mandatory Before AI-Draft Publication)
All AI-generated content must pass these automated checks before Mode 2 publication:
6.1 Gate 1: Source Quality
- Primary sources identified and accessible
- Source reliability scored against whitelist
- Citation completeness verified
- Publication dates checked
- Author credentials validated (where applicable)
6.2 Gate 2: Contradiction Search (MANDATORY)
The system MUST actively search for:
- Counter-evidence – Rebuttals, conflicting results, contradictory studies
- Reservations – Caveats, limitations, boundary conditions, applicability constraints
- Alternative interpretations – Different framings, definitions, contextual variations
- Bubble detection – Conspiracy theories, echo chambers, ideologically isolated sources
Search coverage requirements:
- Academic literature (BOTH supporting AND opposing views)
- Reputable media across diverse political/ideological perspectives
- Official contradictions (retractions, corrections, updates, amendments)
- Domain-specific skeptics, critics, and alternative expert opinions
- Cross-cultural and international perspectives
Search must actively avoid algorithmic bubbles:
- Deliberately seek opposing viewpoints
- Check for echo chamber patterns in source clusters
- Identify tribal or ideological source clustering
- Flag when search space appears artificially constrained
- Verify diversity of perspectives represented
Outcomes:
- Strong counter-evidence found → Auto-escalate to Tier B or draft-only mode
- Significant uncertainty detected → Require uncertainty disclosure in verdict
- Bubble indicators present → Flag for expert review and human validation
- Limited perspective diversity → Expand search or flag for human review
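In code form, this outcome handling might reduce to a simple mapping from detected condition to escalation action. The enum and action names below are illustrative assumptions.

```python
from enum import Enum, auto

class SearchOutcome(Enum):
    STRONG_COUNTER_EVIDENCE = auto()
    SIGNIFICANT_UNCERTAINTY = auto()
    BUBBLE_INDICATORS = auto()
    LIMITED_DIVERSITY = auto()
    CLEAR = auto()

# Maps each Gate 2 outcome to the action listed above.
OUTCOME_ACTIONS: dict[SearchOutcome, str] = {
    SearchOutcome.STRONG_COUNTER_EVIDENCE: "escalate_to_tier_b_or_draft_only",
    SearchOutcome.SIGNIFICANT_UNCERTAINTY: "require_uncertainty_disclosure",
    SearchOutcome.BUBBLE_INDICATORS: "flag_for_expert_review",
    SearchOutcome.LIMITED_DIVERSITY: "expand_search_or_flag_for_review",
    SearchOutcome.CLEAR: "proceed_to_next_gate",
}
```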
6.3 Gate 3: Uncertainty Quantification
- Confidence scores calculated for all claims and verdicts
- Limitations explicitly stated
- Data gaps identified and disclosed
- Strength of evidence assessed
- Alternative scenarios considered
6.4 Gate 4: Structural Integrity
- No hallucinations detected (fact-checking against sources)
- Logic chain valid and traceable
- References accessible and verifiable
- No circular reasoning
- Premises clearly stated
If any gate fails:
- Content remains in draft-only mode
- Failure reason logged
- Human review required before publication
- Failure patterns analyzed for system improvement
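A minimal sketch of the gate pipeline, assuming each gate is a callable returning pass/fail plus a reason. Gate internals are out of scope here, and all names are illustrative.

```python
import logging
from typing import Callable

logger = logging.getLogger("akel.gates")

# Each gate inspects the content and returns (passed, reason).
Gate = Callable[[dict], tuple[bool, str]]

def run_quality_gates(content: dict, gates: dict[str, Gate]) -> bool:
    """Run all gates in order; any failure keeps content in Mode 1."""
    for name, gate in gates.items():
        passed, reason = gate(content)
        if not passed:
            content["publication_mode"] = "draft_only"  # stays in Mode 1
            content["requires_human_review"] = True
            # Failure reason is logged for failure-pattern analysis.
            logger.warning("gate %s failed: %s", name, reason)
            return False
    return True  # eligible for Mode 2 (AI-Generated) publication
```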
7. Audit System (Sampling-Based Quality Assurance)
Instead of reviewing ALL AI output, FactHarbor implements stratified sampling audits:
7.1 Sampling Strategy
Audits prioritize:
- Risk tier (higher tiers get more frequent audits)
- AI confidence score (low confidence → higher sampling rate)
- Traffic and engagement (high-visibility content audited more)
- Novelty (new claim types, new domains, emerging topics)
- Disagreement signals (user flags, contradiction alerts, community reports)
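One plausible way to implement this stratification is a priority score per content item. The weights below are placeholder assumptions, not FactHarbor's calibrated values.

```python
def audit_priority(risk_tier: str, ai_confidence: float,
                   engagement: float, novelty: float,
                   disagreement_signals: int) -> float:
    """Illustrative stratified-sampling score; higher means audited sooner.

    All inputs except disagreement_signals are assumed normalized to [0, 1].
    """
    tier_weight = {"A": 3.0, "B": 2.0, "C": 1.0}[risk_tier]
    return (
        tier_weight
        + (1.0 - ai_confidence) * 2.0   # low confidence raises priority
        + engagement * 1.5              # high-visibility content
        + novelty * 1.0                 # new claim types, domains, topics
        + disagreement_signals * 0.5    # user flags, alerts, reports
    )
```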
7.2 Audit Process
1. System selects content for audit based on the sampling strategy
2. Human auditor reviews AI-generated content against quality standards
3. Moderator validates or corrects:
- Claim extraction accuracy
- Scenario appropriateness
- Evidence relevance and interpretation
- Verdict reasoning
- Contradiction search completeness
4. Audit outcome recorded (pass/fail + detailed feedback)
5. Failed audits trigger immediate content review
6. Audit results feed back into system improvement
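A sketch of the audit record and the fail-fast rule in step 5, with illustrative names and in-memory lists standing in for real storage:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AuditRecord:
    content_id: str
    auditor_id: str
    passed: bool
    feedback: str        # detailed rationale accompanying pass/fail
    audited_at: datetime

def record_audit(record: AuditRecord, audit_log: list,
                 review_queue: list) -> None:
    """Store the outcome; a failed audit triggers immediate content review."""
    audit_log.append(record)
    if not record.passed:
        review_queue.append(record.content_id)
```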
7.3 Feedback Loop (Continuous Improvement)
Audit outcomes systematically improve:
- Query templates – Refined based on missed evidence patterns
- Retrieval source weights – Adjusted for accuracy and reliability
- Contradiction detection heuristics – Enhanced to catch missed counter-evidence
- Model prompts and extraction rules – Tuned for better claim extraction
- Risk tier assignments – Recalibrated based on error patterns
- Bubble detection algorithms – Improved to identify echo chambers
7.4 Audit Transparency
- Audit statistics published regularly
- Accuracy rates by risk tier tracked and reported
- System improvements documented
- Community can view aggregate audit performance
8. Architecture Overview
AKEL Architecture

```mermaid
graph TB
    User[User Submits Content<br/>Text/URL/Single Claim]
    Extract[Claim Extraction<br/>LLM identifies distinct claims]
    AKEL[AKEL Core Processing<br/>Per Claim]
    Evidence[Evidence Gathering]
    Scenario[Scenario Generation]
    Verdict[Verdict Generation]
    Storage[(Storage Layer<br/>PostgreSQL + S3)]
    Queue[Processing Queue<br/>Parallel Claims]

    User --> Extract
    Extract -->|Multiple Claims| Queue
    Extract -->|Single Claim| AKEL
    Queue -->|Process Each| AKEL
    AKEL --> Evidence
    AKEL --> Scenario
    Evidence --> Verdict
    Scenario --> Verdict
    Verdict --> Storage

    style Extract fill:#e1f5ff
    style Queue fill:#fff4e1
    style AKEL fill:#f0f0f0
```
9. AKEL and Federation
In Release 1.0+, AKEL participates in cross-node knowledge alignment:
- Shares embeddings
- Exchanges canonicalized claim forms
- Exchanges scenario templates
- Sends + receives contradiction alerts
- Shares audit findings (with privacy controls)
- Never shares model weights
- Never overrides local governance
Nodes may choose trust levels for AKEL-related data:
- Trusted nodes: auto-merge embeddings + templates
- Neutral nodes: require additional verification
- Untrusted nodes: fully manual import
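These trust levels might map onto import behavior as follows; the enum and function names are assumptions for illustration.

```python
from enum import Enum

class PeerTrust(Enum):
    TRUSTED = "trusted"
    NEUTRAL = "neutral"
    UNTRUSTED = "untrusted"

def import_policy(trust: PeerTrust) -> str:
    """Map a peer node's trust level to the import behavior listed above."""
    return {
        PeerTrust.TRUSTED: "auto_merge",         # embeddings + templates
        PeerTrust.NEUTRAL: "verify_then_merge",  # additional verification
        PeerTrust.UNTRUSTED: "manual_import",    # fully manual import
    }[trust]
```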
10. Human Review Workflow (Mode 3 Publication)
For content requiring human validation before "AKEL-Generated" status:
1. AKEL generates content and publishes as AI-draft (Mode 2) or keeps as draft (Mode 1)
2. Contributors inspect content in review queue
3. Contributors validate quality gates were correctly applied
4. Trusted Contributors validate high-risk (Tier A) or domain-specific outputs
5. Moderators finalize "AKEL-Generated" publication
6. Version numbers increment, full history preserved
Note: Most AI-generated content (Tier B and C) can remain in Mode 2 (AI-Generated) indefinitely. Human review is optional for these tiers unless users or audits flag issues.
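Reusing the illustrative ClaimVersion and ReviewStatus types from the Section 3.2 sketch, steps 5 and 6 might look like this: the moderator's finalization creates a new version so the full history is preserved.

```python
from dataclasses import replace

def promote_to_akel_generated(current: ClaimVersion) -> ClaimVersion:
    """Finalize publication (step 5) and increment the version (step 6)."""
    return replace(
        current,
        version=current.version + 1,
        review_status=ReviewStatus.HUMAN_REVIEWED,  # shown as "AKEL-Generated"
    )
```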
11. POC v1 Behavior
The POC explicitly demonstrates AI-generated content publication:
- Produces public AI-generated output (Mode 2)
- No human data sources required
- No human approval gate
- Clear "AI-Generated - POC/Demo" labeling
- All quality gates active (including contradiction search)
- Users are informed that the output demonstrates AI reasoning capabilities
- Risk tier classification shown (demo purposes)