POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/24 20:16

POC Requirements

Information

POC1 Architecture: 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.

See POC1 API Specification for complete technical details.

Status: ✅ Approved for Development
Version: 2.0 (Updated after Specification Cross-Check)
Goal: Prove that AI can extract claims and determine verdicts automatically without human intervention

1. POC Overview

1.1 What POC Tests

Core Question:

 Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

What we're proving:

  • AI can identify factual claims from text
  • AI can evaluate those claims and produce verdicts
  • Output is comprehensible and useful
  • Fully automated approach is viable

What we're NOT testing:

  • Scenario generation (deferred to POC2)
  • Evidence display (deferred to POC2)
  • Production scalability
  • Perfect accuracy
  • Complete feature set

1.2 Scenarios Deferred to POC2

Intentional Simplification:

Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are deliberately excluded from POC1.

Rationale:

  • POC1 tests: Can AI extract claims and generate verdicts?
  • POC2 will add: Scenario generation and management
  • Open questions remain: Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?

Design Decision:

Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.

No Risk:

Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:

  • Faster POC1 validation
  • Learning from POC1 to inform scenario design
  • Iterative approach: fail fast if basic AI doesn't work
  • Flexibility to adjust scenario architecture based on POC1 insights

Full System Workflow (Future):
Claims → Scenarios → Evidence → Verdicts

POC1 Simplified Workflow:
Claims → Verdicts (scenarios implicit in reasoning)

2. POC Output Specification

2.1 Component 1: ANALYSIS SUMMARY (Context-Aware)

What: Context-aware overview that considers both individual claims AND their relationship to the article's main argument

Length: 4-6 sentences 

Content (Required Elements):

  1. Article's main thesis/claim - What is the article trying to argue or prove?
  2. Claim count and verdicts - How many claims analyzed, distribution of verdicts
  3. Central vs. supporting claims - Which claims are central to the article's argument?
  4. Relationship assessment - Do the claims support the article's conclusion?
  5. Overall credibility - Final assessment considering claim importance

Critical Innovation:

POC1 tests whether AI can understand that article credibility ≠ simple average of claim verdicts. An article might:

  • Make accurate supporting facts but draw unsupported conclusions
  • Have one false central claim that invalidates the whole argument
  • Misframe accurate information to mislead

Good Example (Context-Aware):
This article argues that coffee cures cancer based on its antioxidant
content. We analyzed 3 factual claims: 2 about coffee's chemical
properties are well-supported, but the main causal claim is refuted
by current evidence. The article confuses correlation with causation.
Overall assessment: MISLEADING - makes an unsupported medical claim
despite citing some accurate facts.

Poor Example (Simple Aggregation - Don't Do This):
This article makes 3 claims. 2 are well-supported and 1 is refuted.
Overall assessment: mostly accurate (67% accurate).

↑ This misses that the refuted claim IS the article's main point!

What POC1 Tests:

Can AI identify and assess:

  • ✅ The article's main thesis/conclusion?
  • ✅ Which claims are central vs. supporting?
  • ✅ Whether the evidence supports the conclusion?
  • ✅ Overall credibility considering logical structure?

If AI Cannot Do This:

That's valuable to learn in POC1! We'll:

  • Note as limitation
  • Fall back to simple aggregation with warning
  • Design explicit article-level analysis for POC2

2.2 Component 2: CLAIMS IDENTIFICATION

What: List of factual claims extracted from article
Format: Numbered list
Quantity: 3-5 claims
Requirements:

  • Factual claims only (not opinions/questions)
  • Clearly stated
  • Automatically extracted by AI

Example:
CLAIMS IDENTIFIED:

[1] Coffee reduces diabetes risk by 30%
[2] Coffee improves heart health
[3] Decaf has same benefits as regular
[4] Coffee prevents Alzheimer's completely

2.3 Component 3: CLAIMS VERDICTS

What: Verdict for each claim identified
Format: Per-claim structure

Required Elements:

  • Verdict Label: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
  • Confidence Score: 0-100%
  • Brief Reasoning: 1-3 sentences explaining why
  • Risk Tier: A (High) / B (Medium) / C (Low) - for demonstration

Example:
VERDICTS:

[1] WELL-SUPPORTED (85%) [Risk: C]
Multiple studies confirm 25-30% risk reduction with regular consumption.

[2] UNCERTAIN (65%) [Risk: B]
Evidence is mixed. Some studies show benefits, others show no effect.

[3] PARTIALLY SUPPORTED (60%) [Risk: C]
Some benefits overlap, but caffeine-related benefits are reduced in decaf.

[4] REFUTED (90%) [Risk: B]
No evidence for complete prevention. Claim is significantly overstated.

Risk Tier Display:

  • Tier A (Red): High Risk - Medical/Legal/Safety/Elections
  • Tier B (Yellow): Medium Risk - Policy/Science/Causality 
  • Tier C (Green): Low Risk - Facts/Definitions/History

Note: Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
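The per-claim verdict structure above can be modeled as a small record type. The following is an illustrative sketch only; the class and field names (`ClaimVerdict`, `risk_tier`, etc.) are our own, not mandated by the specification:

```python
from dataclasses import dataclass

VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}  # A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions

@dataclass
class ClaimVerdict:
    claim: str
    label: str        # one of VERDICT_LABELS
    confidence: int   # 0-100
    risk_tier: str    # one of RISK_TIERS
    reasoning: str    # 1-3 sentences

    def __post_init__(self):
        # Validate the required elements listed in Component 3.
        if self.label not in VERDICT_LABELS:
            raise ValueError(f"unknown verdict label: {self.label}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be 0-100")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier}")

    def display(self) -> str:
        """Render in the VERDICTS format shown in the example above."""
        return f"{self.label} ({self.confidence}%) [Risk: {self.risk_tier}]\n{self.reasoning}"
```

Validating labels and tiers at construction time catches malformed AI output early, which matters in a fully automated pipeline with no human correction step.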

2.4 Component 4: ARTICLE SUMMARY (Optional)

What: Brief summary of original article content
Length: 3-5 sentences
Tone: Neutral (article's position, not FactHarbor's analysis)

Example:
ARTICLE SUMMARY:

Health News Today article discusses coffee benefits, citing studies
on diabetes and Alzheimer's. Author highlights research linking coffee 
to disease prevention. Recommends 2-3 cups daily for optimal health.

2.5 Component 5: USAGE STATISTICS (Cost Tracking)

What: LLM usage metrics for cost optimization and scaling decisions

Purpose: 

  • Understand cost per analysis
  • Identify optimization opportunities
  • Project costs at scale
  • Inform architecture decisions

Display Format:
USAGE STATISTICS:
Article: 2,450 words (12,300 characters)
Input tokens: 15,234
Output tokens: 892
Total tokens: 16,126
Estimated cost: $0.24 USD
Response time: 8.3 seconds
Cost per claim: $0.048
Model: claude-sonnet-4-20250514
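Derived metrics such as cost per claim follow directly from the usage data. A minimal sketch of the arithmetic, assuming per-token unit prices are supplied by the caller (the rates in the test are illustrative placeholders, not Anthropic's actual pricing):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_input_token: float, usd_per_output_token: float) -> float:
    """Estimate the LLM cost of one analysis from token counts and unit prices."""
    return input_tokens * usd_per_input_token + output_tokens * usd_per_output_token

def cost_per_claim(total_cost_usd: float, num_claims: int) -> float:
    """Spread the cost of one analysis across the claims it produced."""
    if num_claims <= 0:
        raise ValueError("need at least one claim")
    return round(total_cost_usd / num_claims, 3)
```

With the sample figures above, a $0.24 analysis that extracted five claims yields the displayed $0.048 per claim.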

Why This Matters:

At scale, LLM costs are critical:

  • 10,000 articles/month ≈ $200-500/month
  • 100,000 articles/month ≈ $2,000-5,000/month
  • Cost optimization can reduce expenses 30-50%

What POC1 Learns:

  • How cost scales with article length
  • Prompt optimization opportunities (caching, compression)
  • Output verbosity tradeoffs
  • Model selection strategy (Sonnet vs. Haiku)
  • Article length limits (if needed)

Implementation:

  • Claude API already returns usage data
  • No extra API calls needed
  • Display to user + log for aggregate analysis
  • Test with articles of varying lengths

Critical for GO/NO-GO: Unit economics must be viable at scale!

2.6 Total Output Size

Combined: 220-350 words

  • Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
  • Claims Identification: 30-50 words
  • Claims Verdicts: 100-150 words
  • Article Summary: 30-50 words (optional)

Note: Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.

3. What's NOT in POC Scope

3.1 Feature Exclusions

The following are explicitly excluded from POC:

Content Features:

  • ❌ Scenarios (deferred to POC2)
  • ❌ Evidence display (supporting/opposing lists)
  • ❌ Source links (clickable references)
  • ❌ Detailed reasoning chains
  • ❌ Source quality ratings (shown but not detailed)
  • ❌ Contradiction detection (basic only)
  • ❌ Risk assessment (shown but not workflow-integrated)

Platform Features:

  • ❌ User accounts / authentication
  • ❌ Saved history
  • ❌ Search functionality
  • ❌ Claim comparison
  • ❌ User contributions
  • ❌ Commenting system
  • ❌ Social sharing

Technical Features:

  • ❌ Browser extensions
  • ❌ Mobile apps
  • ❌ API endpoints
  • ❌ Webhooks
  • ❌ Export features (PDF, CSV)

Quality Features:

  • ❌ Accessibility (WCAG compliance)
  • ❌ Multilingual support
  • ❌ Mobile optimization
  • ❌ Media verification (images/videos)

Production Features:

  • ❌ Security hardening
  • ❌ Privacy compliance (GDPR)
  • ❌ Terms of service
  • ❌ Monitoring/logging
  • ❌ Error tracking
  • ❌ Analytics
  • ❌ A/B testing

4. POC Simplifications vs. Full System

4.1 Architecture Comparison

POC Architecture (Simplified):
User Input → Single AKEL Call (all processing) → Output Display

Full System Architecture:
User Input → Claim Extractor → Claim Classifier → Scenario Generator →
Evidence Summarizer → Contradiction Detector → Verdict Generator →
Quality Gates → Publication → Output Display

Key Differences:

Aspect | POC1 | Full System
Processing | Single API call | Multi-component pipeline
Scenarios | None (implicit) | Explicit entities with versioning
Evidence | Basic retrieval | Comprehensive with quality scoring
Quality Gates | Simplified (4 basic checks) | Full validation infrastructure
Workflow | 3 steps (input/process/output) | 6 phases with gates
Data Model | Stateless (no database) | PostgreSQL + Redis + S3
Architecture | Single prompt to Claude | AKEL Orchestrator + Components

4.2 Workflow Comparison

POC1 Workflow:

  1. User submits text/URL
  2. Single AKEL call (all processing in one prompt)
  3. Display results

Total: 3 steps, 10-18 seconds

Full System Workflow:

  1. Claim Submission (extraction, normalization, clustering)
  2. Scenario Building (definitions, assumptions, boundaries)
  3. Evidence Handling (retrieval, assessment, linking)
  4. Verdict Creation (synthesis, reasoning, approval)
  5. Public Presentation (summaries, landscapes, deep dives)
  6. Time Evolution (versioning, re-evaluation triggers)

Total: 6 phases with quality gates, 10-30 seconds

4.3 Why POC is Simplified

Engineering Rationale:

  1. Test core capability first: Can AI do basic fact-checking without humans?
  2. Fail fast: If AI can't generate reasonable verdicts, pivot early
  3. Learn before building: POC1 insights inform full architecture
  4. Iterative approach: Add complexity only after validating foundations
  5. Resource efficiency: Don't build full system if core concept fails

Acceptable Trade-offs:

  • ✅ POC proves AI capability (most risky assumption)
  • ✅ POC validates user comprehension (can people understand output?)
  • ❌ POC doesn't validate full workflow (test in Beta)
  • ❌ POC doesn't validate scale (test in Beta)
  • ❌ POC doesn't validate scenario architecture (design in POC2)

4.4 Gap Between POC1 and POC2/Beta

What needs to be built for POC2:

  • Scenario generation component
  • Evidence Model structure (full)
  • Scenario-evidence linking
  • Multi-interpretation comparison
  • Truth landscape visualization

What needs to be built for Beta:

  • Multi-component AKEL pipeline
  • Quality gate infrastructure
  • Review workflow system
  • Audit sampling framework
  • Production data model
  • Federation architecture (Release 1.0)

POC1 → POC2 is significant architectural expansion.

5. Publication Mode & Labeling

5.1 POC Publication Mode

Mode: Mode 2 (AI-Generated, No Prior Human Review)

Per FactHarbor Specification Section 11 "POC v1 Behavior":

  • Produces public AI-generated output
  • No human approval gate
  • Clear AI-Generated labeling
  • All quality gates active (simplified)
  • Risk tier classification shown (demo)

5.2 User-Facing Labels

Primary Label (top of analysis):
╔════════════════════════════════════════════════════════════╗
║ [AI-GENERATED - POC/DEMO]                                  ║
║                                                            ║
║ This analysis was produced entirely by AI and has not      ║
║ been human-reviewed. Use for demonstration purposes.       ║
║                                                            ║
║ Source: AI/AKEL v1.0 (POC)                                 ║
║ Review Status: Not Reviewed (Proof-of-Concept)             ║
║ Quality Gates: 4/4 Passed (Simplified)                     ║
║ Last Updated: [timestamp]                                  ║
╚════════════════════════════════════════════════════════════╝

Per-Claim Risk Labels:

  • [Risk: A] 🔴 High Risk (Medical/Legal/Safety)
  • [Risk: B] 🟡 Medium Risk (Policy/Science)
  • [Risk: C] 🟢 Low Risk (Facts/Definitions)

5.3 Display Requirements

Must Show:

  • AI-Generated status (prominent)
  • POC/Demo disclaimer
  • Risk tier per claim
  • Confidence scores (0-100%)
  • Quality gate status (passed/failed)
  • Timestamp

Must NOT Claim:

  • Human review
  • Production quality
  • Medical/legal advice
  • Authoritative verdicts
  • Complete accuracy

5.4 Mode 2 vs. Full System Publication

Element | POC Mode 2 | Full System Mode 2 | Full System Mode 3
Label | AI-Generated (POC) | AI-Generated | AKEL-Generated
Review | None | None | Human-Reviewed
Quality Gates | 4 (simplified) | 6 (full) | 6 (full) + Human
Audit | None (POC) | Sampling (5-50%) | Pre-publication
Risk Display | Demo only | Workflow-integrated | Validated
User Actions | View only | Flag for review | Trust rating

6. Quality Gates (Simplified Implementation)

6.1 Overview

Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements simplified versions of the 4 mandatory gates.

Full System Has 4 Gates:

  1. Source Quality
  2. Contradiction Search (MANDATORY)
  3. Uncertainty Quantification
  4. Structural Integrity

POC Implements Simplified Versions:

  • Focus on demonstrating concept
  • Basic implementations sufficient
  • Failures displayed to user (not blocking)
  • Full system has comprehensive validation

6.2 Gate 1: Source Quality (Basic)

Full System Requirements:

  • Primary sources identified and accessible
  • Source reliability scored against whitelist
  • Citation completeness verified
  • Publication dates checked
  • Author credentials validated

POC Implementation:

  • ✅ At least 2 sources found
  • ✅ Sources accessible (URLs valid)
  • ❌ No whitelist checking
  • ❌ No credential validation
  • ❌ No comprehensive reliability scoring

Pass Criteria: ≥2 accessible sources found

Failure Handling: Display error message, don't generate verdict
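The pass check for this gate reduces to counting accessible sources. A minimal sketch, assuming accessibility has already been probed elsewhere (e.g. via an HTTP HEAD request) and is passed in as a boolean per source:

```python
def source_quality_gate(sources: list[tuple[str, bool]]) -> tuple[bool, str]:
    """Gate 1 (basic): pass if at least two sources were found and are accessible.

    `sources` is a list of (url, accessible) pairs; accessibility probing
    itself (HTTP status, timeouts) is out of scope for this sketch.
    """
    accessible = [url for url, ok in sources if ok]
    if len(accessible) >= 2:
        return True, f"Source Quality: {len(accessible)} sources found"
    # Per the failure handling above: no verdict is generated in this case.
    return False, "Source Quality: fewer than 2 accessible sources - no verdict generated"
```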

6.3 Gate 2: Contradiction Search (Basic)

Full System Requirements:

  • Counter-evidence actively searched
  • Reservations and limitations identified
  • Alternative interpretations explored
  • Bubble detection (echo chambers, conspiracy theories)
  • Cross-cultural and international perspectives
  • Academic literature (supporting AND opposing)

POC Implementation:

  • ✅ Basic search for counter-evidence
  • ✅ Identify obvious contradictions
  • ❌ No comprehensive academic search
  • ❌ No bubble detection
  • ❌ No systematic alternative interpretation search
  • ❌ No international perspective verification

Pass Criteria: Basic contradiction search attempted

Failure Handling: Note "limited contradiction search" in output

6.4 Gate 3: Uncertainty Quantification (Basic)

Full System Requirements:

  • Confidence scores calculated for all claims/verdicts
  • Limitations explicitly stated
  • Data gaps identified and disclosed
  • Strength of evidence assessed
  • Alternative scenarios considered

POC Implementation:

  • ✅ Confidence scores (0-100%)
  • ✅ Basic uncertainty acknowledgment
  • ❌ No detailed limitation disclosure
  • ❌ No data gap identification
  • ❌ No alternative scenario consideration (deferred to POC2)

Pass Criteria: Confidence score assigned

Failure Handling: Show "Confidence: Unknown" if calculation fails

6.5 Gate 4: Structural Integrity (Basic)

Full System Requirements:

  • No hallucinations detected (fact-checking against sources)
  • Logic chain valid and traceable
  • References accessible and verifiable
  • No circular reasoning
  • Premises clearly stated

POC Implementation:

  • ✅ Basic coherence check
  • ✅ References accessible
  • ❌ No comprehensive hallucination detection
  • ❌ No formal logic validation
  • ❌ No premise extraction and verification

Pass Criteria: Output is coherent and references are accessible

Failure Handling: Display error message

6.6 Quality Gate Display

POC shows simplified status:
Quality Gates: 4/4 Passed (Simplified)
Source Quality: 3 sources found
Contradiction Search: Basic search completed
Uncertainty: Confidence scores assigned
Structural Integrity: Output coherent

If any gate fails:
Quality Gates: 3/4 Passed (Simplified)
Source Quality: 3 sources found
Contradiction Search: Search failed - limited evidence
Uncertainty: Confidence scores assigned
Structural Integrity: Output coherent

Note: This analysis has limited evidence. Use with caution.
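The status blocks above can be rendered from per-gate results. An illustrative sketch; the gate names and the caution note follow the examples in this section, while the dict-based interface is our own choice:

```python
def render_gate_status(results: dict[str, tuple[bool, str]]) -> str:
    """Render the simplified quality-gate summary shown to POC users.

    `results` maps gate name -> (passed, detail line). Failures do not block
    publication in POC mode; a caution note is appended instead.
    """
    passed = sum(1 for ok, _ in results.values() if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    lines += [detail for _, detail in results.values()]
    if passed < len(results):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```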

6.7 Simplified vs. Full System

Gate | POC (Simplified) | Full System
Source Quality | ≥2 sources accessible | Whitelist scoring, credentials, comprehensiveness
Contradiction | Basic search | Systematic academic + media + international
Uncertainty | Confidence % assigned | Detailed limitations, data gaps, alternatives
Structural | Coherence check | Hallucination detection, logic validation, premise check

POC Goal: Demonstrate that quality gates are possible, not perfect implementation.

7. AKEL Architecture Comparison

7.1 POC AKEL (Simplified)

Implementation:

  • Single Claude API call (Sonnet 4.5)
  • One comprehensive prompt
  • All processing in single request
  • No separate components
  • No orchestration layer

Prompt Structure:
Task: Analyze this article and provide:

1. Extract 3-5 factual claims
2. For each claim:
   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
   - Assign confidence score (0-100%)
   - Assign risk tier (A/B/C)
   - Write brief reasoning (1-3 sentences)
3. Generate analysis summary (3-5 sentences)
4. Generate article summary (3-5 sentences)
5. Run basic quality checks

Return as structured JSON.

Processing Time: 10-18 seconds (estimate)
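The structured JSON the prompt asks for might look like the following. This shape is an illustrative guess; the key names are not fixed anywhere in the spec:

```python
import json

# Hypothetical single-call response covering summary, claims, verdicts, and gates.
sample_response = json.loads("""
{
  "analysis_summary": "This article argues that coffee cures cancer ...",
  "claims": [
    {
      "id": 1,
      "text": "Coffee reduces diabetes risk by 30%",
      "central": false,
      "verdict": "WELL-SUPPORTED",
      "confidence": 85,
      "risk_tier": "C",
      "reasoning": "Multiple studies confirm 25-30% risk reduction."
    }
  ],
  "article_summary": "Health News Today article discusses coffee benefits ...",
  "quality_gates": {"source_quality": true, "contradiction_search": true,
                    "uncertainty": true, "structural_integrity": true}
}
""")
```

A fixed schema like this makes the backend's parsing step a plain `json.loads` plus key checks, which is what keeps the single-call architecture simple.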

7.2 Full System AKEL (Production)

Architecture:
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)

Processing:

  • Parallel processing where possible
  • Separate component calls
  • Quality gates between phases
  • Audit sampling selection
  • Cross-node coordination (federated mode)

Processing Time: 10-30 seconds (full pipeline)

7.3 Why POC Uses Single Call

Advantages:

  • ✅ Simpler to implement
  • ✅ Faster POC development
  • ✅ Easier to debug
  • ✅ Proves AI capability
  • ✅ Good enough for concept validation

Limitations:

  • ❌ No component reusability
  • ❌ No parallel processing
  • ❌ All-or-nothing (can't partially succeed)
  • ❌ Harder to improve individual components
  • ❌ No audit sampling

Acceptable Trade-off:

POC tests "Can AI do this?" not "How should we architect it?"

Full component architecture comes in Beta after POC validates concept.

7.4 Evolution Path

POC1: Single prompt → Prove concept
POC2: Add scenario component → Test full pipeline
Beta: Multi-component AKEL → Production architecture
Release 1.0: Full AKEL + Federation → Scale

8. Functional Requirements

FR-POC-1: Article Input

Requirement: User can submit article for analysis

Functionality:

  • Text input field (paste article text, up to 5000 characters)
  • URL input field (paste article URL)
  • "Analyze" button to trigger processing
  • Loading indicator during analysis

Excluded:

  • No user authentication
  • No claim history
  • No search functionality
  • No saved templates

Acceptance Criteria:

  • User can paste text from article
  • User can paste URL of article
  • System accepts input and triggers analysis

FR-POC-2: Claim Extraction (Fully Automated)

Requirement: AI automatically extracts 3-5 factual claims

Functionality:

  • AI reads article text
  • AI identifies factual claims (not opinions/questions)
  • AI extracts 3-5 most important claims
  • System displays numbered list

Critical: NO MANUAL EDITING ALLOWED

  • AI selects which claims to extract
  • AI identifies factual vs. non-factual
  • System processes claims as extracted
  • No human curation or correction

Error Handling:

  • If extraction fails: Display error message
  • User can retry with different input
  • No manual intervention to fix extraction

Acceptance Criteria:

  • AI extracts 3-5 claims automatically
  • Claims are factual (not opinions)
  • Claims are clearly stated
  • No manual editing required

FR-POC-3: Verdict Generation (Fully Automated)

Requirement: AI automatically generates verdict for each claim

Functionality:

  • For each claim, AI:
    - Evaluates claim based on available evidence/knowledge
    - Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
    - Assigns confidence score (0-100%)
    - Assigns risk tier (A/B/C)
    - Writes brief reasoning (1-3 sentences)
  • System displays verdict for each claim

Critical: NO MANUAL EDITING ALLOWED

  • AI computes verdicts based on evidence
  • AI generates confidence scores
  • AI writes reasoning
  • No human review or adjustment

Error Handling:

  • If verdict generation fails: Display error message
  • User can retry
  • No manual intervention to adjust verdicts

Acceptance Criteria:

  • Each claim has a verdict
  • Confidence score is displayed (0-100%)
  • Risk tier is displayed (A/B/C)
  • Reasoning is understandable (1-3 sentences)
  • Verdict is defensible given reasoning
  • All generated automatically by AI

FR-POC-4: Analysis Summary (Fully Automated)

Requirement: AI generates brief summary of analysis

Functionality:

  • AI summarizes findings in 3-5 sentences:
    - How many claims found
    - Distribution of verdicts
    - Overall assessment
  • System displays at top of results

Critical: NO MANUAL EDITING ALLOWED

Acceptance Criteria:

  • Summary is coherent
  • Accurately reflects analysis
  • 3-5 sentences
  • Automatically generated

FR-POC-5: Article Summary (Fully Automated, Optional)

Requirement: AI generates brief summary of original article

Functionality:

  • AI summarizes article content (not FactHarbor's analysis)
  • 3-5 sentences
  • System displays

Note: Optional - can be skipped if time is limited

Critical: NO MANUAL EDITING ALLOWED

Acceptance Criteria:

  • Summary is neutral (article's position)
  • Accurately reflects article content
  • 3-5 sentences
  • Automatically generated

FR-POC-6: Publication Mode Display

Requirement: Clear labeling of AI-generated content

Functionality:

  • Display Mode 2 publication label
  • Show POC/Demo disclaimer
  • Display risk tiers per claim
  • Show quality gate status
  • Display timestamp

Acceptance Criteria:

  • Label is prominent and clear
  • User understands this is AI-generated POC output
  • Risk tiers are color-coded
  • Quality gate status is visible

FR-POC-7: Quality Gate Execution

Requirement: Execute simplified quality gates

Functionality:

  • Check source quality (basic)
  • Attempt contradiction search (basic)
  • Calculate confidence scores
  • Verify structural integrity (basic)
  • Display gate results

Acceptance Criteria:

  • All 4 gates attempted
  • Pass/fail status displayed
  • Failures explained to user
  • Gates don't block publication (POC mode)

9. Non-Functional Requirements

NFR-POC-1: Fully Automated Processing

Requirement: Complete AI automation with zero manual intervention

Critical Rule: NO MANUAL EDITING AT ANY STAGE

What this means:

  • Claims: AI selects (no human curation)
  • Scenarios: N/A (deferred to POC2)
  • Evidence: AI evaluates (no human selection)
  • Verdicts: AI determines (no human adjustment)
  • Summaries: AI writes (no human editing)

Pipeline:
User Input → AKEL Processing → Output Display
(ZERO human editing)

If AI output is poor:

  • ❌ Do NOT manually fix it
  • ✅ Document the failure
  • ✅ Improve prompts and retry
  • ✅ Accept that POC might fail

Why this matters:

  • Tests whether AI can do this without humans
  • Validates scalability (humans can't review every analysis)
  • Honest test of technical feasibility

NFR-POC-2: Performance

Requirement: Analysis completes in reasonable time

Acceptable Performance:

  • Processing time: 1-5 minutes (acceptable for POC)
  • Display loading indicator to user
  • Show progress if possible ("Extracting claims...", "Generating verdicts...")

Not Required:

  • Production-level speed (< 30 seconds)
  • Optimization for scale
  • Caching

Acceptance Criteria:

  • Analysis completes within 5 minutes
  • User sees loading indicator
  • No timeout errors

NFR-POC-3: Reliability

Requirement: System works for manual testing sessions

Acceptable:

  • Occasional errors (< 20% failure rate)
  • Manual restart if needed
  • Display error messages clearly

Not Required:

  • 99.9% uptime
  • Automatic error recovery
  • Production monitoring

Acceptance Criteria:

  • System works for test demonstrations
  • Errors are handled gracefully
  • User receives clear error messages

NFR-POC-4: Environment

Requirement: Runs on simple infrastructure

Acceptable:

  • Single machine or simple cloud setup
  • No distributed architecture
  • No load balancing
  • No redundancy
  • Local development environment viable

Not Required:

  • Production infrastructure
  • Multi-region deployment
  • Auto-scaling
  • Disaster recovery

NFR-POC-5: Cost Efficiency Tracking

Requirement: Track and display LLM usage metrics to inform optimization decisions

Must Track:

  • Input tokens (article + prompt)
  • Output tokens (generated analysis)
  • Total tokens
  • Estimated cost (USD)
  • Response time (seconds)
  • Article length (words/characters)

Must Display:

  • Usage statistics in UI (Component 5)
  • Cost per analysis
  • Cost per claim extracted

Must Log:

  • Aggregate metrics for analysis
  • Cost distribution by article length
  • Token efficiency trends
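Aggregate logging can be as simple as appending one JSON line per analysis. A sketch under our own assumptions (the JSONL format and field names such as `cost_usd` are illustrative, not required by the spec):

```python
import json
from pathlib import Path

def log_usage(path: Path, record: dict) -> None:
    """Append one analysis' usage metrics as a JSON line for later aggregation."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def average_cost(path: Path) -> float:
    """Average per-analysis cost across all logged records."""
    records = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
    return sum(r["cost_usd"] for r in records) / len(records)
```

Append-only JSONL keeps the POC stateless apart from one flat file, yet is enough to answer the cost-distribution and trend questions above.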

Purpose:

  • Understand unit economics
  • Identify optimization opportunities
  • Project costs at scale
  • Inform architecture decisions (caching, model selection, etc.)

Acceptance Criteria:

  • ✅ Usage data displayed after each analysis
  • ✅ Metrics logged for aggregate analysis
  • ✅ Cost calculated accurately (Claude API pricing)
  • ✅ Test cases include varying article lengths
  • ✅ POC1 report includes cost analysis section

Success Target:

  • Average cost per analysis < $0.05 USD
  • Cost scaling behavior understood (linear/exponential)
  • 2+ optimization opportunities identified

Critical: Unit economics must be viable for scaling decision!

10. Technical Architecture

10.1 System Components

Frontend:

  • Simple HTML form (text input + URL input + button)
  • Loading indicator
  • Results display page (single page, no tabs/navigation)

Backend:

  • Single API endpoint
  • Calls Claude API (Sonnet 4.5 or latest)
  • Parses response
  • Returns JSON to frontend

Data Storage:

  • None required (stateless POC)
  • Optional: Simple file storage or SQLite for demo examples

External Services:

  • Claude API (Anthropic) - required
  • Optional: URL fetch service for article text extraction
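For POC purposes the optional URL fetch step needs only crude text extraction. A sketch using the standard-library HTML parser; real-world pages would likely need a proper readability library:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Crude article-text extraction: collect visible text, skip script/style."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as one whitespace-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```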

10.2 Processing Flow

1. User submits text or URL
 ↓
2. Backend receives request
 ↓
3. If URL: Fetch article text
 ↓
4. Call Claude API with single prompt:
"Extract claims, evaluate each, provide verdicts"
 ↓
5. Claude API returns:
- Analysis summary
- Claims list
- Verdicts for each claim (with risk tiers)
- Article summary (optional)
- Quality gate results
 ↓
6. Backend parses response
 ↓
7. Frontend displays results with Mode 2 labeling

Key Simplification: Single API call does entire analysis
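The flow above can be sketched as one function with the model call injected, so the pipeline can be exercised without the Claude API. The JSON keys and helper names here are illustrative assumptions, not fixed by the spec:

```python
import json

def analyze(text_or_url: str, call_model, fetch_url=None) -> dict:
    """POC pipeline: resolve input, make the single model call, parse the JSON reply.

    `call_model(article_text) -> str` wraps the one Claude API request;
    `fetch_url(url) -> str` is the optional article-text fetcher.
    """
    # Step 3: if the input looks like a URL, fetch the article text first.
    if text_or_url.startswith(("http://", "https://")):
        if fetch_url is None:
            raise ValueError("URL input requires a fetcher")
        article = fetch_url(text_or_url)
    else:
        article = text_or_url
    # Steps 4-6: single comprehensive prompt, then parse the structured JSON.
    raw = call_model(article)
    result = json.loads(raw)
    if "claims" not in result:
        raise ValueError("model response missing 'claims'")
    return result
```

Injecting `call_model` also makes the failure modes (malformed JSON, missing keys) easy to test before any real API spend.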

10.3 AI Prompt Strategy

Single Comprehensive Prompt:
Task: Analyze this article and provide:

1. Identify the article's main thesis/conclusion
   - What is the article trying to argue or prove?
   - What is the primary claim or conclusion?

2. Extract 3-5 factual claims from the article
   - Note which claims are CENTRAL to the main thesis
   - Note which claims are SUPPORTING facts

3. For each claim:
   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
   - Assign confidence score (0-100%)
   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
   - Write brief reasoning (1-3 sentences)

4. Assess relationship between claims and main thesis:
   - Do the claims actually support the article's conclusion?
   - Are there logical leaps or unsupported inferences?
   - Is the article's framing misleading even if individual facts are accurate?

5. Run quality gates:
   - Check: ≥2 sources found
   - Attempt: Basic contradiction search
   - Calculate: Confidence scores
   - Verify: Structural integrity

6. Write context-aware analysis summary (4-6 sentences):
   - State article's main thesis
   - Report claims found and verdict distribution
   - Note if central claims are problematic
   - Assess whether evidence supports conclusion
   - Overall credibility considering claim importance

7. Write article summary (3-5 sentences: neutral summary of article content)

Return as structured JSON with quality gate results.

One prompt generates everything.

Critical Addition:

Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."

10.4 Technology Stack Suggestions

Frontend:

  • HTML + CSS + JavaScript (minimal framework)
  • OR: Next.js (if team prefers)
  • Hosted: Local machine OR Vercel/Netlify free tier

Backend:

  • Python Flask/FastAPI (simple REST API)
  • OR: Next.js API routes (if using Next.js)
  • Hosted: Local machine OR Railway/Render free tier

AKEL Integration:

  • Claude API via Anthropic SDK
  • Model: Claude Sonnet 4.5 or latest available

Database:

  • None (stateless acceptable)
  • OR: SQLite if want to store demo examples
  • OR: JSON files on disk
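If demo examples are stored, the SQLite option is only a few lines. A sketch; the table and column names are our own:

```python
import json
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the demo-example store."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS analyses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        article TEXT NOT NULL,
        result_json TEXT NOT NULL
    )""")
    return conn

def save_analysis(conn: sqlite3.Connection, article: str, result: dict) -> int:
    """Store one analysis result; returns its row id."""
    cur = conn.execute("INSERT INTO analyses (article, result_json) VALUES (?, ?)",
                       (article, json.dumps(result)))
    conn.commit()
    return cur.lastrowid

def load_analysis(conn: sqlite3.Connection, analysis_id: int) -> dict:
    """Load a stored analysis result by id."""
    row = conn.execute("SELECT result_json FROM analyses WHERE id = ?",
                       (analysis_id,)).fetchone()
    return json.loads(row[0])
```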

Deployment:

  • Local development environment sufficient for POC
  • Optional: Deploy to cloud for remote demos

11. Success Criteria

11.1 Minimum Success (POC Passes)

Required for GO decision:

  • ✅ AI extracts 3-5 factual claims automatically
  • ✅ AI provides verdict for each claim automatically
  • ✅ Verdicts are reasonable (≥70% make logical sense)
  • ✅ Analysis summary is coherent
  • ✅ Output is comprehensible to reviewers
  • ✅ Team/advisors understand the output
  • ✅ Team agrees approach has merit
  • ✅ Minimal or no manual editing needed (< 30% of analyses require manual intervention)
  • ✅ Cost efficiency acceptable (average cost per analysis < $0.05 USD target)
  • ✅ Cost scaling understood (data collected on article length vs. cost)
  • ✅ Optimization opportunities identified (≥2 potential improvements documented)

Quality Definition:

  • "Reasonable verdict" = Defensible given general knowledge
  • "Coherent summary" = Logically structured, grammatically correct
  • "Comprehensible" = Reviewers understand what analysis means

11.2 POC Fails If

Automatic NO-GO if any of these:

  • ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
  • ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
  • ❌ Output incomprehensible (reviewers can't understand analysis)
  • ❌ Requires manual editing for most analyses (> 50% need human correction)
  • ❌ Team loses confidence in AI-automated approach

11.3 Quality Thresholds

POC quality expectations:

Component | Quality Threshold | Definition
Claim Extraction | ≥70% accuracy | Identifies obvious factual claims, may miss some edge cases
Verdict Logic | ≥70% defensible | Verdicts are logical given reasoning provided
Reasoning Clarity | ≥70% clear | 1-3 sentences are understandable and relevant
Overall Analysis | ≥70% useful | Output helps user understand article claims

Analogy: "B student" quality (70-80%), not "A+" perfection yet

Not expecting:

  • 100% accuracy
  • Perfect claim coverage
  • Comprehensive evidence gathering
  • Flawless verdicts
  • Production polish

Expecting:

  • Reasonable claim extraction
  • Defensible verdicts
  • Understandable reasoning
  • Useful output

12. Test Cases

12.1 Test Case 1: Simple Factual Claim

Input: "Coffee reduces the risk of type 2 diabetes by 30%"

Expected Output:

  • Extract claim correctly
  • Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
  • Confidence: 70-90%
  • Risk tier: C (Low)
  • Reasoning: Mentions studies or evidence

Success: Verdict is reasonable and reasoning makes sense
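One plausible shape for the Test Case 1 result, sketched as a Python dict. The field names are assumptions for illustration only, not the POC1 API schema:

```python
# Hypothetical shape of a single-claim POC result; field names are
# illustrative and do not come from the POC1 API specification.
expected = {
    "claim": "Coffee reduces the risk of type 2 diabetes by 30%",
    "verdict": "PARTIALLY SUPPORTED",  # or WELL-SUPPORTED
    "confidence": 0.80,                # within the expected 70-90% band
    "risk_tier": "C",                  # Low risk
    "reasoning": "Observational studies associate coffee consumption with "
                 "lower type 2 diabetes risk, but the specific 30% figure "
                 "overstates the certainty of the effect size.",
}

# Sanity checks against the expected-output criteria above
assert expected["verdict"] in ("WELL-SUPPORTED", "PARTIALLY SUPPORTED")
assert 0.70 <= expected["confidence"] <= 0.90
assert expected["risk_tier"] == "C"
```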

12.2 Test Case 2: Complex News Article

Input: News article URL with multiple claims about politics/health/science

Expected Output:

  • Extract 3-5 key claims
  • Verdict for each (may vary: some supported, some uncertain, some refuted)
  • Coherent analysis summary
  • Article summary
  • Risk tiers assigned appropriately

Success: Claims identified are actually from article, verdicts are reasonable

12.3 Test Case 3: Controversial Topic

Input: Article on contested political or scientific topic

Expected Output:

  • Balanced analysis
  • Acknowledges uncertainty where appropriate
  • Doesn't overstate confidence
  • Reasoning shows awareness of complexity

Success: Analysis is fair and doesn't show obvious bias

12.4 Test Case 4: Clearly False Claim

Input: Article with obviously false claim (e.g., "The Earth is flat")

Expected Output:

  • Extract claim
  • Verdict: REFUTED
  • High confidence (> 90%)
  • Risk tier: C (Low - established fact)
  • Clear reasoning

Success: AI correctly identifies false claim with high confidence

12.5 Test Case 5: Genuinely Uncertain Claim

Input: Article with claim where evidence is genuinely mixed

Expected Output:

  • Extract claim
  • Verdict: UNCERTAIN
  • Moderate confidence (40-60%)
  • Reasoning explains why uncertain

Success: AI recognizes uncertainty and doesn't overstate confidence

12.6 Test Case 6: High-Risk Medical Claim

Input: Article making medical claims

Expected Output:

  • Extract claim
  • Verdict: [appropriate based on evidence]
  • Risk tier: A (High - medical)
  • Red label displayed
  • Clear disclaimer about not being medical advice

Success: Risk tier correctly assigned, appropriate warnings shown
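Risk-tier assignment for these test cases could be sketched as a simple domain lookup. The domain list and helper name are hypothetical; in the POC the actual tiering logic lives in the AI prompts:

```python
# Illustrative only: the real tier assignment is done by the AI pipeline.
HIGH_RISK_DOMAINS = {"medical", "legal", "financial"}


def assign_risk_tier(domain: str) -> str:
    """Tier A = high risk (e.g. medical claims), tier C = low risk."""
    return "A" if domain in HIGH_RISK_DOMAINS else "C"
```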

13. POC Decision Gate

13.1 Decision Framework

After POC testing complete, team makes one of three decisions:

Option A: GO (Proceed to POC2)

Conditions:

  • AI quality ≥70% without manual editing
  • Basic claim → verdict pipeline validated
  • Internal + advisor feedback positive
  • Technical feasibility confirmed
  • Team confident in direction
  • Clear path to improving AI quality to ≥90%

Next Steps:

  • Plan POC2 development (add scenarios)
  • Design scenario architecture
  • Expand to Evidence Model structure
  • Test with more complex articles

Option B: NO-GO (Pivot or Stop)

Conditions:

  • AI quality < 60%
  • Requires manual editing for most analyses (> 50%)
  • Feedback indicates fundamental flaws
  • Cost/effort not justified by value
  • No clear path to improvement

Next Steps:

  • Pivot: Change to a hybrid human-AI approach (accept that manual review is required)
  • Stop: Conclude approach not viable, revisit later

Option C: ITERATE (Improve POC)

Conditions:

  • Concept has merit but execution needs work
  • Specific improvements identified
  • Addressable with better prompts/approach
  • AI quality between 60-70%

Next Steps:

  • Improve AI prompts
  • Test different approaches
  • Re-run POC with improvements
  • Then make GO/NO-GO decision

13.2 Decision Criteria Summary

AI Quality < 60% → NO-GO (approach doesn't work)
AI Quality 60-70% → ITERATE (improve and retry)
AI Quality ≥ 70% → GO (proceed to POC2)
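These thresholds map directly to a small decision function. A minimal sketch; the function name is illustrative:

```python
def poc_decision(ai_quality: float) -> str:
    """Map measured AI quality (0.0-1.0) to the POC decision gate."""
    if ai_quality < 0.60:
        return "NO-GO"    # approach doesn't work
    if ai_quality < 0.70:
        return "ITERATE"  # improve prompts and retry
    return "GO"           # proceed to POC2


print(poc_decision(0.55), poc_decision(0.65), poc_decision(0.75))
```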

14. Key Risks & Mitigations

14.1 Risk: AI Quality Not Good Enough

Likelihood: Medium-High
Impact: POC fails 

Mitigation:

  • Extensive prompt engineering and testing
  • Use best available AI models (Sonnet 4.5)
  • Test with diverse article types
  • Iterate on prompts based on results

Acceptance: This is what POC tests - be ready for failure

14.2 Risk: AI Consistency Issues

Likelihood: Medium
Impact: Works sometimes, fails other times 

Mitigation:

  • Test with 10+ diverse articles
  • Measure success rate honestly
  • Improve prompts to increase consistency

Acceptance: Some variability OK if average quality ≥70%

14.3 Risk: Output Incomprehensible

Likelihood: Low-Medium
Impact: Users can't understand analysis 

Mitigation:

  • Create clear explainer document
  • Iterate on output format
  • Test with non-technical reviewers
  • Simplify language if needed

Acceptance: Iterate until comprehensible

14.4 Risk: API Rate Limits / Costs

Likelihood: Low
Impact: System slow or expensive 

Mitigation:

  • Monitor API usage
  • Implement retry logic
  • Estimate costs before scaling

Acceptance: POC can be slow and expensive (optimization later)
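The retry mitigation above can be sketched as exponential backoff with jitter. A minimal example, assuming the API request is wrapped in a zero-argument callable; names are illustrative:

```python
import random
import time


def call_with_retry(call, max_attempts=3, base_delay=1.0):
    """Retry a flaky API call with exponential backoff and jitter.

    Sufficient for POC-scale traffic, where rate-limit errors are transient.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Back off 1s, 2s, 4s... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```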

14.5 Risk: Scope Creep

Likelihood: Medium
Impact: POC becomes too complex 

Mitigation:

  • Strict scope discipline
  • Say NO to feature additions
  • Keep focus on core question

Acceptance: POC is minimal by design

15. POC Philosophy

15.1 Core Principles



1. Build Less, Learn More

  • Minimum features to test hypothesis
  • Don't build unvalidated features
  • Focus on core question only

2. Fail Fast

  • Quick test of hardest part (AI capability)
  • Accept that POC might fail
  • Better to discover issues early
  • Honest assessment over optimistic hope

3. Test First, Build Second

  • Validate AI can do this before building platform
  • Don't assume it will work
  • Let results guide decisions

4. Automation First

  • No manual editing allowed
  • Tests scalability, not just feasibility
  • Proves approach can work at scale

5. Honest Assessment

  • Don't cherry-pick examples
  • Don't manually fix bad outputs
  • Document failures openly
  • Make data-driven decisions

15.2 What POC Is

✅ Testing AI capability without humans
✅ Proving core technical concept
✅ Fast validation of approach
✅ Honest assessment of feasibility 

15.3 What POC Is NOT

❌ Building a product
❌ Production-ready system
❌ Feature-complete platform
❌ Perfectly accurate analysis
❌ Polished user experience 

16. Success: Clear Path Forward

If POC succeeds (≥70% AI quality):

  • ✅ Approach validated
  • ✅ Proceed to POC2 (add scenarios)
  • ✅ Design full Evidence Model structure
  • ✅ Test multi-scenario comparison
  • ✅ Focus on improving AI quality from 70% → 90%

If POC fails (< 60% AI quality):

  • ✅ Learn what doesn't work
  • ✅ Pivot to different approach
  • ✅ OR wait for better AI technology
  • ✅ Avoid wasting resources on non-viable approach

Either way, POC provides clarity.

17. Related Pages

Document Status: ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)

NFR-POC-11: LLM Provider Abstraction (POC1)

Requirement: POC1 MUST implement LLM abstraction layer with support for multiple providers.

POC1 Implementation:

  • Primary Provider: Anthropic Claude API
  • Stage 1: Claude Haiku 4
  • Stage 2: Claude Sonnet 3.5 (cached)
  • Stage 3: Claude Sonnet 3.5
  • Provider Interface: Abstract LLMProvider interface implemented
  • Configuration: Environment variables for provider selection
  • LLM_PRIMARY_PROVIDER=anthropic
  • LLM_STAGE1_MODEL=claude-haiku-4
  • LLM_STAGE2_MODEL=claude-sonnet-3-5
  • Failover: Basic error handling with cache fallback for Stage 2
  • Cost Tracking: Log provider name and cost per request
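The abstraction described above might look like the following sketch. `LLMProvider` and the environment variables come from this requirement; the registry, helper functions, and default values shown here are assumptions for illustration:

```python
import os
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Abstract provider interface: all LLM calls go through this layer."""

    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        ...


class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> str:
        # A real implementation would call the Anthropic API here and log
        # the provider name and per-request cost for cost tracking.
        raise NotImplementedError("wire up the Anthropic client here")


# Provider selection via environment variable, no code changes required.
PROVIDERS = {"anthropic": AnthropicProvider}


def get_provider() -> LLMProvider:
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return PROVIDERS[name]()


def get_stage_model(stage: int) -> str:
    # Per-stage model overrides (LLM_STAGE1_MODEL etc.), with POC1 defaults
    defaults = {1: "claude-haiku-4", 2: "claude-sonnet-3-5", 3: "claude-sonnet-3-5"}
    return os.environ.get(f"LLM_STAGE{stage}_MODEL", defaults[stage])
```

Adding the future OpenAI provider then becomes a new `LLMProvider` subclass plus one registry entry, which is the refactoring cost this requirement is designed to avoid.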

Future (POC2/Beta):

  • Secondary provider (OpenAI) with automatic failover
  • Admin API for runtime provider switching
  • Cost comparison dashboard
  • Cross-provider output verification

Success Criteria:

  • All LLM calls go through abstraction layer (no direct API calls)
  • Provider can be changed via environment variable without code changes
  • Cost tracking includes provider name in logs
  • Stage 2 falls back to cache on provider failure

Implementation: See POC1 API & Schemas Specification Section 6

Dependencies:

  • NFR-14 (Main Requirements)
  • Design Decision 9
  • Architecture Section 2.2

Priority: HIGH (P1)

Rationale: Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.