POC Requirements (POC1 & POC2)
Status: ✅ Approved for Development
Version: 2.0 (Updated after Specification Cross-Check)
Goal: Prove that AI can extract claims and determine verdicts automatically without human intervention
1. POC Overview
1.1 What POC Tests
Core Question:
Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
What we're proving:
- AI can identify factual claims from text
- AI can evaluate those claims and produce verdicts
- Output is comprehensible and useful
- Fully automated approach is viable
What we're NOT testing:
- Scenario generation (deferred to POC2)
- Evidence display (deferred to POC2)
- Production scalability
- Perfect accuracy
- Complete feature set
1.2 Scenarios Deferred to POC2
Intentional Simplification:
Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are deliberately excluded from POC1.
Rationale:
- POC1 tests: Can AI extract claims and generate verdicts?
- POC2 will add: Scenario generation and management
- Open questions remain: Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
Design Decision:
Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
No Risk:
Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
- Faster POC1 validation
- Learning from POC1 to inform scenario design
- Iterative approach: fail fast if basic AI doesn't work
- Flexibility to adjust scenario architecture based on POC1 insights
Full System Workflow (Future):
Claims → Scenarios → Evidence → Verdicts
POC1 Simplified Workflow:
Claims → Verdicts (scenarios implicit in reasoning)
2. POC Output Specification
2.1 Component 1: ANALYSIS SUMMARY (Context-Aware)
What: Context-aware overview that considers both individual claims AND their relationship to the article's main argument
Length: 4-6 sentences
Content (Required Elements):
1. Article's main thesis/claim - What is the article trying to argue or prove?
2. Claim count and verdicts - How many claims analyzed, distribution of verdicts
3. Central vs. supporting claims - Which claims are central to the article's argument?
4. Relationship assessment - Do the claims support the article's conclusion?
5. Overall credibility - Final assessment considering claim importance
Critical Innovation:
POC1 tests whether AI can understand that article credibility ≠ simple average of claim verdicts. An article might:
- Make accurate supporting facts but draw unsupported conclusions
- Have one false central claim that invalidates the whole argument
- Misframe accurate information to mislead
Good Example (Context-Aware):
This article argues that coffee cures cancer based on its antioxidant
content. We analyzed 3 factual claims: 2 about coffee's chemical
properties are well-supported, but the main causal claim is refuted
by current evidence. The article confuses correlation with causation.
Overall assessment: MISLEADING - makes an unsupported medical claim
despite citing some accurate facts.
Poor Example (Simple Aggregation - Don't Do This):
This article makes 3 claims. 2 are well-supported and 1 is refuted.
Overall assessment: mostly accurate (67% accurate).
↑ This misses that the refuted claim IS the article's main point!
What POC1 Tests:
Can AI identify and assess:
- ✅ The article's main thesis/conclusion?
- ✅ Which claims are central vs. supporting?
- ✅ Whether the evidence supports the conclusion?
- ✅ Overall credibility considering logical structure?
If AI Cannot Do This:
That's valuable to learn in POC1! We'll:
- Note as limitation
- Fall back to simple aggregation with warning
- Design explicit article-level analysis for POC2
2.2 Component 2: CLAIMS IDENTIFICATION
What: List of factual claims extracted from article
Format: Numbered list
Quantity: 3-5 claims
Requirements:
- Factual claims only (not opinions/questions)
- Clearly stated
- Automatically extracted by AI
Example:
CLAIMS IDENTIFIED:
[1] Coffee reduces diabetes risk by 30%
[2] Coffee improves heart health
[3] Decaf has same benefits as regular
[4] Coffee prevents Alzheimer's completely
2.3 Component 3: CLAIMS VERDICTS
What: Verdict for each claim identified
Format: Per claim structure
Required Elements:
- Verdict Label: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
- Confidence Score: 0-100%
- Brief Reasoning: 1-3 sentences explaining why
- Risk Tier: A (High) / B (Medium) / C (Low) - for demonstration
Example:
VERDICTS:
[1] WELL-SUPPORTED (85%) [Risk: C]
Multiple studies confirm 25-30% risk reduction with regular consumption.
[2] UNCERTAIN (65%) [Risk: B]
Evidence is mixed. Some studies show benefits, others show no effect.
[3] PARTIALLY SUPPORTED (60%) [Risk: C]
Some benefits overlap, but caffeine-related benefits are reduced in decaf.
[4] REFUTED (90%) [Risk: B]
No evidence for complete prevention. Claim is significantly overstated.
Risk Tier Display:
- Tier A (Red): High Risk - Medical/Legal/Safety/Elections
- Tier B (Yellow): Medium Risk - Policy/Science/Causality
- Tier C (Green): Low Risk - Facts/Definitions/History
Note: Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
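To make the per-claim structure concrete, the sketch below shows one way to represent a verdict record in code. It is a minimal illustration assuming Python for the backend (per Section 10.4); the field names and display helper are assumptions, not part of this spec.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ClaimVerdict:
    """One claim verdict, mirroring the Required Elements above.
    Field names are illustrative, not mandated by the spec."""
    claim_id: int
    claim_text: str
    verdict: Literal["WELL-SUPPORTED", "PARTIALLY SUPPORTED",
                     "UNCERTAIN", "REFUTED"]
    confidence: int  # 0-100 (%)
    risk_tier: Literal["A", "B", "C"]
    reasoning: str   # 1-3 sentences

    def display(self) -> str:
        # Renders the per-claim line in the example format above.
        return (f"[{self.claim_id}] {self.verdict} ({self.confidence}%) "
                f"[Risk: {self.risk_tier}]\n  {self.reasoning}")
```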
2.4 Component 4: ARTICLE SUMMARY (Optional)
What: Brief summary of original article content
Length: 3-5 sentences
Tone: Neutral (article's position, not FactHarbor's analysis)
Example:
ARTICLE SUMMARY:
Health News Today article discusses coffee benefits, citing studies
on diabetes and Alzheimer's. Author highlights research linking coffee
to disease prevention. Recommends 2-3 cups daily for optimal health.
2.5 Component 5: USAGE STATISTICS (Cost Tracking)
What: LLM usage metrics for cost optimization and scaling decisions
Purpose:
- Understand cost per analysis
- Identify optimization opportunities
- Project costs at scale
- Inform architecture decisions
Display Format:
USAGE STATISTICS:
• Article: 2,450 words (12,300 characters)
• Input tokens: 15,234
• Output tokens: 892
• Total tokens: 16,126
• Estimated cost: $0.24 USD
• Response time: 8.3 seconds
• Cost per claim: $0.048
• Model: claude-sonnet-4-20250514
Why This Matters:
At scale, LLM costs are critical:
- 10,000 articles/month ≈ $200-500/month
- 100,000 articles/month ≈ $2,000-5,000/month
- Cost optimization can reduce expenses 30-50%
What POC1 Learns:
- How cost scales with article length
- Prompt optimization opportunities (caching, compression)
- Output verbosity tradeoffs
- Model selection strategy (Sonnet vs. Haiku)
- Article length limits (if needed)
Implementation:
- Claude API already returns usage data
- No extra API calls needed
- Display to user + log for aggregate analysis
- Test with articles of varying lengths
Critical for GO/NO-GO: Unit economics must be viable at scale!
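Since the Claude API returns token counts with every response, capturing these metrics requires no extra calls. A minimal sketch, assuming the Anthropic Python SDK; the per-million-token prices are placeholders and must be replaced with current published pricing:

```python
import time
import anthropic

# Placeholder prices (USD per million tokens) - verify against current
# Claude pricing before relying on the cost estimate.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

def analyze_with_usage(prompt: str, model: str) -> dict:
    start = time.time()
    response = client.messages.create(
        model=model,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    usage = response.usage  # token counts come back with every response
    cost = (usage.input_tokens * INPUT_PRICE_PER_MTOK +
            usage.output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return {
        "model": model,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
        "estimated_cost_usd": round(cost, 4),
        "response_time_s": round(time.time() - start, 1),
        "text": response.content[0].text,
    }
```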
2.6 Total Output Size
Combined: 220-340 words
- Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
- Claims Identification: 30-50 words
- Claims Verdicts: 100-150 words
- Article Summary: 30-50 words (optional)
Note: Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
3. What's NOT in POC Scope
3.1 Feature Exclusions
The following are explicitly excluded from POC:
Content Features:
- ❌ Scenarios (deferred to POC2)
- ❌ Evidence display (supporting/opposing lists)
- ❌ Source links (clickable references)
- ❌ Detailed reasoning chains
- ❌ Source quality ratings (shown but not detailed)
- ❌ Contradiction detection (basic only)
- ❌ Risk assessment (shown but not workflow-integrated)
Platform Features:
- ❌ User accounts / authentication
- ❌ Saved history
- ❌ Search functionality
- ❌ Claim comparison
- ❌ User contributions
- ❌ Commenting system
- ❌ Social sharing
Technical Features:
- ❌ Browser extensions
- ❌ Mobile apps
- ❌ API endpoints
- ❌ Webhooks
- ❌ Export features (PDF, CSV)
Quality Features:
- ❌ Accessibility (WCAG compliance)
- ❌ Multilingual support
- ❌ Mobile optimization
- ❌ Media verification (images/videos)
Production Features:
- ❌ Security hardening
- ❌ Privacy compliance (GDPR)
- ❌ Terms of service
- ❌ Monitoring/logging
- ❌ Error tracking
- ❌ Analytics
- ❌ A/B testing
4. POC Simplifications vs. Full System
4.1 Architecture Comparison
POC Architecture (Simplified):
User Input → Single AKEL Call → Output Display
(all processing)
Full System Architecture:
User Input → Claim Extractor → Claim Classifier → Scenario Generator
→ Evidence Summarizer → Contradiction Detector → Verdict Generator
→ Quality Gates → Publication → Output Display
Key Differences:
| Aspect | POC1 | Full System |
|---|---|---|
| Processing | Single API call | Multi-component pipeline |
| Scenarios | None (implicit) | Explicit entities with versioning |
| Evidence | Basic retrieval | Comprehensive with quality scoring |
| Quality Gates | Simplified (4 basic checks) | Full validation infrastructure |
| Workflow | 3 steps (input/process/output) | 6 phases with gates |
| Data Model | Stateless (no database) | PostgreSQL + Redis + S3 |
| Architecture | Single prompt to Claude | AKEL Orchestrator + Components |
4.2 Workflow Comparison
POC1 Workflow:
- User submits text/URL
2. Single AKEL call (all processing in one prompt)
3. Display results
Total: 3 steps, 10-18 seconds
Full System Workflow:
1. Claim Submission (extraction, normalization, clustering)
2. Scenario Building (definitions, assumptions, boundaries)
3. Evidence Handling (retrieval, assessment, linking)
4. Verdict Creation (synthesis, reasoning, approval)
5. Public Presentation (summaries, landscapes, deep dives)
6. Time Evolution (versioning, re-evaluation triggers)
Total: 6 phases with quality gates, 10-30 seconds
4.3 Why POC is Simplified
Engineering Rationale:
1. Test core capability first: Can AI do basic fact-checking without humans?
2. Fail fast: If AI can't generate reasonable verdicts, pivot early
3. Learn before building: POC1 insights inform full architecture
4. Iterative approach: Add complexity only after validating foundations
5. Resource efficiency: Don't build full system if core concept fails
Acceptable Trade-offs:
- ✅ POC proves AI capability (most risky assumption)
- ✅ POC validates user comprehension (can people understand output?)
- ❌ POC doesn't validate full workflow (test in Beta)
- ❌ POC doesn't validate scale (test in Beta)
- ❌ POC doesn't validate scenario architecture (design in POC2)
4.4 Gap Between POC1 and POC2/Beta
What needs to be built for POC2:
- Scenario generation component
- Evidence Model structure (full)
- Scenario-evidence linking
- Multi-interpretation comparison
- Truth landscape visualization
What needs to be built for Beta:
- Multi-component AKEL pipeline
- Quality gate infrastructure
- Review workflow system
- Audit sampling framework
- Production data model
- Federation architecture (Release 1.0)
POC1 → POC2 is significant architectural expansion.
5. Publication Mode & Labeling
5.1 POC Publication Mode
Mode: Mode 2 (AI-Generated, No Prior Human Review)
Per FactHarbor Specification Section 11 "POC v1 Behavior":
- Produces public AI-generated output
- No human approval gate
- Clear AI-Generated labeling
- All quality gates active (simplified)
- Risk tier classification shown (demo)
5.2 User-Facing Labels
Primary Label (top of analysis):
╔════════════════════════════════════════════════════════════╗
║ [AI-GENERATED - POC/DEMO] ║
║ ║
║ This analysis was produced entirely by AI and has not ║
║ been human-reviewed. Use for demonstration purposes. ║
║ ║
║ Source: AI/AKEL v1.0 (POC) ║
║ Review Status: Not Reviewed (Proof-of-Concept) ║
║ Quality Gates: 4/4 Passed (Simplified) ║
║ Last Updated: [timestamp] ║
╚════════════════════════════════════════════════════════════╝
Per-Claim Risk Labels:
- [Risk: A] 🔴 High Risk (Medical/Legal/Safety)
- [Risk: B] 🟡 Medium Risk (Policy/Science)
- [Risk: C] 🟢 Low Risk (Facts/Definitions)
5.3 Display Requirements
Must Show:
- AI-Generated status (prominent)
- POC/Demo disclaimer
- Risk tier per claim
- Confidence scores (0-100%)
- Quality gate status (passed/failed)
- Timestamp
Must NOT Claim:
- Human review
- Production quality
- Medical/legal advice
- Authoritative verdicts
- Complete accuracy
5.4 Mode 2 vs. Full System Publication
| Element | POC Mode 2 | Full System Mode 2 | Full System Mode 3 |
|---|---|---|---|
| Label | AI-Generated (POC) | AI-Generated | AKEL-Generated |
| Review | None | None | Human-Reviewed |
| Quality Gates | 4 (simplified) | 6 (full) | 6 (full) + Human |
| Audit | None (POC) | Sampling (5-50%) | Pre-publication |
| Risk Display | Demo only | Workflow-integrated | Validated |
| User Actions | View only | Flag for review | Trust rating |
6. Quality Gates (Simplified Implementation)
6.1 Overview
Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements simplified versions of the 4 mandatory gates.
The 4 Mandatory Gates:
1. Source Quality
2. Contradiction Search (MANDATORY)
3. Uncertainty Quantification
4. Structural Integrity
POC Implements Simplified Versions:
- Focus on demonstrating concept
- Basic implementations sufficient
- Failures displayed to user (not blocking)
- Full system has comprehensive validation
6.2 Gate 1: Source Quality (Basic)
Full System Requirements:
- Primary sources identified and accessible
- Source reliability scored against whitelist
- Citation completeness verified
- Publication dates checked
- Author credentials validated
POC Implementation:
- ✅ At least 2 sources found
- ✅ Sources accessible (URLs valid)
- ❌ No whitelist checking
- ❌ No credential validation
- ❌ No comprehensive reliability scoring
Pass Criteria: ≥2 accessible sources found
Failure Handling: Display error message, don't generate verdict
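A minimal sketch of the simplified pass check, assuming a plain HTTP reachability test is an acceptable stand-in for "accessible" in the POC:

```python
import requests

def gate_source_quality(source_urls: list[str]) -> dict:
    """POC Gate 1: pass if >= 2 cited URLs respond without an error.
    A HEAD request stands in for full accessibility checking."""
    accessible = 0
    for url in source_urls:
        try:
            resp = requests.head(url, timeout=5, allow_redirects=True)
            if resp.status_code < 400:
                accessible += 1
        except requests.RequestException:
            continue  # unreachable sources simply don't count
    return {
        "name": "Source Quality",
        "passed": accessible >= 2,
        "detail": f"{accessible} sources found",
    }
```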
6.3 Gate 2: Contradiction Search (Basic)
Full System Requirements:
- Counter-evidence actively searched
- Reservations and limitations identified
- Alternative interpretations explored
- Bubble detection (echo chambers, conspiracy theories)
- Cross-cultural and international perspectives
- Academic literature (supporting AND opposing)
POC Implementation:
- ✅ Basic search for counter-evidence
- ✅ Identify obvious contradictions
- ❌ No comprehensive academic search
- ❌ No bubble detection
- ❌ No systematic alternative interpretation search
- ❌ No international perspective verification
Pass Criteria: Basic contradiction search attempted
Failure Handling: Note "limited contradiction search" in output
6.4 Gate 3: Uncertainty Quantification (Basic)
Full System Requirements:
- Confidence scores calculated for all claims/verdicts
- Limitations explicitly stated
- Data gaps identified and disclosed
- Strength of evidence assessed
- Alternative scenarios considered
POC Implementation:
- ✅ Confidence scores (0-100%)
- ✅ Basic uncertainty acknowledgment
- ❌ No detailed limitation disclosure
- ❌ No data gap identification
- ❌ No alternative scenario consideration (deferred to POC2)
Pass Criteria: Confidence score assigned
Failure Handling: Show "Confidence: Unknown" if calculation fails
6.5 Gate 4: Structural Integrity (Basic)
Full System Requirements:
- No hallucinations detected (fact-checking against sources)
- Logic chain valid and traceable
- References accessible and verifiable
- No circular reasoning
- Premises clearly stated
POC Implementation:
- ✅ Basic coherence check
- ✅ References accessible
- ❌ No comprehensive hallucination detection
- ❌ No formal logic validation
- ❌ No premise extraction and verification
Pass Criteria: Output is coherent and references are accessible
Failure Handling: Display error message
6.6 Quality Gate Display
POC shows simplified status:
Quality Gates: 4/4 Passed (Simplified)
✓ Source Quality: 3 sources found
✓ Contradiction Search: Basic search completed
✓ Uncertainty: Confidence scores assigned
✓ Structural Integrity: Output coherent
If any gate fails:
Quality Gates: 3/4 Passed (Simplified)
✓ Source Quality: 3 sources found
✗ Contradiction Search: Search failed - limited evidence
✓ Uncertainty: Confidence scores assigned
✓ Structural Integrity: Output coherent
Note: This analysis has limited evidence. Use with caution.
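A minimal rendering sketch that produces the status block above from a list of gate results; the gate dict shape (name/passed/detail) is an assumption, matching the Gate 1 sketch in Section 6.2:

```python
def render_gates(gates: list[dict]) -> str:
    """Render the simplified quality gate status block."""
    passed = sum(1 for g in gates if g["passed"])
    lines = [f"Quality Gates: {passed}/{len(gates)} Passed (Simplified)"]
    for g in gates:
        mark = "✓" if g["passed"] else "✗"
        lines.append(f"{mark} {g['name']}: {g['detail']}")
    if passed < len(gates):
        lines.append("Note: This analysis has limited evidence. "
                     "Use with caution.")
    return "\n".join(lines)
```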
6.7 Simplified vs. Full System
| Gate | POC (Simplified) | Full System |
|---|---|---|
| Source Quality | ≥2 sources accessible | Whitelist scoring, credentials, comprehensiveness |
| Contradiction | Basic search | Systematic academic + media + international |
| Uncertainty | Confidence % assigned | Detailed limitations, data gaps, alternatives |
| Structural | Coherence check | Hallucination detection, logic validation, premise check |
POC Goal: Demonstrate that quality gates are possible, not perfect implementation.
7. AKEL Architecture Comparison
7.1 POC AKEL (Simplified)
Implementation:
- Single Claude API call (Sonnet 4.5)
- One comprehensive prompt
- All processing in single request
- No separate components
- No orchestration layer
Prompt Structure:
Task: Analyze this article and provide:
1. Extract 3-5 factual claims
2. For each claim:
- Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
- Assign confidence score (0-100%)
- Assign risk tier (A/B/C)
- Write brief reasoning (1-3 sentences)
3. Generate analysis summary (4-6 sentences)
4. Generate article summary (3-5 sentences)
5. Run basic quality checks
Return as structured JSON.
Processing Time: 10-18 seconds (estimate)
7.2 Full System AKEL (Production)
Architecture:
AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)
Processing:
- Parallel processing where possible
- Separate component calls
- Quality gates between phases
- Audit sampling selection
- Cross-node coordination (federated mode)
Processing Time: 10-30 seconds (full pipeline)
7.3 Why POC Uses Single Call
Advantages:
- ✅ Simpler to implement
- ✅ Faster POC development
- ✅ Easier to debug
- ✅ Proves AI capability
- ✅ Good enough for concept validation
Limitations:
- ❌ No component reusability
- ❌ No parallel processing
- ❌ All-or-nothing (can't partially succeed)
- ❌ Harder to improve individual components
- ❌ No audit sampling
Acceptable Trade-off:
POC tests "Can AI do this?" not "How should we architect it?"
Full component architecture comes in Beta after POC validates concept.
7.4 Evolution Path
POC1: Single prompt → Prove concept
POC2: Add scenario component → Test full pipeline
Beta: Multi-component AKEL → Production architecture
Release 1.0: Full AKEL + Federation → Scale
8. Functional Requirements
FR-POC-1: Article Input
Requirement: User can submit article for analysis
Functionality:
- Text input field (paste article text, up to 5000 characters)
- URL input field (paste article URL)
- "Analyze" button to trigger processing
- Loading indicator during analysis
Excluded:
- No user authentication
- No claim history
- No search functionality
- No saved templates
Acceptance Criteria:
- User can paste text from article
- User can paste URL of article
- System accepts input and triggers analysis
FR-POC-2: Claim Extraction (Fully Automated)
Requirement: AI automatically extracts 3-5 factual claims
Functionality:
- AI reads article text
- AI identifies factual claims (not opinions/questions)
- AI extracts 3-5 most important claims
- System displays numbered list
Critical: NO MANUAL EDITING ALLOWED
- AI selects which claims to extract
- AI identifies factual vs. non-factual
- System processes claims as extracted
- No human curation or correction
Error Handling:
- If extraction fails: Display error message
- User can retry with different input
- No manual intervention to fix extraction
Acceptance Criteria:
- AI extracts 3-5 claims automatically
- Claims are factual (not opinions)
- Claims are clearly stated
- No manual editing required
FR-POC-3: Verdict Generation (Fully Automated)
Requirement: AI automatically generates verdict for each claim
Functionality:
- For each claim, AI:
- Evaluates claim based on available evidence/knowledge
- Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
- Assigns confidence score (0-100%)
- Assigns risk tier (A/B/C)
- Writes brief reasoning (1-3 sentences)
- System displays verdict for each claim
Critical: NO MANUAL EDITING ALLOWED
- AI computes verdicts based on evidence
- AI generates confidence scores
- AI writes reasoning
- No human review or adjustment
Error Handling:
- If verdict generation fails: Display error message
- User can retry
- No manual intervention to adjust verdicts
Acceptance Criteria:
- Each claim has a verdict
- Confidence score is displayed (0-100%)
- Risk tier is displayed (A/B/C)
- Reasoning is understandable (1-3 sentences)
- Verdict is defensible given reasoning
- All generated automatically by AI
FR-POC-4: Analysis Summary (Fully Automated)
Requirement: AI generates brief summary of analysis
Functionality:
- AI summarizes findings in 4-6 sentences:
- How many claims found
- Distribution of verdicts
- Overall assessment
- System displays at top of results
Critical: NO MANUAL EDITING ALLOWED
Acceptance Criteria:
- Summary is coherent
- Accurately reflects analysis
- 4-6 sentences
- Automatically generated
FR-POC-5: Article Summary (Fully Automated, Optional)
Requirement: AI generates brief summary of original article
Functionality:
- AI summarizes article content (not FactHarbor's analysis)
- 3-5 sentences
- System displays
Note: Optional - can skip if time is limited
Critical: NO MANUAL EDITING ALLOWED
Acceptance Criteria:
- Summary is neutral (article's position)
- Accurately reflects article content
- 3-5 sentences
- Automatically generated
FR-POC-6: Publication Mode Display
Requirement: Clear labeling of AI-generated content
Functionality:
- Display Mode 2 publication label
- Show POC/Demo disclaimer
- Display risk tiers per claim
- Show quality gate status
- Display timestamp
Acceptance Criteria:
- Label is prominent and clear
- User understands this is AI-generated POC output
- Risk tiers are color-coded
- Quality gate status is visible
FR-POC-7: Quality Gate Execution
Requirement: Execute simplified quality gates
Functionality:
- Check source quality (basic)
- Attempt contradiction search (basic)
- Calculate confidence scores
- Verify structural integrity (basic)
- Display gate results
Acceptance Criteria:
- All 4 gates attempted
- Pass/fail status displayed
- Failures explained to user
- Gates don't block publication (POC mode)
9. Non-Functional Requirements
NFR-POC-1: Fully Automated Processing
Requirement: Complete AI automation with zero manual intervention
Critical Rule: NO MANUAL EDITING AT ANY STAGE
What this means:
- Claims: AI selects (no human curation)
- Scenarios: N/A (deferred to POC2)
- Evidence: AI evaluates (no human selection)
- Verdicts: AI determines (no human adjustment)
- Summaries: AI writes (no human editing)
Pipeline:
User Input → AKEL Processing → Output Display
↓
ZERO human editing
If AI output is poor:
- ❌ Do NOT manually fix it
- ✅ Document the failure
- ✅ Improve prompts and retry
- ✅ Accept that POC might fail
Why this matters:
- Tests whether AI can do this without humans
- Validates scalability (humans can't review every analysis)
- Honest test of technical feasibility
NFR-POC-2: Performance
Requirement: Analysis completes in reasonable time
Acceptable Performance:
- Processing time: 1-5 minutes (acceptable for POC)
- Display loading indicator to user
- Show progress if possible ("Extracting claims...", "Generating verdicts...")
Not Required:
- Production-level speed (< 30 seconds)
- Optimization for scale
- Caching
Acceptance Criteria:
- Analysis completes within 5 minutes
- User sees loading indicator
- No timeout errors
NFR-POC-3: Reliability
Requirement: System works for manual testing sessions
Acceptable:
- Occasional errors (< 20% failure rate)
- Manual restart if needed
- Display error messages clearly
Not Required:
- 99.9% uptime
- Automatic error recovery
- Production monitoring
Acceptance Criteria:
- System works for test demonstrations
- Errors are handled gracefully
- User receives clear error messages
NFR-POC-4: Environment
Requirement: Runs on simple infrastructure
Acceptable:
- Single machine or simple cloud setup
- No distributed architecture
- No load balancing
- No redundancy
- Local development environment viable
Not Required:
- Production infrastructure
- Multi-region deployment
- Auto-scaling
- Disaster recovery
NFR-POC-5: Cost Efficiency Tracking
Requirement: Track and display LLM usage metrics to inform optimization decisions
Must Track:
- Input tokens (article + prompt)
- Output tokens (generated analysis)
- Total tokens
- Estimated cost (USD)
- Response time (seconds)
- Article length (words/characters)
Must Display:
- Usage statistics in UI (Component 5)
- Cost per analysis
- Cost per claim extracted
Must Log:
- Aggregate metrics for analysis
- Cost distribution by article length
- Token efficiency trends
Purpose:
- Understand unit economics
- Identify optimization opportunities
- Project costs at scale
- Inform architecture decisions (caching, model selection, etc.)
Acceptance Criteria:
- ✅ Usage data displayed after each analysis
- ✅ Metrics logged for aggregate analysis
- ✅ Cost calculated accurately (Claude API pricing)
- ✅ Test cases include varying article lengths
- ✅ POC1 report includes cost analysis section
Success Target:
- Average cost per analysis < $0.05 USD
- Cost scaling behavior understood (linear/exponential)
- 2+ optimization opportunities identified
Critical: Unit economics must be viable for scaling decision!
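A minimal logging sketch, assuming append-only JSON Lines on disk (the file name is illustrative) and reusing the usage dict shape from the Section 2.5 sketch; one record per analysis is enough for the aggregate cost studies described above:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

METRICS_LOG = Path("usage_metrics.jsonl")  # illustrative location

def log_usage(article_words: int, claims_extracted: int, usage: dict) -> None:
    """Append one record per analysis so cost vs. article length and
    cost per claim can be studied offline."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "article_words": article_words,
        "claims_extracted": claims_extracted,
        **usage,  # token counts, estimated cost, response time, model
    }
    if claims_extracted:
        record["cost_per_claim_usd"] = round(
            usage["estimated_cost_usd"] / claims_extracted, 4)
    with METRICS_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
```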
10. Technical Architecture
10.1 System Components
Frontend:
- Simple HTML form (text input + URL input + button)
- Loading indicator
- Results display page (single page, no tabs/navigation)
Backend:
- Single API endpoint
- Calls Claude API (Sonnet 4.5 or latest)
- Parses response
- Returns JSON to frontend
Data Storage:
- None required (stateless POC)
- Optional: Simple file storage or SQLite for demo examples
External Services:
- Claude API (Anthropic) - required
- Optional: URL fetch service for article text extraction
10.2 Processing Flow
1. User submits article (text or URL)
↓
2. Backend receives request
↓
3. If URL: Fetch article text
↓
4. Call Claude API with single prompt:
"Extract claims, evaluate each, provide verdicts"
↓
5. Claude API returns:
- Analysis summary
- Claims list
- Verdicts for each claim (with risk tiers)
- Article summary (optional)
- Quality gate results
↓
6. Backend parses response
↓
7. Frontend displays results with Mode 2 labeling
Key Simplification: Single API call does entire analysis
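A minimal endpoint sketch of this flow, assuming FastAPI (one of the suggested stacks in Section 10.4); fetch_article_text, build_analysis_prompt, call_claude, and parse_analysis_json are hypothetical helpers standing in for steps 3-6:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class AnalyzeRequest(BaseModel):
    text: str | None = None  # pasted article text (up to 5000 characters)
    url: str | None = None   # alternatively, an article URL

@app.post("/analyze")
def analyze(req: AnalyzeRequest) -> dict:
    # Hypothetical helpers below map to the numbered steps above.
    article = req.text or fetch_article_text(req.url)   # step 3: URL fetch
    if not article:
        raise HTTPException(400, "No article text or fetchable URL provided")
    prompt = build_analysis_prompt(article)  # the single comprehensive prompt
    raw = call_claude(prompt)                # step 4: one Claude API call
    return parse_analysis_json(raw)          # step 6: parse structured JSON
```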
10.3 AI Prompt Strategy
Single Comprehensive Prompt:
Task: Analyze this article and provide:
1. Identify the article's main thesis/conclusion
- What is the article trying to argue or prove?
- What is the primary claim or conclusion?
2. Extract 3-5 factual claims from the article
- Note which claims are CENTRAL to the main thesis
- Note which claims are SUPPORTING facts
3. For each claim:
- Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
- Assign confidence score (0-100%)
- Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
- Write brief reasoning (1-3 sentences)
4. Assess relationship between claims and main thesis:
- Do the claims actually support the article's conclusion?
- Are there logical leaps or unsupported inferences?
- Is the article's framing misleading even if individual facts are accurate?
5. Run quality gates:
- Check: ≥2 sources found
- Attempt: Basic contradiction search
- Calculate: Confidence scores
- Verify: Structural integrity
6. Write context-aware analysis summary (4-6 sentences):
- State article's main thesis
- Report claims found and verdict distribution
- Note if central claims are problematic
- Assess whether evidence supports conclusion
- Overall credibility considering claim importance
7. Write article summary (3-5 sentences: neutral summary of article content)
Return as structured JSON with quality gate results.
One prompt generates everything.
Critical Addition:
Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
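A minimal parsing sketch for the model's reply; the top-level key names are assumptions about the JSON schema the prompt requests, and the regex tolerates a markdown fence if the model adds one:

```python
import json
import re

def parse_analysis_json(raw_text: str) -> dict:
    """Parse the structured-JSON reply from the single prompt."""
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)  # strips ```json fences
    if not match:
        raise ValueError("No JSON object found in model output")
    data = json.loads(match.group(0))
    # Shape check against the prompt's required elements; these key
    # names are assumptions, not fixed by the spec.
    for key in ("main_thesis", "claims", "verdicts",
                "analysis_summary", "quality_gates"):
        if key not in data:
            raise ValueError(f"Model output missing expected key: {key}")
    return data
```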
10.4 Technology Stack Suggestions
Frontend:
- HTML + CSS + JavaScript (minimal framework)
- OR: Next.js (if team prefers)
- Hosted: Local machine OR Vercel/Netlify free tier
Backend:
- Python Flask/FastAPI (simple REST API)
- OR: Next.js API routes (if using Next.js)
- Hosted: Local machine OR Railway/Render free tier
AKEL Integration:
- Claude API via Anthropic SDK
- Model: Claude Sonnet 4.5 or latest available
Database:
- None (stateless acceptable)
- OR: SQLite if the team wants to store demo examples
- OR: JSON files on disk
Deployment:
- Local development environment sufficient for POC
- Optional: Deploy to cloud for remote demos
11. Success Criteria
11.1 Minimum Success (POC Passes)
Required for GO decision:
- ✅ AI extracts 3-5 factual claims automatically
- ✅ AI provides verdict for each claim automatically
- ✅ Verdicts are reasonable (≥70% make logical sense)
- ✅ Analysis summary is coherent
- ✅ Output is comprehensible to reviewers
- ✅ Team/advisors understand the output
- ✅ Team agrees approach has merit
- ✅ Minimal or no manual editing needed (< 30% of analyses require manual intervention)
- ✅ Cost efficiency acceptable (average cost per analysis < $0.05 USD target)
- ✅ Cost scaling understood (data collected on article length vs. cost)
- ✅ Optimization opportunities identified (≥2 potential improvements documented)
Quality Definition:
- "Reasonable verdict" = Defensible given general knowledge
- "Coherent summary" = Logically structured, grammatically correct
- "Comprehensible" = Reviewers understand what analysis means
11.2 POC Fails If
Automatic NO-GO if any of these:
- ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
- ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
- ❌ Output incomprehensible (reviewers can't understand analysis)
- ❌ Requires manual editing for most analyses (> 50% need human correction)
- ❌ Team loses confidence in AI-automated approach
11.3 Quality Thresholds
POC quality expectations:
| Component | Quality Threshold | Definition |
|---|---|---|
| Claim Extraction | ≥70% accuracy | Identifies obvious factual claims, may miss some edge cases |
| Verdict Logic | ≥70% defensible | Verdicts are logical given reasoning provided |
| Reasoning Clarity | ≥70% clear | 1-3 sentences are understandable and relevant |
| Overall Analysis | ≥70% useful | Output helps user understand article claims |
Analogy: "B student" quality (70-80%), not "A+" perfection yet
Not expecting:
- 100% accuracy
- Perfect claim coverage
- Comprehensive evidence gathering
- Flawless verdicts
- Production polish
Expecting:
- Reasonable claim extraction
- Defensible verdicts
- Understandable reasoning
- Useful output
12. Test Cases
12.1 Test Case 1: Simple Factual Claim
Input: "Coffee reduces the risk of type 2 diabetes by 30%"
Expected Output:
- Extract claim correctly
- Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
- Confidence: 70-90%
- Risk tier: C (Low)
- Reasoning: Mentions studies or evidence
Success: Verdict is reasonable and reasoning makes sense
12.2 Test Case 2: Complex News Article
Input: News article URL with multiple claims about politics/health/science
Expected Output:
- Extract 3-5 key claims
- Verdict for each (may vary: some supported, some uncertain, some refuted)
- Coherent analysis summary
- Article summary
- Risk tiers assigned appropriately
Success: Claims identified are actually from article, verdicts are reasonable
12.3 Test Case 3: Controversial Topic
Input: Article on contested political or scientific topic
Expected Output:
- Balanced analysis
- Acknowledges uncertainty where appropriate
- Doesn't overstate confidence
- Reasoning shows awareness of complexity
Success: Analysis is fair and doesn't show obvious bias
12.4 Test Case 4: Clearly False Claim
Input: Article with obviously false claim (e.g., "The Earth is flat")
Expected Output:
- Extract claim
- Verdict: REFUTED
- High confidence (> 90%)
- Risk tier: C (Low - established fact)
- Clear reasoning
Success: AI correctly identifies false claim with high confidence
12.5 Test Case 5: Genuinely Uncertain Claim
Input: Article with claim where evidence is genuinely mixed
Expected Output:
- Extract claim
- Verdict: UNCERTAIN
- Moderate confidence (40-60%)
- Reasoning explains why uncertain
Success: AI recognizes uncertainty and doesn't overstate confidence
12.6 Test Case 6: High-Risk Medical Claim
Input: Article making medical claims
Expected Output:
- Extract claim
- Verdict: [appropriate based on evidence]
- Risk tier: A (High - medical)
- Red label displayed
- Clear disclaimer about not being medical advice
Success: Risk tier correctly assigned, appropriate warnings shown
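A minimal smoke-test harness sketch over these cases; analyze_fn is a hypothetical end-to-end analysis function, the expected labels mirror the Expected Output bullets above, and results are printed rather than asserted because POC output is probabilistic:

```python
TEST_CASES = [
    {"name": "simple_factual_claim",
     "input": "Coffee reduces the risk of type 2 diabetes by 30%",
     "allowed_verdicts": {"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}},
    {"name": "clearly_false_claim",
     "input": "The Earth is flat",
     "allowed_verdicts": {"REFUTED"}},
    # ... remaining cases follow the same pattern
]

def run_smoke_tests(analyze_fn) -> None:
    for case in TEST_CASES:
        result = analyze_fn(case["input"])
        verdicts = {v["verdict"] for v in result["verdicts"]}
        ok = bool(verdicts & case["allowed_verdicts"])
        print(f"{'PASS' if ok else 'FAIL'}: {case['name']} -> {sorted(verdicts)}")
```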
13. POC Decision Gate
13.1 Decision Framework
After POC testing complete, team makes one of three decisions:
Option A: GO (Proceed to POC2)
Conditions:
- AI quality ≥70% without manual editing
- Basic claim → verdict pipeline validated
- Internal + advisor feedback positive
- Technical feasibility confirmed
- Team confident in direction
- Clear path to improving AI quality to ≥90%
Next Steps:
- Plan POC2 development (add scenarios)
- Design scenario architecture
- Expand to Evidence Model structure
- Test with more complex articles
Option B: NO-GO (Pivot or Stop)
Conditions:
- AI quality < 60%
- Requires manual editing for most analyses (> 50%)
- Feedback indicates fundamental flaws
- Cost/effort not justified by value
- No clear path to improvement
Next Steps:
- Pivot: Change to hybrid human-AI approach (accept manual review required)
- Stop: Conclude approach not viable, revisit later
Option C: ITERATE (Improve POC)
Conditions:
- Concept has merit but execution needs work
- Specific improvements identified
- Addressable with better prompts/approach
- AI quality between 60-70%
Next Steps:
- Improve AI prompts
- Test different approaches
- Re-run POC with improvements
- Then make GO/NO-GO decision
13.2 Decision Criteria Summary
AI Quality < 60% → NO-GO (pivot or stop)
AI Quality 60-70% → ITERATE (improve and retry)
AI Quality ≥70% → GO (proceed to POC2)
14. Key Risks & Mitigations
14.1 Risk: AI Quality Not Good Enough
Likelihood: Medium-High
Impact: POC fails
Mitigation:
- Extensive prompt engineering and testing
- Use best available AI models (Sonnet 4.5)
- Test with diverse article types
- Iterate on prompts based on results
Acceptance: This is what POC tests - be ready for failure
14.2 Risk: AI Consistency Issues
Likelihood: Medium
Impact: Works sometimes, fails other times
Mitigation:
- Test with 10+ diverse articles
- Measure success rate honestly
- Improve prompts to increase consistency
Acceptance: Some variability OK if average quality ≥70%
14.3 Risk: Output Incomprehensible
Likelihood: Low-Medium
Impact: Users can't understand analysis
Mitigation:
- Create clear explainer document
- Iterate on output format
- Test with non-technical reviewers
- Simplify language if needed
Acceptance: Iterate until comprehensible
14.4 Risk: API Rate Limits / Costs
Likelihood: Low
Impact: System slow or expensive
Mitigation:
- Monitor API usage
- Implement retry logic
- Estimate costs before scaling
Acceptance: POC can be slow and expensive (optimization later)
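A minimal retry sketch with exponential backoff, assuming the Anthropic SDK's exception types for transient failures:

```python
import time
import anthropic

def call_with_retry(fn, max_attempts: int = 3):
    """Retry transient API failures (rate limits, server overload) with
    exponential backoff; anything else propagates immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (anthropic.RateLimitError, anthropic.APIStatusError):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off: 2s, 4s, ...
```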
14.5 Risk: Scope Creep
Likelihood: Medium
Impact: POC becomes too complex
Mitigation:
- Strict scope discipline
- Say NO to feature additions
- Keep focus on core question
Acceptance: POC is minimal by design
15. POC Philosophy
15.1 Core Principles
1. Build Less, Learn More
- Minimum features to test hypothesis
- Don't build unvalidated features
- Focus on core question only
2. Fail Fast
- Quick test of hardest part (AI capability)
- Accept that POC might fail
- Better to discover issues early
- Honest assessment over optimistic hope
3. Test First, Build Second
- Validate AI can do this before building platform
- Don't assume it will work
- Let results guide decisions
4. Automation First
- No manual editing allowed
- Tests scalability, not just feasibility
- Proves approach can work at scale
5. Honest Assessment
- Don't cherry-pick examples
- Don't manually fix bad outputs
- Document failures openly
- Make data-driven decisions
15.2 What POC Is
✅ Testing AI capability without humans
✅ Proving core technical concept
✅ Fast validation of approach
✅ Honest assessment of feasibility
15.3 What POC Is NOT
❌ Building a product
❌ Production-ready system
❌ Feature-complete platform
❌ Perfectly accurate analysis
❌ Polished user experience
16. Success: Clear Path Forward
If POC succeeds (≥70% AI quality):
- ✅ Approach validated
- ✅ Proceed to POC2 (add scenarios)
- ✅ Design full Evidence Model structure
- ✅ Test multi-scenario comparison
- ✅ Focus on improving AI quality from 70% → 90%
If POC fails (< 60% AI quality):
- ✅ Learn what doesn't work
- ✅ Pivot to different approach
- ✅ OR wait for better AI technology
- ✅ Avoid wasting resources on non-viable approach
Either way, POC provides clarity.
NFR-POC-11: LLM Provider Abstraction (POC1)
Requirement: POC1 MUST implement LLM abstraction layer with support for multiple providers.
POC1 Implementation:
- Primary Provider: Anthropic Claude API
- Stage 1: Claude Haiku 4
- Stage 2: Claude Sonnet 3.5 (cached)
- Stage 3: Claude Sonnet 3.5
- Provider Interface: Abstract LLMProvider interface implemented (a sketch follows this list)
- Configuration: Environment variables for provider selection
- LLM_PRIMARY_PROVIDER=anthropic
- LLM_STAGE1_MODEL=claude-haiku-4
- LLM_STAGE2_MODEL=claude-sonnet-3-5
- Failover: Basic error handling with cache fallback for Stage 2
- Cost Tracking: Log provider name and cost per request
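A minimal sketch of that abstraction, assuming Python; class and function names are illustrative, and only the Anthropic provider is wired up in POC1:

```python
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """All LLM calls go through this interface - no direct API calls."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def complete(self, prompt: str, model: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def __init__(self) -> None:
        import anthropic
        self._client = anthropic.Anthropic()

    @property
    def name(self) -> str:
        return "anthropic"  # logged alongside cost per request

    def complete(self, prompt: str, model: str) -> str:
        resp = self._client.messages.create(
            model=model, max_tokens=4096,
            messages=[{"role": "user", "content": prompt}])
        return resp.content[0].text

def provider_from_env() -> LLMProvider:
    """Provider selection via environment variable, no code changes."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if name == "anthropic":
        return AnthropicProvider()
    raise ValueError(f"Unsupported provider: {name}")  # OpenAI arrives in POC2
```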
Future (POC2/Beta):
- Secondary provider (OpenAI) with automatic failover
- Admin API for runtime provider switching
- Cost comparison dashboard
- Cross-provider output verification
Success Criteria:
- All LLM calls go through abstraction layer (no direct API calls)
- Provider can be changed via environment variable without code changes
- Cost tracking includes provider name in logs
- Stage 2 falls back to cache on provider failure
Implementation: See POC1 API & Schemas Specification Section 6
Dependencies:
- NFR-14 (Main Requirements)
- Design Decision 9
- Architecture Section 2.2
Priority: HIGH (P1)
Rationale: Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.

Document Status: ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)