POC Requirements
Status: ✅ Approved for Development
Version: 3.0 (Aligned with Main Requirements)
Goal: Prove that AI can extract claims and determine verdicts automatically without human intervention

== 1. POC Overview ==

=== 1.1 What the POC Tests ===

Core Question: Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

What we're proving:
- AI can identify factual claims from text
- AI can evaluate those claims with structured evidence
- Quality gates can filter unreliable outputs
- The core workflow is technically feasible

What we're NOT proving:
- Production-ready reliability (that's POC2)
- User-facing features (that's Beta 0)
- Full IFCN compliance (that's V1.0)

=== 1.2 Requirements Mapping ===

POC1 implements a subset of the full system requirements defined in Main Requirements.

Scope Summary:
- In Scope: 8 requirements (7 FRs + 1 NFR)
- Partial: 3 NFRs (simplified versions)
- Out of Scope: 19 requirements (deferred to later phases)

== 2. Requirements Scope Matrix ==

|=Requirement|=POC1 Status|=Implementation Level|=Notes
| CORE WORKFLOW | | |
| FR1: Claim Extraction | ✅ In Scope | Full | AKEL extracts claims from text
| FR2: Claim Context | ✅ In Scope | Basic | Context preserved with claim
| FR3: Multiple Scenarios | ✅ In Scope | Full | AKEL generates interpretation scenarios
| FR4: Analysis Summary | ✅ In Scope | Basic | Simple summary format
| FR5: Evidence Collection | ✅ In Scope | Full | AKEL searches for evidence
| FR6: Evidence Evaluation | ✅ In Scope | Full | AKEL evaluates source reliability
| FR7: Automated Verdicts | ✅ In Scope | Full | AKEL computes verdicts with uncertainty
| QUALITY & RELIABILITY | | |
| NFR11: Quality Assurance | ✅ In Scope | Lite | 2 gates only (Gate 1 & 4)
| NFR1: Performance | ⚠️ Partial | Basic | Response time monitored, not optimized
| NFR2: Scalability | ⚠️ Partial | Single-thread | No concurrent processing
| NFR3: Reliability | ⚠️ Partial | Basic | Error handling, no retry logic
| DEFERRED TO LATER | | |
| FR8-FR13 | ❌ Out of Scope | N/A | User accounts, corrections, publishing
| FR44-FR53 | ❌ Out of Scope | N/A | Advanced features (V1.0+)
| NFR4: Security | ❌ Out of Scope | N/A | POC2
| NFR5: Maintainability | ❌ Out of Scope | N/A | POC2
| NFR12: Security Controls | ❌ Out of Scope | N/A | Beta 0
| NFR13: Monitoring | ❌ Out of Scope | N/A | POC2

== 3. POC Simplifications ==

=== 3.1 FR1: Claim Extraction (Full Implementation) ===

Main Requirement: AI extracts factual claims from input text

POC Implementation:
- ✅ AKEL extracts claims using LLM
- ✅ Each claim includes original text reference
- ✅ Claims are identified as factual/non-factual
- ❌ No advanced claim parsing (added in POC2)

Acceptance Criteria:
- Extracts 3-5 claims from typical article
- Identifies factual vs non-factual claims
- Quality Gate 1 validates extraction
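In practice, the extraction step is a single structured LLM call. The sketch below is illustrative only: it assumes the official @anthropic-ai/sdk TypeScript client and an ANTHROPIC_API_KEY in the environment, and the prompt, model identifier, and ExtractedClaim shape are placeholders rather than the actual AKEL prompt and schema (those live in the POC1 Specification).

```
// Minimal claim-extraction sketch (illustrative only).
// Assumes @anthropic-ai/sdk and ANTHROPIC_API_KEY in the environment;
// prompt, model name, and output shape are placeholders, not the final AKEL design.
import Anthropic from "@anthropic-ai/sdk";

interface ExtractedClaim {
  text: string;                                   // claim as found in the article
  type: "factual" | "non-factual" | "ambiguous";  // classification used by Gate 1
  reason?: string;                                // why a claim is non-factual, if applicable
}

const client = new Anthropic();

export async function extractClaims(article: string): Promise<ExtractedClaim[]> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",   // placeholder model identifier
    max_tokens: 1024,
    messages: [{
      role: "user",
      content:
        "Extract the factual claims from the article below. " +
        "Return a JSON array of objects with fields: text, type (factual | non-factual | ambiguous), reason.\n\n" +
        article,
    }],
  });

  const block = response.content[0];
  return block.type === "text" ? JSON.parse(block.text) : [];
}
```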
=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===

Main Requirement: Generate multiple interpretation scenarios for ambiguous claims

POC Implementation:
- ✅ AKEL generates 2-3 scenarios per claim
- ✅ Scenarios capture different interpretations
- ✅ Each scenario is evaluated separately
- ✅ Verdict considers all scenarios

Acceptance Criteria:
- Generates 2+ scenarios for ambiguous claims
- Scenarios are meaningfully different
- All scenarios are evaluated

=== 3.3 FR4: Analysis Summary (Basic Implementation) ===

Main Requirement: Provide user-friendly summary of analysis

POC Implementation:
- ✅ Simple text summary generated
- ❌ No rich formatting (added in Beta 0)
- ❌ No visual elements (added in Beta 0)
- ❌ No interactive features (added in Beta 0)

POC Format:
```
Claim: [extracted claim]
Scenarios: [list of scenarios]
Evidence: [supporting/opposing evidence]
Verdict: [probability with uncertainty]
```
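Because the POC summary is plain text, it can be produced with simple string templating. The sketch below is a minimal illustration; the AnalysisResult shape and formatSummary helper are assumptions, not the real AKEL types.

```
// Minimal sketch of rendering the POC summary format above.
// AnalysisResult and its fields are assumptions for illustration, not the real AKEL types.
interface AnalysisResult {
  claim: string;
  scenarios: string[];
  supporting: string[];
  opposing: string[];
  probability: number;   // 0-1
  uncertainty: number;   // e.g. 0.15 for ±15%
}

export function formatSummary(r: AnalysisResult): string {
  return [
    `Claim: ${r.claim}`,
    `Scenarios: ${r.scenarios.join("; ")}`,
    `Evidence: supporting: ${r.supporting.join("; ") || "none"}; opposing: ${r.opposing.join("; ") || "none"}`,
    `Verdict: ${Math.round(r.probability * 100)}% likely true (±${Math.round(r.uncertainty * 100)}%)`,
  ].join("\n");
}
```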
=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===

Main Requirements:
- FR5: Collect supporting and opposing evidence
- FR6: Evaluate evidence source reliability

POC Implementation:
- ✅ AKEL searches for evidence (web/knowledge base)
- ✅ Mandatory contradiction search (finds opposing evidence)
- ✅ Source reliability scoring
- ❌ No evidence deduplication (added in POC2)
- ❌ No advanced source verification (added in POC2)
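The mandatory contradiction search amounts to issuing two symmetric queries per claim: one looking for support, one explicitly looking for refutation. The sketch below illustrates that flow; searchWeb and scoreSourceReliability are injected, hypothetical helpers, and the real retrieval and scoring logic is defined in the POC1 Specification.

```
// Illustrative sketch of evidence collection with the mandatory contradiction search.
// searchWeb and scoreSourceReliability are injected, hypothetical helpers — not a real API.
interface EvidenceItem {
  source: string;       // e.g. "WHO Report 2023"
  url: string;
  excerpt: string;
  reliability: number;  // 0-1 source reliability score
}

export async function collectEvidence(
  claim: string,
  searchWeb: (query: string) => Promise<EvidenceItem[]>,
  scoreSourceReliability: (item: EvidenceItem) => number,
): Promise<{ supporting: EvidenceItem[]; opposing: EvidenceItem[] }> {
  // Two searches, run one after the other (the POC is single-threaded and synchronous).
  const supportingRaw = await searchWeb(`evidence that ${claim}`);
  // The contradiction search is mandatory: opposing evidence is always requested.
  const opposingRaw = await searchWeb(`evidence against ${claim}`);

  const score = (items: EvidenceItem[]) =>
    items.map((item) => ({ ...item, reliability: scoreSourceReliability(item) }));

  return { supporting: score(supportingRaw), opposing: score(opposingRaw) };
}
```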
Acceptance Criteria:
- Finds 2+ supporting evidence items
- Finds 1+ opposing evidence (if exists)
- Sources scored for reliability

=== 3.5 FR7: Automated Verdicts (Full Implementation) ===

Main Requirement: AI computes verdicts with uncertainty quantification

POC Implementation:
- ✅ Probabilistic verdicts (0-100% confidence)
- ✅ Uncertainty explicitly stated
- ✅ Reasoning chain provided
- ✅ Quality Gate 4 validates verdict confidence

POC Output:
```
Verdict: 70% likely true
Uncertainty: ±15% (moderate confidence)
Reasoning: Based on 3 high-quality sources...
Confidence Level: MEDIUM
```
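To make "probability with uncertainty" concrete, the toy aggregation below shows how reliability-weighted supporting and opposing evidence could map onto the output format above. This is deliberately not the AKEL verdict algorithm (in the POC the verdict, uncertainty, and reasoning come from the LLM's structured output); it is only an illustration of the data involved.

```
// Hypothetical illustration of a probability-with-uncertainty verdict.
// NOT the AKEL algorithm (the POC gets verdicts from the LLM); it only shows
// how reliability-weighted evidence could map to the output format above.
interface ScoredEvidence {
  reliability: number;  // 0-1
  supports: boolean;    // true = supporting, false = opposing
}

interface Verdict {
  probability: number;                  // 0-1, likelihood the claim is true
  uncertainty: number;                  // e.g. 0.15 means ±15%
  confidence: "LOW" | "MEDIUM" | "HIGH";
}

export function illustrativeVerdict(evidence: ScoredEvidence[]): Verdict {
  const support = evidence.filter((e) => e.supports).reduce((s, e) => s + e.reliability, 0);
  const oppose = evidence.filter((e) => !e.supports).reduce((s, e) => s + e.reliability, 0);
  const total = support + oppose;

  // No usable evidence: maximally uncertain.
  if (total === 0) return { probability: 0.5, uncertainty: 0.5, confidence: "LOW" };

  const probability = support / total;
  // Less evidence and more disagreement -> wider uncertainty band (toy heuristic).
  const uncertainty = Math.min(0.5, 1 / (2 * evidence.length) + Math.min(support, oppose) / total / 2);
  const confidence = uncertainty <= 0.1 ? "HIGH" : uncertainty <= 0.2 ? "MEDIUM" : "LOW";

  return { probability, uncertainty, confidence };
}
```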
Acceptance Criteria:
- Verdicts include probability (0-100%)
- Uncertainty explicitly quantified
- Reasoning chain explains verdict

=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===

Main Requirement: Complete quality assurance with 7 quality gates

POC Implementation: 2 gates only

Quality Gate 1: Claim Validation
- ✅ Validates claim is factual and verifiable
- ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
- ✅ Provides clear rejection reason

Quality Gate 4: Verdict Confidence Assessment
- ✅ Validates ≥2 sources found
- ✅ Validates quality score ≥0.6
- ✅ Blocks low-confidence verdicts
- ✅ Provides clear rejection reason
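Both POC gates reduce to deterministic checks over structured output. The sketch below encodes them; the Claim and Verdict input shapes are assumptions for illustration, while the thresholds (fewer than 2 sources, quality score below 0.6) are the ones stated above.

```
// Sketch of the two POC quality gates as deterministic checks.
// Input shapes are assumptions for illustration; thresholds match the text above.
interface GateResult {
  status: "PASS" | "FAIL";
  reason?: string;                 // clear rejection reason when blocked
}

interface ClaimForGate1 {
  type: "factual" | "non-factual" | "ambiguous";
  rejectionReason?: string;        // e.g. "opinion", "prediction"
}

interface VerdictForGate4 {
  sourceCount: number;             // number of distinct evidence sources
  qualityScore: number;            // 0-1 aggregate quality score
}

export function gate1(claim: ClaimForGate1): GateResult {
  if (claim.type !== "factual") {
    return { status: "FAIL", reason: `Claim is not verifiable: ${claim.rejectionReason ?? claim.type}` };
  }
  return { status: "PASS" };
}

export function gate4(verdict: VerdictForGate4): GateResult {
  if (verdict.sourceCount < 2) {
    return { status: "FAIL", reason: "Fewer than 2 sources found" };
  }
  if (verdict.qualityScore < 0.6) {
    return { status: "FAIL", reason: `Quality score ${verdict.qualityScore.toFixed(2)} below 0.6 threshold` };
  }
  return { status: "PASS" };
}
```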
Out of Scope (POC2+):
- ❌ Gate 2: Evidence Relevance
- ❌ Gate 3: Scenario Coherence
- ❌ Gate 5: Source Diversity
- ❌ Gate 6: Reasoning Validity
- ❌ Gate 7: Output Completeness

Rationale: Prove gate concept works. Add remaining gates in POC2 after validating approach.

=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===

Main Requirements:
- NFR1: Response time < 30 seconds
- NFR2: Handle 1000+ concurrent users
- NFR3: 99.9% uptime

POC Implementation:
- ⚠️ Response time monitored (not optimized)
- ⚠️ Single-threaded processing (no concurrency)
- ⚠️ Basic error handling (no advanced retry logic)

Rationale: POC proves functionality. Performance optimization happens in POC2.

POC Acceptance:
- Analysis completes (no timeout requirement)
- Errors don't crash system
- Basic logging in place

== 4. What's NOT in POC Scope ==

=== 4.1 User-Facing Features (Beta 0+) ===

Out of Scope:
- ❌ User accounts and authentication (FR8)
- ❌ User corrections system (FR9, FR45-46)
- ❌ Public publishing interface (FR10)
- ❌ Social sharing (FR11)
- ❌ Email notifications (FR12)
- ❌ API access (FR13)

Rationale: POC validates AI capabilities. User features added in Beta 0.

=== 4.2 Advanced Features (V1.0+) ===

Out of Scope:
- ❌ IFCN compliance (FR47)
- ❌ ClaimReview schema (FR48)
- ❌ Archive.org integration (FR49)
- ❌ OSINT toolkit (FR50)
- ❌ Video verification (FR51)
- ❌ Deepfake detection (FR52)
- ❌ Cross-org sharing (FR53)

Rationale: Advanced features require a proven platform. Added post-V1.0.

=== 4.3 Production Requirements (POC2, Beta 0) ===

Out of Scope:
- ❌ Security controls (NFR4, NFR12)
- ❌ Code maintainability (NFR5)
- ❌ System monitoring (NFR13)
- ❌ Evidence deduplication
- ❌ Advanced source verification
- ❌ Full 7-gate quality framework

Rationale: POC proves concept. Production hardening happens in POC2 and Beta 0.

== 5. POC Output Specification ==

=== 5.1 Required Output Elements ===

For each analyzed claim, the POC must produce:

1. Claim
- Original text
- Classification (factual/non-factual/ambiguous)
- If non-factual: Clear reason why

2. Scenarios (if factual)
- 2-3 interpretation scenarios
- Each scenario clearly described

3. Evidence (if factual)
- Supporting evidence (2+ items)
- Opposing evidence (if exists)
- Source URLs and reliability scores

4. Verdict (if factual)
- Probability (0-100%)
- Uncertainty quantification
- Confidence level (LOW/MEDIUM/HIGH)
- Reasoning chain

5. Quality Status
- Which gates passed/failed
- If failed: Clear explanation why

=== 5.2 Example POC Output ===

```
{
  "claim": {
    "text": "Switzerland has the highest life expectancy in Europe",
    "type": "factual",
    "gate1_status": "PASS"
  },
  "scenarios": [
    "Switzerland's overall life expectancy is highest",
    "Switzerland ranks highest for specific age groups"
  ],
  "evidence": {
    "supporting": [
      {
        "source": "WHO Report 2023",
        "reliability": 0.95,
        "excerpt": "Switzerland: 83.4 years average..."
      }
    ],
    "opposing": [
      {
        "source": "Eurostat 2024",
        "reliability": 0.90,
        "excerpt": "Spain leads at 83.5 years..."
      }
    ]
  },
  "verdict": {
    "probability": 0.65,
    "uncertainty": 0.15,
    "confidence": "MEDIUM",
    "reasoning": "WHO and Eurostat show similar but conflicting data...",
    "gate4_status": "PASS"
  }
}
```
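For implementation purposes, the same structure can be written down as a TypeScript type derived directly from the example. Field names mirror the JSON above; the optional url field is added only because section 5.1 asks for source URLs.

```
// Type describing the example POC output above (names mirror the JSON fields).
// The optional `url` field reflects the "source URLs" requirement in section 5.1.
interface EvidenceItem {
  source: string;
  url?: string;
  reliability: number;           // 0-1 source reliability score
  excerpt: string;
}

interface PocOutput {
  claim: {
    text: string;
    type: "factual" | "non-factual" | "ambiguous";
    gate1_status: "PASS" | "FAIL";
  };
  scenarios: string[];
  evidence: {
    supporting: EvidenceItem[];
    opposing: EvidenceItem[];
  };
  verdict: {
    probability: number;         // 0-1
    uncertainty: number;         // e.g. 0.15 for ±15%
    confidence: "LOW" | "MEDIUM" | "HIGH";
    reasoning: string;
    gate4_status: "PASS" | "FAIL";
  };
}
```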
== 6. Success Criteria ==

=== 6.1 Functional Success ===

POC is successful if:

✅ FR1-FR7 Requirements Met:
1. Extracts 3-5 factual claims from test articles
2. Generates 2-3 scenarios per ambiguous claim
3. Finds supporting AND opposing evidence
4. Computes probabilistic verdicts with uncertainty
5. Provides clear reasoning chains

✅ Quality Gates Work:
1. Gate 1 blocks non-factual claims (100% block rate)
2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
3. Clear rejection reasons provided

✅ NFR11 Met:
1. Quality gates reduce hallucination rate
2. Blocked outputs have clear explanations
3. Quality metrics are logged

=== 6.2 Quality Thresholds ===

Minimum Acceptable:
- ≥70% of test claims correctly classified (factual/non-factual)
- ≥60% of verdicts are reasonable (human evaluation)
- Gate 1 blocks 100% of non-factual claims
- Gate 4 blocks verdicts with <2 sources

Target:
- ≥80% claims correctly classified
- ≥75% verdicts are reasonable
- <10% false positives (blocking good claims)
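These thresholds can be checked mechanically once the test-set metrics are collected. The sketch below is a minimal helper under assumed metric names; the "reasonable verdict" rate itself still comes from human evaluation, and the Gate 4 source-count rule is checked per verdict rather than as an aggregate rate.

```
// Minimal sketch of checking POC results against the quality thresholds above.
// Metric names are assumptions; the "reasonable verdict" rate comes from human evaluation.
interface PocMetrics {
  classificationAccuracy: number;  // fraction of claims correctly classified factual/non-factual
  reasonableVerdictRate: number;   // fraction of verdicts judged reasonable by humans
  nonFactualBlockRate: number;     // fraction of non-factual claims blocked by Gate 1
  falsePositiveRate: number;       // fraction of good claims wrongly blocked
}

export function meetsMinimum(m: PocMetrics): boolean {
  return (
    m.classificationAccuracy >= 0.70 &&
    m.reasonableVerdictRate >= 0.60 &&
    m.nonFactualBlockRate === 1.0
  );
}

export function meetsTarget(m: PocMetrics): boolean {
  return (
    m.classificationAccuracy >= 0.80 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.10
  );
}
```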
=== 6.3 POC Decision Gate ===

After POC1, we decide:

✅ PROCEED to POC2 if:
- Success criteria met
- Quality gates demonstrably improve output
- Core workflow is technically sound
- Clear path to production quality

⚠️ ITERATE POC1 if:
- Success criteria partially met
- Gates work but need tuning
- Core issues identified but fixable

❌ PIVOT APPROACH if:
- Success criteria not met
- Fundamental AI limitations discovered
- Quality gates insufficient
- Alternative approach needed

== 7. Test Cases ==

=== 7.1 Happy Path ===

Test 1: Simple Factual Claim
- Input: "Paris is the capital of France"
- Expected: Factual, 1 scenario, verdict 95% true

Test 2: Ambiguous Claim
- Input: "Switzerland has the highest income in Europe"
- Expected: Factual, 2-3 scenarios, verdict with uncertainty

Test 3: Statistical Claim
- Input: "10% of people have condition X"
- Expected: Factual, evidence with numbers, probabilistic verdict

=== 7.2 Edge Cases ===

Test 4: Opinion
- Input: "Paris is the best city"
- Expected: Non-factual (opinion), blocked by Gate 1

Test 5: Prediction
- Input: "Bitcoin will reach $100,000 next year"
- Expected: Non-factual (prediction), blocked by Gate 1

Test 6: Insufficient Evidence
- Input: Obscure factual claim with no sources
- Expected: Blocked by Gate 4 (<2 sources)

=== 7.3 Quality Gate Tests ===

Test 7: Gate 1 Effectiveness
- Input: Mix of 10 factual + 10 non-factual claims
- Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)

Test 8: Gate 4 Effectiveness
- Input: Claims with varying evidence availability
- Expected: Gate 4 blocks low-confidence verdicts
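The cases above translate directly into a small fixture table the POC harness can run end to end. The sketch below encodes the expected outcomes; the field names are illustrative, not a fixed harness API.

```
// Test fixtures derived from the cases above; field names are illustrative, not a fixed harness API.
interface PocTestCase {
  name: string;
  input: string;
  expect: {
    classification: "factual" | "non-factual";
    blockedBy?: "gate1" | "gate4";   // undefined = should pass both gates
  };
}

export const testCases: PocTestCase[] = [
  { name: "Simple factual claim",  input: "Paris is the capital of France",
    expect: { classification: "factual" } },
  { name: "Ambiguous claim",       input: "Switzerland has the highest income in Europe",
    expect: { classification: "factual" } },
  { name: "Statistical claim",     input: "10% of people have condition X",
    expect: { classification: "factual" } },
  { name: "Opinion",               input: "Paris is the best city",
    expect: { classification: "non-factual", blockedBy: "gate1" } },
  { name: "Prediction",            input: "Bitcoin will reach $100,000 next year",
    expect: { classification: "non-factual", blockedBy: "gate1" } },
  // Test 6 (insufficient evidence) needs an obscure claim with no findable sources
  // and is chosen by the tester at run time.
];
```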
== 8. Technical Architecture (POC) ==

=== 8.1 Simplified Architecture ===

POC Tech Stack:
- Frontend: Simple web interface (Next.js + TypeScript)
- Backend: Single API endpoint
- AI: Claude API (Sonnet 4.5)
- Storage: Local JSON files (no database)
- Deployment: Single server

Architecture Diagram: See POC1 Specification
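With a single backend endpoint, the whole POC surface can be one Next.js App Router route handler that forwards the article text to AKEL. The sketch below assumes a hypothetical analyzeArticle() entry point and module path (see the pipeline sketch in section 8.2); error handling is intentionally basic, matching the POC scope of NFR3.

```
// app/api/analyze/route.ts — minimal sketch of the single POC endpoint (Next.js App Router).
// analyzeArticle() is the hypothetical AKEL entry point sketched in section 8.2.
import { NextResponse } from "next/server";
import { analyzeArticle } from "@/lib/akel";   // assumed module path

export async function POST(request: Request) {
  try {
    const { text } = await request.json();
    if (!text || typeof text !== "string") {
      return NextResponse.json({ error: "Missing 'text' field" }, { status: 400 });
    }

    // Single awaited analysis pass; no queue, no caching in the POC.
    const result = await analyzeArticle(text);
    return NextResponse.json(result);
  } catch (err) {
    // Basic error handling: errors don't crash the server, they return a 500.
    console.error("Analysis failed:", err);
    return NextResponse.json({ error: "Analysis failed" }, { status: 500 });
  }
}
```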
=== 8.2 AKEL Implementation ===

POC AKEL:
- Single-threaded processing
- Synchronous API calls
- No caching
- Basic error handling
- Console logging
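Under these constraints, the POC AKEL is essentially one sequential loop per article. The sketch below shows that control flow with dependency-injected step functions so it stays self-contained; the step names and shapes are the illustrative ones used earlier on this page, not a fixed API.

```
// lib/akel.ts — sketch of the sequential POC pipeline (module path and names are illustrative).
// Step functions are injected so the sketch stays self-contained; in the POC they would be imports.

type Gate = { status: "PASS" | "FAIL"; reason?: string };

export async function analyzeArticle(
  article: string,
  steps: {
    extractClaims: (article: string) => Promise<{ text: string; type: string }[]>;
    gate1: (claim: { text: string; type: string }) => Gate;
    evaluateClaim: (claimText: string) => Promise<{ verdict: unknown; gate4: Gate }>;
  },
) {
  const results = [];

  // Single-threaded: claims are processed strictly one after another, no caching, no retry logic.
  for (const claim of await steps.extractClaims(article)) {
    const gate1 = steps.gate1(claim);
    if (gate1.status === "FAIL") {
      console.log(`Gate 1 blocked "${claim.text}": ${gate1.reason}`);   // console logging only in the POC
      results.push({ claim, gate1 });
      continue;
    }

    // Scenario generation, evidence collection, and verdict computation happen inside evaluateClaim.
    const { verdict, gate4 } = await steps.evaluateClaim(claim.text);
    console.log(`Gate 4 ${gate4.status} for "${claim.text}"`);
    results.push({ claim, gate1, verdict, gate4 });
  }

  return results;
}
```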
Full AKEL (POC2+):
- Multi-threaded processing
- Async API calls
- Evidence caching
- Advanced error handling with retry
- Structured logging + monitoring

== 9. POC Philosophy ==

=== 9.1 Core Principles ===

1. Prove Concept, Not Production
- POC validates AI can do the job
- Production quality comes in POC2 and Beta 0
- Focus on "does it work?" not "is it perfect?"

2. Implement Subset of Requirements
- POC covers FR1-7, NFR11 (lite)
- All other requirements deferred
- Clear mapping to Main Requirements

3. Quality Gates Validate Approach
- 2 gates prove the concept
- Remaining 5 gates added in POC2
- Gates must demonstrably improve quality

4. Iterate Based on Results
- POC results determine next steps
- Decision gate after POC1
- Flexibility to pivot if needed

=== 9.2 Success = Clear Path Forward ===

POC succeeds if we can confidently answer:

✅ Technical Feasibility:
- Can AI extract claims reliably?
- Can AI find balanced evidence?
- Can AI compute reasonable verdicts?

✅ Quality Approach:
- Do quality gates improve output?
- Can we measure and track quality?
- Is the gate approach scalable?

✅ Production Path:
- Is the core architecture sound?
- What needs improvement for production?
- Is POC2 the right next step?

== 10. Related Pages ==

- Main Requirements - Full system requirements (this POC implements a subset)
- POC1 Specification (Detailed) - Detailed POC1 technical specs
- POC Summary - High-level POC overview
- Implementation Roadmap - POC1, POC2, Beta 0, V1.0 phases
- User Needs - What users need (drives requirements)

Document Owner: Technical Team
Review Frequency: After each POC iteration

Version History:
- v1.0 - Initial POC requirements
- v2.0 - Updated after specification cross-check
- v3.0 - Aligned with Main Requirements (FR/NFR IDs added)