POC Requirements

Status: ✅ Approved for Development
Version: 3.0 (Aligned with Main Requirements)
Goal: Prove that AI can extract claims and determine verdicts automatically without human intervention

Core Philosophy: POC validates the Main Requirements through simplified implementation. All POC features map to formal FR/NFR requirements.

== 1. POC Overview ==

=== 1.1 What POC Tests ===

Core Question:

Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

What we're proving:

  • AI can identify factual claims from text
  • AI can evaluate those claims with structured evidence
  • Quality gates can filter unreliable outputs
  • The core workflow is technically feasible

What we're NOT proving:
  • Production-ready reliability (that's POC2)
  • User-facing features (that's Beta 0)
  • Full IFCN compliance (that's V1.0)

=== 1.2 Requirements Mapping ===

POC1 implements a subset of the full system requirements defined in Main Requirements.

Scope Summary:
  • In Scope: 8 requirements (7 FRs + 1 NFR)
  • Partial: 3 NFRs (simplified versions)
  • Out of Scope: 19 requirements (deferred to later phases)

== 2. Requirements Scope Matrix ==

Requirements Traceability: This matrix shows which Main Requirements are implemented in POC1, providing full traceability between POC and system requirements.

|=Requirement|=POC1 Status|=Implementation Level|=Notes
|=CORE WORKFLOW| | |
|FR1: Claim Extraction|In Scope|Full|AKEL extracts claims from text
|FR2: Claim Context|In Scope|Basic|Context preserved with claim
|FR3: Multiple Scenarios|In Scope|Full|AKEL generates interpretation scenarios
|FR4: Analysis Summary|In Scope|Basic|Simple summary format
|FR5: Evidence Collection|In Scope|Full|AKEL searches for evidence
|FR6: Evidence Evaluation|In Scope|Full|AKEL evaluates source reliability
|FR7: Automated Verdicts|In Scope|Full|AKEL computes verdicts with uncertainty
|=QUALITY & RELIABILITY| | |
|NFR11: Quality Assurance|In Scope|Lite|2 gates only (Gate 1 & 4)
|NFR1: Performance|⚠️ Partial|Basic|Response time monitored, not optimized
|NFR2: Scalability|⚠️ Partial|Single-thread|No concurrent processing
|NFR3: Reliability|⚠️ Partial|Basic|Error handling, no retry logic
|=DEFERRED TO LATER| | |
|FR8-FR13|❌ Out of Scope|N/A|User accounts, corrections, publishing
|FR44-FR53|❌ Out of Scope|N/A|Advanced features (V1.0+)
|NFR4: Security|❌ Out of Scope|N/A|POC2
|NFR5: Maintainability|❌ Out of Scope|N/A|POC2
|NFR12: Security Controls|❌ Out of Scope|N/A|Beta 0
|NFR13: Monitoring|❌ Out of Scope|N/A|POC2

== 3. POC Simplifications ==

=== 3.1 FR1: Claim Extraction (Full Implementation) ===

Main Requirement: AI extracts factual claims from input text

POC Implementation:
  • ✅ AKEL extracts claims using LLM
  • ✅ Each claim includes original text reference
  • ✅ Claims are identified as factual/non-factual
  • ❌ No advanced claim parsing (added in POC2)

Acceptance Criteria:
  • Extracts 3-5 claims from typical article
  • Identifies factual vs non-factual claims
  • Quality Gate 1 validates extraction

=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===

Main Requirement: Generate multiple interpretation scenarios for ambiguous claims

POC Implementation:
  • ✅ AKEL generates 2-3 scenarios per claim
  • ✅ Scenarios capture different interpretations
  • ✅ Each scenario is evaluated separately
  • ✅ Verdict considers all scenarios

Acceptance Criteria:
  • Generates 2+ scenarios for ambiguous claims
  • Scenarios are meaningfully different
  • All scenarios are evaluated

=== 3.3 FR4: Analysis Summary (Basic Implementation) ===

Main Requirement: Provide user-friendly summary of analysis

POC Implementation:
  • ✅ Simple text summary generated
  • ❌ No rich formatting (added in Beta 0)
  • ❌ No visual elements (added in Beta 0)
  • ❌ No interactive features (added in Beta 0)

POC Format:
    ```
    Claim: [extracted claim]
    Scenarios: [list of scenarios]
    Evidence: [supporting/opposing evidence]
    Verdict: [probability with uncertainty]
    ```

=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===

Main Requirements:
  • FR5: Collect supporting and opposing evidence
  • FR6: Evaluate evidence source reliability

POC Implementation:
  • ✅ AKEL searches for evidence (web/knowledge base)
  • ✅ Mandatory contradiction search (finds opposing evidence)
  • ✅ Source reliability scoring
  • ❌ No evidence deduplication (added in POC2)
  • ❌ No advanced source verification (added in POC2)

Acceptance Criteria:
  • Finds 2+ supporting evidence items
  • Finds 1+ opposing evidence (if exists)
  • Sources scored for reliability

=== 3.5 FR7: Automated Verdicts (Full Implementation) ===

Main Requirement: AI computes verdicts with uncertainty quantification

POC Implementation:
  • ✅ Probabilistic verdicts (0-100% confidence)
  • ✅ Uncertainty explicitly stated
  • ✅ Reasoning chain provided
  • ✅ Quality Gate 4 validates verdict confidence

POC Output:
    ```
    Verdict: 70% likely true
    Uncertainty: ±15% (moderate confidence)
    Reasoning: Based on 3 high-quality sources...
    Confidence Level: MEDIUM
    ```

Acceptance Criteria:
  • Verdicts include probability (0-100%)
  • Uncertainty explicitly quantified
  • Reasoning chain explains verdict

=== 3.6 NFR11: Quality Assurance Framework (Lite Version) ===

Main Requirement: Complete quality assurance with 7 quality gates

POC Implementation: 2 gates only

Quality Gate 1: Claim Validation
  • ✅ Validates claim is factual and verifiable
  • ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
  • ✅ Provides clear rejection reason

Quality Gate 4: Verdict Confidence Assessment
  • ✅ Validates ≥2 sources found
  • ✅ Validates quality score ≥0.6
  • ✅ Blocks low-confidence verdicts
  • ✅ Provides clear rejection reason

Out of Scope (POC2+):
  • ❌ Gate 2: Evidence Relevance
  • ❌ Gate 3: Scenario Coherence
  • ❌ Gate 5: Source Diversity
  • ❌ Gate 6: Reasoning Validity
  • ❌ Gate 7: Output Completeness

Rationale: Prove gate concept works. Add remaining gates in POC2 after validating approach.

=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===

Main Requirements:
  • NFR1: Response time < 30 seconds
  • NFR2: Handle 1000+ concurrent users
  • NFR3: 99.9% uptime POC Implementation:
  • ⚠️ Response time monitored (not optimized)
  • ⚠️ Single-threaded processing (no concurrency)
  • ⚠️ Basic error handling (no advanced retry logic)

Rationale: POC proves functionality. Performance optimization happens in POC2.

POC Acceptance:
  • Analysis completes (no timeout requirement)
  • Errors don't crash system
  • Basic logging in place

== 4. What's NOT in POC Scope ==

=== 4.1 User-Facing Features (Beta 0+) ===

Deferred to Beta 0:

Out of Scope:
  • ❌ User accounts and authentication (FR8)
  • ❌ User corrections system (FR9, FR45-46)
  • ❌ Public publishing interface (FR10)
  • ❌ Social sharing (FR11)
  • ❌ Email notifications (FR12)
  • ❌ API access (FR13)

Rationale: POC validates AI capabilities. User features added in Beta 0.

=== 4.2 Advanced Features (V1.0+) ===

Out of Scope:
  • ❌ IFCN compliance (FR47)
  • ❌ ClaimReview schema (FR48)
  • ❌ Archive.org integration (FR49)
  • ❌ OSINT toolkit (FR50)
  • ❌ Video verification (FR51)
  • ❌ Deepfake detection (FR52)
  • ❌ Cross-org sharing (FR53)

Rationale: Advanced features require proven platform. Added post-V1.0.

=== 4.3 Production Requirements (POC2, Beta 0) ===

Out of Scope:
  • ❌ Security controls (NFR4, NFR12)
  • ❌ Code maintainability (NFR5)
  • ❌ System monitoring (NFR13)
  • ❌ Evidence deduplication
  • ❌ Advanced source verification
  • ❌ Full 7-gate quality framework

Rationale: POC proves concept. Production hardening happens in POC2 and Beta 0.

== 5. POC Output Specification ==

=== 5.1 Required Output Elements ===

For each analyzed claim, POC must produce:

1. Claim
  • Original text
  • Classification (factual/non-factual/ambiguous)
  • If non-factual: Clear reason why

2. Scenarios (if factual)
  • 2-3 interpretation scenarios
  • Each scenario clearly described

3. Evidence (if factual)
  • Supporting evidence (2+ items)
  • Opposing evidence (if exists)
  • Source URLs and reliability scores

4. Verdict (if factual)
  • Probability (0-100%)
  • Uncertainty quantification
  • Confidence level (LOW/MEDIUM/HIGH)
  • Reasoning chain

5. Quality Status
  • Which gates passed/failed
  • If failed: Clear explanation why

=== 5.2 Example POC Output ===

```
{
  "claim": {
    "text": "Switzerland has the highest life expectancy in Europe",
    "type": "factual",
    "gate1_status": "PASS"
  },
  "scenarios": [
    "Switzerland's overall life expectancy is highest",
    "Switzerland ranks highest for specific age groups"
  ],
  "evidence": {
    "supporting": [
      {
        "source": "WHO Report 2023",
        "reliability": 0.95,
        "excerpt": "Switzerland: 83.4 years average..."
      }
    ],
    "opposing": [
      {
        "source": "Eurostat 2024",
        "reliability": 0.90,
        "excerpt": "Spain leads at 83.5 years..."
      }
    ]
  },
  "verdict": {
    "probability": 0.65,
    "uncertainty": 0.15,
    "confidence": "MEDIUM",
    "reasoning": "WHO and Eurostat show similar but conflicting data...",
    "gate4_status": "PASS"
  }
}
```
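As a minimal sketch of how this output and the two POC gates fit together, the TypeScript fragment below (matching the POC stack in Section 8.1) types the example above and applies the Gate 1 and Gate 4 checks. The type and function names and the aggregate qualityScore parameter are illustrative assumptions, not the actual AKEL implementation; the thresholds (≥2 sources, quality ≥0.6) come from Section 3.6.

```
// Illustrative shape of a single POC result (field names follow the example above).
interface EvidenceItem {
  source: string;      // e.g. "WHO Report 2023"
  reliability: number; // source reliability score, 0.0-1.0
  excerpt: string;
}

interface PocResult {
  claim: {
    text: string;
    type: "factual" | "non-factual" | "ambiguous";
    gate1_status: "PASS" | "FAIL";
  };
  scenarios: string[];
  evidence: { supporting: EvidenceItem[]; opposing: EvidenceItem[] };
  verdict: {
    probability: number; // 0.0-1.0
    uncertainty: number; // e.g. 0.15 = ±15%
    confidence: "LOW" | "MEDIUM" | "HIGH";
    reasoning: string;
    gate4_status: "PASS" | "FAIL";
  };
}

type GateResult = { pass: boolean; reason?: string };

// Gate 1: block claims that are not factual and verifiable (opinion/prediction/ambiguous).
function gate1(claim: PocResult["claim"]): GateResult {
  if (claim.type !== "factual") {
    return { pass: false, reason: `Claim is ${claim.type}, not a verifiable factual statement` };
  }
  return { pass: true };
}

// Gate 4: block low-confidence verdicts (<2 sources or quality score <0.6).
// qualityScore is an assumed aggregate measure supplied by the evaluation step.
function gate4(result: PocResult, qualityScore: number): GateResult {
  const sources = result.evidence.supporting.length + result.evidence.opposing.length;
  if (sources < 2) {
    return { pass: false, reason: `Only ${sources} source(s) found (minimum 2 required)` };
  }
  if (qualityScore < 0.6) {
    return { pass: false, reason: `Quality score ${qualityScore.toFixed(2)} below 0.6 threshold` };
  }
  return { pass: true };
}
```

A blocked output would carry gate1_status or gate4_status = "FAIL" together with the rejection reason, matching the Quality Status element in Section 5.1.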
== 6. Success Criteria ==

POC Success Definition: POC validates that AI can extract claims, find balanced evidence, and compute reasonable verdicts, with quality gates improving output quality.

=== 6.1 Functional Success ===

POC is successful if:

✅ FR1-FR7 Requirements Met:
  1. Extracts 3-5 factual claims from test articles
  2. Generates 2-3 scenarios per ambiguous claim
  3. Finds supporting AND opposing evidence
  4. Computes probabilistic verdicts with uncertainty
  5. Provides clear reasoning chains

✅ Quality Gates Work:
  1. Gate 1 blocks non-factual claims (100% block rate)
  2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
  3. Clear rejection reasons provided

✅ NFR11 Met:
  1. Quality gates reduce hallucination rate
  2. Blocked outputs have clear explanations
  3. Quality metrics are logged

=== 6.2 Quality Thresholds ===

Minimum Acceptable:
  • ≥70% of test claims correctly classified (factual/non-factual)
  • ≥60% of verdicts are reasonable (human evaluation)
  • Gate 1 blocks 100% of non-factual claims
  • Gate 4 blocks verdicts with <2 sources

Target:
  • ≥80% claims correctly classified
  • ≥75% verdicts are reasonable
  • <10% false positives (blocking good claims)

=== 6.3 POC Decision Gate ===

After POC1, we decide:

✅ PROCEED to POC2 if:
  • Success criteria met
  • Quality gates demonstrably improve output
  • Core workflow is technically sound
  • Clear path to production quality ⚠️ ITERATE POC1 if:
  • Success criteria partially met
  • Gates work but need tuning
  • Core issues identified but fixable

❌ PIVOT APPROACH if:
  • Success criteria not met
  • Fundamental AI limitations discovered
  • Quality gates insufficient
  • Alternative approach needed

== 7. Test Cases ==

=== 7.1 Happy Path ===

Test 1: Simple Factual Claim
  • Input: "Paris is the capital of France"
  • Expected: Factual, 1 scenario, verdict 95% true

Test 2: Ambiguous Claim
  • Input: "Switzerland has the highest income in Europe"
  • Expected: Factual, 2-3 scenarios, verdict with uncertainty

Test 3: Statistical Claim
  • Input: "10% of people have condition X"
  • Expected: Factual, evidence with numbers, probabilistic verdict

=== 7.2 Edge Cases ===

Test 4: Opinion
  • Input: "Paris is the best city"
  • Expected: Non-factual (opinion), blocked by Gate 1

Test 5: Prediction
  • Input: "Bitcoin will reach $100,000 next year"
  • Expected: Non-factual (prediction), blocked by Gate 1

Test 6: Insufficient Evidence
  • Input: Obscure factual claim with no sources
  • Expected: Blocked by Gate 4 (<2 sources)

=== 7.3 Quality Gate Tests ===

Test 7: Gate 1 Effectiveness
  • Input: Mix of 10 factual + 10 non-factual claims
  • Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)

Test 8: Gate 4 Effectiveness
  • Input: Claims with varying evidence availability
  • Expected: Gate 4 blocks low-confidence verdicts

== 8. Technical Architecture (POC) ==

=== 8.1 Simplified Architecture ===

POC Tech Stack:
  • Frontend: Simple web interface (Next.js + TypeScript)
  • Backend: Single API endpoint
  • AI: Claude API (Sonnet 4.5)
  • Storage: Local JSON files (no database)
  • Deployment: Single server

Architecture Diagram: See POC1 Specification

=== 8.2 AKEL Implementation ===

POC AKEL:
  • Single-threaded processing
  • Synchronous API calls
  • No caching
  • Basic error handling
  • Console logging

Full AKEL (POC2+):
  • Multi-threaded processing
  • Async API calls
  • Evidence caching
  • Advanced error handling with retry
  • Structured logging + monitoring

== 9. POC Philosophy ==

Important: POC validates concept, not production readiness. Focus is on proving AI can do the job, with production quality coming in later phases.

=== 9.1 Core Principles ===

1. Prove Concept, Not Production
  • POC validates AI can do the job
  • Production quality comes in POC2 and Beta 0
  • Focus on "does it work?" not "is it perfect?"

2. Implement Subset of Requirements
  • POC covers FR1-7, NFR11 (lite)
  • All other requirements deferred
  • Clear mapping to Main Requirements

3. Quality Gates Validate Approach
  • 2 gates prove the concept
  • Remaining 5 gates added in POC2
  • Gates must demonstrably improve quality

4. Iterate Based on Results
  • POC results determine next steps
  • Decision gate after POC1
  • Flexibility to pivot if needed

=== 9.2 Success = Clear Path Forward ===

POC succeeds if we can confidently answer:

✅ Technical Feasibility:
  • Can AI extract claims reliably?
  • Can AI find balanced evidence?
  • Can AI compute reasonable verdicts?

✅ Quality Approach:
  • Do quality gates improve output?
  • Can we measure and track quality?
  • Is the gate approach scalable?

✅ Production Path:
  • Is the core architecture sound?
  • What needs improvement for production?
  • Is POC2 the right next step?

== 10. Related Pages ==

  • Main Requirements - Full system requirements (this POC implements a subset)
  • POC1 Specification (Detailed) - Detailed POC1 technical specs
  • POC Summary - High-level POC overview
  • Implementation Roadmap - POC1, POC2, Beta 0, V1.0 phases
  • User Needs - What users need (drives requirements)

Document Owner: Technical Team
Review Frequency: After each POC iteration

Version History:
  • v1.0 - Initial POC requirements
  • v2.0 - Updated after specification cross-check
  • v3.0 - Aligned with Main Requirements (FR/NFR IDs added)