POC Requirements
Status: ✅ Approved for Development
Version: 3.0 (Aligned with Main Requirements)
Goal: Prove that AI can extract claims and determine verdicts automatically without human intervention

== 1. POC Overview ==

=== 1.1 What the POC Tests ===

Core Question: Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?

What we're proving:
- AI can identify factual claims from text
- AI can evaluate those claims with structured evidence
- Quality gates can filter unreliable outputs
- The core workflow is technically feasible

What we're NOT proving:
- Production-ready reliability (that's POC2)
- User-facing features (that's Beta 0)
- Full IFCN compliance (that's V1.0)

=== 1.2 Requirements Mapping ===

POC1 implements a subset of the full system requirements defined in Main Requirements.

Scope Summary:
- In Scope: 8 requirements (7 FRs + 1 NFR)
- Partial: 3 NFRs (simplified versions)
- Out of Scope: 19 requirements (deferred to later phases)

== 2. Requirements Scope Matrix ==

|=Requirement|=POC1 Status|=Implementation Level|=Notes
| CORE WORKFLOW | | |
| FR1: Claim Extraction | ✅ In Scope | Full | AKEL extracts claims from text
| FR2: Claim Context | ✅ In Scope | Basic | Context preserved with claim
| FR3: Multiple Scenarios | ✅ In Scope | Full | AKEL generates interpretation scenarios
| FR4: Analysis Summary | ✅ In Scope | Basic | Simple summary format
| FR5: Evidence Collection | ✅ In Scope | Full | AKEL searches for evidence
| FR6: Evidence Evaluation | ✅ In Scope | Full | AKEL evaluates source reliability
| FR7: Automated Verdicts | ✅ In Scope | Full | AKEL computes verdicts with uncertainty
| QUALITY & RELIABILITY | | |
| NFR11: Quality Assurance | ✅ In Scope | Lite | 2 gates only (Gate 1 & 4)
| NFR1: Performance | ⚠️ Partial | Basic | Response time monitored, not optimized
| NFR2: Scalability | ⚠️ Partial | Single-thread | No concurrent processing
| NFR3: Reliability | ⚠️ Partial | Basic | Error handling, no retry logic
| DEFERRED TO LATER | | |
| FR8-FR13 | ❌ Out of Scope | N/A | User accounts, corrections, publishing
| FR44-FR53 | ❌ Out of Scope | N/A | Advanced features (V1.0+)
| NFR4: Security | ❌ Out of Scope | N/A | POC2
| NFR5: Maintainability | ❌ Out of Scope | N/A | POC2
| NFR12: Security Controls | ❌ Out of Scope | N/A | Beta 0
| NFR13: Monitoring | ❌ Out of Scope | N/A | POC2

== 3. POC Simplifications ==

=== 3.1 FR1: Claim Extraction (Full Implementation) ===

Main Requirement: AI extracts factual claims from input text

POC Implementation:
- ✅ AKEL extracts claims using LLM
- ✅ Each claim includes original text reference
- ✅ Claims are identified as factual/non-factual
- ❌ No advanced claim parsing (added in POC2)

Acceptance Criteria:
- Extracts 3-5 claims from typical article
- Identifies factual vs non-factual claims
- Quality Gate 1 validates extraction
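In practice, the extraction step is a single structured LLM call. The sketch below is illustrative only: it assumes the official @anthropic-ai/sdk TypeScript client and an ANTHROPIC_API_KEY in the environment, and the prompt, model identifier, and ExtractedClaim shape are placeholders rather than the actual AKEL prompt and schema (those live in the POC1 Specification).

```
// Minimal claim-extraction sketch (illustrative only).
// Assumes @anthropic-ai/sdk and ANTHROPIC_API_KEY in the environment;
// prompt, model name, and output shape are placeholders, not the final AKEL design.
import Anthropic from "@anthropic-ai/sdk";

interface ExtractedClaim {
  text: string;                                   // claim as found in the article
  type: "factual" | "non-factual" | "ambiguous";  // classification used by Gate 1
  reason?: string;                                // why a claim is non-factual, if applicable
}

const client = new Anthropic();

export async function extractClaims(article: string): Promise<ExtractedClaim[]> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",   // placeholder model identifier
    max_tokens: 1024,
    messages: [{
      role: "user",
      content:
        "Extract the factual claims from the article below. " +
        "Return a JSON array of objects with fields: text, type (factual | non-factual | ambiguous), reason.\n\n" +
        article,
    }],
  });

  const block = response.content[0];
  return block.type === "text" ? JSON.parse(block.text) : [];
}
```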
=== 3.2 FR3: Multiple Scenarios (Full Implementation) ===

Main Requirement: Generate multiple interpretation scenarios for ambiguous claims

POC Implementation:
- ✅ AKEL generates 2-3 scenarios per claim
- ✅ Scenarios capture different interpretations
- ✅ Each scenario is evaluated separately
- ✅ Verdict considers all scenarios

Acceptance Criteria:
- Generates 2+ scenarios for ambiguous claims
- Scenarios are meaningfully different
- All scenarios are evaluated

=== 3.3 FR4: Analysis Summary (Basic Implementation) ===

Main Requirement: Provide user-friendly summary of analysis

POC Implementation:
- ✅ Simple text summary generated
- ❌ No rich formatting (added in Beta 0)
- ❌ No visual elements (added in Beta 0)
- ❌ No interactive features (added in Beta 0)

POC Format:
```
Claim: [extracted claim]
Scenarios: [list of scenarios]
Evidence: [supporting/opposing evidence]
Verdict: [probability with uncertainty]
```
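Because the POC summary is plain text, it can be produced with simple string templating. The sketch below is a minimal illustration; the AnalysisResult shape and formatSummary helper are assumptions, not the real AKEL types.

```
// Minimal sketch of rendering the POC summary format above.
// AnalysisResult and its fields are assumptions for illustration, not the real AKEL types.
interface AnalysisResult {
  claim: string;
  scenarios: string[];
  supporting: string[];
  opposing: string[];
  probability: number;   // 0-1
  uncertainty: number;   // e.g. 0.15 for ±15%
}

export function formatSummary(r: AnalysisResult): string {
  return [
    `Claim: ${r.claim}`,
    `Scenarios: ${r.scenarios.join("; ")}`,
    `Evidence: supporting: ${r.supporting.join("; ") || "none"}; opposing: ${r.opposing.join("; ") || "none"}`,
    `Verdict: ${Math.round(r.probability * 100)}% likely true (±${Math.round(r.uncertainty * 100)}%)`,
  ].join("\n");
}
```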
=== 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===

Main Requirements:
- FR5: Collect supporting and opposing evidence
- FR6: Evaluate evidence source reliability

POC Implementation:
- ✅ AKEL searches for evidence (web/knowledge base)
- ✅ Mandatory contradiction search (finds opposing evidence)
- ✅ Source reliability scoring
- ❌ No evidence deduplication (added in POC2)
- ❌ No advanced source verification (added in POC2)
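The mandatory contradiction search amounts to issuing two symmetric queries per claim: one looking for support, one explicitly looking for refutation. The sketch below illustrates that flow; searchWeb and scoreSourceReliability are injected, hypothetical helpers, and the real retrieval and scoring logic is defined in the POC1 Specification.

```
// Illustrative sketch of evidence collection with the mandatory contradiction search.
// searchWeb and scoreSourceReliability are injected, hypothetical helpers — not a real API.
interface EvidenceItem {
  source: string;       // e.g. "WHO Report 2023"
  url: string;
  excerpt: string;
  reliability: number;  // 0-1 source reliability score
}

export async function collectEvidence(
  claim: string,
  searchWeb: (query: string) => Promise<EvidenceItem[]>,
  scoreSourceReliability: (item: EvidenceItem) => number,
): Promise<{ supporting: EvidenceItem[]; opposing: EvidenceItem[] }> {
  // Two searches, run one after the other (the POC is single-threaded and synchronous).
  const supportingRaw = await searchWeb(`evidence that ${claim}`);
  // The contradiction search is mandatory: opposing evidence is always requested.
  const opposingRaw = await searchWeb(`evidence against ${claim}`);

  const score = (items: EvidenceItem[]) =>
    items.map((item) => ({ ...item, reliability: scoreSourceReliability(item) }));

  return { supporting: score(supportingRaw), opposing: score(opposingRaw) };
}
```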
Acceptance Criteria:
- Finds 2+ supporting evidence items
- Finds 1+ opposing evidence (if exists)
- Sources scored for reliability

=== 3.5 FR7: Automated Verdicts (Full Implementation) ===

Main Requirement: AI computes verdicts with uncertainty quantification

POC Implementation:
- ✅ Probabilistic verdicts (0-100% confidence)
- ✅ Uncertainty explicitly stated
- ✅ Reasoning chain provided
- ✅ Quality Gate 4 validates verdict confidence

POC Output:
```
Verdict: 70% likely true
Uncertainty: ±15% (moderate confidence)
Reasoning: Based on 3 high-quality sources...
Confidence Level: MEDIUM
```
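To make "probability with uncertainty" concrete, the toy aggregation below shows how reliability-weighted supporting and opposing evidence could map onto the output format above. This is deliberately not the AKEL verdict algorithm (in the POC the verdict, uncertainty, and reasoning come from the LLM's structured output); it is only an illustration of the data involved.

```
// Hypothetical illustration of a probability-with-uncertainty verdict.
// NOT the AKEL algorithm (the POC gets verdicts from the LLM); it only shows
// how reliability-weighted evidence could map to the output format above.
interface ScoredEvidence {
  reliability: number;  // 0-1
  supports: boolean;    // true = supporting, false = opposing
}

interface Verdict {
  probability: number;                  // 0-1, likelihood the claim is true
  uncertainty: number;                  // e.g. 0.15 means ±15%
  confidence: "LOW" | "MEDIUM" | "HIGH";
}

export function illustrativeVerdict(evidence: ScoredEvidence[]): Verdict {
  const support = evidence.filter((e) => e.supports).reduce((s, e) => s + e.reliability, 0);
  const oppose = evidence.filter((e) => !e.supports).reduce((s, e) => s + e.reliability, 0);
  const total = support + oppose;

  // No usable evidence: maximally uncertain.
  if (total === 0) return { probability: 0.5, uncertainty: 0.5, confidence: "LOW" };

  const probability = support / total;
  // Less evidence and more disagreement -> wider uncertainty band (toy heuristic).
  const uncertainty = Math.min(0.5, 1 / (2 * evidence.length) + Math.min(support, oppose) / total / 2);
  const confidence = uncertainty <= 0.1 ? "HIGH" : uncertainty <= 0.2 ? "MEDIUM" : "LOW";

  return { probability, uncertainty, confidence };
}
```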
Acceptance Criteria:
- Verdicts include probability (0-100%)
- Uncertainty explicitly quantified
- Reasoning chain explains verdict

=== 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===

Main Requirement: Complete quality assurance with 7 quality gates

POC Implementation: 2 gates only

Quality Gate 1: Claim Validation
- ✅ Validates claim is factual and verifiable
- ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
- ✅ Provides clear rejection reason

Quality Gate 4: Verdict Confidence Assessment
- ✅ Validates ≥2 sources found
- ✅ Validates quality score ≥0.6
- ✅ Blocks low-confidence verdicts
- ✅ Provides clear rejection reason
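Both POC gates reduce to deterministic checks over structured output. The sketch below encodes them; the Claim and Verdict input shapes are assumptions for illustration, while the thresholds (fewer than 2 sources, quality score below 0.6) are the ones stated above.

```
// Sketch of the two POC quality gates as deterministic checks.
// Input shapes are assumptions for illustration; thresholds match the text above.
interface GateResult {
  status: "PASS" | "FAIL";
  reason?: string;                 // clear rejection reason when blocked
}

interface ClaimForGate1 {
  type: "factual" | "non-factual" | "ambiguous";
  rejectionReason?: string;        // e.g. "opinion", "prediction"
}

interface VerdictForGate4 {
  sourceCount: number;             // number of distinct evidence sources
  qualityScore: number;            // 0-1 aggregate quality score
}

export function gate1(claim: ClaimForGate1): GateResult {
  if (claim.type !== "factual") {
    return { status: "FAIL", reason: `Claim is not verifiable: ${claim.rejectionReason ?? claim.type}` };
  }
  return { status: "PASS" };
}

export function gate4(verdict: VerdictForGate4): GateResult {
  if (verdict.sourceCount < 2) {
    return { status: "FAIL", reason: "Fewer than 2 sources found" };
  }
  if (verdict.qualityScore < 0.6) {
    return { status: "FAIL", reason: `Quality score ${verdict.qualityScore.toFixed(2)} below 0.6 threshold` };
  }
  return { status: "PASS" };
}
```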
Out of Scope (POC2+):
- ❌ Gate 2: Evidence Relevance
- ❌ Gate 3: Scenario Coherence
- ❌ Gate 5: Source Diversity
- ❌ Gate 6: Reasoning Validity
- ❌ Gate 7: Output Completeness

Rationale: Prove gate concept works. Add remaining gates in POC2 after validating approach.

=== 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===

Main Requirements:
- NFR1: Response time < 30 seconds
- NFR2: Handle 1000+ concurrent users
- NFR3: 99.9% uptime

POC Implementation:
- ⚠️ Response time monitored (not optimized)
- ⚠️ Single-threaded processing (no concurrency)
- ⚠️ Basic error handling (no advanced retry logic)

Rationale: POC proves functionality. Performance optimization happens in POC2.

POC Acceptance:
- Analysis completes (no timeout requirement)
- Errors don't crash system
- Basic logging in place

== 4. What's NOT in POC Scope ==

=== 4.1 User-Facing Features (Beta 0+) ===

Out of Scope:
- ❌ User accounts and authentication (FR8)
- ❌ User corrections system (FR9, FR45-46)
- ❌ Public publishing interface (FR10)
- ❌ Social sharing (FR11)
- ❌ Email notifications (FR12)
- ❌ API access (FR13)

Rationale: POC validates AI capabilities. User features added in Beta 0.

=== 4.2 Advanced Features (V1.0+) ===

Out of Scope:
- ❌ IFCN compliance (FR47)
- ❌ ClaimReview schema (FR48)
- ❌ Archive.org integration (FR49)
- ❌ OSINT toolkit (FR50)
- ❌ Video verification (FR51)
- ❌ Deepfake detection (FR52)
- ❌ Cross-org sharing (FR53)

Rationale: Advanced features require a proven platform. Added post-V1.0.

=== 4.3 Production Requirements (POC2, Beta 0) ===

Out of Scope:
- ❌ Security controls (NFR4, NFR12)
- ❌ Code maintainability (NFR5)
- ❌ System monitoring (NFR13)
- ❌ Evidence deduplication
- ❌ Advanced source verification
- ❌ Full 7-gate quality framework

Rationale: POC proves concept. Production hardening happens in POC2 and Beta 0.

== 5. POC Output Specification ==

=== 5.1 Required Output Elements ===

For each analyzed claim, the POC must produce:

1. Claim
- Original text
- Classification (factual/non-factual/ambiguous)
- If non-factual: Clear reason why

2. Scenarios (if factual)
- 2-3 interpretation scenarios
- Each scenario clearly described

3. Evidence (if factual)
- Supporting evidence (2+ items)
- Opposing evidence (if exists)
- Source URLs and reliability scores

4. Verdict (if factual)
- Probability (0-100%)
- Uncertainty quantification
- Confidence level (LOW/MEDIUM/HIGH)
- Reasoning chain

5. Quality Status
- Which gates passed/failed
- If failed: Clear explanation why

=== 5.2 Example POC Output ===

```
{
  "claim": {
    "text": "Switzerland has the highest life expectancy in Europe",
    "type": "factual",
    "gate1_status": "PASS"
  },
  "scenarios": [
    "Switzerland's overall life expectancy is highest",
    "Switzerland ranks highest for specific age groups"
  ],
  "evidence": {
    "supporting": [
      {
        "source": "WHO Report 2023",
        "reliability": 0.95,
        "excerpt": "Switzerland: 83.4 years average..."
      }
    ],
    "opposing": [
      {
        "source": "Eurostat 2024",
        "reliability": 0.90,
        "excerpt": "Spain leads at 83.5 years..."
      }
    ]
  },
  "verdict": {
    "probability": 0.65,
    "uncertainty": 0.15,
    "confidence": "MEDIUM",
    "reasoning": "WHO and Eurostat show similar but conflicting data...",
    "gate4_status": "PASS"
  }
}
```
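For implementation purposes, the same structure can be written down as a TypeScript type derived directly from the example. Field names mirror the JSON above; the optional url field is added only because section 5.1 asks for source URLs.

```
// Type describing the example POC output above (names mirror the JSON fields).
// The optional `url` field reflects the "source URLs" requirement in section 5.1.
interface EvidenceItem {
  source: string;
  url?: string;
  reliability: number;           // 0-1 source reliability score
  excerpt: string;
}

interface PocOutput {
  claim: {
    text: string;
    type: "factual" | "non-factual" | "ambiguous";
    gate1_status: "PASS" | "FAIL";
  };
  scenarios: string[];
  evidence: {
    supporting: EvidenceItem[];
    opposing: EvidenceItem[];
  };
  verdict: {
    probability: number;         // 0-1
    uncertainty: number;         // e.g. 0.15 for ±15%
    confidence: "LOW" | "MEDIUM" | "HIGH";
    reasoning: string;
    gate4_status: "PASS" | "FAIL";
  };
}
```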
== 6. Success Criteria ==

=== 6.1 Functional Success ===

POC is successful if:

✅ FR1-FR7 Requirements Met:
1. Extracts 3-5 factual claims from test articles
2. Generates 2-3 scenarios per ambiguous claim
3. Finds supporting AND opposing evidence
4. Computes probabilistic verdicts with uncertainty
5. Provides clear reasoning chains

✅ Quality Gates Work:
1. Gate 1 blocks non-factual claims (100% block rate)
2. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
3. Clear rejection reasons provided

✅ NFR11 Met:
1. Quality gates reduce hallucination rate
2. Blocked outputs have clear explanations
3. Quality metrics are logged

=== 6.2 Quality Thresholds ===

Minimum Acceptable:
- ≥70% of test claims correctly classified (factual/non-factual)
- ≥60% of verdicts are reasonable (human evaluation)
- Gate 1 blocks 100% of non-factual claims
- Gate 4 blocks verdicts with <2 sources

Target:
- ≥80% claims correctly classified
- ≥75% verdicts are reasonable
- <10% false positives (blocking good claims)
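These thresholds can be checked mechanically once the test-set metrics are collected. The sketch below is a minimal helper under assumed metric names; the "reasonable verdict" rate itself still comes from human evaluation, and the Gate 4 source-count rule is checked per verdict rather than as an aggregate rate.

```
// Minimal sketch of checking POC results against the quality thresholds above.
// Metric names are assumptions; the "reasonable verdict" rate comes from human evaluation.
interface PocMetrics {
  classificationAccuracy: number;  // fraction of claims correctly classified factual/non-factual
  reasonableVerdictRate: number;   // fraction of verdicts judged reasonable by humans
  nonFactualBlockRate: number;     // fraction of non-factual claims blocked by Gate 1
  falsePositiveRate: number;       // fraction of good claims wrongly blocked
}

export function meetsMinimum(m: PocMetrics): boolean {
  return (
    m.classificationAccuracy >= 0.70 &&
    m.reasonableVerdictRate >= 0.60 &&
    m.nonFactualBlockRate === 1.0
  );
}

export function meetsTarget(m: PocMetrics): boolean {
  return (
    m.classificationAccuracy >= 0.80 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.10
  );
}
```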
=== 6.3 POC Decision Gate ===

After POC1, we decide:

✅ PROCEED to POC2 if:
- Success criteria met
- Quality gates demonstrably improve output
- Core workflow is technically sound
- Clear path to production quality

⚠️ ITERATE POC1 if:
- Success criteria partially met
- Gates work but need tuning
- Core issues identified but fixable

❌ PIVOT APPROACH if:
- Success criteria not met
- Fundamental AI limitations discovered
- Quality gates insufficient
- Alternative approach needed

== 7. Test Cases ==

=== 7.1 Happy Path ===

Test 1: Simple Factual Claim
- Input: "Paris is the capital of France"
- Expected: Factual, 1 scenario, verdict 95% true

Test 2: Ambiguous Claim
- Input: "Switzerland has the highest income in Europe"
- Expected: Factual, 2-3 scenarios, verdict with uncertainty

Test 3: Statistical Claim
- Input: "10% of people have condition X"
- Expected: Factual, evidence with numbers, probabilistic verdict

=== 7.2 Edge Cases ===

Test 4: Opinion
- Input: "Paris is the best city"
- Expected: Non-factual (opinion), blocked by Gate 1

Test 5: Prediction
- Input: "Bitcoin will reach $100,000 next year"
- Expected: Non-factual (prediction), blocked by Gate 1

Test 6: Insufficient Evidence
- Input: Obscure factual claim with no sources
- Expected: Blocked by Gate 4 (<2 sources)

=== 7.3 Quality Gate Tests ===

Test 7: Gate 1 Effectiveness
- Input: Mix of 10 factual + 10 non-factual claims
- Expected: Gate 1 blocks all 10 non-factual claims (100% block rate)

Test 8: Gate 4 Effectiveness
- Input: Claims with varying evidence availability
- Expected: Gate 4 blocks low-confidence verdicts
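The cases above translate directly into a small fixture table the POC harness can run end to end. The sketch below encodes the expected outcomes; the field names are illustrative, not a fixed harness API.

```
// Test fixtures derived from the cases above; field names are illustrative, not a fixed harness API.
interface PocTestCase {
  name: string;
  input: string;
  expect: {
    classification: "factual" | "non-factual";
    blockedBy?: "gate1" | "gate4";   // undefined = should pass both gates
  };
}

export const testCases: PocTestCase[] = [
  { name: "Simple factual claim",  input: "Paris is the capital of France",
    expect: { classification: "factual" } },
  { name: "Ambiguous claim",       input: "Switzerland has the highest income in Europe",
    expect: { classification: "factual" } },
  { name: "Statistical claim",     input: "10% of people have condition X",
    expect: { classification: "factual" } },
  { name: "Opinion",               input: "Paris is the best city",
    expect: { classification: "non-factual", blockedBy: "gate1" } },
  { name: "Prediction",            input: "Bitcoin will reach $100,000 next year",
    expect: { classification: "non-factual", blockedBy: "gate1" } },
  // Test 6 (insufficient evidence) needs an obscure claim with no findable sources
  // and is chosen by the tester at run time.
];
```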
== 8. Technical Architecture (POC) ==

=== 8.1 Simplified Architecture ===

POC Tech Stack:
- Frontend: Simple web interface (Next.js + TypeScript)
- Backend: Single API endpoint
- AI: Claude API (Sonnet 4.5)
- Storage: Local JSON files (no database)
- Deployment: Single server

Architecture Diagram: See POC1 Specification
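With a single backend endpoint, the whole POC surface can be one Next.js App Router route handler that forwards the article text to AKEL. The sketch below assumes a hypothetical analyzeArticle() entry point and module path (see the pipeline sketch in section 8.2); error handling is intentionally basic, matching the POC scope of NFR3.

```
// app/api/analyze/route.ts — minimal sketch of the single POC endpoint (Next.js App Router).
// analyzeArticle() is the hypothetical AKEL entry point sketched in section 8.2.
import { NextResponse } from "next/server";
import { analyzeArticle } from "@/lib/akel";   // assumed module path

export async function POST(request: Request) {
  try {
    const { text } = await request.json();
    if (!text || typeof text !== "string") {
      return NextResponse.json({ error: "Missing 'text' field" }, { status: 400 });
    }

    // Single awaited analysis pass; no queue, no caching in the POC.
    const result = await analyzeArticle(text);
    return NextResponse.json(result);
  } catch (err) {
    // Basic error handling: errors don't crash the server, they return a 500.
    console.error("Analysis failed:", err);
    return NextResponse.json({ error: "Analysis failed" }, { status: 500 });
  }
}
```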
=== 8.2 AKEL Implementation ===

POC AKEL:
- Single-threaded processing
- Synchronous API calls
- No caching
- Basic error handling
- Console logging
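Under these constraints, the POC AKEL is essentially one sequential loop per article. The sketch below shows that control flow with dependency-injected step functions so it stays self-contained; the step names and shapes are the illustrative ones used earlier on this page, not a fixed API.

```
// lib/akel.ts — sketch of the sequential POC pipeline (module path and names are illustrative).
// Step functions are injected so the sketch stays self-contained; in the POC they would be imports.

type Gate = { status: "PASS" | "FAIL"; reason?: string };

export async function analyzeArticle(
  article: string,
  steps: {
    extractClaims: (article: string) => Promise<{ text: string; type: string }[]>;
    gate1: (claim: { text: string; type: string }) => Gate;
    evaluateClaim: (claimText: string) => Promise<{ verdict: unknown; gate4: Gate }>;
  },
) {
  const results = [];

  // Single-threaded: claims are processed strictly one after another, no caching, no retry logic.
  for (const claim of await steps.extractClaims(article)) {
    const gate1 = steps.gate1(claim);
    if (gate1.status === "FAIL") {
      console.log(`Gate 1 blocked "${claim.text}": ${gate1.reason}`);   // console logging only in the POC
      results.push({ claim, gate1 });
      continue;
    }

    // Scenario generation, evidence collection, and verdict computation happen inside evaluateClaim.
    const { verdict, gate4 } = await steps.evaluateClaim(claim.text);
    console.log(`Gate 4 ${gate4.status} for "${claim.text}"`);
    results.push({ claim, gate1, verdict, gate4 });
  }

  return results;
}
```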
Full AKEL (POC2+):
- Multi-threaded processing
- Async API calls
- Evidence caching
- Advanced error handling with retry
- Structured logging + monitoring

== 9. POC Philosophy ==

=== 9.1 Core Principles ===

1. Prove Concept, Not Production
- POC validates AI can do the job
- Production quality comes in POC2 and Beta 0
- Focus on "does it work?" not "is it perfect?"

2. Implement Subset of Requirements
- POC covers FR1-7, NFR11 (lite)
- All other requirements deferred
- Clear mapping to Main Requirements

3. Quality Gates Validate Approach
- 2 gates prove the concept
- Remaining 5 gates added in POC2
- Gates must demonstrably improve quality

4. Iterate Based on Results
- POC results determine next steps
- Decision gate after POC1
- Flexibility to pivot if needed

=== 9.2 Success = Clear Path Forward ===

POC succeeds if we can confidently answer:

✅ Technical Feasibility:
- Can AI extract claims reliably?
- Can AI find balanced evidence?
- Can AI compute reasonable verdicts?

✅ Quality Approach:
- Do quality gates improve output?
- Can we measure and track quality?
- Is the gate approach scalable?

✅ Production Path:
- Is the core architecture sound?
- What needs improvement for production?
- Is POC2 the right next step?

== 10. Related Pages ==

- Main Requirements - Full system requirements (this POC implements a subset)
- POC1 Specification (Detailed) - Detailed POC1 technical specs
- POC Summary - High-level POC overview
- Implementation Roadmap - POC1, POC2, Beta 0, V1.0 phases
- User Needs - What users need (drives requirements)

Document Owner: Technical Team
Review Frequency: After each POC iteration

Version History:
- v1.0 - Initial POC requirements
- v2.0 - Updated after specification cross-check
- v3.0 - Aligned with Main Requirements (FR/NFR IDs added)