Changes for page POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/23 16:49

From version 1.1

edited by Robert Schaub
on 2025/12/23 16:15

Change comment: Imported from XAR

To version 1.2

edited by Robert Schaub
on 2025/12/23 16:49

Change comment: Renamed from xwiki:Test.FactHarbor.Specification.POC.Requirements

Summary

Page properties (1 modified, 0 added, 0 removed)

Details

Page properties

Content

@@ -14,9 +14,11 @@
  === 1.1 What POC Tests ===
  **Core Question:**
++
  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
  **What we're proving:**
++
  * AI can identify factual claims from text
  * AI can evaluate those claims with structured evidence
  * Quality gates can filter unreliable outputs
@@ -23,6 +23,7 @@
  * The core workflow is technically feasible
  **What we're NOT proving:**
++
  * Production-ready reliability (that's POC2)
  * User-facing features (that's Beta 0)
  * Full IFCN compliance (that's V1.0)
@@ -32,15 +32,15 @@
  POC1 implements a **subset** of the full system requirements defined in [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]].
  **Scope Summary:**
++
  * **In Scope:** 8 requirements (7 FRs + 1 NFR)
  * **Partial:** 3 NFRs (simplified versions)
  * **Out of Scope:** 19 requirements (deferred to later phases)
--
  == 2. POC1 Scope ==
  {{success}}
--**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
++**Authoritative Source for Phase Mapping:** [[Requirements Roadmap Matrix>>Test.FactHarbor V0\.9\.84.Roadmap.Requirements-Roadmap-Matrix.WebHome]]
  The Roadmap Matrix is the single source of truth for which requirements are implemented in which phases. This page provides POC1-specific implementation details only.
  {{/success}}
@@ -53,6 +53,7 @@
  | **NFR11** | Quality Assurance Framework | 2 quality gates implemented (Gates 1 and 4)
  **POC1 also implements these workflow components** (detailed as FR1-FR6, FR11, FR13 in implementation sections below):
++
  * Claim extraction (FR1)
  * Claim context (FR2)
  * Multiple scenarios (FR3)
@@ -63,6 +63,7 @@
  * In-article highlighting (FR13) - deferred to Beta 0
  **Partial implementations:**
++
  * NFR1 (Explainability) - Basic only
  * NFR2 (Performance) - Functional but not optimized
  * NFR3 (Transparency) - Basic only
@@ -78,6 +78,7 @@
  **Main Requirement:** AI extracts factual claims from input text
  **POC Implementation:**
++
  * ✅ AKEL extracts claims using LLM
  * ✅ Each claim includes original text reference
  * ✅ Claims are identified as factual/non-factual
@@ -84,16 +84,17 @@
  * ❌ No advanced claim parsing (added in POC2)
  **Acceptance Criteria:**
++
  * Extracts 3-5 claims from typical article
  * Identifies factual vs non-factual claims
  * Quality Gate 1 validates extraction
--
  === 3.2 FR3: Multiple Scenarios (Full Implementation) ===
  **Main Requirement:** Generate multiple interpretation scenarios for ambiguous claims
  **POC Implementation:**
++
  * ✅ AKEL generates 2-3 scenarios per claim
  * ✅ Scenarios capture different interpretations
  * ✅ Each scenario is evaluated separately
@@ -100,16 +100,17 @@
  * ✅ Verdict considers all scenarios
  **Acceptance Criteria:**
++
  * Generates 2+ scenarios for ambiguous claims
  * Scenarios are meaningfully different
  * All scenarios are evaluated
--
  === 3.3 FR4: Analysis Summary (Basic Implementation) ===
  **Main Requirement:** Provide user-friendly summary of analysis
  **POC Implementation:**
++
  * ✅ Simple text summary generated
  * ❌ No rich formatting (added in Beta 0)
  * ❌ No visual elements (added in Beta 0)
@@ -127,10 +127,12 @@
  === 3.4 FR5-FR6: Evidence Collection & Evaluation (Full Implementation) ===
  **Main Requirements:**
++
  * FR5: Collect supporting and opposing evidence
  * FR6: Evaluate evidence source reliability
  **POC Implementation:**
++
  * ✅ AKEL searches for evidence (web/knowledge base)
  * ✅ **Mandatory contradiction search** (finds opposing evidence)
  * ✅ Source reliability scoring
@@ -138,16 +138,17 @@
  * ❌ No advanced source verification (added in POC2)
  **Acceptance Criteria:**
++
  * Finds 2+ supporting evidence items
  * Finds 1+ opposing evidence items (if any exist)
  * Sources scored for reliability
--
  === 3.5 FR7: Automated Verdicts (Full Implementation) ===
  **Main Requirement:** AI computes verdicts with uncertainty quantification
  **POC Implementation:**
++
  * ✅ Probabilistic verdicts (0-100% confidence)
  * ✅ Uncertainty explicitly stated
  * ✅ Reasoning chain provided
@@ -162,11 +162,11 @@
  ```
  **Acceptance Criteria:**
++
  * Verdicts include probability (0-100%)
  * Uncertainty explicitly quantified
  * Reasoning chain explains verdict
--
  === 3.6 NFR11: Quality Assurance Framework (LITE VERSION) ===
  **Main Requirement:** Complete quality assurance with 7 quality gates
@@ -174,11 +174,13 @@
  **POC Implementation:** **2 gates only**
  **Quality Gate 1: Claim Validation**
++
  * ✅ Validates claim is factual and verifiable
  * ✅ Blocks non-factual claims (opinion/prediction/ambiguous)
  * ✅ Provides clear rejection reason
  **Quality Gate 4: Verdict Confidence Assessment**
++
  * ✅ Validates ≥2 sources found
  * ✅ Validates quality score ≥0.6
  * ✅ Blocks low-confidence verdicts
@@ -185,6 +185,7 @@
  * ✅ Provides clear rejection reason
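The Gate 4 rules above (at least 2 sources, quality score of at least 0.6, and a clear rejection reason) can be sketched as a small check. This is only an illustration: `VerdictCandidate`, `GateResult`, `checkGate4`, and all field names are assumptions, not the POC's actual interfaces.

```typescript
// Hypothetical shapes: the names below are illustrative assumptions.
interface VerdictCandidate {
  sourceCount: number;  // number of evidence sources found for the claim
  qualityScore: number; // aggregate quality score, assumed to lie in [0, 1]
}

interface GateResult {
  passed: boolean;
  reason?: string; // set only when the gate blocks the verdict
}

const MIN_SOURCES = 2;         // "Validates ≥2 sources found"
const MIN_QUALITY_SCORE = 0.6; // "Validates quality score ≥0.6"

// Blocks a verdict when either threshold is missed and says why.
function checkGate4(candidate: VerdictCandidate): GateResult {
  if (candidate.sourceCount < MIN_SOURCES) {
    return {
      passed: false,
      reason: `Found ${candidate.sourceCount} source(s); at least ${MIN_SOURCES} are required`,
    };
  }
  if (candidate.qualityScore < MIN_QUALITY_SCORE) {
    return {
      passed: false,
      reason: `Quality score ${candidate.qualityScore} is below ${MIN_QUALITY_SCORE}`,
    };
  }
  return { passed: true };
}
```

Both thresholds are inclusive, so a candidate with exactly 2 sources and a score of exactly 0.6 passes.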
  **Out of Scope (POC2+):**
++
  * ❌ Gate 2: Evidence Relevance
  * ❌ Gate 3: Scenario Coherence
  * ❌ Gate 5: Source Diversity
@@ -197,11 +197,13 @@
  === 3.7 NFR1-3: Performance, Scalability, Reliability (Basic) ===
  **Main Requirements:**
++
  * NFR1: Response time < 30 seconds
  * NFR2: Handle 1000+ concurrent users
  * NFR3: 99.9% uptime
  **POC Implementation:**
++
  * ⚠️ **Response time monitored** (not optimized)
  * ⚠️ **Single-threaded processing** (no concurrency)
  * ⚠️ **Basic error handling** (no advanced retry logic)
@@ -209,11 +209,11 @@
  **Rationale:** POC proves functionality. Performance optimization happens in POC2.
  **POC Acceptance:**
++
  * Analysis completes (no timeout requirement)
  * Errors don't crash system
  * Basic logging in place
--
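The POC acceptance points above (analysis completes, errors don't crash the system, basic logging) could look roughly like the loop below. It is a minimal sketch under stated assumptions: `analyzeAll`, `AnalysisOutcome`, and the synchronous `analyze` callback are hypothetical stand-ins for the real AKEL call.

```typescript
// Hypothetical types and names; the POC's real interfaces may differ.
type AnalysisOutcome =
  | { claim: string; status: "ok"; result: string }
  | { claim: string; status: "error"; message: string };

// Processes claims one at a time (no concurrency, no retry) and never
// lets a single failure abort the whole run.
function analyzeAll(
  claims: string[],
  analyze: (claim: string) => string, // stand-in for the real analysis step
): AnalysisOutcome[] {
  const outcomes: AnalysisOutcome[] = [];
  for (const claim of claims) {
    try {
      outcomes.push({ claim, status: "ok", result: analyze(claim) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Basic logging only: write the failure to the console and move on.
      console.error(`[analysis] failed for "${claim}": ${message}`);
      outcomes.push({ claim, status: "error", message });
    }
  }
  return outcomes;
}
```

Recording the failure as an outcome, rather than rethrowing, keeps the remaining claims in the batch processable.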
  == 4. What's NOT in POC Scope ==
  === 4.1 User-Facing Features (Beta 0+) ===
@@ -223,6 +223,7 @@
  {{/warning}}
  **Out of Scope:**
++
  * ❌ User accounts and authentication (FR8)
  * ❌ User corrections system (FR9, FR45-46)
  * ❌ Public publishing interface (FR10)
@@ -236,6 +236,7 @@
  === 4.2 Advanced Features (V1.0+) ===
  **Out of Scope:**
++
  * ❌ IFCN compliance (FR47)
  * ❌ ClaimReview schema (FR48)
  * ❌ Archive.org integration (FR49)
@@ -250,6 +250,7 @@
  === 4.3 Production Requirements (POC2, Beta 0) ===
  **Out of Scope:**
++
  * ❌ Security controls (NFR4, NFR12)
  * ❌ Code maintainability (NFR5)
  * ❌ System monitoring (NFR13)
@@ -266,21 +266,26 @@
  For each analyzed claim, POC must produce:
--**1. Claim**
++*
++**
++**1. Claim
  * Original text
  * Classification (factual/non-factual/ambiguous)
  * If non-factual: Clear reason why
  **2. Scenarios** (if factual)
++
  * 2-3 interpretation scenarios
  * Each scenario clearly described
  **3. Evidence** (if factual)
++
  * Supporting evidence (2+ items)
  * Opposing evidence (if exists)
  * Source URLs and reliability scores
  **4. Verdict** (if factual)
++
  * Probability (0-100%)
  * Uncertainty quantification
  * Confidence level (LOW/MEDIUM/HIGH)
@@ -287,10 +287,10 @@
  * Reasoning chain
  **5. Quality Status**
++
  * Which gates passed/failed
  * If failed: Clear explanation why
--
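The five required output parts listed above map naturally onto a typed record. The sketch below is one possible shape, not the POC's actual schema: every type and field name (`ClaimAnalysis`, `EvidenceItem`, `validateAnalysis`, the numeric `uncertainty`) is an assumption made for illustration.

```typescript
// Field names are illustrative assumptions, not the POC's actual schema.
type Classification = "factual" | "non-factual" | "ambiguous";
type ConfidenceLevel = "LOW" | "MEDIUM" | "HIGH";

interface EvidenceItem {
  stance: "supporting" | "opposing";
  sourceUrl: string;
  reliabilityScore: number;
}

interface ClaimAnalysis {
  originalText: string;
  classification: Classification;
  nonFactualReason?: string; // expected when the claim is non-factual
  scenarios?: string[];      // only for factual claims
  evidence?: EvidenceItem[]; // only for factual claims
  verdict?: {
    probability: number;     // 0-100
    uncertainty: number;     // assumed numeric; the spec does not fix a unit
    confidence: ConfidenceLevel;
    reasoning: string[];
  };
  qualityStatus: { gate: string; passed: boolean; explanation?: string }[];
}

// Returns a list of structural problems; an empty list means the record is complete.
function validateAnalysis(a: ClaimAnalysis): string[] {
  const problems: string[] = [];
  if (a.classification === "non-factual" && !a.nonFactualReason) {
    problems.push("non-factual claim is missing a reason");
  }
  if (a.classification === "factual") {
    if (!a.scenarios || a.scenarios.length === 0) problems.push("factual claim has no scenarios");
    if (!a.evidence || a.evidence.length === 0) problems.push("factual claim has no evidence");
    if (!a.verdict) {
      problems.push("factual claim has no verdict");
    } else if (a.verdict.probability < 0 || a.verdict.probability > 100) {
      problems.push("verdict probability must be within 0-100");
    }
  }
  for (const gate of a.qualityStatus) {
    if (!gate.passed && !gate.explanation) {
      problems.push(`gate "${gate.gate}" failed without an explanation`);
    }
  }
  return problems;
}
```

A factual claim that Gate 4 blocks may legitimately lack a verdict, so a real validator would probably relax the verdict check in that case.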
  === 5.2 Example POC Output ===
  {{code language="json"}}
@@ -342,6 +342,7 @@
  POC is successful if:
  ✅ **FR1-FR7 Requirements Met:**
++
  1. Extracts 3-5 factual claims from test articles
  1. Generates 2-3 scenarios per ambiguous claim
  1. Finds supporting AND opposing evidence
@@ -349,19 +349,21 @@
  1. Provides clear reasoning chains
  ✅ **Quality Gates Work:**
++
  1. Gate 1 blocks non-factual claims (100% block rate)
  1. Gate 4 blocks low-quality verdicts (blocks if <2 sources or quality <0.6)
  1. Clear rejection reasons provided
  ✅ **NFR11 Met:**
++
  1. Quality gates reduce hallucination rate
  1. Blocked outputs have clear explanations
  1. Quality metrics are logged
--
  === 6.2 Quality Thresholds ===
  **Minimum Acceptable:**
++
  * ≥70% of test claims correctly classified (factual/non-factual)
  * ≥60% of verdicts are reasonable (human evaluation)
  * Gate 1 blocks 100% of non-factual claims
@@ -368,16 +368,17 @@
  * Gate 4 blocks verdicts with <2 sources
  **Target:**
++
  * ≥80% claims correctly classified
  * ≥75% verdicts are reasonable
  * <10% false positives (blocking good claims)
--
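The minimum and target thresholds above can be checked mechanically once the metrics are measured. The following is a sketch only: `PocMetrics`, `assessQuality`, and the choice to express rates as fractions in [0, 1] are assumptions, and how each metric is collected is left open.

```typescript
// Hypothetical metric record; all rates are fractions in [0, 1].
interface PocMetrics {
  classificationAccuracy: number;   // share of test claims correctly classified
  reasonableVerdictRate: number;    // share of verdicts judged reasonable by humans
  gate1BlockRate: number;           // share of non-factual claims blocked by Gate 1
  gate4BlocksUnderSourced: boolean; // Gate 4 blocks every verdict with <2 sources
  falsePositiveRate: number;        // share of good claims wrongly blocked
}

type QualityLevel = "target" | "minimum" | "below-minimum";

function assessQuality(m: PocMetrics): QualityLevel {
  // Minimum acceptable: >=70% classification, >=60% reasonable verdicts,
  // Gate 1 blocks 100% of non-factual claims, Gate 4 blocks <2-source verdicts.
  const meetsMinimum =
    m.classificationAccuracy >= 0.7 &&
    m.reasonableVerdictRate >= 0.6 &&
    m.gate1BlockRate === 1 &&
    m.gate4BlocksUnderSourced;
  if (!meetsMinimum) return "below-minimum";

  // Target: >=80% classification, >=75% reasonable verdicts, <10% false positives.
  const meetsTarget =
    m.classificationAccuracy >= 0.8 &&
    m.reasonableVerdictRate >= 0.75 &&
    m.falsePositiveRate < 0.1;
  return meetsTarget ? "target" : "minimum";
}
```

Treating the target as a stricter tier on top of the minimum is a design choice of this sketch; the spec lists the two sets of thresholds without stating that relationship.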
  === 6.3 POC Decision Gate ===
  **After POC1, we decide:**
  **✅ PROCEED to POC2** if:
++
  * Success criteria met
  * Quality gates demonstrably improve output
  * Core workflow is technically sound
@@ -384,65 +384,72 @@
  * Clear path to production quality
  **⚠️ ITERATE POC1** if:
++
  * Success criteria partially met
  * Gates work but need tuning
  * Core issues identified but fixable
  **❌ PIVOT APPROACH** if:
++
  * Success criteria not met
  * Fundamental AI limitations discovered
  * Quality gates insufficient
  * Alternative approach needed
--
  == 7. Test Cases ==
  === 7.1 Happy Path ===
  **Test 1: Simple Factual Claim**
++
  * Input: "Paris is the capital of France"
--* Expected: Factual, 1 scenario, verdict ~95% true
++* Expected: Factual, 1 scenario, verdict 95% true
  **Test 2: Ambiguous Claim**
++
  * Input: "Switzerland has the highest income in Europe"
  * Expected: Factual, 2-3 scenarios, verdict with uncertainty
  **Test 3: Statistical Claim**
++
  * Input: "10% of people have condition X"
  * Expected: Factual, evidence with numbers, probabilistic verdict
--
  === 7.2 Edge Cases ===
  **Test 4: Opinion**
++
  * Input: "Paris is the best city"
  * Expected: Non-factual (opinion), blocked by Gate 1
  **Test 5: Prediction**
++
  * Input: "Bitcoin will reach $100,000 next year"
  * Expected: Non-factual (prediction), blocked by Gate 1
  **Test 6: Insufficient Evidence**
++
  * Input: Obscure factual claim with no sources
  * Expected: Blocked by Gate 4 (<2 sources)
--
  === 7.3 Quality Gate Tests ===
  **Test 7: Gate 1 Effectiveness**
++
  * Input: Mix of 10 factual + 10 non-factual claims
  * Expected: Gate 1 blocks all 10 non-factual (100% block rate)
  **Test 8: Gate 4 Effectiveness**
++
  * Input: Claims with varying evidence availability
  * Expected: Gate 4 blocks low-confidence verdicts
--
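For the gate tests above, the outcome of a labeled run can be reduced to two numbers: how many non-factual claims Gate 1 blocked, and how many factual claims it blocked by mistake. The helper below is a hypothetical sketch; `LabeledResult` and `gate1Stats` are assumed names, and the labels are expected to come from the hand-built test set.

```typescript
// Hypothetical record for one test claim after running Gate 1.
interface LabeledResult {
  isFactual: boolean; // ground-truth label from the test set
  blocked: boolean;   // whether Gate 1 blocked the claim
}

function gate1Stats(results: LabeledResult[]): { blockRate: number; falsePositiveRate: number } {
  const nonFactual = results.filter((r) => !r.isFactual);
  const factual = results.filter((r) => r.isFactual);
  const blockedNonFactual = nonFactual.filter((r) => r.blocked).length;
  const blockedFactual = factual.filter((r) => r.blocked).length;
  return {
    // Share of non-factual claims that were blocked (1 means all of them).
    blockRate: nonFactual.length === 0 ? 1 : blockedNonFactual / nonFactual.length,
    // Share of factual claims that were blocked even though they should pass.
    falsePositiveRate: factual.length === 0 ? 0 : blockedFactual / factual.length,
  };
}
```

With the Test 7 mix of 10 factual and 10 non-factual claims, the expected result is a `blockRate` of 1; the `falsePositiveRate` shows whether that came at the cost of blocking good claims.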
  == 8. Technical Architecture (POC) ==
  === 8.1 Simplified Architecture ===
  **POC Tech Stack:**
++
  * **Frontend:** Simple web interface (Next.js + TypeScript)
  * **Backend:** Single API endpoint
  * **AI:** Claude API (Sonnet 4.5)
@@ -455,6 +455,7 @@
  === 8.2 AKEL Implementation ===
  **POC AKEL:**
++
  * Single-threaded processing
  * Synchronous API calls
  * No caching
@@ -462,6 +462,7 @@
  * Console logging
  **Full AKEL (POC2+):**
++
  * Multi-threaded processing
  * Async API calls
  * Evidence caching
@@ -468,7 +468,6 @@
  * Advanced error handling with retry
  * Structured logging + monitoring
--
  == 9. POC Philosophy ==
  {{info}}
@@ -477,47 +477,55 @@
  === 9.1 Core Principles ===
--**1. Prove Concept, Not Production**
++*
++**
++**1. Prove Concept, Not Production
  * POC validates AI can do the job
  * Production quality comes in POC2 and Beta 0
  * Focus on "does it work?" not "is it perfect?"
  **2. Implement Subset of Requirements**
++
  * POC covers FR1-7, NFR11 (lite)
  * All other requirements deferred
  * Clear mapping to [[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]
  **3. Quality Gates Validate Approach**
++
  * 2 gates prove the concept
  * Remaining 5 gates added in POC2
  * Gates must demonstrably improve quality
  **4. Iterate Based on Results**
++
  * POC results determine next steps
  * Decision gate after POC1
  * Flexibility to pivot if needed
++=== 9.2 Success ===
--=== 9.2 Success = Clear Path Forward ===
++ Clear Path Forward ===
  POC succeeds if we can confidently answer:
  ✅ **Technical Feasibility:**
++
  * Can AI extract claims reliably?
  * Can AI find balanced evidence?
  * Can AI compute reasonable verdicts?
  ✅ **Quality Approach:**
++
  * Do quality gates improve output?
  * Can we measure and track quality?
  * Is the gate approach scalable?
  ✅ **Production Path:**
++
  * Is the core architecture sound?
  * What needs improvement for production?
  * Is POC2 the right next step?
--
  == 10. Related Pages ==
  * **[[Main Requirements>>FactHarbor.Specification.Requirements.WebHome]]** - Full system requirements (this POC implements a subset)
@@ -526,11 +526,10 @@
  * **[[Implementation Roadmap>>FactHarbor.Roadmap.WebHome]]** - POC1, POC2, Beta 0, V1.0 phases
  * **[[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]** - What users need (drives requirements)
--
  **Document Owner:** Technical Team
  **Review Frequency:** After each POC iteration
  **Version History:**
++
  * v1.0 - Initial POC requirements
  * v2.0 - Updated after specification cross-check
  * v3.0 - Aligned with Main Requirements (FR/NFR IDs added)
--
