Changes for page POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/24 18:27


From version 2.1

edited by Robert Schaub
on 2025/12/24 13:58

Change comment: Imported from XAR

To version 3.1

edited by Robert Schaub
on 2025/12/24 17:59

Change comment: Imported from XAR


Summary

Page properties (1 modified, 0 added, 0 removed)

Details

Page properties

Content

@@ -1,7 +1,7 @@
  = POC Requirements =
--**Status:** ✅ Approved for Development
--**Version:** 2.0 (Updated after Specification Cross-Check)
++**Status:** ✅ Approved for Development
++**Version:** 2.0 (Updated after Specification Cross-Check)
  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
  == 1. POC Overview ==
@@ -63,7 +63,7 @@
  **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
--**Length:** 4-6 sentences
++**Length:** 4-6 sentences
  **Content (Required Elements):**
 . **Article's main thesis/claim** - What is the article trying to argue or prove?
@@ -113,9 +113,9 @@
  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
--**What:** List of factual claims extracted from article
--**Format:** Numbered list
--**Quantity:** 3-5 claims
++**What:** List of factual claims extracted from article
++**Format:** Numbered list
++**Quantity:** 3-5 claims
  **Requirements:**
  * Factual claims only (not opinions/questions)
  * Clearly stated
@@ -133,8 +133,8 @@
  === 2.3 Component 3: CLAIMS VERDICTS ===
--**What:** Verdict for each claim identified
--**Format:** Per claim structure
++**What:** Verdict for each claim identified
++**Format:** Per claim structure
  **Required Elements:**
  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
@@ -161,7 +161,7 @@
  **Risk Tier Display:**
  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
--* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
++* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
  * **Tier C (Green):** Low Risk - Facts/Definitions/History
  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
@@ -168,8 +168,8 @@
  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
--**What:** Brief summary of original article content
--**Length:** 3-5 sentences
++**What:** Brief summary of original article content
++**Length:** 3-5 sentences
  **Tone:** Neutral (article's position, not FactHarbor's analysis)
  **Example:**
@@ -289,7 +289,7 @@
  **POC Architecture (Simplified):**
  {{code}}
  User Input → Single AKEL Call → Output Display
--           (all processing)
++ (all processing)
  {{/code}}
  **Full System Architecture:**
@@ -382,15 +382,15 @@
  **Primary Label (top of analysis):**
  {{code}}
  ╔════════════════════════════════════════════════════════════╗
--║  [AI-GENERATED - POC/DEMO]                                ║
--║                                                            ║
--║  This analysis was produced entirely by AI and has not    ║
--║  been human-reviewed. Use for demonstration purposes.     ║
--║                                                            ║
--║  Source: AI/AKEL v1.0 (POC)                               ║
--║  Review Status: Not Reviewed (Proof-of-Concept)          ║
--║  Quality Gates: 4/4 Passed (Simplified)                  ║
--║  Last Updated: [timestamp]                                ║
++║ [AI-GENERATED - POC/DEMO] ║
++║ ║
++║ This analysis was produced entirely by AI and has not ║
++║ been human-reviewed. Use for demonstration purposes. ║
++║ ║
++║ Source: AI/AKEL v1.0 (POC) ║
++║ Review Status: Not Reviewed (Proof-of-Concept) ║
++║ Quality Gates: 4/4 Passed (Simplified) ║
++║ Last Updated: [timestamp] ║
  ╚════════════════════════════════════════════════════════════╝
  {{/code}}
@@ -575,10 +575,10 @@
 . Extract 3-5 factual claims
 . For each claim:
--   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
--   - Assign confidence score (0-100%)
--   - Assign risk tier (A/B/C)
--   - Write brief reasoning (1-3 sentences)
++ - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
++ - Assign confidence score (0-100%)
++ - Assign risk tier (A/B/C)
++ - Write brief reasoning (1-3 sentences)
 . Generate analysis summary (3-5 sentences)
 . Generate article summary (3-5 sentences)
 . Run basic quality checks
@@ -697,11 +697,11 @@
  **Functionality:**
  * For each claim, AI:
--  * Evaluates claim based on available evidence/knowledge
--  * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
--  * Assigns confidence score (0-100%)
--  * Assigns risk tier (A/B/C)
--  * Writes brief reasoning (1-3 sentences)
++ * Evaluates claim based on available evidence/knowledge
++ * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
++ * Assigns confidence score (0-100%)
++ * Assigns risk tier (A/B/C)
++ * Writes brief reasoning (1-3 sentences)
  * System displays verdict for each claim
  **Critical:** NO MANUAL EDITING ALLOWED
@@ -729,9 +729,9 @@
  **Functionality:**
  * AI summarizes findings in 3-5 sentences:
--  * How many claims found
--  * Distribution of verdicts
--  * Overall assessment
++ * How many claims found
++ * Distribution of verdicts
++ * Overall assessment
  * System displays at top of results
  **Critical:** NO MANUAL EDITING ALLOWED
@@ -813,8 +813,8 @@
  **Pipeline:**
  {{code}}
  User Input → AKEL Processing → Output Display
--           ↓
--     ZERO human editing
++ ↓
++ ZERO human editing
  {{/code}}
  **If AI output is poor:**
@@ -952,23 +952,23 @@
  {{code}}
 . User submits text or URL
--   ↓
++ ↓
 . Backend receives request
--   ↓
++ ↓
 . If URL: Fetch article text
--   ↓
++ ↓
 . Call Claude API with single prompt:
--   "Extract claims, evaluate each, provide verdicts"
--   ↓
++ "Extract claims, evaluate each, provide verdicts"
++ ↓
 . Claude API returns:
--   - Analysis summary
--   - Claims list
--   - Verdicts for each claim (with risk tiers)
--   - Article summary (optional)
--   - Quality gate results
--   ↓
++ - Analysis summary
++ - Claims list
++ - Verdicts for each claim (with risk tiers)
++ - Article summary (optional)
++ - Quality gate results
++ ↓
 . Backend parses response
--   ↓
++ ↓
 . Frontend displays results with Mode 2 labeling
  {{/code}}
@@ -981,36 +981,36 @@
  Task: Analyze this article and provide:
 . Identify the article's main thesis/conclusion
--   - What is the article trying to argue or prove?
--   - What is the primary claim or conclusion?
++ - What is the article trying to argue or prove?
++ - What is the primary claim or conclusion?
 . Extract 3-5 factual claims from the article
--   - Note which claims are CENTRAL to the main thesis
--   - Note which claims are SUPPORTING facts
++ - Note which claims are CENTRAL to the main thesis
++ - Note which claims are SUPPORTING facts
 . For each claim:
--   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
--   - Assign confidence score (0-100%)
--   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
--   - Write brief reasoning (1-3 sentences)
++ - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
++ - Assign confidence score (0-100%)
++ - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
++ - Write brief reasoning (1-3 sentences)
 . Assess relationship between claims and main thesis:
--   - Do the claims actually support the article's conclusion?
--   - Are there logical leaps or unsupported inferences?
--   - Is the article's framing misleading even if individual facts are accurate?
++ - Do the claims actually support the article's conclusion?
++ - Are there logical leaps or unsupported inferences?
++ - Is the article's framing misleading even if individual facts are accurate?
 . Run quality gates:
--   - Check: ≥2 sources found
--   - Attempt: Basic contradiction search
--   - Calculate: Confidence scores
--   - Verify: Structural integrity
++ - Check: ≥2 sources found
++ - Attempt: Basic contradiction search
++ - Calculate: Confidence scores
++ - Verify: Structural integrity
 . Write context-aware analysis summary (4-6 sentences):
--   - State article's main thesis
--   - Report claims found and verdict distribution
--   - Note if central claims are problematic
--   - Assess whether evidence supports conclusion
--   - Overall credibility considering claim importance
++ - State article's main thesis
++ - Report claims found and verdict distribution
++ - Note if central claims are problematic
++ - Assess whether evidence supports conclusion
++ - Overall credibility considering claim importance
 . Write article summary (3-5 sentences: neutral summary of article content)
@@ -1234,9 +1234,9 @@
  === 13.2 Decision Criteria Summary ===
  {{code}}
--AI Quality < 60%  → NO-GO (approach doesn't work)
++AI Quality < 60% → NO-GO (approach doesn't work)
  AI Quality 60-70% → ITERATE (improve and retry)
--AI Quality ≥70%   → GO (proceed to POC2)
++AI Quality ≥70% → GO (proceed to POC2)
  {{/code}}
  == 14. Key Risks & Mitigations ==
@@ -1243,8 +1243,8 @@
  === 14.1 Risk: AI Quality Not Good Enough ===
--**Likelihood:** Medium-High
--**Impact:** POC fails
++**Likelihood:** Medium-High
++**Impact:** POC fails
  **Mitigation:**
  * Extensive prompt engineering and testing
@@ -1256,8 +1256,8 @@
  === 14.2 Risk: AI Consistency Issues ===
--**Likelihood:** Medium
--**Impact:** Works sometimes, fails other times
++**Likelihood:** Medium
++**Impact:** Works sometimes, fails other times
  **Mitigation:**
  * Test with 10+ diverse articles
@@ -1268,8 +1268,8 @@
  === 14.3 Risk: Output Incomprehensible ===
--**Likelihood:** Low-Medium
--**Impact:** Users can't understand analysis
++**Likelihood:** Low-Medium
++**Impact:** Users can't understand analysis
  **Mitigation:**
  * Create clear explainer document
@@ -1281,8 +1281,8 @@
  === 14.4 Risk: API Rate Limits / Costs ===
--**Likelihood:** Low
--**Impact:** System slow or expensive
++**Likelihood:** Low
++**Impact:** System slow or expensive
  **Mitigation:**
  * Monitor API usage
@@ -1293,8 +1293,8 @@
  === 14.5 Risk: Scope Creep ===
--**Likelihood:** Medium
--**Impact:** POC becomes too complex
++**Likelihood:** Medium
++**Impact:** POC becomes too complex
  **Mitigation:**
  * Strict scope discipline
@@ -1336,18 +1336,18 @@
  === 15.2 What POC Is ===
--✅ Testing AI capability without humans
--✅ Proving core technical concept
--✅ Fast validation of approach
--✅ Honest assessment of feasibility
++✅ Testing AI capability without humans
++✅ Proving core technical concept
++✅ Fast validation of approach
++✅ Honest assessment of feasibility
  === 15.3 What POC Is NOT ===
--❌ Building a product
--❌ Production-ready system
--❌ Feature-complete platform
--❌ Perfectly accurate analysis
--❌ Polished user experience
++❌ Building a product
++❌ Production-ready system
++❌ Feature-complete platform
++❌ Perfectly accurate analysis
++❌ Polished user experience
  == 16. Success = Clear Path Forward ==
@@ -1377,3 +1377,52 @@
  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
++
++=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
++
++**Requirement:** POC1 MUST implement an LLM abstraction layer that supports multiple providers.
++
++**POC1 Implementation:**
++
++* **Primary Provider:** Anthropic Claude API
++ * Stage 1: Claude Haiku 4
++ * Stage 2: Claude Sonnet 3.5 (cached)
++ * Stage 3: Claude Sonnet 3.5
++
++* **Provider Interface:** Abstract LLMProvider interface implemented
++
++* **Configuration:** Environment variables for provider selection
++ * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
++ * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
++ * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
++
++* **Failover:** Basic error handling with cache fallback for Stage 2
++
++* **Cost Tracking:** Log provider name and cost per request
++
++**Future (POC2/Beta):**
++
++* Secondary provider (OpenAI) with automatic failover
++* Admin API for runtime provider switching
++* Cost comparison dashboard
++* Cross-provider output verification
++
++**Success Criteria:**
++
++* All LLM calls go through abstraction layer (no direct API calls)
++* Provider can be changed via environment variable without code changes
++* Cost tracking includes provider name in logs
++* Stage 2 falls back to cache on provider failure
++
++**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
++
++**Dependencies:**
++* NFR-14 (Main Requirements)
++* Design Decision 9
++* Architecture Section 2.2
++
++**Priority:** HIGH (P1)
++
++**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
++
++
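Illustrative note on NFR-POC-11 (not part of the recorded diff): the success criteria above could be met by something like the following minimal sketch. Only the `LLMProvider` name and the `LLM_PRIMARY_PROVIDER` / `LLM_STAGE2_MODEL` variables come from the requirement text. Everything else (`LLMResult`, `StubAnthropicProvider`, `provider_from_env`, `run_stage2`, the prompt-keyed cache, the zero cost figure) is a hypothetical name or simplification chosen for illustration, and the stub does not call any real API.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

logger = logging.getLogger("llm")


@dataclass
class LLMResult:
    text: str
    provider: str
    cost_usd: float


class LLMProvider(ABC):
    """Abstract interface that all LLM calls go through."""

    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult:
        ...


class StubAnthropicProvider(LLMProvider):
    """Hypothetical stand-in; a real provider would call the vendor API here."""

    name = "anthropic"

    def __init__(self, fail: bool = False):
        self.fail = fail  # lets the sketch simulate a provider outage

    def complete(self, model: str, prompt: str) -> LLMResult:
        if self.fail:
            raise RuntimeError("provider unavailable")
        # Cost is a placeholder; real code would derive it from token usage.
        return LLMResult(text=f"[{model}] analysis", provider=self.name, cost_usd=0.0)


# Registry of known providers; adding a provider means adding an entry here.
PROVIDERS = {"anthropic": StubAnthropicProvider}


def provider_from_env(env=None) -> LLMProvider:
    """Select the provider from LLM_PRIMARY_PROVIDER without code changes."""
    env = os.environ if env is None else env
    key = env.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if key not in PROVIDERS:
        raise ValueError(f"unknown LLM provider: {key}")
    return PROVIDERS[key]()


def run_stage2(provider: LLMProvider, prompt: str, cache: dict, env=None) -> LLMResult:
    """Run Stage 2; on provider failure, fall back to a cached result if any."""
    env = os.environ if env is None else env
    model = env.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5")
    try:
        result = provider.complete(model, prompt)
    except Exception:
        logger.warning("provider=%s failed, trying cache", provider.name)
        if prompt in cache:
            return cache[prompt]
        raise
    # Cost tracking: log provider name and cost per request.
    logger.info("provider=%s model=%s cost_usd=%.4f",
                result.provider, model, result.cost_usd)
    cache[prompt] = result
    return result
```

Calling `run_stage2(provider_from_env(), prompt, cache)` keeps application code free of direct vendor calls, so switching providers would only require a new `PROVIDERS` entry and a different `LLM_PRIMARY_PROVIDER` value. Keying the cache on the raw prompt is a simplification; the actual cache design is defined in the referenced API & Schemas specification.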
