Changes for page POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2026/02/08 08:26

From 2.1 to 2.2

From version 1.1

edited by Robert Schaub
on 2025/12/19 16:13

Change comment: Imported from XAR

To version 2.1

edited by Robert Schaub
on 2025/12/24 21:53

Change comment: Imported from XAR


Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Title

@@ -1,1 +1,1 @@
--POC Requirements
++POC Requirements (POC1 & POC2)

Content

@@ -1,11 +1,18 @@
  = POC Requirements =
--**Status:** ✅ Approved for Development
--**Version:** 2.0 (Updated after Specification Cross-Check)
--**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
-----
++{{info}}
++**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
++See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
++{{/info}}
++
++
++
++**Status:** ✅ Approved for Development
++**Version:** 2.0 (Updated after Specification Cross-Check)
++**Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
++
  == 1. POC Overview ==
  === 1.1 What POC Tests ===
@@ -26,8 +26,6 @@
  * Perfect accuracy
  * Complete feature set
-----
--
  === 1.2 Scenarios Deferred to POC2 ===
  **Intentional Simplification:**
@@ -61,33 +61,65 @@
  Claims → Verdicts (scenarios implicit in reasoning)
  {{/code}}
-----
--
  == 2. POC Output Specification ==
--=== 2.1 Component 1: ANALYSIS SUMMARY ===
++=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
--**What:** Brief overview of findings
--**Length:** 3-5 sentences
--**Content:**
--* How many claims found
--* Distribution of verdicts
--* Overall assessment
++**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
--**Example:**
++**Length:** 4-6 sentences
++
++**Content (Required Elements):**
++1. **Article's main thesis/claim** - What is the article trying to argue or prove?
++2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
++3. **Central vs. supporting claims** - Which claims are central to the article's argument?
++4. **Relationship assessment** - Do the claims support the article's conclusion?
++5. **Overall credibility** - Final assessment considering claim importance
++
++**Critical Innovation:**
++
++POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
++* State accurate supporting facts but draw unsupported conclusions
++* Have one false central claim that invalidates the whole argument
++* Misframe accurate information to mislead
++
++**Good Example (Context-Aware):**
  {{code}}
--This article makes 4 claims about coffee's health effects. We found
--2 claims are well-supported, 1 is uncertain, and 1 is refuted.
--Overall assessment: mostly accurate with some exaggeration.
++This article argues that coffee cures cancer based on its antioxidant
++content. We analyzed 3 factual claims: 2 about coffee's chemical
++properties are well-supported, but the main causal claim is refuted
++by current evidence. The article confuses correlation with causation.
++Overall assessment: MISLEADING - makes an unsupported medical claim
++despite citing some accurate facts.
  {{/code}}
-----
++**Poor Example (Simple Aggregation - Don't Do This):**
++{{code}}
++This article makes 3 claims. 2 are well-supported and 1 is refuted.
++Overall assessment: mostly accurate (67% accurate).
++{{/code}}
++↑ This misses that the refuted claim IS the article's main point!
++**What POC1 Tests:**
++
++Can AI identify and assess:
++* ✅ The article's main thesis/conclusion?
++* ✅ Which claims are central vs. supporting?
++* ✅ Whether the evidence supports the conclusion?
++* ✅ Overall credibility considering logical structure?
++
++**If AI Cannot Do This:**
++
++That's valuable to learn in POC1! We'll:
++* Note as limitation
++* Fall back to simple aggregation with warning
++* Design explicit article-level analysis for POC2
++
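The difference between context-aware assessment and simple aggregation can be made concrete. The sketch below is a hypothetical illustration of the fallback path described above, not the POC1 algorithm. It computes the naive share of well-supported claims (the "Poor Example" approach) and attaches a warning whenever a central claim is refuted or uncertain. The `Claim` class, the `is_central` flag, and `simple_aggregate` are assumed names.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str      # WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
    is_central: bool  # hypothetical flag: claim is central to the article's thesis


def simple_aggregate(claims: list[Claim]) -> tuple[float, list[str]]:
    """Fallback aggregation: share of well-supported claims plus warnings.

    The percentage alone can hide a refuted central claim, so the function
    also returns warnings that should be displayed next to the score.
    """
    if not claims:
        return 0.0, ["No claims extracted"]
    supported = sum(1 for c in claims if c.verdict == "WELL-SUPPORTED")
    share = supported / len(claims)
    warnings = [
        f"Central claim not supported: {c.text!r} ({c.verdict})"
        for c in claims
        if c.is_central and c.verdict in ("REFUTED", "UNCERTAIN")
    ]
    if warnings:
        warnings.append("Simple aggregation may overstate article credibility")
    return share, warnings
```

With the coffee example (two well-supported chemistry claims and a refuted central causal claim), the share is 2/3, but the warnings flag that the refuted claim is the article's main point.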
  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
--**What:** List of factual claims extracted from article
--**Format:** Numbered list
--**Quantity:** 3-5 claims
++**What:** List of factual claims extracted from article
++**Format:** Numbered list
++**Quantity:** 3-5 claims
  **Requirements:**
  * Factual claims only (not opinions/questions)
  * Clearly stated
@@ -103,12 +103,10 @@
  [4] Coffee prevents Alzheimer's completely
  {{/code}}
-----
--
  === 2.3 Component 3: CLAIMS VERDICTS ===
--**What:** Verdict for each claim identified
--**Format:** Per claim structure
++**What:** Verdict for each claim identified
++**Format:** Per claim structure
  **Required Elements:**
  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
@@ -135,17 +135,15 @@
  **Risk Tier Display:**
  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
--* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
++* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
  * **Tier C (Green):** Low Risk - Facts/Definitions/History
  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
-----
--
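The per-claim structure and the risk tier display can be captured in a small data model. This is a minimal sketch, assuming a Python backend. The class and method names are illustrative, the confidence range follows the 0-100% stated for verdicts, and the badge text only imitates the `[Risk: A]` labels from section 5.2; it is not a defined display format.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"


class RiskTier(Enum):
    # value = (display color, description) as listed under Risk Tier Display
    A = ("Red", "High Risk - Medical/Legal/Safety/Elections")
    B = ("Yellow", "Medium Risk - Policy/Science/Causality")
    C = ("Green", "Low Risk - Facts/Definitions/History")


@dataclass(frozen=True)
class ClaimVerdict:
    claim: str
    verdict: Verdict
    confidence: int  # percent, 0-100
    risk_tier: RiskTier
    reasoning: str   # 1-3 sentences

    def __post_init__(self) -> None:
        # Reject values the POC output specification does not allow.
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence must be 0-100, got {self.confidence}")
        if not self.reasoning.strip():
            raise ValueError("reasoning must not be empty")

    def badge(self) -> str:
        color, _ = self.risk_tier.value
        return (f"[Risk: {self.risk_tier.name}] ({color}) "
                f"{self.verdict.value} - {self.confidence}%")
```

Using enums means an unexpected label from the model, such as a bare "PARTIALLY", fails at parse time instead of reaching the UI.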
  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
--**What:** Brief summary of original article content
--**Length:** 3-5 sentences
++**What:** Brief summary of original article content
++**Length:** 3-5 sentences
  **Tone:** Neutral (article's position, not FactHarbor's analysis)
  **Example:**
@@ -157,17 +157,60 @@
  to disease prevention. Recommends 2-3 cups daily for optimal health.
  {{/code}}
-----
++=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
--=== 2.5 Total Output Size ===
++**What:** LLM usage metrics for cost optimization and scaling decisions
--**Combined:** ~200-300 words
--* Analysis Summary: 50-70 words
++**Purpose:**
++* Understand cost per analysis
++* Identify optimization opportunities
++* Project costs at scale
++* Inform architecture decisions
++
++**Display Format:**
++{{code}}
++USAGE STATISTICS:
++• Article: 2,450 words (12,300 characters)
++• Input tokens: 15,234
++• Output tokens: 892
++• Total tokens: 16,126
++• Estimated cost: $0.24 USD
++• Response time: 8.3 seconds
++• Cost per claim: $0.048
++• Model: claude-sonnet-4-20250514
++{{/code}}
++
++**Why This Matters:**
++
++At scale, LLM costs are critical:
++* 10,000 articles/month ≈ $200-500/month
++* 100,000 articles/month ≈ $2,000-5,000/month
++* Cost optimization can reduce expenses by an estimated 30-50%
++
++**What POC1 Learns:**
++* How cost scales with article length
++* Prompt optimization opportunities (caching, compression)
++* Output verbosity tradeoffs
++* Model selection strategy (FAST vs. REASONING roles)
++* Article length limits (if needed)
++
++**Implementation:**
++* Claude API already returns usage data
++* No extra API calls needed
++* Display to user + log for aggregate analysis
++* Test with articles of varying lengths
++
++**Critical for GO/NO-GO:** Unit economics must be viable at scale!
++
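The derived figures in the display format follow from two numbers the provider returns with every response. For the Anthropic Messages API these are `usage.input_tokens` and `usage.output_tokens`. Below is a minimal sketch, assuming per-million-token prices come from configuration. The class name `UsageStats` and its price fields are illustrative, the rates in the test are placeholders rather than a pricing statement, and the article word count and model name are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    input_tokens: int
    output_tokens: int
    response_seconds: float
    claim_count: int
    input_price_per_mtok: float   # USD per million input tokens (from config)
    output_price_per_mtok: float  # USD per million output tokens (from config)

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def estimated_cost(self) -> float:
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000

    @property
    def cost_per_claim(self) -> float:
        return self.estimated_cost / self.claim_count if self.claim_count else 0.0

    def render(self) -> str:
        # Mirrors the USAGE STATISTICS display block (subset of fields).
        return "\n".join([
            "USAGE STATISTICS:",
            f"• Input tokens: {self.input_tokens:,}",
            f"• Output tokens: {self.output_tokens:,}",
            f"• Total tokens: {self.total_tokens:,}",
            f"• Estimated cost: ${self.estimated_cost:.2f} USD",
            f"• Response time: {self.response_seconds:.1f} seconds",
            f"• Cost per claim: ${self.cost_per_claim:.3f}",
        ])
```

Because the prices are parameters, changing models or providers affects only configuration, and the same object can be both displayed and logged.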
++=== 2.6 Total Output Size ===
++
++**Combined:** ~220-350 words
++* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
  * Claims Identification: 30-50 words
  * Claims Verdicts: 100-150 words
  * Article Summary: 30-50 words (optional)
-----
++**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
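The word budgets above can be checked automatically after each run, which also helps detect drift toward verbose output. For simplicity, this sketch treats the ranges as hard limits even though the section states them as approximate. The section keys are hypothetical, and words are counted as whitespace-separated tokens.

```python
import re

# Word budgets from section 2.6 as (min, max); the article summary is optional.
WORD_BUDGETS = {
    "analysis_summary": (60, 90),
    "claims_identification": (30, 50),
    "claims_verdicts": (100, 150),
    "article_summary": (30, 50),
}


def word_count(text: str) -> int:
    return len(re.findall(r"\S+", text))


def check_output_size(sections: dict[str, str]) -> list[str]:
    """Return human-readable budget violations (empty list = within budget)."""
    problems = []
    for name, (low, high) in WORD_BUDGETS.items():
        if name not in sections:
            if name != "article_summary":  # only the optional section may be absent
                problems.append(f"{name}: missing")
            continue
        n = word_count(sections[name])
        if not low <= n <= high:
            problems.append(f"{name}: {n} words (expected {low}-{high})")
    return problems
```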
  == 3. What's NOT in POC Scope ==
@@ -215,8 +215,6 @@
  * ❌ Analytics
  * ❌ A/B testing
-----
--
  == 4. POC Simplifications vs. Full System ==
  === 4.1 Architecture Comparison ===
@@ -224,7 +224,7 @@
  **POC Architecture (Simplified):**
  {{code}}
  User Input → Single AKEL Call → Output Display
--           (all processing)
++ (all processing)
  {{/code}}
  **Full System Architecture:**
@@ -245,8 +245,6 @@
  |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
  |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
-----
--
  === 4.2 Workflow Comparison ===
  **POC1 Workflow:**
@@ -264,8 +264,6 @@
 . **Time Evolution** (versioning, re-evaluation triggers)
  **Total: 6 phases with quality gates, ~10-30 seconds**
-----
--
  === 4.3 Why POC is Simplified ===
  **Engineering Rationale:**
@@ -284,8 +284,6 @@
  * ❌ POC doesn't validate scale (test in Beta)
  * ❌ POC doesn't validate scenario architecture (design in POC2)
-----
--
  === 4.4 Gap Between POC1 and POC2/Beta ===
  **What needs to be built for POC2:**
@@ -305,8 +305,6 @@
  **POC1 → POC2 is significant architectural expansion.**
-----
--
  == 5. Publication Mode & Labeling ==
  === 5.1 POC Publication Mode ===
@@ -320,22 +320,20 @@
  * All quality gates active (simplified)
  * Risk tier classification shown (demo)
-----
--
  === 5.2 User-Facing Labels ===
  **Primary Label (top of analysis):**
  {{code}}
  ╔════════════════════════════════════════════════════════════╗
--║  [AI-GENERATED - POC/DEMO]                                ║
--║                                                            ║
--║  This analysis was produced entirely by AI and has not    ║
--║  been human-reviewed. Use for demonstration purposes.     ║
--║                                                            ║
--║  Source: AI/AKEL v1.0 (POC)                               ║
--║  Review Status: Not Reviewed (Proof-of-Concept)          ║
--║  Quality Gates: 4/4 Passed (Simplified)                  ║
--║  Last Updated: [timestamp]                                ║
++║ [AI-GENERATED - POC/DEMO] ║
++║ ║
++║ This analysis was produced entirely by AI and has not ║
++║ been human-reviewed. Use for demonstration purposes. ║
++║ ║
++║ Source: AI/AKEL v1.0 (POC) ║
++║ Review Status: Not Reviewed (Proof-of-Concept) ║
++║ Quality Gates: 4/4 Passed (Simplified) ║
++║ Last Updated: [timestamp] ║
  ╚════════════════════════════════════════════════════════════╝
  {{/code}}
@@ -344,8 +344,6 @@
  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
-----
--
  === 5.3 Display Requirements ===
  **Must Show:**
@@ -363,8 +363,6 @@
  * Authoritative verdicts
  * Complete accuracy
-----
--
  === 5.4 Mode 2 vs. Full System Publication ===
  |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
@@ -375,8 +375,6 @@
  |Risk Display|Demo only|Workflow-integrated|Validated
  |User Actions|View only|Flag for review|Trust rating
-----
--
  == 6. Quality Gates (Simplified Implementation) ==
  === 6.1 Overview ===
@@ -395,8 +395,6 @@
  * Failures displayed to user (not blocking)
  * Full system has comprehensive validation
-----
--
  === 6.2 Gate 1: Source Quality (Basic) ===
  **Full System Requirements:**
@@ -417,8 +417,6 @@
  **Failure Handling:** Display error message, don't generate verdict
-----
--
  === 6.3 Gate 2: Contradiction Search (Basic) ===
  **Full System Requirements:**
@@ -441,8 +441,6 @@
  **Failure Handling:** Note "limited contradiction search" in output
-----
--
  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
  **Full System Requirements:**
@@ -463,8 +463,6 @@
  **Failure Handling:** Show "Confidence: Unknown" if calculation fails
-----
--
  === 6.5 Gate 4: Structural Integrity (Basic) ===
  **Full System Requirements:**
@@ -485,8 +485,6 @@
  **Failure Handling:** Display error message
-----
--
  === 6.6 Quality Gate Display ===
  **POC shows simplified status:**
@@ -509,8 +509,6 @@
  Note: This analysis has limited evidence. Use with caution.
  {{/code}}
-----
--
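The four simplified gates and their failure handling (sections 6.2 to 6.5) can be sketched as one function that returns the status line shown to the user. The input fields are hypothetical. Gate 4 is reduced to "claim and reasoning are non-empty" because its POC criteria are not shown in this view, which is an assumption. Only a Gate 1 failure suppresses the verdict. The other gate failures are reported to the user but do not block publication, which matches POC mode.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClaimAnalysis:
    claim: str
    sources: list[str] = field(default_factory=list)
    contradiction_search_done: bool = False
    confidence: Optional[int] = None  # 0-100, None if calculation failed
    reasoning: str = ""


@dataclass
class GateReport:
    passed: int
    notes: list[str]
    verdict_allowed: bool

    def status_line(self) -> str:
        return f"Quality Gates: {self.passed}/4 Passed (Simplified)"


def run_quality_gates(a: ClaimAnalysis) -> GateReport:
    notes: list[str] = []
    passed = 0
    verdict_allowed = True
    # Gate 1: Source Quality - at least 2 sources, otherwise no verdict
    if len(a.sources) >= 2:
        passed += 1
    else:
        verdict_allowed = False
        notes.append("Insufficient sources: verdict not generated")
    # Gate 2: Contradiction Search - failure is noted, not blocking
    if a.contradiction_search_done:
        passed += 1
    else:
        notes.append("Limited contradiction search")
    # Gate 3: Uncertainty Quantification
    if a.confidence is not None and 0 <= a.confidence <= 100:
        passed += 1
    else:
        notes.append("Confidence: Unknown")
    # Gate 4: Structural Integrity (assumed check: claim and reasoning present)
    if a.claim.strip() and a.reasoning.strip():
        passed += 1
    else:
        notes.append("Structural integrity check failed")
    return GateReport(passed, notes, verdict_allowed)
```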
  === 6.7 Simplified vs. Full System ===
  |=Gate|=POC (Simplified)|=Full System
@@ -521,14 +521,12 @@
  **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
-----
--
  == 7. AKEL Architecture Comparison ==
  === 7.1 POC AKEL (Simplified) ===
  **Implementation:**
--* Single Claude API call (Sonnet 4.5)
++* Single provider API call (REASONING model)
  * One comprehensive prompt
  * All processing in single request
  * No separate components
@@ -540,10 +540,10 @@
 . Extract 3-5 factual claims
 . For each claim:
--   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
--   - Assign confidence score (0-100%)
--   - Assign risk tier (A/B/C)
--   - Write brief reasoning (1-3 sentences)
++ - Determine verdict (WELL-SUPPORTED/PARTIALLY SUPPORTED/UNCERTAIN/REFUTED)
++ - Assign confidence score (0-100%)
++ - Assign risk tier (A/B/C)
++ - Write brief reasoning (1-3 sentences)
 . Generate analysis summary (3-5 sentences)
 . Generate article summary (3-5 sentences)
 . Run basic quality checks
@@ -553,8 +553,6 @@
  **Processing Time:** 10-18 seconds (estimate)
-----
--
  === 7.2 Full System AKEL (Production) ===
  **Architecture:**
@@ -579,8 +579,6 @@
  **Processing Time:** 10-30 seconds (full pipeline)
-----
--
  === 7.3 Why POC Uses Single Call ===
  **Advantages:**
@@ -603,8 +603,6 @@
  Full component architecture comes in Beta after POC validates concept.
-----
--
  === 7.4 Evolution Path ===
  **POC1:** Single prompt → Prove concept
@@ -612,8 +612,6 @@
  **Beta:** Multi-component AKEL → Production architecture
  **Release 1.0:** Full AKEL + Federation → Scale
-----
--
  == 8. Functional Requirements ==
  === FR-POC-1: Article Input ===
@@ -637,8 +637,6 @@
  * User can paste URL of article
  * System accepts input and triggers analysis
-----
--
  === FR-POC-2: Claim Extraction (Fully Automated) ===
  **Requirement:** AI automatically extracts 3-5 factual claims
@@ -666,8 +666,6 @@
  * Claims are clearly stated
  * No manual editing required
-----
--
  === FR-POC-3: Verdict Generation (Fully Automated) ===
  **Requirement:** AI automatically generates verdict for each claim
@@ -674,11 +674,11 @@
  **Functionality:**
  * For each claim, AI:
--  * Evaluates claim based on available evidence/knowledge
--  * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
--  * Assigns confidence score (0-100%)
--  * Assigns risk tier (A/B/C)
--  * Writes brief reasoning (1-3 sentences)
++ * Evaluates claim based on available evidence/knowledge
++ * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
++ * Assigns confidence score (0-100%)
++ * Assigns risk tier (A/B/C)
++ * Writes brief reasoning (1-3 sentences)
  * System displays verdict for each claim
  **Critical:** NO MANUAL EDITING ALLOWED
@@ -700,8 +700,6 @@
  * Verdict is defensible given reasoning
  * All generated automatically by AI
-----
--
  === FR-POC-4: Analysis Summary (Fully Automated) ===
  **Requirement:** AI generates brief summary of analysis
@@ -708,9 +708,9 @@
  **Functionality:**
  * AI summarizes findings in 3-5 sentences:
--  * How many claims found
--  * Distribution of verdicts
--  * Overall assessment
++ * How many claims found
++ * Distribution of verdicts
++ * Overall assessment
  * System displays at top of results
  **Critical:** NO MANUAL EDITING ALLOWED
@@ -721,8 +721,6 @@
  * 3-5 sentences
  * Automatically generated
-----
--
  === FR-POC-5: Article Summary (Fully Automated, Optional) ===
  **Requirement:** AI generates brief summary of original article
@@ -742,8 +742,6 @@
  * 3-5 sentences
  * Automatically generated
-----
--
  === FR-POC-6: Publication Mode Display ===
  **Requirement:** Clear labeling of AI-generated content
@@ -761,8 +761,6 @@
  * Risk tiers are color-coded
  * Quality gate status is visible
-----
--
  === FR-POC-7: Quality Gate Execution ===
  **Requirement:** Execute simplified quality gates
@@ -780,8 +780,6 @@
  * Failures explained to user
  * Gates don't block publication (POC mode)
-----
--
  == 9. Non-Functional Requirements ==
  === NFR-POC-1: Fully Automated Processing ===
@@ -800,8 +800,8 @@
  **Pipeline:**
  {{code}}
  User Input → AKEL Processing → Output Display
--           ↓
--     ZERO human editing
++ ↓
++ ZERO human editing
  {{/code}}
  **If AI output is poor:**
@@ -815,8 +815,6 @@
  * Validates scalability (humans can't review every analysis)
  * Honest test of technical feasibility
-----
--
  === NFR-POC-2: Performance ===
  **Requirement:** Analysis completes in reasonable time
@@ -836,8 +836,6 @@
  * User sees loading indicator
  * No timeout errors
-----
--
  === NFR-POC-3: Reliability ===
  **Requirement:** System works for manual testing sessions
@@ -857,8 +857,6 @@
  * Errors are handled gracefully
  * User receives clear error messages
-----
--
  === NFR-POC-4: Environment ===
  **Requirement:** Runs on simple infrastructure
@@ -876,8 +876,48 @@
  * Auto-scaling
  * Disaster recovery
-----
++=== NFR-POC-5: Cost Efficiency Tracking ===
++**Requirement:** Track and display LLM usage metrics to inform optimization decisions
++
++**Must Track:**
++* Input tokens (article + prompt)
++* Output tokens (generated analysis)
++* Total tokens
++* Estimated cost (USD)
++* Response time (seconds)
++* Article length (words/characters)
++
++**Must Display:**
++* Usage statistics in UI (Component 5)
++* Cost per analysis
++* Cost per claim extracted
++
++**Must Log:**
++* Aggregate metrics for analysis
++* Cost distribution by article length
++* Token efficiency trends
++
++**Purpose:**
++* Understand unit economics
++* Identify optimization opportunities
++* Project costs at scale
++* Inform architecture decisions (caching, model selection, etc.)
++
++**Acceptance Criteria:**
++* ✅ Usage data displayed after each analysis
++* ✅ Metrics logged for aggregate analysis
++* ✅ Cost calculated accurately (Claude API pricing)
++* ✅ Test cases include varying article lengths
++* ✅ POC1 report includes cost analysis section
++
++**Success Target:**
++* Average cost per analysis < $0.05 USD
++* Cost scaling behavior understood (linear/exponential)
++* 2+ optimization opportunities identified
++
++**Critical:** Unit economics must be viable for scaling decision!
++
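For the aggregate logging and the "cost scaling behavior understood" target, one simple approach is to fit a straight line of cost against article length across the logged analyses. The sketch below uses ordinary least squares and compares the average cost against the < $0.05 target. The record format and function name are assumptions. A linear fit tests only the linear hypothesis, so large systematic residuals on long articles would be a sign of non-linear scaling.

```python
from statistics import mean


def summarize_costs(records: list[tuple[int, float]],
                    target_usd: float = 0.05) -> dict:
    """records: (article_words, cost_usd) for each logged analysis."""
    if len(records) < 2:
        raise ValueError("need at least two analyses")
    words = [w for w, _ in records]
    costs = [c for _, c in records]
    mw, mc = mean(words), mean(costs)
    sxx = sum((w - mw) ** 2 for w in words)
    if sxx == 0:
        raise ValueError("article lengths must vary to estimate scaling")
    # Ordinary least squares: cost = intercept + slope * words
    slope = sum((w - mw) * (c - mc) for w, c in records) / sxx
    intercept = mc - slope * mw
    return {
        "avg_cost": mc,
        "meets_target": mc < target_usd,
        "usd_per_1k_words": slope * 1000,
        "fixed_cost_usd": intercept,  # e.g. prompt overhead independent of length
    }
```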
  == 10. Technical Architecture ==
  === 10.1 System Components ===
@@ -889,7 +889,7 @@
  **Backend:**
  * Single API endpoint
--* Calls Claude API (Sonnet 4.5 or latest)
++* Calls provider API (REASONING model; configured via LLM abstraction)
  * Parses response
  * Returns JSON to frontend
@@ -901,36 +901,32 @@
  * Claude API (Anthropic) - required
  * Optional: URL fetch service for article text extraction
-----
--
  === 10.2 Processing Flow ===
  {{code}}
  1. User submits text or URL
--   ↓
++ ↓
  2. Backend receives request
--   ↓
++ ↓
  3. If URL: Fetch article text
--   ↓
++ ↓
  4. Call Claude API with single prompt:
--   "Extract claims, evaluate each, provide verdicts"
--   ↓
++ "Extract claims, evaluate each, provide verdicts"
++ ↓
  5. Claude API returns:
--   - Analysis summary
--   - Claims list
--   - Verdicts for each claim (with risk tiers)
--   - Article summary (optional)
--   - Quality gate results
--   ↓
++ - Analysis summary
++ - Claims list
++ - Verdicts for each claim (with risk tiers)
++ - Article summary (optional)
++ - Quality gate results
++ ↓
  6. Backend parses response
--   ↓
++ ↓
  7. Frontend displays results with Mode 2 labeling
  {{/code}}
  **Key Simplification:** Single API call does entire analysis
-----
--
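The processing flow above reduces to a single function with two external dependencies. In the sketch, `fetch_text` and `call_llm` are hypothetical injected callables that stand in for the optional URL fetch service and the provider call. The prompt is abbreviated, and detecting a URL by its `http(s)://` prefix is an assumption. What the sketch shows is that each analysis makes exactly one model call and the backend only parses the JSON that comes back.

```python
import json
import time
from typing import Callable


def analyze(
    user_input: str,
    fetch_text: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> dict:
    """Single-call POC flow: fetch (if URL) -> one prompt -> parse JSON."""
    started = time.monotonic()
    is_url = user_input.startswith(("http://", "https://"))
    article = fetch_text(user_input) if is_url else user_input
    prompt = (
        "Extract claims, evaluate each, provide verdicts.\n"
        "Return structured JSON.\n\nARTICLE:\n" + article
    )
    raw = call_llm(prompt)    # exactly one model call per analysis
    result = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    result["response_seconds"] = round(time.monotonic() - started, 2)
    return result
```

Injecting the two callables keeps the flow testable without network access, and the same seam is where an LLM abstraction layer would plug in.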
  === 10.3 AI Prompt Strategy ===
  **Single Comprehensive Prompt:**
@@ -937,27 +937,49 @@
  {{code}}
  Task: Analyze this article and provide:
--1. Extract 3-5 factual claims from the article
--2. For each claim:
--   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
--   - Assign confidence score (0-100%)
--   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
--   - Write brief reasoning (1-3 sentences)
--3. Run quality gates:
--   - Check: ≥2 sources found
--   - Attempt: Basic contradiction search
--   - Calculate: Confidence scores
--   - Verify: Structural integrity
--4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
--5. Write article summary (3-5 sentences: neutral summary of article content)
++1. Identify the article's main thesis/conclusion
++ - What is the article trying to argue or prove?
++ - What is the primary claim or conclusion?
++2. Extract 3-5 factual claims from the article
++ - Note which claims are CENTRAL to the main thesis
++ - Note which claims are SUPPORTING facts
++
++3. For each claim:
++ - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
++ - Assign confidence score (0-100%)
++ - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
++ - Write brief reasoning (1-3 sentences)
++
++4. Assess relationship between claims and main thesis:
++ - Do the claims actually support the article's conclusion?
++ - Are there logical leaps or unsupported inferences?
++ - Is the article's framing misleading even if individual facts are accurate?
++
++5. Run quality gates:
++ - Check: ≥2 sources found
++ - Attempt: Basic contradiction search
++ - Calculate: Confidence scores
++ - Verify: Structural integrity
++
++6. Write context-aware analysis summary (4-6 sentences):
++ - State article's main thesis
++ - Report claims found and verdict distribution
++ - Note if central claims are problematic
++ - Assess whether evidence supports conclusion
++ - Overall credibility considering claim importance
++
++7. Write article summary (3-5 sentences: neutral summary of article content)
++
  Return as structured JSON with quality gate results.
  {{/code}}
  **One prompt generates everything.**
-----
++**Critical Addition:**
++Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether AI can distinguish between "accurate facts, poorly reasoned" and "a genuinely credible article."
++
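Because the prompt requests structured JSON, the backend can reject malformed output before anything is displayed. The sketch below assumes hypothetical field names (`main_thesis`, `claims`, `verdict`, `confidence`, `risk_tier`, `is_central`). The authoritative schema is defined in the POC1 API Specification, not here. The checks mirror the prompt steps: a thesis is present (step 1), 3-5 claims are marked central or supporting (step 2), and each claim has a valid verdict, confidence, and risk tier (step 3).

```python
import json

VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
TIERS = {"A", "B", "C"}


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON and reject structurally invalid output."""
    data = json.loads(raw)
    if not str(data.get("main_thesis", "")).strip():
        raise ValueError("missing main_thesis (step 1)")
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for i, c in enumerate(claims, start=1):
        if c.get("verdict") not in VERDICTS:
            raise ValueError(f"claim {i}: bad verdict {c.get('verdict')!r}")
        if c.get("risk_tier") not in TIERS:
            raise ValueError(f"claim {i}: bad risk tier {c.get('risk_tier')!r}")
        conf = c.get("confidence")
        if not isinstance(conf, (int, float)) or not 0 <= conf <= 100:
            raise ValueError(f"claim {i}: confidence must be 0-100")
        if not isinstance(c.get("is_central"), bool):
            raise ValueError(f"claim {i}: is_central must be true/false (step 2)")
    return data
```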
  === 10.4 Technology Stack Suggestions ===
  **Frontend:**
@@ -972,7 +972,7 @@
  **AKEL Integration:**
  * Claude API via Anthropic SDK
--* Model: Claude Sonnet 4.5 or latest available
++* Model: Provider-default REASONING model or latest available
  **Database:**
  * None (stateless acceptable)
@@ -983,8 +983,6 @@
  * Local development environment sufficient for POC
  * Optional: Deploy to cloud for remote demos
-----
--
  == 11. Success Criteria ==
  === 11.1 Minimum Success (POC Passes) ===
@@ -998,6 +998,9 @@
  * ✅ Team/advisors understand the output
  * ✅ Team agrees approach has merit
  * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
++* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
++* ✅ **Cost scaling understood** (data collected on article length vs. cost)
++* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
  **Quality Definition:**
  * "Reasonable verdict" = Defensible given general knowledge
@@ -1004,8 +1004,6 @@
  * "Coherent summary" = Logically structured, grammatically correct
  * "Comprehensible" = Reviewers understand what analysis means
-----
--
  === 11.2 POC Fails If ===
  **Automatic NO-GO if any of these:**
@@ -1015,8 +1015,6 @@
  * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
  * ❌ Team loses confidence in AI-automated approach
-----
--
  === 11.3 Quality Thresholds ===
  **POC quality expectations:**
@@ -1042,8 +1042,6 @@
  * Understandable reasoning
  * Useful output
-----
--
  == 12. Test Cases ==
  === 12.1 Test Case 1: Simple Factual Claim ===
@@ -1059,8 +1059,6 @@
  **Success:** Verdict is reasonable and reasoning makes sense
-----
--
  === 12.2 Test Case 2: Complex News Article ===
  **Input:** News article URL with multiple claims about politics/health/science
@@ -1074,8 +1074,6 @@
  **Success:** Claims identified are actually from article, verdicts are reasonable
-----
--
  === 12.3 Test Case 3: Controversial Topic ===
  **Input:** Article on contested political or scientific topic
@@ -1088,8 +1088,6 @@
  **Success:** Analysis is fair and doesn't show obvious bias
-----
--
  === 12.4 Test Case 4: Clearly False Claim ===
  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
@@ -1103,8 +1103,6 @@
  **Success:** AI correctly identifies false claim with high confidence
-----
--
  === 12.5 Test Case 5: Genuinely Uncertain Claim ===
  **Input:** Article with claim where evidence is genuinely mixed
@@ -1117,8 +1117,6 @@
  **Success:** AI recognizes uncertainty and doesn't overstate confidence
-----
--
  === 12.6 Test Case 6: High-Risk Medical Claim ===
  **Input:** Article making medical claims
@@ -1132,8 +1132,6 @@
  **Success:** Risk tier correctly assigned, appropriate warnings shown
-----
--
  == 13. POC Decision Gate ==
  === 13.1 Decision Framework ===
@@ -1156,8 +1156,6 @@
  * Expand to Evidence Model structure
  * Test with more complex articles
-----
--
  **Option B: NO-GO (Pivot or Stop)**
  **Conditions:**
@@ -1171,8 +1171,6 @@
  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
  * **Stop:** Conclude approach not viable, revisit later
-----
--
  **Option C: ITERATE (Improve POC)**
  **Conditions:**
@@ -1187,39 +1187,33 @@
  * Re-run POC with improvements
  * Then make GO/NO-GO decision
-----
--
  === 13.2 Decision Criteria Summary ===
  {{code}}
--AI Quality < 60%  → NO-GO (approach doesn't work)
++AI Quality < 60% → NO-GO (approach doesn't work)
  AI Quality 60-70% → ITERATE (improve and retry)
--AI Quality ≥70%   → GO (proceed to POC2)
++AI Quality ≥70% → GO (proceed to POC2)
  {{/code}}
-----
--
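The quantitative thresholds in sections 11.1, 11.2, and 13.2 can be combined into one small decision function. This sketch rests on one stated assumption. A manual-editing rate between 30% and 50% is neither a pass (11.1) nor an automatic NO-GO (11.2), so the sketch maps it to ITERATE. The function name is hypothetical, and the function covers only these two numeric criteria. The qualitative criteria, such as team confidence and comprehensibility, still require human judgment.

```python
def poc_decision(quality_pct: float, manual_edit_pct: float) -> str:
    """Map POC results to GO / ITERATE / NO-GO.

    quality_pct: share of analyses judged acceptable (0-100).
    manual_edit_pct: share of analyses that needed human correction (0-100).
    """
    # Automatic NO-GO: quality below 60%, or most analyses need manual editing.
    if quality_pct < 60 or manual_edit_pct > 50:
        return "NO-GO"
    # 60-70% quality, or a 30-50% editing rate (assumption), means improve and retry.
    if quality_pct < 70 or manual_edit_pct >= 30:
        return "ITERATE"
    return "GO"
```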
  == 14. Key Risks & Mitigations ==
  === 14.1 Risk: AI Quality Not Good Enough ===
--**Likelihood:** Medium-High
--**Impact:** POC fails
++**Likelihood:** Medium-High
++**Impact:** POC fails
  **Mitigation:**
  * Extensive prompt engineering and testing
--* Use best available AI models (Sonnet 4.5)
++* Use best available AI models (role-based selection; configured via LLM abstraction)
  * Test with diverse article types
  * Iterate on prompts based on results
  **Acceptance:** This is what POC tests - be ready for failure
-----
--
  === 14.2 Risk: AI Consistency Issues ===
--**Likelihood:** Medium
--**Impact:** Works sometimes, fails other times
++**Likelihood:** Medium
++**Impact:** Works sometimes, fails other times
  **Mitigation:**
  * Test with 10+ diverse articles
@@ -1228,12 +1228,10 @@
  **Acceptance:** Some variability OK if average quality ≥70%
-----
--
  === 14.3 Risk: Output Incomprehensible ===
--**Likelihood:** Low-Medium
--**Impact:** Users can't understand analysis
++**Likelihood:** Low-Medium
++**Impact:** Users can't understand analysis
  **Mitigation:**
  * Create clear explainer document
@@ -1243,12 +1243,10 @@
  **Acceptance:** Iterate until comprehensible
-----
--
  === 14.4 Risk: API Rate Limits / Costs ===
--**Likelihood:** Low
--**Impact:** System slow or expensive
++**Likelihood:** Low
++**Impact:** System slow or expensive
  **Mitigation:**
  * Monitor API usage
@@ -1257,12 +1257,10 @@
  **Acceptance:** POC can be slow and expensive (optimization later)
-----
--
  === 14.5 Risk: Scope Creep ===
--**Likelihood:** Medium
--**Impact:** POC becomes too complex
++**Likelihood:** Medium
++**Impact:** POC becomes too complex
  **Mitigation:**
  * Strict scope discipline
@@ -1271,8 +1271,6 @@
  **Acceptance:** POC is minimal by design
-----
--
  == 15. POC Philosophy ==
  === 15.1 Core Principles ===
@@ -1304,27 +1304,21 @@
  * Document failures openly
  * Make data-driven decisions
-----
--
  === 15.2 What POC Is ===
--✅ Testing AI capability without humans
--✅ Proving core technical concept
--✅ Fast validation of approach
--✅ Honest assessment of feasibility
++✅ Testing AI capability without humans
++✅ Proving core technical concept
++✅ Fast validation of approach
++✅ Honest assessment of feasibility
-----
--
  === 15.3 What POC Is NOT ===
--❌ Building a product
--❌ Production-ready system
--❌ Feature-complete platform
--❌ Perfectly accurate analysis
--❌ Polished user experience
++❌ Building a product
++❌ Production-ready system
++❌ Feature-complete platform
++❌ Perfectly accurate analysis
++❌ Polished user experience
-----
--
  == 16. Success = Clear Path Forward ==
  **If POC succeeds (≥70% AI quality):**
@@ -1342,18 +1342,63 @@
  **Either way, POC provides clarity.**
-----
--
  == 17. Related Pages ==
--* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
--* [[Requirements>>FactHarbor.Requirements.WebHome]]
--* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
++* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
++* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
++* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
  * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
  * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
  * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
-----
--
  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
++
++=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
++
++**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.
++
++**POC1 Implementation:**
++
++* **Primary Provider:** Anthropic Claude API
++ * Stage 1: Provider-default FAST model
++ * Stage 2: Provider-default REASONING model (cached)
++ * Stage 3: Provider-default REASONING model
++
++* **Provider Interface:** Abstract LLMProvider interface implemented
++
++* **Configuration:** Environment variables for provider selection
++ * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
++ * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
++ * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
++
++* **Failover:** Basic error handling with cache fallback for Stage 2
++
++* **Cost Tracking:** Log provider name and cost per request
++
++**Future (POC2/Beta):**
++
++* Secondary provider (OpenAI) with automatic failover
++* Admin API for runtime provider switching
++* Cost comparison dashboard
++* Cross-provider output verification
++
++**Success Criteria:**
++
++* All LLM calls go through abstraction layer (no direct API calls)
++* Provider can be changed via environment variable without code changes
++* Cost tracking includes provider name in logs
++* Stage 2 falls back to cache on provider failure
++
++**Implementation:** See [[POC1 API & Schemas Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
++
++**Dependencies:**
++* NFR-14 (Main Requirements)
++* Design Decision 9
++* Architecture Section 2.2
++
++**Priority:** HIGH (P1)
++
++**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
++
++
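The abstraction requirements above can be sketched as follows: provider selection by environment variable, per-stage model names, and Stage 2 cache fallback. Apart from the environment variable names listed above, everything in this sketch is hypothetical. That includes the `LLMProvider.complete` method, the provider registry, and a write-through cache keyed by model and prompt. The Anthropic provider's network call is deliberately left unimplemented.

```python
import os
from abc import ABC, abstractmethod
from collections.abc import Mapping
from typing import Optional


class LLMProvider(ABC):
    """Every LLM call goes through this interface (no direct SDK calls)."""

    name: str = "base"

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Return the model's text output for a prompt."""


class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> str:
        # The real call would use the Anthropic SDK; omitted in this sketch.
        raise NotImplementedError("network call not implemented in sketch")


# Registry of available providers; POC2 would add a secondary provider here.
PROVIDERS: dict[str, type[LLMProvider]] = {"anthropic": AnthropicProvider}


def provider_from_env(env: Optional[Mapping[str, str]] = None) -> LLMProvider:
    env = os.environ if env is None else env
    key = env.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if key not in PROVIDERS:
        raise ValueError(f"unknown LLM provider: {key}")
    return PROVIDERS[key]()


def model_for_stage(stage: int, default: str,
                    env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    return env.get(f"LLM_STAGE{stage}_MODEL", default)


def run_stage2(provider: LLMProvider, model: str, prompt: str,
               cache: dict[tuple[str, str], str]) -> str:
    """Call the provider; on failure, fall back to a cached Stage 2 result."""
    key = (model, prompt)
    try:
        result = provider.complete(model, prompt)
    except Exception:
        if key in cache:
            return cache[key]
        raise
    cache[key] = result  # write-through so later failures have a fallback
    return result
```

Because callers depend only on `LLMProvider`, switching providers means adding a registry entry and changing `LLM_PRIMARY_PROVIDER`, which satisfies the "no code changes" success criterion for existing call sites.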
