Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2026/02/08 08:26
Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Title
@@ -1,1 +1,1 @@
-POC Requirements (POC1 & POC2)
+POC Requirements

Content
@@ -1,28 +1,19 @@
 = POC Requirements =
 
-
-{{info}}
-**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
-
-See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
-{{/info}}
-
-
-
-**Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
+**Status:** ✅ Approved for Development
+**Version:** 2.0 (Updated after Specification Cross-Check)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
 
+---
+
 == 1. POC Overview ==
 
 === 1.1 What POC Tests ===
 
 **Core Question:**
-
 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
 
 **What we're proving:**
-
 * AI can identify factual claims from text
 * AI can evaluate those claims and produce verdicts
 * Output is comprehensible and useful
@@ -29,7 +29,6 @@
 * Fully automated approach is viable
 
 **What we're NOT testing:**
-
 * Scenario generation (deferred to POC2)
 * Evidence display (deferred to POC2)
 * Production scalability
@@ -36,6 +36,8 @@
 * Perfect accuracy
 * Complete feature set
 
+---
+
 === 1.2 Scenarios Deferred to POC2 ===
 
 **Intentional Simplification:**
@@ -43,7 +43,6 @@
 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
 
 **Rationale:**
-
 * **POC1 tests:** Can AI extract claims and generate verdicts?
 * **POC2 will add:** Scenario generation and management
 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
@@ -55,7 +55,6 @@
 **No Risk:**
 
 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
-
 * Faster POC1 validation
 * Learning from POC1 to inform scenario design
 * Iterative approach: fail fast if basic AI doesn't work
@@ -62,91 +62,65 @@
 * Flexibility to adjust scenario architecture based on POC1 insights
 
 **Full System Workflow (Future):**
-{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
+{{code}}
+Claims → Scenarios → Evidence → Verdicts
+{{/code}}
 
 **POC1 Simplified Workflow:**
-{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
+{{code}}
+Claims → Verdicts (scenarios implicit in reasoning)
+{{/code}}
 
+---
+
 == 2. POC Output Specification ==
 
-=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
+=== 2.1 Component 1: ANALYSIS SUMMARY ===
 
-**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
+**What:** Brief overview of findings
+**Length:** 3-5 sentences
+**Content:**
+* How many claims found
+* Distribution of verdicts
+* Overall assessment
 
-**Length:** 4-6 sentences
+**Example:**
+{{code}}
+This article makes 4 claims about coffee's health effects. We found
+2 claims are well-supported, 1 is uncertain, and 1 is refuted.
+Overall assessment: mostly accurate with some exaggeration.
+{{/code}}
 
-**Content (Required Elements):**
+---
 
-1. **Article's main thesis/claim** - What is the article trying to argue or prove?
-2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
-3. **Central vs. supporting claims** - Which claims are central to the article's argument?
-4. **Relationship assessment** - Do the claims support the article's conclusion?
-5. **Overall credibility** - Final assessment considering claim importance
-
-**Critical Innovation:**
-
-POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
-
-* Make accurate supporting facts but draw unsupported conclusions
-* Have one false central claim that invalidates the whole argument
-* Misframe accurate information to mislead
-
-**Good Example (Context-Aware):**
-{{code}}This article argues that coffee cures cancer based on its antioxidant
-content. We analyzed 3 factual claims: 2 about coffee's chemical
-properties are well-supported, but the main causal claim is refuted
-by current evidence. The article confuses correlation with causation.
-Overall assessment: MISLEADING - makes an unsupported medical claim
-despite citing some accurate facts.{{/code}}
-
-**Poor Example (Simple Aggregation - Don't Do This):**
-{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
-Overall assessment: mostly accurate (67% accurate).{{/code}}
-↑ This misses that the refuted claim IS the article's main point!
-
-**What POC1 Tests:**
-
-Can AI identify and assess:
-
-* ✅ The article's main thesis/conclusion?
-* ✅ Which claims are central vs. supporting?
-* ✅ Whether the evidence supports the conclusion?
-* ✅ Overall credibility considering logical structure?
-
-**If AI Cannot Do This:**
-
-That's valuable to learn in POC1! We'll:
-
-* Note as limitation
-* Fall back to simple aggregation with warning
-* Design explicit article-level analysis for POC2
-
 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
 
-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
 **Requirements:**
-
 * Factual claims only (not opinions/questions)
 * Clearly stated
 * Automatically extracted by AI
 
 **Example:**
-{{code}}CLAIMS IDENTIFIED:
+{{code}}
+CLAIMS IDENTIFIED:
 
 [1] Coffee reduces diabetes risk by 30%
 [2] Coffee improves heart health
 [3] Decaf has same benefits as regular
-[4] Coffee prevents Alzheimer's completely{{/code}}
+[4] Coffee prevents Alzheimer's completely
+{{/code}}
 
+---
+
 === 2.3 Component 3: CLAIMS VERDICTS ===
 
-**What:** Verdict for each claim identified
-**Format:** Per claim structure
+**What:** Verdict for each claim identified
+**Format:** Per claim structure
 
 **Required Elements:**
-
 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
 * **Confidence Score:** 0-100%
 * **Brief Reasoning:** 1-3 sentences explaining why
@@ -153,7 +153,8 @@
 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
 
 **Example:**
-{{code}}VERDICTS:
+{{code}}
+VERDICTS:
 
 [1] WELL-SUPPORTED (85%) [Risk: C]
 Multiple studies confirm 25-30% risk reduction with regular consumption.
@@ -165,86 +165,44 @@
 
 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
 
-No evidence for complete prevention. Claim is significantly overstated.{{/code}}
+No evidence for complete prevention. Claim is significantly overstated.
+{{/code}}
 
 **Risk Tier Display:**
-
 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
 * **Tier C (Green):** Low Risk - Facts/Definitions/History
 
 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
 
+---
+
 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
 
-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
 **Tone:** Neutral (article's position, not FactHarbor's analysis)
 
 **Example:**
-{{code}}ARTICLE SUMMARY:
+{{code}}
+ARTICLE SUMMARY:
 
 Health News Today article discusses coffee benefits, citing studies
 on diabetes and Alzheimer's. Author highlights research linking coffee
-to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
+to disease prevention. Recommends 2-3 cups daily for optimal health.
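The per-claim verdict structure specified in this diff (verdict label, confidence score, risk tier, brief reasoning) can be sketched as a small data model. This is an editorial sketch only; the type and field names (`ClaimVerdict`, `risk_tier`, `display`) are assumptions, not part of the page being revised.

```python
from dataclasses import dataclass
from typing import Literal

# The four verdict labels and three risk tiers come from the spec text above.
Verdict = Literal["WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"]
RiskTier = Literal["A", "B", "C"]  # A = High, B = Medium, C = Low risk


@dataclass
class ClaimVerdict:
    claim: str        # the extracted factual claim, verbatim
    verdict: Verdict  # one of the four verdict labels
    confidence: int   # confidence score, 0-100 (%)
    risk_tier: RiskTier
    reasoning: str    # 1-3 sentences explaining the verdict

    def display(self) -> str:
        # Mirrors the example layout: "WELL-SUPPORTED (85%) [Risk: C]"
        return f"{self.verdict} ({self.confidence}%) [Risk: {self.risk_tier}]\n{self.reasoning}"


v = ClaimVerdict(
    claim="Coffee reduces diabetes risk by 30%",
    verdict="WELL-SUPPORTED",
    confidence=85,
    risk_tier="C",
    reasoning="Multiple studies confirm 25-30% risk reduction with regular consumption.",
)
print(v.display())
```

A fixed data model like this also makes the "no manual editing" rule checkable: the display layer renders whatever the AI produced, with no free-form override field.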
+{{/code}}
 
-=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
+---
 
-**What:** LLM usage metrics for cost optimization and scaling decisions
+=== 2.5 Total Output Size ===
 
-**Purpose:**
-
-* Understand cost per analysis
-* Identify optimization opportunities
-* Project costs at scale
-* Inform architecture decisions
-
-**Display Format:**
-{{code}}USAGE STATISTICS:
-• Article: 2,450 words (12,300 characters)
-• Input tokens: 15,234
-• Output tokens: 892
-• Total tokens: 16,126
-• Estimated cost: $0.24 USD
-• Response time: 8.3 seconds
-• Cost per claim: $0.048
-• Model: claude-sonnet-4-20250514{{/code}}
-
-**Why This Matters:**
-
-At scale, LLM costs are critical:
-
-* 10,000 articles/month ≈ $200-500/month
-* 100,000 articles/month ≈ $2,000-5,000/month
-* Cost optimization can reduce expenses 30-50%
-
-**What POC1 Learns:**
-
-* How cost scales with article length
-* Prompt optimization opportunities (caching, compression)
-* Output verbosity tradeoffs
-* Model selection strategy (FAST vs. REASONING roles)
-* Article length limits (if needed)
-
-**Implementation:**
-
-* Claude API already returns usage data
-* No extra API calls needed
-* Display to user + log for aggregate analysis
-* Test with articles of varying lengths
-
-**Critical for GO/NO-GO:** Unit economics must be viable at scale!
-
-=== 2.6 Total Output Size ===
-
-**Combined:** 220-350 words
-
-* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
+**Combined:** ~200-300 words
+* Analysis Summary: 50-70 words
 * Claims Identification: 30-50 words
 * Claims Verdicts: 100-150 words
 * Article Summary: 30-50 words (optional)
 
-**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
+---
 
 == 3. What's NOT in POC Scope ==
 
@@ -253,7 +253,6 @@
 The following are **explicitly excluded** from POC:
 
 **Content Features:**
-
 * ❌ Scenarios (deferred to POC2)
 * ❌ Evidence display (supporting/opposing lists)
 * ❌ Source links (clickable references)
@@ -263,7 +263,6 @@
 * ❌ Risk assessment (shown but not workflow-integrated)
 
 **Platform Features:**
-
 * ❌ User accounts / authentication
 * ❌ Saved history
 * ❌ Search functionality
@@ -273,7 +273,6 @@
 * ❌ Social sharing
 
 **Technical Features:**
-
 * ❌ Browser extensions
 * ❌ Mobile apps
 * ❌ API endpoints
@@ -281,7 +281,6 @@
 * ❌ Export features (PDF, CSV)
 
 **Quality Features:**
-
 * ❌ Accessibility (WCAG compliance)
 * ❌ Multilingual support
 * ❌ Mobile optimization
@@ -288,7 +288,6 @@
 * ❌ Media verification (images/videos)
 
 **Production Features:**
-
 * ❌ Security hardening
 * ❌ Privacy compliance (GDPR)
 * ❌ Terms of service
@@ -297,18 +297,24 @@
 * ❌ Analytics
 * ❌ A/B testing
 
+---
+
 == 4. POC Simplifications vs. Full System ==
 
 === 4.1 Architecture Comparison ===
 
 **POC Architecture (Simplified):**
-{{code}}User Input → Single AKEL Call → Output Display
-                     (all processing){{/code}}
+{{code}}
+User Input → Single AKEL Call → Output Display
+             (all processing)
+{{/code}}
 
 **Full System Architecture:**
-{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
+{{code}}
+User Input → Claim Extractor → Claim Classifier → Scenario Generator
 → Evidence Summarizer → Contradiction Detector → Verdict Generator
-→ Quality Gates → Publication → Output Display{{/code}}
+→ Quality Gates → Publication → Output Display
+{{/code}}
 
 **Key Differences:**
 
@@ -321,17 +321,17 @@
 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
 
+---
+
 === 4.2 Workflow Comparison ===
 
 **POC1 Workflow:**
-
 1. User submits text/URL
 2. Single AKEL call (all processing in one prompt)
 3. Display results
-**Total: 3 steps, 10-18 seconds**
+**Total: 3 steps, ~10-18 seconds**
 
 **Full System Workflow:**
-
 1. **Claim Submission** (extraction, normalization, clustering)
 2. **Scenario Building** (definitions, assumptions, boundaries)
 3. **Evidence Handling** (retrieval, assessment, linking)
@@ -338,8 +338,10 @@
 4. **Verdict Creation** (synthesis, reasoning, approval)
 5. **Public Presentation** (summaries, landscapes, deep dives)
 6. **Time Evolution** (versioning, re-evaluation triggers)
-**Total: 6 phases with quality gates, 10-30 seconds**
+**Total: 6 phases with quality gates, ~10-30 seconds**
 
+---
+
 === 4.3 Why POC is Simplified ===
 
 **Engineering Rationale:**
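The usage-statistics component that this change removes tracked input/output tokens and derived an estimated USD cost per analysis and per claim. A minimal sketch of that derivation follows; the per-million-token prices are illustrative placeholders, not real API pricing, and the function name is an assumption.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float = 3.0,
                  usd_per_m_output: float = 15.0) -> float:
    """Estimated USD cost of one analysis.

    The default per-million-token prices are placeholders for illustration;
    swap in the current provider rates before relying on this number.
    """
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Token counts taken from the USAGE STATISTICS example in this section.
cost = estimate_cost(15_234, 892)
cost_per_claim = cost / 4  # the running example extracts 4 claims
print(f"Estimated cost: ${cost:.3f} USD (${cost_per_claim:.4f} per claim)")
```

Logging this pair of numbers per analysis is enough to answer the section's scaling questions (cost vs. article length, linear vs. super-linear growth) without any extra API calls.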
@@ -358,10 +358,11 @@
 * ❌ POC doesn't validate scale (test in Beta)
 * ❌ POC doesn't validate scenario architecture (design in POC2)
 
+---
+
 === 4.4 Gap Between POC1 and POC2/Beta ===
 
 **What needs to be built for POC2:**
-
 * Scenario generation component
 * Evidence Model structure (full)
 * Scenario-evidence linking
@@ -369,7 +369,6 @@
 * Truth landscape visualization
 
 **What needs to be built for Beta:**
-
 * Multi-component AKEL pipeline
 * Quality gate infrastructure
 * Review workflow system
@@ -379,6 +379,8 @@
 
 **POC1 → POC2 is significant architectural expansion.**
 
+---
+
 == 5. Publication Mode & Labeling ==
 
 === 5.1 POC Publication Mode ===
@@ -386,7 +386,6 @@
 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
 
 Per FactHarbor Specification Section 11 "POC v1 Behavior":
-
 * Produces public AI-generated output
 * No human approval gate
 * Clear AI-Generated labeling
@@ -393,31 +393,35 @@
 * All quality gates active (simplified)
 * Risk tier classification shown (demo)
 
+---
+
 === 5.2 User-Facing Labels ===
 
 **Primary Label (top of analysis):**
-{{code}}╔════════════════════════════════════════════════════════════╗
-║  [AI-GENERATED - POC/DEMO]                                 ║
-║                                                            ║
-║  This analysis was produced entirely by AI and has not     ║
-║  been human-reviewed. Use for demonstration purposes.      ║
-║                                                            ║
-║  Source: AI/AKEL v1.0 (POC)                                ║
-║  Review Status: Not Reviewed (Proof-of-Concept)            ║
-║  Quality Gates: 4/4 Passed (Simplified)                    ║
-║  Last Updated: [timestamp]                                 ║
-╚════════════════════════════════════════════════════════════╝{{/code}}
+{{code}}
+╔════════════════════════════════════════════════════════════╗
+║  [AI-GENERATED - POC/DEMO]                                 ║
+║                                                            ║
+║  This analysis was produced entirely by AI and has not     ║
+║  been human-reviewed. Use for demonstration purposes.      ║
+║                                                            ║
+║  Source: AI/AKEL v1.0 (POC)                                ║
+║  Review Status: Not Reviewed (Proof-of-Concept)            ║
+║  Quality Gates: 4/4 Passed (Simplified)                    ║
+║  Last Updated: [timestamp]                                 ║
+╚════════════════════════════════════════════════════════════╝
+{{/code}}
 
 **Per-Claim Risk Labels:**
-
 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
 
+---
+
 === 5.3 Display Requirements ===
 
 **Must Show:**
-
 * AI-Generated status (prominent)
 * POC/Demo disclaimer
 * Risk tier per claim
@@ -426,7 +426,6 @@
 * Timestamp
 
 **Must NOT Claim:**
-
 * Human review
 * Production quality
 * Medical/legal advice
@@ -433,6 +433,8 @@
 * Authoritative verdicts
 * Complete accuracy
 
+---
+
 === 5.4 Mode 2 vs. Full System Publication ===
 
 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
@@ -443,6 +443,8 @@
 |Risk Display|Demo only|Workflow-integrated|Validated
 |User Actions|View only|Flag for review|Trust rating
 
+---
+
 == 6. Quality Gates (Simplified Implementation) ==
 
 === 6.1 Overview ===
@@ -450,7 +450,6 @@
 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
 
 **Full System Has 4 Gates:**
-
 1. Source Quality
 2. Contradiction Search (MANDATORY)
 3. Uncertainty Quantification
@@ -457,16 +457,16 @@
 4. Structural Integrity
 
 **POC Implements Simplified Versions:**
-
 * Focus on demonstrating concept
 * Basic implementations sufficient
 * Failures displayed to user (not blocking)
 * Full system has comprehensive validation
 
+---
+
 === 6.2 Gate 1: Source Quality (Basic) ===
 
 **Full System Requirements:**
-
 * Primary sources identified and accessible
 * Source reliability scored against whitelist
 * Citation completeness verified
@@ -474,7 +474,6 @@
 * Author credentials validated
 
 **POC Implementation:**
-
 * ✅ At least 2 sources found
 * ✅ Sources accessible (URLs valid)
 * ❌ No whitelist checking
@@ -485,10 +485,11 @@
 
 **Failure Handling:** Display error message, don't generate verdict
 
+---
+
 === 6.3 Gate 2: Contradiction Search (Basic) ===
 
 **Full System Requirements:**
-
 * Counter-evidence actively searched
 * Reservations and limitations identified
 * Alternative interpretations explored
@@ -497,7 +497,6 @@
 * Academic literature (supporting AND opposing)
 
 **POC Implementation:**
-
 * ✅ Basic search for counter-evidence
 * ✅ Identify obvious contradictions
 * ❌ No comprehensive academic search
@@ -509,10 +509,11 @@
 
 **Failure Handling:** Note "limited contradiction search" in output
 
+---
+
 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
 
 **Full System Requirements:**
-
 * Confidence scores calculated for all claims/verdicts
 * Limitations explicitly stated
 * Data gaps identified and disclosed
@@ -520,7 +520,6 @@
 * Alternative scenarios considered
 
 **POC Implementation:**
-
 * ✅ Confidence scores (0-100%)
 * ✅ Basic uncertainty acknowledgment
 * ❌ No detailed limitation disclosure
@@ -531,10 +531,11 @@
 
 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
 
+---
+
 === 6.5 Gate 4: Structural Integrity (Basic) ===
 
 **Full System Requirements:**
-
 * No hallucinations detected (fact-checking against sources)
 * Logic chain valid and traceable
 * References accessible and verifiable
@@ -542,7 +542,6 @@
 * Premises clearly stated
 
 **POC Implementation:**
-
 * ✅ Basic coherence check
 * ✅ References accessible
 * ❌ No comprehensive hallucination detection
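Each simplified gate described in this section reduces to a pass/fail result plus a short detail string, reported to the user but never blocking publication. A hypothetical rendering of that status report (type and function names are this editor's assumptions, not the page's):

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str     # e.g. "Source Quality"
    passed: bool
    detail: str   # short explanation shown to the user


def render_gates(results: list[GateResult]) -> str:
    # Gates are reported, never blocking, per the POC's simplified handling.
    passed = sum(r.passed for r in results)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        lines.append(f"{'✓' if r.passed else '✗'} {r.name}: {r.detail}")
    if passed < len(results):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)


report = render_gates([
    GateResult("Source Quality", True, "3 sources found"),
    GateResult("Contradiction Search", False, "Search failed - limited evidence"),
    GateResult("Uncertainty", True, "Confidence scores assigned"),
    GateResult("Structural Integrity", True, "Output coherent"),
])
print(report)
```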
@@ -553,24 +553,32 @@
 
 **Failure Handling:** Display error message
 
+---
+
 === 6.6 Quality Gate Display ===
 
 **POC shows simplified status:**
-{{code}}Quality Gates: 4/4 Passed (Simplified)
+{{code}}
+Quality Gates: 4/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✓ Contradiction Search: Basic search completed
 ✓ Uncertainty: Confidence scores assigned
-✓ Structural Integrity: Output coherent{{/code}}
+✓ Structural Integrity: Output coherent
+{{/code}}
 
 **If any gate fails:**
-{{code}}Quality Gates: 3/4 Passed (Simplified)
+{{code}}
+Quality Gates: 3/4 Passed (Simplified)
 ✓ Source Quality: 3 sources found
 ✗ Contradiction Search: Search failed - limited evidence
 ✓ Uncertainty: Confidence scores assigned
 ✓ Structural Integrity: Output coherent
 
-Note: This analysis has limited evidence. Use with caution.{{/code}}
+Note: This analysis has limited evidence. Use with caution.
+{{/code}}
 
+---
+
 === 6.7 Simplified vs. Full System ===
 
 |=Gate|=POC (Simplified)|=Full System
@@ -581,13 +581,14 @@
 
 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
 
+---
+
 == 7. AKEL Architecture Comparison ==
 
 === 7.1 POC AKEL (Simplified) ===
 
 **Implementation:**
-
-* Single provider API call (REASONING model)
+* Single Claude API call (Sonnet 4.5)
 * One comprehensive prompt
 * All processing in single request
 * No separate components
@@ -594,26 +594,31 @@
 * No orchestration layer
 
 **Prompt Structure:**
-{{code}}Task: Analyze this article and provide:
+{{code}}
+Task: Analyze this article and provide:
 
 1. Extract 3-5 factual claims
 2. For each claim:
-   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
-   - Assign confidence score (0-100%)
-   - Assign risk tier (A/B/C)
-   - Write brief reasoning (1-3 sentences)
+   - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+   - Assign confidence score (0-100%)
+   - Assign risk tier (A/B/C)
+   - Write brief reasoning (1-3 sentences)
 3. Generate analysis summary (3-5 sentences)
 4. Generate article summary (3-5 sentences)
 5. Run basic quality checks
 
-Return as structured JSON.{{/code}}
+Return as structured JSON.
+{{/code}}
 
 **Processing Time:** 10-18 seconds (estimate)
 
+---
+
 === 7.2 Full System AKEL (Production) ===
 
 **Architecture:**
-{{code}}AKEL Orchestrator
+{{code}}
+AKEL Orchestrator
 ├── Claim Extractor
 ├── Claim Classifier (with risk tier assignment)
 ├── Scenario Generator
@@ -621,10 +621,10 @@
 ├── Contradiction Detector
 ├── Quality Gate Validator
 ├── Audit Sampling Scheduler
-└── Federation Sync Adapter (Release 1.0+){{/code}}
+└── Federation Sync Adapter (Release 1.0+)
+{{/code}}
 
 **Processing:**
-
 * Parallel processing where possible
 * Separate component calls
 * Quality gates between phases
@@ -633,10 +633,11 @@
 
 **Processing Time:** 10-30 seconds (full pipeline)
 
+---
+
 === 7.3 Why POC Uses Single Call ===
 
 **Advantages:**
-
 * ✅ Simpler to implement
 * ✅ Faster POC development
 * ✅ Easier to debug
@@ -644,7 +644,6 @@
 * ✅ Good enough for concept validation
 
 **Limitations:**
-
 * ❌ No component reusability
 * ❌ No parallel processing
 * ❌ All-or-nothing (can't partially succeed)
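The prompt above asks the model to "return as structured JSON", but the page does not define the schema. One hypothetical shape, with field names assumed by this editor for illustration, might be parsed like this:

```python
import json

# Hypothetical response for the single-call prompt; the field names below
# are assumptions, not defined by the specification being revised.
raw = """
{
  "claims": [
    {
      "id": 1,
      "text": "Coffee reduces diabetes risk by 30%",
      "verdict": "WELL-SUPPORTED",
      "confidence": 85,
      "risk_tier": "C",
      "reasoning": "Multiple studies confirm 25-30% risk reduction."
    }
  ],
  "analysis_summary": "This article makes 4 claims about coffee's health effects.",
  "article_summary": "Health News Today article discusses coffee benefits.",
  "quality_gates": {"passed": 4, "total": 4}
}
"""

result = json.loads(raw)
for c in result["claims"]:
    # Render each claim in the display format used elsewhere in the spec.
    print(f"[{c['id']}] {c['verdict']} ({c['confidence']}%) [Risk: {c['risk_tier']}]")
```

Whatever schema is chosen, pinning it down explicitly matters for the single-call design: a malformed response fails the whole analysis, since there are no per-component retries.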
@@ -657,6 +657,8 @@
 
 Full component architecture comes in Beta after POC validates concept.
 
+---
+
 === 7.4 Evolution Path ===
 
 **POC1:** Single prompt → Prove concept
@@ -664,6 +664,8 @@
 **Beta:** Multi-component AKEL → Production architecture
 **Release 1.0:** Full AKEL + Federation → Scale
 
+---
+
 == 8. Functional Requirements ==
 
 === FR-POC-1: Article Input ===
@@ -671,7 +671,6 @@
 **Requirement:** User can submit article for analysis
 
 **Functionality:**
-
 * Text input field (paste article text, up to 5000 characters)
 * URL input field (paste article URL)
 * "Analyze" button to trigger processing
@@ -678,7 +678,6 @@
 * Loading indicator during analysis
 
 **Excluded:**
-
 * No user authentication
 * No claim history
 * No search functionality
@@ -685,17 +685,17 @@
 * No saved templates
 
 **Acceptance Criteria:**
-
 * User can paste text from article
 * User can paste URL of article
 * System accepts input and triggers analysis
 
+---
+
 === FR-POC-2: Claim Extraction (Fully Automated) ===
 
 **Requirement:** AI automatically extracts 3-5 factual claims
 
 **Functionality:**
-
 * AI reads article text
 * AI identifies factual claims (not opinions/questions)
 * AI extracts 3-5 most important claims
@@ -702,7 +702,6 @@
 * System displays numbered list
 
 **Critical:** NO MANUAL EDITING ALLOWED
-
 * AI selects which claims to extract
 * AI identifies factual vs. non-factual
 * System processes claims as extracted
@@ -709,34 +709,32 @@
 * No human curation or correction
 
 **Error Handling:**
-
 * If extraction fails: Display error message
 * User can retry with different input
 * No manual intervention to fix extraction
 
 **Acceptance Criteria:**
-
 * AI extracts 3-5 claims automatically
 * Claims are factual (not opinions)
 * Claims are clearly stated
 * No manual editing required
 
+---
+
 === FR-POC-3: Verdict Generation (Fully Automated) ===
 
 **Requirement:** AI automatically generates verdict for each claim
 
 **Functionality:**
-
 * For each claim, AI:
-* Evaluates claim based on available evidence/knowledge
-* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
-* Assigns confidence score (0-100%)
-* Assigns risk tier (A/B/C)
-* Writes brief reasoning (1-3 sentences)
+  * Evaluates claim based on available evidence/knowledge
+  * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+  * Assigns confidence score (0-100%)
+  * Assigns risk tier (A/B/C)
+  * Writes brief reasoning (1-3 sentences)
 * System displays verdict for each claim
 
 **Critical:** NO MANUAL EDITING ALLOWED
-
 * AI computes verdicts based on evidence
 * AI generates confidence scores
 * AI writes reasoning
@@ -743,13 +743,11 @@
 * No human review or adjustment
 
 **Error Handling:**
-
 * If verdict generation fails: Display error message
 * User can retry
 * No manual intervention to adjust verdicts
 
 **Acceptance Criteria:**
-
 * Each claim has a verdict
 * Confidence score is displayed (0-100%)
 * Risk tier is displayed (A/B/C)
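FR-POC-2 and FR-POC-3 both require that failures be surfaced to the user for retry, with no manual correction of the output. That loop can be sketched with a deterministic stand-in for the AKEL call; `FlakyAKEL` and `analyze_with_retry` are illustrative names invented here, not part of the requirements.

```python
from typing import Optional


class FlakyAKEL:
    """Deterministic stand-in for the single AKEL call: fails N times, then succeeds."""

    def __init__(self, failures: int = 2):
        self.failures = failures

    def analyze(self, text: str) -> dict:
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("claim extraction failed")
        return {"claims": ["Coffee reduces diabetes risk by 30%"]}


def analyze_with_retry(akel: FlakyAKEL, text: str, attempts: int = 3) -> Optional[dict]:
    # Surface each failure to the user and retry; per the POC rules,
    # the AI output is never repaired by hand.
    for i in range(attempts):
        try:
            return akel.analyze(text)
        except RuntimeError as err:
            print(f"Attempt {i + 1} failed: {err}")
    return None


result = analyze_with_retry(FlakyAKEL(), "pasted article text")
```

Returning `None` after exhausted retries is the honest POC outcome: the failure gets documented rather than patched, which is exactly what NFR-POC-1 asks for.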
@@ -757,33 +757,34 @@
 * Verdict is defensible given reasoning
 * All generated automatically by AI
 
+---
+
 === FR-POC-4: Analysis Summary (Fully Automated) ===
 
 **Requirement:** AI generates brief summary of analysis
 
 **Functionality:**
-
 * AI summarizes findings in 3-5 sentences:
-* How many claims found
-* Distribution of verdicts
-* Overall assessment
+  * How many claims found
+  * Distribution of verdicts
+  * Overall assessment
 * System displays at top of results
 
 **Critical:** NO MANUAL EDITING ALLOWED
 
 **Acceptance Criteria:**
-
 * Summary is coherent
 * Accurately reflects analysis
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
 
 **Requirement:** AI generates brief summary of original article
 
 **Functionality:**
-
 * AI summarizes article content (not FactHarbor's analysis)
 * 3-5 sentences
 * System displays
@@ -793,18 +793,18 @@
 **Critical:** NO MANUAL EDITING ALLOWED
 
 **Acceptance Criteria:**
-
 * Summary is neutral (article's position)
 * Accurately reflects article content
 * 3-5 sentences
 * Automatically generated
 
+---
+
 === FR-POC-6: Publication Mode Display ===
 
 **Requirement:** Clear labeling of AI-generated content
 
 **Functionality:**
-
 * Display Mode 2 publication label
 * Show POC/Demo disclaimer
 * Display risk tiers per claim
@@ -812,18 +812,18 @@
 * Display timestamp
 
 **Acceptance Criteria:**
-
 * Label is prominent and clear
 * User understands this is AI-generated POC output
 * Risk tiers are color-coded
 * Quality gate status is visible
 
+---
+
 === FR-POC-7: Quality Gate Execution ===
 
 **Requirement:** Execute simplified quality gates
 
 **Functionality:**
-
 * Check source quality (basic)
 * Attempt contradiction search (basic)
 * Calculate confidence scores
@@ -831,12 +831,13 @@
 * Display gate results
 
 **Acceptance Criteria:**
-
 * All 4 gates attempted
 * Pass/fail status displayed
 * Failures explained to user
 * Gates don't block publication (POC mode)
 
+---
+
 == 9. Non-Functional Requirements ==
 
 === NFR-POC-1: Fully Automated Processing ===
@@ -846,7 +846,6 @@
 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
 
 **What this means:**
-
 * Claims: AI selects (no human curation)
 * Scenarios: N/A (deferred to POC2)
 * Evidence: AI evaluates (no human selection)
@@ -854,12 +854,13 @@
 * Summaries: AI writes (no human editing)
 
 **Pipeline:**
-{{code}}User Input → AKEL Processing → Output Display
-                    ↓
-            ZERO human editing{{/code}}
+{{code}}
+User Input → AKEL Processing → Output Display
+                    ↓
+            ZERO human editing
+{{/code}}
 
 **If AI output is poor:**
-
 * ❌ Do NOT manually fix it
 * ✅ Document the failure
 * ✅ Improve prompts and retry
@@ -866,61 +866,59 @@ 866 866 * ✅ Accept that POC might fail 867 867 868 868 **Why this matters:** 869 - 870 870 * Tests whether AI can do this without humans 871 871 * Validates scalability (humans can't review every analysis) 872 872 * Honest test of technical feasibility 873 873 818 +--- 819 + 874 874 === NFR-POC-2: Performance === 875 875 876 876 **Requirement:** Analysis completes in reasonable time 877 877 878 878 **Acceptable Performance:** 879 - 880 880 * Processing time: 1-5 minutes (acceptable for POC) 881 881 * Display loading indicator to user 882 882 * Show progress if possible ("Extracting claims...", "Generating verdicts...") 883 883 884 884 **Not Required:** 885 - 886 886 * Production-level speed (< 30 seconds) 887 887 * Optimization for scale 888 888 * Caching 889 889 890 890 **Acceptance Criteria:** 891 - 892 892 * Analysis completes within 5 minutes 893 893 * User sees loading indicator 894 894 * No timeout errors 895 895 839 +--- 840 + 896 896 === NFR-POC-3: Reliability === 897 897 898 898 **Requirement:** System works for manual testing sessions 899 899 900 900 **Acceptable:** 901 - 902 902 * Occasional errors (< 20% failure rate) 903 903 * Manual restart if needed 904 904 * Display error messages clearly 905 905 906 906 **Not Required:** 907 - 908 908 * 99.9% uptime 909 909 * Automatic error recovery 910 910 * Production monitoring 911 911 912 912 **Acceptance Criteria:** 913 - 914 914 * System works for test demonstrations 915 915 * Errors are handled gracefully 916 916 * User receives clear error messages 917 917 860 +--- 861 + 918 918 === NFR-POC-4: Environment === 919 919 920 920 **Requirement:** Runs on simple infrastructure 921 921 922 922 **Acceptable:** 923 - 924 924 * Single machine or simple cloud setup 925 925 * No distributed architecture 926 926 * No load balancing ... ... 
@@ -928,196 +928,125 @@ 928 928 * Local development environment viable 929 929 930 930 **Not Required:** 931 - 932 932 * Production infrastructure 933 933 * Multi-region deployment 934 934 * Auto-scaling 935 935 * Disaster recovery 936 936 937 - === NFR-POC-5: Cost Efficiency Tracking ===879 +--- 938 938 939 -**Requirement:** Track and display LLM usage metrics to inform optimization decisions 940 - 941 -**Must Track:** 942 - 943 -* Input tokens (article + prompt) 944 -* Output tokens (generated analysis) 945 -* Total tokens 946 -* Estimated cost (USD) 947 -* Response time (seconds) 948 -* Article length (words/characters) 949 - 950 -**Must Display:** 951 - 952 -* Usage statistics in UI (Component 5) 953 -* Cost per analysis 954 -* Cost per claim extracted 955 - 956 -**Must Log:** 957 - 958 -* Aggregate metrics for analysis 959 -* Cost distribution by article length 960 -* Token efficiency trends 961 - 962 -**Purpose:** 963 - 964 -* Understand unit economics 965 -* Identify optimization opportunities 966 -* Project costs at scale 967 -* Inform architecture decisions (caching, model selection, etc.) 968 - 969 -**Acceptance Criteria:** 970 - 971 -* ✅ Usage data displayed after each analysis 972 -* ✅ Metrics logged for aggregate analysis 973 -* ✅ Cost calculated accurately (Claude API pricing) 974 -* ✅ Test cases include varying article lengths 975 -* ✅ POC1 report includes cost analysis section 976 - 977 -**Success Target:** 978 - 979 -* Average cost per analysis < $0.05 USD 980 -* Cost scaling behavior understood (linear/exponential) 981 -* 2+ optimization opportunities identified 982 - 983 -**Critical:** Unit economics must be viable for scaling decision! 984 - 985 985 == 10. 
Technical Architecture == 986 986 987 987 === 10.1 System Components === 988 988 989 989 **Frontend:** 990 - 991 991 * Simple HTML form (text input + URL input + button) 992 992 * Loading indicator 993 993 * Results display page (single page, no tabs/navigation) 994 994 995 995 **Backend:** 996 - 997 997 * Single API endpoint 998 -* Calls provider API (REASONING model; configured via LLM abstraction)892 +* Calls Claude API (Sonnet 4.5 or latest) 999 999 * Parses response 1000 1000 * Returns JSON to frontend 1001 1001 1002 1002 **Data Storage:** 1003 - 1004 1004 * None required (stateless POC) 1005 1005 * Optional: Simple file storage or SQLite for demo examples 1006 1006 1007 1007 **External Services:** 1008 - 1009 1009 * Claude API (Anthropic) - required 1010 1010 * Optional: URL fetch service for article text extraction 904 +--- 905 + 1012 1012 === 10.2 Processing Flow === 1013 1013 1014 1014 {{code}} 1015 1015 1. User submits text or URL 1016 - ↓ 910 + ↓ 1017 1017 2. Backend receives request 1018 - ↓ 912 + ↓ 1019 1019 3. If URL: Fetch article text 1020 - ↓ 914 + ↓ 1021 1021 4. Call Claude API with single prompt: 1022 - "Extract claims, evaluate each, provide verdicts" 1023 - ↓ 916 + "Extract claims, evaluate each, provide verdicts" 917 + ↓ 1024 1024 5. Claude API returns: 1025 - - Analysis summary 1026 - - Claims list 1027 - - Verdicts for each claim (with risk tiers) 1028 - - Article summary (optional) 1029 - - Quality gate results 1030 - ↓ 919 + - Analysis summary 920 + - Claims list 921 + - Verdicts for each claim (with risk tiers) 922 + - Article summary (optional) 923 + - Quality gate results 924 + ↓ 1031 1031 6. Backend parses response 1032 - ↓ 926 + ↓ 1033 1033 7.
Frontend displays results with Mode 2 labeling 1034 1034 {{/code}} 1035 1035 1036 1036 **Key Simplification:** Single API call does entire analysis 1037 1037 932 +--- 933 + 1038 1038 === 10.3 AI Prompt Strategy === 1039 1039 1040 1040 **Single Comprehensive Prompt:** 1041 -{{code}}Task: Analyze this article and provide: 937 +{{code}} 938 +Task: Analyze this article and provide: 1042 1042 1043 -1. Identify the article's main thesis/conclusion 1044 - - What is the article trying to argue or prove? 1045 - - What is the primary claim or conclusion? 940 +1. Extract 3-5 factual claims from the article 941 +2. For each claim: 942 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 943 + - Assign confidence score (0-100%) 944 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 945 + - Write brief reasoning (1-3 sentences) 946 +3. Run quality gates: 947 + - Check: ≥2 sources found 948 + - Attempt: Basic contradiction search 949 + - Calculate: Confidence scores 950 + - Verify: Structural integrity 951 +4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment) 952 +5. Write article summary (3-5 sentences: neutral summary of article content) 1046 1046 1047 -2. Extract 3-5 factual claims from the article 1048 - - Note which claims are CENTRAL to the main thesis 1049 - - Note which claims are SUPPORTING facts 954 +Return as structured JSON with quality gate results. 955 +{{/code}} 1050 1050 1051 -3. For each claim: 1052 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED) 1053 - - Assign confidence score (0-100%) 1054 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions) 1055 - - Write brief reasoning (1-3 sentences) 1056 - 1057 -4. Assess relationship between claims and main thesis: 1058 - - Do the claims actually support the article's conclusion? 1059 - - Are there logical leaps or unsupported inferences? 
1060 - - Is the article's framing misleading even if individual facts are accurate? 1061 - 1062 -5. Run quality gates: 1063 - - Check: ≥2 sources found 1064 - - Attempt: Basic contradiction search 1065 - - Calculate: Confidence scores 1066 - - Verify: Structural integrity 1067 - 1068 -6. Write context-aware analysis summary (4-6 sentences): 1069 - - State article's main thesis 1070 - - Report claims found and verdict distribution 1071 - - Note if central claims are problematic 1072 - - Assess whether evidence supports conclusion 1073 - - Overall credibility considering claim importance 1074 - 1075 -7. Write article summary (3-5 sentences: neutral summary of article content) 1076 - 1077 -Return as structured JSON with quality gate results.{{/code}} 1078 - 1079 1079 **One prompt generates everything.** 1080 1080 1081 - **Critical Addition:**959 +--- 1082 1082 1083 -Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article." 
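Step 6 of the processing flow ("Backend parses response") is where malformed model output surfaces, and the no-manual-editing rule means the backend should reject bad output rather than patch it. A minimal sketch of that parse-and-validate step, assuming the single prompt asks for JSON containing `analysis_summary` and a `claims` array — these field names are illustrative assumptions, not fixed by the spec:

```python
import json

# Vocabulary from the prompt strategy in section 10.3
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_TIERS = {"A", "B", "C"}  # A: Medical/Legal/Safety, B: Policy/Science, C: Facts


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and validate required fields.

    Raises ValueError on malformed output so the backend can log the
    failure and retry the prompt instead of hand-fixing the result.
    """
    data = json.loads(raw)
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {claim.get('verdict')!r}")
        if claim.get("risk_tier") not in ALLOWED_TIERS:
            raise ValueError(f"unknown risk tier: {claim.get('risk_tier')!r}")
        if not 0 <= claim.get("confidence", -1) <= 100:
            raise ValueError("confidence must be 0-100")
    return {"analysis_summary": data.get("analysis_summary", ""), "claims": claims}


# Example with a mocked model reply (no API call needed):
reply = json.dumps({
    "analysis_summary": "Three claims found; two well-supported, one uncertain.",
    "claims": [
        {"text": "Coffee reduces diabetes risk", "verdict": "PARTIALLY SUPPORTED",
         "confidence": 75, "risk_tier": "A", "reasoning": "Observational studies only."},
        {"text": "Water boils at 100C at sea level", "verdict": "WELL-SUPPORTED",
         "confidence": 95, "risk_tier": "C", "reasoning": "Basic physics."},
        {"text": "Policy X cut crime 40%", "verdict": "UNCERTAIN",
         "confidence": 50, "risk_tier": "B", "reasoning": "Evidence is mixed."},
    ],
})
result = parse_analysis(reply)
print(len(result["claims"]))  # -> 3
```

Raising on bad output (instead of silently repairing it) keeps the POC honest: a failed parse is documented and the prompt retried, in line with the "do NOT manually fix it" rule.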
1084 - 1085 1085 === 10.4 Technology Stack Suggestions === 1086 1086 1087 1087 **Frontend:** 1088 - 1089 1089 * HTML + CSS + JavaScript (minimal framework) 1090 1090 * OR: Next.js (if team prefers) 1091 1091 * Hosted: Local machine OR Vercel/Netlify free tier 1092 1092 1093 1093 **Backend:** 1094 - 1095 1095 * Python Flask/FastAPI (simple REST API) 1096 1096 * OR: Next.js API routes (if using Next.js) 1097 1097 * Hosted: Local machine OR Railway/Render free tier 1098 1098 1099 1099 **AKEL Integration:** 1100 - 1101 1101 * Claude API via Anthropic SDK 1102 -* Model: Provider-default REASONING model or latest available975 +* Model: Claude Sonnet 4.5 or latest available 1103 1103 1104 1104 **Database:** 1105 - 1106 1106 * None (stateless acceptable) 1107 1107 * OR: SQLite if want to store demo examples 1108 1108 * OR: JSON files on disk 1109 1109 1110 1110 **Deployment:** 1111 - 1112 1112 * Local development environment sufficient for POC 1113 1113 * Optional: Deploy to cloud for remote demos 1114 1114 986 +--- 987 + 1115 1115 == 11. Success Criteria == 1116 1116 1117 1117 === 11.1 Minimum Success (POC Passes) === 1118 1118 1119 1119 **Required for GO decision:** 1120 - 1121 1121 * ✅ AI extracts 3-5 factual claims automatically 1122 1122 * ✅ AI provides verdict for each claim automatically 1123 1123 * ✅ Verdicts are reasonable (≥70% make logical sense) ... ... @@ -1126,20 +1126,17 @@ 1126 1126 * ✅ Team/advisors understand the output 1127 1127 * ✅ Team agrees approach has merit 1128 1128 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention) 1129 -* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target) 1130 -* ✅ **Cost scaling understood** (data collected on article length vs.
cost) 1131 -* ✅ **Optimization opportunities identified** (≥2 potential improvements documented) 1132 1132 1133 1133 **Quality Definition:** 1134 - 1135 1135 * "Reasonable verdict" = Defensible given general knowledge 1136 1136 * "Coherent summary" = Logically structured, grammatically correct 1137 1137 * "Comprehensible" = Reviewers understand what analysis means 1138 1138 1007 +--- 1008 + 1139 1139 === 11.2 POC Fails If === 1140 1140 1141 1141 **Automatic NO-GO if any of these:** 1142 - 1143 1143 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones) 1144 1144 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random) 1145 1145 * ❌ Output incomprehensible (reviewers can't understand analysis) ... ... @@ -1146,20 +1146,21 @@ 1146 1146 * ❌ **Requires manual editing for most analyses** (> 50% need human correction) 1147 1147 * ❌ Team loses confidence in AI-automated approach 1148 1148 1018 +--- 1019 + 1149 1149 === 11.3 Quality Thresholds === 1150 1150 1151 1151 **POC quality expectations:** 1152 1152 1153 1153 |=Component|=Quality Threshold|=Definition 1154 -|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases 1155 -|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided 1156 -|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant 1157 -|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims 1025 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases 1026 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided 1027 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant 1028 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims 1158 
1158 1159 1159 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet 1160 1160 1161 1161 **Not expecting:** 1162 - 1163 1163 * 100% accuracy 1164 1164 * Perfect claim coverage 1165 1165 * Comprehensive evidence gathering ... ... @@ -1167,12 +1167,13 @@ 1167 1167 * Production polish 1168 1168 1169 1169 **Expecting:** 1170 - 1171 1171 * Reasonable claim extraction 1172 1172 * Defensible verdicts 1173 1173 * Understandable reasoning 1174 1174 * Useful output 1175 1175 1045 +--- 1046 + 1176 1176 == 12. Test Cases == 1177 1177 1178 1178 === 12.1 Test Case 1: Simple Factual Claim === ... ... @@ -1180,7 +1180,6 @@ 1180 1180 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%" 1181 1181 1182 1182 **Expected Output:** 1183 - 1184 1184 * Extract claim correctly 1185 1185 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED 1186 1186 * Confidence: 70-90% ... ... @@ -1189,12 +1189,13 @@ 1189 1189 1190 1190 **Success:** Verdict is reasonable and reasoning makes sense 1191 1191 1062 +--- 1063 + 1192 1192 === 12.2 Test Case 2: Complex News Article === 1193 1193 1194 1194 **Input:** News article URL with multiple claims about politics/health/science 1195 1195 1196 1196 **Expected Output:** 1197 - 1198 1198 * Extract 3-5 key claims 1199 1199 * Verdict for each (may vary: some supported, some uncertain, some refuted) 1200 1200 * Coherent analysis summary ... ... @@ -1203,12 +1203,13 @@ 1203 1203 1204 1204 **Success:** Claims identified are actually from article, verdicts are reasonable 1205 1205 1077 +--- 1078 + 1206 1206 === 12.3 Test Case 3: Controversial Topic === 1207 1207 1208 1208 **Input:** Article on contested political or scientific topic 1209 1209 1210 1210 **Expected Output:** 1211 - 1212 1212 * Balanced analysis 1213 1213 * Acknowledges uncertainty where appropriate 1214 1214 * Doesn't overstate confidence ... ... 
@@ -1216,12 +1216,13 @@ 1216 1216 1217 1217 **Success:** Analysis is fair and doesn't show obvious bias 1218 1218 1091 +--- 1092 + 1219 1219 === 12.4 Test Case 4: Clearly False Claim === 1220 1220 1221 1221 **Input:** Article with obviously false claim (e.g., "The Earth is flat") 1222 1222 1223 1223 **Expected Output:** 1224 - 1225 1225 * Extract claim 1226 1226 * Verdict: REFUTED 1227 1227 * High confidence (> 90%) ... ... @@ -1230,12 +1230,13 @@ 1230 1230 1231 1231 **Success:** AI correctly identifies false claim with high confidence 1232 1232 1106 +--- 1107 + 1233 1233 === 12.5 Test Case 5: Genuinely Uncertain Claim === 1234 1234 1235 1235 **Input:** Article with claim where evidence is genuinely mixed 1236 1236 1237 1237 **Expected Output:** 1238 - 1239 1239 * Extract claim 1240 1240 * Verdict: UNCERTAIN 1241 1241 * Moderate confidence (40-60%) ... ... @@ -1243,12 +1243,13 @@ 1243 1243 1244 1244 **Success:** AI recognizes uncertainty and doesn't overstate confidence 1245 1245 1120 +--- 1121 + 1246 1246 === 12.6 Test Case 6: High-Risk Medical Claim === 1247 1247 1248 1248 **Input:** Article making medical claims 1249 1249 1250 1250 **Expected Output:** 1251 - 1252 1252 * Extract claim 1253 1253 * Verdict: [appropriate based on evidence] 1254 1254 * Risk tier: A (High - medical) ... ... @@ -1257,6 +1257,8 @@ 1257 1257 1258 1258 **Success:** Risk tier correctly assigned, appropriate warnings shown 1259 1259 1135 +--- 1136 + 1260 1260 == 13. POC Decision Gate == 1261 1261 1262 1262 === 13.1 Decision Framework === ... ... @@ -1266,7 +1266,6 @@ 1266 1266 **Option A: GO (Proceed to POC2)** 1267 1267 1268 1268 **Conditions:** 1269 - 1270 1270 * AI quality ≥70% without manual editing 1271 1271 * Basic claim → verdict pipeline validated 1272 1272 * Internal + advisor feedback positive ... ... 
@@ -1275,16 +1275,16 @@ 1275 1275 * Clear path to improving AI quality to ≥90% 1276 1276 1277 1277 **Next Steps:** 1278 - 1279 1279 * Plan POC2 development (add scenarios) 1280 1280 * Design scenario architecture 1281 1281 * Expand to Evidence Model structure 1282 1282 * Test with more complex articles 1283 1283 1159 +--- 1160 + 1284 1284 **Option B: NO-GO (Pivot or Stop)** 1285 1285 1286 1286 **Conditions:** 1287 - 1288 1288 * AI quality < 60% 1289 1289 * Requires manual editing for most analyses (> 50%) 1290 1290 * Feedback indicates fundamental flaws ... ... @@ -1292,14 +1292,14 @@ 1292 1292 * No clear path to improvement 1293 1293 1294 1294 **Next Steps:** 1295 - 1296 1296 * **Pivot:** Change to hybrid human-AI approach (accept manual review required) 1297 1297 * **Stop:** Conclude approach not viable, revisit later 1298 1298 1174 +--- 1175 + 1299 1299 **Option C: ITERATE (Improve POC)** 1300 1300 1301 1301 **Conditions:** 1302 - 1303 1303 * Concept has merit but execution needs work 1304 1304 * Specific improvements identified 1305 1305 * Addressable with better prompts/approach ... ... @@ -1306,43 +1306,46 @@ 1306 1306 * AI quality between 60-70% 1307 1307 1308 1308 **Next Steps:** 1309 - 1310 1310 * Improve AI prompts 1311 1311 * Test different approaches 1312 1312 * Re-run POC with improvements 1313 1313 * Then make GO/NO-GO decision 1314 1314 1190 +--- 1191 + 1315 1315 === 13.2 Decision Criteria Summary === 1316 1316 1317 1317 {{code}} 1318 -AI Quality < 60% → NO-GO (approach doesn't work) 1195 +AI Quality < 60% → NO-GO (approach doesn't work) 1319 1319 AI Quality 60-70% → ITERATE (improve and retry) 1320 -AI Quality ≥70% → GO (proceed to POC2) 1197 +AI Quality ≥70% → GO (proceed to POC2) 1321 1321 {{/code}} 1322 1322 1200 +--- 1201 + 1323 1323 == 14. 
Key Risks & Mitigations == 1324 1324 1325 1325 === 14.1 Risk: AI Quality Not Good Enough === 1326 1326 1327 -**Likelihood:** Medium-High 1328 -**Impact:** POC fails 1206 +**Likelihood:** Medium-High 1207 +**Impact:** POC fails 1329 1329 1330 1330 **Mitigation:** 1331 - 1332 1332 * Extensive prompt engineering and testing 1333 -* Use best available AI models ( role-based selection; configured via LLM abstraction)1211 +* Use best available AI models (Sonnet 4.5) 1334 1334 * Test with diverse article types 1335 1335 * Iterate on prompts based on results 1336 1336 1337 1337 **Acceptance:** This is what POC tests - be ready for failure 1338 1338 1217 +--- 1218 + 1339 1339 === 14.2 Risk: AI Consistency Issues === 1340 1340 1341 -**Likelihood:** Medium 1342 -**Impact:** Works sometimes, fails other times 1221 +**Likelihood:** Medium 1222 +**Impact:** Works sometimes, fails other times 1343 1343 1344 1344 **Mitigation:** 1345 - 1346 1346 * Test with 10+ diverse articles 1347 1347 * Measure success rate honestly 1348 1348 * Improve prompts to increase consistency ... ... @@ -1349,13 +1349,14 @@ 1349 1349 1350 1350 **Acceptance:** Some variability OK if average quality ≥70% 1351 1351 1231 +--- 1232 + 1352 1352 === 14.3 Risk: Output Incomprehensible === 1353 1353 1354 -**Likelihood:** Low-Medium 1355 -**Impact:** Users can't understand analysis 1235 +**Likelihood:** Low-Medium 1236 +**Impact:** Users can't understand analysis 1356 1356 1357 1357 **Mitigation:** 1358 - 1359 1359 * Create clear explainer document 1360 1360 * Iterate on output format 1361 1361 * Test with non-technical reviewers ... ... 
@@ -1363,13 +1363,14 @@ 1363 1363 1364 1364 **Acceptance:** Iterate until comprehensible 1365 1365 1246 +--- 1247 + 1366 1366 === 14.4 Risk: API Rate Limits / Costs === 1367 1367 1368 -**Likelihood:** Low 1369 -**Impact:** System slow or expensive 1250 +**Likelihood:** Low 1251 +**Impact:** System slow or expensive 1370 1370 1371 1371 **Mitigation:** 1372 - 1373 1373 * Monitor API usage 1374 1374 * Implement retry logic 1375 1375 * Estimate costs before scaling ... ... @@ -1376,13 +1376,14 @@ 1376 1376 1377 1377 **Acceptance:** POC can be slow and expensive (optimization later) 1378 1378 1260 +--- 1261 + 1379 1379 === 14.5 Risk: Scope Creep === 1380 1380 1381 -**Likelihood:** Medium 1382 -**Impact:** POC becomes too complex 1264 +**Likelihood:** Medium 1265 +**Impact:** POC becomes too complex 1383 1383 1384 1384 **Mitigation:** 1385 - 1386 1386 * Strict scope discipline 1387 1387 * Say NO to feature additions 1388 1388 * Keep focus on core question ... ... @@ -1389,19 +1389,18 @@ 1389 1389 1390 1390 **Acceptance:** POC is minimal by design 1391 1391 1274 +--- 1275 + 1392 1392 == 15. POC Philosophy == 1393 1393 1394 1394 === 15.1 Core Principles === 1395 1395 1396 -* \\ 1397 -** \\ 1398 -**1. Build Less, Learn More 1280 +**1. Build Less, Learn More** 1399 1399 * Minimum features to test hypothesis 1400 1400 * Don't build unvalidated features 1401 1401 * Focus on core question only 1402 1402 1403 1403 **2. Fail Fast** 1404 - 1405 1405 * Quick test of hardest part (AI capability) 1406 1406 * Accept that POC might fail 1407 1407 * Better to discover issues early ... ... @@ -1408,45 +1408,45 @@ 1408 1408 * Honest assessment over optimistic hope 1409 1409 1410 1410 **3. Test First, Build Second** 1411 - 1412 1412 * Validate AI can do this before building platform 1413 1413 * Don't assume it will work 1414 1414 * Let results guide decisions 1415 1415 1416 1416 **4. 
Automation First** 1417 - 1418 1418 * No manual editing allowed 1419 1419 * Tests scalability, not just feasibility 1420 1420 * Proves approach can work at scale 1421 1421 1422 1422 **5. Honest Assessment** 1423 - 1424 1424 * Don't cherry-pick examples 1425 1425 * Don't manually fix bad outputs 1426 1426 * Document failures openly 1427 1427 * Make data-driven decisions 1428 1428 1307 +--- 1308 + 1429 1429 === 15.2 What POC Is === 1430 1430 1431 -✅ Testing AI capability without humans 1432 -✅ Proving core technical concept 1433 -✅ Fast validation of approach 1434 -✅ Honest assessment of feasibility 1311 +✅ Testing AI capability without humans 1312 +✅ Proving core technical concept 1313 +✅ Fast validation of approach 1314 +✅ Honest assessment of feasibility 1435 1435 1316 +--- 1317 + 1436 1436 === 15.3 What POC Is NOT === 1437 1437 1438 -❌ Building a product 1439 -❌ Production-ready system 1440 -❌ Feature-complete platform 1441 -❌ Perfectly accurate analysis 1442 -❌ Polished user experience 1320 +❌ Building a product 1321 +❌ Production-ready system 1322 +❌ Feature-complete platform 1323 +❌ Perfectly accurate analysis 1324 +❌ Polished user experience 1443 1443 1444 - == 16. Success ==1326 +--- 1445 1445 1446 - Clear Path Forward == 1328 +== 16. Success = Clear Path Forward == 1447 1447 1448 1448 **If POC succeeds (≥70% AI quality):** 1449 - 1450 1450 * ✅ Approach validated 1451 1451 * ✅ Proceed to POC2 (add scenarios) 1452 1452 * ✅ Design full Evidence Model structure ... ... @@ -1454,7 +1454,6 @@ 1454 1454 * ✅ Focus on improving AI quality from 70% → 90% 1455 1455 1456 1456 **If POC fails (< 60% AI quality):** 1457 - 1458 1458 * ✅ Learn what doesn't work 1459 1459 * ✅ Pivot to different approach 1460 1460 * ✅ OR wait for better AI technology ... ... @@ -1462,62 +1462,18 @@ 1462 1462 1463 1463 **Either way, POC provides clarity.** 1464 1464 1345 +--- 1346 + 1465 1465 == 17. 
Related Pages == 1466 1466 1467 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]1468 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]1469 -* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]1470 -* [[Architecture>>Archive.FactHarbor.Specification.Architecture.WebHome]]1471 -* [[AKEL>>Archive.FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]] 1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]] 1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]] 1352 +* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]] 1353 +* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]] 1472 1472 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]] 1356 +--- + 1474 1474 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment) 1476 - 1477 -=== NFR-POC-11: LLM Provider Abstraction (POC1) === 1479 -**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1480 - 1481 -**POC1 Implementation:** 1482 - 1483 -* **Primary Provider:** Anthropic Claude API 1484 -* Stage 1: Provider-default FAST model 1485 -* Stage 2: Provider-default REASONING model (cached) 1486 -* Stage 3: Provider-default REASONING model 1487 - 1488 -* **Provider Interface:** Abstract LLMProvider interface implemented 1489 - 1490 -* **Configuration:** Environment variables for provider selection 1491 -* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}} 1492 -* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}} 1493 -* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}} 1494 - 1495 -* **Failover:** Basic error handling with cache fallback for Stage 2 1496 - 1497 -* **Cost Tracking:** Log provider name and cost per request 1498 - 1499 -**Future (POC2/Beta):** 1500 - 1501 -* Secondary provider (OpenAI) with automatic failover 1502 -* Admin API for runtime provider switching 1503 -* Cost comparison dashboard 1504 -* Cross-provider output verification 1505 - 1506 -**Success Criteria:** 1507 - 1508 -* All LLM calls go through abstraction layer (no direct API calls) 1509 -* Provider can be changed via environment variable without code changes 1510 -* Cost tracking includes provider name in logs 1511 -* Stage 2 falls back to cache on provider failure 1512 - 1513 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6 1514 - 1515 -**Dependencies:** 1516 - 1517 -* NFR-14 (Main Requirements) 1518 -* Design Decision 9 1519 -* Architecture Section 2.2 1520 - 1521 -**Priority:** HIGH (P1) 1522 - 1523 -**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
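The NFR-POC-11 success criteria (all calls through the abstraction, provider switchable via environment variable, provider name logged for cost tracking) could be sketched as follows. The class names `LLMProvider`/`AnthropicProvider` and the helper functions are assumptions for illustration; only the `LLM_PRIMARY_PROVIDER` and `LLM_STAGE*_MODEL` variable names come from the configuration listed above:

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMResult:
    text: str
    provider: str            # logged per request, as required for cost tracking
    estimated_cost_usd: float


class LLMProvider(ABC):
    """All LLM calls go through this interface -- no direct API calls."""
    name: str = "base"

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult:
        ...


class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> LLMResult:
        # Stub: a real build would call the Anthropic SDK here and compute
        # estimated_cost_usd from actual token counts (NFR-POC-5 tracking).
        return LLMResult(text="[stubbed response]", provider=self.name,
                         estimated_cost_usd=0.0)


_REGISTRY = {"anthropic": AnthropicProvider}


def get_provider() -> LLMProvider:
    """Provider selected via LLM_PRIMARY_PROVIDER -- no code changes needed."""
    key = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _REGISTRY[key]()


def stage_model(stage: int) -> str:
    """Per-stage model names come from LLM_STAGE1_MODEL, LLM_STAGE2_MODEL, ..."""
    return os.environ[f"LLM_STAGE{stage}_MODEL"]
```

Adding a second provider for POC2 would then mean one new subclass plus a registry entry, which is the refactoring cost the rationale above aims to avoid paying later.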