Changes for page POC Requirements (POC1 & POC2)
Last modified by Robert Schaub on 2025/12/24 18:27
Summary
-
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
-
- Content
-
@@ -1,7 +1,7 @@
 = POC Requirements =

-**Status:** ✅ Approved for Development
-**Version:** 2.0 (Updated after Specification Cross-Check)
+**Status:** ✅ Approved for Development
+**Version:** 2.0 (Updated after Specification Cross-Check)
 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention

 == 1. POC Overview ==
@@ -63,7 +63,7 @@

 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument

-**Length:** 4-6 sentences
+**Length:** 4-6 sentences

 **Content (Required Elements):**
 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
@@ -113,9 +113,9 @@

 === 2.2 Component 2: CLAIMS IDENTIFICATION ===

-**What:** List of factual claims extracted from article
-**Format:** Numbered list
-**Quantity:** 3-5 claims
+**What:** List of factual claims extracted from article
+**Format:** Numbered list
+**Quantity:** 3-5 claims
 **Requirements:**
 * Factual claims only (not opinions/questions)
 * Clearly stated
@@ -133,8 +133,8 @@

 === 2.3 Component 3: CLAIMS VERDICTS ===

-**What:** Verdict for each claim identified
-**Format:** Per claim structure
+**What:** Verdict for each claim identified
+**Format:** Per claim structure

 **Required Elements:**
 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
@@ -161,7 +161,7 @@

 **Risk Tier Display:**
 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
-* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
+* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
 * **Tier C (Green):** Low Risk - Facts/Definitions/History

 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
@@ -168,8 +168,8 @@

 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===

-**What:** Brief summary of original article content
-**Length:** 3-5 sentences
+**What:** Brief summary of original article content
+**Length:** 3-5 sentences
 **Tone:** Neutral (article's position, not FactHarbor's analysis)

 **Example:**
@@ -289,7 +289,7 @@
 **POC Architecture (Simplified):**
 {{code}}
 User Input → Single AKEL Call → Output Display
- (all processing)
+ (all processing)
 {{/code}}

 **Full System Architecture:**
@@ -382,15 +382,15 @@
 **Primary Label (top of analysis):**
 {{code}}
 ╔════════════════════════════════════════════════════════════╗
-║ [AI-GENERATED - POC/DEMO] ║
-║ ║
-║ This analysis was produced entirely by AI and has not ║
-║ been human-reviewed. Use for demonstration purposes. ║
-║ ║
-║ Source: AI/AKEL v1.0 (POC) ║
-║ Review Status: Not Reviewed (Proof-of-Concept) ║
-║ Quality Gates: 4/4 Passed (Simplified) ║
-║ Last Updated: [timestamp] ║
+║ [AI-GENERATED - POC/DEMO] ║
+║ ║
+║ This analysis was produced entirely by AI and has not ║
+║ been human-reviewed. Use for demonstration purposes. ║
+║ ║
+║ Source: AI/AKEL v1.0 (POC) ║
+║ Review Status: Not Reviewed (Proof-of-Concept) ║
+║ Quality Gates: 4/4 Passed (Simplified) ║
+║ Last Updated: [timestamp] ║
 ╚════════════════════════════════════════════════════════════╝
 {{/code}}

@@ -575,10 +575,10 @@

 1. Extract 3-5 factual claims
 2. For each claim:
- - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
- - Assign confidence score (0-100%)
- - Assign risk tier (A/B/C)
- - Write brief reasoning (1-3 sentences)
+ - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
+ - Assign confidence score (0-100%)
+ - Assign risk tier (A/B/C)
+ - Write brief reasoning (1-3 sentences)
 3. Generate analysis summary (3-5 sentences)
 4. Generate article summary (3-5 sentences)
 5. Run basic quality checks
@@ -697,11 +697,11 @@

 **Functionality:**
 * For each claim, AI:
- * Evaluates claim based on available evidence/knowledge
- * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
- * Assigns confidence score (0-100%)
- * Assigns risk tier (A/B/C)
- * Writes brief reasoning (1-3 sentences)
+ * Evaluates claim based on available evidence/knowledge
+ * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
+ * Assigns confidence score (0-100%)
+ * Assigns risk tier (A/B/C)
+ * Writes brief reasoning (1-3 sentences)
 * System displays verdict for each claim

 **Critical:** NO MANUAL EDITING ALLOWED
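The per-claim verdict structure in the hunks above (verdict label, confidence score, risk tier, brief reasoning) can be sketched as a small data type. This is only an illustrative sketch, not the project's schema: the names `Verdict`, `RiskTier`, and `ClaimVerdict` are hypothetical, and the range check simply mirrors the stated 0-100% confidence range.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """Verdict labels as listed in the requirements."""
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"


class RiskTier(Enum):
    """Risk tiers A/B/C with the descriptions from the Risk Tier Display."""
    A = "High Risk - Medical/Legal/Safety/Elections"
    B = "Medium Risk - Policy/Science/Causality"
    C = "Low Risk - Facts/Definitions/History"


@dataclass(frozen=True)
class ClaimVerdict:
    """Hypothetical container for one claim's verdict (assumed field names)."""
    claim: str
    verdict: Verdict
    confidence: int  # percent, 0-100
    risk_tier: RiskTier
    reasoning: str  # 1-3 sentences per the requirements

    def __post_init__(self):
        # Reject confidence scores outside the documented 0-100% range.
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence must be 0-100, got {self.confidence}")


example = ClaimVerdict(
    claim="Example claim text",
    verdict=Verdict.PARTIALLY_SUPPORTED,
    confidence=65,
    risk_tier=RiskTier.B,
    reasoning="Placeholder reasoning for illustration only.",
)
```

Making the dataclass frozen is one way to reflect the "NO MANUAL EDITING ALLOWED" constraint at the data level, though that mapping is an interpretation rather than something the requirements state.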
@@ -729,9 +729,9 @@

 **Functionality:**
 * AI summarizes findings in 3-5 sentences:
- * How many claims found
- * Distribution of verdicts
- * Overall assessment
+ * How many claims found
+ * Distribution of verdicts
+ * Overall assessment
 * System displays at top of results

 **Critical:** NO MANUAL EDITING ALLOWED
@@ -813,8 +813,8 @@
 **Pipeline:**
 {{code}}
 User Input → AKEL Processing → Output Display
- ↓
- ZERO human editing
+ ↓
+ ZERO human editing
 {{/code}}

 **If AI output is poor:**
@@ -952,23 +952,23 @@

 {{code}}
 1. User submits text or URL
- ↓
+ ↓
 2. Backend receives request
- ↓
+ ↓
 3. If URL: Fetch article text
- ↓
+ ↓
 4. Call Claude API with single prompt:
- "Extract claims, evaluate each, provide verdicts"
- ↓
+ "Extract claims, evaluate each, provide verdicts"
+ ↓
 5. Claude API returns:
- - Analysis summary
- - Claims list
- - Verdicts for each claim (with risk tiers)
- - Article summary (optional)
- - Quality gate results
- ↓
+ - Analysis summary
+ - Claims list
+ - Verdicts for each claim (with risk tiers)
+ - Article summary (optional)
+ - Quality gate results
+ ↓
 6. Backend parses response
- ↓
+ ↓
 7. Frontend displays results with Mode 2 labeling
 {{/code}}

@@ -981,36 +981,36 @@
 Task: Analyze this article and provide:

 1. Identify the article's main thesis/conclusion
- - What is the article trying to argue or prove?
- - What is the primary claim or conclusion?
+ - What is the article trying to argue or prove?
+ - What is the primary claim or conclusion?

 2. Extract 3-5 factual claims from the article
- - Note which claims are CENTRAL to the main thesis
- - Note which claims are SUPPORTING facts
+ - Note which claims are CENTRAL to the main thesis
+ - Note which claims are SUPPORTING facts

 3. For each claim:
- - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
- - Assign confidence score (0-100%)
- - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
- - Write brief reasoning (1-3 sentences)
+ - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
+ - Assign confidence score (0-100%)
+ - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
+ - Write brief reasoning (1-3 sentences)

 4. Assess relationship between claims and main thesis:
- - Do the claims actually support the article's conclusion?
- - Are there logical leaps or unsupported inferences?
- - Is the article's framing misleading even if individual facts are accurate?
+ - Do the claims actually support the article's conclusion?
+ - Are there logical leaps or unsupported inferences?
+ - Is the article's framing misleading even if individual facts are accurate?

 5. Run quality gates:
- - Check: ≥2 sources found
- - Attempt: Basic contradiction search
- - Calculate: Confidence scores
- - Verify: Structural integrity
+ - Check: ≥2 sources found
+ - Attempt: Basic contradiction search
+ - Calculate: Confidence scores
+ - Verify: Structural integrity

 6. Write context-aware analysis summary (4-6 sentences):
- - State article's main thesis
- - Report claims found and verdict distribution
- - Note if central claims are problematic
- - Assess whether evidence supports conclusion
- - Overall credibility considering claim importance
+ - State article's main thesis
+ - Report claims found and verdict distribution
+ - Note if central claims are problematic
+ - Assess whether evidence supports conclusion
+ - Overall credibility considering claim importance

 7. Write article summary (3-5 sentences: neutral summary of article content)

@@ -1234,9 +1234,9 @@
 === 13.2 Decision Criteria Summary ===

 {{code}}
-AI Quality < 60% → NO-GO (approach doesn't work)
+AI Quality < 60% → NO-GO (approach doesn't work)
 AI Quality 60-70% → ITERATE (improve and retry)
-AI Quality ≥70% → GO (proceed to POC2)
+AI Quality ≥70% → GO (proceed to POC2)
 {{/code}}

 == 14. Key Risks & Mitigations ==
@@ -1243,8 +1243,8 @@

 === 14.1 Risk: AI Quality Not Good Enough ===

-**Likelihood:** Medium-High
-**Impact:** POC fails
+**Likelihood:** Medium-High
+**Impact:** POC fails

 **Mitigation:**
 * Extensive prompt engineering and testing
@@ -1256,8 +1256,8 @@

 === 14.2 Risk: AI Consistency Issues ===

-**Likelihood:** Medium
-**Impact:** Works sometimes, fails other times
+**Likelihood:** Medium
+**Impact:** Works sometimes, fails other times

 **Mitigation:**
 * Test with 10+ diverse articles
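The thresholds in the 13.2 Decision Criteria Summary hunk above can be sketched as a small function. This is a minimal illustration, assuming AI quality is expressed as a percentage and that exactly 70% counts as GO (per the "≥70%" row); the function name `poc_decision` is hypothetical and not part of the specification.

```python
def poc_decision(ai_quality_percent: float) -> str:
    """Map an AI quality score (percent) to the POC decision label.

    Thresholds follow the Decision Criteria Summary:
    < 60 is NO-GO, 60 up to (but not including) 70 is ITERATE, >= 70 is GO.
    """
    if ai_quality_percent < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality_percent < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2
```

The source lists "60-70%" and "≥70%" as separate rows, so the overlap at exactly 70% is resolved here in favor of GO; a different reading of the boundary would need only a one-line change.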
@@ -1268,8 +1268,8 @@

 === 14.3 Risk: Output Incomprehensible ===

-**Likelihood:** Low-Medium
-**Impact:** Users can't understand analysis
+**Likelihood:** Low-Medium
+**Impact:** Users can't understand analysis

 **Mitigation:**
 * Create clear explainer document
@@ -1281,8 +1281,8 @@

 === 14.4 Risk: API Rate Limits / Costs ===

-**Likelihood:** Low
-**Impact:** System slow or expensive
+**Likelihood:** Low
+**Impact:** System slow or expensive

 **Mitigation:**
 * Monitor API usage
@@ -1293,8 +1293,8 @@

 === 14.5 Risk: Scope Creep ===

-**Likelihood:** Medium
-**Impact:** POC becomes too complex
+**Likelihood:** Medium
+**Impact:** POC becomes too complex

 **Mitigation:**
 * Strict scope discipline
@@ -1336,18 +1336,18 @@

 === 15.2 What POC Is ===

-✅ Testing AI capability without humans
-✅ Proving core technical concept
-✅ Fast validation of approach
-✅ Honest assessment of feasibility
+✅ Testing AI capability without humans
+✅ Proving core technical concept
+✅ Fast validation of approach
+✅ Honest assessment of feasibility

 === 15.3 What POC Is NOT ===

-❌ Building a product
-❌ Production-ready system
-❌ Feature-complete platform
-❌ Perfectly accurate analysis
-❌ Polished user experience
+❌ Building a product
+❌ Production-ready system
+❌ Feature-complete platform
+❌ Perfectly accurate analysis
+❌ Polished user experience

 == 16. Success = Clear Path Forward ==

@@ -1368,9 +1368,9 @@

 == 17. Related Pages ==

-* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
-* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
-* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
+* [[User Needs>>Test.FactHarbor.Specification.Requirements.User Needs.WebHome]]
+* [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
+* [[Gap Analysis>>Test.FactHarbor.Specification.Requirements.GapAnalysis]]
 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
@@ -1377,52 +1377,3 @@

 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)

-
-=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
-
-**Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
-
-**POC1 Implementation:**
-
-* **Primary Provider:** Anthropic Claude API
- * Stage 1: Claude Haiku 4
- * Stage 2: Claude Sonnet 3.5 (cached)
- * Stage 3: Claude Sonnet 3.5
-
-* **Provider Interface:** Abstract LLMProvider interface implemented
-
-* **Configuration:** Environment variables for provider selection
- * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
- * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
- * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
-
-* **Failover:** Basic error handling with cache fallback for Stage 2
-
-* **Cost Tracking:** Log provider name and cost per request
-
-**Future (POC2/Beta):**
-
-* Secondary provider (OpenAI) with automatic failover
-* Admin API for runtime provider switching
-* Cost comparison dashboard
-* Cross-provider output verification
-
-**Success Criteria:**
-
-* All LLM calls go through abstraction layer (no direct API calls)
-* Provider can be changed via environment variable without code changes
-* Cost tracking includes provider name in logs
-* Stage 2 falls back to cache on provider failure
-
-**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
-
-**Dependencies:**
-* NFR-14 (Main Requirements)
-* Design Decision 9
-* Architecture Section 2.2
-
-**Priority:** HIGH (P1)
-
-**Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
-
-
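The provider abstraction described in the removed NFR-POC-11 section (an abstract `LLMProvider` interface, provider selection via environment variables, cost logging, and a cache fallback) might look roughly like the sketch below. It is an assumption-laden illustration, not the implementation referenced by the specification: only the `LLMProvider` name and the `LLM_*` environment variables come from the text, while `LLMResult`, `StubAnthropicProvider`, `PROVIDERS`, `get_provider`, and `run_stage` are hypothetical, and the stub makes no real API call.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

logger = logging.getLogger("llm")


@dataclass
class LLMResult:
    """Hypothetical result type carrying what the cost log needs."""
    text: str
    provider: str
    cost_usd: float


class LLMProvider(ABC):
    """Abstract interface; all LLM calls are meant to go through it."""
    name: str

    @abstractmethod
    def complete(self, prompt: str, model: str) -> LLMResult:
        ...


class StubAnthropicProvider(LLMProvider):
    """Placeholder only: a real provider would call the vendor API here."""
    name = "anthropic"

    def complete(self, prompt: str, model: str) -> LLMResult:
        return LLMResult(text=f"[{model}] stub response", provider=self.name, cost_usd=0.0)


# Hypothetical registry keyed by the value of LLM_PRIMARY_PROVIDER.
PROVIDERS = {"anthropic": StubAnthropicProvider}


def get_provider() -> LLMProvider:
    """Select the provider from the environment, without code changes."""
    key = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {key!r}") from None


def run_stage(prompt, model_env, default_model, cache=None):
    """Run one stage; if a cache dict is given, fall back to it on failure."""
    provider = get_provider()
    model = os.environ.get(model_env, default_model)
    try:
        result = provider.complete(prompt, model)
        if cache is not None:
            cache[prompt] = result
    except Exception:
        if cache is None or prompt not in cache:
            raise
        result = cache[prompt]  # cache fallback, as described for Stage 2
    logger.info("provider=%s model=%s cost_usd=%.4f", result.provider, model, result.cost_usd)
    return result
```

Under these assumptions, a Stage 1 call would be `run_stage(text, "LLM_STAGE1_MODEL", "claude-haiku-4")`, and a Stage 2 call would pass a shared dict as `cache` so a provider failure can be served from an earlier result.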