Last modified by Robert Schaub on 2025/12/24 18:27

From version 2.1
edited by Robert Schaub
on 2025/12/24 13:58
Change comment: Imported from XAR
To version 3.1
edited by Robert Schaub
on 2025/12/24 17:59
Change comment: Imported from XAR

Page properties
Content
... ... @@ -1,7 +1,7 @@
1 1  = POC Requirements =
2 2  
3 -**Status:** ✅ Approved for Development
4 -**Version:** 2.0 (Updated after Specification Cross-Check)
3 +**Status:** ✅ Approved for Development
4 +**Version:** 2.0 (Updated after Specification Cross-Check)
5 5  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6 6  
7 7  == 1. POC Overview ==
... ... @@ -63,7 +63,7 @@
63 63  
64 64  **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
65 65  
66 -**Length:** 4-6 sentences
66 +**Length:** 4-6 sentences
67 67  
68 68  **Content (Required Elements):**
69 69  1. **Article's main thesis/claim** - What is the article trying to argue or prove?
... ... @@ -113,9 +113,9 @@
113 113  
114 114  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
115 115  
116 -**What:** List of factual claims extracted from article
117 -**Format:** Numbered list
118 -**Quantity:** 3-5 claims
116 +**What:** List of factual claims extracted from article
117 +**Format:** Numbered list
118 +**Quantity:** 3-5 claims
119 119  **Requirements:**
120 120  * Factual claims only (not opinions/questions)
121 121  * Clearly stated
... ... @@ -133,8 +133,8 @@
133 133  
134 134  === 2.3 Component 3: CLAIMS VERDICTS ===
135 135  
136 -**What:** Verdict for each claim identified
137 -**Format:** Per claim structure
136 +**What:** Verdict for each claim identified
137 +**Format:** Per claim structure
138 138  
139 139  **Required Elements:**
140 140  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
... ... @@ -161,7 +161,7 @@
161 161  
162 162  **Risk Tier Display:**
163 163  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
164 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
164 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
165 165  * **Tier C (Green):** Low Risk - Facts/Definitions/History
166 166  
167 167  **Note:** In the POC, the risk tier is shown for demonstration purposes only. The full system uses risk tiers to determine the review workflow.
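In the POC the model itself assigns the tier, so code only needs to render it. A minimal sketch of the display mapping, assuming the three tiers and domain lists above; `RiskTier`, `tier_badge`, and `tier_for_domain` are hypothetical names, and the domain lookup is only a possible sanity check on a model-reported domain, not part of the specification.

```python
from enum import Enum
from typing import Optional, Tuple


class RiskTier(Enum):
    """Hypothetical display mapping: (color, risk level, example domains)."""
    A = ("Red", "High Risk", ("Medical", "Legal", "Safety", "Elections"))
    B = ("Yellow", "Medium Risk", ("Policy", "Science", "Causality"))
    C = ("Green", "Low Risk", ("Facts", "Definitions", "History"))

    def __init__(self, color: str, risk: str, domains: Tuple[str, ...]):
        # Enum unpacks each tuple value into these constructor arguments.
        self.color = color
        self.risk = risk
        self.domains = domains


def tier_badge(tier: RiskTier) -> str:
    """Render the badge text shown next to a claim verdict."""
    return f"Tier {tier.name} ({tier.color}): {tier.risk} - {'/'.join(tier.domains)}"


def tier_for_domain(domain: str) -> Optional[RiskTier]:
    """Look up the tier for a domain name (case-insensitive); None if unlisted."""
    wanted = domain.strip().lower()
    for tier in RiskTier:
        if wanted in (d.lower() for d in tier.domains):
            return tier
    return None
```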
... ... @@ -168,8 +168,8 @@
168 168  
169 169  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
170 170  
171 -**What:** Brief summary of original article content
172 -**Length:** 3-5 sentences
171 +**What:** Brief summary of original article content
172 +**Length:** 3-5 sentences
173 173  **Tone:** Neutral (article's position, not FactHarbor's analysis)
174 174  
175 175  **Example:**
... ... @@ -289,7 +289,7 @@
289 289  **POC Architecture (Simplified):**
290 290  {{code}}
291 291  User Input → Single AKEL Call → Output Display
292 - (all processing)
292 + (all processing)
293 293  {{/code}}
294 294  
295 295  **Full System Architecture:**
... ... @@ -382,15 +382,15 @@
382 382  **Primary Label (top of analysis):**
383 383  {{code}}
384 384  ╔════════════════════════════════════════════════════════════╗
385 -║ [AI-GENERATED - POC/DEMO]
386 -║
387 -║ This analysis was produced entirely by AI and has not
388 -║ been human-reviewed. Use for demonstration purposes.
389 -║
390 -║ Source: AI/AKEL v1.0 (POC)
391 -║ Review Status: Not Reviewed (Proof-of-Concept)
392 -║ Quality Gates: 4/4 Passed (Simplified)
393 -║ Last Updated: [timestamp]
385 +║ [AI-GENERATED - POC/DEMO]                                  ║
386 +║                                                            ║
387 +║ This analysis was produced entirely by AI and has not      ║
388 +║ been human-reviewed. Use for demonstration purposes.       ║
389 +║                                                            ║
390 +║ Source: AI/AKEL v1.0 (POC)                                 ║
391 +║ Review Status: Not Reviewed (Proof-of-Concept)             ║
392 +║ Quality Gates: 4/4 Passed (Simplified)                     ║
393 +║ Last Updated: [timestamp]                                  ║
394 394  ╚════════════════════════════════════════════════════════════╝
395 395  {{/code}}
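The label's "Quality Gates: 4/4 Passed (Simplified)" line implies a small, countable set of checks. A minimal sketch follows, assuming the four gates listed in the prompt template (at least two sources, an attempted contradiction search, confidence scores, structural integrity) and a hypothetical `ClaimResult` shape for parsed claims. Checking sources per claim and treating "structural integrity" as 3-5 non-empty claims are assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClaimResult:
    """Hypothetical parsed form of one claim from the AKEL response."""
    text: str
    sources: List[str] = field(default_factory=list)
    contradiction_search_done: bool = False
    confidence: Optional[int] = None  # 0-100, None if the model omitted it


def run_quality_gates(claims: List[ClaimResult]) -> Dict[str, bool]:
    """Evaluate the four simplified POC gates; True means the gate passed."""
    return {
        # Assumption: "at least 2 sources" is checked per claim.
        "sources": all(len(c.sources) >= 2 for c in claims),
        # The POC only requires that a contradiction search was attempted.
        "contradiction_search": all(c.contradiction_search_done for c in claims),
        # Every claim must carry a confidence score in the 0-100 range.
        "confidence": all(
            c.confidence is not None and 0 <= c.confidence <= 100 for c in claims
        ),
        # Assumption: structural integrity means 3-5 claims, none with empty text.
        "structure": 3 <= len(claims) <= 5 and all(c.text.strip() for c in claims),
    }


def gate_summary(gates: Dict[str, bool]) -> str:
    """Format the gate results the way the Mode 2 label displays them."""
    passed = sum(gates.values())
    return f"Quality Gates: {passed}/{len(gates)} Passed (Simplified)"
```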
396 396  
... ... @@ -575,10 +575,10 @@
575 575  
576 576  1. Extract 3-5 factual claims
577 577  2. For each claim:
578 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
579 - - Assign confidence score (0-100%)
580 - - Assign risk tier (A/B/C)
581 - - Write brief reasoning (1-3 sentences)
578 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
579 + - Assign confidence score (0-100%)
580 + - Assign risk tier (A/B/C)
581 + - Write brief reasoning (1-3 sentences)
582 582  3. Generate analysis summary (3-5 sentences)
583 583  4. Generate article summary (3-5 sentences)
584 584  5. Run basic quality checks
... ... @@ -697,11 +697,11 @@
697 697  
698 698  **Functionality:**
699 699  * For each claim, AI:
700 - * Evaluates claim based on available evidence/knowledge
701 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 - * Assigns confidence score (0-100%)
703 - * Assigns risk tier (A/B/C)
704 - * Writes brief reasoning (1-3 sentences)
700 + * Evaluates claim based on available evidence/knowledge
701 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 + * Assigns confidence score (0-100%)
703 + * Assigns risk tier (A/B/C)
704 + * Writes brief reasoning (1-3 sentences)
705 705  * System displays verdict for each claim
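The per-claim fields above map naturally onto a small record type. The sketch below is a hypothetical validator, assuming the model's output has already been parsed into a `ClaimVerdict`; the names are illustrative and the sentence counter is deliberately naive. Making the record `frozen=True` is one way to mirror the no-manual-editing rule in code, since fields cannot be reassigned after parsing.

```python
import re
from dataclasses import dataclass
from typing import List

VERDICT_LABELS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")
RISK_TIERS = ("A", "B", "C")


@dataclass(frozen=True)
class ClaimVerdict:
    """Hypothetical record for one evaluated claim; frozen, so fields cannot be edited."""
    claim: str
    verdict: str
    confidence: int  # 0-100
    risk_tier: str   # "A", "B" or "C"
    reasoning: str   # 1-3 sentences


def count_sentences(text: str) -> int:
    """Naive count: split on ., ! or ? followed by whitespace or end of text."""
    return len([s for s in re.split(r"[.!?]+(?:\s+|$)", text.strip()) if s])


def validate_verdict(v: ClaimVerdict) -> List[str]:
    """Return a list of problems; an empty list means the verdict is well-formed."""
    problems = []
    if v.verdict not in VERDICT_LABELS:
        problems.append(f"unknown verdict label: {v.verdict!r}")
    if not 0 <= v.confidence <= 100:
        problems.append(f"confidence out of range: {v.confidence}")
    if v.risk_tier not in RISK_TIERS:
        problems.append(f"unknown risk tier: {v.risk_tier!r}")
    if not 1 <= count_sentences(v.reasoning) <= 3:
        problems.append("reasoning must be 1-3 sentences")
    return problems
```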
706 706  
707 707  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -729,9 +729,9 @@
729 729  
730 730  **Functionality:**
731 731  * AI summarizes findings in 3-5 sentences:
732 - * How many claims found
733 - * Distribution of verdicts
734 - * Overall assessment
732 + * How many claims found
733 + * Distribution of verdicts
734 + * Overall assessment
735 735  * System displays at top of results
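The first two items of this summary (claim count and verdict distribution) are mechanical, so they can be cross-checked against the parsed verdicts. A minimal sketch, assuming verdicts arrive as the four label strings used throughout this document; `verdict_distribution` and `summary_opening` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable

VERDICT_ORDER = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")


def verdict_distribution(verdicts: Iterable[str]) -> Dict[str, int]:
    """Count verdicts per label in a fixed order; reject unknown labels."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICT_ORDER)
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    return {label: counts[label] for label in VERDICT_ORDER}


def summary_opening(verdicts: Iterable[str]) -> str:
    """Build the factual opening of the summary: count plus non-zero buckets."""
    verdicts = list(verdicts)
    if not verdicts:
        return "No claims found."
    dist = verdict_distribution(verdicts)
    parts = [f"{n} {label}" for label, n in dist.items() if n]
    return f"{len(verdicts)} claims found: {', '.join(parts)}."
```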
736 736  
737 737  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -813,8 +813,8 @@
813 813  **Pipeline:**
814 814  {{code}}
815 815  User Input → AKEL Processing → Output Display
816 -
817 - ZERO human editing
816 + ↓
817 + ZERO human editing
818 818  {{/code}}
819 819  
820 820  **If AI output is poor:**
... ... @@ -952,23 +952,23 @@
952 952  
953 953  {{code}}
954 954  1. User submits text or URL
955 -
955 + ↓
956 956  2. Backend receives request
957 -
957 + ↓
958 958  3. If URL: Fetch article text
959 -
959 + ↓
960 960  4. Call Claude API with single prompt:
961 - "Extract claims, evaluate each, provide verdicts"
962 -
961 + "Extract claims, evaluate each, provide verdicts"
962 + ↓
963 963  5. Claude API returns:
964 - - Analysis summary
965 - - Claims list
966 - - Verdicts for each claim (with risk tiers)
967 - - Article summary (optional)
968 - - Quality gate results
969 -
964 + - Analysis summary
965 + - Claims list
966 + - Verdicts for each claim (with risk tiers)
967 + - Article summary (optional)
968 + - Quality gate results
969 + ↓
970 970  6. Backend parses response
971 -
971 + ↓
972 972  7. Frontend displays results with Mode 2 labeling
973 973  {{/code}}
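The seven steps above can be sketched as one backend function. This is a minimal sketch, assuming the model is asked to reply with a single JSON object; `analyze`, `PROMPT_TEMPLATE`, the JSON key names, and the injected `fetch_article` and `call_llm` callables are hypothetical placeholders rather than the POC's actual API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical, heavily abbreviated prompt; the real prompt is the template below.
PROMPT_TEMPLATE = (
    "Extract claims, evaluate each, provide verdicts. "
    "Respond with a single JSON object.\n\nArticle:\n"
)
REQUIRED_KEYS = {"analysis_summary", "claims", "quality_gates"}  # assumed key names
MODE2_LABEL = "AI-GENERATED - POC/DEMO"


def is_url(text: str) -> bool:
    return text.strip().lower().startswith(("http://", "https://"))


def analyze(
    user_input: str,
    fetch_article: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> Dict[str, Any]:
    # Steps 1-2: the backend has received the raw user input.
    text = user_input.strip()
    # Step 3: if the input is a URL, fetch the article text first.
    article = fetch_article(text) if is_url(text) else text
    # Steps 4-5: one prompt, one model call; all processing happens inside it.
    raw = call_llm(PROMPT_TEMPLATE + article)
    # Step 6: parse the response and check that the required sections exist.
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object from the model")
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    # Step 7: attach the Mode 2 label the frontend must display.
    return {**result, "label": MODE2_LABEL}
```

Injecting `fetch_article` and `call_llm` keeps the pipeline testable without network access and leaves provider selection to a separate layer.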
974 974  
... ... @@ -981,36 +981,36 @@
981 981  Task: Analyze this article and provide:
982 982  
983 983  1. Identify the article's main thesis/conclusion
984 - - What is the article trying to argue or prove?
985 - - What is the primary claim or conclusion?
984 + - What is the article trying to argue or prove?
985 + - What is the primary claim or conclusion?
986 986  
987 987  2. Extract 3-5 factual claims from the article
988 - - Note which claims are CENTRAL to the main thesis
989 - - Note which claims are SUPPORTING facts
988 + - Note which claims are CENTRAL to the main thesis
989 + - Note which claims are SUPPORTING facts
990 990  
991 991  3. For each claim:
992 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
993 - - Assign confidence score (0-100%)
994 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
995 - - Write brief reasoning (1-3 sentences)
992 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
993 + - Assign confidence score (0-100%)
994 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
995 + - Write brief reasoning (1-3 sentences)
996 996  
997 997  4. Assess relationship between claims and main thesis:
998 - - Do the claims actually support the article's conclusion?
999 - - Are there logical leaps or unsupported inferences?
1000 - - Is the article's framing misleading even if individual facts are accurate?
998 + - Do the claims actually support the article's conclusion?
999 + - Are there logical leaps or unsupported inferences?
1000 + - Is the article's framing misleading even if individual facts are accurate?
1001 1001  
1002 1002  5. Run quality gates:
1003 - - Check: ≥2 sources found
1004 - - Attempt: Basic contradiction search
1005 - - Calculate: Confidence scores
1006 - - Verify: Structural integrity
1003 + - Check: ≥2 sources found
1004 + - Attempt: Basic contradiction search
1005 + - Calculate: Confidence scores
1006 + - Verify: Structural integrity
1007 1007  
1008 1008  6. Write context-aware analysis summary (4-6 sentences):
1009 - - State article's main thesis
1010 - - Report claims found and verdict distribution
1011 - - Note if central claims are problematic
1012 - - Assess whether evidence supports conclusion
1013 - - Overall credibility considering claim importance
1009 + - State article's main thesis
1010 + - Report claims found and verdict distribution
1011 + - Note if central claims are problematic
1012 + - Assess whether evidence supports conclusion
1013 + - Overall credibility considering claim importance
1014 1014  
1015 1015  7. Write article summary (3-5 sentences: neutral summary of article content)
1016 1016  
... ... @@ -1234,9 +1234,9 @@
1234 1234  === 13.2 Decision Criteria Summary ===
1235 1235  
1236 1236  {{code}}
1237 -AI Quality < 60% → NO-GO (approach doesn't work)
1237 +AI Quality < 60% → NO-GO (approach doesn't work)
1238 1238  AI Quality 60-70% → ITERATE (improve and retry)
1239 -AI Quality ≥70% → GO (proceed to POC2)
1239 +AI Quality ≥70% → GO (proceed to POC2)
1240 1240  {{/code}}
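The decision rule above is a three-way threshold. A minimal sketch, assuming "AI Quality" is already available as a percentage; because the table lists both "60-70%" and "≥70%", the sketch treats exactly 70% as GO and exactly 60% as ITERATE. `poc_decision` is a hypothetical name.

```python
def poc_decision(quality_pct: float) -> str:
    """Map an AI quality score (percent) to the POC go/no-go outcome."""
    if not 0 <= quality_pct <= 100:
        raise ValueError(f"quality must be a percentage, got {quality_pct}")
    if quality_pct < 60:
        return "NO-GO"    # approach doesn't work
    if quality_pct < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2
```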
1241 1241  
1242 1242  == 14. Key Risks & Mitigations ==
... ... @@ -1243,8 +1243,8 @@
1243 1243  
1244 1244  === 14.1 Risk: AI Quality Not Good Enough ===
1245 1245  
1246 -**Likelihood:** Medium-High
1247 -**Impact:** POC fails
1246 +**Likelihood:** Medium-High
1247 +**Impact:** POC fails
1248 1248  
1249 1249  **Mitigation:**
1250 1250  * Extensive prompt engineering and testing
... ... @@ -1256,8 +1256,8 @@
1256 1256  
1257 1257  === 14.2 Risk: AI Consistency Issues ===
1258 1258  
1259 -**Likelihood:** Medium
1260 -**Impact:** Works sometimes, fails other times
1259 +**Likelihood:** Medium
1260 +**Impact:** Works sometimes, fails other times
1261 1261  
1262 1262  **Mitigation:**
1263 1263  * Test with 10+ diverse articles
... ... @@ -1268,8 +1268,8 @@
1268 1268  
1269 1269  === 14.3 Risk: Output Incomprehensible ===
1270 1270  
1271 -**Likelihood:** Low-Medium
1272 -**Impact:** Users can't understand analysis
1271 +**Likelihood:** Low-Medium
1272 +**Impact:** Users can't understand analysis
1273 1273  
1274 1274  **Mitigation:**
1275 1275  * Create clear explainer document
... ... @@ -1281,8 +1281,8 @@
1281 1281  
1282 1282  === 14.4 Risk: API Rate Limits / Costs ===
1283 1283  
1284 -**Likelihood:** Low
1285 -**Impact:** System slow or expensive
1284 +**Likelihood:** Low
1285 +**Impact:** System slow or expensive
1286 1286  
1287 1287  **Mitigation:**
1288 1288  * Monitor API usage
... ... @@ -1293,8 +1293,8 @@
1293 1293  
1294 1294  === 14.5 Risk: Scope Creep ===
1295 1295  
1296 -**Likelihood:** Medium
1297 -**Impact:** POC becomes too complex
1296 +**Likelihood:** Medium
1297 +**Impact:** POC becomes too complex
1298 1298  
1299 1299  **Mitigation:**
1300 1300  * Strict scope discipline
... ... @@ -1336,18 +1336,18 @@
1336 1336  
1337 1337  === 15.2 What POC Is ===
1338 1338  
1339 -✅ Testing AI capability without humans
1340 -✅ Proving core technical concept
1341 -✅ Fast validation of approach
1342 -✅ Honest assessment of feasibility
1339 +✅ Testing AI capability without humans
1340 +✅ Proving core technical concept
1341 +✅ Fast validation of approach
1342 +✅ Honest assessment of feasibility
1343 1343  
1344 1344  === 15.3 What POC Is NOT ===
1345 1345  
1346 -❌ Building a product
1347 -❌ Production-ready system
1348 -❌ Feature-complete platform
1349 -❌ Perfectly accurate analysis
1350 -❌ Polished user experience
1346 +❌ Building a product
1347 +❌ Production-ready system
1348 +❌ Feature-complete platform
1349 +❌ Perfectly accurate analysis
1350 +❌ Polished user experience
1351 1351  
1352 1352  == 16. Success = Clear Path Forward ==
1353 1353  
... ... @@ -1377,3 +1377,52 @@
1377 1377  
1378 1378  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1379 1379  
1380 +
1381 +=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
1382 +
1383 +**Requirement:** POC1 MUST implement an LLM abstraction layer that supports multiple providers.
1384 +
1385 +**POC1 Implementation:**
1386 +
1387 +* **Primary Provider:** Anthropic Claude API
1388 + * Stage 1: Claude Haiku 4
1389 + * Stage 2: Claude Sonnet 3.5 (cached)
1390 + * Stage 3: Claude Sonnet 3.5
1391 +
1392 +* **Provider Interface:** An abstract LLMProvider interface is implemented
1393 +
1394 +* **Configuration:** Environment variables select the provider and the per-stage models
1395 + * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1396 + * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1397 + * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1398 +
1399 +* **Failover:** Basic error handling with cache fallback for Stage 2
1400 +
1401 +* **Cost Tracking:** Log provider name and cost per request
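The POC1 configuration above can be sketched as an abstract provider class plus an environment-driven registry. This is a minimal sketch under stated assumptions: `LLMResponse`, `register`, `call_llm`, and the `FakeProvider` test stand-in are hypothetical, the Anthropic adapter is left as a stub rather than guessing at SDK calls, and cost is whatever the adapter reports.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Mapping, Type

log = logging.getLogger("llm")  # hypothetical logger name


@dataclass
class LLMResponse:
    text: str
    cost_usd: float  # cost as reported or estimated by the adapter


class LLMProvider(ABC):
    """Abstract provider interface; every LLM call goes through a subclass."""
    name = "abstract"

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResponse:
        """Send one prompt to `model` and return the text plus its cost."""


PROVIDERS: Dict[str, Type[LLMProvider]] = {}


def register(cls: Type[LLMProvider]) -> Type[LLMProvider]:
    PROVIDERS[cls.name] = cls
    return cls


@register
class AnthropicProvider(LLMProvider):
    name = "anthropic"

    def complete(self, model: str, prompt: str) -> LLMResponse:
        # Stub: wire the Anthropic SDK call and pricing lookup in here.
        raise NotImplementedError("Anthropic adapter not wired up in this sketch")


@register
class FakeProvider(LLMProvider):
    """Offline stand-in for tests; echoes the model and prompt at zero cost."""
    name = "fake"

    def complete(self, model: str, prompt: str) -> LLMResponse:
        return LLMResponse(text=f"[{model}] {prompt}", cost_usd=0.0)


def provider_from_env(env: Mapping[str, str] = os.environ) -> LLMProvider:
    name = env.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if name not in PROVIDERS:
        raise ValueError(f"unknown LLM provider: {name!r}")
    return PROVIDERS[name]()


def model_for_stage(stage: int, env: Mapping[str, str] = os.environ) -> str:
    key = f"LLM_STAGE{stage}_MODEL"
    if key not in env:
        raise KeyError(f"{key} is not set")
    return env[key]


def call_llm(stage: int, prompt: str, env: Mapping[str, str] = os.environ) -> LLMResponse:
    """Single entry point for all LLM calls: resolve provider and model, call, log cost."""
    provider = provider_from_env(env)
    model = model_for_stage(stage, env)
    response = provider.complete(model, prompt)
    log.info("provider=%s model=%s cost_usd=%.4f", provider.name, model, response.cost_usd)
    return response
```

Because `call_llm` is the only function that touches a provider, switching providers means registering another subclass and changing `LLM_PRIMARY_PROVIDER`, with no change to calling code.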
1402 +
1403 +**Future (POC2/Beta):**
1404 +
1405 +* Secondary provider (OpenAI) with automatic failover
1406 +* Admin API for runtime provider switching
1407 +* Cost comparison dashboard
1408 +* Cross-provider output verification
1409 +
1410 +**Success Criteria:**
1411 +
1412 +* All LLM calls go through the abstraction layer (no direct API calls)
1413 +* Provider can be changed via environment variable without code changes
1414 +* Cost tracking includes provider name in logs
1415 +* Stage 2 falls back to cache on provider failure
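One way to meet the Stage 2 fallback criterion is a small cache wrapper around the provider call. The sketch below assumes results are cached per prompt with a time-to-live, served directly while fresh, and served stale only when the provider fails; `CachedStage2`, `ProviderError`, and the TTL policy are hypothetical choices, not taken from the specification.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class ProviderError(Exception):
    """Hypothetical: raised by a provider adapter when the API call fails."""


class CachedStage2:
    def __init__(
        self,
        call: Callable[[str], str],
        ttl_seconds: float = 86400.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._call = call  # should itself go through the abstraction layer
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, result)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def run(self, prompt: str) -> Tuple[str, str]:
        """Return (result, origin); origin is 'cache', 'provider' or 'stale-cache'."""
        key = self._key(prompt)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], "cache"
        try:
            result = self._call(prompt)
        except ProviderError:
            # Provider failed: fall back to the last cached result if one exists.
            if entry is not None:
                return entry[1], "stale-cache"
            raise
        self._cache[key] = (now, result)
        return result, "provider"
```

Returning the origin alongside the result lets the display or logs show when a stale cached answer was served.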
1416 +
1417 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1418 +
1419 +**Dependencies:**
1420 +* NFR-14 (Main Requirements)
1421 +* Design Decision 9
1422 +* Architecture Section 2.2
1423 +
1424 +**Priority:** HIGH (P1)
1425 +
1426 +**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
1427 +
1428 +