Last modified by Robert Schaub on 2026/02/08 08:26

From version 2.1
edited by Robert Schaub
on 2025/12/24 21:53
Change comment: Imported from XAR
To version 1.1
edited by Robert Schaub
on 2025/12/19 16:13
Change comment: Imported from XAR

Details

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -POC Requirements (POC1 & POC2)
1 +POC Requirements
Content
... ... @@ -1,18 +1,11 @@
1 1  = POC Requirements =
2 2  
3 -
4 -{{info}}
5 -**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
6 -
7 -See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 -{{/info}}
9 -
10 -
11 -
12 -**Status:** ✅ Approved for Development
13 -**Version:** 2.0 (Updated after Specification Cross-Check)
3 +**Status:** ✅ Approved for Development
4 +**Version:** 2.0 (Updated after Specification Cross-Check)
14 14  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15 15  
7 +---
8 +
16 16  == 1. POC Overview ==
17 17  
18 18  === 1.1 What POC Tests ===
... ... @@ -33,6 +33,8 @@
33 33  * Perfect accuracy
34 34  * Complete feature set
35 35  
29 +---
30 +
36 36  === 1.2 Scenarios Deferred to POC2 ===
37 37  
38 38  **Intentional Simplification:**
... ... @@ -66,65 +66,33 @@
66 66  Claims → Verdicts (scenarios implicit in reasoning)
67 67  {{/code}}
68 68  
64 +---
65 +
69 69  == 2. POC Output Specification ==
70 70  
71 -=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
68 +=== 2.1 Component 1: ANALYSIS SUMMARY ===
72 72  
73 -**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
70 +**What:** Brief overview of findings
71 +**Length:** 3-5 sentences
72 +**Content:**
73 +* How many claims found
74 +* Distribution of verdicts
75 +* Overall assessment
74 74  
75 -**Length:** 4-6 sentences
76 -
77 -**Content (Required Elements):**
78 -1. **Article's main thesis/claim** - What is the article trying to argue or prove?
79 -2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
80 -3. **Central vs. supporting claims** - Which claims are central to the article's argument?
81 -4. **Relationship assessment** - Do the claims support the article's conclusion?
82 -5. **Overall credibility** - Final assessment considering claim importance
83 -
84 -**Critical Innovation:**
85 -
86 -POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
87 -* Make accurate supporting facts but draw unsupported conclusions
88 -* Have one false central claim that invalidates the whole argument
89 -* Misframe accurate information to mislead
90 -
91 -**Good Example (Context-Aware):**
77 +**Example:**
92 92  {{code}}
93 -This article argues that coffee cures cancer based on its antioxidant
94 -content. We analyzed 3 factual claims: 2 about coffee's chemical
95 -properties are well-supported, but the main causal claim is refuted
96 -by current evidence. The article confuses correlation with causation.
97 -Overall assessment: MISLEADING - makes an unsupported medical claim
98 -despite citing some accurate facts.
79 +This article makes 4 claims about coffee's health effects. We found
80 +2 claims are well-supported, 1 is uncertain, and 1 is refuted.
81 +Overall assessment: mostly accurate with some exaggeration.
99 99  {{/code}}
100 100  
101 -**Poor Example (Simple Aggregation - Don't Do This):**
102 -{{code}}
103 -This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 -Overall assessment: mostly accurate (67% accurate).
105 -{{/code}}
106 -↑ This misses that the refuted claim IS the article's main point!
84 +---
107 107  
108 -**What POC1 Tests:**
109 -
110 -Can AI identify and assess:
111 -* ✅ The article's main thesis/conclusion?
112 -* ✅ Which claims are central vs. supporting?
113 -* ✅ Whether the evidence supports the conclusion?
114 -* ✅ Overall credibility considering logical structure?
115 -
116 -**If AI Cannot Do This:**
117 -
118 -That's valuable to learn in POC1! We'll:
119 -* Note as limitation
120 -* Fall back to simple aggregation with warning
121 -* Design explicit article-level analysis for POC2
122 -
123 123  === 2.2 Component 2: CLAIMS IDENTIFICATION ===
124 124  
125 -**What:** List of factual claims extracted from article
126 -**Format:** Numbered list
127 -**Quantity:** 3-5 claims
88 +**What:** List of factual claims extracted from article
89 +**Format:** Numbered list
90 +**Quantity:** 3-5 claims
128 128  **Requirements:**
129 129  * Factual claims only (not opinions/questions)
130 130  * Clearly stated
... ... @@ -140,10 +140,12 @@
140 140  [4] Coffee prevents Alzheimer's completely
141 141  {{/code}}
142 142  
106 +---
107 +
143 143  === 2.3 Component 3: CLAIMS VERDICTS ===
144 144  
145 -**What:** Verdict for each claim identified
146 -**Format:** Per claim structure
110 +**What:** Verdict for each claim identified
111 +**Format:** Per claim structure
147 147  
148 148  **Required Elements:**
149 149  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
... ... @@ -170,15 +170,17 @@
170 170  
171 171  **Risk Tier Display:**
172 172  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 -* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
138 +* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 174  * **Tier C (Green):** Low Risk - Facts/Definitions/History
175 175  
176 176  **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
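
The three tiers listed above can be modelled as a small lookup. This is only a sketch: the class name `RiskTier`, its attribute names, and the exact label string are illustrative assumptions, while the colors, risk levels, and topic lists are taken from the tier list.

```python
from enum import Enum


class RiskTier(Enum):
    """Risk tiers as listed in the Risk Tier Display (names here are hypothetical)."""

    A = ("Red", "High Risk", ("Medical", "Legal", "Safety", "Elections"))
    B = ("Yellow", "Medium Risk", ("Policy", "Science", "Causality"))
    C = ("Green", "Low Risk", ("Facts", "Definitions", "History"))

    def __init__(self, color: str, level: str, topics: tuple):
        # Unpack the tuple value into readable attributes.
        self.color = color
        self.level = level
        self.topics = topics

    def label(self) -> str:
        # Assumed display format; the POC may render this differently.
        return f"[Risk: {self.name}] {self.level} ({'/'.join(self.topics)})"
```

In the POC the tier is only displayed; per the note above, workflow routing based on the tier belongs to the full system.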
177 177  
143 +---
144 +
178 178  === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
179 179  
180 -**What:** Brief summary of original article content
181 -**Length:** 3-5 sentences
147 +**What:** Brief summary of original article content
148 +**Length:** 3-5 sentences
182 182  **Tone:** Neutral (article's position, not FactHarbor's analysis)
183 183  
184 184  **Example:**
... ... @@ -190,60 +190,17 @@
190 190  to disease prevention. Recommends 2-3 cups daily for optimal health.
191 191  {{/code}}
192 192  
193 -=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
160 +---
194 194  
195 -**What:** LLM usage metrics for cost optimization and scaling decisions
162 +=== 2.5 Total Output Size ===
196 196  
197 -**Purpose:**
198 -* Understand cost per analysis
199 -* Identify optimization opportunities
200 -* Project costs at scale
201 -* Inform architecture decisions
202 -
203 -**Display Format:**
204 -{{code}}
205 -USAGE STATISTICS:
206 -• Article: 2,450 words (12,300 characters)
207 -• Input tokens: 15,234
208 -• Output tokens: 892
209 -• Total tokens: 16,126
210 -• Estimated cost: $0.24 USD
211 -• Response time: 8.3 seconds
212 -• Cost per claim: $0.048
213 -• Model: claude-sonnet-4-20250514
214 -{{/code}}
215 -
216 -**Why This Matters:**
217 -
218 -At scale, LLM costs are critical:
219 -* 10,000 articles/month ≈ $200-500/month
220 -* 100,000 articles/month ≈ $2,000-5,000/month
221 -* Cost optimization can reduce expenses 30-50%
222 -
223 -**What POC1 Learns:**
224 -* How cost scales with article length
225 -* Prompt optimization opportunities (caching, compression)
226 -* Output verbosity tradeoffs
227 -* Model selection strategy (FAST vs. REASONING roles)
228 -* Article length limits (if needed)
229 -
230 -**Implementation:**
231 -* Claude API already returns usage data
232 -* No extra API calls needed
233 -* Display to user + log for aggregate analysis
234 -* Test with articles of varying lengths
235 -
236 -**Critical for GO/NO-GO:** Unit economics must be viable at scale!
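
The usage figures in this component can be derived from the token counts with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: `UsageStats`, `cost_per_claim`, and `monthly_cost_usd` are hypothetical names, and the per-million-token prices are parameters the caller must supply from current provider pricing (no prices are assumed here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageStats:
    """Token counts for one analysis (hypothetical container, not an API type)."""

    input_tokens: int
    output_tokens: int
    claims_extracted: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def estimated_cost_usd(self, input_usd_per_mtok: float,
                           output_usd_per_mtok: float) -> float:
        # Prices are per million tokens and must come from the caller.
        weighted = (self.input_tokens * input_usd_per_mtok
                    + self.output_tokens * output_usd_per_mtok)
        return weighted / 1_000_000


def cost_per_claim(total_cost_usd: float, claims_extracted: int) -> float:
    if claims_extracted <= 0:
        raise ValueError("at least one claim is required")
    return total_cost_usd / claims_extracted


def monthly_cost_usd(cost_per_analysis_usd: float, articles_per_month: int) -> float:
    # Simple linear projection; real scaling may not be linear.
    return cost_per_analysis_usd * articles_per_month
```

With the display example above, 15,234 input tokens plus 892 output tokens give the stated total of 16,126, and $0.24 spread over 5 claims gives the stated $0.048 per claim.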
237 -
238 -=== 2.6 Total Output Size ===
239 -
240 -**Combined:** ~220-350 words
241 -* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
164 +**Combined:** ~200-300 words
165 +* Analysis Summary: 50-70 words
242 242  * Claims Identification: 30-50 words
243 243  * Claims Verdicts: 100-150 words
244 244  * Article Summary: 30-50 words (optional)
245 245  
246 -**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
170 +---
247 247  
248 248  == 3. What's NOT in POC Scope ==
249 249  
... ... @@ -291,6 +291,8 @@
291 291  * ❌ Analytics
292 292  * ❌ A/B testing
293 293  
218 +---
219 +
294 294  == 4. POC Simplifications vs. Full System ==
295 295  
296 296  === 4.1 Architecture Comparison ===
... ... @@ -298,7 +298,7 @@
298 298  **POC Architecture (Simplified):**
299 299  {{code}}
300 300  User Input → Single AKEL Call → Output Display
301 - (all processing)
227 + (all processing)
302 302  {{/code}}
303 303  
304 304  **Full System Architecture:**
... ... @@ -319,6 +319,8 @@
319 319  |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
320 320  |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
321 321  
248 +---
249 +
322 322  === 4.2 Workflow Comparison ===
323 323  
324 324  **POC1 Workflow:**
... ... @@ -336,6 +336,8 @@
336 336  6. **Time Evolution** (versioning, re-evaluation triggers)
337 337  **Total: 6 phases with quality gates, ~10-30 seconds**
338 338  
267 +---
268 +
339 339  === 4.3 Why POC is Simplified ===
340 340  
341 341  **Engineering Rationale:**
... ... @@ -354,6 +354,8 @@
354 354  * ❌ POC doesn't validate scale (test in Beta)
355 355  * ❌ POC doesn't validate scenario architecture (design in POC2)
356 356  
287 +---
288 +
357 357  === 4.4 Gap Between POC1 and POC2/Beta ===
358 358  
359 359  **What needs to be built for POC2:**
... ... @@ -373,6 +373,8 @@
373 373  
374 374  **POC1 → POC2 is a significant architectural expansion.**
375 375  
308 +---
309 +
376 376  == 5. Publication Mode & Labeling ==
377 377  
378 378  === 5.1 POC Publication Mode ===
... ... @@ -386,20 +386,22 @@
386 386  * All quality gates active (simplified)
387 387  * Risk tier classification shown (demo)
388 388  
323 +---
324 +
389 389  === 5.2 User-Facing Labels ===
390 390  
391 391  **Primary Label (top of analysis):**
392 392  {{code}}
393 393  ╔════════════════════════════════════════════════════════════╗
394 -║ [AI-GENERATED - POC/DEMO] ║
395 -║ ║
396 -║ This analysis was produced entirely by AI and has not ║
397 -║ been human-reviewed. Use for demonstration purposes. ║
398 -║ ║
399 -║ Source: AI/AKEL v1.0 (POC) ║
400 -║ Review Status: Not Reviewed (Proof-of-Concept) ║
401 -║ Quality Gates: 4/4 Passed (Simplified) ║
402 -║ Last Updated: [timestamp] ║
330 +║ [AI-GENERATED - POC/DEMO]
331 +║
332 +║ This analysis was produced entirely by AI and has not
333 +║ been human-reviewed. Use for demonstration purposes.
334 +║
335 +║ Source: AI/AKEL v1.0 (POC)
336 +║ Review Status: Not Reviewed (Proof-of-Concept)
337 +║ Quality Gates: 4/4 Passed (Simplified)
338 +║ Last Updated: [timestamp]
403 403  ╚════════════════════════════════════════════════════════════╝
404 404  {{/code}}
405 405  
... ... @@ -408,6 +408,8 @@
408 408  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
409 409  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
410 410  
347 +---
348 +
411 411  === 5.3 Display Requirements ===
412 412  
413 413  **Must Show:**
... ... @@ -425,6 +425,8 @@
425 425  * Authoritative verdicts
426 426  * Complete accuracy
427 427  
366 +---
367 +
428 428  === 5.4 Mode 2 vs. Full System Publication ===
429 429  
430 430  |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
... ... @@ -435,6 +435,8 @@
435 435  |Risk Display|Demo only|Workflow-integrated|Validated
436 436  |User Actions|View only|Flag for review|Trust rating
437 437  
378 +---
379 +
438 438  == 6. Quality Gates (Simplified Implementation) ==
439 439  
440 440  === 6.1 Overview ===
... ... @@ -453,6 +453,8 @@
453 453  * Failures displayed to user (not blocking)
454 454  * Full system has comprehensive validation
455 455  
398 +---
399 +
456 456  === 6.2 Gate 1: Source Quality (Basic) ===
457 457  
458 458  **Full System Requirements:**
... ... @@ -473,6 +473,8 @@
473 473  
474 474  **Failure Handling:** Display error message, don't generate verdict
475 475  
420 +---
421 +
476 476  === 6.3 Gate 2: Contradiction Search (Basic) ===
477 477  
478 478  **Full System Requirements:**
... ... @@ -495,6 +495,8 @@
495 495  
496 496  **Failure Handling:** Note "limited contradiction search" in output
497 497  
444 +---
445 +
498 498  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
499 499  
500 500  **Full System Requirements:**
... ... @@ -515,6 +515,8 @@
515 515  
516 516  **Failure Handling:** Show "Confidence: Unknown" if calculation fails
517 517  
466 +---
467 +
518 518  === 6.5 Gate 4: Structural Integrity (Basic) ===
519 519  
520 520  **Full System Requirements:**
... ... @@ -535,6 +535,8 @@
535 535  
536 536  **Failure Handling:** Display error message
537 537  
488 +---
489 +
538 538  === 6.6 Quality Gate Display ===
539 539  
540 540  **POC shows simplified status:**
... ... @@ -557,6 +557,8 @@
557 557  Note: This analysis has limited evidence. Use with caution.
558 558  {{/code}}
559 559  
512 +---
513 +
560 560  === 6.7 Simplified vs. Full System ===
561 561  
562 562  |=Gate|=POC (Simplified)|=Full System
... ... @@ -567,12 +567,14 @@
567 567  
568 568  **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
569 569  
524 +---
525 +
570 570  == 7. AKEL Architecture Comparison ==
571 571  
572 572  === 7.1 POC AKEL (Simplified) ===
573 573  
574 574  **Implementation:**
575 -* Single provider API call (REASONING model)
531 +* Single Claude API call (Sonnet 4.5)
576 576  * One comprehensive prompt
577 577  * All processing in single request
578 578  * No separate components
... ... @@ -584,10 +584,10 @@
584 584  
585 585  1. Extract 3-5 factual claims
586 586  2. For each claim:
587 - - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
588 - - Assign confidence score (0-100%)
589 - - Assign risk tier (A/B/C)
590 - - Write brief reasoning (1-3 sentences)
543 + - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
544 + - Assign confidence score (0-100%)
545 + - Assign risk tier (A/B/C)
546 + - Write brief reasoning (1-3 sentences)
591 591  3. Generate analysis summary (3-5 sentences)
592 592  4. Generate article summary (3-5 sentences)
593 593  5. Run basic quality checks
... ... @@ -597,6 +597,8 @@
597 597  
598 598  **Processing Time:** 10-18 seconds (estimate)
599 599  
556 +---
557 +
600 600  === 7.2 Full System AKEL (Production) ===
601 601  
602 602  **Architecture:**
... ... @@ -621,6 +621,8 @@
621 621  
622 622  **Processing Time:** 10-30 seconds (full pipeline)
623 623  
582 +---
583 +
624 624  === 7.3 Why POC Uses Single Call ===
625 625  
626 626  **Advantages:**
... ... @@ -643,6 +643,8 @@
643 643  
644 644  Full component architecture comes in Beta after POC validates concept.
645 645  
606 +---
607 +
646 646  === 7.4 Evolution Path ===
647 647  
648 648  **POC1:** Single prompt → Prove concept
... ... @@ -650,6 +650,8 @@
650 650  **Beta:** Multi-component AKEL → Production architecture
651 651  **Release 1.0:** Full AKEL + Federation → Scale
652 652  
615 +---
616 +
653 653  == 8. Functional Requirements ==
654 654  
655 655  === FR-POC-1: Article Input ===
... ... @@ -673,6 +673,8 @@
673 673  * User can paste URL of article
674 674  * System accepts input and triggers analysis
675 675  
640 +---
641 +
676 676  === FR-POC-2: Claim Extraction (Fully Automated) ===
677 677  
678 678  **Requirement:** AI automatically extracts 3-5 factual claims
... ... @@ -700,6 +700,8 @@
700 700  * Claims are clearly stated
701 701  * No manual editing required
702 702  
669 +---
670 +
703 703  === FR-POC-3: Verdict Generation (Fully Automated) ===
704 704  
705 705  **Requirement:** AI automatically generates verdict for each claim
... ... @@ -706,11 +706,11 @@
706 706  
707 707  **Functionality:**
708 708  * For each claim, AI:
709 - * Evaluates claim based on available evidence/knowledge
710 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
711 - * Assigns confidence score (0-100%)
712 - * Assigns risk tier (A/B/C)
713 - * Writes brief reasoning (1-3 sentences)
677 + * Evaluates claim based on available evidence/knowledge
678 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
679 + * Assigns confidence score (0-100%)
680 + * Assigns risk tier (A/B/C)
681 + * Writes brief reasoning (1-3 sentences)
714 714  * System displays verdict for each claim
715 715  
716 716  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -732,6 +732,8 @@
732 732  * Verdict is defensible given reasoning
733 733  * All generated automatically by AI
734 734  
703 +---
704 +
735 735  === FR-POC-4: Analysis Summary (Fully Automated) ===
736 736  
737 737  **Requirement:** AI generates brief summary of analysis
... ... @@ -738,9 +738,9 @@
738 738  
739 739  **Functionality:**
740 740  * AI summarizes findings in 3-5 sentences:
741 - * How many claims found
742 - * Distribution of verdicts
743 - * Overall assessment
711 + * How many claims found
712 + * Distribution of verdicts
713 + * Overall assessment
744 744  * System displays at top of results
745 745  
746 746  **Critical:** NO MANUAL EDITING ALLOWED
... ... @@ -751,6 +751,8 @@
751 751  * 3-5 sentences
752 752  * Automatically generated
753 753  
724 +---
725 +
754 754  === FR-POC-5: Article Summary (Fully Automated, Optional) ===
755 755  
756 756  **Requirement:** AI generates brief summary of original article
... ... @@ -770,6 +770,8 @@
770 770  * 3-5 sentences
771 771  * Automatically generated
772 772  
745 +---
746 +
773 773  === FR-POC-6: Publication Mode Display ===
774 774  
775 775  **Requirement:** Clear labeling of AI-generated content
... ... @@ -787,6 +787,8 @@
787 787  * Risk tiers are color-coded
788 788  * Quality gate status is visible
789 789  
764 +---
765 +
790 790  === FR-POC-7: Quality Gate Execution ===
791 791  
792 792  **Requirement:** Execute simplified quality gates
... ... @@ -804,6 +804,8 @@
804 804  * Failures explained to user
805 805  * Gates don't block publication (POC mode)
806 806  
783 +---
784 +
807 807  == 9. Non-Functional Requirements ==
808 808  
809 809  === NFR-POC-1: Fully Automated Processing ===
... ... @@ -822,8 +822,8 @@
822 822  **Pipeline:**
823 823  {{code}}
824 824  User Input → AKEL Processing → Output Display
825 - ↓
826 - ZERO human editing
803 +
804 + ZERO human editing
827 827  {{/code}}
828 828  
829 829  **If AI output is poor:**
... ... @@ -837,6 +837,8 @@
837 837  * Validates scalability (humans can't review every analysis)
838 838  * Honest test of technical feasibility
839 839  
818 +---
819 +
840 840  === NFR-POC-2: Performance ===
841 841  
842 842  **Requirement:** Analysis completes in reasonable time
... ... @@ -856,6 +856,8 @@
856 856  * User sees loading indicator
857 857  * No timeout errors
858 858  
839 +---
840 +
859 859  === NFR-POC-3: Reliability ===
860 860  
861 861  **Requirement:** System works for manual testing sessions
... ... @@ -875,6 +875,8 @@
875 875  * Errors are handled gracefully
876 876  * User receives clear error messages
877 877  
860 +---
861 +
878 878  === NFR-POC-4: Environment ===
879 879  
880 880  **Requirement:** Runs on simple infrastructure
... ... @@ -892,48 +892,8 @@
892 892  * Auto-scaling
893 893  * Disaster recovery
894 894  
895 -=== NFR-POC-5: Cost Efficiency Tracking ===
879 +---
896 896  
897 -**Requirement:** Track and display LLM usage metrics to inform optimization decisions
898 -
899 -**Must Track:**
900 -* Input tokens (article + prompt)
901 -* Output tokens (generated analysis)
902 -* Total tokens
903 -* Estimated cost (USD)
904 -* Response time (seconds)
905 -* Article length (words/characters)
906 -
907 -**Must Display:**
908 -* Usage statistics in UI (Component 5)
909 -* Cost per analysis
910 -* Cost per claim extracted
911 -
912 -**Must Log:**
913 -* Aggregate metrics for analysis
914 -* Cost distribution by article length
915 -* Token efficiency trends
916 -
917 -**Purpose:**
918 -* Understand unit economics
919 -* Identify optimization opportunities
920 -* Project costs at scale
921 -* Inform architecture decisions (caching, model selection, etc.)
922 -
923 -**Acceptance Criteria:**
924 -* ✅ Usage data displayed after each analysis
925 -* ✅ Metrics logged for aggregate analysis
926 -* ✅ Cost calculated accurately (Claude API pricing)
927 -* ✅ Test cases include varying article lengths
928 -* ✅ POC1 report includes cost analysis section
929 -
930 -**Success Target:**
931 -* Average cost per analysis < $0.05 USD
932 -* Cost scaling behavior understood (linear/exponential)
933 -* 2+ optimization opportunities identified
934 -
935 -**Critical:** Unit economics must be viable for scaling decision!
936 -
937 937  == 10. Technical Architecture ==
938 938  
939 939  === 10.1 System Components ===
... ... @@ -945,7 +945,7 @@
945 945  
946 946  **Backend:**
947 947  * Single API endpoint
948 -* Calls provider API (REASONING model; configured via LLM abstraction)
892 +* Calls Claude API (Sonnet 4.5 or latest)
949 949  * Parses response
950 950  * Returns JSON to frontend
951 951  
... ... @@ -957,32 +957,36 @@
957 957  * Claude API (Anthropic) - required
958 958  * Optional: URL fetch service for article text extraction
959 959  
904 +---
905 +
960 960  === 10.2 Processing Flow ===
961 961  
962 962  {{code}}
963 963  1. User submits text or URL
964 - ↓
910 +
965 965  2. Backend receives request
966 - ↓
912 +
967 967  3. If URL: Fetch article text
968 - ↓
914 +
969 969  4. Call Claude API with single prompt:
970 - "Extract claims, evaluate each, provide verdicts"
971 - ↓
916 + "Extract claims, evaluate each, provide verdicts"
917 +
972 972  5. Claude API returns:
973 - - Analysis summary
974 - - Claims list
975 - - Verdicts for each claim (with risk tiers)
976 - - Article summary (optional)
977 - - Quality gate results
978 - ↓
919 + - Analysis summary
920 + - Claims list
921 + - Verdicts for each claim (with risk tiers)
922 + - Article summary (optional)
923 + - Quality gate results
924 +
979 979  6. Backend parses response
980 - ↓
926 +
981 981  7. Frontend displays results with Mode 2 labeling
982 982  {{/code}}
983 983  
984 984  **Key Simplification:** Single API call does entire analysis
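
The flow above can be sketched as one backend function. This is a minimal illustration, not the POC implementation: `analyze`, `fetch_article`, and `call_model` are hypothetical names, the URL check is a simple prefix test, and the model call is injected as a plain function so that no real API signature is assumed.

```python
import json
from typing import Callable

# Prompt text quoted from step 4 of the flow; the real prompt is longer.
PROMPT = "Extract claims, evaluate each, provide verdicts"


def analyze(user_input: str,
            fetch_article: Callable[[str], str],
            call_model: Callable[[str], str]) -> dict:
    """Run steps 1-7 of the processing flow with a single model call."""
    # Step 3: only fetch when the input looks like a URL (assumed heuristic).
    is_url = user_input.startswith(("http://", "https://"))
    article_text = fetch_article(user_input) if is_url else user_input

    # Steps 4-5: one call does the entire analysis and returns JSON text.
    raw_response = call_model(f"{PROMPT}\n\n{article_text}")

    # Step 6: parse the response.
    result = json.loads(raw_response)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object from the model")

    # Step 7: attach the Mode 2 label for the frontend (key name is assumed).
    result["label"] = "AI-GENERATED - POC/DEMO"
    return result
```

Passing the fetcher and model call as arguments keeps the sketch testable with stubs and leaves the choice of HTTP client and SDK open.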
985 985  
932 +---
933 +
986 986  === 10.3 AI Prompt Strategy ===
987 987  
988 988  **Single Comprehensive Prompt:**
... ... @@ -989,49 +989,27 @@
989 989  {{code}}
990 990  Task: Analyze this article and provide:
991 991  
992 -1. Identify the article's main thesis/conclusion
993 - - What is the article trying to argue or prove?
994 - - What is the primary claim or conclusion?
940 +1. Extract 3-5 factual claims from the article
941 +2. For each claim:
942 + - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 + - Assign confidence score (0-100%)
944 + - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 + - Write brief reasoning (1-3 sentences)
946 +3. Run quality gates:
947 + - Check: ≥2 sources found
948 + - Attempt: Basic contradiction search
949 + - Calculate: Confidence scores
950 + - Verify: Structural integrity
951 +4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 +5. Write article summary (3-5 sentences: neutral summary of article content)
995 995  
996 -2. Extract 3-5 factual claims from the article
997 - - Note which claims are CENTRAL to the main thesis
998 - - Note which claims are SUPPORTING facts
999 -
1000 -3. For each claim:
1001 - - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
1002 - - Assign confidence score (0-100%)
1003 - - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1004 - - Write brief reasoning (1-3 sentences)
1005 -
1006 -4. Assess relationship between claims and main thesis:
1007 - - Do the claims actually support the article's conclusion?
1008 - - Are there logical leaps or unsupported inferences?
1009 - - Is the article's framing misleading even if individual facts are accurate?
1010 -
1011 -5. Run quality gates:
1012 - - Check: ≥2 sources found
1013 - - Attempt: Basic contradiction search
1014 - - Calculate: Confidence scores
1015 - - Verify: Structural integrity
1016 -
1017 -6. Write context-aware analysis summary (4-6 sentences):
1018 - - State article's main thesis
1019 - - Report claims found and verdict distribution
1020 - - Note if central claims are problematic
1021 - - Assess whether evidence supports conclusion
1022 - - Overall credibility considering claim importance
1023 -
1024 -7. Write article summary (3-5 sentences: neutral summary of article content)
1025 -
1026 1026  Return as structured JSON with quality gate results.
1027 1027  {{/code}}
1028 1028  
1029 1029  **One prompt generates everything.**
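
Because the prompt asks for structured JSON, the backend can check the reply against the constraints the prompt states. The sketch below assumes a reply shape that the specification does not define: the keys `claims`, `verdict`, `confidence`, and `risk_tier` and the function name `validate_analysis` are hypothetical.

```python
import json

# Allowed values taken from the prompt text.
VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A", "B", "C"}


def validate_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the prompt's stated limits."""
    data = json.loads(raw)
    claims = data["claims"]  # hypothetical key name
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        if claim["verdict"] not in VERDICTS:
            raise ValueError(f"unknown verdict: {claim['verdict']!r}")
        if claim["risk_tier"] not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {claim['risk_tier']!r}")
        if not 0 <= claim["confidence"] <= 100:
            raise ValueError("confidence must be between 0 and 100")
    return data
```

A failed check maps naturally onto the structural integrity gate: the backend can show an error message instead of a malformed analysis.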
1030 1030  
1031 -**Critical Addition:**
959 +---
1032 1032  
1033 -Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
1034 -
1035 1035  === 10.4 Technology Stack Suggestions ===
1036 1036  
1037 1037  **Frontend:**
... ... @@ -1046,7 +1046,7 @@
1046 1046  
1047 1047  **AKEL Integration:**
1048 1048  * Claude API via Anthropic SDK
1049 -* Model: Provider-default REASONING model or latest available
975 +* Model: Claude Sonnet 4.5 or latest available
1050 1050  
1051 1051  **Database:**
1052 1052  * None (stateless acceptable)
... ... @@ -1057,6 +1057,8 @@
1057 1057  * Local development environment sufficient for POC
1058 1058  * Optional: Deploy to cloud for remote demos
1059 1059  
986 +---
987 +
1060 1060  == 11. Success Criteria ==
1061 1061  
1062 1062  === 11.1 Minimum Success (POC Passes) ===
... ... @@ -1070,9 +1070,6 @@
1070 1070  * ✅ Team/advisors understand the output
1071 1071  * ✅ Team agrees approach has merit
1072 1072  * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1073 -* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1074 -* ✅ **Cost scaling understood** (data collected on article length vs. cost)
1075 -* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1076 1076  
1077 1077  **Quality Definition:**
1078 1078  * "Reasonable verdict" = Defensible given general knowledge
... ... @@ -1079,6 +1079,8 @@
1079 1079  * "Coherent summary" = Logically structured, grammatically correct
1080 1080  * "Comprehensible" = Reviewers understand what analysis means
1081 1081  
1007 +---
1008 +
1082 1082  === 11.2 POC Fails If ===
1083 1083  
1084 1084  **Automatic NO-GO if any of these:**
... ... @@ -1088,6 +1088,8 @@
1088 1088  * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1089 1089  * ❌ Team loses confidence in AI-automated approach
1090 1090  
1018 +---
1019 +
1091 1091  === 11.3 Quality Thresholds ===
1092 1092  
1093 1093  **POC quality expectations:**
... ... @@ -1113,6 +1113,8 @@
1113 1113  * Understandable reasoning
1114 1114  * Useful output
1115 1115  
1045 +---
1046 +
1116 1116  == 12. Test Cases ==
1117 1117  
1118 1118  === 12.1 Test Case 1: Simple Factual Claim ===
... ... @@ -1128,6 +1128,8 @@
1128 1128  
1129 1129  **Success:** Verdict is reasonable and reasoning makes sense
1130 1130  
1062 +---
1063 +
1131 1131  === 12.2 Test Case 2: Complex News Article ===
1132 1132  
1133 1133  **Input:** News article URL with multiple claims about politics/health/science
... ... @@ -1141,6 +1141,8 @@
1141 1141  
1142 1142  **Success:** Claims identified are actually from article, verdicts are reasonable
1143 1143  
1077 +---
1078 +
1144 1144  === 12.3 Test Case 3: Controversial Topic ===
1145 1145  
1146 1146  **Input:** Article on contested political or scientific topic
... ... @@ -1153,6 +1153,8 @@
1153 1153  
1154 1154  **Success:** Analysis is fair and doesn't show obvious bias
1155 1155  
1091 +---
1092 +
1156 1156  === 12.4 Test Case 4: Clearly False Claim ===
1157 1157  
1158 1158  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
... ... @@ -1166,6 +1166,8 @@
1166 1166  
1167 1167  **Success:** AI correctly identifies false claim with high confidence
1168 1168  
1106 +---
1107 +
1169 1169  === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1170 1170  
1171 1171  **Input:** Article with claim where evidence is genuinely mixed
... ... @@ -1178,6 +1178,8 @@
1178 1178  
1179 1179  **Success:** AI recognizes uncertainty and doesn't overstate confidence
1180 1180  
1120 +---
1121 +
1181 1181  === 12.6 Test Case 6: High-Risk Medical Claim ===
1182 1182  
1183 1183  **Input:** Article making medical claims
... ... @@ -1191,6 +1191,8 @@
1191 1191  
1192 1192  **Success:** Risk tier correctly assigned, appropriate warnings shown
1193 1193  
1135 +---
1136 +
1194 1194  == 13. POC Decision Gate ==
1195 1195  
1196 1196  === 13.1 Decision Framework ===
... ... @@ -1213,6 +1213,8 @@
1213 1213  * Expand to Evidence Model structure
1214 1214  * Test with more complex articles
1215 1215  
1159 +---
1160 +
1216 1216  **Option B: NO-GO (Pivot or Stop)**
1217 1217  
1218 1218  **Conditions:**
... ... @@ -1226,6 +1226,8 @@
1226 1226  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1227 1227  * **Stop:** Conclude approach not viable, revisit later
1228 1228  
1174 +---
1175 +
1229 1229  **Option C: ITERATE (Improve POC)**
1230 1230  
1231 1231  **Conditions:**
... ... @@ -1240,33 +1240,39 @@
1240 1240  * Re-run POC with improvements
1241 1241  * Then make GO/NO-GO decision
1242 1242  
1190 +---
1191 +
1243 1243  === 13.2 Decision Criteria Summary ===
1244 1244  
1245 1245  {{code}}
1246 -AI Quality < 60% → NO-GO (approach doesn't work)
1195 +AI Quality < 60% → NO-GO (approach doesn't work)
1247 1247  AI Quality 60-70% → ITERATE (improve and retry)
1248 -AI Quality ≥70% → GO (proceed to POC2)
1197 +AI Quality ≥70% → GO (proceed to POC2)
1249 1249  {{/code}}
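
The thresholds above can be read as a simple decision rule. The sketch below is illustrative only: the function name `decide` is hypothetical, and treating the ITERATE band as 60% up to but not including 70% is an assumption made so that exactly 70% counts as GO, following the "≥70%" line.

```python
def decide(ai_quality_pct: float) -> str:
    """Map an AI quality score (in percent) to a POC decision.

    Assumption: the 60-70% ITERATE band is half-open, [60, 70),
    so a score of exactly 70 falls under GO.
    """
    if not 0 <= ai_quality_pct <= 100:
        raise ValueError("quality must be a percentage between 0 and 100")
    if ai_quality_pct < 60:
        return "NO-GO"      # approach doesn't work
    if ai_quality_pct < 70:
        return "ITERATE"    # improve and retry
    return "GO"             # proceed to POC2
```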
1250 1250  
1200 +---
1201 +
1251 1251  == 14. Key Risks & Mitigations ==
1252 1252  
1253 1253  === 14.1 Risk: AI Quality Not Good Enough ===
1254 1254  
1255 -**Likelihood:** Medium-High
1256 -**Impact:** POC fails
1206 +**Likelihood:** Medium-High
1207 +**Impact:** POC fails
1257 1257  
1258 1258  **Mitigation:**
1259 1259  * Extensive prompt engineering and testing
1260 -* Use best available AI models (role-based selection; configured via LLM abstraction)
1211 +* Use best available AI models (Sonnet 4.5)
1261 1261  * Test with diverse article types
1262 1262  * Iterate on prompts based on results
1263 1263  
1264 1264  **Acceptance:** This is what POC tests - be ready for failure
1265 1265  
1217 +---
1218 +
1266 1266  === 14.2 Risk: AI Consistency Issues ===
1267 1267  
1268 -**Likelihood:** Medium
1269 -**Impact:** Works sometimes, fails other times
1221 +**Likelihood:** Medium
1222 +**Impact:** Works sometimes, fails other times
1270 1270  
1271 1271  **Mitigation:**
1272 1272  * Test with 10+ diverse articles
... ... @@ -1275,10 +1275,12 @@
1275 1275  
1276 1276  **Acceptance:** Some variability OK if average quality ≥70%
1277 1277  
1231 +---
1232 +
1278 1278  === 14.3 Risk: Output Incomprehensible ===
1279 1279  
1280 -**Likelihood:** Low-Medium
1281 -**Impact:** Users can't understand analysis
1235 +**Likelihood:** Low-Medium
1236 +**Impact:** Users can't understand analysis
1282 1282  
1283 1283  **Mitigation:**
1284 1284  * Create clear explainer document
... ... @@ -1288,10 +1288,12 @@
1288 1288  
1289 1289  **Acceptance:** Iterate until comprehensible
1290 1290  
1246 +---
1247 +
1291 1291  === 14.4 Risk: API Rate Limits / Costs ===
1292 1292  
1293 -**Likelihood:** Low
1294 -**Impact:** System slow or expensive
1250 +**Likelihood:** Low
1251 +**Impact:** System slow or expensive
1295 1295  
1296 1296  **Mitigation:**
1297 1297  * Monitor API usage
... ... @@ -1300,10 +1300,12 @@
1300 1300  
1301 1301  **Acceptance:** POC can be slow and expensive (optimization later)
1302 1302  
1260 +---
1261 +
1303 1303  === 14.5 Risk: Scope Creep ===
1304 1304  
1305 -**Likelihood:** Medium
1306 -**Impact:** POC becomes too complex
1264 +**Likelihood:** Medium
1265 +**Impact:** POC becomes too complex
1307 1307  
1308 1308  **Mitigation:**
1309 1309  * Strict scope discipline
... ... @@ -1312,6 +1312,8 @@
1312 1312  
1313 1313  **Acceptance:** POC is minimal by design
1314 1314  
1274 +---
1275 +
1315 1315  == 15. POC Philosophy ==
1316 1316  
1317 1317  === 15.1 Core Principles ===
... ... @@ -1343,21 +1343,27 @@
1343 1343  * Document failures openly
1344 1344  * Make data-driven decisions
1345 1345  
1307 +---
1308 +
1346 1346  === 15.2 What POC Is ===
1347 1347  
1348 -✅ Testing AI capability without humans
1349 -✅ Proving core technical concept
1350 -✅ Fast validation of approach
1351 -✅ Honest assessment of feasibility
1311 +✅ Testing AI capability without humans
1312 +✅ Proving core technical concept
1313 +✅ Fast validation of approach
1314 +✅ Honest assessment of feasibility
1352 1352  
1316 +---
1317 +
1353 1353  === 15.3 What POC Is NOT ===
1354 1354  
1355 -❌ Building a product
1356 -❌ Production-ready system
1357 -❌ Feature-complete platform
1358 -❌ Perfectly accurate analysis
1359 -❌ Polished user experience
1320 +❌ Building a product
1321 +❌ Production-ready system
1322 +❌ Feature-complete platform
1323 +❌ Perfectly accurate analysis
1324 +❌ Polished user experience
1360 1360  
1326 +---
1327 +
1361 1361  == 16. Success = Clear Path Forward ==
1362 1362  
1363 1363  **If POC succeeds (≥70% AI quality):**
... ... @@ -1375,63 +1375,18 @@
1375 1375  
1376 1376  **Either way, POC provides clarity.**
1377 1377  
1345 +---
1346 +
1378 1378  == 17. Related Pages ==
1379 1379  
1380 -* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
1381 -* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
1382 -* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1349 +* [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 +* [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 +* [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1383 1383  * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1384 1384  * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1385 1385  * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1386 1386  
1356 +---
1357 +
1387 1387  **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1388 1388  
1389 -
1390 -=== NFR-POC-11: LLM Provider Abstraction (POC1) ===
1391 -
1392 -**Requirement:** POC1 MUST implement an LLM abstraction layer that supports multiple providers.
1393 -
1394 -**POC1 Implementation:**
1395 -
1396 -* **Primary Provider:** Anthropic Claude API
1397 - * Stage 1: Provider-default FAST model
1398 - * Stage 2: Provider-default REASONING model (cached)
1399 - * Stage 3: Provider-default REASONING model
1400 -
1401 -* **Provider Interface:** Abstract LLMProvider interface implemented
1402 -
1403 -* **Configuration:** Environment variables for provider selection
1404 - * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1405 - * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1406 - * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1407 -
1408 -* **Failover:** Basic error handling with cache fallback for Stage 2
1409 -
1410 -* **Cost Tracking:** Log provider name and cost per request
1411 -
1412 -**Future (POC2/Beta):**
1413 -
1414 -* Secondary provider (OpenAI) with automatic failover
1415 -* Admin API for runtime provider switching
1416 -* Cost comparison dashboard
1417 -* Cross-provider output verification
1418 -
1419 -**Success Criteria:**
1420 -
1421 -* All LLM calls go through abstraction layer (no direct API calls)
1422 -* Provider can be changed via environment variable without code changes
1423 -* Cost tracking includes provider name in logs
1424 -* Stage 2 falls back to cache on provider failure
1425 -
1426 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1427 -
1428 -**Dependencies:**
1429 -* NFR-14 (Main Requirements)
1430 -* Design Decision 9
1431 -* Architecture Section 2.2
1432 -
1433 -**Priority:** HIGH (P1)
1434 -
1435 -**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
1436 -
1437 -
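The NFR-POC-11 success criteria above (all calls routed through one interface, provider selected by environment variable, provider name logged with cost, Stage 2 falling back to cache on failure) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the POC1 implementation: only the {{code}}LLMProvider{{/code}} name and the {{code}}LLM_PRIMARY_PROVIDER{{/code}} / {{code}}LLM_STAGE*_MODEL{{/code}} variables come from the requirement; {{code}}StubProvider{{/code}}, {{code}}LLMResult{{/code}}, {{code}}get_provider{{/code}}, {{code}}call_stage{{/code}}, and the in-memory dict standing in for the cache are hypothetical, and no real provider API is called.

```python
import logging
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass

logger = logging.getLogger("llm")


@dataclass
class LLMResult:
    """Hypothetical result record: response text plus cost-tracking fields."""
    text: str
    provider: str
    cost_usd: float


class LLMProvider(ABC):
    """Abstract interface that every provider adapter would implement."""
    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> LLMResult:
        ...


class StubProvider(LLMProvider):
    """Stand-in for a real adapter (assumption); makes no network calls."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail

    def complete(self, model: str, prompt: str) -> LLMResult:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return LLMResult(f"[{self.name}/{model}] {prompt}", self.name, 0.0)


# Registry of provider factories; adding a provider needs no caller changes.
PROVIDERS = {"anthropic": lambda: StubProvider("anthropic")}


def get_provider(env=os.environ) -> LLMProvider:
    """Select the provider from LLM_PRIMARY_PROVIDER without code changes."""
    name = env.get("LLM_PRIMARY_PROVIDER", "anthropic")
    if name not in PROVIDERS:
        raise ValueError(f"unknown LLM provider: {name}")
    return PROVIDERS[name]()


def call_stage(stage: int, prompt: str, provider: LLMProvider,
               env=os.environ, cache=None) -> LLMResult:
    """Single entry point for LLM calls; Stage 2 falls back to the cache."""
    model = env.get(f"LLM_STAGE{stage}_MODEL", "provider-default")
    try:
        result = provider.complete(model, prompt)
    except Exception:
        if stage == 2 and cache is not None and prompt in cache:
            logger.warning("provider=%s failed, serving cached Stage 2 result",
                           provider.name)
            return cache[prompt]
        raise
    # Cost tracking: log provider name and cost per request.
    logger.info("provider=%s stage=%d cost_usd=%.4f",
                result.provider, stage, result.cost_usd)
    if stage == 2 and cache is not None:
        cache[prompt] = result
    return result
```

In this sketch the pipeline stages would call only {{code}}call_stage{{/code}}, so switching providers means changing {{code}}LLM_PRIMARY_PROVIDER{{/code}} and registering an adapter in {{code}}PROVIDERS{{/code}}. A failure in Stage 1 or Stage 3 is re-raised, since the requirement specifies cache fallback for Stage 2 only.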