Last modified by Robert Schaub on 2025/12/24 20:16

From version 2.3
edited by Robert Schaub
on 2025/12/24 20:16
Change comment: Update document after refactoring.
To version 1.1
edited by Robert Schaub
on 2025/12/24 19:45
Change comment: Imported from XAR

Page properties
Parent
... ... @@ -1,1 +1,1 @@
1 -WebHome
1 +Test.FactHarbor.Specification.POC.WebHome
Content
... ... @@ -1,14 +1,5 @@
1 1  = POC Requirements =
2 2  
3 -
4 -{{info}}
5 -**POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
6 -
7 -See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 -{{/info}}
9 -
10 -
11 -
12 12  **Status:** ✅ Approved for Development
13 13  **Version:** 2.0 (Updated after Specification Cross-Check)
14 14  **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
... ... @@ -18,11 +18,9 @@
18 18  === 1.1 What POC Tests ===
19 19  
20 20  **Core Question:**
21 -
22 22  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
23 23  
24 24  **What we're proving:**
25 -
26 26  * AI can identify factual claims from text
27 27  * AI can evaluate those claims and produce verdicts
28 28  * Output is comprehensible and useful
... ... @@ -29,7 +29,6 @@
29 29  * Fully automated approach is viable
30 30  
31 31  **What we're NOT testing:**
32 -
33 33  * Scenario generation (deferred to POC2)
34 34  * Evidence display (deferred to POC2)
35 35  * Production scalability
... ... @@ -43,7 +43,6 @@
43 43  Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
44 44  
45 45  **Rationale:**
46 -
47 47  * **POC1 tests:** Can AI extract claims and generate verdicts?
48 48  * **POC2 will add:** Scenario generation and management
49 49  * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
... ... @@ -55,7 +55,6 @@
55 55  **No Risk:**
56 56  
57 57  Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58 -
59 59  * Faster POC1 validation
60 60  * Learning from POC1 to inform scenario design
61 61  * Iterative approach: fail fast if basic AI doesn't work
... ... @@ -62,10 +62,14 @@
62 62  * Flexibility to adjust scenario architecture based on POC1 insights
63 63  
64 64  **Full System Workflow (Future):**
65 -{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
51 +{{code}}
52 +Claims → Scenarios → Evidence → Verdicts
53 +{{/code}}
66 66  
67 67  **POC1 Simplified Workflow:**
68 -{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
56 +{{code}}
57 +Claims → Verdicts (scenarios implicit in reasoning)
58 +{{/code}}
69 69  
70 70  == 2. POC Output Specification ==
71 71  
... ... @@ -76,7 +76,6 @@
76 76  **Length:** 4-6 sentences
77 77  
78 78  **Content (Required Elements):**
79 -
80 80  1. **Article's main thesis/claim** - What is the article trying to argue or prove?
81 81  2. **Claim count and verdicts** - How many claims were analyzed and how the verdicts are distributed
82 82  3. **Central vs. supporting claims** - Which claims are central to the article's argument?
... ... @@ -86,28 +86,30 @@
86 86  **Critical Innovation:**
87 87  
88 88  POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89 -
90 90  * State accurate supporting facts but draw unsupported conclusions
91 91  * Have one false central claim that invalidates the whole argument
92 92  * Misframe accurate information to mislead
93 93  
94 94  **Good Example (Context-Aware):**
95 -{{code}}This article argues that coffee cures cancer based on its antioxidant
83 +{{code}}
84 +This article argues that coffee cures cancer based on its antioxidant
96 96  content. We analyzed 3 factual claims: 2 about coffee's chemical
97 97  properties are well-supported, but the main causal claim is refuted
98 98  by current evidence. The article confuses correlation with causation.
99 99  Overall assessment: MISLEADING - makes an unsupported medical claim
100 -despite citing some accurate facts.{{/code}}
89 +despite citing some accurate facts.
90 +{{/code}}
101 101  
102 102  **Poor Example (Simple Aggregation - Don't Do This):**
103 -{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 -Overall assessment: mostly accurate (67% accurate).{{/code}}
93 +{{code}}
94 +This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 +Overall assessment: mostly accurate (67% accurate).
96 +{{/code}}
105 105  ↑ This misses that the refuted claim IS the article's main point!
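
The contrast between the two examples above can be sketched in a few lines of Python. This is only an illustration, not the POC's method: in POC1 the judgement is made by the AI inside the prompt. The `is_central` flag, the decision rule, the labels `MOSTLY ACCURATE` and `UNRELIABLE`, and the wording of the two supporting claims are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str      # one of the four POC verdict labels
    is_central: bool  # hypothetical flag: is this the article's main claim?


def naive_accuracy(claims):
    """Share of claims rated WELL-SUPPORTED (the 'simple aggregation' approach)."""
    supported = sum(1 for c in claims if c.verdict == "WELL-SUPPORTED")
    return supported / len(claims)


def context_aware_assessment(claims):
    """Hypothetical rule: a refuted central claim outweighs accurate supporting claims."""
    if any(c.is_central and c.verdict == "REFUTED" for c in claims):
        return "MISLEADING"
    return "MOSTLY ACCURATE" if naive_accuracy(claims) >= 0.5 else "UNRELIABLE"


# Illustrative data modelled on the coffee example (claim wording is invented).
claims = [
    Claim("Coffee contains antioxidants", "WELL-SUPPORTED", False),
    Claim("Antioxidants protect cells in lab studies", "WELL-SUPPORTED", False),
    Claim("Coffee cures cancer", "REFUTED", True),
]

print(f"{naive_accuracy(claims):.0%}")   # 67%
print(context_aware_assessment(claims))  # MISLEADING
```

The naive score reports 67% while the context-aware rule flags the article, which is the gap POC1 is meant to probe.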
106 106  
107 107  **What POC1 Tests:**
108 108  
109 109  Can AI identify and assess:
110 -
111 111  * ✅ The article's main thesis/conclusion?
112 112  * ✅ Which claims are central vs. supporting?
113 113  * ✅ Whether the evidence supports the conclusion?
... ... @@ -116,7 +116,6 @@
116 116  **If AI Cannot Do This:**
117 117  
118 118  That's valuable to learn in POC1! We'll:
119 -
120 120  * Note it as a limitation
121 121  * Fall back to simple aggregation with warning
122 122  * Design explicit article-level analysis for POC2
... ... @@ -127,18 +127,19 @@
127 127  **Format:** Numbered list
128 128  **Quantity:** 3-5 claims
129 129  **Requirements:**
130 -
131 131  * Factual claims only (not opinions/questions)
132 132  * Clearly stated
133 133  * Automatically extracted by AI
134 134  
135 135  **Example:**
136 -{{code}}CLAIMS IDENTIFIED:
125 +{{code}}
126 +CLAIMS IDENTIFIED:
137 137  
138 138  [1] Coffee reduces diabetes risk by 30%
139 139  [2] Coffee improves heart health
140 140  [3] Decaf has same benefits as regular
141 -[4] Coffee prevents Alzheimer's completely{{/code}}
131 +[4] Coffee prevents Alzheimer's completely
132 +{{/code}}
142 142  
143 143  === 2.3 Component 3: CLAIMS VERDICTS ===
144 144  
... ... @@ -146,7 +146,6 @@
146 146  **Format:** Per claim structure
147 147  
148 148  **Required Elements:**
149 -
150 150  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
151 151  * **Confidence Score:** 0-100%
152 152  * **Brief Reasoning:** 1-3 sentences explaining why
... ... @@ -153,7 +153,8 @@
153 153  * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
154 154  
155 155  **Example:**
156 -{{code}}VERDICTS:
146 +{{code}}
147 +VERDICTS:
157 157  
158 158  [1] WELL-SUPPORTED (85%) [Risk: C]
159 159  Multiple studies confirm 25-30% risk reduction with regular consumption.
... ... @@ -165,10 +165,10 @@
165 165  Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166 166  
167 167  [4] REFUTED (90%) [Risk: B]
168 -No evidence for complete prevention. Claim is significantly overstated.{{/code}}
159 +No evidence for complete prevention. Claim is significantly overstated.
160 +{{/code}}
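
One possible in-memory shape for a single verdict, assuming a Python backend, is sketched below. The class name `ClaimVerdict` and its field names are hypothetical; only the verdict labels, the 0-100% confidence range, the risk tiers, and the display layout come from this section.

```python
from dataclasses import dataclass

VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A": "High", "B": "Medium", "C": "Low"}


@dataclass
class ClaimVerdict:
    claim_number: int
    label: str
    confidence: int  # percent, 0-100
    risk_tier: str   # "A", "B" or "C"
    reasoning: str   # 1-3 sentences

    def __post_init__(self):
        # Reject values outside the ranges listed under "Required Elements".
        if self.label not in VERDICT_LABELS:
            raise ValueError(f"unknown verdict label: {self.label!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")

    def render(self) -> str:
        """Format the verdict in the layout used by the example above."""
        header = f"[{self.claim_number}] {self.label} ({self.confidence}%) [Risk: {self.risk_tier}]"
        return f"{header}\n{self.reasoning}"


verdict = ClaimVerdict(
    4, "REFUTED", 90, "B",
    "No evidence for complete prevention. Claim is significantly overstated.",
)
print(verdict.render())
```

Validating at construction time means a malformed AI response fails loudly instead of being silently displayed.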
169 169  
170 170  **Risk Tier Display:**
171 -
172 172  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 173  * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 174  * **Tier C (Green):** Low Risk - Facts/Definitions/History
... ... @@ -182,11 +182,13 @@
182 182  **Tone:** Neutral (article's position, not FactHarbor's analysis)
183 183  
184 184  **Example:**
185 -{{code}}ARTICLE SUMMARY:
176 +{{code}}
177 +ARTICLE SUMMARY:
186 186  
187 187  Health News Today article discusses coffee benefits, citing studies
188 188  on diabetes and Alzheimer's. Author highlights research linking coffee
189 -to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
181 +to disease prevention. Recommends 2-3 cups daily for optimal health.
182 +{{/code}}
190 190  
191 191  === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
192 192  
... ... @@ -193,7 +193,6 @@
193 193  **What:** LLM usage metrics for cost optimization and scaling decisions
194 194  
195 195  **Purpose:**
196 -
197 197  * Understand cost per analysis
198 198  * Identify optimization opportunities
199 199  * Project costs at scale
... ... @@ -200,7 +200,8 @@
200 200  * Inform architecture decisions
201 201  
202 202  **Display Format:**
203 -{{code}}USAGE STATISTICS:
195 +{{code}}
196 +USAGE STATISTICS:
204 204  • Article: 2,450 words (12,300 characters)
205 205  • Input tokens: 15,234
206 206  • Output tokens: 892
... ... @@ -208,18 +208,17 @@
208 208  • Estimated cost: $0.24 USD
209 209  • Response time: 8.3 seconds
210 210  • Cost per claim: $0.048
211 -• Model: claude-sonnet-4-20250514{{/code}}
204 +• Model: claude-sonnet-4-20250514
205 +{{/code}}
212 212  
213 213  **Why This Matters:**
214 214  
215 215  At scale, LLM costs are critical:
216 -
217 217  * 10,000 articles/month ≈ $200-500/month
218 218  * 100,000 articles/month ≈ $2,000-5,000/month
219 219  * Cost optimization can reduce expenses 30-50%
220 220  
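The arithmetic behind these figures can be sketched as follows. The function names are hypothetical, and the per-million-token prices are parameters rather than real Claude pricing, so the sketch makes no claim about actual rates.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Cost in USD for one analysis; prices are USD per million tokens (assumed inputs)."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def cost_per_claim(total_cost, claim_count):
    """Spread the cost of one analysis over the claims it produced."""
    return total_cost / claim_count


def monthly_cost(articles_per_month, cost_per_article):
    """Project monthly spend from a per-article cost."""
    return articles_per_month * cost_per_article


# The display example: $0.24 for an analysis with 5 claims.
print(round(cost_per_claim(0.24, 5), 3))  # 0.048

# The scale figures correspond to roughly $0.02-0.05 per article.
print(round(monthly_cost(10_000, 0.02)), round(monthly_cost(10_000, 0.05)))      # 200 500
print(round(monthly_cost(100_000, 0.02)), round(monthly_cost(100_000, 0.05)))    # 2000 5000
```

Keeping prices as parameters lets the same code be reused when comparing models or pricing tiers.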
221 221  **What POC1 Learns:**
222 -
223 223  * How cost scales with article length
224 224  * Prompt optimization opportunities (caching, compression)
225 225  * Output verbosity tradeoffs
... ... @@ -227,7 +227,6 @@
227 227  * Article length limits (if needed)
228 228  
229 229  **Implementation:**
230 -
231 231  * Claude API already returns usage data
232 232  * No extra API calls needed
233 233  * Display to user + log for aggregate analysis
... ... @@ -237,8 +237,7 @@
237 237  
238 238  === 2.6 Total Output Size ===
239 239  
240 -**Combined:** 220-350 words
241 -
231 +**Combined:** ~220-350 words
242 242  * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
243 243  * Claims Identification: 30-50 words
244 244  * Claims Verdicts: 100-150 words
... ... @@ -253,7 +253,6 @@
253 253  The following are **explicitly excluded** from POC:
254 254  
255 255  **Content Features:**
256 -
257 257  * ❌ Scenarios (deferred to POC2)
258 258  * ❌ Evidence display (supporting/opposing lists)
259 259  * ❌ Source links (clickable references)
... ... @@ -263,7 +263,6 @@
263 263  * ❌ Risk assessment (shown but not workflow-integrated)
264 264  
265 265  **Platform Features:**
266 -
267 267  * ❌ User accounts / authentication
268 268  * ❌ Saved history
269 269  * ❌ Search functionality
... ... @@ -273,7 +273,6 @@
273 273  * ❌ Social sharing
274 274  
275 275  **Technical Features:**
276 -
277 277  * ❌ Browser extensions
278 278  * ❌ Mobile apps
279 279  * ❌ API endpoints
... ... @@ -281,7 +281,6 @@
281 281  * ❌ Export features (PDF, CSV)
282 282  
283 283  **Quality Features:**
284 -
285 285  * ❌ Accessibility (WCAG compliance)
286 286  * ❌ Multilingual support
287 287  * ❌ Mobile optimization
... ... @@ -288,7 +288,6 @@
288 288  * ❌ Media verification (images/videos)
289 289  
290 290  **Production Features:**
291 -
292 292  * ❌ Security hardening
293 293  * ❌ Privacy compliance (GDPR)
294 294  * ❌ Terms of service
... ... @@ -302,13 +302,17 @@
302 302  === 4.1 Architecture Comparison ===
303 303  
304 304  **POC Architecture (Simplified):**
305 -{{code}}User Input → Single AKEL Call → Output Display
306 - (all processing){{/code}}
290 +{{code}}
291 +User Input → Single AKEL Call → Output Display
292 + (all processing)
293 +{{/code}}
307 307  
308 308  **Full System Architecture:**
309 -{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
296 +{{code}}
297 +User Input → Claim Extractor → Claim Classifier → Scenario Generator
310 310  → Evidence Summarizer → Contradiction Detector → Verdict Generator
311 -→ Quality Gates → Publication → Output Display{{/code}}
299 +→ Quality Gates → Publication → Output Display
300 +{{/code}}
312 312  
313 313  **Key Differences:**
314 314  
... ... @@ -324,14 +324,12 @@
324 324  === 4.2 Workflow Comparison ===
325 325  
326 326  **POC1 Workflow:**
327 -
328 328  1. User submits text/URL
329 329  2. Single AKEL call (all processing in one prompt)
330 330  3. Display results
331 -**Total: 3 steps, 10-18 seconds**
319 +**Total: 3 steps, ~10-18 seconds**
332 332  
333 333  **Full System Workflow:**
334 -
335 335  1. **Claim Submission** (extraction, normalization, clustering)
336 336  2. **Scenario Building** (definitions, assumptions, boundaries)
337 337  3. **Evidence Handling** (retrieval, assessment, linking)
... ... @@ -338,7 +338,7 @@
338 338  4. **Verdict Creation** (synthesis, reasoning, approval)
339 339  5. **Public Presentation** (summaries, landscapes, deep dives)
340 340  6. **Time Evolution** (versioning, re-evaluation triggers)
341 -**Total: 6 phases with quality gates, 10-30 seconds**
328 +**Total: 6 phases with quality gates, ~10-30 seconds**
342 342  
343 343  === 4.3 Why POC is Simplified ===
344 344  
... ... @@ -361,7 +361,6 @@
361 361  === 4.4 Gap Between POC1 and POC2/Beta ===
362 362  
363 363  **What needs to be built for POC2:**
364 -
365 365  * Scenario generation component
366 366  * Evidence Model structure (full)
367 367  * Scenario-evidence linking
... ... @@ -369,7 +369,6 @@
369 369  * Truth landscape visualization
370 370  
371 371  **What needs to be built for Beta:**
372 -
373 373  * Multi-component AKEL pipeline
374 374  * Quality gate infrastructure
375 375  * Review workflow system
... ... @@ -386,7 +386,6 @@
386 386  **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
387 387  
388 388  Per FactHarbor Specification Section 11 "POC v1 Behavior":
389 -
390 390  * Produces public AI-generated output
391 391  * No human approval gate
392 392  * Clear AI-Generated labeling
... ... @@ -396,7 +396,8 @@
396 396  === 5.2 User-Facing Labels ===
397 397  
398 398  **Primary Label (top of analysis):**
399 -{{code}}╔════════════════════════════════════════════════════════════╗
383 +{{code}}
384 +╔════════════════════════════════════════════════════════════╗
400 400  ║ [AI-GENERATED - POC/DEMO] ║
401 401  ║ ║
402 402  ║ This analysis was produced entirely by AI and has not ║
... ... @@ -406,10 +406,10 @@
406 406  ║ Review Status: Not Reviewed (Proof-of-Concept) ║
407 407  ║ Quality Gates: 4/4 Passed (Simplified) ║
408 408  ║ Last Updated: [timestamp] ║
409 -╚════════════════════════════════════════════════════════════╝{{/code}}
394 +╚════════════════════════════════════════════════════════════╝
395 +{{/code}}
410 410  
411 411  **Per-Claim Risk Labels:**
412 -
413 413  * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
414 414  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
415 415  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
... ... @@ -417,7 +417,6 @@
417 417  === 5.3 Display Requirements ===
418 418  
419 419  **Must Show:**
420 -
421 421  * AI-Generated status (prominent)
422 422  * POC/Demo disclaimer
423 423  * Risk tier per claim
... ... @@ -426,7 +426,6 @@
426 426  * Timestamp
427 427  
428 428  **Must NOT Claim:**
429 -
430 430  * Human review
431 431  * Production quality
432 432  * Medical/legal advice
... ... @@ -450,7 +450,6 @@
450 450  Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
451 451  
452 452  **Full System Has 4 Gates:**
453 -
454 454  1. Source Quality
455 455  2. Contradiction Search (MANDATORY)
456 456  3. Uncertainty Quantification
... ... @@ -457,7 +457,6 @@
457 457  4. Structural Integrity
458 458  
459 459  **POC Implements Simplified Versions:**
460 -
461 461  * Focus on demonstrating concept
462 462  * Basic implementations sufficient
463 463  * Failures displayed to user (not blocking)
... ... @@ -466,7 +466,6 @@
466 466  === 6.2 Gate 1: Source Quality (Basic) ===
467 467  
468 468  **Full System Requirements:**
469 -
470 470  * Primary sources identified and accessible
471 471  * Source reliability scored against whitelist
472 472  * Citation completeness verified
... ... @@ -474,7 +474,6 @@
474 474  * Author credentials validated
475 475  
476 476  **POC Implementation:**
477 -
478 478  * ✅ At least 2 sources found
479 479  * ✅ Sources accessible (URLs valid)
480 480  * ❌ No whitelist checking
... ... @@ -488,7 +488,6 @@
488 488  === 6.3 Gate 2: Contradiction Search (Basic) ===
489 489  
490 490  **Full System Requirements:**
491 -
492 492  * Counter-evidence actively searched
493 493  * Reservations and limitations identified
494 494  * Alternative interpretations explored
... ... @@ -497,7 +497,6 @@
497 497  * Academic literature (supporting AND opposing)
498 498  
499 499  **POC Implementation:**
500 -
501 501  * ✅ Basic search for counter-evidence
502 502  * ✅ Identify obvious contradictions
503 503  * ❌ No comprehensive academic search
... ... @@ -512,7 +512,6 @@
512 512  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
513 513  
514 514  **Full System Requirements:**
515 -
516 516  * Confidence scores calculated for all claims/verdicts
517 517  * Limitations explicitly stated
518 518  * Data gaps identified and disclosed
... ... @@ -520,7 +520,6 @@
520 520  * Alternative scenarios considered
521 521  
522 522  **POC Implementation:**
523 -
524 524  * ✅ Confidence scores (0-100%)
525 525  * ✅ Basic uncertainty acknowledgment
526 526  * ❌ No detailed limitation disclosure
... ... @@ -534,7 +534,6 @@
534 534  === 6.5 Gate 4: Structural Integrity (Basic) ===
535 535  
536 536  **Full System Requirements:**
537 -
538 538  * No hallucinations detected (fact-checking against sources)
539 539  * Logic chain valid and traceable
540 540  * References accessible and verifiable
... ... @@ -542,7 +542,6 @@
542 542  * Premises clearly stated
543 543  
544 544  **POC Implementation:**
545 -
546 546  * ✅ Basic coherence check
547 547  * ✅ References accessible
548 548  * ❌ No comprehensive hallucination detection
... ... @@ -556,20 +556,24 @@
556 556  === 6.6 Quality Gate Display ===
557 557  
558 558  **POC shows simplified status:**
559 -{{code}}Quality Gates: 4/4 Passed (Simplified)
532 +{{code}}
533 +Quality Gates: 4/4 Passed (Simplified)
560 560  ✓ Source Quality: 3 sources found
561 561  ✓ Contradiction Search: Basic search completed
562 562  ✓ Uncertainty: Confidence scores assigned
563 -✓ Structural Integrity: Output coherent{{/code}}
537 +✓ Structural Integrity: Output coherent
538 +{{/code}}
564 564  
565 565  **If any gate fails:**
566 -{{code}}Quality Gates: 3/4 Passed (Simplified)
541 +{{code}}
542 +Quality Gates: 3/4 Passed (Simplified)
567 567  ✓ Source Quality: 3 sources found
568 568  ✗ Contradiction Search: Search failed - limited evidence
569 569  ✓ Uncertainty: Confidence scores assigned
570 570  ✓ Structural Integrity: Output coherent
571 571  
572 -Note: This analysis has limited evidence. Use with caution.{{/code}}
548 +Note: This analysis has limited evidence. Use with caution.
549 +{{/code}}
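
A minimal sketch of how the status block above could be rendered, assuming a Python backend. `GateResult` and `render_gates` are hypothetical names. Consistent with the non-blocking behaviour described in 6.1, a failed gate only changes the text that is shown; it does not stop the output.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def render_gates(results, failure_note=""):
    """Build the simplified gate status text; failures are shown, never blocking."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    # Append the cautionary note only when at least one gate failed.
    if passed < len(results) and failure_note:
        lines += ["", f"Note: {failure_note}"]
    return "\n".join(lines)


results = [
    GateResult("Source Quality", True, "3 sources found"),
    GateResult("Contradiction Search", False, "Search failed - limited evidence"),
    GateResult("Uncertainty", True, "Confidence scores assigned"),
    GateResult("Structural Integrity", True, "Output coherent"),
]
print(render_gates(results, "This analysis has limited evidence. Use with caution."))
```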
573 573  
574 574  === 6.7 Simplified vs. Full System ===
575 575  
... ... @@ -586,7 +586,6 @@
586 586  === 7.1 POC AKEL (Simplified) ===
587 587  
588 588  **Implementation:**
589 -
590 590  * Single Claude API call (Sonnet 4.5)
591 591  * One comprehensive prompt
592 592  * All processing in single request
... ... @@ -594,7 +594,8 @@
594 594  * No orchestration layer
595 595  
596 596  **Prompt Structure:**
597 -{{code}}Task: Analyze this article and provide:
573 +{{code}}
574 +Task: Analyze this article and provide:
598 598  
599 599  1. Extract 3-5 factual claims
600 600  2. For each claim:
... ... @@ -606,7 +606,8 @@
606 606  4. Generate article summary (3-5 sentences)
607 607  5. Run basic quality checks
608 608  
609 -Return as structured JSON.{{/code}}
586 +Return as structured JSON.
587 +{{/code}}
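
Because the prompt asks for structured JSON, the backend needs to check the response before displaying it. The sketch below shows one way to do that. The JSON key names (`claims`, `text`, `verdict`, `confidence`, `risk_tier`, `reasoning`) are assumptions, since this document does not fix a schema.

```python
import json

# Hypothetical per-claim keys, derived from the required verdict elements.
REQUIRED_CLAIM_FIELDS = {"text", "verdict", "confidence", "risk_tier", "reasoning"}


def parse_analysis(raw_json: str) -> dict:
    """Parse the model's JSON reply and enforce the 3-5 claim requirement."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed output
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        missing = REQUIRED_CLAIM_FIELDS - claim.keys()
        if missing:
            raise ValueError(f"claim is missing fields: {sorted(missing)}")
    return data


sample = json.dumps({
    "claims": [
        {"text": f"claim {i}", "verdict": "UNCERTAIN", "confidence": 50,
         "risk_tier": "C", "reasoning": "Illustrative placeholder."}
        for i in range(1, 4)
    ]
})
print(len(parse_analysis(sample)["claims"]))  # 3
```

Any exception raised here would map to the error message shown to the user, who can then retry; the output is not repaired by hand.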
610 610  
611 611  **Processing Time:** 10-18 seconds (estimate)
612 612  
... ... @@ -613,7 +613,8 @@
613 613  === 7.2 Full System AKEL (Production) ===
614 614  
615 615  **Architecture:**
616 -{{code}}AKEL Orchestrator
594 +{{code}}
595 +AKEL Orchestrator
617 617  ├── Claim Extractor
618 618  ├── Claim Classifier (with risk tier assignment)
619 619  ├── Scenario Generator
... ... @@ -621,10 +621,10 @@
621 621  ├── Contradiction Detector
622 622  ├── Quality Gate Validator
623 623  ├── Audit Sampling Scheduler
624 -└── Federation Sync Adapter (Release 1.0+){{/code}}
603 +└── Federation Sync Adapter (Release 1.0+)
604 +{{/code}}
625 625  
626 626  **Processing:**
627 -
628 628  * Parallel processing where possible
629 629  * Separate component calls
630 630  * Quality gates between phases
... ... @@ -636,7 +636,6 @@
636 636  === 7.3 Why POC Uses Single Call ===
637 637  
638 638  **Advantages:**
639 -
640 640  * ✅ Simpler to implement
641 641  * ✅ Faster POC development
642 642  * ✅ Easier to debug
... ... @@ -644,7 +644,6 @@
644 644  * ✅ Good enough for concept validation
645 645  
646 646  **Limitations:**
647 -
648 648  * ❌ No component reusability
649 649  * ❌ No parallel processing
650 650  * ❌ All-or-nothing (can't partially succeed)
... ... @@ -671,7 +671,6 @@
671 671  **Requirement:** User can submit article for analysis
672 672  
673 673  **Functionality:**
674 -
675 675  * Text input field (paste article text, up to 5000 characters)
676 676  * URL input field (paste article URL)
677 677  * "Analyze" button to trigger processing
... ... @@ -678,7 +678,6 @@
678 678  * Loading indicator during analysis
679 679  
680 680  **Excluded:**
681 -
682 682  * No user authentication
683 683  * No claim history
684 684  * No search functionality
... ... @@ -685,7 +685,6 @@
685 685  * No saved templates
686 686  
687 687  **Acceptance Criteria:**
688 -
689 689  * User can paste text from article
690 690  * User can paste URL of article
691 691  * System accepts input and triggers analysis
... ... @@ -695,7 +695,6 @@
695 695  **Requirement:** AI automatically extracts 3-5 factual claims
696 696  
697 697  **Functionality:**
698 -
699 699  * AI reads article text
700 700  * AI identifies factual claims (not opinions/questions)
701 701  * AI extracts 3-5 most important claims
... ... @@ -702,7 +702,6 @@
702 702  * System displays numbered list
703 703  
704 704  **Critical:** NO MANUAL EDITING ALLOWED
705 -
706 706  * AI selects which claims to extract
707 707  * AI identifies factual vs. non-factual
708 708  * System processes claims as extracted
... ... @@ -709,13 +709,11 @@
709 709  * No human curation or correction
710 710  
711 711  **Error Handling:**
712 -
713 713  * If extraction fails: Display error message
714 714  * User can retry with different input
715 715  * No manual intervention to fix extraction
716 716  
717 717  **Acceptance Criteria:**
718 -
719 719  * AI extracts 3-5 claims automatically
720 720  * Claims are factual (not opinions)
721 721  * Claims are clearly stated
... ... @@ -726,17 +726,15 @@
726 726  **Requirement:** AI automatically generates verdict for each claim
727 727  
728 728  **Functionality:**
729 -
730 730  * For each claim, AI:
731 -* Evaluates claim based on available evidence/knowledge
732 -* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 -* Assigns confidence score (0-100%)
734 -* Assigns risk tier (A/B/C)
735 -* Writes brief reasoning (1-3 sentences)
700 + * Evaluates claim based on available evidence/knowledge
701 + * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 + * Assigns confidence score (0-100%)
703 + * Assigns risk tier (A/B/C)
704 + * Writes brief reasoning (1-3 sentences)
736 736  * System displays verdict for each claim
737 737  
738 738  **Critical:** NO MANUAL EDITING ALLOWED
739 -
740 740  * AI computes verdicts based on evidence
741 741  * AI generates confidence scores
742 742  * AI writes reasoning
... ... @@ -743,13 +743,11 @@
743 743  * No human review or adjustment
744 744  
745 745  **Error Handling:**
746 -
747 747  * If verdict generation fails: Display error message
748 748  * User can retry
749 749  * No manual intervention to adjust verdicts
750 750  
751 751  **Acceptance Criteria:**
752 -
753 753  * Each claim has a verdict
754 754  * Confidence score is displayed (0-100%)
755 755  * Risk tier is displayed (A/B/C)
... ... @@ -762,17 +762,15 @@
762 762  **Requirement:** AI generates brief summary of analysis
763 763  
764 764  **Functionality:**
765 -
766 766  * AI summarizes findings in 3-5 sentences:
767 -* How many claims found
768 -* Distribution of verdicts
769 -* Overall assessment
732 + * How many claims found
733 + * Distribution of verdicts
734 + * Overall assessment
770 770  * System displays at top of results
771 771  
772 772  **Critical:** NO MANUAL EDITING ALLOWED
773 773  
774 774  **Acceptance Criteria:**
775 -
776 776  * Summary is coherent
777 777  * Accurately reflects analysis
778 778  * 3-5 sentences
... ... @@ -783,7 +783,6 @@
783 783  **Requirement:** AI generates brief summary of original article
784 784  
785 785  **Functionality:**
786 -
787 787  * AI summarizes article content (not FactHarbor's analysis)
788 788  * 3-5 sentences
789 789  * System displays
... ... @@ -793,7 +793,6 @@
793 793  **Critical:** NO MANUAL EDITING ALLOWED
794 794  
795 795  **Acceptance Criteria:**
796 -
797 797  * Summary is neutral (article's position)
798 798  * Accurately reflects article content
799 799  * 3-5 sentences
... ... @@ -804,7 +804,6 @@
804 804  **Requirement:** Clear labeling of AI-generated content
805 805  
806 806  **Functionality:**
807 -
808 808  * Display Mode 2 publication label
809 809  * Show POC/Demo disclaimer
810 810  * Display risk tiers per claim
... ... @@ -812,7 +812,6 @@
812 812  * Display timestamp
813 813  
814 814  **Acceptance Criteria:**
815 -
816 816  * Label is prominent and clear
817 817  * User understands this is AI-generated POC output
818 818  * Risk tiers are color-coded
... ... @@ -823,7 +823,6 @@
823 823  **Requirement:** Execute simplified quality gates
824 824  
825 825  **Functionality:**
826 -
827 827  * Check source quality (basic)
828 828  * Attempt contradiction search (basic)
829 829  * Calculate confidence scores
... ... @@ -831,7 +831,6 @@
831 831  * Display gate results
832 832  
833 833  **Acceptance Criteria:**
834 -
835 835  * All 4 gates attempted
836 836  * Pass/fail status displayed
837 837  * Failures explained to user
... ... @@ -846,7 +846,6 @@
846 846  **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
847 847  
848 848  **What this means:**
849 -
850 850  * Claims: AI selects (no human curation)
851 851  * Scenarios: N/A (deferred to POC2)
852 852  * Evidence: AI evaluates (no human selection)
... ... @@ -854,12 +854,13 @@
854 854  * Summaries: AI writes (no human editing)
855 855  
856 856  **Pipeline:**
857 -{{code}}User Input → AKEL Processing → Output Display
814 +{{code}}
815 +User Input → AKEL Processing → Output Display
858 858   ↓
859 - ZERO human editing{{/code}}
817 + ZERO human editing
818 +{{/code}}
860 860  
861 861  **If AI output is poor:**
862 -
863 863  * ❌ Do NOT manually fix it
864 864  * ✅ Document the failure
865 865  * ✅ Improve prompts and retry
... ... @@ -866,7 +866,6 @@
866 866  * ✅ Accept that POC might fail
867 867  
868 868  **Why this matters:**
869 -
870 870  * Tests whether AI can do this without humans
871 871  * Validates scalability (humans can't review every analysis)
872 872  * Honest test of technical feasibility
... ... @@ -876,19 +876,16 @@
876 876  **Requirement:** Analysis completes in reasonable time
877 877  
878 878  **Acceptable Performance:**
879 -
880 880  * Processing time: 1-5 minutes (acceptable for POC)
881 881  * Display loading indicator to user
882 882  * Show progress if possible ("Extracting claims...", "Generating verdicts...")
883 883  
884 884  **Not Required:**
885 -
886 886  * Production-level speed (< 30 seconds)
887 887  * Optimization for scale
888 888  * Caching
889 889  
890 890  **Acceptance Criteria:**
891 -
892 892  * Analysis completes within 5 minutes
893 893  * User sees loading indicator
894 894  * No timeout errors
... ... @@ -898,19 +898,16 @@
898 898  **Requirement:** System works for manual testing sessions
899 899  
900 900  **Acceptable:**
901 -
902 902  * Occasional errors (< 20% failure rate)
903 903  * Manual restart if needed
904 904  * Display error messages clearly
905 905  
906 906  **Not Required:**
907 -
908 908  * 99.9% uptime
909 909  * Automatic error recovery
910 910  * Production monitoring
911 911  
912 912  **Acceptance Criteria:**
913 -
914 914  * System works for test demonstrations
915 915  * Errors are handled gracefully
916 916  * User receives clear error messages
... ... @@ -920,7 +920,6 @@
920 920  **Requirement:** Runs on simple infrastructure
921 921  
922 922  **Acceptable:**
923 -
924 924  * Single machine or simple cloud setup
925 925  * No distributed architecture
926 926  * No load balancing
... ... @@ -928,7 +928,6 @@
928 928  * Local development environment viable
929 929  
930 930  **Not Required:**
931 -
932 932  * Production infrastructure
933 933  * Multi-region deployment
934 934  * Auto-scaling
... ... @@ -939,7 +939,6 @@
939 939  **Requirement:** Track and display LLM usage metrics to inform optimization decisions
940 940  
941 941  **Must Track:**
942 -
943 943  * Input tokens (article + prompt)
944 944  * Output tokens (generated analysis)
945 945  * Total tokens
... ... @@ -948,19 +948,16 @@
948 948  * Article length (words/characters)
949 949  
950 950  **Must Display:**
951 -
952 952  * Usage statistics in UI (Component 5)
953 953  * Cost per analysis
954 954  * Cost per claim extracted
955 955  
956 956  **Must Log:**
957 -
958 958  * Aggregate metrics for analysis
959 959  * Cost distribution by article length
960 960  * Token efficiency trends
961 961  
962 962  **Purpose:**
963 -
964 964  * Understand unit economics
965 965  * Identify optimization opportunities
966 966  * Project costs at scale
... ... @@ -967,7 +967,6 @@
967 967  * Inform architecture decisions (caching, model selection, etc.)
968 968  
969 969  **Acceptance Criteria:**
970 -
971 971  * ✅ Usage data displayed after each analysis
972 972  * ✅ Metrics logged for aggregate analysis
973 973  * ✅ Cost calculated accurately (Claude API pricing)
... ... @@ -975,7 +975,6 @@
975 975  * ✅ POC1 report includes cost analysis section
976 976  
977 977  **Success Target:**
978 -
979 979  * Average cost per analysis < $0.05 USD
980 980  * Cost scaling behavior understood (linear/exponential)
981 981  * 2+ optimization opportunities identified
... ... @@ -987,13 +987,11 @@
987 987  === 10.1 System Components ===
988 988  
989 989  **Frontend:**
990 -
991 991  * Simple HTML form (text input + URL input + button)
992 992  * Loading indicator
993 993  * Results display page (single page, no tabs/navigation)
994 994  
995 995  **Backend:**
996 -
997 997  * Single API endpoint
998 998  * Calls Claude API (Sonnet 4.5 or latest)
999 999  * Parses response
... ... @@ -1000,12 +1000,10 @@
1000 1000  * Returns JSON to frontend
1001 1001  
1002 1002  **Data Storage:**
1003 -
1004 1004  * None required (stateless POC)
1005 1005  * Optional: Simple file storage or SQLite for demo examples
1006 1006  
1007 1007  **External Services:**
1008 -
1009 1009  * Claude API (Anthropic) - required
1010 1010  * Optional: URL fetch service for article text extraction
1011 1011  
... ... @@ -1038,7 +1038,8 @@
1038 1038  === 10.3 AI Prompt Strategy ===
1039 1039  
1040 1040  **Single Comprehensive Prompt:**
1041 -{{code}}Task: Analyze this article and provide:
980 +{{code}}
981 +Task: Analyze this article and provide:
1042 1042  
1043 1043  1. Identify the article's main thesis/conclusion
1044 1044   - What is the article trying to argue or prove?
... ... @@ -1074,7 +1074,8 @@
1074 1074  
1075 1075  7. Write article summary (3-5 sentences: neutral summary of article content)
1076 1076  
1077 -Return as structured JSON with quality gate results.{{/code}}
1017 +Return as structured JSON with quality gate results.
1018 +{{/code}}
1078 1078  
1079 1079  **One prompt generates everything.**
1080 1080  
... ... @@ -1085,30 +1085,25 @@
1085 1085  === 10.4 Technology Stack Suggestions ===
1086 1086  
1087 1087  **Frontend:**
1088 -
1089 1089  * HTML + CSS + JavaScript (minimal framework)
1090 1090  * OR: Next.js (if team prefers)
1091 1091  * Hosted: Local machine OR Vercel/Netlify free tier
1092 1092  
1093 1093  **Backend:**
1094 -
1095 1095  * Python Flask/FastAPI (simple REST API)
1096 1096  * OR: Next.js API routes (if using Next.js)
1097 1097  * Hosted: Local machine OR Railway/Render free tier
1098 1098  
1099 1099  **AKEL Integration:**
1100 -
1101 1101  * Claude API via Anthropic SDK
1102 1102  * Model: Claude Sonnet 4.5 or latest available
1103 1103  
1104 1104  **Database:**
1105 -
1106 1106  * None (stateless acceptable)
1107 1107  * OR: SQLite if you want to store demo examples
1108 1108  * OR: JSON files on disk
1109 1109  
1110 1110  **Deployment:**
1111 -
1112 1112  * Local development environment sufficient for POC
1113 1113  * Optional: Deploy to cloud for remote demos
1114 1114  
... ... @@ -1117,7 +1117,6 @@
1117 1117  === 11.1 Minimum Success (POC Passes) ===
1118 1118  
1119 1119  **Required for GO decision:**
1120 -
1121 1121  * ✅ AI extracts 3-5 factual claims automatically
1122 1122  * ✅ AI provides verdict for each claim automatically
1123 1123  * ✅ Verdicts are reasonable (≥70% make logical sense)
... ... @@ -1131,7 +1131,6 @@
1131 1131  * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1132 1132  
1133 1133  **Quality Definition:**
1134 -
1135 1135  * "Reasonable verdict" = Defensible given general knowledge
1136 1136  * "Coherent summary" = Logically structured, grammatically correct
1137 1137  * "Comprehensible" = Reviewers understand what analysis means
... ... @@ -1139,7 +1139,6 @@
1139 1139  === 11.2 POC Fails If ===
1140 1140  
1141 1141  **Automatic NO-GO if any of these:**
1142 -
1143 1143  * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1144 1144  * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1145 1145  * ❌ Output incomprehensible (reviewers can't understand analysis)
... ... @@ -1151,15 +1151,14 @@
1151 1151  **POC quality expectations:**
1152 1152  
1153 1153  |=Component|=Quality Threshold|=Definition
1154 -|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1155 -|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1156 -|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1157 -|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1087 +|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1088 +|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1089 +|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1090 +|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1158 1158  
1159 1159  **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1160 1160  
1161 1161  **Not expecting:**
1162 -
1163 1163  * 100% accuracy
1164 1164  * Perfect claim coverage
1165 1165  * Comprehensive evidence gathering
... ... @@ -1167,7 +1167,6 @@
1167 1167  * Production polish
1168 1168  
1169 1169  **Expecting:**
1170 -
1171 1171  * Reasonable claim extraction
1172 1172  * Defensible verdicts
1173 1173  * Understandable reasoning
... ... @@ -1180,7 +1180,6 @@
1180 1180  **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1181 1181  
1182 1182  **Expected Output:**
1183 -
1184 1184  * Extract claim correctly
1185 1185  * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1186 1186  * Confidence: 70-90%
... ... @@ -1194,7 +1194,6 @@
1194 1194  **Input:** News article URL with multiple claims about politics/health/science
1195 1195  
1196 1196  **Expected Output:**
1197 -
1198 1198  * Extract 3-5 key claims
1199 1199  * Verdict for each (may vary: some supported, some uncertain, some refuted)
1200 1200  * Coherent analysis summary
... ... @@ -1208,7 +1208,6 @@
1208 1208  **Input:** Article on contested political or scientific topic
1209 1209  
1210 1210  **Expected Output:**
1211 -
1212 1212  * Balanced analysis
1213 1213  * Acknowledges uncertainty where appropriate
1214 1214  * Doesn't overstate confidence
... ... @@ -1221,7 +1221,6 @@
1221 1221  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1222 1222  
1223 1223  **Expected Output:**
1224 -
1225 1225  * Extract claim
1226 1226  * Verdict: REFUTED
1227 1227  * High confidence (> 90%)
... ... @@ -1235,7 +1235,6 @@
1235 1235  **Input:** Article with claim where evidence is genuinely mixed
1236 1236  
1237 1237  **Expected Output:**
1238 -
1239 1239  * Extract claim
1240 1240  * Verdict: UNCERTAIN
1241 1241  * Moderate confidence (40-60%)
... ... @@ -1248,7 +1248,6 @@
1248 1248  **Input:** Article making medical claims
1249 1249  
1250 1250  **Expected Output:**
1251 -
1252 1252  * Extract claim
1253 1253  * Verdict: [appropriate based on evidence]
1254 1254  * Risk tier: A (High - medical)
... ... @@ -1266,7 +1266,6 @@
1266 1266  **Option A: GO (Proceed to POC2)**
1267 1267  
1268 1268  **Conditions:**
1269 -
1270 1270  * AI quality ≥70% without manual editing
1271 1271  * Basic claim → verdict pipeline validated
1272 1272  * Internal + advisor feedback positive
... ... @@ -1275,7 +1275,6 @@
1275 1275  * Clear path to improving AI quality to ≥90%
1276 1276  
1277 1277  **Next Steps:**
1278 -
1279 1279  * Plan POC2 development (add scenarios)
1280 1280  * Design scenario architecture
1281 1281  * Expand to Evidence Model structure
... ... @@ -1284,7 +1284,6 @@
1284 1284  **Option B: NO-GO (Pivot or Stop)**
1285 1285  
1286 1286  **Conditions:**
1287 -
1288 1288  * AI quality < 60%
1289 1289  * Requires manual editing for most analyses (> 50%)
1290 1290  * Feedback indicates fundamental flaws
... ... @@ -1292,7 +1292,6 @@
1292 1292  * No clear path to improvement
1293 1293  
1294 1294  **Next Steps:**
1295 -
1296 1296  * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1297 1297  * **Stop:** Conclude approach not viable, revisit later
1298 1298  
... ... @@ -1299,7 +1299,6 @@
1299 1299  **Option C: ITERATE (Improve POC)**
1300 1300  
1301 1301  **Conditions:**
1302 -
1303 1303  * Concept has merit but execution needs work
1304 1304  * Specific improvements identified
1305 1305  * Addressable with better prompts/approach
... ... @@ -1306,7 +1306,6 @@
1306 1306  * AI quality between 60-70%
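
The numeric part of the three options can be summarized in a small sketch. It covers only the AI-quality thresholds stated in the conditions (GO at ≥70%, NO-GO below 60%, ITERATE in between); the qualitative conditions such as reviewer feedback still need human judgment. The function name and the use of a 0.0 to 1.0 fraction are assumptions for illustration.

```python
def decide(ai_quality: float) -> str:
    """Map a measured AI quality score (fraction between 0.0 and 1.0) to an option."""
    if not 0.0 <= ai_quality <= 1.0:
        raise ValueError("ai_quality must be between 0.0 and 1.0")
    if ai_quality >= 0.70:
        return "GO"        # Option A: proceed to POC2
    if ai_quality < 0.60:
        return "NO-GO"     # Option B: pivot or stop
    return "ITERATE"       # Option C: quality between 60% and 70%
```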
1307 1307  
1308 1308  **Next Steps:**
1309 -
1310 1310  * Improve AI prompts
1311 1311  * Test different approaches
1312 1312  * Re-run POC with improvements
... ... @@ -1328,7 +1328,6 @@
1328 1328  **Impact:** POC fails
1329 1329  
1330 1330  **Mitigation:**
1331 -
1332 1332  * Extensive prompt engineering and testing
1333 1333  * Use best available AI models (Sonnet 4.5)
1334 1334  * Test with diverse article types
... ... @@ -1342,7 +1342,6 @@
1342 1342  **Impact:** Works sometimes, fails other times
1343 1343  
1344 1344  **Mitigation:**
1345 -
1346 1346  * Test with 10+ diverse articles
1347 1347  * Measure success rate honestly
1348 1348  * Improve prompts to increase consistency
... ... @@ -1355,7 +1355,6 @@
1355 1355  **Impact:** Users can't understand analysis
1356 1356  
1357 1357  **Mitigation:**
1358 -
1359 1359  * Create clear explainer document
1360 1360  * Iterate on output format
1361 1361  * Test with non-technical reviewers
... ... @@ -1369,7 +1369,6 @@
1369 1369  **Impact:** System slow or expensive
1370 1370  
1371 1371  **Mitigation:**
1372 -
1373 1373  * Monitor API usage
1374 1374  * Implement retry logic
1375 1375  * Estimate costs before scaling
... ... @@ -1382,7 +1382,6 @@
1382 1382  **Impact:** POC becomes too complex
1383 1383  
1384 1384  **Mitigation:**
1385 -
1386 1386  * Strict scope discipline
1387 1387  * Say NO to feature additions
1388 1388  * Keep focus on core question
... ... @@ -1393,15 +1393,12 @@
1393 1393  
1394 1394  === 15.1 Core Principles ===
1395 1395  
1396 -*
1397 -**
1398 -**1. Build Less, Learn More
1310 +**1. Build Less, Learn More**
1399 1399  * Minimum features to test hypothesis
1400 1400  * Don't build unvalidated features
1401 1401  * Focus on core question only
1402 1402  
1403 1403  **2. Fail Fast**
1404 -
1405 1405  * Quick test of hardest part (AI capability)
1406 1406  * Accept that POC might fail
1407 1407  * Better to discover issues early
... ... @@ -1408,19 +1408,16 @@
1408 1408  * Honest assessment over optimistic hope
1409 1409  
1410 1410  **3. Test First, Build Second**
1411 -
1412 1412  * Validate AI can do this before building platform
1413 1413  * Don't assume it will work
1414 1414  * Let results guide decisions
1415 1415  
1416 1416  **4. Automation First**
1417 -
1418 1418  * No manual editing allowed
1419 1419  * Tests scalability, not just feasibility
1420 1420  * Proves approach can work at scale
1421 1421  
1422 1422  **5. Honest Assessment**
1423 -
1424 1424  * Don't cherry-pick examples
1425 1425  * Don't manually fix bad outputs
1426 1426  * Document failures openly
... ... @@ -1441,12 +1441,9 @@
1441 1441  ❌ Perfectly accurate analysis
1442 1442  ❌ Polished user experience
1443 1443  
1444 -== 16. Success ==
1352 +== 16. Success = Clear Path Forward ==
1445 1445  
1446 - Clear Path Forward ==
1447 -
1448 1448  **If POC succeeds (≥70% AI quality):**
1449 -
1450 1450  * ✅ Approach validated
1451 1451  * ✅ Proceed to POC2 (add scenarios)
1452 1452  * ✅ Design full Evidence Model structure
... ... @@ -1454,7 +1454,6 @@
1454 1454  * ✅ Focus on improving AI quality from 70% → 90%
1455 1455  
1456 1456  **If POC fails (< 60% AI quality):**
1457 -
1458 1458  * ✅ Learn what doesn't work
1459 1459  * ✅ Pivot to different approach
1460 1460  * ✅ OR wait for better AI technology
... ... @@ -1481,16 +1481,16 @@
1481 1481  **POC1 Implementation:**
1482 1482  
1483 1483  * **Primary Provider:** Anthropic Claude API
1484 -* Stage 1: Claude Haiku 4
1485 -* Stage 2: Claude Sonnet 3.5 (cached)
1486 -* Stage 3: Claude Sonnet 3.5
1388 + * Stage 1: Claude Haiku 4
1389 + * Stage 2: Claude Sonnet 3.5 (cached)
1390 + * Stage 3: Claude Sonnet 3.5
1487 1487  
1488 1488  * **Provider Interface:** Abstract LLMProvider interface implemented
1489 1489  
1490 1490  * **Configuration:** Environment variables for provider selection
1491 -* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1492 -* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1493 -* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1395 + * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1396 + * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1397 + * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1494 1494  
1495 1495  * **Failover:** Basic error handling with cache fallback for Stage 2
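
A minimal sketch of how the provider abstraction, the environment-variable configuration, and the Stage 2 cache fallback could fit together. The `complete` method signature, the `load_config` and `run_stage2` helpers, and the dict-based cache are assumptions made for illustration; the actual interface is defined in the API specification.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Mapping, MutableMapping


class LLMProvider(ABC):
    """Hypothetical shape of the abstract provider interface."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Send the prompt to the given model and return its text response."""


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read provider and model selection, defaulting to the values listed above."""
    return {
        "provider": env.get("LLM_PRIMARY_PROVIDER", "anthropic"),
        "stage1_model": env.get("LLM_STAGE1_MODEL", "claude-haiku-4"),
        "stage2_model": env.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5"),
    }


def run_stage2(provider: LLMProvider, model: str, prompt: str,
               cache: MutableMapping[str, str]) -> str:
    """Call the provider; on failure, fall back to a cached result if one exists."""
    try:
        result = provider.complete(model, prompt)
    except Exception:
        cached = cache.get(prompt)
        if cached is None:
            raise  # nothing cached, so surface the provider error
        return cached
    cache[prompt] = result  # refresh the cache on success
    return result
```

Because callers depend only on `LLMProvider`, a second provider can be added later by writing another subclass and changing `LLM_PRIMARY_PROVIDER`.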
1496 1496  
... ... @@ -1510,10 +1510,9 @@
1510 1510  * Cost tracking includes provider name in logs
1511 1511  * Stage 2 falls back to cache on provider failure
1512 1512  
1513 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.105.Specification.POC.API-and-Schemas.WebHome]] Section 6
1417 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1514 1514  
1515 1515  **Dependencies:**
1516 -
1517 1517  * NFR-14 (Main Requirements)
1518 1518  * Design Decision 9
1519 1519  * Architecture Section 2.2
... ... @@ -1521,3 +1521,5 @@
1521 1521  **Priority:** HIGH (P1)
1522 1522  
1523 1523  **Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
1427 +
1428 +