Last modified by Robert Schaub on 2025/12/24 20:16

From version 2.1
edited by Robert Schaub
on 2025/12/24 19:51
Change comment: Imported from XAR
To version 2.2
edited by Robert Schaub
on 2025/12/24 20:16
Change comment: Renamed back-links.

Page properties
Content
... ... @@ -18,9 +18,11 @@
18 18  === 1.1 What POC Tests ===
19 19  
20 20  **Core Question:**
21 +
21 21  > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
22 22  
23 23  **What we're proving:**
25 +
24 24  * AI can identify factual claims from text
25 25  * AI can evaluate those claims and produce verdicts
26 26  * Output is comprehensible and useful
... ... @@ -27,6 +27,7 @@
27 27  * Fully automated approach is viable
28 28  
29 29  **What we're NOT testing:**
32 +
30 30  * Scenario generation (deferred to POC2)
31 31  * Evidence display (deferred to POC2)
32 32  * Production scalability
... ... @@ -40,6 +40,7 @@
40 40  Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
41 41  
42 42  **Rationale:**
46 +
43 43  * **POC1 tests:** Can AI extract claims and generate verdicts?
44 44  * **POC2 will add:** Scenario generation and management
45 45  * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
... ... @@ -51,6 +51,7 @@
51 51  **No Risk:**
52 52  
53 53  Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58 +
54 54  * Faster POC1 validation
55 55  * Learning from POC1 to inform scenario design
56 56  * Iterative approach: fail fast if basic AI doesn't work
... ... @@ -57,14 +57,10 @@
57 57  * Flexibility to adjust scenario architecture based on POC1 insights
58 58  
59 59  **Full System Workflow (Future):**
60 -{{code}}
61 -Claims → Scenarios → Evidence → Verdicts
62 -{{/code}}
65 +{{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
63 63  
64 64  **POC1 Simplified Workflow:**
65 -{{code}}
66 -Claims → Verdicts (scenarios implicit in reasoning)
67 -{{/code}}
68 +{{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
68 68  
69 69  == 2. POC Output Specification ==
70 70  
... ... @@ -75,6 +75,7 @@
75 75  **Length:** 4-6 sentences
76 76  
77 77  **Content (Required Elements):**
79 +
78 78  1. **Article's main thesis/claim** - What is the article trying to argue or prove?
79 79  2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
80 80  3. **Central vs. supporting claims** - Which claims are central to the article's argument?
... ... @@ -84,30 +84,28 @@
84 84  **Critical Innovation:**
85 85  
86 86  POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89 +
87 87  * State accurate supporting facts but draw unsupported conclusions
88 88  * Have one false central claim that invalidates the whole argument
89 89  * Misframe accurate information to mislead
90 90  
91 91  **Good Example (Context-Aware):**
92 -{{code}}
93 -This article argues that coffee cures cancer based on its antioxidant
95 +{{code}}This article argues that coffee cures cancer based on its antioxidant
94 94  content. We analyzed 3 factual claims: 2 about coffee's chemical
95 95  properties are well-supported, but the main causal claim is refuted
96 96  by current evidence. The article confuses correlation with causation.
97 97  Overall assessment: MISLEADING - makes an unsupported medical claim
98 -despite citing some accurate facts.
99 -{{/code}}
100 +despite citing some accurate facts.{{/code}}
100 100  
101 101  **Poor Example (Simple Aggregation - Don't Do This):**
102 -{{code}}
103 -This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 -Overall assessment: mostly accurate (67% accurate).
105 -{{/code}}
103 +{{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 +Overall assessment: mostly accurate (67% accurate).{{/code}}
106 106  ↑ This misses that the refuted claim IS the article's main point!
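
The contrast between the two examples can be sketched in code. This is a minimal illustration, not FactHarbor's actual logic: the `Claim` type, the `is_central` flag, both function names, and the sample claim texts are hypothetical, and the rule "a refuted central claim makes the article MISLEADING" is an assumption drawn from the coffee example.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str      # a POC verdict label, e.g. "WELL-SUPPORTED" or "REFUTED"
    is_central: bool  # hypothetical flag: does this claim carry the article's thesis?


def naive_assessment(claims):
    """Simple aggregation: share of well-supported claims (the approach to avoid)."""
    supported = sum(1 for c in claims if c.verdict == "WELL-SUPPORTED")
    return f"{round(100 * supported / len(claims))}% accurate"


def context_aware_assessment(claims):
    """Assumed rule: a refuted central claim outweighs accurate supporting facts."""
    if any(c.is_central and c.verdict == "REFUTED" for c in claims):
        return "MISLEADING"
    return naive_assessment(claims)


# Illustrative claims modelled on the coffee article example.
claims = [
    Claim("Coffee contains antioxidants", "WELL-SUPPORTED", False),
    Claim("Antioxidants can protect cells", "WELL-SUPPORTED", False),
    Claim("Coffee cures cancer", "REFUTED", True),
]

print(naive_assessment(claims))          # 67% accurate
print(context_aware_assessment(claims))  # MISLEADING
```

In POC1 the model is expected to make this judgment itself inside the single prompt; the sketch only shows why the two assessments diverge on the same three verdicts.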
107 107  
108 108  **What POC1 Tests:**
109 109  
110 110  Can AI identify and assess:
110 +
111 111  * ✅ The article's main thesis/conclusion?
112 112  * ✅ Which claims are central vs. supporting?
113 113  * ✅ Whether the evidence supports the conclusion?
... ... @@ -116,6 +116,7 @@
116 116  **If AI Cannot Do This:**
117 117  
118 118  That's valuable to learn in POC1! We'll:
119 +
119 119  * Note it as a limitation
120 120  * Fall back to simple aggregation with warning
121 121  * Design explicit article-level analysis for POC2
... ... @@ -126,19 +126,18 @@
126 126  **Format:** Numbered list
127 127  **Quantity:** 3-5 claims
128 128  **Requirements:**
130 +
129 129  * Factual claims only (not opinions/questions)
130 130  * Clearly stated
131 131  * Automatically extracted by AI
132 132  
133 133  **Example:**
134 -{{code}}
135 -CLAIMS IDENTIFIED:
136 +{{code}}CLAIMS IDENTIFIED:
136 136  
137 137  [1] Coffee reduces diabetes risk by 30%
138 138  [2] Coffee improves heart health
139 139  [3] Decaf has same benefits as regular
140 -[4] Coffee prevents Alzheimer's completely
141 -{{/code}}
141 +[4] Coffee prevents Alzheimer's completely{{/code}}
142 142  
143 143  === 2.3 Component 3: CLAIMS VERDICTS ===
144 144  
... ... @@ -146,6 +146,7 @@
146 146  **Format:** Per claim structure
147 147  
148 148  **Required Elements:**
149 +
149 149  * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
150 150  * **Confidence Score:** 0-100%
151 151  * **Brief Reasoning:** 1-3 sentences explaining why
... ... @@ -152,8 +152,7 @@
152 152  * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
153 153  
154 154  **Example:**
155 -{{code}}
156 -VERDICTS:
156 +{{code}}VERDICTS:
157 157  
158 158  [1] WELL-SUPPORTED (85%) [Risk: C]
159 159  Multiple studies confirm 25-30% risk reduction with regular consumption.
... ... @@ -165,10 +165,10 @@
165 165  Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166 166  
167 167  [4] REFUTED (90%) [Risk: B]
168 -No evidence for complete prevention. Claim is significantly overstated.
169 -{{/code}}
168 +No evidence for complete prevention. Claim is significantly overstated.{{/code}}
170 170  
171 171  **Risk Tier Display:**
171 +
172 172  * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 173  * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 174  * **Tier C (Green):** Low Risk - Facts/Definitions/History
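
The required verdict elements and risk tiers could be held in a small record type. The sketch below is one possible shape, assuming Python on the backend; the `ClaimVerdict` class and its field names are hypothetical, while the verdict labels, the 0-100% confidence range, and the A/B/C tiers come from this section.

```python
from dataclasses import dataclass

VERDICT_LABELS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_TIERS = {"A": "High", "B": "Medium", "C": "Low"}


@dataclass
class ClaimVerdict:
    claim_number: int
    label: str       # one of VERDICT_LABELS
    confidence: int  # percent, 0-100
    risk_tier: str   # "A", "B" or "C"
    reasoning: str   # 1-3 sentences

    def __post_init__(self):
        # Reject values outside the ranges the specification allows.
        if self.label not in VERDICT_LABELS:
            raise ValueError(f"unknown verdict label: {self.label!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")

    def header(self):
        """Render the first line of a verdict as in the example output."""
        return f"[{self.claim_number}] {self.label} ({self.confidence}%) [Risk: {self.risk_tier}]"


verdict = ClaimVerdict(
    4, "REFUTED", 90, "B",
    "No evidence for complete prevention. Claim is significantly overstated.",
)
print(verdict.header())  # [4] REFUTED (90%) [Risk: B]
```

Validating at construction time means a malformed AI response fails loudly instead of being displayed, which fits the rule that output is never corrected by hand.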
... ... @@ -182,13 +182,11 @@
182 182  **Tone:** Neutral (article's position, not FactHarbor's analysis)
183 183  
184 184  **Example:**
185 -{{code}}
186 -ARTICLE SUMMARY:
185 +{{code}}ARTICLE SUMMARY:
187 187  
188 188  Health News Today article discusses coffee benefits, citing studies
189 189  on diabetes and Alzheimer's. Author highlights research linking coffee
190 -to disease prevention. Recommends 2-3 cups daily for optimal health.
191 -{{/code}}
189 +to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
192 192  
193 193  === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
194 194  
... ... @@ -195,6 +195,7 @@
195 195  **What:** LLM usage metrics for cost optimization and scaling decisions
196 196  
197 197  **Purpose:**
196 +
198 198  * Understand cost per analysis
199 199  * Identify optimization opportunities
200 200  * Project costs at scale
... ... @@ -201,8 +201,7 @@
201 201  * Inform architecture decisions
202 202  
203 203  **Display Format:**
204 -{{code}}
205 -USAGE STATISTICS:
203 +{{code}}USAGE STATISTICS:
206 206  • Article: 2,450 words (12,300 characters)
207 207  • Input tokens: 15,234
208 208  • Output tokens: 892
... ... @@ -210,17 +210,18 @@
210 210  • Estimated cost: $0.24 USD
211 211  • Response time: 8.3 seconds
212 212  • Cost per claim: $0.048
213 -• Model: claude-sonnet-4-20250514
214 -{{/code}}
211 +• Model: claude-sonnet-4-20250514{{/code}}
215 215  
216 216  **Why This Matters:**
217 217  
218 218  At scale, LLM costs are critical:
216 +
219 219  * 10,000 articles/month ≈ $200-500/month
220 220  * 100,000 articles/month ≈ $2,000-5,000/month
221 221  * Cost optimization can reduce expenses 30-50%
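
The scale figures follow from simple multiplication. As a rough sketch (the function names are hypothetical, and the per-article range of $0.02-0.05 is only what the monthly figures above imply, not a measured value):

```python
def cost_per_claim(total_cost_usd, claim_count):
    """Spread the cost of one analysis evenly over its extracted claims."""
    return total_cost_usd / claim_count


def project_monthly_cost(articles_per_month, cost_per_article_usd):
    """Linear projection; assumes cost per article stays constant at scale."""
    return articles_per_month * cost_per_article_usd


# Per-article range implied by "10,000 articles/month is roughly $200-500/month".
low = project_monthly_cost(10_000, 0.02)
high = project_monthly_cost(10_000, 0.05)
print(f"${low:,.0f}-{high:,.0f} per month")

# Figures from the display example: $0.24 for an analysis with 5 claims.
print(f"${cost_per_claim(0.24, 5):.3f} per claim")
```

Whether cost really scales linearly with volume and article length is one of the questions POC1 is meant to answer, so the projection should be treated as a starting estimate.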
222 222  
223 223  **What POC1 Learns:**
222 +
224 224  * How cost scales with article length
225 225  * Prompt optimization opportunities (caching, compression)
226 226  * Output verbosity tradeoffs
... ... @@ -228,6 +228,7 @@
228 228  * Article length limits (if needed)
229 229  
230 230  **Implementation:**
230 +
231 231  * Claude API already returns usage data
232 232  * No extra API calls needed
233 233  * Display to user + log for aggregate analysis
... ... @@ -237,7 +237,8 @@
237 237  
238 238  === 2.6 Total Output Size ===
239 239  
240 -**Combined:** ~220-350 words
240 +**Combined:** 220-350 words
241 +
241 241  * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
242 242  * Claims Identification: 30-50 words
243 243  * Claims Verdicts: 100-150 words
... ... @@ -252,6 +252,7 @@
252 252  The following are **explicitly excluded** from POC:
253 253  
254 254  **Content Features:**
256 +
255 255  * ❌ Scenarios (deferred to POC2)
256 256  * ❌ Evidence display (supporting/opposing lists)
257 257  * ❌ Source links (clickable references)
... ... @@ -261,6 +261,7 @@
261 261  * ❌ Risk assessment (shown but not workflow-integrated)
262 262  
263 263  **Platform Features:**
266 +
264 264  * ❌ User accounts / authentication
265 265  * ❌ Saved history
266 266  * ❌ Search functionality
... ... @@ -270,6 +270,7 @@
270 270  * ❌ Social sharing
271 271  
272 272  **Technical Features:**
276 +
273 273  * ❌ Browser extensions
274 274  * ❌ Mobile apps
275 275  * ❌ API endpoints
... ... @@ -277,6 +277,7 @@
277 277  * ❌ Export features (PDF, CSV)
278 278  
279 279  **Quality Features:**
284 +
280 280  * ❌ Accessibility (WCAG compliance)
281 281  * ❌ Multilingual support
282 282  * ❌ Mobile optimization
... ... @@ -283,6 +283,7 @@
283 283  * ❌ Media verification (images/videos)
284 284  
285 285  **Production Features:**
291 +
286 286  * ❌ Security hardening
287 287  * ❌ Privacy compliance (GDPR)
288 288  * ❌ Terms of service
... ... @@ -296,17 +296,13 @@
296 296  === 4.1 Architecture Comparison ===
297 297  
298 298  **POC Architecture (Simplified):**
299 -{{code}}
300 -User Input → Single AKEL Call → Output Display
301 - (all processing)
302 -{{/code}}
305 +{{code}}User Input → Single AKEL Call → Output Display
306 + (all processing){{/code}}
303 303  
304 304  **Full System Architecture:**
305 -{{code}}
306 -User Input → Claim Extractor → Claim Classifier → Scenario Generator
309 +{{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
307 307  → Evidence Summarizer → Contradiction Detector → Verdict Generator
308 -→ Quality Gates → Publication → Output Display
309 -{{/code}}
311 +→ Quality Gates → Publication → Output Display{{/code}}
310 310  
311 311  **Key Differences:**
312 312  
... ... @@ -322,12 +322,14 @@
322 322  === 4.2 Workflow Comparison ===
323 323  
324 324  **POC1 Workflow:**
327 +
325 325  1. User submits text/URL
326 326  2. Single AKEL call (all processing in one prompt)
327 327  3. Display results
328 -**Total: 3 steps, ~10-18 seconds**
331 +**Total: 3 steps, 10-18 seconds**
329 329  
330 330  **Full System Workflow:**
334 +
331 331  1. **Claim Submission** (extraction, normalization, clustering)
332 332  2. **Scenario Building** (definitions, assumptions, boundaries)
333 333  3. **Evidence Handling** (retrieval, assessment, linking)
... ... @@ -334,7 +334,7 @@
334 334  4. **Verdict Creation** (synthesis, reasoning, approval)
335 335  5. **Public Presentation** (summaries, landscapes, deep dives)
336 336  6. **Time Evolution** (versioning, re-evaluation triggers)
337 -**Total: 6 phases with quality gates, ~10-30 seconds**
341 +**Total: 6 phases with quality gates, 10-30 seconds**
338 338  
339 339  === 4.3 Why POC is Simplified ===
340 340  
... ... @@ -357,6 +357,7 @@
357 357  === 4.4 Gap Between POC1 and POC2/Beta ===
358 358  
359 359  **What needs to be built for POC2:**
364 +
360 360  * Scenario generation component
361 361  * Evidence Model structure (full)
362 362  * Scenario-evidence linking
... ... @@ -364,6 +364,7 @@
364 364  * Truth landscape visualization
365 365  
366 366  **What needs to be built for Beta:**
372 +
367 367  * Multi-component AKEL pipeline
368 368  * Quality gate infrastructure
369 369  * Review workflow system
... ... @@ -380,6 +380,7 @@
380 380  **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
381 381  
382 382  Per FactHarbor Specification Section 11 "POC v1 Behavior":
389 +
383 383  * Produces public AI-generated output
384 384  * No human approval gate
385 385  * Clear AI-Generated labeling
... ... @@ -389,8 +389,7 @@
389 389  === 5.2 User-Facing Labels ===
390 390  
391 391  **Primary Label (top of analysis):**
392 -{{code}}
393 -╔════════════════════════════════════════════════════════════╗
399 +{{code}}╔════════════════════════════════════════════════════════════╗
394 394  ║ [AI-GENERATED - POC/DEMO] ║
395 395  ║ ║
396 396  ║ This analysis was produced entirely by AI and has not ║
... ... @@ -400,10 +400,10 @@
400 400  ║ Review Status: Not Reviewed (Proof-of-Concept) ║
401 401  ║ Quality Gates: 4/4 Passed (Simplified) ║
402 402  ║ Last Updated: [timestamp] ║
403 -╚════════════════════════════════════════════════════════════╝
404 -{{/code}}
409 +╚════════════════════════════════════════════════════════════╝{{/code}}
405 405  
406 406  **Per-Claim Risk Labels:**
412 +
407 407  * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
408 408  * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
409 409  * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
... ... @@ -411,6 +411,7 @@
411 411  === 5.3 Display Requirements ===
412 412  
413 413  **Must Show:**
420 +
414 414  * AI-Generated status (prominent)
415 415  * POC/Demo disclaimer
416 416  * Risk tier per claim
... ... @@ -419,6 +419,7 @@
419 419  * Timestamp
420 420  
421 421  **Must NOT Claim:**
429 +
422 422  * Human review
423 423  * Production quality
424 424  * Medical/legal advice
... ... @@ -442,6 +442,7 @@
442 442  Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
443 443  
444 444  **Full System Has 4 Gates:**
453 +
445 445  1. Source Quality
446 446  2. Contradiction Search (MANDATORY)
447 447  3. Uncertainty Quantification
... ... @@ -448,6 +448,7 @@
448 448  4. Structural Integrity
449 449  
450 450  **POC Implements Simplified Versions:**
460 +
451 451  * Focus on demonstrating concept
452 452  * Basic implementations are sufficient
453 453  * Failures displayed to user (not blocking)
... ... @@ -456,6 +456,7 @@
456 456  === 6.2 Gate 1: Source Quality (Basic) ===
457 457  
458 458  **Full System Requirements:**
469 +
459 459  * Primary sources identified and accessible
460 460  * Source reliability scored against whitelist
461 461  * Citation completeness verified
... ... @@ -463,6 +463,7 @@
463 463  * Author credentials validated
464 464  
465 465  **POC Implementation:**
477 +
466 466  * ✅ At least 2 sources found
467 467  * ✅ Sources accessible (URLs valid)
468 468  * ❌ No whitelist checking
... ... @@ -476,6 +476,7 @@
476 476  === 6.3 Gate 2: Contradiction Search (Basic) ===
477 477  
478 478  **Full System Requirements:**
491 +
479 479  * Counter-evidence actively searched
480 480  * Reservations and limitations identified
481 481  * Alternative interpretations explored
... ... @@ -484,6 +484,7 @@
484 484  * Academic literature (supporting AND opposing)
485 485  
486 486  **POC Implementation:**
500 +
487 487  * ✅ Basic search for counter-evidence
488 488  * ✅ Identify obvious contradictions
489 489  * ❌ No comprehensive academic search
... ... @@ -498,6 +498,7 @@
498 498  === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
499 499  
500 500  **Full System Requirements:**
515 +
501 501  * Confidence scores calculated for all claims/verdicts
502 502  * Limitations explicitly stated
503 503  * Data gaps identified and disclosed
... ... @@ -505,6 +505,7 @@
505 505  * Alternative scenarios considered
506 506  
507 507  **POC Implementation:**
523 +
508 508  * ✅ Confidence scores (0-100%)
509 509  * ✅ Basic uncertainty acknowledgment
510 510  * ❌ No detailed limitation disclosure
... ... @@ -518,6 +518,7 @@
518 518  === 6.5 Gate 4: Structural Integrity (Basic) ===
519 519  
520 520  **Full System Requirements:**
537 +
521 521  * No hallucinations detected (fact-checking against sources)
522 522  * Logic chain valid and traceable
523 523  * References accessible and verifiable
... ... @@ -525,6 +525,7 @@
525 525  * Premises clearly stated
526 526  
527 527  **POC Implementation:**
545 +
528 528  * ✅ Basic coherence check
529 529  * ✅ References accessible
530 530  * ❌ No comprehensive hallucination detection
... ... @@ -538,24 +538,20 @@
538 538  === 6.6 Quality Gate Display ===
539 539  
540 540  **POC shows simplified status:**
541 -{{code}}
542 -Quality Gates: 4/4 Passed (Simplified)
559 +{{code}}Quality Gates: 4/4 Passed (Simplified)
543 543  ✓ Source Quality: 3 sources found
544 544  ✓ Contradiction Search: Basic search completed
545 545  ✓ Uncertainty: Confidence scores assigned
546 -✓ Structural Integrity: Output coherent
547 -{{/code}}
563 +✓ Structural Integrity: Output coherent{{/code}}
548 548  
549 549  **If any gate fails:**
550 -{{code}}
551 -Quality Gates: 3/4 Passed (Simplified)
566 +{{code}}Quality Gates: 3/4 Passed (Simplified)
552 552  ✓ Source Quality: 3 sources found
553 553  ✗ Contradiction Search: Search failed - limited evidence
554 554  ✓ Uncertainty: Confidence scores assigned
555 555  ✓ Structural Integrity: Output coherent
556 556  
557 -Note: This analysis has limited evidence. Use with caution.
558 -{{/code}}
572 +Note: This analysis has limited evidence. Use with caution.{{/code}}
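
The status block above could be produced by a small formatter. This is a sketch under assumptions: the `format_gate_status` name and the `(name, passed, detail)` tuple shape are hypothetical, and appending the same caution note for any failed gate is an assumption (the specification shows it only for a failed contradiction search).

```python
def format_gate_status(results):
    """Render gate results; each item is a (gate_name, passed, detail) tuple."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        lines.append(f"{mark} {name}: {detail}")
    if passed < len(results):
        # Failures are shown to the user but do not block the output.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)


results = [
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
]
report = format_gate_status(results)
print(report)
```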
559 559  
560 560  === 6.7 Simplified vs. Full System ===
561 561  
... ... @@ -572,6 +572,7 @@
572 572  === 7.1 POC AKEL (Simplified) ===
573 573  
574 574  **Implementation:**
589 +
575 575  * Single Claude API call (Sonnet 4.5)
576 576  * One comprehensive prompt
577 577  * All processing in single request
... ... @@ -579,8 +579,7 @@
579 579  * No orchestration layer
580 580  
581 581  **Prompt Structure:**
582 -{{code}}
583 -Task: Analyze this article and provide:
597 +{{code}}Task: Analyze this article and provide:
584 584  
585 585  1. Extract 3-5 factual claims
586 586  2. For each claim:
... ... @@ -592,8 +592,7 @@
592 592  4. Generate article summary (3-5 sentences)
593 593  5. Run basic quality checks
594 594  
595 -Return as structured JSON.
596 -{{/code}}
609 +Return as structured JSON.{{/code}}
597 597  
598 598  **Processing Time:** 10-18 seconds (estimate)
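
Because the prompt asks for structured JSON, the backend can check the response shape before displaying anything. The sketch below assumes hypothetical key names (`claims`, `article_summary`); the specification does not define the JSON schema, so these would need to match whatever the final prompt requests.

```python
import json

REQUIRED_KEYS = ("claims", "article_summary")  # hypothetical key names


def parse_analysis(raw_json):
    """Parse the model's JSON reply and enforce the 3-5 claim requirement."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed output
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    if not 3 <= len(data["claims"]) <= 5:
        raise ValueError("expected 3-5 extracted claims")
    return data


sample = json.dumps({
    "claims": [
        {"text": "Coffee reduces diabetes risk by 30%"},
        {"text": "Coffee improves heart health"},
        {"text": "Decaf has same benefits as regular"},
    ],
    "article_summary": "Health News Today article discusses coffee benefits.",
})
analysis = parse_analysis(sample)
print(len(analysis["claims"]))  # 3
```

A `ValueError` or `json.JSONDecodeError` here maps onto the "display error message, user can retry" behaviour; the response is never repaired manually.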
599 599  
... ... @@ -600,8 +600,7 @@
600 600  === 7.2 Full System AKEL (Production) ===
601 601  
602 602  **Architecture:**
603 -{{code}}
604 -AKEL Orchestrator
616 +{{code}}AKEL Orchestrator
605 605  ├── Claim Extractor
606 606  ├── Claim Classifier (with risk tier assignment)
607 607  ├── Scenario Generator
... ... @@ -609,10 +609,10 @@
609 609  ├── Contradiction Detector
610 610  ├── Quality Gate Validator
611 611  ├── Audit Sampling Scheduler
612 -└── Federation Sync Adapter (Release 1.0+)
613 -{{/code}}
624 +└── Federation Sync Adapter (Release 1.0+){{/code}}
614 614  
615 615  **Processing:**
627 +
616 616  * Parallel processing where possible
617 617  * Separate component calls
618 618  * Quality gates between phases
... ... @@ -624,6 +624,7 @@
624 624  === 7.3 Why POC Uses Single Call ===
625 625  
626 626  **Advantages:**
639 +
627 627  * ✅ Simpler to implement
628 628  * ✅ Faster POC development
629 629  * ✅ Easier to debug
... ... @@ -631,6 +631,7 @@
631 631  * ✅ Good enough for concept validation
632 632  
633 633  **Limitations:**
647 +
634 634  * ❌ No component reusability
635 635  * ❌ No parallel processing
636 636  * ❌ All-or-nothing (can't partially succeed)
... ... @@ -657,6 +657,7 @@
657 657  **Requirement:** User can submit article for analysis
658 658  
659 659  **Functionality:**
674 +
660 660  * Text input field (paste article text, up to 5000 characters)
661 661  * URL input field (paste article URL)
662 662  * "Analyze" button to trigger processing
... ... @@ -663,6 +663,7 @@
663 663  * Loading indicator during analysis
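
A minimal input check for the two fields might look like the following. It is a sketch only: the `validate_submission` name, the preference for the URL when both fields are filled, and the restriction to http(s) URLs are assumptions; the 5000-character limit comes from the functionality list above.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated for the text input field


def validate_submission(text="", url=""):
    """Return ("url", value) or ("text", value); raise ValueError on bad input."""
    text, url = text.strip(), url.strip()
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("URL must be a valid http(s) address")
        return ("url", url)
    if not text:
        raise ValueError("Paste article text or an article URL")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"Text exceeds {MAX_TEXT_CHARS} characters")
    return ("text", text)


print(validate_submission(url="https://example.com/article"))
```

The error messages are meant to be shown to the user directly, so that a rejected submission can be corrected and retried.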
664 664  
665 665  **Excluded:**
681 +
666 666  * No user authentication
667 667  * No claim history
668 668  * No search functionality
... ... @@ -669,6 +669,7 @@
669 669  * No saved templates
670 670  
671 671  **Acceptance Criteria:**
688 +
672 672  * User can paste text from article
673 673  * User can paste URL of article
674 674  * System accepts input and triggers analysis
... ... @@ -678,6 +678,7 @@
678 678  **Requirement:** AI automatically extracts 3-5 factual claims
679 679  
680 680  **Functionality:**
698 +
681 681  * AI reads article text
682 682  * AI identifies factual claims (not opinions/questions)
683 683  * AI extracts 3-5 most important claims
... ... @@ -684,6 +684,7 @@
684 684  * System displays numbered list
685 685  
686 686  **Critical:** NO MANUAL EDITING ALLOWED
705 +
687 687  * AI selects which claims to extract
688 688  * AI identifies factual vs. non-factual
689 689  * System processes claims as extracted
... ... @@ -690,11 +690,13 @@
690 690  * No human curation or correction
691 691  
692 692  **Error Handling:**
712 +
693 693  * If extraction fails: Display error message
694 694  * User can retry with different input
695 695  * No manual intervention to fix extraction
696 696  
697 697  **Acceptance Criteria:**
718 +
698 698  * AI extracts 3-5 claims automatically
699 699  * Claims are factual (not opinions)
700 700  * Claims are clearly stated
... ... @@ -705,15 +705,17 @@
705 705  **Requirement:** AI automatically generates verdict for each claim
706 706  
707 707  **Functionality:**
729 +
708 708  * For each claim, AI:
709 - * Evaluates claim based on available evidence/knowledge
710 - * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
711 - * Assigns confidence score (0-100%)
712 - * Assigns risk tier (A/B/C)
713 - * Writes brief reasoning (1-3 sentences)
731 +* Evaluates claim based on available evidence/knowledge
732 +* Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 +* Assigns confidence score (0-100%)
734 +* Assigns risk tier (A/B/C)
735 +* Writes brief reasoning (1-3 sentences)
714 714  * System displays verdict for each claim
715 715  
716 716  **Critical:** NO MANUAL EDITING ALLOWED
739 +
717 717  * AI computes verdicts based on evidence
718 718  * AI generates confidence scores
719 719  * AI writes reasoning
... ... @@ -720,11 +720,13 @@
720 720  * No human review or adjustment
721 721  
722 722  **Error Handling:**
746 +
723 723  * If verdict generation fails: Display error message
724 724  * User can retry
725 725  * No manual intervention to adjust verdicts
726 726  
727 727  **Acceptance Criteria:**
752 +
728 728  * Each claim has a verdict
729 729  * Confidence score is displayed (0-100%)
730 730  * Risk tier is displayed (A/B/C)
... ... @@ -737,15 +737,17 @@
737 737  **Requirement:** AI generates brief summary of analysis
738 738  
739 739  **Functionality:**
765 +
740 740  * AI summarizes findings in 3-5 sentences:
741 - * How many claims found
742 - * Distribution of verdicts
743 - * Overall assessment
767 +* How many claims found
768 +* Distribution of verdicts
769 +* Overall assessment
744 744  * System displays at top of results
745 745  
746 746  **Critical:** NO MANUAL EDITING ALLOWED
747 747  
748 748  **Acceptance Criteria:**
775 +
749 749  * Summary is coherent
750 750  * Accurately reflects analysis
751 751  * 3-5 sentences
... ... @@ -756,6 +756,7 @@
756 756  **Requirement:** AI generates brief summary of original article
757 757  
758 758  **Functionality:**
786 +
759 759  * AI summarizes article content (not FactHarbor's analysis)
760 760  * 3-5 sentences
761 761  * System displays
... ... @@ -765,6 +765,7 @@
765 765  **Critical:** NO MANUAL EDITING ALLOWED
766 766  
767 767  **Acceptance Criteria:**
796 +
768 768  * Summary is neutral (article's position)
769 769  * Accurately reflects article content
770 770  * 3-5 sentences
... ... @@ -775,6 +775,7 @@
775 775  **Requirement:** Clear labeling of AI-generated content
776 776  
777 777  **Functionality:**
807 +
778 778  * Display Mode 2 publication label
779 779  * Show POC/Demo disclaimer
780 780  * Display risk tiers per claim
... ... @@ -782,6 +782,7 @@
782 782  * Display timestamp
783 783  
784 784  **Acceptance Criteria:**
815 +
785 785  * Label is prominent and clear
786 786  * User understands this is AI-generated POC output
787 787  * Risk tiers are color-coded
... ... @@ -792,6 +792,7 @@
792 792  **Requirement:** Execute simplified quality gates
793 793  
794 794  **Functionality:**
826 +
795 795  * Check source quality (basic)
796 796  * Attempt contradiction search (basic)
797 797  * Calculate confidence scores
... ... @@ -799,6 +799,7 @@
799 799  * Display gate results
800 800  
801 801  **Acceptance Criteria:**
834 +
802 802  * All 4 gates attempted
803 803  * Pass/fail status displayed
804 804  * Failures explained to user
... ... @@ -813,6 +813,7 @@
813 813  **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
814 814  
815 815  **What this means:**
849 +
816 816  * Claims: AI selects (no human curation)
817 817  * Scenarios: N/A (deferred to POC2)
818 818  * Evidence: AI evaluates (no human selection)
... ... @@ -820,13 +820,12 @@
820 820  * Summaries: AI writes (no human editing)
821 821  
822 822  **Pipeline:**
823 -{{code}}
824 -User Input → AKEL Processing → Output Display
857 +{{code}}User Input → AKEL Processing → Output Display
825 825   ↓
826 - ZERO human editing
827 -{{/code}}
859 + ZERO human editing{{/code}}
828 828  
829 829  **If AI output is poor:**
862 +
830 830  * ❌ Do NOT manually fix it
831 831  * ✅ Document the failure
832 832  * ✅ Improve prompts and retry
... ... @@ -833,6 +833,7 @@
833 833  * ✅ Accept that POC might fail
834 834  
835 835  **Why this matters:**
869 +
836 836  * Tests whether AI can do this without humans
837 837  * Validates scalability (humans can't review every analysis)
838 838  * Honest test of technical feasibility
... ... @@ -842,16 +842,19 @@
842 842  **Requirement:** Analysis completes in reasonable time
843 843  
844 844  **Acceptable Performance:**
879 +
845 845  * Processing time: 1-5 minutes (acceptable for POC)
846 846  * Display loading indicator to user
847 847  * Show progress if possible ("Extracting claims...", "Generating verdicts...")
848 848  
849 849  **Not Required:**
885 +
850 850  * Production-level speed (< 30 seconds)
851 851  * Optimization for scale
852 852  * Caching
853 853  
854 854  **Acceptance Criteria:**
891 +
855 855  * Analysis completes within 5 minutes
856 856  * User sees loading indicator
857 857  * No timeout errors
... ... @@ -861,16 +861,19 @@
861 861  **Requirement:** System works for manual testing sessions
862 862  
863 863  **Acceptable:**
901 +
864 864  * Occasional errors (< 20% failure rate)
865 865  * Manual restart if needed
866 866  * Display error messages clearly
867 867  
868 868  **Not Required:**
907 +
869 869  * 99.9% uptime
870 870  * Automatic error recovery
871 871  * Production monitoring
872 872  
873 873  **Acceptance Criteria:**
913 +
874 874  * System works for test demonstrations
875 875  * Errors are handled gracefully
876 876  * User receives clear error messages
... ... @@ -880,6 +880,7 @@
880 880  **Requirement:** Runs on simple infrastructure
881 881  
882 882  **Acceptable:**
923 +
883 883  * Single machine or simple cloud setup
884 884  * No distributed architecture
885 885  * No load balancing
... ... @@ -887,6 +887,7 @@
887 887  * Local development environment viable
888 888  
889 889  **Not Required:**
931 +
890 890  * Production infrastructure
891 891  * Multi-region deployment
892 892  * Auto-scaling
... ... @@ -897,6 +897,7 @@
897 897  **Requirement:** Track and display LLM usage metrics to inform optimization decisions
898 898  
899 899  **Must Track:**
942 +
900 900  * Input tokens (article + prompt)
901 901  * Output tokens (generated analysis)
902 902  * Total tokens
... ... @@ -905,16 +905,19 @@
905 905  * Article length (words/characters)
906 906  
907 907  **Must Display:**
951 +
908 908  * Usage statistics in UI (Component 5)
909 909  * Cost per analysis
910 910  * Cost per claim extracted
911 911  
912 912  **Must Log:**
957 +
913 913  * Aggregate metrics for analysis
914 914  * Cost distribution by article length
915 915  * Token efficiency trends
916 916  
917 917  **Purpose:**
963 +
918 918  * Understand unit economics
919 919  * Identify optimization opportunities
920 920  * Project costs at scale
... ... @@ -921,6 +921,7 @@
921 921  * Inform architecture decisions (caching, model selection, etc.)
922 922  
923 923  **Acceptance Criteria:**
970 +
924 924  * ✅ Usage data displayed after each analysis
925 925  * ✅ Metrics logged for aggregate analysis
926 926  * ✅ Cost calculated accurately (Claude API pricing)
... ... @@ -928,6 +928,7 @@
928 928  * ✅ POC1 report includes cost analysis section
929 929  
930 930  **Success Target:**
978 +
931 931  * Average cost per analysis < $0.05 USD
932 932  * Cost scaling behavior understood (linear/exponential)
933 933  * 2+ optimization opportunities identified
... ... @@ -939,11 +939,13 @@
939 939  === 10.1 System Components ===
940 940  
941 941  **Frontend:**
990 +
942 942  * Simple HTML form (text input + URL input + button)
943 943  * Loading indicator
944 944  * Results display page (single page, no tabs/navigation)
945 945  
946 946  **Backend:**
996 +
947 947  * Single API endpoint
948 948  * Calls Claude API (Sonnet 4.5 or latest)
949 949  * Parses response
... ... @@ -950,10 +950,12 @@
950 950  * Returns JSON to frontend
951 951  
952 952  **Data Storage:**
1003 +
953 953  * None required (stateless POC)
954 954  * Optional: Simple file storage or SQLite for demo examples
955 955  
956 956  **External Services:**
1008 +
957 957  * Claude API (Anthropic) - required
958 958  * Optional: URL fetch service for article text extraction
959 959  
... ... @@ -986,8 +986,7 @@
986 986  === 10.3 AI Prompt Strategy ===
987 987  
988 988  **Single Comprehensive Prompt:**
989 -{{code}}
990 -Task: Analyze this article and provide:
1041 +{{code}}Task: Analyze this article and provide:
991 991  
992 992  1. Identify the article's main thesis/conclusion
993 993   - What is the article trying to argue or prove?
... ... @@ -1023,8 +1023,7 @@
1023 1023  
1024 1024  7. Write article summary (3-5 sentences: neutral summary of article content)
1025 1025  
1026 -Return as structured JSON with quality gate results.
1027 -{{/code}}
1077 +Return as structured JSON with quality gate results.{{/code}}
1028 1028  
1029 1029  **One prompt generates everything.**
1030 1030  
... ... @@ -1035,25 +1035,30 @@
1035 1035  === 10.4 Technology Stack Suggestions ===
1036 1036  
1037 1037  **Frontend:**
1088 +
1038 1038  * HTML + CSS + JavaScript (minimal framework)
1039 1039  * OR: Next.js (if team prefers)
1040 1040  * Hosted: Local machine OR Vercel/Netlify free tier
1041 1041  
1042 1042  **Backend:**
1094 +
1043 1043  * Python Flask/FastAPI (simple REST API)
1044 1044  * OR: Next.js API routes (if using Next.js)
1045 1045  * Hosted: Local machine OR Railway/Render free tier
1046 1046  
1047 1047  **AKEL Integration:**
1100 +
1048 1048  * Claude API via Anthropic SDK
1049 1049  * Model: Claude Sonnet 4.5 or latest available
1050 1050  
1051 1051  **Database:**
1105 +
1052 1052  * None (stateless acceptable)
1053 1053  * OR: SQLite if you want to store demo examples
1054 1054  * OR: JSON files on disk
1055 1055  
1056 1056  **Deployment:**
1111 +
1057 1057  * Local development environment sufficient for POC
1058 1058  * Optional: Deploy to cloud for remote demos
1059 1059  
... ... @@ -1062,6 +1062,7 @@
1062 1062  === 11.1 Minimum Success (POC Passes) ===
1063 1063  
1064 1064  **Required for GO decision:**
1120 +
1065 1065  * ✅ AI extracts 3-5 factual claims automatically
1066 1066  * ✅ AI provides verdict for each claim automatically
1067 1067  * ✅ Verdicts are reasonable (≥70% make logical sense)
... ... @@ -1075,6 +1075,7 @@
1075 1075  * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1076 1076  
1077 1077  **Quality Definition:**
1134 +
1078 1078  * "Reasonable verdict" = Defensible given general knowledge
1079 1079  * "Coherent summary" = Logically structured, grammatically correct
1080 1080  * "Comprehensible" = Reviewers understand what analysis means
... ... @@ -1082,6 +1082,7 @@
1082 1082  === 11.2 POC Fails If ===
1083 1083  
1084 1084  **Automatic NO-GO if any of these:**
1142 +
1085 1085  * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1086 1086  * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1087 1087  * ❌ Output incomprehensible (reviewers can't understand analysis)
... ... @@ -1093,14 +1093,15 @@
1093 1093  **POC quality expectations:**
1094 1094  
1095 1095  |=Component|=Quality Threshold|=Definition
1096 -|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1097 -|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1098 -|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1099 -|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1154 +|Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1155 +|Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1156 +|Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1157 +|Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1100 1100  
1101 1101  **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1102 1102  
1103 1103  **Not expecting:**
1162 +
1104 1104  * 100% accuracy
1105 1105  * Perfect claim coverage
1106 1106  * Comprehensive evidence gathering
... ... @@ -1108,6 +1108,7 @@
1108 1108  * Production polish
1109 1109  
1110 1110  **Expecting:**
1170 +
1111 1111  * Reasonable claim extraction
1112 1112  * Defensible verdicts
1113 1113  * Understandable reasoning
... ... @@ -1120,6 +1120,7 @@
1120 1120  **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1121 1121  
1122 1122  **Expected Output:**
1183 +
1123 1123  * Extract claim correctly
1124 1124  * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1125 1125  * Confidence: 70-90%
... ... @@ -1133,6 +1133,7 @@
1133 1133  **Input:** News article URL with multiple claims about politics/health/science
1134 1134  
1135 1135  **Expected Output:**
1197 +
1136 1136  * Extract 3-5 key claims
1137 1137  * Verdict for each (may vary: some supported, some uncertain, some refuted)
1138 1138  * Coherent analysis summary
... ... @@ -1146,6 +1146,7 @@
1146 1146  **Input:** Article on contested political or scientific topic
1147 1147  
1148 1148  **Expected Output:**
1211 +
1149 1149  * Balanced analysis
1150 1150  * Acknowledges uncertainty where appropriate
1151 1151  * Doesn't overstate confidence
... ... @@ -1158,6 +1158,7 @@
1158 1158  **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1159 1159  
1160 1160  **Expected Output:**
1224 +
1161 1161  * Extract claim
1162 1162  * Verdict: REFUTED
1163 1163  * High confidence (> 90%)
... ... @@ -1171,6 +1171,7 @@
1171 1171  **Input:** Article with claim where evidence is genuinely mixed
1172 1172  
1173 1173  **Expected Output:**
1238 +
1174 1174  * Extract claim
1175 1175  * Verdict: UNCERTAIN
1176 1176  * Moderate confidence (40-60%)
... ... @@ -1183,6 +1183,7 @@
1183 1183  **Input:** Article making medical claims
1184 1184  
1185 1185  **Expected Output:**
1251 +
1186 1186  * Extract claim
1187 1187  * Verdict: [appropriate based on evidence]
1188 1188  * Risk tier: A (High - medical)
... ... @@ -1200,6 +1200,7 @@
1200 1200  **Option A: GO (Proceed to POC2)**
1201 1201  
1202 1202  **Conditions:**
1269 +
1203 1203  * AI quality ≥70% without manual editing
1204 1204  * Basic claim → verdict pipeline validated
1205 1205  * Internal + advisor feedback positive
... ... @@ -1208,6 +1208,7 @@
1208 1208  * Clear path to improving AI quality to ≥90%
1209 1209  
1210 1210  **Next Steps:**
1278 +
1211 1211  * Plan POC2 development (add scenarios)
1212 1212  * Design scenario architecture
1213 1213  * Expand to Evidence Model structure
... ... @@ -1216,6 +1216,7 @@
1216 1216  **Option B: NO-GO (Pivot or Stop)**
1217 1217  
1218 1218  **Conditions:**
1287 +
1219 1219  * AI quality < 60%
1220 1220  * Requires manual editing for most analyses (> 50%)
1221 1221  * Feedback indicates fundamental flaws
... ... @@ -1223,6 +1223,7 @@
1223 1223  * No clear path to improvement
1224 1224  
1225 1225  **Next Steps:**
1295 +
1226 1226  * **Pivot:** Change to a hybrid human-AI approach (accept that manual review is required)
1227 1227  * **Stop:** Conclude that the approach is not viable; revisit later
1228 1228  
... ... @@ -1229,6 +1229,7 @@
1229 1229  **Option C: ITERATE (Improve POC)**
1230 1230  
1231 1231  **Conditions:**
1302 +
1232 1232  * Concept has merit but execution needs work
1233 1233  * Specific improvements identified
1234 1234  * Addressable with better prompts/approach
... ... @@ -1235,6 +1235,7 @@
1235 1235  * AI quality between 60-70%
1236 1236  
1237 1237  **Next Steps:**
1309 +
1238 1238  * Improve AI prompts
1239 1239  * Test different approaches
1240 1240  * Re-run POC with improvements
... ... @@ -1256,6 +1256,7 @@
1256 1256  **Impact:** POC fails
1257 1257  
1258 1258  **Mitigation:**
1331 +
1259 1259  * Extensive prompt engineering and testing
1260 1260  * Use best available AI models (Sonnet 4.5)
1261 1261  * Test with diverse article types
... ... @@ -1269,6 +1269,7 @@
1269 1269  **Impact:** Works sometimes, fails other times
1270 1270  
1271 1271  **Mitigation:**
1345 +
1272 1272  * Test with 10+ diverse articles
1273 1273  * Measure success rate honestly
1274 1274  * Improve prompts to increase consistency
... ... @@ -1281,6 +1281,7 @@
1281 1281  **Impact:** Users can't understand analysis
1282 1282  
1283 1283  **Mitigation:**
1358 +
1284 1284  * Create clear explainer document
1285 1285  * Iterate on output format
1286 1286  * Test with non-technical reviewers
... ... @@ -1294,6 +1294,7 @@
1294 1294  **Impact:** System slow or expensive
1295 1295  
1296 1296  **Mitigation:**
1372 +
1297 1297  * Monitor API usage
1298 1298  * Implement retry logic
1299 1299  * Estimate costs before scaling
... ... @@ -1306,6 +1306,7 @@
1306 1306  **Impact:** POC becomes too complex
1307 1307  
1308 1308  **Mitigation:**
1385 +
1309 1309  * Strict scope discipline
1310 1310  * Say NO to feature additions
1311 1311  * Keep focus on core question
... ... @@ -1316,12 +1316,15 @@
1316 1316  
1317 1317  === 15.1 Core Principles ===
1318 1318  
1319 -**1. Build Less, Learn More**
1396 +**1. Build Less, Learn More**
1397 +
1320 1320  * Minimum features to test hypothesis
1321 1321  * Don't build unvalidated features
1322 1322  * Focus on core question only
1323 1323  
1324 1324  **2. Fail Fast**
1404 +
1325 1325  * Quick test of hardest part (AI capability)
1326 1326  * Accept that POC might fail
1327 1327  * Better to discover issues early
... ... @@ -1328,16 +1328,19 @@
1328 1328  * Honest assessment over optimistic hope
1329 1329  
1330 1330  **3. Test First, Build Second**
1411 +
1331 1331  * Validate AI can do this before building platform
1332 1332  * Don't assume it will work
1333 1333  * Let results guide decisions
1334 1334  
1335 1335  **4. Automation First**
1417 +
1336 1336  * No manual editing allowed
1337 1337  * Tests scalability, not just feasibility
1338 1338  * Proves approach can work at scale
1339 1339  
1340 1340  **5. Honest Assessment**
1423 +
1341 1341  * Don't cherry-pick examples
1342 1342  * Don't manually fix bad outputs
1343 1343  * Document failures openly
... ... @@ -1358,9 +1358,12 @@
1358 1358  ❌ Perfectly accurate analysis
1359 1359  ❌ Polished user experience
1360 1360  
1361 -== 16. Success = Clear Path Forward ==
1444 +== 16. Success = Clear Path Forward ==
1362 1362  
1363 1363  **If POC succeeds (≥70% AI quality):**
1449 +
1364 1364  * ✅ Approach validated
1365 1365  * ✅ Proceed to POC2 (add scenarios)
1366 1366  * ✅ Design full Evidence Model structure
... ... @@ -1368,6 +1368,7 @@
1368 1368  * ✅ Focus on improving AI quality from 70% → 90%
1369 1369  
1370 1370  **If POC fails (< 60% AI quality):**
1457 +
1371 1371  * ✅ Learn what doesn't work
1372 1372  * ✅ Pivot to different approach
1373 1373  * ✅ OR wait for better AI technology
... ... @@ -1394,16 +1394,16 @@
1394 1394  **POC1 Implementation:**
1395 1395  
1396 1396  * **Primary Provider:** Anthropic Claude API
1397 - * Stage 1: Claude Haiku 4
1398 - * Stage 2: Claude Sonnet 3.5 (cached)
1399 - * Stage 3: Claude Sonnet 3.5
1484 +* Stage 1: Claude Haiku 4
1485 +* Stage 2: Claude Sonnet 3.5 (cached)
1486 +* Stage 3: Claude Sonnet 3.5
1400 1400  
1401 1401  * **Provider Interface:** Abstract LLMProvider interface implemented
1402 1402  
1403 1403  * **Configuration:** Environment variables for provider selection
1404 - * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1405 - * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1406 - * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1491 +* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1492 +* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1493 +* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1407 1407  
1408 1408  * **Failover:** Basic error handling with cache fallback for Stage 2
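The environment-variable configuration listed above could be read at startup roughly as follows. This is a sketch under assumptions: the `LLMConfig` class and `load_llm_config` function are hypothetical names, and the defaults simply mirror the example values shown in the list. Only the three variables named in this section are read.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMConfig:
    """Provider and model selection (hypothetical container for illustration)."""

    primary_provider: str
    stage1_model: str
    stage2_model: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read provider selection from the environment, using the documented example values as defaults."""
    return LLMConfig(
        primary_provider=env.get("LLM_PRIMARY_PROVIDER", "anthropic"),
        stage1_model=env.get("LLM_STAGE1_MODEL", "claude-haiku-4"),
        stage2_model=env.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5"),
    )
```

Passing the environment in as a mapping keeps the loader easy to test: a plain dictionary can stand in for `os.environ`.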
1409 1409  
... ... @@ -1423,9 +1423,10 @@
1423 1423  * Cost tracking includes provider name in logs
1424 1424  * Stage 2 falls back to cache on provider failure
1425 1425  
1426 -**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1513 +**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.105.Specification.POC.API-and-Schemas.WebHome]] Section 6
1427 1427  
1428 1428  **Dependencies:**
1516 +
1429 1429  * NFR-14 (Main Requirements)
1430 1430  * Design Decision 9
1431 1431  * Architecture Section 2.2
... ... @@ -1433,5 +1433,3 @@
1433 1433  **Priority:** HIGH (P1)
1434 1434  
1435 1435  **Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
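One way to picture the provider abstraction is an abstract base class with a thin wrapper that serves a cached reply when the provider call fails. This is a hedged sketch, not the interface from the API & Schemas Specification: the `complete` method, its signature, and the `CachedFallback` class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Provider-neutral interface; the method name and signature are assumptions."""

    name: str  # used, for example, to tag cost-tracking log entries

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Return the model's reply for the given prompt."""


class CachedFallback:
    """Wraps a provider and returns a previously cached reply if the call fails."""

    def __init__(self, provider: LLMProvider) -> None:
        self._provider = provider
        self._cache: dict = {}

    def complete(self, model: str, prompt: str) -> str:
        key = (model, prompt)
        try:
            reply = self._provider.complete(model, prompt)
        except Exception:
            # Provider failure: serve the cached reply if one exists, otherwise re-raise.
            if key in self._cache:
                return self._cache[key]
            raise
        self._cache[key] = reply
        return reply
```

Because callers depend only on `complete`, a second provider could later be added by writing another `LLMProvider` subclass, without touching the calling code.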
1436 -
1437 -