Wiki source code of POC Requirements (POC1 & POC2)

Version 1.1 by Robert Schaub on 2025/12/24 11:54

1 = POC Requirements =
2
3 **Status:** ✅ Approved for Development
4 **Version:** 2.0 (Updated after Specification Cross-Check)
5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6
7 == 1. POC Overview ==
8
9 === 1.1 What POC Tests ===
10
11 **Core Question:**
12 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
13
14 **What we're proving:**
15 * AI can identify factual claims from text
16 * AI can evaluate those claims and produce verdicts
17 * Output is comprehensible and useful
18 * Fully automated approach is viable
19
20 **What we're NOT testing:**
21 * Scenario generation (deferred to POC2)
22 * Evidence display (deferred to POC2)
23 * Production scalability
24 * Perfect accuracy
25 * Complete feature set
26
27 === 1.2 Scenarios Deferred to POC2 ===
28
29 **Intentional Simplification:**
30
31 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
32
33 **Rationale:**
34 * **POC1 tests:** Can AI extract claims and generate verdicts?
35 * **POC2 will add:** Scenario generation and management
36 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
37
38 **Design Decision:**
39
40 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
41
42 **No Risk:**
43
44 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
45 * Faster POC1 validation
46 * Learning from POC1 to inform scenario design
47 * Iterative approach: fail fast if basic AI doesn't work
48 * Flexibility to adjust scenario architecture based on POC1 insights
49
50 **Full System Workflow (Future):**
51 {{code}}
52 Claims → Scenarios → Evidence → Verdicts
53 {{/code}}
54
55 **POC1 Simplified Workflow:**
56 {{code}}
57 Claims → Verdicts (scenarios implicit in reasoning)
58 {{/code}}
59
60 == 2. POC Output Specification ==
61
62 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
63
64 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
65
66 **Length:** 4-6 sentences
67
68 **Content (Required Elements):**
69 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
70 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
71 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
72 4. **Relationship assessment** - Do the claims support the article's conclusion?
73 5. **Overall credibility** - Final assessment considering claim importance
74
75 **Critical Innovation:**
76
77 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
78 * Present accurate supporting facts but draw unsupported conclusions
79 * Have one false central claim that invalidates the whole argument
80 * Misframe accurate information to mislead
81
82 **Good Example (Context-Aware):**
83 {{code}}
84 This article argues that coffee cures cancer based on its antioxidant
85 content. We analyzed 3 factual claims: 2 about coffee's chemical
86 properties are well-supported, but the main causal claim is refuted
87 by current evidence. The article confuses correlation with causation.
88 Overall assessment: MISLEADING - makes an unsupported medical claim
89 despite citing some accurate facts.
90 {{/code}}
91
92 **Poor Example (Simple Aggregation - Don't Do This):**
93 {{code}}
94 This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 Overall assessment: mostly accurate (67% accurate).
96 {{/code}}
97 ↑ This misses that the refuted claim IS the article's main point!
98
99 **What POC1 Tests:**
100
101 Can AI identify and assess:
102 * ✅ The article's main thesis/conclusion?
103 * ✅ Which claims are central vs. supporting?
104 * ✅ Whether the evidence supports the conclusion?
105 * ✅ Overall credibility considering logical structure?
106
107 **If AI Cannot Do This:**
108
109 That's valuable to learn in POC1! We'll:
110 * Note as limitation
111 * Fall back to simple aggregation with warning
112 * Design explicit article-level analysis for POC2
113
114 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
115
116 **What:** List of factual claims extracted from article
117 **Format:** Numbered list
118 **Quantity:** 3-5 claims
119 **Requirements:**
120 * Factual claims only (not opinions/questions)
121 * Clearly stated
122 * Automatically extracted by AI
123
124 **Example:**
125 {{code}}
126 CLAIMS IDENTIFIED:
127
128 [1] Coffee reduces diabetes risk by 30%
129 [2] Coffee improves heart health
130 [3] Decaf has same benefits as regular
131 [4] Coffee prevents Alzheimer's completely
132 {{/code}}
133
134 === 2.3 Component 3: CLAIMS VERDICTS ===
135
136 **What:** Verdict for each claim identified
137 **Format:** Per claim structure
138
139 **Required Elements:**
140 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
141 * **Confidence Score:** 0-100%
142 * **Brief Reasoning:** 1-3 sentences explaining why
143 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
144
145 **Example:**
146 {{code}}
147 VERDICTS:
148
149 [1] WELL-SUPPORTED (85%) [Risk: C]
150 Multiple studies confirm 25-30% risk reduction with regular consumption.
151
152 [2] UNCERTAIN (65%) [Risk: B]
153 Evidence is mixed. Some studies show benefits, others show no effect.
154
155 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
156 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
157
158 [4] REFUTED (90%) [Risk: B]
159 No evidence for complete prevention. Claim is significantly overstated.
160 {{/code}}
161
162 **Risk Tier Display:**
163 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
164 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
165 * **Tier C (Green):** Low Risk - Facts/Definitions/History
166
167 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
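The per-claim structure above can be enforced with a small data type so malformed model output is caught before display. A minimal Python sketch (class and field names are illustrative, not part of the spec):

```python
from dataclasses import dataclass

VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_LABELS = {"A": "High Risk", "B": "Medium Risk", "C": "Low Risk"}

@dataclass
class ClaimVerdict:
    claim: str
    verdict: str      # one of VERDICTS
    confidence: int   # 0-100
    risk_tier: str    # "A" / "B" / "C"
    reasoning: str    # 1-3 sentences

    def __post_init__(self):
        # Validate at construction time, before anything is rendered.
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be 0-100")
        if self.risk_tier not in RISK_LABELS:
            raise ValueError(f"unknown risk tier: {self.risk_tier}")

    def render(self) -> str:
        # Mirrors the display format above; the caller adds the "[1]" numbering.
        return f"{self.verdict} ({self.confidence}%) [Risk: {self.risk_tier}]"
```

Validating at construction time means a bad verdict label or out-of-range confidence fails loudly instead of reaching the user.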
168
169 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
170
171 **What:** Brief summary of original article content
172 **Length:** 3-5 sentences
173 **Tone:** Neutral (article's position, not FactHarbor's analysis)
174
175 **Example:**
176 {{code}}
177 ARTICLE SUMMARY:
178
179 Health News Today article discusses coffee benefits, citing studies
180 on diabetes and Alzheimer's. Author highlights research linking coffee
181 to disease prevention. Recommends 2-3 cups daily for optimal health.
182 {{/code}}
183
184 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
185
186 **What:** LLM usage metrics for cost optimization and scaling decisions
187
188 **Purpose:**
189 * Understand cost per analysis
190 * Identify optimization opportunities
191 * Project costs at scale
192 * Inform architecture decisions
193
194 **Display Format:**
195 {{code}}
196 USAGE STATISTICS:
197 • Article: 2,450 words (12,300 characters)
198 • Input tokens: 15,234
199 • Output tokens: 892
200 • Total tokens: 16,126
201 • Estimated cost: $0.24 USD
202 • Response time: 8.3 seconds
203 • Cost per claim: $0.048
204 • Model: claude-sonnet-4-20250514
205 {{/code}}
206
207 **Why This Matters:**
208
209 At scale, LLM costs are a first-order concern (figures assume the target cost of roughly $0.02-0.05 per analysis):
210 * 10,000 articles/month ≈ $200-500/month
211 * 100,000 articles/month ≈ $2,000-5,000/month
212 * Cost optimization can reduce expenses by 30-50%
213
214 **What POC1 Learns:**
215 * How cost scales with article length
216 * Prompt optimization opportunities (caching, compression)
217 * Output verbosity tradeoffs
218 * Model selection strategy (Sonnet vs. Haiku)
219 * Article length limits (if needed)
220
221 **Implementation:**
222 * Claude API already returns usage data
223 * No extra API calls needed
224 * Display to user + log for aggregate analysis
225 * Test with articles of varying lengths
226
227 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
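The cost figure can be derived directly from the token counts the API already reports. A hedged sketch (the per-million-token prices below are placeholders, not official Claude pricing; substitute current rates):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float = 3.00,
                      output_price_per_mtok: float = 15.00) -> float:
    """Estimate analysis cost from token counts. The default prices
    (USD per million tokens) are placeholders for illustration only."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

def cost_per_claim(total_cost: float, n_claims: int) -> float:
    """Per-claim unit cost, as shown in the usage statistics display."""
    return round(total_cost / n_claims, 3) if n_claims else 0.0
```

The example display's "$0.048 per claim" follows from $0.24 across five claims; the total cost itself depends on whichever pricing is current.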
228
229 === 2.6 Total Output Size ===
230
231 **Combined:** ~220-340 words
232 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
233 * Claims Identification: 30-50 words
234 * Claims Verdicts: 100-150 words
235 * Article Summary: 30-50 words (optional)
236
237 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
238
239 == 3. What's NOT in POC Scope ==
240
241 === 3.1 Feature Exclusions ===
242
243 The following are **explicitly excluded** from POC:
244
245 **Content Features:**
246 * ❌ Scenarios (deferred to POC2)
247 * ❌ Evidence display (supporting/opposing lists)
248 * ❌ Source links (clickable references)
249 * ❌ Detailed reasoning chains
250 * ❌ Source quality ratings (shown but not detailed)
251 * ❌ Contradiction detection (basic only)
252 * ❌ Risk assessment (shown but not workflow-integrated)
253
254 **Platform Features:**
255 * ❌ User accounts / authentication
256 * ❌ Saved history
257 * ❌ Search functionality
258 * ❌ Claim comparison
259 * ❌ User contributions
260 * ❌ Commenting system
261 * ❌ Social sharing
262
263 **Technical Features:**
264 * ❌ Browser extensions
265 * ❌ Mobile apps
266 * ❌ API endpoints
267 * ❌ Webhooks
268 * ❌ Export features (PDF, CSV)
269
270 **Quality Features:**
271 * ❌ Accessibility (WCAG compliance)
272 * ❌ Multilingual support
273 * ❌ Mobile optimization
274 * ❌ Media verification (images/videos)
275
276 **Production Features:**
277 * ❌ Security hardening
278 * ❌ Privacy compliance (GDPR)
279 * ❌ Terms of service
280 * ❌ Monitoring/logging
281 * ❌ Error tracking
282 * ❌ Analytics
283 * ❌ A/B testing
284
285 == 4. POC Simplifications vs. Full System ==
286
287 === 4.1 Architecture Comparison ===
288
289 **POC Architecture (Simplified):**
290 {{code}}
291 User Input → Single AKEL Call → Output Display
292 (all processing)
293 {{/code}}
294
295 **Full System Architecture:**
296 {{code}}
297 User Input → Claim Extractor → Claim Classifier → Scenario Generator
298 → Evidence Summarizer → Contradiction Detector → Verdict Generator
299 → Quality Gates → Publication → Output Display
300 {{/code}}
301
302 **Key Differences:**
303
304 |=Aspect|=POC1|=Full System
305 |Processing|Single API call|Multi-component pipeline
306 |Scenarios|None (implicit)|Explicit entities with versioning
307 |Evidence|Basic retrieval|Comprehensive with quality scoring
308 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
309 |Workflow|3 steps (input/process/output)|6 phases with gates
310 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
311 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
312
313 === 4.2 Workflow Comparison ===
314
315 **POC1 Workflow:**
316 1. User submits text/URL
317 2. Single AKEL call (all processing in one prompt)
318 3. Display results
319 **Total: 3 steps, ~10-18 seconds**
320
321 **Full System Workflow:**
322 1. **Claim Submission** (extraction, normalization, clustering)
323 2. **Scenario Building** (definitions, assumptions, boundaries)
324 3. **Evidence Handling** (retrieval, assessment, linking)
325 4. **Verdict Creation** (synthesis, reasoning, approval)
326 5. **Public Presentation** (summaries, landscapes, deep dives)
327 6. **Time Evolution** (versioning, re-evaluation triggers)
328 **Total: 6 phases with quality gates, ~10-30 seconds**
329
330 === 4.3 Why POC is Simplified ===
331
332 **Engineering Rationale:**
333
334 1. **Test core capability first:** Can AI do basic fact-checking without humans?
335 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
336 3. **Learn before building:** POC1 insights inform full architecture
337 4. **Iterative approach:** Add complexity only after validating foundations
338 5. **Resource efficiency:** Don't build full system if core concept fails
339
340 **Acceptable Trade-offs:**
341
342 * ✅ POC proves AI capability (most risky assumption)
343 * ✅ POC validates user comprehension (can people understand output?)
344 * ❌ POC doesn't validate full workflow (test in Beta)
345 * ❌ POC doesn't validate scale (test in Beta)
346 * ❌ POC doesn't validate scenario architecture (design in POC2)
347
348 === 4.4 Gap Between POC1 and POC2/Beta ===
349
350 **What needs to be built for POC2:**
351 * Scenario generation component
352 * Evidence Model structure (full)
353 * Scenario-evidence linking
354 * Multi-interpretation comparison
355 * Truth landscape visualization
356
357 **What needs to be built for Beta:**
358 * Multi-component AKEL pipeline
359 * Quality gate infrastructure
360 * Review workflow system
361 * Audit sampling framework
362 * Production data model
363 * Federation architecture (Release 1.0)
364
365 **POC1 → POC2 is significant architectural expansion.**
366
367 == 5. Publication Mode & Labeling ==
368
369 === 5.1 POC Publication Mode ===
370
371 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
372
373 Per FactHarbor Specification Section 11 "POC v1 Behavior":
374 * Produces public AI-generated output
375 * No human approval gate
376 * Clear AI-Generated labeling
377 * All quality gates active (simplified)
378 * Risk tier classification shown (demo)
379
380 === 5.2 User-Facing Labels ===
381
382 **Primary Label (top of analysis):**
383 {{code}}
384 ╔════════════════════════════════════════════════════════════╗
385 ║ [AI-GENERATED - POC/DEMO] ║
386 ║ ║
387 ║ This analysis was produced entirely by AI and has not ║
388 ║ been human-reviewed. Use for demonstration purposes. ║
389 ║ ║
390 ║ Source: AI/AKEL v1.0 (POC) ║
391 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
392 ║ Quality Gates: 4/4 Passed (Simplified) ║
393 ║ Last Updated: [timestamp] ║
394 ╚════════════════════════════════════════════════════════════╝
395 {{/code}}
396
397 **Per-Claim Risk Labels:**
398 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
399 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
400 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
401
402 === 5.3 Display Requirements ===
403
404 **Must Show:**
405 * AI-Generated status (prominent)
406 * POC/Demo disclaimer
407 * Risk tier per claim
408 * Confidence scores (0-100%)
409 * Quality gate status (passed/failed)
410 * Timestamp
411
412 **Must NOT Claim:**
413 * Human review
414 * Production quality
415 * Medical/legal advice
416 * Authoritative verdicts
417 * Complete accuracy
418
419 === 5.4 Mode 2 vs. Full System Publication ===
420
421 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
422 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
423 |Review|None|None|Human-Reviewed
424 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
425 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
426 |Risk Display|Demo only|Workflow-integrated|Validated
427 |User Actions|View only|Flag for review|Trust rating
428
429 == 6. Quality Gates (Simplified Implementation) ==
430
431 === 6.1 Overview ===
432
433 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
434
435 **Full System Has 4 Gates:**
436 1. Source Quality
437 2. Contradiction Search (MANDATORY)
438 3. Uncertainty Quantification
439 4. Structural Integrity
440
441 **POC Implements Simplified Versions:**
442 * Focus on demonstrating concept
443 * Basic implementations sufficient
444 * Failures displayed to user (not blocking)
445 * Full system has comprehensive validation
446
447 === 6.2 Gate 1: Source Quality (Basic) ===
448
449 **Full System Requirements:**
450 * Primary sources identified and accessible
451 * Source reliability scored against whitelist
452 * Citation completeness verified
453 * Publication dates checked
454 * Author credentials validated
455
456 **POC Implementation:**
457 * ✅ At least 2 sources found
458 * ✅ Sources accessible (URLs valid)
459 * ❌ No whitelist checking
460 * ❌ No credential validation
461 * ❌ No comprehensive reliability scoring
462
463 **Pass Criteria:** ≥2 accessible sources found
464
465 **Failure Handling:** Display error message, don't generate verdict
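The pass criterion above is simple enough to sketch directly. A minimal version in Python, with URL accessibility injected so tests and demos can stub it (a real check would issue HTTP requests):

```python
def source_quality_gate(source_urls, is_accessible=None):
    """Gate 1 (basic): pass if at least 2 sources are accessible.
    `is_accessible` is injected; the default is a cheap URL-shape check,
    standing in for a real HTTP reachability probe."""
    if is_accessible is None:
        is_accessible = lambda url: url.startswith(("http://", "https://"))
    accessible = [u for u in source_urls if is_accessible(u)]
    passed = len(accessible) >= 2
    return passed, f"{len(accessible)} sources found"
```

On a False result the POC displays an error and does not generate a verdict, per the failure handling above.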
466
467 === 6.3 Gate 2: Contradiction Search (Basic) ===
468
469 **Full System Requirements:**
470 * Counter-evidence actively searched
471 * Reservations and limitations identified
472 * Alternative interpretations explored
473 * Bubble detection (echo chambers, conspiracy theories)
474 * Cross-cultural and international perspectives
475 * Academic literature (supporting AND opposing)
476
477 **POC Implementation:**
478 * ✅ Basic search for counter-evidence
479 * ✅ Identify obvious contradictions
480 * ❌ No comprehensive academic search
481 * ❌ No bubble detection
482 * ❌ No systematic alternative interpretation search
483 * ❌ No international perspective verification
484
485 **Pass Criteria:** Basic contradiction search attempted
486
487 **Failure Handling:** Note "limited contradiction search" in output
488
489 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
490
491 **Full System Requirements:**
492 * Confidence scores calculated for all claims/verdicts
493 * Limitations explicitly stated
494 * Data gaps identified and disclosed
495 * Strength of evidence assessed
496 * Alternative scenarios considered
497
498 **POC Implementation:**
499 * ✅ Confidence scores (0-100%)
500 * ✅ Basic uncertainty acknowledgment
501 * ❌ No detailed limitation disclosure
502 * ❌ No data gap identification
503 * ❌ No alternative scenario consideration (deferred to POC2)
504
505 **Pass Criteria:** Confidence score assigned
506
507 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
508
509 === 6.5 Gate 4: Structural Integrity (Basic) ===
510
511 **Full System Requirements:**
512 * No hallucinations detected (fact-checking against sources)
513 * Logic chain valid and traceable
514 * References accessible and verifiable
515 * No circular reasoning
516 * Premises clearly stated
517
518 **POC Implementation:**
519 * ✅ Basic coherence check
520 * ✅ References accessible
521 * ❌ No comprehensive hallucination detection
522 * ❌ No formal logic validation
523 * ❌ No premise extraction and verification
524
525 **Pass Criteria:** Output is coherent and references are accessible
526
527 **Failure Handling:** Display error message
528
529 === 6.6 Quality Gate Display ===
530
531 **POC shows simplified status:**
532 {{code}}
533 Quality Gates: 4/4 Passed (Simplified)
534 ✓ Source Quality: 3 sources found
535 ✓ Contradiction Search: Basic search completed
536 ✓ Uncertainty: Confidence scores assigned
537 ✓ Structural Integrity: Output coherent
538 {{/code}}
539
540 **If any gate fails:**
541 {{code}}
542 Quality Gates: 3/4 Passed (Simplified)
543 ✓ Source Quality: 3 sources found
544 ✗ Contradiction Search: Search failed - limited evidence
545 ✓ Uncertainty: Confidence scores assigned
546 ✓ Structural Integrity: Output coherent
547
548 Note: This analysis has limited evidence. Use with caution.
549 {{/code}}
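Both displays above can come from one renderer over the per-gate results. A minimal sketch (the tuple layout is an assumption, not a specified interface):

```python
def render_gate_status(results):
    """Render the quality-gate banner. `results` is a list of
    (gate_name, passed, detail) tuples, one per gate."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        lines.append(f"{mark} {name}: {detail}")
    if passed < len(results):
        # Failures don't block publication in POC mode; they add a caution.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```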
550
551 === 6.7 Simplified vs. Full System ===
552
553 |=Gate|=POC (Simplified)|=Full System
554 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
555 |Contradiction|Basic search|Systematic academic + media + international
556 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
557 |Structural|Coherence check|Hallucination detection, logic validation, premise check
558
559 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
560
561 == 7. AKEL Architecture Comparison ==
562
563 === 7.1 POC AKEL (Simplified) ===
564
565 **Implementation:**
566 * Single Claude API call (Sonnet 4.5)
567 * One comprehensive prompt
568 * All processing in single request
569 * No separate components
570 * No orchestration layer
571
572 **Prompt Structure:**
573 {{code}}
574 Task: Analyze this article and provide:
575
576 1. Extract 3-5 factual claims
577 2. For each claim:
578 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
579 - Assign confidence score (0-100%)
580 - Assign risk tier (A/B/C)
581 - Write brief reasoning (1-3 sentences)
582 3. Generate analysis summary (3-5 sentences)
583 4. Generate article summary (3-5 sentences)
584 5. Run basic quality checks
585
586 Return as structured JSON.
587 {{/code}}
588
589 **Processing Time:** 10-18 seconds (estimate)
590
591 === 7.2 Full System AKEL (Production) ===
592
593 **Architecture:**
594 {{code}}
595 AKEL Orchestrator
596 ├── Claim Extractor
597 ├── Claim Classifier (with risk tier assignment)
598 ├── Scenario Generator
599 ├── Evidence Summarizer
600 ├── Contradiction Detector
601 ├── Quality Gate Validator
602 ├── Audit Sampling Scheduler
603 └── Federation Sync Adapter (Release 1.0+)
604 {{/code}}
605
606 **Processing:**
607 * Parallel processing where possible
608 * Separate component calls
609 * Quality gates between phases
610 * Audit sampling selection
611 * Cross-node coordination (federated mode)
612
613 **Processing Time:** 10-30 seconds (full pipeline)
614
615 === 7.3 Why POC Uses Single Call ===
616
617 **Advantages:**
618 * ✅ Simpler to implement
619 * ✅ Faster POC development
620 * ✅ Easier to debug
621 * ✅ Proves AI capability
622 * ✅ Good enough for concept validation
623
624 **Limitations:**
625 * ❌ No component reusability
626 * ❌ No parallel processing
627 * ❌ All-or-nothing (can't partially succeed)
628 * ❌ Harder to improve individual components
629 * ❌ No audit sampling
630
631 **Acceptable Trade-off:**
632
633 POC tests "Can AI do this?" not "How should we architect it?"
634
635 Full component architecture comes in Beta after POC validates concept.
636
637 === 7.4 Evolution Path ===
638
639 **POC1:** Single prompt → Prove concept
640 **POC2:** Add scenario component → Test full pipeline
641 **Beta:** Multi-component AKEL → Production architecture
642 **Release 1.0:** Full AKEL + Federation → Scale
643
644 == 8. Functional Requirements ==
645
646 === FR-POC-1: Article Input ===
647
648 **Requirement:** User can submit article for analysis
649
650 **Functionality:**
651 * Text input field (paste article text, up to 5000 characters)
652 * URL input field (paste article URL)
653 * "Analyze" button to trigger processing
654 * Loading indicator during analysis
655
656 **Excluded:**
657 * No user authentication
658 * No claim history
659 * No search functionality
660 * No saved templates
661
662 **Acceptance Criteria:**
663 * User can paste text from article
664 * User can paste URL of article
665 * System accepts input and triggers analysis
666
667 === FR-POC-2: Claim Extraction (Fully Automated) ===
668
669 **Requirement:** AI automatically extracts 3-5 factual claims
670
671 **Functionality:**
672 * AI reads article text
673 * AI identifies factual claims (not opinions/questions)
674 * AI extracts 3-5 most important claims
675 * System displays numbered list
676
677 **Critical:** NO MANUAL EDITING ALLOWED
678 * AI selects which claims to extract
679 * AI identifies factual vs. non-factual
680 * System processes claims as extracted
681 * No human curation or correction
682
683 **Error Handling:**
684 * If extraction fails: Display error message
685 * User can retry with different input
686 * No manual intervention to fix extraction
687
688 **Acceptance Criteria:**
689 * AI extracts 3-5 claims automatically
690 * Claims are factual (not opinions)
691 * Claims are clearly stated
692 * No manual editing required
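These acceptance criteria can be checked mechanically before display, which keeps the no-manual-editing rule honest: a failed check triggers a retry, not a human fix. A minimal sketch (function name is illustrative):

```python
def validate_extracted_claims(claims):
    """FR-POC-2 acceptance check: 3-5 clearly stated claims.
    On failure the POC displays an error and lets the user retry;
    no human curation or correction is permitted."""
    if not 3 <= len(claims) <= 5:
        return False, f"expected 3-5 claims, got {len(claims)}"
    if any(not c.strip() for c in claims):
        return False, "empty claim text"
    return True, "ok"
```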
693
694 === FR-POC-3: Verdict Generation (Fully Automated) ===
695
696 **Requirement:** AI automatically generates verdict for each claim
697
698 **Functionality:**
699 * For each claim, AI:
700 * Evaluates claim based on available evidence/knowledge
701 * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 * Assigns confidence score (0-100%)
703 * Assigns risk tier (A/B/C)
704 * Writes brief reasoning (1-3 sentences)
705 * System displays verdict for each claim
706
707 **Critical:** NO MANUAL EDITING ALLOWED
708 * AI computes verdicts based on evidence
709 * AI generates confidence scores
710 * AI writes reasoning
711 * No human review or adjustment
712
713 **Error Handling:**
714 * If verdict generation fails: Display error message
715 * User can retry
716 * No manual intervention to adjust verdicts
717
718 **Acceptance Criteria:**
719 * Each claim has a verdict
720 * Confidence score is displayed (0-100%)
721 * Risk tier is displayed (A/B/C)
722 * Reasoning is understandable (1-3 sentences)
723 * Verdict is defensible given reasoning
724 * All generated automatically by AI
725
726 === FR-POC-4: Analysis Summary (Fully Automated) ===
727
728 **Requirement:** AI generates brief summary of analysis
729
730 **Functionality:**
731 * AI summarizes findings in 4-6 sentences:
732 * How many claims found
733 * Distribution of verdicts
734 * Overall assessment
735 * System displays at top of results
736
737 **Critical:** NO MANUAL EDITING ALLOWED
738
739 **Acceptance Criteria:**
740 * Summary is coherent
741 * Accurately reflects analysis
742 * 4-6 sentences
743 * Automatically generated
744
745 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
746
747 **Requirement:** AI generates brief summary of original article
748
749 **Functionality:**
750 * AI summarizes article content (not FactHarbor's analysis)
751 * 3-5 sentences
752 * System displays
753
754 **Note:** Optional - can be skipped if time is limited
755
756 **Critical:** NO MANUAL EDITING ALLOWED
757
758 **Acceptance Criteria:**
759 * Summary is neutral (article's position)
760 * Accurately reflects article content
761 * 3-5 sentences
762 * Automatically generated
763
764 === FR-POC-6: Publication Mode Display ===
765
766 **Requirement:** Clear labeling of AI-generated content
767
768 **Functionality:**
769 * Display Mode 2 publication label
770 * Show POC/Demo disclaimer
771 * Display risk tiers per claim
772 * Show quality gate status
773 * Display timestamp
774
775 **Acceptance Criteria:**
776 * Label is prominent and clear
777 * User understands this is AI-generated POC output
778 * Risk tiers are color-coded
779 * Quality gate status is visible
780
781 === FR-POC-7: Quality Gate Execution ===
782
783 **Requirement:** Execute simplified quality gates
784
785 **Functionality:**
786 * Check source quality (basic)
787 * Attempt contradiction search (basic)
788 * Calculate confidence scores
789 * Verify structural integrity (basic)
790 * Display gate results
791
792 **Acceptance Criteria:**
793 * All 4 gates attempted
794 * Pass/fail status displayed
795 * Failures explained to user
796 * Gates don't block publication (POC mode)
797
798 == 9. Non-Functional Requirements ==
799
800 === NFR-POC-1: Fully Automated Processing ===
801
802 **Requirement:** Complete AI automation with zero manual intervention
803
804 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
805
806 **What this means:**
807 * Claims: AI selects (no human curation)
808 * Scenarios: N/A (deferred to POC2)
809 * Evidence: AI evaluates (no human selection)
810 * Verdicts: AI determines (no human adjustment)
811 * Summaries: AI writes (no human editing)
812
813 **Pipeline:**
814 {{code}}
815 User Input → AKEL Processing → Output Display
816
817 ZERO human editing
818 {{/code}}
819
820 **If AI output is poor:**
821 * ❌ Do NOT manually fix it
822 * ✅ Document the failure
823 * ✅ Improve prompts and retry
824 * ✅ Accept that POC might fail
825
826 **Why this matters:**
827 * Tests whether AI can do this without humans
828 * Validates scalability (humans can't review every analysis)
829 * Honest test of technical feasibility
830
831 === NFR-POC-2: Performance ===
832
833 **Requirement:** Analysis completes in reasonable time
834
835 **Acceptable Performance:**
836 * Processing time: 1-5 minutes (acceptable for POC)
837 * Display loading indicator to user
838 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
839
840 **Not Required:**
841 * Production-level speed (< 30 seconds)
842 * Optimization for scale
843 * Caching
844
845 **Acceptance Criteria:**
846 * Analysis completes within 5 minutes
847 * User sees loading indicator
848 * No timeout errors
849
850 === NFR-POC-3: Reliability ===
851
852 **Requirement:** System works for manual testing sessions
853
854 **Acceptable:**
855 * Occasional errors (< 20% failure rate)
856 * Manual restart if needed
857 * Display error messages clearly
858
859 **Not Required:**
860 * 99.9% uptime
861 * Automatic error recovery
862 * Production monitoring
863
864 **Acceptance Criteria:**
865 * System works for test demonstrations
866 * Errors are handled gracefully
867 * User receives clear error messages
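A tolerated failure rate under 20% pairs naturally with a simple retry wrapper rather than automatic recovery infrastructure. A minimal sketch:

```python
def with_retry(fn, attempts=3):
    """Call fn(); on failure retry up to `attempts` times total, then
    surface the last error as a clear message (NFR-POC-3)."""
    last = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:  # POC simplification: treat any failure as retryable
            last = e
    raise RuntimeError(f"Analysis failed after {attempts} attempts: {last}")
```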
868
869 === NFR-POC-4: Environment ===
870
871 **Requirement:** Runs on simple infrastructure
872
873 **Acceptable:**
874 * Single machine or simple cloud setup
875 * No distributed architecture
876 * No load balancing
877 * No redundancy
878 * Local development environment viable
879
880 **Not Required:**
881 * Production infrastructure
882 * Multi-region deployment
883 * Auto-scaling
884 * Disaster recovery
885
886 === NFR-POC-5: Cost Efficiency Tracking ===
887
888 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
889
890 **Must Track:**
891 * Input tokens (article + prompt)
892 * Output tokens (generated analysis)
893 * Total tokens
894 * Estimated cost (USD)
895 * Response time (seconds)
896 * Article length (words/characters)
897
898 **Must Display:**
899 * Usage statistics in UI (Component 5)
900 * Cost per analysis
901 * Cost per claim extracted
902
903 **Must Log:**
904 * Aggregate metrics for analysis
905 * Cost distribution by article length
906 * Token efficiency trends
907
908 **Purpose:**
909 * Understand unit economics
910 * Identify optimization opportunities
911 * Project costs at scale
912 * Inform architecture decisions (caching, model selection, etc.)
913
914 **Acceptance Criteria:**
915 * ✅ Usage data displayed after each analysis
916 * ✅ Metrics logged for aggregate analysis
917 * ✅ Cost calculated accurately (Claude API pricing)
918 * ✅ Test cases include varying article lengths
919 * ✅ POC1 report includes cost analysis section
920
921 **Success Target:**
922 * Average cost per analysis < $0.05 USD
923 * Cost scaling behavior understood (linear/exponential)
924 * 2+ optimization opportunities identified
925
926 **Critical:** Unit economics must be viable for scaling decision!
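The logged metrics can be aggregated to answer the "how does cost scale with article length" question above. A minimal sketch, assuming each run is logged as a (words, cost) pair:

```python
def cost_scaling(runs):
    """Summarize logged runs. `runs` is a list of (article_words, cost_usd).
    Returns average cost per analysis and average cost per 1,000 words;
    a roughly constant cost_per_kword suggests linear scaling."""
    if not runs:
        return {"avg_cost": 0.0, "cost_per_kword": 0.0}
    avg_cost = sum(c for _, c in runs) / len(runs)
    per_kword = [c / (w / 1000) for w, c in runs if w]
    return {"avg_cost": round(avg_cost, 4),
            "cost_per_kword": round(sum(per_kword) / len(per_kword), 4)}
```

Comparing cost_per_kword across short and long articles is one way to satisfy the "cost scaling behavior understood" success target.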
927
928 == 10. Technical Architecture ==
929
930 === 10.1 System Components ===
931
932 **Frontend:**
933 * Simple HTML form (text input + URL input + button)
934 * Loading indicator
935 * Results display page (single page, no tabs/navigation)
936
937 **Backend:**
938 * Single API endpoint
939 * Calls Claude API (Sonnet 4.5 or latest)
940 * Parses response
941 * Returns JSON to frontend
942
943 **Data Storage:**
944 * None required (stateless POC)
945 * Optional: Simple file storage or SQLite for demo examples
946
947 **External Services:**
948 * Claude API (Anthropic) - required
949 * Optional: URL fetch service for article text extraction
950
951 === 10.2 Processing Flow ===
952
953 {{code}}
954 1. User submits text or URL
955
956 2. Backend receives request
957
958 3. If URL: Fetch article text
959
960 4. Call Claude API with single prompt:
961 "Extract claims, evaluate each, provide verdicts"
962
963 5. Claude API returns:
964 - Analysis summary
965 - Claims list
966 - Verdicts for each claim (with risk tiers)
967 - Article summary (optional)
968 - Quality gate results
969
970 6. Backend parses response
971
972 7. Frontend displays results with Mode 2 labeling
973 {{/code}}
974
975 **Key Simplification:** Single API call does entire analysis
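The seven-step flow collapses to one function once the model call is injected. A minimal sketch with the Claude call stubbed out (the field names expected in the JSON are assumptions; a real deployment would wrap a single Claude API request):

```python
import json

def analyze(text, call_model):
    """POC pipeline: one model call does the entire analysis.
    `call_model(prompt) -> str` is injected; in production it would
    issue a single Claude API request returning structured JSON."""
    prompt = ("Extract claims, evaluate each, provide verdicts.\n\n"
              f"ARTICLE:\n{text}")
    raw = call_model(prompt)            # steps 4-5: single model call
    result = json.loads(raw)            # step 6: parse structured response
    for key in ("analysis_summary", "claims", "verdicts"):
        if key not in result:           # assumed top-level fields
            raise ValueError(f"model response missing '{key}'")
    return result                       # step 7: frontend renders with Mode 2 labels
```

Injecting the model call keeps the pipeline testable offline, which matters given the no-manual-editing rule: failures must be observed and retried, never patched.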
976
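Steps 4-6 of the flow can be sketched as one function, with the model call abstracted behind a callable so the pipeline is testable without network access. The prompt wording and JSON keys here are illustrative assumptions, not the final prompt or schema.

```python
import json

def analyze(article_text: str, call_model) -> dict:
    """Single-call pipeline: one prompt in, structured JSON out (flow steps 4-6).
    `call_model` is any callable that sends the prompt and returns the raw JSON
    string - e.g. a thin wrapper around the Anthropic SDK."""
    prompt = (
        "Extract 3-5 factual claims from the article below, evaluate each, "
        "and return JSON with keys: summary, claims (verdict, confidence, "
        "risk_tier, reasoning), quality_gates.\n\n" + article_text
    )
    raw = call_model(prompt)   # steps 4-5: one API call does the entire analysis
    return json.loads(raw)     # step 6: backend parses the response for the frontend

# Usage with a stub in place of the real API client:
stub = lambda p: '{"summary": "ok", "claims": [], "quality_gates": {"sources": false}}'
print(analyze("Coffee reduces diabetes risk.", stub)["summary"])
```

Keeping the model client injectable like this also makes it easy to swap models or add retry logic later without touching the parsing code.
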
977 === 10.3 AI Prompt Strategy ===
978
979 **Single Comprehensive Prompt:**
980 {{code}}
981 Task: Analyze this article and provide:
982
983 1. Identify the article's main thesis/conclusion
984 - What is the article trying to argue or prove?
985 - What is the primary claim or conclusion?
986
987 2. Extract 3-5 factual claims from the article
988 - Note which claims are CENTRAL to the main thesis
989 - Note which claims are SUPPORTING facts
990
991 3. For each claim:
992 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
993 - Assign confidence score (0-100%)
994 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
995 - Write brief reasoning (1-3 sentences)
996
997 4. Assess relationship between claims and main thesis:
998 - Do the claims actually support the article's conclusion?
999 - Are there logical leaps or unsupported inferences?
1000 - Is the article's framing misleading even if individual facts are accurate?
1001
1002 5. Run quality gates:
1003 - Check: ≥2 sources found
1004 - Attempt: Basic contradiction search
1005 - Calculate: Confidence scores
1006 - Verify: Structural integrity
1007
1008 6. Write context-aware analysis summary (4-6 sentences):
1009 - State article's main thesis
1010 - Report claims found and verdict distribution
1011 - Note if central claims are problematic
1012 - Assess whether evidence supports conclusion
1013 - Overall credibility considering claim importance
1014
1015 7. Write article summary (3-5 sentences: neutral summary of article content)
1016
1017 Return as structured JSON with quality gate results.
1018 {{/code}}
1019
1020 **One prompt generates everything.**
1021
1022 **Critical Addition:**
1023
1024 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article whose facts are accurate but poorly reasoned from one that is genuinely credible.
1025
1026 === 10.4 Technology Stack Suggestions ===
1027
1028 **Frontend:**
1029 * HTML + CSS + JavaScript (minimal framework)
1030 * OR: Next.js (if team prefers)
1031 * Hosted: Local machine OR Vercel/Netlify free tier
1032
1033 **Backend:**
1034 * Python Flask/FastAPI (simple REST API)
1035 * OR: Next.js API routes (if using Next.js)
1036 * Hosted: Local machine OR Railway/Render free tier
1037
1038 **AKEL Integration:**
1039 * Claude API via Anthropic SDK
1040 * Model: Claude Sonnet 4.5 or latest available
1041
1042 **Database:**
1043 * None (stateless acceptable)
1044 * OR: SQLite if you want to store demo examples
1045 * OR: JSON files on disk
1046
1047 **Deployment:**
1048 * Local development environment sufficient for POC
1049 * Optional: Deploy to cloud for remote demos
1050
1051 == 11. Success Criteria ==
1052
1053 === 11.1 Minimum Success (POC Passes) ===
1054
1055 **Required for GO decision:**
1056 * ✅ AI extracts 3-5 factual claims automatically
1057 * ✅ AI provides verdict for each claim automatically
1058 * ✅ Verdicts are reasonable (≥70% make logical sense)
1059 * ✅ Analysis summary is coherent
1060 * ✅ Output is comprehensible to reviewers
1061 * ✅ Team/advisors understand the output
1062 * ✅ Team agrees approach has merit
1063 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1064 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1065 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1066 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1067
1068 **Quality Definition:**
1069 * "Reasonable verdict" = Defensible given general knowledge
1070 * "Coherent summary" = Logically structured, grammatically correct
1071 * "Comprehensible" = Reviewers understand what analysis means
1072
1073 === 11.2 POC Fails If ===
1074
1075 **Automatic NO-GO if any of these:**
1076 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1077 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1078 * ❌ Output incomprehensible (reviewers can't understand analysis)
1079 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1080 * ❌ Team loses confidence in AI-automated approach
1081
1082 === 11.3 Quality Thresholds ===
1083
1084 **POC quality expectations:**
1085
1086 |=Component|=Quality Threshold|=Definition
1087 |Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1088 |Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1089 |Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1090 |Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1091
1092 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1093
1094 **Not expecting:**
1095 * 100% accuracy
1096 * Perfect claim coverage
1097 * Comprehensive evidence gathering
1098 * Flawless verdicts
1099 * Production polish
1100
1101 **Expecting:**
1102 * Reasonable claim extraction
1103 * Defensible verdicts
1104 * Understandable reasoning
1105 * Useful output
1106
1107 == 12. Test Cases ==
1108
1109 === 12.1 Test Case 1: Simple Factual Claim ===
1110
1111 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1112
1113 **Expected Output:**
1114 * Extract claim correctly
1115 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1116 * Confidence: 70-90%
1117 * Risk tier: C (Low)
1118 * Reasoning: Mentions studies or evidence
1119
1120 **Success:** Verdict is reasonable and reasoning makes sense
1121
1122 === 12.2 Test Case 2: Complex News Article ===
1123
1124 **Input:** News article URL with multiple claims about politics/health/science
1125
1126 **Expected Output:**
1127 * Extract 3-5 key claims
1128 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1129 * Coherent analysis summary
1130 * Article summary
1131 * Risk tiers assigned appropriately
1132
1133 **Success:** Claims identified are actually from article, verdicts are reasonable
1134
1135 === 12.3 Test Case 3: Controversial Topic ===
1136
1137 **Input:** Article on contested political or scientific topic
1138
1139 **Expected Output:**
1140 * Balanced analysis
1141 * Acknowledges uncertainty where appropriate
1142 * Doesn't overstate confidence
1143 * Reasoning shows awareness of complexity
1144
1145 **Success:** Analysis is fair and doesn't show obvious bias
1146
1147 === 12.4 Test Case 4: Clearly False Claim ===
1148
1149 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1150
1151 **Expected Output:**
1152 * Extract claim
1153 * Verdict: REFUTED
1154 * High confidence (> 90%)
1155 * Risk tier: C (Low - established fact)
1156 * Clear reasoning
1157
1158 **Success:** AI correctly identifies false claim with high confidence
1159
1160 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1161
1162 **Input:** Article with claim where evidence is genuinely mixed
1163
1164 **Expected Output:**
1165 * Extract claim
1166 * Verdict: UNCERTAIN
1167 * Moderate confidence (40-60%)
1168 * Reasoning explains why uncertain
1169
1170 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1171
1172 === 12.6 Test Case 6: High-Risk Medical Claim ===
1173
1174 **Input:** Article making medical claims
1175
1176 **Expected Output:**
1177 * Extract claim
1178 * Verdict: [appropriate based on evidence]
1179 * Risk tier: A (High - medical)
1180 * Red label displayed
1181 * Clear disclaimer about not being medical advice
1182
1183 **Success:** Risk tier correctly assigned, appropriate warnings shown
1184
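The expected outputs in the test cases above can be turned into simple automated checks. This sketch assumes the parsed result exposes `verdict`, `confidence`, and `risk_tier` keys; adapt it to whatever schema the prompt actually returns.

```python
def check_case(result: dict, expected_verdict: str, min_conf: int, tier: str) -> bool:
    """Compare one analysis result against a test case's expected output."""
    return (result["verdict"] == expected_verdict
            and result["confidence"] >= min_conf
            and result["risk_tier"] == tier)

# Test Case 4: clearly false claim ("The Earth is flat")
sample = {"verdict": "REFUTED", "confidence": 96, "risk_tier": "C"}
print(check_case(sample, "REFUTED", 90, "C"))
```

Scoring all six test cases this way yields the honest pass-rate number the decision gate in the next section needs.
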
1185 == 13. POC Decision Gate ==
1186
1187 === 13.1 Decision Framework ===
1188
1189 After POC testing is complete, the team makes one of three decisions:
1190
1191 **Option A: GO (Proceed to POC2)**
1192
1193 **Conditions:**
1194 * AI quality ≥70% without manual editing
1195 * Basic claim → verdict pipeline validated
1196 * Internal + advisor feedback positive
1197 * Technical feasibility confirmed
1198 * Team confident in direction
1199 * Clear path to improving AI quality to ≥90%
1200
1201 **Next Steps:**
1202 * Plan POC2 development (add scenarios)
1203 * Design scenario architecture
1204 * Expand to Evidence Model structure
1205 * Test with more complex articles
1206
1207 **Option B: NO-GO (Pivot or Stop)**
1208
1209 **Conditions:**
1210 * AI quality < 60%
1211 * Requires manual editing for most analyses (> 50%)
1212 * Feedback indicates fundamental flaws
1213 * Cost/effort not justified by value
1214 * No clear path to improvement
1215
1216 **Next Steps:**
1217 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1218 * **Stop:** Conclude approach not viable, revisit later
1219
1220 **Option C: ITERATE (Improve POC)**
1221
1222 **Conditions:**
1223 * Concept has merit but execution needs work
1224 * Specific improvements identified
1225 * Addressable with better prompts/approach
1226 * AI quality between 60% and 70%
1227
1228 **Next Steps:**
1229 * Improve AI prompts
1230 * Test different approaches
1231 * Re-run POC with improvements
1232 * Then make GO/NO-GO decision
1233
1234 === 13.2 Decision Criteria Summary ===
1235
1236 {{code}}
1237 AI Quality < 60% → NO-GO (approach doesn't work)
1238 AI Quality 60-70% → ITERATE (improve and retry)
1239 AI Quality ≥70% → GO (proceed to POC2)
1240 {{/code}}
1241
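The gate above maps directly onto a tiny function the team can apply to the measured quality score:

```python
def poc_decision(ai_quality: float) -> str:
    """Map measured AI quality (0-100) onto the POC decision gate."""
    if ai_quality < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality < 70:
        return "ITERATE"  # improve prompts and retry
    return "GO"           # proceed to POC2

print([poc_decision(q) for q in (55, 65, 82)])
```
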
1242 == 14. Key Risks & Mitigations ==
1243
1244 === 14.1 Risk: AI Quality Not Good Enough ===
1245
1246 **Likelihood:** Medium-High
1247 **Impact:** POC fails
1248
1249 **Mitigation:**
1250 * Extensive prompt engineering and testing
1251 * Use best available AI models (Sonnet 4.5)
1252 * Test with diverse article types
1253 * Iterate on prompts based on results
1254
1255 **Acceptance:** This is exactly what the POC tests; be prepared for failure
1256
1257 === 14.2 Risk: AI Consistency Issues ===
1258
1259 **Likelihood:** Medium
1260 **Impact:** Works sometimes, fails other times
1261
1262 **Mitigation:**
1263 * Test with 10+ diverse articles
1264 * Measure success rate honestly
1265 * Improve prompts to increase consistency
1266
1267 **Acceptance:** Some variability is acceptable if average quality is ≥70%
1268
1269 === 14.3 Risk: Output Incomprehensible ===
1270
1271 **Likelihood:** Low-Medium
1272 **Impact:** Users can't understand analysis
1273
1274 **Mitigation:**
1275 * Create clear explainer document
1276 * Iterate on output format
1277 * Test with non-technical reviewers
1278 * Simplify language if needed
1279
1280 **Acceptance:** Iterate until comprehensible
1281
1282 === 14.4 Risk: API Rate Limits / Costs ===
1283
1284 **Likelihood:** Low
1285 **Impact:** System slow or expensive
1286
1287 **Mitigation:**
1288 * Monitor API usage
1289 * Implement retry logic
1290 * Estimate costs before scaling
1291
1292 **Acceptance:** POC can be slow and expensive (optimization later)
1293
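The retry mitigation can be as small as a backoff wrapper around the API call. The attempt count, base delay, and jitter below are arbitrary starting points, not tuned values.

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Usage: wrap the Claude API call
print(with_retries(lambda: "ok"))
```

Combined with basic usage logging, this is usually enough resilience for a POC; rate limiting and caching can wait for the scaling phase.
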
1294 === 14.5 Risk: Scope Creep ===
1295
1296 **Likelihood:** Medium
1297 **Impact:** POC becomes too complex
1298
1299 **Mitigation:**
1300 * Strict scope discipline
1301 * Say NO to feature additions
1302 * Keep focus on core question
1303
1304 **Acceptance:** POC is minimal by design
1305
1306 == 15. POC Philosophy ==
1307
1308 === 15.1 Core Principles ===
1309
1310 **1. Build Less, Learn More**
1311 * Minimum features to test hypothesis
1312 * Don't build unvalidated features
1313 * Focus on core question only
1314
1315 **2. Fail Fast**
1316 * Quick test of hardest part (AI capability)
1317 * Accept that POC might fail
1318 * Better to discover issues early
1319 * Honest assessment over optimistic hope
1320
1321 **3. Test First, Build Second**
1322 * Validate AI can do this before building platform
1323 * Don't assume it will work
1324 * Let results guide decisions
1325
1326 **4. Automation First**
1327 * No manual editing allowed
1328 * Tests scalability, not just feasibility
1329 * Proves approach can work at scale
1330
1331 **5. Honest Assessment**
1332 * Don't cherry-pick examples
1333 * Don't manually fix bad outputs
1334 * Document failures openly
1335 * Make data-driven decisions
1336
1337 === 15.2 What POC Is ===
1338
1339 ✅ Testing AI capability without humans
1340 ✅ Proving core technical concept
1341 ✅ Fast validation of approach
1342 ✅ Honest assessment of feasibility
1343
1344 === 15.3 What POC Is NOT ===
1345
1346 ❌ Building a product
1347 ❌ Production-ready system
1348 ❌ Feature-complete platform
1349 ❌ Perfectly accurate analysis
1350 ❌ Polished user experience
1351
1352 == 16. Success = Clear Path Forward ==
1353
1354 **If POC succeeds (≥70% AI quality):**
1355 * ✅ Approach validated
1356 * ✅ Proceed to POC2 (add scenarios)
1357 * ✅ Design full Evidence Model structure
1358 * ✅ Test multi-scenario comparison
1359 * ✅ Focus on improving AI quality from 70% → 90%
1360
1361 **If POC fails (< 60% AI quality):**
1362 * ✅ Learn what doesn't work
1363 * ✅ Pivot to different approach
1364 * ✅ OR wait for better AI technology
1365 * ✅ Avoid wasting resources on non-viable approach
1366
1367 **Either way, POC provides clarity.**
1368
1369 == 17. Related Pages ==
1370
1371 * [[User Needs>>Test.FactHarbor.Specification.Requirements.User Needs.WebHome]]
1372 * [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
1373 * [[Gap Analysis>>Test.FactHarbor.Specification.Requirements.GapAnalysis]]
1374 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1375 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1376 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1377
1378 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)