Wiki source code of POC Requirements

Version 1.1 by Robert Schaub on 2025/12/19 16:13

= POC Requirements =
2
3 **Status:** ✅ Approved for Development
4 **Version:** 2.0 (Updated after Specification Cross-Check)
5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6
7 ---
8
9 == 1. POC Overview ==
10
11 === 1.1 What POC Tests ===
12
13 **Core Question:**
14 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
15
16 **What we're proving:**
17 * AI can identify factual claims from text
18 * AI can evaluate those claims and produce verdicts
19 * Output is comprehensible and useful
20 * Fully automated approach is viable
21
22 **What we're NOT testing:**
23 * Scenario generation (deferred to POC2)
24 * Evidence display (deferred to POC2)
25 * Production scalability
26 * Perfect accuracy
27 * Complete feature set
28
29 ---
30
31 === 1.2 Scenarios Deferred to POC2 ===
32
33 **Intentional Simplification:**
34
35 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
36
37 **Rationale:**
38 * **POC1 tests:** Can AI extract claims and generate verdicts?
39 * **POC2 will add:** Scenario generation and management
40 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
41
42 **Design Decision:**
43
44 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
45
46 **No Risk:**
47
48 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
49 * Faster POC1 validation
50 * Learning from POC1 to inform scenario design
51 * Iterative approach: fail fast if basic AI doesn't work
52 * Flexibility to adjust scenario architecture based on POC1 insights
53
54 **Full System Workflow (Future):**
55 {{code}}
56 Claims → Scenarios → Evidence → Verdicts
57 {{/code}}
58
59 **POC1 Simplified Workflow:**
60 {{code}}
61 Claims → Verdicts (scenarios implicit in reasoning)
62 {{/code}}
63
64 ---
65
66 == 2. POC Output Specification ==
67
68 === 2.1 Component 1: ANALYSIS SUMMARY ===
69
70 **What:** Brief overview of findings
71 **Length:** 3-5 sentences
72 **Content:**
73 * How many claims found
74 * Distribution of verdicts
75 * Overall assessment
76
77 **Example:**
78 {{code}}
This article makes 4 claims about coffee's health effects. We found
1 claim is well-supported, 1 is partially supported, 1 is uncertain,
and 1 is refuted. Overall assessment: mostly accurate with some exaggeration.
82 {{/code}}
83
84 ---
85
86 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
87
88 **What:** List of factual claims extracted from article
89 **Format:** Numbered list
90 **Quantity:** 3-5 claims
91 **Requirements:**
92 * Factual claims only (not opinions/questions)
93 * Clearly stated
94 * Automatically extracted by AI
95
96 **Example:**
97 {{code}}
98 CLAIMS IDENTIFIED:
99
100 [1] Coffee reduces diabetes risk by 30%
101 [2] Coffee improves heart health
102 [3] Decaf has same benefits as regular
103 [4] Coffee prevents Alzheimer's completely
104 {{/code}}
105
106 ---
107
108 === 2.3 Component 3: CLAIMS VERDICTS ===
109
110 **What:** Verdict for each claim identified
111 **Format:** Per claim structure
112
113 **Required Elements:**
114 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 * **Confidence Score:** 0-100%
116 * **Brief Reasoning:** 1-3 sentences explaining why
117 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
118
119 **Example:**
120 {{code}}
121 VERDICTS:
122
123 [1] WELL-SUPPORTED (85%) [Risk: C]
124 Multiple studies confirm 25-30% risk reduction with regular consumption.
125
126 [2] UNCERTAIN (65%) [Risk: B]
127 Evidence is mixed. Some studies show benefits, others show no effect.
128
129 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
130 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
131
132 [4] REFUTED (90%) [Risk: B]
133 No evidence for complete prevention. Claim is significantly overstated.
134 {{/code}}
135
136 **Risk Tier Display:**
137 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
138 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
139 * **Tier C (Green):** Low Risk - Facts/Definitions/History
140
141 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
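
The per-claim structure above could be represented as a small data model. The following is a minimal sketch in Python; the names `Verdict`, `VerdictLabel`, `RiskTier`, and `format_verdict` are hypothetical and not taken from the FactHarbor specification.

```python
from dataclasses import dataclass
from enum import Enum


class VerdictLabel(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"


class RiskTier(Enum):
    A = "High"
    B = "Medium"
    C = "Low"


@dataclass(frozen=True)
class Verdict:
    claim_number: int
    label: VerdictLabel
    confidence: int       # percent, 0-100
    risk_tier: RiskTier
    reasoning: str        # 1-3 sentences

    def __post_init__(self) -> None:
        # Reject scores outside the 0-100% range required above.
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def format_verdict(v: Verdict) -> str:
    """Render one verdict in the layout of the VERDICTS example."""
    header = (
        f"[{v.claim_number}] {v.label.value} "
        f"({v.confidence}%) [Risk: {v.risk_tier.name}]"
    )
    return f"{header}\n{v.reasoning}"
```

Keeping the label and tier as enums means an unexpected value from the model fails at construction time instead of reaching the display.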
142
143 ---
144
145 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
146
147 **What:** Brief summary of original article content
148 **Length:** 3-5 sentences
149 **Tone:** Neutral (article's position, not FactHarbor's analysis)
150
151 **Example:**
152 {{code}}
153 ARTICLE SUMMARY:
154
155 Health News Today article discusses coffee benefits, citing studies
156 on diabetes and Alzheimer's. Author highlights research linking coffee
157 to disease prevention. Recommends 2-3 cups daily for optimal health.
158 {{/code}}
159
160 ---
161
162 === 2.5 Total Output Size ===
163
164 **Combined:** ~200-300 words
165 * Analysis Summary: 50-70 words
166 * Claims Identification: 30-50 words
167 * Claims Verdicts: 100-150 words
168 * Article Summary: 30-50 words (optional)
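
The per-component word ranges above can be checked mechanically. This is a rough sketch, assuming whitespace-separated word counting and hypothetical section keys (`analysis_summary` and so on); the specification does not define how words are counted.

```python
# Word ranges per component, copied from the list above.
WORD_BUDGETS = {
    "analysis_summary": (50, 70),
    "claims_identification": (30, 50),
    "claims_verdicts": (100, 150),
    "article_summary": (30, 50),  # optional component
}


def check_word_budget(sections: dict) -> dict:
    """Return {section: (word_count, within_budget)} for each section present."""
    report = {}
    for name, text in sections.items():
        low, high = WORD_BUDGETS[name]
        count = len(text.split())  # assumption: words are whitespace-separated
        report[name] = (count, low <= count <= high)
    return report
```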
169
170 ---
171
172 == 3. What's NOT in POC Scope ==
173
174 === 3.1 Feature Exclusions ===
175
176 The following are **explicitly excluded** from POC:
177
178 **Content Features:**
179 * ❌ Scenarios (deferred to POC2)
180 * ❌ Evidence display (supporting/opposing lists)
181 * ❌ Source links (clickable references)
182 * ❌ Detailed reasoning chains
183 * ❌ Source quality ratings (shown but not detailed)
184 * ❌ Contradiction detection (basic only)
185 * ❌ Risk assessment (shown but not workflow-integrated)
186
187 **Platform Features:**
188 * ❌ User accounts / authentication
189 * ❌ Saved history
190 * ❌ Search functionality
191 * ❌ Claim comparison
192 * ❌ User contributions
193 * ❌ Commenting system
194 * ❌ Social sharing
195
196 **Technical Features:**
197 * ❌ Browser extensions
198 * ❌ Mobile apps
199 * ❌ API endpoints
200 * ❌ Webhooks
201 * ❌ Export features (PDF, CSV)
202
203 **Quality Features:**
204 * ❌ Accessibility (WCAG compliance)
205 * ❌ Multilingual support
206 * ❌ Mobile optimization
207 * ❌ Media verification (images/videos)
208
209 **Production Features:**
210 * ❌ Security hardening
211 * ❌ Privacy compliance (GDPR)
212 * ❌ Terms of service
213 * ❌ Monitoring/logging
214 * ❌ Error tracking
215 * ❌ Analytics
216 * ❌ A/B testing
217
218 ---
219
220 == 4. POC Simplifications vs. Full System ==
221
222 === 4.1 Architecture Comparison ===
223
224 **POC Architecture (Simplified):**
225 {{code}}
User Input → Single AKEL Call → Output Display
             (all processing)
228 {{/code}}
229
230 **Full System Architecture:**
231 {{code}}
232 User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 → Evidence Summarizer → Contradiction Detector → Verdict Generator
234 → Quality Gates → Publication → Output Display
235 {{/code}}
236
237 **Key Differences:**
238
239 |=Aspect|=POC1|=Full System
240 |Processing|Single API call|Multi-component pipeline
241 |Scenarios|None (implicit)|Explicit entities with versioning
242 |Evidence|Basic retrieval|Comprehensive with quality scoring
243 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
244 |Workflow|3 steps (input/process/output)|6 phases with gates
245 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
246 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
247
248 ---
249
250 === 4.2 Workflow Comparison ===
251
252 **POC1 Workflow:**
253 1. User submits text/URL
254 2. Single AKEL call (all processing in one prompt)
255 3. Display results
256 **Total: 3 steps, ~10-18 seconds**
257
258 **Full System Workflow:**
259 1. **Claim Submission** (extraction, normalization, clustering)
260 2. **Scenario Building** (definitions, assumptions, boundaries)
261 3. **Evidence Handling** (retrieval, assessment, linking)
262 4. **Verdict Creation** (synthesis, reasoning, approval)
263 5. **Public Presentation** (summaries, landscapes, deep dives)
264 6. **Time Evolution** (versioning, re-evaluation triggers)
265 **Total: 6 phases with quality gates, ~10-30 seconds**
266
267 ---
268
269 === 4.3 Why POC is Simplified ===
270
271 **Engineering Rationale:**
272
273 1. **Test core capability first:** Can AI do basic fact-checking without humans?
274 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
275 3. **Learn before building:** POC1 insights inform full architecture
276 4. **Iterative approach:** Add complexity only after validating foundations
277 5. **Resource efficiency:** Don't build full system if core concept fails
278
279 **Acceptable Trade-offs:**
280
* ✅ POC proves AI capability (the riskiest assumption)
282 * ✅ POC validates user comprehension (can people understand output?)
283 * ❌ POC doesn't validate full workflow (test in Beta)
284 * ❌ POC doesn't validate scale (test in Beta)
285 * ❌ POC doesn't validate scenario architecture (design in POC2)
286
287 ---
288
289 === 4.4 Gap Between POC1 and POC2/Beta ===
290
291 **What needs to be built for POC2:**
292 * Scenario generation component
293 * Evidence Model structure (full)
294 * Scenario-evidence linking
295 * Multi-interpretation comparison
296 * Truth landscape visualization
297
298 **What needs to be built for Beta:**
299 * Multi-component AKEL pipeline
300 * Quality gate infrastructure
301 * Review workflow system
302 * Audit sampling framework
303 * Production data model
304 * Federation architecture (Release 1.0)
305
**POC1 → POC2 is a significant architectural expansion.**
307
308 ---
309
310 == 5. Publication Mode & Labeling ==
311
312 === 5.1 POC Publication Mode ===
313
314 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
315
316 Per FactHarbor Specification Section 11 "POC v1 Behavior":
317 * Produces public AI-generated output
318 * No human approval gate
319 * Clear AI-Generated labeling
320 * All quality gates active (simplified)
321 * Risk tier classification shown (demo)
322
323 ---
324
325 === 5.2 User-Facing Labels ===
326
327 **Primary Label (top of analysis):**
328 {{code}}
329 ╔════════════════════════════════════════════════════════════╗
330 ║ [AI-GENERATED - POC/DEMO] ║
331 ║ ║
332 ║ This analysis was produced entirely by AI and has not ║
333 ║ been human-reviewed. Use for demonstration purposes. ║
334 ║ ║
335 ║ Source: AI/AKEL v1.0 (POC) ║
336 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
337 ║ Quality Gates: 4/4 Passed (Simplified) ║
338 ║ Last Updated: [timestamp] ║
339 ╚════════════════════════════════════════════════════════════╝
340 {{/code}}
341
342 **Per-Claim Risk Labels:**
343 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
344 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
345 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
346
347 ---
348
349 === 5.3 Display Requirements ===
350
351 **Must Show:**
352 * AI-Generated status (prominent)
353 * POC/Demo disclaimer
354 * Risk tier per claim
355 * Confidence scores (0-100%)
356 * Quality gate status (passed/failed)
357 * Timestamp
358
359 **Must NOT Claim:**
360 * Human review
361 * Production quality
362 * Medical/legal advice
363 * Authoritative verdicts
364 * Complete accuracy
365
366 ---
367
368 === 5.4 Mode 2 vs. Full System Publication ===
369
370 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
371 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
372 |Review|None|None|Human-Reviewed
373 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
374 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
375 |Risk Display|Demo only|Workflow-integrated|Validated
376 |User Actions|View only|Flag for review|Trust rating
377
378 ---
379
380 == 6. Quality Gates (Simplified Implementation) ==
381
382 === 6.1 Overview ===
383
384 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
385
386 **Full System Has 4 Gates:**
387 1. Source Quality
388 2. Contradiction Search (MANDATORY)
389 3. Uncertainty Quantification
390 4. Structural Integrity
391
392 **POC Implements Simplified Versions:**
393 * Focus on demonstrating concept
394 * Basic implementations sufficient
395 * Failures displayed to user (not blocking)
396 * Full system has comprehensive validation
397
398 ---
399
400 === 6.2 Gate 1: Source Quality (Basic) ===
401
402 **Full System Requirements:**
403 * Primary sources identified and accessible
404 * Source reliability scored against whitelist
405 * Citation completeness verified
406 * Publication dates checked
407 * Author credentials validated
408
409 **POC Implementation:**
410 * ✅ At least 2 sources found
411 * ✅ Sources accessible (URLs valid)
412 * ❌ No whitelist checking
413 * ❌ No credential validation
414 * ❌ No comprehensive reliability scoring
415
416 **Pass Criteria:** ≥2 accessible sources found
417
418 **Failure Handling:** Display error message, don't generate verdict
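
The pass criterion for this gate is simple enough to sketch directly. In the example below, `source_quality_gate` and the injected `is_accessible` callback are hypothetical; how accessibility is actually tested (for example an HTTP request per URL) is left open here.

```python
from typing import Callable, Iterable, Tuple

MIN_ACCESSIBLE_SOURCES = 2  # pass criterion stated above


def source_quality_gate(
    urls: Iterable[str],
    is_accessible: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (passed, detail) for the simplified Source Quality gate."""
    # Count only the sources the caller-supplied check reports as reachable.
    accessible = [u for u in urls if is_accessible(u)]
    n = len(accessible)
    detail = f"{n} source{'s' if n != 1 else ''} found"
    return n >= MIN_ACCESSIBLE_SOURCES, detail
```

Injecting the accessibility check keeps the gate logic testable without network access.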
419
420 ---
421
422 === 6.3 Gate 2: Contradiction Search (Basic) ===
423
424 **Full System Requirements:**
425 * Counter-evidence actively searched
426 * Reservations and limitations identified
427 * Alternative interpretations explored
428 * Bubble detection (echo chambers, conspiracy theories)
429 * Cross-cultural and international perspectives
430 * Academic literature (supporting AND opposing)
431
432 **POC Implementation:**
433 * ✅ Basic search for counter-evidence
434 * ✅ Identify obvious contradictions
435 * ❌ No comprehensive academic search
436 * ❌ No bubble detection
437 * ❌ No systematic alternative interpretation search
438 * ❌ No international perspective verification
439
440 **Pass Criteria:** Basic contradiction search attempted
441
442 **Failure Handling:** Note "limited contradiction search" in output
443
444 ---
445
446 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
447
448 **Full System Requirements:**
449 * Confidence scores calculated for all claims/verdicts
450 * Limitations explicitly stated
451 * Data gaps identified and disclosed
452 * Strength of evidence assessed
453 * Alternative scenarios considered
454
455 **POC Implementation:**
456 * ✅ Confidence scores (0-100%)
457 * ✅ Basic uncertainty acknowledgment
458 * ❌ No detailed limitation disclosure
459 * ❌ No data gap identification
460 * ❌ No alternative scenario consideration (deferred to POC2)
461
462 **Pass Criteria:** Confidence score assigned
463
464 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
465
466 ---
467
468 === 6.5 Gate 4: Structural Integrity (Basic) ===
469
470 **Full System Requirements:**
471 * No hallucinations detected (fact-checking against sources)
472 * Logic chain valid and traceable
473 * References accessible and verifiable
474 * No circular reasoning
475 * Premises clearly stated
476
477 **POC Implementation:**
478 * ✅ Basic coherence check
479 * ✅ References accessible
480 * ❌ No comprehensive hallucination detection
481 * ❌ No formal logic validation
482 * ❌ No premise extraction and verification
483
484 **Pass Criteria:** Output is coherent and references are accessible
485
486 **Failure Handling:** Display error message
487
488 ---
489
490 === 6.6 Quality Gate Display ===
491
492 **POC shows simplified status:**
493 {{code}}
494 Quality Gates: 4/4 Passed (Simplified)
495 ✓ Source Quality: 3 sources found
496 ✓ Contradiction Search: Basic search completed
497 ✓ Uncertainty: Confidence scores assigned
498 ✓ Structural Integrity: Output coherent
499 {{/code}}
500
501 **If any gate fails:**
502 {{code}}
503 Quality Gates: 3/4 Passed (Simplified)
504 ✓ Source Quality: 3 sources found
505 ✗ Contradiction Search: Search failed - limited evidence
506 ✓ Uncertainty: Confidence scores assigned
507 ✓ Structural Integrity: Output coherent
508
509 Note: This analysis has limited evidence. Use with caution.
510 {{/code}}
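
The two status displays above could be produced by a single rendering function. This is a sketch only: `GateResult` and `render_gate_status` are hypothetical names, and reusing the caution note from the example for any failed gate is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def render_gate_status(results: List[GateResult]) -> str:
    """Render gate results in the layout of the examples above."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        # Assumption: the same caution note is shown whichever gate failed.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```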
511
512 ---
513
514 === 6.7 Simplified vs. Full System ===
515
516 |=Gate|=POC (Simplified)|=Full System
517 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
518 |Contradiction|Basic search|Systematic academic + media + international
519 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
520 |Structural|Coherence check|Hallucination detection, logic validation, premise check
521
522 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
523
524 ---
525
526 == 7. AKEL Architecture Comparison ==
527
528 === 7.1 POC AKEL (Simplified) ===
529
530 **Implementation:**
531 * Single Claude API call (Sonnet 4.5)
532 * One comprehensive prompt
533 * All processing in single request
534 * No separate components
535 * No orchestration layer
536
537 **Prompt Structure:**
538 {{code}}
539 Task: Analyze this article and provide:
540
541 1. Extract 3-5 factual claims
542 2. For each claim:
   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
544 - Assign confidence score (0-100%)
545 - Assign risk tier (A/B/C)
546 - Write brief reasoning (1-3 sentences)
547 3. Generate analysis summary (3-5 sentences)
548 4. Generate article summary (3-5 sentences)
549 5. Run basic quality checks
550
551 Return as structured JSON.
552 {{/code}}
553
554 **Processing Time:** 10-18 seconds (estimate)
555
556 ---
557
558 === 7.2 Full System AKEL (Production) ===
559
560 **Architecture:**
561 {{code}}
562 AKEL Orchestrator
563 ├── Claim Extractor
564 ├── Claim Classifier (with risk tier assignment)
565 ├── Scenario Generator
566 ├── Evidence Summarizer
567 ├── Contradiction Detector
568 ├── Quality Gate Validator
569 ├── Audit Sampling Scheduler
570 └── Federation Sync Adapter (Release 1.0+)
571 {{/code}}
572
573 **Processing:**
574 * Parallel processing where possible
575 * Separate component calls
576 * Quality gates between phases
577 * Audit sampling selection
578 * Cross-node coordination (federated mode)
579
580 **Processing Time:** 10-30 seconds (full pipeline)
581
582 ---
583
584 === 7.3 Why POC Uses Single Call ===
585
586 **Advantages:**
587 * ✅ Simpler to implement
588 * ✅ Faster POC development
589 * ✅ Easier to debug
590 * ✅ Proves AI capability
591 * ✅ Good enough for concept validation
592
593 **Limitations:**
594 * ❌ No component reusability
595 * ❌ No parallel processing
596 * ❌ All-or-nothing (can't partially succeed)
597 * ❌ Harder to improve individual components
598 * ❌ No audit sampling
599
600 **Acceptable Trade-off:**
601
602 POC tests "Can AI do this?" not "How should we architect it?"
603
604 Full component architecture comes in Beta after POC validates concept.
605
606 ---
607
608 === 7.4 Evolution Path ===
609
610 **POC1:** Single prompt → Prove concept
611 **POC2:** Add scenario component → Test full pipeline
612 **Beta:** Multi-component AKEL → Production architecture
613 **Release 1.0:** Full AKEL + Federation → Scale
614
615 ---
616
617 == 8. Functional Requirements ==
618
619 === FR-POC-1: Article Input ===
620
621 **Requirement:** User can submit article for analysis
622
623 **Functionality:**
624 * Text input field (paste article text, up to 5000 characters)
625 * URL input field (paste article URL)
626 * "Analyze" button to trigger processing
627 * Loading indicator during analysis
628
629 **Excluded:**
630 * No user authentication
631 * No claim history
632 * No search functionality
633 * No saved templates
634
635 **Acceptance Criteria:**
636 * User can paste text from article
637 * User can paste URL of article
638 * System accepts input and triggers analysis
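
A backend could validate the submitted input before triggering analysis. The sketch below uses only the Python standard library; `validate_input` is a hypothetical helper, and treating text and URL as mutually exclusive is an assumption, since the requirement does not say what happens when both fields are filled.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated for the text input field


def validate_input(
    text: Optional[str] = None,
    url: Optional[str] = None,
) -> Tuple[str, str]:
    """Return ("text", value) or ("url", value); raise ValueError otherwise."""
    text = (text or "").strip()
    url = (url or "").strip()
    # Assumption: exactly one of the two fields must be filled in.
    if bool(text) == bool(url):
        raise ValueError("provide either article text or an article URL")
    if text:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        return "text", text
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an absolute http(s) address")
    return "url", url
```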
639
640 ---
641
642 === FR-POC-2: Claim Extraction (Fully Automated) ===
643
644 **Requirement:** AI automatically extracts 3-5 factual claims
645
646 **Functionality:**
647 * AI reads article text
648 * AI identifies factual claims (not opinions/questions)
649 * AI extracts 3-5 most important claims
650 * System displays numbered list
651
652 **Critical:** NO MANUAL EDITING ALLOWED
653 * AI selects which claims to extract
654 * AI identifies factual vs. non-factual
655 * System processes claims as extracted
656 * No human curation or correction
657
658 **Error Handling:**
659 * If extraction fails: Display error message
660 * User can retry with different input
661 * No manual intervention to fix extraction
662
663 **Acceptance Criteria:**
664 * AI extracts 3-5 claims automatically
665 * Claims are factual (not opinions)
666 * Claims are clearly stated
667 * No manual editing required
668
669 ---
670
671 === FR-POC-3: Verdict Generation (Fully Automated) ===
672
673 **Requirement:** AI automatically generates verdict for each claim
674
675 **Functionality:**
676 * For each claim, AI:
** Evaluates claim based on available evidence/knowledge
** Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
** Assigns confidence score (0-100%)
** Assigns risk tier (A/B/C)
** Writes brief reasoning (1-3 sentences)
682 * System displays verdict for each claim
683
684 **Critical:** NO MANUAL EDITING ALLOWED
685 * AI computes verdicts based on evidence
686 * AI generates confidence scores
687 * AI writes reasoning
688 * No human review or adjustment
689
690 **Error Handling:**
691 * If verdict generation fails: Display error message
692 * User can retry
693 * No manual intervention to adjust verdicts
694
695 **Acceptance Criteria:**
696 * Each claim has a verdict
697 * Confidence score is displayed (0-100%)
698 * Risk tier is displayed (A/B/C)
699 * Reasoning is understandable (1-3 sentences)
700 * Verdict is defensible given reasoning
701 * All generated automatically by AI
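
The mechanical parts of these acceptance criteria can be checked automatically. The sketch below assumes a parsed verdict dictionary with hypothetical keys (`verdict`, `confidence`, `risk_tier`, `reasoning`) and a rough punctuation-based sentence count. Whether a verdict is defensible still needs a human reviewer.

```python
import re

ALLOWED_LABELS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")
ALLOWED_TIERS = ("A", "B", "C")


def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def verdict_problems(verdict: dict) -> list:
    """Return a list of human-readable problems; empty if the verdict is well-formed."""
    problems = []
    if verdict.get("verdict") not in ALLOWED_LABELS:
        problems.append("unknown verdict label")
    conf = verdict.get("confidence")
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0 <= conf <= 100:
        problems.append("confidence must be a number from 0 to 100")
    if verdict.get("risk_tier") not in ALLOWED_TIERS:
        problems.append("risk tier must be A, B, or C")
    if not 1 <= count_sentences(str(verdict.get("reasoning", ""))) <= 3:
        problems.append("reasoning must be 1-3 sentences")
    return problems
```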
702
703 ---
704
705 === FR-POC-4: Analysis Summary (Fully Automated) ===
706
707 **Requirement:** AI generates brief summary of analysis
708
709 **Functionality:**
710 * AI summarizes findings in 3-5 sentences:
** How many claims found
** Distribution of verdicts
** Overall assessment
714 * System displays at top of results
715
716 **Critical:** NO MANUAL EDITING ALLOWED
717
718 **Acceptance Criteria:**
719 * Summary is coherent
720 * Accurately reflects analysis
721 * 3-5 sentences
722 * Automatically generated
723
724 ---
725
726 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
727
728 **Requirement:** AI generates brief summary of original article
729
730 **Functionality:**
731 * AI summarizes article content (not FactHarbor's analysis)
732 * 3-5 sentences
733 * System displays
734
**Note:** Optional; can be skipped if time is limited
736
737 **Critical:** NO MANUAL EDITING ALLOWED
738
739 **Acceptance Criteria:**
740 * Summary is neutral (article's position)
741 * Accurately reflects article content
742 * 3-5 sentences
743 * Automatically generated
744
745 ---
746
747 === FR-POC-6: Publication Mode Display ===
748
749 **Requirement:** Clear labeling of AI-generated content
750
751 **Functionality:**
752 * Display Mode 2 publication label
753 * Show POC/Demo disclaimer
754 * Display risk tiers per claim
755 * Show quality gate status
756 * Display timestamp
757
758 **Acceptance Criteria:**
759 * Label is prominent and clear
760 * User understands this is AI-generated POC output
761 * Risk tiers are color-coded
762 * Quality gate status is visible
763
764 ---
765
766 === FR-POC-7: Quality Gate Execution ===
767
768 **Requirement:** Execute simplified quality gates
769
770 **Functionality:**
771 * Check source quality (basic)
772 * Attempt contradiction search (basic)
773 * Calculate confidence scores
774 * Verify structural integrity (basic)
775 * Display gate results
776
777 **Acceptance Criteria:**
778 * All 4 gates attempted
779 * Pass/fail status displayed
780 * Failures explained to user
781 * Gates don't block publication (POC mode)
782
783 ---
784
785 == 9. Non-Functional Requirements ==
786
787 === NFR-POC-1: Fully Automated Processing ===
788
789 **Requirement:** Complete AI automation with zero manual intervention
790
791 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
792
793 **What this means:**
794 * Claims: AI selects (no human curation)
795 * Scenarios: N/A (deferred to POC2)
796 * Evidence: AI evaluates (no human selection)
797 * Verdicts: AI determines (no human adjustment)
798 * Summaries: AI writes (no human editing)
799
800 **Pipeline:**
801 {{code}}
802 User Input → AKEL Processing → Output Display
803
804 ZERO human editing
805 {{/code}}
806
807 **If AI output is poor:**
808 * ❌ Do NOT manually fix it
809 * ✅ Document the failure
810 * ✅ Improve prompts and retry
811 * ✅ Accept that POC might fail
812
813 **Why this matters:**
814 * Tests whether AI can do this without humans
815 * Validates scalability (humans can't review every analysis)
816 * Honest test of technical feasibility
817
818 ---
819
820 === NFR-POC-2: Performance ===
821
822 **Requirement:** Analysis completes in reasonable time
823
824 **Acceptable Performance:**
825 * Processing time: 1-5 minutes (acceptable for POC)
826 * Display loading indicator to user
827 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
828
829 **Not Required:**
830 * Production-level speed (< 30 seconds)
831 * Optimization for scale
832 * Caching
833
834 **Acceptance Criteria:**
835 * Analysis completes within 5 minutes
836 * User sees loading indicator
837 * No timeout errors
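
To check the 5-minute criterion during test sessions, each analysis run can be timed. This is a minimal sketch; `timed` is a hypothetical helper that measures a run and does not interrupt one that exceeds the budget.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
MAX_SECONDS = 5 * 60  # acceptance criterion: analysis completes within 5 minutes


def timed(run: Callable[[], T], budget_seconds: float = MAX_SECONDS) -> Tuple[T, float, bool]:
    """Run the analysis and return (result, elapsed_seconds, within_budget)."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    result = run()
    elapsed = time.monotonic() - start
    return result, elapsed, elapsed <= budget_seconds
```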
838
839 ---
840
841 === NFR-POC-3: Reliability ===
842
843 **Requirement:** System works for manual testing sessions
844
845 **Acceptable:**
846 * Occasional errors (< 20% failure rate)
847 * Manual restart if needed
848 * Display error messages clearly
849
850 **Not Required:**
851 * 99.9% uptime
852 * Automatic error recovery
853 * Production monitoring
854
855 **Acceptance Criteria:**
856 * System works for test demonstrations
857 * Errors are handled gracefully
858 * User receives clear error messages
859
860 ---
861
862 === NFR-POC-4: Environment ===
863
864 **Requirement:** Runs on simple infrastructure
865
866 **Acceptable:**
867 * Single machine or simple cloud setup
868 * No distributed architecture
869 * No load balancing
870 * No redundancy
871 * Local development environment viable
872
873 **Not Required:**
874 * Production infrastructure
875 * Multi-region deployment
876 * Auto-scaling
877 * Disaster recovery
878
879 ---
880
881 == 10. Technical Architecture ==
882
883 === 10.1 System Components ===
884
885 **Frontend:**
886 * Simple HTML form (text input + URL input + button)
887 * Loading indicator
888 * Results display page (single page, no tabs/navigation)
889
890 **Backend:**
891 * Single API endpoint
892 * Calls Claude API (Sonnet 4.5 or latest)
893 * Parses response
894 * Returns JSON to frontend
895
896 **Data Storage:**
897 * None required (stateless POC)
898 * Optional: Simple file storage or SQLite for demo examples
899
900 **External Services:**
901 * Claude API (Anthropic) - required
902 * Optional: URL fetch service for article text extraction
903
904 ---
905
906 === 10.2 Processing Flow ===
907
908 {{code}}
909 1. User submits text or URL
910
911 2. Backend receives request
912
913 3. If URL: Fetch article text
914
915 4. Call Claude API with single prompt:
916 "Extract claims, evaluate each, provide verdicts"
917
918 5. Claude API returns:
919 - Analysis summary
920 - Claims list
921 - Verdicts for each claim (with risk tiers)
922 - Article summary (optional)
923 - Quality gate results
924
925 6. Backend parses response
926
927 7. Frontend displays results with Mode 2 labeling
928 {{/code}}
929
930 **Key Simplification:** Single API call does entire analysis
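
The flow above can be sketched as one backend function. Everything named here is an assumption for illustration: `analyze`, the injected `fetch_article` and `call_model` callables, and the `label` key. In a real build, `call_model` would wrap the Claude API call.

```python
import json
from typing import Callable


def analyze(
    kind: str,                             # "text" or "url" (step 1)
    value: str,
    fetch_article: Callable[[str], str],   # hypothetical URL-to-text helper
    call_model: Callable[[str], str],      # hypothetical wrapper around the model call
) -> dict:
    """Run the single-call POC flow and return the parsed result."""
    # Step 3: resolve a URL to article text; pasted text is used as-is.
    article = fetch_article(value) if kind == "url" else value
    # Step 4: one prompt carries the whole task.
    prompt = (
        "Extract claims, evaluate each, provide verdicts. "
        "Return as structured JSON.\n\nARTICLE:\n" + article
    )
    raw = call_model(prompt)
    # Step 6: parse the model's JSON response.
    result = json.loads(raw)
    # Step 7: attach the Mode 2 label so the frontend can display it.
    result["label"] = "AI-GENERATED - POC/DEMO"
    return result
```

Passing the fetcher and the model call as parameters keeps the flow testable with fakes and leaves the choice of HTTP client and SDK open.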
931
932 ---
933
934 === 10.3 AI Prompt Strategy ===
935
936 **Single Comprehensive Prompt:**
937 {{code}}
938 Task: Analyze this article and provide:
939
940 1. Extract 3-5 factual claims from the article
941 2. For each claim:
942 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
943 - Assign confidence score (0-100%)
944 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
945 - Write brief reasoning (1-3 sentences)
946 3. Run quality gates:
947 - Check: ≥2 sources found
948 - Attempt: Basic contradiction search
949 - Calculate: Confidence scores
950 - Verify: Structural integrity
951 4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
952 5. Write article summary (3-5 sentences: neutral summary of article content)
953
954 Return as structured JSON with quality gate results.
955 {{/code}}
956
957 **One prompt generates everything.**
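
Because one response carries everything, the backend has to parse it defensively. The sketch below assumes the model may surround the JSON object with extra prose, and that the object has a `claims` list; both the key name and the helper `parse_model_json` are hypothetical.

```python
import json


def parse_model_json(raw: str, min_claims: int = 3, max_claims: int = 5) -> dict:
    """Extract the JSON object from a model response and check the claim count."""
    # Take the span from the first "{" to the last "}" to skip surrounding prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    claims = data.get("claims", [])
    if not min_claims <= len(claims) <= max_claims:
        raise ValueError(
            f"expected {min_claims}-{max_claims} claims, got {len(claims)}"
        )
    return data
```

Any failure surfaces as a `ValueError`, which fits the all-or-nothing behavior of the single-call design: the user sees an error and can retry.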
958
959 ---
960
961 === 10.4 Technology Stack Suggestions ===
962
963 **Frontend:**
964 * HTML + CSS + JavaScript (minimal framework)
965 * OR: Next.js (if team prefers)
966 * Hosted: Local machine OR Vercel/Netlify free tier
967
968 **Backend:**
969 * Python Flask/FastAPI (simple REST API)
970 * OR: Next.js API routes (if using Next.js)
971 * Hosted: Local machine OR Railway/Render free tier
972
973 **AKEL Integration:**
974 * Claude API via Anthropic SDK
975 * Model: Claude Sonnet 4.5 or latest available
976
977 **Database:**
978 * None (stateless acceptable)
979 * OR: SQLite if want to store demo examples
980 * OR: JSON files on disk
981
982 **Deployment:**
983 * Local development environment sufficient for POC
984 * Optional: Deploy to cloud for remote demos
985
986 ---
987
988 == 11. Success Criteria ==
989
990 === 11.1 Minimum Success (POC Passes) ===
991
992 **Required for GO decision:**
993 * ✅ AI extracts 3-5 factual claims automatically
994 * ✅ AI provides verdict for each claim automatically
995 * ✅ Verdicts are reasonable (≥70% make logical sense)
996 * ✅ Analysis summary is coherent
997 * ✅ Output is comprehensible to reviewers
998 * ✅ Team/advisors understand the output
999 * ✅ Team agrees approach has merit
* ✅ **Minimal or no manual editing needed** (< 30% of analyses are judged to need human correction; per NFR-POC-1, no corrections are actually applied)
1001
1002 **Quality Definition:**
1003 * "Reasonable verdict" = Defensible given general knowledge
1004 * "Coherent summary" = Logically structured, grammatically correct
1005 * "Comprehensible" = Reviewers understand what analysis means
1006
1007 ---
1008
1009 === 11.2 POC Fails If ===
1010
1011 **Automatic NO-GO if any of these:**
1012 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1013 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1014 * ❌ Output incomprehensible (reviewers can't understand analysis)
1015 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1016 * ❌ Team loses confidence in AI-automated approach
1017
1018 ---
1019
1020 === 11.3 Quality Thresholds ===
1021
1022 **POC quality expectations:**
1023
1024 |=Component|=Quality Threshold|=Definition
1025 |Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1026 |Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1027 |Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1028 |Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1029
1030 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
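
The numeric thresholds from Sections 11.1 to 11.3 can be combined into one decision helper. This is a sketch under stated assumptions: `poc_decision` is a hypothetical function, inputs are fractions measured over the test analyses, and the `INCONCLUSIVE` outcome for results between the NO-GO and GO thresholds is not defined in this document. The qualitative criteria (comprehensibility, team confidence) are not covered.

```python
def poc_decision(
    claim_accuracy: float,       # fraction of claim extractions judged correct
    verdicts_reasonable: float,  # fraction of verdicts judged reasonable
    manual_edit_rate: float,     # fraction of analyses judged to need human correction
) -> str:
    """Return "GO", "NO-GO", or "INCONCLUSIVE" from the numeric thresholds."""
    # Automatic NO-GO conditions (Section 11.2).
    if claim_accuracy < 0.60 or verdicts_reasonable < 0.60 or manual_edit_rate > 0.50:
        return "NO-GO"
    # Minimum success (Sections 11.1 and 11.3).
    if claim_accuracy >= 0.70 and verdicts_reasonable >= 0.70 and manual_edit_rate < 0.30:
        return "GO"
    # Assumption: anything in between needs a team judgment call.
    return "INCONCLUSIVE"
```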
1031
1032 **Not expecting:**
1033 * 100% accuracy
1034 * Perfect claim coverage
1035 * Comprehensive evidence gathering
1036 * Flawless verdicts
1037 * Production polish
1038
1039 **Expecting:**
1040 * Reasonable claim extraction
1041 * Defensible verdicts
1042 * Understandable reasoning
1043 * Useful output
1044
1045 ---
1046
1047 == 12. Test Cases ==
1048
1049 === 12.1 Test Case 1: Simple Factual Claim ===
1050
1051 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1052
1053 **Expected Output:**
1054 * Extract claim correctly
1055 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1056 * Confidence: 70-90%
1057 * Risk tier: C (Low)
1058 * Reasoning: Mentions studies or evidence
1059
1060 **Success:** Verdict is reasonable and reasoning makes sense
1061
1062 ---
1063
1064 === 12.2 Test Case 2: Complex News Article ===
1065
1066 **Input:** News article URL with multiple claims about politics/health/science
1067
1068 **Expected Output:**
1069 * Extract 3-5 key claims
1070 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1071 * Coherent analysis summary
1072 * Article summary
1073 * Risk tiers assigned appropriately
1074
1075 **Success:** Claims identified are actually from article, verdicts are reasonable
1076
1077 ---
1078
1079 === 12.3 Test Case 3: Controversial Topic ===
1080
1081 **Input:** Article on contested political or scientific topic
1082
1083 **Expected Output:**
1084 * Balanced analysis
1085 * Acknowledges uncertainty where appropriate
1086 * Doesn't overstate confidence
1087 * Reasoning shows awareness of complexity
1088
1089 **Success:** Analysis is fair and doesn't show obvious bias
1090
1091 ---
1092
1093 === 12.4 Test Case 4: Clearly False Claim ===
1094
1095 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1096
1097 **Expected Output:**
1098 * Extract claim
1099 * Verdict: REFUTED
1100 * High confidence (> 90%)
1101 * Risk tier: C (Low - established fact)
1102 * Clear reasoning
1103
1104 **Success:** AI correctly identifies false claim with high confidence
1105
1106 ---
1107
1108 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1109
1110 **Input:** Article with claim where evidence is genuinely mixed
1111
1112 **Expected Output:**
1113 * Extract claim
1114 * Verdict: UNCERTAIN
1115 * Moderate confidence (40-60%)
1116 * Reasoning explains why uncertain
1117
1118 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1119
1120 ---
1121
1122 === 12.6 Test Case 6: High-Risk Medical Claim ===
1123
1124 **Input:** Article making medical claims
1125
1126 **Expected Output:**
1127 * Extract claim
1128 * Verdict: [appropriate based on evidence]
1129 * Risk tier: A (High - medical)
1130 * Red label displayed
1131 * Clear disclaimer about not being medical advice
1132
1133 **Success:** Risk tier correctly assigned, appropriate warnings shown
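
The mechanically checkable parts of these test cases (verdict label, confidence range, risk tier) could be encoded as data and compared against pipeline output. The sketch below is an assumption about how that might look. `Expectation`, `AnalysisResult`, `EXPECTATIONS`, and `meets_expectation` are hypothetical names, and confidence is assumed to be reported as a percentage. Qualitative expectations such as balanced analysis or clear reasoning still need human review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expectation:
    allowed_verdicts: Optional[set[str]] = None  # None = any verdict accepted
    min_confidence: Optional[float] = None       # percent
    min_exclusive: bool = False                  # True for "> 90%" style bounds
    max_confidence: Optional[float] = None       # percent, inclusive
    risk_tier: Optional[str] = None              # None = not checked


@dataclass
class AnalysisResult:
    # Hypothetical shape of one claim's output from the pipeline.
    verdict: str
    confidence: float  # percent
    risk_tier: str


EXPECTATIONS = {
    # Test Case 1: simple factual claim
    "simple_factual": Expectation(
        allowed_verdicts={"WELL-SUPPORTED", "PARTIALLY SUPPORTED"},
        min_confidence=70, max_confidence=90, risk_tier="C",
    ),
    # Test Case 4: clearly false claim
    "clearly_false": Expectation(
        allowed_verdicts={"REFUTED"},
        min_confidence=90, min_exclusive=True, risk_tier="C",
    ),
    # Test Case 5: genuinely uncertain claim
    "uncertain": Expectation(
        allowed_verdicts={"UNCERTAIN"}, min_confidence=40, max_confidence=60,
    ),
    # Test Case 6: high-risk medical claim (verdict depends on evidence)
    "medical": Expectation(risk_tier="A"),
}


def meets_expectation(result: AnalysisResult, exp: Expectation) -> bool:
    """Check only the fields that the expectation constrains."""
    if exp.allowed_verdicts is not None and result.verdict not in exp.allowed_verdicts:
        return False
    if exp.min_confidence is not None:
        if exp.min_exclusive and result.confidence <= exp.min_confidence:
            return False
        if not exp.min_exclusive and result.confidence < exp.min_confidence:
            return False
    if exp.max_confidence is not None and result.confidence > exp.max_confidence:
        return False
    if exp.risk_tier is not None and result.risk_tier != exp.risk_tier:
        return False
    return True
```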
1134
1135 ---
1136
1137 == 13. POC Decision Gate ==
1138
1139 === 13.1 Decision Framework ===
1140
1141 After POC testing is complete, the team makes one of three decisions:
1142
1143 **Option A: GO (Proceed to POC2)**
1144
1145 **Conditions:**
1146 * AI quality ≥70% without manual editing
1147 * Basic claim → verdict pipeline validated
1148 * Internal + advisor feedback positive
1149 * Technical feasibility confirmed
1150 * Team confident in direction
1151 * Clear path to improving AI quality to ≥90%
1152
1153 **Next Steps:**
1154 * Plan POC2 development (add scenarios)
1155 * Design scenario architecture
1156 * Expand to Evidence Model structure
1157 * Test with more complex articles
1158
1159 ---
1160
1161 **Option B: NO-GO (Pivot or Stop)**
1162
1163 **Conditions:**
1164 * AI quality < 60%
1165 * Requires manual editing for most analyses (> 50%)
1166 * Feedback indicates fundamental flaws
1167 * Cost/effort not justified by value
1168 * No clear path to improvement
1169
1170 **Next Steps:**
1171 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1172 * **Stop:** Conclude approach not viable, revisit later
1173
1174 ---
1175
1176 **Option C: ITERATE (Improve POC)**
1177
1178 **Conditions:**
1179 * Concept has merit but execution needs work
1180 * Specific improvements identified
1181 * Addressable with better prompts/approach
1182 * AI quality between 60% and 70% (at least 60%, but below the 70% GO threshold)
1183
1184 **Next Steps:**
1185 * Improve AI prompts
1186 * Test different approaches
1187 * Re-run POC with improvements
1188 * Then make GO/NO-GO decision
1189
1190 ---
1191
1192 === 13.2 Decision Criteria Summary ===
1193
1194 {{code}}
1195 AI Quality < 60% → NO-GO (approach doesn't work)
1196 AI Quality 60% to <70% → ITERATE (improve and retry)
1197 AI Quality ≥70% → GO (proceed to POC2)
1198 {{/code}}
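
The same mapping can be written as a small function. This is a minimal sketch that assumes overall AI quality has already been aggregated into a single percentage; how that aggregation is done is not defined in this section. The function name `poc_decision` is hypothetical. Exactly 70% is treated as GO, following the "≥70%" rule.

```python
def poc_decision(ai_quality: float) -> str:
    """Map an overall AI quality score (0-100) to the decision in section 13.2."""
    if not 0 <= ai_quality <= 100:
        raise ValueError("ai_quality must be a percentage between 0 and 100")
    if ai_quality < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality < 70:
        return "ITERATE"  # improve and retry
    return "GO"           # proceed to POC2
```

Note that the quality score alone does not settle the decision: the other GO and NO-GO conditions in section 13.1 (feedback, feasibility, manual editing rate) still apply.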
1199
1200 ---
1201
1202 == 14. Key Risks & Mitigations ==
1203
1204 === 14.1 Risk: AI Quality Not Good Enough ===
1205
1206 **Likelihood:** Medium-High
1207 **Impact:** POC fails
1208
1209 **Mitigation:**
1210 * Extensive prompt engineering and testing
1211 * Use best available AI models (Sonnet 4.5)
1212 * Test with diverse article types
1213 * Iterate on prompts based on results
1214
1215 **Acceptance:** This is exactly what the POC tests; be prepared for failure
1216
1217 ---
1218
1219 === 14.2 Risk: AI Consistency Issues ===
1220
1221 **Likelihood:** Medium
1222 **Impact:** Works sometimes, fails other times
1223
1224 **Mitigation:**
1225 * Test with 10+ diverse articles
1226 * Measure success rate honestly
1227 * Improve prompts to increase consistency
1228
1229 **Acceptance:** Some variability OK if average quality ≥70%
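
Measuring the success rate honestly means reporting every run, not only the good ones. The sketch below shows one way to summarize per-article quality scores. It assumes each article receives a single quality score as a percentage; the function name `summarize_runs` and the returned keys are hypothetical.

```python
from statistics import mean


def summarize_runs(scores: list[float], threshold: float = 70.0) -> dict[str, float]:
    """Summarize per-article quality scores (percentages) across all test runs."""
    if not scores:
        raise ValueError("at least one score is required")
    passed = sum(1 for s in scores if s >= threshold)
    return {
        "average_quality": mean(scores),        # compared against the 70% target
        "success_rate": passed / len(scores),   # share of articles at or above threshold
        "worst_score": min(scores),             # exposes the variability behind the average
    }
```

Reporting the worst score alongside the average keeps a few strong results from hiding inconsistent behavior.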
1230
1231 ---
1232
1233 === 14.3 Risk: Output Incomprehensible ===
1234
1235 **Likelihood:** Low-Medium
1236 **Impact:** Users can't understand analysis
1237
1238 **Mitigation:**
1239 * Create clear explainer document
1240 * Iterate on output format
1241 * Test with non-technical reviewers
1242 * Simplify language if needed
1243
1244 **Acceptance:** Iterate until comprehensible
1245
1246 ---
1247
1248 === 14.4 Risk: API Rate Limits / Costs ===
1249
1250 **Likelihood:** Low
1251 **Impact:** System slow or expensive
1252
1253 **Mitigation:**
1254 * Monitor API usage
1255 * Implement retry logic
1256 * Estimate costs before scaling
1257
1258 **Acceptance:** POC can be slow and expensive (optimization later)
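
The retry logic mentioned in the mitigation could follow a standard exponential backoff pattern. The sketch below is an illustration under assumptions, not the project's chosen design. `RateLimitError` is a placeholder for whatever exception the selected API client actually raises, and `call_with_retry` is a hypothetical helper name. The `sleep` parameter exists only so the wait can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the rate-limit error raised by the chosen API client."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Only rate-limit errors are retried here; other failures propagate immediately so that genuine bugs are not masked by retries.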
1259
1260 ---
1261
1262 === 14.5 Risk: Scope Creep ===
1263
1264 **Likelihood:** Medium
1265 **Impact:** POC becomes too complex
1266
1267 **Mitigation:**
1268 * Strict scope discipline
1269 * Say NO to feature additions
1270 * Keep focus on core question
1271
1272 **Acceptance:** POC is minimal by design
1273
1274 ---
1275
1276 == 15. POC Philosophy ==
1277
1278 === 15.1 Core Principles ===
1279
1280 **1. Build Less, Learn More**
1281 * Minimum features to test hypothesis
1282 * Don't build unvalidated features
1283 * Focus on core question only
1284
1285 **2. Fail Fast**
1286 * Quick test of hardest part (AI capability)
1287 * Accept that POC might fail
1288 * Better to discover issues early
1289 * Honest assessment over optimistic hope
1290
1291 **3. Test First, Build Second**
1292 * Validate AI can do this before building platform
1293 * Don't assume it will work
1294 * Let results guide decisions
1295
1296 **4. Automation First**
1297 * No manual editing allowed
1298 * Tests scalability, not just feasibility
1299 * Proves approach can work at scale
1300
1301 **5. Honest Assessment**
1302 * Don't cherry-pick examples
1303 * Don't manually fix bad outputs
1304 * Document failures openly
1305 * Make data-driven decisions
1306
1307 ---
1308
1309 === 15.2 What POC Is ===
1310
1311 ✅ Testing AI capability without humans
1312 ✅ Proving core technical concept
1313 ✅ Fast validation of approach
1314 ✅ Honest assessment of feasibility
1315
1316 ---
1317
1318 === 15.3 What POC Is NOT ===
1319
1320 ❌ Building a product
1321 ❌ Production-ready system
1322 ❌ Feature-complete platform
1323 ❌ Perfectly accurate analysis
1324 ❌ Polished user experience
1325
1326 ---
1327
1328 == 16. Success = Clear Path Forward ==
1329
1330 **If POC succeeds (≥70% AI quality):**
1331 * ✅ Approach validated
1332 * ✅ Proceed to POC2 (add scenarios)
1333 * ✅ Design full Evidence Model structure
1334 * ✅ Test multi-scenario comparison
1335 * ✅ Focus on improving AI quality from 70% → 90%
1336
1337 **If POC fails (< 60% AI quality):**
1338 * ✅ Learn what doesn't work
1339 * ✅ Pivot to different approach
1340 * ✅ OR wait for better AI technology
1341 * ✅ Avoid wasting resources on non-viable approach
1342
1343 **Either way, POC provides clarity.**
1344
1345 ---
1346
1347 == 17. Related Pages ==
1348
1349 * [[User Needs>>FactHarbor.Specification.Requirements.User Needs]]
1350 * [[Requirements>>FactHarbor.Requirements.WebHome]]
1351 * [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1352 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1353 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1354 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1355
1356 ---
1357
1358 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1359