Wiki source code of POC Requirements

Last modified by Robert Schaub on 2026/02/08 08:27

1 = POC Requirements =
2
3 **Status:** ✅ Approved for Development
4 **Version:** 2.0 (Updated after Specification Cross-Check)
5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6
7 ----
8
9 == 1. POC Overview ==
10
11 === 1.1 What POC Tests ===
12
13 **Core Question:**
14
15 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
16
17 **What we're proving:**
18
19 * AI can identify factual claims from text
20 * AI can evaluate those claims and produce verdicts
21 * Output is comprehensible and useful
22 * Fully automated approach is viable
23
24 **What we're NOT testing:**
25
26 * Scenario generation (deferred to POC2)
27 * Evidence display (deferred to POC2)
28 * Production scalability
29 * Perfect accuracy
30 * Complete feature set
31
32 ----
33
34 === 1.2 Scenarios Deferred to POC2 ===
35
36 **Intentional Simplification:**
37
38 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
39
40 **Rationale:**
41
42 * **POC1 tests:** Can AI extract claims and generate verdicts?
43 * **POC2 will add:** Scenario generation and management
44 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
45
46 **Design Decision:**
47
48 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
49
50 **No Risk:**
51
52 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
53
54 * Faster POC1 validation
55 * Learning from POC1 to inform scenario design
56 * Iterative approach: fail fast if basic AI doesn't work
57 * Flexibility to adjust scenario architecture based on POC1 insights
58
59 **Full System Workflow (Future):**
60 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
61
62 **POC1 Simplified Workflow:**
63 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
64
65 ----
66
67 == 2. POC Output Specification ==
68
69 === 2.1 Component 1: ANALYSIS SUMMARY ===
70
71 **What:** Brief overview of findings
72 **Length:** 3-5 sentences
73 **Content:**
74
75 * How many claims found
76 * Distribution of verdicts
77 * Overall assessment
78
79 **Example:**
80 {{code}}This article makes 4 claims about coffee's health effects. We found
81 2 claims are well-supported, 1 is uncertain, and 1 is refuted.
82 Overall assessment: mostly accurate with some exaggeration.{{/code}}
83
84 ----
85
86 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
87
88 **What:** List of factual claims extracted from article
89 **Format:** Numbered list
90 **Quantity:** 3-5 claims
91 **Requirements:**
92
93 * Factual claims only (not opinions/questions)
94 * Clearly stated
95 * Automatically extracted by AI
96
97 **Example:**
98 {{code}}CLAIMS IDENTIFIED:
99
100 [1] Coffee reduces diabetes risk by 30%
101 [2] Coffee improves heart health
102 [3] Decaf has same benefits as regular
103 [4] Coffee prevents Alzheimer's completely{{/code}}
104
105 ----
106
107 === 2.3 Component 3: CLAIMS VERDICTS ===
108
109 **What:** Verdict for each claim identified
110 **Format:** Per-claim structure
111
112 **Required Elements:**
113
114 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
115 * **Confidence Score:** 0-100%
116 * **Brief Reasoning:** 1-3 sentences explaining why
117 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
118
119 **Example:**
120 {{code}}VERDICTS:
121
122 [1] WELL-SUPPORTED (85%) [Risk: C]
123 Multiple studies confirm 25-30% risk reduction with regular consumption.
124
125 [2] UNCERTAIN (65%) [Risk: B]
126 Evidence is mixed. Some studies show benefits, others show no effect.
127
128 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
129 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
130
131 [4] REFUTED (90%) [Risk: B]
132 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
133
134 **Risk Tier Display:**
135
136 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
137 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
138 * **Tier C (Green):** Low Risk - Facts/Definitions/History
139
140 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
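The per-claim structure above could be modelled roughly as follows. This is a minimal sketch, not the FactHarbor data model: the names `Verdict`, `RiskTier`, `ClaimVerdict` and `format_verdict` are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"


class RiskTier(Enum):
    A = "A"  # High risk: Medical/Legal/Safety/Elections
    B = "B"  # Medium risk: Policy/Science/Causality
    C = "C"  # Low risk: Facts/Definitions/History


@dataclass(frozen=True)
class ClaimVerdict:
    """Hypothetical container for the required elements of one verdict."""
    number: int
    verdict: Verdict
    confidence: int  # percentage, 0-100
    reasoning: str   # 1-3 sentences
    risk_tier: RiskTier

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def format_verdict(cv: ClaimVerdict) -> str:
    """Render one verdict in the layout used by the example above."""
    header = f"[{cv.number}] {cv.verdict.value} ({cv.confidence}%) [Risk: {cv.risk_tier.value}]"
    return f"{header}\n{cv.reasoning}"
```

Rejecting an out-of-range confidence at construction time keeps malformed AI output from reaching the display layer.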
141
142 ----
143
144 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
145
146 **What:** Brief summary of original article content
147 **Length:** 3-5 sentences
148 **Tone:** Neutral (article's position, not FactHarbor's analysis)
149
150 **Example:**
151 {{code}}ARTICLE SUMMARY:
152
153 Health News Today article discusses coffee benefits, citing studies
154 on diabetes and Alzheimer's. Author highlights research linking coffee
155 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
156
157 ----
158
159 === 2.5 Total Output Size ===
160
161 **Combined:** 200-300 words
162
163 * Analysis Summary: 50-70 words
164 * Claims Identification: 30-50 words
165 * Claims Verdicts: 100-150 words
166 * Article Summary: 30-50 words (optional)
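A size check against these budgets might look like the sketch below. The component keys and the helper name `check_word_budgets` are assumptions made for illustration; only the word ranges come from the list above.

```python
# Word ranges per component, taken from Section 2.5.
WORD_BUDGETS = {
    "analysis_summary": (50, 70),
    "claims_identification": (30, 50),
    "claims_verdicts": (100, 150),
    "article_summary": (30, 50),
}
OPTIONAL_SECTIONS = {"article_summary"}


def check_word_budgets(sections: dict[str, str]) -> dict[str, bool]:
    """Map each component name to True if its word count is within budget."""
    results = {}
    for name, (low, high) in WORD_BUDGETS.items():
        text = sections.get(name)
        if text is None:
            # A missing component is only acceptable when it is optional.
            results[name] = name in OPTIONAL_SECTIONS
        else:
            results[name] = low <= len(text.split()) <= high
    return results
```

Note that the per-component ranges add up to 210-320 words with the article summary and 180-270 words without it, so the combined 200-300 figure is best read as approximate.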
167
168 ----
169
170 == 3. What's NOT in POC Scope ==
171
172 === 3.1 Feature Exclusions ===
173
174 The following are **explicitly excluded** from POC:
175
176 **Content Features:**
177
178 * ❌ Scenarios (deferred to POC2)
179 * ❌ Evidence display (supporting/opposing lists)
180 * ❌ Source links (clickable references)
181 * ❌ Detailed reasoning chains
182 * ❌ Source quality ratings (shown but not detailed)
183 * ❌ Contradiction detection (basic only)
184 * ❌ Risk assessment (shown but not workflow-integrated)
185
186 **Platform Features:**
187
188 * ❌ User accounts / authentication
189 * ❌ Saved history
190 * ❌ Search functionality
191 * ❌ Claim comparison
192 * ❌ User contributions
193 * ❌ Commenting system
194 * ❌ Social sharing
195
196 **Technical Features:**
197
198 * ❌ Browser extensions
199 * ❌ Mobile apps
200 * ❌ API endpoints
201 * ❌ Webhooks
202 * ❌ Export features (PDF, CSV)
203
204 **Quality Features:**
205
206 * ❌ Accessibility (WCAG compliance)
207 * ❌ Multilingual support
208 * ❌ Mobile optimization
209 * ❌ Media verification (images/videos)
210
211 **Production Features:**
212
213 * ❌ Security hardening
214 * ❌ Privacy compliance (GDPR)
215 * ❌ Terms of service
216 * ❌ Monitoring/logging
217 * ❌ Error tracking
218 * ❌ Analytics
219 * ❌ A/B testing
220
221 ----
222
223 == 4. POC Simplifications vs. Full System ==
224
225 === 4.1 Architecture Comparison ===
226
227 **POC Architecture (Simplified):**
228 {{code}}User Input → Single AKEL Call (all processing) → Output Display{{/code}}
230
231 **Full System Architecture:**
232 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
233 → Evidence Summarizer → Contradiction Detector → Verdict Generator
234 → Quality Gates → Publication → Output Display{{/code}}
235
236 **Key Differences:**
237
238 |=Aspect|=POC1|=Full System
239 |Processing|Single API call|Multi-component pipeline
240 |Scenarios|None (implicit)|Explicit entities with versioning
241 |Evidence|Basic retrieval|Comprehensive with quality scoring
242 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
243 |Workflow|3 steps (input/process/output)|6 phases with gates
244 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
245 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
246
247 ----
248
249 === 4.2 Workflow Comparison ===
250
251 **POC1 Workflow:**
252
253 1. User submits text/URL
254 2. Single AKEL call (all processing in one prompt)
255 3. Display results
256 **Total: 3 steps, 10-18 seconds**
257
258 **Full System Workflow:**
259
260 1. **Claim Submission** (extraction, normalization, clustering)
261 2. **Scenario Building** (definitions, assumptions, boundaries)
262 3. **Evidence Handling** (retrieval, assessment, linking)
263 4. **Verdict Creation** (synthesis, reasoning, approval)
264 5. **Public Presentation** (summaries, landscapes, deep dives)
265 6. **Time Evolution** (versioning, re-evaluation triggers)
266 **Total: 6 phases with quality gates, 10-30 seconds**
267
268 ----
269
270 === 4.3 Why POC is Simplified ===
271
272 **Engineering Rationale:**
273
274 1. **Test core capability first:** Can AI do basic fact-checking without humans?
275 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
276 3. **Learn before building:** POC1 insights inform full architecture
277 4. **Iterative approach:** Add complexity only after validating foundations
278 5. **Resource efficiency:** Don't build full system if core concept fails
279
280 **Acceptable Trade-offs:**
281
282 * ✅ POC proves AI capability (most risky assumption)
283 * ✅ POC validates user comprehension (can people understand output?)
284 * ❌ POC doesn't validate full workflow (test in Beta)
285 * ❌ POC doesn't validate scale (test in Beta)
286 * ❌ POC doesn't validate scenario architecture (design in POC2)
287
288 ----
289
290 === 4.4 Gap Between POC1 and POC2/Beta ===
291
292 **What needs to be built for POC2:**
293
294 * Scenario generation component
295 * Evidence Model structure (full)
296 * Scenario-evidence linking
297 * Multi-interpretation comparison
298 * Truth landscape visualization
299
300 **What needs to be built for Beta:**
301
302 * Multi-component AKEL pipeline
303 * Quality gate infrastructure
304 * Review workflow system
305 * Audit sampling framework
306 * Production data model
307 * Federation architecture (Release 1.0)
308
309 **POC1 → POC2 is a significant architectural expansion.**
310
311 ----
312
313 == 5. Publication Mode & Labeling ==
314
315 === 5.1 POC Publication Mode ===
316
317 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
318
319 Per FactHarbor Specification Section 11 "POC v1 Behavior":
320
321 * Produces public AI-generated output
322 * No human approval gate
323 * Clear AI-Generated labeling
324 * All quality gates active (simplified)
325 * Risk tier classification shown (demo)
326
327 ----
328
329 === 5.2 User-Facing Labels ===
330
331 **Primary Label (top of analysis):**
332 {{code}}╔════════════════════════════════════════════════════════════╗
333 ║ [AI-GENERATED - POC/DEMO]                                  ║
334 ║                                                            ║
335 ║ This analysis was produced entirely by AI and has not      ║
336 ║ been human-reviewed. Use for demonstration purposes.       ║
337 ║                                                            ║
338 ║ Source: AI/AKEL v1.0 (POC)                                 ║
339 ║ Review Status: Not Reviewed (Proof-of-Concept)             ║
340 ║ Quality Gates: 4/4 Passed (Simplified)                     ║
341 ║ Last Updated: [timestamp]                                  ║
342 ╚════════════════════════════════════════════════════════════╝{{/code}}
343
344 **Per-Claim Risk Labels:**
345
346 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
347 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
348 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
349
350 ----
351
352 === 5.3 Display Requirements ===
353
354 **Must Show:**
355
356 * AI-Generated status (prominent)
357 * POC/Demo disclaimer
358 * Risk tier per claim
359 * Confidence scores (0-100%)
360 * Quality gate status (passed/failed)
361 * Timestamp
362
363 **Must NOT Claim:**
364
365 * Human review
366 * Production quality
367 * Medical/legal advice
368 * Authoritative verdicts
369 * Complete accuracy
370
371 ----
372
373 === 5.4 Mode 2 vs. Full System Publication ===
374
375 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
376 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
377 |Review|None|None|Human-Reviewed
378 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
379 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
380 |Risk Display|Demo only|Workflow-integrated|Validated
381 |User Actions|View only|Flag for review|Trust rating
382
383 ----
384
385 == 6. Quality Gates (Simplified Implementation) ==
386
387 === 6.1 Overview ===
388
389 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
390
391 **Full System Has 4 Gates:**
392
393 1. Source Quality
394 2. Contradiction Search (MANDATORY)
395 3. Uncertainty Quantification
396 4. Structural Integrity
397
398 **POC Implements Simplified Versions:**
399
400 * Focus on demonstrating concept
401 * Basic implementations sufficient
402 * Failures displayed to user (not blocking)
403 * Full system has comprehensive validation
404
405 ----
406
407 === 6.2 Gate 1: Source Quality (Basic) ===
408
409 **Full System Requirements:**
410
411 * Primary sources identified and accessible
412 * Source reliability scored against whitelist
413 * Citation completeness verified
414 * Publication dates checked
415 * Author credentials validated
416
417 **POC Implementation:**
418
419 * ✅ At least 2 sources found
420 * ✅ Sources accessible (URLs valid)
421 * ❌ No whitelist checking
422 * ❌ No credential validation
423 * ❌ No comprehensive reliability scoring
424
425 **Pass Criteria:** ≥2 accessible sources found
426
427 **Failure Handling:** Display error message, don't generate verdict
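The pass criterion for this gate can be sketched as a small function. The accessibility check is injected as a callable so the sketch stays independent of any HTTP library; `source_quality_gate` and `is_accessible` are hypothetical names, not part of the specification.

```python
from typing import Callable, Iterable

MIN_SOURCES = 2  # pass criterion from Gate 1: at least 2 accessible sources


def source_quality_gate(
    urls: Iterable[str],
    is_accessible: Callable[[str], bool],
) -> tuple[bool, str]:
    """Return (passed, detail) for the simplified source quality gate."""
    accessible = [url for url in urls if is_accessible(url)]
    noun = "source" if len(accessible) == 1 else "sources"
    return len(accessible) >= MIN_SOURCES, f"{len(accessible)} {noun} found"
```

In a real POC, `is_accessible` would presumably issue an HTTP request and inspect the status code; that part is left out here on purpose.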
428
429 ----
430
431 === 6.3 Gate 2: Contradiction Search (Basic) ===
432
433 **Full System Requirements:**
434
435 * Counter-evidence actively searched
436 * Reservations and limitations identified
437 * Alternative interpretations explored
438 * Bubble detection (echo chambers, conspiracy theories)
439 * Cross-cultural and international perspectives
440 * Academic literature (supporting AND opposing)
441
442 **POC Implementation:**
443
444 * ✅ Basic search for counter-evidence
445 * ✅ Identify obvious contradictions
446 * ❌ No comprehensive academic search
447 * ❌ No bubble detection
448 * ❌ No systematic alternative interpretation search
449 * ❌ No international perspective verification
450
451 **Pass Criteria:** Basic contradiction search attempted
452
453 **Failure Handling:** Note "limited contradiction search" in output
454
455 ----
456
457 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
458
459 **Full System Requirements:**
460
461 * Confidence scores calculated for all claims/verdicts
462 * Limitations explicitly stated
463 * Data gaps identified and disclosed
464 * Strength of evidence assessed
465 * Alternative scenarios considered
466
467 **POC Implementation:**
468
469 * ✅ Confidence scores (0-100%)
470 * ✅ Basic uncertainty acknowledgment
471 * ❌ No detailed limitation disclosure
472 * ❌ No data gap identification
473 * ❌ No alternative scenario consideration (deferred to POC2)
474
475 **Pass Criteria:** Confidence score assigned
476
477 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
478
479 ----
480
481 === 6.5 Gate 4: Structural Integrity (Basic) ===
482
483 **Full System Requirements:**
484
485 * No hallucinations detected (fact-checking against sources)
486 * Logic chain valid and traceable
487 * References accessible and verifiable
488 * No circular reasoning
489 * Premises clearly stated
490
491 **POC Implementation:**
492
493 * ✅ Basic coherence check
494 * ✅ References accessible
495 * ❌ No comprehensive hallucination detection
496 * ❌ No formal logic validation
497 * ❌ No premise extraction and verification
498
499 **Pass Criteria:** Output is coherent and references are accessible
500
501 **Failure Handling:** Display error message
502
503 ----
504
505 === 6.6 Quality Gate Display ===
506
507 **POC shows simplified status:**
508 {{code}}Quality Gates: 4/4 Passed (Simplified)
509 ✓ Source Quality: 3 sources found
510 ✓ Contradiction Search: Basic search completed
511 ✓ Uncertainty: Confidence scores assigned
512 ✓ Structural Integrity: Output coherent{{/code}}
513
514 **If any gate fails:**
515 {{code}}Quality Gates: 3/4 Passed (Simplified)
516 ✓ Source Quality: 3 sources found
517 ✗ Contradiction Search: Search failed - limited evidence
518 ✓ Uncertainty: Confidence scores assigned
519 ✓ Structural Integrity: Output coherent
520
521 Note: This analysis has limited evidence. Use with caution.{{/code}}
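The status block above could be produced by a renderer along these lines. This is a sketch under assumptions: `GateResult` and `render_gate_status` are illustrative names, and the same caution note is appended whenever any gate fails, which the specification only shows for a failed contradiction search.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    """Hypothetical record of one simplified gate outcome."""
    name: str
    passed: bool
    detail: str


def render_gate_status(results: list[GateResult]) -> str:
    """Render gate outcomes in the simplified POC layout."""
    passed = sum(r.passed for r in results)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        # Assumption: one generic caution note covers every failure case.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```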
522
523 ----
524
525 === 6.7 Simplified vs. Full System ===
526
527 |=Gate|=POC (Simplified)|=Full System
528 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
529 |Contradiction|Basic search|Systematic academic + media + international
530 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
531 |Structural|Coherence check|Hallucination detection, logic validation, premise check
532
533 **POC Goal:** Demonstrate that quality gates are feasible, not deliver a perfect implementation.
534
535 ----
536
537 == 7. AKEL Architecture Comparison ==
538
539 === 7.1 POC AKEL (Simplified) ===
540
541 **Implementation:**
542
543 * Single Claude API call (Sonnet 4.5)
544 * One comprehensive prompt
545 * All processing in single request
546 * No separate components
547 * No orchestration layer
548
549 **Prompt Structure:**
550 {{code}}Task: Analyze this article and provide:
551
552 1. Extract 3-5 factual claims
553 2. For each claim:
554 - Determine verdict (WELL-SUPPORTED/PARTIALLY SUPPORTED/UNCERTAIN/REFUTED)
555 - Assign confidence score (0-100%)
556 - Assign risk tier (A/B/C)
557 - Write brief reasoning (1-3 sentences)
558 3. Generate analysis summary (3-5 sentences)
559 4. Generate article summary (3-5 sentences)
560 5. Run basic quality checks
561
562 Return as structured JSON.{{/code}}
563
564 **Processing Time:** 10-18 seconds (estimate)
565
566 ----
567
568 === 7.2 Full System AKEL (Production) ===
569
570 **Architecture:**
571 {{code}}AKEL Orchestrator
572 ├── Claim Extractor
573 ├── Claim Classifier (with risk tier assignment)
574 ├── Scenario Generator
575 ├── Evidence Summarizer
576 ├── Contradiction Detector
577 ├── Quality Gate Validator
578 ├── Audit Sampling Scheduler
579 └── Federation Sync Adapter (Release 1.0+){{/code}}
580
581 **Processing:**
582
583 * Parallel processing where possible
584 * Separate component calls
585 * Quality gates between phases
586 * Audit sampling selection
587 * Cross-node coordination (federated mode)
588
589 **Processing Time:** 10-30 seconds (full pipeline)
590
591 ----
592
593 === 7.3 Why POC Uses Single Call ===
594
595 **Advantages:**
596
597 * ✅ Simpler to implement
598 * ✅ Faster POC development
599 * ✅ Easier to debug
600 * ✅ Proves AI capability
601 * ✅ Good enough for concept validation
602
603 **Limitations:**
604
605 * ❌ No component reusability
606 * ❌ No parallel processing
607 * ❌ All-or-nothing (can't partially succeed)
608 * ❌ Harder to improve individual components
609 * ❌ No audit sampling
610
611 **Acceptable Trade-off:**
612
613 POC tests "Can AI do this?" not "How should we architect it?"
614
615 Full component architecture comes in Beta after POC validates concept.
616
617 ----
618
619 === 7.4 Evolution Path ===
620
621 **POC1:** Single prompt → Prove concept
622 **POC2:** Add scenario component → Test full pipeline
623 **Beta:** Multi-component AKEL → Production architecture
624 **Release 1.0:** Full AKEL + Federation → Scale
625
626 ----
627
628 == 8. Functional Requirements ==
629
630 === FR-POC-1: Article Input ===
631
632 **Requirement:** User can submit article for analysis
633
634 **Functionality:**
635
636 * Text input field (paste article text, up to 5000 characters)
637 * URL input field (paste article URL)
638 * "Analyze" button to trigger processing
639 * Loading indicator during analysis
640
641 **Excluded:**
642
643 * No user authentication
644 * No claim history
645 * No search functionality
646 * No saved templates
647
648 **Acceptance Criteria:**
649
650 * User can paste text from article
651 * User can paste URL of article
652 * System accepts input and triggers analysis
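Input handling for this requirement might be validated as sketched below. The 5000-character limit comes from the functionality list above; the rule that exactly one of the two fields must be filled in, and the name `validate_submission`, are assumptions.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated in FR-POC-1


def validate_submission(text: str = "", url: str = "") -> tuple[str, str]:
    """Return ("text", article_text) or ("url", article_url); raise ValueError otherwise."""
    text, url = text.strip(), url.strip()
    if bool(text) == bool(url):
        # Assumption: exactly one of the two input fields must be filled in.
        raise ValueError("provide either article text or an article URL")
    if text:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        return ("text", text)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an absolute http(s) address")
    return ("url", url)
```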
653
654 ----
655
656 === FR-POC-2: Claim Extraction (Fully Automated) ===
657
658 **Requirement:** AI automatically extracts 3-5 factual claims
659
660 **Functionality:**
661
662 * AI reads article text
663 * AI identifies factual claims (not opinions/questions)
664 * AI extracts 3-5 most important claims
665 * System displays numbered list
666
667 **Critical:** NO MANUAL EDITING ALLOWED
668
669 * AI selects which claims to extract
670 * AI identifies factual vs. non-factual
671 * System processes claims as extracted
672 * No human curation or correction
673
674 **Error Handling:**
675
676 * If extraction fails: Display error message
677 * User can retry with different input
678 * No manual intervention to fix extraction
679
680 **Acceptance Criteria:**
681
682 * AI extracts 3-5 claims automatically
683 * Claims are factual (not opinions)
684 * Claims are clearly stated
685 * No manual editing required
686
687 ----
688
689 === FR-POC-3: Verdict Generation (Fully Automated) ===
690
691 **Requirement:** AI automatically generates verdict for each claim
692
693 **Functionality:**
694
695 * For each claim, AI:
696 ** Evaluates claim based on available evidence/knowledge
697 ** Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
698 ** Assigns confidence score (0-100%)
699 ** Assigns risk tier (A/B/C)
700 ** Writes brief reasoning (1-3 sentences)
701 * System displays verdict for each claim
702
703 **Critical:** NO MANUAL EDITING ALLOWED
704
705 * AI computes verdicts based on evidence
706 * AI generates confidence scores
707 * AI writes reasoning
708 * No human review or adjustment
709
710 **Error Handling:**
711
712 * If verdict generation fails: Display error message
713 * User can retry
714 * No manual intervention to adjust verdicts
715
716 **Acceptance Criteria:**
717
718 * Each claim has a verdict
719 * Confidence score is displayed (0-100%)
720 * Risk tier is displayed (A/B/C)
721 * Reasoning is understandable (1-3 sentences)
722 * Verdict is defensible given reasoning
723 * All generated automatically by AI
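Because no manual correction is allowed, the structural parts of the acceptance criteria for FR-POC-2 and FR-POC-3 can be checked mechanically. The sketch below assumes a JSON layout with a `claims` list whose items carry `verdict`, `confidence`, `risk_tier` and `reasoning` keys; that layout and the name `validate_analysis` are hypothetical, since the specification only says "structured JSON".

```python
import json

ALLOWED_VERDICTS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")
ALLOWED_TIERS = ("A", "B", "C")


def validate_analysis(raw: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    claims = data.get("claims") if isinstance(data, dict) else None
    if not isinstance(claims, list):
        return ["'claims' must be a list"]
    problems = []
    if not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")
    for i, claim in enumerate(claims, start=1):
        if not isinstance(claim, dict):
            problems.append(f"claim {i}: not an object")
            continue
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            problems.append(f"claim {i}: unknown verdict label")
        conf = claim.get("confidence")
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 100:
            problems.append(f"claim {i}: confidence must be a number from 0 to 100")
        if claim.get("risk_tier") not in ALLOWED_TIERS:
            problems.append(f"claim {i}: risk tier must be A, B or C")
        if not str(claim.get("reasoning") or "").strip():
            problems.append(f"claim {i}: reasoning is missing")
    return problems
```

Whether a verdict is defensible or the reasoning understandable still needs a reviewer's judgment; this only covers the checks that can be automated.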
724
725 ----
726
727 === FR-POC-4: Analysis Summary (Fully Automated) ===
728
729 **Requirement:** AI generates brief summary of analysis
730
731 **Functionality:**
732
733 * AI summarizes findings in 3-5 sentences:
734 ** How many claims found
735 ** Distribution of verdicts
736 ** Overall assessment
737 * System displays at top of results
738
739 **Critical:** NO MANUAL EDITING ALLOWED
740
741 **Acceptance Criteria:**
742
743 * Summary is coherent
744 * Accurately reflects analysis
745 * 3-5 sentences
746 * Automatically generated
747
748 ----
749
750 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
751
752 **Requirement:** AI generates brief summary of original article
753
754 **Functionality:**
755
756 * AI summarizes article content (not FactHarbor's analysis)
757 * 3-5 sentences
758 * System displays
759
760 **Note:** Optional - can be skipped if time is limited
761
762 **Critical:** NO MANUAL EDITING ALLOWED
763
764 **Acceptance Criteria:**
765
766 * Summary is neutral (article's position)
767 * Accurately reflects article content
768 * 3-5 sentences
769 * Automatically generated
770
771 ----
772
773 === FR-POC-6: Publication Mode Display ===
774
775 **Requirement:** Clear labeling of AI-generated content
776
777 **Functionality:**
778
779 * Display Mode 2 publication label
780 * Show POC/Demo disclaimer
781 * Display risk tiers per claim
782 * Show quality gate status
783 * Display timestamp
784
785 **Acceptance Criteria:**
786
787 * Label is prominent and clear
788 * User understands this is AI-generated POC output
789 * Risk tiers are color-coded
790 * Quality gate status is visible
791
792 ----
793
794 === FR-POC-7: Quality Gate Execution ===
795
796 **Requirement:** Execute simplified quality gates
797
798 **Functionality:**
799
800 * Check source quality (basic)
801 * Attempt contradiction search (basic)
802 * Calculate confidence scores
803 * Verify structural integrity (basic)
804 * Display gate results
805
806 **Acceptance Criteria:**
807
808 * All 4 gates attempted
809 * Pass/fail status displayed
810 * Failures explained to user
811 * Gates don't block publication (POC mode)
812
813 ----
814
815 == 9. Non-Functional Requirements ==
816
817 === NFR-POC-1: Fully Automated Processing ===
818
819 **Requirement:** Complete AI automation with zero manual intervention
820
821 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
822
823 **What this means:**
824
825 * Claims: AI selects (no human curation)
826 * Scenarios: N/A (deferred to POC2)
827 * Evidence: AI evaluates (no human selection)
828 * Verdicts: AI determines (no human adjustment)
829 * Summaries: AI writes (no human editing)
830
831 **Pipeline:**
832 {{code}}User Input → AKEL Processing → Output Display
833
834 ZERO human editing{{/code}}
835
836 **If AI output is poor:**
837
838 * ❌ Do NOT manually fix it
839 * ✅ Document the failure
840 * ✅ Improve prompts and retry
841 * ✅ Accept that POC might fail
842
843 **Why this matters:**
844
845 * Tests whether AI can do this without humans
846 * Validates scalability (humans can't review every analysis)
847 * Honest test of technical feasibility
848
849 ----
850
851 === NFR-POC-2: Performance ===
852
853 **Requirement:** Analysis completes in reasonable time
854
855 **Acceptable Performance:**
856
857 * Processing time: 1-5 minutes (acceptable for POC)
858 * Display loading indicator to user
859 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
860
861 **Not Required:**
862
863 * Production-level speed (< 30 seconds)
864 * Optimization for scale
865 * Caching
866
867 **Acceptance Criteria:**
868
869 * Analysis completes within 5 minutes
870 * User sees loading indicator
871 * No timeout errors
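One way to enforce the 5-minute ceiling is to run the analysis in a worker thread and stop waiting once the deadline passes. This is a sketch using only the standard library; `run_with_deadline` is a hypothetical helper, and the worker thread is not killed on timeout, it is merely abandoned.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

MAX_SECONDS = 300  # 5-minute limit from the acceptance criteria


def run_with_deadline(task, *args, timeout: float = MAX_SECONDS):
    """Run task(*args) and raise TimeoutError if it exceeds the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"analysis exceeded {timeout} seconds") from None
    finally:
        # Do not block on the worker; it finishes (or fails) in the background.
        pool.shutdown(wait=False)
```

Catching the resulting `TimeoutError` in the request handler lets the frontend show a clear error message instead of hanging.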
872
873 ----
874
875 === NFR-POC-3: Reliability ===
876
877 **Requirement:** System works for manual testing sessions
878
879 **Acceptable:**
880
881 * Occasional errors (< 20% failure rate)
882 * Manual restart if needed
883 * Display error messages clearly
884
885 **Not Required:**
886
887 * 99.9% uptime
888 * Automatic error recovery
889 * Production monitoring
890
891 **Acceptance Criteria:**
892
893 * System works for test demonstrations
894 * Errors are handled gracefully
895 * User receives clear error messages
896
897 ----
898
899 === NFR-POC-4: Environment ===
900
901 **Requirement:** Runs on simple infrastructure
902
903 **Acceptable:**
904
905 * Single machine or simple cloud setup
906 * No distributed architecture
907 * No load balancing
908 * No redundancy
909 * Local development environment viable
910
911 **Not Required:**
912
913 * Production infrastructure
914 * Multi-region deployment
915 * Auto-scaling
916 * Disaster recovery
917
918 ----
919
920 == 10. Technical Architecture ==
921
922 === 10.1 System Components ===
923
924 **Frontend:**
925
926 * Simple HTML form (text input + URL input + button)
927 * Loading indicator
928 * Results display page (single page, no tabs/navigation)
929
930 **Backend:**
931
932 * Single API endpoint
933 * Calls Claude API (Sonnet 4.5 or latest)
934 * Parses response
935 * Returns JSON to frontend
936
937 **Data Storage:**
938
939 * None required (stateless POC)
940 * Optional: Simple file storage or SQLite for demo examples
941
942 **External Services:**
943
944 * Claude API (Anthropic) - required
945 * Optional: URL fetch service for article text extraction
946
947 ----
948
949 === 10.2 Processing Flow ===
950
951 {{code}}
952 1. User submits text or URL
953
954 2. Backend receives request
955
956 3. If URL: Fetch article text
957
958 4. Call Claude API with single prompt:
959 "Extract claims, evaluate each, provide verdicts"
960
961 5. Claude API returns:
962 - Analysis summary
963 - Claims list
964 - Verdicts for each claim (with risk tiers)
965 - Article summary (optional)
966 - Quality gate results
967
968 6. Backend parses response
969
970 7. Frontend displays results with Mode 2 labeling
971 {{/code}}
972
973 **Key Simplification:** Single API call does entire analysis
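The flow above can be sketched as a single backend function. To keep the sketch runnable without network access, the URL fetcher and the model call are passed in as callables; `analyze`, `fetch_article` and `call_model` are hypothetical names, and the `label` key added at the end is an assumed way of carrying the Mode 2 labeling to the frontend.

```python
import json
from typing import Callable, Optional


def analyze(
    text: Optional[str],
    url: Optional[str],
    fetch_article: Callable[[str], str],
    call_model: Callable[[str], str],
) -> dict:
    """Run the POC flow: resolve input, make one model call, parse, label."""
    # Step 3: if a URL was submitted, fetch the article text.
    article = fetch_article(url) if url else (text or "")
    if not article.strip():
        raise ValueError("no article text to analyze")
    # Step 4: one prompt covers the entire analysis.
    prompt = "Extract claims, evaluate each, provide verdicts.\n\nARTICLE:\n" + article
    raw = call_model(prompt)
    # Step 6: parse the structured response.
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("model response must be a JSON object")
    # Step 7: attach the labeling the frontend must display.
    result["label"] = "AI-GENERATED - POC/DEMO"
    return result
```

In the POC, `call_model` would wrap the single Claude API request; injecting it also makes it easy to replay recorded responses during manual testing sessions.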
974
975 ----
976
977 === 10.3 AI Prompt Strategy ===
978
979 **Single Comprehensive Prompt:**
980 {{code}}Task: Analyze this article and provide:
981
982 1. Extract 3-5 factual claims from the article
983 2. For each claim:
984 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
985 - Assign confidence score (0-100%)
986 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
987 - Write brief reasoning (1-3 sentences)
988 3. Run quality gates:
989 - Check: ≥2 sources found
990 - Attempt: Basic contradiction search
991 - Calculate: Confidence scores
992 - Verify: Structural integrity
993 4. Write analysis summary (3-5 sentences: claims found, verdict distribution, overall assessment)
994 5. Write article summary (3-5 sentences: neutral summary of article content)
995
996 Return as structured JSON with quality gate results.{{/code}}
997
998 **One prompt generates everything.**
999
1000 ----
1001
1002 === 10.4 Technology Stack Suggestions ===
1003
1004 **Frontend:**
1005
1006 * HTML + CSS + JavaScript (minimal framework)
1007 * OR: Next.js (if team prefers)
1008 * Hosted: Local machine OR Vercel/Netlify free tier
1009
1010 **Backend:**
1011
1012 * Python Flask/FastAPI (simple REST API)
1013 * OR: Next.js API routes (if using Next.js)
1014 * Hosted: Local machine OR Railway/Render free tier
1015
1016 **AKEL Integration:**
1017
1018 * Claude API via Anthropic SDK
1019 * Model: Claude Sonnet 4.5 or latest available
1020
1021 **Database:**
1022
1023 * None (stateless acceptable)
1024 * OR: SQLite if you want to store demo examples
1025 * OR: JSON files on disk
1026
1027 **Deployment:**
1028
1029 * Local development environment sufficient for POC
1030 * Optional: Deploy to cloud for remote demos
1031
1032 ----
1033
1034 == 11. Success Criteria ==
1035
1036 === 11.1 Minimum Success (POC Passes) ===
1037
1038 **Required for GO decision:**
1039
1040 * ✅ AI extracts 3-5 factual claims automatically
1041 * ✅ AI provides verdict for each claim automatically
1042 * ✅ Verdicts are reasonable (≥70% make logical sense)
1043 * ✅ Analysis summary is coherent
1044 * ✅ Output is comprehensible to reviewers
1045 * ✅ Team/advisors understand the output
1046 * ✅ Team agrees approach has merit
1047 * ✅ **Output usable without manual editing** (< 30% of analyses would have needed human correction)
1048
1049 **Quality Definition:**
1050
1051 * "Reasonable verdict" = Defensible given general knowledge
1052 * "Coherent summary" = Logically structured, grammatically correct
1053 * "Comprehensible" = Reviewers understand what analysis means
1054
1055 ----
1056
1057 === 11.2 POC Fails If ===
1058
1059 **Automatic NO-GO if any of these:**
1060
1061 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1062 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1063 * ❌ Output incomprehensible (reviewers can't understand analysis)
1064 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1065 * ❌ Team loses confidence in AI-automated approach
1066
1067 ----
1068
1069 === 11.3 Quality Thresholds ===
1070
1071 **POC quality expectations:**
1072
1073 |=Component|=Quality Threshold|=Definition
1074 |Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1075 |Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1076 |Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1077 |Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1078
1079 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1080
1081 **Not expecting:**
1082
1083 * 100% accuracy
1084 * Perfect claim coverage
1085 * Comprehensive evidence gathering
1086 * Flawless verdicts
1087 * Production polish
1088
1089 **Expecting:**
1090
1091 * Reasonable claim extraction
1092 * Defensible verdicts
1093 * Understandable reasoning
1094 * Useful output
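The numeric thresholds in Sections 11.1 to 11.3 can be combined into a simple decision helper. This sketch covers only the measurable criteria; the qualitative ones (comprehensibility, team confidence) are left to reviewers. The function name and the `INCONCLUSIVE` outcome for results that fall between the GO and NO-GO thresholds are assumptions, since the specification does not say how that gray zone is handled.

```python
def poc_decision(
    extraction_accuracy: float,
    verdicts_reasonable: float,
    needs_correction: float,
) -> str:
    """Classify POC results; all arguments are fractions between 0 and 1."""
    # Automatic NO-GO conditions from Section 11.2.
    if extraction_accuracy < 0.60 or verdicts_reasonable < 0.60 or needs_correction > 0.50:
        return "NO-GO"
    # GO thresholds from Sections 11.1 and 11.3.
    if extraction_accuracy >= 0.70 and verdicts_reasonable >= 0.70 and needs_correction < 0.30:
        return "GO"
    # Assumption: anything in between needs a team discussion.
    return "INCONCLUSIVE"
```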
1095
1096 ----
1097
1098 == 12. Test Cases ==
1099
1100 === 12.1 Test Case 1: Simple Factual Claim ===
1101
1102 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1103
1104 **Expected Output:**
1105
1106 * Extract claim correctly
1107 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1108 * Confidence: 70-90%
1109 * Risk tier: C (Low)
1110 * Reasoning: Mentions studies or evidence
1111
1112 **Success:** Verdict is reasonable and reasoning makes sense
1113
1114 ----
1115
1116 === 12.2 Test Case 2: Complex News Article ===
1117
1118 **Input:** News article URL with multiple claims about politics/health/science
1119
1120 **Expected Output:**
1121
1122 * Extract 3-5 key claims
1123 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1124 * Coherent analysis summary
1125 * Article summary
1126 * Risk tiers assigned appropriately
1127
1128 **Success:** Claims identified are actually from article, verdicts are reasonable
1129
1130 ----
1131
1132 === 12.3 Test Case 3: Controversial Topic ===
1133
1134 **Input:** Article on contested political or scientific topic
1135
1136 **Expected Output:**
1137
1138 * Balanced analysis
1139 * Acknowledges uncertainty where appropriate
1140 * Doesn't overstate confidence
1141 * Reasoning shows awareness of complexity
1142
1143 **Success:** Analysis is fair and doesn't show obvious bias
1144
1145 ----
1146
1147 === 12.4 Test Case 4: Clearly False Claim ===
1148
1149 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1150
1151 **Expected Output:**
1152
1153 * Extract claim
1154 * Verdict: REFUTED
1155 * High confidence (> 90%)
1156 * Risk tier: C (Low - established fact)
1157 * Clear reasoning
1158
1159 **Success:** AI correctly identifies false claim with high confidence
1160
1161 ----
1162
1163 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1164
1165 **Input:** Article with claim where evidence is genuinely mixed
1166
1167 **Expected Output:**
1168
1169 * Extract claim
1170 * Verdict: UNCERTAIN
1171 * Moderate confidence (40-60%)
1172 * Reasoning explains why uncertain
1173
1174 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1175
1176 ----
1177
1178 === 12.6 Test Case 6: High-Risk Medical Claim ===
1179
1180 **Input:** Article making medical claims
1181
1182 **Expected Output:**
1183
1184 * Extract claim
1185 * Verdict: [appropriate based on evidence]
1186 * Risk tier: A (High - medical)
1187 * Red label displayed
1188 * Clear disclaimer about not being medical advice
1189
1190 **Success:** Risk tier correctly assigned, appropriate warnings shown
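
The test cases with concrete expected values (12.1, 12.4, and 12.5) can be encoded as data and checked mechanically. The sketch below is illustrative only: the `Expectation` class, the `EXPECTATIONS` table, and the `mismatches` function are hypothetical names, confidence is assumed to be reported in percent, and the remaining test cases are left out because their expected output is qualitative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    """Hypothetical encoding of one test case's expected output."""
    verdicts: frozenset               # acceptable verdict labels
    min_confidence: float             # lower bound, in percent
    max_confidence: float             # inclusive upper bound, in percent
    risk_tier: Optional[str] = None   # None when the test case names no tier
    min_exclusive: bool = False       # True when the spec says "> X%"


EXPECTATIONS = {
    "12.1": Expectation(frozenset({"WELL-SUPPORTED", "PARTIALLY SUPPORTED"}), 70, 90, "C"),
    "12.4": Expectation(frozenset({"REFUTED"}), 90, 100, "C", min_exclusive=True),
    "12.5": Expectation(frozenset({"UNCERTAIN"}), 40, 60),
}


def mismatches(case_id, verdict, confidence, risk_tier=None):
    """Return the fields of an AI result that deviate from the expectation."""
    exp = EXPECTATIONS[case_id]
    problems = []
    if verdict not in exp.verdicts:
        problems.append("verdict")
    if exp.min_exclusive:
        low_ok = confidence > exp.min_confidence
    else:
        low_ok = confidence >= exp.min_confidence
    if not (low_ok and confidence <= exp.max_confidence):
        problems.append("confidence")
    if exp.risk_tier is not None and risk_tier != exp.risk_tier:
        problems.append("risk_tier")
    return problems
```

A mechanical check like this covers only the structured fields. Whether the reasoning "makes sense" still needs a reviewer.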
1191
1192 ----
1193
1194 == 13. POC Decision Gate ==
1195
1196 === 13.1 Decision Framework ===
1197
1198 After POC testing is complete, the team makes one of three decisions:
1199
1200 **Option A: GO (Proceed to POC2)**
1201
1202 **Conditions:**
1203
1204 * AI quality ≥70% without manual editing
1205 * Basic claim → verdict pipeline validated
1206 * Internal + advisor feedback positive
1207 * Technical feasibility confirmed
1208 * Team confident in direction
1209 * Clear path to improving AI quality to ≥90%
1210
1211 **Next Steps:**
1212
1213 * Plan POC2 development (add scenarios)
1214 * Design scenario architecture
1215 * Expand to Evidence Model structure
1216 * Test with more complex articles
1217
1218 ----
1219
1220 **Option B: NO-GO (Pivot or Stop)**
1221
1222 **Conditions:**
1223
1224 * AI quality < 60%
1225 * Requires manual editing for most analyses (> 50%)
1226 * Feedback indicates fundamental flaws
1227 * Cost/effort not justified by value
1228 * No clear path to improvement
1229
1230 **Next Steps:**
1231
1232 * **Pivot:** Change to a hybrid human-AI approach (accept that manual review is required)
1233 * **Stop:** Conclude approach not viable, revisit later
1234
1235 ----
1236
1237 **Option C: ITERATE (Improve POC)**
1238
1239 **Conditions:**
1240
1241 * Concept has merit but execution needs work
1242 * Specific improvements identified
1243 * Addressable with better prompts/approach
1244 * AI quality between 60% and 70% (at or above the NO-GO threshold, below the GO threshold)
1245
1246 **Next Steps:**
1247
1248 * Improve AI prompts
1249 * Test different approaches
1250 * Re-run POC with improvements
1251 * Then make GO/NO-GO decision
1252
1253 ----
1254
1255 === 13.2 Decision Criteria Summary ===
1256
1257 {{code}}
1258 AI Quality < 60% → NO-GO (approach doesn't work)
1259 AI Quality 60% to <70% → ITERATE (improve and retry)
1260 AI Quality ≥70% → GO (proceed to POC2)
1261 {{/code}}
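
The three bands can be sketched as a small function. This is a hedged illustration rather than a specified component: the name `decide` is hypothetical, AI quality is assumed to be a single fraction between 0 and 1, exactly 70% is treated as GO (following the "≥70%" wording), and exactly 60% is treated as ITERATE.

```python
def decide(ai_quality: float) -> str:
    """Map an overall AI quality score (0.0 to 1.0) to a gate decision."""
    if not 0.0 <= ai_quality <= 1.0:
        raise ValueError("ai_quality must be a fraction between 0 and 1")
    if ai_quality >= 0.70:
        return "GO"        # proceed to POC2
    if ai_quality >= 0.60:
        return "ITERATE"   # improve and retry
    return "NO-GO"         # approach doesn't work
```

The quality score is only one input. The GO and NO-GO options in 13.1 also list conditions such as reviewer feedback and the manual editing rate, which this sketch does not cover.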
1262
1263 ----
1264
1265 == 14. Key Risks & Mitigations ==
1266
1267 === 14.1 Risk: AI Quality Not Good Enough ===
1268
1269 **Likelihood:** Medium-High
1270 **Impact:** POC fails
1271
1272 **Mitigation:**
1273
1274 * Extensive prompt engineering and testing
1275 * Use best available AI models (Sonnet 4.5)
1276 * Test with diverse article types
1277 * Iterate on prompts based on results
1278
1279 **Acceptance:** This is what POC tests - be ready for failure
1280
1281 ----
1282
1283 === 14.2 Risk: AI Consistency Issues ===
1284
1285 **Likelihood:** Medium
1286 **Impact:** Works sometimes, fails other times
1287
1288 **Mitigation:**
1289
1290 * Test with 10+ diverse articles
1291 * Measure success rate honestly
1292 * Improve prompts to increase consistency
1293
1294 **Acceptance:** Some variability OK if average quality ≥70%
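
Measuring consistency honestly means reporting the spread and the worst case alongside the average. The following is a minimal sketch, assuming each article receives one quality score between 0 and 1; the function name, the report keys, and the scoring method are hypothetical.

```python
from statistics import mean, pstdev


def consistency_report(scores, threshold=0.70, min_articles=10):
    """Summarize per-article quality scores (fractions between 0 and 1)."""
    if len(scores) < min_articles:
        raise ValueError(f"need at least {min_articles} articles, got {len(scores)}")
    average = mean(scores)
    return {
        "mean": average,
        "spread": pstdev(scores),  # population standard deviation
        "worst": min(scores),
        "below_threshold": sum(1 for s in scores if s < threshold),
        "meets_average": average >= threshold,
    }
```

Reporting `worst` and `below_threshold` next to the mean keeps a few strong articles from hiding systematic failures on others.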
1295
1296 ----
1297
1298 === 14.3 Risk: Output Incomprehensible ===
1299
1300 **Likelihood:** Low-Medium
1301 **Impact:** Users can't understand analysis
1302
1303 **Mitigation:**
1304
1305 * Create clear explainer document
1306 * Iterate on output format
1307 * Test with non-technical reviewers
1308 * Simplify language if needed
1309
1310 **Acceptance:** Iterate until comprehensible
1311
1312 ----
1313
1314 === 14.4 Risk: API Rate Limits / Costs ===
1315
1316 **Likelihood:** Low
1317 **Impact:** System slow or expensive
1318
1319 **Mitigation:**
1320
1321 * Monitor API usage
1322 * Implement retry logic
1323 * Estimate costs before scaling
1324
1325 **Acceptance:** POC can be slow and expensive (optimization later)
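
The retry logic mentioned above could take the form of exponential backoff. This is a sketch under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever error the chosen AI client raises when rate limited, and `call_with_retry` and its parameters are illustrative names, not part of any real client library.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the AI client's rate-limit error."""


def call_with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the error once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.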
1326
1327 ----
1328
1329 === 14.5 Risk: Scope Creep ===
1330
1331 **Likelihood:** Medium
1332 **Impact:** POC becomes too complex
1333
1334 **Mitigation:**
1335
1336 * Strict scope discipline
1337 * Say NO to feature additions
1338 * Keep focus on core question
1339
1340 **Acceptance:** POC is minimal by design
1341
1342 ----
1343
1344 == 15. POC Philosophy ==
1345
1346 === 15.1 Core Principles ===
1347
1348 **1. Build Less, Learn More**
1349
1351 * Minimum features to test hypothesis
1352 * Don't build unvalidated features
1353 * Focus on core question only
1354
1355 **2. Fail Fast**
1356
1357 * Quick test of hardest part (AI capability)
1358 * Accept that POC might fail
1359 * Better to discover issues early
1360 * Honest assessment over optimistic hope
1361
1362 **3. Test First, Build Second**
1363
1364 * Validate AI can do this before building platform
1365 * Don't assume it will work
1366 * Let results guide decisions
1367
1368 **4. Automation First**
1369
1370 * No manual editing allowed
1371 * Tests scalability, not just feasibility
1372 * Proves approach can work at scale
1373
1374 **5. Honest Assessment**
1375
1376 * Don't cherry-pick examples
1377 * Don't manually fix bad outputs
1378 * Document failures openly
1379 * Make data-driven decisions
1380
1381 ----
1382
1383 === 15.2 What POC Is ===
1384
1385 ✅ Testing AI capability without humans
1386 ✅ Proving core technical concept
1387 ✅ Fast validation of approach
1388 ✅ Honest assessment of feasibility
1389
1390 ----
1391
1392 === 15.3 What POC Is NOT ===
1393
1394 ❌ Building a product
1395 ❌ Production-ready system
1396 ❌ Feature-complete platform
1397 ❌ Perfectly accurate analysis
1398 ❌ Polished user experience
1399
1400 ----
1401
1402 == 16. Success: Clear Path Forward ==
1405
1406 **If POC succeeds (≥70% AI quality):**
1407
1408 * ✅ Approach validated
1409 * ✅ Proceed to POC2 (add scenarios)
1410 * ✅ Design full Evidence Model structure
1411 * ✅ Test multi-scenario comparison
1412 * ✅ Focus on improving AI quality from 70% → 90%
1413
1414 **If POC fails (< 60% AI quality):**
1415
1416 * ✅ Learn what doesn't work
1417 * ✅ Pivot to different approach
1418 * ✅ OR wait for better AI technology
1419 * ✅ Avoid wasting resources on non-viable approach
1420
1421 **Either way, POC provides clarity.**
1422
1423 ----
1424
1425 == 17. Related Pages ==
1426
1427 * [[User Needs>>Archive.FactHarbor 2026\.01\.20.Specification.Requirements.User Needs.WebHome]]
1428 * [[Requirements>>FactHarbor.Requirements.WebHome]]
1429 * [[Gap Analysis>>FactHarbor.Analysis.GapAnalysis]]
1430 * [[Architecture>>Archive.FactHarbor 2026\.01\.20.Specification.Architecture.WebHome]]
1431 * [[AKEL>>Archive.FactHarbor 2026\.01\.20.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1432 * [[Workflows>>Archive.FactHarbor 2026\.01\.20.Specification.Workflows.WebHome]]
1433
1434 ----
1435
1436 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)