Wiki source code of POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2025/12/24 20:16

1 = POC Requirements =
2
3
4 {{info}}
5 **POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
6
7 See [[POC1 API Specification>>FactHarbor.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
8 {{/info}}
9
10
11
12 **Status:** ✅ Approved for Development
13 **Version:** 2.0 (Updated after Specification Cross-Check)
14 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
15
16 == 1. POC Overview ==
17
18 === 1.1 What POC Tests ===
19
20 **Core Question:**
21
22 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
23
24 **What we're proving:**
25
26 * AI can identify factual claims from text
27 * AI can evaluate those claims and produce verdicts
28 * Output is comprehensible and useful
29 * Fully automated approach is viable
30
31 **What we're NOT testing:**
32
33 * Scenario generation (deferred to POC2)
34 * Evidence display (deferred to POC2)
35 * Production scalability
36 * Perfect accuracy
37 * Complete feature set
38
39 === 1.2 Scenarios Deferred to POC2 ===
40
41 **Intentional Simplification:**
42
43 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
44
45 **Rationale:**
46
47 * **POC1 tests:** Can AI extract claims and generate verdicts?
48 * **POC2 will add:** Scenario generation and management
49 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
50
51 **Design Decision:**
52
53 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
54
55 **No Risk:**
56
57 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
58
59 * Faster POC1 validation
60 * Learning from POC1 to inform scenario design
61 * Iterative approach: fail fast if basic AI doesn't work
62 * Flexibility to adjust scenario architecture based on POC1 insights
63
64 **Full System Workflow (Future):**
65 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
66
67 **POC1 Simplified Workflow:**
68 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
69
70 == 2. POC Output Specification ==
71
72 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
73
74 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
75
76 **Length:** 4-6 sentences
77
78 **Content (Required Elements):**
79
80 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
81 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
82 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
83 4. **Relationship assessment** - Do the claims support the article's conclusion?
84 5. **Overall credibility** - Final assessment considering claim importance
85
86 **Critical Innovation:**
87
88 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
89
90 * Present accurate supporting facts but draw unsupported conclusions
91 * Have one false central claim that invalidates the whole argument
92 * Misframe accurate information to mislead
93
94 **Good Example (Context-Aware):**
95 {{code}}This article argues that coffee cures cancer based on its antioxidant
96 content. We analyzed 3 factual claims: 2 about coffee's chemical
97 properties are well-supported, but the main causal claim is refuted
98 by current evidence. The article confuses correlation with causation.
99 Overall assessment: MISLEADING - makes an unsupported medical claim
100 despite citing some accurate facts.{{/code}}
101
102 **Poor Example (Simple Aggregation - Don't Do This):**
103 {{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
104 Overall assessment: mostly accurate (67% accurate).{{/code}}
105 ↑ This misses that the refuted claim IS the article's main point!
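The contrast between the two summaries can be sketched in code. This is a minimal illustration, assuming a hypothetical claim structure with a `central` flag (not part of the POC schema): naive averaging reproduces the poor example's "67% accurate", while a context-aware check lets a refuted central claim dominate.

```python
# Minimal sketch of naive aggregation vs. context-aware assessment.
# The claim dicts and the "central" flag are illustrative assumptions,
# not the POC's actual data model.

def naive_assessment(claims):
    """Simple aggregation: percent of claims that are well-supported."""
    supported = sum(1 for c in claims if c["verdict"] == "WELL-SUPPORTED")
    return f"{round(100 * supported / len(claims))}% accurate"

def context_aware_assessment(claims):
    """A refuted central claim invalidates the article, whatever the average."""
    if any(c["verdict"] == "REFUTED" and c["central"] for c in claims):
        return "MISLEADING - central claim is refuted"
    return naive_assessment(claims)

# The coffee-cures-cancer example: two accurate supporting facts,
# one refuted central claim.
claims = [
    {"verdict": "WELL-SUPPORTED", "central": False},
    {"verdict": "WELL-SUPPORTED", "central": False},
    {"verdict": "REFUTED", "central": True},
]

print(naive_assessment(claims))          # the poor example: 67% accurate
print(context_aware_assessment(claims))  # MISLEADING - central claim is refuted
```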
106
107 **What POC1 Tests:**
108
109 Can AI identify and assess:
110
111 * ✅ The article's main thesis/conclusion?
112 * ✅ Which claims are central vs. supporting?
113 * ✅ Whether the evidence supports the conclusion?
114 * ✅ Overall credibility considering logical structure?
115
116 **If AI Cannot Do This:**
117
118 That's valuable to learn in POC1! We'll:
119
120 * Note as limitation
121 * Fall back to simple aggregation with warning
122 * Design explicit article-level analysis for POC2
123
124 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
125
126 **What:** List of factual claims extracted from article
127 **Format:** Numbered list
128 **Quantity:** 3-5 claims
129 **Requirements:**
130
131 * Factual claims only (not opinions/questions)
132 * Clearly stated
133 * Automatically extracted by AI
134
135 **Example:**
136 {{code}}CLAIMS IDENTIFIED:
137
138 [1] Coffee reduces diabetes risk by 30%
139 [2] Coffee improves heart health
140 [3] Decaf has same benefits as regular
141 [4] Coffee prevents Alzheimer's completely{{/code}}
142
143 === 2.3 Component 3: CLAIMS VERDICTS ===
144
145 **What:** Verdict for each claim identified
146 **Format:** Per claim structure
147
148 **Required Elements:**
149
150 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
151 * **Confidence Score:** 0-100%
152 * **Brief Reasoning:** 1-3 sentences explaining why
153 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
154
155 **Example:**
156 {{code}}VERDICTS:
157
158 [1] WELL-SUPPORTED (85%) [Risk: C]
159 Multiple studies confirm 25-30% risk reduction with regular consumption.
160
161 [2] UNCERTAIN (65%) [Risk: B]
162 Evidence is mixed. Some studies show benefits, others show no effect.
163
164 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
165 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
166
167 [4] REFUTED (90%) [Risk: B]
168 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
169
170 **Risk Tier Display:**
171
172 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
173 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
174 * **Tier C (Green):** Low Risk - Facts/Definitions/History
175
176 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
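The per-claim structure above can be modeled as a small record type. A sketch in Python; the class and field names are assumptions for illustration, constrained to the verdict labels and risk tiers defined in this section:

```python
# Sketch of a per-claim verdict record. Class and field names are assumptions,
# not the POC's actual schema; the enum values mirror this section's labels.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    WELL_SUPPORTED = "WELL-SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNCERTAIN = "UNCERTAIN"
    REFUTED = "REFUTED"

class RiskTier(Enum):
    A = "High Risk - Medical/Legal/Safety/Elections"
    B = "Medium Risk - Policy/Science/Causality"
    C = "Low Risk - Facts/Definitions/History"

@dataclass
class ClaimVerdict:
    claim: str
    verdict: Verdict
    confidence: int      # 0-100
    risk_tier: RiskTier
    reasoning: str       # 1-3 sentences

    def render(self) -> str:
        """Format one entry as shown in the VERDICTS example above."""
        return (f"{self.verdict.value} ({self.confidence}%) "
                f"[Risk: {self.risk_tier.name}]\n{self.reasoning}")

v = ClaimVerdict("Coffee reduces diabetes risk by 30%", Verdict.WELL_SUPPORTED,
                 85, RiskTier.C, "Multiple studies confirm 25-30% risk reduction.")
print(v.render())
```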
177
178 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
179
180 **What:** Brief summary of original article content
181 **Length:** 3-5 sentences
182 **Tone:** Neutral (article's position, not FactHarbor's analysis)
183
184 **Example:**
185 {{code}}ARTICLE SUMMARY:
186
187 Health News Today article discusses coffee benefits, citing studies
188 on diabetes and Alzheimer's. Author highlights research linking coffee
189 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
190
191 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
192
193 **What:** LLM usage metrics for cost optimization and scaling decisions
194
195 **Purpose:**
196
197 * Understand cost per analysis
198 * Identify optimization opportunities
199 * Project costs at scale
200 * Inform architecture decisions
201
202 **Display Format:**
203 {{code}}USAGE STATISTICS:
204 • Article: 2,450 words (12,300 characters)
205 • Input tokens: 15,234
206 • Output tokens: 892
207 • Total tokens: 16,126
208 • Estimated cost: $0.24 USD
209 • Response time: 8.3 seconds
210 • Cost per claim: $0.048
211 • Model: claude-sonnet-4-20250514{{/code}}
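The displayed metrics can be derived directly from the token counts the API returns. A sketch of the calculation; the per-million-token rates are placeholder assumptions (actual Anthropic pricing varies by model and over time, so they must be checked before use):

```python
# Sketch: derive the displayed cost metrics from token counts.
# The per-million-token rates below are ASSUMED for illustration only;
# check current Anthropic pricing before relying on them.

INPUT_RATE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_RATE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def usage_stats(input_tokens: int, output_tokens: int, n_claims: int) -> dict:
    cost = (input_tokens * INPUT_RATE_PER_MTOK
            + output_tokens * OUTPUT_RATE_PER_MTOK) / 1_000_000
    return {
        "total_tokens": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 2),
        "cost_per_claim_usd": round(cost / n_claims, 3),
    }

# Token counts from the display example above; the resulting cost depends
# entirely on the assumed rates.
stats = usage_stats(input_tokens=15_234, output_tokens=892, n_claims=5)
print(stats)
```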
212
213 **Why This Matters:**
214
215 At scale, LLM costs are critical:
216
217 * 10,000 articles/month ≈ $200-500/month
218 * 100,000 articles/month ≈ $2,000-5,000/month
219 * Cost optimization can reduce expenses 30-50%
220
221 **What POC1 Learns:**
222
223 * How cost scales with article length
224 * Prompt optimization opportunities (caching, compression)
225 * Output verbosity tradeoffs
226 * Model selection strategy (Sonnet vs. Haiku)
227 * Article length limits (if needed)
228
229 **Implementation:**
230
231 * Claude API already returns usage data
232 * No extra API calls needed
233 * Display to user + log for aggregate analysis
234 * Test with articles of varying lengths
235
236 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
237
238 === 2.6 Total Output Size ===
239
240 **Combined:** 220-340 words
241
242 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
243 * Claims Identification: 30-50 words
244 * Claims Verdicts: 100-150 words
245 * Article Summary: 30-50 words (optional)
246
247 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
248
249 == 3. What's NOT in POC Scope ==
250
251 === 3.1 Feature Exclusions ===
252
253 The following are **explicitly excluded** from POC:
254
255 **Content Features:**
256
257 * ❌ Scenarios (deferred to POC2)
258 * ❌ Evidence display (supporting/opposing lists)
259 * ❌ Source links (clickable references)
260 * ❌ Detailed reasoning chains
261 * ❌ Source quality ratings (shown but not detailed)
262 * ❌ Contradiction detection (basic only)
263 * ❌ Risk assessment (shown but not workflow-integrated)
264
265 **Platform Features:**
266
267 * ❌ User accounts / authentication
268 * ❌ Saved history
269 * ❌ Search functionality
270 * ❌ Claim comparison
271 * ❌ User contributions
272 * ❌ Commenting system
273 * ❌ Social sharing
274
275 **Technical Features:**
276
277 * ❌ Browser extensions
278 * ❌ Mobile apps
279 * ❌ API endpoints
280 * ❌ Webhooks
281 * ❌ Export features (PDF, CSV)
282
283 **Quality Features:**
284
285 * ❌ Accessibility (WCAG compliance)
286 * ❌ Multilingual support
287 * ❌ Mobile optimization
288 * ❌ Media verification (images/videos)
289
290 **Production Features:**
291
292 * ❌ Security hardening
293 * ❌ Privacy compliance (GDPR)
294 * ❌ Terms of service
295 * ❌ Monitoring/logging
296 * ❌ Error tracking
297 * ❌ Analytics
298 * ❌ A/B testing
299
300 == 4. POC Simplifications vs. Full System ==
301
302 === 4.1 Architecture Comparison ===
303
304 **POC Architecture (Simplified):**
305 {{code}}User Input → Single AKEL Call → Output Display
306 (all processing){{/code}}
307
308 **Full System Architecture:**
309 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
310 → Evidence Summarizer → Contradiction Detector → Verdict Generator
311 → Quality Gates → Publication → Output Display{{/code}}
312
313 **Key Differences:**
314
315 |=Aspect|=POC1|=Full System
316 |Processing|Single API call|Multi-component pipeline
317 |Scenarios|None (implicit)|Explicit entities with versioning
318 |Evidence|Basic retrieval|Comprehensive with quality scoring
319 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
320 |Workflow|3 steps (input/process/output)|6 phases with gates
321 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
322 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
323
324 === 4.2 Workflow Comparison ===
325
326 **POC1 Workflow:**
327
328 1. User submits text/URL
329 2. Single AKEL call (all processing in one prompt)
330 3. Display results
331 **Total: 3 steps, 10-18 seconds**
332
333 **Full System Workflow:**
334
335 1. **Claim Submission** (extraction, normalization, clustering)
336 2. **Scenario Building** (definitions, assumptions, boundaries)
337 3. **Evidence Handling** (retrieval, assessment, linking)
338 4. **Verdict Creation** (synthesis, reasoning, approval)
339 5. **Public Presentation** (summaries, landscapes, deep dives)
340 6. **Time Evolution** (versioning, re-evaluation triggers)
341 **Total: 6 phases with quality gates, 10-30 seconds**
342
343 === 4.3 Why POC is Simplified ===
344
345 **Engineering Rationale:**
346
347 1. **Test core capability first:** Can AI do basic fact-checking without humans?
348 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
349 3. **Learn before building:** POC1 insights inform full architecture
350 4. **Iterative approach:** Add complexity only after validating foundations
351 5. **Resource efficiency:** Don't build full system if core concept fails
352
353 **Acceptable Trade-offs:**
354
355 * ✅ POC proves AI capability (most risky assumption)
356 * ✅ POC validates user comprehension (can people understand output?)
357 * ❌ POC doesn't validate full workflow (test in Beta)
358 * ❌ POC doesn't validate scale (test in Beta)
359 * ❌ POC doesn't validate scenario architecture (design in POC2)
360
361 === 4.4 Gap Between POC1 and POC2/Beta ===
362
363 **What needs to be built for POC2:**
364
365 * Scenario generation component
366 * Evidence Model structure (full)
367 * Scenario-evidence linking
368 * Multi-interpretation comparison
369 * Truth landscape visualization
370
371 **What needs to be built for Beta:**
372
373 * Multi-component AKEL pipeline
374 * Quality gate infrastructure
375 * Review workflow system
376 * Audit sampling framework
377 * Production data model
378 * Federation architecture (Release 1.0)
379
380 **POC1 → POC2 is significant architectural expansion.**
381
382 == 5. Publication Mode & Labeling ==
383
384 === 5.1 POC Publication Mode ===
385
386 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
387
388 Per FactHarbor Specification Section 11 "POC v1 Behavior":
389
390 * Produces public AI-generated output
391 * No human approval gate
392 * Clear AI-Generated labeling
393 * All quality gates active (simplified)
394 * Risk tier classification shown (demo)
395
396 === 5.2 User-Facing Labels ===
397
398 **Primary Label (top of analysis):**
399 {{code}}╔════════════════════════════════════════════════════════════╗
400 ║ [AI-GENERATED - POC/DEMO] ║
401 ║ ║
402 ║ This analysis was produced entirely by AI and has not ║
403 ║ been human-reviewed. Use for demonstration purposes. ║
404 ║ ║
405 ║ Source: AI/AKEL v1.0 (POC) ║
406 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
407 ║ Quality Gates: 4/4 Passed (Simplified) ║
408 ║ Last Updated: [timestamp] ║
409 ╚════════════════════════════════════════════════════════════╝{{/code}}
410
411 **Per-Claim Risk Labels:**
412
413 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
414 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
415 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
416
417 === 5.3 Display Requirements ===
418
419 **Must Show:**
420
421 * AI-Generated status (prominent)
422 * POC/Demo disclaimer
423 * Risk tier per claim
424 * Confidence scores (0-100%)
425 * Quality gate status (passed/failed)
426 * Timestamp
427
428 **Must NOT Claim:**
429
430 * Human review
431 * Production quality
432 * Medical/legal advice
433 * Authoritative verdicts
434 * Complete accuracy
435
436 === 5.4 Mode 2 vs. Full System Publication ===
437
438 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
439 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
440 |Review|None|None|Human-Reviewed
441 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
442 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
443 |Risk Display|Demo only|Workflow-integrated|Validated
444 |User Actions|View only|Flag for review|Trust rating
445
446 == 6. Quality Gates (Simplified Implementation) ==
447
448 === 6.1 Overview ===
449
450 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
451
452 **The 4 Mandatory Gates:**
453
454 1. Source Quality
455 2. Contradiction Search (MANDATORY)
456 3. Uncertainty Quantification
457 4. Structural Integrity
458
459 **POC Implements Simplified Versions:**
460
461 * Focus on demonstrating concept
462 * Basic implementations sufficient
463 * Failures displayed to user (not blocking)
464 * Full system has comprehensive validation
465
466 === 6.2 Gate 1: Source Quality (Basic) ===
467
468 **Full System Requirements:**
469
470 * Primary sources identified and accessible
471 * Source reliability scored against whitelist
472 * Citation completeness verified
473 * Publication dates checked
474 * Author credentials validated
475
476 **POC Implementation:**
477
478 * ✅ At least 2 sources found
479 * ✅ Sources accessible (URLs valid)
480 * ❌ No whitelist checking
481 * ❌ No credential validation
482 * ❌ No comprehensive reliability scoring
483
484 **Pass Criteria:** ≥2 accessible sources found
485
486 **Failure Handling:** Display error message, don't generate verdict
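The pass criterion above reduces to a small check. A sketch using a purely syntactic URL test as a stand-in for real accessibility checking (a full implementation would issue HTTP requests):

```python
# Sketch of the POC source-quality gate: pass if at least 2 sources have
# valid, accessible URLs. The check here is syntactic only; a real
# implementation would verify accessibility with HTTP requests.
from urllib.parse import urlparse

def looks_accessible(url: str) -> bool:
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def source_quality_gate(source_urls: list) -> bool:
    """Pass criteria: >=2 accessible sources found."""
    return sum(looks_accessible(u) for u in source_urls) >= 2

print(source_quality_gate(["https://example.org/study", "not a url"]))         # False
print(source_quality_gate(["https://example.org/a", "http://example.com/b"]))  # True
```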
487
488 === 6.3 Gate 2: Contradiction Search (Basic) ===
489
490 **Full System Requirements:**
491
492 * Counter-evidence actively searched
493 * Reservations and limitations identified
494 * Alternative interpretations explored
495 * Bubble detection (echo chambers, conspiracy theories)
496 * Cross-cultural and international perspectives
497 * Academic literature (supporting AND opposing)
498
499 **POC Implementation:**
500
501 * ✅ Basic search for counter-evidence
502 * ✅ Identify obvious contradictions
503 * ❌ No comprehensive academic search
504 * ❌ No bubble detection
505 * ❌ No systematic alternative interpretation search
506 * ❌ No international perspective verification
507
508 **Pass Criteria:** Basic contradiction search attempted
509
510 **Failure Handling:** Note "limited contradiction search" in output
511
512 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
513
514 **Full System Requirements:**
515
516 * Confidence scores calculated for all claims/verdicts
517 * Limitations explicitly stated
518 * Data gaps identified and disclosed
519 * Strength of evidence assessed
520 * Alternative scenarios considered
521
522 **POC Implementation:**
523
524 * ✅ Confidence scores (0-100%)
525 * ✅ Basic uncertainty acknowledgment
526 * ❌ No detailed limitation disclosure
527 * ❌ No data gap identification
528 * ❌ No alternative scenario consideration (deferred to POC2)
529
530 **Pass Criteria:** Confidence score assigned
531
532 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
533
534 === 6.5 Gate 4: Structural Integrity (Basic) ===
535
536 **Full System Requirements:**
537
538 * No hallucinations detected (fact-checking against sources)
539 * Logic chain valid and traceable
540 * References accessible and verifiable
541 * No circular reasoning
542 * Premises clearly stated
543
544 **POC Implementation:**
545
546 * ✅ Basic coherence check
547 * ✅ References accessible
548 * ❌ No comprehensive hallucination detection
549 * ❌ No formal logic validation
550 * ❌ No premise extraction and verification
551
552 **Pass Criteria:** Output is coherent and references are accessible
553
554 **Failure Handling:** Display error message
555
556 === 6.6 Quality Gate Display ===
557
558 **POC shows simplified status:**
559 {{code}}Quality Gates: 4/4 Passed (Simplified)
560 ✓ Source Quality: 3 sources found
561 ✓ Contradiction Search: Basic search completed
562 ✓ Uncertainty: Confidence scores assigned
563 ✓ Structural Integrity: Output coherent{{/code}}
564
565 **If any gate fails:**
566 {{code}}Quality Gates: 3/4 Passed (Simplified)
567 ✓ Source Quality: 3 sources found
568 ✗ Contradiction Search: Search failed - limited evidence
569 ✓ Uncertainty: Confidence scores assigned
570 ✓ Structural Integrity: Output coherent
571
572 Note: This analysis has limited evidence. Use with caution.{{/code}}
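Both displays above can be produced by one rendering helper. A sketch; the result-tuple structure is an assumption for illustration:

```python
# Sketch: render the quality-gate status block from per-gate results.
# The (name, passed, detail) tuple structure is an assumption.

def render_gates(results) -> str:
    """results: (gate name, passed, detail) tuples in display order."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        lines.append(f"{mark} {name}: {detail}")
    if passed < len(results):
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)

status = render_gates([
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
])
print(status)
```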
573
574 === 6.7 Simplified vs. Full System ===
575
576 |=Gate|=POC (Simplified)|=Full System
577 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
578 |Contradiction|Basic search|Systematic academic + media + international
579 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
580 |Structural|Coherence check|Hallucination detection, logic validation, premise check
581
582 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
583
584 == 7. AKEL Architecture Comparison ==
585
586 === 7.1 POC AKEL (Simplified) ===
587
588 **Implementation:**
589
590 * Single Claude API call (Sonnet 4.5)
591 * One comprehensive prompt
592 * All processing in single request
593 * No separate components
594 * No orchestration layer
595
596 **Prompt Structure:**
597 {{code}}Task: Analyze this article and provide:
598
599 1. Extract 3-5 factual claims
600 2. For each claim:
601 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
602 - Assign confidence score (0-100%)
603 - Assign risk tier (A/B/C)
604 - Write brief reasoning (1-3 sentences)
605 3. Generate analysis summary (4-6 sentences)
606 4. Generate article summary (3-5 sentences)
607 5. Run basic quality checks
608
609 Return as structured JSON.{{/code}}
610
611 **Processing Time:** 10-18 seconds (estimate)
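Because the single call returns everything as structured JSON, the backend's main job is validating the response shape before display. A minimal sketch; the key names are assumptions standing in for whatever schema the prompt actually specifies:

```python
# Sketch: minimal validation of the structured-JSON response from the single
# AKEL call. Key names are assumptions, not a fixed schema.
import json

REQUIRED_TOP_LEVEL = {"analysis_summary", "claims", "quality_gates"}
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}

def validate_response(raw: str) -> dict:
    """Parse and sanity-check the model's JSON; raise ValueError on bad shape."""
    data = json.loads(raw)
    missing = REQUIRED_TOP_LEVEL - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 3 <= len(data["claims"]) <= 5:
        raise ValueError("expected 3-5 claims")
    for c in data["claims"]:
        if c["verdict"] not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {c['verdict']}")
        if not 0 <= c["confidence"] <= 100:
            raise ValueError("confidence out of range")
    return data

raw = json.dumps({
    "analysis_summary": "…",
    "claims": [{"text": "Coffee reduces diabetes risk by 30%",
                "verdict": "WELL-SUPPORTED", "confidence": 85,
                "risk_tier": "C"}] * 3,
    "quality_gates": {"passed": 4, "total": 4},
})
print(validate_response(raw)["claims"][0]["verdict"])
```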
612
613 === 7.2 Full System AKEL (Production) ===
614
615 **Architecture:**
616 {{code}}AKEL Orchestrator
617 ├── Claim Extractor
618 ├── Claim Classifier (with risk tier assignment)
619 ├── Scenario Generator
620 ├── Evidence Summarizer
621 ├── Contradiction Detector
622 ├── Quality Gate Validator
623 ├── Audit Sampling Scheduler
624 └── Federation Sync Adapter (Release 1.0+){{/code}}
625
626 **Processing:**
627
628 * Parallel processing where possible
629 * Separate component calls
630 * Quality gates between phases
631 * Audit sampling selection
632 * Cross-node coordination (federated mode)
633
634 **Processing Time:** 10-30 seconds (full pipeline)
635
636 === 7.3 Why POC Uses Single Call ===
637
638 **Advantages:**
639
640 * ✅ Simpler to implement
641 * ✅ Faster POC development
642 * ✅ Easier to debug
643 * ✅ Proves AI capability
644 * ✅ Good enough for concept validation
645
646 **Limitations:**
647
648 * ❌ No component reusability
649 * ❌ No parallel processing
650 * ❌ All-or-nothing (can't partially succeed)
651 * ❌ Harder to improve individual components
652 * ❌ No audit sampling
653
654 **Acceptable Trade-off:**
655
656 POC tests "Can AI do this?" not "How should we architect it?"
657
658 Full component architecture comes in Beta after POC validates concept.
659
660 === 7.4 Evolution Path ===
661
662 **POC1:** Single prompt → Prove concept
663 **POC2:** Add scenario component → Test full pipeline
664 **Beta:** Multi-component AKEL → Production architecture
665 **Release 1.0:** Full AKEL + Federation → Scale
666
667 == 8. Functional Requirements ==
668
669 === FR-POC-1: Article Input ===
670
671 **Requirement:** User can submit article for analysis
672
673 **Functionality:**
674
675 * Text input field (paste article text, up to 5000 characters)
676 * URL input field (paste article URL)
677 * "Analyze" button to trigger processing
678 * Loading indicator during analysis
679
680 **Excluded:**
681
682 * No user authentication
683 * No claim history
684 * No search functionality
685 * No saved templates
686
687 **Acceptance Criteria:**
688
689 * User can paste text from article
690 * User can paste URL of article
691 * System accepts input and triggers analysis
692
693 === FR-POC-2: Claim Extraction (Fully Automated) ===
694
695 **Requirement:** AI automatically extracts 3-5 factual claims
696
697 **Functionality:**
698
699 * AI reads article text
700 * AI identifies factual claims (not opinions/questions)
701 * AI extracts 3-5 most important claims
702 * System displays numbered list
703
704 **Critical:** NO MANUAL EDITING ALLOWED
705
706 * AI selects which claims to extract
707 * AI identifies factual vs. non-factual
708 * System processes claims as extracted
709 * No human curation or correction
710
711 **Error Handling:**
712
713 * If extraction fails: Display error message
714 * User can retry with different input
715 * No manual intervention to fix extraction
716
717 **Acceptance Criteria:**
718
719 * AI extracts 3-5 claims automatically
720 * Claims are factual (not opinions)
721 * Claims are clearly stated
722 * No manual editing required
723
724 === FR-POC-3: Verdict Generation (Fully Automated) ===
725
726 **Requirement:** AI automatically generates verdict for each claim
727
728 **Functionality:**
729
730 * For each claim, AI:
731 * Evaluates claim based on available evidence/knowledge
732 * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
733 * Assigns confidence score (0-100%)
734 * Assigns risk tier (A/B/C)
735 * Writes brief reasoning (1-3 sentences)
736 * System displays verdict for each claim
737
738 **Critical:** NO MANUAL EDITING ALLOWED
739
740 * AI computes verdicts based on evidence
741 * AI generates confidence scores
742 * AI writes reasoning
743 * No human review or adjustment
744
745 **Error Handling:**
746
747 * If verdict generation fails: Display error message
748 * User can retry
749 * No manual intervention to adjust verdicts
750
751 **Acceptance Criteria:**
752
753 * Each claim has a verdict
754 * Confidence score is displayed (0-100%)
755 * Risk tier is displayed (A/B/C)
756 * Reasoning is understandable (1-3 sentences)
757 * Verdict is defensible given reasoning
758 * All generated automatically by AI
759
760 === FR-POC-4: Analysis Summary (Fully Automated) ===
761
762 **Requirement:** AI generates brief summary of analysis
763
764 **Functionality:**
765
766 * AI summarizes findings in 4-6 sentences:
767 * How many claims found
768 * Distribution of verdicts
769 * Overall assessment
770 * System displays at top of results
771
772 **Critical:** NO MANUAL EDITING ALLOWED
773
774 **Acceptance Criteria:**
775
776 * Summary is coherent
777 * Accurately reflects analysis
778 * 4-6 sentences
779 * Automatically generated
780
781 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
782
783 **Requirement:** AI generates brief summary of original article
784
785 **Functionality:**
786
787 * AI summarizes article content (not FactHarbor's analysis)
788 * 3-5 sentences
789 * System displays
790
791 **Note:** Optional - can skip if time limited
792
793 **Critical:** NO MANUAL EDITING ALLOWED
794
795 **Acceptance Criteria:**
796
797 * Summary is neutral (article's position)
798 * Accurately reflects article content
799 * 3-5 sentences
800 * Automatically generated
801
802 === FR-POC-6: Publication Mode Display ===
803
804 **Requirement:** Clear labeling of AI-generated content
805
806 **Functionality:**
807
808 * Display Mode 2 publication label
809 * Show POC/Demo disclaimer
810 * Display risk tiers per claim
811 * Show quality gate status
812 * Display timestamp
813
814 **Acceptance Criteria:**
815
816 * Label is prominent and clear
817 * User understands this is AI-generated POC output
818 * Risk tiers are color-coded
819 * Quality gate status is visible
820
821 === FR-POC-7: Quality Gate Execution ===
822
823 **Requirement:** Execute simplified quality gates
824
825 **Functionality:**
826
827 * Check source quality (basic)
828 * Attempt contradiction search (basic)
829 * Calculate confidence scores
830 * Verify structural integrity (basic)
831 * Display gate results
832
833 **Acceptance Criteria:**
834
835 * All 4 gates attempted
836 * Pass/fail status displayed
837 * Failures explained to user
838 * Gates warn rather than block publication (POC mode); per Sections 6.2 and 6.5, only Source Quality and Structural Integrity failures halt verdict generation
839
840 == 9. Non-Functional Requirements ==
841
842 === NFR-POC-1: Fully Automated Processing ===
843
844 **Requirement:** Complete AI automation with zero manual intervention
845
846 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
847
848 **What this means:**
849
850 * Claims: AI selects (no human curation)
851 * Scenarios: N/A (deferred to POC2)
852 * Evidence: AI evaluates (no human selection)
853 * Verdicts: AI determines (no human adjustment)
854 * Summaries: AI writes (no human editing)
855
856 **Pipeline:**
857 {{code}}User Input → AKEL Processing → Output Display
858
859 ZERO human editing{{/code}}
860
861 **If AI output is poor:**
862
863 * ❌ Do NOT manually fix it
864 * ✅ Document the failure
865 * ✅ Improve prompts and retry
866 * ✅ Accept that POC might fail
867
868 **Why this matters:**
869
870 * Tests whether AI can do this without humans
871 * Validates scalability (humans can't review every analysis)
872 * Honest test of technical feasibility
873
874 === NFR-POC-2: Performance ===
875
876 **Requirement:** Analysis completes in reasonable time
877
878 **Acceptable Performance:**
879
880 * Processing time: 1-5 minutes (acceptable for POC)
881 * Display loading indicator to user
882 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
883
884 **Not Required:**
885
886 * Production-level speed (< 30 seconds)
887 * Optimization for scale
888 * Caching
889
890 **Acceptance Criteria:**
891
892 * Analysis completes within 5 minutes
893 * User sees loading indicator
894 * No timeout errors
895
896 === NFR-POC-3: Reliability ===
897
898 **Requirement:** System works for manual testing sessions
899
900 **Acceptable:**
901
902 * Occasional errors (< 20% failure rate)
903 * Manual restart if needed
904 * Display error messages clearly
905
906 **Not Required:**
907
908 * 99.9% uptime
909 * Automatic error recovery
910 * Production monitoring
911
912 **Acceptance Criteria:**
913
914 * System works for test demonstrations
915 * Errors are handled gracefully
916 * User receives clear error messages
917
918 === NFR-POC-4: Environment ===
919
920 **Requirement:** Runs on simple infrastructure
921
922 **Acceptable:**
923
924 * Single machine or simple cloud setup
925 * No distributed architecture
926 * No load balancing
927 * No redundancy
928 * Local development environment viable
929
930 **Not Required:**
931
932 * Production infrastructure
933 * Multi-region deployment
934 * Auto-scaling
935 * Disaster recovery
936
937 === NFR-POC-5: Cost Efficiency Tracking ===
938
939 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
940
941 **Must Track:**
942
943 * Input tokens (article + prompt)
944 * Output tokens (generated analysis)
945 * Total tokens
946 * Estimated cost (USD)
947 * Response time (seconds)
948 * Article length (words/characters)
949
950 **Must Display:**
951
952 * Usage statistics in UI (Component 5)
953 * Cost per analysis
954 * Cost per claim extracted
955
956 **Must Log:**
957
958 * Aggregate metrics for analysis
959 * Cost distribution by article length
960 * Token efficiency trends
961
962 **Purpose:**
963
964 * Understand unit economics
965 * Identify optimization opportunities
966 * Project costs at scale
967 * Inform architecture decisions (caching, model selection, etc.)
968
969 **Acceptance Criteria:**
970
971 * ✅ Usage data displayed after each analysis
972 * ✅ Metrics logged for aggregate analysis
973 * ✅ Cost calculated accurately (Claude API pricing)
974 * ✅ Test cases include varying article lengths
975 * ✅ POC1 report includes cost analysis section
976
977 **Success Target:**
978
979 * Average cost per analysis < $0.05 USD
980 * Cost scaling behavior understood (linear/exponential)
981 * 2+ optimization opportunities identified
982
983 **Critical:** Unit economics must be viable for scaling decision!
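The "Must Log" items above amount to appending one metrics record per analysis and aggregating the records for the cost-analysis report. A sketch using a JSONL file; the file path and field names are assumptions:

```python
# Sketch: append per-analysis metrics to a JSONL log and aggregate them for
# the cost-analysis report. File name and field names are assumptions.
import json
import os
import statistics
import tempfile

def log_metrics(path: str, metrics: dict) -> None:
    """Append one analysis record as a JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(metrics) + "\n")

def average_cost(path: str) -> float:
    """Aggregate: mean cost per analysis across all logged records."""
    with open(path) as f:
        costs = [json.loads(line)["cost_usd"] for line in f]
    return round(statistics.mean(costs), 3)

# Illustrative records for articles of varying lengths (made-up costs).
path = os.path.join(tempfile.mkdtemp(), "usage.jsonl")
for words, cost in [(800, 0.02), (2450, 0.04), (4000, 0.07)]:
    log_metrics(path, {"article_words": words, "cost_usd": cost})

print(average_cost(path))  # compare against the < $0.05 success target
```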

== 10. Technical Architecture ==

=== 10.1 System Components ===

**Frontend:**

* Simple HTML form (text input + URL input + button)
* Loading indicator
* Results display page (single page, no tabs/navigation)

**Backend:**

* Single API endpoint
* Calls Claude API (Sonnet 4.5 or latest)
* Parses response
* Returns JSON to frontend

**Data Storage:**

* None required (stateless POC)
* Optional: Simple file storage or SQLite for demo examples

**External Services:**

* Claude API (Anthropic) - required
* Optional: URL fetch service for article text extraction

=== 10.2 Processing Flow ===

{{code}}
1. User submits text or URL

2. Backend receives request

3. If URL: Fetch article text

4. Call Claude API with single prompt:
   "Extract claims, evaluate each, provide verdicts"

5. Claude API returns:
   - Analysis summary
   - Claims list
   - Verdicts for each claim (with risk tiers)
   - Article summary (optional)
   - Quality gate results

6. Backend parses response

7. Frontend displays results with Mode 2 labeling
{{/code}}

**Key Simplification:** A single API call performs the entire analysis.
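The flow above can be sketched as one backend handler. This is a hedged sketch, not the POC's actual code: the LLM call and URL fetcher are injected so they can be stubbed, and the parsed field names (`analysis_summary`, `claims`) are assumptions based on the response shape listed in step 5.

```python
import json

def analyze(payload: dict, call_llm, fetch_url=None) -> dict:
    """Steps 2-6 of the processing flow: accept text or URL, build the single
    comprehensive prompt, call the LLM once, and parse the structured JSON."""
    # Step 3: if a URL was submitted, fetch the article text first.
    text = payload.get("text")
    if not text and payload.get("url"):
        if fetch_url is None:
            raise ValueError("URL submitted but no fetcher configured")
        text = fetch_url(payload["url"])
    if not text:
        raise ValueError("No article text provided")

    # Step 4: one prompt does the entire analysis.
    prompt = f"Extract claims, evaluate each, provide verdicts.\n\nArticle:\n{text}"
    raw = call_llm(prompt)

    # Step 6: parse the structured JSON the model was asked to return,
    # checking that the assumed required fields are present.
    result = json.loads(raw)
    for key in ("analysis_summary", "claims"):
        if key not in result:
            raise ValueError(f"Missing field in LLM response: {key}")
    return result
```

In the real POC, `call_llm` would wrap the Anthropic SDK; injecting it keeps the handler testable without API keys.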

=== 10.3 AI Prompt Strategy ===

**Single Comprehensive Prompt:**
{{code}}Task: Analyze this article and provide:

1. Identify the article's main thesis/conclusion
   - What is the article trying to argue or prove?
   - What is the primary claim or conclusion?

2. Extract 3-5 factual claims from the article
   - Note which claims are CENTRAL to the main thesis
   - Note which claims are SUPPORTING facts

3. For each claim:
   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
   - Assign confidence score (0-100%)
   - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
   - Write brief reasoning (1-3 sentences)

4. Assess relationship between claims and main thesis:
   - Do the claims actually support the article's conclusion?
   - Are there logical leaps or unsupported inferences?
   - Is the article's framing misleading even if individual facts are accurate?

5. Run quality gates:
   - Check: ≥2 sources found
   - Attempt: Basic contradiction search
   - Calculate: Confidence scores
   - Verify: Structural integrity

6. Write context-aware analysis summary (4-6 sentences):
   - State article's main thesis
   - Report claims found and verdict distribution
   - Note if central claims are problematic
   - Assess whether evidence supports conclusion
   - Overall credibility considering claim importance

7. Write article summary (3-5 sentences: neutral summary of article content)

Return as structured JSON with quality gate results.{{/code}}

**One prompt generates everything.**

**Critical Addition:**

Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article whose facts are accurate but poorly reasoned from one that is genuinely credible.
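The quality gates from step 5 can be applied to the parsed response on the backend as a sanity check. A minimal sketch, assuming hypothetical field names (`claims`, `sources`, `confidence`, `text`, `verdict`) for the returned JSON schema:

```python
def run_quality_gates(result: dict) -> dict:
    """Apply the step-5 quality gates to a parsed analysis. Field names are
    assumptions about the JSON schema, for illustration only."""
    claims = result.get("claims", [])
    gates = {
        # Gate: every claim should cite at least 2 sources.
        "min_two_sources": all(len(c.get("sources", [])) >= 2 for c in claims),
        # Gate: every claim carries a confidence score in [0, 100].
        "confidence_present": all(0 <= c.get("confidence", -1) <= 100 for c in claims),
        # Gate: structural integrity - at least one claim, each with text and verdict.
        "structure_ok": all("text" in c and "verdict" in c for c in claims) and bool(claims),
    }
    gates["all_passed"] = all(gates.values())
    return gates
```

The contradiction-search gate is omitted here because it depends on the model's own output rather than on structure the backend can verify.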

=== 10.4 Technology Stack Suggestions ===

**Frontend:**

* HTML + CSS + JavaScript (minimal framework)
* OR: Next.js (if team prefers)
* Hosted: Local machine OR Vercel/Netlify free tier

**Backend:**

* Python Flask/FastAPI (simple REST API)
* OR: Next.js API routes (if using Next.js)
* Hosted: Local machine OR Railway/Render free tier

**AKEL Integration:**

* Claude API via Anthropic SDK
* Model: Claude Sonnet 4.5 or latest available

**Database:**

* None (stateless acceptable)
* OR: SQLite if you want to store demo examples
* OR: JSON files on disk

**Deployment:**

* Local development environment sufficient for POC
* Optional: Deploy to cloud for remote demos

== 11. Success Criteria ==

=== 11.1 Minimum Success (POC Passes) ===

**Required for GO decision:**

* ✅ AI extracts 3-5 factual claims automatically
* ✅ AI provides verdict for each claim automatically
* ✅ Verdicts are reasonable (≥70% make logical sense)
* ✅ Analysis summary is coherent
* ✅ Output is comprehensible to reviewers
* ✅ Team/advisors understand the output
* ✅ Team agrees approach has merit
* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
* ✅ **Cost scaling understood** (data collected on article length vs. cost)
* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)

**Quality Definition:**

* "Reasonable verdict" = Defensible given general knowledge
* "Coherent summary" = Logically structured, grammatically correct
* "Comprehensible" = Reviewers understand what the analysis means

=== 11.2 POC Fails If ===

**Automatic NO-GO if any of these:**

* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
* ❌ Output incomprehensible (reviewers can't understand analysis)
* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
* ❌ Team loses confidence in AI-automated approach

=== 11.3 Quality Thresholds ===

**POC quality expectations:**

|=Component|=Quality Threshold|=Definition
|Claim Extraction|(% class="success" %)≥70% accuracy|Identifies obvious factual claims, may miss some edge cases
|Verdict Logic|(% class="success" %)≥70% defensible|Verdicts are logical given the reasoning provided
|Reasoning Clarity|(% class="success" %)≥70% clear|1-3 sentences are understandable and relevant
|Overall Analysis|(% class="success" %)≥70% useful|Output helps the user understand the article's claims

**Analogy:** "B student" quality (70-80%), not "A+" perfection yet
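Checking an evaluation run against the table above is mechanical. A sketch, assuming a hypothetical score dictionary keyed by component (the names and the 70% threshold come from the table; the measurement format is an assumption):

```python
# Threshold and component names from the 11.3 quality table.
THRESHOLD = 0.70
COMPONENTS = ("claim_extraction", "verdict_logic", "reasoning_clarity", "overall_analysis")

def meets_quality_thresholds(scores: dict) -> dict:
    """Return a pass/fail flag per component plus an overall flag.
    Missing components count as failing."""
    results = {name: scores.get(name, 0.0) >= THRESHOLD for name in COMPONENTS}
    results["all_pass"] = all(results[name] for name in COMPONENTS)
    return results
```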

**Not expecting:**

* 100% accuracy
* Perfect claim coverage
* Comprehensive evidence gathering
* Flawless verdicts
* Production polish

**Expecting:**

* Reasonable claim extraction
* Defensible verdicts
* Understandable reasoning
* Useful output

== 12. Test Cases ==

=== 12.1 Test Case 1: Simple Factual Claim ===

**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"

**Expected Output:**

* Extract claim correctly
* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
* Confidence: 70-90%
* Risk tier: C (Low)
* Reasoning: Mentions studies or evidence

**Success:** Verdict is reasonable and the reasoning makes sense

=== 12.2 Test Case 2: Complex News Article ===

**Input:** News article URL with multiple claims about politics/health/science

**Expected Output:**

* Extract 3-5 key claims
* Verdict for each (may vary: some supported, some uncertain, some refuted)
* Coherent analysis summary
* Article summary
* Risk tiers assigned appropriately

**Success:** Claims identified actually come from the article, and verdicts are reasonable

=== 12.3 Test Case 3: Controversial Topic ===

**Input:** Article on a contested political or scientific topic

**Expected Output:**

* Balanced analysis
* Acknowledges uncertainty where appropriate
* Doesn't overstate confidence
* Reasoning shows awareness of complexity

**Success:** Analysis is fair and doesn't show obvious bias

=== 12.4 Test Case 4: Clearly False Claim ===

**Input:** Article with an obviously false claim (e.g., "The Earth is flat")

**Expected Output:**

* Extract claim
* Verdict: REFUTED
* High confidence (> 90%)
* Risk tier: C (Low - established fact)
* Clear reasoning

**Success:** AI correctly identifies the false claim with high confidence

=== 12.5 Test Case 5: Genuinely Uncertain Claim ===

**Input:** Article with a claim where the evidence is genuinely mixed

**Expected Output:**

* Extract claim
* Verdict: UNCERTAIN
* Moderate confidence (40-60%)
* Reasoning explains why uncertain

**Success:** AI recognizes the uncertainty and doesn't overstate confidence

=== 12.6 Test Case 6: High-Risk Medical Claim ===

**Input:** Article making medical claims

**Expected Output:**

* Extract claim
* Verdict: [appropriate based on evidence]
* Risk tier: A (High - medical)
* Red label displayed
* Clear disclaimer about not being medical advice

**Success:** Risk tier correctly assigned, appropriate warnings shown
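Test Case 6's red label and disclaimer can be driven by a simple tier-to-label mapping, matching the tier definitions used in the prompt (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions). The colors and disclaimer wording below are assumptions for illustration:

```python
# Display metadata per risk tier; colors and wording are illustrative assumptions.
RISK_TIERS = {
    "A": {"name": "High (Medical/Legal/Safety)", "color": "red",
          "disclaimer": "This analysis is not medical, legal, or safety advice."},
    "B": {"name": "Medium (Policy/Science)", "color": "yellow", "disclaimer": None},
    "C": {"name": "Low (Facts/Definitions)", "color": "green", "disclaimer": None},
}

def risk_label(tier: str) -> dict:
    """Return the display label for a claim's risk tier. Test Case 6 expects
    a red label and a disclaimer for tier A."""
    if tier not in RISK_TIERS:
        raise ValueError(f"Unknown risk tier: {tier!r}")
    return RISK_TIERS[tier]
```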

== 13. POC Decision Gate ==

=== 13.1 Decision Framework ===

After POC testing is complete, the team makes one of three decisions:

**Option A: GO (Proceed to POC2)**

**Conditions:**

* AI quality ≥70% without manual editing
* Basic claim → verdict pipeline validated
* Internal + advisor feedback positive
* Technical feasibility confirmed
* Team confident in direction
* Clear path to improving AI quality to ≥90%

**Next Steps:**

* Plan POC2 development (add scenarios)
* Design scenario architecture
* Expand to Evidence Model structure
* Test with more complex articles

**Option B: NO-GO (Pivot or Stop)**

**Conditions:**

* AI quality < 60%
* Requires manual editing for most analyses (> 50%)
* Feedback indicates fundamental flaws
* Cost/effort not justified by value
* No clear path to improvement

**Next Steps:**

* **Pivot:** Change to hybrid human-AI approach (accept that manual review is required)
* **Stop:** Conclude approach is not viable, revisit later

**Option C: ITERATE (Improve POC)**

**Conditions:**

* Concept has merit but execution needs work
* Specific improvements identified
* Addressable with better prompts/approach
* AI quality between 60% and 70%

**Next Steps:**

* Improve AI prompts
* Test different approaches
* Re-run POC with improvements
* Then make GO/NO-GO decision

=== 13.2 Decision Criteria Summary ===

{{code}}
AI Quality < 60%   → NO-GO (approach doesn't work)
AI Quality 60-70%  → ITERATE (improve and retry)
AI Quality ≥ 70%   → GO (proceed to POC2)
{{/code}}
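The three-way gate above is simple enough to encode directly, which keeps the decision unambiguous at the boundaries (a sketch; the 0-1 quality scale is an assumption):

```python
def decision_gate(ai_quality: float) -> str:
    """Map measured AI quality (0.0-1.0) to the three-way decision."""
    if ai_quality >= 0.70:
        return "GO"       # proceed to POC2
    if ai_quality >= 0.60:
        return "ITERATE"  # improve prompts and retry
    return "NO-GO"        # approach doesn't work
```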

== 14. Key Risks & Mitigations ==

=== 14.1 Risk: AI Quality Not Good Enough ===

**Likelihood:** Medium-High
**Impact:** POC fails

**Mitigation:**

* Extensive prompt engineering and testing
* Use best available AI models (Sonnet 4.5)
* Test with diverse article types
* Iterate on prompts based on results

**Acceptance:** This is exactly what the POC tests; be prepared for failure

=== 14.2 Risk: AI Consistency Issues ===

**Likelihood:** Medium
**Impact:** Works sometimes, fails other times

**Mitigation:**

* Test with 10+ diverse articles
* Measure success rate honestly
* Improve prompts to increase consistency

**Acceptance:** Some variability is OK if average quality is ≥70%

=== 14.3 Risk: Output Incomprehensible ===

**Likelihood:** Low-Medium
**Impact:** Users can't understand the analysis

**Mitigation:**

* Create clear explainer document
* Iterate on output format
* Test with non-technical reviewers
* Simplify language if needed

**Acceptance:** Iterate until comprehensible

=== 14.4 Risk: API Rate Limits / Costs ===

**Likelihood:** Low
**Impact:** System slow or expensive

**Mitigation:**

* Monitor API usage
* Implement retry logic
* Estimate costs before scaling

**Acceptance:** POC can be slow and expensive (optimization comes later)

=== 14.5 Risk: Scope Creep ===

**Likelihood:** Medium
**Impact:** POC becomes too complex

**Mitigation:**

* Strict scope discipline
* Say NO to feature additions
* Keep focus on the core question

**Acceptance:** POC is minimal by design

== 15. POC Philosophy ==

=== 15.1 Core Principles ===

**1. Build Less, Learn More**

* Minimum features to test hypothesis
* Don't build unvalidated features
* Focus on core question only

**2. Fail Fast**

* Quick test of hardest part (AI capability)
* Accept that POC might fail
* Better to discover issues early
* Honest assessment over optimistic hope

**3. Test First, Build Second**

* Validate AI can do this before building platform
* Don't assume it will work
* Let results guide decisions

**4. Automation First**

* No manual editing allowed
* Tests scalability, not just feasibility
* Proves approach can work at scale

**5. Honest Assessment**

* Don't cherry-pick examples
* Don't manually fix bad outputs
* Document failures openly
* Make data-driven decisions

=== 15.2 What POC Is ===

✅ Testing AI capability without humans
✅ Proving core technical concept
✅ Fast validation of approach
✅ Honest assessment of feasibility

=== 15.3 What POC Is NOT ===

❌ Building a product
❌ Production-ready system
❌ Feature-complete platform
❌ Perfectly accurate analysis
❌ Polished user experience

== 16. Success = Clear Path Forward ==

**If POC succeeds (≥70% AI quality):**

* ✅ Approach validated
* ✅ Proceed to POC2 (add scenarios)
* ✅ Design full Evidence Model structure
* ✅ Test multi-scenario comparison
* ✅ Focus on improving AI quality from 70% → 90%

**If POC fails (< 60% AI quality):**

* ✅ Learn what doesn't work
* ✅ Pivot to different approach
* ✅ OR wait for better AI technology
* ✅ Avoid wasting resources on non-viable approach

**Either way, the POC provides clarity.**

== 17. Related Pages ==

* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]

**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)


=== NFR-POC-11: LLM Provider Abstraction (POC1) ===

**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.

**POC1 Implementation:**

* **Primary Provider:** Anthropic Claude API
* Stage 1: Claude Haiku 4
* Stage 2: Claude Sonnet 3.5 (cached)
* Stage 3: Claude Sonnet 3.5

* **Provider Interface:** Abstract LLMProvider interface implemented

* **Configuration:** Environment variables for provider selection
* {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
* {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
* {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}

* **Failover:** Basic error handling with cache fallback for Stage 2

* **Cost Tracking:** Log provider name and cost per request

**Future (POC2/Beta):**

* Secondary provider (OpenAI) with automatic failover
* Admin API for runtime provider switching
* Cost comparison dashboard
* Cross-provider output verification

**Success Criteria:**

* All LLM calls go through the abstraction layer (no direct API calls)
* Provider can be changed via environment variable without code changes
* Cost tracking includes provider name in logs
* Stage 2 falls back to cache on provider failure
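The success criteria above can be sketched as an abstract provider interface with env-var selection and a Stage 2 cache fallback. Class, method, and variable names here are illustrative assumptions, not the POC's actual interface; only the environment variable names come from the requirement.

```python
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """All LLM calls go through this interface - no direct API calls."""
    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, model: str, prompt: str) -> str:
        raise NotImplementedError("would call the Anthropic SDK here")

_PROVIDERS = {"anthropic": AnthropicProvider}

def get_provider() -> LLMProvider:
    """Select the provider from configuration, not code (success criterion 2)."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _PROVIDERS[name]()

def stage2_analyze(prompt: str, provider: LLMProvider, cache: dict) -> str:
    """Stage 2 call: on provider failure, fall back to the cache (criterion 4)."""
    model = os.environ.get("LLM_STAGE2_MODEL", "claude-sonnet-3-5")
    try:
        result = provider.complete(model, prompt)
        cache[prompt] = result
        return result
    except Exception:
        if prompt in cache:
            return cache[prompt]
        raise
```

Adding a second provider later (OpenAI, per the POC2 plan) would only mean registering another `LLMProvider` subclass in `_PROVIDERS`.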

**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor V0\.9\.105.Specification.POC.API-and-Schemas.WebHome]] Section 6

**Dependencies:**

* NFR-14 (Main Requirements)
* Design Decision 9
* Architecture Section 2.2

**Priority:** HIGH (P1)

**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.