POC Requirements (POC1 & POC2)

Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.

**No Risk:**

Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
* Faster POC1 validation
* Learning from POC1 to inform scenario design
* Iterative approach: fail fast if basic AI doesn't work
* Flexibility to adjust scenario architecture based on POC1 insights

**Full System Workflow (Future):**

Claims → Scenarios → Evidence → Verdicts

**POC1 Simplified Workflow:**

Claims → Verdicts (scenarios implicit in reasoning)

== 2. POC Output Specification ==

=== 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===

**What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument

**Length:** 4-6 sentences

**Content (Required Elements):**
1. **Article's main thesis/claim** - What is the article trying to argue or prove?
2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
3. **Central vs. supporting claims** - Which claims are central to the article's argument?
4. **Relationship assessment** - Do the claims support the article's conclusion?
5. **Overall credibility** - Final assessment considering claim importance

**Critical Innovation:**

POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
* State accurate supporting facts but draw unsupported conclusions
* Have one false central claim that invalidates the whole argument
* Misframe accurate information to mislead

**Good Example (Context-Aware):**

This article argues that coffee cures cancer based on its antioxidant
content. We analyzed 3 factual claims: 2 about coffee's chemical
properties are well-supported, but the main causal claim is refuted
by current evidence. The article confuses correlation with causation.
Overall assessment: MISLEADING - makes an unsupported medical claim
despite citing some accurate facts.

**Poor Example (Simple Aggregation - Don't Do This):**

This article makes 3 claims. 2 are well-supported and 1 is refuted.
Overall assessment: mostly accurate (67% accurate).

↑ This misses that the refuted claim IS the article's main point!
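The difference between the two examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Claim` type, the `central` flag, the claim texts, and the labels `CREDIBLE` and `MIXED` are hypothetical, and the rule in `context_aware_label` is one possible heuristic rather than the assessment logic POC1 expects the AI to apply.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verdict: str   # one of the four POC verdict labels
    central: bool  # True if the claim carries the article's main thesis


def simple_average(claims):
    """Naive aggregation: share of claims rated WELL-SUPPORTED."""
    ok = sum(1 for c in claims if c.verdict == "WELL-SUPPORTED")
    return ok / len(claims)


def context_aware_label(claims):
    """Hypothetical rule: a refuted central claim dominates the assessment."""
    if any(c.central and c.verdict == "REFUTED" for c in claims):
        return "MISLEADING"
    if all(c.verdict == "WELL-SUPPORTED" for c in claims):
        return "CREDIBLE"
    return "MIXED"


# Illustrative claims modelled on the coffee example above (texts are made up).
claims = [
    Claim("Coffee contains antioxidants", "WELL-SUPPORTED", central=False),
    Claim("Antioxidants neutralize free radicals", "WELL-SUPPORTED", central=False),
    Claim("Coffee cures cancer", "REFUTED", central=True),
]

print(f"{simple_average(claims):.0%}")  # 67%
print(context_aware_label(claims))      # MISLEADING
```

The same three verdicts yield "67% accurate" under simple aggregation and MISLEADING once claim centrality is taken into account, which is the gap POC1 is meant to probe.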

**What POC1 Tests:**

Can AI identify and assess:
* ✅ The article's main thesis/conclusion?
* ✅ Which claims are central vs. supporting?
* ✅ Whether the evidence supports the conclusion?
* ✅ Overall credibility considering logical structure?

**If AI Cannot Do This:**

That's valuable to learn in POC1! We'll:
* Note as limitation
* Fall back to simple aggregation with warning
* Design explicit article-level analysis for POC2

=== 2.2 Component 2: CLAIMS IDENTIFICATION ===

**What:** List of factual claims extracted from article
**Format:** Numbered list
**Quantity:** 3-5 claims
**Requirements:**
* Factual claims only (not opinions/questions)
* Clearly stated
* Automatically extracted by AI

**Example:**

CLAIMS IDENTIFIED:
[1] Coffee reduces diabetes risk by 30%
[2] Coffee improves heart health
[3] Decaf has same benefits as regular
[4] Coffee prevents Alzheimer's completely

=== 2.3 Component 3: CLAIMS VERDICTS ===

**What:** Verdict for each claim identified
**Format:** Per claim structure

**Required Elements:**
* **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
* **Confidence Score:** 0-100%
* **Brief Reasoning:** 1-3 sentences explaining why
* **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration

**Example:**

VERDICTS:
[1] WELL-SUPPORTED (85%) [Risk: C]
Multiple studies confirm 25-30% risk reduction with regular consumption.

[2] UNCERTAIN (65%) [Risk: B]
Evidence is mixed. Some studies show benefits, others show no effect.

[3] PARTIALLY SUPPORTED (60%) [Risk: C]
Some benefits overlap, but caffeine-related benefits are reduced in decaf.

[4] REFUTED (90%) [Risk: B]
No evidence for complete prevention. Claim is significantly overstated.

**Risk Tier Display:**
* **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
* **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
* **Tier C (Green):** Low Risk - Facts/Definitions/History

**Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
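As a rough sketch of the per-claim structure, the required elements could be held in a small validated record. The class name `ClaimVerdict`, its field names, and the `render` helper are assumptions made for illustration; only the verdict labels, the 0-100% range, and the A/B/C tiers come from this specification.

```python
from dataclasses import dataclass

VERDICT_LABELS = ("WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED")
RISK_TIERS = {"A": "High", "B": "Medium", "C": "Low"}


@dataclass(frozen=True)
class ClaimVerdict:
    """Hypothetical container for one claim's verdict (names are illustrative)."""
    claim_id: int
    verdict: str
    confidence: int  # percent, 0-100
    risk_tier: str   # "A", "B" or "C"
    reasoning: str   # 1-3 sentences

    def __post_init__(self):
        # Reject values outside the ranges listed under "Required Elements".
        if self.verdict not in VERDICT_LABELS:
            raise ValueError(f"unknown verdict label: {self.verdict!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be within 0-100")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")

    def render(self) -> str:
        """Format the verdict in the layout of the example above."""
        header = (f"[{self.claim_id}] {self.verdict} "
                  f"({self.confidence}%) [Risk: {self.risk_tier}]")
        return f"{header}\n{self.reasoning}"


v = ClaimVerdict(4, "REFUTED", 90, "B",
                 "No evidence for complete prevention. "
                 "Claim is significantly overstated.")
print(v.render())
```

Validating at construction time means a malformed model response fails early instead of reaching the display layer.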

=== 2.4 Component 4: ARTICLE SUMMARY (Optional) ===

**What:** Brief summary of original article content
**Length:** 3-5 sentences
**Tone:** Neutral (article's position, not FactHarbor's analysis)

**Example:**

ARTICLE SUMMARY:
Health News Today article discusses coffee benefits, citing studies
on diabetes and Alzheimer's. Author highlights research linking coffee
to disease prevention. Recommends 2-3 cups daily for optimal health.

=== 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===

**What:** LLM usage metrics for cost optimization and scaling decisions

**Purpose:**
* Understand cost per analysis
* Identify optimization opportunities
* Project costs at scale
* Inform architecture decisions

**Display Format:**

USAGE STATISTICS:
• Article: 2,450 words (12,300 characters)
• Input tokens: 15,234
• Output tokens: 892
• Total tokens: 16,126
• Estimated cost: $0.24 USD
• Response time: 8.3 seconds
• Cost per claim: $0.048
• Model: claude-sonnet-4-20250514

**Why This Matters:**

At scale, LLM costs are critical:
* 10,000 articles/month ≈ $200-500/month
* 100,000 articles/month ≈ $2,000-5,000/month
* Cost optimization can reduce expenses 30-50%

**What POC1 Learns:**
* How cost scales with article length
* Prompt optimization opportunities (caching, compression)
* Output verbosity tradeoffs
* Model selection strategy (FAST vs. REASONING roles)
* Article length limits (if needed)

**Implementation:**
* Claude API already returns usage data
* No extra API calls needed
* Display to user + log for aggregate analysis
* Test with articles of varying lengths

**Critical for GO/NO-GO:** Unit economics must be viable at scale!
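The cost arithmetic behind these figures can be sketched as follows. The `Usage` class is a local stand-in for whatever usage object the provider returns, and the per-million-token prices are placeholder parameters chosen for illustration, not quoted from any price list; real prices must be supplied from the provider's current pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Local stand-in for the token counts reported with an API response."""
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(usage, usd_per_m_input, usd_per_m_output):
    """Cost of one analysis given prices per million tokens."""
    return (usage.input_tokens * usd_per_m_input
            + usage.output_tokens * usd_per_m_output) / 1_000_000


def project_monthly_cost(articles_per_month, cost_per_article_usd):
    """Linear projection; assumes cost per article stays constant at scale."""
    return articles_per_month * cost_per_article_usd


usage = Usage(input_tokens=15_234, output_tokens=892)
total_tokens = usage.input_tokens + usage.output_tokens  # 16,126 as displayed above

# Placeholder prices (assumption), in USD per million tokens.
cost = estimate_cost_usd(usage, usd_per_m_input=2.0, usd_per_m_output=10.0)
print(f"total tokens: {total_tokens}, estimated cost: ${cost:.4f}")

# The monthly range quoted above corresponds to $0.02-0.05 per article.
print(project_monthly_cost(10_000, 0.02), project_monthly_cost(10_000, 0.05))
```

Keeping prices as parameters lets the same function be reused when comparing FAST and REASONING model roles.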

=== 2.6 Total Output Size ===

**Combined:** ~220-340 words
* Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
* Claims Identification: 30-50 words
* Claims Verdicts: 100-150 words
* Article Summary: 30-50 words (optional)

**Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.

== 3. What's NOT in POC Scope ==

=== 3.1 Feature Exclusions ===

The following are **explicitly excluded** from POC:

**Content Features:**
* ❌ Scenarios (deferred to POC2)
* ❌ Evidence display (supporting/opposing lists)
* ❌ Source links (clickable references)
* ❌ Detailed reasoning chains
* ❌ Source quality ratings (shown but not detailed)
* ❌ Contradiction detection (basic only)
* ❌ Risk assessment (shown but not workflow-integrated)

**Platform Features:**
* ❌ User accounts / authentication
* ❌ Saved history
* ❌ Search functionality
* ❌ Claim comparison
* ❌ User contributions
* ❌ Commenting system
* ❌ Social sharing

**Technical Features:**
* ❌ Browser extensions
* ❌ Mobile apps
* ❌ API endpoints
* ❌ Webhooks
* ❌ Export features (PDF, CSV)

**Quality Features:**
* ❌ Accessibility (WCAG compliance)
* ❌ Multilingual support
* ❌ Mobile optimization
* ❌ Media verification (images/videos)

**Production Features:**
* ❌ Security hardening
* ❌ Privacy compliance (GDPR)
* ❌ Terms of service
* ❌ Monitoring/logging
* ❌ Error tracking
* ❌ Analytics
* ❌ A/B testing

== 4. POC Simplifications vs. Full System ==

=== 4.1 Architecture Comparison ===

**POC Architecture (Simplified):**

User Input → Single AKEL Call → Output Display
             (all processing)

**Full System Architecture:**

User Input → Claim Extractor → Claim Classifier → Scenario Generator
→ Evidence Summarizer → Contradiction Detector → Verdict Generator
→ Quality Gates → Publication → Output Display

**Key Differences:**

|=Aspect|=POC1|=Full System
|Processing|Single API call|Multi-component pipeline
|Scenarios|None (implicit)|Explicit entities with versioning
|Evidence|Basic retrieval|Comprehensive with quality scoring
|Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
|Workflow|3 steps (input/process/output)|6 phases with gates
|Data Model|Stateless (no database)|PostgreSQL + Redis + S3
|Architecture|Single prompt to Claude|AKEL Orchestrator + Components

=== 4.2 Workflow Comparison ===

**POC1 Workflow:**
1. User submits text/URL
2. Single AKEL call (all processing in one prompt)
3. Display results
**Total: 3 steps, ~10-18 seconds**

**Full System Workflow:**
1. **Claim Submission** (extraction, normalization, clustering)
2. **Scenario Building** (definitions, assumptions, boundaries)
3. **Evidence Handling** (retrieval, assessment, linking)
4. **Verdict Creation** (synthesis, reasoning, approval)
5. **Public Presentation** (summaries, landscapes, deep dives)
6. **Time Evolution** (versioning, re-evaluation triggers)
**Total: 6 phases with quality gates, ~10-30 seconds**

=== 4.3 Why POC is Simplified ===

**Engineering Rationale:**

1. **Test core capability first:** Can AI do basic fact-checking without humans?
2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
3. **Learn before building:** POC1 insights inform full architecture
4. **Iterative approach:** Add complexity only after validating foundations
5. **Resource efficiency:** Don't build full system if core concept fails

**Acceptable Trade-offs:**

* ✅ POC proves AI capability (most risky assumption)
* ✅ POC validates user comprehension (can people understand output?)
* ❌ POC doesn't validate full workflow (test in Beta)
* ❌ POC doesn't validate scale (test in Beta)
* ❌ POC doesn't validate scenario architecture (design in POC2)

=== 4.4 Gap Between POC1 and POC2/Beta ===

**What needs to be built for POC2:**
* Scenario generation component
* Evidence Model structure (full)
* Scenario-evidence linking
* Multi-interpretation comparison
* Truth landscape visualization

**What needs to be built for Beta:**
* Multi-component AKEL pipeline
* Quality gate infrastructure
* Review workflow system
* Audit sampling framework
* Production data model
* Federation architecture (Release 1.0)

**POC1 → POC2 is a significant architectural expansion.**

== 5. Publication Mode & Labeling ==

=== 5.1 POC Publication Mode ===

**Mode:** Mode 2 (AI-Generated, No Prior Human Review)

Per FactHarbor Specification Section 11 "POC v1 Behavior":
* Produces public AI-generated output
* No human approval gate
* Clear AI-Generated labeling
* All quality gates active (simplified)
* Risk tier classification shown (demo)

=== 5.2 User-Facing Labels ===

**Primary Label (top of analysis):**

╔════════════════════════════════════════════════════════════╗
║ [AI-GENERATED - POC/DEMO]                                  ║
║                                                            ║
║ This analysis was produced entirely by AI and has not      ║
║ been human-reviewed. Use for demonstration purposes.       ║
║                                                            ║
║ Source: AI/AKEL v1.0 (POC)                                 ║
║ Review Status: Not Reviewed (Proof-of-Concept)             ║
║ Quality Gates: 4/4 Passed (Simplified)                     ║
║ Last Updated: [timestamp]                                  ║
╚════════════════════════════════════════════════════════════╝

**Per-Claim Risk Labels:**
* **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
* **[Risk: B]** 🟡 Medium Risk (Policy/Science)
* **[Risk: C]** 🟢 Low Risk (Facts/Definitions)

=== 5.3 Display Requirements ===

**Must Show:**
* AI-Generated status (prominent)
* POC/Demo disclaimer
* Risk tier per claim
* Confidence scores (0-100%)
* Quality gate status (passed/failed)
* Timestamp

**Must NOT Claim:**
* Human review
* Production quality
* Medical/legal advice
* Authoritative verdicts
* Complete accuracy

=== 5.4 Mode 2 vs. Full System Publication ===

|=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
|Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
|Review|None|None|Human-Reviewed
|Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
|Audit|None (POC)|Sampling (5-50%)|Pre-publication
|Risk Display|Demo only|Workflow-integrated|Validated
|User Actions|View only|Flag for review|Trust rating

== 6. Quality Gates (Simplified Implementation) ==

=== 6.1 Overview ===

Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.

**Full System Has 4 Gates:**
1. Source Quality
2. Contradiction Search (MANDATORY)
3. Uncertainty Quantification
4. Structural Integrity

**POC Implements Simplified Versions:**
* Focus on demonstrating concept
* Basic implementations sufficient
* Failures displayed to user (not blocking)
* Full system has comprehensive validation

=== 6.2 Gate 1: Source Quality (Basic) ===

**Full System Requirements:**
* Primary sources identified and accessible
* Source reliability scored against whitelist
* Citation completeness verified
* Publication dates checked
* Author credentials validated

**POC Implementation:**
* ✅ At least 2 sources found
* ✅ Sources accessible (URLs valid)
* ❌ No whitelist checking
* ❌ No credential validation
* ❌ No comprehensive reliability scoring

**Pass Criteria:** ≥2 accessible sources found

**Failure Handling:** Display error message, don't generate verdict

=== 6.3 Gate 2: Contradiction Search (Basic) ===

**Full System Requirements:**
* Counter-evidence actively searched
* Reservations and limitations identified
* Alternative interpretations explored
* Bubble detection (echo chambers, conspiracy theories)
* Cross-cultural and international perspectives
* Academic literature (supporting AND opposing)

**POC Implementation:**
* ✅ Basic search for counter-evidence
* ✅ Identify obvious contradictions
* ❌ No comprehensive academic search
* ❌ No bubble detection
* ❌ No systematic alternative interpretation search
* ❌ No international perspective verification

**Pass Criteria:** Basic contradiction search attempted

**Failure Handling:** Note "limited contradiction search" in output

=== 6.4 Gate 3: Uncertainty Quantification (Basic) ===

**Full System Requirements:**
* Confidence scores calculated for all claims/verdicts
* Limitations explicitly stated
* Data gaps identified and disclosed
* Strength of evidence assessed
* Alternative scenarios considered

**POC Implementation:**
* ✅ Confidence scores (0-100%)
* ✅ Basic uncertainty acknowledgment
* ❌ No detailed limitation disclosure
* ❌ No data gap identification
* ❌ No alternative scenario consideration (deferred to POC2)

**Pass Criteria:** Confidence score assigned

**Failure Handling:** Show "Confidence: Unknown" if calculation fails

=== 6.5 Gate 4: Structural Integrity (Basic) ===

**Full System Requirements:**
* No hallucinations detected (fact-checking against sources)
* Logic chain valid and traceable
* References accessible and verifiable
* No circular reasoning
* Premises clearly stated

**POC Implementation:**
* ✅ Basic coherence check
* ✅ References accessible
* ❌ No comprehensive hallucination detection
* ❌ No formal logic validation
* ❌ No premise extraction and verification

**Pass Criteria:** Output is coherent and references are accessible

**Failure Handling:** Display error message

=== 6.6 Quality Gate Display ===

**POC shows simplified status:**

Quality Gates: 4/4 Passed (Simplified)
✓ Source Quality: 3 sources found
✓ Contradiction Search: Basic search completed
✓ Uncertainty: Confidence scores assigned
✓ Structural Integrity: Output coherent

**If any gate fails:**

Quality Gates: 3/4 Passed (Simplified)
✓ Source Quality: 3 sources found
✗ Contradiction Search: Search failed - limited evidence
✓ Uncertainty: Confidence scores assigned
✓ Structural Integrity: Output coherent

Note: This analysis has limited evidence. Use with caution.
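A minimal sketch of how this status block might be produced from gate results. `GateResult`, `source_quality_gate`, and `render_gate_status` are hypothetical names, and appending one generic caution note on any failure is a simplifying assumption; the gate names, the "≥2 accessible sources" criterion, and the output wording follow the examples above.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def source_quality_gate(accessible_sources):
    """Gate 1 (basic): passes when at least 2 accessible sources were found."""
    n = len(accessible_sources)
    return GateResult("Source Quality", n >= 2, f"{n} sources found")


def render_gate_status(results):
    """Build the simplified status text; failures are shown, not blocking."""
    passed = sum(1 for r in results if r.passed)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for r in results:
        mark = "✓" if r.passed else "✗"
        lines.append(f"{mark} {r.name}: {r.detail}")
    if passed < len(results):
        # Assumption: a single generic caution note for any failed gate.
        lines += ["", "Note: This analysis has limited evidence. Use with caution."]
    return "\n".join(lines)


results = [
    source_quality_gate(["https://example.org/a",
                         "https://example.org/b",
                         "https://example.org/c"]),
    GateResult("Contradiction Search", False, "Search failed - limited evidence"),
    GateResult("Uncertainty", True, "Confidence scores assigned"),
    GateResult("Structural Integrity", True, "Output coherent"),
]
report = render_gate_status(results)
print(report)
```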

=== 6.7 Simplified vs. Full System ===

|=Gate|=POC (Simplified)|=Full System
|Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
|Contradiction|Basic search|Systematic academic + media + international
|Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
|Structural|Coherence check|Hallucination detection, logic validation, premise check

**POC Goal:** Demonstrate that quality gates are feasible, not deliver a perfect implementation.

== 7. AKEL Architecture Comparison ==

=== 7.1 POC AKEL (Simplified) ===

**Implementation:**
* Single provider API call (REASONING model)
* One comprehensive prompt
* All processing in single request
* No separate components
* No orchestration layer

**Prompt Structure:**

Task: Analyze this article and provide:

1. Extract 3-5 factual claims
2. For each claim:
   - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
   - Assign confidence score (0-100%)
   - Assign risk tier (A/B/C)
   - Write brief reasoning (1-3 sentences)
3. Generate analysis summary (4-6 sentences)
4. Generate article summary (3-5 sentences)
5. Run basic quality checks

Return as structured JSON.

**Processing Time:** 10-18 seconds (estimate)

=== 7.2 Full System AKEL (Production) ===

**Architecture:**

AKEL Orchestrator
├── Claim Extractor
├── Claim Classifier (with risk tier assignment)
├── Scenario Generator
├── Evidence Summarizer
├── Contradiction Detector
├── Quality Gate Validator
├── Audit Sampling Scheduler
└── Federation Sync Adapter (Release 1.0+)

**Processing:**
* Parallel processing where possible
* Separate component calls
* Quality gates between phases
* Audit sampling selection
* Cross-node coordination (federated mode)

**Processing Time:** 10-30 seconds (full pipeline)

=== 7.3 Why POC Uses Single Call ===

**Advantages:**
* ✅ Simpler to implement
* ✅ Faster POC development
* ✅ Easier to debug
* ✅ Proves AI capability
* ✅ Good enough for concept validation

**Limitations:**
* ❌ No component reusability
* ❌ No parallel processing
* ❌ All-or-nothing (can't partially succeed)
* ❌ Harder to improve individual components
* ❌ No audit sampling

**Acceptable Trade-off:**

POC tests "Can AI do this?" not "How should we architect it?"

Full component architecture comes in Beta after POC validates concept.

=== 7.4 Evolution Path ===

**POC1:** Single prompt → Prove concept
**POC2:** Add scenario component → Test full pipeline
**Beta:** Multi-component AKEL → Production architecture
**Release 1.0:** Full AKEL + Federation → Scale

== 8. Functional Requirements ==

=== FR-POC-1: Article Input ===

**Requirement:** User can submit article for analysis

**Functionality:**
* Text input field (paste article text, up to 5000 characters)
* URL input field (paste article URL)
* "Analyze" button to trigger processing
* Loading indicator during analysis

**Excluded:**
* No user authentication
* No claim history
* No search functionality
* No saved templates

**Acceptance Criteria:**
* User can paste text from article
* User can paste URL of article
* System accepts input and triggers analysis
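Input handling for the two fields might be validated along these lines. The function name `validate_submission`, the rule that exactly one of the two fields must be filled, and the restriction to http(s) URLs are assumptions for illustration; only the 5000-character limit comes from the requirement above.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 5000  # limit stated in FR-POC-1


def validate_submission(text: str = "", url: str = "") -> tuple:
    """Return ("text", value) or ("url", value); raise ValueError otherwise."""
    text, url = text.strip(), url.strip()
    # Assumption: exactly one of the two input fields is filled in.
    if bool(text) == bool(url):
        raise ValueError("provide either article text or a URL")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("URL must be an absolute http(s) address")
        return ("url", url)
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    return ("text", text)


print(validate_submission(url="https://example.org/article"))
print(validate_submission(text="Coffee reduces diabetes risk by 30%.")[0])
```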

=== FR-POC-2: Claim Extraction (Fully Automated) ===

**Requirement:** AI automatically extracts 3-5 factual claims

**Functionality:**
* AI reads article text
* AI identifies factual claims (not opinions/questions)
* AI extracts 3-5 most important claims
* System displays numbered list

**Critical:** NO MANUAL EDITING ALLOWED
* AI selects which claims to extract
* AI identifies factual vs. non-factual
* System processes claims as extracted
* No human curation or correction

**Error Handling:**
* If extraction fails: Display error message
* User can retry with different input
* No manual intervention to fix extraction

**Acceptance Criteria:**
* AI extracts 3-5 claims automatically
* Claims are factual (not opinions)
* Claims are clearly stated
* No manual editing required

=== FR-POC-3: Verdict Generation (Fully Automated) ===

**Requirement:** AI automatically generates verdict for each claim

**Functionality:**
* For each claim, AI:
** Evaluates claim based on available evidence/knowledge
** Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
** Assigns confidence score (0-100%)
** Assigns risk tier (A/B/C)
** Writes brief reasoning (1-3 sentences)
* System displays verdict for each claim

**Critical:** NO MANUAL EDITING ALLOWED
* AI computes verdicts based on evidence
* AI generates confidence scores
* AI writes reasoning
* No human review or adjustment

**Error Handling:**
* If verdict generation fails: Display error message
* User can retry
* No manual intervention to adjust verdicts

**Acceptance Criteria:**
* Each claim has a verdict
* Confidence score is displayed (0-100%)
* Risk tier is displayed (A/B/C)
* Reasoning is understandable (1-3 sentences)
* Verdict is defensible given reasoning
* All generated automatically by AI

=== FR-POC-4: Analysis Summary (Fully Automated) ===

**Requirement:** AI generates brief summary of analysis

**Functionality:**
* AI summarizes findings in 4-6 sentences:
** How many claims found
** Distribution of verdicts
** Overall assessment
* System displays at top of results

**Critical:** NO MANUAL EDITING ALLOWED

**Acceptance Criteria:**
* Summary is coherent
* Accurately reflects analysis
* 4-6 sentences
* Automatically generated

=== FR-POC-5: Article Summary (Fully Automated, Optional) ===

**Requirement:** AI generates brief summary of original article

**Functionality:**
* AI summarizes article content (not FactHarbor's analysis)
* 3-5 sentences
* System displays

**Note:** Optional - can skip if time limited

**Critical:** NO MANUAL EDITING ALLOWED

**Acceptance Criteria:**
* Summary is neutral (article's position)
* Accurately reflects article content
* 3-5 sentences
* Automatically generated

=== FR-POC-6: Publication Mode Display ===

**Requirement:** Clear labeling of AI-generated content

**Functionality:**
* Display Mode 2 publication label
* Show POC/Demo disclaimer
* Display risk tiers per claim
* Show quality gate status
* Display timestamp

**Acceptance Criteria:**
* Label is prominent and clear
* User understands this is AI-generated POC output
* Risk tiers are color-coded
* Quality gate status is visible

=== FR-POC-7: Quality Gate Execution ===

**Requirement:** Execute simplified quality gates

**Functionality:**
* Check source quality (basic)
* Attempt contradiction search (basic)
* Calculate confidence scores
* Verify structural integrity (basic)
* Display gate results

**Acceptance Criteria:**
* All 4 gates attempted
* Pass/fail status displayed
* Failures explained to user
* Gates don't block publication (POC mode)

== 9. Non-Functional Requirements ==

=== NFR-POC-1: Fully Automated Processing ===

**Requirement:** Complete AI automation with zero manual intervention

**Critical Rule:** NO MANUAL EDITING AT ANY STAGE

**What this means:**
* Claims: AI selects (no human curation)
* Scenarios: N/A (deferred to POC2)
* Evidence: AI evaluates (no human selection)
* Verdicts: AI determines (no human adjustment)
* Summaries: AI writes (no human editing)

**Pipeline:**

User Input → AKEL Processing → Output Display
                   ↓
           ZERO human editing

**If AI output is poor:**
* ❌ Do NOT manually fix it
* ✅ Document the failure
* ✅ Improve prompts and retry
* ✅ Accept that POC might fail

**Why this matters:**
* Tests whether AI can do this without humans
* Validates scalability (humans can't review every analysis)
* Honest test of technical feasibility

=== NFR-POC-2: Performance ===

**Requirement:** Analysis completes in reasonable time

**Acceptable Performance:**
* Processing time: 1-5 minutes (acceptable for POC)
* Display loading indicator to user
* Show progress if possible ("Extracting claims...", "Generating verdicts...")

**Not Required:**
* Production-level speed (< 30 seconds)
* Optimization for scale
* Caching

**Acceptance Criteria:**
* Analysis completes within 5 minutes
* User sees loading indicator
* No timeout errors

=== NFR-POC-3: Reliability ===

**Requirement:** System works for manual testing sessions

**Acceptable:**
* Occasional errors (< 20% failure rate)
* Manual restart if needed
* Display error messages clearly

**Not Required:**
* 99.9% uptime
* Automatic error recovery
* Production monitoring

**Acceptance Criteria:**
* System works for test demonstrations
* Errors are handled gracefully
* User receives clear error messages

=== NFR-POC-4: Environment ===

**Requirement:** Runs on simple infrastructure

**Acceptable:**
* Single machine or simple cloud setup
* No distributed architecture
* No load balancing
* No redundancy
* Local development environment viable

**Not Required:**
* Production infrastructure
* Multi-region deployment
* Auto-scaling
* Disaster recovery

=== NFR-POC-5: Cost Efficiency Tracking ===

**Requirement:** Track and display LLM usage metrics to inform optimization decisions

**Must Track:**
* Input tokens (article + prompt)
* Output tokens (generated analysis)
* Total tokens
* Estimated cost (USD)
* Response time (seconds)
* Article length (words/characters)

**Must Display:**
* Usage statistics in UI (Component 5)
* Cost per analysis
* Cost per claim extracted

**Must Log:**
* Aggregate metrics for analysis
* Cost distribution by article length
* Token efficiency trends

**Purpose:**
* Understand unit economics
* Identify optimization opportunities
* Project costs at scale
* Inform architecture decisions (caching, model selection, etc.)

**Acceptance Criteria:**
* ✅ Usage data displayed after each analysis
* ✅ Metrics logged for aggregate analysis
* ✅ Cost calculated accurately (Claude API pricing)
* ✅ Test cases include varying article lengths
* ✅ POC1 report includes cost analysis section

**Success Target:**
* Average cost per analysis < $0.05 USD
* Cost scaling behavior understood (linear/exponential)
* 2+ optimization opportunities identified

**Critical:** Unit economics must be viable for scaling decision!
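The aggregate analysis could start from something as small as the sketch below, which averages logged costs, checks them against the success target, and groups them by article length. The record layout (`words`, `cost_usd`), the 1,000-word bucket size, and the sample values are all made up for illustration.

```python
from statistics import mean

TARGET_AVG_COST_USD = 0.05  # success target from NFR-POC-5


def summarize_costs(records, bucket_words=1000):
    """Average cost overall and per article-length bucket (lower bound in words)."""
    avg = mean(r["cost_usd"] for r in records)
    buckets = {}
    for r in records:
        key = (r["words"] // bucket_words) * bucket_words
        buckets.setdefault(key, []).append(r["cost_usd"])
    by_length = {k: mean(v) for k, v in sorted(buckets.items())}
    return {
        "avg_cost_usd": avg,
        "meets_target": avg < TARGET_AVG_COST_USD,
        "avg_cost_by_length": by_length,
    }


# Hypothetical log entries, one per analysis.
records = [
    {"words": 600, "cost_usd": 0.02},
    {"words": 1400, "cost_usd": 0.04},
    {"words": 2450, "cost_usd": 0.06},
]
summary = summarize_costs(records)
print(summary["meets_target"], summary["avg_cost_by_length"])
```

Comparing the per-bucket averages across test articles is one simple way to see whether cost grows roughly linearly with length.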

== 10. Technical Architecture ==

=== 10.1 System Components ===

**Frontend:**
* Simple HTML form (text input + URL input + button)
* Loading indicator
* Results display page (single page, no tabs/navigation)

**Backend:**
* Single API endpoint
* Calls provider API (REASONING model; configured via LLM abstraction)
* Parses response
* Returns JSON to frontend

**Data Storage:**
* None required (stateless POC)
* Optional: Simple file storage or SQLite for demo examples

**External Services:**
* Claude API (Anthropic) - required
* Optional: URL fetch service for article text extraction

=== 10.2 Processing Flow ===

1. User submits text or URL
   ↓
2. Backend receives request
   ↓
3. If URL: Fetch article text
   ↓
4. Call Claude API with single prompt:
   "Extract claims, evaluate each, provide verdicts"
   ↓
5. Claude API returns:
   - Analysis summary
   - Claims list
   - Verdicts for each claim (with risk tiers)
   - Article summary (optional)
   - Quality gate results
   ↓
6. Backend parses response
   ↓
7. Frontend displays results with Mode 2 labeling

**Key Simplification:** Single API call does entire analysis
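The flow above can be sketched as one backend function. To keep the sketch runnable without network access or credentials, the URL fetcher and the model call are passed in as plain callables; `analyze_article`, `fetch_text`, `call_llm`, the prompt wording, and the `label` key are illustrative assumptions, not the POC's actual interface.

```python
import json
from typing import Callable, Optional


def analyze_article(
    text: Optional[str],
    url: Optional[str],
    fetch_text: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> dict:
    """Steps 2-7 of the flow, with fetching and the model call injected."""
    # Step 3: resolve a URL to article text when one was submitted.
    article = fetch_text(url) if url else (text or "")
    if not article.strip():
        raise ValueError("no article text to analyze")
    # Step 4: a single prompt carries the whole task.
    prompt = ("Extract claims, evaluate each, provide verdicts. "
              "Return as structured JSON.\n\nARTICLE:\n" + article)
    raw = call_llm(prompt)
    # Step 6: parse the model's JSON response.
    result = json.loads(raw)
    # Step 7: attach the Mode 2 label before handing off to the frontend.
    result["label"] = "AI-GENERATED - POC/DEMO"
    return result


# Stubs standing in for the URL fetch service and the provider API.
def fake_fetch(_url: str) -> str:
    return "Coffee improves heart health."


def fake_llm(_prompt: str) -> str:
    return json.dumps({"claims": ["Coffee improves heart health"]})


result = analyze_article(None, "https://example.org/article", fake_fetch, fake_llm)
print(result)
```

Injecting the two callables also makes the single-call design easy to exercise in tests before a real provider client is wired in.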

=== 10.3 AI Prompt Strategy ===

**Single Comprehensive Prompt:**

Task: Analyze this article and provide:

1. Identify the article's main thesis/conclusion
- What is the article trying to argue or prove?
- What is the primary claim or conclusion?

2. Extract 3-5 factual claims from the article
- Note which claims are CENTRAL to the main thesis
- Note which claims are SUPPORTING facts

3. For each claim:
- Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
- Assign confidence score (0-100%)
- Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
- Write brief reasoning (1-3 sentences)

4. Assess relationship between claims and main thesis:
- Do the claims actually support the article's conclusion?
- Are there logical leaps or unsupported inferences?
- Is the article's framing misleading even if individual facts are accurate?

5. Run quality gates:
- Check: ≥2 sources found
- Attempt: Basic contradiction search
- Calculate: Confidence scores
- Verify: Structural integrity

6. Write context-aware analysis summary (4-6 sentences):
- State article's main thesis
- Report claims found and verdict distribution
- Note if central claims are problematic
- Assess whether evidence supports conclusion
- Overall credibility considering claim importance

7. Write article summary (3-5 sentences: neutral summary of article content)

Return as structured JSON with quality gate results.

**One prompt generates everything.**

**Critical Addition:**

Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. These test whether AI can distinguish between "accurate facts poorly reasoned" vs. "genuinely credible article."
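
Because the prompt asks for structured JSON, the backend can check the response shape before displaying it. The sketch below is one possible validator. The field names (`claims`, `verdict`, `confidence`, `risk_tier`, `central`, `main_thesis`, `analysis_summary`) are assumptions made for illustration; the actual schema is defined in the API and schemas specification, not here.

```python
ALLOWED_VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
ALLOWED_TIERS = {"A", "B", "C"}


def validate_analysis(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    problems = []
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        problems.append(f"expected 3-5 claims, got {len(claims)}")
    for i, claim in enumerate(claims):
        if claim.get("verdict") not in ALLOWED_VERDICTS:
            problems.append(f"claim {i}: unknown verdict {claim.get('verdict')!r}")
        confidence = claim.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
            problems.append(f"claim {i}: confidence out of range")
        if claim.get("risk_tier") not in ALLOWED_TIERS:
            problems.append(f"claim {i}: unknown risk tier")
        # Step 2 of the prompt: each claim is marked central or supporting.
        if not isinstance(claim.get("central"), bool):
            problems.append(f"claim {i}: missing central/supporting flag")
    for key in ("main_thesis", "analysis_summary"):
        if not data.get(key):
            problems.append(f"missing {key}")
    return problems
```

A non-empty result would count as a failed analysis in the POC statistics rather than something to fix by hand.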

=== 10.4 Technology Stack Suggestions ===

**Frontend:**

* HTML + CSS + JavaScript (minimal framework)
* OR: Next.js (if team prefers)
* Hosted: Local machine OR Vercel/Netlify free tier

**Backend:**

* Python Flask/FastAPI (simple REST API)
* OR: Next.js API routes (if using Next.js)
* Hosted: Local machine OR Railway/Render free tier

**AKEL Integration:**

* Claude API via Anthropic SDK
* Model: Provider-default REASONING model or latest available

**Database:**

* None (stateless acceptable)
* OR: SQLite, if demo examples need to be stored
* OR: JSON files on disk

**Deployment:**

* Local development environment sufficient for POC
* Optional: Deploy to cloud for remote demos

== 11. Success Criteria ==

=== 11.1 Minimum Success (POC Passes) ===

**Required for GO decision:**

* ✅ AI extracts 3-5 factual claims automatically
* ✅ AI provides verdict for each claim automatically
* ✅ Verdicts are reasonable (≥70% make logical sense)
* ✅ Analysis summary is coherent
* ✅ Output is comprehensible to reviewers
* ✅ Team/advisors understand the output
* ✅ Team agrees approach has merit
* ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
* ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
* ✅ **Cost scaling understood** (data collected on article length vs. cost)
* ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
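
The cost criteria above reduce to simple arithmetic over token counts. The sketch below shows one way to compute a per-request cost and compare the average against the $0.05 target. The function names are hypothetical, and per-million-token prices are passed in as parameters because actual provider pricing is not stated in this document.

```python
from statistics import mean

TARGET_USD = 0.05  # average cost per analysis target from the criteria above


def request_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_in: float,
    usd_per_million_out: float,
) -> float:
    """Cost of one analysis in USD, given per-million-token prices."""
    return (
        input_tokens * usd_per_million_in + output_tokens * usd_per_million_out
    ) / 1_000_000


def meets_cost_target(costs: list[float], target: float = TARGET_USD) -> bool:
    """True when the average cost per analysis is below the target."""
    return mean(costs) < target
```

Recording `input_tokens` next to each cost also yields the article length vs. cost data that the cost-scaling criterion asks for.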

**Quality Definition:**

* "Reasonable verdict" = Defensible given general knowledge
* "Coherent summary" = Logically structured, grammatically correct
* "Comprehensible" = Reviewers understand what analysis means

=== 11.2 POC Fails If ===

**Automatic NO-GO if any of these:**

* ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
* ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
* ❌ Output incomprehensible (reviewers can't understand analysis)
* ❌ **Requires manual editing for most analyses** (> 50% need human correction)
* ❌ Team loses confidence in AI-automated approach

=== 11.3 Quality Thresholds ===

**POC quality expectations:**

|=Component|=Quality Threshold|=Definition
|Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
|Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
|Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
|Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims

**Analogy:** "B student" quality (70-80%), not "A+" perfection yet

**Not expecting:**

* 100% accuracy
* Perfect claim coverage
* Comprehensive evidence gathering
* Flawless verdicts
* Production polish

**Expecting:**

* Reasonable claim extraction
* Defensible verdicts
* Understandable reasoning
* Useful output

== 12. Test Cases ==

=== 12.1 Test Case 1: Simple Factual Claim ===

**Input:** "Coffee reduces the risk of type 2 diabetes by 30%"

**Expected Output:**

* Extract claim correctly
* Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
* Confidence: 70-90%
* Risk tier: C (Low)
* Reasoning: Mentions studies or evidence

**Success:** Verdict is reasonable and reasoning makes sense

=== 12.2 Test Case 2: Complex News Article ===

**Input:** News article URL with multiple claims about politics/health/science

**Expected Output:**

* Extract 3-5 key claims
* Verdict for each (may vary: some supported, some uncertain, some refuted)
* Coherent analysis summary
* Article summary
* Risk tiers assigned appropriately

**Success:** Claims identified are actually from article, verdicts are reasonable

=== 12.3 Test Case 3: Controversial Topic ===

**Input:** Article on contested political or scientific topic

**Expected Output:**

* Balanced analysis
* Acknowledges uncertainty where appropriate
* Doesn't overstate confidence
* Reasoning shows awareness of complexity

**Success:** Analysis is fair and doesn't show obvious bias

=== 12.4 Test Case 4: Clearly False Claim ===

**Input:** Article with obviously false claim (e.g., "The Earth is flat")

**Expected Output:**

* Extract claim
* Verdict: REFUTED
* High confidence (> 90%)
* Risk tier: C (Low - established fact)
* Clear reasoning

**Success:** AI correctly identifies false claim with high confidence

=== 12.5 Test Case 5: Genuinely Uncertain Claim ===

**Input:** Article with claim where evidence is genuinely mixed

**Expected Output:**

* Extract claim
* Verdict: UNCERTAIN
* Moderate confidence (40-60%)
* Reasoning explains why uncertain

**Success:** AI recognizes uncertainty and doesn't overstate confidence

=== 12.6 Test Case 6: High-Risk Medical Claim ===

**Input:** Article making medical claims

**Expected Output:**

* Extract claim
* Verdict: [appropriate based on evidence]
* Risk tier: A (High - medical)
* Red label displayed
* Clear disclaimer about not being medical advice

**Success:** Risk tier correctly assigned, appropriate warnings shown
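
The machine-checkable parts of these test cases (verdict, confidence range, risk tier) can be encoded as data and checked automatically, leaving only the qualitative judgments to reviewers. The sketch below assumes a claim dictionary with hypothetical keys `verdict`, `confidence`, and `risk_tier`; the case names are also made up for illustration.

```python
# Expectations taken from Test Cases 1, 4, 5 and 6; None means "not constrained".
EXPECTATIONS = {
    "simple_factual": {
        "verdicts": {"WELL-SUPPORTED", "PARTIALLY SUPPORTED"},
        "confidence_ok": lambda c: 70 <= c <= 90,
        "risk_tier": "C",
    },
    "clearly_false": {
        "verdicts": {"REFUTED"},
        "confidence_ok": lambda c: c > 90,
        "risk_tier": "C",
    },
    "uncertain": {
        "verdicts": {"UNCERTAIN"},
        "confidence_ok": lambda c: 40 <= c <= 60,
        "risk_tier": None,
    },
    "medical": {"verdicts": None, "confidence_ok": None, "risk_tier": "A"},
}


def check_claim(case: str, claim: dict) -> list[str]:
    """Return the list of unmet expectations for one extracted claim."""
    expected = EXPECTATIONS[case]
    failures = []
    if expected["verdicts"] is not None and claim["verdict"] not in expected["verdicts"]:
        failures.append(f"unexpected verdict {claim['verdict']}")
    if expected["confidence_ok"] is not None and not expected["confidence_ok"](claim["confidence"]):
        failures.append(f"confidence {claim['confidence']} outside expected range")
    if expected["risk_tier"] is not None and claim["risk_tier"] != expected["risk_tier"]:
        failures.append(f"expected risk tier {expected['risk_tier']}")
    return failures
```

Test Cases 2 and 3 are omitted because their success conditions (claims actually drawn from the article, balanced analysis) need human review.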

== 13. POC Decision Gate ==

=== 13.1 Decision Framework ===

After POC testing is complete, the team makes one of three decisions:

**Option A: GO (Proceed to POC2)**

**Conditions:**

* AI quality ≥70% without manual editing
* Basic claim → verdict pipeline validated
* Internal + advisor feedback positive
* Technical feasibility confirmed
* Team confident in direction
* Clear path to improving AI quality to ≥90%

**Next Steps:**

* Plan POC2 development (add scenarios)
* Design scenario architecture
* Expand to Evidence Model structure
* Test with more complex articles

**Option B: NO-GO (Pivot or Stop)**

**Conditions:**

* AI quality < 60%
* Requires manual editing for most analyses (> 50%)
* Feedback indicates fundamental flaws
* Cost/effort not justified by value
* No clear path to improvement

**Next Steps:**

* **Pivot:** Change to hybrid human-AI approach (accept manual review required)
* **Stop:** Conclude approach not viable, revisit later

**Option C: ITERATE (Improve POC)**

**Conditions:**

* Concept has merit but execution needs work
* Specific improvements identified
* Addressable with better prompts/approach
* AI quality between 60-70%

**Next Steps:**

* Improve AI prompts
* Test different approaches
* Re-run POC with improvements
* Then make GO/NO-GO decision

=== 13.2 Decision Criteria Summary ===

AI Quality < 60% → NO-GO (approach doesn't work)
AI Quality 60-70% → ITERATE (improve and retry)
AI Quality ≥70% → GO (proceed to POC2)
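
The three thresholds can be written as a small function, which also pins down the boundary values. This sketch assumes that exactly 70% counts as GO (because the GO condition is stated as ≥70%) and that exactly 60% counts as ITERATE (because NO-GO is stated as < 60%). The function name is hypothetical.

```python
def poc_decision(quality_pct: float) -> str:
    """Map measured AI quality (in percent) to the decision gate outcome."""
    if quality_pct < 60:
        return "NO-GO"
    if quality_pct < 70:
        return "ITERATE"
    return "GO"
```

The quality number is only one input: the conditions in 13.1 (manual editing rate, feedback, cost) still apply alongside it.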

== 14. Key Risks & Mitigations ==

=== 14.1 Risk: AI Quality Not Good Enough ===

**Likelihood:** Medium-High
**Impact:** POC fails

**Mitigation:**

* Extensive prompt engineering and testing
* Use best available AI models (role-based selection; configured via LLM abstraction)
* Test with diverse article types
* Iterate on prompts based on results

**Acceptance:** This is what POC tests - be ready for failure

=== 14.2 Risk: AI Consistency Issues ===

**Likelihood:** Medium
**Impact:** Works sometimes, fails other times

**Mitigation:**

* Test with 10+ diverse articles
* Measure success rate honestly
* Improve prompts to increase consistency

**Acceptance:** Some variability OK if average quality ≥70%

=== 14.3 Risk: Output Incomprehensible ===

**Likelihood:** Low-Medium
**Impact:** Users can't understand analysis

**Mitigation:**

* Create clear explainer document
* Iterate on output format
* Test with non-technical reviewers
* Simplify language if needed

**Acceptance:** Iterate until comprehensible

=== 14.4 Risk: API Rate Limits / Costs ===

**Likelihood:** Low
**Impact:** System slow or expensive

**Mitigation:**

* Monitor API usage
* Implement retry logic
* Estimate costs before scaling

**Acceptance:** POC can be slow and expensive (optimization later)
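
The retry mitigation could look like the following sketch, which uses exponential backoff. The helper name and its defaults are assumptions for illustration. Catching every `Exception` is a simplification: a real implementation would retry only on rate-limit and transient errors raised by the provider client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call fn, retrying failures with delays of base_delay * 1, 2, 4, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Each retry is a billable request, so retried calls belong in the cost data as well.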

=== 14.5 Risk: Scope Creep ===

**Likelihood:** Medium
**Impact:** POC becomes too complex

**Mitigation:**

* Strict scope discipline
* Say NO to feature additions
* Keep focus on core question

**Acceptance:** POC is minimal by design

== 15. POC Philosophy ==

=== 15.1 Core Principles ===

**1. Build Less, Learn More**

* Minimum features to test hypothesis
* Don't build unvalidated features
* Focus on core question only

**2. Fail Fast**

* Quick test of hardest part (AI capability)
* Accept that POC might fail
* Better to discover issues early
* Honest assessment over optimistic hope

**3. Test First, Build Second**

* Validate AI can do this before building platform
* Don't assume it will work
* Let results guide decisions

**4. Automation First**

* No manual editing allowed
* Tests scalability, not just feasibility
* Proves approach can work at scale

**5. Honest Assessment**

* Don't cherry-pick examples
* Don't manually fix bad outputs
* Document failures openly
* Make data-driven decisions

=== 15.2 What POC Is ===

✅ Testing AI capability without humans
✅ Proving core technical concept
✅ Fast validation of approach
✅ Honest assessment of feasibility

=== 15.3 What POC Is NOT ===

❌ Building a product
❌ Production-ready system
❌ Feature-complete platform
❌ Perfectly accurate analysis
❌ Polished user experience

== 16. Success = Clear Path Forward ==

**If POC succeeds (≥70% AI quality):**

* ✅ Approach validated
* ✅ Proceed to POC2 (add scenarios)
* ✅ Design full Evidence Model structure
* ✅ Test multi-scenario comparison
* ✅ Focus on improving AI quality from 70% → 90%

**If POC fails (< 60% AI quality):**

* ✅ Learn what doesn't work
* ✅ Pivot to different approach
* ✅ OR wait for better AI technology
* ✅ Avoid wasting resources on non-viable approach

**Either way, POC provides clarity.**

== 17. Related Pages ==

* [[User Needs>>FactHarbor.Specification.Requirements.User Needs.WebHome]]
* [[Requirements>>FactHarbor.Specification.Requirements.WebHome]]
* [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
* [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
* [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
* [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]

**Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)

=== NFR-POC-11: LLM Provider Abstraction (POC1) ===

**Requirement:** POC1 MUST implement an LLM abstraction layer with support for multiple providers.

**POC1 Implementation:**

* **Primary Provider:** Anthropic Claude API
** Stage 1: Provider-default FAST model
** Stage 2: Provider-default REASONING model (cached)
** Stage 3: Provider-default REASONING model
* **Provider Interface:** Abstract LLMProvider interface implemented
* **Configuration:** Environment variables for provider selection
** {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
** {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
** {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
* **Failover:** Basic error handling with cache fallback for Stage 2
* **Cost Tracking:** Log provider name and cost per request
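
A minimal sketch of the abstraction described above, assuming a single `complete` method on the interface. Only the `LLMProvider` name and the `LLM_PRIMARY_PROVIDER` variable come from this page. The method signature, the registry, and `StubProvider` are hypothetical, and the stub stands in for a real adapter so that the example makes no network calls.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping


class LLMProvider(ABC):
    """Interface that every provider adapter implements."""

    name: str

    @abstractmethod
    def complete(self, prompt: str, model: str) -> str:
        """Send one prompt to the given model and return the raw text response."""


class StubProvider(LLMProvider):
    """Placeholder adapter; a real one would wrap the vendor SDK."""

    name = "stub"

    def complete(self, prompt: str, model: str) -> str:
        return f"[{model}] {prompt}"


# Hypothetical registry: an "anthropic" adapter would be registered the same way.
PROVIDERS = {"stub": StubProvider}


def get_provider(env: Mapping[str, str] = os.environ) -> LLMProvider:
    """Select the provider from LLM_PRIMARY_PROVIDER without code changes."""
    key = env.get("LLM_PRIMARY_PROVIDER", "stub")
    if key not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {key!r}")
    return PROVIDERS[key]()
```

Callers depend only on `LLMProvider`, so adding a second provider later means registering one more adapter class.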

**Future (POC2/Beta):**

* Secondary provider (OpenAI) with automatic failover
* Admin API for runtime provider switching
* Cost comparison dashboard
* Cross-provider output verification

**Success Criteria:**

* All LLM calls go through abstraction layer (no direct API calls)
* Provider can be changed via environment variable without code changes
* Cost tracking includes provider name in logs
* Stage 2 falls back to cache on provider failure
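
The Stage 2 fallback criterion might be met with a wrapper like the one below. This is a sketch under assumptions: the cache is modelled as a plain dictionary, and the function name, cache key, and log format are hypothetical rather than taken from the API specification.

```python
import logging
from typing import Callable, MutableMapping

logger = logging.getLogger("llm")


def stage2_with_cache_fallback(
    cache_key: str,
    call_provider: Callable[[], str],
    cache: MutableMapping[str, str],
    provider_name: str,
) -> str:
    """Call the provider; on failure, serve the cached result if one exists."""
    try:
        result = call_provider()
    except Exception as exc:
        if cache_key in cache:
            logger.warning(
                "provider=%s failed (%s); serving cached result", provider_name, exc
            )
            return cache[cache_key]
        # Nothing cached: the failure is reported, not hidden.
        raise
    cache[cache_key] = result
    return result
```

Including the provider name in the log line also covers the criterion that logs identify the provider.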

**Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6

**Dependencies:**

* NFR-14 (Main Requirements)
* Design Decision 9
* Architecture Section 2.2

**Priority:** HIGH (P1)

**Rationale:** Even though POC1 uses a single provider, the abstraction must be in place from the start to avoid costly refactoring later.
