Wiki source code of POC Requirements (POC1 & POC2)

Version 1.1 by Robert Schaub on 2025/12/24 11:54

1 = POC Requirements =
2
3 **Status:** ✅ Approved for Development
4 **Version:** 2.0 (Updated after Specification Cross-Check)
5 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
6
7 == 1. POC Overview ==
8
9 === 1.1 What POC Tests ===
10
11 **Core Question:**
12 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
13
14 **What we're proving:**
15 * AI can identify factual claims from text
16 * AI can evaluate those claims and produce verdicts
17 * Output is comprehensible and useful
18 * Fully automated approach is viable
19
20 **What we're NOT testing:**
21 * Scenario generation (deferred to POC2)
22 * Evidence display (deferred to POC2)
23 * Production scalability
24 * Perfect accuracy
25 * Complete feature set
26
27 === 1.2 Scenarios Deferred to POC2 ===
28
29 **Intentional Simplification:**
30
31 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
32
33 **Rationale:**
34 * **POC1 tests:** Can AI extract claims and generate verdicts?
35 * **POC2 will add:** Scenario generation and management
36 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
37
38 **Design Decision:**
39
40 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
41
42 **No Risk:**
43
44 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
45 * Faster POC1 validation
46 * Learning from POC1 to inform scenario design
47 * Iterative approach: fail fast if basic AI doesn't work
48 * Flexibility to adjust scenario architecture based on POC1 insights
49
50 **Full System Workflow (Future):**
51 {{code}}
52 Claims → Scenarios → Evidence → Verdicts
53 {{/code}}
54
55 **POC1 Simplified Workflow:**
56 {{code}}
57 Claims → Verdicts (scenarios implicit in reasoning)
58 {{/code}}
59
60 == 2. POC Output Specification ==
61
62 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
63
64 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
65
66 **Length:** 4-6 sentences
67
68 **Content (Required Elements):**
69 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
70 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
71 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
72 4. **Relationship assessment** - Do the claims support the article's conclusion?
73 5. **Overall credibility** - Final assessment considering claim importance
74
75 **Critical Innovation:**
76
77 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
78 * Present accurate supporting facts but draw unsupported conclusions
79 * Have one false central claim that invalidates the whole argument
80 * Misframe accurate information to mislead
81
82 **Good Example (Context-Aware):**
83 {{code}}
84 This article argues that coffee cures cancer based on its antioxidant
85 content. We analyzed 3 factual claims: 2 about coffee's chemical
86 properties are well-supported, but the main causal claim is refuted
87 by current evidence. The article confuses correlation with causation.
88 Overall assessment: MISLEADING - makes an unsupported medical claim
89 despite citing some accurate facts.
90 {{/code}}
91
92 **Poor Example (Simple Aggregation - Don't Do This):**
93 {{code}}
94 This article makes 3 claims. 2 are well-supported and 1 is refuted.
95 Overall assessment: mostly accurate (67% accurate).
96 {{/code}}
97 ↑ This misses that the refuted claim IS the article's main point!
98
99 **What POC1 Tests:**
100
101 Can AI identify and assess:
102 * ✅ The article's main thesis/conclusion?
103 * ✅ Which claims are central vs. supporting?
104 * ✅ Whether the evidence supports the conclusion?
105 * ✅ Overall credibility considering logical structure?
106
107 **If AI Cannot Do This:**
108
109 That's valuable to learn in POC1! We'll:
110 * Note as limitation
111 * Fall back to simple aggregation with warning
112 * Design explicit article-level analysis for POC2
113
114 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
115
116 **What:** List of factual claims extracted from article
117 **Format:** Numbered list
118 **Quantity:** 3-5 claims
119 **Requirements:**
120 * Factual claims only (not opinions/questions)
121 * Clearly stated
122 * Automatically extracted by AI
123
124 **Example:**
125 {{code}}
126 CLAIMS IDENTIFIED:
127
128 [1] Coffee reduces diabetes risk by 30%
129 [2] Coffee improves heart health
130 [3] Decaf has same benefits as regular
131 [4] Coffee prevents Alzheimer's completely
132 {{/code}}
133
134 === 2.3 Component 3: CLAIMS VERDICTS ===
135
136 **What:** Verdict for each claim identified
137 **Format:** Per claim structure
138
139 **Required Elements:**
140 * **Verdict Label:** WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
141 * **Confidence Score:** 0-100%
142 * **Brief Reasoning:** 1-3 sentences explaining why
143 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
144
145 **Example:**
146 {{code}}
147 VERDICTS:
148
149 [1] WELL-SUPPORTED (85%) [Risk: C]
150 Multiple studies confirm 25-30% risk reduction with regular consumption.
151
152 [2] UNCERTAIN (65%) [Risk: B]
153 Evidence is mixed. Some studies show benefits, others show no effect.
154
155 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
156 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
157
158 [4] REFUTED (90%) [Risk: B]
159 No evidence for complete prevention. Claim is significantly overstated.
160 {{/code}}
161
162 **Risk Tier Display:**
163 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
164 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
165 * **Tier C (Green):** Low Risk - Facts/Definitions/History
166
167 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
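The per-claim structure above can be enforced with a small data type so malformed model output is caught before display. A minimal Python sketch (class and field names are illustrative, not part of the spec):

```python
from dataclasses import dataclass

VERDICTS = {"WELL-SUPPORTED", "PARTIALLY SUPPORTED", "UNCERTAIN", "REFUTED"}
RISK_LABELS = {"A": "High Risk", "B": "Medium Risk", "C": "Low Risk"}

@dataclass
class ClaimVerdict:
    claim: str
    verdict: str      # one of VERDICTS
    confidence: int   # 0-100
    risk_tier: str    # "A" / "B" / "C"
    reasoning: str    # 1-3 sentences

    def __post_init__(self):
        # Validate at construction time, before anything is rendered.
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be 0-100")
        if self.risk_tier not in RISK_LABELS:
            raise ValueError(f"unknown risk tier: {self.risk_tier}")

    def render(self) -> str:
        # Mirrors the display format above; the caller adds the "[1]" numbering.
        return f"{self.verdict} ({self.confidence}%) [Risk: {self.risk_tier}]"
```

Validating at construction time means a bad verdict label or out-of-range confidence fails loudly instead of reaching the user.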
168
169 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
170
171 **What:** Brief summary of original article content
172 **Length:** 3-5 sentences
173 **Tone:** Neutral (article's position, not FactHarbor's analysis)
174
175 **Example:**
176 {{code}}
177 ARTICLE SUMMARY:
178
179 Health News Today article discusses coffee benefits, citing studies
180 on diabetes and Alzheimer's. Author highlights research linking coffee
181 to disease prevention. Recommends 2-3 cups daily for optimal health.
182 {{/code}}
183
184 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
185
186 **What:** LLM usage metrics for cost optimization and scaling decisions
187
188 **Purpose:**
189 * Understand cost per analysis
190 * Identify optimization opportunities
191 * Project costs at scale
192 * Inform architecture decisions
193
194 **Display Format:**
195 {{code}}
196 USAGE STATISTICS:
197 • Article: 2,450 words (12,300 characters)
198 • Input tokens: 15,234
199 • Output tokens: 892
200 • Total tokens: 16,126
201 • Estimated cost: $0.24 USD
202 • Response time: 8.3 seconds
203 • Cost per claim: $0.048
204 • Model: claude-sonnet-4-20250514
205 {{/code}}
206
207 **Why This Matters:**
208
209 At scale, LLM costs are a first-order concern (figures assume the target cost of roughly $0.02-0.05 per analysis):
210 * 10,000 articles/month ≈ $200-500/month
211 * 100,000 articles/month ≈ $2,000-5,000/month
212 * Cost optimization can reduce expenses by 30-50%
213
214 **What POC1 Learns:**
215 * How cost scales with article length
216 * Prompt optimization opportunities (caching, compression)
217 * Output verbosity tradeoffs
218 * Model selection strategy (Sonnet vs. Haiku)
219 * Article length limits (if needed)
220
221 **Implementation:**
222 * Claude API already returns usage data
223 * No extra API calls needed
224 * Display to user + log for aggregate analysis
225 * Test with articles of varying lengths
226
227 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
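The cost figure can be derived directly from the token counts the API already reports. A hedged sketch (the per-million-token prices below are placeholders, not official Claude pricing; substitute current rates):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float = 3.00,
                      output_price_per_mtok: float = 15.00) -> float:
    """Estimate analysis cost from token counts. The default prices
    (USD per million tokens) are placeholders for illustration only."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

def cost_per_claim(total_cost: float, n_claims: int) -> float:
    """Per-claim unit cost, as shown in the usage statistics display."""
    return round(total_cost / n_claims, 3) if n_claims else 0.0
```

The example display's "$0.048 per claim" follows from $0.24 across five claims; the total cost itself depends on whichever pricing is current.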
228
229 === 2.6 Total Output Size ===
230
231 **Combined:** ~220-340 words
232 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
233 * Claims Identification: 30-50 words
234 * Claims Verdicts: 100-150 words
235 * Article Summary: 30-50 words (optional)
236
237 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
238
239 == 3. What's NOT in POC Scope ==
240
241 === 3.1 Feature Exclusions ===
242
243 The following are **explicitly excluded** from POC:
244
245 **Content Features:**
246 * ❌ Scenarios (deferred to POC2)
247 * ❌ Evidence display (supporting/opposing lists)
248 * ❌ Source links (clickable references)
249 * ❌ Detailed reasoning chains
250 * ❌ Source quality ratings (shown but not detailed)
251 * ❌ Contradiction detection (basic only)
252 * ❌ Risk assessment (shown but not workflow-integrated)
253
254 **Platform Features:**
255 * ❌ User accounts / authentication
256 * ❌ Saved history
257 * ❌ Search functionality
258 * ❌ Claim comparison
259 * ❌ User contributions
260 * ❌ Commenting system
261 * ❌ Social sharing
262
263 **Technical Features:**
264 * ❌ Browser extensions
265 * ❌ Mobile apps
266 * ❌ API endpoints
267 * ❌ Webhooks
268 * ❌ Export features (PDF, CSV)
269
270 **Quality Features:**
271 * ❌ Accessibility (WCAG compliance)
272 * ❌ Multilingual support
273 * ❌ Mobile optimization
274 * ❌ Media verification (images/videos)
275
276 **Production Features:**
277 * ❌ Security hardening
278 * ❌ Privacy compliance (GDPR)
279 * ❌ Terms of service
280 * ❌ Monitoring/logging
281 * ❌ Error tracking
282 * ❌ Analytics
283 * ❌ A/B testing
284
285 == 4. POC Simplifications vs. Full System ==
286
287 === 4.1 Architecture Comparison ===
288
289 **POC Architecture (Simplified):**
290 {{code}}
291 User Input → Single AKEL Call → Output Display
292 (all processing)
293 {{/code}}
294
295 **Full System Architecture:**
296 {{code}}
297 User Input → Claim Extractor → Claim Classifier → Scenario Generator
298 → Evidence Summarizer → Contradiction Detector → Verdict Generator
299 → Quality Gates → Publication → Output Display
300 {{/code}}
301
302 **Key Differences:**
303
304 |=Aspect|=POC1|=Full System
305 |Processing|Single API call|Multi-component pipeline
306 |Scenarios|None (implicit)|Explicit entities with versioning
307 |Evidence|Basic retrieval|Comprehensive with quality scoring
308 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
309 |Workflow|3 steps (input/process/output)|6 phases with gates
310 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
311 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
312
313 === 4.2 Workflow Comparison ===
314
315 **POC1 Workflow:**
316 1. User submits text/URL
317 2. Single AKEL call (all processing in one prompt)
318 3. Display results
319 **Total: 3 steps, ~10-18 seconds**
320
321 **Full System Workflow:**
322 1. **Claim Submission** (extraction, normalization, clustering)
323 2. **Scenario Building** (definitions, assumptions, boundaries)
324 3. **Evidence Handling** (retrieval, assessment, linking)
325 4. **Verdict Creation** (synthesis, reasoning, approval)
326 5. **Public Presentation** (summaries, landscapes, deep dives)
327 6. **Time Evolution** (versioning, re-evaluation triggers)
328 **Total: 6 phases with quality gates, ~10-30 seconds**
329
330 === 4.3 Why POC is Simplified ===
331
332 **Engineering Rationale:**
333
334 1. **Test core capability first:** Can AI do basic fact-checking without humans?
335 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
336 3. **Learn before building:** POC1 insights inform full architecture
337 4. **Iterative approach:** Add complexity only after validating foundations
338 5. **Resource efficiency:** Don't build full system if core concept fails
339
340 **Acceptable Trade-offs:**
341
342 * ✅ POC proves AI capability (most risky assumption)
343 * ✅ POC validates user comprehension (can people understand output?)
344 * ❌ POC doesn't validate full workflow (test in Beta)
345 * ❌ POC doesn't validate scale (test in Beta)
346 * ❌ POC doesn't validate scenario architecture (design in POC2)
347
348 === 4.4 Gap Between POC1 and POC2/Beta ===
349
350 **What needs to be built for POC2:**
351 * Scenario generation component
352 * Evidence Model structure (full)
353 * Scenario-evidence linking
354 * Multi-interpretation comparison
355 * Truth landscape visualization
356
357 **What needs to be built for Beta:**
358 * Multi-component AKEL pipeline
359 * Quality gate infrastructure
360 * Review workflow system
361 * Audit sampling framework
362 * Production data model
363 * Federation architecture (Release 1.0)
364
365 **POC1 → POC2 is significant architectural expansion.**
366
367 == 5. Publication Mode & Labeling ==
368
369 === 5.1 POC Publication Mode ===
370
371 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
372
373 Per FactHarbor Specification Section 11 "POC v1 Behavior":
374 * Produces public AI-generated output
375 * No human approval gate
376 * Clear AI-Generated labeling
377 * All quality gates active (simplified)
378 * Risk tier classification shown (demo)
379
380 === 5.2 User-Facing Labels ===
381
382 **Primary Label (top of analysis):**
383 {{code}}
384 ╔════════════════════════════════════════════════════════════╗
385 ║ [AI-GENERATED - POC/DEMO] ║
386 ║ ║
387 ║ This analysis was produced entirely by AI and has not ║
388 ║ been human-reviewed. Use for demonstration purposes. ║
389 ║ ║
390 ║ Source: AI/AKEL v1.0 (POC) ║
391 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
392 ║ Quality Gates: 4/4 Passed (Simplified) ║
393 ║ Last Updated: [timestamp] ║
394 ╚════════════════════════════════════════════════════════════╝
395 {{/code}}
396
397 **Per-Claim Risk Labels:**
398 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
399 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
400 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
401
402 === 5.3 Display Requirements ===
403
404 **Must Show:**
405 * AI-Generated status (prominent)
406 * POC/Demo disclaimer
407 * Risk tier per claim
408 * Confidence scores (0-100%)
409 * Quality gate status (passed/failed)
410 * Timestamp
411
412 **Must NOT Claim:**
413 * Human review
414 * Production quality
415 * Medical/legal advice
416 * Authoritative verdicts
417 * Complete accuracy
418
419 === 5.4 Mode 2 vs. Full System Publication ===
420
421 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
422 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
423 |Review|None|None|Human-Reviewed
424 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
425 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
426 |Risk Display|Demo only|Workflow-integrated|Validated
427 |User Actions|View only|Flag for review|Trust rating
428
429 == 6. Quality Gates (Simplified Implementation) ==
430
431 === 6.1 Overview ===
432
433 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
434
435 **Full System Has 4 Gates:**
436 1. Source Quality
437 2. Contradiction Search (MANDATORY)
438 3. Uncertainty Quantification
439 4. Structural Integrity
440
441 **POC Implements Simplified Versions:**
442 * Focus on demonstrating concept
443 * Basic implementations sufficient
444 * Failures displayed to user (not blocking)
445 * Full system has comprehensive validation
446
447 === 6.2 Gate 1: Source Quality (Basic) ===
448
449 **Full System Requirements:**
450 * Primary sources identified and accessible
451 * Source reliability scored against whitelist
452 * Citation completeness verified
453 * Publication dates checked
454 * Author credentials validated
455
456 **POC Implementation:**
457 * ✅ At least 2 sources found
458 * ✅ Sources accessible (URLs valid)
459 * ❌ No whitelist checking
460 * ❌ No credential validation
461 * ❌ No comprehensive reliability scoring
462
463 **Pass Criteria:** ≥2 accessible sources found
464
465 **Failure Handling:** Display error message, don't generate verdict
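The pass criterion above is simple enough to sketch directly. A minimal version in Python, with URL accessibility injected so tests and demos can stub it (a real check would issue HTTP requests):

```python
def source_quality_gate(source_urls, is_accessible=None):
    """Gate 1 (basic): pass if at least 2 sources are accessible.
    `is_accessible` is injected; the default is a cheap URL-shape check,
    standing in for a real HTTP reachability probe."""
    if is_accessible is None:
        is_accessible = lambda url: url.startswith(("http://", "https://"))
    accessible = [u for u in source_urls if is_accessible(u)]
    passed = len(accessible) >= 2
    return passed, f"{len(accessible)} sources found"
```

On a False result the POC displays an error and does not generate a verdict, per the failure handling above.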
466
467 === 6.3 Gate 2: Contradiction Search (Basic) ===
468
469 **Full System Requirements:**
470 * Counter-evidence actively searched
471 * Reservations and limitations identified
472 * Alternative interpretations explored
473 * Bubble detection (echo chambers, conspiracy theories)
474 * Cross-cultural and international perspectives
475 * Academic literature (supporting AND opposing)
476
477 **POC Implementation:**
478 * ✅ Basic search for counter-evidence
479 * ✅ Identify obvious contradictions
480 * ❌ No comprehensive academic search
481 * ❌ No bubble detection
482 * ❌ No systematic alternative interpretation search
483 * ❌ No international perspective verification
484
485 **Pass Criteria:** Basic contradiction search attempted
486
487 **Failure Handling:** Note "limited contradiction search" in output
488
489 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
490
491 **Full System Requirements:**
492 * Confidence scores calculated for all claims/verdicts
493 * Limitations explicitly stated
494 * Data gaps identified and disclosed
495 * Strength of evidence assessed
496 * Alternative scenarios considered
497
498 **POC Implementation:**
499 * ✅ Confidence scores (0-100%)
500 * ✅ Basic uncertainty acknowledgment
501 * ❌ No detailed limitation disclosure
502 * ❌ No data gap identification
503 * ❌ No alternative scenario consideration (deferred to POC2)
504
505 **Pass Criteria:** Confidence score assigned
506
507 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
508
509 === 6.5 Gate 4: Structural Integrity (Basic) ===
510
511 **Full System Requirements:**
512 * No hallucinations detected (fact-checking against sources)
513 * Logic chain valid and traceable
514 * References accessible and verifiable
515 * No circular reasoning
516 * Premises clearly stated
517
518 **POC Implementation:**
519 * ✅ Basic coherence check
520 * ✅ References accessible
521 * ❌ No comprehensive hallucination detection
522 * ❌ No formal logic validation
523 * ❌ No premise extraction and verification
524
525 **Pass Criteria:** Output is coherent and references are accessible
526
527 **Failure Handling:** Display error message
528
529 === 6.6 Quality Gate Display ===
530
531 **POC shows simplified status:**
532 {{code}}
533 Quality Gates: 4/4 Passed (Simplified)
534 ✓ Source Quality: 3 sources found
535 ✓ Contradiction Search: Basic search completed
536 ✓ Uncertainty: Confidence scores assigned
537 ✓ Structural Integrity: Output coherent
538 {{/code}}
539
540 **If any gate fails:**
541 {{code}}
542 Quality Gates: 3/4 Passed (Simplified)
543 ✓ Source Quality: 3 sources found
544 ✗ Contradiction Search: Search failed - limited evidence
545 ✓ Uncertainty: Confidence scores assigned
546 ✓ Structural Integrity: Output coherent
547
548 Note: This analysis has limited evidence. Use with caution.
549 {{/code}}
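Both displays above can come from one renderer over the per-gate results. A minimal sketch (the tuple layout is an assumption, not a specified interface):

```python
def render_gate_status(results):
    """Render the quality-gate banner. `results` is a list of
    (gate_name, passed, detail) tuples, one per gate."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        lines.append(f"{mark} {name}: {detail}")
    if passed < len(results):
        # Failures don't block publication in POC mode; they add a caution.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)
```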
550
551 === 6.7 Simplified vs. Full System ===
552
553 |=Gate|=POC (Simplified)|=Full System
554 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
555 |Contradiction|Basic search|Systematic academic + media + international
556 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
557 |Structural|Coherence check|Hallucination detection, logic validation, premise check
558
559 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
560
561 == 7. AKEL Architecture Comparison ==
562
563 === 7.1 POC AKEL (Simplified) ===
564
565 **Implementation:**
566 * Single Claude API call (Sonnet 4.5)
567 * One comprehensive prompt
568 * All processing in single request
569 * No separate components
570 * No orchestration layer
571
572 **Prompt Structure:**
573 {{code}}
574 Task: Analyze this article and provide:
575
576 1. Extract 3-5 factual claims
577 2. For each claim:
578 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
579 - Assign confidence score (0-100%)
580 - Assign risk tier (A/B/C)
581 - Write brief reasoning (1-3 sentences)
582 3. Generate analysis summary (3-5 sentences)
583 4. Generate article summary (3-5 sentences)
584 5. Run basic quality checks
585
586 Return as structured JSON.
587 {{/code}}
588
589 **Processing Time:** 10-18 seconds (estimate)
590
591 === 7.2 Full System AKEL (Production) ===
592
593 **Architecture:**
594 {{code}}
595 AKEL Orchestrator
596 ├── Claim Extractor
597 ├── Claim Classifier (with risk tier assignment)
598 ├── Scenario Generator
599 ├── Evidence Summarizer
600 ├── Contradiction Detector
601 ├── Quality Gate Validator
602 ├── Audit Sampling Scheduler
603 └── Federation Sync Adapter (Release 1.0+)
604 {{/code}}
605
606 **Processing:**
607 * Parallel processing where possible
608 * Separate component calls
609 * Quality gates between phases
610 * Audit sampling selection
611 * Cross-node coordination (federated mode)
612
613 **Processing Time:** 10-30 seconds (full pipeline)
614
615 === 7.3 Why POC Uses Single Call ===
616
617 **Advantages:**
618 * ✅ Simpler to implement
619 * ✅ Faster POC development
620 * ✅ Easier to debug
621 * ✅ Proves AI capability
622 * ✅ Good enough for concept validation
623
624 **Limitations:**
625 * ❌ No component reusability
626 * ❌ No parallel processing
627 * ❌ All-or-nothing (can't partially succeed)
628 * ❌ Harder to improve individual components
629 * ❌ No audit sampling
630
631 **Acceptable Trade-off:**
632
633 POC tests "Can AI do this?" not "How should we architect it?"
634
635 Full component architecture comes in Beta after POC validates concept.
636
637 === 7.4 Evolution Path ===
638
639 **POC1:** Single prompt → Prove concept
640 **POC2:** Add scenario component → Test full pipeline
641 **Beta:** Multi-component AKEL → Production architecture
642 **Release 1.0:** Full AKEL + Federation → Scale
643
644 == 8. Functional Requirements ==
645
646 === FR-POC-1: Article Input ===
647
648 **Requirement:** User can submit article for analysis
649
650 **Functionality:**
651 * Text input field (paste article text, up to 5000 characters)
652 * URL input field (paste article URL)
653 * "Analyze" button to trigger processing
654 * Loading indicator during analysis
655
656 **Excluded:**
657 * No user authentication
658 * No claim history
659 * No search functionality
660 * No saved templates
661
662 **Acceptance Criteria:**
663 * User can paste text from article
664 * User can paste URL of article
665 * System accepts input and triggers analysis
666
667 === FR-POC-2: Claim Extraction (Fully Automated) ===
668
669 **Requirement:** AI automatically extracts 3-5 factual claims
670
671 **Functionality:**
672 * AI reads article text
673 * AI identifies factual claims (not opinions/questions)
674 * AI extracts 3-5 most important claims
675 * System displays numbered list
676
677 **Critical:** NO MANUAL EDITING ALLOWED
678 * AI selects which claims to extract
679 * AI identifies factual vs. non-factual
680 * System processes claims as extracted
681 * No human curation or correction
682
683 **Error Handling:**
684 * If extraction fails: Display error message
685 * User can retry with different input
686 * No manual intervention to fix extraction
687
688 **Acceptance Criteria:**
689 * AI extracts 3-5 claims automatically
690 * Claims are factual (not opinions)
691 * Claims are clearly stated
692 * No manual editing required
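These acceptance criteria can be checked mechanically before display, which keeps the no-manual-editing rule honest: a failed check triggers a retry, not a human fix. A minimal sketch (function name is illustrative):

```python
def validate_extracted_claims(claims):
    """FR-POC-2 acceptance check: 3-5 clearly stated claims.
    On failure the POC displays an error and lets the user retry;
    no human curation or correction is permitted."""
    if not 3 <= len(claims) <= 5:
        return False, f"expected 3-5 claims, got {len(claims)}"
    if any(not c.strip() for c in claims):
        return False, "empty claim text"
    return True, "ok"
```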
693
694 === FR-POC-3: Verdict Generation (Fully Automated) ===
695
696 **Requirement:** AI automatically generates verdict for each claim
697
698 **Functionality:**
699 * For each claim, AI:
700 * Evaluates claim based on available evidence/knowledge
701 * Determines verdict: WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED
702 * Assigns confidence score (0-100%)
703 * Assigns risk tier (A/B/C)
704 * Writes brief reasoning (1-3 sentences)
705 * System displays verdict for each claim
706
707 **Critical:** NO MANUAL EDITING ALLOWED
708 * AI computes verdicts based on evidence
709 * AI generates confidence scores
710 * AI writes reasoning
711 * No human review or adjustment
712
713 **Error Handling:**
714 * If verdict generation fails: Display error message
715 * User can retry
716 * No manual intervention to adjust verdicts
717
718 **Acceptance Criteria:**
719 * Each claim has a verdict
720 * Confidence score is displayed (0-100%)
721 * Risk tier is displayed (A/B/C)
722 * Reasoning is understandable (1-3 sentences)
723 * Verdict is defensible given reasoning
724 * All generated automatically by AI
725
726 === FR-POC-4: Analysis Summary (Fully Automated) ===
727
728 **Requirement:** AI generates brief summary of analysis
729
730 **Functionality:**
731 * AI summarizes findings in 4-6 sentences:
732 * How many claims found
733 * Distribution of verdicts
734 * Overall assessment
735 * System displays at top of results
736
737 **Critical:** NO MANUAL EDITING ALLOWED
738
739 **Acceptance Criteria:**
740 * Summary is coherent
741 * Accurately reflects analysis
742 * 4-6 sentences
743 * Automatically generated
744
745 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
746
747 **Requirement:** AI generates brief summary of original article
748
749 **Functionality:**
750 * AI summarizes article content (not FactHarbor's analysis)
751 * 3-5 sentences
752 * System displays
753
754 **Note:** Optional - can be skipped if time is limited
755
756 **Critical:** NO MANUAL EDITING ALLOWED
757
758 **Acceptance Criteria:**
759 * Summary is neutral (article's position)
760 * Accurately reflects article content
761 * 3-5 sentences
762 * Automatically generated
763
764 === FR-POC-6: Publication Mode Display ===
765
766 **Requirement:** Clear labeling of AI-generated content
767
768 **Functionality:**
769 * Display Mode 2 publication label
770 * Show POC/Demo disclaimer
771 * Display risk tiers per claim
772 * Show quality gate status
773 * Display timestamp
774
775 **Acceptance Criteria:**
776 * Label is prominent and clear
777 * User understands this is AI-generated POC output
778 * Risk tiers are color-coded
779 * Quality gate status is visible
780
781 === FR-POC-7: Quality Gate Execution ===
782
783 **Requirement:** Execute simplified quality gates
784
785 **Functionality:**
786 * Check source quality (basic)
787 * Attempt contradiction search (basic)
788 * Calculate confidence scores
789 * Verify structural integrity (basic)
790 * Display gate results
791
792 **Acceptance Criteria:**
793 * All 4 gates attempted
794 * Pass/fail status displayed
795 * Failures explained to user
796 * Gates don't block publication (POC mode)
797
798 == 9. Non-Functional Requirements ==
799
800 === NFR-POC-1: Fully Automated Processing ===
801
802 **Requirement:** Complete AI automation with zero manual intervention
803
804 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
805
806 **What this means:**
807 * Claims: AI selects (no human curation)
808 * Scenarios: N/A (deferred to POC2)
809 * Evidence: AI evaluates (no human selection)
810 * Verdicts: AI determines (no human adjustment)
811 * Summaries: AI writes (no human editing)
812
813 **Pipeline:**
814 {{code}}
815 User Input → AKEL Processing → Output Display
816
817 ZERO human editing
818 {{/code}}
819
820 **If AI output is poor:**
821 * ❌ Do NOT manually fix it
822 * ✅ Document the failure
823 * ✅ Improve prompts and retry
824 * ✅ Accept that POC might fail
825
826 **Why this matters:**
827 * Tests whether AI can do this without humans
828 * Validates scalability (humans can't review every analysis)
829 * Honest test of technical feasibility
830
831 === NFR-POC-2: Performance ===
832
833 **Requirement:** Analysis completes in reasonable time
834
835 **Acceptable Performance:**
836 * Processing time: 1-5 minutes (acceptable for POC)
837 * Display loading indicator to user
838 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
839
840 **Not Required:**
841 * Production-level speed (< 30 seconds)
842 * Optimization for scale
843 * Caching
844
845 **Acceptance Criteria:**
846 * Analysis completes within 5 minutes
847 * User sees loading indicator
848 * No timeout errors
849
850 === NFR-POC-3: Reliability ===
851
852 **Requirement:** System works for manual testing sessions
853
854 **Acceptable:**
855 * Occasional errors (< 20% failure rate)
856 * Manual restart if needed
857 * Display error messages clearly
858
859 **Not Required:**
860 * 99.9% uptime
861 * Automatic error recovery
862 * Production monitoring
863
864 **Acceptance Criteria:**
865 * System works for test demonstrations
866 * Errors are handled gracefully
867 * User receives clear error messages
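A tolerated failure rate under 20% pairs naturally with a simple retry wrapper rather than automatic recovery infrastructure. A minimal sketch:

```python
def with_retry(fn, attempts=3):
    """Call fn(); on failure retry up to `attempts` times total, then
    surface the last error as a clear message (NFR-POC-3)."""
    last = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:  # POC simplification: treat any failure as retryable
            last = e
    raise RuntimeError(f"Analysis failed after {attempts} attempts: {last}")
```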
868
869 === NFR-POC-4: Environment ===
870
871 **Requirement:** Runs on simple infrastructure
872
873 **Acceptable:**
874 * Single machine or simple cloud setup
875 * No distributed architecture
876 * No load balancing
877 * No redundancy
878 * Local development environment viable
879
880 **Not Required:**
881 * Production infrastructure
882 * Multi-region deployment
883 * Auto-scaling
884 * Disaster recovery
885
886 === NFR-POC-5: Cost Efficiency Tracking ===
887
888 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
889
890 **Must Track:**
891 * Input tokens (article + prompt)
892 * Output tokens (generated analysis)
893 * Total tokens
894 * Estimated cost (USD)
895 * Response time (seconds)
896 * Article length (words/characters)
897
898 **Must Display:**
899 * Usage statistics in UI (Component 5)
900 * Cost per analysis
901 * Cost per claim extracted
902
903 **Must Log:**
904 * Aggregate metrics for analysis
905 * Cost distribution by article length
906 * Token efficiency trends
907
908 **Purpose:**
909 * Understand unit economics
910 * Identify optimization opportunities
911 * Project costs at scale
912 * Inform architecture decisions (caching, model selection, etc.)
913
914 **Acceptance Criteria:**
915 * ✅ Usage data displayed after each analysis
916 * ✅ Metrics logged for aggregate analysis
917 * ✅ Cost calculated accurately (Claude API pricing)
918 * ✅ Test cases include varying article lengths
919 * ✅ POC1 report includes cost analysis section
920
921 **Success Target:**
922 * Average cost per analysis < $0.05 USD
923 * Cost scaling behavior understood (linear/exponential)
924 * 2+ optimization opportunities identified
925
926 **Critical:** Unit economics must be viable for scaling decision!
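The logged metrics can be aggregated to answer the "how does cost scale with article length" question above. A minimal sketch, assuming each run is logged as a (words, cost) pair:

```python
def cost_scaling(runs):
    """Summarize logged runs. `runs` is a list of (article_words, cost_usd).
    Returns average cost per analysis and average cost per 1,000 words;
    a roughly constant cost_per_kword suggests linear scaling."""
    if not runs:
        return {"avg_cost": 0.0, "cost_per_kword": 0.0}
    avg_cost = sum(c for _, c in runs) / len(runs)
    per_kword = [c / (w / 1000) for w, c in runs if w]
    return {"avg_cost": round(avg_cost, 4),
            "cost_per_kword": round(sum(per_kword) / len(per_kword), 4)}
```

Comparing cost_per_kword across short and long articles is one way to satisfy the "cost scaling behavior understood" success target.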
927
928 == 10. Technical Architecture ==
929
930 === 10.1 System Components ===
931
932 **Frontend:**
933 * Simple HTML form (text input + URL input + button)
934 * Loading indicator
935 * Results display page (single page, no tabs/navigation)
936
937 **Backend:**
938 * Single API endpoint
939 * Calls Claude API (Sonnet 4.5 or latest)
940 * Parses response
941 * Returns JSON to frontend
942
943 **Data Storage:**
944 * None required (stateless POC)
945 * Optional: Simple file storage or SQLite for demo examples
946
947 **External Services:**
948 * Claude API (Anthropic) - required
949 * Optional: URL fetch service for article text extraction
950
951 === 10.2 Processing Flow ===
952
953 {{code}}
954 1. User submits text or URL
955
956 2. Backend receives request
957
958 3. If URL: Fetch article text
959
960 4. Call Claude API with single prompt:
961 "Extract claims, evaluate each, provide verdicts"
962
963 5. Claude API returns:
964 - Analysis summary
965 - Claims list
966 - Verdicts for each claim (with risk tiers)
967 - Article summary (optional)
968 - Quality gate results
969
970 6. Backend parses response
971
972 7. Frontend displays results with Mode 2 labeling
973 {{/code}}
974
975 **Key Simplification:** Single API call does entire analysis
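The seven-step flow collapses to one function once the model call is injected. A minimal sketch with the Claude call stubbed out (the field names expected in the JSON are assumptions; a real deployment would wrap a single Claude API request):

```python
import json

def analyze(text, call_model):
    """POC pipeline: one model call does the entire analysis.
    `call_model(prompt) -> str` is injected; in production it would
    issue a single Claude API request returning structured JSON."""
    prompt = ("Extract claims, evaluate each, provide verdicts.\n\n"
              f"ARTICLE:\n{text}")
    raw = call_model(prompt)            # steps 4-5: single model call
    result = json.loads(raw)            # step 6: parse structured response
    for key in ("analysis_summary", "claims", "verdicts"):
        if key not in result:           # assumed top-level fields
            raise ValueError(f"model response missing '{key}'")
    return result                       # step 7: frontend renders with Mode 2 labels
```

Injecting the model call keeps the pipeline testable offline, which matters given the no-manual-editing rule: failures must be observed and retried, never patched.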
976
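Steps 4-6 of the flow can be sketched as one function, with the model call abstracted behind a callable so the pipeline is testable without network access. The prompt wording and JSON keys here are illustrative assumptions, not the final prompt or schema.

```python
import json

def analyze(article_text: str, call_model) -> dict:
    """Single-call pipeline: one prompt in, structured JSON out (flow steps 4-6).
    `call_model` is any callable that sends the prompt and returns the raw JSON
    string - e.g. a thin wrapper around the Anthropic SDK."""
    prompt = (
        "Extract 3-5 factual claims from the article below, evaluate each, "
        "and return JSON with keys: summary, claims (verdict, confidence, "
        "risk_tier, reasoning), quality_gates.\n\n" + article_text
    )
    raw = call_model(prompt)   # steps 4-5: one API call does the entire analysis
    return json.loads(raw)     # step 6: backend parses the response for the frontend

# Usage with a stub in place of the real API client:
stub = lambda p: '{"summary": "ok", "claims": [], "quality_gates": {"sources": false}}'
print(analyze("Coffee reduces diabetes risk.", stub)["summary"])
```

Keeping the model client injectable like this also makes it easy to swap models or add retry logic later without touching the parsing code.
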
977 === 10.3 AI Prompt Strategy ===
978
979 **Single Comprehensive Prompt:**
980 {{code}}
981 Task: Analyze this article and provide:
982
983 1. Identify the article's main thesis/conclusion
984 - What is the article trying to argue or prove?
985 - What is the primary claim or conclusion?
986
987 2. Extract 3-5 factual claims from the article
988 - Note which claims are CENTRAL to the main thesis
989 - Note which claims are SUPPORTING facts
990
991 3. For each claim:
992 - Determine verdict (WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED)
993 - Assign confidence score (0-100%)
994 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
995 - Write brief reasoning (1-3 sentences)
996
997 4. Assess relationship between claims and main thesis:
998 - Do the claims actually support the article's conclusion?
999 - Are there logical leaps or unsupported inferences?
1000 - Is the article's framing misleading even if individual facts are accurate?
1001
1002 5. Run quality gates:
1003 - Check: ≥2 sources found
1004 - Attempt: Basic contradiction search
1005 - Calculate: Confidence scores
1006 - Verify: Structural integrity
1007
1008 6. Write context-aware analysis summary (4-6 sentences):
1009 - State article's main thesis
1010 - Report claims found and verdict distribution
1011 - Note if central claims are problematic
1012 - Assess whether evidence supports conclusion
1013 - Overall credibility considering claim importance
1014
1015 7. Write article summary (3-5 sentences: neutral summary of article content)
1016
1017 Return as structured JSON with quality gate results.
1018 {{/code}}
1019
1020 **One prompt generates everything.**
1021
1022 **Critical Addition:**
1023
1024 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article whose facts are accurate but poorly reasoned from one that is genuinely credible.
1025
1026 === 10.4 Technology Stack Suggestions ===
1027
1028 **Frontend:**
1029 * HTML + CSS + JavaScript (minimal framework)
1030 * OR: Next.js (if team prefers)
1031 * Hosted: Local machine OR Vercel/Netlify free tier
1032
1033 **Backend:**
1034 * Python Flask/FastAPI (simple REST API)
1035 * OR: Next.js API routes (if using Next.js)
1036 * Hosted: Local machine OR Railway/Render free tier
1037
1038 **AKEL Integration:**
1039 * Claude API via Anthropic SDK
1040 * Model: Claude Sonnet 4.5 or latest available
1041
1042 **Database:**
1043 * None (stateless acceptable)
1044 * OR: SQLite if you want to store demo examples
1045 * OR: JSON files on disk
1046
1047 **Deployment:**
1048 * Local development environment sufficient for POC
1049 * Optional: Deploy to cloud for remote demos
1050
1051 == 11. Success Criteria ==
1052
1053 === 11.1 Minimum Success (POC Passes) ===
1054
1055 **Required for GO decision:**
1056 * ✅ AI extracts 3-5 factual claims automatically
1057 * ✅ AI provides verdict for each claim automatically
1058 * ✅ Verdicts are reasonable (≥70% make logical sense)
1059 * ✅ Analysis summary is coherent
1060 * ✅ Output is comprehensible to reviewers
1061 * ✅ Team/advisors understand the output
1062 * ✅ Team agrees approach has merit
1063 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1064 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1065 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1066 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1067
1068 **Quality Definition:**
1069 * "Reasonable verdict" = Defensible given general knowledge
1070 * "Coherent summary" = Logically structured, grammatically correct
1071 * "Comprehensible" = Reviewers understand what analysis means
1072
1073 === 11.2 POC Fails If ===
1074
1075 **Automatic NO-GO if any of these:**
1076 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1077 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1078 * ❌ Output incomprehensible (reviewers can't understand analysis)
1079 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1080 * ❌ Team loses confidence in AI-automated approach
1081
1082 === 11.3 Quality Thresholds ===
1083
1084 **POC quality expectations:**
1085
1086 |=Component|=Quality Threshold|=Definition
1087 |Claim Extraction|(% class="success" %)≥70% accuracy(%%) |Identifies obvious factual claims, may miss some edge cases
1088 |Verdict Logic|(% class="success" %)≥70% defensible(%%) |Verdicts are logical given reasoning provided
1089 |Reasoning Clarity|(% class="success" %)≥70% clear(%%) |1-3 sentences are understandable and relevant
1090 |Overall Analysis|(% class="success" %)≥70% useful(%%) |Output helps user understand article claims
1091
1092 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1093
1094 **Not expecting:**
1095 * 100% accuracy
1096 * Perfect claim coverage
1097 * Comprehensive evidence gathering
1098 * Flawless verdicts
1099 * Production polish
1100
1101 **Expecting:**
1102 * Reasonable claim extraction
1103 * Defensible verdicts
1104 * Understandable reasoning
1105 * Useful output
1106
1107 == 12. Test Cases ==
1108
1109 === 12.1 Test Case 1: Simple Factual Claim ===
1110
1111 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1112
1113 **Expected Output:**
1114 * Extract claim correctly
1115 * Provide verdict: WELL-SUPPORTED or PARTIALLY SUPPORTED
1116 * Confidence: 70-90%
1117 * Risk tier: C (Low)
1118 * Reasoning: Mentions studies or evidence
1119
1120 **Success:** Verdict is reasonable and reasoning makes sense
1121
1122 === 12.2 Test Case 2: Complex News Article ===
1123
1124 **Input:** News article URL with multiple claims about politics/health/science
1125
1126 **Expected Output:**
1127 * Extract 3-5 key claims
1128 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1129 * Coherent analysis summary
1130 * Article summary
1131 * Risk tiers assigned appropriately
1132
1133 **Success:** Claims identified are actually from article, verdicts are reasonable
1134
1135 === 12.3 Test Case 3: Controversial Topic ===
1136
1137 **Input:** Article on contested political or scientific topic
1138
1139 **Expected Output:**
1140 * Balanced analysis
1141 * Acknowledges uncertainty where appropriate
1142 * Doesn't overstate confidence
1143 * Reasoning shows awareness of complexity
1144
1145 **Success:** Analysis is fair and doesn't show obvious bias
1146
1147 === 12.4 Test Case 4: Clearly False Claim ===
1148
1149 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1150
1151 **Expected Output:**
1152 * Extract claim
1153 * Verdict: REFUTED
1154 * High confidence (> 90%)
1155 * Risk tier: C (Low - established fact)
1156 * Clear reasoning
1157
1158 **Success:** AI correctly identifies false claim with high confidence
1159
1160 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1161
1162 **Input:** Article with claim where evidence is genuinely mixed
1163
1164 **Expected Output:**
1165 * Extract claim
1166 * Verdict: UNCERTAIN
1167 * Moderate confidence (40-60%)
1168 * Reasoning explains why uncertain
1169
1170 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1171
1172 === 12.6 Test Case 6: High-Risk Medical Claim ===
1173
1174 **Input:** Article making medical claims
1175
1176 **Expected Output:**
1177 * Extract claim
1178 * Verdict: [appropriate based on evidence]
1179 * Risk tier: A (High - medical)
1180 * Red label displayed
1181 * Clear disclaimer about not being medical advice
1182
1183 **Success:** Risk tier correctly assigned, appropriate warnings shown
1184
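The expected outputs in the test cases above can be turned into simple automated checks. This sketch assumes the parsed result exposes `verdict`, `confidence`, and `risk_tier` keys; adapt it to whatever schema the prompt actually returns.

```python
def check_case(result: dict, expected_verdict: str, min_conf: int, tier: str) -> bool:
    """Compare one analysis result against a test case's expected output."""
    return (result["verdict"] == expected_verdict
            and result["confidence"] >= min_conf
            and result["risk_tier"] == tier)

# Test Case 4: clearly false claim ("The Earth is flat")
sample = {"verdict": "REFUTED", "confidence": 96, "risk_tier": "C"}
print(check_case(sample, "REFUTED", 90, "C"))
```

Scoring all six test cases this way yields the honest pass-rate number the decision gate in the next section needs.
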
1185 == 13. POC Decision Gate ==
1186
1187 === 13.1 Decision Framework ===
1188
1189 After POC testing is complete, the team makes one of three decisions:
1190
1191 **Option A: GO (Proceed to POC2)**
1192
1193 **Conditions:**
1194 * AI quality ≥70% without manual editing
1195 * Basic claim → verdict pipeline validated
1196 * Internal + advisor feedback positive
1197 * Technical feasibility confirmed
1198 * Team confident in direction
1199 * Clear path to improving AI quality to ≥90%
1200
1201 **Next Steps:**
1202 * Plan POC2 development (add scenarios)
1203 * Design scenario architecture
1204 * Expand to Evidence Model structure
1205 * Test with more complex articles
1206
1207 **Option B: NO-GO (Pivot or Stop)**
1208
1209 **Conditions:**
1210 * AI quality < 60%
1211 * Requires manual editing for most analyses (> 50%)
1212 * Feedback indicates fundamental flaws
1213 * Cost/effort not justified by value
1214 * No clear path to improvement
1215
1216 **Next Steps:**
1217 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1218 * **Stop:** Conclude approach not viable, revisit later
1219
1220 **Option C: ITERATE (Improve POC)**
1221
1222 **Conditions:**
1223 * Concept has merit but execution needs work
1224 * Specific improvements identified
1225 * Addressable with better prompts/approach
1226 * AI quality between 60% and 70%
1227
1228 **Next Steps:**
1229 * Improve AI prompts
1230 * Test different approaches
1231 * Re-run POC with improvements
1232 * Then make GO/NO-GO decision
1233
1234 === 13.2 Decision Criteria Summary ===
1235
1236 {{code}}
1237 AI Quality < 60% → NO-GO (approach doesn't work)
1238 AI Quality 60-70% → ITERATE (improve and retry)
1239 AI Quality ≥70% → GO (proceed to POC2)
1240 {{/code}}
1241
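The gate above maps directly onto a tiny function the team can apply to the measured quality score:

```python
def poc_decision(ai_quality: float) -> str:
    """Map measured AI quality (0-100) onto the POC decision gate."""
    if ai_quality < 60:
        return "NO-GO"    # approach doesn't work
    if ai_quality < 70:
        return "ITERATE"  # improve prompts and retry
    return "GO"           # proceed to POC2

print([poc_decision(q) for q in (55, 65, 82)])
```
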
1242 == 14. Key Risks & Mitigations ==
1243
1244 === 14.1 Risk: AI Quality Not Good Enough ===
1245
1246 **Likelihood:** Medium-High
1247 **Impact:** POC fails
1248
1249 **Mitigation:**
1250 * Extensive prompt engineering and testing
1251 * Use best available AI models (Sonnet 4.5)
1252 * Test with diverse article types
1253 * Iterate on prompts based on results
1254
1255 **Acceptance:** This is exactly what the POC tests; be prepared for failure
1256
1257 === 14.2 Risk: AI Consistency Issues ===
1258
1259 **Likelihood:** Medium
1260 **Impact:** Works sometimes, fails other times
1261
1262 **Mitigation:**
1263 * Test with 10+ diverse articles
1264 * Measure success rate honestly
1265 * Improve prompts to increase consistency
1266
1267 **Acceptance:** Some variability is acceptable if average quality is ≥70%
1268
1269 === 14.3 Risk: Output Incomprehensible ===
1270
1271 **Likelihood:** Low-Medium
1272 **Impact:** Users can't understand analysis
1273
1274 **Mitigation:**
1275 * Create clear explainer document
1276 * Iterate on output format
1277 * Test with non-technical reviewers
1278 * Simplify language if needed
1279
1280 **Acceptance:** Iterate until comprehensible
1281
1282 === 14.4 Risk: API Rate Limits / Costs ===
1283
1284 **Likelihood:** Low
1285 **Impact:** System slow or expensive
1286
1287 **Mitigation:**
1288 * Monitor API usage
1289 * Implement retry logic
1290 * Estimate costs before scaling
1291
1292 **Acceptance:** POC can be slow and expensive (optimization later)
1293
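The retry mitigation can be as small as a backoff wrapper around the API call. The attempt count, base delay, and jitter below are arbitrary starting points, not tuned values.

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Usage: wrap the Claude API call
print(with_retries(lambda: "ok"))
```

Combined with basic usage logging, this is usually enough resilience for a POC; rate limiting and caching can wait for the scaling phase.
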
1294 === 14.5 Risk: Scope Creep ===
1295
1296 **Likelihood:** Medium
1297 **Impact:** POC becomes too complex
1298
1299 **Mitigation:**
1300 * Strict scope discipline
1301 * Say NO to feature additions
1302 * Keep focus on core question
1303
1304 **Acceptance:** POC is minimal by design
1305
1306 == 15. POC Philosophy ==
1307
1308 === 15.1 Core Principles ===
1309
1310 **1. Build Less, Learn More**
1311 * Minimum features to test hypothesis
1312 * Don't build unvalidated features
1313 * Focus on core question only
1314
1315 **2. Fail Fast**
1316 * Quick test of hardest part (AI capability)
1317 * Accept that POC might fail
1318 * Better to discover issues early
1319 * Honest assessment over optimistic hope
1320
1321 **3. Test First, Build Second**
1322 * Validate AI can do this before building platform
1323 * Don't assume it will work
1324 * Let results guide decisions
1325
1326 **4. Automation First**
1327 * No manual editing allowed
1328 * Tests scalability, not just feasibility
1329 * Proves approach can work at scale
1330
1331 **5. Honest Assessment**
1332 * Don't cherry-pick examples
1333 * Don't manually fix bad outputs
1334 * Document failures openly
1335 * Make data-driven decisions
1336
1337 === 15.2 What POC Is ===
1338
1339 ✅ Testing AI capability without humans
1340 ✅ Proving core technical concept
1341 ✅ Fast validation of approach
1342 ✅ Honest assessment of feasibility
1343
1344 === 15.3 What POC Is NOT ===
1345
1346 ❌ Building a product
1347 ❌ Production-ready system
1348 ❌ Feature-complete platform
1349 ❌ Perfectly accurate analysis
1350 ❌ Polished user experience
1351
1352 == 16. Success = Clear Path Forward ==
1353
1354 **If POC succeeds (≥70% AI quality):**
1355 * ✅ Approach validated
1356 * ✅ Proceed to POC2 (add scenarios)
1357 * ✅ Design full Evidence Model structure
1358 * ✅ Test multi-scenario comparison
1359 * ✅ Focus on improving AI quality from 70% → 90%
1360
1361 **If POC fails (< 60% AI quality):**
1362 * ✅ Learn what doesn't work
1363 * ✅ Pivot to different approach
1364 * ✅ OR wait for better AI technology
1365 * ✅ Avoid wasting resources on non-viable approach
1366
1367 **Either way, POC provides clarity.**
1368
1369 == 17. Related Pages ==
1370
1371 * [[User Needs>>Test.FactHarbor.Specification.Requirements.User Needs.WebHome]]
1372 * [[Requirements>>Test.FactHarbor.Specification.Requirements.WebHome]]
1373 * [[Gap Analysis>>Test.FactHarbor.Specification.Requirements.GapAnalysis]]
1374 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
1375 * [[AKEL>>FactHarbor.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1376 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
1377
1378 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)