Wiki source code of POC Requirements (POC1 & POC2)

Last modified by Robert Schaub on 2026/02/08 08:32

1 = POC Requirements =
2
3 {{warning}}
4 **Implementation Status (Updated January 2026)**
5
6 This specification describes the original POC1 design. The current implementation (v2.6.33) has evolved in several key areas:
7
8 * **Verdict Scale**: Expanded from 4-point to **7-point symmetric scale** (TRUE → MOSTLY-TRUE → LEANING-TRUE → MIXED/UNVERIFIED → LEANING-FALSE → MOSTLY-FALSE → FALSE)
9 * **Scenarios**: Replaced with **KeyFactors** (decomposition questions discovered during analysis)
10 * **Quality Gates**: Gate 1 (Claim Validation) and Gate 4 (Verdict Confidence) implemented; Gates 2-3 deferred
11 * **Caching**: Redis claim caching **not yet implemented**; all data stored as JSON blobs
12 * **Data Model**: Normalized tables **not implemented**; using JSON blob storage in SQLite
13
14 See `Docs/STATUS/Documentation_Inconsistencies.md` for full comparison.
15 {{/warning}}
16
17 {{info}}
18 **POC1 Architecture:** 3-stage AKEL pipeline (Extract → Analyze → Holistic) with Redis caching, credit tracking, and LLM abstraction layer.
19
20 See [[POC1 API Specification>>Archive.FactHarbor 2026\.02\.08.Specification.POC.API-and-Schemas.WebHome]] for complete technical details.
21 {{/info}}
22
23
24
25 **Status:** ✅ Approved for Development
26 **Version:** 2.0 (Updated after Specification Cross-Check)
27 **Goal:** Prove that AI can extract claims and determine verdicts automatically without human intervention
28
29 == 1. POC Overview ==
30
31 === 1.1 What POC Tests ===
32
33 **Core Question:**
34
35 > Can AI automatically extract factual claims from articles and evaluate them with reasonable verdicts?
36
37 **What we're proving:**
38
39 * AI can identify factual claims from text
40 * AI can evaluate those claims and produce verdicts
41 * Output is comprehensible and useful
42 * Fully automated approach is viable
43
44 **What we're NOT testing:**
45
46 * Scenario generation (deferred to POC2)
47 * Evidence display (deferred to POC2)
48 * Production scalability
49 * Perfect accuracy
50 * Complete feature set
51
52 === 1.2 Scenarios Deferred to POC2 ===
53
54 {{info}}
55 **Implementation Update (v2.6.33):** Scenarios were **replaced with KeyFactors**: decomposition questions discovered during the understanding phase. KeyFactors are optional, emergent, and do not require a separate entity. See `Docs/ARCHITECTURE/KeyFactors_Design.md` for rationale.
56 {{/info}}
57
58 **Intentional Simplification:**
59
60 Scenarios are a core component of the full FactHarbor system (Claims → Scenarios → Evidence → Verdicts), but are **deliberately excluded from POC1**.
61
62 **Rationale:**
63
64 * **POC1 tests:** Can AI extract claims and generate verdicts?
65 * **POC2 will add:** Scenario generation and management
66 * **Open questions remain:** Should scenarios be separate entities? How are they sequenced with evidence gathering? What's the optimal workflow?
67
68 **Design Decision:**
69
70 Prove basic AI capability first, then add scenario complexity based on POC1 learnings. This is good engineering: test the hardest part (AI fact-checking) before adding architectural complexity.
71
72 **No Risk:**
73
74 Scenarios are additive complexity, not foundational. Deferring them to POC2 allows:
75
76 * Faster POC1 validation
77 * Learning from POC1 to inform scenario design
78 * Iterative approach: fail fast if basic AI doesn't work
79 * Flexibility to adjust scenario architecture based on POC1 insights
80
81 **Full System Workflow (Future):**
82 {{code}}Claims → Scenarios → Evidence → Verdicts{{/code}}
83
84 **POC1 Simplified Workflow:**
85 {{code}}Claims → Verdicts (scenarios implicit in reasoning){{/code}}
86
87 == 2. POC Output Specification ==
88
89 === 2.1 Component 1: ANALYSIS SUMMARY (Context-Aware) ===
90
91 **What:** Context-aware overview that considers both individual claims AND their relationship to the article's main argument
92
93 **Length:** 4-6 sentences
94
95 **Content (Required Elements):**
96
97 1. **Article's main thesis/claim** - What is the article trying to argue or prove?
98 2. **Claim count and verdicts** - How many claims analyzed, distribution of verdicts
99 3. **Central vs. supporting claims** - Which claims are central to the article's argument?
100 4. **Relationship assessment** - Do the claims support the article's conclusion?
101 5. **Overall credibility** - Final assessment considering claim importance
102
103 **Critical Innovation:**
104
105 POC1 tests whether AI can understand that **article credibility ≠ simple average of claim verdicts**. An article might:
106
107 * Present accurate supporting facts but draw unsupported conclusions
108 * Have one false central claim that invalidates the whole argument
109 * Misframe accurate information to mislead
110
111 **Good Example (Context-Aware):**
112 {{code}}This article argues that coffee cures cancer based on its antioxidant
113 content. We analyzed 3 factual claims: 2 about coffee's chemical
114 properties are well-supported, but the main causal claim is refuted
115 by current evidence. The article confuses correlation with causation.
116 Overall assessment: MISLEADING - makes an unsupported medical claim
117 despite citing some accurate facts.{{/code}}
118
119 **Poor Example (Simple Aggregation - Don't Do This):**
120 {{code}}This article makes 3 claims. 2 are well-supported and 1 is refuted.
121 Overall assessment: mostly accurate (67% accurate).{{/code}}
122 ↑ This misses that the refuted claim IS the article's main point!
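
The context-aware assessment above can be sketched in a few lines. This is an illustrative example, not part of the spec; the claim texts, the 0-1 "centrality" weights, and the binary "supported" flag are invented for demonstration:

```python
# Why article credibility != a simple average of claim verdicts.
# All data and field names here are hypothetical.

def simple_average(claims):
    """Naive aggregation: fraction of claims that are supported."""
    return sum(c["supported"] for c in claims) / len(claims)

def centrality_weighted(claims):
    """Weight each claim by how central it is to the article's thesis."""
    total = sum(c["centrality"] for c in claims)
    return sum(c["supported"] * c["centrality"] for c in claims) / total

claims = [
    {"text": "Coffee contains antioxidants",      "supported": 1, "centrality": 0.1},
    {"text": "Antioxidants occur in many plants", "supported": 1, "centrality": 0.1},
    {"text": "Coffee cures cancer",               "supported": 0, "centrality": 0.8},
]

print(round(simple_average(claims), 2))       # 0.67 -> "mostly accurate"
print(round(centrality_weighted(claims), 2))  # 0.2  -> misleading
```

The naive average rates the article "mostly accurate"; the weighted view shows the refuted claim carries the argument.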
123
124 **What POC1 Tests:**
125
126 Can AI identify and assess:
127
128 * ✅ The article's main thesis/conclusion?
129 * ✅ Which claims are central vs. supporting?
130 * ✅ Whether the evidence supports the conclusion?
131 * ✅ Overall credibility considering logical structure?
132
133 **If AI Cannot Do This:**
134
135 That's valuable to learn in POC1! We'll:
136
137 * Note as limitation
138 * Fall back to simple aggregation with warning
139 * Design explicit article-level analysis for POC2
140
141 === 2.2 Component 2: CLAIMS IDENTIFICATION ===
142
143 **What:** List of factual claims extracted from article
144 **Format:** Numbered list
145 **Quantity:** 3-5 claims
146 **Requirements:**
147
148 * Factual claims only (not opinions/questions)
149 * Clearly stated
150 * Automatically extracted by AI
151
152 **Example:**
153 {{code}}CLAIMS IDENTIFIED:
154
155 [1] Coffee reduces diabetes risk by 30%
156 [2] Coffee improves heart health
157 [3] Decaf has same benefits as regular
158 [4] Coffee prevents Alzheimer's completely{{/code}}
159
160 === 2.3 Component 3: CLAIMS VERDICTS ===
161
162 **What:** Verdict for each claim identified
163 **Format:** Per claim structure
164
165 **Required Elements:**
166
167 * **Verdict Label:** ~~WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED~~
168 //**Current Implementation (v2.6.33):** 7-point symmetric scale://
169 * TRUE (86-100%) / MOSTLY-TRUE (72-85%) / LEANING-TRUE (58-71%)
170 * MIXED (43-57%, high confidence) / UNVERIFIED (43-57%, low confidence)
171 * LEANING-FALSE (29-42%) / MOSTLY-FALSE (15-28%) / FALSE (0-14%)
172 * **Confidence Score:** 0-100%
173 * **Brief Reasoning:** 1-3 sentences explaining why
174 * **Risk Tier:** A (High) / B (Medium) / C (Low) - for demonstration
175
176 **Example:**
177 {{code}}VERDICTS:
178
179 [1] WELL-SUPPORTED (85%) [Risk: C]
180 Multiple studies confirm 25-30% risk reduction with regular consumption.
181
182 [2] UNCERTAIN (65%) [Risk: B]
183 Evidence is mixed. Some studies show benefits, others show no effect.
184
185 [3] PARTIALLY SUPPORTED (60%) [Risk: C]
186 Some benefits overlap, but caffeine-related benefits are reduced in decaf.
187
188 [4] REFUTED (90%) [Risk: B]
189 No evidence for complete prevention. Claim is significantly overstated.{{/code}}
190
191 **Risk Tier Display:**
192
193 * **Tier A (Red):** High Risk - Medical/Legal/Safety/Elections
194 * **Tier B (Yellow):** Medium Risk - Policy/Science/Causality
195 * **Tier C (Green):** Low Risk - Facts/Definitions/History
196
197 **Note:** Risk tier shown for demonstration purposes in POC. Full system uses risk tiers to determine review workflow.
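
The 7-point bands listed above can be expressed as a simple mapping. This is an illustrative sketch, not the implementation; the 50% confidence cutoff separating MIXED from UNVERIFIED is an assumption, since the spec only says "high confidence" vs. "low confidence":

```python
def verdict_label(truth_pct: float, confidence_pct: float) -> str:
    """Map a 0-100 truth score to the 7-point symmetric scale.

    Bands follow the ranges listed above. In the 43-57 band the label
    depends on confidence; the 50% cutoff is an illustrative assumption.
    """
    if truth_pct >= 86:
        return "TRUE"
    if truth_pct >= 72:
        return "MOSTLY-TRUE"
    if truth_pct >= 58:
        return "LEANING-TRUE"
    if truth_pct >= 43:
        return "MIXED" if confidence_pct >= 50 else "UNVERIFIED"
    if truth_pct >= 29:
        return "LEANING-FALSE"
    if truth_pct >= 15:
        return "MOSTLY-FALSE"
    return "FALSE"

print(verdict_label(90, 80))  # TRUE
print(verdict_label(50, 30))  # UNVERIFIED
```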
198
199 === 2.4 Component 4: ARTICLE SUMMARY (Optional) ===
200
201 **What:** Brief summary of original article content
202 **Length:** 3-5 sentences
203 **Tone:** Neutral (article's position, not FactHarbor's analysis)
204
205 **Example:**
206 {{code}}ARTICLE SUMMARY:
207
208 Health News Today article discusses coffee benefits, citing studies
209 on diabetes and Alzheimer's. Author highlights research linking coffee
210 to disease prevention. Recommends 2-3 cups daily for optimal health.{{/code}}
211
212 === 2.5 Component 5: USAGE STATISTICS (Cost Tracking) ===
213
214 **What:** LLM usage metrics for cost optimization and scaling decisions
215
216 **Purpose:**
217
218 * Understand cost per analysis
219 * Identify optimization opportunities
220 * Project costs at scale
221 * Inform architecture decisions
222
223 **Display Format:**
224 {{code}}USAGE STATISTICS:
225 • Article: 2,450 words (12,300 characters)
226 • Input tokens: 15,234
227 • Output tokens: 892
228 • Total tokens: 16,126
229 • Estimated cost: $0.24 USD
230 • Response time: 8.3 seconds
231 • Cost per claim: $0.048
232 • Model: claude-sonnet-4-20250514{{/code}}
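
The cost arithmetic behind this display is straightforward. The sketch below is illustrative only: the per-million-token prices are placeholders, not actual provider pricing, so the resulting dollar figure differs from the example above:

```python
# Placeholder prices -- substitute current provider pricing.
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one analysis from token counts."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

cost = estimate_cost(15_234, 892)  # token counts from the example above
print(f"Estimated cost: ${cost:.2f}")
print(f"Cost per claim: ${cost / 5:.3f}")  # assuming 5 claims extracted
```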
233
234 **Why This Matters:**
235
236 At scale, LLM costs are critical:
237
238 * 10,000 articles/month ≈ $200-500/month
239 * 100,000 articles/month ≈ $2,000-5,000/month
240 * Cost optimization can reduce expenses 30-50%
241
242 **What POC1 Learns:**
243
244 * How cost scales with article length
245 * Prompt optimization opportunities (caching, compression)
246 * Output verbosity tradeoffs
247 * Model selection strategy (FAST vs. REASONING roles)
248 * Article length limits (if needed)
249
250 **Implementation:**
251
252 * Claude API already returns usage data
253 * No extra API calls needed
254 * Display to user + log for aggregate analysis
255 * Test with articles of varying lengths
256
257 **Critical for GO/NO-GO:** Unit economics must be viable at scale!
258
259 === 2.6 Total Output Size ===
260
261 **Combined:** 220-350 words
262
263 * Analysis Summary (Context-Aware): 60-90 words (4-6 sentences)
264 * Claims Identification: 30-50 words
265 * Claims Verdicts: 100-150 words
266 * Article Summary: 30-50 words (optional)
267
268 **Note:** Analysis summary is slightly longer (4-6 sentences vs. 3-5) to accommodate context-aware assessment of article structure and logical reasoning.
269
270 == 3. What's NOT in POC Scope ==
271
272 === 3.1 Feature Exclusions ===
273
274 The following are **explicitly excluded** from POC:
275
276 **Content Features:**
277
278 * ❌ Scenarios (deferred to POC2)
279 * ❌ Evidence display (supporting/opposing lists)
280 * ❌ Source links (clickable references)
281 * ❌ Detailed reasoning chains
282 * ❌ Source quality ratings (shown but not detailed)
283 * ❌ Contradiction detection (basic only)
284 * ❌ Risk assessment (shown but not workflow-integrated)
285
286 **Platform Features:**
287
288 * ❌ User accounts / authentication
289 * ❌ Saved history
290 * ❌ Search functionality
291 * ❌ Claim comparison
292 * ❌ User contributions
293 * ❌ Commenting system
294 * ❌ Social sharing
295
296 **Technical Features:**
297
298 * ❌ Browser extensions
299 * ❌ Mobile apps
300 * ❌ API endpoints
301 * ❌ Webhooks
302 * ❌ Export features (PDF, CSV)
303
304 **Quality Features:**
305
306 * ❌ Accessibility (WCAG compliance)
307 * ❌ Multilingual support
308 * ❌ Mobile optimization
309 * ❌ Media verification (images/videos)
310
311 **Production Features:**
312
313 * ❌ Security hardening
314 * ❌ Privacy compliance (GDPR)
315 * ❌ Terms of service
316 * ❌ Monitoring/logging
317 * ❌ Error tracking
318 * ❌ Analytics
319 * ❌ A/B testing
320
321 == 4. POC Simplifications vs. Full System ==
322
323 === 4.1 Architecture Comparison ===
324
325 **POC Architecture (Simplified):**
326 {{code}}User Input → Single AKEL Call → Output Display
327 (all processing){{/code}}
328
329 **Full System Architecture:**
330 {{code}}User Input → Claim Extractor → Claim Classifier → Scenario Generator
331 → Evidence Summarizer → Contradiction Detector → Verdict Generator
332 → Quality Gates → Publication → Output Display{{/code}}
333
334 **Key Differences:**
335
336 |=Aspect|=POC1|=Full System
337 |Processing|Single API call|Multi-component pipeline
338 |Scenarios|None (implicit)|Explicit entities with versioning
339 |Evidence|Basic retrieval|Comprehensive with quality scoring
340 |Quality Gates|Simplified (4 basic checks)|Full validation infrastructure
341 |Workflow|3 steps (input/process/output)|6 phases with gates
342 |Data Model|Stateless (no database)|PostgreSQL + Redis + S3
343 |Architecture|Single prompt to Claude|AKEL Orchestrator + Components
344
345 === 4.2 Workflow Comparison ===
346
347 **POC1 Workflow:**
348
349 1. User submits text/URL
350 2. Single AKEL call (all processing in one prompt)
351 3. Display results
352 **Total: 3 steps, 10-18 seconds**
353
354 **Full System Workflow:**
355
356 1. **Claim Submission** (extraction, normalization, clustering)
357 2. **Scenario Building** (definitions, assumptions, boundaries)
358 3. **Evidence Handling** (retrieval, assessment, linking)
359 4. **Verdict Creation** (synthesis, reasoning, approval)
360 5. **Public Presentation** (summaries, landscapes, deep dives)
361 6. **Time Evolution** (versioning, re-evaluation triggers)
362 **Total: 6 phases with quality gates, 10-30 seconds**
363
364 === 4.3 Why POC is Simplified ===
365
366 **Engineering Rationale:**
367
368 1. **Test core capability first:** Can AI do basic fact-checking without humans?
369 2. **Fail fast:** If AI can't generate reasonable verdicts, pivot early
370 3. **Learn before building:** POC1 insights inform full architecture
371 4. **Iterative approach:** Add complexity only after validating foundations
372 5. **Resource efficiency:** Don't build full system if core concept fails
373
374 **Acceptable Trade-offs:**
375
376 * ✅ POC proves AI capability (most risky assumption)
377 * ✅ POC validates user comprehension (can people understand output?)
378 * ❌ POC doesn't validate full workflow (test in Beta)
379 * ❌ POC doesn't validate scale (test in Beta)
380 * ❌ POC doesn't validate scenario architecture (design in POC2)
381
382 === 4.4 Gap Between POC1 and POC2/Beta ===
383
384 **What needs to be built for POC2:**
385
386 * Scenario generation component
387 * Evidence Model structure (full)
388 * Scenario-evidence linking
389 * Multi-interpretation comparison
390 * Truth landscape visualization
391
392 **What needs to be built for Beta:**
393
394 * Multi-component AKEL pipeline
395 * Quality gate infrastructure
396 * Review workflow system
397 * Audit sampling framework
398 * Production data model
399 * Federation architecture (Release 1.0)
400
401 **POC1 → POC2 is significant architectural expansion.**
402
403 == 5. Publication Mode & Labeling ==
404
405 === 5.1 POC Publication Mode ===
406
407 **Mode:** Mode 2 (AI-Generated, No Prior Human Review)
408
409 Per FactHarbor Specification Section 11 "POC v1 Behavior":
410
411 * Produces public AI-generated output
412 * No human approval gate
413 * Clear AI-Generated labeling
414 * All quality gates active (simplified)
415 * Risk tier classification shown (demo)
416
417 === 5.2 User-Facing Labels ===
418
419 **Primary Label (top of analysis):**
420 {{code}}╔════════════════════════════════════════════════════════════╗
421 ║ [AI-GENERATED - POC/DEMO] ║
422 ║ ║
423 ║ This analysis was produced entirely by AI and has not ║
424 ║ been human-reviewed. Use for demonstration purposes. ║
425 ║ ║
426 ║ Source: AI/AKEL v1.0 (POC) ║
427 ║ Review Status: Not Reviewed (Proof-of-Concept) ║
428 ║ Quality Gates: 4/4 Passed (Simplified) ║
429 ║ Last Updated: [timestamp] ║
430 ╚════════════════════════════════════════════════════════════╝{{/code}}
431
432 **Per-Claim Risk Labels:**
433
434 * **[Risk: A]** 🔴 High Risk (Medical/Legal/Safety)
435 * **[Risk: B]** 🟡 Medium Risk (Policy/Science)
436 * **[Risk: C]** 🟢 Low Risk (Facts/Definitions)
437
438 === 5.3 Display Requirements ===
439
440 **Must Show:**
441
442 * AI-Generated status (prominent)
443 * POC/Demo disclaimer
444 * Risk tier per claim
445 * Confidence scores (0-100%)
446 * Quality gate status (passed/failed)
447 * Timestamp
448
449 **Must NOT Claim:**
450
451 * Human review
452 * Production quality
453 * Medical/legal advice
454 * Authoritative verdicts
455 * Complete accuracy
456
457 === 5.4 Mode 2 vs. Full System Publication ===
458
459 |=Element|=POC Mode 2|=Full System Mode 2|=Full System Mode 3
460 |Label|AI-Generated (POC)|AI-Generated|AKEL-Generated
461 |Review|None|None|Human-Reviewed
462 |Quality Gates|4 (simplified)|6 (full)|6 (full) + Human
463 |Audit|None (POC)|Sampling (5-50%)|Pre-publication
464 |Risk Display|Demo only|Workflow-integrated|Validated
465 |User Actions|View only|Flag for review|Trust rating
466
467 == 6. Quality Gates (Simplified Implementation) ==
468
469 === 6.1 Overview ===
470
471 Per FactHarbor Specification Section 6, all AI-generated content must pass quality gates before publication. POC implements **simplified versions** of the 4 mandatory gates.
472
473 **Full System Has 4 Gates:**
474
475 1. Source Quality
476 2. Contradiction Search (MANDATORY)
477 3. Uncertainty Quantification
478 4. Structural Integrity
479
480 **POC Implements Simplified Versions:**
481
482 * Focus on demonstrating concept
483 * Basic implementations sufficient
484 * Failures displayed to user (not blocking)
485 * Full system has comprehensive validation
486
487 === 6.2 Gate 1: Source Quality (Basic) ===
488
489 **Full System Requirements:**
490
491 * Primary sources identified and accessible
492 * Source reliability scored against whitelist
493 * Citation completeness verified
494 * Publication dates checked
495 * Author credentials validated
496
497 **POC Implementation:**
498
499 * ✅ At least 2 sources found
500 * ✅ Sources accessible (URLs valid)
501 * ❌ No whitelist checking
502 * ❌ No credential validation
503 * ❌ No comprehensive reliability scoring
504
505 **Pass Criteria:** ≥2 accessible sources found
506
507 **Failure Handling:** Display error message, don't generate verdict
508
509 === 6.3 Gate 2: Contradiction Search (Basic) ===
510
511 **Full System Requirements:**
512
513 * Counter-evidence actively searched
514 * Reservations and limitations identified
515 * Alternative interpretations explored
516 * Bubble detection (echo chambers, conspiracy theories)
517 * Cross-cultural and international perspectives
518 * Academic literature (supporting AND opposing)
519
520 **POC Implementation:**
521
522 * ✅ Basic search for counter-evidence
523 * ✅ Identify obvious contradictions
524 * ❌ No comprehensive academic search
525 * ❌ No bubble detection
526 * ❌ No systematic alternative interpretation search
527 * ❌ No international perspective verification
528
529 **Pass Criteria:** Basic contradiction search attempted
530
531 **Failure Handling:** Note "limited contradiction search" in output
532
533 === 6.4 Gate 3: Uncertainty Quantification (Basic) ===
534
535 **Full System Requirements:**
536
537 * Confidence scores calculated for all claims/verdicts
538 * Limitations explicitly stated
539 * Data gaps identified and disclosed
540 * Strength of evidence assessed
541 * Alternative scenarios considered
542
543 **POC Implementation:**
544
545 * ✅ Confidence scores (0-100%)
546 * ✅ Basic uncertainty acknowledgment
547 * ❌ No detailed limitation disclosure
548 * ❌ No data gap identification
549 * ❌ No alternative scenario consideration (deferred to POC2)
550
551 **Pass Criteria:** Confidence score assigned
552
553 **Failure Handling:** Show "Confidence: Unknown" if calculation fails
554
555 === 6.5 Gate 4: Structural Integrity (Basic) ===
556
557 **Full System Requirements:**
558
559 * No hallucinations detected (fact-checking against sources)
560 * Logic chain valid and traceable
561 * References accessible and verifiable
562 * No circular reasoning
563 * Premises clearly stated
564
565 **POC Implementation:**
566
567 * ✅ Basic coherence check
568 * ✅ References accessible
569 * ❌ No comprehensive hallucination detection
570 * ❌ No formal logic validation
571 * ❌ No premise extraction and verification
572
573 **Pass Criteria:** Output is coherent and references are accessible
574
575 **Failure Handling:** Display error message
576
577 === 6.6 Quality Gate Display ===
578
579 **POC shows simplified status:**
580 {{code}}Quality Gates: 4/4 Passed (Simplified)
581 ✓ Source Quality: 3 sources found
582 ✓ Contradiction Search: Basic search completed
583 ✓ Uncertainty: Confidence scores assigned
584 ✓ Structural Integrity: Output coherent{{/code}}
585
586 **If any gate fails:**
587 {{code}}Quality Gates: 3/4 Passed (Simplified)
588 ✓ Source Quality: 3 sources found
589 ✗ Contradiction Search: Search failed - limited evidence
590 ✓ Uncertainty: Confidence scores assigned
591 ✓ Structural Integrity: Output coherent
592
593 Note: This analysis has limited evidence. Use with caution.{{/code}}
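
Rendering this status block from per-gate results could look like the sketch below. The `(name, passed, detail)` tuple structure is an assumption for illustration, not a defined interface:

```python
def render_gates(results):
    """Render the simplified gate status display from (name, passed, detail) tuples."""
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [f"Quality Gates: {passed}/{len(results)} Passed (Simplified)"]
    for name, ok, detail in results:
        lines.append(f"{'✓' if ok else '✗'} {name}: {detail}")
    if passed < len(results):
        # Gates don't block publication in POC mode; warn instead.
        lines.append("")
        lines.append("Note: This analysis has limited evidence. Use with caution.")
    return "\n".join(lines)

print(render_gates([
    ("Source Quality", True, "3 sources found"),
    ("Contradiction Search", False, "Search failed - limited evidence"),
    ("Uncertainty", True, "Confidence scores assigned"),
    ("Structural Integrity", True, "Output coherent"),
]))
```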
594
595 === 6.7 Simplified vs. Full System ===
596
597 |=Gate|=POC (Simplified)|=Full System
598 |Source Quality|≥2 sources accessible|Whitelist scoring, credentials, comprehensiveness
599 |Contradiction|Basic search|Systematic academic + media + international
600 |Uncertainty|Confidence % assigned|Detailed limitations, data gaps, alternatives
601 |Structural|Coherence check|Hallucination detection, logic validation, premise check
602
603 **POC Goal:** Demonstrate that quality gates are possible, not perfect implementation.
604
605 == 7. AKEL Architecture Comparison ==
606
607 === 7.1 POC AKEL (Simplified) ===
608
609 **Implementation:**
610
611 * Single provider API call (REASONING model)
612 * One comprehensive prompt
613 * All processing in single request
614 * No separate components
615 * No orchestration layer
616
617 **Prompt Structure:**
618 {{code}}Task: Analyze this article and provide:
619
620 1. Extract 3-5 factual claims
621 2. For each claim:
622 - Determine verdict (WELL-SUPPORTED/PARTIALLY/UNCERTAIN/REFUTED)
623 - Assign confidence score (0-100%)
624 - Assign risk tier (A/B/C)
625 - Write brief reasoning (1-3 sentences)
626 3. Generate analysis summary (3-5 sentences)
627 4. Generate article summary (3-5 sentences)
628 5. Run basic quality checks
629
630 Return as structured JSON.{{/code}}
631
632 **Processing Time:** 10-18 seconds (estimate)
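
Since the single call returns structured JSON, the backend mainly needs to validate it. The sketch below is a minimal illustration; the field names (`claims`, `verdict`, `risk_tier`, etc.) are assumptions about the response schema, not a confirmed contract:

```python
import json

def parse_akel_response(raw: str) -> dict:
    """Validate the structured JSON returned by the single AKEL call.

    Field names are assumed for illustration; adjust to the actual schema.
    """
    data = json.loads(raw)
    claims = data.get("claims", [])
    if not 3 <= len(claims) <= 5:
        raise ValueError(f"expected 3-5 claims, got {len(claims)}")
    for claim in claims:
        for field in ("text", "verdict", "confidence", "risk_tier", "reasoning"):
            if field not in claim:
                raise ValueError(f"claim missing field: {field}")
    return data

raw = json.dumps({"claims": [
    {"text": "Coffee reduces diabetes risk by 30%", "verdict": "WELL-SUPPORTED",
     "confidence": 85, "risk_tier": "C", "reasoning": "Multiple studies."},
    {"text": "Coffee improves heart health", "verdict": "UNCERTAIN",
     "confidence": 65, "risk_tier": "B", "reasoning": "Evidence is mixed."},
    {"text": "Decaf has same benefits as regular", "verdict": "PARTIALLY SUPPORTED",
     "confidence": 60, "risk_tier": "C", "reasoning": "Partial overlap."},
]})
print(len(parse_akel_response(raw)["claims"]))  # 3
```

A parse failure here maps to the POC's "display error message, user can retry" handling, with no manual correction.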
633
634 === 7.2 Full System AKEL (Production) ===
635
636 **Architecture:**
637 {{code}}AKEL Orchestrator
638 ├── Claim Extractor
639 ├── Claim Classifier (with risk tier assignment)
640 ├── Scenario Generator
641 ├── Evidence Summarizer
642 ├── Contradiction Detector
643 ├── Quality Gate Validator
644 ├── Audit Sampling Scheduler
645 └── Federation Sync Adapter (Release 1.0+){{/code}}
646
647 **Processing:**
648
649 * Parallel processing where possible
650 * Separate component calls
651 * Quality gates between phases
652 * Audit sampling selection
653 * Cross-node coordination (federated mode)
654
655 **Processing Time:** 10-30 seconds (full pipeline)
656
657 === 7.3 Why POC Uses Single Call ===
658
659 **Advantages:**
660
661 * ✅ Simpler to implement
662 * ✅ Faster POC development
663 * ✅ Easier to debug
664 * ✅ Proves AI capability
665 * ✅ Good enough for concept validation
666
667 **Limitations:**
668
669 * ❌ No component reusability
670 * ❌ No parallel processing
671 * ❌ All-or-nothing (can't partially succeed)
672 * ❌ Harder to improve individual components
673 * ❌ No audit sampling
674
675 **Acceptable Trade-off:**
676
677 POC tests "Can AI do this?", not "How should we architect it?"
678
679 Full component architecture comes in Beta after POC validates concept.
680
681 === 7.4 Evolution Path ===
682
683 **POC1:** Single prompt → Prove concept
684 **POC2:** Add scenario component → Test full pipeline
685 **Beta:** Multi-component AKEL → Production architecture
686 **Release 1.0:** Full AKEL + Federation → Scale
687
688 == 8. Functional Requirements ==
689
690 === FR-POC-1: Article Input ===
691
692 **Requirement:** User can submit article for analysis
693
694 **Functionality:**
695
696 * Text input field (paste article text, up to 5000 characters)
697 * URL input field (paste article URL)
698 * "Analyze" button to trigger processing
699 * Loading indicator during analysis
700
701 **Excluded:**
702
703 * No user authentication
704 * No claim history
705 * No search functionality
706 * No saved templates
707
708 **Acceptance Criteria:**
709
710 * User can paste text from article
711 * User can paste URL of article
712 * System accepts input and triggers analysis
713
714 === FR-POC-2: Claim Extraction (Fully Automated) ===
715
716 **Requirement:** AI automatically extracts 3-5 factual claims
717
718 **Functionality:**
719
720 * AI reads article text
721 * AI identifies factual claims (not opinions/questions)
722 * AI extracts 3-5 most important claims
723 * System displays numbered list
724
725 **Critical:** NO MANUAL EDITING ALLOWED
726
727 * AI selects which claims to extract
728 * AI identifies factual vs. non-factual
729 * System processes claims as extracted
730 * No human curation or correction
731
732 **Error Handling:**
733
734 * If extraction fails: Display error message
735 * User can retry with different input
736 * No manual intervention to fix extraction
737
738 **Acceptance Criteria:**
739
740 * AI extracts 3-5 claims automatically
741 * Claims are factual (not opinions)
742 * Claims are clearly stated
743 * No manual editing required
744
745 === FR-POC-3: Verdict Generation (Fully Automated) ===
746
747 **Requirement:** AI automatically generates verdict for each claim
748
749 **Functionality:**
750
751 * For each claim, AI:
752 * Evaluates claim based on available evidence/knowledge
753 * Determines verdict: ~~WELL-SUPPORTED / PARTIALLY SUPPORTED / UNCERTAIN / REFUTED~~ //(Now: 7-point scale - see Section 2.3)//
754 * Assigns confidence score (0-100%)
755 * Assigns risk tier (A/B/C)
756 * Writes brief reasoning (1-3 sentences)
757 * System displays verdict for each claim
758
759 **Critical:** NO MANUAL EDITING ALLOWED
760
761 * AI computes verdicts based on evidence
762 * AI generates confidence scores
763 * AI writes reasoning
764 * No human review or adjustment
765
766 **Error Handling:**
767
768 * If verdict generation fails: Display error message
769 * User can retry
770 * No manual intervention to adjust verdicts
771
772 **Acceptance Criteria:**
773
774 * Each claim has a verdict
775 * Confidence score is displayed (0-100%)
776 * Risk tier is displayed (A/B/C)
777 * Reasoning is understandable (1-3 sentences)
778 * Verdict is defensible given reasoning
779 * All generated automatically by AI
780
781 === FR-POC-4: Analysis Summary (Fully Automated) ===
782
783 **Requirement:** AI generates brief summary of analysis
784
785 **Functionality:**
786
787 * AI summarizes findings in 3-5 sentences:
788 * How many claims found
789 * Distribution of verdicts
790 * Overall assessment
791 * System displays at top of results
792
793 **Critical:** NO MANUAL EDITING ALLOWED
794
795 **Acceptance Criteria:**
796
797 * Summary is coherent
798 * Accurately reflects analysis
799 * 3-5 sentences
800 * Automatically generated
801
802 === FR-POC-5: Article Summary (Fully Automated, Optional) ===
803
804 **Requirement:** AI generates brief summary of original article
805
806 **Functionality:**
807
808 * AI summarizes article content (not FactHarbor's analysis)
809 * 3-5 sentences
810 * System displays
811
812 **Note:** Optional - can skip if time limited
813
814 **Critical:** NO MANUAL EDITING ALLOWED
815
816 **Acceptance Criteria:**
817
818 * Summary is neutral (article's position)
819 * Accurately reflects article content
820 * 3-5 sentences
821 * Automatically generated
822
823 === FR-POC-6: Publication Mode Display ===
824
825 **Requirement:** Clear labeling of AI-generated content
826
827 **Functionality:**
828
829 * Display Mode 2 publication label
830 * Show POC/Demo disclaimer
831 * Display risk tiers per claim
832 * Show quality gate status
833 * Display timestamp
834
835 **Acceptance Criteria:**
836
837 * Label is prominent and clear
838 * User understands this is AI-generated POC output
839 * Risk tiers are color-coded
840 * Quality gate status is visible
841
842 === FR-POC-7: Quality Gate Execution ===
843
844 **Requirement:** Execute simplified quality gates
845
846 **Functionality:**
847
848 * Check source quality (basic)
849 * Attempt contradiction search (basic)
850 * Calculate confidence scores
851 * Verify structural integrity (basic)
852 * Display gate results
853
854 **Acceptance Criteria:**
855
856 * All 4 gates attempted
857 * Pass/fail status displayed
858 * Failures explained to user
859 * Gates don't block publication (POC mode)
860
861 == 9. Non-Functional Requirements ==
862
863 === NFR-POC-1: Fully Automated Processing ===
864
865 **Requirement:** Complete AI automation with zero manual intervention
866
867 **Critical Rule:** NO MANUAL EDITING AT ANY STAGE
868
869 **What this means:**
870
871 * Claims: AI selects (no human curation)
872 * Scenarios: N/A (deferred to POC2)
873 * Evidence: AI evaluates (no human selection)
874 * Verdicts: AI determines (no human adjustment)
875 * Summaries: AI writes (no human editing)
876
877 **Pipeline:**
878 {{code}}User Input → AKEL Processing → Output Display
879
880 ZERO human editing{{/code}}
881
882 **If AI output is poor:**
883
884 * ❌ Do NOT manually fix it
885 * ✅ Document the failure
886 * ✅ Improve prompts and retry
887 * ✅ Accept that POC might fail
888
889 **Why this matters:**
890
891 * Tests whether AI can do this without humans
892 * Validates scalability (humans can't review every analysis)
893 * Honest test of technical feasibility
894
895 === NFR-POC-2: Performance ===
896
897 **Requirement:** Analysis completes in reasonable time
898
899 **Acceptable Performance:**
900
901 * Processing time: 1-5 minutes (acceptable for POC)
902 * Display loading indicator to user
903 * Show progress if possible ("Extracting claims...", "Generating verdicts...")
904
905 **Not Required:**
906
907 * Production-level speed (< 30 seconds)
908 * Optimization for scale
909 * Caching
910
911 **Acceptance Criteria:**
912
913 * Analysis completes within 5 minutes
914 * User sees loading indicator
915 * No timeout errors
916
917 === NFR-POC-3: Reliability ===
918
919 **Requirement:** System works for manual testing sessions
920
921 **Acceptable:**
922
923 * Occasional errors (< 20% failure rate)
924 * Manual restart if needed
925 * Display error messages clearly
926
927 **Not Required:**
928
929 * 99.9% uptime
930 * Automatic error recovery
931 * Production monitoring
932
933 **Acceptance Criteria:**
934
935 * System works for test demonstrations
936 * Errors are handled gracefully
937 * User receives clear error messages
938
939 === NFR-POC-4: Environment ===
940
941 **Requirement:** Runs on simple infrastructure
942
943 **Acceptable:**
944
945 * Single machine or simple cloud setup
946 * No distributed architecture
947 * No load balancing
948 * No redundancy
949 * Local development environment viable
950
951 **Not Required:**
952
953 * Production infrastructure
954 * Multi-region deployment
955 * Auto-scaling
956 * Disaster recovery
957
958 === NFR-POC-5: Cost Efficiency Tracking ===
959
960 **Requirement:** Track and display LLM usage metrics to inform optimization decisions
961
962 **Must Track:**
963
964 * Input tokens (article + prompt)
965 * Output tokens (generated analysis)
966 * Total tokens
967 * Estimated cost (USD)
968 * Response time (seconds)
969 * Article length (words/characters)
970
971 **Must Display:**
972
973 * Usage statistics in UI (Component 5)
974 * Cost per analysis
975 * Cost per claim extracted
976
977 **Must Log:**
978
979 * Aggregate metrics for analysis
980 * Cost distribution by article length
981 * Token efficiency trends
982
983 **Purpose:**
984
985 * Understand unit economics
986 * Identify optimization opportunities
987 * Project costs at scale
988 * Inform architecture decisions (caching, model selection, etc.)
989
990 **Acceptance Criteria:**
991
992 * ✅ Usage data displayed after each analysis
993 * ✅ Metrics logged for aggregate analysis
994 * ✅ Cost calculated accurately (Claude API pricing)
995 * ✅ Test cases include varying article lengths
996 * ✅ POC1 report includes cost analysis section
997
998 **Success Target:**
999
1000 * Average cost per analysis < $0.05 USD
1001 * Cost scaling behavior understood (linear/exponential)
1002 * 2+ optimization opportunities identified
1003
1004 **Critical:** Unit economics must be viable for scaling decision!
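The metrics above reduce to a small helper. Note that the per-million-token prices below are placeholder values for illustration, not actual Anthropic pricing, and the field names are assumptions rather than a prescribed schema:

{{code language="python"}}
from dataclasses import dataclass

# Illustrative per-million-token prices (assumed, NOT real Claude API pricing).
PRICE_PER_MTOK_INPUT = 3.00    # USD per 1M input tokens
PRICE_PER_MTOK_OUTPUT = 15.00  # USD per 1M output tokens

@dataclass
class UsageMetrics:
    """Per-analysis usage record, as required by NFR-POC-5."""
    input_tokens: int
    output_tokens: int
    response_time_s: float
    article_chars: int
    claims_extracted: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def estimated_cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_MTOK_INPUT
                + self.output_tokens * PRICE_PER_MTOK_OUTPUT) / 1_000_000

    @property
    def cost_per_claim_usd(self) -> float:
        # Guard against division by zero when no claims were extracted.
        return self.estimated_cost_usd / max(self.claims_extracted, 1)

m = UsageMetrics(input_tokens=4000, output_tokens=1200,
                 response_time_s=42.0, article_chars=9500, claims_extracted=4)
print(f"total={m.total_tokens} cost=${m.estimated_cost_usd:.4f} "
      f"per-claim=${m.cost_per_claim_usd:.4f}")
# → total=5200 cost=$0.0300 per-claim=$0.0075
{{/code}}

At these assumed prices a typical analysis lands around $0.03, inside the < $0.05 success target; logging one such record per analysis yields the aggregate data the requirement asks for.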
1005
1006 == 10. Technical Architecture ==
1007
1008 === 10.1 System Components ===
1009
1010 **Frontend:**
1011
1012 * Simple HTML form (text input + URL input + button)
1013 * Loading indicator
1014 * Results display page (single page, no tabs/navigation)
1015
1016 **Backend:**
1017
1018 * Single API endpoint
1019 * Calls provider API (REASONING model; configured via LLM abstraction)
1020 * Parses response
1021 * Returns JSON to frontend
1022
1023 **Data Storage:**
1024
1025 * None required (stateless POC)
1026 * Optional: Simple file storage or SQLite for demo examples
1027
1028 **External Services:**
1029
1030 * Claude API (Anthropic) - required
1031 * Optional: URL fetch service for article text extraction
1032
1033 === 10.2 Processing Flow ===
1034
1035 {{code}}
1036 1. User submits text or URL
1037
1038 2. Backend receives request
1039
1040 3. If URL: Fetch article text
1041
1042 4. Call Claude API with single prompt:
1043 "Extract claims, evaluate each, provide verdicts"
1044
1045 5. Claude API returns:
1046 - Analysis summary
1047 - Claims list
1048 - Verdicts for each claim (with risk tiers)
1049 - Article summary (optional)
1050 - Quality gate results
1051
1052 6. Backend parses response
1053
1054 7. Frontend displays results with Mode 2 labeling
1055 {{/code}}
1056
1057 **Key Simplification:** Single API call does entire analysis
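The flow above (steps 2-6) can be sketched as a single function with the LLM call injected, so the parsing and structural checks run without network access. The function names, prompt wording, and JSON keys here are illustrative assumptions, not a mandated interface:

{{code language="python"}}
import json

def analyze_article(text: str, llm_call) -> dict:
    """Single-call analysis: build the prompt, call the LLM, parse structured JSON."""
    prompt = (
        "Extract claims, evaluate each, provide verdicts. "
        "Return JSON with keys: summary, claims "
        "(each with verdict, confidence, risk_tier, reasoning).\n\n"
        "Article:\n" + text
    )
    raw = llm_call(prompt)           # in production: the Anthropic SDK call
    result = json.loads(raw)         # fail loudly on malformed output
    for claim in result["claims"]:   # basic structural quality gate
        assert {"verdict", "confidence", "risk_tier"} <= claim.keys()
    return result

def fake_llm(prompt: str) -> str:
    """Stubbed LLM response so the flow runs without network access."""
    return json.dumps({
        "summary": "One claim analysed.",
        "claims": [{"text": "Coffee reduces diabetes risk by 30%",
                    "verdict": "LEANING-TRUE", "confidence": 75,
                    "risk_tier": "C",
                    "reasoning": "Observational studies only."}],
    })

out = analyze_article("Coffee reduces the risk of type 2 diabetes by 30%",
                      fake_llm)
print(out["claims"][0]["verdict"])  # → LEANING-TRUE
{{/code}}

Injecting the LLM call also keeps the backend testable, which matters given the < 20% failure-rate tolerance in NFR-POC-3.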
1058
1059 === 10.3 AI Prompt Strategy ===
1060
1061 **Single Comprehensive Prompt:**
1062 {{code}}Task: Analyze this article and provide:
1063
1064 1. Identify the article's main thesis/conclusion
1065 - What is the article trying to argue or prove?
1066 - What is the primary claim or conclusion?
1067
1068 2. Extract 3-5 factual claims from the article
1069 - Note which claims are CENTRAL to the main thesis
1070 - Note which claims are SUPPORTING facts
1071
1072 3. For each claim:
1073 - Determine verdict (7-point scale: TRUE → MOSTLY-TRUE → LEANING-TRUE → MIXED/UNVERIFIED → LEANING-FALSE → MOSTLY-FALSE → FALSE)
1074 - Assign confidence score (0-100%)
1075 - Assign risk tier (A: Medical/Legal/Safety, B: Policy/Science, C: Facts/Definitions)
1076 - Write brief reasoning (1-3 sentences)
1077
1078 4. Assess relationship between claims and main thesis:
1079 - Do the claims actually support the article's conclusion?
1080 - Are there logical leaps or unsupported inferences?
1081 - Is the article's framing misleading even if individual facts are accurate?
1082
1083 5. Run quality gates:
1084 - Check: ≥2 sources found
1085 - Attempt: Basic contradiction search
1086 - Calculate: Confidence scores
1087 - Verify: Structural integrity
1088
1089 6. Write context-aware analysis summary (4-6 sentences):
1090 - State article's main thesis
1091 - Report claims found and verdict distribution
1092 - Note if central claims are problematic
1093 - Assess whether evidence supports conclusion
1094 - Overall credibility considering claim importance
1095
1096 7. Write article summary (3-5 sentences: neutral summary of article content)
1097
1098 Return as structured JSON with quality gate results.{{/code}}
1099
1100 **One prompt generates everything.**
1101
1102 **Critical Addition:**
1103
1104 Steps 1, 2 (marking central claims), 4, and 6 are NEW for context-aware analysis. They test whether the AI can distinguish an article that is "accurate facts, poorly reasoned" from one that is genuinely credible.
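For illustration only, the structured JSON the prompt requests might look like the following; every field name here is an assumption, not a fixed schema:

{{code language="python"}}
# Hypothetical response shape covering prompt steps 1-7.
# All keys and values are illustrative assumptions.
example_response = {
    "main_thesis": "New policy X will cut emissions in half.",
    "claims": [
        {"text": "Emissions fell 5% last year", "role": "CENTRAL",
         "verdict": "MOSTLY-TRUE", "confidence": 80, "risk_tier": "B",
         "reasoning": "Consistent with official statistics."},
        {"text": "Policy X was enacted in 2024", "role": "SUPPORTING",
         "verdict": "TRUE", "confidence": 95, "risk_tier": "C",
         "reasoning": "Well-documented public record."},
    ],
    "thesis_assessment": ("Individual facts hold, but the conclusion "
                          "requires a logical leap they do not support."),
    "quality_gates": {"sources_found": 2, "contradiction_search": "attempted",
                      "confidence_calculated": True,
                      "structural_integrity": True},
    "analysis_summary": "The article's central claim is only partially supported...",
    "article_summary": "Neutral summary of the article content...",
}
central = [c for c in example_response["claims"] if c["role"] == "CENTRAL"]
print(len(central))  # → 1
{{/code}}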
1105
1106 === 10.4 Technology Stack Suggestions ===
1107
1108 **Frontend:**
1109
1110 * HTML + CSS + JavaScript (minimal framework)
1111 * OR: Next.js (if team prefers)
1112 * Hosted: Local machine OR Vercel/Netlify free tier
1113
1114 **Backend:**
1115
1116 * Python Flask/FastAPI (simple REST API)
1117 * OR: Next.js API routes (if using Next.js)
1118 * Hosted: Local machine OR Railway/Render free tier
1119
1120 **AKEL Integration:**
1121
1122 * Claude API via Anthropic SDK
1123 * Model: Provider-default REASONING model or latest available
1124
1125 **Database:**
1126
1127 * None (stateless acceptable)
1128 * OR: SQLite if the team wants to store demo examples
1129 * OR: JSON files on disk
1130
1131 **Deployment:**
1132
1133 * Local development environment sufficient for POC
1134 * Optional: Deploy to cloud for remote demos
1135
1136 == 11. Success Criteria ==
1137
1138 === 11.1 Minimum Success (POC Passes) ===
1139
1140 **Required for GO decision:**
1141
1142 * ✅ AI extracts 3-5 factual claims automatically
1143 * ✅ AI provides verdict for each claim automatically
1144 * ✅ Verdicts are reasonable (≥70% make logical sense)
1145 * ✅ Analysis summary is coherent
1146 * ✅ Output is comprehensible to reviewers
1147 * ✅ Team/advisors understand the output
1148 * ✅ Team agrees approach has merit
1149 * ✅ **Minimal or no manual editing needed** (< 30% of analyses require manual intervention)
1150 * ✅ **Cost efficiency acceptable** (average cost per analysis < $0.05 USD target)
1151 * ✅ **Cost scaling understood** (data collected on article length vs. cost)
1152 * ✅ **Optimization opportunities identified** (≥2 potential improvements documented)
1153
1154 **Quality Definition:**
1155
1156 * "Reasonable verdict" = Defensible given general knowledge
1157 * "Coherent summary" = Logically structured, grammatically correct
1158 * "Comprehensible" = Reviewers understand what analysis means
1159
1160 === 11.2 POC Fails If ===
1161
1162 **Automatic NO-GO if any of these:**
1163
1164 * ❌ Claim extraction poor (< 60% accuracy - extracts non-claims or misses obvious ones)
1165 * ❌ Verdicts nonsensical (< 60% reasonable - contradictory or random)
1166 * ❌ Output incomprehensible (reviewers can't understand analysis)
1167 * ❌ **Requires manual editing for most analyses** (> 50% need human correction)
1168 * ❌ Team loses confidence in AI-automated approach
1169
1170 === 11.3 Quality Thresholds ===
1171
1172 **POC quality expectations:**
1173
1174 |=Component|=Quality Threshold|=Definition
1175 |Claim Extraction|(% class="success" %)≥70% accuracy |Identifies obvious factual claims, may miss some edge cases
1176 |Verdict Logic|(% class="success" %)≥70% defensible |Verdicts are logical given reasoning provided
1177 |Reasoning Clarity|(% class="success" %)≥70% clear |1-3 sentences are understandable and relevant
1178 |Overall Analysis|(% class="success" %)≥70% useful |Output helps user understand article claims
1179
1180 **Analogy:** "B student" quality (70-80%), not "A+" perfection yet
1181
1182 **Not expecting:**
1183
1184 * 100% accuracy
1185 * Perfect claim coverage
1186 * Comprehensive evidence gathering
1187 * Flawless verdicts
1188 * Production polish
1189
1190 **Expecting:**
1191
1192 * Reasonable claim extraction
1193 * Defensible verdicts
1194 * Understandable reasoning
1195 * Useful output
1196
1197 == 12. Test Cases ==
1198
1199 === 12.1 Test Case 1: Simple Factual Claim ===
1200
1201 **Input:** "Coffee reduces the risk of type 2 diabetes by 30%"
1202
1203 **Expected Output:**
1204
1205 * Extract claim correctly
1206 * Provide verdict: MOSTLY-TRUE or LEANING-TRUE
1207 * Confidence: 70-90%
1208 * Risk tier: C (Low)
1209 * Reasoning: Mentions studies or evidence
1210
1211 **Success:** Verdict is reasonable and reasoning makes sense
1212
1213 === 12.2 Test Case 2: Complex News Article ===
1214
1215 **Input:** News article URL with multiple claims about politics/health/science
1216
1217 **Expected Output:**
1218
1219 * Extract 3-5 key claims
1220 * Verdict for each (may vary: some supported, some uncertain, some refuted)
1221 * Coherent analysis summary
1222 * Article summary
1223 * Risk tiers assigned appropriately
1224
1225 **Success:** Claims identified are actually from article, verdicts are reasonable
1226
1227 === 12.3 Test Case 3: Controversial Topic ===
1228
1229 **Input:** Article on contested political or scientific topic
1230
1231 **Expected Output:**
1232
1233 * Balanced analysis
1234 * Acknowledges uncertainty where appropriate
1235 * Doesn't overstate confidence
1236 * Reasoning shows awareness of complexity
1237
1238 **Success:** Analysis is fair and doesn't show obvious bias
1239
1240 === 12.4 Test Case 4: Clearly False Claim ===
1241
1242 **Input:** Article with obviously false claim (e.g., "The Earth is flat")
1243
1244 **Expected Output:**
1245
1246 * Extract claim
1247 * Verdict: FALSE
1248 * High confidence (> 90%)
1249 * Risk tier: C (Low - established fact)
1250 * Clear reasoning
1251
1252 **Success:** AI correctly identifies false claim with high confidence
1253
1254 === 12.5 Test Case 5: Genuinely Uncertain Claim ===
1255
1256 **Input:** Article with claim where evidence is genuinely mixed
1257
1258 **Expected Output:**
1259
1260 * Extract claim
1261 * Verdict: MIXED/UNVERIFIED
1262 * Moderate confidence (40-60%)
1263 * Reasoning explains why uncertain
1264
1265 **Success:** AI recognizes uncertainty and doesn't overstate confidence
1266
1267 === 12.6 Test Case 6: High-Risk Medical Claim ===
1268
1269 **Input:** Article making medical claims
1270
1271 **Expected Output:**
1272
1273 * Extract claim
1274 * Verdict: [appropriate based on evidence]
1275 * Risk tier: A (High - medical)
1276 * Red label displayed
1277 * Clear disclaimer about not being medical advice
1278
1279 **Success:** Risk tier correctly assigned, appropriate warnings shown
1280
1281 == 13. POC Decision Gate ==
1282
1283 === 13.1 Decision Framework ===
1284
1285 After POC testing complete, team makes one of three decisions:
1286
1287 **Option A: GO (Proceed to POC2)**
1288
1289 **Conditions:**
1290
1291 * AI quality ≥70% without manual editing
1292 * Basic claim → verdict pipeline validated
1293 * Internal + advisor feedback positive
1294 * Technical feasibility confirmed
1295 * Team confident in direction
1296 * Clear path to improving AI quality to ≥90%
1297
1298 **Next Steps:**
1299
1300 * Plan POC2 development (add scenarios)
1301 * Design scenario architecture
1302 * Expand to Evidence Model structure
1303 * Test with more complex articles
1304
1305 **Option B: NO-GO (Pivot or Stop)**
1306
1307 **Conditions:**
1308
1309 * AI quality < 60%
1310 * Requires manual editing for most analyses (> 50%)
1311 * Feedback indicates fundamental flaws
1312 * Cost/effort not justified by value
1313 * No clear path to improvement
1314
1315 **Next Steps:**
1316
1317 * **Pivot:** Change to hybrid human-AI approach (accept manual review required)
1318 * **Stop:** Conclude approach not viable, revisit later
1319
1320 **Option C: ITERATE (Improve POC)**
1321
1322 **Conditions:**
1323
1324 * Concept has merit but execution needs work
1325 * Specific improvements identified
1326 * Addressable with better prompts/approach
1327 * AI quality 60-70%
1328
1329 **Next Steps:**
1330
1331 * Improve AI prompts
1332 * Test different approaches
1333 * Re-run POC with improvements
1334 * Then make GO/NO-GO decision
1335
1336 === 13.2 Decision Criteria Summary ===
1337
1338 {{code}}
1339 AI Quality < 60% → NO-GO (approach doesn't work)
1340 AI Quality 60-70% → ITERATE (improve and retry)
1341 AI Quality ≥70% → GO (proceed to POC2)
1342 {{/code}}
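The mapping above can be written as a one-line decision function (a sketch using the thresholds exactly as stated):

{{code language="python"}}
def poc_decision(ai_quality_pct: float) -> str:
    """Map measured AI quality to the POC gate decision."""
    if ai_quality_pct >= 70:
        return "GO"        # proceed to POC2
    if ai_quality_pct >= 60:
        return "ITERATE"   # improve prompts and retry
    return "NO-GO"         # approach doesn't work

print(poc_decision(72), poc_decision(65), poc_decision(55))
# → GO ITERATE NO-GO
{{/code}}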
1343
1344 == 14. Key Risks & Mitigations ==
1345
1346 === 14.1 Risk: AI Quality Not Good Enough ===
1347
1348 **Likelihood:** Medium-High
1349 **Impact:** POC fails
1350
1351 **Mitigation:**
1352
1353 * Extensive prompt engineering and testing
1354 * Use best available AI models (role-based selection; configured via LLM abstraction)
1355 * Test with diverse article types
1356 * Iterate on prompts based on results
1357
1358 **Acceptance:** This is what POC tests - be ready for failure
1359
1360 === 14.2 Risk: AI Consistency Issues ===
1361
1362 **Likelihood:** Medium
1363 **Impact:** Works sometimes, fails other times
1364
1365 **Mitigation:**
1366
1367 * Test with 10+ diverse articles
1368 * Measure success rate honestly
1369 * Improve prompts to increase consistency
1370
1371 **Acceptance:** Some variability OK if average quality ≥70%
1372
1373 === 14.3 Risk: Output Incomprehensible ===
1374
1375 **Likelihood:** Low-Medium
1376 **Impact:** Users can't understand analysis
1377
1378 **Mitigation:**
1379
1380 * Create clear explainer document
1381 * Iterate on output format
1382 * Test with non-technical reviewers
1383 * Simplify language if needed
1384
1385 **Acceptance:** Iterate until comprehensible
1386
1387 === 14.4 Risk: API Rate Limits / Costs ===
1388
1389 **Likelihood:** Low
1390 **Impact:** System slow or expensive
1391
1392 **Mitigation:**
1393
1394 * Monitor API usage
1395 * Implement retry logic
1396 * Estimate costs before scaling
1397
1398 **Acceptance:** POC can be slow and expensive (optimization later)
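The retry-logic mitigation might look like simple exponential backoff around the API call; `RetryableAPIError` and `call_llm` are placeholders for whatever the chosen SDK actually raises and exposes:

{{code language="python"}}
import time

class RetryableAPIError(Exception):
    """Placeholder for rate-limit / transient errors from the provider SDK."""

def call_with_retry(call_llm, prompt: str, max_attempts: int = 4,
                    base_delay_s: float = 1.0):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call_llm(prompt)
        except RetryableAPIError:
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the error to the caller
            time.sleep(base_delay_s * 2 ** attempt)

# Demo: a stub that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_llm(prompt):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableAPIError("rate limited")
    return "ok"

print(call_with_retry(flaky_llm, "analyze", base_delay_s=0.01))  # → ok
{{/code}}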
1399
1400 === 14.5 Risk: Scope Creep ===
1401
1402 **Likelihood:** Medium
1403 **Impact:** POC becomes too complex
1404
1405 **Mitigation:**
1406
1407 * Strict scope discipline
1408 * Say NO to feature additions
1409 * Keep focus on core question
1410
1411 **Acceptance:** POC is minimal by design
1412
1413 == 15. POC Philosophy ==
1414
1415 === 15.1 Core Principles ===
1416
1417 **1. Build Less, Learn More**
1418
1420 * Minimum features to test hypothesis
1421 * Don't build unvalidated features
1422 * Focus on core question only
1423
1424 **2. Fail Fast**
1425
1426 * Quick test of hardest part (AI capability)
1427 * Accept that POC might fail
1428 * Better to discover issues early
1429 * Honest assessment over optimistic hope
1430
1431 **3. Test First, Build Second**
1432
1433 * Validate AI can do this before building platform
1434 * Don't assume it will work
1435 * Let results guide decisions
1436
1437 **4. Automation First**
1438
1439 * No manual editing allowed
1440 * Tests scalability, not just feasibility
1441 * Proves approach can work at scale
1442
1443 **5. Honest Assessment**
1444
1445 * Don't cherry-pick examples
1446 * Don't manually fix bad outputs
1447 * Document failures openly
1448 * Make data-driven decisions
1449
1450 === 15.2 What POC Is ===
1451
1452 ✅ Testing AI capability without humans
1453 ✅ Proving core technical concept
1454 ✅ Fast validation of approach
1455 ✅ Honest assessment of feasibility
1456
1457 === 15.3 What POC Is NOT ===
1458
1459 ❌ Building a product
1460 ❌ Production-ready system
1461 ❌ Feature-complete platform
1462 ❌ Perfectly accurate analysis
1463 ❌ Polished user experience
1464
1465 == 16. Success: Clear Path Forward ==
1468
1469 **If POC succeeds (≥70% AI quality):**
1470
1471 * ✅ Approach validated
1472 * ✅ Proceed to POC2 (add scenarios)
1473 * ✅ Design full Evidence Model structure
1474 * ✅ Test multi-scenario comparison
1475 * ✅ Focus on improving AI quality from 70% → 90%
1476
1477 **If POC fails (< 60% AI quality):**
1478
1479 * ✅ Learn what doesn't work
1480 * ✅ Pivot to different approach
1481 * ✅ OR wait for better AI technology
1482 * ✅ Avoid wasting resources on non-viable approach
1483
1484 **Either way, POC provides clarity.**
1485
1486 == 17. Related Pages ==
1487
1488 * [[User Needs>>Archive.FactHarbor 2026\.02\.08.Specification.Requirements.User Needs.WebHome]]
1489 * [[Requirements>>Archive.FactHarbor 2026\.02\.08.Specification.Requirements.WebHome]]
1490 * [[Gap Analysis>>FactHarbor.Specification.Requirements.GapAnalysis]]
1491 * [[Architecture>>Archive.FactHarbor 2026\.02\.08.Specification.Architecture.WebHome]]
1492 * [[AKEL>>Archive.FactHarbor 2026\.02\.08.Specification.AI Knowledge Extraction Layer (AKEL).WebHome]]
1493 * [[Workflows>>Archive.FactHarbor 2026\.02\.08.Specification.Workflows.WebHome]]
1494
1495 **Document Status:** ✅ Ready for POC Development (Version 2.0 - Updated with Spec Alignment)
1496
1497
1498 === NFR-POC-11: LLM Provider Abstraction (POC1) ===
1499
1500 **Requirement:** POC1 MUST implement LLM abstraction layer with support for multiple providers.
1501
1502 **POC1 Implementation:**
1503
1504 * **Primary Provider:** Anthropic Claude API
1505 * Stage 1: Provider-default FAST model
1506 * Stage 2: Provider-default REASONING model (cached)
1507 * Stage 3: Provider-default REASONING model
1508
1509 * **Provider Interface:** Abstract LLMProvider interface implemented
1510
1511 * **Configuration:** Environment variables for provider selection
1512 * {{code}}LLM_PRIMARY_PROVIDER=anthropic{{/code}}
1513 * {{code}}LLM_STAGE1_MODEL=claude-haiku-4{{/code}}
1514 * {{code}}LLM_STAGE2_MODEL=claude-sonnet-3-5{{/code}}
1515
1516 * **Failover:** Basic error handling with cache fallback for Stage 2
1517
1518 * **Cost Tracking:** Log provider name and cost per request
1519
1520 **Future (POC2/Beta):**
1521
1522 * Secondary provider (OpenAI) with automatic failover
1523 * Admin API for runtime provider switching
1524 * Cost comparison dashboard
1525 * Cross-provider output verification
1526
1527 **Success Criteria:**
1528
1529 * All LLM calls go through abstraction layer (no direct API calls)
1530 * Provider can be changed via environment variable without code changes
1531 * Cost tracking includes provider name in logs
1532 * Stage 2 falls back to cache on provider failure
1533
1534 **Implementation:** See [[POC1 API & Schemas Specification>>Test.FactHarbor.Specification.POC.API-and-Schemas.WebHome]] Section 6
1535
1536 **Dependencies:**
1537
1538 * NFR-14 (Main Requirements)
1539 * Design Decision 9
1540 * Architecture Section 2.2
1541
1542 **Priority:** HIGH (P1)
1543
1544 **Rationale:** Even though POC1 uses single provider, abstraction must be in place from start to avoid costly refactoring later.
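A minimal sketch of the abstraction this NFR requires, assuming the environment variables listed above; the interface details (method names, registry) are illustrative, not prescribed:

{{code language="python"}}
import os
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Abstract interface: every LLM call goes through this, never a raw SDK call."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class AnthropicProvider(LLMProvider):
    def complete(self, model: str, prompt: str) -> str:
        # A real implementation would call the Anthropic SDK here;
        # this stub just echoes the model name for demonstration.
        return f"[{model}] response"

_REGISTRY = {"anthropic": AnthropicProvider}

def get_provider() -> LLMProvider:
    """Select the provider from LLM_PRIMARY_PROVIDER, with no code changes."""
    name = os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic")
    return _REGISTRY[name]()

provider = get_provider()
stage1_model = os.environ.get("LLM_STAGE1_MODEL", "claude-haiku-4")
print(provider.complete(stage1_model, "Extract claims..."))
{{/code}}

Adding a secondary provider in POC2 then means registering one more class in `_REGISTRY`, with no call-site changes, which is the refactoring cost this NFR is designed to avoid.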