FactHarbor POC1 Architecture Analysis

| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
| **User** | ❌ Missing | No user authentication or role system |
| **Edit** | ❌ Missing | No audit trail for changes |
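
The missing `Claim` fields listed in the table could be captured in a type along the following lines. This is only a sketch: the field names come from the gap description, but every type, the `ClaimStatus` values, the `id` and `text` fields, and the helpers `newClaim` and `applyEdit` are assumptions for illustration, not definitions taken from the specification.

```typescript
// Hypothetical shape for a persisted Claim. Field names marked "from the table"
// come from the gap list; all types, ranges, and defaults are assumptions.
type ClaimStatus = "draft" | "published" | "archived"; // assumed values

interface Claim {
  id: string;                 // assumed identifier
  text: string;               // assumed: the claim statement itself
  status: ClaimStatus;        // from the table
  confidence_score: number;   // from the table; assumed range 0..1
  risk_score: number;         // from the table
  completeness_score: number; // from the table
  version: number;            // from the table
  views: number;              // from the table
  edit_count: number;         // from the table
}

// Creates a claim with neutral starting values (assumed defaults).
function newClaim(id: string, text: string): Claim {
  return {
    id,
    text,
    status: "draft",
    confidence_score: 0,
    risk_score: 0,
    completeness_score: 0,
    version: 1,
    views: 0,
    edit_count: 0,
  };
}

// Returns an edited copy, bumping version and edit_count; the input is not mutated.
function applyEdit(claim: Claim, text: string): Claim {
  return { ...claim, text, version: claim.version + 1, edit_count: claim.edit_count + 1 };
}
```

Keeping `version` and `edit_count` on the record would also give the missing **Edit** audit trail something to reference, although the audit entries themselves would need their own storage.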

=== 4.2 AKEL Component Gaps ===

| Spec Component | POC1 Status | Gap Description |
|----------------|-------------|-----------------|
| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
| **Embedding Handler** | ❌ Missing | Not needed for POC |
| **Federation Sync** | ❌ Missing | Not needed for POC |

=== 4.3 Architecture Gaps ===

| Spec Requirement | POC1 Status | Gap Description |
|------------------|-------------|-----------------|
| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
| **Redis Caching** | ❌ Missing | No caching layer |
| **S3 Archival** | ❌ Missing | No long-term storage |
| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |

=== 4.4 Publication & Review Gaps ===

| Spec Feature | POC1 Status | Gap Description |
|--------------|-------------|-----------------|
| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
| **Human Review Queue** | ❌ Missing | No review workflow |
| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
| **Audit Rate Sampling** | ❌ Missing | No sampling audits |

---

== 5. Optimization Recommendations ==

=== 5.1 Cost Optimizations ===

pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25

| Optimization | Estimated Savings | Implementation Effort |
|--------------|-------------------|----------------------|
| **Cache claim understanding** | 30-50% on repeated claims | Medium |
| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
| **Batch fact extraction** | 20% fewer API calls | Medium |
| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
| **Reduce max iterations** | Linear reduction | Low (config change) |
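
To see what a per-step saving means for the whole analysis, the estimated cost shares from the pie chart can be combined with a saving from the table. The sketch below assumes the 40% Haiku saving applies to the entire 60% Step 2 share, which the source does not state explicitly; the names `costShare` and `projectedTotal` are illustrative only.

```typescript
// Estimated share of per-analysis LLM cost by step, in percent (from the pie chart).
const costShare = { understand: 15, research: 60, verdicts: 25 };

type StepName = keyof typeof costShare;

// Returns the projected total cost as a percentage of the current cost,
// given a fractional saving per step (for example 0.4 means 40% cheaper).
function projectedTotal(savings: Partial<Record<StepName, number>>): number {
  let total = 0;
  for (const step of Object.keys(costShare) as StepName[]) {
    const saving = savings[step] ?? 0;
    total += costShare[step] * (1 - saving);
  }
  return total;
}

// Assumption: the "40% on Step 2 costs" saving covers the whole research share.
const withHaiku = projectedTotal({ research: 0.4 });
console.log(`Projected cost with Haiku extraction: ${withHaiku.toFixed(0)}% of current`);
```

Under that assumption, a 40% saving on a 60-point share removes 24 points, so the analysis would cost roughly 76% of what it does today.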

=== 5.2 Timing Optimizations ===

gantt
    title Current Analysis Timeline (Typical)
    dateFormat ss
    axisFormat %S sec
    section Current Flow
    URL Fetch :a1, 00, 2s
    Step 1 Understand :a2, after a1, 15s
    Search Iteration 1 :a3, after a2, 8s
    Fetch Sources 1 :a4, after a3, 10s
    Extract Facts 1 :a5, after a4, 12s
    Search Iteration 2 :a6, after a5, 8s
    Fetch Sources 2 :a7, after a6, 10s
    Extract Facts 2 :a8, after a7, 12s
    Generate Verdicts :a9, after a8, 15s
    section Optimized Flow
    URL Fetch :b1, 00, 2s
    Step 1 Understand :b2, after b1, 10s
    Search + Fetch (parallel) :b3, after b2, 12s
    Extract Facts (batched) :b4, after b3, 8s
    Generate Verdicts :b5, after b4, 10s
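
Because every stage in both flows starts after the previous one, the end-to-end time is the sum of the stage durations. The short calculation below adds up the durations from the chart; the variable names are illustrative and the durations are the chart's own estimates, not measurements.

```typescript
// Stage durations in seconds, copied from the Gantt chart.
const currentFlow = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlow = [2, 10, 12, 8, 10];

// Sums a list of sequential stage durations.
const sum = (durations: number[]): number => durations.reduce((acc, d) => acc + d, 0);

const currentTotal = sum(currentFlow);     // 92 seconds
const optimizedTotal = sum(optimizedFlow); // 42 seconds

// Fraction of wall-clock time removed by the optimized flow.
const reduction = 1 - optimizedTotal / currentTotal;
console.log(`${currentTotal}s -> ${optimizedTotal}s (${(reduction * 100).toFixed(0)}% faster)`);
```

On these estimates the optimized flow takes 42 seconds instead of 92, a reduction of about 54%.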

| Optimization | Time Savings | Notes |
|--------------|--------------|-------|
| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
| **Search query batching** | 10-15% | Send multiple queries to search API |
| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |

=== 5.3 Priority Recommendations ===

1. **HIGH PRIORITY - Implement Claim Caching**
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)

2. **MEDIUM PRIORITY - Use Tiered Models**
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)

3. **LOW PRIORITY - Add Redis Cache**
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
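
The tiered-model recommendation amounts to a small lookup from pipeline step to model tier. The sketch below shows one way to express it as configuration; `Step`, `ModelTier`, `modelForStep`, and `selectModel` are hypothetical names, and the tier labels stand in for whatever concrete model identifiers the deployment actually uses.

```typescript
// Pipeline steps as named in the recommendation (identifiers are assumed).
type Step = "understand" | "extract" | "verdict";

// Tier labels only; real provider model IDs are deployment-specific.
type ModelTier = "sonnet" | "haiku";

// Default assignment taken from the MEDIUM PRIORITY recommendation.
const modelForStep: Record<Step, ModelTier> = {
  understand: "sonnet", // needs reasoning
  extract: "haiku",     // simple extraction
  verdict: "sonnet",    // needs synthesis
};

// Resolves the tier for a step, letting callers override individual steps
// (for example to compare extraction quality between tiers).
function selectModel(step: Step, overrides: Partial<Record<Step, ModelTier>> = {}): ModelTier {
  return overrides[step] ?? modelForStep[step];
}

console.log(selectModel("extract")); // prints "haiku"
```

Keeping the mapping in one place is what makes this a configuration change rather than a code change, which matches the "Low" effort rating given to the Haiku switch in Section 5.1.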

---

== 6. Separated Verdict Architecture Proposal ==

=== 6.1 Current Architecture ===

flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end

**Issues:**

- Every analysis re-processes all claims
- No caching of individual claim verdicts
- Article verdict tightly coupled to claim extraction

=== 6.2 Proposed Separated Architecture ===

flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE[Article/Text Input]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[(Claim Cache ━━━━━━━━━━━━━ Key: claim_hash TTL: 7 days)]

        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]

        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT[Apply Article Context ━━━━━━━━━━━━━ • Claim relationships • Logical structure • Author intent]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3

    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3

    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE

    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE

    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
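
The cache hit and cache miss paths in the diagram can be sketched as a get-or-analyze lookup keyed by claim hash. This is an in-memory illustration only: `ClaimVerdictCache` and `getOrAnalyze` are hypothetical names, the analysis callback is synchronous for brevity (the real pipeline makes asynchronous LLM and search calls), and a production version would sit on a shared store rather than a `Map`.

```typescript
interface CachedVerdict<V> {
  verdict: V;
  storedAt: number; // epoch milliseconds when the verdict was cached
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000; // TTL from the diagram

class ClaimVerdictCache<V> {
  private entries = new Map<string, CachedVerdict<V>>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number = SEVEN_DAYS_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Cache hit: return the stored verdict. Cache miss or expired entry:
  // run the analysis, store its result, and return it.
  getOrAnalyze(claimHash: string, analyze: () => V): { verdict: V; cacheHit: boolean } {
    const entry = this.entries.get(claimHash);
    if (entry !== undefined && this.now() - entry.storedAt < this.ttlMs) {
      return { verdict: entry.verdict, cacheHit: true };
    }
    const verdict = analyze();
    this.entries.set(claimHash, { verdict, storedAt: this.now() });
    return { verdict, cacheHit: false };
  }

  // Manual invalidation; returns true if an entry was removed.
  invalidate(claimHash: string): boolean {
    return this.entries.delete(claimHash);
  }
}
```

Only the claim layer goes through this lookup. The aggregation and context steps run on every request, which is what keeps the article verdict dynamic.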

=== 6.3 Benefits Analysis ===

| Benefit | Impact | Rationale |
|---------|--------|-----------|
| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
| **Scalability** | Linear improvement | More users = higher cache hit rate |

=== 6.4 Implementation Considerations ===

**Claim Hashing Strategy:**

import * as crypto from "node:crypto";

function getClaimHash(claim: string): string {
  // Normalize: lowercase, remove punctuation, stem words.
  // normalize() is a helper that is not shown here.
  const normalized = normalize(claim);
  // Hash for cache key: first 16 hex characters (64 bits) of the SHA-256 digest
  return crypto.createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}
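
The hashing strategy depends on a normalization step that is not defined here. The following is one possible version, offered as a sketch under stated assumptions: `normalizeClaim` and `claimCacheKey` are hypothetical names, only ASCII punctuation is stripped, and stemming is left out because it would require a stemmer library that this document does not name.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: Unicode NFKC, lowercase, strip ASCII punctuation,
// collapse runs of whitespace. Stemming is intentionally omitted.
function normalizeClaim(claim: string): string {
  return claim
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[!-\/:-@\[-`{-~]/g, "") // ASCII punctuation and symbols
    .replace(/\s+/g, " ")
    .trim();
}

// Cache key: first 16 hex characters (64 bits) of the SHA-256 digest,
// matching the truncation used in the strategy above.
function claimCacheKey(claim: string): string {
  return createHash("sha256").update(normalizeClaim(claim), "utf8").digest("hex").slice(0, 16);
}

// Surface differences in case, spacing, and punctuation map to the same key.
const keyA = claimCacheKey("COVID vaccines are safe.");
const keyB = claimCacheKey("  covid  vaccines are SAFE");
console.log(keyA === keyB);
```

Exact-match hashing of this kind treats reworded claims as different claims, which is why semantic matching is proposed as a later phase.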

**Cache Invalidation Triggers:**

- TTL expiration (default 7 days)
- Major news event related to the claim topic
- Significant change in a source's track record
- Manual invalidation by a moderator

**Article Verdict Considerations:**

- The article verdict should ALWAYS be dynamic (never cached)
- The same claims in different article contexts may yield different article verdicts
- Example: "Vaccines are safe" + "Vaccines cause autism" → the article may be misleading even if the first claim is true
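
The vaccine example can be made concrete with a toy aggregation rule. This is not the POC1 aggregation logic: the verdict labels, the `central` flag, and the rule itself are assumptions chosen to show why the article verdict has to be recomputed even when every claim verdict comes from cache.

```typescript
// Hypothetical labels; the real verdict vocabulary is not defined in this section.
type ClaimLabel = "true" | "false" | "uncertain";
type ArticleLabel = "supported" | "misleading" | "unverified";

interface ClaimResult {
  claim: string;
  verdict: ClaimLabel; // may come straight from the claim cache
  central: boolean;    // assumed: whether the article's conclusion rests on this claim
}

// Toy rule: one false central claim makes the article misleading;
// the article is supported only if all central claims are true.
function articleVerdict(claims: ClaimResult[]): ArticleLabel {
  const central = claims.filter((c) => c.central);
  if (central.some((c) => c.verdict === "false")) return "misleading";
  if (central.length > 0 && central.every((c) => c.verdict === "true")) return "supported";
  return "unverified";
}

const safe: ClaimResult = { claim: "Vaccines are safe", verdict: "true", central: true };
const autism: ClaimResult = { claim: "Vaccines cause autism", verdict: "false", central: true };

console.log(articleVerdict([safe]));         // prints "supported"
console.log(articleVerdict([safe, autism])); // prints "misleading"
```

The cached verdict for "Vaccines are safe" is identical in both calls; only the surrounding claims differ, and that alone changes the article-level result.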

=== 6.5 Recommendation ===

**YES, separating is beneficial** with the following caveats:

1. **Claim verdicts should be cached** with semantic similarity matching (not just exact match)

2. **Article verdicts should always be dynamic** to account for:
   - Claim relationships and logical structure
   - Author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions

3. **Implementation phases:**
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
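
Phase 2 replaces the exact hash lookup with a nearest-neighbour search over claim embeddings. The sketch below assumes the embeddings already exist as plain number arrays; how they are produced is out of scope here. `EmbeddedClaim`, `findSimilarClaim`, and the 0.9 threshold are illustrative choices, not values from the specification.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedClaim {
  claimHash: string;   // key into the claim verdict cache
  embedding: number[]; // assumed to be produced elsewhere
}

// Linear scan for the most similar cached claim at or above the threshold.
// Returns undefined when nothing is close enough, which counts as a cache miss.
function findSimilarClaim(
  query: number[],
  cached: EmbeddedClaim[],
  threshold: number = 0.9, // assumed cut-off; would need tuning on real data
): EmbeddedClaim | undefined {
  let best: EmbeddedClaim | undefined;
  let bestScore = threshold;
  for (const candidate of cached) {
    const score = cosineSimilarity(query, candidate.embedding);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

A linear scan is adequate for a small cache. The threshold is the sensitive parameter: set too low, a claim could inherit the verdict of a different claim that merely looks similar.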

---

== 7. Summary ==

=== Current State ===

- POC1 implements core AKEL pipeline successfully
- Claim dependency tracking is implemented
- Multiple LLM providers supported
- No persistent claim storage or caching

=== Key Gaps from Specification ===

- No scenario extraction
- No user/role system
- No audit trail
- No source track record updates
- No review queue

=== Recommended Next Steps ===

1. Implement claim caching layer
2. Separate claim vs article verdict generation
3. Add Redis for source/search caching
4. Implement tiered model selection
5. Add basic audit logging
