FactHarbor POC1 Architecture Analysis

| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in the JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
| **User** | ❌ Missing | No user authentication or role system |
| **Edit** | ❌ Missing | No audit trail for changes |
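The missing `Claim` fields named in the table could be gathered into a type along the following lines. This is only a sketch: the field names come from the gap description, while every type, the status values, the assumed score ranges, and the `newClaim` and `applyEdit` helpers are hypothetical and not taken from the specification.

```typescript
// Hypothetical status values; the specification's actual enum is not shown here.
type ClaimStatus = "draft" | "published" | "archived";

interface Claim {
  text: string;
  status: ClaimStatus;
  confidence_score: number;   // assumed range 0..1
  risk_score: number;         // assumed range 0..1
  completeness_score: number; // assumed range 0..1
  version: number;
  views: number;
  edit_count: number;
}

// Hypothetical helper: create a claim with neutral defaults.
function newClaim(text: string): Claim {
  return {
    text,
    status: "draft",
    confidence_score: 0,
    risk_score: 0,
    completeness_score: 0,
    version: 1,
    views: 0,
    edit_count: 0,
  };
}

// Hypothetical helper: record an edit by bumping version and edit_count
// on a copy, leaving the input object untouched.
function applyEdit(claim: Claim, text: string): Claim {
  return { ...claim, text, version: claim.version + 1, edit_count: claim.edit_count + 1 };
}
```

Keeping `version` and `edit_count` on the claim itself would also give the missing **Edit** audit trail something to reference.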

=== 4.2 AKEL Component Gaps ===

| Spec Component | POC1 Status | Gap Description |
|---|---|---|
| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
| **Embedding Handler** | ❌ Missing | Not needed for POC |
| **Federation Sync** | ❌ Missing | Not needed for POC |

=== 4.3 Architecture Gaps ===

| Spec Requirement | POC1 Status | Gap Description |
|---|---|---|
| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
| **Redis Caching** | ❌ Missing | No caching layer |
| **S3 Archival** | ❌ Missing | No long-term storage |
| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |

=== 4.4 Publication & Review Gaps ===

| Spec Feature | POC1 Status | Gap Description |
|---|---|---|
| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
| **Human Review Queue** | ❌ Missing | No review workflow |
| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
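To make the first gap concrete, tier-based routing could look roughly like the sketch below. The tier letters A/B/C come from the classifier described earlier; the action names, the `examplePolicy` mapping, and the `routeResults` helper are all hypothetical, since the specification's actual publication rules are not reproduced in this analysis.

```typescript
type RiskTier = "A" | "B" | "C";
type PublicationAction = "publish" | "publish_with_label" | "hold_for_review";

// Hypothetical policy: which action each tier triggers. The real rules
// belong to the specification and are not reproduced here.
const examplePolicy: Record<RiskTier, PublicationAction> = {
  A: "hold_for_review",
  B: "publish_with_label",
  C: "publish",
};

interface AnalysisResult {
  id: string;
  tier: RiskTier;
}

// Route each result according to the policy instead of publishing
// everything immediately, grouping results by the action they require.
function routeResults(
  results: AnalysisResult[],
  policy: Record<RiskTier, PublicationAction>,
): Map<PublicationAction, AnalysisResult[]> {
  const routed = new Map<PublicationAction, AnalysisResult[]>();
  for (const result of results) {
    const action = policy[result.tier];
    const bucket = routed.get(action) ?? [];
    bucket.push(result);
    routed.set(action, bucket);
  }
  return routed;
}
```

Passing the policy as data keeps the routing code unchanged if the tier rules are later revised; the `hold_for_review` bucket would feed the missing human review queue.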

----

== 5. Optimization Recommendations ==

=== 5.1 Cost Optimizations ===

{{mermaid}}
pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25
{{/mermaid}}

| Optimization | Estimated Savings | Implementation Effort |
|---|---|---|
| **Cache claim understanding** | 30-50% on repeated claims | Medium |
| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
| **Batch fact extraction** | 20% fewer API calls | Medium |
| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
| **Reduce max iterations** | Linear reduction | Low (config change) |
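A per-step saving only applies to that step's share of the total cost. The sketch below combines the estimated cost shares from the chart with the table's 40% estimate for Step 2; `totalSavings` is a hypothetical helper, and the result is only as reliable as those estimates.

```typescript
// Estimated share of LLM cost per step, in percent (from the chart above).
const costShare = { understand: 15, research: 60, verdicts: 25 } as const;
type Step = keyof typeof costShare;

// Overall saving, in percent of total cost, given a fractional saving per step.
// Steps that are not listed are assumed to save nothing.
function totalSavings(stepSavings: Partial<Record<Step, number>>): number {
  let saved = 0;
  for (const step of Object.keys(costShare) as Step[]) {
    saved += costShare[step] * (stepSavings[step] ?? 0);
  }
  return saved;
}

// A 40% cheaper fact extraction applies only to the 60% research share,
// so the whole analysis becomes roughly 24% cheaper.
const haikuOnly = totalSavings({ research: 0.4 });
console.log(`${haikuOnly.toFixed(1)}% of total cost saved`);
```

The same helper can be used to compare combinations, for example a research saving together with a smaller saving on Step 1.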

=== 5.2 Timing Optimizations ===

{{mermaid}}
gantt
    title Current Analysis Timeline (Typical)
    dateFormat ss
    axisFormat %S sec

    section Current Flow
    URL Fetch :a1, 00, 2s
    Step 1 Understand :a2, after a1, 15s
    Search Iteration 1 :a3, after a2, 8s
    Fetch Sources 1 :a4, after a3, 10s
    Extract Facts 1 :a5, after a4, 12s
    Search Iteration 2 :a6, after a5, 8s
    Fetch Sources 2 :a7, after a6, 10s
    Extract Facts 2 :a8, after a7, 12s
    Generate Verdicts :a9, after a8, 15s

    section Optimized Flow
    URL Fetch :b1, 00, 2s
    Step 1 Understand :b2, after b1, 10s
    Search + Fetch (parallel) :b3, after b2, 12s
    Extract Facts (batched) :b4, after b3, 8s
    Generate Verdicts :b5, after b4, 10s
{{/mermaid}}

| Optimization | Time Savings | Notes |
|---|---|---|
| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
| **Search query batching** | 10-15% | Send multiple queries to search API |
| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |
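The end-to-end effect of the optimized flow can be checked by summing the stage durations from the timeline. The sketch below assumes the stages run strictly one after another, as drawn; the numbers are the diagram's typical estimates, not measurements.

```typescript
// Stage durations in seconds, copied from the timeline above.
const currentFlow = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlow = [2, 10, 12, 8, 10];

const sum = (durations: number[]): number => durations.reduce((a, b) => a + b, 0);

const currentTotal = sum(currentFlow);     // 92 seconds
const optimizedTotal = sum(optimizedFlow); // 42 seconds
const reduction = 1 - optimizedTotal / currentTotal;

console.log(`${currentTotal}s -> ${optimizedTotal}s (${(reduction * 100).toFixed(0)}% less time)`);
```

Under these estimates the optimized flow takes a little under half the time of the current one.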

=== 5.3 Priority Recommendations ===

1. **HIGH PRIORITY - Implement Claim Caching**
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)

2. **MEDIUM PRIORITY - Use Tiered Models**
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)

3. **LOW PRIORITY - Add Redis Cache**
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
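The TTLs proposed for the Redis cache can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: `TtlCache` and `TTL_MS` are hypothetical names, and a real deployment would let Redis handle expiry rather than checking it in application code.

```typescript
// TTLs from the recommendation above, in milliseconds.
const TTL_MS = {
  sourceContent: 24 * 60 * 60 * 1000, // 24h
  searchResults: 60 * 60 * 1000,      // 1h
} as const;

// Hypothetical in-memory cache with per-instance TTL and lazy eviction.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict expired entries when they are read
      return undefined;
    }
    return entry.value;
  }
}
```

One cache instance per kind of data, for example `new TtlCache<string>(TTL_MS.sourceContent)` keyed by URL, keeps the two TTLs independent.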

----

== 6. Separated Verdict Architecture Proposal ==

=== 6.1 Current Architecture ===

{{mermaid}}
flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end
{{/mermaid}}

**Issues:**

- Every analysis re-processes all claims
- No caching of individual claim verdicts
- Article verdict tightly coupled to claim extraction

=== 6.2 Proposed Separated Architecture ===

{{mermaid}}
flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE[Article/Text Input]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[("Claim Cache<br/>━━━━━━━━━━━━━<br/>Key: claim_hash<br/>TTL: 7 days")]

        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]

        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT["Apply Article Context<br/>━━━━━━━━━━━━━<br/>• Claim relationships<br/>• Logical structure<br/>• Author intent"]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3

    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3

    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE

    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE

    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
{{/mermaid}}
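The flow in the diagram can be sketched in code as follows. Everything here is illustrative: `PipelineDeps`, `ClaimCache`, and `analyzeArticle` are hypothetical names, the `rating` field is an assumed verdict shape, and the stages are kept synchronous for brevity even though the real research and LLM steps would be asynchronous.

```typescript
type Verdict = { claim: string; rating: string }; // assumed shape

interface ClaimCache {
  get(key: string): Verdict | undefined;
  set(key: string, verdict: Verdict): void;
}

// Hypothetical hooks standing in for the real pipeline stages.
interface PipelineDeps {
  extractClaims(article: string): string[];
  claimKey(claim: string): string;      // e.g. a hash of the normalized claim
  analyzeClaim(claim: string): Verdict; // the expensive research + LLM step
  articleVerdict(article: string, verdicts: Verdict[]): string;
}

function analyzeArticle(article: string, cache: ClaimCache, deps: PipelineDeps) {
  let cacheHits = 0;
  const verdicts = deps.extractClaims(article).map((claim) => {
    const key = deps.claimKey(claim);
    const cached = cache.get(key);
    if (cached !== undefined) {
      cacheHits++; // cache hit: skip research and LLM calls
      return cached;
    }
    const fresh = deps.analyzeClaim(claim); // cache miss: run the full analysis
    cache.set(key, fresh);
    return fresh;
  });
  // The article verdict is recomputed on every call and never cached.
  return { verdicts, cacheHits, articleVerdict: deps.articleVerdict(article, verdicts) };
}
```

A plain `Map<string, Verdict>` satisfies `ClaimCache` for experiments; a TTL-aware store would replace it to get the 7-day expiry shown in the diagram.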

=== 6.3 Benefits Analysis ===

| Benefit | Impact | Rationale |
|---|---|---|
| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
| **Scalability** | Linear improvement | More users = higher cache hit rate |

=== 6.4 Implementation Considerations ===

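**Claim Hashing Strategy:**

{{code language="typescript"}}
import { createHash } from "node:crypto";

// Normalize: lowercase, remove punctuation, collapse whitespace.
// Stemming is omitted here; it would need a stemming library.
function normalize(claim: string): string {
  return claim.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").replace(/\s+/g, " ").trim();
}

function getClaimHash(claim: string): string {
  // Hash the normalized text; the first 16 hex characters (64 bits) form the cache key
  return createHash("sha256").update(normalize(claim)).digest("hex").slice(0, 16);
}
{{/code}}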

**Cache Invalidation Triggers:**

- TTL expiration (default 7 days)
- Major news event related to the claim topic
- Significant change in a source's track record
- Manual invalidation by a moderator
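The four triggers above could be checked together when a cached verdict is read. The sketch below is hypothetical throughout: `CachedVerdict`, `InvalidationSignals`, and `invalidationReason` are illustrative names, and how major news events or track record changes are detected is left outside the example.

```typescript
interface CachedVerdict {
  topic: string;
  sourceIds: string[];
  cachedAt: number; // epoch milliseconds
}

// Hypothetical inputs; detecting news events or source changes is out of scope.
interface InvalidationSignals {
  now: number;
  ttlMs: number;
  topicsWithMajorNews: Set<string>;
  sourcesWithChangedRecord: Set<string>;
  manuallyInvalidated: boolean;
}

const DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000; // default 7 days

// Returns the first trigger that applies, or null if the entry is still valid.
function invalidationReason(entry: CachedVerdict, signals: InvalidationSignals): string | null {
  if (signals.now - entry.cachedAt >= signals.ttlMs) return "ttl_expired";
  if (signals.topicsWithMajorNews.has(entry.topic)) return "major_news_event";
  if (entry.sourceIds.some((id) => signals.sourcesWithChangedRecord.has(id))) {
    return "source_track_record_changed";
  }
  if (signals.manuallyInvalidated) return "manual";
  return null;
}
```

Returning the reason rather than a boolean would let the missing audit trail record why a verdict was recomputed.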

**Article Verdict Considerations:**

- The article verdict should ALWAYS be dynamic (never cached)
- The same claims in different article contexts may yield different article verdicts
- Example: "Vaccines are safe" + "Vaccines cause autism" → the article may be misleading even if the first claim is true

=== 6.5 Recommendation ===

**Yes, separating claim verdicts from article verdicts is beneficial**, with the following caveats:

1. **Claim verdicts should be cached** with semantic similarity matching (not just exact match)

2. **Article verdicts should always be dynamic** to account for:
   - Claim relationships and logical structure
   - The author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions

3. **Implementation phases:**
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
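Phase 2 replaces exact hash lookup with a nearest-neighbour search over claim embeddings. The sketch below shows one common approach, cosine similarity with a threshold. It assumes embeddings are already available as plain number arrays; `CachedClaim`, `findSimilarClaim`, and the 0.9 default threshold are hypothetical, and a production system would likely use a vector index instead of a linear scan.

```typescript
interface CachedClaim {
  text: string;
  embedding: number[];
  verdict: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / Math.sqrt(normA * normB);
}

// Linear scan over the cache; returns the most similar entry whose score
// is at or above the threshold, or undefined if none qualifies.
function findSimilarClaim(
  query: number[],
  cache: CachedClaim[],
  threshold = 0.9, // assumed default; needs tuning against real claims
): CachedClaim | undefined {
  let best: CachedClaim | undefined;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

The threshold controls the trade-off: set too low, distinct claims would share a verdict; set too high, the cache behaves like exact matching again.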

----

== 7. Summary ==

=== Current State ===

- POC1 successfully implements the core AKEL pipeline
- Claim dependency tracking is implemented
- Multiple LLM providers are supported
- No persistent claim storage or caching

=== Key Gaps from Specification ===

- No scenario extraction
- No user/role system
- No audit trail
- No source track record updates
- No review queue

=== Recommended Next Steps ===

1. Implement claim caching layer
2. Separate claim vs article verdict generation
3. Add Redis for source/search caching
4. Implement tiered model selection
5. Add basic audit logging
