FactHarbor POC1 Architecture Analysis

| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
| **User** | ❌ Missing | No user authentication or role system |
| **Edit** | ❌ Missing | No audit trail for changes |

=== 4.2 AKEL Component Gaps ===

| Spec Component | POC1 Status | Gap Description |
|---|---|---|
| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
| **Embedding Handler** | ❌ Missing | Not needed for POC |
| **Federation Sync** | ❌ Missing | Not needed for POC |

=== 4.3 Architecture Gaps ===

| Spec Requirement | POC1 Status | Gap Description |
|---|---|---|
| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
| **Redis Caching** | ❌ Missing | No caching layer |
| **S3 Archival** | ❌ Missing | No long-term storage |
| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |

=== 4.4 Publication & Review Gaps ===

| Spec Feature | POC1 Status | Gap Description |
|---|---|---|
| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
| **Human Review Queue** | ❌ Missing | No review workflow |
| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
| **Audit Rate Sampling** | ❌ Missing | No sampling audits |

-----

== 5. Optimization Recommendations ==

=== 5.1 Cost Optimizations ===

pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25

| Optimization | Estimated Savings | Implementation Effort |
|---|---|---|
| **Cache claim understanding** | 30-50% on repeated claims | Medium |
| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
| **Batch fact extraction** | 20% fewer API calls | Medium |
| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
| **Reduce max iterations** | Linear reduction | Low (config change) |
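The per-step savings in the table can be translated into a share of total cost using the estimated distribution from the pie chart. The sketch below does that arithmetic; the names `COST_SHARE` and `overallSavingPct` are illustrative and not part of POC1, and the result assumes the whole of Step 2 benefits from the cheaper model, so it is best read as an upper bound.

```typescript
// Estimated share of per-analysis LLM cost by step (percent), taken from the pie chart.
const COST_SHARE = { understand: 15, research: 60, verdicts: 25 } as const;

// Saving as a percentage of TOTAL cost when a single step becomes cheaper by stepSavingPct.
function overallSavingPct(step: keyof typeof COST_SHARE, stepSavingPct: number): number {
  return (COST_SHARE[step] * stepSavingPct) / 100;
}

// "Use Haiku for fact extraction": 40% saving on Step 2, which is ~60% of cost.
const haikuSaving = overallSavingPct("research", 40); // 60 * 40 / 100 = 24
console.log(`Haiku for extraction: up to ~${haikuSaving}% of total LLM cost`);
```

Under these estimates, the low-effort model switch is worth roughly a quarter of total LLM spend per analysis.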

=== 5.2 Timing Optimizations ===

gantt
    title Current Analysis Timeline (Typical)
    dateFormat ss
    axisFormat %S sec

    section Current Flow
    URL Fetch :a1, 00, 2s
    Step 1 Understand :a2, after a1, 15s
    Search Iteration 1 :a3, after a2, 8s
    Fetch Sources 1 :a4, after a3, 10s
    Extract Facts 1 :a5, after a4, 12s
    Search Iteration 2 :a6, after a5, 8s
    Fetch Sources 2 :a7, after a6, 10s
    Extract Facts 2 :a8, after a7, 12s
    Generate Verdicts :a9, after a8, 15s

    section Optimized Flow
    URL Fetch :b1, 00, 2s
    Step 1 Understand :b2, after b1, 10s
    Search + Fetch (parallel) :b3, after b2, 12s
    Extract Facts (batched) :b4, after b3, 8s
    Generate Verdicts :b5, after b4, 10s
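Summing the task durations in the chart gives the end-to-end times the two flows imply. The snippet below is only that arithmetic, taking the chart's typical durations at face value; the variable names are illustrative.

```typescript
// Task durations in seconds, copied from the Gantt chart (both flows are strictly sequential).
const currentFlow = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlow = [2, 10, 12, 8, 10];

const total = (durations: number[]): number =>
  durations.reduce((sum, d) => sum + d, 0);

const currentTotal = total(currentFlow);     // 92 seconds
const optimizedTotal = total(optimizedFlow); // 42 seconds

// Relative reduction in wall-clock time, rounded to a whole percent.
const reductionPct = Math.round((1 - optimizedTotal / currentTotal) * 100); // 54
console.log(`${currentTotal}s -> ${optimizedTotal}s (${reductionPct}% faster)`);
```

So the optimized flow, as drawn, would cut a typical analysis from about 92 seconds to about 42 seconds.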

| Optimization | Time Savings | Notes |
|---|---|---|
| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
| **Search query batching** | 10-15% | Send multiple queries to search API |
| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |

=== 5.3 Priority Recommendations ===

1. **HIGH PRIORITY - Implement Claim Caching**
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)

2. **MEDIUM PRIORITY - Use Tiered Models**
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)

3. **LOW PRIORITY - Add Redis Cache**
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
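The TTL behaviour described for the Redis cache can be illustrated with an in-memory stand-in. This is a minimal sketch, not a Redis client: `TtlCache`, `SOURCE_TTL_MS`, and `SEARCH_TTL_MS` are hypothetical names, and the `now` parameter exists only so that expiry can be exercised without waiting.

```typescript
// In-memory stand-in for a Redis-style cache with a per-entry TTL (hypothetical class).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  // Store a value that expires ttlMs milliseconds after `now`.
  set(key: string, value: V, ttlMs: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  // Return the value if present and not expired; evict lazily on expiry.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const SOURCE_TTL_MS = 24 * HOUR_MS; // source content: 24h TTL
const SEARCH_TTL_MS = 1 * HOUR_MS;  // search results: 1h TTL
```

With a real Redis deployment the same two TTLs would be passed as key expiry times, and eviction would be handled by the server rather than on read.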

-----

== 6. Separated Verdict Architecture Proposal ==

=== 6.1 Current Architecture ===

flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end

**Issues:**

- Every analysis re-processes all claims
- No caching of individual claim verdicts
- Article verdict tightly coupled to claim extraction

=== 6.2 Proposed Separated Architecture ===

flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE[Article/Text Input]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[("Claim Cache<br/>━━━━━━━━━━━━━<br/>Key: claim_hash<br/>TTL: 7 days")]

        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]

        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT["Apply Article Context<br/>━━━━━━━━━━━━━<br/>• Claim relationships<br/>• Logical structure<br/>• Author intent"]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3

    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3

    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE

    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE

    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
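The two layers in the diagram can be sketched as a cache-first claim lookup followed by an aggregation that is recomputed on every run. Everything here is hypothetical: `ClaimVerdictSketch`, `truthScore`, `verdictFor`, and `articleScore` are illustrative names, the real `ClaimVerdict` shape may differ, the plain mean is only a placeholder for context-aware aggregation, and the analysis is shown synchronously although the real pipeline would involve asynchronous LLM and search calls.

```typescript
// Hypothetical verdict shape; truthScore is assumed to lie in [0, 1].
interface ClaimVerdictSketch {
  claimHash: string;
  truthScore: number;
}

type Analyzer = (claim: string) => ClaimVerdictSketch;

// Claim layer: return the cached verdict on a hit, otherwise analyze and store it.
function verdictFor(
  claim: string,
  hash: (c: string) => string,
  cache: Map<string, ClaimVerdictSketch>,
  analyze: Analyzer,
): ClaimVerdictSketch {
  const key = hash(claim);
  const hit = cache.get(key);
  if (hit) return hit; // cache hit: skip research and LLM calls
  const verdict = analyze(claim); // cache miss: run the full claim analysis
  cache.set(key, verdict);
  return verdict;
}

// Article layer: never cached. A plain mean stands in for the
// "Apply Article Context" step, which would weigh claim relationships.
function articleScore(verdicts: ClaimVerdictSketch[]): number {
  if (verdicts.length === 0) return 0;
  return verdicts.reduce((sum, v) => sum + v.truthScore, 0) / verdicts.length;
}
```

Keeping the cache behind `verdictFor` means the article layer does not need to know whether a given claim verdict was freshly computed or reused.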

=== 6.3 Benefits Analysis ===

| Benefit | Impact | Rationale |
|---|---|---|
| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
| **Scalability** | Linear improvement | More users = higher cache hit rate |

=== 6.4 Implementation Considerations ===

**Claim Hashing Strategy:**

{{code language="typescript"}}
import { createHash } from "node:crypto";

function getClaimHash(claim: string): string {
  // Normalize: lowercase, remove punctuation, stem words.
  // normalize() is assumed to be defined elsewhere in the codebase.
  const normalized = normalize(claim);
  // Cache key: first 16 hex characters (64 bits) of the SHA-256 digest
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
{{/code}}
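The `normalize` helper called by `getClaimHash` is not shown in this analysis. A minimal sketch of what it might look like follows, as an assumption rather than the POC1 implementation: it covers lowercasing, punctuation removal, and whitespace collapsing, and leaves stemming out because that would normally come from a dedicated library. It is also ASCII-only, so non-English claims would need a Unicode-aware pattern.

```typescript
// Hypothetical normalize(): lowercase, strip punctuation, collapse whitespace.
// Stemming is intentionally omitted in this sketch.
function normalize(claim: string): string {
  return claim
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "") // drop everything except ASCII letters, digits, whitespace
    .replace(/\s+/g, " ")        // collapse runs of whitespace to a single space
    .trim();
}

// Two surface variants of the same claim normalize to the same string,
// and would therefore produce the same cache key.
console.log(normalize("  COVID vaccines are SAFE!! ") === normalize("Covid vaccines are safe."));
```

Without stemming, "vaccine is safe" and "vaccines are safe" still hash differently, which is one reason exact-match caching is only a first phase.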

**Cache Invalidation Triggers:**

- TTL expiration (default 7 days)
- Major news event related to the claim's topic
- Significant change in a source's track record
- Manual invalidation by a moderator
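The four triggers can be combined into a single staleness check. The sketch below assumes that event detection happens elsewhere and is handed over as timestamps; `CachedClaim`, its fields, and `isStale` are hypothetical names introduced for illustration.

```typescript
// Hypothetical cache-entry metadata; all timestamps are epoch milliseconds.
interface CachedClaim {
  cachedAt: number;
  manuallyInvalidated: boolean;  // set by a moderator
  topicEventAt?: number;         // latest major news event on the claim's topic, if any
  sourceRecordChangedAt?: number; // latest significant source track record change, if any
}

const DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

function isStale(entry: CachedClaim, now: number, ttlMs: number = DEFAULT_TTL_MS): boolean {
  if (entry.manuallyInvalidated) return true;        // manual invalidation
  if (now - entry.cachedAt >= ttlMs) return true;    // TTL expiration
  // Any relevant event that happened after the verdict was cached invalidates it.
  const newerThanCache = (t?: number): boolean => t !== undefined && t > entry.cachedAt;
  return newerThanCache(entry.topicEventAt) || newerThanCache(entry.sourceRecordChangedAt);
}
```

A stale entry would be treated as a cache miss, so the claim is re-analyzed and the entry overwritten.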

**Article Verdict Considerations:**

- Article verdict should ALWAYS be dynamic (never cached)
- Same claims in different article contexts may yield different article verdicts
- Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true

=== 6.5 Recommendation ===

**YES, separating is beneficial** with the following caveats:

1. **Claim verdicts should be cached** with semantic similarity matching (not just exact match)

2. **Article verdicts should always be dynamic** to account for:
   - Claim relationships and logical structure
   - Author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions

3. **Implementation phases:**
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
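Phase 2 replaces the exact hash lookup with a nearest-neighbour search over claim embeddings. The following is a rough sketch of that lookup using cosine similarity and a linear scan; `EmbeddedClaim`, `findSimilar`, and the threshold are illustrative assumptions, and how embeddings are produced is outside its scope.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical cache entry pairing a claim hash with its embedding vector.
interface EmbeddedClaim {
  claimHash: string;
  embedding: number[];
}

// Linear scan: return the most similar cached claim at or above the threshold, else undefined.
function findSimilar(
  query: number[],
  cached: EmbeddedClaim[],
  threshold: number,
): EmbeddedClaim | undefined {
  let best: EmbeddedClaim | undefined;
  let bestScore = threshold;
  for (const candidate of cached) {
    const score = cosineSimilarity(query, candidate.embedding);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

The threshold controls the trade-off between reuse and accuracy: set too low, a cached verdict could be returned for a claim that merely resembles the original, so it would need tuning against real claim pairs.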

-----

== 7. Summary ==

=== Current State ===

- POC1 successfully implements the core AKEL pipeline
- Claim dependency tracking is implemented
- Multiple LLM providers are supported
- No persistent claim storage or caching

=== Key Gaps from Specification ===

- No scenario extraction
- No user/role system
- No audit trail
- No source track record updates
- No review queue

=== Recommended Next Steps ===

1. Implement claim caching layer
2. Separate claim vs article verdict generation
3. Add Redis for source/search caching
4. Implement tiered model selection
5. Add basic audit logging
