FactHarbor POC1 Architecture Analysis

| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in the JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact`, but lacks a `supports` enum and a proper `relevance_score` |
| **Source** | ⚠️ Partial | `FetchedSource` exists, but is missing a `type` enum, `accuracy_history`, `correction_frequency`, and the weekly update scheduler |
| **Scenario** | ❌ Missing | Not implemented; claims are evaluated directly, without scenario contexts |
| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists, but is missing `likelihood_range`, an `uncertainty_factors` array, and a proper `explanation_summary` |
| **User** | ❌ Missing | No user authentication or role system |
| **Edit** | ❌ Missing | No audit trail for changes |
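
To make the Claim gap concrete, the sketch below models the spec's Claim fields from the table as a TypeScript interface. It then reports which of them a claim object still lacks. The field types are assumptions for illustration. `SpecClaim` and `missingSpecFields` are hypothetical names, not POC1 code.

{{code language="typescript"}}
// Hypothetical target shape for the spec's Claim entity; field types are assumptions.
interface SpecClaim {
  text: string;
  status: string;
  confidence_score: number;
  risk_score: number;
  completeness_score: number;
  version: number;
  views: number;
  edit_count: number;
}

// Spec fields listed as missing from POC1 claims in the gap table.
const MISSING_IN_POC1: ReadonlyArray<keyof SpecClaim> = [
  "status",
  "confidence_score",
  "risk_score",
  "completeness_score",
  "version",
  "views",
  "edit_count",
];

// Hypothetical helper: reports which of those spec fields a claim object still lacks.
function missingSpecFields(claim: Record<string, unknown>): string[] {
  return MISSING_IN_POC1.filter((field) => !(field in claim));
}

// A claim as it might appear in a POC1 JSON result (shape assumed for illustration).
console.log(missingSpecFields({ text: "Example claim" }).length); // 7
{{/code}}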

=== 4.2 AKEL Component Gaps ===

| Spec Component | POC1 Status | Gap Description |
|---|---|---|
| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists, but no active contradiction search |
| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
| **Embedding Handler** | ❌ Missing | Not needed for POC |
| **Federation Sync** | ❌ Missing | Not needed for POC |

=== 4.3 Architecture Gaps ===

| Spec Requirement | POC1 Status | Gap Description |
|---|---|---|
| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
| **Redis Caching** | ❌ Missing | No caching layer |
| **S3 Archival** | ❌ Missing | No long-term storage |
| **Background Jobs** | ❌ Missing | No scheduler for source updates or cache warming |
| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |
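
The LLM abstraction row relies on provider failover. Independent of any particular SDK, the pattern can be sketched as follows: try each provider in order and return the first successful response. The `LlmProvider` type, the provider names, and `completeWithFailover` are hypothetical. They do not reflect the AI SDK's actual API.

{{code language="typescript"}}
// Hypothetical provider abstraction; not the AI SDK's real interface.
type LlmProvider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

// Tries providers in order and returns the first success; throws with every error if all fail.
async function completeWithFailover(
  providers: LlmProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      return { provider: p.name, text };
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Usage with stub providers: the first one fails, so the second one answers.
const providers: LlmProvider[] = [
  { name: "primary", complete: async () => { throw new Error("rate limited"); } },
  { name: "fallback", complete: async (prompt) => `echo: ${prompt}` },
];
completeWithFailover(providers, "Is the claim supported?").then((r) => console.log(r.provider));
{{/code}}

Passing the provider list in order keeps the failover policy in one place, so adding or reordering providers is a configuration change rather than a code change.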

=== 4.4 Publication & Review Gaps ===

| Spec Feature | POC1 Status | Gap Description |
|---|---|---|
| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
| **Human Review Queue** | ❌ Missing | No review workflow |
| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
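
Taken together, these gaps amount to a routing decision at publication time. The sketch below shows one possible shape for it. It assumes Tier A is the highest-risk tier. The rule "Tier A to the review queue, B and C published with an AI-generated label" and the audit sampling rates are placeholder assumptions, not values from the specification. `decidePublication` is a hypothetical name.

{{code language="typescript"}}
type RiskTier = "A" | "B" | "C";

interface PublicationDecision {
  action: "review_queue" | "publish";
  aiGeneratedLabel: boolean;
  sampledForAudit: boolean;
}

// Placeholder audit sampling rates for auto-published tiers (assumed values, not from the spec).
const AUDIT_RATE = { B: 0.1, C: 0.02 };

// Hypothetical routing rule, assuming Tier A is the highest-risk tier.
function decidePublication(tier: RiskTier, random: () => number = Math.random): PublicationDecision {
  if (tier === "A") {
    // High-risk results wait in the human review queue instead of publishing immediately.
    return { action: "review_queue", aiGeneratedLabel: true, sampledForAudit: false };
  }
  // Lower tiers publish with an AI-generated label; a random fraction is sampled for audit.
  return { action: "publish", aiGeneratedLabel: true, sampledForAudit: random() < AUDIT_RATE[tier] };
}

console.log(decidePublication("A").action); // review_queue
{{/code}}

Injecting the random source keeps the sampling testable and makes audit rates a per-tier setting.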

----

== 5. Optimization Recommendations ==

=== 5.1 Cost Optimizations ===

{{mermaid}}
pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25
{{/mermaid}}

| Optimization | Estimated Savings | Implementation Effort |
|---|---|---|
| **Cache claim understanding** | 30-50% on repeated claims | Medium |
| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
| **Batch fact extraction** | 20% fewer API calls | Medium |
| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
| **Reduce max iterations** | Linear reduction | Low (config change) |
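
Each step accounts for only part of the total, so a per-step saving is smaller when expressed as a share of the whole analysis: overall saving = step share × step saving. The helper below is illustrative. It only combines the chart's estimated shares with the table's estimated savings. It treats "40% on Step 2 costs" as applying to Step 2's full estimated 60% share.

{{code language="typescript"}}
// Estimated share of per-analysis LLM cost by step, in percent (from the pie chart).
const COST_SHARE_PCT = { understand: 15, research: 60, verdicts: 25 };
type Step = keyof typeof COST_SHARE_PCT;

// Overall saving, as a percent of total LLM cost, when one step's cost drops by stepSavingPct percent.
function overallSavingPct(step: Step, stepSavingPct: number): number {
  return (COST_SHARE_PCT[step] * stepSavingPct) / 100;
}

// "Use Haiku for fact extraction": 40% off Step 2, which is an estimated 60% of the total.
console.log(overallSavingPct("research", 40)); // 24
{{/code}}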

=== 5.2 Timing Optimizations ===

{{mermaid}}
gantt
    title Analysis Timeline: Current vs. Optimized (Typical)
    dateFormat ss
    axisFormat %M:%S
    section Current Flow
    URL Fetch                  :a1, 00, 2s
    Step 1 Understand          :a2, after a1, 15s
    Search Iteration 1         :a3, after a2, 8s
    Fetch Sources 1            :a4, after a3, 10s
    Extract Facts 1            :a5, after a4, 12s
    Search Iteration 2         :a6, after a5, 8s
    Fetch Sources 2            :a7, after a6, 10s
    Extract Facts 2            :a8, after a7, 12s
    Generate Verdicts          :a9, after a8, 15s
    section Optimized Flow
    URL Fetch                  :b1, 00, 2s
    Step 1 Understand          :b2, after b1, 10s
    Search + Fetch (parallel)  :b3, after b2, 12s
    Extract Facts (batched)    :b4, after b3, 8s
    Generate Verdicts          :b5, after b4, 10s
{{/mermaid}}
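
Every task in the chart starts after the previous one, so the end-to-end estimate for each flow is the sum of its durations. The arrays below transcribe the chart's typical values. They are estimates, not measurements.

{{code language="typescript"}}
// Typical task durations in seconds, transcribed from the Gantt chart (sequential tasks).
const currentFlowSec = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlowSec = [2, 10, 12, 8, 10];

const totalSec = (durations: number[]): number => durations.reduce((sum, d) => sum + d, 0);

const currentTotal = totalSec(currentFlowSec); // 92
const optimizedTotal = totalSec(optimizedFlowSec); // 42
const reductionPct = Math.round((1 - optimizedTotal / currentTotal) * 100); // 54
console.log(`${currentTotal}s -> ${optimizedTotal}s (${reductionPct}% less wall-clock time)`);
{{/code}}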

| Optimization | Time Savings | Notes |
|---|---|---|
| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
| **Streaming LLM responses** | 20-30% (perceived) | Users see progress sooner |
| **Search query batching** | 10-15% | Send multiple queries to the search API in one batch |
| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku instead of Sonnet |
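
The first row notes that POC1 already fetches 3 sources in parallel. For comparison, a generic bounded-concurrency helper of that kind looks roughly like the sketch below. `mapWithConcurrency` is a hypothetical name, not the POC1 function. The commented usage line assumes a runtime with a global `fetch` (e.g. Node 18+).

{{code language="typescript"}}
// Runs `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage (assumes a global fetch): fetch up to 3 source pages at a time.
// const pages = await mapWithConcurrency(urls, 3, (u) => fetch(u).then((r) => r.text()));
{{/code}}

The helper caps the number of lanes rather than batching items in fixed groups of 3. A slow source therefore does not hold up the rest of a batch.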

=== 5.3 Priority Recommendations ===

1. **HIGH PRIORITY: Implement Claim Caching**
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)
2. **MEDIUM PRIORITY: Use Tiered Models**
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)
3. **LOW PRIORITY: Add Redis Cache**
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
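
The tiered-model and caching recommendations are mostly configuration. One way to express them as a single typed config object is sketched below. The model identifiers are placeholders, so substitute the provider's current model names. `MODEL_BY_STEP`, `CACHE_TTL_SECONDS`, and `modelFor` are hypothetical names.

{{code language="typescript"}}
type PipelineStep = "understand" | "extract" | "verdicts";

// Placeholder model IDs: substitute the provider's actual model names.
const MODEL_BY_STEP: Record<PipelineStep, string> = {
  understand: "claude-sonnet", // needs reasoning
  extract: "claude-haiku", // simple extraction
  verdicts: "claude-sonnet", // needs synthesis
};

// TTLs from the Redis recommendation, in seconds.
const CACHE_TTL_SECONDS = {
  sourceContent: 24 * 60 * 60, // 24h
  searchResults: 60 * 60, // 1h
} as const;

function modelFor(step: PipelineStep): string {
  return MODEL_BY_STEP[step];
}

console.log(modelFor("extract"), CACHE_TTL_SECONDS.sourceContent); // claude-haiku 86400
{{/code}}

With Redis, these values would be passed as the expiry when each key is written (for example `SET key value EX 86400`).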

----

== 6. Separated Verdict Architecture Proposal ==

=== 6.1 Current Architecture ===

{{mermaid}}
flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end
{{/mermaid}}

**Issues:**

- Every analysis re-processes all claims
- Individual claim verdicts are not cached
- The article verdict is tightly coupled to claim extraction

=== 6.2 Proposed Separated Architecture ===

{{mermaid}}
flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE["Article/Text Input"]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[("Claim Cache<br/>Key: claim_hash<br/>TTL: 7 days")]
        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]
        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT["Apply Article Context<br/>• Claim relationships<br/>• Logical structure<br/>• Author intent"]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3
    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3
    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE
    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE
    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
{{/mermaid}}
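
The proposed flow can be sketched as a single function. Each extracted claim is looked up by hash and analyzed only on a cache miss. The article verdict is always recomputed from the claim verdicts. `Verdict`, `claimKey`, and `analyzeArticle` are simplified, hypothetical stand-ins (POC1's `ClaimVerdict` carries more fields), and a `Map` stands in for the claim cache.

{{code language="typescript"}}
import { createHash } from "node:crypto";

// Simplified, hypothetical verdict shape (POC1's ClaimVerdict has more fields).
interface Verdict {
  claim: string;
  truthScore: number;
}

// Exact-match cache key: lowercase and collapse whitespace, then truncate a SHA-256 digest.
const claimKey = (claim: string): string =>
  createHash("sha256").update(claim.toLowerCase().replace(/\s+/g, " ").trim()).digest("hex").slice(0, 16);

async function analyzeArticle(
  claims: string[],
  cache: Map<string, Verdict>,
  analyzeClaim: (claim: string) => Promise<Verdict>, // expensive: research + LLM calls
  aggregate: (verdicts: Verdict[]) => string, // article level: always recomputed, never cached
): Promise<string> {
  const verdicts: Verdict[] = [];
  for (const claim of claims) {
    const key = claimKey(claim);
    let verdict = cache.get(key);
    if (!verdict) {
      verdict = await analyzeClaim(claim); // cache miss: run the full claim analysis
      cache.set(key, verdict);
    }
    verdicts.push(verdict);
  }
  return aggregate(verdicts);
}
{{/code}}

Claims are processed sequentially here for clarity. Cache misses could equally be analyzed in parallel without changing the caching logic.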

=== 6.3 Benefits Analysis ===

| Benefit | Impact | Rationale |
|---|---|---|
| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
| **Faster Analysis** | 50%+ for cached claims | Skips research and LLM calls for known claims |
| **Consistency** | High | The same claim always gets the same verdict (until its cache entry expires) |
| **Freshness Control** | Configurable TTL | Balances consistency against new evidence |
| **Scalability** | Improves with usage | More users mean a higher cache hit rate |

=== 6.4 Implementation Considerations ===

**Claim Hashing Strategy:**

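{{code language="typescript"}}
import { createHash } from "node:crypto";

// Normalize: lowercase, strip punctuation, collapse whitespace.
// Stemming is not applied in this sketch; a stemmer would slot in here.
const normalize = (claim: string): string =>
  claim.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").replace(/\s+/g, " ").trim();

function getClaimHash(claim: string): string {
  const normalized = normalize(claim);
  // Cache key: first 16 hex characters (64 bits) of the SHA-256 digest
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
{{/code}}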

**Cache Invalidation Triggers:**

504

- TTL expiration (default 7 days)

505

- Major news event related to claim topic

506

- Source track record significant change

507

- Manual invalidation by moderator

508

509
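
These triggers map onto a small cache interface. Entries expire after a TTL. They can also be dropped individually (moderator action) or in bulk by predicate, for example every entry on a topic hit by a major news event, or every entry that relies on a source whose track record changed. The in-memory class below is a hypothetical sketch. The `topic` and `sourceIds` metadata fields are assumptions, and a production version would sit on a shared store rather than a `Map`.

{{code language="typescript"}}
// Hypothetical metadata kept with each cached claim verdict (topic and sourceIds are assumptions).
interface CacheEntry<V> {
  value: V;
  topic: string;
  sourceIds: string[];
  expiresAt: number; // epoch milliseconds
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// In-memory stand-in for the claim cache; a shared store would replace the Map in production.
class ClaimVerdictCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = SEVEN_DAYS_MS, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(hash: string, value: V, topic: string, sourceIds: string[]): void {
    this.entries.set(hash, { value, topic, sourceIds, expiresAt: this.now() + this.ttlMs });
  }

  // TTL expiration: expired entries are dropped lazily when read.
  get(hash: string): V | undefined {
    const entry = this.entries.get(hash);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(hash);
      return undefined;
    }
    return entry.value;
  }

  // Manual invalidation by a moderator.
  invalidate(hash: string): boolean {
    return this.entries.delete(hash);
  }

  // Bulk invalidation, e.g. by topic (news event) or by source (track record change).
  invalidateWhere(predicate: (entry: CacheEntry<V>) => boolean): number {
    const doomed: string[] = [];
    this.entries.forEach((entry, hash) => {
      if (predicate(entry)) doomed.push(hash);
    });
    doomed.forEach((hash) => this.entries.delete(hash));
    return doomed.length;
  }
}
{{/code}}

Injecting the clock (`now`) keeps TTL behaviour testable without waiting seven days.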

**Article Verdict Considerations:**

510

- Article verdict should ALWAYS be dynamic (never cached)

511

- Same claims in different article contexts may yield different article verdicts

512

- Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true

513

514
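
An article verdict cannot be reduced to an average of cached claim verdicts. To illustrate, consider a toy rule that looks at claim roles: when the article's conclusion is rated false while its supporting claims are true, the article is flagged as misleading. The `role` field loosely echoes the claim-role tracking in `understandClaim()`. The role names, the 0.5 threshold, the verdict labels, and `articleVerdict` itself are illustrative assumptions, not POC1 logic.

{{code language="typescript"}}
// Illustrative claim roles and score scale (0 = false, 1 = true); both are assumptions.
type ClaimRole = "conclusion" | "supporting";

interface RatedClaim {
  text: string;
  role: ClaimRole;
  truthScore: number;
}

type ArticleVerdict = "accurate" | "mixed" | "misleading" | "inaccurate";

// Toy article-level rule: the verdict depends on which claims hold, not on their average.
function articleVerdict(claims: RatedClaim[], threshold = 0.5): ArticleVerdict {
  const isTrue = (c: RatedClaim): boolean => c.truthScore >= threshold;
  const conclusions = claims.filter((c) => c.role === "conclusion");
  const supporting = claims.filter((c) => c.role === "supporting");
  const conclusionHolds = conclusions.every(isTrue);
  const supportHolds = supporting.every(isTrue);

  if (conclusionHolds) return supportHolds ? "accurate" : "mixed";
  // True supporting claims used to back a false conclusion.
  if (supporting.length > 0 && supportHolds) return "misleading";
  return "inaccurate";
}

const article: RatedClaim[] = [
  { text: "Supporting fact 1", role: "supporting", truthScore: 0.9 },
  { text: "Supporting fact 2", role: "supporting", truthScore: 0.8 },
  { text: "Main conclusion", role: "conclusion", truthScore: 0.1 },
];
// The mean score is about 0.6, yet the article is flagged rather than rated "mostly true".
console.log(articleVerdict(article)); // misleading
{{/code}}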

=== 6.5 Recommendation ===

**Yes, separating claim and article verdicts is beneficial**, with the following caveats:

1. **Claim verdicts should be cached**, with semantic similarity matching (not just exact matches)
2. **Article verdicts should always be dynamic** to account for:
   - Claim relationships and logical structure
   - The author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions
3. **Implementation phases:**
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
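
Phase 2 could replace the exact hash lookup with a nearest-neighbour search over claim embeddings. The minimal sketch below assumes embeddings are already produced by some embedding model (not shown) and uses a linear scan with cosine similarity. The 0.92 threshold is an arbitrary placeholder that would need tuning on real paraphrase pairs. At scale, a vector index would replace the scan. `findSimilarVerdict` is a hypothetical name.

{{code language="typescript"}}
// Cosine similarity of two equal-length vectors (0 when either vector is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// A cached claim with a precomputed embedding (the embedding model itself is not shown).
interface EmbeddedVerdict<V> {
  embedding: number[];
  verdict: V;
}

// Hypothetical lookup: returns the verdict of the most similar cached claim, if similar enough.
function findSimilarVerdict<V>(
  query: number[],
  cache: EmbeddedVerdict<V>[],
  threshold = 0.92, // placeholder; would need tuning on real paraphrase pairs
): V | undefined {
  let best: EmbeddedVerdict<V> | undefined;
  let bestScore = -Infinity;
  for (const entry of cache) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return best !== undefined && bestScore >= threshold ? best.verdict : undefined;
}
{{/code}}

A conservative threshold trades cache hits for safety. Two claims that differ in one crucial word can still embed closely, so a near-miss should fall back to full analysis.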

----

== 7. Summary ==

=== Current State ===

- POC1 successfully implements the core AKEL pipeline
- Claim dependency tracking is implemented
- Multiple LLM providers are supported
- There is no persistent claim storage or caching

=== Key Gaps from Specification ===

542

543

- No scenario extraction

544

- No user/role system

545

- No audit trail

546

- No source track record updates

547

- No review queue

548

549

=== Recommended Next Steps ===

1. Implement the claim caching layer
2. Separate claim verdict generation from article verdict generation
3. Add Redis for source and search caching
4. Implement tiered model selection
5. Add basic audit logging
