FactHarbor POC1 Architecture Analysis 01.Jan.26

| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
| **User** | ❌ Missing | No user authentication or role system |
| **Edit** | ❌ Missing | No audit trail for changes |
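
To make the Claim gap more concrete, the sketch below shows one way the missing fields could be typed. The field names come from the table above; the types, the assumed 0..1 score ranges, the `ClaimStatus` values, and the `createClaim` helper are illustrative assumptions and are not taken from the specification.

```typescript
// Hypothetical status values; the specification's actual enum is not shown in this analysis.
type ClaimStatus = "draft" | "published" | "archived";

interface Claim {
  id: string;
  text: string;
  status: ClaimStatus;
  confidence_score: number;   // assumed range 0..1
  risk_score: number;         // assumed range 0..1
  completeness_score: number; // assumed range 0..1
  version: number;
  views: number;
  edit_count: number;
}

// Hypothetical helper: fills the fields POC1 does not track yet with neutral defaults.
function createClaim(id: string, text: string): Claim {
  return {
    id,
    text,
    status: "draft",
    confidence_score: 0,
    risk_score: 0,
    completeness_score: 0,
    version: 1,
    views: 0,
    edit_count: 0,
  };
}

const exampleClaim = createClaim("c1", "Example claim text");
console.log(exampleClaim.status, exampleClaim.version);
```

Keeping the spec's snake_case names makes it easy to compare such a type against the data model field by field.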


=== 4.2 AKEL Component Gaps ===

| Spec Component | POC1 Status | Gap Description |
|---|---|---|
| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
| **Embedding Handler** | ❌ Missing | Not needed for POC |
| **Federation Sync** | ❌ Missing | Not needed for POC |

=== 4.3 Architecture Gaps ===

| Spec Requirement | POC1 Status | Gap Description |
|---|---|---|
| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
| **Redis Caching** | ❌ Missing | No caching layer |
| **S3 Archival** | ❌ Missing | No long-term storage |
| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |

=== 4.4 Publication & Review Gaps ===

| Spec Feature | POC1 Status | Gap Description |
|---|---|---|
| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
| **Human Review Queue** | ❌ Missing | No review workflow |
| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
| **Audit Rate Sampling** | ❌ Missing | No sampling audits |

----

== 5. Optimization Recommendations ==


=== 5.1 Cost Optimizations ===

pie title Current LLM Cost Distribution (Estimated per Analysis)
    "Step 1: Understand" : 15
    "Step 2: Research (per source)" : 60
    "Step 3: Verdicts" : 25

| Optimization | Estimated Savings | Implementation Effort |
|---|---|---|
| **Cache claim understanding** | 30-50% on repeated claims | Medium |
| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
| **Batch fact extraction** | 20% fewer API calls | Medium |
| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
| **Reduce max iterations** | Linear reduction | Low (config change) |
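
The per-step savings in the table can be translated into an overall figure by weighting them with the cost shares from the pie chart. The sketch below does this for the Haiku row. It assumes the whole Step 2 share (60%) is fact extraction cost, which the chart does not break down further; `overallSavingPct` is a hypothetical helper name.

```typescript
// Cost shares per step, taken from the pie chart above (percent of one analysis).
const costShare = { understand: 15, research: 60, verdicts: 25 };

// Overall saving (percent of total cost) when a single step becomes cheaper by stepSavingPct.
function overallSavingPct(stepSharePct: number, stepSavingPct: number): number {
  return (stepSharePct * stepSavingPct) / 100;
}

// "Use Haiku for fact extraction": 40% saving on the 60% Step 2 share.
const haikuSaving = overallSavingPct(costShare.research, 40);
console.log(`Haiku on Step 2: ~${haikuSaving}% of total cost`); // ~24%
```

Under these assumptions, a 40% saving on Step 2 is roughly a quarter of the total LLM cost per analysis, which is why the low-effort config change ranks well.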


=== 5.2 Timing Optimizations ===

gantt
    title Current Analysis Timeline (Typical)
    dateFormat ss
    axisFormat %S sec

    section Current Flow
    URL Fetch :a1, 00, 2s
    Step 1 Understand :a2, after a1, 15s
    Search Iteration 1 :a3, after a2, 8s
    Fetch Sources 1 :a4, after a3, 10s
    Extract Facts 1 :a5, after a4, 12s
    Search Iteration 2 :a6, after a5, 8s
    Fetch Sources 2 :a7, after a6, 10s
    Extract Facts 2 :a8, after a7, 12s
    Generate Verdicts :a9, after a8, 15s

    section Optimized Flow
    URL Fetch :b1, 00, 2s
    Step 1 Understand :b2, after b1, 10s
    Search + Fetch (parallel) :b3, after b2, 12s
    Extract Facts (batched) :b4, after b3, 8s
    Generate Verdicts :b5, after b4, 10s
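
Summing the stage durations from the Gantt chart gives the end-to-end effect of the optimized flow. The sketch below assumes the stages run strictly one after another, as the chart draws them; the variable names are illustrative.

```typescript
// Stage durations in seconds, copied from the Gantt chart above.
const currentFlow = [2, 15, 8, 10, 12, 8, 10, 12, 15];
const optimizedFlow = [2, 10, 12, 8, 10];

// Total duration of a strictly sequential list of stages.
const totalSeconds = (stages: number[]): number =>
  stages.reduce((sum, s) => sum + s, 0);

const currentTotal = totalSeconds(currentFlow);     // 92
const optimizedTotal = totalSeconds(optimizedFlow); // 42
const reductionPct = Math.round(
  ((currentTotal - optimizedTotal) / currentTotal) * 100,
);

console.log(`${currentTotal}s -> ${optimizedTotal}s (${reductionPct}% shorter)`);
```

On the chart's own estimates, the optimized flow takes a little under half the time of the current one.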

| Optimization | Time Savings | Notes |
|---|---|---|
| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
| **Search query batching** | 10-15% | Send multiple queries to search API |
| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |
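
As an illustration of the "3 sources in parallel" pattern noted in the table, the sketch below fetches URLs in batches of a configurable size. It is not the POC1 code: `chunk`, `fetchInBatches`, and the injected `fetchOne` callback are hypothetical names, and failed sources are simply skipped.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Fetch all URLs, running at most `parallel` requests at a time.
// `fetchOne` is injected so the sketch does not assume a specific HTTP client.
async function fetchInBatches(
  urls: string[],
  fetchOne: (url: string) => Promise<string>,
  parallel: number = 3,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of chunk(urls, parallel)) {
    // A failed source resolves to null so one error does not abort the batch.
    const settled = await Promise.all(
      batch.map((url) => fetchOne(url).catch(() => null)),
    );
    for (const body of settled) {
      if (body !== null) results.push(body);
    }
  }
  return results;
}
```

Batching is the simplest way to cap concurrency; a sliding-window pool would keep all slots busy but needs more bookkeeping.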


=== 5.3 Priority Recommendations ===

1. **HIGH PRIORITY - Implement Claim Caching**
   - Cache claim verdicts by content hash
   - Reduces costs for repeated/similar claims
   - Enables the separated verdict architecture (see Section 6)

2. **MEDIUM PRIORITY - Use Tiered Models**
   - Step 1 (Understand): Sonnet (needs reasoning)
   - Step 2 (Extract): Haiku (simple extraction)
   - Step 3 (Verdicts): Sonnet (needs synthesis)

3. **LOW PRIORITY - Add Redis Cache**
   - Cache source content (24h TTL)
   - Cache search results (1h TTL)
   - Reduces external API calls
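
The tiered-model recommendation could be expressed as a small lookup table, so that switching a step to another tier stays a config change. This is a sketch under assumptions: the step and tier names below are illustrative labels, and real model identifiers depend on the provider configuration, which this analysis does not show.

```typescript
type PipelineStep = "understand" | "extract" | "verdict";
type ModelTier = "sonnet" | "haiku";

// Tier per step as listed in recommendation 2; actual model IDs are deployment-specific.
const MODEL_TIERS: Record<PipelineStep, ModelTier> = {
  understand: "sonnet", // needs reasoning
  extract: "haiku",     // simple extraction
  verdict: "sonnet",    // needs synthesis
};

// Hypothetical helper that the pipeline would call before each LLM request.
function modelFor(step: PipelineStep): ModelTier {
  return MODEL_TIERS[step];
}

console.log(modelFor("extract")); // "haiku"
```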

----

== 6. Separated Verdict Architecture Proposal ==


=== 6.1 Current Architecture ===

flowchart LR
    subgraph Current["Current: Monolithic Analysis"]
        INPUT[Article Input] --> ANALYZE[Full Analysis Pipeline]
        ANALYZE --> CLAIMS[Claim Verdicts]
        ANALYZE --> ARTICLE[Article Verdict]
        CLAIMS -.->|"Aggregated"| ARTICLE
    end

**Issues:**

- Every analysis re-processes all claims
- No caching of individual claim verdicts
- Article verdict tightly coupled to claim extraction

=== 6.2 Proposed Separated Architecture ===

flowchart TB
    subgraph Input["Input Processing"]
        ARTICLE[Article/Text Input]
        EXTRACT[Claim Extraction]
    end

    subgraph ClaimLayer["Claim Verdict Layer (Cacheable)"]
        CACHE[("Claim Cache<br/>━━━━━━━━━━━━━<br/>Key: claim_hash<br/>TTL: 7 days")]

        CLAIM1["Claim 1 Analysis"]
        CLAIM2["Claim 2 Analysis"]
        CLAIM3["Claim N Analysis"]

        VERDICT1[Claim 1 Verdict]
        VERDICT2[Claim 2 Verdict]
        VERDICT3[Claim N Verdict]
    end

    subgraph ArticleLayer["Article Verdict Layer (Dynamic)"]
        AGGREGATE[Aggregate Claim Verdicts]
        CONTEXT["Apply Article Context<br/>━━━━━━━━━━━━━<br/>• Claim relationships<br/>• Logical structure<br/>• Author intent"]
        ARTICLE_VERDICT[Article Verdict]
    end

    %% Flow
    ARTICLE --> EXTRACT
    EXTRACT --> CLAIM1
    EXTRACT --> CLAIM2
    EXTRACT --> CLAIM3

    CLAIM1 -->|"Cache Miss"| VERDICT1
    CLAIM2 -->|"Cache Hit"| VERDICT2
    CLAIM3 -->|"Cache Miss"| VERDICT3

    CLAIM1 <-.-> CACHE
    CLAIM2 <-.-> CACHE
    CLAIM3 <-.-> CACHE

    VERDICT1 --> AGGREGATE
    VERDICT2 --> AGGREGATE
    VERDICT3 --> AGGREGATE

    AGGREGATE --> CONTEXT
    CONTEXT --> ARTICLE_VERDICT

    classDef cache fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    classDef dynamic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    class CACHE cache
    class CONTEXT,ARTICLE_VERDICT dynamic
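
The cache hit and miss paths in the diagram follow a cache-aside pattern, which can be sketched as below. This is a minimal in-memory stand-in, assuming a 7-day TTL as shown in the diagram; `ClaimVerdictCache`, `lookupVerdict`, and the `analyze` callback are hypothetical names, and verdicts are reduced to plain strings.

```typescript
interface CachedVerdict {
  verdict: string;
  storedAt: number; // epoch milliseconds
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

class ClaimVerdictCache {
  private entries = new Map<string, CachedVerdict>();
  private ttlMs: number;

  constructor(ttlMs: number = SEVEN_DAYS_MS) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached verdict, or undefined on a miss or an expired entry.
  get(claimHash: string, now: number): string | undefined {
    const entry = this.entries.get(claimHash);
    if (entry === undefined) return undefined;
    if (now - entry.storedAt >= this.ttlMs) {
      this.entries.delete(claimHash);
      return undefined;
    }
    return entry.verdict;
  }

  set(claimHash: string, verdict: string, now: number): void {
    this.entries.set(claimHash, { verdict, storedAt: now });
  }
}

// Cache-aside lookup: the expensive analysis only runs on a cache miss.
function lookupVerdict(
  cache: ClaimVerdictCache,
  claimHash: string,
  analyze: (claimHash: string) => string,
  now: number,
): { verdict: string; cacheHit: boolean } {
  const cached = cache.get(claimHash, now);
  if (cached !== undefined) return { verdict: cached, cacheHit: true };
  const verdict = analyze(claimHash);
  cache.set(claimHash, verdict, now);
  return { verdict, cacheHit: false };
}
```

Passing `now` explicitly keeps expiry deterministic in tests; a production version would more likely rely on the TTL support of the backing store.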


=== 6.3 Benefits Analysis ===

| Benefit | Impact | Rationale |
|---|---|---|
| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
| **Scalability** | Linear improvement | More users = higher cache hit rate |

=== 6.4 Implementation Considerations ===


**Claim Hashing Strategy:**

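{{code language="typescript"}}
import { createHash } from "node:crypto";

// Normalize: lowercase, strip punctuation, collapse whitespace (stemming is left out here)
function normalize(claim: string): string {
  return claim.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").replace(/\s+/g, " ").trim();
}

function getClaimHash(claim: string): string {
  // Truncated SHA-256 of the normalized text serves as the cache key (16 hex chars = 64 bits)
  return createHash("sha256").update(normalize(claim)).digest("hex").slice(0, 16);
}
{{/code}}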


**Cache Invalidation Triggers:**

- TTL expiration (default 7 days)
- Major news event related to the claim topic
- Significant change in a source's track record
- Manual invalidation by a moderator
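
The four triggers can be combined into a single check that returns the first reason a cached verdict should be dropped. The sketch below is an assumption about how such signals might be represented; the `CacheEntryMeta` and `InvalidationSignals` shapes, and the idea of topic and source ID sets, are hypothetical and not part of POC1.

```typescript
type InvalidationReason = "manual" | "ttl_expired" | "news_event" | "source_track_record";

interface CacheEntryMeta {
  storedAt: number;    // epoch milliseconds
  topic: string;       // hypothetical topic label for the claim
  sourceIds: string[]; // sources the cached verdict relied on
}

interface InvalidationSignals {
  now: number;
  ttlMs: number;
  topicsWithMajorNews: Set<string>;      // hypothetical news feed output
  sourcesWithChangedRecord: Set<string>; // sources whose track record changed significantly
  manuallyInvalidated: boolean;          // set by a moderator
}

// Returns the first matching trigger, or null when the entry is still valid.
function invalidationReason(
  entry: CacheEntryMeta,
  signals: InvalidationSignals,
): InvalidationReason | null {
  if (signals.manuallyInvalidated) return "manual";
  if (signals.now - entry.storedAt >= signals.ttlMs) return "ttl_expired";
  if (signals.topicsWithMajorNews.has(entry.topic)) return "news_event";
  if (entry.sourceIds.some((id) => signals.sourcesWithChangedRecord.has(id))) {
    return "source_track_record";
  }
  return null;
}
```

Returning a reason instead of a boolean makes it possible to log why a verdict was recomputed.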


**Article Verdict Considerations:**

- Article verdict should ALWAYS be dynamic (never cached)
- Same claims in different article contexts may yield different article verdicts
- Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true
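
The vaccine example can be mirrored with a deliberately simple aggregation rule. The sketch below is a toy under stated assumptions, not the proposed aggregation logic: the verdict labels and the rule "one false claim makes the article misleading" are invented for illustration, and a real article verdict would also weigh claim relationships and framing.

```typescript
type ClaimVerdictLabel = "true" | "false" | "uncertain";
type ArticleVerdictLabel = "supported" | "misleading" | "unverified";

// Toy rule (assumption): a single false claim makes the article misleading,
// no matter how many other claims are true.
function aggregateArticleVerdict(claims: ClaimVerdictLabel[]): ArticleVerdictLabel {
  if (claims.length === 0) return "unverified";
  if (claims.some((v) => v === "false")) return "misleading";
  if (claims.every((v) => v === "true")) return "supported";
  return "unverified";
}

// "Vaccines are safe" (true) + "Vaccines cause autism" (false)
console.log(aggregateArticleVerdict(["true", "false"])); // "misleading"
```

Because the function runs on every analysis, cached claim verdicts can be reused while the article verdict is still recomputed each time.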


=== 6.5 Recommendation ===

**YES, separating is beneficial** with the following caveats:

1. **Claim verdicts should be cached** with semantic similarity matching (not just exact match)

2. **Article verdicts should always be dynamic** to account for:
   - Claim relationships and logical structure
   - Author's argumentative strategy
   - Context and framing
   - Selective use of true claims to support false conclusions

3. **Implementation phases:**
   - Phase 1: Exact-match claim caching (simple hash)
   - Phase 2: Semantic similarity caching (embedding-based)
   - Phase 3: Federated claim sharing across instances
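
Phase 2 could be prototyped with cosine similarity over claim embeddings. The sketch below assumes embeddings are already available as plain number arrays from some embedding model; the `CachedClaim` shape, `findSimilarClaim`, and the 0.9 threshold are illustrative assumptions that would need tuning against real claims.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have equal, non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface CachedClaim {
  claimHash: string;
  embedding: number[];
}

// Returns the most similar cached claim at or above the threshold, if any.
function findSimilarClaim(
  query: number[],
  cached: CachedClaim[],
  threshold: number = 0.9, // assumed cut-off
): CachedClaim | undefined {
  let best: CachedClaim | undefined;
  let bestScore = threshold;
  for (const candidate of cached) {
    const score = cosineSimilarity(query, candidate.embedding);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

A linear scan is enough for a prototype; a larger claim store would call for a vector index.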

----

== 7. Summary ==

=== Current State ===

- POC1 implements core AKEL pipeline successfully
- Claim dependency tracking is implemented
- Multiple LLM providers supported
- No persistent claim storage or caching

=== Key Gaps from Specification ===

- No scenario extraction
- No user/role system
- No audit trail
- No source track record updates
- No review queue

=== Recommended Next Steps ===

1. Implement claim caching layer
2. Separate claim vs article verdict generation
3. Add Redis for source/search caching
4. Implement tiered model selection
5. Add basic audit logging
