Changes for page FactHarbor POC1 Architecture Analysis 1.Jan.26
Last modified by Robert Schaub on 2026/02/08 08:12
From version 6.1
edited by Robert Schaub
on 2026/01/02 10:06
Change comment:
There is no comment for this version
To version 2.1
edited by Robert Schaub
on 2026/01/02 10:01
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -1,9 +1,12 @@
 = FactHarbor POC1 Architecture Analysis =

+
 **Version:** 2.6.17
 **Analysis Date:** January 2026
 **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations

+-----
+
 ----

 == 1. AKEL Flow Diagram (with LLM and WebSearch Interactions) ==
@@ -90,10 +90,12 @@
 class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
 {{/mermaid}}

-----
+-----

+
 == 2. ERD Data Model (Current POC1 Implementation) ==

+
 {{mermaid}}
 erDiagram
 JOB ||--o{ JOB_EVENT : "has"
@@ -183,10 +183,12 @@
 }
 {{/mermaid}}

-----
+-----

+
 == 3. Overall Architecture with Interactions ==

+
 {{mermaid}}
 flowchart TB
 subgraph Client["🖥️ Client Layer"]
@@ -280,12 +280,16 @@
 class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
 {{/mermaid}}

-----
+-----

+
 == 4. Specification vs Implementation Gap Analysis ==

+
+
 === 4.1 Data Model Gaps ===

+
 | Specification Entity | POC1 Status | Gap Description |
 |-|-|-|
 | **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
@@ -313,8 +313,9 @@

 === 4.3 Architecture Gaps ===

+
 | Spec Requirement | POC1 Status | Gap Description |
-| |-|-|
+||-|-|
 | **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
 | **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
 | **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
@@ -325,19 +325,24 @@

 === 4.4 Publication & Review Gaps ===

+
 | Spec Feature | POC1 Status | Gap Description |
-| |-|-|
+||-|-|
 | **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
 | **Human Review Queue** | ❌ Missing | No review workflow |
 | **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
 | **Audit Rate Sampling** | ❌ Missing | No sampling audits |

-----
+-----

+
 == 5. Optimization Recommendations ==

+
+
 === 5.1 Cost Optimizations ===

+
 {{mermaid}}
 pie title Current LLM Cost Distribution (Estimated per Analysis)
 "Step 1: Understand" : 15
@@ -346,7 +346,7 @@
 {{/mermaid}}

 | Optimization | Estimated Savings | Implementation Effort |
-| |-||
+||-----||
 | **Cache claim understanding** | 30-50% on repeated claims | Medium |
 | **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
 | **Batch fact extraction** | 20% fewer API calls | Medium |
@@ -355,6 +355,7 @@

 === 5.2 Timing Optimizations ===

+
 {{mermaid}}
 gantt
 title Current Analysis Timeline (Typical)
@@ -381,7 +381,7 @@
 {{/mermaid}}

 | Optimization | Time Savings | Notes |
-| ||-|
+|||-----|
 | **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
 | **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
 | **Search query batching** | 10-15% | Send multiple queries to search API |
@@ -390,6 +390,7 @@

 === 5.3 Priority Recommendations ===

+
 1. **HIGH PRIORITY - Implement Claim Caching**
 - Cache claim verdicts by content hash
 - Reduces costs for repeated/similar claims
@@ -405,12 +405,16 @@
 - Cache search results (1h TTL)
 - Reduces external API calls

-----
+-----

+
 == 6. Separated Verdict Architecture Proposal ==

+
+
 === 6.1 Current Architecture ===

+
 {{mermaid}}
 flowchart LR
 subgraph Current["Current: Monolithic Analysis"]
@@ -426,8 +426,10 @@
 - No caching of individual claim verdicts
 - Article verdict tightly coupled to claim extraction

+
 === 6.2 Proposed Separated Architecture ===

+
 {{mermaid}}
 flowchart TB
 subgraph Input["Input Processing"]
@@ -480,10 +480,12 @@
 class CONTEXT,ARTICLE_VERDICT dynamic
 {{/mermaid}}

+
 === 6.3 Benefits Analysis ===

+
 | Benefit | Impact | Rationale |
-|-| |-|
+|-| |-----|
 | **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
 | **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
 | **Consistency** | High | Same claim always gets same verdict (until cache expires) |
@@ -527,10 +527,13 @@
 - Phase 2: Semantic similarity caching (embedding-based)
 - Phase 3: Federated claim sharing across instances

-----
+-----

+
 == 7. Summary ==

+
+
 === Current State ===

 - POC1 implements core AKEL pipeline successfully
@@ -538,6 +538,7 @@
 - Multiple LLM providers supported
 - No persistent claim storage or caching

+
 === Key Gaps from Specification ===

 - No scenario extraction
@@ -546,6 +546,7 @@
 - No source track record updates
 - No review queue

+
 === Recommended Next Steps ===

 1. Implement claim caching layer
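
The claim caching recommended in the analysis (cache claim verdicts by content hash, with entries that stay valid until they expire) could be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the POC1 implementation. The names `ClaimVerdictCache`, `normalize_claim`, `claim_key`, and `get_or_compute`, the SHA-256 key, the whitespace and case normalization, and the 24-hour default TTL are all assumptions introduced for the example.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def normalize_claim(text: str) -> str:
    """Collapse whitespace and lowercase the claim (an assumed normalization rule)."""
    return " ".join(text.split()).lower()


def claim_key(text: str) -> str:
    """Content hash of the normalized claim text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(normalize_claim(text).encode("utf-8")).hexdigest()


class ClaimVerdictCache:
    """Hypothetical in-memory verdict cache keyed by content hash, with a TTL."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 3600,  # assumed default; the source gives no verdict TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, claim_text: str) -> Optional[dict]:
        """Return the cached verdict, or None if absent or expired."""
        key = claim_key(claim_text)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, verdict = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired entries are dropped so the next lookup recomputes.
            del self._entries[key]
            return None
        return verdict

    def put(self, claim_text: str, verdict: dict) -> None:
        """Store a verdict under the claim's content hash with the current time."""
        self._entries[claim_key(claim_text)] = (self._clock(), verdict)

    def get_or_compute(self, claim_text: str, compute: Callable[[str], dict]) -> dict:
        """Return a cached verdict, or call `compute` once and cache its result."""
        cached = self.get(claim_text)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        verdict = compute(claim_text)
        self.put(claim_text, verdict)
        return verdict
```

In a pipeline like the one described, `compute` would stand in for the research and verdict steps, so a cache hit skips both the search and the LLM calls. A persistent store would replace the dictionary in any real deployment. Exact-hash keys only match claims that normalize to identical text, which is why the analysis lists embedding-based similarity caching as a later phase.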