Changes for page FactHarbor POC1 Architecture Analysis 1.Jan.26
Last modified by Robert Schaub on 2026/02/08 08:12
From version 2.1
edited by Robert Schaub
on 2026/01/02 10:01
Change comment:
There is no comment for this version
To version 6.1
edited by Robert Schaub
on 2026/01/02 10:06
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -1,12 +1,9 @@
 = FactHarbor POC1 Architecture Analysis =
 
-
 **Version:** 2.6.17
 **Analysis Date:** January 2026
 **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations
 
------
-
 ----
 
 == 1. AKEL Flow Diagram (with LLM and WebSearch Interactions) ==
@@ -93,12 +93,10 @@
 class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
 {{/mermaid}}
 
------
+----
 
-
 == 2. ERD Data Model (Current POC1 Implementation) ==
 
-
 {{mermaid}}
 erDiagram
 JOB ||--o{ JOB_EVENT : "has"
@@ -188,12 +188,10 @@
 }
 {{/mermaid}}
 
------
+----
 
-
 == 3. Overall Architecture with Interactions ==
 
-
 {{mermaid}}
 flowchart TB
 subgraph Client["🖥️ Client Layer"]
@@ -287,16 +287,12 @@
 class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
 {{/mermaid}}
 
------
+----
 
-
 == 4. Specification vs Implementation Gap Analysis ==
 
-
-
 === 4.1 Data Model Gaps ===
 
-
 | Specification Entity | POC1 Status | Gap Description |
 |-|-|-|
 | **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
@@ -324,9 +324,8 @@
 
 === 4.3 Architecture Gaps ===
 
-
 | Spec Requirement | POC1 Status | Gap Description |
-||-|-|
+| |-|-|
 | **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
 | **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
 | **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
@@ -337,24 +337,19 @@
 
 === 4.4 Publication & Review Gaps ===
 
-
 | Spec Feature | POC1 Status | Gap Description |
-||-|-|
+| |-|-|
 | **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
 | **Human Review Queue** | ❌ Missing | No review workflow |
 | **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
 | **Audit Rate Sampling** | ❌ Missing | No sampling audits |
 
------
+----
 
-
 == 5. Optimization Recommendations ==
 
-
-
 === 5.1 Cost Optimizations ===
 
-
 {{mermaid}}
 pie title Current LLM Cost Distribution (Estimated per Analysis)
 "Step 1: Understand" : 15
@@ -363,7 +363,7 @@
 {{/mermaid}}
 
 | Optimization | Estimated Savings | Implementation Effort |
-||-----||
+| |-| |
 | **Cache claim understanding** | 30-50% on repeated claims | Medium |
 | **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
 | **Batch fact extraction** | 20% fewer API calls | Medium |
@@ -372,7 +372,6 @@
 
 === 5.2 Timing Optimizations ===
 
-
 {{mermaid}}
 gantt
 title Current Analysis Timeline (Typical)
@@ -399,7 +399,7 @@
 {{/mermaid}}
 
 | Optimization | Time Savings | Notes |
-|||-----|
+| | |-|
 | **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
 | **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
 | **Search query batching** | 10-15% | Send multiple queries to search API |
@@ -408,7 +408,6 @@
 
 === 5.3 Priority Recommendations ===
 
-
 1. **HIGH PRIORITY - Implement Claim Caching**
 - Cache claim verdicts by content hash
 - Reduces costs for repeated/similar claims
@@ -424,16 +424,12 @@
 - Cache search results (1h TTL)
 - Reduces external API calls
 
------
+----
 
-
 == 6. Separated Verdict Architecture Proposal ==
 
-
-
 === 6.1 Current Architecture ===
 
-
 {{mermaid}}
 flowchart LR
 subgraph Current["Current: Monolithic Analysis"]
@@ -449,10 +449,8 @@
 - No caching of individual claim verdicts
 - Article verdict tightly coupled to claim extraction
 
-
 === 6.2 Proposed Separated Architecture ===
 
-
 {{mermaid}}
 flowchart TB
 subgraph Input["Input Processing"]
@@ -505,12 +505,10 @@
 class CONTEXT,ARTICLE_VERDICT dynamic
 {{/mermaid}}
 
-
 === 6.3 Benefits Analysis ===
 
-
 | Benefit | Impact | Rationale |
-|-| |-----|
+|-| |-|
 | **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
 | **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
 | **Consistency** | High | Same claim always gets same verdict (until cache expires) |
@@ -554,13 +554,10 @@
 - Phase 2: Semantic similarity caching (embedding-based)
 - Phase 3: Federated claim sharing across instances
 
------
+----
 
-
 == 7. Summary ==
 
-
-
 === Current State ===
 
 - POC1 implements core AKEL pipeline successfully
@@ -568,7 +568,6 @@
 - Multiple LLM providers supported
 - No persistent claim storage or caching
 
-
 === Key Gaps from Specification ===
 
 - No scenario extraction
@@ -577,7 +577,6 @@
 - No source track record updates
 - No review queue
 
-
 === Recommended Next Steps ===
 
 1. Implement claim caching layer
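
The claim caching layer named above (verdicts cached by content hash and reused until the cache entry expires) could look roughly like the following. This is a minimal in-memory sketch, not the POC1 implementation: the `ClaimVerdict` shape, the `ClaimVerdictCache` class, the `claimKey` helper, and the whitespace/case normalization are all hypothetical names and choices introduced for illustration, and the page does not say which hash function or storage backend is intended. SHA-256 from Node's built-in `crypto` module is assumed here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical verdict shape; the real POC1 result schema is not shown on this page.
interface ClaimVerdict {
  verdict: string;
  confidence: number;
}

interface CacheEntry {
  value: ClaimVerdict;
  expiresAt: number; // epoch milliseconds
}

// Content hash of the claim text. Normalizing case and whitespace is an
// assumption, so trivially different spellings share one cache key.
function claimKey(claimText: string): string {
  const normalized = claimText.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

class ClaimVerdictCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached verdict, or undefined on a miss or an expired entry.
  get(claimText: string): ClaimVerdict | undefined {
    const key = claimKey(claimText);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop the stale entry
      return undefined;
    }
    return entry.value;
  }

  // Stores a verdict under the claim's content hash with a fresh expiry.
  set(claimText: string, value: ClaimVerdict): void {
    this.entries.set(claimKey(claimText), {
      value,
      expiresAt: this.now() + this.ttlMs,
    });
  }
}
```

A caller would check `cache.get(claimText)` before running research and LLM steps, and call `cache.set(claimText, verdict)` afterwards. Exact-hash matching only catches repeated claims; near-duplicate wording would still miss, which is the gap the embedding-based phase mentioned earlier is meant to address.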