Last modified by Robert Schaub on 2026/02/08 08:12

From version 6.1
edited by Robert Schaub
on 2026/01/02 10:06
Change comment: There is no comment for this version
To version 2.1
edited by Robert Schaub
on 2026/01/02 10:01
Change comment: There is no comment for this version

... ... @@ -1,9 +1,12 @@
1 1  = FactHarbor POC1 Architecture Analysis =
2 2  
3 +
3 3  **Version:** 2.6.17
4 4  **Analysis Date:** January 2026
5 5  **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations
6 6  
8 +-----
9 +
7 7  ----
8 8  
9 9  == 1. AKEL Flow Diagram (with LLM and WebSearch Interactions) ==
... ... @@ -90,10 +90,12 @@
90 90   class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
91 91  {{/mermaid}}
92 92  
93 -----
96 +-----
94 94  
98 +
95 95  == 2. ERD Data Model (Current POC1 Implementation) ==
96 96  
101 +
97 97  {{mermaid}}
98 98  erDiagram
99 99   JOB ||--o{ JOB_EVENT : "has"
... ... @@ -183,10 +183,12 @@
183 183   }
184 184  {{/mermaid}}
185 185  
186 -----
191 +-----
187 187  
193 +
188 188  == 3. Overall Architecture with Interactions ==
189 189  
196 +
190 190  {{mermaid}}
191 191  flowchart TB
192 192   subgraph Client["🖥️ Client Layer"]
... ... @@ -280,12 +280,16 @@
280 280   class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
281 281  {{/mermaid}}
282 282  
283 -----
290 +-----
284 284  
292 +
285 285  == 4. Specification vs Implementation Gap Analysis ==
286 286  
295 +
296 +
287 287  === 4.1 Data Model Gaps ===
288 288  
299 +
289 289  | Specification Entity | POC1 Status | Gap Description |
290 290  |-|-|-|
291 291  | **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
... ... @@ -313,8 +313,9 @@
313 313  
314 314  === 4.3 Architecture Gaps ===
315 315  
327 +
316 316  | Spec Requirement | POC1 Status | Gap Description |
317 -| |-|-|
329 +||-|-|
318 318  | **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
319 319  | **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
320 320  | **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
... ... @@ -325,19 +325,24 @@
325 325  
326 326  === 4.4 Publication & Review Gaps ===
327 327  
340 +
328 328  | Spec Feature | POC1 Status | Gap Description |
329 -| |-|-|
342 +||-|-|
330 330  | **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
331 331  | **Human Review Queue** | ❌ Missing | No review workflow |
332 332  | **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
333 333  | **Audit Rate Sampling** | ❌ Missing | No sampling audits |
334 334  
335 -----
348 +-----
336 336  
350 +
337 337  == 5. Optimization Recommendations ==
338 338  
353 +
354 +
339 339  === 5.1 Cost Optimizations ===
340 340  
357 +
341 341  {{mermaid}}
342 342  pie title Current LLM Cost Distribution (Estimated per Analysis)
343 343   "Step 1: Understand" : 15
... ... @@ -346,7 +346,7 @@
346 346  {{/mermaid}}
347 347  
348 348  | Optimization | Estimated Savings | Implementation Effort |
349 -| |-| |
366 +||-----||
350 350  | **Cache claim understanding** | 30-50% on repeated claims | Medium |
351 351  | **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
352 352  | **Batch fact extraction** | 20% fewer API calls | Medium |
... ... @@ -355,6 +355,7 @@
355 355  
356 356  === 5.2 Timing Optimizations ===
357 357  
375 +
358 358  {{mermaid}}
359 359  gantt
360 360   title Current Analysis Timeline (Typical)
... ... @@ -381,7 +381,7 @@
381 381  {{/mermaid}}
382 382  
383 383  | Optimization | Time Savings | Notes |
384 -| | |-|
402 +|||-----|
385 385  | **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
386 386  | **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
387 387  | **Search query batching** | 10-15% | Send multiple queries to the search API per request |
... ... @@ -390,6 +390,7 @@
390 390  
391 391  === 5.3 Priority Recommendations ===
392 392  
411 +
393 393  1. **HIGH PRIORITY - Implement Claim Caching**
394 394   - Cache claim verdicts by content hash
395 395   - Reduces costs for repeated/similar claims
... ... @@ -405,12 +405,16 @@
405 405   - Cache search results (1h TTL)
406 406   - Reduces external API calls
407 407  
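The caching recommendations above (claim verdicts keyed by content hash, search results with a 1h TTL) can be sketched as follows. This is only an illustrative sketch, not the POC1 implementation: the names `content_hash`, `TTLCache`, and `get_or_compute_verdict` are hypothetical, the normalization rule (lowercase, collapsed whitespace) is an assumption, and applying a TTL to claim verdicts is an assumption as well (the source states a TTL only for search results).

```python
import hashlib
import time
from typing import Any, Callable, Optional


def content_hash(text: str) -> str:
    """Hash text after light normalization (assumed: lowercase, collapsed whitespace)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)


def get_or_compute_verdict(claim: str, cache: TTLCache, analyze: Callable[[str], Any]) -> Any:
    """Return a cached verdict for the claim, or run analyze() and cache its result."""
    key = content_hash(claim)
    cached = cache.get(key)
    if cached is not None:
        return cached
    verdict = analyze(claim)  # hypothetical stand-in for the research + LLM steps
    cache.put(key, verdict)
    return verdict


# Search results could reuse the same class with the 1h TTL named above.
search_cache = TTLCache(ttl_seconds=3600)
```

An exact content hash only matches claims that are identical after normalization; paraphrased claims still miss the cache, which is why the hash key is the simplest starting point rather than a complete solution.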
408 -----
427 +-----
409 409  
429 +
410 410  == 6. Separated Verdict Architecture Proposal ==
411 411  
432 +
433 +
412 412  === 6.1 Current Architecture ===
413 413  
436 +
414 414  {{mermaid}}
415 415  flowchart LR
416 416   subgraph Current["Current: Monolithic Analysis"]
... ... @@ -426,8 +426,10 @@
426 426  - No caching of individual claim verdicts
427 427  - Article verdict tightly coupled to claim extraction
428 428  
452 +
429 429  === 6.2 Proposed Separated Architecture ===
430 430  
455 +
431 431  {{mermaid}}
432 432  flowchart TB
433 433   subgraph Input["Input Processing"]
... ... @@ -480,10 +480,12 @@
480 480   class CONTEXT,ARTICLE_VERDICT dynamic
481 481  {{/mermaid}}
482 482  
508 +
483 483  === 6.3 Benefits Analysis ===
484 484  
511 +
485 485  | Benefit | Impact | Rationale |
486 -|-| |-|
513 +|-| |-----|
487 487  | **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
488 488  | **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
489 489  | **Consistency** | High | The same claim always receives the same verdict (until the cache entry expires) |
... ... @@ -527,10 +527,13 @@
527 527   - Phase 2: Semantic similarity caching (embedding-based)
528 528   - Phase 3: Federated claim sharing across instances
529 529  
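The embedding-based caching mentioned for Phase 2 could look roughly like the sketch below. It assumes claim embeddings are produced elsewhere (no embedding model is named in the source), and the class name `SemanticVerdictCache`, the linear scan, and the 0.9 similarity threshold are all hypothetical choices for illustration.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticVerdictCache:
    """Stores (embedding, verdict) pairs and returns the closest match above a threshold."""

    def __init__(self, threshold: float = 0.9):
        self._threshold = threshold  # assumed value; would need tuning on real claims
        self._entries: List[Tuple[List[float], Any]] = []

    def add(self, embedding: Sequence[float], verdict: Any) -> None:
        self._entries.append((list(embedding), verdict))

    def lookup(self, embedding: Sequence[float]) -> Optional[Any]:
        best_score = -1.0
        best_verdict = None
        # Linear scan: adequate for a sketch, too slow for a large claim store.
        for stored, verdict in self._entries:
            score = cosine_similarity(embedding, stored)
            if score >= self._threshold and score > best_score:
                best_score = score
                best_verdict = verdict
        return best_verdict
```

A threshold that is too low risks reusing a verdict for a claim that only sounds similar, so any real deployment would likely want the threshold validated against labeled claim pairs.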
530 -----
557 +-----
531 531  
559 +
532 532  == 7. Summary ==
533 533  
562 +
563 +
534 534  === Current State ===
535 535  
536 536  - POC1 implements core AKEL pipeline successfully
... ... @@ -538,6 +538,7 @@
538 538  - Multiple LLM providers supported
539 539  - No persistent claim storage or caching
540 540  
571 +
541 541  === Key Gaps from Specification ===
542 542  
543 543  - No scenario extraction
... ... @@ -546,6 +546,7 @@
546 546  - No source track record updates
547 547  - No review queue
548 548  
580 +
549 549  === Recommended Next Steps ===
550 550  
551 551  1. Implement claim caching layer