Last modified by Robert Schaub on 2026/02/08 08:12

From version 6.1
edited by Robert Schaub
on 2026/01/02 10:06
Change comment: There is no comment for this version
To version 1.1
edited by Robert Schaub
on 2026/01/02 09:59
Change comment: There is no comment for this version

Page properties
Content
... ... @@ -1,12 +2,14 @@
1 -= FactHarbor POC1 Architecture Analysis =
2 2  
2 += FactHarbor POC1 Architecture Analysis=
3 +
4 +
3 3  **Version:** 2.6.17
4 4  **Analysis Date:** January 2026
5 5  **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations
6 6  
7 -----
9 +---
8 8  
9 -== 1. AKEL Flow Diagram (with LLM and WebSearch Interactions) ==
11 +== 1. AKEL Flow Diagram (with LLM and WebSearch Interactions)==
10 10  
11 11  
12 12  {{mermaid}}
... ... @@ -90,10 +90,12 @@
90 90   class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
91 91  {{/mermaid}}
92 92  
93 -----
95 +---
94 94  
95 -== 2. ERD Data Model (Current POC1 Implementation) ==
96 96  
98 +== 2. ERD Data Model (Current POC1 Implementation)==
99 +
100 +
97 97  {{mermaid}}
98 98  erDiagram
99 99   JOB ||--o{ JOB_EVENT : "has"
... ... @@ -183,10 +183,12 @@
183 183   }
184 184  {{/mermaid}}
185 185  
186 -----
190 +---
187 187  
188 -== 3. Overall Architecture with Interactions ==
189 189  
193 +== 3. Overall Architecture with Interactions==
194 +
195 +
190 190  {{mermaid}}
191 191  flowchart TB
192 192   subgraph Client["🖥️ Client Layer"]
... ... @@ -280,64 +280,77 @@
280 280   class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
281 281  {{/mermaid}}
282 282  
283 -----
289 +---
284 284  
285 -== 4. Specification vs Implementation Gap Analysis ==
286 286  
287 -=== 4.1 Data Model Gaps ===
292 +== 4. Specification vs Implementation Gap Analysis==
288 288  
289 -| Specification Entity | POC1 Status | Gap Description |
290 -|-|-|-|
291 -| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
292 -| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
293 -| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
294 -| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
295 -| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
296 -| **User** | ❌ Missing | No user authentication or role system |
297 -| **Edit** | ❌ Missing | No audit trail for changes |
298 298  
295 +
296 +=== 4.1 Data Model Gaps===
297 +
298 +
299 +| Specification Entity | POC1 Status | Gap Description |
300 +|---------------------|-------------|-----------------|
301 +| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
302 +| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
303 +| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
304 +| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
305 +| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
306 +| **User** | ❌ Missing | No user authentication or role system |
307 +| **Edit** | ❌ Missing | No audit trail for changes |
308 +
309 +
299 299  === 4.2 AKEL Component Gaps ===
300 300  
301 -| Spec Component | POC1 Status | Gap Description |
302 -| |-|-|
303 -| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
304 -| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
305 -| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
306 -| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
307 -| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
308 -| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
309 -| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
310 -| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
311 -| **Embedding Handler** | ❌ Missing | Not needed for POC |
312 -| **Federation Sync** | ❌ Missing | Not needed for POC |
312 +| Spec Component | POC1 Status | Gap Description |
313 +|----------------|-------------|-----------------|
314 +| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
315 +| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
316 +| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
317 +| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
318 +| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
319 +| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
320 +| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
321 +| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
322 +| **Embedding Handler** | ❌ Missing | Not needed for POC |
323 +| **Federation Sync** | ❌ Missing | Not needed for POC |
313 313  
314 -=== 4.3 Architecture Gaps ===
315 315  
316 -| Spec Requirement | POC1 Status | Gap Description |
317 -| |-|-|
318 -| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
319 -| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
320 -| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
321 -| **Redis Caching** | ❌ Missing | No caching layer |
322 -| **S3 Archival** | ❌ Missing | No long-term storage |
323 -| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
324 -| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |
326 +=== 4.3 Architecture Gaps===
325 325  
326 -=== 4.4 Publication & Review Gaps ===
327 327  
328 -| Spec Feature | POC1 Status | Gap Description |
329 -| |-|-|
330 -| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
331 -| **Human Review Queue** | ❌ Missing | No review workflow |
332 -| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
333 -| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
329 +| Spec Requirement | POC1 Status | Gap Description |
330 +|------------------|-------------|-----------------|
331 +| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
332 +| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
333 +| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
334 +| **Redis Caching** | ❌ Missing | No caching layer |
335 +| **S3 Archival** | ❌ Missing | No long-term storage |
336 +| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
337 +| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |
334 334  
335 -----
336 336  
337 -== 5. Optimization Recommendations ==
340 +=== 4.4 Publication & Review Gaps===
338 338  
339 -=== 5.1 Cost Optimizations ===
340 340  
343 +| Spec Feature | POC1 Status | Gap Description |
344 +|--------------|-------------|-----------------|
345 +| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
346 +| **Human Review Queue** | ❌ Missing | No review workflow |
347 +| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
348 +| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
349 +
350 +---
351 +
352 +
353 +== 5. Optimization Recommendations==
354 +
355 +
356 +
357 +=== 5.1 Cost Optimizations===
358 +
359 +
341 341  {{mermaid}}
342 342  pie title Current LLM Cost Distribution (Estimated per Analysis)
343 343   "Step 1: Understand" : 15
... ... @@ -345,16 +345,18 @@
345 345   "Step 3: Verdicts" : 25
346 346  {{/mermaid}}
347 347  
348 -| Optimization | Estimated Savings | Implementation Effort |
349 -| |-| |
350 -| **Cache claim understanding** | 30-50% on repeated claims | Medium |
351 -| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
352 -| **Batch fact extraction** | 20% fewer API calls | Medium |
353 -| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
354 -| **Reduce max iterations** | Linear reduction | Low (config change) |
367 +| Optimization | Estimated Savings | Implementation Effort |
368 +|--------------|-------------------|----------------------|
369 +| **Cache claim understanding** | 30-50% on repeated claims | Medium |
370 +| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
371 +| **Batch fact extraction** | 20% fewer API calls | Medium |
372 +| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
373 +| **Reduce max iterations** | Linear reduction | Low (config change) |
355 355  
356 -=== 5.2 Timing Optimizations ===
357 357  
376 +=== 5.2 Timing Optimizations===
377 +
378 +
358 358  {{mermaid}}
359 359  gantt
360 360   title Current Analysis Timeline (Typical)
... ... @@ -380,16 +380,18 @@
380 380   Generate Verdicts :b5, after b4, 10s
381 381  {{/mermaid}}
382 382  
383 -| Optimization | Time Savings | Notes |
384 -| | |-|
385 -| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
386 -| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
387 -| **Search query batching** | 10-15% | Send multiple queries to search API |
388 -| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
389 -| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |
404 +| Optimization | Time Savings | Notes |
405 +|--------------|--------------|-------|
406 +| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
407 +| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
408 +| **Search query batching** | 10-15% | Send multiple queries to search API |
409 +| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
410 +| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |
390 390  
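The table notes that sources are currently fetched three at a time, but the diff does not show how. One way this could look is the minimal sketch below. The names `chunk`, `fetchInBatches`, and `fetchOne` are hypothetical and are not taken from the POC1 code; failed fetches are mapped to `undefined` so that one bad source does not abort the batch.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Fetch every URL, running at most `batchSize` requests concurrently.
// `fetchOne` is supplied by the caller (hypothetical; e.g. a wrapper around fetch()).
async function fetchInBatches<T>(
  urls: readonly string[],
  fetchOne: (url: string) => Promise<T>,
  batchSize: number = 3,
): Promise<(T | undefined)[]> {
  const results: (T | undefined)[] = [];
  for (const batch of chunk(urls, batchSize)) {
    // Each batch runs in parallel; batches run one after another.
    const settled = await Promise.all(
      batch.map((url) => fetchOne(url).catch(() => undefined)),
    );
    results.push(...settled);
  }
  return results;
}
```

Results come back in the same order as the input URLs, so callers can pair each entry with its source.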
391 -=== 5.3 Priority Recommendations ===
392 392  
413 +=== 5.3 Priority Recommendations===
414 +
415 +
393 393  1. **HIGH PRIORITY - Implement Claim Caching**
394 394   - Cache claim verdicts by content hash
395 395   - Reduces costs for repeated/similar claims
... ... @@ -405,12 +405,16 @@
405 405   - Cache search results (1h TTL)
406 406   - Reduces external API calls
407 407  
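The search-result caching recommendation (1h TTL) could start as a small in-memory cache before Redis is introduced. The sketch below is an assumption, not the planned design: `TtlCache` and `ONE_HOUR_MS` are hypothetical names, and the clock is injectable only so that expiry can be checked without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Minimal in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;
```

A search wrapper would call `get(query)` first and only hit the external search API on a miss, storing the response with `set(query, results)`.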
408 -----
431 +---
409 409  
410 -== 6. Separated Verdict Architecture Proposal ==
411 411  
412 -=== 6.1 Current Architecture ===
434 +== 6. Separated Verdict Architecture Proposal==
413 413  
436 +
437 +
438 +=== 6.1 Current Architecture===
439 +
440 +
414 414  {{mermaid}}
415 415  flowchart LR
416 416   subgraph Current["Current: Monolithic Analysis"]
... ... @@ -426,8 +426,10 @@
426 426  - No caching of individual claim verdicts
427 427  - Article verdict tightly coupled to claim extraction
428 428  
429 -=== 6.2 Proposed Separated Architecture ===
430 430  
457 +=== 6.2 Proposed Separated Architecture===
458 +
459 +
431 431  {{mermaid}}
432 432  flowchart TB
433 433   subgraph Input["Input Processing"]
... ... @@ -480,25 +480,30 @@
480 480   class CONTEXT,ARTICLE_VERDICT dynamic
481 481  {{/mermaid}}
482 482  
483 -=== 6.3 Benefits Analysis ===
484 484  
485 -| Benefit | Impact | Rationale |
486 -|-| |-|
487 -| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
488 -| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
489 -| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
490 -| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
491 -| **Scalability** | Linear improvement | More users = higher cache hit rate |
513 +=== 6.3 Benefits Analysis===
492 492  
515 +
516 +| Benefit | Impact | Rationale |
517 +|---------|--------|-----------|
518 +| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
519 +| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
520 +| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
521 +| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
522 +| **Scalability** | Linear improvement | More users = higher cache hit rate |
523 +
524 +
493 493  === 6.4 Implementation Considerations ===
494 494  
495 495  **Claim Hashing Strategy:**
496 -{{code language="typescript"}}function getClaimHash(claim: string): string {
528 +{{code language="typescript"}}
529 +function getClaimHash(claim: string): string {
497 497   // Normalize: lowercase, remove punctuation, stem words
498 498   const normalized = normalize(claim);
499 499   // Hash for cache key
500 500   return crypto.createHash('sha256').update(normalized).digest('hex').slice(0, 16);
501 -}{{/code}}
534 +}
535 +{{/code}}
502 502  
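The snippet above calls `normalize()` without defining it and uses `crypto` without importing it. A self-contained variant might look like the sketch below, where `normalizeClaim` and `claimCacheKey` are hypothetical names. It covers lowercasing and punctuation removal only; stemming, which the original comment mentions, is left out and would need a separate library.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalizer: Unicode-normalize, lowercase, strip punctuation,
// and collapse whitespace. Stemming is intentionally not implemented here.
function normalizeClaim(claim: string): string {
  return claim
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // keep letters, digits, whitespace
    .replace(/\s+/g, " ")
    .trim();
}

// Cache key: first 16 hex characters (64 bits) of the SHA-256 digest,
// matching the truncation used in the snippet above.
function claimCacheKey(claim: string): string {
  return createHash("sha256")
    .update(normalizeClaim(claim), "utf8")
    .digest("hex")
    .slice(0, 16);
}
```

With this normalizer, `"Vaccines are safe."` and `"  vaccines ARE safe!! "` map to the same key. Paraphrases such as "Vaccines are not dangerous" do not, which is the gap the later embedding-based phase is meant to address.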
503 503  **Cache Invalidation Triggers:**
504 504  - TTL expiration (default 7 days)
... ... @@ -511,7 +511,7 @@
511 511  - Same claims in different article contexts may yield different article verdicts
512 512  - Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true
513 513  
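To illustrate the context-sensitivity point: claim verdicts can come from the cache while the article verdict is recomputed for each article. The types and the aggregation rule below are purely illustrative assumptions (`CachedClaimVerdict`, `articleRating`, and the three rating labels are hypothetical); the real rule would weigh claim roles and dependencies.

```typescript
type ClaimRating = "true" | "false" | "uncertain";
type ArticleRating = "accurate" | "misleading" | "inconclusive";

// Hypothetical shape of a cached per-claim verdict.
interface CachedClaimVerdict {
  claimHash: string;
  rating: ClaimRating;
}

// Illustrative rule only: any false claim makes the article misleading;
// an article is accurate only if every claim is rated true.
function articleRating(verdicts: readonly CachedClaimVerdict[]): ArticleRating {
  if (verdicts.some((v) => v.rating === "false")) return "misleading";
  if (verdicts.length > 0 && verdicts.every((v) => v.rating === "true")) {
    return "accurate";
  }
  return "inconclusive";
}
```

Under this rule, the cached verdict for "Vaccines are safe" stays `"true"` in both articles, while the article that also asserts "Vaccines cause autism" is rated `"misleading"`.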
514 -### 6.5 Recommendation##
548 +### 6.5 Recommendation
515 515  
516 516  **YES, separating is beneficial** with the following caveats:
517 517  
... ... @@ -527,19 +527,23 @@
527 527   - Phase 2: Semantic similarity caching (embedding-based)
528 528   - Phase 3: Federated claim sharing across instances
529 529  
530 -----
564 +---
531 531  
532 -== 7. Summary ==
533 533  
534 -=== Current State ===
567 +== 7. Summary==
535 535  
569 +
570 +
571 +=== Current State===
572 +
536 536  - POC1 implements core AKEL pipeline successfully
537 537  - Claim dependency tracking is implemented
538 538  - Multiple LLM providers supported
539 539  - No persistent claim storage or caching
540 540  
541 -=== Key Gaps from Specification ===
542 542  
579 +=== Key Gaps from Specification===
580 +
543 543  - No scenario extraction
544 544  - No user/role system
545 545  - No audit trail
... ... @@ -546,10 +546,13 @@
546 546  - No source track record updates
547 547  - No review queue
548 548  
549 -=== Recommended Next Steps ===
550 550  
588 +=== Recommended Next Steps===
589 +
551 551  1. Implement claim caching layer
552 552  2. Separate claim vs article verdict generation
553 553  3. Add Redis for source/search caching
554 554  4. Implement tiered model selection
555 555  5. Add basic audit logging
595 +
596 +