Changes for page FactHarbor POC1 Architecture Analysis 1.Jan.26
Last modified by Robert Schaub on 2026/02/08 08:12
From version 6.1
edited by Robert Schaub
on 2026/01/02 10:06
Change comment:
There is no comment for this version
To version 1.1
edited by Robert Schaub
on 2026/01/02 09:59
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -1,12 +2,14 @@
-= FactHarbor POC1 Architecture Analysis =

+= FactHarbor POC1 Architecture Analysis=
+
+
 **Version:** 2.6.17
 **Analysis Date:** January 2026
 **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations

-----
+---

-== 1. AKEL Flow Diagram (with LLM and WebSearch Interactions) ==
+== 1. AKEL Flow Diagram (with LLM and WebSearch Interactions)==


 {{mermaid}}
@@ -90,10 +90,12 @@
 class UNDERSTAND,DECIDE,FETCHSRC,EXTRACT,VERDICT,REPORT step
 {{/mermaid}}

-----
+---

-== 2. ERD Data Model (Current POC1 Implementation) ==

+== 2. ERD Data Model (Current POC1 Implementation)==
+
+
 {{mermaid}}
 erDiagram
 JOB ||--o{ JOB_EVENT : "has"
@@ -183,10 +183,12 @@
 }
 {{/mermaid}}

-----
+---

-== 3. Overall Architecture with Interactions ==

+== 3. Overall Architecture with Interactions==
+
+
 {{mermaid}}
 flowchart TB
 subgraph Client["🖥️ Client Layer"]
@@ -280,64 +280,77 @@
 class ANALYZE_API,JOBS_API,JOB_API,EVENTS_API,RUN_JOB api
 {{/mermaid}}

-----
+---

-== 4. Specification vs Implementation Gap Analysis ==

-=== 4.1 Data Model Gaps ===
+== 4. Specification vs Implementation Gap Analysis==

-| Specification Entity | POC1 Status | Gap Description |
-|-|-|-|
-| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
-| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
-| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
-| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
-| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
-| **User** | ❌ Missing | No user authentication or role system |
-| **Edit** | ❌ Missing | No audit trail for changes |

+
+=== 4.1 Data Model Gaps===
+
+
+| Specification Entity | POC1 Status | Gap Description |
+|---------------------|-------------|-----------------|
+| **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
+| **Evidence** | ⚠️ Partial | Implemented as `ExtractedFact` but lacks: `supports` enum, proper `relevance_score` |
+| **Source** | ⚠️ Partial | `FetchedSource` exists but missing: `type` enum, `accuracy_history`, `correction_frequency`, weekly update scheduler |
+| **Scenario** | ❌ Missing | Not implemented. Claims are evaluated directly without scenario contexts |
+| **Verdict** | ⚠️ Partial | `ClaimVerdict` exists but missing: `likelihood_range`, `uncertainty_factors` array, proper `explanation_summary` |
+| **User** | ❌ Missing | No user authentication or role system |
+| **Edit** | ❌ Missing | No audit trail for changes |
+
+
 === 4.2 AKEL Component Gaps ===

-| Spec Component | POC1 Status | Gap Description |
-|-|-|-|
-| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
-| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
-| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
-| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
-| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
-| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
-| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
-| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
-| **Embedding Handler** | ❌ Missing | Not needed for POC |
-| **Federation Sync** | ❌ Missing | Not needed for POC |
+| Spec Component | POC1 Status | Gap Description |
+|----------------|-------------|-----------------|
+| **AKEL Orchestrator** | ✅ Implemented | `runAnalysis()` function serves this role |
+| **Claim Extractor** | ✅ Implemented | `understandClaim()` with claim role/dependency tracking |
+| **Claim Classifier** | ⚠️ Partial | Risk tier (A/B/C) assigned, but no domain classification |
+| **Scenario Generator** | ❌ Missing | Claims evaluated without scenario extraction |
+| **Evidence Summarizer** | ✅ Implemented | `extractFacts()` function |
+| **Contradiction Detector** | ⚠️ Partial | `isContestedClaim` flag exists but no active contradiction search |
+| **Quality Gate Validator** | ❌ Missing | No source quality gates, no mandatory checks |
+| **Audit Sampling Scheduler** | ❌ Missing | No audit system |
+| **Embedding Handler** | ❌ Missing | Not needed for POC |
+| **Federation Sync** | ❌ Missing | Not needed for POC |

-=== 4.3 Architecture Gaps ===

-| Spec Requirement | POC1 Status | Gap Description |
-|-|-|-|
-| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
-| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
-| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
-| **Redis Caching** | ❌ Missing | No caching layer |
-| **S3 Archival** | ❌ Missing | No long-term storage |
-| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
-| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |
+=== 4.3 Architecture Gaps===

-=== 4.4 Publication & Review Gaps ===

-| Spec Feature | POC1 Status | Gap Description |
-|-|-|-|
-| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
-| **Human Review Queue** | ❌ Missing | No review workflow |
-| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
-| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
+| Spec Requirement | POC1 Status | Gap Description |
+|------------------|-------------|-----------------|
+| **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
+| **LLM Abstraction Layer** | ✅ Implemented | AI SDK supports multiple providers with failover |
+| **PostgreSQL Primary DB** | ⚠️ Different | Using SQLite for simplicity (acceptable for POC) |
+| **Redis Caching** | ❌ Missing | No caching layer |
+| **S3 Archival** | ❌ Missing | No long-term storage |
+| **Background Jobs** | ❌ Missing | No scheduler for source updates, cache warming |
+| **Quality Monitoring** | ⚠️ Partial | LLM call counting exists, but no anomaly detection |

-----

-== 5. Optimization Recommendations ==
+=== 4.4 Publication & Review Gaps===

-=== 5.1 Cost Optimizations ===

+| Spec Feature | POC1 Status | Gap Description |
+|--------------|-------------|-----------------|
+| **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
+| **Human Review Queue** | ❌ Missing | No review workflow |
+| **AI-Generated Labeling** | ⚠️ Partial | Results show "AI analysis" but no formal labeling system |
+| **Audit Rate Sampling** | ❌ Missing | No sampling audits |
+
+---
+
+
+== 5. Optimization Recommendations==
+
+
+
+=== 5.1 Cost Optimizations===
+
+
 {{mermaid}}
 pie title Current LLM Cost Distribution (Estimated per Analysis)
 "Step 1: Understand" : 15
@@ -345,16 +345,18 @@
 "Step 3: Verdicts" : 25
 {{/mermaid}}

-| Optimization | Estimated Savings | Implementation Effort |
-|-|-|-|
-| **Cache claim understanding** | 30-50% on repeated claims | Medium |
-| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
-| **Batch fact extraction** | 20% fewer API calls | Medium |
-| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
-| **Reduce max iterations** | Linear reduction | Low (config change) |
+| Optimization | Estimated Savings | Implementation Effort |
+|--------------|-------------------|----------------------|
+| **Cache claim understanding** | 30-50% on repeated claims | Medium |
+| **Use Haiku for fact extraction** | 40% on Step 2 costs | Low (config change) |
+| **Batch fact extraction** | 20% fewer API calls | Medium |
+| **Skip search for known claims** | 50%+ for cached claims | High (needs claim DB) |
+| **Reduce max iterations** | Linear reduction | Low (config change) |

-=== 5.2 Timing Optimizations ===

+=== 5.2 Timing Optimizations===
+
+
 {{mermaid}}
 gantt
 title Current Analysis Timeline (Typical)
@@ -380,16 +380,18 @@
 Generate Verdicts :b5, after b4, 10s
 {{/mermaid}}

-| Optimization | Time Savings | Notes |
-|-|-|-|
-| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
-| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
-| **Search query batching** | 10-15% | Send multiple queries to search API |
-| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
-| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |
+| Optimization | Time Savings | Notes |
+|--------------|--------------|-------|
+| **Parallel source fetching** | Already implemented | Currently fetches 3 sources in parallel |
+| **Streaming LLM responses** | 20-30% perceived | User sees progress faster |
+| **Search query batching** | 10-15% | Send multiple queries to search API |
+| **Reduce prompt size** | 5-10% per call | Optimize system prompts |
+| **Use faster models for extraction** | 30-40% on Step 2 | Claude Haiku vs Sonnet |

-=== 5.3 Priority Recommendations ===

+=== 5.3 Priority Recommendations===
+
+
 1. **HIGH PRIORITY - Implement Claim Caching**
 - Cache claim verdicts by content hash
 - Reduces costs for repeated/similar claims
@@ -405,12 +405,16 @@
 - Cache search results (1h TTL)
 - Reduces external API calls

-----
+---

-== 6. Separated Verdict Architecture Proposal ==

-=== 6.1 Current Architecture ===
+== 6. Separated Verdict Architecture Proposal==

+
+
+=== 6.1 Current Architecture===
+
+
 {{mermaid}}
 flowchart LR
 subgraph Current["Current: Monolithic Analysis"]
@@ -426,8 +426,10 @@
 - No caching of individual claim verdicts
 - Article verdict tightly coupled to claim extraction

-=== 6.2 Proposed Separated Architecture ===

+=== 6.2 Proposed Separated Architecture===
+
+
 {{mermaid}}
 flowchart TB
 subgraph Input["Input Processing"]
@@ -480,25 +480,30 @@
 class CONTEXT,ARTICLE_VERDICT dynamic
 {{/mermaid}}

-=== 6.3 Benefits Analysis ===

-| Benefit | Impact | Rationale |
-|-|-|-|
-| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
-| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
-| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
-| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
-| **Scalability** | Linear improvement | More users = higher cache hit rate |
+=== 6.3 Benefits Analysis===

+
+| Benefit | Impact | Rationale |
+|---------|--------|-----------|
+| **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
+| **Faster Analysis** | 50%+ for cached claims | Skip research + LLM calls for known claims |
+| **Consistency** | High | Same claim always gets same verdict (until cache expires) |
+| **Freshness Control** | Configurable TTL | Balance consistency vs. new evidence |
+| **Scalability** | Linear improvement | More users = higher cache hit rate |
+
+
 === 6.4 Implementation Considerations ===

 **Claim Hashing Strategy:**
-{{code language="typescript"}}function getClaimHash(claim: string): string {
+{{code language="typescript"}}
+function getClaimHash(claim: string): string {
   // Normalize: lowercase, remove punctuation, stem words
   const normalized = normalize(claim);
   // Hash for cache key
   return crypto.createHash('sha256').update(normalized).digest('hex').slice(0, 16);
-}{{/code}}
+}
+{{/code}}

 **Cache Invalidation Triggers:**
 - TTL expiration (default 7 days)
@@ -511,7 +511,7 @@
 - Same claims in different article contexts may yield different article verdicts
 - Example: "Vaccines are safe" + "Vaccines cause autism" → article may be misleading even if first claim is true

-### 6.5 Recommendation ##
+### 6.5 Recommendation

 **YES, separating is beneficial** with the following caveats:

@@ -527,19 +527,23 @@
 - Phase 2: Semantic similarity caching (embedding-based)
 - Phase 3: Federated claim sharing across instances

-----
+---

-== 7. Summary ==

-=== Current State ===
+== 7. Summary==

+
+
+=== Current State===
+
 - POC1 implements core AKEL pipeline successfully
 - Claim dependency tracking is implemented
 - Multiple LLM providers supported
 - No persistent claim storage or caching

-=== Key Gaps from Specification ===

+=== Key Gaps from Specification===
+
 - No scenario extraction
 - No user/role system
 - No audit trail
@@ -546,10 +546,13 @@
 - No source track record updates
 - No review queue

-=== Recommended Next Steps ===

+=== Recommended Next Steps===
+
 1. Implement claim caching layer
 2. Separate claim vs article verdict generation
 3. Add Redis for source/search caching
 4. Implement tiered model selection
 5. Add basic audit logging
+
+
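
Section 6.4 of the compared page calls a `normalize` helper that is never defined and names TTL expiration (default 7 days) as an invalidation trigger. The sketch below shows one way those two pieces might fit together. It is illustrative only: the body of `normalize`, the `ClaimVerdictCache` class, and the `CachedVerdict` shape are assumptions, not part of the page. Stemming, which the page's comment also mentions, is left out, and the punctuation filter is ASCII-only.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalizer (not defined on the page): lowercases, drops
// ASCII punctuation, collapses whitespace. Stemming is intentionally omitted.
function normalize(claim: string): string {
  return claim
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Same hashing scheme as the page's snippet: first 16 hex chars of SHA-256.
function getClaimHash(claim: string): string {
  return createHash("sha256").update(normalize(claim)).digest("hex").slice(0, 16);
}

// Assumed cache entry shape, for illustration only.
interface CachedVerdict {
  verdict: string;
  storedAt: number; // epoch milliseconds
}

// Default TTL of 7 days, as listed under "Cache Invalidation Triggers".
const DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical in-memory verdict cache keyed by claim hash.
class ClaimVerdictCache {
  private readonly entries = new Map<string, CachedVerdict>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = DEFAULT_TTL_MS) {
    this.ttlMs = ttlMs;
  }

  set(claim: string, verdict: string, now: number = Date.now()): void {
    this.entries.set(getClaimHash(claim), { verdict, storedAt: now });
  }

  // Returns the cached verdict, or undefined on a miss or an expired entry.
  get(claim: string, now: number = Date.now()): string | undefined {
    const key = getClaimHash(claim);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // TTL expired: invalidate lazily on read
      return undefined;
    }
    return entry.verdict;
  }
}
```

Passing `now` explicitly keeps expiry deterministic in tests. A shared store such as the Redis layer the page recommends would replace the `Map` in a multi-instance setup, which this sketch does not attempt.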