Changes for page FactHarbor POC1 Architecture Analysis 1.Jan.26

Last modified by Robert Schaub on 2026/02/08 08:12

From version 4.1

edited by Robert Schaub
on 2026/01/02 10:03

Change comment: There is no comment for this version

To version 9.1

edited by Robert Schaub
on 2026/01/02 10:13

Change comment: Rollback to version 7.1

Summary

Page properties (2 modified, 0 added, 0 removed)

Details

Page properties

Title

@@ -1,1 +1,1 @@
--POC1 Architecture Analysis - 1.Jan.26
++FactHarbor POC1 Architecture Analysis

Content

@@ -1,6 +1,5 @@
  = FactHarbor POC1 Architecture Analysis =
--
  **Version:** 2.6.17
  **Analysis Date:** January 2026
  **Document Purpose:** Technical diagrams, gap analysis, and optimization recommendations
@@ -93,12 +93,123 @@
  ----
--
  == 2. ERD Data Model (Current POC1 Implementation) ==
++**Data Objects ERD**
  {{mermaid}}
  erDiagram
++    ARTICLE ||--o{ CLAIM : "contains"
++    ARTICLE ||--|| ARTICLE_VERDICT : "has"
++    CLAIM ||--|| CLAIM_VERDICT : "has"
++    CLAIM ||--o{ CLAIM : "depends on"
++    CLAIM_VERDICT }o--o{ EVIDENCE : "supported by"
++    SOURCE ||--o{ EVIDENCE : "provides"
++    ARTICLE ||--o{ SOURCE : "references"
++
++    ARTICLE {
++        string id PK "Unique identifier (job ID)"
++        string inputType "text | url"
++        string inputValue "Original URL or text"
++        string articleThesis "Main argument/thesis"
++        string detectedInputType "question | claim | article"
++        boolean isQuestion "True if input is a question"
++        datetime createdAt "Analysis timestamp"
++        datetime updatedAt "Last update"
++        json distinctProceedings "Legal proceedings if any"
++        boolean hasMultipleProceedings "Multi-proceeding flag"
++        string proceedingContext "Context for proceedings"
++        json logicalFallacies "Detected fallacies array"
++        boolean isPseudoscience "Pseudoscience detection"
++        string_array pseudoscienceCategories "Categories if detected"
++        int llmCalls "Total LLM API calls"
++        json searchQueries "All search queries performed"
++        string schemaVersion "e.g. 2.6.17"
++    }
++
++    CLAIM {
++        string id PK "SC1, SC2, C1, etc."
++        string articleId FK "Parent article"
++        string text "The claim statement"
++        string type "legal | procedural | factual | evaluative"
++        string claimRole "attribution | source | timing | core"
++        string_array dependsOn "IDs of prerequisite claims"
++        string_array keyEntities "Named entities in claim"
++        boolean isCentral "Is this a central claim?"
++        string relatedProceedingId "Linked proceeding if any"
++        int startOffset "Position in original text"
++        int endOffset "End position in original text"
++        string approximatePosition "Descriptive position"
++    }
++
++    CLAIM_VERDICT {
++        string id PK "Same as claim ID"
++        string claimId FK "Reference to claim"
++        string llmVerdict "WELL-SUPPORTED | PARTIALLY-SUPPORTED | UNCERTAIN | REFUTED"
++        string verdict "True | Mostly True | Leaning True | Unverified | Leaning False | Mostly False | False"
++        int confidence "0-100 LLM confidence"
++        int truthPercentage "0-100 calibrated truth score"
++        string riskTier "A (high) | B (medium) | C (low)"
++        string reasoning "Explanation of verdict"
++        string_array supportingFactIds "Evidence IDs supporting this"
++        boolean dependencyFailed "True if prerequisite failed"
++        string_array failedDependencies "Which deps failed"
++        string highlightColor "green | light-green | yellow | orange | dark-orange | red | dark-red"
++        boolean isPseudoscience "Pseudoscience flag"
++        string escalationReason "Why verdict was escalated"
++    }
++
++    ARTICLE_VERDICT {
++        string id PK "Same as article ID"
++        string articleId FK "Reference to article"
++        string llmArticleVerdict "Original LLM verdict"
++        int llmArticleConfidence "Original LLM confidence"
++        string articleVerdict "True | Mostly True | Leaning True | Unverified | Leaning False | Mostly False | False"
++        int articleTruthPercentage "0-100 calibrated score"
++        string articleVerdictReason "Why verdict differs from claims avg"
++        int claimsAverageTruthPercentage "Average of claim verdicts"
++        string claimsAverageVerdict "7-point average verdict"
++        int claimsTotal "Total claims analyzed"
++        int claimsSupported "Claims with truth >= 72%"
++        int claimsUncertain "Claims with truth 43-71%"
++        int claimsRefuted "Claims with truth < 43%"
++        int centralClaimsTotal "Number of central claims"
++        int centralClaimsSupported "Central claims supported"
++    }
++
++    EVIDENCE {
++        string id PK "S1-F1, S1-F2 format"
++        string sourceId FK "Reference to source"
++        string claimId FK "Optional: specific claim this supports"
++        string fact "The factual statement extracted"
++        string category "legal_provision | evidence | expert_quote | statistic | event | criticism"
++        string specificity "high | medium"
++        string sourceExcerpt "Original text excerpt"
++        string relatedProceedingId "Linked proceeding if any"
++        boolean isContestedClaim "Is this a contested assertion"
++        string claimSource "Who made contested claim"
++    }
++
++    SOURCE {
++        string id PK "S1, S2, etc."
++        string articleId FK "Parent article"
++        string url "Full URL"
++        string title "Page/document title"
++        string domain "Extracted domain"
++        int trackRecordScore "0-100 reliability score or null"
++        string fullText "Extracted content"
++        datetime fetchedAt "When content was fetched"
++        string category "news | academic | government | legal"
++        boolean fetchSuccess "True if fetch succeeded"
++        string searchQuery "Which query found this"
++        string mimeType "text/html | application/pdf"
++    }
++{{/mermaid}}
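
The `ARTICLE_VERDICT` counters in the ERD above imply a three-way bucketing of each claim's `truthPercentage`: supported at 72 and above, uncertain from 43 to 71, refuted below 43. The following is a minimal sketch of that aggregation, not the POC1 implementation. The helper names `bucketFor` and `summarizeClaims` are hypothetical, and rounding the average (and reporting 0 for an empty list) is an assumption; only the field names and thresholds come from the diagram.

```typescript
// Hypothetical helpers: only the field names and thresholds come from the ERD.
type ClaimBucket = "supported" | "uncertain" | "refuted";

interface ClaimSummary {
  claimsTotal: number;
  claimsSupported: number;
  claimsUncertain: number;
  claimsRefuted: number;
  claimsAverageTruthPercentage: number;
}

// Thresholds as stated on ARTICLE_VERDICT: >= 72 supported, 43-71 uncertain, < 43 refuted.
function bucketFor(truthPercentage: number): ClaimBucket {
  if (truthPercentage >= 72) return "supported";
  if (truthPercentage >= 43) return "uncertain";
  return "refuted";
}

function summarizeClaims(truthPercentages: number[]): ClaimSummary {
  const summary: ClaimSummary = {
    claimsTotal: truthPercentages.length,
    claimsSupported: 0,
    claimsUncertain: 0,
    claimsRefuted: 0,
    claimsAverageTruthPercentage: 0, // assumption: 0 when there are no claims
  };
  let sum = 0;
  for (const pct of truthPercentages) {
    sum += pct;
    const bucket = bucketFor(pct);
    if (bucket === "supported") summary.claimsSupported++;
    else if (bucket === "uncertain") summary.claimsUncertain++;
    else summary.claimsRefuted++;
  }
  if (truthPercentages.length > 0) {
    // Assumption: the ERD types the average as int, so round to the nearest integer.
    summary.claimsAverageTruthPercentage = Math.round(sum / truthPercentages.length);
  }
  return summary;
}
```

Because the ERD types `truthPercentage` as an integer, the gap between 71 and 72 never arises in practice; in this sketch a fractional value such as 71.5 would count as uncertain.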
++
++**Data Usage ERD**
++
++{{mermaid}}
++erDiagram
      JOB ||--o{ JOB_EVENT : "has"
      JOB ||--|| ANALYSIS_RESULT : "produces"
      ANALYSIS_RESULT ||--o{ CLAIM_VERDICT : "contains"
@@ -188,10 +188,8 @@
  ----
--
  == 3. Overall Architecture with Interactions ==
--
  {{mermaid}}
  flowchart TB
      subgraph Client["🖥️ Client Layer"]
@@ -287,14 +287,10 @@
  ----
--
  == 4. Specification vs Implementation Gap Analysis ==
--
--
  === 4.1 Data Model Gaps ===
--
  | Specification Entity | POC1 Status | Gap Description |
  |-|-|-|
  | **Claim** | ⚠️ Partial | No persistent storage; claims exist only in JSON result. Missing: `status`, `confidence_score`, `risk_score`, `completeness_score`, `version`, `views`, `edit_count` |
@@ -322,7 +322,6 @@
  === 4.3 Architecture Gaps ===
--
  | Spec Requirement | POC1 Status | Gap Description |
  |-|-|-|
  | **Three-Layer Architecture** | ✅ Implemented | Interface (Next.js) → Processing (AKEL) → Data (SQLite) |
@@ -335,7 +335,6 @@
  === 4.4 Publication & Review Gaps ===
--
  | Spec Feature | POC1 Status | Gap Description |
  |-|-|-|
  | **Risk Tier Publication Rules** | ❌ Missing | All results published immediately regardless of tier |
@@ -345,14 +345,10 @@
  ----
--
  == 5. Optimization Recommendations ==
--
--
  === 5.1 Cost Optimizations ===
--
  {{mermaid}}
  pie title Current LLM Cost Distribution (Estimated per Analysis)
      "Step 1: Understand" : 15
@@ -370,7 +370,6 @@
  === 5.2 Timing Optimizations ===
--
  {{mermaid}}
  gantt
      title Current Analysis Timeline (Typical)
@@ -406,7 +406,6 @@
  === 5.3 Priority Recommendations ===
--
  1. **HIGH PRIORITY - Implement Claim Caching**
     - Cache claim verdicts by content hash
     - Reduces costs for repeated/similar claims
@@ -424,14 +424,10 @@
  ----
--
  == 6. Separated Verdict Architecture Proposal ==
--
--
  === 6.1 Current Architecture ===
--
  {{mermaid}}
  flowchart LR
      subgraph Current["Current: Monolithic Analysis"]
@@ -447,10 +447,8 @@
  - No caching of individual claim verdicts
  - Article verdict tightly coupled to claim extraction
--
  === 6.2 Proposed Separated Architecture ===
--
  {{mermaid}}
  flowchart TB
      subgraph Input["Input Processing"]
@@ -503,10 +503,8 @@
      class CONTEXT,ARTICLE_VERDICT dynamic
  {{/mermaid}}
--
  === 6.3 Benefits Analysis ===
--
  | Benefit | Impact | Rationale |
  |-|-|-|
  | **Cost Reduction** | 40-70% for repeated claims | Many articles share common claims (e.g., "COVID vaccines are safe") |
@@ -554,11 +554,8 @@
  ----
--
  == 7. Summary ==
--
--
  === Current State ===
  - POC1 implements core AKEL pipeline successfully
@@ -566,7 +566,6 @@
  - Multiple LLM providers supported
  - No persistent claim storage or caching
--
  === Key Gaps from Specification ===
  - No scenario extraction
@@ -575,7 +575,6 @@
  - No source track record updates
  - No review queue
--
  === Recommended Next Steps ===
  1. Implement claim caching layer
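
The claim caching layer recommended in section 5.3 ("cache claim verdicts by content hash") could look roughly like the sketch below. Everything here is illustrative: `ClaimVerdictCache`, `normalizeClaim`, and `contentHash` are hypothetical names, the normalization rules are an assumption, and a 32-bit FNV-1a hash stands in for whatever digest a real implementation would choose (for example SHA-256). Because a 32-bit hash can collide, the sketch also stores the normalized text and treats a mismatch as a miss.

```typescript
// Hypothetical sketch: none of these names come from the FactHarbor codebase.
interface CachedVerdict {
  verdict: string;          // a label from the 7-point scale in the ERD
  truthPercentage: number;  // 0-100 calibrated truth score
}

// Assumption: claims that differ only in case, whitespace, or trailing
// punctuation should share one cache entry.
function normalizeClaim(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ").replace(/[.!?]+$/, "");
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Stand-in for a stronger digest; chosen only to keep the sketch dependency-free.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

class ClaimVerdictCache {
  private entries = new Map<string, { normalized: string; value: CachedVerdict }>();
  hits = 0;
  misses = 0;

  get(claimText: string): CachedVerdict | undefined {
    const normalized = normalizeClaim(claimText);
    const entry = this.entries.get(contentHash(normalized));
    // Compare the stored text too, so a hash collision is a miss, not a wrong verdict.
    if (entry !== undefined && entry.normalized === normalized) {
      this.hits++;
      return entry.value;
    }
    this.misses++;
    return undefined;
  }

  set(claimText: string, value: CachedVerdict): void {
    const normalized = normalizeClaim(claimText);
    this.entries.set(contentHash(normalized), { normalized, value });
  }
}
```

A pipeline would call `get` before issuing the verdict LLM call and `set` after it. Exact-match hashing only helps with repeated claims; the "similar claims" case mentioned in section 5.3 would need a fuzzier key, which this sketch does not attempt.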
