Last modified by Robert Schaub on 2025/12/24 18:26

= POC1 API & Schemas Specification =

----

== Version History ==

|=Version|=Date|=Changes
|0.4.1|2025-12-24|Applied 9 critical fixes: file format notice, verdict taxonomy, canonicalization algorithm, Stage 1 cost policy, BullMQ fix, language in cache key, historical claims TTL, idempotency, copyright policy
|0.4|2025-12-24|**BREAKING:** 3-stage pipeline with claim-level caching, user tier system, cache-only mode for free users, Redis cache architecture
|0.3.1|2025-12-24|Fixed single-prompt strategy, SSE clarification, schema canonicalization, cost constraints
|0.3|2025-12-24|Added complete API endpoints, LLM config, risk tiers, scraping details

----

== 1. Core Objective (POC1) ==

The primary technical goal of POC1 is to validate **Approach 1 (Single-Pass Holistic Analysis)** while implementing **claim-level caching** to achieve cost sustainability.

The system must prove that AI can identify an article's **Main Thesis** and determine whether its supporting claims logically support that thesis without committing fallacies.

=== Success Criteria: ===

* Test with 30 diverse articles
* Target: ≥70% accuracy detecting misleading articles
* Cost: <$0.25 per NEW analysis (uncached)
* Cost: $0.00 for cached claim reuse
* Cache hit rate: ≥50% after 1,000 articles
* Processing time: <2 minutes (standard depth)

=== Economic Model: ===

* **Free tier:** $10 credit per month (~~40-140 articles depending on cache hits)
* **After limit:** Cache-only mode (instant, free access to cached claims)
* **Paid tier:** Unlimited new analyses

----

== 2. Architecture Overview ==

=== 2.1 3-Stage Pipeline with Caching ===

FactHarbor POC1 uses a **3-stage architecture** designed for claim-level caching and cost efficiency:

{{mermaid}}
graph TD
A[Article Input] --> B[Stage 1: Extract Claims]
B --> C{For Each Claim}
C --> D[Check Cache]
D -->|Cache HIT| E[Return Cached Verdict]
D -->|Cache MISS| F[Stage 2: Analyze Claim]
F --> G[Store in Cache]
G --> E
E --> H[Stage 3: Holistic Assessment]
H --> I[Final Report]
{{/mermaid}}

==== Stage 1: Claim Extraction (Haiku, no cache) ====

* **Input:** Article text
* **Output:** 5 canonical claims (normalized, deduplicated)
* **Model:** Claude Haiku 4 (default, configurable via LLM abstraction layer)
* **Cost:** $0.003 per article
* **Cache strategy:** No caching (article-specific)

==== Stage 2: Claim Analysis (Sonnet, CACHED) ====

* **Input:** Single canonical claim
* **Output:** Scenarios + Evidence + Verdicts
* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
* **Cost:** $0.081 per NEW claim
* **Cache strategy:** Redis, 90-day TTL
* **Cache key:** claim:v1norm1:{language}:{sha256(canonical_claim)}

==== Stage 3: Holistic Assessment (Sonnet, no cache) ====

* **Input:** Article + Claim verdicts (from cache or Stage 2)
* **Output:** Article verdict + Fallacies + Logic quality
* **Model:** Claude Sonnet 3.5 (default, configurable via LLM abstraction layer)
* **Cost:** $0.030 per article
* **Cache strategy:** No caching (article-specific)

**Note:** Stage 3 implements **Approach 1 (Single-Pass Holistic Analysis)** from the [[Article Verdict Problem>>Test.FactHarbor.Specification.POC.Article-Verdict-Problem]]. While claim analysis (Stage 2) is cached for efficiency, the holistic assessment maintains the integrated evaluation philosophy of Approach 1.
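The stage flow above can be sketched as a single dispatch loop. This is an illustrative sketch, not normative: `extract_claims`, `analyze_claim`, and `assess` are hypothetical stand-ins for the Stage 1/2/3 LLM calls, and a plain dict stands in for the Redis cache (the real key derivation normalizes the claim first, per Section 5.1.1):

```python
import hashlib

def cache_key(canonical_claim: str, language: str = "en") -> str:
    # Key schema from Section 5: claim:v1norm1:{language}:{sha256(canonical_claim)}
    digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
    return f"claim:v1norm1:{language}:{digest}"

def run_pipeline(article, cache, extract_claims, analyze_claim, assess):
    verdicts = {}
    for claim in extract_claims(article):        # Stage 1: never cached
        key = cache_key(claim)
        if key in cache:                         # cache HIT: $0.00, no LLM call
            verdicts[claim] = cache[key]
        else:                                    # cache MISS: Stage 2 LLM call
            verdicts[claim] = cache[key] = analyze_claim(claim)
    return assess(article, verdicts)             # Stage 3: never cached
```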

=== Total Cost Formula: ===

{{{Cost = $0.003 (extraction) + (N_new_claims × $0.081) + $0.030 (holistic)

Examples:
- 0 new claims (100% cache hit): $0.033
- 1 new claim (80% cache hit): $0.114
- 3 new claims (40% cache hit): $0.276
- 5 new claims (0% cache hit): $0.438
}}}
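The formula transcribes directly into code (illustrative only; `analysis_cost` is not part of the API surface):

```python
EXTRACTION_COST = 0.003   # Stage 1, per article
NEW_CLAIM_COST = 0.081    # Stage 2, per uncached claim
HOLISTIC_COST = 0.030     # Stage 3, per article

def analysis_cost(n_new_claims: int) -> float:
    """Per-article cost in USD for a run with n_new_claims cache misses."""
    return round(EXTRACTION_COST + n_new_claims * NEW_CLAIM_COST + HOLISTIC_COST, 3)
```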

----

=== 2.2 User Tier System ===

|=Tier|=Monthly Credit|=After Limit|=Cache Access|=Analytics
|**Free**|$10|Cache-only mode|✅ Full|Basic
|**Pro** (future)|$50|Continues|✅ Full|Advanced
|**Enterprise** (future)|Custom|Continues|✅ Full + Priority|Full

**Free Tier Economics:**

* $10 credit = 40-140 articles analyzed (depending on cache hit rate)
* Roughly 65 articles/month at a 70% cache hit rate (≈$0.155 per article under the cost formula above)
* After limit: Cache-only mode

----

=== 2.3 Cache-Only Mode (Free Tier Feature) ===

When free users reach their $10 monthly limit, they enter **Cache-Only Mode**:

==== What Cache-Only Mode Provides: ====

✅ **Claim Extraction (Platform-Funded):**

* Stage 1 extraction runs at $0.003 per article
* **Cost: Absorbed by platform** (not charged to user credit)
* Rationale: Extraction is necessary to check the cache, and the cost is negligible
* Rate limit: Max 50 extractions/day in cache-only mode (prevents abuse)

✅ **Instant Access to Cached Claims:**

* Any claim that exists in cache → Full verdict returned
* Cost: $0 (no LLM calls)
* Response time: <100ms

✅ **Partial Article Analysis:**

* Check each claim against the cache
* Return verdicts for ALL cached claims
* For uncached claims: return a "status": "cache_miss" stub

✅ **Cache Coverage Report:**

* "3 of 5 claims available in cache (60% coverage)"
* Links to cached analyses
* Estimated cost to complete: $0.162 (2 new claims)

❌ **Not Available in Cache-Only Mode:**

* New claim analysis (Stage 2 LLM calls blocked)
* Full holistic assessment (Stage 3 blocked if any claims are missing)

==== User Experience Example: ====

{{{{
  "status": "cache_only_mode",
  "message": "Monthly credit limit reached. Showing cached results only.",
  "cache_coverage": {
    "claims_total": 5,
    "claims_cached": 3,
    "claims_missing": 2,
    "coverage_percent": 60
  },
  "cached_claims": [
    {"claim_id": "C1", "verdict": "Likely", "confidence": 0.82},
    {"claim_id": "C2", "verdict": "Highly Likely", "confidence": 0.91},
    {"claim_id": "C4", "verdict": "Unclear", "confidence": 0.55}
  ],
  "missing_claims": [
    {"claim_id": "C3", "claim_text": "...", "estimated_cost": "$0.081"},
    {"claim_id": "C5", "claim_text": "...", "estimated_cost": "$0.081"}
  ],
  "upgrade_options": {
    "top_up": "$5 for 20-70 more articles",
    "pro_tier": "$50/month unlimited"
  }
}
}}}

**Design Rationale:**

* Free users still get value (cached claims often answer their question)
* Demonstrates FactHarbor's value (partial results encourage upgrade)
* Sustainable for the platform (no additional cost)
* Fair to all users (everyone contributes to the cache)
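The coverage report and completion estimate in the example reduce to simple arithmetic. A minimal sketch, assuming claims have already been canonicalized to their cache keys (`coverage_report` is a hypothetical helper, not a spec-defined endpoint):

```python
NEW_CLAIM_COST = 0.081  # Stage 2 cost per uncached claim

def coverage_report(claim_keys, cache):
    """Builds the cache-coverage skeleton shown in the example response."""
    cached = [k for k in claim_keys if k in cache]
    missing = [k for k in claim_keys if k not in cache]
    return {
        "claims_total": len(claim_keys),
        "claims_cached": len(cached),
        "claims_missing": len(missing),
        "coverage_percent": round(100 * len(cached) / len(claim_keys)) if claim_keys else 0,
        "estimated_cost_to_complete": round(len(missing) * NEW_CLAIM_COST, 3),
    }
```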

----

== 6. LLM Abstraction Layer ==

=== 6.1 Design Principle ===

**FactHarbor uses a provider-agnostic LLM abstraction** to avoid vendor lock-in and enable:

* **Provider switching:** Change LLM providers without code changes
* **Cost optimization:** Use different providers for different stages
* **Resilience:** Automatic fallback if the primary provider fails
* **Cross-checking:** Compare outputs from multiple providers
* **A/B testing:** Test new models without deployment changes

**Implementation:** All LLM calls go through an abstraction layer that routes to configured providers.

----

=== 6.2 LLM Provider Interface ===

**Abstract Interface:**

{{{
interface LLMProvider {
  // Core methods
  complete(prompt: string, options: CompletionOptions): Promise<CompletionResponse>
  stream(prompt: string, options: CompletionOptions): AsyncIterator<StreamChunk>

  // Provider metadata
  getName(): string
  getMaxTokens(): number
  getCostPer1kTokens(): { input: number, output: number }

  // Health check
  isAvailable(): Promise<boolean>
}

interface CompletionOptions {
  model?: string
  maxTokens?: number
  temperature?: number
  stopSequences?: string[]
  systemPrompt?: string
}
}}}

----

=== 6.3 Supported Providers (POC1) ===

**Primary Provider (Default):**

* **Anthropic Claude API**
* Models: Claude Haiku 4, Claude Sonnet 3.5, Claude Opus 4
* Used by default in POC1
* Best quality for holistic analysis

**Secondary Providers (Future):**

* **OpenAI API**
* Models: GPT-4o, GPT-4o-mini
* For cost comparison

* **Google Vertex AI**
* Models: Gemini 1.5 Pro, Gemini 1.5 Flash
* For diversity in evidence gathering

* **Local Models** (Post-POC)
* Models: Llama 3.1, Mistral
* For privacy-sensitive deployments

----

=== 6.4 Provider Configuration ===

**Environment Variables:**

{{{
# Primary provider
LLM_PRIMARY_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Fallback provider
LLM_FALLBACK_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Provider selection per stage
LLM_STAGE1_PROVIDER=anthropic
LLM_STAGE1_MODEL=claude-haiku-4
LLM_STAGE2_PROVIDER=anthropic
LLM_STAGE2_MODEL=claude-sonnet-3-5
LLM_STAGE3_PROVIDER=anthropic
LLM_STAGE3_MODEL=claude-sonnet-3-5

# Cost limits
LLM_MAX_COST_PER_REQUEST=1.00
}}}
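The variables above could be resolved per stage along these lines (a sketch; the function name, the `env` parameter, and the fallback defaults are illustrative assumptions, not spec-defined behavior):

```python
import os

def stage_llm_config(stage: int, env=os.environ) -> dict:
    """Resolves provider/model for a pipeline stage, falling back to POC1 defaults."""
    prefix = f"LLM_STAGE{stage}_"
    default_model = "claude-haiku-4" if stage == 1 else "claude-sonnet-3-5"
    return {
        "provider": env.get(prefix + "PROVIDER", "anthropic"),
        "model": env.get(prefix + "MODEL", default_model),
        "max_cost_per_request": float(env.get("LLM_MAX_COST_PER_REQUEST", "1.00")),
    }
```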

**Database Configuration (Alternative):**

{{{{
  "providers": [
    {
      "name": "anthropic",
      "api_key_ref": "vault://anthropic-api-key",
      "enabled": true,
      "priority": 1
    },
    {
      "name": "openai",
      "api_key_ref": "vault://openai-api-key",
      "enabled": true,
      "priority": 2
    }
  ],
  "stage_config": {
    "stage1": {
      "provider": "anthropic",
      "model": "claude-haiku-4",
      "max_tokens": 4096,
      "temperature": 0.0
    },
    "stage2": {
      "provider": "anthropic",
      "model": "claude-sonnet-3-5",
      "max_tokens": 16384,
      "temperature": 0.3
    },
    "stage3": {
      "provider": "anthropic",
      "model": "claude-sonnet-3-5",
      "max_tokens": 8192,
      "temperature": 0.2
    }
  }
}
}}}

----

=== 6.5 Stage-Specific Models (POC1 Defaults) ===

**Stage 1: Claim Extraction**

* **Default:** Anthropic Claude Haiku 4
* **Alternative:** OpenAI GPT-4o-mini, Google Gemini 1.5 Flash
* **Rationale:** Fast, cheap, simple task
* **Cost:** ~~$0.003 per article

**Stage 2: Claim Analysis** (CACHEABLE)

* **Default:** Anthropic Claude Sonnet 3.5
* **Alternative:** OpenAI GPT-4o, Google Gemini 1.5 Pro
* **Rationale:** High-quality analysis, cached 90 days
* **Cost:** ~~$0.081 per NEW claim

**Stage 3: Holistic Assessment**

* **Default:** Anthropic Claude Sonnet 3.5
* **Alternative:** OpenAI GPT-4o, Claude Opus 4 (for high-stakes)
* **Rationale:** Complex reasoning, logical fallacy detection
* **Cost:** ~~$0.030 per article

**Cost Comparison (Example):**

|=Stage|=Anthropic (Default)|=OpenAI Alternative|=Google Alternative
|Stage 1|Claude Haiku 4 ($0.003)|GPT-4o-mini ($0.002)|Gemini Flash ($0.002)
|Stage 2|Claude Sonnet 3.5 ($0.081)|GPT-4o ($0.045)|Gemini Pro ($0.050)
|Stage 3|Claude Sonnet 3.5 ($0.030)|GPT-4o ($0.018)|Gemini Pro ($0.020)
|**Total (0% cache)**|**$0.114**|**$0.065**|**$0.072**

**Note:** POC1 uses Anthropic exclusively for consistency. Multi-provider support is planned for POC2.

----

=== 6.6 Failover Strategy ===

**Automatic Failover:**

{{{
async function completeLLM(stage: string, prompt: string, options: CompletionOptions): Promise<string> {
  const primaryProvider = getProviderForStage(stage)
  const fallbackProvider = getFallbackProvider()

  try {
    return await primaryProvider.complete(prompt, options)
  } catch (error) {
    if (error.type === 'rate_limit' || error.type === 'service_unavailable') {
      logger.warn('Primary provider failed, using fallback')
      return await fallbackProvider.complete(prompt, options)
    }
    throw error
  }
}
}}}

**Fallback Priority:**

1. **Primary:** Configured provider for the stage
2. **Secondary:** Fallback provider (if configured)
3. **Cache:** Return cached result (if available for Stage 2)
4. **Error:** Return 503 Service Unavailable

----

=== 6.7 Provider Selection API ===

**Admin Endpoint:** POST /admin/v1/llm/configure

**Update provider for a specific stage:**

{{{{
  "stage": "stage2",
  "provider": "openai",
  "model": "gpt-4o",
  "max_tokens": 16384,
  "temperature": 0.3
}
}}}

**Response:** 200 OK

{{{{
  "message": "LLM configuration updated",
  "stage": "stage2",
  "previous": {
    "provider": "anthropic",
    "model": "claude-sonnet-3-5"
  },
  "current": {
    "provider": "openai",
    "model": "gpt-4o"
  },
  "cost_impact": {
    "previous_cost_per_claim": 0.081,
    "new_cost_per_claim": 0.045,
    "savings_percent": 44
  }
}
}}}

**Get current configuration:**

GET /admin/v1/llm/config

{{{{
  "providers": ["anthropic", "openai"],
  "primary": "anthropic",
  "fallback": "openai",
  "stages": {
    "stage1": {
      "provider": "anthropic",
      "model": "claude-haiku-4",
      "cost_per_request": 0.003
    },
    "stage2": {
      "provider": "anthropic",
      "model": "claude-sonnet-3-5",
      "cost_per_new_claim": 0.081
    },
    "stage3": {
      "provider": "anthropic",
      "model": "claude-sonnet-3-5",
      "cost_per_request": 0.030
    }
  }
}
}}}

----

=== 6.8 Implementation Notes ===

**Provider Adapter Pattern:**

{{{
class AnthropicProvider implements LLMProvider {
  async complete(prompt: string, options: CompletionOptions) {
    const response = await anthropic.messages.create({
      model: options.model || 'claude-sonnet-3-5',
      max_tokens: options.maxTokens || 4096,
      messages: [{ role: 'user', content: prompt }],
      system: options.systemPrompt
    })
    return response.content[0].text
  }
}

class OpenAIProvider implements LLMProvider {
  async complete(prompt: string, options: CompletionOptions) {
    // Omit the system message entirely when no systemPrompt is configured
    const messages = options.systemPrompt
      ? [{ role: 'system', content: options.systemPrompt }, { role: 'user', content: prompt }]
      : [{ role: 'user', content: prompt }]
    const response = await openai.chat.completions.create({
      model: options.model || 'gpt-4o',
      max_tokens: options.maxTokens || 4096,
      messages
    })
    return response.choices[0].message.content
  }
}
}}}

**Provider Registry:**

{{{
const providers = new Map<string, LLMProvider>()
providers.set('anthropic', new AnthropicProvider())
providers.set('openai', new OpenAIProvider())
providers.set('google', new GoogleProvider())

function getProvider(name: string): LLMProvider {
  const provider = providers.get(name) ?? providers.get(config.primaryProvider)
  if (!provider) throw new Error(`No LLM provider registered for '${name}'`)
  return provider
}
}}}

----

== 3. REST API Contract ==

=== 3.1 User Credit Tracking ===

**Endpoint:** GET /v1/user/credit

**Response:** 200 OK

{{{{
  "user_id": "user_abc123",
  "tier": "free",
  "credit_limit": 10.00,
  "credit_used": 7.42,
  "credit_remaining": 2.58,
  "reset_date": "2026-01-01T00:00:00Z",
  "cache_only_mode": false,
  "usage_stats": {
    "articles_analyzed": 67,
    "claims_from_cache": 189,
    "claims_newly_analyzed": 113,
    "cache_hit_rate": 0.626
  }
}
}}}
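The derived fields in this response are pure arithmetic over the stored counters; a sketch (`credit_summary` is an illustrative name, not a spec-defined function):

```python
def credit_summary(credit_limit, credit_used, claims_from_cache, claims_new):
    """Derives the computed fields of GET /v1/user/credit from raw counters."""
    return {
        "credit_remaining": round(credit_limit - credit_used, 2),
        "cache_only_mode": credit_used >= credit_limit,
        "cache_hit_rate": round(claims_from_cache / (claims_from_cache + claims_new), 3),
    }
```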

----

=== 3.2 Create Analysis Job (3-Stage) ===

**Endpoint:** POST /v1/analyze

==== Idempotency Support: ====

To prevent duplicate job creation on network retries, clients SHOULD include:

{{{POST /v1/analyze
Idempotency-Key: {client-generated-uuid}
}}}

OR use the client.request_id field:

{{{{
  "input_url": "...",
  "client": {
    "request_id": "client-uuid-12345",
    "source_label": "optional"
  }
}
}}}

**Server Behavior:**

* If the Idempotency-Key or request_id has been seen before (within 24 hours):
** Return the existing job (200 OK, not 202 Accepted)
** Do NOT create a duplicate job or charge twice
* Idempotency keys expire after 24 hours (matches job retention)
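The behavior above can be sketched with an expiring key table. In production this would live in Redis (an atomic SET with NX and a 24-hour expiry); here a module-level dict keeps the sketch self-contained, and `create_job` is a hypothetical handler, not the actual server code:

```python
import time
import uuid

IDEMPOTENCY_TTL = 24 * 3600  # seconds; matches job retention

_seen: dict = {}  # idempotency_key -> {"job_id": ..., "expires_at": ...}

def create_job(idempotency_key: str, now: float = None) -> dict:
    now = time.time() if now is None else now
    entry = _seen.get(idempotency_key)
    if entry is not None and entry["expires_at"] > now:
        # Key seen within 24h: return the existing job, 200 OK, no double charge
        return {"job_id": entry["job_id"], "http_status": 200, "idempotent": True}
    job_id = str(uuid.uuid4())
    _seen[idempotency_key] = {"job_id": job_id, "expires_at": now + IDEMPOTENCY_TTL}
    return {"job_id": job_id, "http_status": 202, "idempotent": False}
```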

**Example Response (Idempotent):**

{{{{
  "job_id": "01J...ULID",
  "status": "RUNNING",
  "idempotent": true,
  "original_request_at": "2025-12-24T10:31:00Z",
  "message": "Returning existing job (idempotency key matched)"
}
}}}

==== Request Body: ====

{{{{
  "input_type": "url",
  "input_url": "https://example.com/medical-report-01",
  "input_text": null,
  "options": {
    "browsing": "on",
    "depth": "standard",
    "max_claims": 5,
    "scenarios_per_claim": 2,
    "max_evidence_per_scenario": 6,
    "context_aware_analysis": true
  },
  "client": {
    "request_id": "optional-client-tracking-id",
    "source_label": "optional"
  }
}
}}}

**Options:**

* browsing: on | off (retrieve web sources or just output queries)
* depth: standard | deep (evidence thoroughness)
* max_claims: 1-10 (default: **5** for cost control)
* scenarios_per_claim: 1-5 (default: **2** for cost control)
* max_evidence_per_scenario: 3-10 (default: **6**)
* context_aware_analysis: true | false (experimental)
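Server-side validation of these bounds might look as follows (a sketch; `resolve_options` and the assumed default of `false` for context_aware_analysis are illustrative, not normative):

```python
OPTION_BOUNDS = {
    # name: (min, max, default) per the Options list above
    "max_claims": (1, 10, 5),
    "scenarios_per_claim": (1, 5, 2),
    "max_evidence_per_scenario": (3, 10, 6),
}

def resolve_options(requested: dict) -> dict:
    """Applies defaults and range checks to an /v1/analyze options object."""
    resolved = {
        "browsing": requested.get("browsing", "on"),
        "depth": requested.get("depth", "standard"),
        # Default assumed here; the spec marks this option experimental
        "context_aware_analysis": requested.get("context_aware_analysis", False),
    }
    for name, (lo, hi, default) in OPTION_BOUNDS.items():
        value = requested.get(name, default)
        if not (isinstance(value, int) and lo <= value <= hi):
            raise ValueError(f"{name} must be an integer in [{lo}, {hi}]")
        resolved[name] = value
    return resolved
```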

**Response:** 202 Accepted

{{{{
  "job_id": "01J...ULID",
  "status": "QUEUED",
  "created_at": "2025-12-24T10:31:00Z",
  "estimated_cost": 0.114,
  "cost_breakdown": {
    "stage1_extraction": 0.003,
    "stage2_new_claims": 0.081,
    "stage2_cached_claims": 0.000,
    "stage3_holistic": 0.030
  },
  "cache_info": {
    "claims_to_extract": 5,
    "estimated_cache_hits": 4,
    "estimated_new_claims": 1
  },
  "links": {
    "self": "/v1/jobs/01J...ULID",
    "result": "/v1/jobs/01J...ULID/result",
    "report": "/v1/jobs/01J...ULID/report",
    "events": "/v1/jobs/01J...ULID/events"
  }
}
}}}

**Error Responses:**

402 Payment Required - Free tier limit reached, cache-only mode

{{{{
  "error": "credit_limit_reached",
  "message": "Monthly credit limit reached. Entering cache-only mode.",
  "cache_only_mode": true,
  "credit_remaining": 0.00,
  "reset_date": "2026-01-01T00:00:00Z",
  "action": "Resubmit with cache_preference=allow_partial for cached results"
}
}}}

----

== 4. Data Schemas ==

=== 4.1 Stage 1 Output: ClaimExtraction ===

{{{{
  "job_id": "01J...ULID",
  "stage": "stage1_extraction",
  "article_metadata": {
    "title": "Article title",
    "source_url": "https://example.com/article",
    "extracted_text_length": 5234,
    "language": "en"
  },
  "claims": [
    {
      "claim_id": "C1",
      "claim_text": "Original claim text from article",
      "canonical_claim": "Normalized, deduplicated phrasing",
      "claim_hash": "sha256:abc123...",
      "is_central_to_thesis": true,
      "claim_type": "causal",
      "evaluability": "evaluable",
      "risk_tier": "B",
      "domain": "public_health"
    }
  ],
  "article_thesis": "Main argument detected",
  "cost": 0.003
}
}}}

----

=== 4.5 Verdict Label Taxonomy ===

FactHarbor uses **three distinct verdict taxonomies** depending on the analysis level:

==== 4.5.1 Scenario Verdict Labels (Stage 2) ====

Used for individual scenario verdicts within a claim.

**Enum Values:**

* Highly Likely - Probability 0.85-1.0, high confidence
* Likely - Probability 0.65-0.84, moderate-high confidence
* Unclear - Probability 0.35-0.64, or low confidence
* Unlikely - Probability 0.16-0.34, moderate-high confidence
* Highly Unlikely - Probability 0.0-0.15, high confidence
* Unsubstantiated - Insufficient evidence to determine probability
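A sketch of the band-to-label mapping (illustrative; the real verdict logic also folds in confidence, which this sketch reduces to a single evidence-sufficiency flag):

```python
def scenario_label(probability: float, sufficient_evidence: bool = True) -> str:
    """Maps a scenario probability to its verdict label per the bands above."""
    if not sufficient_evidence:
        return "Unsubstantiated"
    if probability >= 0.85:
        return "Highly Likely"
    if probability >= 0.65:
        return "Likely"
    if probability >= 0.35:
        return "Unclear"
    if probability >= 0.16:
        return "Unlikely"
    return "Highly Unlikely"
```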

==== 4.5.2 Claim Verdict Labels (Rollup) ====

Used when summarizing a claim across all scenarios.

**Enum Values:**

* Supported - Majority of scenarios are Likely or Highly Likely
* Refuted - Majority of scenarios are Unlikely or Highly Unlikely
* Inconclusive - Mixed scenarios or majority Unclear/Unsubstantiated

**Mapping Logic:**

* If ≥60% of scenarios are (Highly Likely | Likely) → Supported
* If ≥60% of scenarios are (Highly Unlikely | Unlikely) → Refuted
* Otherwise → Inconclusive
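The mapping logic translates directly (a sketch; `claim_rollup` is a hypothetical name for the rollup step):

```python
def claim_rollup(scenario_labels: list) -> str:
    """Rolls scenario verdict labels up to a claim verdict per the 60% rule above."""
    n = len(scenario_labels)
    supported = sum(lbl in ("Highly Likely", "Likely") for lbl in scenario_labels)
    refuted = sum(lbl in ("Highly Unlikely", "Unlikely") for lbl in scenario_labels)
    if supported / n >= 0.6:
        return "Supported"
    if refuted / n >= 0.6:
        return "Refuted"
    return "Inconclusive"
```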

==== 4.5.3 Article Verdict Labels (Stage 3) ====

Used for holistic article-level assessment.

**Enum Values:**

* WELL-SUPPORTED - Article thesis logically follows from supported claims
* MISLEADING - Claims may be true, but the article commits logical fallacies
* REFUTED - Central claims are refuted, invalidating the thesis
* UNCERTAIN - Insufficient evidence or highly mixed claim verdicts

**Note:** The article verdict considers **claim centrality** (central claims override supporting claims).

==== 4.5.4 API Field Mapping ====

|=Level|=API Field|=Enum Name
|Scenario|scenarios[].verdict.label|scenario_verdict_label
|Claim|claims[].rollup_verdict (optional)|claim_verdict_label
|Article|article_holistic_assessment.overall_verdict|article_verdict_label

----

== 5. Cache Architecture ==

=== 5.1 Redis Cache Design ===

**Technology:** Redis 7.0+ (in-memory key-value store)

**Cache Key Schema:**

{{{claim:v1norm1:{language}:{sha256(canonical_claim)}
}}}

**Example:**

{{{Claim (English): "COVID vaccines are 95% effective"
Canonical: "covid vaccines are 95 percent effective"
Language: "en"
SHA256: abc123...def456
Key: claim:v1norm1:en:abc123...def456
}}}

**Rationale:** Prevents cross-language collisions and enables per-language cache analytics.

**Data Structure:**

{{{SET claim:v1norm1:en:abc123...def456 '{...ClaimAnalysis JSON...}'
EXPIRE claim:v1norm1:en:abc123...def456 7776000 # 90 days
}}}
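The SET/EXPIRE pair above can also be issued as a single SETEX. A thin wrapper sketch (`ClaimCache` is illustrative; `client` is assumed to be any object with redis-py style `setex`/`get` methods, so an in-memory stub can stand in for Redis during tests):

```python
import hashlib
import json

TTL_SECONDS = 90 * 24 * 3600  # 7776000 = 90 days

class ClaimCache:
    def __init__(self, client, language: str = "en"):
        self.client = client
        self.language = language

    def key(self, canonical_claim: str) -> str:
        digest = hashlib.sha256(canonical_claim.encode("utf-8")).hexdigest()
        return f"claim:v1norm1:{self.language}:{digest}"

    def store(self, canonical_claim: str, analysis: dict) -> None:
        # SETEX = SET + EXPIRE in one call
        self.client.setex(self.key(canonical_claim), TTL_SECONDS, json.dumps(analysis))

    def lookup(self, canonical_claim: str):
        raw = self.client.get(self.key(canonical_claim))
        return None if raw is None else json.loads(raw)
```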

----

=== 5.1.1 Canonical Claim Normalization (v1) ===

The cache key depends on deterministic claim normalization. All implementations MUST follow this algorithm exactly.

**Algorithm: Canonical Claim Normalization v1**

{{{def normalize_claim_v1(claim_text: str, language: str) -> str:
    """
    Normalizes a claim to canonical form for cache key generation.
    Version: v1norm1 (POC1)
    """
    import re
    import unicodedata

    # Step 1: Unicode normalization (NFC)
    text = unicodedata.normalize('NFC', claim_text)

    # Step 2: Lowercase
    text = text.lower()

    # Step 3: Common abbreviations (English only in v1).
    # Runs BEFORE punctuation removal; otherwise patterns such as
    # "u.s." could never match.
    if language == 'en':
        text = text.replace('covid-19', 'covid')
        text = text.replace('u.s.', 'us')
        text = text.replace('u.k.', 'uk')

    # Step 4: Numeric symbols. Also BEFORE punctuation removal,
    # which would otherwise strip '%' and lose the token.
    text = text.replace('%', ' percent')

    # Step 5: Remove punctuation (except hyphens in words)
    text = re.sub(r'[^\w\s-]', '', text)

    # Step 6: Normalize whitespace (collapse multiple spaces)
    text = re.sub(r'\s+', ' ', text).strip()

    # Step 7: Spell out standalone single-digit numbers
    num_to_word = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three',
                   '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
                   '8': 'eight', '9': 'nine'}
    for num, word in num_to_word.items():
        text = re.sub(rf'\b{num}\b', word, text)

    # Step 8: NO entity normalization in v1
    # (Trump vs Donald Trump vs President Trump remain distinct)

    return text

# Version identifier (include in cache namespace)
CANONICALIZER_VERSION = "v1norm1"
}}}

**Cache Key Formula (Updated):**

{{{language = "en"
canonical = normalize_claim_v1(claim_text, language)
cache_key = f"claim:{CANONICALIZER_VERSION}:{language}:{sha256(canonical)}"

Example:
claim: "COVID-19 vaccines are 95% effective"
canonical: "covid vaccines are 95 percent effective"
sha256: abc123...def456
key: "claim:v1norm1:en:abc123...def456"
}}}

**Cache Metadata MUST Include:**

{{{{
  "canonical_claim": "covid vaccines are 95 percent effective",
  "canonicalizer_version": "v1norm1",
  "language": "en",
  "original_claim_samples": ["COVID-19 vaccines are 95% effective"]
}
}}}

**Version Upgrade Path:**

* v1norm1 → v1norm2: Cache namespace changes; old keys remain valid until TTL expiry
* v1normN → v2norm1: Major version bump; invalidate all v1 caches

----

=== 5.1.2 Copyright & Data Retention Policy ===

**Evidence Excerpt Storage:**

To comply with copyright law and fair-use principles:

**What We Store:**

* **Metadata only:** Title, author, publisher, URL, publication date
* **Short excerpts:** Max 25 words per quote, max 3 quotes per evidence item
* **Summaries:** AI-generated bullet points (not verbatim text)
* **No full articles:** Never store complete article text beyond job processing

**Total per Cached Claim:**

* Scenarios: 2 per claim
* Evidence items: 6 per scenario (12 total)
* Quotes: 3 per evidence item × 25 words = 75 words per item
* **Maximum stored verbatim text:** ~~900 words per claim (12 × 75)
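These storage limits can be enforced mechanically at write time (a sketch; `limit_excerpts` is an illustrative helper, not part of the spec):

```python
MAX_QUOTES_PER_EVIDENCE = 3
MAX_WORDS_PER_QUOTE = 25

def limit_excerpts(quotes: list) -> list:
    """Truncates verbatim quotes to the fair-use storage policy above."""
    kept = quotes[:MAX_QUOTES_PER_EVIDENCE]
    return [" ".join(q.split()[:MAX_WORDS_PER_QUOTE]) for q in kept]
```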

**Retention:**

* Cache TTL: 90 days
* Job outputs: 24 hours (then archived or deleted)
* No persistent full-text article storage

**Rationale:**

* Short excerpts for citation = fair use
* Summaries are transformative (no verbatim reproduction)
* Limited retention (90 days max)
* No commercial republication of excerpts

**DMCA Compliance:**

* Cache invalidation endpoint available for rights holders
* Contact: dmca@factharbor.org

----

== Summary ==

This WYSIWYG preview shows the **structure and key sections** of the 1,515-line API specification.

**Full specification includes:**

* Complete API endpoints (7 total)
* All data schemas (ClaimExtraction, ClaimAnalysis, HolisticAssessment, Complete)
* Quality gates & validation rules
* LLM configuration for all 3 stages
* Implementation notes with code samples
* Testing strategy
* Cross-references to other pages

**The complete specification is available in:**

* FactHarbor_POC1_API_and_Schemas_Spec_v0_4_1_PATCHED.md (45 KB standalone)
* Export files (TEST/PRODUCTION) for xWiki import