= Automation =
**How FactHarbor scales through automated claim evaluation.**
== 1. Automation Philosophy ==
FactHarbor is **automation-first**: AKEL (AI Knowledge Extraction Layer) makes all content decisions. Humans monitor system performance and improve algorithms.
**Why automation:**
* **Scale**: Can process millions of claims
* **Consistency**: Same evaluation criteria applied uniformly
* **Transparency**: Algorithms are auditable
* **Speed**: Results typically in <20 seconds
See [[Automation Philosophy>>Test.FactHarbor.Organisation.Automation-Philosophy]] for detailed principles.
== 2. Claim Processing Flow ==
=== 2.1 User Submits Claim ===
* User provides claim text + source URLs
* System validates format
* Assigns processing ID
* Queues for AKEL processing (see the sketch below)
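The submission step can be illustrated with a short sketch. The names (ClaimSubmission, validate_submission) and the UUID-based processing ID are assumptions for illustration, not FactHarbor's actual API:

{{code language="python"}}
import uuid
from dataclasses import dataclass, field

@dataclass
class ClaimSubmission:
    claim_text: str
    source_urls: list[str]
    # Processing ID assigned at submission time (illustrative scheme).
    processing_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def validate_submission(sub: ClaimSubmission) -> list[str]:
    """Return format problems; an empty list means the claim can be queued for AKEL."""
    problems = []
    if not sub.claim_text.strip():
        problems.append("claim text is empty")
    if not sub.source_urls:
        problems.append("at least one source URL is required")
    elif any(not u.startswith(("http://", "https://")) for u in sub.source_urls):
        problems.append("source URLs must be absolute http(s) URLs")
    return problems

sub = ClaimSubmission("The Rhine is about 1,230 km long.", ["https://example.org/rhine"])
print(sub.processing_id, validate_submission(sub))
{{/code}}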
=== 2.2 AKEL Processing ===
**AKEL automatically:**
1. Parses claim into testable components
2. Extracts evidence from sources
3. Scores source credibility
4. Evaluates claim against evidence
5. Generates verdict with confidence score
6. Assigns risk tier (A/B/C)
7. Publishes result
**Processing time**: Typically <20 seconds
**No human approval required** - publication is automatic (sketched below)
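A minimal sketch of this seven-step flow. The heuristics, thresholds, and Result fields below are illustrative stand-ins for AKEL's real parsing, evidence extraction, and scoring logic, not the actual implementation:

{{code language="python"}}
from dataclasses import dataclass

@dataclass
class Result:
    verdict: str
    confidence: float
    risk_tier: str
    published: bool

def process_claim(claim_text: str, source_urls: list[str]) -> Result:
    # 1. Parse the claim into testable components (naive stand-in: sentence split).
    components = [p.strip() for p in claim_text.split(".") if p.strip()]
    # 2.-3. Evidence extraction and source-credibility scoring (stand-in heuristic).
    source_score = min(1.0, len(source_urls) / 3)
    # 4.-5. Evaluate against evidence and generate a verdict with a confidence score.
    confidence = round(0.5 * source_score + (0.5 if components else 0.0), 2)
    verdict = "supported" if confidence >= 0.7 else "uncertain"
    # 6. Risk tier would come from domain classification; defaulted here.
    risk_tier = "C"
    # 7. Publication is automatic; there is no human approval step.
    return Result(verdict, confidence, risk_tier, published=True)

print(process_claim("The Rhine is about 1,230 km long.", ["https://example.org/rhine"]))
{{/code}}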
=== 2.3 Publication States ===
**Processing**: AKEL working on claim (not visible to public)
**Published**: AKEL completed evaluation (public)
* Verdict displayed with confidence score
* Evidence and sources shown
* Risk tier indicated
* Users can report issues
**Flagged**: AKEL identified issue requiring moderator attention (still public)
* Confidence below threshold
* Detected manipulation attempt
* Unusual pattern
* Moderator reviews and may take action
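A minimal sketch of the three publication states, mainly to make explicit that flagged claims stay public; the enum and helper are hypothetical, not part of an actual codebase:

{{code language="python"}}
from enum import Enum

class PublicationState(Enum):
    PROCESSING = "processing"  # AKEL still working; not visible to the public
    PUBLISHED = "published"    # evaluation complete; publicly visible
    FLAGGED = "flagged"        # published, but queued for moderator attention

def is_public(state: PublicationState) -> bool:
    """Flagged claims remain public; only in-progress claims are hidden."""
    return state is not PublicationState.PROCESSING

assert is_public(PublicationState.FLAGGED)
{{/code}}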
== 2.5 LLM-Based Processing Architecture ==
FactHarbor delegates complex reasoning and analysis tasks to Large Language Models (LLMs). The architecture evolves from POC to production:
=== POC: Two-Phase Approach ===
**Phase 1: Claim Extraction**
* Single LLM call to extract all claims from submitted content
* Light structure, focused on identifying distinct verifiable claims
* Output: List of claims with context
**Phase 2: Claim Analysis (Parallel)**
* Single LLM call per claim (parallelizable)
* Full structured output: Evidence, Scenarios, Sources, Verdict, Risk
* Each claim analyzed independently
**Advantages:**
* Fast to implement (2-4 weeks to working POC)
* Only 1 + N API calls in total (one extraction call plus one per claim)
* Simple to debug (claim-level isolation)
* Proves concept viability (see the sketch below)
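The two-phase POC structure can be sketched as one extraction call followed by N parallel analysis calls. The call_llm stub and the claim-splitting logic are placeholders, not a real LLM client:

{{code language="python"}}
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Stand-in for a single LLM API call; swap in a real client here."""
    return "[model output for: " + prompt.splitlines()[0][:60] + "]"

def extract_claims(content: str) -> list[str]:
    """Phase 1: one call to identify distinct verifiable claims (parsing is a stand-in)."""
    _ = call_llm("Extract the distinct verifiable claims:\n" + content)
    return [line.strip() for line in content.splitlines() if line.strip()]

def analyze_claim(claim: str) -> str:
    """Phase 2: one structured call per claim (evidence, scenarios, sources, verdict, risk)."""
    return call_llm("Analyze and return evidence, scenarios, sources, verdict, risk:\n" + claim)

def run_poc(content: str) -> list[str]:
    claims = extract_claims(content)            # 1 extraction call
    with ThreadPoolExecutor() as pool:          # N analysis calls, one per claim, in parallel
        return list(pool.map(analyze_claim, claims))

print(run_poc("The Rhine is about 1,230 km long.\nCoffee consumption causes insomnia."))
{{/code}}

Because each claim is analyzed in its own call, a problem with one claim stays isolated to that call, which is what makes claim-level debugging simple.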
=== Production: Three-Phase Approach ===
**Phase 1: Claim Extraction + Validation**
* Extract distinct verifiable claims
* Validate claim clarity and uniqueness
* Remove duplicates and vague claims
**Phase 2: Evidence Gathering (Parallel)**
* For each claim independently:
** Find supporting and contradicting evidence
** Identify authoritative sources
** Generate test scenarios
* Validation: Check evidence quality and source validity
* Error containment: Issues in one claim don't affect others
**Phase 3: Verdict Generation (Parallel)**
* For each claim:
** Generate verdict based on validated evidence
** Assess confidence and risk level
** Flag low-confidence results for human review
* Validation: Check verdict consistency with evidence
**Advantages:**
* Error containment between phases
* Clear quality gates and validation
* Observable metrics per phase
* Scalable (parallel processing across claims)
* Adaptable (can optimize each phase independently)
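As one example of a quality gate between phases, here is a minimal sketch of the Phase 1 validation step (duplicate removal plus a crude vagueness filter); the word-count heuristic is an illustrative assumption, not FactHarbor's actual rule:

{{code language="python"}}
def validate_extracted_claims(claims: list[str]) -> list[str]:
    """Phase 1 gate: keep only distinct, reasonably specific claims."""
    seen: set[str] = set()
    kept: list[str] = []
    for claim in claims:
        normalized = " ".join(claim.lower().split())
        if not normalized or normalized in seen:
            continue                      # drop duplicates and empty strings
        if len(normalized.split()) < 4:
            continue                      # drop claims too vague to test (assumed heuristic)
        seen.add(normalized)
        kept.append(claim.strip())
    return kept

print(validate_extracted_claims([
    "The Rhine is about 1,230 km long.",
    "The Rhine is about 1,230 km long.",   # duplicate -> removed
    "Bad things.",                          # too vague -> removed
]))
{{/code}}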
=== LLM Task Delegation ===
All complex cognitive tasks are delegated to LLMs:
* **Claim Extraction**: Understanding context, identifying distinct claims
* **Evidence Finding**: Analyzing sources, assessing relevance
* **Scenario Generation**: Creating testable hypotheses
* **Source Evaluation**: Assessing reliability and authority
* **Verdict Generation**: Synthesizing evidence into conclusions
* **Risk Assessment**: Evaluating potential impact
=== Error Mitigation ===
Research shows sequential LLM calls face compound error risks. FactHarbor mitigates this through:
* **Validation gates** between phases
* **Confidence thresholds** for quality control
* **Parallel processing** to avoid error propagation across claims
* **Human review queue** for low-confidence verdicts
* **Independent claim processing** - errors in one claim don't cascade to others (see the sketch below)
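A minimal sketch of the containment idea: each claim is evaluated in its own try/except so one failure is flagged instead of cascading, and low-confidence results are routed toward the human review queue. The 0.7 threshold and the function names are assumptions for illustration:

{{code language="python"}}
from concurrent.futures import ThreadPoolExecutor

def evaluate_claim(claim: str) -> dict:
    """Stand-in for the full per-claim pipeline; may raise on bad input."""
    if not claim.strip():
        raise ValueError("empty claim")
    return {"claim": claim, "verdict": "supported", "confidence": 0.85}

def evaluate_all(claims: list[str]) -> list[dict]:
    """Claims are processed independently; a failure or low confidence flags only that claim."""
    def contained(claim: str) -> dict:
        try:
            result = evaluate_claim(claim)
        except Exception as exc:
            return {"claim": claim, "verdict": None, "flagged": True, "reason": str(exc)}
        result["flagged"] = result["confidence"] < 0.7  # below threshold -> human review queue
        return result
    with ThreadPoolExecutor() as pool:
        return list(pool.map(contained, claims))

print(evaluate_all(["The Rhine is about 1,230 km long.", ""]))
{{/code}}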
== 3. Risk Tiers ==
Risk tiers classify claims by potential impact and guide audit sampling rates.
=== 3.1 Tier A (High Risk) ===
**Domains**: Medical, legal, elections, safety, security
**Characteristics**:
* High potential for harm if incorrect
* Complex specialized knowledge required
* Often subject to regulation
**Publication**: AKEL publishes automatically with prominent risk warning
**Audit rate**: Higher sampling recommended
=== 3.2 Tier B (Medium Risk) ===
**Domains**: Complex policy, science, causality claims
**Characteristics**:
* Moderate potential impact
* Requires careful evidence evaluation
* Multiple valid interpretations possible
**Publication**: AKEL publishes automatically with standard risk label
**Audit rate**: Moderate sampling recommended
=== 3.3 Tier C (Low Risk) ===
**Domains**: Definitions, established facts, historical data
**Characteristics**:
* Low potential for harm
* Well-documented information
* Typically clear right/wrong answers
**Publication**: AKEL publishes by default
**Audit rate**: Lower sampling recommended
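A minimal sketch of tier assignment as a domain-to-tier lookup; the keyword sets and the "highest applicable tier wins" rule are assumptions for illustration, not the actual classifier:

{{code language="python"}}
# Hypothetical domain keyword sets; the real classifier would be more sophisticated.
TIER_A_DOMAINS = {"medical", "legal", "election", "safety", "security"}
TIER_B_DOMAINS = {"policy", "science", "causality"}

def assign_risk_tier(domains: set[str]) -> str:
    """Map a claim's detected domains to tier A, B, or C (highest applicable tier wins)."""
    if domains & TIER_A_DOMAINS:
        return "A"
    if domains & TIER_B_DOMAINS:
        return "B"
    return "C"

print(assign_risk_tier({"medical"}))   # A
print(assign_risk_tier({"science"}))   # B
print(assign_risk_tier({"history"}))   # C
{{/code}}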
== 4. Quality Gates ==
AKEL applies quality gates before publication. If any gate fails, the claim is **flagged** (not blocked - it is still published).
**Quality gates**:
* Sufficient evidence extracted (≥2 sources)
* Sources meet minimum credibility threshold
* Confidence score calculable
* No detected manipulation patterns
* Claim parseable into testable form
**Failed gates**: Claim published with flag for moderator review
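A minimal sketch of the gate check, emphasizing that failed gates flag the claim rather than block publication; the field names and the 0.5 credibility threshold are illustrative assumptions:

{{code language="python"}}
def run_quality_gates(claim: dict) -> dict:
    """Evaluate the gates listed above; failures flag the claim, they never block publication."""
    gates = {
        "enough_evidence": len(claim.get("sources", [])) >= 2,
        "credible_sources": claim.get("min_source_credibility", 0.0) >= 0.5,  # assumed threshold
        "confidence_available": claim.get("confidence") is not None,
        "no_manipulation": not claim.get("manipulation_detected", False),
        "parseable": bool(claim.get("testable_components")),
    }
    claim["published"] = True                    # publication is always automatic
    claim["flagged"] = not all(gates.values())   # failed gates only add a moderator flag
    claim["failed_gates"] = [name for name, ok in gates.items() if not ok]
    return claim

print(run_quality_gates({"sources": ["https://example.org/a"], "confidence": 0.8,
                         "testable_components": ["x"]}))
{{/code}}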
== 5. Automation Levels ==
{{include reference="Test.FactHarbor.Specification.Diagrams.Automation Level.WebHome"/}}
FactHarbor progresses through automation maturity levels:
**Release 0.5** (Proof-of-Concept): Tier C only, human review required
**Release 1.0** (Initial): Tier B/C auto-published, Tier A flagged for review
**Release 2.0** (Mature): All tiers auto-published with risk labels, sampling audits
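The maturity levels can be read as a per-tier publication policy. A minimal sketch, assuming an illustrative encoding of the behavior described above (the labels are not official configuration values):

{{code language="python"}}
# Assumed encoding of the maturity levels above; per-tier behavior is illustrative only.
AUTOMATION_LEVELS = {
    "0.5": {"A": "not_processed", "B": "not_processed", "C": "human_review_required"},
    "1.0": {"A": "flagged_for_review", "B": "auto_published", "C": "auto_published"},
    "2.0": {"A": "auto_published_with_label", "B": "auto_published_with_label",
            "C": "auto_published_with_label"},
}

def publication_mode(release: str, tier: str) -> str:
    """Look up how a claim of the given risk tier is handled at a given release."""
    return AUTOMATION_LEVELS[release][tier]

print(publication_mode("1.0", "A"))  # flagged_for_review
{{/code}}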
See [[Automation Roadmap>>Test.FactHarbor.Specification.Diagrams.Automation Roadmap.WebHome]] for detailed progression.
== 5.5 Automation Roadmap ==
{{include reference="Test.FactHarbor.Specification.Diagrams.Automation Roadmap.WebHome"/}}
== 6. Human Role ==
Humans do NOT review content for approval. Instead:
**Monitoring**: Watch aggregate performance metrics
**Improvement**: Fix algorithms when patterns show issues
**Exception handling**: Review AKEL-flagged items
**Governance**: Set policies AKEL applies
See [[Contributor Processes>>Test.FactHarbor.Organisation.Contributor-Processes]] for how to improve the system.
== 6.5 Manual vs Automated Matrix ==
{{include reference="Test.FactHarbor.Specification.Diagrams.Manual vs Automated matrix.WebHome"/}}
== 7. Moderation ==
Moderators handle items AKEL flags:
**Abuse detection**: Spam, manipulation, harassment
**Safety issues**: Content that could cause immediate harm
**System gaming**: Attempts to manipulate scoring
**Action**: May temporarily hide content, ban users, or propose algorithm improvements
**Does NOT**: Routinely review claims or override verdicts
See [[Organisational Model>>Test.FactHarbor.Organisation.Organisational-Model]] for moderator role details.
== 8. Continuous Improvement ==
**Performance monitoring**: Track AKEL accuracy, speed, coverage
**Issue identification**: Find systematic errors from metrics
**Algorithm updates**: Deploy improvements to fix patterns
**A/B testing**: Validate changes before full rollout
**Retrospectives**: Learn from failures systematically
See [[Continuous Improvement>>Test.FactHarbor.Organisation.How-We-Work-Together.Continuous-Improvement]] for the improvement cycle.
== 9. Scalability ==
Automation enables FactHarbor to scale:
* **Millions of claims** processable
* **Consistent quality** at any volume
* **Cost efficiency** through automation
* **Rapid iteration** on algorithms
Without automation, human review does not scale: it creates bottlenecks and introduces inconsistency.
== 10. Transparency ==
All automation is transparent:
* **Algorithm parameters** documented
* **Evaluation criteria** public
* **Source scoring rules** explicit
* **Confidence calculations** explained
* **Performance metrics** visible
See [[System Performance Metrics>>Test.FactHarbor.Specification.System-Performance-Metrics]] for what we measure.