Requirements
This page defines Roles, Content States, Rules, and System Requirements for FactHarbor.
Core Philosophy: Invest in system improvement, not manual data correction. When AI makes errors, improve the algorithm and re-process automatically.
Navigation
- User Needs - What users need from FactHarbor (drives these requirements)
- This page - How we fulfill those needs through system design
1. Roles
Fulfills: UN-12 (Submit claims), UN-13 (Cite verdicts), UN-14 (API access)
FactHarbor uses three core roles (Reader, Contributor, Moderator) plus a reputation system; domain experts are consulted ad hoc for specific disputes (see 1.4).
1.1 Reader
Who: Anyone (no login required)
Can:
- Browse and search claims
- View scenarios, evidence, verdicts, and confidence scores
- Flag issues or errors
- Use filters, search, and visualization tools
- Submit claims (new claims are added automatically if they are not duplicates)
Cannot:
- Modify content
- Access edit history details
User Needs served: UN-1 (Trust assessment), UN-2 (Claim verification), UN-3 (Article summary with FactHarbor analysis summary), UN-4 (Social media fact-checking), UN-5 (Source tracing), UN-7 (Evidence transparency), UN-8 (Understanding disagreement), UN-12 (Submit claims), UN-17 (In-article highlighting)
1.2 Contributor
Who: Registered users (earns reputation through contributions)
Can:
- Everything a Reader can do
- Edit claims, evidence, and scenarios
- Add sources and citations
- Suggest improvements to AI-generated content
- Participate in discussions
- Earn reputation points for quality contributions
Reputation System:
- New contributors: Limited edit privileges
- Established contributors: Full edit access
- Trusted contributors (substantial reputation): Can approve certain changes
- Reputation earned through: Accepted edits, helpful flags, quality contributions
- Reputation lost through: Reverted edits, invalid flags, abuse
Cannot:
- Delete or hide content (only moderators)
- Override moderation decisions
User Needs served: UN-13 (Cite and contribute)
1.3 Moderator
Who: Trusted community members with proven track record, appointed by governance board
Can:
- Review flagged content
- Hide harmful or abusive content
- Resolve disputes between contributors
- Issue warnings or temporary bans
- Make final decisions on content disputes
- Access full audit logs
Cannot:
- Change governance rules
- Permanently ban users without board approval
- Override technical quality gates
Note: Small team (3-5 initially), supported by automated moderation tools.
1.4 Domain Trusted Contributors (Optional, Task-Specific)
Who: Subject matter specialists invited for specific high-stakes disputes
Not a permanent role: Contacted externally when needed for contested claims in their domain
When used:
- Medical claims with life/safety implications
- Legal interpretations with significant impact
- Scientific claims with high controversy
- Technical claims requiring specialized knowledge
Process:
- Moderator identifies need for expert input
- Contact expert externally (don't require them to be users)
- Domain Trusted Contributor provides a written opinion with sources
- Opinion added to claim record
- Domain Trusted Contributor acknowledged in the claim
User Needs served: UN-16 (Expert validation status)
2. Content States
Fulfills: UN-1 (Trust indicators), UN-16 (Review status transparency)
FactHarbor uses two content states. Focus is on transparency and confidence scoring, not gatekeeping.
2.1 Published
Status: Visible to all users
Includes:
- AI-generated analyses (default state)
- User-contributed content
- Edited/improved content
Quality Indicators (displayed with content):
- Confidence Score: 0-100% (AI's confidence in analysis)
- Source Quality Score: 0-100% (based on source track record)
- Controversy Flag: If high dispute/edit activity
- Completeness Score: % of expected fields filled
- Last Updated: Date of most recent change
- Edit Count: Number of revisions
- Review Status: AI-generated / Human-reviewed / Expert-validated
Automatic Warnings:
- Confidence < 60%: "Low confidence - use caution"
- Source quality < 40%: "Sources may be unreliable"
- High controversy: "Disputed - multiple interpretations exist"
- Medical/Legal/Safety domain: "Seek professional advice"
User Needs served: UN-1 (Trust score), UN-9 (Methodology transparency), UN-15 (Evolution timeline), UN-16 (Review status)
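A minimal sketch of how the automatic warning rules above might be encoded. The thresholds and messages come from this page; the function signature and field names are illustrative assumptions:

```python
# Minimal sketch of the automatic warning rules in section 2.1. Thresholds
# and messages are from this page; the signature is an assumption.

SENSITIVE_DOMAINS = {"medical", "legal", "safety"}

def warnings_for(confidence: float, source_quality: float,
                 high_controversy: bool, domain: str) -> list[str]:
    """Return the warning banners to display alongside a published analysis."""
    warnings = []
    if confidence < 60:
        warnings.append("Low confidence - use caution")
    if source_quality < 40:
        warnings.append("Sources may be unreliable")
    if high_controversy:
        warnings.append("Disputed - multiple interpretations exist")
    if domain in SENSITIVE_DOMAINS:
        warnings.append("Seek professional advice")
    return warnings
```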
2.2 Hidden
Status: Not visible to regular users (only to moderators)
Reasons:
- Spam or advertising
- Personal attacks or harassment
- Illegal content
- Privacy violations
- Deliberate misinformation (verified)
- Abuse or harmful content
Process:
- Automated detection flags for moderator review
- Moderator confirms and hides
- Original author notified with reason
- Author can appeal to the board if they dispute the moderator's decision
Note: Content is hidden, not deleted (for audit trail)
3. Contribution Rules
3.1 All Contributors Must
- Provide sources for factual claims
- Use clear, neutral language in FactHarbor's own summaries
- Respect others and maintain civil discourse
- Accept community feedback constructively
- Focus on improving quality, not protecting ego
3.2 AKEL (AI System)
AKEL is the primary system. Human contributions supplement and train AKEL.
AKEL Must:
- Mark all outputs as AI-generated
- Display confidence scores prominently
- Provide source citations
- Flag uncertainty clearly
- Identify contradictions in evidence
- Learn from human corrections
When AKEL Makes Errors:
1. Capture the error pattern (what, why, how common)
2. Improve the system (better prompt, model, validation)
3. Re-process affected claims automatically
4. Measure improvement (did quality increase?)
Human Role: Train AKEL through corrections, not replace AKEL
3.3 Contributors Should
- Improve clarity and structure
- Add missing sources
- Flag errors for system improvement
- Suggest better ways to present information
- Participate in quality discussions
3.4 Moderators Must
- Be impartial
- Document moderation decisions
- Respond to appeals promptly
- Use automated tools to scale efforts
- Focus on abuse/harm, not routine quality control
4. Quality Standards
Fulfills: UN-5 (Source reliability), UN-6 (Publisher track records), UN-7 (Evidence transparency), UN-9 (Methodology transparency)
4.1 Source Requirements
Track Record Over Credentials:
- Sources evaluated by historical accuracy
- Correction policy matters
- Independence from conflicts of interest
- Methodology transparency
Source Quality Database:
- Automated tracking of source accuracy
- Correction frequency
- Reliability score (updated continuously)
- Users can see source track record
No automatic trust for government, academia, or media - all evaluated by track record.
User Needs served: UN-5 (Source provenance), UN-6 (Publisher reliability)
4.2 Claim Requirements
- Clear subject and assertion
- Verifiable with available information
- Sourced (or explicitly marked as needing sources)
- Neutral language in FactHarbor summaries
- Appropriate context provided
User Needs served: UN-2 (Claim extraction and verification)
4.3 Evidence Requirements
- Publicly accessible (or explain why not)
- Properly cited with attribution
- Relevant to claim being evaluated
- Original source preferred over secondary
User Needs served: UN-7 (Evidence transparency)
4.4 Confidence Scoring
Automated confidence calculation based on:
- Source quality scores
- Evidence consistency
- Contradiction detection
- Completeness of analysis
- Historical accuracy of similar claims
Thresholds:
- < 40%: Too low to publish (needs improvement)
- 40-60%: Published with "Low confidence" warning
- 60-80%: Published as standard
- 80-100%: Published as "High confidence"
User Needs served: UN-1 (Trust assessment), UN-9 (Methodology transparency)
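A minimal Python sketch of this calculation. The factor names and publication thresholds come from the lists above; the weights are placeholder assumptions, since this page does not specify them:

```python
# Illustrative only: factor names come from section 4.4, but the weights are
# placeholder assumptions. All inputs are 0-100; "contradiction_absence" is
# 100 when no contradictions are detected.

FACTOR_WEIGHTS = {
    "source_quality": 0.30,
    "evidence_consistency": 0.25,
    "contradiction_absence": 0.15,
    "completeness": 0.15,
    "historical_accuracy": 0.15,
}

def confidence_score(factors: dict[str, float]) -> float:
    """Weighted combination of 0-100 factor scores into a 0-100 confidence."""
    return sum(FACTOR_WEIGHTS[name] * factors[name] for name in FACTOR_WEIGHTS)

def publication_band(score: float) -> str:
    """Map a confidence score to the thresholds listed above."""
    if score < 40:
        return "too low to publish (needs improvement)"
    if score < 60:
        return "published with low-confidence warning"
    if score < 80:
        return "published as standard"
    return "published as high confidence"
```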
5. Automated Risk Scoring
Fulfills: UN-10 (Manipulation detection), UN-16 (Appropriate review level)
Replace manual risk tiers with continuous automated scoring.
5.1 Risk Score Calculation
Factors (weighted algorithm):
- Domain sensitivity: Medical, legal, safety auto-flagged higher
- Potential impact: Views, citations, spread
- Controversy level: Flags, disputes, edit wars
- Uncertainty: Low confidence, contradictory evidence
- Source reliability: Track record of sources used
Score: 0-100 (higher = more risk)
5.2 Automated Actions
- Score > 80: Flag for moderator review before publication
- Score 60-80: Publish with prominent warnings
- Score 40-60: Publish with standard warnings
- Score < 40: Publish normally
Continuous monitoring: Risk score recalculated as new information emerges
User Needs served: UN-10 (Detect manipulation tactics), UN-16 (Review status)
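A sketch of the scoring and routing described in 5.1-5.2. The action thresholds come from this page; the factor weights are assumptions:

```python
# Sketch of sections 5.1-5.2. Action thresholds are from this page; factor
# weights are assumptions. "source_unreliability" is 100 minus source
# reliability, so higher = riskier.

RISK_WEIGHTS = {
    "domain_sensitivity": 0.30,
    "potential_impact": 0.25,
    "controversy_level": 0.20,
    "uncertainty": 0.15,
    "source_unreliability": 0.10,
}

def risk_score(factors: dict[str, float]) -> float:
    """Combine 0-100 factor scores into a single 0-100 risk score."""
    return sum(RISK_WEIGHTS[name] * factors[name] for name in RISK_WEIGHTS)

def action_for(score: float) -> str:
    """Map a risk score to the automated actions listed in 5.2."""
    if score > 80:
        return "flag for moderator review before publication"
    if score > 60:
        return "publish with prominent warnings"
    if score > 40:
        return "publish with standard warnings"
    return "publish normally"
```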
6. System Improvement Process
Core principle: Fix the system, not just the data.
6.1 Error Capture
When users flag errors or make corrections:
1. What was wrong? (categorize)
2. What should it have been?
3. Why did the system fail? (root cause)
4. How common is this pattern?
5. Store in ErrorPattern table (improvement queue)
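This page names an ErrorPattern table but not its schema; the dataclass below is a hypothetical shape covering the five capture steps above, with every field an assumption:

```python
# Hypothetical shape for an ErrorPattern record; all fields are assumptions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ErrorPattern:
    category: str                 # what was wrong (step 1)
    expected: str                 # what it should have been (step 2)
    root_cause: str               # why the system failed (step 3)
    occurrence_count: int         # how common the pattern is (step 4)
    example_claim_ids: list[str] = field(default_factory=list)
    first_seen: datetime = field(default_factory=datetime.utcnow)
    status: str = "queued"        # queued -> fixed -> verified
```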
6.2 Weekly Improvement Cycle
1. Review: Analyze top error patterns
2. Develop: Create fix (prompt, model, validation)
3. Test: Validate fix on sample claims
4. Deploy: Roll out if quality improves
5. Re-process: Automatically update affected claims
6. Monitor: Track quality metrics
6.3 Quality Metrics Dashboard
Track continuously:
- Error rate by category
- Source quality distribution
- Confidence score trends
- User flag rate (issues found)
- Correction acceptance rate
- Re-work rate
- Claims processed per hour
Goal: 10% monthly reduction in error rate
7. Automated Quality Monitoring
Replace manual audit sampling with automated monitoring.
7.1 Continuous Metrics
- Source quality: Track record database
- Consistency: Contradiction detection
- Clarity: Readability scores
- Completeness: Field validation
- Accuracy: User corrections tracked
7.2 Anomaly Detection
Automated alerts for:
- Sudden quality drops
- Unusual patterns
- Contradiction clusters
- Source reliability changes
- User behavior anomalies
7.3 Targeted Review
- Review only flagged items
- Random sampling for calibration (not quotas)
- Learn from corrections to improve automation
8. Functional Requirements
This section defines specific features that fulfill user needs.
8.1 Claim Intake & Normalization
FR1 — Claim Intake
Fulfills: UN-2 (Claim extraction), UN-4 (Quick fact-checking), UN-12 (Submit claims)
- Users submit claims via simple form or API
- Claims can be text, URL, or image
- Duplicate detection (semantic similarity; see the sketch below)
- Auto-categorization by domain
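A minimal sketch of the semantic-similarity duplicate check, assuming claims are compared via embedding vectors; the embedding source and the 0.9 threshold are assumptions, not a specified design:

```python
# Minimal sketch of semantic duplicate detection, assuming each claim is
# represented by an embedding vector. The 0.9 threshold is an assumption.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_duplicate(new_embedding: np.ndarray,
                   existing: dict[str, np.ndarray],
                   threshold: float = 0.9) -> str | None:
    """Return the id of the most similar existing claim, if above threshold."""
    best_id, best_score = None, threshold
    for claim_id, embedding in existing.items():
        score = cosine_similarity(new_embedding, embedding)
        if score >= best_score:
            best_id, best_score = claim_id, score
    return best_id
```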
FR2 — Claim Normalization
Fulfills: UN-2 (Claim verification)
- Standardize to clear assertion format
- Extract key entities (who, what, when, where)
- Identify claim type (factual, predictive, evaluative)
- Link to existing similar claims
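A hypothetical structure for the output of FR2; all field names are illustrative, chosen to mirror the four normalization steps above:

```python
# Hypothetical normalized-claim structure; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class NormalizedClaim:
    assertion: str                     # standardized, clear assertion text
    entities: dict[str, str] = field(default_factory=dict)  # who/what/when/where
    claim_type: str = "factual"        # factual | predictive | evaluative
    related_claim_ids: list[str] = field(default_factory=list)
```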
FR3 — Claim Classification
Fulfills: UN-11 (Filtered research)
- Domain: Politics, Science, Health, etc.
- Type: Historical fact, current stat, prediction, etc.
- Risk score: Automated calculation
- Complexity: Simple, moderate, complex
8.2 Scenario System
FR4 — Scenario Generation
Fulfills: UN-2 (Context-dependent verification), UN-3 (Article summary with FactHarbor analysis summary), UN-8 (Understanding disagreement)
Automated scenario creation:
- AKEL analyzes claim and generates likely scenarios (use-cases and contexts)
- Each scenario includes: assumptions, definitions, boundaries, evidence context
- Users can flag incorrect scenarios
- System learns from corrections
Key Concept: Scenarios represent different interpretations or contexts (e.g., "Clinical trials with healthy adults" vs. "Real-world data with diverse populations")
FR5 — Evidence Linking
Fulfills: UN-5 (Source tracing), UN-7 (Evidence transparency)
- Automated evidence discovery from sources
- Relevance scoring
- Contradiction detection
- Source quality assessment
FR6 — Scenario Comparison
Fulfills: UN-3 (Article summary with FactHarbor analysis summary), UN-8 (Understanding disagreement)
- Side-by-side comparison interface
- Highlight key differences between scenarios
- Show evidence supporting each scenario
- Display confidence scores per scenario
8.3 Verdicts & Analysis
FR7 — Automated Verdicts
Fulfills: UN-1 (Trust score), UN-2 (Verification verdicts), UN-3 (Article summary with FactHarbor analysis summary), UN-13 (Cite verdicts)
- AKEL generates verdict based on evidence within each scenario
- Likelihood range displayed (e.g., "0.70-0.85 (likely true)") - NOT binary true/false
- Uncertainty factors explicitly listed (e.g., "Small sample sizes", "Long-term effects unknown")
- Confidence score displayed prominently
- Source quality indicators shown
- Contradictions noted
- Uncertainty acknowledged
Key Innovation: Detailed probabilistic verdicts with explicit uncertainty, not binary judgments
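A sketch of what such a probabilistic verdict record might look like. The likelihood-range format mirrors the "0.70-0.85 (likely true)" example above; the band cutoffs and field names are assumptions:

```python
# Sketch of a probabilistic verdict record per FR7; band cutoffs and field
# names are assumptions of this sketch.
from dataclasses import dataclass, field

@dataclass
class Verdict:
    scenario_id: str
    likelihood_low: float              # e.g. 0.70
    likelihood_high: float             # e.g. 0.85
    confidence: float                  # 0-100, displayed prominently
    uncertainty_factors: list[str] = field(default_factory=list)
    contradictions: list[str] = field(default_factory=list)

    def label(self) -> str:
        """Render e.g. '0.70-0.85 (likely true)'; cutoffs are placeholders."""
        mid = (self.likelihood_low + self.likelihood_high) / 2
        band = ("likely true" if mid >= 0.7
                else "uncertain" if mid >= 0.4
                else "likely false")
        return f"{self.likelihood_low:.2f}-{self.likelihood_high:.2f} ({band})"
```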
FR8 — Time Evolution
Fulfills: UN-15 (Verdict evolution timeline)
- Claims and verdicts update as new evidence emerges
- Version history maintained for all verdicts
- Changes highlighted
- Confidence score trends visible
- Users can see "as of date X, what did we know?"
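A minimal sketch of that "as of date X" lookup, assuming each stored verdict version carries an effective timestamp (an assumption of this sketch, not a specified schema):

```python
# Minimal sketch of the FR8 point-in-time lookup; the 'effective_at' key is
# an assumption of this sketch.
from datetime import datetime

def verdict_as_of(versions: list[dict], as_of: datetime) -> dict | None:
    """Return the latest verdict version in effect at the given time."""
    eligible = [v for v in versions if v["effective_at"] <= as_of]
    return max(eligible, key=lambda v: v["effective_at"], default=None)
```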
8.4 User Interface & Presentation
FR12 — Two-Panel Summary View (Article Summary with FactHarbor Analysis Summary)
Fulfills: UN-3 (Article Summary with FactHarbor Analysis Summary)
Purpose: Provide side-by-side comparison of what a document claims vs. FactHarbor's complete analysis of its credibility
Left Panel (Article Summary):
- Document title, source, and claimed credibility
- "The Big Picture" - main thesis or position change
- "Key Findings" - structured summary of document's main claims
- "Reasoning" - document's explanation for positions
- "Conclusion" - document's bottom line
Right Panel (FactHarbor Analysis Summary):
- FactHarbor's independent source credibility assessment
- Claim-by-claim verdicts with confidence scores
- Methodology assessment (strengths, limitations)
- Overall verdict on document quality
- Analysis ID for reference
Design Principles:
- No scrolling required - both panels visible simultaneously
- Visual distinction between "what they say" and "FactHarbor's analysis"
- Color coding for verdicts (supported, uncertain, refuted)
- Confidence percentages clearly visible
- Mobile responsive (panels stack vertically on small screens)
Implementation Notes:
- Generated automatically by AKEL for every analyzed document
- Updates when verdict evolves (maintains version history)
- Exportable as standalone summary report
- Shareable via permanent URL
FR13 — In-Article Claim Highlighting
Fulfills: UN-17 (In-article claim highlighting)
Purpose: Enable readers to quickly assess claim credibility while reading by visually highlighting factual claims with color-coded indicators
Visual Example: Article with Highlighted Claims
Article: "New Study Shows Benefits of Mediterranean Diet"
A recent study published in the Journal of Nutrition has revealed new findings about the Mediterranean diet.
The study, which followed 10,000 participants over five years, showed significant improvements in cardiovascular health markers.
Dr. Maria Rodriguez, lead researcher, recommends incorporating more olive oil, fish, and vegetables into daily meals.
Participants also reported feeling more energetic and experiencing better sleep quality, though these were secondary measures.
Legend:
- 🟢 = Well-supported claim (confidence ≥75%)
- 🟡 = Uncertain claim (confidence 40-74%)
- 🔴 = Refuted/unsupported claim (confidence <40%)
- Plain text = Non-factual content (context, opinions, recommendations)
Color-Coding System:
- Green: Well-supported claims (confidence ≥75%, strong evidence)
- Yellow/Orange: Uncertain claims (confidence 40-74%, conflicting or limited evidence)
- Red: Refuted or unsupported claims (confidence <40%, contradicted by evidence)
- Gray/Neutral: Non-factual content (opinions, questions, procedural text)
Interactive Highlighting Example (Detailed View)
| Article Text | Status | Analysis |
|---|---|---|
| A recent study published in the Journal of Nutrition has revealed new findings about the Mediterranean diet. | Plain text | Context - no highlighting |
| Researchers found that Mediterranean diet followers had a 25% lower risk of heart disease compared to control groups | 🟢 WELL SUPPORTED | |
| The study, which followed 10,000 participants over five years, showed significant improvements in cardiovascular health markers. | Plain text | Methodology - no highlighting |
| Some experts believe this diet can completely prevent heart attacks | 🟡 UNCERTAIN | |
| Dr. Rodriguez recommends incorporating more olive oil, fish, and vegetables into daily meals. | Plain text | Recommendation - no highlighting |
| The study proves that saturated fats cause heart disease | 🔴 REFUTED | |
Design Notes:
- Highlighted claims use italics to distinguish from plain text
- Color backgrounds match XWiki message box colors (success/warning/error)
- Status column shows verdict prominently
- Analysis column provides quick summary with link to details
User Actions:
- Hover over highlighted claim → Tooltip appears
- Click highlighted claim → Detailed analysis modal/panel
- Toggle button to turn highlighting on/off
- Keyboard: Tab through highlighted claims
Interaction Design:
- Hover/click on highlighted claim → Show tooltip with:
- Claim text
- Verdict (e.g., "WELL SUPPORTED")
- Confidence score (e.g., "85%")
- Brief evidence summary
- Link to detailed analysis
- Toggle highlighting on/off (user preference)
- Adjustable color intensity for accessibility
Technical Requirements:
- Real-time highlighting as page loads (non-blocking)
- Claim boundary detection (start/end of assertion)
- Handle nested or overlapping claims
- Preserve original article formatting
- Work with various content formats (HTML, plain text, PDFs)
Performance Requirements:
- Highlighting renders within 500ms of page load
- No perceptible delay in reading experience
- Efficient DOM manipulation (avoid reflows)
Accessibility:
- Color-blind friendly palette (use patterns/icons in addition to color)
- Screen reader compatible (ARIA labels for claim credibility)
- Keyboard navigation to highlighted claims
Implementation Notes:
- Claims extracted and analyzed by AKEL during initial processing
- Highlighting data stored as annotations with byte offsets (see the sketch after this list)
- Client-side rendering of highlights based on verdict data
- Mobile responsive (tap instead of hover)
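A hypothetical annotation shape matching these notes (byte offsets plus verdict data for client-side rendering). Only the byte-offset idea and the confidence color bands come from this page; the field names are assumptions:

```python
# Hypothetical highlight annotation; field names are assumptions. Color
# bands match the legend in FR13 (>=75 green, 40-74 yellow, <40 red).
from dataclasses import dataclass

@dataclass
class HighlightAnnotation:
    claim_id: str
    start_offset: int      # byte offset where the claim begins
    end_offset: int        # byte offset where the claim ends (exclusive)
    verdict: str           # "supported" | "uncertain" | "refuted"
    confidence: float      # 0-100, shown in the tooltip
    summary: str           # brief evidence summary for the tooltip

def color_for(annotation: HighlightAnnotation) -> str:
    """Map confidence to the legend's color bands."""
    if annotation.confidence >= 75:
        return "green"     # well-supported
    if annotation.confidence >= 40:
        return "yellow"    # uncertain
    return "red"           # refuted/unsupported
```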
8.5 Workflow & Moderation
FR9 — Publication Workflow
Fulfills: UN-1 (Fast access to verified content), UN-16 (Clear review status)
Simple flow:
1. Claim submitted
2. AKEL processes (automated)
3. If confidence > threshold: Publish (labeled as AI-generated)
4. If confidence < threshold: Flag for improvement
5. If risk score > threshold: Flag for moderator
No multi-stage approval process
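A minimal sketch of this routing decision. The exact threshold values are not specified on this page, so the defaults below are placeholders; the risk check is applied first so that high-risk claims are held even when confidence is high:

```python
# Minimal sketch of FR9 routing; threshold defaults are placeholders.
def route_claim(confidence: float, risk: float,
                confidence_threshold: float = 60,
                risk_threshold: float = 80) -> str:
    if risk > risk_threshold:
        return "flag for moderator"
    if confidence >= confidence_threshold:
        return "publish (labeled as AI-generated)"
    return "flag for improvement"
```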
FR10 — Moderation
Focus on abuse, not routine quality:
- Automated abuse detection
- Moderators handle flags
- Quick response to harmful content
- Minimal involvement in routine content
FR11 — Audit Trail
Fulfills: UN-14 (API access to histories), UN-15 (Evolution tracking)
- All edits logged
- Version history public
- Moderation decisions documented
- System improvements tracked
9. Non-Functional Requirements
9.1 NFR1 — Performance
Fulfills: UN-4 (Fast fact-checking), UN-11 (Responsive filtering)
- Claim processing: < 30 seconds
- Search response: < 2 seconds
- Page load: < 3 seconds
- 99% uptime
9.2 NFR2 — Scalability
Fulfills: UN-14 (API access at scale)
- Handle 10,000 claims initially
- Scale to 1M+ claims
- Support 100K+ concurrent users
- Automated processing scales linearly
9.3 NFR3 — Transparency
Fulfills: UN-7 (Evidence transparency), UN-9 (Methodology transparency), UN-13 (Citable verdicts), UN-15 (Evolution visibility)
- All algorithms open source
- All data exportable
- All decisions documented
- Quality metrics public
9.4 NFR4 — Security & Privacy
- Follow Privacy Policy
- Secure authentication
- Data encryption
- Regular security audits
9.5 NFR5 — Maintainability
- Modular architecture
- Automated testing
- Continuous integration
- Comprehensive documentation
10. MVP Scope
Phase 1 (Months 1-3): Read-Only MVP
Build:
- Automated claim analysis
- Confidence scoring
- Source evaluation
- Browse/search interface
- User flagging system
Goal: Prove AI quality before adding user editing
User Needs fulfilled in Phase 1: UN-1, UN-2, UN-3, UN-4, UN-5, UN-6, UN-7, UN-8, UN-9, UN-12
Phase 2 (Months 4-6): User Contributions
Add only if needed:
- Simple editing (Wikipedia-style)
- Reputation system
- Basic moderation
- In-article claim highlighting (FR13)
Additional User Needs fulfilled: UN-13, UN-17
Phase 3 (Months 7-12): Refinement
- Continuous quality improvement
- Feature additions based on real usage
- Scale infrastructure
Additional User Needs fulfilled: UN-14 (API access), UN-15 (Full evolution tracking)
Deferred:
- Federation (until multiple successful instances exist)
- Complex contribution workflows (focus on automation)
- Extensive role hierarchy (keep simple)
11. Success Metrics
System Quality (track weekly):
- Error rate by category (target: -10%/month)
- Average confidence score (target: increase)
- Source quality distribution (target: more high-quality)
- Contradiction detection rate (target: increase)
Efficiency (track monthly):
- Claims processed per hour (target: increase)
- Human hours per claim (target: decrease)
- Automation coverage (target: >90%)
- Re-work rate (target: <5%)
User Satisfaction (track quarterly):
- User flag rate (issues found)
- Correction acceptance rate (flags valid)
- Return user rate
- Trust indicators (surveys)
User Needs Metrics (track quarterly):
- UN-1: % users who understand trust scores
- UN-4: Time to verify social media claim (target: <30s)
- UN-7: % users who access evidence details
- UN-8: % users who view multiple scenarios
- UN-15: % users who check evolution timeline
- UN-17: % users who enable in-article highlighting; avg. time spent on highlighted vs. non-highlighted articles
12. Requirements Traceability
For full traceability matrix showing which requirements fulfill which user needs, see:
- User Needs - Section 8 includes comprehensive mapping tables
13. Related Pages
- User Needs - What users need (drives these requirements)
- Architecture - How requirements are implemented
- Data Model - Data structures supporting requirements
- Workflows - User interaction workflows
- AKEL - AI system fulfilling automation requirements
- Global Rules
- Privacy Policy