V1.0 Requirements
Version: 0.9.70
Phase: Version 1.0 (Production Launch)
Priority: CRITICAL
Status: Ready for Implementation
This page specifies the requirements that MUST be implemented for FactHarbor V1.0 production launch, based on comprehensive fact-checking industry research (December 2025).
Overview
V1.0 adds critical requirements for:
- Platform Integration (ClaimReview schema, corrections system)
- Quality Assurance (Enhanced AKEL gates, security)
- Media Verification (Image verification, evidence archiving)
- Community Safety (Contributor protection)
- Continuous Improvement (A/B testing, quality metrics)
Total New Requirements: 14 (FR44-FR54, NFR11-NFR13)
New User Needs: 3 (UN-26, UN-27, UN-28)
Category 1: Platform Integration & Standards Compliance
FR44: ClaimReview Schema Implementation
Priority: CRITICAL
Fulfills: UN-13 (Cite verdicts), UN-14 (API access), UN-26 (Search engine visibility)
Phase: V1.0
Purpose: Make FactHarbor analyses discoverable via Google Fact Check Explorer and other search engines.
Specification:
FactHarbor must generate valid ClaimReview structured data for every published analysis following Schema.org specifications.
Required Fields:
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "datePublished": "YYYY-MM-DD",
  "url": "https://factharbor.org/claims/{claim_id}",
  "claimReviewed": "The exact claim text",
  "author": {
    "@type": "Organization",
    "name": "FactHarbor",
    "url": "https://factharbor.org"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1-5",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "FactHarbor likelihood score"
  },
  "itemReviewed": {
    "@type": "Claim",
    "author": {
      "@type": "Person" or "Organization",
      "name": "Claim author if known"
    },
    "datePublished": "YYYY-MM-DD if known",
    "appearance": {
      "@type": "CreativeWork",
      "url": "Original claim URL if from article"
    }
  }
}
FactHarbor-Specific Mapping:
Rating Value Conversion:
- 80-100% likelihood → 5 (Highly Supported)
- 60-79% likelihood → 4 (Supported)
- 40-59% likelihood → 3 (Mixed/Uncertain)
- 20-39% likelihood → 2 (Questionable)
- 0-19% likelihood → 1 (Refuted)
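The conversion table above can be sketched as a small lookup. This is a minimal illustration, not the FactHarbor implementation: the function name `likelihood_to_rating` is hypothetical, and the handling of fractional values (for example 79.5% falling into the 60-79% band) is an assumption, since the table only lists whole-number ranges.

```python
def likelihood_to_rating(likelihood_pct: float) -> tuple[int, str]:
    """Map a likelihood percentage (0-100) to a ClaimReview ratingValue and label."""
    if not 0 <= likelihood_pct <= 100:
        raise ValueError("likelihood must be between 0 and 100")
    # (lower bound of band, ratingValue, label), checked from highest to lowest
    bands = [
        (80, 5, "Highly Supported"),
        (60, 4, "Supported"),
        (40, 3, "Mixed/Uncertain"),
        (20, 2, "Questionable"),
        (0, 1, "Refuted"),
    ]
    for lower_bound, rating, label in bands:
        if likelihood_pct >= lower_bound:
            return rating, label
    raise AssertionError("unreachable: 0 is the lowest band")
```

For example, `likelihood_to_rating(75)` falls into the 60-79% band and yields rating 4.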
Multiple Scenarios Handling:
If a claim has multiple scenarios with different verdicts:
- Generate separate ClaimReview for each scenario
- Add `disambiguatingDescription` field explaining scenario context
- Example: "In the context of [scenario assumptions]..."
Implementation Requirements:
1. Auto-generate ClaimReview JSON-LD on claim publication
2. Embed in HTML `<head>` section of claim page
3. Validate against Schema.org validator before deployment
4. Submit sitemap to Google Search Console
5. Update ClaimReview when verdict changes (FR8: Time Evolution)
6. Handle corrections (update dateModified field)
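Steps 1, 2, and 6 above could look roughly like the following sketch, which builds the JSON-LD document and wraps it in a script tag for the page `<head>`. The function names and parameters (`build_claim_review`, `to_script_tag`, `scenario_context`) are hypothetical; the optional `itemReviewed` block is omitted for brevity, and validation and sitemap submission are out of scope here.

```python
import json
from datetime import date


def build_claim_review(claim_id, claim_text, rating_value, published,
                       modified=None, scenario_context=None):
    """Assemble a ClaimReview document as a plain dict (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "datePublished": published.isoformat(),
        "url": f"https://factharbor.org/claims/{claim_id}",
        "claimReviewed": claim_text,
        "author": {
            "@type": "Organization",
            "name": "FactHarbor",
            "url": "https://factharbor.org",
        },
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": str(rating_value),
            "bestRating": "5",
            "worstRating": "1",
            "alternateName": "FactHarbor likelihood score",
        },
    }
    if modified is not None:
        # Set when a verdict change or correction updates the analysis.
        doc["dateModified"] = modified.isoformat()
    if scenario_context:
        # One ClaimReview per scenario, disambiguated by its assumptions.
        doc["disambiguatingDescription"] = f"In the context of {scenario_context}"
    return doc


def to_script_tag(doc):
    """Serialize for embedding in HTML; escape "</" so claim text cannot close the tag."""
    payload = json.dumps(doc, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

Escaping `</` as `<\/` keeps the payload valid JSON while preventing a claim that contains `</script>` from terminating the element early.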
Acceptance Criteria:
- ✅ Passes the Schema Markup Validator (validator.schema.org), the successor to Google's retired Structured Data Testing Tool
- ✅ Appears in Google Fact Check Explorer within 48 hours of publication
- ✅ Valid JSON-LD syntax (no errors)
- ✅ All required fields populated
- ✅ Handles multi-scenario claims correctly
- ✅ Updates automatically on verdict changes
Integration Points:
- FR7: Automated Verdicts (source of rating data)
- FR8: Time Evolution (triggers schema updates)
- FR11: Audit Trail (log schema generation/updates)
- FR45: Corrections (updates schema on corrections)
Resources:
- ClaimReview Project: https://www.claimreviewproject.com
- Schema.org ClaimReview: https://schema.org/ClaimReview
- Google Fact Check Guidelines: https://developers.google.com/search/docs/appearance/fact-check
FR45: User Corrections Notification System
Priority: CRITICAL
Fulfills: IFCN Principle 5 (Open & Honest Corrections), EFCSN compliance
Phase: V1.0
Purpose: When claim analyses are corrected, notify users who previously viewed the claim.
Specification:
Correction Types:
1. Major Correction: Verdict changes category (e.g., "Supported" → "Refuted")
2. Significant Correction: Likelihood score changes >20%
3. Minor Correction: Evidence additions, source quality updates
4. Scenario Addition: New scenario added to existing claim
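A minimal sketch of how these four types might be told apart follows. Several points are assumptions rather than part of the specification: the function name, the order of precedence when more than one condition holds, and the reading of ">20%" as a change of more than 20 percentage points in the likelihood score.

```python
def classify_correction(old_category, new_category, old_score, new_score,
                        scenario_added=False):
    """Return the correction type for a change to a published analysis.

    Scores are likelihood percentages (0-100). Precedence (assumed):
    major > significant > scenario addition > minor.
    """
    if old_category != new_category:
        return "major"
    if abs(new_score - old_score) > 20:
        return "significant"
    if scenario_added:
        return "scenario_addition"
    return "minor"
```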
Notification Mechanisms:
1. In-Page Banner (Required):
This analysis was updated on [DATE]. [View what changed] [Dismiss]
Major changes:
• Verdict changed from "Likely True (75%)" to "Uncertain (45%)"
• New contradicting evidence added from [Source]
• Scenario 2 updated with additional context
[See full correction log]
Display Rules:
- Show banner on ALL pages displaying the claim
- Banner persists for 30 days after correction
- "Corrections" count badge on claim card
- Timestamp on every verdict: "Last updated: [datetime]"
2. Correction Log Page (Required):
- Public changelog at `/claims/{id}/corrections`
- Displays:
- Date/time of correction
- What changed (before/after comparison)
- Why changed (reason if provided)
- Who made change (AKEL auto-update vs. contributor override)
- Diff view of changes
3. Email Notifications (Optional for users):
- Send to users who bookmarked/shared claim
- Subject: "FactHarbor Correction: [Claim title]"
- Include summary of changes
- Link to updated analysis
- Unsubscribe option
4. RSS/API Feed (Required):
- Corrections feed at `/corrections.rss`
- API endpoint: `GET /api/corrections?since={timestamp}`
- Enables external monitoring
- Machine-readable format
IFCN Compliance Requirements:
- Corrections policy published at `/corrections-policy`
- User can report suspected errors via `/report-error/{claim_id}`
- Link to IFCN complaint process (if FactHarbor becomes signatory)
- Scrupulous transparency: never silently edit
- All corrections permanent and public
Acceptance Criteria:
- ✅ Banner appears within 60 seconds of correction
- ✅ Correction log is permanent and public
- ✅ Email notifications delivered within 5 minutes
- ✅ RSS feed updates in real-time
- ✅ Mobile-responsive banner design
- ✅ Accessible (screen reader compatible)
- ✅ Cannot be dismissed permanently (reappears for 30 days)
Integration Points:
- FR8: Time Evolution (triggers corrections)
- FR11: Audit Trail (source of correction data)
- NFR3: Transparency (public correction log)
- FR44: ClaimReview (updates schema)
Category 2: Quality Assurance
NFR11: AKEL Quality Assurance Framework
Priority: CRITICAL
Fulfills: AI safety, IFCN methodology transparency
Phase: V1.0
Purpose: Prevent AI hallucinations and low-quality outputs through multi-layer automated checks.
Specification:
This enhances the existing 4 quality gates with more detailed specifications and confidence thresholds.
Gate 1: Claim Extraction Validation
Purpose: Ensure extracted claims are factual assertions (not opinions/predictions)
Automated Checks:
1. Factual Statement Test: Is this verifiable? (Yes/No)
2. Opinion Detection: Contains hedging language? ("I think", "probably", "might")
3. Future Prediction Test: Makes claim about future events?
4. Specificity Score: Contains specific entities, numbers, dates?
Thresholds:
- Factual: Must be "Yes"
- Opinion markers: <2 hedging phrases
- Specificity: ≥3 specific elements
Action if Failed:
- Flag claim as "Non-verifiable" or "Opinion"
- Do NOT generate verdict
- Display to user: "This appears to be an opinion rather than a factual claim"
- Log pattern for system improvement
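The Gate 1 thresholds can be illustrated with a short sketch. It assumes the verifiability judgment, the future-prediction flag, and the specificity count come from upstream components not shown here; the hedging list contains only the three examples from the text, and treating a future prediction as a failure is an assumption, since no threshold is listed for that check.

```python
import re

# Only the examples given in the specification; a production list would be longer.
HEDGING_PHRASES = ("i think", "probably", "might")


def count_hedging(text):
    """Count whole-phrase occurrences of hedging language, case-insensitively."""
    lowered = text.lower()
    return sum(
        len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
        for phrase in HEDGING_PHRASES
    )


def gate1_passes(is_verifiable, text, specific_elements, is_future_prediction=False):
    """Apply the Gate 1 thresholds: factual, <2 hedging phrases, >=3 specific elements."""
    return (
        is_verifiable
        and not is_future_prediction
        and count_hedging(text) < 2
        and specific_elements >= 3
    )
```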
Gate 2: Evidence Relevance Validation
Purpose: Ensure AI-linked evidence actually relates to claim
Automated Checks:
1. Semantic Similarity Score: Evidence text vs. claim (using embeddings)
2. Entity Overlap: Do evidence and claim mention same people/places/things?
3. Topic Relevance: Does evidence discuss the claim topic?
Thresholds:
- Similarity: ≥0.6 (cosine similarity)
- Entity overlap: ≥1 shared entity
- Topic relevance: ≥0.5
Action if Failed:
- Discard irrelevant evidence
- If <2 relevant evidence items remain, verdict = "Insufficient Evidence"
- Block publication if below threshold
- Log pattern for search improvement
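As a rough sketch of Gate 2, the snippet below computes cosine similarity over embedding vectors and applies the three thresholds. The evidence item layout (`vector`, `entities`, `topic_relevance` keys) and the function names are hypothetical; producing the embeddings, entity sets, and topic scores is left to whatever models AKEL uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_relevant_evidence(claim_vector, claim_entities, evidence_items):
    """Keep evidence meeting all thresholds; signal when fewer than 2 items remain."""
    relevant = []
    for item in evidence_items:
        similar = cosine_similarity(claim_vector, item["vector"]) >= 0.6
        shares_entity = len(claim_entities & item["entities"]) >= 1
        on_topic = item["topic_relevance"] >= 0.5
        if similar and shares_entity and on_topic:
            relevant.append(item)
    verdict_override = "Insufficient Evidence" if len(relevant) < 2 else None
    return relevant, verdict_override
```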
Gate 3: Scenario Coherence Check
Purpose: Validate scenario assumptions are logical and complete
Automated Checks:
1. Completeness: Scenario has all required fields
2. Internal Consistency: Assumptions don't contradict each other
3. Distinguishability: Scenarios are meaningfully different (not duplicates)
Thresholds:
- Required fields: 100% populated
- Contradiction score: <0.3 (self-contradiction embedding)
- Scenario similarity: <0.8 (between scenarios)
Action if Failed:
- Merge duplicate scenarios
- Flag inconsistent assumptions for sampling audit
- Reduce confidence score by 20%
Gate 4: Verdict Confidence Assessment
Purpose: Only publish high-confidence verdicts
Automated Checks:
1. Evidence Count: Minimum 2 sources (EFCSN standard)
2. Source Quality: Average source reliability ≥0.6
3. Evidence Agreement: Supporting vs. contradicting ratio
4. Uncertainty Factors: Number of explicit uncertainties
Confidence Tiers:
HIGH (80-100%):
- ≥3 high-quality sources
- >80% agreement
- <2 uncertainty factors
- Publish immediately (all risk tiers)
MEDIUM (50-79%):
- 2-3 sources
- 60-80% agreement
- 2-4 uncertainty factors
- Publish with standard labels
LOW (0-49%):
- <2 sources OR
- <60% agreement OR
- >4 uncertainty factors
- BLOCK publication
Publication Rules:
- HIGH confidence: Publish immediately
- MEDIUM confidence: Publish with "May contain uncertainties" label
- LOW confidence: Block, improve system
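The tier rules for this gate can be sketched as a single decision function. The tier definitions overlap at their boundaries (for example, three sources appear in both the HIGH and MEDIUM descriptions), so this sketch makes an explicit assumption: LOW is checked first, HIGH requires all three HIGH conditions, and everything else is MEDIUM. The function and parameter names are hypothetical.

```python
def confidence_tier(source_count, agreement, uncertainty_factors):
    """Classify a verdict as HIGH, MEDIUM, or LOW confidence.

    agreement is the share of evidence supporting the verdict, from 0.0 to 1.0.
    """
    # Any single LOW condition blocks publication.
    if source_count < 2 or agreement < 0.60 or uncertainty_factors > 4:
        return "LOW"
    # HIGH requires all three conditions together.
    if source_count >= 3 and agreement > 0.80 and uncertainty_factors < 2:
        return "HIGH"
    return "MEDIUM"


def publication_action(tier):
    """Map a tier to the publication rule stated in the specification."""
    return {
        "HIGH": "publish",
        "MEDIUM": 'publish with "May contain uncertainties" label',
        "LOW": "block",
    }[tier]
```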
Acceptance Criteria:
- ✅ All 4 gates implemented
- ✅ Thresholds configurable (for A/B testing)
- ✅ Gate failures logged with details
- ✅ Confidence scores accurate (validated through sampling audits)
- ✅ <5% hallucination rate (measured via audits)
Integration Points:
- FR7: Automated Verdicts (applies gates)
- AKEL: Quality gates enforcement
- NFR13: Quality metrics (reports gate performance)
NFR12: Advanced Security Controls
Priority: CRITICAL
Fulfills: Production security requirements
Phase: V1.0
Specification:
Essential Security (V1.0 Launch):
1. DDoS Protection:
- Rate limiting: 100 requests/hour per IP (content submission)
- Cloudflare or equivalent
- Automatic IP blocking on abuse
2. API Rate Limiting:
- 1000 requests/hour per API key
- Burst allowance: 50 requests/minute
- 429 responses with retry-after headers
3. Audit Logging:
- All moderation actions logged
- All system changes logged
- Logs retained 2 years
- Tamper-proof logging
4. Input Validation:
- Sanitize all user inputs
- Prevent SQL injection
- Prevent XSS attacks
- Max input sizes enforced
5. Authentication:
- OAuth 2.0 for API access
- Secure session management
- Password requirements (if applicable)
Full Security (V1.1+):
6. Penetration Testing: Annual third-party tests
7. Vulnerability Scanning: Automated weekly scans
8. Security Incident Response: Documented procedures
9. Data Encryption: At rest and in transit
10. Access Control: Role-based permissions
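The API rate-limiting rule under Essential Security (1000 requests/hour per API key, a burst allowance of 50 requests/minute, and 429 responses with a retry-after header) could be sketched as an in-memory sliding-window limiter. This is an illustration under assumptions: the class name is hypothetical, state lives in process memory, and a real deployment would more likely keep counters in a shared store or at the edge proxy.

```python
import math
from collections import deque


class ApiRateLimiter:
    """Sliding-window limiter: an hourly cap plus a per-minute burst cap per API key."""

    def __init__(self, hourly_limit=1000, burst_limit=50):
        self.hourly_limit = hourly_limit
        self.burst_limit = burst_limit
        self._hits = {}  # api_key -> deque of accepted request timestamps (seconds)

    def check(self, api_key, now):
        """Return (allowed, retry_after_seconds); retry_after is 0 when allowed."""
        hits = self._hits.setdefault(api_key, deque())
        # Drop requests that have left the one-hour window.
        while hits and now - hits[0] >= 3600:
            hits.popleft()
        if len(hits) >= self.hourly_limit:
            return False, max(1, math.ceil(hits[0] + 3600 - now))
        recent = [t for t in hits if now - t < 60]
        if len(recent) >= self.burst_limit:
            return False, max(1, math.ceil(recent[0] + 60 - now))
        hits.append(now)
        return True, 0
```

When `check` returns `(False, n)`, the caller would respond with HTTP 429 and set the `Retry-After` header to `n`.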
Acceptance Criteria:
- ✅ Rate limits enforced
- ✅ DDoS protection active
- ✅ All logs captured
- ✅ Input validation prevents common attacks
- ✅ API authentication required
NFR13: Reference-Free Quality Metrics
Priority: CRITICAL (POC2 onwards)
Fulfills: NFR3 (Transparency), continuous improvement monitoring
Phase: POC2, Beta 0, V1.0
Purpose: Measure AKEL quality without requiring human-labeled ground truth.
Specification:
Metrics Dashboard (Public):
1. Consistency Metrics:
- Cross-Source Consistency: Do verdicts align with evidence?
- Temporal Consistency: Do verdicts remain stable over time (when evidence unchanged)?
- Scenario Consistency: Do related scenarios have coherent verdicts?
2. Completeness Metrics:
- Evidence Retrieval Rate: % of claims with ≥2 sources
- Contradiction Search Coverage: % of claims with counter-evidence searched
- Source Diversity: Number of distinct sources per claim
3. Confidence Calibration:
- Confidence vs. Evidence Strength: Does high confidence correlate with strong evidence?
- Confidence Distribution: Are verdicts appropriately uncertain?
4. Quality Gate Performance:
- Gate Pass Rates: % passing each gate
- Gate Failure Reasons: What causes most failures?
- Gate Effectiveness: Sampling audit validation of gates
5. User Engagement Metrics:
- Correction Rate: How often do users flag issues?
- Appeal Rate: How often are corrections requested?
- User Satisfaction: Survey results
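Two of the simpler metrics above, Evidence Retrieval Rate and Gate Pass Rates, can be computed directly from per-claim records. The sketch below assumes hypothetical input shapes (a list of source counts, and a list of per-claim gate results keyed by gate name); the actual AKEL data model may differ.

```python
def evidence_retrieval_rate(source_counts):
    """Percentage of claims that have at least 2 sources."""
    if not source_counts:
        return 0.0
    with_enough = sum(1 for count in source_counts if count >= 2)
    return 100.0 * with_enough / len(source_counts)


def gate_pass_rates(results):
    """Percentage of claims passing each gate.

    results is a list of dicts such as {"gate1": True, "gate2": False}.
    """
    totals, passes = {}, {}
    for record in results:
        for gate, passed in record.items():
            totals[gate] = totals.get(gate, 0) + 1
            passes[gate] = passes.get(gate, 0) + (1 if passed else 0)
    return {gate: 100.0 * passes[gate] / totals[gate] for gate in totals}
```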
Dashboard Features:
- Public access at `/quality-metrics`
- Updated daily
- Historical trends (30/90/365 days)
- Breakdown by risk tier
- Download raw data (CSV)
Acceptance Criteria:
- ✅ Dashboard publicly accessible
- ✅ Updates daily
- ✅ All metrics implemented
- ✅ Historical data retained
- ✅ Transparent methodology explained
Category 3: Media Verification
FR46: Image Verification System
Priority: CRITICAL
Fulfills: UN-27 (Visual claim verification)
Phase: V1.0 (Basic), V1.1 (Extended)
Purpose: Enable users to verify image-based claims.
V1.0 Specification (Basic):
1. Reverse Image Search:
- Integration with Google Image Search API
- Integration with TinEye API
- Display earliest known appearance
- Show similar/modified versions
2. Metadata Analysis:
- EXIF data extraction
- Creation date/time
- Camera/device information
- GPS location (if available)
- Edit history (if available)
3. Basic Manipulation Detection:
- Error Level Analysis (ELA)
- Flag obviously manipulated images
- No AI-powered detection in V1.0 (planned for V1.1+)
UI Workflow:
User submits image
↓
System runs reverse search
↓
System extracts metadata
↓
System performs ELA
↓
Results displayed:
- Earliest known appearance
- Similar images found
- Metadata (camera, date, location)
- Manipulation indicators (if any)
↓
User can create claim based on findings
V1.1 Specification (Extended - Future):
- AI-powered deepfake detection
- Acoustic signature analysis (for videos)
- Advanced forensic tools (noise patterns, compression artifacts)
Acceptance Criteria (V1.0):
- ✅ Reverse search functional (Google + TinEye)
- ✅ Metadata extracted correctly
- ✅ ELA results displayed
- ✅ User-friendly interface
- ✅ Results help users make informed decisions
FR47: Archive.org Integration
Priority: CRITICAL
Fulfills: Evidence persistence, FR5 (Evidence linking)
Phase: V1.0
Purpose: Ensure evidence remains accessible even if original sources are deleted.
Specification:
Automatic Archiving:
When AKEL links evidence:
1. Check if URL already archived (Wayback Machine API)
2. If not, submit for archiving (Save Page Now API)
3. Store both original URL and archive URL
4. Display both to users
Archive Display:
Archived: [Archive.org URL] (Captured: [date])
[View Original] [View Archive]
Fallback Logic:
- If original URL unavailable → Auto-redirect to archive
- If archive unavailable → Display warning
- If both unavailable → Flag for manual review
API Integration:
- Use Wayback Machine Availability API
- Use Save Page Now API (SPNv2)
- Rate limiting: 15 requests/minute (Wayback limit)
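The availability check and the fallback logic might be sketched as below. The endpoint `https://archive.org/wayback/available` and the `archived_snapshots.closest` response shape follow the Wayback Machine Availability API; the function names and the returned action labels are hypothetical, and the HTTP call itself, the Save Page Now submission, and rate limiting are deliberately left out.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the Availability API request URL for an evidence link."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def parse_availability(body):
    """Return (archive_url, timestamp) from a response body, or None if not archived."""
    closest = json.loads(body).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def choose_link_action(original_available, archive_url):
    """Apply the fallback rules to decide what the evidence display should do."""
    if original_available and archive_url:
        return "show_both"
    if original_available:
        return "show_original_with_warning"  # archive missing
    if archive_url:
        return "redirect_to_archive"  # original gone
    return "flag_for_manual_review"  # both unavailable
```

A `None` result from `parse_availability` would be the trigger for submitting the URL to Save Page Now.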
Acceptance Criteria:
- ✅ All evidence URLs auto-archived
- ✅ Archive links displayed to users
- ✅ Fallback to archive if original unavailable
- ✅ API rate limits respected
- ✅ Archive status visible in evidence display
Category 4: Community Safety
FR48: Contributor Safety Framework
Priority: CRITICAL
Fulfills: UN-28 (Safe contribution environment)
Phase: V1.0
Purpose: Protect contributors from harassment, doxxing, and coordinated attacks.
Specification:
1. Privacy Protection:
- Optional Pseudonymity: Contributors can use pseudonyms
- Email Privacy: Emails never displayed publicly
- Profile Privacy: Contributors control what's public
- IP Logging: Only for abuse prevention, not public
2. Harassment Prevention:
- Automated Toxicity Detection: Flag abusive comments
- Personal Information Detection: Auto-block doxxing attempts
- Coordinated Attack Detection: Identify brigading patterns
- Rapid Response: Moderator alerts for harassment
3. Safety Features:
- Block Users: Contributors can block harassers
- Private Contributions: Option to contribute anonymously
- Report Harassment: One-click harassment reporting
- Safety Resources: Links to support resources
4. Moderator Tools:
- Quick Ban: Immediately block abusers
- Pattern Detection: Identify coordinated attacks
- Appeal Process: Fair review of moderation actions
- Escalation: Serious threats escalated to authorities
5. Trusted Contributor Protection:
- Enhanced Privacy: Additional protection for high-profile contributors
- Verification: Optional identity verification (not public)
- Legal Support: Resources for contributors facing legal threats
Acceptance Criteria:
- ✅ Pseudonyms supported
- ✅ Toxicity detection active
- ✅ Doxxing auto-blocked
- ✅ Harassment reporting functional
- ✅ Moderator tools implemented
- ✅ Safety policy published
Category 5: Continuous Improvement
FR49: A/B Testing Framework
Priority: CRITICAL
Fulfills: Continuous system improvement
Phase: V1.0
Purpose: Test and measure improvements to AKEL prompts, algorithms, and workflows.
Specification:
Test Capabilities:
1. Prompt Variations:
- Test different claim extraction prompts
- Test different verdict generation prompts
- Measure: Accuracy, clarity, completeness
2. Algorithm Variations:
- Test different source scoring algorithms
- Test different confidence calculations
- Measure: Audit accuracy, user satisfaction
3. Workflow Variations:
- Test different quality gate thresholds
- Test different risk tier assignments
- Measure: Publication rate, quality scores
Implementation:
- Traffic Split: 50/50 or 90/10 splits
- Randomization: Consistent per claim (not per user)
- Metrics Collection: Automatic for all variants
- Statistical Significance: Minimum sample size calculation
- Rollout: Winner promoted to 100% traffic
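Randomization that stays consistent per claim is commonly achieved by hashing the claim ID rather than drawing a random number. The sketch below illustrates that idea; the function name, the `experiment` salt, and the 100-bucket scheme are assumptions, not part of the specification.

```python
import hashlib


def assign_variant(claim_id, experiment, variant_pct=50):
    """Deterministically assign a claim to "variant" or "control".

    variant_pct is the share of claims (0-100) sent to the variant, so 50
    gives a 50/50 split and 10 gives a 90/10 split. Salting with the
    experiment name keeps assignments independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{claim_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "variant" if bucket < variant_pct else "control"
```

Because the bucket depends only on the experiment name and claim ID, re-processing the same claim always lands in the same arm.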
A/B Test Workflow:
2. Design test: Control vs. Variant
3. Define metrics: Extraction accuracy, completeness
4. Run test: 7-14 days, minimum 100 claims each
5. Analyze results: Statistical significance?
6. Decision: Deploy winner or iterate
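For the "Statistical significance?" step, one common choice when the metric is a pass/fail rate (for example, the share of claims whose extraction an audit judged correct) is a two-proportion z-test. The specification does not name a test, so this is only one possible sketch; the function name is hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two_sided_p_value) comparing success rates of control A and variant B."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # all successes or all failures in both arms
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With the minimum of 100 claims per arm, a control at 70/100 and a variant at 85/100 gives a positive z and a p-value below 0.05, while identical rates give z = 0.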
Acceptance Criteria:
- ✅ A/B testing framework implemented
- ✅ Can test prompt variations
- ✅ Can test algorithm variations
- ✅ Metrics automatically collected
- ✅ Statistical significance calculated
- ✅ Results inform system improvements
FR54: Evidence Deduplication
Priority: CRITICAL (POC2/Beta)
Fulfills: Accurate evidence counting, quality metrics
Phase: POC2, Beta 0, V1.0
Purpose: Avoid counting the same source multiple times when it appears in different forms.
Specification:
Deduplication Logic:
1. URL Normalization:
- Remove tracking parameters (?utm_source=...)
- Normalize http/https
- Normalize www/non-www
- Handle redirects
2. Content Similarity:
- If two sources have >90% text similarity → Same source
- If one is subset of other → Same source
- Use fuzzy matching for minor differences
3. Cross-Domain Syndication:
- Detect wire service content (AP, Reuters)
- Mark as single source if syndicated
- Count original publication only
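The URL normalization and content similarity rules could be sketched with the standard library as follows. The assumptions: only `utm_` parameters are treated as tracking parameters, fragments are dropped, `difflib.SequenceMatcher` stands in for whatever fuzzy matcher is eventually chosen, and redirect resolution and wire-service detection are not covered because they need network access or publisher metadata.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumed list; extend as needed


def normalize_url(url):
    """Canonicalize scheme, host, and query so equivalent URLs compare equal."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Force https and drop the fragment.
    return urlunsplit(("https", host, parts.path or "/", urlencode(kept), ""))


def is_same_source(text_a, text_b, threshold=0.9):
    """Treat texts as one source if one contains the other or similarity exceeds 90%."""
    if text_a in text_b or text_b in text_a:
        return True
    return SequenceMatcher(None, text_a, text_b).ratio() > threshold
```

For example, `normalize_url("http://www.example.com/story?utm_source=x&id=7")` and `normalize_url("https://example.com/story?id=7#top")` produce the same string, so the two links count as one source.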
Display:
1. Original Article (NYTimes)
- Also appeared in: WashPost, Guardian (syndicated)
2. Research Paper (Nature)
3. Official Statement (WHO)
Acceptance Criteria:
- ✅ URL normalization works
- ✅ Content similarity detected
- ✅ Syndicated content identified
- ✅ Unique vs. total counts accurate
- ✅ Improves evidence quality metrics
Additional Requirements (Lower Priority)
FR50: OSINT Toolkit Integration
Priority: HIGH (V1.1)
Fulfills: Advanced media verification
Phase: V1.1
Purpose: Integrate open-source intelligence tools for advanced verification.
Tools to Integrate:
- InVID/WeVerify (video verification)
- Bellingcat toolkit
- Additional TBD based on V1.0 learnings
FR51: Video Verification System
Priority: HIGH (V1.1)
Fulfills: UN-27 (Visual claims), advanced media verification
Phase: V1.1
Purpose: Verify video-based claims.
Specification:
- Keyframe extraction
- Reverse video search
- Deepfake detection (AI-powered)
- Metadata analysis
- Acoustic signature analysis
FR52: Interactive Detection Training
Priority: MEDIUM (V1.5)
Fulfills: Media literacy education
Phase: V1.5
Purpose: Teach users to identify misinformation.
Specification:
- Interactive tutorials
- Practice exercises
- Detection quizzes
- Gamification elements
FR53: Cross-Organizational Sharing
Priority: MEDIUM (V1.5)
Fulfills: Collaboration with other fact-checkers
Phase: V1.5
Purpose: Share findings with IFCN/EFCSN members.
Specification:
- API for fact-checking organizations
- Structured data exchange
- Privacy controls
- Attribution requirements
Summary
V1.0 Critical Requirements (Must Have):
- FR44: ClaimReview Schema ✅
- FR45: Corrections Notification ✅
- FR46: Image Verification ✅
- FR47: Archive.org Integration ✅
- FR48: Contributor Safety ✅
- FR49: A/B Testing ✅
- FR54: Evidence Deduplication ✅
- NFR11: Quality Assurance Framework ✅
- NFR12: Security Controls ✅
- NFR13: Quality Metrics Dashboard ✅
V1.1+ (Future):
- FR50: OSINT Integration
- FR51: Video Verification
- FR52: Detection Training
- FR53: Cross-Org Sharing
Total: 10 critical requirements for V1.0