Requirements
This page defines Roles, Content States, Rules, and System Requirements for FactHarbor.
Core Philosophy: Invest in system improvement, not manual data correction. When AI makes errors, improve the algorithm and re-process automatically.
Navigation
- User Needs - What users need from FactHarbor (drives these requirements)
- This page - How we fulfill those needs through system design
1. Roles
Fulfills: UN-12 (Submit claims), UN-13 (Cite verdicts), UN-14 (API access)
FactHarbor uses three core roles plus a reputation system; domain experts may additionally be consulted for specific high-stakes disputes (see 1.4).
1.1 Reader
Who: Anyone (no login required)
Can:
- Browse and search claims
- View scenarios, evidence, verdicts, and confidence scores
- Flag issues or errors
- Use filters, search, and visualization tools
- Submit claims (new claims are added automatically if they are not duplicates)
Cannot:
- Modify content
- Access edit history details
User Needs served: UN-1 (Trust assessment), UN-2 (Claim verification), UN-3 (Article summary with FactHarbor analysis summary), UN-4 (Social media fact-checking), UN-5 (Source tracing), UN-7 (Evidence transparency), UN-8 (Understanding disagreement), UN-12 (Submit claims), UN-17 (In-article highlighting)
1.2 Contributor
Who: Registered users (earns reputation through contributions)
Can:
- Everything a Reader can do
- Edit claims, evidence, and scenarios
- Add sources and citations
- Suggest improvements to AI-generated content
- Participate in discussions
- Earn reputation points for quality contributions
Reputation System:
- New contributors: Limited edit privileges
- Established contributors: Full edit access
- Trusted contributors (substantial reputation): Can approve certain changes
- Reputation earned through: Accepted edits, helpful flags, quality contributions
- Reputation lost through: Reverted edits, invalid flags, abuse
Cannot:
- Delete or hide content (only moderators)
- Override moderation decisions
User Needs served: UN-13 (Cite and contribute)
1.3 Moderator
Who: Trusted community members with proven track record, appointed by governance board
Can:
- Review flagged content
- Hide harmful or abusive content
- Resolve disputes between contributors
- Issue warnings or temporary bans
- Make final decisions on content disputes
- Access full audit logs
Cannot:
- Change governance rules
- Permanently ban users without board approval
- Override technical quality gates
Note: Small team (3-5 initially), supported by automated moderation tools.
1.4 Domain Trusted Contributors (Optional, Task-Specific)
Who: Subject matter specialists invited for specific high-stakes disputes
Not a permanent role: Contacted externally when needed for contested claims in their domain
When used:
- Medical claims with life/safety implications
- Legal interpretations with significant impact
- Scientific claims with high controversy
- Technical claims requiring specialized knowledge
Process:
- Moderator identifies the need for expert input
- Expert is contacted externally (they are not required to be registered users)
- Expert provides a written opinion with sources
- Opinion is added to the claim record
- Expert is acknowledged in the claim
User Needs served: UN-16 (Expert validation status)
2. Content States
Fulfills: UN-1 (Trust indicators), UN-16 (Review status transparency)
FactHarbor uses two content states. Focus is on transparency and confidence scoring, not gatekeeping.
2.1 Published
Status: Visible to all users
Includes:
- AI-generated analyses (default state)
- User-contributed content
- Edited/improved content
Quality Indicators (displayed with content):
- Confidence Score: 0-100% (AI's confidence in analysis)
- Source Quality Score: 0-100% (based on source track record)
- Controversy Flag: If high dispute/edit activity
- Completeness Score: % of expected fields filled
- Last Updated: Date of most recent change
- Edit Count: Number of revisions
- Review Status: AI-generated / Human-reviewed / Expert-validated
Automatic Warnings:
- Confidence < 60%: "Low confidence - use caution"
- Source quality < 40%: "Sources may be unreliable"
- High controversy: "Disputed - multiple interpretations exist"
- Medical/Legal/Safety domain: "Seek professional advice"
User Needs served: UN-1 (Trust score), UN-9 (Methodology transparency), UN-15 (Evolution timeline), UN-16 (Review status)
2.2 Hidden
Status: Not visible to regular users (only to moderators)
Reasons:
- Spam or advertising
- Personal attacks or harassment
- Illegal content
- Privacy violations
- Deliberate misinformation (verified)
- Abuse or harmful content
Process:
- Automated detection flags for moderator review
- Moderator confirms and hides
- Original author notified with reason
- Author can appeal to the board if they dispute the moderator's decision
Note: Content is hidden, not deleted (for audit trail)
3. Contribution Rules
3.1 All Contributors Must
- Provide sources for factual claims
- Use clear, neutral language in FactHarbor's own summaries
- Respect others and maintain civil discourse
- Accept community feedback constructively
- Focus on improving quality, not protecting ego
3.2 AKEL (AI System)
AKEL is the primary system. Human contributions supplement and train AKEL.
AKEL Must:
- Mark all outputs as AI-generated
- Display confidence scores prominently
- Provide source citations
- Flag uncertainty clearly
- Identify contradictions in evidence
- Learn from human corrections
When AKEL Makes Errors:
1. Capture the error pattern (what, why, how common)
2. Improve the system (better prompt, model, validation)
3. Re-process affected claims automatically
4. Measure improvement (did quality increase?)
Human Role: Train AKEL through corrections, not replace AKEL
3.3 Contributors Should
- Improve clarity and structure
- Add missing sources
- Flag errors for system improvement
- Suggest better ways to present information
- Participate in quality discussions
3.4 Moderators Must
- Be impartial
- Document moderation decisions
- Respond to appeals promptly
- Use automated tools to scale efforts
- Focus on abuse/harm, not routine quality control
4. Quality Standards
Fulfills: UN-5 (Source reliability), UN-6 (Publisher track records), UN-7 (Evidence transparency), UN-9 (Methodology transparency)
4.1 Source Requirements
Track Record Over Credentials:
- Sources evaluated by historical accuracy
- Correction policy matters
- Independence from conflicts of interest
- Methodology transparency
Source Quality Database:
- Automated tracking of source accuracy
- Correction frequency
- Reliability score (updated continuously)
- Users can see source track record
No automatic trust for government, academia, or media - all evaluated by track record.
User Needs served: UN-5 (Source provenance), UN-6 (Publisher reliability)
4.2 Claim Requirements
- Clear subject and assertion
- Verifiable with available information
- Sourced (or explicitly marked as needing sources)
- Neutral language in FactHarbor summaries
- Appropriate context provided
User Needs served: UN-2 (Claim extraction and verification)
4.3 Evidence Requirements
- Publicly accessible (or explain why not)
- Properly cited with attribution
- Relevant to claim being evaluated
- Original source preferred over secondary
User Needs served: UN-7 (Evidence transparency)
4.4 Confidence Scoring
Automated confidence calculation based on:
- Source quality scores
- Evidence consistency
- Contradiction detection
- Completeness of analysis
- Historical accuracy of similar claims
Thresholds:
- < 40%: Too low to publish (needs improvement)
- 40-60%: Published with "Low confidence" warning
- 60-80%: Published as standard
- 80-100%: Published as "High confidence"
User Needs served: UN-1 (Trust assessment), UN-9 (Methodology transparency)
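As a minimal sketch, the thresholds above could be applied as follows; the function name and return labels are hypothetical, not part of the FactHarbor codebase.

```python
# Illustrative sketch of the section 4.4 confidence thresholds only.

def publication_status(confidence: float) -> str:
    """Map an AKEL confidence score (0-100) to publication behavior."""
    if confidence < 40:
        return "withheld (needs improvement)"
    if confidence < 60:
        return "published with 'Low confidence' warning"
    if confidence < 80:
        return "published as standard"
    return "published as 'High confidence'"
```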
5. Automated Risk Scoring
Fulfills: UN-10 (Manipulation detection), UN-16 (Appropriate review level)
Replace manual risk tiers with continuous automated scoring.
5.1 Risk Score Calculation
Factors (weighted algorithm):
- Domain sensitivity: Medical, legal, safety auto-flagged higher
- Potential impact: Views, citations, spread
- Controversy level: Flags, disputes, edit wars
- Uncertainty: Low confidence, contradictory evidence
- Source reliability: Track record of sources used
Score: 0-100 (higher = more risk)
5.2 Automated Actions
- Score > 80: Flag for moderator review before publication
- Score 60-80: Publish with prominent warnings
- Score 40-60: Publish with standard warnings
- Score < 40: Publish normally
Continuous monitoring: Risk score recalculated as new information emerges
User Needs served: UN-10 (Detect manipulation tactics), UN-16 (Review status)
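A minimal sketch of how sections 5.1 and 5.2 could combine, assuming normalized 0-100 factor values. The weights are placeholder assumptions; the spec only calls for a "weighted algorithm" and defines the action thresholds.

```python
# Illustrative sketch combining 5.1 (weighted score) and 5.2 (actions).
# The factor weights below are assumptions, not specified values.

RISK_WEIGHTS = {
    "domain_sensitivity":   0.30,  # medical/legal/safety score higher
    "potential_impact":     0.25,  # views, citations, spread
    "controversy":          0.20,  # flags, disputes, edit wars
    "uncertainty":          0.15,  # low confidence, contradictions
    "source_unreliability": 0.10,  # inverse of source track record
}

def risk_score(factors: dict[str, float]) -> float:
    """Combine normalized factor values (each 0-100) into a 0-100 score."""
    return sum(w * factors.get(name, 0.0) for name, w in RISK_WEIGHTS.items())

def automated_action(score: float) -> str:
    """Apply the section 5.2 thresholds."""
    if score > 80:
        return "flag for moderator review before publication"
    if score > 60:
        return "publish with prominent warnings"
    if score > 40:
        return "publish with standard warnings"
    return "publish normally"
```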
6. System Improvement Process
Core principle: Fix the system, not just the data.
6.1 Error Capture
When users flag errors or make corrections:
1. What was wrong? (categorize)
2. What should it have been?
3. Why did the system fail? (root cause)
4. How common is this pattern?
5. Store in the ErrorPattern table (improvement queue; sketched below)
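A minimal sketch of an ErrorPattern record covering steps 1-4 above; every field name is an assumption, since the spec only names the table itself.

```python
# Hypothetical ErrorPattern record for the section 6.1 improvement queue.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ErrorPattern:
    category: str                   # step 1: what was wrong
    expected: str                   # step 2: what it should have been
    root_cause: str                 # step 3: why the system failed
    occurrence_count: int           # step 4: how common the pattern is
    affected_claim_ids: list[str] = field(default_factory=list)
    first_seen: datetime = field(default_factory=datetime.utcnow)
```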
6.2 Continuous Improvement Cycle
1. Review: Analyze top error patterns
2. Develop: Create a fix (prompt, model, validation)
3. Test: Validate the fix on sample claims
4. Deploy: Roll out if quality improves
5. Re-process: Automatically update affected claims
6. Monitor: Track quality metrics
6.3 Quality Metrics Dashboard
Track continuously:
- Error rate by category
- Source quality distribution
- Confidence score trends
- User flag rate (issues found)
- Correction acceptance rate
- Re-work rate
- Claims processed per hour
Goal: continuously decreasing error rate
7. Automated Quality Monitoring
Replace manual audit sampling with automated monitoring.
7.1 Continuous Metrics
- Source quality: Track record database
- Consistency: Contradiction detection
- Clarity: Readability scores
- Completeness: Field validation
- Accuracy: User corrections tracked
7.2 Anomaly Detection
Automated alerts for:
- Sudden quality drops
- Unusual patterns
- Contradiction clusters
- Source reliability changes
- User behavior anomalies
7.3 Targeted Review
- Review only flagged items
- Random sampling for calibration (not quotas)
- Learn from corrections to improve automation
8. Functional Requirements
This section defines specific features that fulfill user needs.
8.1 Claim Intake & Normalization
FR1 — Claim Intake
Fulfills: UN-2 (Claim extraction), UN-4 (Quick fact-checking), UN-12 (Submit claims)
- Users submit claims via simple form or API
- Claims can be text, URL, or image
- Duplicate detection (semantic similarity)
- Auto-categorization by domain
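A minimal sketch of the duplicate-detection step, assuming a sentence-embedding function `embed` is available; both `embed` and the 0.9 threshold are illustrative assumptions.

```python
# Sketch of FR1's semantic-similarity duplicate detection.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_duplicate(new_claim: str, existing_claims: list[str],
                 embed, threshold: float = 0.9) -> bool:
    """True if the new claim is semantically close to an existing claim."""
    v = embed(new_claim)
    return any(cosine(v, embed(c)) >= threshold for c in existing_claims)
```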
FR2 — Claim Normalization
Fulfills: UN-2 (Claim verification)
- Standardize to clear assertion format
- Extract key entities (who, what, when, where)
- Identify claim type (factual, predictive, evaluative)
- Link to existing similar claims
FR3 — Claim Classification
Fulfills: UN-11 (Filtered research)
- Domain: Politics, Science, Health, etc.
- Type: Historical fact, current stat, prediction, etc.
- Risk score: Automated calculation
- Complexity: Simple, moderate, complex
8.2 Scenario System
FR4 — Scenario Generation
Fulfills: UN-2 (Context-dependent verification), UN-3 (Article summary with FactHarbor analysis summary), UN-8 (Understanding disagreement)
Automated scenario creation:
- AKEL analyzes claim and generates likely scenarios (use-cases and contexts)
- Each scenario includes: assumptions, definitions, boundaries, evidence context
- Users can flag incorrect scenarios
- System learns from corrections
Key Concept: Scenarios represent different interpretations or contexts (e.g., "Clinical trials with healthy adults" vs. "Real-world data with diverse populations")
FR5 — Evidence Linking
Fulfills: UN-5 (Source tracing), UN-7 (Evidence transparency)
- Automated evidence discovery from sources
- Relevance scoring
- Contradiction detection
- Source quality assessment
FR6 — Scenario Comparison
Fulfills: UN-3 (Article summary with FactHarbor analysis summary), UN-8 (Understanding disagreement)
- Side-by-side comparison interface
- Highlight key differences between scenarios
- Show evidence supporting each scenario
- Display confidence scores per scenario
8.3 Verdicts & Analysis
FR7 — Automated Verdicts
Fulfills: UN-1 (Trust score), UN-2 (Verification verdicts), UN-3 (Article summary with FactHarbor analysis summary), UN-13 (Cite verdicts)
- AKEL generates verdict based on evidence within each scenario
- Likelihood range displayed (e.g., "0.70-0.85 (likely true)") - NOT binary true/false
- Uncertainty factors explicitly listed (e.g., "Small sample sizes", "Long-term effects unknown")
- Confidence score displayed prominently
- Source quality indicators shown
- Contradictions noted
- Uncertainty acknowledged
Key Innovation: Detailed probabilistic verdicts with explicit uncertainty, not binary judgments
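A minimal sketch of the probabilistic verdict structure FR7 describes; the field names are hypothetical, but each field corresponds to a bullet above.

```python
# Hypothetical verdict record: likelihood range + explicit uncertainty,
# never a binary true/false.

from dataclasses import dataclass

@dataclass
class Verdict:
    scenario_id: str
    likelihood_low: float            # e.g. 0.70
    likelihood_high: float           # e.g. 0.85
    label: str                       # e.g. "likely true"
    confidence: float                # 0-100, displayed prominently
    uncertainty_factors: list[str]   # e.g. ["Small sample sizes"]

    def display(self) -> str:
        return (f"{self.likelihood_low:.2f}-{self.likelihood_high:.2f} "
                f"({self.label}), confidence {self.confidence:.0f}%")
```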
FR8 — Time Evolution
Fulfills: UN-15 (Verdict evolution timeline)
- Claims and verdicts update as new evidence emerges
- Version history maintained for all verdicts
- Changes highlighted
- Confidence score trends visible
- Users can see "as of date X, what did we know?"
8.4 User Interface & Presentation
FR12 — Two-Panel Summary View (Article Summary with FactHarbor Analysis Summary)
Fulfills: UN-3 (Article Summary with FactHarbor Analysis Summary)
Purpose: Provide side-by-side comparison of what a document claims vs. FactHarbor's complete analysis of its credibility
Left Panel (Article Summary):
- Document title, source, and claimed credibility
- "The Big Picture" - main thesis or position change
- "Key Findings" - structured summary of document's main claims
- "Reasoning" - document's explanation for positions
- "Conclusion" - document's bottom line
Right Panel (FactHarbor Analysis Summary):
- FactHarbor's independent source credibility assessment
- Claim-by-claim verdicts with confidence scores
- Methodology assessment (strengths, limitations)
- Overall verdict on document quality
- Analysis ID for reference
Design Principles:
- No scrolling required - both panels visible simultaneously
- Visual distinction between "what they say" and "FactHarbor's analysis"
- Color coding for verdicts (supported, uncertain, refuted)
- Confidence percentages clearly visible
- Mobile responsive (panels stack vertically on small screens)
Implementation Notes:
- Generated automatically by AKEL for every analyzed document
- Updates when verdict evolves (maintains version history)
- Exportable as standalone summary report
- Shareable via permanent URL
FR13 — In-Article Claim Highlighting
Fulfills: UN-17 (In-article claim highlighting)
Purpose: Enable readers to quickly assess claim credibility while reading by visually highlighting factual claims with color-coded indicators
Visual Example: Article with Highlighted Claims
Article: "New Study Shows Benefits of Mediterranean Diet"
A recent study published in the Journal of Nutrition has revealed new findings about the Mediterranean diet. 🟢 Researchers found that Mediterranean diet followers had a 25% lower risk of heart disease compared to control groups.
The study, which followed 10,000 participants over five years, showed significant improvements in cardiovascular health markers. 🟡 Some experts believe this diet can completely prevent heart attacks.
Dr. Maria Rodriguez, lead researcher, recommends incorporating more olive oil, fish, and vegetables into daily meals. 🔴 The study proves that saturated fats cause heart disease.
Participants also reported feeling more energetic and experiencing better sleep quality, though these were secondary measures.
Legend:
- 🟢 = Well-supported claim (confidence ≥75%)
- 🟡 = Uncertain claim (confidence 40-74%)
- 🔴 = Refuted/unsupported claim (confidence <40%)
- Plain text = Non-factual content (context, opinions, recommendations)
(Screenshot placeholder: tooltip shown on hover/click)
Color-Coding System:
- Green: Well-supported claims (confidence ≥75%, strong evidence)
- Yellow/Orange: Uncertain claims (confidence 40-74%, conflicting or limited evidence)
- Red: Refuted or unsupported claims (confidence <40%, contradicted by evidence)
- Gray/Neutral: Non-factual content (opinions, questions, procedural text)
Interactive Highlighting Example (Detailed View)
| Article Text | Status | Analysis |
|---|---|---|
| A recent study published in the Journal of Nutrition has revealed new findings about the Mediterranean diet. | Plain text | Context - no highlighting |
| Researchers found that Mediterranean diet followers had a 25% lower risk of heart disease compared to control groups. | 🟢 WELL SUPPORTED | Strong evidence; link to details |
| The study, which followed 10,000 participants over five years, showed significant improvements in cardiovascular health markers. | Plain text | Methodology - no highlighting |
| Some experts believe this diet can completely prevent heart attacks. | 🟡 UNCERTAIN | Conflicting or limited evidence; link to details |
| Dr. Rodriguez recommends incorporating more olive oil, fish, and vegetables into daily meals. | Plain text | Recommendation - no highlighting |
| The study proves that saturated fats cause heart disease. | 🔴 REFUTED | Contradicted by evidence; link to details |
Design Notes:
- Highlighted claims use italics to distinguish from plain text
- Color backgrounds match XWiki message box colors (success/warning/error)
- Status column shows verdict prominently
- Analysis column provides quick summary with link to details
User Actions:
- Hover over highlighted claim → Tooltip appears
- Click highlighted claim → Detailed analysis modal/panel
- Toggle button to turn highlighting on/off
- Keyboard: Tab through highlighted claims
Interaction Design:
- Hover/click on highlighted claim → Show tooltip with:
  - Claim text
  - Verdict (e.g., "WELL SUPPORTED")
  - Confidence score (e.g., "85%")
  - Brief evidence summary
  - Link to detailed analysis
- Toggle highlighting on/off (user preference)
- Adjustable color intensity for accessibility
Technical Requirements:
- Real-time highlighting as page loads (non-blocking)
- Claim boundary detection (start/end of assertion)
- Handle nested or overlapping claims
- Preserve original article formatting
- Work with various content formats (HTML, plain text, PDFs)
Performance Requirements:
- Highlighting renders within 500ms of page load
- No perceptible delay in reading experience
- Efficient DOM manipulation (avoid reflows)
Accessibility:
- Color-blind friendly palette (use patterns/icons in addition to color)
- Screen reader compatible (ARIA labels for claim credibility)
- Keyboard navigation to highlighted claims
Implementation Notes:
- Claims extracted and analyzed by AKEL during initial processing
- Highlighting data stored as annotations with byte offsets
- Client-side rendering of highlights based on verdict data
- Mobile responsive (tap instead of hover)
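A minimal sketch of the annotation format implied by these notes (byte offsets plus verdict data for client-side rendering); all names are hypothetical.

```python
# Hypothetical FR13 highlight annotation rendered client-side.

from dataclasses import dataclass

@dataclass
class ClaimHighlight:
    start: int             # byte offset where the claim begins
    end: int               # byte offset where it ends
    verdict: str           # "supported" | "uncertain" | "refuted"
    confidence: float      # 0-100, shown in the tooltip
    analysis_url: str      # link to the detailed analysis panel

# Color mapping mirrors the FR13 color-coding system.
VERDICT_COLOR = {"supported": "green", "uncertain": "yellow", "refuted": "red"}
```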
8.5 Workflow & Moderation
FR9 — Publication Workflow
Fulfills: UN-1 (Fast access to verified content), UN-16 (Clear review status)
Simple flow:
1. Claim submitted
2. AKEL processes (automated)
3. If confidence > threshold: Publish (labeled as AI-generated)
4. If confidence < threshold: Flag for improvement
5. If risk score > threshold: Flag for moderator
No multi-stage approval process
FR10 — Moderation
Focus on abuse, not routine quality:
- Automated abuse detection
- Moderators handle flags
- Quick response to harmful content
- Minimal involvement in routine content
FR11 — Audit Trail
Fulfills: UN-14 (API access to histories), UN-15 (Evolution tracking)
- All edits logged
- Version history public
- Moderation decisions documented
- System improvements tracked
9. Non-Functional Requirements
9.1 NFR1 — Performance
Fulfills: UN-4 (Fast fact-checking), UN-11 (Responsive filtering)
- Claim processing: < 30 seconds
- Search response: < 2 seconds
- Page load: < 3 seconds
- 99% uptime
9.2 NFR2 — Scalability
Fulfills: UN-14 (API access at scale)
- Handle 10,000 claims initially
- Scale to 1M+ claims
- Support 100K+ concurrent users
- Automated processing scales linearly
9.3 NFR3 — Transparency
Fulfills: UN-7 (Evidence transparency), UN-9 (Methodology transparency), UN-13 (Citable verdicts), UN-15 (Evolution visibility)
- All algorithms open source
- All data exportable
- All decisions documented
- Quality metrics public
9.4 NFR4 — Security & Privacy
- Follow Privacy Policy
- Secure authentication
- Data encryption
- Regular security audits
9.5 NFR5 — Maintainability
- Modular architecture
- Automated testing
- Continuous integration
- Comprehensive documentation
9.6 NFR11 — AKEL Quality Assurance Framework
Fulfills: AI safety, IFCN methodology transparency
Specification:
Multi-layer AI quality gates to detect hallucinations, low-confidence results, and logical inconsistencies.
Quality Gate 1: Claim Extraction Validation
Purpose: Ensure extracted claims are factual assertions (not opinions/predictions)
Checks:
1. Factual Statement Test: Is this verifiable? (Yes/No)
2. Opinion Detection: Contains hedging language? ("I think", "probably", "best")
3. Future Prediction Test: Makes claims about future events?
4. Specificity Score: Contains specific entities, numbers, dates?
Thresholds:
- Factual: Must be "Yes"
- Opinion markers: <2 hedging phrases
- Specificity: ≥3 specific elements
Action if Failed: Flag as "Non-verifiable", do NOT generate verdict
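A minimal sketch of Gate 1, encoding the thresholds above; the hedging-phrase list is an illustrative assumption, and the boolean/score inputs are assumed to be produced by upstream checks.

```python
# Sketch of Quality Gate 1 (claim extraction validation).

HEDGING_PHRASES = ("i think", "probably", "best")

def passes_gate_1(claim: str, is_verifiable: bool,
                  predicts_future: bool, specificity: int) -> bool:
    """True if the claim may proceed to verdict generation."""
    hedges = sum(claim.lower().count(p) for p in HEDGING_PHRASES)
    return (is_verifiable            # factual-statement test must be "Yes"
            and hedges < 2           # fewer than 2 opinion markers
            and not predicts_future  # no future-event claims
            and specificity >= 3)    # at least 3 specific elements
```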
Quality Gate 2: Evidence Relevance Validation
Purpose: Ensure AI-linked evidence actually relates to claim
Checks:
1. Semantic Similarity Score: Evidence vs. claim (embeddings)
2. Entity Overlap: Shared people/places/things?
3. Topic Relevance: Discusses claim subject?
Thresholds:
- Similarity: ≥0.6 (cosine similarity)
- Entity overlap: ≥1 shared entity
- Topic relevance: ≥0.5
Action if Failed: Discard irrelevant evidence
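A minimal sketch of Gate 2's filter, directly encoding the thresholds above; the three scores are assumed to be computed elsewhere.

```python
# Sketch of Quality Gate 2 (evidence relevance validation).

def passes_gate_2(similarity: float, shared_entities: int,
                  topic_relevance: float) -> bool:
    """Keep evidence only if it plausibly relates to the claim."""
    return (similarity >= 0.6         # cosine similarity of embeddings
            and shared_entities >= 1  # at least one shared entity
            and topic_relevance >= 0.5)
```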
Quality Gate 3: Scenario Coherence Check
Purpose: Validate scenario assumptions are logical and complete
Checks:
1. Completeness: All required fields populated
2. Internal Consistency: Assumptions don't contradict
3. Distinguishability: Scenarios meaningfully different
Thresholds:
- Required fields: 100%
- Contradiction score: <0.3
- Scenario similarity: <0.8
Action if Failed: Merge duplicate scenarios; reduce confidence by 20%
Quality Gate 4: Verdict Confidence Assessment
Purpose: Only publish high-confidence verdicts
Checks:
1. Evidence Count: Minimum 2 sources
2. Source Quality: Average reliability ≥0.6
3. Evidence Agreement: Supporting vs. contradicting ≥0.6
4. Uncertainty Factors: Hedging in reasoning
Confidence Tiers:
- HIGH (80-100%): ≥3 sources, ≥0.7 quality, ≥80% agreement
- MEDIUM (50-79%): ≥2 sources, ≥0.6 quality, ≥60% agreement
- LOW (0-49%): <2 sources OR low quality/agreement
- INSUFFICIENT: <2 sources → DO NOT PUBLISH
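A minimal sketch of the tier assignment, directly encoding the thresholds above; the function name is hypothetical.

```python
# Sketch of Quality Gate 4 (verdict confidence tiers).

def confidence_tier(source_count: int, avg_quality: float,
                    agreement: float) -> str:
    if source_count < 2:
        return "INSUFFICIENT"  # do not publish
    if source_count >= 3 and avg_quality >= 0.7 and agreement >= 0.8:
        return "HIGH"
    if avg_quality >= 0.6 and agreement >= 0.6:
        return "MEDIUM"
    return "LOW"
```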
Implementation Phases:
- POC1: Gates 1 & 4 only (basic validation)
- POC2: All 4 gates (complete framework)
- V1.0: Hardened with <5% hallucination rate
Acceptance Criteria:
- ✅ All gates operational
- ✅ Hallucination rate <5%
- ✅ Quality metrics public
9.7 NFR12 — Security Controls
Fulfills: Production readiness, legal compliance
Requirements:
1. Input Validation: SQL injection, XSS, CSRF prevention
2. Rate Limiting: 5 analyses per minute per IP (sketched below)
3. Authentication: Secure sessions, API key rotation
4. Data Protection: HTTPS, encryption, backups
5. Security Audit: Penetration testing, GDPR compliance
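A minimal sliding-window sketch of requirement 2 (5 analyses per minute per IP); a production deployment would use a shared store such as Redis rather than an in-process dict.

```python
# Sketch of NFR12's per-IP rate limit; in-memory, single-process only.

import time
from collections import defaultdict

WINDOW_SECONDS, LIMIT = 60, 5
_requests: dict[str, list[float]] = defaultdict(list)

def allow_analysis(ip: str) -> bool:
    """True if this IP has made fewer than LIMIT requests in the window."""
    now = time.time()
    recent = [t for t in _requests[ip] if now - t < WINDOW_SECONDS]
    _requests[ip] = recent
    if len(recent) >= LIMIT:
        return False
    recent.append(now)
    return True
```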
Milestone: Beta 0 (essential), V1.0 (complete) BLOCKER
9.8 NFR13 — Quality Metrics Transparency
Fulfills: IFCN transparency, user trust
Public Metrics:
- Quality gates performance
- Evidence quality stats
- Hallucination rate
- User feedback
Milestone: POC2 (internal), Beta 0 (public), V1.0 (real-time)
10. Requirements Traceability
For full traceability matrix showing which requirements fulfill which user needs, see:
- User Needs - Section 8 includes comprehensive mapping tables
11. Related Pages
Non-Functional Requirements (see Section 9):
- NFR11 — AKEL Quality Assurance Framework
- NFR12 — Security Controls
- NFR13 — Quality Metrics Transparency
Other Requirements:
- User Needs - What users need (drives these requirements)
- Architecture - How requirements are implemented
- Data Model - Data structures supporting requirements
- Workflows - User interaction workflows
- AKEL - AI system fulfilling automation requirements
- Global Rules
- Privacy Policy
V0.9.70 Additional Requirements
Functional Requirements (Additional)
FR44: ClaimReview Schema Implementation
Generate valid ClaimReview structured data for Google/Bing visibility.
Schema.org Mapping:
- 80-100% likelihood → 5 (Highly Supported)
- 60-79% → 4 (Supported)
- 40-59% → 3 (Mixed)
- 20-39% → 2 (Questionable)
- 0-19% → 1 (Refuted)
Milestone: V1.0
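A minimal sketch of emitting the mapping above as ClaimReview JSON-LD. ClaimReview, Rating, ratingValue, bestRating, and worstRating are real Schema.org vocabulary; the example values and function names are placeholders.

```python
# Sketch of FR44: render a verdict as Schema.org ClaimReview markup.

import json

def rating_value(likelihood: float) -> int:
    """Map a 0-100 likelihood to the 1-5 ratingValue scale above."""
    for floor, value in ((80, 5), (60, 4), (40, 3), (20, 2)):
        if likelihood >= floor:
            return value
    return 1

def claim_review_jsonld(claim: str, likelihood: float, url: str) -> str:
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "claimReviewed": claim,
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating_value(likelihood),
            "bestRating": 5,
            "worstRating": 1,
        },
        "author": {"@type": "Organization", "name": "FactHarbor"},
    }, indent=2)
```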
FR45: User Corrections Notification System
Notify users when analyses are corrected.
Mechanisms:
1. In-page banner (30 days)
2. Public correction log
3. Email notifications (opt-in)
4. RSS/API feed
Milestone: Beta 0 (basic), V1.0 (complete) BLOCKER
FR46: Image Verification System
Methods:
1. Reverse image search
2. EXIF metadata analysis
3. Manipulation detection (basic)
4. Context verification
Milestone: Beta 0 (basic), V1.0 (extended)
FR47: Archive.org Integration
Auto-save evidence sources to Wayback Machine.
Milestone: Beta 0
FR48: Safety Framework for Contributors
Protect contributors from harassment and legal threats.
Milestone: V1.1
FR49: A/B Testing Framework
Test AKEL approaches and UI designs systematically.
Milestone: V1.0
FR50: OSINT Toolkit Integration
Priority: HIGH (V1.1)
Fulfills: Advanced media verification
Phase: V1.1
Purpose: Integrate open-source intelligence tools for advanced verification.
Tools to Integrate:
- InVID/WeVerify (video verification)
- Bellingcat toolkit
- Additional TBD based on V1.0 learnings
FR51: Video Verification System
Priority: HIGH (V1.1)
Fulfills: UN-27 (Visual claims), advanced media verification
Phase: V1.1
Purpose: Verify video-based claims.
Specification:
- Keyframe extraction
- Reverse video search
- Deepfake detection (AI-powered)
- Metadata analysis
- Acoustic signature analysis
FR52: Interactive Detection Training
Priority: MEDIUM (V1.5)
Fulfills: Media literacy education
Phase: V1.5
Purpose: Teach users to identify misinformation.
Specification:
- Interactive tutorials
- Practice exercises
- Detection quizzes
- Gamification elements
FR53: Cross-Organizational Sharing
Priority: MEDIUM (V1.5)
Fulfills: Collaboration with other fact-checkers
Phase: V1.5
Purpose: Share findings with IFCN/EFCSN members.
Specification:
- API for fact-checking organizations
- Structured data exchange
- Privacy controls
- Attribution requirements
Summary
V1.0 Critical Requirements (Must Have):
- FR44: ClaimReview Schema ✅
- FR45: Corrections Notification ✅
- FR46: Image Verification ✅
- FR47: Archive.org Integration ✅
- FR48: Contributor Safety ✅
- FR49: A/B Testing ✅
- FR54: Evidence Deduplication ✅
- NFR11: Quality Assurance Framework ✅
- NFR12: Security Controls ✅
- NFR13: Quality Metrics Dashboard ✅
V1.1+ (Future):
- FR50: OSINT Integration
- FR51: Video Verification
- FR52: Detection Training
- FR53: Cross-Org Sharing
Total: 10 critical requirements for V1.0
FR54: Evidence Deduplication
Priority: CRITICAL (POC2/Beta)
Fulfills: Accurate evidence counting, quality metrics
Phase: POC2, Beta 0, V1.0
Purpose: Avoid counting the same source multiple times when it appears in different forms.
Specification:
Deduplication Logic:
1. URL Normalization (sketched below):
   - Remove tracking parameters (?utm_source=...)
   - Normalize http/https
   - Normalize www/non-www
   - Handle redirects
2. Content Similarity:
   - If two sources have >90% text similarity → Same source
   - If one is a subset of the other → Same source
   - Use fuzzy matching for minor differences
3. Cross-Domain Syndication:
   - Detect wire service content (AP, Reuters)
   - Mark as single source if syndicated
   - Count original publication only
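A minimal sketch of step 1 (URL normalization). The tracking-parameter prefixes and the choice to force https are assumptions beyond the spec's utm_ example; redirect handling would require an HTTP client and is omitted here.

```python
# Sketch of FR54's URL normalization step.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")  # assumed list

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.lower().startswith(TRACKING_PREFIXES)]
    host = parts.netloc.lower().removeprefix("www.")   # www/non-www
    return urlunsplit(("https", host,                  # http/https
                       parts.path.rstrip("/") or "/",
                       urlencode(query), ""))
```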
Display:
1. Original Article (NYTimes)
   - Also appeared in: WashPost, Guardian (syndicated)
2. Research Paper (Nature)
3. Official Statement (WHO)
Acceptance Criteria:
- ✅ URL normalization works
- ✅ Content similarity detected
- ✅ Syndicated content identified
- ✅ Unique vs. total counts accurate
- ✅ Improves evidence quality metrics
Additional Requirements (Lower Priority)
FR7: Automated Verdicts (Enhanced with Quality Gates)
POC1+ Enhancement:
After AKEL generates verdict, it passes through quality gates:
1. Extract claims
2. [GATE 1] Validate fact-checkable
3. Generate scenarios
4. Generate verdicts
5. [GATE 4] Validate confidence
6. Display to user
Updated Verdict States:
- PUBLISHED
- INSUFFICIENT_EVIDENCE
- NON_FACTUAL_CLAIM
- PROCESSING
- ERROR
FR4: Analysis Summary (Enhanced with Quality Metadata)
POC1+ Enhancement:
Display quality indicators:
Verifiable Claims: 3/5
High Confidence Verdicts: 1
Medium Confidence: 2
Evidence Sources: 12
Avg Source Quality: 0.73
Quality Score: 8.5/10