Wiki source code of Requirements

Version 1.1 by Robert Schaub on 2025/12/18 12:03

1 = Requirements =
2 This page defines **Roles**, **Content States**, **Rules**, and **System Principles** for FactHarbor.
3 **Core Philosophy:** Invest in system improvement, not manual data correction. When AI makes errors, improve the algorithm and re-process automatically.
4 == 1. Roles ==
5 FactHarbor uses three simple roles plus a reputation system.
6 === 1.1 Reader ===
7 **Who**: Anyone (no login required)
8 **Can**:
9 * Browse and search claims
10 * View scenarios, evidence, verdicts, and confidence scores
11 * Flag issues or errors
12 * Use filters, search, and visualization tools
13 * Submit claims (added automatically if not duplicates)
14 **Cannot**:
15 * Modify content
16 * Access edit history details
17 === 1.2 Contributor ===
18 **Who**: Registered users (earns reputation through contributions)
19 **Can**:
20 * Everything a Reader can do
21 * Edit claims, evidence, and scenarios
22 * Add sources and citations
23 * Suggest improvements to AI-generated content
24 * Participate in discussions
25 * Earn reputation points for quality contributions
26 **Reputation System**:
27 * New contributors: Limited edit privileges
28 * Established contributors: Full edit access
29 * Trusted contributors (substantial reputation): Can approve certain changes
30 * Reputation earned through: Accepted edits, helpful flags, quality contributions
31 * Reputation lost through: Reverted edits, invalid flags, abuse
32 **Cannot**:
33 * Delete or hide content (only moderators)
34 * Override moderation decisions
35 === 1.3 Moderator ===
36 **Who**: Trusted community members with a proven track record, appointed by the governance board
37 **Can**:
38 * Review flagged content
39 * Hide harmful or abusive content
40 * Resolve disputes between contributors
41 * Issue warnings or temporary bans
42 * Make final decisions on content disputes
43 * Access full audit logs
44 **Cannot**:
45 * Change governance rules
46 * Permanently ban users without board approval
47 * Override technical quality gates
48 **Note**: Small team (3-5 initially), supported by automated moderation tools.
49 === 1.4 Domain Trusted Contributors (Optional, Task-Specific) ===
50 **Who**: Subject matter specialists invited for specific high-stakes disputes
51 **Not a permanent role**: Contacted externally when needed for contested claims in their domain
52 **When used**:
53 * Medical claims with life/safety implications
54 * Legal interpretations with significant impact
55 * Scientific claims with high controversy
56 * Technical claims requiring specialized knowledge
57 **Process**:
58 * Moderator identifies need for expert input
59 * Contact expert externally (don't require them to be users)
60 * Expert provides a written opinion with sources
61 * Opinion is added to the claim record
62 * Expert is acknowledged in the claim
63 == 2. Content States ==
64 FactHarbor uses two content states. Focus is on transparency and confidence scoring, not gatekeeping.
65 === 2.1 Published ===
66 **Status**: Visible to all users
67 **Includes**:
68 * AI-generated analyses (default state)
69 * User-contributed content
70 * Edited/improved content
71 **Quality Indicators** (displayed with content):
72 * **Confidence Score**: 0-100% (AI's confidence in analysis)
73 * **Source Quality Score**: 0-100% (based on source track record)
74 * **Controversy Flag**: If high dispute/edit activity
75 * **Completeness Score**: % of expected fields filled
76 * **Last Updated**: Date of most recent change
77 * **Edit Count**: Number of revisions
78 **Automatic Warnings**:
79 * Confidence < 60%: "Low confidence - use caution"
80 * Source quality < 40%: "Sources may be unreliable"
81 * High controversy: "Disputed - multiple interpretations exist"
82 * Medical/Legal/Safety domain: "Seek professional advice"
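The warning rules above can be sketched as a simple mapping. This is an illustrative sketch, not the implementation: the function name, parameter names, and the exact domain set are assumptions.

```python
def automatic_warnings(confidence: float, source_quality: float,
                       high_controversy: bool, domain: str) -> list[str]:
    """Return the warning banners to display alongside a published claim."""
    warnings = []
    if confidence < 60:
        warnings.append("Low confidence - use caution")
    if source_quality < 40:
        warnings.append("Sources may be unreliable")
    if high_controversy:
        warnings.append("Disputed - multiple interpretations exist")
    if domain in {"medical", "legal", "safety"}:  # assumed domain labels
        warnings.append("Seek professional advice")
    return warnings
```

A claim with 55% confidence in the medical domain would carry both the low-confidence and professional-advice banners.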
83 === 2.2 Hidden ===
84 **Status**: Not visible to regular users (only to moderators)
85 **Reasons**:
86 * Spam or advertising
87 * Personal attacks or harassment
88 * Illegal content
89 * Privacy violations
90 * Deliberate misinformation (verified)
91 * Abuse or harmful content
92 **Process**:
93 * Automated detection flags for moderator review
94 * Moderator confirms and hides
95 * Original author notified with reason
96 * Author can appeal to the board if they dispute the moderator's decision
97 **Note**: Content is hidden, not deleted (for audit trail)
98 == 3. Contribution Rules ==
99 === 3.1 All Contributors Must ===
100 * Provide sources for factual claims
101 * Use clear, neutral language in FactHarbor's own summaries
102 * Respect others and maintain civil discourse
103 * Accept community feedback constructively
104 * Focus on improving quality, not protecting ego
105 === 3.2 AKEL (AI System) ===
106 **AKEL is the primary system**. Human contributions supplement and train AKEL.
107 **AKEL Must**:
108 * Mark all outputs as AI-generated
109 * Display confidence scores prominently
110 * Provide source citations
111 * Flag uncertainty clearly
112 * Identify contradictions in evidence
113 * Learn from human corrections
114 **When AKEL Makes Errors**:
115 1. Capture the error pattern (what, why, how common)
116 2. Improve the system (better prompt, model, validation)
117 3. Re-process affected claims automatically
118 4. Measure improvement (did quality increase?)
119 **Human Role**: Train AKEL through corrections, not replace AKEL
120 === 3.3 Contributors Should ===
121 * Improve clarity and structure
122 * Add missing sources
123 * Flag errors for system improvement
124 * Suggest better ways to present information
125 * Participate in quality discussions
126 === 3.4 Moderators Must ===
127 * Be impartial
128 * Document moderation decisions
129 * Respond to appeals promptly
130 * Use automated tools to scale efforts
131 * Focus on abuse/harm, not routine quality control
132 == 4. Quality Standards ==
133 === 4.1 Source Requirements ===
134 **Track Record Over Credentials**:
135 * Sources evaluated by historical accuracy
136 * Correction policy matters
137 * Independence from conflicts of interest
138 * Methodology transparency
139 **Source Quality Database**:
140 * Automated tracking of source accuracy
141 * Correction frequency
142 * Reliability score (updated continuously)
143 * Users can see source track record
144 **No automatic trust** for government, academia, or media - all evaluated by track record.
145 === 4.2 Claim Requirements ===
146 * Clear subject and assertion
147 * Verifiable with available information
148 * Sourced (or explicitly marked as needing sources)
149 * Neutral language in FactHarbor summaries
150 * Appropriate context provided
151 === 4.3 Evidence Requirements ===
152 * Publicly accessible (or explain why not)
153 * Properly cited with attribution
154 * Relevant to claim being evaluated
155 * Original source preferred over secondary
156 === 4.4 Confidence Scoring ===
157 **Automated confidence calculation based on**:
158 * Source quality scores
159 * Evidence consistency
160 * Contradiction detection
161 * Completeness of analysis
162 * Historical accuracy of similar claims
163 **Thresholds**:
164 * < 40%: Too low to publish (needs improvement)
165 * 40-60%: Published with "Low confidence" warning
166 * 60-80%: Published as standard
167 * 80-100%: Published as "High confidence"
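The four threshold bands can be expressed as one function. A minimal sketch: since the spec's ranges share boundary values (40, 60, 80), the choice of which band a boundary score falls into is an assumption here.

```python
def publication_status(confidence: float) -> str:
    """Map a confidence score (0-100) to a publication outcome."""
    if confidence < 40:
        return "needs improvement"  # too low to publish
    if confidence < 60:
        return "published (low confidence warning)"
    if confidence < 80:
        return "published (standard)"
    return "published (high confidence)"
```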
168 == 5. Automated Risk Scoring ==
169 **Replace manual risk tiers with continuous automated scoring**.
170 === 5.1 Risk Score Calculation ===
171 **Factors** (weighted algorithm):
172 * **Domain sensitivity**: Medical, legal, safety auto-flagged higher
173 * **Potential impact**: Views, citations, spread
174 * **Controversy level**: Flags, disputes, edit wars
175 * **Uncertainty**: Low confidence, contradictory evidence
176 * **Source reliability**: Track record of sources used
177 **Score**: 0-100 (higher = more risk)
178 === 5.2 Automated Actions ===
179 * **Score > 80**: Flag for moderator review before publication
180 * **Score 60-80**: Publish with prominent warnings
181 * **Score 40-60**: Publish with standard warnings
182 * **Score < 40**: Publish normally
183 **Continuous monitoring**: Risk score recalculated as new information emerges
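Sections 5.1 and 5.2 together can be sketched as a weighted sum plus a routing table. The specific weights below are illustrative assumptions (the spec only says "weighted algorithm"); each factor is assumed to be pre-scored on 0-100.

```python
# Illustrative weights - the spec does not fix these values.
RISK_WEIGHTS = {
    "domain_sensitivity":   0.30,
    "potential_impact":     0.20,
    "controversy":          0.20,
    "uncertainty":          0.15,
    "source_unreliability": 0.15,
}

def risk_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores (each 0-100), yielding an overall 0-100 score."""
    return sum(RISK_WEIGHTS[k] * factors.get(k, 0.0) for k in RISK_WEIGHTS)

def risk_action(score: float) -> str:
    """Map a risk score to the automated action of section 5.2."""
    if score > 80:
        return "moderator review before publication"
    if score > 60:
        return "publish with prominent warnings"
    if score > 40:
        return "publish with standard warnings"
    return "publish normally"
```

Because the score is recalculated continuously, the resulting action can change after publication as new flags or evidence arrive.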
184 == 6. System Improvement Process ==
185 **Core principle**: Fix the system, not just the data.
186 === 6.1 Error Capture ===
187 **When users flag errors or make corrections**:
188 1. What was wrong? (categorize)
189 2. What should it have been?
190 3. Why did the system fail? (root cause)
191 4. How common is this pattern?
192 5. Store in ErrorPattern table (improvement queue)
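The five capture questions above map naturally onto the fields of an ErrorPattern row. A minimal sketch of that record; the field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ErrorPattern:
    """One row of the improvement queue (field names are assumptions)."""
    category: str          # 1. what was wrong
    expected: str          # 2. what it should have been
    root_cause: str        # 3. why the system failed
    occurrences: int = 1   # 4. how common the pattern is
    first_seen: date = field(default_factory=date.today)
```

Frequent patterns (high `occurrences`) would float to the top of the weekly review in section 6.2.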
193 === 6.2 Weekly Improvement Cycle ===
194 1. **Review**: Analyze top error patterns
195 2. **Develop**: Create fix (prompt, model, validation)
196 3. **Test**: Validate fix on sample claims
197 4. **Deploy**: Roll out if quality improves
198 5. **Re-process**: Automatically update affected claims
199 6. **Monitor**: Track quality metrics
200 === 6.3 Quality Metrics Dashboard ===
201 **Track continuously**:
202 * Error rate by category
203 * Source quality distribution
204 * Confidence score trends
205 * User flag rate (issues found)
206 * Correction acceptance rate
207 * Re-work rate
208 * Claims processed per hour
209 **Goal**: Reduce error rate by 10% per month
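A quick check of what this target implies if sustained: a 10% monthly reduction compounds, so after a year the error rate would be about 28% of the baseline.

```python
# A -10%/month error-rate target compounds multiplicatively:
# after n months the rate is 0.9**n of the baseline.
baseline = 1.0
after_year = baseline * 0.9 ** 12
print(round(after_year, 3))  # 0.282 - roughly a 72% reduction in 12 months
```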
210 == 7. Automated Quality Monitoring ==
211 **Replace manual audit sampling with automated monitoring**.
212 === 7.1 Continuous Metrics ===
213 * **Source quality**: Track record database
214 * **Consistency**: Contradiction detection
215 * **Clarity**: Readability scores
216 * **Completeness**: Field validation
217 * **Accuracy**: User corrections tracked
218 === 7.2 Anomaly Detection ===
219 **Automated alerts for**:
220 * Sudden quality drops
221 * Unusual patterns
222 * Contradiction clusters
223 * Source reliability changes
224 * User behavior anomalies
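One simple way to trigger these alerts is a z-score test on each metric's recent history. This is a sketch under that assumption; the spec does not prescribe a detection method, and the 3-sigma threshold is illustrative.

```python
from statistics import mean, stdev

def is_anomaly(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag the latest metric value if it deviates more than `threshold`
    standard deviations from its recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

A sudden drop in average confidence (e.g. from a steady ~80 down to 50) would fire an alert, while normal fluctuation would not.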
225 === 7.3 Targeted Review ===
226 * Review only flagged items
227 * Random sampling for calibration (not quotas)
228 * Learn from corrections to improve automation
229 == 8. Claim Intake & Normalization ==
230 === 8.1 FR1 – Claim Intake ===
231 * Users submit claims via simple form or API
232 * Claims can be text, URL, or image
233 * Duplicate detection (semantic similarity)
234 * Auto-categorization by domain
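Semantic duplicate detection is typically done by comparing text embeddings. A minimal sketch, assuming claim embeddings come from some text-embedding model (not specified here); the 0.92 similarity threshold is an illustrative assumption.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec: list[float], existing_vecs: list[list[float]],
                 threshold: float = 0.92) -> bool:
    """Treat a submission as a duplicate if any stored claim embedding is close enough."""
    return any(cosine_similarity(new_vec, v) >= threshold for v in existing_vecs)
```

In practice the comparison would run against a vector index rather than a plain list, but the decision rule is the same.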
235 === 8.2 FR2 – Claim Normalization ===
236 * Standardize to clear assertion format
237 * Extract key entities (who, what, when, where)
238 * Identify claim type (factual, predictive, evaluative)
239 * Link to existing similar claims
240 === 8.3 FR3 – Claim Classification ===
241 * Domain: Politics, Science, Health, etc.
242 * Type: Historical fact, current stat, prediction, etc.
243 * Risk score: Automated calculation
244 * Complexity: Simple, moderate, complex
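After FR2 and FR3, a claim might carry a structure like the following. This is a sketch of one plausible record shape; every field name here is an assumption, not the actual data model (see the Data Model page).

```python
from dataclasses import dataclass

@dataclass
class NormalizedClaim:
    """Illustrative shape of a claim after normalization and classification."""
    assertion: str               # standardized, clear assertion text (FR2)
    entities: dict[str, str]     # who / what / when / where (FR2)
    claim_type: str              # "factual" | "predictive" | "evaluative" (FR2)
    domain: str                  # e.g. "health", "politics", "science" (FR3)
    risk_score: float            # automated 0-100 calculation (FR3)
    complexity: str              # "simple" | "moderate" | "complex" (FR3)
    related_claim_ids: list[int] # links to similar existing claims (FR2)
```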
245 == 9. Scenario System ==
246 === 9.1 FR4 – Scenario Generation ===
247 **Automated scenario creation**:
248 * AKEL analyzes claim and generates likely scenarios
249 * Each scenario includes: assumptions, evidence, conclusion
250 * Users can flag incorrect scenarios
251 * System learns from corrections
252 === 9.2 FR5 – Evidence Linking ===
253 * Automated evidence discovery from sources
254 * Relevance scoring
255 * Contradiction detection
256 * Source quality assessment
257 === 9.3 FR6 – Scenario Comparison ===
258 * Side-by-side comparison interface
259 * Highlight key differences
260 * Show evidence supporting each
261 * Display confidence scores
262 == 10. Verdicts & Analysis ==
263 === 10.1 FR7 – Automated Verdicts ===
264 * AKEL generates verdict based on evidence
265 * Confidence score displayed prominently
266 * Source quality indicators
267 * Contradictions noted
268 * Uncertainty acknowledged
269 === 10.2 FR8 – Time Evolution ===
270 * Claims update as new evidence emerges
271 * Version history maintained
272 * Changes highlighted
273 * Confidence score trends visible
274 == 11. Workflow & Moderation ==
275 === 11.1 FR9 – Publication Workflow ===
276 **Simple flow**:
277 1. Claim submitted
278 2. AKEL processes (automated)
279 3. If confidence > threshold: Publish
280 4. If confidence < threshold: Flag for improvement
281 5. If risk score > threshold: Flag for moderator
282 **No multi-stage approval process**
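The five-step flow reduces to a single routing decision after AKEL processing. A minimal sketch: the thresholds match sections 4.4 and 5.2, but giving the risk check precedence over the confidence check is an assumption of this sketch.

```python
def route_claim(confidence: float, risk: float,
                conf_threshold: float = 60.0, risk_threshold: float = 80.0) -> str:
    """Route a processed claim: moderator flag, improvement queue, or publish."""
    if risk > risk_threshold:
        return "flag for moderator"
    if confidence < conf_threshold:
        return "flag for improvement"
    return "publish"
```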
283 === 11.2 FR10 – Moderation ===
284 **Focus on abuse, not routine quality**:
285 * Automated abuse detection
286 * Moderators handle flags
287 * Quick response to harmful content
288 * Minimal involvement in routine content
289 === 11.3 FR11 – Audit Trail ===
290 * All edits logged
291 * Version history public
292 * Moderation decisions documented
293 * System improvements tracked
294 == 12. Technical Requirements ==
295 === 12.1 NFR1 – Performance ===
296 * Claim processing: < 30 seconds
297 * Search response: < 2 seconds
298 * Page load: < 3 seconds
299 * 99% uptime
300 === 12.2 NFR2 – Scalability ===
301 * Handle 10,000 claims initially
302 * Scale to 1M+ claims
303 * Support 100K+ concurrent users
304 * Automated processing scales linearly
305 === 12.3 NFR3 – Transparency ===
306 * All algorithms open source
307 * All data exportable
308 * All decisions documented
309 * Quality metrics public
310 === 12.4 NFR4 – Security & Privacy ===
311 * Follow [[Privacy Policy>>FactHarbor.Organisation.How-We-Work-Together.Privacy-Policy]]
312 * Secure authentication
313 * Data encryption
314 * Regular security audits
315 === 12.5 NFR5 – Maintainability ===
316 * Modular architecture
317 * Automated testing
318 * Continuous integration
319 * Comprehensive documentation
320 == 13. MVP Scope ==
321 **Phase 1 (Months 1-3): Read-Only MVP**
322 Build:
323 * Automated claim analysis
324 * Confidence scoring
325 * Source evaluation
326 * Browse/search interface
327 * User flagging system
328 **Goal**: Prove AI quality before adding user editing
329 **Phase 2 (Months 4-6): User Contributions**
330 Add only if needed:
331 * Simple editing (Wikipedia-style)
332 * Reputation system
333 * Basic moderation
334 **Phase 3 (Months 7-12): Refinement**
335 * Continuous quality improvement
336 * Feature additions based on real usage
337 * Scale infrastructure
338 **Deferred**:
339 * Federation (until multiple successful instances exist)
340 * Complex contribution workflows (focus on automation)
341 * Extensive role hierarchy (keep simple)
342 == 14. Success Metrics ==
343 **System Quality** (track weekly):
344 * Error rate by category (target: -10%/month)
345 * Average confidence score (target: increase)
346 * Source quality distribution (target: more high-quality)
347 * Contradiction detection rate (target: increase)
348 **Efficiency** (track monthly):
349 * Claims processed per hour (target: increase)
350 * Human hours per claim (target: decrease)
351 * Automation coverage (target: >90%)
352 * Re-work rate (target: <5%)
353 **User Satisfaction** (track quarterly):
354 * User flag rate (issues found)
355 * Correction acceptance rate (flags valid)
356 * Return user rate
357 * Trust indicators (surveys)
358 == 15. Related Pages ==
359 * [[Architecture>>FactHarbor.Specification.Architecture.WebHome]]
360 * [[Data Model>>FactHarbor.Specification.Data Model.WebHome]]
361 * [[Workflows>>FactHarbor.Specification.Workflows.WebHome]]
362 * [[Global Rules>>FactHarbor.Organisation.How-We-Work-Together.GlobalRules.WebHome]]
363 * [[Privacy Policy>>FactHarbor.Organisation.How-We-Work-Together.Privacy-Policy]]