
Policy Primer - Translating Safety
AI Policy
multilingual
Safety
The Reality of AI and Biorisk
AI Policy
Responsible AI
Safety
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
multilingual
Safety
Supervised Learning
Language
Human Feedback
Efficiency
AI Policy
Safety
Consent in Crisis: The Rapid Decline of the AI Data Commons
Responsible AI
Safety
AI Policy
Data
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
multilingual
Safety
Supervised Learning
Language
Human Feedback
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Language
Safety
Generative Models
Continual Learning
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Responsible AI
Safety
AI Policy
Data
Safety
Privacy
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Safety
Generative Models
Language Models
The Presidio Recommendations on Responsible Generative AI - World Economic Forum
Interpretability
Safety
Responsible AI
AI Policy
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Safety
Reasoning
Interpretability
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
Safety
Reproducibility
Responsible AI