Oct 11, 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Toxicity definitions evolve over time, so why don't mitigation techniques account for this? With Goodtriever, we take the first steps toward continual toxicity mitigation!

Authors
Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
Abstract
Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes into account its changing nature. We introduce Goodtriever, a flexible methodology that matches the current state-of-the-art toxicity mitigation while achieving 43% relative latency reduction during inference and being more computationally efficient. By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation. Our research advocates for an increased focus on adaptable mitigation techniques, which better reflect the data drift models face when deployed in the wild.
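To make the retrieval-at-decoding-time idea concrete, below is a minimal, illustrative sketch of how generation can be steered with a non-toxic and a toxic datastore: nearest-neighbor retrievals over each datastore are turned into next-token distributions and combined with the base language model in a product-of-experts style. Function names, the brute-force kNN search, and the exact combination rule (`alpha`, `temperature`) are assumptions for illustration, not the paper's implementation or API.

```python
# Illustrative sketch of retrieval-augmented toxicity control at decoding time.
# Assumes each datastore is a pair (keys, values): stored hidden states and the
# token id that followed each of them. Hyperparameter names are hypothetical.
import numpy as np

def knn_probs(query, keys, values, vocab_size, k=8, temperature=1.0):
    """Turn the k nearest datastore entries into a next-token distribution."""
    dists = np.linalg.norm(keys - query, axis=1)               # distance to every stored key
    nearest = np.argsort(dists)[:k]                            # indices of the k closest keys
    weights = np.exp(-(dists[nearest] - dists[nearest].min()) / temperature)
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[values[idx]] += w                                 # aggregate weight per token id
    return probs / probs.sum()

def controlled_next_token_probs(lm_probs, query, nontoxic_ds, toxic_ds,
                                vocab_size, alpha=2.0, eps=1e-8):
    """Product-of-experts-style combination: boost tokens supported by the
    non-toxic datastore, penalize tokens supported by the toxic one."""
    p_plus = knn_probs(query, *nontoxic_ds, vocab_size)
    p_minus = knn_probs(query, *toxic_ds, vocab_size)
    logits = (np.log(lm_probs + eps)
              + alpha * np.log(p_plus + eps)
              - alpha * np.log(p_minus + eps))
    exp = np.exp(logits - logits.max())                         # renormalize
    return exp / exp.sum()

# Toy usage: 16-dim hidden states, 10-token vocabulary, random datastores.
rng = np.random.default_rng(0)
d, V, N = 16, 10, 100
nontoxic_ds = (rng.normal(size=(N, d)), rng.integers(0, V, size=N))
toxic_ds = (rng.normal(size=(N, d)), rng.integers(0, V, size=N))
lm_probs = np.full(V, 1.0 / V)                                  # uniform base LM for the demo
query = rng.normal(size=d)                                      # current decoding hidden state
print(controlled_next_token_probs(lm_probs, query, nontoxic_ds, toxic_ds, V))
```

Because the control signal lives in the datastores rather than in the model weights, adapting to a new or updated definition of toxicity amounts to adding or removing datastore entries, which is what makes the approach suited to continual mitigation.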
Related works
The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers