Apr 24, 2023

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

How changes to the Perspective API, used widely for toxicity evaluation, impact research reproducibility and rankings of model risk.

Authors

Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

Abstract

Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box, commercially available APIs for detecting toxicity, such as the Perspective API, are not static but are frequently retrained to address unaddressed weaknesses and biases. We evaluate the implications of these changes for the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity. Our findings suggest that research that relied on inherited automatic toxicity scores to compare models and techniques may have resulted in inaccurate findings. Rescoring all models from HELM, a widely respected living benchmark, for toxicity with the recent version of the API led to a different ranking of widely used foundation models. We suggest caution in applying apples-to-apples comparisons between studies and lay out recommendations for a more structured approach to evaluating toxicity over time. Code and data are available at this https URL.
