Jun 10, 2026

AI Exposure Scores: What they measure, what they miss, and what comes next

Exposure scores such as GPTs are GPTs, while methodologically influential, have widened structural and coordination gaps between research and policy. Addressing these gaps will take collaborative effort if AI impact assessments are to become more reliable and policy-relevant.


Authors

Campbell Lund, Thomas Euyang, Zanele Munyikwa, and Marzieh Fadaee

Abstract

A set of exposure scores calculated in 2023 has become a central empirical input to the future of work debate. Produced by Eloundou et al. (2023) and referred to throughout this piece as the GPTs are GPTs scores, they define exposure as the percentage of occupational tasks a large language model (LLM) tool can assist with. This work is a genuine methodological contribution, but its limitations grow as the scores travel beyond the time and place in which they were calculated. This piece traces two gaps that have widened as a result.

The first is a structural gap between what static exposure scores, such as GPTs are GPTs, measure and the kinds of evidence needed to answer policy questions reliably. Static scoring captures the performance of specific AI systems, against a specific occupational taxonomy, at a specific moment in time. Using the widespread diffusion of the GPTs are GPTs scores as a case study, we observe how the temporal, geographic, and ontological limitations of these scores compound, with tangible impacts, when translated into policy-facing analyses. Closing this gap is the motivation behind a growing body of work, and we survey five families of recent research that respond directly to the limitations of static exposure scoring: (1) dynamic and benchmark-based measures, (2) ensemble methods, (3) task-framework extensions, (4) worker-centered metrics, and (5) adoption and usage data.

The second gap is characteristic of the future of work debate at large, and it is the gap this piece argues needs more attention: the coordination between researchers and policymakers. The methodological work responding to the limitations of static scoring is largely siloed within the research community. The policy-relevant work (analyses that ask who is harmed, who benefits, how, and when) continues to reference the static GPTs are GPTs scores without engaging with the methodological updates that would let these questions be answered more reliably.
We close by asking what remains. Moving beyond ex-post frameworks and taking up the deliberate, political work of reimagining which futures are worth building towards are additional steps towards navigating uncertainty, ones that both research and policy communities would benefit from engaging with further. We argue that closing the gap between research and policy is a shared task, with distinct but parallel responsibilities. Policymakers need to widen the evidence base they rely on, engage workers as epistemic partners, and shift the goal from prediction to preparedness. Researchers need to continue building the data infrastructure, engage with interdisciplinary and participatory methods, and produce work with the needs of policymakers in mind. Better measurement matters, but it will not close the second gap alone.
