Global MMLU
Human-verified multilingual evaluation at scale
Accelerating multilingual AI development through comprehensive, bias-aware evaluation.
Responsible Translation
We combine multiple translation approaches with human verification so that every language is represented fairly and accurately.
Inclusive Coverage
We deliberately include high-, mid-, and low-resource languages to expose performance gaps and ensure no communities are left behind in AI evaluation.
Cultural Expertise
We classify questions by cultural sensitivity to expose bias and ensure models work equally well across all communities and contexts.
New Release
Global MMLU Lite V3
Our latest release adds six new languages to Global MMLU Lite, perfect for teams needing quick, accurate multilingual assessment.
Wider Coverage: Now featuring Oriya, Hungarian, Tajik, Slovak, Czech and Italian for even broader representation.
Fast, Fair Evaluation: Reliable performance metrics across 23 languages with our streamlined 6,000-sample dataset.
Human Verified: Test model performance across diverse linguistic contexts with our lightweight, human-verified benchmark.
Global MMLU
Unlock the full potential of your multilingual AI with complete dataset evaluation across 42 languages.
Complete Coverage: 42 languages with 589,764 samples for thorough evaluation.
Research-Grade Depth: Full dataset with both human-translated and machine-translated samples.
Production-Ready: Used by frontier labs to test models for real-world deployment.
Global MMLU continuously evolves through open science collaboration with independent contributors and partner organizations worldwide, incorporating cultural expertise to create a fairer, more comprehensive benchmark.
