
Mixture of Experts
Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress
Language
Robustness
Mixture of Experts
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Mixture of Experts
Language Models
Efficiency
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Mixture of Experts
Efficiency
Transfer Learning
Language
Generative Models
Compute