


Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Mixture of Experts
Scholars
Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress
Language
Robustness
Mixture of Experts
Scholars
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Mixture of Experts
Language Models
Efficiency
Scholars
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Mixture of Experts
Efficiency
Transfer Learning
Language
Generative Models
Compute
Scholars