Date: Mar 20, 2024
Time: 4:00 PM - 5:00 PM
Location: Online
About: I am a PhD student at the Center for Data Science at NYU advised by Professor Andrew Gordon Wilson and a Visiting Researcher in the Fundamental AI Research (FAIR) group at Meta AI where I work with Brandon Amos. I work on the foundations of deep learning. My goal is to understand and quantify generalization in deep learning, and use this understanding to build more robust and reliable machine learning models.
Session Description: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss using prediction smoothing, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models.
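For readers new to the idea, the role of prediction smoothing in a compression bound can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the speaker's implementation: it assumes smoothing mixes the model's next-token distribution with a uniform distribution over the vocabulary, and it uses a generic finite-hypothesis (Occam-style) bound in which the compressed model size in bits plays the role of the complexity term. The function names and the parameters `alpha`, `compressed_bits`, and `delta` are hypothetical.

```python
import math

def smooth(probs, alpha):
    """Mix a predicted distribution with the uniform distribution (assumed form of smoothing)."""
    vocab_size = len(probs)
    return [(1 - alpha) * p + alpha / vocab_size for p in probs]

def loss_range(vocab_size, alpha):
    """Width of the interval that contains the smoothed negative log-likelihood.

    Smoothed probabilities lie in [alpha/V, 1 - alpha + alpha/V], so the
    log loss is bounded even though the raw log-likelihood is not.
    """
    return math.log(1 + (1 - alpha) * vocab_size / alpha)

def compression_bound(empirical_risk, compressed_bits, n, vocab_size, alpha, delta=0.05):
    """Generic Occam-style bound: empirical risk plus a complexity penalty.

    compressed_bits is the (hypothetical) size of the compressed model in bits;
    n is the number of training samples used to measure empirical_risk.
    """
    complexity = compressed_bits * math.log(2) + math.log(1 / delta)
    return empirical_risk + loss_range(vocab_size, alpha) * math.sqrt(complexity / (2 * n))
```

In this sketch the penalty grows with the number of bits needed to describe the model, which is why an aggressive compression scheme is needed before the bound can fall below the loss of random guessing.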