
Cohere For AI - Guest Speaker: Ziyang Chen, PhD Student


Date: Jul 13, 2024

Time: 4:00 PM - 5:00 PM

Location: Online

About the session: We use diffusion models to generate spectrograms that look like images but can also be played as sound. Spectrograms are 2D representations of sound that look very different from the images found in our visual world, and natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these spectrograms images that sound. Our approach is simple and zero-shot: it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space.
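The core idea described above — jointly denoising one latent with two pre-trained diffusion models so the result satisfies both — can be sketched in a few lines. This is a hedged illustration only: the two `eps_*` functions below are toy stand-ins (not the actual text-to-image or text-to-spectrogram models), and the update rule is a simplified gradient-style step rather than a real DDIM/DDPM sampler.

```python
import numpy as np

def eps_image(z, t):
    # Toy stand-in for the image model's predicted noise
    # (here it simply pulls the latent toward +1.0).
    return z - 1.0

def eps_audio(z, t):
    # Toy stand-in for the spectrogram model's predicted noise
    # (here it pulls the latent toward -0.5).
    return z + 0.5

def joint_denoise(z, steps=50, lr=0.1, w_img=0.5, w_aud=0.5):
    """Denoise one shared latent using a weighted average of two
    models' noise predictions, so the result trades off both."""
    for t in range(steps, 0, -1):
        eps = w_img * eps_image(z, t) + w_aud * eps_audio(z, t)
        z = z - lr * eps  # simplified update; real samplers follow a noise schedule
    return z

z = joint_denoise(np.zeros((4, 4)))
```

With these toy models the latent converges toward a compromise between the two targets, which is the intuition behind a spectrogram that is simultaneously a plausible image and plausible audio; in the actual method the averaged predictions come from pre-trained latent diffusion models sharing one latent space.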

Bio: I am a third-year PhD student at the University of Michigan, advised by Prof. Andrew Owens. My research interests lie at the intersection of multimodal learning and self-supervision, specifically exploring the relationship between the audio and visual modalities. During my PhD, I interned at the Codec Avatars Lab at Meta, working on neural acoustic fields. I am now interning at Adobe Research this summer, exploring audio-visual generative models.