Zewen Shen

Member of Technical Staff, Foundations

Zewen Shen is a Member of Technical Staff in the Foundations team at Cohere, where he works on LLM inference acceleration. He holds a Ph.D. from the University of Toronto, with a background in computational mathematics that carries into his work on model quantization and mixed-precision computation.

Production-Ready W4A8: vLLM Integration and Quality Recovery Techniques Explained