Post-Doctoral Research Associate
Language Technologies Institute
Carnegie Mellon University
I am a Post-Doctoral Research Associate at Carnegie Mellon University's Language Technologies Institute, working on speech and audio processing with machine learning. My research focuses on developing robust systems for real-world scenarios involving overlapping speech, noise, and challenging acoustic conditions.
Developing advanced models like SepFormer for separating overlapping speech signals in challenging acoustic environments.
Creating systems that can identify "who spoke when" in multi-speaker audio recordings.
Improving speech quality and intelligibility by removing noise and reverberation from audio signals.
Building robust ASR systems that can accurately transcribe speech in diverse conditions.
Sound event detection and acoustic scene classification for computational analysis of environmental audio, including contributions to DCASE challenges.
Co-author of this widely used PyTorch-based speech toolkit, one of the most influential open-source projects in speech processing, with 884+ citations.
Co-author of this PyTorch-based audio source separation toolkit, enabling researchers worldwide to develop advanced speech separation systems.
Co-author of speech enhancement extensions to ESPnet, providing state-of-the-art tools for robust speech processing.
Lead organizer of these challenges pushing the field toward generalizable and robust distant meeting transcription.
Co-organizer of this speech enhancement challenge series focusing on universality, robustness, and generalizability.
Led this sound event detection task for several years, advancing machine listening research and methodology.
Selected publications from my research in speech processing and machine learning. For a complete list, please visit my Google Scholar profile.
Carnegie Mellon University
Language Technologies Institute
Pittsburgh, PA