About me

My name is Yihan Wu. I am a fourth-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China. My advisor is Prof. Ruihua Song. Prior to my Ph.D. studies, I earned my B.S. degree from Shandong University in 2021.

I am broadly interested in speech-related research, including speech synthesis, speech recognition, and speech language models.

Recent news

  • 🔍 I’m currently looking for summer internships!
  • 🎉 Our paper “Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization” was accepted by AAAI 2025!
  • 💼 I’m on the job market as well.

Work experience

  • Visiting Scholar
    Language Technologies Institute, Carnegie Mellon University (Sep. 2023 - Sep. 2024)

    Worked with Prof. Shinji Watanabe.

  • Research Intern
    Microsoft Research Asia (Oct. 2021 - Oct. 2022)

    Worked with Xu Tan.

  • Research Intern
    Microsoft C+AI, Speech Team (May 2021 - Oct. 2021)

    Worked with Xi Wang and Lei He.

Selected papers

Please visit Google Scholar to see the full list.

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe
SLT 2024
paper

ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

Jiatong Shi*, Jinchuan Tian*, Yihan Wu*, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe
SLT 2024
paper

TiVA: Time-aligned video-to-audio generation

Xihua Wang*, Yuyue Wang*, Yihan Wu*, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui
ACM MM 2024
paper

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian
AAAI 2023
paper

AdaSpeech 4: Adaptive text to speech in zero-shot scenarios

Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
INTERSPEECH 2022
paper

Self-supervised context-aware style representation for expressive speech synthesis

Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie
INTERSPEECH 2022
paper