Publications

Please visit Google Scholar to see the full list.

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe
SLT 2024
paper

Understanding Human Preferences: Towards More Personalized Video to Text Generation

Yihan Wu, Ruihua Song, Xu Chen, Hao Jiang, Zhao Cao, Jin Yu
ACM Web Conference 2024
paper

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Jiatong Shi*, Jinchuan Tian*, Yihan Wu*, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe
SLT 2024
paper

TiVA: Time-aligned Video-to-Audio Generation

Xihua Wang*, Yuyue Wang*, Yihan Wu*, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui
ACM MM 2024
paper

LoVA: Long-form Video-to-Audio Generation

Xin Cheng, Xihua Wang, Yihan Wu, Yuyue Wang, Ruihua Song
NeurIPS 2024 Workshop
paper

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin
INTERSPEECH 2024
paper

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian
AAAI 2023
paper

PromptTTS: Controllable Text-to-Speech with Text Descriptions

Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan
ICASSP 2023
paper

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
INTERSPEECH 2022
paper

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie
INTERSPEECH 2022
paper