Publications
Please visit Google Scholar to see the full list.
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe
SLT 2024
paper
Understanding Human Preferences: Towards More Personalized Video to Text Generation
Yihan Wu, Ruihua Song, Xu Chen, Hao Jiang, Zhao Cao, Jin Yu
ACM Web Conference 2024
paper
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
Jiatong Shi*, Jinchuan Tian*, Yihan Wu*, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe
SLT 2024
paper
TiVA: Time-aligned Video-to-Audio Generation
Xihua Wang*, Yuyue Wang*, Yihan Wu*, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui
ACM MM 2024
paper
LoVA: Long-form Video-to-Audio Generation
Xin Cheng, Xihua Wang, Yihan Wu, Yuyue Wang, Ruihua Song
NeurIPS 2024 Workshop
paper
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin
INTERSPEECH 2024
paper
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian
AAAI 2023
paper
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan
ICASSP 2023
paper
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
INTERSPEECH 2022
paper
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie
INTERSPEECH 2022
paper