Wen-Chin Huang 黃文勁

Photo taken at Kiyomizu-dera, Kyoto, Japan, Dec. 2022.

Education. I was born in Taiwan and now based in Nagoya, Japan. I received M.S. and Ph.D. from the Graduate School of Informatics, Nagoya University in 2021 and 2024, respectively. I received B.S. in Computer Science and Information Engineering from National Taiwan University in 2018.

Work experience. I worked as a student researcher at Google Japan from April 2023 to March 2024, under the supervision of Yuma Koizumi. From May to September 2022, I was an research intern at FAIR (Fundamental AI Research), Meta, working on speech-to-speech translation, under the supervision of Peng-Jen Chen. From August 2021 to February 2022, I was an research intern at Reality Labs Research (RL-R), Meta, working on binaural speech synthesis, under the supervision of Dejan Markovic. From August 2019 to September 2019, I interned at the NTT Communication Science Laboratories, NTT Corporation under the supervision of Prof. Hirokazu Kameoka. I work closely with the Institute of Information Science in Academia Sinica, Taipei, Taiwan with advisors Prof. Hsin-Min Wang and Prof. Yu Tsao, where I was a research assistant from July 2017 to March 2019.

Honors and activities. I received the Research Fellowship for Young Scientists (DC1) from Japan Society for the Promotion of Science (JSPS), which last from April 2021 to March 2024. I am a co-organizer of the VoiceMOS Challenge series, Voice Conversion Challenge 2020, and Singing Voice Conversion Challenge 2023. I was honored the Outstanding Graduate Student Award (学術奨励賞) of Nagoya University in 2023. I was honored the 16th IEEE Signal Processing Society Japan Student Best Paper Award, the Best Paper Award at the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021 and Best Student Paper Award at the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018. I am also a reviewer of several journals including IEEE SPL, IEEE/ACM TASLP, Speech Communication, etc.

My research interest focuses on speech processing, with a main focus on speech quality assessment, voice conversion, and speech synthesis. I also worked on related applications including speaking-aid and accent conversion.

When I am not doing research, I spend most of my time golfing. I have been golfing since 2023. My best score is 86 (regular tee). I have also been dancing for 10+ years, although I do not dance as much as I did previously. Check out this video, this video, this video, this video, and this video.

My CV can be downloaded here.

news

Aug 21, 2025	Two papers [SHEET] (first author) [Language-independent speaker anonymization] (co-author) were presented at Interspeech 2025.
Aug 17, 2025	I gave a tutorial at Interspeech 2025 wiith Erica Cooper (NICT, Japan) and Jiatong Shi (CMU, USA) on the topic “Automatic Quality Assessment for Speech and Beyond”. [Slides]
Jun 30, 2025	A journal was published in IEEE Journal of Selected Topics in Signal Processing.
Jun 25, 2025	The AudioMOS Challenge 2025 is officially over! Now you can freely get the datasets on the challenge page. There will also be a special session at ASRU 2025!
May 15, 2025	I gave an online invited talk at the Conversational AI Reading Group, MILA. The topic was “Automatic Quality Assessment for Speech and Beyond”. [Slides]
Apr 09, 2025	The Singing Voice Conversion Challenge 2025 kicks off today! This year we focus on singing technique conversion. If you are interested in participating, please register here and we will contact you! https://forms.gle/GZGAWJAZvgDK6QKcA
Apr 09, 2025	The AudioMOS Challenge 2025 kicks off today! Participants will build quality predictors for speehc, music and general audio. We are still accepting new challengers! If you are interested in participating, please register here and we will contact you! https://forms.gle/am1qDtEwWVmEnh5d9
Apr 08, 2025	One paper was presented at ICASSP 2025. [Investigating Factors Related to the Naturalness of Synthesized Unison Singing]
Dec 02, 2024	One paper was presented at SLT2024. [VoiceMOS Challenge 2024]
Nov 07, 2024	A new preprint is now available [MOS-Bench]. The corresponding open-source toolkit, [SHEET], is also available on GitHub.
Oct 25, 2024	I gave an invited talk on voice conversion at SP/IPSJ-SLP. Please find the slides here.
Oct 17, 2024	A paper was published at IEEE Signal Processing Letters (acceptance rate: 10-20%). [SA-TTS]
Aug 14, 2024	I gave an invited talk on voice conversion at CITI, Academia Sinica, Taiwan. Please find the slides here.
Jul 16, 2024	I gave a lecture on voice conversion. Please find the slides here.
Jul 01, 2024	We wrote a review paper on evaluation of synthesis speech, which was published at Acoustical Science and Technology, a journal in Japan. The English version can be found here.
Jun 14, 2024	The VoiceMOS Challenge 2024 is officially over! Now you can freely get the datasets by registering through the CodaBench page. There will also be a special session at SLT 2024!
May 17, 2024	One paper was accepted to IEEE/ACM TASLP. [Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition]
Apr 16, 2024	One paper was accepted to IEEE/ACM TASLP. [A Large-Scale Evaluation of Speech Foundation Models]
Apr 14, 2024	One paper was presented at ICASSP 2024. [Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders]
Apr 01, 2024	I am now an assistant professor at the Graduate School of Informatics, Nagoya University.
Feb 16, 2024	I successfully defended my Ph.D. thesis!
Dec 15, 2023	Four papers were presented at ASRU 2023. [SVCC2023] [VoiceMOS Challenge 2023] [NU-SVCC2023] [N2D-VC-GST]
Aug 25, 2023	A paper was accepted to APSIPA ASC 2023. [Evaluate-FAC]
Jun 26, 2023	The Singing Voice Conversion Challenge 2023 is over! We have a summary paper submitted to arXiv. There will also be a special session at ASRU 2023!
Jun 05, 2023	I was honored the Outstanding Graduate Student Award (学術奨励賞) of Nagoya University!
Apr 10, 2023	I start serving as a student researcher at Google Japan.
Apr 07, 2023	I open-sourced the seq2seq-vc toolkit! It is a toolkit for sequence-to-sequence voice conversion research. Please check it out!
Mar 15, 2023	I open-sourced the s3prl-vc toolkit! It also comes with a HuggingFace Spaces demo. Please check them out!
Jan 19, 2023	The first Singing Voice Conversion Challenge kicks off today! This is a new version of the voice conversion challenge (VCC) series that aims to compare techniques for singing voice conversion, in contrast to normal voice conversion. We are still accepting new challengers! If you are interested in participating, please fill in the registration form.
Dec 27, 2022	The VTN journal paper received the 16th IEEE Signal Processing Society Japan Student Best Paper Award. Open access.
Jul 25, 2022	One journal was accepted to the IEEE Journal of Selected Topics in Signal Processing. ArXiv version.
Jun 14, 2022	One paper [Expressive Speech-to-Speech Translation] was accepted to ICASSP 2023. Also, one paper I co-authored [Intermediate fine-tuning for pathological ASR] was also accepted.
Jun 14, 2022	Two papers [End-to-end binaural synthesis] [VoiceMOS Challenge 2022] were accepted to Interspeech 2022. Also, one paper I co-authored [SSL for pathological ASR] was also accepted.
May 16, 2022	I started my internship at FAIR (Fundamental AI Research), Meta.
Mar 23, 2022	I was invited to give a talk at 音声言語情報処理研究会/音声研究会 (SLP/SP), a Japanese domestic conference. Slides are [here].
Mar 23, 2022	The VoiceMOS Challenge 2022 is over! We have a [summary paper] submitted to arXiv. The CodaLab competition page is still opened, and ANYONE can register to get the dataset and give it a try!
Jan 22, 2022	Two first-author papers [S3PRL-VC] [LDNet] and one co-first author paper [N2D VC] were accepted to ICASSP 2022. Also, two papers I co-authored [mos-finetune-ssl] [Direct N2N VC] were also accepted.
Jan 13, 2022	The VoiceMOS Challenge was accepted as a special session at INTERSPEECH 2022! Again, we are still accepting new challengers. If you are interested in participating, please contact us at voicemos2022@nii.ac.jp first then register at the CodaLab page.
Dec 20, 2021	The first VoiceMOS Challenge kicks off today! This is a new challenge that aims to compare techniques for predicting the mean opinion score (MOS) of synthetic speech. We are still accepting new challengers! If you are interested in participating, please contact us at voicemos2022@nii.ac.jp.
Dec 17, 2021	Received the Best Paper Award at APSIPA ASC 2021!
Sep 11, 2021	One first-author paper [Prosody for ASR+TTS VC] was accepted to ASRU 2021. Also, one paper I co-authored [ELVC w/ Seq2seq] was accepted.
Aug 31, 2021	Three co-author papers were accepted to APSIPA ASC 2021. [ELVC w/ lip] [Noisy-to-noisy VC] [Investigation of non-parallel seq2seq VC w/ synthetic data]
Aug 31, 2021	I started my internship at Facebook Reality Labs Research.
Jul 27, 2021	You can read some posts I wrote in the blog page, as long as you understand Mandarin Chinese.
Jun 04, 2021	One first-author paper [Dysarthric VC w/ VTN+VAE] was accepted to Interspeech 2021. Also, one paper I co-authored [Relational data selection] was accepted.
Feb 22, 2021	I successfully defensed my master’s thesis. Also, I successfully passed the Ph.D. entrance exam, and will become a Ph.D. candidate at the Graduate School of Informatics, Nagoya University.
Jan 30, 2021	One paper [EMA2S] was accepted to IEEE International Symposium on Circuits and Systems (ISCAS) 2021.
Jan 30, 2021	Two first-author papers [VQVAE-VC] [BERT-ASR] were accepted to ICASSP 2021. Also, two papers I co-authored [crank] [NonAR seq2seq VC] were also accepted.
Jan 07, 2021	One journal was accepted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing. The early access version is available now on IEEE Xplore. There is also an arXiv version.
Oct 17, 2020	Four papers are accepted to the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020! [Challenge Summary] [Objective Assesement] [Baseline ASR+TTS] [NU entry]
Oct 17, 2020	The proceeding of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 is online now!
Jul 29, 2020	The implementation of VTN is open-sourced on ESPnet.
Jul 28, 2020	One paper [VTN] was accepted to Interspeech 2020.
May 20, 2020	One journal paper [ASVspoof 2019 database] was accepted to the Computer Speech & Language.
Mar 09, 2020	I am co-organizing the Voice Conversion Challenge 2020. I developed a seq-to-seq baseline w/ ESPnet.
Jan 19, 2020	One journal paper [CDVAE-CLS-GAN] was accepted to the IEEE Transactions on Emerging Topics in Computational Intelligence.

latest posts

Apr 29, 2025	名古屋大學助理教授1年目
Mar 05, 2025	助教0年目の振り返り
Jan 12, 2025	[長篇雜談] 台灣的食物真的好油
Jul 01, 2024	名古屋大學留學四&五年目、名古屋大學助教零年目（上）
Apr 24, 2022	名古屋大學留學三年目

selected publications

VTN

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, and Tomoki Toda

In Proc. Interspeech, Aug 2020

arXiv Code Demo
S3PRL-VC

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, and Tomoki Toda

In Proc. ICASSP, Aug 2022

arXiv Code Demo
VMC’22

The Voicemos Challenge 2022

Wen-Chin Huang, E. Cooper, Y. Tsao, H.-M. Wang, J. Yamagishi, and T. Toda

In Proc. Interspeech, Aug 2022

arXiv Website
A review on subjective and objective evaluation of synthetic speech

E. Cooper, Wen-Chin Huang, Y. Tsao, H.-M. Wang, T. Toda, and J. Yamagishi

Acoustical Science and Technology, Aug 2024

Website
SHEET

SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit

Tomoki Toda Wen-Chin Huang

In Proc. INTERSPEECH, Aug 2025

arXiv Website