Wen-Chin Huang 黃文勁

I am currently a Ph.D. candidate supervised by Prof. Tomoki Toda in Toda Laboratory at the Graduate School of Informatics, Nagoya University. I received my M.S. from the Graduate School of Informatics, Nagoya University, in March 2021. Prior to studying at N.U., I received my B.S. in Computer Science and Information Engineering from National Taiwan University in June 2018.

Since April 2023, I have been working as a student researcher at Google Japan. From May to September 2022, I was a research intern at FAIR (Fundamental AI Research), Meta, working on speech-to-speech translation under the supervision of Peng-Jen Chen. From August 2021 to February 2022, I was a research intern at Reality Labs Research (RL-R), Meta, working on binaural speech synthesis under the supervision of Dejan Markovic. From August 2019 to September 2019, I interned at the NTT Communication Science Laboratories, NTT Corporation, under the supervision of Prof. Hirokazu Kameoka. I work closely with the Institute of Information Science at Academia Sinica, Taipei, Taiwan, with advisor Prof. Hsin-Min Wang, where I was a research assistant from July 2017 to March 2019.

I received the Research Fellowship for Young Scientists (DC1) from the Japan Society for the Promotion of Science (JSPS), which lasts from April 2021 to March 2024. I am a co-organizer of the VoiceMOS Challenge and the Voice Conversion Challenge 2020. I was honored with the Outstanding Graduate Student Award (学術奨励賞) of Nagoya University in 2023. I also received the 16th IEEE Signal Processing Society Japan Student Best Paper Award, the Best Paper Award at the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, and the Best Student Paper Award at the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018. I am also a reviewer for several journals, including IEEE SPL, IEEE/ACM TASLP, and Speech Communication.

My research focuses on speech processing, mainly voice conversion and other speech synthesis related fields such as text-to-speech, neural vocoding, speech quality assessment, and applications to speaking-aid devices. I am also interested in keeping track of the latest deep learning technologies, from seq2seq modeling, Transformers, GANs, BERT, and diffusion models to ChatGPT.

When I am not doing research, I spend most of my time street dancing. I have been dancing for 10 years. Check out this video, this video, this video, this video, and this video. Recently I also started golfing.

My CV can be downloaded here.

Photo taken at Kiyomizu-dera, Kyoto, Japan, Dec. 2022.

news

Aug, 2023 A paper was accepted to APSIPA ASC 2023. [Evaluate-FAC]
Jun, 2023 I was honored with the Outstanding Graduate Student Award (学術奨励賞) of Nagoya University!
Apr, 2023 I started serving as a student researcher at Google Japan.
Mar, 2023 I open-sourced the s3prl-vc toolkit! It also comes with a HuggingFace Spaces demo. Please check them out!
Jan, 2023 The first Singing Voice Conversion Challenge kicks off today! This is a new version of the voice conversion challenge (VCC) series that aims to compare techniques for singing voice conversion, in contrast to normal voice conversion. We are still accepting new challengers! If you are interested in participating, please fill in the registration form.
Dec, 2022 The VTN journal paper received the 16th IEEE Signal Processing Society Japan Student Best Paper Award. Open access.
Jul, 2022 One journal paper was accepted to the IEEE Journal of Selected Topics in Signal Processing. ArXiv version.
Jun, 2022 The Singing Voice Conversion Challenge 2023 is over! We have a summary paper submitted to arXiv. There will also be a special session at ASRU 2023!
Jun, 2022 One paper [Expressive Speech-to-Speech Translation] was accepted to ICASSP 2023. Also, one paper I co-authored [Intermediate fine-tuning for pathological ASR] was accepted.
Jun, 2022 Two papers [End-to-end binaural synthesis] [VoiceMOS Challenge 2022] were accepted to Interspeech 2022. Also, one paper I co-authored [SSL for pathological ASR] was accepted.
May, 2022 I started my internship at FAIR (Fundamental AI Research), Meta.
Mar, 2022 I was invited to give a talk at 音声言語情報処理研究会/音声研究会 (SLP/SP), a Japanese domestic conference. Slides are [here].
Mar, 2022 The VoiceMOS Challenge 2022 is over! We have a [summary paper] submitted to arXiv. The CodaLab competition page is still open, and ANYONE can register to get the dataset and give it a try!
Jan, 2022 Two first-author papers [S3PRL-VC] [LDNet] and one co-first author paper [N2D VC] were accepted to ICASSP 2022. Also, two papers I co-authored [mos-finetune-ssl] [Direct N2N VC] were accepted.
Jan, 2022 The VoiceMOS Challenge was accepted as a special session at INTERSPEECH 2022! Again, we are still accepting new challengers. If you are interested in participating, please contact us at voicemos2022@nii.ac.jp first, then register at the CodaLab page.
Dec, 2021 The first VoiceMOS Challenge kicks off today! This is a new challenge that aims to compare techniques for predicting the mean opinion score (MOS) of synthetic speech. We are still accepting new challengers! If you are interested in participating, please contact us at voicemos2022@nii.ac.jp.
Dec, 2021 Received the Best Paper Award at APSIPA ASC 2021!
Sep, 2021 One first-author paper [Prosody for ASR+TTS VC] was accepted to ASRU 2021. Also, one paper I co-authored [ELVC w/ Seq2seq] was accepted.
Sep, 2021 Three co-author papers were accepted to APSIPA ASC 2021. [ELVC w/ lip] [Noisy-to-noisy VC] [Investigation of non-parallel seq2seq VC w/ synthetic data]
Aug, 2021 I started my internship at Facebook Reality Labs Research.
Jul, 2021 You can read some posts I wrote on the blog page, as long as you understand Mandarin Chinese.
Jun, 2021 One first-author paper [Dysarthric VC w/ VTN+VAE] was accepted to Interspeech 2021. Also, one paper I co-authored [Relational data selection] was accepted.
Feb, 2021 I successfully defended my master’s thesis. I also passed the Ph.D. entrance exam, and will become a Ph.D. candidate at the Graduate School of Informatics, Nagoya University.
Jan, 2021 One paper [EMA2S] was accepted to the IEEE International Symposium on Circuits and Systems (ISCAS) 2021.
Jan, 2021 Two first-author papers [VQVAE-VC] [BERT-ASR] were accepted to ICASSP 2021. Also, two papers I co-authored [crank] [NonAR seq2seq VC] were accepted.
Jan, 2021 One journal paper was accepted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing. The early access version is available now on IEEE Xplore. There is also an arXiv version.
Oct, 2020 Four papers were accepted to the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020! [Challenge Summary] [Objective Assessment] [Baseline ASR+TTS] [NU entry]
Oct, 2020 The proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 are online now!
Jul, 2020 The implementation of VTN is open-sourced on ESPnet.
Jul, 2020 One paper [VTN] was accepted to Interspeech 2020.
May, 2020 One journal paper [ASVspoof 2019 database] was accepted to Computer Speech & Language.
Mar, 2020 I am co-organizing the Voice Conversion Challenge 2020. I developed a seq-to-seq baseline w/ ESPnet.
Jan, 2020 One journal paper [CDVAE-CLS-GAN] was accepted to the IEEE Transactions on Emerging Topics in Computational Intelligence.