Selected publications (a.k.a. my first-author papers)

My full publication list can be found on my Google Scholar page or my CV.

2023

  1. A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation Huang, Wen-Chin, Peloquin, Benjamin, Kao, Justine, Wang, Changhan, Gong, Hongyu, Salesky, Elizabeth, Adi, Yossi, Lee, Ann, and Chen, Peng-Jen In Proc. ICASSP 2023 [arXiv]
  2. Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion Huang, Wen-Chin, and Toda, Tomoki In Proc. APSIPA ASC 2023 [arXiv] [Demo] [Code]
  3. SVCC2023
    The Singing Voice Conversion Challenge 2023 Huang, Wen-Chin, Violeta, Lester Phillip, Liu, Songxiang, Shi, Jiatong, Yasuda, Yusuke, and Toda, Tomoki In Proc. ASRU 2023 [arXiv] [Link]
  4. VMC’23
    The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains Cooper, E., Huang, Wen-Chin, Tsao, Y., Wang, H.-M., Toda, T., and Yamagishi, J. In Proc. ASRU 2023 [arXiv] [Link]

2022

  1. S3PRL-VC
    S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations Huang, Wen-Chin, Yang, Shu-Wen, Hayashi, Tomoki, Lee, Hung-Yi, Watanabe, Shinji, and Toda, Tomoki In Proc. ICASSP 2022 [arXiv] [Demo] [Code]
  2. N2D VC
    Towards Identity Preserving Normal to Dysarthric Voice Conversion Huang, Wen-Chin, Halpern, Bence Mark, Violeta, Lester Phillip, Scharenborg, Odette, and Toda, Tomoki In Proc. ICASSP 2022 [arXiv] [Demo]
  3. LDNet
    LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech Huang, Wen-Chin, Cooper, E., Yamagishi, J., and Toda, T. In Proc. ICASSP 2022 [arXiv] [Code]
  4. A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion Huang, Wen-Chin, Yang, Shu-Wen, Hayashi, Tomoki, and Toda, Tomoki IEEE Journal of Selected Topics in Signal Processing 2022 [arXiv]
  5. End-to-End Binaural Speech Synthesis Huang, Wen-Chin, Markovic, Dejan, Richard, Alexander, Gebru, Israel Dejene, and Menon, Anjali In Proc. Interspeech 2022 [arXiv]
  6. VMC’22
    The VoiceMOS Challenge 2022 Huang, Wen-Chin, Cooper, E., Tsao, Y., Wang, H.-M., Yamagishi, J., and Toda, T. In Proc. Interspeech 2022 [arXiv] [Link]

2021

  1. Pretraining Techniques for Sequence-to-Sequence Voice Conversion Huang, Wen-Chin, Hayashi, Tomoki, Wu, Yi-Chiao, Kameoka, Hirokazu, and Toda, Tomoki IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 [arXiv] [Demo]
  2. Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations Huang, Wen-Chin, Hayashi, Tomoki, Wu, Yi-Chiao, and Toda, Tomoki In Proc. ICASSP 2021 [arXiv] [Demo]
  3. BERT-ASR
    Speech Recognition by Simply Fine-tuning BERT Huang, Wen-Chin, Wu, Chia-Hua, Luo, Shang-Bao, Chen, Kuan-Yu, Wang, Hsin-Min, and Toda, Tomoki In Proc. ICASSP 2021 [arXiv]
  4. DVC-VTN-VAE
    A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion Huang, Wen-Chin, Kobayashi, K., Peng, Y.-H., Liu, C.-F., Tsao, Y., Wang, H.-M., and Toda, T. In Proc. Interspeech 2021 [arXiv] [Demo]
  5. On Prosody Modeling for ASR+TTS based Voice Conversion Huang, Wen-Chin, Hayashi, T., Li, X., Watanabe, S., and Toda, T. In Proc. ASRU 2021 [arXiv] [Demo]
  6. Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion Liou, Y.-S., Huang, Wen-Chin, Yen, M.-C., Tsai, S.-W., Peng, Y.-H., Toda, T., Tsao, Y., and Wang, H.-M. In Proc. APSIPA ASC 2021 [arXiv] [Demo]

2020

  1. VCC2020
    Voice Conversion Challenge 2020 – Intra-lingual semi-parallel and cross-lingual voice conversion – Yi, Zhao, Huang, Wen-Chin, Tian, Xiaohai, Yamagishi, Junichi, Das, Rohan Kumar, Kinnunen, Tomi, Ling, Zhen-Hua, and Toda, Tomoki In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 2020 [arXiv]
  2. Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions Das, Rohan Kumar, Kinnunen, Tomi, Huang, Wen-Chin, Ling, Zhen-Hua, Yamagishi, Junichi, Yi, Zhao, Tian, Xiaohai, and Toda, Tomoki In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 2020 [arXiv]
  3. ASR+TTS
    The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS Huang, Wen-Chin, Hayashi, Tomoki, Watanabe, Shinji, and Toda, Tomoki In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 2020 [arXiv]
  4. The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders Huang, Wen-Chin, Tobing, Patrick Lumban, Wu, Yi-Chiao, Kobayashi, Kazuhiro, and Toda, Tomoki In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 2020 [arXiv]
  5. VTN
    Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining Huang, Wen-Chin, Hayashi, Tomoki, Wu, Yi-Chiao, Kameoka, Hirokazu, and Toda, Tomoki In Proc. Interspeech 2020 [arXiv] [Demo] [Code]
  6. CDVAE-CLS-GAN
    Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion Huang, Wen-Chin, Luo, Hao, Hwang, Hsin-Te, Lo, Chen-Chou, Peng, Yu-Huai, Tsao, Yu, and Wang, Hsin-Min IEEE Transactions on Emerging Topics in Computational Intelligence 2020 [arXiv] [Demo] [Code]

2019

  1. Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion Huang, Wen-Chin, Wu, Yi-Chiao, Kobayashi, Kazuhiro, Peng, Yu-Huai, Hwang, Hsin-Te, Lumban Tobing, Patrick, Toda, Tomoki, Tsao, Yu, and Wang, Hsin-Min In Proc. 10th ISCA Speech Synthesis Workshop 2019 [arXiv] [Demo]
  2. F0-FCN-CDVAE
    Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion Huang, Wen-Chin, Wu, Yi-Chiao, Lo, Chen-Chou, Lumban Tobing, Patrick, Hayashi, Tomoki, Kobayashi, Kazuhiro, Toda, Tomoki, Tsao, Yu, and Wang, Hsin-Min In Proc. Interspeech 2019 [arXiv] [Demo]
  3. Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion Huang, Wen-Chin, Wu, Yi-Chiao, Hwang, Hsin-Te, Lumban Tobing, Patrick, Hayashi, Tomoki, Kobayashi, Kazuhiro, Toda, Tomoki, Tsao, Yu, and Wang, Hsin-Min In Proc. 27th European Signal Processing Conference (EUSIPCO) 2019 [arXiv] [Demo]

2018

  1. CDVAE
    Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders Huang, Wen-Chin, Hwang, Hsin-Te, Peng, Yu-Huai, Tsao, Yu, and Wang, Hsin-Min In Proc. The 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018 [arXiv] [Demo] [Code]
  2. WaveNet Vocoder and its Applications in Voice Conversion Huang, Wen-Chin, Lo, Chen-Chou, Hwang, Hsin-Te, Tsao, Yu, and Wang, Hsin-Min In Proc. The 30th ROCLING Conference on Computational Linguistics and Speech Processing (ROCLING) 2018 [Paper]