Publications | Wen-Chin Huang 黃文勁

My full publication list can be found on my Google Scholar page or my CV.

2025

SHEET

SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit

Tomoki Toda and Wen-Chin Huang

In Proc. Interspeech, 2025

arXiv Website
AMC’25

The AudioMOS Challenge 2025

In Proc. ASRU, 2025

Website

2024

A review on subjective and objective evaluation of synthetic speech

E. Cooper, Wen-Chin Huang, Y. Tsao, H.-M. Wang, T. Toda, and J. Yamagishi

Acoustical Science and Technology, 2024

Website
Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data

Tomoki Toda and Wen-Chin Huang

IEEE Signal Processing Letters, 2024

arXiv Demo
VMC’24

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

In Proc. SLT, 2024

arXiv Website

2023

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, and Peng-Jen Chen

In Proc. ICASSP, 2023

arXiv
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

Wen-Chin Huang and Tomoki Toda

In Proc. APSIPA ASC, 2023

arXiv Code Demo
SVCC2023

The Singing Voice Conversion Challenge 2023

Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Yusuke Yasuda, and Tomoki Toda

In Proc. ASRU, 2023

arXiv Website
VMC’23

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains

E. Cooper, Wen-Chin Huang, Y. Tsao, H.-M. Wang, T. Toda, and J. Yamagishi

In Proc. ASRU, 2023

arXiv Website

2022

S3PRL-VC

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, and Tomoki Toda

In Proc. ICASSP, 2022

arXiv Code Demo
N2D VC

Towards Identity Preserving Normal to Dysarthric Voice Conversion

Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, and Tomoki Toda

In Proc. ICASSP, 2022

arXiv Demo
LDNet

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

Wen-Chin Huang, E. Cooper, J. Yamagishi, and T. Toda

In Proc. ICASSP, 2022

arXiv Code
A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, and Tomoki Toda

IEEE Journal of Selected Topics in Signal Processing, 2022

arXiv Code
End-to-End Binaural Speech Synthesis

Wen-Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, and Anjali Menon

In Proc. Interspeech, 2022

arXiv
VMC’22

The VoiceMOS Challenge 2022

Wen-Chin Huang, E. Cooper, Y. Tsao, H.-M. Wang, J. Yamagishi, and T. Toda

In Proc. Interspeech, 2022

arXiv Website

2021

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, and Tomoki Toda

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021

arXiv Code Demo
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations

Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, and Tomoki Toda

In Proc. ICASSP, 2021

arXiv Demo
BERT-ASR

Speech Recognition by Simply Fine-tuning BERT

Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, and Tomoki Toda

In Proc. ICASSP, 2021

arXiv
DVC-VTN-VAE

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

Wen-Chin Huang, K. Kobayashi, Y.-H. Peng, C.-F. Liu, Y. Tsao, H.-M. Wang, and T. Toda

In Proc. Interspeech, 2021

arXiv Demo
On Prosody Modeling for ASR+TTS based Voice Conversion

Wen-Chin Huang, T. Hayashi, X. Li, S. Watanabe, and T. Toda

In Proc. ASRU, 2021

arXiv Demo
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

Y.-S. Liou, Wen-Chin Huang, M.-C. Yen, S.-W. Tsai, Y.-H. Peng, T. Toda, Y. Tsao, and H.-M. Wang

In Proc. APSIPA ASC, 2021

arXiv Demo

2020

VCC2020

Voice Conversion Challenge 2020 – Intra-lingual semi-parallel and cross-lingual voice conversion –

Zhao Yi, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, and Tomoki Toda

In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

arXiv
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Zhao Yi, Xiaohai Tian, and Tomoki Toda

In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

arXiv
ASR+TTS

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, and Tomoki Toda

In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

arXiv
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders

Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, and Tomoki Toda

In Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

arXiv
VTN

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, and Tomoki Toda

In Proc. Interspeech, Aug 2020

arXiv Code Demo
CDVAE-CLS-GAN

Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion

Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, and Hsin-Min Wang

IEEE Transactions on Emerging Topics in Computational Intelligence, Aug 2020

arXiv Code Demo

2019

Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion

Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Toda, Yu Tsao, and Hsin-Min Wang

In Proc. 10th ISCA Speech Synthesis Workshop, Sep 2019

arXiv Demo
F0-FCN-CDVAE

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion

Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, and Hsin-Min Wang

In Proc. Interspeech, Sep 2019

arXiv Demo
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, and Hsin-Min Wang

In Proc. 27th European Signal Processing Conference (EUSIPCO), Sep 2019

arXiv Demo

2018

CDVAE

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, and Hsin-Min Wang

In Proc. The 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Nov 2018

arXiv Code Demo
WaveNet Vocoder and its Applications in Voice Conversion

Wen-Chin Huang, Chen-Chou Lo, Hsin-Te Hwang, Yu Tsao, and Hsin-Min Wang

In Proc. The 30th ROCLING Conference on Computational Linguistics and Speech Processing (ROCLING), Oct 2018

PDF