Demo samples for the voice conversion experiments in the paper "Simulated electrolaryngeal speech corpus"

Paper: To be uploaded.
Code: https://github.com/unilight/seq2seq-vc
Authors: KOBAYASHI, Kazuhiro and OGITA, Kenichi (Nagoya Univ./TARVO, Inc.) and MA, Ding and VIOLETA, Lester and HUANG, WenChin and TODA, Tomoki (Nagoya Univ.)
Comments: Submitted to ASJ2024 Autumn (Acoustical Society of Japan 2024 Autumn Meeting).

Brief explanation (not the abstract of the paper):
This paper presents PESC (Pseudo-Electrolaryngeal Speech Corpus), a dataset of Japanese transcripts and speech data from multiple speakers. The corpus includes parallel training data of 200 utterances per speaker from 14 healthy individuals, comprising both natural and pseudo-electrolaryngeal (EL) speech, as well as evaluation data of 50 utterances from 14 laryngectomees.

In this demo page, we present the voice conversion (VC) samples. The experiments were conducted in a one-to-one setting, and the model was AAS-VC, a non-autoregressive sequence-to-sequence VC model whose implementation is open-sourced (see link above). We used the 200 parallel training utterances of each pseudo-speaker pair, with a train/validation/test split of 140/10/50. For simplicity, a single Parallel WaveGAN vocoder was trained on the training utterances from all 20 speakers.
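The 140/10/50 split of the 200 parallel utterances per speaker pair can be sketched as follows. This is an illustrative assumption, not the authors' actual data-preparation code; the utterance IDs, shuffling, and seed are hypothetical.

```python
import random

def split_utterances(utt_ids, seed=0):
    """Split 200 parallel utterances into train/val/test (140/10/50).

    The shuffle and seed are illustrative choices, not taken from the paper.
    """
    ids = list(utt_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:140], ids[140:150], ids[150:200]

# Hypothetical utterance IDs for one pseudo-speaker pair.
train, val, test = split_utterances([f"utt{i:03d}" for i in range(200)])
print(len(train), len(val), len(test))  # 140 10 50
```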

Speech Samples

Transcription: 今日も一日 頑張りましょう ("Let's do our best again today")


Speaker | Source  | Converted | Ground truth
009     | [audio] | [audio]   | [audio]
011     | [audio] | [audio]   | [audio]
008     | [audio] | [audio]   | [audio]
003     | [audio] | [audio]   | [audio]
002     | [audio] | [audio]   | [audio]
010     | [audio] | [audio]   | [audio]

Transcription: どうぞよろしくお願いします ("I look forward to working with you")


Speaker | Source  | Converted | Ground truth
009     | [audio] | [audio]   | [audio]
011     | [audio] | [audio]   | [audio]
008     | [audio] | [audio]   | [audio]
003     | [audio] | [audio]   | [audio]
002     | [audio] | [audio]   | [audio]
010     | [audio] | [audio]   | [audio]

Transcription: 今年一年間 お疲れ様でした ("Thank you for your hard work this past year")


Speaker | Source  | Converted | Ground truth
009     | [audio] | [audio]   | [audio]
011     | [audio] | [audio]   | [audio]
008     | [audio] | [audio]   | [audio]
003     | [audio] | [audio]   | [audio]
002     | [audio] | [audio]   | [audio]
010     | [audio] | [audio]   | [audio]
