Abstract
This paper proposes a fast end-to-end non-parallel voice conversion (VC) method named Tachylone. In Tachylone, speaker conversion and waveform generation are performed by a single vocoder network. To train Tachylone, a pre-trained universal neural vocoder is used as the initial model, and its parameters are updated end-to-end with cycle-consistent learning on non-parallel data from the source and target speakers. We compare Tachylone with conventional CycleGAN-based VC using objective and subjective measures and discuss the results.
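As a rough illustration of the cycle-consistent learning mentioned above, the sketch below computes a generic cycle-consistency loss for two converters trained on non-parallel data. It is not the paper's implementation: the abstract does not specify the loss, so the L1 distance on raw samples, the function names (`l1_distance`, `cycle_consistency_loss`), and the toy gain-based converters are all assumptions made for illustration. In a real system each converter would be a neural vocoder network, and adversarial terms would typically be added.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]
Converter = Callable[[Signal], List[float]]


def l1_distance(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_src: Signal,
    y_tgt: Signal,
    g_src_to_tgt: Converter,
    g_tgt_to_src: Converter,
) -> float:
    """Generic cycle loss (assumed form): convert each utterance to the
    other speaker and back, then measure how far the round trip lands
    from the original. No parallel (time-aligned) pairs are needed."""
    x_cycled = g_tgt_to_src(g_src_to_tgt(x_src))  # source -> target -> source
    y_cycled = g_src_to_tgt(g_tgt_to_src(y_tgt))  # target -> source -> target
    return l1_distance(x_src, x_cycled) + l1_distance(y_tgt, y_cycled)


# Hypothetical stand-ins for the two conversion networks: simple gains.
def double(signal: Signal) -> List[float]:
    return [2.0 * s for s in signal]


def halve(signal: Signal) -> List[float]:
    return [0.5 * s for s in signal]


x = [0.1, -0.2, 0.3]   # toy "source speaker" samples
y = [0.4, 0.0, -0.4]   # toy "target speaker" samples, unrelated to x

# The two converters invert each other, so the round trip is lossless.
loss_perfect = cycle_consistency_loss(x, y, double, halve)

# The converters do not invert each other, so the loss is positive.
loss_imperfect = cycle_consistency_loss(x, y, double, double)

print(loss_perfect, loss_imperfect)
```

Minimizing such a loss pushes the two conversion directions to be approximate inverses of each other, which is what lets training proceed without time-aligned utterance pairs.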
Original language | English |
---|---|
Pages (from-to) | 116-119 |
Number of pages | 4 |
Journal | Acoustical Science and Technology |
Volume | 46 |
Issue number | 1 |
Publication status | Published - 2025 Jan |
Keywords
- Cycle-consistent learning
- End-to-end VC
- Neural vocoder
- Non-parallel VC
- Voice conversion (VC)