HMM-based voice conversion using quantized F0 context

Takashi Nose, Yuhei Ota, Takao Kobayashi

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker and transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phonemes and F0 symbols. Converted speech is then generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. Because the models of the source and target speakers can be trained separately, there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that segment-based voice conversion with phonetic and prosodic contexts works effectively even when parallel speech data is not available.
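The extraction and labeling steps described above can be illustrated with a small sketch. It averages log F0 over the voiced frames of each phoneme segment, quantizes that average into a fixed number of levels, and builds one context-dependent label per segment that carries both the neighbouring phonemes and the F0 symbol, keeping the segment durations (in frames). Everything concrete here is an assumption for illustration, not the authors' method: the uniform log-F0 bins, the per-speaker range `lo`/`hi`, the unvoiced symbol `"x"`, the `"sil"` padding, and the HTS-like label format `prev-cur+next/F:sym` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Segment:
    """A phoneme segment with a frame-level duration (hypothetical structure)."""
    phoneme: str
    start_frame: int
    end_frame: int  # exclusive


def segment_mean_log_f0(f0: Sequence[float], seg: Segment) -> Optional[float]:
    """Mean log F0 over voiced frames (f0 > 0) in the segment; None if all unvoiced."""
    voiced = [math.log(v) for v in f0[seg.start_frame:seg.end_frame] if v > 0]
    if not voiced:
        return None
    return sum(voiced) / len(voiced)


def quantize_log_f0(value: Optional[float], lo: float, hi: float, levels: int) -> str:
    """Map a log-F0 value to one of `levels` uniform bins in [lo, hi] (assumed scheme)."""
    if value is None:
        return "x"  # hypothetical symbol for fully unvoiced segments
    ratio = (value - lo) / (hi - lo)
    idx = min(max(int(ratio * levels), 0), levels - 1)  # clamp out-of-range values
    return str(idx + 1)  # symbols "1".."levels"


def make_labels(segments: List[Segment], f0: Sequence[float],
                lo: float, hi: float, levels: int = 8) -> List[Tuple[int, int, str]]:
    """Build (start, end, label) triples combining phonetic and quantized-F0 context."""
    symbols = [quantize_log_f0(segment_mean_log_f0(f0, s), lo, hi, levels)
               for s in segments]
    labels = []
    for i, seg in enumerate(segments):
        prev_p = segments[i - 1].phoneme if i > 0 else "sil"
        next_p = segments[i + 1].phoneme if i + 1 < len(segments) else "sil"
        label = f"{prev_p}-{seg.phoneme}+{next_p}/F:{symbols[i]}"
        labels.append((seg.start_frame, seg.end_frame, label))
    return labels


if __name__ == "__main__":
    segs = [Segment("a", 0, 4), Segment("k", 4, 6), Segment("i", 6, 10)]
    f0_track = [100, 110, 120, 130, 0, 0, 200, 210, 220, 230]  # Hz, 0 = unvoiced
    for row in make_labels(segs, f0_track, math.log(80), math.log(400)):
        print(row)
```

With the toy input in the `__main__` block, the labels are `sil-a+k/F:2`, `a-k+i/F:x` and `k-i+sil/F:5`. Quantizing each speaker's F0 against that speaker's own log-F0 range is one plausible way to make the symbols comparable between a source speaker and a target speaker's models; whether the paper uses this normalization is not stated in the abstract.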

Original language: English
Pages (from-to): 2483-2490
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E93-D
Issue number: 9
Publication status: Published - 2010 Sept
Externally published: Yes

Keywords

  • F0 quantization
  • HMM-based speech synthesis
  • Nonparallel data
  • Prosodic context
  • Voice conversion

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
