Abstract
This paper proposes a simple and efficient technique for variance compensation to improve the perceptual quality of synthetic speech in parametric speech synthesis. First, we analyze the problem of spectral and F0 enhancement with global variance (GV) in HMM-based speech synthesis. In the conventional GV-based parameter generation, the enhancement is achieved by taking account of a GV probability density function with fixed GV model parameters for every output utterance through the speech parameter generation process. We find that the use of fixed GV parameters results in much smaller variations of GVs in synthesized utterances than those in natural speech. In addition, the computational cost is high because of iterative optimization. This paper examines these issues in terms of multiple objective measures such as variance characteristics, GV distortions, and GV correlations. We propose a simple and fast compensation method based on a global affine transformation that provides a GV distribution closer to that of natural speech and improves the correlation of GVs between natural and generated parameter sequences. The experimental results demonstrate that the proposed variance compensation methods outperform the conventional GV-based parameter generation in terms of objective and subjective speech similarity to natural speech while maintaining speech naturalness.
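To make the idea concrete, the following is a minimal NumPy sketch of the two quantities the abstract talks about: the global variance (GV) of a parameter trajectory, and a per-dimension global affine compensation that rescales a generated trajectory so its GV matches a target value. This is not the authors' implementation; the array shapes, the variance-matching rule, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def global_variance(c):
    """GV of a parameter trajectory c with shape (T, D):
    the per-dimension variance taken over all frames of the utterance."""
    return np.var(c, axis=0)

def affine_gv_compensation(c_gen, gv_target):
    """Illustrative global affine compensation (assumed form, not the
    paper's exact estimator): scale each dimension about its utterance
    mean so that the GV of the generated trajectory matches a target GV,
    e.g. a natural-speech GV predicted for this utterance."""
    mean = c_gen.mean(axis=0)                    # per-dimension mean, shape (D,)
    gv_gen = np.var(c_gen, axis=0)               # GV of the generated trajectory
    scale = np.sqrt(gv_target / np.maximum(gv_gen, 1e-12))
    # Affine map: shift to zero mean, rescale, shift back.
    return (c_gen - mean) * scale + mean

# Toy usage: an over-smoothed 40-dimensional trajectory of 300 frames.
rng = np.random.default_rng(0)
c_gen = 0.3 * rng.standard_normal((300, 40))     # small variance (over-smoothed)
gv_nat = np.full(40, 1.0)                        # stand-in natural-speech GV
c_comp = affine_gv_compensation(c_gen, gv_nat)
print(global_variance(c_comp)[:3])               # ~1.0 per dimension after compensation
```

The closed-form rescaling above needs only a single pass over the trajectory, which illustrates why an affine compensation can avoid the iterative optimization that the abstract identifies as the main cost of conventional GV-based parameter generation.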
Original language | English |
---|---|
Pages (from-to) | 1694-1704 |
Number of pages | 11 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 24 |
Issue number | 10 |
DOIs | |
Publication status | Published - 2016 Oct |
Keywords
- HMM-based speech synthesis
- affine transformation
- global variance
- over-smoothing problem
- variance compensation
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering