Abstract
This paper proposes a statistical nonparametric speech synthesis technique based on sparse Gaussian process regression (GPR). In our previous study, we proposed GPR-based speech synthesis in which each frame of a synthesis unit is modeled by Gaussian process regression. Preliminary experiments synthesizing several phones, including both vowels and consonants, showed the potential of the technique. In this paper, the previous work is extended to full-sentence speech synthesis using sparse GPs and context modification. Specifically, cluster-based sparse Gaussian processes, such as local GPs and the partially independent conditional (PIC) approximation, are examined as a computationally feasible approach. Moreover, the frame-level context is extended to include not only the position within the current phone but also the adjacent phones, so that smoothly changing speech parameters are generated. Objective and subjective evaluation results show that the proposed technique outperforms HMM-based speech synthesis with minimum generation error training.
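To make the idea of cluster-based sparse GPs concrete, the following is a minimal sketch of local GP regression, one of the two approximations named above. It assumes a squared-exponential kernel, a fixed noise level, 1-D synthetic inputs, and a hypothetical cluster assignment supplied by the caller; the function names (`rbf_kernel`, `gp_predict`, `local_gp_predict`) are illustrative only and are not the authors' implementation. The actual system uses frame-level context features and speech parameters as inputs and outputs, which are not modeled here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, **kw):
    # Exact GP posterior mean: k_*^T (K + noise * I)^{-1} y.
    k = rbf_kernel(x_train, x_train, **kw) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, **kw)
    alpha = np.linalg.solve(k, y_train)
    return k_star @ alpha


def local_gp_predict(x_train, y_train, train_labels, x_test, test_labels, **kw):
    # Local GPs: each cluster is fit by an independent GP, so the cost is
    # O(sum_c n_c^3) rather than O(n^3) for a single global GP.
    # Each test point is predicted only from training data in its own cluster.
    y_pred = np.empty(len(x_test))
    for c in np.unique(test_labels):
        tr = train_labels == c
        te = test_labels == c
        y_pred[te] = gp_predict(x_train[tr], y_train[tr], x_test[te], **kw)
    return y_pred


# Usage on synthetic data with two hypothetical clusters (x < 2 and x >= 2).
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 80)[:, None]
y = np.sin(2 * x[:, 0]) + 0.05 * rng.standard_normal(80)
labels = (x[:, 0] >= 2).astype(int)
x_new = np.array([[0.5], [3.0]])
new_labels = (x_new[:, 0] >= 2).astype(int)
pred = local_gp_predict(x, y, labels, x_new, new_labels, length_scale=0.5)
print(pred)
```

The design trade-off is that points near a cluster boundary ignore correlated data in the neighboring cluster. The PIC approximation mentioned in the abstract is intended to soften this by adding a shared set of inducing points on top of the per-cluster blocks; that extension is not shown in this sketch.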
Original language | English |
---|---|
Pages (from-to) | 1072-1076 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2013 |
Event | 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France Duration: 2013 Aug 25 → 2013 Aug 29 |
Keywords
- Gaussian process regression
- Non-parametric Bayesian model
- Statistical speech synthesis