TY - GEN
T1 - Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis
AU - Koriyama, Tomoki
AU - Nose, Takashi
AU - Kobayashi, Takao
PY - 2013/10/18
Y1 - 2013/10/18
AB - This paper proposes a new approach to text-to-speech based on Gaussian processes, which are widely used to perform non-parametric Bayesian regression and classification. The Gaussian process regression model is designed to predict frame-level acoustic features from the corresponding frame information. The frame information includes the relative position in the phone and the preceding and succeeding phoneme information obtained from linguistic information. A frame context kernel is proposed as a similarity measure between frames. Experimental results on a small data set show the potential of the proposed approach, which uses neither the state-dependent dynamic features nor the decision-tree clustering of a conventional HMM-based approach.
KW - acoustic models
KW - context kernel
KW - Gaussian process regression
KW - non-parametric Bayesian model
KW - statistical speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84890499369&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890499369&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6639224
DO - 10.1109/ICASSP.2013.6639224
M3 - Conference contribution
AN - SCOPUS:84890499369
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8007
EP - 8011
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Y2 - 26 May 2013 through 31 May 2013
ER -