In this paper the authors propose a fundamental frequency (F0) control model based on representative vectors, together with a method for training the control rules for the model parameters from a speech database. A representative vector represents a typical F0 contour of an accent phrase, and a set of representative vectors is referred to as a representative vector codebook. The F0 contour of a sentence is generated by selecting a representative vector for each accent phrase, linearly expanding or contracting it mora by mora to match the phoneme durations, shifting it in parallel (by an offset) along the logarithmic frequency axis, and concatenating the results. Training the control rules corresponds to extracting the representative vector codebook, the representative vector selection rules, and the offset prediction rules from the speech database. The control rules are trained under the criterion that the approximation error, that is, the error between the F0 contours in the speech database and those generated by the model, be minimized. Training experiments using a speech database of four female speakers showed that F0 contours close to those of the original speakers can be generated even for sentences not included in the training data.
- Fundamental frequency control
- Representative vector
- Speech synthesis
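
The generation step summarized in the abstract (per-mora linear time scaling, an offset on the log-frequency axis, and concatenation over accent phrases) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fixed number of samples per mora (`POINTS_PER_MORA`), the use of frame counts as durations, the function names, and the mean squared error as the approximation-error measure.

```python
import numpy as np

# Assumption: each representative vector stores a fixed number of
# log-F0 samples per mora. The value 4 is hypothetical.
POINTS_PER_MORA = 4


def stretch_mora(segment, n_frames):
    """Linearly expand or contract one mora's log-F0 samples to n_frames."""
    src = np.linspace(0.0, 1.0, num=len(segment))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(dst, src, segment)


def accent_phrase_contour(vector, mora_frames, offset):
    """Build the log-F0 contour of one accent phrase.

    vector      -- representative vector (len(mora_frames) * POINTS_PER_MORA values)
    mora_frames -- target duration of each mora, in frames (assumed unit)
    offset      -- parallel shift on the logarithmic frequency axis
    """
    segments = np.asarray(vector, dtype=float).reshape(
        len(mora_frames), POINTS_PER_MORA
    )
    pieces = [stretch_mora(seg, n) for seg, n in zip(segments, mora_frames)]
    return np.concatenate(pieces) + offset


def sentence_contour(phrases):
    """Concatenate accent-phrase contours; phrases is a list of
    (vector, mora_frames, offset) tuples."""
    return np.concatenate(
        [accent_phrase_contour(v, d, o) for v, d, o in phrases]
    )


def approximation_error(target_log_f0, generated_log_f0):
    """Mean squared error between two log-F0 contours (assumed measure)."""
    diff = np.asarray(target_log_f0, dtype=float) - np.asarray(
        generated_log_f0, dtype=float
    )
    return float(np.mean(diff ** 2))
```

In a training loop, `approximation_error` would be the quantity to minimize when choosing the codebook entries, the selection rules, and the offset predictions; how the authors actually perform that optimization is not specified in the abstract.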