This paper presents a technique of very low bit-rate F0 coding for phonetic vocoders based on a hidden Markov model (HMM) using phone-level quantized F0 symbols. In the proposed technique, an input F0 sequence is converted into an F0 symbol sequence at the phone level using scalar quantization. The quantized F0 symbols represent the rough shape of the original F0 contour and are used as the prosodic context for the HMM in the decoding process. To model the F0 that has voiced and unvoiced regions, we use multi-space probability distribution HMM (MSD-HMM). Synthetic speech is generated from the context-dependent labels and pre-trained MSD-HMMs by using the HMM-based parameter generation algorithm. By taking into account the preceding and succeeding contexts as well as the current one in the modeling and synthesis, we can generate a smooth F0 trajectory similar to that of the original with only a small number of quantization bits. The experimental results reveal that the proposed F0 coding outperforms the conventional segment-based F0 coding technique using MSD-VQ. We also demonstrate that the decoded speech of the proposed vocoder has acceptable quality even when the F0 bit-rate is less than 50 bps.
- HMM-based speech synthesis
- Phonetic vocoder
- Very low bit-rate speech coding