TY - JOUR
T1 - Complex cepstrum for statistical parametric speech synthesis
AU - Maia, Ranniery
AU - Akamine, Masami
AU - Gales, Mark J.F.
PY - 2013/6
Y1 - 2013/6
N2 - Statistical parametric synthesizers have typically relied on a simplified model of speech production. In this model, speech is generated using a minimum-phase filter, implemented from coefficients derived from spectral parameters, driven by a zero or random phase excitation signal. This excitation signal is usually constructed from fundamental frequencies and parameters used to control the balance between the periodicity and aperiodicity of the signal. The application of this approach to statistical parametric synthesis has partly been motivated by speech coding theory. However, in contrast to most real-time speech coders, parametric speech synthesizers do not require causality. This allows the standard simplified model to be extended to represent the natural mixed-phase characteristics of speech signals. This paper proposes the use of the complex cepstrum to model the mixed phase characteristics of speech through the incorporation of phase information in statistical parametric synthesis. The phase information is contained in the anti-causal portion of the complex cepstrum. These parameters have a direct connection with the shape of the glottal pulse of the excitation signal. Phase parameters are extracted on a frame-basis and are modeled in the same fashion as the minimum-phase synthesis filter parameters. At synthesis time, phase parameter trajectories are generated and used to modify the excitation signal. Experimental results show that the use of such complex cepstrum-based phase features results in better synthesized speech quality. Listening test results yield an average preference of 60% for the system with the proposed phase feature on both female and male voices.
AB - Statistical parametric synthesizers have typically relied on a simplified model of speech production. In this model, speech is generated using a minimum-phase filter, implemented from coefficients derived from spectral parameters, driven by a zero or random phase excitation signal. This excitation signal is usually constructed from fundamental frequencies and parameters used to control the balance between the periodicity and aperiodicity of the signal. The application of this approach to statistical parametric synthesis has partly been motivated by speech coding theory. However, in contrast to most real-time speech coders, parametric speech synthesizers do not require causality. This allows the standard simplified model to be extended to represent the natural mixed-phase characteristics of speech signals. This paper proposes the use of the complex cepstrum to model the mixed phase characteristics of speech through the incorporation of phase information in statistical parametric synthesis. The phase information is contained in the anti-causal portion of the complex cepstrum. These parameters have a direct connection with the shape of the glottal pulse of the excitation signal. Phase parameters are extracted on a frame-basis and are modeled in the same fashion as the minimum-phase synthesis filter parameters. At synthesis time, phase parameter trajectories are generated and used to modify the excitation signal. Experimental results show that the use of such complex cepstrum-based phase features results in better synthesized speech quality. Listening test results yield an average preference of 60% for the system with the proposed phase feature on both female and male voices.
KW - Cepstral analysis
KW - Complex cepstrum
KW - Glottal source models
KW - Spectral analysis
KW - Speech synthesis
KW - Statistical parametric speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84876205258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876205258&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2012.12.008
DO - 10.1016/j.specom.2012.12.008
M3 - Article
AN - SCOPUS:84876205258
SN - 0167-6393
VL - 55
SP - 606
EP - 618
JO - Speech Communication
JF - Speech Communication
IS - 5
ER -