TY - JOUR
T1 - HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling
AU - Nose, Takashi
AU - Kanemoto, Misa
AU - Koriyama, Tomoki
AU - Kobayashi, Takao
N1 - Funding Information:
The authors thank Dr. Shinji Sako of Nagoya Institute of Technology for the use of MusicXML data, a part of singing voice data, MIDI data, and lyric data. Part of this work was supported by JSPS Grant-in-Aid for Scientific Research 24300071.
Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.
PY - 2015/11/1
Y1 - 2015/11/1
AB - This paper proposes a singing style control technique based on multiple regression hidden semi-Markov models (MRHSMMs) for changing singing styles and their intensities appearing in synthetic singing voices. In the proposed technique, singing styles and their intensities are represented by low-dimensional vectors called style vectors and are modeled in accordance with the assumption that mean parameters of acoustic models are given as multiple regressions of the style vectors. In the synthesis process, we can weaken or emphasize the intensities of singing styles by setting a desired style vector. In addition, the idea of pitch adaptive training is extended to the case of the MRHSMM to improve the modeling accuracy of pitch associated with musical notes. A novel vibrato modeling technique is also presented to extract vibrato parameters from singing voices that sometimes have unclear vibrato expressions. Subjective evaluations show that we can intuitively control singing styles and their intensities while maintaining the naturalness of synthetic singing voices comparable to the conventional HSMM-based singing voice synthesis.
KW - HMM-based singing voice synthesis
KW - Multiple-regression HSMM
KW - Pitch adaptive training
KW - Singing style control
KW - Vibrato modeling
UR - http://www.scopus.com/inward/record.url?scp=84938088736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938088736&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2015.04.001
DO - 10.1016/j.csl.2015.04.001
M3 - Article
AN - SCOPUS:84938088736
SN - 0885-2308
VL - 34
SP - 308
EP - 322
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 1
ER -