A parameter generation algorithm using local variance for HMM-Based speech synthesis

Takashi Nose, Vataya Chunwijitra, Takao Kobayashi

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)


This paper proposes a parameter generation algorithm using a local variance (LV) model in HMM-based speech synthesis. In the proposed technique, we define the LV as a feature that represents the local variation of a spectral parameter sequence and model LVs using HMMs. Context-dependent HMMs are used to capture the dependence of LV trajectories on phonetic and prosodic contexts. In addition, the dynamic features of LVs are taken into account alongside the static features to appropriately model the dynamic characteristics of LV trajectories. By introducing the LV model into the spectral parameter generation process, the proposed technique can impose a more precise variance constraint for each frame than the conventional technique with a global variance (GV) model. Consequently, the proposed technique alleviates the excessive spectral peak enhancement that often occurs in GV-based parameter generation. Objective evaluation results show that the proposed technique can generate better spectral parameter trajectories than the GV-based technique in terms of spectral and LV distortion. Moreover, the results of subjective evaluation demonstrate that the proposed technique can generate synthetic speech significantly closer to the original speech than the conventional technique while maintaining speech naturalness.
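To illustrate the contrast between a per-frame and a per-utterance variance constraint, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not give the exact LV definition, so the sliding-window variance, the `half_width` parameter, the simple delta formula, and all function names below are assumptions made for illustration only.

```python
# Illustrative sketch only: the windowed-variance definition of LV used here
# is an assumption, not the definition from the paper.

def global_variance(seq):
    """Variance of one spectral coefficient over the whole utterance (one scalar)."""
    n = len(seq)
    mean = sum(seq) / n
    return sum((x - mean) ** 2 for x in seq) / n

def local_variance(seq, half_width=2):
    """Variance inside a window centered on each frame (one value per frame).

    The window is truncated at the sequence edges. `half_width` is a
    hypothetical parameter chosen for this example.
    """
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half_width): t + half_width + 1]
        out.append(global_variance(window))
    return out

def delta(traj):
    """First-order dynamic feature, 0.5 * (x[t+1] - x[t-1]), edges replicated."""
    n = len(traj)
    return [0.5 * (traj[min(t + 1, n - 1)] - traj[max(t - 1, 0)]) for t in range(n)]

# Toy trajectory of a single coefficient: a flat stretch, then an oscillation.
trajectory = [0.0] * 6 + [1.0, -1.0] * 3

gv = global_variance(trajectory)   # a single number for the whole utterance
lv = local_variance(trajectory)    # a trajectory with one value per frame
lv_delta = delta(lv)               # dynamic feature of the LV trajectory
```

In this toy case the GV is one scalar that averages the flat and oscillating regions together, whereas the LV trajectory is zero over the flat frames and large over the oscillating ones, which is the sense in which a frame-level constraint can be more precise than an utterance-level one.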

Original language: English
Article number: 6609040
Pages (from-to): 221-228
Number of pages: 8
Journal: IEEE Journal on Selected Topics in Signal Processing
Issue number: 2
Publication status: Published - 2014 Apr
Externally published: Yes


Keywords

  • HMM-based speech synthesis
  • local variance
  • over-smoothing problem
  • spectral parameter generation

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
