Restructuring output layers of deep neural networks using minimum risk parameter clustering

Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)


This paper attempts to optimize the topology of hidden Markov models (HMMs) for automatic speech recognition (ASR). Current state-of-the-art acoustic models for ASR involve HMMs with deep neural network (DNN)-based emission density functions. Even though DNN parameters are typically trained by optimizing a discriminative criterion, topology optimization of HMMs is usually performed by optimizing a generative criterion. Several approaches to discriminative state clustering have been studied, but they typically assume underlying Gaussian distributions of the acoustic features and are not compatible with DNN-based emission density functions. In this paper, we attempt to derive a discriminative restructuring method for HMM topologies by introducing discriminative optimization with discrete constraints on the parameters, which force the parameters of a state to be tied with those of other states. By applying this constrained optimization to the clustering of parameters of DNN-based acoustic models, we derive a discriminative HMM restructuring method that maintains the discriminative performance of the original HMMs with a large number of states.

Original language: English
Pages (from-to): 1068-1072
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 2014 Sept 14 to 2014 Sept 18


  • Automatic speech recognition
  • Context clustering
  • Discrete constraint optimization

