Abstract
The triphone model is frequently used as an acoustic model. It is effective for modeling phonetic variations caused by coarticulation. However, the acoustic features of phonemes are also known to be affected by other factors, such as speaking style and speaking speed. In this paper, a new acoustic model is proposed. All training data that share the same phoneme context are automatically clustered into several clusters based on acoustic similarity, and a "sub-triphone" is trained on the training data belonging to each cluster. In experiments, the sub-triphone model achieved about 5% higher phoneme accuracy than the triphone model.
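The abstract does not say which clustering algorithm or model structure the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. Each training token is assumed to be summarized by one fixed-length feature vector (for example, averaged MFCCs). Plain k-means stands in for the paper's acoustic-similarity clustering, and a single diagonal Gaussian per cluster stands in for HMM training. All names (`kmeans`, `train_sub_triphones`, `best_sub_triphone`, the `context#j` labels) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Tuple[Vector, Vector]  # (mean, diagonal variance); stand-in for an HMM


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[List[Vector]]:
    """Plain k-means with deterministic farthest-first initialisation.

    Returns only the non-empty clusters.
    """
    k = max(1, min(k, len(vectors)))
    centroids: List[Vector] = [vectors[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(_sqdist(v, c) for c in centroids)))
    clusters: List[List[Vector]] = [list(vectors)]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda i: _sqdist(v, centroids[i]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        updated = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def fit_gaussian(vectors: List[Vector], floor: float = 1e-3) -> Model:
    """Maximum-likelihood diagonal Gaussian with a variance floor."""
    mu = _mean(vectors)
    var = tuple(
        max(sum((v[d] - mu[d]) ** 2 for v in vectors) / len(vectors), floor)
        for d in range(len(mu))
    )
    return mu, var


def log_likelihood(x: Vector, model: Model) -> float:
    """Log density of x under a diagonal Gaussian."""
    mu, var = model
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (xi - m) ** 2 / s for xi, m, s in zip(x, mu, var)
    )


def train_sub_triphones(data: Dict[str, List[Vector]], k: int = 2) -> Dict[str, Model]:
    """Split each triphone context's tokens into up to k clusters; one model per cluster."""
    models: Dict[str, Model] = {}
    for context, vectors in data.items():
        for j, cluster in enumerate(kmeans(vectors, k)):
            models[f"{context}#{j}"] = fit_gaussian(cluster)
    return models


def best_sub_triphone(x: Vector, models: Dict[str, Model], context: str) -> str:
    """Pick the sub-triphone of `context` that gives x the highest likelihood."""
    candidates = [name for name in models if name.startswith(context + "#")]
    return max(candidates, key=lambda name: log_likelihood(x, models[name]))


if __name__ == "__main__":
    # Toy tokens for one context, e.g. two speaking styles of /k/ in a-k+i.
    data = {"a-k+i": [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]}
    models = train_sub_triphones(data, k=2)
    print(sorted(models))  # ['a-k+i#0', 'a-k+i#1']
    print(best_sub_triphone((4.8, 5.2), models, "a-k+i"))  # a-k+i#1
```

The key point is that clustering happens *within* a fixed phoneme context. A conventional triphone would pool all four toy tokens into one model, whereas this sketch yields two sub-triphones for `a-k+i`, one per acoustically distinct group.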
Original language | English |
---|---|
Pages (from-to) | 1399-1402 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2009 Nov 26 |
Event | 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom Duration: 2009 Sept 6 → 2009 Sept 10 |
Keywords
- HMnet
- SSS-free
- Sub-triphone model
- Triphone
ASJC Scopus subject areas
- Human-Computer Interaction
- Signal Processing
- Software
- Sensory Systems