TY - CONF
T1 - Effects of intermodal timing difference and speed difference on intelligibility of auditory-visual speech in younger and older adults
AU - Tanaka, Akihiro
AU - Sakamoto, Shuichi
AU - Tsumura, Komi
AU - Suzuki, Yôiti
N1 - Funding Information:
This work was supported by a Grant-in-Aid for Young Scientists (B) No. 18730462 from MEXT Japan and the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University. The authors would like to thank Dr. Hideki Kawahara for permission to use the STRAIGHT vocoding method. The authors would also like to thank the members of the NHK Science and Technical Research Laboratories for their helpful comments on our research.
Publisher Copyright:
© 2007 Auditory-Visual Speech Processing 2007, AVSP 2007. All rights reserved.
PY - 2007
Y1 - 2007
N2 - Previous studies have revealed a temporal window during which human observers perceive physically desynchronized auditory and visual signals as synchronous. This study investigated the effects of intermodal timing differences and speed differences on the intelligibility of auditory-visual speech. We used 20 minimal pairs of Japanese four-mora words, such as “mi-zu-a-ge” (catch landing) versus “mi-zu-a-me” (starch syrup), and administered intelligibility tests. Words were presented under visual-only, auditory-only, and auditory-visual (AV) conditions. Two types of AV conditions were used: asynchronous and expansion conditions. In the asynchronous (i.e., timing difference) conditions, the audio lag was 0-400 ms. In the expansion (i.e., speed difference) conditions, the auditory signal was time-expanded while the visual signal was kept at the original speed. The amount of expansion was 0-400 ms. Results showed that word intelligibility declined as the timing difference and the speed difference increased. Results for the AV benefit (i.e., the superiority of AV performance over auditory-only performance) revealed that the AV benefit at the end of words declined as the speed difference increased, although it did not decline as the timing difference increased. These results suggest that intermodal lag recalibration requires a constant timing difference between the auditory and visual signals. Older adults recalibrated for neither the timing difference nor the speed difference. These results may be useful for the design of a multimodal speech-rate conversion system.
AB - Previous studies have revealed a temporal window during which human observers perceive physically desynchronized auditory and visual signals as synchronous. This study investigated the effects of intermodal timing differences and speed differences on the intelligibility of auditory-visual speech. We used 20 minimal pairs of Japanese four-mora words, such as “mi-zu-a-ge” (catch landing) versus “mi-zu-a-me” (starch syrup), and administered intelligibility tests. Words were presented under visual-only, auditory-only, and auditory-visual (AV) conditions. Two types of AV conditions were used: asynchronous and expansion conditions. In the asynchronous (i.e., timing difference) conditions, the audio lag was 0-400 ms. In the expansion (i.e., speed difference) conditions, the auditory signal was time-expanded while the visual signal was kept at the original speed. The amount of expansion was 0-400 ms. Results showed that word intelligibility declined as the timing difference and the speed difference increased. Results for the AV benefit (i.e., the superiority of AV performance over auditory-only performance) revealed that the AV benefit at the end of words declined as the speed difference increased, although it did not decline as the timing difference increased. These results suggest that intermodal lag recalibration requires a constant timing difference between the auditory and visual signals. Older adults recalibrated for neither the timing difference nor the speed difference. These results may be useful for the design of a multimodal speech-rate conversion system.
KW - asynchrony
KW - intelligibility
KW - older listeners
KW - speech-rate conversion
UR - http://www.scopus.com/inward/record.url?scp=80052501746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052501746&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:80052501746
T2 - 2007 International Conference on Auditory-Visual Speech Processing, AVSP 2007
Y2 - 31 August 2007 through 3 September 2007
ER -