TY - GEN
T1 - Design and Construction of Japanese Multimodal Utterance Corpus with Improved Emotion Balance and Naturalness
AU - Horii, Daisuke
AU - Ito, Akinori
AU - Nose, Takashi
N1 - Funding Information:
This article is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Publisher Copyright:
© 2022 Asia-Pacific Signal and Information Processing Association (APSIPA).
PY - 2022
Y1 - 2022
N2 - This paper describes the development of a corpus of multimodal emotional behaviors. Many databases of multimodal affective behavior have been developed; they can be divided into spontaneous and acted behavior databases. Acted behavior databases make it easy to collect utterances with a balanced distribution of emotions; however, it has been pointed out that acted speech differs from spontaneous speech. In this work, we aim to collect acted multimodal emotional utterances that sound as natural as possible. To this end, we first collected scenes from tweets with consideration of emotional balance. We then performed an initial corpus collection, demonstrating that various emotional utterances could be gathered. Next, we collected the corpus using a crowdsourcing platform. We then evaluated the naturalness of the collected speech by comparing it with that of a read speech database (JTES) and a spontaneous speech database (SMOC). As a result, the collected corpus was more natural than JTES, which indicates that the recording program effectively collected a naturally-sounding emotional behavior corpus.
AB - This paper describes the development of a corpus of multimodal emotional behaviors. Many databases of multimodal affective behavior have been developed; they can be divided into spontaneous and acted behavior databases. Acted behavior databases make it easy to collect utterances with a balanced distribution of emotions; however, it has been pointed out that acted speech differs from spontaneous speech. In this work, we aim to collect acted multimodal emotional utterances that sound as natural as possible. To this end, we first collected scenes from tweets with consideration of emotional balance. We then performed an initial corpus collection, demonstrating that various emotional utterances could be gathered. Next, we collected the corpus using a crowdsourcing platform. We then evaluated the naturalness of the collected speech by comparing it with that of a read speech database (JTES) and a spontaneous speech database (SMOC). As a result, the collected corpus was more natural than JTES, which indicates that the recording program effectively collected a naturally-sounding emotional behavior corpus.
UR - http://www.scopus.com/inward/record.url?scp=85146265774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146265774&partnerID=8YFLogxK
U2 - 10.23919/APSIPAASC55919.2022.9980272
DO - 10.23919/APSIPAASC55919.2022.9980272
M3 - Conference contribution
AN - SCOPUS:85146265774
T3 - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
SP - 245
EP - 250
BT - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Y2 - 7 November 2022 through 10 November 2022
ER -