TY - GEN
T1 - Rescue Dog Action Recognition by Integrating Ego-Centric Video, Sound and Sensor Information
AU - Ide, Yuta
AU - Araki, Tsuyohito
AU - Hamada, Ryunosuke
AU - Ohno, Kazunori
AU - Yanai, Keiji
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - A dog that assists rescue activities at disaster sites, such as earthquakes and landslides, is called a “disaster rescue dog” or simply a “rescue dog”. In Japan, where earthquakes occur frequently, a research project on “Cyber-Rescue” has been organized to make rescue activities more efficient. Within the project, “Cyber Dog Suits” equipped with sensors, a camera, and a GPS were developed to analyze the activities of rescue dogs at disaster sites. In this work, we recognize dog activities in ego-centric videos taken by the camera mounted on the cyber dog suits. To this end, we propose an image/sound/sensor-based four-stream CNN for dog activity recognition that integrates sound and sensor signals as well as motion and appearance. We conducted experiments on multi-class activity categorization using the proposed method. The proposed method, which integrates appearance, motion, sound, and sensor information, achieved the highest accuracy of 48.05%, which is relatively high for ego-centric video recognition.
UR - http://www.scopus.com/inward/record.url?scp=85104282812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104282812&partnerID=8YFLogxK
DO - 10.1007/978-3-030-68796-0_23
M3 - Conference contribution
AN - SCOPUS:85104282812
SN - 9783030687953
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 321
EP - 333
BT - Pattern Recognition. ICPR International Workshops and Challenges, 2021, Proceedings
A2 - Del Bimbo, Alberto
A2 - Cucchiara, Rita
A2 - Sclaroff, Stan
A2 - Farinella, Giovanni Maria
A2 - Mei, Tao
A2 - Bertini, Marco
A2 - Escalante, Hugo Jair
A2 - Vezzani, Roberto
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Pattern Recognition Workshops, ICPR 2020
Y2 - 10 January 2021 through 11 January 2021
ER -
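
To give a concrete picture of the four-stream fusion idea named in the abstract, below is a minimal, hypothetical PyTorch sketch of a classifier that fuses appearance, motion, sound, and sensor features by late concatenation. The encoder design, input shapes, feature dimension, class count, and concatenation-based fusion are all assumptions for illustration; the paper's actual architecture is not reproduced here.

# Hypothetical sketch of a four-stream late-fusion classifier (PyTorch).
# Backbones, input shapes, and the fusion rule are illustrative assumptions,
# not the architecture from the cited paper.
import torch
import torch.nn as nn

class FourStreamActivityNet(nn.Module):
    def __init__(self, num_classes: int = 10, feat_dim: int = 256):
        super().__init__()
        # One small CNN encoder per modality; a real system would likely use
        # pretrained backbones (e.g. a 2D CNN on RGB frames for appearance,
        # a CNN over stacked optical flow for motion, and CNNs over
        # spectrogram- or image-like renderings of sound and sensor signals).
        self.appearance = self._encoder(in_ch=3, feat_dim=feat_dim)   # RGB frame
        self.motion     = self._encoder(in_ch=10, feat_dim=feat_dim)  # stacked flow
        self.sound      = self._encoder(in_ch=1, feat_dim=feat_dim)   # spectrogram
        self.sensor     = self._encoder(in_ch=1, feat_dim=feat_dim)   # sensor map
        # Late fusion: concatenate per-stream features, then classify.
        self.classifier = nn.Linear(4 * feat_dim, num_classes)

    @staticmethod
    def _encoder(in_ch: int, feat_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, rgb, flow, spec, sens):
        # Encode each modality independently, then fuse by concatenation.
        feats = torch.cat([
            self.appearance(rgb),
            self.motion(flow),
            self.sound(spec),
            self.sensor(sens),
        ], dim=1)
        return self.classifier(feats)

# Example forward pass with dummy inputs (shapes are assumptions).
model = FourStreamActivityNet(num_classes=10)
logits = model(
    torch.randn(2, 3, 112, 112),   # appearance: RGB frames
    torch.randn(2, 10, 112, 112),  # motion: stacked optical-flow maps
    torch.randn(2, 1, 64, 64),     # sound: log-mel spectrograms
    torch.randn(2, 1, 64, 64),     # sensor: signals rendered as 2D maps
)
print(logits.shape)  # torch.Size([2, 10])

Concatenation followed by a linear layer is the simplest late-fusion choice; weighted averaging of per-stream logits or learned attention over streams are common alternatives when one modality is noisier than the others.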