In this work, we propose a neural network with novel attention modules for human gesture recognition from point cloud data. The network takes point clouds directly as input, and the attention modules learn to attend only to the points that contribute to more accurate gesture recognition. Each input sample has both spatial and temporal dimensions because its point cloud frames are concatenated into a single input. This lets the network learn spatio-temporal features of the data without explicitly modeling gesture dynamics. We propose gather and scatter operations for downsampling and upsampling points in the feature space. We evaluate the network on a dataset of common Japanese gestures, on which it achieves state-of-the-art performance. We also analyze the architecture design and parameter choices.
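The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `stack_frames`, `gather`, and `scatter`, the normalized frame index used as a fourth coordinate, and the random scores standing in for learned attention weights are all assumptions made for illustration. Here gather is read as index-based row selection and scatter as writing rows back to their original positions, which may differ from the operations defined in the paper.

```python
import numpy as np

def stack_frames(frames):
    """Concatenate per-frame (N_i, 3) xyz arrays into one (sum N_i, 4) array.

    The fourth column is a normalized frame index in [0, 1]
    (an assumed encoding of the temporal dimension).
    """
    num_frames = len(frames)
    parts = []
    for t, pts in enumerate(frames):
        pts = np.asarray(pts, dtype=np.float32)
        time_value = t / (num_frames - 1) if num_frames > 1 else 0.0
        time_col = np.full((pts.shape[0], 1), time_value, dtype=np.float32)
        parts.append(np.hstack([pts, time_col]))
    return np.vstack(parts)

def gather(features, idx):
    """Downsample: keep only the rows of `features` selected by `idx`."""
    return features[idx]

def scatter(sub_features, idx, num_points):
    """Upsample: write rows back to their original slots; other rows stay zero."""
    out = np.zeros((num_points, sub_features.shape[1]), dtype=sub_features.dtype)
    out[idx] = sub_features
    return out

# Hypothetical usage: 3 frames of 5 points each.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(5, 3)) for _ in range(3)]
cloud = stack_frames(frames)                 # one sample, shape (15, 4)

scores = rng.random(cloud.shape[0])          # stand-in for learned attention scores
keep = np.argsort(scores)[-6:]               # indices of the 6 highest-scoring points
small = gather(cloud, keep)                  # downsampled, shape (6, 4)
restored = scatter(small, keep, cloud.shape[0])  # back to shape (15, 4)
```

Because the frames are merged into one array, a point-based network sees time as just another coordinate, which is one way to read the claim that no explicit model of gesture dynamics is needed.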
|Publication status||Published - 2019|
|Event||29th British Machine Vision Conference, BMVC 2018 - Newcastle, United Kingdom|
Duration: 2018 Sept 3 → 2018 Sept 6