TY - GEN
T1 - Fusion of Camera and Lidar Data for Large Scale Semantic Mapping
AU - Westfechtel, Thomas
AU - Ohno, Kazunori
AU - Bezerra Neto, Ranulfo Plutarco
AU - Kojima, Shotaro
AU - Tadokoro, Satoshi
N1 - Funding Information:
This research has been supported by CREST: Recognition, Summarization and Retrieval of Large-Scale Multimedia Data and RIKEN AIP challenge.
Funding Information:
This research has been supported by CREST: Recognition, Summarization and Retrieval of Large-Scale Multimedia Data and RIKEN AIP challenge
Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Current self-driving vehicles rely on detailed maps of the environment, that contains exhaustive semantic information. This work presents a strategy to utilize the recent advancements in semantic segmentation of images, fuse the information extracted from the camera stream with accurate depth measurements of a Lidar sensor in order to create large scale semantic labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract the color and semantic information while at the same time a large scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. While we employed a network that achieved state of the art semantic segmentation results on the Cityscape dataset [1] (IoU score of 82.1%), the sole use of the extracted semantic information only achieved an IoU score of 38.9% on 105 manually labeled 5x5m tiles from 5 different trial runs within the Sendai city in Japan (this decrease in accuracy will discussed in section III-B). To increase the performance, we reclassify the label of each point. For this two different approaches were investigated: a random forest and SparseConvNet [2] (a deep learning approach). We investigated for both methods how the inclusion of semantic labels from the camera stream affected the classification task of the 3D point cloud. To which end we show, that a significant performance increase can be achieved by doing so - 25.4 percent points for random forest (40.0% w/o labels to 65.4% with labels) and 16.6 in case of the SparseConvNet (33.4% w/o labels to 50.8% with labels). Finally, we present practical examples on how semantic enriched maps can be employed for further tasks. In particular, we show how different classes (i.e. cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (i.e. road and buildings). And how the data could be used for extracting the trajectories of vehicles and pedestrians.
AB - Current self-driving vehicles rely on detailed maps of the environment, that contains exhaustive semantic information. This work presents a strategy to utilize the recent advancements in semantic segmentation of images, fuse the information extracted from the camera stream with accurate depth measurements of a Lidar sensor in order to create large scale semantic labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract the color and semantic information while at the same time a large scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. While we employed a network that achieved state of the art semantic segmentation results on the Cityscape dataset [1] (IoU score of 82.1%), the sole use of the extracted semantic information only achieved an IoU score of 38.9% on 105 manually labeled 5x5m tiles from 5 different trial runs within the Sendai city in Japan (this decrease in accuracy will discussed in section III-B). To increase the performance, we reclassify the label of each point. For this two different approaches were investigated: a random forest and SparseConvNet [2] (a deep learning approach). We investigated for both methods how the inclusion of semantic labels from the camera stream affected the classification task of the 3D point cloud. To which end we show, that a significant performance increase can be achieved by doing so - 25.4 percent points for random forest (40.0% w/o labels to 65.4% with labels) and 16.6 in case of the SparseConvNet (33.4% w/o labels to 50.8% with labels). Finally, we present practical examples on how semantic enriched maps can be employed for further tasks. In particular, we show how different classes (i.e. cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (i.e. road and buildings). And how the data could be used for extracting the trajectories of vehicles and pedestrians.
UR - http://www.scopus.com/inward/record.url?scp=85076818565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076818565&partnerID=8YFLogxK
U2 - 10.1109/ITSC.2019.8917107
DO - 10.1109/ITSC.2019.8917107
M3 - Conference contribution
AN - SCOPUS:85076818565
T3 - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
SP - 257
EP - 264
BT - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
Y2 - 27 October 2019 through 30 October 2019
ER -