TY - GEN
T1 - Visualization of convolutional neural networks for monocular depth estimation
AU - Hu, Junjie
AU - Zhang, Yan
AU - Okatani, Takayuki
N1 - Funding Information:
Acknowledgments: This work was partly supported by JSPS KAKENHI Grant Numbers JP15H05919 and JP19H01110 and by JST CREST Grant Number JPMJCR14D1.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is how CNNs can infer depth from a single image. Toward answering this question, we consider visualizing a CNN's inference by identifying the pixels of an input image that are relevant to depth estimation. We formulate this as an optimization problem: finding the smallest set of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate obtained from the entire image. To cope with the difficulty of optimizing through a deep CNN, we propose using another network that predicts those relevant image pixels in a single forward computation. In our experiments, we first show the effectiveness of this approach and then apply it to different depth estimation networks on indoor and outdoor scene datasets. The results provide several findings that aid exploration of the above question.
AB - Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is how CNNs can infer depth from a single image. Toward answering this question, we consider visualizing a CNN's inference by identifying the pixels of an input image that are relevant to depth estimation. We formulate this as an optimization problem: finding the smallest set of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate obtained from the entire image. To cope with the difficulty of optimizing through a deep CNN, we propose using another network that predicts those relevant image pixels in a single forward computation. In our experiments, we first show the effectiveness of this approach and then apply it to different depth estimation networks on indoor and outdoor scene datasets. The results provide several findings that aid exploration of the above question.
UR - http://www.scopus.com/inward/record.url?scp=85081897377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081897377&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2019.00397
DO - 10.1109/ICCV.2019.00397
M3 - Conference contribution
AN - SCOPUS:85081897377
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 3868
EP - 3877
BT - Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Y2 - 27 October 2019 through 2 November 2019
ER -