TY - GEN
T1 - Learning to describe E-commerce images from noisy online data
AU - Yashima, Takuya
AU - Okazaki, Naoaki
AU - Inui, Kentaro
AU - Yamaguchi, Kota
AU - Okatani, Takayuki
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP15H05919 and JP15H05318.
Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Recent studies show successful results in generating a proper language description for a given image, where the focus is on detecting and describing the contextual relationships in the image, such as the kind of object, the relationship between two objects, or the action. In this paper, we turn our attention to more subjective components of descriptions that contain rich expressions to modify objects – namely attribute expressions. We start by collecting a large number of product images from the online market site Etsy, and consider learning a language generation model using a popular combination of a convolutional neural network (CNN) and a recurrent neural network (RNN). Our Etsy dataset contains unique noise characteristics that often arise in the online market. We first apply natural language processing techniques to extract high-quality, learnable examples from the real-world noisy data. We learn a generation model from product images with associated title descriptions, and examine how e-commerce-specific meta-data and fine-tuning improve the generated expressions. The experimental results suggest that we are able to learn from the noisy online data and produce a product description that is closer to a man-made description with possibly subjective attribute expressions.
AB - Recent studies show successful results in generating a proper language description for a given image, where the focus is on detecting and describing the contextual relationships in the image, such as the kind of object, the relationship between two objects, or the action. In this paper, we turn our attention to more subjective components of descriptions that contain rich expressions to modify objects – namely attribute expressions. We start by collecting a large number of product images from the online market site Etsy, and consider learning a language generation model using a popular combination of a convolutional neural network (CNN) and a recurrent neural network (RNN). Our Etsy dataset contains unique noise characteristics that often arise in the online market. We first apply natural language processing techniques to extract high-quality, learnable examples from the real-world noisy data. We learn a generation model from product images with associated title descriptions, and examine how e-commerce-specific meta-data and fine-tuning improve the generated expressions. The experimental results suggest that we are able to learn from the noisy online data and produce a product description that is closer to a man-made description with possibly subjective attribute expressions.
UR - http://www.scopus.com/inward/record.url?scp=85016296170&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016296170&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-54193-8_6
DO - 10.1007/978-3-319-54193-8_6
M3 - Conference contribution
AN - SCOPUS:85016296170
SN - 9783319541921
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 85
EP - 100
BT - Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers
A2 - Nishino, Ko
A2 - Lai, Shang-Hong
A2 - Lepetit, Vincent
A2 - Sato, Yoichi
PB - Springer Verlag
T2 - 13th Asian Conference on Computer Vision, ACCV 2016
Y2 - 20 November 2016 through 24 November 2016
ER -