TY - JOUR
T1 - The mechanism of additive composition
AU - Tian, Ran
AU - Okazaki, Naoaki
AU - Inui, Kentaro
N1 - Funding Information:
This work was supported by CREST, JST. We thank the anonymous reviewers for helpful comments; we thank Daichi Mochihashi, Gemma Boleda, and Percy Liang for kind advice; and we thank Naho Orita, Yuichiro Matsubayashi, and Koji Matsuda for discussions on early drafts of this work.
Publisher Copyright:
© 2017, The Author(s).
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Additive composition (Foltz et al. in Discourse Process 15:285–307, 1998; Landauer and Dumais in Psychol Rev 104(2):211, 1997; Mitchell and Lapata in Cognit Sci 34(8):1388–1429, 2010) is a widely used method for computing meanings of phrases, which takes the average of the vector representations of the constituent words. In this article, we prove an upper bound for the bias of additive composition, which is the first theoretical analysis of compositional frameworks from a machine learning point of view. The bound is written in terms of collocation strength; we prove that the more exclusively two successive words tend to occur together, the more accurately one can guarantee that their additive composition approximates the natural phrase vector. Our proof relies on properties of natural language data that are empirically verified, and that can be theoretically derived from an assumption that the data is generated from a Hierarchical Pitman–Yor process. The theory endorses additive composition as a reasonable operation for calculating meanings of phrases, and suggests ways to improve additive compositionality, including transforming entries of distributional word vectors by a function that meets a specific condition, constructing a novel type of vector representation to make additive composition sensitive to word order, and utilizing singular value decomposition to train word vectors.
KW - Approximation error bounds
KW - Bias and variance
KW - Compositional distributional semantics
KW - Hierarchical Pitman–Yor process
KW - Natural language data
UR - http://www.scopus.com/inward/record.url?scp=85016981749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016981749&partnerID=8YFLogxK
U2 - 10.1007/s10994-017-5634-8
DO - 10.1007/s10994-017-5634-8
M3 - Article
AN - SCOPUS:85016981749
SN - 0885-6125
VL - 106
SP - 1083
EP - 1130
JO - Machine Learning
JF - Machine Learning
IS - 7
ER -