TY - GEN
T1 - Fused feature representation discovery for high-dimensional and sparse data
AU - Suzuki, Jun
AU - Nagata, Masaaki
PY - 2014/1/1
Y1 - 2014/1/1
N2 - The automatic discovery of a significant low-dimensional feature representation from a given data set is a fundamental problem in machine learning. This paper focuses specifically on the development of feature representation discovery methods appropriate for high-dimensional and sparse data. We formulate our feature representation discovery problem as a variant of the semi-supervised learning problem, namely, as an optimization problem over unsupervised data whose objective is to evaluate the impact of each feature with respect to modeling a target task according to the initial model constructed by using supervised data. The most notable characteristic of our method is that it offers a feasible processing speed even if the numbers of data and features are both in the millions or even billions, and successfully provides a significantly small number of feature sets, i.e., fewer than 10, that can also offer improved performance compared with that obtained with the original feature sets. We demonstrate the effectiveness of our method in experiments on two well-studied natural language processing tasks.
UR - http://www.scopus.com/inward/record.url?scp=84908193699&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908193699&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84908193699
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 1593
EP - 1599
BT - Proceedings of the National Conference on Artificial Intelligence
PB - AI Access Foundation
T2 - 28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014
Y2 - 27 July 2014 through 31 July 2014
ER -