TY - GEN
T1 - Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data
AU - Suzuki, Jun
AU - Isozaki, Hideki
PY - 2008
Y1 - 2008
N2 - This paper provides evidence that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing (NLP) tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. We first propose a simple yet powerful semi-supervised discriminative model appropriate for handling large scale unlabeled data. Then, we describe experiments performed on widely used test collections, namely, PTB III data, CoNLL'00 and'03 shared task data for the above three NLP tasks, respectively. We incorporate up to 1G-words (one billion tokens) of unlabeled data, which is the largest amount of unlabeled data ever used for these tasks, to investigate the performance improvement. In addition, our results are superior to the best reported results for all of the above test collections.
AB - This paper provides evidence that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing (NLP) tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. We first propose a simple yet powerful semi-supervised discriminative model appropriate for handling large scale unlabeled data. Then, we describe experiments performed on widely used test collections, namely, PTB III data, CoNLL'00 and'03 shared task data for the above three NLP tasks, respectively. We incorporate up to 1G-words (one billion tokens) of unlabeled data, which is the largest amount of unlabeled data ever used for these tasks, to investigate the performance improvement. In addition, our results are superior to the best reported results for all of the above test collections.
UR - http://www.scopus.com/inward/record.url?scp=84859884966&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859884966&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84859884966
SN - 9781932432046
T3 - ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 665
EP - 673
BT - ACL-08
T2 - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Y2 - 15 June 2008 through 20 June 2008
ER -