TY - GEN
T1 - Relevant document retrieval using a spoken document
AU - Ito, Akinori
AU - Uno, Yu
AU - Masumura, Ryo
AU - Ito, Masashi
AU - Makino, Shozo
PY - 2009/12/1
Y1 - 2009/12/1
N2 - In this paper, we propose a method of retrieving documents from the World Wide Web using a spoken document as a "key." This method can be viewed as a speech version of ordinary relevant document retrieval, in which a text document is used as the retrieval query. The retrieval is based on an automatic transcription of the spoken document produced by a speech recognizer. The difficulty of this task is that the automatic transcription contains many recognition errors; therefore, we cannot trust keywords extracted from the automatic transcription using a conventional method such as tf·idf. To solve this problem, we developed three methods. The first measures the relevance of a keyword to the spoken document by using Web documents retrieved from a Web search engine with the keyword specified as the query. The second composes a query from the selected keywords so that words derived from misrecognitions are excluded and similar words are gathered. The third measures the relevance of a downloaded Web document to the spoken document. The experimental results suggest that the proposed methods are promising for retrieving documents relevant to a spoken document.
AB - In this paper, we propose a method of retrieving documents from the World Wide Web using a spoken document as a "key." This method can be viewed as a speech version of ordinary relevant document retrieval, in which a text document is used as the retrieval query. The retrieval is based on an automatic transcription of the spoken document produced by a speech recognizer. The difficulty of this task is that the automatic transcription contains many recognition errors; therefore, we cannot trust keywords extracted from the automatic transcription using a conventional method such as tf·idf. To solve this problem, we developed three methods. The first measures the relevance of a keyword to the spoken document by using Web documents retrieved from a Web search engine with the keyword specified as the query. The second composes a query from the selected keywords so that words derived from misrecognitions are excluded and similar words are gathered. The third measures the relevance of a downloaded Web document to the spoken document. The experimental results suggest that the proposed methods are promising for retrieving documents relevant to a spoken document.
UR - http://www.scopus.com/inward/record.url?scp=74549125436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74549125436&partnerID=8YFLogxK
U2 - 10.1109/ISCIT.2009.5341051
DO - 10.1109/ISCIT.2009.5341051
M3 - Conference contribution
AN - SCOPUS:74549125436
SN - 9781424445219
T3 - 2009 9th International Symposium on Communications and Information Technology, ISCIT 2009
SP - 1483
EP - 1488
BT - 2009 9th International Symposium on Communications and Information Technology, ISCIT 2009
T2 - 2009 9th International Symposium on Communications and Information Technology, ISCIT 2009
Y2 - 28 September 2009 through 30 September 2009
ER -