Morphological analysis of the corpus of spontaneous Japanese

Kiyotaka Uchimoto, Kazuma Takaoka, Chikashi Nobata, Atsushi Yamada, Satoshi Sekine, Hitoshi Isahara

    Research output: Contribution to journal, Article, peer-review

    Abstract

    This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using both methods. The first method detects word segments of any type. The second method is used when there are several definitions of word segments and their POS categories, and when one type of word segment includes another. We show that semi-automatic analysis achieves a precision of better than 99% for detecting and tagging short-unit words and 97% for long-unit words, the two types of words that make up the corpus. We also show that using both methods gives better accuracy than using only the first.
    Original language: English
    Pages (from-to): 382-390
    Number of pages: 8
    Journal: IEEE Transactions on Speech and Audio Processing
    Volume: 12
    Issue number: 4
    Publication status: Published - Jul 2004

    Keywords

    • Japanese spontaneous speech corpus
    • Maximum entropy models
    • Morphological analysis
    • Natural language processing
    • Unknown words
