A domain-independent approach to IE rule development

Kalliopi Zervanou, John McNaught

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

    Abstract

    A key element in extracting information from a natural language document is a set of shallow text analysis rules, which are
    typically based on pre-defined linguistic patterns. Current research in Information Extraction (IE) aims at the automatic or
    semi-automatic acquisition of these rules. Within this research framework, this paper considers the potential for acquiring generic
    extraction patterns. Our research is based on the hypothesis that terms (the linguistic representation of concepts in a specialised
    domain) and Named Entities (the names of persons, organisations and dates of importance in the text) can together be considered the
    basic semantic entities of textual information. They can therefore serve as a basis both for the conceptual representation of
    domain-specific texts and for a linguistic definition of what constitutes an information extraction template. The extraction patterns
    discovered by this approach consist of significant associations between these semantic entities and verbs, and can subsequently be
    translated into any grammar formalism of choice.
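    The abstract does not say how "significant associations" between entities and verbs are measured. As a hypothetical sketch only,
    assuming sentences have already been annotated with term, named-entity and verb tags, one common way to score such associations is
    Dunning's log-likelihood ratio (G²) over sentence-level co-occurrence counts. The tag set (TERM, PERSON, ORG, DATE, VERB, O), the
    sentence as co-occurrence window, the function names, and the 3.84 cut-off (the chi-squared critical value for p < 0.05 with one
    degree of freedom) are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical tag set: TERM for domain terms, plus named-entity classes
# for persons, organisations and dates (assumed, not from the paper).
ENTITY_TAGS = {"TERM", "PERSON", "ORG", "DATE"}


def log_likelihood(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table."""
    def h(*counts):
        # Sum of c * log(c / total), skipping empty cells (0 * log 0 = 0).
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    rows = h(k11 + k12, k21 + k22)
    cols = h(k11 + k21, k12 + k22)
    return 2 * (h(k11, k12, k21, k22) - rows - cols)


def verb_entity_associations(sentences, min_score=3.84):
    """Score (verb, entity-class) pairs that co-occur in the same sentence.

    `sentences` is a list of sentences, each a list of (token, tag) pairs.
    Returns ((verb, entity_class), G^2) tuples, strongest first, keeping
    only positive associations whose score reaches `min_score`.
    """
    n = len(sentences)
    verb_df, ent_df, pair_df = Counter(), Counter(), Counter()
    for sent in sentences:
        verbs = {tok.lower() for tok, tag in sent if tag == "VERB"}
        ents = {tag for _, tag in sent if tag in ENTITY_TAGS}
        verb_df.update(verbs)
        ent_df.update(ents)
        pair_df.update(product(verbs, ents))

    scores = {}
    for (verb, ent), k11 in pair_df.items():
        k12 = verb_df[verb] - k11      # verb present, entity class absent
        k21 = ent_df[ent] - k11        # entity class present, verb absent
        k22 = n - k11 - k12 - k21      # neither present
        g2 = log_likelihood(k11, k12, k21, k22)
        expected = verb_df[verb] * ent_df[ent] / n
        # G^2 is high for both attraction and repulsion; keep attraction only.
        if k11 > expected and g2 >= min_score:
            scores[(verb, ent)] = g2
    return sorted(scores.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    # Tiny made-up corpus, for illustration only.
    corpus = (
        [[("Acme", "ORG"), ("acquired", "VERB"), ("Initech", "ORG")]] * 6
        + [[("protein kinase C", "TERM"), ("phosphorylates", "VERB"),
            ("the receptor", "TERM")]] * 6
    )
    for (verb, ent), score in verb_entity_associations(corpus):
        print(f"{ent:<5} {verb:<15} G2={score:.2f}")
```

    In this toy corpus both pairs, ORG with "acquired" and TERM with "phosphorylates", score 24 ln 2 (about 16.64). Each surviving
    pair is only a candidate seed; turning it into a usable rule (for example, an ORG acquired ORG pattern) in a particular grammar
    formalism would be a separate step.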
    Original language: English
    Title of host publication: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
    Publisher: European Language Resources Association
    Pages: 745-748
    ISBN (Print): 2-9517408-1-6
    Publication status: Published, 2004

    Keywords

    • information extraction
    • text mining
    • rule induction
