Abstract
A key element for the extraction of information in a natural language document is a set of shallow text analysis rules, which are
typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic
acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction
patterns. Our research is based on the hypothesis that terms (the linguistic representation of concepts in a specialised domain) and
Named Entities (the names of persons, organisations and dates of importance in the text) can together be considered as the basic
semantic entities of textual information and can therefore be used as a basis for the conceptual representation of domain-specific texts
and the definition of what constitutes an information extraction template in linguistic terms. The extraction patterns discovered by this
approach involve significant associations of these semantic entities with verbs, and they can subsequently be translated into the
grammar formalism of choice.
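
As an illustration of what "significant associations of these semantic entities with verbs" could look like in practice, the following is a minimal sketch, not the method used in the paper. It assumes clauses have already been reduced to a verb lemma plus the labels of the terms and Named Entities found with it, and it scores each (entity label, verb) pair with pointwise mutual information. The abstract does not name a significance measure, so PMI, the `min_count` threshold, the function name `score_associations`, the entity labels and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def score_associations(clauses, min_count=2):
    """Score (entity label, verb) pairs by pointwise mutual information.

    clauses: iterable of (verb_lemma, [entity labels]) pairs (hypothetical input format).
    min_count: pairs seen fewer times than this are ignored (assumed threshold).
    """
    pair_counts = Counter()
    verb_counts = Counter()
    entity_counts = Counter()
    total = 0
    for verb, entities in clauses:
        for entity in entities:
            pair_counts[(entity, verb)] += 1
            verb_counts[verb] += 1
            entity_counts[entity] += 1
            total += 1

    scores = {}
    for (entity, verb), n in pair_counts.items():
        if n < min_count:
            continue
        # PMI = log2( P(entity, verb) / (P(entity) * P(verb)) )
        scores[(entity, verb)] = math.log2(
            n * total / (entity_counts[entity] * verb_counts[verb])
        )
    return scores


# Toy annotated clauses, invented purely for illustration.
clauses = [
    ("acquire", ["ORGANISATION", "ORGANISATION"]),
    ("acquire", ["ORGANISATION", "TERM"]),
    ("appoint", ["ORGANISATION", "PERSON"]),
    ("appoint", ["PERSON", "DATE"]),
    ("announce", ["ORGANISATION", "TERM"]),
]

scores = score_associations(clauses)
for (entity, verb), pmi in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{entity} + {verb}: PMI = {pmi:.3f}")
```

Pairs that survive the threshold with a high score are candidates for extraction patterns, which could then be rewritten by hand or by script into whichever grammar formalism is in use.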
Original language | English |
---|---|
Title of host publication | Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004) |
Publisher | European Language Resources Association |
Pages | 745-748 |
ISBN (Print) | 2-9517408-1-6 |
Publication status | Published - 2004 |
Keywords
- information extraction
- text mining
- rule induction