Abstract
Automatic Term Recognition (ATR) is defined as the task of identifying domain-specific terms from technical corpora. Termhood-based approaches measure the degree to which a candidate term refers to a domain-specific concept. Unithood-based approaches measure the attachment strength of a candidate term's constituents. These methods have been evaluated using different, often incompatible, evaluation schemes and datasets. This paper provides an overview and a thorough evaluation of state-of-the-art ATR methods under a common evaluation framework, i.e. the same corpora and the same evaluation method. Our contributions are two-fold: (1) We compare a number of different ATR methods, showing that termhood-based methods generally achieve superior performance. (2) We show that the number of independent occurrences of a candidate term is the most effective source for estimating term nestedness, improving ATR performance. © 2008 Springer-Verlag Berlin Heidelberg.
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), abbreviated Lect. Notes Comput. Sci. |
Publisher | Springer Nature |
Pages | 248-259 |
Number of pages | 11 |
Volume | 5221 |
ISBN (Print) | 3540852867, 9783540852865 |
DOIs | http://dx.doi.org/10.1007/978-3-540-85287-2\_24 |
Publication status | Published - 2008 |
Event | 6th International Conference on Natural Language Processing, GoTAL 2008 - Gothenburg. Duration: 1 Jul 2008 → … |
Publication series
Name | GoTAL '08 |
---|
Conference
Conference | 6th International Conference on Natural Language Processing, GoTAL 2008 |
---|---|
City | Gothenburg |
Period | 1/07/08 → … |
Keywords
- ATR
- Automatic term recognition
- Term extraction