Abstract
The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach to the stop-list application process. The second modification adapts the C-value calculation formula in order to consider single-word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of each candidate's context window is variable. We also show the linguistic modifications necessary to apply this algorithm to the recognition of term candidates in Spanish. © Springer-Verlag Berlin Heidelberg 2009.
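
To illustrate why the C-value formula needs adapting for single-word terms, the following is a minimal sketch of the standard C-value score, not the implementation described in the paper. The standard score weights a candidate's frequency by log2 of its length in words, which is zero for a one-word candidate. The `smoothing` parameter, the function names, and the toy frequencies are assumptions made for illustration only; the exact adjustment used by the authors is not stated in the abstract. Candidates are represented as tuples of lemmas, loosely mirroring the lemmatised grouping the abstract mentions.

```python
import math
from collections import defaultdict


def is_sublist(short, long):
    """Return True if `short` occurs as a contiguous subsequence of `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq, smoothing=1):
    """Simplified C-value for candidates given as tuples of lemmas.

    freq: dict mapping candidate -> corpus frequency.
    smoothing: hypothetical constant added to the length inside the log so
        that one-word candidates do not score zero (an assumption, not
        necessarily the adjustment used in the paper). smoothing=0 gives
        the standard length weight log2(length).
    """
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for longer in freq:
        for shorter in freq:
            if len(shorter) < len(longer) and is_sublist(shorter, longer):
                containers[shorter].append(longer)

    scores = {}
    for cand, f in freq.items():
        weight = math.log2(len(cand) + smoothing)
        nested_in = containers.get(cand, [])
        if nested_in:
            # Discount by the mean frequency of the longer candidates
            # in which this candidate appears nested.
            f = f - sum(freq[b] for b in nested_in) / len(nested_in)
        scores[cand] = weight * f
    return scores


if __name__ == "__main__":
    # Toy frequencies, invented for illustration.
    toy = {
        ("soft", "contact", "lens"): 3,
        ("contact", "lens"): 5,
        ("lens",): 10,
    }
    for cand, score in c_values(toy).items():
        print(" ".join(cand), round(score, 2))
```

With `smoothing=0` the one-word candidate `("lens",)` scores zero regardless of its frequency, which is the limitation the second modification addresses.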
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); abbreviated Lect. Notes Comput. Sci. |
Place of Publication | Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009) |
Publisher | Springer Nature |
Pages | 125-136 |
Number of pages | 11 |
Volume | 5449 |
DOIs | |
Publication status | Published - 2009 |
Event | 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009, Mexico City. Duration: 1 Jul 2009 → … |
Other
Other | 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 |
---|---|
City | Mexico City |
Period | 1/07/09 → … |