Abstract
This paper proposes an analogy-based method for clustering similar words by
aligning definitions taken from two different sources. The method relies on the
assumption that two authors use different words to express the same definition.
The algorithm starts by computing the Levenshtein distance, a form of edit
distance, which allows the two definitions to be aligned. As a measure of
similarity, the paper introduces the concept of the longest collocation couple,
which is the basis for clustering similar words. The process iterates, replacing
similar pairs of words in the definitions, until no new clusters are found.
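The pipeline described above (align two definitions, pick out similar word pairs, replace them, repeat until nothing new is found) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: definitions are tokenized on whitespace, the Levenshtein alignment is computed over words with unit costs, and because the abstract does not define the longest collocation couple, a hypothetical stand-in criterion is used instead: two different words are paired when they are substituted for each other in the alignment and both neighbouring steps are exact matches (or sentence boundaries). The names `align`, `context_pairs` and `cluster` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# One alignment step: (word from definition A or None, word from definition B or None).
Step = Tuple[Optional[str], Optional[str]]


def align(a: List[str], b: List[str]) -> List[Step]:
    """Word-level Levenshtein alignment with unit costs, recovered by backtrace."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute
    steps: List[Step] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            steps.append((a[i - 1], b[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            steps.append((a[i - 1], None))      # deletion
            i -= 1
        else:
            steps.append((None, b[j - 1]))      # insertion
            j -= 1
    return steps[::-1]


def context_pairs(steps: List[Step]) -> List[Tuple[str, str]]:
    """HYPOTHETICAL stand-in for the longest collocation couple: a substitution
    whose neighbouring steps are exact matches (or sentence boundaries)."""
    def is_match(k: int) -> bool:
        if k < 0 or k >= len(steps):
            return True  # a boundary counts as shared context
        x, y = steps[k]
        return x is not None and x == y

    return [(x, y) for k, (x, y) in enumerate(steps)
            if x is not None and y is not None and x != y
            and is_match(k - 1) and is_match(k + 1)]


def cluster(definition_pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Merge similar words, rewrite the definitions, and repeat until no new
    merge happens. Returns a map from each merged word to its representative."""
    rep: Dict[str, str] = {}

    def find(w: str) -> str:
        while rep.get(w, w) != w:
            w = rep[w]
        return w

    changed = True
    while changed:
        changed = False
        for def_a, def_b in definition_pairs:
            # Replace every word by its current cluster representative.
            a = [find(w) for w in def_a.lower().split()]
            b = [find(w) for w in def_b.lower().split()]
            for x, y in context_pairs(align(a, b)):
                rx, ry = find(x), find(y)
                if rx != ry:
                    # Assumption: the lexicographically smaller word represents the cluster.
                    rep[max(rx, ry)] = min(rx, ry)
                    changed = True
    return {w: find(w) for w in rep}


if __name__ == "__main__":
    pairs = [("an animal that barks loudly", "an animal that yelps loudly"),
             ("a dog yelps at night", "a dog howls at night")]
    print(cluster(pairs))  # {'yelps': 'barks', 'howls': 'barks'}
```

In the usage example, the first pass links "barks" and "yelps"; because the second definition pair is rewritten with that cluster before it is aligned, "howls" is then linked to the same cluster, which is the kind of chaining the iterative replacement step is meant to enable.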
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2000) |
Editors | Alexander Gelbukh |
Publisher | Instituto Politécnico Nacional |
Number of pages | 14 |
Publication status | Published - 2000 |
Keywords
- clustering
- alignment of definitions
- computational linguistics
- natural language processing