Extracting semantic clusters from the alignment of definitions

Gerardo Sierra, John McNaught

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Through the alignment of definitions from two or more different sources, it is possible to retrieve pairs of words that can be used interchangeably in the same sentence without changing the meaning of the concept. Because lexicographers exploit common defining schemes, such as genus and differentia, different dictionaries define a given concept in similar ways. The differences in wording between two lexicographic sources let us extend the lexical knowledge base: clustering becomes possible by merging two or more dictionaries into a single database and then applying an appropriate alignment technique. Since alignment starts from the same entry in two dictionaries, clustering is faster than with any other technique.
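    The first step, merging dictionaries into a single database so that definitions of the same entry can be aligned, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name `merge_dictionaries`, the `{headword: definition}` input format and the toy entries are all hypothetical.

```python
from collections import defaultdict


def merge_dictionaries(*sources):
    """Merge (name, {headword: definition}) sources into one database.

    Hypothetical helper. Returns {headword: [(source_name, definition), ...]}
    restricted to headwords defined by at least two sources, since only
    those entries have definitions that can be aligned against each other.
    """
    merged = defaultdict(list)
    for name, entries in sources:
        for headword, definition in entries.items():
            merged[headword.lower()].append((name, definition))
    return {hw: defs for hw, defs in merged.items() if len(defs) >= 2}


# Hypothetical toy entries, not taken from the paper.
dict_a = {"thermometer": "an instrument for measuring temperature"}
dict_b = {
    "thermometer": "a device for measuring temperature",
    "barometer": "an instrument for measuring atmospheric pressure",
}

db = merge_dictionaries(("A", dict_a), ("B", dict_b))
print(sorted(db))  # → ['thermometer']
```

    Only "thermometer" survives, because "barometer" has a definition in one source only and so offers nothing to align.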
    The algorithm introduced here is analogy-based. It starts by calculating the Levenshtein distance, a form of edit distance, which allows us to align the definitions. As a measure of similarity, we introduce the concept of the longest collocation couple, which is the basis for clustering similar words. The process iterates, replacing similar pairs of words in the definitions until no new clusters are found.
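    The alignment and iteration described above can be sketched in Python, assuming word-level Levenshtein distance with unit costs and definition pairs already matched by headword. The abstract does not define the longest collocation couple, so this sketch does not implement it: as a hypothetical stand-in, `candidate_pairs` accepts a substitution only when its neighbours are exact matches or sentence boundaries. The names `align`, `candidate_pairs` and `cluster` and the toy definitions are likewise illustrative.

```python
def align(src, tgt):
    """Align two token lists with word-level Levenshtein distance (unit costs).

    Returns (source_word, target_word) pairs; None marks an insertion or deletion.
    """
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace from the bottom-right cell, preferring diagonal moves.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((src[i - 1], None))
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))
            j -= 1
    return pairs[::-1]


def candidate_pairs(pairs):
    """HYPOTHETICAL stand-in for the longest collocation couple measure:
    keep substitutions whose neighbours are matches or sentence boundaries."""
    found = []
    for k, (a, b) in enumerate(pairs):
        if a is None or b is None or a == b:
            continue
        left_ok = k == 0 or pairs[k - 1][0] == pairs[k - 1][1]
        right_ok = k == len(pairs) - 1 or pairs[k + 1][0] == pairs[k + 1][1]
        if left_ok and right_ok:
            found.append((a, b))
    return found


def cluster(entries):
    """Iterate: replace words by cluster representatives, re-align, merge
    new pairs, and stop when a full pass finds no new cluster."""
    rep = {}  # word -> parent word; roots are cluster representatives

    def find(w):
        while rep.get(w, w) != w:
            w = rep[w]
        return w

    while True:
        new = False
        for def_a, def_b in entries:
            a = [find(w) for w in def_a]  # replace words by representatives
            b = [find(w) for w in def_b]
            for x, y in candidate_pairs(align(a, b)):
                rx, ry = find(x), find(y)
                if rx != ry:
                    keep, drop = sorted((rx, ry))  # alphabetically first word is the root
                    rep[drop] = keep
                    new = True
        if not new:
            break
    groups = {}
    for w in set(rep) | set(rep.values()):
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy definitions (dictionary A, dictionary B), matched by headword.
entries = [
    ("a tool measuring weight".split(), "a device gauging weight".split()),  # scales
    ("an instrument for measuring temperature".split(),
     "an instrument for gauging temperature".split()),  # thermometer
]
print(cluster(entries))  # → [['device', 'tool'], ['gauging', 'measuring']]
```

    On this toy data the first pass finds only "measuring"/"gauging", because in the "scales" entry the two substitutions are adjacent. Once "measuring" is replaced by its representative "gauging", the second pass isolates "tool"/"device", and the third pass finds nothing new, so the loop stops.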
    Original language: English
    Title of host publication: Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000)
    Place of publication: New Brunswick
    Publisher: Association for Computational Linguistics
    Pages: 795-801
    Volume: 2
    ISBN (Print): 1-55860-717-X
    Publication status: Published - 2000

    Keywords

    • clustering
    • alignment of definitions
    • computational linguistics
    • natural language processing
