Pay-as-you-go Configuration of Entity Resolution

Ruhaila Maskat, Norman Paton, Suzanne Embury

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    Abstract

    Entity resolution, which seeks to identify records that represent
    the same entity, is an important step in many data integration and
    data cleaning applications. However, entity resolution is challenging both
    in terms of scalability (all-against-all comparisons are computationally
    impractical) and result quality (syntactic evidence on record equivalence
    is often equivocal). As a result, end-to-end entity resolution proposals
    involve several stages, including blocking to efficiently identify candidate
    duplicates, detailed comparison to refine the conclusions from blocking,
    and clustering to identify the sets of records that may represent the
    same entity. However, the quality of the result is often crucially dependent
    on configuration parameters in all of these stages, for which it may
    be difficult for a human expert to provide suitable values. This paper
    describes an approach in which a complete entity resolution process is
    optimized, on the basis of feedback (such as might be obtained from
    crowds) on candidate duplicates. Given such feedback, an evolutionary
    search of the space of configuration parameters is carried out, with a view
    to maximizing the fitness of the resulting clusters. The approach is
    pay-as-you-go in that more feedback can be expected to give rise to better
    outcomes. An empirical evaluation shows that the co-optimization of the
    different stages in entity resolution can yield significant improvements
    over default parameters, even with small amounts of feedback.
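
As a rough illustration of the idea in the abstract, the sketch below co-optimizes a blocking parameter and a comparison threshold against a handful of labelled candidate pairs. It is a toy under stated assumptions, not the authors' implementation: the prefix-based blocking, the `difflib` string similarity, the connected-components clustering, the pair-agreement fitness, and every name (`resolve`, `fitness`, `evolve`, `prefix_len`, `threshold`) are hypothetical choices made only for this example.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def resolve(records, prefix_len, threshold):
    """Block on a key prefix, compare within blocks, cluster by connected components."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking (assumed scheme): records sharing a lowercase prefix are candidates.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(rec[:prefix_len].lower(), []).append(idx)

    # Detailed comparison (assumed scheme): merge pairs at or above the threshold.
    for members in blocks.values():
        for a, b in combinations(members, 2):
            sim = SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio()
            if sim >= threshold:
                parent[find(a)] = find(b)

    return [find(i) for i in range(len(records))]


def fitness(config, records, feedback):
    """Fraction of feedback pairs on which the clustering agrees with the label."""
    labels = resolve(records, *config)
    hits = sum((labels[a] == labels[b]) == is_dup for a, b, is_dup in feedback)
    return hits / len(feedback)


def evolve(records, feedback, default, generations=20, pop_size=8, seed=0):
    """Simple elitist evolutionary search over (prefix_len, threshold)."""
    rng = random.Random(seed)

    def mutate(cfg):
        k, t = cfg
        k = min(6, max(0, k + rng.choice([-1, 0, 1])))
        t = min(1.0, max(0.0, t + rng.uniform(-0.15, 0.15)))
        return (k, t)

    # The default configuration is always part of the initial population.
    population = [default] + [mutate(default) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, records, feedback), reverse=True)
        survivors = ranked[: pop_size // 2]  # the best half survives unchanged
        children = [mutate(rng.choice(survivors)) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda c: fitness(c, records, feedback))


if __name__ == "__main__":
    records = ["Jon Smith", "John Smith", "J. Smith", "Mary Jones", "Marie Jones", "Mark Johnson"]
    # Feedback triples: (record index, record index, is_duplicate).
    feedback = [(0, 1, True), (0, 2, True), (3, 4, True), (3, 5, False), (1, 3, False)]
    default = (3, 0.95)
    best = evolve(records, feedback, default)
    print("default fitness:", fitness(default, records, feedback))
    print("best config:", best, "fitness:", fitness(best, records, feedback))
```

Because the best half of each generation survives unchanged and the default configuration is in the initial population, the returned configuration never scores worse than the default on the supplied feedback. Adding more labelled pairs changes only the `feedback` list, which is the sense in which such a search could be run in a pay-as-you-go fashion.
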
    Original language: English
    Title of host publication: Large-scale Data and Knowledge-Centered Systems
    Volume: 10120
    Publication status: Published - 16 Dec 2016

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349
