Abstract
Technical developments, such as the web of data and web
data extraction, combined with policy developments such as
those relating to open government or open science, are
leading to the availability of increasing numbers of data sources.
Indeed, given these physical sources, it is then also possible
to create further virtual sources that integrate, aggregate or
summarise the data from the original sources. As a result,
there is a plethora of data sources, from which a small subset
may be able to provide the information required to support a
task. The number and rate of change in the available sources
is likely to make manual source selection and curation by
experts impractical for many applications, leading to the
need to pursue a pay-as-you-go approach, in which crowds
or data consumers annotate results based on their
correctness or suitability, with the resulting annotations used to
inform, e.g., source selection algorithms. However, for
pay-as-you-go feedback collection to be cost-effective, it may be
necessary to select judiciously the data items on which
feedback is to be obtained. This paper describes OLBP
(Ordering and Labelling By Precision), a heuristics-based approach
to the targeting of data items for feedback to support
mapping and source selection tasks, where users express their
preferences in terms of the trade-off between precision and
recall. The proposed approach is then evaluated on two
different scenarios: mapping selection with synthetic data,
and source selection with real data produced by web data
extraction. The results demonstrate a significant reduction
in the amount of feedback required to reach user-provided
objectives when using OLBP.
Original language | English |
---|---|
Title of host publication | International Conference on Scientific and Statistical Database Management (SSDBM ’16), July 18-20, 2016, Budapest, Hungary |
DOIs | |
Publication status | Published - 2016 |