Targeted Feedback Collection for Data Source Selection with Uncertainty

Student thesis: PhD


This dissertation contributes to research on pay-as-you-go data integration by proposing an approach to targeted feedback collection (TFC) that aims to improve the cost-effectiveness of feedback collection, especially when the characteristics of the integration artefacts are uncertain. In particular, the dissertation focuses on the data source selection task in data integration. It shows how the impact of uncertainty in the evaluation of the characteristics of the candidate data sources, also known as data criteria, can be reduced in a cost-effective manner, thereby improving the solutions to the data source selection problem. It also shows that alternative approaches, such as active learning and simple heuristics, have drawbacks, and that these drawbacks shed light on the pursuit of better solutions to the problem. The dissertation describes the resulting TFC strategy and reports on its evaluation against alternative techniques. The evaluation scenarios range from synthetic data sources with a single criterion and reliable feedback to real data sources with multiple criteria and unreliable feedback (such as can be obtained through crowdsourcing). The results confirm that the proposed TFC approach is cost-effective and leads to improved solutions for data source selection by seeking feedback that reduces uncertainty about the data criteria of the candidate data sources.
Date of Award: 1 Aug 2018
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Norman Paton & Alvaro Fernandes


Keywords
  • Pay-as-you-go
  • Data integration
  • Data source selection
  • Schema mapping selection
  • Feedback collection
  • Optimisation
  • Crowd-sourcing
  • Uncertainty handling
