Abstract
At the core of many data analysis processes lies the challenge of properly gathering and transforming data. This problem is known as data wrangling, and it becomes more challenging still when the data sources to be transformed are heterogeneous and autonomous, i.e., have different origins, and when the output is meant to be used as a training dataset, which makes it paramount for the dataset to be fair. Given the rising use of artificial intelligence (AI) systems in a variety of domains, fairness issues must be taken into account while building these systems. In this paper, we aim to bridge the gap between gathering the data and making the datasets fair by proposing a method for performing data wrangling while considering fairness. To this end, our method comprises a data wrangling pipeline whose behaviour can be adjusted through a set of parameters. Based on fairness metrics computed on the output datasets, the system plans a set of data wrangling interventions with the aim of lowering the bias in the output dataset. The system uses Tabu Search to explore the space of candidate interventions. We consider two potential sources of dataset bias: unequal representation of sensitive groups, and hidden biases through proxies for sensitive attributes. The approach is evaluated empirically.
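As a rough illustration of how a Tabu Search over wrangling interventions might be structured, the sketch below searches over which data sources to include so as to reduce unequal representation of two sensitive groups. It is not the paper's implementation: the per-source group counts, the `imbalance` metric, the include/exclude flags, and all function names are hypothetical assumptions chosen to keep the example small.

```python
# Hypothetical per-source row counts for two sensitive groups (group_a, group_b).
SOURCES = [(80, 20), (30, 70), (50, 50), (90, 10)]


def imbalance(config):
    """Assumed bias metric: distance of group A's share from parity (0.5)."""
    a = sum(s[0] for s, keep in zip(SOURCES, config) if keep)
    b = sum(s[1] for s, keep in zip(SOURCES, config) if keep)
    if a + b == 0:
        return 1.0  # an empty dataset is treated as the worst case
    return abs(a / (a + b) - 0.5)


def neighbours(config):
    """Candidate interventions: flip the include flag of exactly one source."""
    for i in range(len(config)):
        yield config[:i] + (1 - config[i],) + config[i + 1:]


def tabu_search(initial, cost, iterations=50, tabu_size=5):
    """Basic Tabu Search: always move to the best non-tabu neighbour,
    even if it is worse, and remember the best configuration seen."""
    current = best = initial
    best_cost = cost(best)
    tabu = [initial]  # short-term memory of recently visited configurations
    for _ in range(iterations):
        candidates = [n for n in neighbours(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # forget the oldest entry
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


start = (1, 1, 1, 1)  # begin with every source included
best, best_cost = tabu_search(start, imbalance)
print(best, round(best_cost, 4))
```

The tabu list is what distinguishes this from plain hill climbing: by forbidding recently visited configurations, the search can accept a temporarily worse dataset and move out of a local minimum. A realistic cost would likely also penalise discarded data, which this toy metric ignores.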
Original language | English |
---|---|
Title of host publication | 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI) |
Pages | 341-348 |
DOIs | |
Publication status | E-pub ahead of print - 10 Sept 2020 |
Event | 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI) - Las Vegas, NV, USA. Duration: 11 Aug 2020 → 13 Aug 2020 |
Conference
Conference | 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI) |
---|---|
Period | 11/08/20 → 13/08/20 |
Fingerprint
Dive into the research topics of 'Fairness in Data Wrangling'. Together they form a unique fingerprint.
Projects
- Value Added Data Systems: Principles and Architecture
  Paton, N. (PI), Fernandes, A. (CoI) & Keane, J. (CoI)
  1/04/15 → 30/09/20
  Project: Research (finished)