Fairness in Data Wrangling

Lacramioara Mazilu, Norman W. Paton, Nikolaos Konstantinou, Alvaro A.A. Fernandes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

At the core of many data analysis processes lies the challenge of properly gathering and transforming data. This problem is known as data wrangling, and it becomes more challenging when the data sources to be transformed are heterogeneous and autonomous, i.e., have different origins, and when the output is to be used as a training dataset, which makes it paramount for the dataset to be fair. Given the rise in the use of artificial intelligence (AI) systems in a variety of domains, fairness issues must be taken into account when building these systems. In this paper, we aim to bridge the gap between gathering data and making datasets fair by proposing a method for performing data wrangling while considering fairness. To this end, our method comprises a data wrangling pipeline whose behaviour can be adjusted through a set of parameters. Based on fairness metrics computed on the output datasets, the system plans a set of data wrangling interventions with the aim of lowering the bias in the output dataset. The system uses Tabu Search to explore the space of candidate interventions. We consider two potential sources of dataset bias: unequal representation of sensitive groups, and hidden biases introduced through proxies for sensitive attributes. The approach is evaluated empirically.
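
The search loop outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy setting in which each source is just a list of sensitive-attribute values, the only intervention is including or excluding a source, and the bias measure is the gap between the largest and smallest group share. The names `representation_gap`, `wrangle` and `tabu_search` are hypothetical and chosen for illustration only.

```python
from collections import Counter, deque


def representation_gap(rows, groups):
    """Largest minus smallest group share; 0.0 means equal representation."""
    if not rows:
        return 1.0  # assumption: an empty output is treated as maximally undesirable
    counts = Counter(rows)
    shares = [counts[g] / len(rows) for g in groups]
    return max(shares) - min(shares)


def wrangle(sources, config):
    """Toy stand-in for a wrangling pipeline: union the rows of the selected sources."""
    return [row for src, use in zip(sources, config) if use for row in src]


def tabu_search(sources, groups, start, iterations=20, tenure=2):
    """Tabu search over include/exclude flags, minimising the representation gap."""

    def cost(cfg):
        return representation_gap(wrangle(sources, cfg), groups)

    current = best = tuple(start)
    best_cost = cost(best)
    tabu = deque(maxlen=tenure)  # positions flipped in the last `tenure` moves

    for _ in range(iterations):
        candidates = []
        for i in range(len(current)):
            neighbour = current[:i] + (not current[i],) + current[i + 1:]
            c = cost(neighbour)
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbour))
        if not candidates:
            break
        # Move to the best admissible neighbour, even if it is worse than `current`.
        c, i, current = min(candidates)
        tabu.append(i)
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


# Hypothetical data: three sources with different mixes of groups "a" and "b".
sources = [["a"] * 8 + ["b"] * 2, ["b"] * 6, ["a"] * 5]
best, best_cost = tabu_search(sources, ["a", "b"], start=(True, False, True))
print(best, best_cost)
```

In this toy instance the search moves from a configuration dominated by group "a" to one that keeps the first two sources and drops the third, where both groups are equally represented. The tabu list stops the search from immediately undoing a recent flip, which is what lets it accept non-improving moves without cycling.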
Original language: English
Title of host publication: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)
Pages: 341-348
Publication status: E-pub ahead of print - 10 Sept 2020
Event: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI) - Las Vegas, NV, USA
Duration: 11 Aug 2020 to 13 Aug 2020

