A Goal-Oriented Approach for Preparing a Machine-Learning Dataset to Support Business Problem Validation

Robert Ahn, Sam Supakkul, Liping Zhao, Kirthy Kolluri, Tom Hill, Lawrence Chung

Research output: Contribution to conference › Paper › peer-review

Abstract

Preparing a dataset that represents business problems
is an essential task in Machine Learning (ML). A suitable dataset
is critical to the accuracy of ML algorithms, which in turn helps
validate business problems. For example, preparing a dataset for
predicting loan default at a bank is vital to the ML project, because
bank staff may act on the predictions to mitigate the problem. However,
preparing a dataset for identifying potential business problems is
challenging. Challenges include determining possible
events leading to problems, identifying testable factors of the
events, and mapping each testable factor to data features in order to
extract relevant data from source data. ML models using irrelevant
or unimportant data may give incorrect predictions, which negatively
impacts problem validation and consequently leaves business
problems unsolved. We present a goal-oriented approach for preparing an
ML dataset to address this challenge. The approach provides an
ontology and a process for guiding data preparation. It helps
capture problematic business events, refine a business
event to find a testable factor, map a testable factor to a database
entity and features, and extract data from a database or Big data.
We illustrate the approach using a retail banking application
and a Financial database. We believe the experimental results
show that the approach supports preparing a relevant
ML dataset, helping validate business problems.
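
The four steps named in the abstract (capture a business event, refine it to a testable factor, map the factor to a database entity and features, extract the data) can be pictured with a minimal sketch. This is an illustration only, not the authors' ontology or tooling: the class names `BusinessEvent` and `Factor`, the `loan` table, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the source database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """A testable factor mapped to one entity (table) and its features (columns).
    Hypothetical structure; the paper's ontology is not reproduced here."""
    description: str
    entity: str
    features: tuple


@dataclass(frozen=True)
class BusinessEvent:
    """A problematic business event refined into testable factors (hypothetical)."""
    description: str
    factors: tuple


def build_query(factor):
    # Identifiers are interpolated directly, so this is only safe for
    # trusted, hand-written entity and feature names.
    return f"SELECT {', '.join(factor.features)} FROM {factor.entity}"


def extract(conn, event):
    # One result set per testable factor, keyed by the factor description.
    return {f.description: conn.execute(build_query(f)).fetchall()
            for f in event.factors}


# Toy source data (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_id INTEGER, amount REAL, status TEXT)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(1, 5000.0, "paid"), (2, 12000.0, "default")])

factor = Factor("loan amount is high", "loan", ("loan_id", "amount", "status"))
event = BusinessEvent("customer defaults on a loan", (factor,))
dataset = extract(conn, event)
```

Keeping the event-to-factor-to-feature mapping as explicit data, rather than burying it in ad hoc queries, is one way to make visible why each column ends up in the dataset.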
Original language: English
Publication status: Published - 2021
Event: IEEE BigData 2021: IEEE International Conference on Big Data
Duration: 15 Dec 2021 to 18 Dec 2021

Conference

Conference: IEEE BigData 2021
Period: 15/12/21 to 18/12/21
