Abstract
Preparing a dataset representing business problems
is an essential task in Machine Learning (ML). A suitable dataset
is critical to accurate ML models, which in turn help validate business
problems. For example, preparing a dataset for predicting
loan default at a bank is vital to the ML project, as bank
staff may then take action to mitigate the problem. However,
preparing a dataset for identifying potential business problems is
challenging. The challenges include determining possible
events leading to problems, identifying testable factors of the
events, and mapping each testable factor to data features to extract
relevant data from source data. ML models trained on irrelevant
or unimportant data may give incorrect predictions, which undermines
problem validation and consequently leaves business
problems unsolved. We present a goal-oriented approach for preparing an
ML dataset to address these challenges. The approach provides an
ontology and a process for guiding data preparation. In addition,
it helps capture problematic business events, refine a business
event to find a testable factor, map a testable factor to a database
entity and features, and extract data from a database or big data source.
We illustrate the approach using a retail banking application
and a Financial database. We believe the experimental results
show that the approach supports preparing a relevant
ML dataset, helping validate business problems.
Original language | English |
---|---|
Publication status | Published - 2021 |
Event | IEEE BigData 2021: IEEE International Conference on Big Data - Duration: 15 Dec 2021 → 18 Dec 2021 |