A heterogeneous dataset can be defined as a dataset having diverse types of features that describe a given instance or object. Each of these features represents a piece of valuable information. Current regression models limit the number of features that can be processed at one time, so only a subset of the information is considered. Consequently, the regression analysis works with incomplete data descriptions, suffers significant information loss, and misses important relationships between features. With the rapidly increasing use of datasets containing mixed data types, current learning techniques for this kind of dataset include pre-processing and learning phases. The former focuses on unifying data types by converting them into categorical or numerical inputs or by defining distance measures. The resulting data can then be used in learning models suited to its types. However, this approach to mixed data types can lead to a lack of compatibility. It may also produce very high-dimensional data, which can overload the computational capacity of the learning model. This study focuses on developing a regression model, based on a radial basis function, that can handle heterogeneous datasets. Three main solutions are proposed. The first defines a heterogeneous distance measure and then uses it to train a radial basis function network. The second removes the need to define a distance measure or to unify data types, through the development of a heterogeneous radial basis function regression model that can learn directly from heterogeneous data. As each feature type has its own characteristics and has been widely explored in the literature, the third solution is a hybrid regression model that combines multiple regression models. With this strategy, information can be extracted efficiently, and underlying knowledge is revealed optimally by developing a model for each data type.
The three proposed models were evaluated on a set of mixed numerical and categorical datasets and on social media prediction data containing numerical, categorical, and textual features. Their results were compared with those of well-known regression models, such as random forest, support vector regression, and linear regression. The best results were achieved by the hybrid regression model, where the learner's performance was significantly increased. This model proved effective, and its results showed that with suitable models and simple approaches, heterogeneous data learning problems can be solved quite easily.
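To make the first solution concrete, the following is a minimal sketch of a radial basis function regressor driven by a heterogeneous distance. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the distance measure, so a HEOM-style metric (overlap distance for categorical features, range-normalised absolute difference for numerical ones) is assumed here. The function names, the choice of all training points as centres, the Gaussian width, and the toy data are likewise hypothetical.

```python
import numpy as np

def heom_distance(a, b, is_categorical, ranges):
    """HEOM-style distance between two mixed-type rows (assumed metric)."""
    total = 0.0
    for j, cat in enumerate(is_categorical):
        if cat:
            # Overlap distance: 0 if the categories match, 1 otherwise.
            d = 0.0 if a[j] == b[j] else 1.0
        else:
            # Absolute difference normalised by the feature's range.
            d = abs(float(a[j]) - float(b[j])) / ranges[j]
        total += d * d
    return float(np.sqrt(total))

def rbf_design_matrix(X, centers, is_categorical, ranges, width):
    """Gaussian activations of every row in X against every centre."""
    phi = np.empty((len(X), len(centers)))
    for i, x in enumerate(X):
        for k, c in enumerate(centers):
            d = heom_distance(x, c, is_categorical, ranges)
            phi[i, k] = np.exp(-(d ** 2) / (2.0 * width ** 2))
    return phi

def fit_rbf(X, y, centers, is_categorical, ranges, width):
    """Solve for the output-layer weights by linear least squares."""
    phi = rbf_design_matrix(X, centers, is_categorical, ranges, width)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

def predict_rbf(X, centers, weights, is_categorical, ranges, width):
    """Predict targets as a weighted sum of the Gaussian activations."""
    phi = rbf_design_matrix(X, centers, is_categorical, ranges, width)
    return phi @ weights

# Toy data (hypothetical): one numerical and one categorical feature.
X_train = [(1.0, "red"), (2.0, "blue"), (3.0, "red"), (4.0, "blue")]
y_train = np.array([1.0, 2.5, 3.0, 4.5])
is_categorical = [False, True]
ranges = [3.0, 1.0]   # max - min for the numerical feature; unused for categorical
width = 0.5           # Gaussian width, chosen arbitrarily for the sketch

centers = X_train     # assumption: every training point acts as a centre
weights = fit_rbf(X_train, y_train, centers, is_categorical, ranges, width)
predictions = predict_rbf(X_train, centers, weights, is_categorical, ranges, width)
```

With every training point used as a centre, the network interpolates the training targets. A practical model would pick fewer centres (for example by clustering under the same distance) and tune the width on held-out data.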
| Field | Value |
|---|---|
| Date of Award | 31 Dec 2022 |
| Original language | English |
| Awarding Institution | The University of Manchester |
| Supervisor | Xiaojun Zeng (Supervisor) & Ke Chen (Supervisor) |
- Mixed data type
- Heterogeneous Distance Measure
- Radial Basis Network
- Heterogeneous data
- Social Media
- Hybrid model
- Regression models
Design and Learning Hybrid Radial Basis Function Networks for Heterogeneous Data
Alghanmi, N. (Author). 31 Dec 2022
Student thesis: PhD