Ensemble Synthetic Oversampling with Manhattan Distance for Unbalanced Hyperspectral Data

Tajul Miftahushudur, Bruce Grieve, Hujun Yin

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Hyperspectral imaging is a spectroscopic imaging technique that covers a broad range of electromagnetic wavelengths and subdivides it into many spectral bands. As a consequence, it can distinguish specific features more effectively than conventional colour cameras. The technology is increasingly used in agriculture for applications such as estimating crop leaf area index, plant classification and disease monitoring. However, the abundance of information in hyperspectral imagery can cause a high-dimensionality problem, leading to computational complexity and storage issues. Data availability is another major issue. In agricultural applications it is typically difficult to collect an equal number of samples per class, as some classes or diseases are rare while others are abundant and easy to collect. The resulting class imbalance can severely reduce machine learning performance and bias performance measurement. In this paper, an oversampling method is proposed based on the Safe-Level synthetic minority oversampling technique (Safe-Level SMOTE), whose k-nearest neighbours (KNN) function is modified, using Manhattan distance, so that it suits high-dimensional data better. Using convolutional neural networks (CNN) as the classifier, combined with ensemble bagging with a differentiated sampling rate (DSR), the approach outperforms the other state-of-the-art methods compared in handling imbalanced data.
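A minimal sketch of the oversampling idea, assuming the KNN modification amounts to replacing the Euclidean distance in Safe-Level SMOTE's neighbour search with Manhattan (L1) distance, as the title suggests. It follows the published Safe-Level SMOTE rule: a point's safe level is the number of minority samples among its k nearest neighbours, and the interpolation gap toward a minority neighbour is drawn from a range set by the ratio of the two safe levels. The function names `l1_neighbours` and `safe_level_smote_l1`, the default `k=5` and the brute-force distance matrix are hypothetical illustrative choices, not the authors' code; the CNN classifier and the DSR bagging ensemble are not shown.

```python
import numpy as np


def l1_neighbours(X, k):
    """Indices of the k nearest neighbours of every row of X under
    Manhattan (L1, city-block) distance, excluding the row itself."""
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise L1 distances, shape (n, n): fine for a sketch,
    # memory-hungry for large hyperspectral data sets.
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def safe_level_smote_l1(X, y, minority_label, k=5, rng=None):
    """Hypothetical Safe-Level SMOTE variant with L1 neighbour search.

    Generates at most one synthetic sample per minority sample."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    min_idx = np.flatnonzero(is_min)
    if len(min_idx) < 2:
        raise ValueError("need at least two minority samples")

    # Safe level: number of minority samples among a point's k neighbours
    # in the whole data set.
    safe_level = is_min[l1_neighbours(X, k)].sum(axis=1)
    # Candidate interpolation partners: neighbours within the minority class.
    nn_min = l1_neighbours(X[min_idx], min(k, len(min_idx) - 1))

    n_feat = X.shape[1]
    synthetic = []
    for row, p_idx in enumerate(min_idx):
        n_idx = min_idx[rng.choice(nn_min[row])]
        slp, sln = safe_level[p_idx], safe_level[n_idx]
        if slp == 0 and sln == 0:
            continue  # both points look like noise: generate nothing
        if sln == 0:
            gap = np.zeros(n_feat)  # ratio is infinite: copy p
        else:
            ratio = slp / sln
            if ratio == 1:
                gap = rng.uniform(0.0, 1.0, n_feat)
            elif ratio > 1:
                gap = rng.uniform(0.0, 1.0 / ratio, n_feat)  # stay near p
            else:
                gap = rng.uniform(1.0 - ratio, 1.0, n_feat)  # move toward n
        p, n = X[p_idx], X[n_idx]
        synthetic.append(p + gap * (n - p))  # per-feature interpolation
    return np.array(synthetic).reshape(-1, n_feat)


# L1 and L2 can disagree on the nearest neighbour of (0, 0):
X_demo = [[0, 0], [2, 2], [3, 0]]
print(l1_neighbours(X_demo, 1)[0])  # [2]: L1 picks (3, 0); Euclidean would pick (2, 2)
```

Because every gap lies in [0, 1] for each feature, each synthetic sample stays inside the box spanned by two minority samples. One commonly cited motivation for L1 over L2 in many-band data is that L1 distances tend to keep nearest and farthest neighbours more distinguishable as dimensionality grows; whether that is the authors' exact rationale is an assumption here.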
Original language: English
Title of host publication: Proceedings of International Conference on Intelligent Data Engineering and Automated Learning
Publisher: Springer Nature
Pages: 54-64
Volume: 13113
Publication status: Published, 23 Nov 2021
