Abstract
Feature selection is the data analysis process of selecting a smaller, curated subset of the original dataset by filtering out features that are irrelevant or redundant. The most important features can be ranked and selected based on statistical measures such as mutual information. Feature selection not only reduces the size of the dataset and the execution time for training Machine Learning (ML) models, but can also improve inference accuracy. This paper analyses mutual-information-based feature selection for resource-constrained FPGAs and proposes FINESSD, a novel approach that can be deployed for near-storage acceleration. It highlights that the Mutual Information Maximization (MIM) algorithm does not require multiple passes over the data and, when approximated appropriately, offers a good trade-off between accuracy and FPGA resources. The FPGA accelerator for MIM generated by FINESSD can fully utilize the NVMe bandwidth of a modern SSD and perform feature selection without transferring the full dataset to the main processor. An evaluation on a Samsung SmartSSD over small, large, and out-of-core datasets shows that, compared to mainstream multiprocessing Python ML libraries and an optimized C library, FINESSD yields up to 35x and 19x speedup, respectively, while being more than 70x more energy efficient for large, out-of-core datasets.
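The FPGA design itself is not reproduced here, but the selection criterion the abstract describes can be illustrated in software. The following is a minimal Python sketch of MIM feature ranking, assuming a histogram-based mutual information estimate; the function names, binning, and toy data are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of Mutual Information Maximization (MIM) feature selection.
# Names, binning, and the toy dataset are illustrative assumptions only.
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based estimate of I(X; Y) in bits for a single feature."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint distribution P(X, Y)
    px = pxy.sum(axis=1, keepdims=True)      # marginal P(X)
    py = pxy.sum(axis=0, keepdims=True)      # marginal P(Y)
    nz = pxy > 0                             # avoid log(0) terms
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def mim_select(X, y, k):
    """Rank every feature by I(X_i; y) and return the top-k feature indices."""
    scores = np.array([mutual_information(X[:, i], y) for i in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

# Toy usage: 1000 samples, 8 features, binary label driven by feature 0.
rng = np.random.default_rng(0)
X = rng.integers(0, 16, size=(1000, 8)).astype(float)
y = (X[:, 0] > 7).astype(float)
selected, scores = mim_select(X, y, k=3)
print(selected, np.round(scores, 3))
```

Because each feature's score depends only on its own histogram against the labels, the whole ranking can be accumulated in a single streaming pass over the data, which is the property the abstract credits for making MIM a good fit for near-storage acceleration.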
Original language | English
---|---
Pages | 173-184
DOIs |
Publication status | Accepted/In press - 2024
Event | 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024 - Orlando, United States. Duration: 5 May 2024 → 8 May 2024
Conference
Conference | 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024
---|---
Abbreviated title | FCCM 2024
Country/Territory | United States
City | Orlando
Period | 5/05/24 → 8/05/24