TY - JOUR
T1 - Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars
AU - Hosenie, Zafiirah
AU - Lyon, Robert
AU - Stappers, Benjamin
AU - Mootoovaloo, Arrykrishna
PY - 2019/7/25
Y1 - 2019/7/25
N2 - Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Surveys (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use Information Theory for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced-accuracy and geometric means. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of 99 per cent for the classification of δ-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved via converting a 'flat-multi-class' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of cepheids, RR Lyrae and eclipsing binary stars in CRTS data.
AB - Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Surveys (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use Information Theory for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced-accuracy and geometric means. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of 99 per cent for the classification of δ-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved via converting a 'flat-multi-class' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of cepheids, RR Lyrae and eclipsing binary stars in CRTS data.
KW - stars: variables: general
KW - methods: data analysis
KW - Astronomical instrumentation, methods, and techniques
U2 - 10.1093/mnras/stz1999
DO - 10.1093/mnras/stz1999
M3 - Article
SN - 1365-2966
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
ER -