In machine learning, supervised problems fall into one of two categories: classification or regression. Within classification, several of the metrics and graphs used to assess model performance apply only to problems that compute a decision boundary between two classes (binary classification). With the wider adoption of machine learning, organizations now find themselves determining decision boundaries among several classes (multi-class classification). The question that usually arises is: how can one set up a multi-class problem and assess its performance? Although extensions of binary performance metrics exist for this situation, there are a number of challenges worth considering. Because they suffer from limitations such as insufficient data samples and class imbalance, multi-class experiments can be unreliable for several machine learning problems. As a workaround, we compare and contrast several approaches to recasting a multi-class classification problem as binary classification. We further elucidate the best experimental design for assessing the final decisions of our model(s). The experiments in this case study are applied to determine the taxonomic levels of several COVID-19 viral genomes and to identify the pathogenic strains based on digital signal and chaos-inspired features.
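The abstract does not name the specific transformation approaches compared in the talk. As an illustrative sketch only, two standard ones are one-vs-rest (each class against all others) and one-vs-one (each pair of classes). The function names and the toy labels "A", "B", "C" below are hypothetical and stand in for real taxonomic classes; no genome data or model is involved.

```python
from itertools import combinations


def one_vs_rest_labels(labels):
    """Build one binary label vector per class: 1 for that class, 0 for every other class."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one_subsets(labels):
    """Build one binary problem per class pair (a, b).

    Each problem keeps only the samples of classes a and b, stored as
    (original_index, binary_label) with 1 for class a and 0 for class b.
    """
    classes = sorted(set(labels))
    return {
        (a, b): [(i, 1 if y == a else 0) for i, y in enumerate(labels) if y in (a, b)]
        for a, b in combinations(classes, 2)
    }


# Hypothetical toy labels (assumption: three classes, deliberately imbalanced)
labels = ["A", "B", "C", "A", "B", "A"]

ovr = one_vs_rest_labels(labels)   # 3 binary problems, each using all 6 samples
ovo = one_vs_one_subsets(labels)   # 3 * (3 - 1) / 2 = 3 binary problems on subsets

for cls, binary in ovr.items():
    print(f"{cls} vs rest: {binary}")
for pair, subset in ovo.items():
    print(f"{pair[0]} vs {pair[1]}: {subset}")
```

The sketch also hints at the trade-off the abstract raises: with K classes, one-vs-rest yields K problems whose positive class is often a small minority, while one-vs-one yields K(K-1)/2 problems that each see only a fraction of the samples.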
Talk Main Points:
* What is multi-class classification?
* Compare and contrast the performance of multi-class and binary class problems
* Transforming a multi-class problem into a binary class problem
* Assessing limitations of each transformation approach in the process of COVID-19 viral taxonomy classification
Rishov Chatterjee, City of Hope, Data Scientist