BJPS

Distributed Multi-Feature Selection-Based Model for Microarray Data

Because feature selection algorithms scale poorly when applied in a centralized manner, most classification algorithms perform sub-optimally in the presence of irrelevant and redundant features in high-dimensional datasets with large feature sets and few instances, such as microarray data. Although removing insignificant features is essential for improving learning, the process is complex and time-consuming. This paper proposes a distributed learning method (DLM) that horizontally partitions the data while preserving the class distribution, and measures data complexity and the stability of the selected feature subsets to achieve scalability. Min-max normalization, mean imputation and minority oversampling were applied to the dataset to balance the feature-to-sample ratio. Three common feature selection algorithms (information gain, gain ratio and chi-square) and three classifiers (SVM, decision tree and Naive Bayes) were used to demonstrate the adequacy of the model. The study obtained 99.67% accuracy with significant reductions in runtime and reduct (feature subset) size compared with existing models. The findings suggest that the model holds promise for further improvements in accuracy, runtime and reduct size.
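As a rough illustration of the workflow described in the abstract (preprocessing, class-preserving horizontal partitioning, per-partition feature ranking, then classification on the merged subset), the sketch below uses scikit-learn on synthetic data. The choice of chi-square as the local ranker, the union of each partition's top-k features as the merge rule, Gaussian Naive Bayes as the classifier, and the omission of minority oversampling and the complexity/stability measures are simplifying assumptions for illustration only; this is not the authors' implementation and does not reproduce the reported results.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import chi2
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in for microarray data: few samples, many features.
X, y = make_classification(n_samples=200, n_features=2000, n_informative=30,
                           random_state=0)

# Preprocessing steps named in the abstract: mean imputation and min-max scaling.
X = SimpleImputer(strategy="mean").fit_transform(X)
X = MinMaxScaler().fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Horizontal partitioning that preserves class distribution: the test folds of
# a StratifiedKFold split serve as disjoint, class-balanced sample partitions.
n_partitions, top_k = 5, 50
selected = set()
skf = StratifiedKFold(n_splits=n_partitions, shuffle=True, random_state=0)
for _, part_idx in skf.split(X_tr, y_tr):
    Xp, yp = X_tr[part_idx], y_tr[part_idx]
    # Rank features locally with chi-square (information gain and gain ratio
    # would slot in the same way) and keep each partition's top-k indices.
    scores, _ = chi2(Xp, yp)
    selected.update(np.argsort(scores)[-top_k:].tolist())

reduct = sorted(selected)  # merged feature subset ("reduct")
clf = GaussianNB().fit(X_tr[:, reduct], y_tr)
acc = accuracy_score(y_te, clf.predict(X_te[:, reduct]))
print(f"reduct size: {len(reduct)}, hold-out accuracy: {acc:.3f}")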
    

Author(s)
E. C. IGODAN
U. D. GEORGE
E. PHILEMON
Volume
1
Keyword(s)
Feature selection
Distributed learning method
Machine learning
Fisher's discriminant ratio
Scalability
Year
2024
Page Number
52-74
Upload
BJPS(2) Paper 4.pdf (779.58 KB)