BJPS

Distributed Multi-Feature Selection-Based Model for Microarray Data

Because feature selection algorithms scale poorly when applied in a centralized manner, most classification algorithms perform sub-optimally in the presence of irrelevant and redundant features in high-dimensional datasets with large feature sets and few instances, such as microarray data. Although removing insignificant features is essential for improving learning, the process is complex and time-consuming. This paper proposes a distributed learning method (DLM) that horizontally partitions the data while preserving the class distribution, and measures data complexity and the stability of the selected feature subsets to achieve scalability. Min-max normalization, mean imputation and minority oversampling were applied to the dataset to balance the feature-to-sample ratio. Three common feature selection algorithms (information gain, gain ratio and chi-square) and three classifiers (SVM, decision tree and Naive Bayes) were used to demonstrate the adequacy of the model. The study obtained 99.67% accuracy with significant reductions in runtime and reduct (feature subset) size compared with existing models. The findings suggest that the model holds promise for further improvements in accuracy, runtime and reduct size.
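As a rough illustration of the workflow described in the abstract (preprocessing, class-preserving horizontal partitioning, per-partition feature ranking, then classification on the merged subset), the sketch below uses scikit-learn on synthetic data. The choice of chi-square as the local ranker, the union of each partition's top-k features as the merge rule, Gaussian Naive Bayes as the classifier, and the omission of minority oversampling and the complexity/stability measures are simplifying assumptions for illustration only; this is not the authors' implementation and does not reproduce the reported results.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import chi2
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in for microarray data: few samples, many features.
X, y = make_classification(n_samples=200, n_features=2000, n_informative=30,
                           random_state=0)

# Preprocessing steps named in the abstract: mean imputation and min-max scaling.
X = SimpleImputer(strategy="mean").fit_transform(X)
X = MinMaxScaler().fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Horizontal partitioning that preserves class distribution: the test folds of
# a StratifiedKFold split serve as disjoint, class-balanced sample partitions.
n_partitions, top_k = 5, 50
selected = set()
skf = StratifiedKFold(n_splits=n_partitions, shuffle=True, random_state=0)
for _, part_idx in skf.split(X_tr, y_tr):
    Xp, yp = X_tr[part_idx], y_tr[part_idx]
    # Rank features locally with chi-square (information gain and gain ratio
    # would slot in the same way) and keep each partition's top-k indices.
    scores, _ = chi2(Xp, yp)
    selected.update(np.argsort(scores)[-top_k:].tolist())

reduct = sorted(selected)  # merged feature subset ("reduct")
clf = GaussianNB().fit(X_tr[:, reduct], y_tr)
acc = accuracy_score(y_te, clf.predict(X_te[:, reduct]))
print(f"reduct size: {len(reduct)}, hold-out accuracy: {acc:.3f}")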
    

Author(s)
E. C. IGODAN
U. D. GEORGE
E. PHILEMON
Volume
1
Keyword(s)
Feature selection
Distributed learning method
Machine learning
Fisher's discriminant ratio
Scalability
Year
2024
Page Number
52-74
Upload
BJPS(2) Paper 4.pdf (779.58 KB)