IEEE Access (Jan 2019)

Input Feature Selection Method Based on Feature Set Equivalence and Mutual Information Gain Maximization

  • Xinzheng Wang,
  • Bing Guo,
  • Yan Shen,
  • Chimin Zhou,
  • Xuliang Duan

DOI
https://doi.org/10.1109/ACCESS.2019.2948095
Journal volume & issue
Vol. 7
pp. 151525 – 151538

Abstract


Feature selection is the first and essential step of dimension reduction in many application areas, such as data mining and machine learning, owing to its computational efficiency and the interpretability of its results. This paper focuses on feature selection methods based on information theory. By studying and analyzing the ideas and drawbacks of existing feature selection methods, we find that evaluating a candidate feature only by its individual relationship with the predicted class vector may lead to problems. We believe that the comprehensive discriminative ability of the candidate feature, when combined with the already selected features, should be taken as its evaluation index. Therefore, we propose a novel feature selection method in this paper. In the proposed method, we introduce the concept of equivalent partitions and adopt a mutual information gain maximization (MIGM) criterion to evaluate candidate features. To evaluate the performance of MIGM, we conducted experiments on ten benchmark datasets with two different classifiers, k-Nearest Neighbor (KNN) and Naïve Bayes (NB). Extensive experimental results demonstrate that our method identifies effective feature subsets that lead to better classification results than other methods.
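The abstract describes scoring each candidate feature by the discriminative ability of the combined (selected + candidate) feature set rather than by the candidate alone. The following is a minimal, illustrative sketch of a greedy forward-selection loop in that spirit, where the gain is the increase in empirical mutual information between the joint feature partition and the class labels. It is not the authors' exact algorithm; the function names, the assumption of already-discretized features, and the fixed subset size k are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's algorithm): greedy forward selection that
# scores a candidate by the mutual-information gain of the combined feature set
# with the class labels. Assumes discrete (or pre-discretized) features.
import numpy as np
from collections import Counter

def joint_mutual_information(X_cols, y):
    """Empirical mutual information I(X_cols; y) for discrete columns.

    X_cols: list of 1-D integer arrays (already-discretized features).
    y: 1-D array of class labels.
    """
    n = len(y)
    # Each sample is encoded as a tuple of its feature values; the set of distinct
    # tuples is the partition induced by the combined feature set.
    joint = list(zip(*X_cols)) if X_cols else [()] * n
    pxy = Counter(zip(joint, y))
    px = Counter(joint)
    py = Counter(y)
    mi = 0.0
    for (xv, yv), c in pxy.items():
        p_xy = c / n
        mi += p_xy * np.log2(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi

def migm_style_forward_selection(X, y, k):
    """Greedily pick k features, each maximizing the gain
    I(selected + candidate; y) - I(selected; y)."""
    selected, remaining = [], list(range(X.shape[1]))
    base_mi = 0.0
    for _ in range(k):
        best_f, best_gain = None, -np.inf
        for f in remaining:
            cols = [X[:, j] for j in selected + [f]]
            gain = joint_mutual_information(cols, y) - base_mi
            if gain > best_gain:
                best_f, best_gain = f, gain
        selected.append(best_f)
        remaining.remove(best_f)
        base_mi += best_gain
    return selected
```

Under these assumptions, the per-candidate gain equals the empirical conditional mutual information between the candidate and the labels given the already selected features, which is one way to read "mutual information gain maximization" over the combined feature set.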

Keywords