IEEE Access (Jan 2019)

Input Feature Selection Method Based on Feature Set Equivalence and Mutual Information Gain Maximization

  • Xinzheng Wang,
  • Bing Guo,
  • Yan Shen,
  • Chimin Zhou,
  • Xuliang Duan

DOI
https://doi.org/10.1109/ACCESS.2019.2948095
Journal volume & issue
Vol. 7
pp. 151525 – 151538

Abstract


Feature selection is the first and essential step of dimension reduction in many application areas, such as data mining and machine learning, owing to its computational efficiency and the interpretability of its results. This paper focuses on feature selection methods based on information theory. By studying and analyzing the ideas and drawbacks of existing feature selection methods, we find that evaluating a candidate feature only by its individual relationship with the predicted class vector may lead to problems. We believe that the comprehensive discriminative ability of the candidate feature, when combined with the already selected features, should be taken as its evaluation index. Therefore, we propose a novel feature selection method in this paper. In the proposed method, we introduce the concept of equivalent partitions and adopt a mutual information gain maximization (MIGM) criterion to evaluate candidate features. To evaluate the performance of MIGM, we conducted experiments on ten benchmark datasets with two different classifiers, k-Nearest Neighbor (KNN) and Naïve Bayes (NB). Extensive experimental results demonstrate that our method identifies effective feature subsets that lead to better classification results than other methods.
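The abstract describes scoring each candidate feature by the discriminative ability of the combined (selected + candidate) feature set rather than by the candidate alone. The following is a minimal, illustrative sketch of a greedy forward-selection loop in that spirit, where the gain is the increase in empirical mutual information between the joint feature partition and the class labels. It is not the authors' exact algorithm; the function names, the assumption of already-discretized features, and the fixed subset size k are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's algorithm): greedy forward selection that
# scores a candidate by the mutual-information gain of the combined feature set
# with the class labels. Assumes discrete (or pre-discretized) features.
import numpy as np
from collections import Counter

def joint_mutual_information(X_cols, y):
    """Empirical mutual information I(X_cols; y) for discrete columns.

    X_cols: list of 1-D integer arrays (already-discretized features).
    y: 1-D array of class labels.
    """
    n = len(y)
    # Each sample is encoded as a tuple of its feature values; the set of distinct
    # tuples is the partition induced by the combined feature set.
    joint = list(zip(*X_cols)) if X_cols else [()] * n
    pxy = Counter(zip(joint, y))
    px = Counter(joint)
    py = Counter(y)
    mi = 0.0
    for (xv, yv), c in pxy.items():
        p_xy = c / n
        mi += p_xy * np.log2(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi

def migm_style_forward_selection(X, y, k):
    """Greedily pick k features, each maximizing the gain
    I(selected + candidate; y) - I(selected; y)."""
    selected, remaining = [], list(range(X.shape[1]))
    base_mi = 0.0
    for _ in range(k):
        best_f, best_gain = None, -np.inf
        for f in remaining:
            cols = [X[:, j] for j in selected + [f]]
            gain = joint_mutual_information(cols, y) - base_mi
            if gain > best_gain:
                best_f, best_gain = f, gain
        selected.append(best_f)
        remaining.remove(best_f)
        base_mi += best_gain
    return selected
```

Under these assumptions, the per-candidate gain equals the empirical conditional mutual information between the candidate and the labels given the already selected features, which is one way to read "mutual information gain maximization" over the combined feature set.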

Keywords