JCP 2015 Vol.10(4): 284-291 ISSN: 1796-203X
doi: 10.17706/jcp.10.4.284-291
A Feature Selection Based on Relevance and Redundancy
Yonghe Lu, Wenqiu Liu, Yanfeng Li
Sun Yat-sen University, Guangzhou, China
Abstract—Most existing research on feature selection considers neither the relevance between a term and its own category nor the redundancy among terms. To address this problem, we analyze how relevance and redundancy can be measured and propose a new feature selection method that uses Euclidean distance as the similarity measure. The resulting algorithm, R2, obtains the optimal feature subset, one that accounts for the correlation between terms and categories and filters out redundant terms. The validity of the algorithm is verified by classification experiments on a Chinese classification corpus with two classifiers, KNN and a centroid-based classifier.
Index Terms—Text classification; feature selection; relevance; redundancy.
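The abstract does not give the definition of R2, so the following is only a generic sketch of a relevance-and-redundancy filter built on Euclidean distance, not the authors' algorithm. It assumes each term and each category is represented as a binary vector over the training documents; the function name `select_features`, the parameters `k` and `min_gap`, and the toy data are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def select_features(term_vectors, category_vectors, k, min_gap):
    """Greedy relevance/redundancy filter (illustrative assumption, not R2).

    Relevance: a term is more relevant the closer its vector lies to the
    nearest category vector. Redundancy: a term is skipped if it lies
    closer than `min_gap` to any term that was already selected.
    """
    def relevance_distance(term):
        return min(dist(term_vectors[term], c) for c in category_vectors.values())

    # Most relevant terms first; ties keep the input order (sorted is stable).
    ranked = sorted(term_vectors, key=relevance_distance)

    selected = []
    for term in ranked:
        if len(selected) == k:
            break
        if all(dist(term_vectors[term], term_vectors[s]) >= min_gap for s in selected):
            selected.append(term)
    return selected


# Toy data: four documents, the first two labelled "sports", the last two "finance".
category_vectors = {"sports": [1, 1, 0, 0], "finance": [0, 0, 1, 1]}
term_vectors = {
    "goal": [1, 1, 0, 0],   # matches "sports" exactly
    "match": [1, 1, 0, 0],  # same occurrence pattern as "goal", hence redundant
    "stock": [0, 0, 1, 1],  # matches "finance" exactly
    "the": [1, 1, 1, 1],    # occurs everywhere, so weakly relevant
}

print(select_features(term_vectors, category_vectors, k=2, min_gap=1.0))
```

In this toy setting "match" is dropped because it duplicates "goal", while "stock" is kept because it is both relevant to a category and far from the terms already chosen.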
Cite: Yonghe Lu, Wenqiu Liu, Yanfeng Li, "A Feature Selection Based on Relevance and Redundancy," Journal of Computers, vol. 10, no. 4, pp. 284-291, 2015.