Volume 10 Number 4 (July 2015)
JCP 2015 Vol.10(4): 260-267 ISSN: 1796-203X
doi: 10.17706/jcp.10.4.260-267

Feature Weighting Improvement of Web Text Categorization Based on Particle Swarm Optimization Algorithm

Yonghe Lu, Yanhong Peng
Sun Yat-sen University, Guangzhou, China
Abstract—Some structural elements of a text, such as its title, usually express its main content, and these structures may therefore influence the effectiveness of text categorization. However, the most common feature weighting algorithm, term frequency-inverse document frequency (TF-IDF), does not take the structural information of texts into account. To address this problem, a new feature weighting algorithm based on the Particle Swarm Optimization (PSO) algorithm is proposed; it considers the structural information (i.e., HTML tags) of web pages. First, web pages are crawled and pre-processed, and the content of four HTML tags is retained; second, the Chi-squared (CHI) statistic is used to select features; third, a new feature weighting algorithm, called the feature tag weighting algorithm, is proposed, in which PSO is used to calculate the tag weighting coefficients; finally, k-nearest neighbor (kNN) is used as the classifier for web text categorization. Experimental results show that the feature tag weighting algorithm outperforms TF-IDF in the effectiveness of web text categorization.
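The pipeline outlined in the abstract (CHI feature selection, a tag-weighted variant of TF-IDF whose tag coefficients are tuned by PSO, and kNN classification) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' method: the four tag names in `TAGS`, the weighting formula (each occurrence of a term counts with its tag's coefficient, and the sum is multiplied by log(N/df)), the use of leave-one-out kNN accuracy as the PSO fitness, and the PSO settings (inertia `w=0.7`, `c1=c2=1.5`, search box [0, 5]) are all hypothetical choices, because the abstract does not specify them. `chi_square` computes the standard CHI statistic N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) from a term/class contingency table.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL tag set: the abstract says four HTML tags are retained but does not name them.
TAGS = ("title", "h1", "a", "body")

def terms_of(doc):
    """All distinct tokens of a page, where a page is a dict {tag: [tokens]}."""
    return {tok for toks in doc.values() for tok in toks}

def chi_square(term_sets, labels, term, cls):
    """Standard CHI statistic of one term for one class, from the 2x2 contingency table."""
    n = len(term_sets)
    a = sum(1 for s, y in zip(term_sets, labels) if term in s and y == cls)
    b = sum(1 for s, y in zip(term_sets, labels) if term in s and y != cls)
    c = sum(1 for s, y in zip(term_sets, labels) if term not in s and y == cls)
    d = n - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, top_n):
    """Keep the top_n terms ranked by their maximum CHI score over all classes."""
    sets = [terms_of(doc) for doc in docs]
    vocab = set().union(*sets)
    score = {t: max(chi_square(sets, labels, t, c) for c in set(labels)) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_n])

def compute_idf(docs, vocab):
    """Classic IDF, log(N / df), restricted to the selected vocabulary."""
    df = Counter(t for doc in docs for t in terms_of(doc) if t in vocab)
    return {t: math.log(len(docs) / df[t]) for t in vocab if df[t]}

def vectorize(doc, coeffs, idf):
    """HYPOTHETICAL feature tag weighting: tag-weighted term count times IDF."""
    tf = Counter()
    for tag, w in zip(TAGS, coeffs):
        for tok in doc.get(tag, []):
            if tok in idf:  # ignore terms dropped by feature selection
                tf[tok] += w
    return {t: v * idf[t] for t, v in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(vec, train_vecs, train_labels, k=3):
    """Majority vote among the k training pages most cosine-similar to vec."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: cosine(vec, p[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]

def loo_accuracy(docs, labels, coeffs, idf, k=3):
    """Leave-one-out kNN accuracy on the training pages (the assumed PSO fitness)."""
    vecs = [vectorize(doc, coeffs, idf) for doc in docs]
    hits = 0
    for i, vec in enumerate(vecs):
        pred = knn_predict(vec, vecs[:i] + vecs[i + 1:], labels[:i] + labels[i + 1:], k)
        hits += pred == labels[i]
    return hits / len(docs)

def pso(fitness, dim, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, lo=0.0, hi=5.0, seed=0):
    """Global-best PSO that maximizes fitness over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest, pfit = [x[:] for x in xs], [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal best + pull toward global best.
                vs[i][j] = (w * vs[i][j] + c1 * r1 * (pbest[i][j] - xs[i][j])
                            + c2 * r2 * (gbest[j] - xs[i][j]))
                xs[i][j] = min(hi, max(lo, xs[i][j] + vs[i][j]))  # clamp to the search box
            f = fitness(xs[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = xs[i][:], f
                if f > gfit:
                    gbest, gfit = xs[i][:], f
    return gbest, gfit
```

To use the sketch, one would run `select_features` on the training pages, build `idf` with `compute_idf`, call `pso(lambda c: loo_accuracy(train, labels, c, idf), dim=len(TAGS))` to obtain tag coefficients, and then classify a new page with `knn_predict(vectorize(page, best, idf), ...)`. Setting all four coefficients to 1 reduces this weighting to ordinary (unnormalized) TF-IDF over the retained text, which corresponds to the baseline the abstract compares against.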

Index Terms—Text categorization, TF-IDF, PSO, web text, HTML tag.


Cite: Yonghe Lu, Yanhong Peng, "Feature Weighting Improvement of Web Text Categorization Based on Particle Swarm Optimization Algorithm," Journal of Computers vol. 10, no. 4, pp. 260-267, 2015.

General Information

ISSN: 1796-203X
Abbreviated Title: J.Comput.
Frequency: Bimonthly
Editor-in-Chief: Prof. Liansheng Tan
Executive Editor: Ms. Nina Lee
Abstracting/Indexing: DBLP, EBSCO, ProQuest, INSPEC, ULRICH's Periodicals Directory, WorldCat, etc.
E-mail: jcp@iap.org