JCP 2012 Vol.7(11): 2612-2616 ISSN: 1796-203X
doi: 10.4304/jcp.7.11.2612-2616
Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering
Qiu-yu Zhang1, 2, Peng Wang and Hui-juan Yang
1School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China
2Technology & Research Center of Gansu Manufacturing Informatization Engineering, Lanzhou, China
Abstract—Statistics-based spam filtering methods perform poorly on new-type spam that uses synonym substitution and camouflage, because they ignore the semantic relations between words in the text and judge only from the words themselves. This paper therefore proposes a spam filtering method based on the semantic body. The method uses lexical chains built on HowNet together with the statistics-based TFIDF method to extract e-mail features, and handles spam with a text clustering method. The experimental results show that the proposed method is effective at filtering new-type spam.
Index Terms—Semantic body, lexical chain, semantic similarity, text clustering, spam filter
Cite: Qiu-yu Zhang, Peng Wang and Hui-juan Yang, "Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering," Journal of Computers, vol. 7, no. 11, pp. 2612-2616, 2012.
General Information
ISSN: 1796-203X
Abbreviated Title: J.Comput.
Frequency: Bimonthly
Editor-in-Chief: Prof. Liansheng Tan
Executive Editor: Ms. Nina Lee
Abstracting/Indexing: DBLP, EBSCO, ProQuest, INSPEC, Ulrich's Periodicals Directory, WorldCat, etc.
E-mail: jcp@iap.org