Volume 6 Number 5 (May 2011)
JCP 2011 Vol.6(5): 923-930 ISSN: 1796-203X
doi: 10.4304/jcp.6.5.923-930

Key Information Expansion Applied in Spoken Document Classification based on Lattice

Lei Zhang, Zhuo Zhang, Xue-Zhi Xiang
Information and Communication Engineering College, Harbin Engineering University, Harbin, China
Abstract—Traditionally, query words or key words in spoken document classification are generated manually. In this paper, a new hybrid feature for key information extraction is proposed, based on the CHI-square, TFIDF, and maximum posterior probability (MPP) features. It combines the advantages of these three features, and the weight of each word in the hybrid feature can be further integrated into the classification system. The weights of the key words reveal, to some extent, the relationship between words and topic. Furthermore, when the query words or key words are insufficient, a key information expansion step based on a focus score can be added to uncover latent information about the topic. In this expansion step, not only the documents in which key words occur but also the documents containing no key word take part in the expansion procedure. Additionally, document length is adopted in the classification system as prior information when no query word is found. The whole classification system is based on lattices, which carry more information than the 1-best result of a speech recognition system. Among CHI-square, TFIDF, and MPP, the system performance of MPP is slightly worse than that of the others, and CHI-square is slightly better than TFIDF as the number of key words increases. Among these features, the hybrid feature obtains almost the best performance under the same conditions. Combined with document length information, the classification system's performance is further enhanced, especially when little key information is available. Experiments show that when the system combines weight and document length information, the hybrid feature obtains the best performance, with a MAP of 0.7817 using 50 key words. When key information is insufficient, key information expansion improves system performance in the cases tested here, with only 1, 5, or 10 key words.
In the proposed key information expansion approach, since the focus factor is introduced to adjust the effect of documents with no key words, some empty words can be avoided to some extent, and the number of expansion words can be kept under control.
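
The abstract names the ingredients of the hybrid feature but not the rule that combines them. As a rough illustration only, the Python sketch below computes the textbook CHI-square and TFIDF scores for a word and merges several per-word score tables by min-max normalization followed by an unweighted average. The combination rule and the function names (`chi_square`, `tf_idf`, `min_max`, `hybrid_weights`) are assumptions made for this example, not the authors' method, and the MPP scores, which would come from the recognition lattice, are not computed here; they would simply be passed in as one more score table.

```python
import math


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 word/topic contingency table.

    n11: on-topic documents containing the word
    n10: off-topic documents containing the word
    n01: on-topic documents without the word
    n00: off-topic documents without the word
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def tf_idf(word, topic_docs, all_docs):
    """Term frequency within the topic times log inverse document frequency.

    Documents are lists of word strings (e.g. 1-best transcripts).
    """
    tf = sum(doc.count(word) for doc in topic_docs)
    df = sum(1 for doc in all_docs if word in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(all_docs) / df)


def min_max(scores):
    """Rescale a {word: score} table to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}


def hybrid_weights(*feature_scores):
    """ASSUMED combination rule: average of min-max normalized scores.

    Every table must cover the same vocabulary.
    """
    normalized = [min_max(s) for s in feature_scores]
    words = normalized[0].keys()
    return {w: sum(n[w] for n in normalized) / len(normalized) for w in words}


if __name__ == "__main__":
    chi = {"goal": 20.0, "match": 4.0, "the": 0.0}
    tfidf = {"goal": 1.4, "match": 0.7, "the": 0.0}
    weights = hybrid_weights(chi, tfidf)
    # Rank candidate key words by their hybrid weight, highest first.
    for word in sorted(weights, key=weights.get, reverse=True):
        print(word, round(weights[word], 3))
```

Normalizing before averaging keeps one feature from dominating merely because its raw scores live on a larger scale; a weighted average or a rank-based merge would be equally plausible choices under the same assumptions.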

Index Terms—Hybrid feature, key information extraction, document length, spoken document classification, lattice


Cite: Lei Zhang, Zhuo Zhang, Xue-Zhi Xiang, "Key Information Expansion Applied in Spoken Document Classification based on Lattice," Journal of Computers, vol. 6, no. 5, pp. 923-930, 2011.
