Volume 6 Number 5 (May 2011)
JCP 2011 Vol.6(5): 905-912 ISSN: 1796-203X
doi: 10.4304/jcp.6.5.905-912

Multiple Linear Regression for Extracting Phrase Translation Pairs

Chun-Xiang Zhang¹, Ming-Yuan Ren¹, Zhi-Mao Lu², Ying-Hong Liang³, Da-Song Sun⁴, Yong Liu⁵
¹School of Software, Harbin University of Science and Technology, Harbin, China
²College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
³School of Computer Engineering, Vocational University of Suzhou City, Suzhou, China
⁴School of Computer Engineering, Vocational University of Suzhou City, Suzhou, China
⁵School of Computer Science and Technology, Heilongjiang University, Harbin, China


Abstract—Phrase translation pairs are useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other natural language processing applications. They are generally extracted from bilingual sentence pairs. In this paper, we extract phrase translation pairs from the word alignment results of Chinese-English bilingual sentence pairs together with the parse trees of the Chinese sentences, in order to reduce the effect of grammatical differences between Chinese and English. We propose three discriminative features for evaluating the extracted pairs: translation literality, phrase alignment probability, and phrase length difference. To improve evaluation and filtering performance, a multiple linear regression model combined with an N-best strategy is then used to filter the phrase translation pairs. Experimental results show that, of the three discriminative features, phrase alignment probability gives the best filtering performance for Chinese-English phrase translation pairs. When multiple linear regression is combined with the N-best strategy, the F1 score reaches 86.24%.

Index Terms—Phrase translation pairs, natural language processing, bilingual sentence pairs, parsing trees, discriminative features, multiple linear regression
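The abstract does not give the paper's feature formulas or the exact form of its N-best strategy, so the following is only a minimal sketch of the general filtering pipeline under stated assumptions. Each candidate pair is assumed to already carry three numeric feature values (translation literality, phrase alignment probability, phrase length difference); the regression is fit by ordinary least squares against hypothetical 0/1 correctness labels; and "N-best" is assumed to mean keeping the N highest-scoring target phrases for each source phrase. The function names `fit_mlr`, `score`, `n_best`, and `f1` and the toy data are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict


def fit_mlr(X, y):
    """Fit y ~ w0 + w1*x1 + ... + wk*xk by ordinary least squares.

    Solves the normal equations (A^T A) w = A^T y, where A is X with a
    leading column of 1s, using Gauss-Jordan elimination with partial
    pivoting. Returns [w0, w1, ..., wk]; w0 is the intercept.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        if abs(pivot) < 1e-12:
            raise ValueError("features are linearly dependent; cannot fit")
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]


def score(w, feats):
    """Linear regression score of one candidate's feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], feats))


def n_best(candidates, w, n):
    """Keep the n highest-scoring target phrases per source phrase.

    candidates: iterable of (src_phrase, tgt_phrase, features).
    This per-source N-best rule is an assumption, not the paper's definition.
    """
    by_src = defaultdict(list)
    for src, tgt, feats in candidates:
        by_src[src].append((score(w, feats), tgt))
    kept = set()
    for src, scored in by_src.items():
        scored.sort(key=lambda s: s[0], reverse=True)
        kept.update((src, tgt) for _, tgt in scored[:n])
    return kept


def f1(predicted, gold):
    """F1 of a set of kept pairs against a gold set of correct pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: (literality, alignment prob, length diff) -> 1 if correct.
    X_train = [(0.9, 0.8, 0.1), (0.7, 0.9, 0.0), (0.2, 0.1, 0.6),
               (0.3, 0.2, 0.5), (0.8, 0.6, 0.2), (0.1, 0.3, 0.7)]
    y_train = [1, 1, 0, 0, 1, 0]
    w = fit_mlr(X_train, y_train)
    candidates = [("中国", "China", (0.9, 0.9, 0.0)),
                  ("中国", "Chinese people", (0.4, 0.3, 0.5)),
                  ("经济 发展", "economic development", (0.8, 0.7, 0.0))]
    kept = n_best(candidates, w, n=1)
    gold = {("中国", "China"), ("经济 发展", "economic development")}
    print(sorted(kept), f1(kept, gold))
```

Fitting a single linear combination lets the weaker features still contribute alongside phrase alignment probability, while the per-source N-best cut limits how many candidate translations survive for any one Chinese phrase. The cutoff N would normally be tuned on held-out data.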


Cite: Chun-Xiang Zhang, Ming-Yuan Ren, Zhi-Mao Lu, Ying-Hong Liang, Da-Song Sun, Yong Liu, "Multiple Linear Regression for Extracting Phrase Translation Pairs," Journal of Computers, vol. 6, no. 5, pp. 905-912, 2011.

General Information

ISSN: 1796-203X
Abbreviated Title: J.Comput.
Frequency: Bimonthly
Editor-in-Chief: Prof. Liansheng Tan
Executive Editor: Ms. Nina Lee
Abstracting/Indexing: DBLP, EBSCO, ProQuest, INSPEC, Ulrich's Periodicals Directory, WorldCat, etc.
E-mail: jcp@iap.org