Volume 3 Number 1 (Jan. 2008)
JCP 2008 Vol.3(1): 51-62 ISSN: 1796-203X
doi: 10.4304/jcp.3.1.51-62

Analysis and Improved Recognition of Protein Names Using Transductive SVM

Masaki Murata (1), Tomohiro Mitsumori (2), Kouichi Doi (3)
(1) National Institute of Information and Communications Technology
(2) Miyazono Patent Office
(3) Pharma Security Consulting Inc.


Abstract—We first analyzed protein names using various dictionaries and databases and identified five problems: the treatment of special characters; the treatment of homonyms; cases where one protein-name string is a substring of a different protein-name string; cases where the same protein exists in different organisms; and the treatment of modifiers. We confirmed that a machine-learning approach to protein-name recognition can solve these problems, and machine-learning methods have accordingly been used in recent research on recognizing protein names. A classifier trained in a specific domain, however, can overfit and become so inflexible that it can only be used in that domain. We therefore developed a new corpus on breast cancer and investigated the flexibility of classifiers trained on the GENIA corpus [1] or the breast-cancer corpus. We used a transductive support vector machine (SVM) to avoid overfitting and evaluated the effect of transductive learning. In our experiments, the transductive SVM prevented overfitting and yielded higher accuracy than the conventional SVM. Under the “Sub” criterion that we define in this paper, the transductive SVM increased the F-score from 70.46 to 79.64 and from 70.63 to 74.61 in our two experiments.

Index Terms—overfitting, protein name recognition, biomedical literature, SVM, transductive SVM, different domain
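
For readers who want to place the reported F-scores in context, the following is a minimal Python sketch, not the authors' evaluation code. It assumes the F-score is the usual balanced harmonic mean of precision and recall, which the abstract does not state explicitly; the function name `f_score` is hypothetical, and the “Sub” matching criterion itself is defined only in the full paper and is not reproduced here. The only figures taken from the source are the four F-scores quoted in the abstract.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the paper uses this standard definition; the abstract
    does not spell it out.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F-scores quoted in the abstract for the "Sub" criterion:
# (conventional SVM, transductive SVM) for each of the two experiments.
reported = [(70.46, 79.64), (70.63, 74.61)]

# Absolute gain in F-score points from switching to the transductive SVM.
gains = [round(tsvm - svm, 2) for svm, tsvm in reported]
print(gains)
```

The gains are simple differences of the quoted values, so they say nothing about how precision and recall changed individually; that breakdown would have to come from the paper's result tables.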


Cite: Masaki Murata, Tomohiro Mitsumori, Kouichi Doi, "Analysis and Improved Recognition of Protein Names Using Transductive SVM," Journal of Computers, vol. 3, no. 1, pp. 51-62, 2008.
