4
Torii M, Li G, Li Z, Oughtred R, Diella F, Celen I, Arighi CN, Huang H, Vijay-Shanker K, Wu CH. RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. Database: The Journal of Biological Databases and Curation 2014; 2014:bau081. [PMID: 25122463 PMCID: PMC4131691 DOI: 10.1093/database/bau081] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023]
Abstract
Protein phosphorylation is central to the regulation of most aspects of cell function. Given its importance, it has been the subject of active research as well as the focus of curation in several biological databases. We have developed Rule-based Literature Mining System for protein Phosphorylation (RLIMS-P), an online text-mining tool to help curators identify biomedical research articles relevant to protein phosphorylation. The tool presents information on protein kinases, substrates and phosphorylation sites automatically extracted from the biomedical literature. The utility of the RLIMS-P Web site has been evaluated by curators from Phospho.ELM, PhosphoGRID/BioGrid and Protein Ontology as part of the BioCreative IV user interactive task (IAT). The system achieved F-scores of 0.76, 0.88 and 0.92 for the extraction of kinase, substrate and phosphorylation sites, respectively, and a precision of 0.88 in the retrieval of relevant phosphorylation literature. The system also received highly favorable feedback from the curators in a user survey. Based on the curators’ suggestions, the Web site has been enhanced to improve its usability. On the RLIMS-P Web site, phosphorylation information can be retrieved by PubMed IDs or keywords, with an option for selecting targeted species. The result page displays a sortable table with phosphorylation information. The text evidence page displays the abstract with color-coded entity mentions and includes links to UniProtKB entries via normalization, i.e. the linking of entity mentions to database identifiers, facilitated by the GenNorm tool and by the links to the bibliography in UniProt. Login and editing capabilities are offered to any user interested in contributing to the validation of RLIMS-P results. Retrieved phosphorylation information can also be downloaded in CSV format and the text evidence in the BioC format. RLIMS-P is freely available. Database URL: http://www.proteininformationresource.org/rlimsp/
Affiliation(s)
- Manabu Torii
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Gang Li
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Zhiwen Li
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Rose Oughtred
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Francesca Diella
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Irem Celen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- K Vijay-Shanker
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19711, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Structural and Computational Biology Unit, EMBL (European Molecular Biology Laboratory), 69117 Heidelberg, Germany, Department of Biochemistry, Molecular and Cellular Biology, Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA