Reference Citation Analysis

For: Tudor CO, Ross KE, Li G, Vijay-Shanker K, Wu CH, Arighi CN. Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system. Database (Oxford) 2015;2015:bav020. [PMID: 25833953 PMCID: PMC4381107 DOI: 10.1093/database/bav020] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/15/2014] [Revised: 02/17/2015] [Accepted: 02/18/2015] [Indexed: 12/11/2022]

Cited by Other Article(s)

Jehle S, Kunowska N, Benlasfer N, Woodsmith J, Weber G, Wahl MC, Stelzl U. A human kinase yeast array for the identification of kinases modulating phosphorylation-dependent protein-protein interactions. Mol Syst Biol 2022;18:e10820. [PMID: 35225431 PMCID: PMC8883442 DOI: 10.15252/msb.202110820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/19/2021] [Revised: 01/28/2022] [Accepted: 01/31/2022] [Indexed: 12/11/2022]

Lee YH, Choi D, Jang G, Park JY, Song ES, Lee H, Kuk MU, Joo J, Ahn SK, Byun Y, Park JT. Targeting regulation of ATP synthase 5 alpha/beta dimerization alleviates senescence. Aging (Albany NY) 2022;14:678-707. [PMID: 35093936 PMCID: PMC8833107 DOI: 10.18632/aging.203858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 01/14/2022] [Indexed: 11/25/2022]

Elangovan A, Li Y, Pires DEV, Davis MJ, Verspoor K. Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT. BMC Bioinformatics 2022;23:4. [PMID: 34983371 PMCID: PMC8729035 DOI: 10.1186/s12859-021-04504-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022]

Abstract

MOTIVATION

Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature by using distantly supervised training data using deep learning to aid human curation.

METHOD

We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high confidence predictions.

RESULTS AND CONCLUSION

The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filtered [Formula: see text] (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
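The filtering idea described in this abstract, keeping only predictions where the ensemble is both confident on average and in agreement across members, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_high_quality` and the thresholds `min_mean` and `max_std` are hypothetical values chosen for illustration, and the input is assumed to be one confidence score per ensemble member for the predicted class.

```python
from statistics import mean, pstdev


def select_high_quality(ensemble_probs, min_mean=0.9, max_std=0.05):
    """Return indices of predictions with high mean confidence and low spread.

    ensemble_probs: one inner list per prediction, holding the confidence
    each ensemble member assigned to the predicted class.
    min_mean and max_std are illustrative thresholds, not published values.
    """
    selected = []
    for index, probs in enumerate(ensemble_probs):
        # Keep a prediction only if the members are confident on average
        # and their confidences vary little from one member to another.
        if mean(probs) >= min_mean and pstdev(probs) <= max_std:
            selected.append(index)
    return selected


examples = [
    [0.97, 0.95, 0.96],  # confident and consistent
    [0.99, 0.75, 0.99],  # high average, but members disagree
    [0.55, 0.50, 0.52],  # consistent, but low confidence
]
print(select_high_quality(examples))
```

Tightening either threshold trades recall for precision, which matches the abstract's observation that only a fraction of test predictions survive the filter.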

Gavali S, Ross KE, Cowart J, Chen C, Wu CH. iPTMnet RESTful API for Post-translational Modification Network Analysis. Methods Mol Biol 2022;2499:187-204. [PMID: 35696082 PMCID: PMC10082948 DOI: 10.1007/978-1-0716-2317-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Seymour RW, van der Post S, Mooradian AD, Held JM. ProteoSushi: A Software Tool to Biologically Annotate and Quantify Modification-Specific, Peptide-Centric Proteomics Data Sets. J Proteome Res 2021;20:3621-3628. [PMID: 34056901 DOI: 10.1021/acs.jproteome.1c00203] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]

Comeau DC, Wei CH, Islamaj Doğan R, Lu Z. PMC text mining subset in BioC: about three million full-text articles and growing. Bioinformatics 2020;35:3533-3535. [PMID: 30715220 DOI: 10.1093/bioinformatics/btz070] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/03/2018] [Revised: 01/17/2018] [Accepted: 01/28/2019] [Indexed: 12/19/2022]

Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020;18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022]

Gavali S, Cowart J, Chen C, Ross KE, Arighi C, Wu CH. RESTful API for iPTMnet: a resource for protein post-translational modification network discovery. Database (Oxford) 2020;2020:5829784. [PMID: 32395768 PMCID: PMC7216315 DOI: 10.1093/database/baz157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2019] [Revised: 12/09/2019] [Accepted: 12/23/2019] [Indexed: 11/12/2022]

Huang H, Arighi CN, Ross KE, Ren J, Li G, Chen SC, Wang Q, Cowart J, Vijay-Shanker K, Wu CH. iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 2019;46:D542-D550. [PMID: 29145615 PMCID: PMC5753337 DOI: 10.1093/nar/gkx1104] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 08/15/2017] [Accepted: 10/24/2017] [Indexed: 12/19/2022]

Ding R, Boutet E, Lieberherr D, Schneider M, Tognolli M, Wu CH, Vijay-Shanker K, Arighi CN. eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality. Database (Oxford) 2018;2017:4627699. [PMID: 29220476 PMCID: PMC5691349 DOI: 10.1093/database/bax081] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/08/2017] [Accepted: 10/11/2017] [Indexed: 11/13/2022]

Kang HT, Park JT, Choi K, Choi HJC, Jung CW, Kim GR, Lee YS, Park SC. Chemical screening identifies ROCK as a target for recovering mitochondrial function in Hutchinson-Gilford progeria syndrome. Aging Cell 2017;16:541-550. [PMID: 28317242 PMCID: PMC5418208 DOI: 10.1111/acel.12584] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Accepted: 02/06/2017] [Indexed: 12/29/2022]

Kang HT, Park JT, Choi K, Kim Y, Choi HJC, Jung CW, Lee YS, Park SC. Chemical screening identifies ATM as a target for alleviating senescence. Nat Chem Biol 2017;13:616-623. [DOI: 10.1038/nchembio.2342] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 10/22/2015] [Accepted: 12/21/2016] [Indexed: 12/19/2022]

Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 2017;45:D339-D346. [PMID: 27899649 PMCID: PMC5210558 DOI: 10.1093/nar/gkw1075] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 09/30/2016] [Revised: 10/21/2016] [Accepted: 10/25/2016] [Indexed: 12/04/2022]

Affiliation(s)

Darren A Natale: Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
Cecilia N Arighi: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Judith A Blake: The Jackson Laboratory, Bar Harbor, ME 04609, USA
Jonathan Bona: Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
Chuming Chen: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Sheng-Chih Chen: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Karen R Christie: The Jackson Laboratory, Bar Harbor, ME 04609, USA
Julie Cowart: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Peter D'Eustachio: Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
Alexander D Diehl: Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA; New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
Harold J Drabkin: The Jackson Laboratory, Bar Harbor, ME 04609, USA
William D Duncan: Roswell Park Cancer Institute, Buffalo, NY 14203, USA; New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
Hongzhan Huang: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Jia Ren: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Karen Ross: Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
Alan Ruttenberg: Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
Veronica Shamovsky: Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
Barry Smith: National Center for Ontological Research, University at Buffalo, Buffalo, NY 14214, USA
Qinghua Wang: Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
Jian Zhang: Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
Abdelrahman El-Sayed: Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
Cathy H Wu: Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA

Wang Q, Ross KE, Huang H, Ren J, Li G, Vijay-Shanker K, Wu CH, Arighi CN. Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 2017;1558:213-232. [PMID: 28150240 PMCID: PMC5446092 DOI: 10.1007/978-1-4939-6783-4_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 04/12/2023]

Chen C, Huang H, Wu CH. Protein Bioinformatics Databases and Resources. Methods Mol Biol 2017;1558:3-39. [PMID: 28150231 PMCID: PMC5506686 DOI: 10.1007/978-1-4939-6783-4_1] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Indexed: 12/15/2022]

iPTMnet: Integrative Bioinformatics for Studying PTM Networks. Methods Mol Biol 2017;1558:333-353. [PMID: 28150246 DOI: 10.1007/978-1-4939-6783-4_16] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/03/2022]

Chang JW, Zhou YQ, Ul Qamar MT, Chen LL, Ding YD. Prediction of Protein-Protein Interactions by Evidence Combining Methods. Int J Mol Sci 2016;17:ijms17111946. [PMID: 27879651 PMCID: PMC5133940 DOI: 10.3390/ijms17111946] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/30/2016] [Revised: 11/15/2016] [Accepted: 11/15/2016] [Indexed: 12/27/2022]

Ross KE, Natale DA, Arighi C, Chen SC, Huang H, Li G, Ren J, Wang M, Vijay-Shanker K, Wu CH. Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology. CEUR Workshop Proceedings 2016;1747:http://ceur-ws.org/Vol-1747/BIT103_ICBO2016.pdf. [PMID: 28706471 PMCID: PMC5504912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Soliman M, Nasraoui O, Cooper NGF. Building a glaucoma interaction network using a text mining approach. BioData Min 2016;9:17. [PMID: 27152122 PMCID: PMC4857381 DOI: 10.1186/s13040-016-0096-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/09/2015] [Accepted: 04/23/2016] [Indexed: 11/21/2022]

Abstract

Background

The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease.

Results

A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx.

Conclusions

This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of relations that could not be found in existing interaction databases and that were found to be new, in addition to a smaller subnetwork consisting of interconnected clusters of seven glaucoma genes. Future improvements can be applied towards obtaining a better version of this network.

Electronic supplementary material

The online version of this article (doi:10.1186/s13040-016-0096-2) contains supplementary material, which is available to authorized users.

Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer. PLoS One 2015;10:e0141773. [PMID: 26509276 PMCID: PMC4624812 DOI: 10.1371/journal.pone.0141773] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/24/2015] [Accepted: 10/13/2015] [Indexed: 01/26/2023]

Abstract

Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge "maps" of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease.

Li G, Ross KE, Arighi CN, Peng Y, Wu CH, Vijay-Shanker K. miRTex: A Text Mining System for miRNA-Gene Relation Extraction. PLoS Comput Biol 2015;11:e1004391. [PMID: 26407127 PMCID: PMC4583433 DOI: 10.1371/journal.pcbi.1004391] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 01/29/2015] [Accepted: 06/08/2015] [Indexed: 12/27/2022]

Abstract

MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.

MicroRNAs (miRNAs) are an important class of RNAs that regulate a wide range of biological processes by post-transcriptional regulation of gene expression. The amount of literature describing experimentally validated miRNA targets is increasing rapidly, which poses a challenge to researchers and biocurators to stay up-to-date with the available information. Text mining methods have been used to extract miRNA-gene associated pairs and assist in curation. In this paper, we describe miRTex, a text mining system that extracts miRNA-target, miRNA-gene regulation and gene-miRNA regulation relations. We evaluate miRTex performance on two corpora, and show that the elaborate use of lexico-syntactic information and linguistic generalizations enables it to achieve state-of-the-art performance. We have processed all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset with miRTex, and provide a website to access the extraction results from all the Medline abstracts. The full-scale text mining results will be a useful resource for miRNA researchers, while the miRTex tool itself can be integrated into literature-based curation pipelines. We present two use cases (for animal and plant miRNAs, respectively) that show how the full-scale text mining can be used in combination with other bioinformatics resources to gain insight into biological processes.
