1
Jehle S, Kunowska N, Benlasfer N, Woodsmith J, Weber G, Wahl MC, Stelzl U. A human kinase yeast array for the identification of kinases modulating phosphorylation-dependent protein-protein interactions. Mol Syst Biol 2022; 18:e10820. [PMID: 35225431 PMCID: PMC8883442 DOI: 10.15252/msb.202110820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/19/2021] [Revised: 01/28/2022] [Accepted: 01/31/2022] [Indexed: 12/11/2022]
Abstract
Protein kinases play an important role in cellular signaling pathways and their dysregulation leads to multiple diseases, making kinases prime drug targets. While more than 500 human protein kinases are known to collectively mediate phosphorylation of over 290,000 S/T/Y sites, the activities have been characterized only for a minor, intensively studied subset. To systematically address this discrepancy, we developed a human kinase array in Saccharomyces cerevisiae as a simple readout tool to systematically assess kinase activities. For this array, we expressed 266 human kinases in four different S. cerevisiae strains and profiled ectopic growth as a proxy for kinase activity across 33 conditions. More than half of the kinases showed an activity-dependent phenotype across many conditions and in more than one strain. We then employed the kinase array to identify the kinase(s) that can modulate protein-protein interactions (PPIs). Two characterized, phosphorylation-dependent PPIs with unknown kinase-substrate relationships were analyzed in a phospho-yeast two-hybrid assay. CK2α1 and SGK2 kinases can abrogate the interaction between the spliceosomal proteins AAR2 and PRPF8, and NEK6 kinase was found to mediate the estrogen receptor (ERα) interaction with 14-3-3 proteins. The human kinase yeast array can thus be used for a variety of kinase activity-dependent readouts.
Affiliation(s)
- Stefanie Jehle
- Otto-Warburg-Laboratory, Max-Planck-Institute for Molecular Genetics (MPIMG), Berlin, Germany
- Natalia Kunowska
- Institute of Pharmaceutical Sciences, University of Graz, Graz, Austria
- Nouhad Benlasfer
- Otto-Warburg-Laboratory, Max-Planck-Institute for Molecular Genetics (MPIMG), Berlin, Germany
- Jonathan Woodsmith
- Otto-Warburg-Laboratory, Max-Planck-Institute for Molecular Genetics (MPIMG), Berlin, Germany
- Institute of Pharmaceutical Sciences, University of Graz, Graz, Austria
- Gert Weber
- Institut für Chemie und Biochemie, Freie Universität, Berlin, Germany
- Helmholtz-Zentrum Berlin für Materialien und Energie, Macromolecular Crystallography, Berlin, Germany
- Markus C Wahl
- Institut für Chemie und Biochemie, Freie Universität, Berlin, Germany
- Ulrich Stelzl
- Otto-Warburg-Laboratory, Max-Planck-Institute for Molecular Genetics (MPIMG), Berlin, Germany
- Institute of Pharmaceutical Sciences, University of Graz, Graz, Austria
- Field of Excellence BioHealth, University of Graz and BioTechMed-Graz, Graz, Austria
2
Lee YH, Choi D, Jang G, Park JY, Song ES, Lee H, Kuk MU, Joo J, Ahn SK, Byun Y, Park JT. Targeting regulation of ATP synthase 5 alpha/beta dimerization alleviates senescence. Aging (Albany NY) 2022; 14:678-707. [PMID: 35093936 PMCID: PMC8833107 DOI: 10.18632/aging.203858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 01/14/2022] [Indexed: 11/25/2022]
Abstract
Senescence is marked by a distinct set of changes, including the senescence-associated secretory phenotype (SASP), and leads to aging and age-related diseases. Here, we screened compounds that could ameliorate senescence and identified an oxazoloquinoline analog (KB1541) designed to inhibit the IL-33 signaling pathway. To elucidate the mechanism of action of KB1541, we investigated the proteins binding to KB1541 and found an interaction between KB1541 and 14-3-3ζ protein. Specifically, KB1541 interacted with 14-3-3ζ protein and induced phosphorylation of 14-3-3ζ at serine 58. This phosphorylation increased ATP synthase 5 alpha/beta dimerization, which in turn promoted ATP production through increased oxidative phosphorylation (OXPHOS) efficiency. The increased OXPHOS efficiency then induced the recovery of mitochondrial function, coupled with senescence alleviation. Taken together, our results demonstrate a mechanism by which senescence is regulated by ATP synthase 5 alpha/beta dimerization upon fine-tuning of KB1541-mediated 14-3-3ζ protein activity.
Affiliation(s)
- Yun Haeng Lee
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Doyoung Choi
- College of Pharmacy, Korea University, Sejong 30019, Republic of Korea
- Geonhee Jang
- College of Pharmacy, Korea University, Sejong 30019, Republic of Korea
- Ji Yun Park
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Eun Seon Song
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Haneur Lee
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Myeong Uk Kuk
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Junghyun Joo
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Soon Kil Ahn
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
- Youngjoo Byun
- College of Pharmacy, Korea University, Sejong 30019, Republic of Korea
- Joon Tae Park
- Division of Life Sciences, College of Life Sciences and Bioengineering, Incheon National University, Incheon 22012, Korea
3
Elangovan A, Li Y, Pires DEV, Davis MJ, Verspoor K. Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT. BMC Bioinformatics 2022; 23:4. [PMID: 34983371 PMCID: PMC8729035 DOI: 10.1186/s12859-021-04504-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022]
Abstract
MOTIVATION Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct. This annotation is mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature. We use deep learning trained on distantly supervised data to aid human curation. METHOD We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and to extract high-confidence predictions. RESULTS AND CONCLUSION The PPI-BioBERT-x10 model evaluated on the test set achieved a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, we combined high confidence and low variation to identify high-quality predictions and tuned the predictions for precision. This retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets). From these we filtered approximately 5700 (4584 unique) high-confidence predictions. For these 5700, human evaluation on a small randomly sampled subset shows that precision drops to 33.7% despite confidence calibration. This highlights the challenges of generalisability beyond the test set. We circumvent the problem by only including predictions associated with multiple papers, which improves precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
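The abstract relies on two simple quantities. The first is micro-averaged F1, the harmonic mean of precision and recall. The second is an ensemble filter that keeps predictions whose average confidence across the ten BioBERT members is high and whose variation is low. The sketch below illustrates both under our own assumptions. Variation is measured here as the population standard deviation of member confidences. The thresholds `min_mean=0.9` and `max_std=0.05` and the toy prediction ids are hypothetical placeholders, not values from the paper.

```python
from statistics import mean, pstdev


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def filter_high_confidence(member_confidences, min_mean=0.9, max_std=0.05):
    """Keep predictions with a high ensemble-average confidence and a low spread.

    member_confidences maps a prediction id to the confidences that each
    ensemble member assigned to the predicted label. The thresholds are
    hypothetical placeholders, not the values used in the paper.
    """
    kept = {}
    for pred_id, scores in member_confidences.items():
        avg, spread = mean(scores), pstdev(scores)
        if avg >= min_mean and spread <= max_std:
            kept[pred_id] = (avg, spread)
    return kept


# Toy confidences from a ten-member ensemble (hypothetical data).
preds = {
    "pair_A": [0.97, 0.95, 0.96, 0.98, 0.97, 0.96, 0.95, 0.97, 0.98, 0.96],
    "pair_B": [1.0, 0.8, 1.0, 1.0, 0.8, 1.0, 1.0, 0.8, 1.0, 1.0],
}

print(f"{f1_score(58.1, 32.1):.2f}")          # 41.35
print(sorted(filter_high_confidence(preds)))  # ['pair_A']
```

With the reported P = 58.1 and R = 32.1, the formula gives about 41.35. This is consistent with the published F1-micro of 41.3, given that P and R are themselves rounded. Filtering on both statistics rather than the mean alone discards `pair_B`. Its average of 0.94 clears the threshold, but its members disagree noticeably.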
Affiliation(s)
- Aparna Elangovan
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Yuan Li
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Douglas E. V. Pires
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Melissa J. Davis
- The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- School of Computing Technologies, RMIT University, Melbourne, Australia
4
Gavali S, Ross KE, Cowart J, Chen C, Wu CH. iPTMnet RESTful API for Post-translational Modification Network Analysis. Methods Mol Biol 2022; 2499:187-204. [PMID: 35696082 PMCID: PMC10082948 DOI: 10.1007/978-1-0716-2317-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
iPTMnet is a resource that combines rich information about protein post-translational modifications (PTM) from curated databases as well as text mining tools. Researchers can use the iPTMnet website to query, analyze and download the PTM data. In this chapter we describe the iPTMnet RESTful API, which provides a way to streamline the integration of iPTMnet data into an automated data analysis workflow. In the first section, we give an overview of the architecture of the API. In the second section, we describe the various functions defined by the API and provide detailed examples of using these functions.
5
Seymour RW, van der Post S, Mooradian AD, Held JM. ProteoSushi: A Software Tool to Biologically Annotate and Quantify Modification-Specific, Peptide-Centric Proteomics Data Sets. J Proteome Res 2021; 20:3621-3628. [PMID: 34056901 DOI: 10.1021/acs.jproteome.1c00203] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Large-scale proteomic profiling of protein post-translational modifications has provided important insights into the regulation of cell signaling and disease. These modification-specific proteomics workflows nearly universally enrich modified peptides prior to mass spectrometry analysis. However, protein-centric proteomic software tools have many limitations in evaluating and interpreting these peptide-centric data sets. We therefore developed ProteoSushi, a software tool tailored to the analysis of each modified site in peptide-centric proteomic data sets that is compatible with any post-translational modification or chemical label. ProteoSushi uses a unique approach to assign identified peptides to shared proteins and genes. It minimizes redundancy by prioritizing shared assignments based on UniProt annotation score and optional user-supplied protein/gene lists. ProteoSushi simplifies quantitation by summing or averaging intensities for each modified site, merging overlapping peptide charge states, missed cleavages, spectral matches, and variable modifications into a single value. ProteoSushi also annotates each PTM site with the most up-to-date biological information available from UniProt. This includes functional roles or known modifications, the protein domain in which the site resides, the protein's subcellular location and function, and more. ProteoSushi has a graphical user interface for ease of use. ProteoSushi's flexibility and combination of analysis features streamline peptide-centric data processing and knowledge mining of large modification-specific proteomics data sets.
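ProteoSushi's per-site quantitation collapses every observation of the same modified site into one summed or averaged value. Those observations can include different charge states, missed cleavages and spectral matches. The sketch below shows that roll-up step in Python, assuming a flat list of observations whose sites are already localized. The tuple layout, gene names and site labels are hypothetical. The tool's shared-protein assignment and UniProt annotation steps are not modeled.

```python
from collections import defaultdict
from statistics import mean

# Toy observations (hypothetical): (gene, modified_site, charge, missed_cleavages, intensity)
rows = [
    ("GENE_A", "C152", 2, 0, 1.0e6),
    ("GENE_A", "C152", 3, 0, 4.0e5),  # same site, different charge state
    ("GENE_A", "C152", 2, 1, 2.0e5),  # same site, peptide with a missed cleavage
    ("GENE_B", "C52", 2, 0, 7.5e5),
]


def rollup_by_site(rows, how="sum"):
    """Merge all observations of each (gene, site) into a single intensity."""
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    per_site = defaultdict(list)
    for gene, site, _charge, _missed, intensity in rows:
        per_site[(gene, site)].append(intensity)
    aggregate = sum if how == "sum" else mean
    return {key: aggregate(values) for key, values in per_site.items()}


print(rollup_by_site(rows))              # {('GENE_A', 'C152'): 1600000.0, ('GENE_B', 'C52'): 750000.0}
print(rollup_by_site(rows, how="mean"))
```

Summing gives more weight to sites observed in many forms. Averaging is less sensitive to how many charge states or missed-cleavage variants happened to be detected. The choice depends on the downstream comparison.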
Affiliation(s)
- Robert W Seymour
- Department of Medicine, Washington University School of Medicine in St. Louis, Campus Box 8076, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Sjoerd van der Post
- Department of Medicine, Washington University School of Medicine in St. Louis, Campus Box 8076, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Department of Medical Biochemistry, University of Gothenburg, Gothenburg, Sweden
- Arshag D Mooradian
- Department of Medicine, Washington University School of Medicine in St. Louis, Campus Box 8076, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Jason M Held
- Department of Medicine, Washington University School of Medicine in St. Louis, Campus Box 8076, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Department of Anesthesiology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, United States
- Siteman Cancer Center, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, United States
6
Comeau DC, Wei CH, Islamaj Doğan R, Lu Z. PMC text mining subset in BioC: about three million full-text articles and growing. Bioinformatics 2020; 35:3533-3535. [PMID: 30715220 DOI: 10.1093/bioinformatics/btz070] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/03/2018] [Revised: 01/17/2018] [Accepted: 01/28/2019] [Indexed: 12/19/2022]
Abstract
MOTIVATION Interest in text mining full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a community-driven simple data structure in either XML or JavaScript Object Notation format for conveniently sharing text and annotations. RESULTS The resultant articles can be downloaded via both File Transfer Protocol for bulk access and a Web API for updates or a more focused collection. Since the availability of the Web API in 2017, our BioC collection has been widely used by the research community. AVAILABILITY AND IMPLEMENTATION https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/.
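BioC stores a collection of documents. Each document is split into passages, and each passage carries its own offset, text, key/value `infons` and annotations. The following sketch walks that hierarchy in a small, hand-written JSON sample. The field names follow the BioC JSON layout as we understand it. The document id, date and passage texts are invented placeholders. Real responses from the BioC-PMC Web API may wrap the collection differently, so check them against its documentation before relying on this structure.

```python
import json

# A tiny, hand-written document in the BioC JSON layout (illustrative only):
# collection -> documents -> passages, each passage with its own offset,
# text, infons (key/value metadata), annotations and relations.
SAMPLE = """
{
  "source": "PMC",
  "date": "20190101",
  "key": "BioC.key",
  "infons": {},
  "documents": [
    {
      "id": "PMC0000000",
      "infons": {},
      "passages": [
        {"offset": 0, "infons": {"type": "title"},
         "text": "An example title", "annotations": [], "relations": []},
        {"offset": 17, "infons": {"type": "abstract"},
         "text": "An example abstract.", "annotations": [], "relations": []}
      ],
      "relations": []
    }
  ]
}
"""


def iter_passages(collection):
    """Yield (document id, passage type, offset, text) for every passage."""
    for doc in collection.get("documents", []):
        for passage in doc.get("passages", []):
            yield (
                doc["id"],
                passage.get("infons", {}).get("type"),
                passage["offset"],
                passage["text"],
            )


for row in iter_passages(json.loads(SAMPLE)):
    print(row)
```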
Affiliation(s)
- Donald C Comeau
- National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Rezarta Islamaj Doğan
- National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
7
Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020; 18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022]
Abstract
Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
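The review describes knowledge graphs as nodes and edges. It notes that machine learning is often used to build a low-dimensional representation of them. The sketch below shows both ideas. First, a toy graph is stored as (head, relation, tail) triples with an adjacency index. Second, a TransE-style score is computed. TransE is one common family of translational embeddings, chosen here only for illustration: a triple counts as plausible when the head vector plus the relation vector lies close to the tail vector. All entities and the 2-D vectors are hypothetical and hand-picked rather than learned.

```python
import math
from collections import defaultdict

# Toy biomedical knowledge graph as (head, relation, tail) triples (hypothetical).
TRIPLES = [
    ("drug_X", "targets", "kinase_K"),
    ("kinase_K", "phosphorylates", "protein_P"),
    ("protein_P", "associated_with", "disease_D"),
]


def build_adjacency(triples):
    """Map each node to its outgoing edges as (relation, neighbour) pairs."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


# Hand-picked 2-D vectors standing in for a learned low-dimensional embedding.
EMB = {
    "drug_X": (0.0, 0.0),
    "kinase_K": (1.0, 0.0),
    "protein_P": (1.0, 1.0),
    "targets": (1.0, 0.0),
}


def transe_distance(h, r, t, emb=EMB):
    """Euclidean distance ||vec(h) + vec(r) - vec(t)||; smaller means more plausible."""
    translated = [a + b for a, b in zip(emb[h], emb[r])]
    return math.dist(translated, emb[t])


adj = build_adjacency(TRIPLES)
print(adj["kinase_K"])                                    # [('phosphorylates', 'protein_P')]
print(transe_distance("drug_X", "targets", "kinase_K"))   # 0.0
print(transe_distance("drug_X", "targets", "protein_P"))  # 1.0
```

In a real system the vectors would be fitted so that observed triples score low and corrupted ones score high. The adjacency index is what simple graph queries, such as one-hop neighbours, would use.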
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, United States
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, United States
8
Gavali S, Cowart J, Chen C, Ross KE, Arighi C, Wu CH. RESTful API for iPTMnet: a resource for protein post-translational modification network discovery. Database (Oxford) 2020; 2020:5829784. [PMID: 32395768 PMCID: PMC7216315 DOI: 10.1093/database/baz157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2019] [Revised: 12/09/2019] [Accepted: 12/23/2019] [Indexed: 11/12/2022]
Abstract
iPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines. We have packaged iPTMnet API software in Docker containers and published it on DockerHub for easy redistribution. We have also developed Python and R packages that allow users to integrate iPTMnet for scientific discovery, as demonstrated in a use case that connects PTM sites to kinase signaling pathways.
Affiliation(s)
- Sachin Gavali
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Julie Cowart
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Chuming Chen
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
- Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, 337 Basic Science Building, 3900 Reservoir Road, N.W, Washington D.C. 20057, USA
- Cecilia Arighi
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Department of Biochemistry and Molecular & Cellular Biology, 337 Basic Science Building, 3900 Reservoir Road, N.W, Washington D.C. 20057, USA
- Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
9
Huang H, Arighi CN, Ross KE, Ren J, Li G, Chen SC, Wang Q, Cowart J, Vijay-Shanker K, Wu CH. iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 2019; 46:D542-D550. [PMID: 29145615 PMCID: PMC5753337 DOI: 10.1093/nar/gkx1104] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 08/15/2017] [Accepted: 10/24/2017] [Indexed: 12/19/2022]
Abstract
Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating the regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery. It employs an integrative bioinformatics approach, combining text mining, data mining, and ontological representation. This captures rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation across species. iPTMnet encompasses data from three sources. (i) Our PTM-focused text mining tools, RLIMS-P and eFIP, extract phosphorylation information from full-scale mining of PubMed abstracts and full-length articles. (ii) A set of curated databases contributes experimentally observed PTMs. (iii) The Protein Ontology organizes proteins and PTM proteoforms, enabling their representation, annotation and comparison within and across species. iPTMnet presently covers eight major PTM types: phosphorylation, ubiquitination, acetylation, methylation, glycosylation, S-nitrosylation, sumoylation and myristoylation. The knowledgebase contains more than 654 500 unique PTM sites in over 62 100 proteins. It also holds more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations. The website supports online search, browsing, retrieval and visual analysis for scientific queries. Several examples, including functional interpretation of phosphoproteomic data, demonstrate iPTMnet as a gateway for visual exploration and systematic analysis of PTM networks and conservation, thereby enabling PTM discovery and hypothesis generation.
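The abstract centers on PTM enzyme-substrate-site relations. The sketch below shows one rough way such records could be held and queried in memory, by indexing them on enzyme and PTM type. The `PTMRelation` fields, identifiers and sites are hypothetical. They do not reflect iPTMnet's actual schema or its API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PTMRelation:
    """One enzyme-substrate-site record (hypothetical field names)."""
    enzyme: str     # modifying enzyme, e.g. a kinase identifier
    substrate: str  # substrate protein identifier
    site: str       # residue and position, e.g. "S10"
    ptm_type: str   # e.g. "phosphorylation"


def index_by_enzyme(relations):
    """Map (enzyme, ptm_type) -> set of (substrate, site) pairs."""
    index = defaultdict(set)
    for rel in relations:
        index[(rel.enzyme, rel.ptm_type)].add((rel.substrate, rel.site))
    return index


relations = [
    PTMRelation("KIN1", "SUB1", "S10", "phosphorylation"),
    PTMRelation("KIN1", "SUB2", "T5", "phosphorylation"),
    PTMRelation("KIN1", "SUB1", "S10", "phosphorylation"),  # same relation from a second source
    PTMRelation("UBL1", "SUB1", "K48", "ubiquitination"),
]

index = index_by_enzyme(relations)
print(sorted(index[("KIN1", "phosphorylation")]))  # [('SUB1', 'S10'), ('SUB2', 'T5')]
```

Storing pairs in a set collapses a relation that appears in more than one source into a single entry. This is a small-scale analogue of merging text-mined and curated data.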
Affiliation(s)
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20057, USA
- Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Gang Li
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Sheng-Chih Chen
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Julie Cowart
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- K Vijay-Shanker
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE 19711, USA
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20057, USA
10
Ding R, Boutet E, Lieberherr D, Schneider M, Tognolli M, Wu CH, Vijay-Shanker K, Arighi CN. eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality. Database (Oxford) 2018; 2017:4627699. [PMID: 29220476 PMCID: PMC5691349 DOI: 10.1093/database/bax081] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/08/2017] [Accepted: 10/11/2017] [Indexed: 11/13/2022]
Abstract
UniProt Knowledgebase (UniProtKB) is a publicly available database with access to a vast amount of protein sequence and functional information. To widen the scope of the publications associated with a protein entry, UniProt has introduced the computationally mapped additional bibliography section, which includes literature collected from external sources. In this article, we describe a text mining system, eGenPub, which selects articles that are 'about' specific proteins and allows automatic identification of additional bibliography for given UniProt protein entries. eGenPub initially focuses on plant proteins. It uses a gene normalization tool called pGenN and a trained support vector machine model, which achieves a precision of 95.3%, to predict from an article's abstract whether the article should be linked to a given UniProt entry. We have conducted full-scale PubMed processing using eGenPub for eight common plant species. Altogether, 9025 articles are identified as relevant bibliography for 4752 UniProt entries, among which 5252 are additional papers not in the existing publication section. This newly computationally mapped additional bibliography from eGenPub is being integrated into the UniProt production pipeline and can be accessed via the UniProtKB protein entry publication view.
Affiliation(s)
- Ruoyao Ding
- Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA
- Emmanuel Boutet
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Damien Lieberherr
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Michel Schneider
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Michael Tognolli
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Cathy H Wu
- Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19716, USA
- Protein Information Resource, University of Delaware, Newark, DE 19716 and Georgetown University, Washington, DC 20007, USA
- K Vijay-Shanker
- Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA
- Cecilia N Arighi
- Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19716, USA
- Protein Information Resource, University of Delaware, Newark, DE 19716 and Georgetown University, Washington, DC 20007, USA
11
Kang HT, Park JT, Choi K, Choi HJC, Jung CW, Kim GR, Lee YS, Park SC. Chemical screening identifies ROCK as a target for recovering mitochondrial function in Hutchinson-Gilford progeria syndrome. Aging Cell 2017; 16:541-550. [PMID: 28317242 PMCID: PMC5418208 DOI: 10.1111/acel.12584] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Accepted: 02/06/2017] [Indexed: 12/29/2022]
Abstract
Hutchinson-Gilford progeria syndrome (HGPS) is a genetic disease in which an aging phenotype manifests in childhood. Recent studies indicate that reactive oxygen species (ROS) play important roles in HGPS phenotype progression. Thus, pharmacological reduction in ROS levels has been proposed as a potentially effective treatment for patients with this disorder. In this study, we performed high-throughput screening to find compounds that could reduce ROS levels in HGPS fibroblasts. This screen identified the rho-associated protein kinase (ROCK) inhibitor Y-27632 as an effective agent. To elucidate the underlying mechanism of ROCK in regulating ROS levels, we performed a yeast two-hybrid screen and discovered that ROCK1 interacts with Rac1b. ROCK activation phosphorylated Rac1b at Ser71 and increased ROS levels by facilitating the interaction between Rac1b and cytochrome c. Conversely, ROCK inactivation with Y-27632 abolished their interaction, concomitant with ROS reduction. Additionally, ROCK activation resulted in mitochondrial dysfunction, whereas ROCK inactivation with Y-27632 induced the recovery of mitochondrial function. Furthermore, a reduction in the frequency of abnormal nuclear morphology and DNA double-strand breaks was observed along with decreased ROS levels. Thus, our study reveals a novel mechanism through which alleviation of the HGPS phenotype is mediated by the recovery of mitochondrial function upon ROCK inactivation.
Affiliation(s)
- Hyun Tae Kang
- Well Aging Research Center, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-si, Korea
- Joon Tae Park
- Well Aging Research Center, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-si, Korea
- Kobong Choi
- Well Aging Research Center, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-si, Korea
- Hyo Jei Claudia Choi
- Well Aging Research Center, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-si, Korea
- Chul Won Jung
- Well Aging Research Center, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon-si, Korea
- Gyu Ree Kim
- Well Aging Research Center, DGIST, Daegu, Korea
- Young-Sam Lee
- Well Aging Research Center, DGIST, Daegu, Korea
- Department of New Biology, DGIST, Daegu, Korea
- Sang Chul Park
- Well Aging Research Center, DGIST, Daegu, Korea
- Department of New Biology, DGIST, Daegu, Korea
12
Kang HT, Park JT, Choi K, Kim Y, Choi HJC, Jung CW, Lee YS, Park SC. Chemical screening identifies ATM as a target for alleviating senescence. Nat Chem Biol 2017; 13:616-623. [DOI: 10.1038/nchembio.2342] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 10/22/2015] [Accepted: 12/21/2016] [Indexed: 12/19/2022]
13
Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 2017; 45:D339-D346. [PMID: 27899649 PMCID: PMC5210558 DOI: 10.1093/nar/gkw1075] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 09/30/2016] [Revised: 10/21/2016] [Accepted: 10/25/2016] [Indexed: 12/04/2022]
Abstract
The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvements in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate the discovery and aggregation of data relating to protein entities.
Affiliation(s)
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jonathan Bona
- Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Sheng-Chih Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Julie Cowart
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Peter D'Eustachio
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
- Alexander D Diehl
- Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
- William D Duncan
- Roswell Park Cancer Institute, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Karen Ross
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Alan Ruttenberg
- Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
- Veronica Shamovsky
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
- Barry Smith
- National Center for Ontological Research, University at Buffalo, Buffalo, NY 14214, USA
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jian Zhang
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Abdelrahman El-Sayed
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
14
Wang Q, Ross KE, Huang H, Ren J, Li G, Vijay-Shanker K, Wu CH, Arighi CN. Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 2017; 1558:213-232. [PMID: 28150240 PMCID: PMC5446092 DOI: 10.1007/978-1-4939-6783-4_10]
Abstract
Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictates downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.
Affiliation(s)
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, 20057, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Jia Ren
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Gang Li
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- K Vijay-Shanker
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, 20057, USA
- Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
15
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
16
Abstract
Protein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet (http://proteininformationresource.org/iPTMnet/), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.
17
Chang JW, Zhou YQ, Ul Qamar MT, Chen LL, Ding YD. Prediction of Protein-Protein Interactions by Evidence Combining Methods. Int J Mol Sci 2016; 17:ijms17111946. [PMID: 27879651 PMCID: PMC5133940 DOI: 10.3390/ijms17111946] [Received: 09/30/2016] [Revised: 11/15/2016] [Accepted: 11/15/2016]
Abstract
Most cellular functions involve proteins' features based on their physical interactions with other partner proteins. Sketching a map of protein-protein interactions (PPIs) is therefore an important inception step towards understanding the basics of cell functions. Several experimental techniques operating in vivo or in vitro have made significant contributions to screening a large number of protein interaction partners, especially high-throughput experimental methods. However, computational approaches for PPI prediction supported by rapid accumulation of data generated from experimental techniques, 3D structure definitions, and genome sequencing have boosted the map sketching of PPIs. In this review, we shed light on in silico PPI prediction methods that integrate evidence from multiple sources, including evolutionary relationship, function annotation, sequence/structure features, network topology and text mining. These methods are developed for integration of multi-dimensional evidence, for designing the strategies to predict novel interactions, and for making the results consistent with the increase of prediction coverage and accuracy.
Affiliation(s)
- Ji-Wei Chang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Yan-Qing Zhou
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Muhammad Tahir Ul Qamar
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Ling-Ling Chen
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Yu-Duan Ding
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
18
Ross KE, Natale DA, Arighi C, Chen SC, Huang H, Li G, Ren J, Wang M, Vijay-Shanker K, Wu CH. Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology. CEUR Workshop Proceedings 2016; 1747:http://ceur-ws.org/Vol-1747/BIT103_ICBO2016.pdf. [PMID: 28706471 PMCID: PMC5504912]
Abstract
The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
Affiliation(s)
- Karen E Ross
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Cecilia Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Sheng-Chih Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Gang Li
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Michael Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- K Vijay-Shanker
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
19
Soliman M, Nasraoui O, Cooper NGF. Building a glaucoma interaction network using a text mining approach. BioData Min 2016; 9:17. [PMID: 27152122 PMCID: PMC4857381 DOI: 10.1186/s13040-016-0096-2] [Received: 10/09/2015] [Accepted: 04/23/2016]
Abstract
Background: The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease.
Results: A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx.
Conclusions: This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of relations that could not be found in existing interaction databases and that were found to be new, in addition to a smaller subnetwork consisting of interconnected clusters of seven glaucoma genes. Future improvements can be applied towards obtaining a better version of this network.
Electronic supplementary material: The online version of this article (doi:10.1186/s13040-016-0096-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maha Soliman
- Department of Anatomical Sciences and Neurobiology, University of Louisville, School of Medicine, Louisville, KY USA
- Olfa Nasraoui
- Knowledge Discovery & Web Mining Lab, Department of Computer Engineering & Computer Science, University of Louisville, J.B Speed School of Engineering, Louisville, KY USA
- Nigel G F Cooper
- Department of Anatomical Sciences and Neurobiology, University of Louisville, School of Medicine, Louisville, KY USA
20
Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer. PLoS One 2015; 10:e0141773. [PMID: 26509276 PMCID: PMC4624812 DOI: 10.1371/journal.pone.0141773] [Received: 06/24/2015] [Accepted: 10/13/2015]
Abstract
Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge "maps" of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin-dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease.
21
Li G, Ross KE, Arighi CN, Peng Y, Wu CH, Vijay-Shanker K. miRTex: A Text Mining System for miRNA-Gene Relation Extraction. PLoS Comput Biol 2015; 11:e1004391. [PMID: 26407127 PMCID: PMC4583433 DOI: 10.1371/journal.pcbi.1004391] [Received: 01/29/2015] [Accepted: 06/08/2015]
Abstract
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.
MicroRNAs (miRNAs) are an important class of RNAs that regulate a wide range of biological processes by post-transcriptional regulation of gene expression. The amount of literature describing experimentally validated miRNA targets is increasing rapidly, which poses a challenge to researchers and biocurators to stay up-to-date with the available information. Text mining methods have been used to extract miRNA-gene associated pairs and assist in curation. In this paper, we describe miRTex, a text mining system that extracts miRNA-target, miRNA-gene regulation and gene-miRNA regulation relations. We evaluate miRTex performance on two corpora, and show that the elaborate use of lexico-syntactic information and linguistic generalizations enables it to achieve state-of-the-art performance. We have processed all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset with miRTex, and provide a website to access the extraction results from all the Medline abstracts. The full-scale text mining results will be a useful resource for miRNA researchers, while the miRTex tool itself can be integrated into literature-based curation pipelines. We present two use cases (for animal and plant miRNAs, respectively) that show how the full-scale text mining can be used in combination with other bioinformatics resources to gain insight into biological processes.
Affiliation(s)
- Gang Li
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Karen E. Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Cecilia N. Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Yifan Peng
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Cathy H. Wu
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- K. Vijay-Shanker
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America