1
da Silva Rosa SC, Barzegar Behrooz A, Guedes S, Vitorino R, Ghavami S. Prioritization of genes for translation: a computational approach. Expert Rev Proteomics 2024; 21:125-147. [PMID: 38563427 DOI: 10.1080/14789450.2024.2337004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 02/21/2024] [Indexed: 04/04/2024]
Abstract
INTRODUCTION Gene identification for genetic diseases is critical for the development of new diagnostic approaches and personalized treatment options. Prioritization of gene translation is an important consideration in the molecular biology field, allowing researchers to focus on the most promising candidates for further investigation. AREAS COVERED In this paper, we discussed different approaches to prioritize genes for translation, including the use of computational tools and machine learning algorithms, as well as experimental techniques such as knockdown and overexpression studies. We also explored the potential biases and limitations of these approaches and proposed strategies to improve the accuracy and reliability of gene prioritization methods. Although numerous computational methods have been developed for this purpose, there is a need for computational methods that incorporate tissue-specific information to enable more accurate prioritization of candidate genes. Such methods should provide tissue-specific predictions, insights into underlying disease mechanisms, and more accurate prioritization of genes. EXPERT OPINION Using advanced computational tools and machine learning algorithms to prioritize genes, we can identify potential targets for therapeutic intervention of complex diseases. This represents an up-and-coming method for drug development and personalized medicine.
Affiliation(s)
- Simone C da Silva Rosa
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
- Amir Barzegar Behrooz
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
  - Electrophysiology Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
- Sofia Guedes
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Rui Vitorino
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
  - Department of Medical Sciences, Institute of Biomedicine-iBiMED, University of Aveiro, Aveiro, Portugal
  - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto, Portugal
- Saeid Ghavami
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
  - Faculty of Medicine in Zabrze, Academia of Silesia, Katowice, Poland
  - Research Institute of Oncology and Hematology, Cancer Care Manitoba, University of Manitoba, Winnipeg, Canada
2
Omenn GS. Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years. Mol Cell Proteomics 2021; 20:100062. [PMID: 33640492 PMCID: PMC8058560 DOI: 10.1016/j.mcpro.2021.100062] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 11/12/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 02/08/2023]
Abstract
We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.
Affiliation(s)
- Gilbert S Omenn
  - University of Michigan Medical School, Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, Ann Arbor, Michigan, USA
3
Hou C, Xie H, Fu Y, Ma Y, Li T. MloDisDB: a manually curated database of the relations between membraneless organelles and diseases. Brief Bioinform 2020; 22:5943794. [PMID: 33126250 DOI: 10.1093/bib/bbaa271] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/10/2020] [Revised: 09/15/2020] [Accepted: 09/19/2020] [Indexed: 01/03/2023]
Abstract
Cells are compartmentalized by numerous membrane-bounded organelles and membraneless organelles (MLOs) to ensure temporal and spatial regulation of various biological processes. A number of MLOs, such as nucleoli, nuclear speckles and stress granules, exist as liquid droplets within the cells and arise from the condensation of proteins and RNAs via liquid-liquid phase separation (LLPS). By concentrating certain proteins and RNAs, MLOs accelerate biochemical reactions and protect cells during stress, and dysfunction of MLOs is associated with various pathological processes. With the development in this field, more and more relations between the MLOs and diseases have been described; however, these results have not been made available in a centralized resource. Herein, we build MloDisDB, a database which aims to gather the relations between MLOs and diseases from dispersed literature. In addition, the relations between LLPS and diseases were included as well. Currently, MloDisDB contains 771 curated entries from 607 publications; each entry in MloDisDB contains detailed information about the MLO, the disease and the functional factor in the relation. Furthermore, an efficient and user-friendly interface for users to search, browse and download all entries was provided. MloDisDB is the first comprehensive database of the relations between MLOs and diseases so far, and the database is freely accessible at http://mlodis.phasep.pro/.
Affiliation(s)
- Chao Hou
  - Department of Biomedical Informatics, Peking University Health Science Center
- Yang Fu
  - Peking University Health Science Center
- Yao Ma
  - Peking University Health Science Center
- Tingting Li
  - Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
4
Cerciello F, Choi M, Sinicropi-Yao SL, Lomeo K, Amann JM, Felley-Bosco E, Stahel RA, Robinson BWS, Creaney J, Pass HI, Vitek O, Carbone DP. Verification of a Blood-Based Targeted Proteomics Signature for Malignant Pleural Mesothelioma. Cancer Epidemiol Biomarkers Prev 2020; 29:1973-1982. [PMID: 32732250 PMCID: PMC7541795 DOI: 10.1158/1055-9965.epi-20-0543] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/10/2020] [Revised: 06/18/2020] [Accepted: 07/27/2020] [Indexed: 11/16/2022]
Abstract
BACKGROUND We have verified a mass spectrometry (MS)-based targeted proteomics signature for the detection of malignant pleural mesothelioma (MPM) from the blood. METHODS A seven-peptide biomarker MPM signature by targeted proteomics in serum was identified in a previous independent study. Here, we have verified the predictive accuracy of a reduced version of that signature, now composed of six-peptide biomarkers. We have applied liquid chromatography-selected reaction monitoring (LC-SRM), also known as multiple-reaction monitoring (MRM), for the investigation of 402 serum samples from 213 patients with MPM and 189 cancer-free asbestos-exposed donors from the United States, Australia, and Europe. RESULTS Each of the biomarkers composing the signature was independently informative, with no apparent functional or physical relation to each other. The multiplexing possibility offered by MS proteomics allowed their integration into a single signature with a higher discriminating capacity than that of the single biomarkers alone. The strategy allowed in this way to increase their potential utility for clinical decisions. The signature discriminated patients with MPM and asbestos-exposed donors with AUC of 0.738. For early-stage MPM, AUC was 0.765. This signature was also prognostic, and Kaplan-Meier analysis showed a significant difference between high- and low-risk groups with an HR of 1.659 (95% CI, 1.075-2.562; P = 0.021). CONCLUSIONS Targeted proteomics allowed the development of a multianalyte signature with diagnostic and prognostic potential for MPM from the blood. IMPACT The proteomic signature represents an additional diagnostic approach for informing clinical decisions for patients at risk for MPM.
Affiliation(s)
- Ferdinando Cerciello
  - James Thoracic Center, James Cancer Center, The Ohio State University Medical Center, Columbus, Ohio
- Meena Choi
  - College of Computer and Information Science, Northeastern University, Boston, Massachusetts
- Sara L Sinicropi-Yao
  - James Thoracic Center, James Cancer Center, The Ohio State University Medical Center, Columbus, Ohio
- Katie Lomeo
  - James Thoracic Center, James Cancer Center, The Ohio State University Medical Center, Columbus, Ohio
- Joseph M Amann
  - James Thoracic Center, James Cancer Center, The Ohio State University Medical Center, Columbus, Ohio
- Emanuela Felley-Bosco
  - Laboratory of Molecular Oncology, Division of Thoracic Surgery, University Hospital Zürich, Zürich, Switzerland
- Rolf A Stahel
  - Department of Oncology, Center of Hematology and Oncology, Comprehensive Cancer Center Zürich, University Hospital Zürich, Zürich, Switzerland
- Bruce W S Robinson
  - National Centre for Asbestos Related Disease, University of Western Australia, School of Medicine and Pharmacology, Nedlands, Western Australia
- Jenette Creaney
  - National Centre for Asbestos Related Disease, University of Western Australia, School of Medicine and Pharmacology, Nedlands, Western Australia
- Harvey I Pass
  - New York University, Langone Medical Center, New York, New York
- Olga Vitek
  - College of Computer and Information Science, Northeastern University, Boston, Massachusetts
- David P Carbone
  - James Thoracic Center, James Cancer Center, The Ohio State University Medical Center, Columbus, Ohio
5
Steffen P, Wu J, Hariharan S, Voss H, Raghunath V, Molloy MP, Schlüter H. OmixLitMiner: A Bioinformatics Tool for Prioritizing Biological Leads from 'Omics Data Using Literature Retrieval and Data Mining. Int J Mol Sci 2020; 21:ijms21041374. [PMID: 32092871 PMCID: PMC7073124 DOI: 10.3390/ijms21041374] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/07/2019] [Revised: 02/05/2020] [Accepted: 02/13/2020] [Indexed: 12/13/2022]
Abstract
Proteomics and genomics discovery experiments generate increasingly large result tables, necessitating more researcher time to convert the biological data into new knowledge. Literature review is an important step in this process and can be tedious for large scale experiments. An informed and strategic decision about which biomolecule targets should be pursued for follow-up experiments thus remains a considerable challenge. To streamline and formalise this process of literature retrieval and analysis of discovery based 'omics data and as a decision-facilitating support tool for follow-up experiments we present OmixLitMiner, a package written in the computational language R. The tool automates the retrieval of literature from PubMed based on UniProt protein identifiers, gene names and their synonyms, combined with user defined contextual keyword search (i.e., gene ontology based). The search strategy is programmed to allow either strict or more lenient literature retrieval and the outputs are assigned to three categories describing how well characterized a regulated gene or protein is. The category helps to meet a decision, regarding which gene/protein follow-up experiments may be performed for gaining new knowledge and to exclude following already known biomarkers. We demonstrate the tool's usefulness in this retrospective study assessing three cancer proteomics and one cancer genomics publication. Using the tool, we were able to corroborate most of the decisions in these papers as well as detect additional biomolecule leads that may be valuable for future research.
Affiliation(s)
- Pascal Steffen
  - Bowel Cancer and Biomarker Research, Kolling Institute, The University of Sydney, Sydney 2065, Australia
  - Department of Molecular Sciences, Macquarie University, Sydney 2109, Australia
  - Mass Spectrometric Proteomics Group, Department of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf (UKE), Hamburg 20246, Germany
- Jemma Wu
  - Department of Molecular Sciences, Macquarie University, Sydney 2109, Australia
- Shubhang Hariharan
  - Bowel Cancer and Biomarker Research, Kolling Institute, The University of Sydney, Sydney 2065, Australia
- Hannah Voss
  - Mass Spectrometric Proteomics Group, Department of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf (UKE), Hamburg 20246, Germany
- Vijay Raghunath
  - Sydney Informatics Hub, The University of Sydney, Sydney 2008, Australia
- Mark P. Molloy
  - Bowel Cancer and Biomarker Research, Kolling Institute, The University of Sydney, Sydney 2065, Australia
  - Department of Molecular Sciences, Macquarie University, Sydney 2109, Australia
  - Correspondence: (M.P.M.); (H.S.)
- Hartmut Schlüter
  - Mass Spectrometric Proteomics Group, Department of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf (UKE), Hamburg 20246, Germany
  - Correspondence: (M.P.M.); (H.S.)
6
Han Y, Wennersten SA, Lam MPY. Working the literature harder: what can text mining and bibliometric analysis reveal? Expert Rev Proteomics 2019; 16:871-873. [PMID: 31822126 DOI: 10.1080/14789450.2019.1703678] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/11/2023]
Affiliation(s)
- Yu Han
  - Department of Medicine, Division of Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Consortium for Fibrosis Research and Translation, University of Colorado Anschutz Medical Campus, Aurora, USA
- Sara A Wennersten
  - Department of Medicine, Division of Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Consortium for Fibrosis Research and Translation, University of Colorado Anschutz Medical Campus, Aurora, USA
- Maggie P Y Lam
  - Department of Medicine, Division of Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Consortium for Fibrosis Research and Translation, University of Colorado Anschutz Medical Campus, Aurora, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
7
Ruiz-Romero C, Lam MPY, Nilsson P, Önnerfjord P, Utz PJ, Van Eyk JE, Venkatraman V, Fert-Bober J, Watt FE, Blanco FJ. Mining the Proteome Associated with Rheumatic and Autoimmune Diseases. J Proteome Res 2019; 18:4231-4239. [PMID: 31599600 DOI: 10.1021/acs.jproteome.9b00360] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/17/2023]
Abstract
A steady increase in the incidence of osteoarthritis and other rheumatic diseases has been observed in recent decades, including autoimmune conditions such as rheumatoid arthritis, spondyloarthropathies, systemic lupus erythematosus, systemic sclerosis, and Sjögren's syndrome. Rheumatic and autoimmune diseases (RADs) are characterized by the inflammation of joints, muscles, or other connective tissues. In addition to often experiencing debilitating mobility and pain, RAD patients are also at a higher risk of suffering comorbidities such as cardiovascular or infectious events. Given the socioeconomic impact of RADs, broad research efforts have been dedicated to these diseases worldwide. In the present work, we applied literature mining platforms to identify "popular" proteins closely related to RADs. The platform is based on publicly available literature. The results not only will enable the systematic prioritization of candidates to perform targeted proteomics studies but also may lead to a greater insight into the key pathogenic processes of these disorders.
Affiliation(s)
- Cristina Ruiz-Romero
  - Grupo de Investigación de Reumatología (GIR), Unidad de Proteómica, INIBIC - Complejo Hospitalario Universitario de A Coruña, SERGAS, Universidad de A Coruña, A Coruña 15006, Spain
- Maggie P Y Lam
  - Department of Medicine, Division of Cardiology, Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado Denver, Aurora, Colorado 80045, United States
- Peter Nilsson
  - Division of Affinity Proteomics, SciLifeLab, Department of Protein Science, KTH Royal Institute of Technology, Stockholm 17121, Sweden
- Patrik Önnerfjord
  - Department of Clinical Sciences, Section for Rheumatology and Molecular Skeletal Biology, Lund University, Lund 22184, Sweden
- Paul J Utz
  - Division of Immunology and Rheumatology, Stanford University School of Medicine, Palo Alto, California 94304, United States
- Jennifer E Van Eyk
  - Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Vidya Venkatraman
  - Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Justyna Fert-Bober
  - Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Fiona E Watt
  - Arthritis Research UK Centre for Osteoarthritis Pathogenesis, Kennedy Institute of Rheumatology, University of Oxford, Oxford OX3 7FY, United Kingdom
- Francisco J Blanco
  - Grupo de Investigación de Reumatología, INIBIC-Complejo Hospitalario Universitario de A Coruña, SERGAS, Departamento de Medicina Universidad de A Coruña, A Coruña 15006, Spain
8
Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Pennington S, Snyder MP, Baker MS, Deutsch EW. Progress on Identifying and Characterizing the Human Proteome: 2019 Metrics from the HUPO Human Proteome Project. J Proteome Res 2019; 18:4098-4107. [PMID: 31430157 PMCID: PMC6898754 DOI: 10.1021/acs.jproteome.9b00434] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/17/2023]
Abstract
The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2019-01-11 contains 17 694 proteins with strong protein-level evidence (PE1), compliant with HPP Guidelines for Interpretation of MS Data v2.1; these represent 89% of all 19 823 neXtProt predicted coding genes (all PE1,2,3,4 proteins), up from 17 470 one year earlier. Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalyzed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17 000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 160 publications in 2018, bringing proteomics to a broad array of biomedical research.
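The counts quoted in this abstract can be cross-checked against each other. The following is a minimal sketch that uses only the figures stated above; the variable names are illustrative and not taken from neXtProt or the HPP.

```python
# Figures quoted in the abstract (neXtProt release 2019-01-11).
pe1_2019 = 17_694         # proteins with strong protein-level evidence (PE1)
predicted_total = 19_823  # all predicted PE1,2,3,4 proteins
pe1_2018 = 17_470         # PE1 count one year earlier
missing_2016 = 2_949      # PE2,3,4 "missing proteins" reported for 2016

# Missing proteins are the predicted proteins that are not yet PE1.
missing_2019 = predicted_total - pe1_2019

# Share of predicted proteins with PE1 evidence (the abstract rounds this to 89%).
pe1_share = pe1_2019 / predicted_total

# Year-on-year PE1 gain and the reduction in missing proteins since 2016.
pe1_gain = pe1_2019 - pe1_2018
missing_drop = missing_2016 - missing_2019

print(f"missing proteins (2019): {missing_2019}")
print(f"PE1 share: {pe1_share:.1%}")
print(f"PE1 gain since 2018: {pe1_gain}")
print(f"missing proteins removed since 2016: {missing_drop}")
```

The derived missing-protein count matches the 2129 stated in the abstract, which confirms that the PE1 and PE2,3,4 figures partition the same total.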
Affiliation(s)
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Christopher M. Overall
  - Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, British Columbia V6T 1Z3, Canada
- Jochen M. Schwenk
  - Science for Life Laboratory, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Young-Ki Paik
  - Yonsei Proteome Research Center, Yonsei University, Room 425, Building #114, 50 Yonsei-ro, Seodaemoon-ku, Seoul 120-749, South Korea
- Jennifer E. Van Eyk
  - Advanced Clinical BioSystems Research Institute, Cedars Sinai Precision Biomarker Laboratories, Barbra Streisand Women’s Heart Center, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Siqi Liu
  - BGI Group-Shenzhen, Yantian District, Shenzhen 518083, China
- Stephen Pennington
  - School of Medicine, University College Dublin, Conway Institute Belfield, Dublin 4, Ireland
- Michael P. Snyder
  - Department of Genetics, Stanford University, Alway Building, 300 Pasteur Drive and 3165 Porter Drive, Palo Alto, California 94304, United States
- Mark S. Baker
  - Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Macquarie University, 75 Talavera Road, North Ryde, NSW 2109, Australia
- Eric W. Deutsch
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
9
Paik YK, Overall CM, Corrales F, Deutsch EW, Lane L, Omenn GS. Toward Completion of the Human Proteome Parts List: Progress Uncovering Proteins That Are Missing or Have Unknown Function and Developing Analytical Methods. J Proteome Res 2019; 17:4023-4030. [PMID: 30985145 PMCID: PMC6288998 DOI: 10.1021/acs.jproteome.8b00885] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 02/07/2023]
Affiliation(s)
- Young-Ki Paik
  - Yonsei Proteome Research Center, College of Life Science and Technology, Yonsei University
- Christopher M Overall
  - Centre for Blood Research, Departments of Oral Biological & Medical Sciences and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia
- Fernando Corrales
  - Functional Proteomics Laboratory National Center of Biotechnology, CSIC
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, CMU, University of Geneva
- Gilbert S Omenn
  - Institute for Systems Biology, Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics & School of Public Health, University of Michigan
10
Caufield JH, Ping P. New advances in extracting and learning from protein-protein interactions within unstructured biomedical text data. Emerg Top Life Sci 2019; 3:357-369. [PMID: 33523203 DOI: 10.1042/etls20190003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/05/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 12/14/2022]
Abstract
Protein-protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintenance of these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein-protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of these methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges facing the application of PPI mining to the text concerning protein families, using the multifunctional 14-3-3 protein family as an example.
Affiliation(s)
- J Harry Caufield
  - The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Peipei Ping
  - The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Medicine/Cardiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Bioinformatics, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Scalable Analytics Institute (ScAi), University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
11
Fernández-Irigoyen J, Corrales F, Santamaría E. The Human Brain Proteome Project: Biological and Technological Challenges. Methods Mol Biol 2019; 2044:3-23. [PMID: 31432403 DOI: 10.1007/978-1-4939-9706-0_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Brain proteomics has become a method of choice that allows zooming-in where neuropathophysiological alterations are taking place, detecting protein mediators that might eventually be measured in cerebrospinal fluid (CSF) as potential neuropathologically derived biomarkers. Following this hypothesis, mass spectrometry-based neuroproteomics has emerged as a powerful approach to profile neural proteomes derived from brain structures and CSF in order to map the extensive protein catalog of the human brain. This chapter provides a historical perspective on the Human Brain Proteome Project (HBPP), some recommendation to the experimental design in neuroproteomic projects, and a brief description of relevant technological and computational innovations that are emerging in the neurobiology field thanks to the proteomics community. Importantly, this chapter highlights recent discoveries from the biology- and disease-oriented branch of the HBPP (B/D-HBPP) focused on spatiotemporal proteomic characterizations of mouse models of neurodegenerative diseases, elucidation of proteostatic networks in different types of dementia, the characterization of unresolved clinical phenotypes, and the discovery of novel biomarker candidates in CSF.
Affiliation(s)
- Joaquín Fernández-Irigoyen
  - Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
- Fernando Corrales
  - Functional Proteomics Laboratory, Proteored-ISCIII, CIBERehd, Madrid, Spain
- Enrique Santamaría
  - Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
12
Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Snyder M, Baker MS, Deutsch EW. Progress on Identifying and Characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project. J Proteome Res 2018; 17:4031-4041. [PMID: 30099871 PMCID: PMC6387656 DOI: 10.1021/acs.jproteome.8b00441] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 02/07/2023]
Abstract
The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17 470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17 008 in release 2017-01-23 and 13 975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16 092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15 798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.
Affiliation(s)
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Christopher M. Overall
  - Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, BC Canada V6T 1Z3
- Jochen M. Schwenk
  - Science for Life Laboratory, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Young-Ki Paik
  - Yonsei Proteome Research Center, Room 425, Building #114, Yonsei University, 50 Yonsei-ro, Seodaemoon-ku, Seoul 120-749, Korea
- Jennifer E. Van Eyk
  - Advanced Clinical BioSystems Research Institute, Cedars Sinai Precision Biomarker Laboratories, Barbra Streisand Women’s Heart Center, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Siqi Liu
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, United States
- Michael Snyder
  - Department of Genetics, Stanford University, Alway Building, 300 Pasteur Drive, 3165 Porter Drive, Palo Alto, 94304, United States
- Mark S. Baker
  - Department of Biomedical Sciences, Macquarie University, NSW 2109, Australia
- Eric W. Deutsch
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
13
|
Lau E, Venkatraman V, Thomas CT, Wu JC, Van Eyk JE, Lam MPY. Identifying High-Priority Proteins Across the Human Diseasome Using Semantic Similarity. J Proteome Res 2018; 17:4267-4278. [PMID: 30256117 PMCID: PMC6606054 DOI: 10.1021/acs.jproteome.8b00393] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/25/2022]
Abstract
Identifying the genes and proteins associated with a biological process or disease is a central goal of the biomedical research enterprise. However, relatively few systematic approaches are available that provide objective evaluation of the genes or proteins known to be important to a research topic, and hence researchers often rely on subjective evaluation of domain experts and laborious manual literature review. Computational bibliometric analysis, in conjunction with text mining and data curation, attempts to automate this process and return prioritized proteins in any given research topic. We describe here a method to identify and rank protein-topic relationships by calculating the semantic similarity between a protein and a query term in the biomedical literature while adjusting for the impact and immediacy of associated research articles. We term the calculated metric the weighted copublication distance (WCD) and show that it compares well to related approaches in predicting benchmark protein lists in multiple biological processes. We used WCD to extract prioritized "popular proteins" across multiple cell types, subanatomical regions, and standardized vocabularies containing over 20 000 human disease terms. The collection of protein-disease associations across the resulting human "diseasome" supports data analytical workflows to perform reverse protein-to-disease queries and functional annotation of experimental protein lists. We envision that the described improvement to the popular proteins strategy will be useful for annotating protein lists and guiding method development efforts as well as generating new hypotheses on understudied disease proteins using bibliometric information.
Affiliation(s)
- Edward Lau
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California 94305, United States
- Vidya Venkatraman
  - Advanced Clinical Biosystems Research Institute, Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Cody T. Thomas
  - Department of Medicine, Division of Cardiology, Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado Denver, Aurora, Colorado 80045, United States
- Joseph C. Wu
  - Stanford Cardiovascular Institute, Stanford University, Stanford, California 94305, United States
- Jennifer E. Van Eyk
  - Advanced Clinical Biosystems Research Institute, Department of Medicine and The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Maggie P. Y. Lam
  - Department of Medicine, Division of Cardiology, Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado Denver, Aurora, Colorado 80045, United States

14
Boersema PJ, Melnik A, Hazenberg BPC, Rezeli M, Marko-Varga G, Kamiie J, Portelius E, Blennow K, Zubarev RA, Polymenidou M, Picotti P. Biology/Disease-Driven Initiative on Protein-Aggregation Diseases of the Human Proteome Project: Goals and Progress to Date. J Proteome Res 2018; 17:4072-4084. [PMID: 30137990 DOI: 10.1021/acs.jproteome.8b00401] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/05/2023]
Abstract
The Biology/Disease-driven (B/D) working groups of the Human Proteome Project are alliances of research groups aimed at developing or improving proteomic tools to support specific biological or disease-related research areas. Here, we describe the activities and progress to date of the B/D working group focused on protein aggregation diseases (PADs). PADs are characterized by the intra- or extracellular accumulation of aggregated proteins and include devastating diseases such as Parkinson's and Alzheimer's disease and systemic amyloidosis. The PAD B/D working group aims for the development of proteomic assays for the quantification of aggregation-prone proteins involved in PADs to support basic and clinical research on PADs. Because the proteins in PADs undergo aberrant conformational changes, a goal is to quantitatively resolve altered protein structures and aggregation states in complex biological specimens. We have developed protein-extraction protocols and a set of mass spectrometric (MS) methods that enable the detection and quantification of proteins involved in the systemic and localized amyloidosis and the probing of aberrant protein conformational transitions in cell and tissue extracts. In several studies, we have demonstrated the potential of MS-based proteomics approaches for specific and sensitive clinical diagnoses and for the subtyping of PADs. The developed methods have been detailed in both protocol papers and manuscripts describing applications to facilitate implementation by nonspecialized laboratories, and assay coordinates are shared through public repositories and databases. Clinicians actively involved in the PAD working group support the transfer to clinical practice of the developed methods, such as assays to quantify specific disease-related proteins and their fragments in biofluids and multiplexed MS-based methods for the diagnosis and typing of systemic amyloidosis. 
We believe that the increasing availability of tools to precisely measure proteins involved in PADs will positively impact research on the molecular bases of these diseases and support early disease diagnosis and a more-confident subtyping.
Affiliation(s)
- Paul J Boersema
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Andre Melnik
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Bouke P C Hazenberg
  - Department of Rheumatology & Clinical Immunology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands
- Melinda Rezeli
  - Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University, BMC D13, 221 84 Lund, Sweden
- György Marko-Varga
  - Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University, BMC D13, 221 84 Lund, Sweden
- Junichi Kamiie
  - Laboratory of Veterinary Pathology, Azabu University, 1-17-71 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5201, Japan
- Erik Portelius
  - Institute of Neuroscience and Physiology, Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at University of Gothenburg, S-431 80 Mölndal, Sweden
  - Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal S-431 80, Sweden
- Kaj Blennow
  - Institute of Neuroscience and Physiology, Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at University of Gothenburg, S-431 80 Mölndal, Sweden
  - Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal S-431 80, Sweden
- Roman A Zubarev
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, 17177 Stockholm, Sweden
- Magdalini Polymenidou
  - Institute of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, Zürich, Switzerland
- Paola Picotti
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland

15
Mato JM, Elortza F, Lu SC, Brun V, Paradela A, Corrales FJ. Liver cancer-associated changes to the proteome: what deserves clinical focus? Expert Rev Proteomics 2018; 15:749-756. [PMID: 30204005 DOI: 10.1080/14789450.2018.1521277] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
Abstract
INTRODUCTION Hepatocellular carcinoma (HCC) is recognized as the fifth most common neoplasm and currently represents the second leading cause of cancer-related death worldwide. Although great progress has been made in understanding its pathogenesis, HCC represents a heavy societal and economic burden, as most patients are still diagnosed at advanced stages and the 5-year survival rate remains below 20%. Early detection and revolutionary therapies that rely on the discovery of new molecular biomarkers and therapeutic targets are therefore urgently needed to develop precision medicine strategies for more efficient management of patients. AREAS COVERED This review aims to comprehensively analyse the proteomics-based research conducted in the last few years to address some of the principal open questions in HCC biology, based on the identification of molecular drivers of tumor progression and metastasis. EXPERT COMMENTARY The technical advances in mass spectrometry over the last decade have significantly improved the analytical capacity of proteome-wide studies. Large-scale identification and quantification of proteins and protein variants (post-translational modifications) have allowed detailed dissection of the molecular mechanisms underlying HCC progression and are already paving the way for the identification of clinically relevant proteins and their use in patient care.
Affiliation(s)
- José M Mato
  - CIC bioGUNE, CIBERehd, ProteoRed-ISCIII, Bizkaia Science and Technology Park, Derio, Spain
  - National Institute for the Study of Liver and Gastrointestinal Diseases (CIBERehd), Carlos III National Institute of Health, Madrid, Spain
- Félix Elortza
  - CIC bioGUNE, CIBERehd, ProteoRed-ISCIII, Bizkaia Science and Technology Park, Derio, Spain
  - National Institute for the Study of Liver and Gastrointestinal Diseases (CIBERehd), Carlos III National Institute of Health, Madrid, Spain
- Shelly C Lu
  - Division of Digestive and Liver Diseases, Cedars-Sinai Medical Center, LA, CA, USA
- Virginie Brun
  - Université Grenoble-Alpes, CEA, BIG, Biologie à Grande Echelle, Inserm, Grenoble, France
- Alberto Paradela
  - Functional Proteomics Laboratory, Centro Nacional de Biotecnología-CSIC, Proteored-ISCIII, CIBERehd, Madrid, Spain
- Fernando J Corrales
  - National Institute for the Study of Liver and Gastrointestinal Diseases (CIBERehd), Carlos III National Institute of Health, Madrid, Spain
  - Functional Proteomics Laboratory, Centro Nacional de Biotecnología-CSIC, Proteored-ISCIII, CIBERehd, Madrid, Spain

16
Yu KH, Lee TLM, Chen YJ, Ré C, Kou SC, Chiang JH, Snyder M, Kohane IS. A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-Driven Human Proteome Project. J Proteome Res 2018; 17:4345-4357. [PMID: 30094994 DOI: 10.1021/acs.jproteome.8b00378] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.
Affiliation(s)
- Kun-Hsing Yu
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
  - Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Tsung-Lu Michael Lee
  - Department of Information Engineering, Kun Shan University, Tainan City 710, Taiwan
- Yu-Ju Chen
  - Institute of Chemistry, Academia Sinica, Taipei City 115, Taiwan
- Christopher Ré
  - Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Samuel C Kou
  - Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Jung-Hsien Chiang
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701, Taiwan
- Michael Snyder
  - Department of Genetics, School of Medicine, Stanford University, Stanford, California 94305, United States
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States