1
Araújo R, Ramalhete L, Viegas A, Von Rekowski CP, Fonseca TAH, Calado CRC, Bento L. Simplifying Data Analysis in Biomedical Research: An Automated, User-Friendly Tool. Methods Protoc 2024; 7:36. [PMID: 38804330 PMCID: PMC11130801 DOI: 10.3390/mps7030036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/20/2024] [Accepted: 04/22/2024] [Indexed: 05/29/2024] Open
Abstract
Robust data normalization and analysis are pivotal in biomedical research to ensure that observed differences in populations are directly attributable to the target variable, rather than disparities between control and study groups. ArsHive addresses this challenge using advanced algorithms to normalize populations (e.g., control and study groups) and perform statistical evaluations between demographic, clinical, and other variables within biomedical datasets, resulting in more balanced and unbiased analyses. The tool's functionality extends to comprehensive data reporting, which elucidates the effects of data processing, while maintaining dataset integrity. Additionally, ArsHive is complemented by A.D.A. (Autonomous Digital Assistant), which employs OpenAI's GPT-4 model to assist researchers with inquiries, enhancing the decision-making process. In this proof-of-concept study, we tested ArsHive on three different datasets derived from proprietary data, demonstrating its effectiveness in managing complex clinical and therapeutic information and highlighting its versatility for diverse research fields.
Affiliation(s)
- Rúben Araújo
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
- Luís Ramalhete
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- Blood and Transplantation Center of Lisbon, IPST—Instituto Português do Sangue e da Transplantação, Alameda das Linhas de Torres 117, 1769-001 Lisbon, Portugal
- iNOVA4Health—Advancing Precision Medicine, RG11: Reno-Vascular Diseases Group, NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal
- Ana Viegas
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ESTeSL—Escola Superior de Tecnologia da Saúde de Lisboa, Instituto Politécnico de Lisboa, Avenida D. João II, Lote 4.69.01, 1990-096 Lisbon, Portugal
- Neurosciences Area, Clinical Neurophysiology Unit, ULSSJ—Unidade Local de Saúde São José, Rua José António Serrano, 1150-199 Lisbon, Portugal
- Cristiana P. Von Rekowski
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
- Tiago A. H. Fonseca
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
- Cecília R. C. Calado
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
- Institute for Bioengineering and Biosciences (iBB), The Associate Laboratory Institute for Health and Bioeconomy–i4HB, Instituto Superior Técnico (IST), Universidade de Lisboa (UL), Av. Rovisco Pais, 1049-001 Lisboa, Portugal
- Luís Bento
- Intensive Care Department, ULSSJ—Unidade Local de Saúde São José, Rua José António Serrano, 1150-199 Lisbon, Portugal
- Integrated Pathophysiological Mechanisms, CHRC—Comprehensive Health Research Centre, NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
2
Robles J, Prakash A, Vizcaíno JA, Casal JI. Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation. PLoS Comput Biol 2024; 20:e1011828. [PMID: 38252632 PMCID: PMC10833860 DOI: 10.1371/journal.pcbi.1011828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 02/01/2024] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
The cancer biomarker field has been an object of thorough investigation in the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of biomarkers) for their downstream validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve CRC public proteomics datasets (mostly from the PRIDE database) were re-analysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. Integrating these data with survival annotation data, we have validated in silico a six-gene signature for CRC classification at the protein level, and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of re-using public proteomics datasets as the basis to create a useful resource for biomarker discovery and validation. The protein expression data have been made available in the public resource Expression Atlas.
Affiliation(s)
- Javier Robles
- Department of Molecular Biomedicine, Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Protein Alternatives SL, Tres Cantos, Madrid, Spain
- Ananth Prakash
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- J. Ignacio Casal
- Department of Molecular Biomedicine, Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas, Madrid, Spain
3
Gagaoua M, Suman SP, Purslow PP, Lebret B. The color of fresh pork: Consumers expectations, underlying farm-to-fork factors, myoglobin chemistry and contribution of proteomics to decipher the biochemical mechanisms. Meat Sci 2023; 206:109340. [PMID: 37708621 DOI: 10.1016/j.meatsci.2023.109340] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/10/2023] [Revised: 08/14/2023] [Accepted: 09/06/2023] [Indexed: 09/16/2023]
Abstract
The color of fresh pork is a crucial quality attribute that significantly influences consumer perception and purchase decisions. This review first explores consumer expectations and discrimination regarding pork color, and gives an overview of the underlying factors that, from farm to fork, contribute to its variation. Understanding the husbandry factors, peri- and post-mortem factors and consumer preferences is essential for the pork industry to meet market demands effectively. This review then delves into current knowledge of pork myoglobin chemistry, its modifications and pork discoloration. Pork myoglobin, which has certain peculiarities compared to that of other meat species, plays a central role in determining pork color, and a thorough understanding of the biochemical changes it undergoes is crucial to understand and improve color stability. Furthermore, the growing role of proteomics as a high-throughput approach and its application as a powerful research tool in meat research, mainly to decipher the biochemical mechanisms involved in pork color determination and identify protein biomarkers, are highlighted. Based on an integrative muscle biology approach, the available proteomics studies on pork color have enabled us to provide the first repertoire of pork color biomarkers, to shortlist and propose a list of proteins for evaluation, and to provide valuable insights into the interconnected biochemical processes implicated in pork color determination. By highlighting the contributions of proteomics in elucidating the biochemical mechanisms underlying pork color determination, the knowledge gained holds significant potential for the pork industry to effectively meet market demands, enhance product quality, and ensure consistent and appealing pork color.
Affiliation(s)
- Surendranath P Suman
- Department of Animal and Food Sciences, University of Kentucky, Lexington, KY 40546, United States
4
Claeys T, Van Den Bossche T, Perez-Riverol Y, Gevaert K, Vizcaíno JA, Martens L. lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation. Nat Commun 2023; 14:6743. [PMID: 37875519 PMCID: PMC10598006 DOI: 10.1038/s41467-023-42543-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 10/13/2023] [Indexed: 10/26/2023] Open
Abstract
Public proteomics data often lack essential metadata, limiting their potential. To address this, we present lesSDRF, a tool to simplify the process of metadata annotation, thereby ensuring that data leave a lasting, impactful legacy well beyond their initial publication.
Affiliation(s)
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
- Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK.
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium.
5
Perez-Riverol Y. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics 2022; 19:297-310. [PMID: 36529941 PMCID: PMC7614296 DOI: 10.1080/14789450.2022.2160324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
INTRODUCTION: The creation of ProteomeXchange data workflows in 2012 transformed the field of proteomics by standardizing data submission and dissemination and enabling the widespread reanalysis of public MS proteomics data worldwide. ProteomeXchange has triggered a growing trend toward public dissemination of proteomics data, facilitating the assessment, reuse, comparative analyses, and extraction of new findings from public datasets. As of 2022, the consortium comprises PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public. AREAS COVERED: Here, we review and discuss the current ecosystem of resources, guidelines, and file formats for proteomics data dissemination and reanalysis. Special attention is drawn to new quantitative and post-translational modification-oriented resources. The challenges and future directions for data deposition, including the lack of metadata and the need for cloud-based and high-performance software solutions for fast and reproducible reanalysis of the available data, are discussed. EXPERT OPINION: The success of ProteomeXchange and the amount of proteomics data available in the public domain have triggered the creation and/or growth of other protein knowledgebase resources. Data reuse is a leading, active, and evolving field, supporting the creation of new formats, tools, and workflows to rediscover and reshape the public proteomics data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
6
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Citation(s) in RCA: 3188] [Impact Index Per Article: 1594.0] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anika Frericks-Zipper
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7
A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun 2021; 12:5854. [PMID: 34615866 PMCID: PMC8494749 DOI: 10.1038/s41467-021-26111-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 05/19/2021] [Accepted: 09/16/2021] [Indexed: 11/08/2022] Open
Abstract
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
8
Muggia L, Ametrano CG, Sterflinger K, Tesei D. An Overview of Genomics, Phylogenomics and Proteomics Approaches in Ascomycota. Life (Basel) 2020; 10:E356. [PMID: 33348904 PMCID: PMC7765829 DOI: 10.3390/life10120356] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/12/2020] [Revised: 12/10/2020] [Accepted: 12/12/2020] [Indexed: 12/26/2022] Open
Abstract
Fungi are among the most successful eukaryotes on Earth: they have evolved strategies to survive in the most diverse environments and stressful conditions and have been selected and exploited for multiple aims by humans. The characteristic features intrinsic to Fungi have required evolutionary changes and adaptations at deep molecular levels. Omics approaches, nowadays including genomics, metagenomics, phylogenomics, transcriptomics, metabolomics, and proteomics, have enormously advanced the understanding of fungal diversity at diverse taxonomic levels, under changeable conditions and in still under-investigated environments. These approaches can be applied both to environmental communities and to individual organisms, either in nature or in axenic culture, and have led traditional morphology-based fungal systematics to increasingly implement molecular-based approaches. The advent of next-generation sequencing technologies was key to boosting advances in fungal genomics and proteomics research. Much effort has also been directed towards the development of methodologies for optimal genomic DNA and protein extraction and separation. To date, the number of proteomics investigations in Ascomycetes exceeds that carried out in any other fungal group. This is primarily due to the preponderance of their involvement in plant and animal diseases and multiple industrial applications, and therefore the need to understand the biological basis of the infectious process to develop mechanisms for biologic control, as well as to detect key proteins with roles in stress survival. Here we chose to present as comprehensive an overview as possible of the major advances, mainly of the past decade, in the fields of genomics (including phylogenomics) and proteomics of Ascomycota, focusing particularly on those reporting on opportunistic pathogenic, extremophilic, polyextremotolerant and lichenized fungi.
We also present a review of the most widely used genome sequencing technologies and methods for DNA sequence and protein analyses applied so far to fungi.
Affiliation(s)
- Lucia Muggia
- Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
- Claudio G. Ametrano
- Grainger Bioinformatics Center, Department of Science and Education, The Field Museum, Chicago, IL 60605, USA
- Katja Sterflinger
- Academy of Fine Arts Vienna, Institute of Natural Sciences and Technology in the Arts, 1090 Vienna, Austria
- Donatella Tesei
- Department of Biotechnology, University of Natural Resources and Life Sciences, 1190 Vienna, Austria
9
Abstract
Metadata is essential in proteomics data repositories and is crucial to interpret and reanalyze the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) data set description, (ii) the sample to data files related information, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information regarding the sample to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) have created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortiums to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
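The second metadata level described above, the link between samples and data files, can be pictured as a small tab-delimited table. The following is a minimal sketch only: the table content, the file names, and the helper `files_by_sample` are hypothetical, and the column names are assumed to follow the bracketed style used by the sample-to-data-file format rather than taken from this abstract.

```python
import csv
import io

# Hypothetical sample-to-data-file table (tab-delimited). The rows and the
# column names are illustrative assumptions, not copied from a real data set.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1_f1.raw\n"
    "sample 1\tHomo sapiens\trun 2\tsample1_f2.raw\n"
    "sample 2\tHomo sapiens\trun 3\tsample2_f1.raw\n"
)

# Columns this sketch needs in order to relate a sample to its files.
REQUIRED_COLUMNS = ("source name", "assay name", "comment[data file]")


def files_by_sample(text):
    """Return a dict mapping each sample name to its list of data files."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    mapping = {}
    for row in reader:
        mapping.setdefault(row["source name"], []).append(row["comment[data file]"])
    return mapping


mapping = files_by_sample(SDRF_TEXT)
for sample, files in mapping.items():
    print(sample, "->", ", ".join(files))
```

Without such a table, a reanalysis has no way to tell which raw file belongs to which sample or condition, which is the gap the abstract points out.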
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
10
Noor Z, Ranganathan S. Bioinformatics approaches for improving seminal plasma proteome analysis. Theriogenology 2019; 137:43-49. [PMID: 31186128 DOI: 10.1016/j.theriogenology.2019.05.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Reproduction efficiency of male animals is one of the key factors influencing the sustainability of livestock. Mass spectrometry (MS) based proteomics has become an important tool for studying seminal plasma proteomes. In this review, we summarize bioinformatics analysis strategies for current proteomics approaches, for identifying novel biomarkers of reproductive robustness.
Affiliation(s)
- Zainab Noor
- Department of Molecular Sciences, Macquarie University, Sydney, Australia
- Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, Australia.
11
Jarnuczak AF, Ternent T, Vizcaíno JA. Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities. Methods Mol Biol 2019; 1977:217-235. [PMID: 30980331 DOI: 10.1007/978-1-4939-9232-4_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
Mass spectrometry-based proteomics is no longer only a qualitative discipline, and can be successfully employed to obtain a truly multidimensional view of the proteome. In particular, systematic protein expression profiling is now a routine part of many studies in the field and beyond. The large growth in the number of quantitative studies is accompanied by a trend to share publicly the associated analysis results and the underlying raw data. This trend, established and strongly supported by public repositories such as the PRIDE database at the European Bioinformatics Institute, opens up enormous possibilities to explore the data beyond the original publications, for instance by reusing, reanalyzing, and performing different flavors of meta-analysis studies. To help researchers and scientists realize this potential, here we describe the mainstream public proteomics resources containing quantitative proteomics data, including the processed analysis results and/or the underlying raw data. We then present and discuss the most important points to consider when attempting to (re)use proteomics data in the public domain. We conclude by highlighting potential pitfalls of (re)using quantitative data and discuss some of our own experiences in this context.
Affiliation(s)
- Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
12
Gioria S, Urbán P, Hajduch M, Barboro P, Cabaleiro N, La Spina R, Chassaigne H. Proteomics study of silver nanoparticles on Caco-2 cells. Toxicol In Vitro 2018; 50:347-372. [PMID: 29626626 PMCID: PMC6021817 DOI: 10.1016/j.tiv.2018.03.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/15/2018] [Revised: 03/28/2018] [Accepted: 03/29/2018] [Indexed: 12/20/2022]
Abstract
Silver nanoparticles (AgNPs) have been incorporated into several consumer products. While these advances in technology are promising and exciting, the effects of these nanoparticles have not been studied to the same extent. Due to their size, AgNPs can penetrate the body through oral exposure and reach the gastrointestinal tract. The present study was designed as a comparative proteomic analysis of Caco-2 cells, used as an in vitro model of the small intestine, exposed to 30 nm citrate-stabilized silver nanoparticles for 24 or 72 h. Using two complementary proteomic approaches, 2D gel-based and label-free mass spectrometry, we present insight into the effects of AgNPs at the protein level. Exposure of Caco-2 cells to 1 or 10 μg/mL AgNPs resulted in 56 and 88 altered proteins at 24 h and 72 h, respectively, by the 2D gel-based technique. Ten of these proteins were common to the two time points. Using the label-free mass spectrometry technique, 291 and 179 altered proteins were found at 24 h and 72 h, respectively, of which 24 were in common. Analysis of the proteomes showed several major biological processes altered, of which cell cycle, cell morphology, and cellular function and maintenance were the most affected. Highlights: comparison between 2D gel-based and label-free MS-based proteomics; significant changes in the protein profiles of Caco-2 cells exposed to AgNPs; contribution to understanding the mechanisms of action of AgNPs.
Affiliation(s)
- Sabrina Gioria
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy.
- Patricia Urbán
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy
- Martin Hajduch
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy
- Paola Barboro
- Academic Unit of Medical Oncology, Ospedale Policlinico San Martino, L.go R. Benzi 10, 16132 Genova, Italy
- Noelia Cabaleiro
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy
- Rita La Spina
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy
- Hubert Chassaigne
- European Commission, Joint Research Centre (JRC), Directorate F - Health, Consumers and Reference Materials, Via Enrico Fermi 2749, I-21027 Ispra, VA, Italy
13
Merkley ED, Sego LH, Lin A, Leiser OP, Kaiser BLD, Adkins JN, Keim PS, Wagner DM, Kreuzer HW. Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague. PLoS One 2017; 12:e0183478. [PMID: 28854255 PMCID: PMC5576697 DOI: 10.1371/journal.pone.0183478] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/29/2016] [Accepted: 08/05/2017] [Indexed: 11/19/2022] Open
Abstract
The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally-occurring environmental ("wild") strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to convergent phenotypes, difficulty in detecting certain types of mutations, or perhaps because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We have assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work, and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild that proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data was derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well. 
Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.
Affiliation(s)
- Eric D. Merkley
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Landon H. Sego
  - Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Andy Lin
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
  - Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Owen P. Leiser
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Brooke L. Deatherage Kaiser
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Joshua N. Adkins
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Paul S. Keim
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- David M. Wagner
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Helen W. Kreuzer
  - Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America
|
14
|
A Golden Age for Working with Public Proteomics Data. Trends Biochem Sci 2017; 42:333-341. [PMID: 28118949 PMCID: PMC5414595 DOI: 10.1016/j.tibs.2017.01.001] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 10/02/2016] [Revised: 12/13/2016] [Accepted: 01/02/2017] [Indexed: 11/23/2022]
Abstract
Data sharing in mass spectrometry (MS)-based proteomics is becoming a common scientific practice, as is now common in the case of other, more mature ‘omics’ disciplines like genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets. The field of proteomics has matured and diversified substantially over the past 10 years. Proteomics data are increasingly shared through centralized, public repositories. Standardization efforts have ensured that a large proportion of these public data can be read and processed by any interested researcher. Because any proteomics data set is only partially understood, there is great opportunity for (orthogonal) reuse of public data. While public proteomics data has so far remained outside ethics and privacy discussions, recent work indicates that there is an inherent risk.
|
15
|
Thomas S, Hao L, Ricke WA, Li L. Biomarker discovery in mass spectrometry-based urinary proteomics. Proteomics Clin Appl 2016; 10:358-70. [PMID: 26703953 DOI: 10.1002/prca.201500102] [Citation(s) in RCA: 101] [Impact Index Per Article: 12.6] [Received: 08/15/2015] [Revised: 12/05/2015] [Accepted: 12/21/2015] [Indexed: 01/03/2023]
Abstract
Urinary proteomics has become one of the most attractive topics in disease biomarker discovery. MS-based proteomic analysis has advanced continuously and emerged as a prominent tool in the field of clinical bioanalysis. However, only few protein biomarkers have made their way to validation and clinical practice. Biomarker discovery is challenged by many clinical and analytical factors including, but not limited to, the complexity of urine and the wide dynamic range of endogenous proteins in the sample. This article highlights promising technologies and strategies in the MS-based biomarker discovery process, including study design, sample preparation, protein quantification, instrumental platforms, and bioinformatics. Different proteomics approaches are discussed, and progresses in maximizing urinary proteome coverage and standardization are emphasized in this review. MS-based urinary proteomics has great potential in the development of noninvasive diagnostic assays in the future, which will require collaborative efforts between analytical scientists, systems biologists, and clinicians.
Affiliation(s)
- Samuel Thomas
  - Molecular and Environmental Toxicology Center, University of Wisconsin-Madison, Madison, WI, USA
- Ling Hao
  - School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- William A Ricke
  - Molecular and Environmental Toxicology Center, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Urology, University of Wisconsin-Madison, Madison, WI, USA
- Lingjun Li
  - Molecular and Environmental Toxicology Center, University of Wisconsin-Madison, Madison, WI, USA
  - School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
|
16
|
Vizcaíno JA, Csordas A, del-Toro N, Dianes JA, Griss J, Lavidas I, Mayer G, Perez-Riverol Y, Reisinger F, Ternent T, Xu QW, Wang R, Hermjakob H. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res 2016; 44:D447-56. [PMID: 26527722 PMCID: PMC4702828 DOI: 10.1093/nar/gkv1145] [Citation(s) in RCA: 2530] [Impact Index Per Article: 316.3] [Received: 09/08/2015] [Revised: 10/14/2015] [Accepted: 10/16/2015] [Indexed: 11/18/2022] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) is the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most-widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we will give a brief update on the resources under development 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.
Affiliation(s)
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Attila Csordas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Noemi del-Toro
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- José A Dianes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Johannes Griss
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
- Ilias Lavidas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Gerhard Mayer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Florian Reisinger
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Tobias Ternent
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Qing-Wei Xu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Department of Computer Science and Technology, Hubei University of Education, Wuhan, China
- Rui Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - National Center for Protein Sciences, Beijing, China
|
17
|
Vaudel M, Verheggen K, Csordas A, Raeder H, Berven FS, Martens L, Vizcaíno JA, Barsnes H. Exploring the potential of public proteomics data. Proteomics 2016; 16:214-25. [PMID: 26449181 PMCID: PMC4738454 DOI: 10.1002/pmic.201500295] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 07/13/2015] [Revised: 08/25/2015] [Accepted: 09/28/2015] [Indexed: 12/22/2022]
Abstract
In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, this extensive material has however largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, showing ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.
Affiliation(s)
- Marc Vaudel
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Kenneth Verheggen
  - Medical Biotechnology Center, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Attila Csordas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Helge Raeder
  - Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway
- Frode S Berven
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Department of Clinical Medicine, KG Jebsen Centre for Multiple Sclerosis Research, University of Bergen, Bergen, Norway
- Lennart Martens
  - Medical Biotechnology Center, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Juan A Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Harald Barsnes
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway
|
18
|
Ezkurdia I, Calvo E, Del Pozo A, Vázquez J, Valencia A, Tress ML. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev Proteomics 2015; 12:579-93. [PMID: 26496066 PMCID: PMC4732427 DOI: 10.1586/14789450.2015.1103186] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.
Affiliation(s)
- Iakes Ezkurdia
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Enrique Calvo
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Angela Del Pozo
  - Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
  - Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Michael L. Tress
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|