1
Zhan Z, Wang L. Fast peak error correction algorithms for proteoform identification using top-down tandem mass spectra. Bioinformatics 2024; 40:btae149. [PMID: 38498847 PMCID: PMC11212493 DOI: 10.1093/bioinformatics/btae149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 03/05/2024] [Accepted: 03/15/2024] [Indexed: 03/20/2024]
Abstract
MOTIVATION: Proteoform identification is an important problem in proteomics. The main task is to find a modified protein that best fits the input spectrum. To overcome the combinatorial explosion of possible proteoforms, the proteoform mass graph and spectrum mass graph are used to represent the protein database and the spectrum, respectively. The problem becomes finding an optimal alignment between the proteoform mass graph and the spectrum mass graph. Peak error correction is an important issue for computing an optimal alignment between the two input mass graphs.
RESULTS: We propose a faster algorithm for the error correction alignment of spectrum mass graph and proteoform mass graph problem and produce a program package TopMGFast. The newly designed algorithms require less space and running time so that we are able to compute global optimal alignments for the two input mass graphs in a reasonable time. For the local alignment version, experiments show that the running time of the new algorithm is reduced by 2.5 times. For the global alignment version, experiments show that the maximum mass errors between any pair of matched nodes in the alignments obtained by our method are within a small range as designed, while the alignments produced by the state-of-the-art method, TopMG, have very large maximum mass errors for many cases. The obtained alignment sizes are roughly the same for both TopMG and TopMGFast. Of course, TopMGFast needs more running time than TopMG. Therefore, our new algorithm can obtain more reliable global alignments within a reasonable time. This is the first time that global optimal error correction alignments can be obtained using real datasets.
AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/Zeirdo/TopMGFast.
Affiliation(s)
- Zhaohui Zhan
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institution, Shenzhen, 518057, China
2
Castillo J, de la Iglesia A, Leiva M, Jodar M, Oliva R. Proteomics of human spermatozoa. Hum Reprod 2023; 38:2312-2320. [PMID: 37632247 DOI: 10.1093/humrep/dead170] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/21/2022] [Revised: 07/12/2023] [Indexed: 08/27/2023]
Abstract
Proteomic methodologies offer a robust approach to identify and quantify thousands of proteins from semen components in both fertile donors and infertile patients. These strategies provide an unprecedented discovery potential, which many research teams are currently exploiting. However, it is essential to follow a suitable experimental design to generate robust data, including proper purification of samples, appropriate technical procedures to increase identification throughput, and data analysis following quality criteria. More than 6000 proteins have been described so far through proteomic analyses in the mature sperm cell, increasing our knowledge on processes involved in sperm function, intercommunication between spermatozoa and seminal fluid, and the transcriptional origin of the proteins. These data have been complemented with comparative studies to ascertain the potential role of the identified proteins on sperm maturation and functionality, and its impact on infertility. By comparing sperm protein profiles, many proteins involved in the acquisition of fertilizing ability have been identified. Furthermore, altered abundance of specific protein groups has been observed in a wide range of infertile phenotypes, including asthenozoospermia, oligozoospermia, and normozoospermia with unsuccessful assisted reproductive techniques outcomes, leading to the identification of potential clinically useful protein biomarkers. Finally, proteomics has been used to evaluate alterations derived from semen sample processing, which might have an impact on fertility treatments. However, the intrinsic heterogeneity and inter-individual variability of the semen samples have resulted in a relatively low overlap among proteomic reports, highlighting the relevance of combining strategies for data validation and applying strict criteria for proteomic data analysis to obtain reliable results. 
This mini-review provides an overview of the most critical steps to conduct robust sperm proteomic studies, the most relevant results obtained so far, and potential next steps to increase the impact of sperm proteomic data.
Affiliation(s)
- Judit Castillo
  - Molecular Biology of Reproduction and Development Research Group, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Fundació Clínic per a la Recerca Biomèdica, Universitat de Barcelona (UB), Barcelona, Spain
- Alberto de la Iglesia
  - Molecular Biology of Reproduction and Development Research Group, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Fundació Clínic per a la Recerca Biomèdica, Universitat de Barcelona (UB), Barcelona, Spain
- Marina Leiva
  - Molecular Biology of Reproduction and Development Research Group, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Fundació Clínic per a la Recerca Biomèdica, Universitat de Barcelona (UB), Barcelona, Spain
- Meritxell Jodar
  - Molecular Biology of Reproduction and Development Research Group, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Fundació Clínic per a la Recerca Biomèdica, Universitat de Barcelona (UB), Barcelona, Spain
  - Biochemistry and Molecular Genetics Service, Biomedical Diagnostic Center (CDB), Hospital Clínic de Barcelona, Barcelona, Spain
- Rafael Oliva
  - Molecular Biology of Reproduction and Development Research Group, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Fundació Clínic per a la Recerca Biomèdica, Universitat de Barcelona (UB), Barcelona, Spain
  - Biochemistry and Molecular Genetics Service, Biomedical Diagnostic Center (CDB), Hospital Clínic de Barcelona, Barcelona, Spain
3
Miyazaki MA, Guilharducci RL, Intasqui P, Bertolla RP. Mapping the human sperm proteome - novel insights into reproductive research. Expert Rev Proteomics 2023; 20:19-45. [PMID: 37140161 DOI: 10.1080/14789450.2023.2210764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
INTRODUCTION: Spermatozoa are highly specialized cells with unique morphology. In addition, spermatozoa lose a considerable amount of cytoplasm during spermiogenesis, when they also compact their DNA, resulting in a transcriptionally quiescent cell. Throughout the male reproductive tract, sperm will acquire proteins that enable them to interact with the female reproductive tract. After ejaculation, proteins undergo post-translational modifications for sperm to capacitate, hyperactivate and fertilize the oocyte. Many proteins have been identified as predictors of male infertility, and also investigated in diseases that compromise reproductive potential.
AREAS COVERED: In this review we proposed to summarize the recent findings about the sperm proteome and how they affect sperm structure, function, and fertility. A literature search was performed using PubMed and Google Scholar databases within the past 5 years until August 2022.
EXPERT OPINION: Sperm function depends on protein abundance, conformation, and PTMs; understanding the sperm proteome may help to identify pathways essential to fertility, even making it possible to unravel the mechanisms involved in idiopathic infertility. In addition, proteomics evaluation offers knowledge regarding alterations that compromise the male reproductive potential.
Affiliation(s)
- Mika Alexia Miyazaki
  - Department of Surgery, Division of Urology, Human Reproduction Section, Universidade Federal de São Paulo, São Paulo, Brazil
- Raquel Lozano Guilharducci
  - Department of Surgery, Division of Urology, Human Reproduction Section, Universidade Federal de São Paulo, São Paulo, Brazil
- Paula Intasqui
  - Department of Surgery, Division of Urology, Human Reproduction Section, Universidade Federal de São Paulo, São Paulo, Brazil
- Ricardo Pimenta Bertolla
  - Department of Surgery, Division of Urology, Human Reproduction Section, Universidade Federal de São Paulo, São Paulo, Brazil
4
de la Iglesia A, Jodar M, Oliva R, Castillo J. Insights into the sperm chromatin and implications for male infertility from a protein perspective. WIREs Mech Dis 2023; 15:e1588. [PMID: 36181449 DOI: 10.1002/wsbm.1588] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/16/2022] [Revised: 09/06/2022] [Accepted: 09/12/2022] [Indexed: 11/06/2022]
Abstract
Male germ cells undergo an extreme but fascinating process of chromatin remodeling that begins in the testis during the last phase of spermatogenesis and continues through epididymal sperm maturation. Most of the histones are replaced by small proteins named protamines, whose high basicity leads to a tight genomic compaction. This process is epigenetically regulated at many levels, not only by posttranslational modifications, but also by readers, writers, and erasers, in a context of a highly coordinated postmeiotic gene expression program. Protamines are key proteins for acquiring this highly specialized chromatin conformation, needed for sperm functionality. Interestingly, and contrary to what could be inferred from its very specific DNA-packaging function across protamine-containing species, human sperm chromatin contains a wide spectrum of protamine proteoforms, including truncated and posttranslationally modified proteoforms. The generation of protamine knock-out models revealed not only chromatin compaction defects, but also collateral sperm alterations contributing to infertile phenotypes, evidencing the importance of sperm chromatin protamination toward the generation of a new individual. The unique features of sperm chromatin have motivated its study, applying from conventional to the most ground-breaking techniques to disentangle its peculiarities and the cellular mechanisms governing its successful conferment, especially relevant from the protein point of view due to the important epigenetic role of sperm nuclear proteins. Gathering and contextualizing the most striking discoveries will provide a global understanding of the importance and complexity of achieving a proper chromatin compaction and exploring its implications on postfertilization events and beyond. This article is categorized under: Reproductive System Diseases > Genetics/Genomics/Epigenetics Reproductive System Diseases > Molecular and Cellular Physiology.
Affiliation(s)
- Alberto de la Iglesia
  - Molecular Biology of Reproduction and Development Research Group, Fundació Clínic per a la Recerca Biomèdica, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona (UB), Barcelona, Spain
- Meritxell Jodar
  - Molecular Biology of Reproduction and Development Research Group, Fundació Clínic per a la Recerca Biomèdica, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona (UB), Barcelona, Spain
  - Biochemistry and Molecular Genetics Service, Hospital Clinic, Barcelona, Spain
- Rafael Oliva
  - Molecular Biology of Reproduction and Development Research Group, Fundació Clínic per a la Recerca Biomèdica, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona (UB), Barcelona, Spain
  - Biochemistry and Molecular Genetics Service, Hospital Clinic, Barcelona, Spain
- Judit Castillo
  - Molecular Biology of Reproduction and Development Research Group, Fundació Clínic per a la Recerca Biomèdica, Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona (UB), Barcelona, Spain
5
Martin EA, Fulcher JM, Zhou M, Monroe ME, Petyuk VA. TopPICR: A Companion R Package for Top-Down Proteomics Data Analysis. J Proteome Res 2023; 22:399-409. [PMID: 36631391 DOI: 10.1021/acs.jproteome.2c00570] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/13/2023]
Abstract
Top-down proteomics is the analysis of proteins in their intact form without proteolysis, thus preserving valuable information about post-translational modifications, isoforms, and proteolytic processing. However, it is still a developing field due to limitations in the instrumentation, difficulties with the interpretation of complex mass spectra, and a lack of well-established quantification approaches. TopPIC is one of the popular tools for proteoform identification. We extended its capabilities into label-free proteoform quantification by developing a companion R package (TopPICR). Key steps in the TopPICR pipeline include filtering identifications, inferring a minimal set of protein accessions explaining the observed sequences, aligning retention times, recalibrating measured masses, clustering features across data sets, and finally compiling feature intensities using the match-between-runs approach. The output of the pipeline is an MSnSet object which makes downstream data analysis seamlessly compatible with packages from the Bioconductor project. It also provides the capability for visualizing proteoforms within the context of the parent protein sequence. The functionality of TopPICR is demonstrated on top-down LC-MS/MS data sets of 10 human-in-mouse xenografts of luminal and basal breast tumor samples.
Affiliation(s)
- Evan A Martin
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- James M Fulcher
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Matthew E Monroe
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Vladislav A Petyuk
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
6
Koudelka T, Winkels K, Kaleja P, Tholey A. Shedding light on both ends: An update on analytical approaches for N- and C-terminomics. Biochim Biophys Acta Mol Cell Res 2021; 1869:119137. [PMID: 34626679 DOI: 10.1016/j.bbamcr.2021.119137] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2021] [Revised: 08/27/2021] [Accepted: 09/06/2021] [Indexed: 02/04/2023]
Abstract
Though proteases were long regarded as nonspecific degradative enzymes, over time, it was recognized that they also hydrolyze peptide bonds very specifically with a limited substrate pool. This irreversible posttranslational modification modulates the fate and activity of many proteins, making proteolytic processing a master switch in the regulation of e.g., the immune system, apoptosis and cancer progression. N- and C-terminomics, the identification of protein termini, has become indispensable in elucidating protease substrates and therefore protease function. Further, terminomics has the potential to identify yet unknown proteoforms, e.g. formed by alternative splicing or the recently discovered alternative ORFs. Different strategies and workflows have been developed that achieve higher sensitivity, a greater depth of coverage or higher throughput. In this review, we summarize recent developments in both N- and C-terminomics and include the potential of top-down proteomics which inherently delivers information on both ends of analytes in a single analysis.
Affiliation(s)
- Tomas Koudelka
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Konrad Winkels
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Patrick Kaleja
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany