1
Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, Tress ML. Evidence for widespread translation of 5' untranslated regions. Nucleic Acids Res 2024:gkae571. [PMID: 38953162 DOI: 10.1093/nar/gkae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/03/2024] Open
Abstract
Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.
Affiliation(s)
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Federico Abascal
  - Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Daniel Cerdán-Vélez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Laura Martínez Gómez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
2
Abbas Q, Wilhelm M, Kuster B, Poppenberger B, Frishman D. Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. BMC Genomics 2024; 25:619. [PMID: 38898442 PMCID: PMC11186247 DOI: 10.1186/s12864-024-10521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024] Open
Abstract
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.
Affiliation(s)
- Qussai Abbas
  - Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Bernhard Kuster
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Brigitte Poppenberger
  - Biotechnology of Horticultural Crops, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Dmitrij Frishman
  - Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
3
Glidden-Handgis G, Wheeler TJ. WAS IT A MATch I SAW? Approximate palindromes lead to overstated false match rates in benchmarks using reversed sequences. BIOINFORMATICS ADVANCES 2024; 4:vbae052. [PMID: 38764475 PMCID: PMC11099658 DOI: 10.1093/bioadv/vbae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 03/31/2024] [Accepted: 04/04/2024] [Indexed: 05/21/2024]
Abstract
Background: Software for labeling biological sequences typically produces a theory-based statistic for each match (the E-value) that indicates the likelihood of seeing that match's score by chance. E-values accurately predict false match rate for comparisons of random (shuffled) sequences, and thus provide a reasoned mechanism for setting score thresholds that enable high sensitivity with low expected false match rate. This threshold-setting strategy is challenged by real biological sequences, which contain regions of local repetition and low sequence complexity that cause excess matches between non-homologous sequences. Knowing this, tool developers often develop benchmarks that use realistic-seeming decoy sequences to explore empirical tradeoffs between sensitivity and false match rate. A recent trend has been to employ reversed biological sequences as realistic decoys, because these preserve the distribution of letters and the existence of local repeats, while disrupting the original sequence's functional properties. However, we and others have observed that sequences appear to produce high scoring alignments to their reversals with surprising frequency, leading to overstatement of false match risk that may negatively affect downstream analysis.
Results: We demonstrate that an alignment between a sequence S and its (possibly mutated) reversal tends to produce higher scores than alignment between truly unrelated sequences, even when S is a shuffled string with no notable repetitive or low-complexity regions. This phenomenon is due to the unintuitive fact that (even randomly shuffled) sequences contain palindromes that are on average longer than the longest common substrings (LCS) shared between permuted variants of the same sequence. Though the expected palindrome length is only slightly larger than the expected LCS, the distribution of alignment scores involving reversed sequences is strongly right-shifted, leading to greatly increased frequency of high-scoring alignments to reversed sequences.
Impact: Overestimates of false match risk can motivate unnecessarily high score thresholds, leading to potentially reduced true match sensitivity. Also, when tool sensitivity is only reported up to the score of the first matched decoy sequence, a large decoy set consisting of reversed sequences can obscure sensitivity differences between tools. As a result of these observations, we advise that reversed biological sequences be used as decoys only when care is taken to remove positive matches in the original (un-reversed) sequences, or when overstatement of false labeling is not a concern. Though the primary focus of the analysis is on sequence annotation, we also demonstrate that the prevalence of internal palindromes may lead to an overstatement of the rate of false labels in protein identification with mass spectrometry.
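The comparison described in this abstract, longest palindrome within a random sequence versus longest common substring between two permutations of it, can be illustrated with a small simulation. The sketch below is not the authors' code: the function names, the uniform 20-letter alphabet, the sequence length, and the number of trials are all assumptions chosen for illustration, and the measured averages will vary with those choices.

```python
import random


def longest_palindrome_length(s: str) -> int:
    """Length of the longest palindromic substring, by expanding around each center."""
    best = 0
    n = len(s)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2  # even centers: odd-length palindromes; odd centers: even-length
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        best = max(best, right - left - 1)
    return best


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def compare_palindrome_vs_lcs(alphabet="ACDEFGHIKLMNPQRSTVWY", length=200, trials=50, seed=0):
    """Return (mean longest palindrome, mean LCS against a shuffled copy) over random sequences.

    All parameters are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    palindrome_total = 0
    lcs_total = 0
    for _ in range(trials):
        letters = [rng.choice(alphabet) for _ in range(length)]
        sequence = "".join(letters)
        rng.shuffle(letters)  # a permuted variant with the same letter composition
        shuffled = "".join(letters)
        palindrome_total += longest_palindrome_length(sequence)
        lcs_total += longest_common_substring_length(sequence, shuffled)
    return palindrome_total / trials, lcs_total / trials


if __name__ == "__main__":
    mean_palindrome, mean_lcs = compare_palindrome_vs_lcs()
    print(f"mean longest palindrome: {mean_palindrome:.2f}")
    print(f"mean LCS vs. shuffled copy: {mean_lcs:.2f}")
```

A palindrome of length k inside S is also an exact match of length k between S and its reversal, which is why the first quantity is the relevant one when reversed sequences serve as decoys.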
Affiliation(s)
- Travis J Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
4
Dumas T, Gomez E, Boccard J, Ramirez G, Armengaud J, Escande A, Mathieu O, Fenet H, Courant F. Mixture effects of pharmaceuticals carbamazepine, diclofenac and venlafaxine on Mytilus galloprovincialis mussel probed by metabolomics and proteogenomics combined approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 907:168015. [PMID: 37879482 DOI: 10.1016/j.scitotenv.2023.168015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023]
Abstract
Exposure to single molecules under laboratory conditions has led to a better understanding of the mechanisms of action (MeOAs) and effects of pharmaceutical active compounds (PhACs) on non-target organisms. However, not taking the co-occurrence of contaminants in the environment and their possible interactions into account may lead to underestimation of their impacts. In this study, we combined untargeted metabolomics and proteogenomics approaches to assess the mixture effects of diclofenac, carbamazepine and venlafaxine on marine mussels (Mytilus galloprovincialis). Our multi-omics approach and data fusion strategy highlighted how such xenobiotic cocktails induce important cellular changes that can be harmful to marine bivalves. This response is mainly characterized by energy metabolism disruption, fatty acid degradation, protein synthesis and degradation, and the induction of endoplasmic reticulum stress and oxidative stress. The known MeOAs and molecular signatures of PhACs were taken into consideration to gain insight into the mixture effects, thereby revealing a potential additive effect. Multi-omics approaches on mussels as sentinels offer a comprehensive overview of molecular and cellular responses triggered by exposure to contaminant mixtures, even at environmental concentrations.
Affiliation(s)
- Thibaut Dumas
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
- Elena Gomez
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
- Julien Boccard
  - School of Pharmaceutical Sciences, University of Geneva, Geneva 1211, Switzerland; Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva 1211, Switzerland
- Gaëlle Ramirez
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
- Jean Armengaud
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, Bagnols-sur-Cèze, France
- Aurélie Escande
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
- Olivier Mathieu
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France; Laboratoire de Pharmacologie-Toxicologie, CHU de Montpellier, Montpellier, France
- Hélène Fenet
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
- Frédérique Courant
  - HydroSciences Montpellier, IRD, CNRS, University of Montpellier, Montpellier, France
5
Hauserman MR, Ferraro MJ, Carroll RK, Rice KC. Altered quorum sensing and physiology of Staphylococcus aureus during spaceflight detected by multi-omics data analysis. NPJ Microgravity 2024; 10:2. [PMID: 38191486 PMCID: PMC10774393 DOI: 10.1038/s41526-023-00343-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 12/15/2023] [Indexed: 01/10/2024] Open
Abstract
Staphylococcus aureus colonizes the nares of approximately 30% of humans, a risk factor for opportunistic infections. To gain insight into S. aureus virulence potential in the spaceflight environment, we analyzed RNA-Seq, cellular proteomics, and metabolomics data from the "Biological Research in Canisters-23" (BRIC-23) GeneLab spaceflight experiment, a mission designed to measure the response of S. aureus to growth in low earth orbit on the international space station. This experiment used Biological Research in Canisters-Petri Dish Fixation Units (BRIC-PDFUs) to grow asynchronous ground control and spaceflight cultures of S. aureus for 48 h. RNAIII, the effector of the Accessory Gene Regulator (Agr) quorum sensing system, was the most highly upregulated gene transcript in spaceflight relative to ground controls. The agr operon gene transcripts were also highly upregulated during spaceflight, followed by genes encoding phenol-soluble modulins and secreted proteases, which are positively regulated by Agr. Upregulated spaceflight genes/proteins also had functions related to urease activity, type VII-like Ess secretion, and copper transport. We also performed secretome analysis of BRIC-23 culture supernatants, which revealed that spaceflight samples had increased abundance of secreted virulence factors, including Agr-regulated proteases (SspA, SspB), staphylococcal nuclease (Nuc), and EsxA (secreted by the Ess system). These data also indicated that S. aureus metabolism is altered in spaceflight conditions relative to the ground controls. Collectively, these data suggest that S. aureus experiences increased quorum sensing and altered expression of virulence factors in response to the spaceflight environment that may impact its pathogenic potential.
Affiliation(s)
- Matthew R Hauserman
  - Department of Microbiology and Cell Science, IFAS, University of Florida, Gainesville, FL, USA
- Mariola J Ferraro
  - Department of Microbiology and Cell Science, IFAS, University of Florida, Gainesville, FL, USA
- Ronan K Carroll
  - Department of Biological Sciences, Ohio University, Athens, OH, USA
- Kelly C Rice
  - Department of Microbiology and Cell Science, IFAS, University of Florida, Gainesville, FL, USA
6
Skiadopoulou D, Vašíček J, Kuznetsova K, Bouyssié D, Käll L, Vaudel M. Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res 2023; 22:3190-3199. [PMID: 37656829 PMCID: PMC10563157 DOI: 10.1021/acs.jproteome.3c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 09/03/2023]
Abstract
Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.
Affiliation(s)
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Ksenia Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31000 Toulouse, France
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, N-0213 Oslo, Norway
7
Nowatzky Y, Benner P, Reinert K, Muth T. Mistle: bringing spectral library predictions to metaproteomics with an efficient search index. Bioinformatics 2023; 39:btad376. [PMID: 37294786 PMCID: PMC10313348 DOI: 10.1093/bioinformatics/btad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 05/11/2023] [Accepted: 06/08/2023] [Indexed: 06/11/2023] Open
Abstract
Motivation: Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics.
Results: In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes.
Availability and implementation: Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.
Affiliation(s)
- Yannek Nowatzky
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Philipp Benner
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Thilo Muth
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
8
Miura N, Okuda S. Current progress and critical challenges to overcome in the bioinformatics of mass spectrometry-based metaproteomics. Comput Struct Biotechnol J 2023; 21:1140-1150. [PMID: 36817962 PMCID: PMC9925844 DOI: 10.1016/j.csbj.2023.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/14/2023] [Accepted: 01/14/2023] [Indexed: 01/18/2023] Open
Abstract
Metaproteomics is a relatively young field that has only been studied for approximately 15 years. Nevertheless, it has the potential to play a key role in disease research by elucidating the mechanisms of communication between the human host and the microbiome. Although it has been useful in developing an understanding of various diseases, its analytical strategies remain limited to the extended application of proteomics. The sequence databases in metaproteomics must be large because of the presence of thousands of species in a typical sample, which causes problems unique to large databases. In this review, we demonstrate the usefulness of metaproteomics in disease research through examples from several studies. Additionally, we discuss the challenges of applying metaproteomics to conventional proteomics analysis methods and introduce studies that may provide clues to the solutions. We also discuss the need for a standard false discovery rate control method for metaproteomics to replace common target-decoy search approaches in proteomics and a method to ensure the reliability of peptide spectrum match.
Affiliation(s)
- Nobuaki Miura
  - Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
- Shujiro Okuda
  - Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
  - Medical AI Center, Niigata University School of Medicine, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
  - Corresponding author at: Medical AI Center, Niigata University School of Medicine, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
9
Vašíček J, Skiadopoulou D, Kuznetsova KG, Wen B, Johansson S, Njølstad PR, Bruckner S, Käll L, Vaudel M. Finding haplotypic signatures in proteins. Gigascience 2022; 12:giad093. [PMID: 37919975 PMCID: PMC10622322 DOI: 10.1093/gigascience/giad093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 09/24/2023] [Accepted: 10/08/2023] [Indexed: 11/04/2023] Open
Abstract
Background: The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown.
Findings: Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility to match peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched to such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, the evaluation of the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches.
Conclusions: As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.
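The idea of substitutions co-occurring in one tryptic peptide can be made concrete with a minimal in silico digestion. The sketch below is an illustration only, not the pipeline used in the paper: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), allows no missed cleavages, takes substitution sites as 0-based positions in an already substituted protein sequence, and uses hypothetical function names and a made-up example sequence.

```python
def tryptic_spans(protein: str):
    """Yield half-open (start, end) spans of fully cleaved tryptic peptides.

    Simplified rule (an assumption): cleave after K or R unless followed by P.
    """
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (residue in "KR" and protein[i + 1] != "P"):
            yield start, i + 1
            start = i + 1


def multivariant_peptides(protein: str, substitution_positions):
    """Return (peptide, positions) pairs for peptides covering two or more substitution sites."""
    result = []
    for start, end in tryptic_spans(protein):
        covered = [p for p in substitution_positions if start <= p < end]
        if len(covered) >= 2:
            result.append((protein[start:end], covered))
    return result


if __name__ == "__main__":
    # Hypothetical protein haplotype with substitution sites at positions 6, 8 and 12.
    haplotype = "MAGTKLSEPRVDAWKPLLR"
    for peptide, positions in multivariant_peptides(haplotype, [6, 8, 12]):
        print(peptide, positions)
```

A substitution that creates or removes a K or R changes the cleavage pattern itself, so a realistic analysis would digest each haplotype sequence separately, as this sketch does by taking the substituted sequence as input.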
Affiliation(s)
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Ksenia G Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Stefan Johansson
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
- Pål R Njølstad
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Children and Youth Clinic, Haukeland University Hospital, Bergen 5021, Norway
- Stefan Bruckner
  - Chair of Visual Analytics, Institute for Visual and Analytic Computing, University of Rostock, Rostock 18051, Germany
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH–Royal Institute of Technology, Solna 17121, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo 0473, Norway
10
Aortic disease in Marfan syndrome is caused by overactivation of sGC-PRKG signaling by NO. Nat Commun 2021; 12:2628. [PMID: 33976159 PMCID: PMC8113458 DOI: 10.1038/s41467-021-22933-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/21/2020] [Accepted: 04/01/2021] [Indexed: 02/06/2023] Open
Abstract
Thoracic aortic aneurysm, as occurs in Marfan syndrome, is generally asymptomatic until dissection or rupture, requiring surgical intervention as the only available treatment. Here, we show that nitric oxide (NO) signaling dysregulates actin cytoskeleton dynamics in Marfan Syndrome smooth muscle cells and that NO-donors induce Marfan-like aortopathy in wild-type mice, indicating that a marked increase in NO suffices to induce aortopathy. Levels of nitrated proteins are higher in plasma from Marfan patients and mice and in aortic tissue from Marfan mice than in control samples, indicating elevated circulating and tissue NO. Soluble guanylate cyclase and cGMP-dependent protein kinase are both activated in Marfan patients and mice and in wild-type mice treated with NO-donors, as shown by increased plasma cGMP and pVASP-S239 staining in aortic tissue. Marfan aortopathy in mice is reverted by pharmacological inhibition of soluble guanylate cyclase and cGMP-dependent protein kinase and lentiviral-mediated Prkg1 silencing. These findings identify potential biomarkers for monitoring Marfan Syndrome in patients and urge evaluation of cGMP-dependent protein kinase and soluble guanylate cyclase as therapeutic targets. Aortic aneurysm and dissection, the major problem linked to Marfan syndrome (MFS), lacks effective pharmacological treatment. Here, the authors show that the NO pathway is overactivated in MFS and that inhibition of guanylate cyclase and cGMP-dependent protein kinase reverts MFS aortopathy in mice.
11
Rodriguez JM, Pozo F, di Domenico T, Vazquez J, Tress ML. An analysis of tissue-specific alternative splicing at the protein level. PLoS Comput Biol 2020; 16:e1008287. [PMID: 33017396 PMCID: PMC7561204 DOI: 10.1371/journal.pcbi.1008287] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 02/10/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 01/09/2023] Open
Abstract
The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart. We manually curated a set of 255 splice events detected in a large-scale tissue-based proteomics experiment and found that more than a third had evidence of significant tissue-specific differences. Events that were significantly tissue-specific at the protein level were highly conserved; almost 75% evolved over 400 million years ago. The tissues in which we found most evidence for tissue-specific splicing were nervous tissues and cardiac tissues. 
Genes with tissue-specific events in these two tissues had functions related to important cellular structures in brain and heart tissues. These splice events may have been essential for the development of vertebrate heart and muscle. However, our data set may not be representative of alternative exons as a whole. We found that most tissue specific splicing was strongly conserved, but just 5% of annotated alternative exons in the human gene set are ancient. More than three quarters of alternative exons are primate-derived. Although the analysis does not provide a definitive answer to the question of the functional role of alternative splicing, our results do indicate that alternative splice variants may have played a significant part in the evolution of brain and heart tissues in vertebrates.
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Tomas di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Jesus Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
- Michael L. Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
12
Cogne Y, Gouveia D, Chaumot A, Degli-Esposti D, Geffard O, Pible O, Almunia C, Armengaud J. Proteogenomics-Guided Evaluation of RNA-Seq Assembly and Protein Database Construction for Emergent Model Organisms. Proteomics 2020; 20:e1900261. [PMID: 32249536 DOI: 10.1002/pmic.201900261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2019] [Revised: 03/24/2020] [Indexed: 11/10/2022]
Abstract
Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species for which the genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments for Illumina RNA-seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used; the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using TransDecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
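The read pre-treatment mentioned in this abstract (discarding reads whose mean quality score is below 17) can be illustrated with a minimal sketch. It assumes Phred+33 quality encoding and represents each read as a hypothetical `(name, sequence, quality)` tuple; the function names are illustrative and are not taken from the authors' pipeline.

```python
from statistics import mean

def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(char) - offset for char in quality)

def filter_reads(records, min_mean_quality: float = 17.0):
    """Keep reads whose mean quality is at least min_mean_quality."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_quality]

# Toy reads as (name, sequence, quality) tuples; hypothetical data.
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read2", "ACGT", "+++!"),  # '+' encodes Phred 10, '!' encodes Phred 0
    ("read3", "ACGT", "2222"),  # '2' encodes Phred 17
]
kept = filter_reads(reads)
print([name for name, _, _ in kept])
```

In this sketch a read with a mean score of exactly 17 is kept, since the abstract describes removing reads with a score of less than 17.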
Affiliation(s)
- Yannick Cogne
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Duarte Gouveia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Arnaud Chaumot
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Davide Degli-Esposti
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Geffard
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Pible
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Christine Almunia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Jean Armengaud
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
13
Heinz LX, Lee J, Kapoor U, Kartnig F, Sedlyarov V, Papakostas K, César-Razquin A, Essletzbichler P, Goldmann U, Stefanovic A, Bigenzahn JW, Scorzoni S, Pizzagalli MD, Bensimon A, Müller AC, King FJ, Li J, Girardi E, Mbow ML, Whitehurst CE, Rebsamen M, Superti-Furga G. TASL is the SLC15A4-associated adaptor for IRF5 activation by TLR7-9. Nature 2020; 581:316-322. [PMID: 32433612 PMCID: PMC7610944 DOI: 10.1038/s41586-020-2282-0] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 10/03/2019] [Accepted: 04/07/2020] [Indexed: 12/20/2022]
Abstract
Toll-like receptors (TLRs) have a crucial role in the recognition of pathogens and initiation of immune responses [1–3]. Here we show that a previously uncharacterized protein encoded by CXorf21 (a gene that is associated with systemic lupus erythematosus [4,5]) interacts with the endolysosomal transporter SLC15A4, an essential but poorly understood component of the endolysosomal TLR machinery also linked to autoimmune disease [4,6–9]. Loss of this type-I-interferon-inducible protein, which we refer to as ‘TLR adaptor interacting with SLC15A4 on the lysosome’ (TASL), abrogated responses to endolysosomal TLR agonists in both primary and transformed human immune cells. Deletion of SLC15A4 or TASL specifically impaired the activation of the IRF pathway without affecting NF-κB and MAPK signalling, which indicates that ligand recognition and TLR engagement in the endolysosome occurred normally. Extensive mutagenesis of TASL demonstrated that its localization and function rely on the interaction with SLC15A4. TASL contains a conserved pLxIS motif (in which p denotes a hydrophilic residue and x denotes any residue) that mediates the recruitment and activation of IRF5. This finding shows that TASL is an innate immune adaptor for TLR7, TLR8 and TLR9 signalling, revealing a clear mechanistic analogy with the IRF3 adaptors STING, MAVS and TRIF [10,11]. The identification of TASL as the component that links endolysosomal TLRs to the IRF5 transcription factor via SLC15A4 provides a mechanistic explanation for the involvement of these proteins in systemic lupus erythematosus [12–14].
Affiliation(s)
- Leonhard X Heinz
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- JangEun Lee
- Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT, USA
- Utkarsh Kapoor
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Felix Kartnig
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Vitaly Sedlyarov
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Konstantinos Papakostas
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Adrian César-Razquin
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Patrick Essletzbichler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ulrich Goldmann
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Adrijana Stefanovic
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Johannes W Bigenzahn
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Stefania Scorzoni
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Mattia D Pizzagalli
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ariel Bensimon
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- André C Müller
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- F James King
- Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT, USA
- Jun Li
- Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT, USA
- Enrico Girardi
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- M Lamine Mbow
- Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT, USA
- Manuele Rebsamen
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Giulio Superti-Furga
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
14
Gomez-Auli A, Hillebrand LE, Christen D, Günther SC, Biniossek ML, Peters C, Schilling O, Reinheckel T. The secreted inhibitor of invasive cell growth CREG1 is negatively regulated by cathepsin proteases. Cell Mol Life Sci 2020; 78:733-755. [PMID: 32385587 PMCID: PMC7873128 DOI: 10.1007/s00018-020-03528-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2019] [Revised: 03/31/2020] [Accepted: 04/13/2020] [Indexed: 01/15/2023]
Abstract
Previous clinical and experimental evidence strongly supports a breast cancer-promoting function of the lysosomal protease cathepsin B. However, the cathepsin B-dependent molecular pathways are not completely understood. Here, we studied the cathepsin-mediated secretome changes in the context of the MMTV-PyMT breast cancer mouse model. Employing the cell-conditioned media from tumor-macrophage co-cultures, as well as tumor interstitial fluid obtained by a novel strategy from PyMT mice with differential cathepsin B expression, we identified an important proteolytic and lysosomal signature, highlighting the importance of this organelle and these enzymes in the tumor microenvironment. The Cellular Repressor of E1A Stimulated Genes 1 (CREG1), a secreted endolysosomal glycoprotein, displayed reduced abundance upon over-expression of cathepsin B as well as increased abundance upon cathepsin B deletion or inhibition. Moreover, it was cleaved by cathepsin B in vitro. CREG1 has been reported to act as a tumor suppressor. We show that treatment of PyMT tumor cells with recombinant CREG1 reduced proliferation, migration, and invasion, whereas the opposite was observed with reduced CREG1 expression. This was further validated in vivo by orthotopic transplantation. Our study highlights CREG1 as a key player in tumor–stroma interaction and suggests that cathepsin B sustains malignant cell behavior by reducing the levels of the growth suppressor CREG1 in the tumor microenvironment.
Affiliation(s)
- Alejandro Gomez-Auli
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- Larissa Elisabeth Hillebrand
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- Daniel Christen
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- Sira Carolin Günther
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- Martin Lothar Biniossek
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- Christoph Peters
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, and German Cancer Consortium (DKTK), Partner Site Freiburg, 79104, Freiburg, Germany
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany
- Oliver Schilling
- Institute of Surgical Pathology, University Medical Center, Faculty of Medicine, University of Freiburg, 79106, Freiburg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, and German Cancer Consortium (DKTK), Partner Site Freiburg, 79104, Freiburg, Germany
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany
- Thomas Reinheckel
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, 79104, Freiburg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, and German Cancer Consortium (DKTK), Partner Site Freiburg, 79104, Freiburg, Germany
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany
15
Khan YA, Jungreis I, Wright JC, Mudge JM, Choudhary JS, Firth AE, Kellis M. Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon. BMC Genet 2020; 21:25. [PMID: 32138667 PMCID: PMC7059407 DOI: 10.1186/s12863-020-0828-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/08/2019] [Accepted: 02/19/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND: POLG, located on nuclear chromosome 15, encodes the DNA polymerase γ (Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol γ is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF.
RESULTS: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y, that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed ClinVar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z.
CONCLUSIONS: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analysis show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding.
Affiliation(s)
- Yousuf A Khan
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
- Irwin Jungreis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Andrew E Firth
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
16
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automate this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. An appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is much needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
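The target-decoy idea referred to in this abstract can be sketched in a few lines. The sketch below is a generic illustration, not the authors' protein probability model or their refined picked method: it assumes each identification is a hypothetical `(score, is_decoy)` pair, estimates FDR as the ratio of decoy to target hits above a score threshold, and uses a loose 25% cut-off only because the toy data set is tiny (the abstract cites 1% as typical).

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among identifications scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def accepted_targets(psms, max_fdr=0.01):
    """Return target hits above the loosest score threshold whose estimated FDR <= max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    if best is None:
        return []
    return [(score, is_decoy) for score, is_decoy in psms
            if score >= best and not is_decoy]

# Toy identifications as (score, is_decoy); hypothetical data.
psms = [(9.0, False), (8.0, False), (7.0, False), (6.0, False),
        (5.0, True), (4.0, False), (3.0, True), (2.0, False)]
accepted = accepted_targets(psms, max_fdr=0.25)
print(len(accepted))
```

Scanning every observed score as a candidate threshold keeps the sketch short; a real workflow would typically sort once and accumulate counts.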
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
- Jesús Vázquez
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain
17
Abstract
Shotgun proteomics is the method of choice for large-scale protein identification. However, the use of a robust statistical workflow to validate such identification is mandatory to minimize false matches, ambiguities, and amplification of error rates from spectra to proteins. In this chapter we emphasize the key concepts to take into account when processing the output of a search engine to obtain reliable peptide or protein identifications. We assume that the reader is already familiar with tandem mass spectrometry so we can focus on the use of statistical confidence methods. After introducing the key concepts we present different software tools and how to use them with an example dataset.
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain
- Jesús Vázquez
- Laboratory of Cardiovascular Proteomics, Centro Nacional de Investigaciones Cardiovasculares (CNIC) and CIBER de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
18
Gouveia D, Cogne Y, Gaillard JC, Almunia C, Pible O, François A, Degli-Esposti D, Geffard O, Chaumot A, Armengaud J. Shotgun proteomics datasets acquired on Gammarus pulex animals sampled from the wild. Data Brief 2019; 27:104650. [PMID: 31687451 PMCID: PMC6820120 DOI: 10.1016/j.dib.2019.104650] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/02/2019] [Revised: 09/27/2019] [Accepted: 10/03/2019] [Indexed: 11/10/2022] Open
Abstract
This data article associated with the manuscript “Comparative proteomics in the wild: accounting for intrapopulation variability improves describing proteome response in a Gammarus pulex field population exposed to cadmium” refers to the shotgun proteomics analysis performed on 40 Gammarus pulex animals sampled from the wild. Proteins were extracted, digested with trypsin, and the resulting peptides were identified by tandem mass spectrometry. Here, we present the list of proteins from males and the list of proteins from females that are differentially detected between the Brameloup and the Pollon populations. Data are available via ProteomeXchange with identifiers PXD013656 and PXD013712, respectively.
Affiliation(s)
- Duarte Gouveia
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
- Yannick Cogne
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
- Jean-Charles Gaillard
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
- Christine Almunia
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
- Olivier Pible
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
- Adeline François
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625 Villeurbanne, France
- Davide Degli-Esposti
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625 Villeurbanne, France
- Olivier Geffard
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625 Villeurbanne, France
- Arnaud Chaumot
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625 Villeurbanne, France
- Jean Armengaud
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207 Bagnols-sur-Cèze, France
19
Cogne Y, Almunia C, Gouveia D, Pible O, François A, Degli-Esposti D, Geffard O, Armengaud J, Chaumot A. Comparative proteomics in the wild: Accounting for intrapopulation variability improves describing proteome response in a Gammarus pulex field population exposed to cadmium. Aquat Toxicol 2019; 214:105244. [PMID: 31352074 DOI: 10.1016/j.aquatox.2019.105244] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/27/2019] [Revised: 05/14/2019] [Accepted: 07/09/2019] [Indexed: 06/10/2023]
Abstract
High-throughput proteomics can be performed on animal sentinels to discover key molecular biomarkers that signal the physiological response and adaptation of organisms. Thanks to proteogenomics, ecotoxicoproteomics can now be applied to small arthropods such as gammarids, which are well-known sentinels of aquatic environments. Here, we analysed two regional Gammarus pulex populations to characterize the potential proteome divergence induced in one site by natural bioavailable mono-metallic contamination (cadmium) compared to a non-contaminated site. Two RNAseq-derived protein sequence databases were established previously on male and female individuals sampled from the reference site. Here, individual proteomes were acquired on 10 male and 10 female paired organisms sampled from each site. Proteins involved in protein lipidation, carbohydrate metabolism, proteolysis, innate immunity, oxidative stress response and lipid transport were found to be more abundant in animals exposed to cadmium, while hemocyanins were found in lower abundance. The intrapopulation proteome variability of long-term exposed G. pulex was inflated relative to the non-contaminated population. These results show that, while it remains a challenge for organisms whose genomes have not yet been sequenced, taking intrapopulation variability into account is important to better define the molecular players induced by toxic stress in a comparative field proteomics approach.
Affiliation(s)
- Yannick Cogne
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207, Bagnols-sur-Cèze, France
- Christine Almunia
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207, Bagnols-sur-Cèze, France
- Duarte Gouveia
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207, Bagnols-sur-Cèze, France
- Olivier Pible
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207, Bagnols-sur-Cèze, France
- Adeline François
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625, Villeurbanne, France
- Davide Degli-Esposti
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625, Villeurbanne, France
- Olivier Geffard
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625, Villeurbanne, France
- Jean Armengaud
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, F-30207, Bagnols-sur-Cèze, France
- Arnaud Chaumot
- Irstea, UR RiverLy, Laboratoire d'écotoxicologie, centre de Lyon-Villeurbanne, F-69625, Villeurbanne, France
20
Choo MS, Wan C, Rudd PM, Nguyen-Khuong T. GlycopeptideGraphMS: Improved Glycopeptide Detection and Identification by Exploiting Graph Theoretical Patterns in Mass and Retention Time. Anal Chem 2019; 91:7236-7244. [PMID: 31079452 DOI: 10.1021/acs.analchem.9b00594] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 12/15/2022]
Abstract
The leading proteomic method for identifying N-glycosylated peptides is liquid chromatography coupled with tandem fragmentation mass spectrometry (LCMS/MS) followed by spectral matching of MS/MS fragment masses to a database of possible glycan and peptide combinations. Such database-dependent approaches come with challenges such as needing high-quality informative MS/MS spectra, ignoring unexpected glycan or peptide sequences, and making incorrect assignments because some glycan combinations are equivalent in mass to amino acids. To address these challenges, we present GlycopeptideGraphMS, a graph theoretical bioinformatic approach complementary to the database-dependent method. Using the AXL receptor tyrosine kinase (AXL) as a model glycoprotein with multiple N-glycosylation sites, we show that those LCMS features that could be grouped into graph networks on the basis of glycan mass and retention time differences were actually N-glycopeptides with the same peptide backbone but different N-glycan compositions. Conversely, unglycosylated peptides did not exhibit this grouping behavior. Furthermore, MS/MS sequencing of the glycan and peptide composition of just one N-glycopeptide in the graph was sufficient to identify the rest of the N-glycopeptides in the graph. By validating the identifications with exoglycosidase cocktails and MS/MS fragmentation, we determined the experimental false discovery rate of identifications to be 2.21%. GlycopeptideGraphMS detected more than 500 unique N-glycopeptides from AXL, triple the number found by a database search with Byonic software, and detected incorrect assignments due to a nonspecific protease cleavage. This method overcomes some limitations of the database approach and is a step closer to comprehensive automated glycoproteomics.
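The grouping principle described in this abstract (linking LC-MS features whose mass and retention time differences are consistent with a change in glycan composition) can be illustrated with a small sketch. This is not the GlycopeptideGraphMS implementation: the feature tuples, the tolerance values and the function names are assumptions for illustration, and only single-monosaccharide mass differences (monoisotopic residue masses) are considered.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) of common monosaccharides.
MONOSACCHARIDE_MASSES = {
    "Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579, "NeuAc": 291.0954,
}

def linked(a, b, mass_tol=0.02, max_rt_diff=1.0):
    """True if two (mass, rt) features differ by one monosaccharide and elute close together."""
    mass_diff = abs(a[0] - b[0])
    rt_diff = abs(a[1] - b[1])
    return rt_diff <= max_rt_diff and any(
        abs(mass_diff - m) <= mass_tol for m in MONOSACCHARIDE_MASSES.values()
    )

def group_features(features):
    """Group feature indices into connected components using union-find."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if linked(features[i], features[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy features as (neutral mass in Da, retention time in min); hypothetical data.
features = [
    (2000.0000, 20.0),
    (2162.0528, 19.8),  # +Hex relative to the first feature
    (2365.1322, 19.5),  # +HexNAc relative to the second feature
    (1500.0000, 35.0),  # unrelated feature
]
print(group_features(features))
```

In this toy case the first three features form one network and the fourth stays alone, which mirrors the abstract's observation that unglycosylated peptides did not show the grouping behaviour.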
Affiliation(s)
- Matthew S Choo
- Bioprocessing Technology Institute, 20 Biopolis Way #06-01, Singapore 138668
- Corrine Wan
- Bioprocessing Technology Institute, 20 Biopolis Way #06-01, Singapore 138668
- Pauline M Rudd
- Bioprocessing Technology Institute, 20 Biopolis Way #06-01, Singapore 138668
- National Institute for Bioprocessing Research and Training, Conway Institute, Dublin, Ireland
- University College Dublin, Belfield, Dublin, Ireland
- Terry Nguyen-Khuong
- Bioprocessing Technology Institute, 20 Biopolis Way #06-01, Singapore 138668
21
Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019; 47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 1761] [Impact Index Per Article: 352.2] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023] Open
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Anne-Maud Ferreira
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rory Johnson
- Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
- Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- James Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvia Carbonell Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomás Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Osagie G Izuogu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Muir
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Fabio C P Navarro
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Baikang Pei
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eloise Stapleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jinuri Xu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yan Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bronwen Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
22
|
Chemometrics-Assisted Shotgun Proteomics for Establishment of Potential Peptide Markers of Non-Halal Pork (Sus scrofa) among Halal Beef and Chicken. FOOD ANAL METHOD 2018. [DOI: 10.1007/s12161-018-1327-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
|
23
|
Misra BB. Updates on resources, software tools, and databases for plant proteomics in 2016-2017. Electrophoresis 2018; 39:1543-1557. [PMID: 29420853 DOI: 10.1002/elps.201700401] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/21/2017] [Revised: 01/23/2018] [Accepted: 02/02/2018] [Indexed: 11/05/2022]
Abstract
Proteomics data processing, annotation, and analysis can often lead to major hurdles in large-scale high-throughput bottom-up proteomics experiments. Given the recent rise in protein-based big datasets being generated, efforts in in silico tool development occurrences have had an unprecedented increase; so much so, that it has become increasingly difficult to keep track of all the advances in a particular academic year. However, these tools benefit the plant proteomics community in circumventing critical issues in data analysis and visualization, as these continually developing open-source and community-developed tools hold potential in future research efforts. This review will aim to introduce and summarize more than 50 software tools, databases, and resources developed and published during 2016-2017 under the following categories: tools for data pre-processing and analysis, statistical analysis tools, peptide identification tools, databases and spectral libraries, and data visualization and interpretation tools. Intended for a well-informed proteomics community, finally, efforts in data archiving and validation datasets for the community will be discussed as well. Additionally, the author delineates the current and most commonly used proteomics tools in order to introduce novice readers to this -omics discovery platform.
Affiliation(s)
- Biswapriya B Misra
- Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
|
24
|
Abstract
Omics approaches have become popular in biology as powerful discovery tools, and currently gain in interest for diagnostic applications. Establishing the accurate genome sequence of any organism is easy, but the outcome of its annotation by means of automatic pipelines remains imprecise. Some protein-encoding genes may be missed as soon as they are specific and poorly conserved in a given taxon, while important to explain the specific traits of the organism. Translational starts are also poorly predicted in a relatively important number of cases, thus impacting the protein sequence database used in proteomics, comparative genomics, and systems biology. The use of high-throughput proteomics data to improve genome annotation is an attractive option to obtain a more comprehensive molecular picture of a given organism. Here, protocols for reannotating prokaryote genomes are described based on shotgun proteomics and derivatization of protein N-termini with a positively charged reagent coupled to high-resolution tandem mass spectrometry.
|