1
Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/08/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
2
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. European Journal of Mass Spectrometry (Chichester, England) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. The knowledge of a correct sequence tag in advance of peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
Affiliation(s)
- Miroslav Hruska
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
  - Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Dusan Holub
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
3
Comprehensive proteomic analysis revealing multifaceted regulatory network of the xero-halophyte Haloxylon salicornicum involved in salt tolerance. J Biotechnol 2020; 324:143-161. [DOI: 10.1016/j.jbiotec.2020.10.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/14/2020] [Revised: 09/29/2020] [Accepted: 10/09/2020] [Indexed: 01/06/2023]
4
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/08/2016] [Indexed: 01/24/2023] Open
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Affiliation(s)
- Thilo Muth
  - Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
  - Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
5
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
  - K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway
  - Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
6
Kumar D, Yadav AK, Dash D. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data. Methods Mol Biol 2017; 1549:17-29. [PMID: 27975281 DOI: 10.1007/978-1-4939-6740-7_3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 06/06/2023]
Abstract
Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
Affiliation(s)
- Dhirendra Kumar
  - G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi, 110025, India
- Amit Kumar Yadav
  - G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi, 110025, India
- Debasis Dash
  - G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi, 110025, India
7
Azri W, Barhoumi Z, Chibani F, Borji M, Bessrour M, Mliki A. Proteomic responses in shoots of the facultative halophyte Aeluropus littoralis (Poaceae) under NaCl salt stress. Functional Plant Biology: FPB 2016; 43:1028-1047. [PMID: 32480524 DOI: 10.1071/fp16114] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/24/2016] [Accepted: 07/13/2016] [Indexed: 06/11/2023]
Abstract
Salinity is an environmental constraint that limits agricultural productivity worldwide. Studies on the halophytes provide valuable information to describe the physiological and molecular mechanisms of salinity tolerance. Therefore, because of genetic relationships of Aeluropus littoralis (Willd) Parl. with rice, wheat and barley, the present study was conducted to investigate changes in shoot proteome patterns in response to different salt treatments using proteomic methods. To examine the effect of salinity on A. littoralis proteome pattern, salt treatments (0, 200 and 400 mM NaCl) were applied for 24 h and 7 and 30 days. After 24 h and 7 days exposure to salt treatments, seedlings were fresh and green, but after 30 days, severe chlorosis was established in old leaves of 400 mM NaCl-salt treated plants. Comparative proteomic analysis of the leaves revealed that the relative abundance of 95 and 120 proteins was significantly altered in 200 and 400 mM NaCl treated plants, respectively. Mass spectrometry-based identification was successful for 66 out of 98 selected protein spots. These proteins were mainly involved in carbohydrate, energy, amino acids and protein metabolisms, photosynthesis, detoxification, oxidative stress, translation, transcription and signal transduction. These results suggest that the reduction of proteins related to photosynthesis and induction of proteins involved in glycolysis, tricarboxylic acid (TCA) cycle, and energy metabolism could be the main mechanisms for salt tolerance in A. littoralis. This study provides important information about salt tolerance, and a framework for further functional studies on the identified proteins in A. littoralis.
Affiliation(s)
- Wassim Azri
  - Laboratory of Plant Molecular Physiology, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
- Zouhaier Barhoumi
  - Laboratory of Extremophyle Plants, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
- Farhat Chibani
  - Laboratory of Plant Molecular Physiology, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
- Manel Borji
  - Laboratory of Plant Molecular Physiology, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
- Mouna Bessrour
  - Laboratory of Extremophyle Plants, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
- Ahmed Mliki
  - Laboratory of Plant Molecular Physiology, Biotechnology Centre of Borj Cedria, PO Box 901, 2050 Hammam-Lif, Tunisia
8
Muth T, Renard BY, Martens L. Metaproteomic data analysis at a glance: advances in computational microbial community proteomics. Expert Rev Proteomics 2016; 13:757-69. [DOI: 10.1080/14789450.2016.1209418] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]
9
Giese SH, Zickmann F, Renard BY. Detection of Unknown Amino Acid Substitutions Using Error-Tolerant Database Search. Methods Mol Biol 2016; 1362:247-264. [PMID: 26519182 DOI: 10.1007/978-1-4939-3106-4_16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Recent studies have demonstrated that mass spectrometry-based variant detection is feasible. Typically, either genomic variant databases or transcript data are used to construct customized target databases for the identification of single-amino acid variants (SAAVs) in mass spectrometry data. However, both approaches require additional data to perform the identification of SAAVs. Here, we discuss the application of an error-tolerant peptide search engine such as BICEPS for identifying variants exclusively based on standard UniProt databases. Thereby, unnecessary and redundant extensions of the search space are avoided. The workflow provides an unbiased view on the data; the search space is not limited to known variants and simultaneously does not require additional data. In a subsequent step a second identification search is performed to verify the initially identified variant peptides and aggregate information on the protein level.
Affiliation(s)
- Sven H Giese
  - Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany
  - Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355, Berlin, Germany
  - Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JR, UK
- Franziska Zickmann
  - Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany
- Bernhard Y Renard
  - Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany
10
Zickmann F, Renard BY. MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms. Bioinformatics 2015; 31:i106-15. [PMID: 26072472 PMCID: PMC4765881 DOI: 10.1093/bioinformatics/btv236] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023] Open
Abstract
Summary: Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information on genomic and transcript level. In proteogenomics, this multi-omics data is combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still mainly depend on six-frame translations or reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frames introduce an artificial sixfold increase of the target database and SNP integration requires a suitable database summarizing results from previous experiments. We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq driven transcript databases. MSProGene is independent from existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm. It thereby also allows resolving the ambiguity of shared peptides for protein inference. We applied MSProGene on three datasets and show that it facilitates a database-independent reliable yet accurate prediction on gene and protein level and additionally identifies novel genes. Availability and implementation: MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/. Contact: renardb@rki.de
Affiliation(s)
- Franziska Zickmann
  - Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
  - Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany
11
Kuhring M, Renard BY. Estimating the computational limits of detection of microbial non-model organisms. Proteomics 2015; 15:3580-4. [DOI: 10.1002/pmic.201400598] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/15/2014] [Revised: 05/20/2015] [Accepted: 06/26/2015] [Indexed: 11/11/2022]
Affiliation(s)
- Mathias Kuhring
  - Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
  - Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
12
Chick JM, Kolippakkam D, Nusinow DP, Zhai B, Rad R, Huttlin EL, Gygi SP. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol 2015; 33:743-9. [PMID: 26076430 PMCID: PMC4515955 DOI: 10.1038/nbt.3267] [Citation(s) in RCA: 284] [Impact Index Per Article: 31.6] [Received: 10/28/2014] [Accepted: 05/11/2015] [Indexed: 12/17/2022]
Abstract
Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ± 500 Da. In a proteome-wide data set on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies >90%. We conclude that at least one-third of unassigned spectra arise from peptides with substoichiometric modifications.
Affiliation(s)
- Joel M. Chick
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- Deepak Kolippakkam
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- David P. Nusinow
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- Bo Zhai
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- Ramin Rad
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- Edward L. Huttlin
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
- Steven P. Gygi
  - Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
13
Wang J, Meng Y, Li B, Ma X, Lai Y, Si E, Yang K, Xu X, Shang X, Wang H, Wang D. Physiological and proteomic analyses of salt stress response in the halophyte Halogeton glomeratus. Plant, Cell & Environment 2015; 38:655-69. [PMID: 25124288 PMCID: PMC4407928 DOI: 10.1111/pce.12428] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 03/26/2014] [Revised: 07/19/2014] [Accepted: 07/23/2014] [Indexed: 05/04/2023]
Abstract
Very little is known about the adaptation mechanism of Chenopodiaceae Halogeton glomeratus, a succulent annual halophyte, under saline conditions. In this study, we investigated the morphological and physiological adaptation mechanisms of seedlings exposed to different concentrations of NaCl treatment for 21 d. Our results revealed that H. glomeratus has a robust ability to tolerate salt; its optimal growth occurs under approximately 100 mM NaCl conditions. Salt crystals were deposited in water-storage tissue under saline conditions. We speculate that osmotic adjustment may be the primary mechanism of salt tolerance in H. glomeratus, which transports toxic ions such as sodium into specific salt-storage cells and compartmentalizes them in large vacuoles to maintain the water content of tissues and the succulence of the leaves. To investigate the molecular response mechanisms to salt stress in H. glomeratus, we conducted a comparative proteomic analysis of seedling leaves that had been exposed to 200 mM NaCl for 24 h, 72 h and 7 d. Forty-nine protein spots, exhibiting significant changes in abundance after stress, were identified using matrix-assisted laser desorption ionization tandem time-of-flight mass spectrometry (MALDI-TOF/TOF MS/MS) and similarity searches across EST database of H. glomeratus. These stress-responsive proteins were categorized into nine functional groups, such as photosynthesis, carbohydrate and energy metabolism, and stress and defence response.
Affiliation(s)
- Juncheng Wang
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Yaxiong Meng
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Baochun Li
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Life Sciences and Technology, Gansu Agricultural University, Lanzhou, 730070, China
- Xiaole Ma
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Yong Lai
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Erjing Si
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Ke Yang
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Xianliang Xu
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Xunwu Shang
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
- Huajun Wang
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
- Di Wang
  - Gansu Provincial Key Lab of Aridland Crop Science/Gansu Key Lab of Crop Improvement & Germplasm Enhancement, Lanzhou, 730070, China
  - College of Agronomy, Gansu Agricultural University, Lanzhou, 730070, China
14
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2015; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 505] [Impact Index Per Article: 56.1] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
15
Penzlin A, Lindner MS, Doellinger J, Dabrowski PW, Nitsche A, Renard BY. Pipasic: similarity and expression correction for strain-level identification and quantification in metaproteomics. Bioinformatics 2014; 30:i149-56. [PMID: 24931978 PMCID: PMC4058918 DOI: 10.1093/bioinformatics/btu267] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 01/06/2023]
Abstract
MOTIVATION Metaproteomic analysis allows studying the interplay of organisms or functional groups and has become increasingly popular also for diagnostic purposes. However, difficulties arise owing to the high sequence similarity between related organisms. Further, the state of conservation of proteins between species can be correlated with their expression level, which can lead to significant bias in results and interpretation. These challenges are similar but not identical to the challenges arising in the analysis of metagenomic samples and require specific solutions. RESULTS We introduce Pipasic (peptide intensity-weighted proteome abundance similarity correction) as a tool that corrects identification and spectral counting-based quantification results using peptide similarity estimation and expression level weighting within a non-negative lasso framework. Pipasic has distinct advantages over approaches only regarding unique peptides or aggregating results to the lowest common ancestor, as demonstrated on examples of viral diagnostics and an acid mine drainage dataset. AVAILABILITY AND IMPLEMENTATION Pipasic source code is freely available from https://sourceforge.net/projects/pipasic/. CONTACT RenardB@rki.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anke Penzlin
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
- Martin S Lindner
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
- Joerg Doellinger
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
- Piotr Wojtek Dabrowski
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
- Andreas Nitsche
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
  - Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
16
|
Cappellini E, Collins MJ, Gilbert MTP. Biochemistry. Unlocking ancient protein palimpsests. Science 2014; 343:1320-2. [PMID: 24653025 DOI: 10.1126/science.1249274] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 11/02/2022]
Affiliation(s)
- Enrico Cappellini
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen, Denmark
|
17
|
Solazzo C, Wadsley M, Dyer JM, Clerens S, Collins MJ, Plowman J. Characterisation of novel α-keratin peptide markers for species identification in keratinous tissues using mass spectrometry. Rapid Communications in Mass Spectrometry 2013; 27:2685-2698. [PMID: 24591030 DOI: 10.1002/rcm.6730] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 06/11/2013] [Revised: 08/31/2013] [Accepted: 09/05/2013] [Indexed: 06/03/2023]
Abstract
RATIONALE In ancient and/or damaged artefacts containing keratinous materials, the species of origin of the materials can be difficult to identify through visual examination; therefore, a minimally destructive methodology for species identification is required. While hair fibres from some species have seen substantial characterisation, others such as horn or baleen have received little or no attention, or lack protein sequences allowing formal identification using proteomics techniques. METHODS We used the PMF method (Peptide Mass Fingerprinting with matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF-MS)) to catalogue and identify diagnostic peptide markers up to the genus level. Sequences were checked using nanoflow liquid chromatography/electrospray ionisation tandem mass spectrometry (nanoLC/ESI-MS/MS) and unidentified peptides were searched against a theoretical database generated by substituting amino acids in keratin sequences. RESULTS Specific peptides were identified by m/z and sequences characterised whenever possible for a range of species belonging to Bovidae and Camelidae, and for tissues such as baleen and horn. The theoretical database allowed an increase in the number of peptides of up to 10% in species with little genetic information. CONCLUSIONS A proteomics approach can successfully identify specific markers for the identification of materials to the genus level, and should be considered when identification by other means is not possible. Identification by PMF is fast, reliable and inexpensive.
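The "theoretical database generated by substituting amino acids" mentioned in the abstract above can be sketched as follows. This is an illustration only, assuming single-residue substitutions and singly protonated ions as typically observed in MALDI-TOF-MS; it is not the authors' procedure. The peptide `GAS`, the tolerance `tol` and all function names are hypothetical. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # charge carrier for a singly protonated ion


def mh_plus(peptide):
    """Theoretical m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def single_substitution_variants(peptide):
    """All sequences that differ from `peptide` at exactly one position."""
    variants = set()
    for i, original in enumerate(peptide):
        for aa in MONO:
            if aa != original:
                variants.add(peptide[:i] + aa + peptide[i + 1:])
    return variants


def match_mass(observed_mz, peptide, tol=0.02):
    """Variants of `peptide` whose [M+H]+ lies within `tol` of observed_mz."""
    return sorted(
        v for v in single_substitution_variants(peptide)
        if abs(mh_plus(v) - observed_mz) <= tol
    )


# Hypothetical usage: look for variants of a known peptide that explain
# an unassigned peak in the fingerprint.
candidates = match_mass(mh_plus("GLS"), "GAS")
print(candidates)
```

Several variants can share one mass (leucine and isoleucine are isobaric, for example), which is consistent with the abstract's step of checking sequences by tandem mass spectrometry.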
Affiliation(s)
- Caroline Solazzo
- BioArCh, Biology (S Block), Wentworth Way, University of York, York, YO10 5DD, UK; Proteins and Biomaterials, AgResearch Lincoln Research Centre, Private Bag 4749, Christchurch, 8140, New Zealand; Smithsonian's Museum Conservation Institute, 4210 Silver Hill Road, Suitland, MD, 20746, USA
|
18
|
Mayne J, Starr AE, Ning Z, Chen R, Chiang CK, Figeys D. Fine Tuning of Proteomic Technologies to Improve Biological Findings: Advancements in 2011–2013. Anal Chem 2013; 86:176-95. [DOI: 10.1021/ac403551f] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 02/07/2023]
Affiliation(s)
- Janice Mayne
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
- Amanda E. Starr
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
- Zhibin Ning
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
- Rui Chen
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
- Cheng-Kang Chiang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
- Daniel Figeys
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5
|
19
|
Seifert J, Herbst FA, Halkjaer Nielsen P, Planes FJ, Jehmlich N, Ferrer M, von Bergen M. Bioinformatic progress and applications in metaproteogenomics for bridging the gap between genomic sequences and metabolic functions in microbial communities. Proteomics 2013; 13:2786-804. [PMID: 23625762 DOI: 10.1002/pmic.201200566] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 12/17/2012] [Revised: 03/07/2013] [Accepted: 03/28/2013] [Indexed: 11/06/2022]
Abstract
Metaproteomics of microbial communities promises to add functional information to the blueprint of genes derived from metagenomics. Right from its beginning, the achievements and developments in metaproteomics were closely interlinked with metagenomics. In addition, the evaluation, visualization, and interpretation of metaproteome data demanded for the developments in bioinformatics. This review will give an overview about recent strategies to use genomic data either from public databases or organismal specific genomes/metagenomes to increase the number of identified proteins obtained by mass spectrometric measurements. We will review different published metaproteogenomic approaches in respect to the used MS pipeline and to the used protein identification workflow. Furthermore, different approaches of data visualization and strategies for phylogenetic interpretation of metaproteome data are discussed as well as approaches for functional mapping of the results to the investigated biological systems. This information will in the end allow a comprehensive analysis of interactions and interdependencies within microbial communities.
Affiliation(s)
- Jana Seifert
- Department of Proteomics, UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany; Institute of Animal Nutrition, University of Hohenheim, Stuttgart, Germany
|
20
|
Abstract
High-throughput identification of proteins with the latest generation of hybrid high-resolution mass spectrometers is opening new perspectives in microbiology. I present, here, an overview of tandem mass spectrometry technology and bioinformatics for shotgun proteomics that make 2D-PAGE approaches obsolete. Non-labelling quantitative approaches have become more popular than labelling techniques on most proteomic platforms because they are easier to carry out while their quantitative outcome is rather robust. Parameters for recording mass spectrometry data, however, need to be chosen carefully and statistics to assess the confidence of the results should not be neglected. Interestingly, next-generation sequencing methodologies make any microbial model quickly amenable to proteomics, leading to the documentation of a wide range of organisms from diverse environments. Some recent discoveries made using microbial proteomics have challenged some biological dogma, such as: (i) initiation of the translation does not occur predominantly from ATG codons in some microorganisms, (ii) non-canonical initiation codons are used to regulate the production of specific but important proteins and (iii) a gene may code for multiple polypeptide species, heterogeneous in terms of sequences. Microbial diversity and microbial physiology can now be revisited by means of exhaustive comparative proteomic surveys where thousands of proteins are detected and quantified. Proteogenomics, consisting of better annotating of genomes with the help of proteomic evidence, is paving the way for integrated multi-omic approaches in microbiology. Finally, meta-proteomic tools and approaches are emerging for tackling the high complexity of the microbial world as a whole, opening new perspectives for assessing how microbial communities function.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, F-30207 Bagnols-sur-Cèze, France.
|