1
Thippabhotla S, Liu B, Podgorny A, Yooseph S, Yang Y, Zhang J, Zhong C. Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data. NAR Genom Bioinform 2023; 5:lqad023. [PMID: 36915411 PMCID: PMC10006731 DOI: 10.1093/nargab/lqad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Revised: 12/03/2022] [Accepted: 02/18/2023] [Indexed: 03/16/2023]
Abstract
Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92-97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
Affiliation(s)
- Sirisha Thippabhotla
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
- Ben Liu
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
- Adam Podgorny
  - Center for Computational Biology, The University of Kansas, Lawrence, KS 66045, USA
- Shibu Yooseph
  - Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
- Youngik Yang
  - National Marine Biodiversity Institute of Korea, 101-75, Jangsan-ro, Janghang-eup, Seochun-gun, Chungchungnam-do, 33662, South Korea
- Jun Zhang
  - Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS 66160, USA
  - Department of Cancer Biology, University of Kansas Cancer Center, Kansas City, KS 66160, USA
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
2
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make the case that high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
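The six-frame translation mentioned in this abstract can be illustrated with a short Python sketch. It is not taken from any of the surveyed tools; the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels (`+1` to `-3`) are assumptions made for illustration, and the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a sequence in three forward and three reverse frames.

    The frame labels '+1'..'+3' and '-1'..'-3' are a hypothetical convention.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCC")
print(frames["+1"], frames["-1"])
```

Every nucleotide sequence yields six candidate peptide strings, which is one reason a six-frame genome database is much larger than a typical protein database.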
Affiliation(s)
- Muhammad Usman Tariq
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
3
Brown CT, Moritz D, O'Brien MP, Reidl F, Reiter T, Sullivan BD. Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity. Genome Biol 2020; 21:164. [PMID: 32631445 PMCID: PMC7336657 DOI: 10.1186/s13059-020-02066-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/14/2019] [Accepted: 05/29/2020] [Indexed: 11/10/2022]
Abstract
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
Affiliation(s)
- C Titus Brown
  - Department of Population Health and Reproduction, University of California Davis, Davis, USA
- Dominik Moritz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Felix Reidl
  - Department of Computer Science, NC State University, Raleigh, USA
- Taylor Reiter
  - Department of Population Health and Reproduction, University of California Davis, Davis, USA
- Blair D Sullivan
  - Department of Computer Science, NC State University, Raleigh, USA
4
Liu J, Lian Q, Chen Y, Qi J. Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads. Nucleic Acids Res 2019; 47:e30. [PMID: 30657979 PMCID: PMC6412133 DOI: 10.1093/nar/gkz017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/27/2018] [Revised: 12/19/2018] [Accepted: 01/08/2019] [Indexed: 11/12/2022]
Abstract
Metagenomic studies, greatly promoted by the fast development of next-generation sequencing (NGS) technologies, uncover the complex structures of microbial communities and their interactions with the environment. As the majority of microbes lack genome sequence information, it is essential to assemble prokaryotic genomes ab initio in order to retrieve complete coding genes from various metabolic pathways. The complex nature of microbial composition and the burden of handling a vast amount of metagenomic data bring great challenges to the development of effective and efficient bioinformatic tools. Here we present a protein assembler (MetaPA), which is based on de Bruijn graph searching in oligopeptide space and can be applied to both metagenomic and metatranscriptomic sequencing data. When public homologous protein sequences are used to guide the assembly procedure, MetaPA assembles 85% of total proteins as complete sequences with a high precision of 83% on real high-throughput sequencing datasets. Application of MetaPA to metatranscriptomic data successfully identifies the majority of actively transcribed genes validated in related studies. The results suggest that MetaPA has good potential in both metagenomic and metatranscriptomic studies to characterize the composition and abundance of microbiota.
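The general idea of a de Bruijn graph over oligopeptides can be sketched in a few lines of Python. This is a generic illustration under simplifying assumptions (error-free peptide fragments, a fixed `k`, greedy extension along unambiguous paths only), not MetaPA's actual algorithm; `build_peptide_dbg` and `extend_unambiguous` are hypothetical names.

```python
from collections import defaultdict


def build_peptide_dbg(peptides, k):
    """Build a de Bruijn graph whose nodes are amino-acid (k-1)-mers.

    Each k-mer observed in a peptide fragment adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk right from `start` while every node has exactly one successor."""
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Two overlapping fragments of the same (made-up) protein.
fragments = ["MKTAYIAK", "AYIAKQRQ"]
graph = build_peptide_dbg(fragments, k=4)
print(extend_unambiguous(graph, "MKT"))
```

A real assembler additionally has to handle branching nodes, sequencing errors, and coverage, which this sketch deliberately leaves out.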
Affiliation(s)
- Jiemeng Liu
  - State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
  - The T-Life Research Center, Fudan University, Shanghai 200433, China
- Qichao Lian
  - State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Yamao Chen
  - State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Ji Qi
  - State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
5
Zhong C, Yang Y, Yooseph S. GRASP2: fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data. BMC Bioinformatics 2019; 20:276. [PMID: 31167633 PMCID: PMC6551247 DOI: 10.1186/s12859-019-2818-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or individually searched without assembly. The first approach may suffer from fragmentary and incomplete assembly, while the second is hampered by the reduced functional signal contained in the short reads. To tackle these issues, we have previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database containing fragmentary protein sequences. In addition to being a gene-centric assembly tool, GRASP also serves as a homolog search tool when using the assembled protein sequences as templates to recruit reads. GRASP has a significantly improved recall rate (60-80% vs. 30-40%) compared to other homolog search tools such as BLAST. However, GRASP is both time- and space-consuming. Subsequently, we developed GRASPx, which is 30X faster than GRASP. Here, we present a completely redesigned algorithm, GRASP2, for this computational problem.
RESULTS: GRASP2 utilizes the Burrows-Wheeler Transform (BWT) and FM-index to perform assembly graph generation, and reduces the search space by employing a fast ungapped alignment strategy as a filter. GRASP2 also explicitly generates candidate paths prior to alignment, which effectively uncouples the iterative access of the assembly graph and alignment matrix. This strategy makes the execution of the program more efficient under current computer architecture, and contributes to GRASP2's speedup. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory while maintaining the original high recall rate of GRASP. GRASP2 reaches ~80% recall rate compared to that of ~40% generated by BLAST, both at a high precision level (>95%). With such a high performance, GRASP2 is only ~3X slower than BLASTP.
CONCLUSION: GRASP2 is a high-performance gene-centric assembly and homolog search tool with significant speedup compared to its predecessors, which makes GRASP2 a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2 .
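For readers unfamiliar with the BWT and FM-index named in the results, the following Python sketch shows the core backward-search idea on a single string. It is a textbook-style toy, not GRASP2's C++ implementation: the suffix array is built by naive sorting, the occurrence table is stored uncompressed, the input is assumed not to contain the `$` sentinel, and the names `build_fm_index` and `count_occurrences` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index: the C table and a full occurrence table."""
    text += "$"  # sentinel, assumed smaller than every other character
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' at i == 0).
    bwt = "".join(text[i - 1] for i in suffix_array)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # c_table[ch]: number of characters in the text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count matches of `pattern` by backward search over the FM-index."""
    c_table, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("banana")
print(count_occurrences(index, "ana"))
```

Each pattern character costs two table lookups regardless of text size, which is what makes this family of indexes attractive for repeated overlap queries; production implementations sample the occurrence table to save memory.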
Affiliation(s)
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, 66045, USA
- Youngik Yang
  - National Marine Biodiversity Institute of Korea, 101-75, Jangsan-ro, Janghang-eup, Seochun-gun, Chungchungnam-do, 33662, South Korea
- Shibu Yooseph
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
6
Peng G, Ji P, Zhao F. A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes. Genome Biol 2016; 17:232. [PMID: 27855707 PMCID: PMC5114782 DOI: 10.1186/s13059-016-1094-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/25/2016] [Accepted: 10/31/2016] [Indexed: 11/10/2022]
Abstract
Most gene prediction methods detect coding sequences from transcriptome assemblies in the absence of closely related reference genomes. Such methods are of limited application due to high transcript fragmentation and extensive assembly errors, which may lead to redundant or false coding sequence predictions. We present inGAP-CDG, which can construct full-length and non-redundant coding sequences from unassembled transcriptomes by using a codon-based de Bruijn graph to simplify the assembly process and a machine learning-based approach to filter false positives. Compared with other methods, inGAP-CDG exhibits a significant increase in predicted coding sequence length and robustness to sequencing errors and varied read length.
Affiliation(s)
- Gongxin Peng
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Peifeng Ji
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Fangqing Zhao
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
7
Zhong C, Yang Y, Yooseph S. GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly. BMC Bioinformatics 2016; 17 Suppl 8:283. [PMID: 27585568 PMCID: PMC5009819 DOI: 10.1186/s12859-016-1119-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism.
RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates.
CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/ .
Affiliation(s)
- Cuncong Zhong
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Youngik Yang
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Shibu Yooseph
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
8
Zhong C, Edlund A, Yang Y, McLean JS, Yooseph S. Metagenome and Metatranscriptome Analyses Using Protein Family Profiles. PLoS Comput Biol 2016; 12:e1004991. [PMID: 27400380 PMCID: PMC4939949 DOI: 10.1371/journal.pcbi.1004991] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/29/2015] [Accepted: 05/13/2016] [Indexed: 11/18/2022]
Abstract
Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hampers downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position-specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates the banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to reconstruct comprehensive sets of complete or near-complete protein and nucleotide sequences for the query protein families. HMM-GRASPx is freely available online from http://sourceforge.net/projects/hmm-graspx.
Affiliation(s)
- Cuncong Zhong
  - J. Craig Venter Institute, La Jolla, California, United States of America
- Anna Edlund
  - J. Craig Venter Institute, La Jolla, California, United States of America
  - School of Dentistry, University of California Los Angeles, Los Angeles, California, United States of America
- Youngik Yang
  - J. Craig Venter Institute, La Jolla, California, United States of America
- Jeffrey S. McLean
  - J. Craig Venter Institute, La Jolla, California, United States of America
  - Department of Periodontics, University of Washington, Seattle, Washington, United States of America
- Shibu Yooseph
  - J. Craig Venter Institute, La Jolla, California, United States of America
9
Zhong C, Yang Y, Yooseph S. GRASP: guided reference-based assembly of short peptides. Nucleic Acids Res 2015; 43:e18. [PMID: 25414351 PMCID: PMC4330339 DOI: 10.1093/nar/gku1210] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/15/2014] [Revised: 11/01/2014] [Accepted: 11/05/2014] [Indexed: 12/22/2022]
Abstract
Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.
Affiliation(s)
- Cuncong Zhong
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Youngik Yang
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Shibu Yooseph
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
10
Yang Y, Zhong C, Yooseph S. SFA-SPA: a suffix array based short peptide assembler for metagenomic data. Bioinformatics 2015; 31:1833-5. [PMID: 25637561 DOI: 10.1093/bioinformatics/btv052] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/16/2014] [Accepted: 01/26/2015] [Indexed: 11/14/2022]
Abstract
The determination of protein sequences from a metagenomic dataset enables the study of metabolism and functional roles of the organisms that are present in the sampled microbial community. We had previously introduced an algorithm and software for the accurate reconstruction of protein sequences from short peptides identified on nucleotide reads in a metagenomic dataset. Here, we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. The improved computational efficiency is achieved using a suffix array data structure that allows for fast querying during the assembly process, and a significant redesign of assembly steps that enables multi-threaded execution.
AVAILABILITY AND IMPLEMENTATION: The program is available under the GPLv3 license from sourceforge.net/projects/spa-assembler.
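How a suffix array supports fast querying over a set of short peptides can be sketched as follows. This is a toy illustration, not SFA-SPA's implementation: it stores every suffix explicitly (quadratic memory, acceptable only for small inputs), and the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
from bisect import bisect_left


def build_suffix_array(peptides):
    """Sort all suffixes of all peptides; keep (read id, offset) for each.

    Storing the suffix strings themselves is a simplification for clarity.
    """
    entries = sorted(
        (pep[i:], read_id, i)
        for read_id, pep in enumerate(peptides)
        for i in range(len(pep))
    )
    keys = [suffix for suffix, _, _ in entries]
    locations = [(read_id, offset) for _, read_id, offset in entries]
    return keys, locations


def find_occurrences(index, query):
    """Return sorted (read id, offset) pairs where `query` occurs.

    All suffixes that start with `query` form one contiguous block in the
    sorted order, so two binary searches delimit it.
    """
    keys, locations = index
    lo = bisect_left(keys, query)
    # Appending the largest code point gives an upper bound for the block.
    hi = bisect_left(keys, query + "\U0010ffff")
    return sorted(locations[lo:hi])


index = build_suffix_array(["MKTAYI", "TAYIAK"])
print(find_occurrences(index, "TAYI"))
```

An assembler can use such a lookup to find reads that overlap the end of a growing sequence without scanning the whole read set for each extension step.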
Affiliation(s)
- Youngik Yang
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Cuncong Zhong
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Shibu Yooseph
  - Informatics Department, J. Craig Venter Institute, La Jolla, CA 92037, USA
11
Williams TJ, Cavicchioli R. Marine metaproteomics: deciphering the microbial metabolic food web. Trends Microbiol 2014; 22:248-60. [PMID: 24731505 DOI: 10.1016/j.tim.2014.03.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 02/18/2014] [Revised: 03/12/2014] [Accepted: 03/12/2014] [Indexed: 10/25/2022]
Abstract
Metaproteomics can be applied to marine systems to discover metabolic processes in the ocean. This review describes current breakthroughs regarding marine microbes in the areas of microbial procurement of nutrients, important and previously unrecognized metabolic processes, functional roles for proteins with previously unknown functions, and intricate networks of metabolic interactions between symbiotic microbes and their hosts. By recognizing that metaproteomics empowers our understanding of the roles that marine microbes play in global biogeochemical cycles, the achievements to date from this advancing field highlight the enormous potential that the future holds.
Affiliation(s)
- Timothy J Williams
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Ricardo Cavicchioli
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia