1
|
Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020; 48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023] Open
Abstract
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Collapse
Affiliation(s)
| | - Qi Wang
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
| | - Pavan K Kota
- Department of Bioengineering, Houston, TX 77005, USA
| | - C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Benjamin Coleman
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Advait Balaji
- Department of Computer Science, Houston, TX 77005, USA
| | - Gaurav Gupta
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Richard G Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Anshumali Shrivastava
- Department of Computer Science, Houston, TX 77005, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Todd J Treangen
- Department of Computer Science, Houston, TX 77005, USA
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
| |
Collapse
|