2
Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020; 21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.
RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.
CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Affiliation(s)
- Keith Mitchell
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Jaqueline J Brito
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Igor Mandric
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Qiaozhen Wu
  - Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Sei Chang
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Lana S Martin
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Aaron Karlsberg
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Ekaterina Gerasimov
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Russell Littman
  - UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
- Brian L Hill
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Nicholas C Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Harry Taegyun Yang
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Kevin Hsieh
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Linus Chen
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Eli Littman
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Taylor Shabani
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- German Enik
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Douglas Yao
  - Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Ren Sun
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Jan Schroeder
  - Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
- Eleazar Eskin
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia, 119991
- Pavel Skums
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Mihai Pop
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
4
Kong S, Zhang Y. Deciphering Hi-C: from 3D genome to function. Cell Biol Toxicol 2019; 35:15-32. [PMID: 30610495 DOI: 10.1007/s10565-018-09456-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/27/2018] [Accepted: 12/02/2018] [Indexed: 12/11/2022]
Abstract
Hi-C is a commonly used technology in 3D genomics that can depict global chromatin interactions across eukaryotic genomes. Integrated with different datasets, it can also be applied to studying various biological questions, such as nuclear organization, gene transcription regulation, spatiotemporal development, genome assembly, and cancer genomics. During the last decade, the development and application of Hi-C have dramatically changed the view of genome architecture, chromatin conformation, and gene interaction. So far, Hi-C-related studies remain active and controversial; thus, a unified standard of library construction and bioinformatics analysis is urgently needed. In this review, we have summarized its history, development, methodologies, advances, applications, shortcomings, and future perspectives. We discuss a few limitations of the current Hi-C technologies and future directions for improvement and highlight how Hi-C can bridge 3D structure to gene function. This review will be helpful for scientists who want to engage in the 3D genomics field; it also outlines some future directions.
Affiliation(s)
- Siyuan Kong
  - Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Road, Dapeng District, 518120, Shenzhen, People's Republic of China
- Yubo Zhang
  - Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Road, Dapeng District, 518120, Shenzhen, People's Republic of China
6
Developing novel methods to image and visualize 3D genomes. Cell Biol Toxicol 2018; 34:367-380. [PMID: 29577183 PMCID: PMC6133007 DOI: 10.1007/s10565-018-9427-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/03/2017] [Accepted: 03/11/2018] [Indexed: 02/07/2023]
Abstract
To investigate three-dimensional (3D) genome organization in prokaryotic and eukaryotic cells, three main strategies are employed, namely nuclear proximity ligation-based methods, imaging tools (such as fluorescence in situ hybridization (FISH) and its derivatives), and computational/visualization methods. Proximity ligation-based methods rely on digestion and re-ligation of physically proximal cross-linked chromatin fragments, followed by massively parallel DNA sequencing, to measure the relative spatial proximity between genomic loci. Imaging tools enable direct visualization and quantification of spatial distances between genomic loci, and advanced implementation of (super-resolution) microscopy helps to significantly improve the resolution of images. Computational methods are used to map global 3D genome structures at various scales driven by experimental data, and visualization methods are used to render 3D genome structures in virtual 3D space based on algorithms. In this review, we focus on the introduction of novel imaging and visualization methods to study 3D genomes. First, we introduce the progress made recently in 3D genome imaging in both fixed and live cells based on long-probe labeling, short-probe labeling, RNA FISH, and the CRISPR system. As the fluorescence-capturing capability of a particular microscope is very important for the sensitivity of bioimaging experiments, we also introduce two novel super-resolution microscopy methods, SDOM and low-power super-resolution STED, which have potential for time-lapse super-resolution live-cell imaging of chromatin. Finally, we review some software tools developed recently to visualize proximity ligation-based data. The imaging and visualization methods are complementary to each other, and the three strategies are not mutually exclusive. These methods provide powerful tools to explore the mechanisms of gene regulation and transcription in cell nuclei.