151
Methods to Study Translated Pseudogenes: Recombinant Expression and Complementation, Targeted Proteomics, and RNA Profiling. Methods Mol Biol 2021. [PMID: 34165719 DOI: 10.1007/978-1-0716-1503-4_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 08/19/2024]
Abstract
The technical challenge in proving that a given expressed pseudogene is in fact translated into a functional protein is specificity. To circumvent this challenge, one approach is to use PCR to generate a series of clones that allow one to exogenously express the pseudogenic protein of interest, either native or fused to a tag, which can facilitate purification, detection, and complementation in both bacterial and mammalian cells. This approach makes it possible to assess whether a putative pseudogenic protein possesses enzymatic activity, to identify its subcellular localization, and to test its capacity to complement the parental homolog. An alternative approach is to detect the endogenous protein using targeted proteomics analysis and to assess the full range of endogenous RNA isoforms, in order to consider additional coding and noncoding RNA functionality.
152
Tunjić-Cvitanić M, Pasantes JJ, García-Souto D, Cvitanić T, Plohl M, Šatović-Vukšić E. Satellitome Analysis of the Pacific Oyster Crassostrea gigas Reveals New Pattern of Satellite DNA Organization, Highly Scattered across the Genome. Int J Mol Sci 2021; 22:ijms22136798. [PMID: 34202698 PMCID: PMC8268682 DOI: 10.3390/ijms22136798] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2021] [Revised: 06/18/2021] [Accepted: 06/19/2021] [Indexed: 12/22/2022] Open
Abstract
Several features already qualified the invasive bivalve species Crassostrea gigas as a valuable non-standard model organism in genome research. C. gigas is characterized by the low contribution of satellite DNAs (satDNAs) vs. mobile elements and has an extremely low amount of heterochromatin, predominantly built of DNA transposons. In this work, we have identified 52 satDNAs composing the satellitome of C. gigas and constituting about 6.33% of the genome. Satellitome analysis reveals unusual, highly scattered organization of relatively short satDNA arrays across the whole genome. However, peculiar chromosomal distribution and densities are specific for each satDNA. The inspection of the organizational forms of the 11 most abundant satDNAs shows association with constitutive parts of Helitron mobile elements. Nine of the inspected satDNAs are dominantly found in mobile element-associated form, two mostly appear standalone, and only one is present exclusively as Helitron-associated sequence. The Helitron-related satDNAs appear in more chromosomes than other satDNAs, indicating that these mobile elements could be leading satDNA propagation in C. gigas. No significant accumulation of satDNAs on certain chromosomal positions was detected in C. gigas, thus establishing a novel pattern of satDNA organization on the genome level.
Affiliation(s)
- Monika Tunjić-Cvitanić
- Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia; (M.T.-C.); (M.P.)
- Juan J. Pasantes
- Centro de Investigación Mariña, Universidade de Vigo, Dpto de Bioquímica, Xenética e Inmunoloxía, 36310 Vigo, Spain;
- Daniel García-Souto
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain;
- Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Tonči Cvitanić
- Rimac Automobili d.o.o., Ljubljanska ulica 7, 10431 Sveta Nedelja, Croatia;
- Miroslav Plohl
- Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia; (M.T.-C.); (M.P.)
- Eva Šatović-Vukšić
- Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia; (M.T.-C.); (M.P.)
- Correspondence:
153
Tvedte ES, Gasser M, Sparklin BC, Michalski J, Hjelmen CE, Johnston JS, Zhao X, Bromley R, Tallon LJ, Sadzewicz L, Rasko DA, Dunning Hotopp JC. Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes. G3 (Bethesda, Md.) 2021; 11:jkab083. [PMID: 33768248 PMCID: PMC8495745 DOI: 10.1093/g3journal/jkab083] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/09/2021] [Accepted: 03/07/2021] [Indexed: 12/14/2022]
Abstract
The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long-read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR produced longer reads than PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long-read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed the others in all metrics examined.
Affiliation(s)
- Eric S Tvedte
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Mark Gasser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Benjamin C Sparklin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Jane Michalski
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Carl E Hjelmen
- Department of Biology, Texas A&M University, College Station, TX 77843, USA
- J Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA
- Xuechu Zhao
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Robin Bromley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Lisa Sadzewicz
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- David A Rasko
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Julie C Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201, USA
154
Suh A, Dion-Côté AM. New Perspectives on the Evolution of Within-Individual Genome Variation and Germline/Soma Distinction. Genome Biol Evol 2021; 13:evab095. [PMID: 33963843 PMCID: PMC8245192 DOI: 10.1093/gbe/evab095] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 05/07/2021] [Indexed: 12/19/2022] Open
Abstract
Genomes can vary significantly even within the same individual. The underlying mechanisms are manifold, ranging from somatic mutation and recombination, through development-associated ploidy changes and genetic bottlenecks, to programmed DNA elimination during germline/soma differentiation. In this perspective piece, we briefly review recent developments in the study of within-individual genome variation in eukaryotes and prokaryotes. We highlight a Society for Molecular Biology and Evolution 2020 virtual symposium entitled "Within-individual genome variation and germline/soma distinction" and the present Special Section of the same name in Genome Biology and Evolution, which together foster cross-taxon synergies in the field to identify and tackle key open questions in the understanding of within-individual genome variation.
Affiliation(s)
- Alexander Suh
- School of Biological Sciences—Organisms and the Environment, University of East Anglia, Norwich, United Kingdom
- Department of Organismal Biology—Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Sweden
155
Guiglielmoni N, Houtain A, Derzelle A, Van Doninck K, Flot JF. Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms. BMC Bioinformatics 2021; 22:303. [PMID: 34090340 PMCID: PMC8178825 DOI: 10.1186/s12859-021-04118-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/18/2021] [Accepted: 04/02/2021] [Indexed: 12/21/2022] Open
Abstract
Background: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become technically and financially more accessible, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, yet this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking. Results: We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups. Conclusions: We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre- and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users to improve haploid assemblies of non-model organisms and to evaluate the quality of their own assemblies. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04118-3.
Affiliation(s)
- Nadège Guiglielmoni
- Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium.
- Antoine Houtain
- Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
- Alessandro Derzelle
- Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
- Karine Van Doninck
- Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
- Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
- Jean-François Flot
- Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels - (IB)², Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
156
Ono Y, Asai K, Hamada M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 2021; 37:589-595. [PMID: 32976553 PMCID: PMC8097687 DOI: 10.1093/bioinformatics/btaa835] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 06/22/2020] [Revised: 08/20/2020] [Accepted: 09/11/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. Results: To capture the characteristics of errors in reads from long-read sequencers, we introduce a generative model for quality scores that uses a hidden Markov model together with a recent model selection method, the factorized information criterion. We evaluated the simulator from several perspectives; the results indicate that it successfully simulates reads that are consistent with real reads. Availability and implementation: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
157
Quan C, Li Y, Liu X, Wang Y, Ping J, Lu Y, Zhou G. Characterization of structural variation in Tibetans reveals new evidence of high-altitude adaptation and introgression. Genome Biol 2021; 22:159. [PMID: 34034800 PMCID: PMC8146648 DOI: 10.1186/s13059-021-02382-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/23/2020] [Accepted: 05/14/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Structural variation (SV) acts as an essential mutational force shaping the evolution and function of the human genome. However, few studies have examined the role of SVs in high-altitude adaptation, and little is known so far about adaptive introgressed SVs in Tibetans. RESULTS: Here, we generate a comprehensive catalog of SVs in a Chinese Tibetan (n = 15) and Han (n = 10) population using nanopore sequencing technology. Among a total of 38,216 unique SVs in the catalog, 27% are sequence-resolved for the first time. We systematically assess the distribution of these SVs across repeat sequences and functional genomic regions. Through genotyping in an additional 276 genomes, we identify 69 Tibetan-Han stratified SVs and 80 candidate adaptive genes. We also discover a few adaptive introgressed SV candidates and provide evidence for a deletion of 335 base pairs at 1p36.32. CONCLUSIONS: Overall, our results highlight the important role of SVs in the evolutionary processes of Tibetans' adaptation to the Qinghai-Tibet Plateau and provide a valuable resource for future high-altitude adaptation studies.
Affiliation(s)
- Cheng Quan
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yuanfeng Li
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Xinyi Liu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yahui Wang
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Jie Ping
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yiming Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Hebei University, Baoding, Hebei Province 071002 People’s Republic of China
- Gangqiao Zhou
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Hebei University, Baoding, Hebei Province 071002 People’s Republic of China
- Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166 People’s Republic of China
- Medical College of Guizhou University, Guiyang, Guizhou Province 550025 People’s Republic of China
158
Hadi K, Yao X, Behr JM, Deshpande A, Xanthopoulakis C, Tian H, Kudman S, Rosiene J, Darmofal M, DeRose J, Mortensen R, Adney EM, Shaiber A, Gajic Z, Sigouros M, Eng K, Wala JA, Wrzeszczyński KO, Arora K, Shah M, Emde AK, Felice V, Frank MO, Darnell RB, Ghandi M, Huang F, Dewhurst S, Maciejowski J, de Lange T, Setton J, Riaz N, Reis-Filho JS, Powell S, Knowles DA, Reznik E, Mishra B, Beroukhim R, Zody MC, Robine N, Oman KM, Sanchez CA, Kuhner MK, Smith LP, Galipeau PC, Paulson TG, Reid BJ, Li X, Wilkes D, Sboner A, Mosquera JM, Elemento O, Imielinski M. Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs. Cell 2020; 183:197-210.e32. [PMID: 33007263 DOI: 10.1016/j.cell.2020.08.006] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 11/09/2019] [Revised: 04/08/2020] [Accepted: 08/03/2020] [Indexed: 12/12/2022]
Abstract
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Affiliation(s)
- Kevin Hadi
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Xiaotong Yao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Julie M Behr
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Aditya Deshpande
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Huasong Tian
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Sarah Kudman
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Joel Rosiene
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Madison Darmofal
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Emily M Adney
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Alon Shaiber
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Zoran Gajic
- New York Genome Center, New York, NY 10013, USA
- Michael Sigouros
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Kenneth Eng
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Jeremiah A Wala
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Departments of Medical Oncology and Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Minita Shah
- New York Genome Center, New York, NY 10013, USA
- Mayu O Frank
- New York Genome Center, New York, NY 10013, USA; Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Robert B Darnell
- New York Genome Center, New York, NY 10013, USA; Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Mahmoud Ghandi
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Franklin Huang
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Sally Dewhurst
- Laboratory of Cell Biology and Genetics, The Rockefeller University, New York, NY 10065, USA
- John Maciejowski
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Titia de Lange
- Laboratory of Cell Biology and Genetics, The Rockefeller University, New York, NY 10065, USA
- Jeremy Setton
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Immunogenomics and Precision Oncology Platform, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jorge S Reis-Filho
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Simon Powell
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- David A Knowles
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10027, USA
- Ed Reznik
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Bud Mishra
- Departments of Computer Science, Mathematics and Cell Biology, Courant Institute and NYU School of Medicine, New York University, New York, NY 10012, USA
- Rameen Beroukhim
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Departments of Medical Oncology and Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Kenji M Oman
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Carissa A Sanchez
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Mary K Kuhner
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Lucian P Smith
- Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
- Patricia C Galipeau
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Thomas G Paulson
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Brian J Reid
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Xiaohong Li
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- David Wilkes
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Andrea Sboner
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Juan Miguel Mosquera
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Olivier Elemento
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Marcin Imielinski
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA.
159
Savara J, Novosád T, Gajdoš P, Kriegová E. Comparison of structural variants detected by optical mapping with long-read next-generation sequencing. Bioinformatics 2021; 37:3398-3404. [PMID: 33983367 DOI: 10.1093/bioinformatics/btab359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/11/2021] [Revised: 04/21/2021] [Accepted: 05/08/2021] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION: Recent studies have shown the potential of using long-read whole-genome sequencing (WGS) approaches and optical mapping (OM) for the detection of clinically relevant structural variants (SVs) in cancer research. Three main long-read WGS platforms are currently in use: Pacific Biosciences (PacBio), Oxford Nanopore Technologies (ONT) and 10x Genomics. Recently, whole-genome OM technology (Bionano Genomics) has been introduced into human diagnostics. Questions remain about the accuracy of these long-read sequencing platforms, how comparable or interchangeable they are when searching for SVs, and to what extent they can be replaced or supplemented by OM. Moreover, no tool can effectively compare SVs obtained by OM and WGS. RESULTS: This study compared optical maps of the breast cancer cell line SKBR3 with AnnotSV outputs from WGS platforms. For this purpose, a software tool with comparative and filtering features was developed. The majority of SVs up to a 50 kbp distance variance threshold found by OM were confirmed by all WGS platforms, and 99% of translocations and 80% of deletions found by OM were confirmed by both PacBio and ONT, with ∼70% being confirmed by 10x Genomics in combination with PacBio and/or ONT. Interestingly, long deletions (>100 kbp) were detected only by 10x Genomics. Regarding insertions, ∼72% were confirmed by PacBio and ONT, but none by 10x Genomics. Inversions and duplications detected by OM were not detected by WGS. Moreover, the tool enabled the confirmation of SVs that overlapped in the same gene(s) and was applied to the filtering of disease-associated SVs. AVAILABILITY: https://github.com/novosadt/om-annotsv-svc.
Affiliation(s)
- Jakub Savara
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
  - Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
- Tomáš Novosád
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Petr Gajdoš
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Eva Kriegová
  - Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
|
160
|
Liu Y, Jiang T, Su J, Liu B, Zang T, Wang Y. SKSV: ultrafast structural variation detection from circular consensus sequencing reads. Bioinformatics 2021; 37:3647-3649. [PMID: 33963826 DOI: 10.1093/bioinformatics/btab341] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/22/2021] [Revised: 04/29/2021] [Accepted: 05/04/2021] [Indexed: 01/23/2023] Open
Abstract
SUMMARY: Circular consensus sequencing (CCS) reads are promising for the comprehensive detection of structural variants (SVs). However, alignment-based SV calling pipelines are computationally intensive due to the generation of complete read alignments and their post-processing. Herein, we propose a SKeleton-based analysis toolkit for Structural Variation detection (SKSV). Benchmarks on real and simulated datasets demonstrate that SKSV is an order of magnitude faster than state-of-the-art SV calling approaches; moreover, it achieves higher F1 scores for various types of SVs. AVAILABILITY: SKSV is available from https://github.com/ydLiu-HIT/SKSV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
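For readers unfamiliar with the benchmark metric named in the abstract, the F1 score is the harmonic mean of precision and recall. The short Python sketch below assumes the usual definitions in terms of true positives, false positives and false negatives; the function name and the counts in the usage note are illustrative and are not taken from the SKSV benchmarks.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for a set of SV calls."""
    if true_pos == 0:
        return 0.0  # no correct calls: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

For example, a caller with 8 correct calls, 2 spurious calls and 2 missed variants has precision and recall of 0.8 each, so its F1 score is also 0.8.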
Affiliation(s)
- Yadong Liu
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tao Jiang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Junhao Su
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Bo Liu
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tianyi Zang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
|
161
|
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022] Open
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
  - Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Sandra Louzada
  - Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
  - Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Raquel Chaves
  - Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
|
162
|
Nattestad M, Aboukhalil R, Chin CS, Schatz MC. Ribbon: intuitive visualization for complex genomic variation. Bioinformatics 2021; 37:413-415. [PMID: 32766814 DOI: 10.1093/bioinformatics/btaa680] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 01/20/2020] [Revised: 06/15/2020] [Accepted: 07/21/2020] [Indexed: 01/08/2023] Open
Abstract
SUMMARY: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. AVAILABILITY AND IMPLEMENTATION: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maria Nattestad
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Michael C Schatz
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
|
163
|
Kronenberg ZN, Rhie A, Koren S, Concepcion GT, Peluso P, Munson KM, Porubsky D, Kuhn K, Mueller KA, Low WY, Hiendleder S, Fedrigo O, Liachko I, Hall RJ, Phillippy AM, Eichler EE, Williams JL, Smith TPL, Jarvis ED, Sullivan ST, Kingan SB. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat Commun 2021; 12:1935. [PMID: 33911078 PMCID: PMC8081726 DOI: 10.1038/s41467-020-20536-y] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 05/13/2020] [Accepted: 11/12/2020] [Indexed: 01/27/2023] Open
Abstract
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without parental data, and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97%, compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
Affiliation(s)
- Zev N Kronenberg
  - Phase Genomics, Seattle, WA, USA
  - Pacific Biosciences, Menlo Park, CA, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kristen Kuhn
  - US Meat Animal Research Center, ARS USDA, Clay Center, NE, USA
- Wai Yee Low
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Stefan Hiendleder
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Olivier Fedrigo
  - Vertebrate Genomes Laboratory, The Rockefeller University, New York, NY, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- John L Williams
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
  - Dipartimento di Scienze Animali, della Nutrizione e degli Alimenti, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
|
164
|
Wahlster L, Verboon JM, Ludwig LS, Black SC, Luo W, Garg K, Voit RA, Collins RL, Garimella K, Costello M, Chao KR, Goodrich JK, DiTroia SP, O'Donnell-Luria A, Talkowski ME, Michelson AD, Cantor AB, Sankaran VG. Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript. J Exp Med 2021; 218:211998. [PMID: 33857290 PMCID: PMC8056752 DOI: 10.1084/jem.20210444] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/22/2021] [Revised: 03/09/2021] [Accepted: 03/11/2021] [Indexed: 12/11/2022] Open
Abstract
Advances in genome sequencing have resulted in the identification of the causes for numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, a large complex structural variant involving a paired-duplication inversion was identified. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.
Affiliation(s)
- Lara Wahlster
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Jeffrey M Verboon
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Leif S Ludwig
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Susan C Black
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Wendy Luo
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Kopal Garg
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Richard A Voit
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Ryan L Collins
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Kiran Garimella
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Maura Costello
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Katherine R Chao
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Julia K Goodrich
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Stephanie P DiTroia
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Anne O'Donnell-Luria
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Michael E Talkowski
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Alan D Michelson
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Alan B Cantor
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Vijay G Sankaran
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
|
165
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
166
|
Pauper M, Kucuk E, Wenger AM, Chakraborty S, Baybayan P, Kwint M, van der Sanden B, Nelen MR, Derks R, Brunner HG, Hoischen A, Vissers LELM, Gilissen C. Long-read trio sequencing of individuals with unsolved intellectual disability. Eur J Hum Genet 2021; 29:637-648. [PMID: 33257779 PMCID: PMC8115091 DOI: 10.1038/s41431-020-00770-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/15/2020] [Accepted: 10/27/2020] [Indexed: 02/06/2023] Open
Abstract
Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at approximately 15×-40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.
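Mendelian inheritance concordance in a trio is commonly computed as the fraction of the child's genotypes that can be formed by taking one allele from each parent. The Python sketch below works under that assumption; the genotype encoding (pairs of allele indices) and the function names are hypothetical, and the study's own pipeline may handle missing calls, sex chromosomes or SVs differently.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Check whether a diploid child genotype can be built from one
    paternal and one maternal allele. Genotypes are pairs such as (0, 1)."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def mendelian_concordance(trio_genotypes):
    """Fraction of sites whose (child, father, mother) genotypes are consistent."""
    if not trio_genotypes:
        return 0.0
    consistent = sum(
        is_mendelian_consistent(child, father, mother)
        for child, father, mother in trio_genotypes
    )
    return consistent / len(trio_genotypes)
```

A heterozygous child with one homozygous-reference and one homozygous-alternate parent is consistent, whereas a homozygous-alternate child with a homozygous-reference father is not; such inconsistent sites are either genotyping errors or candidate de novo events.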
Affiliation(s)
- Marc Pauper
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Erdi Kucuk
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Michael Kwint
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Bart van der Sanden
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 HR, Nijmegen, The Netherlands
- Marcel R Nelen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Ronny Derks
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Han G Brunner
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
  - Department of Internal Medicine, Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, The Netherlands
- Lisenka E L M Vissers
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 HR, Nijmegen, The Netherlands
- Christian Gilissen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
|
167
|
Di Genova A, Buena-Atienza E, Ossowski S, Sagot MF. Efficient hybrid de novo assembly of human genomes with WENGAN. Nat Biotechnol 2021; 39:422-430. [PMID: 33318652 PMCID: PMC8041623 DOI: 10.1038/s41587-020-00747-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/04/2019] [Revised: 10/08/2020] [Accepted: 10/21/2020] [Indexed: 12/12/2022]
Abstract
Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
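Two of the assembly metrics quoted in the abstract have simple standard definitions. NG50 is the length of the contig at which the contigs of that length or longer together cover at least half of the estimated genome size, and a Phred-scaled consensus QV is -10·log10 of the per-base error rate. The Python sketch below assumes these standard definitions; the function names are illustrative, and it is not the evaluation code used for WENGAN. NGA50 additionally requires breaking contigs at misassemblies and is not covered here.

```python
import math


def ng50(contig_lengths, genome_size):
    """Length of the contig at which the cumulative length of contigs,
    taken from longest to shortest, first reaches half the genome size."""
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0  # the assembly covers less than half of the genome


def phred_qv(error_rate):
    """Phred-scaled quality value for a per-base error rate in (0, 1]."""
    return -10 * math.log10(error_rate)
```

Under this definition a QV of 30 corresponds to roughly one consensus error per 1,000 bases and a QV of 40 to one per 10,000, which gives a sense of scale for the QV range of 27.84-42.88 reported above.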
Affiliation(s)
- Alex Di Genova
  - Inria Grenoble Rhône-Alpes, Montbonnot, France
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- Elena Buena-Atienza
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
  - NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
- Stephan Ossowski
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
  - NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
- Marie-France Sagot
  - Inria Grenoble Rhône-Alpes, Montbonnot, France
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
|
168
|
Kovaka S, Fan Y, Ni B, Timp W, Schatz MC. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat Biotechnol 2021; 39:431-441. [PMID: 33257863 PMCID: PMC8567335 DOI: 10.1038/s41587-020-0731-9] [Citation(s) in RCA: 118] [Impact Index Per Article: 39.3] [Received: 02/07/2020] [Accepted: 10/07/2020] [Indexed: 02/07/2023]
Abstract
Conventional targeted sequencing methods eliminate many of the benefits of nanopore sequencing, such as the ability to accurately detect structural variants or epigenetic modifications. The ReadUntil method allows nanopore devices to selectively eject reads from pores in real time, which could enable purely computational targeted sequencing. However, this requires rapid identification of on-target reads, while most mapping methods require computationally intensive basecalling. We present UNCALLED (https://github.com/skovaka/UNCALLED), an open source mapper that rapidly matches streaming nanopore current signals to a reference sequence. UNCALLED probabilistically considers k-mers that could be represented by the signal and then prunes the candidates based on the reference encoded within a Ferragina-Manzini index. We used UNCALLED to deplete sequencing of known bacterial genomes within a metagenomics community, enriching the remaining species 4.46-fold. UNCALLED also enriched 148 human genes associated with hereditary cancers to 29.6× coverage using one MinION flowcell, enabling accurate detection of single-nucleotide polymorphisms, insertions and deletions, structural variants and methylation in these genes.
Affiliation(s)
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Yunfan Fan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Bohan Ni
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
169
|
Blom MPK. Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Mol Ecol 2021; 30:5935-5948. [PMID: 33786900 DOI: 10.1111/mec.15909] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/31/2020] [Revised: 03/06/2021] [Accepted: 03/22/2021] [Indexed: 12/11/2022]
Abstract
The technological ability to characterize genetic variation at a genome-wide scale provides an unprecedented opportunity to study the genetic underpinnings and evolutionary mechanisms that promote and sustain biodiversity. The transition from short- to long-read sequencing is particularly promising and allows a more holistic view of any changes in genetic diversity across time and space. Long-read sequencing has tremendous potential, but sequencing success strongly depends on the long-range integrity of DNA molecules and therefore on the availability of high-quality tissue samples. With the scope of genomic experiments expanding and wild populations simultaneously disappearing at an unprecedented rate, access to high-quality samples may soon be a major concern for many projects. The need for high-quality biodiversity tissue archives is therefore urgent, but sampling and preserving high-quality samples is not a trivial exercise. In this review, I will briefly outline how long-read sequencing can benefit the study of molecular ecology, how this will substantially increase the demand for high-quality tissues and why it is challenging to preserve DNA integrity. I will then provide an overview of preservation approaches and end with a call for support to acknowledge the efforts needed to assemble high-quality tissue archives. In doing so, I hope to simultaneously motivate field biologists to expand sampling practices and molecular biologists to develop (cost-)efficient guidelines for the sampling and long-term storage of tissues. A concerted, interdisciplinary effort is needed to catalogue the genetic variation underlying contemporary biodiversity and will eventually provide a critical resource for future studies.
Affiliation(s)
- Mozes P K Blom
  - Leibniz Institut für Evolutions- und Biodiversitätsforschung, Museum für Naturkunde, Berlin, Germany
|
170
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on different related topics on Structural Variation, Pan-genomes, and SARS-CoV-2 related research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotations and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
  - Stanford University School of Medicine, California, USA
- Daniel L Cameron
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
  - NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
  - Baylor College of Medicine, Houston, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
  - Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
  - NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
  - Laboratory Corporation of America Holdings, Westborough, USA
|
171
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open
|
172
|
Gusic M, Prokisch H. Genetic basis of mitochondrial diseases. FEBS Lett 2021; 595:1132-1158. [PMID: 33655490 DOI: 10.1002/1873-3468.14068] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2021] [Revised: 02/17/2021] [Accepted: 02/18/2021] [Indexed: 12/13/2022]
Abstract
Mitochondrial disorders are monogenic disorders characterized by a defect in oxidative phosphorylation and caused by pathogenic variants in one of over 340 different genes. The implementation of whole-exome sequencing has led to a revolution in their diagnosis, doubled the number of associated disease genes, and significantly increased the diagnosed fraction. However, the genetic etiology of a substantial fraction of patients exhibiting mitochondrial disorders remains unknown, highlighting limitations in variant detection and interpretation, which calls for improved computational and DNA sequencing methods, as well as the addition of OMICS tools. More intriguingly, this also suggests that some pathogenic variants lie outside the protein-coding genes and that mechanisms beyond Mendelian inheritance and the mtDNA are of relevance. This review covers the current status of the genetic basis of mitochondrial diseases, discusses current challenges and perspectives, and explores the contribution of factors beyond the protein-coding regions and monogenic inheritance to the expansion of the genetic spectrum of disease.
Affiliation(s)
- Mirjana Gusic
  - Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Human Genetics, Technical University of Munich, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Germany
- Holger Prokisch
  - Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Human Genetics, Technical University of Munich, Germany
|
173
|
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023] Open
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types, and the biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
Affiliation(s)
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
|
174
|
Padgitt-Cobb LK, Kingan SB, Wells J, Elser J, Kronmiller B, Moore D, Concepcion G, Peluso P, Rank D, Jaiswal P, Henning J, Hendrix DA. A draft phased assembly of the diploid Cascade hop (Humulus lupulus) genome. THE PLANT GENOME 2021; 14:e20072. [PMID: 33605092 DOI: 10.1002/tpg2.20072] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Accepted: 10/03/2020] [Indexed: 05/25/2023]
Abstract
Hop (Humulus lupulus L. var. lupulus) is a diploid, dioecious plant with a history of cultivation spanning more than one thousand years. Hop cones are valued for their use in brewing and contain compounds of therapeutic interest including xanthohumol. Efforts to determine how biochemical pathways responsible for desirable traits are regulated have been challenged by the large (2.8 Gb), repetitive, and heterozygous genome of hop. We present a draft haplotype-phased assembly of the Cascade cultivar genome. Our draft assembly and annotation of the Cascade genome is the most extensive representation of the hop genome to date. PacBio long-read sequences from hop were assembled with FALCON and partially phased with FALCON-Unzip. Comparative analysis of haplotype sequences provides insight into selective pressures that have driven evolution in hop. We discovered genes with greater sequence divergence enriched for stress-response, growth, and flowering functions in the draft phased assembly. With improved resolution of long terminal repeat (LTR) retrotransposons due to long-read sequencing, we found that hop is over 70% repetitive. We identified a homolog of cannabidiolic acid synthase (CBDAS) that is expressed in multiple tissues. The approaches we developed to analyze the draft phased assembly serve to deepen our understanding of the genomic landscape of hop and may have broader applicability to the study of other large, complex genomes.
Affiliation(s)
- Lillian K Padgitt-Cobb
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, 97331, USA
- Sarah B Kingan
  - Pacific Biosciences of California, Menlo Park, CA, 94025, USA
- Jackson Wells
  - Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, 97331, USA
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
- Brent Kronmiller
  - Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, 97331, USA
- Paul Peluso
  - Pacific Biosciences of California, Menlo Park, CA, 94025, USA
- David Rank
  - Pacific Biosciences of California, Menlo Park, CA, 94025, USA
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
- David A Hendrix
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, 97331, USA
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
|
175
|
Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform 2021; 22:6149347. [PMID: 33634311 DOI: 10.1093/bib/bbab033] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022] Open
Abstract
In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
Affiliation(s)
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Yawei Wei
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Mengna Lyu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Zhengjiang Wu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Xiaoyan Liu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
|
176
|
Liu X, Andrews MV, Skinner JP, Johanson TM, Chong MMW. A comparison of alternative mRNA splicing in the CD4 and CD8 T cell lineages. Mol Immunol 2021; 133:53-62. [PMID: 33631555 DOI: 10.1016/j.molimm.2021.02.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2020] [Revised: 01/05/2021] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
T cells can be subdivided into a number of different subsets that are defined by their distinct functions. While the specialization of different T cell subsets is partly achieved by the expression of specific genes, the overall transcriptional profiles of all T cells appear very similar. Alternative mRNA splicing is a mechanism that facilitates greater transcript/protein diversity from a limited number of genes, which may contribute to the functional specialization of distinct T cell subsets. In this study we employ a combination of short-read and long-read sequencing technologies to compare alternative mRNA splicing between the CD4 and CD8 T cell lineages. While long-read technology was effective at assembling full-length alternatively spliced transcripts, the low sequencing depth did not facilitate accurate quantitation. On the other hand, short-read technology was ineffective at assembling full-length transcripts but was highly accurate for quantifying expression. We show that integrating long-read and short-read data achieves a more complete view of transcriptomic diversity. We found that while the overall usage of transcript isoforms was very similar between the CD4 and CD8 lineages, there were numerous alternatively spliced mRNA isoforms that were preferentially used by one lineage over the other. These alternatively spliced isoforms included ones with different exon usage, exon exclusion, or intron inclusion, all of which are expected to significantly alter the protein sequence.
Affiliation(s)
- Xin Liu
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Matthew V Andrews
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Jarrod P Skinner
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Timothy M Johanson
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Mark M W Chong
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
  - Department of Medicine (St Vincent's), The University of Melbourne, Fitzroy, Victoria, Australia
|
177
|
Wang P, Meng F, Moore BM, Shiu SH. Impact of short-read sequencing on the misassembly of a plant genome. BMC Genomics 2021; 22:99. [PMID: 33530937 PMCID: PMC7852129 DOI: 10.1186/s12864-021-07397-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Accepted: 01/19/2021] [Indexed: 12/16/2022] Open
Abstract
Background: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies with highly uneven read coverages indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read based assembly had significantly higher and lower coverage compared to background, respectively. Results: To understand what the causes may be for such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverages and found that high coverage regions tend to have higher simple sequence repeat and tandem gene densities compared to background regions. To determine if the high coverage regions were misassembled, we examined a recently available tomato long-read based assembly and found that 27.8% (1.41 Mb) of high coverage regions potentially resulted from the misassembly of duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high coverage regions, we found that misassembled, high coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposable elements. Conclusions: Our study provides insights into the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads. The generality of these causes and factors should be tested further in other species.
Affiliation(s)
- Peipei Wang
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - DOE Great Lake Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Fanrui Meng
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - DOE Great Lake Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Bethany M Moore
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - The Ecology, Evolution, and Behavioral Biology Program, Michigan State University, East Lansing, MI, 48824, USA
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - DOE Great Lake Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
  - The Ecology, Evolution, and Behavioral Biology Program, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
|
178
|
|
179
|
Wang Q, Liu J, Janssen JM, Le Bouteiller M, Frock RL, Gonçalves MAFV. Precise and broad scope genome editing based on high-specificity Cas9 nickases. Nucleic Acids Res 2021; 49:1173-1198. [PMID: 33398349 PMCID: PMC7826261 DOI: 10.1093/nar/gkaa1236] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2020] [Revised: 12/04/2020] [Accepted: 12/08/2020] [Indexed: 12/19/2022] Open
Abstract
RNA-guided nucleases (RGNs) based on CRISPR systems permit installing short and large edits within eukaryotic genomes. However, precise genome editing is often hindered by nuclease off-target activities and the multiple-copy character of the vast majority of chromosomal sequences. Dual nicking RGNs and high-specificity RGNs both exhibit low off-target activities. Here, we report that high-specificity Cas9 nucleases are convertible into nicking Cas9D10A variants whose precision is superior to that of the commonly used Cas9D10A nickase. Dual nicking RGNs based on a selected group of these Cas9D10A variants can yield gene knockouts and gene knock-ins at frequencies similar to or higher than those achieved by their conventional counterparts. Moreover, high-specificity dual nicking RGNs are capable of distinguishing highly similar sequences by 'tiptoeing' over pre-existing single base-pair polymorphisms. Finally, high-specificity RNA-guided nicking complexes generally preserve genomic integrity, as demonstrated by unbiased genome-wide high-throughput sequencing assays. Thus, in addition to substantially enlarging the Cas9 nickase toolkit, we demonstrate the feasibility of expanding the range and precision of DNA knockout and knock-in procedures. The tools and multi-tier high-specificity genome editing strategies introduced here might be particularly beneficial whenever predictability and/or safety of genetic manipulations are paramount.
Affiliation(s)
- Qian Wang
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Jin Liu
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Josephine M Janssen
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Marie Le Bouteiller
  - Department of Radiation Oncology, Division of Radiation and Cancer Biology, Stanford University School of Medicine, 269 Campus Dr. Stanford, CA 94305, USA
- Richard L Frock
  - Department of Radiation Oncology, Division of Radiation and Cancer Biology, Stanford University School of Medicine, 269 Campus Dr. Stanford, CA 94305, USA
- Manuel A F V Gonçalves
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
|
180
|
Whibley A, Kelley JL, Narum SR. The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Mol Ecol Resour 2021; 21:641-652. [PMID: 33326691 DOI: 10.1111/1755-0998.13312] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 12/08/2020] [Accepted: 12/11/2020] [Indexed: 12/20/2022]
Abstract
The quality of genome assemblies has improved rapidly in recent years due to continual advances in sequencing technology, assembly approaches, and quality control. In the field of molecular ecology, this has led to the development of exceptional quality genome assemblies that will be important long-term resources for broader studies into ecological, conservation, evolutionary, and population genomics of naturally occurring species. Moreover, the extent to which a single reference genome represents the diversity within a species varies: pan-genomes will become increasingly important ecological genomics resources, particularly in systems found to have considerable presence-absence variation in their functional content. Here, we highlight advances in technology that have raised the bar for genome assembly and provide guidance on standards to achieve exceptional quality reference genomes. Key recommendations include the following: (a) Genome assemblies should include long-read sequencing except in rare cases where it is effectively impossible to acquire adequately preserved samples needed for high molecular weight DNA standards. (b) At least one scaffolding approach, such as Hi-C or optical mapping, should be included with genome assembly. (c) Genome assemblies should be carefully evaluated; this may involve utilising short-read data for genome polishing, error correction, k-mer analyses, and estimating the percentage of reads that map back to an assembly. Finally, a genome assembly is most valuable if all data and methods are made publicly available and the utility of a genome for further studies is verified through examples. While these recommendations are based on current technology, we anticipate that future advances will push the field further and the molecular ecology community should continue to adopt new approaches that attain the highest quality genome assemblies.
Affiliation(s)
- Shawn R Narum
  - University of Idaho, Moscow, ID, USA
  - Columbia River Inter-Tribal Fish Commission, Hagerman, ID, USA
|
181
|
Eschenbrenner CJ, Feurtey A, Stukenbrock EH. Population Genomics of Fungal Plant Pathogens and the Analyses of Rapidly Evolving Genome Compartments. Methods Mol Biol 2021; 2090:337-355. [PMID: 31975174 DOI: 10.1007/978-1-0716-0199-0_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Genome sequencing of fungal pathogens has documented extensive variation in genome structure and composition between species and in many cases between individuals of the same species. This type of genomic variation can be adaptive, allowing pathogens to rapidly evolve new virulence phenotypes. Analyses of genome-wide variation in fungal pathogen genomes rely on high-quality assemblies and methods to detect and quantify structural variation. Population genomic studies in fungi have addressed the underlying mechanisms whereby structural variation can be rapidly generated. Transposable elements, high mutation and recombination rates, as well as incorrect chromosome segregation during mitosis and meiosis contribute to the extensive variation observed in many species. Here we summarize key findings in the field of fungal pathogen genomics and discuss methods to detect and characterize structural variants, including an alignment-based pipeline to study variation in population genomic data.
Affiliation(s)
- Christoph J Eschenbrenner
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Alice Feurtey
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Eva H Stukenbrock
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
Collapse
|
182
|
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022] Open
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to set the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations (9 software programs with different short and long reads) for the Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for the Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chenguang Diao
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Pengju Zhao
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lei Zhou
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jian-Feng Liu
  - National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
|
183
|
Morisse P, Marchet C, Limasset A, Lecroq T, Lefebvre A. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 2021; 11:761. [PMID: 33436980 PMCID: PMC7804095 DOI: 10.1038/s41598-020-80757-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Accepted: 12/22/2020] [Indexed: 11/09/2022] Open
Abstract
Third-generation sequencing technologies allow sequencing of long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies both on multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and allows processing of a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature, allowing correction of raw assemblies. Our experiments show that CONSENT is 2-38 times faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource-consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
|
184
|
Holley G, Beyter D, Ingimundardottir H, Møller PL, Kristmundsdottir S, Eggertsson HP, Halldorsson BV. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol 2021; 22:28. [PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/17/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022] Open
Abstract
A major challenge of long-read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi read assembly.
Affiliation(s)
- Peter L Møller
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Snædis Kristmundsdottir
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
- Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
|
185
|
Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, Liachko I, Haryoko T, Jønsson KA, Zhou Q, Irestedt M, Suh A. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour 2021; 21:263-286. [PMID: 32937018 PMCID: PMC7757076 DOI: 10.1111/1755-0998.13252] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 04/14/2020] [Revised: 08/21/2020] [Accepted: 08/26/2020] [Indexed: 01/09/2023]
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Affiliation(s)
- Valentina Peona
- Department of Ecology and Genetics – Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology – Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany
- Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany
- Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
- Tri Haryoko
- Research Centre for Biology, Museum Zoologicum Bogoriense, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
- Knud A. Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Qi Zhou
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Alexander Suh
- Department of Ecology and Genetics – Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology – Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- School of Biological Sciences – Organisms and the Environment, University of East Anglia, Norwich, UK
|
186
|
Zhang H, Jain C, Aluru S. A comprehensive evaluation of long read error correction methods. BMC Genomics 2020; 21:889. [PMID: 33349243 PMCID: PMC7751105 DOI: 10.1186/s12864-020-07227-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 11/08/2020] [Accepted: 11/12/2020] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Third-generation single-molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. RESULTS In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. CONCLUSIONS Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners are advised to be careful with the few correction tools that discard reads, and to check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE.
Affiliation(s)
- Haowen Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA; Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, 30332, GA, USA
|
187
|
Heller D, Vingron M. SVIM-asm: Structural variant detection from haploid and diploid genome assemblies. Bioinformatics 2020; 36:5519-5521. [PMID: 33346817 PMCID: PMC8016491 DOI: 10.1093/bioinformatics/btaa1034] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 10/09/2020] [Revised: 11/16/2020] [Accepted: 12/12/2020] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes. RESULTS We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reached higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. AVAILABILITY AND IMPLEMENTATION SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/svim-asm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
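For readers unfamiliar with the F1 score used to compare SV callers, the following minimal sketch shows how it follows from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical and are not taken from the paper's benchmark.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts.

    F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no calls
    and no truth-set variants at all.
    """
    if min(true_pos, false_pos, false_neg) < 0:
        raise ValueError("counts must be non-negative")
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical counts: 90 calls match the truth set, 10 do not,
# and 30 truth-set variants were missed.
print(round(f1_score(90, 10, 30), 3))
```

With these counts, precision is 0.90 and recall is 0.75, so the F1 score falls between the two and is pulled toward the lower value.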
Affiliation(s)
- David Heller
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Martin Vingron
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
188
|
Bennett EP, Petersen BL, Johansen IE, Niu Y, Yang Z, Chamberlain CA, Met Ö, Wandall HH, Frödin M. INDEL detection, the 'Achilles heel' of precise genome editing: a survey of methods for accurate profiling of gene editing induced indels. Nucleic Acids Res 2020; 48:11958-11981. [PMID: 33170255 PMCID: PMC7708060 DOI: 10.1093/nar/gkaa975] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/20/2019] [Revised: 10/05/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022] Open
Abstract
Advances in genome editing technologies have enabled manipulation of genomes at the single base level. These technologies are based on programmable nucleases (PNs) that include meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated 9 (Cas9) nucleases and have given researchers the ability to delete, insert or replace genomic DNA in cells, tissues and whole organisms. The great flexibility in re-designing the genomic target specificity of PNs has vastly expanded the scope of gene editing applications in life science, and shows great promise for development of the next generation gene therapies. PN technologies share the principle of inducing a DNA double-strand break (DSB) at a user-specified site in the genome, followed by cellular repair of the induced DSB. PN-elicited DSBs are mainly repaired by the non-homologous end joining (NHEJ) and the microhomology-mediated end joining (MMEJ) pathways, which can elicit a variety of small insertion or deletion (indel) mutations. If indels are elicited in a protein coding sequence and shift the reading frame, targeted gene knock out (KO) can readily be achieved using either of the available PNs. Despite the ease by which gene inactivation in principle can be achieved, in practice, successful KO is not only determined by the efficiency of NHEJ and MMEJ repair; it also depends on the design and properties of the PN utilized, delivery format chosen, the preferred indel repair outcomes at the targeted site, the chromatin state of the target site and the relative activities of the repair pathways in the edited cells. These variables preclude accurate prediction of the nature and frequency of PN induced indels. 
A key step of any gene KO experiment therefore becomes the detection, characterization and quantification of the indel(s) induced at the targeted genomic site in cells, tissues or whole organisms. In this survey, we briefly review naturally occurring indels and their detection. Next, we review the methods that have been developed for detection of PN-induced indels. We briefly outline the experimental steps and describe the pros and cons of the various methods to help users decide a suitable method for their editing application. We highlight recent advances that enable accurate and sensitive quantification of indel events in cells regardless of their genome complexity, turning a complex pool of different indel events into informative indel profiles. Finally, we review what has been learned about PN-elicited indel formation through the use of the new methods and how this insight is helping to further advance the genome editing field.
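The abstract's point that an indel knocks out a gene when it shifts the reading frame can be made concrete: an insertion or deletion within coding sequence shifts the frame when its net length is not a multiple of three. The sketch below assumes indels are given as hypothetical (reference allele, alternate allele) string pairs and is not one of the profiling methods surveyed in the paper.

```python
from collections import Counter


def indel_size(ref_allele, alt_allele):
    """Net length change: positive for insertions, negative for deletions."""
    return len(alt_allele) - len(ref_allele)


def is_frameshift(ref_allele, alt_allele):
    """True if the indel changes coding length by a non-multiple of 3."""
    size = indel_size(ref_allele, alt_allele)
    return size != 0 and size % 3 != 0


def indel_profile(alleles):
    """Count how often each indel size occurs in a pool of edited alleles."""
    return dict(Counter(indel_size(ref, alt) for ref, alt in alleles))


# Hypothetical pool of edited alleles: one 3-bp deletion, two 1-bp insertions.
pool = [("ACGT", "A"), ("A", "AT"), ("A", "AT")]
print(indel_profile(pool))
print([is_frameshift(ref, alt) for ref, alt in pool])
```

In this toy pool the 3-bp deletion preserves the reading frame, while both 1-bp insertions shift it.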
Affiliation(s)
- Eric Paul Bennett
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Bent Larsen Petersen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Ida Elisabeth Johansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Yiyuan Niu
- Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi, China
- Zhang Yang
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Özcan Met
- Center for Cancer Immune Therapy, Department of Oncology, Copenhagen University Hospital, Herlev, Denmark
- Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Hans H Wandall
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Morten Frödin
- Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
|
189
|
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes (Basel) 2020; 11:E1444. [PMID: 33266238 PMCID: PMC7760597 DOI: 10.3390/genes11121444] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/12/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 01/23/2023] Open
Abstract
Long-read single-molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single-molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual than between SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have similar performance for SV detection in human whole-genome sequencing data, and that both technologies are feasible for population-scale studies.
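One common way to quantify concordance between two SV call sets is reciprocal overlap of the called intervals. The sketch below is an illustration under stated assumptions, not the comparison procedure used in the paper: calls are hypothetical `(chromosome, start, end)` tuples with half-open coordinates, and the 50% threshold is an assumed default.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals a and b.

    Each interval is a (start, end) pair with half-open coordinates.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def concordant_fraction(calls_a, calls_b, min_overlap=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b.

    A match requires the same chromosome and a reciprocal overlap of at
    least min_overlap (an assumed threshold, not taken from the paper).
    """
    if not calls_a:
        return 0.0
    matched = sum(
        1
        for chrom_a, start_a, end_a in calls_a
        if any(
            chrom_a == chrom_b
            and reciprocal_overlap((start_a, end_a), (start_b, end_b)) >= min_overlap
            for chrom_b, start_b, end_b in calls_b
        )
    )
    return matched / len(calls_a)


# Hypothetical call sets from two platforms.
platform_1 = [("chr1", 100, 200), ("chr2", 100, 200)]
platform_2 = [("chr1", 150, 250)]
print(concordant_fraction(platform_1, platform_2))
```

The measure is asymmetric: swapping the two call sets changes the denominator, so both directions are usually reported.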
Affiliation(s)
- Nazeefa Fatima
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Anna Petri
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Ulf Gyllensten
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Clayton, VIC 3800, Australia
|
190
|
Koebley SR, Mikheikin A, Leslie K, Guest D, McConnell-Wells W, Lehman JH, Al Juhaishi T, Zhang X, Roberts CH, Picco L, Toor A, Chesney A, Reed J. Digital Polymerase Chain Reaction Paired with High-Speed Atomic Force Microscopy for Quantitation and Length Analysis of DNA Length Polymorphisms. ACS Nano 2020; 14:15385-15393. [PMID: 33169971 DOI: 10.1021/acsnano.0c05897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
DNA length polymorphisms are found in many serious diseases, and assessment of their length and abundance is often critical for accurate diagnosis. However, measuring their length and frequency in a mostly wild-type background, as occurs in many situations, remains challenging due to their variable and repetitive nature. To overcome these hurdles, we combined two powerful techniques, digital polymerase chain reaction (dPCR) and high-speed atomic force microscopy (HSAFM), to create a simple, rapid, and flexible method for quantifying both the size and proportion of DNA length polymorphisms. In our approach, individual amplicons from each dPCR partition are imaged and sized directly. We focused on internal tandem duplications (ITDs) located within the FLT3 gene, which are associated with acute myeloid leukemia and often indicative of a poor prognosis. In an analysis of over 1.5 million HSAFM-imaged amplicons from cell line and clinical samples containing FLT3-ITDs, dPCR-HSAFM returned the expected variant length and variant allele frequency, down to 5% variant samples. As a high-throughput method with single-molecule resolution, dPCR-HSAFM thus represents an advance in HSAFM analysis and a powerful tool for the diagnosis of length polymorphisms.
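The quantity reported here, variant allele frequency from individually sized amplicons, reduces to counting molecules whose measured length exceeds the wild-type length by more than the measurement tolerance. The sketch below is a simplified illustration; the function name, the tolerance, and the example lengths are hypothetical and do not reproduce the authors' analysis pipeline.

```python
def variant_allele_frequency(lengths_bp, wild_type_bp, tolerance_bp):
    """Fraction of amplicons longer than wild type by more than the tolerance.

    Duplications lengthen the amplicon, so only molecules exceeding
    wild_type_bp + tolerance_bp are counted as variant.
    """
    if not lengths_bp:
        raise ValueError("at least one measured amplicon is required")
    variants = sum(
        1 for length in lengths_bp if length - wild_type_bp > tolerance_bp
    )
    return variants / len(lengths_bp)


# Hypothetical measured amplicon lengths (bp); wild type assumed to be 330 bp.
measured = [330, 332, 328, 390, 331, 329, 388, 330, 331, 330]
print(variant_allele_frequency(measured, wild_type_bp=330, tolerance_bp=10))
```

In practice the tolerance would have to reflect the sizing precision of the instrument, which this sketch simply takes as given.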
Affiliation(s)
- Sean R Koebley
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Andrey Mikheikin
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Kevin Leslie
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Daniel Guest
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Wendy McConnell-Wells
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Joshua H Lehman
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Taha Al Juhaishi
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Xiaojie Zhang
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Catherine H Roberts
- Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Loren Picco
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Amir Toor
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Alden Chesney
- Department of Pathology, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Jason Reed
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia 23298, United States
|
191
|
Short and long-read ultra-deep sequencing profiles emerging heterogeneity across five platform Escherichia coli strains. Metab Eng 2020; 65:197-206. [PMID: 33242648 DOI: 10.1016/j.ymben.2020.11.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/05/2020] [Revised: 10/26/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022]
Abstract
Reprogramming organisms for large-scale bioproduction counters their evolutionary objectives of fast growth and often leads to mutational collapse of the engineered production pathways during cultivation. Yet, the mutational susceptibility of academic and industrial Escherichia coli bioproduction host strains is poorly understood. In this study, we apply 2nd and 3rd generation deep sequencing to profile simultaneous modes of genetic heterogeneity that decimate engineered biosynthetic production in five popular E. coli hosts, BL21(DE3), TOP10, MG1655, W, and W3110, producing 2,3-butanediol and mevalonic acid. Combining short-read and long-read sequencing, we detect strain- and sequence-specific mutational modes including single nucleotide polymorphism, inversion, and mobile element transposition, as well as complex structural variations that disrupt the integrity of the engineered biosynthetic pathway. Our analysis suggests that organism engineers should avoid chassis strains hosting active insertion sequence (IS) subfamilies such as IS1 and IS10 present in popular E. coli TOP10. We also recommend monitoring for increased mutagenicity in the pathway transcription initiation regions and recombinogenic repeats. Together, short and long sequencing reads identified latent low-frequency mutation events such as a short detrimental inversion within a pathway gene, driven by 8-bp short inverted repeats. This demonstrates the power of combining ultra-deep DNA sequencing technologies to profile genetic heterogeneities of engineered constructs and explore the markedly different mutational landscapes of common E. coli host strains. The observed multitude of evolving variants underlines the usefulness of early mutational profiling for new synthetic pathways designed to be sustained in organisms over long cultivation scales.
|
192
|
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S Springer
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
|
193
|
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022] Open
Abstract
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding. Conventional short-read whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) utilizes long-range information by linkage of short reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated with an independent PCR-based approach. The SVs commonly identified by both technologies were highly specific, while the validation rate dropped for uncommon events. A particularly high FDR was observed for SVs only found by 10XWGS. To improve FDR and sensitivity, statistical models for both technologies were trained. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics in various diseases.
Cancer and many other diseases are often driven by structural rearrangements in the patients. Their precise identification is necessary to understand the evolution of, and find a cure for, the disease. In this study, we have compared two sequencing technologies for the identification of structural variations, i.e. Illumina's short-read and 10X Genomics linked-read sequencing. Short-read sequencing is already known to have a high false discovery rate for structural variations, while an unbiased performance evaluation of linked-read sequencing is missing. Hence, we evaluate the performance of these two technologies using computational and PCR-based methodologies. Moreover, we also present a statistical approach to increase their performance, supporting better detection of structural variations and thus further research into disease biology.
|
194
|
Lee N, Park MJ, Song W, Jeon K, Jeong S. Currently Applied Molecular Assays for Identifying ESR1 Mutations in Patients with Advanced Breast Cancer. Int J Mol Sci 2020; 21:ijms21228807. [PMID: 33233830 PMCID: PMC7699999 DOI: 10.3390/ijms21228807] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/27/2020] [Revised: 11/17/2020] [Accepted: 11/19/2020] [Indexed: 12/11/2022] Open
Abstract
Approximately 70% of breast cancers, the leading cause of cancer-related mortality worldwide, are positive for the estrogen receptor (ER). Treatment of patients with luminal subtypes is mainly based on endocrine therapy. However, ER positivity is reduced and ESR1 mutations play an important role in resistance to endocrine therapy, leading to advanced breast cancer. Various methodologies for the detection of ESR1 mutations have been developed, and the most commonly used method is next-generation sequencing (NGS)-based assays (50.0%), followed by droplet digital PCR (ddPCR) (45.5%). Regarding the sample type, tissue (50.0%) was more frequently used than plasma (27.3%). However, plasma (46.2%) became the most used sample type in 2016-2019, in contrast to 2012-2015 (22.2%). In 2016-2019, ddPCR (61.5%), rather than NGS (30.8%), became a more popular method than it was in 2012-2015. The easy accessibility, non-invasiveness, and demonstrated usefulness with high sensitivity of ddPCR using plasma have changed the trends. When using these assays, there should be a comprehensive understanding of the principles, advantages, vulnerability, and precautions for interpretation. In the future, advanced NGS platforms and modified ddPCR will benefit patients by facilitating treatment decisions efficiently based on information regarding ESR1 mutations.
Affiliation(s)
- Nuri Lee
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Min-Jeong Park
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Wonkeun Song
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Kibum Jeon
- Department of Laboratory Medicine, Hangang Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Seri Jeong
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Correspondence: Tel.: +82-845-5305
| |
|
195
|
Rubin MA, Bristow RG, Thienger PD, Dive C, Imielinski M. Impact of Lineage Plasticity to and from a Neuroendocrine Phenotype on Progression and Response in Prostate and Lung Cancers. Mol Cell 2020; 80:562-577. [PMID: 33217316 PMCID: PMC8399907 DOI: 10.1016/j.molcel.2020.10.033] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 06/25/2020] [Revised: 09/06/2020] [Accepted: 10/22/2020] [Indexed: 02/07/2023]
Abstract
Intratumoral heterogeneity can occur via phenotype transitions, often after chronic exposure to targeted anticancer agents. This process, termed lineage plasticity, is associated with acquired independence from an initial oncogenic driver, resulting in treatment failure. In non-small cell lung cancer (NSCLC) and prostate cancers, lineage plasticity manifests when the adenocarcinoma phenotype transforms into neuroendocrine (NE) disease. The exact molecular mechanisms involved in this NE transdifferentiation remain elusive. In small cell lung cancer (SCLC), plasticity from NE to non-NE phenotypes is driven by NOTCH signaling. Herein we review current understanding of NE lineage plasticity dynamics, exemplified by prostate cancer, NSCLC, and SCLC.
Affiliation(s)
- Mark A Rubin
- Department for BioMedical Research, University of Bern and Inselspital, 3010 Bern, Switzerland; Bern Center for Precision Medicine, University of Bern and Inselspital, 3010 Bern, Switzerland
- Robert G Bristow
- Manchester Cancer Research Centre and Cancer Research UK Manchester Institute, University of Manchester, Macclesfield SK10 4TG, UK
- Phillip D Thienger
- Department for BioMedical Research, University of Bern and Inselspital, 3010 Bern, Switzerland
- Caroline Dive
- Cancer Research UK Manchester Institute Cancer Biomarker Centre, University of Manchester, Macclesfield SK10 4TG, UK
- Marcin Imielinski
- Pathology and Laboratory Medicine and Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
|
196
|
Benaud N, Edwards RJ, Amos TG, D'Agostino PM, Gutiérrez-Chávez C, Montgomery K, Nicetic I, Ferrari BC. Antarctic desert soil bacteria exhibit high novel natural product potential, evaluated through long-read genome sequencing and comparative genomics. Environ Microbiol 2020; 23:3646-3664. [PMID: 33140504 DOI: 10.1111/1462-2920.15300] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/22/2020] [Accepted: 10/29/2020] [Indexed: 11/30/2022]
Abstract
Actinobacteria and Proteobacteria are important producers of bioactive natural products (NPs), and these phyla dominate in the arid soils of Antarctica, where metabolic adaptations influence survival under harsh conditions. Biosynthetic gene clusters (BGCs), which encode NPs, are typically long, repetitive, high G + C regions that are difficult to sequence with short-read technologies. We sequenced 17 Antarctic soil bacteria from multi-genome libraries, employing the long-read PacBio platform, to optimize capture of BGCs and to facilitate a comprehensive analysis of their NP capacity. We report 13 complete bacterial genomes of high quality and contiguity, representing 10 different cold-adapted genera including novel species. Antarctic BGCs exhibited low similarity to known compound BGCs (average 31%), with an abundance of terpene, non-ribosomal peptide and polyketide-encoding clusters. Comparative genome analysis was used to map BGC variation between closely related strains from geographically distant environments. Results showed the greatest biosynthetic differences to be in a psychrotolerant Streptomyces strain, as well as a rare Actinobacteria genus, Kribbella, while two other Streptomyces spp. were surprisingly similar to known genomes. Streptomyces and Kribbella BGCs were predicted to encode antitumour, antifungal, antibacterial and biosurfactant-like compounds, and the synthesis of NPs with antibacterial, antifungal and surfactant properties was confirmed through bioactivity assays.
Affiliation(s)
- Nicole Benaud
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Timothy G Amos
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Paul M D'Agostino
- Technische Universität Dresden, Chair of Technical Biochemistry, Bergstraße 66, 01602 Dresden, Germany
- Kate Montgomery
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Iskra Nicetic
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Belinda C Ferrari
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
|
197
|
Kadota M, Nishimura O, Miura H, Tanaka K, Hiratani I, Kuraku S. Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? Gigascience 2020; 9:5695848. [PMID: 31919520 PMCID: PMC6952475 DOI: 10.1093/gigascience/giz158] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/06/2019] [Revised: 10/23/2019] [Accepted: 12/02/2019] [Indexed: 12/28/2022] Open
Abstract
Background: Hi-C is derived from chromosome conformation capture (3C) and targets chromatin contacts on a genomic scale. This method has also been used frequently in scaffolding nucleotide sequences obtained by de novo genome sequencing and assembly, in which the number of resultant sequences rarely converges to the chromosome number. Despite its prevalent use, the sample preparation methods for Hi-C have not been intensively discussed, especially from the standpoint of genome scaffolding.
Results: To gain insight into the best practice of Hi-C scaffolding, we performed a multifaceted methodological comparison using vertebrate samples and optimized various factors during sample preparation, sequencing, and computation. As a result, we identified several key factors that helped improve Hi-C scaffolding, including the choice and preparation of tissues, library preparation conditions, the choice of restriction enzyme(s), and the choice of scaffolding program and its usage.
Conclusions: This study provides the first comparison of multiple sample preparation kits/protocols and computational programs for Hi-C scaffolding by an academic third party. We introduce a customized protocol designated “inexpensive and controllable Hi-C (iconHi-C) protocol,” which incorporates the optimal conditions identified in this study, and demonstrate this technique on chromosome-scale genome sequences of the Chinese softshell turtle Pelodiscus sinensis.
Affiliation(s)
- Mitsutaka Kadota
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Osamu Nishimura
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Hisashi Miura
- Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Kaori Tanaka
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Ichiro Hiratani
- Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Shigehiro Kuraku
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
|
198
|
Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020; 21:597-614. [PMID: 32504078 PMCID: PMC7877196 DOI: 10.1038/s41576-020-0236-x] [Citation(s) in RCA: 457] [Impact Index Per Article: 114.3] [Accepted: 03/31/2020] [Indexed: 12/27/2022]
Abstract
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Affiliation(s)
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
199
|
Implications of germline copy-number variations in psychiatric disorders: review of large-scale genetic studies. J Hum Genet 2020; 66:25-37. [PMID: 32958875 DOI: 10.1038/s10038-020-00838-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/18/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020] [Indexed: 02/07/2023]
Abstract
Copy number variants (CNVs), defined as genome sequences of ≥50 bp that differ in copy number from that in a reference genome, are a common form of structural variation. Germline CNVs account for some of the missing heritability that single nucleotide polymorphisms could not account for. Recent technological advances have had a huge impact on CNV research. Microarray technology enables relatively low-cost, high-throughput, genome-wide measurements, and short-read sequencing technology enables the detection of short CNVs that cannot be detected by microarrays. As a result, large-scale genetic studies have been able to identify a variety of common and rare germline CNVs and their associations with diseases. Rare germline CNVs have been reported to be associated with neuropsychiatric disorders. In this review, we focused on germline CNVs and briefly described their functional characteristics, formation mechanisms, detection methods, related databases, and the latest findings. Finally, we introduced recent large-scale genetic studies to assess associations of CNVs with diseases, especially psychiatric disorders, and discussed the use of CNV-based animal models to investigate the molecular and cellular mechanisms underlying these disorders. The development and implementation of improved detection methods, such as long-read single-molecule sequencing, are expected to provide additional insight into the molecular basis of psychiatric disorders and other complex diseases, thus facilitating basic and clinical research on CNVs.
|
200
|
Penouilh-Suzette C, Fourré S, Besnard G, Godiard L, Pecrix Y. A simple method for high molecular-weight genomic DNA extraction suitable for long-read sequencing from spores of an obligate biotroph oomycete. J Microbiol Methods 2020; 178:106054. [PMID: 32926900 DOI: 10.1016/j.mimet.2020.106054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/03/2020] [Revised: 08/09/2020] [Accepted: 09/07/2020] [Indexed: 10/23/2022]
Abstract
Long-read sequencing technologies are having a major impact on our approaches to studying non-model organisms and microbial communities. By significantly reducing costs and simplifying genome assembly pipelines, they allow any laboratory to develop its own genomics program regardless of the complexity of the genome studied. The most crucial current challenge is to develop efficient protocols for extracting genomic DNA (gDNA) with high quality and integrity adapted to the organism of interest. This can be particularly complex for obligate pathogens that must maintain intimate interactions inside infected host tissues. Here we propose a simple and cost-effective method for high molecular weight gDNA extraction from spores of Plasmopara halstedii, an obligate biotroph oomycete pathogen responsible for downy mildew in sunflower. We optimized the yield, quality and integrity of the extracted gDNA by fine-tuning three critical parameters: grinding, lysis temperature and lysis duration. We obtained gDNA with a fragment size distribution reaching a peak ranging from 79 to 145 kb. More than half of the extracted gDNA consisted of DNA fragments larger than 42 kb, with 23% of fragments larger than 100 kb. We then demonstrated the relevance of this protocol for long-read sequencing using PacBio RSII technology. With this protocol, we were able to obtain a mean read length of 9.3 kb, a maximum read length of 71 kb and an N50 of 13.3 kb. The development of such DNA extraction protocols is an essential prerequisite for fully exploiting technologies requiring high molecular weight gDNA (e.g. long-read sequencing or optical mapping). These technological advances will help generate data to address questions such as the role of newly duplicated gene clusters, repeated regions and genomic structural variations, or to define the number of chromosomes, which remains unknown in many species of pathogenic fungi and oomycetes.
Affiliation(s)
- Charlotte Penouilh-Suzette
- LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France.
- Sandra Fourré
- GeT-PlaGe, INRAE Auzeville, US 1426, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France.
- Guillaume Besnard
- CNRS, Université Paul Sabatier, IRD, UMR 5174 EDB (Laboratoire Évolution et Diversité Biologique), 118 route de Narbonne, F-31062 Toulouse, France.
- Laurence Godiard
- LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France.
- Yann Pecrix
- LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France; CIRAD, UMR 53 Peuplements Végétaux et Bioagresseurs en Milieu Tropical (PVBMT), Pole de Protection des Plantes, 7 chemin de l'IRAT, F-97410 Saint Pierre, Réunion, France.
|