51
Mackintosh A, Laetsch DR, Baril T, Ebdon S, Jay P, Vila R, Hayward A, Lohse K. The genome sequence of the scarce swallowtail, Iphiclides podalirius. G3 (Bethesda) 2022; 12:jkac193. [PMID: 35929795 PMCID: PMC9434224 DOI: 10.1093/g3journal/jkac193] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/28/2022] [Accepted: 07/20/2022] [Indexed: 12/04/2022]
Abstract
The scarce swallowtail, Iphiclides podalirius (Linnaeus, 1758), is a species of butterfly in the family Papilionidae. Here, we present a chromosome-level genome assembly for Iphiclides podalirius as well as gene and transposable element annotations. We investigate how the density of genomic features differs between the 30 Iphiclides podalirius chromosomes. We find that shorter chromosomes have higher heterozygosity at four-fold-degenerate sites and a greater density of transposable elements. While the first result is an expected consequence of differences in recombination rate, the second suggests a counter-intuitive relationship between recombination and transposable element evolution. This high-quality genome assembly, the first for any species in the tribe Leptocircini, will be a valuable resource for population genomics in the genus Iphiclides and comparative genomics more generally.
Affiliation(s)
- Alexander Mackintosh
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Dominik R Laetsch
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Tobias Baril
  - Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Sam Ebdon
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Paul Jay
  - Ecologie Systématique Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Roger Vila
  - Institut de Biologia Evolutiva (CSIC—Universitat Pompeu Fabra), Barcelona 08003, Spain
- Alex Hayward
  - Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Konrad Lohse
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
52
Löytynoja A. Thousands of human mutation clusters are explained by short-range template switching. Genome Res 2022; 32:1437-1447. [PMID: 35760560 PMCID: PMC9435742 DOI: 10.1101/gr.276478.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 06/21/2022] [Indexed: 02/03/2023]
Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly known mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, the loci segregating within and between populations. During the study, computational tools were developed for identification of template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection, and widely used analysis pipelines for short-read sequencing data, normally capable of identifying single nucleotide changes, were found to miss template-switch mutations of tens of base pairs, potentially invalidating medical genetic studies searching for a causative allele behind genetic diseases. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
Affiliation(s)
- Ari Löytynoja
  - Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland
53
Achakkagari SR, Kyriakidou M, Gardner KM, De Koeyer D, De Jong H, Strömvik MV, Tai HH. Genome sequencing of adapted diploid potato clones. Front Plant Sci 2022; 13:954933. [PMID: 36003817 PMCID: PMC9394749 DOI: 10.3389/fpls.2022.954933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/20/2022] [Indexed: 06/15/2023]
Abstract
Cultivated potato is a vegetatively propagated crop, and most varieties are autotetraploid with high levels of heterozygosity. Reducing the ploidy and breeding potato at the diploid level can increase efficiency for genetic improvement, including greater ease of introgression from diploid wild relatives and more efficient use of genomics and markers in selection. More recently, selfing of diploids to generate inbred lines for F1 hybrid breeding has received considerable attention in potato. The current study provides genomics resources for nine legacy non-inbred adapted diploid potato clones developed at Agriculture and Agri-Food Canada. De novo genome sequence assembly using 10× Genomics and Illumina sequencing technologies showed that genome sizes ranged from 712 to 948 Mbp. Structural variation was identified by comparison to two references, the potato DMv6.1 genome and the phased RHv3 genome, and a k-mer based analysis of sequence reads showed that genome heterozygosity ranged from 1 to 9.04% among clones. A genome-wide approach was taken to scan 5 Mb bins to visualize patterns of heterozygous deleterious alleles. These were found dispersed throughout the genome, including regions overlapping segregation distortions. Novel variants of the StCDF1 gene conferring earliness of tuberization were found among these clones, which all produce tubers under long days. The genomes will be useful tools for genome design for potato breeding.
Affiliation(s)
- Maria Kyriakidou
  - Department of Plant Science, McGill University, Sainte-Anne-de-Bellevue, QC, Canada
- Kyle M. Gardner
  - Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
- David De Koeyer
  - Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
- Hielke De Jong
  - Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
- Martina V. Strömvik
  - Department of Plant Science, McGill University, Sainte-Anne-de-Bellevue, QC, Canada
- Helen H. Tai
  - Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
54
Zhang T, Zhou J, Gao W, Jia Y, Wei Y, Wang G. Complex genome assembly based on long-read sequencing. Brief Bioinform 2022; 23:6657663. [PMID: 35940845 DOI: 10.1093/bib/bbac305] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/06/2022] [Revised: 06/20/2022] [Accepted: 07/06/2022] [Indexed: 11/12/2022]
Abstract
High-quality chromosome-scale genome sequences provide an important basis for downstream genomics analysis, especially the construction of haplotype-resolved and complete genomes, which play a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other aspects. However, genome-wide short-read sequencing struggles to produce a complete genome when faced with a complex genome with high duplication and multiple heterozygosity. The emergence of long-read sequencing technology has greatly improved the integrity of complex genome assembly. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Among the three methods, uncollapsed assembly is the most correct and complete way to represent genomes. In addition, genome assembly is closely related to haplotype reconstruction; that is, uncollapsed assembly realizes haplotype reconstruction, and haplotype reconstruction promotes uncollapsed assembly. We hope that gapless, telomere-to-telomere and accurate assembly of complex genomes can be truly routinely achieved using only a simple process or a single tool in the future.
Affiliation(s)
- Tianjiao Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Jie Zhou
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Wentao Gao
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Yuran Jia
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Yanan Wei
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
55
Meleshko D, Yang R, Marks P, Williams S, Hajirasouliha I. Efficient detection and assembly of non-reference DNA sequences with synthetic long reads. Nucleic Acids Res 2022; 50:e108. [PMID: 35924489 PMCID: PMC9561269 DOI: 10.1093/nar/gkac653] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/27/2022] [Revised: 06/10/2022] [Accepted: 08/01/2022] [Indexed: 11/14/2022]
Abstract
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion's share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
Affiliation(s)
- Dmitry Meleshko
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Rui Yang
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Patrick Marks
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Stephen Williams
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, NY 10021, USA
56
Tunstrom K, Wheat CW, Parmesan C, Singer MC, Mikheyev AS. A Genome for Edith's Checkerspot Butterfly: An Insect with Complex Host-Adaptive Suites and Rapid Evolutionary Responses to Environmental Changes. Genome Biol Evol 2022; 14:evac113. [PMID: 35876165 PMCID: PMC9348621 DOI: 10.1093/gbe/evac113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/06/2022] [Indexed: 11/15/2022]
Abstract
Insects have been key players in the assessments of biodiversity impacts of anthropogenically driven environmental change, including the evolutionary and ecological impacts of climate change. Populations of Edith's Checkerspot Butterfly (Euphydryas editha) adapt rapidly to diverse environmental conditions, with numerous high-impact studies documenting these dynamics over several decades. However, studies of the underlying genetic bases of these responses have been hampered by missing genomic resources, limiting the ability to connect genomic responses to environmental change. Using a combination of Oxford Nanopore long reads, haplotype merging, HiC scaffolding followed by Illumina polishing, we generated a highly contiguous and complete assembly (contigs n = 142, N50 = 21.2 Mb, total length = 607.8 Mb; BUSCOs n = 5,286, single copy complete = 97.8%, duplicated = 0.9%, fragmented = 0.3%, missing = 1.0%). A total of 98% of the assembled genome was placed into 31 chromosomes, which displayed large-scale synteny with other well-characterized lepidopteran genomes. The E. editha genome, annotation, and functional descriptions now fill a missing gap for one of the leading field-based ecological model systems in North America.
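The contiguity figure quoted in the abstract above (N50 = 21.2 Mb) follows the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The Python sketch below illustrates that definition only; the function name `n50` and the toy contig lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the assembly size; the
    length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), not taken from the paper.
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print("N50 of toy assembly:", n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, it says nothing about correctness of the assembly, which is why the abstract reports BUSCO completeness alongside it.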
Affiliation(s)
- Kalle Tunstrom
  - Department of Zoology, Stockholm University, Stockholm, Sweden
- Camille Parmesan
  - Station d’Écologie Théorique et Expérimentale, CNRS, 2 route du CNRS, 09200 Moulis, France
  - Biological and Marine Sciences, University of Plymouth, Plymouth, UK
  - Department of Geological Sciences, University of Texas at Austin, TX, USA
- Michael C Singer
  - Station d’Écologie Théorique et Expérimentale, CNRS, 2 route du CNRS, 09200 Moulis, France
  - Biological and Marine Sciences, University of Plymouth, Plymouth, UK
- Alexander S Mikheyev
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
57
Li S, Park S, Ye C, Danyko C, Wroten M, Andrews P, Wigler M, Levy D. Targeted de novo phasing and long-range assembly by template mutagenesis. Nucleic Acids Res 2022; 50:e103. [PMID: 35822882 PMCID: PMC9561374 DOI: 10.1093/nar/gkac592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2022] [Revised: 05/16/2022] [Accepted: 06/27/2022] [Indexed: 01/31/2023]
Abstract
Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
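As a rough illustration of why a dense random mutation pattern can act as a per-template identifier, the sketch below simulates the conversion step described in the abstract above (each cytosine independently converted to uracil with probability about 0.5) and computes the chance that two independently mutated copies of one template end up with the same pattern. The function names, the toy sequence and the single-strand model are assumptions made for illustration; this is not the authors' protocol or software.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Convert each cytosine to uracil independently with probability `rate`.

    Only the given strand is modelled; all other bases are left unchanged.
    """
    rng = rng or random.Random()
    return "".join(
        "U" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def pattern_collision_probability(n_cytosines, rate=0.5):
    """Probability that two independently mutated copies share one pattern.

    At a single cytosine the two copies agree if both were converted
    (rate**2) or both were left intact ((1 - rate)**2); sites are
    independent, so the per-site agreement is raised to the site count.
    """
    per_site_agreement = rate ** 2 + (1 - rate) ** 2
    return per_site_agreement ** n_cytosines


if __name__ == "__main__":
    # Hypothetical toy template; real targets in the paper are 10 kb and longer.
    template = "ACGTCCGATCGACCTAGC"
    rng = random.Random(0)
    print(mutate_template(template, rng=rng))
    print(mutate_template(template, rng=rng))
    print(pattern_collision_probability(template.count("C")))
```

Under this simplified model the collision probability halves with every additional cytosine at a 50% conversion rate, which is consistent with the abstract's description of the patterns as dense and unique over long regions.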
Affiliation(s)
- Siran Li
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Sarah Park
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Catherine Ye
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Cassidy Danyko
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Matthew Wroten
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Peter Andrews
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Michael Wigler
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Dan Levy
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
58
Pham MT, Gupta A, Gupta H, Vaghasia A, Skaist A, Garrison MA, Coulter JB, Haffner MC, Zheng SL, Xu J, DeStefano Shields C, Isaacs WB, Wheelan SJ, Nelson WG, Yegnasubramanian S. Identifying Phased Mutations and Complex Rearrangements in Human Prostate Cancer Cell Lines through Linked-Read Whole-Genome Sequencing. Mol Cancer Res 2022; 20:1013-1020. [PMID: 35452513 PMCID: PMC9262859 DOI: 10.1158/1541-7786.mcr-21-0683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 03/19/2022] [Accepted: 04/14/2022] [Indexed: 11/16/2022]
Abstract
A limited number of cell lines have fueled the majority of preclinical prostate cancer research, but their genomes remain incompletely characterized. Here, we utilized whole-genome linked-read sequencing for comprehensive characterization of phased mutations and rearrangements in the most commonly used cell lines in prostate cancer research including PC3, LNCaP, DU145, CWR22Rv1, VCaP, LAPC4, MDA-PCa-2b, RWPE-1, and four derivative castrate-resistant (CR) cell lines LNCaP_Abl, LNCaP_C42b, VCaP-CR, and LAPC4-CR. Phasing of mutations allowed determination of "gene-level haplotype" to assess whether genes harbored heterozygous mutations in one or both alleles. Phased structural variant analysis allowed identification of complex rearrangement chains consistent with chromothripsis and chromoplexy. In addition, comparison of parental and derivative CR lines revealed previously known and novel genomic alterations associated with the CR phenotype. Implications: This study therefore comprehensively characterized phased genomic alterations in the commonly used prostate cancer cell lines, providing a useful resource for future prostate cancer research.
Affiliation(s)
- Minh-Tam Pham
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Urology, James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Cellular and Molecular Medicine Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Anuj Gupta
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Harshath Gupta
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Ajay Vaghasia
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Cellular and Molecular Medicine Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Alyza Skaist
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
- McKinzie A. Garrison
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Biochemistry, Cellular and Molecular Biology Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Jonathan B. Coulter
  - Department of Urology, James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Michael C. Haffner
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Division of Human Biology and Clinical Research, Fred Hutchinson Cancer Research Center, Seattle, Washington
  - Department of Pathology, University of Washington, Seattle, Washington
- S. Lilly Zheng
  - Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, Illinois
- Jianfeng Xu
  - Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, Illinois
- Christina DeStefano Shields
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
- William B. Isaacs
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Urology, James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Cellular and Molecular Medicine Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Sarah J. Wheelan
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland
- William G. Nelson
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Cellular and Molecular Medicine Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Srinivasan Yegnasubramanian
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Cellular and Molecular Medicine Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
59
Su Y, Hong AL. Recent Advances in Renal Medullary Carcinoma. Int J Mol Sci 2022; 23:ijms23137097. [PMID: 35806102 PMCID: PMC9266801 DOI: 10.3390/ijms23137097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2022] [Revised: 06/20/2022] [Accepted: 06/24/2022] [Indexed: 02/05/2023]
Abstract
Renal medullary carcinoma (RMC) is a rare renal malignancy that has been associated with sickle hemoglobinopathies. RMC is aggressive, difficult to treat, and occurs primarily in adolescents and young adults of African ancestry. This cancer is driven by the loss of SMARCB1, a tumor suppressor seen in a number of primarily rare childhood cancers (e.g., rhabdoid tumor of the kidney and atypical teratoid rhabdoid tumor). Treatment options remain limited due in part to the limited knowledge of RMC biology. However, significant advances have been made in unraveling the biology of RMC, from genomics to therapeutic targets, over the past 5 years. In this review, we will present these advances and discuss what new questions exist in the field.
Affiliation(s)
- Yongdong Su
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
  - Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Andrew L. Hong
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
  - Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
  - Winship Cancer Institute, Emory University School of Medicine, Atlanta, GA 30322, USA
60
Wilcox JJS, Arca-Ruibal B, Samour J, Mateuta V, Idaghdour Y, Boissinot S. Linked-Read Sequencing of Eight Falcons Reveals a Unique Genomic Architecture in Flux. Genome Biol Evol 2022; 14:evac090. [PMID: 35700227 PMCID: PMC9214253 DOI: 10.1093/gbe/evac090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 05/27/2022] [Accepted: 06/06/2022] [Indexed: 11/12/2022]
Abstract
Falcons are diverse birds of cultural and economic importance. They have undergone major lineage-specific chromosomal rearrangements, resulting in greatly-reduced chromosome counts relative to other birds. Here, we use 10X Genomics linked reads to provide new high-contiguity genomes for two gyrfalcons, a saker falcon, a lanner falcon, three subspecies of peregrine falcons, and the common kestrel. Assisted by a transcriptome sequenced from 22 gyrfalcon tissues, we annotate these genomes for a variety of genomic features, estimate historical demography, and then investigate genomic equilibrium in the context of falcon-specific chromosomal rearrangements. We find that falcon genomes are not in AT-GC equilibrium with a bias in substitutions towards higher AT content; this bias is predominantly but not exclusively driven by hypermutability of CpG sites. Small indels and large structural variants were also biased towards insertions rather than deletions. Patterns of disequilibrium were linked to chromosomal rearrangements: falcons have lost GC content in regions that have fused to larger chromosomes from microchromosomes and gained GC content in regions of macrochromosomes that have translocated to microchromosomes. Inserted bases have accumulated on regions ancestrally belonging to microchromosomes, consistent with insertion-biased gene conversion. We also find an excess of interspersed repeats on regions of microchromosomes that have fused to macrochromosomes. Our results reveal that falcon genomes are in a state of flux. They further suggest that many of the key differences between microchromosomes and macrochromosomes are driven by differences in chromosome size, and indicate a clear role for recombination and biased-gene-conversion in determining genomic equilibrium.
Affiliation(s)
- Justin J S Wilcox
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Jaime Samour
  - Wildlife Management and Falcon Medicine and Breeding Consultancy, Abu Dhabi, United Arab Emirates
- Youssef Idaghdour
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
  - Biology Program, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Stéphane Boissinot
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
  - Biology Program, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
61
Linked-read whole-genome sequencing resolves common and private structural variants in multiple myeloma. Blood Adv 2022; 6:5009-5023. [PMID: 35675515 PMCID: PMC9631623 DOI: 10.1182/bloodadvances.2021006720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Accepted: 05/31/2022] [Indexed: 01/18/2023]
Abstract
Linked-read WGS can be performed without DNA purification and allows for resolution of the diverse structural variants found in MM. Linked-read WGS can, as a standalone assay, provide comprehensive genetics in myeloma and other diseases with complex genomes.
Multiple myeloma (MM) is an incurable and aggressive plasma cell malignancy characterized by a complex karyotype with multiple structural variants (SVs) and copy-number variations (CNVs). Linked-read whole-genome sequencing (lrWGS) allows for refined detection and reconstruction of SVs by providing long-range genetic information from standard short-read sequencing. This makes lrWGS an attractive solution for capturing the full genomic complexity of MM. Here we show that high-quality lrWGS data can be generated from low numbers of cells subjected to fluorescence-activated cell sorting (FACS) without DNA purification. Using this protocol, we analyzed MM cells after FACS from 37 patients with MM using lrWGS. We found high concordance between lrWGS and fluorescence in situ hybridization (FISH) for the detection of recurrent translocations and CNVs. Outside of the regions investigated by FISH, we identified >150 additional SVs and CNVs across the cohort. Analysis of the lrWGS data allowed for resolution of the structure of diverse SVs affecting the MYC and t(11;14) loci, causing the duplication of genes and gene regulatory elements. In addition, we identified private SVs causing the dysregulation of genes recurrently involved in translocations with the IGH locus and show that these can alter the molecular classification of MM. Overall, we conclude that lrWGS allows for the detection of aberrations critical for MM prognostics and provides a feasible route for providing comprehensive genetics. Implementing lrWGS could provide more accurate clinical prognostics, facilitate genomic medicine initiatives, and greatly improve the stratification of patients included in clinical trials.
62
Chiu R, Rajan-Babu IS, Birol I, Friedman JM. Linked-read sequencing for detecting short tandem repeat expansions. Sci Rep 2022; 12:9352. [PMID: 35672336 PMCID: PMC9174224 DOI: 10.1038/s41598-022-13024-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Accepted: 05/19/2022] [Indexed: 11/09/2022]
Abstract
Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.
Affiliation(s)
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Jan M Friedman
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
  - BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
63
Gao Y, Ma L, Liu GE. Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods. Genes (Basel) 2022; 13:828. [PMID: 35627213 PMCID: PMC9142105 DOI: 10.3390/genes13050828] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/15/2022] [Revised: 05/01/2022] [Accepted: 05/04/2022] [Indexed: 02/01/2023] Open
Abstract
Structural variations (SVs), a major source of genetic variation, are widely distributed in the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and their association with repeats. Improved characterization of SVs can provide deeper insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10× Genomics (10 × G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of various methods to detect SVs. We identified 21,164 SVs, which amount to 186 Mb and cover 7.07% of the whole genome. The number of SVs inferred from long reads was greater than that from short reads. PacBio CLR identified the most large SVs and covered the largest portion of the genome. SVs called with PacBio CCS and ONT data showed high uniformity. PacBio CCS showed the most overlap with the results obtained from short-read data. Together, we found that long reads outperformed short reads in terms of SV detection.
Affiliation(s)
- Yahui Gao
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- George E. Liu
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
64
Shearman JR, Naktang C, Sonthirod C, Kongkachana W, U-Thoomporn S, Jomchai N, Maknual C, Yamprasai S, Promchoo W, Ruang-Areerate P, Pootakham W, Tangphatsornruang S. Assembly of a hybrid mangrove, Bruguiera hainesii, and its two ancestral contributors, Bruguiera cylindrica and Bruguiera gymnorhiza. Genomics 2022; 114:110382. [PMID: 35526741 DOI: 10.1016/j.ygeno.2022.110382] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2022] [Revised: 04/19/2022] [Accepted: 05/02/2022] [Indexed: 01/14/2023]
Abstract
Mangroves are plants that live in tropical and subtropical coastal regions of the world; they are adapted to high-salt environments and cyclic tidal flooding. Mangroves play important ecological roles, including acting as breeding grounds for many fish species and preventing coastal erosion. The genomes of three mangrove species, Bruguiera gymnorhiza, Bruguiera cylindrica, and a hybrid of the two, Bruguiera hainesii, were sequenced, assembled and annotated. The two progenitor species, B. gymnorhiza and B. cylindrica, were found to be highly similar to each other and sufficiently similar to B. parviflora to allow it to be used for reference-based scaffolding to generate chromosome-level scaffolds. The two subgenomes of B. hainesii were independently assembled and scaffolded. Analysis of B. hainesii confirms that it is a hybrid, and the hybridisation event was estimated at 2.4 to 3.5 million years ago using a Bayesian relaxed molecular clock approach.
Affiliation(s)
- Jeremy R Shearman
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chaiwat Naktang
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chutima Sonthirod
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Wasitthee Kongkachana
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Sonicha U-Thoomporn
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Nukoon Jomchai
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chatree Maknual
  - Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Suchart Yamprasai
  - Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Waratthaya Promchoo
  - Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Panthita Ruang-Areerate
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Wirulda Pootakham
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Sithichoke Tangphatsornruang
  - National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
65
Wagner J, Olson ND, Harris L, Khan Z, Farek J, Mahmoud M, Stankovic A, Kovacevic V, Yoo B, Miller N, Rosenfeld JA, Ni B, Zarate S, Kirsche M, Aganezov S, Schatz MC, Narzisi G, Byrska-Bishop M, Clarke W, Evani US, Markello C, Shafin K, Zhou X, Sidow A, Bansal V, Ebert P, Marschall T, Lansdorp P, Hanlon V, Mattsson CA, Barrio AM, Fiddes IT, Xiao C, Fungtammasan A, Chin CS, Wenger AM, Rowell WJ, Sedlazeck FJ, Carroll A, Salit M, Zook JM. Benchmarking challenging small variants with linked and long reads. CELL GENOMICS 2022; 2:100128. [PMID: 36452119 PMCID: PMC9706577 DOI: 10.1016/j.xgen.2022.100128] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Indexed: 05/14/2023]
Abstract
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. The new benchmark identifies eight times more false negatives in a short-read variant call set than our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
Affiliation(s)
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
  - Corresponding author
- Nathan D. Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Lindsay Harris
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Ziad Khan
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jesse Farek
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Ana Stankovic
  - Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
- Vladimir Kovacevic
  - Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
- Byunggil Yoo
  - Children’s Mercy Kansas City, Kansas City, MO, USA
- Neil Miller
  - Children’s Mercy Kansas City, Kansas City, MO, USA
- Bohan Ni
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Samantha Zarate
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Melanie Kirsche
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Giuseppe Narzisi
  - New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Wayne Clarke
  - New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Uday S. Evani
  - New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Charles Markello
  - University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
- Kishwar Shafin
  - University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
- Xin Zhou
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Arend Sidow
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Vikas Bansal
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Peter Ebert
  - Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Tobias Marschall
  - Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Peter Lansdorp
  - Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Vincent Hanlon
  - Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Carl-Adam Mattsson
  - Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Andrew Carroll
  - Google Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94040, USA
- Marc Salit
  - Joint Initiative for Metrology in Biology, SLAC National Laboratory, Stanford, CA, USA
- Justin M. Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
  - Corresponding author
66
Lewerentz J, Johansson AM, Larsson J, Stenberg P. Transposon activity, local duplications and propagation of structural variants across haplotypes drive the evolution of the Drosophila S2 cell line. BMC Genomics 2022; 23:276. [PMID: 35392795 PMCID: PMC8991648 DOI: 10.1186/s12864-022-08472-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Accepted: 03/15/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Immortalized cell lines are widely used model systems whose genomes are often highly rearranged and polyploid. However, their genome structure is seldom deciphered and is thus not accounted for during analyses. We therefore used linked short- and long-read sequencing to perform haplotype-level reconstruction of the genome of a Drosophila melanogaster cell line (S2-DRSC) with a complex genome structure. RESULTS: Using a custom implementation (designed to use ultra-long reads in complex genomes with nested rearrangements) to call structural variants (SVs), we found that the most common SV was repetitive sequence insertion or deletion (> 80% of SVs), with Gypsy retrotransposon insertions dominating. The second most common SV was local sequence duplication. SNPs and other SVs were rarer, but several large chromosomal translocations and mitochondrial genome insertions were observed. Haplotypes were highly similar at the nucleotide level but structurally very different. Insertion SVs existed at various haplotype frequencies and were unlinked on chromosomes, demonstrating that haplotypes have different structures and suggesting the existence of a mechanism that allows SVs to propagate across haplotypes. Finally, using public short-read data, we found that transposable element insertions and local duplications are common in other D. melanogaster cell lines. CONCLUSIONS: The S2-DRSC cell line evolved through retrotransposon activity and vast local sequence duplications that we hypothesize were the products of DNA re-replication events. Additionally, mutations can propagate across haplotypes (possibly explained by mitotic recombination), which enables fine-tuning of mutational impact and prevents accumulation of deleterious events, an inherent problem of clonal reproduction. We conclude that traditional linear homozygous genome representation conceals the complexity of rearranged and heterozygous clonal cells.
Affiliation(s)
- Jacob Lewerentz
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Anna-Mia Johansson
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Jan Larsson
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Per Stenberg
  - Department of Ecology and Environmental Sciences, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
67
Whole-genome sequencing of 1,171 elderly admixed individuals from São Paulo, Brazil. Nat Commun 2022; 13:1004. [PMID: 35246524 PMCID: PMC8897431 DOI: 10.1038/s41467-022-28648-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 09/30/2020] [Accepted: 01/21/2022] [Indexed: 02/07/2023] Open
Abstract
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly-inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS. Whole genome sequencing (WGS) data on non-European and admixed individuals remain scarce. Here, the authors analyse WGS data from 1,171 admixed elderly Brazilians from a census cohort, characterising population-specific genetic variation and exploring the clinical utility of this expanded dataset.
68
Chen J, Zhong J, He X, Li X, Ni P, Safner T, Šprem N, Han J. The de novo assembly of a European wild boar genome revealed unique patterns of chromosomal structural variations and segmental duplications. Anim Genet 2022; 53:281-292. [PMID: 35238061 PMCID: PMC9314987 DOI: 10.1111/age.13181] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2021] [Revised: 02/12/2022] [Accepted: 02/12/2022] [Indexed: 02/05/2023]
Abstract
The rapid progress of sequencing technology has greatly facilitated the de novo genome assembly of pig breeds. However, an assembly of the wild boar genome is still lacking, hampering our understanding of chromosomal and genomic evolution during domestication from wild boars into domestic pigs. Here, we sequenced and de novo assembled a European wild boar genome (ASM2165605v1) using the long-range information provided by 10× Linked-Reads sequencing. We achieved a high-quality assembly with a contig N50 of 26.09 Mb. Additionally, 1.64% of the contigs (222), with lengths from 107.65 kb to 75.36 Mb, covered 90.3% of the total genome size of ASM2165605v1 (~2.5 Gb). Mapping analysis revealed that the contigs can fill 24.73% (93/376) of the gaps present in the orthologous regions of the updated pig reference genome (Sscrofa11.1). We further scaffolded the contigs to chromosome level with a reference-assisted scaffolding method. Using the ‘assembly-to-assembly’ approach, we identified intra-chromosomal large structural variations (SVs, length >1 kb) between the ASM2165605v1 and Sscrofa11.1 assemblies. Interestingly, we found that the number of SV events on the X chromosome deviated significantly from the linear models fitting autosomes (R² > 0.64, p < 0.001). Specifically, deletions and insertions were deficient on the X chromosome by 66.14 and 58.41%, respectively, whereas duplications and inversions were excessive on the X chromosome by 71.96 and 107.61%, respectively. We further used large segmental duplication (SD, >1 kb) events as a proxy to understand large-scale inter-chromosomal evolution, by resolving parental-derived relationships for SD pairs. We revealed a significant excess of SD movements from the X chromosome to autosomes (p < 0.001), consistent with the expectation of meiotic sex chromosome inactivation. Enrichment analyses indicated that the genes within derived SD copies on autosomes were significantly related to biological processes involving the nervous system, lipid biosynthesis and sperm motility (p < 0.01). Together, our analyses of the de novo assembly of ASM2165605v1 provide insight into the SVs between European wild boar and domestic pig, as well as into the ongoing role of meiotic sex chromosome inactivation in driving inter-chromosomal interaction between the sex chromosome and autosomes.
Affiliation(s)
- Jianhai Chen
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Jie Zhong
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Xuefei He
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Xiaoyu Li
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Pan Ni
  - Animal Husbandry and Veterinary Institute of Keqiao District, Shaoxing, Zhejiang, China
- Toni Safner
  - Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
  - Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Zagreb, Croatia
- Nikica Šprem
  - Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
- Jianlin Han
  - International Livestock Research Institute, Nairobi, Kenya
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
69
Methods to Improve Molecular Diagnosis in Genomic Cold Cases in Pediatric Neurology. Genes (Basel) 2022; 13:333. [PMID: 35205378 PMCID: PMC8871714 DOI: 10.3390/genes13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023] Open
Abstract
During the last decade, genetic testing has emerged as an important etiological diagnostic tool for Mendelian diseases, including pediatric neurological conditions. A genetic diagnosis has a considerable impact on disease management and treatment; however, many cases remain undiagnosed after applying standard diagnostic sequencing techniques. This review discusses various methods to improve the molecular diagnostic rates in these genomic cold cases. We discuss extended analysis methods, including non-Mendelian inheritance models, mosaicism, dual/multiple diagnoses, periodic re-analysis, artificial intelligence tools, and deep phenotyping, in addition to integrating various omics methods to improve variant prioritization. Last, novel genomic technologies, including long-read sequencing, artificial long-read sequencing, and optical genome mapping, are discussed. In conclusion, a more comprehensive molecular analysis and a timely re-analysis of unsolved cases are imperative to improve diagnostic rates. In addition, our current understanding of the human genome is still limited due to restrictions in technologies. Novel technologies are now available that improve upon some of these limitations and can capture human genomic variation more accurately. Last, we recommend a more routine implementation of high molecular weight DNA extraction methods, so that these novel genomic methods can be used to full benefit.
70
Rodriguez S, Celay J, Goicoechea I, Jimenez C, Botta C, Garcia-Barchino MJ, Garces JJ, Larrayoz M, Santos S, Alignani D, Vilas-Zornoza A, Perez C, Garate S, Sarvide S, Lopez A, Reinhardt HC, Carrasco YR, Sanchez-Garcia I, Larrayoz MJ, Calasanz MJ, Panizo C, Prosper F, Lamo-Espinosa JM, Motta M, Tucci A, Sacco A, Gentile M, Duarte S, Vitoria H, Geraldes C, Paiva A, Puig N, Garcia-Sanz R, Roccaro AM, Fuerte G, San Miguel JF, Martinez-Climent JA, Paiva B. Preneoplastic somatic mutations including MYD88L265P in lymphoplasmacytic lymphoma. SCIENCE ADVANCES 2022; 8:eabl4644. [PMID: 35044826 PMCID: PMC8769557 DOI: 10.1126/sciadv.abl4644] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 05/07/2023]
Abstract
Normal cell counterparts of solid and myeloid tumors accumulate mutations years before disease onset; whether this occurs in B lymphocytes before lymphoma remains uncertain. We sequenced multiple stages of the B lineage in elderly individuals and patients with lymphoplasmacytic lymphoma, a singular disease for studying lymphomagenesis because of the high prevalence of mutated MYD88. We observed similar accumulation of random mutations in B lineages from both cohorts and unexpectedly found MYD88L265P in normal precursor and mature B lymphocytes from patients with lymphoma. We uncovered genetic and transcriptional pathways driving malignant transformation and leveraged these to model lymphoplasmacytic lymphoma in mice, based on mutated MYD88 in B cell precursors and BCL2 overexpression. Thus, MYD88L265P is a preneoplastic event, which challenges the current understanding of lymphomagenesis and may have implications for early detection of B cell lymphomas.
Affiliation(s)
- Sara Rodriguez
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Jon Celay
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Ibai Goicoechea
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Cristina Jimenez
  - Hospital Universitario de Salamanca, Instituto de Investigacion Biomedica de Salamanca (IBSAL), Centro de Investigación del Cancer (IBMCC-USAL, CSIC), CIBER-ONC, Salamanca, Spain
- Cirino Botta
  - Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
- Maria-José Garcia-Barchino
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Juan-Jose Garces
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Marta Larrayoz
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Susana Santos
  - Centro Hospitalar e Universitario de Coimbra, Coimbra, Portugal
- Diego Alignani
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Amaia Vilas-Zornoza
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Cristina Perez
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Sonia Garate
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Sarai Sarvide
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Aitziber Lopez
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Hans-Christian Reinhardt
  - Department of Hematology and Stem Cell Transplantation, West German Cancer Center, DKTK Partner Site Essen, Center for Molecular Biotechnology, University Hospital Essen, Hufelandstr. 55, 45147, Essen, Germany
- Yolanda R. Carrasco
  - Department of Immunology and Oncology, Centro Nacional de Biotecnología (CNB)–CSIC, Madrid, Spain
- Isidro Sanchez-Garcia
  - Experimental Therapeutics and Translational Oncology Program, Instituto de Biología Molecular y Celular del Cáncer, CSIC/Universidad de Salamanca and Institute of Biomedical Research of Salamanca (IBSAL), Salamanca, Spain
- Maria-Jose Larrayoz
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Maria-Jose Calasanz
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Carlos Panizo
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Felipe Prosper
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Jose-Maria Lamo-Espinosa
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Marina Motta
  - Department of Hematology, ASST Spedali Civili di Brescia, Brescia, Italy
- Alessandra Tucci
  - Department of Hematology, ASST Spedali Civili di Brescia, Brescia, Italy
- Antonio Sacco
  - Clinical Research Development and Phase I Unit, ASST Spedali Civili di Brescia, Brescia, Italy
- Massimo Gentile
  - Department of Oncohematology, “Annunziata” Hospital, Cosenza, Italy
- Sara Duarte
  - Centro Hospitalar e Universitario de Coimbra, Coimbra, Portugal
- Artur Paiva
  - Centro Hospitalar e Universitario de Coimbra, Coimbra, Portugal
- Noemi Puig
  - Hospital Universitario de Salamanca, Instituto de Investigacion Biomedica de Salamanca (IBSAL), Centro de Investigación del Cancer (IBMCC-USAL, CSIC), CIBER-ONC, Salamanca, Spain
- Ramon Garcia-Sanz
  - Hospital Universitario de Salamanca, Instituto de Investigacion Biomedica de Salamanca (IBSAL), Centro de Investigación del Cancer (IBMCC-USAL, CSIC), CIBER-ONC, Salamanca, Spain
- Aldo M. Roccaro
  - Clinical Research Development and Phase I Unit, ASST Spedali Civili di Brescia, Brescia, Italy
- Jesus F. San Miguel
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
- Jose-Angel Martinez-Climent
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
  - Corresponding author
- Bruno Paiva
  - Clinica Universidad de Navarra, Centro de Investigacion Medica Aplicada (CIMA), Instituto de Investigacion Sanitaria de Navarra (IDISNA), CIBER-ONC, Pamplona, Spain
  - Corresponding author
|
71
|
Hu G, Feng J, Xiang X, Wang J, Salojärvi J, Liu C, Wu Z, Zhang J, Liang X, Jiang Z, Liu W, Ou L, Li J, Fan G, Mai Y, Chen C, Zhang X, Zheng J, Zhang Y, Peng H, Yao L, Wai CM, Luo X, Fu J, Tang H, Lan T, Lai B, Sun J, Wei Y, Li H, Chen J, Huang X, Yan Q, Liu X, McHale LK, Rolling W, Guyot R, Sankoff D, Zheng C, Albert VA, Ming R, Chen H, Xia R, Li J. Two divergent haplotypes from a highly heterozygous lychee genome suggest independent domestication events for early and late-maturing cultivars. Nat Genet 2022; 54:73-83. [PMID: 34980919 PMCID: PMC8755541 DOI: 10.1038/s41588-021-00971-3] [Citation(s) in RCA: 63] [Impact Index Per Article: 31.5] [Received: 05/20/2020] [Accepted: 10/19/2021] [Indexed: 01/25/2023]
Abstract
Lychee is an exotic tropical fruit with a distinct flavor. The genome of cultivar ‘Feizixiao’ was assembled into 15 pseudochromosomes, totaling ~470 Mb. High heterozygosity (2.27%) resulted in two complete haplotypic assemblies. A total of 13,517 allelic genes (42.4%) were differentially expressed in diverse tissues. Analyses of 72 resequenced lychee accessions revealed two independent domestication events. The extremely early maturing cultivars preferentially aligned to one haplotype were domesticated from a wild population in Yunnan, whereas the late-maturing cultivars that mapped mostly to the second haplotype were domesticated independently from a wild population in Hainan. Early maturing cultivars were probably developed in Guangdong via hybridization between extremely early maturing cultivar and late-maturing cultivar individuals. Variable deletions of a 3.7 kb region encompassed by a pair of CONSTANS-like genes probably regulate fruit maturation differences among lychee cultivars. These genomic resources provide insights into the natural history of lychee domestication and will accelerate the improvement of lychee and related crops. Two divergent haplotypes from a highly heterozygous lychee genome of the cultivar ‘Feizixiao’ and resequencing of 72 lychee accessions provide insights into the genome evolution and domestication history of lychee.
Affiliation(s)
- Guibing Hu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Junting Feng
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Xu Xiang
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research, Guangzhou, China
| | - Jiabao Wang
- Danzhou Scientific Observing and Experimental Station of Agro-Environment, Ministry of Agriculture and Rural Affairs, Environment and Plant Protection Institute, Chinese Academy of Tropical Agriculture Sciences, Haikou, China
| | - Jarkko Salojärvi
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Chengming Liu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Zhenxian Wu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Jisen Zhang
- Center for Genomics and Biotechnology, Haixia Institute of Science and Technology Fujian Agriculture and Forestry University, Fuzhou, China
| | | | - Zide Jiang
- Guangdong Key Laboratory of Microbial Signals and Disease Control, College of Plant Protection, South China Agricultural University, Guangzhou, China
| | - Wei Liu
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research, Guangzhou, China
| | - Liangxi Ou
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research, Guangzhou, China
| | - Jiawei Li
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | | | - Yingxiao Mai
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Chengjie Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Xingtan Zhang
- Center for Genomics and Biotechnology, Haixia Institute of Science and Technology Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiakun Zheng
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Yanqing Zhang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Hongxiang Peng
- Horticultural Research Institute, Guangxi Academy of Agricultural Sciences, Nanning, China
| | - Lixian Yao
- College of Natural Resources and Environment, South China Agricultural University, Guangzhou, China
| | - Ching Man Wai
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Xinping Luo
- Institute of Tropical and Subtropical Cash Crops, Yunnan Academy of Agricultural Sciences, Baoshan, China
| | - Jiaxin Fu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Haibao Tang
- Center for Genomics and Biotechnology, Haixia Institute of Science and Technology Fujian Agriculture and Forestry University, Fuzhou, China
| | - Tianying Lan
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA
| | - Biao Lai
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Jinhua Sun
- Danzhou Scientific Observing and Experimental Station of Agro-Environment, Ministry of Agriculture and Rural Affairs, Environment and Plant Protection Institute, Chinese Academy of Tropical Agriculture Sciences, Haikou, China
| | - Yongzan Wei
- Key Laboratory for Tropical Fruit Biology of Ministry of Agriculture and Rural Affair, South Subtropical Crops Research Institute, Chinese Academy of Tropical Agriculture Sciences, Zhanjiang, China
| | - Huanling Li
- Danzhou Scientific Observing and Experimental Station of Agro-Environment, Ministry of Agriculture and Rural Affairs, Environment and Plant Protection Institute, Chinese Academy of Tropical Agriculture Sciences, Haikou, China
| | - Jiezhen Chen
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research, Guangzhou, China
| | - Xuming Huang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Qian Yan
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research, Guangzhou, China
| | - Xin Liu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Leah K McHale
- Department of Horticulture and Crop Sciences and Center for Applied Plant Sciences, The Ohio State University, Columbus, OH, USA
| | - William Rolling
- Center for Applied Plant Sciences, The Ohio State University, Columbus, OH, USA
| | | | - David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - Chunfang Zheng
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - Victor A Albert
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA.
| | - Ray Ming
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| | - Houbin Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China.
| | - Rui Xia
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China.
| | - Jianguo Li
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, Guangdong Litchi Engineering Research Center, College of Horticulture, South China Agricultural University, Guangzhou, China.
|
72
|
Barrozo ER, Aagaard KM. Human placental biology at single-cell resolution: a contemporaneous review. BJOG 2022; 129:208-220. [PMID: 34651399 PMCID: PMC8688323 DOI: 10.1111/1471-0528.16970] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/07/2021] [Revised: 09/23/2021] [Accepted: 10/05/2021] [Indexed: 01/03/2023]
Abstract
Single-cell technologies capture cellular heterogeneity to focus on previously poorly described subpopulations of cells. Work by our laboratory and many others has metagenomically characterised a low biomass intrauterine microbial community, alongside microbial transcripts, antigens and metabolites, but the functional importance of low biomass microbial communities in placental immuno-microenvironments is still being elucidated. Given their hypothesised role in modulating inflammation and immune ontogeny to enable tolerance of beneficial microbes while warding off pathogens, there is a need for single-cell resolution. Herein, we summarise the potential for mechanistic understanding of these and other key fundamental early developmental processes by applying single-cell approaches.
Affiliation(s)
- Enrico R. Barrozo
- Division of Maternal-Fetal Medicine, Department of Obstetrics & Gynecology, Baylor College of Medicine & Texas Children’s Hospital, Houston, TX, USA
| | - Kjersti M. Aagaard
- Division of Maternal-Fetal Medicine, Department of Obstetrics & Gynecology, Baylor College of Medicine & Texas Children’s Hospital, Houston, TX, USA
|
73
|
Prunier J, Carrier A, Gilbert I, Poisson W, Albert V, Taillon J, Bourret V, Côté SD, Droit A, Robert C. CNVs with adaptive potential in Rangifer tarandus: genome architecture and new annotated assembly. Life Sci Alliance 2021; 5:5/3/e202101207. [PMID: 34911809 PMCID: PMC8711850 DOI: 10.26508/lsa.202101207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/23/2021] [Revised: 11/29/2021] [Accepted: 11/29/2021] [Indexed: 01/13/2023] Open
Abstract
Rangifer tarandus has experienced recent drastic population size reductions throughout its circumpolar distribution and preserving the species implies genetic diversity conservation. To facilitate genomic studies of the species populations, we improved the genome assembly by combining long read and linked read and obtained a new highly accurate and contiguous genome assembly made of 13,994 scaffolds (L90 = 131 scaffolds). Using de novo transcriptome assembly of RNA-sequencing reads and similarity with annotated human gene sequences, 17,394 robust gene models were identified. As copy number variations (CNVs) likely play a role in adaptation, we additionally investigated these variations among 20 genomes representing three caribou ecotypes (migratory, boreal and mountain). A total of 1,698 large CNVs (length > 1 kb) showing a genome distribution including hotspots were identified. 43 large CNVs were particularly distinctive of the migratory and sedentary ecotypes and included genes annotated for functions likely related to the expected adaptations. This work includes the first publicly available annotation of the caribou genome and the first assembly allowing genome architecture analyses, including the likely adaptive CNVs reported here.
Affiliation(s)
- Julien Prunier
- Département de Médecine Moléculaire, Faculté de Médecine, Université Laval, Quebec City, Canada
| | - Alexandra Carrier
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
| | - Isabelle Gilbert
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
| | - William Poisson
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
| | - Vicky Albert
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
| | - Joëlle Taillon
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
| | - Vincent Bourret
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
| | - Steeve D Côté
- Caribou Ungava, département de biologie, Faculté des Sciences et de Génie, Université Laval, Quebec City, Canada
| | - Arnaud Droit
- Département de Médecine Moléculaire, Faculté de Médecine, Université Laval, Quebec City, Canada
| | - Claude Robert
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
|
74
|
Peel E, Silver L, Brandies P, Hogg CJ, Belov K. A reference genome for the critically endangered woylie, Bettongia penicillata ogilbyi. GIGABYTE 2021; 2021:gigabyte35. [PMID: 36824341 PMCID: PMC9650285 DOI: 10.46471/gigabyte.35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/15/2021] [Accepted: 12/08/2021] [Indexed: 11/09/2022] Open
Abstract
Biodiversity is declining globally, and Australia has one of the worst extinction records for mammals. The development of sequencing technologies means that genomic approaches are now available as important tools for wildlife conservation and management. Despite this, genome sequences are available for only 5% of threatened Australian species. Here we report the first reference genome for the woylie (Bettongia penicillata ogilbyi), a critically endangered marsupial from Western Australia, and the first genome within the Potoroidae family. The woylie reference genome was generated using Pacific Biosciences HiFi long-reads, resulting in a 3.39 Gbp assembly with a scaffold N50 of 6.49 Mbp and 86.5% complete mammalian BUSCOs. Assembly of a global transcriptome from pouch skin, tongue, heart and blood RNA-seq reads was used to guide annotation with Fgenesh++, resulting in the annotation of 24,655 genes. The woylie reference genome is a valuable resource for conservation, management and investigations into disease-induced decline of this critically endangered marsupial.
Affiliation(s)
- Emma Peel
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Luke Silver
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Parice Brandies
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Carolyn J. Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
|
75
|
Tarabichi M, Demeulemeester J, Verfaillie A, Flanagan AM, Van Loo P, Konopka T. A pan-cancer landscape of somatic mutations in non-unique regions of the human genome. Nat Biotechnol 2021; 39:1589-1596. [PMID: 34282324 PMCID: PMC7612106 DOI: 10.1038/s41587-021-00971-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/16/2020] [Accepted: 06/02/2021] [Indexed: 12/27/2022]
Abstract
A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. Here we annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort with links to similar sites across the human genome. We train a machine learning model to use signals distributed over multiple genomic sites to call somatic events in non-unique regions and validate the data against linked-read sequencing in an independent dataset. Using this approach, we uncover previously hidden mutations in ~1,700 coding sequences and in thousands of regulatory elements, including in known cancer genes, immunoglobulins and highly mutated gene families. Mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation burden and substitution profiles. The analysis provides a systematic summary of the mutation events in non-unique regions at a genome-wide scale across multiple human cancers.
Affiliation(s)
- Maxime Tarabichi
- The Francis Crick Institute, London, UK.
- Institute for Interdisciplinary Research, Université Libre de Bruxelles, Brussels, Belgium.
| | - Jonas Demeulemeester
- The Francis Crick Institute, London, UK
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | | | - Adrienne M Flanagan
- Research Department of Pathology, Cancer Institute, University College London, London, UK
- Department of Cellular and Molecular Pathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | | | - Tomasz Konopka
- The Francis Crick Institute, London, UK.
- William Harvey Research Institute, Queen Mary University of London, London, UK.
|
76
|
Righini M, Costa J, Zhou W. DNA bridges: A novel platform for single-molecule sequencing and other DNA-protein interaction applications. PLoS One 2021; 16:e0260428. [PMID: 34807931 PMCID: PMC8608331 DOI: 10.1371/journal.pone.0260428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Accepted: 11/10/2021] [Indexed: 01/22/2023] Open
Abstract
DNA molecular combing is a technique that stretches thousands of long individual DNA molecules (up to 10 Mbp) into a parallel configuration on a surface. It has previously been proposed to sequence these molecules by synthesis. However, this approach poses two critical challenges: (1) combed DNA molecules are overstretched and therefore a nonoptimal substrate for polymerase extension, and (2) the combing surface sterically impedes full enzymatic access to the DNA backbone. Here, we introduce a novel approach that attaches thousands of molecules to a removable surface, with a tunable stretching factor. Next, we dissolve portions of the surface, leaving the DNA molecules suspended as 'bridges'. We demonstrate that the suspended molecules are enzymatically accessible, and we have used an enzyme to incorporate labeled nucleotides, as predicted by the specific molecular sequence. Our results suggest that this novel platform is a promising candidate to achieve high-throughput sequencing of Mbp-long molecules, which could have additional genomic applications, such as the study of other protein-DNA interactions.
Affiliation(s)
- Maurizio Righini
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
| | - Justin Costa
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
| | - Wei Zhou
- Department of Advanced Research and Development, Centrillion Technologies, Palo Alto, California, United States of America
|
77
|
Chen C, Chen M, Zhu Y, Jiang L, Li J, Wang Y, Lu Z, Guo F, Wang H, Peng Z, Yang Y, Sun J. Noninvasive prenatal diagnosis of monogenic disorders based on direct haplotype phasing through targeted linked-read sequencing. BMC Med Genomics 2021; 14:244. [PMID: 34627256 PMCID: PMC8502361 DOI: 10.1186/s12920-021-01091-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2020] [Accepted: 09/22/2021] [Indexed: 11/30/2022] Open
Abstract
Background: Though massively parallel sequencing has been widely applied to noninvasive prenatal screen for common trisomy, the clinical use of massively parallel sequencing to noninvasive prenatal diagnose monogenic disorders is limited. This study was to develop a method for directly determining paternal haplotypes for noninvasive prenatal diagnosis of monogenic disorders without requiring proband’s samples. Methods: The study recruited 40 families at high risk for autosomal recessive diseases. The targeted linked-read sequencing was performed on high molecular weight (HMW) DNA of parents using customized probes designed to capture targeted genes and single-nucleotide polymorphisms (SNPs) distributed within 1 Mb flanking region of targeted genes. Plasma DNA from pregnant mothers also underwent targeted sequencing using the same probes to determine fetal haplotypes according to parental haplotypes. The results were further confirmed by invasive prenatal diagnosis. Results: Seventy-eight parental haplotypes of targeted gene were successfully determined by targeted linked-read sequencing. The predicted fetal inheritance of variant was correctly deduced in 38 families in which the variants had been confirmed by invasive prenatal diagnosis. Two families were determined to be no-call. Conclusions: Targeted linked-read sequencing method demonstrated to be an effective means to phase personal haplotype for noninvasive prenatal diagnosis of monogenic disorders. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-021-01091-x.
Affiliation(s)
- Chao Chen
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Min Chen
- Department of Fetal Medicine and Prenatal Diagnosis, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510150, China
| | - Yaping Zhu
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Lu Jiang
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Jia Li
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Yaoshen Wang
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Zhe Lu
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Fengyu Guo
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
| | - Hairong Wang
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- BGI-Wuhan Clinical Laboratories, BGI-Shenzhen, Wuhan, 430074, China
| | - Zhiyu Peng
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Yun Yang
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China.
- BGI-Wuhan Clinical Laboratories, BGI-Shenzhen, Wuhan, 430074, China.
- Department of Obstetrics and Gynecology, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China.
| | - Jun Sun
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China.
- Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China.
|
78
|
Shieh JT, Penon-Portmann M, Wong KHY, Levy-Sakin M, Verghese M, Slavotinek A, Gallagher RC, Mendelsohn BA, Tenney J, Beleford D, Perry H, Chow SK, Sharo AG, Brenner SE, Qi Z, Yu J, Klein OD, Martin D, Kwok PY, Boffelli D. Application of full-genome analysis to diagnose rare monogenic disorders. NPJ Genom Med 2021; 6:77. [PMID: 34556655 PMCID: PMC8460793 DOI: 10.1038/s41525-021-00241-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/29/2020] [Accepted: 10/21/2020] [Indexed: 11/30/2022] Open
Abstract
Current genetic tests for rare diseases provide a diagnosis in only a modest proportion of cases. The Full-Genome Analysis method, FGA, combines long-range assembly and whole-genome sequencing to detect small variants, structural variants with breakpoint resolution, and phasing. We built a variant prioritization pipeline and tested FGA’s utility for diagnosis of rare diseases in a clinical setting. FGA identified structural variants and small variants with an overall diagnostic yield of 40% (20 of 50 cases) and 35% in exome-negative cases (8 of 23 cases); 4 of these were structural variants. FGA detected and mapped structural variants that are missed by short reads, including non-coding duplication, and phased variants across long distances of more than 180 kb. With the prioritization algorithm, longer DNA technologies could replace multiple tests for monogenic disorders and expand the range of variants detected. Our study suggests that genomes produced from technologies like FGA can improve variant detection and provide higher resolution genome maps for future application.
Affiliation(s)
- Joseph T Shieh
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA.
| | - Monica Penon-Portmann
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Karen H Y Wong
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Michal Levy-Sakin
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Michelle Verghese
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Anne Slavotinek
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Renata C Gallagher
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Bryce A Mendelsohn
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Jessica Tenney
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Daniah Beleford
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Hazel Perry
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Stephen K Chow
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Andrew G Sharo
- Biophysics Graduate Group, University of California Berkeley, Berkeley, CA, USA
| | - Steven E Brenner
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Zhongxia Qi
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Jingwei Yu
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Ophir D Klein
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA.,Craniofacial Biology and Department of Orofacial Sciences, University of California San Francisco, San Francisco, CA, USA
| | - David Martin
- Children's Hospital Oakland Research Institute, Benioff Children's Hospital Oakland, University of California San Francisco, Oakland, CA, USA
| | - Pui-Yan Kwok
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA.,Department of Dermatology, University of California San Francisco, San Francisco, CA, USA
| | - Dario Boffelli
- Children's Hospital Oakland Research Institute, Benioff Children's Hospital Oakland, University of California San Francisco, Oakland, CA, USA
|
79
|
Sætre CLC, Eroukhmanoff F, Rönkä K, Kluen E, Thorogood R, Torrance J, Tracey A, Chow W, Pelan S, Howe K, Jakobsen KS, Tørresen OK. A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus). Genome Biol Evol 2021; 13:6367782. [PMID: 34499122 PMCID: PMC8459166 DOI: 10.1093/gbe/evab212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/06/2021] [Indexed: 11/13/2022] Open
Abstract
The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.
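The scaffold and contig N50 values quoted in this abstract are contiguity statistics: the N50 is the length L such that sequences of length L or longer together hold at least half of the assembled bases. A minimal sketch of that calculation follows, assuming this common definition; the function name `n50` and the toy lengths are illustrative and are not taken from the reed warbler assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running total first reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic depends only on the length distribution, it describes contiguity and says nothing about base-level accuracy or gene completeness, which is why the abstract reports BUSCO scores alongside it.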
Affiliation(s)
| | | | - Katja Rönkä
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland.,Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Edward Kluen
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland.,Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Rose Thorogood
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland.,Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Sarah Pelan
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
| | - Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
|
80
|
Stuckert AMM, Chouteau M, McClure M, LaPolice TM, Linderoth T, Nielsen R, Summers K, MacManes MD. The genomics of mimicry: Gene expression throughout development provides insights into convergent and divergent phenotypes in a Müllerian mimicry system. Mol Ecol 2021; 30:4039-4061. [PMID: 34145931 PMCID: PMC8457190 DOI: 10.1111/mec.16024] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/20/2020] [Revised: 05/13/2021] [Accepted: 05/26/2021] [Indexed: 12/12/2022]
Abstract
A common goal in evolutionary biology is to discern the mechanisms that produce the astounding diversity of morphologies seen across the tree of life. Aposematic species, those with a conspicuous phenotype coupled with some form of defence, are excellent models to understand the link between vivid colour pattern variations, the natural selection shaping it, and the underlying genetic mechanisms underpinning this variation. Mimicry systems in which multiple species share the same conspicuous phenotype can provide an even better model for understanding the mechanisms of colour production in aposematic species, especially if comimics have divergent evolutionary histories. Here we investigate the genetic mechanisms by which vivid colour and pattern are produced in a Müllerian mimicry complex of poison frogs. We did this by first assembling a high-quality de novo genome assembly for the mimic poison frog Ranitomeya imitator. This assembled genome is 6.8 Gbp in size, with a contig N50 of 300 Kbp R. imitator and two colour morphs from both Ranitomeya fantastica and R. variabilis which R. imitator mimics. We identified a large number of pigmentation and patterning genes that are differentially expressed throughout development, many of them related to melanocyte development, melanin synthesis, iridophore development and guanine synthesis. Polytypic differences within species may be the result of differences in expression and/or timing of expression, whereas convergence for colour pattern between species does not appear to be due to the same changes in gene expression. In addition, we identify the pteridine synthesis pathway (including genes such as qdpr and xdh) as a key driver of the variation in colour between morphs of these species. Finally, we hypothesize that genes in the keratin family are important for producing different structural colours within these frogs.
Affiliation(s)
- Adam M. M. Stuckert
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
| | - Mathieu Chouteau
- Laboratoire Écologie, Évolution, Interactions des Systèmes Amazoniens (LEEISA), Université de Guyane, CNRS, IFREMER, Cayenne, France
| | - Melanie McClure
- Laboratoire Écologie, Évolution, Interactions des Systèmes Amazoniens (LEEISA), Université de Guyane, CNRS, IFREMER, Cayenne, France
| | - Troy M. LaPolice
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
| | - Tyler Linderoth
- Department of Integrative Biology, University of California, Berkeley, California, USA
| | - Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, California, USA
| | - Kyle Summers
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
| | - Matthew D. MacManes
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
|
81
|
Musunuri R, Arora K, Corvelo A, Shah M, Shelton J, Zody MC, Narzisi G. Somatic variant analysis of linked-reads sequencing data with Lancet. Bioinformatics 2021; 37:1918-1919. [PMID: 33241313 DOI: 10.1093/bioinformatics/btaa888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2020] [Revised: 09/03/2020] [Accepted: 10/02/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure. AVAILABILITY AND IMPLEMENTATION Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rajeeva Musunuri
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Kanika Arora
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - André Corvelo
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Minita Shah
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Jennifer Shelton
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Michael C Zody
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Giuseppe Narzisi
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
|
82
|
Trost B, Loureiro LO, Scherer SW. Discovery of genomic variation across a generation. Hum Mol Genet 2021; 30:R174-R186. [PMID: 34296264 PMCID: PMC8490016 DOI: 10.1093/hmg/ddab209] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/28/2021] [Revised: 07/09/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022] Open
Abstract
Over the past 30 years (the timespan of a generation), advances in genomics technologies have revealed tremendous and unexpected variation in the human genome and have provided increasingly accurate answers to long-standing questions of how much genetic variation exists in human populations and to what degree the DNA complement changes between parents and offspring. Tracking the characteristics of these inherited and spontaneous (or de novo) variations has been the basis of the study of human genetic disease. From genome-wide microarray and next-generation sequencing scans, we now know that each human genome contains over 3 million single nucleotide variants when compared with the ~ 3 billion base pairs in the human reference genome, along with roughly an order of magnitude more DNA—approximately 30 megabase pairs (Mb)—being ‘structurally variable’, mostly in the form of indels and copy number changes. Additional large-scale variations include balanced inversions (average of 18 Mb) and complex, difficult-to-resolve alterations. Collectively, ~1% of an individual’s genome will differ from the human reference sequence. When comparing across a generation, fewer than 100 new genetic variants are typically detected in the euchromatic portion of a child’s genome. Driven by increasingly higher-resolution and higher-throughput sequencing technologies, newer and more accurate databases of genetic variation (for instance, more comprehensive structural variation data and phasing of combinations of variants along chromosomes) of worldwide populations will emerge to underpin the next era of discovery in human molecular genetics.
Affiliation(s)
- Brett Trost
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Livia O Loureiro
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Stephen W Scherer
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada.,McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
|
83
|
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Hyunji Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Yuxiang Zhang
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Won Jun Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jeremiah A Wala
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Thomas P Howard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yueh-Yun Chi
- Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
| | - Rameen Beroukhim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Heng Li
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gavin Ha
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Seth L Alper
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | | | - Elizabeth A Mullen
- Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - William C Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.
| | - Andrew L Hong
- Department of Pediatrics, Emory University, Atlanta, GA, USA.
- Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA.
|
84
|
Achakkagari SR, Tai HH, Davidson C, De Jong H, Strömvik MV. The complete mitogenome assemblies of ten diploid potato clones reveal recombination and overlapping variants. DNA Res 2021; 28:6319723. [PMID: 34254134 PMCID: PMC8386665 DOI: 10.1093/dnares/dsab009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Accepted: 07/07/2021] [Indexed: 01/30/2023] Open
Abstract
The potato mitogenome is complex, and to understand various biological functions and nuclear-cytoplasmic interactions, it is important to characterize its gene content and structure. In this study, the complete mitogenome sequences of nine diploid potato clones along with a diploid Solanum okadae clone were characterized. Each mitogenome was assembled and annotated from Pacific Biosciences (PacBio) long reads and 10X Genomics short reads. The results show that each mitogenome consists of multiple circular molecules with similar structure and gene organization, though two groups (clones 07506-01, DW84-1457, 08675-21, and H412-1 in one group, and clones W5281-2, 12625-02, 12120-03, and 11379-03 in another group) could be distinguished, and two mitogenomes (clone 10908-06 and OKA15) were not consistent with those or with each other. Significant differences in the repeat structure of the ten mitogenomes were found, as were recombination events leading to multiple sub-genomic circles. Comparison between individual molecules revealed a translocation of a ∼774 bp region located between a short repeat of 40 bp in molecule 3 of each mitogenome, and an insertion of the same in molecule 2 of the 10908-06 mitogenome. Finally, phylogenetic analyses revealed a close relationship between the mitogenomes of these clones and previously published potato mitogenomes.
Affiliation(s)
| | - Helen H Tai
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, Canada
| | - Charlotte Davidson
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, Canada
| | - Hielke De Jong
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, Canada
|
85
|
Seaby EG, Ennis S. Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies. Brief Funct Genomics 2021; 19:243-258. [PMID: 32393978 DOI: 10.1093/bfgp/elaa009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/13/2022] Open
Abstract
Next generation sequencing (NGS) has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last 5 years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates, and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient-based approach.
|
86
|
Barley AJ, Reeder TW, Nieto-Montes de Oca A, Cole CJ, Thomson RC. A New Diploid Parthenogenetic Whiptail Lizard from Sonora, Mexico, Is the "Missing Link" in the Evolutionary Transition to Polyploidy. Am Nat 2021; 198:295-309. [PMID: 34260872 DOI: 10.1086/715056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/03/2022]
Abstract
Transitions between sexual and unisexual reproductive modes have significant consequences for the evolutionary trajectories of species. These transitions have occurred numerous times in vertebrates and are frequently mediated by hybridization events. Triploid unisexual vertebrates are thought to arise through hybridization between individuals of a diploid unisexual lineage and a sexual species, although additional evidence that confirms this mechanism is needed in numerous groups. North American whiptail lizards (Aspidoscelis) are notable for being one of the largest radiations of unisexual vertebrates, and the most diverse group of Aspidoscelis includes numerous triploid lineages that have no known diploid unisexual ancestors. This pattern of "missing" ancestors may result from the short evolutionary life span of unisexual lineages or the selective advantages of polyploidy, or it could suggest that alternative mechanisms of triploid formation are operating in nature. We leverage genomic, morphological, and karyotypic data to describe a new diploid unisexual whiptail and show that it is likely the unisexual progenitor of an extant triploid lineage, A. opatae. We also resolve patterns of polyploidization within the A. sexlineatus species group and test predictions about the phenotypic outcomes of hybridization.
|
87
|
Schwarz JM, Lüpken R, Seelow D, Kehr B. Novel sequencing technologies and bioinformatic tools for deciphering the non-coding genome. MED GENET-BERLIN 2021; 33:133-145. [PMID: 38836034 PMCID: PMC11006320 DOI: 10.1515/medgen-2021-2072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 06/24/2021] [Indexed: 06/06/2024]
Abstract
High-throughput sequencing techniques have significantly increased the molecular diagnosis rate for patients with monogenic disorders. This is primarily due to a substantially increased identification rate of disease mutations in the coding sequence, primarily SNVs and indels. Further progress is hampered by difficulties in the detection of structural variants and the interpretation of variants outside the coding sequence. In this review, we provide an overview about how novel sequencing techniques and state-of-the-art algorithms can be used to discover small and structural variants across the whole genome and introduce bioinformatic tools for the prediction of effects variants may have in the non-coding part of the genome.
Affiliation(s)
- Jana Marie Schwarz
- Department of Neuropediatrics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- NeuroCure Cluster of Excellence, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
| | - Richard Lüpken
- BIH-Junior Research Group Genome Informatics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Dominik Seelow
- BIH-Bioinformatics and Translational Genetics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
| | - Birte Kehr
- BIH-Junior Research Group Genome Informatics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Algorithmic Bioinformatics, Regensburg Center for Interventional Immunology (RCI), Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- University Regensburg, Regensburg, Germany
|
88
|
Determination of complete chromosomal haplotypes by bulk DNA sequencing. Genome Biol 2021; 22:139. [PMID: 33957932 PMCID: PMC8101039 DOI: 10.1186/s13059-021-02330-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/03/2020] [Accepted: 03/25/2021] [Indexed: 01/02/2023] Open
Abstract
Haplotype phase represents the collective genetic variation between homologous chromosomes and is an essential feature of non-haploid genomes. Here we describe a computational strategy to reliably determine complete whole-chromosome haplotypes using a combination of bulk long-range sequencing and Hi-C sequencing. We demonstrate that this strategy can resolve the haplotypes of parental chromosomes in diploid human genomes with high precision (>99%) and completeness (>98%) and assemble the syntenic structure of rearranged chromosomes in aneuploid cancer genomes at base pair level resolution. Our work enables direct interrogation of chromosome-specific alterations and chromatin reorganization using bulk DNA sequencing.
|
89
|
Coimbra RTF, Winter S, Kumar V, Koepfli KP, Gooley RM, Dobrynin P, Fennessy J, Janke A. Whole-genome analysis of giraffe supports four distinct species. Curr Biol 2021; 31:2929-2938.e5. [PMID: 33957077 DOI: 10.1016/j.cub.2021.04.033] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 07/04/2020] [Revised: 01/06/2021] [Accepted: 04/14/2021] [Indexed: 12/24/2022]
Abstract
Species is the fundamental taxonomic unit in biology and its delimitation has implications for conservation. In giraffe (Giraffa spp.), multiple taxonomic classifications have been proposed since the early 1900s.1 However, one species with nine subspecies has been generally accepted,2 likely due to limited in-depth assessments, subspecies hybridizing in captivity,3,4 and anecdotal reports of hybrids in the wild.5 Giraffe taxonomy received new attention after population genetic studies using traditional genetic markers suggested at least four species.6,7 This view has been met with controversy,8 setting the stage for debate.9,10 Genomics is significantly enhancing our understanding of biodiversity and speciation relative to traditional genetic approaches and thus has important implications for species delineation and conservation.11 We present a high-quality de novo genome assembly of the critically endangered Kordofan giraffe (G. camelopardalis antiquorum)12 and a comprehensive whole-genome analysis of 50 giraffe representing all traditionally recognized subspecies. Population structure and phylogenomic analyses support four separately evolving giraffe lineages, which diverged 230-370 ka ago. These lineages underwent distinct demographic histories and show different levels of heterozygosity and inbreeding. Our results strengthen previous findings of limited gene flow and admixture among putative giraffe species6,7,9 and establish a genomic foundation for recognizing four species and seven subspecies, the latter of which should be considered as evolutionary significant units. Achieving a consensus over the number of species and subspecies in giraffe is essential for adequately assessing their threat level and will improve conservation efforts for these iconic taxa.
Affiliation(s)
- Raphael T F Coimbra
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt am Main, Germany; Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße 13, 60438 Frankfurt am Main, Germany.
| | - Sven Winter
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt am Main, Germany; Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße 13, 60438 Frankfurt am Main, Germany
| | - Vikas Kumar
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China; Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing 100044, China
| | - Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, Front Royal, VA, 22630, USA; Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Avenue NW, Washington, DC 20008, USA
| | - Rebecca M Gooley
- Smithsonian-Mason School of Conservation, Front Royal, VA, 22630, USA; Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Avenue NW, Washington, DC 20008, USA
| | - Pavel Dobrynin
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., Saint Petersburg 197101, Russia
| | - Julian Fennessy
- Giraffe Conservation Foundation, PO Box 86099, Eros, Windhoek, Namibia
| | - Axel Janke
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt am Main, Germany; Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße 13, 60438 Frankfurt am Main, Germany; LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt am Main, Germany.
| |
|
90
|
Sun H, Shen XR, Fang ZB, Jiang ZZ, Wei XJ, Wang ZY, Yu XF. Next-Generation Sequencing Technologies and Neurogenetic Diseases. Life (Basel) 2021; 11:life11040361. [PMID: 33921670 PMCID: PMC8072598 DOI: 10.3390/life11040361] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/25/2021] [Revised: 04/05/2021] [Accepted: 04/16/2021] [Indexed: 12/18/2022] Open
Abstract
Next-generation sequencing (NGS) technology has led to great advances in understanding the causes of Mendelian and complex neurological diseases. Owing to the complexity of genetic diseases, the genetic factors contributing to many rare and common neurological diseases remain poorly understood. Selecting the correct genetic test based on cost-effectiveness, coverage area, and sequencing range can improve diagnosis, treatments, and prevention. Whole-exome sequencing and whole-genome sequencing are suitable methods for finding new mutations, and gene panels are suitable for exploring the roles of specific genes in neurogenetic diseases. Here, we provide an overview of the classifications, applications, advantages, and limitations of NGS in research on neurological diseases. We further provide examples of NGS-based explorations and insights of the genetic causes of neurogenetic diseases, including Charcot-Marie-Tooth disease, spinocerebellar ataxias, epilepsy, and multiple sclerosis. In addition, we focus on issues related to NGS-based analyses, including interpretations of variants of uncertain significance, de novo mutations, congenital genetic diseases with complex phenotypes, and single-molecule real-time approaches.
Affiliation(s)
- Xue-Fan Yu
- Correspondence: Tel.: +86-157-5430-1836
| |
|
91
|
Lyu R, Tsui V, McCarthy DJ, Crismani W. Personalized genome structure via single gamete sequencing. Genome Biol 2021; 22:112. [PMID: 33874978 PMCID: PMC8054432 DOI: 10.1186/s13059-021-02327-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/11/2020] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
Genetic maps have been fundamental to building our understanding of disease genetics and evolutionary processes. The gametes of an individual contain all of the information required to perform a de novo chromosome-scale assembly of an individual's genome, which historically has been performed with populations and pedigrees. Here, we discuss how single-cell gamete sequencing offers the potential to merge the advantages of short-read sequencing with the ability to build personalized genetic maps and open up an entirely new space in personalized genetics.
Affiliation(s)
- Ruqian Lyu
- Bioinformatics and Cellular Genomics, St. Vincent's Institute of Medical Research, Melbourne, Australia
- Melbourne Integrative Genomics, Faculty of Science, The University of Melbourne, Melbourne, Australia
| | - Vanessa Tsui
- DNA Repair and Recombination Laboratory, St. Vincent's Institute of Medical Research, Melbourne, Australia
- The Faculty of Medicine, Dentistry and Health Science, The University of Melbourne, Melbourne, Australia
| | - Davis J McCarthy
- Bioinformatics and Cellular Genomics, St. Vincent's Institute of Medical Research, Melbourne, Australia.
- Melbourne Integrative Genomics, Faculty of Science, The University of Melbourne, Melbourne, Australia.
| | - Wayne Crismani
- DNA Repair and Recombination Laboratory, St. Vincent's Institute of Medical Research, Melbourne, Australia.
- The Faculty of Medicine, Dentistry and Health Science, The University of Melbourne, Melbourne, Australia.
| |
|
92
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
|
93
|
Wang Y, Bae T, Thorpe J, Sherman MA, Jones AG, Cho S, Daily K, Dou Y, Ganz J, Galor A, Lobon I, Pattni R, Rosenbluh C, Tomasi S, Tomasini L, Yang X, Zhou B, Akbarian S, Ball LL, Bizzotto S, Emery SB, Doan R, Fasching L, Jang Y, Juan D, Lizano E, Luquette LJ, Moldovan JB, Narurkar R, Oetjens MT, Rodin RE, Sekar S, Shin JH, Soriano E, Straub RE, Zhou W, Chess A, Gleeson JG, Marquès-Bonet T, Park PJ, Peters MA, Pevsner J, Walsh CA, Weinberger DR, Vaccarino FM, Moran JV, Urban AE, Kidd JM, Mills RE, Abyzov A. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol 2021; 22:92. [PMID: 33781308 PMCID: PMC8006362 DOI: 10.1186/s13059-021-02285-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 09/22/2020] [Accepted: 02/01/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants only are present in a small fraction of cells.
RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~0.005 to ~0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees.
CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
Affiliation(s)
- Yifan Wang
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Taejeong Bae
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Jeremy Thorpe
- Program in Biochemistry, Cellular and Molecular Biology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Maxwell A Sherman
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- MIT Department of Electrical Engineering and Computer Science, Cambridge, MA, USA
| | - Attila G Jones
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Sean Cho
- Department of Neurology, Kennedy Krieger Institute, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Present Address: Arcus Biosciences, Hayward, CA, 94545, USA
| | | | - Yanmei Dou
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Javier Ganz
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alon Galor
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Irene Lobon
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
- Department of Cell Biology, Physiology and Immunology, and Institute of Neurosciences, University of Barcelona, 08028, Barcelona, Spain
| | - Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Chaggai Rosenbluh
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Simone Tomasi
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Livia Tomasini
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Xiaoxu Yang
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Schahram Akbarian
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Laurel L Ball
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Sara Bizzotto
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Sarah B Emery
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Ryan Doan
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Liana Fasching
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Yeongjun Jang
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
| | - Esther Lizano
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
| | - Lovelace J Luquette
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John B Moldovan
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Rujuta Narurkar
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
| | - Matthew T Oetjens
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Rachel E Rodin
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Shobana Sekar
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Joo Heon Shin
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Eduardo Soriano
- Department of Cell Biology, Physiology and Immunology, and Institute of Neurosciences, University of Barcelona, 08028, Barcelona, Spain
- Vall d'Hebron Institut de Recerca, 08035, Barcelona, Spain
- Centro de Investigación en Red sobre Enfermedades Neurodegenerativas (CIBERNED), 28031, Madrid, Spain
- ICREA Academia, 08010 Barcelona, Spain
| | - Richard E Straub
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Andrew Chess
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technologies, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joseph G Gleeson
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Tomas Marquès-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), 08010, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Barcelona, Spain
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | | | - Jonathan Pevsner
- Department of Neurology, Kennedy Krieger Institute, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Christopher A Walsh
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Daniel R Weinberger
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Flora M Vaccarino
- Child Study Center, Yale University, New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University, New Haven, CT, 06520, USA
| | - John V Moran
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, CA, 94305, USA
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA.
| |
|
94
|
Charvet B, Pierquin J, Brunel J, Gorter R, Quétard C, Horvat B, Amor S, Portoukalian J, Perron H. Human Endogenous Retrovirus Type W Envelope from Multiple Sclerosis Demyelinating Lesions Shows Unique Solubility and Antigenic Characteristics. Virol Sin 2021; 36:1006-1026. [PMID: 33770381 PMCID: PMC8558138 DOI: 10.1007/s12250-021-00372-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/02/2021] [Accepted: 02/08/2021] [Indexed: 12/15/2022] Open
Abstract
In multiple sclerosis (MS), human endogenous retrovirus W family (HERV-W) envelope protein, pHERV-W ENV, limits remyelination and induces microglia-mediated neurodegeneration. To better understand its role, we examined the soluble pHERV-W antigen from MS brain lesions detected by specific antibodies. Physico-chemical and antigenic characteristics confirmed differences between pHERV-W ENV and syncytin-1. pHERV-W ENV monomers and trimers remained associated with membranes, while hexamers self-assembled from monomers into a soluble macrostructure involving sulfatides in MS brain. Extracellular hexamers are stabilized by internal hydrophobic bonds and external hydrophilic moieties. HERV-W studies in MS also suggest that this diffusible antigen may correspond to a previously described high-molecular-weight neurotoxic factor secreted by MS B-cells and thus represents a major agonist in MS pathogenesis. Adapted methods are now needed to identify the encoding HERV provirus(es) in the DNA of affected cells. The properties and origin of MS brain pHERV-W ENV soluble antigen will allow a better understanding of the role of HERVs in MS pathogenesis. The present results nevertheless pave the way to an accurate detection of the different forms of pHERV-W ENV antigen with appropriate conditions that remained unseen until now.
Affiliation(s)
- Benjamin Charvet
- GeNeuro Innovation, Lyon, 69008, France; CIRI, International Center for Infectiology Research, INSERM U1111, CNRS UMR5308, University of Lyon, ENS Lyon, France; Université Claude Bernard Lyon 1, Lyon, 69000, France
| | | | - Joanna Brunel
- GeNeuro Innovation, Lyon, 69008, France; CIRI, International Center for Infectiology Research, INSERM U1111, CNRS UMR5308, University of Lyon, ENS Lyon, France; Université Claude Bernard Lyon 1, Lyon, 69000, France
| | - Rianne Gorter
- Department of Pathology, Amsterdam UMC, Location VUMC, 1007 MB, Amsterdam, The Netherlands
| | | | - Branka Horvat
- CIRI, International Center for Infectiology Research, INSERM U1111, CNRS UMR5308, University of Lyon, ENS Lyon, France; Université Claude Bernard Lyon 1, Lyon, 69000, France
| | - Sandra Amor
- Department of Pathology, Amsterdam UMC, Location VUMC, 1007 MB, Amsterdam, The Netherlands; Centre for Neuroscience and Trauma, Blizard Institute, Barts and London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, UK
| | | | - Hervé Perron
- GeNeuro Innovation, Lyon, 69008, France; Université Claude Bernard Lyon 1, Lyon, 69000, France
| |
|
95
|
Guo L, Xu M, Wang W, Gu S, Zhao X, Chen F, Wang O, Xu X, Seim I, Fan G, Deng L, Liu X. SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics 2021; 22:158. [PMID: 33765921 PMCID: PMC7993450 DOI: 10.1186/s12859-021-04081-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 03/16/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly.
Results: In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard Similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349 fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled by next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder.
Conclusions: SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04081-z.
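As a rough illustration of the Jaccard-similarity step mentioned in this abstract, the sketch below scores pairs of contigs by the overlap of the barcode sets seen on their reads and keeps pairs above a cutoff as candidate scaffold-graph edges. This is not the SLR-superscaffolder implementation: the contig names, barcode labels, `scaffold_edges` helper, and the 0.2 threshold are all hypothetical, chosen only to show the idea.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def scaffold_edges(barcodes: dict, threshold: float = 0.2) -> list:
    """Return (contig_u, contig_v, similarity) for every pair at or above threshold.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    edges = []
    for u, v in combinations(sorted(barcodes), 2):
        score = jaccard(barcodes[u], barcodes[v])
        if score >= threshold:
            edges.append((u, v, score))
    return edges


# Hypothetical barcode sets observed on reads mapped to three contigs.
barcodes = {
    "contigA": {"bc1", "bc2", "bc3", "bc4"},
    "contigB": {"bc3", "bc4", "bc5"},
    "contigC": {"bc9"},
}

edges = scaffold_edges(barcodes)
print(edges)
```

Contigs that share many barcodes probably originate from the same long DNA fragments, so a high score suggests adjacency; ordering and orientation would need further evidence, such as the paired-end information the abstract describes.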
Affiliation(s)
- Lidong Guo
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China; BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Mengyang Xu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Wenchao Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
| | - Shengqiang Gu
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
| | - Xia Zhao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Fang Chen
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Ou Wang
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Inge Seim
- Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, 210046, China; School of Biology and Environmental Science, Queensland University of Technology, Brisbane, 4000, Australia
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Li Deng
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xin Liu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| |
|
96
|
Guo J, Shi C, Chen X, Wang O, Liu P, Yang H, Xu X, Zhang W, Zhu H. stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads. Front Genet 2021; 12:636239. [PMID: 33815469 PMCID: PMC8012683 DOI: 10.3389/fgene.2021.636239] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/01/2020] [Accepted: 02/04/2021] [Indexed: 11/13/2022] Open
Abstract
Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single base level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing by co-barcoded reads increases the signal to noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short read SV caller smoove for smaller variants with stLFRsv. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, and 74.5% precision and a 22.4% recall rate were obtained for deletions. stLFRsv revealed some large variants not included in the benchmark set that were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
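The gap-based breakpoint idea in this abstract can be sketched in a few lines: sort the mapped positions of reads that share one barcode and flag any pair of neighbours that lies farther apart than a single DNA fragment could plausibly span. This is a toy sketch, not the stLFRsv algorithm; the `candidate_breakpoints` helper, the read positions, and the 50 kbp cutoff are assumptions made for illustration.

```python
def candidate_breakpoints(positions, max_gap=50_000):
    """Return (left, right) position pairs where consecutive reads sharing a
    barcode are more than `max_gap` bases apart.

    `max_gap` is a hypothetical cutoff, not a value taken from the paper.
    """
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical mapped start positions of reads carrying a single barcode.
reads = [253_000, 1_000, 9_000, 4_500, 250_000]

gaps = candidate_breakpoints(reads)
print(gaps)
```

A real pipeline would additionally need support from multiple barcodes before calling a breakpoint, which corresponds to the barcode-sharing filter the abstract mentions.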
Affiliation(s)
- Junfu Guo
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Chang Shi
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Xi Chen
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Ou Wang
- BGI-Shenzhen, Shenzhen, China
| | - Ping Liu
- MGI, BGI-Shenzhen, Shenzhen, China
| | - Huanming Yang
- Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, China
| | - Xun Xu
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China
| | | | | |
|
97
|
Kumar A, Adhikari S, Kankainen M, Heckman CA. Comparison of Structural and Short Variants Detected by Linked-Read and Whole-Exome Sequencing in Multiple Myeloma. Cancers (Basel) 2021; 13:1212. [PMID: 33802025 PMCID: PMC7999337 DOI: 10.3390/cancers13061212] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/31/2020] [Revised: 03/07/2021] [Accepted: 03/08/2021] [Indexed: 02/07/2023] Open
Abstract
Linked-read sequencing was developed to aid the detection of large structural variants (SVs) from short-read sequencing efforts. We performed a systematic evaluation to determine if linked-read exome sequencing provides more comprehensive and clinically relevant information than whole-exome sequencing (WES) when applied to the same set of multiple myeloma patient samples. We report that linked-read sequencing detected a higher number of SVs (n = 18,455) than WES (n = 4065). However, linked-read predictions were dominated by inversions (92.4%), leading to poor detection of other types of SVs. In contrast, WES detected 56.3% deletions, 32.6% insertions, 6.7% translocations, 3.3% duplications and 1.2% inversions. Surprisingly, the quantitative performance assessment suggested a higher performance for WES (AUC = 0.791) compared to linked-read sequencing (AUC = 0.766) for detecting clinically validated cytogenetic alterations. We also found that linked-read sequencing detected more short variants (n = 704) compared to WES (n = 109). WES detected somatic mutations in all MM-related genes while linked-read sequencing failed to detect certain mutations. The comparison of somatic mutations detected using linked-read, WES and RNA-seq revealed that WES and RNA-seq detected more mutations than linked-read sequencing. These data indicate that WES outperforms and is more efficient than linked-read sequencing for detecting clinically relevant SVs and MM-specific short variants.
Affiliation(s)
- Ashwini Kumar
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Sadiksha Adhikari
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Matti Kankainen
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
- Medical and Clinical Genetics, University of Helsinki, Helsinki University Hospital, 00029 Helsinki, Finland
- Translational Immunology Research Program and Department of Clinical Chemistry, University of Helsinki, 00290 Helsinki, Finland
- Hematology Research Unit Helsinki, Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
| | - Caroline A. Heckman
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| |
|
98
|
Savarese M, Välipakka S, Johari M, Hackman P, Udd B. Is Gene-Size an Issue for the Diagnosis of Skeletal Muscle Disorders? J Neuromuscul Dis 2021; 7:203-216. [PMID: 32176652 PMCID: PMC7369045 DOI: 10.3233/jnd-190459] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Human genes have a variable length. Those having a coding sequence of extraordinary length and a high number of exons were almost impossible to sequence using the traditional Sanger-based gene-by-gene approach. High-throughput sequencing has partly overcome the size-related technical issues, enabling a straightforward, rapid and relatively inexpensive analysis of large genes. Several large genes (e.g. TTN, NEB, RYR1, DMD) are recognized as disease-causing in patients with skeletal muscle diseases. However, because of their sheer size, the clinical interpretation of variants in these genes is probably the most challenging aspect of the high-throughput genetic investigation in the field of skeletal muscle diseases. The main aim of this review is to discuss the technical and interpretative issues related to the diagnostic investigation of large genes and to reflect upon the current state of the art and the future advancements in the field.
Affiliation(s)
- Marco Savarese
  - Folkhälsan Research Center, Helsinki, Finland
  - Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Salla Välipakka
  - Folkhälsan Research Center, Helsinki, Finland
  - Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Mridul Johari
  - Folkhälsan Research Center, Helsinki, Finland
  - Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Peter Hackman
  - Folkhälsan Research Center, Helsinki, Finland
  - Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Bjarne Udd
  - Folkhälsan Research Center, Helsinki, Finland
  - Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
  - Neuromuscular Research Center, Tampere University and University Hospital, Tampere, Finland
  - Department of Neurology, Vaasa Central Hospital, Vaasa, Finland
|
99
|
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023]
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types and biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
Affiliation(s)
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
|
100
|
Zhou X, Zhang L, Weng Z, Dill DL, Sidow A. Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads. Nat Commun 2021; 12:1077. [PMID: 33597536 PMCID: PMC7889865 DOI: 10.1038/s41467-021-21395-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/09/2020] [Accepted: 01/20/2021] [Indexed: 01/19/2023]
Abstract
We introduce Aquila, a new approach to variant discovery in personal genomes, which is critical for uncovering the genetic contributions to health and disease. Aquila uses a reference sequence and linked-read data to generate a high quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. The contigs of the assemblies from our libraries cover >95% of the human reference genome, with over 98% of that in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased Variant Call Format (VCF) file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective approach that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
Affiliation(s)
- Xin Zhou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Lu Zhang
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Ziming Weng
  - Department of Pathology, Stanford University, Stanford, CA, USA
- David L Dill
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Arend Sidow
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
|