1
Zhou R, Jenkins JW, Zeng Y, Shu S, Jang H, Harding SA, Williams M, Plott C, Barry KW, Koriabine M, Amirebrahimi M, Talag J, Rajasekar S, Grimwood J, Schmitz RJ, Dawe RK, Schmutz J, Tsai CJ. Haplotype-resolved genome assembly of Populus tremula × P. alba reveals aspen-specific megabase satellite DNA. The Plant Journal: For Cell and Molecular Biology 2023; 116:1003-1017. [PMID: 37675609 DOI: 10.1111/tpj.16454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 08/23/2023] [Accepted: 08/25/2023] [Indexed: 09/08/2023]
Abstract
Populus species play a foundational role in diverse ecosystems and are important renewable feedstocks for bioenergy and bioproducts. Hybrid aspen Populus tremula × P. alba INRA 717-1B4 is a widely used transformation model in tree functional genomics and biotechnology research. As an outcrossing interspecific hybrid, its genome is riddled with sequence polymorphisms which present a challenge for sequence-sensitive analyses. Here we report a telomere-to-telomere genome for this hybrid aspen with two chromosome-scale, haplotype-resolved assemblies. We performed a comprehensive analysis of the repetitive landscape and identified both tandem repeat array-based and array-less centromeres. Unexpectedly, the most abundant satellite repeats in both haplotypes lie outside of the centromeres, consist of a 147 bp monomer PtaM147, frequently span >1 megabases, and form heterochromatic knobs. PtaM147 repeats are detected exclusively in aspens (section Populus) but PtaM147-like sequences occur in LTR-retrotransposons of closely related species, suggesting their origin from the retrotransposons. The genomic resource generated for this transformation model genotype has greatly improved the design and analysis of genome editing experiments that are highly sensitive to sequence polymorphisms. The work should motivate future hypothesis-driven research to probe into the function of the abundant and aspen-specific PtaM147 satellite DNA.
Affiliation(s)
- Ran Zhou
  - School of Forestry and Natural Resources, University of Georgia, Athens, Georgia, USA
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
  - Department of Plant Biology, University of Georgia, Athens, Georgia, USA
- Jerry W Jenkins
  - HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, USA
- Yibing Zeng
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
- Shengqiang Shu
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Hosung Jang
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
- Scott A Harding
  - School of Forestry and Natural Resources, University of Georgia, Athens, Georgia, USA
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
  - Department of Plant Biology, University of Georgia, Athens, Georgia, USA
- Melissa Williams
  - HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, USA
- Kerrie W Barry
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Maxim Koriabine
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Mojgan Amirebrahimi
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Jayson Talag
  - Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, Arizona, USA
- Shanmugam Rajasekar
  - Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, Arizona, USA
- Jane Grimwood
  - HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, USA
- Robert J Schmitz
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
- R Kelly Dawe
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
  - Department of Plant Biology, University of Georgia, Athens, Georgia, USA
- Jeremy Schmutz
  - HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, USA
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Chung-Jui Tsai
  - School of Forestry and Natural Resources, University of Georgia, Athens, Georgia, USA
  - Department of Genetics, University of Georgia, Athens, Georgia, USA
  - Department of Plant Biology, University of Georgia, Athens, Georgia, USA
2
Lainscsek X, Taher L. Predicting chromosomal compartments directly from the nucleotide sequence with DNA-DDA. Brief Bioinform 2023; 24:bbad198. [PMID: 37264486 PMCID: PMC10359093 DOI: 10.1093/bib/bbad198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 04/18/2023] [Accepted: 05/08/2023] [Indexed: 06/03/2023] Open
Abstract
Three-dimensional (3D) genome architecture is characterized by multi-scale patterns and plays an essential role in gene regulation. Chromatin conformation capture experiments have revealed many properties underlying 3D genome architecture, such as the compartmentalization of chromatin based on transcriptional states. However, they are complex, costly and time-consuming, and therefore only a limited number of cell types have been examined using these techniques. Increasing effort is being directed towards deriving computational methods that can predict chromatin conformation and associated structures. Here we present DNA-delay differential analysis (DDA), a purely sequence-based method based on chaos theory to predict genome-wide A and B compartments. We show that DNA-DDA models derived from a 20 Mb sequence are sufficient to predict genome-wide compartmentalization at the scale of 100 kb in four different cell types. Although this is a proof-of-concept study, our method shows promise in elucidating the mechanisms responsible for genome folding as well as modeling the impact of genetic variation on 3D genome architecture and the processes regulated thereby.
Affiliation(s)
- Xenia Lainscsek
  - Institute of Biomedical Informatics, Graz University of Technology, Austria
- Leila Taher
  - Institute of Biomedical Informatics, Graz University of Technology, Austria
3
Davidson PL, Guo H, Wang L, Berrio A, Zhang H, Chang Y, Soborowski AL, McClay DR, Fan G, Wray GA. Chromosomal-Level Genome Assembly of the Sea Urchin Lytechinus variegatus Substantially Improves Functional Genomic Analyses. Genome Biol Evol 2021; 12:1080-1086. [PMID: 32433766 DOI: 10.1093/gbe/evaa101] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Accepted: 05/12/2020] [Indexed: 11/13/2022] Open
Abstract
Lytechinus variegatus is a camarodont sea urchin found widely throughout the western Atlantic Ocean in a variety of shallow-water marine habitats. Its distribution, abundance, and amenability to developmental perturbation make it a popular model for ecologists and developmental biologists. Here, we present a chromosomal-level genome assembly of L. variegatus generated from a combination of PacBio long reads, 10× Genomics sequencing, and HiC chromatin interaction sequencing. We show L. variegatus has 19 chromosomes with an assembly size of 870.4 Mb. The contiguity and completeness of this assembly are reflected by a scaffold length N50 of 45.5 Mb and BUSCO completeness score of 95.5%. Ab initio and transcript-informed gene modeling and annotation identified 27,232 genes with an average gene length of 12.6 kb, comprising an estimated 39.5% of the genome. Repetitive regions, on the other hand, make up 45.4% of the genome. Physical mapping of well-studied developmental genes onto each chromosome reveals nonrandom spatial distribution of distinct genes and gene families, which provides insight into how certain gene families may have evolved and are transcriptionally regulated in this species. Lastly, aligning RNA-seq and ATAC-seq data onto this assembly demonstrates the value of highly contiguous, complete genome assemblies for functional genomics analyses that is unattainable with fragmented, incomplete assemblies. This genome will be of great value to the scientific community as a resource for genome evolution, developmental, and ecological studies of this species and the Echinodermata.
Affiliation(s)
- Haobing Guo
  - Beijing Genomics Institute-Qingdao, China
  - Beijing Genomics Institute-Shenzhen, China
- He Zhang
  - Beijing Genomics Institute-Qingdao, China
  - Beijing Genomics Institute-Shenzhen, China
- Yue Chang
  - Beijing Genomics Institute-Qingdao, China
  - Beijing Genomics Institute-Shenzhen, China
- Andrew L Soborowski
  - Program in Computational Biology and Bioinformatics, Duke University
  - Center for Genomic and Computational Biology, Duke University
- Guangyi Fan
  - Beijing Genomics Institute-Qingdao, China
  - Beijing Genomics Institute-Shenzhen, China
- Gregory A Wray
  - Department of Biology, Duke University
  - Program in Computational Biology and Bioinformatics, Duke University
  - Center for Genomic and Computational Biology, Duke University
4
5
Garvin MR, T Prates E, Pavicic M, Jones P, Amos BK, Geiger A, Shah MB, Streich J, Felipe Machado Gazolla JG, Kainer D, Cliff A, Romero J, Keith N, Brown JB, Jacobson D. Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models. Genome Biol 2020; 21:304. [PMID: 33357233 PMCID: PMC7756312 DOI: 10.1186/s13059-020-02191-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 08/05/2020] [Accepted: 10/29/2020] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: A mechanistic understanding of the spread of SARS-CoV-2 and diligent tracking of ongoing mutagenesis are of key importance to plan robust strategies for confining its transmission. Large numbers of available sequences and their dates of transmission provide an unprecedented opportunity to analyze evolutionary adaptation in novel ways. Addition of high-resolution structural information can reveal the functional basis of these processes at the molecular level. Integrated systems biology-directed analyses of these data layers afford valuable insights to build a global understanding of the COVID-19 pandemic.
RESULTS: Here we identify globally distributed haplotypes from 15,789 SARS-CoV-2 genomes and model their success based on their duration, dispersal, and frequency in the host population. Our models identify mutations that are likely compensatory adaptive changes that allowed for rapid expansion of the virus. Functional predictions from structural analyses indicate that, contrary to previous reports, the Asp614Gly mutation in the spike glycoprotein (S) likely reduced transmission and the subsequent Pro323Leu mutation in the RNA-dependent RNA polymerase led to the precipitous spread of the virus. Our model also suggests that two mutations in the nsp13 helicase allowed for the adaptation of the virus to the Pacific Northwest of the USA. Finally, our explainable artificial intelligence algorithm identified a mutational hotspot in the sequence of S that also displays a signature of positive selection and may have implications for tissue or cell-specific expression of the virus.
CONCLUSIONS: These results provide valuable insights for the development of drugs and surveillance strategies to combat the current and future pandemics.
Affiliation(s)
- Michael R Garvin
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- Erica T Prates
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- Mirko Pavicic
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- Piet Jones
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
- B Kirtley Amos
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - Department of Horticulture, N-318 Ag Sciences Center, University of Kentucky, Lexington, KY, USA
- Armin Geiger
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
- Manesh B Shah
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- Jared Streich
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- David Kainer
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
- Ashley Cliff
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
- Jonathon Romero
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
- Nathan Keith
  - Lawrence Berkeley National Laboratory, Environmental Genomics & Systems Biology, Berkeley, CA, USA
- James B Brown
  - Lawrence Berkeley National Laboratory, Environmental Genomics & Systems Biology, Berkeley, CA, USA
- Daniel Jacobson
  - Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
  - Department of Psychology, University of Tennessee Knoxville, Knoxville, TN, USA
6
Weighill D, Tschaplinski TJ, Tuskan GA, Jacobson D. Data Integration in Poplar: 'Omics Layers and Integration Strategies. Front Genet 2019; 10:874. [PMID: 31608114 PMCID: PMC6773870 DOI: 10.3389/fgene.2019.00874] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/23/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022] Open
Abstract
Populus trichocarpa is an important biofuel feedstock that has been the target of extensive research and is emerging as a model organism for plants, especially woody perennials. This research has generated several large 'omics datasets. However, only a few studies in Populus have attempted to integrate various data types. This review will summarize various 'omics data layers, focusing on their application in Populus species. Subsequently, network and signal processing techniques for the integration and analysis of these data types will be discussed, with particular reference to examples in Populus.
Affiliation(s)
- Deborah Weighill
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Timothy J Tschaplinski
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Gerald A Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Daniel Jacobson
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States