1. Wang J, Yang W, Zhang S, Hu H, Yuan Y, Dong J, Chen L, Ma Y, Yang T, Zhou L, Chen J, Liu B, Li C, Edwards D, Zhao J. A pangenome analysis pipeline provides insights into functional gene identification in rice. Genome Biol 2023; 24:19. [PMID: 36703158 PMCID: PMC9878884 DOI: 10.1186/s13059-023-02861-9] [Received: 06/16/2022] [Accepted: 01/18/2023] [Indexed: 01/27/2023]
Abstract
BACKGROUND: A pangenome aims to capture the complete genetic diversity within a species and reduce bias in genetic analysis inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation. However, bioinformatics analysis tools for graph format genomes are lacking.
RESULTS: To overcome this problem, we develop a novel strategy for pangenome construction and a downstream pangenome analysis pipeline (PSVCP) that captures genetic variants' position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome using 12 representative rice genomes and analyze an international rice panel with 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 undetected by the SNP-based genome-wide association study (GWAS).
CONCLUSIONS: Our results demonstrate that the pangenome constructed by our pipeline combined with a presence and absence variation-based GWAS can provide additional power for genomic and genetic analysis. The pangenome constructed in this study and the associated genome sequence and genetic variants data provide valuable genomic resources for rice genomics research and improvement in the future.
Affiliation(s)
- Jian Wang
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Wu Yang
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Shaohong Zhang
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Haifei Hu
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Western Crop Genetics Alliance, Murdoch University, Murdoch, Western Australia 6150 Australia (GRID: grid.1025.6; ISNI: 0000 0004 0436 6763)
- Yuxuan Yuan
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, SAR China (GRID: grid.10784.3a; ISNI: 0000 0004 1937 0482)
- Jingfang Dong
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Luo Chen
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Yamei Ma
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Tifeng Yang
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Lian Zhou
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Jiansong Chen
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Bin Liu
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
- Chengdao Li
- Western Crop Genetics Alliance, Murdoch University, Murdoch, Western Australia 6150 Australia (GRID: grid.1025.6; ISNI: 0000 0004 0436 6763)
- David Edwards
- School of Biological Sciences and Centre for Applied Bioinformatics, University of Western Australia, Perth, WA Australia (GRID: grid.1012.2; ISNI: 0000 0004 1936 7910)
- Junliang Zhao
- Rice Research Institute & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China (GRID: grid.135769.f; ISNI: 0000 0001 0561 6611)
2. Han S, Dias GB, Basting PJ, Viswanatha R, Perrimon N, Bergman CM. Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line. Nucleic Acids Res 2022; 50:e124. [PMID: 36156149 PMCID: PMC9757076 DOI: 10.1093/nar/gkac794] [Received: 02/15/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022]
Abstract
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Affiliation(s)
- Preston J Basting
- Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
- Raghuvir Viswanatha
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Norbert Perrimon
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Howard Hughes Medical Institute, Boston, MA, USA
- Casey M Bergman
- To whom correspondence should be addressed. Tel: +1 706 542 1764; Fax: +1 706 542 3910;
3. Tan KT, Slevin MK, Meyerson M, Li H. Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres. Genome Biol 2022; 23:180. [PMID: 36028900 PMCID: PMC9414165 DOI: 10.1186/s13059-022-02751-6] [Received: 01/11/2022] [Accepted: 08/16/2022] [Indexed: 12/27/2022]
Abstract
Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning of nanopore basecalling models leads to improved recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how artefacts can be resolved by improvements in nanopore basecalling models.
Affiliation(s)
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Michael K Slevin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA, USA
- Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
4. Han S, Dias GB, Basting PJ, Nelson MG, Patel S, Marzo M, Bergman CM. Ongoing transposition in cell culture reveals the phylogeny of diverse Drosophila S2 sublines. Genetics 2022; 221:iyac077. [PMID: 35536183 PMCID: PMC9252272 DOI: 10.1093/genetics/iyac077] [Received: 01/10/2022] [Accepted: 04/28/2022] [Indexed: 11/13/2022]
Abstract
Cultured cells are widely used in molecular biology despite poor understanding of how cell line genomes change in vitro over time. Previous work has shown that Drosophila cultured cells have a higher transposable element content than whole flies, but whether this increase in transposable element content resulted from an initial burst of transposition during cell line establishment or ongoing transposition in cell culture remains unclear. Here, we sequenced the genomes of 25 sublines of Drosophila S2 cells and show that transposable element insertions provide abundant markers for the phylogenetic reconstruction of diverse sublines in a model animal cell culture system. DNA copy number evolution across S2 sublines revealed dramatically different patterns of genome organization that support the overall evolutionary history reconstructed using transposable element insertions. Analysis of transposable element insertion site occupancy and ancestral states supports a model of ongoing transposition dominated by episodic activity of a small number of retrotransposon families. Our work demonstrates that substantial genome evolution occurs during long-term Drosophila cell culture, which may impact the reproducibility of experiments that do not control for subline identity.
Affiliation(s)
- Shunhua Han
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Guilherme B Dias
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Preston J Basting
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Michael G Nelson
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Sanjai Patel
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Mar Marzo
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Casey M Bergman
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Department of Genetics, University of Georgia, Athens, GA 30602, USA