1
Hjelmen CE. Genome size and chromosome number are critical metrics for accurate genome assembly assessment in Eukaryota. Genetics 2024; 227:iyae099. [PMID: 38869251 DOI: 10.1093/genetics/iyae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 04/02/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024] [Open access]
Abstract
The number of genome assemblies has rapidly increased in recent history, with NCBI databases reaching over 41,000 eukaryotic genome assemblies across about 2,300 species. Increases in read length and improvements in assembly algorithms have led to increased contiguity and larger genome assemblies. While this number of assemblies is impressive, only about a third of these assemblies have corresponding genome size estimations for their respective species on publicly available databases. In this paper, genome assemblies are assessed regarding their total size compared to their respective publicly available genome size estimations. These deviations in size are assessed related to genome size, kingdom, sequencing platform, and standard assembly metrics, such as N50 and BUSCO values. A large proportion of assemblies deviate from their estimated genome size by more than 10%, with increasing deviations in size with increased genome size, suggesting nonprotein coding and structural DNA may be to blame. Modest differences in performance of sequencing platforms are noted as well. While standard metrics of genome assessment are more likely to indicate an assembly approaching the estimated genome size, much of the variation in this deviation in size is not explained with these raw metrics. A new, proportional N50 metric is proposed, in which N50 values are made relative to the average chromosome size of each species. This new metric has a stronger relationship with complete genome assemblies and, due to its proportional nature, allows for a more direct comparison across assemblies for genomes with variation in sizes and architectures.
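The metrics named in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes that "average chromosome size" means the estimated genome size divided by the haploid chromosome number, and the function names `n50`, `proportional_n50`, and `size_deviation` are hypothetical.

```python
def n50(lengths):
    """Return N50: the contig length at which the running total of
    contigs, sorted longest first, reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def proportional_n50(lengths, genome_size, haploid_chromosomes):
    """N50 divided by the average chromosome size.

    Assumption: average chromosome size = estimated genome size
    divided by the haploid chromosome number.
    """
    average_chromosome = genome_size / haploid_chromosomes
    return n50(lengths) / average_chromosome


def size_deviation(lengths, genome_size):
    """Signed fractional deviation of assembly size from the estimate."""
    return (sum(lengths) - genome_size) / genome_size


# Toy numbers, invented for illustration only.
contigs = [50, 40, 30, 20, 10]
print(n50(contigs))                       # N50 of the toy assembly
print(proportional_n50(contigs, 200, 4))  # N50 relative to a 50-unit average chromosome
print(size_deviation(contigs, 200))       # negative: assembly smaller than the estimate
```

Because the proportional value is unitless, a value near 1 would indicate contigs approaching chromosome length regardless of how large the genome is, which is what makes it comparable across species.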
Affiliation(s)
- Carl E Hjelmen
- Department of Biology, Utah Valley University, 800 W. University Parkway, Orem, UT 84058, USA
2
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
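To make the definition concrete (repeated copies of a 1-6-bp motif), here is a minimal sketch that scans a sequence for perfect tandem repeats with a regular expression. It is an illustration only and not one of the genotyping methods the review compares: the function name `find_strs` and the `min_copies` threshold are assumptions, and real tools must also handle interrupted repeats and sequencing errors.

```python
import re


def find_strs(seq, min_copies=4, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first,
    so 'AAAAAA' is reported as 'A' x 6 rather than 'AA' x 3.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence: five copies of CAG flanked by TT on each side.
print(find_strs("TTCAGCAGCAGCAGCAGTT"))
```

A leftmost-match scan like this reports whichever rotation of the motif it meets first, so the same repeat may appear as CAG, AGC, or GCA depending on the flanking bases.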
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
3
Teterina AA, Willis JH, Baer CF, Phillips PC. Pervasive conservation of intron number and other genetic elements revealed by a chromosome-level genomic assembly of the hyper-polymorphic nematode Caenorhabditis brenneri. bioRxiv 2024:2024.06.25.600681. [PMID: 38979286 PMCID: PMC11230420 DOI: 10.1101/2024.06.25.600681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
With within-species genetic diversity estimates that span the gamut of that seen across the entirety of animals, the Caenorhabditis genus of nematodes holds unique potential to provide insights into how population size and reproductive strategies influence gene and genome organization and evolution. Our study focuses on Caenorhabditis brenneri, currently known as one of the most genetically diverse nematodes within its genus and metazoan phyla. Here, we present a high-quality gapless genome assembly and annotation for C. brenneri, revealing a common nematode chromosome arrangement characterized by gene-dense central regions and repeat-rich peripheral parts. Comparison of C. brenneri with other nematodes from the 'Elegans' group revealed conserved macrosynteny but a lack of microsynteny, characterized by frequent rearrangements and low correlation of orthogroup sizes, indicative of high rates of gene turnover. We also assessed genome organization within corresponding syntenic blocks in selfing and outcrossing species, affirming that selfing species predominantly experience loss of both genes and intergenic DNA. Comparison of gene structures revealed a strikingly small number of shared introns across species, yet consistent distributions of intron number and length, regardless of population size or reproductive mode, suggesting that their evolutionary dynamics are primarily reflective of functional constraints. Our study provides valuable insights into genome evolution and expands the nematode genome resources with the highly genetically diverse C. brenneri, facilitating research into various aspects of nematode biology and evolutionary processes.
Affiliation(s)
- Anastasia A Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Center of Parasitology, Severtsov Institute of Ecology and Evolution RAS, Moscow, Russia
- John H Willis
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Charles F Baer
- Department of Biology, University of Florida, Gainesville, USA
- Patrick C Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
4
Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing. bioRxiv 2024:2024.02.15.580432. [PMID: 38529499 PMCID: PMC10962706 DOI: 10.1101/2024.02.15.580432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Maryam Ghareghani
- Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- William Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
5
Niu Y, Fan X, Yang Y, Li J, Lian J, Wang L, Zhang Y, Tang Y, Tang Z. Haplotype-resolved assembly of a pig genome using single-sperm sequencing. Commun Biol 2024; 7:738. [PMID: 38890535 PMCID: PMC11189477 DOI: 10.1038/s42003-024-06397-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 05/29/2024] [Indexed: 06/20/2024] [Open access]
Abstract
Single gamete cell sequencing together with long-read sequencing can reliably produce chromosome-level phased genomes. In this study, we employed PacBio HiFi and Hi-C sequencing on a male Landrace pig, coupled with single-sperm sequencing of its 102 sperm cells. A haplotype assembly method was developed based on long-read sequencing and sperm-phased markers. The chromosome-level phased assembly showed higher phasing accuracy than methods that rely only on HiFi reads. The use of single-sperm sequencing data enabled the construction of a genetic map, successfully mapping the sperm motility trait to a specific region on chromosome 1 (105.40-110.70 Mb). Furthermore, with the assistance of Y chromosome-bearing sperm data, 26.16 Mb Y chromosome sequences were assembled. We report a reliable approach for assembling chromosome-level phased genomes and reveal the potential of sperm population in basic biology research and sperm phenotype research.
Affiliation(s)
- Yongchao Niu
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Foshan, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agriculture Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Xinhao Fan
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Foshan, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agriculture Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- GuangXi Engineering Centre for Resource Development of Bama Xiang Pig, Bama, China
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yalan Yang
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Foshan, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agriculture Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Jiang Li
- Biozeron Shenzhen, Inc., Shenzhen, China
- Liu Wang
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Foshan, China
- Yongjin Zhang
- GuangXi Engineering Centre for Resource Development of Bama Xiang Pig, Bama, China
- Yijie Tang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agriculture Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Zhonglin Tang
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Foshan, China.
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agriculture Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- GuangXi Engineering Centre for Resource Development of Bama Xiang Pig, Bama, China.
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
6
Li Q, Qiao X, Li L, Gu C, Yin H, Qi K, Xie Z, Yang S, Zhao Q, Wang Z, Yang Y, Pan J, Li H, Wang J, Wang C, Rieseberg LH, Zhang S, Tao S. Haplotype-resolved T2T genome assemblies and pangenome graph of pear reveal diverse patterns of allele-specific expression and the genomic basis of fruit quality traits. Plant Communications 2024:101000. [PMID: 38859586 DOI: 10.1016/j.xplc.2024.101000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 05/15/2024] [Accepted: 06/07/2024] [Indexed: 06/12/2024]
Abstract
Hybrid crops often exhibit increased yield and greater resilience, yet the genomic mechanism(s) underlying hybrid vigor or heterosis remain unclear, hindering our ability to predict the expression of phenotypic traits in hybrid breeding. Here, we generated haplotype-resolved T2T genome assemblies of two pear hybrid varieties, 'Yuluxiang' (YLX) and 'Hongxiangsu' (HXS), which share the same maternal parent but differ in their paternal parents. We then used these assemblies to explore the genome-scale landscape of allele-specific expression (ASE) and create a pangenome graph for pear. ASE was observed for close to 6000 genes in both hybrid cultivars. A subset of ASE genes related to aspects of fruit quality such as sugars, organic acids, and cuticular wax were identified, suggesting their important contributions to heterosis. Specifically, Ma1, a gene regulating fruit acidity, is absent in the paternal haplotypes of HXS and YLX. A pangenome graph was built based on our assemblies and seven published pear genomes. Resequencing data for 139 cultivated pear genotypes (including 97 genotypes sequenced here) were subsequently aligned to the pangenome graph, revealing numerous structural variant hotspots and selective sweeps during pear diversification. As predicted, the Ma1 allele was found to be absent in varieties with low organic acid content, and this association was functionally validated by Ma1 overexpression in pear fruit and calli. Overall, these results reveal the contributions of ASE to fruit-quality heterosis and provide a robust pangenome reference for high-resolution allele discovery and association mapping.
Affiliation(s)
- Qionghou Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Xin Qiao
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Lanqing Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Chao Gu
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Hao Yin
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Kaijie Qi
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhihua Xie
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Sheng Yang
- Pomology Institute, Shanxi Agricultural University, Taigu, Shanxi 030801, China
- Qifeng Zhao
- Pomology Institute, Shanxi Agricultural University, Taigu, Shanxi 030801, China
- Zewen Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yuhang Yang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Jiahui Pan
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Hongxiang Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Jie Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Chao Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Loren H Rieseberg
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Shaoling Zhang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Shutian Tao
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China.
7
Singh K, Huff M, Liu J, Park JW, Rickman T, Keremane M, Krueger RR, Kunta M, Roose ML, Dardick C, Staton M, Ramadugu C. Chromosome-Scale, De Novo, Phased Genome Assemblies of Three Australian Limes: Citrus australasica, C. inodora, and C. glauca. Plants (Basel) 2024; 13:1460. [PMID: 38891269 PMCID: PMC11174732 DOI: 10.3390/plants13111460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/14/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Huanglongbing (HLB) is a severe citrus disease worldwide. Wild Australian limes like Citrus australasica, C. inodora, and C. glauca possess beneficial HLB resistance traits. Individual trees of the three taxa were extensively used in a breeding program for over a decade to introgress resistance traits into commercial-quality citrus germplasm. We generated high-quality, phased, de novo genome assemblies of the three Australian limes using PacBio long-read sequencing. The genome assembly sizes of the primary and alternate haplotypes were determined for C. australasica (337 Mb/335 Mb), C. inodora (304 Mb/299 Mb), and C. glauca (376 Mb/379 Mb). The nine chromosome-scale scaffolds included 86-91% of the genome sequences generated. The integrity and completeness of the assembled genomes were estimated to be at 97.2-98.8%. Gene annotation studies identified 25,461 genes in C. australasica, 27,665 in C. inodora, and 30,067 in C. glauca. Genes belonging to 118 orthogroups were specific to Australian lime genomes compared to other citrus genomes analyzed. Significantly fewer canonical resistance (R) genes were found in C. inodora and C. glauca (319 and 449, respectively) compared to C. australasica (576), C. clementina (579), and C. sinensis (651). Similar patterns were observed for other gene families associated with potential HLB resistance, including Phloem protein 2 (PP2) and Callose synthase (CalS) genes predicted in the Australian lime genomes. The genomic information on Australian limes developed in the present study will help understand the genetic basis of HLB resistance.
Affiliation(s)
- Khushwant Singh
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA; (K.S.); (M.L.R.)
- Matthew Huff
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA; (M.H.); (T.R.); (M.S.)
- Jianyang Liu
- Innovative Fruit Production, Improvement, and Protection, Appalachian Fruit Research Station, USDA-ARS, Kearneysville, WV 25430, USA; (J.L.); (C.D.)
- Jong-Won Park
- Citrus Center, Texas A&M University-Kingsville, Weslaco, TX 78599, USA; (J.-W.P.); (M.K.)
- Tara Rickman
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA; (M.H.); (T.R.); (M.S.)
- Manjunath Keremane
- National Clonal Germplasm Repository for Citrus and Dates, USDA-ARS, Riverside, CA 92507, USA; (M.K.); (R.R.K.)
- Robert R. Krueger
- National Clonal Germplasm Repository for Citrus and Dates, USDA-ARS, Riverside, CA 92507, USA; (M.K.); (R.R.K.)
- Madhurababu Kunta
- Citrus Center, Texas A&M University-Kingsville, Weslaco, TX 78599, USA; (J.-W.P.); (M.K.)
- Mikeal L. Roose
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA; (K.S.); (M.L.R.)
- Chris Dardick
- Innovative Fruit Production, Improvement, and Protection, Appalachian Fruit Research Station, USDA-ARS, Kearneysville, WV 25430, USA; (J.L.); (C.D.)
- Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA; (M.H.); (T.R.); (M.S.)
- Chandrika Ramadugu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA; (K.S.); (M.L.R.)
8
Wang X, Muenzler M, King J, Liu M, Li H, Budowle B, Ge J. A complete pipeline enables haplotyping and phasing macrohaplotype in long sequencing reads for polyploidy samples and a multi-source DNA mixture. Electrophoresis 2024; 45:877-884. [PMID: 38196015 DOI: 10.1002/elps.202300143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/19/2023] [Accepted: 11/30/2023] [Indexed: 01/11/2024]
Abstract
Macrohaplotype combines multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long-sequencing reads, for example, PacBio HiFi reads, provide data to detect macrohaplotypes in multiploidy and DNA mixtures. However, the bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short TRs [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated from our designed targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller also was validated in whole-genome long-read sequencing data. Robust and accurate genotyping and phased macrohaplotypes were obtained with MacroHapCaller compared with the known ground truth. MacroHapCaller achieved a higher or consistent genotyping accuracy and faster speed than existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications using discriminating macrohaplotypes.
Affiliation(s)
- Xuewen Wang
- Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Melissa Muenzler
- Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Jonathan King
- Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Muyi Liu
- Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Hongmin Li
- College of Science, Cal State East Bay, Hayward, California, USA
- Bruce Budowle
- Department of Forensic Medicine, University of Helsinki, Helsinki, Finland
- Forensic Science Institute, Radford University, Radford, Virginia, USA
- Jianye Ge
- Health Science Center, University of North Texas, Fort Worth, Texas, USA
9
Nie F, Ni P, Huang N, Zhang J, Wang Z, Xiao C, Luo F, Wang J. De novo diploid genome assembly using long noisy reads. Nat Commun 2024; 15:2964. [PMID: 38580638 PMCID: PMC10997618 DOI: 10.1038/s41467-024-47349-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 03/25/2024] [Indexed: 04/07/2024] [Open access]
Abstract
The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly on B. taurus (Bison×Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
Affiliation(s)
- Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- National Center for Applied Mathematics in Hunan and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, 411105, China
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Zhenyu Wang
- Institute of Nanfan & Seed Industry, Guangdong Academy of Sciences, Guangdong, 510316, China
- Chuanle Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University #7 Jinsui Road, Tianhe District, Guangzhou, China.
- Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
10
Darian JC, Kundu R, Rajaby R, Sung WK. Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly. Nat Methods 2024; 21:574-583. [PMID: 38459383 DOI: 10.1038/s41592-023-02141-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 11/30/2023] [Indexed: 03/10/2024]
Abstract
Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging on hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
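For readers unfamiliar with the scale, the sketch below converts between a quality value and a per-base error rate. It assumes the abstract's quality value follows the usual Phred convention, QV = -10 log10(error rate), which the abstract itself does not state; the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Phred-scaled quality value for a per-base error probability."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


# Under the Phred assumption, QV 50 is about one error per 100,000 bases.
print(qv_to_error_rate(50))
print(error_rate_to_qv(1e-5))
```

On this scale each additional 10 points corresponds to a tenfold drop in error rate, so the step from QV 40 to QV 50 is a tenfold improvement.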
Affiliation(s)
| | - Ritu Kundu
- School of Computing, National University of Singapore, Singapore, Singapore
| | | | - Wing-Kin Sung
- School of Computing, National University of Singapore, Singapore, Singapore.
- Genome Institute of Singapore, Singapore, Singapore.
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China.
- JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China.
- Hong Kong Genome Institute, Hong Kong, China.
|
11
|
Yoon I, Kim U, Song Y, Park T, Lee DS. 3C methods in cancer research: recent advances and future prospects. Exp Mol Med 2024; 56:788-798. [PMID: 38658701 PMCID: PMC11059347 DOI: 10.1038/s12276-024-01236-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/15/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
In recent years, Hi-C technology has revolutionized cancer research by elucidating the mystery of three-dimensional chromatin organization and its role in gene regulation. This paper explored the impact of Hi-C advancements on cancer research by delving into high-resolution techniques, such as chromatin loops, structural variants, haplotype phasing, and extrachromosomal DNA (ecDNA). Distant regulatory elements interact with their target genes through chromatin loops. Structural variants contribute to the development and progression of cancer. Haplotype phasing is crucial for understanding allele-specific genomic rearrangements and somatic clonal evolution in cancer. The role of ecDNA in driving oncogene amplification and drug resistance in cancer cells has also been revealed. These innovations offer a deeper understanding of cancer biology and the potential for personalized therapies. Despite these advancements, challenges, such as the accurate mapping of repetitive sequences and precise identification of structural variants, persist. Integrating Hi-C with multiomics data is key to overcoming these challenges and comprehensively understanding complex cancer genomes. Thus, Hi-C is a powerful tool for guiding precision medicine in cancer research and treatment.
Affiliation(s)
- Insoo Yoon
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
| | - Uijin Kim
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
| | - Yousuk Song
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
| | - Taesoo Park
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
| | - Dong-Sung Lee
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea.
|
12
|
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans. bioRxiv 2024:2024.03.12.584650. [PMID: 38654825 PMCID: PMC11037872 DOI: 10.1101/2024.03.12.584650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary along two TBC1D3 clusters where human haplotypes are highly variable in copy number, differing by as many as 20 copies, and structure (structural heterozygosity 90%). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Max L. Dougherty
- Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Stephen Chang
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
| | - Tomi Pastinen
- Department of Pediatrics, Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA
- Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
13
|
Delorean EE, Youngblood RC, Simpson SA, Schoonmaker AN, Scheffler BE, Rutter WB, Hulse-Kemp AM. Representing true plant genomes: haplotype-resolved hybrid pepper genome with trio-binning. Front Plant Sci 2023; 14:1184112. [PMID: 38034563 PMCID: PMC10687446 DOI: 10.3389/fpls.2023.1184112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
As sequencing costs decrease and availability of high fidelity long-read sequencing increases, generating experiment-specific de novo genome assemblies becomes feasible. In many crop species, obtaining the genome of a hybrid or heterozygous individual is necessary for systems that do not tolerate inbreeding or for investigating important biological questions, such as hybrid vigor. However, most genome assembly methods that have been used in plants result in a merged single sequence representation that is not a true biologically accurate representation of either haplotype within a diploid individual. The resulting genome assembly is often fragmented and exhibits a mosaic of the two haplotypes, referred to as haplotype-switching. Important haplotype level information, such as causal mutations and structural variation, is therefore lost, causing difficulties in interpreting downstream analyses. To overcome this challenge, we have applied a method developed for animal genome assembly called trio-binning to an intra-specific hybrid of chili pepper (Capsicum annuum L. cv. HDA149 x Capsicum annuum L. cv. HDA330). We tested all currently available software tools for performing trio-binning, combined with multiple scaffolding technologies including Bionano, to determine the optimal method of producing the best haplotype-resolved assembly. Ultimately, we produced highly contiguous biologically true haplotype-resolved genome assemblies for each parent, with scaffold N50s of 266.0 Mb and 281.3 Mb, with 99.6% and 99.8% positioned into chromosomes, respectively. The assemblies captured 3.10 Gb and 3.12 Gb of the estimated 3.5 Gb chili pepper genome size. These assemblies represent the complete genome structure of the intraspecific hybrid, as well as the two parental genomes, and show measurable improvements over the currently available reference genomes. Our manuscript provides a valuable guide on how to apply trio-binning to other plant genomes.
Affiliation(s)
- Emily E. Delorean
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
| | - Ramey C. Youngblood
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, United States
| | - Sheron A. Simpson
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Stoneville, MS, United States
| | - Ashley N. Schoonmaker
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
| | - Brian E. Scheffler
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Stoneville, MS, United States
| | - William B. Rutter
- US Vegetable Laboratory, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Charleston, SC, United States
| | - Amanda M. Hulse-Kemp
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
|
14
|
Fuller T, Bickhart DM, Koch LM, Kucek LK, Ali S, Mangelson H, Monteros MJ, Hernandez T, Smith TPL, Riday H, Sullivan ML. A reference assembly for the legume cover crop hairy vetch (Vicia villosa). GigaByte 2023; 2023:gigabyte98. [PMID: 38023065 PMCID: PMC10659084 DOI: 10.46471/gigabyte.98] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/03/2023] [Indexed: 12/01/2023] Open
Abstract
Vicia villosa is an incompletely domesticated annual legume of the Fabaceae family native to Europe and Western Asia. V. villosa is widely used as a cover crop and forage due to its ability to withstand harsh winters. Here, we generated a reference-quality genome assembly (Vvill1.0) from low error-rate long-sequence reads to improve the genetic-based trait selection of this species. Our Vvill1.0 assembly includes seven scaffolds corresponding to the seven estimated linkage groups and comprising approximately 68% of the total genome size of 2.03 Gbp. This assembly is expected to be a useful resource for genetically improving this emerging cover crop species and provide useful insights into legume genomics and plant genome evolution.
Affiliation(s)
- Tyson Fuller
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | - Derek M. Bickhart
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | - Lisa M. Koch
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | - Lisa Kissing Kucek
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | - Shahjahan Ali
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | | | - Maria J. Monteros
- Noble Research Institute, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
| | - Timothy Hernandez
- Noble Research Institute, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
| | - Timothy P. L. Smith
- US Meat Animal Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), PO Box 166 (State Spur 18D), Clay Center, NE 68933, USA
| | - Heathcliffe Riday
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
| | - Michael L. Sullivan
- US Dairy Forage Research Center, United States Department of Agriculture Agricultural Research Service (USDA-ARS), 1925 Linden Drive, Madison, WI 53706, USA
|
15
|
Bramsiepe J, Krabberød AK, Bjerkan KN, Alling RM, Johannessen IM, Hornslien KS, Miller JR, Brysting AK, Grini PE. Structural evidence for MADS-box type I family expansion seen in new assemblies of Arabidopsis arenosa and A. lyrata. Plant J 2023; 116:942-961. [PMID: 37517071 DOI: 10.1111/tpj.16401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Revised: 05/24/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023]
Abstract
Arabidopsis thaliana diverged from A. arenosa and A. lyrata at least 6 million years ago. The three species differ by genome-wide polymorphisms and morphological traits. The species are to a high degree reproductively isolated, but hybridization barriers are incomplete. A special type of hybridization barrier is based on the triploid endosperm of the seed, where embryo lethality is caused by endosperm failure to support the developing embryo. The MADS-box type I family of transcription factors is specifically expressed in the endosperm and has been proposed to play a role in endosperm-based hybridization barriers. The gene family is well known for its high evolutionary duplication rate, as well as being regulated by genomic imprinting. Here we address MADS-box type I gene family evolution and the role of type I genes in the context of hybridization. Using two de-novo assembled and annotated chromosome-level genomes of A. arenosa and A. lyrata ssp. petraea we analyzed the MADS-box type I gene family in Arabidopsis to predict orthologs, copy number, and structural genomic variation related to the type I loci. Our findings were compared to gene expression profiles sampled before and after the transition to endosperm cellularization in order to investigate the involvement of MADS-box type I loci in endosperm-based hybridization barriers. We observed substantial differences in type-I expression in the endosperm of A. arenosa and A. lyrata ssp. petraea, suggesting a genetic cause for the endosperm-based hybridization barrier between A. arenosa and A. lyrata ssp. petraea.
Affiliation(s)
- Jonathan Bramsiepe
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Anders K Krabberød
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Katrine N Bjerkan
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Renate M Alling
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Ida M Johannessen
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Karina S Hornslien
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Jason R Miller
- College of STEM, Shepherd University, Shepherdstown, West Virginia, 25443-5000, USA
| | - Anne K Brysting
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Paul E Grini
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
|
16
|
Hart SFM, Yonemitsu MA, Giersch RM, Garrett FES, Beal BF, Arriagada G, Davis BW, Ostrander EA, Goff SP, Metzger MJ. Centuries of genome instability and evolution in soft-shell clam, Mya arenaria, bivalve transmissible neoplasia. Nat Cancer 2023; 4:1561-1574. [PMID: 37783804 PMCID: PMC10663159 DOI: 10.1038/s43018-023-00643-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/21/2022] [Accepted: 08/29/2023] [Indexed: 10/04/2023]
Abstract
Transmissible cancers are infectious parasitic clones that metastasize to new hosts, living past the death of the founder animal in which the cancer initiated. We investigated the evolutionary history of a cancer lineage that has spread through the soft-shell clam (Mya arenaria) population by assembling a chromosome-scale soft-shell clam reference genome and characterizing somatic mutations in transmissible cancer. We observe high mutation density, widespread copy-number gain, structural rearrangement, loss of heterozygosity, variable telomere lengths, mitochondrial genome expansion and transposable element activity, all indicative of an unstable cancer genome. We also discover a previously unreported mutational signature associated with overexpression of an error-prone polymerase and use this to estimate the lineage to be >200 years old. Our study reveals the ability of an invertebrate cancer lineage to survive for centuries while its genome continues to structurally mutate, likely contributing to the evolution of this lineage as a parasitic cancer.
Affiliation(s)
- Samuel F M Hart
- Pacific Northwest Research Institute, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Marisa A Yonemitsu
- Pacific Northwest Research Institute, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | | | | | - Brian F Beal
- Division of Environmental and Biological Sciences, University of Maine at Machias, Machias, ME, USA
- Downeast Institute, Beals, ME, USA
| | - Gloria Arriagada
- Instituto de Ciencias Biomedicas, Facultad de Medicina y Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- FONDAP Center for Genome Regulation, Santiago, Chile
| | - Brian W Davis
- Department of Veterinary Integrative Biosciences, Texas A&M University School of Veterinary Medicine, College Station, TX, USA
- Department of Small Animal Clinical Sciences, Texas A&M University School of Veterinary Medicine, College Station, TX, USA
| | - Elaine A Ostrander
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephen P Goff
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Department of Microbiology and Immunology, Columbia University, New York, NY, USA
| | - Michael J Metzger
- Pacific Northwest Research Institute, Seattle, WA, USA.
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA.
|
17
|
Sperschneider J, Hewitt T, Lewis DC, Periyannan S, Milgate AW, Hickey LT, Mago R, Dodds PN, Figueroa M. Nuclear exchange generates population diversity in the wheat leaf rust pathogen Puccinia triticina. Nat Microbiol 2023; 8:2130-2141. [PMID: 37884814 PMCID: PMC10627818 DOI: 10.1038/s41564-023-01494-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 09/11/2023] [Indexed: 10/28/2023]
Abstract
In clonally reproducing dikaryotic rust fungi, non-sexual processes such as somatic nuclear exchange are postulated to play a role in diversity but have been difficult to detect due to the lack of genome resolution between the two haploid nuclei. We examined three nuclear-phased genome assemblies of Puccinia triticina, which causes wheat leaf rust disease. We found that the most recently emerged Australian lineage was derived by nuclear exchange between two pre-existing lineages, which originated in Europe and North America. Haplotype-specific phylogenetic analysis reveals that repeated somatic exchange events have shuffled haploid nuclei between long-term clonal lineages, leading to a global P. triticina population representing different combinations of a limited number of haploid genomes. Thus, nuclear exchange seems to be the predominant mechanism generating diversity and the emergence of new strains in this otherwise clonal pathogen. Such genomics-accelerated surveillance of pathogen evolution paves the way for more accurate global disease monitoring.
Affiliation(s)
- Jana Sperschneider
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia.
| | - Tim Hewitt
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia
| | - David C Lewis
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia
| | - Sambasivam Periyannan
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia
- School of Agriculture and Environmental Science, Centre for Crop Health, The University of Southern Queensland, Toowoomba, Queensland, Australia
| | - Andrew W Milgate
- NSW Department of Primary Industries, Wagga Wagga Agricultural Institute, Wagga Wagga, New South Wales, Australia
| | - Lee T Hickey
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Queensland, Australia
| | - Rohit Mago
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia
| | - Peter N Dodds
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia.
| | - Melania Figueroa
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, GPO, Canberra, Australian Capital Territory, Australia.
|
18
|
Xie H, Li W, Guo Y, Su X, Chen K, Wen L, Tang F. Long-read-based single sperm genome sequencing for chromosome-wide haplotype phasing of both SNPs and SVs. Nucleic Acids Res 2023; 51:8020-8034. [PMID: 37351613 PMCID: PMC10450174 DOI: 10.1093/nar/gkad532] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/05/2022] [Revised: 06/01/2023] [Accepted: 06/09/2023] [Indexed: 06/24/2023] Open
Abstract
Although localized haploid phasing can be achieved using long read genome sequencing without parental data, reliable chromosome-scale phasing remains a great challenge. Given that sperm is a natural haploid cell, single-sperm genome sequencing can provide a chromosome-wide phase signal. Due to the limitation of read length, current short-read-based single-sperm genome sequencing methods can only achieve SNP haplotyping and come with difficulties in detecting and haplotyping structural variations (SVs) in complex genomic regions. To overcome these limitations, we developed a long-read-based single-sperm genome sequencing method and a corresponding data analysis pipeline that can accurately identify crossover events and chromosomal level aneuploidies in single sperm and efficiently detect SVs within individual sperm cells. Importantly, without parental genome information, our method can accurately conduct de novo phasing of heterozygous SVs as well as SNPs from male individuals at the whole chromosome scale. The accuracy for phasing of SVs was as high as 98.59% using 100 single sperm cells, and the accuracy for phasing of SNPs was as high as 99.95%. Additionally, our method reliably enabled deduction of the repeat expansions of haplotype-resolved STRs/VNTRs in single sperm cells. Our method provides a new opportunity for studying haplotype-related genetics in mammals.
Affiliation(s)
- Haoling Xie
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
| | - Wen Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Yuqing Guo
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Xinjie Su
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Kexuan Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Lu Wen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
|
19
|
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023] Open
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
| | - Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
| | - Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
| | | | - Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
| | - Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
| | - Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
|
20
|
Ouchi S, Kajitani R, Itoh T. GreenHill: a de novo chromosome-level scaffolding and phasing tool using Hi-C. Genome Biol 2023; 24:162. [PMID: 37434204 DOI: 10.1186/s13059-023-03006-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/04/2023] [Indexed: 07/13/2023] Open
Abstract
Chromosome-level haplotype-resolved genome assembly is an important resource in molecular biology. However, current de novo haplotype assemblers require parental data or reference genomes and often fail to provide chromosome-level results. We present GreenHill, a novel scaffolding and phasing tool that considers various assemblers' contigs as input to reconstruct chromosome-level haplotypes using Hi-C without parental or reference data. Its unique functions include new error correction based on Hi-C contacts and the simultaneous use of Hi-C and long reads. Benchmarks reveal that GreenHill outperforms other approaches in contiguity and phasing accuracy, and the majority of chromosome arms are entirely phased.
Affiliation(s)
- Shun Ouchi
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
- Rei Kajitani
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
- Takehiko Itoh
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan

21
Shinde SS, Sharma A, Vijay N. Decoding the fibromelanosis locus complex chromosomal rearrangement of black-bone chicken: genetic differentiation, selective sweeps and protein-coding changes in Kadaknath chicken. Front Genet 2023; 14:1180658. [PMID: 37424723 PMCID: PMC10325862 DOI: 10.3389/fgene.2023.1180658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 06/05/2023] [Indexed: 07/11/2023] Open
Abstract
Black-bone chicken (BBC) meat is popular for its distinctive taste and texture. A complex chromosomal rearrangement at the fibromelanosis (Fm) locus on the 20th chromosome results in increased endothelin-3 (EDN3) gene expression and is responsible for melanin hyperpigmentation in BBC. We use public long-read sequencing data of the Silkie breed to resolve high-confidence haplotypes at the Fm locus spanning both Dup1 and Dup2 regions and establish that the Fm_2 scenario is correct of the three possible scenarios of the complex chromosomal rearrangement. The relationship between Chinese and Korean BBC breeds with Kadaknath native to India is underexplored. Our data from whole-genome re-sequencing establish that all BBC breeds, including Kadaknath, share the complex chromosomal rearrangement junctions at the fibromelanosis (Fm) locus. We also identify two Fm locus proximal regions (∼70 Kb and ∼300 Kb) with signatures of selection unique to Kadaknath. These regions harbor several genes with protein-coding changes, with the bactericidal/permeability-increasing-protein-like gene having two Kadaknath-specific changes within protein domains. Our results indicate that protein-coding changes in the bactericidal/permeability-increasing-protein-like gene hitchhiked with the Fm locus in Kadaknath due to close physical linkage. Identifying this Fm locus proximal selective sweep sheds light on the genetic distinctiveness of Kadaknath compared to other BBC.
Affiliation(s)
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India

22
Kong W, Wang Y, Zhang S, Yu J, Zhang X. Recent Advances in Assembly of Complex Plant Genomes. Genomics Proteomics Bioinformatics 2023; 21:427-439. [PMID: 37100237 PMCID: PMC10787022 DOI: 10.1016/j.gpb.2023.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 03/18/2023] [Accepted: 04/07/2023] [Indexed: 04/28/2023]
Abstract
Over the past 20 years, tremendous advances in sequencing technologies and computational algorithms have spurred plant genomic research into a thriving era with hundreds of genomes decoded already, ranging from those of nonvascular plants to those of flowering plants. However, complex plant genome assembly is still challenging and remains difficult to fully resolve with conventional sequencing and assembly methods due to high heterozygosity, highly repetitive sequences, or high ploidy characteristics of complex genomes. Herein, we summarize the challenges of and advances in complex plant genome assembly, including feasible experimental strategies, upgrades to sequencing technology, existing assembly methods, and different phasing algorithms. Moreover, we list actual cases of complex genome projects for readers to refer to and draw upon to solve future problems related to complex genomes. Finally, we expect that the accurate, gapless, telomere-to-telomere, and fully phased assembly of complex plant genomes could soon become routine.
Affiliation(s)
- Weilong Kong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Yibin Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Shengcheng Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jiaxin Yu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China

23
Gonzalez-Garcia L, Guevara-Barrientos D, Lozano-Arce D, Gil J, Díaz-Riaño J, Duarte E, Andrade G, Bojacá JC, Hoyos-Sanchez MC, Chavarro C, Guayazan N, Chica LA, Buitrago Acosta MC, Bautista E, Trujillo M, Duitama J. New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads. Life Sci Alliance 2023; 6:e202201719. [PMID: 36813568 PMCID: PMC9946810 DOI: 10.26508/lsa.202201719] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/10/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 02/24/2023] Open
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
Affiliation(s)
- Laura Gonzalez-Garcia
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Daniela Lozano-Arce
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juanita Gil
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Jorge Díaz-Riaño
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Erick Duarte
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Germán Andrade
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juan Camilo Bojacá
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Christian Chavarro
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Natalia Guayazan
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Luis Alberto Chica
- Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia
- Edwin Bautista
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Miller Trujillo
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia

24
Trible W, Chandra V, Lacy KD, Limón G, McKenzie SK, Olivos-Cisneros L, Arsenault SV, Kronauer DJC. A caste differentiation mutant elucidates the evolution of socially parasitic ants. Curr Biol 2023; 33:1047-1058.e4. [PMID: 36858043 PMCID: PMC10050096 DOI: 10.1016/j.cub.2023.01.067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/31/2022] [Accepted: 01/31/2023] [Indexed: 03/03/2023]
Abstract
Most ant species have two distinct female castes, queens and workers, yet the developmental and genetic mechanisms that produce these alternative phenotypes remain poorly understood. Working with a clonal ant, we discovered a variant strain that expresses queen-like traits in individuals that would normally become workers. The variants show changes in morphology, behavior, and fitness that cause them to rely on workers in wild-type (WT) colonies for survival. Overall, they resemble the queens of many obligately parasitic ants that have evolutionarily lost the worker caste and live inside colonies of closely related hosts. The prevailing theory for the evolution of these workerless social parasites is that they evolve from reproductively isolated populations of facultative intermediates that acquire parasitic phenotypes in a stepwise fashion. However, empirical evidence for such facultative ancestors remains weak, and it is unclear how reproductive isolation could gradually arise in sympatry. In contrast, we isolated these variants just a few generations after they arose within their WT parent colony, implying that the complex phenotype reported here was induced in a single genetic step. This suggests that a single genetic module can decouple the coordinated mechanisms of caste development, allowing an obligately parasitic variant to arise directly from a free-living ancestor. Consistent with this hypothesis, the variants have lost one of the two alleles of a putative supergene that is heterozygous in WTs. These findings provide a plausible explanation for the evolution of ant social parasites and implicate new candidate molecular mechanisms for ant caste differentiation.
Affiliation(s)
- Waring Trible
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; John Harvard Distinguished Science Fellowship Program, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Vikram Chandra
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Department of Organismic and Evolutionary Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Kip D Lacy
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
- Gina Limón
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Department of Microbiology, New York University School of Medicine, 430 E. 29th Street, New York, NY 10016, USA
- Sean K McKenzie
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Oxford Nanopore Technologies, Oxford OX4 4DQ, UK
- Leonora Olivos-Cisneros
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
- Samuel V Arsenault
- John Harvard Distinguished Science Fellowship Program, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA; Department of Organismic and Evolutionary Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Daniel J C Kronauer
- Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Howard Hughes Medical Institute, New York, NY 10065, USA

25
De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions. Int J Mol Sci 2023; 24:ijms24032904. [PMID: 36769224 PMCID: PMC9917877 DOI: 10.3390/ijms24032904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/13/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023] Open
Abstract
Poplar and willow species in the Salicaceae are dioecious, yet have been shown to use different sex determination systems located on different chromosomes. Willows in the subgenus Vetrix are interesting for comparative studies of sex determination systems, yet genomic resources for these species are still quite limited. Only a few annotated reference genome assemblies are available, despite many species in use in breeding programs. Here we present de novo assemblies and annotations of 11 shrub willow genomes from six species. Copy number variation of candidate sex determination genes within each genome was characterized and revealed remarkable differences in putative master regulator gene duplication and deletion. We also analyzed copy number and expression of candidate genes involved in floral secondary metabolism, and identified substantial variation across genotypes, which can be used for parental selection in breeding programs. Lastly, we report on a genotype that produces only female descendants and identified gene presence/absence variation in the mitochondrial genome that may be responsible for this unusual inheritance.

26
Mengist MF, Bostan H, De Paola D, Teresi SJ, Platts AE, Cremona G, Qi X, Mackey T, Bassil NV, Ashrafi H, Giongo L, Jibran R, Chagné D, Bianco L, Lila MA, Rowland LJ, Iovene M, Edger PP, Iorizzo M. Autopolyploid inheritance and a heterozygous reciprocal translocation shape chromosome genetic behavior in tetraploid blueberry (Vaccinium corymbosum). New Phytol 2023; 237:1024-1039. [PMID: 35962608 PMCID: PMC10087351 DOI: 10.1111/nph.18428] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/18/2022] [Accepted: 08/01/2022] [Indexed: 06/02/2023]
Abstract
Understanding chromosome recombination behavior in polyploid species is key to advancing genetic discoveries. In blueberry, a tetraploid species, the lines of evidence about its genetic behavior remain poorly understood, owing to the inter-specific and inter-ploidy admixture of its genome and the lack of in-depth genome-wide inheritance and comparative structural studies. Here we describe a new high-quality, phased, chromosome-scale genome of a diploid blueberry, clone W85. The genome was integrated with cytogenetics and high-density genetic maps representing six tetraploid blueberry cultivars, harboring different levels of wild genome admixture, to uncover recombination behavior and structural genome divergence across tetraploid and wild diploid species. Analysis of chromosome inheritance and pairing demonstrated that tetraploid blueberry behaves as an autotetraploid with tetrasomic inheritance. Comparative analysis demonstrated the presence of a reciprocal, heterozygous translocation spanning one homolog of chr-6 and one of chr-10 in the cultivar Draper. The translocation affects pairing and recombination of chromosomes 6 and 10. Besides the translocation detected in Draper, no other structural genomic divergences were detected across tetraploid cultivars and highly inter-crossable wild diploid species. These findings and resources will facilitate new genetic and comparative genomic studies in Vaccinium and the development of genomic-assisted selection strategies for this crop.
Affiliation(s)
- Molla F. Mengist
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC 28081, USA
- Hamed Bostan
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC 28081, USA
- Domenico De Paola
- Institute of Biosciences and Bioresources, National Research Council of Italy, Bari 70126, Italy
- Scott J. Teresi
- Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
- Adrian E. Platts
- Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
- Gaetana Cremona
- Institute of Biosciences and Bioresources, National Research Council of Italy, Portici, NA 80055, Italy
- Xinpeng Qi
- Genetic Improvement for Fruits and Vegetables Laboratory, Beltsville Agricultural Research Center‐West, US Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
- Ted Mackey
- Horticultural Crops Research Unit, US Department of Agriculture, Agricultural Research Service, Corvallis, OR 97330, USA
- Nahla V. Bassil
- National Clonal Germplasm Repository, US Department of Agriculture, Agricultural Research Service, Corvallis, OR 97333, USA
- Hamid Ashrafi
- Department of Horticultural Science, North Carolina State University, Raleigh, NC 27695, USA
- Lara Giongo
- Foundation of Edmund Mach, San Michele all'Adige, TN 38098, Italy
- Rubina Jibran
- Plant & Food Research, Fitzherbert, Palmerston North 4474, New Zealand
- David Chagné
- Plant & Food Research, Fitzherbert, Palmerston North 4474, New Zealand
- Luca Bianco
- Foundation of Edmund Mach, San Michele all'Adige, TN 38098, Italy
- Mary A. Lila
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC 28081, USA
- Lisa J. Rowland
- Genetic Improvement for Fruits and Vegetables Laboratory, Beltsville Agricultural Research Center‐West, US Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
- Marina Iovene
- Institute of Biosciences and Bioresources, National Research Council of Italy, Portici, NA 80055, Italy
- Patrick P. Edger
- Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
- Massimo Iorizzo
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC 28081, USA
- Department of Horticultural Science, North Carolina State University, Raleigh, NC 27695, USA

27
Wang F, Moon W, Letsou W, Sapkota Y, Wang Z, Im C, Baedke JL, Robison L, Yasui Y. Genome-Wide Analysis of Rare Haplotypes Associated with Breast Cancer Risk. Cancer Res 2023; 83:332-345. [PMID: 36354368 PMCID: PMC9852031 DOI: 10.1158/0008-5472.can-22-1888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 09/09/2022] [Accepted: 11/08/2022] [Indexed: 11/12/2022]
Abstract
Numerous common genetic variants have been linked to breast cancer risk, but they only partially explain the total breast cancer heritability. Inference from Nordic population-based twin data indicates rare high-risk loci as the chief determinant of breast cancer risk. Here, we use haplotypes, rather than single variants, to identify rare high-risk loci for breast cancer. With computationally phased genotypes from 181,034 white British women in the UK Biobank, a genome-wide haplotype-breast cancer association analysis was conducted using sliding windows of 5 to 500 consecutive array-genotyped variants. In the discovery stage, haplotype-breast cancer associations were evaluated retrospectively in the prestudy-enrollment data including 5,487 breast cancer cases. Breast cancer hazard ratios (HR) for additive haplotypic effects were estimated using Cox regression. The replication analysis included a prospective cohort of women free of breast cancer at enrollment, of whom 3,524 later developed breast cancer. This two-stage analysis detected 13 rare loci (frequency <1%), each associated with an appreciable breast cancer-risk increase (discovery: HRs = 2.84-6.10, P < 5 × 10⁻⁸; replication: HRs = 2.08-5.61, P < 0.01). In contrast, the variants that formed these rare haplotypes individually exhibited much smaller effects. Functional annotation revealed extensive cis-regulatory DNA elements in breast cancer-related cells underlying the replicated rare haplotypes. Using phased, imputed genotypes from 30,064 cases and 25,282 controls in the DRIVE OncoArray case-control study, 6 of the 13 rare-loci associations were found generalizable (odds ratio estimates: 1.48-7.67, P < 0.05). This study demonstrates the complementary advantage of utilizing rare haplotypes to capture novel risk loci and suggests the potential for the discovery of more genetic elements contributing to cancer heritability as large data sets of germline whole-genome sequencing become available.
SIGNIFICANCE: A genome-wide two-stage haplotype analysis identifies rare haplotypes associated with breast cancer risk and suggests that the rare risk haplotypes represent long-range interactions with regulatory consequences influencing cancer risk.
Affiliation(s)
- Fan Wang
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Wonjong Moon
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- William Letsou
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yadav Sapkota
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Zhaoming Wang
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Cindy Im
- School of Public Health, University of Alberta, Edmonton, Alberta T6G 1C9, Canada
- Jessica L. Baedke
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Leslie Robison
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yutaka Yasui
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- School of Public Health, University of Alberta, Edmonton, Alberta T6G 1C9, Canada

28
Chan AP, Choi Y, Rangan A, Zhang G, Podder A, Berens M, Sharma S, Pirrotte P, Byron S, Duggan D, Schork NJ. Interrogating the Human Diplome: Computational Methods, Emerging Applications, and Challenges. Methods Mol Biol 2023; 2590:1-30. [PMID: 36335489 DOI: 10.1007/978-1-0716-2819-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Human DNA sequencing protocols have revolutionized human biology, biomedical science, and clinical practice, but still have very important limitations. One limitation is that most protocols do not separate or assemble (i.e., "phase") the nucleotide content of each of the maternally and paternally derived chromosomal homologs making up the 22 autosomal pairs and the chromosomal pair making up the pseudo-autosomal region of the sex chromosomes. This has led to a dearth of studies and a consequent underappreciation of many phenomena of fundamental importance to basic and clinical genomic science. We discuss a few protocols for obtaining phase information as well as their limitations, including those that could be used in tumor phasing settings. We then describe a number of biological and clinical phenomena that require phase information. These include phenomena that require precise knowledge of the nucleotide sequence in a chromosomal segment from germline or somatic cells, such as DNA binding events, and insight into unique cis vs. trans-acting functionally impactful variant combinations (for example, variants implicated in a phenotype governed by compound heterozygosity). In addition, we also comment on the need for reliable and consensus-based diploid-context computational workflows for variant identification as well as the need for laboratory-based functional verification strategies for validating cis vs. trans effects of variant combinations. We also briefly describe available resources, example studies, as well as areas of further research, and ultimately argue that the science behind the study of human diploidy, referred to as "diplomics," which will be enabled by nucleotide-level resolution of phased genomes, is a logical next step in the analysis of human genome biology.
Affiliation(s)
- Agnes P Chan
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Yongwook Choi
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Aditya Rangan
- Courant Institute of Mathematical Sciences at New York University, New York, NY, USA
- Guangfa Zhang
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Avijit Podder
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Michael Berens
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Sunil Sharma
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Patrick Pirrotte
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Sara Byron
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Dave Duggan
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Nicholas J Schork
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA

29
Hu Y, Yang C, Zhang L, Zhou X. Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads. Methods Mol Biol 2023; 2590:161-182. [PMID: 36335499 DOI: 10.1007/978-1-0716-2819-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Phasing is essential for determining the origins of each set of alleles in the whole-genome sequencing data of individuals. As such, it provides essential information for the causes of hereditary diseases and the sources of individual variability. Recent technical breakthroughs in linked-read (referred to as co-barcoding in other chapters of the book) and long-read sequencing and downstream analysis have brought the goal of accurate and complete phasing within reach. Here we review recent progress related to the assembly and phasing of personal genomes based on linked-reads and related applications. Motivated by current limitations in generating high-quality diploid assemblies and detecting variants, a new suite of software tools, Aquila, was developed to fully take advantage of linked-read sequencing technology. The overarching goal of Aquila is to exploit the strengths of linked-read technology including long-range connectivity and inherent phasing of variants for reference-assisted local de novo assembly at the whole-genome scale. The diploid nature of the assemblies facilitates detection and phasing of genetic variation, including single nucleotide variations (SNVs), small insertions and deletions (indels), and structural variants (SVs). An extension of Aquila, Aquila_stLFR, focuses on another newly developed linked-reads sequencing technology, single-tube long-fragment read (stLFR). AquilaSV, a region-based diploid assembly approach, is used to characterize structural variants and can achieve diploid assembly in one target region at a time. Lastly, we introduce HAPDeNovo, a program that exploits phasing information from linked-read sequencing to improve detection of de novo mutations. Use of these tools is expected to harness the advantages of linked-reads technology, improve phasing, and advance variant discovery.
Affiliation(s)
- Yunfei Hu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Xin Zhou
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Nashville, TN, USA

30
Qi H, Cong R, Wang Y, Li L, Zhang G. Construction and analysis of the chromosome-level haplotype-resolved genomes of two Crassostrea oyster congeners: Crassostrea angulata and Crassostrea gigas. Gigascience 2022; 12:giad077. [PMID: 37787064 PMCID: PMC10546077 DOI: 10.1093/gigascience/giad077] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 07/24/2023] [Accepted: 08/30/2023] [Indexed: 10/04/2023] Open
Abstract
BACKGROUND: The Portuguese oyster Crassostrea angulata and the Pacific oyster C. gigas are two major Crassostrea species that are naturally distributed along the Northwest Pacific coast and possess great ecological and economic value. Here, we report the construction and comparative analysis of the chromosome-level haplotype-resolved genomes of the two oyster congeners.
FINDINGS: Based on a trio-binning strategy, the PacBio high-fidelity and Illumina Hi-C reads of the offspring of the hybrid cross C. angulata (♂) × C. gigas (♀) were partitioned and independently assembled to construct two chromosome-level fully phased genomes. The assembly size (contig N50 size, BUSCO completeness) of the two genomes were 582.4 M (12.8 M, 99.1%) and 606.4 M (5.46 M, 98.9%) for C. angulata and C. gigas, respectively, ranking at the top of mollusk genomes with high contiguity and integrity. The general features of the two genomes were highly similar, and 15,475 highly conserved ortholog gene pairs shared identical gene structures and similar genomic locations. Highly similar sequences can be primarily identified in the coding regions, whereas most noncoding regions and introns of genes in the same ortholog group contain substantial small genomic and/or structural variations. Based on population resequencing analysis, a total of 2,756 species-specific single-nucleotide polymorphisms and 1,088 genes possibly under selection were identified.
CONCLUSIONS: This is the first report of trio-binned fully phased chromosome-level genomes in marine invertebrates. The study provides fundamental resources for the research on mollusk genetics, comparative genomics, and molecular evolution.
Affiliation(s)
- Haigang Qi
  - CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao 266237, China
  - National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Shandong Technology Innovation Center of Oyster Seed Industry, Qingdao 266105, China
- Rihao Cong
  - CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Shandong Technology Innovation Center of Oyster Seed Industry, Qingdao 266105, China
  - Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
  - The Innovation of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China
- Yanjun Wang
  - Marine Science Data Center, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
- Li Li
  - CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Shandong Technology Innovation Center of Oyster Seed Industry, Qingdao 266105, China
  - Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
  - The Innovation of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guofan Zhang
  - CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao 266237, China
  - National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Shandong Technology Innovation Center of Oyster Seed Industry, Qingdao 266105, China
|
31
|
Logsdon GA, Eichler EE. The Dynamic Structure and Rapid Evolution of Human Centromeric Satellite DNA. Genes (Basel) 2022; 14:92. [PMID: 36672831 PMCID: PMC9859433 DOI: 10.3390/genes14010092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/15/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022] Open
Abstract
The complete sequence of a human genome provided our first comprehensive view of the organization of satellite DNA associated with heterochromatin. We review how our understanding of the genetic architecture and epigenetic properties of human centromeric DNA has advanced as a result. Preliminary studies of human and nonhuman ape centromeres reveal complex, saltatory mutational changes organized around distinct evolutionary layers. Pockets of regional hypomethylation within higher-order α-satellite DNA, termed centromere dip regions, appear to define the site of kinetochore attachment in all human chromosomes, although such epigenetic features can vary even within the same chromosome. Sequence resolution of satellite DNA is providing new insights into centromeric function with potential implications for improving our understanding of human biology and health.
Affiliation(s)
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
32
|
Li J, Wang T, Liu W, Yin D, Lai Z, Zhang G, Zhang K, Ji J, Yin S. A high-quality chromosome-level genome assembly of Pelteobagrus vachelli provides insights into its environmental adaptation and population history. Front Genet 2022; 13:1050192. [DOI: 10.3389/fgene.2022.1050192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open
Abstract
Pelteobagrus vachelli is a freshwater fish with high economic value, but the lack of genome resources has severely restricted its industrial development and population conservation. Here, we constructed the first chromosome-level genome assembly of P. vachelli, with a total length of approximately 662.13 Mb and a contig N50 of 14.02 Mb; scaffolds covering 99.79% of the assembly were anchored to 26 chromosomes. Combining the comparative genome results with transcriptome data under environmental stress (high temperature, hypoxia and Edwardsiella ictaluri infection), we found that the MAPK, PI3K-Akt and apelin signaling pathways play an important role in the environmental adaptation of P. vachelli; these pathways were interconnected by the ErbB family and involved in cell proliferation, differentiation and apoptosis. Population evolution analysis showed that artificial interventions have affected wild populations of P. vachelli. This study provides useful genomic information for the genetic breeding of P. vachelli, as well as a reference for further studies on fish biology and evolution.
|
33
|
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin CS, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, Miga KH. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022; 611:519-531. [PMID: 36261518 PMCID: PMC9668749 DOI: 10.1038/s41586-022-05325-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Received: 12/29/2021] [Accepted: 09/06/2022] [Indexed: 01/01/2023]
Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Affiliation(s)
- Erich D. Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
  - Howard Hughes Medical Institute, Chevy Chase, MD USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
- Andrea Guarracino
  - Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy
- Chentao Yang
  - BGI-Shenzhen, Shenzhen, China
- Jonathan Wood
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Francoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- Mitchell R. Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- Paolo Carnevali
  - Chan Zuckerberg Initiative, Redwood City, CA USA
- Mark J. P. Chaisson
  - Quantitative and Computational Biology, University of Southern California, Los Angeles, CA USA
- Sarah Cody
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Joanna Collins
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Merly Escalona
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA USA
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
- Robert S. Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Lucinda L. Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Shilpa Garg
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jennifer L. Gerton
  - Stowers Institute for Medical Research, Kansas City, MO USA
- Jay Ghurye
  - Dovetail Genomics, Scotts Valley, CA USA
- Richard E. Green
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- William Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- Patrick Hasenfeld
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Alex Hastie
  - Bionano Genomics, San Diego, CA USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Erich B. Jaeger
  - Illumina, Inc., San Diego, CA USA
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Melanie Kirsche
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA USA
- Jan O. Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
- Jonas Korlach
  - Pacific Biosciences, Menlo Park, CA USA
- Joyce Lee
  - Bionano Genomics, San Diego, CA USA
- Daofeng Li
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
- Tina Lindsay
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Julian Lucas
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Feng Luo
  - School of Computing, Clemson University, Clemson, SC USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Matthew W. Mitchell
  - Coriell Institute for Medical Research, Camden, NJ USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Fan Nie
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Hugh E. Olsen
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Nathan D. Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO USA
- Daniela Puiu
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Allison Regier
  - DNAnexus, Mountain View, CA USA
- Jue Ruan
  - Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Steven L. Salzberg
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Ashley D. Sanders
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Valerie A. Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Alaina Shumate
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Nathan O. Stitziel
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO USA
  - Cardiovascular Division, John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, USA
- Catherine Stober
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- James Torrance
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Aaron Wenger
  - Pacific Biosciences, Menlo Park, CA USA
- Chuanle Xiao
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Aleksey V. Zimin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Guojie Zhang
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Ting Wang
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN USA
- David Haussler
  - Howard Hughes Medical Institute, Chevy Chase, MD USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA USA
- Ira Hall
  - Yale School of Medicine, New Haven, CT USA
- Justin M. Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Evan E. Eichler
  - Howard Hughes Medical Institute, Chevy Chase, MD USA
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- Adam M. Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
|
34
|
Ko BJ, Lee C, Kim J, Rhie A, Yoo DA, Howe K, Wood J, Cho S, Brown S, Formenti G, Jarvis ED, Kim H. Widespread false gene gains caused by duplication errors in genome assemblies. Genome Biol 2022; 23:205. [PMID: 36167596 PMCID: PMC9516828 DOI: 10.1186/s13059-022-02764-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 12/22/2022] Open
Abstract
Background: False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in popularly used previous genome assemblies for platypus, zebra finch, and Anna’s Hummingbird, and their new counterparts of the same species generated by the Vertebrate Genomes Project, of which the Vertebrate Genomes Project pipeline attempted to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project where there was a prior chromosomal level reference assembly to compare with. Results: Whole genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were relatively more divergent than other parts of the genome leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications compared to other gene families. Although present in a smaller proportion, we observe false duplications remaining in the Vertebrate Genomes Project assemblies that can be identified and purged. Conclusions: This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses on gene gains. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02764-1.
Affiliation(s)
- Byung June Ko
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Chul Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Juwan Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
- Dong Ahn Yoo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Seoae Cho
  - eGnome, Inc, Seoul, Republic of Korea
- Samara Brown
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Giulio Formenti
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Erich D Jarvis
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Heebal Kim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - eGnome, Inc, Seoul, Republic of Korea
|
35
|
Genomic analyses of the Linum distyly supergene reveal convergent evolution at the molecular level. Curr Biol 2022; 32:4360-4371.e6. [PMID: 36087578 DOI: 10.1016/j.cub.2022.08.042] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/11/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022]
Abstract
Supergenes govern multi-trait-balanced polymorphisms in a wide range of systems; however, our understanding of their origins and evolution remains incomplete. The reciprocal placement of stigmas and anthers in pin and thrum floral morphs of distylous species constitutes an iconic example of a balanced polymorphism governed by a supergene, the distyly S-locus. Recent studies have shown that the Primula and Turnera distyly supergenes are both hemizygous in thrums, but it remains unknown whether hemizygosity is pervasive among distyly S-loci. As hemizygosity has major consequences for supergene evolution and loss, clarifying whether this genetic architecture is shared among distylous species is critical. Here, we have characterized the genetic architecture and evolution of the distyly supergene in Linum by generating a chromosome-level genome assembly of Linum tenue, followed by the identification of the S-locus using population genomic data. We show that hemizygosity and thrum-specific expression of S-linked genes, including a pistil-expressed candidate gene for style length, are major features of the Linum S-locus. Structural variation is likely instrumental for recombination suppression, and although the non-recombining dominant haplotype has accumulated transposable elements, S-linked genes are not under relaxed purifying selection. Our findings reveal remarkable convergence in the genetic architecture and evolution of independently derived distyly supergenes, provide a counterexample to classic inversion-based supergenes, and shed new light on the origin and maintenance of an iconic floral polymorphism.
|
36
|
Ji Y, Feng S, Wu L, Fang Q, Brüniche-Olsen A, DeWoody JA, Cheng Y, Zhang D, Hao Y, Song G, Qu Y, Suh A, Zhang G, Hackett SJ, Lei F. Orthologous microsatellites, transposable elements, and DNA deletions correlate with generation time and body mass in neoavian birds. Sci Adv 2022; 8:eabo0099. [PMID: 36044583 PMCID: PMC9432842 DOI: 10.1126/sciadv.abo0099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
The rate of mutation accumulation in germline cells can be affected by cell replication and/or DNA damage, which are further related to life history traits such as generation time and body mass. Leveraging the existing datasets of 233 neoavian bird species, here, we investigated whether generation time and body mass contribute to the interspecific variation of orthologous microsatellite length, transposable element (TE) length, and deletion length and how these genomic attributes affect genome sizes. In nonpasserines, we found that generation time is correlated to both orthologous microsatellite length and TE length, and body mass is negatively correlated to DNA deletions. These patterns are less pronounced in passerines. In all species, we found that DNA deletions relate to genome size similarly as TE length, suggesting a role of body mass dynamics in genome evolution. Our results indicate that generation time and body mass shape the evolution of genomic attributes in neoavian birds.
Affiliation(s)
- Yanzhu Ji
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - Negaunee Integrative Research Center, Field Museum of Natural History, Chicago, IL 60605, USA
- Shaohong Feng
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen 518083, China
  - Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
  - Evolutionary and Organismal Biology Research Center, Zhejiang University School of Medicine, Hangzhou, China
- Lei Wu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
- Qi Fang
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen 518083, China
  - Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
- Anna Brüniche-Olsen
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
- J. Andrew DeWoody
  - Departments of Forestry and Natural Resources and Biological Sciences, Purdue University, West Lafayette, IN 47906, USA
- Yalin Cheng
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Dezhi Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Yan Hao
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Gang Song
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Yanhua Qu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Alexander Suh
  - School of Biological Sciences, Organism and Environment, University of East Anglia, NR4 7TU, Norwich, UK
  - Department of Organismal Biology, Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala SE-752 36, Sweden
- Guojie Zhang
  - Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
  - Evolutionary and Organismal Biology Research Center, Zhejiang University School of Medicine, Hangzhou, China
  - Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou 311121, China
  - Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
  - Women’s Hospital, School of Medicine, Zhejiang University, Shangcheng District, Hangzhou, 310006, China
- Shannon J. Hackett
  - Negaunee Integrative Research Center, Field Museum of Natural History, Chicago, IL 60605, USA
- Fumin Lei
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650201, China
|
37
|
Dahn HA, Mountcastle J, Balacco J, Winkler S, Bista I, Schmitt AD, Pettersson OV, Formenti G, Oliver K, Smith M, Tan W, Kraus A, Mac S, Komoroske LM, Lama T, Crawford AJ, Murphy RW, Brown S, Scott AF, Morin PA, Jarvis ED, Fedrigo O. Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing. Gigascience 2022; 11:6659719. [PMID: 35946988 PMCID: PMC9364683 DOI: 10.1093/gigascience/giac068] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/06/2021] [Revised: 01/26/2022] [Accepted: 06/16/2022] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types. RESULTS We find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4°C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4°C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield. CONCLUSION We provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ∼70,000 extant vertebrate species.
Affiliation(s)
- Sylke Winkler
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany
- Iliana Bista
- Tree of Life Program, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge, Cambridgeshire CB2 3EH, UK
- Karen Oliver
- Tree of Life Program, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Michelle Smith
- Tree of Life Program, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Wenhua Tan
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany
- Anne Kraus
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany
- Stephen Mac
- Arima Genomics, Inc., San Diego, CA 92121, USA
- Lisa M Komoroske
- Department of Environmental Conservation, University of Massachusetts Amherst, Amherst, MA 01003-9285, USA
- Tanya Lama
- Department of Environmental Conservation, University of Massachusetts Amherst, Amherst, MA 01003-9285, USA
- Andrew J Crawford
- Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
- Robert W Murphy
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Samara Brown
- The Rockefeller University, New York, NY 10065, USA
- Alan F Scott
- Department of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Phillip A Morin
- Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, La Jolla, CA 92037, USA
- Erich D Jarvis
- The Rockefeller University, New York, NY 10065, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Olivier Fedrigo
- Correspondence address. Olivier Fedrigo, Vertebrate Genome Laboratory, The Rockefeller University, 1230 York Avenue, Box 366, New York, NY 10065, USA. E-mail:
|
38
|
Guk J, Jang M, Choi J, Lee YM, Kim S. De novo phasing resolves haplotype sequences in complex plant genomes. PLANT BIOTECHNOLOGY JOURNAL 2022; 20:1031-1041. [PMID: 35332665 PMCID: PMC9129073 DOI: 10.1111/pbi.13815] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/27/2021] [Revised: 02/07/2022] [Accepted: 03/20/2022] [Indexed: 05/12/2023]
Abstract
Genome phasing is a recently developed assembly method that separates heterozygous eukaryotic genomic regions and builds haplotype-resolved assemblies. Because differences between haplotypes are ignored in most published de novo genomes, assemblies are available as consensus genomes consisting of haplotype mixtures, thus increasing the need for genome phasing. Here, we review the operating principles and characteristics of several freely available and widely used phasing tools (TrioCanu, FALCON-Phase, and ALLHiC). An examination of downstream analyses using haplotype-resolved genome assemblies in plants indicated significant differences among haplotypes regarding chromosomal rearrangements, sequence insertions, and expression of specific alleles that contribute to the acquisition of the biological characteristics of plant species. Finally, we suggest directions for addressing the limitations of current genome-phasing methods. This review provides insights into the current progress, limitations, and future directions of de novo genome phasing, which will enable researchers to easily access and utilize genome phasing in studies involving highly heterozygous complex plant genomes.
Affiliation(s)
- Ji‐Yoon Guk
- Department of Environmental Horticulture, University of Seoul, Seoul, Korea
- Min‐Jeong Jang
- Department of Environmental Horticulture, University of Seoul, Seoul, Korea
- Jin‐Wook Choi
- Department of Environmental Horticulture, University of Seoul, Seoul, Korea
- Yeon Mi Lee
- Department of Environmental Horticulture, University of Seoul, Seoul, Korea
- Seungill Kim
- Department of Environmental Horticulture, University of Seoul, Seoul, Korea
|
39
|
Carey SB, Lovell JT, Jenkins J, Leebens-Mack J, Schmutz J, Wilson MA, Harkess A. Representing sex chromosomes in genome assemblies. CELL GENOMICS 2022; 2. [PMID: 35720975 PMCID: PMC9205529 DOI: 10.1016/j.xgen.2022.100132] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Abstract
Sex chromosomes have evolved hundreds of independent times across eukaryotes. As genome sequencing, assembly, and scaffolding techniques rapidly improve, it is now feasible to build fully phased sex chromosome assemblies. Despite technological advances enabling phased assembly of whole chromosomes, there are currently no standards for representing sex chromosomes when publicly releasing a genome. Furthermore, most computational analysis tools are unable to efficiently investigate their unique biology relative to autosomes. We discuss a diversity of sex chromosome systems and consider the challenges of representing sex chromosome pairs in genome assemblies. By addressing these issues now as technologies for full phasing of chromosomal assemblies are maturing, we can collectively ensure that future genome analysis toolkits can be broadly applied to all eukaryotes with diverse types of sex chromosome systems. Here we provide best practice guidelines for presenting a genome assembly that contains sex chromosomes. These guidelines can also be applied to other non-recombining genomic regions, such as S-loci in plants and mating-type loci in fungi and algae.
Affiliation(s)
- Sarah B Carey
- Department of Crop, Soil, and Environmental Sciences, Auburn University, Auburn, AL 36849, USA
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- John T Lovell
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
- Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Melissa A Wilson
- School of Life Sciences, Center for Evolution and Medicine, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Alex Harkess
- Department of Crop, Soil, and Environmental Sciences, Auburn University, Auburn, AL 36849, USA
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
|
40
|
Gupta PK. Earth Biogenome Project: present status and future plans. Trends Genet 2022; 38:811-820. [DOI: 10.1016/j.tig.2022.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/11/2022] [Accepted: 04/22/2022] [Indexed: 10/18/2022]
|
41
|
Nashima K, Shirasawa K, Isobe S, Urasaki N, Tarora K, Irei A, Shoda M, Takeuchi M, Omine Y, Nishiba Y, Sugawara T, Kunihisa M, Nishitani C, Yamamoto T. Gene prediction for leaf margin phenotype and fruit flesh color in pineapple (Ananas comosus) using haplotype-resolved genome sequencing. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 110:720-734. [PMID: 35122338 DOI: 10.1111/tpj.15699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Revised: 01/17/2022] [Accepted: 01/31/2022] [Indexed: 06/14/2023]
Abstract
Pineapple (Ananas comosus (L.) Merr.) is one of the most economically important tropical fruit species. The major aim of the breeding programs in several countries, including Japan, is quality improvement, mainly for the fresh market. ‘Yugafu’, a Japanese cultivar with distinctive pipe-type leaf margin phenotype and white flesh color, is popular for fresh consumption. Therefore, genome sequencing of ‘Yugafu’ is expected to assist pineapple breeding. Here, we developed a haplotype-resolved assembly for the heterozygous genome of ‘Yugafu’ using long-read sequencing technology and obtained a pair of 25 pseudomolecule sequences inherited from the parental accessions ‘Cream pineapple’ and ‘HI101’. The causative genes for leaf margin and fruit flesh color were identified. Fine mapping revealed a 162-kb region on CLG23 for the leaf margin phenotype. In this region, 20 kb of inverted repeat was specifically observed in the ‘Cream pineapple’ derived allele, and the WUSCHEL-related homeobox 3 (AcWOX3) gene was predicted as the key gene for leaf margin morphogenesis. Dominantly repressed AcWOX3 via RNAi was suggested to be the cause of the pipe-type leaf margin phenotype. Quantitative trait locus (QTL) analysis revealed that the terminal region of CLG08 contributed to white flesh and low carotenoid content. Carotenoid cleaved dioxygenase 4 (AcCCD4), a key gene for carotenoid degradation underlying this QTL, was predicted as the key gene for white flesh color through expression analysis. These findings could assist in modern pineapple breeding and facilitate marker-assisted selection for important traits.
Affiliation(s)
- Kenji Nashima
- College of Bioresource Sciences, Nihon University, Fujisawa, Kanagawa, 252-0880, Japan
- Kenta Shirasawa
- Kazusa DNA Research Institute, Kisarazu, Chiba, 292-0813, Japan
- Sachiko Isobe
- Kazusa DNA Research Institute, Kisarazu, Chiba, 292-0813, Japan
- Naoya Urasaki
- Okinawa Prefectural Agricultural Research Center, Itoman, Okinawa, 901-0336, Japan
- Kazuhiko Tarora
- Okinawa Prefectural Agricultural Research Center, Itoman, Okinawa, 901-0336, Japan
- Ayaka Irei
- Okinawa Prefectural Agricultural Research Center, Itoman, Okinawa, 901-0336, Japan
- Moriyuki Shoda
- Okinawa Prefectural Agricultural Research Center Nago Branch, Nago, Okinawa, 905-0012, Japan
- Makoto Takeuchi
- Okinawa Prefectural Agricultural Research Center Nago Branch, Nago, Okinawa, 905-0012, Japan
- Yuta Omine
- Okinawa Prefectural Agricultural Research Center Nago Branch, Nago, Okinawa, 905-0012, Japan
- Yoichi Nishiba
- Kyushu Okinawa Agricultural Research Center, NARO, Koshi, Kumamoto, 861-1192, Japan
- Terumi Sugawara
- Kyushu Okinawa Agricultural Research Center, NARO, Koshi, Kumamoto, 861-1192, Japan
- Miyuki Kunihisa
- Institute of Fruit Tree and Tea Science, NARO, Tsukuba, Ibaraki, 305-0852, Japan
- Chikako Nishitani
- Institute of Fruit Tree and Tea Science, NARO, Tsukuba, Ibaraki, 305-0852, Japan
- Toshiya Yamamoto
- Institute of Fruit Tree and Tea Science, NARO, Tsukuba, Ibaraki, 305-0852, Japan
|
42
|
Lin JH, Chen LC, Yu SC, Huang YT. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 2022; 38:1816-1822. [PMID: 35104333 DOI: 10.1093/bioinformatics/btac058] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 01/26/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Long-read phasing has been used for reconstructing diploid genomes, improving variant calling and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large Structural Variations (SVs), and the efficiency is unsatisfactory for population-scale phasing. RESULTS This article presents a novel algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in 10-20 min, 10× faster than the state-of-the-art WhatsHap, HapCUT2 and Margin. In particular, co-phasing SNPs and SVs produces much larger haplotype blocks (N50 = 25 Mbp) than those of existing methods (N50 = 10-15 Mbp). We show that LongPhase combined with Nanopore ultra-long reads is a cost-effective and highly contiguous solution, which can produce between one and 26 blocks per chromosome arm without the need for additional trios, chromosome-conformation and strand-seq data. AVAILABILITY AND IMPLEMENTATION LongPhase is freely available at https://github.com/twolinin/LongPhase/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jyun-Hong Lin
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Liang-Chi Chen
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Shu-Chi Yu
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Yao-Ting Huang
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
|
43
|
Duan H, Jones AW, Hewitt T, Mackenzie A, Hu Y, Sharp A, Lewis D, Mago R, Upadhyaya NM, Rathjen JP, Stone EA, Schwessinger B, Figueroa M, Dodds PN, Periyannan S, Sperschneider J. Physical separation of haplotypes in dikaryons allows benchmarking of phasing accuracy in Nanopore and HiFi assemblies with Hi-C data. Genome Biol 2022; 23:84. [PMID: 35337367 PMCID: PMC8957140 DOI: 10.1186/s13059-022-02658-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/29/2021] [Accepted: 03/21/2022] [Indexed: 12/21/2022] Open
Abstract
Background Most animals and plants have more than one set of chromosomes and package these haplotypes into a single nucleus within each cell. In contrast, many fungal species carry multiple haploid nuclei per cell. Rust fungi are such species with two nuclei (karyons) that contain a full set of haploid chromosomes each. The physical separation of haplotypes in dikaryons means that, unlike in diploids, Hi-C chromatin contacts between haplotypes are false-positive signals. Results We generate the first chromosome-scale, fully-phased assembly for the dikaryotic leaf rust fungus Puccinia triticina and compare Nanopore MinION and PacBio HiFi sequence-based assemblies. We show that false-positive Hi-C contacts between haplotypes are predominantly caused by phase switches rather than by collapsed regions or Hi-C read mis-mappings. We introduce a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs, including a phase switch correction step. In the HiFi assembly, relatively few phase switches occur, and these are predominantly located at haplotig boundaries and can be readily corrected. In contrast, phase switches are widespread throughout the Nanopore assembly. We show that haploid genome read coverage of 30–40 times using HiFi sequencing is required for phasing of the leaf rust genome, with 0.7% heterozygosity, and that HiFi sequencing resolves genomic regions with low heterozygosity that are otherwise collapsed in the Nanopore assembly. Conclusions This first Hi-C based phasing pipeline for dikaryons and comparison of long-read sequencing technologies will inform future genome assembly and haplotype phasing projects in other non-haploid organisms. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02658-2.
Affiliation(s)
- Hongyu Duan
- Biological Data Science Institute, The Australian National University, Canberra, Australia
- Ashley W Jones
- Research School of Biology, The Australian National University, Canberra, Australia
- Tim Hewitt
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Amy Mackenzie
- Research School of Biology, The Australian National University, Canberra, Australia
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Yiheng Hu
- Research School of Biology, The Australian National University, Canberra, Australia
- Anna Sharp
- Research School of Biology, The Australian National University, Canberra, Australia
- Current Address: John Curtin School of Medical Research, The Australian National University, Canberra, Australia
- David Lewis
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Rohit Mago
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Narayana M Upadhyaya
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- John P Rathjen
- Research School of Biology, The Australian National University, Canberra, Australia
- Eric A Stone
- Biological Data Science Institute, The Australian National University, Canberra, Australia
- Melania Figueroa
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Peter N Dodds
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Sambasivam Periyannan
- Research School of Biology, The Australian National University, Canberra, Australia
- Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
- Jana Sperschneider
- Biological Data Science Institute, The Australian National University, Canberra, Australia
- Current Address: Black Mountain Science and Innovation Park, CSIRO Agriculture and Food, Canberra, Australia
|
44
|
Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol 2022; 40:1332-1335. [PMID: 35332338 PMCID: PMC9464699 DOI: 10.1038/s41587-022-01261-x] [Citation(s) in RCA: 123] [Impact Index Per Article: 61.5] [Received: 09/10/2021] [Accepted: 02/14/2022] [Indexed: 12/29/2022]
Abstract
Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.
Affiliation(s)
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Erich D. Jarvis
- The Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065
- Howard Hughes Medical Institute, Chevy Chase, MD, 20815
- Olivier Fedrigo
- The Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065
- Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, D.C., 20008, USA
- ITMO University, Computer Technologies Laboratory, St. Petersburg 197101, Russia
- Lara Urban
- Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Neil J. Gemmell
- Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
45
|
Abstract
A decade of progress in whole-genome sequencing techniques has imbued researchers with the confidence to sequence all eukaryotic life on earth. But what will be essential to their success, and what challenges await them?
|
46
|
Ramos L, Antunes A. Decoding sex: Elucidating sex determination and how high-quality genome assemblies are untangling the evolutionary dynamics of sex chromosomes. Genomics 2022; 114:110277. [PMID: 35104609 DOI: 10.1016/j.ygeno.2022.110277] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/07/2021] [Revised: 12/22/2021] [Accepted: 01/26/2022] [Indexed: 11/28/2022]
Abstract
Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.
Affiliation(s)
- Luana Ramos
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
|
47
|
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. QUANTITATIVE PLANT BIOLOGY 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences are offering competing long-read sequencing technologies and enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups and sequences of smaller plant genomes can be completed within days. This also resulted in an increased investigation of genomes from multiple species in large scale to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for the intra-specific diversity by switching from the use of single reference genome sequences to a pangenome graph.
Affiliation(s)
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Author for correspondence: Boas Pucker E-mail:
- Iker Irisarri
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Jan de Vries
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
- Bo Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
|
48
|
Zhang B, Chen S, Liu J, Yan YB, Chen J, Li D, Liu JY. A High-Quality Haplotype-Resolved Genome of Common Bermudagrass (Cynodon dactylon L.) Provides Insights Into Polyploid Genome Stability and Prostrate Growth. FRONTIERS IN PLANT SCIENCE 2022; 13:890980. [PMID: 35548270 PMCID: PMC9081840 DOI: 10.3389/fpls.2022.890980] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/07/2022] [Accepted: 04/04/2022] [Indexed: 05/03/2023]
Abstract
Common bermudagrass (Cynodon dactylon L.) is an important perennial warm-season turfgrass species with great economic value. However, the reference genome is still deficient in C. dactylon, which severely impedes basic studies and breeding studies. In this study, a high-quality haplotype-resolved genome of C. dactylon cultivar Yangjiang was successfully assembled using a combination of multiple sequencing strategies. The assembled genome is approximately 1.01 Gb in size and is comprised of 36 pseudo chromosomes belonging to four haplotypes. In total, 76,879 protein-coding genes and 529,092 repeat sequences were annotated in the assembled genome. Evolution analysis indicated that C. dactylon underwent two rounds of whole-genome duplication events, whereas syntenic and transcriptome analysis revealed that global subgenome dominance was absent among the four haplotypes. Genome-wide gene family analyses further indicated that homologous recombination-regulating genes and tiller-angle-regulating genes all showed an adaptive evolution in C. dactylon, providing insights into genome-scale regulation of polyploid genome stability and prostrate growth. These results not only facilitate a better understanding of the complex genome composition and unique plant architectural characteristics of common bermudagrass, but also offer a valuable resource for comparative genome analyses of turfgrasses and other plant species.
Affiliation(s)
- Bing Zhang
- School of Life Sciences, Tsinghua University, Beijing, China
- College of Animal Science and Technology, Yangzhou University, Yangzhou, China
- Si Chen
- College of Animal Science and Technology, Yangzhou University, Yangzhou, China
- Jianxiu Liu
- Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Yong-Bin Yan
- School of Life Sciences, Tsinghua University, Beijing, China
- Jingbo Chen
- Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Dandan Li
- Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Jin-Yuan Liu
- School of Life Sciences, Tsinghua University, Beijing, China
- *Correspondence: Jin-Yuan Liu,
|
49
|
Tait BD. The importance of establishing genetic phase in clinical medicine. Int J Immunogenet 2021; 49:1-7. [PMID: 34958529 DOI: 10.1111/iji.12567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/28/2021] [Revised: 11/15/2021] [Accepted: 11/19/2021] [Indexed: 12/27/2022]
Abstract
Haplotyping, or determination of genetic phase, has always played a pivotal role in MHC (HLA) studies, both in helping to understand inheritance patterns in diseases such as type 1 diabetes (T1D) and in ensuring better matching in transplantation scenarios such as haematopoietic stem cell transplantation (HSCT) using donors genetically related to the patient. In recent years the need to establish genetic phase in a number of clinical scenarios has become apparent. These include: Genetic phasing for haematopoietic stem cell transplants using unrelated donors, where the HLA haplotypes are not known but where haplotype-matched recipients fare better clinically than allele-matched but haplotype-mismatched patients. The use of checkpoint inhibitors, one of the most innovative and exciting developments in cancer treatment in years. An example is the use of the monoclonal ipilimumab to block the CTLA-4 receptor, which is known to contain polymorphic sites. Until the phase of these polymorphisms is known it will not be possible to determine how effectively this monoclonal will perform in individual patients. The role of miRNA single-strand molecules and their effect on gene expression. Thousands of non-coding genes have been identified and have been shown to be polymorphic, as have their target genes. Genetic phasing of polymorphisms both in the miRNA source genes and in their targets is clearly a fertile area of research. In areas such as drug metabolism, where the polymorphic family of CYP genes is responsible for the metabolism of the majority of prescription drugs, determining the phase of SNPs is critical to understanding drug metabolism and efficacy. In multigenic disease studies, combinations of single nucleotide polymorphisms (SNPs) in participating genes require accurate phasing in order to fully appreciate their role in the disease process. In addition, the level of expression of genes (point 3) is also important in understanding disease processes at the functional level.
This review outlines the techniques that are currently available for approximating phase and discusses the clinical relevance of establishing genetic phase in areas of clinical medicine outlined in points 1-3.
Affiliation(s)
- Brian D Tait
- Haplomic Technologies, Melbourne, Australia; Department of Medicine, University of Melbourne, Royal Melbourne Hospital, Australia
|
50
|
Mansfeld BN, Boyher A, Berry JC, Wilson M, Ou S, Polydore S, Michael TP, Fahlgren N, Bart RS. Large structural variations in the haplotype-resolved African cassava genome. The Plant Journal: for Cell and Molecular Biology 2021; 108:1830-1848. [PMID: 34661327 PMCID: PMC9299708 DOI: 10.1111/tpj.15543] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/25/2021] [Revised: 09/29/2021] [Accepted: 10/06/2021] [Indexed: 05/12/2023]
Abstract
Cassava (Manihot esculenta Crantz, 2n = 36) is a global food security crop. It has a highly heterozygous genome, high genetic load, and genotype-dependent asynchronous flowering. It is typically propagated by stem cuttings and any genetic variation between haplotypes, including large structural variations, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this results in artifacts and an oversimplification of heterozygous regions. We used a combination of Pacific Biosciences (PacBio), Illumina, and Hi-C to resolve each haplotype of the genome of a farmer-preferred cassava line, TME7 (Oko-iyawo). PacBio reads were assembled using the FALCON suite. Phase switch errors were corrected using FALCON-Phase and Hi-C read data. The ultralong-range information from Hi-C sequencing was also used for scaffolding. Comparison of the two phases revealed >5000 large haplotype-specific structural variants affecting over 8 Mb, including insertions and deletions spanning thousands of base pairs. The potential of these variants to affect allele-specific expression was further explored. RNA-sequencing data from 11 different tissue types were mapped against the scaffolded haploid assembly and gene expression data are incorporated into our existing easy-to-use web-based interface to facilitate use by the broader plant science community. These two assemblies provide an excellent means to study the effects of heterozygosity, haplotype-specific structural variation, gene hemizygosity, and allele-specific gene expression contributing to important agricultural traits and further our understanding of the genetics and domestication of cassava.
Affiliation(s)
- Adam Boyher
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Mark Wilson
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Seth Polydore
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Todd P. Michael
- The Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Noah Fahlgren
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
|