1
|
L Rocha J, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Collapse
Affiliation(s)
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
| | - Runyang N Lou
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
| | - Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA.
| |
Collapse
|
2
|
Ivancevic A, Simpson DM, Joyner OM, Bagby SM, Nguyen LL, Bitler BG, Pitts TM, Chuong EB. Endogenous retroviruses mediate transcriptional rewiring in response to oncogenic signaling in colorectal cancer. SCIENCE ADVANCES 2024; 10:eado1218. [PMID: 39018396 PMCID: PMC466953 DOI: 10.1126/sciadv.ado1218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 06/13/2024] [Indexed: 07/19/2024]
Abstract
Cancer cells exhibit rewired transcriptional regulatory networks that promote tumor growth and survival. However, the mechanisms underlying the formation of these pathological networks remain poorly understood. Through a pan-cancer epigenomic analysis, we found that primate-specific endogenous retroviruses (ERVs) are a rich source of enhancers displaying cancer-specific activity. In colorectal cancer and other epithelial tumors, oncogenic MAPK/AP1 signaling drives the activation of enhancers derived from the primate-specific ERV family LTR10. Functional studies in colorectal cancer cells revealed that LTR10 elements regulate tumor-specific expression of multiple genes associated with tumorigenesis, such as ATG12 and XRCC4. Within the human population, individual LTR10 elements exhibit germline and somatic structural variation resulting from a highly mutable internal tandem repeat region, which affects AP1 binding activity. Our findings reveal that ERV-derived enhancers contribute to transcriptional dysregulation in response to oncogenic signaling and shape the evolution of cancer-specific regulatory networks.
Collapse
Affiliation(s)
- Atma Ivancevic
- BioFrontiers Institute and Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
| | - David M. Simpson
- BioFrontiers Institute and Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
| | - Olivia M. Joyner
- BioFrontiers Institute and Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
| | - Stacey M. Bagby
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lily L. Nguyen
- BioFrontiers Institute and Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Ben G. Bitler
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Todd M. Pitts
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Edward B. Chuong
- BioFrontiers Institute and Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
| |
Collapse
|
3
|
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Collapse
Affiliation(s)
- Yanfeng Ji
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Junfan Zhao
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Jiao Gong
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China.
| |
Collapse
|
4
|
Huang A, Feng S, Ye Z, Zhang T, Chen S, Chen C, Chen S. Genome Assembly and Structural Variation Analysis of Luffa acutangula Provide Insights on Flowering Time and Ridge Development. PLANTS (BASEL, SWITZERLAND) 2024; 13:1828. [PMID: 38999668 PMCID: PMC11243878 DOI: 10.3390/plants13131828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/29/2024] [Revised: 06/21/2024] [Accepted: 06/21/2024] [Indexed: 07/14/2024]
Abstract
Luffa spp. is an important worldwide cultivated vegetable and medicinal plant from the Cucurbitaceae family. In this study, we report a high-quality chromosome-level genome of the high-generation inbred line SG261 of Luffa acutangula. The genomic sequence was determined by PacBio long reads, Hi-C sequencing reads, and 10× Genomics sequencing, with an assembly size of 739.82 Mb, contig N50 of 18.38 Mb, and scaffold N50 of 56.08 Mb. The genome of L. acutangula SG261 was predicted to contain 27,312 protein-coding genes and 72.56% repetitive sequences, of which long terminal repeats (LTRs) were an important form of repetitive sequences, accounting for 67.84% of the genome. Phylogenetic analysis reveals that L. acutangula evolved later than Luffa cylindrica, and Luffa is closely related to Momodica charantia. Comparing the genome of L. acutangula SG261 and L. cylindrica with PacBio data, 67,128 high-quality structural variations (SVs) and 55,978 presence-absence variations (PAVs) were identified in SG261, resulting in 2424 and 1094 genes with variation in the CDS region, respectively, and there are 287 identical genes affected by two different structural variation analyses. In addition, we found that the transcription factor FY (FLOWERING LOCUS Y) families had a large expansion in L. acutangula SG261 (flowering in the morning) compared to L. cylindrica (flowering in the afternoon), which may result in the early flowering time in L. acutangula SG261. This study provides valuable reference for the breeding of and pan-genome research into Luffa species.
Collapse
Affiliation(s)
- Aizheng Huang
- Institute of Agricultural Science Research of Jiangmen, Jiangmen 529060, China;
| | - Shuo Feng
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China; (S.F.)
| | - Zhuole Ye
- Dongguan Agricultural Scientific Research Center, Dongguan 523086, China
| | - Ting Zhang
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China; (S.F.)
| | - Shenglong Chen
- Dongguan Agricultural Scientific Research Center, Dongguan 523086, China
| | - Changming Chen
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China; (S.F.)
| | - Shijun Chen
- Institute of Agricultural Science Research of Jiangmen, Jiangmen 529060, China;
| |
Collapse
|
5
|
Ou J, Wang J, Sun J, Ni M, Meng Q, Ding J, Fan H, Feng S, Huang Y, Li H, Fei J. Analysis of Preimplantation and Clinical Outcomes of Two Cases by Oxford Nanopore Sequencing. Reprod Sci 2024; 31:2123-2134. [PMID: 38347380 DOI: 10.1007/s43032-024-01470-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 01/19/2024] [Indexed: 07/03/2024]
Abstract
It is challenging to distinguish embryos with a balanced translocation karyotype from a normal karyotype by existing conventional genetic testing methods. However, in germ-cell gamete generation, chromosome exchange and separation through cell meiosis form a different proportion of unbalanced gametes. Adverse birth events may occur, such as repeated miscarriages and fetal birth defects. In this study, the exact breakpoints of structural variation (SV) from two balanced translocation carrier families by using Nanopore long reads sequencing technology were obtained, and haplotype analysis and Sanger verified the accuracy of the detection results, confirming the application value of the Nanopore sequencing technology in the detection of balanced translocation before embryo implantation. Nanopore long-read sequencing was performed to find the precise breakpoint of chromosome-balanced translocation carriers. The breakpoints were subsequently verified by designing primers across the breakpoints and Sanger sequencing. Haplotype linkage analysis of SNPs which can be linked by a read block of families around the breakpoint regions was followed. After frozen (-thawed) embryo transfer (FET), prenatal cytogenetic analysis of amniotic fluid cells confirmed the predicted karyotypes from the transferred embryos. The presence of breakpoints was detected in three embryos of patient 1. No breakpoints were detected in either embryo of patient 2. One balanced translocated embryo from patient 1 and one normal euploid embryo from patient 2 were transplanted back into the patients, and amniotic fluid cells were analyzed for the karyotype of fetuses. The results were entirely consistent with the fetal karyotype. And through late follow-up, both patients successfully had a live birth fetus. The breakpoint location of the balanced chromosome translocation can be accurately found by Nanopore sequencing. The haplotype of carriers can be successfully constructed by Nanopore and sanger sequencing confirmed that the results were accurate. This is very advantageous for preimplantation genetic testing for chromosomal structural rearrangements (PGT-SR) detection in the families without proband.
Collapse
Affiliation(s)
- Jian Ou
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China
| | | | - Jian Sun
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China
| | - Mengxia Ni
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China
| | - QingXia Meng
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China
| | - Jie Ding
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China
| | - Haiyang Fan
- Peking Jabrehoo Med-Tech Co., Ltd, Beijing, China
| | - Shaohua Feng
- Peking Jabrehoo Med-Tech Co., Ltd, Beijing, China
| | - Yining Huang
- Peking Jabrehoo Med-Tech Co., Ltd, Beijing, China
| | - Hong Li
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Suzhou, China.
| | - Jia Fei
- Peking Jabrehoo Med-Tech Co., Ltd, Beijing, China.
| |
Collapse
|
6
|
Catlin NS, Agha HI, Platts AE, Munasinghe M, Hirsch CN, Josephs EB. Structural variants contribute to phenotypic variation in maize. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.14.599082. [PMID: 38948717 PMCID: PMC11212879 DOI: 10.1101/2024.06.14.599082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
Comprehensively identifying the loci shaping trait variation has been challenging, in part because standard approaches often miss many types of genetic variants. Structural variants, especially transposable elements are likely to affect phenotypic variation but we need better methods in maize for detecting polymorphic structural variants and TEs using short-read sequencing data. Here, we used a whole genome alignment between two maize genotypes to identify polymorphic structural variants and then genotyped a large maize diversity panel for these variants using short-read sequencing data. We characterized variation of SVs within the panel and identified SV polymorphisms that are associated with life history traits and genotype-by-environment interactions. While most of the SVs associated with traits contained TEs, only one of the SV's boundaries clearly matched TE breakpoints indicative of a TE insertion, whereas the other polymorphisms were likely caused by deletions. All of the SVs associated with traits were in linkage disequilibrium with nearby single nucleotide polymorphisms (SNPs), suggesting that this method did not identify variants that would have been missed in a SNP association study.
Collapse
Affiliation(s)
- Nathan S. Catlin
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | - Husain I. Agha
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | - Adrian E. Platts
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Manisha Munasinghe
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Emily B. Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| |
Collapse
|
7
|
Sun W, Xiong D, Ouyang J, Xiao X, Jiang Y, Wang Y, Li S, Xie Z, Wang J, Tang Z, Zhang Q. Altered chromatin topologies caused by balanced chromosomal translocation lead to central iris hypoplasia. Nat Commun 2024; 15:5048. [PMID: 38871723 DOI: 10.1038/s41467-024-49376-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Despite the advent of genomic sequencing, molecular diagnosis remains unsolved in approximately half of patients with Mendelian disorders, largely due to unclarified functions of noncoding regions and the difficulty in identifying complex structural variations. In this study, we map a unique form of central iris hypoplasia in a large family to 6q15-q23.3 and 18p11.31-q12.1 using a genome-wide linkage scan. Long-read sequencing reveals a balanced translocation t(6;18)(q22.31;p11.22) with intergenic breakpoints. By performing Hi-C on induced pluripotent stem cells from a patient, we identify two chromatin topologically associating domains spanning across the breakpoints. These alterations lead the ectopic chromatin interactions between APCDD1 on chromosome 18 and enhancers on chromosome 6, resulting in upregulation of APCDD1. Notably, APCDD1 is specifically localized in the iris of human eyes. Our findings demonstrate that noncoding structural variations can lead to Mendelian diseases by disrupting the 3D genome structure and resulting in altered gene expression.
Collapse
Affiliation(s)
- Wenmin Sun
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Dan Xiong
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Jiamin Ouyang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Xueshan Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Yi Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Yingwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Shiqiang Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Ziying Xie
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Junwen Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Zhonghui Tang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China.
| | - Qingjiong Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China.
| |
Collapse
|
8
|
Li Q, Qiao X, Li L, Gu C, Yin H, Qi K, Xie Z, Yang S, Zhao Q, Wang Z, Yang Y, Pan J, Li H, Wang J, Wang C, Rieseberg LH, Zhang S, Tao S. Haplotype-resolved T2T genome assemblies and pangenome graph of pear reveal diverse patterns of allele-specific expression and the genomic basis of fruit quality traits. PLANT COMMUNICATIONS 2024:101000. [PMID: 38859586 DOI: 10.1016/j.xplc.2024.101000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 05/15/2024] [Accepted: 06/07/2024] [Indexed: 06/12/2024]
Abstract
Hybrid crops often exhibit increased yield and greater resilience, yet the genomic mechanism(s) underlying hybrid vigor or heterosis remain unclear, hindering our ability to predict the expression of phenotypic traits in hybrid breeding. Here, we generated haplotype-resolved T2T genome assemblies of two pear hybrid varieties, 'Yuluxiang' (YLX) and 'Hongxiangsu' (HXS), which share the same maternal parent but differ in their paternal parents. We then used these assemblies to explore the genome-scale landscape of allele-specific expression (ASE) and create a pangenome graph for pear. ASE was observed for close to 6000 genes in both hybrid cultivars. A subset of ASE genes related to aspects of fruit quality such as sugars, organic acids, and cuticular wax were identified, suggesting their important contributions to heterosis. Specifically, Ma1, a gene regulating fruit acidity, is absent in the paternal haplotypes of HXS and YLX. A pangenome graph was built based on our assemblies and seven published pear genomes. Resequencing data for 139 cultivated pear genotypes (including 97 genotypes sequenced here) were subsequently aligned to the pangenome graph, revealing numerous structural variant hotspots and selective sweeps during pear diversification. As predicted, the Ma1 allele was found to be absent in varieties with low organic acid content, and this association was functionally validated by Ma1 overexpression in pear fruit and calli. Overall, these results reveal the contributions of ASE to fruit-quality heterosis and provide a robust pangenome reference for high-resolution allele discovery and association mapping.
Collapse
Affiliation(s)
- Qionghou Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Xin Qiao
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Lanqing Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Chao Gu
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Hao Yin
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Kaijie Qi
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Zhihua Xie
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Sheng Yang
- Pomology Institute, Shanxi Agricultural University, Taigu, Shanxi 030801, China
| | - Qifeng Zhao
- Pomology Institute, Shanxi Agricultural University, Taigu, Shanxi 030801, China
| | - Zewen Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Yuhang Yang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Jiahui Pan
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Hongxiang Li
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Jie Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Chao Wang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Loren H Rieseberg
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
| | - Shaoling Zhang
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Shutian Tao
- National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Sanya Institute of Nanjing Agricultural University, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China.
| |
Collapse
|
9
|
Yang J, Wang DF, Huang JH, Zhu QH, Luo LY, Lu R, Xie XL, Salehian-Dehkordi H, Esmailizadeh A, Liu GE, Li MH. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol 2024; 25:148. [PMID: 38845023 PMCID: PMC11155191 DOI: 10.1186/s13059-024-03288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024] Open
Abstract
BACKGROUND Sheep and goats have undergone domestication and improvement to produce similar phenotypes, which have been greatly impacted by structural variants (SVs). Here, we report a high-quality chromosome-level reference genome of Asiatic mouflon, and implement a comprehensive analysis of SVs in 897 genomes of worldwide wild and domestic populations of sheep and goats to reveal genetic signatures underlying convergent evolution. RESULTS We characterize the SV landscapes in terms of genetic diversity, chromosomal distribution and their links with genes, QTLs and transposable elements, and examine their impacts on regulatory elements. We identify several novel SVs and annotate corresponding genes (e.g., BMPR1B, BMPR2, RALYL, COL21A1, and LRP1B) associated with important production traits such as fertility, meat and milk production, and wool/hair fineness. We detect signatures of selection involving the parallel evolution of orthologous SV-associated genes during domestication, local environmental adaptation, and improvement. In particular, we find that fecundity traits experienced convergent selection targeting the gene BMPR1B, with the DEL00067921 deletion explaining ~10.4% of the phenotypic variation observed in goats. CONCLUSIONS Our results provide new insights into the convergent evolution of SVs and serve as a rich resource for the future improvement of sheep, goats, and related livestock.
Collapse
Affiliation(s)
- Ji Yang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Dong-Feng Wang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Jia-Hui Huang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Qiang-Hui Zhu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Ling-Yun Luo
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Ran Lu
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Xing-Long Xie
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Hosein Salehian-Dehkordi
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
| | - George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
| | - Meng-Hua Li
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China.
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
| |
Collapse
|
10
|
Corradi Z, Dhaenens CM, Grunewald O, Kocabaş IS, Meunier I, Banfi S, Karali M, Cremers FPM, Hitti-Malin RJ. Novel and Recurrent Copy Number Variants in ABCA4-Associated Retinopathy. Int J Mol Sci 2024; 25:5940. [PMID: 38892127 PMCID: PMC11173210 DOI: 10.3390/ijms25115940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
ABCA4 is the most frequently mutated gene leading to inherited retinal disease (IRD) with over 2200 pathogenic variants reported to date. Of these, ~1% are copy number variants (CNVs) involving the deletion or duplication of genomic regions, typically >50 nucleotides in length. An in-depth assessment of the current literature based on the public database LOVD, regarding the presence of known CNVs and structural variants in ABCA4, and additional sequencing analysis of ABCA4 using single-molecule Molecular Inversion Probes (smMIPs) for 148 probands highlighted recurrent and novel CNVs associated with ABCA4-associated retinopathies. An analysis of the coverage depth in the sequencing data led to the identification of eleven deletions (six novel and five recurrent), three duplications (one novel and two recurrent) and one complex CNV. Of particular interest was the identification of a complex defect, i.e., a 15.3 kb duplicated segment encompassing exon 31 through intron 41 that was inserted at the junction of a downstream 2.7 kb deletion encompassing intron 44 through intron 47. In addition, we identified a 7.0 kb tandem duplication of intron 1 in three cases. The identification of CNVs in ABCA4 can provide patients and their families with a genetic diagnosis whilst expanding our understanding of the complexity of diseases caused by ABCA4 variants.
Collapse
Affiliation(s)
- Zelia Corradi
- Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Claire-Marie Dhaenens
- Université de Lille, Inserm, CHU Lille, U1172-LilNCog-Lille Neuroscience & Cognition, F-59000 Lille, France
| | - Olivier Grunewald
- Université de Lille, Inserm, CHU Lille, U1172-LilNCog-Lille Neuroscience & Cognition, F-59000 Lille, France
| | - Ipek Selen Kocabaş
- Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Isabelle Meunier
- Institute des Neurosciences de Montpellier, INSERM, Université de Montpellier, F-34295 Montpellier, France
| | - Sandro Banfi
- Department of Precision Medicine, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
- Telethon Institute of Genetics and Medicine (TIGEM), 80078 Pozzuoli, Italy
| | - Marianthi Karali
- Department of Precision Medicine, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
- Eye Clinic, Multidisciplinary Department of Medical, Surgical and Dental Sciences, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
| | - Frans P. M. Cremers
- Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Rebekkah J. Hitti-Malin
- Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| |
Collapse
|
11
|
Xue Z, Zhou A, Zhu X, Li L, Zhu H, Jin X, Wang J. NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach. Brief Bioinform 2024; 25:bbae266. [PMID: 38836702 DOI: 10.1093/bib/bbae266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/06/2024] Open
Abstract
Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.
Collapse
Affiliation(s)
- Zhengfa Xue
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
| | - Aifen Zhou
- Institute of Maternal and Child Health, Wuhan Children's Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430015, China
- Department of Obstetrics, Wuhan Children's Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430015, China
| | - Xiaoyan Zhu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
| | - Linxuan Li
- BGI Research, Shenzhen 518083, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | | | - Xin Jin
- BGI Research, Shenzhen 518083, China
- School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Jiayin Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
| |
Collapse
|
12
|
Moya R, Wang X, Tsien RW, Maurano MT. Structural characterization of a polymorphic repeat at the CACNA1C schizophrenia locus. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.05.24303780. [PMID: 38798557 PMCID: PMC11118589 DOI: 10.1101/2024.03.05.24303780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Genetic variation within intron 3 of the CACNA1C calcium channel gene is associated with schizophrenia and bipolar disorder, but analysis of the causal variants and their effect is complicated by a nearby variable-number tandem repeat (VNTR). Here, we used 155 long-read genome assemblies from 78 diverse individuals to delineate the structure and population variability of the CACNA1C intron 3 VNTR. We categorized VNTR sequences into 7 Types of structural alleles using sequence differences among repeat units. Only 12 repeat units at the 5' end of the VNTR were shared across most Types, but several Types were related through a series of large and small duplications. The most diverged Types were rare and present only in individuals with African ancestry, but the multiallelic structural polymorphism Variable Region 2 was present across populations at different frequencies, consistent with expansion of the VNTR preceding the emergence of early hominins. VR2 was in complete linkage disequilibrium with fine-mapped schizophrenia variants (SNPs) from genome-wide association studies (GWAS). This risk haplotype was associated with decreased CACNA1C gene expression in brain tissues profiled by the GTEx project. Our work suggests that sequence variation within a human-specific VNTR affects gene expression, and provides a detailed characterization of new alleles at a flagship neuropsychiatric locus.
Collapse
Affiliation(s)
- Raquel Moya
- Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
| | - Xiaohan Wang
- Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
- Department of Neuroscience and Physiology, New York University, New York, NY 10016
| | - Richard W. Tsien
- Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
- Department of Neuroscience and Physiology, New York University, New York, NY 10016
| | - Matthew T. Maurano
- Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Department of Pathology, NYU School of Medicine, New York, NY 10016, USA
| |
Collapse
|
13
|
Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] Open
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently falls short of capturing certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or challenging-to-detect via short-read methods. This study aimed to investigate NDDs using OGM, specifically focusing on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6%) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene which previously was found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (> 1kbp), inversions, and deletions/duplications of a low number of exons (1-4 exons). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may also reveal novel NDD genes which may harbor complex SVs often missed by standard sequencing techniques.
Collapse
Affiliation(s)
- Isabelle Schrauwen
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA.
| | - Yasmin Rajendran
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
| | - Anushree Acharya
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
| | | | - Maria Arvio
- Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
| | - Ritva Paetau
- Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
| | - Auli Siren
- Kanta-Häme Central Hospital, Hämeenlinna, Finland
| | - Kristiina Avela
- Institute of Biomedicine, University of Turku, Turku, Finland
| | - Johanna Granvik
- The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
| | - Suzanne M Leal
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
- Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
| | - Tuomo Määttä
- The Wellbeing Services County of Kainuu, Kajaani, Finland
| | - Hannaleena Kokkonen
- Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
| | - Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland
| |
Collapse
|
14
|
Yang X, Zheng G, Jia P, Wang S, Ye K. Pindel-TD: A Tandem Duplication Detector Based on A Pattern Growth Approach. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae008. [PMID: 38862430 DOI: 10.1093/gpbjnl/qzae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 10/24/2023] [Accepted: 11/03/2023] [Indexed: 06/13/2024]
Abstract
Tandem duplication (TD) is a major type of structural variations (SVs) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.
Collapse
Affiliation(s)
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Gaoyang Zheng
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Kai Ye
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311 EZ, Netherland
| |
Collapse
|
15
|
Pokrovac I, Rohner N, Pezer Ž. The prevalence of copy number increase at multiallelic copy number variants associated with cave colonization. Mol Ecol 2024; 33:e17339. [PMID: 38556927 DOI: 10.1111/mec.17339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 03/16/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]
Abstract
Copy number variation is a common contributor to phenotypic diversity, yet its involvement in ecological adaptation is not easily discerned. Instances of parallelly evolving populations of the same species in a similar environment marked by strong selective pressures present opportunities to study the role of copy number variants (CNVs) in adaptation. By identifying CNVs that repeatedly occur in multiple populations of the derived ecotype and are not (or are rarely) present in the populations of the ancestral ecotype, the association of such CNVs with adaptation to the novel environment can be inferred. We used this paradigm to identify CNVs associated with recurrent adaptation of the Mexican tetra (Astyanax mexicanus) to cave environment. Using a read-depth approach, we detected CNVs from previously re-sequenced genomes of 44 individuals belonging to two ancestral surfaces and three derived cave populations. We identified 102 genes and 292 genomic regions that repeatedly diverge in copy number between the two ecotypes and occupy 0.8% of the reference genome. Functional analysis revealed their association with processes previously recognized to be relevant for adaptation, such as vision, immunity, oxygen consumption, metabolism, and neural function and we propose that these variants have been selected for in the cave or surface waters. The majority of the ecotype-divergent CNVs are multiallelic and display copy number increases in cavefish compared to surface fish. Our findings suggest that multiallelic CNVs - including gene duplications - and divergence in copy number provide a fast route to produce novel phenotypes associated with adaptation to subterranean life.
Collapse
Affiliation(s)
| | - Nicolas Rohner
- Stowers Institute for Medical Research, Kansas City, Missouri, USA
| | | |
Collapse
|
16
|
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
Collapse
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
| | - Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
| | - Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
| | - Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China.
| |
Collapse
|
17
|
Yu Y, Chen H. Human pangenome: far-reaching implications in precision medicine. Front Med 2024; 18:403-409. [PMID: 38157192 DOI: 10.1007/s11684-023-1039-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 10/15/2023] [Indexed: 01/03/2024]
Affiliation(s)
- Yingyan Yu
- Department of General Surgery of Ruijin Hospital, Shanghai Institute of Digestive Surgery, and Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Hongzhuan Chen
- Shuguang Lab for Future Health, Shanghai Frontier Science Center of TCM Chemical Biology, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China.
| |
Collapse
|
18
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
Collapse
|
19
|
Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024] Open
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent in the current human reference genome. However, the extent and functional significance of NRSs in the human genomes and populations remains unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical researches.
Collapse
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yizhou Gu
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- University of Wisconsin-Madison, WI, USA
| | - Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
20
|
Kwon D, Park N, Wy S, Lee D, Park W, Chai HH, Cho IC, Lee J, Kwon K, Kim H, Moon Y, Kim J, Kim J. Identification and characterization of structural variants related to meat quality in pigs using chromosome-level genome assemblies. BMC Genomics 2024; 25:299. [PMID: 38515031 PMCID: PMC10956321 DOI: 10.1186/s12864-024-10225-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Accepted: 03/14/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND Many studies have been performed to identify various genomic loci and genes associated with the meat quality in pigs. However, the full genetic architecture of the trait still remains unclear in part because of the lack of accurate identification of related structural variations (SVs) which resulted from the shortage of target breeds, the limitations of sequencing data, and the incompleteness of genome assemblies. The recent generation of a new pig breed with superior meat quality, called Nanchukmacdon, and its chromosome-level genome assembly (the NCMD assembly) has provided new opportunities. RESULTS By applying assembly-based SV calling approaches to various genome assemblies of pigs including Nanchukmacdon, the impact of SVs on meat quality was investigated. Especially, by checking the commonality of SVs with other pig breeds, a total of 13,819 Nanchukmacdon-specific SVs (NSVs) were identified, which have a potential effect on the unique meat quality of Nanchukmacdon. The regulatory potentials of NSVs for the expression of nearby genes were further examined using transcriptome- and epigenome-based analyses in different tissues. CONCLUSIONS Whole-genome comparisons based on chromosome-level genome assemblies have led to the discovery of SVs affecting meat quality in pigs, and their regulatory potentials were analyzed. The identified NSVs will provide new insights regarding genetic architectures underlying the meat quality in pigs. Finally, this study confirms the utility of chromosome-level genome assemblies and multi-omics analysis to enhance the understanding of unique phenotypes.
Collapse
Affiliation(s)
- Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Nayoung Park
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Suyeon Wy
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Daehwan Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Woncheoul Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - Han-Ha Chai
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - In-Cheol Cho
- Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju, 63242, Republic of Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Kisang Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Heesun Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Youngbeen Moon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Juyeon Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea.
| |
Collapse
|
21
|
Zhang Z, Gomes Viana JP, Zhang B, Walden KKO, Müller Paul H, Moose SP, Morris GP, Daum C, Barry KW, Shakoor N, Hudson ME. Major impacts of widespread structural variation on sorghum. Genome Res 2024; 34:286-299. [PMID: 38479835 PMCID: PMC10984582 DOI: 10.1101/gr.278396.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Accepted: 01/22/2024] [Indexed: 03/22/2024]
Abstract
Genetic diversity is critical to crop breeding and improvement, and dissection of the genomic variation underlying agronomic traits can both assist breeding and give insight into basic biological mechanisms. Although recent genome analyses in plants reveal many structural variants (SVs), most current studies of crop genetic variation are dominated by single-nucleotide polymorphisms (SNPs). The extent of the impact of SVs on global trait variation, as well as their utility in genome-wide selection, is not yet understood. In this study, we built an SV data set based on whole-genome resequencing of diverse sorghum lines (n = 363), validated the correlation of photoperiod sensitivity and variety type, and identified SV hotspots underlying the divergent evolution of cellulosic and sweet sorghum. In addition, we showed the complementary contribution of SVs for heritability of traits related to sorghum adaptation. Importantly, inclusion of SV polymorphisms in association studies revealed genotype-phenotype associations not observed with SNPs alone. Three-way genome-wide association studies (GWAS) based on whole-genome SNP, SV, and integrated SNP + SV data sets showed substantial associations between SVs and sorghum traits. The addition of SVs to GWAS substantially increased heritability estimates for some traits, indicating their important contribution to functional allelic variation at the genome level. Our discovery of the widespread impacts of SVs on heritable gene expression variation could render a plausible mechanism for their disproportionate impact on phenotypic variation. This study expands our knowledge of SVs and emphasizes the extensive impacts of SVs on sorghum.
Collapse
Affiliation(s)
- Zhihai Zhang
- DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Joao Paulo Gomes Viana
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Bosen Zhang
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Kimberly K O Walden
- High Performance Computing in Biology, Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Hans Müller Paul
- DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Stephen P Moose
- DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Geoffrey P Morris
- Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado 80523, USA
| | - Chris Daum
- United States Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
| | - Kerrie W Barry
- United States Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
| | - Nadia Shakoor
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
| | - Matthew E Hudson
- DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA;
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| |
Collapse
|
22
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Collapse
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| | - Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
| | | | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
| | - Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
| | | | | | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | | | - Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | | | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| |
Collapse
|
23
|
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation. RESULTS NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs. AVAILABILITY AND IMPLEMENTATION Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.
Collapse
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Jacob Wallace
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Alderik van der Heyde
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Eliza Wieman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Daniel Brey
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Yiran Shi
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Peter Hansen
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | | | | | - Bruce D Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
| | - Ali Bashir
- Google, Mountain View, CA 94043, United States
| |
Collapse
|
24
|
Porubsky D, Eichler EE. A 25-year odyssey of genomic technology advances and structural variant discovery. Cell 2024; 187:1024-1037. [PMID: 38290514 DOI: 10.1016/j.cell.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Collapse
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
| |
Collapse
|
25
|
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Collapse
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA;
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
| |
Collapse
|
26
|
Jamshidi Parvar S, Hall BA, Shorthouse D. Interpreting the effect of mutations to protein binding sites from large-scale genomic screens. Methods 2024; 222:122-132. [PMID: 38185227 DOI: 10.1016/j.ymeth.2023.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 11/27/2023] [Accepted: 12/22/2023] [Indexed: 01/09/2024] Open
Abstract
Predicting the functionality of missense mutations is extremely difficult. Large-scale genomic screens are commonly performed to identify mutational correlates or drivers of disease and treatment resistance, but interpretation of how these mutations impact protein function is limited. One such consequence of mutations to a protein is to impact its ability to bind and interact with partners or small molecules such as ATP, thereby modulating its function. Multiple methods exist for predicting the impact of a single mutation on protein-protein binding energy, but it is difficult in the context of a genomic screen to understand if these mutations with large impacts on binding are more common than statistically expected. We present a methodology for taking mutational data from large-scale genomic screens and generating functional and statistical insights into their role in the binding of proteins both with each other and their small molecule ligands. This allows a quantitative and statistical analysis to determine whether mutations impacting protein binding or ligand interactions are occurring more or less frequently than expected by chance. We achieve this by calculating the potential impact of any possible mutation and comparing an expected distribution to the observed mutations. This method is applied to examples demonstrating its ability to interpret mutations involved in protein-protein binding, protein-DNA interactions, and the evolution of therapeutic resistance.
Collapse
Affiliation(s)
| | - Benjamin A Hall
- UCL Department of Medical Physics and Biomedical Engineering, Mallet Place Engineering Building, University College London, Gower Street, London WC1E 6BT, UK
| | - David Shorthouse
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK.
| |
Collapse
|
27
|
Smeds L, Huson LSA, Ellegren H. Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration. Evol Appl 2024; 17:e13652. [PMID: 38333557 PMCID: PMC10848878 DOI: 10.1111/eva.13652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 01/08/2024] [Accepted: 01/16/2024] [Indexed: 02/10/2024] Open
Abstract
When populations decrease in size and may become isolated, genomic erosion by loss of diversity from genetic drift and accumulation of deleterious mutations is likely an inevitable consequence. In such cases, immigration (genetic rescue) is necessary to restore levels of genetic diversity and counteract inbreeding depression. Recent work in conservation genomics has studied these processes focusing on the genetic diversity of single nucleotide polymorphisms. In contrast, our knowledge about structural genomic variation (insertions, deletions, duplications and inversions) in endangered species is limited. We analysed whole-genome, short-read sequences from 212 wolves from the inbred Scandinavian population and from neighbouring populations in Finland and Russia, and detected >35,000 structural variants (SVs) after stringent quality and genotype frequency filtering; >26,000 high-confidence variants remained after manual curation. The majority of variants were shorter than 1 kb, with a distinct peak in the length distribution of deletions at 190 bp, corresponding to insertion events of SINE/tRNA-Lys elements. The site frequency spectrum of SVs in protein-coding regions was significantly shifted towards rare alleles compared to putatively neutral variants, consistent with purifying selection. The realized genetic load of SVs in protein-coding regions increased with inbreeding levels in the Scandinavian population, but immigration provided a genetic rescue effect by lowering the load and reintroducing ancestral alleles at loci fixed for derived SVs. Our study shows that structural variation comprises a common type of in part deleterious mutations in endangered species and that establishing gene flow is necessary to mitigate the negative consequences of loss of diversity.
Collapse
Affiliation(s)
- Linnéa Smeds
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| | - Lars S. A. Huson
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| | - Hans Ellegren
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| |
Collapse
|
28
|
Li Y, Yao J, Sang H, Wang Q, Su L, Zhao X, Xia Z, Wang F, Wang K, Lou D, Wang G, Waterhouse RM, Wang H, Luo S, Sun C. Pan-genome analysis highlights the role of structural variation in the evolution and environmental adaptation of Asian honeybees. Mol Ecol Resour 2024; 24:e13905. [PMID: 37996991 DOI: 10.1111/1755-0998.13905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 10/20/2023] [Accepted: 11/09/2023] [Indexed: 11/25/2023]
Abstract
The Asian honeybee, Apis cerana, is an ecologically and economically important pollinator. Mapping its genetic variation is key to understanding population-level health, histories and potential capacities to respond to environmental changes. However, most efforts to date were focused on single nucleotide polymorphisms (SNPs) based on a single reference genome, thereby ignoring larger scale genomic variation. We employed long-read sequencing technologies to generate a chromosome-scale reference genome for the ancestral group of A. cerana. Integrating this with 525 resequencing data sets, we constructed the first pan-genome of A. cerana, encompassing almost the entire gene content. We found that 31.32% of genes in the pan-genome were variably present across populations, providing a broad gene pool for environmental adaptation. We identified and characterized structural variations (SVs) and found that they were not closely linked with SNP distributions; however, the formation of SVs was closely associated with transposable elements. Furthermore, phylogenetic analysis using SVs revealed a novel A. cerana ecological group not recoverable from the SNP data. Performing environmental association analysis identified a total of 44 SVs likely to be associated with environmental adaptation. Verification and analysis of one of these, a 330 bp deletion in the Atpalpha gene, indicated that this SV may promote the cold adaptation of A. cerana by altering gene expression. Taken together, our study demonstrates the feasibility and utility of applying pan-genome approaches to map and explore genetic feature variations of honeybee populations, and in particular to examine the role of SVs in the evolution and environmental adaptation of A. cerana.
Collapse
Affiliation(s)
- Yancan Li
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
- Western Research Institute, Chinese Academy of Agricultural Sciences, Changji, China
| | - Jun Yao
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Huiling Sang
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Quangui Wang
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Long Su
- Institute of Plant Protection, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Xiaomeng Zhao
- College of Animal Science, Shanxi Agricultural University, Shanxi, China
| | - Zhenyu Xia
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Feiran Wang
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
- Western Research Institute, Chinese Academy of Agricultural Sciences, Changji, China
| | - Kai Wang
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Delong Lou
- Shandong Provincial Animal Husbandry Station, Jinan, China
| | - Guizhi Wang
- Department of Animal Science, Shandong Agricultural University, Taian, China
| | - Robert M Waterhouse
- Department of Ecology and Evolution, University of Lausanne, and SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Huihua Wang
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Shudong Luo
- State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
- Western Research Institute, Chinese Academy of Agricultural Sciences, Changji, China
| | - Cheng Sun
- College of Life Sciences, Capital Normal University, Beijing, China
| |
Collapse
|
29
|
Mahmoud M, Huang Y, Garimella K, Audano PA, Wan W, Prasad N, Handsaker RE, Hall S, Pionzio A, Schatz MC, Talkowski ME, Eichler EE, Levy SE, Sedlazeck FJ. Utility of long-read sequencing for All of Us. Nat Commun 2024; 15:837. [PMID: 38281971 PMCID: PMC10822842 DOI: 10.1038/s41467-024-44804-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 01/03/2024] [Indexed: 01/30/2024] Open
Abstract
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
Collapse
Affiliation(s)
- M Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Y Huang
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - K Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - P A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - W Wan
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - N Prasad
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - R E Handsaker
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - S Hall
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - A Pionzio
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - M C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - M E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - E E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - S E Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - F J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
30
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Collapse
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA;
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
31
|
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Collapse
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
Collapse
|
32
|
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Collapse
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
| | - Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
| | - Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
| |
Collapse
|
33
|
Shi J, Jia Z, Sun J, Wang X, Zhao X, Zhao C, Liang F, Song X, Guan J, Jia X, Yang J, Chen Q, Yu K, Jia Q, Wu J, Wang D, Xiao Y, Xu X, Liu Y, Wu S, Zhong Q, Wu J, Cui S, Bo X, Wu Z, Park M, Kellis M, He K. Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing. Nat Commun 2023; 14:8282. [PMID: 38092772 PMCID: PMC10719358 DOI: 10.1038/s41467-023-44034-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Accepted: 11/27/2023] [Indexed: 12/17/2023] Open
Abstract
Structural variants (SVs), accounting for a larger fraction of the genome than SNPs/InDels, are an important pool of genetic variation, enabling environmental adaptations. Here, we perform long-read sequencing data of 320 Tibetan and Han samples and show that SVs are highly involved in high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict enhancers, target genes and biological functions. We reveal diverse Tibetan-specific SVs affecting the regulatory circuitry of biological functions, including the hypoxia response, energy metabolism and pulmonary function. We find a Tibetan-specific deletion disrupts a super-enhancer and downregulates EPAS1 using enhancer reporter, cellular knock-out and DNA pull-down assays. Our study expands the global SV landscape, reveals the role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles of SVs in human biology.
Collapse
Affiliation(s)
- Jinlong Shi
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Zhilong Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jinxiu Sun
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xiaoreng Wang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- State Key Laboratory of Experimental Hematology, Beijing, 100853, China
| | - Xiaojing Zhao
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Translational Medicine Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Chenghui Zhao
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Research Center for Biomedical Engineering, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Fan Liang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xinyu Song
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jiawei Guan
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xue Jia
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Yang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Qi Chen
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Kang Yu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qian Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Wu
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Depeng Wang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yuhui Xiao
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xiaoman Xu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yinzhe Liu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Shijing Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qin Zhong
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jue Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Saijia Cui
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, 100850, China
| | | | | | - Manolis Kellis
- Massachusetts Institute of Technology; MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard, Cambridge, 02139, MA, USA
| | - Kunlun He
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China.
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China.
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China.
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China.
| |
Collapse
|
34
|
Reis ALM, Rapadas M, Hammond JM, Gamaarachchi H, Stevanovski I, Ayuputeri Kumaheri M, Chintalaphani SR, Dissanayake DSB, Siggs OM, Hewitt AW, Llamas B, Brown A, Baynam G, Mann GJ, McMorran BJ, Easteal S, Hermes A, Jenkins MR, Patel HR, Deveson IW. The landscape of genomic structural variation in Indigenous Australians. Nature 2023; 624:602-610. [PMID: 38093003 PMCID: PMC10733147 DOI: 10.1038/s41586-023-06842-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets1-3. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing4 to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 b-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci5, uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Collapse
Affiliation(s)
- Andre L M Reis
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Melissa Rapadas
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Jillian M Hammond
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Hasindu Gamaarachchi
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
| | - Igor Stevanovski
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Meutia Ayuputeri Kumaheri
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Sanjog R Chintalaphani
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Duminda S B Dissanayake
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Owen M Siggs
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Department of Ophthalmology, Flinders University, Bedford Park, South Australia, Australia
| | - Alex W Hewitt
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
| | - Bastien Llamas
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Australian Centre for Ancient DNA, School of Biological Sciences and Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
- ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, South Australia, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Alex Brown
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Gareth Baynam
- Telethon Kids Institute and Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Western Australia, Australia
- Genetic Services of Western Australia, Western Australian Department of Health, Perth, Western Australia, Australia
- Western Australian Register of Developmental Anomalies, Western Australian Department of Health, Perth, Western Australia, Australia
| | - Graham J Mann
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brendan J McMorran
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Simon Easteal
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Azure Hermes
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Misty R Jenkins
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Hardip R Patel
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia.
| | - Ira W Deveson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia.
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia.
| |
Collapse
|
35
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes-the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Collapse
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
Collapse
|
36
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023] Open
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org .
Collapse
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
Collapse
|
37
|
Chu C, Lin EW, Tran A, Jin H, Ho NI, Veit A, Cortes-Ciriano I, Burns KH, Ting DT, Park PJ. The landscape of human SVA retrotransposons. Nucleic Acids Res 2023; 51:11453-11465. [PMID: 37823611 PMCID: PMC10681720 DOI: 10.1093/nar/gkad821] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 09/12/2023] [Accepted: 09/20/2023] [Indexed: 10/13/2023] Open
Abstract
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitate the discovery of pathogenic SVA insertions.
Collapse
Affiliation(s)
- Chong Chu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Eric W Lin
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
| | - Antuan Tran
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Hu Jin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Natalie I Ho
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
| | - Alexander Veit
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Isidro Cortes-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Kathleen H Burns
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA
| | - David T Ting
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| |
Collapse
|
38
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enables cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
Collapse
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
| |
Collapse
|
39
|
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Collapse
Affiliation(s)
- Edward S Rice
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
| | - Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
| | - James Alfieri
- Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
| | - Giridhar Athrey
- Department of Poultry Science, Texas A&M University, College Station, TX, USA
| | - Jennifer R Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Philippe Bardou
- Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
| | - Heath Blackmon
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Mathieu Charles
- University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
| | - Hans H Cheng
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | | | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Laurent A F Frantz
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
| | - M Thomas P Gilbert
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
| | - Cari J Hearn
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- The Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Christophe Klopp
- Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
| | - Sofia Marcos
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
| | | | | | - Luohao Xu
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
| | - Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA.
| |
Collapse
|
40
|
Guo M, Songyang Z, Xiong Y. ChArmTelo Enables Large-Scale Chromosome Arm-Level Telomere Analysis across Human Populations and in Cancer Patients. SMALL METHODS 2023; 7:e2300385. [PMID: 37526331 DOI: 10.1002/smtd.202300385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 06/29/2023] [Indexed: 08/02/2023]
Abstract
Telomeres are structures protecting chromosome ends. However, a scalable and cost-effective method to investigate chromosome arm-level (ChArm) telomeres (Telos) in large-scale projects is still lacking, hindering intensive investigation of high-resolution telomeres across cancers and other diseases. Here, ChArmTelo, the first computational toolbox to analyze telomeres at chromosome arm level in human and other animal species, using 10X linked-read and similar technologies, is presented. ChArmTelo currently consists of two algorithms, TeloEM and TeloKnow, for arm-level telomere length (TL) analysis. The algorithms are demonstrated by comprehensive analysis of chromosome arm-level telomere lengths (chArmTLs) in nearly 400 whole genome sequencing samples (WGS) from human populations and animals, including healthy and cancer samples. Notably, considerable performance improvement contributed by using the latest complete telomere-to-telomere reference genome (CHM13v2), compared to hg38, is shown. ChArmTelo reveals population-specific chArmTL differences and liver cancer signatures of chArmTLs and that DNA replication origin disruption may contribute to cancer by affecting TLs. Importantly, ChArmTelo can be readily applied to tens of thousands of cancer and healthy samples with published WGS data.
Collapse
Affiliation(s)
- Mengbiao Guo
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
| | - Zhou Songyang
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
| | - Yuanyan Xiong
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
| |
Collapse
|
41
|
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% SVs and 40% STRs were in high linkage disequilibrium (LD) (R2 > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
Collapse
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | - Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | | | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| |
Collapse
|
42
|
Devine SE. Emerging Opportunities to Study Mobile Element Insertions and Their Source Elements in an Expanding Universe of Sequenced Human Genomes. Genes (Basel) 2023; 14:1923. [PMID: 37895272 PMCID: PMC10606232 DOI: 10.3390/genes14101923] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 09/29/2023] [Accepted: 09/30/2023] [Indexed: 10/29/2023] Open
Abstract
Three mobile element classes, namely Alu, LINE-1 (L1), and SVA elements, remain actively mobile in human genomes and continue to produce new mobile element insertions (MEIs). Historically, MEIs have been discovered and studied using several methods, including: (1) Southern blots, (2) PCR (including PCR display), and (3) the detection of MEI copies from young subfamilies. We are now entering a new phase of MEI discovery where these methods are being replaced by whole genome sequencing and bioinformatics analysis to discover novel MEIs. We expect that the universe of sequenced human genomes will continue to expand rapidly over the next several years, both with short-read and long-read technologies. These resources will provide unprecedented opportunities to discover MEIs and study their impact on human traits and diseases. They also will allow the MEI community to discover and study the source elements that produce these new MEIs, which will facilitate our ability to study source element regulation in various tissue contexts and disease states. This, in turn, will allow us to better understand MEI mutagenesis in humans and the impact of this mutagenesis on human biology.
Collapse
Affiliation(s)
- Scott E Devine
- Institute for Genome Sciences, Department of Medicine, and Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| |
Collapse
|
43
|
Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y, Liu W, Chen C, Song Y, Tan L, Yin K, Zhao Y, Yan Z, Lou S, Zan Y, Liu J. The pan-genome and local adaptation of Arabidopsis thaliana. Nat Commun 2023; 14:6259. [PMID: 37802986 PMCID: PMC10558531 DOI: 10.1038/s41467-023-42029-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2022] [Accepted: 09/27/2023] [Indexed: 10/08/2023] Open
Abstract
Arabidopsis thaliana serves as a model species for investigating various aspects of plant biology. However, the contribution of genomic structural variations (SVs) and their associate genes to the local adaptation of this widely distribute species remains unclear. Here, we de novo assemble chromosome-level genomes of 32 A. thaliana ecotypes and determine that variable genes expand the gene pool in different ecotypes and thus assist local adaptation. We develop a graph-based pan-genome and identify 61,332 SVs that overlap with 18,883 genes, some of which are highly involved in ecological adaptation of this species. For instance, we observe a specific 332 bp insertion in the promoter region of the HPCA1 gene in the Tibet-0 ecotype that enhances gene expression, thereby promotes adaptation to alpine environments. These findings augment our understanding of the molecular mechanisms underlying the local adaptation of A. thaliana across diverse habitats.
Collapse
Affiliation(s)
- Minghui Kang
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Haolin Wu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Huanhuan Liu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Wenyu Liu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Mingjia Zhu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Yu Han
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Wei Liu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Chunlin Chen
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Yan Song
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Luna Tan
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Kangqun Yin
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Yusen Zhao
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Zhen Yan
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Shangling Lou
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China.
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China.
| | - Yanjun Zan
- Key Laboratory of Tobacco Improvement and Biotechnology, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, 266000, China.
| | - Jianquan Liu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China.
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China.
| |
Collapse
|
44
|
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miss-called with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advances of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Collapse
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
| | - Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | | | | | - Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
| | - Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
| |
Collapse
|
45
|
Liu X, Liu W, Lenstra JA, Zheng Z, Wu X, Yang J, Li B, Yang Y, Qiu Q, Liu H, Li K, Liang C, Guo X, Ma X, Abbott RJ, Kang M, Yan P, Liu J. Evolutionary origin of genomic structural variations in domestic yaks. Nat Commun 2023; 14:5617. [PMID: 37726270 PMCID: PMC10509194 DOI: 10.1038/s41467-023-41220-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023] Open
Abstract
Yak has been subject to natural selection, human domestication and interspecific introgression during its evolution. However, genetic variants favored by each of these processes have not been distinguished previously. We constructed a graph-genome for 47 genomes of 7 cross-fertile bovine species. This allowed detection of 57,432 high-resolution structural variants (SVs) within and across the species, which were genotyped in 386 individuals. We distinguished the evolutionary origins of diverse SVs in domestic yaks by phylogenetic analyses. We further identified 334 genes overlapping with SVs in domestic yaks that bore potential signals of selection from wild yaks, plus an additional 686 genes introgressed from cattle. Nearly 90% of the domestic yaks were introgressed by cattle. Introgression of an SV spanning the KIT gene triggered the breeding of white domestic yaks. We validated a significant association of the selected stratified SVs with gene expression, which contributes to phenotypic variations. Our results highlight that SVs of different origins contribute to the phenotypic diversity of domestic yaks.
Collapse
Affiliation(s)
- Xinfeng Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining, 810016, China
| | - Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Johannes A Lenstra
- Faculty of Veterinary Medicine, Utrecht University, Utrecht, 3508 TD, The Netherlands
| | - Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Xiaoyun Wu
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China
| | - Jiao Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Bowen Li
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Qiang Qiu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Hongyu Liu
- Anhui Provincial Laboratory of Local Livestock and Poultry Genetical Resource Conservation and Breeding, College of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China
| | - Kexin Li
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Chunnian Liang
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China
| | - Xian Guo
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China
| | - Xiaoming Ma
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China
| | - Richard J Abbott
- School of Biology, University of St Andrews, St Andrews, KY16 9AJ, UK
| | - Minghui Kang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China.
| | - Ping Yan
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, China.
| | - Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China.
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining, 810016, China.
| |
Collapse
|
46
|
Xia X, Zhang F, Li S, Luo X, Peng L, Dong Z, Pausch H, Leonard AS, Crysnanto D, Wang S, Tong B, Lenstra JA, Han J, Li F, Xu T, Gu L, Jin L, Dang R, Huang Y, Lan X, Ren G, Wang Y, Gao Y, Ma Z, Cheng H, Ma Y, Chen H, Pang W, Lei C, Chen N. Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment. Genome Biol 2023; 24:211. [PMID: 37723525 PMCID: PMC10507960 DOI: 10.1186/s13059-023-03052-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 09/07/2023] [Indexed: 09/20/2023] Open
Abstract
BACKGROUND Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins that have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle. RESULTS In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies along with 18 Mongolian and Hainan cattle genomes obtained by long-read sequencing data provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium tuberculosis by macrophages, which might contribute to the low susceptibility of Hainan cattle to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a "north-south" gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle. CONCLUSIONS Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.
Collapse
Affiliation(s)
- Xiaoting Xia
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Fengwei Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Shuang Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Xiaoyu Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Lixin Peng
- National Engineering Research Center for Non-Food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, China
| | - Zheng Dong
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Alexander S Leonard
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Danang Crysnanto
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Shikang Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Bin Tong
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Johannes A Lenstra
- Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Jianlin Han
- Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi, Kenya
- CAAS-ILRI Joint Laboratory On Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agriculture Sciences (CAAS), Beijing, China
| | - Fuyong Li
- Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon, Hong Kong SAR, China
| | - Tieshan Xu
- Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
| | - Lihong Gu
- Institute of Animal Science & Veterinary Medicine, Hainan Academy of Agricultural Sciences, Haikou, China
| | - Liangliang Jin
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Ruihua Dang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yongzhen Huang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Xianyong Lan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Gang Ren
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yu Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yuanpeng Gao
- College of Veterinary Medicine, Northwest A&F University, Xianyang, Yangling, China
| | - Zhijie Ma
- Qinghai Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
| | - Haijian Cheng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Shandong Key Lab of Animal Disease Control and Breeding, Jinan, China
| | - Yun Ma
- Key Laboratory of Ruminant Molecular and Cellular Breeding of Ningxia Hui Autonomous Region, School of Agriculture, Ningxia University, Yinchuan, China
| | - Hong Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Weijun Pang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
| | - Chuzhao Lei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
| | - Ningbo Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
| |
Collapse
|
47
|
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023] Open
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, using a high-precision algorithm to determine complex TR structures from long, accurate reads of PacBio HiFi, an investigation of 270 Japanese control samples yields several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of younger replication slippage. Complex TRs are more abundant than single-unit TRs, and a tendency for complex TRs to consist of <10-bp units and single-unit TRs to be minisatellites is statistically significant at loci with ≥500-bp TRs. Of note, 8909 loci with extended TRs (>100b longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
Collapse
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
| | - Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
| | - Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan.
| |
Collapse
|
48
|
Alhatim H, Abdullah MNH, Abu Bakar S, Amer SA. Effect of Carcinomas on Autosomal Trait Screening: A Review Article. Curr Issues Mol Biol 2023; 45:7275-7285. [PMID: 37754244 PMCID: PMC10529457 DOI: 10.3390/cimb45090460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Revised: 08/23/2023] [Accepted: 08/31/2023] [Indexed: 09/28/2023] Open
Abstract
This review highlights the effect of carcinomas on the results of the examination of autosomal genetic traits for identification and paternity tests when carcinoid tissue is the only source and no other samples are available. In DNA typing or genetic fingerprinting, variable elements are isolated and identified within the base pair sequences that form the DNA. The person's probable identity can be determined by analysing nucleotide sequences in particular regions of DNA unique to everyone. Genetics plays an increasingly important role in the risk stratification and management of carcinoma patients. The available information from previous studies has indicated that in some incidents, including mass disasters and crimes such as terrorist incidents, biological evidence may not be available at the scene of the accident, except for some unknown human remains found in the form of undefined human tissues. If these tissues have cancerous tumours, it may affect the examination of the genetic traits derived from these samples, thereby resulting in a failure to identify the person. Pathology units, more often, verify the identity of the patients who were diagnosed with cancer in reference to their deceased tumorous relatives. Genetic fingerprinting (GF) is also used in paternity testing when the alleged parent disappeared or died and earlier was diagnosed and treated for cancer.
Collapse
Affiliation(s)
- Husein Alhatim
- Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; (H.A.); (S.A.B.)
| | - Muhammad Nazrul Hakim Abdullah
- Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; (H.A.); (S.A.B.)
| | - Suhaili Abu Bakar
- Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; (H.A.); (S.A.B.)
| | - Sayed Amin Amer
- Department of Forensic Sciences, College of Criminal Justice, Naif Arab University for Security Sciences, Riyadh 14812, Saudi Arabia
| |
Collapse
|
49
|
Antinucci M, Comas D, Calafell F. Population history modulates the fitness effects of Copy Number Variation in the Roma. Hum Genet 2023; 142:1327-1343. [PMID: 37311904 PMCID: PMC10449987 DOI: 10.1007/s00439-023-02579-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Accepted: 06/02/2023] [Indexed: 06/15/2023]
Abstract
We provide the first whole genome Copy Number Variant (CNV) study addressing Roma, along with reference populations from South Asia, the Middle East and Europe. Using CNV calling software for short-read sequence data, we identified 3171 deletions and 489 duplications. Taking into account the known population history of the Roma, as inferred from whole genome nucleotide variation, we could discern how this history has shaped CNV variation. As expected, patterns of deletion variation, but not duplication, in the Roma followed those obtained from single nucleotide polymorphisms (SNPs). Reduced effective population size resulting in slightly relaxed natural selection may explain our observation of an increase in intronic (but not exonic) deletions within Loss of Function (LoF)-intolerant genes. Over-representation analysis for LoF-intolerant gene sets hosting intronic deletions highlights a substantial accumulation of shared biological processes in Roma, intriguingly related to signaling, nervous system and development features, which may be related to the known profile of private disease in the population. Finally, we show the link between deletions and known trait-related SNPs reported in the genome-wide association study (GWAS) catalog, which exhibited even frequency distributions among the studied populations. This suggests that, in general human populations, the strong association between deletions and SNPs associated to biomedical conditions and traits could be widespread across continental populations, reflecting a common background of potentially disease/trait-related CNVs.
Collapse
Affiliation(s)
- Marco Antinucci
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - David Comas
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Francesc Calafell
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain.
| |
Collapse
|
50
|
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date1 and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes2. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established1 boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Collapse
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
| |
Collapse
|