1
Kwon SG, Bae GH, Hong JH, Choi JW, Choi JH, Lim NS, Jeon C, Mali NM, Jun MS, Shin J, Kim J, Cho ES, Han MH, Oh JW. Comprehensive analysis of somatic mutations and structural variations in domestic pig. Mamm Genome 2024; 35:645-656. [PMID: 39177814 DOI: 10.1007/s00335-024-10058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 08/01/2024] [Indexed: 08/24/2024]
Abstract
Understanding somatic mutations and structural variations in domestic pigs (Sus scrofa domestica) is critical due to their increasing importance as model organisms in biomedical research. In this study, we conducted a comprehensive analysis through whole-genome sequencing of skin, organs, and blood samples. By examining two pig pedigrees, we investigated the inheritance and sharing of structural variants among fathers, mothers, and offspring. Utilizing single-cell clonal expansion techniques, we observed significant variations in the number of somatic mutations across different tissues. An in-house developed pipeline enabled precise filtering and analysis of these mutations, resulting in the construction of individual phylogenetic trees for two pigs. These trees explored the developmental relationships between different tissues, revealing insights into clonal expansions from various anatomical locations. This study enhances the understanding of pig genomes, affirming their increasing value in clinical and genomic research, and provides a foundation for future studies in other animals, paralleling previous studies in mice and humans. This approach not only deepens our understanding of mammalian genomic variations but also strengthens the role of pigs as a crucial model in human health and disease research.
Affiliation(s)
- Seong Gyu Kwon
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Geon Hue Bae
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Joo Hee Hong
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Jeong-Woo Choi
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- June Hyug Choi
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Nam Seop Lim
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- CheolMin Jeon
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Nanda Maya Mali
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Mee Sook Jun
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- JaeEun Shin
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- JinSoo Kim
- Department of Animal Industry Convergence, Kangwon National University, Chuncheon, Republic of Korea
- Eun-Seok Cho
- Department of Livestock Resource Development, National Institute of Animal Science, Jeonbuk, Republic of Korea
- Man-Hoon Han
- Department of Pathology, Kyungpook National University Hospital, Daegu, Republic of Korea.
- Department of Pathology, School of Medicine, Kyungpook National University, Daegu, Republic of Korea.
- Ji Won Oh
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea.
- Absolute DNA, Inc., Daegu, Republic of Korea.
2
Miao Z, Ren Y, Tarabini A, Yang L, Li H, Ye C, Liti G, Fischer G, Li J, Yue JX. ScRAPdb: an integrated pan-omics database for the Saccharomyces cerevisiae reference assembly panel. Nucleic Acids Res 2024:gkae955. [PMID: 39470715 DOI: 10.1093/nar/gkae955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 10/05/2024] [Accepted: 10/10/2024] [Indexed: 10/30/2024] Open
Abstract
As a unicellular eukaryote, the budding yeast Saccharomyces cerevisiae strikes a unique balance between biological complexity and experimental tractability, serving as a long-standing classic model for both basic and applied studies. Recently, S. cerevisiae further emerged as a leading system for studying the natural diversity of genome evolution and its associated functional implications at population scales. Having high-quality comparative and functional genomics data is critical for such efforts. Here, we exhaustively expanded the telomere-to-telomere (T2T) S. cerevisiae reference assembly panel (ScRAP) that we previously constructed for 142 strains to cover high-quality genome assemblies and annotations of 264 S. cerevisiae strains from diverse geographical and ecological niches and also 33 outgroup strains from all the other species in the Saccharomyces species complex. We created a dedicated online database, ScRAPdb (https://www.evomicslab.org/db/ScRAPdb/), to host this expanded pangenome collection. Furthermore, ScRAPdb also integrates an array of population-scale pan-omics atlases (pantranscriptome, panproteome and panphenome) and extensive data exploration toolkits for intuitive genomics analyses. All curated data and downstream analysis results can be easily downloaded from ScRAPdb. We expect ScRAPdb to become a highly valuable platform for the yeast community and beyond, leading to a pan-omics understanding of the global genetic and phenotypic diversity.
Affiliation(s)
- Zepu Miao
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Yifan Ren
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Andrea Tarabini
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, 7-9 Quai Saint Bernard, Paris 75005, France
- Ludong Yang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Huihui Li
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Chang Ye
- Department of Chemistry, University of Chicago, 929 E 57th Street, Chicago, IL 60637, USA
- Gianni Liti
- CNRS, INSERM, IRCAN, Université Côte d'Azur, 28 Avenue de Valombrose, Nice 06107, France
- Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, 7-9 Quai Saint Bernard, Paris 75005, France
- Jing Li
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
3
Hu T, Mosbruger TL, Tairis NG, Dinou A, Jayaraman P, Sarmady M, Brewster K, Li Y, Hayeck TJ, Duke JL, Monos DS. Targeted and complete genomic sequencing of the major histocompatibility complex in haplotypic form of individual heterozygous samples. Genome Res 2024; 34:1500-1513. [PMID: 39327030 DOI: 10.1101/gr.278588.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 09/19/2024] [Indexed: 09/28/2024]
Abstract
The human major histocompatibility complex (MHC) is a ∼4 Mb genomic segment on Chromosome 6 that plays a pivotal role in the immune response. Despite its importance in various traits and diseases, its complex nature makes it challenging to accurately characterize on a routine basis. We present a novel approach allowing targeted sequencing and de novo haplotypic assembly of the MHC region in heterozygous samples, using long-read sequencing technologies. Our approach is validated using two reference samples, two family trios, and an African-American sample. We achieved excellent coverage (96.6%-99.9% with at least 30× depth) and high accuracy (99.89%-99.99%) for the different haplotypes. This methodology offers a reliable and cost-effective method for sequencing and fully characterizing the MHC without the need for whole-genome sequencing, facilitating broader studies on this important genomic segment and having significant implications in immunology, genetics, and medicine.
Affiliation(s)
- Taishan Hu
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Timothy L Mosbruger
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Nikolaos G Tairis
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Amalia Dinou
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Pushkala Jayaraman
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Mahdi Sarmady
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Kingham Brewster
- Sequencing and Genotyping Center, Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19713, USA
- Yang Li
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Tristan J Hayeck
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Jamie L Duke
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Dimitri S Monos
- Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
4
Behera S, Belyeu JR, Chen X, Paulin LF, Nguyen NQH, Newman E, Mahmoud M, Menon VK, Qi Q, Joshi P, Marcovina S, Rossi M, Roller E, Han J, Onuchic V, Avery CL, Ballantyne CM, Rodriguez CJ, Kaplan RC, Muzny DM, Metcalf GA, Gibbs RA, Yu B, Boerwinkle E, Eberle MA, Sedlazeck FJ. Identification of allele-specific KIV-2 repeats and impact on Lp(a) measurements for cardiovascular disease risk. BMC Med Genomics 2024; 17:255. [PMID: 39449055 PMCID: PMC11515395 DOI: 10.1186/s12920-024-02024-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 10/07/2024] [Indexed: 10/26/2024] Open
Abstract
The abundance of Lp(a) protein, which is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region, holds significant implications for the risk of cardiovascular disease (CVD). KIV-2 is highly polymorphic in the population and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which utilizes short reads. Data across 166 WGS show that the caller has high accuracy compared to optical mapping and can further phase approximately 50% of the samples. We compared KIV-2 CNs to 24 previously postulated KIV-2 relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with the available Lp(a) protein levels and that phasing is highly important.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jonathan R Belyeu
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Xiao Chen
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ngoc Quynh H Nguyen
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Genentech, San Francisco, CA, USA
- Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Parag Joshi
- Medpace Reference Laboratories, Cincinnati, OH, USA
- Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Carlos J Rodriguez
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Fred Hutchinson Cancer Center, Public Health Sciences Division, Seattle, WA, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Bing Yu
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Michael A Eberle
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
5
Mutlu MB, Karakaya T, Çelebi HBG, Duymuş F, Seyhan S, Yılmaz S, Yiş U, Atik T, Yetkin MF, Gümüş H. Utility of Optical Genome Mapping in Repeat Disorders. Clin Genet 2024. [PMID: 39435674 DOI: 10.1111/cge.14633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 10/23/2024]
Abstract
Genomic repeat sequences are patterns of nucleic acids that exist in multiple copies throughout the genome. More than 60 Mendelian disorders are caused by the expansion or contraction of these repeats. Various specific methods for determining tandem repeat variations have been developed. However, these methods are highly specific to the genomic region being studied and sometimes require specialized tools. In this study, we have investigated the use of Optical Genome Mapping (OGM) as a diagnostic tool for detecting repeat disorders. We evaluated 19 patients with a prediagnosis of repeat disorders and explained the molecular etiology of 9 of them with OGM (5 patients with Facioscapulohumeral Muscular Dystrophy (FSHD), 2 patients with Friedreich's Ataxia (FA), 1 patient with Fragile X Syndrome (FXS), and 1 patient with Progressive Myoclonic Epilepsy 1A (EPM1A)). We confirmed OGM results with more widely used fragment analysis techniques. This study highlights the utility of OGM as a diagnostic tool for repeat expansion and contraction diseases such as FA, FXS, EPM1A, and FSHD.
Affiliation(s)
- Taner Karakaya
- Department of Medical Genetics, Samsun Education and Research Hospital, Samsun, Türkiye
- Fahrettin Duymuş
- Department of Medical Genetics, Konya City Hospital, Konya, Türkiye
- Serhat Seyhan
- Laboratory of Genetics, Memorial Şişli Hospital, Istanbul, Türkiye
- Sanem Yılmaz
- Department of Pediatrics, Division of Pediatric Neurology, Ege University Faculty of Medicine, Izmir, Türkiye
- Uluç Yiş
- Department of Pediatrics, Division of Pediatric Neurology, Dokuz Eylül University Faculty of Medicine, Izmir, Türkiye
- Tahir Atik
- Department of Pediatrics, Division of Pediatric Genetics, Ege University Faculty of Medicine, Izmir, Türkiye
- Mehmet Fatih Yetkin
- Department of Neurology, Erciyes University Faculty of Medicine, Kayseri, Türkiye
- Hakan Gümüş
- Department of Pediatrics, Division of Pediatric Neurology, Erciyes University Faculty of Medicine, Kayseri, Türkiye
6
Dodge TO, Kim BY, Baczenas JJ, Banerjee SM, Gunn TR, Donny AE, Given LA, Rice AR, Haase Cox SK, Weinstein ML, Cross R, Moran BM, Haber K, Haghani NB, Machin Kairuz JA, Gellert HR, Du K, Aguillon SM, Tudor MS, Gutiérrez-Rodríguez C, Rios-Cardenas O, Morris MR, Schartl M, Powell DL, Schumer M. Structural genomic variation and behavioral interactions underpin a balanced sexual mimicry polymorphism. Curr Biol 2024; 34:4662-4676.e9. [PMID: 39326413 DOI: 10.1016/j.cub.2024.08.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/15/2024] [Accepted: 08/29/2024] [Indexed: 09/28/2024]
Abstract
How phenotypic diversity originates and persists within populations are classic puzzles in evolutionary biology. While balanced polymorphisms segregate within many species, it remains rare for both the genetic basis and the selective forces to be known, leading to an incomplete understanding of many classes of traits under balancing selection. Here, we uncover the genetic architecture of a balanced sexual mimicry polymorphism and identify behavioral mechanisms that may be involved in its maintenance in the swordtail fish Xiphophorus birchmanni. We find that ∼40% of X. birchmanni males develop a "false gravid spot," a melanic pigmentation pattern that mimics the "pregnancy spot" associated with sexual maturity in female live-bearing fish. Using genome-wide association mapping, we detect a single intergenic region associated with variation in the false gravid spot phenotype, which is upstream of kitlga, a melanophore patterning gene. By performing long-read sequencing within and across populations, we identify complex structural rearrangements between alternate alleles at this locus. The false gravid spot haplotype drives increased allele-specific expression of kitlga, which provides a mechanistic explanation for the increased melanophore abundance that causes the spot. By studying social interactions in the laboratory and in nature, we find that males with the false gravid spot experience less aggression; however, they also receive increased attention from other males and are disdained by females. These behavioral interactions may contribute to the maintenance of this phenotypic polymorphism in natural populations. We speculate that structural variants affecting gene regulation may be an underappreciated driver of balanced polymorphisms across diverse species.
Affiliation(s)
- Tristram O Dodge
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México.
- Bernard Y Kim
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- John J Baczenas
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Shreya M Banerjee
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, 475 Storer Mall, Davis, CA 95616, USA
- Theresa R Gunn
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Alex E Donny
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Lyle A Given
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Andreas R Rice
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Sophia K Haase Cox
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- M Luke Weinstein
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Ryan Cross
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Benjamin M Moran
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Kate Haber
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Berkeley High School, 1980 Allston Way, Berkeley, CA 94704, USA
- Nadia B Haghani
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Hannah R Gellert
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Kang Du
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA
- Stepfanie M Aguillon
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 612 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- M Scarlett Tudor
- Cooperative Extension and Aquaculture Research Institute, University of Maine, 33 Salmon Farm Road, Franklin, ME 04634, USA
- Carla Gutiérrez-Rodríguez
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
- Oscar Rios-Cardenas
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
- Molly R Morris
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Manfred Schartl
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA; Developmental Biochemistry, Biocenter, University of Würzburg, Am Hubland, 97074 Wuerzburg, Germany
- Daniel L Powell
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Biology, Louisiana State University, 202 Life Science Building, Baton Rouge, LA 70803, USA
- Molly Schumer
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Howard Hughes Medical Institute, 327 Campus Drive, Stanford, CA 94305, USA.
7
Yılmaz F, Karageorgiou C, Kim K, Pajic P, Scheer K, Beck CR, Torregrossa AM, Lee C, Gokcumen O. Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation. Science 2024:eadn0609. [PMID: 39418342 DOI: 10.1126/science.adn0609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 05/27/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024]
Abstract
Previous studies suggested that the copy number of the human salivary amylase gene, AMY1, correlates with starch-rich diets. However, evolutionary analyses are hampered by the absence of accurate, sequence-resolved haplotype variation maps. We identified 30 structurally distinct haplotypes at nucleotide resolution among 98 present-day humans, revealing that the coding sequences of AMY1 copies are evolving under negative selection. Genomic analyses of these haplotypes in archaic hominins and ancient human genomes suggest that a common three-copy haplotype, dating as far back as 800 KYA, has seeded rapidly evolving rearrangements through recurrent non-allelic homologous recombination. Additionally, haplotypes with more than three AMY1 copies have significantly increased in frequency among European farmers over the past 4,000 years, potentially as an adaptive response to increased starch digestion.
Affiliation(s)
- Feyza Yılmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Petar Pajic
- Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA
- Kendra Scheer
- Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- University of Connecticut, Institute for Systems Genomics, Storrs, CT 06269, USA
- The University of Connecticut Health Center, Farmington, CT 06032, USA
- Ann-Marie Torregrossa
- Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
- University at Buffalo Center for Ingestive Behavior Research, University at Buffalo, Buffalo, NY 14260, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA
8
Parmar JM, Laing NG, Kennerson ML, Ravenscroft G. Genetics of inherited peripheral neuropathies and the next frontier: looking backwards to progress forwards. J Neurol Neurosurg Psychiatry 2024; 95:992-1001. [PMID: 38744462 PMCID: PMC11503175 DOI: 10.1136/jnnp-2024-333436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Inherited peripheral neuropathies (IPNs) encompass a clinically and genetically heterogeneous group of disorders causing length-dependent degeneration of peripheral autonomic, motor and/or sensory nerves. Despite gold-standard diagnostic testing for pathogenic variants in over 100 known associated genes, many patients with IPN remain genetically unsolved. Providing patients with a diagnosis is critical for reducing their 'diagnostic odyssey', improving clinical care, and for informed genetic counselling. The last decade of massively parallel sequencing technologies has seen a rapid increase in the number of newly described IPN-associated gene variants contributing to IPN pathogenesis. However, the scarcity of additional families and functional data supporting variants in potential novel genes is prolonging patient diagnostic uncertainty and contributing to the missing heritability of IPNs. We review the last decade of IPN disease gene discovery to highlight novel genes, structural variation and short tandem repeat expansions contributing to IPN pathogenesis. From the lessons learnt, we provide our vision for IPN research as we anticipate the future, providing examples of emerging technologies, resources and tools that we propose will expedite the genetic diagnosis of unsolved IPN families.
Affiliation(s)
- Jevin M Parmar
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Nigel G Laing
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Preventive Genetics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Marina L Kennerson
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, New South Wales, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, New South Wales, Australia
- Gianina Ravenscroft
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
| |
Collapse
|
9
|
Ma C, Shi X, Li X, Zhang YP, Peng MS. Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data. BMC Genomics 2024; 25:970. [PMID: 39415108 PMCID: PMC11481438 DOI: 10.1186/s12864-024-10875-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 10/08/2024] [Indexed: 10/18/2024] Open
Abstract
BACKGROUND Structural variations (SVs) are widespread across the genome and have a great impact on evolution, disease, and phenotypic diversity. Despite the development of numerous bioinformatic tools, commonly referred to as SV callers, tailored for detecting SVs using whole genome sequence (WGS) data and employing diverse algorithms, their performance necessitates rigorous evaluation with real data and validated SVs. Moreover, a considerable proportion of these tools have been primarily designed and optimized using human genome data. Consequently, their applicability and performance in avian species, characterized by smaller genomes and distinct genomic architectures, remain inadequately assessed. RESULTS We performed a comprehensive assessment of the performance of ten widely used SV callers using population-level real genomic data with validated SVs of five common types. The performance of SV callers varies with the types and sizes of SVs. Compared with other tools, GRIDSS, Lumpy, Wham, and Manta present better detection accuracy. Pindel can detect more small SVs than others. CNVnator and CNVkit can detect more medium and large copy number variations. Given the poor consistency among different SV callers, the combination calling strategy is not recommended. All tools show poor ability in the detection of insertions (especially with size > 150 bp). At least 50× read depth is required to detect more than 80% of the SVs for most tools. CONCLUSIONS This study highlights the importance and necessity of using real sequencing data, rather than simulated data only, with validated SVs for SV caller evaluation. Some practical guidance and suggestions are provided for SV detection in future research.
Affiliation(s)
- Cheng Ma
  - Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC, Uppsala, SE-75123, Sweden
- Xian Shi
  - Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuzhen Li
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming, 650201, China
  - College of Biological Big Data, Yunnan Agriculture University, Kunming, 650201, China
- Ya-Ping Zhang
  - Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650091, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Min-Sheng Peng
  - Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
|
10
|
Schultz LM, Knighton A, Huguet G, Saci Z, Jean-Louis M, Mollon J, Knowles EEM, Glahn DC, Jacquemont S, Almasy L. Copy-number variants differ in frequency across genetic ancestry groups. HGG Advances 2024; 5:100340. [PMID: 39138864 PMCID: PMC11401192 DOI: 10.1016/j.xhgg.2024.100340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024] Open
Abstract
Copy-number variants (CNVs) have been implicated in a variety of neuropsychiatric and cognitive phenotypes. We found that deleterious CNVs are less prevalent in non-European ancestry groups than they are in European ancestry groups of both the UK Biobank (UKBB) and a US replication cohort (SPARK). We also identified specific recurrent CNVs that consistently differ in frequency across ancestry groups in both the UKBB and SPARK. These ancestry-related differences in CNV prevalence present in both an unselected community population and a family cohort enriched with individuals diagnosed with autism spectrum disorder (ASD) strongly suggest that genetic ancestry should be considered when probing associations between CNVs and health outcomes.
Affiliation(s)
- Laura M Schultz
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Alexys Knighton
  - School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Zohra Saci
  - CHU Sainte-Justine, Montréal, QC, Canada
- Josephine Mollon
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Emma E M Knowles
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- David C Glahn
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Sébastien Jacquemont
  - CHU Sainte-Justine, Montréal, QC, Canada; Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Laura Almasy
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
11
|
Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schönherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 2024; 16:117. [PMID: 39380090 PMCID: PMC11462820 DOI: 10.1186/s13073-024-01391-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 10/01/2024] [Indexed: 10/10/2024] Open
Abstract
BACKGROUND Repetitive genome regions, such as variable number of tandem repeats (VNTR) or short tandem repeats (STR), are major constituents of the uncharted dark genome and evade conventional sequencing approaches. The protein-coding LPA kringle IV type-2 (KIV-2) VNTR (5.6 kb per unit, 1-40 units per allele) is a medically highly relevant example with a particularly intricate structure, multiple haplotypes, intragenic homologies, and an intra-VNTR STR. It is the primary regulator of plasma lipoprotein(a) [Lp(a)] concentrations, an important cardiovascular risk factor. Lp(a) concentrations vary widely between individuals and ancestries. Multiple variants and functional haplotypes in the LPA gene and especially in the KIV-2 VNTR strongly contribute to this variance. METHODS We evaluated the performance of amplicon-based nanopore sequencing with unique molecular identifiers (UMI-ONT-Seq) for SNP detection, haplotype mapping, VNTR unit consensus sequence generation, and copy number estimation via coverage-corrected haplotype quantification in the KIV-2 VNTR. We used 15 human samples and low-level mixtures (0.5 to 5%) of KIV-2 plasmids as a validation set. We then applied UMI-ONT-Seq to extract KIV-2 VNTR haplotypes in 48 multi-ancestry 1000 Genomes samples and analyzed at scale a poorly characterized STR within the KIV-2 VNTR. RESULTS UMI-ONT-Seq detected KIV-2 SNPs down to 1% variant level with high sensitivity, specificity, and precision (0.977 ± 0.018; 1.000 ± 0.0005; 0.993 ± 0.02) and accurately retrieved the full-length haplotype of each VNTR unit. Human variant levels were highly correlated with next-generation sequencing (R² = 0.983) without bias across the whole variant level range. Six reads per UMI produced sequences of each KIV-2 unit with Q40 quality. The KIV-2 repeat number determined by coverage-corrected unique haplotype counting was in close agreement with droplet digital PCR (ddPCR), with 70% of the samples falling even within the narrow confidence interval of ddPCR. We then analyzed 62,679 intra-KIV-2 STR sequences and explored KIV-2 SNP haplotype patterns across five ancestries. CONCLUSIONS UMI-ONT-Seq accurately retrieves the SNP haplotype and precisely quantifies the VNTR copy number of each repeat unit of the complex KIV-2 VNTR region across multiple ancestries. This study utilizes the KIV-2 VNTR, presenting a novel and potent tool for comprehensive characterization of medically relevant complex genome regions at scale.
Affiliation(s)
- Stephan Amstler
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Gertraud Streiter
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Cathrin Pfurtscheller
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Lukas Forer
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Silvia Di Maio
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Hansi Weissensteiner
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Bernhard Paulweber
  - Department of Internal Medicine I, Paracelsus Medical University, Salzburg, Austria
- Sebastian Schönherr
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Florian Kronenberg
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Stefan Coassin
  - Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
|
12
|
Gaczorek T, Dudek K, Fritz U, Bahri-Sfar L, Baird SJE, Bonhomme F, Dufresnes C, Gvoždík V, Irwin D, Kotlík P, Marková S, McGinnity P, Migalska M, Moravec J, Natola L, Pabijan M, Phillips KP, Schöneberg Y, Souissi A, Radwan J, Babik W. Widespread Adaptive Introgression of Major Histocompatibility Complex Genes across Vertebrate Hybrid Zones. Mol Biol Evol 2024; 41:msae201. [PMID: 39324637 PMCID: PMC11472244 DOI: 10.1093/molbev/msae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 08/23/2024] [Accepted: 09/23/2024] [Indexed: 09/27/2024] Open
Abstract
Interspecific introgression is a potentially important source of novel variation of adaptive significance. Although multiple cases of adaptive introgression are well documented, broader generalizations about its targets and mechanisms are lacking. Multiallelic balancing selection, particularly when acting through rare allele advantage, is an evolutionary mechanism expected to favor adaptive introgression. This is because introgressed alleles are likely to confer an immediate selective advantage, facilitating their establishment in the recipient species even in the face of strong genomic barriers to introgression. Vertebrate major histocompatibility complex genes are well-established targets of long-term multiallelic balancing selection, so widespread adaptive major histocompatibility complex introgression is expected. Here, we evaluate this hypothesis using data from 29 hybrid zones formed by fish, amphibians, squamates, turtles, birds, and mammals at advanced stages of speciation. The key prediction of more extensive major histocompatibility complex introgression compared to genome-wide introgression was tested with three complementary statistical approaches. We found evidence for widespread adaptive introgression of major histocompatibility complex genes, providing a link between the process of adaptive introgression and an underlying mechanism. Our work identifies major histocompatibility complex introgression as a general mechanism by which species can acquire novel, and possibly regain previously lost, variation that may enhance defense against pathogens and increase adaptive potential.
Affiliation(s)
- T Gaczorek
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland
- K Dudek
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland
- U Fritz
  - Museum of Zoology (Museum für Tierkunde), Senckenberg Dresden, Dresden, Germany
- L Bahri-Sfar
  - Biodiversité, Parasitologie et Ecologie des Ecosystèmes Aquatiques, Faculté des Sciences de Tunis, Univ de Tunis El Manar, Tunis, Tunisia
- S J E Baird
  - Institute of Vertebrate Biology of the Czech Academy of Sciences, Brno, Czech Republic
- F Bonhomme
  - Institut des Sciences de l'Evolution, Université de Montpellier, Montpellier, France
- C Dufresnes
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d’Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- V Gvoždík
  - Institute of Vertebrate Biology of the Czech Academy of Sciences, Brno, Czech Republic
  - Department of Zoology, National Museum of the Czech Republic, Prague, Czech Republic
- D Irwin
  - Biodiversity Research Centre and Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- P Kotlík
  - Laboratory of Molecular Ecology, Institute of Animal Physiology and Genetics of the Czech Academy of Sciences, Liběchov, Czech Republic
- S Marková
  - Laboratory of Molecular Ecology, Institute of Animal Physiology and Genetics of the Czech Academy of Sciences, Liběchov, Czech Republic
- P McGinnity
  - School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland
- M Migalska
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland
- J Moravec
  - Department of Zoology, National Museum of the Czech Republic, Prague, Czech Republic
- L Natola
  - Biodiversity Research Centre and Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- M Pabijan
  - Institute of Zoology and Biomedical Research, Faculty of Biology, Jagiellonian University, Kraków, Poland
- K P Phillips
  - Laboratory of Molecular Ecology, Institute of Animal Physiology and Genetics of the Czech Academy of Sciences, Liběchov, Czech Republic
  - Canadian Rivers Institute, University of New Brunswick, Fredericton, New Brunswick, Canada
- Y Schöneberg
  - Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
  - Institute for Ecology, Evolution and Diversity, Goethe University, Frankfurt am Main, Germany
- A Souissi
  - Biodiversité, Parasitologie et Ecologie des Ecosystèmes Aquatiques, Faculté des Sciences de Tunis, Univ de Tunis El Manar, Tunis, Tunisia
  - MARBEC, Univ Montpellier, 34000 Montpellier, France
- J Radwan
  - Institute of Environmental Biology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- W Babik
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland
|
13
|
Dunn T, Zook JM, Holt JM, Narayanasamy S. Jointly benchmarking small and structural variant calls with vcfdist. Genome Biol 2024; 25:253. [PMID: 39358801 PMCID: PMC11446017 DOI: 10.1186/s13059-024-03394-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 09/17/2024] [Indexed: 10/04/2024] Open
Abstract
In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (-28.9%), INDELs (-19.3%), and SVs (-52.4%) across three datasets. vcfdist also corrects a common flaw in phasing evaluations, reducing measured flip errors by over 50%. Lastly, we show that vcfdist is more accurate than previously published works and on par with the newest approaches while providing improved result interpretability.
Affiliation(s)
- Tim Dunn
  - Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Justin M Zook
  - National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Satish Narayanasamy
  - Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
|
14
|
Lee H, Niida H, Sung S, Lee J. Haplotype-resolved de novo assembly revealed unique characteristics of alternative lengthening of telomeres in mouse embryonic stem cells. Nucleic Acids Res 2024:gkae842. [PMID: 39351882 DOI: 10.1093/nar/gkae842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 09/11/2024] [Accepted: 09/17/2024] [Indexed: 10/03/2024] Open
Abstract
Telomeres protect chromosome ends from DNA damage responses, and their dysfunction triggers genomic alterations like chromosome fusion and rearrangement, which can lead to cellular death. Certain cells, including specific cancer cells, adopt alternative lengthening of telomeres (ALT) to counteract dysfunctional telomeres and proliferate indefinitely. While telomere instability and ALT activity are likely major sources of genomic alteration, the patterns and consequences of such changes at the nucleotide level in ALT cells remain unexplored. Here we generated haplotype-resolved genome assemblies for type I ALT mouse embryonic stem cells, facilitated by highly accurate or ultra-long reads and Hi-C reads. The high-quality genome assemblies revealed ALT-specific complex chromosome end structures and various genomic alterations including over 1000 structural variants (SVs). The unique sequence (mTALT) used as a template for type I ALT telomeres showed traces of being recruited into the genome, with mTALT being replicated with remarkably high accuracy. Subtelomeric regions exhibited distinct characteristics: resistance to the accumulation of SVs and small variants. We genotyped SVs at allele resolution, identifying genes (Rgs6, Dpf3 and Tacc2) crucial for maintaining ALT telomere stability. Our genome assembly-based approach elucidated the unique characteristics of the ALT genome, offering insights into the genome evolution of cells surviving telomere-derived crisis.
Affiliation(s)
- Hyunji Lee
  - Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
- Hiroyuki Niida
  - Hamamatsu University School of Medicine, 1-20-1 Handayama, Chuo-ku, Hamamatsu city, Shizuoka 431-3192, Japan
- Sanghyun Sung
  - Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
- Junho Lee
  - Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
  - Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
|
15
|
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
- Adam English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
  - MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
|
16
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing repeat-aware clustering coupled with fast consensus sequence generation and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Luis F Paulin
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Dominic W Horner
  - Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Sairam Behera
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ester Kalef-Ezra
  - Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Mira Gandhi
  - Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Karl Hong
  - Bionano Genomics, San Diego, CA, USA
- Davut Pehlivan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Sonja W Scholz
  - Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
  - Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Claudia M B Carvalho
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Christos Proukakis
  - Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
|
17
|
Nesta A, Veiga DFT, Banchereau J, Anczukow O, Beck CR. Alternative splicing of transposable elements in human breast cancer. bioRxiv 2024:2024.09.26.615242. [PMID: 39386569 PMCID: PMC11463404 DOI: 10.1101/2024.09.26.615242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Transposable elements (TEs) drive genome evolution and can affect gene expression through diverse mechanisms. In breast cancer, disrupted regulation of TE sequences may facilitate tumor-specific transcriptomic alterations. We examine 142,514 full-length isoforms derived from long-read RNA sequencing (LR-seq) of 30 breast samples to investigate the effects of TEs on the breast cancer transcriptome. Approximately half of these isoforms contain TE sequences, and these contribute to half of the novel annotated splice junctions. We quantify splicing of these LR-seq derived isoforms in 1,135 breast tumors from The Cancer Genome Atlas (TCGA) and 1,329 healthy tissue samples from the Genotype-Tissue Expression (GTEx) project, and find 300 TE-overlapping tumor-specific splicing events. Some splicing events are enriched in specific breast cancer subtypes, for example a TE-driven transcription start site upstream of ERBB2 in HER2+ tumors, and several TE-mediated splicing events are associated with patient survival and poor prognosis. The full-length sequences we capture with LR-seq reveal thousands of isoforms with signatures of RNA editing, including a novel isoform belonging to RHOA, a gene previously implicated in tumor progression. We utilize our full-length isoforms to discover polymorphic TE insertions that alter splicing and validate one of these events in breast cancer cell lines. Together, our results demonstrate the widespread effects of dysregulated TEs on breast cancer transcriptomes and highlight the advantages of long-read isoform sequencing for understanding TE biology. TE-derived isoforms may alter the expression of genes important in cancer and can potentially be used as novel, disease-specific therapeutic targets or biomarkers.
Affiliation(s)
- Alex Nesta
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
- Diogo F. T. Veiga
  - Department of Translational Medicine, School of Medical Sciences, University of Campinas (UNICAMP), Campinas, SP 13083, Brazil
- Jacques Banchereau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
  - Immunoledge LLC, Montclair, NJ, 07042, USA
- Olga Anczukow
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Christine R. Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
|
18
|
Zhou B, Arthur JG, Guo H, Kim T, Huang Y, Pattni R, Wang T, Kundu S, Luo JXJ, Lee H, Nachun DC, Purmann C, Monte EM, Weimer AK, Qu PP, Shi M, Jiang L, Yang X, Fullard JF, Bendl J, Girdhar K, Kim M, Chen X, Greenleaf WJ, Duncan L, Ji HP, Zhu X, Song G, Montgomery SB, Palejev D, Zu Dohna H, Roussos P, Kundaje A, Hallmayer JF, Snyder MP, Wong WH, Urban AE. Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders. Cell 2024:S0092-8674(24)01032-8. [PMID: 39353437 DOI: 10.1016/j.cell.2024.09.014] [Received: 09/12/2023] [Revised: 07/01/2024] [Accepted: 09/10/2024] [Indexed: 10/04/2024]
Abstract
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
Affiliation(s)
- Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA.
- Joseph G Arthur
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Hanmin Guo
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Taeyoung Kim
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
- Yiling Huang
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Tao Wang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Soumya Kundu
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Jay X J Luo
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Daniel C Nachun
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Carolin Purmann
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Emma M Monte
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Annika K Weimer
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Ping-Ping Qu
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Minyi Shi
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Lixia Jiang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Xinqiong Yang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Kiran Girdhar
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Minsu Kim
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
- Xi Chen
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Laramie Duncan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Xiang Zhu
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA; Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea; Center for Artificial Intelligence Research, Pusan National University, Busan 46241, South Korea
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Dean Palejev
- Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia 1113, Bulgaria
- Heinrich Zu Dohna
- Department of Biology, American University of Beirut, Beirut 11-0236, Lebanon
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA; Mental Illness Research Education and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Joachim F Hallmayer
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
- Michael P Snyder
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Wing H Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
- Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA; Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305, USA.

19
Luan M, Chen K, Zhao W, Tang M, Wang L, Liu S, Zhu L, Xie S. Selective Effect of DNA N6-Methyladenosine Modification on Transcriptional Genetic Variations in East Asian Samples. Int J Mol Sci 2024; 25:10400. [PMID: 39408729 PMCID: PMC11477068 DOI: 10.3390/ijms251910400] [Received: 08/23/2024] [Revised: 09/23/2024] [Accepted: 09/24/2024] [Indexed: 10/20/2024] [Open Access]
Abstract
Genetic variation and DNA modification are two common factors that are ubiquitous across the human genome and contribute to human disease, especially static genetic variations in DNA or RNA that cause genetic diseases. DNA N6-methyladenosine (6mA) methylation, as a new epigenetic modification mark, has been widely studied for its role in regulating biological processes in humans. However, the effect of DNA modification on dynamic transcriptional genetic variations from DNA to RNA has rarely been reported. Here, we identified DNA, RNA and transcriptional genetic variations from Illumina short-read sequencing data in East Asian samples (HX1 and AK1) and detected global DNA 6mA modification using single-molecule real-time (SMRT) sequencing data. We decoded the effects of DNA 6mA modification on transcriptional genetic variations in East Asian samples and the results were extensively verified in the HeLa cell line. DNA 6mA modification had a stable distribution in the East Asian samples and the methylated genes were less likely to mutate than the non-methylated genes. For methylated genes, the 6mA density was positively correlated with the number of variations. DNA 6mA modification had a selective effect on transcriptional genetic variations from DNA to RNA, in which the dynamic transcriptional variations of heterozygous (0/1 to 0/1) and homozygous (1/1 to 1/1) genotypes were significantly affected by 6mA modification. The effect of DNA methylation on transcriptional genetic variations provides new insights into the factors influencing DNA-to-RNA transcriptional regulation in the central dogma of molecular biology.
Affiliation(s)
- Meiwei Luan
- School of Basic Medicine, Harbin Medical University, Harbin 150081, China
- Kaining Chen
- Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 511436, China
- Wenwen Zhao
- College of Forestry, Hainan University, Haikou 570228, China
- Minqiang Tang
- College of Forestry, Hainan University, Haikou 570228, China
- Lingxia Wang
- College of Forestry, Hainan University, Haikou 570228, China
- Shoubai Liu
- College of Forestry, Hainan University, Haikou 570228, China
- Linan Zhu
- School of Mechanical and Materials Engineering, Washington State University, Pullman, WA 99163, USA
- Shangqian Xie
- College of Forestry, Hainan University, Haikou 570228, China

20
Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. bioRxiv 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Affiliation(s)
- Glennis A Logsdon
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Mark Loftus
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Carolyn A Paisie
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Gianni V Martino
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
- Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Marc Jan Bonder
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
- Haoyu Cheng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Zechen Chong
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
- Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Lisbeth A Guethlein
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Yunzhe Jiang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Youngjun Kwon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Chong Li
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Jiaqi Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Paul J Norman
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
- Keisuke K Oshima
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Nicholas R Pollock
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Mikko Rautiainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Yuwei Song
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
- Arda Söylev
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Vasiliki Tsapalou
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
- Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Stanford Health Care, Palo Alto, CA, USA
- Ryan E Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Xinghua Shi
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
- Mike E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Alexander T Dilthey
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany

21
Mastoras M, Asri M, Brambrink L, Hebbar P, Kolesnikov A, Cook DE, Nattestad M, Lucas J, Won TS, Chang PC, Carroll A, Paten B, Shafin K. Highly accurate assembly polishing with DeepPolisher. bioRxiv 2024:2024.09.17.613505. [PMID: 39345401 PMCID: PMC11429912 DOI: 10.1101/2024.09.17.613505] [Indexed: 10/01/2024]
Abstract
Accurate genome assemblies are essential for biological research, but even the highest quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and under-polishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using PacBio HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHARAOH (Phasing Reads in Areas Of Homozygosity), which uses ultra-long ONT data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by half, with a greater than 70% reduction in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted Quality Value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
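The QV and error-reduction figures quoted in this abstract can be related with a short calculation. The sketch below assumes the standard Phred-scaled definition, QV = -10 · log10(per-base error rate); the function names are illustrative and are not taken from DeepPolisher or its pipeline.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_reduction_from_qv_gain(delta_qv: float) -> float:
    """Fraction of errors removed when QV improves by delta_qv.

    If QV rises by delta_qv, the error rate is multiplied by
    10 ** (-delta_qv / 10), so the removed fraction is one minus that factor.
    """
    return 1 - 10 ** (-delta_qv / 10)


# A QV gain of 3.4 (the value quoted in the abstract) under this definition.
reduction = error_reduction_from_qv_gain(3.4)
print(f"QV +3.4 -> {reduction:.1%} fewer errors")
```

Under this assumed definition, a gain of 3.4 QV corresponds to removing roughly 54% of errors, which agrees with the figure stated in the abstract.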
Affiliation(s)
- Mira Mastoras
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Julian Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Taylor S. Won
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA

22
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] [Open Access]
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various lengths and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
- Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA

23
Aqil A, Li Y, Wang Z, Islam S, Russell M, Kallak TK, Saitou M, Gokcumen O, Masuda N. Switch-like Gene Expression Modulates Disease Susceptibility. Research Square 2024:rs.3.rs-4974188. [PMID: 39315271 PMCID: PMC11419265 DOI: 10.21203/rs.3.rs-4974188/v1] [Indexed: 09/25/2024]
Abstract
A fundamental challenge in biomedicine is understanding the mechanisms predisposing individuals to disease. While previous research has suggested that switch-like gene expression is crucial in driving biological variation and disease susceptibility, a systematic analysis across multiple tissues is still lacking. By analyzing transcriptomes from 943 individuals across 27 tissues, we identified 1,013 switch-like genes. We found that only 31 (3.1%) of these genes exhibit switch-like behavior across all tissues. These universally switch-like genes appear to be genetically driven, with large exonic genomic structural variants explaining five (~18%) of them. The remaining switch-like genes exhibit tissue-specific expression patterns. Notably, tissue-specific switch-like genes tend to be switched on or off in unison within individuals, likely under the influence of tissue-specific master regulators, including hormonal signals. Among our most significant findings, we identified hundreds of concordantly switched-off genes in the stomach and vagina that are linked to gastric cancer (41-fold, p < 10⁻⁴) and vaginal atrophy (44-fold, p < 10⁻⁴), respectively. Experimental analysis of vaginal tissues revealed that low systemic levels of estrogen lead to a significant reduction in both the epithelial thickness and the expression of the switch-like gene ALOX12. We propose a model wherein the switching off of driver genes in basal and parabasal epithelium suppresses cell proliferation therein, leading to epithelial thinning and, therefore, vaginal atrophy. Our findings underscore the significant biomedical implications of switch-like gene expression and lay the groundwork for potential diagnostic and therapeutic applications.
Affiliation(s)
- Alber Aqil
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, USA
- Yanyan Li
- Department of Mathematics, State University of New York at Buffalo, Buffalo, NY, USA
- Zhiliang Wang
- Department of Mathematics, State University of New York at Buffalo, Buffalo, NY, USA
- Saiful Islam
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, NY, USA
- Madison Russell
- Department of Mathematics, State University of New York at Buffalo, Buffalo, NY, USA
- Marie Saitou
- Faculty of Biosciences, Norwegian University of Life Sciences, Aas, Norway
- Omer Gokcumen
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, USA
- Naoki Masuda
- Department of Mathematics, State University of New York at Buffalo, Buffalo, NY, USA
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, NY, USA
|
24
|
Kostos P, Galligos A, Gerton JL. Ribosomes unraveled: The path from variant to impact. Cell Genomics 2024; 4:100658. [PMID: 39265527 PMCID: PMC11480852 DOI: 10.1016/j.xgen.2024.100658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 08/20/2024] [Accepted: 08/20/2024] [Indexed: 09/14/2024]
Abstract
In this issue of Cell Genomics, Rothschild et al.¹ reveal how ribosomal RNA diversity impacts ribosome structure and its implications for health and disease. Their innovative methodologies uncover distinct ribosome subtypes with significant structural variations and expression patterns. This work reveals connections to tissue-specific biology and cancer, positing new research avenues.
Affiliation(s)
- Paxton Kostos
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Anna Galligos
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
|
25
|
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and their effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
- Ian Holmes
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
26
|
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024]
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present [nonreference sequence (NRS)] in hg38, which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
|
27
|
Jiang T, Zhou Z, Zhang Z, Cao S, Wang Y, Liu Y. MEHunter: transformer-based mobile element variant detection from long reads. Bioinformatics (Oxford, England) 2024; 40:btae557. [PMID: 39287014 DOI: 10.1093/bioinformatics/btae557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 09/03/2024] [Accepted: 09/13/2024] [Indexed: 09/19/2024]
Abstract
SUMMARY Mobile genetic elements (MEs) are heritable mutagens that significantly contribute to genetic diseases. The advent of long-read sequencing technologies, capable of resolving large DNA fragments, offers promising prospects for the comprehensive detection of ME variants (MEVs). However, achieving high precision while maintaining recall remains challenging, mainly because of the variable length and similar content of MEV signatures, which are often obscured by the noise in long reads. Here, we propose MEHunter, a high-performance MEV detection approach utilizing a fine-tuned transformer model adept at identifying potential MEVs with fragmented features. Benchmark experiments on both simulated and real datasets demonstrate that MEHunter consistently achieves higher accuracy and sensitivity than the state-of-the-art tools. Furthermore, it is capable of detecting novel, potentially individual-specific MEVs that have been overlooked in published population projects. AVAILABILITY AND IMPLEMENTATION MEHunter is available from https://github.com/120L021101/MEHunter.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
- Zuji Zhou
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
- Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
|
28
|
Mirus T, Lohmayer R, Döhring C, Halldórsson BV, Kehr B. GGTyper: genotyping complex structural variants using short-read sequencing data. Bioinformatics 2024; 40:ii11-ii19. [PMID: 39230689 PMCID: PMC11373317 DOI: 10.1093/bioinformatics/btae391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
MOTIVATION Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs. RESULTS Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
Affiliation(s)
- Tim Mirus
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Robert Lohmayer
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Clementine Döhring
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Bjarni V Halldórsson
- deCODE genetics/Amgen Inc, Reykjavik 101, Iceland
- School of Technology, Reykjavik University, Reykjavik 102, Iceland
- Birte Kehr
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Fakultät für Informatik und Data Science, Universität Regensburg, Regensburg 93053, Germany
|
29
|
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024; 25:658-670. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK.
|
30
|
Negi S, Stenton SL, Berger SI, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. medRxiv: the preprint server for health sciences 2024:2024.08.22.24312327. [PMID: 39228712 PMCID: PMC11370519 DOI: 10.1101/2024.08.22.24312327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare disease cohort of 98 samples, including 41 probands and some family members, using nanopore sequencing, achieving per sample ∼36x average coverage and 32 kilobase (kb) read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including SVs and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants, and could be used to distinguish postzygotic mosaic variants from prezygotic de novos. Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
|
31
|
Sugiyama Y, Okada S, Daigaku Y, Kusumoto E, Ito T. Strategic targeting of Cas9 nickase induces large segmental duplications. Cell Genomics 2024; 4:100610. [PMID: 39053455 PMCID: PMC11406185 DOI: 10.1016/j.xgen.2024.100610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 04/15/2024] [Accepted: 07/02/2024] [Indexed: 07/27/2024]
Abstract
Gene/segmental duplications play crucial roles in genome evolution and variation. Here, we introduce paired nicking-induced amplification (PNAmp) for their experimental induction. PNAmp strategically places two Cas9 nickases upstream and downstream of a replication origin on opposite strands. This configuration directs the sister replication forks initiated from the origin to break at the nicks, generating a pair of one-ended double-strand breaks. If homologous sequences flank the two break sites, then end resection converts them to single-stranded DNAs that readily anneal to drive duplication of the region bounded by the homologous sequences. PNAmp induces duplication of segments as large as ∼1 Mb with efficiencies exceeding 10% in the budding yeast Saccharomyces cerevisiae. Furthermore, appropriate splint DNAs allow PNAmp to duplicate/multiplicate even segments not bounded by homologous sequences. We also provide evidence for PNAmp in mammalian cells. Therefore, PNAmp provides a prototype method to induce structural variations by manipulating replication fork progression.
Affiliation(s)
- Yuki Sugiyama
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka 812-8582, Japan
- Satoshi Okada
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka 812-8582, Japan
- Yasukazu Daigaku
- Cancer Genome Dynamics Project, Cancer Institute, Japanese Foundation for Cancer Research, Tokyo 135-8550, Japan
- Emiko Kusumoto
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka 812-8582, Japan
- Takashi Ito
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka 812-8582, Japan.
|
32
|
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 PMCID: PMC11322167 DOI: 10.1038/s41467-024-51282-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024]
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
Affiliation(s)
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, Nashville, TN, USA.
|
33
|
Plender EG, Prodanov T, Hsieh P, Nizamis E, Harvey WT, Sulovari A, Munson KM, Kaufman EJ, O'Neal WK, Valdmanis PN, Marschall T, Bloom JD, Eichler EE. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. Am J Hum Genet 2024; 111:1700-1716. [PMID: 38991590 PMCID: PMC11344006 DOI: 10.1016/j.ajhg.2024.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 06/14/2024] [Accepted: 06/17/2024] [Indexed: 07/13/2024]
Abstract
The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
Affiliation(s)
- Elizabeth G Plender
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Genetics, Cell Biology, and Development, University of Minnesota Medical School, Minneapolis, MN 55455, USA
- Evangelos Nizamis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Eli J Kaufman
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Wanda K O'Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Paul N Valdmanis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- Jesse D Bloom
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
|
34
|
Said I, Barbash DA, Clark AG. The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry. Genome Biol Evol 2024; 16:evae153. [PMID: 39018452 PMCID: PMC11305138 DOI: 10.1093/gbe/evae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 06/21/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024]
Abstract
Although repetitive DNA forms much of the human genome, its study is challenging due to limitations in assembly and alignment of repetitive short reads. We have deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. We find broad homogeneity of the most abundant repeats across populations, except for AG-rich repeats, which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed and form higher-order structures that covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, ancestral monomers of Human Satellite 2 and Human Satellite 3 abundances correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans.
Affiliation(s)
- Iskander Said
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
|
35
|
L Rocha J, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Affiliation(s)
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
- Runyang N Lou
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
- Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA.
|
36
|
Kamitaki N, Hujoel MLA, Mukamel RE, Gebara E, McCarroll SA, Loh PR. A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation. Nat Genet 2024; 56:1583-1591. [PMID: 39048794 PMCID: PMC11319198 DOI: 10.1038/s41588-024-01841-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 06/21/2024] [Indexed: 07/27/2024]
Abstract
Retrotransposons comprise about 45% of the human genome¹, but their contributions to human trait variation and evolution are only beginning to be explored²,³. Here, we find that a sequence of SVA retrotransposon insertions in an early intron of the ASIP (agouti signaling protein) gene has probably shaped human pigmentation several times. In the UK Biobank (n = 169,641), a recent 3.3-kb SVA insertion polymorphism associated strongly with lighter skin pigmentation (0.22 [0.21-0.23] s.d.; P = 2.8 × 10⁻³⁵¹) and increased skin cancer risk (odds ratio = 1.23 [1.18-1.27]; P = 1.3 × 10⁻²⁸), appearing to underlie one of the strongest common genetic influences on these phenotypes within European populations⁴⁻⁶. ASIP expression in skin displayed the same association pattern, with the SVA insertion allele exhibiting 2.2-fold (1.9-2.6) increased expression. This effect had an unusual apparent mechanism: an earlier, nonpolymorphic, human-specific SVA retrotransposon 3.9 kb upstream appeared to have caused ASIP hypofunction by nonproductive splicing, which the new (polymorphic) SVA insertion largely eliminated. Extended haplotype homozygosity indicated that the insertion allele has risen to allele frequencies up to 11% in European populations over the past several thousand years. These results indicate that a sequence of retrotransposon insertions contributed to a species-wide increase, then a local decrease, of human pigmentation.
Affiliation(s)
- Nolan Kamitaki
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Department of Genetics, Harvard Medical School, Boston, MA, USA.
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Margaux L A Hujoel
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ronen E Mukamel
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Edward Gebara
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Steven A McCarroll
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Po-Ru Loh
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
37
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Jordan M Eizenga
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Arun Das
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Katharine M Jenike
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karen H Miga
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Jean Monlong
  - Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Benedict Paten
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
38
Liu C, Wu P, Wu X, Zhao X, Chen F, Cheng X, Zhu H, Wang O, Xu M. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline. Front Genet 2024; 15:1421565. [PMID: 39130747 PMCID: PMC11310137 DOI: 10.3389/fgene.2024.1421565] [Received: 04/22/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024]
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct complete haplotypes due to the high sequencing error rates of long reads and the limited capturing efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating both contiguous and accurate diploid genomes. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with reduced misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Affiliation(s)
- Chao Liu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Pei Wu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Xue Wu
  - BGI Research, Shenzhen, China
- Hongmei Zhu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Ou Wang
  - BGI Research, Shenzhen, China
- Mengyang Xu
  - BGI Research, Shenzhen, China
  - BGI Research, Qingdao, China
39
Sarwal V, Lee S, Yang J, Sankararaman S, Chaisson M, Eskin E, Mangul S. VISTA: an integrated framework for structural variant discovery. Brief Bioinform 2024; 25:bbae462. [PMID: 39297879 PMCID: PMC11411772 DOI: 10.1093/bib/bbae462] [Received: 03/04/2024] [Revised: 08/27/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024]
Abstract
Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
Affiliation(s)
- Varuni Sarwal
  - Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States
- Seungmo Lee
  - Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States
- Jianzhi Yang
  - Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States
- Mark Chaisson
  - Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States
- Eleazar Eskin
  - Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States
- Serghei Mangul
  - Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States
  - Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, United States
40
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024]
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
41
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024]
Abstract
The successful accomplishment of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieved localized haplotype phasing and relied on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To shed light on this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, here we provided a novel methodology that enabled interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
Affiliation(s)
- Xiuzhen Bai
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Zonggui Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Kexuan Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Zixin Wu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Rui Wang
  - Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
- Jun'e Liu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Liang Chang
  - State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
  - Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Lu Wen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
  - Changping Laboratory, Beijing, China.
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
  - School of Life Sciences, Peking University, Beijing, China.
42
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Junfan Zhao
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiao Gong
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China.
43
Jia H, Tan S, Cai Y, Guo Y, Shen J, Zhang Y, Ma H, Zhang Q, Chen J, Qiao G, Ruan J, Zhang YE. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat Commun 2024; 15:5644. [PMID: 38969648 PMCID: PMC11226609 DOI: 10.1038/s41467-024-49992-6] [Received: 11/01/2023] [Accepted: 06/20/2024] [Indexed: 07/07/2024]
Abstract
Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.
Affiliation(s)
- Hangxing Jia
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Shengjun Tan
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Yingao Cai
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yanyan Guo
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jieyu Shen
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yaqiong Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Huijing Ma
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Qingzhu Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinfeng Chen
  - University of Chinese Academy of Sciences, Beijing, China
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Gexia Qiao
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- Yong E Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
  - University of Chinese Academy of Sciences, Beijing, China.
44
Kramer M, Goodwin S, Wappel R, Borio M, Offit K, Feldman DR, Stadler ZK, McCombie WR. Exploring the genetic and epigenetic underpinnings of early-onset cancers: Variant prioritization for long read whole genome sequencing from family cancer pedigrees. bioRxiv 2024:2024.06.27.601096. [PMID: 39005350 PMCID: PMC11244929 DOI: 10.1101/2024.06.27.601096] [Indexed: 07/16/2024]
Abstract
Despite significant advances in our understanding of genetic cancer susceptibility, known inherited cancer predisposition syndromes explain at most 20% of early-onset cancers. As early-onset cancer prevalence continues to increase, the need to assess previously inaccessible areas of the human genome, harnessing a trio or quad family-based architecture for variant filtration, may reveal further insights into cancer susceptibility. To assess a broader spectrum of variation than can be ascertained by multi-gene panel sequencing, or even whole genome sequencing with short reads, we employed long read whole genome sequencing using an Oxford Nanopore Technology (ONT) PromethION of 3 families containing an early-onset cancer proband using a trio or quad family architecture. Analysis included 2 early-onset colorectal cancer family trios and one quad consisting of two siblings with testicular cancer, all with unaffected parents. Structural variants (SVs), epigenetic profiles and single nucleotide variants (SNVs) were determined for each individual, and a filtering strategy was employed to refine and prioritize candidate variants based on the family architecture. The family architecture enabled us to focus on inapposite variants while filtering variants shared with the unaffected parents, significantly decreasing background variation that can hamper identification of potentially disease-causing differences. Candidate de novo and compound heterozygous variants were identified in this way. Gene expression, in matched neoplastic and pre-neoplastic lesions, was assessed for one trio. Our study demonstrates the feasibility of a streamlined analysis of genomic variants from long read ONT whole genome sequencing and a way to prioritize key variants for further evaluation of pathogenicity, while revealing what may be missing from panel-based analyses.
45
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
| | - Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
| |
|
46
|
Li W, Xu M, Zhang Z, Liang J, Fu R, Lin W, Luo W, Zhang X, Ren T. Regulatory Effects of 198-bp Structural Variants in the GSTA2 Promoter Region on Adipogenesis in Chickens. Int J Mol Sci 2024; 25:7155. [PMID: 39000259 PMCID: PMC11241197 DOI: 10.3390/ijms25137155] [Received: 05/13/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] [Indexed: 07/16/2024] Open
Abstract
Molecular breeding accelerates animal breeding and improves efficiency by utilizing genetic mutations. Structural variations (SVs), a significant source of genetic mutations, have a greater impact on phenotypic variation than SNPs. Understanding SV functional mechanisms and obtaining precise information are crucial for molecular breeding. In this study, association analysis revealed significant correlations between 198-bp SVs in the GSTA2 promoter region and abdominal fat weight, intramuscular fat content, and subcutaneous fat thickness in chickens. High expression of GSTA2 in adipose tissue was positively correlated with the abdominal fat percentage, and different genotypes of GSTA2 exhibited varied expression patterns in the liver. The 198-bp SVs regulate GSTA2 expression by binding to different transcription factors. Overexpression of GSTA2 promoted preadipocyte proliferation and differentiation, while interference had the opposite effect. Mechanistically, the 198-bp fragment contains binding sites for transcription factors such as C/EBPα that regulate GSTA2 expression and fat synthesis. These SVs are significantly associated with chicken fat traits, positively influencing preadipocyte development by regulating cell proliferation and differentiation. Our work provides compelling evidence for the use of 198-bp SVs in the GSTA2 promoter region as molecular markers for poultry breeding and offers new insights into the pivotal role of the GSTA2 gene in fat generation.
Affiliation(s)
- Wangyu Li
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Meng Xu
- College of Veterinary Medicine, Jilin University, Changchun 130062, China
| | - Zihao Zhang
- College of Coastal Agricultural Sciences, Guangdong Ocean University, Zhanjiang 524088, China
| | - Jiaying Liang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Rong Fu
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Wujian Lin
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Wen Luo
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Xiquan Zhang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
| | - Tuanhui Ren
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory of Genome and Molecular Breeding of Agricultural Animals and Key Laboratory of Chicken Genetic Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangzhou 510642, China
- College of Veterinary Medicine, Jilin University, Changchun 130062, China
| |
|
47
|
Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N, Chao KR, Walker MA, Lyu Y, Rehm HL, Neale BM, Talkowski ME, Daly MJ, Brand H, Karczewski KJ, Atkinson EG, Martin AR. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res 2024; 34:796-809. [PMID: 38749656 PMCID: PMC11216312 DOI: 10.1101/gr.278378.123] [Received: 08/10/2023] [Accepted: 05/07/2024] [Indexed: 05/18/2024]
Abstract
Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
Affiliation(s)
- Zan Koenig
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Mary T Yohannes
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Lethukuthula L Nkambule
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Julia K Goodrich
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Heesu Ally Kim
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael W Wilson
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Stephanie P Hao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Nareh Sahakian
- Broad Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02141, USA
| | - Katherine R Chao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Mark A Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Yunfei Lyu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Benjamin M Neale
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael E Talkowski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Mark J Daly
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Institute for Molecular Medicine Finland, 00290 Helsinki, Finland
| | - Harrison Brand
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Konrad J Karczewski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Elizabeth G Atkinson
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Alicia R Martin
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| |
|
48
|
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. medRxiv 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| | - Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| |
|
49
|
Catlin NS, Agha HI, Platts AE, Munasinghe M, Hirsch CN, Josephs EB. Structural variants contribute to phenotypic variation in maize. bioRxiv 2024:2024.06.14.599082. [PMID: 38948717 PMCID: PMC11212879 DOI: 10.1101/2024.06.14.599082] [Indexed: 07/02/2024]
Abstract
Comprehensively identifying the loci shaping trait variation has been challenging, in part because standard approaches often miss many types of genetic variants. Structural variants, especially transposable elements, are likely to affect phenotypic variation, but we need better methods in maize for detecting polymorphic structural variants and TEs using short-read sequencing data. Here, we used a whole genome alignment between two maize genotypes to identify polymorphic structural variants and then genotyped a large maize diversity panel for these variants using short-read sequencing data. We characterized variation of SVs within the panel and identified SV polymorphisms that are associated with life history traits and genotype-by-environment interactions. While most of the SVs associated with traits contained TEs, only one of the SV's boundaries clearly matched TE breakpoints indicative of a TE insertion, whereas the other polymorphisms were likely caused by deletions. All of the SVs associated with traits were in linkage disequilibrium with nearby single nucleotide polymorphisms (SNPs), suggesting that this method did not identify variants that would have been missed in a SNP association study.
Affiliation(s)
- Nathan S. Catlin
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | - Husain I. Agha
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | - Adrian E. Platts
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Manisha Munasinghe
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Emily B. Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| |
|
50
|
Liang H, Sedillo JC, Schrodi SJ, Ikeda A. Structural variants in linkage disequilibrium with GWAS-significant SNPs. Heliyon 2024; 10:e32053. [PMID: 38882374 PMCID: PMC11177133 DOI: 10.1016/j.heliyon.2024.e32053] [Received: 01/12/2024] [Revised: 05/17/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024] Open
Abstract
With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium. The database is composed of (i) SNPs that have exhibited genome-wide significant association with traits, primarily disease phenotypes, (ii) newly released structural variants (SVs), and (iii) linkage disequilibrium values calculated from unphased data. All data files, including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs, are available at the SV-SNP LD Database and can be accessed at https://github.com/hliang-SchrodiLab/SV_SNPs. Our analysis results represent a useful fine mapping tool for interrogating SVs in linkage disequilibrium with disease-associated SNPs. We anticipate that this resource may play an important role in subsequent studies which investigate incorporating disease causing SVs into disease risk prediction models.
Affiliation(s)
- Hao Liang
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
| | - Joni C Sedillo
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
| | - Steven J Schrodi
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
| | - Akihiro Ikeda
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- McPherson Eye Research Institute, University of Wisconsin-Madison, Madison, WI, USA
| |
|