1
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 PMCID: PMC11433896 DOI: 10.4103/nrr.nrr-d-23-01568] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] [Open Access]
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Zhangjiang Fudan International Innovation Center, Shanghai, China
2
Liu X, Gu L, Hao C, Xu W, Leng F, Zhang P, Li W. Systematic assessment of structural variant annotation tools for genomic interpretation. Life Sci Alliance 2025; 8:e202402949. [PMID: 39658089 PMCID: PMC11632063 DOI: 10.26508/lsa.202402949] [Received: 07/17/2024] [Revised: 11/30/2024] [Accepted: 12/02/2024] [Indexed: 12/12/2024] [Open Access]
Abstract
Structural variants (SVs) over 50 base pairs play a significant role in phenotypic diversity and are associated with various diseases, but their analysis is complex and resource-intensive. Numerous computational tools have been developed for SV prioritization, yet their effectiveness in biomedicine remains unclear. Here we benchmarked eight widely used SV prioritization tools, categorized into knowledge-driven (AnnotSV, ClassifyCNV) and data-driven (CADD-SV, dbCNV, StrVCTVRE, SVScore, TADA, XCNV) groups in accordance with the ACMG guidelines. We assessed their accuracy, robustness, and usability across diverse genomic contexts, biological mechanisms and computational efficiency using seven carefully curated independent datasets. Our results revealed that both groups of methods exhibit comparable effectiveness in predicting SV pathogenicity, although performance varies among tools, emphasizing the importance of selecting the appropriate tool based on specific research purposes. Furthermore, we pinpointed the potential improvement of expanding these tools for future applications. Our benchmarking framework provides a crucial evaluation method for SV analysis tools, offering practical guidance for biomedical research and facilitating the advancement of better genomic research tools.
Affiliation(s)
- Xuanshi Liu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Lei Gu
- Epigenetics Laboratory, Max-Planck Institute for Heart and Lung Research, Cardiopulmonary Institute, Bad Nauheim, Germany
- Chanjuan Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wenjian Xu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Fei Leng
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wei Li
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
3
Zhang S, Gao Y, Wang X, Li Q, Tan J, Liang B, Gao M, Wu J, Ling X, Liu J, Teng X, Li H, Sun Y, Huang W, Tong X, Lei C, Li H, Wang J, Li S, Xu X, Zhang J, Wu W, Liang S, Ou J, Zhao Q, Jin R, Zhang Y, Xu C, Lu D, Yan J, Sun X, Choy KW, Xu C, Chen ZJ. Preimplantation genetic testing for structural rearrangements by genome-wide SNP genotyping and haplotype analysis: a prospective multicenter clinical study. EBioMedicine 2024; 111:105514. [PMID: 39708428 DOI: 10.1016/j.ebiom.2024.105514] [Received: 10/13/2024] [Revised: 12/05/2024] [Accepted: 12/06/2024] [Indexed: 12/23/2024] [Open Access]
Abstract
BACKGROUND Preimplantation genetic testing for chromosomal structural rearrangements (PGT-SR) has been widely used to select euploid embryos in patients carrying balanced chromosomal rearrangements (BCRs) by chromosome copy number analysis. However, reliable and extensively validated PGT-SR methods for selecting embryos without BCRs in large-cohort studies are lacking.
METHODS In this prospective, multicenter cohort study, carriers of BCRs undergoing PGT-SR were recruited across 12 academic fertility centers within China. PGT-SR was performed using a genome-wide SNP genotyping and haplotyping approach. Parental haplotypes were phased using available genotypes from a close relative or an unbalanced embryo. The karyotypes of embryos were inferred from the haplotypes. Only a single embryo was transferred in each cycle.
FINDINGS Between April 2018 and March 2023, 1298 carriers were randomly enrolled. A total of 7867 blastocysts from 1603 PGT-SR cycles were biopsied, of which 7750 (98.51%) were successfully genotyped and analyzed. Overall, 75.98% (1218/1603) of cycles obtained euploid embryos and 53.15% (852/1603) generated non-carrier embryos. The proportion of carrier and non-carrier embryos was similar in different subgroups. A total of 1030 non-carrier and 439 carrier embryos were transferred, and 817 healthy babies were delivered cumulatively. Our results demonstrate that the SNP-haplotyping method is highly accurate (sensitivity 95% CI: 98.34%-100%; specificity 95% CI: 96.63%-100%) and can be applied universally to different BCR types. Moreover, the clinical outcomes were comparable between the carrier and non-carrier embryo groups.
INTERPRETATION This study demonstrates the effectiveness of the genome-wide SNP-genotyping and haplotyping method for preimplantation genetic testing, resulting in the delivery of more babies with a normal karyotype.
FUNDING This study was funded by the National Key Research and Development Program of China (2022YFC2703200, 2021YFC2700600, 2021YFC2700500), the National Natural Science Foundation of China (82201807, 82171639, 82071717), the Shanghai Science and Technology Innovation Action Plan Program (18411953800), and the Municipal Human Resources Development Program for Outstanding Young Talents in Medical and Health Sciences in Shanghai (2022YQ075).
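The percentages quoted in FINDINGS can be re-derived from the counts given in the same paragraph. The following is a minimal sketch of that arithmetic only; the dictionary keys and the helper name `percent` are illustrative and do not come from the study.

```python
# Counts and stated percentages copied from the FINDINGS paragraph.
# Labels and helper names are illustrative assumptions, not the authors' code.
reported = {
    "blastocysts genotyped": (7750, 7867, 98.51),
    "cycles with euploid embryos": (1218, 1603, 75.98),
    "cycles with non-carrier embryos": (852, 1603, 53.15),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Re-derive each percentage and compare it with the value stated in the abstract.
recomputed = {label: percent(num, den) for label, (num, den, _) in reported.items()}

for label, (num, den, stated) in reported.items():
    print(f"{label}: {num}/{den} = {recomputed[label]}% (stated {stated}%)")

# Total embryos transferred, from the two counts given in the abstract.
embryos_transferred = 1030 + 439
print("embryos transferred:", embryos_transferred)
```

Each recomputed value agrees with the stated percentage to two decimal places.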
Affiliation(s)
- Shuo Zhang
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Department of Obstetrics and Gynecology of Shanghai Medical School, Fudan University, Shanghai, 200032, China
- Yuan Gao
- Center for Reproductive Medicine, Shandong University, Jinan, Shandong, 250012, China; Key Laboratory of Reproductive Endocrinology of Ministry of Education, Shandong University, Jinan, Shandong, 250012, China; Shandong Key Laboratory of Reproductive Medicine, Jinan, Shandong, 250012, China; Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, Shandong, 250012, China; Shandong Technology Innovation Center for Reproductive Health, Jinan, Shandong, 250012, China; National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, Shandong, 250012, China
- Xiaohong Wang
- Department of Gynecology & Obstetrics, Center for Reproductive Medicine, Tang Du Hospital, The Air Force Medical University, Xi'an, Shaanxi, 710038, China
- Qing Li
- Department of Obstetrics and Gynecology, Experimental Department of Obstetrics and Gynecology Institute, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jichun Tan
- Department of Obstetrics and Gynecology, Center of Reproductive Medicine, Shengjing Hospital of China Medical University, Shenyang, China; Key Laboratory of Reproductive Dysfunction Disease and Fertility Remodeling of Liaoning Province, Shenyang, China
- Bo Liang
- Department of Bioinformatics and Biostatistics, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Ming Gao
- Center for Reproductive Medicine, Shandong University, Jinan, Shandong, 250012, China; Key Laboratory of Reproductive Endocrinology of Ministry of Education, Shandong University, Jinan, Shandong, 250012, China; Shandong Key Laboratory of Reproductive Medicine, Jinan, Shandong, 250012, China; Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, Shandong, 250012, China; Shandong Technology Innovation Center for Reproductive Health, Jinan, Shandong, 250012, China; National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, Shandong, 250012, China
- Junping Wu
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Department of Obstetrics and Gynecology of Shanghai Medical School, Fudan University, Shanghai, 200032, China
- Xiufeng Ling
- Department of Reproduction, The Affiliated Obstetrics and Gynecology Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing, Jiangsu, China
- Jiayin Liu
- State Key Laboratory of Reproductive Medicine, Clinical Center of Reproductive Medicine, First Affiliated Hospital, Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Xiaoming Teng
- Department of Assisted Reproductive Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Hong Li
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Gusu School, Nanjing Medical University, Suzhou, Jiangsu, China
- Yun Sun
- Center for Reproductive Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200135, China; Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics, Shanghai, 200135, China
- Weidong Huang
- Xinjiang Jiayin Hospital, Urumqi, Xinjiang, 830000, China
- Xianhong Tong
- Reproductive and Genetic Hospital, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
- Caixia Lei
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China
- Hongchang Li
- Center for Reproductive Medicine, Shandong University, Jinan, Shandong, 250012, China; Key Laboratory of Reproductive Endocrinology of Ministry of Education, Shandong University, Jinan, Shandong, 250012, China; Shandong Key Laboratory of Reproductive Medicine, Jinan, Shandong, 250012, China; Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, Shandong, 250012, China; Shandong Technology Innovation Center for Reproductive Health, Jinan, Shandong, 250012, China; National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, Shandong, 250012, China
- Jun Wang
- Department of Gynecology & Obstetrics, Center for Reproductive Medicine, Tang Du Hospital, The Air Force Medical University, Xi'an, Shaanxi, 710038, China
- Shaoying Li
- Department of Obstetrics and Gynecology, Experimental Department of Obstetrics and Gynecology Institute, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xiaoyan Xu
- Department of Obstetrics and Gynecology, Center of Reproductive Medicine, Shengjing Hospital of China Medical University, Shenyang, China; Key Laboratory of Reproductive Dysfunction Disease and Fertility Remodeling of Liaoning Province, Shenyang, China
- Junqiang Zhang
- Department of Reproduction, The Affiliated Obstetrics and Gynecology Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing, Jiangsu, China
- Wei Wu
- State Key Laboratory of Reproductive Medicine, Clinical Center of Reproductive Medicine, First Affiliated Hospital, Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Shanshan Liang
- Department of Assisted Reproductive Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Jian Ou
- Center for Reproduction and Genetics, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Gusu School, Nanjing Medical University, Suzhou, Jiangsu, China
- Qiongzhen Zhao
- Xinjiang Jiayin Hospital, Urumqi, Xinjiang, 830000, China
- Rentao Jin
- Reproductive and Genetic Hospital, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
- Yueping Zhang
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Center for Reproductive Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200135, China
- Chenming Xu
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China
- Daru Lu
- State Key Laboratory of Genetic Engineering, School of Life Science, Fudan University, Shanghai, 200438, China; NHC Key Laboratory of Birth Defects and Reproductive Health, Chongqing Key Laboratory of Birth Defects and Reproductive Health, Chongqing Population and Family Planning, Science and Technology Research Institute, Chongqing, China
- Junhao Yan
- Center for Reproductive Medicine, Shandong University, Jinan, Shandong, 250012, China; Key Laboratory of Reproductive Endocrinology of Ministry of Education, Shandong University, Jinan, Shandong, 250012, China; Shandong Key Laboratory of Reproductive Medicine, Jinan, Shandong, 250012, China; Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, Shandong, 250012, China; Shandong Technology Innovation Center for Reproductive Health, Jinan, Shandong, 250012, China; National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, Shandong, 250012, China
- Xiaoxi Sun
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Department of Obstetrics and Gynecology of Shanghai Medical School, Fudan University, Shanghai, 200032, China
- Kwong Wai Choy
- Fertility Preservation Research Centre, Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Hong Kong, SAR, China
- Congjian Xu
- Shanghai Ji Ai Genetics & IVF Institute, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, 200011, China; Department of Obstetrics and Gynecology of Shanghai Medical School, Fudan University, Shanghai, 200032, China; Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China.
- Zi-Jiang Chen
- Center for Reproductive Medicine, Shandong University, Jinan, Shandong, 250012, China; Key Laboratory of Reproductive Endocrinology of Ministry of Education, Shandong University, Jinan, Shandong, 250012, China; Shandong Key Laboratory of Reproductive Medicine, Jinan, Shandong, 250012, China; Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, Shandong, 250012, China; Shandong Technology Innovation Center for Reproductive Health, Jinan, Shandong, 250012, China; National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, Shandong, 250012, China; Center for Reproductive Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200135, China; Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics, Shanghai, 200135, China.
4
Au-Yeung CCY, Cheung YT, Cheng JYT, Ip KWH, Lee SD, Yang VYT, Lau AYT, Lee CKC, Chong PKH, Lau KW, van Lunenburg JTJ, Zheng DFD, Ho BHM, Tik C, Ho KKK, Rajaby R, Au CH, Yu MHC, Sung WK. UniVar: A variant interpretation platform enhancing rare disease diagnosis through robust filtering and unified analysis of SNV, INDEL, CNV and SV. Comput Biol Med 2024; 185:109560. [PMID: 39700857 DOI: 10.1016/j.compbiomed.2024.109560] [Received: 08/20/2024] [Revised: 11/24/2024] [Accepted: 12/08/2024] [Indexed: 12/21/2024]
Abstract
BACKGROUND Interpreting the pathogenicity of genetic variants associated with rare diseases is a laborious and time-consuming endeavour. To streamline the diagnostic process and lighten the burden of variant interpretation, it is crucial to automate variant annotation and prioritization. Unfortunately, currently available variant interpretation tools lack a unified and comprehensive workflow that can jointly assess the clinical significance of the following variant types: single-nucleotide variants (SNVs), small insertions/deletions (INDELs), copy number variants (CNVs) and structural variants (SVs).
RESULTS The Unified Variant Interpretation Platform (UniVar) is a free web server tool that offers an automated and comprehensive workflow for the annotation, filtering and prioritization of SNVs, INDELs, CNVs and SVs collectively, to identify disease-causing variants for rare diseases in one interface, ensuring accessibility even for users without programming expertise. To filter common CNVs/SVs, a diverse SV catalogue has been generated that enables robust filtering of common SVs based on population allele frequency. Through benchmarking our SV catalogue, we showed that it is more complete and accurate than the state-of-the-art SV catalogues. Furthermore, to cope with patients lacking detailed clinical information, we have developed a novel computational method that enables variant prioritization from gene panels. Our analysis shows that our approach can prioritize pathogenic variants as effectively as using HPO terms assigned by clinicians, which adds value for cases without specific clinically assigned HPO terms. Lastly, through a practical case study of disease-causing compound heterozygous variants across SNV and SV, we demonstrated the uniqueness and effectiveness of UniVar in variant interpretation, giving it an edge over existing interpretation tools.
CONCLUSIONS UniVar is a unified and versatile platform that empowers researchers and clinicians to identify and interpret disease-causing variants in rare diseases efficiently through a single holistic interface and without a prerequisite for HPO terms. It is freely available without login and installation at https://univar.live/.
Affiliation(s)
- Cherie C Y Au-Yeung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Yuen-Ting Cheung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Joshua Y T Cheng
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Ken W H Ip
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Sau-Dan Lee
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Victor Y T Yang
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Amy Y T Lau
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Chit K C Lee
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Peter K H Chong
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- King Wai Lau
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Damon F D Zheng
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Brian H M Ho
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Crystal Tik
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Kingsley K K Ho
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Ramesh Rajaby
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China; Shibuya Laboratory, Division of Medical Data Informatics, Human Genome Center, University of Tokyo, Japan
- Chun-Hang Au
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Mullin H C Yu
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Wing-Kin Sung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China; Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China.
5
Qi F, Chen X, Wang J, Niu X, Li S, Huang S, Ran X. Genome-wide characterization of structure variations in the Xiang pig for genetic resistance to African swine fever. Virulence 2024; 15:2382762. [PMID: 39092797 PMCID: PMC11299630 DOI: 10.1080/21505594.2024.2382762] [Received: 10/09/2023] [Revised: 05/07/2024] [Accepted: 07/12/2024] [Indexed: 08/04/2024] [Open Access]
Abstract
African swine fever (ASF) is a rapidly fatal viral haemorrhagic fever in Chinese domestic pigs. Although very high mortality is observed in pig farms after an ASF outbreak, clinically healthy and antibody-positive pigs are found in those farms, and virus is rarely detected in these pigs. The ability of pigs to resist ASF viral infection may be modulated by host genetic variations. However, the genetic basis of the resistance of domestic pigs against ASF remains unclear. We generated a comprehensive set of structural variations (SVs) in the Chinese indigenous Xiang pig with ASF-resistant (Xiang-R) and ASF-susceptible (Xiang-S) phenotypes using whole-genome resequencing. A total of 53,589 nonredundant SVs were identified, with an average of 25,656 SVs per individual in the Xiang pig genome, including insertion, deletion, inversion and duplication variations. The Xiang-R group harboured more SVs than the Xiang-S group. An F-statistic (FST) analysis was carried out at each SV locus using the resequencing data to reveal genetic differences between the two populations. We identified 2,414 population-stratified SVs and annotated 1,152 Ensembl genes (including 986 protein-coding genes), in which 1,326 SVs might disturb the structure and expression of the Ensembl genes. Those protein-coding genes were mainly enriched in the Wnt, Hippo, and calcium signalling pathways. Other important pathways associated with ASF viral infection were also identified, such as the endocytosis, apoptosis, focal adhesion, Fc gamma R-mediated phagocytosis, junction, NOD-like receptor, PI3K-Akt, and C-type lectin receptor signalling pathways. Finally, we identified 135 candidate adaptive genes overlapping 166 SVs that were involved in virus entry and virus-host cell interactions. The detection of some population-stratified SV regions as selective sweep signals lends further support to the role of genetic variation in pig resistance against ASF. The research indicates that SVs play an important role in the evolutionary processes of Xiang pig adaptation to ASF infection.
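The abstract does not say which FST estimator was applied at each SV locus. As a minimal sketch only, assuming each SV is coded as a biallelic presence/absence locus and assuming Wright's heterozygosity-based definition FST = (HT - HS) / HT with the two populations weighted equally, a per-locus value could be computed as follows. The function name and the allele frequencies in the usage lines are hypothetical and are not taken from the study.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic locus for two equally weighted populations.

    p1, p2: frequency of the SV allele in population 1 and population 2.
    This is an illustrative estimator, not necessarily the one used by the authors.
    """
    p_bar = (p1 + p2) / 2                      # pooled allele frequency
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        # Locus is fixed for the same allele in both populations; FST is
        # undefined there, and this sketch reports 0.0 by convention.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies for a resistant and a susceptible group.
same_freq = fst_biallelic(0.5, 0.5)       # no differentiation
fixed_diff = fst_biallelic(1.0, 0.0)      # complete differentiation
intermediate = fst_biallelic(0.8, 0.2)
print(same_freq, fixed_diff, round(intermediate, 2))
```

Loci with high values under such a statistic are the kind that would be flagged as population-stratified; any threshold would be a further assumption not given in the abstract.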
Affiliation(s)
- Fenfang Qi
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xia Chen
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Jiafu Wang
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xi Niu
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Sheng Li
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Shihui Huang
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xueqin Ran
- Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
6
Zinner AC, Jakt LM. Multiple losses of aKRAB from PRDM9 coincide with a teleost-specific intron size distribution. BMC Biol 2024; 22:275. [PMID: 39604973 PMCID: PMC11600626 DOI: 10.1186/s12915-024-02059-w] [Received: 08/07/2024] [Accepted: 11/01/2024] [Indexed: 11/29/2024] [Open Access]
Abstract
BACKGROUND Primary transcripts are largely comprised of intronic sequences that are excised and discarded shortly after synthesis. In vertebrates, the shape of the intron size distribution is largely constant; however, most teleost fish have a diverged log-bimodal 'teleost distribution' (TD) that is seen only in teleosts. How the TD evolved and to what extent this was affected by adaptive or non-adaptive mechanisms is unknown.
RESULTS Here, we show that the TD has evolved independently at least six times and that its appearance is linked to the loss of the aKRAB domain from PRDM9. We determined intron size distributions and identified PRDM9 orthologues from annotated genomes in addition to scanning 1193 teleost assemblies for the aKRAB domain. We show that a diverged form of PRDM9 (β) is predominant in teleosts whereas the α version is absent from most species. Only a subset of PRDM9-α proteins contain aKRAB, and hence, it is present only in a small number of teleost lineages. Almost all lineages lacking aKRAB (but no species with it) had TDs.
CONCLUSIONS In mammals, PRDM9 defines the sites of meiotic recombination through a mechanism that increases structural variance and depends on aKRAB. The loss of aKRAB is likely to have shifted the locations of both recombination and structural variance hotspots. Our observations suggest that the TD evolved as a side-effect of these changes and link recombination to the evolution of intron size, illustrating how genome architectures can evolve in the absence of selection.
Affiliation(s)
- Ann-Christin Zinner
- Faculty of Biosciences and Aquaculture, Nord University, Universitetsalléen 11, Bodø, 8026, Norway
- Lars Martin Jakt
- Faculty of Biosciences and Aquaculture, Nord University, Universitetsalléen 11, Bodø, 8026, Norway.
7
|
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. Genome Res 2024; 34:1719-1734. [PMID: 39567236 DOI: 10.1101/gr.279559.124] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/07/2024] [Accepted: 08/16/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including concerns about low coverage, high sequencing error, and the lack of robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, less careful characterization has been done of the ability to assess the terminal ends of isoforms, specifically, transcription start and end sites. Such characterization is crucial for completely delineating full mRNA molecules and regulatory consequences. However, there are substantial inconsistencies in both start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data.
Affiliation(s)
- Ezequiel Calvo-Roitberg
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Rachel F Daniels
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Athma A Pai
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
|
8
|
De Coster W, Höijer I, Bruggeman I, D'Hert S, Melin M, Ameur A, Rademakers R. Visualization and analysis of medically relevant tandem repeats in nanopore sequencing of control cohorts with pathSTR. Genome Res 2024; 34:2074-2080. [PMID: 39147583 DOI: 10.1101/gr.279265.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The lack of population-scale databases hampers research and diagnostics for medically relevant tandem repeats and repeat expansions. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in a healthy population. The current version includes 1040 individuals of The 1000 Genomes Project cohort sequenced on the Oxford Nanopore Technologies PromethION. A comprehensive set of medically relevant tandem repeats has been genotyped using STRdust and LongTR to determine the tandem repeat length and sequence composition. PathSTR provides rich visualizations of this data set and a feature for uploading one's own data for comparison alongside the control cohort. We demonstrate the implementation of this application using data from targeted nanopore sequencing of a patient with myotonic dystrophy type 1. This resource will empower the genetics community to get a more complete overview of normal variation in tandem repeat length and sequence composition and, as such, enable a better assessment of rare tandem repeat alleles observed in patients.
Affiliation(s)
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium;
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Ida Höijer
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
- Inge Bruggeman
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Svenn D'Hert
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Malin Melin
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
- Adam Ameur
- Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, 751 85 Uppsala, Sweden
- Rosa Rademakers
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
|
9
|
Frampton S, Smith R, Ferson L, Gibson J, Hollox EJ, Cragg MS, Strefford JC. Fc gamma receptors: Their evolution, genomic architecture, genetic variation, and impact on human disease. Immunol Rev 2024; 328:65-97. [PMID: 39345014 DOI: 10.1111/imr.13401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Fc gamma receptors (FcγRs) are a family of receptors that bind IgG antibodies and interface at the junction of humoral and innate immunity. Precise regulation of receptor expression provides the necessary balance to achieve healthy immune homeostasis by establishing an appropriate immune threshold to limit autoimmunity but respond effectively to infection. The underlying genetics of the FCGR gene family are central to achieving this immune threshold by regulating affinity for IgG, signaling efficacy, and receptor expression. The FCGR gene locus was duplicated during evolution, retaining very high homology and resulting in a genomic region that is technically difficult to study. Here, we review the recent evolution of the gene family in mammals, its complexity and variation through copy number variation and single-nucleotide polymorphism, and the impact of these on disease incidence, resolution, and therapeutic antibody efficacy. We also discuss the progress and limitations of current approaches to study the region and emphasize how new genomics technologies will likely resolve much of the current confusion in the field. This will lead to definitive conclusions on the impact of genetic variation within the FCGR gene locus on immune function and disease.
Affiliation(s)
- Sarah Frampton
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Rosanna Smith
- Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
- Lili Ferson
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Jane Gibson
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Edward J Hollox
- Department of Genetics, Genomics and Cancer Sciences, College of Life Sciences, University of Leicester, Leicester, UK
- Mark S Cragg
- Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
- Jonathan C Strefford
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
|
10
|
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Genome-wide association testing beyond SNPs. Nat Rev Genet 2024:10.1038/s41576-024-00778-y. [PMID: 39375560 DOI: 10.1038/s41576-024-00778-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/03/2024] [Indexed: 10/09/2024]
Abstract
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
Affiliation(s)
- Laura Harris
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Ellen M McDonagh
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Xiaolei Zhang
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Katherine Fawcett
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Amy Foreman
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Petr Daneck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Panagiotis I Sergouniotis
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
- Helen Parkinson
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Francesco Mazzarotto
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- National Heart and Lung Institute, Imperial College London, London, UK
- Michael Inouye
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
- Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Ewan Birney
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Tomas Fitzgerald
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK.
|
11
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Karl Hong
- Bionano Genomics, San Diego, CA, USA
- Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
|
12
|
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
- Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA
|
13
|
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024] Open
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
|
14
|
Mirus T, Lohmayer R, Döhring C, Halldórsson BV, Kehr B. GGTyper: genotyping complex structural variants using short-read sequencing data. Bioinformatics 2024; 40:ii11-ii19. [PMID: 39230689 PMCID: PMC11373317 DOI: 10.1093/bioinformatics/btae391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs. RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
Affiliation(s)
- Tim Mirus
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Robert Lohmayer
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Clementine Döhring
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Bjarni V Halldórsson
- deCODE genetics/Amgen Inc, Reykjavik 101, Iceland
- School of Technology, Reykjavik University, Reykjavik 102, Iceland
- Birte Kehr
- AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Fakultät für Informatik und Data Science, Universität Regensburg, Regensburg 93053, Germany
|
15
|
Rojas de Oliveira H, Chud TCS, Oliveira GA, Hermisdorff IC, Narayana SG, Rochus CM, Butty AM, Malchiodi F, Stothard P, Miglior F, Baes CF, Schenkel FS. Genome-wide association analyses reveal copy number variant regions associated with reproduction and disease traits in Canadian Holstein cattle. J Dairy Sci 2024; 107:7052-7063. [PMID: 38788846 DOI: 10.3168/jds.2023-24295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 04/01/2024] [Indexed: 05/26/2024]
Abstract
This study aimed to evaluate the impact of copy number variants (CNV) on 13 reproduction and 12 disease traits in Holstein cattle. Intensity signal files containing log R ratio and B allele frequency information from 13,730 Holstein animals genotyped with a 95K SNP panel, and 8,467 Holstein animals genotyped with a 50K SNP panel were used to identify the CNVs. Subsequently, the identified CNVs were validated using whole-genome sequence data from 126 animals, resulting in 870 high-confidence copy number variant regions (CNVR) on 12,131 animals. Out of these, 54 CNVR had frequencies higher than or equal to 1% in the population and were used in the genome-wide association analysis (one CNVR at a time, including the G matrix). Results revealed that 4 CNVR were significantly associated with at least one of the traits analyzed in this study. Specifically, 2 CNVR were associated with 3 reproduction traits (i.e., calf survival, first service to conception, and nonreturn rate), and 2 CNVR were associated with 2 disease traits (i.e., metritis and retained placenta). These CNVR harbored genes implicated in immune response, cellular signaling, and neuronal development, supporting their potential involvement in these traits. Further investigations to unravel the mechanistic and functional implications of these CNVR on the mentioned traits are warranted.
Affiliation(s)
- Hinayah Rojas de Oliveira
- Department of Animal Sciences, Purdue University, West Lafayette, IN 47907; Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1.
- Tatiane C S Chud
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Gerson A Oliveira
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Isis C Hermisdorff
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Saranya G Narayana
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Lactanet, Guelph, ON, Canada N1K 1E5
- Christina M Rochus
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Francesca Malchiodi
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Semex, Guelph, ON, Canada N1H 6J2
- Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada T6G 2H1
- Filippo Miglior
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Lactanet, Guelph, ON, Canada N1K 1E5
- Christine F Baes
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland 3012
- Flavio S Schenkel
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1.
|
16
|
Wu XR, Wu BS, Kang JJ, Chen LM, Deng YT, Chen SD, Dong Q, Feng JF, Cheng W, Yu JT. Contribution of copy number variations to education, socioeconomic status and cognition from a genome-wide study of 305,401 subjects. Mol Psychiatry 2024:10.1038/s41380-024-02717-z. [PMID: 39215183 DOI: 10.1038/s41380-024-02717-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 08/19/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Educational attainment (EA), socioeconomic status (SES) and cognition are phenotypically and genetically linked to health outcomes. However, the role of copy number variations (CNVs) in influencing EA/SES/cognition remains unclear. Using a large-scale (n = 305,401) genome-wide CNV-level association analysis, we discovered 33 CNV loci significantly associated with EA/SES/cognition, 20 of which were novel (deletions at 2p22.2, 2p16.2, 2p12, 3p25.3, 4p15.2, 5p15.33, 5q21.1, 8p21.3, 9p21.1, 11p14.3, 13q12.13, 17q21.31, and 20q13.33, as well as duplications at 3q12.2, 3q23, 7p22.3, 8p23.1, 8p23.2, 17q12 (105 kb), and 19q13.32). The genes identified in gene-level tests were enriched in biological pathways such as neurodegeneration, telomere maintenance and axon guidance. Phenome-wide association studies further identified novel associations of EA/SES/cognition-associated CNVs with mental and physical diseases, such as 6q27 duplication with upper respiratory disease and 17q12 (105 kb) duplication with mood disorders. Our findings provide a genome-wide CNV profile for EA/SES/cognition and bridge their connections to health. The expanded candidate CNVs database and the residing genes would be a valuable resource for future studies aimed at uncovering the biological mechanisms underlying cognitive function and related clinical phenotypes.
Collapse
Affiliation(s)
- Xin-Rui Wu
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Bang-Sheng Wu
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Ju-Jiao Kang
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Li-Min Chen
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Yue-Ting Deng
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Shi-Dong Chen
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Qiang Dong
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Jian-Feng Feng
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK
| | - Wei Cheng
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
| | - Jin-Tai Yu
- Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
| |
|
17
|
Lai S, Wang H, Bork P, Chen WH, Zhao XM. Long-read sequencing reveals extensive gut phageome structural variations driven by genetic exchange with bacterial hosts. SCIENCE ADVANCES 2024; 10:eadn3316. [PMID: 39141729 PMCID: PMC11323893 DOI: 10.1126/sciadv.adn3316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 07/10/2024] [Indexed: 08/16/2024]
Abstract
Genetic variations are instrumental for unraveling phage evolution and deciphering their functional implications. Here, we explore the underlying fine-scale genetic variations in the gut phageome, especially structural variations (SVs). By using virome-enriched long-read metagenomic sequencing across 91 individuals, we identified a total of 14,438 nonredundant phage SVs and revealed their prevalence within the human gut phageome. These SVs are mainly enriched in genes involved in recombination, DNA methylation, and antibiotic resistance. Notably, a substantial fraction of phage SV sequences share close homology with bacterial fragments, with most SVs enriched for horizontal gene transfer (HGT) mechanisms. Further investigations showed that these SV sequences were genetically exchanged between specific phage-bacteria pairs, particularly between phages and their respective bacterial hosts. Temperate phages exhibit a higher frequency of genetic exchange with bacterial chromosomes than virulent phages. Collectively, our findings provide insights into the genetic landscape of the human gut phageome.
Affiliation(s)
- Senying Lai
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| | - Huarui Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| | - Wei-Hua Chen
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- College of Life Science, Henan Normal University, Xinxiang, Henan, China
| | - Xing-Ming Zhao
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
| |
|
18
|
Stefansson OA, Sigurpalsdottir BD, Rognvaldsson S, Halldorsson GH, Juliusson K, Sveinbjornsson G, Gunnarsson B, Beyter D, Jonsson H, Gudjonsson SA, Olafsdottir TA, Saevarsdottir S, Magnusson MK, Lund SH, Tragante V, Oddsson A, Hardarson MT, Eggertsson HP, Gudmundsson RL, Sverrisson S, Frigge ML, Zink F, Holm H, Stefansson H, Rafnar T, Jonsdottir I, Sulem P, Helgason A, Gudbjartsson DF, Halldorsson BV, Thorsteinsdottir U, Stefansson K. The correlation between CpG methylation and gene expression is driven by sequence variants. Nat Genet 2024; 56:1624-1631. [PMID: 39048797 PMCID: PMC11319203 DOI: 10.1038/s41588-024-01851-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 06/27/2024] [Indexed: 07/27/2024]
Abstract
Gene promoter and enhancer sequences are bound by transcription factors and are depleted of methylated CpG sites (cytosines preceding guanines in DNA). The absence of methylated CpGs in these sequences typically correlates with increased gene expression, indicating a regulatory role for methylation. We used nanopore sequencing to determine haplotype-specific methylation rates of 15.3 million CpG units in 7,179 whole-blood genomes. We identified 189,178 methylation depleted sequences where three or more proximal CpGs were unmethylated on at least one haplotype. A total of 77,789 methylation depleted sequences (~41%) associated with 80,503 cis-acting sequence variants, which we termed allele-specific methylation quantitative trait loci (ASM-QTLs). RNA sequencing of 896 samples from the same blood draws used to perform nanopore sequencing showed that the ASM-QTL, that is, DNA sequence variability, drives most of the correlation found between gene expression and CpG methylation. ASM-QTLs were enriched 40.2-fold (95% confidence interval 32.2, 49.9) among sequence variants associating with hematological traits, demonstrating that ASM-QTLs are important functional units in the noncoding genome.
Affiliation(s)
| | - Brynja Dogg Sigurpalsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
| | | | - Gisli Hreinn Halldorsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | | | | | | | | | | | | | - Thorunn Asta Olafsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Saedis Saevarsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Magnus Karl Magnusson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Sigrun Helga Lund
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | | | | | - Marteinn Thor Hardarson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
| | | | | | | | | | | | - Hilma Holm
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
| | | | | | - Ingileif Jonsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | | | - Agnar Helgason
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
| | - Daniel F Gudbjartsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Unnur Thorsteinsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Kari Stefansson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland.
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland.
| |
|
19
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
20
|
Alvarez Jerez P, Daida K, Grenn FP, Malik L, Miano-Burkhardt A, Makarious MB, Ding J, Gibbs JR, Moore A, Reed X, Nalls MA, Shah S, Mahmoud M, Sedlazeck FJ, Dolzhenko E, Park M, Iwaki H, Casey B, Ryten M, Blauwendraat C, Singleton AB, Billingsley KJ. Characterizing a complex CT-rich haplotype in intron 4 of SNCA using large-scale targeted amplicon long-read sequencing. NPJ Parkinsons Dis 2024; 10:136. [PMID: 39060285 PMCID: PMC11282088 DOI: 10.1038/s41531-024-00749-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/04/2024] [Indexed: 07/28/2024] Open
Abstract
Parkinson's disease (PD) is a common neurodegenerative disorder with a significant risk proportion driven by genetics. While much progress has been made, most of the heritability remains unknown. This is in part because previous genetic studies have focused on the contribution of single nucleotide variants. More complex forms of variation, such as structural variants and tandem repeats, are already associated with several synucleinopathies. However, because more sophisticated sequencing methods are usually required to detect these regions, little is understood regarding their contribution to PD. One example is a polymorphic CT-rich region in intron 4 of the SNCA gene. This haplotype has been suggested to be associated with risk of Lewy Body (LB) pathology in Alzheimer's Disease and SNCA gene expression, but is yet to be investigated in PD. Here, we attempt to resolve this CT-rich haplotype and investigate its role in PD. We performed targeted PacBio HiFi sequencing of the region in 1375 PD cases and 959 controls. We replicate the previously reported associations and report a novel association between two PD risk SNVs (rs356182 and rs5019538) and haplotype 4, the largest haplotype. Through quantitative trait locus analyses we identify a significant haplotype 4 association with alternative CAGE transcriptional start site usage, not leading to significant differential SNCA gene expression in post-mortem frontal cortex brain tissue. Therefore, disease association in this locus might not be biologically driven by this CT-rich repeat region. Our data demonstrate the complexity of this SNCA region and highlight that further follow-up functional studies are warranted.
Affiliation(s)
- Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Kensuke Daida
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Francis P Grenn
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Abigail Miano-Burkhardt
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Mary B Makarious
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - J Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Anni Moore
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Mike A Nalls
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
| | - Syed Shah
- DataTecnica LLC, Washington, DC, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | - Morgan Park
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hirotaka Iwaki
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
| | - Bradford Casey
- The Michael J. Fox Foundation for Parkinson's Research, New York, New York, USA
| | - Mina Ryten
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- UK Dementia Research Institute at the University of Cambridge and Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Andrew B Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
| | - Kimberley J Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA.
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA.
| |
|
21
|
Yuan N, Jia P. Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation. Brief Bioinform 2024; 25:bbae441. [PMID: 39256200 PMCID: PMC11387058 DOI: 10.1093/bib/bbae441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 07/09/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024] Open
Abstract
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.
Affiliation(s)
- Na Yuan
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Peilin Jia
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
| |
|
22
|
Buckley RM, Ostrander EA. Large-scale genomic analysis of the domestic dog informs biological discovery. Genome Res 2024; 34:811-821. [PMID: 38955465 PMCID: PMC11293549 DOI: 10.1101/gr.278569.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024]
Abstract
Recent advances in genomics, coupled with a unique population structure and remarkable levels of variation, have propelled the domestic dog to new levels as a system for understanding fundamental principles in mammalian biology. Central to this advance are more than 350 recognized breeds, each a closed population that has undergone selection for unique features. Genetic variation in the domestic dog is particularly well characterized compared with other domestic mammals, with almost 3000 high-coverage genomes publicly available. Importantly, as the number of sequenced genomes increases, new avenues for analysis are becoming available. Herein, we discuss recent discoveries in canine genomics regarding behavior, morphology, and disease susceptibility. We explore the limitations of current data sets for variant interpretation, tradeoffs between sequencing strategies, and the burgeoning role of long-read genomes for capturing structural variants. In addition, we consider how large-scale collections of whole-genome sequence data drive rare variant discovery and assess the geographic distribution of canine diversity, which identifies Asia as a major source of missing variation. Finally, we review recent comparative genomic analyses that will facilitate annotation of the noncoding genome in dogs.
Affiliation(s)
- Reuben M Buckley
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Elaine A Ostrander
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
|
23
|
Subramanian K, Chopra M, Kahali B. Landscape of genomic structural variations in Indian population-based cohorts: Deeper insights into their prevalence and clinical relevance. HGG ADVANCES 2024; 5:100285. [PMID: 38521976 PMCID: PMC11007539 DOI: 10.1016/j.xhgg.2024.100285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/13/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024] Open
Abstract
Structural variations (SVs) are large (>50 base pairs) genomic rearrangements comprising deletions, duplications, insertions, inversions, and translocations. Studying SVs is important because they play active and critical roles in regulating gene expression, determining disease predispositions, and identifying population-specific differences among individuals of diverse ancestries. However, SV discoveries in the Indian population using whole-genome sequencing (WGS) have been limited. In this study, using short-read WGS with an average 42X depth of coverage, we identify and characterize 36,210 SVs from 529 individuals enrolled in population-based cohorts in India. These SVs include 24,574 deletions, 2,913 duplications, 8,710 insertions, and 13 inversions; 1.26% (456 out of 36,210) of the identified SVs can potentially impact the coding regions of genes. Furthermore, 56 of these SVs are highly intolerant to loss-of-function changes to the mapped genes, and five SVs impacting ADAMTS17, CCDC40, and RHCE are common in our study individuals. Seven rare SVs significantly impact dosage sensitivity of genes known to be associated with various clinical phenotypes. Most of the SVs in our study are rare and heterozygous. This fine-scale SV discovery in the underrepresented Indian population provides valuable insights that extend beyond Eurocentric human genetic studies.
Affiliation(s)
- Krithika Subramanian
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India; Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - Mehak Chopra
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India
| | - Bratati Kahali
- Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India.
| |
|
24
|
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.
RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution.
CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Junfan Zhao
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Jiao Gong
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China.
| |
|
25
|
Lamkin M, Gymrek M. The emerging role of tandem repeats in complex traits. Nat Rev Genet 2024; 25:452-453. [PMID: 38714860 DOI: 10.1038/s41576-024-00736-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Affiliation(s)
- Michael Lamkin
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
| |
|
26
|
Liang H, Sedillo JC, Schrodi SJ, Ikeda A. Structural variants in linkage disequilibrium with GWAS-significant SNPs. Heliyon 2024; 10:e32053. [PMID: 38882374 PMCID: PMC11177133 DOI: 10.1016/j.heliyon.2024.e32053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 05/17/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024] Open
Abstract
With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium. The database is composed of (i) SNPs that have exhibited genome-wide significant association with traits, primarily disease phenotypes, (ii) newly released structural variants (SVs), and (iii) linkage disequilibrium values calculated from unphased data. All data files, including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs, are available at the SV-SNP LD Database and can be accessed at https://github.com/hliang-SchrodiLab/SV_SNPs. Our analysis results represent a useful fine-mapping tool for interrogating SVs in linkage disequilibrium with disease-associated SNPs. We anticipate that this resource may play an important role in subsequent studies that investigate incorporating disease-causing SVs into disease risk prediction models.
Affiliation(s)
- Hao Liang
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
| | - Joni C Sedillo
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Steven J Schrodi
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Akihiro Ikeda
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- McPherson Eye Research Institute, University of Wisconsin-Madison, Madison, WI, USA
27
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024]
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
28
Patel-Tupper D, Kelikian A, Leipertz A, Maryn N, Tjahjadi M, Karavolias NG, Cho MJ, Niyogi KK. Multiplexed CRISPR-Cas9 mutagenesis of rice PSBS1 noncoding sequences for transgene-free overexpression. Sci Adv 2024; 10:eadm7452. [PMID: 38848363 PMCID: PMC11160471 DOI: 10.1126/sciadv.adm7452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 05/03/2024] [Indexed: 06/09/2024]
Abstract
Understanding CRISPR-Cas9's capacity to produce native overexpression (OX) alleles would accelerate agronomic gains achievable by gene editing. To generate OX alleles with increased RNA and protein abundance, we leveraged multiplexed CRISPR-Cas9 mutagenesis of noncoding sequences upstream of the rice PSBS1 gene. We isolated 120 gene-edited alleles with varying non-photochemical quenching (NPQ) capacity in vivo, from knockout to overexpression, using a high-throughput screening pipeline. Overexpression increased OsPsbS1 protein abundance two- to threefold, matching fold changes obtained by transgenesis. Increased PsbS protein abundance enhanced NPQ capacity and water-use efficiency. Across our resolved genetic variation, we identify the role of 5'UTR indels and inversions in driving knockout/knockdown and overexpression phenotypes, respectively. Complex structural variants, such as the 252-kb duplication/inversion generated here, evidence the potential of CRISPR-Cas9 to facilitate significant genomic changes with negligible off-target transcriptomic perturbations. Our results may inform future gene-editing strategies for hypermorphic alleles and have advanced the pursuit of gene-edited, non-transgenic rice plants with accelerated relaxation of photoprotection.
Affiliation(s)
- Dhruv Patel-Tupper
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Armen Kelikian
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Anna Leipertz
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Nina Maryn
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Michelle Tjahjadi
- Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Nicholas G. Karavolias
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Myeong-Je Cho
- Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Krishna K. Niyogi
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
29
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024]
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
30
Yu Y, Gao R, Luo J. LcDel: deletion variation detection based on clustering and long reads. Front Genet 2024; 15:1404415. [PMID: 38798694 PMCID: PMC11116628 DOI: 10.3389/fgene.2024.1404415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024]
Abstract
Motivation: Genomic structural variation refers to chromosomal level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments compared to single nucleotide variations. Deletion is a common type of structural variant in the genome, which may lead to many diseases, so the detection of deletions can help to gain insights into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results. Results: In this paper, we present LcDel, which can detect deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs the first clustering step using two clustering methods (sliding window-based and coverage-based, respectively) based on the length of the deletion. After that, LcDel immediately uses the second clustering by hierarchical clustering to determine the location and length of the deletion. LcDel is benchmarked against some other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletion. The source code is available at https://github.com/cyq1314woaini/LcDel.
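The general idea of finding candidate deletion sites in long-read alignments and then clustering them can be sketched as follows. This is not LcDel's algorithm: it swaps in a simple gap-based grouping of sorted positions in place of the sliding-window, coverage-based, and hierarchical clustering steps the abstract describes. The function names, the 50-bp minimum length, the 100-bp gap, and the support threshold are all assumptions chosen for illustration.

```python
import re
from statistics import median

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_from_cigar(ref_start, cigar, min_len=50):
    """Return (reference_position, length) for each deletion >= min_len."""
    pos = ref_start
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_len:
            found.append((pos, length))
        # Only these operations advance the reference coordinate.
        if op in "MDN=X":
            pos += length
    return found


def cluster_deletions(candidates, max_gap=100, min_support=2):
    """Group candidates whose start positions lie within max_gap of the
    previous one, then summarise each group by its median start and length.

    Returns (start, length, supporting_reads) per retained cluster.
    """
    clusters, current = [], []
    for cand in sorted(candidates):
        if current and cand[0] - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append(cand)
    if current:
        clusters.append(current)

    calls = []
    for group in clusters:
        if len(group) >= min_support:
            start = int(median([p for p, _ in group]))
            length = int(median([l for _, l in group]))
            calls.append((start, length, len(group)))
    return calls
```

Requiring a minimum number of supporting reads is one simple way to suppress calls caused by alignment noise in a single read.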
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
31
Fernández-Suárez E, González-Del Pozo M, Méndez-Vidal C, Martín-Sánchez M, Mena M, de la Morena-Barrio B, Corral J, Borrego S, Antiñolo G. Long-read sequencing improves the genetic diagnosis of retinitis pigmentosa by identifying an Alu retrotransposon insertion in the EYS gene. Mob DNA 2024; 15:9. [PMID: 38704576 PMCID: PMC11069205 DOI: 10.1186/s13100-024-00320-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND Biallelic variants in EYS are the major cause of autosomal recessive retinitis pigmentosa (arRP) in certain populations, a clinically and genetically heterogeneous disease that may lead to legal blindness. EYS is one of the largest genes (~2 Mb) expressed in the retina, in which structural variants (SVs) represent a common cause of disease. However, their identification using short-read sequencing (SRS) is not always feasible. Here, we conducted targeted long-read sequencing (T-LRS) using adaptive sampling of EYS on the MinION sequencing platform (Oxford Nanopore Technologies) to definitively diagnose an arRP family, whose affected individuals (n = 3) carried the heterozygous pathogenic deletion of exons 32-33 in the EYS gene. As this was a recurrent variant identified in three additional families in our cohort, we also aimed to characterize the known deletion at the nucleotide level to assess a possible founder effect. RESULTS T-LRS in family A unveiled a heterozygous AluYa5 insertion in the coding exon 43 of EYS (chr6(GRCh37):g.64430524_64430525ins352), which segregated with the disease in compound heterozygosity with the previously identified deletion. Visual inspection of previous SRS alignments using IGV revealed several reads containing soft-clipped bases, accompanied by a slight drop in coverage at the Alu insertion site. This prompted us to develop a simplified program using the grep command to investigate the recurrence of this variant in our cohort from SRS data. Moreover, LRS also allowed the characterization of the CNV as a ~56.4 kb deletion spanning exons 32-33 of EYS (chr6(GRCh37):g.64764235_64820592del). The results of further characterization by Sanger sequencing and linkage analysis in the four families were consistent with a founder variant. CONCLUSIONS To our knowledge, this is the first report of a mobile element insertion into the coding sequence of EYS, as a likely cause of arRP in a family. Our study highlights the value of LRS technology in characterizing and identifying hidden pathogenic SVs, such as retrotransposon insertions, whose contribution to the etiopathogenesis of rare diseases may be underestimated.
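The soft-clipping signal described in the abstract above (short reads whose alignment ends abruptly at an insertion site) can be screened for with a few lines of code. The authors report using a grep-based program whose details are not given here; the sketch below is a separate, hypothetical illustration that scans SAM-format text lines and counts reads with a soft-clip boundary near a site of interest. The function names and the 10-bp window are assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_breakpoints(pos, cigar):
    """Return reference positions where a soft clip meets aligned sequence.

    pos is the 1-based leftmost mapped position from the SAM POS field.
    """
    # Hard clips may flank soft clips; ignore them when looking at read ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    points = []
    if ops and ops[0][1] == "S":
        points.append(pos)                 # clip on the left end
    if len(ops) > 1 and ops[-1][1] == "S":
        points.append(pos + ref_len - 1)   # clip on the right end
    return points


def count_softclipped_reads(sam_lines, chrom, site, window=10):
    """Count alignments on chrom with a soft-clip boundary within window of site."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[2] != chrom or fields[5] == "*":
            continue
        breakpoints = softclip_breakpoints(int(fields[3]), fields[5])
        if any(abs(bp - site) <= window for bp in breakpoints):
            count += 1
    return count
```

An excess of such reads at one position, compared with flanking regions, is only a prompt for closer inspection and would still need confirmation by an orthogonal method, as the study did with long reads.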
Affiliation(s)
- Elena Fernández-Suárez
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
- María González-Del Pozo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
- Cristina Méndez-Vidal
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
- Marta Martín-Sánchez
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
- Marcela Mena
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
- Belén de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-ISCIII, Murcia, Spain
- Javier Corral
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-ISCIII, Murcia, Spain
- Salud Borrego
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain.
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain.
- Guillermo Antiñolo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain.
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain.
32
Steyaert W, Sagath L, Demidov G, Yépez VA, Esteve-Codina A, Gagneur J, Ellwanger K, Derks R, Weiss M, den Ouden A, van den Heuvel S, Swinkels H, Zomer N, Steehouwer M, O'Gorman L, Astuti G, Neveling K, Schüle R, Xu J, Synofzik M, Beijer D, Hengel H, Schöls L, Claeys KG, Baets J, Van de Vondel L, Ferlini A, Selvatici R, Morsy H, Saeed Abd Elmaksoud M, Straub V, Müller J, Pini V, Perry L, Sarkozy A, Zaharieva I, Muntoni F, Bugiardini E, Polavarapu K, Horvath R, Reid E, Lochmüller H, Spinazzi M, Savarese M, Matalonga L, Laurie S, Brunner HG, Graessner H, Beltran S, Ossowski S, Vissers LELM, Gilissen C, Hoischen A. Unravelling undiagnosed rare disease cases by HiFi long-read genome sequencing. medRxiv 2024:2024.05.03.24305331. [PMID: 38746462 PMCID: PMC11092722 DOI: 10.1101/2024.05.03.24305331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families with at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2 / FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.
33
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
34
Perez-Becerril C, Burghel GJ, Hartley C, Rowlands CF, Evans DG, Smith MJ. Improved sensitivity for detection of pathogenic variants in familial NF2-related schwannomatosis. J Med Genet 2024; 61:452-458. [PMID: 38302265 DOI: 10.1136/jmg-2023-109586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/07/2023] [Indexed: 02/03/2024]
Abstract
PURPOSE To determine the impact of additional genetic screening techniques on the rate of detection of pathogenic variants leading to familial NF2-related schwannomatosis. METHODS We conducted genetic screening of a cohort of 168 second-generation individuals meeting the clinical criteria for NF2-related schwannomatosis. In addition to the current clinical screening techniques, targeted next-generation sequencing (NGS) and multiplex ligation-dependent probe amplification analysis, we applied additional genetic screening techniques, including karyotype and RNA analysis. For characterisation of a complex structural variant, we also performed long-read sequencing analysis. RESULTS Additional genetic analysis resulted in increased sensitivity of detection of pathogenic variants from 87% to 95% in our second-generation NF2-related schwannomatosis cohort. A number of pathogenic variants identified through extended analysis had been previously observed after NGS analysis but had been overlooked or classified as variants of uncertain significance. CONCLUSION Our study indicates there is added value in performing additional genetic analysis for detection of pathogenic variants that are difficult to identify with current clinical genetic screening methods. In particular, RNA analysis is valuable for accurate classification of non-canonical splicing variants. Karyotype analysis and whole genome sequencing analysis are of particular value for identification of large and/or complex structural variants, with additional advantages in the use of long-read sequencing techniques.
Affiliation(s)
- Cristina Perez-Becerril
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- George J Burghel
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- Claire Hartley
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Charles F Rowlands
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- D Gareth Evans
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- Miriam J Smith
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
35
Hujoel MLA, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat Genet 2024; 56:569-578. [PMID: 38548989 PMCID: PMC11018521 DOI: 10.1038/s41588-024-01684-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 02/08/2024] [Indexed: 04/09/2024]
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications, previously invisible to most analysis methods, generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Serinus Biosciences Inc., New York, NY, USA
- Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
36
Chen Z, Finnell RH, Lei Y, Wang H. Progress and clinical prospect of genomic structural variants investigation. Sci Bull (Beijing) 2024; 69:705-708. [PMID: 38310047 DOI: 10.1016/j.scib.2024.01.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2024]
Affiliation(s)
- Zhongzhong Chen
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200062, China
- Richard H Finnell
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA; Departments of Molecular and Human Genetics and Medicine, Baylor College of Medicine, One Baylor Plaza, Houston 77030, USA
- Yunping Lei
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA.
- Hongyan Wang
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Key Laboratory of Metabolic Remodelling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai 200438, China; Children's Hospital of Fudan University, Shanghai 201102, China.
37
Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024]
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent in the current human reference genome. However, the extent and functional significance of NRSs in human genomes and populations remain unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45,284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical research.
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yizhou Gu
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- University of Wisconsin-Madison, WI, USA
- Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
|
38
|
Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res 2024; 34:300-309. [PMID: 38355307 PMCID: PMC10984387 DOI: 10.1101/gr.278267.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
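The abstract above quotes two simple summary statistics: genotype concordance between callsets and a minor allele frequency (MAF) filter. A minimal sketch of how such quantities are commonly computed follows. The function names, the 0/1/2 alternate-allele coding, and the use of `None` for missing calls are assumptions for illustration, not the authors' pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alt-allele counts (0, 1, 2).

    Missing calls are given as None and ignored (assumed convention).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def genotype_concordance(calls_a, calls_b):
    """Fraction of samples called in both sets whose genotypes agree."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no samples called in both sets")
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy example: one variant genotyped by two hypothetical methods in four samples.
method_1 = [0, 1, 2, None]
method_2 = [0, 1, 1, 2]
print(genotype_concordance(method_1, method_2))   # 2 of 3 shared calls agree
print(minor_allele_frequency(method_2))           # 4 alt alleles out of 8
```

Applying a filter such as MAF > 0.1 then amounts to keeping variants for which `minor_allele_frequency` exceeds 0.1.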
Affiliation(s)
- Xena M Mapel
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
|
39
|
Sigurpalsdottir BD, Stefansson OA, Holley G, Beyter D, Zink F, Hardarson MÞ, Sverrisson SÞ, Kristinsdottir N, Magnusdottir DN, Magnusson OÞ, Gudbjartsson DF, Halldorsson BV, Stefansson K. A comparison of methods for detecting DNA methylation from long-read sequencing of human genomes. Genome Biol 2024; 25:69. [PMID: 38468278 PMCID: PMC10929077 DOI: 10.1186/s13059-024-03207-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 02/28/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. The most commonly used methods for long-read sequencing are nanopore sequencing developed by Oxford Nanopore Technologies (ONT) and single molecule real-time (SMRT) sequencing developed by Pacific Biosciences (PacBio). In this study, we systematically compare the performance of CpG methylation detection from long-read sequencing. RESULTS We demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxidative bisulfite-sequenced (oxBS) samples, isolated from the same blood draws. We introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. We evaluate the per-site performance of CpG methylation detection across different genomic features and CpG methylation rates and demonstrate how the latest R10.4 flowcell chemistry and base-calling algorithms improve methylation detection from nanopore sequencing. Additionally, we show how the methylation detection of 50 SMRT-sequenced genomes compares to nanopore sequencing and oxBS. CONCLUSIONS This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods. We compare two commonly used computational methods for the detection of CpG methylation in a large number of nanopore genomes, including samples sequenced using the latest R10.4 nanopore flowcell chemistry and 50 SMRT-sequenced samples. We provide insights into the strengths and limitations of each sequencing method as well as recommendations for standardization and evaluation of tools designed for genome-scale modified base detection using long-read sequencing.
Affiliation(s)
- Brynja D Sigurpalsdottir
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
- Doruk Beyter
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Florian Zink
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Marteinn Þ Hardarson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
- Daniel F Gudbjartsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavík, Iceland
- Bjarni V Halldorsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
- Kari Stefansson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Faculty of Medicine, School of Health Science, University of Iceland, Reykjavík, Iceland
|
40
|
Wang Z, Fu G, Ma G, Wang C, Wang Q, Lu C, Fu L, Zhang X, Cong B, Li S. The association between DNA methylation and human height and a prospective model of DNA methylation-based height prediction. Hum Genet 2024; 143:401-421. [PMID: 38507014 DOI: 10.1007/s00439-024-02659-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
As a vital anthropometric characteristic, human height information not only helps to understand overall developmental status and genetic risk factors, but is also important for forensic DNA phenotyping. We utilized linear regression analysis to test the association between each CpG probe and the height phenotype. Next, we designed a methylation sequencing panel targeting 959 CpGs, and subsequent height inference models were constructed for the Chinese population. A total of 11,730 height-associated sites were identified. By employing KPCA and deep neural networks, a prediction model was developed, of which the cross-validation RMSE, MAE and R² were 5.62 cm, 4.45 cm and 0.64, respectively. Genetic factors could explain 39.4% of the methylation level variance of sites used in the height inference models. Collectively, we demonstrated an association between height and DNA methylation status through an EWAS analysis. Targeted methylation sequencing of only 959 CpGs combined with deep learning techniques could provide a model to estimate human height with higher accuracy than SNP-based prediction models.
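The abstract above summarizes model performance with RMSE, MAE and the coefficient of determination. For readers unfamiliar with these metrics, the sketch below shows their standard definitions. The function name and the toy heights are hypothetical, and nothing here reproduces the authors' model or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy heights in cm (invented values, for illustration only).
observed = [160.0, 170.0, 180.0, 175.0]
predicted = [162.0, 168.0, 177.0, 176.0]
rmse, mae, r2 = regression_metrics(observed, predicted)
print(f"RMSE={rmse:.2f} cm, MAE={mae:.2f} cm, R^2={r2:.2f}")
```

RMSE and MAE are in the units of the trait (here centimetres), whereas R² is unitless, which is why the abstract reports the first two in cm.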
Affiliation(s)
- Zhonghua Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Guangping Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Guanju Ma
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Chunyan Wang
- Physical Examination Center of Shijiazhuang People's Hospital, Shijiazhuang, 050011, Hebei, China
- Qian Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Chaolong Lu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Lihong Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Xiaojing Zhang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Bin Cong
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
- Shujin Li
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
|
41
|
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024] Open
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling and computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background as well as epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
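The abstract above mentions calculating a polygenic risk score (PRS) from genome-wide SNP genotypes. In its simplest additive form, a PRS is a weighted sum of effect-allele dosages. The sketch below illustrates that form only; the function name, the dictionary-based input format, and the SNP identifiers and weights are hypothetical and are not taken from the paper.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum over SNPs of (effect size x effect-allele dosage).

    dosages: dict mapping SNP id -> effect-allele count (0, 1 or 2)
    weights: dict mapping SNP id -> per-allele effect size (e.g. log odds ratio)
    Only SNPs present in both dictionaries contribute (assumed convention).
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Invented identifiers and effect sizes, for illustration only.
sample_dosages = {"snpA": 2, "snpB": 1, "snpC": 0}
effect_sizes = {"snpA": 0.10, "snpB": 0.25, "snpD": 0.50}
print(polygenic_risk_score(sample_dosages, effect_sizes))
```

Because the score only needs an allele count per common SNP, it can in principle be computed from low-depth genome-wide data, which is consistent with the abstract's use of off-target reads for this purpose.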
Affiliation(s)
- Wataru Nakamura
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Makoto Hirata
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
- Satoyo Oda
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Division of Laboratory Medicine, National Cancer Center Hospital, Tokyo, Japan
- Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Raúl Nicolás Mateos
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Masahiro Sugawa
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Naoko Iida
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Mineko Ushiama
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Noriko Tanabe
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Hiromi Sakamoto
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Shigeki Sekine
- Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
- Akira Hirasawa
- Department of Clinical Genetics and Genomic Medicine, Okayama University Hospital, Okayama, Japan
- Yosuke Kawai
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Katsushi Tokunaga
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Central Biobank, National Center Biobank Network, Tokyo, Japan
- Shin-Ichi Tsujimoto
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Norio Shiba
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Shuichi Ito
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Teruhiko Yoshida
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
|
42
|
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
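The abstract above notes that breakpoint homology makes SV placement ambiguous. A toy sketch can make the effect concrete: when a deleted segment sits inside repeated sequence, several different start coordinates describe exactly the same resulting haplotype. The function below finds that range of equivalent starts for a deletion. It is an illustrative assumption (name, 0-based coordinates, and example sequence are invented) and is not the method used in the paper.

```python
def equivalent_deletion_starts(ref, start, length):
    """Return (leftmost, rightmost) 0-based start positions of deletions of
    `length` bases that yield the same sequence as deleting
    ref[start:start + length]."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    left = start
    # Shifting one base left is neutral when the base entering the deleted
    # window on the left equals the base leaving it on the right.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shifting one base right is neutral under the mirrored condition.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def apply_deletion(ref, start, length):
    """Sequence obtained after removing `length` bases at `start`."""
    return ref[:start] + ref[start + length:]


# Toy reference with an ACT tandem repeat (invented sequence).
reference = "GGACTACTACTTT"
lo, hi = equivalent_deletion_starts(reference, 5, 3)
print(lo, hi)
print({apply_deletion(reference, s, 3) for s in range(lo, hi + 1)})
```

In the toy reference, removing one ACT unit can be reported at any start within the returned range, so two callers (or two samples differing by a nearby SNP) can report different coordinates for what is biologically the same event.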
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
|
43
|
Gueuning M, Thun GA, Trost N, Schneider L, Sigurdardottir S, Engström C, Larbes N, Merki Y, Frey BM, Gassner C, Meyer S, Mattle-Greminger MP. Resolving Genotype-Phenotype Discrepancies of the Kidd Blood Group System Using Long-Read Nanopore Sequencing. Biomedicines 2024; 12:225. [PMID: 38275395 PMCID: PMC10813000 DOI: 10.3390/biomedicines12010225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/15/2024] [Accepted: 01/16/2024] [Indexed: 01/27/2024] Open
Abstract
Due to substantial improvements in read accuracy, third-generation long-read sequencing holds great potential in blood group diagnostics, particularly in cases where traditional genotyping or sequencing techniques, primarily targeting exons, fail to explain serological phenotypes. In this study, we employed Oxford Nanopore sequencing to resolve all genotype-phenotype discrepancies in the Kidd blood group system (JK, encoded by SLC14A1) observed over seven years of routine high-throughput donor genotyping using a mass spectrometry-based platform at the Blood Transfusion Service, Zurich. Discrepant results from standard serological typing and donor genotyping were confirmed using commercial PCR-SSP kits. To resolve discrepancies, we amplified the entire coding region of SLC14A1 (~24 kb, exons 3 to 10) in two overlapping long-range PCRs in all samples. Amplicons were barcoded and sequenced on a MinION flow cell. Sanger sequencing and bridge-PCRs were used to confirm findings. Among 11,972 donors with both serological and genotype data available for the Kidd system, we identified 10 cases with unexplained conflicting results. Five were linked to known weak and null alleles caused by variants not included in the routine donor genotyping. In two cases, we identified novel null alleles on the JK*01 (Gly40Asp; c.119G>A) and JK*02 (Gly242Glu; c.725G>A) haplotypes, respectively. Remarkably, the remaining three cases were associated with a yet unknown deletion of ~5 kb spanning exons 9-10 of the JK*01 allele, which other molecular methods had failed to detect. Overall, nanopore sequencing demonstrated reliable and accurate performance for detecting both single-nucleotide and structural variants. It possesses the potential to become a robust tool in the molecular diagnostic portfolio, particularly for addressing challenging structural variants such as hybrid genes, deletions and duplications.
Affiliation(s)
- Morgan Gueuning
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
- Gian Andri Thun
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
- Nadine Trost
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Linda Schneider
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Sonja Sigurdardottir
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Charlotte Engström
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Naemi Larbes
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Yvonne Merki
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Beat M. Frey
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Christoph Gassner
- Institute of Translational Medicine, Private University in the Principality of Liechtenstein, 9495 Triesen, Liechtenstein
- Stefan Meyer
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Maja P. Mattle-Greminger
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
|
44
|
Auwerx C, Jõeloo M, Sadler MC, Tesio N, Ojavee S, Clark CJ, Mägi R, Reymond A, Kutalik Z. Rare copy-number variants as modulators of common disease susceptibility. Genome Med 2024; 16:5. [PMID: 38185688 PMCID: PMC10773105 DOI: 10.1186/s13073-023-01265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024] Open
Abstract
BACKGROUND Copy-number variations (CNVs) have been associated with rare and debilitating genomic disorders (GDs), but their impact on health later in life in the general population remains poorly described. METHODS Assessing four modes of CNV action, we performed genome-wide association scans (GWASs) between the copy-number of CNV-proxy probes and 60 curated ICD-10 based clinical diagnoses in 331,522 unrelated white British UK Biobank (UKBB) participants with replication in the Estonian Biobank. RESULTS We identified 73 signals involving 40 diseases, all of which indicated that CNVs increased disease risk and caused earlier onset. We estimated that 16% of these associations are indirect, acting by increasing body mass index (BMI). Signals mapped to 45 unique, non-overlapping regions, nine of which were linked to known GDs. The number and identity of genes affected by CNVs modulated their pathogenicity, with many associations being supported by colocalization with both common and rare single-nucleotide variant association signals. Dissection of association signals provided insights into the epidemiology of known gene-disease pairs (e.g., deletions in BRCA1 and LDLR increased risk for ovarian cancer and ischemic heart disease, respectively), clarified dosage mechanisms of action (e.g., both increased and decreased dosage of 17q12 impacted renal health), and identified putative causal genes (e.g., ABCC6 for kidney stones). Characterization of the pleiotropic pathological consequences of recurrent CNVs at 15q13, 16p13.11, 16p12.2, and 22q11.2 in adulthood indicated variable expressivity of these regions and the involvement of multiple genes. Finally, we show that while the total burden of rare CNVs, and especially deletions, strongly associated with disease risk, it only accounted for ~0.02% of the UKBB disease burden. These associations are mainly driven by CNVs at known GD CNV regions, whose pleiotropic effect on common diseases was broader than anticipated by our CNV-GWAS. CONCLUSIONS Our results shed light on the prominent role of rare CNVs in determining common disease susceptibility within the general population and provide actionable insights for anticipating later-onset comorbidities in carriers of recurrent CNVs.
Affiliation(s)
- Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
- Maarja Jõeloo
- Institute of Molecular and Cell Biology, University of Tartu, 51010, Tartu, Estonia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Marie C Sadler
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
- Nicolò Tesio
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Sven Ojavee
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Charlie J Clark
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
|
45
|
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
|
46
|
Shi J, Jia Z, Sun J, Wang X, Zhao X, Zhao C, Liang F, Song X, Guan J, Jia X, Yang J, Chen Q, Yu K, Jia Q, Wu J, Wang D, Xiao Y, Xu X, Liu Y, Wu S, Zhong Q, Wu J, Cui S, Bo X, Wu Z, Park M, Kellis M, He K. Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing. Nat Commun 2023; 14:8282. [PMID: 38092772 PMCID: PMC10719358 DOI: 10.1038/s41467-023-44034-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/21/2021] [Accepted: 11/27/2023] [Indexed: 12/17/2023] Open
Abstract
Structural variants (SVs), accounting for a larger fraction of the genome than SNPs/InDels, are an important pool of genetic variation, enabling environmental adaptations. Here, we perform long-read sequencing of 320 Tibetan and Han samples and show that SVs are highly involved in high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict enhancers, target genes and biological functions. We reveal diverse Tibetan-specific SVs affecting the regulatory circuitry of biological functions, including the hypoxia response, energy metabolism and pulmonary function. We find a Tibetan-specific deletion disrupts a super-enhancer and downregulates EPAS1 using enhancer reporter, cellular knock-out and DNA pull-down assays. Our study expands the global SV landscape, reveals the role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles of SVs in human biology.
Affiliation(s)
- Jinlong Shi
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Zhilong Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jinxiu Sun
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xiaoreng Wang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- State Key Laboratory of Experimental Hematology, Beijing, 100853, China
| | - Xiaojing Zhao
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Translational Medicine Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Chenghui Zhao
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Research Center for Biomedical Engineering, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Fan Liang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xinyu Song
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jiawei Guan
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xue Jia
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Yang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Qi Chen
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Kang Yu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qian Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Wu
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Depeng Wang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yuhui Xiao
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xiaoman Xu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yinzhe Liu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Shijing Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qin Zhong
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jue Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Saijia Cui
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, 100850, China
| | | | | | - Manolis Kellis
- Massachusetts Institute of Technology; MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard, Cambridge, 02139, MA, USA
| | - Kunlun He
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China.
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China.
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China.
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China.
|
47
|
Reis ALM, Rapadas M, Hammond JM, Gamaarachchi H, Stevanovski I, Ayuputeri Kumaheri M, Chintalaphani SR, Dissanayake DSB, Siggs OM, Hewitt AW, Llamas B, Brown A, Baynam G, Mann GJ, McMorran BJ, Easteal S, Hermes A, Jenkins MR, Patel HR, Deveson IW. The landscape of genomic structural variation in Indigenous Australians. Nature 2023; 624:602-610. [PMID: 38093003 PMCID: PMC10733147 DOI: 10.1038/s41586-023-06842-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1-3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 bp-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Affiliation(s)
- Andre L M Reis
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Melissa Rapadas
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Jillian M Hammond
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Hasindu Gamaarachchi
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
| | - Igor Stevanovski
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Meutia Ayuputeri Kumaheri
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Sanjog R Chintalaphani
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Duminda S B Dissanayake
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Owen M Siggs
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Department of Ophthalmology, Flinders University, Bedford Park, South Australia, Australia
| | - Alex W Hewitt
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
| | - Bastien Llamas
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Australian Centre for Ancient DNA, School of Biological Sciences and Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
- ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, South Australia, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Alex Brown
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Gareth Baynam
- Telethon Kids Institute and Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Western Australia, Australia
- Genetic Services of Western Australia, Western Australian Department of Health, Perth, Western Australia, Australia
- Western Australian Register of Developmental Anomalies, Western Australian Department of Health, Perth, Western Australia, Australia
| | - Graham J Mann
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brendan J McMorran
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Simon Easteal
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Azure Hermes
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Misty R Jenkins
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Hardip R Patel
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia.
| | - Ira W Deveson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia.
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia.
|
48
|
Hinch R, Donnelly P, Hinch AG. Meiotic DNA breaks drive multifaceted mutagenesis in the human germ line. Science 2023; 382:eadh2531. [PMID: 38033082 PMCID: PMC7615360 DOI: 10.1126/science.adh2531] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/20/2023] [Accepted: 09/29/2023] [Indexed: 12/02/2023]
Abstract
Meiotic recombination commences with hundreds of programmed DNA breaks; however, the degree to which they are accurately repaired remains poorly understood. We report that meiotic break repair is eightfold more mutagenic for single-base substitutions than was previously understood, leading to de novo mutation in one in four sperm and one in 12 eggs. Its impact on indels and structural variants is even higher, with 100- to 1300-fold increases in rates per break. We uncovered new mutational signatures and footprints relative to break sites, which implicate unexpected biochemical processes and error-prone DNA repair mechanisms, including translesion synthesis and end joining in meiotic break repair. We provide evidence that these mechanisms drive mutagenesis in human germ lines and lead to disruption of hundreds of genes genome wide.
Affiliation(s)
- Robert Hinch
- Big Data Institute, University of Oxford; Oxford, UK
| | - Peter Donnelly
- Wellcome Centre for Human Genetics, University of Oxford; Oxford, UK
- Genomics plc; Oxford, UK
|
49
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023] Open
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
|
50
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
|