1
Saratto T, Visuri K, Lehtinen J, Ortega-Sanz I, Steenwyk JL, Sihvonen S. Solu: a cloud platform for real-time genomic pathogen surveillance. BMC Bioinformatics 2025; 26:12. [PMID: 39806295 PMCID: PMC11731562 DOI: 10.1186/s12859-024-06005-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 12/05/2024] [Indexed: 01/16/2025] Open
Abstract
BACKGROUND Genomic surveillance is extensively used for tracking public health outbreaks and healthcare-associated pathogens. Despite advancements in bioinformatics pipelines, continuous surveillance still poses significant challenges in infrastructure, expertise, and security. Existing pipelines often require users to set up and manage their own infrastructure, and they are not designed for continuous surveillance, which demands that new, regularly generated sequencing data be integrated with previous analyses. Additionally, academic projects often do not meet the privacy requirements of healthcare providers. RESULTS We present Solu, a cloud-based platform that integrates genomic data into a real-time, privacy-focused surveillance system. EVALUATION Solu's accuracy for taxonomy assignment, antimicrobial resistance genes, and phylogenetics was comparable to established pathogen surveillance pipelines. In some cases, Solu identified antimicrobial resistance genes that were previously undetected. Together, these findings demonstrate the efficacy of our platform. CONCLUSIONS By enabling reliable, user-friendly, and privacy-focused genomic surveillance, Solu has the potential to bridge the gap between cutting-edge research and practical, widespread application in healthcare settings. The platform is available for free academic use at https://platform.solugenomics.com.
Affiliation(s)
- Timo Saratto
  - Solu Healthcare Oy, Kalevankatu 31 A 13, 00100, Helsinki, Finland.
- Kerkko Visuri
  - Solu Healthcare Oy, Kalevankatu 31 A 13, 00100, Helsinki, Finland
- Jonatan Lehtinen
  - Solu Healthcare Oy, Kalevankatu 31 A 13, 00100, Helsinki, Finland
- Irene Ortega-Sanz
  - Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Jacob L Steenwyk
  - Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Samuel Sihvonen
  - Solu Healthcare Oy, Kalevankatu 31 A 13, 00100, Helsinki, Finland

2
Zhao W, Tao Y, Xiong J, Liu L, Wang Z, Shao C, Shang L, Hu Y, Xu Y, Su Y, Yu J, Feng T, Xie J, Xu H, Zhang Z, Peng J, Wu J, Zhang Y, Zhu S, Xia K, Tang B, Zhao G, Li J, Li B. GoFCards: an integrated database and analytic platform for gain of function variants in humans. Nucleic Acids Res 2025; 53:D976-D988. [PMID: 39578693 PMCID: PMC11701611 DOI: 10.1093/nar/gkae1079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 10/20/2024] [Accepted: 10/28/2024] [Indexed: 11/24/2024] Open
Abstract
Gain-of-function (GOF) variants, which introduce new or amplify protein functions, are essential for understanding disease mechanisms. Despite advances in genomics and functional research, identifying and analyzing pathogenic GOF variants remains challenging owing to fragmented data and database limitations, underscoring the difficulty in accessing critical genetic information. To address this challenge, we manually reviewed the literature, pinpointing 3089 single-nucleotide variants and 72 insertions and deletions in 579 genes associated with 1299 diseases from 2069 studies, and integrated these with the 3.5 million predicted GOF variants. Our approach is complemented by a proprietary scoring system that prioritizes GOF variants on the basis of the evidence supporting their GOF effects and provides predictive scores for variants that lack existing documentation. We then developed a database named GoFCards for general geneticists and clinicians to easily obtain GOF variants in humans (http://www.genemed.tech/gofcards). This database also contains data from >150 sources and offers comprehensive variant-level and gene-level annotations, with the aim of providing users with convenient access to detailed and relevant genetic information. Furthermore, GoFCards empowers users with limited bioinformatic skills to analyze and annotate genetic data, and prioritize GOF variants. GoFCards offers an efficient platform for interpreting GOF variants and thereby advancing genetic research.
Affiliation(s)
- Wenjing Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
  - Department of Medical Genetics, NHC Key Laboratory of Healthy Birth and Birth Defect Prevention in Western China, The First People's Hospital of Yunnan Province, No. 157 Jinbi Road, Xishan District, Kunming, Yunnan 650000, China
  - School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Youfu Tao
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiayi Xiong
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Lei Liu
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Zhongqing Wang
  - School of Medicine, Kunming University of Science and Technology, No. 727 Jingming South Road, Chenggong District, Kunming, Yunnan 650000, China
- Chuhan Shao
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Ling Shang
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yue Hu
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yishu Xu
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yingluo Su
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiahui Yu
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Tianyi Feng
  - Xiangya School of Medicine, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Junyi Xie
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Huijuan Xu
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Zijun Zhang
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jiayi Peng
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Jianbin Wu
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Yuchang Zhang
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Shaobo Zhu
  - School of Life Science, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha, Hunan 410008, China
- Kun Xia
  - MOE Key Laboratory of Pediatric Rare Diseases & Hunan Key Laboratory of Medical Genetics, Central South University, No. 110 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Beisha Tang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
  - Department of Neurology & Multi-omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, 69 Chuan Shan Road, Shi Gu District, Hengyang, Hunan 421000, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Jinchen Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, No. 87 Xiangya Road, Furong District, Changsha, Hunan 410008, China

3
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variations, yet understanding their biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant Single Nucleotide Variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding further studies and enhancing understanding of the molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
  - Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
  - Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.

4
Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. bioRxiv 2024:2024.06.25.600715. [PMID: 38979263 PMCID: PMC11230389 DOI: 10.1101/2024.06.25.600715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), an in vitro high-throughput method, can simultaneously test thousands of variants for allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, which show differential allelic regulatory effects on gene expression, usually number only in the hundreds, limiting their usefulness as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants in silico and thereby augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared to the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as one example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and with binding sites of c-Myc and Pol II that regulate gene expression. Importantly, predicted regulatory variants are found to link to immune-related genes through chromatin loops and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
Affiliation(s)
- Weijia Jin
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yi Xia
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Sai Ritesh Thela
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Chen
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA

5
Wade KJ, Suseno R, Kizer K, Williams J, Boquett J, Caillier S, Pollock NR, Renschen A, Santaniello A, Oksenberg JR, Norman PJ, Augusto DG, Hollenbach JA. MHConstructor: A high-throughput, haplotype-informed solution to the MHC assembly challenge. bioRxiv 2024:2024.05.20.595060. [PMID: 38826378 PMCID: PMC11142050 DOI: 10.1101/2024.05.20.595060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole-genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for the generation of de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.
Affiliation(s)
- Kristen J. Wade
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Rayo Suseno
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Kerry Kizer
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Jacqueline Williams
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Juliano Boquett
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Stacy Caillier
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Nicholas R. Pollock
  - Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
  - Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Adam Renschen
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Adam Santaniello
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Jorge R. Oksenberg
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Paul J. Norman
  - Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
  - Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Danillo G. Augusto
  - Department of Biological Sciences, University of North Carolina Charlotte, Charlotte, NC, United States
  - Programa de Pós-Graduação em Genética, Universidade Federal do Paraná, Curitiba, Brazil
- Jill A. Hollenbach
  - Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States

6
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. Cell Genomics 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Junko Ueda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
- Shota Mizuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Kurara Honda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- An-A Kazuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Hirona Yamamoto
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Tomonori Hara
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Atsushi Takata
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.

7
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
  - Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jinchen Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China

8
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. Genomics, Proteomics & Bioinformatics 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants that are associated with genetic diseases through their effects on gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with massive datasets and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Yu Wang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Dong Huang
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
- Yu Liang
  - Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
- Nan Liang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Xiao-Wei Chen
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China.

9
|
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. Genomics, Proteomics & Bioinformatics 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods are developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinical relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. Summarily, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
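The AUROC ranges quoted in this abstract are easier to read with the metric's definition in hand: AUROC is the probability that a randomly chosen positive (e.g. pathogenic) variant receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance-level ranking. The following is a minimal Python sketch of that pairwise definition. It is not the evaluation code used in the study, and the function name and toy inputs are illustrative assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.3, 0.4, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

Read against this definition, the reported upper bound of 0.5188 for curated GWAS variants means that even the best-performing method ranked a disease-associated variant above a control only slightly more often than a coin flip would.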
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
- Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
10
Li RY, Huang Y, Zhao Z, Qin ZS. Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome. Data Brief 2023; 46:108827. [PMID: 36582986 PMCID: PMC9792340 DOI: 10.1016/j.dib.2022.108827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022] Open
Abstract
This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome in 100-bp resolution with full genome-wide coverage. The datasets are processed from raw read count data collected from five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org). Data from high-throughput sequencing assays were processed and crystallized into a total of 6,305 genome-wide profiles. To ensure the quality of the features, we filtered out assays with low read depth, inconsistent read counts, and poor data quality. The types of sequencing-based experiment assays include DNase-seq, histone and TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. Merging of processed data was done by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts using disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index enabling fast user retrieval of read counts given coordinates in the human genome. The data processing pipeline is replicable for users' own purposes and for other experimental assays. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the processing pipeline can be easily applied to data from any other genome-wide profiling assays, expanding the amount of available data.
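The binning scheme described in this abstract (fixed 100-bp bins tiling the genome, with read counts averaged across technical replicates) can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the 1-based coordinate convention, the assumption that the first bin starts at position 1, and all function names are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

BIN_SIZE = 100  # bin width in bp, as stated in the abstract


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Map a 1-based genomic position to the 0-based index of its bin
    (assumes the first bin covers positions 1..bin_size)."""
    if position < 1:
        raise ValueError("position must be a 1-based coordinate")
    return (position - 1) // bin_size


def bin_interval(index: int, bin_size: int = BIN_SIZE) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) covered by a bin."""
    start = index * bin_size + 1
    return start, start + bin_size - 1


def average_replicates(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average per-bin read counts across replicates of equal length."""
    if not replicates:
        raise ValueError("need at least one replicate")
    n_bins = len(replicates[0])
    if any(len(r) != n_bins for r in replicates):
        raise ValueError("all replicates must cover the same bins")
    return [sum(r[i] for r in replicates) / len(replicates) for i in range(n_bins)]


# Hypothetical variant position and two toy replicates covering three bins.
variant_bin = bin_index(12_345)
merged_signal = average_replicates([[1, 2, 3], [3, 4, 5]])
```

With a bin width of 100 bp, a genome of roughly 3.1 billion bases yields on the order of 31 million bins, which is consistent with the "about 30 million" figure quoted above.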
Affiliation(s)
- Ronnie Y. Li
- Graduate program in Neuroscience, Emory University, United States
- Yanting Huang
- Department of Computer Science, Emory University, United States
- Zhiyue Zhao
- Department of Computer Science, Emory University, United States
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Emory University, United States
11
Sobahy TM, Motwalli O, Alazmi M. AllelePred: A Simple Allele Frequencies Ensemble Predictor for Different Single Nucleotide Variants. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:796-801. [PMID: 35239491 DOI: 10.1109/tcbb.2022.3155659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
BACKGROUND & OBJECTIVE Genomic medicine stands to be revolutionized by understanding single nucleotide variants (SNVs) and their expression in single-gene disorders (Mendelian diseases). Computational tools can play a vital role in the exploration of such variations and their pathogenicity. Consequently, we developed the ensemble prediction tool AllelePred to identify deleterious SNVs and disease causative genes. RESULTS The model utilizes different population genetics backgrounds and restricted criteria for features selection to help generate high accuracy results. In comparison to other tools, such as Eigen, PROVEAN, and fathmm-MKL our classifier achieves higher accuracy (98%), precision (96%), F1 score (93%), and coverage (100%) for different types of coding variants. The new method was also compared against a bioinformatics analytical workflow, which uses gnomAD overall AFs (less than 1%) and CADD (scaled C-score of at least 15). Furthermore, this research highlights the stature of genetic variant sharing and curation. We accumulated a list of highly probable deleterious variants and recommended further experimental validation before medical diagnostic usage. CONCLUSIONS The ensemble prediction tool AllelePred enables increased accuracy in recognizing deleterious SNVs and the genetic determinants in real clinical data.
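The comparison workflow mentioned in this abstract combines two thresholds: a gnomAD overall allele frequency below 1% and a scaled CADD C-score of at least 15. A minimal Python sketch of such a filter follows. It is not AllelePred itself and not the authors' workflow code: the `Variant` record, its field names, the example identifiers, and the decision to treat variants absent from gnomAD as rare are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Variant:
    variant_id: str             # hypothetical identifier
    gnomad_af: Optional[float]  # overall allele frequency; None if absent from gnomAD
    cadd_phred: float           # scaled CADD C-score


def passes_workflow(v: Variant, max_af: float = 0.01, min_cadd: float = 15.0) -> bool:
    """Keep variants that are rare (AF < 1%) and have a scaled C-score >= 15.
    Assumption: a variant with no gnomAD record is treated as rare."""
    is_rare = v.gnomad_af is None or v.gnomad_af < max_af
    return is_rare and v.cadd_phred >= min_cadd


def filter_variants(variants: Iterable[Variant]) -> List[Variant]:
    """Apply the two-threshold filter to a collection of variants."""
    return [v for v in variants if passes_workflow(v)]


# Hypothetical toy input.
candidates = [
    Variant("var_a", 0.0002, 24.1),  # rare and high CADD: kept
    Variant("var_b", 0.05, 30.0),    # too common: dropped
    Variant("var_c", None, 15.0),    # absent from gnomAD, at threshold: kept
    Variant("var_d", 0.001, 9.8),    # low CADD: dropped
]
kept_ids = [v.variant_id for v in filter_variants(candidates)]
```

A hard-threshold filter like this is the baseline the abstract contrasts with an ensemble classifier, which can weigh several population-specific frequencies rather than a single cutoff.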
12
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
- Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
- Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
- Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
- Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
13
Van de Sompele S, Small KW, Cicekdal MB, Soriano VL, D'haene E, Shaya FS, Agemy S, Van der Snickt T, Rey AD, Rosseel T, Van Heetvelde M, Vergult S, Balikova I, Bergen AA, Boon CJF, De Zaeytijd J, Inglehearn CF, Kousal B, Leroy BP, Rivolta C, Vaclavik V, van den Ende J, van Schooneveld MJ, Gómez-Skarmeta JL, Tena JJ, Martinez-Morales JR, Liskova P, Vleminckx K, De Baere E. Multi-omics approach dissects cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy. Am J Hum Genet 2022; 109:2029-2048. [PMID: 36243009 PMCID: PMC9674966 DOI: 10.1016/j.ajhg.2022.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/26/2022] [Accepted: 09/28/2022] [Indexed: 01/26/2023] Open
Abstract
North Carolina macular dystrophy (NCMD) is a rare autosomal-dominant disease affecting macular development. The disease is caused by non-coding single-nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 promoters and the identification of eighteen candidate cis-regulatory elements (cCREs), the activity of which was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two hotspot regions of PRDM13 affect cCRE activity, including two NCMD-associated non-coding SNVs that we identified herein. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, suggesting that this region is a PRDM13 enhancer. Finally, mining of single-cell transcriptional data of embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study provides insight into the cis-regulatory mechanisms of NCMD and supports that this condition is a retinal enhanceropathy.
Affiliation(s)
- Stijn Van de Sompele
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Kent W Small
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
- Munevver Burcu Cicekdal
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Víctor López Soriano
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Eva D'haene
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Fadi S Shaya
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
- Steven Agemy
- Department of Ophthalmology, SUNY Downstate Medical Center University, Brooklyn, New York, USA
- Thijs Van der Snickt
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Alfredo Dueñas Rey
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Toon Rosseel
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Mattias Van Heetvelde
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Sarah Vergult
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Irina Balikova
- Department of Ophthalmology, University Hospitals Leuven, Leuven, Belgium
- Arthur A Bergen
- Department of Human Genetics, Amsterdam UMC, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands; Queen Emma Centre of Precision Medicine, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Camiel J F Boon
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Department of Ophthalmology, Leiden University Medical Center, Leiden, The Netherlands
- Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Chris F Inglehearn
- Division of Molecular Medicine, Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Bohdan Kousal
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Bart P Leroy
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium; Department of Head & Skin, Ghent University, Ghent, Belgium; Division of Ophthalmology & Center for Cellular & Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Carlo Rivolta
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Veronika Vaclavik
- University of Lausanne, Jules-Gonin Eye Hospital, Lausanne, Switzerland
- Mary J van Schooneveld
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Bartiméus, Diagnostic Center for Complex Visual Disorders, Zeist, The Netherlands
- José Luis Gómez-Skarmeta
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Juan J Tena
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Juan R Martinez-Morales
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
- Petra Liskova
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic; Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Kris Vleminckx
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Elfride De Baere
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.
14
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
15
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Kevin Wilhelm
- Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
16
Yang M, Huang L, Huang H, Tang H, Zhang N, Yang H, Wu J, Mu F. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Res 2022; 50:e81. [PMID: 35536244 PMCID: PMC9371931 DOI: 10.1093/nar/gkac326] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/22/2021] [Revised: 02/22/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022] Open
Abstract
Interpretation of non-coding genome remains an unsolved challenge in human genetics due to impracticality of exhaustively annotating biochemically active elements in all conditions. Deep learning based computational approaches emerge recently to help interpret non-coding regions. Here, we present LOGO (Language of Genome), a self-attention based contextualized pre-trained language model containing only two self-attention layers with 1 million parameters as a substantially light architecture that applies self-supervision techniques to learn bidirectional representations of the unlabelled human reference genome. LOGO is then fine-tuned for sequence labelling task, and further extended to variant prioritization task via a special input encoding scheme of alternative alleles followed by adding a convolutional module. Experiments show that LOGO achieves 15% absolute improvement for promoter identification and up to 4.5% absolute improvement for enhancer-promoter interaction prediction. LOGO exhibits state-of-the-art multi-task predictive power on thousands of chromatin features with only 3% parameterization benchmarking against the fully supervised model, DeepSEA and 1% parameterization against a recent BERT-based DNA language model. For allelic-effect prediction, locality introduced by one dimensional convolution shows improved sensitivity and specificity for prioritizing non-coding variants associated with human diseases. In addition, we apply LOGO to interpret type 2 diabetes (T2D) GWAS signals and infer underlying regulatory mechanisms. We make a conceptual analogy between natural language and human genome and demonstrate LOGO is an accurate, fast, scalable, and robust framework to interpret non-coding regions for global sequence labeling as well as for variant prioritization at base-resolution.
Affiliation(s)
- Meng Yang
- MGI, BGI-Shenzhen, Shenzhen 518083, China; Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
- Hui Tang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
- Nan Zhang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
- Huanming Yang
- BGI-Shenzhen, Shenzhen 518083, China; Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, 518120, China
- Jihong Wu
- Department of Ophthalmology, Eye & ENT Hospital, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia (Fudan University), Chinese Academy of Medical Sciences, National Health Commission, Shanghai, China
- Feng Mu
- MGI, BGI-Shenzhen, Shenzhen 518083, China
17
Brooks-Warburton J, Modos D, Sudhakar P, Madgwick M, Thomas JP, Bohar B, Fazekas D, Zoufir A, Kapuy O, Szalay-Beko M, Verstockt B, Hall LJ, Watson A, Tremelling M, Parkes M, Vermeire S, Bender A, Carding SR, Korcsmaros T. A systems genomics approach to uncover patient-specific pathogenic pathways and proteins in ulcerative colitis. Nat Commun 2022; 13:2299. [PMID: 35484353 PMCID: PMC9051123 DOI: 10.1038/s41467-022-29998-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/20/2019] [Accepted: 04/06/2022] [Indexed: 12/11/2022] Open
Abstract
We describe a precision medicine workflow, the integrated single nucleotide polymorphism network platform (iSNP), designed to determine the mechanisms by which SNPs affect cellular regulatory networks, and how SNP co-occurrences contribute to disease pathogenesis in ulcerative colitis (UC). Using SNP profiles of 378 UC patients we map the regulatory effects of the SNPs to a human signalling network containing protein-protein, miRNA-mRNA and transcription factor binding interactions. With unsupervised clustering algorithms we group these patient-specific networks into four distinct clusters driven by PRKCB, HLA, SNAI1/CEBPB/PTPN1 and VEGFA/XPO5/POLH hubs. The pathway analysis identifies calcium homeostasis, wound healing and cell motility as key processes in UC pathogenesis. Using transcriptomic data from an independent patient cohort, with three complementary validation approaches focusing on the SNP-affected genes, the patient specific modules and affected functions, we confirm the regulatory impact of non-coding SNPs. iSNP identified regulatory effects for disease-associated non-coding SNPs, and by predicting the patient-specific pathogenic processes, we propose a systems-level way to stratify patients.
Affiliation(s)
- Johanne Brooks-Warburton
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - Department of Clinical, Pharmaceutical and Biological Sciences, University of Hertfordshire, Hertford, UK
  - Gastroenterology Department, Lister Hospital, Stevenage, UK
- Dezso Modos
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Padhmanand Sudhakar
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
- Matthew Madgwick
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- John P Thomas
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
- Balazs Bohar
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Department of Genetics, Eötvös Loránd University, Budapest, Hungary
- David Fazekas
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Department of Genetics, Eötvös Loránd University, Budapest, Hungary
- Azedine Zoufir
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Orsolya Kapuy
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Bram Verstockt
  - KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
- Lindsay J Hall
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - Norwich Medical School, University of East Anglia, Norwich, UK
  - School of Life Sciences, ZIEL - Institute for Food & Health, Technical University of Munich, 80333, Freising, Germany
- Alastair Watson
  - Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
  - Norwich Medical School, University of East Anglia, Norwich, UK
- Mark Tremelling
  - Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
- Miles Parkes
  - Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
- Severine Vermeire
  - KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
- Andreas Bender
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Simon R Carding
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - Norwich Medical School, University of East Anglia, Norwich, UK
- Tamas Korcsmaros
  - Earlham Institute, Norwich Research Park, Norwich, UK
  - Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
18
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Genome-wide association studies have identified tens of thousands of variants associated with complex traits, most of which fall within non-coding regions, but these variants may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated non-coding functional variants. However, these validated variants are rare because of technical difficulty and financial cost. The small sample size of validated variants makes it less reliable to develop a supervised machine learning model for genome-wide prediction of non-coding causal variants. RESULTS We exploit a deep transfer learning model, based on a convolutional neural network, to improve the prediction of functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs to improve the learning of low-level features and context-specific functional NCVs to learn high-level features toward the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods on both MPRA and GWAS datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/TLVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Chen
  - To whom correspondence should be addressed.
- Fengdi Zhao
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
19
Young RS, Talmane L, Marion de Procé S, Taylor MS. The contribution of evolutionarily volatile promoters to molecular phenotypes and human trait variation. Genome Biol 2022; 23:89. [PMID: 35379293 PMCID: PMC8978360 DOI: 10.1186/s13059-022-02634-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/07/2021] [Accepted: 02/16/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Promoters are sites of transcription initiation that harbour a high concentration of phenotype-associated genetic variation. The evolutionary gain and loss of promoters between species (collectively termed turnover) is pervasive across mammalian genomes and may play a prominent role in driving human phenotypic diversity. RESULTS We classified human promoters by their evolutionary history during the divergence of mouse and human lineages from a common ancestor. This defined conserved, human-inserted and mouse-deleted promoters, and a class of functional-turnover promoters that align between species but are only active in humans. We show that promoters of all evolutionary categories are hotspots for substitution and, often, insertion mutations. Loci with a history of insertion and deletion continue that mode of evolution within contemporary humans. The presence of an evolutionarily volatile promoter within a gene is associated with increased expression variance between individuals, but only in the case of human-inserted and mouse-deleted promoters does that correspond to an enrichment of promoter-proximal genetic effects. Despite the enrichment of these molecular quantitative trait loci (QTL) at evolutionarily volatile promoters, this does not translate into a corresponding enrichment of phenotypic traits mapping to these loci. CONCLUSIONS Promoter turnover is pervasive in the human genome, and these promoters are rich in molecularly quantifiable but phenotypically inconsequential variation in gene expression. However, since evolutionarily volatile promoters show evidence of selection, coupled with high mutation rates and enrichment of QTLs, this implicates them as a source of evolutionary innovation and phenotypic variation, albeit with a high background of selectively neutral expression variation.
Affiliation(s)
- Robert S Young
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
  - Zhejiang University - University of Edinburgh Institute, Zhejiang University, 718 East Haizhou Road, 314400, Haining, China
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Lana Talmane
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Sophie Marion de Procé
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Martin S Taylor
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
20
Cao Z, Huang Y, Duan R, Jin P, Qin ZS, Zhang S. Disease category-specific annotation of variants using an ensemble learning framework. Brief Bioinform 2021; 23:6394995. [PMID: 34643213 DOI: 10.1093/bib/bbab438] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/09/2021] [Revised: 09/03/2021] [Accepted: 09/22/2021] [Indexed: 02/01/2023] Open
Abstract
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to score genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Affiliation(s)
- Zhen Cao
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yanting Huang
  - Department of Computer Science, Emory University, Atlanta, GA 30322, USA
- Ran Duan
  - Department of Software Engineering, Yunnan University, Kunming 650500, China
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Zhaohui S Qin
  - Department of Computer Science, Emory University, Atlanta, GA 30322, USA
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
21
Jin Y, Jiang J, Wang R, Qin ZS. Systematic Evaluation of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity. Front Genet 2021; 12:667866. [PMID: 34567058 PMCID: PMC8458901 DOI: 10.3389/fgene.2021.667866] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/15/2021] [Accepted: 08/02/2021] [Indexed: 02/01/2023] Open
Abstract
The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside of the protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. A possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby affect the in vivo binding of the TF. However, their impact varies, since many positions within a TF binding motif are not well conserved. Therefore, simply annotating all variants located in putative TF binding sites may overestimate the functional impact of these SNVs. We conducted a comprehensive survey to study the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. From the results obtained on 18 TF binding motifs, we found that there is substantial variation in an SNV’s impact on TF binding affinity. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.
Affiliation(s)
- Yutong Jin
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Jiahui Jiang
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Ruixuan Wang
  - College of Environmental Sciences and Engineering, Peking University, Beijing, China
- Zhaohui S Qin
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
22
Lu Y, Wu Y, Liu Y, Li Y, Jing R, Li M. Prediction of disease-associated functional variants in noncoding regions through a comprehensive analysis by integrating datasets and features. Hum Mutat 2021; 42:667-684. [PMID: 33822436 DOI: 10.1002/humu.24203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2020] [Revised: 02/01/2021] [Accepted: 03/31/2021] [Indexed: 02/01/2023]
Abstract
One of the greatest challenges in human genetics is deciphering the link between functional variants in noncoding sequences and the pathophysiology of complex diseases. To address this issue, many methods have been developed to separate functional single-nucleotide variants (SNVs) from neutral SNVs in noncoding regions. In this study, we integrated well-established features and commonly used datasets, merging them into large-scale datasets, and trained a random forest model that yielded promising performance and outperformed some cutting-edge approaches. Our analyses of feature importance and data coverage also provide clues for future research on improving the prediction of functional noncoding SNVs.
Affiliation(s)
- Yu Lu
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Yiming Wu
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Yuan Liu
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Yizhou Li
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Runyu Jing
  - College of Cybersecurity, Sichuan University, Chengdu, Sichuan, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, China
23
Xiao X, Jiao B, Liao X, Zhang W, Yuan Z, Guo L, Wang X, Zhou L, Liu X, Yan X, Tang B, Shen L. Association of Genes Involved in the Metabolic Pathways of Amyloid-β and Tau Proteins With Sporadic Late-Onset Alzheimer's Disease in the Southern Han Chinese Population. Front Aging Neurosci 2020; 12:584801. [PMID: 33240075 PMCID: PMC7677357 DOI: 10.3389/fnagi.2020.584801] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/18/2020] [Accepted: 10/13/2020] [Indexed: 01/04/2023] Open
Abstract
The genes involved in the metabolic pathways of amyloid-β (Aβ) and tau proteins significantly influence the etiology of Alzheimer's disease (AD). Various studies have explored the associations between some of these genes and AD in the Caucasian population; however, research on these associations remains limited in the Chinese population. To systematically evaluate the associations of these genes with AD, we investigated 19 genes involved in the metabolism of Aβ and tau, selected from previous studies identified using the PubMed database. This study included 372 patients with sporadic late-onset AD (sLOAD) and 345 cognitively healthy individuals from southern China. The results were replicated in the International Genomics of Alzheimer's Project (IGAP). Protein-protein interactions were determined using the STRING v11 database. We found that a single-nucleotide polymorphism, rs11682128, of BIN1 conferred susceptibility to sLOAD after adjusting for age, sex, and APOE ε4 status and performing the Bonferroni correction {corrected P = 0.000153, odds ratio (OR) [95% confidence interval (CI)] = 1.403 (1.079-1.824)}, which was replicated in the IGAP. Protein-protein interactions indicated that BIN1 was correlated with MAPT. Moreover, rare variants of NEP and FERMT2 (0.0026 < corrected P < 0.05), and the Aβ degradation, tau pathology, and tau phosphatase pathways (0.01 < corrected P < 0.05), were nominally significantly associated with sLOAD. This study suggested that the genes involved in the metabolic pathways of Aβ and tau contributed to the etiology of sLOAD in the southern Han Chinese population.
Affiliation(s)
- Xuewen Xiao
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Bin Jiao
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Central South University, Changsha, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
- Xinxin Liao
  - Department of Geriatrics Neurology, Xiangya Hospital, Central South University, Changsha, China
- Weiwei Zhang
  - Department of Radiology, Xiangya Hospital, Central South University, Changsha, China
- Zhenhua Yuan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Lina Guo
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Xin Wang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Lu Zhou
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Xixi Liu
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Xinxiang Yan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Central South University, Changsha, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
- Beisha Tang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Central South University, Changsha, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
- Lu Shen
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Central South University, Changsha, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
  - Key Laboratory of Organ Injury, Aging and Regenerative Medicine of Hunan Province, Changsha, China
24
Subaran RL, Stewart WCL. FREQMAX provides an alternative approach for determining high-resolution allele frequency thresholds in carrier screening. Hum Mutat 2020; 41:2078-2086. [PMID: 33032373 DOI: 10.1002/humu.24123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2019] [Revised: 06/26/2020] [Accepted: 10/01/2020] [Indexed: 11/08/2022]
Abstract
As whole-genome data become available for increasing numbers of individuals across diverse populations, the list of genomic variants of unknown significance (VOUS) continues to grow. One powerful tool in VOUS interpretation is determining whether an allele is too common to be considered pathogenic. As genetic and epidemiological parameters vary across disease models, so too does the pathogenic allele frequency threshold for each disease gene. One threshold-setting approach is the maximum credible allele frequency (MCAF) method. However, estimating some of the input values MCAF requires, especially those involving heterogeneity, can present nontrivial statistical challenges. Here, we introduce FREQMAX, our alternative approach for determining allele frequency thresholds in carrier screening. FREQMAX makes efficient use of the data available for well-studied traits and exhibits flexibility for traits where information may be less complete. For cystic fibrosis, more alleles are excluded as benign by FREQMAX than by MCAF. For less-comprehensively characterized traits like ciliary dyskinesia and Smith-Lemli-Opitz syndrome, FREQMAX is able to set the allele frequency threshold without requiring a priori estimates of maximum genetic and allelic contributions. Furthermore, though we describe FREQMAX in the context of carrier screening, its classical population genetics framework also provides context for adaptation to other trait models.
Affiliation(s)
- Ryan L Subaran
  - Bioinformatics R&D, Sema4, a Mount Sinai Venture, Stamford, Connecticut, USA
- William C L Stewart
  - Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
25
Xu W, Han SD, Zhang C, Li JQ, Wang YJ, Tan CC, Li HQ, Dong Q, Mei C, Tan L, Yu JT. The FAM171A2 gene is a key regulator of progranulin expression and modifies the risk of multiple neurodegenerative diseases. SCIENCE ADVANCES 2020; 6:eabb3063. [PMID: 33087363 PMCID: PMC7577723 DOI: 10.1126/sciadv.abb3063] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/14/2020] [Accepted: 09/08/2020] [Indexed: 05/14/2023]
Abstract
Progranulin (PGRN) is a secreted pleiotropic glycoprotein associated with the development of common neurodegenerative diseases. Understanding the pathophysiological role of PGRN may help uncover biological underpinnings. We performed a genome-wide association study to determine the genetic regulators of cerebrospinal fluid (CSF) PGRN levels. Common variants in the region of FAM171A2 were associated with lower CSF PGRN levels (rs708384, P = 3.95 × 10⁻¹²). This was replicated in another independent cohort. The rs708384 variant was associated with increased risk of Alzheimer's disease, Parkinson's disease, and frontotemporal dementia and could modify the expression of the FAM171A2 gene. FAM171A2 was considerably expressed in the vascular endothelium and microglia, which are rich in PGRN. The in vitro study further confirmed that the rs708384 mutation up-regulated the expression of FAM171A2, which caused a decrease in the PGRN level. Collectively, genetic, molecular, and bioinformatic findings suggested that FAM171A2 is a key player in regulating PGRN production.
Affiliation(s)
- Wei Xu
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Si-Da Han
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Can Zhang
  - Genetics and Aging Research Unit, McCance Center for Brain Health, Mass General Institute for Neurodegenerative Diseases (MIND), Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
- Jie-Qiong Li
  - Department of Neurology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Yan-Jiang Wang
  - Department of Neurology and Center for Clinical Neuroscience, Daping Hospital, Third Military Medical University, Chongqing, China
- Chen-Chen Tan
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Hong-Qi Li
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Qiang Dong
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Cui Mei
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Lan Tan
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Jin-Tai Yu
  - Department of Neurology and Institute of Neurology, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
26
Werren EA, Garcia O, Bigham AW. Identifying adaptive alleles in the human genome: from selection mapping to functional validation. Hum Genet 2020; 140:241-276. [PMID: 32728809 DOI: 10.1007/s00439-020-02206-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/19/2020] [Accepted: 07/07/2020] [Indexed: 12/19/2022]
Abstract
The suite of phenotypic diversity across geographically distributed human populations is the outcome of genetic drift, gene flow, and natural selection throughout human evolution. Human genetic variation underlying local biological adaptations to selective pressures is incompletely characterized. With the emergence of population genetics modeling of large-scale genomic data derived from diverse populations, scientists are able to map signatures of natural selection in the genome in a process known as selection mapping. Inferred selection signals further can be used to identify candidate functional alleles that underlie putative adaptive phenotypes. Phenotypic association, fine mapping, and functional experiments facilitate the identification of candidate adaptive alleles. Functional investigation of candidate adaptive variation using novel techniques in molecular biology is slowly beginning to unravel how selection signals translate to changes in biology that underlie the phenotypic spectrum of our species. In addition to informing evolutionary hypotheses of adaptation, the discovery and functional annotation of adaptive alleles also may be of clinical significance. While selection mapping efforts in non-European populations are growing, there remains a stark under-representation of diverse human populations in current public genomic databases, of both clinical and non-clinical cohorts. This lack of inclusion limits the study of human biological variation. Identifying and functionally validating candidate adaptive alleles in more global populations is necessary for understanding basic human biology and human disease.
Affiliation(s)
- Elizabeth A Werren
  - Department of Human Genetics, The University of Michigan, Ann Arbor, MI, USA
  - Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Obed Garcia
  - Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Abigail W Bigham
  - Department of Anthropology, University of California Los Angeles, 341 Haines Hall, Los Angeles, CA, 90095, USA
27
Li X, Shi L, Wang Y, Zhong J, Zhao X, Teng H, Shi X, Yang H, Ruan S, Li M, Sun ZS, Zhan Q, Mao F. OncoBase: a platform for decoding regulatory somatic mutations in human cancers. Nucleic Acids Res 2020; 47:D1044-D1055. [PMID: 30445567 PMCID: PMC6323961 DOI: 10.1093/nar/gky1139] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/16/2018] [Accepted: 11/11/2018] [Indexed: 12/16/2022] Open
Abstract
Whole-exome and whole-genome sequencing have revealed millions of somatic mutations associated with different human cancers, and the vast majority of them are located outside of coding sequences, making it challenging to directly interpret their functional effects. With the rapid advances in high-throughput sequencing technologies, genome-scale long-range chromatin interactions were detected, and distal target genes of regulatory elements were determined using three-dimensional (3D) chromatin looping. Herein, we present OncoBase (http://www.oncobase.biols.ac.cn/), an integrated database for annotating 81 385 242 somatic mutations in 68 cancer types from more than 120 cancer projects by exploring their roles in distal interactions between target genes and regulatory elements. OncoBase integrates local chromatin signatures, 3D chromatin interactions in different cell types and reconstruction of enhancer-target networks using state-of-the-art algorithms. It employs informative visualization tools to display the integrated local and 3D chromatin signatures and effects of somatic mutations on regulatory elements. Enhancer-promoter interactions estimated from chromatin interactions are integrated into a network diffusion system that quantitatively prioritizes somatic mutations and target genes from a large pool. Thus, OncoBase is a useful resource for the functional annotation of regulatory noncoding regions and systematically benchmarking the regulatory effects of embedded noncoding somatic mutations in human carcinogenesis.
Collapse
Affiliation(s)
- Xianfeng Li
- Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Laboratory of Molecular Oncology, Peking University Cancer Hospital & Institute, Beijing 100142, China.,Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Leisheng Shi
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yan Wang
- Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Laboratory of Molecular Oncology, Peking University Cancer Hospital & Institute, Beijing 100142, China
| | - Jianing Zhong
- Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases of Ministry of Education, Gannan Medical University, Ganzhou 341000, China
| | - Xiaolu Zhao
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Huajing Teng
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaohui Shi
- Sino-Danish college, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Haonan Yang
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Shasha Ruan
- Department of Clinical Oncology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430072, China
| | - MingKun Li
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Zhong Sheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Qimin Zhan
- Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Laboratory of Molecular Oncology, Peking University Cancer Hospital & Institute, Beijing 100142, China
| | - Fengbiao Mao
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
28
|
Dapas M, Sisk R, Legro RS, Urbanek M, Dunaif A, Hayes MG. Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome. J Clin Endocrinol Metab 2019; 104:3835-3850. [PMID: 31038695 PMCID: PMC6660913 DOI: 10.1210/jc.2018-02496] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 11/19/2018] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to 15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. OBJECTIVE The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. DESIGN, PATIENTS, AND METHODS We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. RESULTS We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. CONCLUSIONS Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Affiliation(s)
- Matthew Dapas
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Ryan Sisk
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Richard S Legro
- Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania
| | - Margrit Urbanek
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Reproductive Science, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Andrea Dunaif
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
| | - M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Anthropology, Northwestern University, Evanston, Illinois
| |
|
29
|
Chen X, Jin J, Wang Q, Xue H, Zhang N, Du Y, Zhang T, Zhang B, Wu J, Liu Z. A de novo pathogenic CSNK1E mutation identified by exome sequencing in family trios with epileptic encephalopathy. Hum Mutat 2018; 40:281-287. [PMID: 30488659 DOI: 10.1002/humu.23690] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/28/2018] [Revised: 11/07/2018] [Accepted: 11/24/2018] [Indexed: 12/29/2022]
Abstract
Recent whole-exome sequencing (WES) studies have demonstrated the contribution of de novo mutations (DNMs) to epileptic encephalopathies (EEs). Here, we performed WES on four trios with West syndrome and identified three loss-of-function DNMs in CSNK1E (c.885+1G>A) and STXBP1 (splicing, c.1111-2A>G; nonsense, p.(Y519X)). The splicing mutation in CSNK1E causes the insertion of 116 new amino acids at position 246, followed by a premature stop codon. Both CSNK1E and STXBP1 showed a closer coexpression relationship with epilepsy candidate genes than expected by chance. In addition, genes coexpressed with CSNK1E were enriched in early prenatal stages across multiple brain regions. We also found that 60 CSNK1E-interacting genes share an association with multiple neuropsychiatric disorders, and these genes formed a significantly interconnected interaction network with roles in midbrain development. Our study supports the potential role of CSNK1E variants in EE susceptibility and expands the phenotypic spectrum associated with CSNK1E variation.
Affiliation(s)
- Xiaomin Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; Center of Scientific Research, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China
| | - Jing Jin
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China
| | - Qiongdan Wang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; Department of Laboratory Medicine, the Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China
| | - Huangqi Xue
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China
| | - Na Zhang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Yaoqiang Du
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; Research Center of Blood Transfusion Medicine, Education Ministry Key Laboratory of Laboratory Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, China
| | - Tao Zhang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Bing Zhang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Jinyu Wu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Zhenwei Liu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| |
|