1. Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024] Open
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
2. Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variations, yet understanding their biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant Single Nucleotide Variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding in further studies and enhancing understanding of the molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
- Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
- Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
- Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
3. Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. bioRxiv 2024:2024.06.25.600715. [PMID: 38979263 PMCID: PMC11230389 DOI: 10.1101/2024.06.25.600715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), an in vitro high-throughput method, can simultaneously test thousands of variants by evaluating the existence of allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, which show differential allelic regulatory effects on gene expression, usually number only in the hundreds, limiting their potential as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants in silico and augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared to the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as one example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and with binding sites of c-Myc and Pol II, which regulate gene expression. Importantly, predicted regulatory variants are found to link to immune-related genes by leveraging chromatin loops and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
Affiliation(s)
- Weijia Jin
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yi Xia
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Sai Ritesh Thela
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
4. Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
5. Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements. Cell Mol Life Sci 2024; 81:274. [PMID: 38902506 PMCID: PMC11335195 DOI: 10.1007/s00018-024-05314-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 12/07/2023] [Accepted: 06/06/2024] [Indexed: 06/22/2024]
Abstract
Discoveries in the field of genomics have revealed that non-coding genomic regions are not merely "junk DNA", but rather comprise critical elements involved in gene expression. These gene regulatory elements (GREs) include enhancers, insulators, silencers, and gene promoters. Notably, new evidence shows how mutations within these regions substantially influence gene expression programs, especially in the context of cancer. Advances in high-throughput sequencing technologies have accelerated the identification of somatic and germline single nucleotide mutations in non-coding genomic regions. This review provides an overview of somatic and germline non-coding single nucleotide alterations affecting transcription factor binding sites in GREs, specifically involved in cancer biology. It also summarizes the technologies available for exploring GREs and the challenges associated with studying and characterizing non-coding single nucleotide mutations. Understanding the role of GRE alterations in cancer is essential for improving diagnostic and prognostic capabilities in the precision medicine era, leading to enhanced patient-centered clinical outcomes.
Affiliation(s)
- Sandra Iñiguez-Muñoz
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Pere Llinàs-Arias
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Miquel Ensenyat-Mendez
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Andrés F Bedoya-López
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Javier I J Orozco
- Saint John's Cancer Institute, Providence Saint John's Health Center, Santa Monica, CA, USA
- Javier Cortés
- International Breast Cancer Center (IBCC), Pangaea Oncology, Quiron Group, 08017, Barcelona, Spain
- Medica Scientia Innovation Research SL (MEDSIR), 08018, Barcelona, Spain
- Faculty of Biomedical and Health Sciences, Department of Medicine, Universidad Europea de Madrid, 28670, Madrid, Spain
- Ananya Roy
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- University of Nottingham Biodiscovery Institute, Nottingham, UK
- Maggie L DiNome
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA
- Diego M Marzese
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain.
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA.
6. Dorans E, Jagadeesh K, Dey K, Price AL. Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance. medRxiv 2024:2024.05.24.24307813. [PMID: 38826240 PMCID: PMC11142273 DOI: 10.1101/2024.05.24.24307813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Methods that analyze single-cell paired RNA-seq and ATAC-seq multiome data have shown great promise in linking regulatory elements to genes. However, existing methods differ in their modeling assumptions and approaches to account for biological and technical noise, leading to low concordance in their linking scores, and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on fine-mapped eQTL data to assign a probabilistic score to each candidate SNP-gene link. We applied pgBoost to single-cell multiome data from 85k cells representing 6 major immune/blood cell types. pgBoost attained higher enrichment for fine-mapped eSNP-eGene pairs (e.g. 21x at distance >10kb) than existing methods (1.2-10x; p-value for difference = 5e-13 vs. distance-based method and < 4e-35 for each other method), with larger improvements at larger distances (e.g. 35x vs. 0.89-6.6x at distance >100kb; p-value for difference < 0.002 vs. each other method). pgBoost also outperformed existing methods in enrichment for CRISPR-validated links (e.g. 4.8x vs. 1.6-4.1x at distance >10kb; p-value for difference = 0.25 vs. distance-based method and < 2e-5 for each other method), with larger improvements at larger distances (e.g. 15x vs. 1.6-2.5x at distance >100kb; p-value for difference < 0.009 for each other method). Similar improvements in enrichment were observed for links derived from Activity-By-Contact (ABC) scores and GWAS data. We further determined that restricting pgBoost to features from a focal cell type improved the identification of SNP-gene links relevant to that cell type. We highlight several examples where pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods.
In conclusion, a non-linear combination of linking strategies, including genomic distance, improves power to identify target genes underlying GWAS associations.
7. Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. Cell Genomics 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Junko Ueda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
- Shota Mizuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Kurara Honda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- An-A Kazuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Hirona Yamamoto
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Tomonori Hara
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Atsushi Takata
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.
8. Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Sihan Liu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Ke Li
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Fengxiao Bu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
- Huijun Yuan
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
9. Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
- Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
10. Pagnamenta AT, Camps C, Giacopuzzi E, Taylor JM, Hashim M, Calpena E, Kaisaki PJ, Hashimoto A, Yu J, Sanders E, Schwessinger R, Hughes JR, Lunter G, Dreau H, Ferla M, Lange L, Kesim Y, Ragoussis V, Vavoulis DV, Allroggen H, Ansorge O, Babbs C, Banka S, Baños-Piñero B, Beeson D, Ben-Ami T, Bennett DL, Bento C, Blair E, Brasch-Andersen C, Bull KR, Cario H, Cilliers D, Conti V, Davies EG, Dhalla F, Dacal BD, Dong Y, Dunford JE, Guerrini R, Harris AL, Hartley J, Hollander G, Javaid K, Kane M, Kelly D, Kelly D, Knight SJL, Kreins AY, Kvikstad EM, Langman CB, Lester T, Lines KE, Lord SR, Lu X, Mansour S, Manzur A, Maroofian R, Marsden B, Mason J, McGowan SJ, Mei D, Mlcochova H, Murakami Y, Németh AH, Okoli S, Ormondroyd E, Ousager LB, Palace J, Patel SY, Pentony MM, Pugh C, Rad A, Ramesh A, Riva SG, Roberts I, Roy N, Salminen O, Schilling KD, Scott C, Sen A, Smith C, Stevenson M, Thakker RV, Twigg SRF, Uhlig HH, van Wijk R, Vona B, Wall S, Wang J, Watkins H, Zak J, Schuh AH, Kini U, Wilkie AOM, Popitsch N, Taylor JC. Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases. Genome Med 2023; 15:94. [PMID: 37946251 PMCID: PMC10636885 DOI: 10.1186/s13073-023-01240-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/14/2022] [Accepted: 09/27/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.
Affiliation(s)
- Alistair T Pagnamenta
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Carme Camps
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Edoardo Giacopuzzi
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Human Technopole, Viale Rita Levi Montalcini 1, 20157, Milan, Italy
- John M Taylor
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
- Mona Hashim
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Eduardo Calpena
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Pamela J Kaisaki
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Akiko Hashimoto
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Jing Yu
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Edward Sanders
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Ron Schwessinger
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Jim R Hughes
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- University Medical Center Groningen, Groningen University, PO Box 72, 9700 AB, Groningen, The Netherlands
- Helene Dreau
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
- Matteo Ferla
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Lukas Lange
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Yesim Kesim
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Vassilis Ragoussis
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Dimitrios V Vavoulis
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
- Holger Allroggen
- Neurosciences Department, UHCW NHS Trust, Clifford Bridge Road, Coventry, CV2 2DX, UK
- Olaf Ansorge
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Christian Babbs
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Siddharth Banka
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Manchester Centre for Genomic Medicine, Saint Mary's Hospital, Oxford Road, Manchester, M13 9WL, UK
- Benito Baños-Piñero
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
- David Beeson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Tal Ben-Ami
- Pediatric Hematology-Oncology Unit, Kaplan Medical Center, Rehovot, Israel
- David L Bennett
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Celeste Bento
- Hematology Department, Hospitais da Universidade de Coimbra, Coimbra, Portugal
- Edward Blair
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
- Charlotte Brasch-Andersen
- Department of Clinical Genetics, Odense University Hospital and Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Katherine R Bull
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, UK
- Holger Cario
- Department of Pediatrics and Adolescent Medicine, University Medical Center, Eythstrasse 24, 89075, Ulm, Germany
- Deirdre Cilliers
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
- Valerio Conti
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
- E Graham Davies
- Department of Immunology, Great Ormond Street Hospital for Children NHS Trust and UCL Great Ormond Street Institute of Child Health, Zayed Centre for Research, 2nd Floor, 20C Guilford Street, London, WC1N 1DZ, UK
- Fatima Dhalla
- Department of Paediatrics, Institute of Developmental and Regenerative Medicine, IMS-Tetsuya Nakamura Building, Old Road Campus, Roosevelt Drive, Oxford, OX3 7TY, UK
- Beatriz Diez Dacal
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
- Yin Dong
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- James E Dunford
- Oxford NIHR Musculoskeletal BRC and Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Nuffield Orthopaedic Centre, Old Road, Oxford, OX3 7HE, UK
- Renzo Guerrini
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
- Adrian L Harris
- Department of Oncology, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
- Jane Hartley
- Liver Unit, Birmingham Women's & Children's Hospital and University of Birmingham, Steelhouse Lane, Birmingham, B4 6NH, UK
- Georg Hollander
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Kassim Javaid
- Oxford NIHR Musculoskeletal BRC and Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Nuffield Orthopaedic Centre, Old Road, Oxford, OX3 7HE, UK
- Maureen Kane
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Pharmacy Hall North, Room 731, 20 N. Pine Street, Baltimore, MD, 21201, USA
- Deirdre Kelly
- Liver Unit, Birmingham Women's & Children's Hospital and University of Birmingham, Steelhouse Lane, Birmingham, B4 6NH, UK
- Dominic Kelly
- Children's Hospital, OUH NHS Foundation Trust, NIHR Oxford BRC, Headley Way, Oxford, OX3 9DU, UK
- Samantha J L Knight
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Alexandra Y Kreins
- Department of Immunology, Great Ormond Street Hospital for Children NHS Trust and UCL Great Ormond Street Institute of Child Health, Zayed Centre for Research, 2nd Floor, 20C Guilford Street, London, WC1N 1DZ, UK
- Erika M Kvikstad
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Craig B Langman
- Feinberg School of Medicine, Northwestern University, 211 E Chicago Avenue, Chicago, IL, MS37, USA
- Tracy Lester
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
- Kate E Lines
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
- Simon R Lord
- Early Phase Clinical Trials Unit, Department of Oncology, University of Oxford, Cancer and Haematology Centre, Level 2 Administration Area, Churchill Hospital, Oxford, OX3 7LJ, UK
- Xin Lu
- Nuffield Department of Clinical Medicine, Ludwig Institute for Cancer Research, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
- Sahar Mansour
- St George's University Hospitals NHS Foundation Trust, Blackshaw Road, Tooting, London, SW17 0QT, UK
- Adnan Manzur
- MRC Centre for Neuromuscular Diseases, National Hospital for Neurology and Neurosurgery, Queen Square, London, WC1N 3BG, UK
- Reza Maroofian
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, WC1N 3BG, UK
- Brian Marsden
- Nuffield Department of Medicine, Kennedy Institute, University of Oxford, Oxford, OX3 7BN, UK
- Joanne Mason
- Yourgene Health Headquarters, Skelton House, Lloyd Street North, Manchester Science Park, Manchester, M15 6SH, UK
- Simon J McGowan
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Davide Mei
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
- Hana Mlcochova
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Yoshiko Murakami
- Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Andrea H Németh
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
- Steven Okoli
- Imperial College NHS Trust, Department of Haematology, Hammersmith Hospital, Du Cane Road, London, W12 0HS, UK
- Elizabeth Ormondroyd
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Level 6 West Wing, Oxford, OX3 9DU, JR, UK
- Lilian Bomme Ousager
- Department of Clinical Genetics, Odense University Hospital and Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Jacqueline Palace
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Smita Y Patel
- Clinical Immunology, John Radcliffe Hospital, Level 4A, Oxford, OX3 9DU, UK
- Melissa M Pentony
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Chris Pugh
- Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, UK
- Aboulfazl Rad
- Department of Otolaryngology-Head & Neck Surgery, Tübingen Hearing Research Centre, Eberhard Karls University, Elfriede-Aulhorn-Str. 5, 72076, Tübingen, Germany
- Archana Ramesh
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Simone G Riva
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Irene Roberts
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Noémi Roy
- Department of Haematology, Oxford University Hospitals NHS Foundation Trust, Level 4, Haematology, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Outi Salminen
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
- Kyleen D Schilling
- Ann & Robert H. Lurie Children's Hospital of Chicago, 225 E Chicago Avenue, Chicago, IL, 60611, USA
- Caroline Scott
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Arjune Sen
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Conrad Smith
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
- Mark Stevenson
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
- Rajesh V Thakker
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
- Stephen R F Twigg
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Holm H Uhlig
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Richard van Wijk
- UMC Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
- Barbara Vona
- Department of Otolaryngology-Head & Neck Surgery, Tübingen Hearing Research Centre, Eberhard Karls University, Elfriede-Aulhorn-Str. 5, 72076, Tübingen, Germany
- Institute of Human Genetics, University Medical Center Göttingen, Heinrich-Düker-Weg 12, 37073, Göttingen, Germany
- Institute for Auditory Neuroscience and InnerEarLab, University Medical Center Göttingen, Robert-Koch-Str. 40, 37075, Göttingen, Germany
- Steven Wall
- Oxford Craniofacial Unit, John Radcliffe Hospital, Level LG1, West Wing, Oxford, OX3 9DU, UK
- Jing Wang
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Hugh Watkins
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Level 6 West Wing, Oxford, OX3 9DU, JR, UK
- Jaroslav Zak
- Nuffield Department of Clinical Medicine, Ludwig Institute for Cancer Research, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
- Department of Immunology and Microbiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Anna H Schuh
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
- Usha Kini
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
- Andrew O M Wilkie
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Niko Popitsch
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Biochemistry and Cell Biology, Max Perutz Labs, University of Vienna, Vienna BioCenter (VBC), Dr.-Bohr-Gasse 9, 1030, Vienna, Austria
- Jenny C Taylor
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
11
Bohn E, Lau TTY, Wagih O, Masud T, Merico D. A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction. Front Mol Biosci 2023; 10:1257550. [PMID: 37745687 PMCID: PMC10517338 DOI: 10.3389/fmolb.2023.1257550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]
Abstract
Introduction: Variants in 5' and 3' untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3' and 5' UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3' and 188 5' UTR variants were obtained from ClinVar, of which 26 3' and 68 5' UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3' and 5' UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3' and 5' UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
Affiliation(s)
- Emma Bohn
- Deep Genomics Inc., Toronto, ON, Canada
- Daniele Merico
- Deep Genomics Inc., Toronto, ON, Canada
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
12
Yang M, Ali O, Bjørås M, Wang J. Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience 2023; 26:107266. [PMID: 37520692 PMCID: PMC10371843 DOI: 10.1016/j.isci.2023.107266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/05/2023] [Accepted: 06/28/2023] [Indexed: 08/01/2023]
Abstract
Millions of single nucleotide variants (SNVs) exist in the human genome; however, it remains challenging to identify functional SNVs associated with diseases. We propose bpb3 (BayesPI-BAR version 3), an analysis tool for non-coding SNVs that aims to identify functional mutation blocks (FMBs) by integrating genome sequencing and transcriptome data. The identified FMBs display high-frequency SNVs and significant changes in transcription factor (TF) binding affinity, and lie near the regulatory regions of differentially expressed genes. A two-level Bayesian approach with a biophysical model for protein-DNA interactions is implemented to compute TF-DNA binding affinity changes based on clustered position weight matrices (PWMs) from over 1700 TF motifs. Epigenetic data, such as the DNA methylome, can also be integrated to scan for FMBs. Tested on datasets from follicular lymphoma and melanoma, bpb3 automatically and robustly identifies FMBs, demonstrating that it can provide insight into patho-mechanisms and therapeutic targets from transcriptomic and genomic data.
Affiliation(s)
- Mingyi Yang
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway
- Omer Ali
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Faculty of Medicine, University of Oslo, Oslo, Norway
- Magnar Bjørås
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Junbai Wang
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway
13
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
14
Butera A, Amelio I. Healthy lifestyle? or just the right genetic mutations. Cell Cycle 2023; 22:1353-1356. [PMID: 37128635 PMCID: PMC10228415 DOI: 10.1080/15384101.2023.2206351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/25/2023] [Accepted: 01/25/2023] [Indexed: 05/03/2023]
Abstract
The development of genomic technologies over the past decades has enabled the identification of genetic variants responsible for disease; occasionally, however, protective rare variants have emerged. Verweij et al. have recently reported genetic variants in the CIDEB gene that are protective against liver injury. Here, we briefly summarise the recent findings on the impact of CIDEB variants on liver disease, while emphasizing how phenotype-genotype studies tailored for the identification of "protective" mutations might direct the development of prevention and therapeutic strategies for common diseases.
Affiliation(s)
- Alessio Butera
- Chair for Systems Toxicology, University of Konstanz, Konstanz, Germany
- Ivano Amelio
- Chair for Systems Toxicology, University of Konstanz, Konstanz, Germany
15
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. Genomics Proteomics Bioinformatics 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033, and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of the 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
|
16
|
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. Genomics Proteomics Bioinformatics 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases by affecting gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with the massive datasets, and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Yu Wang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Dong Huang
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
| | - Yu Liang
- Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
| | - Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Xiao-Wei Chen
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China.
|
17
|
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
| | - Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
| | - Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | | | - Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Italiano di Tecnologia—IIT, Genova, Italy
| | - Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
| | - Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
| | - Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
| | - Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
| | - Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
| | | | - Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
|
18
|
Zieger HK, Weinhold L, Schmidt A, Holtgrewe M, Juranek SA, Siewert A, Scheer AB, Thieme F, Mangold E, Ishorst N, Brand FU, Welzenbach J, Beule D, Paeschke K, Krawitz PM, Ludwig KU. Prioritization of non-coding elements involved in non-syndromic cleft lip with/without cleft palate through genome-wide analysis of de novo mutations. HGG Adv 2023; 4:100166. [PMID: 36589413 PMCID: PMC9795529 DOI: 10.1016/j.xhgg.2022.100166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open
Abstract
Non-syndromic cleft lip with/without cleft palate (nsCL/P) is a highly heritable facial disorder. To date, systematic investigations of the contribution of rare variants in non-coding regions to nsCL/P etiology are sparse. Here, we re-analyzed available whole-genome sequence (WGS) data from 211 European case-parent trios with nsCL/P and identified 13,522 de novo mutations (DNMs) in nsCL/P cases, 13,055 of which mapped to non-coding regions. We integrated these data with DNMs from a reference cohort, with results of previous genome-wide association studies (GWASs), and functional and epigenetic datasets of relevance to embryonic facial development. A significant enrichment of nsCL/P DNMs was observed at two GWAS risk loci (4q28.1 (p = 8 × 10⁻⁴) and 2p21 (p = 0.02)), suggesting a convergence of both common and rare variants at these loci. We also mapped the DNMs to 810 position weight matrices indicative of transcription factor (TF) binding, and quantified the effect of the allelic changes in silico. This revealed a nominally significant overrepresentation of DNMs (p = 0.037), and a stronger effect on binding strength, for DNMs located in the sequence of the core binding region of the TF Musculin (MSC). Notably, MSC is involved in facial muscle development, together with a set of nsCL/P genes located at GWAS loci. Supported by additional results from single-cell transcriptomic data and molecular binding assays, this suggests that variation in MSC binding sites contributes to nsCL/P etiology. Our study describes a set of approaches that can be applied to increase the added value of WGS data.
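The abstract mentions scoring allelic changes against position weight matrices (PWMs) in silico. One common way to do this, sketched below in Python, is to compare the log-odds score of the reference and alternate sequences under the same matrix. This is an illustrative assumption about the general technique, not the authors' pipeline: the matrix `PWM`, the uniform `BACKGROUND`, and the helper names `log_odds` and `allelic_effect` are all hypothetical, and the matrix is not the MSC motif.

```python
import math

# Hypothetical 4-position PWM: per-position base probabilities (not a real motif).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(seq, pwm=PWM, bg=BACKGROUND):
    """Sum of per-position log2(probability / background) for a sequence."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(math.log2(col[base] / bg) for col, base in zip(pwm, seq))


def allelic_effect(ref_seq, offset, alt_base):
    """Score change (alt minus ref) when one base in the site is substituted."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return log_odds(alt_seq) - log_odds(ref_seq)


# Substituting the preferred A at position 0 with C lowers the predicted binding score.
print(allelic_effect("ACGT", 0, "C"))
```

A negative value indicates that the alternate allele matches the motif less well than the reference allele, and a positive value indicates a better match.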
Affiliation(s)
- Hanna K. Zieger
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Leonie Weinhold
- Institute for Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn 53127, Germany
| | - Axel Schmidt
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Manuel Holtgrewe
- Core Unit Bioinformatics, Berlin Institute of Health, Berlin 10117, Germany
| | - Stefan A. Juranek
- Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn 53127, Germany
| | - Anna Siewert
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Annika B. Scheer
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Frederic Thieme
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Elisabeth Mangold
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Nina Ishorst
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Fabian U. Brand
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn 53127, Germany
| | - Julia Welzenbach
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Dieter Beule
- Core Unit Bioinformatics, Berlin Institute of Health, Berlin 10117, Germany
- Max Delbrück Center for Molecular Medicine, Berlin 13125, Germany
| | - Katrin Paeschke
- Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn 53127, Germany
| | - Peter M. Krawitz
- Institute for Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn 53127, Germany
| | - Kerstin U. Ludwig
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
|
19
|
Papageorgiou L, Kalospyrou E, Papakonstantinou E, Diakou I, Pierouli K, Dragoumani K, Bacopoulou F, Chrousos GP, Exarchos TP, Vlamos P, Eliopoulos E, Vlachakis D. DRDs and Brain-Derived Neurotrophic Factor Share a Common Therapeutic Ground: A Novel Bioinformatic Approach Sheds New Light Toward Pharmacological Treatment of Cognitive and Behavioral Disorders. Adv Exp Med Biol 2023; 1424:97-115. [PMID: 37486484 DOI: 10.1007/978-3-031-31982-2_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2023]
Abstract
Cognitive and behavioral disorders are subgroups of mental health disorders. Both cognitive and behavioral disorders can occur in people of different ages, genders, and social backgrounds, and they can cause serious physical, mental, or social problems. The risk factors for these diseases are numerous, ranging from genetic and epigenetic factors to physical factors. In most cases, the appearance of such a disorder in an individual results from a combination of their genetic profile and environmental stimuli. To date, researchers have not been able to identify the specific causes of these disorders, and as such, there is an urgent need for innovative study approaches. The aim of the present study was to identify the genetic factors which seem to be more directly responsible for the occurrence of a cognitive and/or behavioral disorder. More specifically, through bioinformatics tools and software as well as analytical methods such as systemic data and text mining, semantic analysis, and scoring functions, we extracted the most relevant single nucleotide polymorphisms (SNPs) and genes connected to these disorders. All the extracted SNPs were filtered, annotated, classified, and evaluated in order to create the "genomic grammar" of these diseases. The identified SNPs guided the search for top suspected genetic factors, dopamine receptors D and neurotrophic factor BDNF, for which regulatory networks were built. The identification of the "genomic grammar" and underlying factors connected to cognitive and behavioral disorders can aid in successful disease profiling and the establishment of novel pharmacological targets and provide the basis for personalized medicine, which takes into account the patient's genetic background as well as epigenetic factors.
Affiliation(s)
- Louis Papageorgiou
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Efstathia Kalospyrou
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Eleni Papakonstantinou
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Io Diakou
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Katerina Pierouli
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Konstantina Dragoumani
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Flora Bacopoulou
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
| | - George P Chrousos
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
| | - Themis P Exarchos
- Department of Informatics, Bioinformatics & Human Electrophysiology Laboratory, Ionian University, Corfu, Greece
| | - Panagiotis Vlamos
- Department of Informatics, Bioinformatics & Human Electrophysiology Laboratory, Ionian University, Corfu, Greece
| | - Elias Eliopoulos
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Dimitrios Vlachakis
- Department of Biotechnology, Laboratory of Genetics, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece.
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece.
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece.
|
20
|
Schubach M, Nazaretyan L, Kircher M. The Regulatory Mendelian Mutation score for GRCh38. Gigascience 2022; 12:giad024. [PMID: 37083939 PMCID: PMC10120424 DOI: 10.1093/gigascience/giad024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Revised: 01/10/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023] Open
Abstract
BACKGROUND Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow. RESULTS Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. CONCLUSIONS Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and the website; UCSC genome browser tracks, and an API are available at https://remm.bihealth.org.
Affiliation(s)
- Max Schubach
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
| | - Lusiné Nazaretyan
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
| | - Martin Kircher
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
|
21
|
Morova T, Ding Y, Huang CCF, Sar F, Schwarz T, Giambartolomei C, Baca S, Grishin D, Hach F, Gusev A, Freedman M, Pasaniuc B, Lack N. Optimized high-throughput screening of non-coding variants identified from genome-wide association studies. Nucleic Acids Res 2022; 51:e18. [PMID: 36546757 PMCID: PMC9943666 DOI: 10.1093/nar/gkac1198] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/26/2022] [Revised: 11/19/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
The vast majority of disease-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants affect transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked to orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
Affiliation(s)
- Tunc Morova
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | | | - Funda Sar
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Claudia Giambartolomei
- Central RNA Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sylvan C Baca
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Dennis Grishin
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Science, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
| | - Alexander Gusev
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Matthew L Freedman
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, Boston, MA 02215, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nathan A Lack
- To whom correspondence should be addressed. Tel: +1 604 875 4411;
|
22
|
Ryan N, Ormond C, Chang YC, Contreras J, Raventos H, Gill M, Heron E, Mathews CA, Corvin A. Identity-by-descent analysis of a large Tourette's syndrome pedigree from Costa Rica implicates genes involved in neuronal development and signal transduction. Mol Psychiatry 2022; 27:5020-5027. [PMID: 36224258 PMCID: PMC9763103 DOI: 10.1038/s41380-022-01771-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Revised: 05/13/2022] [Accepted: 08/30/2022] [Indexed: 01/14/2023]
Abstract
Tourette Syndrome (TS) is a heritable, early-onset neuropsychiatric disorder that typically begins in early childhood. Identifying rare genetic variants that make a significant contribution to risk in affected families may provide important insights into the molecular aetiology of this complex and heterogeneous syndrome. Here we present a whole-genome sequencing (WGS) analysis from the 11-generation pedigree (>500 individuals) of a densely affected Costa Rican family which shares ancestry from six founder pairs. By conducting an identity-by-descent (IBD) analysis using WGS data from 19 individuals from the extended pedigree we have identified putative risk haplotypes that were not seen in controls, and can be linked with four of the six founder pairs. Rare coding and non-coding variants present on the haplotypes and only seen in haplotype carriers show an enrichment in pathways such as regulation of locomotion and signal transduction, suggesting common mechanisms by which the haplotype-specific variants may be contributing to TS-risk in this pedigree. In particular we have identified a rare deleterious missense variation in RAPGEF1 on a chromosome 9 haplotype and two ultra-rare deleterious intronic variants in ERBB4 and IKZF2 on the same chromosome 2 haplotype. All three genes play a role in neurodevelopment. This study, using WGS data in a pedigree-based approach, shows the importance of investigating both coding and non-coding variants to identify genes that may contribute to disease risk. Together, the genes and variants identified on the IBD haplotypes represent biologically relevant targets for investigation in other pedigree and population-based TS data.
Affiliation(s)
- Niamh Ryan
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Cathal Ormond
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Yi-Chieh Chang
- Department of Psychiatry, Center for OCD, Anxiety, and Related Disorders, University of Florida, Gainesville, FL, USA
| | - Javier Contreras
- Centro de Investigación en Biología Celular y Molecular, Universidad de Costa Rica, San José, Costa Rica
| | - Henriette Raventos
- Centro de Investigación en Biología Celular y Molecular, Universidad de Costa Rica, San José, Costa Rica
- School of Biology, Universidad de Costa Rica, San José, Costa Rica
| | - Michael Gill
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Elizabeth Heron
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Carol A Mathews
- Department of Psychiatry, Center for OCD, Anxiety, and Related Disorders, University of Florida, Gainesville, FL, USA.
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA.
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland.
|
23
|
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
| | - Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
| | - Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
| | - Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
| | - Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
|
24
|
Van de Sompele S, Small KW, Cicekdal MB, Soriano VL, D'haene E, Shaya FS, Agemy S, Van der Snickt T, Rey AD, Rosseel T, Van Heetvelde M, Vergult S, Balikova I, Bergen AA, Boon CJF, De Zaeytijd J, Inglehearn CF, Kousal B, Leroy BP, Rivolta C, Vaclavik V, van den Ende J, van Schooneveld MJ, Gómez-Skarmeta JL, Tena JJ, Martinez-Morales JR, Liskova P, Vleminckx K, De Baere E. Multi-omics approach dissects cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy. Am J Hum Genet 2022; 109:2029-2048. [PMID: 36243009 PMCID: PMC9674966 DOI: 10.1016/j.ajhg.2022.09.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/26/2022] [Accepted: 09/28/2022] [Indexed: 01/26/2023]
Abstract
North Carolina macular dystrophy (NCMD) is a rare autosomal-dominant disease affecting macular development. The disease is caused by non-coding single-nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 promoters and the identification of eighteen candidate cis-regulatory elements (cCREs), the activity of which was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two hotspot regions of PRDM13 affect cCRE activity, including two NCMD-associated non-coding SNVs that we identified herein. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, suggesting that this region is a PRDM13 enhancer. Finally, mining of single-cell transcriptional data of embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study provides insight into the cis-regulatory mechanisms of NCMD and supports that this condition is a retinal enhanceropathy.
Affiliation(s)
- Stijn Van de Sompele
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Kent W Small
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
| | - Munevver Burcu Cicekdal
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
| | - Víctor López Soriano
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Eva D'haene
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Fadi S Shaya
- Macula and Retina Institute, Los Angeles and Glendale, California, USA
| | - Steven Agemy
- Department of Ophthalmology, SUNY Downstate Medical Center University, Brooklyn, New York, USA
| | - Thijs Van der Snickt
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Alfredo Dueñas Rey
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Toon Rosseel
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Mattias Van Heetvelde
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Sarah Vergult
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Irina Balikova
- Department of Ophthalmology, University Hospitals Leuven, Leuven, Belgium
| | - Arthur A Bergen
- Department of Human Genetics, Amsterdam UMC, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands; Queen Emma Centre of Precision Medicine, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
| | - Camiel J F Boon
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Department of Ophthalmology, Leiden University Medical Center, Leiden, The Netherlands
| | - Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
| | - Chris F Inglehearn
- Division of Molecular Medicine, Leeds Institute of Medical Research, University of Leeds, Leeds, UK
| | - Bohdan Kousal
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
| | - Bart P Leroy
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium; Department of Head & Skin, Ghent University, Ghent, Belgium; Division of Ophthalmology & Center for Cellular & Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Carlo Rivolta
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Veronika Vaclavik
- University of Lausanne, Jules-Gonin Eye Hospital, Lausanne, Switzerland
| | | | - Mary J van Schooneveld
- Department of Ophthalmology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands; Bartiméus, Diagnostic Center for Complex Visual Disorders, Zeist, The Netherlands
| | - José Luis Gómez-Skarmeta
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
| | - Juan J Tena
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
| | - Juan R Martinez-Morales
- Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide, Sevilla, Spain
| | - Petra Liskova
- Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic; Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
| | - Kris Vleminckx
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
| | - Elfride De Baere
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.
|
25
|
Bykova M, Hou Y, Eng C, Cheng F. Quantitative trait locus (xQTL) approaches identify risk genes and drug targets from human non-coding genomes. Hum Mol Genet 2022; 31:R105-R113. [PMID: 36018824 PMCID: PMC9989738 DOI: 10.1093/hmg/ddac208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/18/2022] [Accepted: 08/19/2022] [Indexed: 11/13/2022]
Abstract
Advances and reduction of costs in various sequencing technologies allow for a closer look at variations present in the non-coding regions of the human genome. Correlating non-coding variants with large-scale multi-omic data holds the promise not only of a better understanding of likely causal connections between non-coding DNA and expression of traits but also identifying potential disease-modifying medicines. Genome-phenome association studies have created large datasets of DNA variants that are associated with multiple traits or diseases, such as Alzheimer's disease; yet, the functional consequences of variants, in particular of non-coding variants, remain largely unknown. Recent advances in functional genomics and computational approaches have led to the identification of potential roles of DNA variants, such as various quantitative trait locus (xQTL) techniques. Multi-omic assays and analytic approaches toward xQTL have identified links between genetic loci and human transcriptomic, epigenomic, proteomic and metabolomic data. In this review, we first discuss the recent development of xQTL from multi-omic findings. We then highlight multimodal analysis of xQTL and genetic data for identification of risk genes and drug targets using Alzheimer's disease as an example. We finally discuss challenges and future research directions (e.g. artificial intelligence) for annotation of non-coding variants in complex diseases.
Affiliation(s)
- Marina Bykova
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Yuan Hou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Charis Eng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
|
26
|
Vitsios D, Dhindsa RS, Matelska D, Mitchell J, Zou X, Armenia J, Hu F, Wang Q, Sidders B, Harper AR, Petrovski S. Cancer-driving mutations are enriched in genic regions intolerant to germline variation. Sci Adv 2022; 8:eabo6371. [PMID: 36026442 PMCID: PMC9417173 DOI: 10.1126/sciadv.abo6371] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/02/2023]
Abstract
Large reference datasets of protein-coding variation in human populations have allowed us to determine which genes and genic subregions are intolerant to germline genetic variation. There is also a growing number of genes implicated in severe Mendelian diseases that overlap with genes implicated in cancer. We hypothesized that cancer-driving mutations might be enriched in genic subregions that are depleted of germline variation relative to somatic variation. We introduce a new metric, OncMTR (oncology missense tolerance ratio), which uses 125,748 exomes in the Genome Aggregation Database (gnomAD) to identify these genic subregions. We demonstrate that OncMTR can significantly predict driver mutations implicated in hematologic malignancies. Divergent OncMTR regions were enriched for cancer-relevant protein domains, and overlaying OncMTR scores on protein structures identified functionally important protein residues. Last, we performed a rare-variant, gene-based collapsing analysis on an independent set of 394,694 exomes from the UK Biobank and found that OncMTR markedly improves genetic signals for hematologic malignancies.
Affiliation(s)
- Dimitrios Vitsios
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Corresponding author. (D.V.), (R.S.D.), (S.P.)
| | - Ryan S. Dhindsa
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, USA
- Corresponding author. (D.V.), (R.S.D.), (S.P.)
| | - Dorota Matelska
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Jonathan Mitchell
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Xuequing Zou
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Joshua Armenia
- Bioinformatics and Data Science, Research, and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
| | - Fengyuan Hu
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Quanli Wang
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA, USA
| | - Ben Sidders
- Bioinformatics and Data Science, Research, and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
| | - Andrew R. Harper
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Department of Medicine, University of Melbourne, Austin Health, Melbourne, Victoria, Australia
- Corresponding author. (D.V.), (R.S.D.), (S.P.)
|
27
|
Zeng L, Liu Y, Yu ZG, Liu Y. iEnhancer-DLRA: identification of enhancers and their strengths by a self-attention fusion strategy for local and global features. Brief Funct Genomics 2022; 21:399-407. [PMID: 35942693 DOI: 10.1093/bfgp/elac023] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/07/2022] [Revised: 06/30/2022] [Accepted: 07/12/2022] [Indexed: 11/14/2022]
Abstract
Identification and classification of enhancers are highly significant because they play crucial roles in controlling gene transcription. Recently, several deep learning-based methods for identifying enhancers and their strengths have been developed. However, existing methods are usually limited because they use only local or only global features. The combination of local and global features is critical to further improve the prediction performance. In this work, we propose a novel deep learning-based method, called iEnhancer-DLRA, to identify enhancers and their strengths. iEnhancer-DLRA extracts local and multi-scale global features of sequences by using a residual convolutional network and two bidirectional long short-term memory networks. Then, a self-attention fusion strategy is proposed to deeply integrate these local and global features. The experimental results on the independent test dataset indicate that iEnhancer-DLRA performs better than nine existing state-of-the-art methods in both identification and classification of enhancers in almost all metrics. iEnhancer-DLRA achieves 13.8% (for identifying enhancers) and 12.6% (for classifying strengths) improvement in accuracy compared with the best existing state-of-the-art method. This is the first time that the accuracy of an enhancer identifier exceeds 0.9 and the accuracy of the enhancer classifier exceeds 0.8 on the independent test set. Moreover, iEnhancer-DLRA achieves superior predictive performance on the rice dataset compared with the state-of-the-art method RiceENN.
Affiliation(s)
- Li Zeng
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Yang Liu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
|
28
|
Ahsan F, Yan Z, Precup D, Blanchette M. PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information. Bioinformatics 2022; 38:i299-i306. [PMID: 35758792 PMCID: PMC9235490 DOI: 10.1093/bioinformatics/btac259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION The computational prediction of regulatory function associated with a genomic sequence is of great importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. RESULTS In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results. AVAILABILITY AND IMPLEMENTATION The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Faizy Ahsan
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
| | - Zichao Yan
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
| | - Doina Precup
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
|
29
|
Zhang YY, Zhang WY, Xin XH, Du PF. dbEssLnc: A manually curated database of human and mouse essential lncRNA genes. Comput Struct Biotechnol J 2022; 20:2657-2663. [PMID: 35685362 PMCID: PMC9162909 DOI: 10.1016/j.csbj.2022.05.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 02/07/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles in many biological processes. Knocking out or knocking down some lncRNAs leads to lethality or infertility. These lncRNAs are called essential lncRNAs. Knowledge of essential lncRNAs is important for establishing minimal genomes of living cells and for developing drug therapies and early diagnostic approaches for complex diseases. However, existing databases focus on collecting essential coding genes, and records of essential non-coding genes are rare. A comprehensive collection of essential non-coding genes, particularly essential lncRNA genes, is needed. We manually curated 207 essential lncRNAs from the literature to establish a database of essential lncRNAs, named dbEssLnc (Database of essential lncRNAs). The dbEssLnc database has a user-friendly web interface for browsing, searching, visualizing and BLAST-searching records in the database. The dbEssLnc database is freely accessible at https://esslnc.pufengdu.org. All data and source code for mirroring the dbEssLnc database have been deposited in GitHub (https://github.com/yyZhang14/dbEssLnc).
Affiliation(s)
- Ying-Ying Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Wen-Ya Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Xiao-Hong Xin
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
30
|
Giovannetti A, Bianco SD, Traversa A, Panzironi N, Bruselles A, Lazzari S, Liorni N, Tartaglia M, Carella M, Pizzuti A, Mazza T, Caputo V. MiRLog and dbmiR: prioritization and functional annotation tools to study human microRNA sequence variants. Hum Mutat 2022; 43:1201-1215. [PMID: 35583122 PMCID: PMC9546175 DOI: 10.1002/humu.24399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 05/03/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022]
Abstract
The recent identification of noncoding variants with pathogenic effects suggests that these variations could underlie a significant number of undiagnosed cases. Several computational methods have been developed to predict the functional impact of noncoding variants, but they exhibit only partial concordance and are not integrated with functional annotation resources, making the interpretation of these variants still challenging. MicroRNAs (miRNAs) are small noncoding RNA molecules that act as fine regulators of gene expression and play crucial functions in several biological processes, such as cell proliferation and differentiation. An increasing number of studies demonstrate a significant impact of miRNA single nucleotide variants (SNVs) both in Mendelian diseases and complex traits. To predict the functional effect of miRNA SNVs, we implemented a new meta‐predictor, MiRLog, and we integrated it into a comprehensive database, dbmiR, which includes a precompiled list of all possible miRNA allelic SNVs, providing their biological annotations at nucleotide and miRNA levels. MiRLog and dbmiR were used to explore the genetic variability of miRNAs in 15,708 human genomes included in the gnomAD project, finding several ultra‐rare SNVs with a potentially deleterious effect on miRNA biogenesis and function representing putative contributors to human phenotypes.
Affiliation(s)
- Agnese Giovannetti
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Salvatore Daniele Bianco
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy; Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Alice Traversa
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Noemi Panzironi
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Alessandro Bruselles
- Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
| | - Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Niccolò Liorni
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy; Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Marco Tartaglia
- Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome, Italy
| | - Massimo Carella
- Medical Genetics Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Antonio Pizzuti
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Tommaso Mazza
- Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
|
31
|
Poszewiecka B, Pienkowski VM, Nowosad K, Robin JD, Gogolewski K, Gambin A. TADeus2: a web server facilitating the clinical diagnosis by pathogenicity assessment of structural variations disarranging 3D chromatin structure. Nucleic Acids Res 2022; 50:W744-W752. [PMID: 35524567 PMCID: PMC9252839 DOI: 10.1093/nar/gkac318] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/12/2022] [Revised: 04/12/2022] [Accepted: 04/21/2022] [Indexed: 01/01/2023]
Abstract
In recent years, great progress has been made in the identification of structural variants (SVs) in the human genome. However, the interpretation of SVs, especially those located in non-coding DNA, remains challenging. One reason is the lack of tools designed specifically for clinical SV evaluation that take the 3D chromatin architecture into account. Therefore, we present TADeus2, a web server dedicated to the quick investigation of chromatin conformation changes, providing a visual framework for the interpretation of SVs affecting topologically associating domains (TADs). This tool provides a convenient visual inspection of SVs, both in a continuous genome view and from a rearrangement's breakpoint perspective. Additionally, TADeus2 allows the user to assess the influence of analyzed SVs within flanking coding/non-coding regions based on the Hi-C matrix. Importantly, SV pathogenicity is quantified and ranked using the TADA and ClassifyCNV tools and a sampling-based P-value. TADeus2 is publicly available at https://tadeus2.mimuw.edu.pl.
Affiliation(s)
- Barbara Poszewiecka
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 2 Banacha street, 02-097 Warsaw, Poland
| | - Victor Murcia Pienkowski
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France; Department of Medical Genetics, Medical University of Warsaw, Adolfa Pawińskiego 3c, 02-106 Warsaw, Poland
| | - Karol Nowosad
- Department of Cell Biology, Erasmus Medical Center, Doctor Molewaterplein 40, 3015 GD Rotterdam, Netherlands; Department of Biomedical Sciences, Laboratory of Molecular Genetics, Medical University of Lublin, Doktora Witolda Chodźki 1, 20-400 Lublin, Poland; The Postgraduate School of Molecular Medicine, Medical University of Warsaw, Żwirki i Wigury 61, 02-091 Warsaw, Poland
| | - Jérôme D Robin
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
| | - Krzysztof Gogolewski
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 2 Banacha street, 02-097 Warsaw, Poland
| | - Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 2 Banacha street, 02-097 Warsaw, Poland
|
32
|
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION Though genome-wide association studies have identified tens of thousands of variants associated with complex traits, most of which fall within non-coding regions, these variants may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated non-coding functional variants. However, these validated variants are rare due to technical difficulty and financial cost. The small sample size of validated variants makes it less reliable to develop a supervised machine learning model for genome-wide prediction of non-coding causal variants. RESULTS We exploit a deep transfer learning model, based on a convolutional neural network, to improve the prediction of functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs to improve the learning of low-level features and context-specific functional NCVs to learn high-level features toward the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods in both MPRA and GWAS datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/TLVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Chen
  - To whom correspondence should be addressed.
- Fengdi Zhao
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
33
Giacopuzzi E, Popitsch N, Taylor JC. GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data. Nucleic Acids Res 2022; 50:2522-2535. [PMID: 35234913 PMCID: PMC8934622 DOI: 10.1093/nar/gkac130] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/26/2021] [Revised: 02/02/2022] [Accepted: 02/14/2022] [Indexed: 11/25/2022]
Abstract
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
Affiliation(s)
- Edoardo Giacopuzzi
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford OX4 2PG, UK
- Niko Popitsch
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
  - Max Perutz Labs, University of Vienna, Dr. Bohr-Gasse 9, 1030 Vienna, Austria
- Jenny C Taylor
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford OX4 2PG, UK
34
Mustafi D, Hisama FM, Huey J, Chao JR. The current state of genetic testing platforms for inherited retinal diseases. Ophthalmol Retina 2022; 6:702-710. [PMID: 35307606 PMCID: PMC9356993 DOI: 10.1016/j.oret.2022.03.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Revised: 03/08/2022] [Accepted: 03/14/2022] [Indexed: 11/30/2022]
Abstract
PURPOSE To evaluate genetic testing platforms used to aid in the diagnosis of inherited retinal degenerations (IRDs). DESIGN Evaluation of diagnostic test or technology. SUBJECTS Targeted genetic panel testing for IRDs. METHODS, INTERVENTION, OR TESTING Data collected regarding targeted genetic panel testing for IRDs offered by different labs were investigated for inclusion of coding and non-coding variants in disease genes. Both large IRD panels and smaller, more focused disease-specific panels were included in the analysis. MAIN OUTCOME MEASURES Number of disease genes tested as well as the commonality and uniqueness across testing platforms in both coding and non-coding variants of disease. RESULTS Across the three IRD panel tests investigated, 409 unique genes are represented, of which 269 genes are tested by all three panels. The top 20 genes known to cause over 70% of all IRDs are represented in the 269 common genes tested by all three panels. In addition, 138 non-coding variants are assayed across the three platforms in 50 unique genes. Focused disease-specific panels exhibited significant variability across the 5 testing platforms that were studied. CONCLUSIONS Ordering genetic testing for IRDs is not straightforward, as evidenced by the multitude of panels available to providers. It is important that there is coverage of both coding and non-coding regions in IRD genes to offer a diagnosis in these patients. This paper details the diversity of testing platforms currently available to clinicians and provides a thorough explanation of genes tested in the different IRD panels. In a time of increased importance for clinical genetic testing of IRD patients, knowledge of the proper test to order is paramount.
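The "unique genes" and "genes tested by all panels" figures in the abstract above correspond to a set union and a set intersection over the panels' gene lists. The following is a minimal sketch of that comparison, not the authors' analysis: the panel names and gene lists are hypothetical toy data, and `panel_overlap` is an illustrative helper.

```python
from functools import reduce


def panel_overlap(panels):
    """Return (genes on any panel, genes on every panel) for a dict of gene sets."""
    gene_sets = [set(genes) for genes in panels.values()]
    union = set().union(*gene_sets)
    # With no panels at all there is nothing in common, so return an empty set.
    common = reduce(set.intersection, gene_sets) if gene_sets else set()
    return union, common


# Hypothetical toy panels; these gene lists are illustrative only
# and do not reproduce any commercial panel.
panels = {
    "panel_A": {"ABCA4", "USH2A", "RPGR", "CEP290"},
    "panel_B": {"ABCA4", "USH2A", "RPGR", "RHO"},
    "panel_C": {"ABCA4", "USH2A", "EYS"},
}

union, common = panel_overlap(panels)
print(len(union), sorted(common))  # prints: 6 ['ABCA4', 'USH2A']
```

Genes unique to one panel can be obtained the same way by subtracting the union of the other panels from that panel's set.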
Affiliation(s)
- Debarshi Mustafi
  - Department of Ophthalmology, University of Washington, Seattle, Washington
  - Department of Ophthalmology, Seattle Children's Hospital, Seattle, Washington
- Fuki M Hisama
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Jennifer Huey
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington
- Jennifer R Chao
  - Department of Ophthalmology, University of Washington, Seattle, Washington
35
Yousefi S, Deng R, Lanko K, Salsench EM, Nikoncuk A, van der Linde HC, Perenthaler E, van Ham TJ, Mulugeta E, Barakat TS. Comprehensive multi-omics integration identifies differentially active enhancers during human brain development with clinical relevance. Genome Med 2021; 13:162. [PMID: 34663447 PMCID: PMC8524963 DOI: 10.1186/s13073-021-00980-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/28/2021] [Accepted: 09/29/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Non-coding regulatory elements (NCREs), such as enhancers, play a crucial role in gene regulation, and genetic aberrations in NCREs can lead to human disease, including brain disorders. The human brain is a complex organ that is susceptible to numerous disorders; many of these are caused by genetic changes, but a multitude remain currently unexplained. Understanding NCREs acting during brain development has the potential to shed light on previously unrecognized genetic causes of human brain disease. Despite immense community-wide efforts to understand the role of the non-coding genome and NCREs, annotating functional NCREs remains challenging. METHODS Here we performed an integrative computational analysis of virtually all currently available epigenome data sets related to human fetal brain. RESULTS Our in-depth analysis unravels 39,709 differentially active enhancers (DAEs) that show dynamic epigenomic rearrangement during early stages of human brain development, indicating likely biological function. Many of these DAEs are linked to clinically relevant genes, and functional validation of selected DAEs in cell models and zebrafish confirms their role in gene regulation. Compared to enhancers without dynamic epigenomic rearrangement, DAEs are subjected to higher sequence constraints in humans, have distinct sequence characteristics and are bound by a distinct transcription factor landscape. DAEs are enriched for GWAS loci for brain-related traits and for genetic variation found in individuals with neurodevelopmental disorders, including autism. CONCLUSION This compendium of high-confidence enhancers will assist in deciphering the mechanism behind developmental genetics of human brain and will be relevant to uncover missing heritability in human genetic brain disorders.
Affiliation(s)
- Soheil Yousefi
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Ruizhi Deng
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Kristina Lanko
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Eva Medico Salsench
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Anita Nikoncuk
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Herma C. van der Linde
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Elena Perenthaler
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Tjakko J. van Ham
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Eskeatnaf Mulugeta
  - Department of Cell Biology, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Tahsin Stefan Barakat
  - Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
36
Li G, Panday SK, Peng Y, Alexov E. SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions. Bioinformatics 2021; 37:3760-3765. [PMID: 34343273 DOI: 10.1093/bioinformatics/btab567] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2021] [Revised: 06/28/2021] [Accepted: 07/31/2021] [Indexed: 12/25/2022]
Abstract
MOTIVATION Mutations that alter protein-DNA interactions may be pathogenic and cause diseases. Therefore, it is extremely important to quantify the effect of mutations on protein-DNA binding free energy to reveal the molecular origin of diseases and to assist the development of treatments. Although several methods have been developed to predict the change in protein-DNA binding affinity upon mutations in the binding protein, the effect of DNA mutations has not yet been considered. RESULTS Here, we report a new version of SAMPDI, SAMPDI-3D, a gradient boosting decision tree machine learning method that predicts the change in protein-DNA binding free energy caused by mutations in both the binding protein and the bases of the corresponding DNA. The method achieves Pearson correlation coefficients of 0.76 and 0.80 in a benchmarking test against experimentally determined changes in binding free energy caused by mutations in the binding protein and in the DNA, respectively. Furthermore, three datasets collected from the literature were used as a blind benchmark for SAMPDI-3D, which is shown to outperform all existing state-of-the-art methods. The method is very fast, allowing for genome-scale investigations. AVAILABILITY It is available as a web server and as stand-alone code at http://compbio.clemson.edu/SAMPDI-3D/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
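The benchmark figures in the abstract above are Pearson correlation coefficients between predicted and experimentally measured changes in binding free energy. The following is a minimal sketch of how such a coefficient is computed. It is not SAMPDI-3D code: `pearson_r` is an illustrative helper, and the ΔΔG values are made-up toy numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (kcal/mol); not taken from any benchmark set.
experimental_ddg = [0.5, 1.2, -0.3, 2.0]
predicted_ddg = [0.4, 1.0, 0.1, 1.7]
print(round(pearson_r(experimental_ddg, predicted_ddg), 2))
```

A value near 1 means the predictions rank and scale mutations much as the experiments do; it says nothing about a constant offset between the two.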
Affiliation(s)
- Gen Li
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Yunhui Peng
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
37
Zhytnik L, Peters M, Tilk K, Simm K, Tõnisson N, Reimand T, Maasalu K, Acharya G, Krjutškov K, Salumets A. From late fatherhood to prenatal screening of monogenic disorders: evidence and ethical concerns. Hum Reprod Update 2021; 27:1056-1085. [PMID: 34329448 DOI: 10.1093/humupd/dmab023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/08/2021] [Revised: 06/27/2021] [Indexed: 01/06/2023]
Abstract
BACKGROUND With the help of ART, an advanced parental age is not considered to be a serious obstacle for reproduction anymore. However, significant health risks for future offspring hide behind the success of reproductive medicine for the treatment of reduced fertility associated with late parenthood. Although an advanced maternal age is a well-known risk factor for poor reproductive outcomes, understanding the impact of an advanced paternal age on offspring is yet to be elucidated. De novo monogenic disorders (MDs) are highly associated with late fatherhood. MDs are one of the major sources of paediatric morbidity and mortality, causing significant socioeconomic and psychological burdens to society. Although individually rare, the combined prevalence of these disorders is as high as that of chromosomal aneuploidies, indicating the increasing need for prenatal screening. With the help of advanced reproductive technologies, families with late paternity have the option of non-invasive prenatal testing (NIPT) for multiple MDs (MD-NIPT), which has a sensitivity and specificity of almost 100%. OBJECTIVE AND RATIONALE The main aims of the current review were to examine the effect of late paternity on the origin and nature of MDs, to highlight the role of NIPT for the detection of a variety of paternal age-associated MDs, to describe clinical experiences and to reflect on the ethical concerns surrounding the topic of late paternity and MD-NIPT. SEARCH METHODS An extensive search of peer-reviewed publications (1980-2021) in English from the PubMed and Google Scholar databases was based on key words in different combinations: late paternity, paternal age, spermatogenesis, selfish spermatogonial selection, paternal age effect, de novo mutations (DNMs), MDs, NIPT, ethics of late fatherhood, prenatal testing and paternal rights. OUTCOMES An advanced paternal age provokes the accumulation of DNMs, which arise in continuously dividing germline cells. 
A subset of DNMs, owing to their effect on the rat sarcoma virus protein-mitogen-activated protein kinase signalling pathway, becomes beneficial for spermatogonia, causing selfish spermatogonial selection and outgrowth, and in some rare cases may lead to spermatocytic seminoma later in life. In the offspring, these selfish DNMs cause paternal age effect (PAE) disorders with a severe and even life-threatening phenotype. The increasing tendency for late paternity and the subsequent high risk of PAE disorders indicate an increased need for a safe and reliable detection procedure, such as MD-NIPT. The MD-NIPT approach has the capacity to provide safe screening for pregnancies at risk of PAE disorders and MDs, which constitute up to 20% of all pregnancies. The primary risks include pregnancies with a paternal age over 40 years, a previous history of an affected pregnancy/child, and/or congenital anomalies detected by routine ultrasonography. The implementation of NIPT-based screening would support the early diagnosis and management needed in cases of affected pregnancy. However, the benefits of MD-NIPT need to be balanced with the ethical challenges associated with the introduction of such an approach into routine clinical practice, namely concerns regarding reproductive autonomy, informed consent, potential disability discrimination, paternal rights and PAE-associated issues, equity and justice in accessing services, and counselling. WIDER IMPLICATIONS Considering the increasing parental age and risks of MDs, combined NIPT for chromosomal aneuploidies and microdeletion syndromes as well as tests for MDs might become a part of routine pregnancy management in the near future. Moreover, the ethical challenges associated with the introduction of MD-NIPT into routine clinical practice need to be carefully evaluated. 
Furthermore, more focus and attention should be directed towards the ethics of late paternity, paternal rights and paternal genetic guilt associated with pregnancies affected with PAE MDs.
Affiliation(s)
- Lidiia Zhytnik
  - Competence Centre on Health Technologies, Tartu, Estonia
- Maire Peters
  - Competence Centre on Health Technologies, Tartu, Estonia
  - Department of Obstetrics and Gynaecology, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
- Kadi Tilk
  - Competence Centre on Health Technologies, Tartu, Estonia
- Kadri Simm
  - Institute of Philosophy and Semiotics, Faculty of Arts and Humanities, University of Tartu, Tartu, Estonia
  - Centre of Ethics, University of Tartu, Tartu, Estonia
- Neeme Tõnisson
  - Institute of Genomics, University of Tartu, Tartu, Estonia
  - Department of Clinical Genetics, United Laboratories, Tartu University Hospital, Tartu, Estonia
  - Department of Reproductive Medicine, West Tallinn Central Hospital, Tallinn, Estonia
- Tiia Reimand
  - Department of Clinical Genetics, United Laboratories, Tartu University Hospital, Tartu, Estonia
  - Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
- Katre Maasalu
  - Clinic of Traumatology and Orthopaedics, Tartu University Hospital, Tartu, Estonia
  - Department of Traumatology and Orthopaedics, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
- Ganesh Acharya
  - Division of Obstetrics and Gynaecology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet, Stockholm, Sweden
- Kaarel Krjutškov
  - Competence Centre on Health Technologies, Tartu, Estonia
  - Department of Obstetrics and Gynaecology, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
- Andres Salumets
  - Competence Centre on Health Technologies, Tartu, Estonia
  - Department of Obstetrics and Gynaecology, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
  - Institute of Genomics, University of Tartu, Tartu, Estonia
  - Division of Obstetrics and Gynaecology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet, Stockholm, Sweden
38
Wang QS, Kelley DR, Ulirsch J, Kanai M, Sadhuka S, Cui R, Albors C, Cheng N, Okada Y, Aguet F, Ardlie KG, MacArthur DG, Finucane HK. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat Commun 2021; 12:3394. [PMID: 34099641 PMCID: PMC8184741 DOI: 10.1038/s41467-021-23134-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/23/2020] [Accepted: 04/15/2021] [Indexed: 02/05/2023]
Abstract
The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods that assess variants' effects on gene expression in the native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
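Using a functional score as a prior for fine-mapping, as the abstract above describes, can be illustrated with the simplest case: if exactly one variant in a locus is assumed causal, each variant's posterior inclusion probability (PIP) is proportional to its prior weight times its Bayes factor. The sketch below shows only that reweighting step under the single-causal-variant assumption. It is not the authors' pipeline, and the function name, Bayes factors, and prior weights are hypothetical.

```python
def posterior_inclusion(bayes_factors, priors=None):
    """PIPs for one locus, assuming exactly one causal variant.

    bayes_factors: per-variant evidence for association (positive numbers).
    priors: per-variant prior weights (e.g. from a functional score);
            a uniform prior is used when omitted.
    """
    n = len(bayes_factors)
    if n == 0:
        raise ValueError("locus must contain at least one variant")
    if priors is None:
        priors = [1.0 / n] * n
    if len(priors) != n:
        raise ValueError("priors and bayes_factors must have the same length")
    weights = [bf * p for bf, p in zip(bayes_factors, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical locus with three variants.
bfs = [1.0, 1.0, 2.0]
print(posterior_inclusion(bfs))                      # uniform prior
print(posterior_inclusion(bfs, [0.5, 0.25, 0.25]))   # functionally informed prior
```

With the uniform prior the third variant dominates; the informed prior shifts probability toward the first variant, which is the effect a functional prior is meant to have when association evidence alone cannot separate variants in linkage disequilibrium.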
Affiliation(s)
- Qingbo S Wang
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Jacob Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
- Shuvom Sadhuka
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard College, Cambridge, MA, USA
- Ran Cui
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Carlos Albors
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nathan Cheng
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Osaka, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Osaka, Japan
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
39
Green DJ, Lenassi E, Manning CS, McGaughey D, Sharma V, Black GC, Ellingford JM, Sergouniotis PI. North Carolina Macular Dystrophy: Phenotypic Variability and Computational Analysis of Disease-Associated Noncoding Variants. Invest Ophthalmol Vis Sci 2021; 62:16. [PMID: 34125159 PMCID: PMC8212441 DOI: 10.1167/iovs.62.7.16] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/02/2023]
Abstract
Purpose North Carolina macular dystrophy (NCMD) is an autosomal dominant, congenital disorder affecting the central retina. Here, we report clinical and genetic findings in three families segregating NCMD and use epigenomic datasets from human tissues to gain insights into the effect of NCMD-implicated variants. Methods Clinical assessment and genetic testing were performed. Publicly available transcriptomic and epigenomic datasets were analyzed and the activity-by-contact method for scoring enhancer elements and linking them to target genes was used. Results A previously described, heterozygous, noncoding variant upstream of the PRDM13 gene was detected in all six affected study participants (chr6:100,040,987G>C [GRCh37/hg19]). Interfamilial and intrafamilial variability were observed; the visual acuity ranged from 0.0 to 1.6 LogMAR and fundoscopic findings ranged from visually insignificant, confluent, drusen-like macular deposits to coloboma-like macular lesions. Variable degrees of peripheral retinal spots (which were easily detected on widefield retinal imaging) were observed in all study subjects. Notably, a 6-year-old patient developed choroidal neovascularization and required treatment with intravitreal bevacizumab injections. Computational analysis of the five single nucleotide variants that have been implicated in NCMD revealed that these noncoding changes lie within two putative enhancer elements; these elements are predicted to interact with PRDM13 in the developing human retina. PRDM13 was found to be expressed in the fetal retina, with greatest expression in the amacrine precursor cell population. Conclusions We provide further evidence supporting the role of PRDM13 dysregulation in the pathogenesis of NCMD and highlight the usefulness of widefield retinal imaging in individuals suspected to have this condition.
Affiliation(s)
- David J Green
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- Eva Lenassi
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
  - Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Cerys S Manning
  - Division of Developmental Biology and Medicine, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- David McGaughey
  - Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
- Vinod Sharma
  - Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Graeme C Black
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Jamie M Ellingford
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- Panagiotis I Sergouniotis
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
  - Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
  - Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
40
Pinsach-Abuin M, del Olmo B, Pérez-Agustin A, Mates J, Allegue C, Iglesias A, Ma Q, Merkurjev D, Konovalov S, Zhang J, Sheikh F, Telenti A, Brugada J, Brugada R, Gymrek M, di Iulio J, Garcia-Bassets I, Pagans S. Analysis of Brugada syndrome loci reveals that fine-mapping clustered GWAS hits enhances the annotation of disease-relevant variants. Cell Rep Med 2021; 2:100250. [PMID: 33948580 PMCID: PMC8080235 DOI: 10.1016/j.xcrm.2021.100250] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/29/2020] [Revised: 01/07/2021] [Accepted: 03/23/2021] [Indexed: 11/30/2022]
Abstract
Genome-wide association studies (GWASs) are instrumental in identifying loci harboring common single-nucleotide variants (SNVs) that affect human traits and diseases. GWAS hits emerge in clusters, but the focus is often on the most significant hit in each trait- or disease-associated locus. The remaining hits represent SNVs in linkage disequilibrium (LD); they are considered redundant and are thus frequently only marginally reported or exploited. Here, we interrogate the value of integrating the full set of GWAS hits in a locus repeatedly associated with cardiac conduction traits and arrhythmia, SCN5A-SCN10A. Our analysis reveals 5 common 7-SNV haplotypes (Hap1-5), with 2 combinations associated with the life-threatening arrhythmia Brugada syndrome (the risk Hap1/1 and protective Hap2/3 genotypes). Hap1 and Hap2 share 3 SNVs; thus, this analysis suggests that assuming redundancy among clustered GWAS hits can lead to confounding disease-risk associations and supports the need to deconstruct GWAS data in the context of haplotype composition.
Collapse
Affiliation(s)
- Mel·lina Pinsach-Abuin
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Visiting Scholar Program, Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Bernat del Olmo
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Visiting Scholar Program, Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Adrian Pérez-Agustin
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Jesus Mates
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Catarina Allegue
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Visiting Scholar Program, Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Anna Iglesias
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| | - Qi Ma
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Daria Merkurjev
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Department of Statistics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Sergiy Konovalov
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Jing Zhang
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Farah Sheikh
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Amalio Telenti
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Josep Brugada
- Arrhythmia Unit, Hospital Clinic de Barcelona, Universitat de Barcelona, Barcelona, Spain
| | - Ramon Brugada
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
- Cardiology Service, Hospital Universitari Dr. Josep Trueta, Girona, Spain
| | - Melissa Gymrek
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Julia di Iulio
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ivan Garcia-Bassets
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Sara Pagans
- Department of Medical Sciences, School of Medicine, Universitat de Girona, Girona, Spain
- Institut d’Investigació Biomèdica de Girona, Salt, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Madrid, Spain
| |
|
41
|
Castells-Roca L, Tejero E, Rodríguez-Santiago B, Surrallés J. CRISPR Screens in Synthetic Lethality and Combinatorial Therapies for Cancer. Cancers (Basel) 2021; 13:1591. [PMID: 33808217 PMCID: PMC8037779 DOI: 10.3390/cancers13071591] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/26/2021] [Revised: 03/24/2021] [Accepted: 03/25/2021] [Indexed: 12/26/2022] Open
Abstract
Cancer is a complex disease resulting from the accumulation of genetic dysfunctions. Tumor heterogeneity causes the molecular variety that divergently controls responses to chemotherapy, leading to the recurrent problem of cancer reappearance. For many decades, efforts have focused on identifying essential tumoral genes and cancer driver mutations. More recently, prompted by the clinical success of the synthetic lethality (SL)-based therapy of the PARP inhibitors in homologous recombination-deficient tumors, scientists have centered their novel research on SL interactions (SLI). The state of the art for finding new genetic interactions is currently large-scale forward genetic CRISPR screens. CRISPR technology has rapidly evolved to be a common tool in the vast majority of laboratories, as tools to implement CRISPR screen protocols are available to all researchers. Taking advantage of SLI, combinatorial therapies have become the ultimate model to treat cancer with lower toxicity, and therefore better efficiency. This review explores the CRISPR screen methodology, integrates the up-to-date published findings on CRISPR screens in the cancer field and proposes future directions to uncover cancer regulation and individual responses to chemotherapy.
Affiliation(s)
- Laia Castells-Roca
- Genome Instability and DNA Repair Syndromes Group, Sant Pau Biomedical Research Institute (IIB Sant Pau) and Join Unit UAB-IR Sant Pau on Genomic Medicine, 08041 Barcelona, Spain
- Genetics Department, Hospital de la Santa Creu i Sant Pau, 08041 Barcelona, Spain
- Genetics and Microbiology Department, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
| | - Eudald Tejero
- Sant Pau Biomedical Research Institute (IIB Sant Pau), 08041 Barcelona, Spain
| | - Benjamín Rodríguez-Santiago
- Genetics Department, Hospital de la Santa Creu i Sant Pau, 08041 Barcelona, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER) and Sant Pau Biomedical Research Institute (IIB Sant Pau), 08041 Barcelona, Spain
| | - Jordi Surrallés
- Genome Instability and DNA Repair Syndromes Group, Sant Pau Biomedical Research Institute (IIB Sant Pau) and Join Unit UAB-IR Sant Pau on Genomic Medicine, 08041 Barcelona, Spain
- Genetics Department, Hospital de la Santa Creu i Sant Pau, 08041 Barcelona, Spain
- Genetics and Microbiology Department, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER) and Sant Pau Biomedical Research Institute (IIB Sant Pau), 08041 Barcelona, Spain
| |
|
42
|
Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun 2021; 12:1504. [PMID: 33686085 PMCID: PMC7940646 DOI: 10.1038/s41467-021-21790-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 09/18/2020] [Accepted: 02/12/2021] [Indexed: 11/14/2022] Open
Abstract
Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably or outperforms conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome. Intolerance to variation is a strong indicator of disease relevance for coding regions of the human genome. Here, the authors present JARVIS, a deep learning method integrating intolerance to variation in non-coding regions and sequence-specific annotations to infer non-coding variant pathogenicity.
Affiliation(s)
- Dimitrios Vitsios
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
| | - Ryan S Dhindsa
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Lawrence Middleton
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Ayal B Gussow
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
| | - Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
| |
|
43
|
Wang H, Chen S, Wei J, Song G, Zhao Y. A-to-I RNA Editing in Cancer: From Evaluating the Editing Level to Exploring the Editing Effects. Front Oncol 2021; 10:632187. [PMID: 33643923 PMCID: PMC7905090 DOI: 10.3389/fonc.2020.632187] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/22/2020] [Accepted: 12/21/2020] [Indexed: 12/21/2022] Open
Abstract
As an important regulatory mechanism at the posttranscriptional level in metazoans, adenosine deaminase acting on RNA (ADAR)-induced A-to-I RNA editing modification of double-stranded RNA has been widely detected and reported. Editing may lead to non-synonymous amino acid mutations, RNA secondary structure alterations, pre-mRNA processing changes, and microRNA-mRNA redirection, thereby affecting multiple cellular processes and functions. In recent years, researchers have successfully developed several bioinformatics software tools and pipelines to identify RNA editing sites. However, there are still no widely accepted editing site standards due to the variety of parallel optimization and RNA high-seq protocols and programs. It is also challenging to identify RNA editing by normal protocols in tumor samples due to the high DNA mutation rate. Numerous RNA editing sites have been reported to be located in non-coding regions and can affect the biosynthesis of ncRNAs, including miRNAs and circular RNAs. Predicting the function of RNA editing sites located in non-coding regions and ncRNAs is significantly difficult. In this review, we aim to provide a better understanding of bioinformatics strategies for human cancer A-to-I RNA editing identification and briefly discuss recent advances in related areas, such as the oncogenic and tumor suppressive effects of RNA editing.
Affiliation(s)
- Heming Wang
- Clinical Medical College, Changchun University of Chinese Medicine, Changchun, China
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Shanghai, China
| | - Sinuo Chen
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Shanghai, China
| | - Jiayi Wei
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Shanghai, China
| | - Guangqi Song
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Shanghai, China
| | - Yicheng Zhao
- Clinical Medical College, Changchun University of Chinese Medicine, Changchun, China
| |
|
44
|
Di Resta C, Pipitone GB, Carrera P, Ferrari M. Current scenario of the genetic testing for rare neurological disorders exploiting next generation sequencing. Neural Regen Res 2021; 16:475-481. [PMID: 32985468 PMCID: PMC7996035 DOI: 10.4103/1673-5374.293135] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023] Open
Abstract
Next generation sequencing is currently a cornerstone of genetic testing in routine diagnostics, allowing for the detection of sequence variants with so far unprecedented large scale, mainly in genetically heterogenous diseases, such as neurological disorders. It is a fast-moving field, where new wet enrichment protocols and bioinformatics tools are constantly being developed to overcome initial limitations. Despite the as yet undiscussed advantages, however, there are still some challenges in data analysis and the interpretation of variants. In this review, we address the current state of next generation sequencing diagnostic testing for inherited human disorders, particularly giving an overview of the available high-throughput sequencing approaches; including targeted, whole-exome and whole-genome sequencing; and discussing the main critical aspects of the bioinformatic process, from raw data analysis to molecular diagnosis.
Affiliation(s)
- Chiara Di Resta
- Vita-Salute San Raffaele University; Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | | | - Paola Carrera
- Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute; Clinical Molecular Biology Laboratory, IRCCS San Raffaele Hospital, Milan, Italy
| | - Maurizio Ferrari
- Vita-Salute San Raffaele University; Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute; Clinical Molecular Biology Laboratory, IRCCS San Raffaele Hospital, Milan, Italy
| |
|
45
|
Chiara M, Mandreoli P, Tangaro MA, D'Erchia AM, Sorrentino S, Forleo C, Horner DS, Zambelli F, Pesole G. VINYL: Variant prIoritizatioN by survivaL analysis. Bioinformatics 2020; 36:5590-5599. [PMID: 33367501 DOI: 10.1093/bioinformatics/btaa1067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/28/2020] [Revised: 10/31/2020] [Accepted: 12/14/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Clinical applications of genome re-sequencing technologies typically generate large amounts of data that need to be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance. RESULTS In this paper, we present VINYL, a flexible and fully automated system for the functional annotation and prioritization of genetic variants. Extensive analyses of both real and simulated datasets suggest that VINYL can identify clinically relevant genetic variants in a more accurate manner compared to equivalent state of the art methods, allowing a more rapid and effective prioritization of genetic variants in different experimental settings. As such we believe that VINYL can establish itself as a valuable tool to assist healthcare operators and researchers in clinical genomics investigations. AVAILABILITY VINYL is available at http://beaconlab.it/VINYL and https://github.com/matteo14c/VINYL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Chiara
- Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
| | | | - Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
| | - Anna Maria D'Erchia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "Aldo Moro", Bari, Italy
| | - Sandro Sorrentino
- Cardiology Unit, Department of Emergency and Organ Transplantation, University of Bari "Aldo Moro", Bari, Italy
| | - Cinzia Forleo
- Cardiology Unit, Department of Emergency and Organ Transplantation, University of Bari "Aldo Moro", Bari, Italy
| | - David S Horner
- Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
| | - Federico Zambelli
- Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
| | - Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "Aldo Moro", Bari, Italy
| |
|
46
|
Kim SS, Dey KK, Weissbrod O, Márquez-Luna C, Gazal S, Price AL. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nat Commun 2020; 11:6258. [PMID: 33288751 PMCID: PMC7721881 DOI: 10.1038/s41467-020-20087-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Accepted: 11/09/2020] [Indexed: 02/08/2023] Open
Abstract
Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying that Mendelian and common disease variants share similar properties. The boosted scores also produce improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.
Affiliation(s)
- Samuel S Kim
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
| | - Kushal K Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Carla Márquez-Luna
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
| |
|
47
|
Lincoln MR, Axisa PP, Hafler DA. Epigenetic fine-mapping: identification of causal mechanisms for autoimmunity. Curr Opin Immunol 2020; 67:50-56. [PMID: 32977183 DOI: 10.1016/j.coi.2020.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2020] [Revised: 08/28/2020] [Accepted: 09/07/2020] [Indexed: 11/25/2022]
Abstract
Genome-wide association studies (GWAS) have identified genetic susceptibility loci for a variety of autoimmune and inflammatory diseases. These studies confirm the fundamental genetic basis of individual autoimmune diseases, and also point to shared etiological mechanisms across the spectrum of autoimmunity. While hundreds of genetic loci have been implicated in autoimmune diseases, the translation of individual susceptibility loci into specific molecular mechanisms for individual diseases remains difficult. This review highlights recent advances in the genetics of autoimmune disease, and the emerging use of epigenetic techniques to identify pathogenic cell types and causal molecular mechanisms of autoimmunity.
Affiliation(s)
- Matthew R Lincoln
- Departments of Neurology and Immunobiology, Yale School of Medicine, New Haven, CT, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Pierre-Paul Axisa
- Departments of Neurology and Immunobiology, Yale School of Medicine, New Haven, CT, USA.
| | - David A Hafler
- Departments of Neurology and Immunobiology, Yale School of Medicine, New Haven, CT, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
| |
|
48
|
Awany D, Chimusa ER. Heritability jointly explained by host genotype and microbiome: will improve traits prediction? Brief Bioinform 2020; 22:5893981. [PMID: 32810866 DOI: 10.1093/bib/bbaa175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/24/2020] [Revised: 07/09/2020] [Accepted: 07/10/2020] [Indexed: 11/14/2022] Open
Abstract
As we observe the 70th anniversary of the publication by Robertson that formalized the notion of 'heritability', geneticists remain puzzled by the problem of missing/hidden heritability, where heritability estimates from genome-wide association studies (GWASs) fall short of that from twin-based studies. Many possible explanations have been offered for this discrepancy, including existence of genetic variants poorly captured by existing arrays, dominance, epistasis and unaccounted-for environmental factors; albeit these remain controversial. We believe a substantial part of this problem could be solved or better understood by incorporating the host's microbiota information in the GWAS model for heritability estimation and may also increase human traits prediction for clinical utility. This is because, despite empirical observations such as (i) the intimate role of the microbiome in many complex human phenotypes, (ii) the overlap between genetic variants associated with both microbiome attributes and complex diseases and (iii) the existence of heritable bacterial taxa, current GWAS models for heritability estimate do not take into account the contributory role of the microbiome. Furthermore, heritability estimate from twin-based studies does not discern microbiome component of the observed total phenotypic variance. Here, we summarize the concept of heritability in GWAS and microbiome-wide association studies, focusing on its estimation, from a statistical genetics perspective. We then discuss a possible statistical method to incorporate the microbiome in the estimation of heritability in host GWAS.
Affiliation(s)
- Denis Awany
- Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
| | - Emile R Chimusa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| |
|
49
|
Spreafico R, Soriaga LB, Grosse J, Virgin HW, Telenti A. Advances in Genomics for Drug Development. Genes (Basel) 2020; 11:E942. [PMID: 32824125 PMCID: PMC7465049 DOI: 10.3390/genes11080942] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/24/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 11/16/2022] Open
Abstract
Drug development (target identification, advancing drug leads to candidates for preclinical and clinical studies) can be facilitated by genetic and genomic knowledge. Here, we review the contribution of population genomics to target identification, the value of bulk and single cell gene expression analysis for understanding the biological relevance of a drug target, and genome-wide CRISPR editing for the prioritization of drug targets. In genomics, we discuss the different scope of genome-wide association studies using genotyping arrays, versus exome and whole genome sequencing. In transcriptomics, we discuss the information from drug perturbation and the selection of biomarkers. For CRISPR screens, we discuss target discovery, mechanism of action and the concept of gene to drug mapping. Harnessing genetic support increases the probability of drug developability and approval.
Affiliation(s)
| | | | | | | | - Amalio Telenti
- Vir Biotechnology, Inc., San Francisco, CA 94158, USA; (R.S.); (L.B.S.); (J.G.); (H.W.V.)
| |
|
50
|
Ross PJ, Mok RSF, Smith BS, Rodrigues DC, Mufteev M, Scherer SW, Ellis J. Modeling neuronal consequences of autism-associated gene regulatory variants with human induced pluripotent stem cells. Mol Autism 2020; 11:33. [PMID: 32398033 PMCID: PMC7218542 DOI: 10.1186/s13229-020-00333-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/03/2019] [Accepted: 04/03/2020] [Indexed: 12/27/2022] Open
Abstract
Genetic factors contribute to the development of autism spectrum disorder (ASD), and although non-protein-coding regions of the genome are being increasingly implicated in ASD, the functional consequences of these variants remain largely uncharacterized. Induced pluripotent stem cells (iPSCs) enable the production of personalized neurons that are genetically matched to people with ASD and can therefore be used to directly test the effects of genomic variation on neuronal gene expression, synapse function, and connectivity. The combined use of human pluripotent stem cells with genome editing to introduce or correct specific variants has proved to be a powerful approach for exploring the functional consequences of ASD-associated variants in protein-coding genes and, more recently, long non-coding RNAs (lncRNAs). Here, we review recent studies that implicate lncRNAs, other non-coding mutations, and regulatory variants in ASD susceptibility. We also discuss experimental design considerations for using iPSCs and genome editing to study the role of the non-protein-coding genome in ASD.
Affiliation(s)
- P Joel Ross
- Department of Biology, University of Prince Edward Island, Charlottetown, PE, Canada.
| | - Rebecca S F Mok
- Developmental & Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Brandon S Smith
- Department of Biology, University of Prince Edward Island, Charlottetown, PE, Canada
| | - Deivid C Rodrigues
- Developmental & Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
| | - Marat Mufteev
- Developmental & Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Stephen W Scherer
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Genetics & Genome Biology Program and The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada; McLaughlin Centre, University of Toronto, Toronto, ON, Canada
| | - James Ellis
- Developmental & Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| |
|