1
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of variation in non-coding regions, yet understanding its biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the angiotensin-converting enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant single nucleotide variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs in the ACE2 promoter, rs147718775 and rs140394675, revealed that these co-occurring variants significantly enhance promoter activity, suggesting a possible increase in the expression of a specific ACE2 isoform. This method proves effective in identifying and interpreting impactful non-coding variants, aiding further studies and enhancing understanding of the molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
  - Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
  - Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
  - Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
2
Tabet DR, Kuang D, Lancaster MC, Li R, Liu K, Weile J, Coté AG, Wu Y, Hegele RA, Roden DM, Roth FP. Benchmarking computational variant effect predictors by their ability to infer human traits. Genome Biol 2024; 25:172. [PMID: 38951922 PMCID: PMC11218265 DOI: 10.1186/s13059-024-03314-7] [Received: 10/10/2022] [Accepted: 06/17/2024] [Indexed: 07/03/2024] Open
Abstract
Background: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts. Results: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation. Conclusion: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
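The comparison of predictor rankings between two cohorts can be illustrated with a rank correlation. The sketch below assumes a Spearman correlation and uses made-up performance values for placeholder predictors; the abstract does not state which statistic the authors used, so this is only one plausible way to run such a check.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-predictor performance in two cohorts (not real results).
cohort_a = {"predictor_1": 0.31, "predictor_2": 0.27, "predictor_3": 0.12}
cohort_b = {"predictor_1": 0.29, "predictor_2": 0.30, "predictor_3": 0.10}
names = sorted(cohort_a)
rho = spearman([cohort_a[n] for n in names], [cohort_b[n] for n in names])
```

A value of `rho` near 1 would indicate that the two cohorts order the predictors similarly; a significance test would still be needed to support the kind of claim made in the abstract.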
Affiliation(s)
- Daniel R Tabet
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Da Kuang
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Megan C Lancaster
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Roujia Li
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Karen Liu
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Jochen Weile
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Atina G Coté
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Yingzhou Wu
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Robert A Hegele
  - Department of Medicine, Department of Biochemistry, Schulich School of Medicine and Dentistry, Robarts Research Institute, Western University, London, ON, Canada
- Dan M Roden
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Frederick P Roth
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
3
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions: VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability: VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
4
Rastogi R, Chung R, Li S, Li C, Lee K, Woo J, Kim DW, Keum C, Babbi G, Martelli PL, Savojardo C, Casadio R, Chennen K, Weber T, Poch O, Ancien F, Cia G, Pucci F, Raimondi D, Vranken W, Rooman M, Marquet C, Olenyi T, Rost B, Andreoletti G, Kamandula A, Peng Y, Bakolitsa C, Mort M, Cooper DN, Bergquist T, Pejaver V, Liu X, Radivojac P, Brenner SE, Ioannidis NM. Critical assessment of missense variant effect predictors on disease-relevant variant data. bioRxiv 2024:2024.06.06.597828. [PMID: 38895200 PMCID: PMC11185644 DOI: 10.1101/2024.06.06.597828] [Indexed: 06/21/2024]
Abstract
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
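To make the idea of a high-specificity regime concrete, the sketch below computes the best sensitivity a predictor reaches while keeping specificity at or above a chosen floor. The function name, the thresholding rule (call a variant pathogenic when its score is at or above the threshold), and the toy data are assumptions for illustration; the abstract does not describe the assessors' exact metrics.

```python
def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over thresholds whose specificity >= min_specificity.

    scores: predictor outputs, higher meaning more likely pathogenic.
    labels: 1 for pathogenic, 0 for benign.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        # A variant is called pathogenic when score >= threshold.
        specificity = sum(s < threshold for s in benign) / len(benign)
        sensitivity = sum(s >= threshold for s in pathogenic) / len(pathogenic)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


# Hypothetical scores and labels, not taken from the CAGI6 evaluation data.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
strict = sensitivity_at_specificity(toy_scores, toy_labels, 1.0)
relaxed = sensitivity_at_specificity(toy_scores, toy_labels, 0.6)
```

Two predictors can swap order depending on the specificity floor, which is one way the relative performance could differ between regimes as the abstract reports.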
5
Zhou Y, Pirmann S, Lauschke VM. APF2: an improved ensemble method for pharmacogenomic variant effect prediction. Pharmacogenomics J 2024; 24:17. [PMID: 38802404 PMCID: PMC11129946 DOI: 10.1038/s41397-024-00338-x] [Received: 02/29/2024] [Revised: 04/26/2024] [Accepted: 05/15/2024] [Indexed: 05/29/2024]
Abstract
Lack of efficacy and adverse drug responses are common phenomena in pharmacological therapy, causing considerable morbidity and mortality. It is estimated that 20-30% of this variability in drug response stems from variations in genes encoding drug targets or factors involved in drug disposition. Leveraging such pharmacogenomic information for the preemptive identification of patients who would benefit from dose adjustments or alternative medications thus constitutes an important frontier of precision medicine. Computational methods can be used to predict the functional effects of variants of unknown significance. However, their performance on pharmacogenomic variant data has been lackluster. To overcome this limitation, we previously developed an ensemble classifier, termed APF, specifically designed for pharmacogenomic variant prediction. Here, we aimed to further improve predictions by leveraging recent key advances in the prediction of protein folding based on deep neural networks. Benchmarking of 28 variant effect predictors on 530 pharmacogenetic missense variants revealed that structural predictions using AlphaMissense were most specific, whereas APF exhibited the most balanced performance. We then developed a new tool, APF2, by optimizing algorithm parametrization of the top performing algorithms for pharmacogenomic variations and aggregating their predictions into a unified ensemble score. Importantly, APF2 provides quantitative variant effect estimates that correlate well with experimental results (R² = 0.91, p = 0.003) and predicts the functional impact of pharmacogenomic variants with higher accuracy than previous methods, particularly for clinically relevant variations with actionable pharmacogenomic guidelines. We furthermore demonstrate better performance (92% accuracy) on an independent test set of 146 variants across 61 pharmacogenes not used for model training or validation. 
Application of APF2 to population-scale sequencing data from over 800,000 individuals revealed drastic ethnogeographic differences with important implications for pharmacotherapy. We thus think that APF2 holds the potential to improve the translation of genetic information into pharmacogenetic recommendations, thereby facilitating the use of Next-Generation Sequencing data for stratified medicine.
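The general idea of aggregating several predictors into one ensemble score can be sketched as follows. This is not the APF2 algorithm: the abstract does not give its constituent tools, parametrization, or aggregation rule, so the tool names, score ranges, and the simple averaging used here are all hypothetical.

```python
def ensemble_score(scores, ranges):
    """Average of per-tool scores rescaled to [0, 1], where 1 means deleterious.

    scores: mapping of tool name to raw score for one variant.
    ranges: mapping of tool name to (benign_end, deleterious_end); listing the
            ends in this order also handles tools where low scores are damaging.
    """
    scaled = []
    for name, value in scores.items():
        benign_end, deleterious_end = ranges[name]
        fraction = (value - benign_end) / (deleterious_end - benign_end)
        # Clamp in case a raw score falls outside the stated range.
        scaled.append(min(1.0, max(0.0, fraction)))
    return sum(scaled) / len(scaled)


# Hypothetical tools: tool_a scores rise with damage, tool_b scores fall.
example_ranges = {"tool_a": (0.0, 1.0), "tool_b": (1.0, 0.0)}
example_scores = {"tool_a": 0.8, "tool_b": 0.1}
combined = ensemble_score(example_scores, example_ranges)
```

A real ensemble would typically tune per-tool thresholds or weights on labeled pharmacogenomic variants instead of using an unweighted mean.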
Affiliation(s)
- Yitian Zhou
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
- Sebastian Pirmann
  - Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Volker M Lauschke
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet and University Hospital, Stockholm, Sweden
  - Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
  - University of Tübingen, Tübingen, Germany
6
Ginete C, Delgadinho M, Santos B, Miranda A, Silva C, Guerreiro P, Chimusa ER, Brito M. Genetic Modifiers of Sickle Cell Anemia Phenotype in a Cohort of Angolan Children. Genes (Basel) 2024; 15:469. [PMID: 38674403 PMCID: PMC11049512 DOI: 10.3390/genes15040469] [Received: 03/18/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024] Open
Abstract
The aim of this study was to identify genetic markers in the HBB Cluster; HBS1L-MYB intergenic region; and BCL11A, KLF1, FOX3, and ZBTB7A genes associated with the heterogeneous phenotypes of Sickle Cell Anemia (SCA) using next-generation sequencing, as well as to assess their influence and prevalence in an Angolan population. Hematological, biochemical, and clinical data were considered to determine patients' severity phenotypes. Samples from 192 patients were sequenced, and 5,019,378 high-quality variants were registered. A catalog of candidate modifier genes that clustered in pathophysiological pathways important for SCA was generated, and candidate genes associated with increasing vaso-occlusive crises (VOC) and with lower fetal hemoglobin (HbF) were identified. These data support the polygenic view of the genetic architecture of SCA phenotypic variability. Two single nucleotide polymorphisms in the intronic region of 2p16.1, harboring the BCL11A gene, showed genome-wide significant association with decreasing HbF. A set of variants was nominally associated with increasing VOC; these are potential genetic modifiers underlying phenotypic variation among patients. To the best of our knowledge, this is the first investigation of clinical variation in SCA in Angola using a well-customized and targeted sequencing approach.
Affiliation(s)
- Catarina Ginete
  - H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Mariana Delgadinho
  - H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Brígida Santos
  - Centro de Investigação em Saúde de Angola (CISA), Bengo 9999, Angola
  - Hospital Pediátrico David Bernardino (HPDB), Luanda 3067, Angola
- Armandina Miranda
  - Instituto Nacional de Saúde Doutor Ricardo Jorge (INSA), 1649-016 Lisbon, Portugal
- Carina Silva
  - H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
  - Centro de Estatística e Aplicações, Universidade de Lisboa, 1649-013 Lisbon, Portugal
- Paulo Guerreiro
  - H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
- Emile R. Chimusa
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Miguel Brito
  - H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, 1990-096 Lisbon, Portugal
  - Centro de Investigação em Saúde de Angola (CISA), Bengo 9999, Angola
7
Wei X, Li H, Zhu T, Sun Z, Sui R. Genotype-Phenotype Associations in an X-Linked Retinoschisis Patient Cohort: The Molecular Dynamic Insight and a Promising SD-OCT Indicator. Invest Ophthalmol Vis Sci 2024; 65:17. [PMID: 38324300 PMCID: PMC10854265 DOI: 10.1167/iovs.65.2.17] [Received: 07/31/2023] [Accepted: 01/23/2024] [Indexed: 02/08/2024] Open
Abstract
Purpose: This study investigated a three-dimensional indicator in spectral-domain optical coherence tomography (SD-OCT) and established phenotype-genotype correlation in X-linked retinoschisis (XLRS). Methods: Thirty-seven patients with XLRS underwent comprehensive ophthalmic examinations, including visual acuity (VA), fundus examination, electroretinogram (ERG), and SD-OCT. SD-OCT parameters of central foveal thickness (CFT), cyst cavity volume (CCV), and photoreceptor outer segment length were assessed. CCV was defined as the sum of the areas of cyst cavities in sequential B-scans, measured automatically by self-developed software (OCT-CCSEG). Structural changes of the protein associated with missense variants were quantified by molecular dynamics (MD). The correlation between genotype and phenotype was analyzed. Results: Twenty-seven different RS1 variants were identified, including a novel variant c.336_337insT(p.L113Sfs*8). The average age of onset was 14.76 ± 15.75 years, and the mean VA was 0.84 ± 0.43 logMAR. The mean CCV was 1.69 ± 1.87 mm³, correlating significantly with CFT (R = 0.66; P < 0.01). In the genotype-phenotype analysis of missense variants, CCV significantly correlated with the structural effect of each mutation on the protein relative to wild type, including root-mean-square deviation (R = 0.34; P = 0.04), solvent accessible surface area (R = 0.38; P = 0.02), and surface hydrophobic area (R = 0.37; P = 0.03). The amplitude of scotopic 3.0 ERG a-waves and b-waves significantly correlated with the percentage change of the β-strand in the secondary structure (a-wave: R = -0.58, P < 0.01; b-wave: R = -0.53, P < 0.01). Conclusions: CCV is a promising indicator to quantify the structural disorganization of the XLRS retina. The OCT-CCSEG software calculated CCV automatically, potentially facilitating prognosis assessment and development of personalized treatment. 
Moreover, MD-involved genotype-phenotype analysis suggests an association between protein structural alterations and XLRS severity measured by CCV and ERG.
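The CCV definition (cyst areas summed over sequential B-scans) can be turned into a volume with a simple slab approximation. The sketch below assumes the per-scan cyst areas have already been segmented and that B-scans are evenly spaced; the function name, the spacing value, and the areas are hypothetical, and the abstract does not describe how OCT-CCSEG performs this step.

```python
def cyst_cavity_volume(areas_mm2, scan_spacing_mm):
    """Approximate volume (mm^3) as summed cross-sectional area times spacing.

    areas_mm2: cyst cavity area measured in each sequential B-scan (mm^2).
    scan_spacing_mm: distance between adjacent B-scans (mm), assumed uniform.
    """
    if scan_spacing_mm <= 0:
        raise ValueError("scan spacing must be positive")
    return sum(areas_mm2) * scan_spacing_mm


# Hypothetical areas from three adjacent B-scans, 0.12 mm apart.
volume = cyst_cavity_volume([0.5, 1.0, 0.5], 0.12)
```

Each B-scan is treated as a slab of uniform thickness, so the result depends on the scan density of the acquisition protocol.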
Affiliation(s)
- Xing Wei
  - Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Hui Li
  - Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Tian Zhu
  - Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Zixi Sun
  - Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Ruifang Sui
  - Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
8
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a gold standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we recommend directions for the field to avoid siloed research and move towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
9
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
  - Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
|
10
|
Tan HJ, Deng ZH, Shen H, Deng HW, Xiao HM. Single-cell RNA-seq identified novel genes involved in primordial follicle formation. Front Endocrinol (Lausanne) 2023; 14:1285667. [PMID: 38149096 PMCID: PMC10750415 DOI: 10.3389/fendo.2023.1285667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 11/27/2023] [Indexed: 12/28/2023] Open
Abstract
Introduction: The number of primordial follicles (PFs) in mammals determines the ovarian reserve, and impairment of primordial follicle formation (PFF) will cause premature ovarian insufficiency (POI). Methods: By analyzing public single-cell RNA sequencing data from mouse and human ovaries during PFF, we identified novel functional genes and a novel ligand-receptor interaction during PFF. Based on immunofluorescence and in vitro ovarian culture, we confirmed the mechanisms of these genes and the ligand-receptor interaction in PFF. We also applied whole exome sequencing (WES) in 93 cases with POI and whole genome sequencing (WGS) in 465 controls. Variants in POI patients were further investigated by in silico analysis and functional verification. Results: We revealed ANXA7 (annexin A7) and GTF2F1 (general transcription factor IIF subunit 1) in germ cells to be potentially novel genes promoting PFF. The ligand Mdk (midkine) in germ cells and its receptor Sdc1 (syndecan 1) in granulosa cells form a novel interaction crucial for PFF. Based on immunofluorescence, we confirmed significant up-regulation of ANXA7 in PFs compared with germline cysts, and uniform expression of GTF2F1, MDK and SDC1 during PFF, in 25-week human fetal ovary. In vitro investigation indicated that Anxa7 and Gtf2f1 are vital for mouse PFF by regulating the Jak/Stat3 and Jnk signaling pathways, respectively. The ligand-receptor pair (Mdk-Sdc1) is crucial for PFF by regulating the Pi3k-akt signaling pathway. Two heterozygous variants in GTF2F1 and one heterozygous variant in SDC1 were identified in cases, but no variants were identified in controls. The protein level of GTF2F1 or SDC1 in POI cases was significantly lower than that in controls, indicating that the pathogenic effects of the two genes on ovarian function were dosage dependent. Discussion: Our study identified novel genes and a novel ligand-receptor interaction during PFF, further expanding the genetic architecture of POI.
Affiliation(s)
- Hang-Jing Tan
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
| | - Zi-Heng Deng
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
| | - Hui Shen
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, United States
| | - Hong-Wen Deng
- Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, United States
| | - Hong-Mei Xiao
- Institute of Reproduction and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Center for Reproductive Health, and System Biology, Data Sciences, School of Basic Medical Science, Central South University, Changsha, China
|
11
|
Stein D, Kars ME, Wu Y, Bayrak ÇS, Stenson PD, Cooper DN, Schlessinger A, Itan Y. Genome-wide prediction of pathogenic gain- and loss-of-function variants from ensemble learning of a diverse feature set. Genome Med 2023; 15:103. [PMID: 38037155 PMCID: PMC10688473 DOI: 10.1186/s13073-023-01261-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We have developed LoGoFunc, a machine learning method for predicting pathogenic GOF, pathogenic LOF, and neutral genetic variants, trained on a broad range of gene-, protein-, and variant-level features describing diverse biological characteristics. LoGoFunc outperforms other tools trained solely to predict pathogenicity for identifying pathogenic GOF and LOF variants and is available at https://itanlab.shinyapps.io/goflof/.
Affiliation(s)
- David Stein
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Meltem Ece Kars
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Yiming Wu
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- College of Life Science, China West Normal University, Nan Chong, Si Chuan, 637009, China
| | - Çiğdem Sevim Bayrak
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
- Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
| | - Yuval Itan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
|
12
|
Jorge SD, Chi YI, Mazaba JL, Haque N, Wagenknecht J, Smith BC, Volkman BF, Mathison AJ, Lomberk G, Zimmermann MT, Urrutia R. Deep computational phenotyping of genomic variants impacting the SET domain of KMT2C reveal molecular mechanisms for their dysfunction. Front Genet 2023; 14:1291307. [PMID: 38090150 PMCID: PMC10715303 DOI: 10.3389/fgene.2023.1291307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 11/17/2023] [Indexed: 12/29/2023] Open
Abstract
Introduction: Kleefstra Syndrome type 2 (KLEFS-2) is a genetic, neurodevelopmental disorder characterized by intellectual disability, infantile hypotonia, severe expressive language delay, and characteristic facial appearance, with a spectrum of other distinct clinical manifestations. Pathogenic mutations in the epigenetic modifier type 2 lysine methyltransferase KMT2C have been identified to be causative in KLEFS-2 individuals. Methods: This work reports a translational genomic study that applies a multidimensional computational approach for deep variant phenotyping, combining conventional genomic analyses, advanced protein bioinformatics, computational biophysics, biochemistry, and biostatistics-based modeling. We use standard variant annotation, paralog annotation analyses, molecular mechanics, and molecular dynamics simulations to evaluate damaging scores and provide potential mechanisms underlying KMT2C variant dysfunction. Results: We integrated data derived from the structure and dynamics of KMT2C to classify variants into SV (Structural Variant), DV (Dynamic Variant), SDV (Structural and Dynamic Variant), and VUS (Variant of Uncertain Significance). When compared with controls, these variants show values reflecting alterations in molecular fitness in both structure and dynamics. Discussion: We demonstrate that our 3D models for KMT2C variants suggest distinct mechanisms that lead to their imbalance and are not predictable from sequence alone. Thus, the missense variants studied here cause destabilizing effects on KMT2C function by different biophysical and biochemical mechanisms which we adeptly describe. This new knowledge extends our understanding of how variations in the KMT2C gene cause the dysfunction of its methyltransferase enzyme product, thereby bearing significant biomedical relevance for carriers of KLEFS2-associated genomic mutations.
Affiliation(s)
- Salomão Dória Jorge
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Young-In Chi
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Jose Lizarraga Mazaba
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Neshatul Haque
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Jessica Wagenknecht
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Brian C. Smith
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Brian F. Volkman
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Angela J. Mathison
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Gwen Lomberk
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Pharmacology and Toxicology, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Michael T. Zimmermann
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
- Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Raul Urrutia
- Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Division of Research, Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States
|
13
|
Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Yu DJ, Shoombuatong W. MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction. J Chem Inf Model 2023; 63:7239-7257. [PMID: 37947586 PMCID: PMC10685454 DOI: 10.1021/acs.jcim.3c00950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/21/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023]
Abstract
Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MMs. Based on this data set, for each mutation, we utilized Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals' outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API, and then encoded. The mutant sites' ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Building on these efforts, we then developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals' outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from the large protein language model embeddings ESM-1b and ProtT5-XL-U50. In rigorous comparative experiments, ConsMM and EvoIndMM achieved AUROC values of 0.9836 and 0.9854 and AUPR values of 0.9852 and 0.9902, respectively, on the blind test set, which shares no variations or proteins with the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity. Our Web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
Affiliation(s)
- Fang Ge
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, 9 Wenyuanlu, Nanjing 210023, China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Hanin Alahmadi
- College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
| | - Apilak Worachartcheewan
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
14
|
Moore A, Marks JA, Quach BC, Guo Y, Bierut LJ, Gaddis NC, Hancock DB, Page GP, Johnson EO. Evaluating 17 methods incorporating biological function with GWAS summary statistics to accelerate discovery demonstrates a tradeoff between high sensitivity and high positive predictive value. Commun Biol 2023; 6:1199. [PMID: 38001305 PMCID: PMC10673847 DOI: 10.1038/s42003-023-05413-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 10/03/2023] [Indexed: 11/26/2023] Open
Abstract
Where sufficiently large genome-wide association study (GWAS) samples are not currently available or feasible, methods that leverage increasing knowledge of the biological function of variants may illuminate discoveries without increasing sample size. We comprehensively evaluated 17 functional weighting methods for identifying novel associations. We assessed the performance of these methods using published results from multiple GWAS waves across each of five complex traits. Although no method achieved both high sensitivity and high positive predictive value (PPV) for any trait, a subset of methods utilizing pleiotropy and expression quantitative trait loci nominated variants with high PPV (>75%) for multiple traits. Application of functional weighting methods to enhance GWAS power for locus discovery is unlikely to circumvent the need for larger sample sizes in truly underpowered GWAS, but these results suggest that applying functional weighting to GWAS can accurately nominate additional novel loci from available samples for follow-up studies.
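The two metrics contrasted in this abstract can be made concrete with a minimal sketch. This is an illustration only, not the authors' evaluation code: it treats the loci nominated by a method and the loci confirmed in a later GWAS wave as plain sets, and the function name and locus labels are hypothetical.

```python
def sensitivity_and_ppv(nominated, confirmed):
    """Return (sensitivity, PPV) for a set of nominated loci.

    sensitivity = true positives / all confirmed loci
    PPV         = true positives / all nominated loci
    Both arguments are iterables of hashable locus identifiers (hypothetical labels).
    """
    nominated, confirmed = set(nominated), set(confirmed)
    true_positives = len(nominated & confirmed)
    sensitivity = true_positives / len(confirmed) if confirmed else 0.0
    ppv = true_positives / len(nominated) if nominated else 0.0
    return sensitivity, ppv


# A conservative method: few nominations, all of them later confirmed.
conservative = sensitivity_and_ppv({"locus1", "locus2"},
                                   {"locus1", "locus2", "locus3", "locus4"})
# A liberal method: many nominations, only some later confirmed.
liberal = sensitivity_and_ppv({"locus1", "locus2", "locus3", "locus5", "locus6", "locus7"},
                              {"locus1", "locus2", "locus3", "locus4"})
print(conservative, liberal)
```

In this toy case the conservative method has the higher PPV and the liberal method the higher sensitivity, which is the kind of tradeoff the title refers to.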
Affiliation(s)
- Amy Moore
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA.
| | - Jesse A Marks
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
| | - Bryan C Quach
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
| | - Yuelong Guo
- GeneCentric Therapeutics, Inc., Cary, NC, USA
| | - Laura J Bierut
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
| | - Nathan C Gaddis
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
| | - Dana B Hancock
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
| | - Grier P Page
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA
- Fellow Program, RTI International, Research Triangle Park, NC, 27709, USA
| | - Eric O Johnson
- Genomics and Translational Research Center, RTI International, Research Triangle Park, NC, 27709, USA.
- Fellow Program, RTI International, Research Triangle Park, NC, 27709, USA.
|
15
|
Tao LR, Ye Y, Zhao H. Early breast cancer risk detection: a novel framework leveraging polygenic risk scores and machine learning. J Med Genet 2023; 60:960-964. [PMID: 37055164 DOI: 10.1136/jmg-2022-108582] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/01/2022] [Accepted: 03/27/2023] [Indexed: 04/15/2023]
Abstract
BACKGROUND Breast cancer (BC) is the most common cancer and the second leading cause of cancer death in women; an estimated one in eight women in the USA will develop BC during her lifetime. However, current methods of BC screening, including clinical breast exams, mammograms, biopsies and others, are often underused due to limited access, expense and a lack of risk awareness, causing 30% (up to 80% in low-income and middle-income countries) of patients with BC to miss the precious early detection phase. METHODS This study creates a key step to supplement the current BC diagnostic pipeline: a prescreening platform, prior to traditional detection and diagnostic steps. We have developed BREast CAncer Risk Detection Application (BRECARDA), a novel framework that personalises BC risk assessment using artificial intelligence neural networks to incorporate relevant genetic and non-genetic risk factors. A polygenic risk score (PRS) was enhanced by employing AnnoPred and validated by fivefold cross-validation, outperforming three existing state-of-the-art PRS methods. RESULTS We used data from 97 597 female participants of the UK Biobank to train our algorithm. Using the enhanced PRS thus trained together with non-genetic information, BRECARDA was evaluated in a testing dataset with 48 074 UK Biobank female participants and achieved a high accuracy of 94.28% and area under the curve of 0.7861. Our optimised AnnoPred outperformed other state-of-the-art methods on quantifying genetic risk, indicating its potential for supplementing the current BC detection tests, population screening and risk evaluation. CONCLUSION BRECARDA can enhance disease risk prediction, identify high-risk individuals for BC screening, facilitate disease diagnosis and improve population-level screening efficiency. It can serve as a valuable and supplemental platform to assist doctors in BC diagnosis and evaluation.
Affiliation(s)
- Lynn Rose Tao
- Thomas Jefferson High School for Science and Technology, Alexandria, Virginia, USA
| | - Yixuan Ye
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA
|
16
|
He Q, Keding TJ, Zhang Q, Miao J, Russell JD, Herringa RJ, Lu Q, Travers BG, Li JJ. Neurogenetic mechanisms of risk for ADHD: Examining associations of polygenic scores and brain volumes in a population cohort. J Neurodev Disord 2023; 15:30. [PMID: 37653373 PMCID: PMC10469494 DOI: 10.1186/s11689-023-09498-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 08/21/2023] [Indexed: 09/02/2023] Open
Abstract
BACKGROUND ADHD polygenic scores (PGSs) have been previously shown to predict ADHD outcomes in several studies. However, ADHD PGSs are typically correlated with ADHD but not necessarily reflective of causal mechanisms. More research is needed to elucidate the neurobiological mechanisms underlying ADHD. We incorporated functional annotation information into an ADHD PGS to (1) improve the prediction performance over a non-annotated ADHD PGS and (2) test whether volumetric variation in brain regions putatively associated with ADHD mediates the association between PGSs and ADHD outcomes. METHODS Data were from the Philadelphia Neurodevelopmental Cohort (N = 555). Multiple mediation models were tested to examine the indirect effects of two ADHD PGSs (one using a traditional computation involving clumping and thresholding and another using a functionally annotated approach, i.e., AnnoPred) on ADHD inattention (IA) and hyperactivity-impulsivity (HI) symptoms, via gray matter volumes in the cingulate gyrus, angular gyrus, caudate, dorsolateral prefrontal cortex (DLPFC), and inferior temporal lobe. RESULTS A direct effect was detected between the AnnoPred ADHD PGS and IA symptoms in adolescents. No indirect effects via brain volumes were detected for either IA or HI symptoms. However, both ADHD PGSs were negatively associated with the DLPFC. CONCLUSIONS The AnnoPred ADHD PGS was a more developmentally specific predictor of adolescent IA symptoms compared to the traditional ADHD PGS. However, brain volumes did not mediate the effects of either a traditional or AnnoPred ADHD PGS on ADHD symptoms, suggesting that we may still be underpowered in clarifying brain-based biomarkers for ADHD using genetic measures.
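As a rough illustration of the "traditional" score mentioned in this abstract, a polygenic score is commonly computed as a weighted sum of allele dosages, keeping only variants whose GWAS p-value passes a threshold. The sketch below is an assumption-laden simplification: it omits the clumping step (it presumes the variants are already approximately independent), it is not AnnoPred and not the authors' pipeline, and the function name, default threshold, and example values are hypothetical.

```python
def thresholded_polygenic_score(dosages, effect_sizes, p_values, p_threshold=0.05):
    """Weighted sum of allele dosages over variants passing a p-value threshold.

    dosages      : per-variant effect-allele counts for one individual (0, 1 or 2)
    effect_sizes : per-variant GWAS effect estimates (e.g. betas or log odds ratios)
    p_values     : per-variant GWAS p-values
    p_threshold  : hypothetical inclusion threshold; real analyses usually try several
    Clumping for linkage disequilibrium is NOT performed here.
    """
    if not (len(dosages) == len(effect_sizes) == len(p_values)):
        raise ValueError("all inputs must have the same length")
    return sum(
        beta * dosage
        for dosage, beta, p in zip(dosages, effect_sizes, p_values)
        if p <= p_threshold
    )


# Three hypothetical variants; the second fails the threshold and is ignored.
score = thresholded_polygenic_score(
    dosages=[1, 2, 2],
    effect_sizes=[0.10, 0.20, -0.30],
    p_values=[1e-4, 0.50, 1e-3],
)
print(score)
```

Annotation-aware approaches differ in how the per-variant weights are derived, not in this final weighted-sum step.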
Affiliation(s)
- Quanfa He
- Department of Psychology, University of Wisconsin-Madison, 1202 W. Johnson Street, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, USA
| | | | - Qi Zhang
- Department of Educational Psychology, University of Wisconsin-Madison, Madison, USA
| | - Jiacheng Miao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
| | - Justin D Russell
- Department of Psychiatry, School of Medicine and Public Health, University of Wisconsin, Madison, USA
| | - Ryan J Herringa
- Department of Psychiatry, School of Medicine and Public Health, University of Wisconsin, Madison, USA
| | - Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, USA
- Department of Statistics, University of Wisconsin-Madison, Madison, USA
| | - Brittany G Travers
- Waisman Center, University of Wisconsin-Madison, Madison, USA
- Department of Kinesiology, University of Wisconsin-Madison, Madison, USA
| | - James J Li
- Department of Psychology, University of Wisconsin-Madison, 1202 W. Johnson Street, Madison, WI, 53706, USA.
- Waisman Center, University of Wisconsin-Madison, Madison, USA.
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, USA.
|
17
|
Ye Y, Noche RB, Szejko N, Both CP, Acosta JN, Leasure AC, Brown SC, Sheth KN, Gill TM, Zhao H, Falcone GJ. A genome-wide association study of frailty identifies significant genetic correlation with neuropsychiatric, cardiovascular, and inflammation pathways. GeroScience 2023; 45:2511-2523. [PMID: 36928559 PMCID: PMC10651618 DOI: 10.1007/s11357-023-00771-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/02/2022] [Accepted: 03/10/2023] [Indexed: 03/18/2023] Open
Abstract
Frailty is an aging-related clinical phenotype defined as a state in which there is an increase in a person's vulnerability for dependency and/or mortality when exposed to a stressor. While underlying mechanisms leading to the occurrence of frailty are complex, the importance of genetic factors has not been fully investigated. We conducted a large-scale genome-wide association study (GWAS) of frailty, as defined by the five criteria (weight loss, exhaustion, physical activity, walking speed, and grip strength) captured in the Fried Frailty Score (FFS), in 386,565 participants of European descent enrolled in the UK Biobank (UKB; mean age 57 [SD 8] years, 208,481 [54%] females). We identified 37 independent, novel loci associated with the FFS (p < 5 × 10⁻⁸), including seven loci without previously described associations with other traits. The variants associated with FFS were significantly enriched in brain tissues as well as aging-related pathways. Our post-GWAS bioinformatic analyses revealed significant genetic correlations between FFS and cardiovascular-, neurological-, and inflammation-related diseases/traits, and subsequent Mendelian Randomization analyses identified causal associations with chronic pain, obesity, diabetes, education-related traits, joint disorders, and depressive/neurological, metabolic, and respiratory diseases. The GWAS signals were replicated in the Health and Retirement Study (HRS, n = 9,720, mean age 73 [SD 7], 5,582 [57%] females), where the polygenic risk score built from the UKB GWAS was significantly associated with the FFS in HRS individuals (OR per SD of the score 1.27, 95% CI 1.22-1.31, p = 1.3 × 10⁻¹¹). These results provide new insight into the biology of frailty by comprehensively evaluating its genetic architecture.
Affiliation(s)
- Yixuan Ye
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Rommell B Noche
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
| | - Natalia Szejko
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
- Department of Neurology, Medical University of Warsaw, Warsaw, Poland
- Department of Bioethics, Medical University of Warsaw, Warsaw, Poland
| | - Cameron P Both
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
| | - Julian N Acosta
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
| | - Audrey C Leasure
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
| | - Stacy C Brown
- University of Hawai'i, John A. Burns School of Medicine, Honolulu, HI, USA
| | - Kevin N Sheth
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA
| | - Thomas M Gill
- Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
- Department of Biostatistics, Yale School of Public Health, 60 College Street, P.O. Box 208034, New Haven, CT, 06520, USA.
| | - Guido J Falcone
- Department of Neurology, Yale School of Medicine, 15 York Street, LLCI Room 1004D, P.O. Box 20801, New Haven, CT, 06510, USA.
| |
|
18
|
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases by affecting gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with the massive datasets, and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Yu Wang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Dong Huang
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
| | - Yu Liang
- Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
| | - Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Xiao-Wei Chen
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China.
| |
|
19
|
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
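For readers unfamiliar with the AUROC values quoted in this abstract, the metric can be read as the probability that a randomly chosen positive variant receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The sketch below computes it by direct pairwise comparison on made-up labels and scores; it illustrates the metric only and is not the benchmarking code used in the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical benchmark: 1 = functional/pathogenic, 0 = benign.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]
value = auroc(labels, scores)
```

The pairwise form is quadratic in the number of variants; for large benchmarks a rank-based formulation gives the same value more efficiently.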
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
| |
|
20
|
Johnson EC, Kapoor M, Hatoum AS, Zhou H, Polimanti R, Wendt FR, Walters RK, Lai D, Kember RL, Hartz S, Meyers JL, Peterson RE, Ripke S, Bigdeli TB, Fanous AH, Pato CN, Pato MT, Goate AM, Kranzler HR, O'Donovan MC, Walters JTR, Gelernter J, Edenberg HJ, Agrawal A. Investigation of convergent and divergent genetic influences underlying schizophrenia and alcohol use disorder. Psychol Med 2023; 53:1196-1204. [PMID: 34231451 PMCID: PMC8738774 DOI: 10.1017/s003329172100266x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Alcohol use disorder (AUD) and schizophrenia (SCZ) frequently co-occur, and large-scale genome-wide association studies (GWAS) have identified significant genetic correlations between these disorders. METHODS: We used the largest published GWAS for AUD (total cases = 77 822) and SCZ (total cases = 46 827) to identify genetic variants that influence both disorders (with either the same or opposite direction of effect) and those that are disorder specific. RESULTS: We identified 55 independent genome-wide significant single nucleotide polymorphisms with the same direction of effect on AUD and SCZ, 8 with robust effects in opposite directions, and 98 with disorder-specific effects. We also found evidence for 12 genes whose pleiotropic associations with AUD and SCZ are consistent with mediation via gene expression in the prefrontal cortex. The genetic covariance between AUD and SCZ was concentrated in genomic regions functional in brain tissues (p = 0.001). CONCLUSIONS: Our findings provide further evidence that SCZ shares meaningful genetic overlap with AUD.
Affiliation(s)
- Emma C Johnson
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - Manav Kapoor
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alexander S Hatoum
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - Hang Zhou
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
| | - Renato Polimanti
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
| | - Frank R Wendt
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
| | - Raymond K Walters
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Rachel L Kember
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
| | - Sarah Hartz
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - Jacquelyn L Meyers
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Henri Begleiter Neurodynamics Laboratory, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Roseann E Peterson
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
| | - Stephan Ripke
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Campus Mitte, Berlin, Germany
| | - Tim B Bigdeli
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Ayman H Fanous
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Carlos N Pato
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Michele T Pato
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Alison M Goate
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
| | - Michael C O'Donovan
- Division of Psychological Medicine and Clinical Neurosciences, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
| | - James T R Walters
- Division of Psychological Medicine and Clinical Neurosciences, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
| | - Joel Gelernter
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA
| | - Howard J Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| |
|
21
|
Zhang J, Zhao H. eQTL Studies: from Bulk Tissues to Single Cells. ARXIV 2023:arXiv:2302.11662v1. [PMID: 36866231 PMCID: PMC9980190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes, which can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and of the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
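The basic association underlying an eQTL can be illustrated with a simple linear regression of expression on genotype dosage, which is a common starting point for bulk-tissue analyses. The sketch below fits that regression by ordinary least squares on made-up data; the function name and values are hypothetical, and the cell-type-specific and context-dependent methods covered by the review are considerably more involved.

```python
def eqtl_fit(genotypes, expression):
    """Ordinary least-squares fit of expression ~ intercept + slope * genotype.

    genotypes: allele dosages (0, 1, or 2) per sample.
    expression: expression level of one gene per sample.
    Returns (slope, intercept); the slope is the per-allele effect estimate.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype has no variation in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept


# Toy data: each extra copy of the allele raises expression by 0.5 units.
genotypes = [0, 1, 2, 0, 1, 2]
expression = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
slope, intercept = eqtl_fit(genotypes, expression)
```

In practice the regression would also include covariates and a significance test, which are omitted here for brevity.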
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
| | - Hongyu Zhao
- Department of Biostatistics, Yale University
| |
|
22
|
Molecular Dynamic Simulation Analysis of a Novel Missense Variant in CYB5R3 Gene in Patients with Methemoglobinemia. MEDICINA (KAUNAS, LITHUANIA) 2023; 59:medicina59020379. [PMID: 36837579 PMCID: PMC9967277 DOI: 10.3390/medicina59020379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023]
Abstract
Background and Objective: Mutations in the CYB5R3 gene cause reduced NADH-dependent cytochrome b5 reductase enzyme function and consequently lead to recessive congenital methemoglobinemia (RCM). RCM exists as RCM type I (RCM1) and RCM type II (RCM2). RCM1 leads to higher methemoglobin levels causing only cyanosis, while in RCM2, neurological complications are also present along with cyanosis. Materials and Methods: In the current study, a consanguineous Pakistani family with three individuals showing clinical manifestations of cyanosis, chest pain radiating to the left arm, dyspnea, orthopnea, and hemoptysis was studied. Following clinical assessment, a search for the causative gene was performed using whole exome sequencing (WES) and Sanger sequencing. Various variant effect prediction tools and ACMG criteria were applied to interpret the pathogenicity of the prioritized variants. Molecular dynamic simulation studies of wild and mutant systems were performed to determine the stability of the mutant CYB5R3 protein. Results: Data analysis of WES revealed a novel homozygous missense variant NM_001171660.2: c.670A > T: NP_001165131.1: p.(Ile224Phe) in exon 8 of the CYB5R3 gene located on chromosome 22q13.2. Sanger sequencing validated the segregation of the identified variant with the disease phenotype within the family. Bioinformatics prediction tools and ACMG guidelines predicted the identified variant p.(Ile224Phe) as disease-causing and likely pathogenic, respectively. Molecular dynamics study revealed that the variant p.(Ile224Phe) in the CYB5R3 resides in the NADH domain of the protein, the aberrant function of which is detrimental. Conclusions: The present study expanded the variant spectrum of the CYB5R3 gene. This will facilitate genetic counselling of the same and other similar families carrying mutations in the CYB5R3 gene.
|
23
|
Li RY, Huang Y, Zhao Z, Qin ZS. Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome. Data Brief 2022; 46:108827. [PMID: 36582986 PMCID: PMC9792340 DOI: 10.1016/j.dib.2022.108827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022]
Abstract
This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome in 100-bp resolution with full genome-wide coverage. The datasets are processed from raw read count data collected from five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org). Data from high-throughput sequencing assays were processed and crystallized into a total of 6,305 genome-wide profiles. To ensure the quality of the features, we filtered out assays with low read depth, inconsistent read counts, and poor data quality. The types of sequencing-based experiment assays include DNase-seq, histone and TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. Merging of processed data was done by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts using disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index enabling fast user retrieval of read counts given coordinates in the human genome. The data processing pipeline is replicable for users' own purposes and for other experimental assays. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the processing pipeline can be easily applied to data from any other genome-wide profiling assays, expanding the amount of available data.
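The binning scheme described in this abstract (fixed 100-bp bins tiling the genome, with read counts averaged across technical replicates) can be sketched as follows. The 1-based coordinate convention and the function names are assumptions made for illustration; the published dataset and its tabix index may use a different convention.

```python
BIN_SIZE = 100  # bin width in base pairs, as stated in the abstract


def bin_index(position):
    """0-based index of the bin containing a 1-based genomic position (assumed convention)."""
    if position < 1:
        raise ValueError("position must be >= 1")
    return (position - 1) // BIN_SIZE


def bin_interval(index):
    """Inclusive 1-based (start, end) coordinates covered by a bin."""
    start = index * BIN_SIZE + 1
    return start, start + BIN_SIZE - 1


def average_replicates(replicate_counts):
    """Average read counts bin by bin across technical replicates.

    replicate_counts: list of equal-length lists, one per replicate.
    """
    n = len(replicate_counts)
    if n == 0:
        raise ValueError("need at least one replicate")
    return [sum(column) / n for column in zip(*replicate_counts)]


# Hypothetical variant position and two replicates covering two bins.
variant_bin = bin_index(12345)
merged = average_replicates([[2, 4], [4, 8]])
```

Looking up a risk variant then reduces to computing its bin index on the relevant chromosome and reading the merged signal for that bin.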
Affiliation(s)
- Ronnie Y. Li
- Graduate program in Neuroscience, Emory University, United States
| | - Yanting Huang
- Department of Computer Science, Emory University, United States
| | - Zhiyue Zhao
- Department of Computer Science, Emory University, United States
| | - Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Emory University, United States
- Corresponding author.
| |
|
24
|
Garcia FADO, de Andrade ES, Palmero EI. Insights on variant analysis in silico tools for pathogenicity prediction. Front Genet 2022; 13:1010327. [PMID: 36568376 PMCID: PMC9774026 DOI: 10.3389/fgene.2022.1010327] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/02/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022]
Abstract
Molecular biology is currently a fast-advancing science. Sequencing techniques are getting cheaper, but the interpretation of genetic variants requires expertise and computational power and therefore remains a challenge. Next-generation sequencing yields thousands of variants, and to classify them researchers have proposed protocols with several parameters. Here we present a review of several in silico pathogenicity prediction tools involved in the variant prioritization/classification process used by some international protocols for variant analysis, along with studies evaluating their efficiency.
Affiliation(s)
| | | | - Edenir Inez Palmero
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, Brazil; National Institute of Cancer, Rio de Janeiro, Brazil
| |
|
25
|
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022]
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
| | - Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
| | - Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
| | - Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA; Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
| | - Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | | |
|
26
|
Multi-omics approach dissects cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy. Am J Hum Genet 2022; 109:2029-2048. [PMID: 36243009 PMCID: PMC9674966 DOI: 10.1016/j.ajhg.2022.09.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/26/2022] [Accepted: 09/28/2022] [Indexed: 01/26/2023]
Abstract
North Carolina macular dystrophy (NCMD) is a rare autosomal-dominant disease affecting macular development. The disease is caused by non-coding single-nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 promoters and the identification of eighteen candidate cis-regulatory elements (cCREs), the activity of which was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two hotspot regions of PRDM13 affect cCRE activity, including two NCMD-associated non-coding SNVs that we identified herein. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, suggesting that this region is a PRDM13 enhancer. Finally, mining of single-cell transcriptional data of embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study provides insight into the cis-regulatory mechanisms of NCMD and supports that this condition is a retinal enhanceropathy.
|
27
|
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023]
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
|
28
|
Li C, Zhi D, Wang K, Liu X. MetaRNN: differentiating rare pathogenic and rare benign missense SNVs and InDels using deep learning. Genome Med 2022; 14:115. [PMID: 36209109 PMCID: PMC9548151 DOI: 10.1186/s13073-022-01120-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/24/2021] [Accepted: 09/22/2022] [Indexed: 11/22/2022]
Abstract
Multiple computational approaches have been developed to improve our understanding of genetic variants. However, their ability to identify rare pathogenic variants from rare benign ones is still lacking. Using context annotations and deep learning methods, we present pathogenicity prediction models, MetaRNN and MetaRNN-indel, to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs). We use independent test sets to demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. Importantly, prediction scores from both models are comparable, enabling easy adoption of integrated genotype-phenotype association analysis methods. All pre-computed nsSNV scores are available at http://www.liulab.science/MetaRNN. The stand-alone program is also available at https://github.com/Chang-Li2019/MetaRNN.
Affiliation(s)
- Chang Li
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL, 33612, USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kai Wang
- Children's Hospital of Philadelphia & Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xiaoming Liu
- USF Genomics & College of Public Health, University of South Florida, 3720 Spectrum Boulevard, Suite 304, Tampa, FL, 33612, USA
|
29
|
Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu WL, Lee NC, Lai F. Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization. JMIR Bioinformatics and Biotechnology 2022; 3:e37701. [PMID: 38935959 PMCID: PMC11168239 DOI: 10.2196/37701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 07/29/2022] [Accepted: 08/22/2022] [Indexed: 06/29/2024]
Abstract
BACKGROUND In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although exome single-nucleotide variant (SNV) data can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. OBJECTIVE This study aims to help physicians automatically interpret the genetic variation information generated by NGS in a short period. To determine the true causal variants of a patient with genetic disease, physicians currently often need to review numerous features of every variant manually and search the literature in different databases to understand the effect of genetic variation. METHODS We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panels as a training set, and then integrated variant annotations from multiple genetic databases for model training. The resulting model ranked SNVs and output the most likely disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We applied sequencing data and phenotypic information, automatically extracted by a keyword extraction tool from patients' electronic medical records, to our machine learning model. RESULTS We succeeded in locating 92.5% (124/134) of the causative variants in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4% (48/108) of the patients. The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer performs better than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the proportion in the top 10 ranking list increased to 93.5% (101/108). CONCLUSIONS We successfully applied sequencing data from WES and free-text phenotypic information on each patient's disease, automatically extracted by the keyword extraction tool, for model training and testing. By interpreting our model, we identified which features of variants are important. In addition, we achieved a satisfactory result in finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the proportion in the top 10 ranking list increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with genetic diagnosis.
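The cumulative rank figures quoted in this abstract (the share of cases whose causal variant or gene falls within the top 1, 5, 10, or 20 candidates) can be illustrated with a minimal Python sketch. The function name and the example ranks below are hypothetical and are not taken from the study; the sketch only shows how such a top-k hit rate is usually computed from one rank per case.

```python
from typing import Sequence


def top_k_hit_rate(ranks: Sequence[int], k: int) -> float:
    """Fraction of cases whose target is ranked within the top k (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the causal variant, one per patient (illustration only).
example_ranks = [1, 3, 12, 7, 1, 25, 2, 9]

if __name__ == "__main__":
    # Cumulative hit rate at the same cut-offs the abstract reports.
    for k in (1, 5, 10, 20):
        print(f"top-{k}: {top_k_hit_rate(example_ranks, k):.3f}")
```

Under this definition, a figure such as 124/134 corresponds to 124 of 134 targets receiving a rank of 10 or better.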
Affiliation(s)
- Yu-Shan Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Ching Hsu
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yu-Chang Chune
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- I-Cheng Liao
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Hsin Wang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yi-Lin Lin
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Wuh-Liang Hwu
- Department of Pediatrics, National Taiwan University Hospital, Taipei City, Taiwan
- Ni-Chung Lee
- Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Feipei Lai
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
|
30
|
Integrating variant functional annotation scores have varied abilities to improve power of genome-wide association studies. Sci Rep 2022; 12:10720. [PMID: 35750789 PMCID: PMC9232605 DOI: 10.1038/s41598-022-14924-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Accepted: 06/15/2022] [Indexed: 11/12/2022] Open
Abstract
Functional annotations have the potential to increase power of genome-wide association studies (GWAS) by prioritizing variants according to their biological function, but this potential has not been well studied. We comprehensively evaluated all 1132 traits in the UK Biobank whose SNP-heritability estimates were given “medium” or “high” labels by Neale’s lab. For each trait, we integrated GWAS summary statistics of close to 8 million common variants (minor allele frequency >1%) with either their 75 individual functional scores or their meta-scores, using three different data-integration methods. Overall, the number of new genome-wide significant findings after data-integration increases as a trait SNP-heritability estimate increases. However, there is a trade-off between new findings and loss of baseline GWAS findings, resulting in similar total numbers of significant findings between using GWAS alone and integrating GWAS with functional scores, across all 1132 traits analyzed and all three data-integration methods considered. Our findings suggest that, even with the current biobank-level sample size, more informative functional scores and/or new data-integration methods are needed to further improve the power of GWAS of common variants. For example, studying variants in coding sequence and obtaining cell-type-specific scores are potential future directions.
|
31
|
Chimusa ER, Alosaimi S, Bope CD. Dissecting Generalizability and Actionability of Disease-Associated Genes From 20 Worldwide Ethnolinguistic Cultural Groups. Front Genet 2022; 13:835713. [PMID: 35812734 PMCID: PMC9263835 DOI: 10.3389/fgene.2022.835713] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/12/2022] [Accepted: 04/29/2022] [Indexed: 11/30/2022] Open
Abstract
Findings resulting from whole-genome sequencing (WGS) have markedly increased owing to rapid advances in sequencing methods and have led to further investigations such as clinical actionability of genes, as documented by the American College of Medical Genetics and Genomics (ACMG). ACMG's actionable genes (ACGs) may not necessarily be clinically actionable across all populations worldwide. It is critical to examine the actionability of these genes in different populations. Here, we have leveraged a combined WES from the African Genome Variation and 1000 Genomes Project to examine the generalizability of ACG and potential actionable genes from four diseases: high-burden malaria, TB, HIV/AIDS, and sickle cell disease. Our results suggest that ethnolinguistic cultural groups from Africa, particularly Bantu and Khoesan, have high genetic diversity, a high proportion of derived alleles at low minor allele frequency (0.0-0.1), and the highest proportion of pathogenic variants within HIV, TB, malaria, and sickle cell diseases. In contrast, ethnolinguistic cultural groups from outside Africa, including Latin American, Afro-related, and European-related groups, have a higher proportion of pathogenic variants within ACG than most of the ethnolinguistic cultural groups from Africa. Overall, our results show high genetic diversity in the present actionable and known disease-associated genes of four African high-burden diseases, suggesting the limitation of transferability or generalizability of ACG. This supports the use of personalized medicine as beneficial to the worldwide population as well as actionable gene list recommendation to further foster equitable global healthcare. The results point out the bias in the knowledge about the frequency distribution of these phenotypes and genetic variants associated with some diseases, especially in African and African ancestry populations.
Affiliation(s)
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Shatha Alosaimi
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Christian D Bope
- Division of Human Genetics, Department of Pathology, University of Cape Town, Medical School Cape Town, Cape Town, South Africa
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Department of Mathematics and Computer Science, University of Kinshasa, Kinshasa, Congo
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
|
32
|
Chen D, Wang X, Huang T, Jia J. Sleep and Late-Onset Alzheimer's Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 2022; 13:794202. [PMID: 35656316 PMCID: PMC9152224 DOI: 10.3389/fgene.2022.794202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 03/23/2022] [Indexed: 12/30/2022] Open
Abstract
Late-onset Alzheimer's disease (AD) is associated with sleep-related phenotypes (SRPs), but whether they share a common genetic etiology remains largely unknown. We explored the shared genetics and causality between AD and SRPs by using high-definition likelihood (HDL), cross-phenotype association study (CPASSOC), transcriptome-wide association study (TWAS), and bidirectional Mendelian randomization (MR) in summary-level data for AD (N = 455,258) and summary-level data for seven SRPs (sample size ranges from 359,916 to 1,331,010). AD shared a strong genetic basis with insomnia (r_g = 0.20; p = 9.70 × 10⁻⁵), snoring (r_g = 0.13; p = 2.45 × 10⁻³), and sleep duration (r_g = −0.11; p = 1.18 × 10⁻³). The CPASSOC identified 31 independent loci shared between AD and SRPs, including four novel shared loci. Functional analysis and the TWAS showed that shared genes were enriched in liver, brain, breast, and heart tissues and highlighted the regulatory roles of immunological disorders, very-low-density lipoprotein particle clearance, triglyceride-rich lipoprotein particle clearance, chylomicron remnant clearance, and positive regulation of T-cell-mediated cytotoxicity pathways. Protein-protein interaction analysis identified three potential drug target genes (APOE, MARK4, and HLA-DRA) that interacted with known FDA-approved drug target genes. The CPASSOC and TWAS demonstrated that three regions (11p11.2, 6p22.3, and 16p11.2) may account for the shared basis between AD and sleep duration or snoring. MR showed that insomnia had a causal effect on AD (OR_IVW = 1.02, P_IVW = 6.7 × 10⁻⁶), and multivariate MR suggested a potential role of sleep duration and major depression in this association. Our findings provide strong evidence of shared genetics and causation between AD and sleep abnormalities and advance our understanding of the genetic overlap between them. Identifying shared drug targets and molecular pathways can be beneficial for treating AD and sleep disorders more efficiently.
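The IVW odds ratio quoted in this abstract comes from an inverse-variance weighted (IVW) Mendelian randomization estimate. The following is a minimal Python sketch of the standard fixed-effect IVW estimator, assuming independent instruments and first-order weights; the abstract does not state which IVW variant the authors used, and the function name and input numbers below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and standard error from per-SNP summary statistics.

    Each SNP j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2 (first-order weights).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_ivw = math.sqrt(1.0 / total_weight)
    return beta_ivw, se_ivw


if __name__ == "__main__":
    # Hypothetical summary statistics for three instruments (illustration only).
    beta, se = ivw_estimate([0.10, 0.20, 0.15], [0.010, 0.022, 0.014], [0.01, 0.01, 0.02])
    # For a binary outcome on the log-odds scale, the odds ratio is exp(beta).
    print(f"beta_IVW={beta:.4f}, se={se:.4f}, OR_IVW={math.exp(beta):.4f}")
```

On this scale an odds ratio such as 1.02 corresponds to a log-odds estimate of about 0.02 per unit of the exposure.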
Affiliation(s)
- Dongze Chen
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Xinpei Wang
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Tao Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China; Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, Beijing, China; Center for Intelligent Public Health, Institute for Artificial Intelligence, Peking University, Beijing, China
- Jinzhu Jia
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China; Center for Statistical Science, Peking University, Beijing, China
|
33
|
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
- Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
|
34
|
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Though genome-wide association studies have identified tens of thousands of variants associated with complex traits and most of them fall within the non-coding regions, they may not be the causal ones. The development of high-throughput functional assays leads to the discovery of experimental validated non-coding functional variants. However, these validated variants are rare due to technical difficulty and financial cost. The small sample size of validated variants makes it less reliable to develop a supervised machine learning model for achieving a whole genome-wide prediction of non-coding causal variants. RESULTS We will exploit a deep transfer learning model, which is based on convolutional neural network, to improve the prediction for functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs to improve the learning of low-level features and context-specific functional NCVs to learn high-level features toward the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods in both MPRA and GWAS datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/TLVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Chen
- To whom correspondence should be addressed.
- Fengdi Zhao
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
35
|
Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity. Am J Hum Genet 2022; 109:457-470. [PMID: 35120630 PMCID: PMC8948164 DOI: 10.1016/j.ajhg.2022.01.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/20/2021] [Accepted: 01/11/2022] [Indexed: 12/11/2022] Open
Abstract
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.
|
36
|
Li X, Yung G, Zhou H, Sun R, Li Z, Hou K, Zhang MJ, Liu Y, Arapoglou T, Wang C, Ionita-Laza I, Lin X. A multi-dimensional integrative scoring framework for predicting functional variants in the human genome. Am J Hum Genet 2022; 109:446-456. [PMID: 35216679 PMCID: PMC8948160 DOI: 10.1016/j.ajhg.2022.01.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/04/2021] [Accepted: 01/26/2022] [Indexed: 12/26/2022] Open
Abstract
Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Godwin Yung
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Methods, Collaboration and Outreach Group, Genentech/Roche, South San Francisco, CA 94080, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Ryan Sun
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Kangcheng Hou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Martin Jinye Zhang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Yaowu Liu
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan, China
- Theodore Arapoglou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Chen Wang
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Statistics, Harvard University, Cambridge, MA, 02138, USA
|
37
|
Anderson D, Lassmann T. An expanded phenotype centric benchmark of variant prioritisation tools. Hum Mutat 2022; 43:539-546. [PMID: 35224813 PMCID: PMC9313608 DOI: 10.1002/humu.24362] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Revised: 01/18/2022] [Accepted: 02/23/2022] [Indexed: 11/17/2022]
Abstract
Identifying the causal variant for diagnosis of genetic diseases is challenging when using next‐generation sequencing approaches and variant prioritization tools can assist in this task. These tools provide in silico predictions of variant pathogenicity, however they are agnostic to the disease under study. We previously performed a disease‐specific benchmark of 24 such tools to assess how they perform in different disease contexts. We found that the tools themselves show large differences in performance, but more importantly that the best tools for variant prioritization are dependent on the disease phenotypes being considered. Here we expand the assessment to 37 tools and refine our assessment by separating performance for nonsynonymous single nucleotide variants (nsSNVs) and missense variants (i.e., excluding nonsense variants). We found differences in performance for missense variants compared to nsSNVs and recommend three tools that stand out in terms of their performance (BayesDel, CADD, and ClinPred).
Affiliation(s)
- Denise Anderson
- Telethon Kids Institute, The University of Western Australia, Subiaco, Western Australia 6008, Australia
- Timo Lassmann
- Telethon Kids Institute, The University of Western Australia, Subiaco, Western Australia 6008, Australia
|
38
|
DVPred: a disease-specific prediction tool for variant pathogenicity classification for hearing loss. Hum Genet 2022; 141:401-411. [PMID: 35182233 DOI: 10.1007/s00439-022-02440-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Accepted: 02/06/2022] [Indexed: 02/08/2023]
Abstract
Numerous computational prediction tools have been introduced to estimate the functional impact of variants in the human genome based on evolutionary constraints and biochemical metrics. However, their implementation in diagnostic settings to classify variants faced challenges with accuracy and validity. Most existing tools are pan-genome and pan-diseases, which neglected gene- and disease-specific properties and limited the accessibility of curated data. As a proof-of-concept, we developed a disease-specific prediction tool named Deafness Variant deleteriousness Prediction tool (DVPred) that focused on the 157 genes reportedly causing genetic hearing loss (HL). DVPred applied the gradient boosting decision tree (GBDT) algorithm to the dataset consisting of expert-curated pathogenic and benign variants from a large in-house HL patient cohort and public databases. With the incorporation of variant-level and gene-level features, DVPred outperformed the existing universal tools. It boasts an area under the curve (AUC) of 0.98, and showed consistent performance (AUC = 0.985) in an independent assessment dataset. We further demonstrated that multiple gene-level metrics, including low complexity genomic regions and substitution intolerance scores, were the top features of the model. A comprehensive analysis of missense variants showed a gene-specific ratio of predicted deleterious and neutral variants, implying varied tolerance or intolerance to variation in different genes. DVPred explored the utility of disease-specific strategy in improving the deafness variant prediction tool. It can improve the prioritization of pathogenic variants among massive variants identified by high-throughput sequencing on HL genes. It also shed light on the development of variant prediction tools for other genetic disorders.
|
39
|
Garcia FADO, de Andrade ES, de Campos Reis Galvão H, da Silva Sábato C, Campacci N, de Paula AE, Evangelista AF, Santana IVV, Melendez ME, Reis RM, Palmero EI. New insights on familial colorectal cancer type X syndrome. Sci Rep 2022; 12:2846. [PMID: 35181726 PMCID: PMC8857274 DOI: 10.1038/s41598-022-06782-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/29/2021] [Accepted: 12/17/2021] [Indexed: 12/22/2022] Open
Abstract
Familial colorectal cancer type X (FCCTX) is a heterogeneous colorectal cancer predisposition syndrome that, although it displays a cancer pattern similar to Lynch syndrome, is mismatch repair proficient and does not exhibit microsatellite instability. Moreover, its genetic etiology remains to be elucidated. In this study we performed germline exome sequencing of 39 cancer-affected patients from 34 families at risk for FCCTX. Variant classification followed the American College of Medical Genetics and Genomics (ACMG) guidelines. Pathogenic/likely pathogenic variants were identified in 17.65% of the families. Rare and potentially pathogenic alterations were identified in known hereditary cancer genes (CHEK2), in putative FCCTX candidate genes (OGG1 and FAN1) and in other cancer-related genes such as ATR, ASXL1, PARK2, SLX4 and TREX1. This study provides important new clues that can contribute to the understanding of the genetic basis of FCCTX.
Affiliation(s)
- Felipe Antonio de Oliveira Garcia
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil
- Edilene Santos de Andrade
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil
- Natália Campacci
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil
- Adriane Feijó Evangelista
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil
- Matias Eliseo Melendez
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil; Department of Molecular Carcinogenesis, Brazilian National Cancer Institute, Rio de Janeiro, Brazil
- Rui Manuel Reis
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil; Center of Molecular Diagnosis, Barretos Cancer Hospital, Barretos, São Paulo, Brazil; Life and Health Sciences Research Institute (ICVS), Medical School, University of Minho, Braga, Portugal; ICVS/3B's-PT Government Associate Laboratory, Braga/Guimarães, Portugal
- Edenir Inez Palmero
- Molecular Oncology Research Center, Barretos Cancer Hospital, Antenor Duarte Villela Street, 1331, Barretos, São Paulo, CEP 14784-400, Brazil; Department of Genetics, Brazilian National Cancer Institute, Rio de Janeiro, Brazil
|
40
|
Khatiwada A, Wolf BJ, Yilmaz AS, Ramos PS, Pietrzak M, Lawson A, Hunt KJ, Kim HJ, Chung D. GPA-Tree: statistical approach for functional-annotation-tree-guided prioritization of GWAS results. Bioinformatics 2022; 38:1067-1074. [PMID: 34849578 PMCID: PMC10060690 DOI: 10.1093/bioinformatics/btab802] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/01/2021] [Revised: 10/09/2021] [Accepted: 11/23/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with a small or moderate effect size. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework. RESULTS: First, we ran simulation studies to evaluate the proposed GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifies the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B, memory helper T, regulatory T, neutrophils and CD8+ memory T cells, in SLE. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and of potential mechanisms linking risk-associated SNPs with complex traits. AVAILABILITY AND IMPLEMENTATION: The GPATree software is available at https://dongjunchung.github.io/GPATree/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aastha Khatiwada
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Bethany J Wolf
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Ayse Selen Yilmaz
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Paula S Ramos
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC 29425, USA
- Maciej Pietrzak
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Andrew Lawson
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Kelly J Hunt
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Hang J Kim
- Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH 45221, USA
- Dongjun Chung
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
|
41
|
Computational Resources for the Interpretation of Variations in Cancer. Advances in Experimental Medicine and Biology 2022; 1361:177-198. [DOI: 10.1007/978-3-030-91836-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
42
|
Cao Z, Huang Y, Duan R, Jin P, Qin ZS, Zhang S. Disease category-specific annotation of variants using an ensemble learning framework. Brief Bioinform 2021; 23:6394995. [PMID: 34643213 DOI: 10.1093/bib/bbab438] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 09/03/2021] [Accepted: 09/22/2021] [Indexed: 02/01/2023]
Abstract
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to score genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction for 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Affiliation(s)
- Zhen Cao
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yanting Huang
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA
- Ran Duan
- Department of Software Engineering, Yunnan University, Kunming 650500, China
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Zhaohui S Qin
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
|
43
|
Zhou Q, Wang J, Xia L, Li R, Zhang Q, Pan S. SYN1 Mutation Causes X-Linked Toothbrushing Epilepsy in a Chinese Family. Front Neurol 2021; 12:736977. [PMID: 34616357 PMCID: PMC8488375 DOI: 10.3389/fneur.2021.736977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2021] [Accepted: 08/25/2021] [Indexed: 11/15/2022]
Abstract
Toothbrushing epilepsy is a rare form of reflex epilepsy (RE) with sporadic incidence. This study aimed to characterize the genetic profile of reflex epilepsy patients with toothbrushing-induced seizures in a Chinese family. Solo clinical whole-exome sequencing (WES) of the proband, a 37-year-old Chinese man, was performed to characterize the genetic etiology of toothbrushing epilepsy. Mutations in the maternal X-linked synapsin 1 (SYN1) gene identified in the proband and his family members were confirmed by Sanger sequencing. The pathogenicity of these mutations was determined using in silico analysis. The proband had four episodes of toothbrushing-induced seizures. The semiology included nausea and twitching of the right side of the mouth and face, followed by a generalized tonic-clonic seizure (GTCS). The proband's elder maternal uncle had three toothbrushing-induced epileptic seizures at the age of 26. The proband's younger maternal uncle had no history of epileptic seizures but had a learning disability and aggressive tendencies. We identified a deleterious nonsense mutation, c.1807C>T (p.Q603Ter), in exon 12 of the SYN1 gene (NM_006950), which can result in a truncated SYN1 phosphoprotein with altered flexibility and hydropathicity. This novel mutation has not been reported in the 1000G, EVS, ExAC, gnomAD, or HGMD databases. In summary, we identified a novel X-linked SYN1 exon 12 mutation in a Chinese family with toothbrushing epilepsy. Our findings provide novel insights into the mechanism of this complex form of reflex epilepsy that could potentially be applied in disease diagnosis.
Affiliation(s)
- Qin Zhou
- Department of Neurology, Renmin Hospital, Wuhan University, Wuhan, China
- Jingwei Wang
- Department of Clinical Laboratory, Renmin Hospital of Wuhan University, Wuhan, China
- Li Xia
- Department of Neurology, Renmin Hospital, Wuhan University, Wuhan, China
- Rong Li
- Department of Neurology, Renmin Hospital, Wuhan University, Wuhan, China
- Qiumin Zhang
- Department of Neurology, Renmin Hospital, Wuhan University, Wuhan, China
- Songqing Pan
- Department of Neurology, Renmin Hospital, Wuhan University, Wuhan, China
|
44
|
Wu Y, Liu H, Li R, Sun S, Weile J, Roth FP. Improved pathogenicity prediction for rare human missense variants. Am J Hum Genet 2021; 108:1891-1906. [PMID: 34551312 PMCID: PMC8546039 DOI: 10.1016/j.ajhg.2021.08.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/24/2021] [Accepted: 08/18/2021] [Indexed: 01/01/2023]
Abstract
The success of personalized genomic medicine depends on our ability to assess the pathogenicity of rare human variants, including the important class of missense variation. There are many challenges in training accurate computational systems, e.g., in finding the balance between quantity, quality, and bias in the variant sets used as training examples and avoiding predictive features that can accentuate the effects of bias. Here, we describe VARITY, which judiciously exploits a larger reservoir of training examples with uncertain accuracy and representativity. To limit circularity and bias, VARITY excludes features informed by variant annotation and protein identity. To provide a rationale for each prediction, we quantified the contribution of features and feature combinations to the pathogenicity inference of each variant. VARITY outperformed all previous computational methods evaluated, identifying at least 10% more pathogenic variants at thresholds achieving high (90% precision) stringency.
|
45
|
Hutchinson A, Reales G, Willis T, Wallace C. Leveraging auxiliary data from arbitrary distributions to boost GWAS discovery with Flexible cFDR. PLoS Genet 2021; 17:e1009853. [PMID: 34669738 PMCID: PMC8559959 DOI: 10.1371/journal.pgen.1009853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Revised: 11/01/2021] [Accepted: 09/30/2021] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with complex traits. However, a stringent significance threshold is required to identify robust genetic associations. Leveraging relevant auxiliary covariates has the potential to boost statistical power to exceed the significance threshold. In particular, abundant pleiotropy and the non-random distribution of SNPs across various functional categories suggest that leveraging GWAS test statistics from related traits and/or functional genomic data may boost GWAS discovery. While type 1 error rate control has become standard in GWAS, control of the false discovery rate can be a more powerful approach. The conditional false discovery rate (cFDR) extends the standard FDR framework by conditioning on auxiliary data to call significant associations, but current implementations are restricted to auxiliary data satisfying specific parametric distributions, typically GWAS p-values for related traits. We relax these distributional assumptions, enabling an extension of the cFDR framework that supports auxiliary covariates from arbitrary continuous distributions ("Flexible cFDR"). Our method can be applied iteratively, thereby supporting multi-dimensional covariate data. Through simulations we show that Flexible cFDR increases sensitivity whilst controlling FDR after one or several iterations. We further demonstrate its practical potential through application to an asthma GWAS, leveraging various functional genomic data to find additional genetic associations for asthma, which we validate in the larger, independent UK Biobank data resource.
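To make the contrast between type 1 error control and FDR control concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure alongside a Bonferroni threshold. It is not the Flexible cFDR method described in this abstract, which additionally conditions on auxiliary covariates; the function name `benjamini_hochberg` and the p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k whose p-value lies under the line alpha * k / m.
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bh_calls = benjamini_hochberg(pvals, alpha=0.05)
# Bonferroni (family-wise type 1 error control) for comparison.
bonferroni_calls = [p <= 0.05 / len(pvals) for p in pvals]
```

On these toy values Benjamini-Hochberg rejects the two smallest p-values while Bonferroni rejects only the smallest, which is the kind of power gain the abstract alludes to.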
Affiliation(s)
- Anna Hutchinson
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Guillermo Reales
- Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, United Kingdom
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Thomas Willis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Chris Wallace
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, United Kingdom
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom
|
46
|
Jin Y, Jiang J, Wang R, Qin ZS. Systematic Evaluation of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity. Front Genet 2021; 12:667866. [PMID: 34567058 PMCID: PMC8458901 DOI: 10.3389/fgene.2021.667866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Accepted: 08/02/2021] [Indexed: 02/01/2023]
Abstract
The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. A possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby affect the in vivo binding of the TF. However, their impact varies, since many positions within a TF binding motif are not well conserved. Therefore, simply annotating all variants located in putative TF binding sites may overestimate the functional impact of these SNVs. We conducted a comprehensive survey to study the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. From the results obtained on 18 TF binding motifs, we found substantial variation in an SNV's impact on TF binding affinity. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.
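The point that poorly conserved motif positions blunt an SNV's effect can be illustrated with a simple position weight matrix (PWM) log-odds score. This is a swapped-in, simpler technique than the sequence-based machine learning method used in the paper; the 3-position motif, the uniform background, and the names `pwm_score` and `snv_delta` are all hypothetical.

```python
import math

# Hypothetical 3-position motif: per-position base probabilities (illustrative only).
MOTIF = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},  # strongly conserved position
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # fully degenerate position
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # strongly conserved position
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(site):
    """Sum of per-position log2 odds of the motif versus the background."""
    if len(site) != len(MOTIF):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))


def snv_delta(site, position, alt):
    """Change in PWM score when the base at `position` is replaced by `alt`."""
    mutated = site[:position] + alt + site[position + 1:]
    return pwm_score(mutated) - pwm_score(site)


conserved_hit = snv_delta("ACG", 0, "C")   # SNV at a conserved position: large drop
degenerate_hit = snv_delta("ACG", 1, "T")  # SNV at a degenerate position: no change
```

Under these assumptions, both SNVs fall inside the putative site, but only the one at the conserved position changes the score, which mirrors the abstract's caution against treating every in-motif variant as functional.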
Affiliation(s)
- Yutong Jin
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Jiahui Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Ruixuan Wang
- College of Environmental Sciences and Engineering, Peking University, Beijing, China
- Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
|
47
|
Kim HY, Jeon W, Kim D. An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks. Sci Rep 2021; 11:19127. [PMID: 34580383 PMCID: PMC8476491 DOI: 10.1038/s41598-021-98693-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Accepted: 09/07/2021] [Indexed: 11/09/2022]
Abstract
The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of them suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a knowledge distillation technique known as Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity, which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for predicting the deleteriousness of variants.
Affiliation(s)
- Ha Young Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
- Woosung Jeon
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
|
48
|
Fisher V, Sebastiani P, Cupples LA, Liu CT. ANNORE: Genetic fine mapping with functional annotation. Hum Mol Genet 2021; 31:32-40. [PMID: 34302344 DOI: 10.1093/hmg/ddab210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Revised: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified loci of the human genome implicated in numerous complex traits. However, the limitations of this study design make it difficult to identify specific causal variants or biological mechanisms of association. We propose a novel method, AnnoRE, which uses GWAS summary statistics, local correlation structure among genotypes, and functional annotation from external databases to prioritize the most plausible causal SNPs in each trait-associated locus. Our proposed method improves upon previous fine mapping approaches by estimating the effects of functional annotation from genome-wide summary statistics, allowing for the inclusion of many annotation categories. By implementing a multiple regression model with differential shrinkage via random effects, we avoid reductive assumptions on the number of causal SNPs per locus. Application of this method to a large GWAS meta-analysis of body mass index identified six loci with significant evidence in favor of one or more variants. In an additional 24 loci, one or two variants were strongly prioritized over others in the region. The use of functional annotation in genetic fine mapping studies helps to distinguish between variants in high LD, and to identify promising targets for follow-up studies.
Affiliation(s)
- Virginia Fisher
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA; Tufts Medical Center, Boston, MA 02111, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Ching-Ti Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
|
49
|
Huang Y, Sun X, Jiang H, Yu S, Robins C, Armstrong MJ, Li R, Mei Z, Shi X, Gerasimov ES, De Jager PL, Bennett DA, Wingo AP, Jin P, Wingo TS, Qin ZS. A machine learning approach to brain epigenetic analysis reveals kinases associated with Alzheimer's disease. Nat Commun 2021; 12:4472. [PMID: 34294691 PMCID: PMC8298578 DOI: 10.1038/s41467-021-24710-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/19/2020] [Accepted: 06/28/2021] [Indexed: 12/21/2022]
Abstract
Alzheimer's disease (AD) is influenced by both genetic and environmental factors; thus, brain epigenomic alterations may provide insights into AD pathogenesis. Multiple array-based Epigenome-Wide Association Studies (EWASs) have identified robust brain methylation changes in AD; however, array-based assays only test about 2% of all CpG sites in the genome. Here, we develop EWASplus, a computational method that uses a supervised machine learning strategy to extend EWAS coverage to the entire genome. Application to six AD-related traits predicts hundreds of new significant brain CpGs associated with AD, some of which are further validated experimentally. EWASplus also performs well on data collected from independent cohorts and different brain regions. Genes found near top EWASplus loci are enriched for kinases and for genes with evidence for physical interactions with known AD genes. In this work, we show that EWASplus implicates additional epigenetic loci for AD that are not found using array-based AD EWASs.
Affiliation(s)
- Yanting Huang
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Xiaobo Sun
- Department of Mathematical and Statistical Finance, School of Statistics and Mathematics, Zhongnan University of Economics and Laws, Wuhan, Hubei, China
- Huige Jiang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Shaojun Yu
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Chloe Robins
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
- Matthew J Armstrong
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Ronghua Li
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Zhen Mei
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
- Xiaochuan Shi
- Department of Statistics, University of Washington, Seattle, WA, USA
- Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Medical Center, New York, NY, USA
- David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Aliza P Wingo
- Division of Mental Health, Atlanta VA Medical Center, Decatur, GA, USA
- Department of Psychiatry, Emory University School of Medicine, Atlanta, GA, USA
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Thomas S Wingo
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
50
|
Seaby EG, Ennis S. Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies. Brief Funct Genomics 2021; 19:243-258. [PMID: 32393978 DOI: 10.1093/bfgp/elaa009] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
Next generation sequencing (NGS) has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last 5 years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates, and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient-based approach.
|