1
Dai Y, Itai T, Pei G, Yan F, Chu Y, Jiang X, Weinberg SM, Mukhopadhyay N, Marazita ML, Simon LM, Jia P, Zhao Z. DeepFace: Deep-learning-based framework to contextualize orofacial-cleft-related variants during human embryonic craniofacial development. HGG ADVANCES 2024; 5:100312. [PMID: 38796699 PMCID: PMC11193024 DOI: 10.1016/j.xhgg.2024.100312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 05/28/2024] Open
Abstract
Orofacial clefts (OFCs) are among the most common human congenital birth defects. Previous multiethnic studies have identified dozens of associated loci for both cleft lip with or without cleft palate (CL/P) and cleft palate alone (CP). Although several nearby genes have been highlighted, the "causal" variants are largely unknown. Here, we developed DeepFace, a convolutional neural network model, to assess the functional impact of variants by SNP activity difference (SAD) scores. The DeepFace model is trained with 204 epigenomic assays from crucial human embryonic craniofacial developmental stages of post-conception week (pcw) 4 to pcw 10. The Pearson correlation coefficient between the predicted and actual values for 12 epigenetic features achieved a median range of 0.50-0.83. Specifically, our model revealed that SNPs significantly associated with OFCs tended to exhibit higher SAD scores across various variant categories compared to less related groups, indicating a context-specific impact of OFC-related SNPs. Notably, we identified six SNPs with a significant linear relationship to SAD scores throughout developmental progression, suggesting that these SNPs could play a temporal regulatory role. Furthermore, our cell-type specificity analysis pinpointed the trophoblast cell as having the highest enrichment of risk signals associated with OFCs. Overall, DeepFace can harness distal regulatory signals from extensive epigenomic assays, offering new perspectives for prioritizing OFC variants using contextualized functional genomic features. We expect DeepFace to be instrumental in assessing and predicting the regulatory roles of variants associated with OFCs, and the model can be extended to study other complex diseases or traits.
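The SAD score mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a SAD score is the per-feature difference between a model's prediction for the alternative-allele sequence and its prediction for the reference sequence; the abstract does not spell out the exact definition. The `toy_predict` function is a hypothetical stand-in for a trained network, not the DeepFace model, and the sequence and features are invented for illustration.

```python
from typing import Callable, List, Sequence


def sad_score(
    predict: Callable[[str], Sequence[float]],
    ref_seq: str,
    pos: int,
    alt: str,
) -> List[float]:
    """Per-feature difference: prediction(alt sequence) - prediction(ref sequence).

    Assumption: this alt-minus-ref definition is illustrative only.
    """
    # Substitute the alternative allele at the 0-based position `pos`.
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_pred = predict(ref_seq)
    alt_pred = predict(alt_seq)
    return [a - r for a, r in zip(alt_pred, ref_pred)]


def toy_predict(seq: str) -> List[float]:
    """Hypothetical stand-in for a trained CNN with two output 'features'."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    tata_count = float(seq.count("TATA"))
    return [gc_fraction, tata_count]


# A single A->G substitution at position 4 of a made-up 10-bp sequence.
scores = sad_score(toy_predict, "ACGTATAAGC", pos=4, alt="G")
print(scores)
```

In this toy case the substitution raises the GC fraction and destroys the only TATA motif, so the first feature difference is positive and the second is negative. A real model would return one difference per epigenomic assay.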
Affiliation(s)
- Yulin Dai
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Toshiyuki Itai
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Guangsheng Pei
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Fangfang Yan
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yan Chu
  - Center for Secure Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xiaoqian Jiang
  - Center for Secure Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Seth M Weinberg
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Nandita Mukhopadhyay
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Mary L Marazita
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA; Clinical and Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lukas M Simon
  - Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX 77030, USA
- Peilin Jia
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA.
2
Gao Y, Zhou Q, Luo J, Xia C, Zhang Y, Yue Z. Crop-GPA: an integrated platform of crop gene-phenotype associations. NPJ Syst Biol Appl 2024; 10:15. [PMID: 38346982 PMCID: PMC10861494 DOI: 10.1038/s41540-024-00343-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 01/22/2024] [Indexed: 02/15/2024] Open
Abstract
With the increasing availability of large-scale biology data in crop plants, there is an urgent demand for a versatile platform that fully mines and utilizes the data for modern molecular breeding. We present Crop-GPA (https://crop-gpa.aielab.net), a comprehensive and functional open-source platform for crop gene-phenotype association data. The current Crop-GPA provides well-curated information on genes, phenotypes, and their associations (GPAs) to researchers through an intuitive interface, dynamic graphical visualizations, and efficient online tools. Two computational tools, GPA-BERT and GPA-GCN, are specifically developed and integrated into Crop-GPA, facilitating the automatic extraction of gene-phenotype associations from bio-crop literature and predicting unknown relations based on known associations. Through usage examples, we demonstrate how our platform enables the exploration of complex correlations between genes and phenotypes in crop plants. In summary, Crop-GPA serves as a valuable multi-functional resource, empowering the crop research community to gain deeper insights into the biological mechanisms of interest.
Affiliation(s)
- Yujia Gao
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Qian Zhou
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Jiaxin Luo
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Chuan Xia
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Youhua Zhang
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China.
- Zhenyu Yue
  - School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China.
3
Jia P, Hu R, Yan F, Dai Y, Zhao Z. scGWAS: landscape of trait-cell type associations by integrating single-cell transcriptomics-wide and genome-wide association studies. Genome Biol 2022; 23:220. [PMID: 36253801 PMCID: PMC9575201 DOI: 10.1186/s13059-022-02785-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/23/2022] [Accepted: 10/05/2022] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND The rapid accumulation of single-cell RNA sequencing (scRNA-seq) data presents unique opportunities to decode the genetically mediated cell-type specificity in complex diseases. Here, we develop a new method, scGWAS, which effectively leverages scRNA-seq data to achieve two goals: (1) to infer the cell types in which the disease-associated genes manifest and (2) to construct cellular modules which imply disease-specific activation of different processes. RESULTS scGWAS only utilizes the average gene expression for each cell type followed by virtual search processes to construct the null distributions of module scores, making it scalable to large scRNA-seq datasets. We demonstrated scGWAS in 40 genome-wide association studies (GWAS) datasets (average sample size N ≈ 154,000) using 18 scRNA-seq datasets from nine major human/mouse tissues (totaling 1.08 million cells) and identified 2533 trait and cell-type associations, each with significant modules for further investigation. The module genes were validated using disease or clinically annotated references from ClinVar, OMIM, and pLI variants. CONCLUSIONS We showed that the trait-cell type associations identified by scGWAS, while generally constrained to trait-tissue associations, could recapitulate many well-studied relationships and also reveal novel relationships, providing insights into the unsolved trait-tissue associations. Moreover, in each specific cell type, the associations with different traits were often mediated by different sets of risk genes, implying disease-specific activation of driving processes. In summary, scGWAS is a powerful tool for exploring the genetic basis of complex diseases at the cell type level using single-cell expression data.
Affiliation(s)
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Ruifeng Hu
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Fangfang Yan
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030 USA
4
Drug-Target Network Study Reveals the Core Target-Protein Interactions of Various COVID-19 Treatments. Genes (Basel) 2022; 13:1210. [PMID: 35885993 PMCID: PMC9316565 DOI: 10.3390/genes13071210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/01/2022] [Accepted: 07/03/2022] [Indexed: 02/04/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has caused a dramatic loss of human life and devastated the worldwide economy. Numerous efforts have been made to mitigate COVID-19 symptoms and reduce the death rate. We conducted literature mining of more than 250 thousand published works and curated the 174 most widely used COVID-19 medications. Overlaid with the human protein-protein interaction (PPI) network, we used Steiner tree analysis to extract a core subnetwork that grew from the pharmacological targets of ten credible drugs ascertained by the CTD database. The resultant core subnetwork consisted of 34 interconnected genes, which were associated with 36 drugs. Immune cell membrane receptors, the downstream cellular signaling cascade, and severe COVID-19 symptom risk were significantly enriched for the core subnetwork genes. The lung mast cell was most enriched for the target genes among 1355 human tissue-cell types. Human bronchoalveolar lavage fluid COVID-19 single-cell RNA-Seq data highlighted the fact that T cells and macrophages have the most overlapping genes from the core subnetwork. Overall, we constructed an actionable human target-protein module that mainly involved anti-inflammatory/antiviral entry functions and highly overlapped with COVID-19-severity-related genes. Our findings could serve as a knowledge base for guiding drug discovery or drug repurposing to confront the fast-evolving SARS-CoV-2 virus and other severe infectious diseases.
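The subnetwork-extraction step described in this abstract can be sketched with a simple heuristic. The block below is not the authors' pipeline: it uses a greedy shortest-path heuristic on an unweighted toy graph as one common way to approximate a Steiner tree, and every node name (`T1`, `X`, and so on) is hypothetical. Terminals stand in for seed drug targets, and the other nodes stand in for connecting proteins in an interaction network.

```python
from collections import deque


def path_to_tree(graph, start, tree_nodes):
    """Breadth-first search from `start` to the nearest node already in the tree.

    Returns the path ordered from the tree node back to `start`, or None.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def steiner_heuristic(graph, terminals):
    """Grow a tree by repeatedly attaching the closest remaining terminal."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        best = None
        for terminal in sorted(remaining):
            path = path_to_tree(graph, terminal, tree_nodes)
            if path is not None and (best is None or len(path) < len(best)):
                best = path
        if best is None:
            raise ValueError("terminals are not all connected")
        tree_nodes.update(best)
        tree_edges.update(frozenset(edge) for edge in zip(best, best[1:]))
        remaining -= tree_nodes
    return tree_nodes, tree_edges


# Toy undirected network as an adjacency list (hypothetical node names).
toy_graph = {
    "T1": ["X"],
    "T2": ["X", "Y"],
    "T3": ["Y", "Z"],
    "X": ["T1", "T2"],
    "Y": ["T2", "T3"],
    "Z": ["T3"],
}
nodes, edges = steiner_heuristic(toy_graph, ["T1", "T2", "T3"])
print(sorted(nodes), len(edges))
```

The heuristic keeps only the connector nodes needed to link the terminals, so the dead-end node `Z` is left out. On a real weighted network a dedicated Steiner tree solver would be the more appropriate choice.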
5
Liu A, Manuel AM, Dai Y, Zhao Z. Prioritization of risk genes in multiple sclerosis by a refined Bayesian framework followed by tissue-specificity and cell type feature assessment. BMC Genomics 2022; 23:362. [PMID: 35545758 PMCID: PMC9092676 DOI: 10.1186/s12864-022-08580-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/22/2022] [Accepted: 04/22/2022] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Multiple sclerosis (MS) is a debilitating immune-mediated disease of the central nervous system that affects over 2 million people worldwide, resulting in a heavy burden to families and entire communities. Understanding the genetic basis underlying MS could help decipher the pathogenesis and shed light on MS treatment. We refined a recently developed Bayesian framework, Integrative Risk Gene Selector (iRIGS), to prioritize risk genes associated with MS by integrating the summary statistics from the largest GWAS to date (n = 115,803), various genomic features, and gene-gene closeness. RESULTS We identified 163 MS-associated prioritized risk genes (MS-PRGenes) through the Bayesian framework. We replicated 35 MS-PRGenes through a two-sample Mendelian randomization (2SMR) approach by integrating data from GWAS and Genotype-Tissue Expression (GTEx) expression quantitative trait loci (eQTL) of 19 tissues. We demonstrated that MS-PRGenes had more substantial deleterious effects and disease risk. Moreover, single-cell enrichment analysis indicated that MS-PRGenes were more enriched in activated macrophages and microglia macrophages than in non-activated ones in control samples. Biological and drug enrichment analyses highlighted inflammatory signaling pathways. CONCLUSIONS In summary, we predicted and validated a high-confidence MS risk gene set from diverse genomic, epigenomic, eQTL, single-cell, and drug data. The MS-PRGenes could further serve as a benchmark of MS GWAS risk genes for future validation or genetic studies.
Affiliation(s)
- Andi Liu
  - Department of Epidemiology, School of Public Health, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Astrid M. Manuel
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Zhongming Zhao
  - Department of Epidemiology, School of Public Health, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
6
Zhu M, Yin P, Hu F, Jiang J, Yin L, Li Y, Wang S. Integrating genome-wide association and transcriptome prediction model identifies novel target genes for osteoporosis. Osteoporos Int 2021; 32:2493-2503. [PMID: 34142171 PMCID: PMC8608767 DOI: 10.1007/s00198-021-06024-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 05/31/2021] [Indexed: 12/12/2022]
Abstract
In this study, we integrated large-scale GWAS summary data and used the predicted transcriptome-wide association study method to discover novel genes associated with osteoporosis. We identified 204 candidate genes, which provide novel clues for understanding the genetic mechanism of osteoporosis and indicate potential therapeutic targets. INTRODUCTION Osteoporosis is a highly polygenetic disease characterized by low bone mass and deterioration of the bone microarchitecture. Our objective was to discover novel candidate genes associated with osteoporosis. METHODS To identify potential causal genes of the associated loci, we investigated trait-gene expression associations using the transcriptome-wide association study (TWAS) method. This method directly imputes gene expression effects from genome-wide association study (GWAS) data using a statistical prediction model trained on GTEx reference transcriptome data. We then performed a colocalization analysis to evaluate the posterior probability of biological patterns: associations characterized by a single causal variant or multiple distinct causal variants. Finally, a functional enrichment analysis of gene sets was performed using the VarElect and CluePedia tools, which assess the causal relationships between genes and a disease and search for the genes' potential functional pathways. The osteoporosis-associated genes were further confirmed based on the differentially expressed genes profiled from mRNA expression data of bone tissue. RESULTS Our analysis identified 204 candidate genes, including 154 genes that have been previously associated with osteoporosis and 50 genes that have not been previously discovered. A biological function analysis found that 20 of the candidate genes were directly associated with osteoporosis. Further analysis of multiple gene expression profiles showed that 15 genes were differentially expressed in patients with osteoporosis. Among these, SLC11A2, MAP2K5, NFATC4, and HSP90B1 were enriched in four pathways, namely, mineral absorption pathway, MAPK signaling pathway, Wnt signaling pathway, and PI3K-Akt signaling pathway, which indicates a causal relationship with the occurrence of osteoporosis. CONCLUSIONS We demonstrated that transcriptome fine-mapping identifies more osteoporosis-related genes and provides key insight into the development of novel targeted therapeutics for the treatment of osteoporosis.
Affiliation(s)
- M Zhu
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- P Yin
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- F Hu
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- J Jiang
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- L Yin
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Y Li
  - AnLan AI, Shenzhen, China
- S Wang
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
7
Pei G, Hu R, Jia P, Zhao Z. DeepFun: a deep learning sequence-based model to decipher non-coding variant effect in a tissue- and cell type-specific manner. Nucleic Acids Res 2021; 49:W131-W139. [PMID: 34048560 PMCID: PMC8262726 DOI: 10.1093/nar/gkab429] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/10/2021] [Revised: 04/26/2021] [Accepted: 05/04/2021] [Indexed: 12/29/2022] Open
Abstract
More than 90% of the genetic variants identified from genome-wide association studies (GWAS) are located in non-coding regions of the human genome. Here, we present a user-friendly web server, DeepFun (https://bioinfo.uth.edu/deepfun/), to assess the functional activity of non-coding genetic variants. This new server is built on a convolutional neural network (CNN) framework that has been extensively evaluated. Specifically, we collected chromatin profiles from ENCODE and Roadmap projects to construct the feature space, including 1548 DNase I accessibility, 1536 histone mark, and 4795 transcription factor binding profiles covering 225 tissues or cell types. With such comprehensive epigenomics annotations, DeepFun expands the functionality of existing non-coding variant prioritizing tools to provide a more specific functional assessment on non-coding variants in a tissue- and cell type-specific manner. By using the datasets from various GWAS studies, we conducted independent validations and demonstrated the functions of the DeepFun web server in predicting the effect of a non-coding variant in a specific tissue or cell type, as well as visualizing the potential motifs in the region around variants. We expect our server will be widely used in genetics, functional genomics, and disease studies.
Affiliation(s)
- Guangsheng Pei
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ruifeng Hu
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
8
Struckmann S, Ernst M, Fischer S, Mah N, Fuellen G, Möller S. Scoring functions for drug-effect similarity. Brief Bioinform 2021; 22:bbaa072. [PMID: 32484516 PMCID: PMC8138836 DOI: 10.1093/bib/bbaa072] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/13/2019] [Revised: 03/26/2020] [Accepted: 03/31/2020] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION The difficulty of finding new drugs and bringing them to the market has led to an increased interest in finding new applications for known compounds. Biological samples from many disease contexts have been extensively profiled by transcriptomics, and, intuitively, this motivates the search for compounds with a reversing effect on the expression of characteristic disease genes. However, disease effects may be cell line-specific and also depend on other factors, such as genetics and environment. Transcription profile changes between healthy and diseased cells relate in complex ways to profile changes gathered from cell lines upon stimulation with a drug. Despite these differences, we expect that there will be some similarity in the gene regulatory networks at play in both situations. The challenge is to match transcriptomes for both diseases and drugs alike, even though the exact molecular pathology/pharmacogenomics may not be known. RESULTS We substitute the challenge to match a drug effect to a disease effect with the challenge to match a drug effect to the effect of the same drug at another concentration or in another cell line. This is well-defined, reproducible in vitro and in silico and extendable with external data. Based on the Connectivity Map (CMap) dataset, we combined 26 different similarity scores with six different heuristics to reduce the number of genes in the model. Such gene filters may also utilize external knowledge e.g. from biological networks. We found that no similarity score always outperforms all others for all drugs, but the Pearson correlation finds the same drug with the highest reliability. Results are improved by filtering for highly expressed genes and to a lesser degree for genes with large fold changes. A network-based reduction of contributing transcripts, here implemented by the FocusHeuristics, was also beneficial. We found no drop in prediction accuracy when reducing the whole transcriptome to the set of 1000 landmark genes of the CMap's successor project Library of Integrated Network-based Cellular Signatures. All source code to re-analyze and extend the CMap data, the source code of heuristics, filters and their evaluation are available to propel the development of new methods for drug repurposing. AVAILABILITY https://bitbucket.org/ibima/moldrugeffectsdb. CONTACT steffen.moeller@uni-rostock.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
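The matching task described in this abstract, finding the reference profile most similar to a query profile by Pearson correlation, can be sketched as follows. This is a minimal illustration and not the authors' code from the linked repository: the profile values and the names `drug_A`, `drug_B`, and `drug_C` are invented, and the gene filters mentioned in the abstract are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_match(query, references):
    """Return the name of the reference profile most correlated with the query."""
    return max(references, key=lambda name: pearson(query, references[name]))


# Hypothetical expression-change profiles over the same four genes.
references = {
    "drug_A": [1.0, 2.0, 3.0, 4.0],
    "drug_B": [4.0, 3.0, 2.0, 1.0],
    "drug_C": [1.0, -1.0, 1.0, -1.0],
}
# Query: the same kind of profile measured under a different condition.
query = [1.1, 1.9, 3.2, 3.8]
match = best_match(query, references)
print(match)
```

In an evaluation along the lines the abstract describes, the query would be a profile of a known drug at another concentration or in another cell line, and a hit would be counted when the best match is that same drug.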
Affiliation(s)
- Stephan Struckmann
  - IBIMA, Rostock University Medical Center, Rostock, 18041, Germany
  - SHIP-KEF, Institute for Community Medicine, University Medicine of Greifswald, Walther-Rathenau-Straße 48, 17475 Greifswald, Germany
- Mathias Ernst
  - IBIMA, Rostock University Medical Center, Rostock, 18041, Germany
  - Friedrich-Alexander-University Erlangen-Nuremberg, 91058 Erlangen, Germany
- Sarah Fischer
  - IBIMA, Rostock University Medical Center, Rostock, 18041, Germany
- Nancy Mah
  - BCRT - Berlin Institute of Health Center for Regenerative Therapies, Charité - University Medicine Berlin, 13353, Germany
- Georg Fuellen
  - IBIMA, Rostock University Medical Center, Rostock, 18041, Germany
- Steffen Möller
  - IBIMA, Rostock University Medical Center, Rostock, 18041, Germany
9
Dai Y, Hu R, Manuel AM, Liu A, Jia P, Zhao Z. CSEA-DB: an omnibus for human complex trait and cell type associations. Nucleic Acids Res 2021; 49:D862-D870. [PMID: 33211888 PMCID: PMC7778923 DOI: 10.1093/nar/gkaa1064] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/15/2020] [Revised: 10/18/2020] [Accepted: 10/21/2020] [Indexed: 12/20/2022] Open
Abstract
During the past decade, genome-wide association studies (GWAS) have identified many genetic variants associated with susceptibility to several thousand complex diseases or traits. The genetic regulation of gene expression is highly tissue-specific and cell type-specific. Recently, single-cell technology has paved the way to dissect cellular heterogeneity in human tissues. Here, we present a reference database for GWAS trait-associated cell type-specificity, named Cell type-Specific Enrichment Analysis DataBase (CSEA-DB, available at https://bioinfo.uth.edu/CSEADB/). Specifically, we curated a total of 5120 GWAS summary statistics datasets for a wide range of human traits and diseases, followed by rigorous quality control. We further collected >900 000 cells from leading consortia such as Human Cell Landscape and Human Cell Atlas, and from extensive literature mining, including 752 tissue cell types from 71 adult and fetal tissues across 11 human organ systems. The tissues and cell types were annotated with Uberon and Cell Ontology. By applying our deTS algorithm, we conducted 10 250 480 trait-cell type association tests, reporting a total of 598 (11.68%) GWAS traits with at least one significantly associated cell type. In summary, CSEA-DB could serve as a repository of association maps for human complex traits and their underlying cell types, manually curated GWAS, and single-cell transcriptome resources.
Affiliation(s)
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ruifeng Hu
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Astrid Marilyn Manuel
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Andi Liu
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
10
Pei G, Hu R, Dai Y, Manuel AM, Zhao Z, Jia P. Predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations. Nucleic Acids Res 2021; 49:53-66. [PMID: 33300042 PMCID: PMC7797043 DOI: 10.1093/nar/gkaa1137] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/30/2020] [Revised: 10/22/2020] [Accepted: 12/08/2020] [Indexed: 02/06/2023] Open
Abstract
Assessing the causal tissues of human complex diseases is important for the prioritization of trait-associated genetic variants. Yet, the biological underpinnings of trait-associated variants are extremely difficult to infer due to statistical noise in genome-wide association studies (GWAS), and because >90% of genetic variants from GWAS are located in non-coding regions. Here, we collected the largest human epigenomic map from ENCODE and Roadmap consortia and implemented a deep-learning-based convolutional neural network (CNN) model to predict the regulatory roles of genetic variants across a comprehensive list of epigenomic modifications. Our model, called DeepFun, was built on DNA accessibility maps, histone modification marks, and transcription factors. DeepFun can systematically assess the impact of non-coding variants in the most functional elements with tissue or cell-type specificity, even for rare variants or de novo mutations. By applying this model, we prioritized trait-associated loci for 51 publicly-available GWAS studies. We demonstrated that CNN-based analyses on dense and high-resolution epigenomic annotations can refine important GWAS associations in order to identify regulatory loci from background signals, which yield novel insights for better understanding the molecular basis of human complex disease. We anticipate our approaches will become routine in GWAS downstream analysis and non-coding variant evaluation.
Affiliation(s)
- Guangsheng Pei
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ruifeng Hu
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Astrid Marilyn Manuel
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
11
|
Dai Y, O'Brien TD, Pei G, Zhao Z, Jia P. Characterization of genome-wide association study data reveals spatiotemporal heterogeneity of mental disorders. BMC Med Genomics 2020; 13:192. [PMID: 33371872 PMCID: PMC7771094 DOI: 10.1186/s12920-020-00832-8] [Received: 11/18/2020] [Accepted: 11/23/2020] [Indexed: 12/15/2022] Open
Abstract
Background: Psychiatric disorders such as schizophrenia (SCZ), bipolar disorder (BIP), major depressive disorder (MDD), attention deficit-hyperactivity disorder (ADHD), and autism spectrum disorder (ASD) are often related to brain development. Both shared and unique biological and neurodevelopmental processes have been reported to be involved in these disorders.
Methods: In this work, we developed an integrative analysis framework to seek the sensitive spatiotemporal point during brain development underlying each disorder. Specifically, we first identified spatiotemporal gene co-expression modules for four brain regions across three developmental stages (prenatal, birth to 11 years old, and older than 13 years), totaling 12 spatiotemporal sites. By integrating GWAS summary statistics and the spatiotemporal co-expression modules, we characterized the risk genes and their co-expression partners for five disorders.
Results: We found that SCZ and BIP, as well as ASD and ADHD, tend to cluster with each other and keep a distance from other psychiatric disorders. At the gene level, we identified several genes that were shared among the most significant modules, such as CTNNB1 and LNX1, and a hub gene, ATF2, in multiple modules. Moreover, we pinpointed two spatiotemporal points in the prenatal stage with active expression activities and highlighted one postnatal point for BIP. Further functional analysis of the disorder-related modules highlighted the apoptotic signaling pathway for ASD and the immune-related and cell-cell adhesion functions for SCZ.
Conclusion: Our study demonstrated the dynamic changes of disorder-related genes at the network level, shedding light on the spatiotemporal regulation during brain development.
Affiliation(s)
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Timothy D O'Brien
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Guangsheng Pei
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
|
12
|
Pei G, Wang YY, Simon LM, Dai Y, Zhao Z, Jia P. Gene expression imputation and cell-type deconvolution in human brain with spatiotemporal precision and its implications for brain-related disorders. Genome Res 2020; 31:146-158. [PMID: 33272935 PMCID: PMC7849392 DOI: 10.1101/gr.265769.120] [Received: 05/07/2020] [Accepted: 11/25/2020] [Indexed: 12/30/2022]
Abstract
As the most complex organ of the human body, the brain is composed of diverse regions, each consisting of distinct cell types and their respective cellular interactions. Human brain development involves a finely tuned cascade of interactive events. These include spatiotemporal gene expression changes and dynamic alterations in cell-type composition. However, our understanding of this process is still largely incomplete owing to the difficulty of brain spatiotemporal transcriptome collection. In this study, we developed a tensor-based approach to impute gene expression on a transcriptome-wide level. After rigorous computational benchmarking, we applied our approach to infer missing data points in the widely used BrainSpan resource and completed the entire grid of spatiotemporal transcriptomics. Next, we conducted deconvolutional analyses to comprehensively characterize major cell-type dynamics across the entire BrainSpan resource to estimate the cellular temporal changes and distinct neocortical areas across development. Moreover, integration of these results with GWAS summary statistics for 13 brain-associated traits revealed multiple novel trait–cell-type associations and trait-spatiotemporal relationships. In summary, our imputed BrainSpan transcriptomic data provide a valuable resource for the research community and our findings help further studies of the transcriptional and cellular dynamics of the human brain and related diseases.
Affiliation(s)
- Guangsheng Pei
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Yin-Ying Wang
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Lukas M Simon
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, Texas 77030, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
|
13
|
Dai Y, Hu R, Pei G, Zhang H, Zhao Z, Jia P. Diverse types of genomic evidence converge on alcohol use disorder risk genes. J Med Genet 2020; 57:733-743. [PMID: 32170004 DOI: 10.1136/jmedgenet-2019-106490] [Received: 08/06/2019] [Revised: 01/08/2020] [Accepted: 02/10/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Alcohol use disorder (AUD) is one of the most common forms of substance use disorders with a strong contribution of genetic (50%-60%) and environmental factors. Genome-wide association studies (GWAS) have identified a number of AUD-associated variants, including those in alcohol metabolism genes. These genetic variants may modulate gene expression, making individuals more susceptible to AUD. Long-term alcohol consumption can also change the transcriptome patterns of subjects via epigenetic modulations.
METHODS: To explore the interactive effect of genetic and epigenetic factors on AUD, we conducted a secondary analysis by integrating GWAS, CNV, brain transcriptome and DNA methylation data to unravel novel AUD-associated genes/variants. We applied the mega-analysis of OR (MegaOR) method to prioritise AUD candidate genes (AUDgenes).
RESULTS: We identified a consensus set of 206 AUDgenes based on the multi-omics data. We demonstrated that these AUDgenes tend to interact with each other more frequently than expected by chance. Functional annotation analysis indicated that these AUDgenes were involved in substance dependence, synaptic transmission and glial cell proliferation, and were enriched in neuronal and liver cells. We obtained multidimensional evidence that AUD is a polygenic disorder influenced by both genetic and epigenetic factors as well as their interaction.
CONCLUSION: We characterised multidimensional evidence of genetic, epigenetic and transcriptomic data in AUD. We found that 206 AUD-associated genes were highly expressed in liver, brain cerebellum, frontal cortex, hippocampus and pituitary. Our study provides important insights into the molecular mechanism of AUD and potential target genes for AUD treatment.
Affiliation(s)
- Yulin Dai
  - School of Biomedical Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Ruifeng Hu
  - School of Biomedical Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Guangsheng Pei
  - School of Biomedical Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Huiping Zhang
  - Department of Psychiatry, Boston University School of Medicine, Boston, Massachusetts, USA
  - Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, Massachusetts, USA
- Zhongming Zhao
  - School of Biomedical Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Peilin Jia
  - School of Biomedical Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
|
14
|
Rigden DJ, Fernández XM. The 27th annual Nucleic Acids Research database issue and molecular biology database collection. Nucleic Acids Res 2020; 48:D1-D8. [PMID: 31906604 PMCID: PMC6943072 DOI: 10.1093/nar/gkz1161] [Indexed: 12/13/2022] Open
Abstract
The 2020 Nucleic Acids Research Database Issue contains 148 papers spanning molecular biology. They include 59 papers reporting on new databases and 79 covering recent changes to resources previously published in the issue. A further ten papers are updates on databases most recently published elsewhere. This issue contains three breakthrough articles: AntiBodies Chemically Defined (ABCD) curates antibody sequences and their cognate antigens; SCOP returns with a new schema and breaks away from a purely hierarchical structure; while the new Alliance of Genome Resources brings together a number of Model Organism databases to pool knowledge and tools. Major returning nucleic acid databases include miRDB and miRTarBase. Databases for protein sequence analysis include CDD, DisProt and ELM, alongside no fewer than four newcomers covering proteins involved in liquid-liquid phase separation. In metabolism and signaling, Pathway Commons, Reactome and MetaboLights all contribute papers. PATRIC and MicroScope update in microbial genomes while human and model organism genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Immune-related proteins are covered by updates from IPD-IMGT/HLA and AFND, as well as newcomers VDJbase and OGRDB. Drug design is catered for by updates from the IUPHAR/BPS Guide to Pharmacology and the Therapeutic Target Database. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been revised, updating 305 entries, adding 65 new resources and eliminating 125 discontinued URLs, bringing the current total to 1637 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
|