1
Brabec JL, Lara MK, Tyler AL, Mahoney JM. System-Level Analysis of Alzheimer's Disease Prioritizes Candidate Genes for Neurodegeneration. Front Genet 2021; 12:625246. [PMID: 33889174 PMCID: PMC8056044 DOI: 10.3389/fgene.2021.625246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2020] [Accepted: 02/22/2021] [Indexed: 12/11/2022] Open
Abstract
Alzheimer’s disease (AD) is a debilitating neurodegenerative disorder. Since the advent of the genome-wide association study (GWAS), we have come to understand much about the genes involved in AD heritability and pathophysiology. Large case-control meta-GWAS studies have increased our ability to prioritize weaker-effect alleles, while the recent development of network-based functional prediction has provided a mechanism by which we can use machine learning to reprioritize GWAS hits in the functional context of relevant brain tissues like the hippocampus and amygdala. In parallel with these developments, groups like the Alzheimer’s Disease Neuroimaging Initiative (ADNI) have compiled rich compendia of AD patient data, including genotype and biomarker information as well as derived volume measures for relevant structures like the hippocampus and the amygdala. In this study, we aimed to identify genes involved in AD-related atrophy of these two structures, which are often critically impaired over the course of the disease. To do this, we developed a combined score prioritization method, which uses the cumulative distribution functions of a gene’s functional and positional scores to prioritize top genes that segregate not only with disease status but also with hippocampal and amygdalar atrophy. Our method identified a mix of genes that had previously been identified in AD GWAS, including APOE, TOMM40, and NECTIN2 (PVRL2), and several others that have not been identified in AD genetic studies but play integral roles in AD-affected functional pathways, including IQSEC1, PFN1, and PAK2. Our findings support the viability of our novel combined score as a method for prioritizing region- and even cell-specific AD risk genes.
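The abstract does not spell out how the two CDF values are combined into one score. As a minimal sketch, assuming each gene carries one functional score and one positional (GWAS) score, and assuming the combined score is the product of the two empirical CDF values, the prioritization step could look like this. The product rule, the function names, and the gene labels are hypothetical and not taken from the paper.

```python
import numpy as np

def empirical_cdf(values):
    """Return, for each entry, the fraction of entries <= it (the empirical CDF)."""
    values = np.asarray(values, dtype=float)
    sorted_vals = np.sort(values)
    # side="right" counts how many values are <= each entry, so ties share a CDF value
    return np.searchsorted(sorted_vals, values, side="right") / values.size

def combined_score(functional, positional):
    # Hypothetical combination rule: product of the two empirical CDFs,
    # so a gene ranks highly only if it scores well on both axes.
    return empirical_cdf(functional) * empirical_cdf(positional)

genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical gene labels
functional = [0.9, 0.2, 0.7, 0.4]  # toy functional (network-based) scores
positional = [5.1, 7.3, 6.0, 1.2]  # toy positional (GWAS-based) scores
scores = combined_score(functional, positional)
ranking = [g for _, g in sorted(zip(scores, genes), reverse=True)]
print(ranking)  # ['GENE_C', 'GENE_A', 'GENE_B', 'GENE_D']
```

Because the CDF values are products, GENE_A (top functional score, middling positional score) ranks below GENE_C, which is strong on both axes; a rank-based CDF also makes the two scores comparable despite their different units.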
Affiliation(s)
- Jeffrey L Brabec
  - Department of Neurological Sciences, University of Vermont, Burlington, VT, United States
- Montana Kay Lara
  - Department of Neurological Sciences, University of Vermont, Burlington, VT, United States
- Anna L Tyler
  - The Jackson Laboratory, Bar Harbor, ME, United States
- J Matthew Mahoney
  - Department of Neurological Sciences, University of Vermont, Burlington, VT, United States
  - The Jackson Laboratory, Bar Harbor, ME, United States
2
Cai J, Cai H, Chen J, Yang X. Identifying "Many-to-Many" Relationships between Gene-Expression Data and Drug-Response Data via Sparse Binary Matching. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:165-176. [PMID: 29994482 DOI: 10.1109/tcbb.2018.2849708] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Identifying gene-drug patterns is a critical step in pharmacology for unveiling disease mechanisms and discovering drugs. High-throughput technologies have generated massive pharmacological and genomic datasets, which provide a substantial new opportunity to understand in depth how oncogenic genes and therapeutic drugs relate to each other. However, most previous studies used only the pharmacological and genomic datasets, without any prior knowledge, to infer gene-drug patterns. Here, we propose a novel network-guided sparse binary matching model (NSBM) to decode these relationships hidden in the datasets. Our method not only analyzes large-scale gene-expression and drug-response data jointly but also integrates additional prior information on genes and drugs in the form of network-based regularization. The core of the NSBM model is a convex quadratic minimization problem with network-based penalties. Extensive experiments on both synthetic and empirical data demonstrated that it outperforms two benchmark methods. Posterior validation, including gene-ontology and enrichment analysis, confirmed the effectiveness of NSBM in revealing gene-drug patterns in a large-scale heterogeneous data source.
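The abstract describes NSBM only as a convex quadratic problem with network-based penalties. The sketch below is not NSBM: it is a generic graph-Laplacian-penalized least-squares problem, a common way to turn a prior network into a convex quadratic penalty, included to illustrate how such a penalty pulls the weights of connected genes toward each other. The function names, the toy network, and the small ridge term (added so the linear system is always solvable) are our assumptions.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def network_regularized_fit(X, y, adj, lam, ridge=1e-6):
    """Minimize ||y - X w||^2 + lam * w^T L w + ridge * ||w||^2.

    For 0/1 adjacency, w^T L w equals the sum over edges (i, j) of (w_i - w_j)^2,
    so linked features are pushed toward similar weights. The objective is a
    convex quadratic, so its minimizer solves a single linear system.
    """
    L = laplacian(adj)
    p = X.shape[1]
    lhs = X.T @ X + lam * L + ridge * np.eye(p)
    return np.linalg.solve(lhs, X.T @ y)

# Toy usage: noiseless data, prior network linking features 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, 0.0, 0.5])
y = X @ w_true
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]])
w_free = network_regularized_fit(X, y, adj, lam=0.0)    # recovers w_true
w_smooth = network_regularized_fit(X, y, adj, lam=1e4)  # weights 0 and 1 nearly equal
```

As lam grows, the penalty dominates the data term for linked features, so their weights converge while the unlinked feature is left essentially untouched.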
3
Ma L, Rolls ET, Liu X, Liu Y, Jiao Z, Wang Y, Gong W, Ma Z, Gong F, Wan L. Multi-scale analysis of schizophrenia risk genes, brain structure, and clinical symptoms reveals integrative clues for subtyping schizophrenia patients. J Mol Cell Biol 2019; 11:678-687. [PMID: 30508120 PMCID: PMC6788727 DOI: 10.1093/jmcb/mjy071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/30/2018] [Revised: 11/01/2018] [Accepted: 11/20/2018] [Indexed: 12/30/2022] Open
Abstract
Analyses directly linking genomics, neuroimaging phenotypes, and clinical measurements are crucial for understanding psychiatric disorders but remain rare. Here, we describe a multi-scale analysis using genome-wide SNPs, gene expression, grey matter volume (GMV), and positive and negative syndrome scale (PANSS) scores to explore the etiology of schizophrenia. In 72 drug-naive first-episode schizophrenia patients (FEPs) and 73 matched healthy controls, we identified, among schizophrenia risk genes, 108 genes that correlated significantly with GMV and are highly co-expressed in the brain during development. Among these 108 candidates, 19 distinct genes were found to be associated with 16 brain regions, referred to as hot clusters (HCs), located primarily in the frontal cortex, sensory-motor regions, and temporal and parietal regions. Based on the GMV of the identified HCs, the patients were subtyped into three groups with distinguishable PANSS scores. Furthermore, we found that HCs with common GMV among patient groups are related to genes that mostly map to pathways relevant to neural signaling, which are associated with the risk for schizophrenia. Our results provide an integrated view of how genetic variants may affect brain structures, leading to distinct disease phenotypes. The multi-scale analysis described in this research may help advance understanding of the etiology of schizophrenia.
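The abstract does not name the procedure used to subtype patients by HC volume. Purely as an illustration, assuming k-means with k = 3 on a patients-by-HC matrix of GMV values, the subtyping step might look like the sketch below. The deterministic farthest-point initialization, the function names, and the toy volumes are our own choices, not the authors' pipeline.

```python
import numpy as np

def farthest_point_init(data, k):
    """Deterministic seeding: start at the first patient, then repeatedly pick
    the patient farthest from all centroids chosen so far."""
    centroids = [data[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(data - c, axis=1) for c in centroids], axis=0)
        centroids.append(data[dist.argmax()])
    return np.array(centroids)

def kmeans(data, k, n_iter=100):
    """Lloyd's k-means on the rows of `data`; returns (labels, centroids)."""
    data = np.asarray(data, dtype=float)
    centroids = farthest_point_init(data, k)
    for _ in range(n_iter):
        # assign each patient to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its patients (keep it if the cluster is empty)
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: rows are patients, columns are GMV of two hypothetical hot clusters.
gmv = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
                [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
                [9.0, 1.1], [9.1, 0.9], [8.9, 1.0]])
labels, centroids = kmeans(gmv, k=3)
```

The resulting labels could then be compared against PANSS scores to check whether the volume-based subtypes differ clinically, which is the kind of comparison the abstract reports.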
Affiliation(s)
- Liang Ma
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Edmund T Rolls
  - Department of Computer Science, University of Warwick, Coventry, UK
  - Oxford Centre for Computational Neuroscience, Oxford, UK
- Xiuqin Liu
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Yuting Liu
  - School of Science, Beijing Jiaotong University, Beijing, China
- Zeyu Jiao
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
- Yue Wang
  - School of Science, Beijing Jiaotong University, Beijing, China
- Weikang Gong
  - CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiming Ma
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Fuzhou Gong
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Lin Wan
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
4
Yin Q, Wu M, Liu Q, Lv H, Jiang R. DeepHistone: a deep learning approach to predicting histone modifications. BMC Genomics 2019; 20:193. [PMID: 30967126 PMCID: PMC6456942 DOI: 10.1186/s12864-019-5489-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION Quantitative detection of histone modifications has emerged in recent years as a major means for understanding biological processes such as chromosome packaging, transcriptional activation, and DNA damage. However, high-throughput experimental techniques such as ChIP-seq are usually expensive and time-consuming, which prohibits establishing a histone modification landscape for hundreds of cell types across dozens of histone markers. These disadvantages call for computational methods that complement experimental approaches in large-scale analysis of histone modifications. RESULTS We propose a deep learning framework that integrates sequence information and chromatin accessibility data to accurately predict modification sites specific to different histone markers. Our method, named DeepHistone, outperformed several baseline methods in a series of comprehensive validation experiments, both within an epigenome and across epigenomes. In addition, the sequence signatures automatically extracted by our method were consistent with known transcription factor binding sites, giving insights into the regulatory signatures of histone modifications. As an application, our method was able to distinguish functional single nucleotide polymorphisms from nearby genetic variants, suggesting its potential for exploring the functional implications of putative disease-associated genetic variants. CONCLUSIONS DeepHistone demonstrates the possibility of using a deep learning framework to integrate DNA sequence and experimental data for predicting epigenomic signals. With its state-of-the-art performance, DeepHistone is expected to shed light on a variety of epigenomic studies. DeepHistone is freely available at https://github.com/QijinYin/DeepHistone .
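Sequence-based deep learning models of this kind typically start from a numeric encoding of DNA that convolutional filters can scan, and learned filters are what get compared with known transcription factor motifs. The sketch below, assuming a per-base accessibility track of the same length as the sequence, one-hot encodes a sequence, appends accessibility as an extra channel, and slides a single hand-made filter along it. The channel-stacking layout and all function names are our assumptions; the sketch does not reproduce DeepHistone's architecture or how it actually combines the two data types.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len, 4) array; N or other symbols become all-zero rows."""
    seq = seq.upper()
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def stack_inputs(seq, accessibility):
    """Append a per-base accessibility track as a fifth input channel (assumed layout)."""
    acc = np.asarray(accessibility, dtype=float).reshape(-1, 1)
    if acc.shape[0] != len(seq):
        raise ValueError("accessibility track must have one value per base")
    return np.hstack([one_hot(seq), acc])

def conv1d_valid(x, kernel):
    """Slide a (width, channels) filter along x without padding, as one convolutional motif detector."""
    width = kernel.shape[0]
    return np.array([np.sum(x[i:i + width] * kernel) for i in range(x.shape[0] - width + 1)])

seq = "TTGACGTT"
x = stack_inputs(seq, np.zeros(len(seq)))  # hypothetical flat accessibility track
# A hand-made filter that matches the motif "ACG" and ignores the accessibility channel.
kernel = np.hstack([one_hot("ACG"), np.zeros((3, 1))])
scores = conv1d_valid(x, kernel)
print(scores)  # [1. 0. 0. 3. 0. 0.]
```

In a trained network the filter weights are learned rather than hand-set, and a nonzero weight on the accessibility channel would let the same filter respond more strongly where the chromatin is open.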
Affiliation(s)
- Qijin Yin
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
- Mengmeng Wu
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
- Qiao Liu
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
- Hairong Lv
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
5
Shan N, Wang Z, Hou L. Identification of trans-eQTLs using mediation analysis with multiple mediators. BMC Bioinformatics 2019; 20:126. [PMID: 30925861 PMCID: PMC6440281 DOI: 10.1186/s12859-019-2651-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
Background Mapping expression quantitative trait loci (eQTLs) has provided insight into gene regulation. Compared to cis-eQTLs, the regulatory mechanisms of trans-eQTLs are less well understood. Previous studies suggest that trans-eQTLs may regulate the expression of remote genes by altering the expression of nearby genes. Trans-association has been studied using mediation analysis with a single mediator. However, prior applications with one mediator are prone to model misspecification because of correlations between genes. Motivated by the observation that, in the GTEx dataset, trans-eQTLs are more likely than randomly selected SNPs to associate with more than one cis-gene, we developed a computational method to identify trans-eQTLs that are mediated by multiple mediators. Results We proposed two hypothesis tests for the total mediation effect (TME) and the component-wise mediation effects (CME), respectively. Simulation studies demonstrated that both tests controlled type I error rates despite model misspecification. The TME test was more powerful than the CME test when the two mediation effects were in the same direction, whereas the CME test was more powerful than the TME test when the two mediation effects were in opposite directions. Multiple mediator analysis had increased power to detect mediated trans-eQTLs, especially in large samples. In the combined samples of African populations in the HapMap3 data, we identified 11 mediated trans-eQTLs that were not detected by single mediator analysis. Moreover, the mediated trans-eQTLs in the HapMap3 samples were more likely to be trait-associated SNPs. Computationally, although our model places no limit on the number of mediators, the analysis takes longer as mediators are added. In the analysis of the HapMap3 samples, we included at most 5 cis-gene mediators, and the majority of the trios we considered had one or two mediators.
Conclusions Trans-eQTLs are more likely than randomly selected SNPs to associate with multiple cis-genes. Mediation analysis with multiple mediators improves the power to identify mediated trans-eQTLs, especially in large samples. Electronic supplementary material The online version of this article (10.1186/s12859-019-2651-6) contains supplementary material, which is available to authorized users.
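Under the standard product-of-coefficients view of mediation (our assumption; the abstract does not state the estimator), the component-wise mediation effect of cis-gene k is alpha_k * beta_k, where alpha_k is the SNP's effect on mediator k and beta_k is mediator k's effect on the trans-gene given the SNP and all other mediators; the total mediation effect is the sum of these products. The sketch estimates these quantities by ordinary least squares on simulated data. It computes point estimates only and does not implement the paper's TME or CME hypothesis tests; the function names and simulation settings are hypothetical.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients for response ~ design (design includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

def mediation_effects(snp, mediators, expression):
    """Product-of-coefficients estimates for a SNP -> cis-genes -> trans-gene trio.

    snp: (n,) genotype dosages; mediators: (n, k) cis-gene expression;
    expression: (n,) trans-gene expression.
    Returns (component-wise effects alpha_k * beta_k, their sum, direct effect).
    """
    n, k = mediators.shape
    intercept = np.ones(n)
    # alpha_k: effect of the SNP on each cis-gene mediator, one regression per mediator
    alphas = np.array([ols(np.column_stack([intercept, snp]), mediators[:, j])[1]
                       for j in range(k)])
    # beta_k and the direct effect: regress the trans-gene on the SNP and all mediators jointly
    coef = ols(np.column_stack([intercept, snp, mediators]), expression)
    direct, betas = coef[1], coef[2:]
    components = alphas * betas
    return components, components.sum(), direct

# Simulated trio with two mediators whose effects have opposite signs.
rng = np.random.default_rng(1)
n = 20000
snp = rng.binomial(2, 0.3, size=n).astype(float)
alpha = np.array([0.8, -0.5])
beta = np.array([0.6, 0.4])
mediators = snp[:, None] * alpha + rng.normal(size=(n, 2))
expression = 0.1 * snp + mediators @ beta + rng.normal(size=n)
components, tme, direct = mediation_effects(snp, mediators, expression)
```

In this simulation the true components are 0.48 and -0.20, so they partly cancel in the total (0.28). That is the setting in which the abstract reports the CME test to be more powerful than the TME test.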
Affiliation(s)
- Nayang Shan
  - Center for Statistical Science, Tsinghua University, Beijing, 100084, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06510, USA
- Lin Hou
  - Center for Statistical Science, Tsinghua University, Beijing, 100084, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, 100084, China
6
Ma S, Jiang T, Jiang R. Constructing tissue-specific transcriptional regulatory networks via a Markov random field. BMC Genomics 2018; 19:884. [PMID: 30598101 PMCID: PMC6311931 DOI: 10.1186/s12864-018-5277-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Recent advances in sequencing technologies have enabled parallel assays of chromatin accessibility and gene expression for major human cell lines. Such innovation provides a great opportunity to decode the phenotypic consequences of genetic variation via the construction of predictive gene regulatory network models. However, a computational method that systematically integrates chromatin accessibility information with gene expression data to recover complicated regulatory relationships between genes in a tissue-specific manner is still lacking. RESULTS We propose a Markov random field (MRF) model for constructing tissue-specific transcriptional regulatory networks via integrative analysis of DNase-seq and RNA-seq data. Our method, named CSNets (cell-line specific regulatory networks), first infers regulatory networks for individual cell lines using chromatin accessibility information, and then fine-tunes these networks using the MRF based on pairwise similarity between cell lines derived from gene expression data. Using this method and ENCODE data, we constructed regulatory networks specific to 110 human cell lines and 13 major tissues. We demonstrated the high quality of these networks via comprehensive statistical analysis based on ChIP-seq profiles, functional annotations, taxonomic analysis, and literature surveys. We further applied these networks to analyze GWAS data of Crohn's disease and prostate cancer. The results were either consistent with the literature or provided biological insights into the regulatory mechanisms of these two complex diseases. The CSNets website is freely available at http://bioinfo.au.tsinghua.edu.cn/jianglab/CSNETS/ . CONCLUSIONS CSNets demonstrated the power of joint analysis of epigenomic and transcriptomic data for accurately constructing gene regulatory networks.
Our work provides not only a useful resource of regulatory networks for the community but also valuable experience in methodology development for multi-omics data integration.
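The abstract does not give CSNets' potentials or its inference procedure. As a hedged illustration of the general idea, assume each cell line has a binary state for one candidate regulatory edge, a unary score from chromatin accessibility (positive favours the edge being present), and a pairwise term that rewards agreement between cell lines in proportion to their expression-derived similarity. The sketch smooths edge calls across cell lines with iterated conditional modes (ICM), a simple greedy MRF inference method we have swapped in; it is not the authors' algorithm, and all names and numbers are hypothetical.

```python
import numpy as np

def icm_smooth(unary, similarity, lam=1.0, n_sweeps=20):
    """Iterated conditional modes for a binary pairwise MRF over cell lines.

    Energy: E(x) = -sum_c unary[c] * x_c - lam * sum_{c<d} sim[c, d] * [x_c == x_d].
    unary[c] > 0 favours the edge being present in cell line c;
    similarity[c, d] >= 0 rewards cell lines c and d taking the same state.
    """
    unary = np.asarray(unary, dtype=float)
    sim = np.array(similarity, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(sim, 0.0)  # a cell line does not vote for itself
    state = (unary > 0).astype(int)  # start from the accessibility-only calls
    for _ in range(n_sweeps):
        changed = False
        for c in range(len(unary)):
            # E(x_c=1) - E(x_c=0) = -(unary[c] + lam * sum_d sim[c, d] * (2 x_d - 1))
            field = unary[c] + lam * np.dot(sim[c], 2 * state - 1)
            new = int(field > 0)
            if new != state[c]:
                state[c], changed = new, True
        if not changed:  # no update lowered the energy, so ICM has converged
            break
    return state

unary = [2.0, 1.5, -0.5, -0.5]  # hypothetical accessibility-based log-odds
similarity = [[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]]  # cell lines 0-2 have similar expression; line 3 is unlike them
print(icm_smooth(unary, similarity, lam=1.0).tolist())  # [1, 1, 1, 0]
print(icm_smooth(unary, similarity, lam=0.0).tolist())  # [1, 1, 0, 0]
```

With the pairwise term switched on, cell line 2 gains the edge because its two similar neighbours carry it, while the dissimilar cell line 3 keeps its accessibility-only call. ICM only finds a local minimum of the energy, which is why more global methods are often preferred for larger MRFs.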
Affiliation(s)
- Shining Ma
  - Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA
- Tao Jiang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
7
Yang X, Han G, Chen J, Cai H. Finding Correlated Patterns via High-Order Matching for Multiple Sourced Biological Data. IEEE Trans Biomed Eng 2018; 66:1017-1025. [PMID: 30130172 DOI: 10.1109/tbme.2018.2866266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
OBJECTIVE The emergence of multidimensional genomic data poses new challenges in data analysis. Finding correlated patterns within multiple-sourced biological data is useful for understanding potential interactions between multimodal genomic data. METHODS Multidimensional genomic data contain multiple genomic data types, and different types of genomic data have different scales and units, so they cannot simply be aggregated for analysis. To address this issue, we propose a correlated pattern discovery model that incorporates prior knowledge. Tensor similarity is used to measure the correlation between common patterns, and prior knowledge is incorporated into the model by transforming it into constraints. Efficient numerical solutions are designed and analyzed. RESULTS The proposed method performs robustly and effectively on both simulated and real biological data. We conduct experiments on five real cancer data sets to reveal various cancer subtypes, and a survival analysis of these subtypes confirms the effectiveness of the model. CONCLUSION We introduce a correlated pattern discovery model that incorporates prior knowledge. This model is valuable for enabling doctors to make personalized diagnoses in the treatment of cancer and other diseases. SIGNIFICANCE The problem of finding correlated patterns from multiple-sourced biological data was formulated as a high-order graph matching problem, and the prior knowledge data were seamlessly incorporated into the matching model.
8
Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning. Methods 2018; 145:41-50. [DOI: 10.1016/j.ymeth.2018.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/30/2018] [Revised: 04/10/2018] [Accepted: 06/01/2018] [Indexed: 12/20/2022] Open