1
Ye X, Yang S, Tu J, Xu L, Wang Y, Chen H, Yu R, Huang P. Leveraging baseline transcriptional features and information from single-cell data to power the prediction of influenza vaccine response. Front Cell Infect Microbiol 2024; 14:1243586. [PMID: 38384303 PMCID: PMC10879619 DOI: 10.3389/fcimb.2024.1243586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 01/11/2024] [Indexed: 02/23/2024]
Abstract
Introduction: Vaccination is still the primary means of preventing influenza virus infection, but the protective effects vary greatly among individuals. Identifying individuals at risk of low response to influenza vaccination is important. This study aimed to explore improved strategies for constructing predictive models of influenza vaccine response using gene expression data. Methods: We first used gene expression and immune response data from the Immune Signatures Data Resource (IS2) to define influenza vaccine response-related transcriptional expression and alteration features at different time points across vaccination via differential expression analysis. Then, we mapped these features to single-cell resolution using additional published single-cell data to investigate the possible mechanism. Finally, we explored the potential of these identified transcriptional features in predicting influenza vaccine response. We used several modeling strategies and also attempted to leverage the information from single-cell RNA sequencing (scRNA-seq) data to optimize the predictive models. Results: Models based on genes showing differential expression (DEGs) or differential fold change (DFGs) at day 7 post-vaccination performed best in internal validation, while models based on DFGs performed better in external validation than those based on DEGs. In addition, incorporating baseline predictors could improve the performance of models based on days 1-3, while the model based on the expression profile of plasma cells deconvoluted from the model that used DEGs at day 7 as predictors showed improved performance in external validation. Conclusion: Our study emphasizes the value of using a combined modeling strategy and leveraging information at the single-cell level in constructing influenza vaccine response predictive models.
Affiliation(s)
- Xiangyu Ye
  - Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Sheng Yang
  - Department of Biostatistics, National Vaccine Innovation Platform, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Junlan Tu
  - Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Lei Xu
  - Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Yifan Wang
  - Department of Infectious Disease, Jurong Hospital Affiliated to Jiangsu University, Jurong, China
- Hongbo Chen
  - Department of Infectious Disease, Jurong Hospital Affiliated to Jiangsu University, Jurong, China
- Rongbin Yu
  - Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Peng Huang
  - Department of Epidemiology, National Vaccine Innovation Platform, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
2
Yao S, Zhang X, Zou SC, Zhu Y, Li B, Kuang WP, Guo Y, Li XS, Li L, Wang XY. A transcriptome-wide association study identifies susceptibility genes for Parkinson's disease. NPJ Parkinsons Dis 2021; 7:79. [PMID: 34504106 PMCID: PMC8429416 DOI: 10.1038/s41531-021-00221-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 10/19/2020] [Accepted: 08/10/2021] [Indexed: 12/19/2022]
Abstract
Genome-wide association study (GWAS) has seen great strides in revealing initial insights into the genetic architecture of Parkinson's disease (PD). Since GWAS signals often reside in non-coding regions, relatively few of the associations have implicated specific biological mechanisms. Here, we aimed to integrate the GWAS results with large-scale expression quantitative trait loci (eQTL) in 13 brain tissues to identify candidate causal genes for PD. We conducted a transcriptome-wide association study (TWAS) for PD using the summary statistics of over 480,000 individuals from the most recent PD GWAS. We identified 18 genes significantly associated with PD after Bonferroni corrections. The most significant gene, LRRC37A2, was associated with PD in all 13 brain tissues, such as in the hypothalamus (P = 6.12 × 10⁻²²) and nucleus accumbens basal ganglia (P = 5.62 × 10⁻²¹). We also identified eight conditionally independent genes, including four new genes at known PD loci: CD38, LRRC37A2, RNF40, and ZSWIM7. Through conditional analyses, we demonstrated that several of the GWAS significant signals on PD could be driven by genetically regulated gene expression. The most significant TWAS gene, LRRC37A2, accounts for 0.855 of the GWAS signal at its locus, and ZSWIM7 accounts for all of the GWAS signal at its locus. We further identified several phenotypes previously associated with PD by querying the single nucleotide polymorphisms (SNPs) in the final model of the identified genes in phenome databases. In conclusion, we prioritized genes that are likely to affect PD by using a TWAS approach and identified phenotypes associated with PD.
Affiliation(s)
- Shi Yao
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, P. R. China
- Xi Zhang
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Shu-Cheng Zou
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Yong Zhu
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Bo Li
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Wei-Ping Kuang
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Yan Guo
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, P. R. China
- Xiao-Song Li
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Liang Li
  - Provincial Key Laboratory of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
- Xiao-Ye Wang
  - Department of Neurosurgery, Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, P. R. China
3
Zhu D, Yao S, Wu H, Ke X, Zhou X, Geng S, Dong S, Chen H, Yang T, Cheng Y, Guo Y. A transcriptome-wide association study identifies novel susceptibility genes for psoriasis. Hum Mol Genet 2021; 31:300-308. [PMID: 34409462 DOI: 10.1093/hmg/ddab237] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/17/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 01/17/2023]
Abstract
Although more than 80 psoriasis genetic risk loci have been reported through genome-wide association studies (GWASs), the genetic mechanism of psoriasis remains unclear. To identify novel candidate genes associated with psoriasis and reveal the potential effects of genetic factors in the development of psoriasis, we conducted a transcriptome-wide association study (TWAS) based on summary statistics from a GWAS of psoriasis (5175 cases and 447 089 controls) and gene expression levels from six tissue datasets (blood and skin). We identified 11 conditionally independent genes for psoriasis after Bonferroni corrections, including the most significant genes, UBLCP1 (P_YFS = 2.98 × 10⁻¹⁶) and LCE3C (P_SNSE = 9.72 × 10⁻¹², P_SSE = 6.24 × 10⁻¹²). The omnibus test identified 5 additional genes associated with psoriasis via the joint association model from multiple reference tissues. Among the 16 identified genes, 5 genes (CTSW, EIF1AD, KLRC3, FIBP, and EFEMP2) were regarded as novel genes for psoriasis. We evaluated the 16 candidate genes by querying public databases and identified 11 differentially expressed genes and 8 genes supported by knockout mouse models. Through GO enrichment analyses, we found that TWAS genes were enriched in known GO terms associated with skin development, such as cornified envelope (P = 4.80 × 10⁻⁸) and peptide cross-linking (P = 1.50 × 10⁻⁷). Taken together, our results detected multiple novel candidate genes for psoriasis, providing clues for understanding the genetic mechanism of psoriasis.
Affiliation(s)
- Dongli Zhu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Shi Yao
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, 710004, P. R. China
- Hao Wu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Xin Ke
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Xiaorong Zhou
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Songmei Geng
  - Department of Dermatology, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, 710004, P. R. China
- Shanshan Dong
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Hao Chen
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
  - Research Institute of Xi'an Jiaotong University, Hangzhou, Zhejiang, 311215, P. R. China
- Tielin Yang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, 710004, P. R. China
- Ying Cheng
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
- Yan Guo
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, P. R. China
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, 710004, P. R. China
4
Zhang J, Lu H, Zhang S, Wang T, Zhao H, Guan F, Zeng P. Leveraging Methylation Alterations to Discover Potential Causal Genes Associated With the Survival Risk of Cervical Cancer in TCGA Through a Two-Stage Inference Approach. Front Genet 2021; 12:667877. [PMID: 34149809 PMCID: PMC8206792 DOI: 10.3389/fgene.2021.667877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/15/2021] [Accepted: 04/19/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND Multiple genes were previously identified to be associated with cervical cancer; however, the genetic architecture of cervical cancer remains unknown and many potential causal genes are yet to be discovered. METHODS To explore potential causal genes related to cervical cancer, a two-stage causal inference approach was proposed within the framework of Mendelian randomization, where the gene expression was treated as exposure, with methylations located within the promoter regions of genes serving as instrumental variables. Five prediction models were first utilized to characterize the relationship between the expression and methylations for each gene; then, the methylation-regulated gene expression (MReX) was obtained and the association was evaluated via Cox mixed-effect model based on MReX. We further implemented the aggregated Cauchy association test (ACAT) combination to take advantage of respective strengths of these prediction models while accounting for dependency among the p-values. RESULTS A total of 14 potential causal genes were discovered to be associated with the survival risk of cervical cancer in TCGA when the five prediction models were separately employed. The total number of potential causal genes was brought to 23 when conducting ACAT. Some of the newly discovered genes may be novel (e.g., YJEFN3, SPATA5L1, IMMP1L, C5orf55, PPIP5K2, ZNF330, CRYZL1, PPM1A, ESCO2, ZNF605, ZNF225, ZNF266, FICD, and OSTC). Functional analyses showed that these genes were enriched in tumor-associated pathways. Additionally, four genes (i.e., COL6A1, SYDE1, ESCO2, and GIPC1) were differentially expressed between tumor and normal tissues. CONCLUSION Our study discovered promising candidate genes that were causally associated with the survival risk of cervical cancer and thus provided new insights into the genetic etiology of cervical cancer.
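The ACAT combination mentioned in the abstract above can be illustrated with a minimal sketch of the Cauchy combination rule in its commonly stated form: each p-value is mapped to a Cauchy-distributed quantity, the quantities are averaged with weights, and the average is mapped back to a p-value. The function name `acat_combine`, the equal default weights, and the example inputs are assumptions made for illustration; this is not the authors' implementation and does not reproduce their five prediction models.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch).

    Assumes every p-value lies strictly between 0 and 1. Weights default to
    equal weights and are normalized to sum to 1.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Weighted average of Cauchy-transformed p-values.
    t = sum((w / total) * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper tail of the standard Cauchy distribution evaluated at t.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values for one gene from five prediction models.
combined = acat_combine([0.20, 0.03, 0.004, 0.51, 0.08])
print(f"combined p-value: {combined:.4g}")
```

Because the Cauchy transform is heavy-tailed, a single very small p-value dominates the combined result, which is why this kind of rule is often used to pool evidence from several correlated tests.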
Affiliation(s)
- Jinhui Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Haojie Lu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Shuo Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Huashuo Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Fengjun Guan
  - Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
5
Banerjee S, Simonetti FL, Detrois KE, Kaphle A, Mitra R, Nagial R, Söding J. Tejaas: reverse regression increases power for detecting trans-eQTLs. Genome Biol 2021; 22:142. [PMID: 33957961 PMCID: PMC8101255 DOI: 10.1186/s13059-021-02361-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/09/2020] [Accepted: 04/22/2021] [Indexed: 12/18/2022]
Abstract
Trans-acting expression quantitative trait loci (trans-eQTLs) account for ≥70% expression heritability and could therefore facilitate uncovering mechanisms underlying the origination of complex diseases. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden. Tejaas predicts trans-eQTLs by performing L2-regularized “reverse” multiple regression of each SNP on all genes, aggregating evidence from many small trans-effects while being unaffected by the strong expression correlations. Combined with a novel unsupervised k-nearest neighbor method to remove confounders, Tejaas predicts 18851 unique trans-eQTLs across 49 tissues from GTEx. They are enriched in open chromatin, enhancers, and other regulatory regions. Many overlap with disease-associated SNPs, pointing to tissue-specific transcriptional regulation mechanisms.
Affiliation(s)
- Saikat Banerjee
  - Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Franco L Simonetti
  - Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Kira E Detrois
  - Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
  - Georg-August University, Göttingen, 37075, Germany
- Anubhav Kaphle
  - Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
  - Georg-August University, Göttingen, 37075, Germany
- Johannes Söding
  - Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
  - Campus-Institut Data Science (CIDAS), University of Göttingen, Göttingen, 37073, Germany
  - Cluster of Excellence "Multiscale Bioimaging" (MBExC), University of Göttingen, Göttingen, 37075, Germany
6
Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, Liu Y, Im HK, Luke A, Dugas LR, Wheeler HE. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG ADVANCES 2021; 2:100019. [PMID: 33937878 PMCID: PMC8087249 DOI: 10.1016/j.xhgg.2020.100019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
Affiliation(s)
- Paul C Okoro
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
- Ryan Schubert
  - Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA
- Xiuqing Guo
  - Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
- W Craig Johnson
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jerome I Rotter
  - Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
- Ina Hoeschele
  - Fralin Life Sciences Institute, Virginia Tech, Blacksburg, VA, USA
  - Department of Statistics, Virginia Tech, Blacksburg, VA, USA
  - Wake Forest School of Medicine, Winston-Salem, NC, USA
- Yongmei Liu
  - Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Amy Luke
  - Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA
- Lara R Dugas
  - Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA
  - Department of Human Biology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Heather E Wheeler
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL, USA
  - Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
7
Yao S, Wu H, Liu TT, Wang JH, Ding JM, Guo J, Rong Y, Ke X, Hao RH, Dong SS, Yang TL, Guo Y. Epigenetic Element-Based Transcriptome-Wide Association Study Identifies Novel Genes for Bipolar Disorder. Schizophr Bull 2021; 47:1642-1652. [PMID: 33772305 PMCID: PMC8530404 DOI: 10.1093/schbul/sbab023] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Since the bipolar disorder (BD) signals identified by genome-wide association study (GWAS) often reside in non-coding regions, understanding the biological relevance of these genetic loci has proven to be complicated. Transcriptome-wide association studies (TWAS) provide a powerful approach to identify novel disease risk genes and uncover possible causal genes at loci identified previously by GWAS. However, these methods did not consider the importance of epigenetic regulation in gene expression. Here, we developed a novel epigenetic element-based transcriptome-wide association study (ETWAS) that tested the effects of genetic variants on gene expression levels with the epigenetic features as prior and further mediated the association between predicted expression and BD. We conducted an ETWAS consisting of 20 352 cases and 31 358 controls and identified 44 transcriptome-wide significant hits. We found 14 conditionally independent genes, and 10 genes not previously implicated in BD were regarded as novel candidate genes, such as ASB16 in the cerebellar hemisphere (P = 9.29 × 10⁻⁸). We demonstrated that several genome-wide significant signals from the BD GWAS were driven by genetically regulated expression, and NEK4 explained 90.1% of the GWAS signal. Additionally, ETWAS-identified genes could explain heritability beyond that explained by GWAS-associated SNPs (P = 5.60 × 10⁻⁶⁶). By querying the SNPs in the final models of identified genes in phenome databases, we identified several phenotypes previously associated with BD, such as schizophrenia and depression. In conclusion, ETWAS is a powerful method, and we identified several novel candidate genes associated with BD.
Affiliation(s)
- Shi Yao
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi’an Jiaotong University, Xi’an, Shaanxi 710004, P. R. China
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Hao Wu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Tong-Tong Liu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Jia-Hao Wang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Jing-Miao Ding
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Jing Guo
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Yu Rong
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Xin Ke
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Ruo-Han Hao
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Shan-Shan Dong
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Tie-Lin Yang
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi’an Jiaotong University, Xi’an, Shaanxi 710004, P. R. China
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
- Yan Guo
  - National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi’an Jiaotong University, Xi’an, Shaanxi 710004, P. R. China
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China
  - To whom correspondence should be addressed; tel: +86-29-62818386, fax: +86-29-62818386, e-mail:
8
Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021; 30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022]
Abstract
Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves the construction of expression prediction models for every gene in turn using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves accurate expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful compared to the next best method, detecting many new disease-associated genes that were otherwise not identified by existing TWAS approaches. In conclusion, HMAT represents a flexible and powerful TWAS method that enjoys robust performance across a range of genetic architectures underlying gene expression.
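The harmonic mean P-value combination that the abstract above names as the basis of HMAT can be sketched as follows. The sketch computes only the raw weighted harmonic mean of a set of p-values; the calibration step that HMAT applies to obtain valid P-values after aggregation is not reproduced here. The function name `harmonic_mean_p`, the equal default weights, and the example inputs are assumptions for illustration.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values (illustrative sketch).

    Assumes every p-value lies in (0, 1]. The returned statistic is NOT a
    calibrated p-value on its own; a further adjustment is needed before
    comparing it with a significance threshold.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(p_values)
    # Harmonic mean: total weight divided by the weighted sum of reciprocals.
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Hypothetical TWAS p-values for one gene from four expression prediction models.
hmp = harmonic_mean_p([0.04, 0.001, 0.30, 0.012])
print(f"harmonic mean p: {hmp:.4g}")
```

The harmonic mean is pulled toward the smallest p-values, so a gene that is well captured by only one prediction model still yields a small combined statistic.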
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jing Dai
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Siyi Jin
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
9
Alpay BA, Demetci P, Istrail S, Aguiar D. Combinatorial and statistical prediction of gene expression from haplotype sequence. Bioinformatics 2020; 36:i194-i202. [PMID: 32657373 PMCID: PMC7355230 DOI: 10.1093/bioinformatics/btaa318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
MOTIVATION Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation. RESULTS In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm that models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r2>0.1) among all methods. 
We show that variant and haplotype features selected by HAPLEXR are smaller in size than competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage dependent and intragenic epistatic effects when predicting expression. AVAILABILITY AND IMPLEMENTATION Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Berk A Alpay
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Pinar Demetci
- Department of Computer Science and Center for Computational Biology, Brown University, Providence, RI 02912, USA
- Sorin Istrail
- Department of Computer Science and Center for Computational Biology, Brown University, Providence, RI 02912, USA
- Derek Aguiar
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
10
|
Shi W, Fornes O, Wasserman WW. Gene expression models based on transcription factor binding events confer insight into functional cis-regulatory variants. Bioinformatics 2019; 35:2610-2617. [PMID: 30541050 PMCID: PMC6662294 DOI: 10.1093/bioinformatics/bty992] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/14/2017] [Revised: 10/17/2018] [Accepted: 12/10/2018] [Indexed: 01/03/2023]
Abstract
Motivation Deciphering the functional roles of cis-regulatory variants is a critical challenge in genome analysis and interpretation. It has been hypothesized that altered transcription factor (TF) binding events are a central mechanism by which cis-regulatory variants impact gene expression levels. However, we lack a computational framework to understand and quantify such mechanistic contributions. Results We present TF2Exp, a gene-based framework to predict the impact of altered TF-binding events on gene expression levels. Using data from lymphoblastoid cell lines, TF2Exp models were applied successfully to predict the expression levels of 3196 genes. Alterations within DNase I hypersensitive, CTCF-bound and tissue-specific TF-bound regions were the greatest contributing features to the models. TF2Exp models performed as well as models based on common variants, both in cross-validation and external validation. Combining TF alteration and common variant features can further improve model performance. Unlike variant-based models, TF2Exp models have the unique advantage to evaluate the functional impact of variants in linkage disequilibrium and uncommon variants. We find that adding TF-binding events altered only by uncommon variants could increase the number of predictable genes (R2 > 0.05). Taken together, TF2Exp represents a key step towards interpreting the functional roles of cis-regulatory variants in the human genome. Availability and implementation The code and model training results are publicly available at https://github.com/wqshi/TF2Exp. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenqiang Shi
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
11
|
Petty LE, Highland HM, Gamazon ER, Hu H, Karhade M, Chen HH, de Vries PS, Grove ML, Aguilar D, Bell GI, Huff CD, Hanis CL, Doddapaneni H, Munzy DM, Gibbs RA, Ma J, Parra EJ, Cruz M, Valladares-Salgado A, Arking DE, Barbeira A, Im HK, Morrison AC, Boerwinkle E, Below JE. Functionally oriented analysis of cardiometabolic traits in a trans-ethnic sample. Hum Mol Genet 2019; 28:1212-1224. [PMID: 30624610 PMCID: PMC6423424 DOI: 10.1093/hmg/ddy435] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/24/2018] [Revised: 11/13/2018] [Accepted: 11/20/2018] [Indexed: 01/02/2023]
Abstract
Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses of the functional genetic etiology of complex cardiometabolic traits, we estimated the genetically determined component of gene expression from common variants using PrediXcan (1) and determined genes with differential predicted expression by trait. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effect on gene expression in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across different ancestral populations, we evaluated imputed expression levels for predictive accuracy genome-wide in RNA sequence data in samples drawn from European-ancestry and African-ancestry populations and identified substantial predictive power using European-derived models in a non-European target population. We then tested the association of GReX on 15 cardiometabolic traits including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level and white blood cell and platelet counts in 15 755 individuals across three ancestry groups, resulting in 20 novel gene-phenotype associations reaching experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in our ancestry-specific analyses. Top associations were assessed for additional support via query of S-PrediXcan (2) results derived from publicly available genome-wide association studies summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovery of cardiometabolic effect genes in a diverse dataset.
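The two-step logic described in this abstract (impute genetically regulated expression from genotypes, then test it against a trait) can be sketched as follows. This is an illustration under stated assumptions, not the PrediXcan software: the SNP identifiers, weights, dosages and trait values are all hypothetical, and the association step is reduced to an ordinary least-squares slope.

```python
from statistics import mean

# Hypothetical per-SNP prediction weights for one gene in one tissue.
snp_weights = {"rs_a": 0.5, "rs_b": -0.2}


def impute_expression(dosages, weights):
    """Predicted expression per person: weighted sum of allele dosages.

    Assumption: a SNP missing from a person's record contributes 0.
    """
    return [
        sum(weights[snp] * row.get(snp, 0.0) for snp in weights)
        for row in dosages
    ]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical genotype dosages (0, 1 or 2 copies) and trait values.
dosages = [
    {"rs_a": 0, "rs_b": 2},
    {"rs_a": 1, "rs_b": 1},
    {"rs_a": 2, "rs_b": 0},
    {"rs_a": 1, "rs_b": 2},
]
trait = [1.0, 2.4, 3.8, 2.0]

predicted = impute_expression(dosages, snp_weights)
effect = ols_slope(predicted, trait)
print(predicted, effect)
```

Because the weights are trained once in a reference transcriptome panel, the same imputation step can be reused for any genotyped cohort, which is what makes cross-ancestry evaluation of the weights meaningful.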
Affiliation(s)
- Lauren E Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Heather M Highland
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA; Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Eric R Gamazon
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Clare Hall, University of Cambridge, Cambridge, UK
- Hao Hu
- Department of Epidemiology, MD Anderson Cancer Center, Houston, TX, USA
- Mandar Karhade
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Paul S de Vries
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Megan L Grove
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- David Aguilar
- Department of Cardiology, Baylor College of Medicine, Houston, TX, USA
- Graeme I Bell
- Departments of Medicine and Human Genetics, The University of Chicago, Chicago, IL, USA
- Chad D Huff
- Department of Epidemiology, MD Anderson Cancer Center, Houston, TX, USA
- Craig L Hanis
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Donna M Munzy
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jianzhong Ma
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Esteban J Parra
- Department of Anthropology, University of Toronto at Mississauga, Mississauga, Ontario, Canada
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, IMSS, Mexico City, Mexico
- Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, IMSS, Mexico City, Mexico
- Dan E Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Alvaro Barbeira
- Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA
- Alanna C Morrison
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Eric Boerwinkle
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jennifer E Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
12
|
Abstract
Genome-wide association studies (GWASs) have identified thousands of loci associated with hundreds of complex diseases and traits, and progress is being made toward elucidating the causal variants and genes underlying these associations. Functional characterization of mechanisms at GWAS loci is a multi-faceted challenge. Challenges include linkage disequilibrium and allelic heterogeneity at each locus, the noncoding nature of most loci, and the time and cost needed for experimentally evaluating the potential mechanistic contributions of genes and variants. As GWAS sample sizes increase, more loci are identified, and the complexities of individual loci emerge. Loci can consist of multiple association signals, each of which can reflect the influence of multiple variants, inseparable by association analyses. Each signal within a locus can influence the same or different target genes. Experimental studies of genes and variants can differ on the basis of cell type, cellular environment, or other context-specific variables. In this review, we describe the complexity of mechanisms at GWAS loci (including multiple signals, multiple variants, and/or multiple genes) and the implications these complexities hold for experimental study design and interpretation of GWAS mechanisms.
Affiliation(s)
- Maren E Cannon
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA.
13
|
Mogil LS, Andaleon A, Badalamenti A, Dickinson SP, Guo X, Rotter JI, Johnson WC, Im HK, Liu Y, Wheeler HE. Genetic architecture of gene expression traits across diverse populations. PLoS Genet 2018; 14:e1007586. [PMID: 30096133 PMCID: PMC6105030 DOI: 10.1371/journal.pgen.1007586] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 01/10/2018] [Revised: 08/22/2018] [Accepted: 07/24/2018] [Indexed: 01/14/2023]
Abstract
For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations and subsequent effects on genetic prediction are not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression were available in individuals with African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross validation to optimize genotypic predictors of gene expression in each population, we show the genetic architecture of gene expression for most predictable genes is sparse. We found the best predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations with R2 > 0.8 in each population. However, we identified a subset of genes that are well-predicted in one population, but poorly predicted in another. We show these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.
Affiliation(s)
- Lauren S. Mogil
- Department of Biology, Loyola University Chicago, Chicago, Illinois, United States of America
- Angela Andaleon
- Department of Biology, Loyola University Chicago, Chicago, Illinois, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, Illinois, United States of America
- Alexa Badalamenti
- Program in Bioinformatics, Loyola University Chicago, Chicago, Illinois, United States of America
- Scott P. Dickinson
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, California, United States of America
- Jerome I. Rotter
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, California, United States of America
- W. Craig Johnson
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Yongmei Liu
- Department of Epidemiology & Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, Illinois, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, Illinois, United States of America
- Department of Computer Science, Loyola University Chicago, Chicago, Illinois, United States of America
- Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, United States of America
14
|
Heinig M. Using Gene Expression to Annotate Cardiovascular GWAS Loci. Front Cardiovasc Med 2018; 5:59. [PMID: 29922679 PMCID: PMC5996083 DOI: 10.3389/fcvm.2018.00059] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/26/2018] [Accepted: 05/15/2018] [Indexed: 01/27/2023]
Abstract
Genetic variants at hundreds of loci associated with cardiovascular phenotypes have been identified by genome-wide association studies. Most of these variants are located in intronic or intergenic regions, rendering functional and mechanistic follow-up difficult. These non-protein-coding regions harbor regulatory sequences. Thus, the study of genetic variants associated with transcription (so-called expression quantitative trait loci) has emerged as a promising approach to identify regulatory sequence variants. The genes and pathways they control constitute candidate causal drivers at cardiovascular risk loci. This review provides an overview of the expression quantitative trait loci resources available for cardiovascular genetics research and the most commonly used approaches for candidate gene identification.
Affiliation(s)
- Matthias Heinig
- Institute of Computational Biology, Helmholtz Zentrum München German Research Center for Environmental Health, Neuherberg, Germany; Department of Informatics, Technical University of Munich, Munich, Germany
15
|
Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, Torstenson ES, Shah KP, Garcia T, Edwards TL, Stahl EA, Huckins LM, Nicolae DL, Cox NJ, Im HK. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun 2018; 9:1825. [PMID: 29739930 PMCID: PMC5940825 DOI: 10.1038/s41467-018-03621-1] [Citation(s) in RCA: 589] [Impact Index Per Article: 98.2] [Received: 06/22/2017] [Accepted: 12/27/2017] [Indexed: 12/25/2022]
Abstract
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
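The abstract does not reproduce the mathematical expression it refers to. A minimal sketch of the commonly cited summary-statistic form is given below, assuming a gene-level z-score Z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the expression prediction weights, z_l the GWAS z-scores, σ_l the SNP standard deviations from a reference panel and σ_g² = wᵀΓw, with Γ the reference SNP covariance matrix. The function name and all numeric inputs are hypothetical.

```python
import math


def gene_level_z(weights, gwas_z, snp_cov):
    """Gene-level z-score from per-SNP GWAS z-scores.

    weights : prediction weight per SNP (hypothetical values below)
    gwas_z  : GWAS z-score per SNP, in the same order
    snp_cov : reference-panel covariance matrix of the SNPs (list of lists)
    """
    n = len(weights)
    if len(gwas_z) != n or len(snp_cov) != n:
        raise ValueError("inputs must describe the same SNPs")
    # Variance of predicted expression: w' * Gamma * w.
    var_g = sum(
        weights[i] * snp_cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    if var_g <= 0:
        raise ValueError("predicted expression has no variance")
    sigma_g = math.sqrt(var_g)
    return sum(
        weights[l] * math.sqrt(snp_cov[l][l]) / sigma_g * gwas_z[l]
        for l in range(n)
    )


# Hypothetical two-SNP model for one gene.
w = [0.5, -0.2]
z = [3.0, -1.0]
gamma = [[0.5, 0.1], [0.1, 0.4]]
z_gene = gene_level_z(w, z, gamma)
print(f"gene-level z: {z_gene:.3f}")
```

Only the weights, the GWAS z-scores and a reference covariance matrix are needed, which is why this kind of calculation works without individual-level genotypes. With a single SNP the expression reduces to the GWAS z-score with the sign of the weight.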
Affiliation(s)
- Alvaro N Barbeira
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Scott P Dickinson
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Rodrigo Bonazzola
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Jiamao Zheng
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA; Department of Computer Science, Loyola University Chicago, Chicago, IL, 60660, USA
- Jason M Torres
- Committee on Molecular Metabolism and Nutrition, The University of Chicago, Chicago, IL, 60637, USA
- Eric S Torstenson
- Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Kaanan P Shah
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Tzintzuni Garcia
- Center for Research Informatics, The University of Chicago, Chicago, IL, 60615, USA
- Todd L Edwards
- Division of Epidemiology, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Eli A Stahl
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, NYC, NY, 10029, USA; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, NYC, NY, 10029, USA
- Laura M Huckins
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, NYC, NY, 10029, USA; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, NYC, NY, 10029, USA
- Dan L Nicolae
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Nancy J Cox
- Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Hae Kyung Im
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA.
16
|
Abstract
BACKGROUND Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. RESULTS To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. CONCLUSION We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Affiliation(s)
- Rui Xie
- Department of Computer Science, University of Missouri at Columbia, Columbia, MO USA
- Jia Wen
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
- Andrew Quitadamo
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri at Columbia, Columbia, MO USA
- Xinghua Shi
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
17
|
Zeng P, Wang T, Huang S. Cis-SNPs Set Testing and PrediXcan Analysis for Gene Expression Data using Linear Mixed Models. Sci Rep 2017; 7:15237. [PMID: 29127305 PMCID: PMC5681585 DOI: 10.1038/s41598-017-15055-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/12/2017] [Accepted: 10/19/2017] [Indexed: 12/21/2022]
Abstract
Understanding the functional mechanism of SNPs identified in GWAS of complex diseases is currently a challenging task. Studies of expression quantitative trait loci (eQTL) have shown that regulatory variants play a crucial role in the function of associated SNPs. Detecting significant genes (called eGenes) in eQTL studies and analyzing the effect sizes of cis-SNPs can offer important insights into the genetic architecture of associated SNPs and the molecular basis of diseases. We applied linear mixed models (LMM) to gene expression levels and constructed likelihood ratio tests (LRT) to test for eGenes in the Geuvadis data. We identified about 11% of genes as eGenes in the Geuvadis data and found that some eGenes were enriched in approximately independent linkage disequilibrium (LD) blocks (e.g. MHC). We further performed PrediXcan analysis for seven diseases in the WTCCC data with weights estimated using LMM and identified 64, 5, 21 and 1 significant genes (p < 0.05 after Bonferroni correction) associated with T1D, CD, RA and T2D, respectively. We found that most of the significant genes for T1D and RA were also located within the MHC region. Our results provide strong evidence that gene expression plays an intermediate role for the associated variants in GWAS.
Affiliation(s)
- Ping Zeng
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China.
- University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48104, USA.
- Ting Wang
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China
- Shuiping Huang
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China.
18
|
Zeng P, Zhou X, Huang S. Prediction of gene expression with cis-SNPs using mixed models and regularization methods. BMC Genomics 2017; 18:368. [PMID: 28490319 PMCID: PMC5425981 DOI: 10.1186/s12864-017-3759-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/20/2016] [Accepted: 05/03/2017] [Indexed: 12/25/2022]
Abstract
Background It has been shown that gene expression in human tissues is heritable, so predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important insights into the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. Methods We compared three types of methods for predicting gene expression using only cis-SNPs: the polygenic model, i.e. the linear mixed model (LMM); two sparse models, i.e. Lasso and elastic net (ENET); and a hybrid of LMM and sparse model, i.e. the Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods make very different assumptions about the underlying genetic architecture. These methods were evaluated using simulations under various scenarios and were applied to the Geuvadis gene expression data. Results The simulations showed that the four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to the R2 of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R2 ≥ 0.05) across the chromosomes, and we did not see any clear relationship between the proportion of predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R2 ≥ 0.30) may have sparse genetic architectures, since Lasso, ENET and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression dataset. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Conclusions Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China; Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA.
- Xiang Zhou
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA
- Shuiping Huang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China.
19
|
Maricque BB, Dougherty JD, Cohen BA. A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells. Nucleic Acids Res 2017; 45:e16. [PMID: 28204611 PMCID: PMC5389540 DOI: 10.1093/nar/gkw942] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/02/2016] [Revised: 10/05/2016] [Accepted: 10/11/2016] [Indexed: 11/12/2022]
Abstract
Recent large-scale genomics efforts to characterize the cis-regulatory sequences that orchestrate genome-wide expression patterns have produced impressive catalogues of putative regulatory elements. Most of these sequences have not been functionally tested, and our limited understanding of the non-coding genome prevents us from predicting which sequences are bona fide cis-regulatory elements. Recently, massively parallel reporter assays (MPRAs) have been deployed to measure the activity of putative cis-regulatory sequences in several biological contexts, each with specific advantages and distinct limitations. We developed LV-MPRA, a novel lentiviral-based, massively parallel reporter gene assay, to study the function of genome-integrated regulatory elements in any mammalian cell type; thus, making it possible to apply MPRAs in more biologically relevant contexts. We measured the activity of 2,600 sequences in U87 glioblastoma cells and human neural progenitor cells (hNPCs) and explored how regulatory activity is encoded in DNA sequence. We demonstrate that LV-MPRA can be applied to estimate the effects of local DNA sequence and regional chromatin on regulatory activity. Our data reveal that primary DNA sequence features, such as GC content and dinucleotide composition, accurately distinguish sequences with high activity from sequences with low activity in a full chromosomal context, and may also function in combination with different transcription factor binding sites to determine cell type specificity. We conclude that LV-MPRA will be an important tool for identifying cis-regulatory elements and stimulating new understanding about how the non-coding genome encodes information.
Affiliation(s)
- Brett B. Maricque
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA
| | - Joseph D. Dougherty
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO 63108, USA
| | - Barak A. Cohen
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA
| |
|
20
|
Wheeler HE, Shah KP, Brenner J, Garcia T, Aquino-Michaels K, Cox NJ, Nicolae DL, Im HK. Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues. PLoS Genet 2016; 12:e1006423. [PMID: 27835642 PMCID: PMC5106030 DOI: 10.1371/journal.pgen.1006423] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Received: 04/05/2016] [Accepted: 10/12/2016] [Indexed: 11/19/2022] Open
Abstract
Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h² can be relatively well characterized with 59% of expressed genes showing significant h² (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h². Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimal performing gene expression predictors via elastic net modeling. To further explore the tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R² for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).
Affiliation(s)
- Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, Illinois, United States of America
- Department of Computer Science, Loyola University Chicago, Chicago, Illinois, United States of America
| | - Kaanan P. Shah
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | - Jonathon Brenner
- Department of Computer Science, Loyola University Chicago, Chicago, Illinois, United States of America
| | - Tzintzuni Garcia
- Center for Research Informatics, University of Chicago, Chicago, Illinois, United States of America
| | - Keston Aquino-Michaels
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | | | - Nancy J. Cox
- Division of Genetic Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Dan L. Nicolae
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| |
|
21
|
Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, Nicolae DL, Cox NJ, Kyung Im H. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 2015; 47:1091-8. [PMID: 26258848 PMCID: PMC4552594 DOI: 10.1038/ng.3367] [Citation(s) in RCA: 1055] [Impact Index Per Article: 117.2] [Received: 04/27/2015] [Accepted: 07/06/2015] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
Affiliation(s)
- Eric R. Gamazon
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL
- Division of Genetic Medicine, Vanderbilt University, Nashville, TN
| | - Heather E. Wheeler
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL
| | - Kaanan P. Shah
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL
| | | | | | - Robert J. Carroll
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
| | - Anne E. Eyler
- Department of Medicine, Vanderbilt University, Nashville, TN
| | - Joshua C. Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
| | - GTEx Consortium
- A full list of members and affiliations appears in the Supplementary Note
| | - Dan L. Nicolae
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL
- Department of Human Genetics, University of Chicago, Chicago, IL
- Department of Statistics, University of Chicago, Chicago, IL
| | - Nancy J. Cox
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL
- Division of Genetic Medicine, Vanderbilt University, Nashville, TN
- Department of Human Genetics, University of Chicago, Chicago, IL
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL
| |
|
22
|
Albert FW, Kruglyak L. The role of regulatory variation in complex traits and disease. Nat Rev Genet 2015; 16:197-212. [DOI: 10.1038/nrg3891] [Citation(s) in RCA: 684] [Impact Index Per Article: 76.0] [Indexed: 02/07/2023]
|
23
|
Manor O, Segal E. GenoExp: a web tool for predicting gene expression levels from single nucleotide polymorphisms. Bioinformatics 2015; 31:1848-50. [PMID: 25637557 DOI: 10.1093/bioinformatics/btv050] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/14/2014] [Accepted: 01/26/2015] [Indexed: 12/16/2022]
Abstract
Understanding the effect of single nucleotide polymorphisms (SNPs) on the expression level of genes is an important goal. We recently published a study in which we devised a multi-SNP predictive model for gene expression in Lymphoblastoid cell lines (LCL), and showed that it can robustly predict the expression of a small number of genes in test individuals. Here, we validate the generality of our models by predicting expression profiles for genes in LCL in an independent study, and extend the pool of predictable genes for which we are able to explain more than 25% of their expression variability to 232 genes across 14 different cell types. As the number of people who obtained their SNP profiles through companies such as 23andMe is rising rapidly, we developed GenoExp, a web-based tool in which users can upload their individual SNP data and obtain predicted expression levels for the set of predictable genes across the 14 different cell types. Our tool thus allows users with biological knowledge to study the possible effects that their set of SNPs might have on these genes and predict their cell-specific expression levels relative to the population average. AVAILABILITY AND IMPLEMENTATION GenoExp is freely available at http://genie.weizmann.ac.il/pubs/GenoExp/.
Affiliation(s)
- Ohad Manor
- Department of Computer Science and Applied Mathematics and Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics and Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| |
|
24
|
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023] Open
Affiliation(s)
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Samuli Ripatti
- Hjelt Institute, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
| | - Tero Aittokallio
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| |
|
25
|
Abstract
Instructions for when, where and to what level each gene should be expressed are encoded within regulatory sequences. The importance of motifs recognized by DNA-binding regulators has long been known, but their extensive characterization afforded by recent technologies only partly accounts for how regulatory instructions are encoded in the genome. Here, we review recent advances in our understanding of regulatory sequences that influence transcription and go beyond the description of motifs. We discuss how understanding different aspects of the sequence-encoded regulation can help to unravel the genotype-phenotype relationship, which would lead to a more accurate and mechanistic interpretation of personal genome sequences.
Affiliation(s)
- Michal Levo
- Department of Molecular Cell Biology, and Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eran Segal
- Department of Molecular Cell Biology, and Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
| |
|