1
Khan A, Kiryluk K. Polygenic scores and their applications in kidney disease. Nat Rev Nephrol 2024:10.1038/s41581-024-00886-2. [PMID: 39271761 DOI: 10.1038/s41581-024-00886-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2024] [Indexed: 09/15/2024]
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of risk variants that individually have small effects on the risk of human diseases, including chronic kidney disease, type 2 diabetes, heart diseases and inflammatory disorders, but cumulatively explain a substantial fraction of disease risk, underscoring the complexity and pervasive polygenicity of common disorders. This complexity poses unique challenges to the clinical translation of GWAS findings. Polygenic scores combine small effects of individual GWAS risk variants across the genome to improve personalized risk prediction. Several polygenic scores have now been developed that exhibit sufficiently large effects to be considered clinically actionable. However, their clinical use is limited by their partial transferability across ancestries and a lack of validated models that combine polygenic, monogenic, family history and clinical risk factors. Moreover, prospective studies are still needed to demonstrate the clinical utility and cost-effectiveness of polygenic scores in clinical practice. Here, we discuss evolving methods for developing polygenic scores, best practices for validating and reporting their performance, and the study designs that will empower their clinical implementation. We specifically focus on the polygenic scores relevant to nephrology and other chronic, complex diseases and review their key limitations, necessary refinements and potential clinical applications.
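The way a polygenic score "combines small effects" can be illustrated with a minimal sketch, assuming the simplest additive form: each variant's effect-allele dosage (0 to 2) is multiplied by its GWAS effect size and the products are summed. The variant IDs, weights and the `polygenic_score` helper below are hypothetical and are not taken from the cited review, which discusses more refined methods.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Additive score: sum over variants of (effect size x effect-allele dosage)."""
    # Variants absent from the genotype data are skipped, i.e. contribute 0.
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 0.125}
# Effect-allele dosages for one individual (0, 1 or 2 copies).
dosages = {"var_a": 2, "var_b": 1, "var_c": 0}

score = polygenic_score(dosages, weights)
print(score)
```

In practice such raw scores are usually standardized against a reference population before interpretation, which is one reason transferability across ancestries matters.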
Affiliation(s)
- Atlas Khan
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA.
2
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
| | - Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
| | - Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
3
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
| | - Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
| | - Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
4
Lu H, Ma L, Quan C, Li L, Lu Y, Zhou G, Zhang C. RegVar: Tissue-specific Prioritization of Non-coding Regulatory Variants. Genomics Proteomics Bioinformatics 2023; 21:385-395. [PMID: 34973416 PMCID: PMC10626172 DOI: 10.1016/j.gpb.2021.08.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/11/2021] [Revised: 06/11/2021] [Accepted: 09/27/2021] [Indexed: 06/14/2023]
Abstract
Non-coding genomic variants constitute the majority of trait-associated genome variations; however, the identification of functional non-coding variants is still a challenge in human genetics, and a method for systematically assessing the impact of regulatory variants on gene expression and linking these regulatory variants to potential target genes is still lacking. Here, we introduce a deep neural network (DNN)-based computational framework, RegVar, which can accurately predict the tissue-specific impact of non-coding regulatory variants on target genes. We show that by robustly learning the genomic characteristics of massive variant-gene expression associations in a variety of human tissues, RegVar vastly surpasses all current non-coding variant prioritization methods in predicting regulatory variants under different circumstances. The unique features of RegVar make it an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a web server at https://regvar.omic.tech/.
Affiliation(s)
- Hao Lu
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China
| | - Luyu Ma
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China
| | - Cheng Quan
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China
| | - Lei Li
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China
| | - Yiming Lu
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China.
| | - Gangqiao Zhou
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China.
| | - Chenggang Zhang
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Beijing 100850, China.
5
An autoimmune pleiotropic SNP modulates IRF5 alternative promoter usage through ZBTB3-mediated chromatin looping. Nat Commun 2023; 14:1208. [PMID: 36869052 PMCID: PMC9984425 DOI: 10.1038/s41467-023-36897-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/17/2021] [Accepted: 02/22/2023] [Indexed: 03/05/2023]
Abstract
Genetic sharing is extensively observed for autoimmune diseases, but the causal variants and their underlying molecular mechanisms remain largely unknown. Through systematic investigation of autoimmune disease pleiotropic loci, we found most of these shared genetic effects are transmitted from regulatory code. We used an evidence-based strategy to functionally prioritize causal pleiotropic variants and identify their target genes. A top-ranked pleiotropic variant, rs4728142, yielded many lines of evidence as being causal. Mechanistically, the rs4728142-containing region interacts with the IRF5 alternative promoter in an allele-specific manner and orchestrates its upstream enhancer to regulate IRF5 alternative promoter usage through chromatin looping. A putative structural regulator, ZBTB3, mediates the allele-specific loop to promote IRF5-short transcript expression at the rs4728142 risk allele, resulting in IRF5 overactivation and M1 macrophage polarization. Together, our findings establish a causal mechanism between the regulatory variant and fine-scale molecular phenotype underlying the dysfunction of pleiotropic genes in human autoimmunity.
6
Zhou Y, Lauschke VM. Challenges Related to the Use of Next-Generation Sequencing for the Optimization of Drug Therapy. Handb Exp Pharmacol 2023; 280:237-260. [PMID: 35792943 DOI: 10.1007/164_2022_596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Over the last decade, next-generation sequencing (NGS) methods have become increasingly used in various areas of human genomics. In routine clinical care, their use is already implemented in oncology to profile the mutational landscape of a tumor, as well as in rare disease diagnostics. However, their utilization in pharmacogenomics is largely lagging behind. Recent population-scale genome data have revealed that human pharmacogenes carry a plethora of rare genetic variations that are not interrogated by conventional array-based profiling methods, and it is estimated that these variants could explain around 30% of the genetically encoded functional pharmacogenetic variability. To interpret the impact of such variants on drug response, a multitude of computational tools have been developed, but, while there have been major advancements, it remains to be shown in robust trials whether their accuracy is sufficient to improve personalized pharmacogenetic recommendations. In addition, conventional short-read sequencing methods face difficulties in the interrogation of complex pharmacogenes, and high NGS test costs require stringent evaluations of cost-effectiveness to decide about reimbursement by national healthcare programs. Here, we illustrate current challenges and discuss future directions toward the clinical implementation of NGS to inform genotype-guided decision-making.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden.
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany.
- University of Tuebingen, Tuebingen, Germany.
7
Dai H, Chu X, Liang Q, Wang M, Li L, Zhou Y, Zheng Z, Wang W, Wang Z, Li H, Wang J, Zheng H, Zhao Y, Liu L, Yao H, Luo M, Wang Q, Kang S, Li Y, Wang K, Song F, Zhang R, Wu X, Cheng X, Zhang W, Wei Q, Li MJ, Chen K. Genome-wide association and functional interrogation identified a variant at 3p26.1 modulating ovarian cancer survival among Chinese women. Cell Discov 2021; 7:121. [PMID: 34930913 PMCID: PMC8688503 DOI: 10.1038/s41421-021-00342-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/30/2021] [Accepted: 09/23/2021] [Indexed: 12/03/2022]
Abstract
Ovarian cancer survival varies considerably among patients, and germline variation may contribute to this variability in addition to mutational signatures. To identify genetic markers modulating ovarian cancer outcome, we performed a genome-wide association study in 2130 Chinese ovarian cancer patients and found a hitherto unrecognized locus at 3p26.1 to be associated with overall survival (P_combined = 8.90 × 10⁻¹⁰). Subsequent statistical fine-mapping, functional annotation, and eQTL mapping prioritized a likely causal SNP, rs9311399, in the non-coding regulatory region. Mechanistically, rs9311399 altered its enhancer activity through allele-specific transcription factor binding and a long-range interaction with the promoter of the lncRNA BHLHE40-AS1. Deletion of the rs9311399-associated enhancer resulted in expression changes in several oncogenic signaling pathway genes and a decrease in tumor growth. Thus, we have identified a novel genetic locus that is associated with ovarian cancer survival, possibly through long-range gene regulation of oncogenic pathways.
Affiliation(s)
- Hongji Dai
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Xinlei Chu
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Qian Liang
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Mengyun Wang
- Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Lian Li
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Yao Zhou
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Zhanye Zheng
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Wei Wang
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Zhao Wang
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Haixin Li
- Cancer Biobank, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Jianhua Wang
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Hong Zheng
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Yanrui Zhao
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Luyang Liu
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Hongcheng Yao
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China
| | - Menghan Luo
- Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Qiong Wang
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Shan Kang
- Department of Obstetrics and Gynaecology, Hebei Medical University, Fourth Hospital, Shijiazhuang, China
| | - Yan Li
- Department of Molecular Biology, Hebei Medical University, Fourth Hospital, Shijiazhuang, China
| | - Ke Wang
- Department of Gynecologic Oncology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Fengju Song
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Ruoxin Zhang
- Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xiaohua Wu
- Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xi Cheng
- Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Wei Zhang
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston-Salem, NC, USA; Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Qingyi Wei
- Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China; Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA.
| | - Mulin Jun Li
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China; Department of Pharmacology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.
| | - Kexin Chen
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
8
Yi X, Zheng Z, Xu H, Zhou Y, Huang D, Wang J, Feng X, Zhao K, Fan X, Zhang S, Dong X, Wang Z, Shen Y, Cheng H, Shi L, Li MJ. Interrogating cell type-specific cooperation of transcriptional regulators in 3D chromatin. iScience 2021; 24:103468. [PMID: 34888502 PMCID: PMC8634045 DOI: 10.1016/j.isci.2021.103468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/02/2021] [Revised: 09/23/2021] [Accepted: 11/12/2021] [Indexed: 12/14/2022]
Abstract
Context-specific activities of transcription regulators (TRs) in the nucleus modulate spatiotemporal gene expression precisely. Using the largest ChIP-seq data and chromatin loops in the human K562 cell line, we initially interrogated TR cooperation in 3D chromatin via a graphical model and revealed many known and novel TRs manipulating context-specific pathways. To explore TR cooperation across broad tissue/cell types, we systematically leveraged large-scale open chromatin profiles, computational footprinting, and high-resolution chromatin interactions to investigate tissue/cell type-specific TR cooperation. We first delineated a landscape of TR cooperation across 40 human tissue/cell types. Network modularity analyses uncovered the commonality and specificity of TR cooperation in different conditions. We also demonstrated that TR cooperation information can better interpret the disease-causal variants identified by genome-wide association studies and recapitulate cell states during neural development. Our study characterizes shared and unique patterns of TR cooperation associated with the cell type specificity of gene regulation in 3D chromatin.
Highlights
- Computational inference of transcriptional regulator (TR) cooperation in 3D chromatin
- A landscape of 3D TR cooperation across 40 human tissue/cell types
- TR cooperation can better interpret the disease-causal variants identified by GWAS
- Cooperation of certain TRs shapes context-specific gene regulation in cell development
Affiliation(s)
- Xianfu Yi
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin 300070, China; Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Zhanye Zheng
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hang Xu
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Dandan Huang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiangling Feng
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Ke Zhao
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Shijie Zhang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiaobao Dong
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Yujun Shen
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hui Cheng
- State Key Laboratory of Experimental Hematology, Chinese Academy of Medical Sciences, Tianjin 300070, China
| | - Lei Shi
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Mulin Jun Li
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China; Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China; Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
9
Dong S, Boyle AP. Prioritization of regulatory variants with tissue-specific function in the non-coding regions of human genome. Nucleic Acids Res 2021; 50:e6. [PMID: 34648033 PMCID: PMC8754628 DOI: 10.1093/nar/gkab924] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2021] [Revised: 09/21/2021] [Accepted: 09/27/2021] [Indexed: 02/06/2023]
Abstract
Understanding the functional consequences of genetic variation in the non-coding regions of the human genome remains a challenge. We introduce here a computational tool, TURF, to prioritize regulatory variants with tissue-specific function by leveraging evidence from functional genomics experiments, including over 3000 functional genomics datasets from the ENCODE project provided in the RegulomeDB database. TURF is able to generate prediction scores at both organism and tissue/organ-specific levels for any non-coding variant in the genome. We show that TURF achieves top overall prediction performance on validated variants from MPRA experiments. We also demonstrate how TURF can pick out the regulatory variants with tissue-specific function from a candidate list derived from association studies. Furthermore, we found that various GWAS traits showed enrichment of regulatory variants predicted by TURF scores in the trait-relevant organs, which indicates that these variants can be a valuable source for future studies.
Affiliation(s)
- Shengcheng Dong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Alan P Boyle
- To whom correspondence should be addressed. Tel: +1 734 763 7382; Fax: +1 734 763 7382;
10
Huang D, Zhou Y, Yi X, Fan X, Wang J, Yao H, Sham PC, Hao J, Chen K, Li MJ. VannoPortal: multiscale functional annotation of human genetic variants for interrogating molecular mechanism of traits and diseases. Nucleic Acids Res 2021; 50:D1408-D1416. [PMID: 34570217 PMCID: PMC8728305 DOI: 10.1093/nar/gkab853] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/07/2021] [Revised: 09/05/2021] [Accepted: 09/14/2021] [Indexed: 12/16/2022]
Abstract
Interpreting the molecular mechanism of genomic variations and their causal relationship with diseases/traits are important and challenging problems in human genetic studies. To provide comprehensive and context-specific variant annotations for biologists and clinicians, here, by systematically integrating over 4TB of genomic/epigenomic profiles and frequently used annotation databases from various biological domains, we develop a variant annotation database, called VannoPortal. In general, the database has the following major features: (i) systematically integrates 40 genome-wide variant annotations and prediction scores regarding allele frequency, linkage disequilibrium, evolutionary signature, disease/trait association, tissue/cell type-specific epigenome, base-wise functional prediction, allelic imbalance and pathogenicity; (ii) is equipped with our recent novel index system and parallel random-sweep searching algorithms for efficient management of backend databases and information extraction; (iii) greatly expands context-dependent variant annotation to incorporate large-scale epigenomic maps and regulatory profiles (such as EpiMap) across over 33 tissue/cell types; (iv) compiles many genome-scale base-wise prediction scores for regulatory/pathogenic variant classification beyond the protein-coding region; (v) enables fast retrieval and direct comparison of functional evidence among linked variants using a highly interactive web panel in addition to a plain table; (vi) introduces many visualization functions for more efficient identification and interpretation of functional variants in a single web page. VannoPortal is freely available at http://mulinlab.org/vportal.
Affiliation(s)
- Dandan Huang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Pak Chung Sham
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Jihui Hao
- Department of Pancreatic Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300060, China
- Mulin Jun Li
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.,Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300060, China
11
Huang D, Wang Z, Zhou Y, Liang Q, Sham PC, Yao H, Li MJ. vSampler: fast and annotation-based matched variant sampling tool. Bioinformatics 2021; 37:1915-1917. [PMID: 33270826 DOI: 10.1093/bioinformatics/btaa883] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2020] [Revised: 07/28/2020] [Accepted: 09/30/2020] [Indexed: 11/13/2022]
Abstract
SUMMARY Sampling of control variants having matched properties with input variants is widely used in enrichment analysis of genome-wide association studies/quantitative trait loci and negative data construction for pathogenic/regulatory variant prediction methods. Spurious enrichment results because of confounding factors, such as minor allele frequency and linkage disequilibrium pattern, can be avoided by calibration of statistical significance based on matched controls. Here, we present vSampler, which can generate sets of randomly drawn variants with comprehensive choices of matching properties, such as tissue/cell type-specific epigenomic features. Importantly, the development of a novel data structure and sampling algorithms for vSampler makes it significantly faster than existing tools. AVAILABILITY AND IMPLEMENTATION vSampler web server and local program are available at http://mulinlab.org/vsampler. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dandan Huang
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Qian Liang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Mulin Jun Li
- Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
12
Deng F, Zhao J, Jia W, Fu K, Zuo X, Huang L, Wang N, Xia H, Zhang Y, Fu W, Liu G. Increased hypospadias risk by GREM1 rs3743104[G] in the southern Han Chinese population. Aging (Albany NY) 2021; 13:13898-13908. [PMID: 33962391 PMCID: PMC8202882 DOI: 10.18632/aging.202983] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/20/2020] [Accepted: 03/23/2021] [Indexed: 02/07/2023]
Abstract
Hypospadias is a common congenital genitourinary malformation characterized by ventral opening of the urethral meatus. As a member of the bone morphogenic protein antagonist family, GREM1 has been identified as associated with susceptibility to hypospadias in the European population. The present study was designed to examine the relationship between replicated single-nucleotide polymorphisms (SNPs) and hypospadias in Asia's largest case-control study in the Southern Han Chinese population, involving 577 patients and 654 controls. Our results demonstrate that the GREM1 risk allele rs3743104[G] markedly increases the risk of mild/moderate and severe hypospadias (P<0.01, 0.28≤OR≤0.66). GTEx expression quantitative trait locus (eQTL) data revealed that the eQTL SNP rs3743104 has more associations with GREM1 targets in pituitary tissues. Additionally, bioinformatic analysis and luciferase assays identify miR-182 as a suppressor of GREM1 expression, likely through regulation of its binding affinity to the rs3743104 locus. In conclusion, the GREM1 risk allele rs3743104[G] increases hypospadias susceptibility in mild/moderate and severe cases among the southern Han population. rs3743104 regulates GREM1 expression by altering the binding affinity of miR-182 to this locus. Collectively, this study provides new evidence that GREM1 rs3743104 is associated with an increased risk of hypospadias. These findings provide a promising biomarker and merit further exploration.
Affiliation(s)
- Fuming Deng
- Department of Pediatric Urology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Jinglu Zhao
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Wei Jia
- Department of Pediatric Urology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Kai Fu
- Department of Pediatric Urology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Xiaoyu Zuo
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Lihua Huang
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Ning Wang
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Huiming Xia
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Yan Zhang
- Department of Pediatric Surgery, Guangzhou Institute of Pediatrics, Guangdong Provincial Key Laboratory of Research in Structural Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Wen Fu
- Department of Pediatric Urology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
- Guochang Liu
- Department of Pediatric Urology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, Guangdong, China
13
Huang D, Yi X, Zhou Y, Yao H, Xu H, Wang J, Zhang S, Nong W, Wang P, Shi L, Xuan C, Li M, Wang J, Li W, Kwan HS, Sham PC, Wang K, Li MJ. Ultrafast and scalable variant annotation and prioritization with big functional genomics data. Genome Res 2020; 30:1789-1801. [PMID: 33060171 PMCID: PMC7706736 DOI: 10.1101/gr.267997.120] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/28/2020] [Accepted: 09/22/2020] [Indexed: 02/06/2023]
Abstract
The advances of large-scale genomics studies have enabled compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
Affiliation(s)
- Dandan Huang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Hang Xu
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Jianhua Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Shijie Zhang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Wenyan Nong
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
- Panwen Wang
- Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, Arizona 85259, USA
- Lei Shi
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Chenghao Xuan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Miaoxin Li
- Center for Genome Research, Center for Precision Medicine, Zhongshan School of Medicine, First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Junwen Wang
- Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, Arizona 85259, USA
- Weidong Li
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hoi Shan Kwan
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
- Pak Chung Sham
- Centre of Genomics Sciences, Departments of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Mulin Jun Li
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.,Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
14
Wu Z, Ioannidis NM, Zou J. Predicting target genes of non-coding regulatory variants with IRT. Bioinformatics 2020; 36:4440-4448. [PMID: 32330225 DOI: 10.1093/bioinformatics/btaa254] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/18/2019] [Revised: 03/15/2020] [Accepted: 04/17/2020] [Indexed: 11/14/2022]
Abstract
SUMMARY Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influences on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. AVAILABILITY AND IMPLEMENTATION Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenqin Wu
- Department of Chemistry, Stanford University, CA 94305, USA.,Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Nilah M Ioannidis
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA.,Chan-Zuckerberg Biohub, San Francisco, 94158 CA, USA
15
Jiang D, Deng J, Dong C, Ma X, Xiao Q, Zhou B, Yang C, Wei L, Conran C, Zheng SL, Ng IOL, Yu L, Xu J, Sham PC, Qi X, Hou J, Ji Y, Cao G, Li M. Knowledge-based analyses reveal new candidate genes associated with risk of hepatitis B virus related hepatocellular carcinoma. BMC Cancer 2020; 20:403. [PMID: 32393195 PMCID: PMC7216662 DOI: 10.1186/s12885-020-06842-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/31/2019] [Accepted: 04/07/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Recent genome-wide association studies (GWASs) have suggested several susceptibility loci of hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) by statistical analysis at individual single-nucleotide polymorphisms (SNPs). However, these loci only explain a small fraction of HBV-related HCC heritability. In the present study, we aimed to identify additional susceptibility loci of HBV-related HCC using advanced knowledge-based analysis. METHODS We performed knowledge-based analysis (including gene- and gene-set-based association tests) on variant-level association p-values from two existing GWASs of HBV-related HCC. Five different types of gene-sets were collected for the association analysis. A number of SNPs within the genes prioritized by the knowledge-based association tests were selected to replicate genetic associations in an independent sample of 965 cases and 923 controls. RESULTS The gene-based association analysis detected four genes significantly or suggestively associated with HBV-related HCC risk: SLC39A8, GOLGA8M, SMIM31, and WHAMMP2. The gene-set-based association analysis prioritized two promising gene sets for HCC, cell cycle G1/S transition and NOTCH1 intracellular domain regulates transcription. Within the gene sets, three promising candidate genes (CDC45, NCOR1 and KAT2A) were further prioritized for HCC. Among genes of liver-specific expression, multiple genes previously implicated in HCC were also highlighted. However, probably due to small sample size, none of the genes prioritized by the knowledge-based association analyses were successfully replicated by variant-level association test in the independent sample. CONCLUSIONS This comprehensive knowledge-based association mining study suggested several promising genes and gene-sets associated with HBV-related HCC risk, which would facilitate follow-up functional studies on the pathogenic mechanism of HCC.
Affiliation(s)
- Deke Jiang
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Jiaen Deng
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong
- Xiaopin Ma
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Qianyi Xiao
- Center for Genomic Translational Medicine and Prevention, School of Public Health, Fudan University, Shanghai, China
- Bin Zhou
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Chou Yang
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Lin Wei
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA.,Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Carly Conran
- Program for Personalized Cancer Care, NorthShore University HealthSystem, Pritzker School of Medicine, University of Chicago, Evanston, IL, USA
- S Lilly Zheng
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Irene Oi-Lin Ng
- Department of Pathology, the University of Hong Kong, Pokfulam, Hong Kong.,State Key Laboratory of Liver Research, the University of Hong Kong, Pokfulam, Hong Kong
- Long Yu
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Jianfeng Xu
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Pak C Sham
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Xiaolong Qi
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Jinlin Hou
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yuan Ji
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Guangwen Cao
- Department of Epidemiology, Second Military Medical University, Shanghai, China
- Miaoxin Li
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong.,The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong.,State Key Laboratory for Cognitive and Brain Sciences, the University of Hong Kong, Pokfulam, Hong Kong.,Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.,Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China
16
Zhang S, He Y, Liu H, Zhai H, Huang D, Yi X, Dong X, Wang Z, Zhao K, Zhou Y, Wang J, Yao H, Xu H, Yang Z, Sham PC, Chen K, Li MJ. regBase: whole genome base-wise aggregation and functional prediction for human non-coding regulatory variants. Nucleic Acids Res 2020; 47:e134. [PMID: 31511901 PMCID: PMC6868349 DOI: 10.1093/nar/gkz774] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 08/01/2019] [Accepted: 08/29/2019] [Indexed: 12/19/2022]
Abstract
Predicting the functional or pathogenic regulatory variants in the human non-coding genome facilitates the interpretation of disease causation. While numerous prediction methods are available, their performance is inconsistent or restricted to specific tasks, which creates a need for comprehensive integration of those methods. Here, we compile whole genome base-wise aggregations, regBase, that incorporate largest prediction scores. Building on different assumptions of causality, we train three composite models to score functional, pathogenic and cancer driver non-coding regulatory variants respectively. We demonstrate the superior and stable performance of our models using independent benchmarks and show great success in fine-mapping causal regulatory variants at specific loci or at base-wise resolution. We believe that the regBase database, together with the three composite models, will be useful in different areas of human genetic studies, such as annotation-based causal variant fine-mapping, pathogenic variant discovery and cancer driver mutation identification. regBase is freely available at https://github.com/mulinlab/regBase.
Affiliation(s)
- Shijie Zhang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Yukun He
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Huanhuan Liu
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Haoyu Zhai
- Department of Computer Science, University of Illinois Urbana-Champaign, IL, USA
- Dandan Huang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Xiaobao Dong
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Zhao Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Ke Zhao
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Yao Zhou
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Jianhua Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Hongcheng Yao
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hang Xu
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Zhenglu Yang
- College of Computer Science, Nankai University, Tianjin, China
- Pak Chung Sham
- Centre of Genomics Sciences, State Key Laboratory of Brain and Cognitive Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.,Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
17
Chimusa ER, Dalvie S, Dandara C, Wonkam A, Mazandu GK. Post genome-wide association analysis: dissecting computational pathway/network-based approaches. Brief Bioinform 2020; 20:690-700. [PMID: 29701762 DOI: 10.1093/bib/bby035] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/02/2018] [Revised: 04/04/2018] [Indexed: 02/02/2023]
Abstract
Thousands of genetic associations with diseases have been identified by genome-wide association studies (GWASs), which are conceptually a single-marker-based approach. There are potentially many uses of these identified variants, including a better understanding of the pathogenesis of diseases, new leads for studying underlying risk prediction and clinical prediction of treatment. However, because of inadequate power, GWAS might miss disease genes and/or pathways with weak genetic or strong epistatic effects. Driven by the need to extract useful information from GWAS summary statistics, post-GWAS approaches (PGAs) were introduced. Here, we dissect and discuss advances made in pathway/network-based PGAs, with a particular focus on protein-protein interaction networks that leverage GWAS summary statistics by combining effects of multiple loci, subnetworks or pathways to detect genetic signals associated with complex diseases. We conclude with a discussion of research areas where further work on summary statistic-based methods is needed.
Affiliation(s)
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Level 3, Wernher and Beit North, Private Bag, Rondebosch, 7700, Anzio road, Observatory Cape Town, South Africa
- Shareefa Dalvie
- Department of Psychiatry and Mental Health, University of Cape Town, Observatory, 7925, Cape Town, South Africa
- Collet Dandara
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Private Bag, Rondebosch, 7700, Cape Town, South Africa
- Ambroise Wonkam
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Private Bag, Rondebosch, 7700, Cape Town, South Africa
- Gaston K Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Private Bag, Rondebosch, 7700, Cape Town, South Africa; African Institute for Mathematical Sciences, 7945 Muizenberg, Cape Town, South Africa and Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Anzio Road, Observatory, 7925, Cape Town, South Africa
18
Zheng Z, Huang D, Wang J, Zhao K, Zhou Y, Guo Z, Zhai S, Xu H, Cui H, Yao H, Wang Z, Yi X, Zhang S, Sham PC, Li MJ. QTLbase: an integrative resource for quantitative trait loci across multiple human molecular phenotypes. Nucleic Acids Res 2020; 48:D983-D991. [PMID: 31598699 PMCID: PMC6943073 DOI: 10.1093/nar/gkz888] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 08/15/2019] [Revised: 09/24/2019] [Accepted: 10/02/2019] [Indexed: 12/20/2022]
Abstract
Recent advances in genome sequencing and functional genomic profiling have promoted many large-scale quantitative trait locus (QTL) studies, which connect genotypes with tissue/cell type-specific cellular functions from transcriptional to post-translational level. However, no comprehensive resource can perform QTL lookup across multiple molecular phenotypes and investigate the potential cascade effect of functional variants. We developed a versatile resource, named QTLbase, for interpreting the possible molecular functions of genetic variants, as well as their tissue/cell-type specificity. Overall, QTLbase has five key functions: (i) curating and compiling genome-wide QTL summary statistics for 13 human molecular traits from 233 independent studies; (ii) mapping QTL-relevant tissue/cell types to 78 unified terms according to a standard anatomogram; (iii) normalizing variant and trait information uniformly, yielding >170 million significant QTLs; (iv) providing a rich web client that enables phenome- and tissue-wise visualization; and (v) integrating the most comprehensive genomic features and functional predictions to annotate the potential QTL mechanisms. QTLbase provides a one-stop shop for QTL retrieval and comparison across multiple tissues and multiple layers of molecular complexity, and will greatly help researchers interrogate the biological mechanism of causal variants and guide the direction of functional validation. QTLbase is freely available at http://mulinlab.org/qtlbase.
Affiliation(s)
- Zhanye Zheng
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Dandan Huang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Jianhua Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Ke Zhao
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Zhenyang Guo
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Sinan Zhai
- School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Hang Xu
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Hui Cui
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Shijie Zhang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Pak Chung Sham
- Centre of Genomics Sciences, State Key Laboratory of Brain and Cognitive Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR 999077, China
- Mulin Jun Li
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
19
Li M, Jiang L, Mak TSH, Kwan JSH, Xue C, Chen P, Leung HCM, Cui L, Li T, Sham PC. A powerful conditional gene-based association approach implicated functionally important genes for schizophrenia. Bioinformatics 2019; 35:628-635. [PMID: 30101339 DOI: 10.1093/bioinformatics/bty682] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/04/2018] [Revised: 06/27/2018] [Accepted: 08/06/2018] [Indexed: 02/05/2023]
Abstract
MOTIVATION: It remains challenging to unravel new susceptibility genes of complex diseases and their mechanisms in genome-wide association studies. There are at least two difficulties: isolating the genuine susceptibility genes from many indirectly associated genes, and functionally validating these genes. RESULTS: We first proposed a novel conditional gene-based association test which can use only summary statistics to isolate independently associated genes of a disease. Applying this method, we detected 185 genes of independent association with schizophrenia. We then designed an in-silico experiment based on expression/co-expression to systematically validate the pathogenic potential of these genes. We found that genes of independent association with schizophrenia formed more co-expression pairs in normal post-natal but not pre-natal human brain regions than expected. Interestingly, no co-expression enrichment was found in the brain regions of schizophrenia patients. The genes with independent association also had more significant P-values for differential expression between schizophrenia patients and controls in the brain regions. In contrast, indirectly associated genes or genes identified by other widely used gene-based tests had no such differential expression and co-expression patterns. In summary, this conditional gene-based association test is effective for isolating directly associated genes from indirectly associated genes, and the results suggest that common variants might contribute to schizophrenia largely by distorting expression and co-expression in post-natal brains. AVAILABILITY AND IMPLEMENTATION: The conditional gene-based association test has been implemented in a platform 'KGG' in Java and is publicly available at http://grass.cgs.hku.hk/limx/kgg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Miaoxin Li
- Zhongshan School of Medicine, First Affiliated Hospital, Center for Genome Research, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China.,The Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China.,Department of Psychiatry, The University of Hong Kong, Pokfulam, Hong Kong, China.,State Key Laboratory for Cognitive and Brain Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China.,Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, Hong Kong, China
- Lin Jiang
- Zhongshan School of Medicine, First Affiliated Hospital, Center for Genome Research, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China.,The Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Timothy Shin Heng Mak
- The Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Chao Xue
- Zhongshan School of Medicine, First Affiliated Hospital, Center for Genome Research, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Peikai Chen
- The Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China.,School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Henry Chi-Ming Leung
- Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China
- Liqian Cui
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Tao Li
- The Mental Health Center and the Psychiatric Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Pak Chung Sham
- The Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China.,Department of Psychiatry, The University of Hong Kong, Pokfulam, Hong Kong, China.,State Key Laboratory for Cognitive and Brain Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
20
Rojano E, Seoane P, Ranea JAG, Perkins JR. Regulatory variants: from detection to predicting impact. Brief Bioinform 2019; 20:1639-1654. [PMID: 29893792 PMCID: PMC6917219 DOI: 10.1093/bib/bby039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2018] [Revised: 04/18/2018] [Indexed: 02/01/2023]
Abstract
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
Affiliation(s)
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Juan A G Ranea
- CIBER de Enfermedades Raras, ISCIII, Madrid, Spain and Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- James R Perkins
- Research laboratory, IBIMA-Regional University Hospital of Malaga, UMA, Malaga 29009, Spain
21
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step towards precision medicine. Over one hundred tools/resources have been developed specifically for this purpose. We summarize these tools, as well as their characteristics, in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools, and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
22
Huang D, Yi X, Zhang S, Zheng Z, Wang P, Xuan C, Sham PC, Wang J, Li MJ. GWAS4D: multidimensional analysis of context-specific regulatory variant for human complex diseases and traits. Nucleic Acids Res 2019; 46:W114-W120. [PMID: 29771388 PMCID: PMC6030885 DOI: 10.1093/nar/gky407] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 02/18/2018] [Accepted: 05/03/2018] [Indexed: 01/04/2023]
Abstract
Genome-wide association studies have identified thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: (i) an updated regulatory variant prioritization method based on our new algorithm; (ii) 127 tissue/cell type-specific epigenomes; (iii) motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processed Hi-C data, with significant interactions called at 5 kb resolution across 60 tissues/cell types; (v) comprehensive non-coding variant functional annotations; and (vi) a highly interactive visualization function for SNP-target interaction. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants.
Affiliation(s)
- Dandan Huang
- Department of Pharmacology, Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Shijie Zhang
- Department of Pharmacology, Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Zhanye Zheng
- Department of Pharmacology, Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Panwen Wang
- Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, USA
- Chenghao Xuan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Pak Chung Sham
- Center for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China.,Departments of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.,State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
- Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, USA.,Department of Biomedical Informatics, Arizona State University, Scottsdale, USA
- Mulin Jun Li
- Department of Pharmacology, Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
23
Xu J, Li Q, Qin W, Jun Li M, Zhuo C, Liu H, Liu F, Wang J, Schumann G, Yu C. Neurobiological substrates underlying the effect of genomic risk for depression on the conversion of amnestic mild cognitive impairment. Brain 2019; 141:3457-3471. [PMID: 30445590 DOI: 10.1093/brain/awy277] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/12/2018] [Accepted: 09/12/2018] [Indexed: 12/28/2022]
Abstract
Depression increases the risk of conversion from amnestic mild cognitive impairment to Alzheimer's disease through unknown mechanisms. We hypothesize that the cumulative genomic risk for major depressive disorder may be a candidate cause for the increased conversion risk. Here, we aimed to investigate the predictive effect of the polygenic risk scores of major depressive disorder-specific genetic variants (PRSsMDD) on the conversion from non-depressed amnestic mild cognitive impairment to Alzheimer's disease, and its underlying neurobiological mechanisms. The PRSsMDD could predict the conversion from amnestic mild cognitive impairment to Alzheimer's disease, and amnestic mild cognitive impairment patients with high risk scores showed a 16.25% higher conversion rate than those with low risk scores. The PRSsMDD was correlated with the left hippocampal volume, which was found to mediate the predictive effect of the PRSsMDD on the conversion of amnestic mild cognitive impairment. The major depressive disorder-specific genetic variants were mapped to genes using different strategies, and enrichment analyses and protein-protein interaction network analysis then revealed that these genes were involved in developmental processes and amyloid-beta binding. They showed temporal-specific expression in the hippocampus in middle and late foetal developmental periods. Cell type-specific expression analysis of these genes demonstrated significant over-representation in the pyramidal neurons and interneurons in the hippocampus. These cross-scale neurobiological analyses and functional annotations indicate that major depressive disorder-specific genetic variants may increase the conversion from amnestic mild cognitive impairment to Alzheimer's disease by modulating early hippocampal development and amyloid-beta binding. The PRSsMDD could be used as a complementary measure to select patients with amnestic mild cognitive impairment who are at high risk of conversion to Alzheimer's disease.
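As a rough illustration of what a polygenic risk score such as the PRSsMDD computes, the sketch below sums risk-allele dosages weighted by per-variant effect sizes. The function name, the weights and the dosages are hypothetical values chosen for illustration; they are not taken from the study, whose variant selection and weighting are not described in this abstract.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages over all variants.

    dosages: risk-allele counts per variant (0, 1 or 2) for one individual.
    weights: per-variant effect sizes, e.g. GWAS log odds ratios (assumed).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants and one individual's genotype.
weights = [0.5, 0.25, -0.125]
person = [2, 1, 0]

score = polygenic_score(person, weights)
print(score)
```

In practice, individuals would then be ranked or grouped (for example into high- and low-risk strata) by this score before comparing outcomes between groups.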
Affiliation(s)
- Jiayuan Xu
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China
- Qiaojun Li
- College of Information Engineering, Tianjin University of Commerce, Tianjin, P.R. China
- Wen Qin
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China
- Mulin Jun Li
- Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Department of Pharmacology, Tianjin Medical University, Tianjin, P.R. China
- Chuanjun Zhuo
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China.,Department of Psychiatry Functional Neuroimaging Laboratory, Tianjin Mental Health Center, Tianjin Anding Hospital, Tianjin, P.R. China
- Huaigui Liu
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China
- Feng Liu
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China
- Junping Wang
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China
- Gunter Schumann
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,Medical Research Council Social, Genetic and Developmental Psychiatry Centre, London, UK
- Chunshui Yu
- Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, P.R. China.,CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, P.R. China
24
Wang C, Zhang S. Large-scale determination and characterization of cell type-specific regulatory elements in the human genome. J Mol Cell Biol 2019; 9:463-476. [PMID: 29281093 DOI: 10.1093/jmcb/mjx058] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/28/2017] [Accepted: 12/19/2017] [Indexed: 01/05/2023]
Abstract
Histone modifications have been widely shown to play vital roles in gene regulation and cell identity. The Roadmap Epigenomics Consortium generated a reference catalog of several key histone modifications across hundreds of human cell types and tissues. Decoding these epigenomes into functional regulatory elements is a challenging task in computational biology. To this end, we adopted a differential chromatin modification analysis framework to comprehensively determine and characterize cell type-specific regulatory elements (CSREs) and their histone modification codes in the human epigenomes of five histone modifications across 127 tissues or cell types. The CSREs show significant relevance to cell type-specific biological functions, diseases and cell identity. Clustering of CSREs with their specificity signals reveals distinct histone codes, demonstrating the diversity of functional roles of CSREs within the same cell or tissue. Finally, the dynamics of CSREs from closely related cell types or tissues can give a detailed view of developmental processes such as normal tissue development and cancer occurrence.
Affiliation(s)
- Can Wang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.,School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.,School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
25
26
Ioannidis NM, Davis JR, DeGorter MK, Larson NB, McDonnell SK, French AJ, Battle AJ, Hastie TJ, Thibodeau SN, Montgomery SB, Bustamante CD, Sieh W, Whittemore AS. FIRE: functional inference of genetic variants that regulate gene expression. Bioinformatics 2018; 33:3895-3901. [PMID: 28961785 DOI: 10.1093/bioinformatics/btx534] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/14/2017] [Accepted: 08/23/2017] [Indexed: 12/18/2022]
Abstract
Motivation: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. Results: We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. Availability and implementation: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. Contact: nilah@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marianne K DeGorter
- Department of Genetics
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Amy J French
- Department of Laboratory Medicine & Pathology, Mayo Clinic, Rochester, MN 55905, USA
- Alexis J Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Trevor J Hastie
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stephen N Thibodeau
- Department of Laboratory Medicine & Pathology, Mayo Clinic, Rochester, MN 55905, USA
- Stephen B Montgomery
- Department of Genetics
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Carlos D Bustamante
- Department of Genetics
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Weiva Sieh
- Department of Health Research & Policy
- Department of Population Health Science & Policy
- Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Alice S Whittemore
- Department of Health Research & Policy
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
27
Abstract
Defects in chromatin modifiers and remodelers have been described both for hematological and solid malignancies, corroborating and strengthening the role of epigenetic aberrations in the etiology of cancer. Furthermore, epigenetic marks (DNA methylation, histone modifications, chromatin remodeling, and microRNA) can be considered potential markers of cancer development and progression. Here, we review whether altered epigenetic landscapes are merely a consequence of chromatin modifier/remodeler aberrations or a hallmark of cancer etiology. We critically evaluate current knowledge on causal epigenetic aberrations and examine to what extent the prioritization of (epi)genetic deregulations can be assessed in cancer as some type of genetic lesion characterizing solid cancer progression. We also discuss the multiple challenges in developing compounds targeting epigenetic enzymes (named epidrugs) for epigenetic-based therapies. The implementation of acquired knowledge of epigenetic biomarkers for patient stratification, together with the development of next-generation epidrugs and predictive models, will take our understanding and use of cancer epigenetics in diagnosis, prognosis, and treatment of cancer patients to a new level.
Affiliation(s)
- Angela Nebbioso
- Dipartimento di Medicina di Precisione, Università degli Studi della Campania "L. Vanvitelli," Napoli, Italy
- Francesco Paolo Tambaro
- Struttura Semplice Dipartimentale Trapianto di Midollo Osseo-Azienda Ospedaliera di Rilievo Nazionale, Santobono-Pausilipon, Napoli, Italy
- Carmela Dell'Aversana
- Dipartimento di Medicina di Precisione, Università degli Studi della Campania "L. Vanvitelli," Napoli, Italy
- Lucia Altucci
- Dipartimento di Medicina di Precisione, Università degli Studi della Campania "L. Vanvitelli," Napoli, Italy
28
Backenroth D, He Z, Kiryluk K, Boeva V, Pethukova L, Khurana E, Christiano A, Buxbaum JD, Ionita-Laza I. FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications. Am J Hum Genet 2018; 102:920-942. [PMID: 29727691 PMCID: PMC5986983 DOI: 10.1016/j.ajhg.2018.03.026] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 12/04/2017] [Accepted: 03/21/2018] [Indexed: 10/17/2022]
Abstract
We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).
Affiliation(s)
- Daniel Backenroth
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Zihuai He
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Krzysztof Kiryluk
- Department of Medicine, Columbia University, New York, NY 10032, USA
- Valentina Boeva
- INSERM, U900, 75005 Paris, France; Institut Curie, Mines ParisTech, PSL Research University, 75005 Paris, France
- Lynn Pethukova
- Department of Epidemiology, Columbia University, New York, NY 10032, USA; Department of Dermatology, Columbia University, New York, NY 10032, USA
- Ekta Khurana
- Department of Physiology and Biophysics, Weill Medical College, Cornell University, New York, NY 10021, USA
- Angela Christiano
- Department of Dermatology, Columbia University, New York, NY 10032, USA; Department of Genetics and Development, Columbia University, New York, NY 10032, USA
- Joseph D Buxbaum
- Departments of Psychiatry, Neuroscience, and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
29
Ying D, Li MJ, Sham PC, Li M. A powerful approach reveals numerous expression quantitative trait haplotypes in multiple tissues. Bioinformatics 2018; 34:3145-3150. [DOI: 10.1093/bioinformatics/bty318] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/01/2017] [Accepted: 04/25/2018] [Indexed: 12/21/2022]
Affiliation(s)
- Dingge Ying
- Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Mulin Jun Li
- Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Pak Chung Sham
- Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Miaoxin Li
- Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
- Zhongshan School of Medicine, Center for Disease Genomics, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China
30
Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018; 137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Over a decade of genome-wide association studies has made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are generally functionally uncharacterized. Now, the availability of large-scale tissue and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in-silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Affiliation(s)
- Phil H Lee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christian Lee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Department of Life Sciences, Harvard University, Cambridge, MA, USA
- Xihao Li
- Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian Wee
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Tushar Dwivedi
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Mark Daly
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
31
Han J, Li J, Achour I, Pesce L, Foster I, Li H, Lussier YA. Convergent downstream candidate mechanisms of independent intergenic polymorphisms between co-classified diseases implicate epistasis among noncoding elements. Pac Symp Biocomput 2018; 23:524-535. [PMID: 29218911 PMCID: PMC5730078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Eighty percent of DNA outside protein coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR ≤ 5%, we prioritized 3,870 intergenic SNP pairs associated, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized as compared to those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ∼3.8; p ≤ 10⁻²⁵). The prioritized pairs were also enriched in regions bound to the same/interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ∼2,500; p ≤ 10⁻²⁵).
This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease_class.
Affiliation(s)
- Jiali Han
- Center for Biomedical Informatics and Biostatistics (CB2) and Departments of Medicine and of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA