1
Mackay TFC, Anholt RRH. Pleiotropy, epistasis and the genetic architecture of quantitative traits. Nat Rev Genet 2024; 25:639-657. [PMID: 38565962 PMCID: PMC11330371 DOI: 10.1038/s41576-024-00711-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 04/04/2024]
Abstract
Pleiotropy (whereby one genetic polymorphism affects multiple traits) and epistasis (whereby non-linear interactions between genetic polymorphisms affect the same trait) are fundamental aspects of the genetic architecture of quantitative traits. Recent advances in the ability to characterize the effects of polymorphic variants on molecular and organismal phenotypes in human and model organism populations have revealed the prevalence of pleiotropy and unexpected shared molecular genetic bases among quantitative traits, including diseases. By contrast, epistasis is common between polymorphic loci associated with quantitative traits in model organisms, such that alleles at one locus have different effects in different genetic backgrounds, but is rarely observed for human quantitative traits and common diseases. Here, we review the concepts and recent inferences about pleiotropy and epistasis, and discuss factors that contribute to similarities and differences between the genetic architecture of quantitative traits in model organisms and humans.
Affiliation(s)
- Trudy F C Mackay
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Robert R H Anholt
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
2
Ng JWY, Felix JF, Olson DM. A novel approach to risk exposure and epigenetics-the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health. BMC Med 2023; 21:466. [PMID: 38012757 PMCID: PMC10683259 DOI: 10.1186/s12916-023-03168-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Each mother-child dyad represents a unique combination of genetic and environmental factors. This constellation of variables impacts the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority as such traits predispose to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles. METHODS We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454, Avon Longitudinal Study of Parents and Children (ALSPAC)). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at age 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686). RESULTS We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared to previous literature, DNAm patterns exhibited novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures. CONCLUSIONS To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm. 
This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.
Affiliation(s)
- Jane W Y Ng
  - Department of Pediatrics, Cumming School of Medicine, University of Calgary, 28 Oki Drive NW, Calgary, AB, T3B 6A8, Canada
- Janine F Felix
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Postbus, 2040, 3000 CA, Rotterdam, The Netherlands
  - Department of Pediatrics, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- David M Olson
  - Departments of Obstetrics and Gynecology, Physiology, and Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, 220 HMRC, Edmonton, AB, T6G2S2, Canada
3
Yao Q, Gorevic P, Shen B, Gibson G. Genetically transitional disease: a new concept in genomic medicine. Trends Genet 2023; 39:98-108. [PMID: 36564319 DOI: 10.1016/j.tig.2022.11.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Revised: 11/02/2022] [Accepted: 11/27/2022] [Indexed: 12/24/2022]
Abstract
Traditional classification of genetic diseases as monogenic and polygenic has lagged far behind scientific progress. In this opinion article, we propose and define a new terminology, genetically transitional disease (GTD), referring to cases where a large-effect mutation is necessary, but not sufficient, to cause disease. This leads to a working disease nosology based on gradients of four types of genetic architecture: monogenic, polygenic, GTD, and mixed. We present four scenarios under which GTD may occur; namely, subsets of traditionally Mendelian disease, modifiable Tier 1 monogenic conditions, variable penetrance, and situations where a genetic mutational spectrum produces qualitatively divergent pathologies. The implications of the new nosology in precision medicine are discussed, in which therapeutic options may target the molecular cause or the disease phenotype.
Affiliation(s)
- Qingping Yao
  - Division of Rheumatology, Allergy, and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, USA
- Peter Gorevic
  - Division of Rheumatology, Allergy, and Immunology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, USA
- Bo Shen
  - Center for Inflammatory Bowel Diseases, New York-Presbyterian/Columbia University Irving Medical Center, New York, NY, USA
- Greg Gibson
  - Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
4
Xiao J, Cai M, Yu X, Hu X, Chen G, Wan X, Yang C. Leveraging the local genetic structure for trans-ancestry association mapping. Am J Hum Genet 2022; 109:1317-1337. [PMID: 35714612 PMCID: PMC9300880 DOI: 10.1016/j.ajhg.2022.05.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/30/2022] [Accepted: 05/23/2022] [Indexed: 01/09/2023] Open
Abstract
Over the past two decades, genome-wide association studies (GWASs) have successfully advanced our understanding of the genetic basis of complex traits. Despite the fruitful discovery of GWASs, most GWAS samples are collected from European populations, and these GWASs are often criticized for their lack of ancestry diversity. Trans-ancestry association mapping (TRAM) offers an exciting opportunity to fill the gap of disparities in genetic studies between non-Europeans and Europeans. Here, we propose a statistical method, LOG-TRAM, to leverage the local genetic architecture for TRAM. By using biobank-scale datasets, we showed that LOG-TRAM can greatly improve the statistical power of identifying risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to the GWAS summary statistics of various complex traits/diseases from BioBank Japan, UK Biobank, and African populations. We obtained substantial gains in power and achieved effective correction of confounding biases in TRAM. Finally, we showed that LOG-TRAM can be successfully applied to identify ancestry-specific loci and the LOG-TRAM output can be further used for construction of more accurate polygenic risk scores in under-represented populations.
Affiliation(s)
- Jiashun Xiao
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xinyi Yu
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xianghong Hu
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiang Wan
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Pazhou Lab, Guangzhou 510330, China
- Can Yang
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
5
Xiao J, Cai M, Hu X, Wan X, Chen G, Yang C. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics 2022; 38:1947-1955. [PMID: 35040939 DOI: 10.1093/bioinformatics/btac029] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/20/2021] [Revised: 11/16/2021] [Accepted: 01/12/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION With increasing sample sizes in genome-wide association studies (GWASs), polygenic risk scores (PRSs) have shown great potential in personalized medicine for disease risk prediction, prevention and treatment. However, a PRS constructed using European samples becomes less accurate when it is applied to individuals from non-European populations. It is an urgent task to improve the accuracy of PRSs in under-represented populations, such as African populations and East Asian populations. RESULTS In this article, we propose a cross-population and cross-phenotype (XPXP) method for construction of PRSs in under-represented populations. XPXP can construct accurate PRSs by leveraging biobank-scale datasets in European populations and multiple GWASs of genetically correlated phenotypes. XPXP also allows incorporation of population-specific and phenotype-specific effects, and thus further improves the accuracy of PRSs. Through comprehensive simulation studies and real data analysis, we demonstrated that XPXP outperformed existing PRS approaches. We showed that the height PRSs constructed by XPXP achieved 9% and 18% improvement over the runner-up method in terms of predicted R² in East Asian and African populations, respectively. We also showed that XPXP substantially improved the stratification ability in identifying individuals at high genetic risk of type 2 diabetes. AVAILABILITY AND IMPLEMENTATION The XPXP software and all analysis code are available at github.com/YangLabHKUST/XPXP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiashun Xiao
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xianghong Hu
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xiang Wan
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Pazhou Lab, Guangzhou 510330, China
- Gang Chen
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Can Yang
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
6
Sitlani CM, Baldassari AR, Highland HM, Hodonsky CJ, McKnight B, Avery CL. Comparison of adaptive multiple phenotype association tests using summary statistics in genome-wide association studies. Hum Mol Genet 2021; 30:1371-1383. [PMID: 33949650 PMCID: PMC8283209 DOI: 10.1093/hmg/ddab126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/15/2022] Open
Abstract
Genome-wide association studies have been successful in mapping loci for individual phenotypes, but few studies have comprehensively interrogated evidence of shared genetic effects across multiple phenotypes simultaneously. Statistical methods have been proposed for analyzing multiple phenotypes using summary statistics, which enables studies of shared genetic effects while avoiding challenges associated with individual-level data sharing. Adaptive tests have been developed to maintain power against multiple alternative hypotheses because the most powerful single-alternative test depends on the underlying structure of the associations between the multiple phenotypes and a single nucleotide polymorphism (SNP). Here we compare the performance of six such adaptive tests: two adaptive sum of powered scores (aSPU) tests, the unified score association test (metaUSAT), the adaptive test in a mixed-models framework (mixAda) and two principal-component-based adaptive tests (PCAQ and PCO). Our simulations highlight practical challenges that arise when multivariate distributions of phenotypes do not satisfy assumptions of multivariate normality. Previous reports in this context focus on low minor allele count (MAC) and omit the aSPU test, which relies less than other methods on asymptotic and distributional assumptions. When these assumptions are not satisfied, particularly when MAC is low and/or phenotype covariance matrices are singular or nearly singular, aSPU better preserves type I error, sometimes at the cost of decreased power. We illustrate this trade-off with multiple phenotype analyses of six quantitative electrocardiogram traits in the Population Architecture using Genomics and Epidemiology (PAGE) study.
Affiliation(s)
- Colleen M Sitlani
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101 USA
- Antoine R Baldassari
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
- Heather M Highland
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
- Chani J Hodonsky
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908 USA
- Barbara McKnight
  - Department of Biostatistics, University of Washington, Seattle, WA 98195 USA
- Christy L Avery
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
7
Li T, Ning Z, Yang Z, Zhai R, Zheng C, Xu W, Wang Y, Ying K, Chen Y, Shen X. Total genetic contribution assessment across the human genome. Nat Commun 2021; 12:2845. [PMID: 33990588 PMCID: PMC8121943 DOI: 10.1038/s41467-021-23124-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/11/2020] [Accepted: 04/08/2021] [Indexed: 01/11/2023] Open
Abstract
Quantifying the overall magnitude of every single locus' genetic effect on the widely measured human phenome is a great challenge. We introduce a unified modelling technique that can consistently provide a total genetic contribution assessment (TGCA) of a gene or genetic variant without thresholding genetic association signals. Genome-wide TGCA in five UK Biobank phenotype domains highlights loci such as the HLA locus for medical conditions, the bone mineral density locus WNT16 for physical measures, and the skin tanning locus MC1R and smoking behaviour locus CHRNA3 for lifestyle. Tissue-specificity investigation reveals several tissues associated with total genetic contributions, including the brain tissues for mental health. Such associations are driven by tissue-specific gene expression, which shares a genetic basis with the total genetic contributions. TGCA can provide a genome-wide atlas for the overall genetic contributions in each particular domain of human complex traits.
Affiliation(s)
- Ting Li
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Zheng Ning
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Zhijian Yang
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Ranran Zhai
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Chenqing Zheng
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Wenzheng Xu
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yipeng Wang
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Kejun Ying
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Yiwen Chen
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Xia Shen
  - Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
8
Broc C, Truong T, Liquet B. Penalized partial least squares for pleiotropy. BMC Bioinformatics 2021; 22:86. [PMID: 33627076 PMCID: PMC7905667 DOI: 10.1186/s12859-021-03968-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/27/2020] [Accepted: 01/14/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene-level and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. RESULTS Our method has the advantage of providing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants for breast and thyroid cancers. CONCLUSION The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an exposed a priori architecture in other application fields.
Affiliation(s)
- Camilo Broc
  - LIST, CEA, Laboratory for Data Sciences and Decision (Digiteo), Gif-sur-Yvette, France
  - CNRS, Laboratoire de Mathématiques et de leurs Applications de PAU E2S UPPA, Pau, France
- Therese Truong
  - UVSQ, Inserm, CESP, Université Paris-Saclay, 94807 Villejuif, France
  - Institut Gustave Roussy, 94805 Villejuif, France
- Benoit Liquet
  - CNRS, Laboratoire de Mathématiques et de leurs Applications de PAU E2S UPPA, Pau, France
  - Department of Mathematics and Statistics, Macquarie University, Sydney, Australia
9
Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including gain in power and better understanding of genetic etiology. However, when the phenotypes are of discordant types such as binary and continuous, the joint modeling is more challenging. Another research area of current interest is discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL) and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only and (2) strongly positively correlated and the target haplotype affects both phenotypes in a positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes including a rare one.
Affiliation(s)
- Xiaochen Yuan
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
10
Zinski AL, Carrion S, Michal JJ, Gartstein MA, Quock RM, Davis JF, Jiang Z. Genome-to-phenome research in rats: progress and perspectives. Int J Biol Sci 2021; 17:119-133. [PMID: 33390838 PMCID: PMC7757052 DOI: 10.7150/ijbs.51628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/06/2020] [Accepted: 10/06/2020] [Indexed: 01/07/2023] Open
Abstract
Because of their relatively short lifespan (<4 years), rats have become the second most used model organism to study health and diseases in humans, who may live for up to 120 years. First-, second- and third-generation sequencing technologies and platforms have produced increasingly greater sequencing depth and more accurate reads, leading to significant advancements in the rat genome assembly during the last 20 years. In fact, whole genome sequencing (WGS) of 47 strains has been completed. This has led to the discovery of genome variants in rats, which have been widely used to detect quantitative trait loci underlying complex phenotypes based on gene, haplotype, and sweep association analyses. DNA variants can also reveal strain, chromosome and gene functional evolution. In parallel, phenome programs have advanced significantly in rats during the last 15 years and more than 10 databases host genome and/or phenome information. In order to discover the bridges between genome and phenome, systems genetics and integrative genomics approaches have been developed. On the other hand, multiple-level information transfers from genome to phenome are executed by differential usage of alternative transcriptional start (ATS) and polyadenylation (APA) sites per gene. We used our own experiments to demonstrate how alternative transcriptome analysis can lead to enrichment of phenome-related causal pathways in rats. Development of advanced genome-to-phenome assays will certainly enhance rats as models for human biomedical research.
Affiliation(s)
- Amy L. Zinski
  - Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
- Shane Carrion
  - Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
- Jennifer J. Michal
  - Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
- Maria A. Gartstein
  - Department of Psychology, Washington State University, Pullman, WA 99164-4820
- Raymond M. Quock
  - Department of Psychology, Washington State University, Pullman, WA 99164-4820
- Jon F. Davis
  - Department of Integrative Physiology and Neuroscience, Washington State University, Pullman, WA 99164-7620
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620
11
Sha Q, Lyu J, Zhao M, Li H, Guo M, Sun Q. Multi-Omics Analysis of Diabetic Nephropathy Reveals Potential New Mechanisms and Drug Targets. Front Genet 2020; 11:616435. [PMID: 33362869 PMCID: PMC7759603 DOI: 10.3389/fgene.2020.616435] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/12/2020] [Accepted: 11/23/2020] [Indexed: 12/21/2022] Open
Abstract
Diabetic nephropathy (DN) is one of the most common diabetic complications and is the major cause of end-stage renal disease (ESRD). However, the systematic molecular characterization of DN pathogenesis and progression is not well understood. To identify the fundamental mediators of the pathogenesis and progression of DN, we performed combined RNA-Seq, proteomics, and metabolomics analyses of both patient-derived kidney biopsy samples and kidneys from an in vivo DN model. As a result, the molecular changes of DN include extracellular matrix accumulation, an abnormally activated inflammatory microenvironment, and metabolic disorders, bringing about glomerular sclerosis and tubular interstitial fibrosis. Specifically, further integration analyses identified that linoleic acid metabolism and fatty-acid β-oxidation are significantly inhibited during DN pathogenesis and progression; the transporter protein ABCD3, the fatty acyl-CoA activated enzymes ACOX1, ACOX2, and ACOX3, and some corresponding metabolites such as 13′-HODE, stearidonic acid, docosahexaenoic acid, and (±)10(11)-EpDPA were also significantly reduced. Our study thus provides potential molecular mechanisms for DN progression and suggests that targeting the key enzymes or supplying some lipids may be a promising avenue in the treatment of DN, especially advanced-stage DN.
Affiliation(s)
- Qian Sha
  - Department of Pharmacy, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China; Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
- Jinxiu Lyu
  - Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
- Meng Zhao
  - Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
- Haijuan Li
  - Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
- Mengzhe Guo
  - Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
- Qiang Sun
  - Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, China
12
A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer. PLoS Genet 2020; 16:e1009218. [PMID: 33290408 PMCID: PMC7748289 DOI: 10.1371/journal.pgen.1009218] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 04/18/2020] [Revised: 12/18/2020] [Accepted: 10/22/2020] [Indexed: 12/24/2022] Open
Abstract
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236). 
We propose a new approach, PLACO, that uses aggregate-level genotype-phenotype association statistics (commonly referred to as GWAS summary statistics) to identify genetic variants that influence risk of two traits or diseases. It allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. We demonstrate that PLACO can achieve major power gain over alternative methods that are typically used. We applied PLACO to Type 2 Diabetes and Prostate Cancer summary data from two large case-control studies. Many previous studies have reported an inverse association of these two chronic diseases, suggesting shared risk factors; however, the shared genetic mechanisms underlying this association are poorly understood. PLACO identified a number of novel shared genetic regions that are not detected by individual trait analysis. Many of the loci implicated by PLACO increase risk for one disease while decreasing risk for the other. PLACO can similarly be used on other traits to shed light on shared genetic risk factors.
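The product-of-Z-statistics idea in the abstract above can be illustrated with a minimal sketch. It covers only the simplest sub-case of the composite null (a variant associated with neither trait, with uncorrelated studies), where the product of two independent standard normal Z-statistics follows a normal product distribution. It does not reproduce PLACO's mixture null, its variance estimation, or its handling of shared controls; the function names `product_statistic` and `normal_product_tail` and the integration settings are assumptions made for illustration.

```python
import math


def product_statistic(z1, z2):
    """Test statistic: product of the two per-trait Z-statistics."""
    return z1 * z2


def normal_product_tail(t, upper=10.0, steps=20000):
    """Two-sided tail P(|Z1 * Z2| >= |t|) for independent N(0, 1) Z1 and Z2.

    Conditions on Z1 = z and integrates numerically (midpoint rule):
    P(|Z2| >= |t| / z) = erfc(|t| / (z * sqrt(2))).
    """
    t = abs(t)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoints, so z is never exactly 0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += math.erfc(t / (z * math.sqrt(2.0))) * density * h
    return 2.0 * total  # Z1 is symmetric about 0, so double the half-line integral


if __name__ == "__main__":
    t = product_statistic(2.5, -3.0)  # hypothetical Z-statistics for one variant
    print(t, normal_product_tail(t))
```

A large product in absolute value requires both Z-statistics to be away from zero, which is why this statistic targets variants associated with both traits rather than with either one alone.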
|
13
|
Ming J, Wang T, Yang C. LPM: a latent probit model to characterize the relationship among complex traits using summary statistics from multiple GWASs and functional annotations. Bioinformatics 2020; 36:2506-2514. [PMID: 31860024 DOI: 10.1093/bioinformatics/btz947] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Much effort has been made toward understanding the genetic architecture of complex traits and diseases. In the past decade, fruitful GWAS findings have highlighted the important role of regulatory variants and pervasive pleiotropy. Because of the accumulation of GWAS data on a wide range of phenotypes and high-quality functional annotations in different cell types, it is timely to develop a statistical framework to explore the genetic architecture of human complex traits by integrating rich data resources. RESULTS In this study, we propose a unified statistical approach, aiming to characterize the relationships among complex traits and to prioritize risk variants by leveraging regulatory information collected in functional annotations. Specifically, we consider a latent probit model (LPM) to integrate summary-level GWAS data and functional annotations. The developed computational framework not only makes LPM scalable to hundreds of annotations and phenotypes but also ensures its statistically guaranteed accuracy. Through comprehensive simulation studies, we evaluated LPM's performance and compared it with related methods. Then, we applied it to analyze 44 GWASs with 9 genic category annotations and 127 cell-type specific functional annotations. The results demonstrate the benefits of LPM and provide insights into the genetic architecture of complex traits. AVAILABILITY AND IMPLEMENTATION The LPM package, all simulation codes and real datasets in this study are available at https://github.com/mingjingsi/LPM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jingsi Ming
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Tao Wang
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China.,MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China
| | - Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| |
|
14
|
Zhao J, Ming J, Hu X, Chen G, Liu J, Yang C. Bayesian weighted Mendelian randomization for causal inference based on summary statistics. Bioinformatics 2020; 36:1501-1508. [PMID: 31593215 DOI: 10.1093/bioinformatics/btz749] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 09/06/2019] [Accepted: 10/02/2019] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION The results from Genome-Wide Association Studies (GWAS) on thousands of phenotypes provide an unprecedented opportunity to infer the causal effect of one phenotype (exposure) on another (outcome). Mendelian randomization (MR), an instrumental variable (IV) method, has been introduced for causal inference using GWAS data. Due to the polygenic architecture of complex traits/diseases and the ubiquity of pleiotropy, however, MR has many unique challenges compared to conventional IV methods. RESULTS We propose Bayesian weighted Mendelian randomization (BWMR) for causal inference to address these challenges. In our BWMR model, the uncertainty of weak effects owing to polygenicity has been taken into account and the violation of the IV assumption due to pleiotropy has been addressed through outlier detection by Bayesian weighting. To make the causal inference based on BWMR computationally stable and efficient, we developed a variational expectation-maximization (VEM) algorithm. Moreover, we have also derived an exact closed-form formula to correct the posterior covariance, which is often underestimated in variational inference. Through comprehensive simulation studies, we evaluated the performance of BWMR, demonstrating the advantage of BWMR over its competitors. Then we applied BWMR to make causal inference between 130 metabolites and 93 complex human traits, uncovering novel causal relationships between exposure and outcome traits. AVAILABILITY AND IMPLEMENTATION The BWMR software is available at https://github.com/jiazhao97/BWMR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
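For orientation, summary-statistics MR can be sketched with the standard fixed-effect inverse-variance-weighted (IVW) estimator. This is a conventional baseline and is not the BWMR model: it includes no Bayesian weighting, no outlier detection, and no allowance for uncertainty in the SNP-exposure effects. The function name `ivw_estimate` and the input values in the usage lines are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP-exposure effect estimates
    beta_outcome:  SNP-outcome effect estimates
    se_outcome:    standard errors of the SNP-outcome estimates
    Returns (estimate, standard_error).
    """
    weights = [1.0 / (s * s) for s in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Made-up summary statistics for three instruments, for illustration only.
    estimate, std_err = ivw_estimate([0.10, 0.20, 0.30],
                                     [0.05, 0.10, 0.15],
                                     [0.010, 0.020, 0.015])
    print(estimate, std_err)
```

The estimator is a weighted regression of outcome effects on exposure effects through the origin, so a single pleiotropic instrument can bias it, which is the kind of violation the abstract says BWMR addresses through weighting.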
Affiliation(s)
- Jia Zhao
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR 999077
- School of Mathematical Sciences, Beijing Normal University, Beijing 100875
| | - Jingsi Ming
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR 999077
| | - Xianghong Hu
- Department of Mathematics, Hong Kong Baptist University, Hong Kong SAR 999077
- Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055
| | - Gang Chen
- The WeGene Company, Shenzhen 518042, China
| | - Jin Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, 169857 Singapore
| | - Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR 999077
| |
|
15
|
Liu W, Song C, Ren Z, Zhang Z, Pei X, Liu Y, He K, Zhang F, Zhao J, Zhang J, Wang X, Yang D, Li W. Genome-wide association study reveals the genetic basis of fiber quality traits in upland cotton (Gossypium hirsutum L.). BMC PLANT BIOLOGY 2020; 20:395. [PMID: 32854609 PMCID: PMC7450593 DOI: 10.1186/s12870-020-02611-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Accepted: 08/18/2020] [Indexed: 05/08/2023]
Abstract
BACKGROUND Fiber quality is an important economic trait of cotton, and its improvement is a major goal of cotton breeding. To better understand the genetic mechanisms responsible for fiber quality traits, we conducted a genome-wide association study to identify and mine fiber-quality-related quantitative trait loci (QTLs) and genes. RESULTS In total, 42 single nucleotide polymorphisms (SNPs) and 31 QTLs were identified as being significantly associated with five fiber quality traits. Twenty-five QTLs were identified in previous studies, and six novel QTLs were identified for the first time in this study. In the QTL regions, 822 genes were identified and divided into four clusters based on their expression profiles. We also identified two pleiotropic SNPs. The SNP locus i52359Gb was associated with fiber elongation, strength, length and uniformity, while i11316Gh was associated with fiber strength and length. Moreover, these two SNPs were nonsynonymous and located in genes Gh_D09G2376 and Gh_D06G1908, respectively. RT-qPCR analysis revealed that these two genes were preferentially expressed at one or more stages of cotton fiber development, which was consistent with the RNA-seq data. Thus, Gh_D09G2376 and Gh_D06G1908 may be involved in fiber developmental processes. CONCLUSIONS The findings of this study provide insights into the genetic bases of fiber quality traits, and the identified QTLs or genes may be applicable in cotton breeding to improve fiber quality.
Affiliation(s)
- Wei Liu
- Collaborative Innovation Center of Henan Grain Crops, Agronomy College, Henan Agricultural University, Zhengzhou, 450002, China
| | - Chengxiang Song
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Zhongying Ren
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Zhiqiang Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Xiaoyu Pei
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Yangai Liu
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Kunlun He
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Fei Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Junjie Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Jie Zhang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
| | - Xingxing Wang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
| | - Daigang Yang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
| | - Wei Li
- Collaborative Innovation Center of Henan Grain Crops, Agronomy College, Henan Agricultural University, Zhengzhou, 450002, China.
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China.
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China.
| |
|
16
|
Dai M, Wan X, Peng H, Wang Y, Liu Y, Liu J, Xu Z, Yang C. Joint analysis of individual-level and summary-level GWAS data by leveraging pleiotropy. Bioinformatics 2020; 35:1729-1736. [PMID: 30307540 DOI: 10.1093/bioinformatics/bty870] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2018] [Revised: 09/06/2018] [Accepted: 10/09/2018] [Indexed: 12/18/2022] Open
Abstract
MOTIVATION A large number of recent genome-wide association studies (GWASs) for complex phenotypes confirm the early conjecture of polygenicity, suggesting the presence of a large number of variants with only tiny or moderate effects. However, due to the limited sample size of a single GWAS, many associated genetic variants are too weak to achieve genome-wide significance. These undiscovered variants further limit the prediction capability of GWAS. Restricted access to the individual-level data and the increasing availability of the published GWAS results motivate the development of methods integrating both the individual-level and summary-level data. How to build the connection between the individual-level and summary-level data determines the efficiency of using the existing abundant summary-level resources with limited individual-level data, and this issue inspires more efforts in this area. RESULTS In this study, we propose a novel statistical approach, LEP, which provides a novel way of modeling the connection between the individual-level data and summary-level data. LEP integrates both types of data by LEveraging Pleiotropy to increase the statistical power of risk variant identification and the accuracy of risk prediction. The algorithm for parameter estimation is developed to handle genome-wide-scale data. Through comprehensive simulation studies, we demonstrated the advantages of LEP over the existing methods. We further applied LEP to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from GWAS of some other diseases, such as Type 1 diabetes, Ulcerative colitis and Primary biliary cirrhosis. LEP was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.39% (±0.58%) to 68.33% (±0.32%) using about 195 000 variants. AVAILABILITY AND IMPLEMENTATION The LEP software is available at https://github.com/daviddaigithub/LEP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mingwei Dai
- Department of Applied Mathematics, School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China.,Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
| | - Xiang Wan
- ShenZhen Research Institute of Big Data, ShenZhen, China
| | - Hao Peng
- School of Business Administration, Southwestern University of Finance and Economics, Chengdu, China
| | - Yao Wang
- Department of Applied Mathematics, School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
| | - Yue Liu
- Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, China
| | - Jin Liu
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore
| | - Zongben Xu
- Department of Applied Mathematics, School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
| | - Can Yang
- Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
| |
|
17
|
Marín M, Esteban FJ, Ramírez-Rodrigo H, Ros E, Sáez-Lara MJ. An integrative methodology based on protein-protein interaction networks for identification and functional annotation of disease-relevant genes applied to channelopathies. BMC Bioinformatics 2019; 20:565. [PMID: 31718537 PMCID: PMC6849233 DOI: 10.1186/s12859-019-3162-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Accepted: 10/15/2019] [Indexed: 12/19/2022] Open
Abstract
Background Biologically data-driven networks have become powerful analytical tools that handle massive, heterogeneous datasets generated from biomedical fields. Protein-protein interaction networks can identify the most relevant structures directly tied to biological functions. Functional enrichments can then be performed based on these structural aspects of gene relationships for the study of channelopathies. Channelopathies refer to a complex group of disorders resulting from dysfunctional ion channels with distinct polygenic manifestations. This study presents a semi-automatic workflow using protein-protein interaction networks that can identify the most relevant genes and their biological processes and pathways in channelopathies to better understand their etiopathogenesis. In addition, the clinical manifestations that are strongly associated with these genes are also identified as the most characteristic in this complex group of diseases. Results In particular, a set of nine representative disease-related genes was detected, these being the most significant genes in relation to their roles in channelopathies. In this way we attested the implication of some voltage-gated sodium (SCN1A, SCN2A, SCN4A, SCN4B, SCN5A, SCN9A) and potassium (KCNQ2, KCNH2) channels in cardiovascular diseases, epilepsies, febrile seizures, headache disorders, neuromuscular, neurodegenerative diseases or neurobehavioral manifestations. We also revealed the role of Ankyrin-G (ANK3) in the neurodegenerative and neurobehavioral disorders as well as the implication of these genes in other systems, such as the immunological or endocrine systems. Conclusions This research provides a systems biology approach to extract information from interaction networks of gene expression. 
We show how large-scale computational integration of heterogeneous datasets, PPI network analyses, functional databases and published literature may support the detection and assessment of potential therapeutic targets in the disease. Applying our workflow makes it feasible to spot the most relevant genes and unknown relationships in channelopathies and shows its potential as a first-step approach to identify both genes and functional interactions in clinical-knowledge scenarios of target diseases. Methods An initial gene pool is first defined by searching general databases under a specific semantic framework. From the resulting interaction network, a subset of genes is identified as the most relevant through a workflow that includes centrality measures and other filtering and enrichment databases.
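The Methods summary mentions centrality measures without saying which ones were used. As a rough sketch of how genes might be ranked in an interaction network, the example below computes degree centrality, one common choice and an assumption here. The node names and edges in `TOY_EDGES` are invented placeholders, not real protein-protein interactions, and the helper names are hypothetical.

```python
from collections import defaultdict

# Invented toy network: placeholder node names, not real interaction data.
TOY_EDGES = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
    ("geneC", "geneE"),
]


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected, unweighted network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def top_genes(edges, k):
    """Rank nodes by decreasing centrality, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


if __name__ == "__main__":
    print(degree_centrality(TOY_EDGES))
    print(top_genes(TOY_EDGES, 2))
```

In a full workflow of the kind described, a ranking like this would only be one filter, to be combined with the functional enrichment and literature evidence the abstract refers to.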
Affiliation(s)
- Milagros Marín
- Department of Computer Architecture and Technology - CITIC, University of Granada, Granada, Spain.,Department of Biochemistry and Molecular Biology I, University of Granada, Granada, Spain
| | - Francisco J Esteban
- Systems Biology Unit, Department of Experimental Biology, University of Jaén, Jaén, Spain.
| | | | - Eduardo Ros
- Department of Computer Architecture and Technology - CITIC, University of Granada, Granada, Spain
| | - María José Sáez-Lara
- Department of Biochemistry and Molecular Biology I, University of Granada, Granada, Spain.
| |
|
18
|
Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets (Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures, and a genome-wide association data set on lung cancer and smoking) and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
| | - Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
| |
Collapse
|
19
|
Lamichhaney S, Card DC, Grayson P, Tonini JFR, Bravo GA, Näpflin K, Termignoni-Garcia F, Torres C, Burbrink F, Clarke JA, Sackton TB, Edwards SV. Integrating natural history collections and comparative genomics to study the genetic architecture of convergent evolution. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180248. [PMID: 31154982 PMCID: PMC6560268 DOI: 10.1098/rstb.2018.0248] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/25/2019] [Indexed: 12/20/2022] Open
Abstract
Evolutionary convergence has been long considered primary evidence of adaptation driven by natural selection and provides opportunities to explore evolutionary repeatability and predictability. In recent years, there has been increased interest in exploring the genetic mechanisms underlying convergent evolution, in part, owing to the advent of genomic techniques. However, the current 'genomics gold rush' in studies of convergence has overshadowed the reality that most trait classifications are quite broadly defined, resulting in incomplete or potentially biased interpretations of results. Genomic studies of convergence would be greatly improved by integrating deep 'vertical', natural history knowledge with 'horizontal' knowledge focusing on the breadth of taxonomic diversity. Natural history collections have been and continue to be best positioned for increasing our comprehensive understanding of phenotypic diversity, with modern practices of digitization and databasing of morphological traits providing exciting improvements in our ability to evaluate the degree of morphological convergence. Combining more detailed phenotypic data with the well-established field of genomics will enable scientists to make progress on an important goal in biology: to understand the degree to which genetic or molecular convergence is associated with phenotypic convergence. Although the fields of comparative biology or comparative genomics alone can separately reveal important insights into convergent evolution, here we suggest that the synergistic and complementary roles of natural history collection-derived phenomic data and comparative genomics methods can be particularly powerful in together elucidating the genomic basis of convergent evolution among higher taxa. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Daren C. Card
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Department of Biology, University of Texas Arlington, Arlington, TX 76019, USA
| | - Phil Grayson
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - João F. R. Tonini
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Kathrin Näpflin
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Flavia Termignoni-Garcia
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Christopher Torres
- Department of Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712, USA
| | - Frank Burbrink
- Department of Herpetology, The American Museum of Natural History, New York, NY 10024, USA
| | - Julia A. Clarke
- Department of Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712, USA
| | | | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| |
|
20
|
Liu J, Wan X, Wang C, Yang C, Zhou X, Yang C. LLR: a latent low-rank approach to colocalizing genetic risk variants in multiple GWAS. Bioinformatics 2018; 33:3878-3886. [PMID: 28961754 DOI: 10.1093/bioinformatics/btx512] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2016] [Accepted: 08/09/2017] [Indexed: 12/30/2022] Open
Abstract
Motivation Genome-wide association studies (GWAS), which genotype millions of single nucleotide polymorphisms (SNPs) in thousands of individuals, are widely used to identify the risk SNPs underlying complex human phenotypes (quantitative traits or diseases). Most conventional statistical methods in GWAS only investigate one phenotype at a time. However, an increasing number of reports suggest the ubiquity of pleiotropy, i.e. many complex phenotypes sharing common genetic bases. This motivated us to leverage pleiotropy to develop new statistical approaches to joint analysis of multiple GWAS. Results In this study, we propose a latent low-rank (LLR) approach to colocalizing genetic risk variants using summary statistics. In the presence of pleiotropy, there exist risk loci that affect multiple phenotypes. To leverage pleiotropy, we introduce a low-rank structure to modulate the probabilities of the latent association statuses between loci and phenotypes. Regarding the computational efficiency of LLR, a novel expectation-maximization-path (EM-path) algorithm has been developed to greatly reduce the computational cost and facilitate model selection and inference. We demonstrate the advantages of LLR over competing approaches through simulation studies and joint analysis of 18 GWAS datasets. Availability and implementation The LLR software is available on https://sites.google.com/site/liujin810822. Contact macyang@ust.hk.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jin Liu
- Center for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Xiang Wan
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| | - Chaolong Wang
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | | | - Xiaowei Zhou
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Can Yang
- Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China.,Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
| |
|
21
|
Ristevski B, Chen M. Big Data Analytics in Medicine and Healthcare. J Integr Bioinform 2018; 15:/j/jib.ahead-of-print/jib-2017-0030/jib-2017-0030.xml. [PMID: 29746254 PMCID: PMC6340124 DOI: 10.1515/jib-2017-0030] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2017] [Accepted: 03/20/2018] [Indexed: 12/28/2022] Open
Abstract
This paper surveys big data, highlighting big data analytics in medicine and healthcare. The characteristics of big data (value, volume, velocity, variety, veracity and variability) are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various -omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health records data. We underline the challenging issues of big data privacy and security. With regard to big data characteristics, some directions for using suitable and promising open-source distributed data processing software platforms are given.
Affiliation(s)
- Blagoj Ristevski
- "St. Kliment Ohridski" University - Bitola, Faculty of Information and Communication Technologies, ul. Partizanska bb, 7000 Bitola, Republic of Macedonia
| | - Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University Zijingang Campus, Hangzhou, P.R. China
| |
|
22
|
Chesmore K, Bartlett J, Williams SM. The ubiquity of pleiotropy in human disease. Hum Genet 2017; 137:39-44. [DOI: 10.1007/s00439-017-1854-z] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2017] [Accepted: 11/14/2017] [Indexed: 02/07/2023]
|
23
|
Onukwugha E, Duru OK, Peprah E. Foreword: Big Data and Its Application in Health Disparities Research. Ethn Dis 2017; 27:69-72. [PMID: 28439175 DOI: 10.18865/ed.27.2.69] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
The articles presented in this special issue advance the conversation by describing the current efforts, findings and concerns related to Big Data and health disparities. They offer important recommendations and perspectives to consider when designing systems that can usefully leverage Big Data to reduce health disparities. We hope that ongoing Big Data efforts can build on these contributions to advance the conversation, address our embedded assumptions, and identify levers for action to reduce health care disparities.
Affiliation(s)
- Eberechukwu Onukwugha
- Pharmaceutical Health Services Research Department; University of Maryland School of Pharmacy
- O Kenrik Duru
- Division of General Internal Medicine; Geffen School of Medicine, University of California Los Angeles
- Emmanuel Peprah
- Center for Translation Research and Implementation Science (CTRIS); National Heart, Lung, and Blood Institute; National Institutes of Health
|
24
|
Moreno-Moral A, Pesce F, Behmoaras J, Petretto E. Systems Genetics as a Tool to Identify Master Genetic Regulators in Complex Disease. Methods Mol Biol 2017; 1488:337-362. [PMID: 27933533 DOI: 10.1007/978-1-4939-6427-7_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Systems genetics stems from systems biology and similarly employs integrative modeling approaches to describe the perturbations and phenotypic effects observed in a complex system. However, in the case of systems genetics the main source of perturbation is naturally occurring genetic variation, which can be analyzed at the systems-level to explain the observed variation in phenotypic traits. In contrast with conventional single-variant association approaches, the success of systems genetics has been in the identification of gene networks and molecular pathways that underlie complex disease. In addition, systems genetics has proven useful in the discovery of master trans-acting genetic regulators of functional networks and pathways, which in many cases revealed unexpected gene targets for disease. Here we detail the central components of a fully integrated systems genetics approach to complex disease, starting from assessment of genetic and gene expression variation, linking DNA sequence variation to mRNA (expression QTL mapping), gene regulatory network analysis and mapping the genetic control of regulatory networks. By summarizing a few illustrative (and successful) examples, we highlight how different data-modeling strategies can be effectively integrated in a systems genetics study.
Affiliation(s)
- Aida Moreno-Moral
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Francesco Pesce
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, Hammersmith Campus, Imperial Centre for Translational and Experimental Medicine, London, UK
- Jacques Behmoaras
- Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
- Enrico Petretto
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
|
25
|
Dixon P, Davey Smith G, von Hinke S, Davies NM, Hollingworth W. Estimating Marginal Healthcare Costs Using Genetic Variants as Instrumental Variables: Mendelian Randomization in Economic Evaluation. Pharmacoeconomics 2016; 34:1075-1086. [PMID: 27484822 PMCID: PMC5073110 DOI: 10.1007/s40273-016-0432-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 05/04/2023]
Abstract
Accurate measurement of the marginal healthcare costs associated with different diseases and health conditions is important, especially for increasingly prevalent conditions such as obesity. However, existing observational study designs cannot identify the causal impact of disease on healthcare costs. This paper explores the possibilities for causal inference offered by Mendelian randomization, a form of instrumental variable analysis that uses genetic variation as a proxy for modifiable risk exposures, to estimate the effect of health conditions on cost. Well-conducted genome-wide association studies provide robust evidence of the associations of genetic variants with health conditions or disease risk factors. The subsequent causal effects of these health conditions on cost can be estimated using genetic variants as instruments for the health conditions. This is because the approximately random allocation of genotypes at conception means that many genetic variants are orthogonal to observable and unobservable confounders. Datasets with linked genotypic and resource use information obtained from electronic medical records or from routinely collected administrative data are now becoming available and will facilitate this form of analysis. We describe some of the methodological issues that arise in this type of analysis, which we illustrate by considering how Mendelian randomization could be used to estimate the causal impact of obesity, a complex trait, on healthcare costs. We describe some of the data sources that could be used for this type of analysis. We conclude by considering the challenges and opportunities offered by Mendelian randomization for economic evaluation.
Affiliation(s)
- Padraig Dixon
- School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK
- George Davey Smith
- School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK
- MRC Integrative Epidemiology Unit at the University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
- Stephanie von Hinke
- School of Economics, Finance and Management, University of Bristol, 8 Woodland Road, Bristol, BS8 1TN, UK
- Neil M Davies
- School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK
- MRC Integrative Epidemiology Unit at the University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
- William Hollingworth
- School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK
|
26
|
Moreno-Moral A, Petretto E. From integrative genomics to systems genetics in the rat to link genotypes to phenotypes. Dis Model Mech 2016; 9:1097-1110. [PMID: 27736746 PMCID: PMC5087832 DOI: 10.1242/dmm.026104] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022] Open
Abstract
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease.
Affiliation(s)
- Aida Moreno-Moral
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
- Enrico Petretto
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
|
27
|
Liu J, Wan X, Ma S, Yang C. EPS: an empirical Bayes approach to integrating pleiotropy and tissue-specific information for prioritizing risk genes. Bioinformatics 2016; 32:1856-64. [DOI: 10.1093/bioinformatics/btw081] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/15/2015] [Accepted: 02/05/2016] [Indexed: 12/12/2022] Open
Affiliation(s)
- Jin Liu
- Center of Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Xiang Wan
- Department of Computer Science, Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon, Hong Kong
- Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Can Yang
- Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong
|
28
|
Smith MT, de la Rosa R, Daniels SI. Using exposomics to assess cumulative risks and promote health. Environ Mol Mutagen 2015; 56:715-23. [PMID: 26475350 PMCID: PMC4636923 DOI: 10.1002/em.21985] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 07/31/2015] [Accepted: 09/21/2015] [Indexed: 05/10/2023]
Abstract
Under the exposome paradigm all nongenetic factors contributing to disease are considered to be 'environmental' including chemicals, drugs, infectious agents, and psychosocial stress. We can consider these collectively as environmental stressors. Exposomics is the comprehensive analysis of exposure to all environmental stressors and should yield a more thorough understanding of chronic disease development. We can operationalize exposomics by studying all the small molecules in the body and their influence on biological pathways that lead to impaired health. Here, we describe methods by which this may be achieved and discuss the application of exposomics to cumulative risk assessment in vulnerable populations. Since the goal of cumulative risk assessment is to analyze, characterize, and quantify the combined risks to health from exposures to multiple agents or stressors, it seems that exposomics is perfectly poised to advance this important area of environmental health science. We should therefore support development of tools for exposomic analysis and begin to engage impacted communities in participatory exposome research. A first step may be to apply exposomics to vulnerable populations already studied by more conventional cumulative risk approaches. We further propose that recent migrants, low socioeconomic groups with high environmental chemical exposures, and pregnant women should be high priority populations for study by exposomics. Moreover, exposomics allows us to study interactions between chronic stress and environmental chemicals that disrupt stress response pathways (i.e., 'stressogens'). Exploring the impact of early life exposures and maternal stress may be an interesting and accessible topic for investigation by exposomics using biobanked samples.
Affiliation(s)
- Martyn T Smith
- Superfund Research Program, Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, California, 94720-7360
- Rosemarie de la Rosa
- Superfund Research Program, Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, California, 94720-7360
- Sarah I Daniels
- Superfund Research Program, Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, California, 94720-7360
|