1
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. bioRxiv: The Preprint Server for Biology 2024:2024.02.10.579721. [PMID: 38496530 PMCID: PMC10942266 DOI: 10.1101/2024.02.10.579721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We use analytical theory to develop intuition for why and when spurious correlations may occur, and we conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, which includes both the genetic relatedness matrix (GRM) and its leading eigenvectors (corresponding to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses.
As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
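The two GWAS-side ingredients named above, a GRM and its leading eigenvector, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the common standardized-genotype definition of the GRM (each SNP centred by twice its sample allele frequency and scaled by the binomial standard deviation), uses plain power iteration in place of a full eigendecomposition, and does not fit the regression or mixed model itself. The function names and toy genotypes are hypothetical.

```python
import math


def genetic_relatedness_matrix(genotypes):
    """Standardized GRM from individuals x SNPs allele counts (0, 1, 2).

    Assumed definition: z_ij = (g_ij - 2 p_j) / sqrt(2 p_j (1 - p_j)),
    K = Z Z^T / M, where p_j is the sample allele frequency of SNP j.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP: carries no information, scale would be 0
        scale = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / scale for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    m = len(columns)
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)]
            for i in range(n)]


def leading_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Power iteration for the top eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    # Non-uniform start: a centred GRM maps the all-ones vector to zero.
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Hypothetical toy data: individuals 1-2 resemble each other, as do 3-4.
toy = [[2, 2, 0, 1], [2, 1, 0, 0], [0, 0, 2, 1], [0, 1, 2, 2]]
pc1 = leading_eigenvector(genetic_relatedness_matrix(toy))
```

In the toy data the leading eigenvector gives the two pairs of individuals opposite signs; in a structured-population regression, eigenvectors like this would enter as covariates alongside the GRM-based covariance term.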
2
Uricchio LH. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum Genet 2020; 139:5-21. [PMID: 31201529 PMCID: PMC8059781 DOI: 10.1007/s00439-019-02040-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/29/2018] [Accepted: 06/06/2019] [Indexed: 12/26/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified many trait-associated variants, but there is still much we do not know about the genetic basis of complex traits. Here, we review recent theoretical and empirical literature regarding selection on complex traits to argue that "missing heritability" is as much an evolutionary problem as it is a statistical problem. We discuss empirical findings that suggest a role for selection in shaping the effect sizes and allele frequencies of causal variation underlying complex traits, and the limitations of these studies. We then use simulations of selection, realistic genome structure, and complex human demography to illustrate the results of recent theoretical work on polygenic selection, and show that statistical inference of causal loci is sharply affected by evolutionary processes. In particular, when selection acts on causal alleles, it hampers the ability to detect causal loci and constrains the transferability of GWAS results across populations. Last, we discuss the implications of these findings for future association studies, and suggest that future statistical methods to infer causal loci for genetic traits will benefit from explicit modeling of the joint distribution of effect sizes and allele frequencies under plausible evolutionary models.
Affiliation(s)
- Lawrence H Uricchio
- Department of Biology, Stanford University, Stanford, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.
3
Genome-wide analysis indicates association between heterozygote advantage and healthy aging in humans. BMC Genet 2019; 20:52. [PMID: 31266448 PMCID: PMC6604157 DOI: 10.1186/s12863-019-0758-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/31/2018] [Accepted: 06/20/2019] [Indexed: 11/25/2022]
Abstract
Background Genetic diversity is known to confer a survival advantage in many species across the tree of life. Here, we hypothesize that such a pattern applies to humans as well and could result from higher fitness in individuals with higher genomic heterozygosity. Results We use healthy aging as a proxy for better health and fitness, and observe greater heterozygosity in healthy-aged individuals. Specifically, we find that only common genetic variants show a significant excess of heterozygosity in the healthy-aged cohort. The lack of a difference in heterozygosity for low-frequency or disease-associated variants excludes compensation for deleterious recessive alleles as a mechanism. In addition, coding SNPs with the highest excess of heterozygosity in the healthy-aged cohort are enriched in genes involved in extracellular matrix and glycoproteins, a group of genes known to be under long-term balancing selection. We also find that individual heterozygosity rate is a significant predictor of electronic health record (EHR)-based estimates of 10-year survival probability in men but not in women, after accounting for several factors including age and ethnicity. Conclusions Our results demonstrate that genomic heterozygosity is associated with human healthspan, and that the relationship between higher heterozygosity and healthy aging could be explained by heterozygote advantage. Further characterization of this relationship will have important implications for predicting aging-associated disease risk.
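The abstract uses an individual heterozygosity rate and contrasts common with low-frequency variants but does not define either. A minimal sketch, assuming the usual definition (heterozygous calls divided by non-missing calls) and a hypothetical minor-allele-frequency cutoff for restricting to common sites, might look like this; the function name, the `min_maf` parameter, and the example values are illustrative only.

```python
import math


def heterozygosity_rate(genotypes, allele_freqs=None, min_maf=None):
    """Fraction of non-missing genotype calls that are heterozygous.

    genotypes: allele counts (0, 1, 2) at each site for one individual;
    None marks a missing call. If min_maf is set (a hypothetical cutoff),
    only sites whose cohort minor allele frequency, taken from
    allele_freqs, is at least min_maf are counted.
    """
    if min_maf is not None and allele_freqs is None:
        raise ValueError("allele_freqs is required when min_maf is set")
    het = called = 0
    for idx, g in enumerate(genotypes):
        if g is None:
            continue  # skip missing calls
        if min_maf is not None:
            p = allele_freqs[idx]
            if min(p, 1.0 - p) < min_maf:
                continue  # skip low-frequency sites
        called += 1
        if g == 1:
            het += 1
    return het / called if called else math.nan


calls = [0, 1, 1, None, 2, 1]
freqs = [0.5, 0.3, 0.01, 0.4, 0.2, 0.02]
all_sites = heterozygosity_rate(calls)                     # 3 of 5 calls
common_only = heterozygosity_rate(calls, freqs, min_maf=0.05)  # 1 of 3 calls
```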
4
Patel R, Kumar S. On estimating evolutionary probabilities of population variants. BMC Evol Biol 2019; 19:133. [PMID: 31238981 PMCID: PMC6593550 DOI: 10.1186/s12862-019-1455-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/20/2018] [Accepted: 06/06/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. The EP of an allele represents an independent evolutionary expectation of observing that allele in a population, based solely on the long-term substitution patterns captured in a multiple sequence alignment. Under the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating at high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate whether the modified EP approach produces results similar to those from the original method. RESULTS We compared EP estimates from the original and the modified approaches using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from the TimeTree web resource, and the resulting EP estimates were treated as the ground truth. We found that the modified approaches produced reasonable EP estimates for the HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times.
We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. CONCLUSION We conclude that the modified EP approach will be generally applicable for alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
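The ePerm/eForb rule stated in the abstract (EP ≥ 0.05 versus EP < 0.05) and the idea of flagging eForbs that segregate at high frequency can be expressed as a small classifier. This sketch takes EP as an already-computed input (it does not estimate EP from an alignment), and the `high_freq` cutoff is a hypothetical value, since the abstract does not say what counts as a high population frequency.

```python
EP_THRESHOLD = 0.05  # from the abstract: ePerm if EP >= 0.05, eForb if EP < 0.05


def ep_class(ep):
    """Label an allele evolutionarily permissible or forbidden from its EP."""
    if not 0.0 <= ep <= 1.0:
        raise ValueError("EP is a probability and must lie in [0, 1]")
    return "ePerm" if ep >= EP_THRESHOLD else "eForb"


def candidate_adaptive(ep, population_freq, high_freq=0.05):
    """Flag an eForb allele that segregates at a high population frequency.

    high_freq is a hypothetical cutoff chosen for illustration only.
    """
    return ep_class(ep) == "eForb" and population_freq >= high_freq


labels = [ep_class(x) for x in (0.3, 0.05, 0.01)]  # ePerm, ePerm, eForb
```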
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
5
Kono TJY, Lei L, Shih CH, Hoffman PJ, Morrell PL, Fay JC. Comparative Genomics Approaches Accurately Predict Deleterious Variants in Plants. G3 (Bethesda) 2018; 8:3321-3329. [PMID: 30139765 PMCID: PMC6169392 DOI: 10.1534/g3.118.200563] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/08/2018] [Accepted: 08/10/2018] [Indexed: 12/11/2022]
Abstract
Recent advances in genome resequencing have led to increased interest in prediction of the functional consequences of genetic variants. Variants at phylogenetically conserved sites are of particular interest, because they are more likely than variants at phylogenetically variable sites to have deleterious effects on fitness and contribute to phenotypic variation. Numerous comparative genomic approaches have been developed to predict deleterious variants, but the approaches are nearly always assessed based on their ability to identify known disease-causing mutations in humans. Determining the accuracy of deleterious variant predictions in nonhuman species is important to understanding evolution, domestication, and potentially to improving crop quality and yield. To examine our ability to predict deleterious variants in plants we generated a curated database of 2,910 Arabidopsis thaliana mutants with known phenotypes. We evaluated seven approaches and found that while all performed well, their relative ranking differed from prior benchmarks in humans. We conclude that deleterious mutations can be reliably predicted in A. thaliana and likely other plant species, but that the relative performance of various approaches does not necessarily translate from one species to another.
Affiliation(s)
- Thomas J Y Kono
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Li Lei
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Ching-Hua Shih
- Department of Genetics, Washington University, St. Louis, MO 63110
- Paul J Hoffman
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Peter L Morrell
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Justin C Fay
- Department of Genetics, Washington University, St. Louis, MO 63110
6
Spataro N, Rodríguez JA, Navarro A, Bosch E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum Mol Genet 2017; 26:489-500. [PMID: 28053046 PMCID: PMC5409085 DOI: 10.1093/hmg/ddw405] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/12/2016] [Revised: 11/10/2016] [Accepted: 11/23/2016] [Indexed: 01/19/2023]
Abstract
Do genes presenting variation that has been linked to human disease have different biological properties than genes that have never been related to disease? What is the relationship between disease and fitness? Are the evolutionary pressures that affect genes linked to Mendelian diseases the same as those acting on genes whose variation contributes to complex disorders? The answers to these questions could shed light on the architecture of human genetic disorders and may have relevant implications when designing mapping strategies in future genetic studies. Here we show that, relative to non-disease genes, human disease (HD) genes have specific evolutionary profiles and protein network properties. Additionally, our results indicate that mutation-selection balance provides an insufficient account of the evolutionary history of some HD genes and that adaptive selection could also contribute to shaping their genetic architecture. Notably, several biological features of HD genes depend on the type of pathology (complex or Mendelian) with which they are associated. For example, genes harbouring both causal variants for Mendelian disorders and risk factors for complex disease traits (Complex-Mendelian genes) tend to show higher functional relevance in the protein network and higher expression levels than genes associated only with complex disorders. Moreover, risk variants in Complex-Mendelian genes tend to have higher odds ratios than those in genes associated with the same complex disorders but with no link to Mendelian diseases. Taken together, our results suggest that genetic variation at genes linked to Mendelian disorders plays an important role in driving susceptibility to complex disease.
Affiliation(s)
- Nino Spataro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Juan Antonio Rodríguez
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- National Institute for Bioinformatics (INB), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Elena Bosch
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
7
Karim S, NourEldin HF, Abusamra H, Salem N, Alhathli E, Dudley J, Sanderford M, Scheinfeldt LB, Chaudhary AG, Al-Qahtani MH, Kumar S. e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations. BMC Genomics 2016; 17:770. [PMID: 27766955 PMCID: PMC5073857 DOI: 10.1186/s12864-016-3088-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 03/14/2023]
Abstract
Background Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, prioritizing bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolution-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. Description We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values, with the aim of enhancing the discovery of SNPs with a greater probability of biologically meaningful disease associations.
Conclusion By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.
Affiliation(s)
- Sajjad Karim
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Hend Fakhri NourEldin
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Heba Abusamra
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Nada Salem
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Elham Alhathli
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Dudley
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
- Max Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Laura B Scheinfeldt
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia; Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
8
9
Gorlov IP, Gorlova OY, Amos CI. Allelic Spectra of Risk SNPs Are Different for Environment/Lifestyle Dependent versus Independent Diseases. PLoS Genet 2015; 11:e1005371. [PMID: 26201053 PMCID: PMC4511800 DOI: 10.1371/journal.pgen.1005371] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/24/2015] [Accepted: 06/18/2015] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping the allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented among minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) than among major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning "environment" or "lifestyle" AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while the frequency distributions of the risk alleles for diseases with weak environmental/lifestyle influences are shifted toward lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for environment/lifestyle-dependent diseases will follow a neutral model, since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that predicting SNP functionality from the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle-dependent diseases.
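Two quantities in this abstract are simple ratios: the ELI (publications mentioning "environment" or "lifestyle" AND the disease, per 1,000 disease-mentioning publications) and the share of risk alleles that are minor (frequency below 50%). The sketch below computes both from counts and frequencies supplied by the caller; it does not query any literature database, and the function names and numbers are hypothetical.

```python
def environment_lifestyle_index(n_env_or_lifestyle_and_disease, n_disease):
    """ELI: publications mentioning ("environment" OR "lifestyle") AND the
    disease, per 1,000 publications that mention the disease."""
    if n_disease <= 0:
        raise ValueError("need at least one disease-mentioning publication")
    return 1000.0 * n_env_or_lifestyle_and_disease / n_disease


def risk_allele_minor_fraction(risk_allele_freqs):
    """Share of risk alleles whose population frequency is below 50%."""
    if not risk_allele_freqs:
        raise ValueError("empty input")
    minor = sum(1 for f in risk_allele_freqs if f < 0.5)
    return minor / len(risk_allele_freqs)


# Hypothetical example: 30 of 2,000 disease papers also mention
# environment or lifestyle, giving an ELI of 15.
eli = environment_lifestyle_index(30, 2000)
```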
Affiliation(s)
- Ivan P. Gorlov
- The Geisel School of Medicine, Dartmouth College, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, United States of America
- Olga Y. Gorlova
- The Geisel School of Medicine, Dartmouth College, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, United States of America
- Christopher I. Amos
- The Geisel School of Medicine, Dartmouth College, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, United States of America
10
Gibson G, Powell JE, Marigorta UM. Expression quantitative trait locus analysis for translational medicine. Genome Med 2015; 7:60. [PMID: 26110023 PMCID: PMC4479075 DOI: 10.1186/s13073-015-0186-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 02/08/2023]
Abstract
Expression quantitative trait locus analysis has emerged as an important component of efforts to understand how genetic polymorphisms influence disease risk and is poised to make contributions to translational medicine. Here we review how expression quantitative trait locus analysis is aiding the identification of which gene(s) within regions of association are causal for a disease or phenotypic trait; the narrowing down of the cell types or regulators involved in the etiology of disease; the characterization of drivers and modifiers of cancer; and our understanding of how different environments and cellular contexts can modify gene expression. We also introduce the concept of transcriptional risk scores as a means of refining estimates of individual liability to disease based on targeted profiling of the transcripts that are regulated by polymorphisms jointly associated with disease and gene expression.
Affiliation(s)
- Greg Gibson
- Center for Integrative Genomics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332 USA
- Joseph E Powell
- Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, St Lucia, Brisbane, QLD 4072 Australia; The Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072 Australia
- Urko M Marigorta
- Center for Integrative Genomics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332 USA
11
Xu K, Schadt EE, Pollard KS, Roussos P, Dudley JT. Genomic and network patterns of schizophrenia genetic variation in human evolutionary accelerated regions. Mol Biol Evol 2015; 32:1148-60. [PMID: 25681384 DOI: 10.1093/molbev/msv031] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 12/22/2022]
Abstract
The population persistence of schizophrenia despite associated reductions in fitness and fecundity suggests that the genetic basis of schizophrenia has a complex evolutionary history. A recent meta-analysis of schizophrenia genome-wide association studies offers novel opportunities for assessing the evolutionary trajectories of schizophrenia-associated loci. In this study, we hypothesize that components of the genetic architecture of schizophrenia are attributable to human lineage-specific evolution. Our results suggest that schizophrenia-associated loci are enriched in genes near previously identified human accelerated regions (HARs). Specifically, we find that genes near HARs conserved in nonhuman primates (pHARs) are enriched for schizophrenia-associated loci, and that pHAR-associated schizophrenia genes are under stronger selective pressure than other schizophrenia genes and other pHAR-associated genes. We further evaluate pHAR-associated schizophrenia genes in regulatory network contexts to investigate associated molecular functions and mechanisms. We find that pHAR-associated schizophrenia genes are significantly enriched in a GABA-related coexpression module that was previously found to be differentially regulated in individuals affected by schizophrenia versus healthy controls. In two other independent networks constructed from gene expression profiles of prefrontal cortex samples, we find that pHAR-associated schizophrenia genes occupy more central positions and their average path lengths to the other nodes are significantly shorter than those of other schizophrenia genes. Together, our results suggest that HARs are associated with potentially important functional roles in the genetic architecture of schizophrenia.
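The "average path length to the other nodes" mentioned above can be computed for one gene with breadth-first search on an unweighted network. This is a sketch under stated assumptions: edges are treated as unweighted and undirected, nodes unreachable from the source are ignored (one of several conventions for disconnected networks), and the adjacency-dict format and node names are hypothetical rather than taken from the study.

```python
from collections import deque


def average_path_length(graph, source):
    """Mean shortest-path length (in edges) from source to the nodes it reaches.

    graph maps each node to its neighbours; undirected edges are listed in
    both directions. Unreachable nodes are ignored.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields unweighted shortest paths
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else float("inf")


# Hypothetical four-gene chain a - b - c - d: the interior gene "b" has a
# shorter average path length (4/3) than the end gene "a" (2).
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```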
Affiliation(s)
- Ke Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY
- Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY
- Katherine S Pollard
- Gladstone Institutes, University of California, San Francisco; Institute for Human Genetics, University of California, San Francisco; Department of Epidemiology and Biostatistics, University of California, San Francisco
- Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY
- Joel T Dudley
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY
12
Cheng F, Jia P, Wang Q, Lin CC, Li WH, Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol Biol Evol 2014; 31:2156-69. [PMID: 24881052 DOI: 10.1093/molbev/msu167] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022]
Abstract
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationships among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found that cancer proteins have a distinctive network centrality that is largely independent of gene essentiality. Cancer genes have likely experienced a lower evolutionary rate and stronger purifying selection than noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originating in early metazoans, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified in 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, but only a weaker or nonsignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
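The connectivity-versus-mutation-count comparison described above can be illustrated by computing each protein's degree from an edge list and correlating it with per-protein mutation counts. The abstract does not name the correlation statistic; Spearman's rank correlation is assumed here as one reasonable choice for count data, and the edge list and mutation counts are hypothetical.

```python
import math


def degrees(edges):
    """Node degree (connectivity) from an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical interactome fragment and nonsynonymous mutation counts.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
nonsyn = {"A": 9, "B": 4, "C": 5, "D": 1}
deg = degrees(edges)
proteins = sorted(deg)
rho = spearman([deg[p] for p in proteins], [nonsyn[p] for p in proteins])
```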
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Quan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago; Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine; Department of Cancer Biology, Vanderbilt University School of Medicine; Department of Psychiatry, Vanderbilt University School of Medicine; Center for Quantitative Sciences, Vanderbilt University Medical Center
13
Maher MC, Uricchio LH, Torgerson DG, Hernandez RD. Population genetics of rare variants and complex diseases. Hum Hered 2013; 74:118-28. [PMID: 23594490 PMCID: PMC3698246 DOI: 10.1159/000346826] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 01/27/2023]
Abstract
OBJECTIVES: Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high-throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so requires simultaneously elucidating the targets of natural selection and population-specific demographic history.

METHODS: Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with many human populations), and describe the differences between evolutionarily deleterious mutations and neutral ones.

RESULTS: Genes associated with several complex disease categories exhibit stronger signatures of purifying selection than non-disease genes. In addition, loci identified through genome-wide association studies of complex traits also exhibit signatures consistent with being in regions recurrently targeted by purifying selection. Through simulations, we show that population bottlenecks and rapid growth enable deleterious rare variants to persist at low frequencies just as long as neutral variants, but low-frequency and common variants tend to be much younger than neutral variants. This has resulted in a large proportion of modern-day rare alleles that have a deleterious effect on function and that potentially contribute to disease susceptibility.

CONCLUSIONS: The key question for sequencing-based association studies of complex traits is how to distinguish between deleterious and benign genetic variation. We used population genetic simulations to uncover patterns of genetic variation that distinguish these two categories, especially derived allele age, thereby providing inroads into novel methods for characterizing rare genetic variation driving complex diseases.
Affiliation(s)
- M. Cyrus Maher
- Department of Epidemiology and Biostatistics, University of California, San Francisco
- Lawrence H. Uricchio
- UC Berkeley & UCSF Joint Graduate Group in Bioengineering, University of California, San Francisco
- Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco
14
Cagliani R, Pozzoli U, Forni D, Cassinotti A, Fumagalli M, Giani M, Fichera M, Lombardini M, Ardizzone S, Asselta R, de Franchis R, Riva S, Biasin M, Comi GP, Bresolin N, Clerici M, Sironi M. Crohn's disease loci are common targets of protozoa-driven selection. Mol Biol Evol 2013; 30:1077-87. [PMID: 23389767 DOI: 10.1093/molbev/mst020] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
Abstract
Previous studies indicated that a few risk variants for autoimmune diseases are subject to pathogen-driven selection. Nonetheless, the proportion of risk loci that has been targeted by pathogens and the type of infectious agent(s) that exerted the strongest pressure remain to be evaluated. We assessed whether different pathogens exerted a pressure on known Crohn's disease (CD) risk variants and demonstrate that these single-nucleotide polymorphisms (SNPs) are preferential targets of protozoa-driven selection (P = 0.008). In particular, 19% of SNPs associated with CD have been subject to protozoa-driven selective pressure. Analysis of P values from genome-wide association studies (GWASs) and meta-analyses indicated that protozoan-selected SNPs display significantly stronger association with CD than nonselected variants. This same behavior was not observed for GWASs of other autoimmune diseases. Thus, we integrated selection signatures and meta-analysis results to prioritize five genic SNPs for replication in an Italian cohort. Three SNPs were significantly associated with CD risk, and combining these results with the meta-analysis results yielded P values < 4 × 10^-6. The bona fide risk alleles are located in ARHGEF2, an interactor of NOD2; NSF, a gene involved in autophagy; and HEBP1, encoding a possible mediator of inflammation. Pathway analysis indicated that ARHGEF2 and NSF participate in a molecular network, which also contains VAMP3 (previously associated with CD) and is centered around miR-31 (known to be dysregulated in CD). Thus, we show that protozoa-driven selective pressure had a major role in shaping predisposition to CD. We then used this information to identify three bona fide novel susceptibility loci.
Affiliation(s)
- Rachele Cagliani
- Bioinformatics Laboratory, Scientific Institute IRCCS E. Medea, Bosisio Parini, LC, Italy
15
Dudley JT, Kim Y, Liu L, Markov GJ, Gerold K, Chen R, Butte AJ, Kumar S. Human genomic disease variants: a neutral evolutionary explanation. Genome Res 2012; 22:1383-94. [PMID: 22665443 PMCID: PMC3409252 DOI: 10.1101/gr.133702.111] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Many perspectives on the role of evolution in human health include nonempirical assumptions concerning the adaptive evolutionary origins of human diseases. Evolutionary analyses of the increasing wealth of clinical and population genomic data have begun to challenge these presumptions. In order to systematically evaluate such claims, the time has come to build a common framework for an empirical and intellectual unification of evolution and modern medicine. We review the emerging evidence and provide a supporting conceptual framework that establishes the classical neutral theory of molecular evolution (NTME) as the basis for evaluating disease-associated genomic variations in health and medicine. For over a decade, the NTME has already explained the origins and distribution of variants implicated in diseases and has illuminated the power of evolutionary thinking in genomic medicine. We suggest that a majority of disease variants in modern populations will have neutral evolutionary origins (previously neutral), with a relatively smaller fraction exhibiting adaptive evolutionary origins (previously adaptive). This pattern is expected to hold true for common as well as rare disease variants. Ultimately, a neutral evolutionary perspective will provide medicine with an informative and actionable framework that enables objective clinical assessment beyond convenient tendencies to invoke past adaptive events in human history as a root cause of human disease.
Affiliation(s)
- Joel T Dudley
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California 94305, USA
16
Enard W. Functional primate genomics—leveraging the medical potential. J Mol Med (Berl) 2012; 90:471-80. [DOI: 10.1007/s00109-012-0901-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 03/05/2012] [Revised: 04/04/2012] [Accepted: 04/05/2012] [Indexed: 10/28/2022]