1
Giel AS, Bigge J, Schumacher J, Maj C, Dasmeh P. Analysis of Evolutionary Conservation, Expression Level, and Genetic Association at a Genome-wide Scale Reveals Heterogeneity Across Polygenic Phenotypes. Mol Biol Evol 2024; 41:msae115. [PMID: 38865495 PMCID: PMC11247350 DOI: 10.1093/molbev/msae115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 03/22/2024] [Accepted: 05/03/2024] [Indexed: 06/14/2024] Open
Abstract
Understanding the expression level and evolutionary rate of associated genes with human polygenic diseases provides crucial insights into their disease-contributing roles. In this work, we leveraged genome-wide association studies (GWASs) to investigate the relationship between the genetic association and both the evolutionary rate (dN/dS) and expression level of human genes associated with the two polygenic diseases of schizophrenia and coronary artery disease. Our findings highlight a distinct variation in these relationships between the two diseases. Genes associated with both diseases exhibit a significantly greater variance in evolutionary rate compared to those implicated in monogenic diseases. Expanding our analyses to 4,756 complex traits in the GWAS atlas database, we unraveled distinct trait categories with a unique interplay among the evolutionary rate, expression level, and genetic association of human genes. In most polygenic traits, highly expressed genes were more associated with the polygenic phenotypes compared to lowly expressed genes. About 69% of polygenic traits displayed a negative correlation between genetic association and evolutionary rate, while approximately 30% of these traits showed a positive correlation between genetic association and evolutionary rate. Our results demonstrate the presence of a spectrum among complex traits, shaped by natural selection. Notably, at opposite ends of this spectrum, we find metabolic traits being more likely influenced by purifying selection, and immunological traits that are more likely shaped by positive selection. We further established the polygenic evolution portal (evopolygen.de) as a resource for investigating relationships and generating hypotheses in the field of human polygenic trait evolution.
Affiliation(s)
- Ann-Sophie Giel
  - Centre for Human Genetics, Marburg University, Marburg, Germany
- Jessica Bigge
  - Centre for Human Genetics, Marburg University, Marburg, Germany
- Carlo Maj
  - Centre for Human Genetics, Marburg University, Marburg, Germany
- Pouria Dasmeh
  - Centre for Human Genetics, Marburg University, Marburg, Germany
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
  - Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
2
Chen J, Landback P, Arsala D, Guzzetta A, Xia S, Atlas J, Sosa D, Zhang YE, Cheng J, Shen B, Long M. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.14.567139. [PMID: 38045239 PMCID: PMC10690195 DOI: 10.1101/2023.11.14.567139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are genetic novelties pivotal in mammalian evolution. Their phenotypic impacts and evolutionary pattern over time, however, remain elusive in humans due to the technical and ethical complexities in functional studies. By combining human gene age dating and Mendelian disease phenotyping, our research reveals a gradual increase in disease gene proportions with gene age. Logistic regression modeling indicates that this increase could be related to longer protein lengths and higher burdens of deleterious de novo germline variants (DNVs) for older genes. We also find a steady integration of new genes with biomedical phenotypes into the human genome over macroevolutionary timescales (~0.07% per million years). Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures across gene ages. Notably, young genes show significant enrichment in diseases related to the male reproductive system, indicating strong sexual selection. Young genes also exhibit disease-related functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, musculoskeletal phenotypes, and color vision. We further reveal a logistic growth pattern of pleiotropy over evolutionary time, indicating a diminishing marginal growth of new functions for older genes due to intensifying selective constraints over time. We propose a "pleiotropy-barrier" model that delineates higher potentials of phenotypic innovation for young genes than for older genes, a process subject to natural selection. Our study demonstrates that evolutionary new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Patrick Landback
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Alexander Guzzetta
  - Department of Pathology, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Jared Atlas
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Yong E. Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jingqiu Cheng
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Bairong Shen
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
3
Chen J. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. RESEARCH SQUARE 2023:rs.3.rs-3632644. [PMID: 38045389 PMCID: PMC10690325 DOI: 10.21203/rs.3.rs-3632644/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are structural novelties pivotal in mammalian evolution. Their phenotypic impact on humans, however, remains elusive due to the technical and ethical complexities in functional studies. Through combining gene age dating with Mendelian disease phenotyping, our research reveals that new genes associated with disease phenotypes steadily integrate into the human genome at a rate of ~ 0.07% every million years over macroevolutionary timescales. Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures between young and old genes. Notably, young genes show significant enrichment in the male reproductive system, indicating strong sexual selection. Young genes also exhibit functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, bipedal locomotion, and color vision. Our findings further reveal increasing levels of pleiotropy over evolutionary time, which accompanies stronger selective constraints. We propose a "pleiotropy-barrier" model that delineates different potentials for phenotypic innovation between young and older genes subject to natural selection. Our study demonstrates that evolutionary new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
4
Di C, Murga Moreno J, Salazar-Tortosa DF, Lauterbur ME, Enard D. Decreased recent adaptation at human mendelian disease genes as a possible consequence of interference between advantageous and deleterious variants. eLife 2021; 10:69026. [PMID: 34636724 PMCID: PMC8526059 DOI: 10.7554/elife.69026] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/01/2021] [Accepted: 10/02/2021] [Indexed: 11/27/2022] Open
Abstract
Advances in genome sequencing have improved our understanding of the genetic basis of human diseases, and thousands of human genes have been associated with different diseases. Recent genomic adaptation at disease genes has not been well characterized. Here, we compare the rate of strong recent adaptation in the form of selective sweeps between mendelian, non-infectious disease genes and non-disease genes across distinct human populations from the 1000 Genomes Project. We find that mendelian disease genes have experienced far less selective sweeps compared to non-disease genes especially in Africa. Investigating further the possible causes of the sweep deficit at disease genes, we find that this deficit is very strong at disease genes with both low recombination rates and with high numbers of associated disease variants, but is almost non-existent at disease genes with higher recombination rates or lower numbers of associated disease variants. Because segregating recessive deleterious variants have the ability to interfere with adaptive ones, these observations strongly suggest that adaptation has been slowed down by the presence of interfering recessive deleterious variants at disease genes. These results suggest that disease genes suffer from a transient inability to adapt as fast as the rest of the genome.
Affiliation(s)
- Chenlu Di
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, United States
- Jesus Murga Moreno
  - Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- M Elise Lauterbur
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, United States
- David Enard
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, United States
5
Rapaport F, Boisson B, Gregor A, Béziat V, Boisson-Dupuis S, Bustamante J, Jouanguy E, Puel A, Rosain J, Zhang Q, Zhang SY, Gleeson JG, Quintana-Murci L, Casanova JL, Abel L, Patin E. Negative selection on human genes underlying inborn errors depends on disease outcome and both the mode and mechanism of inheritance. Proc Natl Acad Sci U S A 2021; 118:e2001248118. [PMID: 33408250 PMCID: PMC7826345 DOI: 10.1073/pnas.2001248118] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 12/14/2022] Open
Abstract
Genetic variants underlying life-threatening diseases, being unlikely to be transmitted to the next generation, are gradually and selectively eliminated from the population through negative selection. We study the determinants of this evolutionary process in human genes underlying monogenic diseases by comparing various negative selection scores and an integrative approach, CoNeS, at 366 loci underlying inborn errors of immunity (IEI). We find that genes underlying autosomal dominant (AD) or X-linked IEI have stronger negative selection scores than those underlying autosomal recessive (AR) IEI, whose scores are not different from those of genes not known to be disease causing. Nevertheless, genes underlying AR IEI that are lethal before reproductive maturity with complete penetrance have stronger negative selection scores than other genes underlying AR IEI. We also show that genes underlying AD IEI by loss of function have stronger negative selection scores than genes underlying AD IEI by gain of function, while genes underlying AD IEI by haploinsufficiency are under stronger negative selection than other genes underlying AD IEI. These results are replicated in 1,140 genes underlying inborn errors of neurodevelopment. Finally, we propose a supervised classifier, SCoNeS, which predicts better than state-of-the-art approaches whether a gene is more likely to underlie an AD or AR disease. The clinical outcomes of monogenic inborn errors, together with their mode and mechanisms of inheritance, determine the levels of negative selection at their corresponding loci. Integrating scores of negative selection may facilitate the prioritization of candidate genes and variants in patients suspected to carry an inborn error.
Affiliation(s)
- Franck Rapaport
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Bertrand Boisson
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Anne Gregor
  - Institute of Human Genetics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Vivien Béziat
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Stéphanie Boisson-Dupuis
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Jacinta Bustamante
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
  - Center for the Study of Primary Immunodeficiencies, Necker Hospital for Sick Children, Assistance Publique-Hôpitaux de Paris, 75015 Paris, France
- Emmanuelle Jouanguy
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Anne Puel
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Jérémie Rosain
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
  - Center for the Study of Primary Immunodeficiencies, Necker Hospital for Sick Children, Assistance Publique-Hôpitaux de Paris, 75015 Paris, France
- Qian Zhang
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Shen-Ying Zhang
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Joseph G Gleeson
  - Howard Hughes Medical Institute, La Jolla, CA 92093
  - Rady Children's Institute of Genomic Medicine, Department of Neurosciences, University of California San Diego, La Jolla, CA 92093
  - Laboratory for Pediatric Brain Disease, The Rockefeller University, New York, NY 10065
- Lluis Quintana-Murci
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, 75015 Paris, France
  - Chair of Human Genomics and Evolution, Collège de France, 75231 Paris, France
- Jean-Laurent Casanova
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
  - Howard Hughes Medical Institute, New York, NY 10065
- Laurent Abel
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR 1163, Necker Hospital for Sick Children, 75015 Paris, France
  - University of Paris, Imagine Institute, 75015 Paris, France
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, 75015 Paris, France
6
Romdhane L, Bouhamed H, Ghedira K, Ben Hamda C, Louhichi A, Jmel H, Romdhane S, Charfeddine C, Mokni M, Abdelhak S, Rebai A. The morbid cutaneous anatomy of the human genome revealed by a bioinformatic approach. Genomics 2020; 112:4232-4241. [PMID: 32650097 DOI: 10.1016/j.ygeno.2020.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Revised: 03/28/2020] [Accepted: 07/02/2020] [Indexed: 01/05/2023]
Abstract
Computational approaches have been developed to prioritize candidate genes in disease gene identification. They are based on different pieces of evidences associating each gene with the given disease. In this study, 648 genes underlying genodermatoses have been compared to 1808 genes involved in other genetic diseases using a bioinformatic approach. These genes were studied at the structural, evolutionary and functional levels. Results show that genes underlying genodermatoses present longer CDS and have more exons. Significant differences were observed in nucleotide motif and amino-acid compositions. Evolutionary conservation analysis revealed that genodermatoses genes have less paralogs, more orthologs in Mouse and Dog and are less conserved. Functional analysis revealed that genodermatosis genes seem to be involved in immune system and skin layers. The Bayesian network model returned a rate of good classification of around 80%. This computational approach could help investigators working in the field of dermatology by prioritizing positional candidate genes for mutation screening.
Affiliation(s)
- Lilia Romdhane
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
  - Department of Biology, Faculty of Sciences of Bizerte, Jarzouna, Université Tunis Carthage, Tunis, Tunisia
- Heni Bouhamed
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Kais Ghedira
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Cherif Ben Hamda
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Amel Louhichi
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Haifa Jmel
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Safa Romdhane
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Chérine Charfeddine
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
  - High Institut of Biotechnology of Sidi Thabet, University of Manouba, BiotechPole of Sidi Thabet, Ariana, Tunisia
- Mourad Mokni
  - Department of Dermatology, CHU La Rabta Tunis, Tunis, Tunisia
  - Public health and infection Research Laboratory, La Rabta Hospital, Tunis, Tunisia
- Sonia Abdelhak
  - Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Ahmed Rebai
  - Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
7
Esteller-Cucala P, Maceda I, Børglum AD, Demontis D, Faraone SV, Cormand B, Lao O. Genomic analysis of the natural history of attention-deficit/hyperactivity disorder using Neanderthal and ancient Homo sapiens samples. Sci Rep 2020; 10:8622. [PMID: 32451437 PMCID: PMC7248073 DOI: 10.1038/s41598-020-65322-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/30/2019] [Accepted: 04/24/2020] [Indexed: 11/18/2022] Open
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is an impairing neurodevelopmental condition highly prevalent in current populations. Several hypotheses have been proposed to explain this paradox, mainly in the context of the Paleolithic versus Neolithic cultural shift but especially within the framework of the mismatch theory. This theory elaborates on how a particular trait once favoured in an ancient environment might become maladaptive upon environmental changes. However, given the lack of genomic data available for ADHD, these theories have not been empirically tested. We took advantage of the largest GWAS meta-analysis available for this disorder consisting of over 20,000 individuals diagnosed with ADHD and 35,000 controls, to assess the evolution of ADHD-associated alleles in European populations using archaic, ancient and modern human samples. We also included Approximate Bayesian computation coupled with deep learning analyses and singleton density scores to detect human adaptation. Our analyses indicate that ADHD-associated alleles are enriched in loss of function intolerant genes, supporting the role of selective pressures in this early-onset phenotype. Furthermore, we observed that the frequency of variants associated with ADHD has steadily decreased since Paleolithic times, particularly in Paleolithic European populations compared to samples from the Neolithic Fertile Crescent. We demonstrate this trend cannot be explained by African admixture nor Neanderthal introgression, since introgressed Neanderthal alleles are enriched in ADHD risk variants. All analyses performed support the presence of long-standing selective pressures acting against ADHD-associated alleles until recent times. Overall, our results are compatible with the mismatch theory for ADHD but suggest a much older time frame for the evolution of ADHD-associated alleles compared to previous hypotheses.
Affiliation(s)
- Paula Esteller-Cucala
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institut de Biologia Evolutiva (UPF-CSIC), Barcelona, Spain
- Iago Maceda
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Anders D Børglum
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Centre for Integrative Sequencing, iSEQ, and Aarhus Genome Centre, Aarhus, Denmark
  - Department of Biomedicine - Human Genetics, Aarhus University, Aarhus, Denmark
- Ditte Demontis
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Centre for Integrative Sequencing, iSEQ, and Aarhus Genome Centre, Aarhus, Denmark
  - Department of Biomedicine - Human Genetics, Aarhus University, Aarhus, Denmark
- Stephen V Faraone
  - Departments of Psychiatry and of Neuroscience and Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Bru Cormand
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
  - Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Spain
  - Institut de Recerca Sant Joan de Déu (IR-SJD), Esplugues de Llobregat, Spain
- Oscar Lao
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
8
Maleki E, Babashah H, Koohi S, Kavehvash Z. All-optical DNA variant discovery utilizing extended DV-curve-based wavelength modulation. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2018; 35:1929-1940. [PMID: 30461853 DOI: 10.1364/josaa.35.001929] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/25/2018] [Accepted: 10/03/2018] [Indexed: 06/09/2023]
Abstract
This paper presents a novel optical processing approach for exploring genome sequences built upon an optical correlator for global alignment and the extended dual-vector-curve (DV-curve) method for local alignment. To overcome the problem of the traditional DV-curve method for presenting an accurate and simplified output, we propose the hybrid amplitude wavelength polarization optical DV-curve (HAWPOD) method, built upon the DV-curve method, to analyze genome sequences in three steps: DNA coding, alignment, and post-analysis. For this purpose, a tunable graphene-based color filter is designed for wavelength modulation of optical signals. Moreover, all-optical implementation of the HAWPOD method is developed, while its accuracy is validated through numerical simulations in LUMERICAL FDTD. The results express that the proposed method is much faster than its electrical counterparts.
9
Pardiñas AF, Holmans P, Pocklington AJ, Escott-Price V, Ripke S, Carrera N, Legge SE, Bishop S, Cameron D, Hamshere ML, Han J, Hubbard L, Lynham A, Mantripragada K, Rees E, MacCabe JH, McCarroll SA, Baune BT, Breen G, Byrne EM, Dannlowski U, Eley TC, Hayward C, Martin NG, McIntosh AM, Plomin R, Porteous DJ, Wray NR, Caballero A, Geschwind DH, Huckins LM, Ruderfer DM, Santiago E, Sklar P, Stahl EA, Won H, Agerbo E, Als TD, Andreassen OA, Bækvad-Hansen M, Mortensen PB, Pedersen CB, Børglum AD, Bybjerg-Grauholm J, Djurovic S, Durmishi N, Pedersen MG, Golimbet V, Grove J, Hougaard DM, Mattheisen M, Molden E, Mors O, Nordentoft M, Pejovic-Milovancevic M, Sigurdsson E, Silagadze T, Hansen CS, Stefansson K, Stefansson H, Steinberg S, Tosato S, Werge T, Collier DA, Rujescu D, Kirov G, Owen MJ, O'Donovan MC, Walters JTR. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat Genet 2018; 50:381-389. [PMID: 29483656 PMCID: PMC5918692 DOI: 10.1038/s41588-018-0059-2] [Citation(s) in RCA: 975] [Impact Index Per Article: 162.5] [Received: 09/15/2017] [Accepted: 01/07/2018] [Indexed: 12/13/2022]
Abstract
Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide insights. We report a new genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel associated loci and 145 loci in total. Through integrating genomic fine-mapping with brain expression and chromosome conformation data, we identify candidate causal genes within 33 loci. We also show for the first time that the common variant association signal is highly enriched among genes that are under strong selective pressures. These findings provide new insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes and suggest a mechanism by which common risk variants persist in the population.
Affiliation(s)
- Antonio F Pardiñas
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Peter Holmans
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Andrew J Pocklington
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Valentina Escott-Price
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Stephan Ripke
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychiatry and Psychotherapy, Charité, Campus Mitte, Berlin, Germany
- Noa Carrera
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Sophie E Legge
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Sophie Bishop
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Darren Cameron
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Marian L Hamshere
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Jun Han
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Leon Hubbard
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Amy Lynham
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Kiran Mantripragada
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Elliott Rees
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- James H MacCabe
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Steven A McCarroll
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bernhard T Baune
  - Discipline of Psychiatry, University of Adelaide, Adelaide, South Australia, Australia
- Gerome Breen
  - MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - NIHR Biomedical Research Centre for Mental Health, Maudsley Hospital and Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Enda M Byrne
  - Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
| | - Udo Dannlowski
- Department of Psychiatry and Psychotherapy, University of Münster, Münster, Germany
| | - Thalia C Eley
- MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Caroline Hayward
- Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Nicholas G Martin
- School of Psychology, University of Queensland, Brisbane, Queensland, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Andrew M McIntosh
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
| | - Robert Plomin
- MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - David J Porteous
- Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Naomi R Wray
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
| | - Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología. Facultad de Biología, Universidad de Vigo, Vigo, Spain
| | - Daniel H Geschwind
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Laura M Huckins
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Douglas M Ruderfer
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Enrique Santiago
- Departamento de Biología Funcional. Facultad de Biología, Universidad de Oviedo, Oviedo, Spain
| | - Pamela Sklar
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eli A Stahl
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Hyejung Won
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Esben Agerbo
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
| | - Thomas D Als
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
| | - Ole A Andreassen
- Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
| | - Marie Bækvad-Hansen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
| | - Preben Bo Mortensen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
| | - Carsten Bøcker Pedersen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
| | - Anders D Børglum
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
| | - Jonas Bybjerg-Grauholm
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
| | - Srdjan Djurovic
- NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
| | - Naser Durmishi
- Department of Child and Adolescent Psychiatry, University Clinic of Psychiatry, Skopje, Macedonia
| | - Marianne Giørtz Pedersen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
| | - Vera Golimbet
- Department of Clinical Genetics, Mental Health Research Center, Moscow, Russia
| | - Jakob Grove
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - David M Hougaard
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
| | - Manuel Mattheisen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- iSEQ, Center for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Department of Biomedicine-Human Genetics, Aarhus University, Aarhus, Denmark
| | - Espen Molden
- Center for Psychopharmacology, Diakonhjemmet Hospital, Oslo, Norway
| | - Ole Mors
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Psychosis Research Unit, Aarhus University Hospital, Risskov, Denmark
| | - Merete Nordentoft
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Mental Health Services in the Capital Region of Denmark, Mental Health Center Copenhagen, University of Copenhagen, Copenhagen, Denmark
| | - Teimuraz Silagadze
- Department of Psychiatry and Drug Addiction, Tbilisi State Medical University (TSMU), Tbilisi, Georgia
| | - Christine Søholm Hansen
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
| | - Sarah Tosato
- Section of Psychiatry, Department of Public Health and Community Medicine, University of Verona, Verona, Italy
| | - Thomas Werge
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - David A Collier
- MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Discovery Neuroscience Research, Eli Lilly and Company, Lilly Research Laboratories, Windlesham, UK
| | - Dan Rujescu
- Department of Psychiatry, University of Halle, Halle, Germany
- Department of Psychiatry, University of Munich, Munich, Germany
| | - George Kirov
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
| | - Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK.
| | - Michael C O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK.
| | - James T R Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK.
|
10
|
Fernandes M, Wan C, Tacutu R, Barardo D, Rajput A, Wang J, Thoppil H, Thornton D, Yang C, Freitas A, de Magalhães JP. Systematic analysis of the gerontome reveals links between aging and age-related diseases. Hum Mol Genet 2018; 25:4804-4818. [PMID: 28175300 PMCID: PMC5418736 DOI: 10.1093/hmg/ddw307] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 04/08/2016] [Revised: 08/05/2016] [Accepted: 08/26/2016] [Indexed: 12/11/2022] Open
Abstract
In model organisms, over 2,000 genes have been shown to modulate aging, the collection of which we call the ‘gerontome’. Although some individual aging-related genes have been the subject of intense scrutiny, their analysis as a whole has been limited. In particular, the genetic interaction of aging and age-related pathologies remains a subject of debate. In this work, we perform a systematic analysis of the gerontome across species, including human aging-related genes. First, by classifying aging-related genes as pro- or anti-longevity, we define distinct pathways and genes that modulate aging in different ways. Our subsequent comparison of aging-related genes with age-related disease genes reveals species-specific effects, with strong overlaps between aging and age-related diseases in mice, yet surprisingly few overlaps in lower model organisms. We discover that genetic links between aging and age-related diseases are due to a small fraction of aging-related genes, which also tend to have high network connectivity. Other insights from our systematic analysis include assessing how using datasets with genes more or less studied than average may result in biases, showing that age-related disease genes have faster molecular evolution rates, and predicting new aging-related drugs based on drug-gene interaction data. Overall, this is the largest systems-level analysis of the genetics of aging to date and the first to discriminate anti- and pro-longevity genes, revealing new insights on aging-related genes as a whole and their interactions with age-related diseases.
Affiliation(s)
- Maria Fernandes
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- LaSIGE - Large-Scale Informatics Systems Laboratory, Faculty of Sciences, University of Lisbon, Portugal
| | - Cen Wan
- School of Computing, University of Kent, Canterbury, UK
- Department of Computer Science, University College London, London, UK
| | - Robi Tacutu
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Diogo Barardo
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Ashish Rajput
- Research Group for Computational Systems Biology, German Center for Neurodegenerative Diseases (DZNE), Göttingen, Germany
| | - Jingwei Wang
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Harikrishnan Thoppil
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Daniel Thornton
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Chenhao Yang
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Alex Freitas
- School of Computing, University of Kent, Canterbury, UK
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
|
11
|
Spataro N, Rodríguez JA, Navarro A, Bosch E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum Mol Genet 2017; 26:489-500. [PMID: 28053046 PMCID: PMC5409085 DOI: 10.1093/hmg/ddw405] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/12/2016] [Revised: 11/10/2016] [Accepted: 11/23/2016] [Indexed: 01/19/2023] Open
Abstract
Do genes presenting variation that has been linked to human disease have different biological properties than genes that have never been related to disease? What is the relationship between disease and fitness? Are the evolutionary pressures that affect genes linked to Mendelian diseases the same as those acting on genes whose variation contributes to complex disorders? The answers to these questions could shed light on the architecture of human genetic disorders and may have relevant implications when designing mapping strategies in future genetic studies. Here we show that, relative to non-disease genes, human disease (HD) genes have specific evolutionary profiles and protein network properties. Additionally, our results indicate that the mutation-selection balance renders an insufficient account of the evolutionary history of some HD genes and that adaptive selection could also contribute to shaping their genetic architecture. Notably, several biological features of HD genes depend on the type of pathology (complex or Mendelian) with which they are related. For example, genes harbouring both causal variants for Mendelian disorders and risk factors for complex disease traits (Complex-Mendelian genes) tend to present higher functional relevance in the protein network and higher expression levels than genes associated only with complex disorders. Moreover, risk variants in Complex-Mendelian genes tend to present higher odds ratios than those in genes associated with the same complex disorders but with no link to Mendelian diseases. Taken together, our results suggest that genetic variation at genes linked to Mendelian disorders plays an important role in driving susceptibility to complex disease.
Affiliation(s)
- Nino Spataro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Juan Antonio Rodríguez
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- National Institute for Bioinformatics (INB), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
| | - Elena Bosch
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
|
12
|
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
- The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
| | - Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
|
13
|
Purifying selection shapes the coincident SNP distribution of primate coding sequences. Sci Rep 2016; 6:27272. [PMID: 27255481 PMCID: PMC4891680 DOI: 10.1038/srep27272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/04/2016] [Accepted: 05/17/2016] [Indexed: 12/13/2022] Open
Abstract
Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNP_O/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNP_O/E is much higher at zero-fold than at nonzero-fold degenerate sites; this difference is due to an elevation of coSNP_O/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, and sequencing technique, and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNP_O/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNP_O/E independently, and that coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a “signature” during primate protein evolution.
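As a rough illustration of the observed-to-expected ratio mentioned in this abstract, the Python sketch below computes such a ratio under one common assumption: that SNPs arise independently in the two species, so the expected number of coincident SNPs is the product of the two per-site SNP frequencies times the number of aligned sites. The abstract does not state how the expectation was derived, so this null model, the function name `cosnp_obs_exp`, and the example counts are all illustrative assumptions, not the authors' method or data.

```python
def cosnp_obs_exp(n_sites, n_human_snps, n_chimp_snps, n_cosnps):
    """Observed-to-expected ratio of coincident SNPs.

    Assumption (not taken from the abstract): SNPs occur independently
    in the two species, so the expected coSNP count is
    n_sites * (n_human_snps / n_sites) * (n_chimp_snps / n_sites).
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    expected = n_human_snps * n_chimp_snps / n_sites
    if expected == 0:
        raise ValueError("expected coSNP count is zero; ratio undefined")
    return n_cosnps / expected


# Made-up counts, for illustration only.
ratio = cosnp_obs_exp(
    n_sites=1_000_000,    # aligned orthologous positions
    n_human_snps=10_000,  # positions polymorphic in human
    n_chimp_snps=8_000,   # positions polymorphic in chimpanzee
    n_cosnps=120,         # positions polymorphic in both
)
print(ratio)
```

Under this null model a ratio above 1 indicates more coincident SNPs than independence would predict. Computing it separately for zero-fold and nonzero-fold degenerate sites would allow the kind of comparison the abstract describes.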
|
14
|
Tranchevent LC, Ardeshirdavani A, ElShal S, Alcaide D, Aerts J, Auboeuf D, Moreau Y. Candidate gene prioritization with Endeavour. Nucleic Acids Res 2016; 44:W117-21. [PMID: 27131783 PMCID: PMC4987917 DOI: 10.1093/nar/gkw365] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 02/24/2016] [Accepted: 04/23/2016] [Indexed: 01/25/2023] Open
Abstract
Genomic studies and high-throughput experiments often produce large lists of candidate genes among which only a small fraction are truly relevant to the disease, phenotype or biological process of interest. Gene prioritization tackles this problem by ranking candidate genes by profiling candidates across multiple genomic data sources and integrating this heterogeneous information into a global ranking. We describe an extended version of our gene prioritization method, Endeavour, now available for six species and integrating 75 data sources. The performance (Area Under the Curve) of Endeavour on cross-validation benchmarks using ‘gold standard’ gene sets varies from 88% (for human phenotypes) to 95% (for worm gene function). In addition, we have also validated our approach using a time-stamped benchmark derived from the Human Phenotype Ontology, which provides a setting close to prospective validation. With this benchmark, using 3854 novel gene–phenotype associations, we observe a performance of 82%. Altogether, our results indicate that this extended version of Endeavour efficiently prioritizes candidate genes. The Endeavour web server is freely available at https://endeavour.esat.kuleuven.be/.
Affiliation(s)
- Léon-Charles Tranchevent
- INSERM U1210, CNRS UMR5239, Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, Université de Lyon, 69364 Lyon, France
| | - Amin Ardeshirdavani
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, B-3001 Leuven, Belgium
- iMinds Future Health Department, KU Leuven, B-3001 Leuven, Belgium
| | - Sarah ElShal
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, B-3001 Leuven, Belgium
- iMinds Future Health Department, KU Leuven, B-3001 Leuven, Belgium
| | - Daniel Alcaide
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, B-3001 Leuven, Belgium
- iMinds Future Health Department, KU Leuven, B-3001 Leuven, Belgium
| | - Jan Aerts
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, B-3001 Leuven, Belgium
- iMinds Future Health Department, KU Leuven, B-3001 Leuven, Belgium
| | - Didier Auboeuf
- INSERM U1210, CNRS UMR5239, Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, Université de Lyon, 69364 Lyon, France
| | - Yves Moreau
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, B-3001 Leuven, Belgium
- iMinds Future Health Department, KU Leuven, B-3001 Leuven, Belgium
|
15
|
Baird A, Coimbra R, Dang X, Eliceiri BP, Costantini TW. Up-regulation of the human-specific CHRFAM7A gene in inflammatory bowel disease. BBA CLINICAL 2016; 5:66-71. [PMID: 27051591 PMCID: PMC4802402 DOI: 10.1016/j.bbacli.2015.12.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/23/2015] [Revised: 12/16/2015] [Accepted: 12/17/2015] [Indexed: 12/16/2022]
Abstract
Background: The α7-subunit of the α7-nicotinic acetylcholine receptor (α7-nAChR) is an obligatory intermediate for the anti-inflammatory effects of the vagus nerve. In humans, however, there exists a second gene called CHRFAM7A that encodes a dominant negative α7-nAChR inhibitor. Here, we investigated whether their expression was altered in inflammatory bowel disease (IBD) and colon cancer. Methods: Quantitative RT-PCR was used to measure expression of the human α7-nAChR gene (CHRNA7), CHRFAM7A, TBC3D1, and actin in biopsies of normal large and small intestine, and this was compared with their expression in biopsies of ulcerative colitis, Crohn's disease, and colon cancer. Results: qRT-PCR showed that CHRFAM7A and CHRNA7 gene expression was significantly (p < 0.02) up-regulated in IBD (N = 64). Gene expression was unchanged in colon cancer. Further analyses revealed differences between ulcerative colitis and Crohn's disease. Colon biopsies of ulcerative colitis (N = 33) confirmed increased expression of CHRFAM7A and decreased expression of CHRNA7 (p < 0.001). Biopsies of Crohn's disease (N = 31), however, showed only small changes in CHRFAM7A expression (p < 0.04) and no change in CHRNA7. When segregated by tissue source, both CHRFAM7A up-regulation (p < 0.02) and CHRNA7 down-regulation (p < 0.001) were measured in colon, but not in small intestine. Conclusion: The human-specific CHRFAM7A gene is up-regulated, and its target, CHRNA7, down-regulated, in IBD. Differences between ulcerative colitis and Crohn's disease tie to location of disease. Significance: The appearance of IBD in modern humans may be consequent to the emergence of CHRFAM7A, a human-specific α7-nAChR antagonist. CHRFAM7A could present a new, unrecognized target for development of IBD therapeutics.
CHRFAM7A is a pro-inflammatory and human-specific gene not found in other species. CHRFAM7A expression is elevated in certain IBD, but its target CHRNA7 is decreased. Changes in CHRFAM7A and CHRNA7 expression are disease- and tissue site-specific. Some IBDs may be examples of “off-target disease sequelae” of human evolution. Animal modeling of human disease does not test contributions of human-specific genes.
Affiliation(s)
- Andrew Baird
- Division of Trauma, Surgical Critical Care, Burns and Acute Care Surgery, Department of Surgery, University of California San Diego, La Jolla, CA, USA
| | - Raul Coimbra
- Division of Trauma, Surgical Critical Care, Burns and Acute Care Surgery, Department of Surgery, University of California San Diego, La Jolla, CA, USA
| | - Xitong Dang
- Division of Trauma, Surgical Critical Care, Burns and Acute Care Surgery, Department of Surgery, University of California San Diego, La Jolla, CA, USA
- The Key Laboratory of Medical Electrophysiology, Institute of Cardiovascular Research, Sichuan Medical University, Luzhou, China
| | - Brian P Eliceiri
- Division of Trauma, Surgical Critical Care, Burns and Acute Care Surgery, Department of Surgery, University of California San Diego, La Jolla, CA, USA
| | - Todd W Costantini
- Division of Trauma, Surgical Critical Care, Burns and Acute Care Surgery, Department of Surgery, University of California San Diego, La Jolla, CA, USA
|
16
|
Chakraborty S, Panda A, Ghosh TC. Exploring the evolutionary rate differences between human disease and non-disease genes. Genomics 2015; 108:18-24. [PMID: 26562439 DOI: 10.1016/j.ygeno.2015.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/27/2015] [Revised: 10/29/2015] [Accepted: 11/03/2015] [Indexed: 10/22/2022]
Abstract
Comparisons of evolutionary features between human disease and non-disease genes have wide implications for understanding the genetic basis of human disease. However, it has not yet been resolved whether disease genes evolve at a slower or faster rate than non-disease genes. To resolve this controversy, we integrated human disease genes from several databases and compared their protein evolutionary rates with those of non-disease genes in both the housekeeping and tissue-specific groups. We found that in the tissue-specific group, disease genes evolve at a significantly slower rate than non-disease genes. However, we found no significant difference in evolutionary rates between disease and non-disease genes in the housekeeping group. Tissue-specific disease genes have a higher protein complex number and elevated gene expression level, and are also associated with conserved biological processes. Finally, our regression analysis suggested that protein complex number, followed by protein multifunctionality, independently modulates the evolutionary rate of human disease genes.
Affiliation(s)
- Sandip Chakraborty
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
| | - Arup Panda
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
| | - Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
|
17
|
Dang X, Eliceiri BP, Baird A, Costantini TW. CHRFAM7A: a human-specific α7-nicotinic acetylcholine receptor gene shows differential responsiveness of human intestinal epithelial cells to LPS. FASEB J 2015; 29:2292-302. [PMID: 25681457 DOI: 10.1096/fj.14-268037] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/11/2014] [Accepted: 01/20/2015] [Indexed: 02/06/2023]
Abstract
The human genome contains a unique, distinct, and human-specific α7-nicotinic acetylcholine receptor (α7nAChR) gene [CHRNA7 (gene-encoding α7-nicotinic acetylcholine receptor)] called CHRFAM7A (gene-encoding dup-α7-nicotinic acetylcholine receptor) on a locus of chromosome 15 associated with mental illness, including schizophrenia. Located 5' upstream from the "wild-type" CHRNA7 gene that is found in other vertebrates, we demonstrate CHRFAM7A expression in a broad range of epithelial cells and sequenced the CHRFAM7A transcript found in normal human fetal small intestine epithelial (FHs) cells to prove its identity. We then compared its expression to CHRNA7 in 11 gut epithelial cell lines, showed that there is a differential response to LPS when compared to CHRNA7, and characterized the CHRFAM7A promoter. We report that both CHRFAM7A and CHRNA7 gene expression are widely distributed in human epithelial cell lines but that the levels of CHRFAM7A gene expression vary up to 5000-fold between different gut epithelial cells. A 3-hour treatment of epithelial cells with 100 ng/ml LPS increased CHRFAM7A gene expression by almost 1000-fold but had little effect on CHRNA7 gene expression. Mapping the regulatory elements responsible for CHRFAM7A gene expression identifies a 1 kb sequence in the UTR of the CHRFAM7A gene that is modulated by LPS. Taken together, these data establish the presence, identity, and differential regulation of the human-specific CHRFAM7A gene in human gut epithelial cells. In light of the fact that CHRFAM7A expression is reported to modulate ligand binding to, and alter the activity of, the wild-type α7nAChR ligand-gated pentameric ion channel, the findings point to the existence of a species-specific α7nAChR response that might regulate gut epithelial function in a human-specific fashion.
Affiliation(s)
- Xitong Dang
- Division of Trauma, Surgical Critical Care, Burns, and Acute Care Surgery, Department of Surgery, University of California, San Diego Health Sciences, San Diego, California, USA; and Cardiovascular Research Center, Luzhou Medical College, Luzhou, Sichuan, China
- Brian P Eliceiri
- Division of Trauma, Surgical Critical Care, Burns, and Acute Care Surgery, Department of Surgery, University of California, San Diego Health Sciences, San Diego, California, USA; and Cardiovascular Research Center, Luzhou Medical College, Luzhou, Sichuan, China
- Andrew Baird
- Division of Trauma, Surgical Critical Care, Burns, and Acute Care Surgery, Department of Surgery, University of California, San Diego Health Sciences, San Diego, California, USA; and Cardiovascular Research Center, Luzhou Medical College, Luzhou, Sichuan, China
- Todd W Costantini
- Division of Trauma, Surgical Critical Care, Burns, and Acute Care Surgery, Department of Surgery, University of California, San Diego Health Sciences, San Diego, California, USA; and Cardiovascular Research Center, Luzhou Medical College, Luzhou, Sichuan, China
|
18
|
Begum T, Ghosh TC. Elucidating the genotype-phenotype relationships and network perturbations of human shared and specific disease genes from an evolutionary perspective. Genome Biol Evol 2014; 6:2741-53. [PMID: 25287147 PMCID: PMC4224346 DOI: 10.1093/gbe/evu220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
To date, numerous studies have been attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In our present study, we have considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects, that is, specific disease (SPD) gene (one gene: one defect) and shared disease (SHD) gene (one gene: multiple defects). Here, we have compared the evolutionary rates of these two groups of genes, that is, SPD genes and SHD genes with respect to ND genes. We observed that the average evolutionary rates are slow in SHD group, intermediate in SPD group, and fast in ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of their gene expression levels and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses interestingly suggest the independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that the drug-induced network disruption is a combination of several edgetic perturbations and, thus, has more severe effect on gene phenotypes.
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
|
19
|
Zhu C, Wu C, Aronow BJ, Jegga AG. Computational approaches for human disease gene prediction and ranking. Adv Exp Med Biol 2014; 799:69-84. [PMID: 24292962 DOI: 10.1007/978-1-4614-8778-4_4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
While candidate gene association studies continue to be the most practical and frequently employed approach in disease gene investigation for complex disorders, selecting suitable genes to test is a challenge. There are several computational approaches available for selecting and prioritizing disease candidate genes. A majority of these tools are based on guilt-by-association principle where novel disease candidate genes are identified and prioritized based on either functional or topological similarity to known disease genes. In this chapter we review the prioritization criteria and the algorithms along with some use cases that demonstrate how these tools can be used for identifying and ranking human disease candidate genes.
Affiliation(s)
- Cheng Zhu
- Department of Computer Science, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH, USA
|
20
|
Eyre-Walker YC, Eyre-Walker A. The role of mutation rate variation and genetic diversity in the architecture of human disease. PLoS One 2014; 9:e90166. [PMID: 24587257 PMCID: PMC3937440 DOI: 10.1371/journal.pone.0090166] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/23/2013] [Accepted: 01/28/2014] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified. RESULTS Consistent with our predictions we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have similar rates of non-synonymous mutation as other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. The effect is small nevertheless. CONCLUSIONS Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.
Affiliation(s)
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
|
21
|
Zhan Y, Zhang R, Lv H, Song X, Xu X, Chai L, Lv W, Shang Z, Jiang Y, Zhang R. Prioritization of candidate genes for periodontitis using multiple computational tools. J Periodontol 2014; 85:1059-69. [PMID: 24476546 DOI: 10.1902/jop.2014.130523] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/31/2022]
Abstract
BACKGROUND Both genetic and environmental factors contribute to the development of periodontitis. Genetic studies identified a variety of candidate genes for periodontitis. The aim of the present study is to identify the most promising candidate genes for periodontitis using an integrative gene ranking method. METHODS Seed genes that were confirmed to be associated with periodontitis were identified using text mining. Three types of candidate genes were then extracted from different resources (expression profiles, genome-wide association studies). Combining the seed genes, four freely available bioinformatics tools (ToppGene, DIR, Endeavour, and GPEC) were integrated for prioritization of candidate genes. Candidate genes that identified with at least three programs and ranked in the top 20 by each program were considered the most promising. RESULTS Prioritization analysis resulted in 21 promising genes involved or potentially involved in periodontitis. Among them, IL18 (interleukin 18), CD44 (CD44 molecule), CXCL1 (chemokine [CXC motif] ligand 1), IL6ST (interleukin 6 signal transducer), MMP3 (matrix metallopeptidase 3), MMP7, CCR1 (chemokine [C-C motif] receptor 1), MMP13, and TLR9 (Toll-like receptor 9) had been associated with periodontitis. However, the roles of other genes, such as CSF3 (colony stimulating factor 3 receptor), CD40, TNFSF14 (tumor necrosis factor receptor superfamily, member 14), IFNB1 (interferon-β1), TIRAP (toll-interleukin 1 receptor domain containing adaptor protein), IL2RA (interleukin 2 receptor α), ETS1 (v-ets avian erythroblastosis virus E26 oncogene homolog 1), GADD45B (growth arrest and DNA-damage-inducible 45 β), BIRC3 (baculoviral IAP repeat containing 3), VAV1 (vav 1 guanine nucleotide exchange factor), COL5A1 (collagen, type V, α1), and C3 (complement component 3), have not been investigated thoroughly in the process of periodontitis. 
These genes are mainly involved in bacterial infection, immune response, and inflammatory reaction, suggesting that further characterizing their roles in periodontitis will be important. CONCLUSIONS A combination of computational tools will be useful in mining candidate genes for periodontitis. These theoretical results provide new clues for experimental biologists to plan targeted experiments.
Affiliation(s)
- Yuanbo Zhan
- Department of Periodontology and Oral Mucosa, Second Affiliated Hospital of Harbin Medical University, Harbin, China
|
22
|
Abstract
Increasing evidence indicates that genes containing disease causal variation have distinct functional and genomic properties. The importance of understanding these properties is highlighted by efforts to filter lists of variants from next-generation sequencing studies, where the number of potentially deleterious variants, which are in fact unrelated to disease, may be large. Available evidence indicates that the majority of disease genes are 'non-essential' and their products occupy functionally peripheral positions in protein networks. They tend to be intermediate between genes that have core biological functions, particularly low mutation rates and low haplotype diversity, and genes for which high haplotype diversity and high mutation rates are advantageous (such as those involved in sensory perception and some immune system functions). Evidence presented here supports these conclusions through analysis of integrated data sets incorporating the latest mutational profiles, linkage disequilibrium structure and other genomic properties of individual genes. The analysis highlights the contrasting functions of genes predicted as least and most likely to contain disease variation and provides a basis for filtering gene variant lists to exclude the least plausible disease candidates.
|
23
|
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
|
24
|
Abstract
A rare or orphan disorder is any disease that affects a small percentage of the population. Most genes and pathways underlying these disorders remain unknown. High-throughput techniques are frequently applied to detect disease candidate genes. The speed and affordability of sequencing following recent technological advances while advantageous are accompanied by the problem of data deluge. Furthermore, experimental validation of disease candidate genes is both time-consuming and expensive. Therefore, several computational approaches have been developed to identify the most promising candidates for follow-up studies. Based on the guilt by association principle, most of these approaches use prior knowledge about a disease of interest to discover and rank novel candidate genes. In this chapter, a brief overview of some of the in silico strategies for candidate gene prioritization is provided. To demonstrate their utility in rare disease research, a Web-based computational suite of tools that use integrated heterogeneous data sources for ranking disease candidate genes is used to demonstrate how to run typical queries using this system.
Affiliation(s)
- Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, ML 7024, Cincinnati, OH, 45229, USA
|
25
|
Abstract
While the genomics-derived discoveries promise benefits to basic research and health care, the speed and affordability of sequencing following recent technological advances has further aggravated the data deluge. Seamless integration of the ever-increasing clinical, genomic, and experimental data and efficient mining for knowledge extraction, delivering actionable insight and generating testable hypotheses are therefore critical for the needs of biomedical research. For instance, high-throughput techniques are frequently applied to detect disease candidate genes. Experimental validation of these candidates however is both time-consuming and expensive. Hence, several computational approaches based on literature and data mining have been developed to identify the most promising candidates for follow-up studies. Based on "guilt by association" principle, most of these methods use prior knowledge about a disease of interest to discover and rank novel candidate genes. In this chapter, we provide a brief overview of recent advances made in literature- and data-mining-based approaches for candidate gene prioritization. As a case study, we focus on a Web-based computational approach that uses integrated heterogeneous data sources including gene-literature associations for ranking disease candidate genes and explain how to run typical queries using this system.
|
26
|
Gao S, Jia S, Hessner MJ, Wang X. Predicting disease-related subnetworks for type 1 diabetes using a new network activity score. OMICS 2012; 16:566-78. [PMID: 22917479 DOI: 10.1089/omi.2012.0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
In this study we investigated the advantage of including network information in prioritizing disease genes of type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considered the contribution both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome wide association study (GWAS), and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels showed correlation with disease only in cohorts with low-risk HLA genotypes. The results suggested the importance of mapping disease gene networks in dissecting the genetics of complex diseases, and offered a general approach to network-based disease gene prioritization from multiple data sources.
Affiliation(s)
- Shouguo Gao
- Department of Physics, the University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
|
27
|
Rishishwar L, Varghese N, Tyagi E, Harvey SC, Jordan IK, McCarty NA. Relating the disease mutation spectrum to the evolution of the cystic fibrosis transmembrane conductance regulator (CFTR). PLoS One 2012; 7:e42336. [PMID: 22879944 PMCID: PMC3413703 DOI: 10.1371/journal.pone.0042336] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/26/2012] [Accepted: 07/03/2012] [Indexed: 11/18/2022] Open
Abstract
Cystic fibrosis (CF) is the most common genetic disease among Caucasians, and accordingly the cystic fibrosis transmembrane conductance regulator (CFTR) protein has perhaps the best characterized disease mutation spectrum with more than 1,500 causative mutations having been identified. In this study, we took advantage of that wealth of mutational information in an effort to relate site-specific evolutionary parameters with the propensity and severity of CFTR disease-causing mutations. To do this, we devised a scoring scheme for known CFTR disease-causing mutations based on the Grantham amino acid chemical difference matrix. CFTR site-specific evolutionary constraint values were then computed for seven different evolutionary metrics across a range of increasing evolutionary depths. The CFTR mutational scores and the various site-specific evolutionary constraint values were compared in order to evaluate which evolutionary measures best reflect the disease-causing mutation spectrum. Site-specific evolutionary constraint values from the widely used comparative method PolyPhen2 show the best correlation with the CFTR mutation score spectrum, whereas more straightforward conservation based measures (ConSurf and ScoreCons) show the greatest ability to predict individual CFTR disease-causing mutations. While far greater than could be expected by chance alone, the fraction of the variability in mutation scores explained by the PolyPhen2 metric (3.6%), along with the best set of paired sensitivity (58%) and specificity (60%) values for the prediction of disease-causing residues, were marginal. These data indicate that evolutionary constraint levels are informative but far from determinant with respect to disease-causing mutations in CFTR. 
Nevertheless, this work shows that, when combined with additional lines of evidence, information on site-specific evolutionary conservation can and should be used to guide site-directed mutagenesis experiments by more narrowly defining the set of target residues, resulting in a potential savings of both time and money.
Affiliation(s)
- Lavanya Rishishwar
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
28
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/14/2022]
|
29
|
Assis R, Kondrashov AS. A strong deletion bias in nonallelic gene conversion. PLoS Genet 2012; 8:e1002508. [PMID: 22359514 PMCID: PMC3280953 DOI: 10.1371/journal.pgen.1002508] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 01/27/2011] [Accepted: 12/12/2011] [Indexed: 11/19/2022] Open
Abstract
Gene conversion is the unidirectional transfer of genetic information between orthologous (allelic) or paralogous (nonallelic) genomic segments. Though a number of studies have examined nucleotide replacements, little is known about length difference mutations produced by gene conversion. Here, we investigate insertions and deletions produced by nonallelic gene conversion in 338 Drosophila and 10,149 primate paralogs. Using a direct phylogenetic approach, we identify 179 insertions and 614 deletions in Drosophila paralogs, and 132 insertions and 455 deletions in primate paralogs. Thus, nonallelic gene conversion is strongly deletion-biased in both lineages, with almost 3.5 times as many conversion-induced deletions as insertions. In primates, the deletion bias is considerably stronger for long indels and, in both lineages, the per-site rate of gene conversion is orders of magnitudes higher than that of ordinary mutation. Due to this high rate, deletion-biased nonallelic gene conversion plays a key role in genome size evolution, leading to the cooperative shrinkage and eventual disappearance of selectively neutral paralogs. Gene conversion is a process whereby a DNA sequence is copied from one segment of the genome (donor) to another (recipient), resulting in the replacement, insertion, or deletion of a DNA sequence in the recipient. This exchange is facilitated by the high sequence similarity of the two segments, which is due to their evolutionary relationship. Here, we study insertions and deletions produced by gene conversion between paralogs, segments related by DNA duplication events. By comparing paralog sequences in multiple species of fruit flies and primates, we find that deletions occur more than three times as frequently as insertions. We also discover that the rate of gene conversion between paralogs is quite high. The deletion bias and high rate of this process causes paralogs to shrink cooperatively and eventually be eliminated from the genome. 
Because of the abundance of paralogs in animal genomes, this phenomenon can lead to a significant reduction in genome size. Therefore, our finding enhances our understanding of the forces that lead to changes in genome size during evolution.
Affiliation(s)
- Raquel Assis
- Department of Integrative Biology, Center for Theoretical Evolutionary Genomics, University of California Berkeley, Berkeley, California, USA.
|
30
|
Podder S, Ghosh TC. Evolutionary dynamics of human autoimmune disease genes and malfunctioned immunological genes. BMC Evol Biol 2012; 12:10. [PMID: 22276655 PMCID: PMC3347981 DOI: 10.1186/1471-2148-12-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/29/2011] [Accepted: 01/25/2012] [Indexed: 02/01/2023] Open
Abstract
Background One of the main issues of molecular evolution is to divulge the principles in dictating the evolutionary rate differences among various gene classes. Immunological genes have received considerable attention in evolutionary biology as candidates for local adaptation and for studying functionally important polymorphisms. The normal structure and function of immunological genes will be distorted when they experience mutations leading to immunological dysfunctions. Results Here, we examined the fundamental differences between the genes which on mutation give rise to autoimmune or other immune system related diseases and the immunological genes that do not cause any disease phenotypes. Although the disease genes examined are analogous to non-disease genes in product, expression, function, and pathway affiliation, a statistically significant decrease in evolutionary rate has been found in autoimmune disease genes relative to all other immune related diseases and non-disease genes. Possible ways of accumulation of mutation in the three steps of the central dogma (DNA-mRNA-Protein) have been studied to trace the mutational effects predisposed to disease consequence and acquiring higher selection pressure. Principal Component Analysis and Multivariate Regression Analysis have established the predominant role of single nucleotide polymorphisms in guiding the evolutionary rate of immunological disease and non-disease genes followed by m-RNA abundance, paralogs number, fraction of phosphorylation residue, alternatively spliced exon, protein residue burial and protein disorder. Conclusions Our study provides an empirical insight into the etiology of autoimmune disease genes and other immunological diseases. The immediate utility of our study is to help in disease gene identification and may also help in medicinal improvement of immune related disease.
|
31
|
Jin W, Qin P, Lou H, Jin L, Xu S. A systematic characterization of genes underlying both complex and Mendelian diseases. Hum Mol Genet 2011; 21:1611-24. [DOI: 10.1093/hmg/ddr599] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022] Open
|
32
|
Dickerson JE, Zhu A, Robertson DL, Hentges KE. Defining the role of essential genes in human disease. PLoS One 2011; 6:e27368. [PMID: 22096564 PMCID: PMC3214036 DOI: 10.1371/journal.pone.0027368] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 06/23/2011] [Accepted: 10/15/2011] [Indexed: 12/31/2022] Open
Abstract
A greater understanding of the causes of human disease can come from identifying characteristics that are specific to disease genes. However, a full understanding of the contribution of essential genes to human disease is lacking, due to the premise that these genes tend to cause developmental abnormalities rather than adult disease. We tested the hypothesis that human orthologs of mouse essential genes are associated with a variety of human diseases, rather than only those related to miscarriage and birth defects. We segregated human disease genes according to whether the knockout phenotype of their mouse ortholog was lethal or viable, defining those with orthologs producing lethal knockouts as essential disease genes. We show that the human orthologs of mouse essential genes are associated with a wide spectrum of diseases affecting diverse physiological systems. Notably, human disease genes with essential mouse orthologs are over-represented among disease genes associated with cancer, suggesting links between adult cellular abnormalities and developmental functions. The proteins encoded by essential genes are highly connected in protein-protein interaction networks, which we find correlates with an over-representation of nuclear proteins amongst essential disease genes. Disease genes associated with essential orthologs also are more likely than those with non-essential orthologs to contribute to disease through an autosomal dominant inheritance pattern, suggesting that these diseases may actually result from semi-dominant mutant alleles. Overall, we have described attributes found in disease genes according to the essentiality status of their mouse orthologs. These findings demonstrate that disease genes do occupy highly connected positions in protein-protein interaction networks, and that due to the complexity of disease-associated alleles, essential genes cannot be ignored as candidates for causing diverse human diseases.
Affiliation(s)
- Ana Zhu
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- David L. Robertson
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Kathryn E. Hentges
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
|
33
|
Nguyen TP, Ho TB. Detecting disease genes based on semi-supervised learning and protein-protein interaction networks. Artif Intell Med 2011; 54:63-71. [PMID: 22000346 DOI: 10.1016/j.artmed.2011.09.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 09/23/2009] [Revised: 05/24/2011] [Accepted: 09/01/2011] [Indexed: 11/19/2022]
Abstract
OBJECTIVE Predicting or prioritizing the human genes that cause disease, or "disease genes", is one of the emerging tasks in biomedicine informatics. Research on network-based approach to this problem is carried out upon the key assumption of "the network-neighbour of a disease gene is likely to cause the same or a similar disease", and mostly employs data regarding well-known disease genes, using supervised learning methods. This work aims to find an effective method to exploit the disease gene neighbourhood and the integration of several useful omics data sources, which potentially enhance disease gene predictions. METHODS We have presented a novel method to effectively predict disease genes by exploiting, in the semi-supervised learning (SSL) scheme, data regarding both disease genes and disease gene neighbours via protein-protein interaction network. Multiple proteomic and genomic data were integrated from six biological databases, including Universal Protein Resource, Interologous Interaction Database, Reactome, Gene Ontology, Pfam, and InterDom, and a gene expression dataset. RESULTS By employing a 10 times stratified 10-fold cross validation, the SSL method performs better than the k-nearest neighbour method and the support vector machines method in terms of sensitivity of 85%, specificity of 79%, precision of 81%, accuracy of 82%, and a balanced F-function of 83%. The other comparative experimental evaluations demonstrate advantages of the proposed method given a small amount of labeled data with accuracy of 78%. We have applied the proposed method to detect 572 putative disease genes, which are biologically validated by some indirect ways. CONCLUSION Semi-supervised learning improved ability to study disease genes, especially a specific disease when the known disease genes (as labeled data) are very often limited. 
In addition to the computational improvement, the analysis of predicted disease proteins indicates that the findings are beneficial in deciphering the pathogenic mechanisms.
|
34
|
Dickerson JE, Robertson DL. On the origins of Mendelian disease genes in man: the impact of gene duplication. Mol Biol Evol 2011; 29:61-9. [PMID: 21705381 PMCID: PMC3709195 DOI: 10.1093/molbev/msr111] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 12/20/2022] Open
Abstract
Over 3,000 human diseases are known to be linked to heritable genetic variation, mapping to over 1,700 unique genes. Dating of the evolutionary age of these disease-associated genes has suggested that they have a tendency to be ancient, specifically coming into existence with early metazoa. The approach taken by past studies, however, assumes that the age of a disease is the same as the age of its common ancestor, ignoring the fundamental contribution of duplication events in the evolution of new genes and function. Here, we date both the common ancestor and the duplication history of known human disease-associated genes. We find that the majority of disease genes (80%) are genes that have been duplicated in their evolutionary history. Periods for which there are more disease-associated genes, for example, at the origins of bony vertebrates, are explained by the emergence of more genes at that time, and the majority of these are duplicates inferred to have arisen by whole-genome duplication. These relationships are similar for different disease types and the disease-associated gene's cellular function. This indicates that the emergence of duplication-associated diseases has been ongoing and approximately constant (relative to the retention of duplicate genes) throughout the evolution of life. This continued until approximately 390 Ma from which time relatively fewer novel genes came into existence on the human lineage, let alone disease genes. For single-copy genes associated with disease, we find that the numbers of disease genes decreases with recency. For the majority of duplicates, the disease-associated mutation is associated with just one of the duplicate copies. A universal explanation for heritable disease is, thus, that it is merely a by-product of the evolutionary process; the evolution of new genes (de novo or by duplication) results in the potential for new diseases to emerge.
Affiliation(s)
- Jonathan E Dickerson
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
|
35
|
Kaimal V, Sardana D, Bardes EE, Gudivada RC, Chen J, Jegga AG. Integrative systems biology approaches to identify and prioritize disease and drug candidate genes. Methods Mol Biol 2011; 700:241-259. [PMID: 21204038 DOI: 10.1007/978-1-61737-954-3_16] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
Although a number of computational approaches have been developed to integrate data from multiple sources for the purpose of predicting or prioritizing candidate disease genes, relatively few of them focus on identifying or ranking drug targets. To address this deficit, we have developed an approach to specifically identify and prioritize disease and drug candidate genes. In this chapter, we demonstrate the applicability of integrative systems-biology-based approaches to identify potential drug targets and candidate genes by employing information extracted from public databases. We illustrate the method in detail using examples of two neurodegenerative diseases (Alzheimer's and Parkinson's) and one neuropsychiatric disease (Schizophrenia).
Affiliation(s)
- Vivek Kaimal
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
|
36
|
Nagaraj SH, Ingham A, Reverter A. The interplay between evolution, regulation and tissue specificity in the Human Hereditary Diseasome. BMC Genomics 2010; 11 Suppl 4:S23. [PMID: 21143807 PMCID: PMC3005915 DOI: 10.1186/1471-2164-11-s4-s23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022] Open
Abstract
Background: Human disease genes can be distinguished from essential (embryonically lethal) and non-disease genes using gene attributes. Such attributes include gene age, tissue specificity of expression, regulatory capacity, sequence length, rate of sequence variation and capacity for interaction. The resulting information has been used to inform data mining approaches seeking to identify novel disease genes. Given the dynamic nature of this field and the rapid rise in relevant information, we have chosen to perform a single integrated mining approach to explore relationships among gene attributes and thereby characterise evolutionary trends associated with disease genes.
Results: An all-against-all cross-comparison of 2,522 disease gene attributes revealed that significant relationships exist between the age, disease association and expression pattern of genes and the tissues within which they are expressed. We found that the over-representation of disease genes among old genes holds for tissue-specific genes, but the correlation between age and disease association vanished when conditioning on tissue specificity. Of the 32 tissues studied, the genes expressed in pancreas are on average older than the genes expressed in any other tissue, while the testis expressed the lowest proportion of old genes. Following a focussed analysis on the impact of regulatory apparatus on evolution of disease genes, we show that regulators, comprising transcription factors and post-translationally modified proteins, are over-represented among ancient disease genes. In addition, we show that the proportion of regulator genes is affected by gene age among disease genes and by tissue specificity among non-disease genes. Finally, using 55,606 true-positive gene interactions, we find that old disease genes interact with other old disease genes and that interacting new genes interact with genes originating from higher phylostrata.
Conclusion: This study supports the non-random nature of the human diseasome. We have identified a variety of distinct features and correlations to other molecular attributes that can be used to distinguish the set of disease-causing genes. This was achieved by harnessing the power of mining large-scale datasets from OMIM and other databases. Ultimately, such knowledge may contribute to the identification of novel human disease genes and an enhanced understanding of human biology.
Affiliation(s)
- Shivashankar H Nagaraj
- CSIRO Livestock Industries, Queensland Bioscience Precinct, St. Lucia, Queensland, Australia.
|
37
|
Gefen A, Cohen R, Birk OS. Syndrome to gene (S2G): in-silico identification of candidate genes for human diseases. Hum Mutat 2010; 31:229-36. [PMID: 20052752 DOI: 10.1002/humu.21171] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/11/2022]
Abstract
The identification of genomic loci associated with human genetic syndromes has been significantly facilitated through the generation of high-density SNP arrays. However, optimal selection of candidate genes from within such loci is still a tedious, labor-intensive bottleneck. Syndrome to Gene (S2G) is based on novel algorithms which allow an efficient search for candidate genes in a genomic locus, using known genes whose defects cause phenotypically similar syndromes. S2G (http://fohs.bgu.ac.il/s2g/index.html) includes two components. The first is an Online Mendelian Inheritance in Man (OMIM)-based phenotype search engine that alleviates many of the problems in the existing OMIM search engine (negation phrases, overlapping terms, etc.). The second is a gene-prioritizing engine that uses a novel algorithm to integrate information from 18 databases. When the detailed phenotype of a syndrome is entered into the web-based software, S2G offers a complete improved search of the OMIM database for similar syndromes. The software then prioritizes a list of genes from within a genomic locus, based on their association with genes whose defects are known to underlie similar clinical syndromes. We demonstrate that in all 30 cases of novel disease genes identified in the past year, the disease gene was within the top 20% of candidate genes predicted by S2G, and in most cases within the top 10%. Thus, S2G provides clinicians with an efficient tool for diagnosis and researchers with a candidate gene prediction tool based on phenotypic data and a wide range of gene data resources. S2G can also serve in studies of polygenic diseases, and in finding interacting molecules for any gene of choice.
Affiliation(s)
- Avitan Gefen
- The Morris Kahn Laboratory of Human Genetics, National Institute for Biotechnology in the Negev, Ben Gurion University, Beer-Sheva, Israel
|
38
|
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located. To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom.
|
39
|
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 09/25/2009] [Accepted: 08/09/2010] [Indexed: 11/18/2022] Open
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects. Although previous research has revealed important aspects that influence or predict the chance of a mutation to cause disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that sometimes certain positions in a protein mutate in an apparently correlated fashion and analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On one hand, our data further support the concept that correlated positions are associated not only with protein contacts but also with functional sites and/or disease positions (as introduced by others). On the other hand, this could be useful to further improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
Affiliation(s)
- Andreas Kowarsch
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Angelika Fuchs
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Dmitrij Frishman
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
|
40
|
Cooper DN, Ball EV, Mort M. Chromosomal distribution of disease genes in the human genome. Genet Test Mol Biomarkers 2010; 14:441-6. [PMID: 20642358 DOI: 10.1089/gtmb.2010.0081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Genes are nonrandomly distributed in the human genome, both within and between chromosomes. Thus, genes of similar function and common evolutionary origin are often clustered, as are genes with similar expression profiles. We now report that the >2400 genes known to underlie human monogenic inherited disease are non-randomly distributed in the genome over and above the general nonrandomness evident in the distribution of human genes. Further, a subset of 315 inherited disease genes subject to gross deletion was found to exhibit a degree of clustering that was twice that manifested by disease genes in general. The clustering of human disease genes is likely to have important implications for understanding the genotype-phenotype relationship in contiguous gene syndromes as well as those conditions characterized by multigene deletions or complex chromosomal rearrangements.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, United Kingdom.
|
41
|
Cooper DN, Mort M. Do Inherited Disease Genes Have Distinguishing Functional Characteristics? Genet Test Mol Biomarkers 2010; 14:289-91. [DOI: 10.1089/gtmb.2010.0033] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Affiliation(s)
- David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, United Kingdom
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, United Kingdom
|
42
|
Zhu J, Xiao H, Shen X, Wang J, Zou J, Zhang L, Yang D, Ma W, Yao C, Gong X, Zhang M, Zhang Y, Guo Z. Viewing cancer genes from co-evolving gene modules. Bioinformatics 2010; 26:919-24. [DOI: 10.1093/bioinformatics/btq055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] Open
|
43
|
Kann MG. Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief Bioinform 2010; 11:96-110. [PMID: 20007728 PMCID: PMC2810112 DOI: 10.1093/bib/bbp048] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 08/11/2009] [Revised: 09/15/2009] [Indexed: 12/29/2022] Open
Abstract
Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for the cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential of leading to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.
Affiliation(s)
- Maricel G Kann
- University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.
|
44
|
Hennah W, Thomson P, McQuillin A, Bass N, Loukola A, Anjorin A, Blackwood D, Curtis D, Deary IJ, Harris SE, Isometsä ET, Lawrence J, Lönnqvist J, Muir W, Palotie A, Partonen T, Paunio T, Pylkkö E, Robinson M, Soronen P, Suominen K, Suvisaari J, Thirumalai S, St Clair D, Gurling H, Peltonen L, Porteous D. DISC1 association, heterogeneity and interplay in schizophrenia and bipolar disorder. Mol Psychiatry 2009; 14:865-73. [PMID: 18317464 DOI: 10.1038/mp.2008.22] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.6] [Indexed: 01/19/2023]
Abstract
Disrupted in schizophrenia 1 (DISC1) has been associated with risk of schizophrenia, schizoaffective disorder, bipolar disorder, major depression, autism and Asperger syndrome, but apart from in the original translocation family, true causal variants have yet to be confirmed. Here we report a harmonized association study for DISC1 in European cohorts of schizophrenia and bipolar disorder. We identify regions of significant association, demonstrate allele frequency heterogeneity and provide preliminary evidence for modifying interplay between variants. Whereas no associations survived permutation analysis in the combined data set, significant corrected associations were observed for bipolar disorder at rs1538979 in the Finnish cohorts (uncorrected P=0.00020; corrected P=0.016; odds ratio=2.73, 95% confidence interval (CI) 1.42-5.27) and at rs821577 in the London cohort (uncorrected P=0.00070; corrected P=0.040; odds ratio=1.64, 95% CI 1.23-2.19). The rs821577 single nucleotide polymorphism (SNP) showed evidence for increased risk within the combined European cohorts (odds ratio=1.27, 95% CI 1.07-1.51), even though significant corrected association was not detected (uncorrected P=0.0058; corrected P=0.28). After conditioning the European data set on the two risk alleles, reanalysis revealed a third significant SNP association (uncorrected P=0.00050; corrected P=0.025). This SNP showed evidence for interplay, either increasing or decreasing risk, dependent upon the presence or absence of rs1538979 or rs821577. These findings provide further support for the role of DISC1 in psychiatric illness and demonstrate the presence of locus heterogeneity, with the effect that clinically relevant genetic variants may go undetected by standard analysis of combined cohorts.
Affiliation(s)
- W Hennah
- Medical Genetics Section, University of Edinburgh, Edinburgh EH4 2XU, Scotland.
|
45
|
Abstract
In the present study, we examined human-mouse homologous intronless disease and non-disease genes, together with their extent of sequence conservation, tissue expression, and domain and gene ontology composition, to characterize their evolutionary and functional attributes. We show that selection has significantly discriminated between the two groups: the disease-associated genes in particular exhibit lower K(a) and K(a)/K(s), while K(s), although smaller, is not significantly different. Our analyses suggest that the majority of disease-related intronless human genes have homology limited to eukaryotic genomes and that their expression is localized. We also observed that different classes of intronless disease-related genes have experienced diverse selective pressures and are enriched for higher-level functionality that is essential for developmental processes in complex organisms. It is expected that these insights will enhance our understanding of the nature of these genes and also improve our ability to identify disease-related intronless genes.
Affiliation(s)
- Subhash Mohan Agarwal
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110067, India.
|
46
|
Cai JJ, Borenstein E, Chen R, Petrov DA. Similarly strong purifying selection acts on human disease genes of all evolutionary ages. Genome Biol Evol 2009; 1:131-44. [PMID: 20333184 PMCID: PMC2817408 DOI: 10.1093/gbe/evp013] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Accepted: 05/22/2009] [Indexed: 12/20/2022] Open
Abstract
A number of studies have shown that recently created genes differ in many aspects from genes created in the deep evolutionary past. Here, we determined the age of emergence and propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. Non-disease genes that are older and less prone to loss have been evolving 1.5- to 3-fold slower between humans and chimps than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their ages and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reasoned that genes were more likely to be involved in human disease if they were under a strong functional constraint, expressed heterogeneously across tissues, and lacked genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimps might also be enriched for genes that encode important primate or even human-specific functions.
Affiliation(s)
- James J Cai
- Department of Biology, Stanford University, CA, USA
|
47
|
Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 2009; 37:W305-11. [PMID: 19465376 PMCID: PMC2703978 DOI: 10.1093/nar/gkp427] [Citation(s) in RCA: 2050] [Impact Index Per Article: 136.7] [Indexed: 12/11/2022] Open
Abstract
ToppGene Suite (http://toppgene.cchmc.org; this web site is free and open to all users and does not require a login to access) is a one-stop portal for (i) gene list functional enrichment, (ii) candidate gene prioritization using either functional annotations or network analysis and (iii) identification and prioritization of novel disease candidate genes in the interactome. Functional annotation-based disease candidate gene prioritization uses a fuzzy-based similarity measure to compute the similarity between any two genes based on semantic annotations. The similarity scores from individual features are combined into an overall score using statistical meta-analysis. A P-value of each annotation of a test gene is derived by random sampling of the whole genome. The protein–protein interaction network (PPIN)-based disease candidate gene prioritization uses social and Web networks analysis algorithms (extended versions of the PageRank and HITS algorithms, and the K-Step Markov method). We demonstrate the utility of ToppGene Suite using 20 recently reported GWAS-based gene–disease associations (including novel disease genes) representing five diseases. ToppGene ranked 19 of 20 (95%) candidate genes within the top 20%, while ToppNet ranked 12 of 16 (75%) candidate genes among the top 20%.
Affiliation(s)
- Jing Chen
- Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
|
48
|
Care M, Bradford J, Needham C, Bulpitt A, Westhead D. Combining the interactome and deleterious SNP predictions to improve disease gene identification. Hum Mutat 2009; 30:485-92. [DOI: 10.1002/humu.20917] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
|
49
|
Abstract
Human genes responsible for inherited diseases are important for the understanding of human disease. We investigated the degree of polymorphism and divergence in the human disease genes to elucidate the effect of natural selection on human disease genes. In particular, the effect of disease dominance was incorporated into the analysis. Both dominant disease genes (DDG) and recessive disease genes (RDG) had a higher mutation rate per site and encoded longer proteins than the nondisease genes, which exposed the disease genes to a faster flux of new mutations. Using an unbiased polymorphism dataset, we found that, proportionally, RDG harbor more nonsynonymous polymorphisms compared with DDG. We estimated the selection intensity on the disease genes using polymorphism and divergence data and determined whether the different patterns of polymorphism and divergence between DDG and RDG could be explained by the difference in dominance alone. Even after the dominance effect was considered, the selection intensity on RDG was significantly different from that on DDG, suggesting that the deleterious effects of the dominant and recessive disease mutations are fundamentally different.
|
50
|
Analysis of human disease genes in the context of gene essentiality. Genomics 2008; 92:414-8. [DOI: 10.1016/j.ygeno.2008.08.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/24/2008] [Revised: 08/04/2008] [Accepted: 08/07/2008] [Indexed: 11/21/2022]
|