1
Jubran J, Slutsky R, Rozenblum N, Rokach L, Ben-David U, Yeger-Lotem E. Machine-learning analysis reveals an important role for negative selection in shaping cancer aneuploidy landscapes. Genome Biol 2024; 25:95. [PMID: 38622679 PMCID: PMC11020441 DOI: 10.1186/s13059-024-03225-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 03/26/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND Aneuploidy, an abnormal number of chromosomes within a cell, is a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progression. RESULTS Here, we apply interpretable machine learning methods to study tissue-selective aneuploidy patterns. We define 20 types of features corresponding to genomic attributes of chromosome-arms, normal tissues, primary tumors, and cancer cell lines (CCLs), and use them to model gains and losses of chromosome arms in 24 cancer types. To reveal the factors that shape the tissue-specific cancer aneuploidy landscapes, we interpret the machine learning models by estimating the relative contribution of each feature to the models. While confirming known drivers of positive selection, our quantitative analysis highlights the importance of negative selection for shaping aneuploidy landscapes. This is exemplified by tumor suppressor gene density being a better predictor of gain patterns than oncogene density, and vice versa for loss patterns. We also identify the importance of tissue-selective features and demonstrate them experimentally, revealing KLF5 as an important driver for chr13q gain in colon cancer. Further supporting an important role for negative selection in shaping the aneuploidy landscapes, we find compensation by paralogs to be among the top predictors of chromosome arm loss prevalence and demonstrate this relationship for one paralog interaction. Similar factors shape aneuploidy patterns in human CCLs, demonstrating their relevance for aneuploidy research. CONCLUSIONS Our quantitative, interpretable machine learning models improve the understanding of the genomic properties that shape cancer aneuploidy landscapes.
Affiliation(s)
- Juman Jubran
- Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
- Rachel Slutsky
- Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Rozenblum
- Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lior Rokach
- Department of Software & Information Systems Engineering, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
- Uri Ben-David
- Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
- Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel.
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel.
2
Felício D, du Mérac TR, Amorim A, Martins S. Functional implications of paralog genes in polyglutamine spinocerebellar ataxias. Hum Genet 2023; 142:1651-1676. [PMID: 37845370 PMCID: PMC10676324 DOI: 10.1007/s00439-023-02607-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/22/2023] [Indexed: 10/18/2023]
Abstract
Polyglutamine (polyQ) spinocerebellar ataxias (SCAs) comprise a group of autosomal dominant neurodegenerative disorders caused by (CAG/CAA)n expansions. The elongated stretches of adjacent glutamines alter the conformation of the native proteins inducing neurotoxicity, and subsequent motor and neurological symptoms. Although the etiology and neuropathology of most polyQ SCAs have been extensively studied, only a limited selection of therapies is available. Previous studies on SCA1 demonstrated that ATXN1L, a human duplicated gene of the disease-associated ATXN1, alleviated neuropathology in mouse models. Other SCA-associated genes have paralogs (i.e., copies at different chromosomal locations derived from duplication of the parental gene), but their functional relevance and potential role in disease pathogenesis remain unexplored. Here, we review the protein homology, expression pattern, and molecular functions of paralogs in seven polyQ dominant ataxias: SCA1, SCA2, MJD/SCA3, SCA6, SCA7, SCA17, and DRPLA. Besides ATXN1L, we highlight ATXN2L, ATXN3L, CACNA1B, ATXN7L1, ATXN7L2, TBPL2, and RERE as promising functional candidates to play a role in the neuropathology of the respective SCA, along with the parental gene. Although most of these duplicates lack the (CAG/CAA)n region, if functionally redundant, they may compensate for a partial loss-of-function or dysfunction of the wild-type genes in SCAs. We aim to draw attention to the hypothesis that paralogs of disease-associated genes may underlie the complex neuropathology of dominant ataxias and potentiate new therapeutic strategies.
Affiliation(s)
- Daniela Felício
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Instituto Ciências Biomédicas Abel Salazar (ICBAS), Universidade do Porto, 4050-313, Porto, Portugal
- Tanguy Rubat du Mérac
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- António Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, 4169-007, Porto, Portugal
- Sandra Martins
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal.
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal.
3
Renaux A, Terwagne C, Cochez M, Tiddi I, Nowé A, Lenaerts T. A knowledge graph approach to predict and interpret disease-causing gene interactions. BMC Bioinformatics 2023; 24:324. [PMID: 37644440 PMCID: PMC10463539 DOI: 10.1186/s12859-023-05451-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogenous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. CONCLUSION Our method, built with interpretability in mind, leverages heterogenous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. 
This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Chloé Terwagne
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Michael Cochez
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Ilaria Tiddi
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
4
Andrianova EP, Marmion RA, Shvartsman SY, Zhulin IB. Evolutionary history of MEK1 illuminates the nature of deleterious mutations. Proc Natl Acad Sci U S A 2023; 120:e2304184120. [PMID: 37579140 PMCID: PMC10450672 DOI: 10.1073/pnas.2304184120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 07/24/2023] [Indexed: 08/16/2023] Open
Abstract
Mutations in signal transduction pathways lead to various diseases including cancers. MEK1 kinase, encoded by the human MAP2K1 gene, is one of the central components of the MAPK pathway and more than a hundred somatic mutations in the MAP2K1 gene were identified in various tumors. Germline mutations deregulating MEK1 also lead to congenital abnormalities, such as the cardiofaciocutaneous syndrome and arteriovenous malformation. Evaluating variants associated with a disease is a challenge, and computational genomic approaches aid in this process. Establishing evolutionary history of a gene improves computational prediction of disease-causing mutations; however, the evolutionary history of MEK1 is not well understood. Here, by revealing a precise evolutionary history of MEK1, we construct a well-defined dataset of MEK1 metazoan orthologs, which provides sufficient depth to distinguish between conserved and variable amino acid positions. We matched known and predicted disease-causing and benign mutations to evolutionary changes observed in corresponding amino acid positions and found that all known and many suspected disease-causing mutations are evolutionarily intolerable. We selected several variants that cannot be unambiguously assessed by automated prediction tools but that are confidently identified as "damaging" by our approach, for experimental validation in Drosophila. In all cases, evolutionarily intolerant variants caused increased mortality and severe defects in fruit fly embryos, confirming their damaging nature. We anticipate that our analysis will serve as a blueprint to help evaluate known and novel missense variants in MEK1 and that our approach will contribute to improving automated tools for disease-associated variant interpretation.
Affiliation(s)
- Ekaterina P. Andrianova
- Department of Microbiology, The Ohio State University, Columbus, OH 43210
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210
- Robert A. Marmion
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Stanislav Y. Shvartsman
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544
- Flatiron Institute, Simons Foundation, New York, NY 10010
- Igor B. Zhulin
- Department of Microbiology, The Ohio State University, Columbus, OH 43210
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210
5
Sun YH, Wu YL, Liao BY. Phenotypic heterogeneity in human genetic diseases: ultrasensitivity-mediated threshold effects as a unifying molecular mechanism. J Biomed Sci 2023; 30:58. [PMID: 37525275 PMCID: PMC10388531 DOI: 10.1186/s12929-023-00959-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Accepted: 07/26/2023] [Indexed: 08/02/2023] Open
Abstract
Phenotypic heterogeneity is very common in genetic systems and in human diseases and has important consequences for disease diagnosis and treatment. In addition to the many genetic and non-genetic (e.g., epigenetic, environmental) factors reported to account for part of the heterogeneity, we stress the importance of stochastic fluctuation and regulatory network topology in contributing to phenotypic heterogeneity. We argue that a threshold effect is a unifying principle to explain the phenomenon; that ultrasensitivity is the molecular mechanism for this threshold effect; and discuss the three conditions for phenotypic heterogeneity to occur. We suggest that threshold effects occur not only at the cellular level, but also at the organ level. We stress the importance of context-dependence and its relationship to pleiotropy and edgetic mutations. Based on this model, we provide practical strategies to study human genetic diseases. By understanding the network mechanism for ultrasensitivity and identifying the critical factor, we may manipulate the weak spot to gently nudge the system from an ultrasensitive state to a stable non-disease state. Our analysis provides a new insight into the prevention and treatment of genetic diseases.
Affiliation(s)
- Y Henry Sun
- Institute of Molecular and Genomic Medicine, National Health Research Institute, Zhunan, Miaoli, Taiwan.
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.
- Yueh-Lin Wu
- Institute of Molecular and Genomic Medicine, National Health Research Institute, Zhunan, Miaoli, Taiwan
- Division of Nephrology, Department of Internal Medicine, Wei-Gong Memorial Hospital, Miaoli, Taiwan
- Division of Nephrology, Department of Internal Medicine, Taipei Medical University Hospital, Taipei, Taiwan
- TMU Research Center of Urology and Kidney, Taipei Medical University, Taipei, Taiwan
- Division of Nephrology, Department of Internal Medicine, Wan Fang Hospital, Taipei Medical University, Taipei City, Taiwan
- Ben-Yang Liao
- Institute of Population Health Sciences, National Health Research Institute, Zhunan, Miaoli, Taiwan
6
Vihinen M. Systematic errors in annotations of truncations, loss-of-function and synonymous variants. Front Genet 2023; 14:1015017. [PMID: 36713076 PMCID: PMC9880313 DOI: 10.3389/fgene.2023.1015017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023] Open
Abstract
Description of genetic phenomena and variations requires exact language and concepts. Vast amounts of variation data are produced with next-generation sequencing pipelines. The obtained variations are automatically annotated, e.g., for their functional consequences. These tools and pipelines, along with systematic nomenclature, mainly work well, but there are still some problems in nomenclature, organization of some databases, misuse of concepts and certain practices. Therefore, systematic errors prevent correct annotation and often preclude further analysis of certain variation types. Problems and solutions are described for presumed protein truncations, variants that are claimed to be of loss-of-function based on the type of variation, and synonymous variants that are not synonymous and lead to sequence changes or to missing protein.
7
Ray S, Banerjee A. Exploring the Human USP Gene Family and Its Association with Cancer: An In Silico Study. Lecture Notes in Electrical Engineering 2023:685-694. [DOI: 10.1007/978-981-99-3656-4_70] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
8
Vihinen M. Individual Genetic Heterogeneity. Genes (Basel) 2022; 13:1626. [PMID: 36140794 PMCID: PMC9498725 DOI: 10.3390/genes13091626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 08/25/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022] Open
Abstract
Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are considered also in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
9
Kingdom R, Wright CF. Incomplete Penetrance and Variable Expressivity: From Clinical Studies to Population Cohorts. Front Genet 2022; 13:920390. [PMID: 35983412 PMCID: PMC9380816 DOI: 10.3389/fgene.2022.920390] [Citation(s) in RCA: 94] [Impact Index Per Article: 31.3] [Received: 04/14/2022] [Accepted: 06/09/2022] [Indexed: 12/20/2022] Open
Abstract
The same genetic variant found in different individuals can cause a range of diverse phenotypes, from no discernible clinical phenotype to severe disease, even among related individuals. Such variants can be said to display incomplete penetrance, a binary phenomenon where the genotype either causes the expected clinical phenotype or it does not, or they can be said to display variable expressivity, in which the same genotype can cause a wide range of clinical symptoms across a spectrum. Both incomplete penetrance and variable expressivity are thought to be caused by a range of factors, including common variants, variants in regulatory regions, epigenetics, environmental factors, and lifestyle. Many thousands of genetic variants have been identified as the cause of monogenic disorders, mostly determined through small clinical studies, and thus, the penetrance and expressivity of these variants may be overestimated when compared to their effect on the general population. With the wealth of population cohort data currently available, the penetrance and expressivity of such genetic variants can be investigated across a much wider contingent, potentially helping to reclassify variants that were previously thought to be completely penetrant. Research into the penetrance and expressivity of such genetic variants is important for clinical classification, both for determining causative mechanisms of disease in the affected population and for providing accurate risk information through genetic counseling. A genotype-based definition of the causes of rare diseases incorporating information from population cohorts and clinical studies is critical for our understanding of incomplete penetrance and variable expressivity. 
This review examines our current knowledge of the penetrance and expressivity of genetic variants in rare disease and across populations, as well as looking into the potential causes of the variation seen, including genetic modifiers, mosaicism, and polygenic factors, among others. We also considered the challenges that come with investigating penetrance and expressivity.
Affiliation(s)
- Caroline F. Wright
- Institute of Biomedical and Clinical Science, Royal Devon & Exeter Hospital, University of Exeter Medical School, Exeter, United Kingdom
10
Gera T, Jonas F, More R, Barkai N. Evolution of binding preferences among whole-genome duplicated transcription factors. eLife 2022; 11:73225. [PMID: 35404235 PMCID: PMC9000951 DOI: 10.7554/elife.73225] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/22/2021] [Accepted: 01/20/2022] [Indexed: 01/10/2023] Open
Abstract
Throughout evolution, new transcription factors (TFs) emerge by gene duplication, promoting growth and rewiring of transcriptional networks. How TF duplicates diverge was studied in a few cases only. To provide a genome-scale view, we considered the set of budding yeast TFs classified as whole-genome duplication (WGD)-retained paralogs (~35% of all specific TFs). Using high-resolution profiling, we find that ~60% of paralogs evolved differential binding preferences. We show that this divergence results primarily from variations outside the DNA-binding domains (DBDs), while DBD preferences remain largely conserved. Analysis of non-WGD orthologs revealed uneven splitting of ancestral preferences between duplicates, and the preferential acquiring of new targets by the least conserved paralog (biased neo/sub-functionalization). Interactions between paralogs were rare, and, when present, occurred through weak competition for DNA-binding or dependency between dimer-forming paralogs. We discuss the implications of our findings for the evolutionary design of transcriptional networks.
Affiliation(s)
- Tamar Gera
- Department of Molecular Genetics, Weizmann Institute of Science
- Felix Jonas
- Department of Molecular Genetics, Weizmann Institute of Science
- Roye More
- Department of Molecular Genetics, Weizmann Institute of Science
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science
11
Brohard-Julien S, Frouin V, Meyer V, Chalabi S, Deleuze JF, Le Floch E, Battail C. Region-specific expression of young small-scale duplications in the human central nervous system. BMC Ecol Evol 2021; 21:59. [PMID: 33882820 PMCID: PMC8059171 DOI: 10.1186/s12862-021-01794-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/07/2020] [Accepted: 04/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The duplication of genes is one of the main genetic mechanisms that led to the gain in complexity of biological tissue. Although the implication of duplicated gene expression in brain evolution was extensively studied through comparisons between organs, their role in the regional specialization of the adult human central nervous system has not yet been well described. RESULTS Our work explored intra-organ expression properties of paralogs through multiple territories of the human central nervous system (CNS) using transcriptome data generated by the Genotype-Tissue Expression (GTEx) consortium. Interestingly, we found that paralogs were associated with region-specific expression in CNS, suggesting their involvement in the differentiation of these territories. Besides the influence of gene expression level on region-specificity, we observed the contribution of both duplication age and duplication type to the CNS region-specificity of paralogs. Indeed, we found that small scale duplicated genes (SSDs) and in particular ySSDs (SSDs younger than the 2 rounds of whole genome duplications) were more CNS region-specific than other paralogs. Next, by studying the two paralogs of ySSD pairs, we observed that when they were region-specific, they tended to be specific to the same region more often than for other paralogs, showing the high co-expression of ySSD pairs. The extension of this analysis to families of paralogs showed that the families with co-expressed gene members (i.e. homogeneous families) were enriched in ySSDs. Furthermore, these homogeneous families tended to be region-specific families, where the majority of their gene members were specifically expressed in the same region. CONCLUSIONS Overall, our study suggests the involvement of ySSDs in the differentiation of human central nervous system territories. Therefore, we show the relevance of exploring region-specific expression of paralogs at the intra-organ level.
Affiliation(s)
- Solène Brohard-Julien
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France.
- UNATI, Neurospin, Institut Joliot, CEA, Université Paris-Saclay, 91191, Gif-sur-Yvette, France.
- Université Paris-Sud, Université Paris-Saclay, Orsay, France.
- Vincent Frouin
- UNATI, Neurospin, Institut Joliot, CEA, Université Paris-Saclay, 91191, Gif-sur-Yvette, France
- Vincent Meyer
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Smahane Chalabi
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, Paris, France
- Centre de Référence, d'Innovation, d'expertise et de transfert (CREFIX), Evry, France
- Edith Le Floch
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France.
- Christophe Battail
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut François Jacob, CEA, Université Paris-Saclay, Evry, France.
- CEA, Univ. Grenoble Alpes, INSERM, IRIG, Biology of Cancer and Infection UMR1292, 38000, Grenoble, France.
12
Baker EA, Gilbert SPR, Shimeld SM, Woollard A. Extensive non-redundancy in a recently duplicated developmental gene family. BMC Ecol Evol 2021; 21:33. [PMID: 33648446 PMCID: PMC7919330 DOI: 10.1186/s12862-020-01735-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/27/2020] [Accepted: 12/13/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND It has been proposed that recently duplicated genes are more likely to be redundant with one another compared to ancient paralogues. The evolutionary logic underpinning this idea is simple, as the assumption is that recently derived paralogous genes are more similar in sequence compared to members of ancient gene families. We set out to test this idea by using molecular phylogenetics and exploiting the genetic tractability of the model nematode, Caenorhabditis elegans, in studying the nematode-specific family of Hedgehog-related genes, the Warthogs. Hedgehog is one of a handful of signal transduction pathways that underpins the development of bilaterian animals. While having lost a bona fide Hedgehog gene, most nematodes have evolved an expanded repertoire of Hedgehog-related genes, ten of which reside within the Warthog family. RESULTS We have characterised their evolutionary origin and their roles in C. elegans and found that these genes have adopted new functions in aspects of post-embryonic development, including left-right asymmetry and cell fate determination, akin to the functions of their vertebrate counterparts. Analysis of various double and triple mutants of the Warthog family reveals that more recently derived paralogues are not redundant with one another, while a pair of divergent Warthogs do display redundancy with respect to their function in cuticle biosynthesis. CONCLUSIONS We have shown that newer members of taxon-restricted gene families are not always functionally redundant despite their recent inception, whereas much older paralogues can be, which is considered paradoxical according to the current framework in gene evolution.
Affiliation(s)
- E A Baker
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
- S P R Gilbert
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
- S M Shimeld
- Department of Zoology, University of Oxford, Oxford, OX1 3SZ, UK
- A Woollard
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK.
13
Jubran J, Hekselman I, Novack L, Yeger-Lotem E. Dosage-sensitive molecular mechanisms are associated with the tissue-specificity of traits and diseases. Comput Struct Biotechnol J 2020; 18:4024-4032. [PMID: 33363699 PMCID: PMC7744645 DOI: 10.1016/j.csbj.2020.10.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/30/2020] [Revised: 10/16/2020] [Accepted: 10/28/2020] [Indexed: 11/30/2022] Open
Abstract
Hereditary diseases and complex traits often manifest in specific tissues, whereas their causal genes are expressed in many tissues that remain unaffected. Among the mechanisms that have been suggested for this enigmatic phenomenon is dosage-sensitive compensation by paralogs of causal genes. Accordingly, tissue-selectivity stems from dosage imbalance between causal genes and paralogs that occurs particularly in disease-susceptible tissues. Here, we used a large-scale dataset of thousands of tissue transcriptomes and applied a linear mixed model (LMM) framework to assess this and other dosage-sensitive mechanisms. LMM analysis of 382 hereditary diseases consistently showed evidence for dosage-sensitive compensation by paralogs across disease subsets and susceptible tissues. LMM analysis of 135 candidate genes that are strongly associated with 16 tissue-selective complex traits revealed a similar tendency among half of the trait-associated genes. This suggests that dosage-sensitive compensation by paralogs affects the tissue-selectivity of complex traits, and can be used to illuminate candidate genes' modes of action. Next, we applied LMM to analyze dosage imbalance between causal genes and three classes of genetic modifiers, including regulatory micro-RNAs, pseudogenes, and genetic interactors. Our results propose modifiers as a fundamental axis in tissue-selectivity of diseases and traits, and demonstrate the power of LMM as a statistical framework for discovering treatment avenues.
Affiliation(s)
- Juman Jubran
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Idan Hekselman
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Lena Novack
- Soroka University Medical Center, Beer-Sheva 84101, Israel
- Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
|
14
|
Pejaver V, Urresti J, Lugo-Martinez J, Pagel KA, Lin GN, Nam HJ, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2. Nat Commun 2020; 11:5918. [PMID: 33219223 PMCID: PMC7680112 DOI: 10.1038/s41467-020-19669-x] [Citation(s) in RCA: 342] [Impact Index Per Article: 68.4] [Received: 07/23/2019] [Accepted: 10/23/2020] [Indexed: 01/02/2023] Open
Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
Affiliation(s)
- Vikas Pejaver
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Jorge Urresti
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Jose Lugo-Martinez
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
| | - Kymberleigh A Pagel
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, 220 Hackerman Hall, 3400 N Charles St, Baltimore, MD, 21218, USA
| | - Guan Ning Lin
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, People's Republic of China
| | - Hyun-Jun Nam
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
| | - Jonathan Sebat
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Beyster Center for Genomics of Psychiatric Diseases, University of California San Diego, La Jolla, CA, USA
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
| | - Lilia M Iakoucheva
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA.
| | - Predrag Radivojac
- Department of Computer Science, Indiana University, Bloomington, IN, USA.
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
|
15
|
Vihinen M. Functional effects of protein variants. Biochimie 2020; 180:104-120. [PMID: 33164889 DOI: 10.1016/j.biochi.2020.10.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/30/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/11/2022]
Abstract
Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from an evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden.
|
16
|
Park Y, Seo H, Ryu BY, Kim JH. Gene-wise variant burden and genomic characterization of nearly every gene. Pharmacogenomics 2020; 21:827-840. [DOI: 10.2217/pgs-2020-0039] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023] Open
Abstract
Aim: Current gene-level prioritization methods aim to provide information for further prioritization of ‘disease-causing’ mutations. Because they are inherently biased toward disease genes, methods specific to pharmacogenetic (PGx) genes are required. Methods: We proposed a gene-wise variant burden (GVB) method that integrates in silico deleteriousness scores of the multitude of variants of a given gene at a personal-genome level. Results: GVB in its simplest form outperformed the two state-of-the-art methods with regard to predicting pharmacogenes and complex disease genes but not for rare Mendelian disease genes. GVB* adjusted by paralog counts performed robustly well in most of the pharmacogenetic subcategories. Seven molecular genetic features characterized the unique genomic properties of PGx, complex, and Mendelian disease genes well. Conclusion: Altogether, GVB is an individual-specific gene score, especially advantageous for PGx studies.
Affiliation(s)
- Yoomi Park
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Heewon Seo
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, 03080, Korea
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, M5G 2M9, Canada
| | - Brian Y Ryu
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Ju Han Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, 03080, Korea
- Center for Precision Medicine, Seoul National University Hospital, Seoul, 03080, Korea
|
17
|
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020; 11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed. But no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.
Affiliation(s)
- Yingwen Zhao
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jian Chen
- State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, China Agricultural University, Beijing, China
| | - Xiangliang Zhang
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
18
|
Yamasaki M, Makino T, Khor SS, Toyoda H, Miyagawa T, Liu X, Kuwabara H, Kano Y, Shimada T, Sugiyama T, Nishida H, Sugaya N, Tochigi M, Otowa T, Okazaki Y, Kaiya H, Kawamura Y, Miyashita A, Kuwano R, Kasai K, Tanii H, Sasaki T, Honda M, Tokunaga K. Sensitivity to gene dosage and gene expression affects genes with copy number variants observed among neuropsychiatric diseases. BMC Med Genomics 2020; 13:55. [PMID: 32223758 PMCID: PMC7104509 DOI: 10.1186/s12920-020-0699-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/25/2018] [Accepted: 02/24/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Copy number variants (CNVs) have been reported to be associated with diseases, traits, and evolution. However, it is hard to determine which gene should have priority as a target for further functional experiments if a CNV is rare or a singleton. In this study, we attempted to overcome this issue by using two approaches: assessing the influence of gene dosage sensitivity and of gene expression sensitivity. Dosage-sensitive genes were defined as those derived from two-round whole-genome duplication in previous studies. In addition, we proposed a cross-sectional omics approach that utilizes open data from GTEx to assess the effect of whole-genome CNVs on gene expression. METHODS Affymetrix Genome-Wide SNP Array 6.0 was used to detect CNVs by PennCNV and CNV Workshop. After quality controls for population stratification, family relationship and CNV detection, 287 patients with narcolepsy, 133 patients with essential hypersomnia, 380 patients with panic disorders, 164 patients with autism, 784 patients with Alzheimer disease and 1280 healthy individuals remained for the enrichment analysis. RESULTS Overall, significant enrichment of dosage-sensitive genes was found across patients with narcolepsy, panic disorders and autism. In particular, significant enrichment of dosage-sensitive genes in duplications was observed across all diseases except for Alzheimer disease. For deletions, less or no enrichment of dosage-sensitive genes was seen in the patients when compared to the healthy individuals. Interestingly, significant enrichment of genes with expression sensitivity in brain was observed in patients with panic disorder and autism. While duplications presented a higher burden, deletions did not cause significant differences when compared to the healthy individuals. When we assessed the effect of sensitivity to genome dosage and gene expression at the same time, the highest ratio of enrichment was observed in the group including dosage-sensitive genes and genes with expression sensitivity only in brain. In addition, shared CNV regions among the five neuropsychiatric diseases were also investigated. CONCLUSIONS This study contributes evidence that dosage-sensitive genes are associated with CNVs among neuropsychiatric diseases. In addition, we utilized open data from GTEx to assess the effect of whole-genome CNVs on gene expression. We also investigated shared CNV regions among neuropsychiatric diseases.
Affiliation(s)
- Maria Yamasaki
- Department of Health Data Science Research, Healthy Aging Innovation Center, Tokyo Metropolitan Geriatric Medical Center, Tokyo, Japan
| | - Takashi Makino
- Laboratory of Evolutionary Genomics, Graduate School of Life Sciences, Tohoku University, Sendai, Japan
| | - Seik-Soon Khor
- Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan
| | - Hiromi Toyoda
- Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Taku Miyagawa
- Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Xiaoxi Liu
- RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Hitoshi Kuwabara
- Department of Psychiatry, Hamamatsu University School of Medicine, Hamamatsu, Japan
| | - Yukiko Kano
- Department of Child and Adolescent Psychiatry, Hamamatsu University School of Medicine, Shizuoka, Japan
- Department of Child Psychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Takafumi Shimada
- Division for Counseling and Support, The University of Tokyo, Tokyo, Japan
| | - Toshiro Sugiyama
- Department of Child and Adolescent Psychiatry, Hamamatsu University School of Medicine, Shizuoka, Japan
| | - Hisami Nishida
- Asunaro Hospital for Child and Adolescent Psychiatry, Mie, Japan
| | - Nagisa Sugaya
- Unit of Public Health and Preventive Medicine, School of Medicine, Yokohama City University, Kanagawa, Japan
| | - Mamoru Tochigi
- Department of Neuropsychiatry, Teikyo University Hospital, Tokyo, Japan
| | - Takeshi Otowa
- Department of Neuropsychiatry, NTT Medical Center Tokyo, Tokyo, Japan
| | - Yuji Okazaki
- Department of Psychiatry, Koseikai Michinoo Hospital, Nagasaki, Japan
| | - Hisanobu Kaiya
- Panic Disorder Research Center, Warakukai Med Corp, Tokyo, Japan
| | - Yoshiya Kawamura
- Department of Psychiatry, Shonan Kamakura General Hospital, Kanagawa, Japan
| | - Akinori Miyashita
- Department of Molecular Genetics, Bioresource Science Branch, Center for Bioresources, Brain Research Institute, Niigata University, Niigata, Japan
| | - Ryozo Kuwano
- Department of Molecular Genetics, Bioresource Science Branch, Center for Bioresources, Brain Research Institute, Niigata University, Niigata, Japan
- Asahigawaso Research Institute, Asahigawaso Medical-Welfare Center, Okayama, Japan
| | - Kiyoto Kasai
- Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Hisashi Tanii
- Center for Physical and Mental Health, Mie University, Tsu, Mie, Japan
| | - Tsukasa Sasaki
- Division of Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
| | - Makoto Honda
- Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Katsushi Tokunaga
- Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
|
19
|
Pérez-Palma E, May P, Iqbal S, Niestroj LM, Du J, Heyne HO, Castrillon JA, O'Donnell-Luria A, Nürnberg P, Palotie A, Daly M, Lal D. Identification of pathogenic variant enriched regions across genes and gene families. Genome Res 2020; 30:62-71. [PMID: 31871067 PMCID: PMC6961572 DOI: 10.1101/gr.252601.119] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/16/2019] [Accepted: 12/19/2019] [Indexed: 12/11/2022]
Abstract
Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10⁻¹¹). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10⁻¹⁶). All pathogenic variant enriched regions (PERs) identified are available online through "PER viewer," a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
Affiliation(s)
- Eduardo Pérez-Palma
- Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, University Luxembourg, L-4367 Esch-sur-Alzette, Luxembourg
| | - Sumaiya Iqbal
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Lisa-Marie Niestroj
- Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
| | - Juanjiangmeng Du
- Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
| | - Henrike O Heyne
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
| | | | - Anne O'Donnell-Luria
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
| | - Peter Nürnberg
- Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
| | - Aarno Palotie
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
| | - Mark Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
| | - Dennis Lal
- Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA
|
20
|
Dandage R, Landry CR. Paralog dependency indirectly affects the robustness of human cells. Mol Syst Biol 2019; 15:e8871. [PMID: 31556487 PMCID: PMC6757259 DOI: 10.15252/msb.20198871] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/18/2019] [Revised: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 12/19/2022] Open
Abstract
The protective redundancy of paralogous genes partly relies on the fact that they carry their functions independently. However, a significant fraction of paralogous proteins may form functionally dependent pairs, for instance, through heteromerization. As a consequence, one could expect these heteromeric paralogs to be less protective against deleterious mutations. To test this hypothesis, we examined the robustness landscape of gene loss-of-function by CRISPR-Cas9 in more than 450 human cell lines. This landscape shows regions of greater deleteriousness to gene inactivation as a function of key paralog properties. Heteromeric paralogs are more likely to occupy such regions owing to their high expression and large number of protein-protein interaction partners. Further investigation revealed that heteromers may also be under stricter dosage balance, which may also contribute to the higher deleteriousness upon gene inactivation. Finally, we suggest that physical dependency may contribute to the deleteriousness upon loss-of-function as revealed by the correlation between the strength of interactions between paralogs and their higher deleteriousness upon loss of function.
Affiliation(s)
- Rohan Dandage
- Département de Biologie, Université Laval, Québec, QC, Canada
- Département de Biochimie, Microbiologie et Bio‐Informatique, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- The Québec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université Laval, Québec, QC, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada
| | - Christian R Landry
- Département de Biologie, Université Laval, Québec, QC, Canada
- Département de Biochimie, Microbiologie et Bio‐Informatique, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- The Québec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université Laval, Québec, QC, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada
|
21
|
Barshir R, Hekselman I, Shemesh N, Sharon M, Novack L, Yeger-Lotem E. Role of duplicate genes in determining the tissue-selectivity of hereditary diseases. PLoS Genet 2018; 14:e1007327. [PMID: 29723191 PMCID: PMC5953478 DOI: 10.1371/journal.pgen.1007327] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 11/29/2017] [Revised: 05/15/2018] [Accepted: 03/21/2018] [Indexed: 11/18/2022] Open
Abstract
A longstanding puzzle in human genetics is what limits the clinical manifestation of hundreds of hereditary diseases to certain tissues, while their causal genes are expressed throughout the human body. A general conception is that tissue-selective disease phenotypes emerge when masking factors operate in unaffected tissues, but are specifically absent or insufficient in disease-manifesting tissues. Although this conception has critical impact on the understanding of disease manifestation, it was never challenged in a systematic manner across a variety of hereditary diseases and affected tissues. Here, we address this gap in our understanding via rigorous analysis of the susceptibility of over 30 tissues to 112 tissue-selective hereditary diseases. We focused on the roles of paralogs of causal genes, which are presumably capable of compensating for their aberration. We show for the first time at large-scale via quantitative analysis of omics datasets that, preferentially in the disease-manifesting tissues, paralogs are under-expressed relative to causal genes in more than half of the diseases. This was observed for several susceptible tissues and for causal genes with varying number of paralogs, suggesting that imbalanced expression of paralogs increases tissue susceptibility. While for many diseases this imbalance stemmed from up-regulation of the causal gene in the disease-manifesting tissue relative to other tissues, it was often combined with down-regulation of its paralog. Notably in roughly 20% of the cases, this imbalance stemmed only from significant down-regulation of the paralog. Thus, dosage relationships between paralogs appear as important, yet currently under-appreciated, modifiers of disease manifestation.
Affiliation(s)
- Ruth Barshir
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Idan Hekselman
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Netta Shemesh
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Moran Sharon
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Lena Novack
- Department of Public Health, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
22
|
Roux J, Liu J, Robinson-Rechavi M. Selective Constraints on Coding Sequences of Nervous System Genes Are a Major Determinant of Duplicate Gene Retention in Vertebrates. Mol Biol Evol 2018; 34:2773-2791. [PMID: 28981708 PMCID: PMC5850798 DOI: 10.1093/molbev/msx199] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/12/2022] Open
Abstract
The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation.
Affiliation(s)
- Julien Roux
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jialin Liu
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
23
|
Huang X, Liu H, Li X, Guan L, Li J, Tellier LCAM, Yang H, Wang J, Zhang J. Revealing Alzheimer's disease genes spectrum in the whole-genome by machine learning. BMC Neurol 2018; 18:5. [PMID: 29320986 PMCID: PMC5763548 DOI: 10.1186/s12883-017-1010-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/20/2017] [Accepted: 12/21/2017] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Alzheimer’s disease (AD) is an important, progressive neurodegenerative disease with a complex genetic architecture. A key goal of biomedical research is to seek out disease risk genes and to elucidate the function of these risk genes in the development of disease. For this purpose, expanding the AD-associated gene set is necessary. In past research, prediction methods for AD-related genes have been limited in their exploration of the target genome regions. Here we present a genome-wide method for predicting AD candidate genes. METHODS We present a machine learning approach (SVM), based upon integrating gene expression data with human brain-specific gene network data, to discover the full spectrum of AD genes across the whole genome. RESULTS We classified AD candidate genes with an accuracy of 84.56% and an area under the receiver operating characteristic (ROC) curve of 94%. Our approach supplements the spectrum of AD-associated genes, drawing on more than 20,000 genes at a genome-wide scale. CONCLUSIONS In this study, we have elucidated the whole-genome spectrum of AD using a machine learning approach. Through this method, we expect the candidate gene catalogue to provide a more comprehensive annotation of AD for researchers.
Affiliation(s)
- Xiaoyan Huang
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Hankui Liu
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Xinming Li
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Liping Guan
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Jiankang Li
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Laurent Christian Asker M Tellier
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China; Department of Biology, Bioinformatics, University of Copenhagen, Copenhagen, Denmark
- Huanming Yang
- BGI-Shenzhen, Shenzhen, 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou, 310058, China
- Jian Wang
- BGI-Shenzhen, Shenzhen, 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou, 310058, China
- Jianguo Zhang
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China; Shenzhen Key Lab of Neurogenomics, BGI-Shenzhen, Shenzhen, 518120, China
|
24
|
Chen WH, Lu G, Chen X, Zhao XM, Bork P. OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines. Nucleic Acids Res 2017; 45:D940-D944. [PMID: 27799467 PMCID: PMC5210522 DOI: 10.1093/nar/gkw1013] [Citation(s) in RCA: 106] [Impact Index Per Article: 13.3] [Received: 08/23/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 01/14/2023] Open
Abstract
OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info.
Affiliation(s)
- Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), 430074 Wuhan, Hubei, China
- Guanting Lu
- Department of Blood Transfusion, Tangdu Hospital, the Fourth Military Medical University, No 1, Xinsi Road, Chanba District, 710000 Xi'an, China
- Xiao Chen
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Xing-Ming Zhao
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Peer Bork
- European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69120 Heidelberg, Germany
- Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Straße 10, 13125 Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
|
25
|
Diekwisch TGH. Novel approaches toward managing the micromanagers: 'non-toxic' but effective. Gene Ther 2016; 23:697-698. [PMID: 27383252 DOI: 10.1038/gt.2016.49] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2016] [Accepted: 04/18/2016] [Indexed: 01/27/2023]
Affiliation(s)
- T G H Diekwisch
- Center for Craniofacial Research and Diagnosis, Texas A&M University Baylor College of Dentistry, Dallas, TX, USA
|
26
|
Hulsey CD, Fraser GJ, Meyer A. Biting into the Genome to Phenome Map: Developmental Genetic Modularity of Cichlid Fish Dentitions. Integr Comp Biol 2016; 56:373-88. [DOI: 10.1093/icb/icw059] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022] Open
|
27
|
The exploration of network motifs as potential drug targets from post-translational regulatory networks. Sci Rep 2016; 6:20558. [PMID: 26853265 PMCID: PMC4744934 DOI: 10.1038/srep20558] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/16/2015] [Accepted: 01/06/2016] [Indexed: 12/15/2022] Open
Abstract
Phosphorylation and proteolysis are among the most common post-translational modifications (PTMs) and play critical roles in various biological processes. More recent discoveries imply that crosstalk between these two PTMs is involved in many diseases. In this work, we construct a post-translational regulatory network (PTRN) consisting of phosphorylation and proteolysis processes, which enables us to investigate the regulatory interplay between these two PTMs. With the PTRN, we identify some functional network motifs that are significantly enriched with drug targets, some of which are further found to contain multiple proteins targeted by combinatorial drugs. These findings imply that the network motifs may be used to predict targets when designing new drugs. Inspired by this, we propose a novel computational approach called NetTar for predicting drug targets using the identified network motifs. Benchmarking results on real data indicate that our approach can be used for accurate prediction of novel proteins targeted by known drugs.
|
28
|
Zhu G, Wu A, Xu XJ, Xiao PP, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao XM. PPIM: A Protein-Protein Interaction Database for Maize. Plant Physiol 2016; 170:618-26. [PMID: 26620522 PMCID: PMC4734591 DOI: 10.1104/pp.15.01821] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 11/24/2015] [Accepted: 11/25/2015] [Indexed: 05/18/2023]
Abstract
Maize (Zea mays) is one of the most important crops worldwide. To understand the biological processes underlying various traits of the crop (e.g. yield and response to stress), a detailed protein-protein interaction (PPI) network is in high demand. Unfortunately, very few such PPIs are available in the literature. Therefore, in this work, we present the Protein-Protein Interaction Database for Maize (PPIM), which covers 2,762,560 interactions among 14,000 proteins. The PPIM contains not only accurately predicted PPIs but also molecular interactions collected from the literature. The database is freely available at http://comp-sysbio.org/ppim with a powerful, user-friendly interface. We believe that the PPIM resource can help biologists better understand the maize crop.
Affiliation(s)
- Guanghui Zhu
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Aibo Wu
- Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xin-Jian Xu
- Department of Mathematics, Shanghai University, Shanghai 200444, China
- Pei-Pei Xiao
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Le Lu
- Monsanto Company, St. Louis, Missouri 63167
- Jingdong Liu
- Monsanto Company, St. Louis, Missouri 63167
- Yongwei Cao
- Monsanto Company, St. Louis, Missouri 63167
- Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jun Wu
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Xing-Ming Zhao
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
|
29
|
Acharya D, Ghosh TC. Global analysis of human duplicated genes reveals the relative importance of whole-genome duplicates originated in the early vertebrate evolution. BMC Genomics 2016; 17:71. [PMID: 26801093 PMCID: PMC4724117 DOI: 10.1186/s12864-016-2392-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/28/2015] [Accepted: 01/13/2016] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Gene duplication is a genetic mutation that creates functionally redundant gene copies that are initially relieved from selective pressures and may adapt to new functions with time. The scale of gene duplication varies from small-scale duplication (SSD) to whole-genome duplication (WGD). Studies in yeast revealed ample differences between these duplicates: yeast WGD pairs were functionally more similar, less divergent in subcellular localization, and contained a smaller proportion of essential genes. In this study, we explored the differences in evolutionary genomic properties of human SSD and WGD genes, with the identifiable human WGD duplicates coming from the two rounds of whole-genome duplication that occurred early in vertebrate evolution. RESULTS We observed that these two groups of duplicates were also dissimilar in terms of their evolutionary and genomic properties. Interestingly, the pattern differs from that observed in yeast. The human WGDs were found to be functionally less similar, to diverge more in subcellular localization, and to contain a higher proportion of essential genes than the SSDs, all of which are opposite to yeast. Additionally, we found that human WGDs are more divergent in their gene expression profiles, have higher multifunctionality, are more often associated with disease, and are evolutionarily more conserved than human SSDs. CONCLUSIONS Our study suggests that human WGD duplicates are more divergent, which entails the adaptation of WGDs to novel and important functions that consequently leads to their conservation in the course of evolution.
Affiliation(s)
- Debarun Acharya
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India
- Tapash C Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India
|
30
|
Miura S, Tate S, Kumar S. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins. Evol Bioinform Online 2015; 11:245-51. [PMID: 26604664 PMCID: PMC4631161 DOI: 10.4137/ebo.s30594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2015] [Revised: 09/14/2015] [Accepted: 09/18/2015] [Indexed: 11/09/2022] Open
Abstract
Gene duplication enables the functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernable paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (ie, functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.
Affiliation(s)
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Stephanie Tate
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
31
|
Liu J, Zhao D, Fan R. Shared and unique mutational gene co-occurrences in cancers. Biochem Biophys Res Commun 2015; 465:777-83. [PMID: 26315265 DOI: 10.1016/j.bbrc.2015.08.086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/12/2015] [Accepted: 08/20/2015] [Indexed: 01/27/2023]
Abstract
Cancers are often associated with mutations in multiple genes; thus, studying the distributions of genes that harbor cancer-promoting mutations in cancer samples and their co-occurrences could provide insights into cancer diagnostics and treatment. Using data from the Catalogue of Somatic Mutations in Cancer (COSMIC), we found that mutated genes in cancer samples followed a power-law distribution. For instance, a few genes were mutated in a large number of samples (designated as high-frequent genes), while a large number of genes were only mutated in a few samples. This power-law distribution can be found in samples of all cancer types as well as individual cancers. In samples where two or more mutated genes are found, the high-frequent genes, i.e., those that were frequently mutated, often did not co-occur with other genes, while the other genes often tended to co-occur. Co-occurrences of mutated genes were often unique to a certain cancer; however, some co-occurrences were shared by multiple cancer types. Our results revealed distinct patterns of high-frequent genes and those that were less-frequently mutated in the cancer samples in co-occurring and anti-co-occurring networks. Our results indicated that distinct treatment strategies should be adopted for cancer patients with known high-frequent gene mutations and those without. The latter might be better treated with a combination of drugs targeting multiple genes. Our results also suggested that possible cross-cancer treatments, i.e., the use of the same drug combinations, may treat cancers of different histological origins.
Affiliation(s)
- Junqi Liu
- Department of Radiation Oncology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Department of Radiation Oncology, Universitätsmedizin Mannheim, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Di Zhao
- The Institute of Experimental and Clinical Pharmacology and Toxicology, Universitätsmedizin Mannheim, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Ruitai Fan
- Department of Radiation Oncology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
|
32
|
Buchner DA, Nadeau JH. Contrasting genetic architectures in different mouse reference populations used for studying complex traits. Genome Res 2015; 25:775-91. [PMID: 25953951 PMCID: PMC4448675 DOI: 10.1101/gr.187450.114] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 11/19/2014] [Accepted: 03/31/2015] [Indexed: 01/14/2023]
Abstract
Quantitative trait loci (QTLs) are being used to study genetic networks, protein functions, and systems properties that underlie phenotypic variation and disease risk in humans, model organisms, agricultural species, and natural populations. The challenges are many, beginning with the seemingly simple tasks of mapping QTLs and identifying their underlying genetic determinants. Various specialized resources have been developed to study complex traits in many model organisms. In the mouse, remarkably different pictures of genetic architectures are emerging. Chromosome Substitution Strains (CSSs) reveal many QTLs, large phenotypic effects, pervasive epistasis, and readily identified genetic variants. In contrast, other resources as well as genome-wide association studies (GWAS) in humans and other species reveal genetic architectures dominated with a relatively modest number of QTLs that have small individual and combined phenotypic effects. These contrasting architectures are the result of intrinsic differences in the study designs underlying different resources. The CSSs examine context-dependent phenotypic effects independently among individual genotypes, whereas with GWAS and other mouse resources, the average effect of each QTL is assessed among many individuals with heterogeneous genetic backgrounds. We argue that variation of genetic architectures among individuals is as important as population averages. Each of these important resources has particular merits and specific applications for these individual and population perspectives. Collectively, these resources together with high-throughput genotyping, sequencing and genetic engineering technologies, and information repositories highlight the power of the mouse for genetic, functional, and systems studies of complex traits and disease models.
Affiliation(s)
- David A Buchner
- Department of Genetics and Genome Sciences, Department of Biochemistry, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Joseph H Nadeau
- Pacific Northwest Diabetes Research Institute, Seattle, Washington 98122, USA
|
33
|
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 01/25/2015] [Accepted: 03/30/2015] [Indexed: 12/22/2022] Open
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Veterans Affairs Cooperative Studies Program Coordinating Center, West Haven, CT, USA
|
34
|
Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJY, van Karnebeek C, Wasserman WW. FLAGS, frequently mutated genes in public exomes. BMC Med Genomics 2014; 7:64. [PMID: 25466818 PMCID: PMC4267152 DOI: 10.1186/s12920-014-0064-y] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Received: 06/16/2014] [Accepted: 10/24/2014] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Dramatic improvements in DNA-sequencing technologies and computational analyses have led to wide use of whole exome sequencing (WES) to identify the genetic basis of Mendelian disorders. More than 180 novel rare-disease-causing genes with Mendelian inheritance patterns have been discovered through sequencing the exomes of just a few unrelated individuals or family members. As rare/novel genetic variants continue to be uncovered, there is a major challenge in distinguishing true pathogenic variants from rare benign mutations. METHODS We used publicly available exome cohorts, together with the dbSNP database, to derive a list of genes (n = 100) that most frequently exhibit rare (<1%) non-synonymous/splice-site variants in general populations. We termed these genes FLAGS for FrequentLy mutAted GeneS and analyzed their properties. RESULTS Analysis of FLAGS revealed that these genes have significantly longer protein coding sequences, a greater number of paralogs and display less evolutionarily selective pressure than expected. FLAGS are more frequently reported in PubMed clinical literature and more frequently associated with diseased phenotypes compared to the set of human protein-coding genes. We demonstrated an overlap between FLAGS and the rare-disease causing genes recently discovered through WES studies (n = 10) and the need for replication studies and rigorous statistical and biological analyses when associating FLAGS to rare disease. Finally, we showed how FLAGS are applied in disease-causing variant prioritization approach on exome data from a family affected by an unknown rare genetic disorder. CONCLUSIONS We showed that some genes are frequently affected by rare, likely functional variants in general population, and are frequently observed in WES studies analyzing diverse rare phenotypes. We found that the rate at which genes accumulate rare mutations is beneficial information for prioritizing candidates. 
We provided a ranking system based on the mutation accumulation rates for prioritizing exome-captured human genes, and propose that clinical reports associating any disease/phenotype to FLAGS be evaluated with extra caution.
Affiliation(s)
- Casper Shyr
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.
- Maja Tarailo-Graovac
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
- Michael Gottlieb
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada.
- Jessica J Y Lee
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Genome Science and Technology Graduate Program, University of British Columbia, Vancouver, BC, Canada.
- Clara van Karnebeek
  - Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Division of Biochemical Diseases, BC Children's Hospital, Vancouver, BC, Canada; Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada.
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
35
Singh PP, Affeldt S, Malaguti G, Isambert H. Human dominant disease genes are enriched in paralogs originating from whole genome duplication. PLoS Comput Biol 2014; 10:e1003754. [PMID: 25080083 PMCID: PMC4117431 DOI: 10.1371/journal.pcbi.1003754]
Affiliation(s)
- Séverine Affeldt
  - CNRS-UMR168, UPMC, Institut Curie, Research Center, Paris, France
- Giulia Malaguti
  - CNRS-UMR168, UPMC, Institut Curie, Research Center, Paris, France
- Hervé Isambert
  - CNRS-UMR168, UPMC, Institut Curie, Research Center, Paris, France
36
Chen WH, Zhao XM, van Noort V, Bork P. Comments on "Human dominant disease genes are enriched in paralogs originating from whole genome duplication". PLoS Comput Biol 2014; 10:e1003758. [PMID: 25077479 PMCID: PMC4117415 DOI: 10.1371/journal.pcbi.1003758]
Affiliation(s)
- Wei-Hua Chen
  - European Molecular Biology Laboratory (EMBL) Heidelberg, Heidelberg, Germany
- Xing-Ming Zhao
  - Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Vera van Noort
  - European Molecular Biology Laboratory (EMBL) Heidelberg, Heidelberg, Germany
- Peer Bork
  - European Molecular Biology Laboratory (EMBL) Heidelberg, Heidelberg, Germany
  - Max-Delbrück-Centrum für Molekulare Medizin (MDC), Berlin, Germany
37