1
O'Connor M, Qiao H, Odamah K, Cerdeira PC, Man HY. Heterozygous Nexmif female mice demonstrate mosaic NEXMIF expression, autism-like behaviors, and abnormalities in dendritic arborization and synaptogenesis. Heliyon 2024; 10:e24703. [PMID: 38322873 PMCID: PMC10844029 DOI: 10.1016/j.heliyon.2024.e24703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 11/28/2023] [Accepted: 01/12/2024] [Indexed: 02/08/2024]
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder with a strong genetic basis. ASDs are commonly characterized by impairments in language, restrictive and repetitive behaviors, and deficits in social interactions. Although ASD is a highly heterogeneous disease with many different genes implicated in its etiology, many ASD-associated genes converge on common cellular defects, such as aberrant neuronal morphology and synapse dysregulation. Our previous work revealed that, in mice, complete loss of the ASD-associated X-linked gene NEXMIF results in a reduction in dendritic complexity, a decrease in spine and synapse density, altered synaptic transmission, and ASD-like behaviors. Interestingly, human females with NEXMIF haploinsufficiency have recently been reported to demonstrate autistic features; however, the cellular and molecular basis for this haploinsufficiency-caused ASD remains unclear. Here we report that in the brains of Nexmif+/- female mice, NEXMIF shows a mosaic pattern of expression in neurons. Heterozygous female mice demonstrate behavioral impairments similar to those of knockout male mice. In the mosaic mixture of neurons from Nexmif+/- mice, cells that lack NEXMIF have impairments in dendritic arborization and spine development. Remarkably, the NEXMIF-expressing neurons from Nexmif+/- mice also demonstrate similar defects in dendritic growth and spine formation. These findings establish a novel mouse model of NEXMIF haploinsufficiency and provide new insights into the pathogenesis of NEXMIF-dependent ASD.
Affiliation(s)
- Margaret O'Connor
- Department of Biology, Boston University, 5 Cummington Mall, Boston, MA 02215, USA
- Hui Qiao
- Department of Biology, Boston University, 5 Cummington Mall, Boston, MA 02215, USA
- KathrynAnn Odamah
- Department of Biology, Boston University, 5 Cummington Mall, Boston, MA 02215, USA
- Heng-Ye Man
- Department of Biology, Boston University, 5 Cummington Mall, Boston, MA 02215, USA
- Department of Pharmacology, Physiology & Biophysics, Boston University School of Medicine, 72 East Concord St., Boston, MA 02118, USA
- Center for Systems Neuroscience, Boston University, 610 Commonwealth Ave, Boston, MA 02215, USA
2
Liu Z, Huang YF. Deep multiple-instance learning accurately predicts gene haploinsufficiency and deletion pathogenicity. bioRxiv 2023:2023.08.29.555384. [PMID: 37693607 PMCID: PMC10491176 DOI: 10.1101/2023.08.29.555384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Copy number losses (deletions) are a major contributor to the etiology of severe genetic disorders. Although haploinsufficient genes play a critical role in deletion pathogenicity, current methods for deletion pathogenicity prediction fail to integrate multiple lines of evidence for haploinsufficiency at the gene level, limiting their power to pinpoint deleterious deletions associated with genetic disorders. Here we introduce DosaCNV, a deep multiple-instance learning framework that, for the first time, models deletion pathogenicity jointly with gene haploinsufficiency. By integrating over 30 gene-level features potentially predictive of haploinsufficiency, DosaCNV shows unmatched performance in prioritizing pathogenic deletions associated with a broad spectrum of genetic disorders. Furthermore, DosaCNV outperforms existing methods in predicting gene haploinsufficiency even though it is not trained on known haploinsufficient genes. Finally, DosaCNV leverages a state-of-the-art technique to quantify the contributions of individual gene-level features to haploinsufficiency, allowing for human-understandable explanations of model predictions. Altogether, DosaCNV is a powerful computational tool for both fundamental and translational research.
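The multiple-instance framing in this abstract treats a deletion as a bag of genes whose individual haploinsufficiency is not observed directly. A minimal Python sketch of one common pooling rule, noisy-OR, follows. The abstract does not state which pooling DosaCNV uses, so the function name `noisy_or_pathogenicity` and the per-gene probabilities are assumptions made purely for illustration.

```python
def noisy_or_pathogenicity(gene_probs):
    """Pool per-gene haploinsufficiency probabilities into one deletion score.

    Hypothetical helper, not DosaCNV's published model: under noisy-OR, a
    deletion is pathogenic if at least one deleted gene is haploinsufficient,
    and genes are treated as independent.
    """
    prob_no_gene_is_hi = 1.0
    for p in gene_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        prob_no_gene_is_hi *= 1.0 - p
    return 1.0 - prob_no_gene_is_hi


# Illustrative (made-up) probabilities for a deletion spanning three genes.
deletion_score = noisy_or_pathogenicity([0.05, 0.10, 0.80])
```

Under this rule a single high-probability gene dominates the score, which matches the intuition that one haploinsufficient gene can be enough to make a deletion deleterious.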
Affiliation(s)
- Zhihan Liu
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Molecular, Cellular, and Integrative Biosciences Program, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
3
Alfieri F, Caravagna G, Schaefer MH. Cancer genomes tolerate deleterious coding mutations through somatic copy number amplifications of wild-type regions. Nat Commun 2023; 14:3594. [PMID: 37328455 PMCID: PMC10276008 DOI: 10.1038/s41467-023-39313-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 06/01/2023] [Indexed: 06/18/2023]
Abstract
Cancers evolve under the accumulation of thousands of somatic mutations and chromosomal aberrations. While most coding mutations are deleterious, almost all protein-coding genes lack detectable signals of negative selection. This raises the question of how tumors tolerate such large amounts of deleterious mutations. Using 8,690 tumor samples from The Cancer Genome Atlas, we demonstrate that copy number amplifications frequently cover haploinsufficient genes in mutation-prone regions. This could increase tolerance towards the deleterious impact of mutations by creating safe copies of wild-type regions and, hence, protecting the genes therein. Our findings demonstrate that these potential buffering events are highly influenced by gene functions, essentiality, and mutation impact and that they occur early during tumor evolution. We show how cancer type-specific mutation landscapes drive copy number alteration patterns across cancer types. Ultimately, our work paves the way for the detection of novel cancer vulnerabilities by revealing genes that fall within amplifications likely selected during evolution to mitigate the effect of mutations.
Affiliation(s)
- Fabio Alfieri
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, 20139, Italy
- Giulio Caravagna
- Department of Mathematics and Geosciences, University of Trieste, Trieste, 34127, Italy
- Martin H Schaefer
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, 20139, Italy.
4
Badonyi M, Marsh JA. Buffering of genetic dominance by allele-specific protein complex assembly. Sci Adv 2023; 9:eadf9845. [PMID: 37256959 PMCID: PMC10413657 DOI: 10.1126/sciadv.adf9845] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/02/2023] [Accepted: 04/24/2023] [Indexed: 06/02/2023]
Abstract
Protein complex assembly often occurs while subunits are being translated, resulting in complexes whose subunits were translated from the same mRNA in an allele-specific manner. It has thus been hypothesized that such cotranslational assembly may counter the assembly-mediated dominant-negative effect, whereby co-assembly of mutant and wild-type subunits "poisons" complex activity. Here, we show that cotranslationally assembling subunits are much less likely to be associated with autosomal dominant relative to recessive disorders, and that subunits with dominant-negative disease mutations are significantly depleted in cotranslational assembly compared to those associated with loss-of-function mutations. We also find that complexes with known dominant-negative effects tend to expose their interfaces late during translation, lessening the likelihood of cotranslational assembly. Finally, by combining complex properties with other features, we trained a computational model for predicting proteins likely to be associated with non-loss-of-function disease mechanisms, which we believe will be of considerable utility for protein variant interpretation.
Affiliation(s)
- Mihaly Badonyi
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
5
Lv K, Chen D, Xiong D, Tang H, Ou T, Kan L, Zhang X. dbCNV: deleteriousness-based model to predict pathogenicity of copy number variations. BMC Genomics 2023; 24:131. [PMID: 36941551 PMCID: PMC10029177 DOI: 10.1186/s12864-023-09225-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 03/06/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND: Copy number variation (CNV) is a type of structural variation in which a gain or loss event abnormally changes copy number. Methods to predict the pathogenicity of CNVs are required to understand the relationship between these variants and clinical phenotypes. ClassifyCNV, X-CNV, StrVCTVRE, and other tools have been trained to predict the pathogenicity of CNVs, but few studies have been reported based on the deleterious significance of features. RESULTS: From the single nucleotide polymorphism (SNP), gene and region dimensions, we collected 79 informative features that quantitatively describe the characteristics of a CNV, such as CNV length, the number of protein genes, and the number of three prime untranslated regions. Then, according to the deleterious significance, we formulated quantitative methods for features, which fall into two categories: the first is variable type, summarized by maximum, minimum and mean; the second is attribute type, which is measured by numerical sum. We used the Gradient Boosted Trees (GBT) algorithm to construct dbCNV, which can be used to predict pathogenicity for five-tier classification and binary classification of CNVs. We demonstrated that the distribution of most feature values was consistent with the deleterious significance. The five-tier classification model achieved accuracies of 0.85 and 0.79 for loss and gain CNVs, respectively, indicating high discriminative power in the five-tier classification of CNV pathogenicity. The binary model achieved area under curve (AUC) values of 0.96 and 0.81 in the validation set for gain and loss CNVs, respectively. CONCLUSION: The performance of dbCNV suggests that a functional deleteriousness-based model of CNVs is a promising approach to support classification prediction and to further understand the pathogenic mechanism.
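The two feature categories described in this abstract can be sketched in a few lines of Python: variable-type features are reduced to their maximum, minimum and mean over the elements a CNV overlaps, and attribute-type features are summed. This is only an illustration of that summarization step. The function name, the feature names (`gene_score`, `n_protein_genes`) and the zero fallback for empty inputs are assumptions, not taken from dbCNV.

```python
from statistics import mean


def summarize_cnv_features(variable_feats, attribute_feats):
    """Collapse per-element values into one flat feature vector per CNV.

    variable_feats:  {name: [values over overlapped elements]} -> max/min/mean
    attribute_feats: {name: [counts or flags]}                 -> sum
    All names are hypothetical; empty lists fall back to 0.0 (an assumption).
    """
    summary = {}
    for name, values in variable_feats.items():
        summary[f"{name}_max"] = max(values) if values else 0.0
        summary[f"{name}_min"] = min(values) if values else 0.0
        summary[f"{name}_mean"] = mean(values) if values else 0.0
    for name, values in attribute_feats.items():
        summary[f"{name}_sum"] = sum(values)
    return summary


# Made-up example: a CNV overlapping three genes.
example = summarize_cnv_features(
    {"gene_score": [0.1, 0.9, 0.5]},
    {"n_protein_genes": [1, 1, 1]},
)
```

The resulting dictionary is the kind of fixed-length input a gradient boosted tree classifier expects, regardless of how many genes a given CNV spans.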
Affiliation(s)
- Kangqi Lv
- Xinxiang Medical University, 453003, Xinxiang, China
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
- Dayang Chen
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
- Dan Xiong
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
- Huamei Tang
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
- Tong Ou
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
- Lijuan Kan
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China.
- Xiuming Zhang
- Xinxiang Medical University, 453003, Xinxiang, China
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
6
Gruhl F, Janich P, Kaessmann H, Gatfield D. Circular RNA repertoires are associated with evolutionarily young transposable elements. eLife 2021; 10:e67991. [PMID: 34542406 PMCID: PMC8516420 DOI: 10.7554/elife.67991] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/01/2021] [Accepted: 09/19/2021] [Indexed: 12/29/2022]
Abstract
Circular RNAs (circRNAs) are found across eukaryotes and can function in post-transcriptional gene regulation. Their biogenesis through a circle-forming backsplicing reaction is facilitated by reverse-complementary repetitive sequences promoting pre-mRNA folding. Orthologous genes from which circRNAs arise overall contain more strongly conserved splice sites and exons than other genes, yet it remains unclear to what extent this conservation reflects purifying selection acting on the circRNAs themselves. Our analyses of circRNA repertoires from five species representing three mammalian lineages (marsupials, eutherians: rodents, primates) reveal that surprisingly few circRNAs arise from orthologous exonic loci across all species. Even the circRNAs from orthologous loci are associated with young, recently active and species-specific transposable elements, rather than with common, ancient transposon integration events. These observations suggest that many circRNAs emerged convergently during evolution, as a byproduct of splicing in orthologs prone to transposon insertion. Overall, our findings argue against widespread functional circRNA conservation.
Affiliation(s)
- Franziska Gruhl
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Peggy Janich
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Krebsforschung Schweiz, Bern, Switzerland
- Henrik Kaessmann
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- David Gatfield
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
7
Zhang L, Shi J, Ouyang J, Zhang R, Tao Y, Yuan D, Lv C, Wang R, Ning B, Roberts R, Tong W, Liu Z, Shi T. X-CNV: genome-wide prediction of the pathogenicity of copy number variations. Genome Med 2021; 13:132. [PMID: 34407882 PMCID: PMC8375180 DOI: 10.1186/s13073-021-00945-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/18/2021] [Accepted: 07/30/2021] [Indexed: 01/04/2023]
Abstract
Background: Gene copy number variations (CNVs) contribute to genetic diversity and disease prevalence across populations. Substantial efforts have been made to decipher the relationship between CNVs and pathogenesis but with limited success. Results: We have developed a novel computational framework, X-CNV (www.unimd.org/XCNV), to predict the pathogenicity of CNVs by integrating more than 30 informative features such as allele frequency (AF), CNV length, CNV type, and some deleterious scores. Notably, over 14 million CNVs across various ethnic groups, covering nearly 93% of the human genome, were unified to calculate the AF. X-CNV, which yielded area under curve (AUC) values of 0.96 and 0.94 in training and validation sets, was demonstrated to outperform other available tools in terms of CNV pathogenicity prediction. A meta-voting prediction (MVP) score was developed to quantitatively measure the pathogenic effect, which is based on the probabilistic value generated from the XGBoost algorithm. The proposed MVP score demonstrated a high discriminative power in determining pathogenic CNVs for inherited traits/diseases in different ethnic groups. Conclusions: The ability of the X-CNV framework to quantitatively prioritize functional, deleterious, and disease-causing CNVs on a genome-wide basis outperformed current CNV-annotation tools and will have broad utility in population genetics, disease-association studies, and diagnostic screening. Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-021-00945-4.
Affiliation(s)
- Li Zhang
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China; School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai, 200062, China
- Jingru Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Jian Ouyang
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Riquan Zhang
- School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai, 200062, China
- Yiran Tao
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Dongsheng Yuan
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Chengkai Lv
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Ruiyuan Wang
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Baitang Ning
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, 72079, USA
- Ruth Roberts
- ApconiX Ltd, Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Weida Tong
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, 72079, USA.
- Zhichao Liu
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, 72079, USA.
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China; School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai, 200062, China; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University & Capital Medical University, Beijing, 100083, China.
8
Caldu-Primo JL, Verduzco-Martínez JA, Alvarez-Buylla ER, Davila-Velderrain J. In vivo and in vitro human gene essentiality estimations capture contrasting functional constraints. NAR Genom Bioinform 2021; 3:lqab063. [PMID: 34268495 PMCID: PMC8276763 DOI: 10.1093/nargab/lqab063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Revised: 06/18/2021] [Accepted: 07/07/2021] [Indexed: 11/28/2022]
Abstract
Gene essentiality estimation is a popular empirical approach to link genotypes to phenotypes. In humans, essentiality is estimated based on loss-of-function (LoF) mutation intolerance, either from population exome sequencing (in vivo) data or CRISPR-based in vitro perturbation experiments. Both approaches identify genes presumed to have detrimental consequences on the organism upon mutation. Are these genes constrained by having key cellular/organismal roles? Do in vivo and in vitro estimations equally recover these constraints? Insights into these questions have important implications in generalizing observations from cell models and interpreting disease risk genes. To empirically address these questions, we integrate genome-scale datasets and compare structural, functional and evolutionary features of essential genes versus genes with extremely high mutational tolerance. We found that essentiality estimates do recover functional constraints. However, the organismal or cellular context of estimation leads to functionally contrasting properties underlying the constraint. Our results suggest that depletion of LoF mutations in human populations effectively captures organismal-level functional constraints not experimentally accessible through CRISPR-based screens. Finally, we identify a set of genes (OrgEssential), which are mutationally intolerant in vivo but highly tolerant in vitro. These genes drive observed functional constraint differences and have an unexpected preference for nervous system expression.
Affiliation(s)
- Jose Luis Caldu-Primo
- Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, CDMX., 04510, México
- Jorge Armando Verduzco-Martínez
- Departamento de Biología Celular y Genética, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Nuevo León, 66400, México
- Elena R Alvarez-Buylla
- Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, CDMX., 04510, México
9
Yang Y, Li S, Wang Y, Ma Z, Wong KC, Li X. Identification of haploinsufficient genes from epigenomic data using deep forest. Brief Bioinform 2021; 22:6102676. [PMID: 33454736 DOI: 10.1093/bib/bbaa393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2020] [Revised: 11/29/2020] [Accepted: 12/01/2020] [Indexed: 11/14/2022]
Abstract
Haploinsufficiency, wherein a single allele is not enough to maintain normal functions, can lead to many diseases including cancers and neurodevelopmental disorders. Recently, computational methods for identifying haploinsufficiency have been developed. However, most of those computational methods suffer from study bias, experimental noise and instability, resulting in unsatisfactory identification of haploinsufficient genes. To address those challenges, we propose a deep forest model, called HaForest, to identify haploinsufficient genes. The multiscale scanning is proposed to extract local contextual representations from input features under Linear Discriminant Analysis. After that, the cascade forest structure is applied to obtain the concatenated features directly by integrating decision-tree-based forests. Meanwhile, to exploit the complex dependency structure among haploinsufficient genes, the LightGBM library is embedded into HaForest to reveal the highly expressive features. To validate the effectiveness of our method, we compared it to several computational methods and four deep learning algorithms on five epigenomic data sets. The results reveal that HaForest achieves superior performance over the other algorithms, demonstrating its unique and complementary performance in identifying haploinsufficient genes. The standalone tool is available at https://github.com/yangyn533/HaForest.
Affiliation(s)
- Yuning Yang
- School of Artificial Intelligence, Jilin University and School of Information Science and Technology, Northeast Normal University, China
- Shaochuan Li
- School of Information Science and Technology, Northeast Normal University, China
- Yunhe Wang
- School of Information Science and Technology, Northeast Normal University, China
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, China
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
10
Vihinen M. Functional effects of protein variants. Biochimie 2020; 180:104-120. [PMID: 33164889 DOI: 10.1016/j.biochi.2020.10.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/30/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/11/2022]
Abstract
Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden.
11
Boukas L, Bjornsson HT, Hansen KD. Promoter CpG Density Predicts Downstream Gene Loss-of-Function Intolerance. Am J Hum Genet 2020; 107:487-498. [PMID: 32800095 PMCID: PMC7477270 DOI: 10.1016/j.ajhg.2020.07.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/26/2020] [Accepted: 07/22/2020] [Indexed: 12/26/2022]
Abstract
The aggregation and joint analysis of large numbers of exome sequences has recently made it possible to derive estimates of intolerance to loss-of-function (LoF) variation for human genes. Here, we demonstrate strong and widespread coupling between genic LoF intolerance and promoter CpG density across the human genome. Genes downstream of the most CpG-rich promoters (top 10% CpG density) have a 67.2% probability of being highly LoF intolerant, using the LOEUF metric from gnomAD. This is in contrast to 7.4% of genes downstream of the most CpG-poor (bottom 10% CpG density) promoters. Combining promoter CpG density with exonic and promoter conservation explains 33.4% of the variation in LOEUF, and the contribution of CpG density exceeds the individual contributions of exonic and promoter conservation. We leverage this to train a simple and easily interpretable predictive model that outperforms other existing predictors and allows us to classify 1,760 genes, which are currently unascertained in gnomAD, as highly LoF intolerant or not. These predictions have the potential to aid in the interpretation of novel variants in the clinical setting. Moreover, our results reveal that high CpG density is not merely a generic feature of human promoters but is preferentially encountered at the promoters of the most selectively constrained genes, calling into question the prevailing view that CpG islands are not subject to selection.
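The abstract does not say how promoter CpG density is computed. One widely used measure is the observed-to-expected CpG ratio, (number of CpG dinucleotides × sequence length) / (number of C × number of G). The Python sketch below implements that ratio under the assumption that a promoter is supplied as a plain DNA string; the function name `cpg_obs_exp` is hypothetical, and the paper's own definition may differ.

```python
def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio of a DNA sequence.

    Illustrative only; this is a standard CpG-island style measure and is
    not confirmed to be the density definition used in the study.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG possible; avoids division by zero
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


# Made-up toy sequences, far shorter than a real promoter.
rich = cpg_obs_exp("ACGCGTCGCG")
poor = cpg_obs_exp("ACAGTCAGTG")
```

A ratio near or above 1 indicates CpG dinucleotides occur about as often as base composition predicts, while most of the mammalian genome sits well below that because methylated CpGs are depleted over time.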
Affiliation(s)
- Leandros Boukas
- Human Genetics Training Program, Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA; Department of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA
- Hans T Bjornsson
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA; Department of Pediatrics, Johns Hopkins University School of Medicine, 1800 Orleans Street, Baltimore, MD 21287, USA; Faculty of Medicine, University of Iceland, Sturlugata 8, 101 Reykjavik, Iceland; Landspitali University Hospital, Hringbraut, 101 Reykjavik, Iceland.
- Kasper D Hansen
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, Baltimore, MD 21205, USA.
12
Cody JD. The Consequences of Abnormal Gene Dosage: Lessons from Chromosome 18. Trends Genet 2020; 36:764-776. [PMID: 32660784 DOI: 10.1016/j.tig.2020.06.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/15/2020] [Revised: 06/17/2020] [Accepted: 06/18/2020] [Indexed: 12/18/2022]
Abstract
Accurate interpretation of genomic copy number variation (CNV) remains a challenge and has important consequences for both congenital and late-onset disease. Hemizygosity dosage characterization of the genes on chromosome 18 reveals a spectrum of outcomes ranging from no clinical effect, to risk factors for disease, to both low- and high-penetrance disease. These data are important for accurate and predictive clinical management. Additionally, the potential mechanisms of reduced penetrance due to dosage compensation are discussed as a key to understanding avenues for potential treatment. This review describes the chromosome 18 findings, and discusses the molecular mechanisms that allow haploinsufficiency, reduced penetrance, and dosage compensation.
Affiliation(s)
- Jannine DeMars Cody
- Department of Pediatrics, University of Texas Health San Antonio, San Antonio, TX 78229, USA; Chromosome 18 Registry and Research Society, San Antonio, TX 78229, USA.
13
Pei J, Kinch LN, Otwinowski Z, Grishin NV. Mutation severity spectrum of rare alleles in the human genome is predictive of disease type. PLoS Comput Biol 2020; 16:e1007775. [PMID: 32413045 PMCID: PMC7255613 DOI: 10.1371/journal.pcbi.1007775] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/23/2019] [Revised: 05/28/2020] [Accepted: 03/06/2020] [Indexed: 12/19/2022]
Abstract
The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has similar predictive power as some of the best available. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are found by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Zbyszek Otwinowski
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
14
Shindyapina AV, Zenin AA, Tarkhov AE, Santesmasses D, Fedichev PO, Gladyshev VN. Germline burden of rare damaging variants negatively affects human healthspan and lifespan. eLife 2020; 9:e53449. [PMID: 32254024 PMCID: PMC7314550 DOI: 10.7554/elife.53449] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/08/2019] [Accepted: 03/20/2020] [Indexed: 12/12/2022] Open
Abstract
Heritability of human lifespan is 23-33% as evident from twin studies. Genome-wide association studies explored this question by linking particular alleles to lifespan traits. However, genetic variants identified so far can explain only a small fraction of lifespan heritability in humans. Here, we report that the burden of rarest protein-truncating variants (PTVs) in two large cohorts is negatively associated with human healthspan and lifespan, accounting for 0.4 and 1.3 years of their variability, respectively. In addition, longer-living individuals possess both fewer rarest PTVs and less damaging PTVs. We further estimated that somatic accumulation of PTVs accounts for only a small fraction of mortality and morbidity acceleration and hence is unlikely to be causal in aging. We conclude that rare damaging mutations, both inherited and accumulated throughout life, contribute to the aging process, and that burden of ultra-rare variants in combination with common alleles better explain apparent heritability of human lifespan.
Affiliation(s)
- Aleksandr A Zenin
- Gero LLC, Moscow, Russian Federation
- The Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russian Federation
- Andrei E Tarkhov
- Gero LLC, Moscow, Russian Federation
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russian Federation
- Peter O Fedichev
- Gero LLC, Moscow, Russian Federation
- Moscow Institute of Physics and Technology, Moscow, Russian Federation
- Vadim N Gladyshev
- Brigham and Women’s Hospital, Harvard Medical School, Boston, United States
15
Genetic diagnosis of autoinflammatory disease patients using clinical exome sequencing. Eur J Med Genet 2020; 63:103920. [PMID: 32222431 DOI: 10.1016/j.ejmg.2020.103920] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/04/2019] [Revised: 02/13/2020] [Accepted: 03/21/2020] [Indexed: 11/22/2022]
Abstract
Autoinflammatory diseases comprise a wide range of syndromes caused by dysregulation of the innate immune response. They are difficult to diagnose due to their phenotypic heterogeneity and variable expressivity. Thus, the genetic origin of the disease remains undetermined for an important proportion of patients. We aim to identify causal genetic variants in patients with suspected autoinflammatory disease and to test the advantages and limitations of the clinical exome gene panels for molecular diagnosis. Twenty-two unrelated patients with clinical features of autoinflammatory diseases were analyzed using clinical exome sequencing (~4800 genes), followed by bioinformatic analyses to detect likely pathogenic variants. By integrating genetic and clinical information, we found a likely causative heterozygous genetic variant in NFKBIA (p.D31N) in a North-African patient with a clinical picture resembling the deficiency of interleukin-1 receptor antagonist, and a heterozygous variant in DNASE2 (p.G322D) in a Spanish patient with a suspected lupus-like monogenic disorder. We also found variants likely to increase the susceptibility to autoinflammatory diseases in three additional Spanish patients: one with an initial diagnosis of juvenile idiopathic arthritis who carries two heterozygous UNC13D variants (p.R727Q and p.A59T), and two with early-onset inflammatory bowel disease harbouring NOD2 variants (p.L221R and p.A728V respectively). Our results show a similar proportion of molecular diagnosis to other studies using whole exome or targeted resequencing in primary immunodeficiencies. Thus, despite its main limitation of not including all candidate genes, clinical exome targeted sequencing can be an appropriate approach to detect likely causative variants in autoinflammatory diseases.
16
Alyousfi D, Baralle D, Collins A. Essentiality-specific pathogenicity prioritization gene score to improve filtering of disease sequence data. Brief Bioinform 2020; 22:1782-1789. [PMID: 32186701 DOI: 10.1093/bib/bbaa029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/07/2019] [Revised: 02/17/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022] Open
Abstract
The causal genetic variants underlying more than 50% of single gene (monogenic) disorders are yet to be discovered. Many patients with conditions likely to have a monogenic basis do not receive a confirmed molecular diagnosis which has potential impacts on clinical management. We have developed a gene-specific score, essentiality-specific pathogenicity prioritization (ESPP), to guide the recognition of genes likely to underlie monogenic disease variation to assist in filtering of genome sequence data. When a patient genome is sequenced, there are frequently several plausibly pathogenic variants identified in different genes. Recognition of the single gene most likely to include pathogenic variation can guide the identification of a causal variant. The ESPP score integrates gene-level scores which are broadly related to gene essentiality. Previous work towards the recognition of monogenic disease genes proposed a model with increasing gene essentiality from 'non-essential' to 'essential' genes (for which pathogenic variation may be incompatible with survival) with genes liable to contain disease variation positioned between these two extremes. We demonstrate that the ESPP score is useful for recognizing genes with high potential for pathogenic disease-related variation. Genes classed as essential have particularly high scores, as do genes recently recognized as strong candidates for developmental disorders. Through the integration of individual gene-specific scores, which have different properties and assumptions, we demonstrate the utility of an essentiality-based gene score to improve sequence genome filtering.
17
GeVIR is a continuous gene-level metric that uses variant distribution patterns to prioritize disease candidate genes. Nat Genet 2019; 52:35-39. [PMID: 31873297 DOI: 10.1038/s41588-019-0560-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/29/2019] [Accepted: 11/22/2019] [Indexed: 01/08/2023]
Abstract
With large-scale population sequencing projects gathering pace, there is a need for strategies that advance disease gene prioritization [1,2]. Metrics that provide information about a gene and its ability to tolerate protein-altering variation can aid in clinical interpretation of human genomes and can advance disease gene discovery [1-4]. Previously reported methods analyzed the total variant load in a gene [1-4], but did not analyze the distribution pattern of variants within a gene. Using data from 138,632 exome and genome sequences [2], we developed gene variation intolerance rank (GeVIR), a continuous gene-level metric for 19,361 genes that is able to prioritize both dominant and recessive Mendelian disease genes [5], that outperforms missense constraint metrics [3] and that is comparable, but complementary, to loss-of-function (LOF) constraint metrics [2]. GeVIR is also able to prioritize short genes, for which LOF constraint cannot be estimated with confidence [2]. The majority of the most intolerant genes identified here have no defined phenotype and are candidates for severe dominant disorders.
18
Courel M, Clément Y, Bossevain C, Foretek D, Vidal Cruchez O, Yi Z, Bénard M, Benassy MN, Kress M, Vindry C, Ernoult-Lange M, Antoniewski C, Morillon A, Brest P, Hubstenberger A, Roest Crollius H, Standart N, Weil D. GC content shapes mRNA storage and decay in human cells. eLife 2019; 8:49708. [PMID: 31855182 PMCID: PMC6944446 DOI: 10.7554/elife.49708] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 06/26/2019] [Accepted: 12/18/2019] [Indexed: 02/07/2023] Open
Abstract
mRNA translation and decay appear often intimately linked although the rules of this interplay are poorly understood. In this study, we combined our recent P-body transcriptome with transcriptomes obtained following silencing of broadly acting mRNA decay and repression factors, and with available CLIP and related data. This revealed the central role of GC content in mRNA fate, in terms of P-body localization, mRNA translation and mRNA stability: P-bodies contain mostly AU-rich mRNAs, which have a particular codon usage associated with a low protein yield; AU-rich and GC-rich transcripts tend to follow distinct decay pathways; and the targets of sequence-specific RBPs and miRNAs are also biased in terms of GC content. Altogether, these results suggest an integrated view of post-transcriptional control in human cells where most translation regulation is dedicated to inefficiently translated AU-rich mRNAs, whereas control at the level of 5’ decay applies to optimally translated GC-rich mRNAs.
Affiliation(s)
- Maïté Courel
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Yves Clément
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, France
- Clémentine Bossevain
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Dominika Foretek
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL Research University, CNRS UMR 3244, Sorbonne Université, Paris, France
- Zhou Yi
- Université Côte d'Azur, CNRS, INSERM, iBV, Nice, France
- Marianne Bénard
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Marie-Noëlle Benassy
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Michel Kress
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Caroline Vindry
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Michèle Ernoult-Lange
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
- Christophe Antoniewski
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), ARTbio Bioinformatics Analysis Facility, Paris, France
- Antonin Morillon
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL Research University, CNRS UMR 3244, Sorbonne Université, Paris, France
- Patrick Brest
- Université Côte d'Azur, CNRS, INSERM, IRCAN, FHU-OncoAge, Nice, France
- Nancy Standart
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Dominique Weil
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine (IBPS), Laboratoire de Biologie du Développement, Paris, France
19
Holt JM, Wilk B, Birch CL, Brown DM, Gajapathy M, Moss AC, Sosonkina N, Wilk MA, Anderson JA, Harris JM, Kelly JM, Shaterferdosian F, Uno-Antonison AE, Weborg A, Worthey EA. VarSight: prioritizing clinically reported variants with binary classification algorithms. BMC Bioinformatics 2019; 20:496. [PMID: 31615419 PMCID: PMC6792253 DOI: 10.1186/s12859-019-3026-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/11/2019] [Accepted: 08/12/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient's phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance. METHODS: We tested the application of classification algorithms that ingest variant annotations along with phenotype information for predicting whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network. RESULTS: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. We showed that the trained classifiers outperformed all other tested methods with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20. CONCLUSIONS: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.
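The headline figure in this abstract (the share of reported variants ranked in the top 20) is a top-k recall. The following is a minimal sketch of how such a figure could be computed, assuming each case is a ranked list of variant identifiers together with the set of variants that were clinically reported. The function name, the data layout, the pooling over all cases, and the toy identifiers are illustrative assumptions and are not taken from VarSight.

```python
from typing import Iterable, Sequence, Set, Tuple


def top_k_recall(cases: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 20) -> float:
    """Fraction of reported variants that fall within the top k of their case's ranking.

    Counts are pooled over all cases (an assumption; a per-case average is
    another reasonable choice and can give a different number).
    """
    found = 0
    total = 0
    for ranking, reported in cases:
        top = set(ranking[:k])          # identifiers ranked 1..k for this case
        found += len(reported & top)    # reported variants that made the cut
        total += len(reported)
    if total == 0:
        raise ValueError("no reported variants to evaluate")
    return found / total


# Toy data with hypothetical identifiers: (ranked variants, reported variants)
toy_cases = [
    (["v1", "v2", "v3", "v4"], {"v2"}),
    (["a", "b", "c", "d"], {"a", "d"}),
]
```

With the toy data, `top_k_recall(toy_cases, k=2)` counts two of the three reported variants, because `d` is ranked fourth in its case.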
Affiliation(s)
- James M. Holt
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Brandon Wilk
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Camille L. Birch
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Donna M. Brown
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Manavalan Gajapathy
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Alexander C. Moss
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Nadiya Sosonkina
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- University of Alabama at Birmingham, Department of Genetics, 720 20th Street South, Birmingham, 35294 USA
- Melissa A. Wilk
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Julie A. Anderson
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Jeremy M. Harris
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Jacob M. Kelly
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Fariba Shaterferdosian
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Angelina E. Uno-Antonison
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Arthur Weborg
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
- Elizabeth A. Worthey
- HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806 USA
20
Johnson AF, Nguyen HT, Veitia RA. Causes and effects of haploinsufficiency. Biol Rev Camb Philos Soc 2019; 94:1774-1785. [DOI: 10.1111/brv.12527] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/24/2019] [Revised: 05/08/2019] [Accepted: 05/10/2019] [Indexed: 12/14/2022]
Affiliation(s)
- Adam F. Johnson
- Institute of Research and Development, Duy Tan University, Da Nang, 550000 Vietnam
- Ha T. Nguyen
- Institute of Research and Development, Duy Tan University, Da Nang, 550000 Vietnam
21
Abstract
Haploinsufficiency describes the decrease in organismal fitness observed when a single copy of a gene is deleted in diploids. We investigated the origin of haploinsufficiency by creating a comprehensive dosage sensitivity data set for genes under their native promoters. We demonstrate that the expression of haploinsufficient genes is limited by the toxicity of their overexpression. We further show that the fitness penalty associated with excess gene copy number is not the only determinant of haploinsufficiency. Haploinsufficient genes represent a unique subset of genes sensitive to copy number increases, as they are also limiting for important cellular processes when present in one copy instead of two. The selective pressure to decrease gene expression due to the toxicity of overexpression, combined with the pressure to increase expression due to their fitness-limiting nature, has made haploinsufficient genes extremely sensitive to changes in gene expression. As a consequence, haploinsufficient genes are dosage stabilized, showing much more narrow ranges in cell-to-cell variability of expression compared with other genes in the genome. We propose a dosage-stabilizing hypothesis of haploinsufficiency to explain its persistence over evolutionary time.
22
Measuring intolerance to mutation in human genetics. Nat Genet 2019; 51:772-776. [PMID: 30962618 DOI: 10.1038/s41588-019-0383-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 07/31/2018] [Accepted: 02/22/2019] [Indexed: 01/07/2023]
Abstract
In numerous applications, from working with animal models to mapping the genetic basis of human disease susceptibility, knowing whether a single disrupting mutation in a gene is likely to be deleterious is useful. With this goal in mind, a number of measures have been developed to identify genes in which protein-truncating variants (PTVs), or other types of mutations, are absent or kept at very low frequency in large population samples: genes that appear 'intolerant' to mutation. One measure in particular, the probability of being loss-of-function intolerant (pLI), has been widely adopted. This measure was designed to classify genes into three categories, null, recessive and haploinsufficient, on the basis of the contrast between observed and expected numbers of PTVs. Such population-genetic approaches can be useful in many applications. As we clarify, however, they reflect the strength of selection acting on heterozygotes and not dominance or haploinsufficiency.
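The "contrast between observed and expected numbers of PTVs" can be illustrated with a plain observed/expected ratio. This is a sketch only and is not the pLI model itself, which the abstract describes as a three-category classification. The function names, the 0.35 cut-off, and the example counts are hypothetical values chosen for illustration.

```python
def obs_exp_ratio(observed: int, expected: float) -> float:
    """Ratio of observed to expected PTV counts for one gene."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed < 0:
        raise ValueError("observed count cannot be negative")
    return observed / expected


def looks_depleted(observed: int, expected: float, threshold: float = 0.35) -> bool:
    """Flag a gene whose PTV count is well below expectation.

    The threshold is an illustrative assumption, not a value from the source.
    """
    return obs_exp_ratio(observed, expected) < threshold


# Hypothetical genes: (observed PTVs, expected PTVs)
gene_a = (2, 20.0)    # far fewer PTVs than expected
gene_b = (18, 20.0)   # close to expectation
```

As the abstract cautions, depletion of this kind reflects selection acting on heterozygotes and does not by itself establish dominance or haploinsufficiency.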
23
Abstract
Inherited retinal degeneration (IRD), a group of rare retinal diseases that primarily lead to the progressive loss of retinal photoreceptor cells, can be inherited in all modes of inheritance: autosomal dominant (AD), autosomal recessive (AR), X-linked (XL), and mitochondrial. Based on the pattern of inheritance of the dystrophy, retinal gene therapy has 2 main strategies. AR, XL, and AD IRDs with haploinsufficiency can be treated by inserting a functional copy of the gene using either viral or nonviral vectors (gene augmentation). Different types of viral vectors and nonviral vectors are used to transfer plasmid DNA both in vitro and in vivo. AD IRDs with gain-of-function mutations or dominant-negative mutations can be treated by disrupting the mutant allele with (and occasionally without) gene augmentation. This review article aims to provide an overview of ocular gene therapy for treating IRDs using gene augmentation with viral or nonviral vectors or gene disruption through different gene-editing tools, especially with the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system.
Affiliation(s)
- Amirmohsen Arbabi
- Department of Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
- Amelia Liu
- Department of Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
- Hossein Ameri
- Department of Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
24
Abstract
Variably expressive copy-number variants (CNVs) are characterized by extensive phenotypic heterogeneity of neuropsychiatric phenotypes. Approaches to identify single causative genes for these phenotypes within each CNV have not been successful. Here, we posit using multiple lines of evidence, including pathogenicity metrics, functional assays of model organisms, and gene expression data, that multiple genes within each CNV region are likely responsible for the observed phenotypes. We propose that candidate genes within each region likely interact with each other through shared pathways to modulate the individual gene phenotypes, emphasizing the genetic complexity of CNV-associated neuropsychiatric features.
Affiliation(s)
- Matthew Jensen
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Bioinformatics and Genomics Program, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Bioinformatics and Genomics Program, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania, United States of America
25
Alyousfi D, Baralle D, Collins A. Gene-specific metrics to facilitate identification of disease genes for molecular diagnosis in patient genomes: a systematic review. Brief Funct Genomics 2018; 18:23-29. [DOI: 10.1093/bfgp/ely033] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/27/2018] [Revised: 08/30/2018] [Accepted: 09/20/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Dareen Alyousfi
- Genetic Epidemiology and Bioinformatics Research Group, Human Development and Health, Faculty of Medicine, University of Southampton, UK
- Diana Baralle
- Human Development and Health, Faculty of Medicine, University of Southampton, UK
- Wessex Clinical Genetics Service, Princess Anne Hospital, Southampton, UK
- Andrew Collins
- Genetic Epidemiology and Bioinformatics Research Group, Human Development and Health, Faculty of Medicine, University of Southampton, UK
26
Han X, Chen S, Flynn E, Wu S, Wintner D, Shen Y. Distinct epigenomic patterns are associated with haploinsufficiency and predict risk genes of developmental disorders. Nat Commun 2018; 9:2138. [PMID: 29849042 PMCID: PMC5976622 DOI: 10.1038/s41467-018-04552-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/07/2017] [Accepted: 05/08/2018] [Indexed: 12/21/2022] Open
Abstract
Haploinsufficiency is a major mechanism of genetic risk in developmental disorders. Accurate prediction of haploinsufficient genes is essential for prioritizing and interpreting deleterious variants in genetic studies. Current methods based on mutation intolerance in population data suffer from inadequate power for genes with short transcripts. Here we show haploinsufficiency is strongly associated with epigenomic patterns, and develop a computational method (Episcore) to predict haploinsufficiency leveraging epigenomic data from a broad range of tissue and cell types by machine learning methods. Based on data from recent exome sequencing studies on developmental disorders, Episcore achieves better performance in prioritizing likely-gene-disrupting (LGD) de novo variants than current methods. We further show that Episcore is less-biased by gene size, and complementary to mutation intolerance metrics for prioritizing LGD variants. Our approach enables new applications of epigenomic data and facilitates discovery and interpretation of novel risk variants implicated in developmental disorders. Predicting haploinsufficient genes helps to understand the genetic risk underlying developmental disorders. Here, the authors develop a Random Forest-based method that uses epigenomic data to predict haploinsufficiency, Episcore, which is complementary to methods based on mutation intolerance scores.
Affiliation(s)
- Xinwei Han
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA; Department of Pediatrics, Columbia University, New York, NY, 10032, USA; Constellation Pharmaceuticals, 215 First Street, Cambridge, MA, 02142, USA
- Siying Chen
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA; The Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University, New York, NY, 10032, USA
- Elise Flynn
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA; The Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University, New York, NY, 10032, USA
- Shuang Wu
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
- Dana Wintner
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA; Department of Biomedical Informatics, Columbia University, New York, NY, 10032, USA; JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, 10032, USA
27
Ohnuki S, Ohya Y. High-dimensional single-cell phenotyping reveals extensive haploinsufficiency. PLoS Biol 2018; 16:e2005130. [PMID: 29768403 PMCID: PMC5955526 DOI: 10.1371/journal.pbio.2005130] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 09/29/2017] [Accepted: 04/06/2018] [Indexed: 12/17/2022] Open
Abstract
Haploinsufficiency, a dominant phenotype caused by a heterozygous loss-of-function mutation, has been rarely observed. However, high-dimensional single-cell phenotyping of yeast morphological characteristics revealed haploinsufficiency phenotypes for more than half of 1,112 essential genes under optimal growth conditions. Additionally, 40% of the essential genes with no obvious phenotype under optimal growth conditions displayed haploinsufficiency under severe growth conditions. Haploinsufficiency was detected more frequently in essential genes than in nonessential genes. Similar haploinsufficiency phenotypes were observed mostly in mutants with heterozygous deletion of functionally related genes, suggesting that haploinsufficiency phenotypes were caused by functional defects of the genes. A global view of the gene network was presented based on the similarities of the haploinsufficiency phenotypes. Our dataset contains rich information regarding essential gene functions, providing evidence that single-cell phenotyping is a powerful approach, even in the heterozygous condition, for analyzing complex biological systems.
Diploid organisms harboring a wild-type gene and a loss-of-function mutation are called heterozygotes. They are expected to have weak or no individual phenotypes because the mutation is compensated for by the intact allele. The dominant inheritance of phenotypes in heterozygotes is an exceptional phenomenon called haploinsufficiency. Haploinsufficiency was thought to be a rare occurrence; however, a sensitive technique called high-dimensional single-cell phenotyping challenges this perspective. Investigations of single-cell phenotypes revealed that a large extent of the essential genes in yeast exhibit haploinsufficiency. Our analyses also provided crucial information on gene functional networks based on haploinsufficiency phenotypes. This work shows that high-dimensional single-cell phenotyping is a useful tool that can be used to better understand complex biological systems.
Affiliation(s)
- Shinsuke Ohnuki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan
- Yoshikazu Ohya
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan
- AIST-UTokyo Advanced Operando-Measurement Technology Open Innovation Laboratory (OPERANDO-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Kashiwa, Chiba, Japan
28
Shihab HA, Rogers MF, Campbell C, Gaunt TR. HIPred: an integrative approach to predicting haploinsufficient genes. Bioinformatics 2018; 33:1751-1757. [PMID: 28137713 PMCID: PMC5581952 DOI: 10.1093/bioinformatics/btx028] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/01/2016] [Accepted: 01/19/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1000 genomes project, NHLBI Exome Sequencing Project and the Exome Aggregation Consortium creates an urgent need for unbiased haploinsufficiency prediction methods. Results: Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL, with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency, without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases, outperforms existing biased algorithms. Availability and Implementation: HIPred scores for all gene identifiers are available at: https://github.com/HAShihab/HIPred. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hashem A Shihab
  - MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, UK
- Mark F Rogers
  - Intelligent Systems Laboratory, University of Bristol, Bristol, UK
- Colin Campbell
  - Intelligent Systems Laboratory, University of Bristol, Bristol, UK
- Tom R Gaunt
  - MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, UK
|
29
|
Popadin K, Peischl S, Garieri M, Sailani MR, Letourneau A, Santoni F, Lukowski SW, Bazykin GA, Nikolaev S, Meyer D, Excoffier L, Reymond A, Antonarakis SE. Slightly deleterious genomic variants and transcriptome perturbations in Down syndrome embryonic selection. Genome Res 2017; 28:1-10. [PMID: 29237728 PMCID: PMC5749173 DOI: 10.1101/gr.228411.117] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/29/2017] [Accepted: 11/20/2017] [Indexed: 12/13/2022]
Abstract
The majority of aneuploid fetuses are spontaneously miscarried. Nevertheless, some aneuploid individuals survive despite the strong genetic insult. Here, we investigate if the survival probability of aneuploid fetuses is affected by the genome-wide burden of slightly deleterious variants. We analyzed two cohorts of live-born Down syndrome individuals (388 genotyped samples and 16 fibroblast transcriptomes) and observed a deficit of slightly deleterious variants on Chromosome 21 and decreased transcriptome-wide variation in the expression level of highly constrained genes. We interpret these results as signatures of embryonic selection, and propose a genetic handicap model whereby an individual bearing an extremely severe deleterious variant (such as aneuploidy) could escape embryonic lethality if the genome-wide burden of slightly deleterious variants is sufficiently low. This approach can be used to study the composition and effect of the numerous slightly deleterious variants in humans and model organisms.
Affiliation(s)
- Konstantin Popadin
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
  - Center for Integrative Genomics, University of Lausanne, CH-1015 Lausanne, Switzerland
  - Immanuel Kant Baltic Federal University, Kaliningrad, 236041, Russia
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Stephan Peischl
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Interfaculty Bioinformatics Unit, University of Bern, 3012 Bern, Switzerland
- Marco Garieri
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- M Reza Sailani
  - Stanford School of Medicine, Stanford University, Stanford, California 94305, USA
- Audrey Letourneau
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Federico Santoni
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Samuel W Lukowski
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Georgii A Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, 127051, Russia
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Skolkovo, 143026, Russia
- Sergey Nikolaev
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Diogo Meyer
  - Department of Genetics and Evolutionary Biology, University of Sao Paulo, 05508-090, Sao Paulo, Brazil
- Laurent Excoffier
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Institute for Ecology and Evolution, University of Bern, CH-3012 Bern, Switzerland
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, CH-1015 Lausanne, Switzerland
- Stylianos E Antonarakis
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
|
31
|
Elurbe DM, Paranjpe SS, Georgiou G, van Kruijsbergen I, Bogdanovic O, Gibeaux R, Heald R, Lister R, Huynen MA, van Heeringen SJ, Veenstra GJC. Regulatory remodeling in the allo-tetraploid frog Xenopus laevis. Genome Biol 2017; 18:198. [PMID: 29065907 PMCID: PMC5655803 DOI: 10.1186/s13059-017-1335-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/06/2017] [Accepted: 10/03/2017] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Genome duplication has played a pivotal role in the evolution of many eukaryotic lineages, including the vertebrates. A relatively recent vertebrate genome duplication is that in Xenopus laevis, which resulted from the hybridization of two closely related species about 17 million years ago. However, little is known about the consequences of this duplication at the level of the genome, the epigenome, and gene expression.
RESULTS: The X. laevis genome consists of two subgenomes, referred to as L (long chromosomes) and S (short chromosomes), that originated from distinct diploid progenitors. Of the parental subgenomes, S chromosomes have degraded faster than L chromosomes from the point of genome duplication until the present day. Deletions appear to have the largest effect on pseudogene formation and loss of regulatory regions. Deleted regions are enriched for long DNA repeats and the flanking regions have high alignment scores, suggesting that non-allelic homologous recombination has played a significant role in the loss of DNA. To assess innovations in the X. laevis subgenomes we examined p300-bound enhancer peaks that are unique to one subgenome and absent from X. tropicalis. A large majority of new enhancers comprise transposable elements. Finally, to dissect early and late events following interspecific hybridization, we examined the epigenome and the enhancer landscape in X. tropicalis × X. laevis hybrid embryos. Strikingly, young X. tropicalis DNA transposons are derepressed and recruit p300 in hybrid embryos.
CONCLUSIONS: The results show that erosion of X. laevis genes and functional regulatory elements is associated with repeats and non-allelic homologous recombination and furthermore that young repeats have also contributed to the p300-bound regulatory landscape following hybridization and whole-genome duplication.
Affiliation(s)
- Dei M Elurbe
  - Radboud University Medical Center, Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Sarita S Paranjpe
  - Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Georgios Georgiou
  - Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Ila van Kruijsbergen
  - Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Ozren Bogdanovic
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, Australia
  - St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, Australia
  - ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Romain Gibeaux
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Rebecca Heald
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Ryan Lister
  - Harry Perkins Institute of Medical Research and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA, 6009, Australia
- Martijn A Huynen
  - Radboud University Medical Center, Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Simon J van Heeringen
  - Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Gert Jan C Veenstra
  - Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
|
32
|
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Curr Protoc Hum Genet 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
|
33
|
Quinodoz M, Royer-Bertrand B, Cisarova K, Di Gioia SA, Superti-Furga A, Rivolta C. DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders. Am J Hum Genet 2017; 101:623-629. [PMID: 28985496 DOI: 10.1016/j.ajhg.2017.09.001] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 06/04/2017] [Accepted: 09/01/2017] [Indexed: 10/18/2022] Open
Abstract
In contrast to recessive conditions with biallelic inheritance, identification of dominant (monoallelic) mutations for Mendelian disorders is more difficult, because of the abundance of benign heterozygous variants that act as massive background noise (typically, in a 400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool assessing the likelihood for a gene to harbor dominant changes. Unlike commonly-used predictors of pathogenicity, DOMINO takes into consideration features that are the properties of genes, rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including: genomic data, intra-, and interspecies conservation, gene expression, protein-protein interactions, protein structure, etc. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly-discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates, with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
|
34
|
Abstract
Down syndrome (also known as trisomy 21) is the model human phenotype for all genomic gain dosage imbalances, including microduplications. The functional genomic exploration of the post-sequencing years of chromosome 21, and the generation of numerous cellular and mouse models, have provided an unprecedented opportunity to decipher the molecular consequences of genome dosage imbalance. Studies of Down syndrome could provide knowledge far beyond the well-known characteristics of intellectual disability and dysmorphic features, as several other important features, including congenital heart defects, early ageing, Alzheimer disease and childhood leukaemia, are also part of the Down syndrome phenotypic spectrum. The elucidation of the molecular mechanisms that cause or modify the risk for different Down syndrome phenotypes could lead to the introduction of previously unimaginable therapeutic options.
|
35
|
Increased burden of deleterious variants in essential genes in autism spectrum disorder. Proc Natl Acad Sci U S A 2016; 113:15054-15059. [PMID: 27956632 DOI: 10.1073/pnas.1613195113] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 12/21/2022] Open
Abstract
Autism spectrum disorder (ASD) is a heterogeneous, highly heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior. It is estimated that hundreds of genes contribute to ASD. We asked if genes with a strong effect on survival and fitness contribute to ASD risk. Human orthologs of genes with an essential role in pre- and postnatal development in the mouse [essential genes (EGs)] are enriched for disease genes and under strong purifying selection relative to human orthologs of mouse genes with a known nonlethal phenotype [nonessential genes (NEGs)]. This intolerance to deleterious mutations, commonly observed haploinsufficiency, and the importance of EGs in development suggest a possible cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders. With a comprehensive catalog of 3,915 mammalian EGs, we provide compelling evidence for a stronger contribution of EGs to ASD risk compared with NEGs. By examining the exonic de novo and inherited variants from 1,781 ASD quartet families, we show a significantly higher burden of damaging mutations in EGs in ASD probands compared with their non-ASD siblings. The analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Finally, we suggest a high-priority list of 29 EGs with potential ASD risk as targets for future functional and behavioral studies. Overall, we show that large-scale studies of gene function in model organisms provide a powerful approach for prioritization of genes and pathogenic variants identified by sequencing studies of human disease.
|
36
|
Uddin M, Pellecchia G, Thiruvahindrapuram B, D'Abate L, Merico D, Chan A, Zarrei M, Tammimies K, Walker S, Gazzellone MJ, Nalpathamkalam T, Yuen RKC, Devriendt K, Mathonnet G, Lemyre E, Nizard S, Shago M, Joseph-George AM, Noor A, Carter MT, Yoon G, Kannu P, Tihy F, Thorland EC, Marshall CR, Buchanan JA, Speevak M, Stavropoulos DJ, Scherer SW. Indexing Effects of Copy Number Variation on Genes Involved in Developmental Delay. Sci Rep 2016; 6:28663. [PMID: 27363808 PMCID: PMC4929460 DOI: 10.1038/srep28663] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 01/12/2016] [Accepted: 06/06/2016] [Indexed: 01/03/2023] Open
Abstract
A challenge in clinical genomics is to predict whether copy number variation (CNV) affecting a gene or multiple genes will manifest as disease. Increasing recognition of gene dosage effects in neurodevelopmental disorders prompted us to develop a computational approach based on critical-exon (highly expressed in brain, highly conserved) examination for potential etiologic effects. Using a large CNV dataset, our updated analyses revealed significant (P < 1.64 × 10⁻¹⁵) enrichment of critical-exons within rare CNVs in cases compared to controls. Separately, we used a weighted gene co-expression network analysis (WGCNA) to construct an unbiased protein module from prenatal and adult tissues and found it significantly enriched for critical exons in prenatal (P < 1.15 × 10⁻⁵⁰, OR = 2.11) and adult (P < 6.03 × 10⁻¹⁸, OR = 1.55) tissues. WGCNA yielded 1,206 proteins for which we prioritized the corresponding genes as likely to have a role in neurodevelopmental disorders. We compared the gene lists obtained from critical-exon and WGCNA analysis and found 438 candidate genes associated with CNVs annotated as pathogenic, or as variants of uncertain significance (VOUS), from among 10,619 developmental delay cases. We identified genes containing CNVs previously considered to be VOUS to be new candidate genes for neurodevelopmental disorders (GIT1, MVB12B and PPP1R9A) demonstrating the utility of this strategy to index the clinical effects of CNVs.
Affiliation(s)
- Mohammed Uddin
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Giovanna Pellecchia
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Bhooma Thiruvahindrapuram
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Lia D'Abate
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Daniele Merico
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Ada Chan
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Mehdi Zarrei
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Kristiina Tammimies
  - Center of Neurodevelopmental Disorders (KIND), Neuropsychiatric Unit, Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Susan Walker
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Matthew J Gazzellone
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Thomas Nalpathamkalam
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Ryan K C Yuen
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Emmanuelle Lemyre
  - CHU Sainte-Justine, University de Montreal, Montreal, Quebec, Canada
- Sonia Nizard
  - CHU Sainte-Justine, University de Montreal, Montreal, Quebec, Canada
- Mary Shago
  - Genome Diagnostics, Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Ann M Joseph-George
  - Genome Diagnostics, Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Abdul Noor
  - Department of Pathology and Laboratory Medicine, Division of Diagnostic Medical Genetics, Mount Sinai Hospital, Toronto, Ontario, Canada
- Melissa T Carter
  - Department of Genetics, The Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- Grace Yoon
  - Division of Clinical and Metabolic Genetics, Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, Ontario M5G 2L3, Canada
- Peter Kannu
  - Division of Clinical and Metabolic Genetics, Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, Ontario M5G 2L3, Canada
- Frédérique Tihy
  - CHU Sainte-Justine, University de Montreal, Montreal, Quebec, Canada
- Erik C Thorland
  - Cytogenetics Laboratory, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
- Christian R Marshall
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Genome Diagnostics, Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Janet A Buchanan
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
- Marsha Speevak
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Dimitri J Stavropoulos
  - Genome Diagnostics, Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Stephen W Scherer
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology (GGB), The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - McLaughlin Centre, University of Toronto, Toronto, Ontario, Canada
|
37
|
Bartha I, Rausell A, McLaren PJ, Mohammadi P, Tardaguila M, Chaturvedi N, Fellay J, Telenti A. The Characteristics of Heterozygous Protein Truncating Variants in the Human Genome. PLoS Comput Biol 2015; 11:e1004647. [PMID: 26642228 PMCID: PMC4671652 DOI: 10.1371/journal.pcbi.1004647] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/08/2015] [Accepted: 11/06/2015] [Indexed: 11/18/2022] Open
Abstract
Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene’s tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 autosomal protein coding genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, which significantly differed from an expectation of 12,916 genes under a model of neutral de novo mutation (p < 10⁻⁴). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.

Genome sequencing provides evidence for large numbers of putative protein truncating variants in humans. Most truncating variants are only observed in few individuals but are collectively prevalent and widely distributed across the coding genome. Most of the truncating variants are so rare that they are only observed in heterozygosis. The current study identifies 10% of genes where heterozygous truncations are not observed and describes their biological characteristics. In addition, for genes where rare truncations are observed, we argue that these are an unexplored source of diversity of phenotypic traits and diseases.
Affiliation(s)
- István Bartha
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Antonio Rausell
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paul J. McLaren
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Pejman Mohammadi
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Computational Biology Group, ETH Zurich, Zurich, Switzerland
- Manuel Tardaguila
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nimisha Chaturvedi
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Jacques Fellay
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Amalio Telenti
  - J. Craig Venter Institute, La Jolla, California, United States of America
|