1
Bromberg Y, Prabakaran R, Kabir A, Shehu A. Variant Effect Prediction in the Age of Machine Learning. Cold Spring Harb Perspect Biol 2024; 16:a041467. [PMID: 38621825 PMCID: PMC11216171 DOI: 10.1101/cshperspect.a041467] [Indexed: 04/17/2024]
Abstract
Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from the unannotated protein sequence data well enough to identify significant errors in the protein "sentences"? Our analysis suggests that some unsupervised methods perform as well as or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, performance varies both by evaluation metric and by the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins where unsupervised methods hold the most promise.
Affiliation(s)
- Yana Bromberg
  - Department of Biology, Emory University, Atlanta 30322, Georgia, USA
  - Department of Computer Science, Emory University, Atlanta 30322, Georgia, USA
- R Prabakaran
  - Department of Biology, Emory University, Atlanta 30322, Georgia, USA
- Anowarul Kabir
  - Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
2
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb).
Results: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods.
Conclusions: VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications.
Availability: VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
3
Egbewale SO, Kumar A, Mokoena MP, Olaniran AO. Purification, characterization and three-dimensional structure prediction of multicopper oxidase Laccases from Trichoderma lixii FLU1 and Talaromyces pinophilus FLU12. Sci Rep 2024; 14:13371. [PMID: 38862560 PMCID: PMC11167041 DOI: 10.1038/s41598-024-63959-z] [Received: 03/04/2024] [Accepted: 06/04/2024] [Indexed: 06/13/2024] [Open Access]
Abstract
Broad-spectrum biocatalyst enzymes, Laccases, have been implicated in the complete degradation of harmful pollutants into less-toxic compounds. In this study, two extracellularly produced Laccases were purified to homogeneity from two different Ascomycetes spp., Trichoderma lixii FLU1 (TlFLU1) and Talaromyces pinophilus FLU12 (TpFLU12). The purified enzymes are monomeric units, with molecular masses of 44 kDa and 68.7 kDa for TlFLU1 and TpFLU12, respectively, on SDS-PAGE and zymogram. Their absorption spectra reveal distinct properties beyond classic protein absorption at 270-280 nm, with TlFLU1's peak at 270 nm aligning with this typical range of type II Cu site (white Laccase), while TpFLU12's unique 600 nm peak signifies a type I Cu2+ site (blue Laccase), highlighting the diverse spectral fingerprints within the Laccase family. The Km and kcat values revealed that ABTS is the most suitable substrate as compared to 2,6-dimethoxyphenol, caffeic acid and guaiacol for both Laccases. The bioinformatics analysis revealed critical His, Ile, and Arg residues for copper binding at active sites, deviating from the traditional two His and a Cys motif in some Laccases. The predicted biological functions of the Laccases include oxidation-reduction, lignin metabolism, cellular metal ion homeostasis, phenylpropanoid catabolism, aromatic compound metabolism, cellulose metabolism, and biological adhesion. Additionally, investigation of the degradation of polycyclic aromatic hydrocarbons (PAHs) by the purified Laccases showed significant reductions in residual concentrations of fluoranthene and anthracene after a 96-h incubation period. TlFLU1 Laccase achieved 39.0% and 44.9% transformation of fluoranthene and anthracene, respectively, while TpFLU12 Laccase achieved 47.2% and 50.0% transformation, respectively. The enzyme structure-function relationship study provided insights into the catalytic mechanism of these Laccases for possible biotechnological and industrial applications.
Affiliation(s)
- Samson O Egbewale
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Durban, 4001, South Africa
- Ajit Kumar
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Durban, 4001, South Africa
- Mduduzi P Mokoena
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Durban, 4001, South Africa
  - Department of Pathology, School of Medicine, University of Limpopo, Private Bag X1106, Sovenga, 0727, South Africa
- Ademola O Olaniran
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Durban, 4001, South Africa.
4
Trigos AS, Bongiovanni F, Zhang Y, Zethoven M, Tothill R, Pearson R, Papenfuss AT, Goode DL. Disruption of metazoan gene regulatory networks in cancer alters the balance of co-expression between genes of unicellular and multicellular origins. Genome Biol 2024; 25:110. [PMID: 38685127 PMCID: PMC11057133 DOI: 10.1186/s13059-024-03247-1] [Received: 08/08/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024] [Open Access]
Abstract
BACKGROUND: Metazoans inherited genes from unicellular ancestors that perform essential biological processes such as cell division, metabolism, and protein translation. Multicellularity requires careful control and coordination of these unicellular genes to maintain tissue integrity and homeostasis. Gene regulatory networks (GRNs) that arose during metazoan evolution are frequently altered in cancer, resulting in over-expression of unicellular genes. We propose that an imbalance in co-expression of unicellular (UC) and multicellular (MC) genes is a driving force in cancer.
RESULTS: We combine gene co-expression analysis to infer changes to GRNs in cancer with protein sequence conservation data to distinguish genes with UC and MC origins. Co-expression networks created using RNA sequencing data from 31 tumor types and normal tissue samples are divided into modules enriched for UC genes, MC genes, or mixed UC-MC modules. The greatest differences between tumor and normal tissue co-expression networks occur within mixed UC-MC modules. MC and UC genes not commonly co-expressed in normal tissues form distinct co-expression modules seen only in tumors. The degree of rewiring of genes within mixed UC-MC modules increases with tumor grade and stage. Mixed UC-MC modules are enriched for somatic mutations in cancer genes, particularly amplifications, suggesting an important driver of the rewiring observed in tumors is copy number changes.
CONCLUSIONS: Our study shows the greatest changes to gene co-expression patterns during tumor progression occur between genes of MC and UC origins, implicating the breakdown and rewiring of metazoan gene regulatory networks in cancer development and progression.
Affiliation(s)
- Anna S Trigos
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia.
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia.
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, 3168, Australia.
- Felicia Bongiovanni
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Yangyi Zhang
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Maia Zethoven
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Richard Tothill
  - Centre for Cancer Research, The University of Melbourne, Parkville, VIC, 3010, Australia
- Richard Pearson
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, 3168, Australia
  - Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Anthony T Papenfuss
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Bioinformatics Division, The Walter & Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- David L Goode
  - Peter MacCallum Cancer Centre, 305 Grattan St., Melbourne, VIC, 3000, Australia.
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, 3010, Australia.
5
Atasu B, Simón-Sánchez J, Hanagasi H, Bilgic B, Hauser AK, Guven G, Heutink P, Gasser T, Lohmann E. Dissecting genetic architecture of rare dystonia: genetic, molecular and clinical insights. J Med Genet 2024; 61:443-451. [PMID: 38458754 DOI: 10.1136/jmg-2022-109099] [Received: 12/08/2022] [Accepted: 12/24/2023] [Indexed: 03/10/2024]
Abstract
BACKGROUND: Dystonia is one of the most common movement disorders. To date, the genetic causes of dystonia in populations of European descent have been extensively studied. However, other populations, particularly those from the Middle East, have not been adequately studied. The purpose of this study is to discover the genetic basis of dystonia in a clinically and genetically well-characterised dystonia cohort from Turkey, which harbours poorly studied populations.
METHODS: Exome sequencing analysis was performed in 42 Turkish dystonia families. Using co-expression network (CEN) analysis, identified candidate genes were interrogated for the networks including known dystonia-associated genes and genes further associated with the protein-protein interaction, animal model-based characteristics and clinical findings.
RESULTS: We identified potentially disease-causing variants in the established dystonia genes (PRKRA, SGCE, KMT2B, SLC2A1, GCH1, THAP1, HPCA, TSPOAP1, AOPEP; n=11 families (26%)), in the uncommon forms of dystonia-associated genes (PCCB, CACNA1A, ALDH5A1, PRKN; n=4 families (10%)) and in the candidate genes prioritised based on the pathogenicity of the variants and CEN-based analyses (n=11 families (21%)). The diagnostic yield was found to be 36%. Several pathways and gene ontologies implicated in immune system, transcription, metabolic pathways, endosomal-lysosomal and neurodevelopmental mechanisms were over-represented in our CEN analysis.
CONCLUSIONS: Here, using a structured approach, we have characterised a clinically and genetically well-defined dystonia cohort from Turkey, where dystonia has not been widely studied, and provided an uncovered genetic basis, which will facilitate diagnostic dystonia research.
Affiliation(s)
- Burcu Atasu
  - Eberhard Karls Universität Tübingen Hertie Institut für klinische Hirnforschung Allgemeine Neurologie, Tubingen, Germany
- Javier Simón-Sánchez
  - Eberhard Karls Universität Tübingen Hertie Institut für klinische Hirnforschung Allgemeine Neurologie, Tubingen, Germany
- Hasmet Hanagasi
  - Department of Neurology, Istanbul University Istanbul Faculty of Medicine, Istanbul, Turkey
- Basar Bilgic
  - Department of Neurology, Istanbul University Istanbul Faculty of Medicine, Istanbul, Turkey
- Ann-Kathrin Hauser
  - Eberhard Karls Universität Tübingen Hertie Institut für klinische Hirnforschung Allgemeine Neurologie, Tubingen, Germany
- Gamze Guven
  - Genetics Department, Aziz Sancar Institute of Experimental Medicine, Istanbul, Turkey
- Thomas Gasser
  - Eberhard Karls Universität Tübingen Hertie Institut für klinische Hirnforschung Allgemeine Neurologie, Tubingen, Germany
- Ebba Lohmann
  - Eberhard Karls Universität Tübingen Hertie Institut für klinische Hirnforschung Allgemeine Neurologie, Tubingen, Germany
6
Dagostino R, Gottlieb A. Tissue-specific atlas of trans-models for gene regulation elucidates complex regulation patterns. BMC Genomics 2024; 25:377. [PMID: 38632500 PMCID: PMC11022497 DOI: 10.1186/s12864-024-10317-y] [Received: 10/09/2023] [Accepted: 04/16/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
BACKGROUND: Deciphering gene regulation is essential for understanding the underlying mechanisms of healthy and disease states. While the regulatory networks formed by transcription factors (TFs) and their target genes have been mostly studied with relation to cis effects such as in TF binding sites, we focused on trans effects of TFs on the expression of their transcribed genes and their potential mechanisms.
RESULTS: We provide a comprehensive tissue-specific atlas, spanning 49 tissues, of TF variations affecting gene expression through computational models considering two potential mechanisms: combinatorial regulation by the expression of the TFs, and regulation by genetic variants within the TF. We demonstrate that similarity between tissues based on our discovered genes corresponds to other types of tissue similarity. The genes affected by complex TF regulation, and their modelled TFs, were highly enriched for pharmacogenomic functions, while the TFs themselves were also enriched in several cancer and metabolic pathways. Additionally, genes that appear in multiple clusters are enriched for regulation of the immune system, while tissue clusters include cluster-specific genes that are enriched for biological functions and diseases previously associated with the tissues forming the cluster. Finally, our atlas exposes multilevel regulation across multiple tissues, where TFs regulate other TFs through the two tested mechanisms.
CONCLUSIONS: Our tissue-specific atlas provides hierarchical tissue-specific trans genetic regulations that can be further studied for association with human phenotypes.
Affiliation(s)
- Robert Dagostino
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Assaf Gottlieb
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
7
Mizraji E. Homeostasis and information processing: The key frames for the thermodynamics of biological systems. Biosystems 2024; 236:105115. [PMID: 38163548 DOI: 10.1016/j.biosystems.2023.105115] [Received: 11/15/2023] [Revised: 12/29/2023] [Accepted: 12/29/2023] [Indexed: 01/03/2024]
Abstract
Life is a natural phenomenon ineluctably subject to the laws and principles of physics. In this framework, thermodynamics has a crucial role, since living beings are structured on a molecular and cellular basis that can only be maintained with extensive energy consumption. This imposes that living beings are necessarily open systems. But the survival of each type of organism depends on the relative stability of certain essential variables, even in the presence of the disturbances to which they are subjected. The stability of these variables is relative in the sense that they have a narrow range of variation. This stability of the essential variables is a consequence of refined control mechanisms developed in the course of evolution, that lead to the condition called homeostasis. This homeostasis requires that control mechanisms process the various types of information related to the internal structure of the organism and its environment. Consequently, a biological system, through information processing aimed at guiding the mechanisms that maintain its homeostasis, manages the conditions imposed by the principles of thermodynamics, obtaining the most efficient use of energy possible and keeping entropic degradation controlled. In this article, we discuss the close links between thermodynamics, homeostasis and the information processing necessary to maintain homeostasis.
Affiliation(s)
- Eduardo Mizraji
  - Group of Cognitive Systems Modeling, Biophysics and Systems Biology Section, Facultad de Ciencias, Universidad de la República, Iguá 4225, Montevideo, 11400, Uruguay.
8
Wang B, Lei X, Tian W, Perez-Rathke A, Tseng YY, Liang J. Structure-based pathogenicity relationship identifier for predicting effects of single missense variants and discovery of higher-order cancer susceptibility clusters of mutations. Brief Bioinform 2023; 24:bbad206. [PMID: 37332013 PMCID: PMC10359089 DOI: 10.1093/bib/bbad206] [Received: 02/01/2023] [Revised: 04/19/2023] [Accepted: 05/13/2023] [Indexed: 06/20/2023] [Open Access]
Abstract
We report the structure-based pathogenicity relationship identifier (SPRI), a novel computational tool for accurate evaluation of pathological effects of missense single mutations and prediction of higher-order spatially organized units of mutational clusters. SPRI can effectively extract properties determining pathogenicity encoded in protein structures, and can identify deleterious missense mutations of germ line origin associated with Mendelian diseases, as well as mutations of somatic origin associated with cancer drivers. It compares favorably to other methods in predicting deleterious mutations. Furthermore, SPRI can discover spatially organized pathogenic higher-order spatial clusters (patHOS) of deleterious mutations, including those of low recurrence, and can be used for discovery of candidate cancer driver genes and driver mutations. We further demonstrate that SPRI can take advantage of AlphaFold2 predicted structures and can be deployed for saturation mutation analysis of the whole human proteome.
Affiliation(s)
- Boshen Wang
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill, Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Xue Lei
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill, Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Wei Tian
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill, Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Alan Perez-Rathke
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill, Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Yan-Yuan Tseng
  - Center for Molecular Medicine and Genetics, Biochemistry and Molecular Biology Department, School of Medicine, Wayne State University, 540 E. Canfield Avenue, 48201 MI, USA
- Jie Liang
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill, Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
9
Converting the genomic knowledge base to build protein specific machine learning prediction models; a classification study on thermophilic serine protease. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-022-01214-4] [Indexed: 11/25/2022]
10
Bernardini A, Gallo A, Gnesutta N, Dolfini D, Mantovani R. Phylogeny of NF-YA trans-activation splicing isoforms in vertebrate evolution. Genomics 2022; 114:110390. [PMID: 35589059 DOI: 10.1016/j.ygeno.2022.110390] [Received: 11/22/2021] [Revised: 05/02/2022] [Accepted: 05/12/2022] [Indexed: 11/04/2022]
Abstract
NF-Y is a trimeric pioneer Transcription Factor (TF) whose target sequence -the CCAAT box- is present in ~25% of mammalian promoters. We reconstruct the phylogenetic history of the regulatory NF-YA subunit in vertebrates. We find that in addition to the remarkable conservation of the subunits-interaction and DNA-binding parts, the Transcriptional Activation Domain (TAD) is also conserved (>90% identity among bony vertebrates). We infer the phylogeny of the alternatively spliced exon-3 and partial splicing events of exon-7 -7N and 7C- revealing independent clade-specific losses of these regions. These isoforms shape the TAD. Absence of exon-3 in basal deuterostomes, cartilaginous fishes and hagfish, but not in lampreys, suggests that the "short" isoform is primordial, with emergence of exon-3 in chordates. Exon 7N was present in the vertebrate common ancestor, while 7C is a molecular innovation of teleost fishes. RNA-seq analysis in several species confirms expression of all these isoforms. We identify 3 blocks of amino acids in the TAD shared across deuterostomes, yet structural predictions and sequence analyses suggest an evolutionary drive for maintenance of an Intrinsically Disordered Region -IDR- within the TAD. Overall, these data help reconstruct the logic for alternative splicing of this essential eukaryotic TF.
Affiliation(s)
- Andrea Bernardini
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy.
- Alberto Gallo
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Nerina Gnesutta
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Diletta Dolfini
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Roberto Mantovani
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy.
11
Labes S, Stupp D, Wagner N, Bloch I, Lotem M, L Lahad E, Polak P, Pupko T, Tabach Y. Machine-learning of complex evolutionary signals improves classification of SNVs. NAR Genom Bioinform 2022; 4:lqac025. [PMID: 35402908 PMCID: PMC8988715 DOI: 10.1093/nargab/lqac025] [Received: 10/24/2021] [Revised: 02/08/2022] [Accepted: 03/28/2022] [Indexed: 12/12/2022] [Open Access]
Abstract
Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable for conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction-task, highlighting the strength of optimizing conservation analysis per-species and per-gene. Overall, we suggest a new and a more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
Affiliation(s)
- Sapir Labes
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Michal Lotem
  - Sharett Institute of Oncology, Hadassah University Medical Center, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ephrat L Lahad
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem 9103102, Israel
- Paz Polak
  - Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
12
Ovek D, Abali Z, Zeylan ME, Keskin O, Gursoy A, Tuncbag N. Artificial intelligence based methods for hot spot prediction. Curr Opin Struct Biol 2021; 72:209-218. [PMID: 34954608 DOI: 10.1016/j.sbi.2021.11.003] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 11/08/2021] [Indexed: 11/29/2022]
Abstract
Proteins interact through their interfaces to fulfill essential functions in the cell. They bind to their partners in a highly specific manner and form complexes that have a profound effect on understanding the biological pathways they are involved in. Any abnormal interactions may cause diseases. Therefore, the identification of small molecules which modulate protein interactions through their interfaces has high therapeutic potential. However, discovering such molecules is challenging. Most protein-protein binding affinity is attributed to a small set of amino acids found in protein interfaces known as hot spots. Recent studies demonstrate that drug-like small molecules specifically may bind to hot spots. Therefore, hot spot prediction is crucial. As experimental data accumulates, artificial intelligence begins to be used for computational hot spot prediction. First, we review machine learning and deep learning for computational hot spot prediction and then explain the significance of hot spots toward drug design.
Affiliation(s)
- Damla Ovek
- College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Zeynep Abali
- College of Engineering, Koc University, 34450 Istanbul, Turkey
| | | | - Ozlem Keskin
- College of Engineering, Koc University, 34450 Istanbul, Turkey.
| | - Attila Gursoy
- College of Engineering, Koc University, 34450 Istanbul, Turkey.
| | - Nurcan Tuncbag
- College of Engineering, Koc University, 34450 Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey.
| |
|
13
|
Yabumoto M, Kianmahd J, Singh M, Palafox MF, Wei A, Elliott K, Goodloe DH, Dean SJ, Gooch C, Murray BK, Swartz E, Schrier Vergano SA, Towne MC, Nugent K, Roeder ER, Kresge C, Pletcher BA, Grand K, Graham JM, Gates R, Gomez‐Ospina N, Ramanathan S, Clark RD, Glaser K, Benke PJ, Cohen JS, Fatemi A, Mu W, Baranano KW, Madden JA, Gubbels CS, Yu TW, Agrawal PB, Chambers M, Phornphutkul C, Pugh JA, Tauber KA, Azova S, Smith JR, O’Donnell‐Luria A, Medsker H, Srivastava S, Krakow D, Schweitzer DN, Arboleda VA. Novel variants in KAT6B spectrum of disorders expand our knowledge of clinical manifestations and molecular mechanisms. Mol Genet Genomic Med 2021; 9:e1809. [PMID: 34519438 PMCID: PMC8580094 DOI: 10.1002/mgg3.1809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 08/26/2021] [Indexed: 01/07/2023] [Open Access]
Abstract
The phenotypic variability associated with pathogenic variants in Lysine Acetyltransferase 6B (KAT6B, a.k.a. MORF, MYST4) results in several interrelated syndromes including Say-Barber-Biesecker-Young-Simpson Syndrome and Genitopatellar Syndrome. Here we present 20 new cases representing 10 novel KAT6B variants. These patients exhibit a range of clinical phenotypes including intellectual disability, mobility and language difficulties, craniofacial dysmorphology, and skeletal anomalies. Given the range of features previously described for KAT6B-related syndromes, we have identified additional phenotypes including concern for keratoconus, sensitivity to light or noise, recurring infections, and fractures in greater numbers than previously reported. We surveyed clinicians to qualitatively assess the ways families engage with genetic counselors upon diagnosis. We found that 56% (10/18) of individuals receive diagnoses before the age of 2 years (median age = 1.96 years), making it challenging to address future complications with limited accessible information and vast phenotypic severity. We used CRISPR to introduce truncating variants into the KAT6B gene in model cell lines and performed chromatin accessibility and transcriptome sequencing to identify key dysregulated pathways. This study expands the clinical spectrum and addresses the challenges to management and genetic counseling for patients with KAT6B-related disorders.
Affiliation(s)
- Megan Yabumoto
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Jessica Kianmahd
- Division of Medical Genetics, Department of Pediatrics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Meghna Singh
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Maria F. Palafox
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Angela Wei
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Kathryn Elliott
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Dana H. Goodloe
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - S. Joy Dean
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Catherine Gooch
- Department of Pediatrics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
| | - Brianna K. Murray
- Division of Medical Genetics and Metabolism, Children’s Hospital of The King’s Daughters, Norfolk, Virginia, USA
| | - Erin Swartz
- Division of Medical Genetics and Metabolism, Children’s Hospital of The King’s Daughters, Norfolk, Virginia, USA
| | | | | | - Kimberly Nugent
- Department of Pediatrics, Baylor College of Medicine, San Antonio, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Elizabeth R. Roeder
- Department of Pediatrics, Baylor College of Medicine, San Antonio, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Christina Kresge
- Department of Pediatrics, Division of Clinical Genetics, Rutgers New Jersey Medical School, Newark, New Jersey, USA
| | - Beth A. Pletcher
- Department of Pediatrics, Division of Clinical Genetics, Rutgers New Jersey Medical School, Newark, New Jersey, USA
| | - Katheryn Grand
- Department of Pediatrics, Cedars‐Sinai Medical Center, Los Angeles, California, USA
| | - John M. Graham
- Department of Pediatrics, Cedars‐Sinai Medical Center, Los Angeles, California, USA
| | - Ryan Gates
- Department of Pediatrics, Division of Medical Genetics, Stanford University, Stanford, California, USA
| | - Natalia Gomez‐Ospina
- Department of Pediatrics, Division of Medical Genetics, Stanford University, Stanford, California, USA
| | - Subhadra Ramanathan
- Department of Pediatrics, Division of Medical Genetics, Loma Linda University Children’s Hospital, Loma Linda, California, USA
| | - Robin Dawn Clark
- Department of Pediatrics, Division of Medical Genetics, Loma Linda University Children’s Hospital, Loma Linda, California, USA
| | - Kimberly Glaser
- Division of Genetics, Joe DiMaggio Children’s Hospital, Hollywood, Florida, USA
| | - Paul J. Benke
- Division of Genetics, Joe DiMaggio Children’s Hospital, Hollywood, Florida, USA
| | - Julie S. Cohen
- Department of Neurology and Developmental Medicine, Kennedy Krieger Institute, Baltimore, Maryland, USA; Department of Neurology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
| | - Ali Fatemi
- Department of Neurology and Developmental Medicine, Kennedy Krieger Institute, Baltimore, Maryland, USA; Department of Neurology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
| | - Weiyi Mu
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
| | | | - Jill A. Madden
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, Massachusetts, USA
| | - Cynthia S. Gubbels
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Timothy W. Yu
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Pankaj B. Agrawal
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, Massachusetts, USA; Division of Newborn Medicine, Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts, USA
| | - Mary‐Kathryn Chambers
- Division of Human Genetics, Warren Alpert Medical School of Brown University, Hasbro Children’s Hospital/Rhode Island Hospital, Providence, Rhode Island, USA
| | - Chanika Phornphutkul
- Division of Human Genetics, Warren Alpert Medical School of Brown University, Hasbro Children’s Hospital/Rhode Island Hospital, Providence, Rhode Island, USA
| | - John A. Pugh
- Division of Child Neurology, Department of Neurology, Albany Medical Center, Albany, New York, USA
| | - Kate A. Tauber
- Division of Neonatology, Department of Pediatrics, Albany Medical Center, Bernard and Millie Duker Children’s Hospital, Albany, New York, USA
| | - Svetlana Azova
- Division of Endocrinology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Jessica R. Smith
- Division of Endocrinology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Anne O’Donnell‐Luria
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Hannah Medsker
- Department of Neurology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Siddharth Srivastava
- Department of Neurology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Deborah Krakow
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA; Department of Obstetrics and Gynecology, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Daniela N. Schweitzer
- Division of Medical Genetics, Department of Pediatrics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Valerie A. Arboleda
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| |
|
14
|
Mahlich Y, Miller M, Zeng Z, Bromberg Y. Low Diversity of Human Variation Despite Mostly Mild Functional Impact of De Novo Variants. Front Mol Biosci 2021; 8:635382. [PMID: 33816556 PMCID: PMC8012514 DOI: 10.3389/fmolb.2021.635382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Accepted: 02/01/2021] [Indexed: 01/07/2023] [Open Access]
Abstract
Non-synonymous Single Nucleotide Variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remains sparse – a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.
Affiliation(s)
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Maximillian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States; Department of Genetics, Rutgers University, Piscataway, NJ, United States
| |
|
15
|
Zhou JB, Xiong Y, An K, Ye ZQ, Wu YD. IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions. Bioinformatics 2021; 36:4977-4983. [PMID: 32756939 PMCID: PMC7755418 DOI: 10.1093/bioinformatics/btaa618] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/18/2019] [Revised: 06/28/2020] [Accepted: 07/01/2020] [Indexed: 01/09/2023] [Open Access]
Abstract
Motivation: Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of NGS has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly due to the domination of nsSNVs from structured regions in training data. Therefore, it is highly desirable to build a disease-association predictor specifically for nsSNVs in IDRs with better performance. Results: We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features that are extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. The evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving ACC between 0.856 and 0.868 and MCC between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. Availability and implementation: The software is freely available at http://www.wdspdb.com/IDRMutPred. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing-Bo Zhou
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Yao Xiong
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Ke An
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Zhi-Qiang Ye
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Yun-Dong Wu
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; Shenzhen Bay Laboratory, Shenzhen 518055, China; College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
| |
|
16
|
Rochman ND, Wolf YI, Koonin EV. Deep phylogeny of cancer drivers and compensatory mutations. Commun Biol 2020; 3:551. [PMID: 33009502 PMCID: PMC7532533 DOI: 10.1038/s42003-020-01276-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/20/2020] [Accepted: 09/03/2020] [Indexed: 12/14/2022] [Open Access]
Abstract
Driver mutations (DM) are the genetic impetus for most cancers. The DM are assumed to be deleterious in species evolution, being eliminated by purifying selection unless compensated by other mutations. We present deep phylogenies for 84 cancer driver genes and investigate the prevalence of 434 DM across gene-species trees. The DM are rare in species evolution, and 181 are completely absent, validating their negative fitness effect. The DM are more common in unicellular than in multicellular eukaryotes, suggesting a link between these mutations and cell proliferation control. 18 DM appear as the ancestral state in one or more major clades, including 3 among mammals. We identify within-gene, compensatory mutations for 98 DM and infer likely interactions between the DM and compensatory sites in protein structures. These findings elucidate the evolutionary status of DM and are expected to advance the understanding of the functions and evolution of oncogenes and tumor suppressors. Rochman et al. present deep phylogenies for 84 cancer driver genes and examine the prevalence of driver mutations across gene-species trees. Their results show that driver mutations are rare in species evolution and give insight into the evolution of driver mutations and oncogenes.
Affiliation(s)
- Nash D Rochman
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA.
| |
|
17
|
Kuppu S, Ron M, Marimuthu MP, Li G, Huddleson A, Siddeek MH, Terry J, Buchner R, Shabek N, Comai L, Britt AB. A variety of changes, including CRISPR/Cas9-mediated deletions, in CENH3 lead to haploid induction on outcrossing. Plant Biotechnol J 2020; 18:2068-2080. [PMID: 32096293 PMCID: PMC7540420 DOI: 10.1111/pbi.13365] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 09/19/2019] [Revised: 02/03/2020] [Accepted: 02/13/2020] [Indexed: 05/03/2023]
Abstract
Creating true-breeding lines is a critical step in plant breeding. Novel, completely homozygous true-breeding lines can be generated by doubled haploid technology in a single generation. Haploid induction through modification of the centromere-specific histone 3 variant (CENH3), including chimeric proteins, expression of non-native CENH3 and single amino acid substitutions, has been shown to induce, on outcrossing to wild type, haploid progeny possessing only the genome of the wild-type parent, in Arabidopsis thaliana. Here, we report the characterization of 31 additional EMS-inducible amino acid substitutions in CENH3 for their ability to complement a knockout in the endogenous CENH3 gene and induce haploid progeny when pollinated by the wild type. We also tested the effect of double amino acid changes, which might be generated through a second round of EMS mutagenesis. Finally, we report on the effects of CRISPR/Cas9-mediated in-frame deletions in the αN helix of the CENH3 histone fold domain. Remarkably, we found that complete deletion of the αN helix, which is conserved throughout angiosperms, results in plants which exhibit normal growth and fertility while acting as excellent haploid inducers when pollinated by wild-type pollen. Both of these technologies, CRISPR mutagenesis and EMS mutagenesis, represent non-transgenic approaches to the generation of haploid inducers.
Affiliation(s)
- Sundaram Kuppu
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Mily Ron
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Mohan P.A. Marimuthu
- Department of Plant Biology, University of California, Davis, CA, USA
- UC Davis Genome Center, University of California, Davis, CA, USA
| | - Glenda Li
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Amy Huddleson
- Department of Plant Biology, University of California, Davis, CA, USA
| | | | - Joshua Terry
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Ryan Buchner
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Nitzan Shabek
- Department of Plant Biology, University of California, Davis, CA, USA
| | - Luca Comai
- Department of Plant Biology, University of California, Davis, CA, USA
- UC Davis Genome Center, University of California, Davis, CA, USA
| | - Anne B. Britt
- Department of Plant Biology, University of California, Davis, CA, USA
| |
|
18
|
Suplatov D, Sharapova Y, Geraseva E, Švedas V. Zebra2: advanced and easy-to-use web-server for bioinformatic analysis of subfamily-specific and conserved positions in diverse protein superfamilies. Nucleic Acids Res 2020; 48:W65-W71. [PMID: 32313959 PMCID: PMC7319439 DOI: 10.1093/nar/gkaa276] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/04/2020] [Revised: 03/29/2020] [Accepted: 04/08/2020] [Indexed: 12/17/2022] [Open Access]
Abstract
Zebra2 is a highly automated web-tool to search for subfamily-specific and conserved positions (i.e. the determinants of functional diversity as well as the key catalytic and structural residues) in protein superfamilies. The bioinformatic analysis is facilitated by Mustguseal—a companion web-server to automatically collect and superimpose a large representative set of functionally diverse homologs with high structure similarity but low sequence identity to the selected query protein. The results are automatically prioritized and provided at four information levels to facilitate the knowledge-driven expert selection of the most promising positions on-line: as a sequence similarity network; interfaces to sequence-based and 3D-structure-based analysis of conservation and variability; and accompanied by the detailed annotation of proteins accumulated from the integrated databases with links to the external resources. The integration of Zebra2 and Mustguseal web-tools provides the first of its kind out-of-the-box open-access solution to conduct a systematic analysis of evolutionarily related proteins implementing different functions within a shared 3D-structure of the superfamily, determine common and specific patterns of function-associated local structural elements, assist to select hot-spots for rational design and to prepare focused libraries for directed evolution. The web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/zebra2, no login required.
Affiliation(s)
- Dmitry Suplatov
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| | - Yana Sharapova
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| | - Elizaveta Geraseva
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| | - Vytas Švedas
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| |
|
19
|
Malhis N, Jacobson M, Jones SJM, Gsponer J. LIST-S2: taxonomy based sorting of deleterious missense mutations across species. Nucleic Acids Res 2020; 48:W154-W161. [PMID: 32352516 PMCID: PMC7319545 DOI: 10.1093/nar/gkaa288] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/27/2020] [Revised: 04/05/2020] [Accepted: 04/19/2020] [Indexed: 12/18/2022] [Open Access]
Abstract
The separation of deleterious from benign mutations remains a key challenge in the interpretation of genomic data. Computational methods used to sort mutations based on their potential deleteriousness rely largely on conservation measures derived from sequence alignments. Here, we introduce LIST-S2, a successor to our previously developed approach LIST, which aims to exploit local sequence identity and taxonomy distances in quantifying the conservation of human protein sequences. Unlike its predecessor, LIST-S2 is not limited to human sequences but can assess conservation and make predictions for sequences from any organism. Moreover, we provide a web-tool and downloadable software to compute and visualize the deleteriousness of mutations in user-provided sequences. This web-tool contains an HTML interface and a RESTful API to submit and manage sequences as well as a browsable set of precomputed predictions for a large number of UniProtKB protein sequences of common taxa. LIST-S2 is available at: https://list-s2.msl.ubc.ca/.
Affiliation(s)
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Matthew Jacobson
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Steven J M Jones
- Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| |
|
20
|
Analysis of APPL1 Gene Polymorphisms in Patients with a Phenotype of Maturity Onset Diabetes of the Young. J Pers Med 2020; 10:jpm10030100. [PMID: 32854233 PMCID: PMC7565648 DOI: 10.3390/jpm10030100] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/30/2020] [Revised: 08/03/2020] [Accepted: 08/22/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
The APPL1 gene encodes a protein mediating the cross-talk between adiponectin and insulin signaling. Recently, it was found that APPL1 mutations can cause maturity onset diabetes of the young, type 14. Here, an analysis of APPL1 was performed in patients with a maturity-onset diabetes of the young (MODY) phenotype, and prevalence of these mutations was estimated in a Russian population, among type 2 diabetes mellitus (T2DM) and MODY patients. Whole-exome sequencing or targeted sequencing was performed on 151 probands with a MODY phenotype, with subsequent association analysis of one of identified variants, rs11544593, in a white population of Western Siberia (276 control subjects and 169 T2DM patients). Thirteen variants were found in APPL1, three of which (rs79282761, rs138485817, and rs11544593) are located in exons. There were no statistically significant differences in the frequencies of rs11544593 alleles and genotypes between T2DM patients and the general population. In the MODY group, AG rs11544593 genotype carriers were significantly more frequent (AG vs. AA + GG: odds ratio 1.83, confidence interval 1.15-2.90, p = 0.011) compared with the control group. An association of rs11544593 with blood glucose concentration was revealed in the MODY group. The genotyping data suggest that rs11544593 may contribute to carbohydrate metabolism disturbances.
|