101
|
Wang C, Balch WE. Bridging Genomics to Phenomics at Atomic Resolution through Variation Spatial Profiling. Cell Rep 2018; 24:2013-2028.e6. [PMID: 30134164 PMCID: PMC6261431 DOI: 10.1016/j.celrep.2018.07.059] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 01/22/2018] [Revised: 06/25/2018] [Accepted: 07/16/2018] [Indexed: 01/04/2023] Open
Abstract
To understand the impact of genome sequence variation (the genotype) responsible for biological diversity and human health (the phenotype) including cystic fibrosis and Alzheimer's disease, we developed a Gaussian-process-based machine learning (ML) approach, variation spatial profiling (VSP). VSP uses a sparse collection of known variants found in the population that perturb the protein fold to define unknown variant function based on the emergent general principle of spatial covariance (SCV). SCV quantitatively captures the role of proximity in genotype-to-phenotype spatial-temporal relationships. Phenotype landscapes generated through SCV provide a platform that can be used to describe the functional properties that drive sequence-to-function-to-structure design of the polypeptide fold at atomic resolution. We provide proof of principle that SCV can enable the use of population-based genomic platforms to define the origins and mechanism of action of genotype-to-phenotype transformations contributing to the health and disease of an individual.
Affiliation(s)
- Chao Wang
- Department of Molecular Medicine, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA
| | - William E Balch
- Department of Molecular Medicine, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA; The Skaggs Institute for Chemical Biology, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA.
| |
|
102
|
Defining the landscape of ATP-competitive inhibitor resistance residues in protein kinases. Nat Struct Mol Biol 2020; 27:92-104. [PMID: 31925410 DOI: 10.1038/s41594-019-0358-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/01/2019] [Accepted: 11/27/2019] [Indexed: 02/07/2023]
Abstract
Kinases are involved in disease development and modulation of their activity can be therapeutically beneficial. Drug-resistant mutant kinases are valuable tools in drug discovery efforts, but the prediction of mutants across the kinome is challenging. Here, we generate deep mutational scanning data to identify mutant mammalian kinases that drive resistance to clinically relevant inhibitors. We aggregate these data with subsaturation mutagenesis data and use it to develop, test and validate a framework to prospectively identify residues that mediate kinase activity and drug resistance across the kinome. We validate predicted resistance mutations in CDK4, CDK6, ERK2, EGFR and HER2. Capitalizing on a highly predictable residue, we generate resistance mutations in TBK1, CSNK2A1 and BRAF. Unexpectedly, we uncover a potentially generalizable activation site that mediates drug resistance and confirm its impact in BRAF, EGFR, HER2 and MEK1. We anticipate that the identification of these residues will enable the broad interrogation of the kinome and its inhibitors.
|
103
|
Sruthi CK, Prakash M. Deep2Full: Evaluating strategies for selecting the minimal mutational experiments for optimal computational predictions of deep mutational scan outcomes. PLoS One 2020; 15:e0227621. [PMID: 31923916 PMCID: PMC6954071 DOI: 10.1371/journal.pone.0227621] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/27/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022] Open
Abstract
Performing a complete deep mutational scan with all single point mutations may not be practical, and may not even be required, especially if predictive computational models can be developed. Computational models are however naive to cellular response in the myriads of assay-conditions. In a realistic paradigm of assay context-aware predictive hybrid models that combine minimal experimental data from deep mutational scans with structure, sequence information and computational models, we define and evaluate different strategies for choosing this minimal set. We evaluated the trivial strategy of a systematic reduction in the number of mutational studies from 85% to 15%, along with several others about the choice of the types of mutations such as random versus site-directed with the same 15% data completeness. Interestingly, the predictive capabilities by training on a random set of mutations and using a systematic substitution of all amino acids to alanine, asparagine and histidine (ANH) were comparable. Another strategy we explored, augmenting the training data with measurements of the same mutants at multiple assay conditions, did not improve the prediction quality. For the six proteins we analyzed, the bin-wise error in prediction is optimal when 50-100 mutations per bin are used in training the computational model, suggesting that good prediction quality may be achieved with a library of 500-1000 mutations.
Affiliation(s)
- C. K. Sruthi
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
| | - Meher Prakash
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
| |
|
104
|
|
105
|
Gelman H, Dines JN, Berg J, Berger AH, Brnich S, Hisama FM, James RG, Rubin AF, Shendure J, Shirts B, Fowler DM, Starita LM. Recommendations for the collection and use of multiplexed functional data for clinical variant interpretation. Genome Med 2019; 11:85. [PMID: 31862013 PMCID: PMC6925490 DOI: 10.1186/s13073-019-0698-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/19/2019] [Accepted: 11/20/2019] [Indexed: 01/31/2023] Open
Abstract
Variants of uncertain significance represent a massive challenge to medical genetics. Multiplexed functional assays, in which the functional effects of thousands of genomic variants are assessed simultaneously, are increasingly generating data that can be used as additional evidence for or against variant pathogenicity. Such assays have the potential to resolve variants of uncertain significance, thereby increasing the clinical utility of genomic testing. Existing standards from the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) and new guidelines from the Clinical Genome Resource (ClinGen) establish the role of functional data in variant interpretation, but do not address the specific challenges or advantages of using functional data derived from multiplexed assays. Here, we build on these existing guidelines to provide recommendations to experimentalists for the production and reporting of multiplexed functional data and to clinicians for the evaluation and use of such data. By following these recommendations, experimentalists can produce transparent, complete, and well-validated datasets that are primed for clinical uptake. Our recommendations to clinicians and diagnostic labs on how to evaluate the quality of multiplexed functional datasets, and how different datasets could be incorporated into the ACMG/AMP variant-interpretation framework, will hopefully clarify whether and how such data should be used. The recommendations that we provide are designed to enhance the quality and utility of multiplexed functional data, and to promote their judicious use.
Affiliation(s)
- Hannah Gelman
- Department of Genome Sciences, University of Washington School of Medicine, 15th Avenue NE, Seattle, WA, 98195, USA
- Current affiliation: Center of Innovation for Veteran-Centered and Value-Driven Care, VA Puget Sound Health Care System, S Columbian Way, Seattle, WA, 98108, USA
| | - Jennifer N Dines
- Department of Genome Sciences, University of Washington School of Medicine, 15th Avenue NE, Seattle, WA, 98195, USA
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Current affiliation: Adaptive Biotechnologies, Eastlake Avenue E, Seattle, WA, 98102, USA
| | - Jonathan Berg
- Department of Genetics, University of North Carolina at Chapel Hill, Mason Farm Road, Chapel Hill, NC, 27514, USA
| | - Alice H Berger
- Human Biology Division, Fred Hutchinson Cancer Research Center, Fairview Avenue, Seattle, WA, 98109, USA
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA
| | - Sarah Brnich
- Department of Genetics, University of North Carolina at Chapel Hill, Mason Farm Road, Chapel Hill, NC, 27514, USA
| | - Fuki M Hisama
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA
| | - Richard G James
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA
- Department of Pediatrics, University of Washington School of Medicine, NE Pacific Street, Seattle, WA, 98195, USA
- Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Ninth Avenue, Seattle, WA, 98145, USA
| | - Alan F Rubin
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Royal Parade, Parkville, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Grattan Street, Melbourne, VIC, 3000, Australia
| | - Jay Shendure
- Department of Genome Sciences, University of Washington School of Medicine, 15th Avenue NE, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA
- Howard Hughes Medical Institute, Pacific Street, Seattle, WA, 98195, USA
| | - Brian Shirts
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA
- Department of Laboratory Medicine, University of Washington School of Medicine, NE Pacific Street, Seattle, WA, 98195, USA
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington School of Medicine, 15th Avenue NE, Seattle, WA, 98195, USA.
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA.
- Department of Bioengineering, University of Washington, 15th Avenue NE, Seattle, WA, 98195, USA.
| | - Lea M Starita
- Department of Genome Sciences, University of Washington School of Medicine, 15th Avenue NE, Seattle, WA, 98195, USA.
- Brotman Baty Institute for Precision Medicine, NE Pacific Street, Seattle, WA, 98195, USA.
| |
|
106
|
Tang N, Sandahl TD, Ott P, Kepp KP. Computing the Pathogenicity of Wilson's Disease ATP7B Mutations: Implications for Disease Prevalence. J Chem Inf Model 2019; 59:5230-5243. [PMID: 31751128 DOI: 10.1021/acs.jcim.9b00852] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
Genetic variations in the gene encoding the copper-transport protein ATP7B are the primary cause of Wilson's disease. Controversially, clinical prevalence seems much smaller than the prevalence estimated by genetic screening tools, causing fear that many people are undiagnosed, although early diagnosis and treatment is essential. To address this issue, we benchmarked 16 state-of-the-art computational disease-prediction methods against established data of missense ATP7B mutations. Our results show that the quality of the methods varies widely. We show the importance of optimizing the threshold of the methods used to distinguish pathogenic from nonpathogenic mutations against data of clinically confirmed pathogenic and nonpathogenic mutations. We find that most methods use thresholds that predict too many ATP7B mutations to be pathogenic. Thus, our findings explain the current controversy on Wilson's disease prevalence because meta-analysis and text search methods include many computational estimates that lead to higher disease prevalence than clinically observed. As proteins and diseases differ widely, a one-size-fits-all threshold cannot distinguish pathogenic and nonpathogenic mutations efficiently, as shown here. We also show that amino acid changes with small evolutionary substitution probability, mainly due to amino acid volume, are more associated with the disease, implying a pathological effect on the conformational state of the protein, which could affect copper transport or adenosine triphosphate recognition and hydrolysis. These findings may be a first step toward a more quantitative genotype-phenotype relationship of Wilson's disease.
Affiliation(s)
- Ning Tang
- DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
| | - Thomas D Sandahl
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
| | - Peter Ott
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
| |
|
107
|
Kim HY, Kim D. Prediction of mutation effects using a deep temporal convolutional network. Bioinformatics 2019; 36:2047-2052. [DOI: 10.1093/bioinformatics/btz873] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/28/2019] [Revised: 11/14/2019] [Accepted: 11/19/2019] [Indexed: 11/13/2022] Open
Abstract
Motivation
Accurate prediction of the effects of genetic variation is a major goal in biological research. Towards this goal, numerous machine learning models have been developed to learn information from evolutionary sequence data. The most effective method so far is a deep generative model based on the variational autoencoder (VAE) that models the distributions using a latent variable. In this study, we propose a deep autoregressive generative model named mutationTCN, which employs dilated causal convolutions and attention mechanism for the modeling of inter-residue correlations in a biological sequence.
Results
We show that this model is competitive with the VAE model when tested against a set of 42 high-throughput mutation scan experiments, with the mean improvement in Spearman rank correlation ∼0.023. In particular, our model can more efficiently capture information from multiple sequence alignments with lower effective number of sequences, such as in viral sequence families, compared with the latent variable model. Also, we extend this architecture to a semi-supervised learning framework, which shows high prediction accuracy. We show that our model enables a direct optimization of the data likelihood and allows for a simple and stable training process.
Availability and implementation
Source code is available at https://github.com/ha01994/mutationTCN.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ha Young Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
| |
|
108
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
| |
|
109
|
Unified rational protein engineering with sequence-based deep representation learning. Nat Methods 2019; 16:1315-1322. [PMID: 31636460 DOI: 10.1038/s41592-019-0598-1] [Citation(s) in RCA: 491] [Impact Index Per Article: 98.2] [Received: 04/08/2019] [Accepted: 09/11/2019] [Indexed: 01/03/2023]
Abstract
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.
|
110
|
Balak C, Benard M, Schaefer E, Iqbal S, Ramsey K, Ernoult-Lange M, Mattioli F, Llaci L, Geoffroy V, Courel M, Naymik M, Bachman KK, Pfundt R, Rump P, Ter Beest J, Wentzensen IM, Monaghan KG, McWalter K, Richholt R, Le Béchec A, Jepsen W, De Both M, Belnap N, Boland A, Piras IS, Deleuze JF, Szelinger S, Dollfus H, Chelly J, Muller J, Campbell A, Lal D, Rangasamy S, Mandel JL, Narayanan V, Huentelman M, Weil D, Piton A. Rare De Novo Missense Variants in RNA Helicase DDX6 Cause Intellectual Disability and Dysmorphic Features and Lead to P-Body Defects and RNA Dysregulation. Am J Hum Genet 2019; 105:509-525. [PMID: 31422817 PMCID: PMC6731366 DOI: 10.1016/j.ajhg.2019.07.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 03/19/2019] [Accepted: 07/17/2019] [Indexed: 01/13/2023] Open
Abstract
The human RNA helicase DDX6 is an essential component of membrane-less organelles called processing bodies (PBs). PBs are involved in mRNA metabolic processes including translational repression via coordinated storage of mRNAs. Previous studies in human cell lines have implicated altered DDX6 in molecular and cellular dysfunction, but clinical consequences and pathogenesis in humans have yet to be described. Here, we report the identification of five rare de novo missense variants in DDX6 in probands presenting with intellectual disability, developmental delay, and similar dysmorphic features including telecanthus, epicanthus, arched eyebrows, and low-set ears. All five missense variants (p.His372Arg, p.Arg373Gln, p.Cys390Arg, p.Thr391Ile, and p.Thr391Pro) are located in two conserved motifs of the RecA-2 domain of DDX6 involved in RNA binding, helicase activity, and protein-partner binding. We use functional studies to demonstrate that the first variants identified (p.Arg373Gln and p.Cys390Arg) cause significant defects in PB assembly in primary fibroblast and model human cell lines. These variants' interactions with several protein partners were also disrupted in immunoprecipitation assays. Further investigation via complementation assays included the additional variants p.Thr391Ile and p.Thr391Pro, both of which, similarly to p.Arg373Gln and p.Cys390Arg, demonstrated significant defects in P-body assembly. Complementing these molecular findings, modeling of the variants on solved protein structures showed distinct spatial clustering near known protein binding regions. Collectively, our clinical and molecular data describe a neurodevelopmental syndrome associated with pathogenic missense variants in DDX6. Additionally, we suggest DDX6 join the DExD/H-box genes DDX3X and DHX30 in an emerging class of neurodevelopmental disorders involving RNA helicases.
Affiliation(s)
- Chris Balak
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA.
| | - Marianne Benard
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie du Développement, F-75005 Paris, France
| | - Elise Schaefer
- Medical Genetics Department, University Hospitals of Strasbourg, the Institute of Medical Genetics of Alsace, 67000 Strasbourg, France; Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, 67081 Strasbourg, France
| | - Sumaiya Iqbal
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Keri Ramsey
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Michèle Ernoult-Lange
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie du Développement, F-75005 Paris, France
| | - Francesca Mattioli
- Institute of Genetics and Molecular and Cellular Biology, Illkirch, France; French National Center for Scientific Research, UMR7104, 67400 Illkirch, France; National Institute of Health and Medical Research U964, 67400 Illkirch, France; University of Strasbourg, 67081 Illkirch, France
| | - Lorida Llaci
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Véronique Geoffroy
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, 67081 Strasbourg, France
| | - Maité Courel
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie du Développement, F-75005 Paris, France
| | - Marcus Naymik
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | | | - Rolph Pfundt
- Department of Genetics, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, the Netherlands
| | - Patrick Rump
- Radboud University Nijmegen Medical Center, Department of Human Genetics, Division of Genome Diagnostics, 6525 GA Nijmegen, the Netherlands
| | - Johanna Ter Beest
- Department of Genetics, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, the Netherlands
| | | | | | | | - Ryan Richholt
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA
| | - Antony Le Béchec
- Medical Bioinformatics Unit, UF7363, Strasbourg University Hospital, 67000 Strasbourg, France
| | - Wayne Jepsen
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Matt De Both
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Newell Belnap
- Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Anne Boland
- Centre National de Recherche en Génomique Humaine, Institut de Biologie François Jacob, CEA, Université Paris-Saclay, F-91057, Evry, France
| | - Ignazio S Piras
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine, Institut de Biologie François Jacob, CEA, Université Paris-Saclay, F-91057, Evry, France
| | - Szabolcs Szelinger
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Hélène Dollfus
- Medical Genetics Department, University Hospitals of Strasbourg, the Institute of Medical Genetics of Alsace, 67000 Strasbourg, France; Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, 67081 Strasbourg, France
| | - Jamel Chelly
- Institute of Genetics and Molecular and Cellular Biology, Illkirch, France; French National Center for Scientific Research, UMR7104, 67400 Illkirch, France; National Institute of Health and Medical Research U964, 67400 Illkirch, France; University of Strasbourg, 67081 Illkirch, France; Molecular Genetics Unit, Strasbourg University Hospital, 67000 Strasbourg, France
| | - Jean Muller
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, 67081 Strasbourg, France; Molecular Genetics Unit, Strasbourg University Hospital, 67000 Strasbourg, France
| | - Arthur Campbell
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Dennis Lal
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA; Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany
| | - Sampathkumar Rangasamy
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Jean-Louis Mandel
- Institute of Genetics and Molecular and Cellular Biology, Illkirch, France; French National Center for Scientific Research, UMR7104, 67400 Illkirch, France; National Institute of Health and Medical Research U964, 67400 Illkirch, France; University of Strasbourg, 67081 Illkirch, France; University of Strasbourg Institute of Advanced Studies, 67081 Strasbourg, France
| | - Vinodh Narayanan
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Matt Huentelman
- Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ 85004, USA; Translational Genomics Research Institute's Center for Rare Childhood Disorders, Phoenix, AZ 85012, USA
| | - Dominique Weil
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie du Développement, F-75005 Paris, France
| | - Amélie Piton
- Institute of Genetics and Molecular and Cellular Biology, Illkirch, France; French National Center for Scientific Research, UMR7104, 67400 Illkirch, France; National Institute of Health and Medical Research U964, 67400 Illkirch, France; University of Strasbourg, 67081 Illkirch, France; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA.
| |
|
111
|
Pejaver V, Babbi G, Casadio R, Folkman L, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Miller M, Moult J, Pal LR, Savojardo C, Yin Y, Zhou Y, Radivojac P, Bromberg Y. Assessment of methods for predicting the effects of PTEN and TPMT protein variants. Hum Mutat 2019; 40:1495-1506. [PMID: 31184403 PMCID: PMC6744362 DOI: 10.1002/humu.23838] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/20/2019] [Revised: 05/27/2019] [Accepted: 06/06/2019] [Indexed: 01/16/2023]
Abstract
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. Because different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Affiliation(s)
- Vikas Pejaver
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
- The eScience Institute, University of Washington, Seattle, Washington
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Kunal Kundu
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, Maryland
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Lipika R Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Yizhou Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Department of Genetics, Human Genetics Institute, Rutgers University, Piscataway, New Jersey
- Institute for Advanced Study at Technische Universität München (TUM-IAS), Garching/Munich, Germany
|
112
|
Wu Y, Weile J, Cote AG, Sun S, Knapp J, Verby M, Roth FP. A web application and service for imputing and visualizing missense variant effect maps. Bioinformatics 2019; 35:3191-3193. [PMID: 30649215 PMCID: PMC6735881 DOI: 10.1093/bioinformatics/btz012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/31/2018] [Revised: 12/04/2018] [Accepted: 01/07/2019] [Indexed: 11/24/2022] Open
Abstract
Summary
The promise of personalized genomic medicine depends on our ability to assess the functional impact of rare sequence variation. Multiplexed assays can experimentally measure the functional impact of missense variants on a massive scale. However, even after such assays, many missense variants remain poorly measured. Here we describe a software pipeline and application to impute missing information in experimentally determined variant effect maps.
Availability and implementation
Web application: http://impute.varianteffect.org. Source code: https://github.com/joewuca/imputation.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingzhou Wu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Jochen Weile
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Atina G Cote
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Song Sun
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Jennifer Knapp
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Marta Verby
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Canadian Institute for Advanced Research, Toronto, ON, Canada
|
113
|
Stein A, Fowler DM, Hartmann-Petersen R, Lindorff-Larsen K. Biophysical and Mechanistic Models for Disease-Causing Protein Variants. Trends Biochem Sci 2019; 44:575-588. [PMID: 30712981 PMCID: PMC6579676 DOI: 10.1016/j.tibs.2019.01.003] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 11/20/2018] [Revised: 01/04/2019] [Accepted: 01/08/2019] [Indexed: 12/13/2022]
Abstract
The rapid decrease in DNA sequencing cost is revolutionizing medicine and science. In medicine, genome sequencing has revealed millions of missense variants that change protein sequences, yet we only understand the molecular and phenotypic consequences of a small fraction. Within protein science, high-throughput deep mutational scanning experiments enable us to probe thousands of variants in a single, multiplexed experiment. We review efforts that bring together these topics via experimental and computational approaches to determine the consequences of missense variants in proteins. We focus on the role of changes in protein stability as a driver for disease, and how experiments, biophysical models, and computation are providing a framework for understanding and predicting how changes in protein sequence affect cellular protein stability.
Affiliation(s)
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| | - Douglas M Fowler
- Departments of Genome Sciences and Bioengineering, University of Washington, Seattle, WA, USA
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
|
114
|
Wan A, Place E, Pierce EA, Comander J. Characterizing variants of unknown significance in rhodopsin: A functional genomics approach. Hum Mutat 2019; 40:1127-1144. [PMID: 30977563 PMCID: PMC7027811 DOI: 10.1002/humu.23762] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/31/2018] [Revised: 03/31/2019] [Accepted: 04/08/2019] [Indexed: 01/19/2023]
Abstract
Characterizing the pathogenicity of DNA sequence variants of unknown significance (VUS) is a major bottleneck in human genetics, and is increasingly important in determining which patients with inherited retinal diseases could benefit from gene therapy. A library of 210 rhodopsin (RHO) variants from literature and in-house genetic diagnostic testing was created to efficiently detect pathogenic RHO variants that fail to express on the cell surface. This study, while focused on RHO, demonstrates a streamlined, generalizable method for detecting pathogenic VUS. A relatively simple next-generation sequencing-based readout was developed so that a flow cytometry-based assay could be performed simultaneously on all variants in a pooled format, without the need for barcodes or viral transduction. The resulting dataset characterized the surface expression of every RHO library variant with a high degree of reproducibility (r² = 0.92-0.95), recategorizing 37 variants. For example, three retinitis pigmentosa pedigrees were solved by identifying VUS that showed low expression levels (p.G18D, p.G101V, and p.P180T). Results were validated across multiple assays and correlated with clinical disease severity. This study presents a parallelized, higher-throughput cell-based assay for the functional characterization of VUS in RHO, which can be applied more broadly to other inherited retinal disease genes and other disorders.
Affiliation(s)
- Aliete Wan
- Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
| | - Emily Place
- Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
| | - Eric A Pierce
- Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
| | - Jason Comander
- Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
|
115
|
Jakobson CM, Jarosz DF. Molecular Origins of Complex Heritability in Natural Genotype-to-Phenotype Relationships. Cell Syst 2019; 8:363-379.e3. [PMID: 31054809 PMCID: PMC6560647 DOI: 10.1016/j.cels.2019.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/24/2019] [Revised: 03/25/2019] [Accepted: 04/05/2019] [Indexed: 01/09/2023]
Abstract
The statistical complexity of heredity has long been evident, but its molecular origins remain elusive. To investigate, we charted 90 comprehensive genotype-to-phenotype maps in a large population of wild diploid yeast. In contrast to long-standing assumptions, all types of genetic variation contributed similarly to phenotype. Causal synonymous and regulatory variants exhibited distinct molecular signatures, as did nonlinearities in heterozygote fitness that likely contribute to hybrid vigor. Highly pleiotropic variants altered disordered sequences within signaling hubs, and their effects correlated across environments-even when antagonistic-suggesting that large fitness gains bring concomitant costs. Natural genetic networks defined by the causal loci differed from those determined by precise gene deletions or protein-protein interactions. Finally, we found that traits that would appear omnigenic in less powered studies do in fact have finite genetic determinants. Integrating these molecular principles will be crucial as genome reading and writing become routine in research, industry, and medicine.
Affiliation(s)
- Christopher M Jakobson
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Daniel F Jarosz
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA.
|
116
|
Soussi T, Leroy B, Devir M, Rosenberg S. High prevalence of cancer-associated TP53 variants in the gnomAD database: A word of caution concerning the use of variant filtering. Hum Mutat 2019; 40:516-524. [PMID: 30720243 DOI: 10.1002/humu.23717] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/17/2018] [Revised: 01/09/2019] [Accepted: 01/28/2019] [Indexed: 12/14/2022]
Abstract
The 1000 Genomes Project, Exome Aggregation Consortium (ExAC), and Genome Aggregation Database (gnomAD) datasets were developed to provide large-scale reference data of genetic variations for various populations, to filter out common benign variants and identify rare variants of clinical importance based on their frequency in the human population. Using a TP53 repository of 80,000 cancer variants, as well as TP53 variants from multiple cancer genome projects, we have defined a set of certified oncogenic TP53 variants. This specific set has been independently validated by functional and in silico predictive analysis. Here we show that a significant number of these variants are included in gnomAD and ExAC. Most of them correspond to TP53 hotspot variants occurring as somatic and germline events in human cancer. Similarly, disease-associated variants for five other tumor suppressor genes, BRCA1, BRCA2, APC, PTEN, and MLH1, have also been identified. This study demonstrates that germline TP53 variants in the human population are more frequent than previously thought. Furthermore, population databases such as gnomAD or ExAC must be used with caution and need to be annotated for the presence of oncogenic variants to improve their clinical utility.
Affiliation(s)
- Thierry Soussi
- UPMC Univ, Sorbonne Université, Dpt of Life Science, Paris, France; Centre de Recherche des Cordeliers, INSERM, Paris, France; Department of Oncology-Pathology, Cancer Center Karolinska (CCK), Karolinska Institutet, Stockholm, Sweden
| | - Bernard Leroy
- UPMC Univ, Sorbonne Université, Dpt of Life Science, Paris, France
| | - Michal Devir
- Laboratory for Cancer Computational Biology, Hadassah Medical Center, Hebrew University, Jerusalem, Israel
| | - Shai Rosenberg
- Laboratory for Cancer Computational Biology, Hadassah Medical Center, Hebrew University, Jerusalem, Israel; Gaffin Center for Neuro-oncology, Sharett Institute for Oncology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
|
117
|
Gauthier L, Di Franco R, Serohijos AWR. SodaPop: a forward simulation suite for the evolutionary dynamics of asexual populations on protein fitness landscapes. Bioinformatics 2019; 35:4053-4062. [DOI: 10.1093/bioinformatics/btz175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2018] [Revised: 01/21/2019] [Accepted: 03/12/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Protein evolution is determined by forces at multiple levels of biological organization. Random mutations have an immediate effect on the biophysical properties, structure and function of proteins. These same mutations also affect the fitness of the organism. However, the evolutionary fate of mutations, whether they proceed to fixation or are purged, also depends on population size and dynamics. There is an emerging interest, both theoretical and experimental, in integrating these two factors in protein evolution. Although there are several tools available for simulating protein evolution, most of them focus on either the biophysical or the population-level determinants, but not both. Hence, there is a need for a publicly available computational tool to explore both the effects of protein biophysics and population dynamics on protein evolution.
Results
To address this need, we developed SodaPop, a computational suite to simulate protein evolution in the context of the population dynamics of asexual populations. SodaPop accepts as input several fitness landscapes based on protein biochemistry or other user-defined fitness functions. The user can also provide as input experimental fitness landscapes derived from deep mutational scanning approaches or theoretical landscapes derived from physical force field estimates. Here, we demonstrate the broad utility of SodaPop with different applications describing the interplay of selection for protein properties and population dynamics. SodaPop is designed such that population geneticists can explore the influence of protein biochemistry on patterns of genetic variation, and that biochemists and biophysicists can explore the role of population size and demography on protein evolution.
Availability and implementation
Source code and binaries are freely available at https://github.com/louisgt/SodaPop under the GNU GPLv3 license. The software is implemented in C++ and supported on Linux, Mac OS/X and Windows.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louis Gauthier
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
- Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
| | - Rémicia Di Franco
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
- Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
- Enseirb-Matmeca, Bordeaux Institute of Technology, Talence, France
| | - Adrian W R Serohijos
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
- Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
|
118
|
Wagih O, Galardini M, Busby BP, Memon D, Typas A, Beltrao P. A resource of variant effect predictions of single nucleotide variants in model organisms. Mol Syst Biol 2018; 14:e8430. [PMID: 30573687 PMCID: PMC6301329 DOI: 10.15252/msb.20188430] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/02/2018] [Revised: 11/19/2018] [Accepted: 11/21/2018] [Indexed: 12/18/2022] Open
Abstract
The effect of single nucleotide variants (SNVs) in coding and noncoding regions is of great interest in genetics. Although many computational methods aim to elucidate the effects of SNVs on cellular mechanisms, it is not straightforward to comprehensively cover different molecular effects. To address this, we compiled and benchmarked sequence and structure-based variant effect predictors and we computed the impact of nearly all possible amino acid and nucleotide variants in the reference genomes of Homo sapiens, Saccharomyces cerevisiae and Escherichia coli. Studied mechanisms include protein stability, interaction interfaces, post-translational modifications and transcription factor binding sites. We apply this resource to the study of natural and disease coding variants. We also show how variant effects can be aggregated to generate protein complex burden scores that uncover protein complex to phenotype associations based on a set of newly generated growth profiles of 93 sequenced S. cerevisiae strains in 43 conditions. This resource is available through mutfunc (www.mutfunc.com), a tool by which users can query precomputed predictions by providing amino acid or nucleotide-level variants.
Affiliation(s)
- Omar Wagih
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Marco Galardini
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Bede P Busby
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Danish Memon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Athanasios Typas
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
|
119
|
Ulirsch JC, Verboon JM, Kazerounian S, Guo MH, Yuan D, Ludwig LS, Handsaker RE, Abdulhay NJ, Fiorini C, Genovese G, Lim ET, Cheng A, Cummings BB, Chao KR, Beggs AH, Genetti CA, Sieff CA, Newburger PE, Niewiadomska E, Matysiak M, Vlachos A, Lipton JM, Atsidaftos E, Glader B, Narla A, Gleizes PE, O'Donohue MF, Montel-Lehry N, Amor DJ, McCarroll SA, O'Donnell-Luria AH, Gupta N, Gabriel SB, MacArthur DG, Lander ES, Lek M, Da Costa L, Nathan DG, Korostelev AA, Do R, Sankaran VG, Gazda HT. The Genetic Landscape of Diamond-Blackfan Anemia. Am J Hum Genet 2018; 103:930-947. [PMID: 30503522 PMCID: PMC6288280 DOI: 10.1016/j.ajhg.2018.10.027] [Citation(s) in RCA: 159] [Impact Index Per Article: 26.5] [Received: 07/10/2018] [Accepted: 10/29/2018] [Indexed: 01/19/2023] Open
Abstract
Diamond-Blackfan anemia (DBA) is a rare bone marrow failure disorder that affects 7 out of 1,000,000 live births and has been associated with mutations in components of the ribosome. In order to characterize the genetic landscape of this heterogeneous disorder, we recruited a cohort of 472 individuals with a clinical diagnosis of DBA and performed whole-exome sequencing (WES). We identified relevant rare and predicted damaging mutations for 78% of individuals. The majority of mutations were singletons, absent from population databases, predicted to cause loss of function, and located in 1 of 19 previously reported ribosomal protein (RP)-encoding genes. Using exon coverage estimates, we identified and validated 31 deletions in RP genes. We also observed an enrichment for extended splice site mutations and validated their diverse effects using RNA sequencing in cell lines obtained from individuals with DBA. Leveraging the size of our cohort, we observed robust genotype-phenotype associations with congenital abnormalities and treatment outcomes. We further identified rare mutations in seven previously unreported RP genes that may cause DBA, as well as several distinct disorders that appear to phenocopy DBA, including nine individuals with biallelic CECR1 mutations that result in deficiency of ADA2. However, no new genes were identified at exome-wide significance, suggesting that there are no unidentified genes containing mutations readily identified by WES that explain >5% of DBA-affected case subjects. Overall, this report should inform not only clinical practice for DBA-affected individuals, but also the design and analysis of rare variant studies for heterogeneous Mendelian disorders.
Affiliation(s)
- Jacob C Ulirsch
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | - Jeffrey M Verboon
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Shideh Kazerounian
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Michael H Guo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Daniel Yuan
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Leif S Ludwig
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Nour J Abdulhay
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Claudia Fiorini
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Elaine T Lim
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aaron Cheng
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Beryl B Cummings
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | - Katherine R Chao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Alan H Beggs
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Casie A Genetti
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Colin A Sieff
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Peter E Newburger
- Department of Pediatrics, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Edyta Niewiadomska
- Department of Pediatric Hematology/Oncology, Medical University of Warsaw, Warsaw, Poland
| | - Michal Matysiak
- Department of Pediatric Hematology/Oncology, Medical University of Warsaw, Warsaw, Poland
| | - Adrianna Vlachos
- Feinstein Institute for Medical Research, Manhasset, NY; Division of Hematology/Oncology and Stem Cell Transplantation, Cohen Children's Medical Center, New Hyde Park, NY; Hofstra Northwell School of Medicine, Hempstead, NY 11030, USA
| | - Jeffrey M Lipton
- Feinstein Institute for Medical Research, Manhasset, NY; Division of Hematology/Oncology and Stem Cell Transplantation, Cohen Children's Medical Center, New Hyde Park, NY; Hofstra Northwell School of Medicine, Hempstead, NY 11030, USA
| | - Eva Atsidaftos
- Feinstein Institute for Medical Research, Manhasset, NY; Division of Hematology/Oncology and Stem Cell Transplantation, Cohen Children's Medical Center, New Hyde Park, NY; Hofstra Northwell School of Medicine, Hempstead, NY 11030, USA
| | - Bertil Glader
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 02114, USA
| | - Anupama Narla
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 02114, USA
| | - Pierre-Emmanuel Gleizes
- Laboratory of Eukaryotic Molecular Biology, Center for Integrative Biology (CBI), University of Toulouse, CNRS, Toulouse, France
| | - Marie-Françoise O'Donohue
- Laboratory of Eukaryotic Molecular Biology, Center for Integrative Biology (CBI), University of Toulouse, CNRS, Toulouse, France
| | - Nathalie Montel-Lehry
- Laboratory of Eukaryotic Molecular Biology, Center for Integrative Biology (CBI), University of Toulouse, CNRS, Toulouse, France
| | - David J Amor
- Murdoch Children's Research Institute and Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Anne H O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Stacey B Gabriel
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Eric S Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Monkol Lek
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Lydie Da Costa
- University Paris VII Denis DIDEROT, Faculté de Médecine Xavier Bichat, 75019 Paris, France; Laboratory of Excellence for Red Cell, LABEX GR-Ex, 75015 Paris, France
| | - David G Nathan
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Andrei A Korostelev
- RNA Therapeutics Institute, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605, USA
| | - Ron Do
- Department of Genetics and Genomic Sciences and The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Vijay G Sankaran
- Division of Hematology/Oncology, The Manton Center for Orphan Disease Research, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA.
| | - Hanna T Gazda
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA.
|
120
|
Zhou Y, Fujikura K, Mkrtchian S, Lauschke VM. Computational Methods for the Pharmacogenetic Interpretation of Next Generation Sequencing Data. Front Pharmacol 2018; 9:1437. [PMID: 30564131 PMCID: PMC6288784 DOI: 10.3389/fphar.2018.01437] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 08/03/2018] [Accepted: 11/20/2018] [Indexed: 12/21/2022] Open
Abstract
Up to half of all patients do not respond to pharmacological treatment as intended. A substantial fraction of these inter-individual differences is due to heritable factors, and a growing number of associations between genetic variations and drug response phenotypes have been identified. Importantly, the rapid progress in Next Generation Sequencing technologies in recent years has unveiled the true complexity of the genetic landscape in pharmacogenes, with tens of thousands of rare genetic variants. As each individual was found to harbor numerous such rare variants, they are anticipated to be important contributors to the genetically encoded inter-individual variability in drug effects. The fundamental challenge, however, is their functional interpretation, because the sheer scale of the problem renders systematic experimental characterization of these variants currently unfeasible. Here, we review concepts and important progress in the development of computational prediction methods that allow evaluation of the effect of amino acid sequence alterations in drug metabolizing enzymes and transporters. In addition, we discuss recent advances in the interpretation of functional effects of non-coding variants, such as variations in splice sites, regulatory regions and miRNA binding sites. We anticipate that these methodologies will provide a useful toolkit to facilitate the integration of the vast extent of rare genetic variability into drug response predictions in a precision medicine framework.
Affiliation(s)
- Yitian Zhou
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Kohei Fujikura
- Department of Diagnostic Pathology, Kobe University Graduate School of Medicine, Kobe, Japan
| | - Souren Mkrtchian
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Volker M. Lauschke
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
|
121
|
Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet 2018; 137:665-678. [PMID: 30073413 PMCID: PMC6153521 DOI: 10.1007/s00439-018-1916-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/21/2018] [Accepted: 07/21/2018] [Indexed: 12/12/2022]
Abstract
Given the constantly improving cost and speed of genome sequencing, it is reasonable to expect that personal genomes will soon be known for many millions of humans. This stands in stark contrast with our limited ability to interpret the sequence variants that we find. Although it is, perhaps, easiest to interpret variants in coding regions, the functional impact is unknown for the vast majority of missense variants. While many computational approaches can predict the impact of coding variants, they are given little weight in the current guidelines for interpreting clinical variants. Laboratory assays produce comparatively more trustworthy results, but until recently did not scale to the space of all possible mutations. The development of deep mutational scanning and other multiplexed assays of variant effect has now brought feasibility of this endeavour within view. Here, we review progress in this field over the last decade, break down the different approaches into their components, and compare methodological differences.
|
122
|
Mighell TL, Evans-Dutson S, O'Roak BJ. A Saturation Mutagenesis Approach to Understanding PTEN Lipid Phosphatase Activity and Genotype-Phenotype Relationships. Am J Hum Genet 2018; 102:943-955. [PMID: 29706350 PMCID: PMC5986715 DOI: 10.1016/j.ajhg.2018.03.018] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Received: 02/08/2018] [Accepted: 03/16/2018] [Indexed: 12/19/2022] Open
Abstract
Phosphatase and tensin homolog (PTEN) is a tumor suppressor frequently mutated in diverse cancers. Germline PTEN mutations are also associated with a range of clinical outcomes, including PTEN hamartoma tumor syndrome (PHTS) and autism spectrum disorder (ASD). To empower new insights into PTEN function and clinically relevant genotype-phenotype relationships, we systematically evaluated the effect of PTEN mutations on lipid phosphatase activity in vivo. Using a massively parallel approach that leverages an artificial humanized yeast model, we derived high-confidence estimates of functional impact for 7,244 single amino acid PTEN variants (86% of possible). We identified 2,273 mutations with reduced cellular lipid phosphatase activity, which includes 1,789 missense mutations. These data recapitulated known functional findings but also uncovered new insights into PTEN protein structure, biochemistry, and mutation tolerance. Several residues in the catalytic pocket showed surprising mutational tolerance. We identified that the solvent exposure of wild-type residues is a critical determinant of mutational tolerance. Further, we created a comprehensive functional map by leveraging correlations between amino acid substitutions to impute functional scores for all variants, including those not present in the assay. Variant functional scores can reliably discriminate likely pathogenic from benign alleles. Further, 32% of ClinVar unclassified missense variants are phosphatase deficient in our assay, supporting their reclassification. ASD-associated mutations generally had less severe fitness scores relative to PHTS-associated mutations (p = 7.16 × 10⁻⁵) and a higher fraction of hypomorphic mutations, arguing for continued genotype-phenotype studies in larger clinical datasets that can further leverage these rich functional data.
Affiliation(s)
- Taylor L Mighell
- Neuroscience Graduate Program, Oregon Health & Science University, Portland, OR 97239, USA; Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
| | - Sara Evans-Dutson
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
| | - Brian J O'Roak
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA.
|