1
|
Thompson MD, Reiner-Link D, Berghella A, Rana BK, Rovati GE, Capra V, Gorvin CM, Hauser AS. G protein-coupled receptor (GPCR) pharmacogenomics. Crit Rev Clin Lab Sci 2024:1-44. [PMID: 39119983 DOI: 10.1080/10408363.2024.2358304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 09/03/2023] [Accepted: 05/18/2024] [Indexed: 08/10/2024]
Abstract
The field of pharmacogenetics, the investigation of the influence of one or more sequence variants on drug response phenotypes, is a special case of pharmacogenomics, a discipline that takes a genome-wide approach. Massively parallel, next generation sequencing (NGS), has allowed pharmacogenetics to be subsumed by pharmacogenomics with respect to the identification of variants associated with responders and non-responders, optimal drug response, and adverse drug reactions. A plethora of rare and common naturally-occurring GPCR variants must be considered in the context of signals from across the genome. Many fundamentals of pharmacogenetics were established for G protein-coupled receptor (GPCR) genes because they are primary targets for a large number of therapeutic drugs. Functional studies, demonstrating likely-pathogenic and pathogenic GPCR variants, have been integral to establishing models used for in silico analysis. Variants in GPCR genes include both coding and non-coding single nucleotide variants and insertion or deletions (indels) that affect cell surface expression (trafficking, dimerization, and desensitization/downregulation), ligand binding and G protein coupling, and variants that result in alternate splicing encoding isoforms/variable expression. As the breadth of data on the GPCR genome increases, we may expect an increase in the use of drug labels that note variants that significantly impact the clinical use of GPCR-targeting agents. We discuss the implications of GPCR pharmacogenomic data derived from the genomes available from individuals who have been well-phenotyped for receptor structure and function and receptor-ligand interactions, and the potential benefits to patients of optimized drug selection. Examples discussed include the renin-angiotensin system in SARS-CoV-2 (COVID-19) infection, the probable role of chemokine receptors in the cytokine storm, and potential protease activating receptor (PAR) interventions. Resources dedicated to GPCRs, including publicly available computational tools, are also discussed.
Collapse
Affiliation(s)
- Miles D Thompson
- Krembil Brain Institute, Toronto Western Hospital, Toronto, Ontario, Canada
| | - David Reiner-Link
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Alessandro Berghella
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Brinda K Rana
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - G Enrico Rovati
- Department of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Valerie Capra
- Department of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Caroline M Gorvin
- Institute of Metabolism and Systems Research (IMSR), University of Birmingham, Birmingham, United Kingdom
| | - Alexander S Hauser
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
2
|
Bromberg Y, Prabakaran R, Kabir A, Shehu A. Variant Effect Prediction in the Age of Machine Learning. Cold Spring Harb Perspect Biol 2024; 16:a041467. [PMID: 38621825 PMCID: PMC11216171 DOI: 10.1101/cshperspect.a041467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/17/2024]
Abstract
Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from the unannotated protein sequence data well enough to identify significant errors in the protein "sentences"? Our analysis suggests that some unsupervised methods perform as well or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, their performance varies by both evaluation metrics and by the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins where unsupervised methods hold the most promise.
Collapse
Affiliation(s)
- Yana Bromberg
- Department of Biology, Emory University, Atlanta 30322, Georgia, USA
- Department of Computer Science, Emory University, Atlanta 30322, Georgia, USA
| | - R Prabakaran
- Department of Biology, Emory University, Atlanta 30322, Georgia, USA
| | - Anowarul Kabir
- Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax 22030, Virginia, USA
| |
Collapse
|
3
|
Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021; 11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023] Open
Abstract
One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress many binding sites remain obscure. Here, we proposed bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: For the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable-neither using structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
Collapse
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Konstantin Weissenow
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
| |
Collapse
|
4
|
In silico analyses of predicted substitutions in fibrinolytic protein ‘Lumbrokinase-6’ suggest enhanced activity. Process Biochem 2021. [DOI: 10.1016/j.procbio.2021.08.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
5
|
Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants. PLoS Biol 2021; 19:e3001207. [PMID: 33909605 PMCID: PMC8110273 DOI: 10.1371/journal.pbio.3001207] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2020] [Revised: 05/10/2021] [Accepted: 03/26/2021] [Indexed: 12/27/2022] Open
Abstract
Missense variants are present amongst the healthy population, but some of them are causative of human diseases. A classification of variants associated with “healthy” or “diseased” states is therefore not always straightforward. A deeper understanding of the nature of missense variants in health and disease, the cellular processes they may affect, and the general molecular principles which underlie these differences is essential to offer mechanistic explanations of the true impact of pathogenic variants. Here, we have formalised a statistical framework which enables robust probabilistic quantification of variant enrichment across full-length proteins, their domains, and 3D structure-defined regions. Using this framework, we validate and extend previously reported trends of variant enrichment in different protein structural regions (surface/core/interface). By examining the association of variant enrichment with available functional pathways and transcriptomic and proteomic (protein half-life, thermal stability, abundance) data, we have mined a rich set of molecular features which distinguish between pathogenic and population variants: Pathogenic variants mainly affect proteins involved in cell proliferation and nucleotide processing and are enriched in more abundant proteins. Additionally, rare population variants display features closer to common than pathogenic variants. We validate the association between these molecular features and variant pathogenicity by comparing against existing in silico variant impact annotations. This study provides molecular details into how different proteins exhibit resilience and/or sensitivity towards missense variants and provides the rationale to prioritise variant-enriched proteins and protein domains for therapeutic targeting and development. The ZoomVar database, which we created for this study, is available at fraternalilab.kcl.ac.uk/ZoomVar. It allows users to programmatically annotate missense variants with protein structural information and to calculate variant enrichment in different protein structural regions. How do can one improve the classification of genetic variants as harmful or harmless? This study uses a robust statistical analysis to exploit the interplay between protein structure, proteomic measurements and functional pathways to enable better discrimination between missense variants in health and disease.
Collapse
|
6
|
Nawar N, Paul A, Mahmood HN, Faisal MI, Hosen MI, Shekhar HU. Structure analysis of deleterious nsSNPs in human PALB2 protein for functional inference. Bioinformation 2021; 17:424-438. [PMID: 34092963 PMCID: PMC8131579 DOI: 10.6026/97320630017424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2021] [Revised: 03/15/2021] [Accepted: 03/18/2021] [Indexed: 11/23/2022] Open
Abstract
Partner and Localizer of BRCA2 or PALB2 is a typical tumor suppressor protein, that responds to DNA double stranded breaks through homologous recombination repair. Heterozygous mutations in PALB2 are known to contribute to the susceptibility of breast and ovarian cancer. However, there is no comprehensive study characterizing the structural and functional impacts of SNPs located in the PALB2 gene. Therefore, it is of interest to document a comprehensive analysis of coding and non-coding SNPs located at the PALB2 loci using in silico tools. The data for 1455 non-synonymous SNPs (nsSNPs) located in the PALB2 loci were retrieved from the dbSNP database. Comprehensive characterization of the SNPs using a combination of in silico tools such as SIFT, PROVEAN, PolyPhen, PANTHER, PhD-SNP, Pmut, MutPred 2.0 and SNAP-2, identified 28 functionally important SNPs. Among these, 16 nsSNPs were further selected for structural analysis using conservation profile and protein stability. The most deleterious nsSNPs were documented within the WD40 domain of PALB2. A general outline of the structural consequences of each variant was developed using the HOPE project data. These 16 mutant structures were further modelled using SWISS Model and three most damaging mutant models (rs78179744, rs180177123 and rs45525135) were identified. The non-coding SNPs in the 3' UTR region of the PALB2 gene were analyzed for altered miRNA target sites. The comprehensive characterization of the coding and non-coding SNPs in the PALB2 locus has provided a list of damaging SNPs with potential disease association. Further validation through genetic association study will reveal their clinical significance.
Collapse
Affiliation(s)
- Noshin Nawar
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| | - Anik Paul
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| | - Hamida Nooreen Mahmood
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| | - Md Ismail Faisal
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| | - Md Ismail Hosen
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| | - Hossain Uddin Shekhar
- Clinical Biochemistry and Translational Medicine Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh
| |
Collapse
|
7
|
Mahlich Y, Miller M, Zeng Z, Bromberg Y. Low Diversity of Human Variation Despite Mostly Mild Functional Impact of De Novo Variants. Front Mol Biosci 2021; 8:635382. [PMID: 33816556 PMCID: PMC8012514 DOI: 10.3389/fmolb.2021.635382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Accepted: 02/01/2021] [Indexed: 01/07/2023] Open
Abstract
Non-synonymous Single Nucleotide Variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remain sparse – a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.
Collapse
Affiliation(s)
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Maximillian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States.,Department of Genetics, Rutgers University, Piscataway, NJ, United States
| |
Collapse
|
8
|
Qiu J, Nechaev D, Rost B. Protein-protein and protein-nucleic acid binding residues important for common and rare sequence variants in human. BMC Bioinformatics 2020; 21:452. [PMID: 33050876 PMCID: PMC7557062 DOI: 10.1186/s12859-020-03759-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Accepted: 09/16/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Any two unrelated people differ by about 20,000 missense mutations (also referred to as SAVs: Single Amino acid Variants or missense SNV). Many SAVs have been predicted to strongly affect molecular protein function. Common SAVs (> 5% of population) were predicted to have, on average, more effect on molecular protein function than rare SAVs (< 1% of population). We hypothesized that the prevalence of effect in common over rare SAVs might partially be caused by common SAVs more often occurring at interfaces of proteins with other proteins, DNA, or RNA, thereby creating subgroup-specific phenotypes. We analyzed SAVs from 60,706 people through the lens of two prediction methods, one (SNAP2) predicting the effects of SAVs on molecular protein function, the other (ProNA2020) predicting residues in DNA-, RNA- and protein-binding interfaces. RESULTS Three results stood out. Firstly, SAVs predicted to occur at binding interfaces were predicted to more likely affect molecular function than those predicted as not binding (p value < 2.2 × 10-16). Secondly, for SAVs predicted to occur at binding interfaces, common SAVs were predicted more strongly with effect on protein function than rare SAVs (p value < 2.2 × 10-16). Restriction to SAVs with experimental annotations confirmed all results, although the resulting subsets were too small to establish statistical significance for any result. Thirdly, the fraction of SAVs predicted at binding interfaces differed significantly between tissues, e.g. urinary bladder tissue was found abundant in SAVs predicted at protein-binding interfaces, and reproductive tissues (ovary, testis, vagina, seminal vesicle and endometrium) in SAVs predicted at DNA-binding interfaces. CONCLUSIONS Overall, the results suggested that residues at protein-, DNA-, and RNA-binding interfaces contributed toward predicting that common SAVs more likely affect molecular function than rare SAVs.
Collapse
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany. .,TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany. .,Biobank of Ninth People's Hospital, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200125, China.
| | - Dmitrii Nechaev
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany.,TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany.,Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching, Munich, Germany.,Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354, Freising, Germany
| |
Collapse
|
9
|
Protein-Protein Interactions Mediated by Intrinsically Disordered Protein Regions Are Enriched in Missense Mutations. Biomolecules 2020; 10:biom10081097. [PMID: 32722039 PMCID: PMC7463635 DOI: 10.3390/biom10081097] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2020] [Revised: 07/15/2020] [Accepted: 07/20/2020] [Indexed: 12/27/2022] Open
Abstract
Because proteins are fundamental to most biological processes, many genetic diseases can be traced back to single nucleotide variants (SNVs) that cause changes in protein sequences. However, not all SNVs that result in amino acid substitutions cause disease as each residue is under different structural and functional constraints. Influential studies have shown that protein–protein interaction interfaces are enriched in disease-associated SNVs and depleted in SNVs that are common in the general population. These studies focus primarily on folded (globular) protein domains and overlook the prevalent class of protein interactions mediated by intrinsically disordered regions (IDRs). Therefore, we investigated the enrichment patterns of missense mutation-causing SNVs that are associated with disease and cancer, as well as those present in the healthy population, in structures of IDR-mediated interactions with comparisons to classical globular interactions. When comparing the different categories of interaction interfaces, division of the interface regions into solvent-exposed rim residues and buried core residues reveal distinctive enrichment patterns for the various types of missense mutations. Most notably, we demonstrate a strong enrichment at the interface core of interacting IDRs in disease mutations and its depletion in neutral ones, which supports the view that the disruption of IDR interactions is a mechanism underlying many diseases. Intriguingly, we also found an asymmetry across the IDR interaction interface in the enrichment of certain missense mutation types, which may hint at an increased variant tolerance and urges further investigations of IDR interactions.
Collapse
|
10
|
Reeb J, Wirth T, Rost B. Variant effect predictions capture some aspects of deep mutational scanning experiments. BMC Bioinformatics 2020; 21:107. [PMID: 32183714 PMCID: PMC7077003 DOI: 10.1186/s12859-020-3439-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Accepted: 03/03/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Deep mutational scanning (DMS) studies exploit the mutational landscape of sequence variation by systematically and comprehensively assaying the effect of single amino acid variants (SAVs; also referred to as missense mutations, or non-synonymous Single Nucleotide Variants - missense SNVs or nsSNVs) for particular proteins. We assembled SAV annotations from 22 different DMS experiments and normalized the effect scores to evaluate variant effect prediction methods. Three trained on traditional variant effect data (PolyPhen-2, SIFT, SNAP2), a regression method optimized on DMS data (Envision), and a naïve prediction using conservation information from homologs. RESULTS On a set of 32,981 SAVs, all methods captured some aspects of the experimental effect scores, albeit not the same. Traditional methods such as SNAP2 correlated slightly more with measurements and better classified binary states (effect or neutral). Envision appeared to better estimate the precise degree of effect. Most surprising was that the simple naïve conservation approach using PSI-BLAST in many cases outperformed other methods. All methods captured beneficial effects (gain-of-function) significantly worse than deleterious (loss-of-function). For the few proteins with multiple independent experimental measurements, experiments differed substantially, but agreed more with each other than with predictions. CONCLUSIONS DMS provides a new powerful experimental means of understanding the dynamics of the protein sequence space. As always, promising new beginnings have to overcome challenges. While our results demonstrated that DMS will be crucial to improve variant effect prediction methods, data diversity hindered simplification and generalization.
Collapse
Affiliation(s)
- Jonas Reeb
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany.
| | - Theresa Wirth
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
| |
Collapse
|
11
|
Iourov IY, Vorsanova SG, Yurov YB. The variome concept: focus on CNVariome. Mol Cytogenet 2019; 12:52. [PMID: 31890032 PMCID: PMC6924070 DOI: 10.1186/s13039-019-0467-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2019] [Accepted: 12/13/2019] [Indexed: 02/07/2023] Open
Abstract
Background Variome may be used for designating complex system of interplay between genomic variations specific for an individual or a disease. Despite the recognized complexity of genomic basis for phenotypic traits and diseases, studies of genetic causes of a disease are usually dedicated to the identification of single causative genomic changes (mutations). When such an artificially simplified model is employed, genomic basis of phenotypic outcomes remains elusive in the overwhelming majority of human diseases. Moreover, it is repeatedly demonstrated that multiple genomic changes within an individual genome are likely to underlie the phenome. Probably the best example of cumulative effect of variome on the phenotype is CNV (copy number variation) burden. Accordingly, we have proposed a variome concept based on CNV studies providing the evidence for the existence of a CNVariome (the set of CNV affecting an individual genome), a target for genomic analyses useful for unraveling genetic mechanisms of diseases and phenotypic traits. Conclusion Variome (CNVariome) concept suggests that a genomic milieu is determined by the whole set of genomic variations (CNV) within an individual genome. The genomic milieu is likely to result from interplay between these variations. Furthermore, such kind of variome may be either individual or disease-specific. Additionally, such variome may be pathway-specific. The latter is able to affect molecular/cellular pathways of genome stability maintenance leading to occurrence of genomic/chromosome instability and/or somatic mosaicism resulting in somatic variome. This variome type seems to be important for unraveling disease mechanisms, as well. Finally, it appears that bioinformatic analysis of both individual and somatic variomes in the context of diseases- and pathway-specific variomes is the most promising way to determine genomic basis of the phenome and to unravel disease mechanisms for the management and treatment of currently incurable diseases.
Collapse
Affiliation(s)
- Ivan Y Iourov
- Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia.,2Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| | - Svetlana G Vorsanova
- Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia.,2Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| | - Yuri B Yurov
- Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia.,2Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| |
Collapse
|
12
|
Zeng Z, Bromberg Y. Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives. Front Genet 2019; 10:914. [PMID: 31649718 PMCID: PMC6791167 DOI: 10.3389/fgene.2019.00914] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Accepted: 08/29/2019] [Indexed: 12/13/2022] Open
Abstract
Recent advances in high-throughput experimentation have put the exploration of genome sequences at the forefront of precision medicine. In an effort to interpret the sequencing data, numerous computational methods have been developed for evaluating the effects of genome variants. Interestingly, despite the fact that every person has as many synonymous (sSNV) as non-synonymous single nucleotide variants, our ability to predict their effects is limited. The paucity of experimentally tested sSNV effects appears to be the limiting factor in development of such methods. Here, we summarize the details and evaluate the performance of nine existing computational methods capable of predicting sSNV effects. We used a set of observed and artificially generated variants to approximate large scale performance expectations of these tools. We note that the distribution of these variants across amino acid and codon types suggests purifying evolutionary selection retaining generated variants out of the observed set; i.e., we expect the generated set to be enriched for deleterious variants. Closer inspection of the relationship between the observed variant frequencies and the associated prediction scores identifies predictor-specific scoring thresholds of reliable effect predictions. Notably, across all predictors, the variants scoring above these thresholds were significantly more often generated than observed. which confirms our assumption that the generated set is enriched for deleterious variants. Finally, we find that while the methods differ in their ability to identify severe sSNV effects, no predictor appears capable of definitively recognizing subtle effects of such variants on a large scale.
Collapse
Affiliation(s)
- Zishuo Zeng
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, NJ, United States
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Department of Genetics, Rutgers University, Human Genetics Institute, Piscataway, NJ, United States
| |
Collapse
|
13
|
Pejaver V, Babbi G, Casadio R, Folkman L, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Miller M, Moult J, Pal LR, Savojardo C, Yin Y, Zhou Y, Radivojac P, Bromberg Y. Assessment of methods for predicting the effects of PTEN and TPMT protein variants. Hum Mutat 2019; 40:1495-1506. [PMID: 31184403 PMCID: PMC6744362 DOI: 10.1002/humu.23838] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2019] [Revised: 05/27/2019] [Accepted: 06/06/2019] [Indexed: 01/16/2023]
Abstract
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerging as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Collapse
Affiliation(s)
- Vikas Pejaver
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
- The eScience Institute, University of Washington, Seattle, Washington
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Kunal Kundu
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, Maryland
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Lipika R Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Yizhou Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Department of Genetics, Human Genetics Institute, Rutgers University, Piscataway, New Jersey
- Institute for Advanced Study at Technische Universität München (TUM-IAS), Garching/Munich, Germany
| |
Collapse
|
14
|
Patel R, Kumar S. On estimating evolutionary probabilities of population variants. BMC Evol Biol 2019; 19:133. [PMID: 31238981 PMCID: PMC6593550 DOI: 10.1186/s12862-019-1455-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 06/06/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. EP of an allele represents an independent evolutionary expectation of observing an allele in a population based solely on the long-term substitution patterns captured in a multiple sequence alignment. In the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating with high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes a general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate if the modified EP approach produces results that are similar to those from the original method. RESULTS We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times. We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. CONCLUSION We conclude that the modified EP approach will be generally applicable for alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
Collapse
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA. .,Department of Biology, Temple University, Philadelphia, PA, 19122, USA. .,Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia.
| |
Collapse
|
15
|
Churko JM. Linking Clinical Parameters and Genotype in Dilated Cardiomyopathy. Circ Heart Fail 2018; 11:e005459. [PMID: 30525987 PMCID: PMC6291840 DOI: 10.1161/circheartfailure.118.005459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Affiliation(s)
- Jared M Churko
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson
| |
Collapse
|