1
Álvarez-Machancoses Ó, Faraggi E, deAndrés-Galiana EJ, Fernández-Martínez JL, Kloczkowski A. Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler. Curr Genomics 2024; 25:171-184. [PMID: 39086995 PMCID: PMC11288160 DOI: 10.2174/0113892029236347240308054538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 08/03/2023] [Accepted: 09/22/2023] [Indexed: 08/02/2024]
Abstract
Background: Single Amino Acid Polymorphisms (SAPs), or nonsynonymous Single Nucleotide Variants (nsSNVs), are the most common genetic variations. They result from missense mutations, in which a single base-pair substitution changes the codon (the triplet of bases) at a given position so that it codes for a different amino acid. Because genetic mutations sometimes cause genetic diseases, it is important to understand and predict which variations are harmful and which are neutral (causing no change in the phenotype). This can be posed as a classification problem. Methods: Computational methods based on machine intelligence are gradually replacing repetitive and very expensive mutagenesis tests. However, the uneven quality, gaps, and inconsistencies of nsSNV datasets limit the usefulness of artificial intelligence-based methods, so more robust and more accurate approaches are needed. In the present paper, we describe a consensus classifier built on the holdout sampler, which appears robust and accurate and outperforms all other popular methods. Results: We generated 100 holdouts to test the structures and classification variables of diverse classifiers during the training phase. The best-performing holdouts were selected to build a consensus classifier, which was tested using k-fold (1 ≤ k ≤ 5) cross-validation. We also examined which protein properties have the greatest impact on the accurate prediction of the effects of nsSNVs. Conclusion: Our Consensus Holdout Sampler outperforms other popular algorithms and gives highly accurate results with low standard deviation. The advantage of our method stems from using a tree of holdouts, in which diverse LM/AI-based programs are sampled in diverse ways.
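The holdout-and-consensus idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-feature threshold classifier, the toy scores, and the names `fit_threshold`, `holdout_models`, and `consensus_predict` are all assumptions made for illustration. The sketch draws 100 random holdouts, fits a simple model on each training part, ranks the models by accuracy on the held-out part, and lets the best ones vote.

```python
import random

def fit_threshold(train):
    """Pick the cut-off on a single score that maximizes training accuracy.

    `train` is a list of (score, label) pairs, label 1 = deleterious, 0 = neutral.
    This toy classifier is an assumption; the paper samples diverse classifiers.
    """
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = sum((x >= t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def holdout_models(data, n_holdouts=100, test_fraction=0.25, seed=0):
    """Train one model per random holdout; return (held-out accuracy, threshold) pairs."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_holdouts):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        t = fit_threshold(train)
        acc = sum((x >= t) == bool(y) for x, y in test) / len(test)
        models.append((acc, t))
    return models

def consensus_predict(models, x, top_k=11):
    """Majority vote of the top_k holdout models; a tie is resolved as neutral (0)."""
    best = sorted(models, reverse=True)[:top_k]
    votes = [x >= t for _, t in best]
    return int(sum(votes) * 2 > len(votes))

# Hypothetical toy data: low scores are neutral, high scores are deleterious.
data = [(i / 100, 0) for i in range(0, 40)] + [(i / 100, 1) for i in range(60, 100)]
models = holdout_models(data)
print(consensus_predict(models, 0.9), consensus_predict(models, 0.1))
```

Ranking by held-out accuracy before voting is what distinguishes this from plain bagging: poorly generalizing holdouts never reach the vote.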
Affiliation(s)
- Óscar Álvarez-Machancoses
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Eshel Faraggi
- School of Science, Indiana University-Purdue University Indianapolis, IN, USA
- Enrique J deAndrés-Galiana
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Department of Computer Science, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Juan L Fernández-Martínez
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Andrzej Kloczkowski
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA
2
Hauser BM, Luo Y, Nathan A, Al-Moujahed A, Vavvas DG, Comander J, Pierce EA, Place EM, Bujakowska KM, Gaiha GD, Rossin EJ. Structure-based network analysis predicts pathogenic variants in human proteins associated with inherited retinal disease. NPJ Genom Med 2024; 9:31. [PMID: 38802398 PMCID: PMC11130145 DOI: 10.1038/s41525-024-00416-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 05/02/2024] [Indexed: 05/29/2024]
Abstract
Advances in gene sequencing technologies have accelerated the identification of genetic variants, but better tools are needed to understand which are causal of disease. This would be particularly useful in fields where gene therapy is a potential therapeutic modality for a disease-causing variant, such as inherited retinal disease (IRD). Here, we apply structure-based network analysis (SBNA), which has been successfully utilized to identify variant-constrained amino acid residues in viral proteins, to identify residues that may cause IRD if subject to missense mutation. SBNA is based entirely on structural first principles and is not fit to specific outcome data, which makes it distinct from other contemporary missense prediction tools. In 4 well-studied human disease-associated proteins (BRCA1, HRAS, PTEN, and ERK2) with high-quality structural data, we find that SBNA scores correlate strongly with deep mutagenesis data. When applied to 47 IRD genes with available high-quality crystal structure data, SBNA scores reliably identified disease-causing variants according to phenotype definitions from the ClinVar database. Finally, we applied this approach to 63 patients at Massachusetts Eye and Ear (MEE) with IRD but for whom no genetic cause had been identified. Untrained models built using SBNA scores and BLOSUM62 scores for IRD-associated genes successfully predicted the pathogenicity of novel variants (AUC = 0.851), allowing us to identify likely causative disease variants in 40 IRD patients. Model performance was further augmented by incorporating orthogonal data from EVE scores (AUC = 0.927), which are based on evolutionary multiple sequence alignments. In conclusion, SBNA can be used to successfully identify variants as causal of disease in human proteins and may help predict variants causative of IRD in an unbiased fashion.
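For readers unfamiliar with the AUC figures quoted in this abstract, the following minimal sketch shows how an area under the ROC curve can be computed from per-variant scores. The scores and labels are invented for illustration and are not study data, and the function name `roc_auc` is an assumption. The AUC is taken here as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = pathogenic, 0 = benign. Ties between a pathogenic and a
    benign score contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for six variants (not taken from the study).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives the scale on which the reported values should be read.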
Affiliation(s)
- Yuyang Luo
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Anusha Nathan
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA, USA
- Ahmad Al-Moujahed
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Demetrios G Vavvas
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Jason Comander
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Eric A Pierce
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Emily M Place
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Kinga M Bujakowska
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Gaurav D Gaiha
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Elizabeth J Rossin
- Harvard Medical School, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
3
Ahmad RM, Ali BR, Al-Jasmi F, Sinnott RO, Al Dhaheri N, Mohamad MS. A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer. Brief Bioinform 2023; 25:bbad479. [PMID: 38149678 PMCID: PMC10782903 DOI: 10.1093/bib/bbad479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/22/2023] [Accepted: 12/04/2023] [Indexed: 12/28/2023]
Abstract
Studies continue to uncover contributing risk factors for breast cancer (BC) development including genetic variants. Advances in machine learning and big data generated from genetic sequencing can now be used for predicting BC pathogenicity. However, it is unclear which tool developed for pathogenicity prediction is most suited for predicting the impact and pathogenicity of variant effects. A significant challenge is to determine the most suitable data source for each tool since different tools can yield different prediction results with different data inputs. To this end, this work reviews genetic variant databases and tools used specifically for the prediction of BC pathogenicity. We provide a description of existing genetic variants databases and, where appropriate, the diseases for which they have been established. Through example, we illustrate how they can be used for prediction of BC pathogenicity and discuss their associated advantages and disadvantages. We conclude that the tools that are specialized by training on multiple diverse datasets from different databases for the same disease have enhanced accuracy and specificity and are thereby more helpful to the clinicians in predicting and diagnosing BC as early as possible.
Affiliation(s)
- Rahaf M Ahmad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Bassam R Ali
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Fatma Al-Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Richard O Sinnott
- School of Computing and Information System, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Victoria, Australia
- Noura Al Dhaheri
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
4
Hauser BM, Luo Y, Nathan A, Gaiha GD, Vavvas D, Comander J, Pierce EA, Place EM, Bujakowska KM, Rossin EJ. Structure-based network analysis predicts mutations associated with inherited retinal disease. medRxiv 2023:2023.07.05.23292247. [PMID: 37461650 PMCID: PMC10350150 DOI: 10.1101/2023.07.05.23292247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
With continued advances in gene sequencing technologies comes the need to develop better tools to understand which mutations cause disease. Here we validate structure-based network analysis (SBNA) [1,2] in well-studied human proteins and report results of using SBNA to identify critical amino acids that may cause retinal disease if subject to missense mutation. We computed SBNA scores for genes with high-quality structural data, starting with validating the method using 4 well-studied human disease-associated proteins. We then analyzed 47 inherited retinal disease (IRD) genes. We compared SBNA scores to phenotype data from the ClinVar database and found a significant difference between benign and pathogenic mutations with respect to network score. Finally, we applied this approach to 65 patients at Massachusetts Eye and Ear (MEE) who were diagnosed with IRD but for whom no genetic cause was found. Multivariable logistic regression models built using SBNA scores for IRD-associated genes successfully predicted pathogenicity of novel mutations, allowing us to identify likely causative disease variants in 37 patients with IRD from our clinic. In conclusion, SBNA can be meaningfully applied to human proteins and may help predict mutations causative of IRD.
Affiliation(s)
- Yuyang Luo
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Anusha Nathan
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA
- Gaurav D. Gaiha
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA
- Demetrios Vavvas
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Jason Comander
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Eric A. Pierce
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Emily M. Place
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Kinga M. Bujakowska
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
- Elizabeth J. Rossin
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
5
Luzuriaga-Neira AR, Ritchie AM, Payne BL, Carrillo-Parramon O, Liberles DA, Alvarez-Ponce D. Highly Abundant Proteins Are Highly Thermostable. Genome Biol Evol 2023; 15:evad112. [PMID: 37399326 DOI: 10.1093/gbe/evad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/08/2023] [Indexed: 07/05/2023]
Abstract
Highly abundant proteins tend to evolve slowly (a trend called E-R anticorrelation), and a number of hypotheses have been proposed to explain this phenomenon. The misfolding avoidance hypothesis attributes the E-R anticorrelation to the abundance-dependent toxic effects of protein misfolding. To avoid these toxic effects, protein sequences (particularly those of highly expressed proteins) would be under selection to fold properly. One prediction of the misfolding avoidance hypothesis is that highly abundant proteins should exhibit high thermostability (i.e., a highly negative free energy of folding, ΔG). Thus far, only a handful of analyses have tested for a relationship between protein abundance and thermostability, producing contradictory results. These analyses have been limited by 1) the scarcity of ΔG data, 2) the fact that these data have been obtained by different laboratories and under different experimental conditions, 3) the problems associated with using proteins' melting temperature (Tm) as a proxy for ΔG, and 4) the difficulty of controlling for potentially confounding variables. Here, we use computational methods to compare the free energy of folding of pairs of human-mouse orthologous proteins with different expression levels. Even though the effect size is limited, the most highly expressed ortholog is often the one with a more negative ΔG of folding, indicating that highly expressed proteins are often more thermostable.
Affiliation(s)
- Andrew M Ritchie
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
6
Dottore GR, Lanzolla G, Comi S, Menconi F, Mencacci LC, Dallan I, Marcocci C, Marinò M. Insights into the role of DNA methylation and gene expression in Graves' orbitopathy. J Clin Endocrinol Metab 2022; 108:e160-e168. [PMID: 36334311 DOI: 10.1210/clinem/dgac645] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/18/2022] [Revised: 09/15/2022] [Accepted: 10/31/2022] [Indexed: 11/08/2022]
Abstract
CONTEXT: A role of DNA methylation in Graves' orbitopathy (GO) has been proposed. DESIGN: To investigate DNA methylation and gene expression in orbital fibroblasts from control and GO patients, under basal conditions or following challenge with an anti-TSH receptor antibody (M22) or cytokines involved in GO; to investigate the relationship between DNA methylation and cell function (proliferation); to perform a methylome analysis. SETTING: Referral Center. MATERIALS: Orbital fibroblasts from six GO and six control patients. INTERVENTION: None. MAIN OUTCOME MEASURE: Methylome analysis of the whole genome. RESULTS: Global DNA methylation increased significantly both in control and GO fibroblasts upon incubation with M22. Expression of two selected genes (CYP19A1 and AIFM2) was variably affected by M22 and interleukin-6. M22 increased cell proliferation in control and GO fibroblasts, which correlated with global DNA methylation. Methylome analysis revealed 19,869 DNA regions differently methylated in GO fibroblasts, encompassing 3,957 genes and involving CpG islands, shores and shelves. One-hundred and nineteen gene families and subfamilies, 89 protein groups, 402 biological processes and seven pathways were involved. Three genes found to be differentially expressed were concordantly hyper- or hypomethylated. Among the differently methylated genes, insulin-like growth factor-1 receptor and several fibroblast growth factors and receptors were included. CONCLUSIONS: We propose that, when exposed to an autoimmune environment, orbital fibroblasts undergo hyper- or hypomethylation of certain genes, involving CpG promoters, which results in differential gene expression, which may be responsible for functional alterations, in particular higher proliferation, and ultimately for the GO phenotype in vivo.
Affiliation(s)
- Giovanna Rotondo Dottore
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Giulia Lanzolla
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Simone Comi
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Francesca Menconi
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Lodovica Cristofani Mencacci
- Department of Surgical, Medical and Molecular Pathology, ENT Unit I, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Iacopo Dallan
- Department of Surgical, Medical and Molecular Pathology, ENT Unit I, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Claudio Marcocci
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
- Michele Marinò
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Via Paradisa 2, 56124, Pisa, Italy
7
Gou X, Feng X, Shi H, Guo T, Xie R, Liu Y, Wang Q, Li H, Yang B, Chen L, Lu Y. PPVED: A machine learning tool for predicting the effect of single amino acid substitution on protein function in plants. Plant Biotechnology Journal 2022; 20:1417-1431. [PMID: 35398963 PMCID: PMC9241370 DOI: 10.1111/pbi.13823] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/16/2021] [Accepted: 04/03/2022] [Indexed: 05/31/2023]
Abstract
Single amino acid substitution (SAAS) produces the most common variant of protein function change under physiological conditions. As the number of SAAS events in plants has increased exponentially, an effective prediction tool is required to help identify and distinguish functional SAASs from the whole genome as either potentially causal traits or as variants. Here, we constructed a plant SAAS database that stores 12 865 SAASs in 6172 proteins and developed a tool called Plant Protein Variation Effect Detector (PPVED) that predicts the effect of SAASs on protein function in plants. PPVED achieved an 87% predictive accuracy when applied to plant SAASs, an accuracy that was much higher than those from six human database software: SIFT, PROVEAN, PANTHER-PSEP, PhD-SNP, PolyPhen-2, and MutPred2. The predictive effect of six SAASs from three proteins in Arabidopsis and maize was validated with wet lab experiments, of which five substitution sites were accurately predicted. PPVED could facilitate the identification and characterization of genetic variants that explain observed phenotype variations in plants, contributing to solutions for challenges in functional genomics and systems biology. PPVED can be accessed under a CC-BY (4.0) license via http://www.ppved.org.cn.
Affiliation(s)
- Xiangjian Gou
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
- Xuanjun Feng
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
- Haoran Shi
- Chengdu Academy of Agricultural and Forestry Sciences, Wenjiang, Sichuan, China
- Tingting Guo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei, China
- Rongqian Xie
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
- Yaxi Liu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Triticeae Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
- Qi Wang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Hongxiang Li
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Banglie Yang
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Lixue Chen
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Yanli Lu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
8
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and fulfill diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is ultimately to create a protein with desired structural and functional properties. It is often critical to model the relationship between a protein's sequence, folded structure, and biological function to assist in such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields and would rapidly accelerate protein engineering tasks when successful. Such estimation is often known as variant effect prediction (VEP). However, progress has been demonstrated in recent years with the development of machine learning (ML) methods in modeling the relationship between mutations and protein function. In this Review, recent advances in variant effect prediction (VEP) are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
9
Lai J, Yang J, Gamsiz Uzun ED, Rubenstein BM, Sarkar IN. LYRUS: a machine learning model for predicting the pathogenicity of missense variants. Bioinformatics Advances 2021; 2:vbab045. [PMID: 35036922 PMCID: PMC8754197 DOI: 10.1093/bioadv/vbab045] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2021] [Revised: 12/08/2021] [Accepted: 12/21/2021] [Indexed: 01/27/2023]
Abstract
SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights to the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS's performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53. AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Jiaying Lai
- Center for Biomedical Informatics, Brown University, Providence, RI 02903, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Jordan Yang
- Department of Chemistry, Brown University, Providence, RI 02906, USA
- Ece D Gamsiz Uzun
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Pathology and Laboratory Medicine, Brown University Alpert Medical School, Providence, RI 02903, USA
- Department of Pathology, Rhode Island Hospital, Providence, RI 02903, USA
- Brenda M Rubenstein (corresponding author)
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Chemistry, Brown University, Providence, RI 02906, USA
- Indra Neil Sarkar (corresponding author)
- Center for Biomedical Informatics, Brown University, Providence, RI 02903, USA
- Rhode Island Quality Institute, Providence, RI 02908, USA
10
Computational analysis of missense variants in MMP2 gene linked with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis reveals structural shift in protein-protein and protein-ligand complexes. Meta Gene 2021. [DOI: 10.1016/j.mgene.2021.100931] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
11
Ancien F, Pucci F, Rooman M. In Silico Analysis of the Molecular-Level Impact of SMPD1 Variants on Niemann-Pick Disease Severity. Int J Mol Sci 2021; 22:4516. [PMID: 33925997 PMCID: PMC8123603 DOI: 10.3390/ijms22094516] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2021] [Revised: 04/10/2021] [Accepted: 04/20/2021] [Indexed: 12/12/2022]
Abstract
Sphingomyelin phosphodiesterase (SMPD1) is a key enzyme in the sphingolipid metabolism. Genetic SMPD1 variants have been related to the Niemann-Pick lysosomal storage disorder, which has different degrees of phenotypic severity ranging from severe symptomatology involving the central nervous system (type A) to milder ones (type B). They have also been linked to neurodegenerative disorders such as Parkinson and Alzheimer. In this paper, we leveraged structural, evolutionary and stability information on SMPD1 to predict and analyze the impact of variants at the molecular level. We developed the SMPD1-ZooM algorithm, which is able to predict with good accuracy whether variants cause Niemann-Pick disease and its phenotypic severity; the predictor is freely available for download. We performed a large-scale analysis of all possible SMPD1 variants, which led us to identify protein regions that are either robust or fragile with respect to amino acid variations, and show the importance of aromatic-involving interactions in SMPD1 function and stability. Our study also revealed a good correlation between SMPD1-ZooM scores and in vitro loss of SMPD1 activity. The understanding of the molecular effects of SMPD1 variants is of crucial importance to improve genetic screening of SMPD1-related disorders and to develop personalized treatments that restore SMPD1 functionality.
Affiliation(s)
- François Ancien
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium; (F.A.); (F.P.)
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
- Fabrizio Pucci
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium; (F.A.); (F.P.)
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
- Marianne Rooman
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium; (F.A.); (F.P.)
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
| |
|
12
|
Rotondo Dottore G, Bucci I, Lanzolla G, Dallan I, Sframeli A, Torregrossa L, Casini G, Basolo F, Figus M, Nardi M, Marcocci C, Marinò M. Genetic Profiling of Orbital Fibroblasts from Patients with Graves' Orbitopathy. J Clin Endocrinol Metab 2021; 106:e2176-e2190. [PMID: 33484567 DOI: 10.1210/clinem/dgab035] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/18/2020] [Indexed: 02/04/2023]
Abstract
CONTEXT Graves' orbitopathy (GO) is an autoimmune disease that persists when immunosuppression is achieved. Orbital fibroblasts from GO patients display peculiar phenotypes even if not exposed to autoimmunity, possibly reflecting genetic or epigenetic mechanisms, which we investigated here. OBJECTIVE We aimed to explore potential genetic or epigenetic differences using primary cultures of orbital fibroblasts from GO and control patients. METHODS Cell proliferation, hyaluronic acid (HA) secretion, and HA synthases (HAS) were measured. Next-generation sequencing and gene expression analysis of the whole genome were performed, as well as real-time-PCR of selected genes and global DNA methylation assay on orbital fibroblasts from 6 patients with GO and 6 control patients from a referral center. RESULTS Cell proliferation was higher in GO than in control fibroblasts. Likewise, HA in the cell medium was higher in GO fibroblasts. HAS-1 and HAS-2 did not differ between GO and control fibroblasts, whereas HAS-3 was more expressed in GO fibroblasts. No relevant gene variants were detected by whole-genome sequencing. However, 58 genes were found to be differentially expressed in GO compared with control fibroblasts, and RT-PCR confirmed the findings in 10 selected genes. We postulated that the differential gene expression was related to an epigenetic mechanism, reflecting diverse DNA methylation, which we therefore measured. In support of our hypothesis, global DNA methylation was significantly higher in GO fibroblasts. CONCLUSIONS We propose that, following an autoimmune insult, DNA methylation elicits differential gene expression and sustains the maintenance of GO.
Affiliation(s)
- Giovanna Rotondo Dottore
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Ilaria Bucci
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Giulia Lanzolla
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Iacopo Dallan
- Department of Surgical, Medical and Molecular Pathology, ENT Unit I, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Angela Sframeli
- Department of Surgical, Medical and Molecular Pathology, Ophthalmopathy Unit I, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Liborio Torregrossa
- Department of Surgical, Medical and Molecular Pathology, Pathology Unit, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Giamberto Casini
- Department of Surgical, Medical and Molecular Pathology, Ophthalmopathy Unit I, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Fulvio Basolo
- Department of Surgical, Medical and Molecular Pathology, Pathology Unit, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Michele Figus
- Department of Surgical, Medical and Molecular Pathology, Ophthalmopathy Unit I, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Marco Nardi
- Department of Surgical, Medical and Molecular Pathology, Ophthalmopathy Unit I, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Claudio Marcocci
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Pisa, Italy
| | - Michele Marinò
- Department of Clinical and Experimental Medicine, Endocrinology Unit II, University of Pisa and University Hospital of Pisa, Pisa, Italy
| |
|
13
|
Ponzoni L, Peñaherrera DA, Oltvai ZN, Bahar I. Rhapsody: predicting the pathogenicity of human missense variants. Bioinformatics 2020; 36:3084-3092. [PMID: 32101277 PMCID: PMC7214033 DOI: 10.1093/bioinformatics/btaa127] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 08/06/2019] [Revised: 12/27/2019] [Accepted: 02/21/2020] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION The biological effects of human missense variants have been studied experimentally for decades but predicting their effects in clinical molecular diagnostics remains challenging. Available computational tools are usually based on the analysis of sequence conservation and structural properties of the mutant protein. We recently introduced a new machine learning method that demonstrated for the first time the significance of protein dynamics in determining the pathogenicity of missense variants. RESULTS Here, we present a new interface (Rhapsody) that enables fully automated assessment of pathogenicity, incorporating both sequence coevolution data and structure- and dynamics-based features. Benchmarked against a dataset of about 20 000 annotated variants, the methodology is shown to outperform well-established and/or advanced prediction tools. We illustrate the utility of Rhapsody by in silico saturation mutagenesis studies of human H-Ras, phosphatase and tensin homolog and thiopurine S-methyltransferase. AVAILABILITY AND IMPLEMENTATION The new tool is available both as an online webserver at http://rhapsody.csb.pitt.edu and as an open-source Python package (GitHub repository: https://github.com/prody/rhapsody; PyPI package installation: pip install prody-rhapsody). Links to additional resources, tutorials and package documentation are provided in the 'Python package' section of the website. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luca Ponzoni
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Daniel A Peñaherrera
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Zoltán N Oltvai
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA.,Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15261, USA.,Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN 55455, USA
| | - Ivet Bahar
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
| |
|
14
|
An Integrated Deep-Mutational-Scanning Approach Provides Clinical Insights on PTEN Genotype-Phenotype Relationships. Am J Hum Genet 2020; 106:818-829. [PMID: 32442409 DOI: 10.1016/j.ajhg.2020.04.014] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/05/2019] [Accepted: 04/21/2020] [Indexed: 01/03/2023] Open
Abstract
Germline variation in PTEN results in variable clinical presentations, including benign and malignant neoplasia and neurodevelopmental disorders. Despite decades of research, it remains unclear how the PTEN genotype is related to clinical outcomes. In this study, we combined two recent deep mutational scanning (DMS) datasets probing the effects of single amino acid variation on enzyme activity and steady-state cellular abundance with a large, well-curated clinical cohort of PTEN-variant carriers. We sought to connect variant-specific molecular phenotypes to the clinical outcomes of individuals with PTEN variants. We found that DMS data partially explain quantitative clinical traits, including head circumference and Cleveland Clinic (CC) score, which is a semiquantitative surrogate of disease burden. We built logistic regression models that use DMS and CADD scores to separate clinical PTEN variation from gnomAD control-only variation with high accuracy. By using a survival-like analysis, we identified molecular phenotype groups with differential risk of early cancer onset as well as lifetime risk of cancer. Finally, we identified classes of DMS-defined variants with significantly different risk levels for classical hamartoma-related features (odds ratio [OR] range of 4.1-102.9). In stark contrast, the risk for developing autism or developmental delay does not significantly change across variant classes (OR range of 5.4-12.4). Together, these findings highlight the potential impact of combining DMS datasets with rich clinical data and provide new insights that might guide personalized clinical decisions for PTEN-variant carriers.
|
15
|
Sealfon RSG, Mariani LH, Kretzler M, Troyanskaya OG. Machine learning, the kidney, and genotype-phenotype analysis. Kidney Int 2020; 97:1141-1149. [PMID: 32359808 PMCID: PMC8048707 DOI: 10.1016/j.kint.2020.02.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/15/2019] [Revised: 01/13/2020] [Accepted: 02/06/2020] [Indexed: 01/23/2023]
Abstract
With biomedical research transitioning into data-rich science, machine learning provides a powerful toolkit for extracting knowledge from large-scale biological data sets. The increasing availability of comprehensive kidney omics compendia (transcriptomics, proteomics, metabolomics, and genome sequencing), as well as other data modalities such as electronic health records, digital nephropathology repositories, and radiology renal images, makes machine learning approaches increasingly essential for analyzing human kidney data sets. Here, we discuss how machine learning approaches can be applied to the study of kidney disease, with a particular focus on how they can be used for understanding the relationship between genotype and phenotype.
Affiliation(s)
- Rachel S G Sealfon
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, USA
| | - Laura H Mariani
- Division of Nephrology, University of Michigan, Ann Arbor, Michigan, USA
| | - Matthias Kretzler
- Division of Nephrology, University of Michigan, Ann Arbor, Michigan, USA.
| | - Olga G Troyanskaya
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA; Department of Computer Science, Princeton University, Princeton, New Jersey, USA.
| |
|
16
|
Mustafa MI, Mohammed ZO, Murshed NS, Elfadol NM, Abdelmoneim AH, Hassan MA. In Silico Genetics Revealing 5 Mutations in CEBPA Gene Associated With Acute Myeloid Leukemia. Cancer Inform 2019; 18:1176935119870817. [PMID: 31621694 PMCID: PMC6777061 DOI: 10.1177/1176935119870817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/13/2019] [Accepted: 07/30/2019] [Indexed: 12/11/2022] Open
Abstract
Background: Acute myeloid leukemia (AML) is an extremely heterogeneous malignant disorder; AML has been reported as one of the main causes of death in children. The objective of this work was to classify the most deleterious mutations in CCAAT/enhancer-binding protein-alpha (CEBPA) and to predict their influence on the functional, structural, and expression levels using various bioinformatics analysis tools. Methods: The single nucleotide polymorphisms (SNPs) were retrieved from the National Center for Biotechnology Information (NCBI) database and then submitted to various functional analysis tools to predict the influence of each SNP, followed by structural analysis of the modeled protein and prediction of the mutation effect on energy stability; the most damaging mutations were chosen for additional investigation with the Mutation3D, Project HOPE, ConSurf, BioEdit, and UCSF Chimera tools. Results: A total of 5 mutations out of 248 were likely to be responsible for the structural and functional variations in the CEBPA protein, whereas in the 3′-untranslated region (3′-UTR) the result showed that among 350 SNPs in the 3′-UTR of the CEBPA gene, about 11 SNPs were predicted. Among these 11 SNPs, 65 alleles disrupted a conserved miRNA site and 22 derived alleles created a new miRNA site. Conclusions: In this study, the impact of functional mutations in the CEBPA gene was investigated through different bioinformatics analysis techniques, which determined that R339W, R288P, N292S, N292T, and D63N are pathogenic mutations that have a possible functional and structural influence and, therefore, could be used as genetic biomarkers and may assist in genetic studies with special consideration of the large heterogeneity of AML.
Affiliation(s)
- Mujahed I Mustafa
- Department of Biotechnology, Africa City of Technology, Khartoum North, Sudan
| | - Zainab O Mohammed
- Department of Haematology, Ribat University Hospital, Khartoum, Sudan
| | - Naseem S Murshed
- Department of Biotechnology, Africa City of Technology, Khartoum North, Sudan
| | - Nafisa M Elfadol
- Department of Biotechnology, Africa City of Technology, Khartoum North, Sudan
| | | | - Mohamed A Hassan
- Department of Biotechnology, Africa City of Technology, Khartoum North, Sudan
| |
|
17
|
Guttula PK, Chandrasekaran G, Gupta MK. Screening and insilico analysis of deleterious nsSNPs (missense) in human CSF3 for their effects on protein structure, stability and function. Comput Biol Chem 2019; 82:57-64. [PMID: 31272062 DOI: 10.1016/j.compbiolchem.2019.06.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/05/2019] [Revised: 05/25/2019] [Accepted: 06/02/2019] [Indexed: 10/26/2022]
Abstract
Human granulocyte colony stimulating factor (hG-CSF), known as CSF3, plays an important role in the growth, differentiation, proliferation, survival, and activation of the granulocyte cell lineage such as neutrophils and their precursors. Functional reduction in native CSF3 protein results in reduced proliferation and activation of neutrophils and leads to neutropenia. Single nucleotide polymorphisms (SNPs) in the CSF3 gene may have deleterious effects on the CSF3 protein function. This study was undertaken to find the functional SNPs in the human CSF3 gene. Results suggest that 18.9% of all the SNPs in the dbSNP database for the CSF3 gene were present in the coding region. Out of 59 non-synonymous SNPs (nsSNPs), 26 nsSNPs were predicted to be non-tolerable by SIFT whereas 18 and 7 nsSNPs were predicted as probably damaging and possibly damaging, respectively, by PolyPhen. Among these 31 nsSNPs, 16 nsSNPs were identified to be potentially deleterious by the PANTHER server, and 4 nsSNPs were found to be neutral by PROVEAN. SNPAnalyzer predicted 7 nsSNPs to have a neutral phenotype and the remaining 24 nsSNPs to be associated with diseases. Among the predicted nsSNPs, rs144408123, rs144408123, rs145136406, rs145311241, rs373191696, rs762945096, rs763688260, rs767572172, rs775326370, rs777777864, rs777983866, rs781596455, rs139072004, rs757612684, rs772911210, rs139072004, rs746634544, rs749993200, rs763426127, rs772466210 were identified as deleterious and potentially damaging. I-Mutant analysis revealed that these 20 nsSNPs were important for protein stability of CSF3. Therefore, these 20 nsSNPs may be used for wider population-based genetic studies and should also be taken into account while engineering the recombinant CSF3 protein for clinical use.
Affiliation(s)
- Praveen Kumar Guttula
- Gene Manipulation Laboratory, Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Odisha, 769008, India
| | - Gopalakrishnan Chandrasekaran
- Gene Manipulation Laboratory, Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Odisha, 769008, India
| | - Mukesh Kumar Gupta
- Gene Manipulation Laboratory, Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Odisha, 769008, India.
| |
|
18
|
Budde M, Friedrichs S, Alliey-Rodriguez N, Ament S, Badner JA, Berrettini WH, Bloss CS, Byerley W, Cichon S, Comes AL, Coryell W, Craig DW, Degenhardt F, Edenberg HJ, Foroud T, Forstner AJ, Frank J, Gershon ES, Goes FS, Greenwood TA, Guo Y, Hipolito M, Hood L, Keating BJ, Koller DL, Lawson WB, Liu C, Mahon PB, McInnis MG, McMahon FJ, Meier SM, Mühleisen TW, Murray SS, Nievergelt CM, Nurnberger JI, Nwulia EA, Potash JB, Quarless D, Rice J, Roach JC, Scheftner WA, Schork NJ, Shekhtman T, Shilling PD, Smith EN, Streit F, Strohmaier J, Szelinger S, Treutlein J, Witt SH, Zandi PP, Zhang P, Zöllner S, Bickeböller H, Falkai PG, Kelsoe JR, Nöthen MM, Rietschel M, Schulze TG, Malzahn D. Efficient region-based test strategy uncovers genetic risk factors for functional outcome in bipolar disorder. Eur Neuropsychopharmacol 2019; 29:156-170. [PMID: 30503783 DOI: 10.1016/j.euroneuro.2018.10.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/12/2018] [Revised: 10/16/2018] [Accepted: 10/23/2018] [Indexed: 11/21/2022]
Abstract
Genome-wide association studies of case-control status have advanced the understanding of the genetic basis of psychiatric disorders. Further progress may be gained by increasing sample size but also by new analysis strategies that advance the exploitation of existing data, especially for clinically important quantitative phenotypes. The functionally-informed efficient region-based test strategy (FIERS) introduced herein uses prior knowledge on biological function and dependence of genotypes within a powerful statistical framework with improved sensitivity and specificity for detecting consistent genetic effects across studies. As proof of concept, FIERS was used for the first genome-wide single nucleotide polymorphism (SNP)-based investigation on bipolar disorder (BD) that focuses on an important aspect of disease course, the functional outcome. FIERS identified a significantly associated locus on chromosome 15 (hg38: chr15:48965004 - 49464789 bp) with consistent effect strength between two independent studies (GAIN/TGen: European Americans, BOMA: Germans; n = 1592 BD patients in total). Protective and risk haplotypes were found on the most strongly associated SNPs. They contain a CTCF binding site (rs586758); CTCF sites are known to regulate sets of genes within a chromatin domain. The rs586758 - rs2086256 - rs1904317 haplotype is located in the promoter flanking region of the COPS2 gene, close to microRNA4716, and the EID1, SHC4, DTWD1 genes as plausible biological candidates. While implication with BD is novel, COPS2, EID1, and SHC4 are known to be relevant for neuronal differentiation and function and DTWD1 for psychopharmacological side effects. The test strategy FIERS that enabled this discovery is equally applicable for tag SNPs and sequence data.
Affiliation(s)
- Monika Budde
- Institute of Psychiatric Phenomics and Genomics, University Hospital, LMU Munich, Nussbaumstr. 7, Munich 80336, Germany
| | - Stefanie Friedrichs
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University, Göttingen 37099, Germany
| | - Ney Alliey-Rodriguez
- Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL 60637, United States
| | - Seth Ament
- Institute for Systems Biology, Seattle, WA 98109, United States
| | - Judith A Badner
- Department of Psychiatry, Rush University Medical Center, Chicago, IL 60612, United States
| | - Wade H Berrettini
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, United States
| | - Cinnamon S Bloss
- University of California San Diego, La Jolla, CA 92093, United States
| | - William Byerley
- Department of Psychiatry, University of California at San Francisco, San Francisco, CA 94103, United States
| | - Sven Cichon
- Human Genomics Research Group, Department of Biomedicine, University of Basel, Basel 4031, Switzerland; Institute of Medical Genetics and Pathology, University Hospital Basel, Basel 4031, Switzerland; Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich 52425, Germany
| | - Ashley L Comes
- Institute of Psychiatric Phenomics and Genomics, University Hospital, LMU Munich, Nussbaumstr. 7, Munich 80336, Germany; International Max Planck Research School for Translational Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
| | - William Coryell
- University of Iowa Hospitals and Clinics, Iowa City, IA 52242, United States
| | - David W Craig
- The Translational Genomics Research Institute, Phoenix, AZ 85004, United States
| | - Franziska Degenhardt
- Institute of Human Genetics, School of Medicine & University Hospital Bonn, University of Bonn, Bonn 53127, Germany; Department of Genomics, Life & Brain Center, University of Bonn, Bonn 53127, Germany
| | - Howard J Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, United States; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, United States
| | - Tatiana Foroud
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, United States
| | - Andreas J Forstner
- Institute of Human Genetics, School of Medicine & University Hospital Bonn, University of Bonn, Bonn 53127, Germany; Department of Genomics, Life & Brain Center, University of Bonn, Bonn 53127, Germany; Human Genomics Research Group, Department of Biomedicine, University of Basel, Basel 4031, Switzerland; Department of Psychiatry (UPK), University of Basel, Basel 4012, Switzerland
| | - Josef Frank
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Elliot S Gershon
- Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL 60637, United States
| | - Fernando S Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD 21287, United States
| | - Tiffany A Greenwood
- Department of Psychiatry, University of California San Diego, San Diego, CA 92093, United States
| | - Yiran Guo
- Center for Applied Genomics, Children's Hospital of Philadelphia, Abramson Research Center, Philadelphia, PA 19104, United States; Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China
| | - Maria Hipolito
- Department of Psychiatry and Behavioral Sciences, Howard University Hospital, Washington, DC 20060, United States
| | - Leroy Hood
- Institute for Systems Biology, Seattle, WA 98109, United States
| | - Brendan J Keating
- Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-5159, United States; Institute for Translational Medicine and Therapeutics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-5158, United States
| | - Daniel L Koller
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, United States
| | - William B Lawson
- Dell Medical School, University of Texas at Austin, Austin, TX 78723, United States
| | - Chunyu Liu
- SUNY Upstate Medical University, Syracuse, NY 13210, United States
| | - Pamela B Mahon
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD 21287, United States
| | - Melvin G McInnis
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48105, United States
| | - Francis J McMahon
- U.S. Department of Health & Human Services, Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20894, United States
| | - Sandra M Meier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany; National Centre for Register-Based Research, Aarhus University, Aarhus V 8210, Denmark
| | - Thomas W Mühleisen
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich 52425, Germany; Human Genomics Research Group, Department of Biomedicine, University of Basel, Basel 4031, Switzerland
| | - Sarah S Murray
- Scripps Genomic Medicine & The Scripps Translational Sciences Institute (STSI), La Jolla, CA 92037, United States; Department of Pathology, University of California San Diego, La Jolla, CA 92093, United States
| | - Caroline M Nievergelt
- Department of Psychiatry, University of California San Diego, San Diego, CA 92093, United States
| | - John I Nurnberger
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN 46202, United States
| | - Evaristus A Nwulia
- Department of Psychiatry and Behavioral Sciences, Howard University Hospital, Washington, DC 20060, United States
| | - James B Potash
- Department of Psychiatry, Carver College of Medicine, University of Iowa School of Medicine, Iowa City, IA 52242, United States
| | - Danjuma Quarless
- J. Craig Venter Institute, La Jolla, CA 92037, United States; University of California San Diego, La Jolla, CA 92093, United States
| | - John Rice
- Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, United States
| | - Jared C Roach
- Institute for Systems Biology, Seattle, WA 98109, United States
| | | | - Nicholas J Schork
- J. Craig Venter Institute, La Jolla, CA 92037, United States; The Translational Genomics Research Institute, Phoenix, AZ 85004, United States; University of California San Diego, La Jolla, CA 92093, United States
| | - Tatyana Shekhtman
- Department of Psychiatry, University of California San Diego, San Diego, CA 92093, United States
| | - Paul D Shilling
- Department of Psychiatry, University of California San Diego, San Diego, CA 92093, United States
| | - Erin N Smith
- Scripps Genomic Medicine & The Scripps Translational Sciences Institute (STSI), La Jolla, CA 92037, United States; Department of Pediatrics and Rady's Children's Hospital, School of Medicine, University of California San Diego, La Jolla, CA 92037, United States
| | - Fabian Streit
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Jana Strohmaier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Szabolcs Szelinger
- The Translational Genomics Research Institute, Phoenix, AZ 85004, United States
| | - Jens Treutlein
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Stephanie H Witt
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Peter P Zandi
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
| | - Peng Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
| | - Sebastian Zöllner
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States; Department of Psychiatry, University of Michigan, Ann Arbor, MI 48105, United States
| | - Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University, Göttingen 37099, Germany
| | - Peter G Falkai
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich 80336, Germany
| | - John R Kelsoe
- Department of Psychiatry, University of California San Diego, San Diego, CA 92093, United States
| | - Markus M Nöthen
- Institute of Human Genetics, School of Medicine & University Hospital Bonn, University of Bonn, Bonn 53127, Germany; Department of Genomics, Life & Brain Center, University of Bonn, Bonn 53127, Germany
| | - Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany
| | - Thomas G Schulze
- Institute of Psychiatric Phenomics and Genomics, University Hospital, LMU Munich, Nussbaumstr. 7, Munich 80336, Germany; Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim 68159, Germany; Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD 21287, United States; U.S. Department of Health & Human Services, Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20894, United States.
| | - Dörthe Malzahn
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University, Göttingen 37099, Germany.
| |
|
19
|
Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria. PLoS Genet 2018; 14:e1007813. [PMID: 30566500 PMCID: PMC6300389 DOI: 10.1371/journal.pgen.1007813] [Citation(s) in RCA: 275] [Impact Index Per Article: 45.8] [Received: 04/30/2018] [Accepted: 11/06/2018] [Indexed: 11/19/2022] Open
Abstract
Polycystic ovary syndrome (PCOS) is a disorder characterized by hyperandrogenism, ovulatory dysfunction and polycystic ovarian morphology. Affected women frequently have metabolic disturbances including insulin resistance and dysregulation of glucose homeostasis. PCOS is diagnosed with two different sets of diagnostic criteria, resulting in a phenotypic spectrum of PCOS cases. The genetic similarities between cases diagnosed based on the two criteria have been largely unknown. Previous studies in Chinese and European subjects have identified 16 loci associated with risk of PCOS. We report a fixed-effect, inverse-weighted-variance meta-analysis from 10,074 PCOS cases and 103,164 controls of European ancestry and characterisation of PCOS related traits. We identified 3 novel loci (near PLGRKT, ZBTB16 and MAPRE1), and provide replication of 11 previously reported loci. Only one locus differed significantly in its association by diagnostic criteria; otherwise the genetic architecture was similar between PCOS diagnosed by self-report and PCOS diagnosed by NIH or non-NIH Rotterdam criteria across common variants at 13 loci. Identified variants were associated with hyperandrogenism, gonadotropin regulation and testosterone levels in affected women. Linkage disequilibrium score regression analysis revealed genetic correlations with obesity, fasting insulin, type 2 diabetes, lipid levels and coronary artery disease, indicating shared genetic architecture between metabolic traits and PCOS. Mendelian randomization analyses suggested variants associated with body mass index, fasting insulin, menopause timing, depression and male-pattern balding play a causal role in PCOS. The data thus demonstrate 3 novel loci associated with PCOS and similar genetic architecture for all diagnostic criteria. The data also provide the first genetic evidence for a male phenotype for PCOS and a causal link to depression, a previously hypothesized comorbid disease. 
Thus, the genetics provide a comprehensive view of PCOS that encompasses multiple diagnostic criteria, gender, reproductive potential and mental health. We performed an international meta-analysis of genome-wide association studies combining over 10,000,000 genetic markers in more than 10,000 European women with polycystic ovary syndrome (PCOS) and 100,000 controls. We found three new risk variants associated with PCOS. Our data demonstrate that the genetic architecture does not differ based on the diagnostic criteria used for PCOS. We also demonstrate a genetic pathway shared with male pattern baldness, representing the first evidence for shared disease biology in men, and shared genetics with depression, previously postulated based only on observational studies.
|
20
|
Dobson L, Mészáros B, Tusnády GE. Structural Principles Governing Disease-Causing Germline Mutations. J Mol Biol 2018; 430:4955-4970. [DOI: 10.1016/j.jmb.2018.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/18/2018] [Accepted: 10/11/2018] [Indexed: 01/03/2023]
|
21
|
Raimondi D, Orlando G, Tabaro F, Lenaerts T, Rooman M, Moreau Y, Vranken WF. Large-scale in-silico statistical mutagenesis analysis sheds light on the deleteriousness landscape of the human proteome. Sci Rep 2018; 8:16980. [PMID: 30451933 PMCID: PMC6242909 DOI: 10.1038/s41598-018-34959-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/22/2018] [Accepted: 10/26/2018] [Indexed: 12/18/2022] Open
Abstract
Next generation sequencing technologies are providing increasing amounts of sequencing data, paving the way for improvements in clinical genetics and precision medicine. The interpretation of the observed genomic variants in the light of their phenotypic effects is thus emerging as a crucial task to solve in order to advance our understanding of how exomic variants affect proteins and how the proteins' functional changes affect human health. Since the experimental evaluation of the effects of every observed variant is unfeasible, Bioinformatics methods are being developed to address this challenge in-silico, by predicting the impact of millions of variants, thus providing insight into the deleteriousness landscape of entire proteomes. Here we show the feasibility of this approach by using the recently developed DEOGEN2 variant-effect predictor to perform the largest in-silico mutagenesis scan to date. We computed the deleteriousness score of 170 million variants over 15000 human proteins and we analysed the results, investigating how the predicted deleteriousness landscape of the proteins relates to known functionally and structurally relevant protein regions and biophysical properties. Moreover, we qualitatively validated our results by comparing them with two mutagenesis studies targeting two specific proteins, showing the consistency of DEOGEN2 predictions with respect to experimental data.
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001, Leuven, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
| | - Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
| | - Francesco Tabaro
- Institute of Biosciences and Medical Technology, Arvo Ylpön katu 34, 33520, Tampere, Finland
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Machine Learning Group, ULB, La Plaine Campus, 1050, Brussels, Belgium
| | - Marianne Rooman
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, 1050, Brussels, Belgium
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001, Leuven, Belgium
- Imec, 3001, Leuven, Belgium
| | - Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium.
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium.
|
22
|
Affiliation(s)
- Valerie Vaissier Welborn
- Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, California 94720, United States
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Teresa Head-Gordon
- Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, California 94720, United States
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of California, Berkeley, California 94720, United States
|
23
|
Corso G, Figueiredo J, La Vecchia C, Veronesi P, Pravettoni G, Macis D, Karam R, Lo Gullo R, Provenzano E, Toesca A, Mazzocco K, Carneiro F, Seruca R, Melo S, Schmitt F, Roviello F, De Scalzi AM, Intra M, Feroce I, De Camilli E, Villardita MG, Trentin C, De Lorenzi F, Bonanni B, Galimberti V. Hereditary lobular breast cancer with an emphasis on E-cadherin genetic defect. J Med Genet 2018; 55:431-441. [PMID: 29929997 DOI: 10.1136/jmedgenet-2018-105337] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 04/13/2018] [Revised: 05/25/2018] [Accepted: 06/05/2018] [Indexed: 12/22/2022]
Abstract
Recent studies have reported germline CDH1 mutations in cases of lobular breast cancer (LBC) not associated with the classical hereditary diffuse gastric cancer syndrome. A multidisciplinary workgroup discussed genetic susceptibility, pathophysiology and clinical management of hereditary LBC (HLBC). The team has established the clinical criteria for CDH1 screening and results' interpretation, and created consensus guidelines regarding genetic counselling, breast surveillance and imaging techniques, clinicopathological findings, psychological and decisional support, as well as prophylactic surgery and plastic reconstruction. Based on a review of current evidence for the identification of HLBC cases/families, CDH1 genetic testing is recommended in patients fulfilling the following criteria: (A) bilateral LBC with or without family history of LBC, with age at onset <50 years, and (B) unilateral LBC with family history of LBC, with age at onset <45 years. In CDH1 asymptomatic mutant carriers, breast surveillance with clinical examination, yearly mammography, contrast-enhanced breast MRI and breast ultrasonography (US) with 6-month interval between the US and the MRI should be implemented as a first approach. In selected cases with personal history, family history of LBC and CDH1 mutations, prophylactic mastectomy could be discussed with an integrative group of clinical experts. Psychodecisional support also plays a pivotal role in the management of individuals with or without CDH1 germline alterations. Ultimately, the definition of a specific protocol for CDH1 genetic screening and ongoing coordinated management of patients with HLBC is crucial for the effective surveillance and early detection of LBC.
Affiliation(s)
- Giovanni Corso
- Division of Breast Surgery, European Institute of Oncology, Milano, Italy
| | - Joana Figueiredo
- EPIC Lab, Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal.,Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal
| | - Carlo La Vecchia
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
| | - Paolo Veronesi
- Division of Breast Surgery, European Institute of Oncology, Milano, Italy.,Oncology and Hematology, University of Milan, Milan, Italy
| | - Gabriella Pravettoni
- Oncology and Hematology, University of Milan, Milan, Italy.,Applied Research Division for Cognitive and Psychological Science, European Institute of Oncology, Milan, Italy
| | - Debora Macis
- Division of Cancer Prevention and Genetics, European Institute of Oncology, Milan, Italy
| | | | - Roberto Lo Gullo
- Division of Breast Imaging, European Institute of Oncology, Milan, Italy
| | - Elena Provenzano
- NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge, UK.,Cambridge Breast Cancer Research Unit, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.,Department of Histopathology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Antonio Toesca
- Division of Breast Surgery, European Institute of Oncology, Milano, Italy
| | - Ketti Mazzocco
- Oncology and Hematology, University of Milan, Milan, Italy.,Applied Research Division for Cognitive and Psychological Science, European Institute of Oncology, Milan, Italy
| | - Fátima Carneiro
- Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal.,Division of Pathology, Hospital São Joao, Porto, Portugal
| | - Raquel Seruca
- EPIC Lab, Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal.,Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal.,Medical Faculty of the University of Porto, Porto, Portugal
| | - Soraia Melo
- EPIC Lab, Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal.,Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal.,Medical Faculty of the University of Porto, Porto, Portugal
| | - Fernando Schmitt
- EPIC Lab, Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal.,Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal.,Medical Faculty of the University of Porto, Porto, Portugal
| | - Franco Roviello
- Departments of Surgery and Pathology, Le Scotte Hospital, University of Siena, Siena, Italy
| | | | - Mattia Intra
- Division of Breast Surgery, European Institute of Oncology, Milano, Italy
| | - Irene Feroce
- Division of Cancer Prevention and Genetics, European Institute of Oncology, Milan, Italy
| | - Elisa De Camilli
- Division of Pathology, European Institute of Oncology, Milan, Italy
| | | | - Chiara Trentin
- Division of Breast Imaging, European Institute of Oncology, Milan, Italy
| | | | - Bernardo Bonanni
- Division of Cancer Prevention and Genetics, European Institute of Oncology, Milan, Italy
| | - Viviana Galimberti
- Division of Breast Surgery, European Institute of Oncology, Milano, Italy
|
24
|
Ancien F, Pucci F, Godfroid M, Rooman M. Prediction and interpretation of deleterious coding variants in terms of protein structural stability. Sci Rep 2018. [PMID: 29540703 PMCID: PMC5852127 DOI: 10.1038/s41598-018-22531-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 12/20/2022] Open
Abstract
The classification of human genetic variants into deleterious and neutral is a challenging issue, whose complexity is rooted in the large variety of biophysical mechanisms that can be responsible for disease conditions. For non-synonymous mutations in structured proteins, one of these is the protein stability change, which can lead to loss of protein structure or function. We developed a stability-driven knowledge-based classifier that uses protein structure, artificial neural networks and solvent accessibility-dependent combinations of statistical potentials to predict whether destabilizing or stabilizing mutations are disease-causing. Our predictor yields a balanced accuracy of 71% in cross validation. As expected, it has a very high positive predictive value of 89%: it predicts with high accuracy the subset of mutations that are deleterious because of stability issues, but is by construction unable to classify variants that are deleterious for other reasons. Its combination with an evolutionary-based predictor increases the balanced accuracy up to 75%, and allowed predicting more than 1/4 of the variants with 95% positive predictive value. Our method, called SNPMuSiC, can be used with both experimental and modeled structures and compares favorably with other prediction tools on several independent test sets. It constitutes a step towards interpreting variant effects at the molecular scale. SNPMuSiC is freely available at https://soft.dezyme.com/.
Affiliation(s)
- François Ancien
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles (ULB), CP 165/61, Roosevelt Avenue 50, 1050, Brussels, Belgium.,Interuniversity Institute of Bioinformatics in Brussels, ULB, CP 263, Triumph Bld, 1050, Brussels, Belgium.
| | - Fabrizio Pucci
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles (ULB), CP 165/61, Roosevelt Avenue 50, 1050, Brussels, Belgium.,Interuniversity Institute of Bioinformatics in Brussels, ULB, CP 263, Triumph Bld, 1050, Brussels, Belgium.
| | - Maxime Godfroid
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles (ULB), CP 165/61, Roosevelt Avenue 50, 1050, Brussels, Belgium.,Institute of General Microbiology, Kiel University, Am Botanischen Garten 11, 24118, Kiel, Germany
| | - Marianne Rooman
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles (ULB), CP 165/61, Roosevelt Avenue 50, 1050, Brussels, Belgium.,Interuniversity Institute of Bioinformatics in Brussels, ULB, CP 263, Triumph Bld, 1050, Brussels, Belgium.
|
25
|
Abstract
Humoral immune responses against the malaria parasite are an important component of a protective immune response. Antibodies are often directed towards conformational epitopes, and the native structure of the antigenic region is usually critical for antibody recognition. We examined the structural features of various Plasmodium antigens that may impact on epitope location, by performing a comprehensive analysis of known and modelled structures from P. falciparum. Examining the location of known polymorphisms over all available structures, we observed a strong propensity for polymorphic residues to be exposed on the surface and to occur in particular secondary structure segments such as hydrogen-bonded turns. We also utilised established prediction algorithms for B-cell epitopes and MHC class II binding peptides, examining predicted epitopes in relation to known polymorphic sites within structured regions. Finally, we used the available structures to examine polymorphic hotspots and Tajima's D values using a spatial averaging approach. We identified a region of PfAMA1 involving both domains II and III under a high degree of balancing selection relative to the rest of the protein. In summary, we developed general methods for examining how sequence-based features relate to one another in three-dimensional space and applied these methods to key P. falciparum antigens.
|
26
|
Predicting the Functional Impact of CDH1 Missense Mutations in Hereditary Diffuse Gastric Cancer. Int J Mol Sci 2017; 18:ijms18122687. [PMID: 29231860 PMCID: PMC5751289 DOI: 10.3390/ijms18122687] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 10/10/2017] [Revised: 11/28/2017] [Accepted: 11/30/2017] [Indexed: 12/20/2022] Open
Abstract
The role of E-cadherin in Hereditary Diffuse Gastric Cancer (HDGC) is unequivocal. Germline alterations in its encoding gene (CDH1) are causative of HDGC and occur in about 40% of patients. Importantly, while in most cases CDH1 alterations result in the complete loss of E-cadherin associated with a well-established clinical impact, in about 20% of cases the mutations are of the missense type. The latter are of particular concern in terms of genetic counselling and clinical management, as the effect of the sequence variants in E-cadherin function is not predictable. If a deleterious variant is identified, prophylactic surgery could be recommended. Therefore, over the last few years, intensive research has focused on evaluating the functional consequences of CDH1 missense variants and in assessing E-cadherin pathogenicity. In that context, our group has contributed to better characterize CDH1 germline missense variants and is now considered a worldwide reference centre. In this review, we highlight the state of the art methodologies to categorize CDH1 variants, as neutral or deleterious. This information is subsequently integrated with clinical data for genetic counseling and management of CDH1 variant carriers.
|
27
|
Gray VE, Hause RJ, Luebeck J, Shendure J, Fowler DM. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. Cell Syst 2017; 6:116-124.e3. [PMID: 29226803 DOI: 10.1016/j.cels.2017.11.003] [Citation(s) in RCA: 118] [Impact Index Per Article: 16.9] [Received: 05/20/2017] [Revised: 08/30/2017] [Accepted: 11/03/2017] [Indexed: 11/26/2022]
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
Affiliation(s)
- Vanessa E Gray
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Ronald J Hause
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jens Luebeck
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.
|
28
|
Lee TS, Potts SJ, McGinniss MJ, Strom CM. Multiple Property Tolerance Analysis for the Evaluation of Missense Mutations. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Computational prediction of the impact of a mutation on protein function is still not accurate enough for clinical diagnostics without additional human expert analysis. Sequence alignment-based methods have been extensively used but their results highly depend on the quality of the input alignments and the choice of sequences. Incorporating the structural information with alignments improves prediction accuracy. Here, we present a conservation of amino acid properties method for mutation prediction, Multiple Properties Tolerance Analysis (MuTA), and a new strategy, MuTA/S, to incorporate the solvent accessible surface (SAS) property into MuTA. Instead of combining multiple features by machine learning or mathematical methods, an intuitive strategy is used to divide the residues of a protein into different groups, and in each group the properties used are adjusted. The results for LacI, lysozyme, and HIV protease show that MuTA performs as well as the widely used SIFT algorithm while MuTA/S outperforms SIFT and MuTA by 2%–25% in terms of prediction accuracy. By incorporating the SAS term alone, the alignment dependency of overall prediction accuracy is significantly reduced. MuTA/S also defines a new way to incorporate any structural features and knowledge and may lead to more accurate predictions.
Affiliation(s)
- Tai-Sung Lee
- Consortium for Bioinformatics and Computational Biology, and Department of Chemistry, University of Minnesota, P.O. Box 14800, Minneapolis, MN 55414
| | - Steven J. Potts
- Quest Diagnostics Nichols Institute, 33608 Ortega Highway, San Juan Capistrano, CA 92690
| | - Matthew J. McGinniss
- Quest Diagnostics Nichols Institute, 33608 Ortega Highway, San Juan Capistrano, CA 92690
| | - Charles M. Strom
- Quest Diagnostics Nichols Institute, 33608 Ortega Highway, San Juan Capistrano, CA 92690
|
29
|
Zhang J, Kinch LN, Cong Q, Weile J, Sun S, Cote AG, Roth FP, Grishin NV. Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I. Hum Mutat 2017; 38:1051-1063. [PMID: 28817247 PMCID: PMC5746193 DOI: 10.1002/humu.23293] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/29/2017] [Revised: 06/29/2017] [Accepted: 06/30/2017] [Indexed: 11/07/2022]
Abstract
The exponential growth of genomic variants uncovered by next-generation sequencing necessitates efficient and accurate computational analyses to predict their functional effects. A number of computational methods have been developed for the task, but few unbiased comparisons of their performance are available. To fill the gap, The Critical Assessment of Genome Interpretation (CAGI) comprehensively assesses phenotypic predictions on newly collected experimental datasets. Here, we present the results of the SUMO conjugase challenge where participants were predicting functional effects of missense mutations in human SUMO-conjugating enzyme UBE2I. The performance of the predictors is similar to each other and is far from perfection. Evolutionary information from sequence alignments dominates the success: deleterious mutations at conserved positions and benign mutations at variable positions are accurately predicted. Prediction accuracy of other mutations remains unsatisfactory, and this fast-growing field of research is yet to learn the use of spatial structure information to improve the predictions significantly.
Affiliation(s)
- Jing Zhang
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
| | - Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9050, USA
| | - Qian Cong
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
| | - Jochen Weile
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Song Sun
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Atina G Cote
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Frederick P. Roth
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9050, USA
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
|
30
|
Pan Y, Liu D, Deng L. Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties. PLoS One 2017; 12:e0179314. [PMID: 28614374 PMCID: PMC5470696 DOI: 10.1371/journal.pone.0179314] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 04/01/2017] [Accepted: 05/27/2017] [Indexed: 12/20/2022] Open
Abstract
Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associations, and can help further prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that can effectively predict how likely SAVs are to be associated with disease by incorporating gradient tree boosting (GTB) algorithm and optimally selected neighborhood features. A two-step feature selection approach is used to explore the most relevant and informative neighborhood properties that contribute to the prediction of disease association of SAVs across a wide range of sequence and structural features, especially some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performances with an AUC score of 0.908 and a specificity of 0.838, which are significantly better than that of the other existing methods. Furthermore, we validate the capability of our proposed method by an independent test and gain a competitive advantage as a result. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, can return reliable predictions in distinguishing between disease-associated and neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as increased overall performance.
Affiliation(s)
- Yuliang Pan
- School of Software, Central South University, Changsha, China
| | - Diwei Liu
- School of Software, Central South University, Changsha, China
| | - Lei Deng
- School of Software, Central South University, Changsha, China
- Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
|
31
|
Molecular modeling and molecular dynamic simulation of the effects of variants in the TGFBR2 kinase domain as a paradigm for interpretation of variants obtained by next generation sequencing. PLoS One 2017; 12:e0170822. [PMID: 28182693 PMCID: PMC5300139 DOI: 10.1371/journal.pone.0170822] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 10/14/2016] [Accepted: 01/11/2017] [Indexed: 01/01/2023] Open
Abstract
Variants in the TGFBR2 kinase domain cause several human diseases and can increase propensity for cancer. The widespread application of next generation sequencing within the setting of Individualized Medicine (IM) is increasing the rate at which TGFBR2 kinase domain variants are being identified. However, their clinical relevance is often uncertain. Consequently, we sought to evaluate the use of molecular modeling and molecular dynamics (MD) simulations for assessing the potential impact of variants within this domain. We documented the structural differences revealed by these models across 57 variants using independent MD simulations for each. Our simulations revealed various mechanisms by which variants may lead to functional alteration; some are revealed energetically, while others structurally or dynamically. We found that the ATP binding site and activation loop dynamics may be affected by variants at positions throughout the structure. This prediction cannot be made from the linear sequence alone. We present our structure-based analyses alongside those obtained using several commonly used genomics-based predictive algorithms. We believe the further mechanistic information revealed by molecular modeling will be useful in guiding the examination of clinically observed variants throughout the exome, as well as those likely to be discovered in the near future by clinical tests leveraging next-generation sequencing through IM efforts.
|
32
|
Mahdieh N, Rabbani B. Beta thalassemia in 31,734 cases with HBB gene mutations: Pathogenic and structural analysis of the common mutations; Iran as the crossroads of the Middle East. Blood Rev 2016; 30:493-508. [DOI: 10.1016/j.blre.2016.07.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/19/2015] [Revised: 06/13/2016] [Accepted: 07/08/2016] [Indexed: 12/16/2022]
|
33
|
Jardin F, Pujals A, Pelletier L, Bohers E, Camus V, Mareschal S, Dubois S, Sola B, Ochmann M, Lemonnier F, Viailly PJ, Bertrand P, Maingonnat C, Traverse-Glehen A, Gaulard P, Damotte D, Delarue R, Haioun C, Argueta C, Landesman Y, Salles G, Jais JP, Figeac M, Copie-Bergman C, Molina TJ, Picquenot JM, Cornic M, Fest T, Milpied N, Lemasle E, Stamatoullas A, Moeller P, Dyer MJS, Sundstrom C, Bastard C, Tilly H, Leroy K. Recurrent mutations of the exportin 1 gene (XPO1) and their impact on selective inhibitor of nuclear export compounds sensitivity in primary mediastinal B-cell lymphoma. Am J Hematol 2016; 91:923-30. [PMID: 27312795 DOI: 10.1002/ajh.24451] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 06/06/2016] [Revised: 06/11/2016] [Accepted: 06/14/2016] [Indexed: 02/01/2023]
Abstract
Primary mediastinal B-cell lymphoma (PMBL) is an entity of B-cell lymphoma distinct from the other molecular subtypes of diffuse large B-cell lymphoma (DLBCL). We investigated the prevalence, specificity, and clinical relevance of mutations of XPO1, which encodes a member of the karyopherin-β nuclear transporters, in a large cohort of PMBL. PMBL cases defined histologically or by gene expression profiling (GEP) were sequenced and the XPO1 mutational status was correlated to genetic and clinical characteristics. The XPO1 mutational status was also assessed in DLBCL, Hodgkin lymphoma (HL) and mediastinal gray-zone lymphoma (MGZL). The biological impact of the mutation on sensitivity to Selective Inhibitor of Nuclear Export (SINE) compounds (KPT-185/330) was investigated in vitro. XPO1 mutations were present in 28/117 (24%) PMBL cases and in 5/19 (26%) HL cases but absent/rare in MGZL (0/20) or DLBCL (3/197). A higher prevalence (50%) of the recurrent codon 571 variant (p.E571K) was observed in GEP-defined PMBL and was associated with shorter PFS. Age, International Prognostic Index and bulky mass were similar in XPO1 mutant and wild-type cases. KPT-185 induced a dose-dependent decrease in cell proliferation and increased cell death in PMBL cell lines harboring wild-type or XPO1 E571K mutant alleles. Experiments in transfected U2OS cells further confirmed that the XPO1 E571K mutation does not have a drastic impact on KPT-330 binding. To conclude, the XPO1 E571K mutation represents a genetic hallmark of the PMBL subtype and serves as a new relevant PMBL biomarker. SINE compounds appear active for both mutated and wild-type protein. Am. J. Hematol. 91:923-930, 2016. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Fabrice Jardin: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Anais Pujals: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
- Laura Pelletier: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
- Elodie Bohers: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Vincent Camus: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Sylvain Mareschal: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Sydney Dubois: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Brigitte Sola: Department of Hematology, Normandie Univ, UNICAEN, Caen, EA4652, France
- Marlène Ochmann: Department of Hematology, Inserm U917, CHU Pontchaillou, Rennes, France
- François Lemonnier: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
- Philippe Bertrand: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Philippe Gaulard: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
- Diane Damotte: Department of Hematology, Hospices Civils De Lyon, Lyon-1 University, Pierre Benite, CNRS UMR5239, France
- Richard Delarue: Department of Pathology, Hôpitaux Universitaires, Paris Centre, Team « Cancer, Immune Control, and Escape » INSERM U1138, Cordeliers Research Center, Paris, France
- Corinne Haioun: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
- Yosef Landesman: Department of Hematology, Necker Hospital, AP-HP, Paris, France
- Martin Figeac: Department of Genomics, Functional Genomic Platforms, IRCL, Lille, France
- Marie Cornic: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Thierry Fest: Department of Hematology, Inserm U917, CHU Pontchaillou, Rennes, France
- Noel Milpied: Department of Hematology, CHU De Bordeaux, France
- Emilie Lemasle: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Peter Moeller: Department of Pathology, Institute of Pathology, University of Ulm, Germany
- Martin J S Dyer: Department of Hematology, Ernest and Helen Scott Haematological Research Institute, University of Leicester, Leicester, United Kingdom
- Christian Bastard: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Hervé Tilly: Department of Hematology, Inserm U918, Centre Henri Becquerel, Rouen, France
- Karen Leroy: Department of Hematology, Inserm U955 Team 09, APHP Hospital Henri Mondor, Créteil, France
|
34
|
Lugo-Martinez J, Pejaver V, Pagel KA, Jain S, Mort M, Cooper DN, Mooney SD, Radivojac P. The Loss and Gain of Functional Amino Acid Residues Is a Common Mechanism Causing Human Inherited Disease. PLoS Comput Biol 2016; 12:e1005091. [PMID: 27564311 PMCID: PMC5001644 DOI: 10.1371/journal.pcbi.1005091] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/06/2015] [Accepted: 08/02/2016] [Indexed: 01/12/2023] Open
Abstract
Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies were however limited in scope to either individual molecular functions or were concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to assess probabilistically function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption. 
Identifying the molecular changes caused by mutations is a major challenge in understanding and treating human genetic disease. To address this problem, we have developed a wide range of profiling tools designed to predict specific types of functional site from protein 3D structures. We then apply these tools to data sets of inherited disease-associated and putatively neutral amino acid substitutions and estimate the relative contribution of the loss and gain of functional residues in disease. Our results suggest that alterations of molecular function are involved in a significant number of cases of human genetic disease and are over-represented as compared to putatively neutral variants. Additionally, we use experimental data to show that it is possible to computationally identify the loss of specific functional events in disease pathogenesis. Finally, our methodology can be used to reliably identify the potential molecular consequences of disease-causing genetic variants and hence prioritize experimental validation.
Affiliation(s)
- Jose Lugo-Martinez: Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Vikas Pejaver: Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Kymberleigh A. Pagel: Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Shantanu Jain: Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Matthew Mort: Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- David N. Cooper: Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Sean D. Mooney (corresponding author): Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Predrag Radivojac (corresponding author): Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
|
35
|
Schuch JB, Paixão-Côrtes VR, Friedrich DC, Tovo-Rodrigues L. The contribution of protein intrinsic disorder to understand the role of genetic variants uncovered by autism spectrum disorders exome studies. Am J Med Genet B Neuropsychiatr Genet 2016; 171B:479-91. [PMID: 26892727 DOI: 10.1002/ajmg.b.32431] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/24/2015] [Accepted: 02/04/2016] [Indexed: 01/09/2023]
Abstract
Several autism spectrum disorder (ASD) exome studies suggest that coding single nucleotide variants (SNVs) play an important role in ASD etiology. Usually, the pathogenic effect of missense mutations is estimated through predictors that lose accuracy for SNVs located in intrinsically disordered regions (IDRs) of proteins. Here, we used bioinformatics tools to investigate the effect on protein disorder of mutations described in published ASD exome studies (549 mutations), considering post-translational modification, PEST and Molecular Recognition Features (MoRFs) motifs. Schizophrenia and type 2 diabetes (T2D) datasets were created for comparison purposes. The frequency of mutations predicted as disordered was comparable among the three datasets (38.1% in ASD, 35.7% in schizophrenia, 46.4% in T2D). However, the frequency of SNVs predicted to lead to a gain or loss of functional sites or to change intrinsic disorder tendencies was higher in ASD and schizophrenia than in T2D (46.9%, 36.4%, and 23.1%, respectively). The results obtained by SIFT and PolyPhen-2 indicated that 38.9% and 34.4% of the mutations predicted, respectively, as tolerated and benign showed functional alterations in disorder properties. Given the frequency of mutations located in IDRs and their functional impact, this study suggests that alterations in intrinsic disorder properties might play a role in ASD and schizophrenia etiologies. They should be taken into consideration when researching the pathogenicity of mutations in neurodevelopmental and psychiatric diseases. Finally, mutations with functional alterations in disorder properties should be considered potential targets for in vitro and in vivo functional studies.
Affiliation(s)
- Jaqueline Bohrer Schuch: Department of Genetics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Deise C Friedrich: Department of Cellular and Molecular Biology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Rio Grande do Sul, Brazil
- Luciana Tovo-Rodrigues: Postgraduate Program in Epidemiology, Federal University of Pelotas (UFPel), Pelotas, Rio Grande do Sul, Brazil
|
36
|
Baugh EH, Simmons-Edler R, Müller CL, Alford RF, Volfovsky N, Lash AE, Bonneau R. Robust classification of protein variation using structural modelling and large-scale data integration. Nucleic Acids Res 2016; 44:2501-13. [PMID: 26926108 PMCID: PMC4824117 DOI: 10.1093/nar/gkw120] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 04/27/2015] [Accepted: 02/16/2016] [Indexed: 01/23/2023] Open
Abstract
Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
Affiliation(s)
- Evan H Baugh: Department of Biology, New York University, New York, NY 10003, USA; New York University Center for Genomics and Systems Biology, New York, NY 10003, USA
- Riley Simmons-Edler: Department of Biology, New York University, New York, NY 10003, USA; New York University Center for Genomics and Systems Biology, New York, NY 10003, USA
- Christian L Müller: New York University Center for Genomics and Systems Biology, New York, NY 10003, USA; Computer Science Department, New York University, New York, NY 10003, USA; Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA
- Rebecca F Alford: Carnegie Mellon University Department of Chemistry, 5000 Forbes Ave, Pittsburgh, PA 15289, USA; Commack High School, Commack, NY 11725, USA
- Richard Bonneau: Department of Biology, New York University, New York, NY 10003, USA; New York University Center for Genomics and Systems Biology, New York, NY 10003, USA; Computer Science Department, New York University, New York, NY 10003, USA; Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA; Simons Foundation, New York, NY 10010, USA
|
37
|
Bhowmick A, Sharma SC, Honma H, Head-Gordon T. The role of side chain entropy and mutual information for improving the de novo design of Kemp eliminases KE07 and KE70. Phys Chem Chem Phys 2016; 18:19386-96. [DOI: 10.1039/c6cp03622h] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/21/2022]
Abstract
Side chain entropy and mutual entropy information between residue pairs have been calculated for two de novo designed Kemp eliminase enzymes, KE07 and KE70, and for their most improved versions at the end of laboratory directed evolution (LDE).
Affiliation(s)
- Asmit Bhowmick: Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, USA
- Sudhir C. Sharma: Department of Chemistry, University of California Berkeley, Berkeley, USA
- Hallie Honma: Department of Bioengineering, University of California Berkeley, Berkeley, USA
- Teresa Head-Gordon: Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, USA; Department of Chemistry
|
38
|
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures. Sci Rep 2015; 5:16141. [PMID: 26553411 PMCID: PMC4639851 DOI: 10.1038/srep16141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/20/2015] [Accepted: 10/01/2015] [Indexed: 11/08/2022] Open
Abstract
We present an algorithm, 'Layers', to peel the atoms of proteins as layers. Using Layers, we show an efficient way to transform protein structures into a 2D pattern, named the residue transition pattern (RTP), which is independent of molecular orientation. RTP explains the folding patterns of proteins, and hence identifying similarity between proteins is simpler and more reliable using RTP than with the standard sequence- or structure-based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main.
|
39
|
Brender JR, Zhang Y. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles. PLoS Comput Biol 2015; 11:e1004494. [PMID: 26506533 PMCID: PMC4624718 DOI: 10.1371/journal.pcbi.1004494] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Received: 03/25/2015] [Accepted: 08/06/2015] [Indexed: 11/18/2022] Open
Abstract
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies.
Few proteins carry out their tasks in isolation. Instead, proteins combine with each other in complicated ways that can be affected either by the natural genetic variation that occurs among people or by disease-causing mutations such as those that occur in cancer or in genetic disorders. To understand how these mutations affect our health, it is necessary to understand how mutations can affect the strength of the interactions that bind proteins together. This is a difficult task to do in a laboratory on a large scale, and scientists are increasingly turning to computational methods to predict these effects in advance. We show that by looking at the multiple alignments of similar protein-protein complex structures at the interface regions, new constraints based on the evolution of the three-dimensional structures of proteins can be made to predict which mutations are compatible with two proteins interacting and which are not.
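The evaluation metric quoted in this abstract, a Pearson correlation coefficient between predicted and observed binding free-energy changes, can be illustrated with a minimal sketch. The ΔΔG values below are invented for illustration and are not taken from the paper, and `pearson_r` is a hypothetical helper name rather than anything from the authors' software.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of co-deviations and sums of squared deviations about the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)

# Made-up binding free-energy changes (kcal/mol) for five hypothetical mutations.
predicted_ddg = [0.5, 1.2, -0.3, 2.1, 0.9]
observed_ddg = [0.7, 1.0, -0.1, 2.5, 0.4]
r = pearson_r(predicted_ddg, observed_ddg)
```

A value of `r` near 1 means the predictions rank and scale the mutations much as the measurements do; it says nothing about systematic offsets, which is why such studies often report an error measure alongside it.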
Affiliation(s)
- Jeffrey R. Brender: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang (corresponding author): Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
|
40
|
Anoosha P, Huang LT, Sakthivel R, Karunagaran D, Gromiha MM. Discrimination of driver and passenger mutations in epidermal growth factor receptor in cancer. Mutat Res 2015; 780:24-34. [PMID: 26264175 DOI: 10.1016/j.mrfmmm.2015.07.005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/18/2015] [Revised: 05/21/2015] [Accepted: 07/07/2015] [Indexed: 06/04/2023]
Abstract
Cancer is one of the most life-threatening diseases, and mutations in several genes are a major cause of tumorigenesis. Protein kinases play essential roles in cancer progression and, specifically, epidermal growth factor receptor (EGFR) is an important target for cancer therapy. In this work, we have developed a method to classify single amino acid polymorphisms (SAPs) in EGFR into disease-causing (driver) and neutral (passenger) mutations using both sequence- and structure-based features of the mutation site with machine learning approaches. We compiled a set of 222 features and selected a set of 21 properties using feature selection methods to maximize the prediction performance. In a set of 540 mutants, we obtained an overall classification accuracy of 67.8% with 10-fold cross-validation using support vector machines. Further, the mutations were grouped into four sets based on secondary structure and accessible surface area, which enhanced the overall classification accuracy to 80.2%, 81.9%, 77.9% and 75.1% for helix, strand, coil-buried and coil-exposed mutants, respectively. The method was tested with a blind dataset of 60 mutations, which showed an average accuracy of 85.4%. These accuracy levels are superior to other methods available in the literature for EGFR mutants, with an increase of more than 30%. Moreover, we have screened all possible SAPs in EGFR and suggested the probable driver and passenger mutations, which would help in the development of mutation-specific drugs for cancer treatment.
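The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for the support vector machine used in the paper, the two-feature data set is synthetic, and all function names (`k_fold_indices`, `cross_val_accuracy`, and so on) are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy of a nearest-centroid classifier over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Fit: one centroid per class, using training samples only.
        centroids = {}
        for label in set(y[i] for i in train_idx):
            centroids[label] = centroid([X[i] for i in train_idx if y[i] == label])
        # Predict each held-out sample as the class of its nearest centroid.
        correct = sum(
            1 for i in test_idx
            if min(centroids, key=lambda c: sq_dist(X[i], centroids[c])) == y[i]
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Synthetic data: "driver" points cluster near (2, 2), "passenger" points near (0, 0).
rng = random.Random(42)
X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)]
X += [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
y = ["driver"] * 50 + ["passenger"] * 50
mean_accuracy = cross_val_accuracy(X, y, k=10)
```

The point of the protocol is that every sample is scored exactly once by a model that never saw it during fitting, so the averaged accuracy is a less optimistic estimate than accuracy on the training data.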
Affiliation(s)
- P Anoosha: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- Liang-Tsung Huang: Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan
- R Sakthivel: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- D Karunagaran: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- M Michael Gromiha: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
|
41
|
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
|
42
|
Rockah-Shmuel L, Tóth-Petróczy Á, Tawfik DS. Systematic Mapping of Protein Mutational Space by Prolonged Drift Reveals the Deleterious Effects of Seemingly Neutral Mutations. PLoS Comput Biol 2015; 11:e1004421. [PMID: 26274323 PMCID: PMC4537296 DOI: 10.1371/journal.pcbi.1004421] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 02/03/2015] [Accepted: 06/30/2015] [Indexed: 11/18/2022] Open
Abstract
Systematic mappings of the effects of protein mutations are becoming increasingly popular. Unexpectedly, these experiments often find that proteins are tolerant to most amino acid substitutions, including substitutions in positions that are highly conserved in nature. To obtain a more realistic distribution of the effects of protein mutations, we applied a laboratory drift comprising 17 rounds of random mutagenesis and selection of M.HaeIII, a DNA methyltransferase. During this drift, multiple mutations gradually accumulated. Deep sequencing of the drifted gene ensembles allowed determination of the relative effects of all possible single nucleotide mutations. Despite being averaged across many different genetic backgrounds, about 67% of all nonsynonymous, missense mutations were evidently deleterious, and an additional 16% were likely to be deleterious. In the early generations, the frequency of most deleterious mutations remained high. However, by the 17th generation, their frequency was consistently reduced, and those remaining were accepted alongside compensatory mutations. The tolerance to mutations measured in this laboratory drift correlated with sequence exchanges seen in M.HaeIII’s natural orthologs. The biophysical constraints dictating purging in nature and in this laboratory drift also seemed to overlap. Our experiment therefore provides an improved method for measuring the effects of protein mutations that more closely replicates the natural evolutionary forces, and thereby a more realistic view of the mutational space of proteins.
Understanding and predicting the effects of single nucleotide polymorphisms (SNPs) is of fundamental importance in many fields. Systematic experimental mappings of the effects of such mutations within a given gene/protein comprise an essential experimental tool for determining protein function and for refining models of protein evolution, as well as an important resource for improving prediction algorithms. Here, we present the results of a laboratory system that mimics the manner by which protein sequences diverge in nature: a prolonged process of gradually accumulating random mutations that retain the protein’s structure and function. The change in frequencies of mutations over generations, as obtained by deep sequencing, enabled us to assess the relative effects of all possible SNPs against the background of an accumulating number of mutations. In contrast to previous reports, we found that > 80% of all possible amino acid exchanges have potential deleterious effects, with 67% being clearly deleterious. Tolerance vs. purging of mutations in our prolonged drift also showed better correlation with natural diversity. Overall, our experimental setup provides a better understanding of how protein sequences diverge in nature, plus a new basis for improving the prediction accuracy of the effects of protein mutations, and specifically of SNPs.
Affiliation(s)
- Liat Rockah-Shmuel: Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Ágnes Tóth-Petróczy: Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Dan S. Tawfik (corresponding author): Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
|
43
|
Cardoso JGR, Andersen MR, Herrgård MJ, Sonnenschein N. Analysis of genetic variation and potential applications in genome-scale metabolic modeling. Front Bioeng Biotechnol 2015; 3:13. [PMID: 25763369 PMCID: PMC4329917 DOI: 10.3389/fbioe.2015.00013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/25/2014] [Accepted: 01/22/2015] [Indexed: 11/13/2022] Open
Abstract
Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology, there have been efforts to exploit genetic variation in our favor to create strains with favorable phenotypes. Genetic variation can either be present in natural populations or it can be artificially created by mutagenesis and selection or adaptive laboratory evolution. On the other hand, unintended genetic variation during a long term production process may lead to significant economic losses and it is important to understand how to control this type of variation. With the emergence of next-generation sequencing technologies, genetic variation in microbial strains can now be determined on an unprecedented scale and resolution by re-sequencing thousands of strains systematically. In this article, we review challenges in the integration and analysis of large-scale re-sequencing data, present an extensive overview of bioinformatics methods for predicting the effects of genetic variants on protein function, and discuss approaches for interfacing existing bioinformatics approaches with genome-scale models of cellular processes in order to predict effects of sequence variation on cellular phenotypes.
Affiliation(s)
- João G. R. Cardoso
  - The Novo Nordisk Foundation Center of Biosustainability, Technical University of Denmark, Hørsholm, Denmark
- Markus J. Herrgård
  - The Novo Nordisk Foundation Center of Biosustainability, Technical University of Denmark, Hørsholm, Denmark
- Nikolaus Sonnenschein
  - The Novo Nordisk Foundation Center of Biosustainability, Technical University of Denmark, Hørsholm, Denmark
|
44
|
Bioinformatics tools for discovery and functional analysis of single nucleotide polymorphisms. Adv Exp Med Biol 2015; 827:287-310. [PMID: 25387971 DOI: 10.1007/978-94-017-9245-5_17] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/26/2023]
Abstract
With high-speed DNA sequencing of genomes, databases of genome data continue to grow, and the understanding of genetic variation between individuals grows as well. Single nucleotide polymorphisms (SNPs), a main type of genetic variation, are an increasingly important resource for understanding the structure and function of the human genome and have become a valuable resource for investigating the genetic basis of disease. In recent years, in addition to experimental approaches to characterize specific variants, intensive bioinformatics techniques have been applied to understand the effects of these genetic changes. Genetic studies aim to understand the molecular basis of disease, and computational methods are becoming increasingly important for SNP selection, prediction, and understanding the downstream effects of genetic variation. The review provides systematic information on the available resources and methods for SNP discovery and analysis. We also report some new results on DNA sequence-based prediction of SNPs in human cytochrome P450, which serves as an example of computational methods to predict and discover SNPs. Additionally, annotation and prediction of functional SNPs, as well as a comprehensive list of existing tools and online resources, are reviewed and described.
|
45
|
Magesh R, George Priya Doss C. Computational pipeline to identify and characterize functional mutations in ornithine transcarbamylase deficiency. 3 Biotech 2014; 4:621-634. [PMID: 28324312 PMCID: PMC4235886 DOI: 10.1007/s13205-014-0216-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/12/2014] [Accepted: 04/01/2014] [Indexed: 11/28/2022] Open
Abstract
Ornithine transcarbamylase (OTC) (E.C. 2.1.3.3) is one of the enzymes in the urea cycle, which involves a sequence of reactions in liver cells. Protein metabolism in the body produces surplus nitrogen, which is converted into urea and excreted by the kidneys; in this cycle, OTC helps convert free toxic nitrogen into urea. Ornithine transcarbamylase deficiency (OTCD: OMIM#311250) is caused by mutations in the OTC gene. To date, more than 200 mutations have been reported. Mutations in the OTC gene alter enzyme production, which impairs the enzyme's ability to carry out the chemical reaction. A computational analysis was performed to identify deleterious nsSNPs in the OTC gene that cause OTCD, using five computational tools: SIFT, PolyPhen 2, I-Mutant 3, SNPs&Go, and PhD-SNP. The molecular basis of the OTC gene and OTCD has been only partially studied to date. Hence, in silico categorization of functional SNPs in the OTC gene can provide valuable insight for the diagnosis and treatment of OTCD in the near future.
Affiliation(s)
- R Magesh
  - Department of Biotechnology, Faculty of Biomedical Sciences, Technology and Research, Sri Ramachandra University, Chennai, 600116, India
- C George Priya Doss
  - Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore, India
|
46
|
Katsonis P, Koire A, Wilson SJ, Hsu TK, Lua RC, Wilkins AD, Lichtarge O. Single nucleotide variations: biological impact and theoretical interpretation. Protein Sci 2014; 23:1650-66. [PMID: 25234433 PMCID: PMC4253807 DOI: 10.1002/pro.2552] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 08/06/2014] [Revised: 09/12/2014] [Accepted: 09/15/2014] [Indexed: 12/27/2022]
Abstract
Genome-wide association studies (GWAS) and whole-exome sequencing (WES) generate massive amounts of genomic variant information, and a major challenge is to identify which variations drive disease or contribute to phenotypic traits. Because the majority of known disease-causing mutations are exonic non-synonymous single nucleotide variations (nsSNVs), most studies focus on whether these nsSNVs affect protein function. Computational studies show that the impact of nsSNVs on protein function reflects sequence homology and structural information and predict the impact through statistical methods, machine learning techniques, or models of protein evolution. Here, we review impact prediction methods and discuss their underlying principles, their advantages and limitations, and how they compare to and complement one another. Finally, we present current applications and future directions for these methods in biological research and medical genetics.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Amanda Koire
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
- Stephen Joseph Wilson
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Teng-Kuei Hsu
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Rhonald C Lua
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Angela Dawn Wilkins
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas
|
47
|
Abstract
Computational approaches for determining disease-associated non-synonymous single nucleotide polymorphisms (nsSNPs) have evolved very rapidly. A large number of tools for detecting deleterious and disease-associated nsSNPs have been developed in the last decade, showing high prediction reliability. Despite all these highly efficient tools, we still lack accuracy in determining the genotype-phenotype association of predicted nsSNPs. Furthermore, many questions have yet to be addressed computationally before we can talk about prediction accuracy. Earlier, we incorporated molecular dynamics simulation approaches to improve the accuracy of the computational nsSNP analysis roadmap, which further helped us determine the changes in protein phenotype associated with computationally predicted disease-associated mutations. Here we discuss the present state of computational nsSNP characterization techniques and some of the questions that are crucial for properly understanding the pathogenicity level of any disease-associated mutation.
|
48
|
Berliner N, Teyra J, Çolak R, Garcia Lopez S, Kim PM. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PLoS One 2014; 9:e107353. [PMID: 25243403 PMCID: PMC4170975 DOI: 10.1371/journal.pone.0107353] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 05/09/2014] [Accepted: 07/21/2014] [Indexed: 12/04/2022] Open
Abstract
Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability and help us better understand the molecular causes of diseases.
Affiliation(s)
- Niklas Berliner
  - Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Joan Teyra
  - Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Recep Çolak
  - Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Sebastian Garcia Lopez
  - Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
  - Universidad Nacional de Colombia, Manizales, Colombia
- Philip M. Kim
  - Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
|
49
|
Jia M, Yang B, Li Z, Shen H, Song X, Gu W. Computational analysis of functional single nucleotide polymorphisms associated with the CYP11B2 gene. PLoS One 2014; 9:e104311. [PMID: 25102047 PMCID: PMC4125216 DOI: 10.1371/journal.pone.0104311] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/27/2014] [Accepted: 07/07/2014] [Indexed: 12/17/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans and play a major role in the genetics of human phenotype variation and the genetic basis of human complex diseases. Recently, there has been considerable interest in understanding the possible role of the CYP11B2 gene in corticosterone methyl oxidase deficiency, primary aldosteronism, and cardio-cerebro-vascular diseases. Hence, the elucidation of the function and molecular dynamic behavior of CYP11B2 mutations is crucial in current genomics. In this study, we investigated the pathogenic effect of 51 nsSNPs and 26 UTR SNPs in the CYP11B2 gene through computational platforms. Using a combination of SIFT, PolyPhen, I-Mutant Suite, and the ConSurf server, four nsSNPs (F487V, V129M, T498A, and V403E) were identified as potentially affecting the structure, function, and activity of the CYP11B2 protein. Furthermore, molecular dynamics simulation and structure analyses also confirmed the impact of these nsSNPs on the stability and secondary properties of the CYP11B2 protein. Additionally, utilizing UTRscan, MirSNP, PolymiRTS, and miRNASNP, three SNPs in the 3'UTR region were predicted to exhibit a pattern change in the upstream open reading frames (uORF), and eight microRNA binding sites were found to be highly affected by 3'UTR SNPs. This cataloguing of deleterious SNPs is essential for narrowing down the number of CYP11B2 mutations to be screened in genetic association studies and for a better understanding of the functional and structural aspects of the CYP11B2 protein.
Affiliation(s)
- Minyue Jia
  - Department of Endocrinology and Metabolism, the Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, China
- Boyun Yang
  - Department of Endocrinology and Metabolism, the Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, China
- Zhongyi Li
  - Department of Urology, the Second Affiliated Hospital (Binjiang Branch) Zhejiang University School of Medicine, Hangzhou Binjiang Hospital, Hangzhou, China
- Huiling Shen
  - Department of Endocrinology and Metabolism, the Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, China
- Xiaoxiao Song
  - Department of Endocrinology and Metabolism, the Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, China
- Wei Gu
  - Department of Endocrinology and Metabolism, the Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, China
|
50
|
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
|