1
Shukla K, Idanwekhai K, Naradikian M, Ting S, Schoenberger SP, Brunk E. Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation. J Chem Inf Model 2024; 64:5328-5343. [PMID: 38635316 DOI: 10.1021/acs.jcim.3c01967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Research in the human genome sciences generates a substantial amount of genetic data for hundreds of thousands of individuals, which concomitantly increases the number of variants of unknown significance (VUS). Bioinformatic analyses can successfully reveal rare variants and variants with clear associations with disease-related phenotypes. These studies have had a significant impact on how clinical genetic screens are interpreted and how patients are stratified for treatment. However, there are few, if any, computational methods that predict how a variant alters biological activity. To address this gap, we developed a machine learning method that uses three-dimensional protein structures from AlphaFold to predict how a variant will change the activity of a gene's downstream biological pathways. We trained state-of-the-art machine learning classifiers to predict which protein regions are most likely to impact the transcriptional activities of two proto-oncogenes, nuclear factor erythroid 2-related factor 2 (NRF2, encoded by NFE2L2) and c-Myc. We identified classifiers that attain accuracies higher than 80%, which allowed us to identify a set of key protein regions that lead to significant perturbations in c-Myc or NRF2 transcriptional pathway activities.
Affiliation(s)
- Kriti Shukla
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Kelvin Idanwekhai
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Martin Naradikian
  - La Jolla Institute for Immunology, San Diego, California 92093, United States
- Stephanie Ting
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Elizabeth Brunk
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Integrative Program for Biological and Genome Sciences (IBGS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States

2
Wang Y, Fang G, Chen X, Cao Y, Wu N, Cui Q, Zhu C, Qian L, Huang Y, Zhan S. The genome of the black cutworm Agrotis ipsilon. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2021; 139:103665. [PMID: 34624466 DOI: 10.1016/j.ibmb.2021.103665] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/05/2021] [Revised: 09/29/2021] [Accepted: 09/29/2021] [Indexed: 06/13/2023]
Abstract
The black cutworm (BCW), Agrotis ipsilon, is a worldwide polyphagous, underground pest that causes heavy economic losses across a wide range of crops by damaging their roots. This species undertakes seasonal non-directed migration throughout East and Southeast Asia. The lack of genome information has limited further study of its unique biology and the development of novel management approaches. In this study, we present a 476 Mb de novo assembly of the BCW genome, along with a consensus set of 14,801 protein-coding gene models. Quality controls show that both the genome assembly and the annotations are of high quality and largely complete. We focus manual annotation and comparative genomics on gene families related to the unique attributes of this species, such as nocturnality, long-distance migration, and host adaptation. We find that the BCW genome encodes a gene repertoire in various migration-related gene families similar to that of the diurnal migratory butterfly Danaus plexippus, with additional copies of long-wavelength opsin and two eye development-related genes. We also find that the genomes of BCW and many other polyphagous lepidopterans encode many more gustatory receptor (GR) genes, particularly lineage-specific expanded bitter receptor genes, than those of mono- or oligophagous species, suggesting that GR expansion plays a common role in host range expansion. The BCW genome provides a valuable resource for studying the molecular mechanisms of non-directed migration in lepidopteran pests and for developing novel strategies to control migratory nocturnal pests.
Affiliation(s)
- Yaohui Wang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Gangqi Fang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, China
- Xi'en Chen
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Yanghui Cao
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Ningning Wu
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Qian Cui
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, China
- Chenxu Zhu
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Lansa Qian
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Yongping Huang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, China
- Shuai Zhan
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, China

3
Fang G, Zhang Q, Chen X, Cao Y, Wang Y, Qi M, Wu N, Qian L, Zhu C, Huang Y, Zhan S. The draft genome of the Asian corn borer yields insights into ecological adaptation of a devastating maize pest. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2021; 138:103638. [PMID: 34428581 DOI: 10.1016/j.ibmb.2021.103638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/05/2021] [Revised: 08/13/2021] [Accepted: 08/16/2021] [Indexed: 06/13/2023]
Abstract
The Asian corn borer (ACB) is the most devastating maize pest in the western Pacific region of Asia. Despite broad interest in the insecticide resistance, seasonal adaptation, and larval color mimicry of ACB, the lack of reference genomic information and of an efficient gene editing approach has hindered in-depth study of these traits. Here we present a 455.7 Mb draft genome of ACB with 98.4% completeness. Comparative genomics analysis showed an evident expansion of the gustatory receptor gene family (105 genes), which is related to the species' polyphagy. Based on comparative transcriptome analysis of Cry1Ab-resistant and -susceptible ACB, we identified 26 genes related to resistance to the Bt toxin Cry1Ab. Additionally, transcriptomics of insects exposed to conditions of low temperature and diapause (LT) vs. room temperature and diapause (RT) provided insights into the genetic mechanisms of cold adaptation. We also developed an efficient CRISPR/Cas9-based genome editing system and applied it to explore the role of color pattern genes in the ecological adaptation of ACB. Taken together, our study provides a fully annotated, high-quality reference genome and an efficient gene editing system that realize the potential of ACB as a study system for important biological questions such as insecticide resistance, seasonal adaptation, and coloration. These genomic resources will also benefit the development of novel strategies for maize pest management.
Affiliation(s)
- Gangqi Fang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
  - CAS Center for Excellence in Biotic Interactions, University of the Chinese Academy of Sciences, Beijing, 100049, China
- Qi Zhang
  - College of Plant Protection, Shenyang Agricultural University, Shenyang, 110866, China
- Xi'en Chen
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Yanghui Cao
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Yaohui Wang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Mengmeng Qi
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
  - CAS Center for Excellence in Biotic Interactions, University of the Chinese Academy of Sciences, Beijing, 100049, China
- Ningning Wu
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Lansa Qian
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Chenxu Zhu
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Yongping Huang
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
  - CAS Center for Excellence in Biotic Interactions, University of the Chinese Academy of Sciences, Beijing, 100049, China
- Shuai Zhan
  - CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
  - CAS Center for Excellence in Biotic Interactions, University of the Chinese Academy of Sciences, Beijing, 100049, China

4
P R, Ramireddy S, Chakraborty S, Mukherjee S, J S, C S. Structural localization of pathogenic mutations in the central nucleotide-binding domain (NBD) of nucleotide-binding oligomerization domain-2 (NOD2) protein and their inference in inflammatory disorders. NUCLEOSIDES NUCLEOTIDES & NUCLEIC ACIDS 2021; 40:1198-1219. [PMID: 34622739 DOI: 10.1080/15257770.2021.1986719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The human NBD domain, centrally located in the NOD2 protein, plays an essential role in oligomerization and initiates the immune response via CARD-RIPK2 interaction. Mutations in the NBD domain have been widely implicated in inflammatory disorders such as Blau syndrome and sarcoidosis. This study aims to determine the structural and phenotypic effects of lethal mutations in the NBD domain that clearly contribute to protein dysfunction. The most deleterious missense mutations were first screened using several in silico tools. Of 33 variants, I-Mutant 3.0, SIFT, PolyPhen-2, Align GVGD, PhD-SNP and SNPs&GO statistically identified 5 (R42W, D90E, E91K, G189D and W198L) as less stable, deleterious and damaging. Our predicted models help explain structural properties of the native and mutant NBD domain, such as physicochemical properties, secondary structure arrangements and residues that stabilize folding, especially in functionally important regions. Of these mutants, R42W and G189D were the most prominent. Molecular simulations assessing stabilization factors, folding and essential dynamics strongly support significant conformational disruption in R42W and G189D. Overall, the regions α3 (residues 41-44), α13 (residues 185-191) and β6-β7 (residues 133-143) appear to adopt structures that are not conducive to wild-type-like functionality. Our prediction and validation of lethal mutations based on structural stability may be useful for designing detailed experimental studies of the protein deregulation that leads to inflammatory disorders.
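The variants in this abstract use the short one-letter substitution form (wild-type residue, position, mutant residue). The Python sketch below parses such labels and checks which reported region, if any, each position falls in. It assumes the regions are inclusive residue ranges read from the abstract as α3 (41-44), α13 (185-191) and β6-β7 (133-143), numbered in the same frame as the variant positions; `parse_substitution`, `regions_hit` and `REGIONS` are hypothetical names, not part of the study.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number, assumed to share the numbering of REGIONS
    mutant: str     # one-letter code of the substituted residue


# Matches short-form missense labels such as "R42W" (20 standard amino acids only).
_LABEL = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")

# Assumed inclusive residue ranges, as read from the abstract.
REGIONS = {
    "alpha3": (41, 44),
    "alpha13": (185, 191),
    "beta6-beta7": (133, 143),
}


def parse_substitution(label: str) -> Substitution:
    """Split a label like 'G189D' into wild-type residue, position and mutant residue."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def regions_hit(sub: Substitution) -> List[str]:
    """Names of the listed regions whose range contains the substituted position."""
    return [name for name, (start, end) in REGIONS.items() if start <= sub.position <= end]


if __name__ == "__main__":
    for label in ["R42W", "D90E", "E91K", "G189D", "W198L"]:
        hits = regions_hit(parse_substitution(label))
        print(label, hits if hits else "outside the listed regions")
```

Under these assumptions, R42W maps to α3 and G189D to α13, the two mutants the abstract singles out, while D90E, E91K and W198L fall outside the listed regions.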
Affiliation(s)
- Raghuraman P
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sriroopreddy Ramireddy
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sulagno Chakraborty
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sayani Mukherjee
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sreeshma J
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sudandiradoss C
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India

5
Li F, Fan C, Marquez-Lago TT, Leier A, Revote J, Jia C, Zhu Y, Smith AI, Webb GI, Liu Q, Wei L, Li J, Song J. PRISMOID: a comprehensive 3D structure database for post-translational modifications and mutations with functional impact. Brief Bioinform 2020; 21:1069-1079. [PMID: 31161204 PMCID: PMC7299293 DOI: 10.1093/bib/bbz050] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/21/2019] [Revised: 03/26/2019] [Accepted: 03/29/2019] [Indexed: 12/26/2022]
Abstract
Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Because of these roles, many major PTMs have been studied, and their functional and mechanistic characterization is well documented in several databases. However, most currently available databases focus mainly on protein sequences, while the actual 3D structures of PTMs have been largely ignored. As a result, studies of PTM 3D structural signatures have been severely limited by a lack of data. Here, we develop PRISMOID, a novel, publicly available and free 3D structure database for a wide range of PTMs. PRISMOID is an up-to-date, interactive online knowledge base with a specific focus on the 3D structural contexts of PTM sites and of functionally impactful mutations that occur on PTM sites or in close proximity to them. The first version of PRISMOID encompasses 17,145 non-redundant modification sites on 3919 related protein 3D structure entries, covering 37 different types of PTMs. Each entry web page is organized comprehensively, including detailed PTM annotation on the 3D structure and biological information on mutations affecting PTMs, secondary structure and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID provides a variety of interactive and customizable search options and data browsing functions, allowing users to access data efficiently via keyword, ID and combined advanced-option searches. A download page also allows users to download the SQL file, computational structural features and PTM site data.
We anticipate that PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians in conducting experiments and developing applications that support discovery of the sequence-structure-function relationships of PTMs, and providing important insight into the mechanisms by which mutations and PTM sites interact. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML, and all major browsers are supported.
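PRISMOID's emphasis on mutations near PTM sites in 3D structure, rather than along the sequence, can be illustrated with a simple distance query. This is a minimal sketch, not PRISMOID's method: it assumes one C-alpha coordinate per residue and a hypothetical 8 Å cutoff, and the toy coordinates and the name `residues_near_site` are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def residues_near_site(ca_coords: Dict[int, Coord], site: int, cutoff: float = 8.0) -> List[int]:
    """Residue numbers whose C-alpha is within `cutoff` angstroms of the PTM site's C-alpha.

    The 8.0 angstrom default is a hypothetical threshold chosen for illustration.
    """
    centre = ca_coords[site]
    return sorted(
        residue
        for residue, xyz in ca_coords.items()
        if residue != site and math.dist(centre, xyz) <= cutoff
    )


# Invented toy coordinates (angstroms): residue 40 is far from residue 10 in
# sequence but close to it in space.
TOY_CA = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    12: (7.6, 0.0, 0.0),
    40: (0.0, 6.0, 0.0),
    55: (20.0, 0.0, 0.0),
}

if __name__ == "__main__":
    print(residues_near_site(TOY_CA, site=10))
```

With the toy coordinates, a PTM site at residue 10 returns [11, 12, 40]: residue 40 counts as proximal despite being 30 positions away in sequence, which is the kind of relationship a sequence-only window would miss.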
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Cunshuo Fan
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Tatiana T Marquez-Lago
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Jerico Revote
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Cangzhi Jia
  - College of Science, Dalian Maritime University, Dalian, China
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Yan Zhu
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Leyi Wei
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jian Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia

6
Huang KY, Lee TY, Kao HJ, Ma CT, Lee CC, Lin TH, Chang WC, Huang HD. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res 2020; 47:D298-D308. [PMID: 30418626 PMCID: PMC6323979 DOI: 10.1093/nar/gky1074] [Citation(s) in RCA: 146] [Impact Index Per Article: 36.5] [Received: 09/15/2018] [Accepted: 10/19/2018] [Indexed: 12/25/2022]
Abstract
dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been maintained for over 10 years with the aim of providing functional and structural analyses of post-translational modifications (PTMs). In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and from manual curation of the literature but also provides PTM-disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). High-throughput deep sequencing has led to a surge in data on associations between SNPs and diseases, in both volume and scope. This update therefore integrates disease-associated nsSNPs from dbSNP based on genome-wide association studies. PTM substrate sites located within a specified number of amino acids of a residue encoded by an nsSNP were deemed to be associated with the involved diseases. In recent years, increasing evidence of crosstalk between PTMs has been reported. Although mass spectrometry-based proteomics has substantially improved our knowledge of the substrate site specificity of single PTMs, the possibility that combinatorial PTMs act in concert to regulate protein function and activity has been neglected. Because information on the co-occurrence frequency and functional relevance of PTM crosstalk is relatively limited, in this update, PTM sites neighboring other PTM sites within a specified window length were subjected to motif discovery and functional enrichment analysis. This update highlights the current challenges in PTM crosstalk investigation and addresses a bottleneck in how proteomics can contribute to understanding PTM codes, revealing the next level of data complexity and proteomic limitations in prospective PTM research.
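Both dbPTM analyses described here rest on sequence windows: PTM sites within a set number of residues of an nsSNP-encoded residue are linked to the associated disease, and PTM sites within a window of one another are treated as candidate crosstalk pairs. The abstract does not state the window sizes, so the `window=5` default below is a hypothetical value, and the function names and site positions are illustrative rather than part of dbPTM.

```python
from typing import Dict, List, Tuple


def ptm_sites_near_variants(
    ptm_sites: List[int], variant_positions: List[int], window: int = 5
) -> Dict[int, List[int]]:
    """For each nsSNP-encoded residue position, list PTM sites within +/- `window` residues."""
    return {
        pos: sorted(site for site in ptm_sites if abs(site - pos) <= window)
        for pos in variant_positions
    }


def crosstalk_pairs(ptm_sites: List[int], window: int = 5) -> List[Tuple[int, int]]:
    """Pairs of PTM sites lying within `window` residues of each other."""
    sites = sorted(set(ptm_sites))
    pairs = []
    for i, first in enumerate(sites):
        for second in sites[i + 1:]:
            if second - first > window:
                break  # sites are sorted, so later ones are even farther away
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    sites = [15, 18, 30, 120]  # hypothetical PTM site positions on one protein
    print(ptm_sites_near_variants(sites, [20, 100]))
    print(crosstalk_pairs(sites))
```

With these hypothetical positions, the variant at residue 20 is linked to the PTM sites at 15 and 18, the variant at 100 to none, and the only crosstalk pair is (15, 18). Sorting the sites first lets the inner loop stop early once the gap exceeds the window.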
Affiliation(s)
- Kai-Yao Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hui-Ju Kao
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chen-Tse Ma
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chao-Chun Lee
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Tsai-Hsuan Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Wen-Chi Chang
  - Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China

7
Sen Gupta PS, Islam RNUI, Banerjee S, Nayek A, Rana MK, Bandyopadhyay AK. Screening and molecular characterization of lethal mutations of human homogentisate 1, 2 dioxigenase. J Biomol Struct Dyn 2020; 39:1661-1671. [PMID: 32107984 DOI: 10.1080/07391102.2020.1736158] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Affiliation(s)
- Parth Sarthi Sen Gupta
  - Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
  - Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Berhampur, Ganjam, Odisha, India
- Rifat Nawaz UI Islam
  - Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
- Sahini Banerjee
  - Department of Biological Sciences, Indian Statistical Institute, Kolkata, West Bengal, India
- Arnab Nayek
  - Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
- Malay Kumar Rana
  - Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Berhampur, Ganjam, Odisha, India

8
Abstract
Aims: Post-translational modifications (PTMs), which comprise more than 450 types, can be regarded as a fundamental mechanism of cellular regulation.
Background: Recent experiments have demonstrated that lysine malonylation is a significant process in several organisms and cell types. Malonylation plays an important role in regulating protein subcellular localization, stability, translocation to lipid rafts and many other protein functions.
Objective: Identifying malonylation sites will contribute to understanding the underlying molecular mechanisms. However, existing experimental approaches are expensive and time-consuming and can hardly keep pace with high-throughput data generation. Moreover, some machine learning methods can hardly meet the high accuracy this task requires.
Methods: In this study, we propose MSIT (malonylation sites identification tree), a method that uses amino acid residue and profile information with a tree-structured neural network to identify lysine malonylation sites at the peptide-sequence level.
Results: The proposed algorithm achieves an F1 score of 0.8699 and a true positive rate of 89.34% on E. coli. MSIT outperformed existing malonylation site identification methods and features on datasets from different species.
Conclusion: These measures indicate that MSIT will be helpful in identifying candidate malonylation sites.
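The F1 score and true positive rate reported for MSIT are standard confusion-matrix metrics: with precision P = TP/(TP + FP) and recall (the true positive rate) R = TP/(TP + FN), F1 = 2PR/(P + R). The sketch below computes them from counts of true positives, false positives and false negatives; the counts in the example are invented and are not MSIT's results, and `classification_metrics` is a hypothetical helper name.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall (true positive rate) and F1 from confusion-matrix counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or true positive rate
    if precision + recall == 0:
        f1 = 0.0  # only possible when tp == 0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(classification_metrics(tp=80, fp=10, fn=20))
```

With these invented counts, precision is 8/9, recall is 0.8 and F1 is 16/19 (about 0.842). Because F1 is a harmonic mean, it stays close to the weaker of precision and recall.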
Affiliation(s)
- Wenzheng Bao
  - School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Yue-Hui Chen
  - School of Information, University of Jinan, Jinan 250022, China

9
Gopalakrishnan C, Al-Subaie AM, N N, Yeh HY, Tayubi IA, Kamaraj B. Prioritization of SNPs in y+LAT-1 culpable of Lysinuric protein intolerance and their mutational impacts using protein-protein docking and molecular dynamics simulation studies. J Cell Biochem 2019; 120:18496-18508. [PMID: 31211457 DOI: 10.1002/jcb.29172] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/11/2019] [Revised: 05/21/2019] [Accepted: 05/23/2019] [Indexed: 12/18/2022]
Abstract
Lysinuric protein intolerance (LPI) is a rare but harmful genetic disorder characterized by a paucity of essential dibasic amino acids in the cells. The amino acid transporter y+LAT-1 interacts with the 4F2 cell-surface antigen heavy chain to transport the required dibasic amino acids. Mutations in y+LAT-1 are thought to cause LPI. However, the underlying pathological mechanism is unknown; in this analysis, we investigate how point mutations in y+LAT-1 affect its interaction with the 4F2 cell-surface antigen heavy chain in causing LPI. Using an efficient and extensive computational pipeline, we identified the M50K and L334R single-nucleotide polymorphisms as the most deleterious mutations in y+LAT-1. Docking of mutant y+LAT-1 with the 4F2 cell-surface antigen heavy chain showed decreased interaction compared with native y+LAT-1. Further, molecular dynamics simulation analysis reveals that the proteins increase in size, become more flexible and alter their secondary structure upon mutation. We believe that these mutation-induced conformational changes could explain the decreased interaction with the 4F2 cell-surface antigen heavy chain that causes LPI. Our analysis gives pathological insights into LPI and helps researchers better understand the disease mechanism and develop an effective treatment strategy.
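An increase in protein size during molecular dynamics is commonly quantified with the radius of gyration, Rg = sqrt(Σ m_i |r_i − r_c|² / Σ m_i), where r_c is the mass-weighted centre. The abstract does not say which size metric the authors used, so treat this as an assumed, generic illustration; the coordinates below are invented and `radius_of_gyration` is a hypothetical helper. Computing Rg for each trajectory frame and comparing native and mutant averages is one way such a size change could be tracked.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration; unit masses are used when `masses` is None."""
    if not coords:
        raise ValueError("need at least one atom")
    if masses is None:
        masses = [1.0] * len(coords)
    if len(masses) != len(coords):
        raise ValueError("coords and masses must have the same length")
    total = sum(masses)
    # Mass-weighted centre of the structure.
    centre = [sum(m * xyz[k] for m, xyz in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance of the atoms from that centre.
    msd = sum(
        m * sum((xyz[k] - centre[k]) ** 2 for k in range(3))
        for m, xyz in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


if __name__ == "__main__":
    compact = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    expanded = [(2 * x, 2 * y, 2 * z) for x, y, z in compact]
    print(radius_of_gyration(compact), radius_of_gyration(expanded))
```

Scaling the toy structure by a factor of two doubles Rg (from 1.0 to 2.0), which is the sense in which a larger Rg indicates a less compact protein.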
Affiliation(s)
- Abeer Mohammed Al-Subaie
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
- Nagasundaram N
  - School of Humanities, Nanyang Technological University, Singapore
- Hui-Yuan Yeh
  - School of Humanities, Nanyang Technological University, Singapore
- Iftikhar Alam Tayubi
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Balu Kamaraj
  - Department of Neuroscience Technology, Imam Abdulrahman Bin Faisal University, Jubail, Saudi Arabia

10
Chong YK, Ho CC, Leung SY, Lau SK, Woo PC. Clinical Mass Spectrometry in the Bioinformatics Era: A Hitchhiker's Guide. Comput Struct Biotechnol J 2018; 16:316-334. [PMID: 30237866 PMCID: PMC6138949 DOI: 10.1016/j.csbj.2018.08.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/01/2018] [Revised: 08/20/2018] [Accepted: 08/21/2018] [Indexed: 02/06/2023]
Abstract
Mass spectrometry (MS) is a sensitive, specific and versatile analytical technique in the clinical laboratory that has recently undergone rapid development. From its initial use in metabolic profiling, it has matured into applications including clinical toxicology assays, targeted hormone and metabolite quantitation and, more recently, rapid microbial identification and antimicrobial resistance detection by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). In this mini-review, we first succinctly outline the basics of clinical mass spectrometry. Examples of hard ionization (electron ionization) and soft ionization (electrospray ionization, MALDI) are presented to demonstrate their clinical applications. Next, we discuss mass selection and determination conceptually, covering the quadrupole mass filter, the time-of-flight mass spectrometer and the Orbitrap, as well as MS/MS (tandem-in-space, tandem-in-time and data acquisition), illustrated with clinical examples. Current applications in (1) bacterial and fungal identification, antimicrobial susceptibility testing and phylogenetic classification, (2) general unknown urine toxicology screening and expanded newborn metabolic screening and (3) clinical metabolic profiling by gas chromatography are outlined. Finally, major limitations of MS-based techniques, including the technical challenges of matrix effects and isobaric interference, and novel challenges in the post-genomic era, such as protein molecular variants, are critically discussed from the perspective of service laboratories. Computer technology and structural biology have played important roles in the maturation of this field. MS-based techniques have the potential to replace current analytical techniques, and existing expertise and instruments will undergo rapid evolution. Significant automation and adaptation to regulatory requirements are underway. Mass spectrometry is unleashing its potential in clinical laboratories.
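Soft ionization methods such as electrospray ionization often produce multiply protonated ions, so the mass analyzer measures m/z rather than mass directly. For a positive ion [M + zH]^z+, m/z = (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. This general relationship is not taken from the review; the function names and the 10,000 Da example mass are illustrative assumptions.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz_of_protonated_ion(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]^z+ ion formed by adding `charge` protons to neutral mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover M from an observed m/z and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS_DA


if __name__ == "__main__":
    # A hypothetical 10,000 Da protein observed at several charge states.
    for z in (8, 10, 12):
        print(z, round(mz_of_protonated_ion(10_000.0, z), 4))
```

At charge 10+ the hypothetical 10,000 Da protein appears at m/z 1001.007276; higher charge states place the same protein at lower m/z, which is why large proteins can be analyzed within a modest m/z range.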
Affiliation(s)
- Yeow-Kuan Chong
  - Hospital Authority Toxicology Reference Laboratory, Department of Pathology, Princess Margaret Hospital (PMH), Kowloon, Hong Kong
  - Chemical Pathology and Medical Genetics, Department of Pathology, Princess Margaret Hospital (PMH), Kowloon, Hong Kong
- Chi-Chun Ho
  - Division of Chemical Pathology, Department of Clinical Pathology, Pamela Youde Nethersole Eastern Hospital (PYNEH), Hong Kong
  - Division of Clinical Biochemistry, Department of Pathology, Queen Mary Hospital (QMH), Hong Kong
  - Centre for Genomic Sciences, The University of Hong Kong, Hong Kong
  - Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
- Shui-Yee Leung
  - Department of Ocean Science, School of Science, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Susanna K.P. Lau
  - Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
  - State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, The University of Hong Kong, Hong Kong
  - Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
  - Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong
  - Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The University of Hong Kong, Hong Kong
- Patrick C.Y. Woo
  - Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
  - State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, The University of Hong Kong, Hong Kong
  - Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
  - Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong
  - Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The University of Hong Kong, Hong Kong
11
Shi S, Wang L, Cao M, Chen G, Yu J. Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications. Brief Bioinform 2018; 20:1597-1606. [DOI: 10.1093/bib/bby036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/15/2018] [Revised: 03/07/2018] [Indexed: 12/18/2022]
Abstract
Accumulating studies have indicated that amino acid variations, by changing the residue type at target sites or at key flanking residues, can directly or indirectly influence protein posttranslational modifications (PTMs) and have a detrimental effect on protein function. Computational mutation analysis can greatly reduce the experimental effort required. To make better use of current computational resources, we first provide an overview of the computational prediction of amino acid variations that influence protein PTMs and of their functional analysis. We also discuss the challenges that future in silico approaches will face. The development of better methods for analyzing PTM-related mutations will help to facilitate personalized precision medicine.
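The abstract distinguishes variants that replace a PTM target residue itself from those that alter key flanking residues. A minimal sketch of that distinction, assuming PTM sites are given as 1-based residue positions and using a hypothetical ±7-residue flanking window (the window size is an illustrative choice, not a value taken from the review):

```python
from typing import Iterable


def classify_variant(position: int, ptm_sites: Iterable[int], window: int = 7) -> str:
    """Classify a 1-based variant position relative to known PTM sites.

    Returns 'target_site' if the variant replaces a modified residue,
    'flanking' if it lies within +/- `window` residues of any site,
    and 'distal' otherwise. The default window is an assumption.
    """
    sites = set(ptm_sites)
    if position in sites:
        return "target_site"
    if any(abs(position - site) <= window for site in sites):
        return "flanking"
    return "distal"
```

For instance, with a modified residue at position 10, a substitution at 15 is reported as `'flanking'` and one at 30 as `'distal'`. Real predictors replace this positional rule with learned models over the sequence window.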
Affiliation(s)
- Shaoping Shi
  - Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Lina Wang
  - Department of Science, Nanchang Institute of Technology, Nanchang, Jiangxi 330031, China
- Man Cao
  - Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Guodong Chen
  - Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Jialin Yu
  - Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
12
Xu HD, Wang LN, Wen PP, Shi SP, Qiu JD. Site-Specific Systematic Analysis of Lysine Modification Crosstalk. Proteomics 2018. [DOI: 10.1002/pmic.201700292] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Affiliation(s)
- Hao-Dong Xu
  - Department of Chemistry, Nanchang University, No. 999 Xuefu Road, Nanchang Honggutan New District, Jiangxi Province 330031, P. R. China
- Li-Na Wang
  - Department of Chemistry, Nanchang University, No. 999 Xuefu Road, Nanchang Honggutan New District, Jiangxi Province 330031, P. R. China
- Ping-Ping Wen
  - Department of Chemistry, Nanchang University, No. 999 Xuefu Road, Nanchang Honggutan New District, Jiangxi Province 330031, P. R. China
- Shao-Ping Shi
  - Department of Chemistry, Nanchang University, No. 999 Xuefu Road, Nanchang Honggutan New District, Jiangxi Province 330031, P. R. China
- Jian-Ding Qiu
  - Department of Chemistry, Nanchang University, No. 999 Xuefu Road, Nanchang Honggutan New District, Jiangxi Province 330031, P. R. China
  - Department of Materials and Chemical Engineering, Pingxiang University, Pingxiang, P. R. China
13
Jemimah S, Yugandhar K, Michael Gromiha M. PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics. Bioinformatics 2018; 33:2787-2788. [PMID: 28498885 DOI: 10.1093/bioinformatics/btx312] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/19/2016] [Accepted: 05/10/2017] [Indexed: 11/13/2022]
Abstract
Summary: We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from the STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. Availability and implementation: The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. Contact: gromiha@iitm.ac.in. Supplementary information: Supplementary data are available at Bioinformatics online.
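Databases such as PROXiMATE report how missense mutations change complex thermodynamics. As a reminder of how such a change relates to measured affinities, here is a minimal sketch assuming the binding free energy is derived from a dissociation constant as ΔG = RT ln Kd (1 M standard state), so that ΔΔG = ΔG_mutant − ΔG_wild-type = RT ln(Kd_mutant / Kd_wild-type). Sign conventions differ between datasets; here a positive ΔΔG means the mutation weakens binding. The function names are hypothetical and not part of the PROXiMATE website.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant, 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg_binding(kd_wild_type: float, kd_mutant: float, temperature_k: float = 298.15) -> float:
    """ddG = dG_mutant - dG_wild_type; positive values mean weaker binding in the mutant."""
    return binding_free_energy(kd_mutant, temperature_k) - binding_free_energy(kd_wild_type, temperature_k)
```

A tenfold loss of affinity at 25 °C, e.g. `ddg_binding(1e-9, 1e-8)`, corresponds to roughly 1.36 kcal/mol.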
Affiliation(s)
- Sherlyn Jemimah
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- K Yugandhar
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
14
Nagarajan N, Chellam J, Kannan RR. Exploring the functional impact of mutational drift in LRRK2 gene and identification of specific inhibitors for the treatment of Parkinson disease. J Cell Biochem 2018; 119:4878-4889. [PMID: 29369408 DOI: 10.1002/jcb.26703] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/27/2017] [Accepted: 01/23/2018] [Indexed: 12/13/2022]
Abstract
Parkinson's disease (PD) is a disorder of the central nervous system caused by the death of dopaminergic neurons in a region of the brain called the substantia nigra. Mutations in the LRRK2 gene are associated with the disease and have been reported as a crucial factor in drug resistance. Identifying deleterious mutations and studying their structural and functional impact may lead to the identification of potential selective inhibitors. In this study, we analyzed 52 PD-associated mutations, of which 20 were identified as highly deleterious. The deleterious mutations G2019S and I2020T in the kinase domain play a key role in causing resistance to the drug levodopa. Molecular docking analyses were performed to understand the binding affinity of levodopa for wild-type and mutant LRRK2. The docking results show that levodopa binds the mutants differently and forms fewer hydrogen bonds than with wild-type LRRK2. In addition, molecular dynamics simulations were performed on the docked complexes, and the efficacy of the mutant complexes (G2019S-levodopa and I2020T-levodopa) was observed to be affected by the mutations. Finally, a virtual screening approach identified the specific inhibitors SCHEMBL6473053 and SCHEMBL1278779, which could potentially inhibit the LRRK2 mutants G2019S and I2020T, respectively. Overall, this computational investigation relates genotypic variation to changes in the structure and function of a drug target, which may aid the identification of precision medicines to treat PD.
Affiliation(s)
- Nagasundaram Nagarajan
  - Molecular and Nanomedicine Research Unit, Centre for Nanoscience and Nanotechnology, Sathyabama University, Chennai, Tamil Nadu, India
- Jaynthy Chellam
  - Department of Bioinformatics, Sathyabama University, Chennai, Tamil Nadu, India
- Rajaretinam Rajesh Kannan
  - Molecular and Nanomedicine Research Unit, Centre for Nanoscience and Nanotechnology, Sathyabama University, Chennai, Tamil Nadu, India
15
Kumar S, Cieplak P. Effect of phosphorylation and single nucleotide polymorphisms on caspase substrates processing. Apoptosis 2018; 23:194-200. [DOI: 10.1007/s10495-018-1442-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/25/2022]
16
Dakal TC, Kala D, Dhiman G, Yadav V, Krokhotin A, Dokholyan NV. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms in IL8 gene. Sci Rep 2017; 7:6525. [PMID: 28747718 PMCID: PMC5529537 DOI: 10.1038/s41598-017-06575-4] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/11/2016] [Accepted: 06/14/2017] [Indexed: 01/19/2023]
Abstract
Here we report a three-step in silico approach for the identification, characterization and validation of deleterious non-synonymous SNPs (nsSNPs) in the interleukin-8 (IL-8) gene. In the first step, sequence homology-based genetic analysis of a set of 50 coding SNPs associated with 41 rsIDs using SIFT (Sorting Intolerant from Tolerant) and PROVEAN (Protein Variation Effect Analyzer) identified 23 nsSNPs as putatively damaging/deleterious in at least one of the two tools. Subsequently, structure homology-based PolyPhen-2 (Polymorphism Phenotyping) analysis predicted 9 of these 23 nsSNPs (K4T, E31A, E31K, S41Y, I55N, P59L, P59S, L70P and V88D) to be damaging. Under the study's conditional hypothesis, only nsSNPs predicted to be damaging/deleterious by both the sequence- and the structure-homology-based approaches are considered 'high-confidence' nsSNPs. In step 2, the possible structure-function relationships of the high-confidence nsSNPs were assessed on the basis of amino acid conservation, stability analysis, structural superimposition, RMSD and docking analysis. Finally, in a separate analysis (step 3), IL-8 deregulation also appeared to be an important prognostic marker for the detection of patients with gastric and lung cancer. This study provides, for the first time, in-depth insights into the effects of amino acid substitutions on IL-8 protein structure, function and disease association.
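The 'high-confidence' rule in step 1 amounts to a consensus filter: a variant is kept only if at least one sequence-based tool (SIFT or PROVEAN) and the structure-based PolyPhen-2 call it damaging. A minimal sketch, assuming each tool's output has already been reduced to a boolean 'damaging' flag; the record layout and the example variant names and flags are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class VariantCalls:
    name: str        # protein-level substitution label, e.g. "A12V"
    sift: bool       # True if SIFT predicts damaging
    provean: bool    # True if PROVEAN predicts deleterious
    polyphen2: bool  # True if PolyPhen-2 predicts damaging


def high_confidence(calls: list[VariantCalls]) -> list[str]:
    """Keep variants flagged by at least one sequence-based tool AND by PolyPhen-2."""
    sequence_hits = [c for c in calls if c.sift or c.provean]
    return [c.name for c in sequence_hits if c.polyphen2]
```

With hypothetical inputs `[VariantCalls("var1", True, False, True), VariantCalls("var2", False, False, True), VariantCalls("var3", True, True, False)]`, only `"var1"` passes both stages: `"var2"` fails the sequence stage and `"var3"` fails the structure stage.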
Affiliation(s)
- Tikam Chand Dakal
  - Department of Biosciences, Manipal University Jaipur, Dehmi Kalan, Off Jaipur-Ajmer Expressway, Jaipur, 303007, Rajasthan, India
- Deepak Kala
  - University Institute of Biopharma Sciences, Chandigarh University, Mohali, 140413, Punjab, India
- Gourav Dhiman
  - University Institute of Biopharma Sciences, Chandigarh University, Mohali, 140413, Punjab, India
- Vinod Yadav
  - Department of Microbiology, Central University of Haryana, Mahendergarh, 123029, Haryana, India
- Andrey Krokhotin
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Nikolay V Dokholyan
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, 27599, USA
17
Pan Y, Liu D, Deng L. Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties. PLoS One 2017; 12:e0179314. [PMID: 28614374 PMCID: PMC5470696 DOI: 10.1371/journal.pone.0179314] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 04/01/2017] [Accepted: 05/27/2017] [Indexed: 12/20/2022]
Abstract
Single amino acid variations (SAVs) can alter biological functions, causing disease or natural differences between individuals. Identifying the relationship between a SAV and a particular disease provides a starting point for understanding the mechanisms underlying specific associations and can aid the prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that predicts how likely SAVs are to be associated with disease by combining the gradient tree boosting (GTB) algorithm with optimally selected neighborhood features. A two-step feature selection approach is used to identify, across a wide range of sequence and structural features (including some novel structural neighborhood features), the most relevant and informative neighborhood properties for predicting the disease association of SAVs. In cross-validation experiments on the benchmark dataset, PredSAV achieves an AUC of 0.908 and a specificity of 0.838, significantly better than other existing methods. We further validated the method on an independent test set, where it also showed a competitive advantage. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, returns reliable predictions when distinguishing disease-associated from neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as higher overall performance.
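PredSAV is evaluated by AUC and specificity. A minimal, dependency-free sketch of how these two metrics are computed from predicted scores and binary labels (1 = disease-associated, 0 = neutral). The 0.5 decision threshold is an assumption, and this is only the evaluation step, not PredSAV's gradient tree boosting model.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def specificity(labels: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """TN / (TN + FP): fraction of neutral variants scored below the threshold."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not negatives:
        raise ValueError("need at least one negative example")
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives)
```

The pairwise form of AUC is O(P·N) and is meant for clarity; library implementations compute the same quantity by sorting.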
Affiliation(s)
- Yuliang Pan
  - School of Software, Central South University, Changsha, China
- Diwei Liu
  - School of Software, Central South University, Changsha, China
- Lei Deng
  - School of Software, Central South University, Changsha, China
  - Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
18
Computational predictors fail to identify amino acid substitution effects at rheostat positions. Sci Rep 2017; 7:41329. [PMID: 28134345 PMCID: PMC5278360 DOI: 10.1038/srep41329] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/11/2016] [Accepted: 12/15/2016] [Indexed: 12/31/2022]
Abstract
Many computational approaches exist for predicting the effects of amino acid substitutions. Here, we considered whether the protein sequence position class - rheostat or toggle - affects these predictions. The classes are defined as follows: experimentally evaluated effects of amino acid substitutions at toggle positions are binary, while rheostat positions show progressive changes. For substitutions in the LacI protein, all evaluated methods failed two key expectations: toggle neutrals were incorrectly predicted as more non-neutral than rheostat non-neutrals, while toggle and rheostat neutrals were incorrectly predicted to be different. However, toggle non-neutrals were distinct from rheostat neutrals. Since many toggle positions are conserved, and most rheostats are not, predictors appear to annotate position conservation better than mutational effect. This finding can explain the well-known observation that predictors assign disproportionate weight to conservation, as well as the field's inability to improve predictor performance. Thus, building reliable predictors requires distinguishing between rheostat and toggle positions.
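The toggle/rheostat distinction is defined from experimentally measured substitution outcomes. Purely as an illustration of that definition, a position can be labeled by whether its measured activities (relative to wild type = 1.0) cluster at the extremes or spread across intermediate values. The thresholds and the extra 'neutral' label below are hypothetical and are not the scoring scheme used in the study.

```python
def classify_position(activities: list[float], low: float = 0.2, high: float = 0.8,
                      min_intermediate_fraction: float = 0.3) -> str:
    """Label a position from the relative activities of its substitutions (wild type = 1.0).

    'rheostat': a substantial fraction of substitutions give intermediate activity.
    'neutral':  every substitution stays near wild-type activity.
    'toggle':   outcomes are essentially binary, with at least some near-complete losses.
    All thresholds are illustrative assumptions.
    """
    if not activities:
        raise ValueError("no substitutions measured")
    intermediate = [a for a in activities if low < a < high]
    if len(intermediate) / len(activities) >= min_intermediate_fraction:
        return "rheostat"
    if all(a >= high for a in activities):
        return "neutral"
    return "toggle"
```

A position whose substitutions give activities of 0.0, 0.05, 0.1 and 1.0 is labeled `'toggle'`, while one spanning 0.3 to 1.0 is labeled `'rheostat'`.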
19
Wang L, Tang N, Gao X, Chang Z, Zhang L, Zhou G, Guo D, Zeng Z, Li W, Akinyemi IA, Yang H, Wu Q. Genome sequence of a rice pest, the white-backed planthopper (Sogatella furcifera). Gigascience 2017; 6:1-9. [PMID: 28369349 PMCID: PMC5437944 DOI: 10.1093/gigascience/giw004] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/23/2016] [Revised: 10/10/2016] [Accepted: 11/15/2016] [Indexed: 11/16/2022]
Abstract
Background: Sogatella furcifera is an important phloem sap-sucking, plant virus-transmitting migratory insect pest of rice. Because of its high reproductive potential, dispersal capability and transmission of plant viral diseases, S. furcifera causes considerable damage to rice grain production and has great economic and agricultural impacts. Comprehensive studies of the ecology and virus-host interactions of S. furcifera have been limited by the lack of a well-assembled genome sequence. Findings: A total of 241.3 Gb of raw reads from the whole genome of S. furcifera were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries from 17 insert libraries ranging between 180 bp and 40 kbp. The final genome assembly (0.72 Gb), with a contig N50 of 70.7 kb and a scaffold N50 of 1.18 Mb, covers 98.6% of the estimated genome size of S. furcifera. Genome annotation, assisted by data from eight developmental stages (embryos, 1st-5th instar nymphs, 5-day-old adults and 10-day-old adults), generated 21 254 protein-coding genes, which captured 99.59% (247/248) of core CEGMA genes and 91.7% (2453/2675) of BUSCO genes. Conclusions: We report the first assembled and annotated whole genome sequence and transcriptome of S. furcifera. The assembled draft genome will be a valuable resource for ecological and virus-host interaction studies of this pest.
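The contig and scaffold N50 values quoted above follow the standard definition: N50 is the length L such that sequences of length ≥ L together contain at least half of all assembled bases. A minimal sketch (the lengths in the usage note are made up, not taken from the S. furcifera assembly):

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of non-negative contig or scaffold lengths."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until at least half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For toy lengths `[2, 3, 4, 5, 6]` (total 20), the two longest sequences already hold 11 bases, so `n50` returns 5. N50 is therefore a length-weighted median, not an average.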
Affiliation(s)
- Lin Wang
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Nan Tang
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Xinlei Gao
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Zhaoxia Chang
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Liqin Zhang
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Guohui Zhou
  - Guangdong Province Key Laboratory of Microbial Signals and Disease Control, College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Dongyang Guo
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Zhen Zeng
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Wenjie Li
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Ibukun A. Akinyemi
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
- Huanming Yang
  - BGI–Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Qingfa Wu
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
  - Chinese Academy of Sciences Key Laboratory of Innate Immunity and Chronic Disease, University of Science and Technology of China, Hefei, Anhui 230027, China
  - Hefei National Laboratory for Physical Sciences at the Microscale, Bio-X Interdisciplinary Sciences, 443 Huang-Shan Road, Hefei, Anhui 230027, China
20
Kumar S, Cieplak P. CaspNeuroD: a knowledgebase of predicted caspase cleavage sites in human proteins related to neurodegenerative diseases. Database: The Journal of Biological Databases and Curation 2016; 2016:baw142. [PMID: 28025335 PMCID: PMC5199200 DOI: 10.1093/database/baw142] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/02/2016] [Revised: 09/18/2016] [Accepted: 10/06/2016] [Indexed: 12/19/2022]
Abstract
Background: A variety of neurodegenerative diseases (NDs) have been associated with deregulated caspase activation that leads to neuronal death. Caspases appear to be involved in the molecular pathology of NDs by directly cleaving important proteins. For instance, several proteins involved in Alzheimer’s disease, including β-amyloid precursor protein (APP) and presenilins, are known to be cleaved by caspases. The cell death pathway may therefore play a central role in many neurological diseases, and targeting the key proteins that control cell survival and death may represent a therapeutic approach for chronic neurodegenerative disorders. Findings: We developed CaspNeuroD, a relational database of in silico predicted caspase cleavage sites in human proteins associated with NDs. Predictions were made for a collection of 249 human proteins reported in clinical studies of NDs using the recently published CaspDB Random Forest machine-learning model. This database could be used to identify new caspase substrates and to further our understanding of caspase-mediated substrate cleavage in NDs. Conclusion: Our database provides information about potential caspase cleavage sites in a verified set of human proteins involved in NDs. It also provides information about the conservation of cleavage positions in corresponding orthologs and about the positions of single nucleotide polymorphisms and posttranslational modifications (PTMs) that may modulate caspase cleavage efficiency. Database URL: caspdb.sanfordburnham.org/caspneurod.php
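CaspNeuroD's predictions come from the CaspDB Random Forest model, which is not reproduced here. As a much simpler, hypothetical baseline: caspases cleave after an aspartate at the P1 position, and executioner caspases such as caspase-3/7 favor tetrapeptides like DEVD. The sketch below lists the P1 aspartates of every match to a loose 'DxxD' pattern; this motif rule is an assumption and is far less specific than a trained model.

```python
import re

# Loose executioner-caspase motif: Asp at P4 and P1, any residues at P3 and P2.
# The lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=(D..D))")


def candidate_cleavage_sites(sequence: str) -> list[int]:
    """Return 1-based positions of the P1 aspartate for each DxxD match.

    Cleavage would occur on the C-terminal side of each returned position.
    """
    seq = sequence.upper()
    # The P1 Asp is the 4th motif residue: 0-based index m.start() + 3, i.e. 1-based m.start() + 4.
    return [m.start() + 4 for m in MOTIF.finditer(seq)]
```

For example, `candidate_cleavage_sites("AADEVDGA")` returns `[6]`. Checking whether a SNP at or near a returned position destroys the motif mirrors, in a crude way, the SNP annotations the database provides.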
Affiliation(s)
- Sonu Kumar
  - SBP Medical Discovery Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA
- Piotr Cieplak
  - SBP Medical Discovery Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA
21
Kamaraj B, Purohit R. Mutational Analysis on Membrane Associated Transporter Protein (MATP) and Their Structural Consequences in Oculocutaeous Albinism Type 4 (OCA4)-A Molecular Dynamics Approach. J Cell Biochem 2016; 117:2608-19. [PMID: 27019209 DOI: 10.1002/jcb.25555] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 01/08/2016] [Accepted: 03/24/2016] [Indexed: 12/11/2022]
Abstract
Oculocutaneous albinism type IV (OCA4) is an autosomal recessive inherited disorder characterized by reduced biosynthesis of melanin pigment in the skin, hair and eyes, and is caused by mutations in the membrane-associated transporter protein (MATP) encoded by the SLC45A2 gene. The MATP protein consists of 530 amino acids, contains 12 putative transmembrane domains, plays an important role in pigmentation and probably functions as a membrane transporter in melanosomes. We scrutinized OCA4 disease-associated mutations in the SLC45A2 gene and their structural consequences. To understand the atomic arrangement in 3D space, the native and mutant structures were modeled. The structural behavior of native and mutant MATP protein was then investigated by molecular dynamics simulation (MDS) in an explicit lipid and water environment. We identified Y317C as the most deleterious and disease-associated SNP in the SLC45A2 gene. In MDS, the mutant MATP proteins lost stability and became more flexible, which alters their structural conformation and function. These changes suggest a significant role in inducing OCA4. Our study explores the molecular mechanism of the MATP protein upon mutation at the atomic level and may further help the field of pharmacogenomics to develop personalized medicine for OCA4.
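Increased flexibility in an MD trajectory is commonly quantified by the per-atom root-mean-square fluctuation, RMSF_i = sqrt(⟨|x_i(t) − ⟨x_i⟩|²⟩_t). A minimal pure-Python sketch, assuming the frames have already been superposed onto a common reference so that overall rotation and translation are removed; the coordinate layout is an assumption, and this is not the study's analysis code.

```python
import math

Coordinate = tuple[float, float, float]


def rmsf(frames: list[list[Coordinate]]) -> list[float]:
    """Per-atom RMSF over frames; frames[t][i] is atom i's (x, y, z) in frame t.

    Assumes every frame lists the same atoms in the same order and that the
    trajectory has already been superposed onto a reference structure.
    """
    if not frames:
        raise ValueError("trajectory has no frames")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Comparing per-residue RMSF (for example over Cα atoms) between native and mutant runs is one way to locate the regions that become more flexible after a substitution.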
Affiliation(s)
- Balu Kamaraj
  - Research Group PLASMANT, Department of Chemistry, University of Antwerp, Universiteitsplein 1, 2610, Wilrijk-Antwerp, Belgium
- Rituraj Purohit
  - Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
22
Al-Numair NS, Lopes L, Syrris P, Monserrat L, Elliott P, Martin ACR. The structural effects of mutations can aid in differential phenotype prediction of beta-myosin heavy chain (Myosin-7) missense variants. Bioinformatics 2016; 32:2947-55. [PMID: 27318203 DOI: 10.1093/bioinformatics/btw362] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/07/2016] [Accepted: 06/06/2016] [Indexed: 01/12/2023]
Abstract
Motivation: High-throughput sequencing platforms are increasingly used to screen patients with genetic disease for pathogenic mutations, but predicting the effects of mutations remains challenging. We previously developed SAAPdap (Single Amino Acid Polymorphism Data Analysis Pipeline) and SAAPpred (Single Amino Acid Polymorphism Predictor), which use a combination of rule-based structural measures to predict whether a missense genetic variant is pathogenic. Here we investigate whether the same methodology can be used to develop a differential phenotype predictor which, once a mutation has been predicted as pathogenic, can distinguish between phenotypes: in this case, the two major clinical phenotypes (hypertrophic cardiomyopathy, HCM, and dilated cardiomyopathy, DCM) associated with mutations in the beta-myosin heavy chain (MYH7) gene product (Myosin-7). Results: A random forest predictor trained on rule-based structural analyses together with structural clustering data gave a Matthews' correlation coefficient (MCC) of 0.53 (accuracy, 75%). Post hoc removal of machine learning models that performed particularly badly increased the performance (MCC = 0.61, accuracy = 79%). This proof of concept suggests that methods used for pathogenicity prediction can be extended to differential phenotype prediction. Availability and implementation: Analyses were implemented in Perl and C and used the Java-based Weka machine learning environment. Please contact the authors for availability. Contacts: andrew@bioinf.org.uk or andrew.martin@ucl.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
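The predictor above is scored with accuracy and Matthews' correlation coefficient, which for a binary confusion matrix is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A short sketch of both metrics; returning 0.0 when a marginal is empty is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews' correlation coefficient; returns 0.0 when any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), and, unlike accuracy, it stays informative when the two classes (here HCM and DCM variants) are unevenly represented.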
Affiliation(s)
- Nouf S Al-Numair
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK
- Luis Lopes
  - Institute of Cardiovascular Science, UCL, London, UK
- Petros Syrris
  - Institute of Cardiovascular Science, UCL, London, UK
- Lorenzo Monserrat
  - Complejo Hospitalario Universitario de A Coruña, Instituto de Investigación Biomédica, Coruña, Spain
- Perry Elliott
  - Institute of Cardiovascular Science, UCL, London, UK
- Andrew C R Martin
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK
23
Mueller SC, Sommer B, Backes C, Haas J, Meder B, Meese E, Keller A. From Single Variants to Protein Cascades: MULTISCALE MODELING OF SINGLE NUCLEOTIDE VARIANT SETS IN GENETIC DISORDERS. J Biol Chem 2016; 291:1582-1590. [PMID: 26601959 DOI: 10.1074/jbc.m115.695247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2015] [Indexed: 01/18/2023]
Abstract
Understanding the role of genetics in disease has become a central part of medical research. Non-synonymous single nucleotide variants (nsSNVs) in coding regions of human genes frequently lead to pathological phenotypes. Beyond single variants, an individual's combination of nsSNVs may contribute to pathogenic processes. We developed a multiscale pipeline to systematically analyze whether multiple nsSNVs and gene combinations within single individuals have quantitative effects on pathogenicity. Applying this pipeline to a data set of 842 nsSNVs discovered in 76 cardiomyopathy-related genes, we detected associated nsSNV combinations in seven genes that were present in at least 70% of all 639 patient samples but not in a control cohort of healthy individuals. Structural analyses of these combinations revealed primarily an influence on protein stability. For amino acid substitutions located at the protein surface, we generally observed proximity to putative binding pockets. Pathogenicity methods that computationally analyze cumulative effects and their impact are currently being developed. Our approach supports this process, as shown for a cardiac phenotype, but can likewise be applied to other diseases such as cancer.
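The combination-finding step described above keeps gene combinations carried by at least 70% of patients and by no control. A simplified, hypothetical sketch of that filter over gene pairs, assuming each sample is reduced to the set of genes in which it carries an nsSNV; the study's actual pipeline is multiscale and also models protein structure, which this sketch does not attempt.

```python
from itertools import combinations


def enriched_gene_pairs(patients: list[set[str]], controls: list[set[str]],
                        min_patient_fraction: float = 0.7) -> list[tuple[str, str]]:
    """Gene pairs co-mutated in >= min_patient_fraction of patients and in no control sample."""
    genes = sorted(set().union(*patients)) if patients else []
    hits = []
    for pair in combinations(genes, 2):
        pair_set = set(pair)
        in_patients = sum(1 for sample in patients if pair_set <= sample)
        in_controls = any(pair_set <= sample for sample in controls)
        if in_patients / len(patients) >= min_patient_fraction and not in_controls:
            hits.append(pair)
    return hits
```

With four toy patients `[{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"C"}]` and a control carrying only `{"A", "C"}`, the pair `("A", "B")` is found in 75% of patients and no control, so it is the only pair returned. Extending from pairs to larger combinations grows combinatorially, which is one reason to pre-filter genes.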
Affiliation(s)
- Sabine C Mueller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Department of Human Genetics, Saarland University, 66421 Homburg, Germany
- Björn Sommer
  - Bio-/Medical Informatics Department, Faculty of Technology, Bielefeld University, 33501 Bielefeld, Germany
  - Clayton School of Information Technology, Faculty of Information Technology, Monash University, Melbourne 3800, Australia
- Christina Backes
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Jan Haas
  - Department of Internal Medicine III, Heidelberg University, 69120 Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), 69120 Heidelberg, Germany
- Benjamin Meder
  - Department of Internal Medicine III, Heidelberg University, 69120 Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), 69120 Heidelberg, Germany
- Eckart Meese
  - Department of Human Genetics, Saarland University, 66421 Homburg, Germany
- Andreas Keller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
24
Abstract
Deleterious or 'disease-associated' mutations are mutations that lead to disease with high phenotype penetrance: they are inherited in a simple Mendelian manner or, in the case of cancer, accumulate in somatic cells and lead directly to disease. However, in some cases, the amino acid introduced by a disease-causing substitution is the wild-type native residue in the functionally equivalent protein of another species. Such examples are known as 'compensated pathogenic deviations' (CPDs) because, somewhere in the second species, there must be compensatory mutations that allow the protein to function normally despite carrying a residue that would cause disease in the first species. Depending on the nature of the mutations, compensation can occur in the same protein or in a different protein with which it interacts. In principle, compensation can be achieved by a single mutation (most probably structurally close to the CPD) or by the cumulative effect of several mutations. Although these effects clearly occur in proteins, compensatory mutations are also important in RNA, where they can potentially affect disease. As a much simpler molecule, RNA provides an interesting model for understanding the mechanisms of compensatory effects, both through naturally occurring RNA molecules and as a means of computational simulation. This review surveys the rather limited literature that has explored these effects. Understanding the nature of CPDs is important for understanding how evolution traverses valleys in fitness landscapes. It could also have applications in treating diseases that result from such mutations.
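Once a human protein is aligned to an ortholog, a CPD candidate can be flagged mechanically: a known pathogenic substitution is a candidate if the ortholog already carries the mutant residue at the aligned position. A minimal sketch, assuming a gap-aware pairwise alignment given as two equal-length strings with '-' for gaps; the sequences and variants in the usage note are toy examples, and finding the compensating mutations themselves is a separate, much harder problem.

```python
def cpd_candidates(human_aln: str, ortholog_aln: str,
                   pathogenic: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Return pathogenic substitutions (pos, wt, mut) whose mutant residue is native in the ortholog.

    pos is a 1-based position in the ungapped human sequence; '-' marks alignment gaps.
    """
    if len(human_aln) != len(ortholog_aln):
        raise ValueError("aligned sequences must have equal length")
    # Map ungapped human residue positions to alignment columns.
    column_of = {}
    residue_index = 0
    for column, residue in enumerate(human_aln):
        if residue != "-":
            residue_index += 1
            column_of[residue_index] = column
    hits = []
    for pos, wt, mut in pathogenic:
        column = column_of.get(pos)
        if column is None or human_aln[column] != wt:
            continue  # position out of range or wild-type residue does not match
        if ortholog_aln[column] == mut:
            hits.append((pos, wt, mut))
    return hits
```

For the toy alignment `"MK-LV"` / `"MRALI"`, the substitution `(2, "K", "R")` is flagged because the ortholog carries R at that column, whereas `(3, "L", "P")` is not.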
25
Cesaro L, Pinna LA, Salvi M. A Comparative Analysis and Review of lysyl Residues Affected by Posttranslational Modifications. Curr Genomics 2015; 16:128-38. [PMID: 26085811 PMCID: PMC4467303 DOI: 10.2174/1389202916666150216221038] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/23/2014] [Revised: 02/09/2015] [Accepted: 02/10/2015] [Indexed: 11/22/2022]
Abstract
Post-translational modification is the most common mechanism for regulating protein function. Although phosphorylation is considered a key event in many signal transduction pathways, other modifications must be considered as well. In particular, the side chain of lysine residues is the target of several different modifications, notably acetylation, methylation, ubiquitylation, sumoylation and neddylation, among others. Mass spectrometry approaches combining highly sensitive instruments with specific enrichment strategies have enabled the identification of modified sites on a large scale. Here we present a comparative analysis of the most representative lysine modifications (ubiquitylation, acetylation, sumoylation and methylation) identified in the human proteome. This review focuses on conserved amino acids, secondary structure preferences, the subcellular localization of modified proteins, and the signaling pathways in which these modifications are implicated. We discuss specific differences and similarities between these modifications, characteristics of the crosstalk among lysine post-translational modifications, and single nucleotide polymorphisms that could influence lysine post-translational modifications in humans.
Affiliation(s)
- Luca Cesaro
- Department of Biomedical Sciences, University of Padova, Via U. Bassi 58/B, Padova, Italy
- Lorenzo A Pinna
- Department of Biomedical Sciences, University of Padova, Via U. Bassi 58/B, Padova, Italy; Institute of Neurosciences, V.le G. Colombo 3, Padova, Italy
- Mauro Salvi
- Department of Biomedical Sciences, University of Padova, Via U. Bassi 58/B, Padova, Italy
26
Kumar S, Ratnikov BI, Kazanov MD, Smith JW, Cieplak P. CleavPredict: A Platform for Reasoning about Matrix Metalloproteinases Proteolytic Events. PLoS One 2015; 10:e0127877. [PMID: 25996941 PMCID: PMC4440711 DOI: 10.1371/journal.pone.0127877] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/20/2015] [Accepted: 04/21/2015] [Indexed: 11/19/2022]
Abstract
CleavPredict (http://cleavpredict.sanfordburnham.org) is a web server that predicts substrate cleavage by matrix metalloproteinases (MMPs). It is intended as a computational platform to help the scientific community reason about proteolytic events. CleavPredict offers in silico prediction of cleavage sites specific to 11 human MMPs. The prediction method employs MMP-specific position weight matrices (PWMs) derived from statistical analysis of high-throughput phage display experiments. To augment substrate cleavage prediction, CleavPredict provides information about structural features of potential cleavage sites that influence proteolysis, including secondary structure, disordered regions, transmembrane domains, and solvent accessibility. The server also provides information about the subcellular location, co-localization, and co-expression of the proteinase and its potential substrates, along with experimentally determined positions of single nucleotide polymorphisms (SNPs) and posttranslational modification (PTM) sites in substrates. Together, this information gives users additional perspectives for reasoning about proteolytic events. CleavPredict is freely accessible and requires no login.
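As a rough illustration of the PWM idea described above, and not CleavPredict's actual matrices, window size or scoring code, the sketch below builds a log-odds matrix from a few made-up aligned cleavage-site windows (the `TOY_SITES` list and the uniform background frequencies are hypothetical) and then scans a sequence for the best-scoring window.

```python
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy data: four aligned 4-residue windows around a cleavage site.
# Illustrative only; these are not phage display results.
TOY_SITES = ["PLGL", "PAGL", "PLGI", "PVGL"]
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pwm(sites, background, pseudocount=1.0):
    """Return one {residue: log-odds weight} dict per window position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {aa: pseudocount for aa in background}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: log((counts[aa] / total) / background[aa]) for aa in background})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds weights for one window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of the PWM's width along the sequence."""
    width = len(pwm)
    return [(start, score_window(pwm, sequence[start:start + width]))
            for start in range(len(sequence) - width + 1)]


pwm = build_pwm(TOY_SITES, UNIFORM_BACKGROUND)
best_start, best_score = max(scan(pwm, "AAPLGLAA"), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

The pseudocount keeps residues never seen at a position from producing log(0); a real predictor would derive its weights from far larger experimental datasets.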
Affiliation(s)
- Sonu Kumar
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Boris I. Ratnikov
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Marat D. Kazanov
- Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Jeffrey W. Smith
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Piotr Cieplak
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
27
Mueller SC, Backes C, Haas J, Katus HA, Meder B, Meese E, Keller A. Pathogenicity prediction of non-synonymous single nucleotide variants in dilated cardiomyopathy. Brief Bioinform 2015; 16:769-79. [PMID: 25638801 DOI: 10.1093/bib/bbu054] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/17/2014] [Indexed: 02/03/2023]
Abstract
Non-synonymous single nucleotide variants (nsSNVs) in coding DNA regions can result in phenotypic differences between individuals; however, only some nsSNVs cause a given disease. Because only a fraction of nsSNVs is annotated in databases, computational biology tools are applied to predict their pathogenicity in silico. In addition to applications in oncology, novel molecular diagnostic tests have been developed for cardiovascular disorders, a leading cause of morbidity and mortality in industrialized nations. We explored the concordance and performance of 13 nsSNV pathogenicity prediction tools on panel sequencing results from dilated cardiomyopathy patients. The analyzed data set from the INHERITANCE study contained 842 nsSNVs discovered in 639 patients, screened for the full sequence of 76 genes related to cardiomyopathies. The predictions of the individual tools showed surprisingly high heterogeneity and discordance, depending on the implemented prediction method. Known disease associations were not reported by the tools, limiting their usability in the clinic. Because different tools have different advantages, we combined their results. By clustering correlated methods that use similar prediction strategies and calculating a majority vote-based consensus, we found that prediction accuracy and sensitivity can be further improved. Although challenges remain, in silico tools have the potential to predict the pathogenicity of nsSNVs, especially if different algorithms are combined. Most tools rely mainly on sequence features; beyond these, structural information is important for analyzing the relationship of nsSNVs with disease phenotypes. Likewise, current tools consider single nsSNVs, whereas several nsSNVs may act cumulatively, so that mutations that are neutral individually become pathogenic in combination.
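The abstract does not spell out the clustering or voting rules, so the following is only a sketch of the general idea under stated assumptions: tools are grouped into hypothetical clusters by hand (`tool_a` to `tool_e` are placeholder names, not the 13 tools studied), each cluster is reduced to a single vote, and a simple majority across clusters gives the consensus.

```python
from collections import Counter


def majority(votes):
    """True/False by simple majority of boolean votes; None on a tie or no votes."""
    counts = Counter(votes)
    if counts[True] > counts[False]:
        return True
    if counts[False] > counts[True]:
        return False
    return None


def clustered_consensus(predictions, clusters):
    """Collapse each cluster of correlated tools to one vote, then vote across clusters."""
    cluster_votes = []
    for members in clusters:
        vote = majority([predictions[tool] for tool in members])
        if vote is not None:  # a tied cluster abstains
            cluster_votes.append(vote)
    return majority(cluster_votes)


# Hypothetical per-tool calls for one nsSNV (True = predicted pathogenic).
predictions = {"tool_a": True, "tool_b": True, "tool_c": True,
               "tool_d": False, "tool_e": False}
# Hypothetical grouping: tool_a, tool_b and tool_c share a prediction strategy.
clusters = [["tool_a", "tool_b", "tool_c"], ["tool_d"], ["tool_e"]]

print(majority(predictions.values()))             # plain vote over all tools
print(clustered_consensus(predictions, clusters))  # one vote per cluster
```

Here the plain vote calls the variant pathogenic, while the clustered vote does not: collapsing correlated tools keeps three near-duplicate predictors from outvoting two independent ones.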
28
Kumar S, van Raam BJ, Salvesen GS, Cieplak P. Caspase cleavage sites in the human proteome: CaspDB, a database of predicted substrates. PLoS One 2014; 9:e110539. [PMID: 25330111 PMCID: PMC4201543 DOI: 10.1371/journal.pone.0110539] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 07/30/2014] [Accepted: 09/19/2014] [Indexed: 12/28/2022]
Abstract
Caspases are enzymes belonging to a conserved family of cysteine-dependent, aspartate-specific proteases that are involved in vital cellular processes and play a prominent role in apoptosis and inflammation. Determining all relevant protein substrates of caspases remains a challenging task. According to published data, over 1500 caspase substrates have been discovered in the human proteome, and new substrates are discovered on a daily basis. To aid the discovery process, we developed a caspase cleavage prediction method using the recently published curated MerCASBA database of experimentally determined caspase substrates and a Random Forest classification method. On both internal and external test sets, our ranking of predicted cleavage positions is superior to that of all previously developed prediction methods. The in silico predicted caspase cleavage positions in human proteins are available from a relational database, CaspDB. Our database provides information about potential cleavage sites in a verified set of all human proteins collected in UniProt and their orthologs, allowing cleavage motif conservation to be traced. It also provides the positions of disease-annotated single nucleotide polymorphisms and posttranslational modifications that may modulate caspase cleavage efficiency.
Affiliation(s)
- Sonu Kumar
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Bram J. van Raam
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Guy S. Salvesen
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
- Piotr Cieplak
- Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
29
Famiglietti ML, Estreicher A, Gos A, Bolleman J, Géhant S, Breuza L, Bridge A, Poux S, Redaschi N, Bougueleret L, Xenarios I. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 2014; 35:927-35. [PMID: 24848695 PMCID: PMC4107114 DOI: 10.1002/humu.22594] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 01/31/2014] [Accepted: 05/09/2014] [Indexed: 11/25/2022]
Abstract
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
Affiliation(s)
- Maria Livia Famiglietti
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
30
Zeng S, Yang J, Chung BHY, Lau YL, Yang W. EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome. BMC Genomics 2014; 15:455. [PMID: 24916671 PMCID: PMC4061446 DOI: 10.1186/1471-2164-15-455] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/15/2013] [Accepted: 06/04/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are discovered. Bioinformatics analysis is essential for prioritizing AAS that potentially cause or contribute to human disease for further analysis, because each genome carries thousands of rare or private AAS, only a very small number of which are related to an underlying disease. Existing algorithms in this field still have high false prediction rates, and new methods are needed to take full advantage of the vast amount of genomic data. RESULTS Here we report a novel algorithm with two innovations: (1) making better use of sequence conservation information by grouping homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analysis. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based application implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), were made freely available to the public (http://paed.hku.hk/efin/). CONCLUSIONS Grouping homologous sequences into blocks according to the evolutionary distance of each species to human and evaluating sequence conservation in each block independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human health and disease.
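The abstract does not list EFIN's exact features, so the sketch below only illustrates the idea of scoring conservation separately per evolutionary-distance block. It assumes each homolog has already been aligned to the human protein and assigned to one of six blocks (the block assignment and the example alignment column are hypothetical) and returns one conservation fraction per block. A downstream classifier, such as the random forests mentioned above, could consume per-block features like these, but that step is not shown and EFIN's actual features may differ.

```python
def blockwise_conservation(human_residue, homologs, n_blocks=6):
    """Fraction of homologs in each distance block that keep the human residue.

    homologs: (block_index, aligned_residue) pairs; block 0 is assumed to hold
    the species closest to human. Blocks with no sequences yield None.
    """
    matches = [0] * n_blocks
    totals = [0] * n_blocks
    for block, residue in homologs:
        totals[block] += 1
        if residue == human_residue:
            matches[block] += 1
    return [m / t if t else None for m, t in zip(matches, totals)]


# Hypothetical alignment column at a position where human has arginine (R).
column = [(0, "R"), (0, "R"), (1, "R"), (1, "K"), (2, "K")]
features = blockwise_conservation("R", column)
print(features)
```

Keeping blocks separate preserves the distinction between a residue that is invariant among close relatives but variable in distant species and one that varies everywhere, which a single pooled conservation score would blur.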
Affiliation(s)
- Wanling Yang
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 5 Sassoon Road, Hong Kong, China.
31
Abstract
Proteins are macromolecules that serve a cell’s myriad processes and functions in all living organisms via dynamic interactions with other proteins, small molecules and cellular components. Genetic variations in the protein-encoding regions of the human genome account for >85% of all known Mendelian diseases, and play an influential role in shaping complex polygenic diseases. Proteins also serve as the predominant target class for the design of small molecule drugs to modulate their activity. Knowledge of the shape and form of proteins, by means of their three-dimensional structures, is therefore instrumental to understanding their roles in disease and their potentials for drug development. In this chapter we outline, with the wide readership of non-structural biologists in mind, the various experimental and computational methods available for protein structure determination. We summarize how the wealth of structure information, contributed to a large extent by the technological advances in structure determination to date, serves as a useful tool to decipher the molecular basis of genetic variations for disease characterization and diagnosis, particularly in the emerging era of genomic medicine, and becomes an integral component in the modern day approach towards rational drug development.
Affiliation(s)
- Nelson L.S. Tang
- Dept. of Chemical Pathology and Lab. of Genetics of Disease Suscept., The Chinese University of Hong Kong, Hong Kong, People's Republic of China
- Terence Poon
- Department of Paediatrics and Proteomics Laboratory, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
32
Suo SB, Qiu JD, Shi SP, Chen X, Liang RP. PSEA: Kinase-specific prediction and analysis of human phosphorylation substrates. Sci Rep 2014; 4:4524. [PMID: 24681538 PMCID: PMC3970127 DOI: 10.1038/srep04524] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/22/2013] [Accepted: 03/11/2014] [Indexed: 11/09/2022]
Abstract
Protein phosphorylation catalysed by kinases plays crucial regulatory roles in intracellular signal transduction. With the growing number of identified kinase-specific phosphorylation sites and disease-related phosphorylation substrates, there is increasing motivation to explore the regulatory relationship between protein kinases and disease-related phosphorylation substrates. In this work, we analysed the kinase characteristics of all disease-related phosphorylation substrates using our Phosphorylation Set Enrichment Analysis (PSEA) method. We evaluated the method on an independent test set and concluded that our approach reliably identifies the kinases responsible for phosphorylated substrates. In addition, we found that the mitogen-activated protein kinase (MAPK) and glycogen synthase kinase (GSK) families are more strongly associated with abnormal phosphorylation. We anticipate that our method may help to identify the mechanisms of phosphorylation and the relationship between kinases and phosphorylation-related diseases. A user-friendly web interface is now freely available at http://bioinfo.ncu.edu.cn/PKPred_Home.aspx.
Affiliation(s)
- Sheng-Bao Suo
- Department of Chemistry, Nanchang University, Nanchang, 330031, China
- Jian-Ding Qiu
- Department of Chemistry, Nanchang University, Nanchang, 330031, China; Department of Chemical Engineering, Pingxiang College, Pingxiang, 337055, China
- Shao-Ping Shi
- Department of Chemistry, Nanchang University, Nanchang, 330031, China; Department of Mathematics, Nanchang University, Nanchang, 330031, China
- Xiang Chen
- Department of Chemistry, Nanchang University, Nanchang, 330031, China
- Ru-Ping Liang
- Department of Chemistry, Nanchang University, Nanchang, 330031, China
33
Jimeno Yepes A, Verspoor K. Literature mining of genetic variants for curation: quantifying the importance of supplementary material. Database (Oxford) 2014; 2014:bau003. [PMID: 24520105 PMCID: PMC3920087 DOI: 10.1093/database/bau003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.
Affiliation(s)
- Antonio Jimeno Yepes
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia and Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
34
Jimeno Yepes A, Verspoor K. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature. F1000Res 2014; 3:18. [PMID: 25285203 PMCID: PMC4176422 DOI: 10.12688/f1000research.3-18.v2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Accepted: 05/27/2014] [Indexed: 11/20/2022]
Abstract
As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature.
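To make the point about complementary coverage concrete, here is a minimal sketch (not the authors' evaluation code) of computing recall for individual tools and for the union of their outputs. It assumes the extracted mentions have already been normalized to a common notation such as `V600E`; the tool names and variant sets are hypothetical.

```python
def recall(predicted, gold):
    """Fraction of gold-standard variants that were recovered."""
    return len(predicted & gold) / len(gold) if gold else 0.0


def combined_recall(tool_outputs, gold):
    """Recall of the union of all tools' extracted variants."""
    union = set().union(*tool_outputs.values())
    return recall(union, gold)


# Hypothetical, already-normalized variant mentions.
gold = {"V600E", "G12D", "R175H", "T790M"}
tool_outputs = {"tool_x": {"V600E", "G12D"}, "tool_y": {"G12D", "R175H"}}

for name, found in tool_outputs.items():
    print(name, recall(found, gold))
print("combined", combined_recall(tool_outputs, gold))
```

Taking the union can only increase recall; the cost is that every tool's false positives are pooled as well, so precision has to be checked separately.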
Affiliation(s)
- Antonio Jimeno Yepes
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Karin Verspoor
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
35
Kamaraj B, Rajendran V, Sethumadhavan R, Purohit R. In-silico screening of cancer associated mutation on PLK1 protein and its structural consequences. J Mol Model 2013; 19:5587-99. [PMID: 24271645 DOI: 10.1007/s00894-013-2044-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 09/14/2013] [Accepted: 10/21/2013] [Indexed: 11/28/2022]
Abstract
The Polo-like kinases (Plks) are a conserved subfamily of serine/threonine protein kinases with significant roles in cell proliferation. Polo-like kinase 1 (PLK1) is located in the centrosome during interphase and is an important regulatory enzyme in cell cycle progression during M phase. Mutant forms of mammalian PLK1 have been found to be overexpressed in various human cancers, and these mutations disrupt the ability of the polo-box domain to bind its target peptide. In this analysis we implemented a computational approach to identify the most deleterious, cancer-associated mutation in the PLK1 protein. Using PolyPhen 2.0, SIFT, I-Mutant 3.0, PANTHER, PhD-SNP, SNP&GO, MutPred and Dr Cancer, we found W414F to be the most deleterious and cancer-associated mutation. Molecular docking and molecular dynamics simulation (MDS) were used to investigate the structural and functional behavior of the PLK1 protein upon mutation. MDS and docking results showed a loss of stability in the mutant PLK1 protein. The mutation made PLK1 more flexible and altered its dynamic properties, which might affect its interaction with the target peptide and lead to cell proliferation. Our study provides a well-designed computational methodology for examining cancer-associated nsSNPs and their molecular mechanisms, and may further help scientists develop drug therapies against PLK1-associated cancers.
Affiliation(s)
- Balu Kamaraj
- School of Bio Sciences and Technology (SBST), Bioinformatics Division, Vellore Institute of Technology University, Vellore, 632014, Tamil Nadu, India
36
Ji R, Cong Q, Li W, Grishin NV. M2SG: mapping human disease-related genetic variants to protein sequences and genomic loci. Bioinformatics 2013; 29:2953-4. [PMID: 24002112 DOI: 10.1093/bioinformatics/btt507] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/20/2023]
Abstract
SUMMARY Online Mendelian Inheritance in Man (OMIM) is a manually curated compendium of human genetic variants and the corresponding phenotypes, mostly human diseases. Instead of directly documenting the native sequences for gene entries, OMIM links its entries to protein and DNA sequences in other databases. However, because of the existence of gene isoforms and errors in OMIM records, mapping a specific OMIM mutation to its corresponding protein sequence is not trivial. Combining computer programs and extensive manual curation of OMIM full-text descriptions and original literature, we mapped 98% of OMIM amino acid substitutions (AASs) and all SwissProt Variant (SwissVar) disease-related AASs to reference sequences and confidently mapped 99.96% of all AASs to the genomic loci. Based on the results, we developed an online database and interactive web server (M2SG) to (i) retrieve the mapped OMIM and SwissVar variants for a given protein sequence; and (ii) obtain related proteins and mutations for an input disease phenotype. This database will be useful for analyzing sequences, understanding the effect of mutations, identifying important genetic variations and designing experiments on a protein of interest. AVAILABILITY AND IMPLEMENTATION The database and web server are freely available at http://prodata.swmed.edu/M2S/mut2seq.cgi.
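The summary notes that gene isoforms and record errors make mapping a mutation to its protein sequence non-trivial. As a minimal sketch of one basic consistency check (not M2SG's procedure), assuming simple one-letter substitutions such as `W414F` and a hypothetical reference sequence, the code below finds which numbering shifts make every wild-type residue agree with the sequence. A single shift of 0 suggests the records use this sequence's numbering, a nonzero shift hints at a different isoform, and an empty or multi-valued result would flag the record for manual curation.

```python
import re

# One-letter protein substitution such as "W414F": wild type, position, mutant.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(text):
    """Split a substitution string into (wild_type, 1-based position, mutant)."""
    match = SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def consistent_offsets(substitutions, sequence, max_shift=50):
    """Numbering shifts under which every wild-type residue matches the sequence."""
    parsed = [parse_substitution(s) for s in substitutions]
    offsets = []
    for shift in range(-max_shift, max_shift + 1):
        if all(0 <= pos - 1 + shift < len(sequence)
               and sequence[pos - 1 + shift] == wild_type
               for wild_type, pos, _ in parsed):
            offsets.append(shift)
    return offsets


# Hypothetical reference sequence and variant records.
REFERENCE = "MKTAYIAKQRQISFVKSHFSRQ"
print(consistent_offsets(["Y5C", "R10H"], REFERENCE))  # numbered on this sequence
print(consistent_offsets(["Y8C", "R13H"], REFERENCE))  # hypothetical isoform with 3 extra N-terminal residues
```

Checking several substitutions from the same record together is what makes the shift informative: one residue alone often matches at many offsets, whereas a set of wild-type residues rarely does by chance.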
Affiliation(s)
- Renkai Ji
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, and Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
37
Chitranshi N, Tiwari AK, Somvanshi P, Tripathi PK, Seth PK. Investigating the function of single nucleotide polymorphisms in the CTSB gene: a computational approach. Future Neurol 2013. [DOI: 10.2217/fnl.13.26] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Aim: Recent genome-wide association studies have revealed large numbers of single nucleotide polymorphisms (SNPs) related to Alzheimer’s disease. Here, we investigated the CTSB gene, which encodes cathepsin B (CTSB), a lysosomal cysteine proteinase. CTSB is also involved in the proteolytic processing of amyloid precursor protein (APP), which is believed to be a causative factor in Alzheimer’s disease. Materials & methods: Several bioinformatics algorithms, such as Sorting Intolerant from Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen) and CUPSAT, were used to identify synonymous and nonsynonymous SNPs (nsSNPs) and to predict whether they are deleterious or nondeleterious. Similar tools were used to predict the impact of single amino acid substitutions on CTSB protein activity. The FASTSNP server and UTRscan were used to predict effects on splicing regulation. The stability and solvent-accessible surface area of the modeled mutant proteins were analyzed using PBEQ Solver and NetASA view. Furthermore, the DSP program was used to determine the secondary structure of the modeled protein. Results: A total of 999 SNPs in CTSB were retrieved from the SNP database: 55 nsSNPs, 35 synonymous SNPs, 165 SNPs in the 3′ untranslated region (UTR), 12 SNPs in the 5′ UTR and 732 intronic SNPs. Potential functions of SNPs in the CTSB gene were identified using different web servers. For example, the SIFT, PolyPhen and CUPSAT servers predicted ten nsSNPs to be intolerant, three nsSNPs to be damaging and eight nsSNPs to potentially destabilize protein structure. The FASTSNP server predicted 12 SNPs to influence splicing regulation, and two SNPs were assigned a risk score in the range of 3–4 (medium to high). Furthermore, mutant proteins were modeled and their total energy values were compared with that of the native CTSB protein. A mutation on the protein surface, from threonine to serine at position 235 (rs17573), had the greatest impact on stability. Conclusion: rs7003814 in the CTSB gene has already been reported in the genome-wide association studies database as associated with Alzheimer’s disease. Our study demonstrates the presence of other deleterious nsSNPs, which may play a crucial role in predicting Alzheimer’s disease risk.
Affiliation(s)
- Nitin Chitranshi
- Gautam Buddh Technical University, Lucknow 227202, Uttar Pradesh, India
- Bioinformatics Centre, Biotech Park, Sector-G, Jankipuram, Lucknow-226021, Uttar Pradesh, India.
- Amit K Tiwari
- Department of Biomedical Sciences, College of Veterinary Medicine, Nursing & Allied Health, Tuskegee University, Tuskegee, AL 36088, USA
- Pallavi Somvanshi
- Department of Biotechnology, TERI University, 10, Institutional Area, Vasantkunj, New Delhi 110070, India
- Prahlad K Seth
- Bioinformatics Centre, Biotech Park, Sector-G, Jankipuram, Lucknow-226021, Uttar Pradesh, India
38
In silico screening and molecular dynamics simulation of disease-associated nsSNP in TYRP1 gene and its structural consequences in OCA3. Biomed Res Int 2013; 2013:697051. [PMID: 23862152 PMCID: PMC3703794 DOI: 10.1155/2013/697051] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 04/20/2013] [Revised: 05/23/2013] [Accepted: 05/23/2013] [Indexed: 11/17/2022]
Abstract
Oculocutaneous albinism type III (OCA3), caused by mutations in the TYRP1 gene, is an autosomal recessive disorder characterized by reduced biosynthesis of melanin pigment in the hair, skin, and eyes. The TYRP1 gene encodes tyrosinase-related protein 1 (Tyrp1). Tyrp1 is involved in maintaining the stability of tyrosinase and modulating its catalytic activity in eumelanin synthesis. Tyrp1 is also involved in maintaining melanosome structure and affects melanocyte proliferation and cell death. In this work we implemented a computational analysis to identify the mutations most likely to be associated with OCA3. Using PolyPhen 2.0, SIFT, PANTHER, I-Mutant 3.0, PhD-SNP, SNP&GO, PMut, and MutPred, we found R326H and R356Q to be the most deleterious and disease-associated. To understand the atomic arrangement in 3D space, the native and mutant (R326H and R356Q) structures were modelled. Finally, the native and mutant Tyrp1 structures were analysed using molecular dynamics simulation (MDS). MDS results showed greater flexibility in the native Tyrp1 structure; the mutant proteins became more rigid, which might disturb the structural conformation and catalytic function of Tyrp1 and might also play a significant role in inducing OCA3. The results obtained from this study could help wet-lab researchers develop potent drug therapies against OCA3.
39
Al-Numair NS, Martin ACR. The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations. BMC Genomics 2013; 14 Suppl 3:S4. [PMID: 23819919 PMCID: PMC3665582 DOI: 10.1186/1471-2164-14-s3-s4] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
Background: Understanding and predicting the effects of mutations on protein structure and phenotype is an increasingly important area. Genes for many genetically linked diseases are now routinely sequenced in the clinic. Previously we focused on understanding the structural effects of mutations, creating the SAAPdb resource. Results: We have updated SAAPdb to include 41% more SNPs and 36% more PDs. Introducing a hydrophobic residue on the surface, or a hydrophilic residue in the core, no longer shows significant differences between SNPs and PDs. We have improved some of the analyses, significantly enhancing the analysis of clashes and of mutations to proline and from glycine. A new web interface has been developed allowing users to analyze their own mutations. Finally, we have developed a machine learning method that gives a cross-validated accuracy of 0.846, considerably outperforming well-known methods including SIFT and PolyPhen2, which give accuracies between 0.690 and 0.785. Conclusions: We have updated SAAPdb and improved its analyses; given the increasing rate at which mutation data are generated, we have also created a new analysis pipeline and web interface. Machine learning on the structural analysis results predicts pathogenicity considerably better than other methods.
Affiliation(s)
- Nouf S Al-Numair
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
40
Suo SB, Qiu JD, Shi SP, Chen X, Huang SY, Liang RP. Proteome-wide analysis of amino acid variations that influence protein lysine acetylation. J Proteome Res 2013; 12:949-58. [PMID: 23298314 DOI: 10.1021/pr301007j] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
Next-generation sequencing (NGS) technologies are yielding ever higher volumes of genetic variation data. Given this large amount of data, it has become both possible and a priority to determine the functional implications of genetic variation. Considering the essential roles of acetylation in protein function, it is highly likely that acetylation-related genetic variations change protein function. In this work, we performed a proteome-wide analysis of amino acid variations that could potentially influence protein lysine acetylation in human variant proteins. We defined AcetylAAVs as acetylation-related amino acid variations that affect acetylation sites or their interacting acetyltransferases, and categorized them into three types. Using the prediction system we developed, named KAcePred, we found that 50.87% of amino acid variations are potential AcetylAAVs and 12.32% of disease mutations could result in AcetylAAVs. More interestingly, statistical analysis showed that amino acid variations that directly create new potential lysine acetylation sites are more likely to cause disease. The analysis of AcetylAAVs may therefore be useful for screening important polymorphisms and for identifying the mechanisms of genetic diseases. A user-friendly web interface for analysis of AcetylAAVs is freely available at http://bioinfo.ncu.edu.cn/AcetylAAVs_Home.aspx .
Affiliation(s)
- Sheng-Bao Suo
- Department of Chemistry, Nanchang University, Nanchang 330031, China
41
Sreevishnupriya K, Chandrasekaran P, Senthilkumar A, Sethumadhavan R, Shanthi V, Daisy P, Nisha J, Ramanathan K, Rajasekaran R. Computational analysis of deleterious missense mutations in aspartoacylase that cause Canavan's disease. Sci China Life Sci 2012; 55:1109-19. [PMID: 23233226 DOI: 10.1007/s11427-012-4406-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/08/2012] [Accepted: 11/06/2012] [Indexed: 01/09/2023]
Abstract
In this work, the most detrimental missense mutations of aspartoacylase that cause Canavan's disease were identified computationally and the substrate binding efficiencies of those missense mutations were analyzed. Out of 30 missense mutations, I-Mutant 2.0, SIFT and PolyPhen programs identified 22 variants that were less stable, deleterious and damaging respectively. Subsequently, modeling of these 22 variants was performed to understand the change in their conformations with respect to the native aspartoacylase by computing their root mean squared deviation (RMSD). Furthermore, the native protein and the 22 mutants were docked with the substrate NAA (N-Acetyl-Aspartic acid) to explain the substrate binding efficiencies of those detrimental missense mutations. Among the 22 mutants, the docking studies identified that 15 mutants caused lower binding affinity for NAA than the native protein. Finally, normal mode analysis determined that the loss of binding affinity of these 15 mutants was caused by altered flexibility in the amino acids that bind to NAA compared with the native protein. Thus, the present study showed that the majority of the substrate-binding amino acids in those 15 mutants displayed loss of flexibility, which could be the theoretical explanation of decreased binding affinity between the mutant aspartoacylases and NAA.
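The abstract measures each mutant's conformational change by its RMSD from the native aspartoacylase model. RMSD over N matched atoms is the square root of the mean squared distance between corresponding positions. The sketch below illustrates that formula only; it is not the authors' code, and the abstract does not name their alignment tool. It assumes both coordinate lists are already optimally superimposed (for instance, C-alpha atoms after a Kabsch fit, which is not performed here). The function name `rmsd` and the toy coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(native: Sequence[Coord], mutant: Sequence[Coord]) -> float:
    """Root mean square deviation between two matched, pre-superimposed atom sets.

    Assumes native[i] and mutant[i] are the same atom (e.g. the C-alpha of
    residue i) and that the structures were already aligned; no fitting is done.
    """
    if len(native) != len(mutant) or not native:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(native, mutant)
    )
    return math.sqrt(squared / len(native))


# Toy example: one of two atoms shifted by 1 Å along z.
native_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
mutant_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 1.0)]
print(round(rmsd(native_ca, mutant_ca), 4))  # 0.7071
```

RMSD depends on how the structures were superimposed, and comparing unaligned models would inflate it. That is why the sketch makes prior alignment an explicit precondition rather than hiding it.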
Affiliation(s)
- K Sreevishnupriya
- Bioinformatics Division, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
42
Nair PS, Vihinen M. VariBench: A Benchmark Database for Variations. Hum Mutat 2012; 34:42-9. [DOI: 10.1002/humu.22204] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Received: 01/20/2012] [Accepted: 07/31/2012] [Indexed: 12/21/2022]
43
FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model. PLoS One 2012; 7:e43847. [PMID: 22937107 PMCID: PMC3427247 DOI: 10.1371/journal.pone.0043847] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/13/2012] [Accepted: 07/26/2012] [Indexed: 11/26/2022]
Abstract
Single amino acid variants (SAVs) are the most abundant form of known genetic variation associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of why a SAV may be associated with a certain disease. In this work, we constructed a high-quality structural dataset containing 679 protein structures with 2,048 SAVs by collecting human genetic variant data from multiple resources and dividing them into two categories, disease-associated and neutral variants. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most informative features contributing to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved an area under the curve (AUC) of 0.882, which is competitive with, and in some cases better than, other existing tools including SIFT, SNAP, PolyPhen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
44
Masoodi TA, Shammari SAA, Al-Muammar MN, Almubrad TM, Alhamdan AA. Screening and structural evaluation of deleterious Non-Synonymous SNPs of ePHA2 gene involved in susceptibility to cataract formation. Bioinformation 2012; 8:562-7. [PMID: 22829731 PMCID: PMC3398778 DOI: 10.6026/97320630008562] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/21/2012] [Accepted: 05/24/2012] [Indexed: 02/02/2023]
Abstract
Age-related cataract is a clinically and genetically heterogeneous disorder affecting the ocular lens, and the leading cause of vision loss and blindness worldwide. Here we screened nonsynonymous single nucleotide polymorphisms (nsSNPs) of a novel gene, EPHA2, responsible for age-related cataracts. The SNPs were retrieved from dbSNP. Protein stability change was calculated using I-Mutant. The potentially functional nsSNPs and their effect on the protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the EPHA2 protein was evaluated using the Swiss-PdbViewer and the NOMAD-Ref server. Our analysis identified 16 nonsynonymous SNPs, of which 6 nsSNPs (rs11543934, rs2291806, rs1058371, rs1058370, rs79100278 and rs113882203) were found to be least stable by I-Mutant 2.0, with a DDG value of > -1.0. The nsSNPs rs35903225, rs2291806, rs1058372, rs1058370, rs79100278 and rs113882203 showed a highly deleterious tolerance index score of 0.00 by the SIFT server. Four nsSNPs (rs11543934, rs2291806, rs1058370 and rs113882203) were found to be probably damaging, with a PSIC score of ≥ 2.0 by the PolyPhen server. Three nsSNPs (rs11543934, rs2291806 and rs1058370) were found by FASTSNP to be highly polymorphic, with a risk score of 3-4 and a possible effect of non-conservative change and splicing regulation. The total energy and RMSD values were higher for the mutant-type structure than for the native-type structure. We conclude that the nsSNP rs2291806 is the potential functional polymorphism most likely to have a functional impact on the EPHA2 gene.
Affiliation(s)
- Tariq Ahmad Masoodi
- Health Care Development for Elderly Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- Sulaiman A Al Shammari
- Health Care Development for Elderly Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- May N Al-Muammar
- Health Care Development for Elderly Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- Turki M Almubrad
- Department of Optometry, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- Adel A Alhamdan
- Health Care Development for Elderly Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
45
Olatubosun A, Väliaho J, Härkönen J, Thusberg J, Vihinen M. PON-P: integrated predictor for pathogenicity of missense variants. Hum Mutat 2012; 33:1166-74. [PMID: 22505138 DOI: 10.1002/humu.22102] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Received: 12/07/2011] [Accepted: 03/28/2012] [Indexed: 12/21/2022]
Abstract
High-throughput sequencing data generation demands the development of methods for interpreting the effects of genomic variants. Numerous computational methods have been developed to assess the impact of variations because experimental methods are unable to cope with both the speed and volume of data generation. To harness the strength of currently available predictors, the Pathogenic-or-Not-Pipeline (PON-P) integrates five predictors to predict the probability that nonsynonymous variations affect protein function and may consequently be disease related. Random forest methodology-based PON-P shows consistently improved performance in cross-validation tests and on independent test sets, providing ternary classification and statistical reliability estimate of results. Applied to missense variants in a melanoma cancer cell line, PON-P predicts variants in 17 genes to affect protein function. Previous studies implicate nine of these genes in the pathogenesis of various forms of cancer. PON-P may thus be used as a first step in screening and prioritizing variants to determine deleterious ones for further experimentation.
Affiliation(s)
- Ayodeji Olatubosun
- Institute of Biomedical Technology, University of Tampere, Tampere, Finland
46
Luu TD, Rusu AM, Walter V, Ripp R, Moulinier L, Muller J, Toursel T, Thompson JD, Poch O, Nguyen H. MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database (Oxford) 2012; 2012:bas018. [PMID: 22491796 PMCID: PMC3317913 DOI: 10.1093/database/bas018] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/03/2023]
Abstract
The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats. Database URL: http://decrypthon.igbmc.fr/msv3d
Affiliation(s)
- Tien-Dao Luu
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire (UMR7104), 67404 Illkirch
47
Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs. Biochem Biophys Res Commun 2012; 419:99-103. [PMID: 22326261 DOI: 10.1016/j.bbrc.2012.01.138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/18/2012] [Accepted: 01/27/2012] [Indexed: 11/23/2022]
Abstract
Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.
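The partitioning idea can be sketched in a few lines:

1. Group the training variants by the original (or substituted) residue at the nsSNP site.
2. Fit one model per group.
3. Route each new variant to the model for its residue type.

This is an illustrative sketch, not the authors' implementation. The study trained SVMs, whereas here a simple nearest-centroid classifier stands in so the example runs with the standard library alone. The record layout, the toy data and the function names are hypothetical. Falling back to a model trained on the whole dataset for residue types unseen in training is likewise an assumption made for the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (original residue, substituted residue, features, label),
# where label 1 = disease-associated and 0 = neutral.
Variant = Tuple[str, str, List[float], int]
Centroids = Dict[int, List[float]]


def partition_by_residue(data: List[Variant], by: str = "original") -> Dict[str, List[Variant]]:
    """Split the training set into subsets keyed by residue type at the nsSNP site."""
    idx = 0 if by == "original" else 1
    groups: Dict[str, List[Variant]] = defaultdict(list)
    for record in data:
        groups[record[idx]].append(record)
    return dict(groups)


def train_centroids(subset: List[Variant]) -> Centroids:
    """Nearest-centroid stand-in for an SVM: the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for _, _, x, y in subset:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(model: Centroids, x: List[float]) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((c - xi) ** 2 for c, xi in zip(model[y], x)))


def train_partitioned(data: List[Variant], by: str = "original") -> Tuple[Dict[str, Centroids], Centroids]:
    """One model per residue type, plus a pooled fallback model."""
    models = {res: train_centroids(sub) for res, sub in partition_by_residue(data, by).items()}
    fallback = train_centroids(data)  # used for residue types absent from training
    return models, fallback


def predict(models: Dict[str, Centroids], fallback: Centroids, residue: str, x: List[float]) -> int:
    return classify(models.get(residue, fallback), x)


# Toy data: the same feature value means opposite things at R and G sites.
train = [
    ("R", "H", [1.0], 1), ("R", "Q", [0.0], 0),
    ("G", "A", [1.0], 0), ("G", "V", [0.0], 1),
]
models, fallback = train_partitioned(train)
print(predict(models, fallback, "R", [0.9]))  # 1
print(predict(models, fallback, "G", [0.9]))  # 0
```

In this toy data a single pooled model cannot separate the classes, while per-residue models can. That is the intuition behind partitioning. The accuracy gains reported above (76.3% and 74.9% versus 72.6%) come from the authors' SVM experiments, not from this sketch.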
48
Qin W, Li Y, Li J, Yu L, Wu D, Jing R, Pu X, Guo Y, Li M. Predicting deleterious non-synonymous single nucleotide polymorphisms in signal peptides based on hybrid sequence attributes. Comput Biol Chem 2012; 36:31-5. [DOI: 10.1016/j.compbiolchem.2011.12.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/23/2011] [Revised: 12/13/2011] [Accepted: 12/21/2011] [Indexed: 10/14/2022]
49
Neighborhood properties are important determinants of temperature sensitive mutations. PLoS One 2011; 6:e28507. [PMID: 22164302 PMCID: PMC3229608 DOI: 10.1371/journal.pone.0028507] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/05/2011] [Accepted: 11/09/2011] [Indexed: 02/08/2023]
Abstract
Temperature-sensitive (TS) mutants are powerful tools to study gene function in vivo. These mutants exhibit wild-type activity at permissive temperatures and reduced activity at restrictive temperatures. Although random mutagenesis can be used to generate TS mutants, the procedure is laborious and unfeasible in multicellular organisms. Further, the underlying molecular mechanisms of the TS phenotype are poorly understood. To elucidate TS mechanisms, we used a machine learning method–logistic regression–to investigate a large number of sequence and structure features. We developed and tested 133 features, describing properties of either the mutation site or the mutation site neighborhood. We defined three types of neighborhood using sequence distance, Euclidean distance, and topological distance. We discovered that neighborhood features outperformed mutation site features in predicting TS mutations. The most predictive features suggest that TS mutations tend to occur at buried and rigid residues, and are located at conserved protein domains. The environment of a buried residue often determines the overall structural stability of a protein, thus may lead to reversible activity change upon temperature switch. We developed TS prediction models based on logistic regression and the Lasso regularized procedure. Through a ten-fold cross-validation, we obtained the area under the curve of 0.91 for the model using both sequence and structure features. Testing on independent datasets suggested that the model predicted TS mutations with a 50% precision. In summary, our study elucidated the molecular basis of TS mutants and suggested the importance of neighborhood properties in determining TS mutations. We further developed models to predict TS mutations derived from single amino acid substitutions. In this way, TS mutants can be efficiently obtained through experimentally introducing the predicted mutations.
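The abstract reports performance as an area under the ROC curve (0.91 under ten-fold cross-validation) and, on independent data, as precision. The sketch below shows how these two metrics are computed from a model's scores, without external libraries. It is not the authors' code, and the toy labels, scores and threshold are invented for illustration.

- AUC equals the probability that a randomly chosen positive (TS) example scores higher than a randomly chosen negative one, with ties counted as one half.
- Precision is the fraction of predicted positives that are true positives.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney form); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of examples scored at or above the threshold that are truly positive."""
    predicted = [y for s, y in zip(scores, labels) if s >= threshold]
    if not predicted:
        raise ValueError("no examples at or above the threshold")
    return sum(predicted) / len(predicted)


# Toy data: 1 = temperature-sensitive mutation, 0 = not.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))             # 0.75
print(precision_at(labels, scores, 0.45))  # 0.5
```

The pairwise loop is O(P*N) and is written for clarity; a rank-based computation gives the same value in O(n log n). In a cross-validated setting like the one described, each score would come from a model that never saw the fold being scored.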
50
Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12:628-40. [PMID: 21850043 DOI: 10.1038/nrg3046] [Citation(s) in RCA: 397] [Impact Index Per Article: 30.5] [Indexed: 02/07/2023]
Abstract
Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.