1
Yang S, Liu D, Song Y, Liang Y, Yu H, Zuo Y. Designing a structure-function alphabet of helix based on reduced amino acid clusters. Arch Biochem Biophys 2024; 754:109942. [PMID: 38387828 DOI: 10.1016/j.abb.2024.109942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 02/24/2024]
Abstract
Several simple secondary structures can form complex and diverse functional proteins, which suggests that secondary structures contain a great deal of hidden information and are arranged according to certain principles so as to carry enough information for functional specificity and diversity. However, this hidden information and these principles have not been understood systematically. In our study, we designed a structure-function alphabet of helices based on reduced amino acid clusters to describe the typical features of helices and to explore this information. First, we selected 480 typical helices from membrane proteins, zymoproteins, transcription factors, and other proteins to define and calculate the interval ranges, and classified the helices in terms of hydrophilicity, charge, and length: (1) hydrophobic helix (≤43%), amphiphilic helix (43%-71%), and hydrophilic helix (≥71%); (2) positive helix, negative helix, electrically neutral helix, and uncharged helix; (3) short helix (≤8 aa), medium-length helix (9-28 aa), and long helix (≥29 aa). Then, we designed an alphabet containing 36 triplet codes according to the above classification, so that the main features of each helix can be represented by only three letters. This alphabet not only provides a preliminary definition of helix characteristics, but also greatly reduces the informational dimension of protein structure. Finally, we present an application example to demonstrate the value of the structure-function alphabet in protein functional determination and differentiation.
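The three-way classification in this abstract can be sketched in Python as follows. The thresholds (43%, 71%, 8 aa, 28 aa) come from the abstract, and 3 hydrophilicity classes × 4 charge classes × 3 length classes gives the 36 triplet codes it mentions. The residue sets, the charge rule, and all function names are assumptions made for illustration; the abstract does not state how the authors define hydrophilic or charged residues, nor which letters they assign to each class.

```python
# Illustrative sketch only. The residue sets and the charge rule are assumptions,
# not the definitions used in the paper.
HYDROPHILIC = set("RKDENQHSTY")  # assumed hydrophilic residues
POSITIVE = set("RK")             # assumed positively charged residues
NEGATIVE = set("DE")             # assumed negatively charged residues


def hydrophilicity_class(seq):
    """Classify by percentage of (assumed) hydrophilic residues."""
    pct = 100.0 * sum(aa in HYDROPHILIC for aa in seq) / len(seq)
    if pct <= 43:
        return "hydrophobic"
    if pct < 71:
        return "amphiphilic"
    return "hydrophilic"


def charge_class(seq):
    """Classify by counts of (assumed) positive and negative residues."""
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    if pos == 0 and neg == 0:
        return "uncharged"
    if pos == neg:
        return "neutral"
    return "positive" if pos > neg else "negative"


def length_class(seq):
    """Classify by helix length in residues, using the abstract's cut-offs."""
    n = len(seq)
    if n <= 8:
        return "short"
    if n <= 28:
        return "medium"
    return "long"


def classify_helix(seq):
    """Return a (hydrophilicity, charge, length) triplet for one helix sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("helix sequence must not be empty")
    return hydrophilicity_class(seq), charge_class(seq), length_class(seq)
```

Each element of the returned triplet could then be mapped to a single letter to obtain a three-letter code, but that letter assignment is not given in the abstract and is left out here.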
Affiliation(s)
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Dongyang Liu
  - Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China; University of Chinese Academy of Sciences, Beijing, China
- Yancheng Song
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Haoyu Yu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China.
2
Cesaro A, Bagheri M, Torres MDT, Wan F, de la Fuente-Nunez C. Deep learning tools to accelerate antibiotic discovery. Expert Opin Drug Discov 2023; 18:1245-1257. [PMID: 37794737 PMCID: PMC10790350 DOI: 10.1080/17460441.2023.2250721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/18/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION As machine learning (ML) and artificial intelligence (AI) expand to many segments of our society, they are increasingly being used for drug discovery. Recent deep learning models offer an efficient way to explore high-dimensional data and design compounds with desired properties, including those with antibacterial activity. AREAS COVERED This review covers key frameworks in antibiotic discovery, highlighting physicochemical features and addressing dataset limitations. The deep learning approaches described here include discriminative models such as convolutional neural networks, recurrent neural networks, and graph neural networks, and generative models such as neural language models, variational autoencoders, generative adversarial networks, normalizing flows, and diffusion models. As the integration of these approaches in drug discovery continues to evolve, this review aims to provide insights into the prospects and challenges that lie ahead in harnessing such technologies for the development of antibiotics. EXPERT OPINION Accurate antimicrobial prediction using deep learning faces challenges such as imbalanced data, limited datasets, experimental validation, target strains, and structure. The integration of deep generative models with bioinformatics, molecular dynamics, and data augmentation holds the potential to overcome these challenges, enhance model performance, and ultimately accelerate antimicrobial discovery.
Affiliation(s)
- Angela Cesaro
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Mojtaba Bagheri
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marcelo D. T. Torres
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Fangping Wan
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
3
Forghani M, Firstkov AL, Alyannezhadi MM, Danilenko DM, Komissarov AB. Reduced amino acid alphabet-based encoding and its impact on modeling influenza antigenic evolution. Russian Journal of Infection and Immunity 2022. [DOI: 10.15789/2220-7619-raa-1968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Currently, vaccination is one of the most efficient ways to control and prevent influenza infection. Vaccine production largely relies on the results of laboratory assays, including hemagglutination inhibition and microneutralization assays, which are time-consuming and laborious. Viruses can escape the immune response, which results in the need to revise and update vaccines biannually. The hemagglutination inhibition assay measures how effectively antibodies against a reference strain bind and block an antigen of the test strain. Various computer-aided models have been developed to optimize candidate vaccine strain selection. A general problem in modeling antigenic evolution is the representation of genetic sequences for input into the model. Our motivation stems from this well-known problem of encoding genetic information for modeling antigenic evolution. This paper introduces a two-fold encoding approach based on reduced amino acid alphabets and the amino acid index database AAindex. We propose applying a simplified amino acid alphabet in modeling antigenic evolution. In a simplified alphabet, also called a sub-alphabet or reduced amino acid alphabet, the 20 amino acids are clustered into amino acid groups. The proposed encoding redefines mutations in terms of the amino acid groups of the reduced alphabet. We investigated 40 reduced amino acid sets and their performance in modeling antigenic evolution. The experimental results indicate that the proposed reduced amino acid alphabets can match the accuracy of the standard alphabet. Moreover, these alphabets provide deeper insight into various aspects of the relationship between mutation and antigenic variation. By checking the identified high-impact sites in the Influenza Research Database, we found that not only antigenic sites but also other amino acids located in close proximity have a significant influence on antigenicity.
The results indicate that all selected non-antigenic sites are related to immune responses. According to the Influenza Research Database, these have been experimentally determined to be T-cell epitopes, B-cell epitopes, and MHC-binding epitopes of different classes. This highlighted a caveat: while simulating antigenic evolution, the model should consider not only the genetic information on antigenic sites, but also that of neighboring positions, as they may indirectly impact antigenicity. Additionally, our findings indicate that structural and charge characteristics are the most beneficial in modeling antigenic evolution, which is in agreement with previous studies.
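The idea of redefining mutations at the level of amino acid groups can be sketched as below. The five-group clustering is a hypothetical example chosen for illustration; it is not claimed to be one of the 40 reduced sets the authors evaluated, and the function names are likewise assumptions. Under such an encoding, a substitution only counts as a mutation when the two residues fall into different groups.

```python
# Hypothetical 5-group clustering, for illustration only (not taken from the paper).
GROUPS = {
    "a": "AVLIMC",  # aliphatic and sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "c": "KRDE",    # charged
    "s": "GP",      # small / conformationally special
}

# Invert the clustering into a residue -> group-letter lookup table.
AA_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def group_mutations(ref, test):
    """List (position, ref_group, test_group) where two aligned,
    equal-length sequences differ at the group level."""
    if len(ref) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reduce_sequence(ref), reduce_sequence(test))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]
```

For example, with this clustering an A-to-V substitution is not reported because both residues belong to group `a`, whereas a G-to-F substitution is reported as a change from group `s` to group `r`.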
4
Molecular and thermodynamic mechanisms for protein adaptation. European Biophysics Journal 2022; 51:519-534. [DOI: 10.1007/s00249-022-01618-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Revised: 08/01/2022] [Accepted: 09/20/2022] [Indexed: 11/07/2022]
5
Nguyen Q, Tran HV, Nguyen BP, Do TTT. Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition. ACS Omega 2022; 7:32322-32330. [PMID: 36119976 PMCID: PMC9475634 DOI: 10.1021/acsomega.2c03696] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Transcription factors (TFs) play an important role in gene expression and in the regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that DNA methylation can inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found that some TFs can bind to methylated DNA fragments. The identification of these TFs is an important stepping stone to a better understanding of cellular gene expression mechanisms. However, as experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose to use the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those of other studies on the same problems.
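As a rough illustration of the reduced g-gap dipeptide composition named in this abstract, the sketch below first maps residues onto a reduced alphabet and then computes the frequency of every letter pair separated by `g` intervening positions. The two-letter hydrophobic/polar reduction, the residue set, and the function names are assumptions for illustration only; the abstract does not specify the reduction scheme or the value of `g` used by the authors.

```python
from itertools import product

# Hypothetical two-letter reduction: "h" for (assumed) hydrophobic residues,
# "p" for all other residues. Not taken from the paper.
HYDROPHOBIC = set("AVLIMFWC")
ALPHABET = "hp"


def reduce_sequence(seq):
    """Rewrite a protein sequence in the two-letter reduced alphabet."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())


def g_gap_dipeptide_composition(seq, g):
    """Frequency of each reduced-letter pair (x, y) where y follows x
    with exactly g residues in between (g = 0 means adjacent residues)."""
    reduced = reduce_sequence(seq)
    n_pairs = len(reduced) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence is too short for this gap")
    # Start every possible pair at zero so the feature vector has fixed size.
    counts = {a + b: 0 for a, b in product(ALPHABET, repeat=2)}
    for i in range(n_pairs):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return {pair: count / n_pairs for pair, count in counts.items()}
```

With an alphabet of k letters the feature vector has k² entries per value of `g`, which is where the dimensionality reduction relative to the 400 standard dipeptides comes from. The resulting vector could be passed to any classifier; the support vector machine step from the abstract is not shown here.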
Affiliation(s)
- Quang H. Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Hoang V. Tran
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Binh P. Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Trang T. T. Do
  - School of Innovation, Design and Technology, Wellington Institute of Technology, 21 Kensington Avenue, Lower Hutt 5012, New Zealand
6
Liang Y, Yang S, Zheng L, Wang H, Zhou J, Huang S, Yang L, Zuo Y. Research progress of reduced amino acid alphabets in protein analysis and prediction. Comput Struct Biotechnol J 2022; 20:3503-3510. [PMID: 35860409 PMCID: PMC9284397 DOI: 10.1016/j.csbj.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 11/29/2022] Open
Abstract
A comprehensive summary of the literature on the reduced amino acid alphabets. A systematic review of the development history of reduced amino acid alphabets. Rich application cases of amino acid reduction alphabets are described in the article. A detailed analysis of the properties and uses of the reduced amino acid alphabets.
Proteins are the executors of cellular physiological activities, and accurate structural and function elucidation are crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important method for protein structure and function analysis, but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the complexity and noise of traditional feature engineering in dimension, and provides more interpretable predictive models for machine learning to capture key features. In this paper, we systematically reviewed the strategy and method studies of the reduced amino acid (RAA) alphabets, and summarized its main research in protein sequence alignment, functional classification, and prediction of structural properties, respectively. In the end, we gave a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
Affiliation(s)
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - Corresponding authors.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
  - Corresponding authors.
7
Zheng L, Liu D, Li YA, Yang S, Liang Y, Xing Y, Zuo Y. RaacFold: a webserver for 3D visualization and analysis of protein structure by using reduced amino acid alphabets. Nucleic Acids Res 2022; 50:W633-W638. [PMID: 35639512 PMCID: PMC9252778 DOI: 10.1093/nar/gkac415] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/21/2022] [Revised: 04/23/2022] [Accepted: 05/09/2022] [Indexed: 12/11/2022] Open
Abstract
Protein structure exhibits greater complexity and diversity than DNA structure, and usually affects the interpretation of function, interactions, and biological annotations. Reduced amino acid alphabets (Raaa) exhibit a powerful ability to decrease protein complexity and identify functionally conserved regions, which motivated us to create RaacFold. RaacFold provides 687 reduced amino acid clusters (Raac) based on 58 reduction methods and offers three analysis tools: Protein Analysis, Align Analysis, and Multi Analysis. Protein Analysis and Align Analysis provide reduced representations of sequence-structure according to physicochemical similarities and computational biology strategies. With the simplified representations, the protein structure can be viewed more concisely and clearly than the unreduced structure, making it easier to capture biological insight. Thus, the design of artificial proteins will be more convenient, and redundant interference is avoided. In addition, Multi Analysis allows users to explore biophysical variation and conservation in the evolution of protein structure and function. This supplies important information for the identification and exploration of the nonhomologous functions of paralogs. RaacFold also provides powerful 2D and 3D rendering with advanced parameters for sequences, structures, and related annotations. RaacFold is freely available at http://bioinfor.imu.edu.cn/raacfold.
Affiliation(s)
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Dongyang Liu
  - Photosynthesis Research Center, Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yongqiang Xing
  - The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China; Department of Biological Sciences, Center for Systems Biology, the University of Texas at Dallas, Richardson, TX 75080-3021, USA
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
8
Das JK, Heryakusuma C, Susanti D, Choudhury PP, Mukhopadhyay B. Reduced Protein Sequence Patterns in Identifying Key Structural Elements of Dissimilatory Sulfite Reductase Homologs. Comput Biol Chem 2022; 98:107691. [DOI: 10.1016/j.compbiolchem.2022.107691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 04/26/2022] [Accepted: 04/26/2022] [Indexed: 11/03/2022]
9
Sikander R, Ghulam A, Ali F. XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set. Sci Rep 2022; 12:5505. [PMID: 35365726 PMCID: PMC8976041 DOI: 10.1038/s41598-022-09484-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 10/17/2021] [Accepted: 03/07/2022] [Indexed: 11/19/2022] Open
Abstract
Accurate identification of drug targets in the human body is of great significance for designing novel drugs. Compared with traditional experimental methods, prediction of drug targets via machine learning algorithms has attracted the attention of many researchers because it is fast and accurate. In this study, we propose a machine learning-based method, namely XGB-DrugPred, for accurate prediction of druggable proteins. The features from primary protein sequences are extracted by group dipeptide composition, reduced amino acid alphabet, and novel encoder pseudo amino acid composition segmentation. To select the best feature set, eXtreme Gradient Boosting-recursive feature elimination is implemented. The best feature set is provided to eXtreme Gradient Boosting (XGB), Random Forest, and Extremely Randomized Tree classifiers for model training and prediction. The performance of these classifiers is evaluated by tenfold cross-validation. The empirical results show that the XGB-based predictor achieves the best results compared with other classifiers and existing methods in the literature.
Affiliation(s)
- Rahu Sikander
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.
- Ali Ghulam
  - Computerization and Network Section, Sindh Agriculture University, Tandojam, Pakistan
- Farman Ali
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
10
Fried SD, Fujishima K, Makarov M, Cherepashuk I, Hlouchova K. Peptides before and during the nucleotide world: an origins story emphasizing cooperation between proteins and nucleic acids. J R Soc Interface 2022; 19:20210641. [PMID: 35135297 PMCID: PMC8833103 DOI: 10.1098/rsif.2021.0641] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 12/14/2022] Open
Abstract
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative 'peptide-polynucleotide stage'. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
Affiliation(s)
- Stephen D Fried
  - Department of Chemistry, Johns Hopkins University, Baltimore, MD 21212, USA; Department of Biophysics, Johns Hopkins University, Baltimore, MD 21212, USA
- Kosuke Fujishima
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan; Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Mikhail Makarov
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Ivan Cherepashuk
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic; Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
11
Fonseca NJ, Afonso MQL, Carrijo L, Bleicher L. CONAN: a web application to detect specificity determinants and functional sites by amino acids co-variation network analysis. Bioinformatics 2021; 37:1026-1028. [PMID: 32780795 DOI: 10.1093/bioinformatics/btaa713] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/29/2020] [Revised: 08/01/2020] [Accepted: 08/05/2020] [Indexed: 11/12/2022] Open
Abstract
SUMMARY CONAN is a web application developed to detect specificity determinants and function-related sites by amino acids co-variation networks analysis, emphasizing local coevolutionary constraints. The software allows the characterization of structurally and functionally relevant groups of residues and their relationship with subsets of sequences by automatic cross-referencing with GO terms, UniprotKb annotations and INTERPRO. AVAILABILITY AND IMPLEMENTATION CONAN is free and open-source, being distributed in the terms of the GPLV3 license. The software is available as a web application and python script versions and can be accessed at http://bioinfo.icb.ufmg.br/conan. We also provide running instructions, the source code and a user guide.
Affiliation(s)
- N J Fonseca
  - Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK; Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- M Q L Afonso
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Carrijo
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Bleicher
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
12
Iannuzzi R, Rossetti G, Spitaleri A, Bonnal RJP, Pagani M, Mollica L. A Simplified Amino Acidic Alphabet to Unveil the T-Cells Receptors Antigens: A Computational Perspective. Front Chem 2021; 9:598802. [PMID: 33718327 PMCID: PMC7947793 DOI: 10.3389/fchem.2021.598802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Accepted: 01/19/2021] [Indexed: 11/15/2022] Open
Abstract
Exposure to pathogens triggers the activation of adaptive immune responses through antigens bound to surface receptors of antigen-presenting cells (APCs). T cell receptors (TCRs) are responsible for initiating the immune response through their direct physical interaction with antigen-bound receptors on the APC surface. The study of T cell interactions with antigens is considered of crucial importance for understanding the role of immune responses in cancer growth and for the subsequent design of immunomodulating anticancer drugs. RNA sequencing experiments performed on T cells represented a major breakthrough for this branch of experimental molecular biology. Apart from the gene expression levels, the hypervariable CDR3α/β sequences of the TCR loops, which are the portions of the TCR mainly responsible for the interaction with APC receptors, can now be easily determined and modelled in three dimensions. The most direct experimental method for the investigation of antigens would be based on peptide libraries, but their huge combinatorial nature, size, cost, and the difficulty of experimental fine-tuning make this approach complicated, time-consuming, and costly. We have implemented an in silico methodology with the aim of moving from CDR3α/β sequences to a library of potentially antigenic peptides that can be used in immunologically oriented experiments to study T cell reactivity. To reduce the size of the library, we have verified the reproducibility of experimental benchmarks using the permutation of only six residues that can be considered representative of the whole ensemble of 20 natural amino acids. Such a simplified alphabet is able to correctly find the poses and chemical nature of original antigens within a small subset of ligands of potential interest. The newly generated library would have the advantage of leading to potentially antigenic ligands that would contribute to a better understanding of the chemical nature of TCR-antigen interactions.
This step is crucial in the design of immunomodulators targeting T cell responses as well as in understanding the first principles of an immune response in several diseases, from cancer to autoimmune disorders.
Collapse
Affiliation(s)
- Raffaele Iannuzzi
- Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
| | - Grazisa Rossetti
- Molecular Oncology and Immunology, FIRC Institute of Molecular Oncology (IFOM), Milan, Italy
| | - Andrea Spitaleri
- Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Raoul J P Bonnal
- Molecular Oncology and Immunology, FIRC Institute of Molecular Oncology (IFOM), Milan, Italy
| | - Massimiliano Pagani
- Molecular Oncology and Immunology, FIRC Institute of Molecular Oncology (IFOM), Milan, Italy.,Department of Medical Biotechnology and Translational Medicine, University of Milan, Milan, Italy
| | - Luca Mollica
- Department of Medical Biotechnology and Translational Medicine, University of Milan, Milan, Italy
| |
|
13
|
Feng P, Liu W, Huang C, Tang Z. Classifying the superfamily of small heat shock proteins by using g-gap dipeptide compositions. Int J Biol Macromol 2020; 167:1575-1578. [PMID: 33212104 DOI: 10.1016/j.ijbiomac.2020.11.111] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/07/2020] [Revised: 11/02/2020] [Accepted: 11/13/2020] [Indexed: 01/16/2023]
Abstract
Small heat shock proteins (sHSPs) form a superfamily of molecular chaperones found from archaea to humans. Recent research has demonstrated that sHSPs participate in a series of biological processes and are even closely associated with serious diseases. Since sHSP is a very large superfamily and members from different subfamilies exhibit distinct functions, accurate classification of sHSP subfamilies will be helpful for revealing their functions. In the present work, a support vector machine-based method was proposed to classify the subfamilies of sHSPs. In the 10-fold cross-validation test, an overall accuracy of 93.25% was obtained for classifying the subfamilies of sHSPs. The superiority of the proposed method was also demonstrated by comparing it with other methods. It is anticipated that the proposed method will become a useful tool for classifying the subfamilies of sHSPs.
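The g-gap dipeptide composition named in the title of this entry can be illustrated with a minimal sketch. It assumes the usual definition, in which a g-gap dipeptide is a pair of residues separated by g intervening positions and the feature vector holds the 400 normalised pair frequencies. The function name, the example sequence, and the choice g = 1 are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=0):
    """Return the 400 g-gap dipeptide frequencies of a protein sequence.

    A g-gap dipeptide pairs residue i with residue i + g + 1, so g = 0
    gives the ordinary adjacent dipeptide composition.
    """
    counts = Counter()
    for i in range(len(sequence) - g - 1):
        first, second = sequence[i], sequence[i + g + 1]
        # Skip pairs that contain non-standard residue symbols.
        if first in AMINO_ACIDS and second in AMINO_ACIDS:
            counts[first + second] += 1
    total = sum(counts.values())
    pairs = (a + b for a, b in product(AMINO_ACIDS, repeat=2))
    return {pair: (counts[pair] / total if total else 0.0) for pair in pairs}


# Illustrative sequence only; it is not an sHSP from the benchmark dataset.
features = g_gap_dipeptide_composition("MKTAYIAKQR", g=1)
```

The resulting dictionary has a fixed length of 400 regardless of sequence length, which is what makes it usable as input to a classifier such as a support vector machine.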
Affiliation(s)
- Pengmian Feng
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China.
| | - Weiwei Liu
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Cong Huang
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Zhaohui Tang
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| |
|
14
|
Sun Z, Huang S, Zheng L, Liang P, Yang W, Zuo Y. ICTC-RAAC: An improved web predictor for identifying the types of ion channel-targeted conotoxins by using reduced amino acid cluster descriptors. Comput Biol Chem 2020; 89:107371. [PMID: 32950852 DOI: 10.1016/j.compbiolchem.2020.107371] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/07/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Conotoxins are small, disulfide-rich peptide toxins with a unique diversity of sequences. Correctly identifying the types of ion channel-targeted conotoxins is important because they are considered optimal pharmacological candidates in drug design, owing to their ability to bind specifically to ion channels and interfere with neural transmission. Compared with other feature extraction methods, the reduced amino acid cluster (RAAC) performs better in simplifying protein complexity and identifying functionally conserved regions. Thus, in our study, 673 RAACs generated from 74 types of reduced amino acid alphabet were comprehensively assessed to establish a state-of-the-art predictor for ion channel-targeted conotoxins. The results showed that Type 20, Cluster 9 (T = 20, C = 9) in the tripeptide composition (N = 3) achieved the best accuracy, 89.3%; this alphabet is based on a variance-maximization algorithm for amino acid reduction. Further, ANOVA with incremental feature selection (IFS) was used for feature selection to improve prediction performance. Finally, the cross-validation results showed that the best overall accuracy was 96.4%, which is 1.8% higher than the best accuracy of previous studies. Based on the proposed predictor, a user-friendly webserver was established and can be accessed at http://bioinfor.imu.edu.cn/ictcraac.
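As a rough illustration of how a reduced amino acid cluster feeds an N-peptide composition, the sketch below maps each residue to a cluster representative and counts overlapping tripeptides. With C = 9 clusters and N = 3 this yields 9^3 = 729 features. The nine clusters used here are a hypothetical grouping chosen only for illustration; they are not the Type 20 alphabet evaluated in the paper, and the function names are likewise assumptions.

```python
from collections import Counter
from itertools import product

# Hypothetical 9-cluster alphabet for illustration only (not the paper's Type 20 scheme).
CLUSTERS = ["AG", "C", "DE", "FWY", "HKR", "ILMV", "NQ", "P", "ST"]

# Each residue is replaced by the first letter of its cluster.
RESIDUE_TO_CLUSTER = {res: cluster[0] for cluster in CLUSTERS for res in cluster}


def reduce_sequence(sequence):
    """Rewrite a sequence in the reduced alphabet, dropping unknown symbols."""
    return "".join(RESIDUE_TO_CLUSTER[r] for r in sequence if r in RESIDUE_TO_CLUSTER)


def n_peptide_composition(sequence, n=3):
    """Return normalised frequencies of all overlapping reduced n-peptides."""
    reduced = reduce_sequence(sequence)
    windows = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    letters = sorted(set(RESIDUE_TO_CLUSTER.values()))
    keys = ("".join(combo) for combo in product(letters, repeat=n))
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Illustrative peptide only; it is not a sequence from the conotoxin dataset.
features = n_peptide_composition("GCEWKRLM", n=3)
```

Compared with the 8000 tripeptides of the full 20-letter alphabet, the reduced vector is much shorter, which is the dimensionality saving that the RAAC approach relies on.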
Affiliation(s)
- Zijie Sun
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; School of Mathematical Sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Wuritu Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
| |
|
15
|
Caldararu O, Mehra R, Blundell TL, Kepp KP. Systematic Investigation of the Data Set Dependency of Protein Stability Predictors. J Chem Inf Model 2020; 60:4772-4784. [DOI: 10.1021/acs.jcim.0c00591] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 02/06/2023]
Affiliation(s)
- Octav Caldararu
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Rukmankesh Mehra
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
| | - Kasper P. Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| |
|
16
|
Li X, Tang Q, Tang H, Chen W. Identifying Antioxidant Proteins by Combining Multiple Methods. Front Bioeng Biotechnol 2020; 8:858. [PMID: 32793581 PMCID: PMC7391787 DOI: 10.3389/fbioe.2020.00858] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/30/2020] [Accepted: 07/03/2020] [Indexed: 11/13/2022] Open
Abstract
Antioxidant proteins play important roles in preventing free radical oxidation from damaging cells and DNA. They have become ideal candidates for disease prevention and treatment. Therefore, it is urgent to identify antioxidants from natural compounds. Since experimental methods are still not cost-effective, a series of computational methods have been proposed to identify antioxidant proteins. However, the performance of the current methods is still not satisfactory. In this study, a support vector machine-based method, called Vote9, was proposed to identify antioxidants, in which the sequences were encoded using the features generated from 9 optimal individual models. Results from the jackknife test demonstrated that Vote9 is comparable with the best of the existing predictors for this task. We hope that Vote9 will become a useful tool, or at least play a complementary role to the existing methods, for identifying antioxidants.
Affiliation(s)
- Xianhai Li
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Qiang Tang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hua Tang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Wei Chen
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
| |
|
17
|
Feng P, Wang Z. Recent Advances in Computational Methods for Identifying Anticancer Peptides. Curr Drug Targets 2020; 20:481-487. [PMID: 30068270 DOI: 10.2174/1389450119666180801121548] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/10/2018] [Revised: 05/28/2018] [Accepted: 05/28/2018] [Indexed: 01/10/2023]
Abstract
Anticancer peptides (ACPs) are small peptides that can kill cancer cells without damaging normal cells. In recent years, ACPs have been used pre-clinically for cancer treatment. Therefore, accurate identification of ACPs will promote their clinical applications. In contrast to labor-intensive experimental techniques, a series of computational methods have been proposed for identifying ACPs. In this review, we briefly summarize the current progress in the computational identification of ACPs. The challenges and future perspectives in developing reliable methods for the identification of ACPs are also discussed. We anticipate that this review could provide novel insights for future research on anticancer peptides.
Affiliation(s)
- Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
| | - Zhenyi Wang
- Center for Genomics and Computational Biology, School of Life Science, North China University of Science and Technology, Tangshan, 063000, China
| |
|
18
|
Zheng L, Liu D, Yang W, Yang L, Zuo Y. RaacLogo: a new sequence logo generator by using reduced amino acid clusters. Brief Bioinform 2020; 22:5855392. [PMID: 32524143 DOI: 10.1093/bib/bbaa096] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 03/08/2020] [Revised: 04/12/2020] [Accepted: 04/29/2020] [Indexed: 12/15/2022] Open
Abstract
Sequence logos give a fast and concise visualization of consensus sequences. Proteins exhibit greater complexity and diversity than DNA, which usually affects the graphical representation of the logo. Reduced amino acid alphabets are a powerful way to simplify the complexity of sequence alignments, which motivated us to establish RaacLogo. As a new sequence logo generator based on reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users, who can select among various reduced amino acid alphabets derived from more than 40 clustering algorithms. The current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignments. A two-dimensional selector is proposed for easily selecting the desired RAACs based on underlying biological knowledge. It is anticipated that the RaacLogo web server will play a valuable role in protein sequence alignment, topological estimation and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo.
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University
| | - Dongyang Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University
| | - Wuritu Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University
| |
|
19
|
Arif M, Ali F, Ahmad S, Kabir M, Ali Z, Hayat M. Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination. Genomics 2019; 112:1565-1574. [PMID: 31526842 DOI: 10.1016/j.ygeno.2019.09.006] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 07/19/2019] [Revised: 08/27/2019] [Accepted: 09/11/2019] [Indexed: 10/26/2022]
Abstract
Bacteriophage virion proteins (BVPs) are proteins of bacterial viruses that have a great impact on different biological functions of bacteria. They are widely used in genetic engineering and phage therapy applications. Correct identification of BVPs through conventional pathogen methods is slow and expensive. Thus, designing a bioinformatics predictor is urgently desirable to accelerate the correct identification of BVPs within a huge volume of proteins. However, the performance of available prediction tools is inadequate due to the lack of useful feature representations and a severe class imbalance. In the present study, we propose an intelligent model, called Pred-BVP-Unb, for the discrimination of BVPs that employs three nominal sequence-driven descriptors, i.e. Bi-PSSM evolutionary information, composition & translation, and split amino acid composition. The imbalance between classes is handled with the synthetic minority oversampling technique. The essential attributes are selected by a robust algorithm called recursive feature elimination. Finally, the optimal feature space is provided to a support vector machine classifier with a radial basis kernel in order to train the model. Our predictor remarkably outperforms existing approaches in the literature, achieving the highest accuracy of 92.54% and 83.06% on the benchmark and independent datasets, respectively. We expect that the Pred-BVP-Unb tool can provide useful hints for designing antibacterial drugs and also help to expedite the large-scale discovery of new bacteriophage virion proteins. The source code and all datasets are publicly available at https://github.com/Muhammad-Arif-NUST/BVP_Pred_Unb.
Affiliation(s)
- Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan.
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
| | - Saeed Ahmad
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Muhammad Kabir
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Zakir Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan.
| |
|
20
|
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Proteins are highly dynamic macromolecules. These dynamics are often analysed through experimental and/or computational methods only for an isolated protein or a limited number of proteins. Here, we explore large-scale protein dynamics simulations to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered to be the same category although their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange. Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations. Communicated by Ramaswamy H. Sarma.
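The local entropy and Neq ("number of equivalent") measures mentioned in this abstract can be sketched as follows. The abstract gives only the abbreviation, so the sketch assumes the definition commonly used with protein blocks: Neq is the exponential of the Shannon entropy of the PB frequencies observed at one residue position across simulation frames. The function name and the toy PB strings are hypothetical.

```python
import math
from collections import Counter


def neq(pb_states):
    """Return exp(Shannon entropy) of the PB letters seen at one position.

    pb_states holds one PB letter per simulation frame. A value of 1 means
    a single local conformation throughout; larger values mean the position
    samples several PBs.
    """
    if not pb_states:
        raise ValueError("at least one frame is required")
    total = len(pb_states)
    counts = Counter(pb_states)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Toy trajectories for one position; the letters are illustrative only.
rigid_position = neq("mmmmmmmm")     # always the same PB
flexible_position = neq("mnmnmnmn")  # two PBs sampled equally often
```

Because the measure is an exponential of an entropy, it reads directly as an effective number of conformations, which makes it convenient for comparing positions along a sequence.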
Affiliation(s)
- Tarun Jairaj Narwani
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Pierrick Craveur
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nicolas K Shinada
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Discngine, SAS, Paris, France
| | - Aline Floch
- Laboratoire D'Excellence GR-Ex, Paris, France.,Etablissement Français du Sang Ile de France, Créteil, France.,IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est- Créteil Univ, Créteil, France.,UPEC, Université Paris Est-Créteil, Créteil, France
| | - Hubert Santuz
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Akhila Melarkode Vattekatte
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | | | - Joseph Rebehmed
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
| | - Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France.,IBL, Paris, France
| | - Catherine Etchebest
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | - Alexandre G de Brevern
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France.,IBL, Paris, France
| |
|
21
|
Chen W, Feng P, Liu T, Jin D. Recent Advances in Machine Learning Methods for Predicting Heat Shock Proteins. Curr Drug Metab 2019; 20:224-228. [DOI: 10.2174/1389200219666181031105916] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 02/08/2023]
Abstract
Background: As molecular chaperones, Heat Shock Proteins (HSPs) not only play key roles in protein folding and in maintaining protein stability, but are also linked with multiple kinds of diseases. Therefore, HSPs have been regarded as a focus of drug design. Since HSPs from different families have distinct functions, accurately classifying the families of HSPs is the key step towards clearly understanding their biological functions. In contrast to labor-intensive and cost-ineffective experimental methods, computational classification of HSP families has emerged as an alternative approach. Methods: We reviewed the papers that described the existing datasets of HSPs and the representative computational approaches developed for the identification and classification of HSPs. Results: The two benchmark datasets of HSPs, namely HSPIR and sHSPdb, were introduced, which provide invaluable resources for computationally identifying HSPs. The gold standard dataset and sequence encoding schemes for building computational methods of classifying HSPs were also introduced. The three representative web servers for identifying HSPs and their families were described. Conclusion: The existing machine learning methods for identifying the different families of HSPs have yielded quite encouraging results and have played a role in promoting research on HSPs. However, the number of HSPs with known structures is very limited. Therefore, determining the structures of HSPs is also urgent, which will be helpful in revealing their functions.
Affiliation(s)
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Tao Liu
- School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
| | - Dianchuan Jin
- School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
| |
|
22
|
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function that can be obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as the sum of the pairwise potentials between atomic pairs. In the present study, the performance of a knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of the matrix whose elements represent the energy between amino acids. For this purpose, the matrix containing the mean energies of the residue-residue interaction types was constructed using 500 native protein structures. The matrix has a dominant eigenvalue, with the amino acids LEU, VAL, ILE, PHE, TYR, ALA and TRP having high values along the dominant eigenvector. The results show that the ranking of amino acids is consistent with the power of amino acids in discriminating native structures using the K-alphabet reduced model. In the reduced interactions, only amino acids from a subset of all 20 amino acids, along with their interactions, are considered to assess the energy. In the K-alphabet reduced model, the reduced structures are constructed based on only the K amino acid types. The dominant K-alphabet reduced model derived from the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP] has the best discrimination of native structures among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
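The two computational steps described in this abstract, deriving a pairwise potential from frequencies and finding the dominant eigenvector of the energy matrix, can be sketched in a few lines. This is only an illustration under stated assumptions: the inverse Boltzmann relation is written in its generic form E = -kT ln(f_obs / f_ref), the eigenvector is found by plain power iteration rather than whatever solver the authors used, and the 2 x 2 toy matrix contains made-up numbers, not energies from the 500 structures.

```python
import math


def pair_energy(observed_freq, reference_freq, kT=1.0):
    """Inverse Boltzmann relation: E = -kT * ln(f_obs / f_ref)."""
    return -kT * math.log(observed_freq / reference_freq)


def dominant_eigenpair(matrix, iterations=200):
    """Approximate the largest-magnitude eigenvalue and its eigenvector.

    Uses power iteration on a square matrix given as a list of rows,
    then a Rayleigh quotient on the normalised vector.
    """
    n = len(matrix)
    vector = [1.0] * n
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in image))
        vector = [x / norm for x in image]
    image = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vector[i] * image[i] for i in range(n))
    return eigenvalue, vector


# Toy symmetric residue-residue energy matrix with made-up values.
toy_energies = [[-2.0, -1.0], [-1.0, -0.5]]
eigenvalue, eigenvector = dominant_eigenpair(toy_energies)
```

Residues with the largest absolute components along the returned eigenvector are the ones that contribute most to the dominant energy term, which is the ranking idea the abstract refers to.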
|
23
|
Akbar S, Hayat M, Kabir M, Iqbal M. iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180816101653] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022]
Abstract
Antifreeze proteins (AFPs) play distinctive roles in maintaining the homeostatic conditions of living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity in protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is therefore highly desirable to propose an intelligent, high-throughput computational model that identifies AFPs quickly and accurately. In this work, a new predictor called "iAFP-gap-SMOTE" is proposed for the identification of AFPs. Protein sequences are expressed by adopting three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition and Reduced Amino Acid Alphabet Composition. Usually, the classification hypothesis is biased towards the majority class when the dataset is imbalanced. The Synthetic Minority Over-sampling Technique (SMOTE) is employed in order to increase the number of instances of the minority class and control the bias. A 10-fold cross-validation test is applied to appraise the success rates of the "iAFP-gap-SMOTE" model. In the empirical investigation, the "iAFP-gap-SMOTE" model obtained 95.02% accuracy. The comparison suggests that the accuracy of the "iAFP-gap-SMOTE" model is higher than that of the existing techniques in the literature so far. We expect that the proposed "iAFP-gap-SMOTE" model might be helpful for the research community and academia.
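The oversampling step described in this abstract can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a simplified illustration, not the implementation used by the authors; the function name, the parameter defaults, and the toy feature vectors are assumptions.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority-class samples by interpolation.

    minority is a list of equal-length feature vectors and must contain
    at least two samples so that every sample has a neighbour.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-dimensional minority class; real inputs would be sequence features.
minority_samples = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_samples = smote(minority_samples, n_synthetic=5)
```

Because the new points are interpolated rather than duplicated, the minority class grows without adding exact copies, which is the property that helps reduce the bias towards the majority class.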
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| |
|
24
|
Yao A, Reed SA, Koh M, Yu C, Luo X, Mehta AP, Schultz PG. Progress toward a reduced phage genetic code. Bioorg Med Chem 2018; 26:5247-5252. [PMID: 29609949 DOI: 10.1016/j.bmc.2018.03.035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/22/2018] [Revised: 03/14/2018] [Accepted: 03/23/2018] [Indexed: 12/23/2022]
Abstract
All known living organisms use at least 20 amino acids as the basic building blocks of life. Efforts to reduce the number of building blocks in a replicating system to below the 20 canonical amino acids have not been successful to date. In this work, we use filamentous phage as a model system to investigate the feasibility of removing methionine (Met) from the proteome. We show that all 24 elongation Met sites in the M13 phage genome can be replaced by other canonical amino acids. Most of these changes involve substitution of methionine by leucine (Leu), but in some cases additional compensatory mutations are required. Combining Met substituted sites in the proteome generally led to lower viability/infectivity of the mutant phages, which remains the major challenge in eliminating all methionines from the phage proteome. To date a total of 15 (out of all 24) elongation Mets have been simultaneously deleted from the M13 proteome, providing a useful foundation for future efforts to minimize the genetic code.
Affiliation(s)
- Anzhi Yao
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Sean A Reed
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Minseob Koh
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Chenguang Yu
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Xiaozhou Luo
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Angad P Mehta
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States
| | - Peter G Schultz
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States.
| |
|
25
|
Meher PK, Sahu TK, Gahoi S, Rao AR. ir-HSP: Improved Recognition of Heat Shock Proteins, Their Families and Sub-types Based On g-Spaced Di-peptide Features and Support Vector Machine. Front Genet 2018; 8:235. [PMID: 29379521 PMCID: PMC5770798 DOI: 10.3389/fgene.2017.00235] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/05/2017] [Accepted: 12/27/2017] [Indexed: 12/24/2022] Open
Abstract
Heat shock proteins (HSPs) play a pivotal role in cell growth and variability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, the development of an automated and accurate computational tool is highly desirable for the prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for the reliable prediction of all these components in a single framework and with higher accuracy. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins on benchmark datasets. The developed approach also achieved higher accuracy than most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user-friendly web server, ir-HSP, is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. ir-HSP was further evaluated for the proteome-wide identification of HSPs using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for the prediction of HSPs, to the extent of their families and sub-types.
Collapse
Affiliation(s)
- Prabina K Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Tanmaya K Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Shachi Gahoi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Atmakuri R Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
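The g-spaced (g-gap) dipeptide features named in this entry's title can be illustrated with a short sketch. The Python below counts residue pairs separated by g intervening positions and normalizes the counts to frequencies. The function name, the skipping of non-standard residues, and the normalization are assumptions made for illustration; this is not the ir-HSP implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g=0):
    """Frequencies of residue pairs separated by g intervening positions.

    g=0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {p: 0.0 for p in pairs}
    return {p: c / total for p, c in counts.items()}


# Hypothetical toy sequence, used only to show the call.
features = g_gap_dipeptide_composition("ACAC", g=1)
print(len(features))  # 400 features, one per ordered residue pair
```

Each value of g yields a separate 400-dimensional vector, so concatenating several gaps grows the feature space linearly with the number of gaps used.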
|
26
|
Kumar S. Prediction of Metal Ion Binding Sites in Proteins from Amino Acid Sequences by Using Simplified Amino Acid Alphabets and Random Forest Model. Genomics Inform 2017; 15:162-169. [PMID: 29307143 PMCID: PMC5769865 DOI: 10.5808/gi.2017.15.4.162] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/16/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/20/2022]
Abstract
Metal binding proteins or metallo-proteins are important for the stability of the protein and also serve as co-factors in various functions like controlling metabolism, regulating signal transport, and metal homeostasis. In structural genomics, prediction of metal binding proteins helps in the selection of a suitable growth medium for overexpression studies and also helps in obtaining the functional protein. Computational prediction using machine learning approaches has been widely used in various fields of bioinformatics, based on the premise that all the information is contained in the amino acid sequence. In this study, random forest machine learning prediction systems were deployed with simplified amino acid alphabets for prediction of individual major metal ion binding sites such as copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
Affiliation(s)
- Suresh Kumar
- Department of Diagnostic and Allied Health Sciences, Faculty of Health and Life Sciences, Management and Science University, 40100 Shah Alam, Malaysia
|
27
|
Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017; 79:62-70. [PMID: 28655440 DOI: 10.1016/j.artmed.2017.06.008] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 04/26/2017] [Revised: 06/12/2017] [Accepted: 06/16/2017] [Indexed: 01/10/2023]
Abstract
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are highly expensive, susceptible to errors, and ineffective. These conventional techniques induce severe side effects on human cells. Owing to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms individual classifiers as well as the simple majority voting-based ensemble. The genetic algorithm-based ensemble classification performs best on the hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Mian Ahmad Jan
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
|
28
|
Zhang W, Pei J, Lai L. Statistical Analysis and Prediction of Covalent Ligand Targeted Cysteine Residues. J Chem Inf Model 2017; 57:1453-1460. [DOI: 10.1021/acs.jcim.7b00163] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 01/24/2023]
Affiliation(s)
- Weilin Zhang
- Peking-Tsinghua Center for Life Sciences, AAIS, Peking University, Beijing 100871, P.R. China
- Jianfeng Pei
- Center for Quantitative Biology, AAIS, Peking University, Beijing 100871, P.R. China
- Luhua Lai
- Peking-Tsinghua Center for Life Sciences, AAIS, Peking University, Beijing 100871, P.R. China
- Center for Quantitative Biology, AAIS, Peking University, Beijing 100871, P.R. China
- BNLMS, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P.R. China
|
29
|
Liutkus M, Fraser SA, Caron K, Stigers DJ, Easton CJ. Peptide Synthesis through Cell-Free Expression of Fusion Proteins Incorporating Modified Amino Acids as Latent Cleavage Sites for Peptide Release. Chembiochem 2016; 17:908-12. [PMID: 26918308 DOI: 10.1002/cbic.201600091] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/15/2016] [Indexed: 01/03/2023]
Abstract
Chlorinated analogues of Leu and Ile are incorporated during cell-free expression of peptides fused to protein, by exploiting the promiscuity of the natural biosynthetic machinery. They then act as sites for clean and efficient release of the peptides simply by brief heat treatment. Dehydro analogues of Leu and Ile are similarly incorporated as latent sites for peptide release through treatment with iodine under cold conditions. These protocols complement enzyme-catalyzed methods and have been used to prepare calcitonin, gastrin-releasing peptide, cholecystokinin-7, and prolactin-releasing peptide prohormones, as well as analogues substituted with unusual amino acids, thus illustrating their practical utility as alternatives to more traditional chemical peptide synthesis.
Affiliation(s)
- Mantas Liutkus
- Research School of Chemistry, Australian National University, Canberra, ACT, 2601, Australia
- Samuel A Fraser
- Research School of Chemistry, Australian National University, Canberra, ACT, 2601, Australia
- Karine Caron
- Research School of Chemistry, Australian National University, Canberra, ACT, 2601, Australia
- Dannon J Stigers
- Research School of Chemistry, Australian National University, Canberra, ACT, 2601, Australia
- Christopher J Easton
- Research School of Chemistry, Australian National University, Canberra, ACT, 2601, Australia
|
30
|
Childs LM, Baskerville EB, Cobey S. Trade-offs in antibody repertoires to complex antigens. Philos Trans R Soc Lond B Biol Sci 2016; 370:rstb.2014.0245. [PMID: 26194759 PMCID: PMC4528422 DOI: 10.1098/rstb.2014.0245] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Indexed: 01/18/2023]
Abstract
Pathogens vary in their antigenic complexity. While some pathogens such as measles present a few relatively invariant targets to the immune system, others such as malaria display considerable antigenic diversity. How the immune response copes in the presence of multiple antigens, and whether a trade-off exists between the breadth and efficacy of antibody (Ab)-mediated immune responses, are unsolved problems. We present a theoretical model of affinity maturation of B-cell receptors (BCRs) during a primary infection and examine how variation in the number of accessible antigenic sites alters the Ab repertoire. Naive B cells with randomly generated receptor sequences initiate the germinal centre (GC) reaction. The binding affinity of a BCR to an antigen is quantified via a genotype-phenotype map, based on a random energy landscape, that combines local and distant interactions between residues. In the presence of numerous antigens or epitopes, B-cell clones with different specificities compete for stimulation during rounds of mutation within GCs. We find that the availability of many epitopes reduces the affinity and relative breadth of the Ab repertoire. Despite the stochasticity of somatic hypermutation, patterns of immunodominance are strongly shaped by chance selection of naive B cells with specificities for particular epitopes. Our model provides a mechanistic basis for the diversity of Ab repertoires and the evolutionary advantage of antigenically complex pathogens.
Affiliation(s)
- Lauren M Childs
- Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sarah Cobey
- Ecology and Evolution, University of Chicago, Chicago, IL, USA
|
31
|
iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition. PLoS One 2015; 10:e0145541. [PMID: 26713618 PMCID: PMC4694767 DOI: 10.1371/journal.pone.0145541] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 09/17/2015] [Accepted: 12/04/2015] [Indexed: 11/29/2022]
Abstract
Defensins, as one of the most abundant classes of antimicrobial peptides, are an essential part of the innate immunity that has evolved in most living organisms, from lower organisms to humans. To identify specific defensins as interesting antifungal leads, in this study we constructed a more rigorous benchmark dataset and developed the iDPF-PseRAAAC server to predict the defensin family and subfamily. When reduced dipeptide compositions were used, the overall accuracy of the proposed method increased to 95.10% for the defensin family and 98.39% for the vertebrate subfamily, which is higher than the accuracy of other methods. The jackknife test shows that an improvement of more than 4% was obtained compared with the previous method. A free online server was further established for the convenience of most experimental scientists at http://wlxy.imu.edu.cn/college/biostation/fuwu/iDPF-PseRAAAC/index.asp. A friendly guide is provided to describe how to use the web server. We anticipate that iDPF-PseRAAAC may become a useful high-throughput tool for both basic research and drug design.
|
32
|
Solis AD. Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins. Proteins 2015; 83:2198-216. [DOI: 10.1002/prot.24936] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 05/06/2015] [Revised: 09/04/2015] [Accepted: 09/04/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Armando D. Solis
- Biological Sciences Department, New York City College of Technology, The City University of New York (CUNY), Brooklyn, New York 11201
|
33
|
Huang Q, You Z, Zhang X, Zhou Y. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation. Int J Mol Sci 2015; 16:10855-69. [PMID: 25984606 PMCID: PMC4463679 DOI: 10.3390/ijms160510855] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/09/2015] [Revised: 05/06/2015] [Accepted: 05/07/2015] [Indexed: 01/22/2023]
Abstract
With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome. Therefore, protein–protein interaction (PPI) research is becoming more and more important. Life activities such as DNA synthesis, gene transcription activation, and protein translation are inseparable from protein–protein interactions. Though many methods based on biological experiments and machine learning have been proposed, they all take a long time to learn and achieve limited accuracy. How to efficiently and accurately predict PPIs is still a big challenge. To take up this challenge, we developed a new predictor by incorporating the reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and combining it with weighted sparse representation-based classification (WSRC). The remarkable advantage of introducing the reduced amino acid alphabet is that it avoids the notorious dimensionality disaster or overfitting problem in statistical prediction. Additionally, experiments have shown that our method achieves good performance in both low- and high-dimensional feature spaces. Among all of the experiments performed on the PPI data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and an 83.43% Matthews correlation coefficient (MCC) value. In order to evaluate the prediction ability of our method, extensive experiments were performed to compare it with the state-of-the-art technique, support vector machine (SVM). The achieved results show that the proposed approach is very promising for predicting PPIs, and it can be a helpful supplement for PPI prediction.
Affiliation(s)
- Qiaoying Huang
- Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China.
- Zhuhong You
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
- Xiaofeng Zhang
- Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China.
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
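The four performance measures quoted in this abstract (accuracy, sensitivity, precision, and MCC) all derive from the counts in a binary confusion matrix. The following Python sketch shows the standard formulas; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen here for illustration).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(binary_metrics(tp=90, fp=10, tn=90, fn=10))
```

MCC ranges from -1 to 1 and uses all four counts, which is why it is often reported alongside accuracy when classes may be imbalanced.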
|
34
|
Acevedo-Rocha CG, Reetz MT. Assembly of Designed Oligonucleotides: a useful tool in synthetic biology for creating high-quality combinatorial DNA libraries. Methods Mol Biol 2015; 1179:189-206. [PMID: 25055779 DOI: 10.1007/978-1-4939-1053-3_13] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
The method dubbed Assembly of Designed Oligonucleotides (ADO) is a powerful tool in synthetic biology to create combinatorial DNA libraries for gene, protein, metabolic, and genome engineering. In directed evolution of proteins, ADO benefits from using reduced amino acid alphabets for saturation mutagenesis and/or DNA shuffling, but all 20 canonical amino acids can be also used as building blocks. ADO is performed in a two-step reaction. The first involves a primer-free, polymerase cycling assembly or overlap extension PCR step using carefully designed overlapping oligonucleotides. The second step is a PCR amplification using the outer primers, resulting in a high-quality and bias-free double-stranded DNA library that can be assembled with other gene fragments and/or cloned into a suitable plasmid subsequently. The protocol can be performed in a few hours. In theory, neither the length of the DNA library nor the number of DNA changes has any limits. Furthermore, with the costs of synthetic DNA dropping every year, after an initial investment is made in the oligonucleotides, these can be exchanged for alternative ones with different sequences at any point in the process, fully exploiting the potential of creating highly diverse combinatorial libraries. In the example chosen here, we show the construction of a high-quality combinatorial ADO library targeting sixteen different codons simultaneously with nonredundant degenerate codons encoding various reduced alphabets of four amino acids along the heme region of the monooxygenase P450-BM3.
Affiliation(s)
- Carlos G Acevedo-Rocha
- Organische Synthese, Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470, Mülheim, Germany
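The idea of a degenerate codon that encodes a reduced amino acid alphabet can be sketched in a few lines of Python. The code expands IUPAC nucleotide codes into concrete codons and translates them with the standard genetic code. The example codon `RMA` is a hypothetical illustration of a nonredundant four-amino-acid codon; the abstract does not state which degenerate codons were actually used.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate_codon(codon):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in codon))]


def encoded_amino_acids(codon):
    """Sorted set of amino acids (or '*') encoded by a degenerate codon."""
    return sorted({CODON_TABLE[c] for c in expand_degenerate_codon(codon)})


# Hypothetical example codon: four codons, four distinct amino acids.
print(expand_degenerate_codon("RMA"), encoded_amino_acids("RMA"))
```

If every one of the sixteen targeted positions used a four-letter alphabet, the library would contain 4**16 = 4,294,967,296 protein variants, which illustrates why reduced alphabets keep such libraries tractable compared with 20**16.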
|
35
|
Huang JT, Wang T, Huang SR, Li X. Reduced alphabet for protein folding prediction. Proteins 2015; 83:631-9. [PMID: 25641420 DOI: 10.1002/prot.24762] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/15/2014] [Revised: 11/07/2014] [Accepted: 12/21/2014] [Indexed: 01/17/2023]
Abstract
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding the protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28-letter alphabet that determines folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28-letter alphabet. Many other reduced alphabets are not significantly correlated with folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design.
Affiliation(s)
- Jitao T Huang
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin, 300071, People's Republic of China
|
36
|
Modeling and molecular dynamics simulations of the V33 variant of the integrin subunit β3: Structural comparison with the L33 (HPA-1a) and P33 (HPA-1b) variants. Biochimie 2014; 105:84-90. [DOI: 10.1016/j.biochi.2014.06.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/07/2014] [Accepted: 06/21/2014] [Indexed: 11/21/2022]
|
37
|
Predicting the types of J-proteins using clustered amino acids. Biomed Res Int 2014; 2014:935719. [PMID: 24804260 PMCID: PMC3996952 DOI: 10.1155/2014/935719] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/24/2014] [Revised: 03/04/2014] [Accepted: 03/13/2014] [Indexed: 01/24/2023]
Abstract
J-proteins are molecular chaperones and are present in a wide variety of organisms from prokaryotes to eukaryotes. Based on their domain organizations, J-proteins can be classified into 4 types, that is, Type I, Type II, Type III, and Type IV. Different types of J-proteins play distinct roles in influencing cancer properties and cell death. Thus, reliably annotating the types of J-proteins is essential to better understand their molecular functions. In the present work, a support vector machine based method was developed to identify the types of J-proteins using the tripeptide composition of a reduced amino acid alphabet. In the jackknife cross-validation, the maximum overall accuracy of 94% was achieved on a stringent benchmark dataset. We also analyzed the amino acid compositions by using analysis of variance and found distinct distributions of amino acids in each family of the J-proteins. To enhance the value of the practical applications of the proposed model, an online web server was developed and can be freely accessed.
|
38
|
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. Adv Protein Chem Struct Biol 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
|
39
|
Feng PM, Chen W, Lin H, Chou KC. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem 2013; 442:118-25. [DOI: 10.1016/j.ab.2013.05.024] [Citation(s) in RCA: 230] [Impact Index Per Article: 20.9] [Received: 04/12/2013] [Revised: 05/21/2013] [Accepted: 05/22/2013] [Indexed: 01/22/2023]
|
40
|
Fan GL, Li QZ. Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2013; 334:45-51. [DOI: 10.1016/j.jtbi.2013.06.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 04/24/2013] [Revised: 05/30/2013] [Accepted: 06/03/2013] [Indexed: 01/22/2023]
|
41
|
Craveur P, Joseph AP, Rebehmed J, de Brevern AG. β-Bulges: extensive structural analyses of β-sheets irregularities. Protein Sci 2013; 22:1366-78. [PMID: 23904395 DOI: 10.1002/pro.2324] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/29/2013] [Revised: 07/19/2013] [Accepted: 07/22/2013] [Indexed: 12/30/2022]
Abstract
β-Sheets are quite frequent in protein structures and are stabilized by regular main-chain hydrogen bond patterns. Irregularities in β-sheets, named β-bulges, are distorted regions between two consecutive hydrogen bonds. They disrupt the classical alternation of side chain direction and can alter the directionality of β-strands. They are implicated in protein-protein interactions and are introduced to avoid β-strand aggregation. Five different types of β-bulges are defined. Previous studies on β-bulges were performed on a limited number of protein structures or one specific family. These studies evoked a potential conservation during evolution. In this work, we analyze the β-bulge distribution and conservation in terms of local backbone conformations and amino acid composition. Our dataset consists of 66 times more β-bulges than the last systematic study (Chan et al. Protein Science 1993, 2:1574-1590). Novel amino acid preferences are underlined and local structure conformations are highlighted by the use of a structural alphabet. We observed that β-bulges are preferably localized at the N- and C-termini of β-strands, but contrary to the earlier studies, no significant conservation of β-bulges was observed among structural homologues. Displacement of β-bulges along the sequence was also investigated by Molecular Dynamics simulations.
Affiliation(s)
- Pierrick Craveur
- INSERM, U665, DSIMB, F-75739, Paris, France; University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire d'Excellence GR-Ex, F-75739, Paris, France
|
42
|
Predicting acidic and alkaline enzymes by incorporating the average chemical shift and gene ontology informations into the general form of Chou's PseAAC. Process Biochem 2013. [DOI: 10.1016/j.procbio.2013.05.012] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022]
|
43
|
Stephenson JD, Freeland SJ. Unearthing the root of amino acid similarity. J Mol Evol 2013; 77:159-69. [PMID: 23743923 PMCID: PMC6763418 DOI: 10.1007/s00239-013-9565-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 03/22/2013] [Accepted: 05/08/2013] [Indexed: 12/31/2022]
Abstract
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
Affiliation(s)
- James D Stephenson
- NASA Astrobiology Institute, University of Hawaii, Honolulu, HI, 96822, USA,
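One simple way to make the comparison of simplified alphabets concrete is to treat each alphabet as a set of amino acid pairs that share a group, then measure the overlap between two such sets. The Python sketch below does this with a Jaccard index. It is an illustrative assumption, not the method developed in the paper, and the two toy alphabets are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(alphabet):
    """Set of unordered residue pairs that share a group in an alphabet.

    An alphabet is given as a list of strings, one string per group.
    """
    pairs = set()
    for group in alphabet:
        pairs.update(frozenset(p) for p in combinations(sorted(group), 2))
    return pairs


def jaccard_agreement(alphabet_a, alphabet_b):
    """Jaccard index of the co-clustered pair sets of two alphabets."""
    pairs_a = co_clustered_pairs(alphabet_a)
    pairs_b = co_clustered_pairs(alphabet_b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


# Hypothetical partial alphabets, used only to show the call.
alphabet_1 = ["AV", "KR", "DE"]
alphabet_2 = ["AVL", "KR", "D", "E"]
print(jaccard_agreement(alphabet_1, alphabet_2))
```

Summing the pair sets over many alphabets, instead of intersecting two, would give a count of how often each pair is grouped together, which is one possible route to a consensus view.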
|
44
|
Uversky VN. A decade and a half of protein intrinsic disorder: biology still waits for physics. Protein Sci 2013; 22:693-724. [PMID: 23553817 PMCID: PMC3690711 DOI: 10.1002/pro.2261] [Citation(s) in RCA: 363] [Impact Index Per Article: 33.0] [Received: 03/13/2013] [Revised: 03/23/2013] [Accepted: 03/25/2013] [Indexed: 12/28/2022]
Abstract
The abundant existence of proteins and regions that possess specific functions without being uniquely folded into unique 3D structures has become accepted by a significant number of protein scientists. Sequences of these intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) are characterized by a number of specific features, such as low overall hydrophobicity and high net charge which makes these proteins predictable. IDPs/IDPRs possess large hydrodynamic volumes, low contents of ordered secondary structure, and are characterized by high structural heterogeneity. They are very flexible, but some may undergo disorder to order transitions in the presence of natural ligands. The degree of these structural rearrangements varies over a very wide range. IDPs/IDPRs are tightly controlled under the normal conditions and have numerous specific functions that complement functions of ordered proteins and domains. When lacking proper control, they have multiple roles in pathogenesis of various human diseases. Gaining structural and functional information about these proteins is a challenge, since they do not typically "freeze" while their "pictures are taken." However, despite or perhaps because of the experimental challenges, these fuzzy objects with fuzzy structures and fuzzy functions are among the most interesting targets for modern protein research. This review briefly summarizes some of the recent advances in this exciting field and considers some of the basic lessons learned from the analysis of physics, chemistry, and biology of IDPs.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, USA.
|
45
|
Ferrada E, Wagner A. A comparison of genotype-phenotype maps for RNA and proteins. Biophys J 2012; 102:1916-25. [PMID: 22768948 DOI: 10.1016/j.bpj.2012.01.047] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 09/07/2011] [Revised: 01/19/2012] [Accepted: 01/27/2012] [Indexed: 02/04/2023]
Abstract
The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. We here compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA and lattice proteins of a reduced monomer alphabet size, to make exhaustive analysis and direct comparison of their genotype spaces feasible. We find that many fewer protein molecules than RNA molecules fold, but they fold into many more structures than RNA. In consequence, protein phenotypes have smaller genotype networks whose member genotypes tend to be more similar than for RNA phenotypes. Neighborhoods in sequence space of a given radius around an RNA molecule contain more novel structures than for protein molecules. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.
Affiliation(s)
- Evandro Ferrada
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
|
46
|
Jallu V, Bertrand G, Bianchi F, Chenet C, Poulain P, Kaplan C. The αIIb p.Leu841Met (Cab3(a+)) polymorphism results in a new human platelet alloantigen involved in neonatal alloimmune thrombocytopenia. Transfusion 2012; 53:554-63. [PMID: 22738334 DOI: 10.1111/j.1537-2995.2012.03762.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
BACKGROUND Fetal-neonatal alloimmune thrombocytopenia (FNAIT) diagnosis relies on maternofetal incompatibility and alloantibody identification. Genotyping for rare platelet (PLT) polymorphisms allowed the identification of three families with suspected or confirmed maternofetal incompatibility for the αIIb-c.2614C>A mutation (Halle et al., Transfusion 2008;48:14-15). STUDY DESIGN AND METHODS A polymerase chain reaction-sequence-specific primers amplification assay was designed to genotype the αIIb-c.2614C>A mutation. HEK293 cells expressing αIIb-Leu841 or αIIb-Met841 αIIbβ3 forms were used to probe the reactivity of maternal sera from these families and to study the effects of the substitution on αIIbβ3 expression and functions. RESULTS Tested by flow cytometry (FCM), one serum sample specifically reacted with αIIb-Met841 but not with αIIb-Leu841 αIIbβ3. This specificity revealed the αIIb-Leu841 polymorphism as a new alloantigen named Cab3(a+) . Cross-match testing using FCM also showed the Cab3(a+) antigen to be expressed at the PLT surface. As for anti-human PLT alloantigen (HPA)-3a (or -3b) and anti-HPA-9bw, detection of anti-Cab3(a+) alloantibodies appeared difficult and required whole PLT assays when classical monoclonal antibody-specific immobilization of PLT antigen test failed. In our FNAIT set, the immune response to Cab3(a+) maternofetal incompatibility could induce severe thrombocytopenias and life-threatening hemorrhages. The p.Leu841Met substitution has limited effects, if any, on local αIIb structure, preserving both αIIbβ3 expression and functions. CONCLUSION The Cab3(a+) polymorphism is a new rare alloantigen (allelic frequency <1%) carried by αIIb that might result in severe life-threatening thrombocytopenias. In Sub-Saharan African populations, higher Cab3(a+) gene frequencies (up to 8.2%; Halle et al., Transfusion 2008;48:14-15) and homozygous people are observed.
Affiliation(s)
- Vincent Jallu
- Platelet Immunology Laboratory, INTS; DSIMB, INSERM, U665, France
|
47
|
Chen W, Feng P, Lin H. Prediction of ketoacyl synthase family using reduced amino acid alphabets. J Ind Microbiol Biotechnol 2011; 39:579-84. [PMID: 22042516 DOI: 10.1007/s10295-011-1047-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2011] [Accepted: 10/04/2011] [Indexed: 11/28/2022]
Abstract
Ketoacyl synthases are enzymes involved in fatty acid synthesis and can be classified into five families based on primary sequence similarity. Different families have different catalytic mechanisms. Developing cost-effective computational models to identify the family of ketoacyl synthases will be helpful for enzyme engineering and in knowing individual enzymes' catalytic mechanisms. In this work, a support vector machine-based method was developed to predict ketoacyl synthase family using the n-peptide composition of reduced amino acid alphabets. In jackknife cross-validation, the model based on the 2-peptide composition of a reduced amino acid alphabet of size 13 yielded the best overall accuracy of 96.44% with average accuracy of 93.36%, which is superior to other state-of-the-art methods. This result suggests that the information provided by n-peptide compositions of reduced amino acid alphabets provides efficient means for enzyme family classification and that the proposed model can be efficiently used for ketoacyl synthase family annotation.
Affiliation(s)
- Wei Chen
- Department of Physics, College of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China.
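The n-peptide composition feature named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the six-group reduction in `GROUPS` is a hypothetical grouping chosen for readability (the study's best model used a reduced alphabet of size 13, whose clusters are not given here), and the function names are assumptions.

```python
from itertools import product

# Hypothetical 6-group reduction of the 20 amino acids, for illustration only.
# It is NOT the size-13 alphabet reported in the cited study.
GROUPS = {
    "A": "AVLIMC",  # aliphatic and sulfur-containing
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}

# Map each residue to the letter representing its group.
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet, skipping unknown residues."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def n_peptide_composition(seq, n=2):
    """Return the frequency of every possible reduced n-peptide in the sequence."""
    reduced = reduce_sequence(seq)
    letters = sorted(GROUPS)
    counts = {"".join(p): 0 for p in product(letters, repeat=n)}
    total = max(len(reduced) - n + 1, 0)
    for i in range(total):
        counts[reduced[i:i + n]] += 1
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


features = n_peptide_composition("MKVLDE", n=2)
print(len(features))   # number of features: 6 ** 2
print(features["AK"])  # frequency of the reduced dipeptide "AK"
```

With six groups and n = 2, each sequence becomes a fixed-length vector of 36 frequencies; vectors of this kind are what a classifier such as a support vector machine would be trained on.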
|
48
|
Species specific amino acid sequence–protein local structure relationships: An analysis in the light of a structural alphabet. J Theor Biol 2011; 276:209-17. [DOI: 10.1016/j.jtbi.2011.01.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2010] [Revised: 01/28/2011] [Accepted: 01/31/2011] [Indexed: 11/24/2022]
|
49
|
Novel hydrophobins from Trichoderma define a new hydrophobin subclass: protein properties, evolution, regulation and processing. J Mol Evol 2011; 72:339-51. [PMID: 21424760 DOI: 10.1007/s00239-011-9438-3] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2010] [Accepted: 03/01/2011] [Indexed: 10/18/2022]
Abstract
Hydrophobins are small proteins, characterised by the presence of eight positionally conserved cysteine residues, and are present in all filamentous asco- and basidiomycetes. They are found on the outer surfaces of cell walls of hyphae and conidia, where they mediate interactions between the fungus and the environment. Hydrophobins are conventionally grouped into two classes (class I and II) according to their solubility in solvents, hydropathy profiles and spacing between the conserved cysteines. Here we describe a novel set of hydrophobins from Trichoderma spp. that deviate from this classification in their hydropathy, cysteine spacing and protein surface pattern. Phylogenetic analysis shows that they form separate clades within ascomycete class I hydrophobins. Using T. atroviride as a model, the novel hydrophobins were found to be expressed under conditions of glucose limitation and to be regulated by differential splicing.
|
50
|
Using increment of diversity to predict mitochondrial proteins of malaria parasite: integrating pseudo-amino acid composition and structural alphabet. Amino Acids 2010; 42:1309-16. [DOI: 10.1007/s00726-010-0825-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2010] [Accepted: 12/17/2010] [Indexed: 11/29/2022]
|