1
Bhattarai S, Tayara H, Chong KT. Advancing Peptide-Based Cancer Therapy with AI: In-Depth Analysis of State-of-the-Art AI Models. J Chem Inf Model 2024; 64:4941-4957. [PMID: 38874445 DOI: 10.1021/acs.jcim.4c00295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including the models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) with light gradient boosting machine (LGBM) based feature selection approaches, demonstrate improved characterization. Assessment using a new test data set (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC as successive optimal predictors, showcasing robust performance. Our review underscores current prediction tool limitations and advocates for an omnidirectional ACP prediction framework to propel ongoing research.
Affiliation(s)
- Sadik Bhattarai
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
2
Han Y, Zhang H, Zeng Z, Liu Z, Lu D, Liu Z. Descriptor-augmented machine learning for enzyme-chemical interaction predictions. Synth Syst Biotechnol 2024; 9:259-268. [PMID: 38450325 PMCID: PMC10915406 DOI: 10.1016/j.synbio.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024] Open
Abstract
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used for predicting enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three out of the seven datasets with the greatest number of enzymes, whereas chemical descriptors had no apparent effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the esm-2 descriptor achieved the best results. Using enzyme families as labels showed that descriptors could cluster proteins well, which could explain the contributions of descriptors to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded significant improvement in four out of the seven datasets, while protein descriptors had no apparent effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to get better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
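The three evaluation scenarios named in this abstract differ mainly in how records are held out. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `split_by_scenario`, the record layout `(enzyme, chemical, label)`, and the single hold-out split (instead of the 10-fold cross-validation the abstract mentions) are all assumptions made for illustration.

```python
import random


def split_by_scenario(pairs, scenario, test_fraction=0.2, seed=0):
    """Split (enzyme, chemical, label) records for one evaluation scenario.

    scenario == "relationship": hold out random enzyme-chemical pairs.
    scenario == "enzyme":       hold out whole enzymes (unseen at training time).
    scenario == "chemical":     hold out whole chemicals (unseen at training time).
    """
    rng = random.Random(seed)
    if scenario == "relationship":
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]

    # Group-wise hold-out: every record of a held-out entity goes to the test set.
    key_index = {"enzyme": 0, "chemical": 1}[scenario]
    groups = sorted({record[key_index] for record in pairs})
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [r for r in pairs if r[key_index] not in held_out]
    test = [r for r in pairs if r[key_index] in held_out]
    return train, test


# Hypothetical toy data: 5 enzymes x 4 chemicals with an arbitrary binary label.
pairs = [(f"E{e}", f"C{c}", (e + c) % 2) for e in range(5) for c in range(4)]
train, test = split_by_scenario(pairs, "enzyme")
print(len(train), "training records;", len(test), "test records")
```

Splitting by enzyme or by chemical, and not by individual pair, is what keeps the "new enzyme" and "new chemical" test sets free of entities seen during training.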
Affiliation(s)
- Yilei Han
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Haoye Zhang
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zheni Zeng
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zhiyuan Liu
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Diannan Lu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
3
Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast, from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues such as obtaining high-quality data in huge quantities. Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, by the time you read this chapter, further development of the algorithms described, or the emergence of new (and better) ones, should come as no surprise.
Affiliation(s)
- Ilona Michalik
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
4
Jarończyk M. Software for Predicting Binding Free Energy of Protein-Protein Complexes and Their Mutants. Methods Mol Biol 2024; 2780:139-147. [PMID: 38987468 DOI: 10.1007/978-1-0716-3985-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein binding affinity prediction is important for understanding complex biochemical pathways and uncovering protein interaction networks. Quantitative estimation of the binding affinity changes caused by mutations can provide critical information for protein function annotation and genetic disease diagnosis. The binding free energies of protein-protein complexes can be predicted using several computational tools. This chapter summarizes the software developed for predicting binding free energies of protein-protein complexes and their mutants.
5
Nath A, Chaube R. Mining Chemogenomic Spaces for Prediction of Drug-Target Interactions. Methods Mol Biol 2024; 2714:155-169. [PMID: 37676598 DOI: 10.1007/978-1-0716-3441-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
The pipeline of drug discovery consists of a number of processes; drug-target interaction determination is one of the salient steps among them. Computational prediction of drug-target interactions can help reduce the search space of experimental wet lab-based verification steps, thus considerably reducing the time and other resources dedicated to the drug discovery pipeline. While machine learning-based methods are more widespread for drug-target interaction prediction, network-centric methods are also evolving. In this chapter, we focus on the process of drug-target interaction prediction from the perspective of using machine learning algorithms and the various stages involved in developing an accurate predictor.
Affiliation(s)
- Abhigyan Nath
  - Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, India
- Radha Chaube
  - Department of Zoology, Institute of Science, Banaras Hindu University, Varanasi, India
6
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
7
ABP-Finder: A Tool to Identify Antibacterial Peptides and the Gram-Staining Type of Targeted Bacteria. Antibiotics (Basel) 2022; 11:1708. [PMID: 36551365 PMCID: PMC9774453 DOI: 10.3390/antibiotics11121708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 11/16/2022] [Accepted: 11/17/2022] [Indexed: 11/29/2022] Open
Abstract
Multi-drug resistance in bacteria is a major health problem worldwide. To overcome this issue, new approaches allowing for the identification and development of antibacterial agents are urgently needed. Peptides, due to their binding specificity and low expected side effects, are promising candidates for a new generation of antibiotics. For over two decades, a large diversity of antimicrobial peptides (AMPs) has been discovered and annotated in public databases. The AMP family encompasses nearly 20 biological functions, thus representing a potentially valuable resource for data mining analyses. Nonetheless, despite the availability of machine learning-based approaches focused on AMPs, these tools lack evidence of successful application to AMP discovery, and many are not designed to predict a specific function for putative AMPs, such as antibacterial activity. Consequently, among the apparent variety of data mining methods to screen peptide sequences for antibacterial activity, only a few tools can handle this task consistently, and then only with limited precision and generally no information about the possible targets. Here, we addressed this gap by introducing a tool specifically designed to identify antibacterial peptides (ABPs) with an estimation of which type of bacteria is susceptible to the action of these peptides, according to their response to the Gram-staining assay. Our tool is freely available via a web server named ABP-Finder. This new method ranks within the top state-of-the-art ABP predictors, particularly in terms of precision. Importantly, we showed the successful application of ABP-Finder for the screening of a large peptide library from the human urine peptidome and the identification of an antibacterial peptide.
8
Yang Y, Zhao J, Zeng L, Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int J Mol Sci 2022; 23:10798. [PMID: 36142711 PMCID: PMC9505338 DOI: 10.3390/ijms231810798] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022] Open
Abstract
The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.
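The two figures of merit quoted in this abstract, the Pearson correlation coefficient and the root mean square error, can be computed as in the sketch below. This is only an illustration of the metrics: the function names and the melting-temperature values are hypothetical and are not taken from ProTstab2.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical melting temperatures (degrees Celsius), for illustration only.
measured = [45.0, 52.5, 60.0, 71.0]
predicted = [47.0, 50.0, 63.0, 69.0]
print("r =", round(pearson_r(measured, predicted), 3))
print("RMSE =", round(rmse(measured, predicted), 3))
```

The correlation measures how well predictions track the ranking and trend of the measurements, while the RMSE reports the typical size of the error; a predictor can score well on one and poorly on the other, which is why both are usually reported together.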
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Jianjun Zhao
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lianjie Zeng
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
9
Agüero-Chapin G, Galpert-Cañizares D, Domínguez-Pérez D, Marrero-Ponce Y, Pérez-Machado G, Teijeira M, Antunes A. Emerging Computational Approaches for Antimicrobial Peptide Discovery. Antibiotics (Basel) 2022; 11:936. [PMID: 35884190 PMCID: PMC9311958 DOI: 10.3390/antibiotics11070936] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/14/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 02/05/2023] Open
Abstract
In the last two decades, many reports have addressed the application of artificial intelligence (AI) in the search for and design of antimicrobial peptides (AMPs). AI has been represented by machine learning (ML) algorithms that use sequence-based features for the discovery of new peptidic scaffolds with promising biological activity. From an AI perspective, evolutionary algorithms have also been applied to the rational generation of peptide libraries aimed at the optimization/design of AMPs. However, the literature has scarcely addressed other emerging non-conventional in silico approaches for the search/design of such bioactive peptides. Thus, the first motivation here is to bring up some non-standard peptide features that have been used to build classical ML predictive models. Secondly, it is valuable to highlight emerging ML algorithms and alternative computational tools to predict/design AMPs as well as to explore their chemical space. Another point worthy of mention is the recent application of evolutionary algorithms that actually simulate sequence evolution to both the generation of diversity-oriented peptide libraries and the optimization of hit peptides. Last but not least, we include some new considerations in proteogenomic analyses currently incorporated into the computational workflow for unravelling AMPs in natural sources.
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
  - Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
- Deborah Galpert-Cañizares
  - Departamento de Ciencia de la Computación, Universidad Central Marta Abreu de Las Villas (UCLV), Santa Clara 54830, Cuba
- Dany Domínguez-Pérez
  - CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
  - Proquinorte, Unipessoal, Lda, Avenida 5 de Outubro, 124, 7º Piso, Avenidas Novas, 1050-061 Lisboa, Portugal
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas and Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador
- Gisselle Pérez-Machado
  - EpiDisease S.L, Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Marta Teijeira
  - Departamento de Química Orgánica, Facultade de Química, Universidade de Vigo, 36310 Vigo, Spain
  - Instituto de Investigación Sanitaria Galicia Sur, Hospital Álvaro Cunqueiro, 36213 Vigo, Spain
- Agostinho Antunes
  - CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
  - Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
10
Romero-Molina S, Ruiz-Blanco YB, Mieres-Perez J, Harms M, Münch J, Ehrmann M, Sanchez-Garcia E. PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein-Peptide and Protein-Protein Binding Affinity. J Proteome Res 2022; 21:1829-1841. [PMID: 35654412 PMCID: PMC9361347 DOI: 10.1021/acs.jproteome.2c00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/30/2022]
Abstract
Virtual screening of protein–protein and protein–peptide interactions is a challenging task that directly impacts the processes of hit identification and hit-to-lead optimization in drug design projects involving peptide-based pharmaceuticals. Although several screening tools designed to predict the binding affinity of protein–protein complexes have been proposed, methods specifically developed to predict protein–peptide binding affinity are comparatively scarce. Frequently, predictors trained to score the affinity of small molecules are used for peptides indistinctively, despite the larger complexity and heterogeneity of interactions rendered by peptide binders. To address this issue, we introduce PPI-Affinity, a tool that leverages support vector machine (SVM) predictors of binding affinity to screen datasets of protein–protein and protein–peptide complexes, as well as to generate and rank mutants of a given structure. The performance of the SVM models was assessed on four benchmark datasets, which include protein–protein and protein–peptide binding affinity data. In addition, we evaluated our model on a set of mutants of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.
Affiliation(s)
- Sandra Romero-Molina
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Yasser B Ruiz-Blanco
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Joel Mieres-Perez
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Mirja Harms
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany
- Jan Münch
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany
  - Core Facility Functional Peptidomics, Ulm University Medical Center, Ulm 89081, Germany
- Michael Ehrmann
  - Faculty of Biology, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Elsa Sanchez-Garcia
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
11
Quevedo-Tumailli V, Ortega-Tenezaca B, González-Díaz H. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021; 22:13066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022] Open
Abstract
Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of a pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data across different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases to achieve this goal: ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer). The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a huge amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins, and their functions. In addition, we created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj). These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) including categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information (from the three databases) and train a predictive model. Shannon’s entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model presented the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
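Two ingredients named in this abstract, a Shannon entropy descriptor and a moving-average perturbation operator, can be illustrated with a heavily simplified sketch. It assumes the entropy is taken over symbol frequencies in a sequence and that the operator is the deviation of a value from the mean of its categorical subset. The paper's actual Shk definition and operator forms may differ, and the names `shannon_entropy` and `moving_average_deviation` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the symbol frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def moving_average_deviation(values, categories):
    """Deviation of each value from the mean of its own category.

    This mimics a moving-average perturbation operator: the expected value
    is the average over all cases sharing the same categorical label
    (for example, the same assay organism or target protein).
    """
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for value, category in zip(values, categories):
        sums[category] += value
        sizes[category] += 1
    return [v - sums[c] / sizes[c] for v, c in zip(values, categories)]


# Hypothetical inputs: three short sequences measured under two assay conditions.
sequences = ["MKKLLA", "MAAAAA", "MKTWYV"]
conditions = ["assay_A", "assay_A", "assay_B"]
entropies = [shannon_entropy(s) for s in sequences]
print(moving_average_deviation(entropies, conditions))
```

Expressing every structural variable as an entropy puts drugs, proteins, and genes on one scale, and subtracting the subset average turns each value into a deviation that a classifier can relate to the assay conditions.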
Affiliation(s)
- Viviana Quevedo-Tumailli
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Research Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Bernabe Ortega-Tenezaca
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Information and Communications Technology Management Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - BIOFISIKA, Basque Centre for Biophysics, CSIC-UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
  - Correspondence: Tel.: +34-94-601-3547
12
PTML modeling for peptide discovery: in silico design of non-hemolytic peptides with antihypertensive activity. Mol Divers 2021; 26:2523-2534. [PMID: 34802116 DOI: 10.1007/s11030-021-10350-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/14/2021] [Accepted: 11/05/2021] [Indexed: 01/19/2023]
Abstract
Hypertension is a medical condition that affects millions of people worldwide. Despite the high efficacy of the current antihypertensive drugs, they are associated with serious side effects. Peptides constitute attractive options for chemical therapy against hypertension, and computational models can accelerate the design of antihypertensive peptides. Yet, to the best of our knowledge, all the in silico models predict only the antihypertensive activity of peptides while neglecting their inherent toxic potential to red blood cells. In this work, we report the first sequence-based model that combines perturbation theory and machine learning through multilayer perceptron networks (SB-PTML-MLP) to enable the simultaneous screening of antihypertensive activity and hemotoxicity of peptides. We have interpreted the molecular descriptors present in the model from a physicochemical and structural point of view. By strictly following such interpretations as guidelines, we performed two tasks. First, we selected amino acids with favorable contributions to both the increase of the antihypertensive activity and the diminution of hemotoxicity. Then, we assembled those suitable amino acids, virtually designing peptides that were predicted by the SB-PTML-MLP model as antihypertensive agents exhibiting low hemotoxicity. The potentiality of the SB-PTML-MLP model as a tool for designing potent and safe antihypertensive peptides was confirmed by predictions performed by online computational tools reported in the scientific literature. The methodology presented here can be extended to other pharmacological applications of peptides.
13
Proteome-wide Prediction of Lysine Methylation Leads to Identification of H2BK43 Methylation and Outlines the Potential Methyllysine Proteome. Cell Rep 2021; 32:107896. [PMID: 32668242 DOI: 10.1016/j.celrep.2020.107896] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/30/2020] [Revised: 04/29/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022] Open
Abstract
Protein Lys methylation plays a critical role in numerous cellular processes, but it is challenging to identify Lys methylation in a systematic manner. Here we present an approach combining in silico prediction with targeted mass spectrometry (MS) to identify Lys methylation (Kme) sites at the proteome level. We develop MethylSight, a program that predicts Kme events solely on the physicochemical properties of residues surrounding the putative methylation sites, which then requires validation by targeted MS. Using this approach, we identify 70 new histone Kme marks with a 90% validation rate. H2BK43me2, which undergoes dynamic changes during stem cell differentiation, is found to be a substrate of KDM5b. Furthermore, MethylSight predicts that Lys methylation is a prevalent post-translational modification in the human proteome. Our work provides a useful resource for guiding systematic exploration of the role of Lys methylation in human health and disease.
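Predicting from "the physicochemical properties of residues surrounding the putative methylation sites" implies extracting a sequence window around each lysine and summarising it numerically. The sketch below shows one way this could look. It is not MethylSight's feature set: the flank width of 7, the padding character, and the choice of the Kyte-Doolittle hydropathy scale as the example property are assumptions.

```python
# Kyte-Doolittle hydropathy index, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_windows(sequence, flank=7, pad="-"):
    """Return (1-based position, window) for every lysine in the sequence.

    Each window holds `flank` residues on either side of the lysine;
    positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def mean_hydropathy(window):
    """Average hydropathy over the real residues in a window (padding ignored)."""
    scores = [KYTE_DOOLITTLE[r] for r in window if r in KYTE_DOOLITTLE]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical peptide, for illustration only.
for position, window in lysine_windows("MSGRGKQGGKARAK", flank=3):
    print(position, window, round(mean_hydropathy(window), 2))
```

A real predictor would compute many such properties per window and feed them to a classifier; the windowing step itself stays the same.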
14
Ruiz-Blanco YB, Ávila-Barrientos LP, Hernández-García E, Antunes A, Agüero-Chapin G, García-Hernández E. Engineering protein fragments via evolutionary and protein-protein interaction algorithms: de novo design of peptide inhibitors for FoF1-ATP synthase. FEBS Lett 2020; 595:183-194. [PMID: 33151544 DOI: 10.1002/1873-3468.13988] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2020] [Revised: 10/23/2020] [Accepted: 10/30/2020] [Indexed: 11/08/2022]
Abstract
Enzyme subunit interfaces have remarkable potential in drug design as both target and scaffold for their own inhibitors. We show an evolution-driven strategy for the de novo design of peptide inhibitors targeting interfaces of the Escherichia coli FoF1-ATP synthase as a case study. The evolutionary algorithm ROSE was applied to generate diversity-oriented peptide libraries by engineering peptide fragments from ATP synthase interfaces. The resulting peptides were scored with PPI-Detect, a sequence-based predictor of protein-protein interactions. Two selected peptides were confirmed by in vitro inhibition and binding tests. The proposed methodology can be widely applied to design peptides targeting relevant interfaces of enzymatic complexes.
Affiliation(s)
| | | | | | - Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Portugal
| | - Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Portugal
|
15
|
Kumar A, Dubey R, Singhai S, Konar AD, Basu A. Structural characterization with light scattering: A tool for rationally designing protein formulations. Anal Biochem 2020; 609:113979. [PMID: 33035463 DOI: 10.1016/j.ab.2020.113979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2020] [Revised: 09/22/2020] [Accepted: 09/28/2020] [Indexed: 11/15/2022]
Abstract
Here we explore the possibility of using light scattering technologies as an analytical tool for understanding structural features of a protein that might be responsible for initiating aggregative interactions. Using widely independent complementary experimental and computational techniques, we found that interaction parameters like Km in particular possess good correlation with residue specific descriptors for the model protein Bovine Serum Albumin. Such information can help rationally design protein engineering and/or formulation strategies for prolonged shelf-life of such products.
Affiliation(s)
- Atul Kumar
- School of Pharmaceutical Sciences, Rajiv Gandhi Technical University, Bhopal, India
| | - Richa Dubey
- School of Pharmaceutical Sciences, Rajiv Gandhi Technical University, Bhopal, India
| | - Sakshi Singhai
- School of Pharmaceutical Sciences, Rajiv Gandhi Technical University, Bhopal, India
| | - Anita Dutt Konar
- School of Pharmaceutical Sciences, Rajiv Gandhi Technical University, Bhopal, India; Department of Applied Chemistry, Rajiv Gandhi Technical University, Bhopal, India; University Grants Commission, UGC, New Delhi, India
| | - Anindya Basu
- School of Pharmaceutical Sciences, Rajiv Gandhi Technical University, Bhopal, India; University Grants Commission, UGC, New Delhi, India.
|
16
|
Mou Z, Eakes J, Cooper CJ, Foster CM, Standaert RF, Podar M, Doktycz MJ, Parks JM. Machine learning‐based prediction of enzyme substrate scope: Application to bacterial nitrilases. Proteins 2020; 89:336-347. [DOI: 10.1002/prot.26019] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/06/2020] [Revised: 09/02/2020] [Accepted: 10/17/2020] [Indexed: 01/11/2023]
Affiliation(s)
- Zhongyu Mou
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
| | - Jason Eakes
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
| | - Connor J. Cooper
- Graduate School of Genome Science and Technology University of Tennessee Walters Life Science Knoxville Tennessee USA
| | - Carmen M. Foster
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
| | | | - Mircea Podar
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
| | - Mitchel J. Doktycz
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
- Graduate School of Genome Science and Technology University of Tennessee Walters Life Science Knoxville Tennessee USA
| | - Jerry M. Parks
- Biosciences Division Oak Ridge National Laboratory Oak Ridge Tennessee USA
- Graduate School of Genome Science and Technology University of Tennessee Walters Life Science Knoxville Tennessee USA
|
17
|
Karlberg M, de Souza JV, Fan L, Kizhedath A, Bronowska AK, Glassey J. QSAR Implementation for HIC Retention Time Prediction of mAbs Using Fab Structure: A Comparison between Structural Representations. Int J Mol Sci 2020; 21:ijms21218037. [PMID: 33126648 PMCID: PMC7663183 DOI: 10.3390/ijms21218037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 10/22/2020] [Accepted: 10/27/2020] [Indexed: 12/19/2022]
Abstract
Monoclonal antibodies (mAbs) constitute a rapidly growing biopharmaceutical sector. However, their growth is impeded by high failure rates originating from failed clinical trials and developability issues in process development. There is, therefore, a growing need for better in silico tools to aid in risk assessment of mAb candidates to promote early-stage screening of potentially problematic mAb candidates. In this study, a quantitative structure–activity relationship (QSAR) modelling workflow was designed for the prediction of hydrophobic interaction chromatography (HIC) retention times of mAbs. Three novel descriptor sets derived from primary sequence, homology modelling, and atomistic molecular dynamics (MD) simulations were developed and assessed to determine the necessary level of structural resolution needed to accurately capture the relationship between mAb structures and HIC retention times. The results showed that descriptors derived from 3D structures obtained after MD simulations were the most suitable for HIC retention time prediction with an R² = 0.63 in an external test set. It was found that when using homology modelling, the resulting 3D structures became biased towards the used structural template. Performing an MD simulation therefore proved to be a necessary post-processing step for the mAb structures in order to relax the structures and allow them to attain a more natural conformation. Based on the results, the proposed workflow in this paper could therefore potentially contribute to aid in risk assessment of mAb candidates in early development.
Affiliation(s)
- Micael Karlberg
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (M.K.); (L.F.); (A.K.)
| | - João Victor de Souza
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (J.V.d.S.); (A.K.B.)
| | - Lanyu Fan
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (M.K.); (L.F.); (A.K.)
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (J.V.d.S.); (A.K.B.)
| | - Arathi Kizhedath
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (M.K.); (L.F.); (A.K.)
| | - Agnieszka K. Bronowska
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (J.V.d.S.); (A.K.B.)
| | - Jarka Glassey
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; (M.K.); (L.F.); (A.K.)
|
18
|
Aguilera-Mendoza L, Marrero-Ponce Y, García-Jacas CR, Chavez E, Beltran JA, Guillen-Ramirez HA, Brizuela CA. Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach. Sci Rep 2020; 10:18074. [PMID: 33093586 PMCID: PMC7583304 DOI: 10.1038/s41598-020-75029-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/05/2020] [Accepted: 09/23/2020] [Indexed: 12/15/2022]
Abstract
The increasing interest in bioactive peptides with therapeutic potentials has been reflected in a large variety of biological databases published over the last years. However, the knowledge discovery process from these heterogeneous data sources is a nontrivial task, becoming the essence of our research endeavor. Therefore, we devise a unified data model based on molecular similarity networks for representing a chemical reference space of bioactive peptides, having an implicit knowledge that is currently not explicitly accessed in existing biological databases. Indeed, our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the "ocean" of known bioactive peptides. The workflow presented here relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method to identify an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks where nodes represent bioactive peptides, and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool ( http://mobiosd-hub.com/starpep/ ), to assist researchers in extracting useful information from an integrated collection of 45120 bioactive peptides, which is one of the largest and most diverse data in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent a biologically relevant chemical space known to date.
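Steps (i) and (iii) of the workflow described in this abstract can be sketched in a few lines. The example below is only a toy under stated assumptions, not the authors' implementation: it uses a single amino acid property (the Kyte-Doolittle hydropathy scale), three simple aggregation operators (mean, minimum, maximum), Euclidean distance, and an arbitrary distance cutoff. The unsupervised feature selection of step (ii) is omitted, and the function names `descriptors` and `similarity_network` are hypothetical.

```python
import math
from itertools import combinations

# Kyte-Doolittle hydropathy values, used here as the single
# amino acid property vector (an assumption of this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def descriptors(peptide):
    """Step (i): aggregate the per-residue property vector into descriptors."""
    values = [HYDROPATHY[aa] for aa in peptide]
    return (sum(values) / len(values), min(values), max(values))


def similarity_network(peptides, cutoff):
    """Step (iii): connect peptides whose descriptor distance is <= cutoff.

    Returns a list of (peptide_a, peptide_b, distance) edges; keeping only
    short-distance pairs is what makes the network sparse.
    """
    desc = {p: descriptors(p) for p in peptides}
    edges = []
    for a, b in combinations(peptides, 2):
        distance = math.dist(desc[a], desc[b])
        if distance <= cutoff:
            edges.append((a, b, distance))
    return edges


if __name__ == "__main__":
    for edge in similarity_network(["AAAA", "AAAV", "KKKK"], cutoff=3.0):
        print(edge)
```

The resulting edge list can be loaded into any graph library for the clustering and centrality analyses mentioned in step (iv).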
Affiliation(s)
- Longendri Aguilera-Mendoza
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico
| | - Yovani Marrero-Ponce
- Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17-1200-841, Quito, Ecuador.
- Grupo GINUMED, Corporacion Universitaria Rafael Nuñez. Facultad de Salud, Programa de Medicina, Cartagena, Colombia.
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain.
| | - César R García-Jacas
- Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
| | - Edgar Chavez
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico
| | - Jesus A Beltran
- Department of Informatics, University of California, Irvine, Irvine, CA, USA
| | - Hugo A Guillen-Ramirez
- Department of BioMedical Research (DBMR), University of Bern, Bern, 3008, Switzerland
- Department of Medical Oncology, Inselspital, University Hospital and University of Bern, 3010, Bern, Switzerland
| | - Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico.
|
19
|
Poot Velez AH, Fontove F, Del Rio G. Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes. Int J Mol Sci 2020; 21:E4787. [PMID: 32640745 PMCID: PMC7370293 DOI: 10.3390/ijms21134787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/19/2020] [Revised: 06/20/2020] [Accepted: 06/28/2020] [Indexed: 01/22/2023]
Abstract
Predicting protein-protein interactions (PPI) represents an important challenge in structural bioinformatics. Current computational methods display different degrees of accuracy when predicting these interactions. Different factors were proposed to help improve these predictions, including choosing the proper descriptors of proteins to represent these interactions, among others. In the current work, we provide a representative protein structure that is amenable to PPI classification using machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set to reproduce the situation of most proteins sharing sequence similarity with others. We identified a model with almost no PPI error (96-99% of correctly classified instances) and showed that residue cluster classes of protein pairs displayed a distinct pattern between positive and negative protein interactions. Our results indicated that residue cluster classes are structural features relevant to model PPI and provide a novel tool to mathematically model the protein structure/function relationship.
Affiliation(s)
- Albros Hermes Poot Velez
- Department of biochemistry and structural biology, Instituto de fisiologia celular, UNAM Mexico City 04510, Mexico;
| | | | - Gabriel Del Rio
- Department of biochemistry and structural biology, Instituto de fisiologia celular, UNAM Mexico City 04510, Mexico;
|
20
|
Youmans M, Spainhour JCG, Qiu P. Classification of Antibacterial Peptides Using Long Short-Term Memory Recurrent Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1134-1140. [PMID: 30843849 DOI: 10.1109/tcbb.2019.2903800] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Antimicrobial peptides are short amino acid sequences that may be antibacterial, antifungal, and antiviral. Most machine learning methodologies applied to identifying antibacterial peptides have developed feature vectors of identical lengths for each peptide in a given dataset although the peptides themselves may differ in number of amino acids. Features are often chosen which represent certain periodic patterns in the peptide sequence without any initial guidance as to whether such patterns are relevant for the classification task at hand. This can result in the construction of a large number of irrelevant features in addition to relevant features. To help alleviate these issues, we choose to extract a feature vector from individual amino acid feature representations through the application of bidirectional Long Short-Term Memory recurrent neural networks. The Long Short-Term Memory network recursively iterates along both directions of the given amino acid sequence and ultimately extracts a finite length feature vector that is then used to classify the peptide. This work demonstrates the application of Long Short-Term Memory recurrent neural networks to classification of antibacterial peptides and compares it to a Random Forest classifier and a k-nearest neighbor classifier.
|
21
|
Romero-Molina S, Ruiz-Blanco YB, Green JR, Sanchez-Garcia E. ProtDCal-Suite: A web server for the numerical codification and functional analysis of proteins. Protein Sci 2020; 28:1734-1743. [PMID: 31271472 DOI: 10.1002/pro.3673] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/11/2019] [Revised: 06/21/2019] [Accepted: 06/24/2019] [Indexed: 12/24/2022]
Abstract
Computational tools for the analysis of protein data and the prediction of biological properties are essential in life sciences and biomedical research. Here, we introduce ProtDCal-Suite, a web server comprising a set of machine learning-based methods for studying proteins. The main module of ProtDCal-Suite is the ProtDCal software. ProtDCal translates the structural information of proteins into numerical descriptors that serve as input to machine-learning techniques. The ProtDCal-Suite server also incorporates a post-processing optional stage that allows ranking and filtering the obtained descriptors by computing their Shannon entropy values across the input set of proteins. ProtDCal's codification was used in the development of models for the prediction of specific protein properties. Thus, the other modules of ProtDCal-Suite are protein analysis tools implemented using ProtDCal's descriptors. Among them are PPI-Detect, for predicting the interaction likelihood of protein-protein and protein-peptide pairs, Enzyme Identifier, for identifying enzymes from amino acid sequences or 3D structures, and Pred-NGlyco, for predicting N-glycosylation sites. ProtDCal-Suite is freely accessible at https://protdcal.zmb.uni-due.de.
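The optional post-processing stage mentioned in this abstract, ranking descriptors by their Shannon entropy across the input proteins, can be illustrated as follows. This is a minimal sketch, not the ProtDCal-Suite code: the equal-width binning, the default of four bins, and the names `shannon_entropy` and `rank_descriptors` are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=4):
    """Shannon entropy (in bits) of a descriptor after equal-width binning.

    The binning scheme and bin count are assumptions of this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A descriptor that is constant across all proteins carries no information.
        return 0.0
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_descriptors(table, bins=4):
    """Sort descriptor names from highest to lowest entropy.

    `table` maps a descriptor name to its values across the input proteins.
    """
    scores = {name: shannon_entropy(column, bins) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example = {
        "constant": [1.0, 1.0, 1.0, 1.0],
        "spread": [0.0, 1.0, 2.0, 3.0],
        "two_level": [0.0, 0.0, 3.0, 3.0],
    }
    for name, entropy in rank_descriptors(example):
        print(f"{name}: {entropy:.2f} bits")
```

Low-entropy descriptors vary little across the input set and are natural candidates for filtering before model building.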
Affiliation(s)
- Sandra Romero-Molina
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
| | - Yasser B Ruiz-Blanco
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
| | - James R Green
- Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
| | - Elsa Sanchez-Garcia
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
|
22
|
Contreras-Torres E, Marrero-Ponce Y, Terán JE, García-Jacas CR, Brizuela CA, Sánchez-Rodríguez JC. MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors. J Chem Inf Model 2020; 60:1042-1059. [PMID: 31663741 DOI: 10.1021/acs.jcim.9b00629] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
This report introduces the MuLiMs-MCoMPAs software (acronym for Multi-Linear Maps based on N-Metric and Contact Matrices of 3D Protein and Amino-acid weightings), designed to compute tensor-based 3D protein structural descriptors by applying two- and three-linear algebraic forms. Moreover, these descriptors contemplate generalizing components such as novel 3D protein structural representations, (dis)similarity metrics, and multimetrics to extract geometrical related information between two and three amino acids, weighting schemes based on amino acid properties, matrix normalization procedures that consider simple-stochastic and mutual probability transformations, topological and geometrical cutoffs, amino acid, and group-based MD calculations, and aggregation operators for merging amino acidic and group MDs. The MuLiMs-MCoMPAs software, which belongs to the ToMoCoMD-CAMPS suite, was developed in Java (version 1.8) using the Chemistry Development Kit (CDK) (version 1.4.19) and the Jmol libraries. This software implemented a divide-and-conquer strategy to parallelize the computation of the indices as well as modules for data preprocessing and batch computing functionalities. Furthermore, it consists of two components: (i) a desktop-graphical user interface (GUI) and (ii) an API library. The relevance of this novel approach is demonstrated through two analyses that considered Shannon's entropy-based variability and a principal component analysis. These studies showed that the MuLiMs-MCoMPAs' three-linear descriptor family contains higher informational entropy than several other descriptors generated with available computation tools. Moreover, the MuLiMs-MCoMPAs indices capture additional orthogonal information to the one codified by the available calculation approaches. As a result, two sets of suggested theoretical configurations that contain 13648 two-linear indices and 20263 three-linear indices are available for download at tomocomd.com. Furthermore, as a demonstration of the applicability and easy integration of the MuLiMs library into a QSAR-based expert system, a software application (ProStAF) was generated to predict SCOP protein structural classes and folding rate. It can thus be anticipated that the MuLiMs-MCoMPAs framework will turn into a valuable contribution to the chem- and bioinformatics research fields.
Affiliation(s)
- Ernesto Contreras-Torres
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador; Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
| | - Yovani Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador; Grupo GINUMED, Facultad de Salud, Programa de Medicina, Corporacion Universitaria Rafael Nuñez, Cartagena, Colombia; Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, 46010 València, Spain
| | - Julio E Terán
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador; Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
| | - César R García-Jacas
- Cátedras Conacyt-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
| | - Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
|
23
|
García-Jacas CR, Marrero-Ponce Y, Vivas-Reyes R, Suárez-Lezcano J, Martinez-Rios F, Terán JE, Aguilera-Mendoza L. Distributed and multicore QuBiLS-MIDAS software v2.0: Computing chiral, fuzzy, weighted and truncated geometrical molecular descriptors based on tensor algebra. J Comput Chem 2020; 41:1209-1227. [PMID: 32058625 DOI: 10.1002/jcc.26167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/03/2019] [Revised: 01/22/2020] [Accepted: 01/26/2020] [Indexed: 12/12/2022]
Abstract
Advances to the distributed, multi-core and fully cross-platform QuBiLS-MIDAS software v2.0 (http://tomocomd.com/qubils-midas) are reported in this article since the v1.0 release. The QuBiLS-MIDAS software is the only one that computes atom-pair and alignment-free geometrical MDs (3D-MDs) from several distance metrics other than the Euclidean distance, as well as alignment-free 3D-MDs that codify structural information regarding the relations among three and four atoms of a molecule. The most recent features added to the QuBiLS-MIDAS software v2.0 are related (a) to the calculation of atomic weightings from indices based on the vertex-degree invariant (e.g., Alikhanidi index); (b) to consider central chirality during the molecular encoding; (c) to use measures based on clustering methods and statistical functions to codify structural information among more than two atoms; (d) to the use of a novel method based on fuzzy membership functions to spherically truncate inter-atomic relations; and (e) to the use of weighted and fuzzy aggregation operators to compute global 3D-MDs according to the importance and/or interrelation of the atoms of a molecule during the molecular encoding. Moreover, a novel module to compute QuBiLS-MIDAS 3D-MDs from their headings was also developed. This module can be used either by the graphical user interface or by means of the software library. By using the library, both the predictive models built with the QuBiLS-MIDAS 3D-MDs and the QuBiLS-MIDAS 3D-MDs calculation can be embedded in other tools. A set of predefined QuBiLS-MIDAS 3D-MDs with high information content and low redundancy on a set comprised of 20,469 compounds is also provided to be employed in further cheminformatics tasks. 
This set of predefined 3D-MDs evidenced better performance than all the universe of Dragon (v5.5) and PaDEL 0D-to-3D MDs in variability studies, whereas a linear independence study proved that these QuBiLS-MIDAS 3D-MDs codify chemical information orthogonal to the Dragon 0D-to-3D MDs. This set of predefined 3D-MDs would be periodically updated as long as new results be achieved. In general, this report highlights our continued efforts to provide a better tool for a most suitable characterization of compounds, and in this way, to contribute to obtaining better outcomes in future applications.
Affiliation(s)
- César R García-Jacas
- Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
| | - Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito, Pichincha, Ecuador; Grupo GINUMED, Corporacion Universitaria Rafael Nuñez, Facultad de Salud, Programa de Medicina, Cartagena, Colombia; Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
| | - Ricardo Vivas-Reyes
- Grupo de Química Cuántica y Teórica de la Universidad de Cartagena - Facultad de Ciencias Exactas y Naturales. Programa de Química. Campus de San Pablo, Cartagena, Colombia; Grupo CipTec, Facultad de Ingenierias. Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
| | - José Suárez-Lezcano
- Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
| | | | - Julio E Terán
- Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, USA
| | - Longendri Aguilera-Mendoza
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
|
24
|
Gentiluomo L, Svilenov HL, Augustijn D, El Bialy I, Greco ML, Kulakova A, Indrakumar S, Mahapatra S, Morales MM, Pohl C, Roche A, Tosstorff A, Curtis R, Derrick JP, Nørgaard A, Khan TA, Peters GHJ, Pluen A, Rinnan Å, Streicher WW, van der Walle CF, Uddin S, Winter G, Roessner D, Harris P, Frieß W. Advancing Therapeutic Protein Discovery and Development through Comprehensive Computational and Biophysical Characterization. Mol Pharm 2020; 17:426-440. [PMID: 31790599 DOI: 10.1021/acs.molpharmaceut.9b00852] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/18/2023]
Abstract
Therapeutic protein candidates should exhibit favorable properties that render them suitable to become drugs. Nevertheless, there are no well-established guidelines for the efficient selection of proteinaceous molecules with desired features during early stage development. Such guidelines can emerge only from a large body of published research that employs orthogonal techniques to characterize therapeutic proteins in different formulations. In this work, we share a study on a diverse group of proteins, including their primary sequences, purity data, and computational and biophysical characterization at different pH and ionic strength. We report weak linear correlations between many of the biophysical parameters. We suggest that a stability comparison of diverse therapeutic protein candidates should be based on a computational and biophysical characterization in multiple formulation conditions, as the latter can largely determine whether a protein is above or below a certain stability threshold. We use the presented data set to calculate several stability risk scores obtained with an increasing level of analytical effort and show how they correlate with protein aggregation during storage. Our work highlights the importance of developing combined risk scores that can be used for early stage developability assessment. We suggest that such scores can have high prediction accuracy only when they are based on protein stability characterization in different solution conditions.
Affiliation(s)
- Lorenzo Gentiluomo
- Wyatt Technology Europe GmbH, Hochstrasse 18, 56307 Dernbach, Germany; Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
- Hristo L Svilenov
- Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
- Dillen Augustijn
- Department of Food Science, Faculty of Science, Copenhagen University, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Inas El Bialy
- Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
- Maria Laura Greco
- Dosage Form Design and Development, AstraZeneca, Sir Aaron Klug Building, Granta Park, Cambridge CB21 6GH, U.K.
- Alina Kulakova
- Department of Chemistry, Technical University of Denmark, Kemitorvet 207, 2800 Kongens Lyngby, Denmark
- Sowmya Indrakumar
- Department of Chemistry, Technical University of Denmark, Kemitorvet 207, 2800 Kongens Lyngby, Denmark
- Marcello Martinez Morales
- Dosage Form Design and Development, AstraZeneca, Sir Aaron Klug Building, Granta Park, Cambridge CB21 6GH, U.K.
- Christin Pohl
- Novozymes A/S, Krogshoejvej 36, 2880 Bagsvaerd, Denmark
- Aisling Roche
- School of Chemical Engineering and Analytical Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
- Andreas Tosstorff
- Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
- Robin Curtis
- School of Chemical Engineering and Analytical Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
- Jeremy P Derrick
- School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PT, U.K.
- Allan Nørgaard
- Novozymes A/S, Krogshoejvej 36, 2880 Bagsvaerd, Denmark
- Tarik A Khan
- Pharmaceutical Development & Supplies, Pharma Technical Development Biologics Europe, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Günther H J Peters
- Department of Chemistry, Technical University of Denmark, Kemitorvet 207, 2800 Kongens Lyngby, Denmark
- Alain Pluen
- School of Chemical Engineering and Analytical Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
- Åsmund Rinnan
- Department of Food Science, Faculty of Science, Copenhagen University, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Christopher F van der Walle
- Dosage Form Design and Development, AstraZeneca, Sir Aaron Klug Building, Granta Park, Cambridge CB21 6GH, U.K.
- Shahid Uddin
- Dosage Form Design and Development, AstraZeneca, Sir Aaron Klug Building, Granta Park, Cambridge CB21 6GH, U.K.
- Gerhard Winter
- Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
- Dierk Roessner
- Wyatt Technology Europe GmbH, Hochstrasse 18, 56307 Dernbach, Germany
- Pernille Harris
- Department of Chemistry, Technical University of Denmark, Kemitorvet 207, 2800 Kongens Lyngby, Denmark
- Wolfgang Frieß
- Department of Pharmacy, Pharmaceutical Technology and Biopharmaceutics, Ludwig-Maximilians-Universitaet Muenchen, Butenandtstrasse 5, 81377 Munich, Germany
|
25
|
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022] Open
Abstract
Alignment-free (AF) methodologies have gained popularity in recent decades as alternatives to alignment-based (AB) algorithms for comparative sequence analysis. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools that encode graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the trend of integrating AF features/measures with AB ones, either within the same prediction model or by combining the predictions of different algorithms through voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and less well-known ones, we report our own experience with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone using several similarity criteria.
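As a rough illustration of the alignment-free idea summarized in this abstract, the sketch below compares two protein sequences through their k-mer (word) frequency profiles rather than through an alignment. The function names, the default k = 2, and the use of cosine distance are assumptions made for this example only; it does not reproduce MARCH-INSIDE, TI2BioP, ProtDCal, or SeqDivA.

```python
from collections import Counter
from math import sqrt


def kmer_profile(sequence, k=2):
    """Count overlapping k-mers (words of length k) in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two k-mer count profiles."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # no k-mers to compare; treat as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def af_distance(seq_a, seq_b, k=2):
    """Alignment-free distance between two sequences (hypothetical helper)."""
    return cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
```

A distance near 0 means the two sequences share a similar word composition, and a distance of 1 means they share no k-mers at all. Because no alignment is computed, the cost grows linearly with sequence length.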
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
- Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
- EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
- Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
- Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
|
26
|
Affiliation(s)
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Centre for Clinical Research, St. Ann’s Hospital, 602 00 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Centre for Clinical Research, St. Ann’s Hospital, 602 00 Brno, Czech Republic
|
27
|
Yang Y, Ding X, Zhu G, Niroula A, Lv Q, Vihinen M. ProTstab - predictor for cellular protein stability. BMC Genomics 2019; 20:804. [PMID: 31684883 PMCID: PMC6830000 DOI: 10.1186/s12864-019-6138-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/22/2019] [Accepted: 09/24/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data, and therefore even basic properties are not known for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. RESULTS We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities. CONCLUSIONS The Pearson's correlation coefficient was 0.793 in 10-fold cross validation and 0.763 in the independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated that ProTstab has superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities to protein chain lengths of isoforms and to localizations of proteins.
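The two performance measures quoted in this abstract, Pearson's correlation coefficient and mean absolute error, can be computed from paired observed and predicted values as sketched below. This is a minimal standard-library illustration; the function names are assumptions, and the code is not taken from ProTstab.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

The two measures answer different questions: the correlation reflects how well the predictions rank and track the observed values, while the mean absolute error reports the typical size of the deviation in the units of the measurement, which is why reporting both is common.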
Affiliation(s)
- Yang Yang
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden
- Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
- Xuesong Ding
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Guanchen Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Abhishek Niroula
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden
- Qiang Lv
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden
|
28
|
Application of interpretable artificial neural networks to early monoclonal antibodies development. Eur J Pharm Biopharm 2019; 141:81-89. [DOI: 10.1016/j.ejpb.2019.05.017] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/18/2019] [Revised: 05/17/2019] [Accepted: 05/17/2019] [Indexed: 11/20/2022]
|
29
|
Kizhedath A, Karlberg M, Glassey J. Cross-Interaction Chromatography-Based QSAR Model for Early-Stage Screening to Facilitate Enhanced Developability of Monoclonal Antibody Therapeutics. Biotechnol J 2019; 14:e1800696. [PMID: 30810283 DOI: 10.1002/biot.201800696] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/26/2018] [Revised: 01/19/2019] [Indexed: 01/13/2023]
Abstract
Monoclonal antibodies (mAbs) constitute a rapidly growing biopharmaceutical sector. However, their growth is impeded by developability issues such as polyspecificity and lack of solubility, which lead to attrition as well as manufacturing failures. In this study, a multitool hybrid quantitative structure-activity relationship (QSAR) model development framework is described. This framework uses four novel datasets derived from the primary sequences of IgG1-κ-humanized mAbs with varying degrees of resolution. Unsupervised pattern recognition is first performed on the descriptor sets to visualize any intrinsic property-based clustering, followed by regression of descriptors against cross-interaction chromatography (CIC) retention times. Model optimization is performed via unsupervised variable reduction followed by supervised variable selection. Finally, the models and datasets are benchmarked based on regression model performance metrics such as R², Q², and RMSE. The results show that datasets containing localized descriptors, rather than values averaged over the entire protein, predict CIC retention behavior better, with R² > 0.8 and RMSE < 0.3. Furthermore, the results indicate the physicochemical, electronic, and topological properties of hypervariable regions of antibodies that contribute most to the CIC retention times. The results of these studies could contribute to early-stage screening and better design of mAbs.
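The benchmarking metrics named in this abstract (R², Q², and RMSE) can be illustrated with a one-descriptor least-squares model, as in the sketch below. It assumes a single descriptor, ordinary least squares, and a leave-one-out definition of Q² relative to the overall mean; these are choices made for the example (Q² conventions vary), and the code is not the authors' QSAR framework.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(x, y):
    """Goodness of fit (R²) and root-mean-square error on the training data."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y))


def q2_leave_one_out(x, y):
    """Cross-validated R²: each point is predicted by a model fit without it."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / ss_tot
```

R² measures how well the model reproduces the data it was fit on, whereas Q² measures how well it predicts held-out points. A large gap between the two is a common sign of overfitting, which is why both are usually reported together.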
Affiliation(s)
- Arathi Kizhedath
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Micael Karlberg
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Jarka Glassey
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
|
30
|
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time-consuming. Computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand-based approaches, docking approaches, and chemogenomic approaches. In this review, we aim to describe the various feature-based chemogenomic methods for drug target interaction prediction. The review provides a comprehensive overview of the various techniques, datasets, tools, and metrics. The feature-based methods have been categorized, explained, and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature-based methods of drug target interaction.
Affiliation(s)
- Kanica Sachdev
- Computer Science and Engineering Department, SMVDU, J&K, India.
|
31
|
Romero-Molina S, Ruiz-Blanco YB, Harms M, Münch J, Sanchez-Garcia E. PPI-Detect: A support vector machine model for sequence-based prediction of protein-protein interactions. J Comput Chem 2019; 40:1233-1242. [PMID: 30768790 DOI: 10.1002/jcc.25780] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 09/24/2018] [Revised: 11/29/2018] [Accepted: 12/29/2018] [Indexed: 12/18/2022]
Abstract
The prediction of peptide-protein or protein-protein interactions (PPI) is a challenging task, especially if amino acid sequences are the only information available. Machine learning methods allow us to exploit the information content in PPI datasets. However, the numerical codification of these datasets often influences the performance of data mining approaches. Here, we introduce a procedure for the general-purpose numerical codification of polypeptides. This procedure transforms pairs of amino acid sequences into a machine learning-friendly vector, whose elements represent numerical descriptors of residues in proteins. We used this numerical encoding procedure for the development of a support vector machine model (PPI-Detect), which predicts whether or not two proteins will interact. PPI-Detect (https://ppi-detect.zmb.uni-due.de/) outperforms state-of-the-art sequence-based predictors of PPI. We employed PPI-Detect for the analysis of derivatives of EPI-X4, an endogenous peptide inhibitor of CXCR4, a G-protein-coupled receptor. There, we identified with high accuracy those peptides which bind better than EPI-X4 to the receptor. Also using PPI-Detect, we designed a novel peptide and then experimentally established its anti-CXCR4 activity.
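To make the idea of turning a pair of amino acid sequences into a single fixed-length vector concrete, the sketch below concatenates the amino acid composition of each sequence. This is a deliberately simple stand-in chosen for illustration: the abstract does not specify the descriptors used by PPI-Detect, and the names `composition` and `encode_pair` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def encode_pair(seq_a, seq_b):
    """Concatenate the two composition vectors into one 40-element vector."""
    return composition(seq_a) + composition(seq_b)
```

A vector of this kind has the same length for any pair of sequences, so it could be passed to a standard classifier such as a support vector machine. Note that plain concatenation is order-dependent: `encode_pair(a, b)` differs from `encode_pair(b, a)`, which a real encoding would need to handle, for example by training on both orders.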
Affiliation(s)
- Sandra Romero-Molina
- Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany
- Yasser B Ruiz-Blanco
- Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany
- Mirja Harms
- Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany
- Jan Münch
- Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany; Core Facility Functional Peptidomics, Ulm University Medical Center, Ulm, Germany
- Elsa Sanchez-Garcia
- Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany
|
32
|
Johnson DE. Biotherapeutics: Challenges and Opportunities for Predictive Toxicology of Monoclonal Antibodies. Int J Mol Sci 2018; 19:E3685. [PMID: 30469350 PMCID: PMC6274697 DOI: 10.3390/ijms19113685] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/29/2018] [Revised: 11/18/2018] [Accepted: 11/19/2018] [Indexed: 12/19/2022] Open
Abstract
Biotherapeutics are a rapidly growing portion of the total pharmaceutical market accounting for almost one-half of recent new drug approvals. A major portion of these approvals each year are monoclonal antibodies (mAbs). During development, non-clinical pharmacology and toxicology testing of mAbs differs from that done with chemical entities since these biotherapeutics are derived from a biological source and therefore the animal models must share the same epitopes (targets) as humans to elicit a pharmacological response. Mechanisms of toxicity of mAbs are both pharmacological and non-pharmacological in nature; however, standard in silico predictive toxicological methods used in research and development of chemical entities currently do not apply to these biotherapeutics. Challenges and potential opportunities exist for new methodologies to provide a more predictive program to assess and monitor potential adverse drug reactions of mAbs for specific patients before and during clinical trials and after market approval.
Affiliation(s)
- Dale E Johnson
- Morgan Hall, University of California, Berkeley, Berkeley, CA 94720, USA.
|
33
|
Contreras-Torres E. Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC. J Theor Biol 2018; 454:139-145. [DOI: 10.1016/j.jtbi.2018.05.033] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 04/25/2018] [Revised: 05/23/2018] [Accepted: 05/28/2018] [Indexed: 11/24/2022]
|
34
|
García-Jacas CR, Cabrera-Leyva L, Marrero-Ponce Y, Suárez-Lezcano J, Cortés-Guzmán F, García-González LA. GOWAWA Aggregation Operator-based Global Molecular Characterizations: Weighting Atom/bond Contributions (LOVIs/LOEIs) According to their Influence in the Molecular Encoding. Mol Inform 2018; 37:e1800039. [PMID: 30070434 DOI: 10.1002/minf.201800039] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/30/2018] [Accepted: 07/13/2018] [Indexed: 11/11/2022]
Abstract
A different perspective to compute global weighted definitions of molecular descriptors from the contributions of each atom (LOVIs) or covalent bond (LOEIs) within a molecule is presented, using the generalized ordered weighted averaging - weighted averaging (GOWAWA) aggregation operator. This operator differs from the other norm-, mean-, and statistic-based operators used to date for calculating descriptors from LOVIs/LOEIs. GOWAWA unifies the generalized ordered weighted averaging (GOWA) and the weighted generalized mean (WGM) functions and, in addition, uses a smoothing parameter to assign different importance values to both functions depending on the problem under study. With the GOWAWA operator, a diversity of novel global aggregations of molecular descriptors can be determined, where the influence that each atom (or covalent bond) has on the molecular characterization is taken into account. This approach is therefore completely different from the ones reported in the literature, where the values of LOVIs/LOEIs are considered equally important. To demonstrate the feasibility of using this operator, the QuBiLS-MIDAS descriptors (http://tomocomd.com/qubils-midas) were used and, as a result, a module was built into the corresponding software to compute them, making it the only software reported in the literature that can be employed to determine weighted descriptors. Moreover, several modeling studies were performed on eight chemical datasets, which demonstrated that the GOWAWA aggregation operator yields weighted QuBiLS-MIDAS descriptors that contribute to models with greater predictive power than models based on the non-weighted descriptors calculated with the other operators used to date. A non-parametric statistical assessment confirmed that the GOWAWA-based predictions are significantly superior to the others obtained. Therefore, it can be concluded from the results achieved that the GOWAWA operator constitutes a prominent alternative for codifying relevant chemical information of molecules, ultimately useful in improving the modeling ability of several old and recent descriptors whose definition is based on the LOVIs/LOEIs calculation.
Affiliation(s)
- César R García-Jacas
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
- Lisset Cabrera-Leyva
- Grupo de Investigación de Inteligencia Artificial (AIRES), Facultad de Informática, Universidad de Camagüey, Camagüey, Cuba
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Grupo de Investigación Ambiental (GIA), Programas Ambientales, Facultad de Ingenierías, Fundación Universitaria Tecnológico de Comfenalco (COMFENALCO), Cartagena de Indias, Bolívar, Colombia
- José Suárez-Lezcano
- Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
- Fernando Cortés-Guzmán
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
- Luis A García-González
- Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas (UCI), La Habana, Cuba
|
35
|
PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality. Int J Mol Sci 2018; 19:ijms19041009. [PMID: 29597263 PMCID: PMC5979465 DOI: 10.3390/ijms19041009] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/05/2018] [Revised: 03/21/2018] [Accepted: 03/24/2018] [Indexed: 12/24/2022] Open
Abstract
Several methods have been developed to predict the effects of amino acid substitutions on protein stability. Benchmark datasets are essential for method training and testing and have numerous requirements, including that the data be representative of the investigated phenomenon. Available machine learning algorithms for variant stability have all been trained with ProTherm data. We noticed a number of issues with the contents, quality, and relevance of the database. There were errors, but also features that had not been clearly communicated. Consequently, all machine learning variant stability predictors have been trained on biased and incorrect data. We obtained a corrected dataset and trained a random forest-based tool, PON-tstab, applicable to variants in any organism. Our results highlight the importance of benchmark quality, suitability, and appropriateness. Predictions are provided for three categories: stability decreasing, stability increasing, and not affecting stability.
|
36
|
Dong J, Yao ZJ, Zhang L, Luo F, Lin Q, Lu AP, Chen AF, Cao DS. PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions. J Cheminform 2018; 10:16. [PMID: 29556758 PMCID: PMC5861255 DOI: 10.1186/s13321-018-0270-2] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 09/06/2017] [Accepted: 03/12/2018] [Indexed: 11/15/2022] Open
Abstract
Background With the increasing development of biotechnology and informatics technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these data needs to be extracted and transformed into useful knowledge by various data mining methods. Considering the rate at which data are accumulated in the chemistry and biology fields, new tools that process and interpret large and complex interaction data are increasingly important. So far, there are no suitable toolkits that can effectively link the chemical and biological space in terms of molecular representation. To further explore these complex data, an integrated toolkit for various molecular representations is urgently needed, one that can be easily integrated with data mining algorithms to start a full data analysis pipeline.
Results Herein, the Python library PyBioMed is presented, which comprises functionalities for online download of various molecular objects by providing different IDs, the pretreatment of molecular structures, and the computation of various molecular descriptors for chemicals, proteins, DNAs, and their interactions. PyBioMed is a feature-rich and highly customizable Python library used for the characterization of various complex chemical and biological molecules and interaction samples. The current version of PyBioMed can calculate 775 chemical descriptors and 19 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA descriptors from nucleotide sequences, and interaction descriptors from pairwise samples using three different combining strategies. Several examples and five real-life applications are provided to guide users in using PyBioMed as an integral part of data analysis projects. By using PyBioMed, users can conveniently run a full pipeline from getting molecular data, pretreating molecules, and molecular representation to constructing machine learning models.
Conclusion PyBioMed provides various user-friendly and highly customizable APIs to conveniently calculate various features of biological molecules and complex interaction samples, with the aim of building integrated analysis pipelines from data acquisition, data checking, and descriptor calculation to modeling. PyBioMed is freely available at http://projects.scbdd.com/pybiomed.html.
Affiliation(s)
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China; College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Zhi-Jiang Yao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
- Lin Zhang
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Feijun Luo
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Qinlu Lin
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Alex F Chen
- Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China; Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China; Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
|
37
|
Nath A, Kumari P, Chaube R. Prediction of Human Drug Targets and Their Interactions Using Machine Learning Methods: Current and Future Perspectives. Methods Mol Biol 2018; 1762:21-30. [PMID: 29594765 DOI: 10.1007/978-1-4939-7756-7_2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
Identification of drug targets and drug target interactions are important steps in the drug-discovery pipeline. Successful computational prediction methods can reduce the cost and time demanded by the experimental methods. Knowledge of putative drug targets and their interactions can be very useful for drug repurposing. Supervised machine learning methods have been very useful in drug target prediction and in prediction of drug target interactions. Here, we describe the details for developing prediction models using supervised learning techniques for human drug target prediction and their interactions.
Affiliation(s)
- Abhigyan Nath
- Department of Zoology, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India
- Priyanka Kumari
- Department of Biotechnology, Delhi Technological University, Delhi, India
- Radha Chaube
- Department of Zoology, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India.
|
38
|
Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure. Molecules 2017; 22:molecules22101673. [PMID: 28991206 PMCID: PMC6151554 DOI: 10.3390/molecules22101673] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/14/2017] [Revised: 09/24/2017] [Accepted: 09/24/2017] [Indexed: 12/14/2022] Open
Abstract
Protein structure and protein function should be related, yet the nature of this relationship remains unsolved. Mapping the critical residues for protein function onto protein structure features represents an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, we introduce an index that quantifies the protein-function criticality of a residue based on experimental data, together with a strategy that optimizes both the descriptors of protein structure (physicochemical and centrality descriptors) and the machine learning algorithms to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is considered as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues, eight models rendered accurate and non-overlapping classification of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins.
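As an illustration of what a centrality descriptor of a residue can look like, the sketch below computes closeness centrality on a residue contact graph, where nodes are residues and edges are contacts. The contact list format, the function names, and the choice of closeness centrality are assumptions made for this example; the abstract does not state which centrality measures the authors used.

```python
from collections import deque


def build_graph(contacts):
    """Build an undirected adjacency map from (residue, residue) contact pairs."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness_centrality(graph, node):
    """Reachable residues divided by the sum of shortest-path lengths to them."""
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0
```

For a hypothetical four-residue chain with contacts `[(1, 2), (2, 3), (3, 4)]`, residue 2 scores 0.75 and residue 1 scores 0.5, reflecting that interior residues sit closer to the rest of the network. Per-residue values like these could then serve as input features for a binary critical/not-critical classifier.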
39
Ruiz-Blanco YB, Agüero-Chapin G, García-Hernández E, Álvarez O, Antunes A, Green J. Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinformatics 2017; 18:349. [PMID: 28732462 PMCID: PMC5521120 DOI: 10.1186/s12859-017-1758-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 07/13/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Yasser B Ruiz-Blanco
- Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, 54830, Santa Clara, Cuba; Theoretical Chemistry, Max Planck Institute für Kohlenforschung, 45470, Mulheim an der Ruhr, Germany
- Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal; Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Enrique García-Hernández
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), 04360, D.F, México, Mexico
- Orlando Álvarez
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
40
Pan W, Chen DS, Lu YJ, Xu HW, Hao WT, Zhang YW, Qin SP, Zheng KY, Tang RX. Genetic diversity and phylogenetic analysis of EG95 sequences of Echinococcus granulosus: Implications for EG95 vaccine application. Asian Pac J Trop Med 2017. [PMID: 28647192 DOI: 10.1016/j.apjtm.2017.05.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/27/2022]
Abstract
OBJECTIVE: To analyse the genetic variability of EG95 sequences and provide guidance for EG95 vaccine application against Echinococcus granulosus (E. granulosus). METHODS: We analysed EG95 polymorphism by collecting a total of 97 different E. granulosus isolates from 12 different host species that originated from 10 different countries. Multiple sequence alignments and homology analysis were performed with Lasergene 1 (DNASTAR Inc., Madison, WI), and the phylogenetic analysis was performed using MEGA5.1 (CEMI, Tempe, AZ, USA). In addition, linear and conformational epitopes were analysed, including secondary structure, NXT/S glycosylation, fibronectin type III (FnIII) domain and glycosylphosphatidylinositol anchor signal (GPI-anchor). The secondary structure was predicted by the PSIPRED method. RESULTS: Our results indicated that most isolates shared 72.6-100% identity in EG95 gene sequence with the published standard EG95 sequence, X90928. However, the EG95 gene is indeed polymorphic across isolates. Phylogenetic analysis showed that the isolates could be divided into three subgroups. Subgroup 1 contained 87 isolates, while Subgroup 2 and Subgroup 3 consisted of 3 and 7 isolates, respectively. Four sequences cloned from oncosphere shared a high identity with the parental sequence of the current vaccine, X90928, and they belonged to Subgroup 1. However, in comparison to X90928, several amino acid mutations occurred in most isolates besides oncosphere, which potentially altered the immunodominant linear epitopes, glycosylation sites and secondary structures in EG95 genes. All these variations might change their previous antigenicity and thereby affect the efficacy of the current EG95 vaccine. CONCLUSIONS: This study reveals the genetic variability of EG95 sequences in different E. granulosus isolates and proposes that more vaccination trials are needed to test the effectiveness of the current EG95 vaccine against distinct isolates in different countries.
Affiliation(s)
- Wei Pan
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- De-Sheng Chen
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China; Department of Clinical Medicine, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Yun-Juan Lu
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China; Department of Clinical Medicine, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Hui-Wen Xu
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China; Department of Clinical Medicine, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Wen-Ting Hao
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Ya-Wen Zhang
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China; Department of Clinical Medicine, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Su-Ping Qin
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Kui-Yang Zheng
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
- Ren-Xian Tang
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Laboratory of Infection and Immunity, Xuzhou Medical University, Xuzhou, Jiangsu Province, 221004, PR China
41
Simeon S, Li H, Win TS, Malik AA, Kandhro AH, Piacham T, Shoombuatong W, Nuchnoi P, Wikberg JES, Gleeson MP, Nantasenamat C. PepBio: predicting the bioactivity of host defense peptides. RSC Adv 2017. [DOI: 10.1039/c7ra01388d] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
A large-scale QSAR study of host defense peptides sheds light on the origin of their bioactivities (antibacterial, anticancer, antiviral and antifungal).
42
Novel "extended sequons" of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features. Amino Acids 2016; 49:317-325. [PMID: 27896447 DOI: 10.1007/s00726-016-2362-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/10/2016] [Accepted: 11/05/2016] [Indexed: 10/20/2022]
Abstract
N-Glycosylation is a common post-translational modification that plays an important role in the proper folding and function of many proteins. This modification is largely dependent on the presence of a sequence motif called a "sequon" defined as Asn-Xxx-Ser/Thr. However, evidence has shown that the presence of such a "sequon" is insufficient to determine the occurrence of N-glycosylation with high precision. This study aims to elucidate patterns that can more accurately predict N-glycosylation sites in human proteins. The novel motifs are evaluated using benchmarking data from 188 organisms. Performance is largely sustained compared to the human data, which validates the robustness of the novel extracted "extended sequons". We, therefore, introduce new knowledge about sequence-related factors that control N-glycosylation.
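As a rough illustration of the baseline this abstract starts from, the classical sequon Asn-Xxx-Ser/Thr can be located with a simple pattern scan. The sketch below is not the ProtDCal-based method of the study; the function name `find_sequons` and the example sequence are hypothetical, and Xxx is treated as any residue, as the abstract states it.

```python
import re

# Classical sequon as stated in the abstract: Asn-Xxx-Ser/Thr.
# Assumption: Xxx is any one-letter residue code. A lookahead is used
# so that overlapping candidate sites are all reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")

def find_sequons(sequence):
    """Return (1-based position of Asn, matched tripeptide) pairs."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]

# Hypothetical sequence, for illustration only.
print(find_sequons("MKNGSANNTSQ"))  # [(3, 'NGS'), (7, 'NNT'), (8, 'NTS')]
```

A match only marks a candidate site; as the abstract notes, the presence of the sequon alone does not determine N-glycosylation with high precision.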
43
Kizhedath A, Wilkinson S, Glassey J. Applicability of predictive toxicology methods for monoclonal antibody therapeutics: status Quo and scope. Arch Toxicol 2016; 91:1595-1612. [PMID: 27766364 PMCID: PMC5364268 DOI: 10.1007/s00204-016-1876-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/19/2016] [Accepted: 10/12/2016] [Indexed: 12/31/2022]
Abstract
Biopharmaceuticals, monoclonal antibody (mAb)-based therapeutics in particular, have positively impacted millions of lives. MAbs and related therapeutics are highly desirable from a biopharmaceutical perspective as they are highly target specific and well tolerated within the human system. Nevertheless, several mAbs have been discontinued or withdrawn owing to their inability to demonstrate efficacy, to adverse effects, or both. Approved monoclonal antibodies and derived therapeutics have been associated with adverse effects such as immunogenicity, cytokine release syndrome, progressive multifocal leukoencephalopathy, intravascular haemolysis, cardiac arrhythmias, abnormal liver function, gastrointestinal perforation, bronchospasm, intraocular inflammation, urticaria, nephritis, neuropathy, birth defects, fever and cough, to name a few. The advances made in this field are also impeded by a lack of progress in bioprocess development strategies as well as increasing costs owing to attrition, wherein the lack of efficacy and safety accounts for nearly 60% of all factors contributing to attrition. This reiterates the need for smarter preclinical development using quality by design-based approaches encompassing carefully designed predictive models during early stages of drug development. Different in vitro and in silico methods are extensively used for predicting biological activity as well as toxicity during small molecule drug development; however, their full potential has not been utilized for biological drug development. The scope of in vitro and in silico tools in early developmental stages of monoclonal antibody-based therapeutics production and how it contributes to lower attrition rates leading to faster development of potential drug candidates has been evaluated. The applicability of computational toxicology approaches in this context as well as the pitfalls and promises of extending such techniques to biopharmaceutical development has been highlighted.
Affiliation(s)
- Arathi Kizhedath
- Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK; Medical Toxicology Centre, Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, NE2 4AA, UK
- Simon Wilkinson
- Medical Toxicology Centre, Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, NE2 4AA, UK
- Jarka Glassey
- Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
44
Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics 2016; 17:331. [PMID: 27586051 PMCID: PMC5009551 DOI: 10.1186/s12859-016-1228-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 12/15/2015] [Accepted: 08/26/2016] [Indexed: 02/07/2023]
Abstract
Background: The Random Forest (RF) algorithm for supervised machine learning is an ensemble learning method widely used in science and many other fields. Its popularity has been increasing, but relatively few studies address the parameter selection process: a critical step in model fitting. Due to numerous assertions regarding the performance reliability of the default parameters, many RF models are fit using these values. However, there has not yet been a thorough examination of the parameter-sensitivity of RFs in computational genomic studies. We address this gap here. Results: We examined the effects of parameter selection on classification performance using the RF machine learning algorithm on two biological datasets with distinct p/n ratios: sequencing summary statistics (low p/n) and microarray-derived data (high p/n). Here, p refers to the number of variables and n to the number of samples. Our findings demonstrate that parameterization is highly correlated with prediction accuracy and variable importance measures (VIMs). Further, we demonstrate that different parameters are critical in tuning different datasets, and that parameter-optimization significantly improves upon the default parameters. Conclusions: Parameter performance demonstrated wide variability on both low and high p/n data. Therefore, there is significant benefit to be gained by tuning RF models away from their default parameter settings.
Affiliation(s)
- Barbara F F Huang
- Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Canada
- Paul C Boutros
- Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada; MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, M5G 0A3, Canada
45
Kleandrova VV, Ruso JM, Speck-Planche A, Dias Soeiro Cordeiro MN. Enabling the Discovery and Virtual Screening of Potent and Safe Antimicrobial Peptides. Simultaneous Prediction of Antibacterial Activity and Cytotoxicity. ACS Comb Sci 2016; 18:490-8. [PMID: 27280735 DOI: 10.1021/acscombsci.6b00063] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]
Abstract
Antimicrobial peptides (AMPs) represent promising alternatives to fight against bacterial pathogens. However, cellular toxicity remains one of the main concerns in the early development of peptide-based drugs. This work introduces the first multitasking (mtk) computational model focused on performing simultaneous predictions of antibacterial activities and cytotoxicities of peptides. The model was created from a data set containing 3592 cases, and it displayed accuracy higher than 96% for classifying/predicting peptides in both training and prediction (test) sets. The technique known as alanine scanning was computationally applied to illustrate the calculation of the quantitative contributions of the amino acids (in their respective positions of the sequence) to the biological effects of a defined peptide. A small library formed by 10 peptides was generated, where peptides were designed by considering the interpretations of the different descriptors in the mtk-computational model. All the peptides were predicted to exhibit high antibacterial activities against multiple bacterial strains, and low cytotoxicity against various cell types. The present mtk-computational model can be considered a very useful tool to support high throughput research for the discovery of potent and safe AMPs.
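The general idea of a computational alanine scan mentioned in this abstract can be sketched as follows: each residue is replaced by alanine in turn, and the change in a model's predicted score is taken as that position's contribution. This is a minimal sketch under stated assumptions, not the mtk model of the study; `alanine_scan`, `toy_score`, and the example peptide are hypothetical, and `toy_score` merely counts cationic residues as a stand-in for a trained predictor.

```python
def alanine_scan(peptide, score):
    """Return (1-based position, original residue, score drop) for each
    non-alanine position when that residue is replaced by alanine."""
    baseline = score(peptide)
    contributions = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # substituting Ala with Ala is uninformative
        variant = peptide[:i] + "A" + peptide[i + 1:]
        contributions.append((i + 1, residue, baseline - score(variant)))
    return contributions

def toy_score(peptide):
    # Hypothetical stand-in for a trained model: number of Lys/Arg residues.
    return sum(peptide.count(r) for r in "KR")

# Hypothetical peptide, for illustration only.
print(alanine_scan("GKAR", toy_score))  # [(1, 'G', 0), (2, 'K', 1), (4, 'R', 1)]
```

With a real predictor in place of `toy_score`, a large positive score drop would flag a position whose side chain the model considers important for the predicted activity.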
Affiliation(s)
- Valeria V. Kleandrova
- Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, Moscow, Russia
- Juan M. Ruso
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- Alejandro Speck-Planche
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
46
Speck-Planche A, Kleandrova VV, Ruso JM, Cordeiro MNDS. First Multitarget Chemo-Bioinformatic Model To Enable the Discovery of Antibacterial Peptides against Multiple Gram-Positive Pathogens. J Chem Inf Model 2016; 56:588-98. [PMID: 26960000 DOI: 10.1021/acs.jcim.5b00630] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
Antimicrobial peptides (AMPs) have emerged as promising therapeutic alternatives to fight against the diverse infections caused by different pathogenic microorganisms. In this context, theoretical approaches in bioinformatics have paved the way toward the creation of several in silico models capable of predicting antimicrobial activities of peptides. All current models have several significant handicaps, which prevent the efficient search for highly active AMPs. Here, we introduce the first multitarget (mt) chemo-bioinformatic model devoted to performing alignment-free prediction of antibacterial activity of peptides against multiple Gram-positive bacterial strains. The model was constructed from a data set containing 2488 cases of AMP sequences assayed against at least 1 out of 50 Gram-positive bacterial strains. This mt-chemo-bioinformatic model displayed percentages of correct classification higher than 90.00% in both training and prediction (test) sets. For the first time, two computational approaches derived from basic concepts in genetics and molecular biology were applied, allowing the calculation of the relative contribution of any amino acid (in a defined position) to the antibacterial activity of an AMP, depending on the bacterial strain used in the biological assay. The present mt-chemo-bioinformatic model constitutes a powerful tool to enable the discovery of potent and versatile AMPs.
Affiliation(s)
- Alejandro Speck-Planche
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain; REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Valeria V Kleandrova
- Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, 125080 Moscow, Russia
- Juan M Ruso
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- M N D S Cordeiro
- REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
47
Global informatics and physical property selection in protein sequences. Proc Natl Acad Sci U S A 2016; 113:1808-10. [PMID: 26831093 DOI: 10.1073/pnas.1525745113] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
The degree of informatic independence between the physical properties of amino acids as encoded in actual protein sequences is calculated. It is shown that no physical property can be identified that carries significantly less information than others and that the information overlap between different properties and different length scales along the sequence is essentially zero. These observations suggest that bioinformatic models based on arbitrarily selected sets of physical properties are inherently deficient.
48
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures. Biomed Res Int 2015; 2015:563674. [PMID: 26605332 PMCID: PMC4641208 DOI: 10.1155/2015/563674] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/24/2015] [Revised: 10/04/2015] [Accepted: 10/05/2015] [Indexed: 11/18/2022]
Abstract
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
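The efficiency figure quoted in this abstract appears to follow from the usual definition of parallel efficiency as speedup divided by the number of cores. The short sketch below checks that arithmetic; the function name `parallel_efficiency` is hypothetical, and the assumption is that all 48 SCC cores were counted.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency: achieved speedup as a fraction of the ideal
    (linear) speedup on the given number of cores."""
    return speedup / cores

# Values from the abstract: speedup factor of 42 on the 48-core SCC.
efficiency = parallel_efficiency(42, 48)
print(efficiency)            # 0.875
print(round(efficiency, 1))  # 0.9, matching the reported figure
```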