1
Sampaio-Dias IE, Rodríguez-Borges JE, Yáñez-Pérez V, Arrasate S, Llorente J, Brea JM, Bediaga H, Viña D, Loza MI, Caamaño O, García-Mera X, González-Díaz H. Synthesis, Pharmacological, and Biological Evaluation of 2-Furoyl-Based MIF-1 Peptidomimetics and the Development of a General-Purpose Model for Allosteric Modulators (ALLOPTML). ACS Chem Neurosci 2021; 12:203-215. [PMID: 33347281 DOI: 10.1021/acschemneuro.0c00687] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
Abstract
This work describes the synthesis and pharmacological evaluation of 2-furoyl-based Melanostatin (MIF-1) peptidomimetics as dopamine D2 modulating agents. Eight novel peptidomimetics were tested for their ability to enhance the maximal effect of tritiated N-propylapomorphine ([3H]-NPA) at D2 receptors (D2R). In this series, 2-furoyl-l-leucylglycinamide (6a) produced a statistically significant increase in the maximal [3H]-NPA response at 10 pM (11 ± 1%), comparable to the effect of MIF-1 (18 ± 9%) at the same concentration. This result supports previous evidence that replacement of the proline residue by heteroaromatic scaffolds is tolerated at the allosteric binding site of MIF-1. Biological assays performed for peptidomimetic 6a using cortex neurons from 19-day-old Wistar-Kyoto rat embryos suggest that 6a displays no neurotoxicity up to 100 μM. Overall, the pharmacological and toxicological profile and the structural simplicity of 6a make this peptidomimetic a potential lead compound for further development and optimization, paving the way for novel D2R modulating agents suitable for the treatment of CNS-related diseases. Additionally, the pharmacological and biological data reported herein, along with >20 000 outcomes of preclinical assays, were used to seek a general model, abbreviated as ALLOPTML, that predicts the allosteric modulatory potential of molecular candidates for a myriad of target receptors, organisms, cell lines, and biological activity parameters based on perturbation theory (PT) ideas and machine learning (ML) techniques. ALLOPTML shows high specificity (Sp = 89.2/89.4%), sensitivity (Sn = 71.3/72.2%), and accuracy (Ac = 86.1/86.4%) in the training/validation series, respectively. To the best of our knowledge, ALLOPTML is the first general-purpose chemoinformatic tool using a PTML-based model for the multioutput and multicondition prediction of allosteric compounds, which is expected to save both time and resources during the early drug discovery of allosteric modulators.
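The specificity, sensitivity, and accuracy figures quoted in this abstract follow the usual confusion-matrix definitions. A minimal sketch of that arithmetic is given here; the function name and the counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages.

    tp/fn: active cases predicted correctly / missed.
    tn/fp: inactive cases predicted correctly / wrongly flagged as active.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity (Sn)
    sp = 100.0 * tn / (tn + fp)                    # specificity (Sp)
    ac = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # overall accuracy (Ac)
    return sn, sp, ac


# Hypothetical counts, used only to show the calculation.
sn, sp, ac = classification_metrics(tp=72, fn=28, tn=890, fp=110)
print(f"Sn = {sn:.1f}%  Sp = {sp:.1f}%  Ac = {ac:.1f}%")
```

Because accuracy is a count-weighted mix of Sn and Sp, it always lies between the two, closer to the value for the larger class.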
Affiliation(s)
- Ivo E. Sampaio-Dias
  - LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- José E. Rodríguez-Borges
  - LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Víctor Yáñez-Pérez
  - Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Sonia Arrasate
  - Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Javier Llorente
  - Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
  - Dept. of Pharmacology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- José M. Brea
  - Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Harbil Bediaga
  - Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
  - Dept. of Physical Chemistry, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Dolores Viña
  - Dept. of Pharmacology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
  - Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- María Isabel Loza
  - Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Olga Caamaño
  - Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Xerardo García-Mera
  - Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Humberto González-Díaz
  - Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
  - Basque Center for Biophysics (CSIC UPV/EHU), University of Basque Country (UPV-EHU), 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
2
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and lesser-known ones, we report our own experience with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
3
Vásquez-Domínguez E, Armijos-Jaramillo VD, Tejera E, González-Díaz H. Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds. Mol Pharm 2019; 16:4200-4212. [PMID: 31426639 DOI: 10.1021/acs.molpharmaceut.9b00538] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/22/2022]
Abstract
Retroviral infections, such as HIV, remain diseases with no cure. Defining the target proteins of new antiretroviral compounds is a major goal in medicine and pharmaceutical chemistry. ChEMBL holds a complex data set with Big Data features, which is hard to organize. The large number of characteristics described makes the information difficult to analyze when predicting new drug candidates for retroviral infections. For this reason, we propose a new predictive model combining perturbation theory (PT) and machine learning (ML) modeling to create a tool that can take advantage of all the available information. The PTML model proposed in this work for the ChEMBL data set of preclinical experimental assays for antiretroviral compounds consists of a linear equation with four variables. The PT operators used are based on multicondition moving averages, which combine different features and make the data easier to manage. More than 140 000 preclinical assays for 56 105 compounds with different characteristics or experimental conditions have been carried out and can be found in the ChEMBL database, covering combinations of 359 biological activity parameters (c0), 55 protein accessions (c1), 83 cell lines (c2), 64 organisms of assay (c3), and 773 subtypes or strains. We have included 150 148 preclinical experimental assays for HIV virus, 1188 for HTLV virus, 84 for simian immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous sarcoma virus, 1581 for MMTV, etc. We also included 5277 assays for hepatitis B virus. The developed PTML model reached considerable values of sensitivity (73.05% for training and 73.10% for validation), specificity (86.61% for training and 87.17% for validation), and accuracy (75.84% for training and 75.98% for validation). We also compared alternative PTML models with different PT operators such as covariance, moments, and exponential terms. Finally, we compared our PTML model with ML models from the literature and with nonlinear artificial neural network (ANN) models. We conclude that this PTML model is the first to consider multiple characteristics of preclinical experimental antiretroviral assays in combination, yielding a simple, useful, and adaptable instrument that could reduce time and costs in antiretroviral drug research.
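The abstract does not give the formula for its multicondition moving-average PT operators. A minimal sketch, assuming the commonly described form in which each operator is the deviation of a compound's descriptor from the mean of that descriptor over all assays sharing one condition value, could look like this. The function name, the `logp` descriptor, and the toy assay records are hypothetical.

```python
from collections import defaultdict


def moving_average_operators(assays, descriptor="logp", conditions=("c0", "c1")):
    """For each assay, return descriptor - mean(descriptor | condition value).

    Assumed form of the operator; the abstract only names the technique.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for assay in assays:
        for cond in conditions:
            key = (cond, assay[cond])          # e.g. ("c0", "IC50")
            sums[key] += assay[descriptor]
            counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}

    # One deviation per condition, for every assay record.
    return [
        {cond: assay[descriptor] - means[(cond, assay[cond])] for cond in conditions}
        for assay in assays
    ]


# Toy records: c0 = activity parameter, c1 = protein accession (made-up labels).
assays = [
    {"logp": 1.0, "c0": "IC50", "c1": "P1"},
    {"logp": 3.0, "c0": "IC50", "c1": "P2"},
    {"logp": 2.0, "c0": "EC50", "c1": "P1"},
]
deviations = moving_average_operators(assays)
print(deviations[0])
```

The resulting deviations would then serve as input variables for a linear classifier, which is how a model with only a handful of variables can still reflect many assay conditions.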
Affiliation(s)
- Emilia Vásquez-Domínguez
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
  - Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Vinicio Danilo Armijos-Jaramillo
  - Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
  - Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Eduardo Tejera
  - Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
  - Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
4
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens, with high selectivity. This is especially crucial at a time when society faces the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, so a variety of scientific studies have dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, so an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge that has not been entirely solved so far. For this reason, the development of novel amino acid encodings has become established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence- and structure-based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are also required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
5
Ferreira da Costa J, Silva D, Caamaño O, Brea JM, Loza MI, Munteanu CR, Pazos A, García-Mera X, González-Díaz H. Perturbation Theory/Machine Learning Model of ChEMBL Data for Dopamine Targets: Docking, Synthesis, and Assay of New l-Prolyl-l-leucyl-glycinamide Peptidomimetics. ACS Chem Neurosci 2018; 9:2572-2587. [PMID: 29791132 DOI: 10.1021/acschemneuro.8b00083] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/14/2022]
Abstract
Predicting drug-protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for the large and complex big data sets of preclinical assays reported in public databases. These data sets include multiple assay conditions, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known reference case. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50 000 cases with an accuracy of 70-91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model, but at the cost of a notable increase in model complexity. We illustrated the practical use of the new model with a proof-of-concept theoretical-experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the software AutoDock Vina. The work ends with a PTML predictive study of the outcomes of the new compounds in a large number of assays. Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method uses a simple linear model to predict multiple pharmacological parameters (IC50, EC50, Ki, etc.) for compounds in assays involving different cell lines, organisms of the protein target, or organisms of assay for proteins in the dopamine pathway.
Affiliation(s)
- Joana Ferreira da Costa
  - Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- David Silva
  - Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Olga Caamaño
  - Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- José M. Brea
  - CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Maria Isabel Loza
  - CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
  - Department of Pharmacology, Pharmacy and Pharmaceutical Technology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Cristian R. Munteanu
  - Instituto de Investigacion Biomedica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), A Coruña, 15006, Spain
- Alejandro Pazos
  - Instituto de Investigacion Biomedica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), A Coruña, 15006, Spain
  - Computer Science Department, Faculty of Computer Science, University of A Coruna, 15071 A Coruña, Spain
- Xerardo García-Mera
  - Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
6
Bediaga H, Arrasate S, González-Díaz H. PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer. ACS Combinatorial Science 2018; 20:621-632. [PMID: 30240186 DOI: 10.1021/acscombsci.8b00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
Determining the target proteins of new anticancer compounds is a very important task in Medicinal Chemistry. To this end, chemists carry out preclinical assays with a high number of combinations of experimental conditions (cj). In fact, the ChEMBL database contains outcomes of 65 534 different anticancer activity preclinical assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of cj formed from >70 different biological activity parameters (c0), >300 different drug targets (c1), >230 cell lines (c2), and 5 organisms of assay (c3) or organisms of the target (c4). It includes a total of 45 833 assays in leukemia, 6227 assays in breast cancer, 2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features. Such data are hard for researchers to rationalize in order to extract useful relationships and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. The model presented an area under the receiver operating characteristic curve (AUROC) of 0.872, specificity Sp = 90.2%, sensitivity Sn = 70.6%, and overall accuracy Ac = 87.7% in the training series. The model also has Sp = 90.1%, Sn = 71.4%, and Ac = 87.8% in the external validation series. The model uses PT operators based on multicondition moving averages to capture all the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, obtaining similar results. This confirms the hypothesis of a linear relationship between the PT operators and the classification as anticancer compounds in different combinations of assay conditions. Last, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for the prediction of the targets of anticancer compounds, taking into consideration multiple combinations of experimental conditions in preclinical assays.
Affiliation(s)
- Harbil Bediaga
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- Sonia Arrasate
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain
7
Fernandez M, Abreu JI, Shi H, Barnard AS. Machine Learning Prediction of the Energy Gap of Graphene Nanoflakes Using Topological Autocorrelation Vectors. ACS Combinatorial Science 2016; 18:661-664. [PMID: 27598830 DOI: 10.1021/acscombsci.6b00094] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/28/2022]
Abstract
The possibility of band gap engineering in graphene opens countless new opportunities for application in nanoelectronics. In this work, the energy gaps of 622 computationally optimized graphene nanoflakes were mapped to topological autocorrelation vectors using machine learning techniques. Machine learning modeling revealed that the most relevant correlations appear at topological distances in the range of 1 to 42 with prediction accuracy higher than 80%. The data-driven model can statistically discriminate between graphene nanoflakes with different energy gaps on the basis of their molecular topology.
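A topological autocorrelation vector, as used here for graphene nanoflakes, can be illustrated with a short sketch. It assumes the standard definition in which entry d sums the products of an atomic property over all atom pairs separated by d bonds along the shortest path. The function names, the four-atom chain, and the property values are hypothetical and are not the descriptors used in the paper.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def autocorrelation_vector(adjacency, prop, max_distance):
    """Entry d = sum of prop[i] * prop[j] over unordered atom pairs at distance d."""
    vector = [0.0] * (max_distance + 1)
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if j >= i and d <= max_distance:   # count each pair once; d = 0 is i == j
                vector[d] += prop[i] * prop[j]
    return vector


# Toy molecular graph: a linear chain of four atoms, 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
atom_property = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
vector = autocorrelation_vector(chain, atom_property, max_distance=3)
print(vector)
```

The vector has a fixed length regardless of molecule size, which is what makes it usable as input to a machine learning model.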
Affiliation(s)
- Michael Fernandez
  - Data61, CSIRO, 343 Royal Parade, Parkville, Victoria 3052, Australia
- Jose I. Abreu
  - Departamento de Ingeniería Informática, Facultad de Ingeniería, Universidad Católica de la Santísima Concepción, Alonso de Ribera 2850, Concepción, Chile
- Hongqing Shi
  - Data61, CSIRO, 343 Royal Parade, Parkville, Victoria 3052, Australia
- Amanda S. Barnard
  - Data61, CSIRO, 343 Royal Parade, Parkville, Victoria 3052, Australia
8
Chrysostomou C, Seker H. Novel protein weight matrix generated from amino acid indices. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2015:8181-4. [PMID: 26738193 DOI: 10.1109/embc.2015.7320293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
In recent years, numerous protein weight matrices have been developed that include physical characteristics of proteins, such as local sequence-structure information, alpha-helix information, secondary structure information, and solvent accessibility states. These protein weight matrices have been shown to generally improve protein sequence alignments over classical protein weight matrices, such as Point Accepted Mutation (PAM), Blocks of Amino Acid Substitution (BLOSUM), and GONNET matrices, for which important limitations have been observed in recent works. In this paper, a novel protein weight matrix is constructed and presented. This protein weight matrix is based not on the mutation rate, like the PAM or BLOSUM matrices, but on the physicochemical properties of each amino acid. In the literature, over 500 amino acid indices exist, each one representing a unique biological protein feature. For this study, 25 amino acid indices were selected. These amino acid indices represent general and widely accepted features of the amino acids. The proposed protein weight matrix offers the following advantages over the classical protein weight matrices. It is not biased towards specific groups of protein sequences, as the values are calculated from the amino acid indices and not from the protein sequences. Additionally, the same matrix can be used regardless of the homology of the protein sequences to be aligned or the mutation rate present. The matrix can also be correlated with the physical characteristics of the amino acids from which it is derived. Different similarity matrices can be generated when different physical characteristics of amino acids are considered.
9
Fernandez M, Ahmad S, Abreu JI, Sarai A. Large-scale recognition of high-affinity protease–inhibitor complexes using topological autocorrelation and support vector machines. Molecular Simulation 2015. [DOI: 10.1080/08927022.2015.1059937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023]
10
CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences. Adv Bioinformatics 2015; 2015:909765. [PMID: 25632276 PMCID: PMC4302972 DOI: 10.1155/2015/909765] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/28/2014] [Revised: 11/27/2014] [Accepted: 12/04/2014] [Indexed: 11/23/2022]
Abstract
Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. Therefore, CISAPS was developed to provide results in three forms: the absolute, real, and imaginary spectra. Biologically relevant features, as shown in the case study on influenza A subtypes presented here, can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can exhibit similarities or differences according to the features extracted by the CISAPS web server. These associations are likely related to the protein feature that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
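The three spectrum forms and the zero-padding mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes the usual informational spectrum recipe (encode each residue with one amino acid index value, then take the discrete Fourier transform); it is not the CISAPS implementation. The function name and the index values are hypothetical, and the values are not a real amino acid index.

```python
import cmath


def informational_spectrum(sequence, index, pad_to=None):
    """Encode residues numerically, zero-pad, and return three DFT views."""
    signal = [index[residue] for residue in sequence]
    n = len(signal) if pad_to is None else pad_to
    if n < len(signal):
        raise ValueError("pad_to must not be shorter than the sequence")
    signal = signal + [0.0] * (n - len(signal))   # zero-padding to length n

    # Direct O(n^2) discrete Fourier transform, adequate for short sequences.
    spectrum = [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    return {
        "absolute": [abs(c) for c in spectrum],
        "real": [c.real for c in spectrum],
        "imaginary": [c.imag for c in spectrum],
    }


# Made-up index values for three residues, for illustration only.
toy_index = {"A": 1.0, "L": 2.0, "K": 3.0}
spectrum = informational_spectrum("ALKA", toy_index, pad_to=8)
print(spectrum["absolute"][:4])
```

Zero-padding changes the frequency sampling of the spectrum without adding information, which is one reason padding choices can affect comparisons between sequences of different lengths.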
11
De novo inference of protein function from coarse-grained dynamics. Proteins 2014; 82:2443-54. [DOI: 10.1002/prot.24609] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/25/2014] [Revised: 04/29/2014] [Accepted: 05/13/2014] [Indexed: 01/04/2023]
12
Garriga M, Retamales JB, Romero-Bravo S, Caligari PDS, Lobos GA. Chlorophyll, anthocyanin, and gas exchange changes assessed by spectroradiometry in Fragaria chiloensis under salt stress. Journal of Integrative Plant Biology 2014; 56:505-15. [PMID: 24618024 DOI: 10.1111/jipb.12193] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 07/01/2014] [Accepted: 03/07/2014] [Indexed: 05/03/2023]
Abstract
Chlorophyll and anthocyanin contents provide a valuable indicator of the status of a plant's physiology, but to be more widely utilized they need to be assessed easily and non-destructively. This is particularly evident when assessing and exploiting germplasm for plant-breeding programs. We report, for the first time, experiments with Fragaria chiloensis (L.) Duch. and the estimation of the effects of salinity stress (0, 30, and 60 mmol NaCl/L) on the content of these pigments and on gas exchange. It is shown that both pigments (which, interestingly, are themselves highly correlated) give a good indication of stress response. Both pigments can be accurately predicted using spectral reflectance indices (SRI); however, the accuracy of the predictions was slightly improved using multilinear regression analysis models and genetic algorithm analysis. Specifically for chlorophyll content, and unlike in other species, published SRI gave better indications of stress response than the Normalized Difference Vegetation Index. The effect of salt on gas exchange is only evident at the highest concentration, and some SRI gave better prediction performance than the well-known Photochemical Reflectance Index. This information will therefore be useful for identifying genotypes tolerant to salt stress for incorporation in breeding programs.
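The two named indices in this abstract are normalized differences of reflectance at two wavelengths. A minimal sketch of the calculation is given here, assuming the common narrowband choices of 800 nm and 680 nm for the Normalized Difference Vegetation Index and the standard 531 nm and 570 nm bands for the Photochemical Reflectance Index. The reflectance values are hypothetical and are not data from the study.

```python
def normalized_difference(r_a, r_b):
    """Generic two-band index: (R_a - R_b) / (R_a + R_b)."""
    return (r_a - r_b) / (r_a + r_b)


# Hypothetical leaf reflectance (fraction of incident light) keyed by wavelength in nm.
reflectance = {531: 0.10, 570: 0.12, 680: 0.05, 800: 0.45}

# NDVI contrasts near-infrared with red reflectance (band choice is an assumption).
ndvi = normalized_difference(reflectance[800], reflectance[680])

# PRI contrasts reflectance at 531 nm with the 570 nm reference band.
pri = normalized_difference(reflectance[531], reflectance[570])

print(f"NDVI = {ndvi:.3f}, PRI = {pri:.3f}")
```

Other spectral reflectance indices follow the same pattern with different band pairs, which is why many candidates can be screened against measured pigment content.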
Affiliation(s)
- Miguel Garriga
- Faculty of Agricultural Sciences, Plant Breeding and Phenomic Center, Universidad de Talca, Talca, Chile
|
13
|
Chrysostomou C, Seker H. Construction of protein dendrograms based on amino acid indices and Discrete Fourier Transform. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2014; 2014:816-819. [PMID: 25570084 DOI: 10.1109/embc.2014.6943716] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In the literature, existing methods use pairwise percent identity to measure the similarity between two protein sequences in order to create a dendrogram. As this is a parametric method of measuring the similarities between proteins, and different parameters may yield different results, it does not guarantee that the globally optimal similarity values will be found. As protein dendrogram construction is used in other areas, such as multiple protein sequence alignment, it is very important that the most closely related protein sequences be identified and aligned first. Furthermore, when the pairwise percent identity of the protein sequences is used to construct the dendrograms, the physical characteristics of protein sequences and amino acids are not considered. In this paper, a new method is proposed for constructing protein sequence dendrograms. In this method, the Discrete Fourier Transform is used to construct the distance matrix, in combination with multiple amino acid indices that encode protein sequences as numerical sequences. To show the applicability and robustness of the proposed method, a case study is presented using nine Cluster of Differentiation 4 protein sequences extracted from the UniProt online database.
|
14
|
Chrysostomou C, Seker H. Construction of protein distance matrix based on amino acid indices and Discrete Fourier Transform. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2013; 2013:4066-4069. [PMID: 24110625 DOI: 10.1109/embc.2013.6610438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Protein distance matrices are widely used in various protein sequence analyses and are mainly obtained from pairwise sequence alignment scores or protein sequence homology, which fail to take into consideration the individual physical characteristics of protein sequences and amino acids, or a combination of these features. In this paper, a new method is therefore proposed for constructing a protein distance matrix based on natural amino acid indices in combination with the Discrete Fourier Transform (DFT). With the proposed method, protein distance matrices can be generated using any given set of amino acid indices, each of which represents a unique biological feature of protein sequences. In this study, the results are based on the combination of 25 widely accepted amino acid indices, which produced the best results according to the biological relationships between proteins. As a case study, 26 Cluster of Differentiation 4 (CD4) protein sequences were used to construct a distance matrix based on the proposed method. The results show that the pairwise relationships between CD4 protein sequences remain the same in comparison with their pairwise percent identity. For another group of protein sequences, the pairwise relationships between CD4 protein sequences changed dramatically with the proposed method in comparison to the pairwise percent identity. The proposed distance matrix has been shown to have a positive impact on these case studies and is therefore expected to be useful in several fields, such as multiple protein sequence alignment and phylogenetic analysis, where an accurate distance matrix based on natural generalized protein properties plays an important role.
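The general idea of an index-plus-DFT distance matrix can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses a single amino acid index (the Kyte-Doolittle hydropathy scale) instead of the 25 indices combined in the paper, and the zero-padding, the use of the power spectrum, the Euclidean distance, and the toy peptides are all assumptions made for the example.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale, used here as one example amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map a protein sequence to a numerical signal via the index."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def power_spectrum(signal, n_points):
    """Naive O(n^2) DFT power spectrum of a zero-padded signal."""
    padded = list(signal) + [0.0] * (n_points - len(signal))
    spectrum = []
    for k in range(n_points):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n, x in enumerate(padded)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def distance_matrix(sequences):
    """Euclidean distances between DFT power spectra (assumed metric)."""
    n_points = max(len(s) for s in sequences)
    spectra = [power_spectrum(encode(s), n_points) for s in sequences]
    return [[math.dist(a, b) for b in spectra] for a in spectra]


# Hypothetical toy peptides, not CD4 sequences.
toy = ["MKTAYIAK", "MKTAYIAR", "GGSGGSGG"]
for row in distance_matrix(toy):
    print([round(value, 2) for value in row])
```

With several indices, one plausible extension is to compute one matrix per index and combine them, but how the paper combines its 25 indices is not stated in the abstract.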
|
15
|
González-Díaz H, Riera-Fernández P. New Markov-Autocorrelation Indices for Re-evaluation of Links in Chemical and Biological Complex Networks used in Metabolomics, Parasitology, Neurosciences, and Epidemiology. J Chem Inf Model 2012; 52:3331-40. [DOI: 10.1021/ci300321f] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- Pablo Riera-Fernández
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
|
16
|
Rao HB, Zhu F, Yang GB, Li ZR, Chen YZ. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2011; 39:W385-90. [PMID: 21609959 PMCID: PMC3125735 DOI: 10.1093/nar/gkr284] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT has been developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of protein and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein–protein and protein–small molecule interactions, segment descriptors for local properties of protein sequences, topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, and a feature group of more than 400 descriptors for small molecules plus the derived features for protein–protein and protein–small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein–protein interactions, peptides of specific functions and quantitative structure activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.
Affiliation(s)
- H B Rao
- College of Chemistry, Sichuan University, Chengdu, 610064, PR China
|
17
|
Fuzzy oil drop model to interpret the structure of antifreeze proteins and their mutants. J Mol Model 2011; 18:229-37. [PMID: 21523554 PMCID: PMC3249532 DOI: 10.1007/s00894-011-1033-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2010] [Accepted: 03/07/2011] [Indexed: 12/04/2022]
Abstract
Mutations in proteins introduce structural changes and influence biological activity; the specific effects depend on the location of the mutation. The simple method proposed in the present paper is based on a two-step model of in silico protein folding. The structure of the first intermediate is assumed to be determined solely by backbone conformation. The structure of the second one is assumed to be determined by the presence of a hydrophobic center. A comparative structural analysis of the set of mutants is performed to identify the mutation-induced structural changes. The changes in hydrophobic core organization, measured by the divergence entropy, allow a quantitative comparison that estimates the relative structural changes upon mutation. A set of antifreeze proteins, which appeared to have a hydrophobic core structure in accordance with the “fuzzy oil drop” model, was selected for analysis.
|
18
|
Prado-Prado F, García-Mera X, Abeijón P, Alonso N, Caamaño O, Yáñez M, Gárate T, Mezo M, González-Warleta M, Muiño L, Ubeira FM, González-Díaz H. Using entropy of drug and protein graphs to predict FDA drug-target network: theoretic-experimental study of MAO inhibitors and hemoglobin peptides from Fasciola hepatica. Eur J Med Chem 2011; 46:1074-94. [PMID: 21315497 DOI: 10.1016/j.ejmech.2011.01.023] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2010] [Revised: 01/10/2011] [Accepted: 01/13/2011] [Indexed: 12/11/2022]
Abstract
Many drugs have been described with very different affinities for a large number of receptors. In this work, we selected Drug-Target pairs (DTPs/nDTPs) of drugs with high affinity/non-affinity for different targets such as proteins. Quantitative Structure-Activity Relationship (QSAR) models become a very useful tool in this context to substantially reduce time- and resource-consuming experiments. Unfortunately, most QSAR models predict activity against only one protein. To solve this problem, we developed here a multi-target QSAR (mt-QSAR) classifier using the MARCH-INSIDE technique to calculate structural parameters of drug and target, plus an Artificial Neural Network (ANN) to seek the model. The best ANN model found is a Multi-Layer Perceptron (MLP) with profile MLP 32:32-15-1:1. This MLP correctly classifies 623 out of 678 DTPs (Sensitivity = 91.89%) and 2995 out of 3234 nDTPs (Specificity = 92.61%), corresponding to training Accuracy = 92.48%. The validation of the model was carried out by means of external prediction series. The model correctly classifies 313 out of 338 DTPs (Sensitivity = 92.60%) and 1411 out of 1534 nDTPs (Specificity = 91.98%) in validation series, corresponding to total Accuracy = 92.09% for validation series (Predictability). This model compares favorably with other LDA and ANN models developed in this work and with Machine Learning classifiers published before to address the same problem in different aspects. These mt-QSARs also offer a good opportunity to construct drug-protein Complex Networks (CNs) that can be used to explore large and complex drug-protein receptor databases. Finally, we illustrated two practical uses of this model with two different experiments. In experiment 1, we report prediction, synthesis, characterization, and MAO-A and MAO-B pharmacological assay of 10 rasagiline derivatives promising for anti-Parkinson drug design. In experiment 2, we report sampling, parasite culture, SEC and 1DE sample preparation, MALDI-TOF MS and MS/MS analysis, MASCOT search, MM/MD 3D structure modeling, and QSAR prediction for different peptides of hemoglobin found in the proteome of the human parasite Fasciola hepatica, which is promising for the discovery of anti-parasite drug targets.
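The sensitivity, specificity, and accuracy figures quoted in this abstract follow directly from the reported counts. The short check below recomputes them; the function and variable names are illustrative only, and only the counts are taken from the abstract.

```python
def percent(correct: int, total: int) -> float:
    """Percentage of correctly classified cases."""
    return 100.0 * correct / total


# Counts as reported in the abstract: (correct, total) per class.
series = {
    "training": {"DTP": (623, 678), "nDTP": (2995, 3234)},
    "validation": {"DTP": (313, 338), "nDTP": (1411, 1534)},
}

results = {}
for name, counts in series.items():
    tp, n_pos = counts["DTP"]    # correctly classified drug-target pairs
    tn, n_neg = counts["nDTP"]   # correctly classified non-pairs
    results[name] = {
        "sensitivity": round(percent(tp, n_pos), 2),
        "specificity": round(percent(tn, n_neg), 2),
        "accuracy": round(percent(tp + tn, n_pos + n_neg), 2),
    }
    print(name, results[name])
```

Accuracy is computed over both classes pooled, so it sits closer to the specificity because nDTPs outnumber DTPs by roughly five to one.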
|
19
|
Agüero-Chapin G, de la Riva GA, Molina-Ruiz R, Sánchez-Rodríguez A, Pérez-Machado G, Vasconcelos V, Antunes A. Non-linear models based on simple topological indices to identify RNase III protein members. J Theor Biol 2010; 273:167-78. [PMID: 21192951 DOI: 10.1016/j.jtbi.2010.12.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2010] [Revised: 11/15/2010] [Accepted: 12/13/2010] [Indexed: 01/27/2023]
Abstract
Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. Thus, the Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of the amino acids into four classes of hydrophobicity and polarity, revealing higher sequence-order information beyond the amino acid composition level. The predictive power of such TIs was evaluated for the first time on the RNase III family, owing to the high diversity of its members (primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model, and a Hidden Markov Model (HMM). The first two are alignment-free approaches, using TIs as input predictors. Their performances were compared with a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performances on the training and the test sets, reaching values above 90% in the overall classification. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM offered simplicity for the RNase III classification at low computational cost. Such simplicity was evaluated with respect to the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.
Affiliation(s)
- Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
|
20
|
Fernandez M, Ahmad S, Sarai A. Proteochemometric Recognition of Stable Kinase Inhibition Complexes Using Topological Autocorrelation and Support Vector Machines. J Chem Inf Model 2010; 50:1179-88. [DOI: 10.1021/ci1000532] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Affiliation(s)
- Michael Fernandez
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Shandar Ahmad
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Akinori Sarai
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
|
21
|
Fernandez M, Caballero J, Fernandez L, Sarai A. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). Mol Divers 2010; 15:269-89. [PMID: 20306130 DOI: 10.1007/s11030-010-9234-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2009] [Accepted: 01/25/2010] [Indexed: 10/19/2022]
Abstract
Many articles on "in silico" drug design have implemented genetic algorithms (GA) for feature selection, model optimization, conformational search, or docking studies. Some of these articles described GA applications to quantitative structure-activity relationship (QSAR) modeling in combination with regression and/or classification techniques. We reviewed the implementation of GA in drug design QSAR and specifically its performance in the optimization of robust mathematical models such as Bayesian-regularized artificial neural networks (BRANNs) and support vector machines (SVMs) on different drug design problems. Modeled data sets encompassed ADMET and solubility properties, cancer target inhibitors, acetylcholinesterase inhibitors, HIV-1 protease inhibitors, ion-channel and calcium entry blockers, and antiprotozoan compounds, as well as protein classes, functional, and conformational stability data. The GA-optimized predictors were often more accurate and robust than previously published models on the same data sets and explained more than 65% of data variances in validation experiments. In addition, feature selection over large pools of molecular descriptors provided insights into the structural and atomic properties ruling ligand-target interactions.
Affiliation(s)
- Michael Fernandez
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502, Japan.
|
22
|
Rodriguez-Soca Y, Munteanu CR, Dorado J, Rabuñal J, Pazos A, González-Díaz H. Plasmod-PPI: A web-server predicting complex biopolymer targets in plasmodium with entropy measures of protein–protein interactions. POLYMER 2010. [DOI: 10.1016/j.polymer.2009.11.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
|
23
|
Rodriguez-Soca Y, Munteanu CR, Dorado J, Pazos A, Prado-Prado FJ, González-Díaz H. Trypano-PPI: A Web Server for Prediction of Unique Targets in Trypanosome Proteome by using Electrostatic Parameters of Protein−protein Interactions. J Proteome Res 2009; 9:1182-90. [DOI: 10.1021/pr900827b] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Affiliation(s)
- Yamilet Rodriguez-Soca
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Cristian R. Munteanu
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Julián Dorado
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Alejandro Pazos
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Francisco J. Prado-Prado
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Humberto González-Díaz
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
|
24
|
Heider D, Appelmann J, Bayro T, Dreckmann W, Held A, Winkler J, Barnekow A, Borschbach M. A Computational Approach for the Identification of Small GTPases Based on Preprocessed Amino Acid Sequences. Technol Cancer Res Treat 2009; 8:333-41. [DOI: 10.1177/153303460900800503] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The prediction of essential biological features based on a given protein sequence is a challenging task in computational biology. By limiting the amount of in vitro verification needed, the prediction of essential biological activities makes it possible to detect so far unknown sequences with similar properties. Besides the identification of proteins involved in tumorigenesis, other functional classes of proteins can be predicted. The prediction accuracy depends on the selected machine learning approach and even more on the composition of the descriptor set used. A computational approach based on feedforward neural networks was applied for the prediction of small GTPases. This was realized by using secondary structure and hydrophobicity information as a preprocessing step and thus as descriptors for the neural networks. We developed a neural network cluster, which consists of a filter network and four subfamily networks. The filter network was trained to identify small GTPases, and the subfamily networks were trained to assign a small GTPase to one of the subfamilies. The accuracy of predicting whether a given sequence represents a small GTPase is very high (98.25%). The classifications of the subfamily networks yield comparable accuracy. The high prediction accuracy of the neural network cluster suggests that hydrophobicity and secondary structure prediction, in combination with a neural network cluster, is a promising method for the prediction of essential biological activities.
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics Center for Medical Biotechnology University of Duisburg-Essen Universitätsstr. 2, 45117 Essen, Germany
- Jessica Appelmann
- Department of Experimental Tumorbiology, University of Münster Badestr. 9, 48149 Münster, Germany
- Tuygun Bayro
- Department of Experimental Tumorbiology, University of Münster Badestr. 9, 48149 Münster, Germany
- Winfried Dreckmann
- Institute of Computer Science University of Münster, Einsteinstr. 62 48149 Münster, Germany
- Andreas Held
- Institute of Computer Science University of Münster, Einsteinstr. 62 48149 Münster, Germany
- Jonas Winkler
- Department of Experimental Tumorbiology, University of Münster Badestr. 9, 48149 Münster, Germany
- Angelika Barnekow
- Department of Experimental Tumorbiology, University of Münster Badestr. 9, 48149 Münster, Germany
- Markus Borschbach
- Faculty of Computer Science University of Applied Science Hauptstraße 2, 51465 Bergisch Gladbach Germany
|
25
|
Concu R, Podda G, Uriarte E, González-Díaz H. Computational chemistry study of 3D-structure-function relationships for enzymes based on Markov models for protein electrostatic, HINT, and van der Waals potentials. J Comput Chem 2009; 30:1510-20. [DOI: 10.1002/jcc.21170] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
26
|
Taskin T, Sevin F. QSAR Approach to Correlate TRPV1 Antagonist Activity for a Series of Heteroaromatic Urea. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200810157] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
27
|
Cruz-Monteagudo M, Borges F, Cordeiro MNDS. Desirability-based multiobjective optimization for global QSAR studies: application to the design of novel NSAIDs with improved analgesic, antiinflammatory, and ulcerogenic profiles. J Comput Chem 2008; 29:2445-59. [PMID: 18452123 DOI: 10.1002/jcc.20994] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Up to now, very few reports have been published concerning the application of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies. Moreover, none reports the optimization of objectives related directly to the desired pharmaceutical profile of the drug. In this work, a MOOP method based on Derringer's desirability function is proposed for the first time; it allows global QSAR studies to be conducted while simultaneously considering the pharmacological, pharmacokinetic, and toxicological profiles of a set of molecule candidates. The usefulness of the method is demonstrated by applying it to the simultaneous optimization of the analgesic, antiinflammatory, and ulcerogenic properties of a library of fifteen 3-(3-methylphenyl)-2-substituted amino-3H-quinazolin-4-one compounds. The levels of the predictor variables that concurrently produce the best possible compromise between these properties are found and used to design a set of new optimized drug candidates. Our results also suggest a relevant role for the bulkiness of alkyl substituents on the C-2 position of the quinazoline ring in the ulcerogenic properties of this family of compounds. Finally, and most importantly, the proposed desirability-based MOOP method is a valuable tool and should aid the future rational design of novel successful drugs.
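A desirability-based compromise between several properties can be sketched as below. This is a minimal sketch of the standard one-sided Derringer-Suich desirability functions combined by a geometric mean; the limits, the exponent `s`, and the predicted property values are hypothetical, and the paper's actual QSAR models and settings are not reproduced.

```python
import math


def desirability_max(y: float, low: float, target: float, s: float = 1.0) -> float:
    """Larger-is-better: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** s


def desirability_min(y: float, target: float, high: float, s: float = 1.0) -> float:
    """Smaller-is-better: 1 at or below `target`, 0 at or above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** s


def overall_desirability(values) -> float:
    """Geometric mean of individual desirabilities; any zero gives zero."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical predicted properties for one candidate compound.
d_analgesic = desirability_max(55.0, low=30.0, target=80.0)        # maximize
d_antiinflammatory = desirability_max(40.0, low=20.0, target=60.0)  # maximize
d_ulcerogenic = desirability_min(0.8, target=0.5, high=2.0)         # minimize
print(overall_desirability([d_analgesic, d_antiinflammatory, d_ulcerogenic]))
```

Because the geometric mean is zero whenever any single desirability is zero, a candidate that is unacceptable on one property cannot be rescued by excellent values on the others.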
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal.
|
28
|
Quantitative Proteome–Property Relationships (QPPRs). Part 1: Finding biomarkers of organic drugs with mean Markov connectivity indices of spiral networks of blood mass spectra. Bioorg Med Chem 2008; 16:9684-93. [DOI: 10.1016/j.bmc.2008.10.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2008] [Revised: 09/29/2008] [Accepted: 10/02/2008] [Indexed: 11/22/2022]
|
29
|
Fernández M, Fernández L, Sánchez P, Caballero J, Abreu JI. Proteometric modelling of protein conformational stability using amino acid sequence autocorrelation vectors and genetic algorithm-optimised support vector machines. MOLECULAR SIMULATION 2008. [DOI: 10.1080/08927020802301920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Michael Fernández
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group , Matanzas, Cuba
- b Kyushu Institute of Technology (KIT), Department of Bioscience and Bioinformatics , Iizuka, Fukuoka, Japan
- Leyden Fernández
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group , Matanzas, Cuba
- Pedro Sánchez
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group , Matanzas, Cuba
- c Faculty of Informatics, University of Matanzas, Artificial Intelligence Lab , Matanzas, Cuba
- Julio Caballero
- d Centro de Bioinformática y Simulación Molecular, Universidad de Talca , Talca, Chile
- Jose Ignacio Abreu
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group , Matanzas, Cuba
- c Faculty of Informatics, University of Matanzas, Artificial Intelligence Lab , Matanzas, Cuba
|
30
|
Yan S, Wu G. Quantitative relationship between mutated amino-acid sequence of human copper-transporting ATPases and their related diseases. Mol Divers 2008; 12:119-29. [PMID: 18688737 DOI: 10.1007/s11030-008-9084-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2008] [Accepted: 07/19/2008] [Indexed: 02/03/2023]
Abstract
Copper-transporting ATPases 1 and 2 (ATP7A and ATP7B) are two highly homologous P-type copper ATPase exporters. Mutations in ATP7A can lead to Menkes disease, which is an X-linked disorder of copper deficiency. Mutations in ATP7B can cause Wilson disease, which is an autosomal recessive disorder of copper toxicity. In this study, we attempt to build a quantitative relationship between mutated ATPase and Menkes/Wilson disease. First, we use the amino-acid distribution probability as a measure to quantify the difference in ATPase before and after mutation. Second, we use cross-impact analysis to define the quantitative relationship between mutant ATPase protein and Menkes/Wilson disease, and compute various probabilities. Finally, we use the Bayesian equation to determine the probability that Menkes/Wilson disease is diagnosed under a mutation. The results show that (i) the vast majority of mutations lead to an increase in the amino-acid distribution probability in mutant ATP7As and a decrease in ATP7Bs, and (ii) the probability that a mutation causes Menkes/Wilson disease is about nine tenths. Thus we provide a way to use a descriptive probabilistic method to couple a mutation with its clinical outcome after quantifying mutations in proteins.
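The final step, applying Bayes' rule to obtain the probability of disease given a mutation, can be illustrated with a generic sketch. The abstract does not give the input probabilities or the exact formulation used, so the function below is the textbook form of Bayes' theorem and every number passed to it is hypothetical.

```python
def posterior(prior: float, p_obs_given_disease: float, p_obs_given_healthy: float) -> float:
    """Bayes' rule: P(disease | observation).

    prior                P(disease) before the observation
    p_obs_given_disease  P(observation | disease)
    p_obs_given_healthy  P(observation | no disease)
    """
    evidence = p_obs_given_disease * prior + p_obs_given_healthy * (1.0 - prior)
    if evidence == 0:
        raise ValueError("observation has zero probability under both hypotheses")
    return p_obs_given_disease * prior / evidence


# Hypothetical inputs chosen only to illustrate the calculation.
print(posterior(prior=0.5, p_obs_given_disease=0.9, p_obs_given_healthy=0.1))
```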
Affiliation(s)
- Shaomin Yan
- Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi, 530007, China
|
31
|
Dea-Ayuela MA, Pérez-Castillo Y, Meneses-Marcel A, Ubeira FM, Bolas-Fernández F, Chou KC, González-Díaz H. HP-Lattice QSAR for dynein proteins: experimental proteomics (2D-electrophoresis, mass spectrometry) and theoretic study of a Leishmania infantum sequence. Bioorg Med Chem 2008; 16:7770-6. [PMID: 18662882 DOI: 10.1016/j.bmc.2008.07.023] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2008] [Revised: 06/23/2008] [Accepted: 07/02/2008] [Indexed: 10/21/2022]
Abstract
The toxicity and inefficacy of current organic drugs against Leishmaniosis justify research projects to find new molecular targets in Leishmania species including Leishmania infantum (L. infantum) and Leishmania major (L. major), both important pathogens. In this sense, quantitative structure-activity relationship (QSAR) methods, which are very useful in Bioorganic and Medicinal Chemistry to discover small-sized drugs, may help to identify not only new drugs but also new drug targets, if we apply them to proteins. Dyneins are important proteins of these parasites governing fundamental processes such as cilia and flagella motion, nuclear migration, organization of the mitotic spindle, and chromosome separation during mitosis. However, despite the interest in them as potential drug targets, so far there has been no report whatsoever on dyneins with QSAR techniques. To the best of our knowledge, we report here the first QSAR for dynein proteins. We used as input the Spectral Moments of a Markov matrix associated with the HP-Lattice Network of the protein sequence. The data contain 411 protein sequences of different species selected by ClustalX to develop a QSAR that correctly discriminates on average between 92.75% and 92.51% of dyneins and other proteins in four different train and cross-validation datasets. We also report a combined experimental and theoretic study of a new dynein sequence in order to illustrate the utility of the model to search for potential drug targets with a practical example. First, we carried out a 2D-electrophoresis analysis of L. infantum biological samples. Next, we excised from 2D-E gels one spot of interest belonging to an unknown protein or protein fragment in the region M<20,200 and pI<4. We used the MASCOT search engine to find proteins in the L. major database with the highest similarity score to the MS of the protein isolated from L. infantum.
We used the QSAR model to predict the new sequence as dynein with a probability of 99.99% without relying upon alignment. In order to confirm the previous function annotation, we predicted the sequences as dynein with BLAST and the omniBLAST tools (96% alignment similarity to dyneins of other species). Using this combined strategy, we have successfully identified an L. infantum protein containing dynein heavy chain, and illustrated the potential use of the QSAR model as a complement to alignment tools.
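As a rough illustration of the kind of input descriptor the abstract mentions, the sketch below computes spectral moments as traces of successive powers of a row-normalized (Markov) matrix. It is only a minimal sketch: the abstract does not give the construction of the HP-Lattice Network or any weighting scheme, so the `toy_network` adjacency matrix, the function names, and the plain row normalization are all assumptions made for illustration.

```python
def markov_matrix(adjacency):
    """Row-normalize an adjacency matrix into a stochastic (Markov) matrix."""
    matrix = []
    for row in adjacency:
        total = sum(row)
        # Isolated nodes keep an all-zero row instead of dividing by zero.
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adjacency, max_order=5):
    """Return [Tr(P^0), Tr(P^1), ..., Tr(P^max_order)] for the Markov matrix P."""
    p = markov_matrix(adjacency)
    n = len(p)
    # Start from the identity matrix, i.e. P^0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(max_order + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, p)
    return moments


# Hypothetical 3-node path graph standing in for a lattice network.
toy_network = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(spectral_moments(toy_network, max_order=3))
```

In a QSAR setting, a vector of such moments per sequence would typically serve as the feature vector for a classifier; which classifier the authors used is not stated in the abstract.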
|
32
|
Fernández M, Fernández L, Caballero J, Abreu JI, Reyes G. Proteochemometric Modeling of the Inhibition Complexes of Matrix Metalloproteinases with N-Hydroxy-2-[(Phenylsulfonyl)Amino]Acetamide Derivatives Using Topological Autocorrelation Interaction Matrix and Model Ensemble Averaging. Chem Biol Drug Des 2008; 72:65-78. [DOI: 10.1111/j.1747-0285.2008.00675.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
|
33
|
González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 12/19/2022]
|
34
|
Fernández M, Fernández L, Abreu JI, Garriga M. Classification of voltage-gated K(+) ion channels from 3D pseudo-folding graph representation of protein sequences using genetic algorithm-optimized support vector machines. J Mol Graph Model 2008; 26:1306-14. [PMID: 18289899 DOI: 10.1016/j.jmgm.2008.01.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/31/2007] [Revised: 01/03/2008] [Accepted: 01/03/2008] [Indexed: 11/26/2022]
Abstract
Voltage-gated K(+) ion channels (VKCs) are membrane proteins that regulate the passage of potassium ions through membranes. This work reports a classification scheme of VKCs according to the signs of three electrophysiological variables: activation threshold voltage (V(t)), half-activation voltage (V(a50)) and half-inactivation voltage (V(h50)). A novel 3D pseudo-folding graph representation of protein sequences encoded the VKC sequences. Amino acid pseudo-folding 3D distances count (AAp3DC) descriptors, calculated from the Euclidean distance matrices (EDMs), were tested for building the classifiers. Genetic algorithm (GA)-optimized support vector machines (SVMs) with a radial basis function (RBF) kernel discriminated well between VKCs having negative and positive/zero V(t), V(a50) and V(h50) values, with overall accuracies of about 80, 90 and 86%, respectively, in cross-validation tests. We found contributions of the "pseudo-core" and "pseudo-surface" of the 3D pseudo-folded proteins to the discrimination between VKCs according to the three electrophysiological variables.
Affiliation(s)
- Michael Fernández
- Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
|
35
|
Fernández M, Caballero J, Fernández L, Abreu JI, Garriga M. Protein radial distribution function (P-RDF) and Bayesian-Regularized Genetic Neural Networks for modeling protein conformational stability: Chymotrypsin inhibitor 2 mutants. J Mol Graph Model 2007; 26:748-59. [PMID: 17569565 DOI: 10.1016/j.jmgm.2007.04.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 01/14/2007] [Revised: 04/03/2007] [Accepted: 04/28/2007] [Indexed: 11/30/2022]
Abstract
Development of novel computational approaches for modeling protein properties is a main goal in applied Proteomics. In this work, we report the extension of the radial distribution function (RDF) scores formalism to proteins for encoding 3D structural information for modeling purposes. Protein-RDF (P-RDF) scores measure spherical distributions on protein 3D structure of 48 amino acid/residue properties selected from the AAindex database. P-RDF scores were tested for building predictive models of the change in thermal unfolding Gibbs free energy (ΔΔG) of chymotrypsin inhibitor 2 upon mutations. In this sense, an ensemble of Bayesian-Regularized Genetic Neural Networks (BRGNNs) yielded an optimum nonlinear model for the conformational stability. The ensemble predictor explained about 84% and 70% of the variance of the data in training and test sets, respectively.
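To make the idea of an RDF score concrete, the following sketch evaluates the RDF code commonly used for small molecules, g(R) = Σ a_i a_j exp(−B (R − r_ij)²) summed over all pairs i < j, on a set of residue coordinates. It is a sketch under stated assumptions rather than the authors' formulation: the abstract does not say which exact expression, smoothing parameter, or residue coordinates P-RDF uses, so the function name `rdf_score`, the default `smoothing` value, and the toy coordinates and property values are hypothetical.

```python
import math


def rdf_score(coords, props, radius, smoothing=100.0):
    """Evaluate g(R) = sum_{i<j} a_i * a_j * exp(-B * (R - r_ij)^2).

    coords    : list of (x, y, z) positions, one per residue (assumed input)
    props     : list of property values a_i, one per residue
    radius    : sphere radius R at which the distribution is sampled
    smoothing : the B parameter controlling peak width (assumed default)
    """
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            total += props[i] * props[j] * math.exp(-smoothing * (radius - r_ij) ** 2)
    return total


# Hypothetical two-residue example: positions 1 unit apart, properties 2 and 3.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_props = [2.0, 3.0]

# Sampling the score over a range of radii yields a descriptor vector.
radii = [0.5, 1.0, 1.5, 2.0]
descriptor = [rdf_score(toy_coords, toy_props, r) for r in radii]
print(descriptor)
```

The score peaks where the sampled radius matches an inter-residue distance, so a vector of scores over many radii summarizes how a given property is distributed in space.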
Affiliation(s)
- Michael Fernández
- Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
|
36
|
Fernández M, Abreu JI, Caballero J, Garriga M, Fernández L. Comparative modeling of the conformational stability of chymotrypsin inhibitor 2 protein mutants using amino acid sequence autocorrelation (AASA) and amino acid 3D autocorrelation (AA3DA) vectors and ensembles of Bayesian-regularized genetic neural networks. Molecular Simulation 2007. [DOI: 10.1080/08927020701564479] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]
|
37
|
Fernández M, Caballero J, Fernández L, Abreu JI, Acosta G. Classification of conformational stability of protein mutants from 2D graph representation of protein sequences using support vector machines. Molecular Simulation 2007. [DOI: 10.1080/08927020701377070] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]
|
38
|
Fernández M, Caballero J, Fernández L, Abreu JI, Acosta G. Classification of conformational stability of protein mutants from 3D pseudo-folding graph representation of protein sequences using support vector machines. Proteins 2007; 70:167-75. [PMID: 17654549 DOI: 10.1002/prot.21524] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
Abstract
This work reports a novel 3D pseudo-folding graph representation of protein sequences for modeling purposes. Amino acid Euclidean distance matrices (EDMs) encode primary structural information. Amino Acid Pseudo-Folding 3D Distances Count (AAp3DC) descriptors, calculated from the EDMs of a large data set of 1363 single protein mutants of 64 proteins, were tested for building a classifier for the signs of the change in thermal unfolding Gibbs free energy (ΔΔG) upon single mutations. An optimum support vector machine (SVM) with a radial basis function (RBF) kernel recognized stable and unstable mutants well, with accuracies over 70% in cross-validation tests. To the best of our knowledge, this result for stable mutant recognition is the highest ever reported for a sequence-based predictor with more than 1000 mutants. Furthermore, the model adequately classified mutations associated with diseases of human prion protein and human transthyretin.
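The step from coordinates to distance-based descriptors can be sketched as follows. Computing a Euclidean distance matrix from 3D points is standard; the binned pair count that follows is only an illustrative guess at what a "distances count" descriptor might look like, since the abstract does not define AAp3DC or the pseudo-folding rule. The function names, the `bin_edges` values, and the toy coordinates are all hypothetical.

```python
import math


def distance_matrix(coords):
    """Build the Euclidean distance matrix (EDM) for a list of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_counts(edm, bin_edges):
    """Count point pairs (i < j) whose distance falls in each [lo, hi) bin.

    This binning is an assumption made for illustration; it is not taken
    from the paper.
    """
    counts = [0] * (len(bin_edges) - 1)
    n = len(edm)
    for i in range(n - 1):
        for j in range(i + 1, n):
            for b in range(len(counts)):
                if bin_edges[b] <= edm[i][j] < bin_edges[b + 1]:
                    counts[b] += 1
                    break
    return counts


# Hypothetical pseudo-folded positions for three residues on a line.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
edm = distance_matrix(toy_coords)
print(distance_counts(edm, bin_edges=[0.0, 1.5, 2.5, 3.5]))
```

A real descriptor would presumably also distinguish amino acid types when counting pairs, which this sketch leaves out.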
Affiliation(s)
- Michael Fernández
- Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
|