1
Balakrishnan N, Baskar G, Balaji S, Kullappan M, Krishna Mohan S. Machine learning modeling to identify affinity improved biobetter anticancer drug trastuzumab and the insight of molecular recognition of trastuzumab towards its antigen HER2. J Biomol Struct Dyn 2022; 40:11638-11652. [PMID: 34392800 DOI: 10.1080/07391102.2021.1961866] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
In the present study, a machine learning (ML) model was developed to predict the epistatic phenomena of combination mutants in order to improve the binding affinity of the anticancer antibody drug trastuzumab towards its antigen, human epidermal growth factor receptor 2 (HER2). An ML algorithm, Support Vector Regression (SVR), was used to develop ML models with a data set consisting of 193 affinity values of single mutants of trastuzumab and their associated amino acid sequence-derived descriptors. The subset selection of descriptors and the tuning of SVR hyperparameters were done using a Genetic Algorithm (GA) in a wrapper approach around the SVR, called GA-SVR. One hundred evolutionary cycles of the GA produced the 100 best candidate GA-SVR models based on their fitness score (Q²), estimated using a stratified 5-fold cross-validation procedure. The final ML model was found to be highly predictive on a test data set of six combination mutants and one single mutant, with R²pre = 0.71. The analysis of descriptors in the ML model highlighted that mutation-induced secondary structural variation causes the variation in the binding affinity of trastuzumab. This was verified using a short 20 ns and a long 100 ns (in duplicate) molecular dynamics simulation of a wild-type and a mutant variant of trastuzumab. The secondary structure-induced affinity change due to mutations in the CDR-H3 is a novel insight from this study. It should help rational mutant selection to bring a biobetter trastuzumab with a multifold improved binding affinity to the market quickly. Communicated by Ramaswamy H. Sarma.
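As an illustration of the Q² and R²pre statistics quoted in this abstract, here is a minimal sketch of a predictive coefficient of determination. The abstract does not give the exact formula used, so the common definition 1 - PRESS/TSS is an assumption, and the function name `predictive_r2`, its `reference_mean` parameter, and the numbers are all hypothetical.

```python
def predictive_r2(observed, predicted, reference_mean=None):
    """Return 1 - PRESS/TSS for paired observed and predicted values.

    For a cross-validated Q2, `predicted` holds the out-of-fold predictions.
    For an external test set, some authors centre TSS on the training-set
    mean; pass it as `reference_mean` (assumption: the paper's choice is
    not stated in the abstract).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean = sum(observed) / len(observed) if reference_mean is None else reference_mean
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    if tss == 0:
        raise ValueError("observed values have zero variance around the reference mean")
    return 1.0 - press / tss


# Hypothetical affinity values, not taken from the paper.
example_q2 = predictive_r2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A value near 1 means the held-out predictions track the observations closely; a value at or below 0 means the model does no better than predicting the mean.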
Affiliation(s)
- Gurunathan Baskar
- Department of Biotechnology, St. Joseph's College of Engineering, Chennai, India
- Sathyanarayan Balaji
- Department of Biotechnology, Bannari Amman Institute of Technology, Erode, India
- Malathi Kullappan
- Department of Research, Panimalar Medical College Hospital & Research Institute, Chennai, India
- Surapaneni Krishna Mohan
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Chennai, India; Department of Molecular Virology, Panimalar Medical College Hospital & Research Institute, Chennai, India; Department of Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Chennai, India
2
Balakrishnan N, Gurunathan B, Surapaneni KM. Application of proteometric approach for identification of functional mutant sites to improve the binding affinity of anticancer biologic trastuzumab with its antigen human epidermal growth factor receptor 2. J Mol Recognit 2019; 33:e2818. [DOI: 10.1002/jmr.2818] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/22/2019] [Revised: 09/15/2019] [Accepted: 09/22/2019] [Indexed: 11/07/2022]
Affiliation(s)
- Nataraj Balakrishnan
- Biotechnology Division, R&D Centre, Orchid Pharma Ltd. (formerly known as Orchid Chemicals and Pharmaceuticals Ltd.), Chennai, India
- Baskar Gurunathan
- Department of Biotechnology, St. Joseph's College of Engineering, Chennai, India
3
Cui Y, Chen Q, Li Y, Tang L. A new model of flavonoids affinity towards P-glycoprotein: genetic algorithm-support vector machine with features selected by a modified particle swarm optimization algorithm. Arch Pharm Res 2016; 40:214-230. [DOI: 10.1007/s12272-016-0876-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/16/2016] [Accepted: 12/16/2016] [Indexed: 01/04/2023]
4
Bosc N, Wroblowski B, Aci-Sèche S, Meyer C, Bonnet P. A Proteometric Analysis of Human Kinome: Insight into Discriminant Conformation-dependent Residues. ACS Chem Biol 2015; 10:2827-40. [PMID: 26411811 DOI: 10.1021/acschembio.5b00555] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Because of the success of imatinib, the first type-II kinase inhibitor approved by the FDA in 2001, sustained efforts have been made by the pharmaceutical industry to discover novel compounds stabilizing the inactive conformation of protein kinases. Of the seven type-II inhibitors that have reached the market, four were released in 2012, suggesting an acceleration of research on this class of compounds. Still, they represent less than a third of the protein kinase inhibitors available to patients today. The identification of key residues involved in the binding of this type of ligand in the kinase active site might ease the design of potent and selective type-II inhibitors. In order to identify those discriminant residues, we have developed a proteometric approach combining residue descriptors of protein kinase sequences and biological activities of various type-II kinase inhibitors. We applied Partial Least Squares (PLS) regression to identify 29 key residues that influence the binding of four type-II inhibitors to most proteins of the kinome. The gatekeeper residue was found to be the most relevant, confirming an essential role in ligand binding as well as in protein kinase conformational changes. Using the newly developed proteometric model, we predicted the propensity of each protein kinase to be inhibited by type-II ligands. The model was further validated using an external data set of protein/ligand activity pairs. Other residues present in the kinase domain, and more specifically in the binding site, have been highlighted by this approach, but their role in biological mechanisms is still unknown.
Affiliation(s)
- Nicolas Bosc
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans, BP 6759, 45067 Orléans Cedex 2, France
- Berthold Wroblowski
- Janssen Research & Development, a division of Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340 Beerse, Belgium
- Samia Aci-Sèche
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans, BP 6759, 45067 Orléans Cedex 2, France
- Christophe Meyer
- Centre de Recherche Janssen-Cilag, Campus de Maigremont - CS 10615, 27106 Val de Reuil Cedex, France
- Pascal Bonnet
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans, BP 6759, 45067 Orléans Cedex 2, France
5
Fernandez M, Ahmad S, Abreu JI, Sarai A. Large-scale recognition of high-affinity protease–inhibitor complexes using topological autocorrelation and support vector machines. Molecular Simulation 2015. [DOI: 10.1080/08927022.2015.1059937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023]
6
Niu B, Zhang Y, Ding J, Lu Y, Wang M, Lu W, Yuan X, Yin J. Predicting network of drug-enzyme interaction based on machine learning method. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:214-23. [PMID: 23907006 DOI: 10.1016/j.bbapap.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/01/2012] [Revised: 07/16/2013] [Accepted: 07/18/2013] [Indexed: 12/11/2022]
Abstract
It is important to correctly and efficiently map drugs and enzymes to their possible interaction network in modern drug research. In this work, a novel approach was introduced to encode drug and enzyme molecules with physicochemical molecular descriptors and pseudo amino acid composition, respectively. Based on this encoding method, Random Forest was adopted to build the drug-enzyme interaction network. After selecting the optimal features that are able to represent the main factors of drug-enzyme interaction in our prediction, a total of 129 features were attained which can be clustered into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. It is further found that Geometry features were the most important of all the features. As a result, our predicting model achieved an MCC of 0.915 and a sensitivity of 87.9% at the specificity level of 99.8% for 10-fold cross-validation test, and achieved an MCC of 0.895 and a sensitivity of 95.7% at the specificity level of 95.4% for independent set test. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
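The MCC, sensitivity and specificity figures quoted in this abstract all derive from a binary confusion matrix. The sketch below shows the standard formulas; the function name `confusion_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from binary confusion-matrix counts.

    Assumes at least one actual positive and one actual negative, so that
    sensitivity and specificity are defined.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only.
sens, spec, mcc = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
```

Unlike accuracy, MCC stays informative when interacting and non-interacting pairs are heavily imbalanced, which is presumably why it is reported alongside sensitivity at a fixed specificity.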
Affiliation(s)
- Bing Niu
- College of Life Science, Shanghai University, 99 Shang-Da Road, Shanghai 200072, China
7
Development of predictive quantitative structure–activity relationship model and its application in the discovery of human leukotriene A4 hydrolase inhibitors. Future Med Chem 2013; 5:27-40. [DOI: 10.4155/fmc.12.184] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Background: Human LTA4H catalyzes the conversion of LTA4 to LTB4 and plays a key role in innate immune responses. Inhibition of this enzyme can be a valid approach for treating inflammatory responses exhibited through LTB4. Results & discussion: The quantitative structure–activity relationship (QSAR) models were developed using genetic function approximation and validated. A training set of 26 diverse compounds and their molecular descriptors were used to develop highly correlating QSAR models. A six-descriptor model explaining the biological activity of the training and test sets with correlation values of 0.846 and 0.502, respectively, was selected as the best model and used to screen the drug-like Maybridge database, followed by molecular docking. Conclusion: Based on the predicted potent inhibitory activities, expected binding mode and molecular interactions at the active site of hLTA4H, final leads were selected for use in designing future hLTA4H inhibitors.
8
González-Díaz H, Riera-Fernández P. New Markov-Autocorrelation Indices for Re-evaluation of Links in Chemical and Biological Complex Networks used in Metabolomics, Parasitology, Neurosciences, and Epidemiology. J Chem Inf Model 2012; 52:3331-40. [DOI: 10.1021/ci300321f] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/26/2022]
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- Pablo Riera-Fernández
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
9
Hosseinzadeh F, Ebrahimi M, Goliaei B, Shamabadi N. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models. PLoS One 2012; 7:e40017. [PMID: 22829872 PMCID: PMC3400626 DOI: 10.1371/journal.pone.0040017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/27/2012] [Accepted: 05/30/2012] [Indexed: 12/03/2022]
Abstract
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins, and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting and supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in the classification of the SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed by most tree induction algorithms, while descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups and the distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, compositions of polar dipeptides in SCLC proteins were higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated with leave-one-out and 10-fold cross-validation) shows more than 86% accuracy in clustering and predicting three different lung cancer tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, with regard to the importance of dipeptide composition, autocorrelation and distribution descriptors, is reported.
Affiliation(s)
- Faezeh Hosseinzadeh
- Student at Laboratory of Biophysics and Molecular Biology, Institute of Biophysics and Biochemistry, University of Tehran, Tehran, Iran
- Mansour Ebrahimi
- Department of Biology at Basic science School & Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
- Bahram Goliaei
- Department of Medical Physics, Iran University of Medical Science, Tehran, Iran
- Narges Shamabadi
- Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
10
Molecular docking and QSAR study on steroidal compounds as aromatase inhibitors. Eur J Med Chem 2010; 45:5612-20. [PMID: 20926163 DOI: 10.1016/j.ejmech.2010.09.011] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 07/01/2010] [Revised: 08/06/2010] [Accepted: 09/06/2010] [Indexed: 11/21/2022]
Abstract
In order to develop more potent, selective and less toxic steroidal aromatase (AR) inhibitors, molecular docking and 2D and 3D hybrid quantitative structure-activity relationship (QSAR) studies have been conducted using topological, molecular shape, spatial, structural and thermodynamic descriptors on 32 steroidal compounds. The molecular docking study shows that one or more hydrogen bonds with MET374 are an essential requirement for the optimum binding of ligands. The QSAR model obtained indicates that the aromatase inhibitory activity can be enhanced by increasing SIC, SC_3_C, Jurs_WNSA_1, Jurs_WPSA_1 and decreasing CDOCKER interaction energy (ECD), IAC_Total and Shadow_XZfrac. The predicted results show that this model has comparatively good predictive power and can be used to predict the activity of new steroidal aromatase inhibitors.
11
Fernandez M, Ahmad S, Sarai A. Proteochemometric Recognition of Stable Kinase Inhibition Complexes Using Topological Autocorrelation and Support Vector Machines. J Chem Inf Model 2010; 50:1179-88. [DOI: 10.1021/ci1000532] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]
Affiliation(s)
- Michael Fernandez
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Shandar Ahmad
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Akinori Sarai
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
12
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
13
Cruz-Cano R, Chew DSH, Kwok-Pui C, Ming-Ying L. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction. INFORMS Journal on Computing 2010; 22:457-470. [PMID: 20729987 PMCID: PMC2923853 DOI: 10.1287/ijoc.1090.0360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications.
Affiliation(s)
- Raul Cruz-Cano
- Department of Computer and Information Sciences, Texas A&M University-Texarkana, Texarkana, TX, 75501, USA
14
Fernandez M, Caballero J, Fernandez L, Sarai A. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). Mol Divers 2010; 15:269-89. [PMID: 20306130 DOI: 10.1007/s11030-010-9234-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 05/14/2009] [Accepted: 01/25/2010] [Indexed: 10/19/2022]
Abstract
Many articles in "in silico" drug design implemented genetic algorithm (GA) for feature selection, model optimization, conformational search, or docking studies. Some of these articles described GA applications to quantitative structure-activity relationships (QSAR) modeling in combination with regression and/or classification techniques. We reviewed the implementation of GA in drug design QSAR and specifically its performance in the optimization of robust mathematical models such as Bayesian-regularized artificial neural networks (BRANNs) and support vector machines (SVMs) on different drug design problems. Modeled data sets encompassed ADMET and solubility properties, cancer target inhibitors, acetylcholinesterase inhibitors, HIV-1 protease inhibitors, ion-channel and calcium entry blockers, and antiprotozoan compounds as well as protein classes, functional, and conformational stability data. The GA-optimized predictors were often more accurate and robust than previous published models on the same data sets and explained more than 65% of data variances in validation experiments. In addition, feature selection over large pools of molecular descriptors provided insights into the structural and atomic properties ruling ligand-target interactions.
Affiliation(s)
- Michael Fernandez
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502, Japan.
15
A network-QSAR model for prediction of genetic-component biomarkers in human colorectal cancer. J Theor Biol 2009; 261:449-58. [DOI: 10.1016/j.jtbi.2009.07.031] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 03/15/2009] [Revised: 07/20/2009] [Accepted: 07/25/2009] [Indexed: 11/23/2022]
16
Zhu X, Shan Y, Li G, Huang A, Zhang Z. Prediction of wood property in Chinese Fir based on visible/near-infrared spectroscopy and least square-support vector machine. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2009; 74:344-8. [PMID: 19576843 DOI: 10.1016/j.saa.2009.06.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/06/2009] [Revised: 05/12/2009] [Accepted: 06/07/2009] [Indexed: 05/13/2023]
Abstract
A method for the quantification of the density of Chinese Fir samples based on visible/near-infrared (vis-NIR) spectrometry and least squares-support vector machine (LS-SVM) was proposed. The sample set partitioning based on joint x-y distances (SPXY) algorithm was used for dividing calibration and prediction samples; it is of value for the prediction of properties involving complex matrices. A stepwise procedure is employed to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. For comparison, the models were also constructed by the Kennard-Stone method, as well as by using the duplex and random sampling methods for subset partitioning. The results revealed that the SPXY algorithm may be an advantageous alternative to the other three strategies. To validate the reliability of LS-SVM, comparisons were made with other modeling methods such as support vector machine (SVM) and partial least squares (PLS) regression. Satisfactory models were built using LS-SVM, with lower prediction errors and superior performance in relation to SVM and PLS. These results showed the possibility of building robust models to quantify the density of Chinese Fir using near-infrared spectroscopy and LS-SVM combined with the SPXY algorithm as a nonlinear multivariate calibration procedure.
Affiliation(s)
- Xiangrong Zhu
- Hunan Agricultural Product Processing Institute, Changsha 410125, PR China
17
Lei B, Li S, Xi L, Li J, Liu H, Yao X. Novel approaches for retention time prediction of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography. J Chromatogr A 2009; 1216:4434-9. [PMID: 19324364 DOI: 10.1016/j.chroma.2009.03.032] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 11/19/2008] [Revised: 03/09/2009] [Accepted: 03/13/2009] [Indexed: 10/21/2022]
Abstract
The base sequence autocorrelation (BSA) descriptors were used to describe structures of oligonucleotides and to develop accurate quantitative structure-retention relationship (QSRR) models of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography. Through the combined use of multiple linear regression (MLR) and genetic algorithm (GA), QSRR models were developed at temperatures of 30 °C, 40 °C, 50 °C, 60 °C and 80 °C, respectively. Satisfactory results were obtained for the single-temperature models (STM). A multi-temperature model (MTM) was also developed that can be used for predicting the retention time at any temperature. The correlation coefficients of retention time prediction for the test set based on the MTM model at 30 °C, 40 °C, 50 °C, 60 °C and 80 °C were 0.978, 0.982, 0.989, 0.988 and 0.996, respectively. The corresponding absolute average relative deviations (AARD) for the test set at each temperature were all less than 1%. The new strategy of feature representation and multi-temperature modeling is a very promising tool for QSRR modeling, with good predictive ability for the retention time of oligonucleotides at multiple temperatures under the studied conditions.
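For readers unfamiliar with the AARD statistic quoted in this abstract, a minimal sketch follows. It assumes the usual definition, the mean of |predicted - observed| / |observed| expressed as a percentage; the abstract itself does not spell out the formula, and the function name `aard_percent` and the retention times are hypothetical.

```python
def aard_percent(observed, predicted):
    """Absolute average relative deviation, in percent.

    Assumed definition: 100/N * sum(|pred_i - obs_i| / |obs_i|).
    Observed values must be non-zero.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    total = sum(abs((p - o) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical retention times in minutes, not taken from the paper.
example_aard = aard_percent([10.0, 20.0, 40.0], [10.1, 19.8, 40.4])
```

Each prediction in this toy example is off by 1% of its observed value, so the AARD comes out at about 1%, the threshold mentioned in the abstract.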
Affiliation(s)
- Beilei Lei
- Department of Chemistry, Lanzhou University, Lanzhou 730000, China
18
Cruz-Monteagudo M, Borges F, Cordeiro MNDS. Desirability-based multiobjective optimization for global QSAR studies: application to the design of novel NSAIDs with improved analgesic, antiinflammatory, and ulcerogenic profiles. J Comput Chem 2008; 29:2445-59. [PMID: 18452123 DOI: 10.1002/jcc.20994] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
Up to now, very few reports have been published concerning the application of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies. However, none reports the optimization of objectives related directly to the desired pharmaceutical profile of the drug. In this work, for the first time, a MOOP method based on Derringer's desirability function is proposed that allows global QSAR studies to be conducted while simultaneously considering the pharmacological, pharmacokinetic and toxicological profiles of a set of molecule candidates. The usefulness of the method is demonstrated by applying it to the simultaneous optimization of the analgesic, antiinflammatory, and ulcerogenic properties of a library of fifteen 3-(3-methylphenyl)-2-substituted amino-3H-quinazolin-4-one compounds. The levels of the predictor variables that concurrently produce the best possible compromise between these properties are found and used to design a set of new optimized drug candidates. Our results also suggest the relevant role of the bulkiness of alkyl substituents on the C-2 position of the quinazoline ring over the ulcerogenic properties for this family of compounds. Finally, and most importantly, the desirability-based MOOP method proposed is a valuable tool and shall aid in the future rational design of novel successful drugs.
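The desirability approach named in this abstract can be sketched as follows. This is the textbook Derringer-Suich form (one-sided desirabilities combined by a geometric mean), not the authors' implementation; all function names, property ranges and values below are hypothetical.

```python
import math


def desirability_maximize(y, low, target, weight=1.0):
    """One-sided desirability for a response that should be as large as possible."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


def desirability_minimize(y, target, high, weight=1.0):
    """One-sided desirability for a response that should be as small as possible."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** weight


def overall_desirability(values):
    """Geometric mean of individual desirabilities; zero if any single one is zero."""
    if not values:
        raise ValueError("at least one desirability is required")
    if any(d == 0.0 for d in values):
        return 0.0
    return math.exp(sum(math.log(d) for d in values) / len(values))


# Hypothetical predicted properties of one candidate compound.
d_analgesic = desirability_maximize(60.0, low=20.0, target=100.0)        # 0.5
d_antiinflammatory = desirability_maximize(80.0, low=20.0, target=100.0)  # 0.75
d_ulcer = desirability_minimize(1.0, target=0.5, high=2.5)                # 0.75
D = overall_desirability([d_analgesic, d_antiinflammatory, d_ulcer])
```

The geometric mean is what makes this a compromise criterion: a candidate that is unacceptable on any one property scores zero overall, however good the others are.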
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal.
19
Shen HB, Chou KC. Identification of proteases and their types. Anal Biochem 2008; 385:153-60. [PMID: 19007742 DOI: 10.1016/j.ab.2008.10.020] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 09/09/2009] [Revised: 10/13/2008] [Accepted: 10/14/2008] [Indexed: 10/21/2022]
Abstract
Called by many biology's version of Swiss army knives, proteases cut long sequences of amino acids into fragments and regulate most physiological processes. They are vitally important in the life cycle. Different types of proteases have different action mechanisms and biological processes. With the avalanche of protein sequences generated during the postgenomic age, it is highly desirable for both basic research and drug design to develop a fast and reliable method for identifying the types of proteases according to their sequences, or even just for determining whether they are proteases or not. In this article, three recently developed identification methods in this regard are discussed: (i) FunD-PseAAC, (ii) GO-PseAAC, and (iii) FunD-PsePSSM. The first two were established by hybridizing the FunD (functional domain) approach and the GO (gene ontology) approach, respectively, with the PseAAC (pseudo amino acid composition) approach. The third method was established by fusing the FunD approach with the PsePSSM (pseudo position-specific scoring matrix) approach. Of these three methods, only FunD-PsePSSM has provided a server called ProtIdent (protease identifier), which is freely accessible to the public via the website at http://www.csbio.sjtu.edu.cn/bioinf/Protease. For the convenience of users, a step-by-step guide on how to use ProtIdent is illustrated. Meanwhile, the caveat in using ProtIdent and how to understand the success expectancy rate of a statistical predictor are discussed. Finally, the essence of why ProtIdent can yield a high success rate in identifying proteases and their types is elucidated.
Affiliation(s)
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200240, China.
20
Fernández M, Fernández L, Sánchez P, Caballero J, Abreu JI. Proteometric modelling of protein conformational stability using amino acid sequence autocorrelation vectors and genetic algorithm-optimised support vector machines. Molecular Simulation 2008. [DOI: 10.1080/08927020802301920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
Affiliation(s)
- Michael Fernández
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group, Matanzas, Cuba
- b Kyushu Institute of Technology (KIT), Department of Bioscience and Bioinformatics, Iizuka, Fukuoka, Japan
- Leyden Fernández
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group, Matanzas, Cuba
- Pedro Sánchez
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group, Matanzas, Cuba
- c Faculty of Informatics, University of Matanzas, Artificial Intelligence Lab, Matanzas, Cuba
- Julio Caballero
- d Centro de Bioinformática y Simulación Molecular, Universidad de Talca, Talca, Chile
- Jose Ignacio Abreu
- a Faculty of Agronomy, Center for Biotechnological Studies, University of Matanzas, Molecular Modeling Group, Matanzas, Cuba
- c Faculty of Informatics, University of Matanzas, Artificial Intelligence Lab, Matanzas, Cuba
21
Xiao X, Lin WZ, Chou KC. Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J Comput Chem 2008; 29:2018-24. [PMID: 18381630 DOI: 10.1002/jcc.20955] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article the grey modeling approach is introduced that is particularly efficient in coping with complicated systems such as the one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted for its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition that can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introduction of the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can be also used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, protease type, among many others.
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333000, China.
|
22
|
Prediction of protein structural classes by Chou’s pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis. Amino Acids 2008; 37:415-25. [DOI: 10.1007/s00726-008-0170-2] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 07/08/2008] [Accepted: 08/03/2008] [Indexed: 10/21/2022]
|
23
|
Yan S, Wu G. Quantitative relationship between mutated amino-acid sequence of human copper-transporting ATPases and their related diseases. Mol Divers 2008; 12:119-29. [PMID: 18688737 DOI: 10.1007/s11030-008-9084-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/29/2008] [Accepted: 07/19/2008] [Indexed: 02/03/2023]
Abstract
Copper-transporting ATPase 1 and 2 (ATP7A and ATP7B) are two highly homologous P-type copper ATPase exporters. Mutations in ATP7A can lead to Menkes disease which is an X-linked disorder of copper deficiency. Mutations in ATP7B can cause Wilson disease which is an autosomal recessive disorder of copper toxicity. In this study, we attempt to build a quantitative relationship between mutated ATPase and Menkes/Wilson disease. First, we use the amino-acid distribution probability as a measure to quantify the difference in ATPase before and after mutation. Second, we use the cross-impact analysis to define the quantitative relationship between mutant ATPase protein and Menkes/Wilson disease, and compute various probabilities. Finally, we use the Bayesian equation to determine the probability that Menkes/Wilson disease is diagnosed under a mutation. The results show (i) the vast majority of mutations lead to the amino-acid distribution probability increase in mutant ATP7As and decrease in ATP7Bs, and (ii) the probability that a mutation causes Menkes/Wilson disease is about nine tenth. Thus we provide a way to use the descriptively probabilistic method to couple the mutation with its clinical outcome after quantifying mutations in proteins.
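The final step of this abstract, applying the Bayesian equation to obtain the probability of disease given a mutation, can be illustrated with a minimal sketch. The function name, its parameters, and the input values are hypothetical and are not taken from the paper; the sketch only shows the general form of Bayes' rule for a two-outcome case.

```python
def posterior_probability(p_mut_given_disease, p_disease, p_mut_given_no_disease):
    """Bayes' rule for P(disease | mutation) with two outcomes.

    All three inputs are illustrative placeholders, not values from the study.
    """
    p_no_disease = 1.0 - p_disease
    # Total probability of observing the mutation (the evidence term).
    evidence = (p_mut_given_disease * p_disease
                + p_mut_given_no_disease * p_no_disease)
    if evidence == 0:
        raise ValueError("the mutation has zero probability under both outcomes")
    return p_mut_given_disease * p_disease / evidence


# Hypothetical inputs chosen only to exercise the formula.
print(posterior_probability(0.9, 0.5, 0.1))
```

The study's own likelihoods come from its cross-impact analysis, which this sketch does not attempt to reproduce.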
Affiliation(s)
- Shaomin Yan
- Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi, 530007, China
|
24
|
Dea-Ayuela MA, Pérez-Castillo Y, Meneses-Marcel A, Ubeira FM, Bolas-Fernández F, Chou KC, González-Díaz H. HP-Lattice QSAR for dynein proteins: experimental proteomics (2D-electrophoresis, mass spectrometry) and theoretic study of a Leishmania infantum sequence. Bioorg Med Chem 2008; 16:7770-6. [PMID: 18662882 DOI: 10.1016/j.bmc.2008.07.023] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 01/25/2008] [Revised: 06/23/2008] [Accepted: 07/02/2008] [Indexed: 10/21/2022]
Abstract
The toxicity and inefficacy of actual organic drugs against Leishmaniosis justify research projects to find new molecular targets in Leishmania species including Leishmania infantum (L. infantum) and Leishmania major (L. major), both important pathogens. In this sense, quantitative structure-activity relationship (QSAR) methods, which are very useful in Bioorganic and Medicinal Chemistry to discover small-sized drugs, may help to identify not only new drugs but also new drug targets, if we apply them to proteins. Dyneins are important proteins of these parasites governing fundamental processes such as cilia and flagella motion, nuclear migration, organization of the mitotic spindle, and chromosome separation during mitosis. However, despite the interest for them as potential drug targets, so far there has been no report whatsoever on dyneins with QSAR techniques. To the best of our knowledge, we report here the first QSAR for dynein proteins. We used as input the Spectral Moments of a Markov matrix associated to the HP-Lattice Network of the protein sequence. The data contain 411 protein sequences of different species selected by ClustalX to develop a QSAR that correctly discriminates on average between 92.75% and 92.51% of dyneins and other proteins in four different train and cross-validation datasets. We also report a combined experimental and theoretic study of a new dynein sequence in order to illustrate the utility of the model to search for potential drug targets with a practical example. First, we carried out a 2D-electrophoresis analysis of L. infantum biological samples. Next, we excised from 2D-E gels one spot of interest belonging to an unknown protein or protein fragment in the region M<20,200 and pI<4. We used the MASCOT search engine to find proteins in the L. major database with the highest similarity score to the MS of the protein isolated from L. infantum. We used the QSAR model to predict the new sequence as dynein with probability of 99.99% without relying upon alignment. In order to confirm the previous function annotation we predicted the sequences as dynein with BLAST and the omniBLAST tools (96% alignment similarity to dyneins of other species). Using this combined strategy, we have successfully identified L. infantum protein containing dynein heavy chain, and illustrated the potential use of the QSAR model as a complement to alignment tools.
|
25
|
Fernández M, Fernández L, Caballero J, Abreu JI, Reyes G. Proteochemometric Modeling of the Inhibition Complexes of Matrix Metalloproteinases with N-Hydroxy-2-[(Phenylsulfonyl)Amino]Acetamide Derivatives Using Topological Autocorrelation Interaction Matrix and Model Ensemble Averaging. Chem Biol Drug Des 2008; 72:65-78. [DOI: 10.1111/j.1747-0285.2008.00675.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
|
26
|
Dai Y, Zhang X, Zhang X, Wang H, Lu Z. DFT and GA studies on the QSAR of 2-aryl-5-nitro-1H-indole derivatives as NorA efflux pump inhibitors. J Mol Model 2008; 14:807-12. [PMID: 18575902 DOI: 10.1007/s00894-008-0328-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/18/2008] [Accepted: 05/30/2008] [Indexed: 10/21/2022]
Abstract
The structures of 2-aryl-5-nitro-1H-indole derivatives were optimized with PM3 and DFT at the B3LYP/6-31G* level successively. Some structural and electric descriptors were obtained from the single point energy calculation and natural bond orbital analysis at the B3LYP/6-31G* level. As efflux pump inhibitors, a QSAR model was built with genetic algorithm (GA) and partial least squares (PLS) analyses. The high R(2) and R(2)CV indicate the derived model has a good predictive power which can be used in prediction of activity for new 2-aryl-5-nitro-1H-indole derivatives. This model shows that the activity of 2-aryl-5-nitro-1H-indole derivatives as efflux pump inhibitors can be improved by properly increasing the molecular volume and Mulliken atomic charge of C(3)(Q(C3)) or lowering the dipole and Mulliken atomic charge of C(4)(Q(C4)) in 2-aryl. It was also found that a QSAR relationship can be built for small samples with large descriptors by compressing the descriptors with GA and analyzing with PLS. With this model, a new compound, 2-(2-Azidomethyl-5-phenoxy-phenyl)-5-nitro-1H-indole, was predicted to lower the MIC of berberine to 0.091 µg/mL for inhibiting K2361 of S. aureus with NorA efflux pump protein overexpression. Figure: Basic structure of 2-aryl-5-nitro-1H-indoles.
Affiliation(s)
- Yujie Dai
- Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, People's Republic of China.
|
27
|
Prediction of protein structure class by coupling improved genetic algorithm and support vector machine. Amino Acids 2008; 35:581-90. [DOI: 10.1007/s00726-008-0084-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 12/10/2007] [Accepted: 01/31/2008] [Indexed: 10/22/2022]
|
28
|
An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins. Amino Acids 2008; 36:167-75. [DOI: 10.1007/s00726-008-0044-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/30/2007] [Accepted: 02/07/2008] [Indexed: 10/22/2022]
|
29
|
Fernández M, Fernández L, Abreu JI, Garriga M. Classification of voltage-gated K(+) ion channels from 3D pseudo-folding graph representation of protein sequences using genetic algorithm-optimized support vector machines. J Mol Graph Model 2008; 26:1306-14. [PMID: 18289899 DOI: 10.1016/j.jmgm.2008.01.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/31/2007] [Revised: 01/03/2008] [Accepted: 01/03/2008] [Indexed: 11/26/2022]
Abstract
Voltage-gated K(+) ion channels (VKCs) are membrane proteins that regulate the passage of potassium ions through membranes. This work reports a classification scheme of VKCs according to the signs of three electrophysiological variables: activation threshold voltage (V(t)), half-activation voltage (V(a50)) and half-inactivation voltage (V(h50)). A novel 3D pseudo-folding graph representation of protein sequences encoded the VKC sequences. Amino acid pseudo-folding 3D distances count (AAp3DC) descriptors, calculated from the Euclidean distances matrices (EDMs) were tested for building the classifiers. Genetic algorithm (GA)-optimized support vector machines (SVMs) with a radial basis function (RBF) kernel well discriminated between VKCs having negative and positive/zero V(t), V(a50) and V(h50) values with overall accuracies about 80, 90 and 86%, respectively, in crossvalidation test. We found contributions of the "pseudo-core" and "pseudo-surface" of the 3D pseudo-folded proteins to the discrimination between VKCs according to the three electrophysiological variables.
Affiliation(s)
- Michael Fernández
- Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
|
30
|
Nanni L, Lumini A. Combing ontologies and dipeptide composition for predicting DNA-binding proteins. Amino Acids 2008; 34:635-41. [PMID: 18175049 DOI: 10.1007/s00726-007-0016-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 11/18/2007] [Accepted: 12/06/2007] [Indexed: 12/11/2022]
Abstract
Given a novel protein it is very important to know if it is a DNA-binding protein, because DNA-binding proteins participate in the fundamental role to regulate gene expression. In this work, we propose a parallel fusion between a classifier trained using the features extracted from the gene ontology database and a classifier trained using the dipeptide composition of the protein. As classifiers the support vector machine (SVM) and the 1-nearest neighbour are used. Matthews's correlation coefficient obtained by our fusion method is approximately 0.97 when the jackknife cross-validation is used; this result outperforms the best performance obtained in the literature (0.924) using the same dataset where the SVM is trained using only the Chou's pseudo amino acid based features. In this work also the area under the ROC-curve (AUC) is reported and our results show that the fusion permits to obtain a very interesting 0.995 AUC. In particular we want to stress that our fusion obtains a 5% false negative with a 0% of false positive. Matthews's correlation coefficient obtained using the single best GO-number is only 0.7211 and hence it is not possible to use the gene ontology database as a simple lookup table. Finally, we test the complementarity of the two tested feature extraction methods using the Q-statistic. We obtain the very interesting result of 0.58, which means that the features extracted from the gene ontology database and the features extracted from the amino acid sequence are partially independent and that their parallel fusion should be studied more.
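Two quantities named in this abstract, the dipeptide composition of a sequence and Matthews's correlation coefficient, have standard definitions that a short sketch can make concrete. The function names, the handling of non-standard residues, and the example inputs below are assumptions made for illustration; the sketch does not reproduce the authors' feature pipeline, classifiers, or fusion rule.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of adjacent-pair frequencies.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews's correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


features = dipeptide_composition("ACAC")  # toy sequence, not real data
print(len(features), features[DIPEPTIDES.index("AC")])
print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))
```

A feature vector of this kind would typically be passed to a classifier such as an SVM; that step is omitted here because the abstract does not give the training details.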
Affiliation(s)
- Loris Nanni
- DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy.
|