1
Contreras-Torres E, Marrero-Ponce Y, Terán JE, Agüero-Chapin G, Antunes A, García-Jacas CR. Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions. Front Chem 2022; 10:959143. [PMID: 36277354 PMCID: PMC9585278 DOI: 10.3389/fchem.2022.959143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022]
Abstract
This study introduces a set of fuzzy spherically truncated three-dimensional (3D) multi-linear descriptors for proteins. These indices codify geometric structural information from kth spherically truncated spatial-(dis)similarity two-tuple and three-tuple tensors. The coefficients of these truncated tensors are calculated by applying a smoothing value to the 3D structural encoding based on the relationships between two and three amino acids of a protein embedded in a sphere. Because the geometrical center of the protein coincides with the center of the sphere, the distance between each amino acid involved in a specific interaction and the geometrical center of the protein can be computed. Then, the fuzzy membership degree of each amino acid with respect to a spherical region of interest is computed with fuzzy membership functions (FMFs). The truncation value is finally obtained by combining the membership degrees of the interacting amino acids, with the arithmetic mean as the fusion rule. Several FMFs with diverse biases in the calculation of amino acid memberships (e.g., Z-shaped (close to the center), PI-shaped (middle region), and A-Gaussian (far from the center)) were considered, as well as traditional truncation functions (e.g., Switching). These truncation functions were comparatively evaluated by exploring: 1) the frequency of membership degrees, 2) the variability and orthogonality among them, based on Shannon entropy and principal component analysis, respectively, and 3) the performance in alignment-free prediction of protein folding rates and structural classes. These analyses revealed the singularity of the proposed fuzzy spherically truncated MDs with respect to the classical (non-truncated) ones and to the MDs truncated with traditional functions. They also showed improved predictive power, attaining an external correlation coefficient of 95.82% in folding rate modelling and an accuracy of 100% in distinguishing structural protein classes. These outcomes are better than those attained by existing approaches, justifying the theoretical contribution of this report. Thus, the fuzzy spherically truncated protein descriptors from MuLiMs-MCoMPAs (http://tomocomd.com/mulims-mcompas) are promising alignment-free predictors for modeling protein functions and properties.
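The truncation steps described in this abstract (distance to the geometric center, fuzzy membership degree, arithmetic-mean fusion) can be illustrated with a minimal Python sketch. It is not the MuLiMs-MCoMPAs implementation: the function names, the normalization of distances by the sphere radius, the Z-shaped parameters `a` and `b`, and the use of a plain pairwise matrix as the two-tuple relation are all assumptions made for illustration.

```python
import numpy as np

def z_shaped(x, a, b):
    """Z-shaped fuzzy membership: 1 for x <= a, 0 for x >= b, smooth in between."""
    x = np.asarray(x, dtype=float)
    mid = (a + b) / 2.0
    y = np.zeros_like(x)
    y[x <= a] = 1.0
    left = (x > a) & (x <= mid)
    right = (x > mid) & (x < b)
    y[left] = 1.0 - 2.0 * ((x[left] - a) / (b - a)) ** 2
    y[right] = 2.0 * ((x[right] - b) / (b - a)) ** 2
    return y

def truncated_pair_matrix(coords, pair_matrix, a=0.2, b=0.8):
    """Scale each pairwise coefficient by the mean membership of its two residues.

    coords      : (n, 3) array, one representative point per amino acid (assumed)
    pair_matrix : (n, n) array of pairwise relations, e.g. distances (assumed)
    a, b        : hypothetical Z-shaped parameters on the normalized radius
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)                    # geometric center = sphere center
    dist = np.linalg.norm(coords - center, axis=1)  # residue-to-center distances
    radius = dist.max()
    rel = dist / radius if radius > 0 else np.zeros_like(dist)
    mu = z_shaped(rel, a, b)                        # membership degree per residue
    smoothing = (mu[:, None] + mu[None, :]) / 2.0   # arithmetic mean as fusion rule
    return smoothing * np.asarray(pair_matrix, dtype=float), mu
```

With a Z-shaped function, residues near the center keep memberships close to 1, so relations between core residues are preserved while relations between surface residues are damped toward 0. Swapping `z_shaped` for another membership function would shift that bias to the middle or outer region.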
Affiliation(s)
- Ernesto Contreras-Torres
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- BCAM - Basque Center for Applied Mathematics, Bilbao, Spain
- Yovani Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Quito, Ecuador
- *Correspondence: Yovani Marrero-Ponce, César R. García-Jacas
- Julio E. Terán
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, United States
- Guillermin Agüero-Chapin
- CIIMAR - Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Agostinho Antunes
- CIIMAR - Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- César R. García-Jacas
- Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
2
Patnode K, Rasulev B, Voronov A. Synergistic Behavior of Plant Proteins and Biobased Latexes in Bioplastic Food Packaging Materials: Experimental and Machine Learning Study. ACS APPLIED MATERIALS & INTERFACES 2022; 14:8384-8393. [PMID: 35119263 DOI: 10.1021/acsami.1c21650] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Plant-based proteins are attractive components that may serve as sustainable alternatives to current petrochemical products. Both soy protein and the major corn protein, zein, are of interest in food packaging applications due to their sustainability, biodegradation properties, and inherent physicochemical properties. This study discusses the development of bioplastic materials and explores the effects of combining zein, soy protein, and plasticizing latexes derived from plant oil-based monomers (POBMs) on the properties of the resulting bioplastic films. By looking for synergistic effects of soy protein's inherent film formation ability and zein's higher strength, we prepare strong yet flexible soy-zein films, called proteoposites. Incorporation of natural additive POBM-latexes helps to plasticize and hydrophobize the bioplastic films and thus to improve mechanical and barrier properties. Variation of the POBM-latexes' particle size further aims to enhance the performance of the resulting bioplastic films. As a result, modified soy-zein proteoposite films with improved moisture resistance, enhanced mechanical behavior, and greater barrier properties were developed. Machine learning-based computational models were used to find the main structural factors affecting the bioplastic's properties and to develop a quantitative structure-property relationship model between the physicochemical properties of the film components and the resulting bioplastics' properties and performance. The developed model effectively predicts experimental outcomes with >85% accuracy (R²: 0.85). The newly synthesized proteoposites confirmed the machine learning model predictions. As a result, proteoposite films made of two plant proteins and modified with POBM-latexes can be considered an attractive and viable replacement for petrochemical food packaging products.
Affiliation(s)
- Kristen Patnode
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102-6050, United States
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102-6050, United States
- Andriy Voronov
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102-6050, United States
3
García-Jacas CR, Marrero-Ponce Y, Vivas-Reyes R, Suárez-Lezcano J, Martinez-Rios F, Terán JE, Aguilera-Mendoza L. Distributed and multicore QuBiLS-MIDAS software v2.0: Computing chiral, fuzzy, weighted and truncated geometrical molecular descriptors based on tensor algebra. J Comput Chem 2020; 41:1209-1227. [PMID: 32058625 DOI: 10.1002/jcc.26167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/03/2019] [Revised: 01/22/2020] [Accepted: 01/26/2020] [Indexed: 12/12/2022]
Abstract
Advances in the distributed, multi-core and fully cross-platform QuBiLS-MIDAS software v2.0 (http://tomocomd.com/qubils-midas) since the v1.0 release are reported in this article. The QuBiLS-MIDAS software is the only one that computes atom-pair and alignment-free geometrical MDs (3D-MDs) from several distance metrics other than the Euclidean distance, as well as alignment-free 3D-MDs that codify structural information regarding the relations among three and four atoms of a molecule. The most recent features added to the QuBiLS-MIDAS software v2.0 are: (a) the calculation of atomic weightings from indices based on the vertex-degree invariant (e.g., Alikhanidi index); (b) the consideration of central chirality during the molecular encoding; (c) the use of measures based on clustering methods and statistical functions to codify structural information among more than two atoms; (d) the use of a novel method based on fuzzy membership functions to spherically truncate inter-atomic relations; and (e) the use of weighted and fuzzy aggregation operators to compute global 3D-MDs according to the importance and/or interrelation of the atoms of a molecule during the molecular encoding. Moreover, a novel module to compute QuBiLS-MIDAS 3D-MDs from their headings was also developed. This module can be used either through the graphical user interface or through the software library. By using the library, both the predictive models built with the QuBiLS-MIDAS 3D-MDs and the QuBiLS-MIDAS 3D-MDs calculation can be embedded in other tools. A set of predefined QuBiLS-MIDAS 3D-MDs with high information content and low redundancy on a set of 20,469 compounds is also provided for use in further cheminformatics tasks. This set of predefined 3D-MDs performed better than the entire universe of Dragon (v5.5) and PaDEL 0D-to-3D MDs in variability studies, whereas a linear independence study showed that these QuBiLS-MIDAS 3D-MDs codify chemical information orthogonal to the Dragon 0D-to-3D MDs. This set of predefined 3D-MDs will be updated periodically as new results are achieved. In general, this report highlights our continued efforts to provide a better tool for a more suitable characterization of compounds and, in this way, to contribute to better outcomes in future applications.
Affiliation(s)
- César R García-Jacas
- Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito, Pichincha, Ecuador
- Grupo GINUMED, Corporacion Universitaria Rafael Nuñez, Facultad de Salud, Programa de Medicina, Cartagena, Colombia
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
- Ricardo Vivas-Reyes
- Grupo de Química Cuántica y Teórica de la Universidad de Cartagena, Facultad de Ciencias Exactas y Naturales, Programa de Química, Campus de San Pablo, Cartagena, Colombia
- Grupo CipTec, Facultad de Ingenierias, Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- José Suárez-Lezcano
- Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
- Julio E Terán
- Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, USA
- Longendri Aguilera-Mendoza
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
4
García-Jacas CR, Marrero-Ponce Y, Brizuela CA, Suárez-Lezcano J, Martinez-Rios F. Smoothed Spherical Truncation based on Fuzzy Membership Functions: Application to the Molecular Encoding. J Comput Chem 2019; 41:203-217. [PMID: 31647589 DOI: 10.1002/jcc.26089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/01/2019] [Revised: 08/26/2019] [Accepted: 10/02/2019] [Indexed: 11/09/2022]
Abstract
A novel spherical truncation method, based on fuzzy membership functions, is introduced to truncate interatomic (or inter-amino acid) relations according to smoothing values computed from fuzzy membership degrees. In this method, each molecule is circumscribed by a sphere whose center is the geometric center of the molecule. The fuzzy membership degree of each atom (or amino acid) is computed from its distance to the geometric center of the molecule by using a fuzzy membership function. The smoothing value applied in the truncation of a relation (or interaction) is then computed by averaging the fuzzy membership degrees of the atoms (or amino acids) involved in the relation. This truncation method differs from existing ones in that it considers the geometric center of the whole molecule rather than only of atom groups, and in that it uses fuzzy membership functions to compute the smoothing values. A variability study on a set of 20,469 compounds (15,050 drug-like compounds, 2994 approved drugs, 880 natural products from African sources, and 1545 plant-derived natural compounds exhibiting anti-cancer activity) demonstrated that the proposed truncation method yields molecular encodings that discriminate among structurally different molecules better than the encodings obtained without truncation or with non-fuzzy truncation functions. Moreover, a principal component analysis revealed that the proposed method encodes orthogonal chemical information about the molecules. Lastly, a modeling study showed that the truncation method improves the modeling ability of existing geometric molecular descriptors, allowing more robust models to be developed than those built using only non-truncated descriptors. In this sense, a comparison and statistical assessment were performed on eight chemical datasets. As a result, the models based on the truncated molecular encodings yielded statistically better results than 12 procedures considered from the literature. It can thus be stated that the proposed truncation method is a relevant strategy for obtaining better molecular encodings, which will ultimately be useful in enhancing the modeling ability of existing encodings both on small-to-medium size molecules and biomacromolecules. © 2019 Wiley Periodicals, Inc.
Affiliation(s)
- César R García-Jacas
- Cátedras CONACYT-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Grupo GINUMED, Corporacion Universitaria Rafael Nuñez, Facultad de Salud, Programa de Medicina, Cartagena, Colombia
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
- Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- José Suárez-Lezcano
- Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
5
Chen M, Jabeen F, Rasulev B, Ossowski M, Boudjouk P. A computational structure–property relationship study of glass transition temperatures for a diverse set of polymers. J Polym Sci B Polym Phys 2018. [DOI: 10.1002/polb.24602] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/20/2023]
Affiliation(s)
- Min Chen
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota 58102
- Department of Computer Science, North Dakota State University, Fargo, North Dakota 58102
- Farukh Jabeen
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota 58102
- Bakhtiyor Rasulev
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota 58102
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102
- Martin Ossowski
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota 58102
- Philip Boudjouk
- Department of Chemistry and Biochemistry, North Dakota State University, Fargo, North Dakota 58102
6
Engler MS, Scheubert K, Schubert US, Böcker S. Exploring the Limits of the Geometric Copolymerization Model. Polymers (Basel) 2017; 9:E101. [PMID: 30970781 PMCID: PMC6431939 DOI: 10.3390/polym9030101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/04/2017] [Revised: 03/07/2017] [Accepted: 03/08/2017] [Indexed: 11/28/2022]
Abstract
The geometric copolymerization model is a recently introduced statistical Markov chain model. Here, we investigate its practicality. First, several approaches to identify the optimal model parameters from observed copolymer fingerprints are evaluated using Monte Carlo simulated data. Directly optimizing the parameters is robust against noise but has impractically long running times. A compromise between robustness and running time is found by exploiting the relationship between monomer concentrations calculated by ordinary differential equations and the geometric model. Second, we investigate the applicability of the model to copolymerizations beyond living polymerization and show that the model is useful for copolymerizations involving termination and depropagation reactions.
Affiliation(s)
- Martin S Engler
- Life Sciences Group, Centrum Wiskunde & Informatica, Science Park 123, 1089XG Amsterdam, The Netherlands.
- Kerstin Scheubert
- Chair of Bioinformatics, Friedrich Schiller University, Ernst-Abbe-Platz 2, 07743 Jena, Germany.
- Ulrich S Schubert
- Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstr. 10, 07743 Jena, Germany.
- Jena Center for Soft Matter (JCSM), Friedrich Schiller University Jena, Philosophenweg 7, 07743 Jena, Germany.
- Sebastian Böcker
- Chair of Bioinformatics, Friedrich Schiller University, Ernst-Abbe-Platz 2, 07743 Jena, Germany.
- Jena Center for Soft Matter (JCSM), Friedrich Schiller University Jena, Philosophenweg 7, 07743 Jena, Germany.
7
Rasulev B, Jabeen F, Stafslien S, Chisholm BJ, Bahr J, Ossowski M, Boudjouk P. Polymer Coating Materials and Their Fouling Release Activity: A Cheminformatics Approach to Predict Properties. ACS APPLIED MATERIALS & INTERFACES 2017; 9:1781-1792. [PMID: 27982587 DOI: 10.1021/acsami.6b12766] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]
Abstract
A novel cheminformatics-based approach has been employed to investigate a set of polymer coating materials designed to mitigate the accumulation of marine biofouling on surfaces immersed in the sea. Specifically, a set of 27 nontoxic, amphiphilic polysiloxane-based polymer coatings was synthesized using a combinatorial, high-throughput approach and characterized for fouling-release (FR) activity toward a number of relevant marine fouling organisms, including bacteria, microalgae, and adult barnacles. In order to model these complex systems adequately, a new computational technique was used in which all investigated polymer-based coating materials were considered as mixture systems comprising several compositional variables at a range of concentrations. By applying a combination of methodologies for mixture systems and a quantitative structure-activity relationship (QSAR) approach, seven unique QSAR models were developed that were able to successfully predict the desired FR properties. Furthermore, the developed models identified several significant descriptors responsible for FR activity of investigated polymer-based coating materials, with correlation coefficients ranging from r²(test) = 0.63 to 0.94. The computational models derived from this study may serve as a powerful set of tools to predict optimal combinations of source components to produce amphiphilic polysiloxane-based coating systems with effective, broad-spectrum FR properties.
Affiliation(s)
- Bakhtiyor Rasulev
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota, United States
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota, United States
- Farukh Jabeen
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota, United States
- Shane Stafslien
- Research and Creative Activities, North Dakota State University, Fargo, North Dakota, United States
- Bret J Chisholm
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota, United States
- James Bahr
- Research and Creative Activities, North Dakota State University, Fargo, North Dakota, United States
- Martin Ossowski
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota, United States
- Philip Boudjouk
- Center for Computationally Assisted Science and Technology, North Dakota State University, Fargo, North Dakota, United States
- Department of Chemistry and Biochemistry, North Dakota State University, Fargo, North Dakota, United States
8
Engler MS, Scheubert K, Schubert US, Böcker S. New Statistical Models for Copolymerization. Polymers (Basel) 2016; 8:E240. [PMID: 30979335 PMCID: PMC6432000 DOI: 10.3390/polym8060240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/09/2016] [Revised: 06/07/2016] [Accepted: 06/15/2016] [Indexed: 11/16/2022]
Abstract
For many years, copolymerization has been studied using mathematical and statistical models. Here, we present new Markov chain models for copolymerization kinetics: the Bernoulli and Geometric models. They model copolymer synthesis as a random process and are based on a basic reaction scheme. In contrast to previous Markov chain approaches to copolymerization, both models take variable chain lengths and time-dependent monomer probabilities into account and allow for computing sequence likelihoods and copolymer fingerprints. Fingerprints can be computed from copolymer mass spectra, potentially allowing us to estimate the model parameters from measured fingerprints. We compare both models against Monte Carlo simulations. We find that computing the models is fast and memory efficient.
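As a rough illustration of the ideas in this abstract (chain growth as a random process, variable chain lengths, and fingerprints compared against Monte Carlo simulations), the following toy Python simulation may help. It is not the authors' Bernoulli or Geometric model: the constant monomer probability `p_a`, the per-step stopping probability `p_stop`, and the reading of a "fingerprint" as the abundance of chains per monomer composition are simplifying assumptions.

```python
import random
from collections import Counter

def simulate_fingerprint(n_chains, p_a, p_stop, seed=0):
    """Toy Monte Carlo copolymer simulation (assumed, simplified model).

    Each chain grows one monomer at a time: monomer A is added with
    probability p_a, otherwise monomer B. After each addition the chain
    stops with probability p_stop, which gives variable chain lengths.
    Returns a Counter mapping (count_A, count_B) to the number of chains.
    """
    rng = random.Random(seed)
    fingerprint = Counter()
    for _ in range(n_chains):
        n_a = n_b = 0
        while True:
            if rng.random() < p_a:
                n_a += 1
            else:
                n_b += 1
            if rng.random() < p_stop:
                break
        fingerprint[(n_a, n_b)] += 1
    return fingerprint

fp = simulate_fingerprint(n_chains=10_000, p_a=0.6, p_stop=0.1)
# Most abundant compositions, as (count_A, count_B) -> number of chains
print(fp.most_common(5))
```

A model with time-dependent monomer probabilities, as the abstract describes, would replace the constant `p_a` with a value that changes as the reaction proceeds.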
Affiliation(s)
- Martin S Engler
- Chair of Bioinformatics, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany.
- Kerstin Scheubert
- Chair of Bioinformatics, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany.
- Ulrich S Schubert
- Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstr. 10, 07743 Jena, Germany.
- Jena Center for Soft Matter (JCSM), Friedrich Schiller University Jena, Philosophenweg 7, 07743 Jena, Germany.
- Sebastian Böcker
- Chair of Bioinformatics, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany.
- Jena Center for Soft Matter (JCSM), Friedrich Schiller University Jena, Philosophenweg 7, 07743 Jena, Germany.
9
Ruiz-Blanco YB, Paz W, Green J, Marrero-Ponce Y. ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins. BMC Bioinformatics 2015; 16:162. [PMID: 25982853 PMCID: PMC4432771 DOI: 10.1186/s12859-015-0586-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/15/2015] [Accepted: 04/22/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure is a key issue that requires an efficient and widely available solution. RESULTS: We here introduce ProtDCal, a new computational software suite capable of generating tens of thousands of features considering both sequence-based and 3D-structural descriptors. We demonstrate, by means of principal component analysis and Shannon entropy tests, how ProtDCal's sequence-based descriptors provide new and more relevant information not encoded by currently available servers for sequence-based protein feature generation. The wide diversity of the 3D-structure-based features generated by ProtDCal is shown to provide additional complementary information and effectively completes its general protein encoding capability. As a demonstration of the utility of ProtDCal's features, prediction models of N-linked glycosylation sites are trained and evaluated. Classification performance compares favourably with that of contemporary predictors of N-linked glycosylation sites, in spite of not using domain-specific features as input information. CONCLUSIONS: ProtDCal provides a friendly, cross-platform graphical user interface developed in the Java programming language, and is freely available at: http://bioinf.sce.carleton.ca/ProtDCal/. ProtDCal introduces local and group-based encoding, which enhances the diversity of the information captured by the computed features. Furthermore, we have shown that adding structure-based descriptors contributes non-redundant additional information to the feature-based characterization of polypeptide systems. This software is intended to provide a useful tool for general-purpose encoding of protein sequences and structures for applications in protein classification, similarity analyses and function prediction.
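The Shannon entropy test mentioned in this abstract can be sketched as follows. This is a generic illustration, not ProtDCal's procedure: the equal-width binning, the `n_bins` value, and the function name are assumptions. The idea is that a descriptor whose values spread over many bins across a dataset has higher entropy, and so carries more discriminating information, than one that is nearly constant.

```python
import numpy as np

def shannon_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of one descriptor over a dataset.

    values : 1-D sequence with the descriptor value of each protein
    n_bins : number of equal-width bins used to discretize the values (assumed)
    """
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()   # probabilities of non-empty bins
    return float(-(p * np.log2(p)).sum())

# A spread-out descriptor versus a constant one
spread = np.arange(10)
constant = np.full(10, 3.0)
print(shannon_entropy(spread), shannon_entropy(constant))
```

The maximum attainable value is `log2(n_bins)`, reached when every bin is equally populated. Ranking descriptors by this value is one common way to compare the variability of descriptor families.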
Affiliation(s)
- Yasser B Ruiz-Blanco
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba.
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
- Waldo Paz
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba.
- Centre of Informatics Studies (CEI), Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba.
- James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
- Yovani Marrero-Ponce
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba.
- Grupo de Investigación Microbiología y Ambiente (GIMA), Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, Cartagena (Bolivar), Colombia.
10
Rodriguez-Soca Y, Munteanu CR, Dorado J, Rabuñal J, Pazos A, González-Díaz H. Plasmod-PPI: A web-server predicting complex biopolymer targets in plasmodium with entropy measures of protein–protein interactions. POLYMER 2010. [DOI: 10.1016/j.polymer.2009.11.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
11
Rodriguez-Soca Y, Munteanu CR, Dorado J, Pazos A, Prado-Prado FJ, González-Díaz H. Trypano-PPI: A Web Server for Prediction of Unique Targets in Trypanosome Proteome by using Electrostatic Parameters of Protein−protein Interactions. J Proteome Res 2009; 9:1182-90. [DOI: 10.1021/pr900827b] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023]
Affiliation(s)
- Yamilet Rodriguez-Soca
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Cristian R. Munteanu
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Julián Dorado
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Alejandro Pazos
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Francisco J. Prado-Prado
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
- Humberto González-Díaz
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
|
12
|
Munteanu CR, Vázquez JM, Dorado J, Sierra AP, Sánchez-González Á, Prado-Prado FJ, González-Díaz H. Complex Network Spectral Moments for ATCUN Motif DNA Cleavage: First Predictive Study on Proteins of Human Pathogen Parasites. J Proteome Res 2009; 8:5219-28. [DOI: 10.1021/pr900556g] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Cristian R. Munteanu
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- José M. Vázquez
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- Julián Dorado
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- Alejandro Pazos Sierra
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- Ángeles Sánchez-González
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- Francisco J. Prado-Prado
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
- Humberto González-Díaz
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, s/n 15071 A Coruña, Spain, Department of Inorganic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782 Santiago de Compostela, Spain, and Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Praza Seminario de Estudos Galegos, s/n. Campus sur, 15782
|
13
|
Pérez-Montoto LG, Dea-Ayuela MA, Prado-Prado FJ, Bolas-Fernández F, Ubeira FM, González-Díaz H. Study of peptide fingerprints of parasite proteins and drug-DNA interactions with Markov-Mean-Energy invariants of biopolymer molecular-dynamic lattice networks. POLYMER 2009; 50:3857-3870. [PMID: 32287404 PMCID: PMC7111648 DOI: 10.1016/j.polymer.2009.05.055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2009] [Revised: 05/06/2009] [Accepted: 05/14/2009] [Indexed: 11/26/2022]
Abstract
Since the advent of Molecular Dynamics (MD) in biopolymer science with the study by Karplus et al. on protein dynamics, MD has become by far the most well-established computational technique for investigating the structure and function of biomolecules and their respective complexes and interactions. The analysis of MD trajectories (MDTs) remains, however, the greatest challenge and requires a great deal of insight, experience, and effort. Here, we introduce a new class of invariants for MDTs based on the spatial distribution of Mean-Energy values ξk(L) on a 2D Euclidean space representation of the MDTs. The procedure forces one MD trajectory to fold into a 2D Cartesian coordinate system using a step-by-step procedure driven by simple rules. The ξk(L) values are invariants of a Markov matrix (¹Π), which describes the probabilities of transition between two states in the new 2D space and is associated with a graph representation of MDTs similar to the lattice networks (LNs) of DNA and protein sequences. We also introduce a new algorithm to perform phylogenetic analysis of peptides based on MDTs instead of the sequence of the polypeptide. In a first experiment, we illustrate this algorithm for 35 peptides present in the Peptide Mass Fingerprint (PMF) of a new protein of Leishmania infantum studied in this work. We report, for the first time, 2D Electrophoresis isolation, MALDI-TOF Mass Spectroscopy characterization, and MASCOT search results for this PMF. In a second experiment, we construct the LNs for 422 MDTs obtained in DNA-Drug Docking simulations of the interaction of 57 anticancer furocoumarins with a DNA oligonucleotide. We calculated the respective ξk(L) values for all these LNs and used them as inputs to train a new classifier with Accuracy = 85.44% and 84.91% in training and validation, respectively. The new model can be used as a scoring function to guide DNA-Drug Docking studies in the design of new coumarins for PUVA therapy.
The new phylogenetic analysis algorithms encode information different from sequence similarity and may be used to analyze MDTs obtained in docking or modeling experiments for any class of biopolymers. The work opens new perspectives on the analysis and applications of MD in polymer science.
Affiliation(s)
- Lázaro Guillermo Pérez-Montoto
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain, and Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- María Auxiliadora Dea-Ayuela
- Departamento de Atención Sanitaria, Salud Pública y Sanidad Animal, Facultad CC Experimentales y de La Salud, Universidad CEU Cardenal Herrera, 46113 Moncada (Valencia), Spain
- Francisco J. Prado-Prado
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain, and Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Florencio M. Ubeira
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Humberto González-Díaz
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain (corresponding author)
|
14
|
Cruz-Monteagudo M, Munteanu CR, Borges F, Cordeiro MND, Uriarte E, Chou KC, González-Díaz H. Stochastic molecular descriptors for polymers. 4. Study of complex mixtures with topological indices of mass spectra spiral and star networks: The blood proteome case. POLYMER 2008. [DOI: 10.1016/j.polymer.2008.09.070] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
|
15
|
Cruz-Monteagudo M, González-Díaz H, Borges F, Dominguez ER, Cordeiro MNDS. 3D-MEDNEs: an alternative "in silico" technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy. Chem Res Toxicol 2008; 21:619-32. [PMID: 18257557 DOI: 10.1021/tx700296t] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Low-range mass spectra (MS) characterization of the serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, because thousands of proteins are involved, finding the single disease-related protein could be a hard task. The search for a model based on general MS patterns becomes a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318-1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In that work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs to numerically encode, for the first time, biologically relevant information present in the MS of the serum proteome. Using the same idea behind QSTR techniques, we can now seek, by analogy, a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randic's spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370 100 intensity (Ii) signals with m/z bandwidth above 700-12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR proposed showed accuracy values ranging from 83.8% to 87.1% and leave-one-out (LOO) predictive ability of 77.4-85.5%. This work demonstrated that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal
|
16
|
González-Díaz H, Saíz-Urra L, Molina R, González-Díaz Y, Sánchez-González A. Computational chemistry approach to protein kinase recognition using 3D stochastic van der Waals spectral moments. J Comput Chem 2007; 28:1042-8. [PMID: 17269125 DOI: 10.1002/jcc.20649] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
Three-dimensional (3D) protein structures now frequently lack functional annotations because of the increase in the rate at which chemical structures are solved with respect to experimental knowledge of biological activity. As a result, predicting structure-function relationships for proteins is an active research field in computational chemistry and has implications in medicinal chemistry, biochemistry and proteomics. In previous studies stochastic spectral moments were used to predict protein stability or function (González-Díaz, H. et al. Bioorg Med Chem 2005, 13, 323; Biopolymers 2005, 77, 296). Nevertheless, these moments take into consideration only electrostatic interactions and ignore other important factors such as van der Waals interactions. The present study introduces a new class of 3D structure molecular descriptors for folded proteins named the stochastic van der Waals spectral moments (ᵒβₖ). Among many possible applications, recognition of kinases was selected because previous computational chemistry studies in this area have not been reported, despite the widespread distribution of kinases. The best linear model found was Kact = −9.44·ᵒβ₀(c) + 10.94·ᵒβ₅(c) − 2.40·ᵒβ₀(i) + 2.45·ᵒβ₅(m) + 0.73, where core (c), inner (i) and middle (m) refer to specific spatial protein regions. The model with a high Matthew's regression coefficient (0.79) correctly classified 206 out of 230 proteins (89.6%) including both training and predicting series. An area under the ROC curve of 0.94 differentiates our model from a random classifier. A subsequent principal components analysis of 152 heterogeneous proteins demonstrated that ᵒβₖ codifies information different from that of other descriptors used in protein computational chemistry studies.
Finally, the model recognizes 110 out of 125 kinases (88.0%) in a virtual screening experiment and this can be considered as an additional validation study (these proteins were not used in training or predicting series).
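The linear model quoted in this abstract can be evaluated directly once the four spectral moments are known. The following is a minimal sketch, not the authors' implementation: the function name and the input values are hypothetical, and the abstract does not state the decision threshold, so the sketch only returns the raw score.

```python
def kinase_score(beta0_core, beta5_core, beta0_inner, beta5_middle):
    """Evaluate the linear model from the abstract.

    Coefficients are taken from the abstract; argument names are
    illustrative labels for the moments of order 0 and 5 in the
    core (c), inner (i) and middle (m) regions.
    """
    return (-9.44 * beta0_core
            + 10.94 * beta5_core
            - 2.40 * beta0_inner
            + 2.45 * beta5_middle
            + 0.73)


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    print(kinase_score(1.0, 1.0, 1.0, 1.0))
```

How the score maps to the kinase/non-kinase classes (for example, a cut-off at zero) would have to be taken from the full paper.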
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry and Institute of Industrial Pharmacy, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain.
|
17
|
González-Díaz H, Viña D, Santana L, de Clercq E, Uriarte E. Stochastic entropy QSAR for the in silico discovery of anticancer compounds: Prediction, synthesis, and in vitro assay of new purine carbanucleosides. Bioorg Med Chem 2006; 14:1095-107. [PMID: 16253507 DOI: 10.1016/j.bmc.2005.09.039] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 05/19/2005] [Revised: 09/12/2005] [Accepted: 09/13/2005] [Indexed: 11/20/2022]
Abstract
A Markov model-based QSAR is introduced for the rational selection of anticancer compounds. The model correctly discriminates 90.3% of 226 structurally heterogeneous anticancer/non-anticancer compounds in the training series. An external validation series of 85 compounds, not used to fit the model, was used to validate it; 91.8% of these compounds were correctly classified. The model was then used in a simulated virtual search for anticancer compounds never considered in either the training or the predicting series; 87.7% of the 213 anticancer compounds used in this simulated search were correctly classified. The model also shows high values for specificity (0.89), sensitivity (0.91), and Matthews correlation coefficient (0.79). In addition, the present model performs similarly to or better than four other previously reported models when 26 comparison parameters are taken into consideration. Finally, we exemplify the use of the model in practice with the design of a new series of carbanucleosides. The compounds evaluated with the model were synthesized and experimentally assayed for their antitumor effects on the proliferation of murine leukemia cells (L1210/0) and human T-lymphocyte cells (CEM/0 and Molt4/C8). The most interesting activity was detected for compound 5a, with a predicted probability of 80.2% and IC50 = 27.0, 27.2, and 29.4 μM, respectively, against the above-mentioned cell lines. These values are comparable to those for the control compound Ara-A.
Affiliation(s)
- Humberto González-Díaz
- Department of Drug Design, Chemical Bioactives Center, Central University of Las Villas, Villa Clara, Cuba.
|
18
|
Agüero-Chapin G, González-Díaz H, Molina R, Varona-Santos J, Uriarte E, González-Díaz Y. Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L. FEBS Lett 2006; 580:723-30. [PMID: 16413021 DOI: 10.1016/j.febslet.2005.12.072] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 09/28/2005] [Revised: 12/19/2005] [Accepted: 12/21/2005] [Indexed: 10/25/2022]
Abstract
The development of 2D graph-theoretic representations for DNA sequences was very important for qualitative and quantitative comparison of sequences. Calculation of numeric features for these representations is useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This fact has limited the introduction of useful 2D Cartesian representations and the corresponding sequence descriptors to encode protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acid, basic, polar and non-polar amino acids. The identification of each group with one of the four axis directions determines a novel 2D representation and numeric descriptors for protein sequences. Afterwards, a Markov model was used to calculate new numeric descriptors of the protein sequence. These descriptors are called herein the sequence 2D coupling numbers (ζk). In this work, we calculated the ζk values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36·ζ1 − 3.98·ζ3 − 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of a subset of 81 PG and 75 non-PG protein sequences used to train the model. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. The use of different groups of amino acids and/or axis orientations gives different results, so exploring them is suggested for other databases. Finally, to illustrate the use of the model, we report the isolation and prediction of the PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L.
This prediction coincides very well with sequence alignment results found by the BLAST methodology. These findings illustrate the possibilities of the sequence descriptors derived for this novel 2D sequence representation in protein sequence QSAR studies.
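The 2D map described in this abstract (one unit step per residue, with the step direction set by the residue's group) can be sketched as below. This is only an illustration under stated assumptions: the abstract names the four groups but gives neither the membership of each group nor the axis assigned to it, so both the grouping and the directions here are hypothetical choices, and the Markov-based coupling numbers are not computed.

```python
# Hypothetical assignment of the 20 standard amino acids to the four
# groups named in the abstract (borderline residues such as G, C, Y and H
# are classified differently by different authors).
GROUPS = {
    "acid": "DE",
    "basic": "KRH",
    "polar": "STNQCYG",
    "nonpolar": "AVLIMFWP",
}

# Hypothetical axis direction for each group (unit steps in the plane).
DIRECTIONS = {
    "acid": (-1, 0),
    "basic": (1, 0),
    "polar": (0, -1),
    "nonpolar": (0, 1),
}

# One-letter code -> (dx, dy) step.
STEP = {aa: DIRECTIONS[group]
        for group, letters in GROUPS.items()
        for aa in letters}


def walk_2d(sequence):
    """Return the list of lattice points visited by the sequence walk."""
    x, y = 0, 0
    path = [(x, y)]
    for aa in sequence.upper():
        if aa not in STEP:
            raise ValueError(f"unsupported residue code: {aa!r}")
        dx, dy = STEP[aa]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(walk_2d("DKSA"))
```

As the abstract notes, a different grouping or axis orientation gives a different map, so any descriptor built on this walk depends on these choices.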
|
19
|
González-Díaz H, Uriarte E. Proteins QSAR with Markov average electrostatic potentials. Bioorg Med Chem Lett 2005; 15:5088-94. [PMID: 16169216 DOI: 10.1016/j.bmcl.2005.07.056] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 03/10/2005] [Revised: 06/28/2005] [Accepted: 07/05/2005] [Indexed: 11/30/2022]
Abstract
Classic physicochemical and topological indices have been widely used in small-molecule QSAR but less so in protein QSAR. In this study, a Markov model is used to calculate, for the first time, average electrostatic potentials ξk for an indirect interaction between amino acids placed at topological distance k within a given protein backbone. The short-term average stochastic potential ξ1 for 53 Arc repressor mutants was used to model the effect of alanine scanning on thermal stability. The Arc repressor is a model protein of relevance for biochemical studies in bioorganic and medicinal chemistry. A linear discriminant analysis model correctly classified 43 out of 53 (81.1%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) of proteins with near wild-type stability and 23/25 (92.0%) of proteins with reduced stability. Moreover, predictability in cross-validation procedures was 81.0%. Expansion of the electrostatic potential in the series ξ0, ξ1, ξ2, and ξ3 justified the use of the abrupt truncation approach, the overall accuracy being >70.0% for ξ0 but equal for ξ1, ξ2, and ξ3. The ξ1 model compared favorably with others based on D-Fire potential, surface area, volume, partition coefficient, and molar refractivity, which had less than 77.0% accuracy [Ramos de Armas, R.; González-Díaz, H.; Molina, R.; Uriarte, E. Protein Struct. Func. Bioinf. 2004, 56, 715]. The ξ1 model also has a more tractable interpretation than others based on Markovian negentropies and stochastic moments. Finally, the model is notably simpler than the two models based on quadratic and linear indices. Both of those models, reported by Marrero-Ponce et al., use four to five times more descriptors. The introduction of average stochastic potentials may be useful for QSAR applications, since ξk has an amenable physical interpretation and is very effective.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain.
|
20
|
González-Díaz H, Molina R, Uriarte E. Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett 2005; 579:4297-301. [PMID: 16081074 DOI: 10.1016/j.febslet.2005.06.065] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 11/03/2004] [Revised: 06/07/2005] [Accepted: 06/23/2005] [Indexed: 11/15/2022]
Abstract
As more and more proteins are applied to biochemical research, there is increasing interest in studying their stability. In this study, a Markov model has been used to calculate molecular descriptors of the protein structure, called the average electrostatic potentials (ξk). These descriptors were intended to encode indirect electrostatic pair-wise interactions between amino acids located at Euclidean distance k within a given 3D protein backbone. The different ξk values could be calculated for the protein as a whole or for specific protein regions (orbits), which include amino acids that lie within a given range of distances from the center of charge of the protein. In this work we calculated the ξk values for 657 mutants of different proteins. A Linear Discriminant Analysis model correctly classified a subset of 435 out of 493 proteins according to their thermal stability, a level of predictability of 88.2%. This experiment was repeated with three additional subsets of proteins selected at random from the initial series of 657. More specifically, the model predicted 314/356 (88.2%) of mutants with higher stability than the corresponding wild-type protein and 264/301 (86.7%) of proteins with near wild-type stability. These results illustrate the possibilities for the average stochastic potentials ξk in the study of 3D-structure/property relationships for biochemically relevant proteins.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain.
|