1
Li H, Naeem A, Yousaf S, Aslam A, Tchier F, Tola KA. Topological analysis and predictive modeling of amino acid structures with implications for bioinformatics and structural biology. Sci Rep 2025; 15:638. [PMID: 39753647 PMCID: PMC11699217 DOI: 10.1038/s41598-024-83697-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 12/17/2024] [Indexed: 01/06/2025]
Abstract
Amino acids, as the fundamental constituents of proteins and enzymes, play a vital role in various biological processes. Amino acids such as histidine, cysteine, and methionine are known to coordinate with metal ions in proteins and enzymes, playing critical roles in their structure and function. In metalloproteins, metal ions are often coordinated by specific amino acid residues, contributing to the protein's stability and catalytic activity. Investigating the structural properties of amino acids is paramount to understanding the intricacies of protein function and interactions. The molecular structures of amino acids are examined using topological indices that are based on both distance and degree. These indices capture unique structural features of amino acids in their molecular graphs. We have developed linear, quadratic, and logarithmic regression models to estimate five physicochemical properties of twenty-two amino acid molecules. The findings reveal novel insights into the structural determinants of amino acid properties and present efficient predictive models for various attributes. This research contributes towards better understanding amino acid structures and offers practical applications in bioinformatics, drug design, and structural biology, enhancing the ability to manipulate and comprehend the molecular world.
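As a rough illustration of what degree-based and distance-based indices measure, the sketch below computes the first Zagreb index (degree-based) and the Wiener index (distance-based) on the hydrogen-suppressed graph of glycine. The choice of these two indices, the atom labels, and the function names are assumptions made for illustration; the abstract does not state which indices the authors used.

```python
from collections import deque

# Hydrogen-suppressed molecular graph of glycine, N-CA-C(=O)-O.
# Atom labels are illustrative; bond orders are ignored.
GLYCINE = {
    "N": ["CA"],
    "CA": ["N", "C"],
    "C": ["CA", "O1", "O2"],
    "O1": ["C"],
    "O2": ["C"],
}


def first_zagreb(graph):
    """Degree-based index: sum of squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in graph.values())


def bfs_distances(graph, source):
    """Shortest-path (bond-count) distances from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener(graph):
    """Distance-based index: sum of distances over unordered vertex pairs."""
    total = sum(sum(bfs_distances(graph, v).values()) for v in graph)
    return total // 2  # each pair was counted from both endpoints


print(first_zagreb(GLYCINE), wiener(GLYCINE))  # prints: 16 18
```

Index values like these would then serve as the predictor variable in a linear, quadratic, or logarithmic regression against a measured property.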
Affiliation(s)
- Huili Li
  - School of Software, Pingdingshan University, Pingdingshan, 467000, Henan, China
  - International Joint Laboratory for Multidimensional Topology and Carcinogenic Characteristics Analysis of Atmospheric Particulate Matter PM2.5, Pingdingshan, 467000, Henan, China
- Anisa Naeem
  - Department of Mathematics, Faculty of Science, University of Gujrat, Gujrat, Pakistan
- Shamaila Yousaf
  - Department of Mathematics, Faculty of Science, University of Gujrat, Gujrat, Pakistan
- Adnan Aslam
  - Department of Natural Sciences and Humanities, University of Engineering and Technology, Lahore (RCET), Lahore, Pakistan
- Fairouz Tchier
  - Mathematics Department, College of Science, King Saud University, P.O. Box 22452, 11495, Riyadh, Saudi Arabia
- Keneni Abera Tola
  - Department of Mathematics, College of Natural and Computational Sciences, Wollega University, Nekemte, Ethiopia

2
Wise K, Phan N, Selby-Pham J, Simovich T, Gill H. Utilisation of QSPR ODT modelling and odour vector modelling to predict Cannabis sativa odour. PLoS One 2023; 18:e0284842. [PMID: 37098051 PMCID: PMC10128932 DOI: 10.1371/journal.pone.0284842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 04/11/2023] [Indexed: 04/26/2023]
Abstract
Cannabis flower odour is an important aspect of product quality as it impacts the sensory experience when administered, which can affect therapeutic outcomes in paediatric patient populations who may reject unpalatable products. However, the cannabis industry has a reputation for having products with inconsistent odour descriptions and misattributed strain names due to the costly and laborious nature of sensory testing. Herein, we evaluate the potential of using odour vector modelling for predicting the odour intensity of cannabis products. Odour vector modelling is proposed as a process for transforming routinely produced volatile profiles into odour intensity (OI) profiles, which are hypothesised to be more informative of the overall product odour (sensory descriptor; SD). However, the calculation of OI requires compound odour detection thresholds (ODT), which are not available for many of the compounds present in natural volatile profiles. Accordingly, to apply the odour vector modelling process to cannabis, a QSPR statistical model was first produced to predict ODT from physicochemical properties. The model presented herein was produced by polynomial regression with 10-fold cross-validation from 1,274 median ODT values, giving R² = 0.6892 and a 10-fold R² = 0.6484. This model was then applied to terpenes which lacked experimentally determined ODT values to facilitate vector modelling of cannabis OI profiles. Logistic regression and k-means unsupervised cluster analysis were applied to both the raw terpene data and the transformed OI profiles to predict the SD of 265 cannabis samples, and the accuracy of the predictions across the two datasets was compared. Out of the 13 SD categories modelled, OI profiles performed equally well or better than the volatile profiles for 11 of the SD, and across all SD the OI data was on average 21.9% more accurate (p = 0.031). The work herein is the first example of the application of odour vector modelling to complex volatile profiles of natural products and demonstrates the utility of OI profiles for the prediction of cannabis odour. These findings advance both the understanding of the odour modelling process, which has previously only been applied to simple mixtures, and the cannabis industry, which can utilise this process for more accurate prediction of cannabis odour and thereby reduce unpleasant patient experiences.
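The transformation from a volatile profile to an OI profile can be sketched as follows. This is a minimal sketch that assumes OI is the ratio of a compound's concentration to its ODT (the usual odour activity value); the abstract does not give the exact formula, and the compound amounts and thresholds below are placeholder numbers in arbitrary matching units, not measured data.

```python
def odour_intensity(concentrations, thresholds):
    """Assumed transform: OI_i = concentration_i / ODT_i.

    Compounds without a threshold are skipped; in the paper's workflow a
    QSPR model would supply a predicted ODT for them instead.
    """
    return {
        name: conc / thresholds[name]
        for name, conc in concentrations.items()
        if name in thresholds
    }


# Placeholder volatile profile and thresholds (arbitrary, matching units).
volatile_profile = {"myrcene": 200, "limonene": 100, "linalool": 10}
odt = {"myrcene": 50, "limonene": 200, "linalool": 1}

oi_profile = odour_intensity(volatile_profile, odt)
print(oi_profile)
# prints: {'myrcene': 4.0, 'limonene': 0.5, 'linalool': 10.0}
```

In this toy profile the least abundant compound has the highest OI, which is the kind of reweighting that could make OI profiles more informative than raw abundances.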
Affiliation(s)
- Kimber Wise
  - School of Science, RMIT University, Bundoora, Victoria, Australia
  - Nutrifield, Sunshine West, Victoria, Australia
- Nicholas Phan
  - Faculty of Science, Monash University, Clayton, Victoria, Australia
- Jamie Selby-Pham
  - School of Science, RMIT University, Bundoora, Victoria, Australia
  - Nutrifield, Sunshine West, Victoria, Australia
- Tomer Simovich
  - School of Engineering, RMIT University, Melbourne, Victoria, Australia
  - PerkinElmer Inc., Glen Waverley, Victoria, Australia
- Harsharn Gill
  - School of Science, RMIT University, Bundoora, Victoria, Australia

3
Toropov AA, Toropova AP, Benfenati E, Nicolotti O, Carotti A, Nesmerak K, Veselinović AM, Veselinović JB, Duchowicz PR, Bacelo D, Castro EA, Rasulev BF, Leszczynska D, Leszczynski J. QSPR/QSAR Analyses by Means of the CORAL Software. Pharmaceutical Sciences 2017. [DOI: 10.4018/978-1-5225-1762-7.ch036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
In this chapter, the methodology of building up quantitative structure-property/activity relationships (QSPRs/QSARs) by means of the CORAL software is described. The Monte Carlo method is the basis of this approach. Simplified Molecular Input-Line Entry System (SMILES) is used as the representation of the molecular structure. The conversion of SMILES into the molecular graph is available for QSPR/QSAR analysis using the CORAL software. The model for an endpoint is a mathematical function of the correlation weights for various features of the molecular structure. Hybrid models that are based on features extracted from both SMILES and a graph can also be built up by the CORAL software. The conceptually new ideas collected and revealed through the CORAL software are: (1) any QSPR/QSAR model is a random event; and (2) an optimal descriptor can be a translator of eclectic information into an endpoint prediction.
Affiliation(s)
- Pablo R. Duchowicz
  - Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas INIFTA (UNLP, CCT La Plata-CONICET), Argentina
- Eduardo A. Castro
  - Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas INIFTA (UNLP, CCT La Plata-CONICET), Argentina

4
Monte Carlo-based QSAR modeling of dimeric pyridinium compounds and drug design of new potent acetylcholine esterase inhibitors for potential therapy of myasthenia gravis. Struct Chem 2016. [DOI: 10.1007/s11224-016-0776-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 02/01/2023]

5
In silico prediction of the β-cyclodextrin complexation based on Monte Carlo method. Int J Pharm 2015; 495:404-409. [DOI: 10.1016/j.ijpharm.2015.08.078] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 04/20/2015] [Accepted: 08/24/2015] [Indexed: 01/24/2023]

6
Živković JV, Trutić NV, Veselinović JB, Nikolić GM, Veselinović AM. Monte Carlo method based QSAR modeling of maleimide derivatives as glycogen synthase kinase-3β inhibitors. Comput Biol Med 2015; 64:276-82. [DOI: 10.1016/j.compbiomed.2015.07.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/28/2015] [Revised: 06/28/2015] [Accepted: 07/07/2015] [Indexed: 12/23/2022]

7
Toropov AA, Toropova AP, Benfenati E, Nicolotti O, Carotti A, Nesmerak K, Veselinović AM, Veselinović JB, Duchowicz PR, Bacelo D, Castro EA, Rasulev BF, Leszczynska D, Leszczynski J. QSPR/QSAR Analyses by Means of the CORAL Software. Quantitative Structure-Activity Relationships in Drug Design, Predictive Toxicology, and Risk Assessment 2015. [DOI: 10.4018/978-1-4666-8136-1.ch015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
In this chapter, the methodology of building up quantitative structure-property/activity relationships (QSPRs/QSARs) by means of the CORAL software is described. The Monte Carlo method is the basis of this approach. Simplified Molecular Input-Line Entry System (SMILES) is used as the representation of the molecular structure. The conversion of SMILES into the molecular graph is available for QSPR/QSAR analysis using the CORAL software. The model for an endpoint is a mathematical function of the correlation weights for various features of the molecular structure. Hybrid models that are based on features extracted from both SMILES and a graph can also be built up by the CORAL software. The conceptually new ideas collected and revealed through the CORAL software are: (1) any QSPR/QSAR model is a random event; and (2) an optimal descriptor can be a translator of eclectic information into an endpoint prediction.
Affiliation(s)
- Pablo R. Duchowicz
  - Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas INIFTA (UNLP, CCT La Plata-CONICET), Argentina
- Eduardo A. Castro
  - Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas INIFTA (UNLP, CCT La Plata-CONICET), Argentina

8
Harini M, Adhikari J, Rani KY. A Review on Property Estimation Methods and Computational Schemes for Rational Solvent Design: A Focus on Pharmaceuticals. Ind Eng Chem Res 2013. [DOI: 10.1021/ie301329y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Affiliation(s)
- M. Harini
  - Department of Chemical Engineering, Indian Institute of Technology, Bombay, Mumbai-400076, India
- Jhumpa Adhikari
  - Department of Chemical Engineering, Indian Institute of Technology, Bombay, Mumbai-400076, India
- K. Yamuna Rani
  - Chemical Engineering Division, Indian Institute of Chemical Technology, Hyderabad-500607, India

9
Models for anti-tumor activity of bisphosphonates using refined topochemical descriptors. The Science of Nature - Naturwissenschaften 2011; 98:871-87. [PMID: 21892780 DOI: 10.1007/s00114-011-0839-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/25/2011] [Revised: 08/16/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
An in silico approach comprising decision tree (DT), random forest (RF) and moving average analysis (MAA) was successfully employed to develop models for predicting the anti-tumor activity of bisphosphonates. A dataset consisting of 65 analogues of both nitrogen-containing and non-nitrogen-containing bisphosphonates was selected for the present study. Four refinements of the eccentric distance sum topochemical index, termed augmented eccentric distance sum topochemical indices 1-4 [formula: see text], have been proposed so as to significantly augment discriminating power. The proposed topological indices (TIs), along with the existing TIs (>1,400), were subsequently utilized to develop models for predicting the anti-tumor activity of bisphosphonates. A total of 43 descriptors of diverse nature, drawn from a large pool of molecular descriptors calculated through E-Dragon software (version 1.0) and an in-house computer program, were selected for development of suitable models by employing DT, RF and MAA. DT identified two TIs as most important and classified the analogues of the dataset with an accuracy of 97% in the training set and 90.7% in the tenfold cross-validated set. Random forest correctly classified the analogues with an accuracy of 89.2%. Four independent models developed through MAA predicted the activity of analogues of the dataset with an accuracy of 87.6% to 89%. The statistical significance of the proposed models was assessed through intercorrelation analysis, specificity, sensitivity and the Matthews correlation coefficient. The proposed models offer a vast potential for providing lead structures for development of potent anti-tumor agents for treatment of cancer that has spread to the bone.
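The validation measures named above (sensitivity, specificity, accuracy and the Matthews correlation coefficient) all follow from a two-class confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard two-class metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of actives correctly predicted
    specificity = tn / (tn + fp)  # fraction of inactives correctly predicted
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews coefficient stays near zero for a classifier that simply predicts the majority class, which is why it is often reported alongside accuracy for unbalanced datasets.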

10
Toropova AP, Toropov AA, Benfenati E, Gini G. Simplified molecular input-line entry system and International Chemical Identifier in the QSAR analysis of styrylquinoline derivatives as HIV-1 integrase inhibitors. Chem Biol Drug Des 2011; 77:343-60. [PMID: 21352501 DOI: 10.1111/j.1747-0285.2011.01109.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
The simplified molecular input-line entry system (SMILES) and IUPAC International Chemical Identifier (InChI) were examined as representations of the molecular structure for quantitative structure-activity relationships (QSAR), which can be used to predict the inhibitory activity of styrylquinoline derivatives against the human immunodeficiency virus type 1 (HIV-1). Optimal SMILES-based descriptors give a best model with n = 26, r² = 0.6330, q² = 0.5812, s = 0.502, F = 41 for the training set and n = 10, r² = 0.7493, r²(pred) = 0.6235, R²(m) = 0.537, s = 0.541, F = 24 for the validation set. Optimal InChI-based descriptors give a best model with n = 26, r² = 0.8673, q² = 0.8456, s = 0.302, F = 157 for the training set and n = 10, r² = 0.8562, r²(pred) = 0.7715, R²(m) = 0.819, s = 0.329, F = 48 for the validation set. Thus, the InChI-based model is preferable. The described SMILES-based and InChI-based approaches have been checked with five random splits into the training and test sets.
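The reported F values can be related to n and the squared correlation coefficient with the usual regression identity F = (r²/k) / ((1 - r²)/(n - k - 1)). The sketch below assumes a least-squares model with a single descriptor (k = 1); that assumption is not stated in the abstract, although the rounded results agree with the four F values quoted there.

```python
def f_statistic(r2, n, k=1):
    """Fisher F for a least-squares model with k descriptors and n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (label, r squared, n, F as quoted in the abstract)
reported = [
    ("SMILES, training", 0.6330, 26, 41),
    ("SMILES, validation", 0.7493, 10, 24),
    ("InChI, training", 0.8673, 26, 157),
    ("InChI, validation", 0.8562, 10, 48),
]

for label, r2, n, f_quoted in reported:
    print(f"{label}: computed F = {f_statistic(r2, n):.1f}, quoted F = {f_quoted}")
```

Because F grows with both r² and n, the two validation-set values are not directly comparable with the training-set values, which come from a larger set.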
Affiliation(s)
- Alla P Toropova
  - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy

11
Souza ÉS, Kuhnen CA, Junkes BDS, Yunes RA, Heinzen VEF. Quantitative structure–retention relationship modelling of esters on stationary phases of different polarity. J Mol Graph Model 2009; 28:20-7. [DOI: 10.1016/j.jmgm.2009.03.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/23/2008] [Revised: 03/05/2009] [Accepted: 03/07/2009] [Indexed: 10/21/2022]

12
Toropov AA, Toropova AP, Benfenati E. QSAR Modelling for Mutagenic Potency of Heteroaromatic Amines by Optimal SMILES-based Descriptors. Chem Biol Drug Des 2009; 73:301-12. [DOI: 10.1111/j.1747-0285.2009.00778.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]

13
Theoretical characterization of gas–liquid chromatographic stationary phases with quantum chemical descriptors. J Chromatogr A 2009; 1216:2540-7. [DOI: 10.1016/j.chroma.2009.01.026] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 11/24/2008] [Revised: 01/07/2009] [Accepted: 01/12/2009] [Indexed: 11/19/2022]

14
Héberger K. Quantitative structure-(chromatographic) retention relationships. J Chromatogr A 2007; 1158:273-305. [PMID: 17499256 DOI: 10.1016/j.chroma.2007.03.108] [Citation(s) in RCA: 268] [Impact Index Per Article: 14.9] [Received: 01/25/2007] [Revised: 03/13/2007] [Accepted: 03/19/2007] [Indexed: 01/30/2023]
Abstract
Since the pioneering works of Kaliszan (R. Kaliszan, Quantitative Structure-Chromatographic Retention Relationships, Wiley, New York, 1987; and R. Kaliszan, Structure and Retention in Chromatography. A Chemometric Approach, Harwood Academic, Amsterdam, 1997) no comprehensive summary has been available in the field. The present review covers the period from 1996 to August 2006. The sources are grouped according to the special properties of the kinds of chromatography: quantitative structure-retention relationships in gas chromatography, in planar chromatography, in column liquid chromatography, in micellar liquid chromatography, in affinity chromatography, and quantitative structure enantioselective retention relationships. General tendencies, misleading practice and conclusions, validation of the models, and suggestions for future work are summarized for each sub-field. Some straightforward applications are emphasized but standard ones. The sources and the model compounds, descriptors, predicted retention data, modeling methods and indicators of their performance, validation of models, and stationary phases are collected in the tables. Some important conclusions are: Not all physicochemical descriptors correlate strongly with the retention data; the heat of formation is not related to the chromatographic retention. It is not appropriate to give the errors of Kovats indices in percentages. The apparently low values (1-3%) can disorient the reviewers and readers. Contemporary mean interlaboratory reproducibility of Kovats indices is about 5-10 i.u. for standard non-polar phases and 10-25 i.u. for standard polar phases. The predictive performance of QSRR models deteriorates as the polarity of the GC stationary phase increases. The correlation coefficient alone is not a particularly good indicator of model performance. Residuals are more useful than plots of measured and calculated values. There is no need to give the retention data in the form of an equation if the number of compounds is small. The domain of applicability of the models should be given in all cases.
Affiliation(s)
- Károly Héberger
  - Chemical Research Center, Hungarian Academy of Sciences, P.O. Box 17, H-1525 Budapest, Hungary

15
Balaban AT, Mills D, Kodali V, Basak SC. Complexity of chemical graphs in terms of size, branching, and cyclicity. SAR and QSAR in Environmental Research 2006; 17:429-50. [PMID: 16920663 DOI: 10.1080/10629360600884421] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/11/2023]
Abstract
Chemical graph complexity depends on many factors, but the main ones are size, branching, and cyclicity. Some molecular descriptors combine all three of these parameters, which cannot then be disentangled. The topological index J (and its refinements that account for bond multiplicity and the presence of heteroatoms) was designed to compensate in a significant measure for graph size and cyclicity, and therefore it contains information mainly on branching. In order to separate these factors, two new indices (F and G) related to J are proposed, which allow graphs of the same size to be grouped into families of constitutional formulas differing in their branching and cyclicity. A comparison with other topological indices revealed that a few of them vary similarly to index G, notably DN2S4 among the triplet indices, and TOTOP among the indices contained in the Molconn-Z program. This comparison involved all possible chemical graphs (i.e. connected planar graphs with vertex degrees not higher than four) with four through six vertices, and all possible alkanes with four through nine carbon atoms.
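For reference, the unrefined index J can be computed from the standard definition J = m/(μ + 1) · Σ (s_i s_j)^(-1/2), summed over edges, where m is the number of edges, μ = m - n + 1 is the cyclomatic number, and s_i is the sum of distances from vertex i to all other vertices. The sketch below applies this to hydrogen-suppressed C4 graphs; the function names and the integer vertex labelling are illustrative assumptions, and the new indices F and G from the paper are not implemented.

```python
import math
from collections import deque


def distance_sums(graph):
    """Sum of shortest-path distances from each vertex (connected graph)."""
    sums = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sums[source] = sum(dist.values())
    return sums


def balaban_j(graph):
    """Balaban J for a connected, unweighted graph with integer vertex labels."""
    n = len(graph)
    m = sum(len(neighbours) for neighbours in graph.values()) // 2
    mu = m - n + 1  # cyclomatic number: count of independent rings
    s = distance_sums(graph)
    edge_sum = sum(
        1.0 / math.sqrt(s[u] * s[v])
        for u in graph
        for v in graph[u]
        if u < v  # visit each undirected edge once
    )
    return m / (mu + 1) * edge_sum


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cyclobutane = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

for name, g in [("n-butane", n_butane), ("isobutane", isobutane), ("cyclobutane", cyclobutane)]:
    print(f"{name}: J = {balaban_j(g):.4f}")
```

The branched isomer gets a higher J than the linear one at the same size, consistent with the statement that J carries information mainly on branching.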
Affiliation(s)
- A T Balaban
  - Texas A&M University Galveston, Galveston, TX 77551, USA