1
|
Nandy A. Mapping Biomolecular Sequences: Graphical Representations - their Origins, Applications and Future Prospects. Comb Chem High Throughput Screen 2021; 25:354-364. [PMID: 33970841 DOI: 10.2174/1386207324666210510164743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Revised: 01/25/2021] [Accepted: 02/11/2021] [Indexed: 11/22/2022]
Abstract
The exponential growth in the depositories of biological sequence data has generated an urgent need to store, retrieve and analyse the data efficiently and effectively, for which the standard practice of using alignment procedures is not adequate owing to its high demand on computing resources and time. Graphical representation of sequences has become one of the most popular alignment-free strategies for analysing biological sequences: each basic unit of a sequence (the bases adenine, cytosine, guanine and thymine for DNA/RNA, and the 20 amino acids for proteins) is plotted on a multi-dimensional grid. The resulting curve in 2D or 3D space, and the implied graph in higher dimensions, give a perception of the underlying information of the sequences through visual inspection; numerical analyses of the plots, in geometrical or matrix terms, provide a measure of comparison between sequences and thus enable the study of sequence hierarchies. The approach has also enabled comparisons of DNA sequences over many thousands of bases and provided new insights into the structure of the base composition of DNA sequences. In this article we briefly review the origins and applications of graphical representations and highlight future perspectives in this field.
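As a minimal sketch of the kind of 2D plot this abstract describes, the Python snippet below walks a DNA string over a square grid. The axis assignment (A and G along the negative and positive x-axis, C and T along the positive and negative y-axis) and the mean-coordinate descriptor are illustrative assumptions, not details stated in the abstract, and the function names are hypothetical.

```python
import math

# Assumed axis assignment (not given in the abstract): one unit step per base.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk_2d(sequence):
    """Return the grid point reached after each base, starting from the origin."""
    x, y = 0, 0
    points = []
    for base in sequence.upper():
        dx, dy = STEPS[base]  # raises KeyError on symbols outside A, C, G, T
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def graph_radius(points):
    """Distance from the origin to the mean point of the plot (a simple descriptor)."""
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    return math.hypot(mean_x, mean_y)


if __name__ == "__main__":
    pts = walk_2d("ACGT")
    print(pts)
    print(graph_radius(pts))
```

Two sequences could then be compared through the difference between their descriptors, which is one simple instance of the numerical comparison the abstract mentions.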
Affiliation(s)
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, Kolkata 700068, India
|
2
|
Fernandez-Blanco E, Rivero D, Rabuñal J, Dorado J, Pazos A, Munteanu CR. Automatic seizure detection based on star graph topological indices. J Neurosci Methods 2012; 209:410-9. [DOI: 10.1016/j.jneumeth.2012.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/08/2012] [Revised: 06/28/2012] [Accepted: 07/10/2012] [Indexed: 11/27/2022]
|
3
|
Agüero-Chapin G, Sánchez-Rodríguez A, Hidalgo-Yanes PI, Pérez-Castillo Y, Molina-Ruiz R, Marchal K, Vasconcelos V, Antunes A. An alignment-free approach for eukaryotic ITS2 annotation and phylogenetic inference. PLoS One 2011; 6:e26638. [PMID: 22046320 PMCID: PMC3202569 DOI: 10.1371/journal.pone.0026638] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/16/2011] [Accepted: 09/29/2011] [Indexed: 02/02/2023]
Abstract
The ITS2 gene class shows high sequence divergence among its members, which has complicated its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use for phylogenetic inference. Although alignment-based methods have been exploited to the limit of their complexity to tackle both issues, no alignment-free approaches have been able to address both topics successfully. By contrast, the use of simple alignment-free classifiers, such as topological indices (TIs) containing information about the sequence and structure of ITS2, may prove to be a useful approach for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from the ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred from RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using the two classes of TIs referred to above. Both models showed similar performance on the training and test sets, reaching values above 95% in overall classification. Owing to the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used for a comparative evaluation against conventional classification models based on multiple sequence alignments, such as Hidden Markov Model approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate.
Affiliation(s)
- Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Molecular Simulation and Drug Design (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, Cuba
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Pedro I. Hidalgo-Yanes
- Molecular Simulation and Drug Design (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, Cuba
- Area of Microbiology, University of León, León, Spain
- Yunierkis Pérez-Castillo
- Molecular Simulation and Drug Design (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, Cuba
- Reinaldo Molina-Ruiz
- Molecular Simulation and Drug Design (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, Cuba
- Kathleen Marchal
- CMPG, Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Vítor Vasconcelos
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
|
4
|
González-Díaz H, Prado-Prado F, Sobarzo-Sánchez E, Haddad M, Maurel Chevalley S, Valentin A, Quetin-Leclercq J, Dea-Ayuela MA, Teresa Gomez-Muños M, Munteanu CR, José Torres-Labandeira J, García-Mera X, Tapia RA, Ubeira FM. NL MIND-BEST: A web server for ligands and proteins discovery - Theoretic-experimental study of proteins of Giardia lamblia and new compounds active against Plasmodium falciparum. J Theor Biol 2011; 276:229-49. [DOI: 10.1016/j.jtbi.2011.01.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 10/04/2010] [Revised: 12/02/2010] [Accepted: 01/10/2011] [Indexed: 10/18/2022]
|
5
|
Zhang S, Wang T. A Complexity-based Method to Compare RNA Secondary Structures and its Application. J Biomol Struct Dyn 2010; 28:247-58. [PMID: 20645657 DOI: 10.1080/07391102.2010.10507357] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023]
|
6
|
Concu R, Dea-Ayuela MA, Perez-Montoto LG, Bolas-Fernández F, Prado-Prado FJ, Podda G, Uriarte E, Ubeira FM, González-Díaz H. Prediction of enzyme classes from 3D structure: a general model and examples of experimental-theoretic scoring of peptide mass fingerprints of Leishmania proteins. J Proteome Res 2009; 8:4372-82. [PMID: 19603824 DOI: 10.1021/pr9003163] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
The number of protein and peptide structures included in the Protein Data Bank (PDB) and GenBank without functional annotation has increased. Consequently, there is a high demand for theoretical models to predict these functions. Here, we trained and validated, with an external set, a Markov Chain Model (MCM) that classifies proteins by their possible mechanism of action according to Enzyme Classification (EC) number. The proposed methodology is essentially new and enables prediction of all EC classes with a single equation, without the need for an equation for each class or for nonlinear models with multiple outputs. In addition, the model may be used to predict whether a peptide makes a positive or negative contribution to the activity of the same EC class. The model predicts the first EC number for 106 out of 151 (70.2%) oxidoreductases, 178/178 (100%) transferases, 223/223 (100%) hydrolases, 64/85 (75.3%) lyases, 74/74 (100%) isomerases, and 100/100 (100%) ligases, as well as 745/811 (91.9%) nonenzymes. It is important to underline that this method may help us predict new enzyme proteins or select peptide candidates that improve enzyme activity, which may be of interest for the prediction of new drugs or drug targets. To illustrate the model's application, we report the 2D-Electrophoresis (2DE) isolation from Leishmania infantum, as well as the MALDI-TOF mass spectra characterization and theoretical study of the Peptide Mass Fingerprints (PMFs), of a new protein sequence. The theoretical study focused on MASCOT, BLAST alignment, and alignment-free QSAR prediction of the contribution of 29 peptides found in the PMF of the new protein to specific enzyme action. This combined strategy may be used to identify and predict peptides of prokaryote and eukaryote parasites and their hosts as well as other superior organisms, which may be of interest in drug development or target identification.
Affiliation(s)
- Riccardo Concu
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela, Spain
|
7
|
3D entropy and moments prediction of enzyme classes and experimental-theoretic study of peptide fingerprints in Leishmania parasites. Biochimica et Biophysica Acta - Proteins and Proteomics 2009; 1794:1784-94. [DOI: 10.1016/j.bbapap.2009.08.020] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/27/2009] [Revised: 08/07/2009] [Accepted: 08/17/2009] [Indexed: 11/21/2022]
|
8
|
González-Díaz H, Dea-Ayuela MA, Pérez-Montoto LG, Prado-Prado FJ, Agüero-Chapín G, Bolas-Fernández F, Vazquez-Padrón RI, Ubeira FM. QSAR for RNases and theoretic-experimental study of molecular diversity on peptide mass fingerprints of a new Leishmania infantum protein. Mol Divers 2009; 14:349-69. [PMID: 19578942 PMCID: PMC7088557 DOI: 10.1007/s11030-009-9178-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/03/2009] [Accepted: 06/13/2009] [Indexed: 11/29/2022]
Abstract
The toxicity and low success rate of current treatments for leishmaniosis motivate the search for new peptide drugs and/or molecular targets in pathogenic Leishmania species (L. infantum and L. major). For example, ribonucleases (RNases) are enzymes relevant to several biological processes; a theoretical and experimental study of the molecular diversity of the Peptide Mass Fingerprints (PMFs) of RNases is therefore useful for drug design. This study introduces a methodology that combines QSAR models, 2D-Electrophoresis (2D-E), MALDI-TOF Mass Spectroscopy (MS), BLAST alignment, and Molecular Dynamics (MD) to explore PMFs of RNases. We illustrate this approach by investigating for the first time the PMFs of a new protein of L. infantum. Here we report and compare new versus old predictive models for RNases based on Topological Indices (TIs) of Markov Pseudo-Folding Lattices. This group of indices, called Pseudo-folding Lattice 2D-TIs, includes: spectral moments π_k(x,y), mean electrostatic potentials ξ_k(x,y), and entropy measures θ_k(x,y). The accuracy of the models (training/cross-validation) was as follows: ξ_k(x,y)-model (96.0%/91.7%) > π_k(x,y)-model (84.7%/83.3%) > θ_k(x,y)-model (66.0%/66.7%). We also carried out a 2D-E analysis of biological samples of L. infantum promastigotes, focusing on a 2D-E gel spot of one unknown protein with M < 20,100 and pI < 7. A MASCOT search identified 20 proteins with a Mowse score > 30 but none > 52 (the threshold value); the highest value, 42, was for a probable DNA-directed RNA polymerase. However, we determined experimentally the sequence of more than 140 peptides. We used QSAR models to predict RNase scores for these peptides and BLAST alignment to confirm some results. We also calculated 3D-folding TIs based on MD experiments and compared 2D versus 3D-TIs in a molecular phylogenetic analysis of the molecular diversity of these peptides. This combined strategy may be of interest in drug development or target identification.
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology and Parasitology, and Department of Organic Chemistry, Faculty of Pharmacy, USC, 15782, Santiago de Compostela, Spain.
|
9
|
Pasha FA, Neaz MM, Cho SJ, Ansari M, Mishra SK, Tiwari S. In Silico Quantitative Structure-Toxicity Relationship Study of Aromatic Nitro Compounds. Chem Biol Drug Des 2009; 73:537-44. [DOI: 10.1111/j.1747-0285.2009.00799.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
10
|
Perez-Bello A, Munteanu CR, Ubeira FM, De Magalhães AL, Uriarte E, González-Díaz H. Alignment-free prediction of mycobacterial DNA promoters based on pseudo-folding lattice network or star-graph topological indices. J Theor Biol 2008; 256:458-66. [PMID: 18992259 PMCID: PMC7126577 DOI: 10.1016/j.jtbi.2008.09.035] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 08/06/2008] [Revised: 09/23/2008] [Accepted: 09/25/2008] [Indexed: 12/01/2022]
Abstract
The importance of promoter sequences in regulating the function of several important mycobacterial pathogens creates the need to design simple and fast theoretical models that can predict them. This work proposes two DNA promoter QSAR models based on pseudo-folding lattice network (LN) and star-graph (SG) topological indices. In addition, a comparative study with the previous RNA electrostatic parameters of thermodynamically driven secondary structure folding representations has been carried out. The best model of this work was obtained with only two LN stochastic electrostatic potentials and is characterized by an accuracy, selectivity and specificity of 90.87%, 82.96% and 92.95%, respectively. In addition, we point out the dependence of the SG results on the DNA sequence codification and propose a QSAR model based on codons and only three SG spectral moments.
Affiliation(s)
- Alcides Perez-Bello
- Department of Microbiology and Parasitology, University of Santiago de Compostela, Santiago de Compostela 15782, Spain.
|
11
|
Abstract
The relation between information and classical thermodynamic transformations is investigated. A probabilistic facet of reversible transforms is first illustrated. A method for quantifying the information associated with heating, cooling, expansion, and compression is then presented. Several applications are illustrated for a single-component van der Waals fluid. The significance is discussed in terms of dimension and measurement resolution properties, critical point behavior, and the algorithms designed for heat engines. The uncertainty in the state point position due to fluctuations is well-established for equilibrium systems. What is examined here is the uncertainty surrounding an extended locus of points. The Shannon information serves as a useful tool for quantifying this uncertainty.
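The Shannon information referred to in this abstract can be illustrated with a short Python sketch. Splitting a locus of state points into discrete probability bins is a hypothetical example chosen for illustration, not the procedure used in the paper, and the function name is an assumption.

```python
import math


def shannon_information(probabilities):
    """Shannon information H = -sum(p * log2(p)) in bits; zero-probability bins are skipped."""
    total = sum(probabilities)
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # Hypothetical case: four equally likely state points along a path.
    print(shannon_information([0.25, 0.25, 0.25, 0.25]))
    # A single certain state point carries no uncertainty.
    print(shannon_information([1.0]))
```

A wider or more finely resolved locus spreads the probability over more bins and so gives a larger value, which is the sense in which the measure quantifies uncertainty.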
Affiliation(s)
- Daniel J Graham
- Department of Chemistry, Loyola University Chicago, Chicago, IL 60626, USA.
|
12
|
Dai Q, Wang T. Use of linear regression model to compare RNA secondary structures. J Theor Biol 2008; 253:854-60. [DOI: 10.1016/j.jtbi.2008.04.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2008] [Revised: 04/17/2008] [Accepted: 04/17/2008] [Indexed: 11/25/2022]
|
13
|
González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 12/19/2022]
|
14
|
Cruz-Monteagudo M, González-Díaz H, Borges F, Dominguez ER, Cordeiro MNDS. 3D-MEDNEs: an alternative "in silico" technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy. Chem Res Toxicol 2008; 21:619-32. [PMID: 18257557 DOI: 10.1021/tx700296t] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Low-range mass spectra (MS) characterization of the serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, because thousands of proteins are involved, finding the single disease-related protein could be a hard task. The search for a model based on general MS patterns becomes a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318-1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In that work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs to numerically encode, for the first time, biologically relevant information present in the MS of the serum proteome. Using the same idea behind QSTR techniques, we can now seek by analogy a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randić's spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370,100 intensity (I_i) signals with m/z bandwidth above 700-12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR proposed showed accuracy values ranging from 83.8% to 87.1% and a leave-one-out (LOO) predictive ability of 77.4-85.5%. This work demonstrated that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal
|
15
|
|
16
|
Cruz-Monteagudo M, González-Díaz H, Agüero-Chapín G, Santana L, Borges F, Domínguez ER, Podda G, Uriarte E. Computational chemistry development of a unified free energy Markov model for the distribution of 1300 chemicals to 38 different environmental or biological systems. J Comput Chem 2007; 28:1909-23. [PMID: 17405109 DOI: 10.1002/jcc.20730] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Predicting the tissue and environmental distribution of chemicals is of major importance for the environmental and life sciences. Most of the molecular descriptors used in computational prediction of the partition behavior of chemicals consider molecular structure but ignore the nature of the partition system. Consequently, the computational models derived to date are restricted to the specific system under study. Here, a free energy-based descriptor (ΔG_k) is introduced, which circumvents this problem. Based on ΔG_k, we developed for the first time a single linear classification model to predict the partition behavior of a large number of structurally diverse drugs and other chemicals (1300) for 38 different partition systems of biological and environmental significance. The model presented training/prediction set accuracies of 91.79%/88.92%. Parametric assumptions were checked. Desirability analysis was used to explore the levels of the predictors that produce the most desirable partition properties. Finally, inversion of the partition direction for each of the 38 partition systems shows that our models correctly classified 89.08% of compounds with an uncertainty of only ±0.17%, independently of the direction of the partition process used to seek the model. Ten other classification models (linear, neural networks, and genetic algorithms) were also tested for the same purposes. None of these computational models compares favorably with the linear model, indicating that our approach captures the main aspects that govern the partition of chemicals in different systems.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto 4050-047, Porto, Portugal
|
17
|
Bielińska-Wąż D, Clark T, Wąż P, Nowak W, Nandy A. 2D-dynamic representation of DNA sequences. Chem Phys Lett 2007. [DOI: 10.1016/j.cplett.2007.05.050] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 11/26/2022]
|
18
|
González-Díaz H, Vilar S, Santana L, Podda G, Uriarte E. On the applicability of QSAR for recognition of miRNA bioorganic structures at early stages of organism and cell development: Embryo and stem cells. Bioorg Med Chem 2007; 15:2544-50. [PMID: 17300944 DOI: 10.1016/j.bmc.2007.01.050] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 10/10/2006] [Revised: 01/24/2007] [Accepted: 01/31/2007] [Indexed: 11/18/2022]
Abstract
Quantitative structure-activity relationship (QSAR) models are applied in bioorganic chemistry mainly to the study of small molecules, while applications to biopolymers remain underdeveloped. MicroRNAs (miRNAs), which are small non-coding RNAs, regulate a variety of biological processes and are good candidates for scaling up the application of QSAR to biopolymers. The propensity of a small RNA sequence to act as a miRNA depends on its secondary structure, which can be explained in terms of folding thermodynamic parameters. Thermodynamic QSAR can therefore be used, for instance, for the fast identification of miRNAs at early stages of development such as embryos and stem cells (called here esmiRNAs), and to gain insight into cellular differentiation processes and diseases such as cancer. First, we calculated folding free energies (ΔG), enthalpies (ΔH), and entropies (ΔS) as well as melting temperatures (T_m) for 2623 small RNA sequences (including 623 esmiRNAs and 2000 negative control sequences). Next, we sought a QSAR classification model: esmiRNA = 0.035 × T_m − 0.078 × ΔS − 8.748. The model correctly recognized 543 (87.2%) of esmiRNAs and 935 (93.5%) of non-esmiRNAs divided into both training and validation series. The model also recognized 908 out of 1000 additional negative control sequences. ROC curve analysis (area = 0.93) demonstrated that the present model differs significantly from a random classifier. In addition, we map the influence of thermodynamic parameters on esmiRNA activity. Last, a double-ordinate Cartesian plot of cross-validated residuals (first ordinate), standard residuals (second ordinate), and leverages (abscissa) defined the domain of applicability of the model as a square area within a ±2 band for residuals and a leverage threshold of h = 0.0074. This is the first QSAR model for the quick and accurate selection of new esmiRNAs with potential use in bioorganic and medicinal chemistry.
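The classification equation quoted in this abstract can be evaluated with a few lines of Python. This is only a sketch: the coefficients come from the abstract, but the decision rule (score above zero means esmiRNA), the units of the inputs, the example values, and all function names are assumptions made for illustration.

```python
def esmirna_score(tm, delta_s):
    """Linear score quoted in the abstract: 0.035 * Tm - 0.078 * dS - 8.748."""
    return 0.035 * tm - 0.078 * delta_s - 8.748


def is_esmirna(tm, delta_s, threshold=0.0):
    """Assumed decision rule: call a sequence an esmiRNA when the score exceeds the threshold."""
    return esmirna_score(tm, delta_s) > threshold


def percent(correct, total):
    """Percentage of correctly recognized sequences, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


if __name__ == "__main__":
    # Hypothetical inputs; the abstract does not state the units of Tm or dS.
    print(esmirna_score(70.0, -100.0), is_esmirna(70.0, -100.0))
    # Recognition rates recomputed from the counts reported in the abstract.
    print(percent(543, 623), percent(908, 1000))
```

Recomputing the rates from the reported counts is a quick consistency check on the quoted percentages.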
Affiliation(s)
- Humberto González-Díaz
- Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela 15782, Spain.
|
19
|
González-Díaz H, Agüero-Chapin G, Varona J, Molina R, Delogu G, Santana L, Uriarte E, Podda G. 2D-RNA-coupling numbers: A new computational chemistry approach to link secondary structure topology with biological function. J Comput Chem 2007; 28:1049-56. [PMID: 17279496 DOI: 10.1002/jcc.20576] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022]
Abstract
Methods for predicting the function of proteins, DNA, or RNA and mapping it onto sequence often rely on bioinformatics alignment approaches instead of chemical structure. Consequently, it is interesting to develop computational chemistry approaches based on molecular descriptors. In this sense, many researchers have used sequence-coupling numbers, and our group extended them to 2D protein representations. However, no coupling numbers have been reported for 2D-RNA topology graphs, which are highly branched and contain useful information. Here, we use a computational chemistry scheme: (a) transforming sequences into RNA secondary structures, (b) defining and calculating new 2D-RNA-coupling numbers, (c) seeking a structure-function model, and (d) mapping biological function onto the folded RNA. As an example we studied 1-aminocyclopropane-1-carboxylic acid (ACC) oxidases, known as ACO, which control fruit ripening and are important for the biotechnology industry. First, we calculated τ_k(2D-RNA) values for a set of 90 folded RNAs, including 28 transcripts of ACO and control sequences. Afterwards, we compared the classification performance of 10 different classifiers implemented in the software WEKA. In particular, the logistic equation ACO = 23.8 · τ_1(2D-RNA) + 41.4 predicts ACOs with 98.9%, 98.0%, and 97.8% accuracy in training, leave-one-out and 10-fold cross-validation, respectively. With this equation we then predicted ACO function for a sequence isolated in this work from Coffea arabica (GenBank accession DQ218452). The τ_1(2D-RNA) descriptor also compares favorably with other descriptors. This equation allows us to map the codification of ACO activity onto different mRNA topology features. The present computational chemistry approach is general and could be extended to connect RNA secondary structure topology to other functions.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain.
|