2
Agüero-Chapin G, de la Riva GA, Molina-Ruiz R, Sánchez-Rodríguez A, Pérez-Machado G, Vasconcelos V, Antunes A. Non-linear models based on simple topological indices to identify RNase III protein members. J Theor Biol 2010; 273:167-78. [PMID: 21192951 DOI: 10.1016/j.jtbi.2010.12.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/02/2010] [Revised: 11/15/2010] [Accepted: 12/13/2010] [Indexed: 01/27/2023]
Abstract
Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. The Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of amino acids into four classes of hydrophobicity and polarity, revealing higher sequence-order information beyond the amino acid composition level. The predictive power of these TIs was evaluated for the first time on the RNase III family, chosen for the high diversity of its members (in primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model and a Hidden Markov Model (HMM). The first two are alignment-free approaches that use TIs as input predictors. Their performance was compared with that of a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models performed similarly on the training and test sets, reaching overall classification rates above 90%. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM classified RNase III members simply and at low computational cost. This simplicity was evaluated with respect to the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.
Affiliation(s)
- Guillermín Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
3
Agüero-Chapin G, Pérez-Machado G, Molina-Ruiz R, Pérez-Castillo Y, Morales-Helguera A, Vasconcelos V, Antunes A. TI2BioP: Topological Indices to BioPolymers. Its practical use to unravel cryptic bacteriocin-like domains. Amino Acids 2010; 40:431-42. [PMID: 20563611 DOI: 10.1007/s00726-010-0653-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/10/2010] [Accepted: 06/02/2010] [Indexed: 02/04/2023]
Abstract
Bacteriocins are proteinaceous toxins produced and exported by both gram-negative and gram-positive bacteria as a defense mechanism. The bacteriocin protein family is highly diverse, which complicates the identification of bacteriocin-like sequences using alignment approaches. The use of topological indices (TIs) irrespective of sequence similarity can be a promising alternative for predicting proteinaceous bacteriocins. Thus, we present Topological Indices to BioPolymers (TI2BioP) as an alignment-free approach inspired by both the Topological Substructural Molecular Design (TOPS-MODE) and the Markov Chain Invariants for Network Selection and Design (MARCH-INSIDE) methodologies. TI2BioP allows the calculation of spectral moments as simple TIs to seek quantitative sequence-function relationship (QSFR) models. Since hydrophobicity and basicity are major criteria for the bactericidal activity of bacteriocins, the spectral moments ((HP)μ(k)) were derived for the first time from artificial protein secondary structures based on amino acid clustering into a Cartesian system of hydrophobicity and polarity. Several orders of (HP)μ(k) numerically characterized 196 bacteriocin-like sequences and a control group made up of 200 representative CATH domains. Subsequently, they were used to develop an alignment-free QSFR model allowing a 76.92% discrimination of bacteriocin proteins from other domains, a relevant result considering the high sequence diversity among the members of both groups. The model showed an overall prediction performance of 72.16%, specifically detecting 66.7% of proteinaceous bacteriocins, whereas InterProScan retrieved just 60.2%. As a practical validation, the model also successfully predicted the cryptic bactericidal function of the Cry 1Ab C-terminal domain from the Bacillus thuringiensis endotoxin, which has not been detected by classical alignment methods.
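The idea of deriving spectral moments from a Cartesian map of clustered amino acids can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract or from the TI2BioP software: the membership of the four classes (`CLASS_OF`), the axis direction assigned to each class (`STEP`), the use of an unweighted node adjacency matrix, and the function names `walk_adjacency` and `spectral_moments`. The only general fact relied on is that the k-th spectral moment of a matrix A is the trace of A^k.

```python
import numpy as np

# Hypothetical four-class grouping (assumption; the abstract does not list members).
CLASS_OF = {}
for residues, label in [
    ("AVLIMFWPGC", "nonpolar"),
    ("STYNQ", "polar"),
    ("DE", "acidic"),
    ("KRH", "basic"),
]:
    for residue in residues:
        CLASS_OF[residue] = label

# Assumed mapping: each class moves the walk one unit along one axis direction.
STEP = {"nonpolar": (1, 0), "polar": (0, 1), "acidic": (-1, 0), "basic": (0, -1)}


def walk_adjacency(sequence):
    """Adjacency matrix of the lattice graph traced by the sequence.

    Lattice points visited more than once are merged into a single node.
    """
    position = (0, 0)
    index = {position: 0}
    edges = set()
    for residue in sequence:
        dx, dy = STEP[CLASS_OF[residue]]
        nxt = (position[0] + dx, position[1] + dy)
        if nxt not in index:
            index[nxt] = len(index)
        edges.add(frozenset((index[position], index[nxt])))
        position = nxt
    adjacency = np.zeros((len(index), len(index)))
    for edge in edges:
        i, j = tuple(edge)
        adjacency[i, j] = adjacency[j, i] = 1.0
    return adjacency


def spectral_moments(adjacency, max_order):
    """Return [Tr(A^0), Tr(A^1), ..., Tr(A^max_order)]."""
    moments = []
    power = np.eye(adjacency.shape[0])
    for _ in range(max_order + 1):
        moments.append(float(np.trace(power)))
        power = power @ adjacency
    return moments


if __name__ == "__main__":
    print(spectral_moments(walk_adjacency("AKDESTKR"), 4))
```

In a sketch like this, the resulting vector of moments would serve as the numerical descriptors of a sequence that a classifier is trained on; the published method may differ in how the graph is built and weighted.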
Affiliation(s)
- Guillermín Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123, Porto, Portugal
8
Cruz-Monteagudo M, González-Díaz H, Borges F, Dominguez ER, Cordeiro MNDS. 3D-MEDNEs: an alternative "in silico" technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy. Chem Res Toxicol 2008; 21:619-32. [PMID: 18257557 DOI: 10.1021/tx700296t] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Low-range mass spectra (MS) characterization of the serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, because thousands of proteins are involved, finding the single disease-related protein could be a hard task. The search for a model based on general MS patterns becomes a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318-1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In that work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs to numerically encode, for the first time, biologically relevant information present in the MS of the serum proteome. Using the same idea behind QSTR techniques, we can now seek, by analogy, a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randic's spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370 100 intensity (I_i) signals with m/z bandwidth above 700-12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR models proposed showed accuracy values ranging from 83.8% to 87.1% and leave-one-out (LOO) predictive ability of 77.4-85.5%. This work demonstrated that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal
9
Agüero-Chapín G, González-Díaz H, de la Riva G, Rodríguez E, Sánchez-Rodríguez A, Podda G, Vazquez-Padrón RI. MMM-QSAR Recognition of Ribonucleases without Alignment: Comparison with an HMM Model and Isolation from Schizosaccharomyces pombe, Prediction, and Experimental Assay of a New Sequence. J Chem Inf Model 2008; 48:434-48. [DOI: 10.1021/ci7003225] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Guillermín Agüero-Chapín
- Humberto González-Díaz
- Gustavo de la Riva
- Edrey Rodríguez
- Aminael Sánchez-Rodríguez
- Gianni Podda
- Roberto I. Vazquez-Padrón
- All authors: Dipartimento Farmaco Chimico Tecnologico, Universitá Degli Studi di Cagliari, Cagliari, 09124, Italy, CAP, Faculty of Chemistry and Pharmacy, IBP, and CBQ, UCLV, Santa Clara 54830, Cuba, Unit for Bioinformatics & Connectivity Analysis (UBICA), Institute of Industrial Pharmacy and Department of Organic Chemistry, Faculty of Pharmacy, USC, Santiago de Compostela 15782, Spain, CINVESTAV-LANGEBIO, Irapuato, Guanajuato 36821, México, Caribbean Vitroplants, Santo Domingo 1464, Dominican Republic, and Vascular