1
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]
Abstract
Alignment-free (AF) methodologies have grown in popularity over the last decades as alternatives to alignment-based (AB) algorithms for comparative sequence analysis. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular AF methodologies, and their applications to classification problems, have been described in previous reviews. However, a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in remote homology detection, yet these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first review the most popular AF approaches used for detecting homology signals within the twilight zone, and then present the state-of-the-art tools that encode graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the trend of integrating AF features/measures with AB ones, either within the same prediction model or by combining the predictions of different algorithms through voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and less well-known ones, we report our own with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone using several similarity criteria.
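As a rough illustration of the alignment-free idea discussed in this abstract, the sketch below compares sequences by their k-mer composition, a generic AF measure. It is not the graph theory-derived descriptors of the paper, nor the method of SeqDivA or any other named tool; the function names, the choice of k = 2, the cosine measure, and the toy peptide strings are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a sequence (k = 2 is an arbitrary choice)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy peptide strings, invented for illustration only.
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRNISFVKAHFSRQ"  # seq_a with two substitutions
seq_c = "GGSGGSGGSGGSGGSGGSGGSG"  # unrelated low-complexity repeat

sim_ab = cosine_similarity(kmer_counts(seq_a), kmer_counts(seq_b))
sim_ac = cosine_similarity(kmer_counts(seq_a), kmer_counts(seq_c))
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

No alignment is computed at any point: each sequence is reduced to a fixed-length numeric profile, and the profiles are compared directly.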
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
2
Relationship between faecal microbiota and plasma metabolome in rats fed NK603 and MON810 GM maize from the GMO90+ study. Food Chem Toxicol 2019; 131:110547. [DOI: 10.1016/j.fct.2019.05.055] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/02/2019] [Revised: 05/14/2019] [Accepted: 05/29/2019] [Indexed: 12/19/2022]
3
Agüero-Chapin G, Pérez-Machado G, Molina-Ruiz R, Pérez-Castillo Y, Morales-Helguera A, Vasconcelos V, Antunes A. TI2BioP: Topological Indices to BioPolymers. Its practical use to unravel cryptic bacteriocin-like domains. Amino Acids 2010; 40:431-42. [PMID: 20563611 DOI: 10.1007/s00726-010-0653-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/10/2010] [Accepted: 06/02/2010] [Indexed: 02/04/2023]
Abstract
Bacteriocins are proteinaceous toxins produced and exported by both gram-negative and gram-positive bacteria as a defense mechanism. The bacteriocin protein family is highly diverse, which complicates the identification of bacteriocin-like sequences using alignment approaches. Topological indices (TIs), which do not depend on sequence similarity, are a promising alternative for predicting proteinaceous bacteriocins. We therefore present Topological Indices to BioPolymers (TI2BioP), an alignment-free approach inspired by both the Topological Substructural Molecular Design (TOPS-MODE) and Markov Chain Invariants for Network Selection and Design (MARCH-INSIDE) methodologies. TI2BioP calculates spectral moments as simple TIs for building quantitative sequence-function relationship (QSFR) models. Since hydrophobicity and basicity are major criteria for the bactericidal activity of bacteriocins, the spectral moments ((HP)μ(k)) were derived for the first time from artificial protein secondary structures based on clustering amino acids into a Cartesian system of hydrophobicity and polarity. Several orders of (HP)μ(k) numerically characterized 196 bacteriocin-like sequences and a control group of 200 representative CATH domains. They were then used to develop an alignment-free QSFR model that achieved 76.92% discrimination of bacteriocin proteins from other domains, a relevant result considering the high sequence diversity among the members of both groups. The model showed an overall prediction performance of 72.16%, specifically detecting 66.7% of proteinaceous bacteriocins, whereas InterProScan retrieved just 60.2%. As a practical validation, the model also successfully predicted the cryptic bactericidal function of the Cry1Ab C-terminal domain from the Bacillus thuringiensis endotoxin, which had not been detected by classical alignment methods.
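The spectral moments mentioned in this abstract can be sketched under the usual graph-theoretic definition, in which the k-th spectral moment of a matrix M is the trace of its k-th power, μ(k) = Tr(M^k). The abstract does not say how TI2BioP builds or weights its matrix, so the small matrix below (a three-node path graph with made-up diagonal weights) and the function names are hypothetical, and the code should be read as an illustration of the definition rather than of the tool.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def spectral_moments(matrix, max_order):
    """Return [Tr(M^1), Tr(M^2), ..., Tr(M^max_order)]."""
    moments = []
    power = [row[:] for row in matrix]  # starts as M^1
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(len(matrix))))
        power = mat_mul(power, matrix)
    return moments


# Hypothetical weighted adjacency matrix for a 3-node path graph.
# Off-diagonal 1.0 marks an edge; the diagonal values are invented weights
# standing in for a per-node property (an assumption, not TI2BioP's scheme).
M = [
    [0.5, 1.0, 0.0],
    [1.0, -0.2, 1.0],
    [0.0, 1.0, 0.3],
]

mu = spectral_moments(M, 3)
print([round(value, 3) for value in mu])
```

Each moment is a single number per sequence graph, so a handful of orders gives a fixed-length descriptor vector that can feed a classifier regardless of sequence length.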
Affiliation(s)
- Guillermín Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123, Porto, Portugal
4
Hayakawa T, Sato S, Iwamoto S, Sudo S, Sakamoto Y, Yamashita T, Uchida M, Matsushima K, Kashino Y, Sakai H. Novel strategy for protein production using a peptide tag derived from Bacillus thuringiensis Cry4Aa. FEBS J 2010; 277:2883-91. [DOI: 10.1111/j.1742-4658.2010.07704.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
5
Ghosh S, Thakur MK. Overproduction of mouse estrogen receptor alpha-ligand binding domain decreases bacterial growth. Mol Biol Rep 2007; 35:589-94. [PMID: 17786586 DOI: 10.1007/s11033-007-9128-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/12/2007] [Accepted: 08/14/2007] [Indexed: 11/27/2022]
Abstract
Escherichia coli (E. coli) is the most widely used prokaryotic host system for the synthesis of recombinant proteins. The overproduction of recombinant proteins is sometimes lethal to the host cells. In the present study, we expressed the ligand binding domain (LBD) of mouse estrogen receptor alpha (mouse ERalpha) using an expression vector (pIVEX) in E. coli BL21(DE3) and examined the effect of producing this protein on bacterial growth. The expressed protein was detected immunologically as a 30 kDa histidine-tagged protein in the soluble fraction of the bacterial lysate. The overproduction of mouse ERalpha-LBD, as reflected by total protein content and expression pattern, resulted in decreased bacterial growth.
Affiliation(s)
- Swati Ghosh
- Biochemistry and Molecular Biology Laboratory, Center of Advanced Study in Zoology, Banaras Hindu University, Varanasi, 221005, India
6
Itsko M, Zaritsky A. Exposing cryptic antibacterial activity in Cyt1Ca from Bacillus thuringiensis israelensis by genetic manipulations. FEBS Lett 2007; 581:1775-82. [DOI: 10.1016/j.febslet.2007.03.064] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/04/2007] [Revised: 03/22/2007] [Accepted: 03/22/2007] [Indexed: 10/23/2022]
7
González-Díaz H, Agüero-Chapin G, Varona J, Molina R, Delogu G, Santana L, Uriarte E, Podda G. 2D-RNA-coupling numbers: A new computational chemistry approach to link secondary structure topology with biological function. J Comput Chem 2007; 28:1049-56. [PMID: 17279496 DOI: 10.1002/jcc.20576] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022]
Abstract
Methods for predicting the function of proteins, DNA, or RNA and mapping it onto the sequence often rely on bioinformatics alignment approaches rather than on chemical structure. Consequently, it is of interest to develop computational chemistry approaches based on molecular descriptors. In this sense, many researchers have used sequence-coupling numbers, and our group extended them to 2D protein representations. However, no coupling numbers have been reported for 2D-RNA topology graphs, which are highly branched and contain useful information. Here, we use a computational chemistry scheme: (a) transforming sequences into RNA secondary structures, (b) defining and calculating new 2D-RNA-coupling numbers, (c) seeking a structure-function model, and (d) mapping biological function onto the folded RNA. As an example, we studied 1-aminocyclopropane-1-carboxylic acid (ACC) oxidases, known as ACOs, which control fruit ripening and are important for the biotechnology industry. First, we calculated τ(k)(2D-RNA) values for a set of 90 folded RNAs, including 28 ACO transcripts and control sequences. We then compared the classification performance of 10 different classifiers implemented in the WEKA software. In particular, the logistic equation ACO = 23.8 · τ(1)(2D-RNA) + 41.4 predicts ACOs with 98.9%, 98.0%, and 97.8% accuracy in training, leave-one-out, and 10-fold cross-validation, respectively. With this equation, we then predicted ACO function for a sequence isolated in this work from Coffea arabica (GenBank accession DQ218452). The τ(1)(2D-RNA) descriptor also compares favorably with other descriptors. The equation allows us to map the encoding of ACO activity onto different mRNA topology features. The present computational chemistry approach is general and could be extended to connect RNA secondary structure topology to other functions.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain.
8
Yudina TG, Brioukhanov AL, Zalunin IA, Revina LP, Shestakov AI, Voyushina NE, Chestukhina GG, Netrusov AI. Antimicrobial activity of different proteins and their fragments from Bacillus thuringiensis parasporal crystals against clostridia and archaea. Anaerobe 2006; 13:6-13. [PMID: 17126041 DOI: 10.1016/j.anaerobe.2006.09.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 07/15/2006] [Revised: 09/26/2006] [Accepted: 09/27/2006] [Indexed: 10/23/2022]
Abstract
Proteins of parasporal crystals (Cry proteins) from the entomopathogenic bacterium Bacillus thuringiensis (subspecies kurstaki, galleriae, tenebrionis), as well as some fragments of these proteins obtained by limited proteolysis, have antimicrobial activity against anaerobic bacteria and archaea: Clostridium butyricum, Clostridium acetobutylicum and Methanosarcina barkeri. The MICs are 45-150 µg/mL. Electron microscopy showed that lysis of M. barkeri cells in the presence of the 49 kDa fragment of Cry3Aa toxin is generally similar to the bacterial cell lysis previously observed in the presence of Cry11A, Cry1Ab and other Cry proteins. The Cry1D-like toxin from crystals of B. thuringiensis subsp. galleriae is put forward as an example supporting the supposition that the cell wall and some of its components, such as teichoic acid and N-acetylgalactosamine, may influence Cry toxins, enhancing their antimicrobial activity. The possible ecological role of the antimicrobial activity of Cry proteins is also discussed.
Affiliation(s)
- Tatyana G Yudina
- Department of Microbiology, Biological Faculty, M.V. Lomonosov Moscow State University, Leninskie Gory 1/12, 119992 Moscow, Russian Federation
9
Agüero-Chapin G, González-Díaz H, Molina R, Varona-Santos J, Uriarte E, González-Díaz Y. Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L. FEBS Lett 2006; 580:723-30. [PMID: 16413021 DOI: 10.1016/j.febslet.2005.12.072] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 09/28/2005] [Revised: 12/19/2005] [Accepted: 12/21/2005] [Indexed: 10/25/2022]
Abstract
The development of 2D graph-theoretic representations for DNA sequences has been very important for the qualitative and quantitative comparison of sequences. Calculating numeric features for these representations is useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This has limited the introduction of useful 2D Cartesian representations and the corresponding sequence descriptors for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acidic, basic, polar and non-polar. Identifying each group with one of the four axis directions yields a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (ζ(k)). In this work, we calculated the ζ(k) values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36 · ζ1 − 3.98 · ζ3 − 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of a subset of 81 PG and 75 non-PG protein sequences used to train the model. It also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. Different amino acid groupings and/or axis orientations give different results, so exploring them is suggested for other databases. Finally, to illustrate the use of the model, we report the isolation of a novel sequence (AY908988) from Psidium guajava L. by our group and the prediction of its PG activity. This prediction agrees very well with the sequence alignment results obtained with BLAST. These findings illustrate the possibilities of the sequence descriptors derived from this novel 2D sequence representation for protein sequence QSAR studies.
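The 2D representation described in this abstract can be sketched as a lattice walk in which each residue moves one unit along the axis direction assigned to its group. The abstract names the four groups but gives neither their exact membership nor which direction each takes, so the grouping table and the direction assignment below are assumptions chosen for illustration, as are the function names. The Markov-model step that turns the map into coupling numbers is not specified in the abstract and is not attempted; `pg_score` only evaluates the reported discriminant equation for coupling numbers supplied by the caller.

```python
# Assumed membership of the four classes (not taken from the paper).
GROUP_OF = {}
for residues, group in (
    ("DE", "acid"),
    ("KRH", "basic"),
    ("STNQCYG", "polar"),
    ("AVLIMFWP", "nonpolar"),
):
    for residue in residues:
        GROUP_OF[residue] = group

# Assumed axis direction for each class (not taken from the paper).
STEP = {
    "acid": (-1, 0),
    "basic": (1, 0),
    "polar": (0, -1),
    "nonpolar": (0, 1),
}


def walk_2d(sequence):
    """Return the list of lattice points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for residue in sequence.upper():
        dx, dy = STEP[GROUP_OF[residue]]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def pg_score(zeta1, zeta3):
    """Evaluate the discriminant equation reported in the abstract."""
    return 5.36 * zeta1 - 3.98 * zeta3 - 42.21


# Toy peptide, invented for illustration only.
print(walk_2d("MKDESA"))
```

Because several residues can land on the same lattice point, the map is a graph rather than a simple curve, which is why the abstract notes that a different grouping or axis orientation gives different results.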
10
Molecular approaches for identification and construction of novel insecticidal genes for crop protection. World J Microbiol Biotechnol 2005. [DOI: 10.1007/s11274-005-9027-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
11
González-Díaz H, Agüero-Chapin G, Varona-Santos J, Molina R, de la Riva G, Uriarte E. 2D RNA-QSAR: assigning ACC oxidase family membership with stochastic molecular descriptors; isolation and prediction of a sequence from Psidium guajava L. Bioorg Med Chem Lett 2005; 15:2932-7. [PMID: 15878661 DOI: 10.1016/j.bmcl.2005.03.017] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 12/13/2004] [Revised: 03/03/2005] [Accepted: 03/04/2005] [Indexed: 11/17/2022]
Abstract
Quantitative structure-activity relationship (QSAR) techniques for small molecules could be applied to nucleic acids. Unfortunately, almost all molecular descriptors are better at encoding branching information than sequences and/or cannot be back-projected. One way to scale the QSAR problem up to RNA may be to first transform sequences into secondary structures. Our group has used Markovian negentropies as molecular descriptors for drug design, with preliminary results in bioinformatics [Bioinformatics 2003, 19, 2079]. However, QSAR studies on RNA molecules have not been described to date. Novel Markovian negentropies are introduced here as molecular descriptors for 2D-RNA structures. An RNA-QSAR study of the ACC proteins from different plants has been carried out. The QSAR model recognizes 19/20 sequences (95.0%) within the ACC family and 12/17 (70.6%) of the control group sequences. The model has a high Matthews correlation coefficient (C = 0.68). Overall cross-validation average accuracies were 14 out of 15 for ACC sequences (93.3%) and 10 out of 13 for control sequences (76.9%). Finally, ACC oxidase family membership was assigned to a new sequence isolated for the first time in this work from Psidium guajava L. A backprojection map for this sequence identifies the left stem (40%) and the main stem (45%) as highly important substructures. Results of an nBLAST experiment are consistent with this finding and indicate a high conservation score (>70) for the left stem and main stem, whereas the major loop, right stem, cap and major loop right half were hardly conserved.
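The coefficient reported in this abstract (C = 0.68) can be reproduced from the stated counts if it is read as the standard Matthews correlation coefficient. The sketch below assumes that reading and treats the ACC family as the positive class: 19 of 20 ACC sequences recognized gives 19 true positives and 1 false negative, and 12 of 17 control sequences recognized gives 12 true negatives and 5 false positives. The function name is illustrative.

```python
from math import sqrt


def matthews_corrcoef(tp, fn, tn, fp):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Counts taken from the abstract; ACC family assumed to be the positive class.
tp, fn = 19, 1   # ACC sequences: 19 of 20 recognized
tn, fp = 12, 5   # control sequences: 12 of 17 recognized

mcc = matthews_corrcoef(tp, fn, tn, fp)
sensitivity = 100 * tp / (tp + fn)
specificity = 100 * tn / (tn + fp)

print(round(mcc, 2))          # 0.68
print(round(sensitivity, 1))  # 95.0
print(round(specificity, 1))  # 70.6
```

The three printed values agree with the figures in the abstract, which supports reading C as the Matthews correlation coefficient.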
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Spain.