1
Zhang X, Peng W, Chen H, Xing H. BnAP2-12 overexpression delays ramie flowering: evidence from AP2/ERF gene expression. Front Plant Sci 2024; 15:1367837. [PMID: 38590749 PMCID: PMC10999622 DOI: 10.3389/fpls.2024.1367837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 03/12/2024] [Indexed: 04/10/2024]
Abstract
Introduction: The APETALA2/ethylene response factor (AP2/ERF) superfamily plays a significant role in regulating plant gene expression during growth and development. To date, no studies have examined whether the ramie AP2/ERF genes are involved in the regulation of flower development. Methods: Here, 84 BnAP2/ERF members were identified from the ramie genome database, and bioinformatic data on the AP2/ERF gene family, including gene structure, replication, promoters and regulatory networks, were analysed. BnAP2-12 was transferred into Arabidopsis by the floral-dip method. Results: Phylogenetic analysis classified the 84 BnAP2/ERF members into four subfamilies: AP2 (18), RAV (3), ERF (42), and DREB (21). Functional domain analysis revealed 10 conserved motifs. Genetic mapping localised the 84 members on 14 chromosomes, among which chromosomes 1, 3, 5, and 8 had more members. Collinearity analysis revealed that 43.37% possibly resulted from replication events during the evolution of the ramie genome. Promoter sequence analysis identified cis-acting elements associated with plant growth and development, and with responses to stress, hormones, and light. Transcriptomic comparison identified 3,635 differentially expressed genes (DEGs) between male and female flowers (1,803 upregulated and 1,832 downregulated). Kyoto Encyclopaedia of Genes and Genomes pathway analysis categorised DEGs involved in metabolic pathways and biosynthesis of secondary metabolites. Gene Ontology enrichment analysis further identified enriched genes associated with pollen and female gamete formation. Of the 84 BnAP2/ERF genes, 22 were upregulated and 8 were downregulated in female flowers. Co-expression network analysis identified AP2/ERF members associated with flower development, including BnAP2-12. Subcellular localisation analysis showed that the BnAP2-12 protein is localised in the nucleus and cell membrane. Overexpression of BnAP2-12 delayed the flowering time of Arabidopsis thaliana. Conclusion: These findings provide insights into the mechanism of ramie flower development.
Affiliation(s)
- Xiaoyang Zhang
- Agricultural College of Hunan Agricultural University, Changsha, China
- Ramie Research Institute of Hunan Agricultural University, Changsha, China
- Wenxian Peng
- Ramie Research Institute of Hunan Agricultural University, Changsha, China
- Changsha Tobacco Company, Ningxiang, China
- Hao Chen
- Agricultural College of Hunan Agricultural University, Changsha, China
- Hucheng Xing
- Agricultural College of Hunan Agricultural University, Changsha, China
- Ramie Research Institute of Hunan Agricultural University, Changsha, China
- Hunan Key Laboratory of Germplasm Resources Innovation and Resource Utilization Crop Breeding Center, Changsha, China
- Hunan Provincial Engineering Technology Research Center of Grass Crop Germplasm Innovation and Utilization, Changsha, China
2
João M, Sena AC, Rebello VEF. On closing the inopportune gap with consistency transformation and iterative refinement. PLoS One 2023; 18:e0287483. [PMID: 37440507 PMCID: PMC10343097 DOI: 10.1371/journal.pone.0287483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 06/06/2023] [Indexed: 07/15/2023] Open
Abstract
The problem of aligning multiple biological sequences has fascinated scientists for a long time. Over the last four decades, tens of heuristic-based Multiple Sequence Alignment (MSA) tools have been proposed, the vast majority being built on the concept of Progressive Alignment. It is known, however, that this approach suffers from an inherent drawback regarding the inadvertent insertion of gaps when aligning sequences. Two well-known corrective solutions have frequently been adopted to help mitigate this: Consistency Transformation and Iterative Refinement. This paper takes a tool-independent technique-oriented look at the alignment quality benefits of these two strategies using problem instances from the HOMSTRAD and BAliBASE benchmarks. Eighty MSA aligners have been used to compare 4 classes of heuristics: Progressive Alignments, Iterative Alignments, Consistency-based Alignments, and Consistency-based Progressive Alignments with Iterative Refinement. Statistically, while both Consistency-based classes are better for alignments with low similarity, for sequences with higher similarity, the differences between the classes are less clear. Iterative Refinement has its own drawbacks resulting in there being statistically little advantage for Progressive Aligners to adopt this technique either with Consistency Transformation or without. Nevertheless, all 4 classes are capable of bettering each other, depending on the instance problem. This further motivates the development of MSA frameworks, such as the one being developed for this research, which simultaneously contemplate multiple classes and techniques in their attempt to uncover better solutions.
Affiliation(s)
- Mario João
- Medical Sciences College, State University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Institute of Computing, Fluminense Federal University, Niterói, Rio de Janeiro, Brazil
- Alexandre C Sena
- Institute of Mathematics and Statistics, State University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Vinod E F Rebello
- Institute of Computing, Fluminense Federal University, Niterói, Rio de Janeiro, Brazil
3
Gu J, Xu Y, Nie Y. Role of distal sites in enzyme engineering. Biotechnol Adv 2023; 63:108094. [PMID: 36621725 DOI: 10.1016/j.biotechadv.2023.108094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 11/15/2022] [Accepted: 01/01/2023] [Indexed: 01/06/2023]
Abstract
The limitations associated with natural enzyme catalysis have triggered the rise of the field of protein engineering. Traditional rational design was based on the analysis of protein structural information and catalytic mechanisms to identify key active sites or ligand binding sites to reshape the substrate pocket. The role and significance of functional sites in the active center have been studied extensively. With a deeper understanding of the structure-catalysis relationship map, the entire protein molecule can be filled with residues that play a substantial role in its structure and function. However, the catalytic mechanism underlying distal mutations remains unclear. The aim of this review was to highlight the criticality of the distal site in enzyme engineering based on the following three aspects: What can distal mutations exert on function from mutability landscape? How do distal sites influence enzyme function? How to predict and design distal mutations? This review provides insights into the catalytic mechanism of enzymes from the global interaction network, knowledge from sequence-structure-dynamics-function relationships, and strategies for distal mutation-based protein engineering.
Affiliation(s)
- Jie Gu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yao Nie
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China.
4
Roadmap to the study of gene and protein phylogeny and evolution-A practical guide. PLoS One 2023; 18:e0279597. [PMID: 36827278 PMCID: PMC9955684 DOI: 10.1371/journal.pone.0279597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 12/12/2022] [Indexed: 02/25/2023] Open
Abstract
Developments in sequencing technologies and the sequencing of an ever-increasing number of genomes have revolutionised studies of biodiversity and organismal evolution. This accumulation of data has been paralleled by the creation of numerous public biological databases through which the scientific community can mine the sequences and annotations of genomes, transcriptomes, and proteomes of multiple species. However, to find the appropriate databases and bioinformatic tools for respective inquiries and aims can be challenging. Here, we present a compilation of DNA and protein databases, as well as bioinformatic tools for phylogenetic reconstruction and a wide range of studies on molecular evolution. We provide a protocol for information extraction from biological databases and simple phylogenetic reconstruction using probabilistic and distance methods, facilitating the study of biodiversity and evolution at the molecular level for the broad scientific community.
5
Wang X, Xu K, Tan Y, Liu S, Zhou J. Possibilities of Using De Novo Design for Generating Diverse Functional Food Enzymes. Int J Mol Sci 2023; 24:3827. [PMID: 36835238 PMCID: PMC9964944 DOI: 10.3390/ijms24043827] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/11/2023] [Revised: 02/03/2023] [Accepted: 02/03/2023] [Indexed: 02/17/2023] Open
Abstract
Food enzymes play an important role in improving certain food characteristics, such as texture improvement, elimination of toxins and allergens, production of carbohydrates, and enhancement of flavor and appearance. Recently, along with the development of artificial meats, food enzymes have been employed to achieve more diverse functions, especially in converting non-edible biomass to delicious foods. Reported food enzyme modifications for specific applications have highlighted the significance of enzyme engineering. However, directed evolution and rational design have shown inherent limitations due to mutation rates, which make it difficult to satisfy the stability or specific activity needs of certain applications. Generating functional enzymes using de novo design, which highly assembles naturally existing enzymes, provides potential solutions for screening desired enzymes. Here, we describe the functions and applications of food enzymes to introduce the need for food enzyme engineering. To illustrate the possibilities of using de novo design for generating diverse functional proteins, we reviewed protein modelling and de novo design methods and their implementations. The future directions for adding structural data for de novo design model training, acquiring diversified training data, and investigating the relationship between enzyme-substrate binding and activity were highlighted as challenges to overcome for the de novo design of food enzymes.
Affiliation(s)
- Xinglong Wang
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Kangjie Xu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Yameng Tan
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Song Liu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Jingwen Zhou
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
6
Planas-Iglesias J, Marques SM, Pinto GP, Musil M, Stourac J, Damborsky J, Bednar D. Computational design of enzymes for biotechnological applications. Biotechnol Adv 2021; 47:107696. [PMID: 33513434 DOI: 10.1016/j.biotechadv.2021.107696] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/07/2020] [Revised: 01/12/2021] [Accepted: 01/13/2021] [Indexed: 12/14/2022]
Abstract
Enzymes are the natural catalysts that execute biochemical reactions upholding life. Their natural effectiveness has been fine-tuned as a result of millions of years of natural evolution. Such catalytic effectiveness has prompted the use of biocatalysts from multiple sources on different applications, including the industrial production of goods (food and beverages, detergents, textile, and pharmaceutics), environmental protection, and biomedical applications. Natural enzymes often need to be improved by protein engineering to optimize their function in non-native environments. Recent technological advances have greatly facilitated this process by providing the experimental approaches of directed evolution or by enabling computer-assisted applications. Directed evolution mimics the natural selection process in a highly accelerated fashion at the expense of arduous laboratory work and economic resources. Theoretical methods provide predictions and represent an attractive complement to such experiments by waiving their inherent costs. Computational techniques can be used to engineer enzymatic reactivity, substrate specificity and ligand binding, access pathways and ligand transport, and global properties like protein stability, solubility, and flexibility. Theoretical approaches can also identify hotspots on the protein sequence for mutagenesis and predict suitable alternatives for selected positions with expected outcomes. This review covers the latest advances in computational methods for enzyme engineering and presents many successful case studies.
Affiliation(s)
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Sérgio M Marques
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Gaspar P Pinto
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic; IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 61266 Brno, Czech Republic
- Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic.
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic.
7
Ali M, Ishqi HM, Husain Q. Enzyme engineering: Reshaping the biocatalytic functions. Biotechnol Bioeng 2020; 117:1877-1894. [DOI: 10.1002/bit.27329] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/18/2019] [Revised: 01/13/2020] [Accepted: 03/09/2020] [Indexed: 12/19/2022]
Affiliation(s)
- Misha Ali
- Department of Biochemistry, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
- Qayyum Husain
- Department of Biochemistry, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
8
Abstract
Background: Protein sequence alignment analyses have become a crucial step in many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are the two major approaches to sequence alignment. Earlier benchmark studies revealed drawbacks of MSA methods on nucleotide sequence alignments. To test whether similar drawbacks also influence protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. This new framework directly reflects the biological ground truth of the application scenarios that adopt sequence alignments, and evaluates alignment quality according to the achievement of the biological goal rather than by comparison at the sequence level only, which averts the biases introduced by alignment scores or manual alignment templates. Compared with earlier studies, we calculate the cluster validity score based on sequence distances instead of clustering results. This strategy avoids the influence of different clustering methods and thus makes the results more dependable. Results: PSA methods performed better than MSA methods on most of the BAliBASE benchmark datasets. Analyses on the 80 re-sampled benchmark datasets, constructed by randomly choosing 90% of each dataset 10 times, showed similar results. Conclusions: These results validated that the drawbacks of MSA methods revealed at the nucleotide level also exist in protein sequence alignment analyses and affect the accuracy of results. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2524-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yingying Wang
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China
- Hongyan Wu
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China.
- Yunpeng Cai
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China.
9
Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems 2017; 156-157:72-85. [PMID: 28392341 DOI: 10.1016/j.biosystems.2017.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/07/2017] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]
Abstract
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality.
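As a concrete illustration of what "space and time complexity" means for sequence comparison, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The choice of algorithm, the function name `global_alignment_score`, and the scoring values are assumptions made for this example and are not taken from the review. Time is proportional to the product of the two lengths, while keeping only two rows reduces memory to the length of the shorter sequence.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty.

    Time:  O(len(a) * len(b)), one cell per pair of prefixes.
    Space: O(min(len(a), len(b))), only the previous row is kept.
    Scoring parameters are illustrative defaults, not recommended values.
    """
    if len(b) > len(a):
        a, b = b, a  # scoring is symmetric, so put the shorter sequence on the row
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATCACA"))
```

Recovering the alignment itself, not just its score, needs either the full table (quadratic space) or a divide-and-conquer scheme; the sketch deliberately leaves that out.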
Affiliation(s)
- Shakuntala Baichoo
- Department of Computer Science & Engineering, University of Mauritius, Réduit 80837, Mauritius.
- Christos A Ouzounis
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica 57001, Greece.
10
Zambrano-Vega C, Nebro AJ, Durillo JJ, García-Nieto J, Aldana-Montes JF. Multiple Sequence Alignment with Multiobjective Metaheuristics. A Comparative Study. Int J Intell Syst 2017. [DOI: 10.1002/int.21892] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Affiliation(s)
- Cristian Zambrano-Vega
- Facultad de Ciencias de la Ingeniería; Universidad Técnica Estatal de Quevedo; Quevedo Ecuador
- Antonio J. Nebro
- Edificio de Investigación Ada Byron; University of Málaga; Málaga Spain
- Juan J. Durillo
- Distributed and Parallel Systems Group; University of Innsbruck; Innsbruck Austria
- José García-Nieto
- Edificio de Investigación Ada Byron; University of Málaga; Málaga Spain
11
Rani RR, Ramyachitra D. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm. Biosystems 2016; 150:177-189. [PMID: 27784624 DOI: 10.1016/j.biosystems.2016.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/11/2016] [Revised: 10/18/2016] [Accepted: 10/18/2016] [Indexed: 10/20/2022]
Abstract
Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA concerns how nucleotide and amino acid sequences can be aligned with a minimum number of gaps, which points to the functional, evolutionary and structural relationships among the sequences. Computing an MSA that is both accurate and statistically significant remains a challenging task. In this work, the Bacterial Foraging Optimization Algorithm was employed to align the biological sequences, which resulted in a non-dominated optimal solution. It employs multiple objectives: maximization of similarity, non-gap percentage and conserved blocks, and minimization of gap penalty. The BAliBASE 3.0 benchmark database was utilized to examine the proposed algorithm against other methods. In this paper, two algorithms have been proposed: Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and Bacterial Foraging Optimization Algorithm. It was found that Hybrid Genetic Algorithm with Artificial Bee Colony performed better than the existing optimization algorithms, but the conserved blocks were still not obtained using GA-ABC. BFO was then used for the alignment, and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with the widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC). The final results show that the proposed MO-BFO algorithm yields better alignment than most widely used methods.
Affiliation(s)
- R Ranjani Rani
- Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu, India.
- D Ramyachitra
- Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu, India.
12
Abstract
BACKGROUND: Aligning multiple sequences arises in many tasks in bioinformatics. However, the alignments produced by current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. METHODS: We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. RESULTS AND CONCLUSIONS: The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
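The two-objective view described in this abstract can be made concrete with a small sketch. It scores an alignment by a pair of values (substitution score, number of indels) summed over all sequence pairs, and then filters a set of such score vectors down to the non-dominated ones. The function names, the toy identity-based substitution function, and the convention of counting each residue-gap pair as one indel are assumptions made for this example; the paper's exact definitions may differ.

```python
from itertools import combinations


def sp_objectives(alignment, subst_score):
    """Return (substitution_score, indel_count) summed over all sequence pairs.

    `alignment` is a list of equal-length gapped strings using '-' for gaps.
    Counting every residue-gap pair as one indel is an assumption of this sketch.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    substitution, indels = 0, 0
    for row_a, row_b in combinations(alignment, 2):
        for x, y in zip(row_a, row_b):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute to neither objective
            if x == "-" or y == "-":
                indels += 1
            else:
                substitution += subst_score(x, y)
    return substitution, indels


def dominates(p, q):
    """True if p is at least as good as q in both objectives and differs from q.

    Objective 0 (substitution score) is maximized, objective 1 (indels) minimized.
    """
    return p[0] >= q[0] and p[1] <= q[1] and p != q


def pareto_front(points):
    """Keep only the score vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    identity = lambda x, y: 1 if x == y else -1  # toy substitution function
    print(sp_objectives(["AC-T", "A-GT", "ACGT"], identity))
    print(pareto_front([(8, 4), (6, 2), (5, 3), (8, 5)]))
```

Returning the whole front rather than a single weighted score is the point of the reformulation: the choice of trade-off is left to the practitioner instead of being fixed by one gap-penalty setting.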
Affiliation(s)
- Maryam Abbasi
- CISUC, Department of Informatics Engineering, University of Coimbra, Polo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
- Luís Paquete
- CISUC, Department of Informatics Engineering, University of Coimbra, Polo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
- Francisco B. Pereira
- CISUC, Department of Informatics Engineering, University of Coimbra, Polo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
- Polytechnic Institute of Coimbra, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal
13
Kadioglu O, Efferth T. Peptide aptamer identified by molecular docking targeting translationally controlled tumor protein in leukemia cells. Invest New Drugs 2016; 34:515-21. [DOI: 10.1007/s10637-016-0339-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/11/2016] [Accepted: 03/04/2016] [Indexed: 11/29/2022]
14
Al-Shatnawi M, Ahmad MO, Swamy MNS. MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions. BMC Bioinformatics 2015; 16:393. [PMID: 26597571 PMCID: PMC4657235 DOI: 10.1186/s12859-015-0826-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/09/2015] [Accepted: 11/14/2015] [Indexed: 11/16/2022] Open
Abstract
Background: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of considerable research and efforts that have been recently deployed for improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. Results: We propose a novel and efficient algorithm called MSAIndelFR for multiple sequence alignment, using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that the introduction of a new variable gap penalty function based on the predicted locations of the IndelFRs and the computed average log-loss values into the proposed algorithm substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of the four popular benchmarks, BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). Conclusions: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0826-3) contains supplementary material, which is available to authorized users.
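The sum-of-pairs and total column metrics named in this abstract compare a test alignment against a reference alignment of the same sequences. A minimal sketch under common definitions follows: sum-of-pairs is taken as the fraction of residue pairs aligned in the reference that are also aligned in the test, and total column as the fraction of reference columns (with at least two residues) reproduced exactly in the test. The function names are hypothetical, and benchmark-specific details such as restricting the score to annotated core blocks are left out, so the numbers would not necessarily match those of an official scoring program.

```python
from itertools import combinations


def columns(alignment):
    """Convert gapped rows into columns of (sequence index, residue index) pairs."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        members = []
        for seq_idx, ch in enumerate(col):
            if ch != "-":
                members.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        cols.append(frozenset(members))
    return cols


def residue_pairs(cols):
    """All pairs of residues that share a column."""
    pairs = set()
    for col in cols:
        pairs.update(combinations(sorted(col), 2))
    return pairs


def sp_and_tc(test, reference):
    """Return (sum-of-pairs score, total column score), both in [0, 1].

    Assumes both alignments contain the same ungapped sequences in the same
    order and that the reference aligns at least one pair of residues.
    """
    test_cols, ref_cols = columns(test), columns(reference)
    ref_pairs = residue_pairs(ref_cols)
    sp = len(ref_pairs & residue_pairs(test_cols)) / len(ref_pairs)
    scored = [c for c in ref_cols if len(c) > 1]  # ignore single-residue columns
    test_set = set(test_cols)
    tc = sum(1 for c in scored if c in test_set) / len(scored)
    return sp, tc


if __name__ == "__main__":
    reference = ["AC-T", "A-GT"]
    print(sp_and_tc(["ACT-", "A-GT"], reference))
```

Total column is the stricter of the two: a single misplaced residue invalidates its whole column, whereas sum-of-pairs only loses the pairs that involve that residue.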
Affiliation(s)
- Mufleh Al-Shatnawi
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M Omair Ahmad
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M N S Swamy
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
| |
Collapse
|
15
|
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.
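The idea of treating coevolution scores as a weighted network and ranking positions by eigenvector centrality can be sketched with plain power iteration. This is a minimal illustration, not the authors' pipeline: the score matrix is invented, the entropy and noise subtraction described in the abstract is omitted, and the function name is hypothetical.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-10):
    """Power iteration on a symmetric, non-negative score matrix.

    weights[i][j] is the (hypothetical) coevolution score between positions i and j.
    Returns the unit-length dominant eigenvector, one centrality value per position.
    """
    n = len(weights)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            return x
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Invented scores: position 0 has several moderate scores (0.5 each),
# while positions 4 and 5 share the single strongest pairwise score (0.9).
scores = [
    [0.0, 0.5, 0.5, 0.5, 0.1, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
]
centrality = eigenvector_centrality(scores)
most_central = max(range(len(centrality)), key=centrality.__getitem__)
print(most_central, [round(c, 3) for c in centrality])
```

In this toy matrix the most central position is the one with several moderate scores, not either member of the top-scoring pair, which mirrors the pattern the abstract reports.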
Affiliation(s)
- Daniel J Parente
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
- J Christian J Ray
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
|
16
|
Ortuño FM, Valenzuela O, Prieto B, Saez-Lara MJ, Torres C, Pomares H, Rojas I. Comparing different machine learning and mathematical regression models to evaluate multiple sequence alignments. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.01.080] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
|
17
|
Benoit JB, Hansen IA, Szuter EM, Drake LL, Burnett DL, Attardo GM. Emerging roles of aquaporins in relation to the physiology of blood-feeding arthropods. J Comp Physiol B 2014; 184:811-25. [DOI: 10.1007/s00360-014-0836-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2014] [Revised: 05/21/2014] [Accepted: 05/28/2014] [Indexed: 01/18/2023]
|
18
|
Benoit JB, Hansen IA, Attardo GM, Michalková V, Mireji PO, Bargul JL, Drake LL, Masiga DK, Aksoy S. Aquaporins are critical for provision of water during lactation and intrauterine progeny hydration to maintain tsetse fly reproductive success. PLoS Negl Trop Dis 2014; 8:e2517. [PMID: 24762803 PMCID: PMC3998938 DOI: 10.1371/journal.pntd.0002517] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Accepted: 09/20/2013] [Indexed: 12/26/2022] Open
Abstract
Tsetse flies undergo drastic fluctuations in their water content throughout their adult life history due to events such as blood feeding, dehydration and lactation, an essential feature of the viviparous reproductive biology of tsetse. Aquaporins (AQPs) are transmembrane proteins that allow water and other solutes to permeate through cellular membranes. Here we identify tsetse aquaporin (AQP) genes, examine their expression patterns under different physiological conditions (blood feeding, lactation and stress response) and perform functional analysis of three specific genes utilizing RNA interference (RNAi) gene silencing. Ten putative aquaporins were identified in the Glossina morsitans morsitans (Gmm) genome, two more than has been previously documented in any other insect. All organs, tissues, and body parts examined had distinct AQP expression patterns. Two AQP genes, gmmdripa and gmmdripb (= gmmaqp1a and gmmaqp1b), are highly expressed in the milk gland/fat body tissues. The whole-body transcript levels of these two genes vary over the course of pregnancy. A set of three AQPs (gmmaqp5, gmmaqp2a, and gmmaqp4b) are expressed highly in the Malpighian tubules. Knockdown of gmmdripa and gmmdripb reduced the efficiency of water loss following a blood meal, increased dehydration tolerance and reduced heat tolerance of adult females. Knockdown of gmmdripa extended pregnancy length, and gmmdripb knockdown resulted in extended pregnancy duration and reduced progeny production. We found that knockdown of AQPs increased tsetse milk osmolality and reduced the water content in developing larvae. Combined knockdown of gmmdripa, gmmdripb and gmmaqp5 extended pregnancy by 4–6 d, reduced pupal production by nearly 50%, increased milk osmolality by 20–25% and led to dehydration of feeding larvae. Based on these results, we conclude that gmmDripA and gmmDripB are critical for diuresis, stress tolerance and intrauterine lactation through the regulation of water and/or other uncharged solutes.
Glossina sp. are responsible for transmission of African trypanosomes, the causative agents of sleeping sickness in humans and Nagana in cattle. Blood feeding and nutrient provisioning through lactation during intrauterine progeny development are periods when considerable water movement occurs within tsetse flies. With the completion of the tsetse fly genome, we sought to characterize the role of aquaporins in relation to water homeostasis during blood feeding, stress tolerance and the lactation cycle. We provide evidence that specific AQPs are (1) critical during diuresis following a blood meal, (2) important in the regulation of dehydration resistance and heat tolerance and (3) crucial in the allocation of water within tsetse milk that is necessary for progeny hydration. Specifically, we discovered a novel tsetse AQP that is imperative to lactation and may represent a potential target for population control of this disease vector.
Affiliation(s)
- Joshua B. Benoit
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Immo A. Hansen
- Department of Biology and Institute of Applied Biosciences, New Mexico State University, Las Cruces, New Mexico, United States of America
- Geoffrey M. Attardo
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Veronika Michalková
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Institute of Zoology, Slovak Academy of Sciences, Bratislava, Slovakia
- Paul O. Mireji
- Department of Biochemistry and Molecular Biology, Egerton University, Njoro, Kenya
- Joel L. Bargul
- Molecular Biology and Bioinformatics Unit, International Center of Insect Physiology and Ecology (ICIPE), Nairobi, Kenya
- Lisa L. Drake
- Department of Biology and Institute of Applied Biosciences, New Mexico State University, Las Cruces, New Mexico, United States of America
- Daniel K. Masiga
- Molecular Biology and Bioinformatics Unit, International Center of Insect Physiology and Ecology (ICIPE), Nairobi, Kenya
- Serap Aksoy
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
|
19
|
Abstract
Background Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-HMM), which has been especially popular in comparative sequence analysis for several reasons, including its effectiveness in modeling and detecting sequence homology, its simplicity, and the existence of efficient algorithms for applying the model to sequence alignment problems. However, despite these advantages, pair-HMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks. Results In this work, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the shortcomings of the traditional pair-HMMs. The proposed scheme is based on a simple message-passing approach, where messages are exchanged between neighboring symbol pairs that may be potentially aligned in the optimal sequence alignment. The message-passing process yields probabilistic symbol alignment confidence scores, which may be used for predicting the optimal alignment that maximizes the expected number of correctly aligned symbol pairs. Conclusions Extensive performance evaluation on protein alignment benchmark datasets shows that the proposed message-passing scheme clearly outperforms the traditional pair-HMM-based approach, in terms of both alignment accuracy and computational efficiency. Furthermore, the proposed scheme is numerically robust and amenable to massive parallelization.
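The last step mentioned in this abstract, predicting the alignment that maximizes the expected number of correctly aligned symbol pairs, is commonly done with a simple dynamic program over a matrix of pairwise alignment confidences. The sketch below shows that generic maximum-expected-accuracy step only; it does not reproduce the message-passing scheme itself, and the confidence matrix and function name are made up for illustration.

```python
def mea_alignment(confidence):
    """Pick the monotone set of (i, j) pairs with the largest total confidence.

    confidence[i][j] is an assumed probability-like score that symbol i of the
    first sequence aligns with symbol j of the second sequence.
    Returns (pairs, expected_number_of_correct_pairs).
    """
    n, m = len(confidence), len(confidence[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + confidence[i - 1][j - 1],  # align i with j
                best[i - 1][j],                                  # leave i unaligned
                best[i][j - 1],                                  # leave j unaligned
            )
    # Trace back, taking a diagonal step only when skipping a symbol would lose score.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return pairs, best[n][m]


# Invented confidence scores for two sequences of three symbols each.
confidence = [
    [0.9, 0.05, 0.0],
    [0.1, 0.2, 0.6],
    [0.0, 0.1, 0.3],
]
pairs, expected_correct = mea_alignment(confidence)
print(pairs, expected_correct)
```

Here the pairs (0, 0) and (1, 2) are selected: aligning symbol 1 with symbol 2 (0.6) beats the diagonal alternative (0.2 + 0.3), even though it leaves one symbol of each sequence unaligned.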
|
20
|
Sebestova E, Bendl J, Brezovsky J, Damborsky J. Computational tools for designing smart libraries. Methods Mol Biol 2014; 1179:291-314. [PMID: 25055786 DOI: 10.1007/978-1-4939-1053-3_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Traditional directed evolution experiments are often time-, labor- and cost-intensive because they involve repeated rounds of random mutagenesis and the selection or screening of large mutant libraries. The efficiency of directed evolution experiments can be significantly improved by targeting mutagenesis to a limited number of hot-spot positions and/or selecting a limited set of substitutions. The design of such "smart" libraries can be greatly facilitated by in silico analyses and predictions. Here we provide an overview of computational tools applicable for (a) the identification of hot-spots for engineering enzyme properties, and (b) the evaluation of predicted hot-spots and selection of suitable amino acids for substitutions. The selected tools do not require any specific expertise and can easily be implemented by the wider scientific community.
Affiliation(s)
- Eva Sebestova
- Loschmidt Laboratories, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
|
21
|
PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information. Methods Mol Biol 2014; 1079:263-71. [PMID: 24170408 DOI: 10.1007/978-1-62703-646-7_17] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.
|
22
|
Sahraeian SME, Yoon BJ. PicXAA: a probabilistic scheme for finding the maximum expected accuracy alignment of multiple biological sequences. Methods Mol Biol 2014; 1079:203-210. [PMID: 24170404 DOI: 10.1007/978-1-62703-646-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
PicXAA is a probabilistic nonprogressive alignment algorithm that finds protein (or DNA) multiple sequence alignments with maximum expected accuracy. PicXAA greedily builds up the alignment from sequence regions with high local similarity, thereby yielding an accurate global alignment that effectively captures the local similarities across sequences. PicXAA consistently yields accurate alignment results on a wide range of reference sets that have different characteristics, with especially remarkable improvements over other leading algorithms on sequence sets with high local similarities. In this chapter, we describe the overall alignment strategy used in PicXAA and discuss several important considerations for effective deployment of the algorithm.
|
23
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
24
|
Verma R, Schwaneberg U, Roccatano D. Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering. Comput Struct Biotechnol J 2012; 2:e201209008. [PMID: 24688649 PMCID: PMC3962222 DOI: 10.5936/csbj.201209008] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2012] [Revised: 10/07/2012] [Accepted: 10/12/2012] [Indexed: 12/01/2022] Open
Abstract
The combination of computational and directed evolution methods has proven a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this review summarizes the recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this review is to provide concise information about currently available computational resources to assist the design of directed-evolution-based protein engineering experiments.
Affiliation(s)
- Rajni Verma
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany; Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Danilo Roccatano
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
|
25
|
Li X, Zhang Z, Song J. Computational enzyme design approaches with significant biological outcomes: progress and challenges. Comput Struct Biotechnol J 2012; 2:e201209007. [PMID: 24688648 PMCID: PMC3962085 DOI: 10.5936/csbj.201209007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2012] [Revised: 09/27/2012] [Accepted: 10/04/2012] [Indexed: 11/29/2022] Open
Abstract
Enzymes are powerful biocatalysts; however, there is still a large gap between the number of enzyme-based practical applications and the number of naturally occurring enzymes. Multiple experimental approaches have been applied to generate nearly all possible mutations of target enzymes, allowing the identification of desirable variants with improved properties to meet practical needs. Meanwhile, an increasing number of computational methods have been developed during the past few decades to assist in the modification of enzymes. With the development of bioinformatic algorithms, computational approaches are now able to provide more precise guidance for enzyme engineering and make it more efficient and less laborious. In this review, we summarize the recent advances in method development with significant biological outcomes to provide important insights into successful computational protein designs. We also discuss the limitations and challenges of existing methods and the future directions that should improve them.
Affiliation(s)
- Xiaoman Li
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China; Department of Biochemistry and Molecular Biology and ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, VIC 3800, Australia
|
26
|
Ortuño FM, Valenzuela O, Pomares H, Rojas F, Florido JP, Urquiza JM, Rojas I. Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques. Nucleic Acids Res 2012; 41:e26. [PMID: 23066102 PMCID: PMC3592395 DOI: 10.1093/nar/gks919] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics, supporting other important tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low-similarity sequences. As is widely known, these algorithms depend directly on specific features of the sequences, which significantly influence alignment accuracy. Many MSA tools have been designed recently, but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms presented in the literature and their dependence on several features. A novel intelligent algorithm based on least squares support vector machines is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is applied to a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased.
Affiliation(s)
- Francisco M Ortuño
- Department of Computer Architecture and Computer Technology, University of Granada, 18071 Granada, Spain.
|
27
|
Binda E, Marcone GL, Pollegioni L, Marinelli F. Characterization of VanYn, a novel D,D-peptidase/D,D-carboxypeptidase involved in glycopeptide antibiotic resistance in Nonomuraea sp. ATCC 39727. FEBS J 2012; 279:3203-13. [PMID: 22788848 DOI: 10.1111/j.1742-4658.2012.08706.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
VanY(n) is a novel protein involved in the mechanism of self-resistance in Nonomuraea sp. ATCC 39727, which produces the glycopeptide antibiotic A40926, the precursor of the second-generation dalbavancin, which is in phase III of clinical development. VanY(n) (196 residues) is encoded by the dbv7 gene within the dbv biosynthetic cluster devoted to A40926 production. C-terminal His6-tagged VanY(n) was successfully expressed as a soluble and active protein in Escherichia coli. The analysis of the sequence suggests the presence of a hydrophobic transmembrane portion and two conserved sequences (SxHxxGxAxD and ExxH) in the extracytoplasmic domain that are potentially involved in coordination of Zn(2+) and catalytic activity. The presence of these conserved sequences indicates a similar mechanism of action and substrate binding in VanY(n) as in VanY, VanX and VanXY Zn(2+)-dependent D,D-carboxypeptidases and D-Ala-D-Ala dipeptidases acting on peptidoglycan maturation and involved in glycopeptide resistance in pathogens. On substrates mimicking peptidoglycan precursors, VanY(n) shows D,D-carboxypeptidase and D,D-dipeptidase activity, but lacks D,D-carboxyesterase ability on D-Ala-D-Lac-terminating peptides. VanY(n) belongs to the metallo-D,D-carboxypeptidase family, but it is inhibited by β-lactams. Its characterization provides new insights into the evolution and transfer of resistance determinants from environmental glycopeptide-producing actinomycetes (such as Nonomuraea sp.) to glycopeptide-resistant pathogens (enterococci and staphylococci). It may also contribute to an early warning system for emerging resistance mechanisms following the introduction into clinics of a second-generation glycopeptide such as dalbavancin.
Affiliation(s)
- Elisa Binda
- Department of Biotechnology and Life Sciences, University of Insubria, Varese, Italy.
|
28
|
Pruesse E, Peplies J, Glöckner FO. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 2012; 28:1823-9. [PMID: 22556368 PMCID: PMC3389763 DOI: 10.1093/bioinformatics/bts252] [Citation(s) in RCA: 2087] [Impact Index Per Article: 173.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license. Contact: epruesse@mpi-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elmar Pruesse
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, Celsiusstr.1, 28359 Bremen, Germany.
|
29
|
Plyusnin I, Holm L. Comprehensive comparison of graph based multiple protein sequence alignment strategies. BMC Bioinformatics 2012; 13:64. [PMID: 22540977 PMCID: PMC3375188 DOI: 10.1186/1471-2105-13-64] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2011] [Accepted: 04/29/2012] [Indexed: 12/03/2022] Open
Abstract
Background Multiple alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows different components of the alignment process to be swapped and, thus, their contribution to alignment quality and computation time to be investigated. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark. Results Our results indicate the optimal alignment strategy based on the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm to build a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree dependent partitioning as a post processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal. Conclusions This is the first time multiple protein alignment strategies are comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1).
Affiliation(s)
- Ilya Plyusnin
- Institute of Biotechnology, University of Helsinki, P.O. Box 56, Viikinkaari 5, Helsinki, Finland.
|
30
|
Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012; 19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
|
31
|
Krishnadev O, Srinivasan N. AlignHUSH: alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics 2011; 12:275. [PMID: 21729312 PMCID: PMC3228556 DOI: 10.1186/1471-2105-12-275] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2010] [Accepted: 07/05/2011] [Indexed: 11/10/2022] Open
Abstract
Background Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed that is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using adjoining column(s) information has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection. Results We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BAliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments. Conclusions A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of as yet unknown function. A web-server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Affiliation(s)
- Oruganty Krishnadev
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
32
|
Blazewicz J, Frohmberg W, Kierzynka M, Pesch E, Wojciechowski P. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs. BMC Bioinformatics 2011; 12:181. [PMID: 21599912 PMCID: PMC3125261 DOI: 10.1186/1471-2105-12-181] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 12/30/2010] [Accepted: 05/20/2011] [Indexed: 12/01/2022] Open
Abstract
Background Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. Results In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. The tests performed show that the implementation, with performance of up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multi-GPU support with load balancing makes the application very scalable. Conclusions The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments in addition to scores. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. The tests performed show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
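To make the distinction between computing only an alignment score and recovering the alignment itself concrete, the following is a minimal CPU-only sketch of global Needleman-Wunsch alignment with a backtracking pass. It is not the authors' GPU implementation: it assumes a simple linear gap penalty rather than the affine penalties mentioned in the abstract, and the function name and scoring values (match, mismatch, gap) are hypothetical choices made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative values).

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill phase: this alone is enough if only the score is needed.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Backtracking phase: walk from the bottom-right cell to the origin,
    # re-deriving which of the three moves produced each cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The fill phase needs only the previous row when the score alone is wanted, whereas backtracking as sketched here keeps the full matrix, which is one reason score-only database scanning is the simpler problem to port to a GPU.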
|
33
|
Shu N, Elofsson A. KalignP: improved multiple sequence alignments using position specific gap penalties in Kalign2. Bioinformatics 2011; 27:1702-3. [PMID: 21505030 PMCID: PMC3106193 DOI: 10.1093/bioinformatics/btr235] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
SUMMARY Kalign2 is one of the fastest and most accurate methods for multiple alignments. However, in contrast to other methods, Kalign2 does not allow externally supplied position-specific gap penalties. Here, we present a modification to Kalign2, KalignP, so that it accepts such penalties. Further, we show that KalignP, using position-specific gap penalties obtained from predicted secondary structures, shows steady improvement over Kalign2 when tested on Balibase 3.0 as well as on a dataset derived from Pfam-A seed alignments. AVAILABILITY AND IMPLEMENTATION KalignP is freely available at http://kalignp.cbr.su.se. The source code of KalignP is available under the GNU General Public License, Version 2 or later from the same website.
Affiliation(s)
- Nanjiang Shu
- Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Center for Biomembrane Research, Swedish e-science Research Center, Stockholm University, 106 91 Stockholm, Sweden
|
34
|
Abstract
Homology modeling is based on the observation that related protein sequences adopt similar three-dimensional structures. Hence, a homology model of a protein can be derived using related protein structure(s) as modeling template(s). A key step in this approach is the establishment of correspondence between residues of the protein to be modeled and those of the modeling template(s). This step, often referred to as sequence-structure alignment, is one of the major determinants of the accuracy of a homology model. This chapter gives an overview of methods for deriving sequence-structure alignments and discusses recent methodological developments leading to improved performance. However, no method is perfect. Another focus of this chapter is therefore how to find alignment regions that may contain errors and how to improve them. Finally, the chapter provides practical guidance on how to get the most out of the available tools in maximizing the accuracy of sequence-structure alignments.
|
35
|
Beyond directed evolution--semi-rational protein engineering and design. Curr Opin Biotechnol 2010; 21:734-43. [PMID: 20869867 DOI: 10.1016/j.copbio.2010.08.011] [Citation(s) in RCA: 287] [Impact Index Per Article: 20.5] [Received: 07/14/2010] [Revised: 08/29/2010] [Accepted: 08/30/2010] [Indexed: 11/22/2022]
Abstract
Over the past two decades, directed evolution has transformed the field of protein engineering. The advances in understanding protein structure and function, in no insignificant part a result of directed evolution studies, are increasingly empowering scientists and engineers to devise more effective methods for manipulating and tailoring biocatalysts. Abandoning large combinatorial libraries, the focus has shifted to small, functionally rich libraries and rational design. A critical component of the success of these emerging engineering strategies is computational tools for the evaluation of protein sequence datasets and the analysis of conformational variations of amino acids in proteins. Highlighting the opportunities and limitations of such approaches, this review focuses on recent engineering and design examples that require screening or selection of small libraries.
|
36
|
Sahraeian SME, Yoon BJ. PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010; 38:4917-28. [PMID: 20413579 PMCID: PMC2926610 DOI: 10.1093/nar/gkq255] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 12/23/2009] [Revised: 03/25/2010] [Accepted: 03/26/2010] [Indexed: 11/13/2022] Open
Abstract
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively captures the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA consistently yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Affiliation(s)
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
|
37
|
Kemena C, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 2009; 25:2455-65. [PMID: 19648142 PMCID: PMC2752613 DOI: 10.1093/bioinformatics/btp452] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Received: 04/03/2009] [Revised: 06/24/2009] [Accepted: 07/16/2009] [Indexed: 12/22/2022] Open
Abstract
This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements, including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length, with a detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and, finally, the need to integrate many alternative methods and approaches.
Affiliation(s)
- Carsten Kemena
- Centre for Genomic Regulation, Pompeu Fabra University, Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain
|
38
|
Abstract
Molecular modeling techniques have made significant advances in recent years and are becoming essential components of many chemical, physical and biological studies. Here we present three techniques widely used in the simulation of biomolecular systems: structural and homology modeling, molecular dynamics and molecular docking. For each of these topics we present a brief discussion of the underlying scientific basis of the technique, some simple examples of how the method is commonly applied, and some discussion of the limitations and caveats of which the user should be aware. References for further reading as well as an extensive list of software resources are provided.
Affiliation(s)
- Akansha Saxena
- Biomedical Engineering, Washington University, St Louis, Missouri, USA
- Diana Wong
- Biomedical Engineering, Washington University, St Louis, Missouri, USA
- Karthikeyan Diraviyam
- Biomedical Engineering and Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- David Sept
- Biomedical Engineering and Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
|