1
Ji M, Kan Y, Kim D, Lee S, Yi G. DeepPI: Alignment-Free Analysis of Flexible Length Proteins Based on Deep Learning and Image Generator. Interdiscip Sci 2024; 16:1-12. [PMID: 38568406 DOI: 10.1007/s12539-024-00618-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 02/01/2024] [Accepted: 02/03/2024] [Indexed: 09/19/2024]
Abstract
With the rapid development of NGS technology, the number of protein sequences has increased exponentially. Computational methods have been introduced in protein functional studies because the analysis of large numbers of proteins through biological experiments is costly and time-consuming. In recent years, new approaches based on deep learning have been proposed to overcome the limitations of conventional methods. Although deep learning-based methods effectively utilize features of protein function, they are limited to sequences of fixed length and consider only information from adjacent amino acids. Therefore, new protein analysis tools that extract functional features from proteins of flexible length and train models are required. We introduce DeepPI, a deep learning-based tool for analyzing proteins in large-scale databases. The proposed model, which utilizes Global Average Pooling, is applied to proteins of flexible length and leads to reduced information loss compared to existing algorithms that use fixed sizes. The image generator converts a one-dimensional sequence into a distinct two-dimensional structure, from which common parts of various shapes can be extracted. Finally, filtering techniques automatically detect representative data from the entire database and ensure coverage of large protein databases. We demonstrate that DeepPI has been successfully applied to large databases such as the Pfam-A database. Comparative experiments on four types of image generators illustrated the impact of structure on feature extraction. The filtering performance was verified by varying the parameter values and proved to be applicable to large databases. Compared to existing methods, DeepPI achieves higher family classification accuracy for protein function inference.
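The role of Global Average Pooling in handling flexible-length input can be illustrated with a toy sketch. This is not the DeepPI implementation: the function name `global_average_pool` and the list-of-rows feature map are assumptions made for illustration, and a real model would pool the output of convolutional layers rather than hand-written numbers.

```python
from typing import List


def global_average_pool(feature_map: List[List[float]]) -> List[float]:
    """Average every channel over all positions.

    feature_map holds one row per position and one column per channel
    (hypothetical layout). The output length equals the number of
    channels, regardless of how many positions the input has.
    """
    if not feature_map:
        raise ValueError("feature map must contain at least one position")
    n_channels = len(feature_map[0])
    if any(len(row) != n_channels for row in feature_map):
        raise ValueError("every position must have the same number of channels")
    n_positions = len(feature_map)
    return [
        sum(row[channel] for row in feature_map) / n_positions
        for channel in range(n_channels)
    ]


# Two inputs of different length yield vectors of the same size.
short_map = [[1.0, 2.0], [3.0, 4.0]]
long_map = [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0], [2.0, 1.0]]
short_vec = global_average_pool(short_map)
long_vec = global_average_pool(long_map)
```

Because the pooled vector has one value per channel, a classifier placed after it can accept proteins of any length without padding or truncation.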
Affiliation(s)
- Mingeun Ji
  - Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea
- Yejin Kan
  - Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea
- Dongyeon Kim
  - Department of Artificial Intelligence, Dongguk University, Seoul, 04620, Korea
- Seungmin Lee
  - Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea
- Gangman Yi
  - Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea.
  - Department of Artificial Intelligence, Dongguk University, Seoul, 04620, Korea.
  - Division of AI Software Convergence, Dongguk University, Seoul, 04620, Korea.
2
Sun Y, Florio TJ, Gupta S, Young MC, Marshall QF, Garfinkle SE, Papadaki GF, Truong HV, Mycek E, Li P, Farrel A, Church NL, Jabar S, Beasley MD, Kiefel BR, Yarmarkovich M, Mallik L, Maris JM, Sgourakis NG. Structural principles of peptide-centric Chimeric Antigen Receptor recognition guide therapeutic expansion. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.542108. [PMID: 37292750 PMCID: PMC10245919 DOI: 10.1101/2023.05.24.542108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Peptide-Centric Chimeric Antigen Receptors (PC-CARs), which recognize oncoprotein epitopes displayed by human leukocyte antigens (HLAs) on the cell surface, offer a promising strategy for targeted cancer therapy [1]. We have previously developed a PC-CAR targeting a neuroblastoma-associated PHOX2B peptide, leading to robust tumor cell lysis restricted by two common HLA allotypes [2]. Here, we determine the 2.1 Å structure of the PC-CAR:PHOX2B/HLA-A*24:02/β2m complex, which reveals the basis for antigen-specific recognition through interactions with CAR complementarity-determining regions (CDRs). The PC-CAR adopts a diagonal docking mode, where interactions with both conserved and polymorphic HLA framework residues permit recognition of multiple HLA allotypes from the A9 serological cross-reactivity group, covering a combined American population frequency of up to 25.2%. Comprehensive characterization using biochemical binding assays, molecular dynamics simulations, and structural and functional analyses demonstrates that high-affinity PC-CAR recognition of cross-reactive pHLAs necessitates the presentation of a specific peptide backbone, where subtle structural adaptations of the peptide are critical for high-affinity complex formation and CAR-T cell killing. Our results provide a molecular blueprint for engineering CARs with optimal recognition of tumor-associated antigens in the context of different HLAs, while minimizing cross-reactivity with self-epitopes.
3
Lajevardy SA, Kargari M. Developing new genetic algorithm based on integer programming for multiple sequence alignment. Soft comput 2022. [DOI: 10.1007/s00500-022-06790-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
4
Li Y. Sequence Alignment with Q-Learning Based on the Actor-Critic Model. ACM T ASIAN LOW-RESO 2021. [DOI: 10.1145/3433540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Multiple sequence alignment methods are a family of algorithmic solutions for aligning evolutionarily related sequences while taking into account evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. In this article, we propose a method with Q-learning based on the Actor-Critic model for sequence alignment. We transform the sequence alignment problem into an agent's autonomous learning process. In this process, the reward of each possible next action is calculated, as is the cumulative reward of the entire process. The results show that the proposed method performs better than the genetic algorithm and the dynamic programming method.
Affiliation(s)
- Yarong Li
  - The Experimental High School Attached to Beijing Normal University, Beijing, China
5
Esgin E, Karagoz P. Extracting process hierarchies by multi-sequence alignment adaptations. ENTERP INF SYST-UK 2021. [DOI: 10.1080/17517575.2021.1913239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Eren Esgin
  - AI Research, MBIS R&D Center, Istanbul, Turkey
  - Informatics Institute, Middle East Technical University, Ankara, Turkey
- Pinar Karagoz
  - Computer Engineering Department, Middle East Technical University, Ankara, Turkey
6
Protein Analysis: From Sequence to Structure. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
7
Structures of c-di-GMP/cGAMP degrading phosphodiesterase VcEAL: identification of a novel conformational switch and its implication. Biochem J 2020; 476:3333-3353. [PMID: 31647518 DOI: 10.1042/bcj20190399] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/21/2019] [Revised: 10/17/2019] [Accepted: 10/21/2019] [Indexed: 01/02/2023]
Abstract
Cyclic dinucleotides (CDNs) have emerged as the central molecules that help bacteria adapt and thrive in changing environmental conditions. Therefore, tight regulation of intracellular CDN concentration through the counteracting actions of dinucleotide cyclases and phosphodiesterases (PDEs) is critical. Here, we demonstrate that a putative stand-alone EAL domain PDE from Vibrio cholerae (VcEAL) is capable of degrading both the second messenger c-di-GMP and hybrid 3'3'-cyclic GMP-AMP (cGAMP). To unveil their degradation mechanism, we have determined high-resolution crystal structures of VcEAL with Ca²⁺, c-di-GMP-Ca²⁺, 5'-pGpG-Ca²⁺ and cGAMP-Ca²⁺, the latter providing the first structural basis of cGAMP hydrolysis. Structural studies reveal a typical triosephosphate isomerase barrel fold with the substrate c-di-GMP/cGAMP bound in an extended conformation. Highly conserved residues specifically bind the guanine base of c-di-GMP/cGAMP in the G2 site, while the semi-conserved nature of residues at the G1 site could act as a specificity determinant. Two metal ions, co-ordinated with six stubbornly conserved residues and two non-bridging scissile phosphate oxygens of c-di-GMP/cGAMP, activate a water molecule for an in-line attack on the phosphodiester bond, supporting a two-metal-ion-based catalytic mechanism. PDE activity and biofilm assays of several prudently designed mutants collectively demonstrate that the VcEAL active site is charge and size optimized. Intriguingly, in the VcEAL-5'-pGpG-Ca²⁺ structure, the β5-α5 loop adopts a novel conformation that, along with conserved E131, creates a new metal-binding site. This novel conformation, along with several subtle changes in the active site, makes the VcEAL-5'-pGpG-Ca²⁺ structure quite different from other 5'-pGpG-bound structures reported earlier.
8
A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm. Soft comput 2020. [DOI: 10.1007/s00500-020-04917-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
9
Sharma S, Ahmed M, Akhter Y. The molecular link between tyrosol binding to tri6 transcriptional regulator and downregulation of trichothecene biosynthesis. Biochimie 2019; 160:14-23. [DOI: 10.1016/j.biochi.2019.01.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/09/2018] [Accepted: 01/30/2019] [Indexed: 10/27/2022]
10
Ashkenazy H, Sela I, Levy Karin E, Landan G, Pupko T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst Biol 2019; 68:117-130. [PMID: 29771363 PMCID: PMC6657586 DOI: 10.1093/sysbio/syy036] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/12/2017] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023] Open
Abstract
The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs were shown to be inaccurate and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach, in which instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
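The concatenation step described in this abstract can be sketched in a few lines. The sketch assumes each alternative MSA is already available as a mapping from sequence name to gapped row; the function name `concatenate_msas` and the toy alignments are hypothetical, and generating the alternative alignments (the part handled by the GUIDANCE server) is not shown.

```python
from typing import Dict, List


def concatenate_msas(msas: List[Dict[str, str]]) -> Dict[str, str]:
    """Join alternative alignments of the same sequences column-wise.

    Each MSA maps a sequence name to its gapped row. The result keeps one
    row per sequence whose columns are those of every input MSA in order.
    """
    if not msas:
        raise ValueError("at least one alignment is required")
    names = list(msas[0])
    for msa in msas:
        if set(msa) != set(names):
            raise ValueError("all alignments must contain the same sequences")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within one alignment must have equal length")
    return {name: "".join(msa[name] for msa in msas) for name in names}


# Two toy alternative alignments of the same pair of sequences.
msa_a = {"s1": "AC-G", "s2": "ACTG"}
msa_b = {"s1": "ACG-", "s2": "ACTG"}
super_msa = concatenate_msas([msa_a, msa_b])
```

A tree-building program would then treat the concatenated rows as one long alignment, so columns that appear only in some alternative alignments still contribute phylogenetic signal.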
Affiliation(s)
- Haim Ashkenazy
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Itamar Sela
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eli Levy Karin
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
  - Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Giddy Landan
  - Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
- Tal Pupko
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
11
Mahajan R, Verma S, Kushwaha M, Singh D, Akhter Y, Chatterjee S. Biodegradation of di‑n‑butyl phthalate by psychrotolerant Sphingobium yanoikuyae strain P4 and protein structural analysis of carboxylesterase involved in the pathway. Int J Biol Macromol 2018; 122:806-816. [PMID: 30395899 DOI: 10.1016/j.ijbiomac.2018.10.225] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/18/2018] [Revised: 10/31/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]
Abstract
Phthalate esters (PAEs), which are priority pollutants, are widely used as plasticizers and are mainly responsible for carcinogenicity and endocrine disruption in humans. For the bioremediation of PAEs, a psychrotolerant Sphingobium yanoikuyae strain P4, capable of utilizing many phthalates (di‑methyl phthalate (DMP), di‑ethyl phthalate (DEP), di‑n‑butyl phthalate (DBP), di‑isobutyl phthalate (DIBP), butyl benzyl phthalate (BBP)) and a few polycyclic aromatic hydrocarbons as the sole source of carbon and energy, was isolated from Palampur, Kangra, Himachal Pradesh, India. Complete (100%) utilization of DBP (1 g L⁻¹) by the strain was observed within 24 h of incubation at 28 °C. Interestingly, the strain also degraded DBP completely at 20 °C and 15 °C within 36 h and 60 h, respectively. The esterase involved in DBP degradation was found to be inducible and intracellular. Comparative sequence analysis of carboxylesterase enzyme sequences revealed the conserved motifs G-X-S-X-G and -HGG-, which are characteristic peptide motifs reported in different esterases. Structural analysis showed that the enzyme belongs to the serine hydrolase superfamily, which has an α/β hydrolase fold. The interaction and binding of DBP to the catalytic Ser184 residue in the esterase enzyme were also analysed. In conclusion, the carboxylesterase possesses the required active site, which may be involved in the catabolism of DBP.
Affiliation(s)
- Rishi Mahajan
  - Bioremediation and Metabolomics Research Group, Department of Environmental Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India; Department of Chemistry and Chemical Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India
- Shalini Verma
  - Bioremediation and Metabolomics Research Group, Department of Environmental Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India
- Madhulika Kushwaha
  - Bioremediation and Metabolomics Research Group, Department of Environmental Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India
- Dharam Singh
  - Molecular and Microbial Genetics Lab, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, District-Kangra, Himachal Pradesh 176061, India
- Yusuf Akhter
  - Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow, Uttar Pradesh 226025, India.
- Subhankar Chatterjee
  - Bioremediation and Metabolomics Research Group, Department of Environmental Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India; Department of Chemistry and Chemical Sciences, Central University of Himachal Pradesh, Temporary Academic Block-Shahpur, District-Kangra, Himachal Pradesh 176206, India.
12
Ghavami S, Toozandehjani H, Ghavami G, Sardari S. Innovative protein translation into music and color image applicable for assessing protein alignment based on bio-mimicking human perception system. Int J Biol Macromol 2018; 119:896-901. [PMID: 30076932 DOI: 10.1016/j.ijbiomac.2018.07.185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/02/2018] [Revised: 07/11/2018] [Accepted: 07/29/2018] [Indexed: 11/17/2022]
Abstract
Protein sequence alignment, a method of searching, comparing, and ordering protein sequences, is one of the valuable techniques in bioinformatics. It is employed to recognize regions of similarity that may reflect functional, structural, or evolutionary relationships between protein sequences. In the current investigation, an innovative similarity search/alignment algorithm for pattern recognition in protein structures is developed by bio-mimicking the pattern recognition capabilities of the human visual and auditory systems, with the aim of exploring novel approaches to protein sequence alignment. The approach, based on intra-scientific concepts and combining bioinformatics with psychological knowledge, led to a unique automatic translational system (ATS-P) that translates protein structures into musical compositions and color image combinations, yielding an innovative pattern and method for protein alignment. In this study, the perception of protein sequences through visual and sonic representation was intended to support researchers in protein pattern recognition and structural demonstration. In other words, by bio-mimicking the developed visual and auditory perception systems, the presented algorithm can assist protein scientists toward successful protein alignment.
Affiliation(s)
- Setareh Ghavami
  - Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran; Department of Psychology, School of Humanities, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
- Hassan Toozandehjani
  - Department of Psychology, School of Humanities, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
- Ghazaleh Ghavami
  - Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran
- Soroush Sardari
  - Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran.
13
Sharma S, Ahmed M, Akhter Y. The revelation of selective sphingolipid pathway inhibition mechanism on fumonisin toxin binding to ceramide synthases in susceptible organisms and survival mechanism in resistant species. Biochimie 2018; 149:41-50. [DOI: 10.1016/j.biochi.2018.03.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/17/2018] [Accepted: 03/30/2018] [Indexed: 10/17/2022]
14
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. BIOCHIMICA ET BIOPHYSICA ACTA. BIOMEMBRANES 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States.
15
Chowdhury SR, Sen U. Crystal structure of heat shock protein 15 (Hsp15) from Vibrio cholerae: Novel mode of trimerization and nucleic acid binding properties. Biochem Biophys Res Commun 2018; 497:1076-1081. [PMID: 29486158 DOI: 10.1016/j.bbrc.2018.02.182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2018] [Accepted: 02/23/2018] [Indexed: 11/29/2022]
Abstract
Vibrio cholerae experiences a highly hostile environment in the human intestine, which triggers the induction of various heat shock genes. VcHsp15, the hslR gene product of V. cholerae O395, is a highly upregulated protein that, upon heat shock, targets the erroneously dislodged 50S subunit carrying a tRNA attached to the abortive nascent polypeptide chain and recycles it for another round of translation. In this study we report the crystal structure of VcHsp15 at 2.33 Å. Although VcHsp15 shares a very similar fold with E. coli Hsp15 (EcHsp15), their oligomerization properties are quite different. While EcHsp15 is a monomer, VcHsp15 exhibits a novel trimeric form both in the crystal structure and in solution. The putative αL motif of VcHsp15 shares a strikingly similar fold with several RNA-binding proteins such as ribosomal protein S4 and threonyl-tRNA synthetase. Curiously, their αL motifs display a comparable surface charge despite extremely low sequence identity, indicating that this motif serves as a basic module to bind RNA.
Affiliation(s)
- Sanghati Roy Chowdhury
  - Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, HBNI, 1/AF Bidhan Nagar, Kolkata 700064, India
- Udayaditya Sen
  - Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, HBNI, 1/AF Bidhan Nagar, Kolkata 700064, India.
16
Sharma S, Kumari I, Hussain R, Ahmed M, Akhter Y. Species specific substrates and products choices of 4-O-acetyltransferase from Trichoderma brevicompactum. Enzyme Microb Technol 2017. [DOI: 10.1016/j.enzmictec.2017.05.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
17
Tessier CJG, Emlaw JR, Cao ZQ, Pérez-Areales FJ, Salameh JPJ, Prinston JE, McNulty MS, daCosta CJB. Back to the future: Rational maps for exploring acetylcholine receptor space and time. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2017; 1865:1522-1528. [PMID: 28844740 DOI: 10.1016/j.bbapap.2017.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/16/2017] [Revised: 08/09/2017] [Accepted: 08/11/2017] [Indexed: 12/27/2022]
Abstract
Global functions of nicotinic acetylcholine receptors, such as subunit cooperativity and compatibility, likely emerge from a network of amino acid residues distributed across the entire pentameric complex. Identification of such networks has stymied traditional approaches to acetylcholine receptor structure and function, likely due to the cryptic interdependency of their underlying amino acid residues. An emerging evolutionary biochemistry approach, which traces the evolutionary history of acetylcholine receptor subunits, allows for rational mapping of acetylcholine receptor sequence space, and offers new hope for uncovering the amino acid origins of these enigmatic properties.
Affiliation(s)
- Christian J G Tessier
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Johnathon R Emlaw
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Zhuo Qian Cao
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- F Javier Pérez-Areales
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Jean-Paul J Salameh
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Jethro E Prinston
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Melissa S McNulty
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada
- Corrie J B daCosta
  - Department of Chemistry and Biomolecular Sciences, Centre for Chemical and Synthetic Biology, University of Ottawa, 10 Marie-Curie, Ottawa, Ontario K1N 6N5, Canada.
18
Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics 2017; 109:419-431. [PMID: 28669847 DOI: 10.1016/j.ygeno.2017.06.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/26/2017] [Revised: 05/27/2017] [Accepted: 06/27/2017] [Indexed: 01/04/2023]
Abstract
Sequence alignment is an active research area in the field of bioinformatics. It is also a crucial task, as it guides many other tasks such as phylogenetic analysis and function and/or structure prediction of biological macromolecules like DNA, RNA, and protein. Proteins are the building blocks of every living organism. Although the protein alignment problem has been studied for several decades, every available method unfortunately produces different alignment results for a single alignment problem. Multiple sequence alignment is characterized as a problem of very high computational complexity. Many stochastic methods have therefore been considered for improving the accuracy of alignment. Among them, the Genetic Algorithm is frequently used by researchers. In this study, we present the different types of methods applied in alignment and the recent trends in multiobjective genetic algorithms for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in alignment accuracy.
Affiliation(s)
- Biswanath Chowdhury
  - Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, WB, 700009, India.
- Gautam Garai
  - Computational Sciences Division, Saha Institute of Nuclear Physics, Kolkata, WB 700064, India.
19
Das S, Roy Chowdhury S, Dey S, Sen U. Structural and biochemical studies on Vibrio cholerae Hsp31 reveals a novel dimeric form and Glutathione-independent Glyoxalase activity. PLoS One 2017; 12:e0172629. [PMID: 28235098 PMCID: PMC5325305 DOI: 10.1371/journal.pone.0172629] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/20/2016] [Accepted: 02/07/2017] [Indexed: 11/23/2022] Open
Abstract
Vibrio cholerae experiences a highly hostile environment in the human intestine, which triggers the induction of various heat shock genes. The hchA gene product of V. cholerae O395, referred to as the hypothetical intracellular protease/amidase VcHsp31, is one such stress-inducible homodimeric protein. Our current study demonstrates that VcHsp31 is endowed with molecular chaperone, amidopeptidase and robust methylglyoxalase activities. Through site-directed mutagenesis coupled with biochemical assays on VcHsp31, we have confirmed the role of residues in the vicinity of the active site in the amidopeptidase and methylglyoxalase activities. VcHsp31 suppresses the aggregation of insulin in vitro in a dose-dependent manner. Through crystal structures of VcHsp31 and its mutants, grown at various temperatures, we demonstrate that VcHsp31 acquires two dimeric forms (Type-I and Type-II). The Type-I dimer is similar to EcHsp31, where two VcHsp31 monomers associate in an eclipsed manner through several intersubunit hydrogen bonds involving their P-domains. The Type-II dimer is a novel dimeric organization, where some of the intersubunit hydrogen bonds are abrogated and each monomer swings out in the opposite direction, centering at the P-domains, like the twisting of a wet cloth. Normal mode analysis (NMA) of the Type-I dimer shows a similar movement of the individual monomers. Upon swinging, a dimeric surface of ~400 Å², mostly hydrophobic in nature, is uncovered, which might bind partially unfolded protein substrates. We propose that, in solution, VcHsp31 exists as an equilibrium mixture of both dimers. With increasing temperature, transformation to the Type-II form, which has a more exposed hydrophobic surface, occurs progressively, accounting for the temperature-dependent increase in the chaperone activity of VcHsp31.
Affiliation(s)
- Samir Das
  - Structural Genomics Division, Saha Institute of Nuclear Physics, Kolkata, India
- Sanghati Roy Chowdhury
  - Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Kolkata, India
- Sanjay Dey
  - Department of Biotechnology, St. Xavier’s College, Kolkata
- Udayaditya Sen
  - Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Kolkata, India
20
|
Burnett S, Furlong M, Melvin PG, Singiser R. Games that Enlist Collective Intelligence to Solve Complex Scientific Problems. Journal of Microbiology & Biology Education 2016; 17:133-136. [PMID: 27047610 PMCID: PMC4798797 DOI: 10.1128/jmbe.v17i1.983] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
There is great value in employing the collective problem-solving power of large groups of people. Technological advances have allowed computer games to be utilized by a diverse population to solve problems. Science games are becoming more popular and cover various areas such as sequence alignments, DNA base-pairing, and protein and RNA folding. While these tools have been developed for the general population, they can also be used effectively in the classroom to teach students about various topics. Many games also employ a social component that entices students to continue playing and thereby to continue learning. The basic functions of game play and the potential of game play as a tool in the classroom are discussed in this article.
Affiliation(s)
- Stephen Burnett
- Department of Biology, Clayton State University, Morrow, GA 30260
- Michelle Furlong
- Department of Biology, Clayton State University, Morrow, GA 30260
- Paul Guy Melvin
- Department of Biology, Clayton State University, Morrow, GA 30260
- Richard Singiser
- Department of Chemistry & Physics, Clayton State University, Morrow, GA 30260
|
21
|
Al-Shatnawi M, Ahmad MO, Swamy MNS. MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions. BMC Bioinformatics 2015; 16:393. [PMID: 26597571 PMCID: PMC4657235 DOI: 10.1186/s12859-015-0826-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/09/2015] [Accepted: 11/14/2015] [Indexed: 11/16/2022] Open
Abstract
Background: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of the considerable research effort recently devoted to improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. Results: We propose a novel and efficient algorithm called MSAIndelFR for multiple sequence alignment using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that the introduction of a new variable gap penalty function based on the predicted locations of the IndelFRs and the computed average log-loss values into the proposed algorithm substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). Conclusions: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
Affiliation(s)
- Mufleh Al-Shatnawi
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M Omair Ahmad
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M N S Swamy
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
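The MSAIndelFR abstract reports accuracy in terms of the sum-of-pairs (SP) and total column (TC) metrics. As a rough illustration only, the sketch below computes both for a test alignment against a reference alignment. The function names, the gap symbol, and the choice to count only reference columns holding at least two residues are assumptions made for this example; benchmark suites such as BAliBASE ship their own scoring programs, whose conventions may differ.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def column_index_map(alignment):
    """For each column, return a tuple with the residue index of every
    sequence in that column (None where the sequence has a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for row, char in enumerate(col):
            if char == GAP:
                entry.append(None)
            else:
                entry.append(counters[row])
                counters[row] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in columns:
        for (i, a), (j, b) in combinations(enumerate(col), 2):
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment recovers."""
    ref_pairs = aligned_pairs(column_index_map(reference))
    test_pairs = aligned_pairs(column_index_map(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def tc_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment.
    Assumption: only reference columns with at least two residues are counted."""
    ref_cols = [c for c in column_index_map(reference)
                if sum(x is not None for x in c) >= 2]
    test_cols = set(column_index_map(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0


reference = ["ACG-T", "A-GCT", "ACGCT"]
test = ["ACG-T", "AG-CT", "ACGCT"]
print(sp_score(test, reference), tc_score(test, reference))
```

Both scores lie between 0 and 1, and an alignment compared with itself scores 1.0 on each.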
|
22
|
Gibson TJ, Dinkel H, Van Roey K, Diella F. Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad. Cell Commun Signal 2015; 13:42. [PMID: 26581338 PMCID: PMC4652402 DOI: 10.1186/s12964-015-0121-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 07/17/2015] [Accepted: 11/13/2015] [Indexed: 12/17/2022] Open
Abstract
It has become clear in outline, though not yet in detail, how cellular regulatory and signalling systems are constructed. The essential machines are protein complexes that effect regulatory decisions by undergoing internal changes of state. Subcomponents of these cellular complexes are assembled into molecular switches. Many of these switches employ one or more short peptide motifs as toggles that can move between one or more sites within the switch system, the simplest being on-off switches. Paradoxically, these motif modules (termed short linear motifs or SLiMs) are both hugely abundant and difficult to research. So despite the many successes in identifying short regulatory protein motifs, it is thought that only the “tip of the iceberg” has been exposed. Experimental and bioinformatic motif discovery remain challenging and error prone. The advice presented in this article is aimed at helping researchers to uncover genuine protein motifs, whilst avoiding the pitfalls that lead to reports of false discovery.
Affiliation(s)
- Toby J Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D69117, Heidelberg, Germany.
- Holger Dinkel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D69117, Heidelberg, Germany.
- Kim Van Roey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D69117, Heidelberg, Germany; Health Services Research Unit, Operational Direction Public Health and Surveillance, Scientific Institute of Public Health (WIV-ISP), 1050, Brussels, Belgium.
- Francesca Diella
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D69117, Heidelberg, Germany.
|
23
|
Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res 2015; 43:W7-14. [PMID: 25883146 PMCID: PMC4489236 DOI: 10.1093/nar/gkv318] [Citation(s) in RCA: 504] [Impact Index Per Article: 56.0] [Received: 02/16/2015] [Accepted: 03/28/2015] [Indexed: 12/25/2022] Open
Abstract
Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il.
Affiliation(s)
- Itamar Sela
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 6997801, Israel
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 6997801, Israel
- Kazutaka Katoh
- Immunology Frontier Research Center, Osaka University, Suita, Osaka 565-0871, Japan; Computational Biology Research Center, The National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 6997801, Israel
|
24
|
Protein sectors: statistical coupling analysis versus conservation. PLoS Comput Biol 2015; 11:e1004091. [PMID: 25723535 PMCID: PMC4344308 DOI: 10.1371/journal.pcbi.1004091] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 09/22/2014] [Accepted: 12/15/2014] [Indexed: 11/19/2022] Open
Abstract
Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.
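The abstract above argues that per-position sequence conservation dominates SCA when a protein has a single sector. As a hedged illustration of what a per-column conservation score can look like, the sketch below computes the relative entropy of each alignment column against a uniform background over 20 amino acids. The uniform background, the function name, and the handling of gaps are assumptions for this example; SCA itself uses its own background frequencies and weighting, which are not reproduced here.

```python
import math
from collections import Counter


def column_conservation(alignment, alphabet_size=20, gap="-"):
    """Return one conservation score per alignment column.

    Score = log2(alphabet_size) - Shannon entropy of the column, i.e. the
    relative entropy versus a uniform background (an assumption of this sketch).
    Gaps are ignored; an all-gap column scores 0.0.
    """
    scores = []
    for col in zip(*alignment):
        residues = [c for c in col if c != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(math.log2(alphabet_size) - entropy)
    return scores


toy_alignment = ["AC", "AD", "AC", "AD"]
conservation = column_conservation(toy_alignment)
print(conservation)
```

In the toy alignment the first column is fully conserved and receives the maximum score, while the second column, split evenly between two residues, scores exactly one bit lower.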
|
25
|
Lyras DP, Metzler D. ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach. BMC Bioinformatics 2014; 15:265. [PMID: 25099134 PMCID: PMC4133627 DOI: 10.1186/1471-2105-15-265] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/24/2014] [Accepted: 07/29/2014] [Indexed: 11/16/2022] Open
Abstract
Background: Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem may be efficiently solved when only two sequences are considered, the exact inference of the optimal alignment easily becomes computationally intractable in the multiple sequence alignment case. To cope with the high computational expense, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods, however, cannot revise the alignment of an already aligned group of sequences in light of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of the alignments produced by popular aligners. We call ReformAlign a meta-aligner as it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is quite straightforward: at first, an existing alignment is used to construct a standard profile which summarizes the initial alignment, and then all sequences are individually re-aligned against the formed profile. From each sequence-profile comparison, the alignment of each sequence against the profile is recorded and the final alignment is indirectly inferred by merging all the individual sub-alignments into a unified set. The employment of ReformAlign may often result in alignments which are significantly more accurate than the starting alignments. Results: We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in cases where the starting alignment is of relatively inferior quality or when the input sequences are harder to align. Conclusions: The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.
Affiliation(s)
- Dimitrios P Lyras
- Faculty of Biology, Department II, Ludwig-Maximilians Universität München, Planegg-Martinsried 82152, Germany.
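The ReformAlign abstract describes two steps: summarizing an initial alignment as a profile, then re-aligning each sequence against that profile. The sketch below illustrates the idea in a deliberately simplified form. The scoring scheme (match score equal to the residue's frequency in the profile column, a fixed linear gap penalty of -1.0) and all function names are assumptions for this example and are not the scoring used by ReformAlign; the final merging of sub-alignments is also omitted.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def build_profile(alignment):
    """Summarize an alignment as one residue-frequency dict per column."""
    n = len(alignment)
    return [{res: cnt / n for res, cnt in Counter(col).items()}
            for col in zip(*alignment)]


def align_to_profile(seq, profile, gap_penalty=-1.0):
    """Globally align seq to the profile columns (Needleman-Wunsch).

    Returns seq with GAP inserted where it skips a profile column; residues
    that match no profile column are emitted in lowercase.
    """
    rows, cols = len(seq) + 1, len(profile) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty
    for i in range(1, rows):
        for j in range(1, cols):
            match = score[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
            score[i][j] = max(match,
                              score[i - 1][j] + gap_penalty,
                              score[i][j - 1] + gap_penalty)

    # Trace back from the bottom-right corner to recover one optimal path.
    aligned = []
    i, j = len(seq), len(profile)
    while i > 0 or j > 0:
        diag = (score[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
                if i > 0 and j > 0 else None)
        if diag is not None and score[i][j] == diag:
            aligned.append(seq[i - 1])
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or score[i][j] == score[i][j - 1] + gap_penalty):
            aligned.append(GAP)  # sequence skips this profile column
            j -= 1
        else:
            aligned.append(seq[i - 1].lower())  # insertion relative to profile
            i -= 1
    return "".join(reversed(aligned))


initial_alignment = ["ACGT", "ACGT", "AC-T"]
profile = build_profile(initial_alignment)
realigned = align_to_profile("ACT", profile)
print(realigned)
```

Here the sequence `ACT` is placed against the four profile columns with a gap at the third column, which matches how it appeared in the initial alignment.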
|
26
|
Sebestova E, Bendl J, Brezovsky J, Damborsky J. Computational tools for designing smart libraries. Methods Mol Biol 2014; 1179:291-314. [PMID: 25055786 DOI: 10.1007/978-1-4939-1053-3_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/03/2023]
Abstract
Traditional directed evolution experiments are often time-, labor- and cost-intensive because they involve repeated rounds of random mutagenesis and the selection or screening of large mutant libraries. The efficiency of directed evolution experiments can be significantly improved by targeting mutagenesis to a limited number of hot-spot positions and/or selecting a limited set of substitutions. The design of such "smart" libraries can be greatly facilitated by in silico analyses and predictions. Here we provide an overview of computational tools applicable for (a) the identification of hot-spots for engineering enzyme properties, and (b) the evaluation of predicted hot-spots and selection of suitable amino acids for substitutions. The selected tools do not require any specific expertise and can easily be implemented by the wider scientific community.
Affiliation(s)
- Eva Sebestova
- Loschmidt Laboratories, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
|
27
|
Hossain KSMT, Patnaik D, Laxman S, Jain P, Bailey-Kellogg C, Ramakrishnan N. Improved multiple sequence alignments using coupled pattern mining. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1098-1112. [PMID: 24384701 DOI: 10.1109/tcbb.2013.36] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed to primarily capture conservations in sequences whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in another). As a result, better exposition of couplings is sometimes one of the reasons for hand-tweaking of MSAs by practitioners. ARMiCoRe introduces a distinctly pattern mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.
|
28
|
Riera C, Lois S, de la Cruz X. Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles. Wiley Interdisciplinary Reviews-Computational Molecular Science 2013. [DOI: 10.1002/wcms.1170] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Xavier de la Cruz
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Institució Catalana per la Recerca i Estudis Avançats (ICREA); Barcelona Spain
|
29
|
Abstract
Multiple sequence alignment (MSA) of DNA, RNA, and protein sequences is one of the most essential techniques in the fields of molecular biology, computational biology, and bioinformatics. Next-generation sequencing technologies are changing the biology landscape, flooding the databases with massive amounts of raw sequence data. MSA of ever-increasing sequence data sets is becoming a significant bottleneck. In order to realise the promise of MSA for large-scale sequence data sets, it is necessary for existing MSA algorithms to be run in a parallelised fashion with the sequence data distributed over a computing cluster or server farm. Combining MSA algorithms with cloud computing technologies is therefore likely to improve the speed, quality, and capability for MSA to handle large numbers of sequences.
In this review, multiple sequence alignments are discussed, with a specific focus on the ClustalW and Clustal Omega algorithms. Cloud computing technologies and concepts are outlined, and the next generation of cloud-based MSA algorithms is introduced.
|
30
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]
|
31
|
Petrus AK, Swithers KS, Ranjit C, Wu S, Brewer HM, Gogarten JP, Pasa-Tolic L, Noll KM. Genes for the major structural components of Thermotogales species' togas revealed by proteomic and evolutionary analyses of OmpA and OmpB homologs. PLoS One 2012; 7:e40236. [PMID: 22768259 PMCID: PMC3387000 DOI: 10.1371/journal.pone.0040236] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 01/26/2012] [Accepted: 06/03/2012] [Indexed: 11/20/2022] Open
Abstract
The unifying structural characteristic of members of the bacterial order Thermotogales is their toga, an unusual cell envelope that includes a loose-fitting sheath around each cell. Only two toga-associated structural proteins have been purified and characterized in Thermotoga maritima: the anchor protein OmpA1 (or Ompα) and the porin OmpB (or Ompβ). The gene encoding OmpA1 (ompA1) was cloned and sequenced and later assigned to TM0477 in the genome sequence, but because no peptide sequence was available for OmpB, its gene (ompB) was not annotated. We identified six porin candidates in the genome sequence of T. maritima. Of these candidates, only one, encoded by TM0476, has all the characteristics reported for OmpB and characteristics expected of a porin including predominant β-sheet structure, a carboxy terminus porin anchoring motif, and a porin-specific amino acid composition. We highly enriched a toga fraction of cells for OmpB by sucrose gradient centrifugation and hydroxyapatite chromatography and analyzed it by LC/MS/MS. We found that the only porin candidate that it contained was the TM0476 product. This cell fraction also had β-sheet character as determined by circular dichroism, consistent with its enrichment for OmpB. We conclude that TM0476 encodes OmpB. A phylogenetic analysis of OmpB found orthologs encoded in syntenic locations in the genomes of all but two Thermotogales species. Those without orthologs have putative isofunctional genes in their place. Phylogenetic analyses of OmpA1 revealed that each species of the Thermotogales has one or two OmpA homologs. T. maritima has two OmpA homologs, encoded by ompA1 (TM0477) and ompA2 (TM1729), both of which were found in the toga protein-enriched cell extracts. These annotations of the genes encoding toga structural proteins will guide future examinations of the structure and function of this unusual lineage-defining cell sheath.
Affiliation(s)
- Amanda K. Petrus
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Kristen S. Swithers
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Chaman Ranjit
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Si Wu
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Heather M. Brewer
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- J. Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Ljiljana Pasa-Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Kenneth M. Noll
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
|
32
|
Krishnadev O, Srinivasan N. AlignHUSH: alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics 2011; 12:275. [PMID: 21729312 PMCID: PMC3228556 DOI: 10.1186/1471-2105-12-275] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/10/2010] [Accepted: 07/05/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed which is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using information from adjoining column(s) has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection. Results: We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BAliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments. Conclusions: A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of as yet unknown function. A web-server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Affiliation(s)
- Oruganty Krishnadev
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
33
|
Shortridge MD, Triplet T, Revesz P, Griep MA, Powers R. Bacterial protein structures reveal phylum dependent divergence. Comput Biol Chem 2011; 35:24-33. [PMID: 21315656 DOI: 10.1016/j.compbiolchem.2010.12.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/27/2010] [Revised: 12/28/2010] [Accepted: 12/29/2010] [Indexed: 01/26/2023]
Abstract
Protein sequence space is vast compared to protein fold space. This raises important questions about how structures adapt to evolutionary changes in protein sequences. A growing trend is to regard protein fold space as a continuum rather than a series of discrete structures. From this perspective, homologous protein structures within the same functional classification should reveal a constant rate of structural drift relative to sequence changes. The clusters of orthologous groups (COG) classification system was used to annotate homologous bacterial protein structures in the Protein Data Bank (PDB). The structures and sequences of proteins within each COG were compared against each other to establish their relatedness. As expected, the analysis demonstrates a sharp structural divergence between the bacterial phyla Firmicutes and Proteobacteria. Additionally, each COG had a distinct sequence/structure relationship, indicating that different evolutionary pressures affect the degree of structural divergence. However, our analysis also shows the relative drift rate between sequence identity and structure divergence remains constant.
Affiliation(s)
- Matthew D Shortridge
- Department of Chemistry, University of Nebraska-Lincoln, 68588-0304, United States
|
34
|
Abstract
Homology modeling is based on the observation that related protein sequences adopt similar three-dimensional structures. Hence, a homology model of a protein can be derived using related protein structure(s) as modeling template(s). A key step in this approach is the establishment of correspondence between residues of the protein to be modeled and those of the modeling template(s). This step, often referred to as sequence-structure alignment, is one of the major determinants of the accuracy of a homology model. This chapter gives an overview of methods for deriving sequence-structure alignments and discusses recent methodological developments leading to improved performance. However, no method is perfect. A second focus of this chapter is therefore how to find alignment regions that may contain errors and how to improve them. Finally, the chapter provides practical guidance on how to get the most out of the available tools in maximizing the accuracy of sequence-structure alignments.
|
35
|
Abstract
Multiple alignment of DNA sequences is an important step in various molecular biological analyses. As a large amount of sequence data is becoming available through genome and other large-scale sequencing projects, scalability, as well as accuracy, is currently required for a multiple sequence alignment (MSA) program. In this chapter, we outline the algorithms of an MSA program MAFFT and provide practical advice, focusing on several typical situations a biologist sometimes faces. For genome alignment, which is beyond the scope of MAFFT, we introduce two tools: TBA and MAUVE.
Affiliation(s)
- Kazutaka Katoh
- Digital Medicine Initiative, Kyushu University, Fukuoka, Japan
|