1
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
Protein structure prediction is important for understanding protein function and behavior. This review presents a comprehensive survey of the computational models used in predicting protein structure. It covers the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section covers deep learning-based models. It introduces some state-of-the-art AI models, such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, and ProteinBERT. This section also discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. The template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for additional research in areas such as dynamic protein behavior, conformational changes, and protein-protein interactions. In the application section, the paper introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
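The TM-score and GDT_TS metrics named in this abstract can be illustrated with a short sketch. It assumes the model and the native structure are already superposed and aligned residue by residue, so the input is just a list of per-residue distances in Å; the real metrics additionally search for the superposition that maximizes the score, which is omitted here. The function names `tm_score` and `gdt_ts` are hypothetical, and lDDT is not covered.

```python
import math


def tm_score(distances, target_length):
    """TM-score of one fixed superposition (no search over superpositions)."""
    # Length-dependent scale d0. The 0.5 floor for short chains follows
    # common implementations and is an assumption of this sketch.
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Each aligned residue contributes 1 / (1 + (d / d0)^2); the sum is
    # normalized by the target length, so unaligned residues count as zero.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt_ts(distances, target_length):
    """GDT_TS (0 to 100) of one fixed superposition."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in Angstrom
    fractions = [
        sum(1 for d in distances if d <= c) / target_length for c in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy input (hypothetical): three aligned residues of a four-residue target.
example = [0.5, 3.0, 10.0]
print(tm_score(example, 4), gdt_ts(example, 4))
```

Both scores are normalized by the target length rather than by the number of aligned residues, which is why a partial but accurate model still scores below a complete one.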
Affiliation(s)
- Lingtao Chen
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Qiaomu Li
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Kazi Fahim Ahmad Nasif
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Ying Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Bobin Deng
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Shuteng Niu
- Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA;
- Seyedamin Pouriyeh
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
- Zhiyu Dai
- Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA;
- Jiawei Chen
- College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA;
- Chloe Yixin Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
2
Huang B, Kong L, Wang C, Ju F, Zhang Q, Zhu J, Gong T, Zhang H, Yu C, Zheng WM, Bu D. Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. Genomics, Proteomics & Bioinformatics 2023; 21:913-925. [PMID: 37001856 PMCID: PMC10928435 DOI: 10.1016/j.gpb.2022.11.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/15/2022] [Revised: 11/23/2022] [Accepted: 11/30/2022] [Indexed: 03/31/2023]
Abstract
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.
Affiliation(s)
- Bin Huang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lupeng Kong
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China
- Chao Wang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Fusong Ju
- Microsoft Research AI4Science, Beijing 100080, China
- Qi Zhang
- Huawei Noah's Ark Lab, Wuhan 430206, China
- Jianwei Zhu
- Microsoft Research AI4Science, Beijing 100080, China
- Tiansu Gong
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Haicang Zhang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
- Chungong Yu
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.
- Dongbo Bu
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
3
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rising growth of protein sequence databases. Contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading even for the query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
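As a rough illustration of the contact maps this abstract refers to, the sketch below builds a binary map from one 3D point per residue and counts the contacts two maps share under a given alignment. The 8 Å cutoff is a common convention but an assumption here, the function names `contact_map` and `contact_overlap` are hypothetical, and real contact-assisted threading methods use predicted maps and far richer scoring than a raw overlap count.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary residue-residue contact map from one 3D point per residue."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two residues are "in contact" when closer than the cutoff.
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def contact_overlap(query_map, template_map, alignment):
    """Count contacts shared by query and template under an alignment.

    alignment is a list of (query_index, template_index) pairs.
    """
    shared = 0
    for a in range(len(alignment)):
        for b in range(a + 1, len(alignment)):
            qi, ti = alignment[a]
            qj, tj = alignment[b]
            if query_map[qi][qj] and template_map[ti][tj]:
                shared += 1
    return shared


# Toy coordinates (hypothetical): three residues in a row and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
identity = [(i, i) for i in range(len(coords))]
print(cmap)
print(contact_overlap(cmap, cmap, identity))
```

A threading method would search over alignments to maximize a score of this kind; the sketch only evaluates one alignment that is supplied by the caller.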
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
4
Lee SJ, Joo K, Sim S, Lee J, Lee IH, Lee J. CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields. Molecules 2022; 27:3711. [PMID: 35744836 PMCID: PMC9231382 DOI: 10.3390/molecules27123711] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2022] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/16/2022]
Abstract
Sequence-structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence-structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence-structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily or twilight zone level chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy for validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence-structure pairs involving 15 FM target domains of CASP14, where CRFalign leads to an improvement in average modeling accuracy on these hard targets (TM-CRFalign ≃42.94%) compared with that of HHalign (TM-HHalign ≃39.05%) and also that of MRFalign (TM-MRFalign ≃36.93%). CRFalign was incorporated into our template search framework called CRFpred and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, which showed a reasonable template search performance.
Affiliation(s)
- Sung Jong Lee
- Basic Science Institute, Changwon National University, Changwon 51140, Korea;
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea;
- Juyong Lee
- Department of Chemistry, Kangwon National University, Chuncheon 24341, Korea;
- In-Ho Lee
- Korea Research Institute of Standards and Science (KRISS), Daejeon 34113, Korea;
- Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
5
Black MH, Gradowski M, Pawłowski K, Tagliabracci VS. Methods for discovering catalytic activities for pseudokinases. Methods Enzymol 2022; 667:575-610. [PMID: 35525554 DOI: 10.1016/bs.mie.2022.03.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/18/2022]
Abstract
Pseudoenzymes resemble active enzymes, but lack key catalytic residues believed to be required for activity. Many pseudoenzymes appear to be inactive in conventional enzyme assays. However, an alternative explanation for their apparent lack of activity is that pseudoenzymes are being assayed for the wrong reaction. We have discovered several new protein kinase-like families which have revealed how different binding orientations of adenosine triphosphate (ATP) and active site residue migration can generate a novel reaction from a common kinase scaffold. These results have exposed the catalytic versatility of the protein kinase fold and suggest that atypical kinases and pseudokinases should be analyzed for alternative transferase activities. In this chapter, we discuss a general approach for bioinformatically identifying divergent or atypical members of an enzyme superfamily, then present an experimental approach to characterize their catalytic activity.
Affiliation(s)
- Miles H Black
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Marcin Gradowski
- Department of Biochemistry and Microbiology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Krzysztof Pawłowski
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States; Department of Biochemistry and Microbiology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland.
- Vincent S Tagliabracci
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, United States; Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, United States; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, United States.
6
Zhang Y, Zhang Q, Zhou J, Zou Q. A survey on the algorithm and development of multiple sequence alignment. Brief Bioinform 2022; 23:6546258. [PMID: 35272347 DOI: 10.1093/bib/bbac069] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 01/30/2022] [Accepted: 02/09/2022] [Indexed: 12/21/2022]
Abstract
Multiple sequence alignment (MSA) is an essential cornerstone in bioinformatics, which can reveal the potential information in biological sequences, such as function, evolution and structure. MSA is widely used in many bioinformatics scenarios, such as phylogenetic analysis, protein analysis and genomic analysis. However, MSA faces new challenges with the gradual increase in sequence scale and the increasing demand for alignment accuracy. Therefore, developing an efficient and accurate strategy for MSA has become one of the research hotspots in bioinformatics. In this work, we mainly summarize the algorithms for MSA and its applications in bioinformatics. To provide a structured and clear perspective, we systematically introduce MSA's knowledge, including background, database, metric and benchmark. Besides, we list the most common applications of MSA in the field of bioinformatics, including database searching, phylogenetic analysis, genomic analysis, metagenomic analysis and protein analysis. Furthermore, we categorize and analyze classical and state-of-the-art algorithms, divided into progressive alignment, iterative algorithm, heuristics, machine learning and divide-and-conquer. Moreover, we also discuss the challenges and opportunities of MSA in bioinformatics. Our work provides a comprehensive survey of MSA applications and their relevant algorithms. It could bring valuable insights for researchers to contribute their knowledge to MSA and relevant studies.
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, Chengdu, China
- Qiang Zhang
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Jiliu Zhou
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054, Chengdu, China
7
Xu H, Hu B, Flesher DA, Liu J, Motaleb MA. BB0259 Encompasses a Peptidoglycan Lytic Enzyme Function for Proper Assembly of Periplasmic Flagella in Borrelia burgdorferi. Front Microbiol 2021; 12:692707. [PMID: 34659138 PMCID: PMC8517470 DOI: 10.3389/fmicb.2021.692707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 08/19/2021] [Indexed: 11/18/2022]
Abstract
Assembly of the bacterial flagellar rod, hook, and filament requires penetration through the peptidoglycan (PG) sacculus and outer membrane. In most β- and γ-proteobacteria, the protein FlgJ has two functional domains that enable PG hydrolyzing activity to create pores, facilitating proper assembly of the flagellar rod. However, two distinct proteins performing the same functions as the dual-domain FlgJ are proposed in δ- and ε-proteobacteria as well as spirochetes. The Lyme disease spirochete Borrelia burgdorferi genome possesses a FlgJ and a PG lytic SLT enzyme protein homolog (BB0259). FlgJ in B. burgdorferi is crucial for flagellar hook and filament assembly but not for the proper rod assembly reported in other bacteria. However, BB0259 has never been characterized. Here, we use cryo-electron tomography to visualize periplasmic flagella in different bb0259 mutant strains and provide evidence that the E580 residue of BB0259 is essential for PG-hydrolyzing activity. Without the enzyme activity, the flagellar hook fails to penetrate through the pores in the cell wall to complete assembly of an intact periplasmic flagellum. Given that FlgJ and BB0259 interact with each other, they likely coordinate the penetration through the PG sacculus and assembly of a functional flagellum in B. burgdorferi and other spirochetes. Because of its role, we renamed BB0259 as flagellar-specific lytic transglycosylase or LTaseBb.
Affiliation(s)
- Hui Xu
- Department of Microbiology and Immunology, Brody School of Medicine, East Carolina University, Greenville, NC, United States
- Bo Hu
- Department of Microbiology and Molecular Genetics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States
- David A. Flesher
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, CT, United States
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States
- Jun Liu
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, CT, United States
- Microbial Sciences Institute, Yale University, West Haven, CT, United States
- Md A. Motaleb
- Department of Microbiology and Immunology, Brody School of Medicine, East Carolina University, Greenville, NC, United States
8
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
9
Kim DN, Gront D, Sanbonmatsu KY. Practical Considerations for Atomistic Structure Modeling with Cryo-EM Maps. J Chem Inf Model 2020; 60:2436-2442. [PMID: 32422044 PMCID: PMC7891309 DOI: 10.1021/acs.jcim.0c00090] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
We describe common approaches to atomistic structure modeling with single particle analysis derived cryo-EM maps. Several strategies for atomistic model building and atomistic model fitting methods are discussed, including selection criteria and implementation procedures. In covering basic concepts and caveats, this short perspective aims to help facilitate active discussion between scientists at different levels with diverse backgrounds.
Affiliation(s)
- Doo Nam Kim
- Computational Biology Team, Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington, 99354, United States
- Dominik Gront
- Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Karissa Y. Sanbonmatsu
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, 87545, United States
- New Mexico Consortium, Los Alamos, New Mexico, 87544, United States
10
Mulnaes D, Porta N, Clemens R, Apanasenko I, Reiners J, Gremer L, Neudecker P, Smits SHJ, Gohlke H. TopModel: Template-Based Protein Structure Prediction at Low Sequence Identity Using Top-Down Consensus and Deep Neural Networks. J Chem Theory Comput 2020; 16:1953-1967. [DOI: 10.1021/acs.jctc.9b00825] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/12/2022]
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Nicola Porta
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Rebecca Clemens
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Irina Apanasenko
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Jens Reiners
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Lothar Gremer
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Philipp Neudecker
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Sander H. J. Smits
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- John von Neumann Institute for Computing (NIC) & Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
11
Makigaki S, Ishida T. Sequence alignment using machine learning for accurate template-based protein structure prediction. Bioinformatics 2020; 36:104-111. [PMID: 31197318 DOI: 10.1093/bioinformatics/btz483] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/19/2018] [Revised: 04/15/2019] [Accepted: 06/05/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. RESULTS: In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure's accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. AVAILABILITY AND IMPLEMENTATION: https://github.com/shuichiro-makigaki/exmachina. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
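The idea of replacing a fixed substitution matrix with a predicted score can be sketched with a textbook global alignment (Needleman-Wunsch with a linear gap penalty) whose substitution score is a callable on residue positions. This is not the authors' implementation: `global_alignment_score`, `fixed_score`, `context_score`, and the gap value are hypothetical, and `context_score` is only a stand-in for a trained model.

```python
def global_alignment_score(len_a, len_b, score_fn, gap=-1.0):
    """Best global alignment score; score_fn(i, j) scores position i vs j."""
    prev = [j * gap for j in range(len_b + 1)]  # row for the empty prefix
    for i in range(1, len_a + 1):
        curr = [i * gap] + [0.0] * len_b
        for j in range(1, len_b + 1):
            curr[j] = max(
                prev[j - 1] + score_fn(i - 1, j - 1),  # align i-1 with j-1
                prev[j] + gap,                         # gap in sequence b
                curr[j - 1] + gap,                     # gap in sequence a
            )
        prev = curr
    return prev[len_b]


a, b = "ACGT", "AGT"


def fixed_score(i, j):
    # Fixed scheme: +1 for identical residues, -1 otherwise.
    return 1.0 if a[i] == b[j] else -1.0


def context_score(i, j):
    # Hypothetical stand-in for a trained model: the score depends on a
    # three-residue window around the positions, not on the residue pair alone.
    window = sum(
        1
        for k in (-1, 0, 1)
        if 0 <= i + k < len(a) and 0 <= j + k < len(b) and a[i + k] == b[j + k]
    )
    return window - 1.5


print(global_alignment_score(len(a), len(b), fixed_score))
print(global_alignment_score(len(a), len(b), context_score))
```

Passing positions rather than residue letters to `score_fn` is the design choice that matters: it lets the scorer use sequence context, which a fixed residue-by-residue matrix cannot.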
Affiliation(s)
- Shuichiro Makigaki
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
12
Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 2019; 20:473. [PMID: 31521110 PMCID: PMC6744700 DOI: 10.1186/s12859-019-3019-7] [Citation(s) in RCA: 547] [Impact Index Per Article: 109.4] [Received: 02/19/2019] [Accepted: 08/02/2019] [Indexed: 01/06/2023]
Abstract
Background: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. Results: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor 4 and HHblits by a factor 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. Conclusion: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
Affiliation(s)
- Martin Steinegger
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany; Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Markus Meier
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany
- Milot Mirdita
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany
- Harald Vöhringer
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany; European Bioinformatics Institute, Cambridge, CB10 1SD, United Kingdom
- Johannes Söding
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany.
13
Halabelian L, Ravichandran M, Li Y, Zeng H, Rao A, Aravind L, Arrowsmith CH. Structural basis of HMCES interactions with abasic DNA and multivalent substrate recognition. Nat Struct Mol Biol 2019; 26:607-612. [PMID: 31235913 PMCID: PMC6609482 DOI: 10.1038/s41594-019-0246-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 01/10/2019] [Accepted: 05/08/2019] [Indexed: 12/29/2022]
Abstract
Embryonic stem cell-specific 5-hydroxymethylcytosine-binding protein (HMCES) can covalently cross-link to abasic sites in single-stranded DNA at stalled replication forks to prevent genome instability. Here, we report crystal structures of the human HMCES SOS response-associated peptidase (SRAP) domain in complex with DNA-damage substrates, including HMCES cross-linked with an abasic site within a 3' overhang DNA. HMCES interacts with both single-strand and duplex segments of DNA, with two independent duplex DNA interaction sites identified in the SRAP domain. The HMCES DNA-protein cross-link structure provides structural insights into a novel thiazolidine covalent interaction between the DNA abasic site and conserved Cys 2 of HMCES. Collectively, our structures demonstrate the capacity for the SRAP domain to interact with a variety of single-strand- and double-strand-containing DNA structures found in DNA-damage sites, including 5' and 3' overhang DNAs and gapped DNAs with short single-strand segments.
Affiliation(s)
- Levon Halabelian
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
- Mani Ravichandran
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
- Yanjun Li
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
- Hong Zeng
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
- Anjana Rao
- Division of Signaling and Gene Expression, La Jolla Institute for Immunology, La Jolla, CA, USA
- Department of Pharmacology, University of San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of San Diego, La Jolla, CA, USA
- Sanford Consortium for Regenerative Medicine, La Jolla, CA, USA
- L Aravind
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Cheryl H Arrowsmith
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
14
Bhattacharya S, Bhattacharya D. Does inclusion of residue-residue contact information boost protein threading? Proteins 2019; 87:596-606. [PMID: 30882932 DOI: 10.1002/prot.25684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/22/2018] [Revised: 02/20/2019] [Accepted: 03/13/2019] [Indexed: 12/26/2022]
Abstract
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, threading, the template-based modeling technique of reliably and accurately selecting optimal template proteins with folds similar to the target protein from a library of known protein structures and correctly aligning the target sequence to the template structures, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of including residue-residue contact information on the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that including contact information yields statistically significantly better threading performance than a baseline threading algorithm that does not use contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading that can ultimately help improve the accuracy of protein structure prediction.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
15
Buey RM, Schmitz RA, Buchanan BB, Balsera M. Crystal Structure of the Apo-Form of NADPH-Dependent Thioredoxin Reductase from a Methane-Producing Archaeon. Antioxidants (Basel) 2018; 7:E166. [PMID: 30453601 PMCID: PMC6262447 DOI: 10.3390/antiox7110166] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/10/2018] [Revised: 11/12/2018] [Accepted: 11/14/2018] [Indexed: 12/20/2022] Open
Abstract
The redox regulation of proteins via reversible dithiol/disulfide exchange reactions involves the thioredoxin system, which is composed of a reductant, a thioredoxin reductase (TR), and thioredoxin (Trx). In the pyridine nucleotide-dependent Trx reduction pathway, reducing equivalents, typically from reduced nicotinamide adenine dinucleotide phosphate (NADPH), are transferred from NADPH-TR (NTR) to Trx and, in turn, to target proteins, thus resulting in the reversible modification of the structural and functional properties of the targets. NTR enzymes contain three functional sites: an NADPH binding pocket, a non-covalently bound flavin cofactor, and a redox-active disulfide in the form of CxxC. With the aim of increasing our knowledge of the thioredoxin system in archaea, we here report the high-resolution crystal structure of NTR from the methane-generating organism Methanosarcina mazei strain Gö1 (MmNTR) at 2.6 Å resolution. Based on the crystals presently described, MmNTR assumes an overall fold that is nearly identical to the archetypal fold of authentic NTRs; however, surprisingly, we observed no electron density for flavin adenine dinucleotide (FAD) despite the well-defined and conserved FAD-binding cavity in the folded module. Remarkably, the dimers of the apo-protein within the crystal were different from those observed by small angle X-ray scattering (SAXS) for the holo-protein, suggesting that the binding of the flavin cofactor does not require major protein structural rearrangements. Rather, binding results in the stabilization of essential parts of the structure, such as those involved in dimer stabilization. Altogether, this structure represents the example of an apo-form of an NTR that yields important insight into the effects of the cofactor on protein folding.
Affiliation(s)
- Rubén M Buey
- Metabolic Engineering Group. Dpto. Microbiología y Genética. Universidad de Salamanca, 37007 Salamanca, Spain.
- Ruth A Schmitz
- Institut für Allgemeine Mikrobiologie, Christian-Albrechts-Universität Kiel, 24118 Kiel, Germany.
- Bob B Buchanan
- Department of Plant & Microbial Biology, University of California, 94720 Berkeley CA, USA.
- Monica Balsera
- Instituto de Recursos Naturales y Agrobiología de Salamanca (IRNASA-CSIC), 37008 Salamanca, Spain.
16
Dułak D, Gadzała M, Banach M, Ptak M, Wiśniowski Z, Konieczny L, Roterman I. Filamentous Aggregates of Tau Proteins Fulfil Standard Amyloid Criteria Provided by the Fuzzy Oil Drop (FOD) Model. Int J Mol Sci 2018; 19:E2910. [PMID: 30257460 PMCID: PMC6213535 DOI: 10.3390/ijms19102910] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/14/2018] [Revised: 09/12/2018] [Accepted: 09/20/2018] [Indexed: 01/02/2023] Open
Abstract
Abnormal filamentous aggregates that are formed by tangled tau protein turn out to be classic amyloid fibrils, meeting all the criteria defined under the fuzzy oil drop model in the context of amyloid characterization. The model recognizes amyloids as linear structures where local hydrophobicity minima and maxima propagate in an alternating manner along the fibril's long axis. This distribution of hydrophobicity differs greatly from the classic monocentric hydrophobic core observed in globular proteins. Rather than becoming a globule, the amyloid instead forms a ribbonlike (or cylindrical) structure.
Affiliation(s)
- Dawid Dułak
- ABB Business Services Sp. z o.o. ul. Żegańska 1, 04-713 Warszawa, Poland.
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Łazarza 16, 31-530 Kraków, Poland.
- Magdalena Ptak
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Łazarza 16, 31-530 Kraków, Poland.
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland.
- Zdzisław Wiśniowski
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Łazarza 16, 31-530 Kraków, Poland.
- Leszek Konieczny
- Chair of Medical Biochemistry, Medical College, Jagiellonian University, Kopernika 7, 31-034 Kraków, Poland.
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Łazarza 16, 31-530 Kraków, Poland.
17
Conway JM, Crosby JR, Hren AP, Southerland RT, Lee LL, Lunin VV, Alahuhta P, Himmel ME, Bomble YJ, Adams MWW, Kelly RM. Novel multidomain, multifunctional glycoside hydrolases from highly lignocellulolytic Caldicellulosiruptor species. AIChE J 2018. [DOI: 10.1002/aic.16354] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023]
Affiliation(s)
- Jonathan M. Conway
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- James R. Crosby
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Andrew P. Hren
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Robert T. Southerland
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Laura L. Lee
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Petri Alahuhta
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401
- Michael W. W. Adams
- Dept. of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602
- Robert M. Kelly
- Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
18
Rusu V, Hoch E, Mercader JM, Tenen DE, Gymrek M, Hartigan CR, DeRan M, von Grotthuss M, Fontanillas P, Spooner A, Guzman G, Deik AA, Pierce KA, Dennis C, Clish CB, Carr SA, Wagner BK, Schenone M, Ng MCY, Chen BH, Centeno-Cruz F, Zerrweck C, Orozco L, Altshuler DM, Schreiber SL, Florez JC, Jacobs SBR, Lander ES. Type 2 Diabetes Variants Disrupt Function of SLC16A11 through Two Distinct Mechanisms. Cell 2017; 170:199-212.e20. [PMID: 28666119 DOI: 10.1016/j.cell.2017.06.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 10/06/2016] [Revised: 03/16/2017] [Accepted: 06/08/2017] [Indexed: 01/08/2023]
Abstract
Type 2 diabetes (T2D) affects Latinos at twice the rate seen in populations of European descent. We recently identified a risk haplotype spanning SLC16A11 that explains ∼20% of the increased T2D prevalence in Mexico. Here, through genetic fine-mapping, we define a set of tightly linked variants likely to contain the causal allele(s). We show that variants on the T2D-associated haplotype have two distinct effects: (1) decreasing SLC16A11 expression in liver and (2) disrupting a key interaction with basigin, thereby reducing cell-surface localization. Both independent mechanisms reduce SLC16A11 function and suggest SLC16A11 is the causal gene at this locus. To gain insight into how SLC16A11 disruption impacts T2D risk, we demonstrate that SLC16A11 is a proton-coupled monocarboxylate transporter and that genetic perturbation of SLC16A11 induces changes in fatty acid and lipid metabolism that are associated with increased T2D risk. Our findings suggest that increasing SLC16A11 function could be therapeutically beneficial for T2D.
Affiliation(s)
- Victor Rusu
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Eitan Hoch
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Metabolism Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Josep M Mercader
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, 08034 Barcelona, Spain
- Danielle E Tenen
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Division of Endocrinology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Melissa Gymrek
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Michael DeRan
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Marcin von Grotthuss
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Pierre Fontanillas
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Alexandra Spooner
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Gaelen Guzman
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Amy A Deik
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Kerry A Pierce
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Courtney Dennis
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Clary B Clish
- Metabolism Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Steven A Carr
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Monica Schenone
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Maggie C Y Ng
- Center for Genomics and Personalized Medicine Research, Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Brian H Chen
- Longitudinal Studies Section, Translational Gerontology Branch, Intramural Research Program, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
- Carlos Zerrweck
- The Obesity Clinic at Hospital General Tlahuac, 13250 Mexico City, Mexico
- Lorena Orozco
- Instituto Nacional de Medicina Genómica, Tlalpan, 14610 Mexico City, Mexico
- David M Altshuler
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Department of Biology, MIT, Cambridge, MA 02139, USA
- Jose C Florez
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Metabolism Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
- Suzanne B R Jacobs
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Metabolism Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Eric S Lander
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Biology, MIT, Cambridge, MA 02139, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
19
Hrabe T, Jaroszewski L, Godzik A. Revealing aperiodic aspects of solenoid proteins from sequence information. Bioinformatics 2016; 32:2776-82. [PMID: 27334472 DOI: 10.1093/bioinformatics/btw319] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/27/2015] [Accepted: 05/13/2016] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins identified many subtle effects in the length distribution of individual motifs that are important for their functions. However, similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty in detecting them. RESULTS We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins, and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. On examples of proteins with Leucine Rich Repeat (LRR) domains and other solenoid-like proteins, we show that the automated analysis with FAIT correctly identifies exact lengths of individual repeats based entirely on sequence information. AVAILABILITY AND IMPLEMENTATION https://github.com/GodzikLab/FAIT CONTACT adam@godziklab.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thomas Hrabe
- Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Lukasz Jaroszewski
- Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Adam Godzik
- Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
20
dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci Rep 2016; 6:32333. [PMID: 27581095 PMCID: PMC5007510 DOI: 10.1038/srep32333] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 03/17/2016] [Accepted: 08/04/2016] [Indexed: 11/09/2022] Open
Abstract
Protein remote homology detection is an important task in computational proteomics. Some computational methods have been proposed, which detect remote homology proteins based on different features and algorithms. As noted in previous studies, their predictive results are complementary to each other. Therefore, it is intriguing to explore whether these methods can be combined into one package so as to further enhance the performance power and application convenience. In view of this, we introduced a protein representation called profile-based pseudo protein sequence to extract the evolutionary information from the relevant profiles. Based on the concept of pseudo proteins, a new predictor, called “dRHP-PseRA”, was developed by combining four state-of-the-art predictors (PSI-BLAST, HHblits, Hmmer, and Coma) via the rank aggregation approach. Cross-validation tests on a SCOP benchmark dataset have demonstrated that the new predictor has remarkably outperformed any of the existing methods for the same purpose on ROC50 scores. Accordingly, it is anticipated that dRHP-PseRA holds very high potential to become a useful high throughput tool for detecting remote homology proteins. For the convenience of most experimental scientists, a web-server for dRHP-PseRA has been established at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/.
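The abstract above does not say which rank aggregation scheme combines the four predictors. As a rough illustration only, the sketch below uses a Borda count, which is an assumed stand-in and not necessarily the method used in dRHP-PseRA. The function name `borda_aggregate` and the hit identifiers are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several best-first hit lists into one consensus ranking.

    Assumption: plain Borda count. With n distinct hits overall, a hit at
    0-based position p in a list earns n - p points from that list, and a
    hit that a predictor did not report earns 0 points from it.
    """
    candidates = set()
    for ranking in rankings:
        candidates.update(ranking)
    n = len(candidates)

    scores = defaultdict(int)
    for ranking in rankings:
        for position, hit in enumerate(ranking):
            scores[hit] += n - position

    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda hit: (-scores[hit], hit))


if __name__ == "__main__":
    # Hypothetical ranked hit lists, one per predictor.
    ranked_hits = [
        ["d1abc_", "d2xyz_", "d3pqr_"],
        ["d2xyz_", "d1abc_", "d3pqr_"],
        ["d2xyz_", "d3pqr_"],
        ["d1abc_", "d2xyz_"],
    ]
    print(borda_aggregate(ranked_hits))
```

A positional scheme like this needs only the order of hits from each predictor, so scores on different scales (E-values, probabilities) never have to be compared directly.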
21
Le NQK, Ou YY. Prediction of FAD binding sites in electron transport proteins according to efficient radial basis function networks and significant amino acid pairs. BMC Bioinformatics 2016; 17:298. [PMID: 27475771 PMCID: PMC4967503 DOI: 10.1186/s12859-016-1163-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 11/13/2015] [Accepted: 07/22/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cellular respiration is a catabolic pathway for producing adenosine triphosphate (ATP) and is the most efficient process through which cells harvest energy from consumed food. When cells undergo cellular respiration, they require a pathway to keep and transfer electrons (i.e., the electron transport chain). Due to oxidation-reduction reactions, the electron transport chain produces a transmembrane proton electrochemical gradient. When protons flow back through this membrane, this mechanical energy is converted into chemical energy by ATP synthase. The conversion process is involved in producing ATP, which provides energy for many cellular processes. In the electron transport chain process, flavin adenine dinucleotide (FAD) is one of the most vital molecules for carrying and transferring electrons. Therefore, predicting FAD binding sites in the electron transport chain is vital for helping biologists understand the electron transport chain process and energy production in cells. RESULTS We used an independent data set to evaluate the performance of the proposed method, which had an accuracy of 69.84%. We compared the performance of the proposed method in analyzing two newly discovered electron transport protein sequences with that of the general FAD binding predictor presented by Mishra and Raghava and determined that the accuracy of the proposed method improved by 9-45% and its Matthews correlation coefficient was 0.14-0.5. Furthermore, the proposed method significantly reduced the number of false positives and can provide useful information for biologists. CONCLUSIONS We developed a method that is based on PSSM profiles and SAAPs for identifying FAD binding sites in newly discovered electron transport protein sequences. This approach achieved a significant improvement after we added SAAPs to PSSM features to analyze FAD binding proteins in the electron transport chain. The proposed method can serve as an effective tool for predicting FAD binding sites in electron transport proteins and can help biologists understand the functions of the electron transport chain, particularly those of FAD binding sites. We also developed a web server, available to academics, that identifies FAD binding sites in electron transporters.
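The two metrics quoted in the abstract above, accuracy and the Matthews correlation coefficient (MCC), both follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal sum is zero, a common convention for
    the otherwise undefined 0/0 case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts for a binding-site classifier: binding residues are
    # the positive class, non-binding residues the negative class.
    tp, tn, fp, fn = 40, 30, 20, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Binding residues are usually far rarer than non-binding ones, so accuracy alone can look good for a predictor that rarely predicts a binding site; MCC accounts for all four cells of the confusion matrix, which is why the two are often reported together.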
Affiliation(s)
- Nguyen-Quoc-Khanh Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
22
Ghouzam Y, Postic G, Guerin PE, de Brevern AG, Gelly JC. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci Rep 2016; 6:28268. [PMID: 27319297 PMCID: PMC4913311 DOI: 10.1038/srep28268] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/05/2016] [Accepted: 06/01/2016] [Indexed: 11/09/2022] Open
Abstract
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated web server based on a new strategy that performs this task. ORION identifies suitable templates using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, increasing the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local model quality estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Affiliation(s)
- Yassine Ghouzam
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Guillaume Postic
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Pierre-Edouard Guerin
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Alexandre G. de Brevern
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Jean-Christophe Gelly
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
23
Niedzialkowska E, Gasiorowska O, Handing KB, Majorek KA, Porebski PJ, Shabalin IG, Zasadzinska E, Cymborowski M, Minor W. Protein purification and crystallization artifacts: The tale usually not told. Protein Sci 2016; 25:720-33. [PMID: 26660914 PMCID: PMC4815408 DOI: 10.1002/pro.2861] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 10/01/2015] [Revised: 12/02/2015] [Accepted: 12/02/2015] [Indexed: 01/07/2023]
Abstract
The misidentification of a protein sample, or contamination of a sample with the wrong protein, may be a reason for the non-reproducibility of experiments. This problem may occur in the process of heterologous overexpression and purification of recombinant proteins, as well as purification of proteins from natural sources. If the contaminated or misidentified sample is used for crystallization, in many cases the problem may not be detected until structures are determined. In the case of functional studies, the problem may not be detected for years. Here, several procedures that can be successfully used to identify crystallized protein contaminants are presented, including: (i) a lattice parameter search against known structures, (ii) sequence or fold identification from partially built models, and (iii) molecular replacement with common contaminants as search templates. A list of common contaminant structures to be used as alternative search models is provided. These methods were used to identify four cases of purification and crystallization artifacts. This report provides troubleshooting pointers for researchers facing difficulties in phasing or model building.
Affiliation(s)
- Ewa Niedzialkowska
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Jerzy Haber Institute of Catalysis and Surface Chemistry, Polish Academy of Sciences, Niezapominajek 8, Krakow 30-239, Poland
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Olga Gasiorowska
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Katarzyna B. Handing
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Karolina A. Majorek
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, Illinois 60611
- Przemyslaw J. Porebski
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Ivan G. Shabalin
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, Illinois 60611
- New York Structural Genomics Research Consortium (NYSGRC), Bronx, New York 10461
- Ewelina Zasadzinska
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 6044, Charlottesville, Virginia 22908
- Marcin Cymborowski
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Jordan Hall, Room 4223, Charlottesville, Virginia 22908
- Midwest Center for Structural Genomics (MCSG), Argonne, Illinois 60439
- Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, Illinois 60611
- New York Structural Genomics Research Consortium (NYSGRC), Bronx, New York 10461
|
24
|
Zheng F, Robertson AP, Abongwa M, Yu EW, Martin RJ. The Ascaris suum nicotinic receptor, ACR-16, as a drug target: Four novel negative allosteric modulators from virtual screening. International Journal for Parasitology: Drugs and Drug Resistance 2016; 6:60-73. [PMID: 27054065 PMCID: PMC4805779 DOI: 10.1016/j.ijpddr.2016.02.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/06/2015] [Revised: 01/28/2016] [Accepted: 02/05/2016] [Indexed: 12/22/2022]
Abstract
Soil-transmitted helminth infections in humans and livestock cause significant debility, reduced productivity and economic losses globally. There are a limited number of effective anthelmintic drugs available for treating helminth infections, and their frequent use has led to the development of resistance in many parasite species. There is an urgent need for novel therapeutic drugs for treating these parasites. We have chosen the ACR-16 nicotinic acetylcholine receptor of Ascaris suum (Asu-ACR-16) as a drug target and have developed three-dimensional models of this transmembrane protein receptor to facilitate the search for new bioactive compounds. Using the human α7 nAChR chimeras and Torpedo marmorata nAChR for homology modeling, we defined orthosteric and allosteric binding sites on the Asu-ACR-16 receptor for virtual screening. We identified four ligands that bind to sites on Asu-ACR-16 and tested their activity using electrophysiological recording from Asu-ACR-16 receptors expressed in Xenopus oocytes. The four ligands were acetylcholine inhibitors (SB-277011-A, IC50, 3.12 ± 1.29 μM; (+)-butaclamol Cl, IC50, 9.85 ± 2.37 μM; fmoc-1, IC50, 10.00 ± 1.38 μM; fmoc-2, IC50, 16.67 ± 1.95 μM) that behaved like negative allosteric modulators. Our work illustrates a structure-based in silico screening method for seeking anthelmintic hits, which can then be tested electrophysiologically for further characterization. Highlights: Three-dimensional structural models of the Ascaris nicotinic (Asu-ACR-16) receptor made by homology modeling. High affinity ligands selected by in silico screening. Four ligands validated by electrophysiological studies as negative allosteric modulators.
Affiliation(s)
- Fudan Zheng
- Department of Chemistry, College of Liberal Arts and Sciences, Iowa State University, Ames, IA 50011, USA
| | - Alan P Robertson
- Department of Biomedical Sciences, College of Veterinary Medicine, Iowa State University, Ames, IA 50011, USA
| | - Melanie Abongwa
- Department of Biomedical Sciences, College of Veterinary Medicine, Iowa State University, Ames, IA 50011, USA
| | - Edward W Yu
- Department of Chemistry, College of Liberal Arts and Sciences, Iowa State University, Ames, IA 50011, USA; Department of Physics and Astronomy, College of Liberal Arts and Sciences, Iowa State University, Ames, IA 50011, USA
| | - Richard J Martin
- Department of Biomedical Sciences, College of Veterinary Medicine, Iowa State University, Ames, IA 50011, USA.
|
25
|
Structural characterization of ANGPTL8 (betatrophin) with its interacting partner lipoprotein lipase. Comput Biol Chem 2016; 61:210-20. [PMID: 26908254 DOI: 10.1016/j.compbiolchem.2016.01.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/19/2015] [Revised: 01/07/2016] [Accepted: 01/21/2016] [Indexed: 12/20/2022]
Abstract
Angiopoietin-like protein 8 (ANGPTL8) (also known as betatrophin) is a newly identified secretory protein with a potential role in autophagy, lipid metabolism and pancreatic beta-cell proliferation. Its structural characterization is required to enhance our current understanding of its mechanism of action, which could help in identifying its receptor and/or other binding partners. Given the physiological significance of ANGPTL8 and the need to explore its structural features, the present study aims to model the structure of ANGPTL8 and study its possible interactions with lipoprotein lipase (LPL). To the best of our knowledge, this is the first attempt to predict the 3-dimensional (3D) structure of ANGPTL8. Three different approaches were used for modeling ANGPTL8: homology modeling, de novo structure prediction, and a combination of the two, followed by structure verification using ERRAT, ProSA, QMEAN and Ramachandran plot scores. The selected models of ANGPTL8 were further evaluated for protein-protein interaction (PPI) analysis with LPL using the CPORT and HADDOCK servers. Our results have shown that the crystal structure of the iSH2 domain of the phosphatidylinositol 3-kinase (PI3K) p85β subunit (PDB entry: 3mtt) is a good candidate for homology modeling of ANGPTL8. Analysis of inter-molecular interactions between the structure of ANGPTL8 and LPL revealed the existence of several non-covalent interactions. The residues of LPL involved in these interactions belong to its lid region, thrombospondin (TSP) region and heparin binding site, which is suggestive of a possible role of ANGPTL8 in regulating the proteolysis, motility and localization of LPL. In addition, the conserved residues of the SE1 region of ANGPTL8 formed interactions with the residues around the hinge region of LPL.
Overall, our results support a model of inhibition of LPL by ANGPTL8 through the steric block of its catalytic site which will be further explored using wet lab studies in future.
|
26
|
A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11. BMC Bioinformatics 2015; 16:337. [PMID: 26493701 PMCID: PMC4619059 DOI: 10.1186/s12859-015-0775-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/29/2015] [Accepted: 10/14/2015] [Indexed: 11/10/2022] Open
Abstract
Background: With more and more protein sequences produced in the genomic era, predicting protein structures from sequences becomes very important for elucidating the molecular details and functions of these proteins for biomedical research. Traditional template-based protein structure prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining best templates, alignments, and models. Methods: We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of protein structure prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target protein. In the second step, it used a large number of protein model quality assessment methods to evaluate and rank the models in the protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking. Results: The method was implemented as two protein structure prediction servers, MULTICOM-CONSTRUCT and MULTICOM-CLUSTER, that participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the best 10 server predictors. Conclusions: The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0775-x) contains supplementary material, which is available to authorized users.
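The second step described in the abstract above (ranking a pool of candidate models with many quality assessment methods, while tolerating failures) can be illustrated with a minimal sketch. Averaging per-method ranks and skipping any method that returned no scores are assumptions made here for illustration only; the abstract does not say how the MULTICOM servers combine their assessments, and the names `consensus_rank`, `qa_a`, `qa_b`, and `qa_c` are hypothetical.

```python
from statistics import mean


def consensus_rank(scores_by_method):
    """Order model IDs by their average rank across QA methods (best first).

    scores_by_method: {method_name: {model_id: score}}, higher score = better.
    Methods with no scores are skipped (a stand-in for failure handling).
    """
    per_method_ranks = []
    for scores in scores_by_method.values():
        if not scores:
            continue  # this QA method failed or returned nothing
        ordered = sorted(scores, key=scores.get, reverse=True)
        per_method_ranks.append({m: r for r, m in enumerate(ordered, start=1)})
    if not per_method_ranks:
        return []
    # Only models scored by every remaining method are comparable.
    models = set.intersection(*(set(r) for r in per_method_ranks))
    avg = {m: mean(r[m] for r in per_method_ranks) for m in models}
    # Ties on average rank are broken by model ID for determinism.
    return sorted(models, key=lambda m: (avg[m], m))


pool_scores = {
    "qa_a": {"m1": 0.8, "m2": 0.6, "m3": 0.7},
    "qa_b": {"m1": 0.7, "m2": 0.9, "m3": 0.6},
    "qa_c": {},  # simulated failure of one QA method
}
ranking = consensus_rank(pool_scores)
```

Rank averaging is used in this sketch because it needs no calibration between scores that live on different scales; a weighted or learned combination would be an equally plausible choice.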
|
27
|
Tong J, Pei J, Grishin NV. SFESA: a web server for pairwise alignment refinement by secondary structure shifts. BMC Bioinformatics 2015; 16:282. [PMID: 26335387 PMCID: PMC4558796 DOI: 10.1186/s12859-015-0711-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/20/2015] [Accepted: 08/19/2015] [Indexed: 12/01/2022] Open
Abstract
Background: Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. Results: We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. Conclusions: SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
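The block-shift idea in the abstract above (try small local shifts of one alignment block and keep the best-scoring variant) can be sketched as follows. This is a toy illustration, not SFESA's implementation: the residue-identity score stands in for the combined profile-profile and contact-energy score the abstract describes, and `best_shift`, `q_offset`, and `max_shift` are hypothetical names.

```python
def best_shift(query, template, block_start, block_end, q_offset=0,
               max_shift=2, score=None):
    """Return (shift, total_score) for the best local shift of one block.

    The block is template[block_start:block_end]; in the input alignment it
    is paired with the query segment starting at block_start + q_offset.
    """
    if score is None:
        # Toy per-residue score: 1 for an identical residue, 0 otherwise.
        score = lambda a, b: 1 if a == b else 0
    block = template[block_start:block_end]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        q_start = block_start + q_offset + shift
        q_end = q_start + len(block)
        if q_start < 0 or q_end > len(query):
            continue  # shifted segment would run off the query
        total = sum(score(a, b) for a, b in zip(query[q_start:q_end], block))
        # Prefer a higher score; on ties prefer the smaller displacement.
        if (best is None or total > best[1]
                or (total == best[1] and abs(shift) < abs(best[0]))):
            best = (shift, total)
    return best


# The helix-like block "KLVEEL" sits one position earlier in the query.
result = best_shift("GGKLVEELGGGG", "GGGKLVEELGGG", 3, 9)
```

In this example the unshifted pairing matches only one residue, while shifting the query segment by one position to the left matches all six.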
Affiliation(s)
- Jing Tong
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
| | - Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
| | - Nick V Grishin
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
|
28
|
Zepeda Mendoza ML, Sicheritz-Pontén T, Gilbert MTP. Environmental genes and genomes: understanding the differences and challenges in the approaches and software for their analyses. Brief Bioinform 2015; 16:745-58. [PMID: 25673291 PMCID: PMC4570204 DOI: 10.1093/bib/bbv001] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 11/07/2014] [Revised: 12/16/2014] [Indexed: 01/19/2023] Open
Abstract
DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated their own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques they have begun to share many aspects in data set generation and processing. In this review we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets.
|
29
|
Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015; 31:3782-9. [PMID: 26254434 DOI: 10.1093/bioinformatics/btv462] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/04/2015] [Accepted: 08/02/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology. Results: Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ∼10% more templates at fold and superfamily SCOP levels. Availability: Software freely available for download at http://www.dsimb.inserm.fr/orion/. Contact: jean-christophe.gelly@univ-paris-diderot.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yassine Ghouzam
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
| | - Guillaume Postic
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
| | - Alexandre G de Brevern
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
| | - Jean-Christophe Gelly
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
|
30
|
Tong J, Sadreyev RI, Pei J, Kinch LN, Grishin NV. Using homology relations within a database markedly boosts protein sequence similarity search. Proc Natl Acad Sci U S A 2015; 112:7003-8. [PMID: 26038555 PMCID: PMC4460465 DOI: 10.1073/pnas.1424324112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open
Abstract
Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
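The central idea of the abstract above (judge a query-hit pair not only by their direct similarity but also by how similar the query is to the hit's known homologs) can be sketched in a few lines. This is a hedged illustration rather than the COMPADRE algorithm: taking the plain mean over the hit and its homologs is an assumption, and `relationship_score`, `toy_pair_score`, and the toy score table are hypothetical.

```python
def relationship_score(query, hit, pair_score, known_homologs):
    """Average the query's similarity to a hit and to the hit's known homologs.

    pair_score(a, b) -> float is any pairwise similarity measure;
    known_homologs maps a database entry to a list of its known homologs.
    """
    related = [hit, *known_homologs.get(hit, ())]
    return sum(pair_score(query, member) for member in related) / len(related)


# Toy pairwise similarities; missing pairs count as 0.0.
toy_scores = {
    ("q", "hitA"): 0.2, ("q", "a1"): 0.9, ("q", "a2"): 0.7,
    ("q", "hitB"): 0.3,
}


def toy_pair_score(a, b):
    return toy_scores.get((a, b), 0.0)


homologs = {"hitA": ["a1", "a2"]}
score_a = relationship_score("q", "hitA", toy_pair_score, homologs)
score_b = relationship_score("q", "hitB", toy_pair_score, homologs)
```

On direct similarity alone hitB (0.3) would outrank hitA (0.2); once the homologs of hitA are taken into account, hitA scores higher, which is the kind of re-ranking the abstract describes.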
Affiliation(s)
- Jing Tong
- Department of Molecular Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050
| | - Ruslan I Sadreyev
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114; Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
| | - Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050
| | - Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050
| | - Nick V Grishin
- Department of Molecular Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050
|
31
|
Alternative approach to protein structure prediction based on sequential similarity of physical properties. Proc Natl Acad Sci U S A 2015; 112:5029-32. [PMID: 25848034 DOI: 10.1073/pnas.1504806112] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI-BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI-BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI-BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI-BLAST in informatically challenging targets.
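To make the contrast between residue identity and physical-property similarity concrete, the sketch below compares two short, equal-length sequences both ways. It is not the property factor method itself: the abstract does not list the property factors used, so the Kyte-Doolittle hydropathy scale is substituted as a single stand-in property, and `property_profile`, `pearson`, and `identity` are hypothetical helper names.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values (one stand-in physical property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def property_profile(seq):
    """Map a sequence to its per-residue property values."""
    return [KYTE_DOOLITTLE[res] for res in seq]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def identity(a, b):
    """Fraction of positions with identical residues (ungapped)."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


seq1, seq2 = "LIVE", "IVLD"
seq_identity = identity(seq1, seq2)
prop_similarity = pearson(property_profile(seq1), property_profile(seq2))
```

The two toy sequences share no identical residue at any position, yet their hydropathy profiles are almost perfectly correlated, which mirrors the abstract's point that low sequence identity can coexist with high physical-property similarity.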
|
32
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
33
|
Wang YL, Lin YT, Chen CL, Shaw GC, Liaw SH. Crystallization and preliminary crystallographic analysis of poly(3-hydroxybutyrate) depolymerase from Bacillus thuringiensis. Acta Crystallographica Section F: Structural Biology Communications 2014; 70:1421-3. [PMID: 25286954 DOI: 10.1107/s2053230x14019347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/23/2014] [Accepted: 08/26/2014] [Indexed: 11/10/2022]
Abstract
Poly[(R)-3-hydroxybutyrate] (PHB) is a microbial biopolymer that has been commercialized as biodegradable plastics. The key enzyme for the degradation is PHB depolymerase (PhaZ). A new intracellular PhaZ from Bacillus thuringiensis (BtPhaZ) has been screened for potential applications in polymer biodegradation. Recombinant BtPhaZ was crystallized using 25% polyethylene glycol 3350, 0.2 M ammonium acetate, 0.1 M bis-tris pH 6.5 at 288 K. The crystals belonged to space group P1, with unit-cell parameters a = 42.97, b = 83.23, c = 85.50 Å, α = 73.45, β = 82.83, γ = 83.49°. An X-ray diffraction data set was collected to 1.42 Å resolution with an Rmerge of 6.4%. Unexpectedly, a molecular-replacement solution was obtained using the crystal structure of Streptomyces lividans chloroperoxidase as a template, which shares 24% sequence identity to BtPhaZ. This is the first crystal structure of an intracellular poly(3-hydroxybutyrate) depolymerase.
Affiliation(s)
- Yung Lin Wang
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 11221, Taiwan
| | - Yi Ting Lin
- Department of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei 11221, Taiwan
| | - Chia Lin Chen
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 11221, Taiwan
| | - Gwo Chyuan Shaw
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 11221, Taiwan
| | - Shwu Huey Liaw
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 11221, Taiwan
|
34
|
A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep 2014; 3:2619. [PMID: 24018415 PMCID: PMC3965362 DOI: 10.1038/srep02619] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Received: 07/01/2013] [Accepted: 08/22/2013] [Indexed: 11/08/2022] Open
Abstract
Protein sequence alignment is essential for template-based protein structure prediction and function annotation. We collect 20 sequence alignment algorithms, 10 published and 10 newly developed, which cover all representative sequence- and profile-based alignment approaches. These algorithms are benchmarked on 538 non-redundant proteins for protein fold-recognition on a uniform template library. Results demonstrate dominant advantage of profile-profile based methods, which generate models with average TM-score 26.5% higher than sequence-profile methods and 49.8% higher than sequence-sequence alignment methods. There is no obvious difference in results between methods with profiles generated from PSI-BLAST PSSM matrix and hidden Markov models. Accuracy of profile-profile alignments can be further improved by 9.6% or 21.4% when predicted or native structure features are incorporated. Nevertheless, TM-scores from profile-profile methods including experimental structural features are still 37.1% lower than that from TM-align, demonstrating that the fold-recognition problem cannot be solved solely by improving accuracy of structure feature predictions.
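The abstract above reports its comparisons in terms of TM-score, so a short sketch of how that score is computed for one fixed superposition may help. The formula used is the standard one, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8, where L is the target length and d_i the distance between the i-th pair of aligned residues. The full TM-score is the maximum over superpositions, which this sketch does not search for; the fallback d0 = 0.5 Å for short chains is an assumption borrowed from common implementations, and `tm_d0` and `tm_score` are hypothetical names.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length > 21:
        return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return 0.5  # assumed floor for very short chains


def tm_score(distances, target_length):
    """TM-score of one given superposition.

    distances: distances (angstroms) between aligned residue pairs.
    Unaligned target residues contribute nothing, so the score is
    normalised by the target length rather than the aligned length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)          # every residue superposed exactly
half_cover = tm_score([0.0] * 50, 100)        # only half the target aligned
at_d0 = tm_score([tm_d0(100)] * 100, 100)     # every pair exactly d0 apart
```

Because each term is bounded by 1 and the sum is divided by the target length, the score lies between 0 and 1 regardless of protein size, which is what makes percentage comparisons between alignment methods meaningful.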
|
35
|
PvdP is a tyrosinase that drives maturation of the pyoverdine chromophore in Pseudomonas aeruginosa. J Bacteriol 2014; 196:2681-90. [PMID: 24816606 DOI: 10.1128/jb.01376-13] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/20/2022] Open
Abstract
The iron binding siderophore pyoverdine constitutes a major adaptive factor contributing to both virulence and survival in fluorescent pseudomonads. For decades, pyoverdine production has allowed the identification and classification of fluorescent and nonfluorescent pseudomonads. Here, we demonstrate that PvdP, a periplasmic enzyme of previously unknown function, is a tyrosinase required for the maturation of the pyoverdine chromophore in Pseudomonas aeruginosa. PvdP converts the nonfluorescent ferribactin, containing two iron binding groups, into a fluorescent pyoverdine, forming a strong hexadentate complex with ferrous iron, by three consecutive oxidation steps. PvdP represents the first characterized member of a small family of tyrosinases present in fluorescent pseudomonads that are required for siderophore maturation and are capable of acting on large peptidic substrates.
|
36
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
37
|
Ma J, Wang S, Wang Z, Xu J. MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput Biol 2014; 10:e1003500. [PMID: 24675572 PMCID: PMC3967925 DOI: 10.1371/journal.pcbi.1003500] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 10/27/2013] [Accepted: 01/08/2014] [Indexed: 11/24/2022] Open
Abstract
Sequence-based protein homology detection has been extensively studied and so far the most sensitive method is based upon comparison of protein sequence profiles, which are derived from multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method MRFalign, consisting of three key components: 1) a Markov Random Fields (MRF) representation of a protein family; 2) a scoring function measuring similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning two MRFs. Compared to HMM that can only model very short-range residue correlation, MRFs can model long-range residue interaction pattern and thus, encode information for the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection shall be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM or PSSM-based methods in terms of both alignment accuracy and remote homology detection and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. 
Sequence-based protein homology detection has been extensively studied, but it remains very challenging for remote homologs with divergent sequences. So far the most sensitive methods employ HMM-HMM comparison, which models a protein family using an HMM (Hidden Markov Model) and then detects homologs using HMM-HMM alignment. An HMM cannot model long-range residue interaction patterns and thus carries very little information regarding the global 3D structure of a protein family. As such, HMM comparison is not sensitive enough for distantly related homologs. In this paper, we present an MRF-MRF comparison method for homology detection. In particular, we model a protein family using Markov Random Fields (MRF) and then detect homologs by MRF-MRF alignment. Compared to HMMs, MRFs are able to model long-range residue interaction patterns and thus contain information about the overall 3D structure of a protein family. Consequently, MRF-MRF comparison is much more sensitive than HMM-HMM comparison. To implement MRF-MRF comparison, we have developed a new scoring function to measure the similarity of two MRFs and also an efficient ADMM algorithm to optimize the scoring function. Experiments confirm that MRF-MRF comparison indeed outperforms HMM-HMM comparison in terms of both alignment accuracy and remote homology detection, especially for mainly beta proteins.
Affiliation(s)
- Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
| | - Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
| | - Zhiyong Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
|
38
|
Ahmed MH, Kellogg GE, Selley DE, Safo MK, Zhang Y. Predicting the molecular interactions of CRIP1a-cannabinoid 1 receptor with integrated molecular modeling approaches. Bioorg Med Chem Lett 2014; 24:1158-65. [PMID: 24461351 PMCID: PMC4353595 DOI: 10.1016/j.bmcl.2013.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/11/2013] [Revised: 12/26/2013] [Accepted: 12/29/2013] [Indexed: 12/14/2022]
Abstract
Cannabinoid receptors are a family of G-protein coupled receptors that are involved in a wide variety of physiological processes and diseases. One of the key regulators that are unique to cannabinoid receptors is the cannabinoid receptor interacting proteins (CRIPs). Among them CRIP1a was found to decrease the constitutive activity of the cannabinoid type-1 receptor (CB1R). The aim of this study is to gain an understanding of the interaction between CRIP1a and CB1R through using different computational techniques. The generated model demonstrated several key putative interactions between CRIP1a and CB1R, including the critical involvement of Lys130 in CRIP1a.
Affiliation(s)
- Mostafa H Ahmed
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
- Glen E Kellogg
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23298, USA
- Dana E Selley
  - Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA 23298, USA
- Martin K Safo
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
- Yan Zhang
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA
|
39
|
Kodavali PK, Dudkiewicz M, Pikuła S, Pawłowski K. Bioinformatics analysis of bacterial annexins--putative ancestral relatives of eukaryotic annexins. PLoS One 2014; 9:e85428. [PMID: 24454864 PMCID: PMC3894181 DOI: 10.1371/journal.pone.0085428] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/04/2013] [Accepted: 12/03/2013] [Indexed: 11/19/2022] Open
Abstract
Annexins are Ca(2+)-binding, membrane-interacting proteins, widespread among eukaryotes, consisting usually of four structurally similar repeated domains. It is accepted that vertebrate annexins derive from a double genome duplication event. It has been postulated that a single domain annexin, if found, might represent a molecule related to the hypothetical ancestral annexin. The recent discovery of a single-domain annexin in a bacterium, Cytophaga hutchinsonii, apparently confirmed this hypothesis. Here, we present a more complex picture. Using remote sequence similarity detection tools, a survey of bacterial genomes was performed in search of annexin-like proteins. In total, we identified about thirty annexin homologues, including single-domain and multi-domain annexins, in seventeen bacterial species. The thorough search yielded, besides the known annexin homologue from C. hutchinsonii, homologues from the Bacteroidetes/Chlorobi phylum, from Gemmatimonadetes, from beta- and delta-Proteobacteria, and from Actinobacteria. The sequences of bacterial annexins exhibited remote but statistically significant similarity to sequence profiles built of the eukaryotic ones. Some bacterial annexins are equipped with additional, different domains, for example those characteristic for toxins. The variation in bacterial annexin sequences, much wider than that observed in eukaryotes, and different domain architectures suggest that annexins found in bacteria may actually descend from an ancestral bacterial annexin, from which eukaryotic annexins also originate. The hypothesis of an ancient origin of bacterial annexins has to be reconciled with the fact that remarkably few bacterial strains possess annexin genes compared to the thousands of known bacterial genomes and with the patchy, anomalous phylogenetic distribution of bacterial annexins. Thus, a massive annexin gene loss in several bacterial lineages or very divergent evolution would appear a likely explanation. 
Alternative evolutionary scenarios, involving horizontal gene transfer between bacteria and protozoan eukaryotes, in either direction, appear much less likely. Altogether, current evidence does not allow unequivocal judgement as to the origin of bacterial annexins.
Affiliation(s)
- Praveen Kumar Kodavali
  - Department of Biochemistry, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Małgorzata Dudkiewicz
  - Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Sławomir Pikuła
  - Department of Biochemistry, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Krzysztof Pawłowski
  - Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
|
40
|
Hermann C, Strittmatter LM, Deane JE, Boyle LH. The binding of TAPBPR and Tapasin to MHC class I is mutually exclusive. J Immunol 2013; 191:5743-50. [PMID: 24163410 DOI: 10.4049/jimmunol.1300929] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/19/2022]
Abstract
The loading of peptide Ags onto MHC class I molecules is a highly controlled process in which the MHC class I-dedicated chaperone tapasin is a key player. We recently identified a tapasin-related molecule, TAPBPR, as an additional component in the MHC class I Ag-presentation pathway. In this study, we show that the amino acid residues important for tapasin to interact with MHC class I are highly conserved on TAPBPR. We identify specific residues in the N-terminal and C-terminal domains of TAPBPR involved in associating with MHC class I. Furthermore, we demonstrate that residues on MHC class I crucial for its association with tapasin, such as T134, are also essential for its interaction with TAPBPR. Taken together, the data indicate that TAPBPR and tapasin bind in a similar orientation to the same face of MHC class I. In the absence of tapasin, the association of MHC class I with TAPBPR is increased. However, in the absence of TAPBPR, the interaction between MHC class I and tapasin does not increase. In light of our findings, previous data determining the function of tapasin in the MHC class I Ag-processing and presentation pathway must be re-evaluated.
Affiliation(s)
- Clemens Hermann
  - Department of Pathology, Cambridge Institute of Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom
|
41
|
Xu D, Jaroszewski L, Li Z, Godzik A. FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics 2013; 30:660-7. [PMID: 24130308 DOI: 10.1093/bioinformatics/btt578] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Homology detection enables grouping proteins into families and prediction of their structure and function. The range of application of homology-based predictions can be significantly extended by using sequence profiles and incorporating local structural features. However, incorporation of the latter terms varies widely between existing methods and, together with many examples of distant relations not recognized even by the best methods, suggests that further improvements are still possible. RESULTS: Here we describe recent improvements to the fold and function assignment system (FFAS) method, including adding optimized structural features (experimental or predicted), 'symmetrical' Z-score calculation and re-ranking the templates with a neural network. The alignment accuracy of the new FFAS-3D is now 11% higher than that of the original and comparable with the most accurate template-based structure prediction algorithms. At the same time, FFAS-3D has a high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. Importantly, FFAS-3D results are not highly correlated with those of other programs, suggesting that it may significantly improve meta-predictions. FFAS-3D does not require 3D structures of the templates, as using predicted features instead of structure-derived ones does not decrease accuracy. Because of that, FFAS-3D can be used for databases other than the Protein Data Bank (PDB), such as the Protein families database (Pfam) or Clusters of Orthologous Groups (COG), thus extending its applications to functional annotations of genomes and protein families. AVAILABILITY AND IMPLEMENTATION: FFAS-3D is available at http://ffas.godziklab.org.
Affiliation(s)
- Dong Xu
  - Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093-0446, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Fahad Medical Research Center, King Abdulaziz University, P.O. Box 80216, Jeddah 21589, Kingdom of Saudi Arabia
|
42
|
Van Voorst JR, Finzel BC. Searching for likeness in a database of macromolecular complexes. J Chem Inf Model 2013; 53:2634-47. [PMID: 24047445 DOI: 10.1021/ci4002537] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
A software tool and workflow based on distance geometry are presented that can be used to search for local similarity in substructures in a comprehensive database of experimentally derived macromolecular structures. The method does not rely on fold annotation, specific secondary structure assignments, or sequence homology and may be used to locate compound substructures of multiple segments spanning different macromolecules that share a queried backbone geometry. This generalized substructure searching capability is intended to allow users to play an active part in exploring the role specific substructures play in larger protein domains, quaternary assemblies of proteins, and macromolecular complexes of proteins and polynucleotides. The user may select any portion or portions of an existing structure or complex to serve as a template for searching, and other structures that share the same structural features are identified, retrieved and overlaid to emphasize substructural likeness. Matching structures may be compared using a variety of integrated tools, including molecular graphics for structure visualization and sequence logos of the matching substructures. A number of examples are provided that illustrate how generalized substructure searching may be used to understand both the similarity and the individuality of specific macromolecular structures. Web-based access to our substructure searching services is freely available at https://drugsite.msi.umn.edu.
Affiliation(s)
- Jeffrey R Van Voorst
  - Department of Medicinal Chemistry, University of Minnesota College of Pharmacy, Minneapolis, Minnesota 55455, United States
|
43
|
A novel predicted calcium-regulated kinase family implicated in neurological disorders. PLoS One 2013; 8:e66427. [PMID: 23840464 PMCID: PMC3696010 DOI: 10.1371/journal.pone.0066427] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 12/20/2012] [Accepted: 05/08/2013] [Indexed: 12/03/2022] Open
Abstract
The catalogues of protein kinases, the essential effectors of cellular signaling, have been charted in Metazoan genomes for a decade now. Yet, surprisingly, using bioinformatics tools, we predicted a protein kinase structure for proteins coded by five related human genes and their Metazoan homologues, the FAM69 family. Analysis of three-dimensional structure models and conservation of the classic catalytic motifs of protein kinases present in four out of five human FAM69 proteins suggests they might have retained catalytic phosphotransferase activity. An EF-hand Ca2+-binding domain in FAM69A and FAM69B proteins, inserted within the structure of the kinase domain, suggests they may function as Ca2+-dependent kinases. The FAM69 genes, FAM69A, FAM69B, FAM69C, C3ORF58 (DIA1) and CXORF36 (DIA1R), are by and large uncharacterised molecularly, yet linked to several neurological disorders in genetics studies. The C3ORF58 gene is found deleted in autism, and its product resides in the Golgi. Unusually high cysteine content and presence of signal peptides in some of the family members suggest that FAM69 proteins may be involved in phosphorylation of proteins in the secretory pathway and/or of extracellular proteins.
|
44
|
Lenart A, Dudkiewicz M, Grynberg M, Pawłowski K. CLCAs - a family of metalloproteases of intriguing phylogenetic distribution and with cases of substituted catalytic sites. PLoS One 2013; 8:e62272. [PMID: 23671590 PMCID: PMC3650047 DOI: 10.1371/journal.pone.0062272] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/20/2012] [Accepted: 03/19/2013] [Indexed: 01/08/2023] Open
Abstract
The zinc-dependent metalloproteases with the His-Glu-x-x-His (HExxH) active site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions, and found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families in which a majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of zincin fold. Thus, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains with the HExxH motif strictly conserved, and those with sporadic occurrences, intermediate families are identified that contain some members with a conserved HExxH motif, but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggest an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
Affiliation(s)
- Anna Lenart
  - Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Małgorzata Dudkiewicz
  - Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Marcin Grynberg
  - Department of Genetics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Krzysztof Pawłowski
  - Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
  - Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
|
45
|
Cappelli A, Manini M, Valenti S, Castriconi F, Giuliani G, Anzini M, Brogi S, Butini S, Gemma S, Campiani G, Giorgi G, Mennuni L, Lanza M, Giordani A, Caselli G, Letari O, Makovec F. Synthesis and structure–activity relationship studies in serotonin 5-HT1A receptor agonists based on fused pyrrolidone scaffolds. Eur J Med Chem 2013; 63:85-94. [DOI: 10.1016/j.ejmech.2013.01.044] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/12/2012] [Revised: 01/14/2013] [Accepted: 01/17/2013] [Indexed: 11/25/2022]
|
46
|
Lenart A, Pawłowski K. Intersection of selenoproteins and kinase signalling. Biochim Biophys Acta Proteins Proteom 2013; 1834:1279-84. [PMID: 23541531 DOI: 10.1016/j.bbapap.2013.03.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/21/2013] [Revised: 03/16/2013] [Accepted: 03/19/2013] [Indexed: 11/28/2022]
Abstract
The small, obscure group of selenoprotein oxidoreductases and the huge clan of kinases, the workhorses of cellular signalling, are rarely discussed together. Focusing on selenoproteins of unknown structures, we predict a thioredoxin-like fold for the Selenoprotein N (SelN) family and use the structure to rationalise effects of the myopathy-linked mutations in the gene coding SelN. Discussing the recent prediction of a protein kinase-like domain in Selenoprotein O (SelO), we reiterate evidence for an oxidoreductase function alongside the predicted kinase domain. Thus, we propose that SelO, the strongly conserved kinase-cum-tentative-oxidoreductase, may reflect oxidoreductase regulation of kinase networks. Also, we use bibliometric and systems biology approaches to explore the kinase-selenoprotein relationships that begin to emerge from the literature. This article is part of a Special Issue entitled: Inhibitors of Protein Kinases (2012).
Affiliation(s)
- Anna Lenart
  - Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
|
47
|
Latek D, Pasznik P, Carlomagno T, Filipek S. Towards improved quality of GPCR models by usage of multiple templates and profile-profile comparison. PLoS One 2013; 8:e56742. [PMID: 23468878 PMCID: PMC3585245 DOI: 10.1371/journal.pone.0056742] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 10/30/2012] [Accepted: 01/14/2013] [Indexed: 11/19/2022] Open
Abstract
G-protein coupled receptors (GPCRs) are targets of nearly one third of the drugs on the current pharmaceutical market. Despite their importance in many cellular processes, crystal structures are available for fewer than 20 unique GPCRs of the Rhodopsin-like class. Fortunately, even though involved in different signaling cascades, this large group of membrane proteins has preserved a uniform structure comprising seven transmembrane helices that allows quite reliable comparative modeling. Nevertheless, low sequence similarity between the GPCR family members is still a serious obstacle not only in template selection but also in providing theoretical models of acceptable quality. An additional level of difficulty is the prediction of kinks and bulges in transmembrane helices. Usage of multiple templates and generation of alignments based on sequence profiles may increase the rate of success in difficult cases of comparative modeling in which the sequence similarity between GPCRs is exceptionally low. Here, we present GPCRM, a novel method for fast and accurate generation of GPCR models using averaging of multiple template structures and profile-profile comparison. In particular, GPCRM is the first GPCR structure predictor incorporating two distinct loop modeling techniques, Modeller and Rosetta, together with the filtering of models based on the Z-coordinate. We tested our approach on all unique GPCR structures determined to date and report its performance in comparison with other computational methods targeting the Rhodopsin-like class. We also provide a database of precomputed GPCR models of the human receptors from that class. AVAILABILITY: GPCRM server and database: http://gpcrm.biomodellab.eu.
Affiliation(s)
- Dorota Latek
  - International Institute of Molecular and Cell Biology, Warsaw, Poland
- Pawel Pasznik
  - International Institute of Molecular and Cell Biology, Warsaw, Poland
- Teresa Carlomagno
  - EMBL, Structural and Computational Biology Unit, Heidelberg, Germany
- Slawomir Filipek
  - Faculty of Chemistry, University of Warsaw, Warsaw, Poland
|
48
|
Zawaira A, Shibayama Y. A simple recipe for the non-expert bioinformaticist for building experimentally-testable hypotheses for proteins with no known homologs. J Struct Funct Genomics 2012; 13:185-200. [PMID: 22956349 DOI: 10.1007/s10969-012-9141-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2012] [Accepted: 08/08/2012] [Indexed: 06/01/2023]
Abstract
The study of the protein-protein interactions (PPIs) of unique ORFs is a strategy for deciphering their biological roles. For uniform reference, we define unique ORFs as those for which no matching protein is found after a PDB-BLAST search with default parameters. The uniqueness of the ORFs generally precludes the straightforward use of structure-based approaches in the design of experiments to explore PPIs. Many open-source bioinformatics tools, from the commonly used to the relatively esoteric, have been built and validated to perform analyses and/or predictions of various sorts on proteins. How can these available tools be combined into a protocol that helps the non-expert bioinformaticist researcher to design experiments to explore the PPIs of their unique ORF? Here we define a pragmatic protocol based on accessibility of software to achieve this, and we make it concrete by applying it to two proteins, the ImuB and ImuA' proteins from Mycobacterium tuberculosis. The protocol is pragmatic in that decisions are made largely based on the availability of easy-to-use freeware. We define the following basic and user-friendly software pathway to build testable PPI hypotheses for a query protein sequence: PSI-PRED → MUSTER → metaPPISP → ASAView and ConSurf. Where possible, other analytical and/or predictive tools may be included. Our protocol combines the software predictions and analyses with general bioinformatics principles to arrive at consensus, prioritised and testable PPI hypotheses.
Affiliation(s)
- Alexander Zawaira
  - Gene Expression and Biophysics Group, Synthetic Biology, ERA, CSIR Biosciences, Brummeria, Pretoria, South Africa
|
49
|
Yadav S, Kushwaha HR, Kumar K, Verma PK. Comparative structural modeling of a monothiol GRX from chickpea: Insight in iron–sulfur cluster assembly. Int J Biol Macromol 2012; 51:266-73. [DOI: 10.1016/j.ijbiomac.2012.05.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 02/04/2012] [Revised: 04/27/2012] [Accepted: 05/11/2012] [Indexed: 01/12/2023]
|
50
|
Kumar A, Möcklinghoff S, Yumoto F, Jaroszewski L, Farr CL, Grzechnik A, Nguyen P, Weichenberger CX, Chiu HJ, Klock HE, Elsliger MA, Deacon AM, Godzik A, Lesley SA, Conklin BR, Fletterick RJ, Wilson IA. Structure of a novel winged-helix like domain from human NFRKB protein. PLoS One 2012; 7:e43761. [PMID: 22984442 PMCID: PMC3439487 DOI: 10.1371/journal.pone.0043761] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/20/2012] [Accepted: 07/24/2012] [Indexed: 01/26/2023] Open
Abstract
The human nuclear factor related to kappa-B-binding protein (NFRKB) is a 1299-residue protein that is a component of the metazoan INO80 complex involved in chromatin remodeling, transcription regulation, DNA replication and DNA repair. Although full length NFRKB is predicted to be around 65% disordered, comparative sequence analysis identified several potentially structured sections in the N-terminal region of the protein. These regions were targeted for crystallographic studies, and the structure of one of these regions spanning residues 370-495 was determined using the JCSG high-throughput structure determination pipeline. The structure reveals a novel, mostly helical domain reminiscent of the winged-helix fold typically involved in DNA binding. However, further analysis shows that this domain does not bind DNA, suggesting it may belong to a small group of winged-helix domains involved in protein-protein interactions.
Affiliation(s)
- Abhinav Kumar
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California, United States of America
- Sabine Möcklinghoff
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Fumiaki Yumoto
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
  - Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America
- Lukasz Jaroszewski
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Carol L. Farr
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Anna Grzechnik
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Phuong Nguyen
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Christian X. Weichenberger
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Hsiu-Ju Chiu
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California, United States of America
- Heath E. Klock
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Marc-André Elsliger
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Ashley M. Deacon
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California, United States of America
- Adam Godzik
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, California, United States of America
- Scott A. Lesley
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
  - Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Bruce R. Conklin
  - Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America
  - Departments of Medicine and Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California, United States of America
- Robert J. Fletterick
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Ian A. Wilson
  - Joint Center for Structural Genomics, La Jolla, California, United States of America
  - Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
|