1
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The prediction of the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
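To make the notion of a contact map in the abstract above concrete, here is a minimal sketch that derives a binary contact map from one representative atom position per residue. The 8 Å cutoff is a widely used convention and is an assumption here, not a value taken from the chapter; the function name `contact_map`, its parameters, and the toy coordinates are hypothetical.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix of residue pairs closer than `threshold` (in angstroms).

    `coords` holds one (x, y, z) point per residue, e.g. a C-beta position.
    The 8.0 default is an assumed convention, not a value from the source.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs closer in sequence than `min_separation`.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy coordinates for illustration only; not taken from any real structure.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (3.8, 5.0, 0.0),
]
toy_map = contact_map(toy_coords)
```

A threading method would compare a map like this, predicted for the query, against maps computed from template structures; that scoring step is outside the scope of this sketch.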
2
Ben Boubaker R, Tiss A, Henrion D, Chabbert M. Homology Modeling in the Twilight Zone: Improved Accuracy by Sequence Space Analysis. Methods Mol Biol 2023; 2627:1-23. [PMID: 36959439 DOI: 10.1007/978-1-0716-2974-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The analysis of the relationship between sequence and structure similarities during the evolution of a protein family has revealed a limit of sequence divergence for which structural conservation can be confidently assumed and homology modeling is reliable. Below this limit, the twilight zone corresponds to sequence divergence for which homology modeling becomes increasingly difficult and requires specific methods. Either with conventional threading methods or with recent deep learning methods, such as AlphaFold, the challenge relies on the identification of a template that shares not only a common ancestor (homology) but also a conserved structure with the query. As both homology and structural conservation are transitive properties, mining of sequence databases followed by multidimensional scaling (MDS) of the query sequence space can reveal intermediary sequences to infer homology and structural conservation between the query and the template. Here, as a case study, we studied the plethodontid receptivity factor isoform 1 (PRF1) from Plethodon jordani, a member of a pheromone protein family present only in lungless salamanders and weakly related to cytokines of the IL6 family. A variety of conventional threading methods led to the cytokine CNTF as a template. Sequence mining, followed by phylogenetic and MDS analysis, provided missing links between PRF1 and CNTF and allowed reliable homology modeling. In addition, we compared automated models obtained from web servers to a customized model to show how modeling can be improved by expert information.
Affiliation(s)
- Rym Ben Boubaker
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
- Asma Tiss
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
- Daniel Henrion
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
- Marie Chabbert
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
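The abstract above relies on transitivity: intermediary sequences can link a query to a template that it does not resemble directly. The sketch below illustrates only that idea with a breadth-first search over a pairwise-identity graph. It is not the authors' phylogenetic or MDS procedure; the 0.5 cutoff, the function names, and the toy sequences are all hypothetical, and the sequences are assumed to be pre-aligned to equal length.

```python
from collections import deque


def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def linking_chain(seqs, start, goal, cutoff=0.5):
    """Shortest chain from `start` to `goal` in which each consecutive pair
    has identity >= `cutoff`; returns None when no such chain exists."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            chain = []
            while current is not None:
                chain.append(current)
                current = previous[current]
            return chain[::-1]
        for name in seqs:
            if name not in previous and identity(seqs[current], seqs[name]) >= cutoff:
                previous[name] = current
                queue.append(name)
    return None


# Toy aligned sequences for illustration only.
toy = {
    "query": "AAAAAAAA",
    "link": "AAAACCCC",
    "template": "CCCCCCCC",
}
chain = linking_chain(toy, "query", "template")
```

In this toy set the query and template share no identical positions, yet both are 50% identical to the intermediate sequence, so the search links them through it.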
3
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022] Open
Abstract
Since Anfinsen demonstrated that the information encoded in a protein's amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA
4
Abstract
Biologists are increasingly aware of the importance of protein structure in revealing function. The computational tools now exist which allow researchers to model unknown proteins simply on the basis of their primary sequence. However, for the non-specialist bioinformatician, there is a dazzling array of terminology, acronyms, and competing computer software available for this process. This review is intended to highlight the key stages of computational protein structure prediction, as well as explain the reasons behind some of the procedures and list some established workarounds for common pitfalls. Thereafter follows a review of five one-stop servers for start-to-finish structure prediction.
5
Skolnick J, Gao M. The role of local versus nonlocal physicochemical restraints in determining protein native structure. Curr Opin Struct Biol 2020; 68:1-8. [PMID: 33129066 DOI: 10.1016/j.sbi.2020.10.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/17/2020] [Revised: 10/03/2020] [Accepted: 10/05/2020] [Indexed: 12/15/2022]
Abstract
The tertiary structure of a native protein is dictated by the interplay of local secondary structure propensities, hydrogen bonding, and tertiary interactions. It is argued that the space of known protein topologies covers all single domain folds and results from the compactness of the native structure and excluded volume. Protein compactness combined with the chirality of the protein's side chains also yields native-like Ramachandran plots. It is the many-body, tertiary interactions among residues that collectively select for the global structure that a particular protein sequence adopts. This explains why the recent advances in deep-learning approaches that predict protein side-chain contacts, the distance matrix between residues, and sequence alignments are successful. They succeed because they implicitly learned the many-body interactions among protein residues.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States
- Mu Gao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States
6
Chen MC, Li Y, Zhu YH, Ge F, Yu DJ. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model 2020; 60:3295-3303. [DOI: 10.1021/acs.jcim.9b01207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- Ming-Cai Chen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw 100, Ann Arbor, Michigan 48109-2218, United States
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
7
Kanakaveti V, Shanmugam A, Ramakrishnan C, Anoosha P, Sakthivel R, Rayala SK, Gromiha MM. Computational approaches for identifying potential inhibitors on targeting protein interactions in drug discovery. Advances in Protein Chemistry and Structural Biology 2020; 121:25-47. [PMID: 32312424 DOI: 10.1016/bs.apcsb.2019.11.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
In the era of big data, the interplay of artificial and human intelligence is demanding work that must address concerns about the exchange of decisions between both sides. Drug discovery is one of the key sources of big data and involves synergy among various computational methods to achieve clinical success. Proper acquisition, mining, and analysis of the data related to ligands and targets are crucial to accomplish reliable outcomes in the entire process. Novel design and screening tactics are necessary to substantiate potent and efficient lead compounds. Such methods are emphasized and portrayed in the current review, which targets protein-ligand and protein-protein interactions involved in various diseases with potential applications.
Affiliation(s)
- Vishnupriya Kanakaveti
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Anusuya Shanmugam
- Department of Pharmaceutical Engineering, Vinayaka Mission's Kirupananda Variyar Engineering College, Vinayaka Mission's Research Foundation (Deemed to be University), Salem, Tamil Nadu, India
- C Ramakrishnan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- P Anoosha
- Department of Internal Medicine, Division of Medical Oncology and Comprehensive Cancer Center, The Ohio State University, Columbus, OH, United States
- R Sakthivel
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- S K Rayala
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India; Advanced Computational Drug Discovery Unit (ACDD), Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Yokohama, Japan
8
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. Predicting the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, significantly improved predictive ability is demonstrated in the large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz., a “glass-ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance by DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep learning accomplishes more accurate contact prediction, systematic clustering reveals that deep learning predicts coherent, native-like contact patterns compared to co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
9
Robertson JC, Perez A, Dill KA. MELD × MD Folds Nonthreadables, Giving Native Structures and Populations. J Chem Theory Comput 2018; 14:6734-6740. [PMID: 30407805 DOI: 10.1021/acs.jctc.8b00886] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
A current challenge is to compute the native structures of proteins from their amino acid sequences. A main approach of bioinformatics is threading, in which a protein to be predicted is computationally threaded onto protein fragments of similar sequence having an already known structure. However, ∼15% of proteins cannot be folded in this way; this has been called the glass ceiling, and the proteins are called nonthreadables. For these, physical molecular dynamics (MD) modeling is promising because it does not require templates. We find that MD, when used with an accelerator called MELD, can fold many nonthreadables. For 41 nonthreadable proteins with fewer than 125 residues, MELD-accelerated MD (MELD × MD) folds 20 of them to better than 4 Å error. In 10 cases, MELD × MD succeeds even when the force field does not properly encode the native state. In 11 cases, MELD × MD foretells its own success; seeing large Boltzmann populations in the simulations predicts it has converged to the correct native state. MELD × MD acceleration can be applied to a broad physical protein modeling range.
Affiliation(s)
- James C Robertson
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
- Alberto Perez
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
- Ken A Dill
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States; Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States; Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, United States
10
Ladunga I. Finding Homologs in Amino Acid Sequences Using Network BLAST Searches. Curr Protoc Bioinformatics 2018; 59:3.4.1-3.4.24. [DOI: 10.1002/cpbi.34] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Affiliation(s)
- Istvan Ladunga
- Departments of Statistics, Biochemistry and School of Biological Sciences, University of Nebraska–Lincoln, Lincoln, Nebraska