1
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
  - Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
  - College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
2
Dolorfino M, Samanta R, Vorobieva A. ProteinMPNN Recovers Complex Sequence Properties of Transmembrane β-barrels. bioRxiv 2024:2024.01.16.575764. [PMID: 38352434 PMCID: PMC10862708 DOI: 10.1101/2024.01.16.575764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Recent deep-learning (DL) protein design methods have been successfully applied to a range of protein design problems, including the de novo design of novel folds, protein binders, and enzymes. However, DL methods have yet to meet the challenge of de novo membrane protein (MP) design and the design of complex β-sheet folds. We performed a comprehensive benchmark of one DL protein sequence design method, ProteinMPNN, using transmembrane and water-soluble β-barrel folds as a model, and compared the performance of ProteinMPNN to the new membrane-specific Rosetta Franklin2023 energy function. We tested the effect of input backbone refinement on ProteinMPNN performance and found that, given refined and well-defined inputs, ProteinMPNN more accurately captures global sequence properties despite complex folding biophysics. It generates more diverse transmembrane β-barrel (TMB) sequences than Franklin2023 in pore-facing positions. In addition, ProteinMPNN generated TMB sequences that passed state-of-the-art in silico filters for experimental validation, suggesting that the model could be used in de novo design tasks of diverse nanopores for single-molecule sensing and sequencing. Lastly, our results indicate that the low success rate of ProteinMPNN for the design of β-sheet proteins stems from backbone input accuracy rather than software limitations.
Affiliation(s)
- Marissa Dolorfino
  - Structural Biology Brussel, Vrije Universiteit Brussel, Brussels, Belgium
  - VUB-VIB Center for Structural Biology, Brussels, Belgium
- Anastassia Vorobieva
  - Structural Biology Brussel, Vrije Universiteit Brussel, Brussels, Belgium
  - VUB-VIB Center for Structural Biology, Brussels, Belgium
  - VIB Center for AI and Computational Biology, Belgium
3
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China
4
Khakzad H, Igashov I, Schneuing A, Goverde C, Bronstein M, Correia B. A new age in protein design empowered by deep learning. Cell Syst 2023; 14:925-939. [PMID: 37972559 DOI: 10.1016/j.cels.2023.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 06/22/2023] [Accepted: 10/11/2023] [Indexed: 11/19/2023]
Abstract
The rapid progress in the field of deep learning has had a significant impact on protein design. Deep learning methods have recently produced a breakthrough in protein structure prediction, leading to the availability of high-quality models for millions of proteins. Along with novel architectures for generative modeling and sequence analysis, they have revolutionized the protein design field in the past few years remarkably by improving the accuracy and ability to identify novel protein sequences and structures. Deep neural networks can now learn and extract the fundamental features of protein structures, predict how they interact with other biomolecules, and have the potential to create new effective drugs for treating disease. As their applicability in protein design is rapidly growing, we review the recent developments and technology in deep learning methods and provide examples of their performance to generate novel functional proteins.
Affiliation(s)
- Hamed Khakzad
  - Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France; École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Ilia Igashov
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Arne Schneuing
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Casper Goverde
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Bruno Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
5
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW: Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS: In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
  - Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain.
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain.
6
Islam S, Pantazes RJ. Developing similarity matrices for antibody-protein binding interactions. PLoS One 2023; 18:e0293606. [PMID: 37883504 PMCID: PMC10602319 DOI: 10.1371/journal.pone.0293606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023] Open
Abstract
The inventions of AlphaFold and RoseTTAFold are revolutionizing computational protein science due to their abilities to reliably predict protein structures. Their unprecedented successes are due to the parallel consideration of several types of information, one of which is protein sequence similarity information. Sequence homology has been studied for many decades and depends on similarity matrices to define how similar or different protein sequences are to one another. A natural extension of predicting protein structures is predicting the interactions between proteins, but similarity matrices for protein-protein interactions do not exist. This study conducted a mutational analysis of 384 non-redundant antibody-protein antigen complexes to calculate antibody-protein interaction similarity matrices. Every important residue in each antibody and each antigen was mutated to each of the other 19 commonly occurring amino acids and the percentage changes in interaction energies were calculated using three force fields: CHARMM, Amber, and Rosetta. The data were used to construct six interaction similarity matrices, one for antibodies and another for antigens using each force field. The matrices exhibited both commonalities, such as mutations of aromatic and charged residues being the most detrimental, and differences, such as Rosetta predicting mutations of serines to be better tolerated than either Amber or CHARMM. A comparison to nine previously published similarity matrices for protein sequences revealed that the new interaction matrices are more similar to one another than they are to any of the previous matrices. The created similarity matrices can be used in force field specific applications to help guide decisions regarding mutations in protein-protein binding interfaces.
Affiliation(s)
- Sumaiya Islam
  - Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
- Robert J. Pantazes
  - Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
7
Huang B, Kong L, Wang C, Ju F, Zhang Q, Zhu J, Gong T, Zhang H, Yu C, Zheng WM, Bu D. Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. Genomics Proteomics Bioinformatics 2023; 21:913-925. [PMID: 37001856 PMCID: PMC10928435 DOI: 10.1016/j.gpb.2022.11.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/15/2022] [Revised: 11/23/2022] [Accepted: 11/30/2022] [Indexed: 03/31/2023]
Abstract
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.
Affiliation(s)
- Bin Huang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lupeng Kong
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China
- Chao Wang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Fusong Ju
  - Microsoft Research AI4Science, Beijing 100080, China
- Qi Zhang
  - Huawei Noah's Ark Lab, Wuhan 430206, China
- Jianwei Zhu
  - Microsoft Research AI4Science, Beijing 100080, China
- Tiansu Gong
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Haicang Zhang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
- Chungong Yu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
- Wei-Mou Zheng
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.
- Dongbo Bu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China.
8
Kao TY, Chiang YW. DEERefiner-assisted structural refinement using pulsed dipolar spectroscopy: a study on multidrug transporter LmrP. Phys Chem Chem Phys 2023; 25:24508-24517. [PMID: 37656008 DOI: 10.1039/d3cp02569a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Pulsed dipolar spectroscopy, such as double electron-electron resonance (DEER), has been underutilized in protein structure determination, despite its ability to provide valuable spatial information. In this study, we present DEERefiner, a user-friendly MATLAB-based GUI program that enables the modeling of protein structures by combining an initial structure and DEER distance restraints. We illustrate the effectiveness of DEERefiner by successfully modeling the ligand-dependent conformational changes of the proton-drug antiporter LmrP to an extracellular-open-like conformation with an impressive precision of 0.76 Å. Additionally, DEERefiner was able to uncover a previously hypothesized but experimentally unresolved proton-dependent conformation of LmrP, characterized as an extracellular-closed/partially intracellular-open conformation, with a precision of 1.16 Å. Our work not only highlights the ability of DEER spectroscopy to model protein structures but also reveals the potential of DEERefiner to advance the field by providing an accessible and applicable tool for precise protein structure modeling, thereby paving the way for deeper insights into protein function.
Affiliation(s)
- Te-Yu Kao
  - Department of Chemistry, National Tsing Hua University, Hsinchu 300-044, Taiwan.
- Yun-Wei Chiang
  - Department of Chemistry, National Tsing Hua University, Hsinchu 300-044, Taiwan.
9
Stern JA, Free TJ, Stern KL, Gardiner S, Dalley NA, Bundy BC, Price JL, Wingate D, Della Corte D. A probabilistic view of protein stability, conformational specificity, and design. Sci Rep 2023; 13:15493. [PMID: 37726313 PMCID: PMC10509192 DOI: 10.1038/s41598-023-42032-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Various approaches have used neural networks as probabilistic models for the design of protein sequences. These "inverse folding" models employ different objective functions, which come with trade-offs that have not been assessed in detail before. This study introduces probabilistic definitions of protein stability and conformational specificity and demonstrates the relationship between these chemical properties and the p(structure | sequence) Boltzmann probability objective. This links the Boltzmann probability objective function to experimentally verifiable outcomes. We propose a novel sequence decoding algorithm, referred to as "BayesDesign", that leverages Bayes' Rule to maximize the p(structure | sequence) objective instead of the p(sequence | structure) objective common in inverse folding models. The efficacy of BayesDesign is evaluated in the context of two protein model systems, the NanoLuc enzyme and the WW structural motif. Both BayesDesign and the baseline ProteinMPNN algorithm increase the thermostability of NanoLuc and increase the conformational specificity of WW. The possible sources of error in the model are analyzed.
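The Bayes' Rule step named in this abstract can be written out as a short derivation. This is only a sketch: it assumes the two objectives being contrasted are p(structure | sequence) and p(sequence | structure), and the symbols x (target structure) and s (sequence) are chosen here for illustration. How each term is estimated in practice is not stated in the abstract.

```latex
\begin{align}
p(x \mid s) &= \frac{p(s \mid x)\, p(x)}{p(s)} \\
\arg\max_{s}\, p(x \mid s)
  &= \arg\max_{s}\, \frac{p(s \mid x)}{p(s)}
   = \arg\max_{s}\, \bigl[\log p(s \mid x) - \log p(s)\bigr]
\end{align}
```

Because the target structure x is fixed during design, p(x) is a constant and drops out of the maximization. Under these assumptions, the only difference from the usual inverse folding objective is the extra -log p(s) term, which penalizes sequences that are probable regardless of the structure.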
Affiliation(s)
- Jacob A Stern
  - Department of Computer Science, Brigham Young University, Provo, UT, USA
- Tyler J Free
  - Department of Chemical Engineering, Brigham Young University, Provo, UT, USA
- Kimberlee L Stern
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Spencer Gardiner
  - Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
- Nicholas A Dalley
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Bradley C Bundy
  - Department of Chemical Engineering, Brigham Young University, Provo, UT, USA
- Joshua L Price
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- David Wingate
  - Department of Computer Science, Brigham Young University, Provo, UT, USA
- Dennis Della Corte
  - Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA.
10
Niazi SK. The Coming of Age of AI/ML in Drug Discovery, Development, Clinical Testing, and Manufacturing: The FDA Perspectives. Drug Des Devel Ther 2023; 17:2691-2725. [PMID: 37701048 PMCID: PMC10493153 DOI: 10.2147/dddt.s424991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) represent significant advancements in computing, building on technologies that humanity has developed over millions of years, from the abacus to quantum computers. These tools have reached a pivotal moment in their development. In 2021 alone, the U.S. Food and Drug Administration (FDA) received over 100 product registration submissions that heavily relied on AI/ML for applications such as monitoring and improving human performance in compiling dossiers. To ensure the safe and effective use of AI/ML in drug discovery and manufacturing, the FDA and numerous other U.S. federal agencies have issued continuously updated, stringent guidelines. Intriguingly, these guidelines are often generated or updated with the aid of AI/ML tools themselves. The overarching goal is to expedite drug discovery, enhance the safety profiles of existing drugs, introduce novel treatment modalities, and improve manufacturing compliance and robustness. Recent FDA publications offer an encouraging outlook on the potential of these tools, emphasizing the need for their careful deployment. This has expanded market opportunities for retraining personnel handling these technologies and enabled innovative applications in emerging therapies such as gene editing, CRISPR-Cas9, CAR-T cells, mRNA-based treatments, and personalized medicine. In summary, the maturation of AI/ML technologies is a testament to human ingenuity. Far from being autonomous entities, these are tools created by and for humans designed to solve complex problems now and in the future. This paper aims to present the status of these technologies, along with examples of their present and future applications.
11
Zaman AB, Inan TT, De Jong K, Shehu A. Adaptive Stochastic Optimization to Improve Protein Conformation Sampling. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2759-2771. [PMID: 34882562 DOI: 10.1109/tcbb.2021.3134103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 is shown to be able to reveal a high-quality native structure for many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploration of possibly a very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its exploration of the search space to balance between exploration and exploitation in the presence of a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we also provide a benchmark dataset for researchers to continue work on this problem.
12
Minami S, Kobayashi N, Sugiki T, Nagashima T, Fujiwara T, Tatsumi-Koga R, Chikenji G, Koga N. Exploration of novel αβ-protein folds through de novo design. Nat Struct Mol Biol 2023; 30:1132-1140. [PMID: 37400653 PMCID: PMC10442233 DOI: 10.1038/s41594-023-01029-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/29/2021] [Accepted: 05/30/2023] [Indexed: 07/05/2023]
Abstract
A fundamental question in protein evolution is whether nature has exhaustively sampled nearly all possible protein folds throughout evolution, or whether a large fraction of the possible folds remains unexplored. To address this question, we defined a set of rules for β-sheet topology to predict novel αβ-folds and carried out a systematic de novo protein design exploration of the novel αβ-folds predicted by the rules. The designs for all eight of the predicted novel αβ-folds with a four-stranded β-sheet, including a knot-forming one, folded into structures close to the design models. Further, the rules predicted more than 10,000 novel αβ-folds with five- to eight-stranded β-sheets; this number far exceeds the number of αβ-folds observed in nature so far. This result suggests that a vast number of αβ-folds are possible, but have not emerged or have become extinct due to evolutionary bias.
Affiliation(s)
- Shintaro Minami
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
- Naohiro Kobayashi
  - Institute for Protein Research (IPR), Osaka University, Osaka, Japan
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
- Toshihiko Sugiki
  - Institute for Protein Research (IPR), Osaka University, Osaka, Japan
- Toshio Nagashima
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
- Rie Tatsumi-Koga
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
- George Chikenji
  - Department of Applied Physics, Graduate School of Engineering, Nagoya University, Nagoya, Japan
- Nobuyasu Koga
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan.
  - SOKENDAI, The Graduate University for Advanced Studies, Hayama, Japan.
  - Research Center of Integrative Molecular Systems, Institute for Molecular Science (IMS), National Institutes of Natural Sciences (NINS), Okazaki, Japan.
  - Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Osaka, Japan.
13
|
Ledwitch KV, Künze G, McKinney JR, Okwei E, Larochelle K, Pankewitz L, Ganguly S, Darling HL, Coin I, Meiler J. Sparse pseudocontact shift NMR data obtained from a non-canonical amino acid-linked lanthanide tag improves integral membrane protein structure prediction. J Biomol NMR 2023; 77:69-82. [PMID: 37016190 PMCID: PMC10443207 DOI: 10.1007/s10858-023-00412-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 03/20/2023] [Indexed: 06/19/2023]
Abstract
A single experimental method alone often fails to provide the resolution, accuracy, and coverage needed to model integral membrane proteins (IMPs). Integrating computation with experimental data is a powerful approach to supplement missing structural information with atomic detail. We combine RosettaNMR with experimentally-derived paramagnetic NMR restraints to guide membrane protein structure prediction. We demonstrate this approach using the disulfide bond formation protein B (DsbB), an α-helical IMP. Here, we attached a cyclen-based paramagnetic lanthanide tag to an engineered non-canonical amino acid (ncAA) using a copper-catalyzed azide-alkyne cycloaddition (CuAAC) click chemistry reaction. Using this tagging strategy, we collected 203 backbone HN pseudocontact shifts (PCSs) for three different labeling sites and used these as input to guide de novo membrane protein structure prediction protocols in Rosetta. We find that this sparse PCS dataset combined with 44 long-range NOEs as restraints in our calculations improves structure prediction of DsbB by enhancements in model accuracy, sampling, and scoring. The inclusion of this PCS dataset improved the Cα-RMSD transmembrane segment values of the best-scoring and best-RMSD models from 9.57 Å and 3.06 Å (no NMR data) to 5.73 Å and 2.18 Å, respectively.
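The Cα-RMSD values quoted in this abstract are root-mean-square distances between matched Cα atoms of a model and a reference structure. The following is a minimal sketch of that calculation, assuming the two coordinate sets are already superimposed and listed in the same residue order. The function name `ca_rmsd` and the toy coordinates are illustrative only and are not taken from RosettaNMR or the paper.

```python
import math


def ca_rmsd(model, reference):
    """RMSD in Å between two equally long lists of (x, y, z) Cα coordinates.

    Assumption: the structures are already superimposed; no fitting is done.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared per-axis differences over all matched atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Hypothetical toy data: every atom displaced by 2 Å along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 2.0, y, z) for x, y, z in reference]
print(round(ca_rmsd(model, reference), 2))  # 2.0
```

In practice an optimal superposition (for example with the Kabsch algorithm) is usually applied before this step; it is omitted here to keep the sketch short.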
Affiliation(s)
- Kaitlyn V Ledwitch
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
  - Department of Chemistry, Center for Structural Biology, MRBIII 5154E, Vanderbilt University, Nashville, TN, 37212, USA
- Georg Künze
  - Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103, Leipzig, Germany
- Jacob R McKinney
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Elleansar Okwei
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Katherine Larochelle
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Lisa Pankewitz
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Soumya Ganguly
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Heather L Darling
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
- Irene Coin
  - Institute of Biochemistry, Faculty of Life Science, University of Leipzig, 04103, Leipzig, Germany
- Jens Meiler
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
  - Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103, Leipzig, Germany
|
14
|
van Aalst EJ, McDonald CJ, Wylie BJ. Cholesterol Biases the Conformational Landscape of the Chemokine Receptor CCR3: A MAS SSNMR-Filtered Molecular Dynamics Study. J Chem Inf Model 2023; 63:3068-3085. [PMID: 37127541 PMCID: PMC10208230 DOI: 10.1021/acs.jcim.2c01546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Indexed: 05/03/2023]
Abstract
Cholesterol directs the pathway of ligand-induced G protein-coupled receptor (GPCR) signal transduction. The GPCR C-C motif chemokine receptor 3 (CCR3) is the principal chemotactic receptor for eosinophils, with roles in cancer metastasis and autoinflammatory conditions. Recently, we discovered a direct correlation between bilayer cholesterol and increased agonist-triggered CCR3 signal transduction. However, the allosteric molecular mechanism escalating ligand affinity and G protein coupling is unknown. To study cholesterol-guided CCR3 conformational selection, we implement comparative, objective measurement of protein architectures by scoring shifts (COMPASS) to grade model structures from molecular dynamics simulations. In this workflow, we scored predicted chemical shifts against 2-dimensional solid-state NMR 13C-13C correlation spectra of U-15N,13C-CCR3 samples prepared with and without cholesterol. Our analysis of trajectory model structures uncovers that cholesterol induces site-specific conformational restraint of extracellular loop (ECL) 2 and conserved motion in transmembrane helices and ECL3 not observed in simulations of bilayers with only phosphatidylcholine lipids. PyLipID analysis implicates direct cholesterol agency in CCR3 conformational selection and dynamics. Residue-residue contact scoring shows that cholesterol biases the conformational selection of the orthosteric pocket involving Y41^1.39, Y113^3.32, and E287^7.39. Lastly, we observe contact remodeling in activation pathway residues centered on the initial transmission switch, Na+ pocket, and R^3.50 in the DRY motif. Our observations have unique implications for understanding of CCR3 ligand recognition and specificity and provide mechanistic insight into how cholesterol functions as an allosteric regulator of CCR3 signal transduction.
Affiliation(s)
- Evan J. van Aalst
  - Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas 79415, United States
- Corey J. McDonald
  - Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas 79415, United States
- Benjamin J. Wylie
  - Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas 79415, United States
|
15
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
  - VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
  - Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
  - Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
|
16
|
Koehler Leman J, Künze G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int J Mol Sci 2023; 24:7835. [PMID: 37175539 PMCID: PMC10178863 DOI: 10.3390/ijms24097835] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/16/2023] [Revised: 04/15/2023] [Accepted: 04/21/2023] [Indexed: 05/15/2023] Open
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful method for studying the structure and dynamics of proteins in their native state. For high-resolution NMR structure determination, the collection of a rich restraint dataset is necessary. This can be difficult to achieve for proteins with high molecular weight or a complex architecture. Computational modeling techniques can complement sparse NMR datasets (<1 restraint per residue) with additional structural information to elucidate protein structures in these difficult cases. The Rosetta software for protein structure modeling and design is used by structural biologists for structure determination tasks in which limited experimental data is available. This review gives an overview of the computational protocols available in the Rosetta framework for modeling protein structures from NMR data. We explain the computational algorithms used for the integration of different NMR data types in Rosetta. We also highlight new developments, including modeling tools for data from paramagnetic NMR and hydrogen-deuterium exchange, as well as chemical shifts in CS-Rosetta. Furthermore, strategies are discussed to complement and improve structure predictions made by the current state-of-the-art AlphaFold2 program using NMR-guided Rosetta modeling.
Affiliation(s)
- Julia Koehler Leman
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA
- Georg Künze
  - Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
|
17
|
Mullapudi V, Vaquer-Alicea J, Bommareddy V, Vega AR, Ryder BD, White CL, Diamond MI, Joachimiak LA. Network of hotspot interactions cluster tau amyloid folds. Nat Commun 2023; 14:895. [PMID: 36797278 PMCID: PMC9935906 DOI: 10.1038/s41467-023-36572-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/30/2022] [Accepted: 02/06/2023] [Indexed: 02/18/2023] Open
Abstract
Cryogenic electron microscopy has revealed unprecedented molecular insight into the conformations of β-sheet-rich protein amyloids linked to neurodegenerative diseases. It remains unknown how a protein can adopt a diversity of folds and form multiple distinct fibrillar structures. Here we develop an in silico alanine scan method to estimate the relative energetic contribution of each amino acid in an amyloid assembly. We apply our method to twenty-seven ex vivo and in vitro fibril structural polymorphs of the microtubule-associated protein tau. We uncover networks of energetically important interactions involving amyloid-forming motifs that stabilize the different fibril folds. We evaluate our predictions in cellular and in vitro aggregation assays. Using a machine learning approach, we classify the structures based on residue energetics to identify distinguishing and unifying features. Our energetic profiling suggests that minimal sequence elements control the stability of tau fibrils, allowing future design of protein sequences that fold into unique structures.
Affiliation(s)
- Vishruth Mullapudi
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jaime Vaquer-Alicea
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Vaibhav Bommareddy
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Anthony R Vega
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Bryan D Ryder
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Molecular Biophysics Graduate Program, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Charles L White
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Marc I Diamond
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Lukasz A Joachimiak
  - Center for Alzheimer's and Neurodegenerative Diseases, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
|
18
|
Zhao TY, Dunbar M, Keten S, Patankar NA. The buckling-condensation mechanism driving gas vesicle collapse. Soft Matter 2023; 19:1174-1185. [PMID: 36651808 DOI: 10.1039/d2sm00493c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Gas vesicles (GVs) are proteinaceous cylindrical shells found within bacteria or archaea growing in aqueous environments and are composed primarily of two proteins, gas vesicle protein A and C (GvpA and GvpC). GVs exhibit strong performance as next-generation ultrasound contrast agents due to their gas-filled interior, tunable collapse pressure, stability in vivo and functionalizable exterior. However, the exact mechanism leading to GV collapse remains inconclusive, which leads to difficulty in predicting collapse pressures for different species of GVs and in extending favorable nonlinear response regimes. Here, we propose a two-stage mechanism leading to GV loss of echogenicity and rupture under hydrostatic pressure: elastic buckling of the cylindrical shell coupled with condensation-driven weakening of the GV membrane. Our goal is therefore to test whether the final fracture of the GV membrane occurs by the interplay of both mechanisms or purely through buckling failure as previously believed. To do so, we (1) compare the theoretical condensation and buckling pressures with those for experimental GV collapse and (2) describe how condensation can lead to plastic buckling failure. GV shell properties that are necessary input to this theoretical description, such as the elastic moduli and wettability of GvpA, are determined using molecular dynamics simulations of a novel structural model of GvpA that better represents the hydrophobic core. For GVs that are not reinforced by GvpC, this analytical framework shows that the experimentally observed pressures resulting in loss of echogenicity coincide with both the elastic buckling and condensation pressure regimes. We also found that the stress-strain curve for GvpA wetted on both the interior and exterior exhibits a loss of mechanical stability compared to GvpA only wetted on the exterior by the bulk solution. We identify a pressure vs. vesicle size regime where condensation can occur prior to buckling, which may preclude nonlinear shell buckling responses in contrast imaging.
Affiliation(s)
- Tom Y Zhao
  - Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, USA
- Martha Dunbar
  - Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, USA
- Sinan Keten
  - Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, USA
- Neelesh A Patankar
  - Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, USA
|
19
|
Vemula D, Jayasurya P, Sushmitha V, Kumar YN, Bhandari V. CADD, AI and ML in drug discovery: A comprehensive review. Eur J Pharm Sci 2023; 181:106324. [PMID: 36347444 DOI: 10.1016/j.ejps.2022.106324] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/24/2022] [Revised: 10/26/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022]
Abstract
Computer-aided drug design (CADD) is an emerging field that has drawn a lot of interest because of its potential to expedite and lower the cost of the drug development process. Drug discovery research is expensive and time-consuming, and it frequently takes 10-15 years for a drug to become commercially available. CADD has significantly impacted this area of research. Further, the combination of CADD with Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) technologies to handle enormous amounts of biological data has reduced the time and cost associated with the drug development process. This review will discuss how CADD, AI, ML, and DL approaches help identify drug candidates and support various other steps of the drug discovery process. It will also provide a detailed overview of the different in silico tools used and how these approaches interact.
Affiliation(s)
- Divya Vemula
  - National Institute of Pharmaceutical Education and Research- Hyderabad, India
- Perka Jayasurya
  - National Institute of Pharmaceutical Education and Research- Hyderabad, India
- Varthiya Sushmitha
  - National Institute of Pharmaceutical Education and Research- Hyderabad, India
- Vasundhra Bhandari
  - National Institute of Pharmaceutical Education and Research- Hyderabad, India
|
20
|
Kordes S, Beck J, Shanmugaratnam S, Flecks M, Höcker B. Physics-based approach to extend a de novo TIM barrel with rationally designed helix-loop-helix motifs. Protein Eng Des Sel 2023; 36:gzad012. [PMID: 37707513 DOI: 10.1093/protein/gzad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/15/2023] Open
Abstract
Computational protein design promises the ability to build tailor-made proteins de novo. While a range of de novo proteins have been constructed so far, the majority of these designs have idealized topologies that lack larger cavities which are necessary for the incorporation of small molecule binding sites or enzymatic functions. One attractive target for enzyme design is the TIM-barrel fold, due to its ubiquity in nature and capability to host versatile functions. With the successful de novo design of a 4-fold symmetric TIM barrel, sTIM11, an idealized, minimalistic scaffold was created. In this work, we attempted to extend this de novo TIM barrel by incorporating a helix-loop-helix motif into its βα-loops by applying a physics-based modular design approach using Rosetta. Further diversification was performed by exploiting the symmetry of the scaffold to integrate two helix-loop-helix motifs into the scaffold. Analysis with AlphaFold2 and biochemical characterization demonstrate the formation of additional α-helical secondary structure elements supporting the successful extension as intended.
Affiliation(s)
- Sina Kordes
  - Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
- Julian Beck
  - Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
- Merle Flecks
  - Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
|
21
|
Tanaka S. Protein-Protein Interaction Modelling with the Fragment Molecular Orbital Method. Methods Mol Biol 2023; 2552:295-305. [PMID: 36346599 DOI: 10.1007/978-1-0716-2609-2_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The fragment molecular orbital (FMO) method enables ab initio quantum-chemical calculations for biomolecular systems with high accuracy and moderate computational cost. Through this analysis we can evaluate the inter-fragment interaction energies (IFIEs) that provide useful measures for effective interactions between the fragments representing amino-acid residues and ligand molecules. Here I describe how to prepare the input structures and perform the FMO calculations for a protein-protein complex system. In addition to the pre-processing, some useful tools for the post-processing analysis are also illustrated.
Affiliation(s)
- Shigenori Tanaka
  - Graduate School of System Informatics, Kobe University, Kobe, Hyogo, Japan
|
22
|
Lugmayr W, Kotov V, Goessweiner-Mohr N, Wald J, DiMaio F, Marlovits TC. StarMap: a user-friendly workflow for Rosetta-driven molecular structure refinement. Nat Protoc 2023; 18:239-264. [PMID: 36323866 DOI: 10.1038/s41596-022-00757-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/22/2021] [Accepted: 08/08/2022] [Indexed: 01/13/2023]
Abstract
Cryogenic electron microscopy (cryo-EM) data represent density maps of macromolecular systems at atomic or near-atomic resolution. However, building and refining 3D atomic models by using data from cryo-EM maps is not straightforward and requires significant hands-on experience and manual intervention. We recently developed StarMap, an easy-to-use interface between the popular structural display program ChimeraX and Rosetta, a powerful molecular modeling engine. StarMap offers a general approach for refining structural models of biological macromolecules into cryo-EM density maps by combining Monte Carlo sampling with local density-guided optimization, Rosetta-based all-atom refinement and real-space B-factor calculations in a straightforward workflow. StarMap includes options for structural symmetry, local refinements and independent model validation. The overall quality of the refinement and the structure resolution is then assessed via analytical outputs, such as magnification calibration (pixel size calibration) and Fourier shell correlations. Z-scores reported by StarMap provide an easily interpretable indicator of the goodness of fit for each residue and can be plotted to evaluate structural models and improve local residue refinements, as well as to identify flexible regions and potentially functional sites in large macromolecular complexes. The protocol requires general computer skills, without the need for coding expertise, because most parts of the workflow can be operated by clicking tabs within the ChimeraX graphical user interface. Time requirements for the model refinement depend on the size and quality of the input data; however, this step can typically be completed within 1 d. The analytical parts of the workflow are completed within minutes.
Affiliation(s)
- Wolfgang Lugmayr
  - University Medical Center Hamburg-Eppendorf (UKE), Institute of Structural and Systems Biology, Hamburg, Germany
  - CSSB Centre for Structural Systems Biology, Hamburg, Germany
  - Deutsches Elektronen Synchrotron (DESY), Hamburg, Germany
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Institute for Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna, Austria
- Vadim Kotov
  - University Medical Center Hamburg-Eppendorf (UKE), Institute of Structural and Systems Biology, Hamburg, Germany
  - CSSB Centre for Structural Systems Biology, Hamburg, Germany
  - Deutsches Elektronen Synchrotron (DESY), Hamburg, Germany
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Institute for Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna, Austria
  - Evotec SE, Hamburg, Germany
- Nikolaus Goessweiner-Mohr
  - University Medical Center Hamburg-Eppendorf (UKE), Institute of Structural and Systems Biology, Hamburg, Germany
  - CSSB Centre for Structural Systems Biology, Hamburg, Germany
  - Deutsches Elektronen Synchrotron (DESY), Hamburg, Germany
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Institute for Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna, Austria
  - Johannes Kepler University, Institute of Biophysics, Linz, Austria
- Jiri Wald
  - University Medical Center Hamburg-Eppendorf (UKE), Institute of Structural and Systems Biology, Hamburg, Germany
  - CSSB Centre for Structural Systems Biology, Hamburg, Germany
  - Deutsches Elektronen Synchrotron (DESY), Hamburg, Germany
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Institute for Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna, Austria
- Frank DiMaio
  - University of Washington, Department of Biochemistry, Seattle, WA, USA
- Thomas C Marlovits
  - University Medical Center Hamburg-Eppendorf (UKE), Institute of Structural and Systems Biology, Hamburg, Germany
  - CSSB Centre for Structural Systems Biology, Hamburg, Germany
  - Deutsches Elektronen Synchrotron (DESY), Hamburg, Germany
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Institute for Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna, Austria
|
23
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
24
|
Bhowmick S, Jing T, Wang W, Zhang EY, Zhang F, Yang Y. In Silico Protein Folding Prediction of COVID-19 Mutations and Variants. Biomolecules 2022; 12:1665. [PMID: 36359015 PMCID: PMC9688002 DOI: 10.3390/biom12111665] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/29/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 08/27/2023] Open
Abstract
With its fast-paced mutagenesis, the SARS-CoV-2 Omicron variant has threatened many societies worldwide. Strategies for predicting mutagenesis such as the computational prediction of SARS-CoV-2 structural diversity and its interaction with the human receptor will greatly benefit our understanding of the virus and help develop therapeutics against it. We aim to use protein structure prediction algorithms along with molecular docking to study the effects of various mutations in the Receptor Binding Domain (RBD) of SARS-CoV-2 and its key interactions with the angiotensin-converting enzyme 2 (ACE-2) receptor. The RBD structures of the naturally occurring variants of SARS-CoV-2 were generated from the WUHAN-Hu-1 using the trRosetta algorithm. Docking (HADDOCK) and binding analysis (PRODIGY) between the predicted RBD structures and ACE-2 highlighted key interactions at the Receptor-Binding Motif (RBM). Further mutagenesis at conserved residues in the Original, Delta, and Omicron variants (P499S and T500R) demonstrated stronger binding and interactions with the ACE-2 receptor. The predicted T500R mutation underwent some preliminary tests in vitro for its binding and transmissibility in cells; the results correlate with the in silico analysis. In summary, we suggest conserved residues P499 and T500 as potential mutation sites that could increase the binding affinity and yet do not exist in nature. This work demonstrates the use of the trRosetta algorithm to predict protein structure and future mutations at the RBM of SARS-CoV-2, followed by experimental testing for further efficacy verification. It is important to understand the protein structure and folding to help develop potential therapeutics.
Affiliation(s)
- Yanmin Yang
  - Department of Neurology and Neurological Sciences, School of Medicine, Stanford University, 1201 Welch Road, MSLS, P259, Stanford, CA 94305, USA
|
25
|
He J, Turzo SBA, Seffernick JT, Kim SS, Lindert S. Prediction of Intrinsic Disorder Using Rosetta ResidueDisorder and AlphaFold2. J Phys Chem B 2022; 126:8439-8446. [PMID: 36251522 DOI: 10.1021/acs.jpcb.2c05508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
The combination of deep learning and sequence data has transformed protein structure prediction and modeling, evidenced in the success of AlphaFold (AF). For this reason, many methods have been developed to take advantage of this success in areas where inaccurate structural modeling may limit computational predictiveness. For example, many methods have been developed to predict protein intrinsic disorder from sequence, including our Rosetta ResidueDisorder (RRD) approach. Intrinsically disordered regions in proteins are parts of the sequence that do not form ordered, folded structures under typical physiological conditions. In the original implementation of RRD, Rosetta ab initio models were generated, and disordered regions were predicted based on residue scores (disordered residues typically exist in regions of unfavorable scores). In this work, we show that by (i) replacing the ab initio modeling with AF (using the same scoring and disorder assignment approach) and (ii) updating the score function, the predictiveness improved significantly. Residues were better ranked by the order/disorder, evidenced by an improvement in receiver operating characteristic area-under-the-curve from 0.69 to 0.78 on a large (229 protein) and balanced data set (relatively even ordered versus disordered residues). Finally, the binary prediction accuracy also improved from 62% to 74% on the same data set. Our results show that the combined AF-RRD approach was as good as or better than all existing methods by these metrics (AF-RRD had the highest prediction accuracy).
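The receiver operating characteristic area-under-the-curve reported in this abstract can be read as the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one. A minimal sketch of that rank-based calculation follows. The function name `roc_auc` and the scores and labels are hypothetical, and nothing here reproduces the Rosetta ResidueDisorder scoring itself.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 = disordered, 0 = ordered. Tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores (higher = more likely disordered).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
# 8 of the 9 disordered/ordered pairs are ranked correctly.
print(roc_auc(scores, labels))
```

The pairwise form is quadratic in the number of residues; it is used here only because it makes the ranking interpretation explicit.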
Affiliation(s)
- Jiadi He
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Sm Bargeen Alam Turzo
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Stephanie S Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| |
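Note on the metrics in entry 25: the abstract reports a receiver operating characteristic area-under-the-curve (0.69 to 0.78) and a binary prediction accuracy (62% to 74%). The sketch below shows how these two quantities are commonly computed from per-residue scores. It is not the authors' code; the function names, the toy scores, the 0.5 threshold, and the convention that a higher score means "more likely disordered" are all assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen disordered residue
    (label 1) scores higher than a randomly chosen ordered residue
    (label 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_accuracy(scores, labels, threshold):
    """Fraction of residues whose thresholded score matches the label."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


# Hypothetical per-residue scores (higher = assumed more disorder-like)
# and reference labels (1 = disordered, 0 = ordered).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
acc = binary_accuracy(scores, labels, threshold=0.5)
print(f"AUC = {auc:.3f}, accuracy = {acc:.3f}")
```

The AUC is threshold-free and measures ranking quality, while the accuracy depends on the chosen cutoff, which is why the abstract reports both.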
|
26
|
Fobe TL, Walker CC, Meek GA, Shirts MR. Folding Coarse-Grained Oligomer Models with PyRosetta. J Chem Theory Comput 2022; 18:6354-6369. [PMID: 36179376 DOI: 10.1021/acs.jctc.2c00519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Non-biological foldamers are a promising class of macromolecules that share similarities to classical biopolymers such as proteins and nucleic acids. Currently, designing novel foldamers is a non-trivial process, often involving many iterations of trial synthesis and characterization until folded structures are observed. In this work, we aim to tackle these foldamer design challenges using computational modeling techniques. We developed CG PyRosetta, an extension to the popular protein folding python package, PyRosetta, which introduces coarse-grained (CG) residues into PyRosetta, enabling the folding of toy CG foldamer models. Although these models are simplified, they can help explore overarching physical hypotheses about how oligomers can form. Through systematic variation of CG parameters in these models, we can investigate various folding hypotheses at the CG scale to inform the design process of new foldamer chemistries. In this study, we demonstrate CG PyRosetta's ability to identify minimum energy structures with a diverse structural search over a range of simple models, as well as two hypothesis-driven parameter scans investigating the effects of side-chain size and internal backbone angle on secondary structures. We are able to identify several types of secondary structures from single- and double-helices to sheet-like and knot-like structures. We show how side-chain size and backbone bond angle both play an important role in the structure and energetics of these toy models. Optimal side-chain sizes promote favorable packing of side chains, while specific backbone bond angles influence the specific helix type found in folded structures.
Affiliation(s)
- Theodore L Fobe
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - Christopher C Walker
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - Garrett A Meek
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| |
|
27
|
Qing R, Hao S, Smorodina E, Jin D, Zalevsky A, Zhang S. Protein Design: From the Aspect of Water Solubility and Stability. Chem Rev 2022; 122:14085-14179. [PMID: 35921495 PMCID: PMC9523718 DOI: 10.1021/acs.chemrev.1c00757] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Water solubility and structural stability are key merits for proteins defined by the primary sequence and 3D-conformation. Their manipulation represents important aspects of the protein design field that relies on the accurate placement of amino acids and molecular interactions, guided by underlying physiochemical principles. Emulated designer proteins with well-defined properties both fuel the knowledge-base for more precise computational design models and are used in various biomedical and nanotechnological applications. The continuous developments in protein science, increasing computing power, new algorithms, and characterization techniques provide sophisticated toolkits for solubility design beyond guess work. In this review, we summarize recent advances in the protein design field with respect to water solubility and structural stability. After introducing fundamental design rules, we discuss the transmembrane protein solubilization and de novo transmembrane protein design. Traditional strategies to enhance protein solubility and structural stability are introduced. The designs of stable protein complexes and high-order assemblies are covered. Computational methodologies behind these endeavors, including structure prediction programs, machine learning algorithms, and specialty software dedicated to the evaluation of protein solubility and aggregation, are discussed. The findings and opportunities for Cryo-EM are presented. This review provides an overview of significant progress and prospects in accurate protein design for solubility and stability.
Affiliation(s)
- Rui Qing
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- The David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Shilei Hao
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400030, China
| | - Eva Smorodina
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0424, Norway
| | - David Jin
- Avalon GloboCare Corp., Freehold, New Jersey 07728, United States
| | - Arthur Zalevsky
- Laboratory of Bioinformatics Approaches in Combinatorial Chemistry and Biology, Shemyakin−Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow 117997, Russia
| | - Shuguang Zhang
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
28
|
Li X, Craven TW, Levine PM. Cyclic Peptide Screening Methods for Preclinical Drug Discovery. J Med Chem 2022; 65:11913-11926. [PMID: 36074956 DOI: 10.1021/acs.jmedchem.2c01077] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 11/28/2022]
Abstract
Cyclic peptides are among the most diverse architectures for current drug discovery efforts. Their size, stability, and ease of synthesis provide attractive scaffolds to engage and modulate some of the most challenging targets, including protein-protein interactions and those considered to be "undruggable". With a variety of sophisticated screening technologies to produce libraries of cyclic peptides, including phage display, mRNA display, split intein circular ligation of peptides, and in silico screening, a new era of cyclic peptide drug discovery is at the forefront of modern medicine. In this perspective, we begin by discussing cyclic peptides approved for clinical use in the past two decades. Particular focus is placed around synthetic chemistries to generate de novo libraries of cyclic peptides and novel methods to screen them. The perspective culminates with future prospects for generating cyclic peptides as viable therapeutic options and discusses the advantages and disadvantages currently being faced with bringing them to market.
Affiliation(s)
- Xinting Li
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| | - Timothy W Craven
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| | - Paul M Levine
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| |
|
29
|
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. Prog Biophys Mol Biol 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
Because of the increase in different types of diseases in human habitats, the demand for designing various types of drugs is also increasing. Proteins and their structures play a very important role in drug design. Therefore, researchers from areas such as mathematics, medicine, and computer science are teaming up to find better solutions in this field. In this paper, we discuss different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of the different approaches. The types of datasets used in PSP are also discussed. The paper also describes the performance measures used to evaluate the prediction accuracy of PSP methods. Various software packages and servers that predict the protein structure for an input protein sequence are available for download. These packages also help to compare the performance of any new algorithm with other available methods. Details of these packages are also given in this paper.
Affiliation(s)
| | - A S Mokhade
- Visvesvaraya National Institute of Technology, Nagpur, India
| |
|
30
|
Pearce R, Li Y, Omenn GS, Zhang Y. Fast and accurate Ab Initio Protein structure prediction using deep learning potentials. PLoS Comput Biol 2022; 18:e1010539. [PMID: 36112717 PMCID: PMC9518900 DOI: 10.1371/journal.pcbi.1010539] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/29/2021] [Revised: 09/28/2022] [Accepted: 09/03/2022] [Indexed: 01/05/2023] Open
Abstract
Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Departments of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
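Note on the metric in entry 30: the abstract compares methods by average TM-score. As a reminder of what that number measures, the sketch below evaluates the standard TM-score sum for one fixed superposition, given per-residue distances between aligned model and native positions. It is not DeepFold code. The full TM-score additionally maximizes this sum over superpositions, which is omitted here; the function names and the 0.5 Å floor on d0 for very short chains are assumptions of this sketch.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor (an assumption of this sketch) keeps d0 positive for
    very short chains, where the formula would otherwise break down.
    """
    if target_length <= 15:
        return floor
    return max(floor, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score sum for ONE given superposition (no optimization).

    distances: distance in angstroms between each aligned residue pair;
    unaligned target residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def relative_gain(new, old):
    """Relative improvement, e.g. 0.403 for 'a TM-score 40.3% higher'."""
    return (new - old) / old


# Hypothetical 100-residue target with every residue placed perfectly.
perfect = tm_score_fixed([0.0] * 100, 100)
# Same target with only half of the residues aligned (perfectly).
half = tm_score_fixed([0.0] * 50, 100)
print(perfect, half)
```

Because the sum is normalized by the target length rather than the number of aligned residues, a model that covers only part of the target is penalized, which is what makes the score suitable for full-length comparisons.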
|
31
|
Yang H, Xiong Z, Zonta F. Construction of a Deep Neural Network Energy Function for Protein Physics. J Chem Theory Comput 2022; 18:5649-5658. [PMID: 35939398 PMCID: PMC9476656 DOI: 10.1021/acs.jctc.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The traditional approach of computational biology consists of calculating molecule properties by using approximate classical potentials. Interactions between atoms are described by an energy function derived from physical principles or fitted to experimental data. Their functional form is usually limited to pairwise interactions between atoms and does not consider complex multibody effects. More recently, neural networks have emerged as an alternative way of describing the interactions between biomolecules. In this approach, the energy function does not have an explicit functional form and is learned bottom-up from simulations at the atomistic or quantum level. In this study, we attempt a top-down approach and use deep learning methods to obtain an energy function by exploiting the large amount of experimental data acquired over the years in the field of structural biology. The energy function is represented by a probability density model learned from a large repertoire of building blocks representing local clusters of amino acids paired with their sequence signature. We demonstrated the feasibility of this approach by generating a neural network energy function and testing its validity on several applications such as discriminating decoys, assessing the quality of structural models, sampling structural conformations, and designing new protein sequences. We foresee that, in the future, our methodology could exploit the continuously increasing availability of experimental data and simulations and provide a new method for the parametrization of protein energy functions.
Affiliation(s)
- Huan Yang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Francesco Zonta
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| |
|
32
|
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
|
33
|
Modeling of protein conformational changes with Rosetta guided by limited experimental data. Structure 2022; 30:1157-1168.e3. [PMID: 35597243 PMCID: PMC9357069 DOI: 10.1016/j.str.2022.04.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/17/2022] [Revised: 04/08/2022] [Accepted: 04/25/2022] [Indexed: 11/24/2022]
Abstract
Conformational changes are an essential component of functional cycles of many proteins, but their characterization often requires an integrative structural biology approach. Here, we introduce and benchmark ConfChangeMover (CCM), a new method built into the widely used macromolecular modeling suite Rosetta that is tailored to model conformational changes in proteins using sparse experimental data. CCM can rotate and translate secondary structural elements and modify their backbone dihedral angles in regions of interest. We benchmarked CCM on soluble and membrane proteins with simulated Cα-Cα distance restraints and sparse experimental double electron-electron resonance (DEER) restraints, respectively. In both benchmarks, CCM outperformed state-of-the-art Rosetta methods, showing that it can model a diverse array of conformational changes. In addition, the Rosetta framework allows a wide variety of experimental data to be integrated with CCM, thus extending its capability beyond DEER restraints. This method will contribute to the biophysical characterization of protein dynamics.
|
34
|
Yue K. Modeling protein structure as a stable static equilibrium. Phys Rev E 2022; 106:024410. [PMID: 36110022 DOI: 10.1103/physreve.106.024410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 07/24/2022] [Indexed: 06/15/2023]
Abstract
We present evidence that the protein structure can be modeled as a stable static equilibrium, determined mainly by compressive supports in the nonpolar interior. That is, protein structures derive their structural strength through the same mechanical principles as do conventional structures like bridges and buildings. This is based on the observation that the experimentally elucidated structural determinants, the interior nonpolar side chains, are engaged in strong compressions in static terms. At the same time, major substructures in proteins, helices and h-bonded strands, because of their geometry, inherently leave gaps in the space they occupy. Under the compressive force, nonpolar side chains from one substructure can protrude into the gaps of another neighboring substructure and block its motion. As a result, interlocking of substructures can form, which builds up the nonpolar core assembly. The native structure then is the one with the structurally most stable core assembly. While intuitively appealing, this is a radical departure from the prevailing thinking that protein native structure is determined by the global energy minimum, which is founded on the thermodynamic hypothesis. Furthermore, to develop an effective model for analyzing protein structure with conventional tools, a proper mechanical representation must be established. By proving that the stability of the equilibrium in compressive interactions is conditioned on a form of mechanical energy minimum, we show that our notion of native structure can be equally consistent with the thermodynamic hypothesis. By mathematically treating the blocking action, an interaction, as a bar, a physical object, we succeed in representing and analyzing the core assembly as a truss, a conventional structure. In this paper we define and expound step-by-step increasingly integrated interlocking patterns. We then analyze the core assemblies of a large set of diverse protein database structures. A native structure can be distinguished from decoys by comparing the composition and strength of their core assemblies. We show the results for two sets of native structures vs decoys.
Affiliation(s)
- Kaizhi Yue
- Conformational Search Solutions, Palo Alto, California 94306, USA
| |
|
35
|
Turzo SMBA, Seffernick JT, Rolland AD, Donor MT, Heinze S, Prell JS, Wysocki VH, Lindert S. Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction. Nat Commun 2022; 13:4377. [PMID: 35902583 PMCID: PMC9334640 DOI: 10.1038/s41467-022-32075-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/27/2021] [Accepted: 07/14/2022] [Indexed: 11/09/2022] Open
Abstract
Ion mobility (IM) mass spectrometry provides structural information about protein shape and size in the form of an orientationally-averaged collision cross-section (CCS_IM). While IM data have been used with various computational methods, they have not yet been utilized to predict monomeric protein structure from sequence. Here, we show that IM data can significantly improve protein structure determination using the modelling suite Rosetta. We develop the Rosetta Projection Approximation using Rough Circular Shapes (PARCS) algorithm that allows for fast and accurate prediction of CCS_IM from structure. Following successful testing of the PARCS algorithm, we use an integrative modelling approach to utilize IM data for protein structure prediction. Additionally, we propose a confidence metric that identifies near native models in the absence of a known structure. The results of this study demonstrate the ability of IM data to consistently improve protein structure prediction. Collision cross sections (CCS) from ion mobility mass spectrometry provide information about protein shape and size. Here, the authors develop an algorithm to predict CCS and integrate experimental ion mobility data into Rosetta-based molecular modelling to predict protein structures from sequence.
Affiliation(s)
- S M Bargeen Alam Turzo
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
| | - Amber D Rolland
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
| | - Micah T Donor
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
| | - Sten Heinze
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
| | - James S Prell
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
| | - Vicki H Wysocki
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
| | - Steffen Lindert
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
| |
|
36
|
Li Y, Zhang C, Yu DJ, Zhang Y. Deep learning geometrical potential for high-accuracy ab initio protein structure prediction. iScience 2022; 25:104425. [PMID: 35663033 PMCID: PMC9160776 DOI: 10.1016/j.isci.2022.104425] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/17/2022] [Revised: 05/02/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022] Open
Abstract
Ab initio protein structure prediction has been vastly boosted by the modeling of inter-residue contact/distance maps in recent years. We developed a new deep learning model, DeepPotential, which accurately predicts the distribution of a complementary set of geometric descriptors including a novel hydrogen-bonding potential defined by C-alpha atom coordinates. On 154 Free-Modeling/Hard targets from the CASP and CAMEO experiments, DeepPotential demonstrated a significant advantage on both geometrical feature prediction and full-length structure construction, with Top-L/5 contact accuracy and TM-score of full-length models 4.1% and 6.7% higher than the best of other deep-learning restraint prediction approaches. Detailed analyses showed that the major contributions to the TM-score/contact-map improvements come from the employment of multi-tasking network architecture and metagenome-based MSA collection assisted with confidence-based MSA selection, where hydrogen-bonding and inter-residue orientation predictions help improve hydrogen-bonding network and secondary structure packing. These results demonstrated new progress in the deep-learning restraint-guided ab initio protein structure prediction. Highlights: multi-tasking network architecture for multiple inter-residue geometries; novel deep learning model for improved hydrogen-bonding modeling; rapid and high-accuracy ab initio protein structure prediction.
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
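Note on the metric in entry 36: the abstract reports "Top-L/5 contact accuracy". A common way to compute this is to rank the predicted residue pairs by contact probability, keep the top L/5 (L being the sequence length), and report the fraction that are true contacts. The sketch below follows that convention and is not DeepPotential code. The function name, the minimum sequence separation of 6, and the toy data are assumptions; which pairs count as true contacts (often a C-beta distance below 8 Å) is left to the caller.

```python
def top_l_over_k_precision(pred, true_contacts, length, k=5, min_sep=6):
    """Precision of the top L/k predicted contacts.

    pred: dict mapping (i, j) residue-index pairs to contact probability.
    true_contacts: set of (i, j) pairs with i < j that are real contacts.
    min_sep: pairs closer than this in sequence are ignored (assumed value).
    """
    # Keep only pairs that are far enough apart along the sequence.
    candidates = [(p, i, j) for (i, j), p in pred.items() if abs(i - j) >= min_sep]
    # Highest predicted probability first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Hypothetical 10-residue example: L/5 = 2 predictions are evaluated.
pred = {(0, 8): 0.90, (1, 9): 0.70, (2, 9): 0.60, (0, 1): 0.99}
true_contacts = {(0, 8), (2, 9)}
precision = top_l_over_k_precision(pred, true_contacts, length=10)
print(precision)
```

In this toy case the pair (0, 1) is skipped as too local, the two top-ranked remaining pairs are (0, 8) and (1, 9), and only the first is a true contact.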
|
37
|
Matching protein surface structural patches for high-resolution blind peptide docking. Proc Natl Acad Sci U S A 2022; 119:e2121153119. [PMID: 35482919 PMCID: PMC9170164 DOI: 10.1073/pnas.2121153119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022] Open
Abstract
Modeling interactions between short peptides and their receptors is a challenging docking problem due to the peptide flexibility, resulting in a formidable sampling problem of peptide conformation in addition to its orientation. Alternatively, the peptide can be viewed as a piece that complements the receptor monomer structure. Here, we show that the peptide conformation can be determined based on the receptor backbone only and sampled using local structural motifs found in solved protein monomers and interfaces, independent of sequence similarity. This approach outperforms current peptide docking protocols and promotes new directions for peptide interface design. Peptide docking can be perceived as a subproblem of protein–protein docking. However, due to the short length and flexible nature of peptides, many do not adopt one defined conformation prior to binding. Therefore, to tackle a peptide docking problem, not only the relative orientation, but also the bound conformation of the peptide needs to be modeled. Traditional peptide-centered approaches use information about peptide sequences to generate representative conformer ensembles, which can then be rigid-body docked to the receptor. Alternatively, one may look at this problem from the viewpoint of the receptor, namely, that the protein surface defines the peptide-bound conformation. Here, we present PatchMAN (Patch-Motif AligNments), a global peptide-docking approach that uses structural motifs to map the receptor surface with backbone scaffolds extracted from protein structures. On a nonredundant set of protein–peptide complexes, starting from free receptor structures, PatchMAN successfully models and identifies near-native peptide–protein complexes in 58%/84% within 2.5 Å/5 Å interface backbone RMSD, with corresponding sampling in 81%/100% of the cases, outperforming other approaches. 
PatchMAN leverages the observation that structural units of peptides with their binding pocket can be found not only within interfaces, but also within monomers. We show that the bound peptide conformation is sampled based on the structural context of the receptor only, without taking into account any sequence information. Beyond peptide docking, this approach opens exciting new avenues to study principles of peptide–protein association, and to the design of new peptide binders. PatchMAN is available as a server at https://furmanlab.cs.huji.ac.il/patchman/.
|
38
|
Zimmermann MT. Molecular Modeling is an Enabling Approach to Complement and Enhance Channelopathy Research. Compr Physiol 2022; 12:3141-3166. [PMID: 35578963 DOI: 10.1002/cphy.c190047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Hundreds of human membrane proteins form channels that transport necessary ions and compounds, including drugs and metabolites, yet details of their normal function or how function is altered by genetic variants to cause diseases are often unknown. Without this knowledge, researchers are less equipped to develop approaches to diagnose and treat channelopathies. High-resolution computational approaches such as molecular modeling enable researchers to investigate channelopathy protein function, facilitate detailed hypothesis generation, and produce data that is difficult to gather experimentally. Molecular modeling can be tailored to each physiologic context that a protein may act within, some of which may currently be difficult or impossible to assay experimentally. Because many genomic variants are observed in channelopathy proteins from high-throughput sequencing studies, methods with mechanistic value are needed to interpret their effects. The eminent field of structural bioinformatics integrates techniques from multiple disciplines including molecular modeling, computational chemistry, biophysics, and biochemistry, to develop mechanistic hypotheses and enhance the information available for understanding function. Molecular modeling and simulation access 3D and time-dependent information, not currently predictable from sequence. Thus, molecular modeling is valuable for increasing the resolution with which the natural function of protein channels can be investigated, and for interpreting how genomic variants alter them to produce physiologic changes that manifest as channelopathies. © 2022 American Physiological Society. Compr Physiol 12:3141-3166, 2022.
Affiliation(s)
- Michael T Zimmermann
- Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.,Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.,Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
|
39
|
Krivacic C, Kundert K, Pan X, Pache RA, Liu L, Conchúir SO, Jeliazkov JR, Gray JJ, Thompson MC, Fraser JS, Kortemme T. Accurate positioning of functional residues with robotics-inspired computational protein design. Proc Natl Acad Sci U S A 2022; 119:e2115480119. [PMID: 35254891 PMCID: PMC8931229 DOI: 10.1073/pnas.2115480119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 01/27/2022] [Indexed: 11/18/2022] Open
Abstract
Significance: Computational protein design promises to advance applications in medicine and biotechnology by creating proteins with many new and useful functions. However, new functions require the design of specific and often irregular atom-level geometries, which remains a major challenge. Here, we develop computational methods that design and predict local protein geometries with greater accuracy than existing methods. Then, as a proof of concept, we leverage these methods to design new protein conformations in the enzyme ketosteroid isomerase that change the protein's preference for a key functional residue. Our computational methods are openly accessible and can be applied to the design of other intricate geometries customized for new user-defined protein functions.
Affiliation(s)
- Cody Krivacic
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Kale Kundert
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
- Biophysics Graduate Program, University of California, San Francisco, CA 94158
| | - Xingjie Pan
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Roland A. Pache
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Lin Liu
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Shane O Conchúir
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | | | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218
| | - Michael C. Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - James S. Fraser
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
- Biophysics Graduate Program, University of California, San Francisco, CA 94158
- Quantitative Biosciences Institute, University of California, San Francisco, CA 94158
| | - Tanja Kortemme
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
- Biophysics Graduate Program, University of California, San Francisco, CA 94158
- Quantitative Biosciences Institute, University of California, San Francisco, CA 94158
|
40
|
Boral A, Khamaru M, Mitra D. Designing synthetic transcription factors: A structural perspective. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 130:245-287. [PMID: 35534109 DOI: 10.1016/bs.apcsb.2021.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss different design strategies of synthetic proteins, especially synthetic transcription factors. Design and engineering of synthetic transcription factors is particularly relevant for the need-based manipulation of gene expression. With recent advances in structural biology techniques and with the emergence of other precision biochemical/physical tools, accurate knowledge on structure-function relations is increasingly becoming available. Besides discussing the underlying principles of design, we go through individual cases, especially those involving four major groups of transcription factors-basic leucine zippers, zinc fingers, helix-turn-helix and homeodomains. We further discuss how synthetic biology can come together with structural biology to alter the genetic blueprint of life.
Affiliation(s)
- Aparna Boral
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
| | - Madhurima Khamaru
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
| | - Devrani Mitra
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India.
|
41
|
Ju F, Zhu J, Zhang Q, Wei G, Sun S, Zheng WM, Bu D. Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction. Bioinformatics 2022; 38:990-996. [PMID: 34849579 DOI: 10.1093/bioinformatics/btab777] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2021] [Revised: 10/22/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignment (MSA) for residue mutations and correlations, as this information specifies protein tertiary structure. The widely used prediction approaches usually transform the MSA into intermediate models, such as a position-specific scoring matrix or a profile hidden Markov model. These intermediate models, however, cannot fully represent the residue mutations and correlations carried by the MSA; hence, an effective way to directly exploit MSAs is highly desirable. RESULTS: Here, we report a novel sequence set network (called Seq-SetNet) to directly and effectively exploit MSA for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements: (i) an encoding module that takes a component homologue in the MSA as input, and encodes residue mutations and correlations into context-specific features for each residue; and (ii) an aggregation module that aggregates the features extracted from all component homologues, which are further transformed into structural properties for residues of the query protein. As Seq-SetNet encodes each homologue protein individually, it can consider both insertions and deletions, as well as long-distance correlations among residues, thus representing more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. Using symmetric aggregation functions, Seq-SetNet processes the homologue proteins as a sequence set, making its prediction results invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predict secondary structure and torsion angles of residues with improved accuracy and efficiency. AVAILABILITY AND IMPLEMENTATION: The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
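The 'encoding and aggregation' idea in this abstract can be illustrated with a minimal sketch. It is not the authors' network: the toy encoder below (a match indicator and a gap indicator per aligned position) and the names `encode_homologue`, `aggregate`, and `featurize` are assumptions made purely for illustration. The only property it aims to show is that a symmetric aggregation function (here an element-wise mean) makes the result independent of the order of the homologues.

```python
from typing import List

def encode_homologue(query: str, homologue: str) -> List[List[float]]:
    """Toy per-residue encoder (hypothetical): [match, gap] indicators per aligned position."""
    if len(query) != len(homologue):
        raise ValueError("query and homologue must be aligned to the same length")
    return [[1.0 if q == h else 0.0, 1.0 if h == "-" else 0.0]
            for q, h in zip(query, homologue)]

def aggregate(encoded: List[List[List[float]]]) -> List[List[float]]:
    """Symmetric aggregation: element-wise mean over all encoded homologues."""
    n = len(encoded)
    length = len(encoded[0])
    width = len(encoded[0][0])
    return [[sum(e[i][k] for e in encoded) / n for k in range(width)]
            for i in range(length)]

def featurize(query: str, msa: List[str]) -> List[List[float]]:
    """Encode each homologue independently, then aggregate the set."""
    return aggregate([encode_homologue(query, h) for h in msa])

# Made-up alignment, for illustration only.
query = "MKVLA"
msa = ["MKVLA", "MRV-A", "MKILA"]

features = featurize(query, msa)
shuffled = featurize(query, list(reversed(msa)))
print(features == shuffled)  # the mean does not depend on homologue order
```

In a learned model the encoder would be a neural network and the aggregation might be a mean or max over the set, but the order-invariance argument is the same.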
Affiliation(s)
- Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianwei Zhu
- Microsoft Research Asia, Beijing 100080, China
| | - Qi Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guozheng Wei
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.,University of Chinese Academy of Sciences, Beijing 100049, China.,Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
| | - Wei-Mou Zheng
- University of Chinese Academy of Sciences, Beijing 100049, China.,Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.,University of Chinese Academy of Sciences, Beijing 100049, China.,Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
|
42
|
Han Y, Wang Z, Chen A, Ali I, Cai J, Ye S, Li J. An inductive transfer learning force field (ITLFF) protocol builds protein force fields in seconds. Brief Bioinform 2022; 23:6509736. [PMID: 35039818 DOI: 10.1093/bib/bbab590] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Revised: 12/19/2021] [Accepted: 12/23/2021] [Indexed: 01/15/2023] Open
Abstract
Accurate simulation of protein folding is a unique challenge in understanding the physical process of protein folding, with important implications for protein design and drug discovery. Molecular dynamics simulation strongly requires advanced force fields with high accuracy to achieve correct folding. However, the current force fields are inaccurate, inapplicable and inefficient. We propose a machine learning protocol, the inductive transfer learning force field (ITLFF), to construct protein force fields in seconds with any level of accuracy from a small dataset. This process is achieved by incorporating an inductive transfer learning algorithm into deep neural networks, which learn knowledge of any high-level calculations from a large dataset of a low-level method. Here, we use a double-hybrid density functional theory (DFT) as a case functional, but ITLFF is suitable for any high-precision functional. The performance on 18 selected proteins indicates that, compared with the fragment-based double-hybrid DFT algorithm, the force field constructed by ITLFF achieves considerable accuracy with a mean absolute error of 0.0039 kcal/mol/atom for energy and a root mean square error of 2.57 kcal/mol/Å for force, and it is more than 30,000 times faster and obtains more significant efficiency benefits as the system increases. The outstanding performance of ITLFF provides promising prospects for accurate and efficient protein dynamic simulations and makes an important step toward protein folding simulation. Due to the ability of ITLFF to utilize the knowledge acquired in one task to solve related problems, it is also applicable to various problems in biology, chemistry and material science.
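The two error measures quoted in this abstract, mean absolute error for energies and root mean square error for forces, can be written out as a short sketch. The function names and all numbers below are made up for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math
from typing import Sequence

def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Average of |prediction - reference| over all samples."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def root_mean_square_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Square root of the mean squared difference over all samples."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Hypothetical per-atom energies (kcal/mol/atom): reference method vs. learned model.
reference_energies = [-1.000, -2.000, -3.000]
predicted_energies = [-1.002, -1.996, -3.006]

energy_mae = mean_absolute_error(predicted_energies, reference_energies)
energy_rmse = root_mean_square_error(predicted_energies, reference_energies)
print(energy_mae, energy_rmse)
```

RMSE weights large deviations more heavily than MAE, which is one common reason for reporting it on force components, where occasional large errors matter for simulation stability.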
Affiliation(s)
- Yanqiang Han
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Zhilong Wang
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - An Chen
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Imran Ali
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Junfei Cai
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Simin Ye
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Jinjin Li
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
|
43
|
Sych T, Levental KR, Sezgin E. Lipid–Protein Interactions in Plasma Membrane Organization and Function. Annu Rev Biophys 2022; 51:135-156. [DOI: 10.1146/annurev-biophys-090721-072718] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Lipid–protein interactions in cells are involved in various biological processes, including metabolism, trafficking, signaling, host–pathogen interactions, and transmembrane transport. At the plasma membrane, lipid–protein interactions play major roles in membrane organization and function. Several membrane proteins have motifs for specific lipid binding, which modulate protein conformation and consequent function. In addition to such specific lipid–protein interactions, protein function can be regulated by the dynamic, collective behavior of lipids in membranes. Emerging analytical, biochemical, and computational technologies allow us to study the influence of specific lipid–protein interactions, as well as the collective behavior of membranes on protein function. In this article, we review the recent literature on lipid–protein interactions with a specific focus on the current state-of-the-art technologies that enable novel insights into these interactions. Expected final online publication date for the Annual Review of Biophysics, Volume 51 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Taras Sych
- Science for Life Laboratory, Department of Women's and Children's Health, Karolinska Institutet, Solna, Sweden
| | - Kandice R. Levental
- Department of Molecular Physiology and Biological Physics, Center for Membrane and Cell Physiology, University of Virginia, Charlottesville, Virginia, USA
| | - Erdinc Sezgin
- Science for Life Laboratory, Department of Women's and Children's Health, Karolinska Institutet, Solna, Sweden
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom
|
44
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|
45
|
Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are combined into the Rosetta low-resolution energy function to improve the accuracy of the energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set of 43 proteins compared with those of the Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by the proposed CoDiFold are comparable to those of the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
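The abstract evaluates an energy function by how well energy correlates with RMSD across candidate models. A minimal sketch of both quantities follows, assuming the model and native coordinates are already superimposed (a real evaluation would first apply an optimal superposition, which is omitted here). The function names, coordinates, and energies are hypothetical and unrelated to the CoDiFold code or data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def rmsd(model: Sequence[Point], native: Sequence[Point]) -> float:
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(model) != len(native) or not native:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for p, q in zip(model, native) for a, b in zip(p, q))
    return math.sqrt(squared / len(native))

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up native trace and three decoys shifted by 1, 2 and 3 units along y.
native: List[Point] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
decoys = [[(x, y + shift, z) for x, y, z in native] for shift in (1.0, 2.0, 3.0)]

decoy_rmsds = [rmsd(d, native) for d in decoys]
decoy_energies = [-30.0, -20.0, -10.0]  # hypothetical scores; lower means more favourable
correlation = pearson(decoy_energies, decoy_rmsds)
print(decoy_rmsds, correlation)
```

A strongly positive correlation means that lower-energy models tend to be closer to the native structure, which is the behaviour an improved energy function is expected to show.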
|
46
|
Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
47
|
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
48
|
Gao J, Zheng S, Yao M, Wu P. Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method. Bioinformatics 2021; 38:94-98. [PMID: 34450651 DOI: 10.1093/bioinformatics/btab616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 08/12/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The solvent accessible surface is an essential structural property related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure of the degree to which a residue is exposed on the protein surface or buried inside the protein. However, this computation fails when residue information is missing. RESULTS: In this article, we propose a novel method for estimating RSA using the Cα atom distance matrix with a deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. AVAILABILITY AND IMPLEMENTATION: The method is freely available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
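The input described in this abstract, a Cα distance matrix, and one of the baseline estimators it names, the coordination number, can be sketched in a few lines. This is not the EAGERER model: the function names, the coordinates, and the 8 Å cutoff are assumptions chosen for illustration. The coordination number here is simply the count of other Cα atoms within the cutoff, so higher counts suggest a more buried residue.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def distance_matrix(ca_coords: Sequence[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between C-alpha atoms."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def coordination_numbers(dmat: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """For each residue, count the other residues whose C-alpha lies within `cutoff`."""
    return [sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
            for i, row in enumerate(dmat)]

# Made-up straight C-alpha trace with 3.8 Angstrom spacing between neighbours.
ca_trace: List[Point] = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

dmat = distance_matrix(ca_trace)
counts = coordination_numbers(dmat, cutoff=8.0)  # cutoff value is an assumption
print(counts)
```

Because only Cα positions are needed, this kind of feature remains available when side-chain atoms are missing, which is the situation the abstract targets.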
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Shuangjia Zheng
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China
| | - Mengting Yao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Peikun Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
|
49
|
Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. Proc Natl Acad Sci U S A 2021; 118:2110828118. [PMID: 34873061 DOI: 10.1073/pnas.2110828118] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 10/27/2021] [Indexed: 12/26/2022] Open
Abstract
Information derived from metagenome sequences through deep-learning techniques has significantly improved the accuracy of template-free protein structure modeling. However, most deep learning-based modeling studies are based on blind sequence database searches and suffer from low efficiency in computational resource utilization and model construction, especially when the sequence library becomes prohibitively large. We proposed a MetaSource model built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil, and Fermentor) to decode the inherent linkage of microbial niches with protein homologous families. Large-scale protein family folding experiments on 8,700 unknown Pfam families showed that a microbiome-targeted approach, with multiple sequence alignments constructed from individual MetaSource biomes, requires more than threefold less computer memory and CPU (central processing unit) time but generates contact-map and three-dimensional structure models with significantly higher accuracy, compared with an approach using combined metagenome datasets. These results demonstrate an avenue to bridge the gap between the rapidly increasing metagenome databases and the limited computing resources for efficient genome-wide database mining, which provides a useful bluebook to guide future microbiome sequence database and modeling development for high-accuracy protein structure and function prediction.
|
50
|
Ovchinnikov S, Huang PS. Structure-based protein design with deep learning. Curr Opin Chem Biol 2021; 65:136-144. [PMID: 34547592 PMCID: PMC8671290 DOI: 10.1016/j.cbpa.2021.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/07/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
Since the first revelation of proteins functioning as macromolecular machines through their three dimensional structures, researchers have been intrigued by the marvelous ways the biochemical processes are carried out by proteins. The aspiration to understand protein structures has fueled extensive efforts across different scientific disciplines. In recent years, it has been demonstrated that proteins with new functionality or shapes can be designed via structure-based modeling methods, and the design strategies have combined all available information - but largely piece-by-piece - from sequence derived statistics to the detailed atomic-level modeling of chemical interactions. Despite the significant progress, incorporating data-derived approaches through the use of deep learning methods can be a game changer. In this review, we summarize current progress, compare the arc of developing the deep learning approaches with the conventional methods, and describe the motivation and concepts behind current strategies that may lead to potential future opportunities.
Affiliation(s)
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, 02138, USA.
| | - Po-Ssu Huang
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.
|