1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Zhou B, Zheng L, Wu B, Yi K, Zhong B, Tan Y, Liu Q, Liò P, Hong L. A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity. Cell Discov 2024; 10:95. [PMID: 39251570 PMCID: PMC11385924 DOI: 10.1038/s41421-024-00728-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 08/13/2024] [Indexed: 09/11/2024] Open
Abstract
Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This advancement leads to the development of highly efficient and specialized proteins with diverse applications across scientific, technological, and biomedical fields. This study establishes a pipeline for protein sequence generation with a conditional protein diffusion model, namely CPDiffusion, to create diverse sequences of proteins with enhanced functions. CPDiffusion accommodates protein-specific conditions, such as secondary structures and highly conserved amino acids. Without relying on extensive training data, CPDiffusion effectively captures highly conserved residues and sequence features for specific protein families. We applied CPDiffusion to generate artificial sequences of Argonaute (Ago) proteins based on the backbone structures of wild-type (WT) Kurthia massiliensis Ago (KmAgo) and Pyrococcus furiosus Ago (PfAgo), which are complex multi-domain programmable endonucleases. The generated sequences deviate by up to nearly 400 amino acids from their WT templates. Experimental tests demonstrated that the majority of the generated proteins for both KmAgo and PfAgo show unambiguous activity in DNA cleavage, with many of them exhibiting superior activity as compared to the WT. These findings underscore CPDiffusion's remarkable success rate in generating novel sequences for proteins with complex structures and functions in a single step, leading to enhanced activity. This approach facilitates the design of enzymes with multi-domain molecular structures and intricate functions through in silico generation and screening, all accomplished without the need for supervision from labeled data.
Affiliation(s)
- Bingxin Zhou
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Shanghai National Center for Applied Mathematics (SJTU center), Shanghai Jiao Tong University, Shanghai, China
- Lirong Zheng
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Department of Cell and Developmental Biology & Michigan Neuroscience Institute, University of Michigan Medical School, Ann Arbor, MI, USA
- Banghao Wu
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Kai Yi
  - School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia
- Bozitao Zhong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Yang Tan
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Qian Liu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Liang Hong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Shanghai National Center for Applied Mathematics (SJTU center), Shanghai Jiao Tong University, Shanghai, China
  - Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
3
Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
  - Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Mehrnaz Mostafavi
  - Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Birashk
  - Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
- Shiva Karimian
  - Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
- Shayan Khalilollah
  - Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Sohrab Jaferian
  - Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
- Yalda Yazdani
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Iraj Alipourfard
  - Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland
- Yun Suk Huh
  - Department of Biological Engineering, Inha University, Incheon, Republic of Korea
4
Zheng W, Wuyun Q, Zhang Y. One step forward towards deep-learning protein complex structure prediction by precise multiple sequence alignment construction. Clin Transl Med 2024; 14:e1689. [PMID: 38880984 PMCID: PMC11180690 DOI: 10.1002/ctm2.1689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Accepted: 04/26/2024] [Indexed: 06/18/2024] Open
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Yang Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
5
Kumawat P, Agarwal LK, Sharma K. An Overview of SARS-CoV-2 Potential Targets, Inhibitors, and Computational Insights to Enrich the Promising Treatment Strategies. Curr Microbiol 2024; 81:169. [PMID: 38733424 DOI: 10.1007/s00284-024-03671-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 03/18/2024] [Indexed: 05/13/2024]
Abstract
The rapid spread of the SARS-CoV-2 virus has emphasized the urgent need for effective therapies to combat COVID-19. Investigating the potential targets, inhibitors, and in silico approaches pertinent to COVID-19 is urgently needed to develop novel therapeutic agents and to reprofile existing FDA-approved drugs. This article reviews the viral enzymes and their counter receptors involved in the entry of SARS-CoV-2 into host cells, replication of genomic RNA, and control of host cell physiology. In addition, the study provides an overview of the computational techniques, such as docking simulations, molecular dynamics, QSAR modeling, and homology modeling, that have been used to find FDA-approved drugs and other inhibitors against SARS-CoV-2. Furthermore, a comprehensive overview of virus-based and host-based druggable targets from a structural point of view, together with the reported therapeutic compounds against SARS-CoV-2, is presented. The current study offers future perspectives for research in the field of network pharmacology investigating the large unexplored molecular libraries. Overall, the present in-depth review aims to expedite the process of identifying and repurposing drugs for researchers involved in the field of COVID-19 drug discovery.
Affiliation(s)
- Pooja Kumawat
  - Department of Chemistry, Mohanlal Sukhadia University, Udaipur, Rajasthan, 313001, India
- Lokesh Kumar Agarwal
  - Department of Chemistry, Mohanlal Sukhadia University, Udaipur, Rajasthan, 313001, India
- Kuldeep Sharma
  - Department of Botany, Mohanlal Sukhadia University, Udaipur, Rajasthan, 313001, India
6
Go EB, Lee JH, Cho JH, Kwon NH, Choi JI, Kwon I. Enhanced therapeutic potential of antibody fragment via IEDDA-mediated site-specific albumin conjugation. J Biol Eng 2024; 18:23. [PMID: 38576037 PMCID: PMC10996255 DOI: 10.1186/s13036-024-00418-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
BACKGROUND The use of single-chain variable fragments (scFvs) for treating human diseases, such as cancer and immune system disorders, has attracted significant attention. However, a critical drawback of scFv is its extremely short serum half-life, which limits its therapeutic potential. Thus, there is a critical need to prolong the serum half-life of the scFv for clinical applications. One promising serum half-life extender for therapeutic proteins is human serum albumin (HSA), the most abundant protein in human serum, which is known to have an exceptionally long serum half-life. However, conjugating a macromolecular half-life extender to a small protein, such as scFv, often results in a significant loss of its critical properties.
RESULTS In this study, we conjugated HSA to a permissive site of scFv to improve its pharmacokinetic profile. To ensure minimal damage to the antigen-binding capacity of scFv upon HSA conjugation, we employed a site-specific conjugation approach using a heterobifunctional crosslinker that facilitates the thiol-maleimide reaction and the inverse electron-demand Diels-Alder reaction (IEDDA). As a model protein, we selected 4D5scFv, derived from trastuzumab, a therapeutic antibody used in human epidermal growth factor receptor 2 (HER2)-positive breast cancer treatment. We introduced a phenylalanine analog containing a very reactive tetrazine group (frTet) at conjugation site candidates predicted by computational methods. Using the linker TCO-PEG4-MAL, a single HSA molecule was site-specifically conjugated to the 4D5scFv (4D5scFv-HSA). The 4D5scFv-HSA conjugate exhibited HER2 binding affinity comparable to that of unmodified 4D5scFv. Furthermore, in the pharmacokinetic profile in mice, the serum half-life of 4D5scFv-HSA was approximately 12 h, which is 85 times longer than that of 4D5scFv.
CONCLUSIONS The antigen binding results and pharmacokinetic profile of 4D5scFv-HSA demonstrate that the site-specifically albumin-conjugated scFv retained its binding affinity with a prolonged serum half-life. In conclusion, we developed an effective strategy to prepare site-specifically albumin-conjugated 4D5scFv, which can have versatile clinical applications with improved efficacy.
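The half-life figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper. It only divides the two reported values (about 12 h, 85 times longer) to obtain the half-life of unmodified 4D5scFv that they imply. The `fraction_remaining` helper is a hypothetical illustration that assumes simple first-order (single-exponential) elimination, which the abstract does not state.

```python
def implied_half_life_h(conjugate_half_life_h: float, fold_increase: float) -> float:
    """Half-life of the unmodified protein implied by the reported fold increase."""
    return conjugate_half_life_h / fold_increase


def fraction_remaining(t_h: float, half_life_h: float) -> float:
    """Fraction left in serum after t_h hours, ASSUMING first-order elimination."""
    return 0.5 ** (t_h / half_life_h)


# Values quoted in the abstract (approximate).
conjugate_t_half_h = 12.0
fold_increase = 85.0

scfv_t_half_h = implied_half_life_h(conjugate_t_half_h, fold_increase)
print(f"Implied 4D5scFv half-life: {scfv_t_half_h * 60:.1f} min")  # 8.5 min

# Illustrative only: fraction remaining one hour after dosing under the assumption above.
print(f"4D5scFv after 1 h:     {fraction_remaining(1.0, scfv_t_half_h):.1%}")
print(f"4D5scFv-HSA after 1 h: {fraction_remaining(1.0, conjugate_t_half_h):.1%}")
```

Under these assumptions, the reported figures imply a half-life of under ten minutes for the unconjugated fragment, which is consistent with the "extremely short serum half-life" described in the BACKGROUND.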
Affiliation(s)
- Eun Byeol Go
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jae Hun Lee
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jeong Haeng Cho
  - ProAbTech, Gwangju, 61005, Republic of Korea
  - Department of Biotechnology and Bioengineering, Interdisciplinary Program for Bioenergy and Biomaterials, Chonnam National University, Gwangju, 61186, Republic of Korea
- Na Hyun Kwon
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jong-Il Choi
  - Department of Biotechnology and Bioengineering, Interdisciplinary Program for Bioenergy and Biomaterials, Chonnam National University, Gwangju, 61186, Republic of Korea
- Inchan Kwon
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
7
Wang H, Liu D, Zhao K, Wang Y, Zhang G. SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition. Brief Bioinform 2024; 25:bbae146. [PMID: 38600663 PMCID: PMC11006797 DOI: 10.1093/bib/bbae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/12/2024] Open
Abstract
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether the representation of the structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, they are further fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, PiFold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can reconstruct well the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.
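To make the "ultrafast shape recognition vectors" mentioned in this abstract more concrete, the sketch below computes a 12-number shape descriptor in the commonly described USR style: distances from every atom to four reference points, each distance distribution summarized by three moments. This is an illustration under stated assumptions, not SPDesign's implementation. The function names, the particular moment definitions (mean, standard deviation, signed cube root of the third central moment), and the toy coordinates are all choices made here.

```python
import math


def _three_moments(dists):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords):
    """12-element USR-style descriptor for a list of (x, y, z) points."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]    # point closest to the centroid
    farthest = coords[d_ctd.index(max(d_ctd))]   # point farthest from the centroid
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]  # point farthest from `farthest`
    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_three_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]; equals 1.0 for identical descriptors."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


# Toy backbone-like coordinates (hypothetical, in angstroms).
toy = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (4.0, 2.0, 1.0), (5.5, 2.5, 3.0)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in toy]

desc_toy = usr_descriptor(toy)
desc_shifted = usr_descriptor(shifted)
print(len(desc_toy))                           # 12
print(usr_similarity(desc_toy, desc_shifted))  # very close to 1: translation does not change the shape
```

Because the descriptor depends only on internal distances, two structures can be compared without superimposing them, which is why descriptors of this kind are attractive as a fast pre-filter before a slower structure alignment.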
Affiliation(s)
- Yajun Wang
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China (corresponding author)
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (corresponding author)
8
Baker K, Hughes N, Bhattacharya S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. Front Bioinform 2024; 4:1358550. [PMID: 38562910 PMCID: PMC10982686 DOI: 10.3389/fbinf.2024.1358550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] Open
Abstract
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the gap in accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to perform contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the CMO problem. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse qualities of contact maps and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, thereby making it accessible to the general public for promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
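For readers new to the contact map overlap (CMO) problem named in this abstract, the sketch below shows the quantity being maximized. It builds a contact map from coordinates and counts how many contacts of protein A land on contacts of protein B under a given residue alignment. It is a minimal illustration, not GoFold's or map_align's algorithm: the 8 Å cutoff, the function names, and the toy coordinates are assumptions made here, and the hard part of CMO (searching for the best alignment) is deliberately left out.

```python
import math
from itertools import combinations


def contact_map(ca_coords, cutoff=8.0):
    """Set of residue index pairs (i, j), i < j, whose points lie within `cutoff`."""
    return {
        (i, j)
        for i, j in combinations(range(len(ca_coords)), 2)
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    }


def contact_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that map onto contacts of B.

    `alignment` maps residue indices of A to residue indices of B;
    residues of A missing from the mapping are treated as unaligned.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            p, q = sorted((alignment[i], alignment[j]))
            if (p, q) in contacts_b:
                shared += 1
    return shared


# Toy example: four points on a line, 3.8 apart, versus the first three of them.
protein_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
protein_b = protein_a[:3]

map_a = contact_map(protein_a)
map_b = contact_map(protein_b)
overlap = contact_overlap(map_a, map_b, {0: 0, 1: 1, 2: 2})

print(len(map_a), len(map_b))  # 5 3
print(overlap)                 # 3
```

A CMO solver would search over order-preserving alignments for the one that maximizes this count. The function above only scores a single candidate alignment.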
Affiliation(s)
- Kevan Baker
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Nathaniel Hughes
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
- Sutanu Bhattacharya
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
9
Drogalin A, Monteiro LS, Alves MJ, Castro TG. Golgi α-mannosidase: opposing structures of Drosophila melanogaster and novel human model using molecular dynamics simulations and docking at different pHs. J Biomol Struct Dyn 2024; 42:2714-2725. [PMID: 37158092 DOI: 10.1080/07391102.2023.2209184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023]
Abstract
The search for potent and specific inhibitors of Golgi α-mannosidase II (GMII) has been a focus of many studies for the past three decades, since this enzyme is a key target for cancer treatment. α-Mannosidases, such as those from Drosophila melanogaster or Jack bean, have been used as functional models of the human Golgi α-mannosidase II (hGMII) because mammalian mannosidases are difficult to purify and characterize experimentally. Meanwhile, computational studies have been seen as privileged tools for exploring specific enzymes, providing molecular details of these macromolecules, their protonation states and their interactions. Thus, modelling techniques can successfully predict the hGMII 3D structure with high confidence, speeding up the development of new hits. In this study, Drosophila melanogaster Golgi mannosidase II (dGMII) and a novel human model, developed in silico and equilibrated via molecular dynamics simulations, were compared in docking studies. Our findings highlight that the design of novel inhibitors should be carried out considering the human model's characteristics and the enzyme operating pH. A reliable model is evidenced, showing a good correlation between Ki/IC50 experimental data and theoretical ΔGbinding estimations in GMII, opening the possibility of optimizing the rational drug design of new derivatives. Communicated by Ramaswamy H. Sarma.
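The correlation between Ki/IC50 data and ΔGbinding mentioned in this abstract rests on a standard thermodynamic relation, ΔG = RT ln(Ki), with Ki expressed relative to a 1 M standard state. The sketch below is a generic illustration of that relation, not the authors' protocol. The temperature of 298.15 K, the example concentrations, and the function names are assumptions made here, and the Cheng-Prusoff conversion from IC50 to Ki applies only to competitive inhibition.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_ki(ki_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from an inhibition constant (1 M standard state)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ki_molar)


def ki_from_ic50(ic50_molar: float, substrate_molar: float, km_molar: float) -> float:
    """Cheng-Prusoff conversion; valid only for competitive inhibitors."""
    return ic50_molar / (1.0 + substrate_molar / km_molar)


# Hypothetical inhibitor: IC50 of 2 uM measured with substrate at its Km.
ki = ki_from_ic50(2e-6, substrate_molar=1e-3, km_molar=1e-3)
dg = delta_g_from_ki(ki)
per_decade = delta_g_from_ki(1e-6) - delta_g_from_ki(1e-7)

print(f"Ki = {ki:.1e} M")                                # Ki = 1.0e-06 M
print(f"dG = {dg:.2f} kcal/mol")                         # dG = -8.19 kcal/mol
print(f"Tenfold change in Ki: {per_decade:.2f} kcal/mol")  # 1.36 kcal/mol
```

Since each tenfold improvement in Ki corresponds to only about 1.4 kcal/mol at room temperature, docking scores need to be accurate at roughly that level before they can rank inhibitors reliably.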
Affiliation(s)
- Artem Drogalin
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Luís S Monteiro
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Maria José Alves
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Tarsila G Castro
  - CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
10
Zheng L, Wang H, Liu X, Xu C, Tian M, Shi G, Bai C, Li Z, Wang J, Liu S. A panel of multivalent nanobodies broadly neutralizing Omicron subvariants and recombinant. J Med Virol 2024; 96:e29528. [PMID: 38501378 DOI: 10.1002/jmv.29528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/05/2024] [Accepted: 03/02/2024] [Indexed: 03/20/2024]
Abstract
The emerging Omicron subvariants have a remarkable ability to spread and escape nearly all current monoclonal antibody (mAb) treatments. Although the virulence of SARS-CoV-2 has now diminished, it remains a significant threat to public health due to its high transmissibility and susceptibility to mutation. Therefore, it is urgent to develop broad-acting and potent therapeutics targeting current and emerging Omicron variants. Here, we identified a panel of Omicron BA.1 spike receptor-binding domain (RBD)-targeted nanobodies (Nbs) from a naive alpaca VHH library. This panel of Nbs exhibited high binding affinity to the spike RBD of wild-type, Alpha B.1.1.7, Beta B.1.351, Delta plus, Omicron BA.1, and BA.2. Through multivalent Nb construction, we obtained a subpanel of ultrapotent neutralizing Nbs against Omicron BA.1, BA.2, BF.7 and even emerging XBB.1.5, and XBB.1.16 pseudoviruses. Protein structure prediction and docking analysis showed that Nb trimer 2F2E5 targets two independent RBD epitopes, thus minimizing viral escape. Taken together, we obtained a panel of broad and ultrapotent neutralizing Nbs against Omicron BA.1, Omicron BA.2, BF.7, XBB.1.5, and XBB.1.16. These multivalent Nbs hold great promise for the treatment against SARS-CoV-2 infection and could possess a superwide neutralizing breadth against novel omicron mutants or recombinants.
Affiliation(s)
- Liuhai Zheng
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
  - NMPA Key Laboratory for Research and Evaluation of Drug Metabolism & Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, China
- Huifang Wang
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
- Xueyan Liu
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
- Chengchao Xu
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
  - College of Integrative Medicine, Laboratory of Pathophysiology, Key Laboratory of Integrative Medicine on Chronic Diseases, Fujian University of Traditional Chinese Medicine, Fuzhou, China
  - State Key Laboratory for Quality Assurance and Sustainable Use of Dao-di Herbs, Artemisinin Research Center, and Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Mingxiong Tian
  - School of Medicine, Zhongda Hospital, Southeast University, Nanjing, China
- Guangwei Shi
  - NMPA Key Laboratory for Research and Evaluation of Drug Metabolism & Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, China
- Chongzhi Bai
  - Central Laboratory, Shanxi Province Hospital of Traditional Chinese Medicine, Taiyuan, China
- Zhijie Li
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
- Jigang Wang
  - Department of Critical Medicine, School of Medicine, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, Guangdong, China
  - NMPA Key Laboratory for Research and Evaluation of Drug Metabolism & Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, China
  - State Key Laboratory for Quality Assurance and Sustainable Use of Dao-di Herbs, Artemisinin Research Center, and Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
  - Department of Oncology, the Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China
  - State Key Laboratory of Antiviral Drugs, School of Pharmacy, Henan University, Kaifeng, Henan, China
- Shuwen Liu
  - NMPA Key Laboratory for Research and Evaluation of Drug Metabolism & Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, China
11
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
  - Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
  - College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
12
|
Li J, Wang L, Zhu Z, Song C. Exploring the Alternative Conformation of a Known Protein Structure Based on Contact Map Prediction. J Chem Inf Model 2024; 64:301-315. [PMID: 38117138 PMCID: PMC10777399 DOI: 10.1021/acs.jcim.3c01381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
The rapid development of deep learning-based methods has considerably advanced the field of protein structure prediction. The accuracy of predicting the 3D structures of simple proteins is comparable to that of experimentally determined structures, providing broad possibilities for structure-based biological studies. Another critical question is whether and how multistate structures can be predicted from a given protein sequence. In this study, analysis of tens of two-state proteins demonstrated that deep learning-based contact map predictions contain structural information on both states, which suggests that it is probably appropriate to change the target of deep learning-based protein structure prediction from one specific structure to multiple likely structures. Furthermore, by combining deep learning- and physics-based computational methods, we developed a protocol for exploring alternative conformations from a known structure of a given protein, by which we successfully approached the holo-state conformations of multiple representative proteins from their apo-state structures.
Affiliation(s)
- Jiaxuan Li
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lei Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Zefeng Zhu
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chen Song
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
|
13
|
Polonsky K, Pupko T, Freund NT. Evaluation of the Ability of AlphaFold to Predict the Three-Dimensional Structures of Antibodies and Epitopes. J Immunol 2023; 211:1578-1588. [PMID: 37782047 DOI: 10.4049/jimmunol.2300150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/06/2023] [Indexed: 10/03/2023]
Abstract
Being able to accurately predict the three-dimensional structure of an Ab can facilitate Ab characterization and epitope prediction, with important diagnostic and clinical implications. In this study, we evaluated the ability of AlphaFold to predict the structures of 222 recently published, high-resolution Fab H and L chain structures of Abs from different species directed against different Ags. We show that although the overall Ab prediction quality is in line with the results of CASP14, regions such as the complementarity-determining regions (CDRs) of the H chain, which are prone to higher variation, are predicted less accurately. Moreover, we discovered that AlphaFold mispredicts the bending angles between the variable and constant domains. To evaluate the ability of AlphaFold to model Ab-Ag interactions based only on sequence, we used AlphaFold-Multimer in combination with ZDOCK to predict the structures of 26 known Ab-Ag complexes. ZDOCK, which was applied on bound components of both the Ab and the Ag, succeeded in assembling 11 complexes, whereas AlphaFold succeeded in predicting only 2 of 26 models, with significant deviations in the docking contacts predicted in the rest of the molecules. Within the 11 complexes that were successfully predicted by ZDOCK, 9 involved short-peptide Ags (18-mer or less), whereas only 2 were complexes of Ab with a full-length protein. Docking of modeled unbound Ab and Ag was unsuccessful. In summary, our study provides important information about the abilities and limitations of using AlphaFold to predict Ab-Ag interactions and suggests areas for possible improvement.
Affiliation(s)
- Ksenia Polonsky
  - Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
  - Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Natalia T Freund
  - Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
14
|
Mostofian B, Martin HJ, Razavi A, Patel S, Allen B, Sherman W, Izaguirre JA. Targeted Protein Degradation: Advances, Challenges, and Prospects for Computational Methods. J Chem Inf Model 2023; 63:5408-5432. [PMID: 37602861 PMCID: PMC10498452 DOI: 10.1021/acs.jcim.3c00603] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/27/2023] [Indexed: 08/22/2023]
Abstract
The therapeutic approach of targeted protein degradation (TPD) is gaining momentum due to its potentially superior effects compared with protein inhibition. Recent advancements in the biotech and pharmaceutical sectors have led to the development of compounds that are currently in human trials, with some showing promising clinical results. However, the use of computational tools in TPD is still limited, as it has distinct characteristics compared with traditional computational drug design methods. TPD involves creating a ternary structure (protein-degrader-ligase) responsible for the biological function, such as ubiquitination and subsequent proteasomal degradation, which depends on the spatial orientation of the protein of interest (POI) relative to E2-loaded ubiquitin. Modeling this structure necessitates a unique blend of tools initially developed for small molecules (e.g., docking) and biologics (e.g., protein-protein interaction modeling). Additionally, degrader molecules, particularly heterobifunctional degraders, are generally larger than conventional small molecule drugs, leading to challenges in determining drug-like properties like solubility and permeability. Furthermore, the catalytic nature of TPD makes occupancy-based modeling insufficient. TPD consists of multiple interconnected yet distinct steps, such as POI binding, E3 ligase binding, ternary structure interactions, ubiquitination, and degradation, along with traditional small molecule properties. A comprehensive set of tools is needed to address the dynamic nature of the induced proximity ternary complex and its implications for ubiquitination. In this Perspective, we discuss the current state of computational tools for TPD. We start by describing the series of steps involved in the degradation process and the experimental methods used to characterize them. Then, we delve into a detailed analysis of the computational tools employed in TPD. We also present an integrative approach that has proven successful for degrader design and its impact on project decisions. Finally, we examine the future prospects of computational methods in TPD and the areas with the greatest potential for impact.
Affiliation(s)
- Barmak Mostofian
  - OpenEye, Cadence Molecular Sciences, Boston, Massachusetts 02114, United States
- Holli-Joi Martin
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Asghar Razavi
  - ENKO Chem, Inc, Mystic, Connecticut 06355, United States
- Shivam Patel
  - Psivant Therapeutics, Boston, Massachusetts 02210, United States
- Bryce Allen
  - Differentiated Therapeutics, San Diego, California 92056, United States
- Woody Sherman
  - Psivant Therapeutics, Boston, Massachusetts 02210, United States
- Jesus A Izaguirre
  - Differentiated Therapeutics, San Diego, California 92056, United States
  - Atommap Corporation, New York, New York 10013, United States
|
15
|
Mani H, Chang CC, Hsu HJ, Yang CH, Yen JH, Liou JW. Comparison, Analysis, and Molecular Dynamics Simulations of Structures of a Viral Protein Modeled Using Various Computational Tools. Bioengineering (Basel) 2023; 10:1004. [PMID: 37760106 PMCID: PMC10525864 DOI: 10.3390/bioengineering10091004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/16/2023] [Accepted: 08/22/2023] [Indexed: 09/29/2023] Open
Abstract
The structural analysis of proteins is a major domain of biomedical research. Such analysis requires resolved three-dimensional structures of proteins. Advancements in computer technology have led to progress in biomedical research. In silico prediction and modeling approaches have facilitated the construction of protein structures, with or without structural templates. In this study, we used three neural network-based de novo modeling approaches, namely AlphaFold2 (AF2), Robetta-RoseTTAFold (Robetta), and transform-restrained Rosetta (trRosetta), and two template-based tools, namely the Molecular Operating Environment (MOE) and iterative threading assembly refinement (I-TASSER), to construct the structure of a viral capsid protein, hepatitis C virus core protein (HCVcp), whose structure has not been fully resolved by laboratory techniques. Templates with sufficient sequence identity for the homology modeling of complete HCVcp are currently unavailable. Therefore, we performed domain-based homology modeling for MOE simulations. The templates for each domain were obtained through sequence-based searches on NCBI and the Protein Data Bank. Then, the modeled domains were assembled to construct the complete structure of HCVcp. The full-length structure and two truncated forms modeled using various computational tools were compared. Molecular dynamics (MD) simulations were performed to refine the structures. The root mean square deviation of backbone atoms, root mean square fluctuation of Cα atoms, and radius of gyration were calculated to monitor structural changes and convergence in the simulations. The model quality was evaluated through ERRAT and phi-psi plot analysis. In terms of the initial prediction for protein modeling, Robetta and trRosetta outperformed AF2. Regarding template-based tools, MOE outperformed I-TASSER. MD simulations resulted in compactly folded protein structures, which were of good quality and theoretically accurate. Thus, the predicted structures of certain proteins must be refined to obtain reliable structural models. MD simulation is a promising tool for this purpose.
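The two coordinate-based metrics named in this abstract, root mean square deviation and radius of gyration, can be illustrated with a minimal sketch. It assumes the two coordinate sets are already superimposed and treats all atoms as having equal mass. The function names and the toy coordinates are hypothetical and are not taken from the cited study, which would have used an MD analysis package.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized coordinate sets.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must be non-empty and equally sized")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords):
    """Unweighted radius of gyration (every atom treated as equal mass)."""
    if not coords:
        raise ValueError("coordinate set must be non-empty")
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords)
    return math.sqrt(squared / n)


# Toy coordinates for illustration only (three atoms on a line).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]

print(rmsd(reference, frame))
print(radius_of_gyration(reference))
```

Tracking both values frame by frame along a trajectory is one common way to judge whether a simulation has converged to a compact, stable structure.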
Affiliation(s)
- Hemalatha Mani
  - Institute of Medical Sciences, Tzu Chi University, Hualien 97004, Taiwan
- Chun-Chun Chang
  - Department of Laboratory Medicine, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien 97004, Taiwan
  - Department of Laboratory Medicine and Biotechnology, Tzu Chi University, Hualien 97004, Taiwan
- Hao-Jen Hsu
  - Department of Biomedical Sciences and Engineering, Tzu Chi University, Hualien 97004, Taiwan
- Chin-Hao Yang
  - Department of Biochemistry, School of Medicine, Tzu Chi University, Hualien 97004, Taiwan
- Jui-Hung Yen
  - Department of Molecular Biology and Human Genetics, Tzu Chi University, Hualien 97004, Taiwan
- Je-Wen Liou
  - Institute of Medical Sciences, Tzu Chi University, Hualien 97004, Taiwan
  - Department of Laboratory Medicine and Biotechnology, Tzu Chi University, Hualien 97004, Taiwan
  - Department of Biochemistry, School of Medicine, Tzu Chi University, Hualien 97004, Taiwan
|
16
|
Begum MN, Mahtarin R, Islam MT, Ahmed S, Konika TK, Mannoor K, Akhteruzzaman S, Qadri F. Molecular investigation of TSHR gene in Bangladeshi congenital hypothyroid patients. PLoS One 2023; 18:e0282553. [PMID: 37561783 PMCID: PMC10414570 DOI: 10.1371/journal.pone.0282553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/11/2023] [Indexed: 08/12/2023] Open
Abstract
The disorder of thyroid gland development or thyroid dysgenesis accounts for 80-85% of congenital hypothyroidism (CH) cases. Mutations in the TSHR gene are mostly associated with thyroid dysgenesis, and prevent or disrupt normal development of the gland. There is limited data available on the genetic spectrum of congenital hypothyroid children in Bangladesh. Thus, an understanding of the molecular aetiology of thyroid dysgenesis is a prerequisite. The aim of the study was to investigate the effect of mutations in the TSHR gene on the small molecule thyrogenic drug-binding site of the protein. We identified two nonsynonymous mutations (p.Ser508Leu, p.Glu727Asp) in exon 10 of the TSHR gene in 21 patients with dysgenesis by sequencing-based analysis. Later, the TSHR368-764 protein was modeled by the I-TASSER server for wild-type and mutant structures. The model proteins were targeted by thyrogenic drugs, MS437 and MS438, to perceive the effect of mutations. The damaging effect in drug-protein complexes of mutants was explored by molecular docking and molecular dynamics simulations. The binding affinity of wild-type protein was much higher than the mutant cases for both of the drug ligands (MS437 and MS438). Molecular dynamics simulates the dynamic behavior of wild-type and mutant complexes. MS437-TSHR368-764MT2 and MS438-TSHR368-764MT1 showed stable conformations in biological environments. Finally, Principal Component Analysis revealed structural and energy profile discrepancies. TSHR368-764MT1 exhibited much more variations than TSHR368-764WT and TSHR368-764MT2, emphasizing a more damaging pattern in TSHR368-764MT1. This genetic study might be helpful to explore the mutational impact on drug binding sites of TSHR protein which is important for future drug design and selection for the treatment of congenital hypothyroid children with dysgenesis.
Affiliation(s)
- Mst. Noorjahan Begum
  - Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
  - Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
  - Virology Laboratory, Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka, Bangladesh
- Rumana Mahtarin
  - Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
  - Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
- Md. Tarikul Islam
  - Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Sinthyia Ahmed
  - Division of Computer Aided Drug Design, The Red-Green Research Centre, BICCB, Tejgaon, Dhaka, Bangladesh
- Tasnia Kawsar Konika
  - Nuclear Medicine and Allied Sciences, Bangabandhu Sheikh Mujib Medical University (BSMMU), Shahbag, Dhaka, Bangladesh
- Kaiissar Mannoor
  - Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Sharif Akhteruzzaman
  - Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
- Firdausi Qadri
  - Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
  - Mucosal Immunology and Vaccinology, Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka, Bangladesh
|
17
|
Appadurai R, Koneru JK, Bonomi M, Robustelli P, Srivastava A. Clustering Heterogeneous Conformational Ensembles of Intrinsically Disordered Proteins with t-Distributed Stochastic Neighbor Embedding. J Chem Theory Comput 2023; 19:4711-4727. [PMID: 37338049 PMCID: PMC11108026 DOI: 10.1021/acs.jctc.3c00224] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/21/2023]
Abstract
Intrinsically disordered proteins (IDPs) populate a range of conformations that are best described by a heterogeneous ensemble. Grouping an IDP ensemble into "structurally similar" clusters for visualization, interpretation, and analysis purposes is a much-desired but formidable task, as the conformational space of IDPs is inherently high-dimensional and reduction techniques often result in ambiguous classifications. Here, we employ the t-distributed stochastic neighbor embedding (t-SNE) technique to generate homogeneous clusters of IDP conformations from the full heterogeneous ensemble. We illustrate the utility of t-SNE by clustering conformations of two disordered proteins, Aβ42, and α-synuclein, in their APO states and when bound to small molecule ligands. Our results shed light on ordered substates within disordered ensembles and provide structural and mechanistic insights into binding modes that confer specificity and affinity in IDP ligand binding. t-SNE projections preserve the local neighborhood information, provide interpretable visualizations of the conformational heterogeneity within each ensemble, and enable the quantification of cluster populations and their relative shifts upon ligand binding. Our approach provides a new framework for detailed investigations of the thermodynamics and kinetics of IDP ligand binding and will aid rational drug design for IDPs.
Affiliation(s)
- Rajeswari Appadurai
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Massimiliano Bonomi
  - Structural Bioinformatics Unit, Department of Structural Biology and Chemistry. CNRS UMR 3528, C3BI, CNRS USR 3756, Institut Pasteur, Paris, France
- Paul Robustelli
  - Dartmouth College, Department of Chemistry, Hanover, NH, 03755, USA
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
|
18
|
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902 DOI: 10.1016/j.jmb.2023.168155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-weighted scoring matrices (PSSM). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high quality structures in predicting the effect of mutations in deep mutation scanning (DMS) experiments and that for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutation in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in such proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
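The idea that an MSA reveals which amino acids are tolerated at each site can be illustrated with a minimal sketch that computes per-column residue frequencies from a toy alignment. This is a plain frequency profile, simpler than a log-odds scoring matrix, and it is not EvoRator2's method, which predicts such profiles from structure. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def column_frequencies(alignment):
    """Per-column amino acid frequencies for an aligned set of sequences.

    Gap characters ('-') are ignored when counting. Returns one dict per
    column mapping residue letter to its observed frequency.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Toy alignment for illustration only.
toy_msa = ["MKV", "MRV", "MK-", "MKI"]
profile = column_frequencies(toy_msa)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

A fully conserved column (here the first one) suggests that substitutions at that site are poorly tolerated, while a mixed column points to more permissive positions. With few or no homologs the columns are too sparse to be informative, which is the gap the abstract describes.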
Affiliation(s)
- Natan Nagar
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
  - School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
19
|
Adhav V, Saikrishnan K. The Realm of Unconventional Noncovalent Interactions in Proteins: Their Significance in Structure and Function. ACS Omega 2023; 8:22268-22284. [PMID: 37396257 PMCID: PMC10308531 DOI: 10.1021/acsomega.3c00205] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/11/2023] [Accepted: 05/22/2023] [Indexed: 07/04/2023]
Abstract
Proteins and their assemblies are fundamental for living cells to function. Their complex three-dimensional architecture and its stability are attributed to the combined effect of various noncovalent interactions. It is critical to scrutinize these noncovalent interactions to understand their role in the energy landscape in folding, catalysis, and molecular recognition. This Review presents a comprehensive summary of unconventional noncovalent interactions, beyond conventional hydrogen bonds and hydrophobic interactions, which have gained prominence over the past decade. The noncovalent interactions discussed include low-barrier hydrogen bonds, C5 hydrogen bonds, C-H···π interactions, sulfur-mediated hydrogen bonds, n → π* interactions, London dispersion interactions, halogen bonds, chalcogen bonds, and tetrel bonds. This Review focuses on their chemical nature, interaction strength, and geometrical parameters obtained from X-ray crystallography, spectroscopy, bioinformatics, and computational chemistry. Also highlighted are their occurrence in proteins or their complexes and recent advances made toward understanding their role in biomolecular structure and function. Probing the chemical diversity of these interactions, we determined that the variable frequency of occurrence in proteins and the ability to synergize with one another are important not only for ab initio structure prediction but also to design proteins with new functionalities. A better understanding of these interactions will promote their utilization in designing and engineering ligands with potential therapeutic value.
Affiliation(s)
- Vishal Annasaheb Adhav
  - Department of Biology, Indian Institute of Science Education and Research, Pune 411008, India
- Kayarat Saikrishnan
  - Department of Biology, Indian Institute of Science Education and Research, Pune 411008, India
|
20
|
Abbas U, Chen J, Shao Q. Assessing Fairness of AlphaFold2 Prediction of Protein 3D Structures. bioRxiv 2023:2023.05.23.542006. [PMID: 37293014 PMCID: PMC10245900 DOI: 10.1101/2023.05.23.542006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
AlphaFold2 is reshaping biomedical research by enabling the prediction of a protein's 3D structure solely based on its amino acid sequence. This breakthrough reduces reliance on labor-intensive experimental methods traditionally used to obtain protein structures, thereby accelerating the pace of scientific discovery. Despite the bright future, it remains unclear whether AlphaFold2 can uniformly predict the wide spectrum of proteins equally well. Systematic investigation into the fairness and unbiased nature of its predictions is still an area yet to be thoroughly explored. In this paper, we conducted an in-depth analysis of AlphaFold2's fairness using data comprised of five million reported protein structures from its open-access repository. Specifically, we assessed the variability in the distribution of pLDDT scores, considering factors such as amino acid type, secondary structure, and sequence length. Our findings reveal a systematic discrepancy in AlphaFold2's predictive reliability, varying across different types of amino acids and secondary structures. Furthermore, we observed that the size of the protein exerts a notable impact on the credibility of the 3D structural prediction. AlphaFold2 demonstrates enhanced prediction power for proteins of medium size compared to those that are either smaller or larger. These systematic biases could potentially stem from inherent biases present in its training data and model architecture. These factors need to be taken into account when expanding the applicability of AlphaFold2.
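The kind of comparison described here, per-residue confidence broken down by amino acid type, can be sketched as a simple grouping step. The sketch assumes per-residue confidence scores have already been extracted as (residue name, score) pairs; the function name and the toy values are hypothetical and do not reproduce the cited analysis or its data.

```python
from collections import defaultdict
from statistics import mean


def mean_plddt_by_residue(records):
    """Average confidence score for each residue type.

    `records` is an iterable of (residue_name, plddt) pairs.
    """
    grouped = defaultdict(list)
    for residue_name, plddt in records:
        grouped[residue_name].append(plddt)
    return {name: mean(scores) for name, scores in grouped.items()}


# Toy values for illustration only, not real model output.
toy_records = [
    ("ALA", 90.0),
    ("GLY", 60.0),
    ("ALA", 80.0),
    ("GLY", 70.0),
    ("TRP", 95.0),
]

summary = mean_plddt_by_residue(toy_records)
for name in sorted(summary):
    print(name, summary[name])
```

The same grouping works for secondary-structure class or sequence-length bin by swapping the key. Comparing full distributions rather than means alone would be closer to the variability analysis the abstract describes.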
Affiliation(s)
- Usman Abbas
  - Chemical & Materials Engineering, University of Kentucky, Lexington, Kentucky, USA
- Jin Chen
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, Kentucky, USA
- Qing Shao
  - Chemical & Materials Engineering, University of Kentucky, Lexington, Kentucky, USA
|
21
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
  - VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
  - Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
  - Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
|
22
|
Roth MG, Westrick NM, Baldwin TT. Fungal biotechnology: From yesterday to tomorrow. Front Fungal Biol 2023; 4:1135263. [PMID: 37746125 PMCID: PMC10512358 DOI: 10.3389/ffunb.2023.1135263] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/31/2022] [Accepted: 03/07/2023] [Indexed: 09/26/2023]
Abstract
Fungi have been used to better the lives of everyday people and unravel the mysteries of higher eukaryotic organisms for decades. However, comparing progress and development stemming from fungal research to that of human, plant, and bacterial research, fungi remain largely understudied and underutilized. Recent commercial ventures have begun to gain popularity in society, providing a new surge of interest in fungi, mycelia, and potential new applications of these organisms to various aspects of research. Biotechnological advancements in fungal research cannot occur without intensive amounts of time, investments, and research tool development. In this review, we highlight past breakthroughs in fungal biotechnology, discuss requirements to advance fungal biotechnology even further, and touch on the horizon of new breakthroughs with the highest potential to positively impact both research and society.
Affiliation(s)
- Mitchell G. Roth
  - Department of Plant Pathology, The Ohio State University, Wooster, OH, United States
- Nathaniel M. Westrick
  - Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, United States
- Thomas T. Baldwin
  - Department of Plant Pathology, North Dakota State University, Fargo, ND, United States
|
23
|
Dou Z, Sun Y, Jiang X, Wu X, Li Y, Gong B, Wang L. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochim Biophys Sin (Shanghai) 2023; 55:343-355. [PMID: 37143326 PMCID: PMC10160227 DOI: 10.3724/abbs.2023033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 11/23/2022] [Indexed: 03/05/2023] Open
Abstract
Thermal stability is one of the most important properties of enzymes, which sustains life and determines the potential for the industrial application of biocatalysts. Although traditional methods such as directed evolution and classical rational design contribute greatly to this field, the enormous sequence space of proteins implies costly and arduous experiments. The development of enzyme engineering focuses on automated and efficient strategies because of the breakthrough of high-throughput DNA sequencing and machine learning models. In this review, we propose a data-driven architecture for enzyme thermostability engineering and summarize some widely adopted datasets, as well as machine learning-driven approaches for designing the thermal stability of enzymes. In addition, we present a series of existing challenges while applying machine learning in enzyme thermostability design, such as the data dilemma, model training, and use of the proposed models. Additionally, a few promising directions for enhancing the performance of the models are discussed. We anticipate that the efficient incorporation of machine learning can provide more insights and solutions for the design of enzyme thermostability in the coming years.
Affiliation(s)
- Zhixin Dou
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yuqing Sun
- School of Software, Shandong University, Jinan 250101, China
- Xukai Jiang
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
- Xiuyun Wu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yingjie Li
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Bin Gong
- School of Software, Shandong University, Jinan 250101, China
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
24
|
Sicard J, Barbe S, Boutrou R, Bouvier L, Delaplace G, Lashermes G, Théron L, Vitrac O, Tonda A. A primer on predictive techniques for food and bioresources transformation processes. J Food Process Eng 2023. [DOI: 10.1111/jfpe.14325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Affiliation(s)
- Laurent Bouvier
- UMET, Université de Lille, CNRS, Centrale Lille, INRAE, Villeneuve‐D'Ascq, France
- Guillaume Delaplace
- UMET, Université de Lille, CNRS, Centrale Lille, INRAE, Villeneuve‐D'Ascq, France
- Olivier Vitrac
- SayFood, INRAE, AgroParisTech, Université Paris Saclay, Massy, France
- Alberto Tonda
- MIA‐Paris, AgroParisTech, INRAE, Université Paris Saclay, Paris, France
25
|
Buehler MJ. Unsupervised cross-domain translation via deep learning and adversarial attention neural networks and application to music-inspired protein designs. Patterns (New York, N.Y.) 2023; 4:100692. [PMID: 36960446 PMCID: PMC10028431 DOI: 10.1016/j.patter.2023.100692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 01/02/2023] [Accepted: 01/24/2023] [Indexed: 02/16/2023]
Abstract
Taking inspiration from nature about how to design materials has been a fruitful approach, used by humans for millennia. In this paper we report a method that allows us to discover how patterns in disparate domains can be reversibly related using a computationally rigorous approach, the AttentionCrossTranslation model. The algorithm discovers cycle- and self-consistent relationships and offers a bidirectional translation of information across disparate knowledge domains. The approach is validated with a set of known translation problems, and then used to discover a mapping between musical data (based on the corpus of note sequences in J.S. Bach's Goldberg Variations, created in 1741) and protein sequence data (information sampled more recently). Using protein folding algorithms, 3D structures of the predicted protein sequences are generated, and their stability is validated using explicit solvent molecular dynamics. Musical scores generated from protein sequences are sonified and rendered into audible sound.
Affiliation(s)
- Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Corresponding author
26
|
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
Affiliation(s)
- Daniel L Baker
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
- Abby L Parrill
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
27
|
Gogoi CR, Rahman A, Saikia B, Baruah A. Protein Dihedral Angle Prediction: The State of the Art. ChemistrySelect 2023. [DOI: 10.1002/slct.202203427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
28
|
Banerjee A, Saha S, Tvedt NC, Yang LW, Bahar I. Mutually beneficial confluence of structure-based modeling of protein dynamics and machine learning methods. Curr Opin Struct Biol 2023; 78:102517. [PMID: 36587424 PMCID: PMC10038760 DOI: 10.1016/j.sbi.2022.102517] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/18/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/31/2022]
Abstract
Proteins sample an ensemble of conformers under physiological conditions, having access to a spectrum of modes of motions, also called intrinsic dynamics. These motions ensure the adaptation to various interactions in the cell, and largely assist in, if not determine, viable mechanisms of biological function. In recent years, machine learning frameworks have proven uniquely useful in structural biology, and recent studies further provide evidence to the utility and/or necessity of considering intrinsic dynamics for increasing their predictive ability. Efficient quantification of dynamics-based attributes by recently developed physics-based theories and models such as elastic network models provides a unique opportunity to generate data on dynamics for training ML models towards inferring mechanisms of protein function, assessing pathogenicity, or estimating binding affinities.
Affiliation(s)
- Anupam Banerjee
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Satyaki Saha
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Nathan C Tvedt
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA; Computational and Applied Mathematics and Statistics, The College of William and Mary, Williamsburg, VA 23185, USA
- Lee-Wei Yang
- Institute of Bioinformatics and Structural Biology, and PhD Program in Biomedical Artificial Intelligence, National Tsing Hua University, Hsinchu 300044, Taiwan; Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
- Ivet Bahar
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
29
|
Pearce R, Huang X, Omenn GS, Zhang Y. De novo protein fold design through sequence-independent fragment assembly simulations. Proc Natl Acad Sci U S A 2023; 120:e2208275120. [PMID: 36656852 PMCID: PMC9942881 DOI: 10.1073/pnas.2208275120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/17/2022] [Accepted: 12/22/2022] [Indexed: 01/20/2023] Open
Abstract
De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.
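The replica-exchange Monte Carlo (REMC) search named in this abstract can be illustrated with a minimal sketch. It assumes the textbook parallel-tempering swap rule, in which two neighbouring replicas at inverse temperatures beta_i and beta_j exchange configurations with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). The function names and toy energies are hypothetical and are not taken from FoldDesign, whose energy function and auxiliary moves are not described here.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Cap the exponent at 0 so the result never exceeds 1 and exp cannot overflow.
    return math.exp(min(0.0, delta))


def attempt_swaps(betas, energies, rng):
    """Try to exchange each pair of neighbouring replicas once.

    betas    -- inverse temperatures, one per replica (illustrative values)
    energies -- energy of the configuration currently held by each replica
    rng      -- a random.Random instance, passed in for reproducibility
    Returns a new list of energies reflecting the accepted exchanges.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies
```

In this sketch a swap that hands the lower-energy configuration to the colder replica is always accepted, while the reverse swap is accepted only with an exponentially small probability; that asymmetry is what lets high-temperature replicas feed good candidates down the temperature ladder.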
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- School of Public Health, University of Michigan, Ann Arbor, MI 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109
- Department of Computer Science, School of Computing, National University of Singapore, 117417, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
30
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035, India
- Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu, India
31
|
Miller NL, Clark T, Raman R, Sasisekharan R. Learned features of antibody-antigen binding affinity. Front Mol Biosci 2023; 10:1112738. [PMID: 36895805 PMCID: PMC9989197 DOI: 10.3389/fmolb.2023.1112738] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Accepted: 01/18/2023] [Indexed: 02/23/2023] Open
Abstract
Defining predictors of antigen-binding affinity of antibodies is valuable for engineering therapeutic antibodies with high binding affinity to their targets. However, this task is challenging owing to the huge diversity in the conformations of the complementarity determining regions of antibodies and the mode of engagement between antibody and antigen. In this study, we used the structural antibody database (SAbDab) to identify features that can discriminate high- and low-binding affinity across a 5-log scale. First, we abstracted features based on previously learned representations of protein-protein interactions to derive 'complex' feature sets, which include energetic, statistical, network-based, and machine-learned features. Second, we contrasted these complex feature sets with additional 'simple' feature sets based on counts of contacts between antibody and antigen. By investigating the predictive potential of 700 features contained in the eight complex and simple feature sets, we observed that simple feature sets perform comparably to complex feature sets in classification of binding affinity. Moreover, combining features from all eight feature-sets provided the best classification performance (median cross-validation AUROC and F1-score of 0.72). Of note, classification performance is substantially improved when several sources of data leakage (e.g., homologous antibodies) are not removed from the dataset, emphasizing a potential pitfall in this task. We additionally observe a classification performance plateau across diverse featurization approaches, highlighting the need for additional affinity-labeled antibody-antigen structural data. The findings from our present study set the stage for future studies aimed at multiple-log enhancement of antibody affinity through feature-guided engineering.
Affiliation(s)
- Nathaniel L Miller
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Thomas Clark
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Rahul Raman
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Ram Sasisekharan
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
32
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
33
|
George DM, Ramadoss R, Mackey HR, Vincent AS. Comparative computational study to augment UbiA prenyltransferases inherent in purple photosynthetic bacteria cultured from mangrove microbial mats in Qatar for coenzyme Q10 biosynthesis. Biotechnology Reports (Amsterdam, Netherlands) 2022; 36:e00775. [PMID: 36404947 PMCID: PMC9672418 DOI: 10.1016/j.btre.2022.e00775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 10/31/2022] [Accepted: 11/11/2022] [Indexed: 11/15/2022]
Abstract
Coenzyme Q10 (CoQ10) is a powerful antioxidant with a myriad of applications in healthcare and cosmetic industries. The most effective route of CoQ10 production is microbial biosynthesis. In this study, four CoQ10 biosynthesizing purple photosynthetic bacteria: Rhodobacter blasticus, Rhodovulum adriaticum, Afifella pfennigii and Rhodovulum marinum, were identified using 16S rRNA sequencing of enriched microbial mat samples obtained from Purple Island mangroves (Qatar). The membrane bound enzyme 4-hydroxybenzoate octaprenyltransferase (UbiA) is pivotal for bacterial biosynthesis of CoQ10. The identified bacteria could be inducted as efficient industrial bio-synthesizers of CoQ10 by engineering their UbiA enzymes. Therefore, the mutation sites and substitution residues for potential functional enhancement were determined by comparative computational study. Two mutation sites were identified within the two conserved Asp-rich motifs, and the effect of proposed mutations in substrate binding affinity of the UbiA enzymes was assessed using multiple ligand simultaneous docking (MLSD) studies, as a groundwork for experimental studies.
Affiliation(s)
- Drishya M. George
- College of Health and Life Sciences, Hamad bin Khalifa University, Qatar Foundation, Doha, Qatar
- Ramya Ramadoss
- Biological Sciences, Carnegie Mellon University Qatar, Doha, Qatar
- Hamish R. Mackey
- College of Health and Life Sciences, Hamad bin Khalifa University, Qatar Foundation, Doha, Qatar
- Division of Sustainable Development, College of Science and Engineering, Hamad bin Khalifa University, Qatar Foundation, Doha, Qatar
34
|
Buscajoni L, Martinetz MC, Berkemeyer M, Brocard C. Refolding in the modern biopharmaceutical industry. Biotechnol Adv 2022; 61:108050. [PMID: 36252795 DOI: 10.1016/j.biotechadv.2022.108050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 10/07/2022] [Accepted: 10/11/2022] [Indexed: 11/02/2022]
Abstract
Inclusion bodies (IBs) often emerge upon overexpression of recombinant proteins in E. coli. From IBs, refolding is necessary to generate the native protein that can be further purified to obtain pure and active biologicals. This work focusses on refolding as a significant process step during biopharmaceutical manufacturing with an industrial perspective. A theoretical and historical background on protein refolding gives the reader a starting point for further insights into industrial process development. Quality requirements on IBs as starting material for refolding are discussed and further economic and ecological aspects are considered with regards to buffer systems and refolding conditions. A process development roadmap shows the development of a refolding process starting from first exploratory screening rounds to scale-up and implementation in manufacturing plant. Different aspects, with a direct influence on yield, such as the selection of chemicals including pH, ionic strength, additives, etc., and other often neglected aspects, important during scale-up, such as mixing, and gas-fluid interaction, are highlighted with the use of a quality by design (QbD) approach. The benefits of simulation sciences (process simulation and computer fluid dynamics) and process analytical technology (PAT) for seamless process development are emphasized. The work concludes with an outlook on future applications of refolding and highlights open research inquiries.
Affiliation(s)
- Luisa Buscajoni
- Boehringer-Ingelheim RCV GmbH & Co KG, Biopharma Austria, Process Science Downstream Development, Dr. Boehringer-Gasse 5-11, 1120 Vienna, Austria
- Michael C Martinetz
- Boehringer-Ingelheim RCV GmbH & Co KG, Biopharma Austria, Process Science Downstream Development, Dr. Boehringer-Gasse 5-11, 1120 Vienna, Austria
- Matthias Berkemeyer
- Boehringer-Ingelheim RCV GmbH & Co KG, Biopharma Austria, Process Science Downstream Development, Dr. Boehringer-Gasse 5-11, 1120 Vienna, Austria
- Cécile Brocard
- Boehringer-Ingelheim RCV GmbH & Co KG, Biopharma Austria, Process Science Downstream Development, Dr. Boehringer-Gasse 5-11, 1120 Vienna, Austria
35
|
Protein structure prediction in the deep learning era. Curr Opin Struct Biol 2022; 77:102495. [PMID: 36371845 DOI: 10.1016/j.sbi.2022.102495] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/11/2022] [Revised: 10/03/2022] [Accepted: 10/04/2022] [Indexed: 11/11/2022]
Abstract
Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods in the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach can achieve accuracy similar to that of the state-of-the-art end-to-end approach, AlphaFold2. Compared to the end-to-end approach, the two-step approach requires fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, a few outstanding challenges in function-oriented protein structure prediction are pointed out for future development.
36
|
Liu H, Chen Q. Computational protein design with data‐driven approaches: Recent developments and perspectives. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Affiliation(s)
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui, China
- Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
37
|
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
38
|
Pearce R, Li Y, Omenn GS, Zhang Y. Fast and accurate ab initio protein structure prediction using deep learning potentials. PLoS Comput Biol 2022; 18:e1010539. [PMID: 36112717 PMCID: PMC9518900 DOI: 10.1371/journal.pcbi.1010539] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/29/2021] [Revised: 09/28/2022] [Accepted: 09/03/2022] [Indexed: 01/05/2023] Open
Abstract
Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.
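The TM-score used as the accuracy metric in this abstract can be made concrete with a short sketch. It assumes the commonly cited definition, TM = (1/L) * sum over aligned residues of 1 / (1 + (d_i / d0)^2) with d0 = 1.24 * (L - 15)^(1/3) - 1.8, evaluated for one fixed superposition; the full metric takes the maximum over superpositions, which this sketch does not attempt. The function name, the input format, and the 0.5 Å floor on d0 for short chains are assumptions made for illustration and are not taken from DeepFold.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition of a model onto a target.

    distances     -- distance in angstroms between each aligned residue pair
    target_length -- number of residues in the target structure
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: avoid the cube root of a non-positive number
    d0 = max(d0, 0.5)  # assumption: floor d0 so short chains stay well defined
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalising by the target length penalises models that cover only part of it.
    return total / target_length
```

Because each residue contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and the length-dependent d0 is what makes values comparable across proteins of different sizes.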
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Departments of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
39
|
Zhu GY, Liu Y, Wang PH, Yang X, Yu DJ. Learning Protein Embedding to Improve Protein Fold Recognition Using Deep Metric Learning. J Chem Inf Model 2022; 62:4283-4291. [PMID: 36017565 DOI: 10.1021/acs.jcim.2c00959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Protein fold recognition refers to predicting the most likely fold type of the query protein and is a critical step of protein structure and function prediction. With the popularity of deep learning in bioinformatics, protein fold recognition has obtained impressive progress. In this study, to extract the fold-specific feature to improve protein fold recognition, we proposed a unified deep metric learning framework based on a joint loss function, termed NPCFold. In addition, we also proposed an integrated machine learning model based on the similarity of proteins in various properties, termed NPCFoldpro. Benchmark experiments show both NPCFold and NPCFoldpro outperform existing protein fold recognition methods at the fold level, indicating that our proposed strategies of fusing loss functions and fusing features could improve the fold recognition level.
Affiliation(s)
- Guan-Yu Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, P. R. China
| | - Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, P. R. China
| | - Peng-Hao Wang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, P. R. China
| | - Xibei Yang
- School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, P. R. China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, P. R. China
| |
|
40
|
Computation-Aided Design of Albumin Affibody-Inserted Antibody Fragment for the Prolonged Serum Half-Life. Pharmaceutics 2022; 14:pharmaceutics14091769. [PMID: 36145517 PMCID: PMC9500697 DOI: 10.3390/pharmaceutics14091769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/17/2022] [Accepted: 08/23/2022] [Indexed: 11/16/2022] Open
Abstract
Single-chain variable fragments (scFvs) have been recognized as promising agents in cancer therapy. However, short serum half-life of scFvs often limits clinical application. Fusion to albumin affibody (ABD) is an effective and convenient half-life extension strategy. Although one terminus of scFv is available for fusion of ABD, it is also frequently used for fusion of useful moieties such as small functional proteins, cytokines, or antibodies. Herein, we investigated the internal linker region for ABD fusion instead of terminal region, which was rarely explored before. We constructed two internally ABD-inserted anti-HER2 4D5scFv (4D5-ABD) variants, which have short (4D5-S-ABD) and long (4D5-L-ABD) linker length respectively. The model structures of these 4D5scFv and 4D5-ABD variants predicted using the deep learning-based protein structure prediction program (AlphaFold2) revealed high similarity to either the original 4D5scFv or the ABD structure, implying that the functionality would be retained. Designed 4D5-ABD variants were expressed in the bacterial expression system and characterized. Both 4D5-ABD variants showed anti-HER2 binding affinity comparable with 4D5scFv. Binding affinity of both 4D5-ABD variants against albumin was also comparable. In a pharmacokinetic study in mice, the 4D5-ABD variants showed a significantly prolonged half-life of 34 h, 114 times longer than that of 4D5scFv. In conclusion, we have developed a versatile scFv platform with enhanced pharmacokinetic profiles with an aid of deep learning-based structure prediction.
|
41
|
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 128] [Impact Index Per Article: 64.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
|
42
|
Ben-Tal N, Kolodny R. Homologues not needed: Structure prediction from a protein language model. Structure 2022; 30:1047-1049. [PMID: 35931059 DOI: 10.1016/j.str.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Abstract
Accurate protein structure predictors use clusters of homologues, which disregard sequence specific effects. In this issue of Structure, Weißenow and colleagues report a deep learning-based tool, EMBER2, that efficiently predicts the distances in a protein structure from its amino acid sequence only. This approach should enable the analysis of mutation effects.
Affiliation(s)
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.
| | - Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel.
| |
|
43
|
Du H, Jiang D, Gao J, Zhang X, Jiang L, Zeng Y, Wu Z, Shen C, Xu L, Cao D, Hou T, Pan P. Proteome-Wide Profiling of the Covalent-Druggable Cysteines with a Structure-Based Deep Graph Learning Network. Research (Wash D C) 2022; 2022:9873564. [PMID: 35958111 PMCID: PMC9343084 DOI: 10.34133/2022/9873564] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/06/2022] [Accepted: 06/27/2022] [Indexed: 11/06/2022] Open
Abstract
Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in the protein. By integrating the characterization of the binding pocket and the interactions between each cysteine and the surrounding environment, DeepCoSI achieves state-of-the-art predictive performances. The validation on two external test sets which mimic the real application scenarios shows that DeepCoSI has strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on website.
Affiliation(s)
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lingxiao Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| |
|
44
|
Liu Z, Zhang R, Zhang W, Xu Y. Structure-based rational design of hydroxysteroid dehydrogenases for improving and diversifying steroid synthesis. Crit Rev Biotechnol 2022:1-17. [PMID: 35834355 DOI: 10.1080/07388551.2022.2054770] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/03/2022]
Abstract
A group of steroidogenic enzymes, hydroxysteroid dehydrogenases are involved in steroid metabolism which is very important in the cell: signaling, growth, reproduction, and energy homeostasis. The enzymes show an inherent function in the interconversion of ketosteroids and hydroxysteroids in a position- and stereospecific manner on the steroid nucleus and side-chains. However, the biocatalysis of steroids reaction is a vital and demanding, yet challenging, task to produce the desired enantiopure products with non-natural substrates or non-natural cofactors, and/or in non-physiological conditions. This has driven the use of protein design strategies to improve their inherent biosynthetic efficiency or activate their silent catalytic ability. In this review, the innate features and catalytic characteristics of enzymes based on sequence-structure-function relationships of steroidogenic enzymes are reviewed. Combining structure information and catalytic mechanisms, progress in protein redesign to stimulate potential function, for example, substrate specificity, cofactor dependence, and catalytic stability are discussed.
Affiliation(s)
- Zhiyong Liu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi, China
| | - Rongzhen Zhang
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi, China
| | - Wenchi Zhang
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi, China
| |
|
45
|
Whole-Genome Sequencing of a Potential Ester-Synthesizing Bacterium Isolated from Fermented Golden Pomfret and Identification of Its Lipase Encoding Genes. Foods 2022; 11:foods11131954. [PMID: 35804769 PMCID: PMC9266206 DOI: 10.3390/foods11131954] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/03/2022] [Revised: 06/20/2022] [Accepted: 06/25/2022] [Indexed: 12/17/2022] Open
Abstract
Microbial ester synthases are regarded as valuable catalysts in the food industry. Here, one strain of Acinetobacter venetianus with ester synthase-production capacity, SCSMX-3, was isolated from traditional fermented golden pomfret. It exhibited good growth in mesophilic, low salt, and slightly alkaline environments. The ester synthase produced by SCSMX-3 displayed maximum activity at pH 8.0 and 35 °C. Genome sequencing revealed that the strain contains one circular chromosome of 336313 bp and two circular plasmids (plasmid A-14424 bp and plasmid B-11249 bp). Six CRISPR structures enhance the genomic stability of SCSMX-3 and provide the opportunity to create new functional strains. Gene function analysis indicated that SCSMX-3 produces the necessary enzymes for survival under different conditions and for flavor substance synthesis. Furthermore, 49 genes encoding enzymes associated with lipid metabolism, including three triacylglycerol lipases and two esterases, were identified through the NCBI Non-Redundant Protein Database. The lipase encoded by gene0302 belongs to the GX group and the abH15.02 (Burkholderia cepacia lipase) homolog of the abH15 superfamily. Our results shed light on the genomic diversity of and lipid metabolism in A. venetianus isolated from fermented golden pomfret, laying a foundation for the exploration of new ester synthases to improve the flavor of fermented fish products.
|
46
|
Shi Z, Liu P, Liao X, Mao Z, Zhang J, Wang Q, Sun J, Ma H, Ma Y. Data-Driven Synthetic Cell Factories Development for Industrial Biomanufacturing. BIODESIGN RESEARCH 2022; 2022:9898461. [PMID: 37850146 PMCID: PMC10521697 DOI: 10.34133/2022/9898461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/26/2022] [Indexed: 10/19/2023] Open
Abstract
Revolutionary breakthroughs in artificial intelligence (AI) and machine learning (ML) have had a profound impact on a wide range of scientific disciplines, including the development of artificial cell factories for biomanufacturing. In this paper, we review the latest studies on the application of data-driven methods for the design of new proteins, pathways, and strains. We first briefly introduce the various types of data and databases relevant to industrial biomanufacturing, which are the basis for data-driven research. Different types of algorithms, including traditional ML and more recent deep learning methods, are also presented. We then demonstrate how these data-based approaches can be applied to address various issues in cell factory development using examples from recent studies, including the prediction of protein function, improvement of metabolic models, and estimation of missing kinetic parameters, design of non-natural biosynthesis pathways, and pathway optimization. In the last section, we discuss the current limitations of these data-driven approaches and propose that data-driven methods should be integrated with mechanistic models to complement each other and facilitate the development of synthetic strains for industrial biomanufacturing.
Affiliation(s)
- Zhenkun Shi
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Pi Liu
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Xiaoping Liao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Zhitao Mao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Jianqi Zhang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Qinhong Wang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Jibin Sun
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Hongwu Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Yanhe Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| |
|
47
|
Misiura M, Shroff R, Thyer R, Kolomeisky AB. DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins. Proteins 2022; 90:1278-1290. [PMID: 35122328 DOI: 10.1002/prot.26311] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/04/2021] [Revised: 12/03/2021] [Accepted: 12/07/2021] [Indexed: 12/20/2022]
Abstract
Prediction of side chain conformations of amino acids in proteins (also termed "packing") is an important and challenging part of protein structure prediction with many interesting applications in protein design. A variety of methods for packing have been developed but more accurate ones are still needed. Machine learning (ML) methods have recently become a powerful tool for solving various problems in diverse areas of science, including structural biology. In this study, we evaluate the potential of deep neural networks (DNNs) for prediction of amino acid side chain conformations. We formulate the problem as image-to-image transformation and train a U-net style DNN to solve the problem. We show that our method outperforms other physics-based methods by a significant margin: reconstruction RMSDs for most amino acids are about 20% smaller compared to SCWRL4 and Rosetta Packer with RMSDs for bulky hydrophobic amino acids Phe, Tyr, and Trp being up to 50% smaller.
Affiliation(s)
- Mikita Misiura
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
| | | | - Ross Thyer
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
| | - Anatoly B Kolomeisky
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Department of Physics and Astronomy, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
| |
|
48
|
Saldaño T, Escobedo N, Marchetti J, Zea DJ, Mac Donagh J, Velez Rueda AJ, Gonik E, García Melani A, Novomisky Nechcoff J, Salas MN, Peters T, Demitroff N, Fernandez Alberti S, Palopoli N, Fornasari MS, Parisi G. Impact of protein conformational diversity on AlphaFold predictions. Bioinformatics 2022; 38:2742-2748. [PMID: 35561203 DOI: 10.1093/bioinformatics/btac202] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 11/01/2021] [Revised: 02/10/2022] [Accepted: 03/31/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges the structural prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information to generate several diverse conformations from a single sequence. Here, we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm.
RESULTS: Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, being unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with the increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score plDDT, indicating that plDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions.
AVAILABILITY AND IMPLEMENTATION: Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tadeo Saldaño
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Nahuel Escobedo
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | | | - Juan Mac Donagh
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Ana Julia Velez Rueda
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Eduardo Gonik
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- INIFTA (CONICET-UNLP) - Fotoquímica y Nanomateriales para el Ambiente y la Biología (nanoFOT), La Plata, Argentina
| | | | | | - Martín N Salas
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
| | - Tomás Peters
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
| | - Nicolás Demitroff
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
| | - Sebastian Fernandez Alberti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| |
|
49
|
Talluri S. Algorithms for protein design. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India.
| |
|
50
|
Yang W, Liu Y, Xiao C. Deep metric learning for accurate protein secondary structure prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|