1
Hong L, Kortemme T. An integrative approach to protein sequence design through multiobjective optimization. PLoS Comput Biol 2024; 20:e1011953. [PMID: 38991035; DOI: 10.1371/journal.pcbi.1011953] [Received: 02/28/2024] [Accepted: 06/25/2024] [Indexed: 07/13/2024]
Abstract
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the foldswitching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
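NSGA-II ranks a population of candidate sequences by non-dominated sorting: front 0 holds the candidates that no other candidate beats on every objective, and that front is the running approximation of the Pareto front described above. A minimal sketch of only that sorting step, assuming every objective is to be minimized (for example, negated AlphaFold2 or ProteinMPNN confidence scores, one per design state) and that objective values are already computed. Crowding-distance tie-breaking, selection, and the ESM-1v/ProteinMPNN mutation operator are not reproduced, and the example objective values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into fronts; front 0 approximates the Pareto front."""
    n = len(objs)
    dominates_list: List[List[int]] = [[] for _ in range(n)]  # indices that i dominates
    dominated_count = [0] * n  # number of candidates that dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominates_list[i].append(j)
            elif dominates(objs[j], objs[i]):
                dominated_count[i] += 1
        if dominated_count[i] == 0:
            fronts[0].append(i)
    # Peel off fronts: removing front k frees candidates dominated only by it.
    k = 0
    while fronts[k]:
        next_front: List[int] = []
        for i in fronts[k]:
            for j in dominates_list[i]:
                dominated_count[j] -= 1
                if dominated_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical objective pairs, e.g. negated confidences for two design states.
    objs = [(1.0, 5.0), (2.0, 3.0), (3.0, 1.0), (2.0, 4.0), (4.0, 4.0)]
    print(fast_non_dominated_sort(objs))  # [[0, 1, 2], [3], [4]]
```

In the example, the first three points trade one objective against the other, so none dominates another and together they form front 0.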
Affiliation(s)
- Lu Hong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
- Quantitative Biosciences Institute, University of California, San Francisco, California, United States of America
- Chan Zuckerberg Biohub, San Francisco, California, United States of America
2
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287; DOI: 10.1016/j.compbiomed.2024.108815] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and in unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where scientists worldwide converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We also describe the end-to-end and all-atom diffusion-based techniques that have transformed protein structure prediction. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although their effectiveness is contingent upon the quality of input features derived from natural biophysicochemical attributes and multiple sequence alignments (MSAs). Hence, the generation of high-quality MSA data is of paramount importance for deriving informative input features and improving prediction outcomes. Although remarkable gains in prediction accuracy have been achieved, they are not yet sufficient for all the purposes that structural knowledge is meant to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers in protein structure prediction. MSA subsampling and protein language modeling appear particularly promising for enhancing the accuracy and efficiency of predictions, ultimately aiding drug discovery efforts.
The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and design therapeutics that are more effective. In this article, we have discussed the vicissitudes that the scientists have gone through to improve prediction accuracy, and examined the effective policies in predicting from different aspects, including the construction of high quality MSA, providing informative input features, and progresses in deep learning approaches. We have also briefly touched upon transitioning from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
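The three target types named in this abstract (contact map, binned distance, real-valued distance) can all be derived from one pairwise distance matrix. A minimal sketch, assuming one representative atom per residue (conventionally Cβ, or Cα for glycine) and the common 8 Å contact cutoff; the bin edges and coordinates are illustrative, not those of any particular predictor.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pairwise_distances(coords: Sequence[Coord]) -> List[List[float]]:
    """Real-valued target: Euclidean distance between every pair of residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(dist: List[List[float]], cutoff: float = 8.0, min_sep: int = 1) -> List[List[bool]]:
    """Binary target: i and j are in contact if closer than the cutoff (Å)
    and at least min_sep apart in sequence (min_sep=1 excludes the diagonal)."""
    n = len(dist)
    return [[abs(i - j) >= min_sep and dist[i][j] < cutoff for j in range(n)] for i in range(n)]


def distance_bins(dist: List[List[float]], edges: Sequence[float]) -> List[List[int]]:
    """Binned target (distogram): index of the bin each distance falls into.

    Bin 0 holds d < edges[0]; bin k holds edges[k-1] <= d < edges[k];
    the last bin holds d >= edges[-1].
    """
    return [[bisect.bisect_right(edges, d) for d in row] for row in dist]


if __name__ == "__main__":
    # Hypothetical representative-atom coordinates for three residues.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0)]
    d = pairwise_distances(coords)
    print(contact_map(d))
    print(distance_bins(d, [4.0, 8.0, 12.0]))
```

Real distogram heads use many more, finer bins; the loss then becomes a per-pair classification over bin indices rather than a regression on raw distances.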
Affiliation(s)
- Faezeh Rahimzadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
- Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
- Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
3
Csikász-Nagy A, Fichó E, Noto S, Reguly I. Computational tools to predict context-specific protein complexes. Curr Opin Struct Biol 2024; 88:102883. [PMID: 38986166; DOI: 10.1016/j.sbi.2024.102883] [Received: 04/07/2024] [Revised: 05/21/2024] [Accepted: 06/19/2024] [Indexed: 07/12/2024]
Abstract
Interactions between thousands of proteins define cells' protein-protein interaction (PPI) network. Some of these interactions lead to the formation of protein complexes. It is challenging to identify a protein complex in a haystack of protein-protein interactions, and it is even more difficult to predict all protein complexes of the complexome. Simulations and machine learning approaches try to crack these problems by looking at the PPI network or predicted protein structures. Clustering of PPI networks led to the first protein complex predictions, while most recently, atomistic models of protein complexes and deep-learning-based structure prediction methods have also emerged. The simulation of PPI level interactions even enables the quantitative prediction of protein complexes. These methods, the required data sources, and their potential future developments are discussed in this review.
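Clustering a PPI network can be as simple as keeping high-confidence interactions and reading off groups of connected proteins. The sketch below does that with connected components; it is a deliberately naive stand-in for the clustering methods this review discusses (published complex-detection algorithms such as MCODE or Markov clustering look for dense subgraphs rather than mere connectivity). Protein names, interaction scores, and both thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def predict_complexes(
    interactions: List[Tuple[str, str, float]],
    min_score: float = 0.5,
    min_size: int = 3,
) -> List[Set[str]]:
    """Return groups of proteins connected by interactions scoring >= min_score."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in interactions:
        if score >= min_score:  # drop low-confidence edges
            graph[a].add(b)
            graph[b].add(a)

    seen: Set[str] = set()
    complexes: List[Set[str]] = []
    for start in sorted(graph):  # sorted for a deterministic output order
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            complexes.append(component)
    return complexes


if __name__ == "__main__":
    ppi = [("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.95), ("C", "D", 0.2), ("E", "F", 0.9)]
    print(predict_complexes(ppi))  # one complex: A, B and C
```

The weak C-D edge is filtered out, and the E-F pair is too small to count as a complex under the default `min_size`.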
Affiliation(s)
- Attila Csikász-Nagy
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Santiago Noto
- Cytocast Hungary Kft, Budapest, Hungary; Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- István Reguly
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
4
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024:10.1038/s41422-024-00989-2. [PMID: 38969803; DOI: 10.1038/s41422-024-00989-2] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024]
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type), and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that confers deoxyadenosine deaminase activity on TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
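Zero-shot mutation-effect predictors typically score a variant by how much more (or less) plausible a model finds the mutant residue than the wild-type residue. A minimal sketch of that idea, assuming per-position amino-acid probabilities have already been produced by some model and that multiple substitutions contribute additively; this log-odds rule is a common heuristic for protein language models and is not necessarily the scoring function used by ProMEP. The two-residue sequence and its probabilities are invented for illustration; mutation strings follow the standard notation used above (e.g. A106V).

```python
import math
from typing import Dict, List, Sequence, Tuple


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split notation such as 'A106V' into (wild type, 1-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def zero_shot_score(
    wild_type: str,
    mutations: Sequence[str],
    probs: List[Dict[str, float]],
) -> float:
    """Sum of log p(mutant) - log p(wild type) over the listed substitutions.

    probs[i] maps amino acids to (hypothetical) model probabilities at position i + 1.
    Positive scores mean the model prefers the variant over the wild type.
    """
    score = 0.0
    for mutation in mutations:
        wt, pos, mt = parse_mutation(mutation)
        if wild_type[pos - 1] != wt:
            raise ValueError(f"{mutation}: wild-type residue at {pos} is {wild_type[pos - 1]}")
        p = probs[pos - 1]
        score += math.log(p[mt]) - math.log(p[wt])  # additivity assumption
    return score


if __name__ == "__main__":
    probs = [{"A": 0.5, "V": 0.25}, {"D": 0.2, "N": 0.4}]
    print(zero_shot_score("AD", ["D2N"], probs))          # log(2), favorable
    print(zero_shot_score("AD", ["A1V", "D2N"], probs))   # about 0, effects cancel
```

Checking the wild-type residue before scoring catches off-by-one numbering errors, a frequent source of silent mistakes when mutation lists and sequences come from different sources.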
Affiliation(s)
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing, China
- Cong Mao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Jin Tang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Sen Yang
- Bioinformatics Center of AMMS, Beijing, China
- Yu Cheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuke Wang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Qiuxi Gu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wei Han
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Hao Chen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Sihan Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuju Li
- Bioinformatics Center of AMMS, Beijing, China
- Aimin Pan
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xingxu Huang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jun Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wenjie Shu
- Bioinformatics Center of AMMS, Beijing, China
5
Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900; PMCID: PMC11228504; DOI: 10.1073/pnas.2311887121] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024]
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with that of orthology-based pairing.
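Within one species, pairing the paralogs of family A with the paralogs of family B is a one-to-one assignment problem. DiffPALM makes the choice of pairing differentiable and scores candidate pairings using MSA Transformer's masked-token predictions; the sketch below swaps in a much simpler technique, exhaustive search over permutations of a hypothetical compatibility matrix, only to show what is being optimized. It assumes equal numbers of paralogs on both sides and is practical only for small groups, since it enumerates n! permutations.

```python
from itertools import permutations
from typing import List, Tuple


def best_pairing(scores: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively find the one-to-one pairing with the highest total score.

    scores[i][j] is a hypothetical compatibility score for pairing paralog i of
    family A with paralog j of family B in the same species; the matrix must be square.
    Returns (perm, total), where paralog i of A is paired with paralog perm[i] of B.
    """
    n = len(scores)
    if any(len(row) != n for row in scores):
        raise ValueError("score matrix must be square")
    best_perm: Tuple[int, ...] = tuple(range(n))
    best_total = float("-inf")
    for perm in permutations(range(n)):  # n! candidates: small n only
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


if __name__ == "__main__":
    scores = [
        [1.0, 3.0, 0.0],
        [2.0, 0.5, 0.1],
        [0.0, 0.2, 4.0],
    ]
    print(best_pairing(scores))  # ((1, 0, 2), 9.0)
```

For larger groups the same discrete problem is usually solved with the Hungarian algorithm; the point of a differentiable formulation is instead to let gradients from a language-model loss guide the pairing.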
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
6
Laakko T, Korkealaakso A, Yildirir BF, Batys P, Liljeström V, Hokkanen A, Nonappa, Penttilä M, Laukkanen A, Miserez A, Södergård C, Mohammadi P. Accelerated Engineering of ELP-Based Materials through Hybrid Biomimetic-De Novo Predictive Molecular Design. Adv Mater 2024; 36:e2312299. [PMID: 38710202; DOI: 10.1002/adma.202312299] [Received: 11/17/2023] [Revised: 03/28/2024] [Indexed: 05/08/2024]
Abstract
Efforts to engineer high-performance protein-based materials inspired by nature have mostly focused on altering naturally occurring sequences to confer the desired functionalities, whereas de novo design lags significantly behind and calls for unconventional innovative approaches. Here, using partially disordered elastin-like polypeptides (ELPs) as initial building blocks, this work shows that de novo engineering of protein materials can be accelerated through hybrid biomimetic design, which this work achieves by integrating computational modeling, deep neural networks, and recombinant DNA technology. This generalizable approach involves incorporating a series of de novo-designed sequences with α-helical conformation and genetically encoding them into biologically inspired intrinsically disordered repeating motifs. The new ELP variants maintained their structural conformation and showed tunable supramolecular self-assembly out of thermal equilibrium with phase behavior in vitro. This work illustrates the effective translation of the predicted molecular designs into structural and functional materials. The proposed methodology can be applied to a broad range of partially disordered biomacromolecules and could pave the way toward the discovery of novel structural proteins.
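Elastin-like polypeptides are built from Val-Pro-Gly-Xaa-Gly (VPGXG) pentapeptide repeats, where the guest residue X can be any amino acid except proline. The sketch below assembles a hybrid sequence by placing an α-helical segment between ELP repeat blocks, as one plausible reading of genetically encoding designed helices into disordered repeating motifs. The block layout, the guest residues, and the helix sequence `EELLKK` are hypothetical placeholders, not the constructs reported in this paper.

```python
def elp_block(guests: str, repeats: int) -> str:
    """Concatenate VPGXG pentapeptides, cycling through the guest residues X."""
    if not guests or "P" in guests:
        raise ValueError("guest residues must be non-empty and exclude proline")
    return "".join(f"VPG{guests[i % len(guests)]}G" for i in range(repeats))


def hybrid_sequence(guests: str, repeats: int, helix: str, n_helices: int) -> str:
    """ELP block, followed by n_helices copies of (helix + ELP block)."""
    block = elp_block(guests, repeats)
    return block + "".join(helix + block for _ in range(n_helices))


if __name__ == "__main__":
    print(elp_block("VA", 3))                        # VPGVGVPGAGVPGVG
    print(hybrid_sequence("V", 2, "EELLKK", 1))      # VPGVGVPGVGEELLKKVPGVGVPGVG
```

Varying the guest residues is the usual lever for tuning ELP phase behavior, while the inserted segments carry the designed structure.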
Affiliation(s)
- Timo Laakko
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
- Burcu Firatligil Yildirir
- Faculty of Engineering and Natural Sciences, Tampere University, Korkeakoulunkatu 6, Tampere, FI-33720, Finland
- Piotr Batys
- Jerzy Haber Institute of Catalysis and Surface Chemistry, Polish Academy of Sciences, Niezapominajek 8, Krakow, PL-30239, Poland
- Ville Liljeström
- Department of Applied Physics, School of Science, Aalto University, Aalto, FI-00076, Finland
- Ari Hokkanen
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
- Nonappa
- Faculty of Engineering and Natural Sciences, Tampere University, Korkeakoulunkatu 6, Tampere, FI-33720, Finland
- Merja Penttilä
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
- Anssi Laukkanen
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
- Ali Miserez
- Center for Sustainable Materials (SusMat), School of Materials Science and Engineering, Nanyang Technological University (NTU), Singapore, 637553, Singapore
- School of Biological Sciences, NTU, Singapore, 637551, Singapore
- Caj Södergård
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
- Pezhman Mohammadi
- VTT Technical Research Centre of Finland Ltd., VTT, FI-02044, Finland
7
Hermosilla AM, Berner C, Ovchinnikov S, Vorobieva AA. Validation of de novo designed water-soluble and transmembrane β-barrels by in silico folding and melting. Protein Sci 2024; 33:e5033. [PMID: 38864690; PMCID: PMC11168064; DOI: 10.1002/pro.5033] [Received: 11/21/2023] [Revised: 04/14/2024] [Accepted: 05/08/2024] [Indexed: 06/13/2024]
Abstract
In silico validation of de novo designed proteins with deep learning (DL)-based structure prediction algorithms has become mainstream. However, formal evidence of the relationship between a high-quality predicted model and the chance of experimental success is lacking. We used experimentally characterized de novo water-soluble and transmembrane β-barrel designs to show that AlphaFold2 and ESMFold excel at different tasks. ESMFold can efficiently identify designs generated based on high-quality (designable) backbones. However, only AlphaFold2 can predict which sequences have the best chance of experimentally folding among similar designs. We show that ESMFold can generate high-quality structures from just a few predicted contacts and introduce a new approach based on incremental perturbation of the prediction ("in silico melting"), which can reveal differences in the presence of favorable contacts between designs. This study provides new insight into the explainability of DL-based structure prediction models and into how they could be leveraged for the design of increasingly complex proteins, in particular membrane proteins, which have historically lacked basic in silico validation tools.
Affiliation(s)
- Alvaro Martin Hermosilla
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB‐VUB Center for Structural Biology, Brussels, Belgium
- Carolin Berner
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB‐VUB Center for Structural Biology, Brussels, Belgium
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, Massachusetts, USA
- Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Anastassia A. Vorobieva
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB‐VUB Center for Structural Biology, Brussels, Belgium
- VIB Center for AI and Computational Biology, Belgium
8
Wang J, Watson JL, Lisanza SL. Protein Design Using Structure-Prediction Networks: AlphaFold and RoseTTAFold as Protein Structure Foundation Models. Cold Spring Harb Perspect Biol 2024; 16:a041472. [PMID: 38438190; PMCID: PMC11216169; DOI: 10.1101/cshperspect.a041472] [Indexed: 03/06/2024]
Abstract
Designing proteins with tailored structures and functions is a long-standing goal in bioengineering. Recently, deep learning advances have enabled protein structure prediction at near-experimental accuracy, which has catalyzed progress in protein design as well. We review recent studies that use structure-prediction neural networks to design proteins, via approaches such as activation maximization, inpainting, or denoising diffusion. These methods have led to major improvements over previous methods in wet-lab success rates for designing protein binders, metalloproteins, enzymes, and oligomeric assemblies. These results show that structure-prediction models are a powerful foundation for developing protein-design tools and suggest that continued improvement of their accuracy and generality will be key to unlocking the full potential of protein design.
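Activation maximization (often called hallucination) searches sequence space for inputs that maximize a structure-prediction network's confidence. A minimal sketch of the search loop, assuming a Metropolis Monte Carlo scheme over single-residue substitutions. The network is replaced by a toy scoring function (agreement with a hypothetical hydrophobic/polar pattern), so only the optimization logic is illustrated, not the structural objective; the temperature, step count, and pattern are arbitrary choices.

```python
import math
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def toy_score(seq: str, pattern: str = "HPPHHPPH") -> float:
    """Hypothetical stand-in for network confidence: fraction of positions whose
    hydrophobic (H) / polar (P) class matches the target pattern."""
    matches = sum((aa in HYDROPHOBIC) == (p == "H") for aa, p in zip(seq, pattern))
    return matches / len(pattern)


def mcmc_design(
    score: Callable[[str], float],
    length: int,
    steps: int = 2000,
    temperature: float = 0.05,
    seed: int = 0,
) -> str:
    """Metropolis Monte Carlo over single-residue substitutions; returns the best sequence seen."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    current = score(seq)
    best, best_score = seq, current
    for _ in range(steps):
        pos = rng.randrange(length)
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        new = score(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new >= current or rng.random() < math.exp((new - current) / temperature):
            seq, current = candidate, new
            if current > best_score:
                best, best_score = seq, current
    return best


if __name__ == "__main__":
    designed = mcmc_design(toy_score, length=8)
    print(designed, toy_score(designed))
```

Accepting occasional downhill moves keeps the search from stalling in local optima; gradient-based variants replace the random substitutions with updates through the network itself.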
Affiliation(s)
- Jue Wang
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
- DeepMind, London EC4A 3BF, United Kingdom
- Joseph L Watson
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Sidney L Lisanza
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
9
Ren X, Wei J, Luo X, Liu Y, Li K, Zhang Q, Gao X, Yan S, Wu X, Jiang X, Liu M, Cao D, Wei L, Zeng X, Shi J. HydrogelFinder: A Foundation Model for Efficient Self-Assembling Peptide Discovery Guided by Non-Peptidal Small Molecules. Adv Sci (Weinh) 2024; 11:e2400829. [PMID: 38704695; PMCID: PMC11234452; DOI: 10.1002/advs.202400829] [Received: 01/23/2024] [Revised: 03/10/2024] [Indexed: 05/07/2024]
Abstract
Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model for the rational design of self-assembling peptides from scratch, is developed. This model explores self-assembly properties through molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Using HydrogelFinder, 111 peptide candidates were generated and 17 peptides were synthesized; the self-assembly and biophysical characteristics of nine peptides ranging from 1 to 10 amino acids were then validated experimentally, all within a 19-day workflow. Notably, the two de novo-designed self-assembling peptides demonstrated low cytotoxicity and biocompatibility, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery endeavors.
Affiliation(s)
- Xuanbai Ren
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Jiaying Wei
- State Key Laboratory of Chemo/Bio‐Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xiaoli Luo
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Yuansheng Liu
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Kenli Li
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Qiang Zhang
- ZJU‐Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310013, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955‐6900, Saudi Arabia
- Sizhe Yan
- State Key Laboratory of Chemo/Bio‐Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xia Wu
- State Key Laboratory of Chemo/Bio‐Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Xingyue Jiang
- State Key Laboratory of Chemo/Bio‐Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
- Mingquan Liu
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, China
- Leyi Wei
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha 410003, China
- Junfeng Shi
- State Key Laboratory of Chemo/Bio‐Sensing and Chemometrics, School of Biomedical Sciences, Hunan University, Changsha 410003, China
10
Chen H, Fan X, Zhu S, Pei Y, Zhang X, Zhang X, Liu L, Qian F, Tian B. Accurate prediction of CDR-H3 loop structures of antibodies with deep learning. eLife 2024; 12:RP91512. [PMID: 38921957; PMCID: PMC11208048; DOI: 10.7554/elife.91512] [Indexed: 06/27/2024]
Abstract
Accurate prediction of the structurally diverse complementarity determining region heavy chain 3 (CDR-H3) loop structure remains a primary and long-standing challenge for antibody modeling. Here, we present the H3-OPT toolkit for predicting the 3D structures of monoclonal antibodies and nanobodies. H3-OPT combines the strengths of AlphaFold2 with a pre-trained protein language model and achieves an average Cα RMSD of 2.24 Å between predicted and experimentally determined CDR-H3 loops, outperforming other current computational methods on our non-redundant high-quality dataset. The model was validated by experimentally solving three structures of anti-VEGF nanobodies predicted by H3-OPT. We examined potential applications of H3-OPT by analyzing antibody surface properties and antibody-antigen interactions. This structure prediction tool can be used to optimize antibody-antigen binding and to engineer therapeutic antibodies with biophysical properties suited to specialized drug administration routes.
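A Cα RMSD such as the one reported here is computed over matched Cα atoms, typically after rigid-body superposition. A minimal sketch using the Kabsch algorithm with NumPy, assuming both arrays list the same Cα atoms in the same order. Antibody benchmarks often superpose on the framework and then measure RMSD over the loop; that is a protocol choice this sketch does not make, since it superposes on the same atoms it scores.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) Cα coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Hypothetical coordinates: Q is P rotated 90 degrees about z and translated.
    P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
    print(kabsch_rmsd(P, Q))  # close to 0
```

Because superposition removes any rigid motion, a prediction that differs from the experimental loop only by rotation and translation scores near zero.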
Affiliation(s)
- Hedi Chen
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Xiaoyu Fan
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Shuqian Zhu
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Yuchan Pei
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
- Xiaochun Zhang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Xiaonan Zhang
- Department of Natural Language Processing, Baidu International Technology (Shenzhen) Co Ltd, Shenzhen, China
- Lihang Liu
- Department of Natural Language Processing, Baidu International Technology (Shenzhen) Co Ltd, Shenzhen, China
- Feng Qian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
11
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251; DOI: 10.1016/j.mcpro.2024.100798] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024]
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
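One widely used score for comparing observed and predicted fragment ion intensities is the normalized spectral contrast angle, which maps the angle between the two intensity vectors onto [0, 1]. A minimal sketch, assuming both vectors list intensities for the same fragment ions in the same order. In rescoring pipelines, features of this kind are typically combined with the search engine's own scores in a semi-supervised classifier such as Percolator, which is not shown here; the intensity values in the example are hypothetical.

```python
import math
from typing import Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral contrast angle between two aligned intensity vectors.

    Returns 1.0 for identical relative intensities and 0.0 for orthogonal spectra
    (intensities are non-negative, so the cosine lies in [0, 1]).
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must cover the same fragment ions")
    norm_obs = math.sqrt(sum(x * x for x in observed))
    norm_pred = math.sqrt(sum(y * y for y in predicted))
    if norm_obs == 0.0 or norm_pred == 0.0:
        return 0.0  # no signal to compare
    cosine = sum(x * y for x, y in zip(observed, predicted)) / (norm_obs * norm_pred)
    cosine = max(-1.0, min(1.0, cosine))  # clamp floating-point overshoot
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    print(spectral_angle([0.2, 1.0, 0.5], [0.25, 0.9, 0.5]))  # close to 1: good match
    print(spectral_angle([1.0, 0.0], [0.0, 1.0]))             # 0.0: no shared signal
```

Because the vectors are normalized, the score depends only on relative intensities, which is what makes it comparable across spectra of very different total ion current.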
Affiliation(s)
- Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Joel Lapin
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany.
12
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and use this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing efforts to adopt these techniques in the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summary of existing work. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data are diverse and complex, we structure our discussion around chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress in adapting transformer architectures, and future directions.
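Because the review organizes transformer work by chemical representation, a concrete example of the string-based case may help: before a SMILES string reaches a transformer it must be split into tokens. The sketch below uses a regular expression of the kind widely used in SMILES-based transformer models; it is one common choice, not a method taken from this review, and `tokenize_smiles` is a hypothetical helper.

```python
import re

# Multi-character atoms (Br, Cl) and bracket atoms such as [nH] or [NH4+]
# are kept as single tokens; everything else is one character per token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips characters the pattern does not match, so check coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Treating `Cl` and `Br` as single tokens matters: a purely character-level split would turn chlorine into a carbon followed by an unrelated `l`.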
Affiliation(s)
- Kha-Dinh Luong
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Ambuj Singh
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
13
Dahlström KM, Salminen TA. Apprehensions and emerging solutions in ML-based protein structure prediction. Curr Opin Struct Biol 2024; 86:102819. [PMID: 38631107 DOI: 10.1016/j.sbi.2024.102819] [Received: 01/19/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
The three-dimensional structure of proteins determines their function in vital biological processes. Thus, when the structure is known, the molecular mechanism of protein function can be understood in more detail, and the information obtained can be used in biotechnological, diagnostic, and therapeutic applications. Over the past five years, machine learning (ML)-based modeling has pushed protein structure prediction to the next level, with AlphaFold at the forefront, predicting structures for hundreds of millions of proteins. More recently, promising ML-based approaches have been reported that address remaining challenges by incorporating functionally important metals, co-factors, post-translational modifications, structural dynamics, and interdomain and multimer interactions into the structure prediction process.
Affiliation(s)
- Käthe M Dahlström
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
- Tiina A Salminen
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland.
14
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning method developed by DeepMind, was a game changer in the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 has broken through long-standing barriers in protein structure prediction, much remains to be studied. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (Pros) and considerations (Cons) of using AF2, this review summarizes its diverse applications, including protein structure prediction, dynamic changes, point mutations, integration of language models and experimental data, protein complexes, and protein-peptide interactions. It underscores recent advancements in the efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and their potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Zehua Wen
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Shi-Wei Liu
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Lihong Zhang
- Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
- Cierra Finley
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Ho-Jin Lee
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA; Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA.
- Hua-Jun Shawn Fan
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China.
15
Wang H, Chen B, Sun H, Zhang Y. Carbon-based molecular properties efficiently predicted by deep learning-based quantum chemical simulation with large language models. Comput Biol Med 2024; 176:108531. [PMID: 38728991 DOI: 10.1016/j.compbiomed.2024.108531] [Received: 02/11/2024] [Revised: 04/21/2024] [Accepted: 04/28/2024] [Indexed: 05/12/2024]
Abstract
The prediction of thermodynamic properties of carbon-based molecules based on their geometrical conformation using fluctuation and density functional theories has achieved great success in the field of energy chemistry, while the excessive computational cost provides both opportunities and challenges for the integration of machine learning. In this work, a deep learning-based quantum chemical prediction model was constructed for efficient prediction of thermodynamic properties of carbon-based molecules. We constructed a novel framework - encoding the 3D information into a large language model (LLM), which in turn generates a 2D SMILES string, while embedding a learnable encoding designed to preserve the integrity of the original 3D information, providing better structural information for the model. Additionally, we have designed an equivariant learning module to encompass representations of conformations and feature learning for conformational sampling. This framework aims to predict thermodynamic properties more accurately than learning from 2D topology alone, while providing faster computational speeds than conventional simulations. By combining machine learning and quantum chemistry, we pioneer efficient practical applications in the field of energy chemistry. Our model advances the integration of data-driven and physics-based modeling to unlock novel insights into carbon-based molecules.
Affiliation(s)
- Haoyu Wang
- University of Shanghai for Science and Technology, Shanghai, China; School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Bin Chen
- University of Shanghai for Science and Technology, Shanghai, China; School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Hangling Sun
- Hengtu Imalligent Technology (Shanghai) Co., Ltd., Shanghai, China
- Yuxuan Zhang
- University of Shanghai for Science and Technology, Shanghai, China
16
Le VT, Malik MS, Tseng YH, Lee YC, Huang CI, Ou YY. DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models. Comput Biol Chem 2024; 110:108055. [PMID: 38555810 DOI: 10.1016/j.compbiolchem.2024.108055] [Received: 12/24/2023] [Revised: 02/28/2024] [Accepted: 03/19/2024] [Indexed: 04/02/2024]
Abstract
Accurate classification of membrane proteins such as ion channels and transporters is critical for elucidating cellular processes and for drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then input into an mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best performing model utilizing ProtT5 achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. Our proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and drug development targeting ion channels and transporters. The data and source codes in this study are publicly available at the following link: https://github.com/s1129108/DeepPLM_mCNN.
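The three figures reported for each task are related. A minimal sketch, using hypothetical confusion-matrix counts rather than the paper's data, shows how they are computed and why overall accuracy can sit much closer to specificity than to sensitivity: accuracy is the prevalence-weighted average of the two, so a test set dominated by negatives pulls it toward specificity. The helper names are ours.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of actual positives recovered
    specificity = tn / (tn + fp)  # fraction of actual negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def implied_positive_fraction(sensitivity, specificity, accuracy):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when sensitivity == specificity")
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical, imbalanced test set: 100 positives, 1000 negatives.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=958, fp=42)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
print(f"implied positive fraction={implied_positive_fraction(sens, spec, acc):.3f}")
```

Applied to the ion-transporter figures above (90% sensitivity, 95.8% specificity, 95.4% accuracy), and assuming all three come from the same test set, the same relation implies that positives make up roughly 7% of it, though rounding of the reported percentages makes this approximate.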
Affiliation(s)
- Van-The Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Muhammad-Shahid Malik
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Department of Computer Science and Engineering, Karakoram International University, Pakistan
- Yi-Hsuan Tseng
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Cheng Lee
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Cheng-I Huang
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan.
17
Zheng M, Sun G, Li X, Fan Y. EGPDI: identifying protein-DNA binding sites based on multi-view graph embedding fusion. Brief Bioinform 2024; 25:bbae330. [PMID: 38975896 PMCID: PMC11229037 DOI: 10.1093/bib/bbae330] [Received: 03/20/2024] [Revised: 06/08/2024] [Accepted: 06/26/2024] [Indexed: 07/09/2024]
Abstract
Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach integrates Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), each configured independently, to extract global and local node embedding representations. A gated multi-head attention mechanism is then employed to capture the attention weights of the two embedding representations, thereby facilitating the integration of node features. In addition, node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.
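The abstract describes fusing two embeddings per residue, one from an EGNN and one from a GCNII, with a gated multi-head attention mechanism. As a much-simplified illustration of gating alone, not EGPDI's actual module, the sketch below blends two embeddings of one node with a single scalar gate. The function names and the weights shown are hypothetical placeholders rather than trained values.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(view_a, view_b, gate_weights, gate_bias=0.0):
    """Blend two same-sized embeddings of one node with a single scalar gate.

    gate = sigmoid(w . [a; b] + bias); output = gate * a + (1 - gate) * b.
    A gate near 1 favours view_a, a gate near 0 favours view_b.
    """
    if len(view_a) != len(view_b):
        raise ValueError("embeddings must have the same dimension")
    concat = list(view_a) + list(view_b)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    return [gate * a + (1.0 - gate) * b for a, b in zip(view_a, view_b)]


# Hypothetical 3-dimensional embeddings of one residue from two graph views.
egnn_view = [0.2, -0.5, 1.0]
gcn_view = [0.6, 0.1, -0.3]
weights = [0.5, 0.0, 0.3, -0.2, 0.4, 0.1]  # placeholder values; in practice learned
print(gated_fusion(egnn_view, gcn_view, weights))
```

Because the gate depends on both inputs, the model can lean on whichever view is more informative for a given residue; multi-head attention generalizes this by learning several such weightings in parallel.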
Affiliation(s)
- Mengxin Zheng
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Guicong Sun
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Xueping Li
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
18
Jing H, Gao Z, Xu S, Shen T, Peng Z, He S, You T, Ye S, Lin W, Sun S. Accurate prediction of antibody function and structure using bio-inspired antibody language model. Brief Bioinform 2024; 25:bbae245. [PMID: 38797969 PMCID: PMC11128484 DOI: 10.1093/bib/bbae245] [Received: 10/12/2023] [Revised: 04/08/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024]
Abstract
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging due to their unique evolution and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% nonredundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM shows exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of rapidly predicting full-atom antibody structures from individual sequences. Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials. The BALMFold structure prediction server is freely available at https://beamlab-sh.com/models/BALMFold.
Affiliation(s)
- Hongtai Jing
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200032, China
- Zhengtao Gao
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Sheng Xu
- Shanghai AI Laboratory, Shanghai 200232, China
- Tao Shen
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Zelixir Biotech, Shanghai 201206, China
- Zhangzhi Peng
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Shwai He
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Tao You
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Shuang Ye
- Department of Gynecologic Oncology, Fudan University Shanghai Cancer Center, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Wei Lin
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200032, China
- Shanghai AI Laboratory, Shanghai 200232, China
- School of Mathematical Sciences and Shanghai Center for Mathematical Sciences, Fudan University, Shanghai 200433, China
- Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Shanghai AI Laboratory, Shanghai 200232, China
19
Doga H, Raubenolt B, Cumbo F, Joshi J, DiFilippo FP, Qin J, Blankenberg D, Shehab O. A Perspective on Protein Structure Prediction Using Quantum Computers. J Chem Theory Comput 2024; 20:3359-3378. [PMID: 38703105 PMCID: PMC11099973 DOI: 10.1021/acs.jctc.4c00067] [Received: 01/22/2024] [Revised: 04/19/2024] [Accepted: 04/22/2024] [Indexed: 05/06/2024]
Abstract
Despite recent advances by deep learning methods such as AlphaFold2, in silico protein structure prediction remains a challenging problem in biomedical research. With the rapid evolution of quantum computing, it is natural to ask whether quantum computers can offer meaningful benefits for approaching this problem. Yet identifying specific problem instances amenable to quantum advantage and estimating the quantum resources required are equally challenging tasks. Here, we share our perspective on how to create a framework for systematically selecting protein structure prediction problems that are amenable to quantum advantage, and on estimating quantum resources for such problems on a utility-scale quantum computer. As a proof of concept, we validate our problem selection framework by accurately predicting the structure of a catalytic loop of the Zika virus NS3 helicase on quantum hardware.
Affiliation(s)
- Hakan Doga
- IBM Quantum, Almaden Research Center, San Jose, California 95120, United States
- Bryan Raubenolt
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Fabio Cumbo
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jayadev Joshi
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Frank P. DiFilippo
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jun Qin
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Daniel Blankenberg
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Omar Shehab
- IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, New York 10598, United States
20
Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O'Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YEA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods 2024:10.1038/s41592-024-02272-z. [PMID: 38744917 DOI: 10.1038/s41592-024-02272-z] [Received: 08/14/2023] [Accepted: 04/03/2024] [Indexed: 05/16/2024]
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set are deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Affiliation(s)
- Gustaf Ahdritz
- Department of Systems Biology, Columbia University, New York, NY, USA
- Harvard University, Cambridge, MA, USA
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.
- Sachin Kadyan
- Department of Systems Biology, Columbia University, New York, NY, USA
- Qinghui Xia
- Department of Systems Biology, Columbia University, New York, NY, USA
- William Gerecke
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Daniel Berenberg
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Ian Fisk
- Flatiron Institute, New York, NY, USA
- Bo Zhang
- Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- Stella Biderman
- EleutherAI, New York, NY, USA
- Booz Allen Hamilton, McLean, VA, USA
- Stephen Ra
- Prescient Design, Genentech, New York, NY, USA
- Minjia Zhang
- University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Peter K Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Zhao Zhang
- Rutgers University, New Brunswick, NJ, USA
21
Guo HB, Huntington B, Perminov A, Smith K, Hastings N, Dennis P, Kelley-Loughnane N, Berry R. AlphaFold2 modeling and molecular dynamics simulations of an intrinsically disordered protein. PLoS One 2024; 19:e0301866. [PMID: 38739602 PMCID: PMC11090348 DOI: 10.1371/journal.pone.0301866] [Received: 12/05/2023] [Accepted: 03/23/2024] [Indexed: 05/16/2024]
Abstract
We use AlphaFold2 (AF2) to model the monomer and dimer structures of an intrinsically disordered protein (IDP), Nvjp-1, assisted by molecular dynamics (MD) simulations. We observe relatively rigid dimeric structures of Nvjp-1 compared with the monomer structures. We suggest that protein conformations from multiple AF2 models and those from MD trajectories exhibit a coherent trend: the conformations of an IDP deviate from one another, whereas the conformations of a well-folded protein are consistent with one another. We use a residue-residue interaction network (RIN) derived from the contact map, which shows that the residue-residue interactions in Nvjp-1 are mainly transient, whereas those in a well-folded protein are mainly persistent. Despite the variation in 3D shapes, we show that the AF2 models of both disordered and ordered proteins exhibit highly consistent profiles of pLDDT (predicted local distance difference test) scores. These results suggest a potential protocol for identifying IDPs based on multiple AF2 models and MD simulations.
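The distinction between transient and persistent residue-residue interactions can be sketched by counting, across an ensemble of AF2 models or MD frames, how often each contact is present. The 8 Å Cα distance cutoff, the minimum sequence separation of 3, the 75% persistence threshold, and the function names are illustrative assumptions, not parameters from the study.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j), i < j, closer than cutoff and at least min_separation apart in sequence."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff
    }


def contact_persistence(frames, cutoff=8.0, min_separation=3):
    """Fraction of frames in which each observed contact is present."""
    counts = {}
    for coords in frames:
        for pair in contacts(coords, cutoff, min_separation):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(frames) for pair, n in counts.items()}


def split_contacts(persistence, threshold=0.75):
    """Partition contacts into (persistent, transient) sets by persistence fraction."""
    persistent = {pair for pair, frac in persistence.items() if frac >= threshold}
    return persistent, set(persistence) - persistent


# Hypothetical 4-residue Calpha traces: residues 0 and 3 touch in two of three frames.
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
persistence = contact_persistence([folded, folded, extended])
persistent, transient = split_contacts(persistence)
print(persistence, persistent, transient)
```

Under this measure, a well-folded protein would be expected to show mostly persistent contacts and an IDP mostly transient ones, mirroring the trend the authors report.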
Affiliation(s)
- Hao-Bo Guo
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
- UES Inc., Dayton, OH, United States of America
- Baxter Huntington
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
- Miami University, Oxford, OH, United States of America
- Alexander Perminov
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
- Miami University, Oxford, OH, United States of America
- Kenya Smith
- United States Air Force Academy, Colorado Springs, CO, United States of America
- Nicholas Hastings
- United States Air Force Academy, Colorado Springs, CO, United States of America
- Patrick Dennis
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
- Nancy Kelley-Loughnane
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
- Rajiv Berry
- Material and Manufacturing Directorate, Air Force Research Laboratory, WPAFB, Mason, OH, United States of America
22
Ille AM, Markosian C, Burley SK, Mathews MB, Pasqualini R, Arap W. Generative artificial intelligence performs rudimentary structural biology modeling. bioRxiv [Preprint] 2024:2024.01.10.575113. [PMID: 38293060 PMCID: PMC10827103 DOI: 10.1101/2024.01.10.575113] [Indexed: 02/01/2024]
Abstract
Natural language-based generative artificial intelligence (AI) has become increasingly prevalent in scientific research. Intriguingly, capabilities of generative pre-trained transformer (GPT) language models beyond the scope of natural language tasks have recently been identified. Here we explored how GPT-4 might be able to perform rudimentary structural biology modeling. We prompted GPT-4 to model 3D structures for the 20 standard amino acids and an α-helical polypeptide chain, with the latter incorporating Wolfram mathematical computation. We also used GPT-4 to perform structural interaction analysis between nirmatrelvir and its target, the SARS-CoV-2 main protease. Geometric parameters of the generated structures typically approximated experimental references closely. However, modeling was sporadically error-prone, and molecular complexity was not well tolerated. Interaction analysis further revealed the ability of GPT-4 to identify specific amino acid residues involved in ligand binding along with the corresponding bond distances. Despite current limitations, we show the capacity of natural language generative AI to perform basic structural biology modeling and interaction analysis with atomic-scale accuracy.
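Checking generated geometry against experimental references, as the authors did, can be illustrated for the α-helix case. The sketch builds an ideal Cα trace from textbook helix parameters (about 3.6 residues per turn, i.e. 100° per residue, a 1.5 Å rise per residue, and a Cα radius of about 2.3 Å) and measures Cα-Cα distances; for trans peptides the consecutive distance should be close to 3.8 Å. It is not the authors' GPT-4 prompt or Wolfram workflow, and the function names are ours.

```python
import math


def ideal_helix_ca(n_residues, radius=2.3, rise=1.5, twist_deg=100.0):
    """Calpha coordinates (Angstrom) of an ideal alpha-helix wound around the z axis."""
    coords = []
    for i in range(n_residues):
        theta = math.radians(twist_deg * i)
        coords.append((radius * math.cos(theta), radius * math.sin(theta), rise * i))
    return coords


def ca_distances(coords, offset):
    """Distances between Calpha atoms i and i + offset along the chain."""
    return [math.dist(coords[i], coords[i + offset]) for i in range(len(coords) - offset)]


trace = ideal_helix_ca(12)
consecutive = ca_distances(trace, 1)
i_to_i4 = ca_distances(trace, 4)
print(f"mean Ca(i)-Ca(i+1): {sum(consecutive) / len(consecutive):.2f} A")
print(f"mean Ca(i)-Ca(i+4): {sum(i_to_i4) / len(i_to_i4):.2f} A")
```

With these parameters the consecutive distance comes out near 3.83 Å and the i to i+4 distance near 6.2 Å; a generated helix whose values drift far from these would warrant a closer look.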
Affiliation(s)
- Alexander M. Ille
- School of Graduate Studies, Rutgers, The State University of New Jersey, Newark, New Jersey, USA
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey, USA
- Division of Cancer Biology, Department of Radiation Oncology, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Christopher Markosian
- School of Graduate Studies, Rutgers, The State University of New Jersey, Newark, New Jersey, USA
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey, USA
- Division of Cancer Biology, Department of Radiation Oncology, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
- Michael B. Mathews
- School of Graduate Studies, Rutgers, The State University of New Jersey, Newark, New Jersey, USA
- Division of Infectious Disease, Department of Medicine, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Renata Pasqualini
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey, USA
- Division of Cancer Biology, Department of Radiation Oncology, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Wadih Arap
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey, USA
- Division of Hematology/Oncology, Department of Medicine, Rutgers New Jersey Medical School, Newark, New Jersey, USA
23
Huang J, Li J, Chen Q, Wang X, Chen G, Tang J. Freeprotmap: waiting-free prediction method for protein distance map. BMC Bioinformatics 2024; 25:176. [PMID: 38704533 PMCID: PMC11069170 DOI: 10.1186/s12859-024-05771-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 04/09/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND Protein residue-residue distance maps are used for remote homology detection, protein information estimation, and protein structure research. However, existing prediction approaches are time-consuming, and hundreds of millions of proteins are discovered each year, necessitating the development of a rapid and reliable prediction method for protein residue-residue distances. Moreover, because many proteins lack known homologous sequences, a waiting-free and alignment-free deep learning method is needed. RESULT In this study, we propose a learning framework named FreeProtMap. In terms of protein representation processing, the proposed group pooling in FreeProtMap effectively mitigates issues arising from high-dimensional sparseness in protein representation. In terms of model structure, we made several careful design choices. First, the model is designed around the locality of protein structures and triangle-inequality distance constraints to improve prediction accuracy. Second, inference speed is improved by using additive attention and a lightweight design. In addition, generalization is improved by using bottlenecks and a neural network block named local microformer. As a result, FreeProtMap can predict protein residue-residue distances in tens of milliseconds and has higher precision than the best structure prediction method. CONCLUSION Several groups of comparative and ablation experiments verify the effectiveness of these designs. The results demonstrate that FreeProtMap significantly outperforms other state-of-the-art methods in accurate protein residue-residue distance prediction, which benefits many areas of protein research.
It is worth mentioning that FreeProtMap could be used to scan all proteins discovered each year for structurally similar proteins in a short time, because computing structural similarity from distance maps is much less time-consuming than algorithms based on 3D structures.
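Two ideas in this abstract, the triangle-inequality constraint on distances and superposition-free comparison of distance maps, can be illustrated in a few lines. A minimal sketch, assuming Cα coordinates as plain tuples and equal-length proteins; the function names and the mean-absolute-difference score are hypothetical illustrations, not FreeProtMap's architecture or its similarity measure.

```python
import itertools
import math

def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. Cα atoms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def triangle_violations(dmap, tol=1e-6):
    """Count ordered triples (i, j, k) with d(i, k) > d(i, j) + d(j, k).

    A map computed from real 3D coordinates never violates the inequality;
    a predicted map can, which is why it is useful as a constraint.
    """
    count = 0
    for i, j, k in itertools.permutations(range(len(dmap)), 3):
        if dmap[i][k] > dmap[i][j] + dmap[j][k] + tol:
            count += 1
    return count

def map_difference(dmap_a, dmap_b):
    """Mean absolute difference over residue pairs i < j (equal lengths assumed).

    Unlike RMSD on 3D coordinates, no superposition step is required.
    """
    n = len(dmap_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(dmap_a[i][j] - dmap_b[i][j]) for i, j in pairs) / len(pairs)

coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
true_map = distance_map(coords)
shifted_map = distance_map([(x + 5.0, y - 2.0, z) for x, y, z in coords])
bad_map = [[0.0, 3.8, 10.0], [3.8, 0.0, 3.8], [10.0, 3.8, 0.0]]
print(triangle_violations(true_map))  # 0
print(triangle_violations(bad_map))   # 2: triples (0, 1, 2) and (2, 1, 0)
print(map_difference(true_map, shifted_map))  # close to 0: rigid motion leaves the map unchanged
```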
Affiliation(s)
- Jiajian Huang
- Zhejiang Lab, Zhejiang, China.
- Dalian University of Technology, Liaoning, China.
- Jinpeng Li
- Zhejiang Lab, Zhejiang, China
- The Chinese University of Hong Kong, Hong Kong, China
- Xia Wang
- Zhejiang Lab, Zhejiang, China.
- Dalian University of Technology, Liaoning, China.
24
Zhang Y, Yu L, Yang M, Han B, Luo J, Jing R. Model fusion for predicting unconventional proteins secreted by exosomes using deep learning. Proteomics 2024:e2300184. [PMID: 38643383 DOI: 10.1002/pmic.202300184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/22/2024]
Abstract
Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although sequence-based computational methods exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach uses a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
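The abstract reports accuracy (ACC) and the Matthews correlation coefficient (MCC) for a fused model but does not state the fusion rule. A minimal sketch, assuming soft voting (averaging each model's predicted probability) with a 0.5 threshold; the fusion rule, the toy probabilities, and the function names are illustrative assumptions, while the MCC formula is the standard confusion-matrix definition.

```python
import math

def fuse_probabilities(model_probs):
    """Soft voting: average each sample's positive-class probability across models."""
    n_models = len(model_probs)
    return [sum(per_sample) / n_models for per_sample in zip(*model_probs)]

def acc_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC is 0 by convention if undefined
    return acc, mcc

probs = [
    [0.9, 0.4, 0.2, 0.6],  # model A: P(secreted by exosomes) for four proteins
    [0.8, 0.6, 0.3, 0.3],  # model B
    [0.7, 0.7, 0.1, 0.4],  # model C
]
fused = fuse_probabilities(probs)
y_pred = [1 if p >= 0.5 else 0 for p in fused]
acc, mcc = acc_mcc([1, 1, 0, 1], y_pred)
print(y_pred, acc, round(mcc, 4))  # [1, 1, 0, 0] 0.75 0.5774
```

MCC is often reported alongside ACC because it stays informative when the positive and negative classes are imbalanced.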
Affiliation(s)
- Yonglin Zhang
- Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, Guizhou, China
- Ming Yang
- Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Bin Han
- GCP Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou, Sichuan, China
- Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
25
Penunuri G, Wang P, Corbett-Detig R, Russell SL. A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.10.588793. [PMID: 38645127 PMCID: PMC11030372 DOI: 10.1101/2024.04.10.588793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been a difficult field of biological research due to their experimental intractability. Impactful advances in global health can be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will advance our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. A driver of host manipulation and intracellular survival used by host-associated microbes is molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structures. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.
26
Gucwa K, Wons E, Wisniewska A, Jakalski M, Dubiak Z, Kozlowski LP, Mruk I. Lethal perturbation of an Escherichia coli regulatory network is triggered by a restriction-modification system's regulator and can be mitigated by excision of the cryptic prophage Rac. Nucleic Acids Res 2024; 52:2942-2960. [PMID: 38153127 PMCID: PMC11014345 DOI: 10.1093/nar/gkad1234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/08/2023] [Accepted: 12/13/2023] [Indexed: 12/29/2023]
Abstract
Bacterial gene regulatory networks orchestrate responses to environmental challenges. Horizontal gene transfer can bring in genes with regulatory potential, such as new transcription factors (TFs), and this can disrupt existing networks. Serious regulatory perturbations may even result in cell death. Here, we show the impact on Escherichia coli of importing a promiscuous TF that has adventitious transcriptional effects within the cryptic Rac prophage. A cascade of regulatory network perturbations occurred on a global level. The TF, a C regulatory protein, normally controls a Type II restriction-modification system, but in E. coli K-12 interferes with expression of the RacR repressor gene, resulting in de-repression of the normally-silent Rac ydaT gene. YdaT is a prophage-encoded TF with pleiotropic effects on E. coli physiology. In turn, YdaT alters expression of a variety of bacterial regulons normally controlled by the RcsA TF, resulting in deficient lipopolysaccharide biosynthesis and cell division. At the same time, insufficient RacR repressor results in Rac DNA excision, halting Rac gene expression due to loss of the replication-defective Rac prophage. Overall, Rac induction appears to counteract the lethal toxicity of YdaT. We show here that E. coli rewires its regulatory network, so as to minimize the adverse regulatory effects of the imported C TF. This complex set of interactions may reflect the ability of bacteria to protect themselves by having robust mechanisms to maintain their regulatory networks, and/or suggest that regulatory C proteins from mobile operons are under selection to manipulate their host's regulatory networks for their own benefit.
Affiliation(s)
- Katarzyna Gucwa
- Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Ewa Wons
- Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Aleksandra Wisniewska
- Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Marcin Jakalski
- 3P-Medicine Laboratory, Medical University of Gdansk, Debinki 7, 80-211 Gdansk, Poland
- Zuzanna Dubiak
- Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Lukasz Pawel Kozlowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Iwona Mruk
- Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
27
Scalzitti N, Miralavy I, Korenchan DE, Farrar CT, Gilad AA, Banzhaf W. Computational peptide discovery with a genetic programming approach. J Comput Aided Mol Des 2024; 38:17. [PMID: 38570405 DOI: 10.1007/s10822-024-00558-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
The development of peptides for therapeutic targets or biomarkers for disease diagnosis is a challenging task in protein engineering. Current approaches are tedious, often time-consuming, and require complex laboratory data due to the vast search spaces that need to be considered. In silico methods can accelerate research and substantially reduce costs. Evolutionary algorithms are a promising approach for exploring large search spaces and can facilitate the discovery of new peptides. This study presents the development and use of a new variant of the genetic-programming-based POET algorithm, called POETRegex, where individuals are represented by a list of regular expressions. This algorithm was trained on a small curated dataset and employed to generate new peptides that improve sensitivity in magnetic resonance imaging with chemical exchange saturation transfer (CEST). The resulting model achieves a performance gain of 20% over the initial POET models and is able to predict a candidate peptide with a 58% performance increase compared to the gold-standard peptide. By combining the power of genetic programming with the flexibility of regular expressions, new peptide targets were identified that improve the sensitivity of detection by CEST. This approach provides a promising research direction for the efficient identification of peptides with therapeutic or diagnostic potential.
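To make the phrase "individuals are represented by a list of regular expressions" concrete, the sketch below scores peptides with one such individual. It is a hypothetical illustration only: the abstract does not describe POETRegex's patterns, weights, or fitness function, so the motifs, weights, and function names here are invented for the example, and the evolutionary loop (mutation, crossover, selection) is omitted.

```python
import re

# Hypothetical individual: each gene is a regular expression over the
# one-letter amino-acid alphabet, paired with the weight it contributes when matched.
individual = [
    (r"K{2,}", 1.5),      # runs of two or more lysines (illustrative)
    (r"[RK].[ST]", 0.8),  # Arg/Lys, any residue, then Ser/Thr (illustrative)
    (r"P{2}", -1.0),      # penalize Pro-Pro (illustrative)
]

def predict_score(individual, peptide):
    """Sum the weights of every pattern that occurs at least once in the peptide."""
    return sum(weight for pattern, weight in individual if re.search(pattern, peptide))

def rank_candidates(individual, peptides):
    """Order candidate peptides from highest to lowest predicted score."""
    return sorted(peptides, key=lambda p: predict_score(individual, p), reverse=True)

candidates = ["GSPPG", "RASG", "KKKRST"]
print(rank_candidates(individual, candidates))  # ['KKKRST', 'RASG', 'GSPPG']
```

A genetic-programming search would evolve the pattern list itself, for example by editing, adding, or deleting expressions, and would keep the individuals whose scores best agree with measured data.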
Affiliation(s)
- Nicolas Scalzitti
- BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Iliya Miralavy
- BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- David E Korenchan
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Christian T Farrar
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Assaf A Gilad
- BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA.
- Department of Chemical Engineering, Michigan State University, East Lansing, MI, USA.
- Department of Radiology, Michigan State University, East Lansing, MI, USA.
- Wolfgang Banzhaf
- BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA.
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.
28
Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024; 85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]
Abstract
Protein-protein interactions play crucial roles in many biological processes. Traditionally, protein complex structures have been built by protein-protein docking. With the rapid development of artificial intelligence and its great success in monomer protein structure prediction, deep learning has been widely applied in the past few years to modeling protein-protein complex structures, through both inter-protein contact prediction and end-to-end approaches. This article reviews recent advances in deep-learning-based approaches to modeling protein-protein complex structures, along with their advantages and limitations. Challenges and possible future directions in applying deep learning to the prediction of protein complex structures are also briefly discussed.
Affiliation(s)
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
29
Zhang J, Durham J, Cong Q. Revolutionizing protein-protein interaction prediction with deep learning. Curr Opin Struct Biol 2024; 85:102775. [PMID: 38330793 DOI: 10.1016/j.sbi.2024.102775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 02/10/2024]
Abstract
Protein-protein interactions (PPIs) are pivotal for driving diverse biological processes, and any disturbance in these interactions can lead to disease. Thus, the study of PPIs has been a central focus in biology. Recent developments in deep learning methods, coupled with vast amounts of genomic sequence data, have significantly boosted the accuracy of predicting protein structures and modeling protein complexes, approaching levels comparable to experimental techniques. Herein, we review the latest advances in computational methods for modeling 3D protein complexes and predicting protein interaction partners, emphasizing the application of deep learning methods derived from coevolution analysis. The review also highlights biomedical applications of PPI prediction and outlines challenges in the field.
Affiliation(s)
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA. https://twitter.com/jzhang_genome
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
30
Wang H, Liu D, Zhao K, Wang Y, Zhang G. SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition. Brief Bioinform 2024; 25:bbae146. [PMID: 38600663 PMCID: PMC11006797 DOI: 10.1093/bib/bbae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/12/2024]
Abstract
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether the representation of structural sequence profiles can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profiles using ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, these profiles are fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design, with accuracy gains in sequence recovery rate of 21.89%, 15.54% and 11.4%, respectively, on the CATH 4.2 benchmark. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis with the PDBench tool suggests that SPDesign performs well on subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, a structural modeling verification experiment indicates that the sequences designed by SPDesign fold into the native structures more accurately.
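Ultrafast shape recognition (USR), as commonly described for comparing molecular shapes, summarizes a point cloud by its distance distributions from four reference points (the centroid, the point closest to it, the point farthest from it, and the point farthest from that one), reduces each distribution to three moments, and compares the resulting 12 numbers with 1 / (1 + mean absolute difference). A minimal sketch on Cα coordinates, assuming this textbook formulation; moment conventions vary between implementations, and how SPDesign builds its vectors for backbones and searches PAcluster80 is not described in the abstract. The function names are hypothetical.

```python
import math

def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]

def usr_descriptor(coords):
    """12-number USR-style shape descriptor of a 3D point cloud (e.g. Cα atoms)."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    cst = min(coords, key=lambda c: math.dist(c, ctd))  # closest point to centroid
    fct = max(coords, key=lambda c: math.dist(c, ctd))  # farthest point from centroid
    ftf = max(coords, key=lambda c: math.dist(c, fct))  # farthest point from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor

def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]; 1 means identical descriptors. No alignment needed."""
    mean_abs = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + mean_abs)

ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 2.9, 0.5), (4.1, 5.6, 2.2), (1.0, 6.3, 4.0)]
moved = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in ca]  # rigid translation
stretched = [(2.0 * x, y, z) for x, y, z in ca]            # genuinely different shape
print(usr_similarity(usr_descriptor(ca), usr_descriptor(moved)))      # ~1.0
print(usr_similarity(usr_descriptor(ca), usr_descriptor(stretched)))  # noticeably below 1.0
```

Because the descriptor is a short fixed-length vector, comparing it against a large database is far cheaper than aligning every structure, which is the speed advantage the abstract attributes to USR.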
Affiliation(s)
- Yajun Wang
- Corresponding authors. Guijun Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China. E-mail: ; Yajun Wang, College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China. E-mail:
- Guijun Zhang
- Corresponding authors. Guijun Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China. E-mail: ; Yajun Wang, College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China. E-mail:
31
Jing X, Wu F, Luo X, Xu J. Single-sequence protein structure prediction by integrating protein language models. Proc Natl Acad Sci U S A 2024; 121:e2308788121. [PMID: 38507445 PMCID: PMC10990103 DOI: 10.1073/pnas.2308788121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 02/05/2024] [Indexed: 03/22/2024]
Abstract
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on a multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs; thus, an MSA-free structure prediction method is desirable. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantages over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models affects performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
Affiliation(s)
- Fandi Wu
- MoleculeMind Ltd., Beijing 100084, China
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Xiao Luo
- Toyota Technological Institute at Chicago, Chicago, IL 60637
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Jinbo Xu
- MoleculeMind Ltd., Beijing 100084, China
- Toyota Technological Institute at Chicago, Chicago, IL 60637
32
Roche R, Moussad B, Shuvo MH, Tarafder S, Bhattacharya D. EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Res 2024; 52:e27. [PMID: 38281252 PMCID: PMC10954458 DOI: 10.1093/nar/gkae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/22/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024]
Abstract
Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
33
Hong L, Kortemme T. An integrative approach to protein sequence design through multiobjective optimization. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.01.582670. [PMID: 38496480 PMCID: PMC10942313 DOI: 10.1101/2024.03.01.582670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the multistate design problem of the foldswitching protein RfaH as an in-depth case study, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
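NSGA-II, named in this abstract as the optimization framework, ranks a population by fast non-dominated sorting and breaks ties within a front by crowding distance. A minimal sketch of those two generic steps, assuming every objective is minimized (larger-is-better confidence scores such as those from AlphaFold2 or ProteinMPNN would be negated first); the objective tuples are toy values, and the authors' ESM-1v/ProteinMPNN mutation operator and selection loop are not reproduced here.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Fast non-dominated sorting; returns Pareto fronts as lists of indices."""
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # indices that solution i dominates
    domination_count = [0] * n              # number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_set[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_set[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front

def crowding_distance(points, front):
    """Crowding distance within one front; boundary solutions get infinity."""
    distance = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, mid, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[mid] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return distance

objectives = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4), (3, 3)]
fronts = non_dominated_sort(objectives)
print(fronts)                                    # [[0, 1, 2], [3, 5], [4]]
print(crowding_distance(objectives, fronts[0]))  # {0: inf, 1: 2.0, 2: inf}
```

The first front approximates the Pareto front; preferring larger crowding distances keeps candidates spread across different tradeoffs, which matches the abstract's point about diverse tradeoff conditions.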
Affiliation(s)
- Lu Hong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
34
Tang Z, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.29.582810. [PMID: 38464101 PMCID: PMC10925287 DOI: 10.1101/2024.02.29.582810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learn a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown pre-trained gLMs can be leveraged to improve prediction performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that current gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major limitation with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
Affiliation(s)
- Ziqi Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
35
Michael-Pitschaze T, Cohen N, Ofer D, Hoshen Y, Linial M. Detecting anomalous proteins using deep representations. NAR Genom Bioinform 2024; 6:lqae021. [PMID: 38486884 PMCID: PMC10939404 DOI: 10.1093/nargab/lqae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 11/17/2023] [Accepted: 02/23/2024] [Indexed: 03/17/2024]
Abstract
Many advances in biomedicine can be attributed to identifying unusual proteins and genes. Many of these proteins' unique properties were discovered by manual inspection, which is becoming infeasible at the scale of modern protein datasets. Here, we propose to tackle this challenge using anomaly detection methods that automatically identify unexpected properties. We adopt a state-of-the-art anomaly detection paradigm from computer vision to highlight unusual proteins. We generate meaningful representations without labeled inputs, using pretrained deep neural network models. We apply these protein language models (pLMs) to detect anomalies in function, phylogenetic families, and segmentation tasks. We compute protein anomaly scores to highlight human prion-like proteins, distinguish viral proteins from their host proteome, and mark non-classical ion/metal binding proteins and enzymes. Other tasks concern segmentation of protein sequences into folded and unstructured regions. We provide candidates for rare functionality (e.g. prion proteins). Additionally, we show that the anomaly score is useful in 3D folding-related segmentation. Our novel method shows improved performance over strong baselines and has objectively high performance across a variety of tasks. We conclude that the combination of pLMs and anomaly detection techniques is a valid method for discovering a range of global and local protein characteristics.
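The abstract does not name the specific detector. One widely used paradigm in image anomaly detection scores a sample by its mean distance to the k nearest "normal" training embeddings taken from a pretrained network. A minimal sketch of that idea, assuming precomputed fixed-length protein embeddings (for example, mean-pooled pLM outputs) given as plain tuples; the function name, k, the Euclidean metric, and the toy 2D vectors are illustrative choices, not the authors' configuration.

```python
import math

def knn_anomaly_scores(train_embeddings, test_embeddings, k=2):
    """Score each test embedding by its mean Euclidean distance to its k nearest
    training embeddings; larger scores indicate more anomalous proteins."""
    scores = []
    for query in test_embeddings:
        dists = sorted(math.dist(query, ref) for ref in train_embeddings)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy 2D stand-ins for pLM embeddings of "normal" proteins.
normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
queries = [(0.5, 0.5), (5.0, 5.0)]  # one typical query, one outlier
print(knn_anomaly_scores(normal, queries, k=2))  # the outlier gets the larger score
```

No labels are needed: the training set only has to represent "typical" proteins, which fits the abstract's emphasis on representations learned without labeled inputs.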
Affiliation(s)
- Tomer Michael-Pitschaze
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Niv Cohen
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Dan Ofer
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yedid Hoshen
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
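The anomaly-scoring idea summarized in entry 35 can be illustrated with a small sketch. The abstract does not name the exact scorer, so this assumes one common deep-feature anomaly detection paradigm from computer vision: score each item by its mean distance to the k nearest "normal" embeddings. The 2-D vectors below are toy stand-ins for per-protein pLM embeddings, and `knn_anomaly_scores` is a hypothetical helper, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_anomaly_scores(reference: Sequence[Vector],
                       queries: Sequence[Vector],
                       k: int = 2) -> List[float]:
    """Score each query by its mean Euclidean distance to the k nearest
    reference embeddings; a higher score means more anomalous."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference vectors")
    scores = []
    for q in queries:
        # Distances from this query to every "normal" reference embedding
        dists = sorted(math.dist(q, r) for r in reference)
        # Average over the k closest ones
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy 2-D vectors standing in for per-protein pLM embeddings (hypothetical data)
normal_set = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
candidates = [(0.05, 0.05), (3.0, 4.0)]
scores = knn_anomaly_scores(normal_set, candidates, k=2)
```

The point sitting inside the normal cluster gets a small score and the distant point a much larger one; in practice such scores are compared across candidates rather than read as absolute values.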
36
Jänes J, Beltrao P. Deep learning for protein structure prediction and design-progress and applications. Mol Syst Biol 2024; 20:162-169. [PMID: 38291232 PMCID: PMC10912668 DOI: 10.1038/s44320-024-00016-x] [Received: 06/26/2023] [Revised: 12/21/2023] [Accepted: 01/11/2024] [Indexed: 02/01/2024]
Abstract
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Affiliation(s)
- Jürgen Jänes
- Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pedro Beltrao
- Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
37
Banayan NE, Loughlin BJ, Singh S, Forouhar F, Lu G, Wong K, Neky M, Hunt HS, Bateman LB, Tamez A, Handelman SK, Price WN, Hunt JF. Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution. Protein Sci 2024; 33:e4898. [PMID: 38358135 PMCID: PMC10868448 DOI: 10.1002/pro.4898] [Received: 06/18/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using X-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the worldwide web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5-11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from bulk KR-substituted domains show the engineered arginine residues frequently make hydrogen bonds across crystal-packing interfaces. We thus demonstrate that bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.
Affiliation(s)
- Nooriel E. Banayan
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Blaine J. Loughlin
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Shikha Singh
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Farhad Forouhar
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Guanqi Lu
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Kam‐Ho Wong
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Vaccine Research and Development, Pfizer Inc., Pearl River, New York, USA
- Matthew Neky
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Columbia University, New York, New York, USA
- Henry S. Hunt
- Department of Physics, Stanford University, Stanford, California, USA
- Samuel K. Handelman
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Department of Pain & Neuronal Health, Eli Lilly & Co., 893 Delaware St, Indianapolis, Indiana, USA
- W. Nicholson Price
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: University of Michigan Law School, Ann Arbor, Michigan, USA
- John F. Hunt
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
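The lysine-ranking step described in entry 37 can be sketched in Python. This is a minimal illustration, not the pxengineering.org software: it assumes the homologs are already aligned to the target, and it stands in a simple clustering-style redundancy correction (each homolog weighted by 1 / the number of homologs at or above 80% identity to it). The weighting scheme, the threshold, and all function names are assumptions.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def redundancy_weights(homologs: List[str], threshold: float = 0.8) -> List[float]:
    """Weight each homolog by 1 / (number of homologs, itself included,
    at >= threshold identity to it), so near-duplicates share one vote."""
    return [1.0 / sum(identity(s, t) >= threshold for t in homologs) for s in homologs]


def rank_lysine_sites(target: str, homologs: List[str],
                      threshold: float = 0.8) -> List[Tuple[int, float]]:
    """Return (1-based target position, weighted Arg frequency) for every Lys
    in the aligned target, sorted from most to least Arg-tolerant."""
    weights = redundancy_weights(homologs, threshold)
    ranked = []
    for col, aa in enumerate(target):
        if aa != "K":
            continue
        # Only homologs with a residue (not a gap) in this column contribute
        covered = [(w, seq[col]) for w, seq in zip(weights, homologs) if seq[col] != "-"]
        total = sum(w for w, _ in covered)
        arg = sum(w for w, res in covered if res == "R")
        position = col + 1 - target[:col].count("-")  # ungapped numbering
        ranked.append((position, arg / total if total > 0 else 0.0))
    ranked.sort(key=lambda site: site[1], reverse=True)
    return ranked


# Hypothetical aligned homologs: three identical sequences carry R at site 1,
# two dissimilar sequences carry R at site 5.
homologs = ["RAAAK", "RAAAK", "RAAAK", "KCCCR", "KDDDR"]
ranking = rank_lysine_sites("KAAAK", homologs)
```

Without weighting, site 1 would look more Arg-tolerant (3 of 5 raw sequences); after the redundancy correction the three duplicates count as one vote, so site 5 ranks first (about 0.67 versus 0.33).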
38
Taujale R, Gravel N, Zhou Z, Yeung W, Kochut K, Kannan N. Informatic challenges and advances in illuminating the druggable proteome. Drug Discov Today 2024; 29:103894. [PMID: 38266979 DOI: 10.1016/j.drudis.2024.103894] [Received: 10/15/2023] [Revised: 01/08/2024] [Accepted: 01/17/2024] [Indexed: 01/26/2024]
Abstract
The understudied members of the druggable proteomes offer promising prospects for drug discovery efforts. While large-scale initiatives have generated valuable functional information on understudied members of the druggable gene families, translating this information into actionable knowledge for drug discovery requires specialized informatics tools and resources. Here, we review the unique informatics challenges and advances in annotating understudied members of the druggable proteome. We demonstrate the application of statistical evolutionary inference tools, knowledge graph mining approaches, and protein language models in illuminating understudied protein kinases, pseudokinases, and ion channels.
Affiliation(s)
- Rahil Taujale
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Nathan Gravel
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Krystof Kochut
- School of Computing, University of Georgia, Athens, GA, USA
- Natarajan Kannan
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
39
Brooks BW, van den Berg S, Dreier DA, LaLone CA, Owen SF, Raimondo S, Zhang X. Towards Precision Ecotoxicology: Leveraging Evolutionary Conservation of Pharmaceutical and Personal Care Product Targets to Understand Adverse Outcomes Across Species and Life Stages. Environ Toxicol Chem 2024; 43:526-536. [PMID: 37787405 PMCID: PMC11017229 DOI: 10.1002/etc.5754] [Received: 03/26/2023] [Revised: 05/19/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023]
Abstract
Translation of environmental science to the practice aims to protect biodiversity and ecosystem services, and our future ability to do so relies on the development of a precision ecotoxicology approach wherein we leverage the genetics and informatics of species to better understand and manage the risks of global pollution. A little over a decade ago, a workshop focusing on the risks of pharmaceuticals and personal care products (PPCPs) in the environment identified a priority research question, "What can be learned about the evolutionary conservation of PPCP targets across species and life stages in the context of potential adverse outcomes and effects?" We review the activities in this area over the past decade, consider prospects of more recent developments, and identify future research needs to develop next-generation approaches for PPCPs and other global chemicals and waste challenges. Environ Toxicol Chem 2024;43:526-536. © 2023 SETAC. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Affiliation(s)
- Bryan W Brooks
- Department of Environmental Science, Center for Reservoir and Aquatic Systems Research, Institute of Biomedical Studies, Baylor University, Waco, Texas, USA
- David A Dreier
- Syngenta Crop Protection, Greensboro, North Carolina, USA
- Carlie A LaLone
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Duluth, Minnesota
- Stewart F Owen
- Global Sustainability, AstraZeneca, Macclesfield, Cheshire, UK
- Sandy Raimondo
- Gulf Ecosystem Measurement and Modeling Division, Office of Research and Development, US Environmental Protection Agency, Gulf Breeze, Florida
- Xiaowei Zhang
- School of the Environment, Nanjing University, Nanjing, China
40
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Cent Sci 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
41
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, Amini AP. Protein structure generation via folding diffusion. Nat Commun 2024; 15:1059. [PMID: 38316764 PMCID: PMC10844308 DOI: 10.1038/s41467-024-45051-2] [Received: 07/18/2023] [Accepted: 01/12/2024] [Indexed: 02/07/2024]
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
Affiliation(s)
- Kevin E Wu
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- James Y Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Alex X Lu
- Microsoft Research, Cambridge, MA, USA
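The invariance argument in entry 41 can be checked numerically. The sketch below computes one kind of angle such a backbone representation can use, a torsion (dihedral) angle defined by four consecutive atoms, and shows that a rigid rotation plus translation leaves it unchanged. The coordinates are arbitrary toy values and the helper names are illustrative; this is not code from the released codebase.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in radians, in [-pi, pi], about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


def rigid_motion(p: Vec, a: float, b: float, shift: Vec) -> Vec:
    """Rotate p about the z axis by a, then about the x axis by b, then translate."""
    ca, sa = math.cos(a), math.sin(a)
    x, y, z = ca * p[0] - sa * p[1], sa * p[0] + ca * p[1], p[2]
    cb, sb = math.cos(b), math.sin(b)
    y, z = cb * y - sb * z, sb * y + cb * z
    return (x + shift[0], y + shift[1], z + shift[2])


atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
moved = [rigid_motion(p, 0.7, -1.2, (5.0, -2.0, 3.0)) for p in atoms]
mirrored = [(-x, y, z) for x, y, z in atoms]
```

The toy atoms give -pi/2 before and after the rigid motion, while the mirrored copy gives +pi/2: angles of this kind are invariant to rotation and translation but not to reflection, so they still encode chirality.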
42
Li H, Sun X, Cui W, Xu M, Dong J, Ekundayo BE, Ni D, Rao Z, Guo L, Stahlberg H, Yuan S, Vogel H. Computational drug development for membrane protein targets. Nat Biotechnol 2024; 42:229-242. [PMID: 38361054 DOI: 10.1038/s41587-023-01987-2] [Received: 01/13/2023] [Accepted: 09/13/2023] [Indexed: 02/17/2024]
Abstract
The application of computational biology in drug development for membrane protein targets has experienced a boost from recent developments in deep learning-driven structure prediction, increased speed and resolution of structure elucidation, machine learning structure-based design and the evaluation of big data. Recent protein structure predictions based on machine learning tools have delivered surprisingly reliable results for water-soluble and membrane proteins but have limitations for development of drugs that target membrane proteins. Structural transitions of membrane proteins have a central role during transmembrane signaling and are often influenced by therapeutic compounds. Resolving the structural and functional basis of dynamic transmembrane signaling networks, especially within the native membrane or cellular environment, remains a central challenge for drug development. Tackling this challenge will require an interplay between experimental and computational tools, such as super-resolution optical microscopy for quantification of the molecular interactions of cellular signaling networks and their modulation by potential drugs, cryo-electron microscopy for determination of the structural transitions of proteins in native cell membranes and entire cells, and computational tools for data analysis and prediction of the structure and function of cellular signaling networks, as well as generation of promising drug candidates.
Affiliation(s)
- Haijian Li
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Xiaolin Sun
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Wenqiang Cui
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Marc Xu
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Junlin Dong
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Babatunde Edukpe Ekundayo
- Laboratory of Biological Electron Microscopy, IPHYS, SB, EPFL and Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Dongchun Ni
- Laboratory of Biological Electron Microscopy, IPHYS, SB, EPFL and Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Zhili Rao
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Liwei Guo
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Henning Stahlberg
- Laboratory of Biological Electron Microscopy, IPHYS, SB, EPFL and Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Shuguang Yuan
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Horst Vogel
- Center for Computer-Aided Drug Discovery, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology/Chinese Academy of Sciences (SIAT/CAS), Shenzhen, China
- Institut des Sciences et Ingénierie Chimiques (ISIC), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
43
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations. bioRxiv 2024:2023.04.24.538110. [PMID: 37162909 PMCID: PMC10168245 DOI: 10.1101/2023.04.24.538110] [Indexed: 05/11/2023]
Abstract
Human genome sequencing studies have identified numerous loci associated with complex diseases. However, translating human genetic and genomic findings to disease pathobiology and therapeutic discovery remains a major challenge at multiscale interactome network levels. Here, we present a deep-learning-based ensemble framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that accurately predicts protein binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms, generating comprehensive structurally-informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods. We further systematically validated PIONEER predictions experimentally through generating 2,395 mutations and testing their impact on 6,754 mutation-interaction pairs, confirming the high quality and validity of PIONEER predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces after mapping mutations from ~60,000 germline exomes and ~36,000 somatic genomes. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from pan-cancer analysis of ~11,000 tumor whole-exomes across 33 cancer types. We show that PIONEER-predicted oncoPPIs are significantly associated with patient survival and drug responses from both cancer cell lines and patient-derived xenograft mouse models. We identify a landscape of PPI-perturbing tumor alleles upon ubiquitination by E3 ligases, and we experimentally validate the tumorigenic KEAP1-NRF2 interface mutation p.Thr80Lys in non-small cell lung cancer. 
We show that PIONEER-predicted PPI-perturbing alleles alter protein abundance and correlate with drug responses and patient survival in colon and uterine cancers, as demonstrated by proteogenomic data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Junfei Zhao
- Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY 10032, USA
- Yadi Zhou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Shobhita Gupta
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Biophysics Program, Cornell University, Ithaca, NY 14853, USA
- Mateo Torres
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Weiqiang Lu
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Charis Eng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Joseph Loscalzo
- Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
44
Zhang S, Li J, Chen SJ. Machine learning in RNA structure prediction: Advances and challenges. Biophys J 2024:S0006-3495(24)00067-5. [PMID: 38297836 DOI: 10.1016/j.bpj.2024.01.026] [Received: 11/29/2023] [Revised: 01/08/2024] [Accepted: 01/24/2024] [Indexed: 02/02/2024]
Abstract
RNA molecules play a crucial role in various biological processes, with their functionality closely tied to their structures. The remarkable advancements in machine learning techniques for protein structure prediction have shown promise in the field of RNA structure prediction. In this perspective, we discuss the advances and challenges encountered in constructing machine learning-based models for RNA structure prediction. We explore topics including model building strategies, specific challenges involved in predicting RNA secondary (2D) and tertiary (3D) structures, and approaches to these challenges. In addition, we highlight the advantages and challenges of constructing RNA language models. Given the rapid advances of machine learning techniques, we anticipate that machine learning-based models will serve as important tools for predicting RNA structures, thereby enriching our understanding of RNA structures and their corresponding functions.
Affiliation(s)
- Sicheng Zhang
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
- Jun Li
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
- Shi-Jie Chen
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
- Department of Biochemistry, University of Missouri, Columbia, Missouri
45
Stein RA, Mchaourab HS. Rosetta Energy Analysis of AlphaFold2 models: Point Mutations and Conformational Ensembles. bioRxiv 2024:2023.09.05.556364. [PMID: 37732281 PMCID: PMC10508732 DOI: 10.1101/2023.09.05.556364] [Indexed: 09/22/2023]
Abstract
There has been an explosive growth in the applications of AlphaFold2, and other structure prediction platforms, to accurately predict protein structures from a multiple sequence alignment (MSA) for downstream structural analysis. However, two outstanding questions persist in the field regarding the robustness of AlphaFold2 predictions of the consequences of point mutations and the completeness of its prediction of protein conformational ensembles. We combined our previously developed method SPEACH_AF with model relaxation and energetic analysis with Rosetta to address these questions. SPEACH_AF introduces residue substitutions across the MSA and not just within the input sequence. With respect to conformational ensembles, we combined SPEACH_AF and a new MSA subsampling method, AF_cluster, and for a benchmarked set of proteins, we found that the energetics of the conformational ensembles generated by AlphaFold2 correspond to those of experimental structures and to those explored by standard molecular dynamics methods. With respect to point mutations, we compared the structural and energetic consequences of having the mutation(s) in the input sequence versus in the whole MSA (SPEACH_AF). Both methods yielded models different from the wild-type sequence, with more robust changes when the mutation(s) were in the whole MSA. While our findings demonstrate the robustness of AlphaFold2 in analyzing point mutations and exploring conformational ensembles, they highlight the need for multiparameter structural and energetic analyses of these models to generate experimentally testable hypotheses.
Affiliation(s)
- Richard A Stein
- Department of Molecular Physiology and Biophysics and Center for Applied AI in Protein Dynamics, Vanderbilt University
- Hassane S Mchaourab
- Department of Molecular Physiology and Biophysics and Center for Applied AI in Protein Dynamics, Vanderbilt University
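The contrast drawn in entry 45 between placing a mutation only in the input sequence and placing it across the whole MSA can be sketched as below. This is an illustration under stated assumptions, not the SPEACH_AF implementation: the MSA is taken to be a list of aligned strings with the query first, positions are 1-based in the ungapped query, and homologs with a gap in the mutated column are left gapped. The function names are hypothetical.

```python
from typing import List


def query_column(query_aligned: str, position: int) -> int:
    """Map a 1-based position in the ungapped query to its alignment column."""
    seen = 0
    for col, aa in enumerate(query_aligned):
        if aa != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"position {position} is beyond the query length ({seen})")


def mutate_query_only(msa: List[str], position: int, new_aa: str) -> List[str]:
    """Substitute the residue in the query (msa[0]) only; homologs are unchanged."""
    col = query_column(msa[0], position)
    return [msa[0][:col] + new_aa + msa[0][col + 1:]] + msa[1:]


def mutate_whole_msa(msa: List[str], position: int, new_aa: str) -> List[str]:
    """Substitute the residue in every aligned sequence at the query's column.
    Homologs with a gap in that column are left gapped (an assumption)."""
    col = query_column(msa[0], position)
    return [row if row[col] == "-" else row[:col] + new_aa + row[col + 1:]
            for row in msa]


msa = ["MK-LV", "MR-LI", "M-ALV"]  # hypothetical toy alignment, query first
print(mutate_query_only(msa, 3, "A"))  # ['MK-AV', 'MR-LI', 'M-ALV']
print(mutate_whole_msa(msa, 3, "A"))   # ['MK-AV', 'MR-AI', 'M-AAV']
```

In the first case the homologs still carry the wild-type residue in the mutated column, so the coevolutionary signal AlphaFold2 reads from the MSA largely points back to the original residue; the second case removes that signal from every row that has a residue there.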
46
Szadkowska M, Kocot AM, Sowik D, Wyrzykowski D, Jankowska E, Kozlowski LP, Makowska J, Plotka M. Molecular characterization of the PhiKo endolysin from Thermus thermophilus HB27 bacteriophage phiKo and its cryptic lytic peptide RAP-29. Front Microbiol 2024; 14:1303794. [PMID: 38312500 PMCID: PMC10836841 DOI: 10.3389/fmicb.2023.1303794] [Received: 09/28/2023] [Accepted: 12/12/2023] [Indexed: 02/06/2024]
Abstract
Introduction: In the era of increasing bacterial resistance to antibiotics, new bactericidal substances are sought, and lysins derived from extremophilic organisms have the undoubted advantage of being stable under harsh environmental conditions. The PhiKo endolysin is derived from the phiKo bacteriophage infecting the Gram-negative extremophilic bacterium Thermus thermophilus HB27. This enzyme shows similarity to two previously investigated thermostable type-2 amidases, the Ts2631 and Ph2119 from Thermus scotoductus bacteriophages, that revealed high lytic activity not only against thermophiles but also against Gram-negative mesophilic bacteria. Therefore, the antibacterial potential of the PhiKo endolysin was investigated in the study presented here.
Methods: Enzyme activity was assessed using turbidity reduction assays (TRAs) and antibacterial tests. Differential scanning calorimetry was applied to evaluate protein stability. The Collection of Anti-Microbial Peptides (CAMP) and Antimicrobial Peptide Calculator and Predictor (APD3) were used to predict regions with antimicrobial potential in the PhiKo primary sequence. The minimum inhibitory concentration (MIC) of the RAP-29 synthetic peptide was determined against selected Gram-positive and Gram-negative strains, and the mechanism of action was investigated using the membrane-potential-sensitive fluorescent dye 3,3'-dipropylthiacarbocyanine iodide (DiSC3(5)).
Results and discussion: The PhiKo endolysin is highly thermostable, with a melting temperature of 91.70°C. However, despite its lytic effect against extremophiles such as T. thermophilus, Thermus flavus, Thermus parvatiensis, Thermus scotoductus, and Deinococcus radiodurans, PhiKo showed moderate antibacterial activity against mesophiles. Consequently, its protein sequence was searched for regions with potential antibacterial activity. A highly positively charged region was identified and synthesized (PhiKo105-133).
The novel RAP-29 peptide lysed mesophilic strains of staphylococci and Gram-negative bacteria, reducing the number of cells by 3.7-7.1 log units and reaching the minimum inhibitory concentration values in the range of 2-31 μM. This peptide is unstructured in an aqueous solution but forms an α-helix in the presence of detergents. Moreover, it binds lipoteichoic acid and lipopolysaccharide, and causes depolarization of bacterial membranes. The RAP-29 peptide is a promising candidate for combating bacterial pathogens. The existence of this cryptic peptide testifies to a much wider panel of antimicrobial peptides than thought previously.
Affiliation(s)
- Monika Szadkowska
- Laboratory of Extremophiles Biology, Department of Microbiology, University of Gdańsk, Gdańsk, Poland
- Aleksandra Maria Kocot
- Laboratory of Extremophiles Biology, Department of Microbiology, University of Gdańsk, Gdańsk, Poland
- Daria Sowik
- Department of Biomedical Chemistry, Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
- Dariusz Wyrzykowski
- Department of General and Inorganic Chemistry, Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
- Elzbieta Jankowska
- Department of Biomedical Chemistry, Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
- Lukasz Pawel Kozlowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Joanna Makowska
- Department of General and Inorganic Chemistry, Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
- Magdalena Plotka
- Laboratory of Extremophiles Biology, Department of Microbiology, University of Gdańsk, Gdańsk, Poland
47
Krokidis MG, Dimitrakopoulos GN, Vrahatis AG, Exarchos TP, Vlamos P. Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024]
Affiliation(s)
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, Corfu, Greece
48
Roy BG, Choi J, Fuchs MF. Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality. Biomolecules 2024; 14:62. [PMID: 38254661 PMCID: PMC10813169 DOI: 10.3390/biom14010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024]
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, the functional repertoires of individual proteins remain incomplete, and modeling tools can help expand them. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software not only confirmed previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1AVSR, a short-chain reductase for protein 1BHel/VSR, a parathyroid hormone family domain for protein 1EPol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2BMP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2AHP/Sd, 2BMP, and 3A? had low confidence, while predictions for proteins 1AVSR, 1BHel*/VSR, 1CVPg, 1DPro, 1EPol*/Sd, and 2CCP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings.
Affiliation(s)
- Brandon G. Roy
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, 15 Castle Creek Drive, Geneva, NY 14456, USA; (J.C.); (M.F.F.)
49
Chowdhury NB, Simons-Senftle M, Decouard B, Quillere I, Rigault M, Sajeevan KA, Acharya B, Chowdhury R, Hirel B, Dellagi A, Maranas C, Saha R. A multi-organ maize metabolic model connects temperature stress with energy production and reducing power generation. iScience 2023; 26:108400. [PMID: 38077131 PMCID: PMC10709110 DOI: 10.1016/j.isci.2023.108400] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/07/2023] [Revised: 10/30/2023] [Accepted: 11/03/2023] [Indexed: 02/18/2024]
Abstract
Climate change has adversely affected maize productivity, so a holistic understanding of metabolic crosstalk among its organs is important for addressing this issue. We therefore reconstructed the first multi-organ maize metabolic model, iZMA6517, and contextualized it with heat and cold stress transcriptomics data using the expression distributed reaction flux measurement (EXTREAM) algorithm. Metabolic bottleneck analysis of the contextualized models then revealed differences between the two stresses: while both had reducing power bottlenecks, heat stress had additional energy generation bottlenecks. Thermodynamic driving force analysis further revealed a thermodynamics-reducing power-energy generation axis that dictates the nature of temperature stress responses. A temperature-tolerant maize ideotype can thus be engineered by leveraging this axis. As a proof of concept, we inoculated maize roots with a beneficial mycorrhizal fungus, Rhizophagus irregularis, and demonstrated its efficacy in alleviating temperature stress. Overall, this study will guide efforts to engineer temperature-stress-tolerant maize ideotypes.
Affiliation(s)
- Niaz Bahar Chowdhury
- Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
- Berengere Decouard
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000 Versailles, France
- Isabelle Quillere
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000 Versailles, France
- Martine Rigault
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000 Versailles, France
- Bibek Acharya
- Chemical and Biological Engineering, Iowa State University, Ames, IA, USA
- Ratul Chowdhury
- Chemical and Biological Engineering, Iowa State University, Ames, IA, USA
- Bertrand Hirel
- Centre de Versailles-Grignon, Institut National de Recherche pour l’Agriculture, Versailles, France
- Alia Dellagi
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000 Versailles, France
- Costas Maranas
- Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
- Rajib Saha
- Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
50
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, in recognition of the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15 there was no single stand-out group; most of the best-scoring groups, led by PEZYFoldings, UM-TBM, and Yang Server, employed AF2 in one way or another. Many top groups paid special attention to generating deep multiple sequence alignments (MSAs) and testing variant MSAs, which allowed them to successfully address some of the hardest targets. Besides lacking templates, such difficult targets were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces and with regions of high B-factors in crystal structure targets, and should not necessarily be considered to represent prediction error. However, analysis of exposed and buried side-chain accuracy showed room for improvement even for the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include groups applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas; the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Filomeno Sánchez Rodríguez
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
- Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
- Luc Elliott
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- David L. Murphy
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M. Keegan
- UKRI-STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
- Daniel J. Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK