1
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2
Yi L, Guo X, Liu Y, Jirimutu, Wang Z. Single-cell 5' RNA sequencing of camelid peripheral B cells provides insights into cellular basis of heavy-chain antibody production. Comput Struct Biotechnol J 2024; 23:1705-1714. [PMID: 38689719 PMCID: PMC11059136 DOI: 10.1016/j.csbj.2024.04.041] [Received: 01/19/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024] Open
Abstract
Camelids produce both conventional tetrameric antibodies (Abs) and dimeric heavy-chain antibodies (HCAbs). Although B cells that generate these two types of Abs exhibit distinct B cell receptors (BCRs), whether these two B cell populations differ in their phenotypes and developmental processes remains unclear. Here, we performed single-cell 5' RNA profiling of peripheral blood mononuclear cell samples from Bactrian camels before and after immunization. We characterized the functional subtypes and differentiation trajectories of circulating B cells in camels, and reconstructed single-cell BCR sequences. We found that in contrast to humans, the proportion of T-bet+ B cells was high among camelid peripheral B cells. Several marker genes of human B cell subtypes, including CD27 and IGHD, were expressed at low levels in the corresponding camel B cell subtypes. Camelid B cells expressing variable genes of HCAbs (VHH) were widely present in various functional subtypes and showed highly overlapping differentiation trajectories with B cells expressing variable genes of conventional Abs (VH). After immunization, the transcriptional changes in VHH+ and VH+ B cells were largely consistent. Through structure modeling, we identified a variety of scaffold types among the reconstructed VHH sequences. Our study provides insights into the cellular context of HCAb production in camels and lays the foundation for developing single-B cell-based camelid single-domain Ab screening.
Affiliation(s)
- Li Yi
  - Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, College of Food Science and Engineering, Inner Mongolia Agricultural University, Huhhot 010018, China
- Xin Guo
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yuexing Liu
  - Guangzhou Laboratory, Guangzhou 510005, China
- Jirimutu
  - Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, College of Food Science and Engineering, Inner Mongolia Agricultural University, Huhhot 010018, China
  - Inner Mongolia China-Kazakhstan Camel Research Institute, Alxa 750306, China
- Zhen Wang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
3
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058 PMCID: PMC10770625 DOI: 10.1016/j.csbj.2023.12.002] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024] Open
Abstract
The study of protein molecular surfaces enables better understanding and prediction of protein interactions. Different methods developed in computer vision to compare surfaces can be applied to protein molecular surfaces. The present work proposes a method using the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The descriptor of the PLO3S method is a local surface shape descriptor projected on a unit sphere, mapped onto a 2D plane, and called Surface Wave Interpolated Maps (SWIM). PLO3S allows rapid comparison of protein surface shapes through local comparisons, in order to filter large protein surface datasets in protein structure virtual screening protocols.
Affiliation(s)
- Léa Sirugue
  - Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld
  - Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde
  - Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes
  - Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
4
Sun M, Wu F, Xu Z, Wang Y, Cao J, Zhou Y, Zhou J, Zhang H, Xu Q. The TCTP is essential for ovarian development and oviposition of Rhipicephalus haemaphysaloides. Vet Parasitol 2024; 329:110212. [PMID: 38781831 DOI: 10.1016/j.vetpar.2024.110212] [Received: 02/23/2024] [Revised: 04/23/2024] [Accepted: 05/15/2024] [Indexed: 05/25/2024]
Abstract
Tick infestations transmit various infectious agents and result in significant socioeconomic consequences. Currently, the primary focus of tick control efforts is identifying potential targets for immune intervention. In a previous study, we identified a highly conserved protein abundant in tick haemolymph extracellular vesicles (EVs) known as translationally controlled tumour protein (TCTP). We have found that native TCTP is present in various tissues of the Rhipicephalus haemaphysaloides tick, including salivary glands, midgut, ovary, and fat body. Notably, TCTP is particularly abundant in the tick ovary and its levels increase progressively from the blood-feeding stage to engorgement. When the TCTP gene was knocked down by RNAi, there was a noticeable delay in ovarian development, and the reproductive performance, in terms of egg quantity and survival, was also hindered. Our investigations have revealed that the observed effects in ovary and eggs in dsRNA-treated ticks are not attributable to cell death mechanisms like apoptosis and autophagy but rather to the reduction in the expression of vitellogenin (Vg1, Vg2, and Vg3) and ferritin (ferritin 1 and ferritin 2) proteins crucial for ovarian development and embryo survival in ticks. Additionally, phylogenetic analysis and structural comparisons of RhTCTP and its orthologues across various tick species, vertebrate hosts, and humans have shown that TCTP is conserved in ticks but differs significantly between ticks and their hosts, particularly in the TCTP_1 and TCTP_2 domains. Overall, TCTP plays a vital role in tick reproductive development and presents itself as a potential target for tick control in both humans and animals.
Affiliation(s)
- Meng Sun
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China; Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Fei Wu
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China; College of Animal Sciences, Zhejiang Provincial Key Laboratory of Preventive Veterinary Medicine, Institute of Preventive Veterinary Medicine, Zhejiang University, Hangzhou 310058, China
- Zhengmao Xu
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Yanan Wang
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Jie Cao
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Yongzhi Zhou
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Jinlin Zhou
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Houshuang Zhang
  - Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai 200241, China
- Qianming Xu
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
5
Shimada N, Kameyama A, Watanabe M, Sahara T, Matsuzawa T. Identification and characterization of xyloglucan-degradation related α-1,2-l-fucosidase in Aspergillus oryzae. J Biosci Bioeng 2024:S1389-1723(24)00159-2. [PMID: 38871579 DOI: 10.1016/j.jbiosc.2024.05.011] [Received: 03/28/2024] [Revised: 05/14/2024] [Accepted: 05/25/2024] [Indexed: 06/15/2024]
Abstract
Xyloglucan in plant cell walls has complex side-chain structures; Aspergillus oryzae produces various enzymes to degrade and assimilate xyloglucan. In this study, we identified and characterized α-1,2-l-fucosidase (AfcA), which is involved in xyloglucan degradation in A. oryzae. AfcA expression was induced in the presence of xyloglucan oligosaccharides. AfcA showed specific activity toward α-(1→2)-linked l-fucopyranosyl residues attached to the side chains of xyloglucan oligosaccharides and milk oligosaccharides, but not toward α-(1→3)-, α-(1→4)-, and α-(1→6)-linked l-fucopyranosyl residues. As fucopyranosyl residues in the side chains of xyloglucan oligosaccharides prevent the degradation of xyloglucan oligosaccharides by isoprimeverose-producing oligoxyloglucan hydrolase and β-galactosidase, the cooperative action of AfcA, isoprimeverose-producing oligoxyloglucan hydrolase, and β-galactosidase plays a key role in degrading fucosylated xyloglucan in A. oryzae.
Affiliation(s)
- Naoki Shimada
  - Department of Applied Biological Science, Faculty of Agriculture, Kagawa University, 2393 Ikenobe, Miki, Kagawa 761-0795, Japan
- Akihiko Kameyama
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan
- Masahiro Watanabe
  - Research Institute for Sustainable Chemistry, National Institute of Advanced Industrial Science and Technology (AIST), 3-11-32 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-0046, Japan
- Takehiko Sahara
  - Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Tomohiko Matsuzawa
  - Department of Applied Biological Science, Faculty of Agriculture, Kagawa University, 2393 Ikenobe, Miki, Kagawa 761-0795, Japan
6
Zhou H, Skolnick J. Utility of the Morgan Fingerprint in Structure-Based Virtual Ligand Screening. J Phys Chem B 2024; 128:5363-5370. [PMID: 38783525 PMCID: PMC11163432 DOI: 10.1021/acs.jpcb.4c01875] [Received: 03/21/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In modern drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing and refinement due to its cost-effective nature for large compound libraries. For decades, efforts have been devoted to developing VLS methods with high accuracy. These include the state-of-the-art FINDSITE suite of approaches FINDSITEcomb2.0, FRAGSITE, and FRAGSITE2 and the meta version FRAGSITEcomb that were developed in our lab. These methods combine ligand homology modeling (LHM), traditional ligand similarity methods, and more recently machine learning approaches to rank ligands and have proven to be superior to most recent deep learning and large language model-based approaches. Here, we describe further improvements to our previous best methods by combining the Morgan fingerprint (MF) with the originally used PubChem fingerprint and FP2 fingerprint. We then benchmarked FINDSITEcomb2.0M, FRAGSITEM, FRAGSITE2M, and the composite meta-approach FRAGSITEcombM. On the 102-target DUD-E set, the 1% enrichment factor (EF1%) and area under the precision-recall curve (AUPR) of FRAGSITEcomb increased from 42.0/0.59 to 47.6/0.72. This 0.72 AUPR is significantly better than the AUPR of 0.443 achieved by the state-of-the-art deep learning-based method DenseFS. An independent test on the 81-target DEKOIS2.0 set shows that EF1%/AUPR increases from 18.3/0.520 to 23.1/0.683. An ablation investigation shows that the MF contributes to most of the improvement of all four approaches. Thus, the MF is a useful addition to structure-based VLS.
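The EF1% figure quoted in this abstract follows a widely used definition: the hit rate among the top-ranked 1% of a screened library divided by the hit rate of the whole library. The sketch below illustrates that general definition only. It is not the authors' benchmarking code, and the function name `enrichment_factor` and its parameters are hypothetical names chosen for this example.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    scores: predicted scores, higher meaning more likely active (assumption).
    labels: 1 for a known active, 0 for a decoy.
    fraction: top fraction of the ranked list to inspect (0.01 gives EF1%).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice; always inspect at least one compound.
    n_top = max(1, int(round(n_total * fraction)))
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy library: 100 compounds, the 10 highest-scoring ones are actives.
toy_scores = [float(i) for i in range(100)]
toy_labels = [1 if i >= 90 else 0 for i in range(100)]
ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

With a perfect ranking the value is capped by the active-to-library ratio, which is why enrichment factors are usually only compared within the same benchmark set.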
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
7
Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024; 630:493-500. [PMID: 38718835 PMCID: PMC11168924 DOI: 10.1038/s41586-024-07487-w] [Received: 12/19/2023] [Accepted: 04/29/2024] [Indexed: 06/13/2024]
Abstract
The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7,8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.
Affiliation(s)
- Jonas Adler
  - Core Contributor, Google DeepMind, London, UK
- Jack Dunger
  - Core Contributor, Google DeepMind, London, UK
- Tim Green
  - Core Contributor, Google DeepMind, London, UK
- Zachary Wu
  - Core Contributor, Google DeepMind, London, UK
- Yousuf A Khan
  - Google DeepMind, London, UK
  - Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA, USA
- Ellen D Zhong
  - Google DeepMind, London, UK
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Demis Hassabis
  - Core Contributor, Google DeepMind, London, UK
  - Core Contributor, Isomorphic Labs, London, UK
8
Bernard C, Postic G, Ghannay S, Tahi F. State-of-the-RNArt: benchmarking current methods for RNA 3D structure prediction. NAR Genom Bioinform 2024; 6:lqae048. [PMID: 38745991 PMCID: PMC11091930 DOI: 10.1093/nargab/lqae048] [Received: 02/08/2024] [Revised: 04/05/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open
Abstract
RNAs are essential molecules involved in numerous biological functions. Understanding RNA functions requires knowledge of their 3D structures. Computational methods have been developed for over two decades to predict 3D conformations from RNA sequences. These computational methods have been widely used and are usually categorised as either ab initio or template-based. Their performance remains to be improved. Recently, the rise of deep learning has changed the outlook for novel approaches. Deep learning methods are promising, but their adaptation to RNA 3D structure prediction remains difficult. In this paper, we give a brief review of the ab initio, template-based and novel deep learning approaches. We highlight the different available tools and provide a benchmark on nine methods using the RNA-Puzzles dataset. We provide an online dashboard that shows the predictions made by benchmarked methods, freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/state_of_the_rnart/.
Affiliation(s)
- Clément Bernard
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
  - LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
- Guillaume Postic
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
- Sahar Ghannay
  - LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
- Fariza Tahi
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
9
Han Y, Lu Y, Yan X, Cui H, Cheng S, Zheng J, Zhou Y, Wang S, Li Z. Atom-ProteinQA: Atom-level protein model quality assessment through fine-grained joint learning. Comput Methods Programs Biomed 2024; 249:108078. [PMID: 38537495 DOI: 10.1016/j.cmpb.2024.108078] [Received: 08/08/2023] [Revised: 12/26/2023] [Accepted: 02/10/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION: Protein model quality assessment (ProteinQA) is a fundamental task that is essential for biologically relevant applications such as protein structure refinement and protein design. Previous works aimed to conduct ProteinQA only on the global structure or per-residue level, ignoring potentially usable and precise cues from a fine-grained per-atom perspective. In this study, we propose an atom-level ProteinQA model, named Atom-ProteinQA, in which two innovative modules are designed to extract geometric and topological atom-level relationships respectively. Specifically, on the one hand, a geometric perception module exploits 3D sparse convolution to capture the geometric features of the input protein, generating fine-grained atom-level predictions. On the other hand, natural chemical bonds are utilized to construct an atom-level graph, then message passing from a topological perception module is applied to output residue-level predictions in parallel. Eventually, through a cross-model aggregation module, features from different modules mutually interact, enhancing performance on both the atom and residue levels. RESULTS: Extensive experiments show that our proposed Atom-ProteinQA outperforms previous methods by a large margin in both residue-level and atom-level assessment. Concretely, we achieved state-of-the-art performance on CATH-2084, Decoy-8000, the public benchmarks CASP13 and CASP14, and CAMEO. AVAILABILITY: The repository of this project is released at: https://github.com/luyfcandy/Atom_ProteinQA.
Affiliation(s)
- Yatong Han
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yingfeng Lu
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Xu Yan
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Hannah Cui
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Jiayou Zheng
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yuzhe Zhou
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
- Zhen Li
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
10
Tejera-Nevado P, Serrano E, González-Herrero A, Bermejo R, Rodríguez-González A. Unlocking the power of AI models: exploring protein folding prediction through comparative analysis. J Integr Bioinform 2024; 0:jib-2023-0041. [PMID: 38797876 DOI: 10.1515/jib-2023-0041] [Received: 10/17/2023] [Accepted: 04/10/2024] [Indexed: 05/29/2024] Open
Abstract
Protein structure determination has made progress with the aid of deep learning models, enabling the prediction of protein folding from protein sequences. However, obtaining accurate predictions becomes essential in certain cases where the protein structure remains undescribed. This is particularly challenging when dealing with rare, diverse structures and complex sample preparation. Different metrics assess prediction reliability and offer insights into result strength, providing a comprehensive understanding of protein structure by combining different models. In a previous study, two proteins named ARM58 and ARM56 were investigated. These proteins contain four domains of unknown function and are present in Leishmania spp. ARM refers to an antimony resistance marker. The study's main objective is to assess the accuracy of the model's predictions, thereby providing insights into the complexities and supporting metrics underlying these findings. The analysis also extends to the comparison of predictions obtained from other species and organisms. Notably, one of these proteins shares an ortholog with Trypanosoma cruzi and Trypanosoma brucei, lending further significance to our analysis. This attempt underscored the importance of evaluating the diverse outputs from deep learning models, facilitating comparisons across different organisms and proteins. This becomes particularly pertinent in cases where no previous structural information is available.
Affiliation(s)
- Paloma Tejera-Nevado
  - ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Emilio Serrano
  - ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
- Ana González-Herrero
  - Margarita Salas Center for Biological Research (CIB-CSIC), Spanish National Research Council, Madrid, Spain
- Rodrigo Bermejo
  - Margarita Salas Center for Biological Research (CIB-CSIC), Spanish National Research Council, Madrid, Spain
- Alejandro Rodríguez-González
  - ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
11
Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O'Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YEA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods 2024:10.1038/s41592-024-02272-z. [PMID: 38744917 DOI: 10.1038/s41592-024-02272-z] [Received: 08/14/2023] [Accepted: 04/03/2024] [Indexed: 05/16/2024]
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set are deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Affiliation(s)
- Gustaf Ahdritz
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Harvard University, Cambridge, MA, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Qinghui Xia
  - Department of Systems Biology, Columbia University, New York, NY, USA
- William Gerecke
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Daniel Berenberg
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Ian Fisk
  - Flatiron Institute, New York, NY, USA
- Bo Zhang
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- Stella Biderman
  - EleutherAI, New York, NY, USA
  - Booz Allen Hamilton, McLean, VA, USA
- Stephen Ra
  - Prescient Design, Genentech, New York, NY, USA
- Minjia Zhang
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Peter K Sorger
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Zhao Zhang
  - Rutgers University, New Brunswick, NJ, USA
12
Fazekas Z, K Menyhárd D, Perczel A. LoCoHD: a metric for comparing local environments of proteins. Nat Commun 2024; 15:4029. [PMID: 38740745 DOI: 10.1038/s41467-024-48225-0] [Received: 07/29/2023] [Accepted: 04/22/2024] [Indexed: 05/16/2024] Open
Abstract
Protein folds and the local environments they create can be compared using a variety of differently designed measures, such as the root mean squared deviation, the global distance test, the template modeling score or the local distance difference test. Although these measures have proven to be useful for a variety of tasks, each fails to fully incorporate the valuable chemical information inherent to atoms and residues, and considers these only partially and indirectly. Here, we develop the highly flexible local composition Hellinger distance (LoCoHD) metric, which is based on the chemical composition of local residue environments. Using LoCoHD, we analyze the chemical heterogeneity of amino acid environments and identify valines as having the most conserved, and arginines as having the most variable, chemical environments. We use LoCoHD to investigate structural ensembles, to evaluate critical assessment of structure prediction (CASP) competitors, to compare the results with the local distance difference test (lDDT) scoring system, and to evaluate a molecular dynamics simulation. We show that LoCoHD measurements provide unique information about protein structures that is distinct from, for example, those derived using the alignment-based RMSD metric, or the similarly distance matrix-based but alignment-free lDDT metric.
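The metric named in this abstract builds on the Hellinger distance, a standard measure of dissimilarity between two probability distributions. The sketch below shows only that textbook distance, applied to two composition vectors (for example, counts of chemical group types around a residue). It makes no claim to reproduce the LoCoHD procedure itself, and the function name `hellinger_distance` and the toy composition vectors are assumptions made for illustration.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p, q: non-negative counts or weights over the same categories.
    Both vectors are normalised to sum to 1 before comparison.
    Returns a value in [0, 1]: 0 for identical compositions,
    1 for compositions with no category in common.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("each distribution needs positive total weight")
    squared = sum(
        (math.sqrt(a / sum_p) - math.sqrt(b / sum_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


# Hypothetical compositions of two local environments over three group types.
same = hellinger_distance([2, 2, 0], [1, 1, 0])      # same proportions
disjoint = hellinger_distance([1, 0, 0], [0, 1, 0])  # no shared category
```

Because the vectors are normalised first, the distance depends on relative composition rather than on how many atoms fall inside the environment.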
Affiliation(s)
- Zsolt Fazekas
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- ELTE Hevesy György PhD School of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Dóra K Menyhárd
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- HUN-REN-ELTE Protein Modeling Research Group, ELTE Eötvös Loránd University, Budapest, Hungary
| | - András Perczel
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- HUN-REN-ELTE Protein Modeling Research Group, ELTE Eötvös Loránd University, Budapest, Hungary
| |
|
13
|
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Nolan Park
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| |
|
14
|
Zhang Z, Shen W, Liu Q, Zitnik M. PocketGen: Generating Full-Atom Ligand-Binding Protein Pockets. bioRxiv 2024:2024.02.25.581968. [PMID: 38464121 PMCID: PMC10925136 DOI: 10.1101/2024.02.25.581968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Designing small-molecule-binding proteins, such as enzymes and biosensors, is essential in protein biology and bioengineering. Generating high-fidelity protein pockets (areas where proteins interact with ligand molecules) is challenging due to the complex interactions between ligand molecules and proteins, the flexibility of ligand molecules and amino acid side chains, and intricate sequence-structure dependencies. We introduce PocketGen, a deep generative method that produces the residue sequence and the full-atom structure within the protein pocket region, leveraging sequence-structure consistency. PocketGen comprises a bilevel graph transformer for structural encoding and a sequence refinement module utilizing a protein language model (pLM) for sequence prediction. The bilevel graph transformer captures interactions at multiple granularities (atom-level and residue/ligand-level) and aspects (intra-protein and protein-ligand) through bilevel attention mechanisms. A structural adapter employing cross-attention is integrated into the pLM for sequence refinement to ensure consistency between structure-based and sequence-based prediction. During training, only the adapter is fine-tuned, while the other layers of the pLM remain unchanged. Experiments demonstrate that PocketGen can efficiently generate protein pockets with higher binding affinity and validity than state-of-the-art methods. PocketGen is ten times faster than physics-based methods and achieves a 95% success rate (percentage of generated pockets with higher binding affinity than reference pockets) with an amino acid recovery rate exceeding 64%.
|
15
|
Manfredi M, Savojardo C, Iardukhin G, Salomoni D, Costantini A, Martelli PL, Casadio R. Alpha&ESMhFolds: A Web Server for Comparing AlphaFold2 and ESMFold Models of the Human Reference Proteome. J Mol Biol 2024:168593. [PMID: 38718922 DOI: 10.1016/j.jmb.2024.168593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 04/22/2024] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
We develop a novel database, Alpha&ESMhFolds, which allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome and, when available, their comparison with 2,900 directly associated PDB structures with a structure-to-sequence coverage of at least 70%. Statistics indicate that good quality models tend to overlap with a TM-score >0.6 as long as some PDB structural information is available. As expected, a direct model superimposition to the PDB structure highlights that AlphaFold2 models are slightly superior to ESMFold ones. However, some 55% of the database is endowed with models overlapping with TM-score <0.6. This highlights the different outputs of the two methods. The database is freely available for usage at https://alpha-esmhfolds.biocomp.unibo.it/.
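The TM-score threshold quoted in the abstract above can be made concrete with a minimal sketch. It uses the standard TM-score definition, with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8, but assumes the two models are already superimposed and aligned residue by residue; the real TM-score maximizes over superpositions, and that search is omitted here. The function names are hypothetical and are not part of the Alpha&ESMhFolds server.

```python
def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if l_target <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # Simplification: very short targets are rejected rather than clamped.
        raise ValueError("target too short for this sketch")
    return d0


def tm_score_fixed(distances, l_target: int) -> float:
    """TM-score for ONE fixed superposition (no optimization).

    `distances` holds the distance in angstroms between each aligned
    residue pair; unaligned target residues simply contribute nothing.
    """
    d0 = tm_d0(l_target)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / l_target


# Hypothetical example: 100-residue target, every aligned pair 2 angstroms apart.
score = tm_score_fixed([2.0] * 100, 100)
print(score, score > 0.6)
```

Since the sketch skips the superposition search, its value is a lower bound on the true TM-score for the same alignment.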
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Georgii Iardukhin
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | | | | | - Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| |
|
16
|
Mallick B, Dutta A, Mondal P, Dutta M. Proteomic analysis and protein structure prediction of Shigella phage Sfk20 based on a comparative study using structure prediction approaches. Proteins 2024; 92:637-648. [PMID: 38146101 DOI: 10.1002/prot.26653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 11/21/2023] [Accepted: 12/01/2023] [Indexed: 12/27/2023]
Abstract
Bacteriophages are the natural predators of bacteria and are abundant everywhere in nature. Lytic phages can specifically infect their bacterial host (through attachment to the receptor) and use the host replication machinery to replicate rapidly, a feature that enables them to kill disease-causing bacteria. Hence, phage attachment to the host bacteria is the first important step of the infection process. This study reports that the receptor responsible for the attachment of the Sfk20 phage to its host (Shigella flexneri 2a) could be an LPS. Phage Sfk20 bacteriolytic activity was examined for preliminary optimization of phage titer. The viability of phage Sfk20 under different saline conditions was also assessed. The LC-MS/MS technique, used here to detect and identify 40 Sfk20 phage proteins, helped us gain an initial understanding of the structural landscape of phage Sfk20. From the identified proteins, six structurally significant proteins were selected for structure prediction using two neural network systems (AlphaFold2 and ESMFold) and one homology modeling program (Phyre2). The performance of these modeling systems was then compared using various metrics. We conclude from the available and generated information that AlphaFold2 and Phyre2 perform better than ESMFold for predicting Sfk20 phage protein structures.
Affiliation(s)
- Bani Mallick
- Division of Electron Microscopy, ICMR-National Institute of Cholera & Enteric Diseases, Kolkata, West Bengal, India
| | - Aninda Dutta
- Division of Electron Microscopy, ICMR-National Institute of Cholera & Enteric Diseases, Kolkata, West Bengal, India
| | - Payel Mondal
- Division of Electron Microscopy, ICMR-National Institute of Cholera & Enteric Diseases, Kolkata, West Bengal, India
| | - Moumita Dutta
- Division of Electron Microscopy, ICMR-National Institute of Cholera & Enteric Diseases, Kolkata, West Bengal, India
| |
|
17
|
Gogishvili D, Illes-Toth E, Harris MJ, Hopley C, Teunissen CE, Abeln S. Structural flexibility and heterogeneity of recombinant human glial fibrillary acidic protein (GFAP). Proteins 2024; 92:649-664. [PMID: 38149328 DOI: 10.1002/prot.26656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/07/2023] [Accepted: 12/12/2023] [Indexed: 12/28/2023]
Abstract
Glial fibrillary acidic protein (GFAP) is a promising biomarker for brain and spinal cord disorders. Recent studies have highlighted the differences in the reliability of GFAP measurements in different biological matrices. The reason for these discrepancies is poorly understood as our knowledge of the protein's 3-dimensional conformation, proteoforms, and aggregation remains limited. Here, we investigate the structural properties of GFAP under different conditions. For this, we characterized recombinant GFAP proteins from various suppliers and applied hydrogen-deuterium exchange mass spectrometry (HDX-MS) to provide a snapshot of the conformational dynamics of GFAP in artificial cerebrospinal fluid (aCSF) compared to the phosphate buffer. Our findings indicate that recombinant GFAP exists in various conformational species. Furthermore, we show that GFAP dimers remained intact under denaturing conditions. HDX-MS experiments show an overall decrease in H-bonding and an increase in solvent accessibility of GFAP in aCSF compared to the phosphate buffer, with clear indications of mixed EX2 and EX1 kinetics. To understand possible structural interface regions and the evolutionary conservation profiles, we combined HDX-MS results with the predicted GFAP-dimer structure by AlphaFold-Multimer. We found that deprotected regions with high structural flexibility in aCSF overlap with predicted conserved dimeric 1B and 2B domain interfaces. Structural property predictions combined with the HDX data show an overall deprotection and signatures of aggregation in aCSF. We anticipate that the outcomes of this research will contribute to a deeper understanding of the structural flexibility of GFAP and ultimately shed light on its behavior in different biological matrices.
Affiliation(s)
- Dea Gogishvili
- Bioinformatics, Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University, Utrecht, The Netherlands
| | - Eva Illes-Toth
- National Measurement Laboratory at Laboratory of the Government Chemist (LGC), Teddington, UK
| | - Matthew J Harris
- National Measurement Laboratory at Laboratory of the Government Chemist (LGC), Teddington, UK
| | - Christopher Hopley
- National Measurement Laboratory at Laboratory of the Government Chemist (LGC), Teddington, UK
| | - Charlotte E Teunissen
- Amsterdam Neuroscience, Neurodegeneration, Amsterdam, The Netherlands
- Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, The Netherlands
| | - Sanne Abeln
- Bioinformatics, Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University, Utrecht, The Netherlands
| |
|
18
|
Mischley V, Maier J, Chen J, Karanicolas J. PPIscreenML: Structure-based screening for protein-protein interactions using AlphaFold. bioRxiv 2024:2024.03.16.585347. [PMID: 38559274 PMCID: PMC10979958 DOI: 10.1101/2024.03.16.585347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein-protein interactions underlie nearly all cellular processes. With the advent of protein structure prediction methods such as AlphaFold2 (AF2), models of specific protein pairs can be built extremely accurately in most cases. However, determining the relevance of a given protein pair remains an open question. It is presently unclear how best to use structure-based tools to infer whether a pair of candidate proteins indeed interact with one another: ideally, one might even use such information to screen amongst candidate pairings to build up protein interaction networks. Whereas methods for evaluating the quality of modeled protein complexes have been co-opted for determining which pairings interact (e.g., pDockQ and iPTM), there have been no rigorously benchmarked methods for this task. Here we introduce PPIscreenML, a classification model trained to distinguish AF2 models of interacting protein pairs from AF2 models of compelling decoy pairings. We find that PPIscreenML out-performs methods such as pDockQ and iPTM for this task, and further that PPIscreenML exhibits impressive performance when identifying which ligand/receptor pairings engage one another across the structurally conserved tumor necrosis factor superfamily (TNFSF). Analysis of benchmark results using complexes not seen in PPIscreenML development strongly suggests that the model generalizes beyond training data, making it broadly applicable for identifying new protein complexes based on structural models built with AF2.
|
19
|
Tan X, Han Y, Zhai S, Dong H, Zhang T, Zhang K. An Integrated Analytical Approach for Screening Functional Post-Translational Modification Sites in Metabolic Enzymes. ACS Omega 2024; 9:19003-19008. [PMID: 38708225 PMCID: PMC11064186 DOI: 10.1021/acsomega.3c09514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/07/2024] [Accepted: 04/10/2024] [Indexed: 05/07/2024]
Abstract
Post-translational modifications (PTMs) are pivotal in the orchestration of diverse physiological and pathological processes. Despite this, the identification of functional PTM sites within the vast amount of data remains challenging. Conventionally, those PTM sites are discerned through labor-intensive and time-consuming experiments. Here, we developed an integrated analytical approach for the identification of functional PTM sites on metabolic enzymes via a screening process. Through gene ontology (GO) analysis, we identified 269 enzymes with lysine 2-hydroxyisobutyrylation (Khib) from our proteomics data set of Escherichia coli. The first round of screening was performed based on the enzyme structures/predicted structures using the TM-score engineer, a tool designed to evaluate the impact of PTM on the protein structure. Subsequently, we examined the influence of Khib on the enzyme-substrate interactions through both static and dynamic analyses, molecular docking, and molecular dynamics simulation. Ultimately, we identified NfsB K181hib and ThiF K83hib as potential functional sites. This work has established a novel analytical approach for the identification of functional protein PTM sites, thereby contributing to the understanding of Khib functions.
Affiliation(s)
- Xiaoxia Tan
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Yue Han
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Shengrui Zhai
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hanyang Dong
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Tao Zhang
- School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
| | - Kai Zhang
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| |
|
20
|
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
- KUIS AI Center, Koç University, Istanbul 34450, Turkey
- Computer Engineering, Koç University, Istanbul 34450, Turkey
| | - Ozlem Keskin
- Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
| | - Attila Gursoy
- Computer Engineering, Koç University, Istanbul 34450, Turkey
| |
|
21
|
Krishna R, Wang J, Ahern W, Sturmfels P, Venkatesh P, Kalvet I, Lee GR, Morey-Burrows FS, Anishchenko I, Humphreys IR, McHugh R, Vafeados D, Li X, Sutherland GA, Hitchcock A, Hunter CN, Kang A, Brackenbrough E, Bera AK, Baek M, DiMaio F, Baker D. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024; 384:eadl2528. [PMID: 38452047 DOI: 10.1126/science.adl2528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.
Affiliation(s)
- Rohith Krishna
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Woody Ahern
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Pascal Sturmfels
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Preetham Venkatesh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Indrek Kalvet
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | - Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | | | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Xinting Li
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | | | - Andrew Hitchcock
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - C Neil Hunter
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - Alex Kang
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Evans Brackenbrough
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Asim K Bera
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Minkyung Baek
- School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
| | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| |
|
22
|
Waterhouse AM, Studer G, Robin X, Bienert S, Tauriello G, Schwede T. The structure assessment web server: for proteins, complexes and more. Nucleic Acids Res 2024:gkae270. [PMID: 38634802 DOI: 10.1093/nar/gkae270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 03/21/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
The 'structure assessment' web server is a one-stop shop for interactive evaluation and benchmarking of structural models of macromolecular complexes including proteins and nucleic acids. A user-friendly web dashboard links sequence with structure information and results from a variety of state-of-the-art tools, which facilitates the visual exploration and evaluation of structure models. The dashboard integrates stereochemistry information, secondary structure information, global and local model quality assessment of the tertiary structure of comparative protein models, as well as prediction of membrane location. In addition, a benchmarking mode is available where a model can be compared to a reference structure, providing easy access to scores that have been used in recent CASP experiments and CAMEO. The structure assessment web server is available at https://swissmodel.expasy.org/assess.
Affiliation(s)
- Andrew M Waterhouse
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| | - Gabriel Studer
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| | - Xavier Robin
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
| |
|
23
|
Chen J, Wu H, Wang N. KEGG orthology prediction of bacterial proteins using natural language processing. BMC Bioinformatics 2024; 25:146. [PMID: 38600441 PMCID: PMC11007918 DOI: 10.1186/s12859-024-05766-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/03/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND: The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making it necessary to use auto annotation tools. These tools are now indispensable in the biological research landscape, bridging the gap between the vastness of unannotated sequences and meaningful biological insights. RESULTS: In this work, we propose a novel pipeline for KEGG orthology annotation of bacterial protein sequences that uses natural language processing and deep learning. To assess the effectiveness of our pipeline, we conducted evaluations using the genomes of two randomly selected species from the KEGG database. In our evaluation, we obtain competitive results on precision, recall, and F1 score, with values of 0.948, 0.947, and 0.947, respectively. CONCLUSIONS: Our experimental results suggest that our pipeline demonstrates performance comparable to traditional methods and excels in identifying distant relatives with low sequence identity. This demonstrates the potential of our pipeline to significantly improve the accuracy and comprehensiveness of KEGG orthology annotation, thereby advancing our understanding of functional relationships within biological systems.
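For readers who want to relate the three metrics reported in the abstract above, a minimal sketch of the standard F1 definition (the harmonic mean of precision and recall) follows. The function name is hypothetical, and the abstract does not say how the authors averaged their scores (for example per class or per genome), so the reported F1 need not equal the value derived from the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
print(round(f1_score(0.948, 0.947), 4))
```

The harmonic mean is always pulled toward the smaller of the two inputs, which is why a pipeline cannot reach a high F1 by maximizing precision or recall alone.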
Affiliation(s)
- Jing Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
- Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi, China
| | - Haoyu Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Ning Wang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| |
|
24
|
Capponi S, Wang S. AI in cellular engineering and reprogramming. Biophys J 2024:S0006-3495(24)00245-5. [PMID: 38576162 DOI: 10.1016/j.bpj.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/19/2024] [Accepted: 04/01/2024] [Indexed: 04/06/2024] Open
Abstract
During the last decade, artificial intelligence (AI) has increasingly been applied in biophysics and related fields, including cellular engineering and reprogramming, offering novel approaches to understand, manipulate, and control cellular function. The potential of AI lies in its ability to analyze complex datasets and generate predictive models. AI algorithms can process large amounts of data from single-cell genomics and multiomic technologies, allowing researchers to gain mechanistic insights into the control of cell identity and function. By integrating and interpreting these complex datasets, AI can help identify key molecular events and regulatory pathways involved in cellular reprogramming. This knowledge can inform the design of precision engineering strategies, such as the development of new transcription factor and signaling molecule cocktails, to manipulate cell identity and drive authentic cell fate across lineage boundaries. Furthermore, when used in combination with computational methods, AI can accelerate and improve the analysis and understanding of the intricate relationships between genes, proteins, and cellular processes. In this review article, we explore the current state of AI applications in biophysics with a specific focus on cellular engineering and reprogramming. Then, we showcase a couple of recent applications where we combined machine learning with experimental and computational techniques. Finally, we briefly discuss the challenges and prospects of AI in cellular engineering and reprogramming, emphasizing the potential of these technologies to revolutionize our ability to engineer cells for a variety of applications, from disease modeling and drug discovery to regenerative medicine and biomanufacturing.
Affiliation(s)
- Sara Capponi
- IBM Almaden Research Center, San Jose, California; Center for Cellular Construction, San Francisco, California.
| | - Shangying Wang
- Bay Area Institute of Science, Altos Labs, Redwood City, California.
|
25
|
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] Open
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although inter-protein contact prediction has improved significantly in recent years, there is still considerable room for improving prediction accuracy. Here we present a new deep learning method, referred to as PLMGraph-Inter, for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we show that the predictions of PLMGraph-Inter can complement the results of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
|
26
|
Liu Q, Fu Q, Yan Y, Jiang Q, Mao L, Wang L, Yu F, Zheng H. Curation, nomenclature, and topological classification of receptor-like kinases from 528 plant species for novel domain discovery and functional inference. Mol Plant 2024; 17:658-671. [PMID: 38384130 DOI: 10.1016/j.molp.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 01/25/2024] [Accepted: 02/19/2024] [Indexed: 02/23/2024]
Abstract
Receptor-like kinases (RLKs) are the most numerous signal transduction components in plants and play important roles in determining how different plants adapt to their ecological environments. Research on RLKs has focused mainly on a small number of typical RLK members in a few model plants. There is an urgent need to study the composition, distribution, and evolution of RLKs at the holistic level to increase our understanding of how RLKs assist in the ecological adaptations of different plant species. In this study, we collected the genome assemblies of 528 plant species and constructed an RLK dataset. Using this dataset, we identified and characterized 524 948 RLK family members. Each member underwent systematic topological classification and was assigned a gene ID based on a unified nomenclature system. Furthermore, we identified two novel extracellular domains in some RLKs, designated Xiao and Xiang. Evolutionary analysis of the RLK family revealed that the RLCK-XVII and RLCK-XII-2 classes were present exclusively in dicots, suggesting that diversification of RLKs between monocots and dicots may have led to differences in downstream cytoplasmic responses. We also used an interaction proteome to help empower data mining for inference of new RLK functions from a global perspective, with the ultimate goal of understanding how RLKs shape the adaptation of different plants to the environments/ecosystems. The assembled RLK dataset, together with annotations and analytical tools, forms an integrated foundation of multiomics data that is publicly accessible via the metaRLK web portal (http://metaRLK.biocloud.top).
Affiliation(s)
- Qian Liu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Qiong Fu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Yujie Yan
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Qian Jiang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Longfei Mao
- Bioinformatics Center, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Long Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Feng Yu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China.
| | - Heping Zheng
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China; Bioinformatics Center, Hunan University College of Biology, Changsha, Hunan 410082, China.
|
27
|
Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, Yang J, Zhu S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775. [PMID: 38555371 PMCID: PMC10981738 DOI: 10.1038/s41467-024-46808-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024] Open
Abstract
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method that takes only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains its similarity prediction model on a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds, like MMseqs2, while increasing the sensitivity by more than threefold, and is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
Affiliation(s)
- Wei Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ziye Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ronghui You
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Chenghan Xie
- School of Mathematical Sciences, Fudan University, 200433, Shanghai, China
| | - Hong Wei
- School of Mathematical Sciences, Nankai University, 300071, Tianjin, China
| | - Yi Xiong
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China.
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
- Shanghai Qi Zhi Institute, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
|
28
|
Zhang C, Zhang C, Shang T, Zhu N, Wu X, Duan H. HighFold: accurately predicting structures of cyclic peptides and complexes with head-to-tail and disulfide bridge constraints. Brief Bioinform 2024; 25:bbae215. [PMID: 38706323 PMCID: PMC11070728 DOI: 10.1093/bib/bbae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 04/12/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
In recent years, cyclic peptides have emerged as a promising therapeutic modality due to their diverse biological activities. Understanding the structures of these cyclic peptides and their complexes is crucial for gaining insights into protein target-cyclic peptide interactions, which can facilitate the development of novel related drugs. However, experimental observation is time-consuming and expensive, and computer-aided drug design methods are not practical enough in real-world applications. To tackle this challenge, in this study we introduce HighFold, an AlphaFold-derived model. By integrating specific details about the head-to-tail circle and disulfide bridge structures, the HighFold model can accurately predict the structures of cyclic peptides and their complexes. Our model demonstrates superior predictive performance compared to other existing approaches, representing a significant advancement in structure-activity research. The HighFold model is openly accessible at https://github.com/hongliangduan/HighFold.
Affiliation(s)
- Chenhao Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Chengyun Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- AI department, Shanghai Highslab Therapeutics. Inc, Shanghai, 201203, China
| | - Tianfeng Shang
- AI department, Shanghai Highslab Therapeutics. Inc, Shanghai, 201203, China
| | - Ning Zhu
- China Pharmaceutical University, Nanjing, Jiangsu, 211198, China
| | - Xinyi Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, China
|
29
|
Jing X, Wu F, Luo X, Xu J. Single-sequence protein structure prediction by integrating protein language models. Proc Natl Acad Sci U S A 2024; 121:e2308788121. [PMID: 38507445 PMCID: PMC10990103 DOI: 10.1073/pnas.2308788121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 02/05/2024] [Indexed: 03/22/2024] Open
Abstract
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models will impact the performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
Affiliation(s)
| | - Fandi Wu
- MoleculeMind Ltd., Beijing 100084, China
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Xiao Luo
- Toyota Technological Institute at Chicago, Chicago, IL 60637
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
| | - Jinbo Xu
- MoleculeMind Ltd., Beijing 100084, China
- Toyota Technological Institute at Chicago, Chicago, IL 60637
|
30
|
Zimmerman L, Alon N, Levin I, Koganitsky A, Shpigel N, Brestel C, Lapidoth GD. Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable and active enzymes. Proc Natl Acad Sci U S A 2024; 121:e2313809121. [PMID: 38437538 PMCID: PMC10945820 DOI: 10.1073/pnas.2313809121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 02/09/2024] [Indexed: 03/06/2024] Open
Abstract
The potential of engineered enzymes in industrial applications is often limited by their expression levels, thermal stability, and catalytic diversity. De novo enzyme design faces challenges due to the complexity of enzymatic catalysis. An alternative approach involves expanding natural enzyme capabilities for new substrates and parameters. Here, we introduce CoSaNN (Conformation Sampling using Neural Network), an enzyme design strategy using deep learning for structure prediction and sequence optimization. CoSaNN controls enzyme conformations to expand chemical space beyond simple mutagenesis. It employs a context-dependent approach for generating enzyme designs, considering non-linear relationships in sequence and structure space. We also developed SolvIT, a graph NN predicting protein solubility in Escherichia coli, optimizing enzyme expression selection from larger design sets. Using this method, we engineered enzymes with superior expression levels, with 54% expressed in E. coli, and increased thermal stability, with over 30% having higher Tm than the template, with no high-throughput screening. Our research underscores AI's transformative role in protein design, capturing high-order interactions and preserving allosteric mechanisms in extensively modified enzymes, and notably enhancing expression success rates. This method's ease of use and efficiency streamlines enzyme design, opening broad avenues for biotechnological applications and broadening field accessibility.
Affiliation(s)
| | - Noga Alon
- Enzymit Ltd., Ness-Ziona 7403626, Israel
|
31
|
Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024; 25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024] Open
Abstract
In every omics experiment, genes or their products are identified for which even state-of-the-art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses, since these proteins can carry out functions that impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of AlphaFold predictions for all UniProt proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins, which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
Affiliation(s)
- Steven Tavis
- Genome Science and Technology Graduate Program, University of Tennessee Knoxville, Knoxville, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
|
32
|
Shor B, Schneidman-Duhovny D. CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat Methods 2024; 21:477-487. [PMID: 38326495 PMCID: PMC10927564 DOI: 10.1038/s41592-024-02174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/09/2024] [Indexed: 02/09/2024]
Abstract
Deep learning models, such as AlphaFold2 and RoseTTAFold, enable high-accuracy protein structure prediction. However, large protein complexes are still challenging to predict due to their size and the complexity of interactions between multiple subunits. Here we present CombFold, a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2. CombFold accurately predicted (TM-score >0.7) 72% of the complexes among the top-10 predictions in two datasets of 60 large, asymmetric assemblies. Moreover, the structural coverage of predicted complexes was 20% higher compared to corresponding Protein Data Bank entries. We applied the method on complexes from Complex Portal with known stoichiometry but without known structure and obtained high-confidence predictions. CombFold supports the integration of distance restraints based on crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. CombFold's high accuracy makes it a promising tool for expanding structural coverage beyond monomeric proteins.
Affiliation(s)
- Ben Shor
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
|
33
|
Manalastas-Cantos K, Adoni KR, Pfeifer M, Märtens B, Grünewald K, Thalassinos K, Topf M. Modeling Flexible Protein Structure With AlphaFold2 and Crosslinking Mass Spectrometry. Mol Cell Proteomics 2024; 23:100724. [PMID: 38266916 PMCID: PMC10884514 DOI: 10.1016/j.mcpro.2024.100724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/26/2024] Open
Abstract
We propose a pipeline that combines AlphaFold2 (AF2) and crosslinking mass spectrometry (XL-MS) to model the structure of proteins with multiple conformations. The pipeline consists of two main steps: ensemble generation using AF2 and conformer selection using XL-MS data. For conformer selection, we developed two scores, the monolink probability score (MP) and the crosslink probability score (XLP), both of which are based on residue depth from the protein surface. We benchmarked MP and XLP on a large dataset of decoy protein structures and showed that our scores outperform previously developed scores. We then tested our methodology on three proteins having an open and closed conformation in the Protein Data Bank: Complement component 3 (C3), luciferase, and glutamine-binding periplasmic protein, first generating ensembles using AF2, which were then screened for the open and closed conformations using experimental XL-MS data. In five out of six cases, the most accurate model within the AF2 ensembles (or a conformation within 1 Å of this model) was identified using crosslinks, as assessed through the XLP score. In the remaining case, only the monolinks (assessed through the MP score) successfully identified the open conformation of glutamine-binding periplasmic protein, and these results were further improved by including the "occupancy" of the monolinks. This serves as a compelling proof-of-concept for the effectiveness of monolinks. In contrast, the AF2 assessment score was only able to identify the most accurate conformation in two out of six cases. Our results highlight the complementarity of AF2 with experimental methods like XL-MS, with the MP and XLP scores providing reliable metrics to assess the quality of the predicted models. The MP and XLP scoring functions mentioned above are available at https://gitlab.com/topf-lab/xlms-tools.
Affiliation(s)
- Karen Manalastas-Cantos
- Center for Data and Computing in Natural Sciences, Universität Hamburg, Hamburg, Germany; Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany
| | - Kish R Adoni
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Matthias Pfeifer
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Birgit Märtens
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Kay Grünewald
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Department of Chemistry, Universität Hamburg, Hamburg, Germany
| | - Konstantinos Thalassinos
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Maya Topf
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany.
|
34
|
Banayan NE, Loughlin BJ, Singh S, Forouhar F, Lu G, Wong K, Neky M, Hunt HS, Bateman LB, Tamez A, Handelman SK, Price WN, Hunt JF. Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution. Protein Sci 2024; 33:e4898. [PMID: 38358135 PMCID: PMC10868448 DOI: 10.1002/pro.4898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using X-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5-11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from Bulk KR-substituted domains show that the engineered arginine residues frequently make hydrogen bonds across crystal-packing interfaces. We thus demonstrate that Bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.
Affiliation(s)
- Nooriel E. Banayan
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Blaine J. Loughlin
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Shikha Singh
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Farhad Forouhar
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Guanqi Lu
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Kam-Ho Wong
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Vaccine Research and Development, Pfizer Inc., Pearl River, New York, USA
| | - Matthew Neky
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Columbia University, New York, New York, USA
| | - Henry S. Hunt
- Department of Physics, Stanford University, Stanford, California, USA
| | | | | | - Samuel K. Handelman
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Department of Pain & Neuronal Health, Eli Lilly & Co., 893 Delaware St, Indianapolis, Indiana, USA
| | - W. Nicholson Price
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: University of Michigan Law School, Ann Arbor, Michigan, USA
| | - John F. Hunt
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
|
35
|
Shams MH, Sohrabi SM, Jafari R, Sheikhian A, Motedayyen H, Baharvand PA, Hasanvand A, Fouladvand A, Assarehzadegan MA. Designing a T-cell epitope-based vaccine using in silico approaches against the Sal k 1 allergen of Salsola kali plant. Sci Rep 2024; 14:5040. [PMID: 38424208 PMCID: PMC10904830 DOI: 10.1038/s41598-024-55788-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
Allergens originating from Salsola kali (Russian thistle) pollen grains are one of the most important sources of aeroallergens causing pollinosis in desert and semi-desert regions. T-cell epitope-based vaccines (TEV) are among the more effective of the therapeutic approaches developed to alleviate allergic diseases. The physicochemical properties and the B- and T-cell epitopes of Sal k 1 (a major allergen of S. kali) were predicted using immunoinformatic tools. A TEV was constructed using the linkers EAAAK and GPGPG and the most suitable CD4+ T-cell epitopes. RS04 adjuvant was added as a TLR4 agonist to the amino (N) and carboxyl (C) termini of the TEV protein. The secondary and tertiary structures, solubility, allergenicity, toxicity, stability, physicochemical properties, docking with immune receptors, BLASTp against the human and microbiota proteomes, and in silico cloning of the designed TEV were assessed using immunoinformatic analyses. Two CD4+ T-cell epitopes of Sal k 1 that had high affinity for different alleles of MHC-II were selected and used in the TEV. Molecular docking of the TEV with HLA-DRB1 and TLR4 showed strong interactions and stable binding poses with these receptors. Moreover, the codon-optimized TEV sequence was cloned between the NcoI and XhoI restriction sites of the pET-28a(+) expression plasmid. The designed TEV can be used as a promising candidate in allergen-specific immunotherapy against S. kali. Nonetheless, the effectiveness of this vaccine should be validated through immunological bioassays.
Affiliation(s)
- Mohammad Hossein Shams
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran.
| | - Seyyed Mohsen Sohrabi
- Department of Production Engineering and Plant Genetic, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Box 6814993165, Ahvaz, Iran
| | - Reza Jafari
- School of Allied Medical Sciences, Shahroud University of Medical Sciences, Shahroud, Iran
| | - Ali Sheikhian
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Hossein Motedayyen
- Autoimmune Diseases Research Center, Kashan University of Medical Sciences, Kashan, Iran
| | - Peyman Amanolahi Baharvand
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Amin Hasanvand
- Department of Physiology and Pharmacology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Ali Fouladvand
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Mohammad-Ali Assarehzadegan
- Immunology Research Center, Department of Immunology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran.
|
36
|
Wang X, Zhu H, Terashi G, Taluja M, Kihara D. DiffModeler: Large Macromolecular Structure Modeling in Low-Resolution Cryo-EM Maps Using Diffusion Model. bioRxiv 2024:2024.01.20.576370. [PMID: 38328203 PMCID: PMC10849514 DOI: 10.1101/2024.01.20.576370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Cryogenic electron microscopy (cryo-EM) has now been widely used for determining multi-chain protein complexes. However, modeling a complex structure is challenging, particularly when the map resolution is low, typically in the intermediate resolution range of 5 to 10 Å. Within this resolution range, even accurate structure fitting is difficult, let alone de novo modeling. To address this challenge, here we present DiffModeler, a fully automated method for modeling protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. Extensive testing on cryo-EM maps at intermediate resolutions demonstrates the exceptional accuracy of DiffModeler in structure modeling, achieving an average TM-Score of 0.92, surpassing existing methodologies significantly. Notably, DiffModeler successfully modeled a protein complex composed of 47 chains and 13,462 residues, achieving a high TM-Score of 0.94. Further benchmarking at low resolutions (10-20 Å) confirms its versatility, demonstrating plausible performance. Moreover, when coupled with CryoREAD, DiffModeler excels in constructing protein-DNA/RNA complex structures for near-atomic resolution maps (0-5 Å), showcasing state-of-the-art performance with average TM-Scores of 0.88 and 0.91 across two datasets.
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Han Zhu
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Manav Taluja
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu 642014, India
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
| |
|
37
|
Leśniewski M, Pyrka M, Czaplewski C, Co NT, Jiang Y, Gong Z, Tang C, Liwo A. Assessment of Two Restraint Potentials for Coarse-Grained Chemical-Cross-Link-Assisted Modeling of Protein Structures. J Chem Inf Model 2024; 64:1377-1393. [PMID: 38345917 PMCID: PMC10900291 DOI: 10.1021/acs.jcim.3c01890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 01/20/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
The influence of distance restraints from chemical cross-link mass spectroscopy (XL-MS) on the quality of protein structures modeled with the coarse-grained UNRES force field was assessed by using a protocol based on multiplexed replica exchange molecular dynamics, in which both simulated and experimental cross-link restraints were employed, for 23 small proteins. Six cross-links with upper distance boundaries from 4 Å to 12 Å (azido benzoic acid succinimide (ABAS), triazidotriazine (TATA), succinimidyldiazirine (SDA), disuccinimidyl adipate (DSA), disuccinimidyl glutarate (DSG), and disuccinimidyl suberate (BS3)) and two types of restraining potentials ((i) simple flat-bottom Lorentz-like potentials dependent on side chain distance (all cross-links) and (ii) distance- and orientation-dependent potentials determined based on molecular dynamics simulations of model systems (DSA, DSG, BS3, and SDA)) were considered. The Lorentz-like potentials with properly set parameters were found to produce a greater number of higher-quality models compared to unrestrained simulations than the MD-based potentials, because the latter can force too long distances between side chains. Therefore, the flat-bottom Lorentz-like potentials are recommended to represent cross-link restraints. It was also found that significant improvement of model quality upon the introduction of cross-link restraints is obtained when the sum of differences of indices of cross-linked residues exceeds 150.
Affiliation(s)
- Mateusz Leśniewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Maciej Pyrka
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Department of Physics and Biophysics, University of Warmia and Mazury, ul. Oczapowskiego 4, 10-719 Olsztyn, Poland
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Nguyen Truong Co
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Yida Jiang
- College of Chemistry and Molecular Engineering & Center for Quantitative Biology & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
| | - Zhou Gong
- Innovation Academy of Precision Measurement Science and Technology, Chinese Academy of Sciences, 30 W. Xiao Hong Shan, Wuhan 430071, China
| | - Chun Tang
- College of Chemistry and Molecular Engineering & Center for Quantitative Biology & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| |
|
38
|
Tarasovetc EV, Sissoko GB, Mukhina AS, Maiorov A, Ataullakhanov FI, Cheeseman IM, Grishchuk EL. Molecular density-accelerated binding-site maturation underlies CENP-T-dependent kinetochore assembly. bioRxiv 2024:2024.02.25.581584. [PMID: 38464265 PMCID: PMC10925139 DOI: 10.1101/2024.02.25.581584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Formation of macromolecular cellular structures relies on recruitment of multiple proteins, requiring precisely controlled pairwise binding interactions. At human kinetochores, our recent work found that the high molecular density environment enables strong bonding between the Ndc80 complex and its two binding sites at the CENP-T receptor. However, the mechanistic basis for this unusual density-dependent facilitation remains unknown. Here, using quantitative single-molecule approaches, we reveal two distinct mechanisms that drive preferential recruitment of the Ndc80 complex to higher-order structures of CENP-T, as opposed to CENP-T monomers. First, the Ndc80 binding sites within the disordered tail of CENP-T mature over time, leading to a stronger grip on the Spc24/25 heads of the Ndc80 complexes. Second, the maturation of Ndc80 binding sites is accelerated when CENP-T molecules are clustered in close proximity. The rates of the clustering-induced maturation are remarkably different for the two binding sites within CENP-T, correlating with different interfaces formed by the corresponding CENP-T sequences as they wrap around the Spc24/25 heads. The differential clustering-dependent regulation of these sites is preserved in dividing human cells, suggesting a distinct regulatory entry point to control kinetochore-microtubule interactions. The tunable acceleration of slowly maturing binding sites by a high molecular-density environment may represent a fundamental physicochemical mechanism to assist the assembly of mitotic kinetochores and other macromolecular structures.
Affiliation(s)
- Ekaterina V. Tarasovetc
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Gunter B. Sissoko
- Whitehead Institute for Biomedical Research; Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology; Cambridge, MA 02142, USA
| | - Anna S. Mukhina
- Department of Physics, Lomonosov Moscow State University; Moscow, 119991, Russia
| | - Aleksandr Maiorov
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Fazoil I. Ataullakhanov
- Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences; Moscow, 119991, Russia
- Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology; Moscow, 117198, Russia
- Moscow Institute of Physics and Technology; 141701, Dolgoprudny, Russia
| | - Iain M. Cheeseman
- Whitehead Institute for Biomedical Research; Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology; Cambridge, MA 02142, USA
| | - Ekaterina L. Grishchuk
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| |
|
39
|
Ali MA, Caetano-Anollés G. AlphaFold2 Reveals Structural Patterns of Seasonal Haplotype Diversification in SARS-CoV-2 Spike Protein Variants. Biology 2024; 13:134. [PMID: 38534404 DOI: 10.3390/biology13030134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/07/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
The slow experimental acquisition of high-quality atomic structures of the rapidly changing proteins of the COVID-19 virus challenges vaccine and therapeutic drug development efforts. Fortunately, deep learning tools such as AlphaFold2 can quickly generate reliable models of atomic structure at experimental resolution. Current modeling studies have focused solely on definitions of mutant constellations of Variants of Concern (VOCs), leaving out the impact of haplotypes on protein structure. Here, we conduct a thorough comparative structural analysis of S-proteins belonging to major VOCs and corresponding latitude-delimited haplotypes that affect viral seasonal behavior. Our approach identified molecular regions of importance as well as patterns of structural recruitment. The S1 subunit hosted the majority of structural changes, especially those involving the N-terminal domain (NTD) and the receptor-binding domain (RBD). In particular, structural changes in the NTD were much greater than just translations in three-dimensional space, altering the sub-structures to greater extents. We also revealed a notable pattern of structural recruitment with the early VOCs Alpha and Delta behaving antagonistically by suppressing regions of structural change introduced by their corresponding haplotypes, and the current VOC Omicron behaving synergistically by amplifying or collecting structural change. Remarkably, haplotypes altering the galectin-like structure of the NTD were major contributors to seasonal behavior, supporting its putative environmental-sensing role. Our results provide an extensive view of the evolutionary landscape of the S-protein across the COVID-19 pandemic. This view will help predict important regions of structural change in future variants and haplotypes for more efficient vaccine and drug development.
Affiliation(s)
- Muhammad Asif Ali
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
|
40
|
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Affiliation(s)
- Michael R Corum
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Harikanth Venkannagari
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Corey F Hryc
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Matthew L Baker
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas.
| |
|
41
|
Bahena-Ceron R, Teixeira C, Ponce JRJ, Wolff P, Couzon F, François P, Klaholz BP, Vandenesch F, Romby P, Moreau K, Marzi S. RlmQ: a newly discovered rRNA modification enzyme bridging RNA modification and virulence traits in Staphylococcus aureus. RNA 2024; 30:200-212. [PMID: 38164596 PMCID: PMC10870370 DOI: 10.1261/rna.079850.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024]
Abstract
rRNA modifications play crucial roles in fine-tuning the delicate balance between translation speed and accuracy, yet the underlying mechanisms remain elusive. Comparative analyses of the rRNA modifications in taxonomically distant bacteria could help define their general, as well as species-specific, roles. In this study, we identified a new methyltransferase, RlmQ, in Staphylococcus aureus responsible for the Gram-positive specific m7G2601, which is not modified in Escherichia coli (G2574). We also demonstrate the absence of methylation on C1989, equivalent to E. coli C1962, which is methylated at position 5 by the Gram-negative specific RlmI methyltransferase, a paralog of RlmQ. Both modifications (S. aureus m7G2601 and E. coli m5C1962) are situated within the same tRNA accommodation corridor, hinting at a potential shared function in translation. Inactivation of S. aureus rlmQ causes the loss of methylation at G2601 and significantly impacts growth, cytotoxicity, and biofilm formation. These findings unravel the intricate connections between rRNA modifications, translation, and virulence in pathogenic Gram-positive bacteria.
Affiliation(s)
- Roberto Bahena-Ceron
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Chloé Teixeira
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Jose R Jaramillo Ponce
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Philippe Wolff
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Florence Couzon
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Pauline François
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Bruno P Klaholz
- Centre for Integrative Biology, Department of Integrated Structural Biology, IGBMC, 67400 Illkirch, France
- CNRS UMR 7104, 67400 Illkirch, France
- Inserm U964, 67400 Illkirch, France
- Université de Strasbourg, 67000 Strasbourg, France
| | - François Vandenesch
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Institut des agents infectieux, Hospices Civils de Lyon, 69004 Lyon, France
- Centre National de Référence des Staphylocoques, Hospices Civils de Lyon, 69317 Lyon, France
| | - Pascale Romby
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Karen Moreau
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Stefano Marzi
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| |
|
42
|
Schweke H, Pacesa M, Levin T, Goverde CA, Kumar P, Duhoo Y, Dornfeld LJ, Dubreuil B, Georgeon S, Ovchinnikov S, Woolfson DN, Correia BE, Dey S, Levy ED. An atlas of protein homo-oligomerization across domains of life. Cell 2024; 187:999-1010.e15. [PMID: 38325366 DOI: 10.1016/j.cell.2024.01.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/03/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in humans. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.
Affiliation(s)
- Hugo Schweke
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Martin Pacesa
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tal Levin
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Casper A Goverde
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Prasun Kumar
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
| | - Yoan Duhoo
- Protein Production and Structure Characterization Core Facility (PTPSP), School of Life Sciences, École polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Lars J Dornfeld
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Benjamin Dubreuil
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Sandrine Georgeon
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
| | - Derek N Woolfson
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK.
| | - Bruno E Correia
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Sucharita Dey
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Rajasthan, India.
| | - Emmanuel D Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
| |
|
43
|
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
| | - Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
| | - Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
44
|
Liu Z, Zhang C, Zhang Q, Zhang Y, Yu DJ. TM-search: An Efficient and Effective Tool for Protein Structure Database Search. J Chem Inf Model 2024; 64:1043-1049. [PMID: 38270339 DOI: 10.1021/acs.jcim.3c01455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
The quickly increasing size of the Protein Data Bank is challenging biologists to develop a more scalable protein structure alignment tool for fast structure database search. Although many protein structure search algorithms and programs have been designed and implemented for this purpose, most require a large amount of computational time. We propose a novel protein structure search approach, TM-search, which is based on the pairwise structure alignment program TM-align and a new iterative clustering algorithm. Benchmark tests demonstrate that TM-search is 27 times faster than a TM-align full database search while still being able to identify ∼90% of all high TM-score hits, which is 2-10 times more than other existing programs such as Foldseek, Dali, and PSI-BLAST.
Affiliation(s)
- Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
| | - Qidi Zhang
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| |
|
45
|
Moreland RT, Zhang S, Barreira SN, Ryan JF, Baxevanis AD. An AI-generated proteome-scale dataset of predicted protein structures for the ctenophore Mnemiopsis leidyi. Proteomics 2024:e2300397. [PMID: 38329168 DOI: 10.1002/pmic.202300397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/11/2024] [Accepted: 01/12/2024] [Indexed: 02/09/2024]
Abstract
This Dataset Brief describes the computational prediction of protein structures for the ctenophore Mnemiopsis leidyi. Here, we report the proteome-scale generation of 15,333 protein structure predictions using AlphaFold, as well as an updated implementation of publicly available search, manipulation, and visualization tools for these protein structure predictions through the Mnemiopsis Genome Project Portal (https://research.nhgri.nih.gov/mnemiopsis). The utility of these predictions is demonstrated by highlighting comparisons to experimentally determined structures for the light-sensitive protein mnemiopsin 1 and the ionotropic glutamate receptor (iGluR). The application of these novel protein structure prediction methods will serve to further position non-bilaterian species such as Mnemiopsis as powerful model systems for the study of early animal evolution and human health.
Affiliation(s)
- R Travis Moreland
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Suiyuan Zhang
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Sofia N Barreira
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Joseph F Ryan
- Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, Florida, USA
| | - Andreas D Baxevanis
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| |
|
46
|
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, Amini AP. Protein structure generation via folding diffusion. Nat Commun 2024; 15:1059. [PMID: 38316764 PMCID: PMC10844308 DOI: 10.1038/s41467-024-45051-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 01/12/2024] [Indexed: 02/07/2024] Open
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
Affiliation(s)
- Kevin E Wu
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- James Y Zou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Alex X Lu
  - Microsoft Research, Cambridge, MA, USA
47
Jones RN, Miyauchi S, Roy S, Boutros N, Mayadev JS, Mell LK, Califano JA, Venuti A, Sharabi AB. Computational and AI-driven 3D structural analysis of human papillomavirus (HPV) oncoproteins E5, E6, and E7 reveal significant divergence of HPV E5 between low-risk and high-risk genotypes. Virology 2024; 590:109946. [PMID: 38147693 DOI: 10.1016/j.virol.2023.109946] [Received: 07/31/2023] [Revised: 11/01/2023] [Accepted: 11/20/2023] [Indexed: 12/28/2023]
Abstract
There are over 220 identified genotypes of human papillomavirus (HPV), and the HPV genome encodes three major oncogenes: E5, E6, and E7. Conservation and divergence in protein sequence and function between low-risk and high-risk oncogenic HPV genotypes have not been fully characterized. Here, we used modern computational and structural folding algorithms to perform a comparative analysis of HPV E5, E6, and E7 between multiple low-risk and high-risk genotypes. We first identified significantly greater sequence divergence in E5 between low- and high-risk genotypes compared to E6 and E7. Next, we used AlphaFold to model the structure of papillomavirus proteins and complexes with high confidence, including some with no established consensus structure. We observed that HPV E5, but not E6 or E7, had a dramatically different 3D structure between low-risk and high-risk genotypes. To our knowledge, this is the first comparative analysis of HPV proteins using the AlphaFold artificial intelligence (AI) system. The marked differences in E5 sequence and structure in high-risk HPVs may contribute in important and underappreciated ways to the development of HPV-associated cancers.
Affiliation(s)
- Riley N Jones
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
- Sayuri Miyauchi
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
- Souvick Roy
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
- Nathalie Boutros
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
- Jyoti S Mayadev
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
- Loren K Mell
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Joseph A Califano
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA; Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, University of California, San Diego, La Jolla, CA, USA
- Aldo Venuti
  - HPV-UNIT-UOSD Tumor Immunology and Immunotherapy, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Andrew B Sharabi
  - Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
48
Zheng W, Wuyun Q, Li Y, Zhang C, Freddolino PL, Zhang Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024; 21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024]
Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- P Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
49
Giancotti R, Lomoio U, Puccio B, Tradigo G, Vizza P, Torti C, Veltri P, Guzzi PH. The Omicron XBB.1 Variant and Its Descendants: Genomic Mutations, Rapid Dissemination and Notable Characteristics. Biology 2024; 13:90. [PMID: 38392308 PMCID: PMC10886209 DOI: 10.3390/biology13020090] [Received: 12/04/2023] [Revised: 01/26/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024]
Abstract
The SARS-CoV-2 virus, which is a major threat to human health, has undergone many mutations during the replication process due to errors in the replication steps and modifications in the structure of viral proteins. The XBB variant was identified for the first time in Singapore in the fall of 2022. It was then detected in other countries, including the United States, Canada, and the United Kingdom. We study the impact of sequence changes on spike protein structure in the subvariants of XBB, with particular attention to the speed at which the variants spread and to virus activity in relation to that spread. We examine the structural and functional distinctions of the variants in three different conformations: (i) spike glycoprotein in complex with ACE2 (1-up state), (ii) spike glycoprotein (closed-1 state), and (iii) S protein (open-1 state). We also estimate the binding affinity between the spike protein and ACE2. The marked binding affinity observed in specific variants raises questions about the efficacy of current vaccines in preparing the immune system for virus variant recognition. This work may be useful in devising strategies to manage the ongoing COVID-19 pandemic. To stay ahead of the virus evolution, further research and surveillance should be carried out to adjust public health measures accordingly.
Affiliation(s)
- Raffaele Giancotti
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Ugo Lomoio
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Barbara Puccio
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Patrizia Vizza
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Carlo Torti
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Pierangelo Veltri
  - Department of Computer Engineering, Modelling, Electronics and System, University of Calabria, 87036 Rende, Italy
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
50
Satalkar V, Degaga GD, Li W, Pang YT, McShan AC, Gumbart JC, Mitchell JC, Torres MP. Generative β-hairpin design using a residue-based physicochemical property landscape. Biophys J 2024:S0006-3495(24)00070-5. [PMID: 38297834 DOI: 10.1016/j.bpj.2024.01.029] [Received: 10/23/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 02/02/2024] Open
Abstract
De novo peptide design is a new frontier that has broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are largely based on sequence homology, which can restrict them to evolutionarily derived protein sequences, and lack the physicochemical context essential to protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but distinct from, the observable universe. In this study, we created and tested a custom peptide generative adversarial network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation of the generative approach based on physicochemical and conformational properties of the 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial network model robustly distinguishes secondary structures of β-hairpin from α-helix and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
Affiliation(s)
- Vardhan Satalkar
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Gemechis D Degaga
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Wei Li
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Yui Tik Pang
  - School of Physics, Georgia Institute of Technology, Atlanta, Georgia
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- James C Gumbart
  - School of Physics, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- Julie C Mitchell
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Matthew P Torres
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia