1
Gao W, Li H, Yang J, Zhang J, Fu R, Peng J, Hu Y, Liu Y, Wang Y, Li S, Zhang S. Machine Learning Assisted MALDI Mass Spectrometry for Rapid Antimicrobial Resistance Prediction in Clinicals. Anal Chem 2024. [PMID: 39096240 DOI: 10.1021/acs.analchem.4c00741] [Indexed: 08/05/2024]
Abstract
Antimicrobial susceptibility testing (AST) plays a critical role in assessing the resistance of individual microbial isolates and determining appropriate antimicrobial therapeutics in a timely manner. However, conventional AST normally takes up to 72 h to produce results. In healthcare facilities, the global distribution of vancomycin-resistant Enterococcus faecium (VRE) infections underscores the importance of rapidly identifying VRE isolates. Here, we developed an integrated antimicrobial resistance (AMR) screening strategy by combining matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) with machine learning to rapidly predict VRE from clinical samples. Over 400 VRE and vancomycin-susceptible E. faecium (VSE) isolates were analyzed using MALDI-MS at different culture times, and a comprehensive dataset comprising 2388 mass spectra was generated. Algorithms including the support vector machine (SVM), SVM with L1-norm, logistic regression, and multilayer perceptron (MLP) were used to train the classification model. Validation on a panel of clinical samples (external patients) resulted in a prediction accuracy of 78.07%, 80.26%, 78.95%, and 80.54% for each algorithm, respectively, all with an AUROC above 0.80. Furthermore, a total of 33 mass regions were recognized as influential features and elucidated, contributing to the differences between VRE and VSE through the Shapley value and accuracy, while tandem mass spectrometry was employed to identify the specific peaks among them. Certain ribosomal proteins, such as A0A133N352 and R2Q455, were tentatively identified. Overall, the integration of machine learning with MALDI-MS has enabled the rapid determination of bacterial antibiotic resistance, greatly expediting the use of appropriate antibiotics.
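The abstract reports both accuracy and AUROC for each classifier. As a minimal sketch of how the two differ, the following computes accuracy at a fixed threshold and AUROC as the probability that a resistant isolate receives a higher score than a susceptible one. The function names, the 0.5 threshold, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def auroc(labels, scores):
    """AUROC as the chance that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = resistant (VRE), 0 = susceptible (VSE).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores), auroc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two are usually reported together.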
Affiliation(s)
- Weibo Gao
  - Beijing Advanced Innovation Center for Intelligent Robots and Systems, School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
- Hang Li
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Jingxian Yang
  - Department of Clinical Laboratory, Aerospace Center Hospital, Beijing 100039, China
- Jinming Zhang
  - School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
- Rongxin Fu
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Jiaxi Peng
  - Department of Chemistry, University of Toronto, Toronto ON M5S 3H6, Canada
- Yechen Hu
  - Department of Chemistry, University of Toronto, Toronto ON M5S 3H6, Canada
- Yitong Liu
  - Department of Chemistry, University of Toronto, Toronto ON M5S 3H6, Canada
- Yingshi Wang
  - Department of Clinical Laboratory, Aerospace Center Hospital, Beijing 100039, China
- Shuang Li
  - School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
- Shuailong Zhang
  - Beijing Advanced Innovation Center for Intelligent Robots and Systems, School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
  - School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing, 100081, China
  - Zhengzhou Research Institute, Beijing Institute of Technology, Zhengzhou 100081, China
2
Ali S, Chourasia P, Patterson M. From PDB files to protein features: a comparative analysis of PDB bind and STCRDAB datasets. Med Biol Eng Comput 2024; 62:2449-2483. [PMID: 38622438 DOI: 10.1007/s11517-024-03074-3] [Received: 10/20/2023] [Accepted: 03/13/2024] [Indexed: 04/17/2024]
Abstract
Understanding protein structures is crucial for various areas of bioinformatics research, including drug discovery, disease diagnosis, and evolutionary studies. Protein structure classification is a critical aspect of structural biology, where supervised machine learning algorithms classify structures based on data from databases such as the Protein Data Bank (PDB). However, the challenge lies in designing numerical embeddings for protein structures without losing essential information. Although some effort has been made in the literature, to the best of our knowledge researchers have not effectively and rigorously combined structural and sequence-based features for efficient protein classification. To this end, we propose numerical embeddings that extract relevant features for protein sequences fetched from PDB structures from popular datasets such as PDB Bind and STCRDAB. The features are physicochemical properties such as aromaticity, instability index, flexibility, Grand Average of Hydropathy (GRAVY), isoelectric point, charge at pH, secondary structure fraction, molar extinction coefficient, and molecular weight. We also incorporate scaling features for the sliding windows (e.g., k-mers), which include the Kyte and Doolittle (KD) hydropathy scale, Eisenberg hydrophobicity scale, Hydrophilicity scale, Flexibility of the amino acids, and Hydropathy scale. Multiple-feature selection aims to improve the accuracy of protein classification models. The results showed that the selected features significantly improved the predictive performance of existing embeddings.
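Two of the listed features, GRAVY and a sliding-window hydropathy profile, can be sketched in a few lines. The example below uses the standard Kyte-Doolittle scale; the function names, the window length of 5, and the choice of which values go into the final vector are illustrative assumptions rather than the authors' pipeline.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def gravy(seq):
    """Grand Average of Hydropathy: mean KD value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aromaticity(seq):
    """Fraction of residues that are Phe, Trp, or Tyr."""
    return sum(aa in AROMATIC for aa in seq) / len(seq)


def windowed_hydropathy(seq, k=5):
    """Mean KD value of every length-k window (k-mer) along the sequence."""
    return [gravy(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def embed(seq, k=5):
    """Toy fixed-length embedding (hypothetical feature choice)."""
    profile = windowed_hydropathy(seq, k)
    return [gravy(seq), aromaticity(seq), min(profile), max(profile)]


print(embed("MKTAYIAKQRQISFVKSHFSRQ"))
```

Summarizing the window profile by its minimum and maximum is one simple way to obtain a fixed-length vector from sequences of different lengths; other summaries would work equally well in a sketch like this.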
Affiliation(s)
- Sarwan Ali
  - Georgia State University, Atlanta, GA, USA
3
Ghafarollahi A, Buehler MJ. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery 2024; 3:1389-1409. [PMID: 38993729 PMCID: PMC11235180 DOI: 10.1039/d4dd00013g] [Received: 01/19/2024] [Accepted: 05/13/2024] [Indexed: 07/13/2024]
Abstract
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data - natural vibrational frequencies - via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
Affiliation(s)
- Alireza Ghafarollahi
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
  - Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
4
Sela M, Church JR, Schapiro I, Schneidman-Duhovny D. RhoMax: Computational Prediction of Rhodopsin Absorption Maxima Using Geometric Deep Learning. J Chem Inf Model 2024; 64:4630-4639. [PMID: 38829021 PMCID: PMC11200256 DOI: 10.1021/acs.jcim.4c00467] [Received: 03/20/2024] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024]
Abstract
Microbial rhodopsins (MRs) are a diverse and abundant family of photoactive membrane proteins that serve as model systems for biophysical techniques. Optogenetics utilizes genetic engineering to insert specialized proteins into specific neurons or brain regions, allowing for manipulation of their activity through light and enabling the mapping and control of specific brain areas in living organisms. The obstacle of optogenetics lies in the fact that light has a limited ability to penetrate biological tissues, particularly blue light in the visible spectrum. Despite this challenge, most optogenetic systems rely on blue light due to the scarcity of red-shifted opsins. Finding additional red-shifted rhodopsins would represent a major breakthrough in overcoming the challenge of limited light penetration in optogenetics. However, determining the wavelength absorption maxima for rhodopsins based on their protein sequence is a significant hurdle. Current experimental methods are time-consuming, while computational methods lack accuracy. The paper introduces a new computational approach called RhoMax that utilizes structure-based geometric deep learning to predict the absorption wavelength of rhodopsins solely based on their sequences. The method takes advantage of AlphaFold2 for accurate modeling of rhodopsin structures. Once trained on a balanced train set, RhoMax rapidly and precisely predicted the maximum absorption wavelength of more than half of the sequences in our test set with an accuracy of 0.03 eV. By leveraging computational methods for absorption maxima determination, we can drastically reduce the time needed for designing new red-shifted microbial rhodopsins, thereby facilitating advances in the field of optogenetics.
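The abstract quotes its accuracy in electronvolts while absorption maxima are usually discussed in nanometres. A minimal sketch of the conversion, using E = hc/λ with hc ≈ 1239.84 eV·nm, shows what a 0.03 eV tolerance means in wavelength terms. The function names and the 500 nm example are illustrative assumptions and do not come from the paper.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def nm_to_ev(wavelength_nm):
    """Photon energy in eV for a wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def ev_to_nm(energy_ev):
    """Wavelength in nm for a photon energy in eV."""
    return HC_EV_NM / energy_ev


def wavelength_window(wavelength_nm, tol_ev=0.03):
    """Wavelength range (low, high) covered by +/- tol_ev around a maximum."""
    energy = nm_to_ev(wavelength_nm)
    return ev_to_nm(energy + tol_ev), ev_to_nm(energy - tol_ev)


low, high = wavelength_window(500.0)
print(f"{low:.1f} nm to {high:.1f} nm")
```

Around 500 nm, a tolerance of 0.03 eV corresponds to roughly 6 nm on either side. Because energy and wavelength are inversely related, the same energy tolerance spans a wider wavelength range for red-shifted maxima.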
Affiliation(s)
- Meitar Sela
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Jonathan R. Church
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Igor Schapiro
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Dina Schneidman-Duhovny
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
5
Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
6
Cheng J, Liang T, Xie XQ, Feng Z, Meng L. A new era of antibody discovery: an in-depth review of AI-driven approaches. Drug Discov Today 2024; 29:103984. [PMID: 38642702 DOI: 10.1016/j.drudis.2024.103984] [Received: 10/12/2023] [Revised: 04/02/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Given their high affinity and specificity for a range of macromolecules, antibodies are widely used in the treatment of autoimmune diseases, cancers, inflammatory diseases, and Alzheimer's disease (AD). Traditional experimental methods are time-consuming, expensive, and labor-intensive. Recent advances in artificial intelligence (AI) technologies provide complementary methods that can reduce the time and costs required for antibody design by minimizing failures and increasing the success rate of experimental tests. In this review, we scrutinize the plethora of AI-driven methodologies that have been deployed over the past 4 years for modeling antibody structures, predicting antibody-antigen interactions, optimizing antibody affinity, and generating novel antibody candidates. We also briefly address the challenges faced in integrating AI-based models with traditional antibody discovery pipelines and highlight the potential future directions in this burgeoning field.
Affiliation(s)
- Jin Cheng
  - School of Pharmacy, Jiangsu Vocational College of Medicine, Yancheng, 224005, China
- Tianjian Liang
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, and Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xiang-Qun Xie
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, and Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA 15261, USA; Drug Discovery Institute, University of Pittsburgh, Pittsburgh, PA 15261, USA; Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA; Department of Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Zhiwei Feng
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, and Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Li Meng
  - School of Pharmacy, Jiangsu Vocational College of Medicine, Yancheng, 224005, China
7
Flynn CD, Chang D. Artificial Intelligence in Point-of-Care Biosensing: Challenges and Opportunities. Diagnostics (Basel) 2024; 14:1100. [PMID: 38893627 PMCID: PMC11172335 DOI: 10.3390/diagnostics14111100] [Received: 05/05/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
The integration of artificial intelligence (AI) into point-of-care (POC) biosensing has the potential to revolutionize diagnostic methodologies by offering rapid, accurate, and accessible health assessment directly at the patient level. This review paper explores the transformative impact of AI technologies on POC biosensing, emphasizing recent computational advancements, ongoing challenges, and future prospects in the field. We provide an overview of core biosensing technologies and their use at the POC, highlighting ongoing issues and challenges that may be solved with AI. We follow with an overview of AI methodologies that can be applied to biosensing, including machine learning algorithms, neural networks, and data processing frameworks that facilitate real-time analytical decision-making. We explore the applications of AI at each stage of the biosensor development process, highlighting the diverse opportunities beyond simple data analysis procedures. We include a thorough analysis of outstanding challenges in the field of AI-assisted biosensing, focusing on the technical and ethical challenges regarding the widespread adoption of these technologies, such as data security, algorithmic bias, and regulatory compliance. Through this review, we aim to emphasize the role of AI in advancing POC biosensing and inform researchers, clinicians, and policymakers about the potential of these technologies in reshaping global healthcare landscapes.
Affiliation(s)
- Connor D. Flynn
  - Department of Chemistry, Weinberg College of Arts & Sciences, Northwestern University, Evanston, IL 60208, USA
- Dingran Chang
  - Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA
8
Sawhney A, Li J, Liao L. Improving AlphaFold Predicted Contacts for Alpha-Helical Transmembrane Proteins Using Structural Features. Int J Mol Sci 2024; 25:5247. [PMID: 38791287 PMCID: PMC11121315 DOI: 10.3390/ijms25105247] [Received: 04/12/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024] Open
Abstract
Residue contact maps provide a condensed two-dimensional representation of three-dimensional protein structures, serving as a foundational framework in structural modeling but also as an effective tool in their own right in identifying inter-helical binding sites and drawing insights about protein function. Treating contact maps primarily as an intermediate step for 3D structure prediction, contact prediction methods have limited themselves exclusively to sequential features. Now that AlphaFold2 predicts 3D structures with good accuracy in general, we examine (1) how well predicted 3D structures can be directly used for deciding residue contacts, and (2) whether features from 3D structures can be leveraged to further improve residue contact prediction. With a well-known benchmark dataset, we tested predicting inter-helical residue contact based on AlphaFold2's predicted structures, which gave an 83% average precision, already outperforming a sequential features-based state-of-the-art model. We then developed a procedure to extract features from atomic structure in the neighborhood of a residue pair, hypothesizing that these features will be useful in determining if the residue pair is in contact, provided the structure is decently accurate, such as predicted by AlphaFold2. Training on features generated from experimentally determined structures, we leveraged knowledge from known structures to significantly improve residue contact prediction, when testing using the same set of features but derived using AlphaFold2 structures. Our results demonstrate a remarkable improvement over AlphaFold2, achieving over 91.9% average precision for a held-out subset and over 89.5% average precision in cross-validation experiments.
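Question (1) in the abstract, reading residue contacts directly off a predicted 3D structure, can be sketched as a distance threshold over residue coordinates. The 8 Å cutoff, the use of one representative point per residue, and the minimum sequence separation below are common conventions assumed here for illustration; the paper's own contact definition may differ. Note also that `precision` is plain precision over a predicted set, not the ranked average precision the abstract reports.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=6):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff`.

    coords: one (x, y, z) tuple per residue, e.g. C-alpha positions in angstroms.
    min_separation: skip pairs that are near neighbors in sequence.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def precision(predicted, reference):
    """Fraction of predicted contacts that are also reference contacts."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


# Toy chain: residues spaced 3.8 angstroms apart along a straight line.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(sorted(contact_map(chain, min_separation=2)))
```

In practice the reference set would come from an experimentally determined structure and the predicted set from the AlphaFold2 model of the same protein.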
Affiliation(s)
- Aman Sawhney
  - Department of Computer and Information Sciences, University of Delaware, Smith Hall, 18 Amstel Avenue, Newark, DE 19716, USA
- Jiefu Li
  - School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 516 Jun Gong Road, Shanghai 200093, China
- Li Liao
  - Department of Computer and Information Sciences, University of Delaware, Smith Hall, 18 Amstel Avenue, Newark, DE 19716, USA
9
Tran TO, Le NQK. Sa-TTCA: An SVM-based approach for tumor T-cell antigen classification using features extracted from biological sequencing and natural language processing. Comput Biol Med 2024; 174:108408. [PMID: 38636332 DOI: 10.1016/j.compbiomed.2024.108408] [Received: 06/11/2023] [Revised: 01/13/2024] [Accepted: 04/01/2024] [Indexed: 04/20/2024]
Abstract
Accurately predicting tumor T-cell antigen (TTCA) sequences is a crucial task in the development of cancer vaccines and immunotherapies. TTCAs derived from tumor cells are presented to immune cells (T cells) through the major histocompatibility complex (MHC), via the recognition of specific portions of their structure known as epitopes. More specifically, MHC class I presents TTCAs to T-cell receptors (TCRs), which are located on the surface of CD8+ T cells. However, TTCA sequences are highly varied, which complicates vaccine design. Recently, machine learning (ML) models have been developed to predict TTCA sequences, which could aid in fast and correct TTCA identification. During the construction of a TTCA predictor, the peptide encoding strategy is an important step. Previous studies have used biological descriptors for encoding TTCA sequences. However, no studies have used natural language processing (NLP), a potential approach for this purpose. Just as sentences are made of words with diverse properties, biological sequences hold unique characteristics that reflect evolutionary information, physicochemical values, and structural information. We hypothesized that NLP methods would benefit the prediction of TTCAs. To develop a new TTCA identification model, we first constructed a base model with widely used ML algorithms and features extracted from biological descriptors. Then, to improve model performance, we added features extracted from biological language models (BLMs) based on NLP methods. In addition, we conducted feature selection using Chi-square and Pearson Correlation Coefficient techniques. Then, SMOTE, Up-sampling, and Near-Miss were used to treat unbalanced data. Finally, we optimized Sa-TTCA by applying the SVM algorithm to the four most effective feature groups. The best performance of Sa-TTCA showed a competitive balanced accuracy of 87.5% on a training set and 72.0% on an independent testing set. Our results suggest that integrating biological descriptors with natural language processing has the potential to improve the precision of predicting protein/peptide functionality, which could be beneficial for developing cancer vaccines.
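The abstract reports balanced accuracy rather than plain accuracy because the data are unbalanced. A minimal sketch of the difference follows; the function names and the toy labels are illustrative assumptions, not data from the study.

```python
def plain_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical unbalanced set: 2 positives, 8 negatives,
# scored against a trivial model that always predicts "negative".
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(plain_accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

The always-negative model scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, which is why the latter is the more informative figure when one class dominates.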
Affiliation(s)
- Thi-Oanh Tran
  - International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Hematology and Blood Transfusion Center, Bach Mai Hospital, No. 78, Giai Phong Street, Hanoi, Viet Nam
- Nguyen Quoc Khanh Le
  - AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan
10
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. Starting with methods to predict protein structures, classify their activity, and even discover new enzymes, we continue by describing tools used to increase the thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both the activity and the selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With the exception of the last chapter, there is a strong focus on methods easily accessible via web interfaces or simple Python scripts, and therefore readily usable by a diverse and broad community.
Affiliation(s)
- Adrian Tripp
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
- Horst Lechner
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
11
Go EB, Lee JH, Cho JH, Kwon NH, Choi JI, Kwon I. Enhanced therapeutic potential of antibody fragment via IEDDA-mediated site-specific albumin conjugation. J Biol Eng 2024; 18:23. [PMID: 38576037 PMCID: PMC10996255 DOI: 10.1186/s13036-024-00418-3] [Received: 11/11/2023] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
BACKGROUND: The use of single-chain variable fragments (scFvs) for treating human diseases, such as cancer and immune system disorders, has attracted significant attention. However, a critical drawback of scFv is its extremely short serum half-life, which limits its therapeutic potential. Thus, there is a critical need to prolong the serum half-life of the scFv for clinical applications. One promising serum half-life extender for therapeutic proteins is human serum albumin (HSA), which is the most abundant protein in human serum and is known to have an exceptionally long serum half-life. However, conjugating a macromolecular half-life extender to a small protein, such as scFv, often results in a significant loss of its critical properties.
RESULTS: In this study, we conjugated HSA to a permissive site of scFv to improve its pharmacokinetic profile. To ensure minimal damage to the antigen-binding capacity of scFv upon HSA conjugation, we employed a site-specific conjugation approach using a heterobifunctional crosslinker that facilitates the thiol-maleimide reaction and the inverse electron-demand Diels-Alder reaction (IEDDA). As a model protein, we selected 4D5scFv, derived from trastuzumab, a therapeutic antibody used in human epidermal growth factor receptor 2 (HER2)-positive breast cancer treatment. We introduced a phenylalanine analog containing a very reactive tetrazine group (frTet) at conjugation site candidates predicted by computational methods. Using the linker TCO-PEG4-MAL, a single HSA molecule was site-specifically conjugated to the 4D5scFv (4D5scFv-HSA). The 4D5scFv-HSA conjugate exhibited HER2 binding affinity comparable to that of unmodified 4D5scFv. Furthermore, in the pharmacokinetic profile in mice, the serum half-life of 4D5scFv-HSA was approximately 12 h, which is 85 times longer than that of 4D5scFv.
CONCLUSIONS: The antigen binding results and pharmacokinetic profile of 4D5scFv-HSA demonstrate that the site-specifically albumin-conjugated scFv retained its binding affinity with a prolonged serum half-life. In conclusion, we developed an effective strategy to prepare site-specifically albumin-conjugated 4D5scFv, which can have versatile clinical applications with improved efficacy.
Affiliation(s)
- Eun Byeol Go
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jae Hun Lee
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jeong Haeng Cho
  - ProAbTech, Gwangju, 61005, Republic of Korea
  - Department of Biotechnology and Bioengineering, Interdisciplinary Program for Bioenergy and Biomaterials, Chonnam National University, Gwangju, 61186, Republic of Korea
- Na Hyun Kwon
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
- Jong-Il Choi
  - Department of Biotechnology and Bioengineering, Interdisciplinary Program for Bioenergy and Biomaterials, Chonnam National University, Gwangju, 61186, Republic of Korea
- Inchan Kwon
  - School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Republic of Korea
12
Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024; 15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024] Open
Abstract
This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
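The two ideas named in the abstract, subsampling a multiple sequence alignment and reading off relative state populations, can be sketched generically. The code below is only an illustration under stated assumptions: `subsample_msa` draws random rows while keeping the query, and `relative_populations` counts state labels that some external, hypothetical classifier has assigned to each predicted structure. It does not call AlphaFold 2 and does not reproduce the authors' actual subsampling settings.

```python
import random
from collections import Counter


def subsample_msa(msa, n_keep, seed=0):
    """Keep the query (first row) plus a random subset of the remaining rows."""
    rng = random.Random(seed)
    query, rest = msa[0], msa[1:]
    picked = rng.sample(rest, min(n_keep, len(rest)))
    return [query] + picked


def relative_populations(state_labels):
    """Fraction of predictions assigned to each conformational state."""
    counts = Counter(state_labels)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Hypothetical alignment rows and hypothetical per-prediction state labels.
msa = ["QUERYSEQ"] + [f"HOMOLOG{i}" for i in range(10)]
print(subsample_msa(msa, n_keep=3))
print(relative_populations(["active", "active", "active", "inactive"]))
```

Repeating the prediction over many different subsamples, then labeling each resulting structure, is what would turn the second function's input from a toy list into an estimated conformational distribution.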
Affiliation(s)
- Jennifer Y Cui
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA.
- Brown University Department of Chemistry, Providence, RI, USA.
13
Baker K, Hughes N, Bhattacharya S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. Frontiers in Bioinformatics 2024; 4:1358550. [PMID: 38562910 PMCID: PMC10982686 DOI: 10.3389/fbinf.2024.1358550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] Open
Abstract
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the gap in accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to perform contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the CMO problem. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse qualities of contact maps and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, thereby making it accessible to the general public for promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
Affiliation(s)
- Kevan Baker
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Nathaniel Hughes
- Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
- Sutanu Bhattacharya
- Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
14
Lee KH, Won SJ, Oyinloye P, Shi L. Unlocking the Potential of High-Quality Dopamine Transporter Pharmacological Data: Advancing Robust Machine Learning-Based QSAR Modeling. bioRxiv 2024:2024.03.06.583803. [PMID: 38558976 PMCID: PMC10979915 DOI: 10.1101/2024.03.06.583803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The dopamine transporter (DAT) plays a critical role in the central nervous system and has been implicated in numerous psychiatric disorders. Ligand-based approaches, especially quantitative SAR (QSAR) modeling, are instrumental in deciphering the structure-activity relationship (SAR) of DAT ligands. By gathering and analyzing data from literature and databases, we systematically assemble a diverse range of ligands binding to DAT, aiming to discern the general features of DAT ligands and uncover the chemical space for potential novel DAT ligand scaffolds. The aggregation of DAT pharmacological activity data, particularly from databases like ChEMBL, provides a foundation for constructing robust QSAR models. The compilation and meticulous filtering of these data, establishing high-quality training datasets with specific divisions of pharmacological assays and data types, along with the application of QSAR modeling, prove to be a promising strategy for navigating the pertinent chemical space. Through a systematic comparison of DAT QSAR models using training datasets from various ChEMBL releases, we underscore the positive impact of enhanced dataset quality and increased dataset size on the predictive power of DAT QSAR models.
Affiliation(s)
- Kuo Hao Lee
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Sung Joon Won
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Precious Oyinloye
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Lei Shi
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
15
Jänes J, Beltrao P. Deep learning for protein structure prediction and design-progress and applications. Mol Syst Biol 2024; 20:162-169. [PMID: 38291232 PMCID: PMC10912668 DOI: 10.1038/s44320-024-00016-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 12/21/2023] [Accepted: 01/11/2024] [Indexed: 02/01/2024] Open
Abstract
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Affiliation(s)
- Jürgen Jänes
- Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pedro Beltrao
- Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
16
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Affiliation(s)
- Michael R Corum
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Harikanth Venkannagari
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Corey F Hryc
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Matthew L Baker
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas.
17
Sathyavageeswaran A, Bonesso Sabadini J, Perry SL. Self-Assembling Polypeptides in Complex Coacervation. Acc Chem Res 2024; 57:386-398. [PMID: 38252962 DOI: 10.1021/acs.accounts.3c00689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Intracellular compartmentalization plays a pivotal role in cellular function, with membrane-bound organelles and membrane-less biomolecular "condensates" playing key roles. These condensates, formed through liquid-liquid phase separation (LLPS), enable selective compartmentalization without the barrier of a lipid bilayer, thereby facilitating rapid formation and dissolution in response to stimuli. Intrinsically disordered proteins (IDPs) or proteins with intrinsically disordered regions (IDRs), which are often rich in charged and polar amino acid sequences, scaffold many condensates, often in conjunction with RNA.
Comprehending the impact of IDP/IDR sequences on phase separation poses a challenge due to the extensive chemical diversity resulting from the myriad amino acids and post-translational modifications. To tackle this hurdle, one approach has been to investigate LLPS in simplified polypeptide systems, which offer a narrower scope within the chemical space for exploration. This strategy is supported by studies that have demonstrated how IDP function can largely be understood based on general chemical features, such as clusters or patterns of charged amino acids, rather than residue-level effects, and the ways in which these kinds of motifs give rise to an ensemble of conformations.
Our laboratory has utilized complex coacervates assembled from oppositely charged polypeptides as a simplified material analogue to the complexity of liquid-liquid phase separated biological condensates. Complex coacervation is an associative LLPS that occurs due to the electrostatic complexation of oppositely charged macro-ions. This process is believed to be driven by the entropic gains resulting from the release of bound counterions and the reorganization of water upon complex formation.
Apart from their direct applicability to IDPs, polypeptides also serve as excellent model polymers for investigating molecular interactions due to the wide range of available side-chain functionalities and the capacity to finely regulate their sequence, thus enabling precise control over interactions with guest molecules.
Here, we discuss fundamental studies examining how charge patterning, hydrophobicity, chirality, and architecture affect the phase separation of polypeptide-based complex coacervates. These efforts have leveraged a combination of experimental and computational approaches that provide insight into molecular level interactions. We also examine how these parameters affect the ability of complex coacervates to incorporate globular proteins and viruses. These efforts couple directly with our fundamental studies into coacervate formation, as such "guest" molecules should not be considered as experiencing simple encapsulation and are instead active participants in the electrostatic assembly of coacervate materials. Interestingly, we observed trends in the incorporation of proteins and viruses into coacervates formed using different chain length polypeptides that are not well explained by simple electrostatic arguments and may be the result of more complex interactions between globular and polymeric species. Additionally, we describe experimental evidence supporting the potential for complex coacervates to improve the thermal stability of embedded biomolecules, such as viral vaccines.
Ultimately, peptide-based coacervates have the potential to help unravel the physics behind biological condensates, while paving the way for innovative methods in compartmentalization, purification, and biomolecule stabilization. These advancements could have implications spanning medicine to biocatalysis.
Affiliation(s)
- Arvind Sathyavageeswaran
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Júlia Bonesso Sabadini
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Institute of Chemistry, University of Campinas (UNICAMP), Campinas, SP 13083-970, Brazil
- Sarah L Perry
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
18
Ahmad M, Imran A, Movileanu L. Overlapping characteristics of weak interactions of two transcriptional regulators with WDR5. Int J Biol Macromol 2024; 258:128969. [PMID: 38158065 PMCID: PMC10922662 DOI: 10.1016/j.ijbiomac.2023.128969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 12/18/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024]
Abstract
The WD40 repeat protein 5 (WDR5) is a nuclear hub that critically influences gene expression by interacting with transcriptional regulators. Utilizing the WDR5 binding motif (WBM) site, WDR5 interacts with the myelocytomatosis (MYC), an oncoprotein transcription factor, and the retinoblastoma-binding protein 5 (RbBP5), a scaffolding element of an epigenetic complex. Given the clinical significance of these protein-protein interactions (PPIs), there is a pressing necessity for a quantitative assessment of these processes. Here, we use biolayer interferometry (BLI) to examine interactions of WDR5 with consensus peptide ligands of MYC and RbBP5. We found that both interactions exhibit relatively weak affinities arising from a fast dissociation process. Remarkably, live-cell imaging identified distinctive WDR5 localizations in the absence and presence of full-length binding partners. Although WDR5 tends to accumulate within nucleoli, WBM-mediated interactions with MYC and RbBP5 require their localization outside nucleoli. We utilize fluorescence resonance energy transfer (FRET) microscopy to confirm these weak interactions through a low FRET efficiency of the MYC-WDR5 and RbBP5-WDR5 complexes in living cells. In addition, we evaluate the impact of peptide and small-molecule inhibitors on these interactions. These outcomes form a fundamental basis for further developments to clarify the multitasking role of the WBM binding site of WDR5.
Affiliation(s)
- Mohammad Ahmad
- Department of Physics, Syracuse University, 201 Physics Building, Syracuse, NY 13244-1130, USA
- Ali Imran
- Department of Physics, Syracuse University, 201 Physics Building, Syracuse, NY 13244-1130, USA
- Liviu Movileanu
- Department of Physics, Syracuse University, 201 Physics Building, Syracuse, NY 13244-1130, USA; Department of Biomedical and Chemical Engineering, Syracuse University, 329 Link Hall, Syracuse, NY 13244, USA; The BioInspired Institute, Syracuse University, Syracuse, NY 13244, USA.
19
Krokidis MG, Dimitrakopoulos GN, Vrahatis AG, Exarchos TP, Vlamos P. Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024] Open
Affiliation(s)
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, Corfu, Greece
20
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023] Open
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California.
21
Radjasandirane R, de Brevern AG. AlphaFold2 for Protein Structure Prediction: Best Practices and Critical Analyses. Methods Mol Biol 2024; 2836:235-252. [PMID: 38995544 DOI: 10.1007/978-1-0716-4007-4_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
AlphaFold2 (AF2) has emerged in recent years as a groundbreaking innovation that has revolutionized several scientific fields, in particular structural biology, drug design, and the elucidation of disease mechanisms. Many scientists now use AF2 on a daily basis, including non-specialist users. This chapter is aimed at the latter. Tips and tricks for getting the most out of AF2 to produce a high-quality biological model are discussed here. We suggest to non-specialist users how to maintain a critical perspective when working with AF2 models and provide guidelines on how to properly evaluate them. After showing how to perform our own structure prediction using ColabFold, we list several ways to improve AF2 models by adding information that is missing from the original AF2 model. By using software such as AlphaFill to add cofactors and ligands to the models, or MODELLER to add disulfide bridges between cysteines, we guide users to build a high-quality biological model suitable for applications such as drug design, protein interaction, or molecular dynamics studies.
Affiliation(s)
- Ragousandirane Radjasandirane
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, Paris, France
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, Paris, France.
22
Ali S, Chourasia P, Patterson M. When Protein Structure Embedding Meets Large Language Models. Genes (Basel) 2023; 15:25. [PMID: 38254915 PMCID: PMC10815811 DOI: 10.3390/genes15010025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 12/16/2023] [Accepted: 12/21/2023] [Indexed: 01/24/2024] Open
Abstract
Protein structure analysis is essential in various bioinformatics domains such as drug discovery, disease diagnosis, and evolutionary studies. Within structural biology, the classification of protein structures is pivotal, employing machine learning algorithms to categorize structures based on data from databases like the Protein Data Bank (PDB). To predict protein functions, embeddings based on protein sequences have been employed. Creating numerical embeddings that preserve vital information while considering protein structure and sequence presents several challenges. The existing literature lacks a comprehensive and effective approach that combines structural and sequence-based features to achieve efficient protein classification. While large language models (LLMs) have exhibited promising outcomes for protein function prediction, their focus primarily lies on protein sequences, disregarding the 3D structures of proteins. The quality of embeddings heavily relies on how well the geometry of the embedding space aligns with the underlying data structure, posing a critical research question. Traditionally, Euclidean space has served as a widely utilized framework for embeddings. In this study, we propose a novel method for designing numerical embeddings in Euclidean space for proteins by leveraging 3D structure information, specifically employing the concept of contact maps. These embeddings are synergistically combined with features extracted from LLMs and traditional feature engineering techniques to enhance the performance of embeddings in supervised protein analysis. Experimental results on benchmark datasets, including PDB Bind and STCRDAB, demonstrate the superior performance of the proposed method for protein function prediction.
Affiliation(s)
- Murray Patterson
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA; (S.A.); (P.C.)
23
da Silva GM, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. Predicting Relative Populations of Protein Conformations without a Physics Engine Using AlphaFold 2. bioRxiv 2023:2023.07.25.550545. [PMID: 37546747 PMCID: PMC10402055 DOI: 10.1101/2023.07.25.550545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
This paper presents a novel approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against NMR experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, NMR analysis, and evolution.
Affiliation(s)
- Gabriel Monteiro da Silva
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- Jennifer Y Cui
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
24
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups, led by PEZYFoldings, UM-TBM, and Yang Server, employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic atlas: the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Filomeno Sánchez Rodríguez
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
- Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
- Luc Elliott
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- David L. Murphy
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M. Keegan
- UKRI-STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
- Daniel J. Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
25
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy.
26
Chungyoun M, Gray JJ. AI Models for Protein Design are Driving Antibody Engineering. Current Opinion in Biomedical Engineering 2023; 28:100473. [PMID: 37484815 PMCID: PMC10361400 DOI: 10.1016/j.cobme.2023.100473] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/25/2023]
Abstract
Therapeutic antibody engineering seeks to identify antibody sequences with specific binding to a target and optimized drug-like properties. When guided by deep learning, antibody generation methods can draw on prior knowledge and experimental efforts to improve this process. By leveraging the increasing quantity and quality of predicted structures of antibodies and target antigens, powerful structure-based generative models are emerging. In this review, we tie the advancements in deep learning-based protein structure prediction and design to the study of antibody therapeutics.
Affiliation(s)
- Michael Chungyoun
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA
- Jeffrey J Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA
  - Program in Molecular Biophysics, Institute for Nanobiotechnology, and Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21287, USA
27
|
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW: Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS: In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
  - Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain.
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain.
28
|
Zhong W, Li H, Wang Y. Design and Construction of Artificial Biological Systems for One-Carbon Utilization. BioDesign Research 2023; 5:0021. [PMID: 37915992 PMCID: PMC10616972 DOI: 10.34133/bdr.0021] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/13/2023] [Accepted: 10/05/2023] [Indexed: 11/03/2023]
Abstract
The third-generation (3G) biorefinery aims to use microbial cell factories or enzymatic systems to synthesize value-added chemicals from one-carbon (C1) sources, such as CO2, formate, and methanol, fueled by renewable energies like light and electricity. This promising technology represents an important step toward sustainable development, which can help address some of the most pressing environmental challenges faced by modern society. However, to establish processes competitive with the petroleum industry, it is crucial to determine which pathways are most viable in terms of C1 utilization and the productivity and yield of the target products. In this review, we discuss the progress that has been made in constructing artificial biological systems for 3G biorefineries in the last 10 years. Specifically, we highlight representative works on the engineering of artificial autotrophic microorganisms, tandem enzymatic systems, and chemo-bio hybrid systems for C1 utilization. We also look ahead to the revolutionary impact these developments may have on biotechnology. By harnessing the power of 3G biorefinery, scientists are establishing a new frontier that could potentially revolutionize our approach to industrial production and pave the way for a more sustainable future.
Affiliation(s)
- Wei Zhong
  - Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
- Hailong Li
  - Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
  - School of Materials Science and Engineering, Zhejiang University, Zhejiang Province, Hangzhou 310000, PR China
- Yajie Wang
  - Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
29
|
Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A 2023; 120:e2304302120. [PMID: 37878721 PMCID: PMC10622901 DOI: 10.1073/pnas.2304302120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 03/22/2023] [Accepted: 08/30/2023] [Indexed: 10/27/2023]
Abstract
The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.
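The figure quoted above, a precision at a fixed false positive rate, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `precision_at_fpr`, the toy scores, and the convention that a region is called "conditionally folding" when its confidence score is at or above a threshold are all assumptions made for illustration.

```python
def precision_at_fpr(scores, labels, max_fpr=0.10):
    """Return (precision, threshold) at the most permissive score threshold
    whose false positive rate does not exceed max_fpr.

    scores: per-region confidence values (hypothetical, higher = more confident)
    labels: 1 for regions known to conditionally fold, 0 otherwise
    """
    n_neg = sum(1 for y in labels if y == 0)
    if n_neg == 0:
        raise ValueError("need at least one negative example")
    best = None
    # Lowering the threshold can only add false positives, so stop at the
    # first threshold that exceeds the FPR budget.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg > max_fpr:
            break
        best = (tp / (tp + fp), threshold)
    if best is None:
        raise ValueError("no threshold satisfies the FPR budget")
    return best


# Toy data (invented): 4 positives followed by 10 negatives.
scores = [0.9, 0.8, 0.7, 0.4, 0.75, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
precision, threshold = precision_at_fpr(scores, labels)
```

With ten negatives, a 10% false positive rate allows exactly one false positive, so the threshold settles at the lowest score that admits only one negative region.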
Affiliation(s)
- T. Reid Alderson
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Iva Pritišanac
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
  - Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
  - Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Đesika Kolarić
  - Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Alan M. Moses
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Julie D. Forman-Kay
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
30
|
Rothman JE. Starting at Go: Protein structure prediction succumbs to machine learning. Proc Natl Acad Sci U S A 2023; 120:e2311128120. [PMID: 37732752 PMCID: PMC10523586 DOI: 10.1073/pnas.2311128120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
This year's Lasker Basic Science Award recognizes the invention of AlphaFold, a revolutionary advance in the history of protein research which for the first time offers the practical ability to accurately predict the three-dimensional arrangement of amino acids in the vast majority of proteins on a genomic scale on the basis of sequence alone [J. Jumper et al., Nature 596, 583-589 (2021) and K. Tunyasuvunakool et al., Nature 596, 590-596 (2021)]. This extraordinary achievement by Demis Hassabis and John Jumper and their coworkers at Google's DeepMind and other collaborators was built on decades of experimental protein structure determination (structural biology) as well as the gradual development of multiple strategies incorporating biologically inspired statistical approaches. But when Jumper and Hassabis added a brew of innovative neural network-based machine learning approaches to the mix, the results were explosive. Realizing the half-century-old dream of predicting protein structure has already accelerated the pace and creativity of many areas of Chemistry, Biology, and Medicine.
31
|
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nat Commun 2023; 14:5478. [PMID: 37673981 PMCID: PMC10482954 DOI: 10.1038/s41467-023-41237-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/14/2023] [Accepted: 08/24/2023] [Indexed: 09/08/2023]
Abstract
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. Then, we used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W Schafer
  - National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Lauren L Porter
  - National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA.
  - National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA.
32
|
Huang Z, Cui X, Xia Y, Zhao K, Zhang G. Pathfinder: Protein folding pathway prediction based on conformational sampling. PLoS Comput Biol 2023; 19:e1011438. [PMID: 37695768 PMCID: PMC10513300 DOI: 10.1371/journal.pcbi.1011438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 09/21/2023] [Accepted: 08/17/2023] [Indexed: 09/13/2023]
Abstract
The study of protein folding mechanisms is a challenge in molecular biology and is of great significance for revealing the movement rules of biological macromolecules, understanding the pathogenic mechanisms of folding diseases, and designing protein engineering materials. Based on the hypothesis that the conformational sampling trajectory contains information about the folding pathway, we propose a protein folding pathway prediction algorithm named Pathfinder. First, Pathfinder performs large-scale sampling of the conformational space and clusters the decoys obtained in the sampling. The heterogeneous conformations obtained by clustering are named seed states. Then, a resampling algorithm that is not constrained by the local energy basin is designed to obtain the transition probabilities of seed states. Finally, protein folding pathways are inferred from the maximum transition probabilities of seed states. Pathfinder is tested on our own test set of 34 proteins. For 11 widely studied proteins, we correctly predicted their folding pathways and specifically analyzed 5 of them. For 13 proteins, we predicted folding pathways that remain to be verified by biological experiments. For 6 proteins, we analyzed the reasons for the low prediction accuracy. For the other 4 proteins without biological experiment results, potential folding pathways were predicted to provide new insights into protein folding mechanisms. The results reveal that structural analogs may have different folding pathways to express different biological functions, homologous proteins may contain common folding pathways, and α-helices may be more prone to fold early than β-strands.
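The final step described above, inferring a pathway from the maximum transition probabilities between seed states, can be sketched as a greedy walk over a transition matrix. This is only an illustration of the idea and not the Pathfinder implementation: the function `most_probable_pathway`, the four-state matrix, and the rule of never revisiting a state are assumptions introduced here.

```python
def most_probable_pathway(transition, start, end):
    """Greedy walk from `start` to `end` over seed states.

    transition[i][j] is a (hypothetical) estimated probability of moving
    from seed state i to seed state j. At each step the walk moves to the
    unvisited state with the highest transition probability.
    """
    path = [start]
    visited = {start}
    current = start
    while current != end:
        candidates = [
            (p, state)
            for state, p in enumerate(transition[current])
            if state not in visited and p > 0
        ]
        if not candidates:
            raise ValueError("no unvisited state is reachable from %d" % current)
        _, current = max(candidates)  # highest probability wins
        path.append(current)
        visited.add(current)
    return path


# Invented example: state 0 = unfolded, state 3 = native-like.
transition = [
    [0.0, 0.7, 0.2, 0.1],
    [0.1, 0.0, 0.6, 0.3],
    [0.1, 0.2, 0.0, 0.7],
    [0.0, 0.0, 0.0, 1.0],
]
pathway = most_probable_pathway(transition, start=0, end=3)
```

A greedy walk is the simplest reading of "maximum transition probabilities". A globally most probable path (for example, a shortest path over negative log probabilities) is another plausible reading, and the abstract does not say which one is used.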
Affiliation(s)
- Zhaohong Huang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Xinyue Cui
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
33
|
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
34
|
Lefin N, Herrera-Belén L, Farias JG, Beltrán JF. Review and perspective on bioinformatics tools using machine learning and deep learning for predicting antiviral peptides. Mol Divers 2023:10.1007/s11030-023-10718-3. [PMID: 37626205 DOI: 10.1007/s11030-023-10718-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 08/15/2023] [Indexed: 08/27/2023]
Abstract
Viruses constitute a constant threat to global health and have caused millions of human and animal deaths throughout human history. Despite advances in the discovery of antiviral compounds that help fight these pathogens, finding a solution to this problem continues to be a task that consumes time and financial resources. Artificial intelligence (AI) has revolutionized many areas of the biological sciences, making it possible to decipher patterns in amino acid sequences that encode different functions and activities. Within the field of AI, machine learning and deep learning algorithms have been used to discover antimicrobial peptides (AMPs). Due to their effectiveness and specificity, AMPs hold excellent promise for treating various infections caused by pathogens. Antiviral peptides (AVPs) are a specific type of AMP with activity against certain viruses. Unlike tools and methods for predicting AMPs, those for predicting AVPs are still scarce. Given the significance of AVPs as potential pharmaceutical options for human and animal health and the ongoing AI revolution, we have reviewed and summarized the current machine learning and deep learning-based tools and methods available for predicting these types of peptides.
Affiliation(s)
- Nicolás Lefin
  - Department of Chemical Engineering, Faculty of Engineering and Science, University of La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
- Lisandra Herrera-Belén
  - Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomás, Temuco, Chile
- Jorge G Farias
  - Department of Chemical Engineering, Faculty of Engineering and Science, University of La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
- Jorge F Beltrán
  - Department of Chemical Engineering, Faculty of Engineering and Science, University of La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile.
35
|
Guarra F, Colombo G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J Chem Theory Comput 2023; 19:5315-5333. [PMID: 37527403 PMCID: PMC10448727 DOI: 10.1021/acs.jctc.3c00513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Indexed: 08/03/2023]
Abstract
The design of new biomolecules able to harness immune mechanisms for the treatment of diseases is a prime challenge for computational and simulative approaches. For instance, in recent years, antibodies have emerged as an important class of therapeutics against a spectrum of pathologies. In cancer, immune-inspired approaches are witnessing a surge thanks to a better understanding of tumor-associated antigens and the mechanisms of their engagement or evasion from the human immune system. Here, we provide a summary of the main state-of-the-art computational approaches that are used to design antibodies and antigens, and in parallel, we review key methodologies for epitope identification for both B- and T-cell mediated responses. A special focus is devoted to the description of structure- and physics-based models, privileged over purely sequence-based approaches. We discuss the implications of novel methods in engineering biomolecules with tailored immunological properties for possible therapeutic uses. Finally, we highlight the extraordinary challenges and opportunities presented by the possible integration of structure- and physics-based methods with emerging Artificial Intelligence technologies for the prediction and design of novel antigens, epitopes, and antibodies.
Affiliation(s)
- Federica Guarra
  - Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy
36
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most commonly employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
37
|
Sala D, Engelberger F, Mchaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol 2023; 81:102645. [PMID: 37392556 DOI: 10.1016/j.sbi.2023.102645] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 03/21/2023] [Revised: 05/16/2023] [Accepted: 06/01/2023] [Indexed: 07/03/2023]
Abstract
Many proteins exert their function by switching among different structures. Knowing the conformational ensembles affiliated with these states is critical to elucidate key mechanistic aspects that govern protein function. While experimental determination efforts are still bottlenecked by cost, time, and technical challenges, the machine-learning technology AlphaFold showed near experimental accuracy in predicting the three-dimensional structure of monomeric proteins. However, an AlphaFold ensemble of models usually represents a single conformational state with minimal structural heterogeneity. Consequently, several pipelines have been proposed to either expand the structural breadth of an ensemble or bias the prediction toward a desired conformational state. Here, we analyze how those pipelines work, what they can and cannot predict, and future directions.
Collapse
Affiliation(s)
- D Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/sala_davide
| | - F Engelberger
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/fengel97
| | - H S Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA. https://twitter.com/Mchaourablab
| | - J Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany; Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany.
| |
|
38
|
Yang X, Duan H, Liu X, Zhang X, Pan S, Zhang F, Gao P, Liu B, Yang J, Chi X, Yang W. Broad Sarbecovirus Neutralizing Antibodies Obtained by Computational Design and Synthetic Library Screening. J Virol 2023:e0061023. [PMID: 37367229 PMCID: PMC10373554 DOI: 10.1128/jvi.00610-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/10/2023] [Indexed: 06/28/2023] Open
Abstract
Members of the Sarbecovirus subgenus of Coronaviridae have twice caused deadly threats to humans. There is increasing concern about the rapid mutation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has evolved into multiple generations of epidemic variants in 3 years. Broad neutralizing antibodies are of great importance for pandemic preparedness against SARS-CoV-2 variants and divergent zoonotic sarbecoviruses. Here, we analyzed the structural conservation of the receptor-binding domain (RBD) from representative sarbecoviruses and chose S2H97, a previously reported RBD antibody with ideal breadth and resistance to escape, as a template for computational design to enhance the neutralization activity and spectrum. A total of 35 designs were purified for evaluation. The neutralizing activity of a large proportion of these designs against multiple variants was increased from several to hundreds of times. Molecular dynamics simulation suggested that extra interface contacts and enhanced intermolecular interactions between the RBD and the designed antibodies are established. After light and heavy chain reconstitution, AI-1028, with five complementarity determining regions optimized, showed the best neutralizing activity across all tested sarbecoviruses, including SARS-CoV, multiple SARS-CoV-2 variants, and bat-derived viruses. AI-1028 recognized the same cryptic RBD epitope as the parental prototype antibody. In addition to computational design, chemically synthesized nanobody libraries are also a precious resource for rapid antibody development. By applying distinct RBDs as baits for reciprocal screening, we identified two novel nanobodies with broad activities. These findings provide potential pan-sarbecovirus neutralizing drugs and highlight new pathways to rapidly optimize therapeutic candidates when novel SARS-CoV-2 escape variants or new zoonotic coronaviruses emerge. 
IMPORTANCE The subgenus Sarbecovirus includes human SARS-CoV, SARS-CoV-2, and hundreds of genetically related bat viruses. The continuous evolution of SARS-CoV-2 has led to the striking evasion of neutralizing antibody (NAb) drugs and convalescent plasma. Antibodies with broad activity across sarbecoviruses would be helpful to combat current SARS-CoV-2 mutations and longer term animal virus spillovers. The study of pan-sarbecovirus NAbs described here is significant for the following reasons. First, we established a structure-based computational pipeline to design and optimize NAbs to obtain more potent and broader neutralizing activity across multiple sarbecoviruses. Second, we screened and identified nanobodies from a highly diversified synthetic library with a broad neutralizing spectrum using an elaborate screening strategy. These methodologies provide guidance for the rapid development of antibody therapeutics against emerging pathogens with highly variable characteristics.
Affiliation(s)
- Xuehua Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Huarui Duan
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xiuying Liu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xinhui Zhang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Shengnan Pan
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Fangyuan Zhang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Peixiang Gao
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Bo Liu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Jian Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xiaojing Chi
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Wei Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| |
|
39
|
Wang L, Bao Y, Yu F, Zhu W, Wang JL, Yang J, Xie H, Huang D. Development of gene model combined with machine learning technology to predict for advanced atherosclerotic plaques. Clin Neurol Neurosurg 2023; 231:107819. [PMID: 37315377 DOI: 10.1016/j.clineuro.2023.107819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/03/2023] [Accepted: 06/04/2023] [Indexed: 06/16/2023]
Abstract
BACKGROUND Atherosclerosis, as a major cause of stroke, is responsible for a quarter of deaths worldwide. In particular, rupture of late-stage plaques in large vessels such as the carotid artery can lead to serious cardiovascular disease. The aim of our study was to establish a genetic model combined with machine learning techniques to screen out gene signatures and predict advanced atherosclerotic plaques. METHODS The microarray datasets GSE28829 and GSE43292, publicly available from the Gene Expression Omnibus database, were used to screen for potential predictive genes. Differentially expressed genes (DEGs) were identified using the "limma" R package. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of these DEGs were performed with Metascape. The Random Forest (RF) algorithm was then applied to screen out the top 30 genes that contributed the most. The expression data of the top 30 DEGs were converted into a "Gene Score". Finally, we developed a model based on an artificial neural network (ANN) to predict advanced atherosclerotic plaques. The model was then validated in an independent test dataset, GSE104140. RESULTS A total of 176 DEGs were identified in the training datasets. GO and KEGG enrichment analysis revealed that these genes were enriched in leukocyte-mediated immune response, cytokine-cytokine interactions, and immunoinflammatory signaling. Further, the top 30 genes (including 25 upregulated and 5 downregulated DEGs) were screened as predictors by the RF algorithm. The predictive model showed significant predictive value (AUC = 0.913) in the training datasets and was validated with the independent dataset GSE104140 (AUC = 0.827). CONCLUSION In the present study, our prediction model was established and showed satisfactory predictive power in both training and test datasets.
In addition, this is the first study to adopt bioinformatics methods combined with machine learning techniques (RF and ANN) to explore and predict advanced atherosclerotic plaques. However, further investigations are needed to verify the screened DEGs and the predictive effectiveness of this model.
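The AUC values quoted in this abstract (0.913 on the training datasets, 0.827 on GSE104140) measure how often a model scores a positive sample above a negative one. As a minimal sketch, and not the authors' code, the metric can be computed directly from that pairwise-ranking definition. The function name `auroc` and the toy labels and scores are assumptions made up for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve from its pairwise-ranking definition.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs that are ranked correctly.
    # A tie between a positive and a negative score counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The double loop is quadratic in the number of samples, which is fine for a few hundred microarray samples; a rank-based implementation would be preferable for large datasets.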
Affiliation(s)
- Lufeng Wang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Yiwen Bao
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Fei Yu
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Wenxia Zhu
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jun Lang Wang
- Department of Imaging, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jie Yang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Hongrong Xie
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.
| | - Dongya Huang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.
| |
|
40
|
Valdés-Tresanco MS, Valdés-Tresanco ME, Jiménez-Gutiérrez DE, Moreno E. Structural Modeling of Nanobodies: A Benchmark of State-of-the-Art Artificial Intelligence Programs. Molecules 2023; 28:molecules28103991. [PMID: 37241731 DOI: 10.3390/molecules28103991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 05/01/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023] Open
Abstract
The number of applications for nanobodies is steadily expanding, positioning these molecules as fast-growing biologic products in the biotechnology market. Several of their applications require protein engineering, which in turn would greatly benefit from having a reliable structural model of the nanobody of interest. However, as with antibodies, the structural modeling of nanobodies is still a challenge. With the rise of artificial intelligence (AI), several methods have been developed in recent years that attempt to solve the problem of protein modeling. In this study, we have compared the performance in nanobody modeling of several state-of-the-art AI-based programs, either designed for general protein modeling, such as AlphaFold2, OmegaFold, ESMFold, and Yang-Server, or specifically designed for antibody modeling, such as IgFold, and Nanonet. While all these programs performed rather well in constructing the nanobody framework and CDRs 1 and 2, modeling CDR3 still represents a big challenge. Interestingly, tailoring an AI method for antibody modeling does not necessarily translate into better results for nanobodies.
Affiliation(s)
| | - Mario E Valdés-Tresanco
- Centre for Molecular Simulations and Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
| | | | - Ernesto Moreno
- Faculty of Basic Sciences, University of Medellin, Medellin 050026, Colombia
| |
|
41
|
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with a disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | | | - Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
| | | | | | | |
|
42
|
Ruffolo JA, Chu LS, Mahajan SP, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun 2023; 14:2389. [PMID: 37185622 PMCID: PMC10129313 DOI: 10.1038/s41467-023-38063-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Received: 04/21/2022] [Accepted: 04/14/2023] [Indexed: 05/17/2023] Open
Abstract
Antibodies have the capacity to bind a diverse set of antigens, and they have become critical therapeutics and diagnostic molecules. The binding of antibodies is facilitated by a set of six hypervariable loops that are diversified through genetic recombination and mutation. Even with recent advances, accurate structural prediction of these loops remains a challenge. Here, we present IgFold, a fast deep learning method for antibody structure prediction. IgFold consists of a pre-trained language model trained on 558 million natural antibody sequences followed by graph networks that directly predict backbone atom coordinates. IgFold predicts structures of similar or better quality than alternative methods (including AlphaFold) in significantly less time (under 25 s). Accurate structure prediction on this timescale makes possible avenues of investigation that were previously infeasible. As a demonstration of IgFold's capabilities, we predicted structures for 1.4 million paired antibody sequences, providing structural insights to 500-fold more antibodies than have experimentally determined structures.
Affiliation(s)
- Jeffrey A Ruffolo
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Lee-Shin Chu
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Jeffrey J Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, 21218, USA.
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA.
| |
|
43
|
Chowdhury RA, Green AAS, Park CS, Maclennan JE, Clark NA. Topological defect coarsening in quenched smectic-C films analyzed using artificial neural networks. Phys Rev E 2023; 107:044701. [PMID: 37198757 DOI: 10.1103/physreve.107.044701] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/22/2022] [Accepted: 01/17/2023] [Indexed: 05/19/2023]
Abstract
Mechanically quenching a thin film of smectic-C liquid crystal results in the formation of a dense array of thousands of topological defects in the director field. The subsequent rapid coarsening of the film texture by the mutual annihilation of defects of opposite sign has been captured using high-speed, polarized light video microscopy. The temporal evolution of the texture has been characterized using an object-detection convolutional neural network to determine the defect locations, and a binary classification network customized to evaluate the brush orientation dynamics around the defects in order to determine their topological signs. At early times following the quench, inherent limits on the spatial resolution result in undercounting of the defects and deviations from expected behavior. At intermediate to late times, the observed annihilation dynamics scale in agreement with theoretical predictions and simulations of the 2D XY model.
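For an XY-like director field, the topological sign of a defect is the winding number of the director angle along a closed loop around the defect core. The study summarized here assigns signs with a classification network applied to brush textures; the sketch below instead shows the textbook winding-number calculation on director angles that are assumed to be already known. It is an illustration only, and the function name and sampling scheme are hypothetical.

```python
import math


def winding_number(angles):
    """Topological charge of a 2D vector-like director field.

    angles: director angles in radians, sampled in counter-clockwise
    order at points on a closed loop around a single defect.
    Adjacent samples are assumed to differ by less than pi.
    """
    total = 0.0
    n = len(angles)
    for i in range(n):
        step = angles[(i + 1) % n] - angles[i]
        # Wrap each angular step into [-pi, pi) so that branch cuts
        # in the angle representation do not add spurious turns.
        step = (step + math.pi) % (2.0 * math.pi) - math.pi
        total += step
    # The accumulated rotation is an integer multiple of 2*pi.
    return round(total / (2.0 * math.pi))


# Synthetic loops (hypothetical data): the director rotates once with
# the loop for a +1 defect and once against it for a -1 defect.
n_samples = 16
loop = [2.0 * math.pi * k / n_samples for k in range(n_samples)]
plus_defect = winding_number(loop)
minus_defect = winding_number([-a for a in loop])
uniform = winding_number([0.3] * n_samples)
print(plus_defect, minus_defect, uniform)
```

Because annihilation removes defects in pairs of opposite sign, the sum of the charges over the film is conserved while the defect count falls during coarsening.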
Affiliation(s)
- Ravin A Chowdhury
- Department of Physics and Soft Materials Research Center, University of Colorado, Boulder, Colorado 80309, USA
| | - Adam A S Green
- Department of Physics and Soft Materials Research Center, University of Colorado, Boulder, Colorado 80309, USA
| | - Cheol S Park
- Department of Physics and Soft Materials Research Center, University of Colorado, Boulder, Colorado 80309, USA
| | - Joseph E Maclennan
- Department of Physics and Soft Materials Research Center, University of Colorado, Boulder, Colorado 80309, USA
| | - Noel A Clark
- Department of Physics and Soft Materials Research Center, University of Colorado, Boulder, Colorado 80309, USA
| |
|
44
|
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962 DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and experimental difficulties limit their availability. The proposal of 3D structural models is consequently an appealing alternative. The release of the AlphaFold Deep Learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures and many less-used protein local conformations (PolyProline II helices, types of γ-turns, β-turns and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France.
| |
|
45
|
Rajapaksa S, Konagurthu AS, Lesk AM. Sequence and structure alignments in post-AlphaFold era. Curr Opin Struct Biol 2023; 79:102539. [PMID: 36753924 DOI: 10.1016/j.sbi.2023.102539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 01/02/2023] [Indexed: 02/09/2023]
Abstract
Sequence alignment is fundamental for analyzing protein structure and function. For all but closely related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and similar methods) in producing high-quality structure predictions, we suggest that when aligning homologous proteins that lack experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental), to support this hypothesis.
Affiliation(s)
- Sandun Rajapaksa
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
| | - Arun S Konagurthu
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
| | - Arthur M Lesk
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, 16802, Pennsylvania, USA.
| |
|
46
|
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023] Open
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, per definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn which has been found to outperform most other predictors in a comparative assessment study recently. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. 
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany
| |
|
47
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the scientific community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have provided preliminary evidence of its potential. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
| |
|
48
|
Chakravarty D, Schafer JW, Porter LL. Distinguishing features of fold-switching proteins. Protein Sci 2023; 32:e4596. [PMID: 36782353 PMCID: PMC9951197 DOI: 10.1002/pro.4596] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/23/2022] [Revised: 01/30/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023]
Abstract
Though many folded proteins assume one stable structure that performs one function, a small-but-increasing number remodel their secondary and tertiary structures and change their functions in response to cellular stimuli. These fold-switching proteins regulate biological processes and are associated with autoimmune dysfunction, severe acute respiratory syndrome coronavirus-2 infection, and more. Despite their biological importance, it is difficult to computationally predict fold switching. With the aim of advancing computational prediction and experimental characterization of fold switchers, this review discusses several features that distinguish fold-switching proteins from their single-fold and intrinsically disordered counterparts. First, the isolated structures of fold switchers are less stable and more heterogeneous than single folders but more stable and less heterogeneous than intrinsically disordered proteins (IDPs). Second, the sequences of single fold, fold switching, and intrinsically disordered proteins can evolve at distinct rates. Third, proteins from these three classes are best predicted using different computational techniques. Finally, late-breaking results suggest that single folders, fold switchers, and IDPs have distinct patterns of residue-residue coevolution. The review closes by discussing high-throughput and medium-throughput experimental approaches that might be used to identify new fold-switching proteins.
Affiliation(s)
- Devlina Chakravarty
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Joseph W. Schafer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Lauren L. Porter
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
| |
|
49
|
Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus). Sci Rep 2023; 13:3019. [PMID: 36810752 PMCID: PMC9944912 DOI: 10.1038/s41598-023-29826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Non-synonymous variation (NSV) of protein-coding genes represents raw material for selection to improve adaptation to the diverse environmental scenarios in wild and livestock populations. Many aquatic species face variations in temperature, salinity and biological factors throughout their distribution range, which is reflected by the presence of allelic clines or local adaptation. The turbot (Scophthalmus maximus) is a flatfish of great commercial value with a flourishing aquaculture, which has promoted the development of genomic resources. In this study, we developed the first atlas of NSVs in the turbot genome by resequencing 10 individuals from the Northeast Atlantic Ocean. More than 50,000 NSVs were detected in the ~21,500 coding genes of the turbot genome, and we selected 18 NSVs to be genotyped using a single MassARRAY multiplex on 13 wild populations and three turbot farms. We detected signals of divergent selection on several genes related to growth, circadian rhythms, osmoregulation and oxygen binding in the different scenarios evaluated. Furthermore, we explored the impact of the identified NSVs on the 3D structure and functional relationships of the corresponding proteins. In summary, our study provides a strategy to identify NSVs in species with consistently annotated and assembled genomes to ascertain their role in adaptation.
|
50
|
Sala D, Hildebrand PW, Meiler J. Biasing AlphaFold2 to predict GPCRs and kinases with user-defined functional or structural properties. Front Mol Biosci 2023; 10:1121962. [PMID: 36876042 PMCID: PMC9978208 DOI: 10.3389/fmolb.2023.1121962] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/12/2022] [Accepted: 01/31/2023] [Indexed: 02/18/2023] Open
Abstract
Determining the three-dimensional structure of proteins in their native functional states has been a longstanding challenge in structural biology. While integrative structural biology has been the most effective way to get a high-accuracy structure of different conformations and mechanistic insights for larger proteins, advances in deep machine-learning algorithms have paved the way to fully computational predictions. In this field, AlphaFold2 (AF2) pioneered ab initio high-accuracy single-chain modeling. Since then, different customizations have expanded the number of conformational states accessible through AF2. Here, we further expanded AF2 with the aim of enriching an ensemble of models with user-defined functional or structural features. We tackled two common protein families for drug discovery, G-protein-coupled receptors (GPCRs) and kinases. Our approach automatically identifies the best templates satisfying the specified features and combines those with genetic information. We also introduced the possibility of shuffling the selected templates to expand the space of solutions. In our benchmark, models showed the intended bias and great accuracy. Our protocol can thus be exploited for modeling user-defined conformational states in an automatic fashion.
Affiliation(s)
- Davide Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, Leipzig, Germany
| | - Peter W. Hildebrand
- Institute of Medical Physics and Biophysics, Faculty of Medicine, University of Leipzig, Leipzig, Germany
| | - Jens Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Department of Chemistry, Vanderbilt University, Nashville, TN, United States
| |
|