1
Li C, Luo Y, Xie Y, Zhang Z, Liu Y, Zou L, Xiao F. Structural and functional prediction, evaluation, and validation in the post-sequencing era. Comput Struct Biotechnol J 2024; 23:446-451. [PMID: 38223342 PMCID: PMC10787220 DOI: 10.1016/j.csbj.2023.12.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024] Open
Abstract
The surge of genome sequencing data has highlighted a substantial number of genetic variants of uncertain significance (VUS). Deciphering the VUS discovered by sequencing poses a major challenge in the post-sequencing era. Although experimental assays have progressed in classifying VUS, only a tiny fraction of human genes have been explored experimentally. Thus, state-of-the-art in silico functional predictors of VUS are urgently needed. Artificial intelligence (AI) is an invaluable tool for interpreting VUS with high efficiency and accuracy. An increasing number of studies indicate that AI has accelerated the interpretation of VUS, and our group has already used AI to develop protein structure-based prediction models. In this review, we provide an overview of previous research on AI-based prediction of missense variants and discuss the challenges and opportunities for protein structure-based variant prediction in the post-sequencing era.
Affiliation(s)
- Chang Li
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Yixuan Luo
  - Beijing Normal University, Beijing, China
- Yibo Xie
  - Information Center, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Zaifeng Zhang
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Ye Liu
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Lihui Zou
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Fei Xiao
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - Beijing Normal University, Beijing, China
2
Weller JA, Rohs R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J Chem Inf Model 2024. [PMID: 39058534 DOI: 10.1021/acs.jcim.4c01193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2024]
Abstract
Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE's hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
Affiliation(s)
- Jesse A Weller
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
3
Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024:1-27. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy, profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity have significant limitations, and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms, grouped by the types of data that they exploit, using selected examples. EXPERT OPINION Polypharmacology prediction has made impressive progress over the last decades and has contributed to the identification of many off-targets. However, data incompleteness currently prevents most approaches from comprehensively predicting selectivity. Moreover, the limited agreement on model assessment complicates the identification of the best algorithms, which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.
Affiliation(s)
- Leticia Manen-Freixa
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
4
Sawada R, Sakajiri Y, Shibata T, Yamanishi Y. Predicting therapeutic and side effects from drug binding affinities to human proteome structures. iScience 2024; 27:110032. [PMID: 38868195 PMCID: PMC11167438 DOI: 10.1016/j.isci.2024.110032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 04/08/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024] Open
Abstract
Evaluation of the binding affinities of drugs to proteins is a crucial process for identifying drug pharmacological actions, but it requires three-dimensional structures of proteins. Herein, we propose novel computational methods to predict the therapeutic indications and side effects of drug candidate compounds from the binding affinities to human protein structures on a proteome-wide scale. Large-scale docking simulations were performed for 7,582 drugs with 19,135 protein structures revealed by AlphaFold (including experimentally unresolved proteins), and machine learning models on the proteome-wide binding affinity score (PBAS) profiles were constructed. We demonstrated the usefulness of the method for predicting the therapeutic indications for 559 diseases and side effects for 285 toxicities. The method enabled the prediction of drug indications for which the related protein structures had not been experimentally determined and the successful extraction of proteins eliciting the side effects. The proposed method will be useful in various applications in drug discovery.
Affiliation(s)
- Ryusuke Sawada
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
  - Department of Pharmacology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Yuko Sakajiri
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
  - Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Japan
- Tomokazu Shibata
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
  - Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Japan
5
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning method developed by DeepMind, became a game changer in the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 breaks through the barriers in predicting protein structures, much room remains for further study. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (Pros) and considerations (Cons) of using AF2, this review summarizes its diverse applications, including protein structure prediction, dynamic changes, point mutations, integration of language models and experimental data, protein complexes, and protein-peptide interactions. It underscores recent advancements in the efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and their potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Zehua Wen
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Shi-Wei Liu
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Lihong Zhang
  - Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
- Cierra Finley
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Ho-Jin Lee
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA; Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA
- Hua-Jun Shawn Fan
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
6
Passi G, Lieberman S, Zahdeh F, Murik O, Renbaum P, Beeri R, Linial M, May D, Levy-Lahad E, Schneidman-Duhovny D. Discovering predisposing genes for hereditary breast cancer using deep learning. Brief Bioinform 2024; 25:bbae346. [PMID: 39038933 PMCID: PMC11262808 DOI: 10.1093/bib/bbae346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 04/18/2024] [Accepted: 07/04/2024] [Indexed: 07/24/2024] Open
Abstract
Breast cancer (BC) is the most common malignancy affecting Western women today. It is estimated that as many as 10% of BC cases can be attributed to germline variants. However, the genetic basis of the majority of familial BC cases has yet to be identified. Discovering predisposing genes contributing to familial BC is challenging due to their presumed rarity, low penetrance, and complex biological mechanisms. Here, we focused on an analysis of rare missense variants in a cohort of 12 families of Middle Eastern origin characterized by a high incidence of BC cases. We devised a novel, high-throughput variant analysis pipeline adapted for family studies, which aims to analyze variants at the protein level by employing state-of-the-art machine learning models and three-dimensional protein structural analysis. Using our pipeline, we analyzed 1218 rare missense variants that are shared between affected family members and classified 80 genes as candidate pathogenic. Among these genes, we found significant functional enrichment in peroxisomal and mitochondrial biological pathways, which segregated across seven families in the study and covered diverse ethnic groups. We present multiple lines of evidence that peroxisomal and mitochondrial pathways play an important, yet underappreciated, role in both germline BC predisposition and BC survival.
Affiliation(s)
- Gal Passi
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Sari Lieberman
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
- Fouad Zahdeh
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Omer Murik
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Paul Renbaum
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Rachel Beeri
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Michal Linial
  - Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 91904, Israel
- Dalit May
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - Clalit Health Services, Jerusalem, Israel
- Ephrat Levy-Lahad
  - The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
- Dina Schneidman-Duhovny
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
7
Gu S, Yang Y, Zhao Y, Qiu J, Wang X, Tong HHY, Liu L, Wan X, Liu H, Hou T, Kang Y. Evaluation of AlphaFold2 Structures for Hit Identification across Multiple Scenarios. J Chem Inf Model 2024; 64:3630-3639. [PMID: 38630855 DOI: 10.1021/acs.jcim.3c01976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
The introduction of AlphaFold2 (AF2) has sparked significant enthusiasm and generated extensive discussion within the scientific community, particularly among drug discovery researchers. Although previous studies have addressed the performance of AF2 structures in virtual screening (VS), a more comprehensive investigation is still necessary considering the paramount importance of structural accuracy in drug design. In this study, we evaluate the performance of AF2 structures in VS across three common drug discovery scenarios: targets with holo, apo, and AF2 structures; targets with only apo and AF2 structures; and targets exclusively with AF2 structures. We utilized both the traditional physics-based Glide and the deep-learning-based scoring function RTMscore to rank the compounds in the DUD-E, DEKOIS 2.0, and DECOY data sets. The results demonstrate that, overall, the performance of VS on AF2 structures is comparable to that on apo structures but notably inferior to that on holo structures across diverse scenarios. Moreover, when a target has only an AF2 structure, selecting the holo structure of the target from different subtypes within the same protein family produces results comparable to those of the AF2 structure for VS on the data set of the AF2 structures, and significantly better results than the AF2 structures on its own data set. This indicates that utilizing AF2 structures for docking-based VS may not yield the most satisfactory outcomes, even when only AF2 structures are available. Moreover, we rule out the possibility that the variations in VS performance between the binding pockets of AF2 and holo structures arise from the differences in their biological assembly composition.
Affiliation(s)
- Shukai Gu
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yuwei Yang
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Yihao Zhao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jiayue Qiu
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
- Henry Hoi Yee Tong
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Liwei Liu
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
- Xiaozhe Wan
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
- Huanxiang Liu
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
8
Ruiz-Serra V, Valentini S, Madroñero S, Valencia A, Porta-Pardo E. 3Dmapper: a command line tool for BioBank-scale mapping of variants to protein structures. Bioinformatics 2024; 40:btae171. [PMID: 38565273 PMCID: PMC11018535 DOI: 10.1093/bioinformatics/btae171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 02/09/2024] [Accepted: 03/30/2024] [Indexed: 04/04/2024] Open
Abstract
MOTIVATION The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures. However, most of these tools are web-based and not well-suited for large-scale genomic data analysis. RESULTS To address this issue, we introduce 3Dmapper, a stand-alone command-line tool developed in Python and R. It systematically maps annotated protein positions and variants to protein structures, providing a solution that is both efficient and reliable. AVAILABILITY AND IMPLEMENTATION https://github.com/vicruiser/3Dmapper.
Affiliation(s)
- Victoria Ruiz-Serra
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Samuel Valentini
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
- Sergi Madroñero
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC)
  - Institució Catalana de Recerca Avançada (ICREA)
- Eduard Porta-Pardo
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
9
Weller JA, Rohs R. DrugHIVE: Target-specific spatial drug design and optimization with a hierarchical generative model. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.22.573155. [PMID: 38187658 PMCID: PMC10769420 DOI: 10.1101/2023.12.22.573155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Rapid advancement in the computational methods of structure-based drug design has led to their widespread adoption as key tools in the early drug development process. Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a significant impact on the success of early drug design efforts. However, screening-based methods are limited in their scalability due to computational limits and the sheer scale of drug-like space. An approach within the quickly evolving field of artificial intelligence (AI), deep generative modeling, is extending the reach of molecular design beyond classical methods by learning the fundamental intra- and inter-molecular relationships in drug-target systems from existing data. In this work, we introduce DrugHIVE, a deep hierarchical structure-based generative model that enables fine-grained control over molecular generation. Our model outperforms state-of-the-art autoregressive and diffusion-based methods on common benchmarks and in speed of generation. Here, we demonstrate DrugHIVE's capacity to accelerate a wide range of common drug design tasks such as de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our method is highly scalable and can be applied to high-confidence AlphaFold-predicted receptors, extending our ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
10
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Affiliation(s)
- Micholas Dean Smith
  - University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
- L Darryl Quarles
  - Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
- Omar Demerdash
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Jeremy C Smith
  - University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
11
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024] Open
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Kirill E. Medvedev
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
12
Aspromonte MC, Nugnes MV, Quaglia F, Bouharoua A, Tosatto SCE, Piovesan D. DisProt in 2024: improving function annotation of intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D434-D441. [PMID: 37904585 PMCID: PMC10767923 DOI: 10.1093/nar/gkad928] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 11/01/2023] Open
Abstract
DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.
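The abstract above refers to disorder predictions inferred from AlphaFold2 pLDDT scores. One simple way such an inference could be made is to flag contiguous stretches of low-confidence residues, as in the minimal sketch below. The function name, the cutoff of 70, the minimum region length of 3, and the score values are all illustrative assumptions; none of them is taken from the DisProt paper.

```python
def low_confidence_regions(plddt, cutoff=70.0, min_len=3):
    """Return 1-based (start, end) residue ranges where pLDDT < cutoff.

    cutoff and min_len are hypothetical defaults chosen for illustration.
    """
    regions = []
    start = None  # 1-based start of the current low-confidence run, if any
    for i, score in enumerate(plddt, start=1):
        if score < cutoff:
            if start is None:
                start = i
        else:
            # A run just ended at residue i - 1; keep it if long enough.
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_len:
        regions.append((start, len(plddt)))
    return regions


# Made-up per-residue scores for an 11-residue toy sequence.
toy_plddt = [92, 90, 55, 48, 60, 91, 65, 93, 40, 42, 45]
regions = low_confidence_regions(toy_plddt)
print(regions)
```

Regions found this way could then be compared with curated disorder annotations; the single low-scoring residue at position 7 is ignored because it is shorter than the assumed minimum length.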
Affiliation(s)
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Adel Bouharoua
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
13
Liu BH, Liu M, Radhakrishnan S, Jaladanki CK, Gao C, Tang JP, Kumari K, Go ML, Vu KAL, Seo HS, Song K, Tian X, Feng L, Tan JL, Bassal MA, Arthanari H, Qi J, Dhe-Paganon S, Fan H, Tenen DG, Chai L. Targeting transcription factors through an IMiD independent zinc finger domain. bioRxiv 2024:2024.01.03.574032. [PMID: 38260640 PMCID: PMC10802279 DOI: 10.1101/2024.01.03.574032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Immunomodulatory imide drugs (IMiDs) degrade specific C2H2 zinc finger degrons in transcription factors, making them effective against certain cancers. SALL4, a cancer driver, contains seven C2H2 zinc fingers in four clusters, including an IMiD degron in zinc finger cluster two (ZFC2). Surprisingly, IMiDs do not inhibit growth of SALL4 expressing cancer cells. To overcome this limit, we focused on a non-IMiD degron, SALL4 zinc finger cluster four (ZFC4). By combining AlphaFold and the ZFC4-DNA crystal structure, we identified a potential ZFC4 drug pocket. Utilizing an in silico docking algorithm and cell viability assays, we screened chemical libraries and discovered SH6, which selectively targets SALL4-expressing cancer cells. Mechanistic studies revealed that SH6 degrades SALL4 protein through the CUL4A/CRBN pathway, while deletion of ZFC4 abolished this activity. Moreover, SH6 led to significant 62% tumor growth inhibition of SALL4+ xenografts in vivo and demonstrated good bioavailability in pharmacokinetic studies. In summary, these studies represent a new approach for IMiD independent drug discovery targeting C2H2 transcription factors in cancer.
14
Terwilliger TC, Liebschner D, Croll TI, Williams CJ, McCoy AJ, Poon BK, Afonine PV, Oeffner RD, Richardson JS, Read RJ, Adams PD. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat Methods 2024; 21:110-116. [PMID: 38036854 PMCID: PMC10776388 DOI: 10.1038/s41592-023-02087-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/30/2023] [Accepted: 10/11/2023] [Indexed: 12/02/2023]
Abstract
Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
Affiliation(s)
- Thomas C Terwilliger
- New Mexico Consortium, Los Alamos, NM, USA.
- Los Alamos National Laboratory, Los Alamos, NM, USA.
- Dorothee Liebschner
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Tristan I Croll
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Airlie J McCoy
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Billy K Poon
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Pavel V Afonine
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Robert D Oeffner
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Randy J Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Paul D Adams
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, CA, USA
15
Cen LP, Ng TK, Ji J, Lin JW, Yao Y, Yang R, Dong G, Cao Y, Chen C, Yao SQ, Wang WY, Huang Z, Qiu K, Pang CP, Liu Q, Zhang M. Artificial Intelligence-based database for prediction of protein structure and their alterations in ocular diseases. Database (Oxford) 2023; 2023:baad083. [PMID: 38109881 PMCID: PMC10727695 DOI: 10.1093/database/baad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 07/17/2023] [Accepted: 12/15/2023] [Indexed: 12/20/2023]
Abstract
The aim of the study is to establish an online database for predicting protein structures altered in ocular diseases using the AlphaFold2 and RoseTTAFold algorithms. In total, 726 genes of multiple ocular diseases were collected for protein structure prediction. Both the AlphaFold2 and RoseTTAFold algorithms were built locally using the open-source codebases. A dataset with 48 protein structures from the Protein Data Bank (PDB) was adopted for algorithm set-up validation. A website was built to match ocular genes with the corresponding predicted tertiary protein structures for each amino acid sequence. The predicted local distance difference test-Cα (pLDDT) and template modeling (TM) scores of the validation protein structures and the selected ocular genes were evaluated. Molecular dynamics and molecular docking simulations were performed to demonstrate the applications of the predicted structures. For the validation dataset, 70.8% of the predicted protein structures showed pLDDT greater than 90. Compared to the PDB structures, 100% of the AlphaFold2-predicted structures and 97.9% of the RoseTTAFold-predicted structures showed TM scores greater than 0.5. In total, 1329 amino acid sequences of 430 ocular disease-related genes have been predicted, of which 75.9% showed pLDDT greater than 70 for the wildtype sequences and 76.1% for the variant sequences. Small molecule docking and molecular dynamics simulations revealed that the predicted protein structures with higher confidence scores showed molecular characteristics similar to those of the structures from the PDB. We have developed an ocular protein structure database (EyeProdb) for ocular disease, which has been released to the public and will facilitate biological investigations and structure-based drug development for ocular diseases. Database URL: http://eyeprodb.jsiec.org.
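The percentages reported above (for example, the share of structures with pLDDT greater than 90 or greater than 70) are fractions of a set of per-structure scores that exceed a cutoff. A minimal sketch of that arithmetic follows; the function name and the score values are hypothetical and are not data from EyeProdb.

```python
def fraction_above(scores, threshold):
    """Fraction of scores strictly greater than the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s > threshold) / len(scores)


# Made-up per-structure mean pLDDT values, for illustration only.
toy_plddt = [95.2, 88.0, 71.5, 64.3]

very_high = fraction_above(toy_plddt, 90)  # 1 of 4 structures
confident = fraction_above(toy_plddt, 70)  # 3 of 4 structures
print(f"pLDDT > 90: {very_high:.1%}; pLDDT > 70: {confident:.1%}")
```

The same helper could be applied to TM scores with a cutoff of 0.5, matching the comparison against PDB structures described in the abstract.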
Affiliation(s)
- Tsz Kin Ng
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
- Jie Ji
- Network & Information Centre, Shantou University, 243 Daxue Road, Shantou, Guangdong 515063, China
- Jian-Wei Lin
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Yao Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Rucui Yang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Geng Dong
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Yingjie Cao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Chongbo Chen
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shi-Qi Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Wen-Ying Wang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Zijing Huang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Kunliang Qiu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Chi Pui Pang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
- Qingping Liu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Mingzhi Zhang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
16
Malhotra N, Khatri S, Kumar A, Arun A, Daripa P, Fatihi S, Venkadesan S, Jain N, Thukral L. AI-based AlphaFold2 significantly expands the structural space of the autophagy pathway. Autophagy 2023; 19:3201-3220. [PMID: 37516933 PMCID: PMC10621275 DOI: 10.1080/15548627.2023.2238578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 07/08/2023] [Accepted: 07/14/2023] [Indexed: 07/31/2023] Open
Abstract
ABBREVIATIONS: AF2: AlphaFold2; AF2-Mult: AlphaFold2 multimer; ATG: autophagy-related; CTD: C-terminal domain; ECTD: extreme C-terminal domain; FR: flexible region; MD: molecular dynamics; NTD: N-terminal domain; pLDDT: predicted local distance difference test; UBL: ubiquitin-like.
Affiliation(s)
- Nidhi Malhotra
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Shantanu Khatri
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
- Ajit Kumar
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
- Akanksha Arun
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
- Purba Daripa
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Saman Fatihi
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
- Niyati Jain
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Lipi Thukral
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
17
James JK, Norland K, Johar AS, Kullo IJ. Deep generative models of LDLR protein structure to predict variant pathogenicity. J Lipid Res 2023; 64:100455. [PMID: 37821076 PMCID: PMC10696256 DOI: 10.1016/j.jlr.2023.100455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 09/16/2023] [Accepted: 10/05/2023] [Indexed: 10/13/2023] Open
Abstract
The complex structure and function of low density lipoprotein receptor (LDLR) makes classification of protein-coding missense variants challenging. Deep generative models, including Evolutionary model of Variant Effect (EVE), Evolutionary Scale Modeling (ESM), and AlphaFold 2 (AF2), have enabled significant progress in the prediction of protein structure and function. ESM and EVE directly estimate the likelihood of a variant sequence but are purely data-driven and challenging to interpret. AF2 predicts LDLR structures, but variant effects are explicitly modeled by estimating changes in stability. We tested the effectiveness of these models for predicting variant pathogenicity compared to established methods. AF2 produced two distinct conformations based on a novel hinge mechanism. Within ESM's hidden space, benign and pathogenic variants had different distributions. In EVE, these distributions were similar. EVE and ESM were comparable to Polyphen-2, SIFT, REVEL, and Primate AI for predicting binary classifications in ClinVar. However, they were more strongly correlated with experimental measures of LDL uptake. AF2 poorly performed in these tasks. Using the UK Biobank to compare association with clinical phenotypes, ESM and EVE were more strongly associated with serum LDL-C than Polyphen-2. ESM was able to identify variants with more extreme LDL-C levels than EVE and had a significantly stronger association with atherosclerotic cardiovascular disease. In conclusion, AF2 predicted LDLR structures do not accurately model variant pathogenicity. ESM and EVE are competitive with prior scoring methods for prediction based on binary classifications in ClinVar but are superior based on correlations with experimental assays and clinical phenotypes.
Affiliation(s)
- Jose K James
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Kristjan Norland
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Angad S Johar
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA; Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
18
Kosoglu K, Aydin Z, Tuncbag N, Gursoy A, Keskin O. Structural coverage of the human interactome. Brief Bioinform 2023; 25:bbad496. [PMID: 38180828 PMCID: PMC10768791 DOI: 10.1093/bib/bbad496] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/04/2023] [Revised: 11/16/2023] [Accepted: 11/30/2023] [Indexed: 01/07/2024] Open
Abstract
Complex biological processes in cells are embedded in the interactome, representing the complete set of protein-protein interactions. Mapping and analyzing the protein structures are essential to fully comprehending these processes' molecular details. Therefore, knowing the structural coverage of the interactome is important to show the current limitations. Structural modeling of protein-protein interactions requires accurate protein structures. In this study, we mapped all experimental structures to the reference human proteome. Later, we found the enrichment in structural coverage when complementary methods such as homology modeling and deep learning (AlphaFold) were included. We then collected the interactions from the literature and databases to form the reference human interactome, resulting in 117 897 non-redundant interactions. When we analyzed the structural coverage of the interactome, we found that the number of experimentally determined protein complex structures is scarce, corresponding to 3.95% of all binary interactions. We also analyzed known and modeled structures to potentially construct the structural interactome with a docking method. Our analysis showed that 12.97% of the interactions from HuRI and 73.62% and 32.94% from the filtered versions of STRING and HIPPIE could potentially be modeled with high structural coverage or accuracy, respectively. Overall, this paper provides an overview of the current state of structural coverage of the human proteome and interactome.
Affiliation(s)
- Kayra Kosoglu
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
- Zeynep Aydin
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
- Nurcan Tuncbag
- School of Medicine, Koc University, 34450 Istanbul, Turkey
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
- Attila Gursoy
- Department of Computer Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
- Ozlem Keskin
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
19
Rosignoli S, di Paola L, Paiardini A. PyPCN: protein contact networks in PyMOL. Bioinformatics 2023; 39:btad675. [PMID: 37941462 PMCID: PMC10641099 DOI: 10.1093/bioinformatics/btad675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/25/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION: Protein contact networks (PCNs) represent the 3D structure of a protein using network formalism. Inter-residue contacts are described as binary adjacency matrices, which are derived from the graph representation of residues (as α-carbons, β-carbons or centroids) and Euclidean distances according to defined thresholds. Functional characterization algorithms are computed on binary adjacency matrices to unveil allosteric, dynamic, and interaction mechanisms in proteins. Such strategies are usually applied in a combinatorial manner, although rarely in seamless and user-friendly implementations. RESULTS: PyPCN is a plugin for PyMOL wrapping more than twenty PCN algorithms and metrics in an easy-to-use graphical user interface, to support PCN analysis. The plugin accepts 3D structures from the Protein Data Bank, user-provided PDBs, or precomputed adjacency matrices. The results are directly mapped to 3D protein structures and organized into interactive diagrams for their visualization. A dedicated graphical user interface combined with PyMOL visual support makes analysis more intuitive and easier, extending the applicability of PCNs. AVAILABILITY AND IMPLEMENTATION: https://github.com/pcnproject/PyPCN.
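The construction described above, a binary adjacency matrix obtained by thresholding Euclidean distances between residue representatives, can be sketched in a few lines. This is an independent illustration and not PyPCN code: the function names, the 4 to 8 Å distance window, and the toy coordinates are assumptions made for the example.

```python
import math


def contact_matrix(coords, min_dist=4.0, max_dist=8.0):
    """Build a symmetric binary adjacency matrix from 3D coordinates.

    Two residues are in contact when their distance (in angstroms) lies
    within [min_dist, max_dist]. Both thresholds are assumed defaults.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if min_dist <= d <= max_dist:
                adj[i][j] = adj[j][i] = 1
    return adj


def node_degrees(adj):
    """Degree of each residue: the number of contacts it makes."""
    return [sum(row) for row in adj]


# Hypothetical alpha-carbon positions: four residues on a line, 3.8 apart.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

adj = contact_matrix(toy_ca)
degrees = node_degrees(adj)
for row in adj:
    print(row)
```

Sequence neighbours at 3.8 Å fall below the assumed lower bound and are excluded, so only residue pairs two positions apart are linked in this toy chain. Network metrics such as degree are then computed on the resulting matrix.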
Affiliation(s)
- Serena Rosignoli
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Rome, Italy
- Luisa di Paola
- Unit of Chemical-Physics Fundamentals in Chemical Engineering, Department of Engineering, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
- Alessandro Paiardini
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Rome, Italy
20
Liang Z, Liu T, Li Q, Zhang G, Zhang B, Du X, Liu J, Chen Z, Ding H, Hu G, Lin H, Zhu F, Luo C. Deciphering the functional landscape of phosphosites with deep neural network. Cell Rep 2023; 42:113048. [PMID: 37659078 DOI: 10.1016/j.celrep.2023.113048] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/19/2023] [Revised: 07/11/2023] [Accepted: 08/11/2023] [Indexed: 09/04/2023] Open
Abstract
Current biochemical approaches have only identified the most well-characterized kinases for a tiny fraction of the phosphoproteome, and the functional assignments of phosphosites are almost negligible. Herein, we analyze the substrate preference catalyzed by a specific kinase and present a novel integrated deep neural network model named FuncPhos-SEQ for functional assignment of human proteome-level phosphosites. FuncPhos-SEQ incorporates phosphosite motif information from a protein sequence using multiple convolutional neural network (CNN) channels and network features from protein-protein interactions (PPIs) using network embedding and deep neural network (DNN) channels. These concatenated features are jointly fed into a heterogeneous feature network to prioritize functional phosphosites. Combined with a series of in vitro and cellular biochemical assays, we confirm that NADK-S48/50 phosphorylation could activate its enzymatic activity. In addition, ERK1/2 are discovered as the primary kinases responsible for NADK-S48/50 phosphorylation. Moreover, FuncPhos-SEQ is developed as an online server.
Affiliation(s)
- Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
- Tonghai Liu
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Qi Li
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Guangyu Zhang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Bei Zhang
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Xikun Du
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Jingqiu Liu
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Zhifeng Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Hong Ding
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
- Hao Lin
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
- Cheng Luo
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; School of Pharmacy, Fujian Medical University, Fuzhou 350122, China.
21
Xu T, Xu Q, Li J. Toward the appropriate interpretation of Alphafold2. Front Artif Intell 2023; 6:1149748. [PMID: 37664078 PMCID: PMC10469483 DOI: 10.3389/frai.2023.1149748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 07/24/2023] [Indexed: 09/05/2023] Open
Abstract
In life science, protein is an essential building block for life forms and a crucial catalyst for metabolic reactions in organisms. The structures of protein depend on an infinity of amino acid residues' complex combinations determined by gene expression. Predicting protein folding structures has been a tedious problem in the past seven decades but, due to robust development of artificial intelligence, astonishing progress has been made. Alphafold2, whose key component is Evoformer, is a typical and successful example of such progress. This article attempts to not only isolate and dissect every detail of Evoformer, but also raise some ideas for potential improvement.
Affiliation(s)
- Tian Xu
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
- Qin Xu
- Department of Mathematics, The University of Arizona, Tucson, AZ, United States
- Jianyong Li
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
22
Medvedev KE, Schaeffer RD, Chen KS, Grishin NV. Pan-cancer structurome reveals overrepresentation of beta sandwiches and underrepresentation of alpha helical domains. Sci Rep 2023; 13:11988. [PMID: 37491511 PMCID: PMC10368619 DOI: 10.1038/s41598-023-39273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 07/22/2023] [Indexed: 07/27/2023] Open
Abstract
The recent progress in the prediction of protein structures marked a historical milestone. AlphaFold predicted 200 million protein models with an accuracy comparable to experimental methods. Protein structures are widely used to understand evolution and to identify potential drug targets for the treatment of various diseases, including cancer. Thus, these recently predicted structures might convey previously unavailable information about cancer biology. Evolutionary classification of protein domains is challenging and different approaches exist. Recently our team presented a classification of domains from human protein models released by AlphaFold. Here we evaluated the pan-cancer structurome, domains from over- and under-expressed proteins in 21 cancer types, using the broadest levels of the ECOD classification: the architecture (A-groups) and possible homology (X-groups) levels. Our analysis reveals that AlphaFold has greatly increased the three-dimensional structural landscape for proteins that are differentially expressed in these 21 cancer types. We show that beta sandwich domains are significantly overrepresented and alpha helical domains are significantly underrepresented in the majority of cancer types. Our data suggest that the prevalence of the beta sandwiches is due to the high levels of immunoglobulins and immunoglobulin-like domains that arise during tumor development-related inflammation. On the other hand, proteins with exclusively alpha domains are important elements of homeostasis, apoptosis and transmembrane transport. Therefore, cancer cells tend to reduce representation of these proteins to promote successful oncogenesis.
Affiliation(s)
- Kirill E Medvedev
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- R Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Kenneth S Chen
  - Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Nick V Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
23
Krokengen OC, Raasakka A, Kursula P. The intrinsically disordered protein glue of the myelin major dense line: Linking AlphaFold2 predictions to experimental data. Biochem Biophys Rep 2023; 34:101474. [PMID: 37153862 PMCID: PMC10160357 DOI: 10.1016/j.bbrep.2023.101474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023] Open
Abstract
Numerous human proteins are classified as intrinsically disordered proteins (IDPs). Due to their physicochemical properties, high-resolution structural information about IDPs is generally lacking. On the other hand, IDPs are known to adopt local ordered structures upon interactions with e.g. other proteins or lipid membrane surfaces. While recent developments in protein structure prediction have been revolutionary, their impact on IDP research at high resolution remains limited. We took a specific example of two myelin-specific IDPs, the myelin basic protein (MBP) and the cytoplasmic domain of myelin protein zero (P0ct). Both of these IDPs are crucial for normal nervous system development and function, and while they are disordered in solution, upon membrane binding, they partially fold into helices, being embedded into the lipid membrane. We carried out AlphaFold2 predictions of both proteins and analysed the models in light of experimental data related to protein structure and molecular interactions. We observe that the predicted models have helical segments that closely correspond to the membrane-binding sites on both proteins. We furthermore analyse the fits of the models to synchrotron-based X-ray scattering and circular dichroism data from the same IDPs. The models are likely to represent the membrane-bound state of both MBP and P0ct, rather than the conformation in solution. Artificial intelligence-based models of IDPs appear to provide information on the ligand-bound state of these proteins, instead of the conformers dominating free in solution. We further discuss the implications of the predictions for mammalian nervous system myelination and their relevance to understanding disease aspects of these IDPs.
Affiliation(s)
- Arne Raasakka
  - Department of Biomedicine, University of Bergen, Norway
- Petri Kursula
  - Department of Biomedicine, University of Bergen, Norway
  - Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, Oulu, Finland
  - Corresponding author. Department of Biomedicine, University of Bergen, Norway.
24
Hatano Y, Ishihara T, Onodera O. Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS. BMC Bioinformatics 2023; 24:206. [PMID: 37208601 DOI: 10.1186/s12859-023-05338-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/22/2022] [Accepted: 05/09/2023] [Indexed: 05/21/2023] Open
Abstract
BACKGROUND In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes characterizing the familial form remains largely unknown. To predict the pathogenicity of such variants, in silico analysis is commonly used. In some ALS causative genes, the pathogenic variants are concentrated in specific regions, and the resulting alterations in protein structure are thought to significantly affect pathogenicity. However, existing methods have not taken this issue into account. To address this, we have developed a technique termed MOVA (method for evaluating the pathogenicity of missense variants using AlphaFold2), which applies positional information for structural variants predicted by AlphaFold2. Here we examined the utility of MOVA for analysis of several causative genes of ALS. METHODS We analyzed variants of 12 ALS-related genes (TARDBP, FUS, SETX, TBK1, OPTN, SOD1, VCP, SQSTM1, ANG, UBQLN2, DCTN1, and CCNF) and classified them as pathogenic or neutral. For each gene, the features of the variants, consisting of their positions in the 3D structure predicted by AlphaFold2, pLDDT score, and BLOSUM62 score, were used to train a random forest, which was evaluated by stratified fivefold cross validation. We compared how accurately MOVA predicted mutant pathogenicity with other in silico prediction methods and evaluated the prediction accuracy at TARDBP and FUS hotspots. We also examined which of the MOVA features had the greatest impact on pathogenicity discrimination. RESULTS MOVA yielded useful results (AUC ≥ 0.70) for TARDBP, FUS, SOD1, VCP, and UBQLN2 among the 12 ALS causative genes. In addition, when comparing the prediction accuracy with other in silico prediction methods, MOVA obtained the best results among those compared for TARDBP, VCP, UBQLN2, and CCNF. MOVA demonstrated superior predictive accuracy for the pathogenicity of mutations at hotspots of TARDBP and FUS. Moreover, higher accuracy was achieved by combining MOVA with REVEL or CADD. Among the features of MOVA, the x, y, and z coordinates performed the best and were highly correlated with MOVA. CONCLUSIONS MOVA is useful for predicting the pathogenicity of rare variants that are concentrated at specific structural sites, and for use in combination with other prediction methods.
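The evaluation protocol named in this abstract (stratified fivefold cross validation scored by AUC) can be sketched in plain Python. This is only an illustration under stated assumptions, not the MOVA implementation: the function names `stratified_folds` and `roc_auc` and the toy labels and scores are hypothetical, and the random forest itself is omitted.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class round-robin so every fold gets a similar share of it.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = pathogenic, 0 = neutral; scores are made up.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1, 0.35]
folds = stratified_folds(labels, k=5)
auc = roc_auc(labels, scores)
```

In a full pipeline each fold would be held out in turn, a classifier trained on the remaining four, and the AUC computed on the held-out predictions; the pairwise AUC above is quadratic in sample count and is meant for small per-gene variant sets only.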
Affiliation(s)
- Yuya Hatano
  - Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan
- Tomohiko Ishihara
  - Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan.
- Osamu Onodera
  - Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan
25
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
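For reference, the quantity estimated by the first class of methods is commonly written as below. The sign convention is an assumption here, since tools differ on which direction they report; with ΔG taken as the folding free energy (unfolded to folded), a positive ΔΔG under this convention indicates a destabilizing variant.

```latex
\Delta\Delta G = \Delta G_{\text{variant}} - \Delta G_{\text{wild type}}
```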
Affiliation(s)
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
26
Bartolec TK, Vázquez-Campos X, Norman A, Luong C, Johnson M, Payne RJ, Wilkins MR, Mackay JP, Low JKK. Cross-linking mass spectrometry discovers, evaluates, and corroborates structures and protein-protein interactions in the human cell. Proc Natl Acad Sci U S A 2023; 120:e2219418120. [PMID: 37071682 PMCID: PMC10151615 DOI: 10.1073/pnas.2219418120] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/23/2022] [Accepted: 03/16/2023] [Indexed: 04/19/2023] Open
Abstract
Significant recent advances in structural biology, particularly in the field of cryoelectron microscopy, have dramatically expanded our ability to create structural models of proteins and protein complexes. However, many proteins remain refractory to these approaches because of their low abundance, low stability, or, in the case of complexes, simply not having yet been analyzed. Here, we demonstrate the power of using cross-linking mass spectrometry (XL-MS) for the high-throughput experimental assessment of the structures of proteins and protein complexes. This included those produced by high-resolution but in vitro experimental data, as well as in silico predictions based on amino acid sequence alone. We present the largest XL-MS dataset to date, describing 28,910 unique residue pairs captured across 4,084 unique human proteins and 2,110 unique protein-protein interactions. We show that models of proteins and their complexes predicted by AlphaFold2, and inspired and corroborated by the XL-MS data, offer opportunities to deeply mine the structural proteome and interactome and reveal mechanisms underlying protein structure and function.
Affiliation(s)
- Tara K. Bartolec
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Xabier Vázquez-Campos
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Alexander Norman
  - School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
- Clement Luong
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Marcus Johnson
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Richard J. Payne
  - School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, Sydney, NSW 2006, Australia
- Marc R. Wilkins
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Joel P. Mackay
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Jason K. K. Low
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
27
McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM. Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins. Commun Biol 2023; 6:421. [PMID: 37061613 PMCID: PMC10105775 DOI: 10.1038/s42003-023-04773-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/12/2022] [Accepted: 03/28/2023] [Indexed: 04/17/2023] Open
Abstract
A major goal in structural biology is to understand protein assemblies in their biologically relevant states. Here, we investigate whether AlphaFold2 structure predictions match native protein conformations. We chemically cross-linked proteins in situ within intact Tetrahymena thermophila cilia and native ciliary extracts, identifying 1,225 intramolecular cross-links within the 100 best-sampled proteins, providing a benchmark of distance restraints obeyed by proteins in their native assemblies. The corresponding structure predictions were highly concordant, positioning 86.2% of cross-linked residues within Cα-to-Cα distances of 30 Å, consistent with the cross-linker length. 43% of proteins showed no violations. Most inconsistencies occurred in low-confidence regions or between domains. Overall, AlphaFold2 predictions with lower predicted aligned error corresponded to more correct native structures. However, we observe cases where rigid body domains are oriented incorrectly, as for ciliary protein BBC118, suggesting that combining structure prediction with experimental information will better reveal biologically relevant conformations.
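The concordance check described in this abstract, counting cross-linked residue pairs whose Cα-to-Cα distance falls within 30 Å, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `crosslink_satisfaction`, the coordinate dictionary layout, and the toy coordinates are all assumptions.

```python
import math


def crosslink_satisfaction(ca_coords, crosslinks, max_distance=30.0):
    """Return the fraction of cross-links whose C-alpha distance is <= max_distance (in angstroms)."""
    if not crosslinks:
        raise ValueError("no cross-links supplied")
    satisfied = 0
    for res_a, res_b in crosslinks:
        distance = math.dist(ca_coords[res_a], ca_coords[res_b])
        if distance <= max_distance:
            satisfied += 1
    return satisfied / len(crosslinks)


# Hypothetical C-alpha coordinates in angstroms, keyed by residue number.
ca_coords = {
    10: (0.0, 0.0, 0.0),
    42: (12.0, 5.0, 0.0),   # 13 angstroms from residue 10
    77: (0.0, 40.0, 0.0),   # 40 angstroms from residue 10
    90: (0.0, 40.0, 20.0),  # 20 angstroms from residue 77
}
crosslinks = [(10, 42), (10, 77), (77, 90)]
fraction = crosslink_satisfaction(ca_coords, crosslinks)
```

The 30 Å default mirrors the threshold quoted in the abstract; the appropriate cutoff in practice depends on the cross-linker used.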
Affiliation(s)
- Caitlyn L McCafferty
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Erin L Pennington
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- Ophelia Papoulas
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- David W Taylor
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Edward M Marcotte
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
28
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It makes it possible to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
29
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962 DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and experimental difficulties limit their availability. The proposal of 3D structural models is consequently an appealing alternative. The release of the AlphaFold Deep Learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures, and many less-used protein local conformations (PolyProline II helices, types of γ-turns, of β-turns and of β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France.
30
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Christian Dallago
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; VantAI, 151 W 42nd Street, New York, NY 10036, USA
- Michael Heinzinger
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Stephanie Kim
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Maria Littmann
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK.
31
Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci U S A 2023; 120:e2214069120. [PMID: 36917664 PMCID: PMC10041065 DOI: 10.1073/pnas.2214069120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 08/16/2022] [Accepted: 02/06/2023] [Indexed: 03/16/2023] Open
Abstract
Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jimin Pei
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
32
Malbranke C, Bikard D, Cocco S, Monasson R, Tubiana J. Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies. Curr Opin Struct Biol 2023; 80:102571. [PMID: 36947951 DOI: 10.1016/j.sbi.2023.102571] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/01/2022] [Revised: 01/29/2023] [Accepted: 02/07/2023] [Indexed: 03/24/2023]
Abstract
Computational protein design facilitates the discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionary-related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter approaches estimate key biochemical properties, such as structure free energy, conformational entropy, or binding affinities using machine learning surrogates, and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France; Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France.
- David Bikard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
33
Rosenkranz AA, Slastnikova TA. Prospects of Using Protein Engineering for Selective Drug Delivery into a Specific Compartment of Target Cells. Pharmaceutics 2023; 15:pharmaceutics15030987. [PMID: 36986848 PMCID: PMC10055131 DOI: 10.3390/pharmaceutics15030987] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/18/2023] [Revised: 03/13/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
A large number of proteins are successfully used to treat various diseases. These include natural polypeptide hormones, their synthetic analogues, antibodies, antibody mimetics, enzymes, and other drugs based on them. Many of them are in demand in clinical settings and are commercially successful, mainly for cancer treatment. The targets for most of the aforementioned drugs are located at the cell surface. Meanwhile, the vast majority of therapeutic targets, which are usually regulatory macromolecules, are located inside the cell. Traditional low molecular weight drugs freely penetrate all cells, causing side effects in non-target cells. In addition, it is often difficult to develop a small molecule that can specifically affect protein interactions. Modern technologies make it possible to obtain proteins capable of interacting with almost any target. However, proteins, like other macromolecules, cannot, as a rule, freely penetrate into the desired cellular compartment. Recent studies allow us to design multifunctional proteins that solve these problems. This review considers the scope of application of such artificial constructs for the targeted delivery of both protein-based and traditional low molecular weight drugs, the obstacles encountered during their transport to the specified intracellular compartment of the target cells after systemic bloodstream administration, and the means to overcome those difficulties.
Affiliation(s)
- Andrey A Rosenkranz
  - Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
  - Department of Biophysics, Faculty of Biology, Lomonosov Moscow State University, 1-12 Leninskie Gory St., 119234 Moscow, Russia
- Tatiana A Slastnikova
  - Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
34
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
35
Zhao H, Zhang H, She Z, Gao Z, Wang Q, Geng Z, Dong Y. Exploring AlphaFold2's Performance on Predicting Amino Acid Side-Chain Conformations and Its Utility in Crystal Structure Determination of B318L Protein. Int J Mol Sci 2023; 24:ijms24032740. [PMID: 36769074 PMCID: PMC9916901 DOI: 10.3390/ijms24032740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 02/04/2023] Open
Abstract
Recent technological breakthroughs in machine-learning-based AlphaFold2 (AF2) are pushing the prediction accuracy of protein structures to an unprecedented level that is on par with experimental structural quality. Despite its outstanding structural modeling capability, further experimental validations and performance assessments of AF2 predictions are still required, thus necessitating the development of integrative structural biology in synergy with both computational and experimental methods. Focusing on the B318L protein that plays an essential role in the African swine fever virus (ASFV) for viral replication, we experimentally demonstrate the high quality of the AF2 predicted model and its practical utility in crystal structural determination. Structural alignment implies that the AF2 model shares nearly the same atomic arrangement as the B318L crystal structure except for some flexible and disordered regions. More importantly, side-chain-based analysis at the individual residue level reveals that AF2's performance is likely dependent on the specific amino acid type and that hydrophobic residues tend to be more accurately predicted by AF2 than hydrophilic residues. Quantitative per-residue RMSD comparisons and further molecular replacement trials suggest that AF2 has a large potential to outperform other computational modeling methods in terms of structural determination. Additionally, it is numerically confirmed that the AF2 model is accurate enough so that it may well potentially withstand experimental data quality to a large extent for structural determination. Finally, an overall structural analysis and molecular docking simulation of the B318L protein are performed. Taken together, our study not only provides new insights into AF2's performance in predicting side-chain conformations but also sheds light upon the significance of AF2 in promoting crystal structural determination, especially when the experimental data quality of the protein crystal is poor.
Affiliation(s)
- Haifan Zhao
- School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Heng Zhang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Zhun She
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Zengqiang Gao
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Qi Wang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhi Geng
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Correspondence: (Z.G.); (Y.D.)
- Yuhui Dong
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Correspondence: (Z.G.); (Y.D.)
36
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
37
Sora V, Laspiur AO, Degn K, Arnaudi M, Utichi M, Beltrame L, De Menezes D, Orlandi M, Stoltze UK, Rigina O, Sackett PW, Wadt K, Schmiegelow K, Tiberti M, Papaleo E. RosettaDDGPrediction for high-throughput mutational scans: From stability to binding. Protein Sci 2023; 32:e4527. [PMID: 36461907 PMCID: PMC9795540 DOI: 10.1002/pro.4527] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/02/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein-protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs and aggregates raw data for multiple variants, as well as generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔGs values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction.
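As a point of reference for the quantities named in the abstract above, a ΔΔG is conventionally the difference between two free energy changes, mutant minus wild type. The block below is a generic sketch of that definition and not the RosettaDDGPrediction implementation; the superscripts (mut, wt) and the sign convention are assumptions made here for illustration.

```latex
% Folding (stability) change upon substitution, assuming mutant-minus-wild-type ordering
\Delta\Delta G_{\text{fold}} = \Delta G_{\text{fold}}^{\text{mut}} - \Delta G_{\text{fold}}^{\text{wt}}

% Binding free energy change for a substitution in a complex, same assumed ordering
\Delta\Delta G_{\text{bind}} = \Delta G_{\text{bind}}^{\text{mut}} - \Delta G_{\text{bind}}^{\text{wt}}
```

Under this assumed ordering, and taking the folding free energy as negative for a stable protein, a positive folding ΔΔG corresponds to a destabilizing substitution.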
Affiliation(s)
- Valentina Sora
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Adrian Otamendi Laspiur
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Matteo Arnaudi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Mattia Utichi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Ludovica Beltrame
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Dayana De Menezes
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Matteo Orlandi
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Ulrik Kristoffer Stoltze
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Olga Rigina
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Peter Wad Sackett
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Karin Wadt
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Kjeld Schmiegelow
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
38
Brender JR, Ramamoorthy A, Gursky O, Bhunia A. Intrinsic disorder and structural biology: Searching where the light isn't. Biophys Chem 2023; 292:106912. [PMID: 36335754 DOI: 10.1016/j.bpc.2022.106912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jeffrey R Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Ayyalusamy Ramamoorthy
- Biophysics, Department of Chemistry, Biomedical Engineering, and Macromolecular Science and Engineering, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Olga Gursky
- Boston University School of Medicine, Department of Physiology & Biophysics, W302, 700 Albany St, Boston, MA 02118, USA
- Anirban Bhunia
- Biomolecular NMR and Drug Design Laboratory, Department of Biophysics, Bose Institute, P-1/12 CIT Scheme VII (M), Kolkata 700054, India
39
Bertoline LMF, Lima AN, Krieger JE, Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Frontiers in Bioinformatics 2023; 3:1120370. [PMID: 36926275 PMCID: PMC10011655 DOI: 10.3389/fbinf.2023.1120370] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 12/09/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023] Open
Abstract
Three-dimensional protein structure is directly correlated with its function and its determination is critical to understanding biological processes and addressing human health and life science problems in general. Although new protein structures are experimentally obtained over time, there is still a large difference between the number of protein sequences deposited in UniProt and those with resolved tertiary structure. In this context, studies have emerged to predict protein structures by template-based or free-modeling methods. In recent years, different methods have been combined to overcome their individual limitations, until the emergence of AlphaFold2, which demonstrated that predicting protein structure with high accuracy at unprecedented scale is possible. Despite its current impact in the field, AlphaFold2 has limitations. Recently, new methods based on protein language models have promised to revolutionize protein structural biology, allowing the discovery of protein structure and function only from evolutionary patterns present in protein sequences. Even though these methods do not reach AlphaFold2 accuracy, they have already addressed some of its limitations, being able to predict with high accuracy more than 200 million proteins from metagenomic databases. In this mini-review, we provide an overview of the breakthroughs in protein structure prediction before and after AlphaFold2 emergence.
Affiliation(s)
- Letícia M F Bertoline
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Angélica N Lima
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Jose E Krieger
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Samantha K Teixeira
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
40
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
41
Sakamoto K, Asano S, Ago Y, Hirokawa T. AlphaFold version 2.0 elucidates the binding mechanism between VIPR2 and KS-133, and reveals an S–S bond (Cys25−Cys192) formation of functional significance for VIPR2. Biochem Biophys Res Commun 2022; 636:10-16. [DOI: 10.1016/j.bbrc.2022.10.071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 10/20/2022] [Indexed: 11/02/2022]
42
Caswell RC, Gunning AC, Owens MM, Ellard S, Wright CF. Assessing the clinical utility of protein structural analysis in genomic variant classification: experiences from a diagnostic laboratory. Genome Med 2022; 14:77. [PMID: 35869530 PMCID: PMC9308257 DOI: 10.1186/s13073-022-01082-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/15/2021] [Accepted: 07/04/2022] [Indexed: 12/21/2022] Open
Abstract
Background: The widespread clinical application of genome-wide sequencing has resulted in many new diagnoses for rare genetic conditions, but testing regularly identifies variants of uncertain significance (VUS). The remarkable rise in the amount of genomic data has been paralleled by a rise in the number of protein structures that are now publicly available, which may have clinical utility for the interpretation of missense and in-frame insertions or deletions. Methods: Within a UK National Health Service genomic medicine diagnostic laboratory, we investigated the number of VUS over a 5-year period that were evaluated using protein structural analysis and how often this analysis aided variant classification. Results: We found 99 novel missense and in-frame variants across 67 genes that were initially classified as VUS by our diagnostic laboratory using standard variant classification guidelines and for which further analysis of protein structure was requested. Evidence from protein structural analysis was used in the re-assessment of 64 variants, of which 47 were subsequently reclassified as pathogenic or likely pathogenic and 17 remained as VUS. We identified several case studies where protein structural analysis aided variant interpretation by predicting disease mechanisms that were consistent with the observed phenotypes, including loss-of-function through thermodynamic destabilisation or disruption of ligand binding, and gain-of-function through de-repression or escape from proteasomal degradation. Conclusions: We have shown that using in silico protein structural analysis can aid classification of VUS and give insights into the mechanisms of pathogenicity. Based on our experience, we propose a generic evidence-based workflow for incorporating protein structural information into diagnostic practice to facilitate variant classification. Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-022-01082-2.
43
Fukunishi Y, Higo J, Kasahara K. Computer simulation of molecular recognition in biomolecular system: from in silico screening to generalized ensembles. Biophys Rev 2022; 14:1423-1447. [PMID: 36465086 PMCID: PMC9703445 DOI: 10.1007/s12551-022-01015-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/22/2022] [Accepted: 11/06/2022] [Indexed: 11/29/2022] Open
Abstract
Prediction of ligand-receptor complex structures is important in both basic science and industrial applications such as drug discovery. We report various computational molecular docking methods: fundamental in silico (virtual) screening, ensemble docking, enhanced sampling (generalized ensemble) methods, and other methods to improve the accuracy of the complex structure. We explain not only the merits of these methods but also their limits of application and discuss some interaction terms which are not considered in the in silico methods. In silico screening and ensemble docking are useful when one focuses on obtaining the native complex structure (the most thermodynamically stable complex). The generalized ensemble method provides a free-energy landscape, which shows the distribution of the most stable complex structure and semi-stable ones in a conformational space. Also, barriers separating those stable structures are identified. A researcher should select one of the methods according to the research aim and the complexity of the molecular system to be studied.
Affiliation(s)
- Yoshifumi Fukunishi
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-Ku, Tokyo, 135-0064 Japan
- Junichi Higo
- Graduate School of Information Science, University of Hyogo, 7-1-28 Minatojima Minamimachi, Chuo-Ku, Kobe, Hyogo 650-0047 Japan
- Research Organization of Science and Technology, Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, Shiga 525-8577 Japan
- Kota Kasahara
- College of Life Sciences, Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, Shiga 525-8577 Japan
44
Delhommel F, Martínez-Lumbreras S, Sattler M. Combining NMR, SAXS and SANS to characterize the structure and dynamics of protein complexes. Methods Enzymol 2022; 678:263-297. [PMID: 36641211 DOI: 10.1016/bs.mie.2022.09.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
Abstract
Understanding the structure and dynamics of biological macromolecules is essential to decipher the molecular mechanisms that underlie cellular functions. The description of structure and conformational dynamics often requires the integration of complementary techniques. In this review, we highlight the utility of combining nuclear magnetic resonance (NMR) spectroscopy with small angle scattering (SAS) to characterize these challenging biomolecular systems. NMR can assess the structure and conformational dynamics of multidomain proteins, RNAs and biomolecular complexes. It can efficiently provide information on interaction surfaces, long-distance restraints and relative domain orientations at residue-level resolution. Such information can be readily combined with high-resolution structural data available on subcomponents of biomolecular assemblies. Moreover, NMR is a powerful tool to characterize the dynamics of biomolecules on a wide range of timescales, from nanoseconds to seconds. On the other hand, SAS approaches provide global information on the size and shape of biomolecules and on the ensemble of all conformations present in solution. Therefore, NMR and SAS provide complementary data that are uniquely suited to investigate dynamic biomolecular assemblies. Here, we briefly review the type of data that can be obtained by both techniques and describe different approaches that can be used to combine them to characterize biomolecular assemblies. We then provide guidelines on which experiments are best suited depending on the type of system studied, ranging from fully rigid complexes, through dynamic structures that interconvert between defined conformations, to systems with very high structural heterogeneity.
Affiliation(s)
- Florent Delhommel
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg, Germany; Bavarian NMR Center, Department of Chemistry, Technical University of Munich, Garching, Germany
- Santiago Martínez-Lumbreras
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg, Germany; Bavarian NMR Center, Department of Chemistry, Technical University of Munich, Garching, Germany
- Michael Sattler
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg, Germany; Bavarian NMR Center, Department of Chemistry, Technical University of Munich, Garching, Germany
45
Digging into the 3D Structure Predictions of AlphaFold2 with Low Confidence: Disorder and Beyond. Biomolecules 2022; 12:biom12101467. [PMID: 36291675 PMCID: PMC9599455 DOI: 10.3390/biom12101467] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/14/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 01/12/2023] Open
Abstract
AlphaFold2 (AF2) has created a breakthrough in biology by providing three-dimensional structure models for whole-proteome sequences, with unprecedented levels of accuracy. In addition, the AF2 pLDDT score, related to the model confidence, has been shown to provide a good measure of residue-wise disorder. Here, we combined AF2 predictions with pyHCA, a tool we previously developed to identify foldable segments and estimate their order/disorder ratio, from a single protein sequence. We focused our analysis on the AF2 predictions available for 21 reference proteomes (AFDB v1), in particular on their long foldable segments (>30 amino acids) that exhibit characteristics of soluble domains, as estimated by pyHCA. Among these segments, we provided a global analysis of those with very low pLDDT values along their entire length and compared their characteristics to those of segments with very high pLDDT values. We highlighted cases containing conditional order, as well as cases that could form well-folded structures but escape the AF2 prediction due to a shallow multiple sequence alignment and/or undocumented structure or fold. AF2 and pyHCA can therefore be advantageously combined to unravel cryptic structural features in whole proteomes and to refine predictions for different flavors of disorder.
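The residue-wise use of pLDDT described above can be illustrated with a short Python sketch. It assumes a PDB-format AlphaFold model in which the B-factor column holds pLDDT (as in AlphaFold Database files), and it flags runs longer than 30 residues whose pLDDT stays below 50. The function names, the threshold of 50 and the use of CA atoms are illustrative assumptions; this is neither pyHCA nor the authors' pipeline.

```python
def ca_plddt(pdb_text):
    """Return per-residue pLDDT values read from the B-factor field of CA atoms.

    Assumes fixed-column PDB ATOM records with pLDDT stored in columns 61-66.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def low_confidence_runs(scores, threshold=50.0, min_len=30):
    """Return (start, end) pairs (0-based, end exclusive) for runs longer than
    min_len residues in which every pLDDT value is below threshold."""
    runs = []
    start = None
    # A sentinel equal to the threshold closes a run that reaches the last residue.
    for i, score in enumerate(list(scores) + [threshold]):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start > min_len:
                runs.append((start, i))
            start = None
    return runs
```

A typical call would read the text of a model file, pass it to `ca_plddt`, and hand the resulting list to `low_confidence_runs`; the segments returned are only candidates for disorder or conditional order and would still need the kind of sequence-based foldability analysis the abstract describes.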
46
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
One objective of human genetics is to unveil the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive genomic sequence data have been created, making personal genetic information available. Conventional experimental evidence is critical in establishing the relationship between sequence variants and phenotype, but obtaining it is inefficient. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and because variants found by NGS keep accumulating, many computational tools that predict the impact of variants on phenotype have been developed to bridge the gap. In this review, we present a brief introduction to and discussion of the computational approaches for variant impact prediction. We mainly focus on approaches for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies, are discussed. We also present how the predictive approaches have been employed in different studies. Although diverse constraints exist, the computational predictive approaches are indispensable in exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
- Dandan Cao
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
47
Baltzis A, Mansouri L, Jin S, Langer BE, Erb I, Notredame C. Highly significant improvement of protein sequence alignments with AlphaFold2. Bioinformatics 2022; 38:5007-5011. [PMID: 36130276 PMCID: PMC9665868 DOI: 10.1093/bioinformatics/btac625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 08/29/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS: Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION: All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suzanne Jin
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Björn E Langer
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Ionas Erb
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
48
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota–human protein–protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
49
Piper SJ, Johnson RM, Wootten D, Sexton PM. Membranes under the Magnetic Lens: A Dive into the Diverse World of Membrane Protein Structures Using Cryo-EM. Chem Rev 2022; 122:13989-14017. [PMID: 35849490 DOI: 10.1021/acs.chemrev.1c00837] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract
Membrane proteins are highly diverse in both structure and function and can, therefore, present different challenges for structure determination. They are biologically important for cells and organisms as gatekeepers for information and molecule transfer across membranes, but each class of membrane proteins can present unique obstacles to structure determination. Historically, many membrane protein structures have been investigated using highly engineered constructs or using larger fusion proteins to improve solubility and/or increase particle size. Other strategies included the deconstruction of the full-length protein to target smaller soluble domains. These manipulations were often required for crystal formation to support X-ray crystallography or to circumvent lower resolution due to high noise and dynamic motions of protein subdomains. However, recent revolutions in membrane protein biochemistry and cryo-electron microscopy now provide an opportunity to solve high resolution structures of both large, >1 megadalton (MDa), and small, <100 kilodalton (kDa), drug targets in near-native conditions, routinely reaching resolutions around or below 3 Å. This review provides insights into how the recent advances in membrane biology and biochemistry, as well as technical advances in cryo-electron microscopy, help us to solve structures of a large variety of membrane protein groups, from small receptors to large transporters and more complex machineries.
Affiliation(s)
- Sarah J Piper
- Drug Discovery Biology theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia; ARC Centre for Cryo-electron Microscopy of Membrane Proteins, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia
- Rachel M Johnson
- Drug Discovery Biology theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia; ARC Centre for Cryo-electron Microscopy of Membrane Proteins, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia
- Denise Wootten
- Drug Discovery Biology theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia; ARC Centre for Cryo-electron Microscopy of Membrane Proteins, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia
- Patrick M Sexton
- Drug Discovery Biology theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia; ARC Centre for Cryo-electron Microscopy of Membrane Proteins, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia
50
Hunter GA, Ferreira GC. An Extended C-Terminus, the Possible Culprit for Differential Regulation of 5-Aminolevulinate Synthase Isoforms. Front Mol Biosci 2022; 9:920668. [PMID: 35911972 PMCID: PMC9329541 DOI: 10.3389/fmolb.2022.920668] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/14/2022] [Accepted: 05/30/2022] [Indexed: 12/05/2022] Open
Abstract
5-Aminolevulinate synthase (ALAS; E.C. 2.3.1.37) is a pyridoxal 5′-phosphate (PLP)-dependent enzyme that catalyzes the key regulatory step of porphyrin biosynthesis in metazoa, fungi, and α-proteobacteria. ALAS is evolutionarily related to transaminases and is therefore classified as a fold type I PLP-dependent enzyme. As an enzyme controlling the key committed and rate-determining step of a crucial biochemical pathway ALAS is ideally positioned to be subject to allosteric feedback inhibition. Extensive kinetic and mutational studies demonstrated that the overall enzyme reaction is limited by subtle conformational changes of a hairpin loop gating the active site. These findings, coupled with structural information, facilitated early prediction of allosteric regulation of activity via an extended C-terminal tail unique to eukaryotic forms of the enzyme. This prediction was subsequently supported by the discoveries that mutations in the extended C-terminus of the erythroid ALAS isoform (ALAS2) cause a metabolic disorder known as X-linked protoporphyria not by diminishing activity, but by enhancing it. Furthermore, kinetic, structural, and molecular modeling studies demonstrated that the extended C-terminal tail controls the catalytic rate by modulating conformational flexibility of the active site loop. However, the precise identity of any such molecule remains to be defined. Here we discuss the most plausible allosteric regulators of ALAS activity based on divergences in AlphaFold-predicted ALAS structures and suggest how the mystery of the mechanism whereby the extended C-terminus of mammalian ALASs allosterically controls the rate of porphyrin biosynthesis might be unraveled.
Affiliation(s)
- Gregory A. Hunter
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, United States
- *Correspondence: Gregory A. Hunter; Gloria C. Ferreira
- Gloria C. Ferreira
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, United States
- Department of Chemistry, College of Arts and Sciences, University of South Florida, Tampa, FL, United States
- Global and Planetary Health, College of Public Health, University of South Florida, Tampa, FL, United States