1
Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2025; 15:202-222. [PMID: 39313455 DOI: 10.1002/2211-5463.13902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024] Open
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized the field of structural bioinformatics with their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines aimed at incorporating AlphaFold's predictions, often focusing on addressing the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics shaped by AlphaFold and discuss how the field is dynamically responding to this revolution with new software, methods, and pipelines. While the excitement around AI-based tools has led to their widespread application, it is essential to acknowledge that their practical success hinges on their integration into established protocols within structural bioinformatics, which are often neglected in the context of AI-driven advancements. Indeed, user-driven intervention remains pivotal, both in the structure prediction process itself and in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Maddalena Pacelli
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Francesca Manganiello
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Alessandro Paiardini
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
2
McCaffrey P, Jackups R, Seheult J, Zaydman MA, Balis U, Thaker HM, Rashidi H, Gullapalli RR. Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice: Opportunities and the Way Forward. Arch Pathol Lab Med 2025; 149:130-141. [PMID: 39384182 DOI: 10.5858/arpa.2024-0208-ra] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/05/2024] [Indexed: 10/11/2024]
Abstract
CONTEXT.— Generative artificial intelligence (GAI) technologies are likely to dramatically impact health care workflows in clinical pathology (CP). Applications in CP include education, data mining, decision support, result summaries, and patient trend assessments. OBJECTIVE.— To review use cases of GAI in CP, with a particular focus on large language models. Specific examples are provided for the applications of GAI in the subspecialties of clinical chemistry, microbiology, hematopathology, and molecular diagnostics. Additionally, the review addresses potential pitfalls of GAI paradigms. DATA SOURCES.— Current literature on GAI in health care was reviewed broadly. The use case scenarios for each CP subspecialty review common data sources generated in each subspecialty. The potential for utilization of CP data in the GAI context was subsequently assessed, focusing on issues such as future reporting paradigms, impact on quality metrics, and potential for translational research activities. CONCLUSIONS.— GAI is a powerful tool with the potential to revolutionize health care for patients and practitioners alike. However, GAI must be implemented with much caution considering various shortcomings of the technology such as biases, hallucinations, practical challenges of implementing GAI in existing CP workflows, and end-user acceptance. Human-in-the-loop models of GAI implementation have the potential to revolutionize CP by delivering deeper, meaningful insights into patient outcomes both at an individual and a population level.
Affiliation(s)
- Peter McCaffrey
  - From the Departments of Pathology (McCaffrey, Thaker) and Radiology (McCaffrey), University of Texas Medical Branch, Galveston
- Ronald Jackups
  - the Department of Pathology and Immunology, Washington University School of Medicine, St Louis, Missouri (Jackups, Zaydman)
- Jansen Seheult
  - the Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Seheult)
- Mark A Zaydman
  - the Department of Pathology and Immunology, Washington University School of Medicine, St Louis, Missouri (Jackups, Zaydman)
- Ulysses Balis
  - the Department of Pathology, University of Michigan, Ann Arbor (Balis)
- Harshwardhan M Thaker
  - From the Departments of Pathology (McCaffrey, Thaker) and Radiology (McCaffrey), University of Texas Medical Branch, Galveston
- Hooman Rashidi
  - Computational Pathology & AI Center of Excellence, University of Pittsburgh, School of Medicine & UPMC, Pittsburgh, Pennsylvania (Rashidi)
- Rama R Gullapalli
  - the Department of Pathology, Department of Chemical and Biological Engineering, University of New Mexico, Albuquerque (Gullapalli)
3
Li SW, Ren PX, Wang L, Han QL, Li FL, Li HL, Bai F. MAI-TargetFisher: A proteome-wide drug target prediction method synergetically enhanced by artificial intelligence and physical modeling. Acta Pharmacol Sin 2025:10.1038/s41401-024-01444-z. [PMID: 39870848 DOI: 10.1038/s41401-024-01444-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 11/24/2024] [Indexed: 01/29/2025] Open
Abstract
Computational target identification plays a pivotal role in the drug development process. With the significant advancements of deep learning methods for protein structure prediction, the structural coverage of the human proteome has increased substantially. This progress inspired the development of the first genome-wide scanning method for small-molecule targets. Our method aims to localize drug targets and detect potential off-target effects early in the drug discovery process, thereby improving the success rate of drug development. We have constructed a high-quality database of protein structures with annotated potential binding sites, covering 82% of the protein-coding genome. On the basis of this database, to enhance our search capabilities, we have integrated computational techniques, including both artificial intelligence-based and biophysical model-based methods. This integration led to the development of a target identification method called Multi-Algorithm Integrated Target Fisher (MAI-TargetFisher). MAI-TargetFisher leverages the complementary strengths of various methods while minimizing their weaknesses, enabling precise database navigation to generate a reliably ranked set of candidate targets for an active query molecule. Importantly, our work is the first comprehensive scan of protein surfaces across the entire human genome, aimed at evaluating potential small-molecule binding sites on each protein. Through a series of evaluations on benchmarks and a target identification task, the results demonstrate the high hit rates and good reliability of our method, as validated by wet-lab experiments. We have also made available a freely accessible web server at https://bailab.siais.shanghaitech.edu.cn/mai-targetfisher for non-commercial use.
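The abstract does not say how MAI-TargetFisher combines the outputs of its AI-based and physics-based methods. As a purely illustrative sketch of one generic way to merge several scoring methods into a single ranked target list, the Python below averages each target's rank across methods. The function name `mean_rank_consensus`, the method names, the target names, and the scores are all hypothetical, and mean-rank aggregation is a stand-in technique, not the authors' algorithm.

```python
from collections import defaultdict


def mean_rank_consensus(score_tables):
    """Combine per-method target scores into one consensus ranking.

    score_tables maps a method name to a dict of target -> score.
    Assumption: a higher score means a more plausible target.
    Targets missing from a method receive the worst possible rank.
    Returns target names sorted by mean rank (ties broken by name).
    """
    all_targets = set().union(*(table.keys() for table in score_tables.values()))
    worst_rank = len(all_targets)
    rank_sums = defaultdict(float)

    for scores in score_tables.values():
        # Rank 1 is the best-scoring target for this method.
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks = {target: position + 1 for position, target in enumerate(ordered)}
        for target in all_targets:
            rank_sums[target] += ranks.get(target, worst_rank)

    n_methods = len(score_tables)
    return sorted(all_targets, key=lambda t: (rank_sums[t] / n_methods, t))


# Toy scores for two hypothetical methods and three hypothetical targets.
toy_scores = {
    "ai_model": {"KINASE_A": 0.9, "GPCR_B": 0.4, "PROTEASE_C": 0.7},
    "docking": {"KINASE_A": 0.6, "GPCR_B": 0.2, "PROTEASE_C": 0.8},
}
print(mean_rank_consensus(toy_scores))
# → ['KINASE_A', 'PROTEASE_C', 'GPCR_B']
```

Rank averaging is used here only because it needs no calibration between methods whose raw scores live on different scales; a real pipeline may weight or learn the combination instead.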
Affiliation(s)
- Shi-Wei Li
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Peng-Xuan Ren
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Lin Wang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Qi-Lei Han
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Feng-Lei Li
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Hong-Lin Li
  - Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai, 200062, China
- Fang Bai
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
4
Pitarch B, Pazos F. Deep Learning Approaches for the Prediction of Protein Functional Sites. Molecules 2025; 30:214. [PMID: 39860084 PMCID: PMC11767512 DOI: 10.3390/molecules30020214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 12/20/2024] [Accepted: 01/01/2025] [Indexed: 01/27/2025] Open
Abstract
Knowing which residues of a protein are important for its function is of paramount importance for understanding the molecular basis of this function and devising ways of modifying it for medical or biotechnological applications. Due to the difficulty in detecting these residues experimentally, prediction methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. Deep learning approaches are especially well suited for this task due to the large amounts of protein sequences for training them, the trivial codification of this sequence data to feed into these systems, and the intrinsic sequential nature of the data that makes them suitable for language models. As a consequence, deep learning-based approaches are being applied to the prediction of different types of functional sites and regions in proteins. This review aims to give an overview of the current landscape of methodologies so that interested users can have an idea of which kind of approaches are available for their proteins of interest. We also try to give an idea of how these systems work, as well as explain their limitations and high dependence on the training set so that users are aware of the quality of expected results.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
5
Mutz P, Camargo AP, Sahakyan H, Neri U, Butkovic A, Wolf YI, Krupovic M, Dolja VV, Koonin EV. The protein structurome of Orthornavirae and its dark matter. mBio 2024:e0320024. [PMID: 39714180 DOI: 10.1128/mbio.03200-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 10/28/2024] [Indexed: 12/24/2024] Open
Abstract
Metatranscriptomics is uncovering more and more diverse families of viruses with RNA genomes comprising the viral kingdom Orthornavirae in the realm Riboviria. Thorough protein annotation and comparison are essential to get insights into the functions of viral proteins and virus evolution. In addition to sequence- and HMM profile-based methods, protein structure comparison adds a powerful tool to uncover protein functions and relationships. We constructed an Orthornavirae "structurome" consisting of already annotated as well as unannotated ("dark matter") proteins and domains encoded in viral genomes. We used protein structure modeling and similarity searches to illuminate the remaining dark matter in hundreds of thousands of orthornavirus genomes. The vast majority of the dark matter domains showed either "generic" folds, such as single α-helices, or no high-confidence structure predictions. Nevertheless, a variety of lineage-specific globular domains that were new either to orthornaviruses in general or to particular virus families were identified within the proteomic dark matter of orthornaviruses, including several predicted nucleic acid-binding domains and nucleases. In addition, we identified a case of exaptation of a cellular nucleoside monophosphate kinase as an RNA-binding protein in several virus families. Notwithstanding the continuing discovery of numerous orthornaviruses, it appears that all the protein domains conserved in large groups of viruses have already been identified. The rest of the viral proteome seems to be dominated by poorly structured domains including intrinsically disordered ones that likely mediate specific virus-host interactions.
IMPORTANCE Advanced methods for protein structure prediction, such as AlphaFold2, greatly expand our capability to identify protein domains and infer their likely functions and evolutionary relationships. This is particularly pertinent for proteins encoded by viruses that are known to evolve rapidly and as a result often cannot be adequately characterized by analysis of the protein sequences. We performed an exhaustive structure prediction and comparative analysis for uncharacterized proteins and domains ("dark matter") encoded by viruses with RNA genomes. The results show that the dark matter of the RNA virus proteome consists mostly of disordered and all-α-helical domains that cannot be readily assigned a specific function and that likely mediate various interactions between viral proteins and between viral and host proteins. The great majority of globular proteins and domains of RNA viruses are already known, although we identified several unexpected domains represented in individual viral families.
Affiliation(s)
- Pascal Mutz
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Antonio Pedro Camargo
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Harutyun Sahakyan
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Uri Neri
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Anamarija Butkovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
- Yuri I Wolf
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
- Valerian V Dolja
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Eugene V Koonin
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
6
Li C, Luo Y, Xie Y, Zhang Z, Liu Y, Zou L, Xiao F. Structural and functional prediction, evaluation, and validation in the post-sequencing era. Comput Struct Biotechnol J 2024; 23:446-451. [PMID: 38223342 PMCID: PMC10787220 DOI: 10.1016/j.csbj.2023.12.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024] Open
Abstract
The surge of genome sequencing data has revealed a substantial number of genetic variants of uncertain significance (VUS). Interpreting the VUS discovered by sequencing poses a major challenge in the post-sequencing era. Although experimental assays have progressed in classifying VUS, only a tiny fraction of human genes have been explored experimentally. Thus, state-of-the-art in silico functional predictors of VUS are urgently needed. Artificial intelligence (AI) is an invaluable tool to assist in the identification of VUS with high efficiency and accuracy. An increasing number of studies indicate that AI has brought an exciting acceleration in the interpretation of VUS, and our group has already used AI to develop protein structure-based prediction models. In this review, we provide an overview of the previous research on AI-based prediction of missense variants, and elucidate the challenges and opportunities for protein structure-based variant prediction in the post-sequencing era.
Affiliation(s)
- Chang Li
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Yixuan Luo
  - Beijing Normal University, Beijing, China
- Yibo Xie
  - Information Center, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Zaifeng Zhang
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Ye Liu
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Lihui Zou
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Fei Xiao
  - Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
  - Beijing Normal University, Beijing, China
7
Heinzinger M, Rost B. Artificial Intelligence Learns Protein Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041458. [PMID: 38858069 PMCID: PMC11368192 DOI: 10.1101/cshperspect.a041458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
From AlphaGo through Stable Diffusion to ChatGPT, the recent decade of exponential advances in artificial intelligence (AI) has been altering life. In parallel, advances in computational biology are beginning to decode the language of life: AlphaFold2 leaped forward in protein structure prediction, and protein language models (pLMs) replaced expertise and evolutionary information from multiple sequence alignments with information learned from reoccurring patterns in databases of billions of proteins without experimental annotations other than the amino acid sequences. None of those tools could have been developed 10 years ago; all will increase the wealth of experimental data and speed up the cycle from idea to proof. AI is affecting molecular and medical biology in giant steps, and the most important might be the leap toward more powerful protein design.
Affiliation(s)
- Michael Heinzinger
  - Technical University of Munich (TUM) School of Computation, Information and Technology (CIT), Bioinformatics and Computational Biology - i12, 85748 Garching/Munich, Germany
- Burkhard Rost
  - Technical University of Munich (TUM) School of Computation, Information and Technology (CIT), Bioinformatics and Computational Biology - i12, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), 85748 Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), 85354 Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
8
Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024; 19:1043-1069. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy, profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity present significant limitations and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms grouped by the types of data that they exploit using selected examples. EXPERT OPINION Polypharmacology prediction has made impressive progress over the last decades and contributed to identifying many off-targets. However, data incompleteness currently limits the ability of most approaches to comprehensively predict selectivity. Moreover, our limited agreement on model assessment challenges the identification of the best algorithms, which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.
Affiliation(s)
- Leticia Manen-Freixa
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
9
Kwon S, Jung N, Yang J, Seok C. GalaxySagittarius-AF: Predicting Targets for Drug-Like Compounds in the Extended Human 3D Proteome. J Mol Biol 2024; 436:168617. [PMID: 39237198 DOI: 10.1016/j.jmb.2024.168617] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2024] [Revised: 05/12/2024] [Accepted: 05/14/2024] [Indexed: 09/07/2024]
Abstract
In recent years, advancements in deep learning techniques have significantly expanded the structural coverage of the human proteome. GalaxySagittarius-AF translates these achievements in structure prediction into target prediction for druglike compounds by incorporating predicted structures. This web server searches the database of human protein structures using both similarity- and structure-based approaches, suggesting potential targets for a given druglike compound. In comparison to its predecessor, GalaxySagittarius, GalaxySagittarius-AF utilizes an enlarged structure database, incorporating curated AlphaFold model structures alongside their binding sites and ligands, predicted using an updated version of GalaxySite. GalaxySagittarius-AF covers a large human protein space compared to many other available computational target screening methods. The structure-based prediction method enhances the use of expanded structural information, differentiating it from other target prediction servers that rely on ligand-based methods. Additionally, the web server has undergone enhancements, operating two to three times faster than its predecessor. The updated report page provides comprehensive information on the sequence and structure of the predicted protein targets. GalaxySagittarius-AF is accessible at https://galaxy.seoklab.org/sagittarius_af without the need for registration.
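Similarity-based target prediction generally rests on comparing a query compound with ligands already associated with each target. The abstract does not describe how GalaxySagittarius-AF measures similarity, so the sketch below uses the Tanimoto coefficient on fingerprint bit sets, a common cheminformatics choice, as an assumed stand-in. The function names, target labels, and bit sets are hypothetical; in practice the bit sets would come from a fingerprinting toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_targets_by_ligand_similarity(query_bits, known_ligands):
    """Rank targets by the best similarity between the query and their ligands.

    known_ligands maps a target name to a list of ligand fingerprints
    (each a set of on-bit indices). Returns (target, score) pairs sorted
    from most to least similar, with ties broken by target name.
    """
    best_scores = {
        target: max((tanimoto(query_bits, ligand) for ligand in ligands), default=0.0)
        for target, ligands in known_ligands.items()
    }
    return sorted(best_scores.items(), key=lambda item: (-item[1], item[0]))


# Toy fingerprints for two hypothetical targets.
query = {1, 2, 3}
ligand_db = {
    "TARGET_1": [{1, 2, 3, 4}],
    "TARGET_2": [{7, 8}, {1, 2}],
}
for target, score in rank_targets_by_ligand_similarity(query, ligand_db):
    print(f"{target}: {score:.2f}")
# → TARGET_1: 0.75
# → TARGET_2: 0.67
```

A ligand-based score like this can only reach targets that already have known ligands, which is the gap that the structure-based search over predicted structures described in the abstract is meant to close.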
Affiliation(s)
- Sohee Kwon
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea; Galux Inc, Seoul 08738, Republic of Korea
- Nuri Jung
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Chaok Seok
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea; Galux Inc, Seoul 08738, Republic of Korea
10
Weller J, Rohs R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J Chem Inf Model 2024; 64:6450-6463. [PMID: 39058534 PMCID: PMC11350878 DOI: 10.1021/acs.jcim.4c01193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024]
Abstract
Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE's hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
Affiliation(s)
- Jesse A. Weller
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
11
Badonyi M, Marsh JA. Proteome-scale prediction of molecular mechanisms underlying dominant genetic diseases. PLoS One 2024; 19:e0307312. [PMID: 39172982 PMCID: PMC11341024 DOI: 10.1371/journal.pone.0307312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 06/26/2024] [Indexed: 08/24/2024] Open
Abstract
Many dominant genetic disorders result from protein-altering mutations, acting primarily through dominant-negative (DN), gain-of-function (GOF), and loss-of-function (LOF) mechanisms. Deciphering the mechanisms by which dominant diseases exert their effects is often experimentally challenging and resource intensive, but is essential for developing appropriate therapeutic approaches. Diseases that arise via a LOF mechanism are more amenable to be treated by conventional gene therapy, whereas DN and GOF mechanisms may require gene editing or targeting by small molecules. Moreover, pathogenic missense mutations that act via DN and GOF mechanisms are more difficult to identify than those that act via LOF using nearly all currently available variant effect predictors. Here, we introduce a tripartite statistical model made up of support vector machine binary classifiers trained to predict whether human protein coding genes are likely to be associated with DN, GOF, or LOF molecular disease mechanisms. We test the utility of the predictions by examining biologically and clinically meaningful properties known to be associated with the mechanisms. Our results strongly support that the models are able to generalise on unseen data and offer insight into the functional attributes of proteins associated with different mechanisms. We hope that our predictions will serve as a springboard for researchers studying novel variants and those of uncertain clinical significance, guiding variant interpretation strategies and experimental characterisation. Predictions for the human UniProt reference proteome are available at https://osf.io/z4dcp/.
Affiliation(s)
- Mihaly Badonyi
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Joseph A. Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
12
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland;
| | - Jürgen Jänes
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland;
| | | | - Pedro Beltrao
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland;
|
13
|
Sawada R, Sakajiri Y, Shibata T, Yamanishi Y. Predicting therapeutic and side effects from drug binding affinities to human proteome structures. iScience 2024; 27:110032. [PMID: 38868195 PMCID: PMC11167438 DOI: 10.1016/j.isci.2024.110032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 04/08/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024] Open
Abstract
Evaluation of the binding affinities of drugs to proteins is a crucial process for identifying drug pharmacological actions, but it requires three-dimensional structures of proteins. Herein, we propose novel computational methods to predict the therapeutic indications and side effects of drug candidate compounds from the binding affinities to human protein structures on a proteome-wide scale. Large-scale docking simulations were performed for 7,582 drugs with 19,135 protein structures revealed by AlphaFold (including experimentally unresolved proteins), and machine learning models on the proteome-wide binding affinity score (PBAS) profiles were constructed. We demonstrated the usefulness of the method for predicting the therapeutic indications for 559 diseases and side effects for 285 toxicities. The method made it possible to predict drug indications for which the related protein structures had not been experimentally determined and to successfully extract proteins eliciting the side effects. The proposed method will be useful in various applications in drug discovery.
Affiliation(s)
- Ryusuke Sawada
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
- Department of Pharmacology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
| | - Yuko Sakajiri
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
- Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Japan
| | - Tomokazu Shibata
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
| | - Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
- Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Japan
|
14
|
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning-based machine learning method developed by DeepMind, became a game changer in the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 breaks through the barriers in predicting protein structures, much room remains for further study. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (Pros) and considerations (Cons) of using AF2, this review summarizes the diverse applications, including protein structure predictions, dynamic changes, point mutations, integration of language models and experimental data, protein complexes, and protein-peptide interactions. It underscores recent advancements in efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and its potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
| | - Zehua Wen
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
| | - Shi-Wei Liu
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
| | - Lihong Zhang
- Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
| | - Cierra Finley
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
| | - Ho-Jin Lee
- Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA; Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA.
| | - Hua-Jun Shawn Fan
- College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China.
|
15
|
Passi G, Lieberman S, Zahdeh F, Murik O, Renbaum P, Beeri R, Linial M, May D, Levy-Lahad E, Schneidman-Duhovny D. Discovering predisposing genes for hereditary breast cancer using deep learning. Brief Bioinform 2024; 25:bbae346. [PMID: 39038933 PMCID: PMC11262808 DOI: 10.1093/bib/bbae346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 04/18/2024] [Accepted: 07/04/2024] [Indexed: 07/24/2024] Open
Abstract
Breast cancer (BC) is the most common malignancy affecting Western women today. It is estimated that as many as 10% of BC cases can be attributed to germline variants. However, the genetic basis of the majority of familial BC cases has yet to be identified. Discovering predisposing genes contributing to familial BC is challenging due to their presumed rarity, low penetrance, and complex biological mechanisms. Here, we focused on an analysis of rare missense variants in a cohort of 12 families of Middle Eastern origins characterized by a high incidence of BC cases. We devised a novel, high-throughput, variant analysis pipeline adapted for family studies, which aims to analyze variants at the protein level by employing state-of-the-art machine learning models and three-dimensional protein structural analysis. Using our pipeline, we analyzed 1218 rare missense variants that are shared between affected family members and classified 80 genes as candidate pathogenic. Among these genes, we found significant functional enrichment in peroxisomal and mitochondrial biological pathways which segregated across seven families in the study and covered diverse ethnic groups. We present multiple lines of evidence that peroxisomal and mitochondrial pathways play an important, yet underappreciated, role in both germline BC predisposition and BC survival.
Affiliation(s)
- Gal Passi
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Sari Lieberman
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
| | - Fouad Zahdeh
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
| | - Omer Murik
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
| | - Paul Renbaum
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
| | - Rachel Beeri
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
| | - Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 91904, Israel
| | - Dalit May
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Clalit Health Services, Jerusalem, Israel
| | - Ephrat Levy-Lahad
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
|
16
|
Gu S, Yang Y, Zhao Y, Qiu J, Wang X, Tong HHY, Liu L, Wan X, Liu H, Hou T, Kang Y. Evaluation of AlphaFold2 Structures for Hit Identification across Multiple Scenarios. J Chem Inf Model 2024; 64:3630-3639. [PMID: 38630855 DOI: 10.1021/acs.jcim.3c01976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
The introduction of AlphaFold2 (AF2) has sparked significant enthusiasm and generated extensive discussion within the scientific community, particularly among drug discovery researchers. Although previous studies have addressed the performance of AF2 structures in virtual screening (VS), a more comprehensive investigation is still necessary considering the paramount importance of structural accuracy in drug design. In this study, we evaluate the performance of AF2 structures in VS across three common drug discovery scenarios: targets with holo, apo, and AF2 structures; targets with only apo and AF2 structures; and targets exclusively with AF2 structures. We utilized both the traditional physics-based Glide and the deep-learning-based scoring function RTMscore to rank the compounds in the DUD-E, DEKOIS 2.0, and DECOY data sets. The results demonstrate that, overall, the performance of VS on AF2 structures is comparable to that on apo structures but notably inferior to that on holo structures across diverse scenarios. Moreover, when a target has solely an AF2 structure, selecting the holo structure of the target from different subtypes within the same protein family produces comparable results with the AF2 structure for VS on the data set of the AF2 structures, and significantly better results than the AF2 structures on its own data set. This indicates that utilizing AF2 structures for docking-based VS may not yield the most satisfactory outcomes, even when solely AF2 structures are available. Moreover, we rule out the possibility that the variations in VS performance between the binding pockets of AF2 and holo structures arise from the differences in their biological assembly composition.
Affiliation(s)
- Shukai Gu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yuwei Yang
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Yihao Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Jiayue Qiu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
| | - Henry Hoi Yee Tong
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
| | - Xiaozhe Wan
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
17
|
Ruiz-Serra V, Valentini S, Madroñero S, Valencia A, Porta-Pardo E. 3Dmapper: a command line tool for BioBank-scale mapping of variants to protein structures. Bioinformatics 2024; 40:btae171. [PMID: 38565273 PMCID: PMC11018535 DOI: 10.1093/bioinformatics/btae171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 02/09/2024] [Accepted: 03/30/2024] [Indexed: 04/04/2024] Open
Abstract
MOTIVATION: The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures. However, most of these tools are web-based and not well-suited for large-scale genomic data analysis. RESULTS: To address this issue, we introduce 3Dmapper, a stand-alone command-line tool developed in Python and R. It systematically maps annotated protein positions and variants to protein structures, providing a solution that is both efficient and reliable. AVAILABILITY AND IMPLEMENTATION: https://github.com/vicruiser/3Dmapper.
Affiliation(s)
- Victoria Ruiz-Serra
- Barcelona Supercomputing Center (BSC)
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
| | - Samuel Valentini
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
| | - Sergi Madroñero
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC)
- Institució Catalana de Recerca Avançada (ICREA)
| | - Eduard Porta-Pardo
- Barcelona Supercomputing Center (BSC)
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
|
18
|
Weller JA, Rohs R. DrugHIVE: Target-specific spatial drug design and optimization with a hierarchical generative model. bioRxiv 2024:2023.12.22.573155. [PMID: 38187658 PMCID: PMC10769420 DOI: 10.1101/2023.12.22.573155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Rapid advancement in the computational methods of structure-based drug design has led to their widespread adoption as key tools in the early drug development process. Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a significant impact on the success of early drug design efforts. However, screening-based methods are limited in their scalability due to computational limits and the sheer scale of drug-like space. An approach within the quickly evolving field of artificial intelligence (AI), deep generative modeling, is extending the reach of molecular design beyond classical methods by learning the fundamental intra- and inter-molecular relationships in drug-target systems from existing data. In this work we introduce DrugHIVE, a deep hierarchical structure-based generative model that enables fine-grained control over molecular generation. Our model outperforms state-of-the-art autoregressive and diffusion-based methods on common benchmarks and in speed of generation. Here, we demonstrate DrugHIVE's capacity to accelerate a wide range of common drug design tasks such as de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our method is highly scalable and can be applied to high-confidence AlphaFold-predicted receptors, extending our ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
|
19
|
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Affiliation(s)
- Micholas Dean Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
| | - L Darryl Quarles
- Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
| | - Omar Demerdash
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
| | - Jeremy C Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA.
|
20
|
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024] Open
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Kirill E. Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Nick V. Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
|
21
|
Aspromonte MC, Nugnes MV, Quaglia F, Bouharoua A, Tosatto SCE, Piovesan D. DisProt in 2024: improving function annotation of intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D434-D441. [PMID: 37904585 PMCID: PMC10767923 DOI: 10.1093/nar/gkad928] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 11/01/2023] Open
Abstract
DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.
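The comparison between curated disorder annotations and pLDDT-derived disorder predictions mentioned in this abstract can be illustrated with a minimal Python sketch. It labels a residue as predicted-disordered when its pLDDT falls below a cutoff and scores agreement with curated labels using the Matthews correlation coefficient. The cutoff of 70, the function names, and the toy data are assumptions made for illustration only; they are not taken from the DisProt paper and do not reproduce its procedure.

```python
import math

def plddt_to_disorder(plddt, cutoff=70.0):
    """Label each residue 1 (disordered) if pLDDT < cutoff, else 0.

    The default cutoff of 70 is an illustrative assumption.
    """
    return [1 if score < cutoff else 0 for score in plddt]

def matthews_corrcoef(truth, pred):
    """Matthews correlation coefficient for two equal-length 0/1 lists."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a row or column is empty).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (hypothetical): per-residue pLDDT and curated disorder labels.
plddt = [95.1, 91.3, 88.0, 62.4, 45.0, 38.2, 55.9, 72.5]
curated = [0, 0, 0, 1, 1, 1, 1, 1]

predicted = plddt_to_disorder(plddt)
mcc = matthews_corrcoef(curated, predicted)
print(predicted, round(mcc, 3))
```

In this toy case the last residue is curated as disordered yet has a pLDDT above the cutoff, the kind of disagreement the abstract attributes to folding-upon-binding fragments.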
Affiliation(s)
| | | | - Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
| | - Adel Bouharoua
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | | | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
|
22
|
Liu BH, Liu M, Radhakrishnan S, Jaladanki CK, Gao C, Tang JP, Kumari K, Go ML, Vu KAL, Seo HS, Song K, Tian X, Feng L, Tan JL, Bassal MA, Arthanari H, Qi J, Dhe-Paganon S, Fan H, Tenen DG, Chai L. Targeting transcription factors through an IMiD independent zinc finger domain. bioRxiv 2024:2024.01.03.574032. [PMID: 38260640 PMCID: PMC10802279 DOI: 10.1101/2024.01.03.574032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Immunomodulatory imide drugs (IMiDs) degrade specific C2H2 zinc finger degrons in transcription factors, making them effective against certain cancers. SALL4, a cancer driver, contains seven C2H2 zinc fingers in four clusters, including an IMiD degron in zinc finger cluster two (ZFC2). Surprisingly, IMiDs do not inhibit growth of SALL4 expressing cancer cells. To overcome this limit, we focused on a non-IMiD degron, SALL4 zinc finger cluster four (ZFC4). By combining AlphaFold and the ZFC4-DNA crystal structure, we identified a potential ZFC4 drug pocket. Utilizing an in silico docking algorithm and cell viability assays, we screened chemical libraries and discovered SH6, which selectively targets SALL4-expressing cancer cells. Mechanistic studies revealed that SH6 degrades SALL4 protein through the CUL4A/CRBN pathway, while deletion of ZFC4 abolished this activity. Moreover, SH6 led to significant 62% tumor growth inhibition of SALL4+ xenografts in vivo and demonstrated good bioavailability in pharmacokinetic studies. In summary, these studies represent a new approach for IMiD independent drug discovery targeting C2H2 transcription factors in cancer.
|
23
|
Terwilliger TC, Liebschner D, Croll TI, Williams CJ, McCoy AJ, Poon BK, Afonine PV, Oeffner RD, Richardson JS, Read RJ, Adams PD. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat Methods 2024; 21:110-116. [PMID: 38036854 PMCID: PMC10776388 DOI: 10.1038/s41592-023-02087-4] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Received: 01/30/2023] [Accepted: 10/11/2023] [Indexed: 12/02/2023]
Abstract
Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
Affiliation(s)
- Thomas C Terwilliger
- New Mexico Consortium, Los Alamos, NM, USA.
- Los Alamos National Laboratory, Los Alamos, NM, USA.
| | - Dorothee Liebschner
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Tristan I Croll
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
| | | | - Airlie J McCoy
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
| | - Billy K Poon
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Pavel V Afonine
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Robert D Oeffner
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
| | | | - Randy J Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
| | - Paul D Adams
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, CA, USA
|
24
|
Cen LP, Ng TK, Ji J, Lin JW, Yao Y, Yang R, Dong G, Cao Y, Chen C, Yao SQ, Wang WY, Huang Z, Qiu K, Pang CP, Liu Q, Zhang M. Artificial Intelligence-based database for prediction of protein structure and their alterations in ocular diseases. Database (Oxford) 2023; 2023:baad083. [PMID: 38109881 PMCID: PMC10727695 DOI: 10.1093/database/baad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 07/17/2023] [Accepted: 12/15/2023] [Indexed: 12/20/2023]
Abstract
The aim of the study is to establish an online database for predicting protein structures altered in ocular diseases using the AlphaFold2 and RoseTTAFold algorithms. In total, 726 genes associated with multiple ocular diseases were collected for protein structure prediction. Both AlphaFold2 and RoseTTAFold were built locally from the open-source codebases. A dataset of 48 protein structures from the Protein Data Bank (PDB) was adopted to validate the algorithm set-up. A website was built to match ocular genes with the corresponding predicted tertiary protein structures for each amino acid sequence. The predicted local distance difference test-Cα (pLDDT) and template modeling (TM) scores of the validation protein structures and the selected ocular genes were evaluated. Molecular dynamics and molecular docking simulations were performed to demonstrate the applications of the predicted structures. For the validation dataset, 70.8% of the predicted protein structures showed pLDDT greater than 90. Compared to the PDB structures, 100% of the AlphaFold2-predicted structures and 97.9% of the RoseTTAFold-predicted structures showed a TM score greater than 0.5. In total, 1329 amino acid sequences of 430 ocular disease-related genes have been predicted, of which 75.9% showed pLDDT greater than 70 for the wildtype sequences and 76.1% for the variant sequences. Small molecule docking and molecular dynamics simulations revealed that the predicted protein structures with higher confidence scores showed molecular characteristics similar to those of the structures from PDB. We have developed an ocular protein structure database (EyeProdb) for ocular disease, which is released to the public and will facilitate biological investigations and structure-based drug development for ocular diseases. Database URL: http://eyeprodb.jsiec.org.
Affiliation(s)
| | - Tsz Kin Ng
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
| | - Jie Ji
- Network & Information Centre, Shantou University, 243 Daxue Road, Shantou, Guangdong 515063, China
| | - Jian-Wei Lin
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Yao Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Rucui Yang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Geng Dong
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Yingjie Cao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Chongbo Chen
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Shi-Qi Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Wen-Ying Wang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Zijing Huang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Kunliang Qiu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Chi Pui Pang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
| | - Qingping Liu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Mingzhi Zhang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
|
25
|
Malhotra N, Khatri S, Kumar A, Arun A, Daripa P, Fatihi S, Venkadesan S, Jain N, Thukral L. AI-based AlphaFold2 significantly expands the structural space of the autophagy pathway. Autophagy 2023; 19:3201-3220. [PMID: 37516933 PMCID: PMC10621275 DOI: 10.1080/15548627.2023.2238578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 07/08/2023] [Accepted: 07/14/2023] [Indexed: 07/31/2023] Open
Abstract
ABBREVIATIONS AF2: AlphaFold2; AF2-Mult: AlphaFold2 multimer; ATG: autophagy-related; CTD: C-terminal domain; ECTD: extreme C-terminal domain; FR: flexible region; MD: molecular dynamics; NTD: N-terminal domain; pLDDT: predicted local distance difference test; UBL: ubiquitin-like.
Affiliation(s)
- Nidhi Malhotra
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
| | - Shantanu Khatri
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
| | - Ajit Kumar
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
| | - Akanksha Arun
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
| | - Purba Daripa
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
| | - Saman Fatihi
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
| | | | - Niyati Jain
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
| | - Lipi Thukral
- Computational Structural Biology Lab, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSir), Ghaziabad, India
|
26
|
James JK, Norland K, Johar AS, Kullo IJ. Deep generative models of LDLR protein structure to predict variant pathogenicity. J Lipid Res 2023; 64:100455. [PMID: 37821076 PMCID: PMC10696256 DOI: 10.1016/j.jlr.2023.100455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 09/16/2023] [Accepted: 10/05/2023] [Indexed: 10/13/2023] Open
Abstract
The complex structure and function of the low density lipoprotein receptor (LDLR) make classification of protein-coding missense variants challenging. Deep generative models, including Evolutionary model of Variant Effect (EVE), Evolutionary Scale Modeling (ESM), and AlphaFold 2 (AF2), have enabled significant progress in the prediction of protein structure and function. ESM and EVE directly estimate the likelihood of a variant sequence but are purely data-driven and challenging to interpret. AF2 predicts LDLR structures, but variant effects are explicitly modeled by estimating changes in stability. We tested the effectiveness of these models for predicting variant pathogenicity compared to established methods. AF2 produced two distinct conformations based on a novel hinge mechanism. Within ESM's hidden space, benign and pathogenic variants had different distributions. In EVE, these distributions were similar. EVE and ESM were comparable to Polyphen-2, SIFT, REVEL, and Primate AI for predicting binary classifications in ClinVar. However, they were more strongly correlated with experimental measures of LDL uptake. AF2 performed poorly in these tasks. Using the UK Biobank to compare association with clinical phenotypes, ESM and EVE were more strongly associated with serum LDL-C than Polyphen-2. ESM was able to identify variants with more extreme LDL-C levels than EVE and had a significantly stronger association with atherosclerotic cardiovascular disease. In conclusion, AF2-predicted LDLR structures do not accurately model variant pathogenicity. ESM and EVE are competitive with prior scoring methods for prediction based on binary classifications in ClinVar but are superior based on correlations with experimental assays and clinical phenotypes.
Affiliation(s)
- Jose K James
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Kristjan Norland
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Angad S Johar
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA; Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
|
27
|
Kosoglu K, Aydin Z, Tuncbag N, Gursoy A, Keskin O. Structural coverage of the human interactome. Brief Bioinform 2023; 25:bbad496. [PMID: 38180828 PMCID: PMC10768791 DOI: 10.1093/bib/bbad496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/04/2023] [Revised: 11/16/2023] [Accepted: 11/30/2023] [Indexed: 01/07/2024] Open
Abstract
Complex biological processes in cells are embedded in the interactome, representing the complete set of protein-protein interactions. Mapping and analyzing the protein structures are essential to fully comprehending these processes' molecular details. Therefore, knowing the structural coverage of the interactome is important to show the current limitations. Structural modeling of protein-protein interactions requires accurate protein structures. In this study, we mapped all experimental structures to the reference human proteome. Later, we found the enrichment in structural coverage when complementary methods such as homology modeling and deep learning (AlphaFold) were included. We then collected the interactions from the literature and databases to form the reference human interactome, resulting in 117 897 non-redundant interactions. When we analyzed the structural coverage of the interactome, we found that the number of experimentally determined protein complex structures is scarce, corresponding to 3.95% of all binary interactions. We also analyzed known and modeled structures to potentially construct the structural interactome with a docking method. Our analysis showed that 12.97% of the interactions from HuRI and 73.62% and 32.94% from the filtered versions of STRING and HIPPIE could potentially be modeled with high structural coverage or accuracy, respectively. Overall, this paper provides an overview of the current state of structural coverage of the human proteome and interactome.
Affiliation(s)
- Kayra Kosoglu
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Zeynep Aydin
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Nurcan Tuncbag
- School of Medicine, Koc University, 34450 Istanbul, Turkey
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Attila Gursoy
- Department of Computer Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Ozlem Keskin
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
|
28
|
Rosignoli S, di Paola L, Paiardini A. PyPCN: protein contact networks in PyMOL. Bioinformatics 2023; 39:btad675. [PMID: 37941462 PMCID: PMC10641099 DOI: 10.1093/bioinformatics/btad675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/25/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION: Protein contact networks (PCNs) represent the 3D structure of a protein using network formalism. Inter-residue contacts are described as binary adjacency matrices, which are derived from the graph representation of residues (as α-carbons, β-carbons or centroids) and Euclidean distances according to defined thresholds. Functional characterization algorithms are computed on binary adjacency matrices to unveil allosteric, dynamic, and interaction mechanisms in proteins. Such strategies are usually applied in a combinatorial manner, although rarely in seamless and user-friendly implementations. RESULTS: PyPCN is a plugin for PyMOL wrapping more than twenty PCN algorithms and metrics in an easy-to-use graphical user interface, to support PCN analysis. The plugin accepts 3D structures from the Protein Data Bank, user-provided PDBs, or precomputed adjacency matrices. The results are directly mapped to 3D protein structures and organized into interactive diagrams for their visualization. A dedicated graphical user interface combined with PyMOL visual support makes analysis more intuitive and easier, extending the applicability of PCNs. AVAILABILITY AND IMPLEMENTATION: https://github.com/pcnproject/PyPCN.
Affiliation(s)
- Serena Rosignoli
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Rome, Italy
| | - Luisa di Paola
- Unit of Chemical-Physics Fundamentals in Chemical Engineering, Department of Engineering, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
| | - Alessandro Paiardini
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Rome, Italy
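As an illustration of the adjacency-matrix construction described in the PyPCN abstract above, the following is a minimal Python sketch and not PyPCN's own implementation. The function names, the toy coordinates and the 4-8 Å contact window are assumptions chosen for illustration; the abstract states only that contacts are defined from Euclidean distances between residue representatives (α-carbons, β-carbons or centroids) according to defined thresholds.

```python
import math

def contact_matrix(coords, lower=4.0, upper=8.0):
    """Build a binary adjacency matrix from one 3D point per residue.

    Two residues are in contact when their Euclidean distance lies in
    [lower, upper] angstroms. The window is an assumed example value.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if lower <= d <= upper:
                # The network is undirected, so the matrix is symmetric.
                adj[i][j] = adj[j][i] = 1
    return adj

def degrees(adj):
    """Node degree: the number of contacts each residue makes."""
    return [sum(row) for row in adj]

# Hypothetical alpha-carbon coordinates (angstroms) for four residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 5.0, 0.0)]
adj = contact_matrix(coords)
for row in adj:
    print(row)
print("degrees:", degrees(adj))
```

Network metrics such as node degree are then computed on the matrix rather than on the coordinates. The lower bound in this sketch excludes sequence-adjacent residues, whose α-carbons sit about 3.8 Å apart.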
|
29
|
Liang Z, Liu T, Li Q, Zhang G, Zhang B, Du X, Liu J, Chen Z, Ding H, Hu G, Lin H, Zhu F, Luo C. Deciphering the functional landscape of phosphosites with deep neural network. Cell Rep 2023; 42:113048. [PMID: 37659078 DOI: 10.1016/j.celrep.2023.113048] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/19/2023] [Revised: 07/11/2023] [Accepted: 08/11/2023] [Indexed: 09/04/2023] Open
Abstract
Current biochemical approaches have only identified the most well-characterized kinases for a tiny fraction of the phosphoproteome, and the functional assignments of phosphosites are almost negligible. Herein, we analyze the substrate preference catalyzed by a specific kinase and present a novel integrated deep neural network model named FuncPhos-SEQ for functional assignment of human proteome-level phosphosites. FuncPhos-SEQ incorporates phosphosite motif information from a protein sequence using multiple convolutional neural network (CNN) channels and network features from protein-protein interactions (PPIs) using network embedding and deep neural network (DNN) channels. These concatenated features are jointly fed into a heterogeneous feature network to prioritize functional phosphosites. Combined with a series of in vitro and cellular biochemical assays, we confirm that NADK-S48/50 phosphorylation could activate its enzymatic activity. In addition, ERK1/2 are discovered as the primary kinases responsible for NADK-S48/50 phosphorylation. Moreover, FuncPhos-SEQ is developed as an online server.
Affiliation(s)
- Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
| | - Tonghai Liu
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Qi Li
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Guangyu Zhang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
| | - Bei Zhang
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Xikun Du
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Jingqiu Liu
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Zhifeng Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Hong Ding
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
| | - Hao Lin
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
| | - Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
| | - Cheng Luo
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; School of Pharmacy, Fujian Medical University, Fuzhou 350122, China.
|
30
|
Xu T, Xu Q, Li J. Toward the appropriate interpretation of Alphafold2. Front Artif Intell 2023; 6:1149748. [PMID: 37664078 PMCID: PMC10469483 DOI: 10.3389/frai.2023.1149748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 07/24/2023] [Indexed: 09/05/2023] Open
Abstract
In the life sciences, proteins are essential building blocks of life forms and crucial catalysts for metabolic reactions in organisms. Protein structures depend on a vast number of complex combinations of amino acid residues, determined by gene expression. Predicting protein folding structures has been a tedious problem for the past seven decades but, owing to the robust development of artificial intelligence, astonishing progress has been made. AlphaFold2, whose key component is Evoformer, is a typical and successful example of such progress. This article attempts not only to isolate and dissect every detail of Evoformer, but also to raise some ideas for potential improvement.
Affiliation(s)
- Tian Xu
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Qin Xu
- Department of Mathematics, The University of Arizona, Tucson, AZ, United States
| | - Jianyong Li
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
|
31
|
Medvedev KE, Schaeffer RD, Chen KS, Grishin NV. Pan-cancer structurome reveals overrepresentation of beta sandwiches and underrepresentation of alpha helical domains. Sci Rep 2023; 13:11988. [PMID: 37491511 PMCID: PMC10368619 DOI: 10.1038/s41598-023-39273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 07/22/2023] [Indexed: 07/27/2023] Open
Abstract
The recent progress in the prediction of protein structures marked a historical milestone. AlphaFold predicted 200 million protein models with an accuracy comparable to experimental methods. Protein structures are widely used to understand evolution and to identify potential drug targets for the treatment of various diseases, including cancer. Thus, these recently predicted structures might convey previously unavailable information about cancer biology. Evolutionary classification of protein domains is challenging and different approaches exist. Recently our team presented a classification of domains from human protein models released by AlphaFold. Here we evaluated the pan-cancer structurome, domains from over- and underexpressed proteins in 21 cancer types, using the broadest levels of the ECOD classification: the architecture (A-groups) and possible homology (X-groups) levels. Our analysis reveals that AlphaFold has greatly increased the three-dimensional structural landscape for proteins that are differentially expressed in these 21 cancer types. We show that beta sandwich domains are significantly overrepresented and alpha helical domains are significantly underrepresented in the majority of cancer types. Our data suggest that the prevalence of the beta sandwiches is due to the high levels of immunoglobulins and immunoglobulin-like domains that arise during tumor development-related inflammation. On the other hand, proteins with exclusively alpha domains are important elements of homeostasis, apoptosis and transmembrane transport. Therefore, cancer cells tend to reduce representation of these proteins to promote successful oncogenesis.
Affiliation(s)
- Kirill E Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
| | - R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Kenneth S Chen
- Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
|
32
|
Krokengen OC, Raasakka A, Kursula P. The intrinsically disordered protein glue of the myelin major dense line: Linking AlphaFold2 predictions to experimental data. Biochem Biophys Rep 2023; 34:101474. [PMID: 37153862 PMCID: PMC10160357 DOI: 10.1016/j.bbrep.2023.101474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023] Open
Abstract
Numerous human proteins are classified as intrinsically disordered proteins (IDPs). Due to their physicochemical properties, high-resolution structural information about IDPs is generally lacking. On the other hand, IDPs are known to adopt local ordered structures upon interactions with e.g. other proteins or lipid membrane surfaces. While recent developments in protein structure prediction have been revolutionary, their impact on IDP research at high resolution remains limited. We took a specific example of two myelin-specific IDPs, the myelin basic protein (MBP) and the cytoplasmic domain of myelin protein zero (P0ct). Both of these IDPs are crucial for normal nervous system development and function, and while they are disordered in solution, upon membrane binding, they partially fold into helices, being embedded into the lipid membrane. We carried out AlphaFold2 predictions of both proteins and analysed the models in light of experimental data related to protein structure and molecular interactions. We observe that the predicted models have helical segments that closely correspond to the membrane-binding sites on both proteins. We furthermore analyse the fits of the models to synchrotron-based X-ray scattering and circular dichroism data from the same IDPs. The models are likely to represent the membrane-bound state of both MBP and P0ct, rather than the conformation in solution. Artificial intelligence-based models of IDPs appear to provide information on the ligand-bound state of these proteins, instead of the conformers dominating free in solution. We further discuss the implications of the predictions for mammalian nervous system myelination and their relevance to understanding disease aspects of these IDPs.
Affiliation(s)
| | - Arne Raasakka
- Department of Biomedicine, University of Bergen, Norway
| | - Petri Kursula
- Department of Biomedicine, University of Bergen, Norway
- Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, Oulu, Finland
|
33
|
Hatano Y, Ishihara T, Onodera O. Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS. BMC Bioinformatics 2023; 24:206. [PMID: 37208601 DOI: 10.1186/s12859-023-05338-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2022] [Accepted: 05/09/2023] [Indexed: 05/21/2023] Open
Abstract
BACKGROUND: In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes characterizing the familial form remains largely unknown. To predict the pathogenicity of such variants, in silico analysis is commonly used. In some ALS causative genes, the pathogenic variants are concentrated in specific regions, and the resulting alterations in protein structure are thought to significantly affect pathogenicity. However, existing methods have not taken this issue into account. To address this, we have developed a technique termed MOVA (method for evaluating the pathogenicity of missense variants using AlphaFold2), which applies positional information for structural variants predicted by AlphaFold2. Here we examined the utility of MOVA for analysis of several causative genes of ALS. METHODS: We analyzed variants of 12 ALS-related genes (TARDBP, FUS, SETX, TBK1, OPTN, SOD1, VCP, SQSTM1, ANG, UBQLN2, DCTN1, and CCNF) and classified them as pathogenic or neutral. For each gene, the features of the variants, consisting of their positions in the 3D structure predicted by AlphaFold2, pLDDT score, and BLOSUM62 score, were used to train a random forest, which was evaluated by stratified fivefold cross-validation. We compared how accurately MOVA predicted mutant pathogenicity with other in silico prediction methods and evaluated the prediction accuracy at TARDBP and FUS hotspots. We also examined which of the MOVA features had the greatest impact on pathogenicity discrimination. RESULTS: MOVA yielded useful results (AUC ≥ 0.70) for TARDBP, FUS, SOD1, VCP, and UBQLN2 of the 12 ALS causative genes. In addition, when comparing the prediction accuracy with other in silico prediction methods, MOVA obtained the best results among those compared for TARDBP, VCP, UBQLN2, and CCNF. MOVA demonstrated superior predictive accuracy for the pathogenicity of mutations at hotspots of TARDBP and FUS. Moreover, higher accuracy was achieved by combining MOVA with REVEL or CADD. Among the features of MOVA, the x, y, and z coordinates performed the best and were highly correlated with MOVA. CONCLUSIONS: MOVA is useful for predicting the virulence of rare variants that are concentrated at specific structural sites, and for use in combination with other prediction methods.
Affiliation(s)
- Yuya Hatano
- Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan
| | - Tomohiko Ishihara
- Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan.
| | - Osamu Onodera
- Department of Neurology, Brain Research Institute, Niigata University, 1-757 Asahimachidori, Chuo-ku, Niigata-shi, Niigata, 951-8585, Japan
|
34
|
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| |
35
Bartolec TK, Vázquez-Campos X, Norman A, Luong C, Johnson M, Payne RJ, Wilkins MR, Mackay JP, Low JKK. Cross-linking mass spectrometry discovers, evaluates, and corroborates structures and protein-protein interactions in the human cell. Proc Natl Acad Sci U S A 2023; 120:e2219418120. [PMID: 37071682 PMCID: PMC10151615 DOI: 10.1073/pnas.2219418120] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/23/2022] [Accepted: 03/16/2023] [Indexed: 04/19/2023] Open
Abstract
Significant recent advances in structural biology, particularly in the field of cryoelectron microscopy, have dramatically expanded our ability to create structural models of proteins and protein complexes. However, many proteins remain refractory to these approaches because of their low abundance, low stability, or, in the case of complexes, simply because they have not yet been analyzed. Here, we demonstrate the power of using cross-linking mass spectrometry (XL-MS) for the high-throughput experimental assessment of the structures of proteins and protein complexes. This included those produced by high-resolution but in vitro experimental data, as well as in silico predictions based on amino acid sequence alone. We present the largest XL-MS dataset to date, describing 28,910 unique residue pairs captured across 4,084 unique human proteins and 2,110 unique protein-protein interactions. We show that models of proteins and their complexes predicted by AlphaFold2, and inspired and corroborated by the XL-MS data, offer opportunities to deeply mine the structural proteome and interactome and reveal mechanisms underlying protein structure and function.
Affiliation(s)
- Tara K. Bartolec
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Xabier Vázquez-Campos
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Alexander Norman
  - School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
- Clement Luong
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Marcus Johnson
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Richard J. Payne
  - School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, Sydney, NSW 2006, Australia
- Marc R. Wilkins
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
- Joel P. Mackay
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Jason K. K. Low
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
36
Mutz P, Resch W, Faure G, Senkevich TG, Koonin EV, Moss B. Exaptation of Inactivated Host Enzymes for Structural Roles in Orthopoxviruses and Novel Folds of Virus Proteins Revealed by Protein Structure Modeling. mBio 2023; 14:e0040823. [PMID: 37017580 PMCID: PMC10128050 DOI: 10.1128/mbio.00408-23] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 04/06/2023] Open
Abstract
Viruses with large, double-stranded DNA genomes captured the majority of their genes from their hosts at different stages of evolution. The origins of many virus genes are readily detected through significant sequence similarity with cellular homologs. In particular, this is the case for virus enzymes, such as DNA and RNA polymerases or nucleotide kinases, that retain their catalytic activity after capture by an ancestral virus. However, a large fraction of virus genes have no readily detectable cellular homologs, meaning that their origins remain enigmatic. We explored the potential origins of such proteins that are encoded in the genomes of orthopoxviruses, a thoroughly studied virus genus that includes major human pathogens. To this end, we used AlphaFold2 to predict the structures of all 214 proteins that are encoded by orthopoxviruses. Among the proteins of unknown provenance, structure prediction yielded clear indications of origin for 14 of them and validated several inferences that were previously made via sequence analysis. A notable emerging trend is the exaptation of enzymes from cellular organisms for nonenzymatic, structural roles in virus reproduction that is accompanied by the disruption of catalytic sites and by an overall drastic divergence that precludes homology detection at the sequence level. Among the 16 orthopoxvirus proteins that were found to be inactivated enzyme derivatives are the poxvirus replication processivity factor A20, which is an inactivated NAD-dependent DNA ligase; the major core protein A3, which is an inactivated deubiquitinase; F11, which is an inactivated prolyl hydroxylase; and more similar cases. For nearly one-third of the orthopoxvirus virion proteins, no significantly similar structures were identified, suggesting exaptation with subsequent major structural rearrangement that yielded unique protein folds.
IMPORTANCE: Protein structures are more strongly conserved in evolution than are amino acid sequences. Comparative structural analysis is particularly important for inferring the origins of viral proteins that typically evolve at high rates. We used a powerful protein structure modeling method, namely, AlphaFold2, to model the structures of all orthopoxvirus proteins and compared them to all available protein structures. Multiple cases of recruitment of host enzymes for structural roles in viruses, accompanied by the disruption of catalytic sites, were discovered. However, many viral proteins appear to have evolved unique structural folds.
Affiliation(s)
- Pascal Mutz
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Wolfgang Resch
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland, USA
- Guilhem Faure
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Tatiana G. Senkevich
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Bernard Moss
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
37
McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM. Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins. Commun Biol 2023; 6:421. [PMID: 37061613 PMCID: PMC10105775 DOI: 10.1038/s42003-023-04773-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/12/2022] [Accepted: 03/28/2023] [Indexed: 04/17/2023] Open
Abstract
A major goal in structural biology is to understand protein assemblies in their biologically relevant states. Here, we investigate whether AlphaFold2 structure predictions match native protein conformations. We chemically cross-linked proteins in situ within intact Tetrahymena thermophila cilia and native ciliary extracts, identifying 1,225 intramolecular cross-links within the 100 best-sampled proteins, providing a benchmark of distance restraints obeyed by proteins in their native assemblies. The corresponding structure predictions were highly concordant, positioning 86.2% of cross-linked residues within Cα-to-Cα distances of 30 Å, consistent with the cross-linker length. 43% of proteins showed no violations. Most inconsistencies occurred in low-confidence regions or between domains. Overall, AlphaFold2 predictions with lower predicted aligned error corresponded to more correct native structures. However, we observe cases where rigid body domains are oriented incorrectly, as for ciliary protein BBC118, suggesting that combining structure prediction with experimental information will better reveal biologically relevant conformations.
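The distance-restraint check described in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the function names, the dictionary-based coordinate format, and the toy coordinates are all hypothetical, and only the 30 Å Cα-to-Cα cutoff is taken from the abstract. The sketch counts what fraction of cross-linked residue pairs fall within the cutoff in a predicted model.

```python
import math


def ca_distance(a, b):
    """Euclidean distance (in Å) between two Cα positions given as (x, y, z)."""
    return math.dist(a, b)


def crosslink_satisfaction(ca_coords, crosslinks, max_dist=30.0):
    """Fraction of cross-links whose Cα-to-Cα distance is <= max_dist.

    ca_coords:  dict mapping residue number -> (x, y, z) of its Cα atom
    crosslinks: iterable of (residue_i, residue_j) pairs
    max_dist:   cutoff in Å (30 Å is the value quoted in the abstract)
    """
    pairs = list(crosslinks)
    if not pairs:
        raise ValueError("no cross-links supplied")
    satisfied = sum(
        1 for i, j in pairs
        if ca_distance(ca_coords[i], ca_coords[j]) <= max_dist
    )
    return satisfied / len(pairs)


# Toy data (hypothetical): three residues and three cross-links.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (0.0, 40.0, 0.0)}
toy_links = [(1, 2), (1, 3), (2, 3)]
fraction = crosslink_satisfaction(toy_coords, toy_links)
```

In the toy data only the pair (1, 2) lies within 30 Å, so one of three cross-links is satisfied. In practice the coordinates would be read from a predicted model file, and violations would be inspected alongside per-residue confidence, since the abstract reports that most inconsistencies fall in low-confidence regions or between domains.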
Affiliation(s)
- Caitlyn L McCafferty
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- Erin L Pennington
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- Ophelia Papoulas
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- David W Taylor
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA
38
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows the user to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
39
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962 DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and experimental difficulties limit their availability. The proposal of 3D structural models is consequently an appealing alternative. The release of the AlphaFold Deep Learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures, and many less-used protein local conformations (PolyProline II helices, types of γ-turns, β-turns and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations which may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France
40
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Christian Dallago
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; VantAI, 151 W 42nd Street, New York, NY 10036, USA
- Michael Heinzinger
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Stephanie Kim
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Maria Littmann
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
41
Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci U S A 2023; 120:e2214069120. [PMID: 36917664 PMCID: PMC10041065 DOI: 10.1073/pnas.2214069120] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/16/2022] [Accepted: 02/06/2023] [Indexed: 03/16/2023] Open
Abstract
Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jimin Pei
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
42
Malbranke C, Bikard D, Cocco S, Monasson R, Tubiana J. Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies. Curr Opin Struct Biol 2023; 80:102571. [PMID: 36947951 DOI: 10.1016/j.sbi.2023.102571] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/01/2022] [Revised: 01/29/2023] [Accepted: 02/07/2023] [Indexed: 03/24/2023]
Abstract
Computational protein design facilitates the discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionary-related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter approaches estimate key biochemical properties, such as structure free energy, conformational entropy, or binding affinities using machine learning surrogates, and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France; Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France
- David Bikard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
43
Rosenkranz AA, Slastnikova TA. Prospects of Using Protein Engineering for Selective Drug Delivery into a Specific Compartment of Target Cells. Pharmaceutics 2023; 15:987. [PMID: 36986848 PMCID: PMC10055131 DOI: 10.3390/pharmaceutics15030987] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/18/2023] [Revised: 03/13/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
A large number of proteins are successfully used to treat various diseases. These include natural polypeptide hormones, their synthetic analogues, antibodies, antibody mimetics, enzymes, and other drugs based on them. Many of them are in demand in clinical settings and commercially successful, mainly for cancer treatment. The targets for most of the aforementioned drugs are located at the cell surface. Meanwhile, the vast majority of therapeutic targets, which are usually regulatory macromolecules, are located inside the cell. Traditional low molecular weight drugs freely penetrate all cells, causing side effects in non-target cells. In addition, it is often difficult to develop a small molecule that can specifically affect protein interactions. Modern technologies make it possible to obtain proteins capable of interacting with almost any target. However, proteins, like other macromolecules, cannot, as a rule, freely penetrate into the desired cellular compartment. Recent studies allow us to design multifunctional proteins that solve these problems. This review considers the scope of application of such artificial constructs for the targeted delivery of both protein-based and traditional low molecular weight drugs, the obstacles encountered during their transport to the specified intracellular compartment of the target cells after systemic administration into the bloodstream, and the means to overcome those difficulties.
Affiliation(s)
- Andrey A Rosenkranz
  - Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
  - Department of Biophysics, Faculty of Biology, Lomonosov Moscow State University, 1-12 Leninskie Gory St., 119234 Moscow, Russia
- Tatiana A Slastnikova
  - Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
44
Bertoline LMF, Lima AN, Krieger JE, Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Frontiers in Bioinformatics 2023; 3:1120370. [PMID: 36926275 PMCID: PMC10011655 DOI: 10.3389/fbinf.2023.1120370] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 12/09/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023] Open
Abstract
Three-dimensional protein structure is directly correlated with its function, and its determination is critical to understanding biological processes and addressing human health and life science problems in general. Although new protein structures are experimentally obtained over time, there is still a large difference between the number of protein sequences deposited in UniProt and those with resolved tertiary structure. In this context, studies have emerged to predict protein structures by methods based on a template or free modeling. In recent years, different methods have been combined to overcome their individual limitations, until the emergence of AlphaFold2, which demonstrated that predicting protein structure with high accuracy at unprecedented scale is possible. Despite its current impact in the field, AlphaFold2 has limitations. Recently, new methods based on protein language models have promised to revolutionize protein structural biology, allowing the discovery of protein structure and function only from evolutionary patterns present in protein sequences. Even though these methods do not reach AlphaFold2 accuracy, they have already addressed some of its limitations, being able to predict with high accuracy more than 200 million proteins from metagenomic databases. In this mini-review, we provide an overview of the breakthroughs in protein structure prediction before and after the emergence of AlphaFold2.
45
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
46
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
47
Zhao H, Zhang H, She Z, Gao Z, Wang Q, Geng Z, Dong Y. Exploring AlphaFold2's Performance on Predicting Amino Acid Side-Chain Conformations and Its Utility in Crystal Structure Determination of B318L Protein. Int J Mol Sci 2023; 24:2740. [PMID: 36769074 PMCID: PMC9916901 DOI: 10.3390/ijms24032740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 02/04/2023] Open
Abstract
Recent technological breakthroughs in machine-learning-based AlphaFold2 (AF2) are pushing the prediction accuracy of protein structures to an unprecedented level that is on par with experimental structural quality. Despite its outstanding structural modeling capability, further experimental validations and performance assessments of AF2 predictions are still required, thus necessitating the development of integrative structural biology in synergy with both computational and experimental methods. Focusing on the B318L protein that plays an essential role in the African swine fever virus (ASFV) for viral replication, we experimentally demonstrate the high quality of the AF2 predicted model and its practical utility in crystal structural determination. Structural alignment implies that the AF2 model shares nearly the same atomic arrangement as the B318L crystal structure except for some flexible and disordered regions. More importantly, side-chain-based analysis at the individual residue level reveals that AF2's performance is likely dependent on the specific amino acid type and that hydrophobic residues tend to be more accurately predicted by AF2 than hydrophilic residues. Quantitative per-residue RMSD comparisons and further molecular replacement trials suggest that AF2 has a large potential to outperform other computational modeling methods in terms of structural determination. Additionally, it is numerically confirmed that the AF2 model is accurate enough so that it may well potentially withstand experimental data quality to a large extent for structural determination. Finally, an overall structural analysis and molecular docking simulation of the B318L protein are performed. Taken together, our study not only provides new insights into AF2's performance in predicting side-chain conformations but also sheds light upon the significance of AF2 in promoting crystal structural determination, especially when the experimental data quality of the protein crystal is poor.
Affiliation(s)
- Haifan Zhao
- School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Heng Zhang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Zhun She
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Zengqiang Gao
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Qi Wang
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhi Geng
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Yuhui Dong
- Beijing Synchrotron Radiation Facility, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
48
|
Sora V, Laspiur AO, Degn K, Arnaudi M, Utichi M, Beltrame L, De Menezes D, Orlandi M, Stoltze UK, Rigina O, Sackett PW, Wadt K, Schmiegelow K, Tiberti M, Papaleo E. RosettaDDGPrediction for high-throughput mutational scans: From stability to binding. Protein Sci 2023; 32:e4527. [PMID: 36461907 PMCID: PMC9795540 DOI: 10.1002/pro.4527] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/02/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein-protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs, aggregates raw data for multiple variants, and generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔG values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction.
Affiliation(s)
- Valentina Sora
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Adrian Otamendi Laspiur
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Matteo Arnaudi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Mattia Utichi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Ludovica Beltrame
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Dayana De Menezes
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Matteo Orlandi
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Ulrik Kristoffer Stoltze
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Olga Rigina
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Peter Wad Sackett
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Karin Wadt
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Kjeld Schmiegelow
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
|
49
|
Brender JR, Ramamoorthy A, Gursky O, Bhunia A. Intrinsic disorder and structural biology: Searching where the light isn't. Biophys Chem 2023; 292:106912. [PMID: 36335754 DOI: 10.1016/j.bpc.2022.106912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jeffrey R Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Ayyalusamy Ramamoorthy
- Biophysics, Department of Chemistry, Biomedical Engineering, and Macromolecular Science and Engineering, University of Michigan, Ann Arbor, MI 48109-1055, USA
| | - Olga Gursky
- Boston University School of Medicine, Department of Physiology & Biophysics, W302, 700 Albany St, Boston, MA 02118, USA
| | - Anirban Bhunia
- Biomolecular NMR and Drug Design Laboratory, Department of Biophysics, Bose Institute, P-1/12 CIT Scheme VII (M), Kolkata 700054, India
|
50
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
|