1
Chu HY, Fong JHC, Thean DGL, Zhou P, Fung FKC, Huang Y, Wong ASL. Accurate top protein variant discovery via low-N pick-and-validate machine learning. Cell Syst 2024; 15:193-203.e6. [PMID: 38340729 DOI: 10.1016/j.cels.2024.01.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Revised: 10/11/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
A strategy to obtain the greatest number of best-performing variants with the least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experimenting with only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy of up to 92.6% in selecting the true top 1% variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy in successfully discovering high-performance protein variants from diverse families including the CRISPR-based genome editors, supporting its generalizable application for solving protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Hoi Yee Chu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- John H C Fong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Dawn G L Thean
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Peng Zhou
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Frederic K C Fung
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Yuanhua Huang
  - School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Alan S L Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
2
Chen J, Woldring DR, Huang F, Huang X, Wei GW. Topological deep learning based deep mutational scanning. Comput Biol Med 2023; 164:107258. [PMID: 37506452 PMCID: PMC10528359 DOI: 10.1016/j.compbiomed.2023.107258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 06/28/2023] [Accepted: 07/08/2023] [Indexed: 07/30/2023]
Abstract
High-throughput deep mutational scanning (DMS) experiments have significantly impacted protein engineering, drug discovery, immunology, cancer biology, and evolutionary biology by enabling the systematic understanding of protein functions. However, the mutational space associated with proteins is astronomically large, making it overwhelming for current experimental capabilities. Therefore, alternative methods for DMS are imperative. We propose a topological deep learning (TDL) paradigm to facilitate in silico DMS. We utilize a new topological data analysis (TDA) technique based on the persistent spectral theory, also known as persistent Laplacian, to capture both topological invariants and the homotopic shape evolution of data. To validate our TDL-DMS model, we use SARS-CoV-2 datasets and show excellent accuracy and reliability for binding interface mutations. This finding is significant for SARS-CoV-2 variant forecasting and designing effective antibodies and vaccines. Our proposed model is expected to have a significant impact on drug discovery, vaccine design, precision medicine, and protein engineering.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Daniel R Woldring
  - Department of Chemical Engineering, Michigan State University, East Lansing, MI 48824, USA
- Faqing Huang
  - Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Xuefei Huang
  - Department of Chemistry, Michigan State University, MI 48824, USA; Department of Biomedical Engineering, Michigan State University, East Lansing, MI 48824, USA; The Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
3
Wang G, Liu X, Wang K, Gao Y, Li G, Baptista-Hon DT, Yang XH, Xue K, Tai WH, Jiang Z, Cheng L, Fok M, Lau JYN, Yang S, Lu L, Zhang P, Zhang K. Deep-learning-enabled protein-protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat Med 2023; 29:2007-2018. [PMID: 37524952 DOI: 10.1038/s41591-023-02483-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/13/2022] [Accepted: 06/28/2023] [Indexed: 08/02/2023]
Abstract
Host-pathogen interactions and pathogen evolution are underpinned by protein-protein interactions between viral and host proteins. An understanding of how viral variants affect protein-protein binding is important for predicting viral-host interactions, such as the emergence of new pathogenic SARS-CoV-2 variants. Here we propose an artificial intelligence-based framework called UniBind, in which proteins are represented as a graph at the residue and atom levels. UniBind integrates protein three-dimensional structure and binding affinity and is capable of multi-task learning for heterogeneous biological data integration. In systematic tests on benchmark datasets and further experimental validation, UniBind effectively and scalably predicted the effects of SARS-CoV-2 spike protein variants on their binding affinities to the human ACE2 receptor, as well as to SARS-CoV-2 neutralizing monoclonal antibodies. Furthermore, in a cross-species analysis, UniBind could be applied to predict host susceptibility to SARS-CoV-2 variants and to predict future viral variant evolutionary trends. This in silico approach has the potential to serve as an early warning system for problematic emerging SARS-CoV-2 variants, as well as to facilitate research on protein-protein interactions in general.
Affiliation(s)
- Guangyu Wang
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Xiaohong Liu
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - UCL Cancer Institute, University College London, London, UK
- Kai Wang
  - Department of Big Data and Biomedical Artificial Intelligence, National Biomedical Imaging Center, College of Future Technology, Peking University and Peking-Tsinghua Center for Life Sciences, Beijing, China
- Yuanxu Gao
  - Guangzhou National Laboratory, Guangzhou, China
- Gen Li
  - Guangzhou National Laboratory, Guangzhou, China
  - Guangzhou Women and Children's Medical Center, Guangzhou, China
- Daniel T Baptista-Hon
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - Zhuhai International Eye Center and Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai People's Hospital and the First Affiliated Hospital of Faculty of Medicine, Macau University of Science and Technology, Guangdong, China
- Xiaohong Helena Yang
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
- Kanmin Xue
  - Nuffield Laboratory of Ophthalmology, Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Wa Hou Tai
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
- Zeyu Jiang
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Linling Cheng
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - Zhuhai International Eye Center and Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai People's Hospital and the First Affiliated Hospital of Faculty of Medicine, Macau University of Science and Technology, Guangdong, China
- Manson Fok
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
- Johnson Yiu-Nam Lau
  - Departments of Biology and Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Shengyong Yang
  - State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Ligong Lu
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - Zhuhai International Eye Center and Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai People's Hospital and the First Affiliated Hospital of Faculty of Medicine, Macau University of Science and Technology, Guangdong, China
- Ping Zhang
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Kang Zhang
  - Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
  - Department of Big Data and Biomedical Artificial Intelligence, National Biomedical Imaging Center, College of Future Technology, Peking University and Peking-Tsinghua Center for Life Sciences, Beijing, China
  - Guangzhou National Laboratory, Guangzhou, China
  - Zhuhai International Eye Center and Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai People's Hospital and the First Affiliated Hospital of Faculty of Medicine, Macau University of Science and Technology, Guangdong, China
4
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and fulfill diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is ultimately to create a protein with desired structural and functional properties. It is often critical to model the relationship between a protein's sequence, folded structure, and biological function to assist in such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields and would rapidly accelerate protein engineering tasks when successful. Such estimation is often known as variant effect prediction (VEP). However, progress has been demonstrated in recent years with the development of machine learning (ML) methods in modeling the relationship between mutations and protein function. In this Review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
5
Thean DGL, Chu HY, Fong JHC, Chan BKC, Zhou P, Kwok CCS, Chan YM, Mak SYL, Choi GCG, Ho JWK, Zheng Z, Wong ASL. Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities. Nat Commun 2022; 13:2219. [PMID: 35468907 PMCID: PMC9039034 DOI: 10.1038/s41467-022-29874-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/08/2021] [Accepted: 04/04/2022] [Indexed: 12/12/2022]
Abstract
The genome-editing Cas9 protein uses multiple amino-acid residues to bind the target DNA. Even when only the residues in proximity to the target DNA are considered as potential sites to optimise Cas9's activity, the number of combinatorial variants to screen is too large for a wet-lab experiment. Here we generate and cross-validate ten in silico and experimental datasets of multi-domain combinatorial mutagenesis libraries for Cas9 engineering, and demonstrate that a machine learning-coupled engineering approach reduces the experimental screening burden by as much as 95% while enriching top-performing variants by ∼7.5-fold in comparison to the null model. Using this approach, followed by structure-guided engineering, we identify the N888R/A889Q variant conferring increased editing activity on the protospacer adjacent motif-relaxed KKH variant of Cas9 nuclease from Staphylococcus aureus (KKH-SaCas9) and its derived base editor in human cells. Our work validates a readily applicable workflow to enable resource-efficient high-throughput engineering of genome editor activity. The set of combinatorial mutants is too large to screen by wet-lab experiments alone. Here the authors present a machine learning-coupled combinatorial mutagenesis approach to vastly reduce the experimental burden of engineering Cas9 genome editing enzymes.
Affiliation(s)
- Dawn G L Thean
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
- Hoi Yee Chu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
- John H C Fong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
- Becky K C Chan
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
- Peng Zhou
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, SAR, China
- Cynthia C S Kwok
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
- Yee Man Chan
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China
- Silvia Y L Mak
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China
- Gigi C G Choi
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
- Joshua W K Ho
  - School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong, SAR, China
- Zongli Zheng
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China; Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, SAR, China; Biotechnology and Health Centre, City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
- Alan S L Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, SAR, China
6
Hanning KR, Minot M, Warrender AK, Kelton W, Reddy ST. Deep mutational scanning for therapeutic antibody engineering. Trends Pharmacol Sci 2021; 43:123-135. [PMID: 34895944 DOI: 10.1016/j.tips.2021.11.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/30/2021] [Revised: 11/02/2021] [Accepted: 11/10/2021] [Indexed: 12/24/2022]
Abstract
The biophysical and functional properties of monoclonal antibody (mAb) drug candidates are often improved by protein engineering methods to increase the probability of clinical efficacy. One emerging method is deep mutational scanning (DMS) which combines the power of exhaustive protein mutagenesis and functional screening with deep sequencing and bioinformatics. The application of DMS has yielded significant improvements to the affinity, specificity, and stability of several preclinical antibodies alongside novel applications such as introducing multi-specific binding properties. DMS has also been applied directly on target antigens to precisely map antibody-binding epitopes and notably to profile the mutational escape potential of viral targets (e.g., SARS-CoV-2 variants). Finally, DMS combined with machine learning is enabling advances in the computational screening and engineering of therapeutic antibodies.
Affiliation(s)
- Kyrin R Hanning
  - Te Huataki Waiora School of Health, University of Waikato, Hamilton 3240, New Zealand
- Mason Minot
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule (ETH) Zurich, Basel 4058, Switzerland
- Annmaree K Warrender
  - Te Huataki Waiora School of Health, University of Waikato, Hamilton 3240, New Zealand
- William Kelton
  - Te Huataki Waiora School of Health, University of Waikato, Hamilton 3240, New Zealand
- Sai T Reddy
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule (ETH) Zurich, Basel 4058, Switzerland