1
Jayasekara SK, Joni HD, Jayantha B, Dissanayake L, Mandrell C, Sinharage MM, Molitor R, Jayasekara T, Sivakumar P, Jayakody LN. Trends in in-silico guided engineering of efficient polyethylene terephthalate (PET) hydrolyzing enzymes to enable bio-recycling and upcycling of PET. Comput Struct Biotechnol J 2023; 21:3513-3521. [PMID: 37484494 PMCID: PMC10362282 DOI: 10.1016/j.csbj.2023.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 06/01/2023] [Accepted: 06/03/2023] [Indexed: 07/25/2023] Open
Abstract
Polyethylene terephthalate (PET) is the most widely produced polyester globally, and less than 30% of the PET produced (∼6 billion pounds annually) is currently recycled, mostly into lower-quality products. The major drawbacks of current recycling methods (mechanical and chemical) have inspired the exploration of potentially efficient and sustainable PET depolymerization using biological approaches. Researchers have discovered efficient PET hydrolyzing enzymes in the plastisphere and have demonstrated the selective degradation of PET to its original monomers, thus enabling biological recycling or upcycling. However, several significant hurdles, such as the low efficiency of the hydrolytic reaction, the low thermostability of the enzymes, and the inability of the enzymes to depolymerize crystalline PET, must be addressed in order to establish techno-economically feasible commercial-scale biological PET recycling or upcycling processes. Researchers leverage a synthetic biology-based design, build, test, and learn (DBTL) methodology to develop commercially applicable, efficient PET hydrolyzing enzymes through 1) high-throughput metagenomic and proteomic approaches to discover new PET hydrolyzing enzymes with superior properties; and 2) enzyme engineering approaches to modify and optimize PET hydrolyzing properties. Recently, in-silico platforms, including molecular mechanics and machine learning concepts, have been emerging as innovative tools for the development of more efficient and effective PET recycling through the exploration of novel mutations in PET hydrolyzing enzymes. In-silico-guided PET hydrolyzing enzyme engineering with DBTL cycles enables faster development of efficient enzyme variants than tedious conventional enzyme engineering methods such as random or directed evolution.
This review highlights the potential of in-silico-guided PET degrading enzyme engineering to create more efficient variants, including Ideonella sakaiensis PETase (IsPETase) and leaf-branch compost cutinases (LCC). Furthermore, future research prospects are discussed to enable a sustainable circular economy through the bioconversion of PET to original or high-value platform chemicals.
Affiliation(s)
- Sandhya K. Jayasekara
  - School of Biological Science, Southern Illinois University Carbondale, Carbondale, IL, USA
- Hriday Dhar Joni
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Bhagya Jayantha
  - School of Biological Science, Southern Illinois University Carbondale, Carbondale, IL, USA
- Lakshika Dissanayake
  - School of Biological Science, Southern Illinois University Carbondale, Carbondale, IL, USA
- Christopher Mandrell
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Manuka M.S. Sinharage
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Ryan Molitor
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Thushari Jayasekara
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Poopalasingam Sivakumar
  - School of Physics and Applied Physics, Southern Illinois University Carbondale, Carbondale, IL, USA
- Lahiru N. Jayakody
  - School of Biological Science, Southern Illinois University Carbondale, Carbondale, IL, USA
  - Fermentation Science Institute, Southern Illinois University Carbondale, Carbondale, IL, USA
2
Zhu JJ, Zhang NJ, Wei T, Chen HF. Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder. Int J Mol Sci 2023; 24:ijms24086896. [PMID: 37108059 PMCID: PMC10138423 DOI: 10.3390/ijms24086896] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/03/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/29/2023] Open
Abstract
Intrinsically disordered proteins (IDPs), which have no fixed three-dimensional structure under physiological conditions, account for more than 50% of the human proteome and are closely associated with tumors, cardiovascular diseases, and neurodegeneration. Because of this conformational diversity, conventional experimental methods of structural biology, such as NMR, X-ray diffraction, and cryo-EM, are unable to capture conformational ensembles. Molecular dynamics (MD) simulation can sample dynamic conformations at the atomic level and has become an effective method for studying the structure and function of IDPs. However, the high computational cost prevents MD simulations from being widely used for IDP conformational sampling. In recent years, significant progress has been made in artificial intelligence, which makes it possible to solve the conformational reconstruction problem of IDPs with fewer computational resources. Here, based on short MD simulations of different IDP systems, we use variational autoencoders (VAEs) to achieve the generative reconstruction of IDP structures and include a wider range of sampled conformations from longer simulations. Compared with generative autoencoders (AEs), VAEs add an inference layer between the encoder and decoder in the latent space, which can cover the conformational landscape of IDPs more comprehensively and achieve the effect of enhanced sampling. Through experimental verification, the Cα RMSD between VAE-generated and MD-sampled conformations in the five IDP test systems was significantly lower than that of the AE, and the Spearman correlation coefficient on the structure was higher than that of the AE. VAEs also achieve excellent performance on structured proteins. In summary, VAEs can be used to effectively sample protein structures.
Affiliation(s)
- Jun-Jie Zhu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ning-Jie Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Center for Bioinformation Technology, Shanghai 200240, China
3
Yin C, Song Z, Tian H, Palzkill T, Tao P. Unveiling the structural features that regulate carbapenem deacylation in KPC-2 through QM/MM and interpretable machine learning. Phys Chem Chem Phys 2023; 25:1349-1362. [PMID: 36537692 PMCID: PMC11162551 DOI: 10.1039/d2cp03724f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
Resistance to carbapenem β-lactams presents major clinical and economic challenges for the treatment of pathogen infections. The fast hydrolysis of carbapenems by carbapenemase-producing bacterial strains enables the effective deactivation of carbapenem antibiotics. In this study, we aim to unravel the structural features that distinguish the notable deacylation activity of carbapenemases. The deacylation reactions between imipenem (IPM) and KPC-2, a class A serine-based β-lactamase (ASβL), are modeled with combined quantum mechanical/molecular mechanical (QM/MM) minimum energy pathway (MEP) calculations and interpretable machine-learning (ML) methods. We first applied a dual-level computational protocol to achieve fast sampling of QM/MM MEPs. A tree-based ensemble ML model was employed to learn the MEP activation barriers from the conformational features of the KPC-2/IPM active site. The barrier-predicting model was then unboxed using the Shapley additive explanation (SHAP) importance attribution methods to derive mechanistic insights, which were also verified by additional QM/MM wavefunction analysis. Essentially, we show that potential hydrogen bonding interactions of the general base and the tautomerization states of the carbapenem pyrroline ring could concertedly regulate the activation barrier of KPC-2/IPM deacylation. Overall, we demonstrate the efficacy of interpretable ML in assisting the analysis of QM/MM simulation data for robust extraction of human-interpretable mechanistic insights.
Affiliation(s)
- Chao Yin
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA
- Zilin Song
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA
- Timothy Palzkill
  - Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, 77030, USA
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA
4
Tian H, Ketkar R, Tao P. ADMETboost: a web server for accurate ADMET prediction. J Mol Model 2022; 28:408. [PMID: 36454321 PMCID: PMC9903341 DOI: 10.1007/s00894-022-05373-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/13/2022] [Accepted: 10/31/2022] [Indexed: 12/03/2022]
Abstract
The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. Across 22 tasks, our model is ranked first in 18 tasks and in the top 3 in 21 tasks. The trained machine learning models are integrated in ADMETboost, a web server that is publicly available at https://ai-druglab.smu.edu/admet.
Affiliation(s)
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, 75205, TX, USA
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, 75205, TX, USA
5
Duan C, Nandy A, Adamji H, Roman-Leshkov Y, Kulik HJ. Machine Learning Models Predict Calculation Outcomes with the Transferability Necessary for Computational Catalysis. J Chem Theory Comput 2022; 18:4282-4292. [PMID: 35737587 DOI: 10.1021/acs.jctc.2c00331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied by a high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimizations on the fly, and exploit its good performance and transferability in identifying geometry optimization failures for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for the conversion of methane to methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability as arising from the use of electronic structure and geometric information generated on the fly from density functional theory calculations and from the convolutional layer in the dynamic classifier. When used in combination with uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.
Affiliation(s)
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Husain Adamji
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yuriy Roman-Leshkov
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
6
Song Z, Trozzi F, Tian H, Yin C, Tao P. Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways. ACS Phys Chem Au 2022; 2:316-330. [PMID: 35936506 PMCID: PMC9344433 DOI: 10.1021/acsphyschemau.2c00005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
With the increasing popularity of machine learning (ML) applications, the demand for explainable artificial intelligence techniques to explain ML models developed for computational chemistry has also emerged. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for effective explanation of mechanistic insights into ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase and two antibiotics (ampicillin and cefalexin) as the model systems, we show that the BCIG approach could quantitatively attribute the energetic contribution in one system and the relative reactivity of individual steps across different systems to specific chemical processes such as the bond making/breaking and proton transfers. The proposed BCIG contribution attribution method quantifies chemistry-interpretable insights in terms of contributions from each elementary chemical process, which is in agreement with the validating QM/MM calculations and our intuitive mechanistic understandings of the model reactions.
7
Tian H, Jiang X, Trozzi F, Xiao S, Larson EC, Tao P. Explore Protein Conformational Space With Variational Autoencoder. Front Mol Biosci 2021; 8:781635. [PMID: 34869602 PMCID: PMC8633506 DOI: 10.3389/fmolb.2021.781635] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/23/2021] [Accepted: 10/28/2021] [Indexed: 12/02/2022] Open
Abstract
Molecular dynamics (MD) simulations have been actively used in the study of protein structure and function. However, extensive sampling in the protein conformational space requires large computational resources and takes a prohibitive amount of time. In this study, we demonstrated that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. VAEs are shown to be superior to autoencoders (AEs) through a benchmark study, with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space in the VAE can be used to generate unsampled protein conformations. Additional simulations starting from these generated conformations accelerated the sampling process and explored hidden spaces in the conformational landscape.
Affiliation(s)
- Hao Tian
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Xi Jiang
  - Department of Statistical Science, Southern Methodist University, Dallas, TX, United States
- Francesco Trozzi
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Sian Xiao
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Eric C. Larson
  - Department of Computer Science, Southern Methodist University, Dallas, TX, United States
- Peng Tao
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
8
Song Z, Trozzi F, Palzkill T, Tao P. QM/MM modeling of class A β-lactamases reveals distinct acylation pathways for ampicillin and cefalexin. Org Biomol Chem 2021; 19:9182-9189. [PMID: 34647114 PMCID: PMC8613693 DOI: 10.1039/d1ob01593a] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
Efficient mechanism-based design of antibiotics that are not susceptible to β-lactamases is hindered by the lack of comprehensive knowledge on the energetic landscapes for the hydrolysis of various β-lactams. Herein, we adopted efficient quantum mechanics/molecular mechanics simulations to explore the acylation reaction catalyzed by CTX-M-44 (Toho-1) β-lactamase. We show that the catalytic pathways for β-lactam hydrolysis are correlated to substrate scaffolds: using Glu166 as the only general base for acylation is viable for ampicillin but prohibitive for cefalexin. The present computational workflow provides quantitative insights to facilitate the optimization of future β-lactam antibiotics.
Affiliation(s)
- Zilin Song
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, USA
- Francesco Trozzi
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, USA
- Timothy Palzkill
  - The Verna and Marrs McLean Department of Biochemistry and Molecular Biology and Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, USA
9
Lyu Y, Scrimin P. Mimicking Enzymes: The Quest for Powerful Catalysts from Simple Molecules to Nanozymes. ACS Catal 2021. [DOI: 10.1021/acscatal.1c01219] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/12/2022]
Affiliation(s)
- Yanchao Lyu
  - University of Padova, Department of Chemical Sciences, via Marzolo, 1, 35131 Padova, Italy
- Paolo Scrimin
  - University of Padova, Department of Chemical Sciences, via Marzolo, 1, 35131 Padova, Italy
10
Wu L, Qin L, Nie Y, Xu Y, Zhao YL. Computer-aided understanding and engineering of enzymatic selectivity. Biotechnol Adv 2021; 54:107793. [PMID: 34217814 DOI: 10.1016/j.biotechadv.2021.107793] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/03/2021] [Revised: 04/26/2021] [Accepted: 06/28/2021] [Indexed: 12/26/2022]
Abstract
Enzymes offering chemo-, regio-, and stereoselectivity enable the asymmetric synthesis of high-value chiral molecules. Unfortunately, naturally occurring enzymes are often inefficient or have undesired selectivity toward non-native substrates, which hinders the broadening of biocatalytic applications. To match the demands of specific selectivity in asymmetric synthesis, biochemists have implemented various computer-aided strategies for understanding and engineering enzymatic selectivity, diversifying the available repository of artificial enzymes. Here, given that the entire asymmetric catalytic cycle, involving precise interactions within the active pocket and substrate transport in the enzyme channel, could affect enzymatic efficiency and selectivity, we present a comprehensive overview of the computer-aided workflow for enzymatic selectivity. This review includes a mechanistic understanding of enzymatic selectivity based on quantum mechanical calculations, rational design of enzymatic selectivity guided by enzyme-substrate interactions, and enzymatic selectivity regulation via enzyme channel engineering. Finally, we discuss the computational paradigm for designing enzyme selectivity in silico to facilitate the advancement of asymmetric biosynthesis.
Affiliation(s)
- Lunjie Wu
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Lei Qin
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yao Nie
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China
- Yan Xu
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yi-Lei Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
11
Trozzi F, Wang X, Tao P. UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study. J Phys Chem B 2021; 125:5022-5034. [PMID: 33973773 PMCID: PMC8356557 DOI: 10.1021/acs.jpcb.1c02081] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 01/01/2023]
Abstract
Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free-energy landscapes. However, the inherently high dimensionality of a protein free-energy landscape poses a challenge to deciphering how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. The uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure independent components analysis (tICA) in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should accurately represent the data structure on the projected components. The comparison of the raw high-dimensional data with the projections obtained using different dimensionality reduction methods based on various metrics showed that UMAP has superior performance when compared with linear reduction methods (PCA and tICA) and has competitive performance and scalable computational cost.
Affiliation(s)
- Francesco Trozzi
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275, United States of America
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America