1
Pražnikar J. Using graphlet degree vectors to predict atomic displacement parameters in protein structures. Acta Crystallogr D Struct Biol 2023; 79:1109-1119. [PMID: 37987168 PMCID: PMC10833351 DOI: 10.1107/s2059798323009142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 10/17/2023] [Indexed: 11/22/2023]
Abstract
In structural biology, atomic displacement parameters, commonly used in the form of B values, describe uncertainties in atomic positions. Their distribution over the structure can provide hints on local structural reliability and mobility. A spatial macromolecular model can be represented by a graph whose nodes are atoms and whose edges correspond to all interatomic contacts within a certain distance. Small connected subgraphs, called graphlets, provide information about the wiring of a particular atom. The multiple linear regression approach based on this information aims to predict a distribution of values of isotropic atomic displacement parameters (B values) within a protein structure, given the atomic coordinates and molecular packing. By modeling the dynamic component of atomic uncertainties, this method allows the B values obtained from experimental crystallographic or cryo-electron microscopy studies to be reproduced relatively well.
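The pipeline this abstract describes has three steps: build a contact graph over atoms, compute per-atom graphlet statistics, then fit a multiple linear regression onto B values. It can be sketched in plain Python as below. This is an illustrative simplification, not Pražnikar's implementation. It counts only the four orbits of the 2- and 3-node graphlets, whereas a full graphlet degree vector also covers larger graphlets. It also ignores crystal packing. The function names and the contact cutoff are assumptions chosen for the example.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff):
    """Adjacency sets: atoms i and j are linked if their distance is <= cutoff (in Å)."""
    adj = [set() for _ in coords]
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def small_orbit_counts(adj):
    """Per-atom counts of orbits 0-3 (2- and 3-node graphlets):
    [degree, end of an induced 3-path, middle of an induced 3-path, triangle]."""
    feats = []
    for nbrs in adj:
        deg = len(nbrs)
        tri = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        mid = deg * (deg - 1) // 2 - tri                    # open wedges centred here
        end = sum(len(adj[w]) - 1 for w in nbrs) - 2 * tri  # open wedges ending here
        feats.append([deg, end, mid, tri])
    return feats


def fit_linear(X, y):
    """Ordinary least squares with an intercept, solving (X^T X) b = X^T y."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("singular normal equations: too few atoms or collinear features")
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                for k in range(c, p):
                    A[r][k] -= f * A[c][k]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted B value: intercept plus the weighted feature sum."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
```

On a real structure, one would pass per-atom coordinates and a cutoff (an assumed value of a few ångströms) to `contact_graph`. Then call `fit_linear(small_orbit_counts(adj), b_values)` with the deposited B values, and apply `predict` to the same features. Solving the normal equations directly keeps the sketch dependency-free. A QR- or SVD-based solver is numerically safer for larger feature sets.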
Affiliation(s)
- Jure Pražnikar
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, Koper, Slovenia
- Department of Biochemistry, Molecular and Structural Biology, Institute Jožef Stefan, Jamova 39, Ljubljana, Slovenia
2
Pandey A, Liu E, Graham J, Chen W, Keten S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns (N Y) 2023; 4:100805. [PMID: 37720331 PMCID: PMC10499862 DOI: 10.1016/j.patter.2023.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 09/19/2023]
Abstract
B factors provide critical insight into protein dynamics. Predicting the B factor of an atom in a new protein remains challenging because it is affected by the atom's neighbors in Euclidean space. Previously developed learning methods have yielded low Pearson correlation coefficients beyond the training set because of their limited ability to capture the effect of neighboring atoms. Building on advances in deep learning, we develop a sequence-based model that is tested on 2,442 proteins and outperforms state-of-the-art models by 30%. We find that the model learns that the B factor of a site is prominently affected by atoms within a 12-15 Å radius, which is in excellent agreement with cutoffs from protein network models. An ablation study revealed that the B factor can largely be predicted from the primary sequence alone. On this basis, our model lays a foundation for predicting other properties that are correlated with the B factor.
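The Pearson correlation coefficient (PCC) mentioned here is the usual agreement measure between predicted and experimental B factors in this literature. Below is a minimal stdlib sketch. It assumes both inputs are per-atom (or per-residue) value lists in the same order. The function name is illustrative.

```python
import math


def pearson(pred, obs):
    """Pearson correlation coefficient between predicted and observed B factors."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` returns 0.8. The PCC is unchanged by a positive rescaling or shift of either input. It therefore rewards getting the relative flexibility profile right even when predicted B values sit on a different absolute scale from the experimental ones.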
Affiliation(s)
- Akash Pandey
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Elaine Liu
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Jacob Graham
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Sinan Keten
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA
3
Chauhan VM, Pantazes RJ. Analysis of conformational stability of interacting residues in protein binding interfaces. Protein Eng Des Sel 2023; 36:gzad016. [PMID: 37889566 PMCID: PMC10681001 DOI: 10.1093/protein/gzad016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 10/17/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023]
Abstract
After approximately 60 years of work, the protein folding problem has recently seen rapid advancement thanks to the invention of AlphaFold and RoseTTAFold, machine-learning algorithms capable of reliably predicting protein structures from their sequences. A key component of their success was the inclusion of pairwise interaction information between residues. As research focus shifts toward developing algorithms to design and engineer binding proteins, knowledge of interaction features at protein interfaces is likely to improve predictions. Here, 574 protein complexes were analyzed to identify the stability features of their pairwise interactions, revealing that interactions between pre-stabilized residues are a selected feature of protein binding interfaces. In a retrospective analysis of 475 de novo designed binding proteins with an experimental success rate of 19%, including pairwise interaction pre-stabilization parameters increased the frequency of identifying experimentally successful binders to 40%.
Affiliation(s)
- Varun M Chauhan
- Department of Chemical Engineering, Auburn University, Auburn, AL 36849, USA
- Robert J Pantazes
- Department of Chemical Engineering, Auburn University, Auburn, AL 36849, USA
4
Xia K, Liu X, Wee J. Persistent Homology for RNA Data Analysis. Methods Mol Biol 2023; 2627:211-229. [PMID: 36959450 DOI: 10.1007/978-1-0716-2974-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Molecular representations are of great importance for machine learning models in RNA data analysis. Efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for representing RNA structures and interactions, along with their applications in RNA data analysis. Unlike traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher dimensions. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models have been proposed for molecular representation. These models combine a filtration process with spectral models, including spectral graphs, spectral simplicial complexes and spectral hypergraphs. The persistent attributes of RNAs obtained from these two persistent models can be further combined with machine learning models for the analysis of RNA structure, flexibility, dynamics and function.
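The phrase "a generalization of graph models to higher dimensions" can be made concrete with a small example. The sketch below builds a Vietoris-Rips complex from atom (or nucleotide) coordinates. Besides vertices and edges, it records triangles, which a contact graph alone cannot represent. The radius, the dimension cap and the function names are illustrative assumptions. The chapter's own models are not reproduced here.

```python
import math
from itertools import combinations


def rips_complex(coords, radius, max_dim=2):
    """Vietoris-Rips complex up to max_dim: points span a simplex when every pair
    is within `radius`, i.e. the clique complex of the contact graph."""
    n = len(coords)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(coords[i], coords[j]) <= radius}
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        # a k-simplex has k + 1 vertices, all pairwise close
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(pair in close for pair in combinations(s, 2))]
    return simplices


def euler_characteristic(simplices):
    """Alternating count of simplices: vertices - edges + triangles - ..."""
    return sum((-1) ** k * len(s) for k, s in simplices.items())
```

Persistent homology summarizes a filtration. One sweeps `radius` upward and tracks when connected components and loops appear and disappear. This snippet does not do that. It builds a single complex and reports only its Euler characteristic.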
Affiliation(s)
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
5
Lin Y, Zhang Y, Wang D, Yang B, Shen YQ. Computer especially AI-assisted drug virtual screening and design in traditional Chinese medicine. Phytomedicine 2022; 107:154481. [PMID: 36215788 DOI: 10.1016/j.phymed.2022.154481] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/21/2022] [Revised: 09/14/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Traditional Chinese medicine (TCM) is a significant part of global pharmaceutical science, and its abundant molecular compounds are a valuable potential source for designing and screening new drugs. However, the number of natural compounds is unknown, and drug discovery raises diverse related problems, such as precise compound screening and the evaluation of efficacy, physicochemical properties and pharmacokinetics. As a result, it is arduous for researchers to design or screen applicable compounds with conventional methods. With the recent rapid development of computer technology, especially artificial intelligence (AI), innovation in virtual screening has increased the efficiency and accuracy of new drug discovery. PURPOSE This study systematically reviewed the application of computational approaches and AI to drug virtual screening and design in TCM, and presented perspectives on computer-aided TCM development. STUDY DESIGN We performed a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and then selected the most typical articles for our research. METHODS The systematic review was performed following the PRISMA guidelines. The PubMed, EMBASE, Web of Science and CNKI databases were searched for publications focused on computer-aided drug virtual screening and design in TCM. RESULT In total, 42 articles were included in the review. These studies were of great significance to the treatment and cost control of many challenging diseases, such as COVID-19, diabetes and Alzheimer's disease (AD). Computational approaches and AI were widely used for virtual screening in TCM development, including structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS). Computational technologies were also extensively applied to absorption, distribution, metabolism, excretion and toxicity (ADMET) prediction of candidate drugs and to new drug design, both crucial steps in drug discovery. CONCLUSIONS Computers and AI play an important role in drug virtual screening and design in the field of TCM and have broad application prospects.
Affiliation(s)
- Yumeng Lin
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- You Zhang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Dongyang Wang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Bowen Yang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Ying-Qiang Shen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Chinese Academy of Medical Sciences Research Unit of Oral Carcinogenesis and Management, West China Hospital of Stomatology, Sichuan University, Chengdu, China
6
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important tool in computational biology for elucidating protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are:
- protein structure prediction
- protein engineering using sequence modifications to achieve stability and druggability characteristics
- molecular docking in terms of protein–ligand binding, including allosteric effects
- protein–protein interactions
- protein-centric drug discovery

Quantifying the mechanisms underlying protein function requires a holistic approach that takes structure, flexibility, stability and dynamics into account, because these aspects are interdependent and cannot be separated. Another key component of protein function is conformational dynamics, which often manifests as protein kinetics. This review also covers computational methods that use ML to generate representative conformational ensembles and to quantify functionally important differences between ensembles. Future opportunities are highlighted for each of these topics.
7
Crystal structure of potato 14-3-3 protein St14f revealed the importance of helix I in StFDL1 recognition. Sci Rep 2022; 12:11596. [PMID: 35804047 PMCID: PMC9270373 DOI: 10.1038/s41598-022-15505-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Accepted: 06/24/2022] [Indexed: 11/08/2022]
Abstract
In potato (Solanum tuberosum L.), the 14-3-3 protein forms a complex with the FLOWERING LOCUS T (FT)-like protein StSP6A and the FD-like protein StFDL1 to activate potato tuber formation. Eleven 14-3-3 isoforms, designated St14a-k, have been reported in potato. In this study, the crystal structure of the free form of St14f was determined at 2.5 Å resolution. The asymmetric unit of the St14f free-form crystal contained three chains, and structural deviations among them were found in the C-terminal helices H and I. The structure of free St14f in solution was also investigated by nuclear magnetic resonance (NMR) residual dipolar coupling analysis, and chain B of the crystal structure was consistent with the NMR data. Compared with other crystal structures, St14f helix I exhibited a different conformation with larger B-factor values. Larger B-factor values on helix I were also found in the 14-3-3 free-form structure with higher solvent content. A mutation in St14f helix I stabilized the complex with StFDL1. These data clearly show that the flexibility of helix I of the 14-3-3 protein plays an important role in target protein recognition.
8
Roethel A, Biliński P, Ishikawa T. BioS2Net: Holistic Structural and Sequential Analysis of Biomolecules Using a Deep Neural Network. Int J Mol Sci 2022; 23:ijms23062966. [PMID: 35328384 PMCID: PMC8954277 DOI: 10.3390/ijms23062966] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Revised: 03/05/2022] [Accepted: 03/08/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND For decades, the rate at which new biomolecular structures are solved has exceeded the rate at which they can be efficiently classified and characterised by hand. Therefore, a new comprehensive and holistic tool for their examination is needed. METHODS Here we propose the Biological Sequence and Structure Network (BioS2Net), a novel deep neural network architecture that extracts both sequential and structural information from biomolecules. Our architecture consists of four main parts: (i) a sequence convolutional extractor, (ii) a 3D structure extractor, (iii) a 3D structure-aware sequence temporal network and (iv) a fusion and classification network. RESULTS We evaluated our approach on two protein fold classification datasets. BioS2Net achieved a 95.4% mean class accuracy on the eDD dataset and a 76% mean class accuracy on the F184 dataset. The accuracy of BioS2Net on the eDD dataset was comparable to results achieved by previously published methods, confirming that the algorithm described in this article is a top-class solution for protein fold recognition. CONCLUSIONS BioS2Net is a novel tool for the holistic examination of biomolecules of known structure and sequence. It is a reliable tool for protein analysis and for the unified representation of proteins as feature vectors.
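The results above are reported as mean class accuracy. The sketch below assumes the usual definition: accuracy is computed within each fold class and then averaged with equal weight per class, rather than pooled over all proteins. Whether BioS2Net's evaluation uses exactly this definition is an assumption. The function name is illustrative.

```python
from collections import Counter


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (per-class recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

Consider a classifier that labels every protein as fold "a" when the true labels are `["a", "a", "a", "b"]`. Its pooled accuracy is 0.75, but its mean class accuracy is 0.5. This is why the metric is preferred when some fold classes are rare.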
Affiliation(s)
- Albert Roethel
- Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, 02-097 Warsaw, Poland
- Piotr Biliński
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
- Takao Ishikawa
- Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
- Correspondence: Tel. +48-22-5543111
9
Carugo O. Uses and Abuses of the Atomic Displacement Parameters in Structural Biology. Methods Mol Biol 2022; 2449:281-298. [PMID: 35507268 DOI: 10.1007/978-1-0716-2095-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
B-factors determined by X-ray crystallographic analysis are commonly used to estimate the degree of flexibility of atoms, residues and molecular moieties in biological macromolecules. In this chapter, the most recent studies and applications of B-factors in protein engineering and structural biology are briefly summarized. Particular emphasis is given to the limitations of B-factors, in order to prevent inappropriate applications. Finally, it is predicted that future applications will involve anisotropically refined B-factors, deep learning and data produced by cryo-EM.
10
Wei H, Wang B, Yang J, Gao J. RNA Flexibility Prediction With Sequence Profile and Predicted Solvent Accessibility. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2017-2022. [PMID: 31794403 DOI: 10.1109/tcbb.2019.2956496] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Structural flexibility plays an essential role in many biological processes. The B-factor is an important indicator of the flexibility of protein or RNA structures. Many methods have been developed to predict protein B-factors, but few studies have addressed RNA B-factor prediction. In this paper, we propose a new method, RNAbval, that predicts RNA B-factors using a random forest. The method was developed using a comprehensive set of features, including the sequence profile and predicted solvent accessibility. RNAbval achieved an improvement of 9.2-20.5 percent over the state-of-the-art method on two benchmark test datasets. The proposed method is available at http://yanglab.nankai.edu.cn/RNAbval/.
11
Wang S, Gong W, Deng X, Liu Y, Li C. Exploring the dynamics of RNA molecules with multiscale Gaussian network model. Chem Phys 2020. [DOI: 10.1016/j.chemphys.2020.110820] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
12
Wang R, Nguyen DD, Wei GW. Persistent spectral graph. Int J Numer Method Biomed Eng 2020; 36:e3376. [PMID: 32515170 PMCID: PMC7719081 DOI: 10.1002/cnm.3376] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 03/29/2020] [Revised: 05/15/2020] [Accepted: 05/31/2020] [Indexed: 05/25/2023]
Abstract
Persistent homology is constrained to purely topological persistence, while multiscale graphs account only for geometric information. This work introduces persistent spectral theory to create a unified low-dimensional multiscale paradigm for revealing topological persistence and extracting geometric shapes from high-dimensional datasets. For a point-cloud dataset, a filtration procedure is used to generate a sequence of chain complexes and associated families of simplicial complexes and chains, from which we construct persistent combinatorial Laplacian matrices. We show that the full set of topological persistence can be completely recovered from the harmonic persistent spectra, that is, the zero eigenvalues, of the persistent combinatorial Laplacian matrices. In addition, the non-harmonic spectra of the Laplacian matrices induced by the filtration offer another powerful tool for data analysis, modeling and prediction. In this work, fullerene stability is predicted using both harmonic and non-harmonic persistent spectra. The non-harmonic spectra are also used to analyze the structure of fullerenes and to model protein flexibility, information that cannot be straightforwardly extracted with current persistent homology. The proposed method is found to provide excellent predictions of the protein B-factors for which current popular biophysical models break down.
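The claim that harmonic spectra recover topological persistence is easiest to see in dimension 0. There, the number of zero eigenvalues of the graph Laplacian L0 = D - A equals the number of connected components (Betti-0). The sketch below tracks that count along a distance filtration and rests on two simplifying assumptions. First, it uses the ordinary L0 at each filtration step, which as we understand it is the zero-persistence special case, rather than the full persistent Laplacian between two steps. Second, it obtains the zero-eigenvalue count as the exact nullity instead of computing eigenvalues, so the non-harmonic spectra discussed above are not covered. Function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations


def laplacian0(coords, radius):
    """0-th combinatorial (graph) Laplacian L0 = D - A for points within `radius`."""
    n = len(coords)
    L = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= radius:
            L[i][j] = L[j][i] = -1
            L[i][i] += 1
            L[j][j] += 1
    return L


def rank(M):
    """Exact rank by Gaussian elimination over the rationals (no rounding issues)."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            for k in range(c, len(A[0])):
                A[i][k] -= f * A[r][k]
        r += 1
    return r


def harmonic_counts(coords, radii):
    """Zero-eigenvalue count of L0 at each filtration radius. L0 is symmetric,
    so this equals its nullity n - rank(L0), which is Betti-0."""
    n = len(coords)
    return [n - rank(laplacian0(coords, r)) for r in radii]
```

Reading off non-harmonic features, such as the smallest nonzero eigenvalue at each radius, would need an eigen-solver, for example NumPy's `numpy.linalg.eigvalsh` applied to the same matrices.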
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
13
Pun CS, Yong BYS, Xia K. Weighted-persistent-homology-based machine learning for RNA flexibility analysis. PLoS One 2020; 15:e0237747. [PMID: 32822369 PMCID: PMC7446851 DOI: 10.1371/journal.pone.0237747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/03/2020] [Accepted: 08/01/2020] [Indexed: 12/22/2022]
Abstract
Because biomolecular flexibility is highly significant for biomolecular dynamics and functional analysis, various experimental and theoretical models have been developed. Experimentally, the Debye-Waller factor, also known as the B-factor, measures atomic mean-square displacement and is usually considered an important measure of flexibility. Theoretically, elastic network models, the Gaussian network model, the flexibility-rigidity model and other computational models have been proposed for flexibility analysis, shedding light on the inner topological structures of biomolecules. Recently, a topology-based machine learning model has been proposed that uses features from persistent homology and achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly proposed model that incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that it can achieve a PCC of up to 0.5822. Comparison with previous sequence-information-based learning models shows a consistent performance improvement of at least 10% for our current model.
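Of the theoretical models listed in this abstract, the flexibility-rigidity model is compact enough to sketch. The snippet below follows the general flexibility-rigidity index form, which has three parts:
- an atom's rigidity is a kernel-weighted sum over all other atoms;
- its flexibility is the reciprocal of that rigidity;
- B-factors are then fitted as an affine function of flexibility.

The exponential kernel, the default η and κ values, the unit weights and the function names are illustrative assumptions. This is not the WPHML model of this paper.

```python
import math


def fri_flexibility(coords, eta=3.0, kappa=1.0, weights=None):
    """Flexibility index f_i = 1 / mu_i, where the rigidity index is
    mu_i = sum_{j != i} w_j * exp(-(r_ij / eta) ** kappa)."""
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two atoms")
    w = weights if weights is not None else [1.0] * n
    flex = []
    for i in range(n):
        mu = sum(w[j] * math.exp(-(math.dist(coords[i], coords[j]) / eta) ** kappa)
                 for j in range(n) if j != i)
        flex.append(1.0 / mu)
    return flex


def fit_affine(f, b):
    """Least-squares scale a and offset c in B_i ~ a * f_i + c."""
    n = len(f)
    mf, mb = sum(f) / n, sum(b) / n
    sff = sum((x - mf) ** 2 for x in f)
    if sff == 0:
        raise ValueError("flexibility indices are all equal; cannot fit a scale")
    a = sum((x - mf) * (y - mb) for x, y in zip(f, b)) / sff
    return a, mb - a * mf
```

For a Cα trace, `fit_affine(fri_flexibility(coords), b_exp)` gives the scale and offset. The fitted values `a * f_i + c` can then be scored against experiment with the PCC. In a straight chain, end atoms have fewer close neighbors, so they come out more flexible than interior atoms.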
Affiliation(s)
- Chi Seng Pun
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Corresponding author
- Brandon Yung Sin Yong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Corresponding author
14
Bramer D, Wei GW. Atom-specific persistent homology and its application to protein flexibility analysis. Comput Math Biophys 2020; 8:1-35. [PMID: 34278230 PMCID: PMC8281920 DOI: 10.1515/cmb-2020-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/19/2023]
Abstract
Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic-level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural networks, for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate that the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
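The conjugated-set idea can be sketched with zero-dimensional features only. This is a hypothetical simplification for illustration that makes three assumptions:
- the conjugated pair is atom i's neighborhood within a cutoff, taken with and without atom i;
- each barcode is the H0 (connected-component) barcode of a distance filtration, read off from the edge lengths accepted by Kruskal's minimum-spanning-tree algorithm;
- the two barcodes are compared with the 1-Wasserstein distance.

The paper's actual construction, higher-dimensional features and parameters are not reproduced. The brute-force matching is only practical for a handful of bars.

```python
import math
from itertools import permutations


def h0_deaths(points):
    """Finite H0 bar lengths of a distance filtration (all births at 0):
    the edge lengths Kruskal's minimum-spanning-tree algorithm accepts."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, killing one bar
            parent[ri] = rj
            deaths.append(d)
    return deaths


def wasserstein1(a, b):
    """1-Wasserstein distance between H0 diagrams {(0, d)} under the L-inf ground
    metric; a point may instead be matched to the diagonal at cost d / 2.
    Brute force over all augmented matchings."""
    n, m = len(a), len(b)

    def cost(i, j):
        if i < n and j < m:
            return abs(a[i] - b[j])
        if i < n:
            return a[i] / 2
        if j < m:
            return b[j] / 2
        return 0.0  # diagonal slot matched to diagonal slot

    return min(sum(cost(i, j) for i, j in enumerate(p)) for p in permutations(range(n + m)))


def atom_specific_distance(coords, i, cutoff):
    """Compare H0 barcodes of atom i's neighborhood with and without atom i."""
    nbhd = [p for j, p in enumerate(coords) if j != i and math.dist(p, coords[i]) <= cutoff]
    return wasserstein1(h0_deaths(nbhd + [coords[i]]), h0_deaths(nbhd))
```

Replacing `sum` with `max` inside `wasserstein1` gives the bottleneck distance over the same augmented matchings. A practical version would use a polynomial-time matching algorithm instead of enumerating permutations.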
Affiliation(s)
- David Bramer
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Corresponding author