1
|
Kumar S, Duggineni VK, Singhania V, Misra SP, Deshpande PA. Unravelling and Quantifying the Biophysical–Biochemical Descriptors Governing Protein Thermostability by Machine Learning. ADVANCED THEORY AND SIMULATIONS 2023. [DOI: 10.1002/adts.202200703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Affiliation(s)
- Shashi Kumar
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vinay Kumar Duggineni
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vibhuti Singhania
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Swayam Prabha Misra
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Parag A. Deshpande
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
|
2
|
Evaluation of the Effectiveness of Derived Features of AlphaFold2 on Single-Sequence Protein Binding Site Prediction. BIOLOGY 2022; 11:biology11101454. [PMID: 36290358 PMCID: PMC9598995 DOI: 10.3390/biology11101454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Simple Summary: With the development of artificial intelligence, researchers can roughly predict the crystal structure of a protein by computer without the need for biological experiments, which provides new ideas and solutions to problems such as protein-protein interaction and drug-target predictions. In this study, we proposed strategies to combine predicted protein structures with deep learning networks and evaluated them on different protein binding site prediction tasks. Our computational experiment results showed that all proposed strategies could effectively encode structural information for deep learning models.
Abstract: Though AlphaFold2 has attained considerably high precision on protein structure prediction, it is reported that directly inputting coordinates into deep learning networks cannot achieve desirable results on downstream tasks. Thus, how to process and encode the predicted results into effective forms that deep learning models can understand to improve the performance of downstream tasks is worth exploring. In this study, we tested the effects of five processing strategies of coordinates on two single-sequence protein binding site prediction tasks. These five strategies are spatial filtering, the singular value decomposition of a distance map, calculating the secondary structure feature, and the relative accessible surface area feature of proteins. The computational experiment results showed that all strategies were suitable and effective methods to encode structural information for deep learning models. In addition, by performing a case study of a mutated protein, we showed that the spatial filtering strategy could introduce structural changes into HHblits profiles and deep learning networks when protein mutation happens. In sum, this work provides new insight into the downstream tasks of protein-molecule interaction prediction, such as predicting the binding residues of proteins and estimating the effects of mutations.
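One of the strategies named in this abstract, the singular value decomposition of a distance map, can be illustrated with a minimal sketch. The function names, the number of retained components `k`, the toy coordinates, and the choice to scale the left singular vectors by their singular values are all assumptions made for illustration; the abstract does not describe the authors' actual encoding.

```python
import numpy as np

def distance_map(coords):
    """Pairwise Euclidean distances between residues (one 3D point per residue)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def svd_features(coords, k=3):
    """Per-residue features: top-k left singular vectors scaled by singular values.

    Hypothetical encoding for illustration only.
    """
    d = distance_map(coords)
    u, s, _ = np.linalg.svd(d)
    return u[:, :k] * s[:k]

# Toy example: four points on a square with 3.8 Å sides (made-up coordinates).
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [3.8, 3.8, 0.0],
                   [0.0, 3.8, 0.0]])
feats = svd_features(coords, k=2)
print(feats.shape)
```

Each row of `feats` is a fixed-length vector for one residue, which is the kind of input a sequence model can consume alongside profile features.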
|
3
|
Tverdislov VA, Sidorova AE, Bagrova OE, Belova EV, Bystrov VS, Levashova NT, Lutsenko AO, Semenova EV, Shpigun DK. Chirality As a Symmetric Basis of Self-Organization of Biomacromolecules. Biophysics (Nagoya-shi) 2022. [DOI: 10.1134/s0006350922050190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
4
|
Yang B, Bao W, Wang J. Active disease-related compound identification based on capsule network. Brief Bioinform 2022; 23:bbab462. [PMID: 35057581 PMCID: PMC8690041 DOI: 10.1093/bib/bbab462] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/09/2021] [Revised: 09/30/2021] [Accepted: 10/07/2021] [Indexed: 01/03/2023]
Abstract
Pneumonia, especially coronavirus disease 2019 (COVID-19), can lead to serious acute lung injury, acute respiratory distress syndrome, multiple organ failure and even death. Developing high-efficiency, low-toxicity, targeted drugs based on the pathogenesis of coronavirus is therefore an urgent task. In this paper, a novel disease-related compound identification model based on a capsule network (CapsNet) is proposed. Using pneumonia-related keywords, the prescriptions and active components related to the pharmacological mechanism of the disease are collected and extracted to construct the training set. The features of each component are extracted as the input layer of the capsule network. CapsNet is trained and used to identify the pneumonia-related compounds in Qingre Jiedu injection. The experimental results show that CapsNet can identify disease-related compounds more accurately than SVM, RF, gcForest and forgeNet.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China 277160
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China 221018
- Jinglong Wang
- College of Food Science and Pharmaceutical Engineering, Zaozhuang University, Zaozhuang 277160, China
|
5
|
Wang L, Miao X, Nie R, Zhang Z, Zhang J, Cai J. MultiCapsNet: A General Framework for Data Integration and Interpretable Classification. Front Genet 2021; 12:767602. [PMID: 34899854 PMCID: PMC8652257 DOI: 10.3389/fgene.2021.767602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Accepted: 10/25/2021] [Indexed: 12/16/2022]
Abstract
Recent progress in experimental biology has generated large amounts of data in different formats and lengths. Deep learning is an ideal tool for dealing with complex datasets, but its inherent “black box” nature calls for more interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, can only deal with numerical features rather than the modular features often encountered in the biological field. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which offers easy data integration and high model interpretability. To demonstrate the ability of this model as an interpretable classifier for modular inputs, we test MultiCapsNet on three datasets with different data types and application scenarios. Firstly, on the labeled variant call dataset, MultiCapsNet shows classification performance similar to that of a neural network model, and provides importance scores for data sources directly, without the extra importance determination step required by the neural network model. The importance scores generated by these two models are highly correlated. Secondly, on a single cell RNA sequencing (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interaction (PPI) and protein-DNA interaction (PDI). The classification accuracy of MultiCapsNet is comparable to that of the neural network and random forest models. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to the classification of cell type. Thirdly, we made a comparison between MultiCapsNet and SCENIC. The results show several cell-type-relevant TFs identified by both methods, further supporting the validity and interpretability of MultiCapsNet.
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China; China National Center for Bioinformation, Beijing, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xuexia Miao
- China National Center for Bioinformation, Beijing, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Rui Nie
- China National Center for Bioinformation, Beijing, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Zhang Zhang
- School of Systems Science, Beijing Normal University, Beijing, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, China
- Jun Cai
- China National Center for Bioinformation, Beijing, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
|
6
|
Sidorova A, Bystrov V, Lutsenko A, Shpigun D, Belova E, Likhachev I. Quantitative Assessment of Chirality of Protein Secondary Structures and Phenylalanine Peptide Nanotubes. NANOMATERIALS 2021; 11:nano11123299. [PMID: 34947648 PMCID: PMC8707344 DOI: 10.3390/nano11123299] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 01/25/2023]
Abstract
In this study we consider the features of spatial-structure formation in proteins and their application in bioengineering. Methods for the quantitative assessment of the chirality of regular helical and irregular structures of proteins are presented. The features of self-assembly of phenylalanine (F) into peptide nanotubes (PNT), which form helices of different chirality, are also analyzed. A method is proposed for calculating the magnitude and sign of the chirality of helix-like peptide nanotubes using a sequence of vectors for the dipole moments of individual peptides.
Affiliation(s)
- Alla Sidorova
- Faculty of Physics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Vladimir Bystrov
- Institute of Mathematical Problems of Biology, The Branch of Keldysh Institute of Applied Mathematics, RAS, 142290 Pushchino, Russia
- Aleksey Lutsenko
- Faculty of Physics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Denis Shpigun
- Faculty of Physics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Ekaterina Belova
- Faculty of Physics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Ilya Likhachev
- Institute of Mathematical Problems of Biology, The Branch of Keldysh Institute of Applied Mathematics, RAS, 142290 Pushchino, Russia
|
7
|
Wang L, Cao C, Zuo S. Protein secondary structure assignment using pc-polyline and convolutional neural network. Proteins 2021; 89:1017-1029. [PMID: 33780034 DOI: 10.1002/prot.26079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2020] [Revised: 02/22/2021] [Accepted: 03/15/2021] [Indexed: 01/17/2023]
Abstract
MOTIVATION: The assignment of protein secondary structure elements (SSEs) underpins structural analysis and prediction. The backbone of a protein could be adequately represented using a pc-polyline that passes through the centers of its peptide planes. One salient feature of pc-polyline representation is that the secondary structure of a protein becomes recognizable in a matrix whose elements are the pairwise distances between two peptide plane centers. Thus, a pc-polyline could in turn be used to assign SSEs.
RESULTS: Using convolutional neural network (CNN) here we confirm that a pc-polyline indeed contains enough information for it to be used for the accurate assignments of the six SSE types: α-helix, β-sheet, β-bulge, 3₁₀-helix, turn and loop. The applications to three large data sets show that the assignments by our CNN-based p2psse program agree very well with those by dssp, stride and quite well with those by five other programs. The analyses of their SSE assignments raise some general questions about the characterizations of protein secondary structure. In particular the analyses illustrate the difficulty with giving a quantitative and consistent definition for each of the six SSE types especially for 3₁₀-helix, β-bulge, turn or loop in terms of either backbone H-bond patterns, or backbone dihedral angles, or Cα-polyline or pc-polyline. The difficulty suggests that the SSE space though being dominated by the regions for the six SSE types is to a certain degree continuous.
AVAILABILITY: The program is available at https://github.com/wlincong/p2pSSE.
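The distance matrix described in this abstract can be sketched as follows. The abstract does not define how a peptide plane center is computed, so this sketch approximates it as the midpoint of consecutive Cα atoms; that approximation, the function names, and the toy coordinates are assumptions and may differ from what p2pSSE does.

```python
import numpy as np

def peptide_plane_centers(ca):
    """Approximate peptide-plane centers as midpoints of consecutive Cα atoms.

    Assumption for illustration; the paper may use a different definition.
    """
    ca = np.asarray(ca, dtype=float)
    return 0.5 * (ca[:-1] + ca[1:])

def pc_distance_matrix(ca):
    """Matrix of pairwise distances between the approximate plane centers."""
    pc = peptide_plane_centers(ca)
    diff = pc[:, None, :] - pc[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy example: four Cα atoms on a straight line, 3.8 Å apart (made-up data).
ca = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
m = pc_distance_matrix(ca)
print(m.shape)
```

A chain of N residues yields N-1 centers, so the matrix is (N-1) by (N-1); a matrix of this kind is what a CNN could take as an image-like input.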
Affiliation(s)
- Lincong Wang
- The College of Computer Science and Technology, Jilin University, Changchun, China
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Shuxue Zuo
- The College of Computer Science and Technology, Jilin University, Changchun, China
|
8
|
Liu Z, Gong Y, Guo Y, Zhang X, Lu C, Zhang L, Wang H. TMP-SSurface2: A Novel Deep Learning-Based Surface Accessibility Predictor for Transmembrane Protein Sequence. Front Genet 2021; 12:656140. [PMID: 33790952 PMCID: PMC8006303 DOI: 10.3389/fgene.2021.656140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/20/2021] [Accepted: 02/22/2021] [Indexed: 12/13/2022]
Abstract
Transmembrane proteins (TMPs) are an important class of membrane proteins involved in many membrane-related biological processes. Because TMPs are major drug targets, their surfaces are of great interest as the structural basis for binding drugs and other biological molecules. However, the number of determined TMP structures remains far below what is needed, while artificial intelligence technologies provide a promising approach to accurately identify TMP surfaces from their sequences alone, without any feature engineering. For this purpose, we present an updated TMP surface residue predictor, TMP-SSurface2, which achieves higher prediction accuracy than our previous version. The method uses an attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) network. Benefiting from its efficient learning capability, the network abstracts useful latent information from protein sequences, improving the Pearson correlation coefficient (CC) from 0.58 for the old version to 0.66 on an independent test dataset. The results demonstrate that TMP-SSurface2 is efficient in predicting the surface of transmembrane proteins, representing new progress in transmembrane protein structure modeling based on primary sequences. TMP-SSurface2 is freely accessible at https://github.com/NENUBioCompute/TMP-SSurface-2.0.
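The Pearson correlation coefficient quoted in this abstract is a standard statistic, and a minimal sketch of how it is computed between predicted and reference per-residue values follows. The function name and the toy numbers are assumptions for illustration and are not taken from the TMP-SSurface2 code or data.

```python
import numpy as np

def pearson_cc(pred, true):
    """Pearson correlation coefficient between two equal-length value series."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    dp = pred - pred.mean()   # deviations from the mean prediction
    dt = true - true.mean()   # deviations from the mean reference value
    return float((dp * dt).sum() / np.sqrt((dp ** 2).sum() * (dt ** 2).sum()))

# Made-up per-residue relative accessibility values.
predicted = [0.10, 0.40, 0.35, 0.80, 0.55]
observed = [0.05, 0.50, 0.30, 0.90, 0.40]
cc = pearson_cc(predicted, observed)
print(round(cc, 3))
```

The statistic ranges from -1 to 1, so a change from 0.58 to 0.66 means the predicted values track the reference values more closely in a linear sense.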
Affiliation(s)
- Zhe Liu
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, China; School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China; Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yingli Gong
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yuanzhao Guo
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Xiao Zhang
- College of Computing and Software Engineering, Kennesaw State University, Kennesaw, GA, United States
- Chang Lu
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Li Zhang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
|
9
|
Liu Z, Gong Y, Bao Y, Guo Y, Wang H, Lin GN. TMPSS: A Deep Learning-Based Predictor for Secondary Structure and Topology Structure Prediction of Alpha-Helical Transmembrane Proteins. Front Bioeng Biotechnol 2021; 8:629937. [PMID: 33569377 PMCID: PMC7869861 DOI: 10.3389/fbioe.2020.629937] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Accepted: 12/10/2020] [Indexed: 11/13/2022]
Abstract
Alpha transmembrane proteins (αTMPs) profoundly affect many critical biological processes and are major drug targets due to their pivotal protein functions. At present, even though the non-transmembrane secondary structures are highly relevant to the biological functions of αTMPs along with their transmembrane structures, they have not been unified to be studied yet. In this study, we present a novel computational method, TMPSS, to predict the secondary structures in non-transmembrane parts and the topology structures in transmembrane parts of αTMPs. TMPSS applied a Convolutional Neural Network (CNN), combined with an attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) network, to extract the local contexts and long-distance interdependencies from primary sequences. In addition, a multi-task learning strategy was used to predict the secondary structures and the transmembrane helixes. TMPSS was thoroughly trained and tested against a non-redundant independent dataset, where the Q3 secondary structure prediction accuracy achieved 78% in the non-transmembrane region, and the accuracy of the transmembrane region prediction achieved 90%. In sum, our method showcased a unified model for predicting the secondary structure and topology structure of αTMPs by only utilizing features generated from primary sequences and provided a steady and fast prediction, which promisingly improves the structural studies on αTMPs.
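The Q3 figure reported in this abstract is the fraction of residues whose predicted three-state label (commonly helix H, strand E, coil C) matches the reference label. A minimal sketch of that calculation follows; the function name and the example strings are assumptions for illustration, not the TMPSS evaluation code.

```python
def q3_accuracy(pred, true):
    """Fraction of positions where the predicted 3-state label equals the reference."""
    if len(pred) != len(true):
        raise ValueError("prediction and reference must have the same length")
    if len(true) == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(p == t for p, t in zip(pred, true))
    return matches / len(true)

# Made-up eight-residue example: one mismatch at the sixth position.
predicted = "HHHEECCC"
reference = "HHHEEECC"
print(q3_accuracy(predicted, reference))  # 0.875
```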
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
- Yingli Gong
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yihang Bao
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Yuanzhao Guo
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
|
10
|
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023]
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi
- School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning especially deep learning, protein data analysis, and big data mining
- Weiya Chen
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization
- Siqi Huang
- Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining
- Yan Wang
- School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing
|
11
|
Fang C, Li Z, Xu D, Shang Y. MUFold-SSW: a new web server for predicting protein secondary structures, torsion angles and turns. Bioinformatics 2019; 36:1293-1295. [PMID: 31532508 PMCID: PMC8489430 DOI: 10.1093/bioinformatics/btz712] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/11/2019] [Revised: 09/11/2019] [Accepted: 09/13/2019] [Indexed: 01/31/2023]
Abstract
MOTIVATION: Protein secondary structure and backbone torsion angle prediction can provide important information for predicting protein 3D structures and protein functions. Our new methods MUFold-SS, MUFold-Angle, MUFold-BetaTurn and MUFold-GammaTurn, developed based on advanced deep neural networks, achieved state-of-the-art performance for predicting secondary structures, backbone torsion angles, beta-turns and gamma-turns, respectively. An easy-to-use web service will provide the community a convenient way to use these methods for research and development.
RESULTS: MUFold-SSW, a new web server, is presented. It provides predictions of protein secondary structures, torsion angles, beta-turns and gamma-turns for a given protein sequence. This server implements MUFold-SS, MUFold-Angle, MUFold-BetaTurn and MUFold-GammaTurn, which performed well for both easy targets (proteins with weak sequence similarity in PDB) and hard targets (proteins without detectable similarity in PDB) in various experimental tests, achieving results better than or comparable with those of existing methods.
AVAILABILITY AND IMPLEMENTATION: MUFold-SSW is accessible at http://mufold.org/mufold-ss-angle.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
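The backbone torsion angles mentioned in this abstract (φ and ψ) are dihedral angles defined by four consecutive backbone atoms: C, N, Cα, C for φ and N, Cα, C, N for ψ. The sketch below shows the generic geometric calculation of a dihedral angle from four points; it illustrates the quantity being predicted and is not part of the MUFold methods. The function name and test coordinates are assumptions.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Made-up points: the first and last atoms lie on the same side of the central bond.
cis = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1])
quarter = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
print(cis, quarter)
```

Because `arctan2` is used, the result is a signed angle in the range -180 to 180 degrees, which is the usual range for reporting φ and ψ.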
Affiliation(s)
- Dong Xu
- To whom correspondence should be addressed.
- Yi Shang
- To whom correspondence should be addressed.
|