1. Wang L, Zhong H, Xue Z, Wang Y. Improving the topology prediction of α-helical transmembrane proteins with deep transfer learning. Comput Struct Biotechnol J 2022; 20:1993-2000. [PMID: 35521551 PMCID: PMC9062415 DOI: 10.1016/j.csbj.2022.04.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/16/2021] [Revised: 04/09/2022] [Accepted: 04/17/2022] [Indexed: 11/11/2022]
Abstract
Transmembrane proteins (TMPs) are essential for cell recognition and communication, and they serve as important drug targets in humans. The 3D structures of TMPs are critical for determining their functions and for drug design, but they are hard to determine even by experimental methods. Although some computational methods have been developed to predict transmembrane helices (TMHs) and their orientation, there is still room for improvement. Considering that a pre-trained language model can make full use of massive unlabeled protein sequences to obtain latent feature representations for TMPs and reduce the dependence on evolutionary information, we proposed DeepTMpred, which uses pre-trained self-supervised language models called ESM, convolutional neural networks, an attentive neural network, and conditional random fields for alpha-TMP topology prediction. Compared with the current state-of-the-art tools on a non-redundant dataset of TMPs, DeepTMpred demonstrated superior predictive performance on most evaluation metrics, especially at the TMH level. Furthermore, DeepTMpred can obtain reliable predictions within a few seconds for TMPs that have little evolutionary information. A tutorial on how to use DeepTMpred can be found in the Colab notebook (https://colab.research.google.com/github/ISYSLAB-HUST/DeepTMpred/blob/master/notebook/test.ipynb).
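To make the phrase "at the TMH level" concrete, the following is a minimal sketch of how a per-residue topology labeling could be turned into helix segments for segment-level comparison. It is not DeepTMpred's code: the function name `helix_segments` and the label alphabet (`M` for membrane helix, `i` for inside, `o` for outside) are assumptions made only for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def helix_segments(labels: str, helix_label: str = "M") -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each helix run.

    `labels` holds one character per residue; the alphabet used here
    ("M" = membrane helix, "i" = inside, "o" = outside) is an assumption.
    """
    segments = []
    position = 0  # number of residues consumed so far
    for label, run in groupby(labels):
        length = len(list(run))
        if label == helix_label:
            segments.append((position + 1, position + length))
        position += length
    return segments


print(helix_segments("iiMMMMooMMMii"))  # [(3, 6), (9, 11)]
```

Segments extracted this way can then be compared against annotated helices, for example by overlap, which is one common way to score predictions per helix rather than per residue.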
2. Wu H, Yang R, Fu Q, Chen J, Lu W, Li H. Research on predicting 2D-HP protein folding using reinforcement learning with full state space. BMC Bioinformatics 2019; 20:685. [PMID: 31874607 PMCID: PMC6929271 DOI: 10.1186/s12859-019-3259-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Background. Protein structure prediction has always been an important issue in bioinformatics. Prediction of the two-dimensional structure of proteins based on the hydrophobic-polarity (HP) model is a typical NP-hard (non-deterministic polynomial-time hard) problem. Currently reported optimization methods for the hydrophobic-polarity model, such as the greedy method, the brute-force method, and genetic algorithms, usually cannot converge robustly to the lowest-energy conformations. Reinforcement learning, with its advantages of continuous Markov optimal decision-making and maximization of the global cumulative return, is especially suitable for solving global optimization problems of biological sequences. Results. In this study, we proposed a novel hydrophobic-polarity model optimization method derived from reinforcement learning, which structures the full state space and uses an energy-based reward function and a rigid overlap detection rule. To validate its performance, sixteen sequences were selected from the classical data set. The results indicated that reinforcement learning with full states converged to the lowest-energy conformations for all sequences, while reinforcement learning with partial states folded 50% of the sequences to the lowest-energy conformations. In the last 100 episodes, reinforcement learning with full states hit the lowest energy an average of 5 times, compared with three and zero hits for the greedy algorithm and reinforcement learning with partial states, respectively (reported as improvements of 40% and 100%). Conclusions. Our results indicate that reinforcement learning with full states is a powerful method for predicting two-dimensional hydrophobic-polarity protein structure. It has clear competitive advantages over the greedy algorithm and reinforcement learning with partial states.
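As an illustration of the quantities named in this abstract, the sketch below scores a conformation on the 2D square lattice under the standard HP energy (minus one for every pair of H residues that are lattice neighbours but not chain neighbours) and applies a simple overlap check. It is not the authors' implementation: the move encoding, the function names, and the `OVERLAP_PENALTY` value are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Unit moves on the 2D square lattice (encoding is an assumption).
MOVES = {"L": (-1, 0), "R": (1, 0), "U": (0, 1), "D": (0, -1)}
OVERLAP_PENALTY = -10  # hypothetical reward for a self-overlapping fold


def fold(moves: str) -> List[Tuple[int, int]]:
    """Place residues one by one; the first residue sits at the origin."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def has_overlap(coords: List[Tuple[int, int]]) -> bool:
    """True if two residues occupy the same lattice site."""
    return len(set(coords)) != len(coords)


def hp_energy(sequence: str, coords: List[Tuple[int, int]]) -> int:
    """HP energy of an overlap-free conformation (lower is better)."""
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def reward(sequence: str, moves: str) -> int:
    """Energy-based reward: penalise overlaps, otherwise return -energy."""
    coords = fold(moves)
    if has_overlap(coords):
        return OVERLAP_PENALTY
    return -hp_energy(sequence, coords)


print(reward("HPPH", "RUL"))  # 1: residues 1 and 4 form one H-H contact
```

A sequence of length n needs n - 1 moves, so `"RUL"` folds a four-residue chain into a unit square.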
Affiliation(s)
- Hongjie Wu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Ru Yang: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Qiming Fu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Jianping Chen: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Weizhong Lu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Haiou Li: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
3. Lu W, Tang Y, Wu H, Huang H, Fu Q, Qiu J, Li H. Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter. BMC Bioinformatics 2019; 20:684. [PMID: 31874602 PMCID: PMC6929275 DOI: 10.1186/s12859-019-3258-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023]
Abstract
Background. RNA secondary structure prediction is an important issue in structural bioinformatics, and the prediction of pseudoknotted RNA secondary structure is an NP-hard problem. Recently, many different machine-learning methods, Markov models, and neural networks have been employed for this problem, with encouraging results regarding their predictive accuracy; however, their performance is usually limited by over-fitting and by the learning model's requirement for a fixed number of training features. Because most natural biological sequences have variable lengths, the sequences have to be truncated before the features are passed to the learning model, which not only loses information but also destroys the integrity of the biological sequence. Results. To address this problem, we propose a deep-learning model that adapts to sequence length and integrate an energy-based filter to remove over-fitted base pairs. Conclusions. Comparative experiments conducted on the authoritative RNA STRAND dataset (RNA secondary STRucture and statistical Analysis Database) revealed a 12% higher accuracy relative to three currently used methods.
Affiliation(s)
- Weizhong Lu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
- Ye Tang: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
- Hongjie Wu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China; Anhui Key Laboratory of Intelligent Building Energy Efficiency, Anhui Jianzhu University, Hefei, Anhui, 230601, China
- Hongmei Huang: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
- Qiming Fu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
- Jing Qiu: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
- Haiou Li: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
4. Jabeen A, Ranganathan S. Applications of machine learning in GPCR bioactive ligand discovery. Curr Opin Struct Biol 2019; 55:66-76. [PMID: 31005679 DOI: 10.1016/j.sbi.2019.03.022] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/05/2019] [Revised: 03/14/2019] [Accepted: 03/14/2019] [Indexed: 12/17/2022]
Abstract
GPCRs constitute the largest druggable family, with targets for 475 drugs approved by the Food and Drug Administration (FDA). As GPCRs are of great interest to the pharmaceutical industry, enormous efforts are being expended to find relevant and potent GPCR ligands as lead compounds. There are tens of millions of compounds in different chemical databases. To scan this immense chemical space, computational methods, especially machine learning (ML) methods, are essential components of GPCR drug discovery pipelines. ML approaches have applications in both ligand-based and structure-based virtual screening. We present here a cheminformatics overview of ML applications to different stages of GPCR drug discovery. Focusing on olfactory receptors, which are the largest family of GPCRs, a case study for predicting agonists for an ectopic olfactory receptor, OR1G1, compares four classical ML methods.
Affiliation(s)
- Amara Jabeen: Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Shoba Ranganathan: Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
5. Deep Learning in the Biomedical Applications: Recent and Future Status. Applied Sciences-Basel 2019. [DOI: 10.3390/app9081526] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 02/06/2023]
Abstract
Deep neural networks currently represent the most effective machine learning technology in the biomedical domain. In this domain, the areas of interest are Omics (the study of the genome, transcripts, proteins, and metabolites: genomics, transcriptomics, proteomics, and metabolomics), bioimaging (the study of biological cells and tissues), medical imaging (the study of human organs by creating visual representations), BBMI (the study of the brain and body machine interface), and public and medical health management (PmHM). This paper reviews the major deep learning concepts pertinent to such biomedical applications. Concise overviews are provided for the Omics and the BBMI. We end our analysis with a critical discussion, interpretation, and relevant open challenges.
6. Wu H, Cao C, Xia X, Lu Q. Unified Deep Learning Architecture for Modeling Biology Sequence. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1445-1452. [PMID: 28991751 DOI: 10.1109/tcbb.2017.2760832] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Prediction of the spatial structure or function of biological macromolecules based on their sequences remains an important challenge in bioinformatics. When biological sequences are modeled with traditional sequence models, long-range interactions, complicated and variable outputs of labeled structures, and the variable length of biological sequences usually lead to different solutions on a case-by-case basis. This study proposed a unified deep learning architecture based on long short-term memory or a gated recurrent unit to capture long-range interactions. The architecture provides an optional reshape operator to adapt to the diversity of the output labels and implements a training algorithm that supports sequence models capable of processing variable-length sequences. The merging and pooling operators enhance the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning architecture and its training algorithm might be capable of solving the currently varied biological sequence-modeling problems under a unified framework. We validated the model on one of the most difficult biological sequence-modeling problems, protein residue interaction prediction. The results indicate that the model's accuracy in predicting residue interactions exceeded that of popular approaches by 10 percent on multiple widely used benchmarks.
7. Identify High-Quality Protein Structural Models by Enhanced K-Means. BioMed Research International 2017; 2017:7294519. [PMID: 28421198 PMCID: PMC5381204 DOI: 10.1155/2017/7294519] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/23/2016] [Revised: 02/09/2017] [Accepted: 02/19/2017] [Indexed: 01/01/2023]
Abstract
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
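The squared-distance seeding mentioned for K-means++ can be sketched as follows: after a first centroid is picked uniformly at random, each further centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. This is a generic illustration on Euclidean points, not the paper's code; how decoys are represented and which distance the authors use (for example a structural distance between models) are not stated here, so `Point`, `squared_distance`, and `kmeanspp_init` are assumptions.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance (a stand-in for a structural distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k initial centroids with squared-distance weighted sampling."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest centroid.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a centroid; fall back to uniform choice.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
print(kmeanspp_init(points, k=2, rng=random.Random(0)))
```

Points far from all existing centroids are favoured, so the initial centroids tend to spread across distinct clusters before the usual K-means iterations begin.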
8. Li H, Lyu Q, Cheng J. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning. J Proteomics Bioinform 2016; 9:306-313. [PMID: 29081613 PMCID: PMC5658031 DOI: 10.4172/jpb.1000419] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/04/2022]
Abstract
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
Affiliation(s)
- Haiou Li: Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China; Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Qiang Lyu: Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China
- Jianlin Cheng: Department of Computer Science, University of Missouri, Columbia, MO 65211, USA