1
|
Guo Y, Wu J, Ma H, Wang S, Huang J. Deep Ensemble Learning with Atrous Spatial Pyramid Networks for Protein Secondary Structure Prediction. Biomolecules 2022; 12:biom12060774. [PMID: 35740899 PMCID: PMC9221033 DOI: 10.3390/biom12060774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 05/26/2022] [Accepted: 05/30/2022] [Indexed: 02/04/2023] Open
Abstract
The secondary structure of proteins is significant for studying the three-dimensional structure and functions of proteins. Several models from image understanding and natural language modeling have been successfully adapted in the protein sequence study area, such as Long Short-term Memory (LSTM) network and Convolutional Neural Network (CNN). Recently, Gated Convolutional Neural Network (GCNN) has been proposed for natural language processing. It has achieved high levels of sentence scoring, as well as reduced the latency. Conditionally Parameterized Convolution (CondConv) is another novel study which has gained great success in the image processing area. Compared with vanilla CNN, CondConv uses extra sample-dependant modules to conditionally adjust the convolutional network. In this paper, we propose a novel Conditionally Parameterized Convolutional network (CondGCNN) which utilizes the power of both CondConv and GCNN. CondGCNN leverages an ensemble encoder to combine the capabilities of both LSTM and CondGCNN to encode protein sequences by better capturing protein sequential features. In addition, we explore the similarity between the secondary structure prediction problem and the image segmentation problem, and propose an ASP network (Atrous Spatial Pyramid Pooling (ASPP) based network) to capture fine boundary details in secondary structure. Extensive experiments show that the proposed method can achieve higher performance on protein secondary structure prediction task than existing methods on CB513, Casp11, CASP12, CASP13, and CASP14 datasets. We also conducted ablation studies over each component to verify the effectiveness. Our method is expected to be useful for any protein related prediction tasks, which is not limited to protein secondary structure prediction.
Collapse
Affiliation(s)
- Yuzhi Guo
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA; (Y.G.); (H.M.); (S.W.)
| | | | - Hehuan Ma
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA; (Y.G.); (H.M.); (S.W.)
| | - Sheng Wang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA; (Y.G.); (H.M.); (S.W.)
| | - Junzhou Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA; (Y.G.); (H.M.); (S.W.)
- Correspondence:
| |
Collapse
|
2
|
Moffat L, Jones DT. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 2021; 37:3744-3751. [PMID: 34213528 PMCID: PMC8570780 DOI: 10.1093/bioinformatics/btab491] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Revised: 06/08/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY AND IMPLEMENTATION The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lewis Moffat
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| | - David T Jones
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| |
Collapse
|
3
|
Chen J, Xu Y, Sun W, Huang L. Joint sparse neural network compression via multi-application multi-objective optimization. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02243-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
4
|
Visibelli A, Bongini P, Rossi A, Niccolai N, Bianchini M. A deep attention network for predicting amino acid signals in the formation of [Formula: see text]-helices. J Bioinform Comput Biol 2020; 18:2050028. [PMID: 32757808 DOI: 10.1142/s0219720020500286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The secondary and tertiary structure of a protein has a primary role in determining its function. Even though many folding prediction algorithms have been developed in the past decades - mainly based on the assumption that folding instructions are encoded within the protein sequence - experimental techniques remain the most reliable to establish protein structures. In this paper, we searched for signals related to the formation of [Formula: see text]-helices. We carried out a statistical analysis on a large dataset of experimentally characterized secondary structure elements to find over- or under-occurrences of specific amino acids defining the boundaries of helical moieties. To validate our hypothesis, we trained various Machine Learning models, each equipped with an attention mechanism, to predict the occurrence of [Formula: see text]-helices. The attention mechanism allows to interpret the model's decision, weighing the importance the predictor gives to each part of the input. The experimental results show that different models focus on the same subsequences, which can be seen as codes driving the secondary structure formation.
Collapse
Affiliation(s)
- A Visibelli
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, 53100, Siena, Italy
| | - P Bongini
- Department of Information Engineering and Mathematics, University of Siena, 53100, Siena, Italy.,Department of Information Engineering, University of Florence, 50139, Florence, Italy
| | - A Rossi
- Department of Information Engineering and Mathematics, University of Siena, 53100, Siena, Italy.,Department of Information Engineering, University of Florence, 50139, Florence, Italy
| | - N Niccolai
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, 53100, Siena, Italy
| | - M Bianchini
- Department of Information Engineering and Mathematics, University of Siena, 53100, Siena, Italy
| |
Collapse
|
5
|
Huang J, Sun W, Huang L. Deep neural networks compression learning based on multiobjective evolutionary algorithms. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.10.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
6
|
An enhanced protein secondary structure prediction using deep learning framework on hybrid profile based features. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105926] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
7
|
Peyravi F, Latif A, Moshtaghioun SM. Protein tertiary structure prediction using hidden Markov model based on lattice. J Bioinform Comput Biol 2019; 17:1950007. [PMID: 31057069 DOI: 10.1142/s0219720019500070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The prediction of protein structure from its amino acid sequence is one of the most prominent problems in computational biology. The biological function of a protein depends on its tertiary structure which is determined by its amino acid sequence via the process of protein folding. We propose a novel fold recognition method for protein tertiary structure prediction based on a hidden Markov model and 3D coordinates of amino acid residues. The method introduces states based on the basis vectors in Bravais cubic lattices to learn the path of amino acids of the proteins of each fold. Three hidden Markov models are considered based on simple cubic, body-centered cubic (BCC) and face-centered cubic (FCC) lattices. A 10-fold cross validation was performed on a set of 42 fold SCOP dataset. The proposed composite methodology is compared to fold recognition methods which have HMM as base of their algorithms having approaches on only amino acid sequence or secondary structure. The accuracy of proposed model based on face-centered cubic lattices is quite better in comparison with SAM, 3-HMM optimized and Markov chain optimized in overall experiment. The huge data of 3D space help the model to have greater performance in comparison to methods which use only primary structures or only secondary structures.
Collapse
Affiliation(s)
- Farzad Peyravi
- * Department of Computer Engineering, Yazd University, Yazd, Iran
| | | | | |
Collapse
|
8
|
A Composite Approach to Protein Tertiary Structure Prediction: Hidden Markov Model Based on Lattice. Bull Math Biol 2018; 81:899-918. [DOI: 10.1007/s11538-018-00542-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 11/28/2018] [Indexed: 11/25/2022]
|
9
|
Heffernan R, Paliwal K, Lyons J, Singh J, Yang Y, Zhou Y. Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning. J Comput Chem 2018; 39:2210-2216. [PMID: 30368831 DOI: 10.1002/jcc.25534] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2018] [Revised: 05/11/2018] [Accepted: 06/14/2018] [Indexed: 02/01/2023]
Abstract
Predicting protein structure from sequence alone is challenging. Thus, the majority of methods for protein structure prediction rely on evolutionary information from multiple sequence alignments. In previous work we showed that Long Short-Term Bidirectional Recurrent Neural Networks (LSTM-BRNNs) improved over regular neural networks by better capturing intra-sequence dependencies. Here we show a single-sequence-based prediction method employing LSTM-BRNNs (SPIDER3-Single), that consistently achieves Q3 accuracy of 72.5%, and correlation coefficient of 0.67 between predicted and actual solvent accessible surface area. Moreover, it yields reasonably accurate prediction of eight-state secondary structure, main-chain angles (backbone ϕ and ψ torsion angles and C α-atom-based θ and τ angles), half-sphere exposure, and contact number. The method is more accurate than the corresponding evolutionary-based method for proteins with few sequence homologs, and computationally efficient for large-scale screening of protein-structural properties. It is available as an option in the SPIDER3 server, and a standalone version for download, at http://sparks-lab.org. © 2018 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, 4111, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, 4111, Australia
| | - James Lyons
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, 4111, Australia
| | - Jaswinder Singh
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, 4111, Australia
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yet-Sen University, Guangzhou, China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD, 4222, Australia
| |
Collapse
|
10
|
Protein secondary structure prediction: A survey of the state of the art. J Mol Graph Model 2017; 76:379-402. [DOI: 10.1016/j.jmgm.2017.07.015] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2017] [Revised: 07/14/2017] [Accepted: 07/17/2017] [Indexed: 11/21/2022]
|
11
|
GHANTY PRADIP, PAL NIKHILR, MUDI RAJANIK. PREDICTION OF PROTEIN SECONDARY STRUCTURE USING PROBABILITY BASED FEATURES AND A HYBRID SYSTEM. J Bioinform Comput Biol 2013; 11:1350012. [DOI: 10.1142/s0219720013500121] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
In this paper, we propose some co-occurrence probability-based features for prediction of protein secondary structure. The features are extracted using occurrence/nonoccurrence of secondary structures in the protein sequences. We explore two types of features: position-specific (based on position of amino acid on fragments of protein sequences) as well as position-independent (independent of amino acid position on fragments of protein sequences). We use a hybrid system, NEUROSVM, consisting of neural networks and support vector machines for classification of secondary structures. We propose two schemes NSVMps and NSVM for protein secondary structure prediction. The NSVMps uses position-specific probability-based features and NEUROSVM classifier whereas NSVM uses the same classifier with position-independent probability-based features. The proposed method falls in the single-sequence category of methods because it does not use any sequence profile information such as position specific scoring matrices (PSSM) derived from PSI-BLAST. Two widely used datasets RS126 and CB513 are used in the experiments. The results obtained using the proposed features and NEUROSVM classifier are better than most of the existing single-sequence prediction methods. Most importantly, the results using NSVMps that are obtained using lower dimensional features, are comparable to those by other existing methods. The NSVMps and NSVM are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). A larger dataset is used to compare the performance of the proposed methods with that of two recent single-sequence prediction methods. We also investigate the impact of presence of different amino acid residues (in protein sequences) that are responsible for the formation of different secondary structures.
Collapse
Affiliation(s)
- PRADIP GHANTY
- Praxis Softek Solutions Private Limited, Module 616, SDF Building, Sector V, Saltlake, Kolkata, India
| | - NIKHIL R. PAL
- Electronics and Communication Sciences Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700108, India
| | - RAJANI K. MUDI
- Department of Instrumentation and Electronics Engineering, Jadavpur University, Saltlake Campus, Kolkata, India
| |
Collapse
|