1
|
Pang Y, Liu B. TransDFL: Identification of Disordered Flexible Linkers in Proteins by Transfer Learning. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:359-369. [PMID: 36272675 PMCID: PMC10626177 DOI: 10.1016/j.gpb.2022.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Revised: 09/21/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]
Abstract
Disordered flexible linkers (DFLs) are the functional disordered regions in proteins, which are the sub-regions of intrinsically disordered regions (IDRs) and play important roles in connecting domains and maintaining inter-domain interactions. Trained with the limited available DFLs, the existing DFL predictors based on the machine learning techniques tend to predict the ordered residues as DFLs, leading to a high falsepositive rate (FPR) and low prediction accuracy. Previous studies have shown that DFLs are extremely flexible disordered regions, which are usually predicted as disordered residues with high confidence [P(D) > 0.9] by an IDR predictor. Therefore, transferring an IDR predictor to an accurate DFL predictor is of great significance for understanding the functions of IDRs. In this study, we proposed a new predictor called TransDFL for identifying DFLs by transferring the RFPR-IDP predictor for IDR identification to the DFL prediction. The RFPR-IDP was pre-trained with IDR sequences to learn the general features between IDRs and DFLs, which is helpful to reduce the false positives in the ordered regions. RFPR-IDP was fine-tuned with the DFL sequences to capture the specific features of DFLs so as to be transferred into the TransDFL. Experimental results of two application scenarios (prediction of DFLs only in IDRs or prediction of DFLs in entire proteins) showed that TransDFL consistently outperformed other existing DFL predictors with higher accuracy. The corresponding web server of TransDFL can be freely accessed at http://bliulab.net/TransDFL/.
Collapse
Affiliation(s)
- Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China.
| |
Collapse
|
2
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
3
|
Newton MAH, Rahman J, Zaman R, Sattar A. Enhancing Protein Contact Map Prediction Accuracy via Ensembles of Inter-Residue Distance Predictors. Comput Biol Chem 2022; 99:107700. [DOI: 10.1016/j.compbiolchem.2022.107700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/03/2022]
|
4
|
Guo L, Jiang Q, Jin X, Liu L, Zhou W, Yao S, Wu M, Wang Y. A Deep Convolutional Neural Network to Improve the Prediction of Protein Secondary Structure. Curr Bioinform 2020. [DOI: 10.2174/1574893615666200120103050] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Protein secondary structure prediction (PSSP) is a fundamental task in
bioinformatics that is helpful for understanding the three-dimensional structure and biological
function of proteins. Many neural network-based prediction methods have been developed for
protein secondary structures. Deep learning and multiple features are two obvious means to improve
prediction accuracy.
Objective:
To promote the development of PSSP, a deep convolutional neural network-based
method is proposed to predict both the eight-state and three-state of protein secondary structure.
Methods:
In this model, sequence and evolutionary information of proteins are combined as multiple
input features after preprocessing. A deep convolutional neural network with no pooling layer and
connection layer is then constructed to predict the secondary structure of proteins. L2 regularization,
batch normalization, and dropout techniques are employed to avoid over-fitting and obtain better
prediction performance, and an improved cross-entropy is used as the loss function.
Results:
Our proposed model can obtain Q3 prediction results of 86.2%, 84.5%, 87.8%, and 84.7%,
respectively, on CullPDB, CB513, CASP10 and CASP11 datasets, with corresponding Q8
prediction results of 74.1%, 70.5%, 74.9%, and 71.3%.
Conclusion:
We have proposed the DCNN-SS deep convolutional-network-based PSSP method,
and experimental results show that DCNN-SS performs competitively with other methods.
Collapse
Affiliation(s)
- Lin Guo
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Qian Jiang
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Xin Jin
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Lin Liu
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Wei Zhou
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Shaowen Yao
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Min Wu
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| | - Yun Wang
- School of Software, Yunnan University, Kunming, China; 2School of Information, Yunnan Normal University, Kunming, China
| |
Collapse
|
5
|
Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022] Open
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identify T4SEs are time- and resource-consuming, and the existing computational tools based on machine learning techniques have some obvious limitations such as the lack of interpretability in the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm for accurate identification of type IV effectors based on optimal features based on protein sequences. After trying 20 different types of features, the best performance was achieved when all features were fed into XGBoost by the 5-fold cross validation in comparison with other machine learning methods. Then, the ReliefF algorithm was adopted to get the optimal feature set on our dataset, which further improved the model performance. T4SE-XGB exhibited highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. The identification of key features can contribute to improved understanding of multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. In addition to type IV effector prediction, we believe that the proposed framework can provide instructive guidance for similar studies to construct prediction methods on related biological problems. The data and source code of this study can be freely accessed at https://github.com/CT001002/T4SE-XGB.
Collapse
Affiliation(s)
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
| | - Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Mingming Jiang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| |
Collapse
|
6
|
Affiliation(s)
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
| |
Collapse
|