1
Daanial Khan Y, Alkhalifah T, Alturise F, Hassan Butt A. DeepDBS: Identification of DNA-binding sites in protein sequences by using deep representations and random forest. Methods 2024; 231:26-36. [PMID: 39270885 DOI: 10.1016/j.ymeth.2024.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/26/2024] [Accepted: 09/04/2024] [Indexed: 09/15/2024] Open
Abstract
Interactions among biological molecules are considered primary factors in an organism's life cycle. Many important biological functions depend on such interactions, and among them, protein-DNA interactions are essential for transcription, regulation of gene expression, and DNA repair and packaging. Knowledge of these interactions and of the sites where they occur is therefore necessary to study the mechanisms of various biological processes. Because experimental identification through biological assays is resource-demanding, costly and error-prone, scientists turn to computational methods for efficient and accurate identification of DNA-protein interaction sites. Here, we propose a novel and accurate method, DeepDBS, for identifying DNA-binding sites in proteins from their primary amino acid sequences. Deep representations were computed from the protein sequences with a one-dimensional convolutional neural network (1D-CNN), a recurrent neural network (RNN) and a long short-term memory (LSTM) network, and were then used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models, as well as existing state-of-the-art methods, with an accuracy of 0.99 in the self-consistency test, 10-fold cross-validation, 5-fold cross-validation and jackknife validation, and 0.92 in independent dataset testing. Based on these results, we conclude that DeepDBS can support accurate and efficient identification of DNA-binding sites (DBS) in proteins.
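The validation schemes named in this abstract (k-fold cross-validation and the jackknife test) differ only in how samples are partitioned into training and test sets. The following is a minimal sketch of that partitioning, not the authors' code: the function names `k_fold_indices` and `jackknife_indices` are hypothetical, and the folds are contiguous and unshuffled for simplicity, whereas a real experiment would typically shuffle or stratify the data first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Jackknife (leave-one-out): every sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

With this sketch, 10-fold and 5-fold cross-validation correspond to `k=10` and `k=5`, and the jackknife test is the limiting case in which `k` equals the number of samples.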
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Punjab 54770, Pakistan
- Tamim Alkhalifah
  - Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Fahad Alturise
  - Department of Cybersecurity, College of Computer, Qassim University, Buraydah 52571, Saudi Arabia
- Ahmad Hassan Butt
  - Department of Computer Science, Faculty of Computing and Information Technology, University of the Punjab, Lahore 54000, Punjab, Pakistan
2
Shen C, Chai W, Han J, Zhang Z, Liu X, Yang S, Wang Y, Wang D, Wan F, Fan Z, Hu H. Identification and validation of a dysregulated TME-related gene signature for predicting prognosis, and immunological properties in bladder cancer. Front Immunol 2023; 14:1213947. [PMID: 37965307 PMCID: PMC10641729 DOI: 10.3389/fimmu.2023.1213947] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/28/2023] [Accepted: 10/13/2023] [Indexed: 11/16/2023] Open
Abstract
Background: During tumor growth, tumor cells interact with their tumor microenvironment (TME), resulting in the development of heterogeneous tumors that promote tumor occurrence and progression. Recently, the TME has attracted extensive attention as a possible therapeutic target for cancers. However, an accurate TME-related prediction model is urgently needed to aid in assessing patients' prognoses and therapeutic value and to assist clinical decision-making. This study therefore aimed to develop and validate a new prognostic model based on TME-associated genes for bladder cancer (BC) patients.
Methods: Transcriptome data and clinical information for BC patients were extracted from The Cancer Genome Atlas (TCGA) database. Gene Expression Omnibus (GEO) and IMvigor210 databases, along with the MSigDB, were utilized to identify genes associated with TMEs (TMRGs). A consensus clustering approach was used to identify molecular clusters associated with TMEs. LASSO Cox regression analysis was conducted to establish a prognostic TMRG-related signature, which was verified internally and externally. Gene ontology (GO), KEGG, and single-sample gene set enrichment analyses (ssGSEA) were performed to investigate the underlying mechanisms. The potential response to immune checkpoint blockade (ICB) therapy was estimated using the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm and Immunophenoscore (IPS). Additionally, the expression level of certain genes in the model was found to be significantly correlated with objective responses to anti-PD-1 or anti-PD-L1 treatment in the IMvigor210, GSE111636, GSE176307, or Truce01 (registration number NCT04730219) cohorts. Finally, real-time PCR validation was performed on 10 paired tissue samples, and in vitro cytological experiments were conducted on BC cell lines.
Results: In BC patients, 133 differentially expressed genes associated with prognosis in the TME were identified. Consensus clustering analysis revealed three clusters with distinct clinicopathological characteristics and survival outcomes. A novel prognostic model based on nine TMRGs (C3orf62, DPYSL2, GZMA, SERPINB3, RHCG, PTPRR, STMN3, TMPRSS4, COMP) was identified, and a TMEscore for overall survival (OS) prediction was constructed and its predictive performance in BC patients was validated. MultiCox analysis showed that the risk score was an independent prognostic factor. A nomogram was developed to facilitate the clinical use of TMEscore. Based on GO and KEGG enrichment analyses, biological processes related to ECM and collagen binding were significantly enriched among high-risk individuals. In addition, the low-risk group, characterized by a higher number of infiltrating CD8+ T cells and a lower tumor mutation burden, showed longer survival. TMEscore also correlated with drug susceptibility, immune cell infiltration, and the prediction of immunotherapy efficacy. Lastly, differential expression validation and in vitro phenotypic experiments identified SERPINB3 as significantly promoting BC cell migration and invasion.
Conclusion: Our study developed a prognostic model based on nine TMRGs that accurately and stably predicted survival, guiding individual treatment for patients with BC and providing new therapeutic strategies for the disease.
Affiliation(s)
- Chong Shen
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Wang Chai
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Jingwen Han
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Zhe Zhang
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Xuejing Liu
  - Obstetrics and Gynecology, Haidian Maternal & Child Health Hospital, Beijing, China
- Shaobo Yang
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Yinlei Wang
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Donghuai Wang
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
- Fangxin Wan
  - Department of Gastrointestinal Surgery, The Second Hospital of Tianjin Medical University, Tianjin, China
- Zhenqian Fan
  - Department of Endocrinology, The Second Hospital of Tianjin Medical University, Tianjin, China
- Hailong Hu
  - Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
  - Tianjin Key Laboratory of Urology, Tianjin Institute of Urology, Tianjin, China
3
Hu J, Zeng WW, Jia NX, Arif M, Yu DJ, Zhang GJ. Improving DNA-Binding Protein Prediction Using Three-Part Sequence-Order Feature Extraction and a Deep Neural Network Algorithm. J Chem Inf Model 2023; 63:1044-1057. [PMID: 36719781 DOI: 10.1021/acs.jcim.2c00943] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 02/01/2023]
Abstract
Identification of DNA-binding proteins (DBPs) helps uncover information embedded in DNA-protein interactions, which is significant for understanding the mechanisms of DNA replication, transcription, and repair. Although existing computational methods for predicting DBPs from protein sequences have achieved great success, there is still room for improvement because these methods do not fully mine sequence-order information. In this study, a new three-part sequence-order feature extraction strategy (called TPSO) is developed to extract more discriminative information from protein sequences for predicting DBPs. For each query protein, TPSO first divides its primary sequence features into N- and C-terminal fragments and then extracts the numerical pseudo features of three parts: the full sequence and these two fragments. Based on TPSO, a novel deep learning-based method, called TPSO-DBP, is proposed, which employs sequence-based single-view features together with bidirectional long short-term memory (BiLSTM) and fully connected (FC) neural networks to learn the DBP prediction model. Empirical results show that TPSO-DBP achieves an accuracy of 87.01%, covering 85.30% of all DBPs, and a Matthews correlation coefficient (0.741) that is significantly higher than that of most existing state-of-the-art DBP prediction methods. Detailed data analyses indicate that the advantages of TPSO-DBP lie in the use of TPSO, which helps extract more concealed prominent patterns, and in the deep neural network framework composed of BiLSTM and FC layers, which learns the nonlinear relationships between input features and DBPs. The standalone package and web server of TPSO-DBP are freely available at https://jun-csbio.github.io/TPSO-DBP/.
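The three-part idea described in this abstract (features from the full sequence plus its N- and C-terminal fragments) can be illustrated with a short sketch. This is not the TPSO implementation: the midpoint split, the use of plain amino acid composition in place of the paper's pseudo features, and the names `composition` and `three_part_features` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(fragment: str) -> List[float]:
    """Fraction of each standard amino acid in the fragment (zeros if empty)."""
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def three_part_features(sequence: str) -> List[float]:
    """Concatenate features of the full sequence, N-terminal and C-terminal fragments."""
    sequence = sequence.upper()
    mid = len(sequence) // 2  # assumed split point; the paper may split differently
    n_term, c_term = sequence[:mid], sequence[mid:]
    return composition(sequence) + composition(n_term) + composition(c_term)


if __name__ == "__main__":
    features = three_part_features("MKRLAAGGSTPKKRRW")
    print(len(features))  # 3 parts x 20 residues
```

The point of the split is that two proteins with the same overall composition can still differ in where residues are concentrated, which the fragment-level features expose and the full-sequence features alone do not.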
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wen-Wu Zeng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ning-Xin Jia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Muhammad Arif
  - School of Systems and Technology, Department of Informatics and Systems, University of Management and Technology, Lahore 54770, Pakistan
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
4
Varghese DM, Nussinov R, Ahmad S. Predictive modeling of moonlighting DNA-binding proteins. NAR Genom Bioinform 2022; 4:lqac091. [PMID: 36474806 PMCID: PMC9716651 DOI: 10.1093/nargab/lqac091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/06/2022] [Revised: 10/25/2022] [Accepted: 11/11/2022] [Indexed: 09/10/2024] Open
Abstract
Moonlighting proteins are multifunctional, single-polypeptide chains capable of performing multiple autonomous functions. Most moonlighting proteins have been discovered through work unrelated to their multifunctionality. We believe that prediction of moonlighting proteins from first principles, that is, using sequence, predicted structure, evolutionary profiles, and global gene expression profiles, for only one functional class of proteins in a single organism at a time will significantly advance our understanding of multifunctional proteins. In this work, we investigated human moonlighting DNA-binding proteins (mDBPs) in terms of properties that distinguish them from other (non-moonlighting) proteins with the same DNA-binding protein (DBP) function. Following a careful and comprehensive analysis of discriminatory features, a machine learning model was developed to assess how well mDBPs can be distinguished from other DBPs (oDBPs). We observed that mDBPs can be discriminated from oDBPs with an area under the ROC curve (AUC) of 74% using these first-principles features. A number of novel predicted mDBPs were found to have literature support for being moonlighting proteins, and others are proposed as candidates for which the moonlighting function is currently unknown. We believe that this work will help in deciphering and annotating novel moonlighting DBPs and scale up other functions. The source codes and data sets used for this work are freely available at https://zenodo.org/record/7299265#.Y2pO3ctBxPY.
Affiliation(s)
- Dana Mary Varghese
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Ruth Nussinov
  - Computational Structural Biology Section, Cancer Innovation Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
5
Jain A, Mittal S, Tripathi LP, Nussinov R, Ahmad S. Host-pathogen protein-nucleic acid interactions: A comprehensive review. Comput Struct Biotechnol J 2022; 20:4415-4436. [PMID: 36051878 PMCID: PMC9420432 DOI: 10.1016/j.csbj.2022.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/30/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 12/02/2022] Open
Abstract
Recognition of pathogen-derived nucleic acids by host cells is an effective host strategy to detect pathogenic invasion and trigger immune responses. In the context of pathogen-specific pharmacology, there is growing interest in mapping the interactions between pathogen-derived nucleic acids and host proteins. Insight into the structural and immunological mechanisms underlying such interactions and their roles in host defense is necessary to guide therapeutic intervention. Here, we discuss the newest advances in studies of molecular interactions involving pathogen nucleic acids and host factors, including their drug design, molecular structure and specific patterns. We observed that two groups of nucleic acid-recognizing molecules, Toll-like receptors (TLRs) and the cytoplasmic retinoic acid-inducible gene (RIG)-I-like receptors (RLRs), form the backbone of host responses to pathogen nucleic acids, with additional support provided by absent in melanoma 2 (AIM2) and DNA-dependent activator of Interferons (IFNs)-regulatory factors (DAI)-like cytosolic activity. We review the structural, immunological, and other biological aspects of these representative groups of molecules, especially in terms of their target specificity and affinity, and the challenges in leveraging host-pathogen protein-nucleic acid interactions (HP-PNI) in drug discovery.
Affiliation(s)
- Anuja Jain
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Shikha Mittal
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, 173234, India
- Lokesh P. Tripathi
  - National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
  - Riken Center for Integrative Medical Sciences, Tsurumi, Yokohama, Kanagawa, Japan
- Ruth Nussinov
  - Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
6
Mosharaf MP, Hassan MM, Ahmed FF, Khatun MS, Moni MA, Mollah MNH. Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. Comput Biol Chem 2020; 85:107238. [DOI: 10.1016/j.compbiolchem.2020.107238] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/16/2018] [Revised: 01/22/2020] [Accepted: 02/18/2020] [Indexed: 02/06/2023]
7
Chen YA, Tripathi LP, Fujiwara T, Kameyama T, Itoh MN, Mizuguchi K. The TargetMine Data Warehouse: Enhancement and Updates. Front Genet 2019; 10:934. [PMID: 31649722 PMCID: PMC6794636 DOI: 10.3389/fgene.2019.00934] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/18/2019] [Accepted: 09/05/2019] [Indexed: 12/01/2022] Open
Abstract
Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has created a need for tools and platforms that allow researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements include: enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We also describe two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples demonstrate how the newer enhancements in TargetMine contribute to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.
Affiliation(s)
- Yi-An Chen
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Lokesh P Tripathi
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Takeshi Fujiwara
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Tatsuya Kameyama
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Mari N Itoh
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Kenji Mizuguchi
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
8
Chauhan S, Ahmad S. Enabling full‐length evolutionary profiles based deep convolutional neural network for predicting DNA‐binding proteins from sequence. Proteins 2019; 88:15-30. [DOI: 10.1002/prot.25763] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/10/2019] [Revised: 06/01/2019] [Accepted: 06/15/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Sucheta Chauhan
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India