Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Stawiski EW, Gregoret LM, Mandel-Gutfreund Y. Annotating nucleic acid-binding function based on protein structure. J Mol Biol 2003;326:1065-79. [PMID: 12589754 DOI: 10.1016/s0022-2836(03)00031-7] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.7] [Indexed: 10/27/2022]

Cited by Other Article(s)

Pradhan UK, Meher PK, Naha S, Das R, Gupta A, Parsad R. ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins. Protein Sci 2024;33:e5015. [PMID: 38747369 PMCID: PMC11094783 DOI: 10.1002/pro.5015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/18/2024] [Accepted: 04/21/2024] [Indexed: 05/19/2024]

Ahmed SH, Bose DB, Khandoker R, Rahman MS. StackDPP: a stacking ensemble based DNA-binding protein prediction model. BMC Bioinformatics 2024;25:111. [PMID: 38486135 PMCID: PMC10941422 DOI: 10.1186/s12859-024-05714-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 02/20/2024] [Indexed: 03/17/2024]

Abstract

BACKGROUND

DNA-binding proteins (DNA-BPs) are proteins that bind to and interact with DNA. DNA-BPs regulate and affect numerous biological processes, such as transcription, DNA replication and repair, and the organization of chromosomal DNA. Very few proteins, however, are DNA-binding in nature. Therefore, it is necessary to develop an efficient predictor for identifying DNA-BPs.

RESULT

In this work, we propose new benchmark datasets for the DNA-binding protein prediction problem. We discovered several quality concerns with the widely used benchmark datasets, PDB1075 (for training) and PDB186 (for independent testing), which necessitated the preparation of new benchmark datasets. Our proposed datasets, UNIPROT1424 and UNIPROT356, can be used for model training and independent testing, respectively. We retrained selected state-of-the-art DNA-BP predictors on the new dataset and report their performance. We also trained a novel predictor using the new benchmark dataset. We extracted features from various feature categories, then used a Random Forest classifier and Recursive Feature Elimination with Cross-validation (RFECV) to select the optimal set of 452 features. We then proposed a stacking ensemble architecture as our final prediction model. Named Stacking Ensemble Model for DNA-binding Protein Prediction, or StackDPP for short, our model achieved accuracies of 0.92, 0.92 and 0.93 in 10-fold cross-validation, jackknife and independent testing, respectively.

CONCLUSION

StackDPP performed very well in cross-validation testing and outperformed all the state-of-the-art prediction models in independent testing. Its cross-validation performance generalized very well to the independent test set. The source code of the model is publicly available at https://github.com/HasibAhmed1624/StackDPP. Therefore, we expect that this generalized model can be adopted by researchers and practitioners to identify novel DNA-binding proteins.
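The evaluation protocols named in this abstract (10-fold cross-validation and jackknife testing) differ only in how the samples are partitioned. The following is a minimal sketch of that partitioning, not the StackDPP authors' code: the function names `k_fold_indices`, `jackknife_indices` and `accuracy` are hypothetical, and the folds are assumed to be contiguous and unshuffled for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds each take one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Jackknife (leave-one-out) testing is k-fold with k equal to n_samples."""
    return k_fold_indices(n_samples, n_samples)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In practice the sample order would usually be shuffled (and often stratified by class) before splitting; that step is omitted here.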

Levine J, Lobyntseva A, Shazman S, Hakim F, Gozes I. Longitudinal Genotype-Phenotype (Vineland Questionnaire) Characterization of 15 ADNP Syndrome Cases Highlights Mutated Protein Length and Structural Characteristics Correlation with Communicative Abilities Accentuated in Males. J Mol Neurosci 2024;74:15. [PMID: 38282129 DOI: 10.1007/s12031-024-02189-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/08/2024] [Indexed: 01/30/2024]

Mohanty P, Kapoor U, Sundaravadivelu Devarajan D, Phan TM, Rizuan A, Mittal J. Principles Governing the Phase Separation of Multidomain Proteins. Biochemistry 2022;61:2443-2455. [PMID: 35802394 PMCID: PMC9669140 DOI: 10.1021/acs.biochem.2c00210] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 02/06/2023]

Pal A, Chakrabarti P, Dey S. ProDFace: A web-tool for the dissection of protein-DNA interfaces. Front Mol Biosci 2022;9:978310. [PMID: 36148013 PMCID: PMC9486321 DOI: 10.3389/fmolb.2022.978310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 08/09/2022] [Indexed: 11/30/2022]

Feric M, Misteli T. Function moves biomolecular condensates in phase space. Bioessays 2022;44:e2200001. [PMID: 35243657 PMCID: PMC9277701 DOI: 10.1002/bies.202200001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 02/19/2022] [Accepted: 02/22/2022] [Indexed: 11/08/2022]

DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features. Appl Bionics Biomech 2022;2022:5483115. [PMID: 35465187 PMCID: PMC9020926 DOI: 10.1155/2022/5483115] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/15/2021] [Revised: 12/25/2021] [Accepted: 02/05/2022] [Indexed: 12/29/2022]

Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022;23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022]

Jia Y, Huang S, Zhang T. KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest. Front Genet 2021;12:811158. [PMID: 34912382 PMCID: PMC8667860 DOI: 10.3389/fgene.2021.811158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2021] [Accepted: 11/15/2021] [Indexed: 02/04/2023]

Zhang Y, Ni J, Gao Y. RF-SVM: Identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine. Proteins 2021;90:395-404. [PMID: 34455627 DOI: 10.1002/prot.26229] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/31/2021] [Revised: 08/10/2021] [Accepted: 08/24/2021] [Indexed: 01/07/2023]

Li G, Du X, Li X, Zou L, Zhang G, Wu Z. Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning. PeerJ 2021;9:e11262. [PMID: 33986992 PMCID: PMC8101451 DOI: 10.7717/peerj.11262] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/31/2020] [Accepted: 03/22/2021] [Indexed: 12/12/2022]

Zhang J, Chen Q, Liu B. iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network. J Mol Biol 2020;432:5860-5875. [DOI: 10.1016/j.jmb.2020.09.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/04/2020] [Indexed: 11/28/2022]

Hu S, Ma R, Wang H. An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences. PLoS One 2019;14:e0225317. [PMID: 31725778 PMCID: PMC6855455 DOI: 10.1371/journal.pone.0225317] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 02/24/2019] [Accepted: 11/02/2019] [Indexed: 11/23/2022]

Wang W, Langlois R, Langlois M, Genchev GZ, Wang X, Lu H. Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins. Front Genet 2019;10:729. [PMID: 31543893 PMCID: PMC6729729 DOI: 10.3389/fgene.2019.00729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 07/11/2019] [Indexed: 12/27/2022]

Blanco JD, Radusky L, Climente-González H, Serrano L. FoldX accurate structural protein-DNA binding prediction using PADA1 (Protein Assisted DNA Assembly 1). Nucleic Acids Res 2019;46:3852-3863. [PMID: 29608705 PMCID: PMC5934639 DOI: 10.1093/nar/gky228] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/18/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022]

Qu K, Wei L, Zou Q. A Review of DNA-binding Proteins Prediction Methods. Curr Bioinform 2019. [DOI: 10.2174/1574893614666181212102030] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Indexed: 02/05/2023]

Zhang J, Liu B. A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods. Curr Bioinform 2019. [DOI: 10.2174/1574893614666181212102749] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Indexed: 11/22/2022]

Abstract

BACKGROUND

Proteins play a crucial role in life activities, such as catalyzing metabolic reactions, DNA replication, responding to stimuli, etc. Identification of protein structures and functions is critical for both basic research and applications. Because the traditional experiments for studying the structures and functions of proteins are expensive and time consuming, computational approaches are highly desired. The key to computational methods is how to efficiently extract features from protein sequences. During the last decade, many powerful feature extraction algorithms have been proposed, significantly promoting the study of protein structures and functions.

OBJECTIVE

To help researchers catch up with the recent developments in this important field, this study gives an updated review focusing on sequence-based feature extraction from protein sequences.

METHOD

These sequence-based features of proteins were grouped into three categories: composition-based features, autocorrelation-based features and profile-based features. The detailed information of features in each group was introduced, and their advantages and disadvantages were discussed. Some useful tools for generating these features are also introduced.

RESULTS

Generally, autocorrelation-based features outperform composition-based features, and profile-based features outperform autocorrelation-based features. The reason is that profile-based features consider evolutionary information, which is useful for identifying protein structures and functions. However, profile-based features are more time consuming, because a multiple sequence alignment process is required.

CONCLUSION

In this study, some recently proposed sequence-based features were introduced and discussed, such as basic k-mers, PseAAC, auto-cross covariance, top-n-gram etc. These features have made great contributions to the development of protein sequence analysis. Future studies can focus on exploring combinations of these features. Techniques from other fields, such as signal processing, natural language processing (NLP), image processing etc., would also contribute to this important field, because natural languages (such as English) and protein sequences share some similarities. Therefore, proteins can be treated as documents, and features such as k-mers, top-n-grams and motifs can be treated as the words of the language. Techniques from these fields will give new ideas and strategies for extracting features from proteins.
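The idea of treating a protein as a document whose words are k-mers can be illustrated with a composition-based feature. The following is a minimal sketch under stated assumptions, not a tool from the review: the function name `kmer_composition` is hypothetical, and the alphabet is assumed to be the 20 standard amino acids, with k-mers containing any other character ignored.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2):
    """Return the normalized frequency of every possible k-mer.

    The vector has 20**k entries, ordered as itertools.product enumerates
    the alphabet, so its length does not depend on the sequence length.
    """
    sequence = sequence.upper()
    # Count overlapping k-mers, sliding one residue at a time.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]
```

For example, `kmer_composition("ACAC", k=2)` sees the overlapping 2-mers AC, CA and AC, so the entry for AC is 2/3 and the entry for CA is 1/3.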

Mishra A, Pokhrel P, Hoque MT. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics 2018;35:433-441. [DOI: 10.1093/bioinformatics/bty653] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 03/27/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022]

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC. J Theor Biol 2018;452:22-34. [PMID: 29753757 DOI: 10.1016/j.jtbi.2018.05.006] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 03/17/2018] [Revised: 04/21/2018] [Accepted: 05/04/2018] [Indexed: 11/21/2022]

HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features. Biomed Res Int 2017;2017:4590609. [PMID: 29270430 PMCID: PMC5706079 DOI: 10.1155/2017/4590609] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/29/2017] [Accepted: 10/22/2017] [Indexed: 12/21/2022]

Festuccia N, Gonzalez I, Owens N, Navarro P. Mitotic bookmarking in development and stem cells. Development 2017;144:3633-3645. [DOI: 10.1242/dev.146522] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Indexed: 12/13/2022]

Zhang L, Duan X, He N, Chen X, Shi J, Li W, Xu L, Li H. Exposure to lethal levels of benzo[a]pyrene or cadmium trigger distinct protein expression patterns in earthworms (Eisenia fetida). Sci Total Environ 2017;595:733-742. [PMID: 28407590 DOI: 10.1016/j.scitotenv.2017.04.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/22/2016] [Revised: 03/30/2017] [Accepted: 04/01/2017] [Indexed: 06/07/2023]

Abstract

UNLABELLED

Different pollutants induce distinct toxic responses in earthworms (Eisenia fetida). Here, we used proteomics techniques to compare the responses of E. fetida to exposure to the 10% lethal concentration (14d-LC₁₀) of benzo[a]pyrene (BaP) or cadmium (Cd) in natural red soil (China). BaP exposure markedly induced the expression of oxidation-reduction proteins, whereas Cd exposure mainly induced the expression of proteins involved in transcription- and translation-related processes. Furthermore, calmodulin-binding proteins were differentially expressed upon exposure to different pollutants. The calcium (Ca²⁺)-binding cytoskeletal element myosin was down-regulated upon BaP treatment, whereas the Ca²⁺-binding cytoskeletal element tropomyosin-1 was up-regulated upon Cd treatment. Some proteins exhibited opposite responses to the two pollutants. For instance, catalase (CAT) and heat shock protein 70 were up-regulated upon BaP treatment and down-regulated upon Cd treatment. A significant (p<0.05, one-way ANOVA with least-significant difference (LSD) test) increase in the level of reactive oxygen species (ROS) and CAT activity further showed that BaP mainly induces oxidative stress. Real-time PCR analysis showed that mRNA expression often did not correlate well with protein expression in earthworms subjected to Cd or BaP treatment. In addition, the expression of the gene encoding the protein metallothionein, which was not detected in the protein analysis, was induced upon Cd treatment, but slightly reduced upon BaP treatment. Therefore, BaP and Cd have distinct effects on the protein profile of E. fetida, with BaP markedly inducing ROS activity and Cd mainly triggering genotoxicity.

CAPSULE SUMMARY

Distinct patterns of protein expression are induced in earthworms upon exposure to different pollutants; BaP markedly induces high levels of ROS, while Cd results in genotoxicity.

Improved detection of DNA-binding proteins via compression technology on PSSM information. PLoS One 2017;12:e0185587. [PMID: 28961273 PMCID: PMC5621689 DOI: 10.1371/journal.pone.0185587] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 06/14/2017] [Accepted: 09/17/2017] [Indexed: 12/04/2022]

Zhang J, Liu B. PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation. Int J Mol Sci 2017;18:ijms18091856. [PMID: 28841194 PMCID: PMC5618505 DOI: 10.3390/ijms18091856] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 07/28/2017] [Revised: 08/19/2017] [Accepted: 08/22/2017] [Indexed: 12/30/2022]

Jia Z, Li L, Chakravorty A, Alexov E. Treating ion distribution with Gaussian-based smooth dielectric function in DelPhi. J Comput Chem 2017;38:1974-1979. [PMID: 28602026 PMCID: PMC5495612 DOI: 10.1002/jcc.24831] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/14/2017] [Revised: 03/21/2017] [Accepted: 04/22/2017] [Indexed: 11/06/2022]

Wei L, Tang J, Zou Q. Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.06.026] [Citation(s) in RCA: 196] [Impact Index Per Article: 28.0] [Indexed: 11/26/2022]

ThiN as a Versatile Domain of Transcriptional Repressors and Catalytic Enzymes of Thiamine Biosynthesis. J Bacteriol 2017;199:JB.00810-16. [PMID: 28115546 DOI: 10.1128/jb.00810-16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/21/2016] [Accepted: 01/14/2017] [Indexed: 01/21/2023]

Abstract

Thiamine biosynthesis is commonly regulated by a riboswitch mechanism; however, the enzymatic steps and regulation of this pathway in archaea are poorly understood. Haloferax volcanii, one of the representative archaea, uses a eukaryote-like Thi4 (thiamine thiazole synthase) for the production of the thiazole ring and condenses this ring with a pyrimidine moiety synthesized by an apparent bacterium-like ThiC (2-methyl-4-amino-5-hydroxymethylpyrimidine [HMP] phosphate synthase) branch. Here we found that archaeal Thi4 and ThiC were encoded by leaderless transcripts, ruling out a riboswitch mechanism. Instead, a novel ThiR transcription factor that harbored an N-terminal helix-turn-helix (HTH) DNA binding domain and C-terminal ThiN (TMP synthase) domain was identified. In the presence of thiamine, ThiR was found to repress the expression of thi4 and thiC by a DNA operator sequence that was conserved across archaeal phyla. Despite having a ThiN domain, ThiR was found to be catalytically inactive in compensating for the loss of ThiE (TMP synthase) function. In contrast, bifunctional ThiDN, in which the ThiN domain is fused to an N-terminal ThiD (HMP/HMP phosphate [HMP-P] kinase) domain, was found to be interchangeable for ThiE function and, thus, active in thiamine biosynthesis. A conserved Met residue of an extended α-helix near the active-site His of the ThiN domain was found to be important for ThiDN catalytic activity, whereas the corresponding Met residue was absent and the α-helix was shorter in ThiR homologs. Thus, we provide new insight into residues that distinguish catalytic from noncatalytic ThiN domains and reveal that thiamine biosynthesis in archaea is regulated by a transcriptional repressor, ThiR, and not by a riboswitch.

IMPORTANCE

Thiamine pyrophosphate (TPP) is a cofactor needed for the enzymatic activity of many cellular processes, including central metabolism. In archaea, thiamine biosynthesis is an apparent chimera of eukaryote- and bacterium-type pathways that is not well defined at the level of enzymatic steps or regulatory mechanisms. Here we find that ThiN is a versatile domain of transcriptional repressors and catalytic enzymes of thiamine biosynthesis in archaea. Our study provides new insight into residues that distinguish catalytic from noncatalytic ThiN domains and reveals that archaeal thiamine biosynthesis is regulated by a ThiN domain transcriptional repressor, ThiR, and not by a riboswitch.

Ponnuraj K, Saravanan KM. Dihedral angle preferences of DNA and RNA binding amino acid residues in proteins. Int J Biol Macromol 2017;97:434-439. [PMID: 28099891 DOI: 10.1016/j.ijbiomac.2017.01.068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/19/2016] [Revised: 01/12/2017] [Accepted: 01/13/2017] [Indexed: 11/30/2022]

Dutta S, Madan S, Parikh H, Sundar D. An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA. BMC Genomics 2016;17:1033. [PMID: 28155662 PMCID: PMC5260015 DOI: 10.1186/s12864-016-3323-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

Abstract

BACKGROUND

The ability to engineer zinc finger proteins binding to a DNA sequence of choice is essential for targeted genome editing to be possible. Experimental techniques and molecular docking have been successful in predicting protein-DNA interactions; however, they are highly time and resource intensive. Here, we present a novel algorithm designed for high throughput prediction of the optimal zinc finger protein for 9 bp DNA sequences of choice. In accordance with the principles of information theory, a subset identified by using K-means clustering was used as a representative for the space of all possible 9 bp DNA sequences. The modeling and simulation results, obtained from this subset assuming a synergistic mode of binding, were used to train an ensemble micro neural network. The synergistic mode of binding is the closest to the DNA-protein binding seen in nature and gives much higher quality predictions, although the time and resources required increase exponentially as a trade-off. Our algorithm is inspired by an ensemble machine learning approach and incorporates the predictions made by 100 parallel neural networks, each with a different hidden layer architecture designed to pick up different features from the training dataset, to predict optimal zinc finger proteins for any 9 bp target DNA.

RESULTS

The model gave an accuracy of an average 83% sequence identity for the testing dataset. The BLAST e-values are well within the statistical confidence interval of E-05 for 100% of the testing samples. The geometric mean and median of the BLAST e-values were found to be 1.70E-12 and 7.00E-12, respectively. For final validation of the approach, we compared our predictions against optimal ZFPs reported in the literature for a set of experimentally studied DNA sequences. The accuracy, as measured by the average string identity between our predictions and the optimal zinc finger protein reported in the literature for a 9 bp DNA target, was found to be as high as 81% for DNA targets with the consensus sequence GCNGNNGCN reported in the literature. Moreover, the average string identity of our predictions for a catalogue of over 100 9 bp DNA targets for which the optimal zinc finger protein has been reported in the literature was found to be 71%.

CONCLUSIONS

Validation with experimental data shows that our tool is capable of domain adaptation and thus scales well, with high accuracy, to datasets other than the training set. As synergistic binding comes the closest to the ideal mode of binding, our algorithm predicts biologically relevant results in sync with the experimental data present in the literature. While there have been disjointed attempts to approach this problem synergistically reported in the literature, there is no work covering the whole sample space. Our algorithm allows designing zinc finger proteins for DNA targets of the user's choice, opening up new frontiers in the field of targeted genome editing. This algorithm is also available as an easy to use web server, ZifNN, at http://web.iitd.ac.in/~sundar/ZifNN/.
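The summary statistics reported in this abstract (average string identity, and the geometric mean and median of BLAST e-values) are straightforward to compute. The following is a minimal sketch, not the ZifNN code: the function names `string_identity` and `summarize_evalues` are hypothetical, and string identity is assumed to mean the fraction of matching positions between two equal-length sequences.

```python
from statistics import geometric_mean, median


def string_identity(predicted, reference):
    """Fraction of positions at which two equal-length strings agree."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(predicted, reference))
    return matches / len(predicted)


def summarize_evalues(evalues):
    """Return (geometric mean, median) of a list of positive e-values.

    The geometric mean suits e-values because they span many orders of
    magnitude; statistics.geometric_mean rejects zero or negative inputs.
    """
    return geometric_mean(evalues), median(evalues)
```

For instance, two seven-residue strings that differ at a single position have a string identity of 6/7, roughly 86%.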

Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.03.025] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 11/20/2022]

Zhou J, Xu R, He Y, Lu Q, Wang H, Kong B. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context. Sci Rep 2016;6:27653. [PMID: 27282833 PMCID: PMC4901350 DOI: 10.1038/srep27653] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/02/2015] [Accepted: 05/18/2016] [Indexed: 02/01/2023]

Molecular Pathophysiology of Fragile X-Associated Tremor/Ataxia Syndrome and Perspectives for Drug Development. Cerebellum 2016;15:599-610. [DOI: 10.1007/s12311-016-0800-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]

Paz I, Kligun E, Bengad B, Mandel-Gutfreund Y. BindUP: a web server for non-homology-based prediction of DNA and RNA binding proteins. Nucleic Acids Res 2016;44:W568-74. [PMID: 27198220 PMCID: PMC4987955 DOI: 10.1093/nar/gkw454] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/01/2016] [Accepted: 05/11/2016] [Indexed: 12/12/2022]

Liu B, Wang S, Dong Q, Li S, Liu X. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning. IEEE Trans Nanobioscience 2016;15:328-334. [PMID: 28113908 DOI: 10.1109/tnb.2016.2555951] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Indexed: 11/10/2022]

Computational Prediction of RNA-Binding Proteins and Binding Sites. Int J Mol Sci 2015;16:26303-17. [PMID: 26540053 PMCID: PMC4661811 DOI: 10.3390/ijms161125952] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 10/01/2015] [Revised: 10/20/2015] [Accepted: 10/23/2015] [Indexed: 11/19/2022]

DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation. Sci Rep 2015;5:15479. [PMID: 26482832 PMCID: PMC4611492 DOI: 10.1038/srep15479] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 05/29/2015] [Accepted: 09/28/2015] [Indexed: 02/01/2023]

Frye SA, Lång E, Beyene GT, Balasingham SV, Homberset H, Rowe AD, Ambur OH, Tønjum T. The Inner Membrane Protein PilG Interacts with DNA and the Secretin PilQ in Transformation. PLoS One 2015;10:e0134954. [PMID: 26248334 PMCID: PMC4527729 DOI: 10.1371/journal.pone.0134954] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/09/2014] [Accepted: 07/15/2015] [Indexed: 11/19/2022]

SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues. PLoS One 2015;10:e0133260. [PMID: 26176857 PMCID: PMC4503397 DOI: 10.1371/journal.pone.0133260] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/20/2015] [Accepted: 06/25/2015] [Indexed: 11/19/2022]

An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015;16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023]

Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst Biol 2015;9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]

Abstract

BACKGROUND

DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature to address DNA-binding protein identification. However, most of them cannot provide an invaluable knowledge base for our understanding of DNA-protein interactions.

RESULTS

We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are fed into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, MCC of 0.622 and AUC of 86.50%. This performance is considerably better than most of the existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance with ACC of 80.00%, MCC of 0.647 and AUC of 87.40%, and outperformed some existing state-of-the-art methods.

CONCLUSIONS

The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed and is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
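
The abstract above does not give the formula for the distance transformation, so the following is only a minimal sketch of one plausible reading: for each PSSM column and each lag, average the product of scores at positions separated by that lag. The function name `pssm_same_residue_dt`, the `max_lag` parameter, and the toy matrix are all assumptions made for illustration, not the authors' implementation. The point it shows is how variable-length profiles can be turned into fixed-length vectors suitable for an SVM.

```python
def pssm_same_residue_dt(pssm, max_lag):
    """Turn a variable-length PSSM into a fixed-length feature vector.

    pssm    : list of rows, one row per sequence position, each row holding
              one score per residue type (20 columns for a real profile).
    max_lag : largest position gap to consider (hypothetical parameter).

    Assumed form: for each lag and column, the mean over positions j of
    pssm[j][col] * pssm[j + lag][col].
    """
    length = len(pssm)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    features = []
    for lag in range(1, max_lag + 1):
        for col in range(n_cols):
            total = sum(pssm[j][col] * pssm[j + lag][col]
                        for j in range(length - lag))
            features.append(total / (length - lag))
    return features


# Toy profile with 3 positions and 2 columns (a real PSSM has 20 columns).
toy = [[1, 2], [3, 4], [5, 6]]
vector = pssm_same_residue_dt(toy, max_lag=2)
```

The output length is `max_lag * n_cols` regardless of sequence length, which is what makes the representation "uniform" in the sense the abstract describes.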


Samant M, Jethva M, Hasija Y. INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features. Int J Pept Res Ther 2014. [DOI: 10.1007/s10989-014-9446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Xu R, Zhou J, Liu B, He Y, Zou Q, Wang X, Chou KC. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J Biomol Struct Dyn 2014;33:1720-30. [PMID: 25252709 DOI: 10.1080/07391102.2014.968624] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Liu B, Xu J, Fan S, Xu R, Zhou J, Wang X. PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation. Mol Inform 2014;34:8-17. [DOI: 10.1002/minf.201400025] [Citation(s) in RCA: 135] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2014] [Accepted: 05/27/2014] [Indexed: 11/06/2022]

newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. Comput Biol Chem 2014;52:51-9. [PMID: 25240115 DOI: 10.1016/j.compbiolchem.2014.09.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2014] [Revised: 09/05/2014] [Accepted: 09/06/2014] [Indexed: 11/21/2022]

Liu B, Xu J, Lan X, Xu R, Zhou J, Wang X, Chou KC. iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS One 2014;9:e106691. [PMID: 25184541 PMCID: PMC4153653 DOI: 10.1371/journal.pone.0106691] [Citation(s) in RCA: 208] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2014] [Accepted: 07/31/2014] [Indexed: 11/18/2022] Open

Yang XX, Deng ZL, Liu R. RBRDetector: Improved prediction of binding residues on RNA-binding protein structures using complementary feature- and template-based strategies. Proteins 2014;82:2455-71. [DOI: 10.1002/prot.24610] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2014] [Revised: 04/28/2014] [Accepted: 05/09/2014] [Indexed: 11/05/2022]

enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning. Biomed Res Int 2014;2014:294279. [PMID: 24977146 PMCID: PMC4058174 DOI: 10.1155/2014/294279] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2014] [Revised: 05/05/2014] [Accepted: 05/05/2014] [Indexed: 12/03/2022]

Zhao H, Wang J, Zhou Y, Yang Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One 2014;9:e96694. [PMID: 24792350 PMCID: PMC4008587 DOI: 10.1371/journal.pone.0096694] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2014] [Accepted: 04/10/2014] [Indexed: 12/25/2022] Open

Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014;9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022] Open

Abstract

Developing an efficient method for identifying DNA-binding proteins is highly desirable, given their vital roles in gene regulation, since it would advance our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins that ranks features using random forest and then applies wrapper-based feature selection with a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. The proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA-binding protein dataset and two RNA-binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that, for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can be a promising alternative predictor for large-scale determination of DNA-binding proteins.
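
The abstracts in this listing compare predictors by accuracy (ACC) and Matthews correlation coefficient (MCC). As a reminder of how these two figures are derived from a binary confusion matrix, here is a short sketch using the standard definitions. The counts in the example are hypothetical and are not taken from any of the cited studies.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 50 binders (40 found), 50 non-binders (45 rejected).
acc_value = accuracy(tp=40, tn=45, fp=5, fn=10)
mcc_value = mcc(tp=40, tn=45, fp=5, fn=10)
```

MCC uses all four cells of the confusion matrix, so it stays informative when the binding and non-binding classes are unbalanced, which is one reason it is commonly reported alongside ACC.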


Cirillo D, Livi CM, Agostini F, Tartaglia GG. Discovery of protein–RNA networks. Mol Biosyst 2014;10:1632-42. [DOI: 10.1039/c4mb00099d] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]