Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jain A, Gali H, Kihara D. Identification of Moonlighting Proteins in Genomes Using Text Mining Techniques. Proteomics 2018;18:e1800083. [PMID: 30260564 DOI: 10.1002/pmic.201800083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/28/2018] [Revised: 08/13/2018] [Indexed: 12/31/2022]

Cited by Other Article(s)

Liu X, Shen Y, Zhang Y, Liu F, Ma Z, Yue Z, Yue Y. IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models. PeerJ 2021;9:e11900. [PMID: 34434652 PMCID: PMC8351581 DOI: 10.7717/peerj.11900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2020] [Accepted: 07/13/2021] [Indexed: 01/17/2023]

Abstract

Background

A moonlighting protein is a protein that can perform two or more functions. Current moonlighting protein prediction tools focus mainly on proteins from animals and microorganisms, and because plant and animal cells and proteins differ, these tools may predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are needed.

Methods

This study used selected protein feature classes from a data set constructed in house to develop a web-based prediction tool. First, we built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization and feature dimensionality reduction on the training data. Next, machine learning methods were used for preliminary modeling to select the feature classes that performed best in plant moonlighting protein prediction. The selected feature classes were incorporated into the final plant protein prediction tool. Finally, we compared five machine learning methods, used grid search to optimize their parameters, and chose the most suitable method as the final model.

Results

The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, so it was used as the algorithm to construct the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). The results on the independent test set show that the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher than state-of-the-art non-plant-specific methods, respectively. This further demonstrates that a benchmark data set and a plant-specific prediction tool were required for plant moonlighting protein studies. Finally, we implemented the tool as a web version that users can access freely at http://identpmp.aielab.net/.
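The percentages quoted above are relative gains over the baseline score, not differences in absolute points. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and do not come from the paper.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores quoted in the abstract: (IdentPMP, best non-plant-specific method).
reported = {"AUPRC": (0.43, 0.36), "AUC": (0.68, 0.60)}

# Round to two decimals, matching the precision used in the abstract.
gains = {
    metric: round(relative_gain(new, base), 2)
    for metric, (new, base) in reported.items()
}

for metric, gain in gains.items():
    print(f"{metric}: {gain:.2f}% relative gain")
```

In absolute terms the same results are gains of 0.07 AUPRC and 0.08 AUC.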

Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics 2021;22:261. [PMID: 34030624 PMCID: PMC8142502 DOI: 10.1186/s12859-021-04194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 12/18/2022]

Abstract

Background

Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent, and usually distinct, function occurs in a single polypeptide chain. The main reasons to study MPs are to identify unknown cellular processes, understand novel protein mechanisms, improve the prediction of protein functions, and gain information about protein evolution. MPs also play an important role in disease pathways and drug-target discovery. Because detecting MPs experimentally is challenging, most of them have been discovered by chance. A computational approach to predicting MPs is therefore warranted.

Results

In this study, we introduced a competent model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We also set up a scheme for detecting outlier proteins. In total, 37 distinct feature vectors were used to study each protein’s impact on detecting MPs, and 8 different classification methods were assessed to find the best performance. To detect outliers, each classifier was run 100 times with tenfold cross-validation on the feature vectors, and proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of these groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. In addition, the outlier analysis showed that 57 of the 351 proteins in the dataset are outlier candidates. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were labeled as non-MPs by computational methods, the results of this study call into question whether they are truly non-moonlighting.
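The outlier rule described above (misclassified in at least 90 of 100 runs for a feature vector, then intersected across feature vectors) can be sketched as follows. This is one reading of the abstract, not the authors' code: the function names, protein identifiers, feature-vector names, and counts are all invented for illustration, and the sketch assumes the per-protein misclassification counts have already been collected from the repeated cross-validation runs.

```python
from typing import Dict, Set


def frequent_errors(miss_counts: Dict[str, int], threshold: int = 90) -> Set[str]:
    """Proteins misclassified at least `threshold` times for one feature vector."""
    return {protein for protein, count in miss_counts.items() if count >= threshold}


def find_outliers(
    counts_by_feature: Dict[str, Dict[str, int]], threshold: int = 90
) -> Set[str]:
    """Intersect the frequently misclassified groups across all feature vectors."""
    groups = [frequent_errors(counts, threshold) for counts in counts_by_feature.values()]
    return set.intersection(*groups) if groups else set()


# Toy misclassification counts out of 100 repeated tenfold CV runs.
# Identifiers and numbers are made up; they are not results from the paper.
toy_counts = {
    "feature_vector_1": {"prot_A": 95, "prot_B": 100, "prot_C": 12},
    "feature_vector_2": {"prot_A": 40, "prot_B": 92, "prot_C": 97},
    "feature_vector_3": {"prot_A": 91, "prot_B": 90, "prot_C": 3},
}

outliers = find_outliers(toy_counts)
print(sorted(outliers))
```

Only `prot_B` reaches the threshold under every feature vector here, so it is the only protein kept by the intersection.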

Conclusions

MPs are difficult to identify experimentally. Using distinct feature vectors, our method enabled identification of novel moonlighting proteins. The study also indicated that a number of proteins labeled as non-MPs are likely to be moonlighting.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04194-5.

Espinosa-Cantú A, Cruz-Bonilla E, Noda-Garcia L, DeLuna A. Multiple Forms of Multifunctional Proteins in Health and Disease. Front Cell Dev Biol 2020;8:451. [PMID: 32587857 PMCID: PMC7297953 DOI: 10.3389/fcell.2020.00451] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/03/2019] [Accepted: 05/14/2020] [Indexed: 12/23/2022]

Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019;18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]

Macossay-Castillo M, Marvelli G, Guharoy M, Jain A, Kihara D, Tompa P, Wodak SJ. The Balancing Act of Intrinsically Disordered Proteins: Enabling Functional Diversity while Minimizing Promiscuity. J Mol Biol 2019;431:1650-1670. [PMID: 30878482 DOI: 10.1016/j.jmb.2019.03.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 11/23/2018] [Revised: 02/25/2019] [Accepted: 03/03/2019] [Indexed: 10/27/2022]

Abstract

Intrinsically disordered proteins (IDPs) or regions (IDRs) perform diverse cellular functions, but are also prone to forming promiscuous and potentially deleterious interactions. We investigate the extent to which the properties of, and content in, IDRs have adapted to enable functional diversity while limiting interference from promiscuous interactions in the crowded cellular environment. Information on protein sequences, their predicted intrinsic disorder, and 3D structure contents is related to data on protein cellular concentrations, gene co-expression, and protein-protein interactions in the well-studied yeast Saccharomyces cerevisiae. Results reveal that both the protein IDR content and the frequency of "sticky" amino acids in IDRs (those more frequently involved in protein interfaces) decrease with increasing protein cellular concentration. This implies that the IDR content and the amino acid composition of IDRs experience negative selection as the protein concentration increases. In the S. cerevisiae protein-protein interaction network, the higher a protein's IDR content, the more frequently it interacts with IDR-containing partners, and the more functionally diverse the partners are. Employing a clustering analysis of Gene Ontology terms, we newly identify ~600 putative multifunctional proteins in S. cerevisiae. Strikingly, these proteins are enriched in IDRs and contribute significantly to all the observed trends. In particular, IDRs of multi-functional proteins feature more sticky amino acids than IDRs of their non-multifunctional counterparts, or the surfaces of structured yeast proteins. This property likely affords sufficient binding affinity for the functional interactions, commonly mediated by short IDR segments, thereby counterbalancing the loss in overall IDR conformational entropy upon binding.
