1
Liu X, Shen Y, Zhang Y, Liu F, Ma Z, Yue Z, Yue Y. IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models. PeerJ 2021; 9:e11900. [PMID: 34434652 PMCID: PMC8351581 DOI: 10.7717/peerj.11900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2020] [Accepted: 07/13/2021] [Indexed: 01/17/2023]
Abstract
Background: A moonlighting protein is a protein that can perform two or more functions. Current moonlighting protein prediction tools focus mainly on proteins from animals and microorganisms. Because plant cells and proteins differ from those of animals, these tools may predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are needed. Methods: This study used protein feature classes computed from an in-house data set to develop a web-based prediction tool. First, we built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization and feature dimensionality reduction on the training data. Next, preliminary models built with machine learning methods were used to select the feature classes that performed best in plant moonlighting protein prediction, and the selected features were incorporated into the final prediction tool. After that, we compared five machine learning methods, optimized their parameters by grid search, and chose the most suitable method as the final model. Results: The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, so it was used as the algorithm to construct the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). On the independent test set, the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher, respectively, than those of state-of-the-art non-plant-specific methods. This further demonstrates that a benchmark data set and a plant-specific prediction tool are required for plant moonlighting protein studies.
Finally, we implemented the tool as a web server, which is freely available at http://identpmp.aielab.net/.
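The percentage gains quoted in the IdentPMP abstract are relative improvements over the baseline scores, i.e. (new − baseline) / baseline × 100. A minimal check of that arithmetic is sketched below; the helper name `relative_gain` is hypothetical, and the abstract does not say which non-plant-specific method produced the 0.36 and 0.60 baselines.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Return the relative improvement of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - baseline) / baseline * 100.0


# Independent-test-set scores from the abstract: IdentPMP vs. the non-plant-specific baseline.
auprc_gain = relative_gain(0.43, 0.36)
auc_gain = relative_gain(0.68, 0.60)
print(f"AUPRC gain: {auprc_gain:.2f}%, AUC gain: {auc_gain:.2f}%")
# prints "AUPRC gain: 19.44%, AUC gain: 13.33%"
```

Relative rather than absolute differences are what make the two numbers comparable here: the absolute gains (0.07 and 0.08) are similar, but the AUPRC baseline is lower, so its relative gain is larger.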
Affiliation(s)
- Xinyi Liu
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Yueyue Shen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Fei Liu
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Zhiyu Ma
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Yi Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
2
Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics 2021; 22:261. [PMID: 34030624 PMCID: PMC8142502 DOI: 10.1186/s12859-021-04194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 12/18/2022]
Abstract
Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent, and usually distinct, function is performed by a single polypeptide chain. The main reasons to study MPs are to identify unknown cellular processes, understand novel protein mechanisms, improve protein function prediction, and gain information about protein evolution. MPs also play an important role in disease pathways and drug-target discovery. Because detecting MPs experimentally is challenging, most of them have been discovered by chance. Therefore, an appropriate computational approach for predicting MPs is warranted. Results: In this study, we introduced a model that distinguishes MPs from non-MPs using features extracted from protein sequences, and we attempted to set up a well-reasoned scheme for detecting outlier proteins. In total, 37 distinct feature vectors were used to study each protein's impact on MP detection, and 8 different classification methods were assessed to find the best performance. To detect outliers, each classifier was run 100 times with tenfold cross-validation on the feature vectors, and the proteins that were misclassified in 90 or more of the 100 runs were grouped together. This process was applied to every feature vector, and the intersection of the resulting groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) show that SVM using all feature vectors achieved the highest performance among all methods in this study and other available methods. In addition, the outlier analysis showed that 57 of the 351 proteins in the dataset are candidate outliers. Among these outliers were non-MPs (such as P69797) that were misclassified by 8 different classification methods using 16 different feature vectors.
Because these proteins were labeled as non-MPs by computational methods, the results of this study raise doubts about whether they are non-moonlighting at all. Conclusions: MPs are difficult to identify experimentally. Using distinct feature vectors, our method enabled the identification of novel moonlighting proteins. The study also indicated that a number of proteins labeled as non-MPs are likely to be moonlighting. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04194-5.
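The outlier-detection scheme described in this abstract (repeated tenfold cross-validation, a misclassification threshold of 90 out of 100 runs, then an intersection across feature vectors) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses a single hypothetical nearest-centroid classifier per feature vector (the study evaluated 8 methods, including SVM, and how their results were combined is not described here), and the function names and toy data are invented for the example.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
# A classifier is modeled as (train_x, train_y, test_x) -> predicted labels for test_x.
FitPredict = Callable[[Sequence[Vector], Sequence[int], Sequence[Vector]], List[int]]


def nearest_centroid(train_x: Sequence[Vector], train_y: Sequence[int],
                     test_x: Sequence[Vector]) -> List[int]:
    """Hypothetical stand-in classifier: assign each test point to the closest class mean."""
    centroids: Dict[int, Vector] = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_x]


def misclassification_counts(x: Sequence[Vector], y: Sequence[int], fit_predict: FitPredict,
                             n_repeats: int = 100, k: int = 10, seed: int = 0) -> List[int]:
    """Count, per sample, how many of `n_repeats` shuffled k-fold CV runs misclassify it."""
    rng = random.Random(seed)
    counts = [0] * len(x)
    for _ in range(n_repeats):
        order = list(range(len(x)))
        rng.shuffle(order)
        for f in range(k):
            test_idx = order[f::k]  # every k-th shuffled index forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = fit_predict([x[i] for i in train_idx], [y[i] for i in train_idx],
                                [x[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                if p != y[i]:
                    counts[i] += 1
    return counts


def outlier_proteins(per_feature_counts: Dict[str, List[int]], threshold: int = 90) -> Set[int]:
    """Intersect, across feature vectors, the samples misclassified at least `threshold` times."""
    groups = [{i for i, c in enumerate(counts) if c >= threshold}
              for counts in per_feature_counts.values()]
    return set.intersection(*groups) if groups else set()


# Toy data: 10 class-0 points near (0, 0) and 10 class-1 points near (5, 5).
base: List[Vector] = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
labels = [0] * 10 + [1] * 10
labels[3] = 1                 # sample 3 lies among class 0 but is labeled class 1
feature_a = list(base)
feature_b = list(base)
feature_b[15] = (0.5, 0.5)    # sample 15 looks like class 0 in this representation only

counts = {name: misclassification_counts(feats, labels, nearest_centroid)
          for name, feats in {"feature_a": feature_a, "feature_b": feature_b}.items()}
print(outlier_proteins(counts))  # prints {3}
```

Sample 15 is flagged under `feature_b` but not under `feature_a`, so the intersection keeps only sample 3. Requiring agreement across every feature vector is what makes the scheme conservative: a protein counts as an outlier only if no representation lets the classifier recover its label.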
Affiliation(s)
- Farshid Shirafkan
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Karim Rahimian
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Reza Hasan Sajedi
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
- Department of Neuroscience, University of California San Diego, La Jolla, CA, USA; Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
3
Espinosa-Cantú A, Cruz-Bonilla E, Noda-Garcia L, DeLuna A. Multiple Forms of Multifunctional Proteins in Health and Disease. Front Cell Dev Biol 2020; 8:451. [PMID: 32587857 PMCID: PMC7297953 DOI: 10.3389/fcell.2020.00451] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/03/2019] [Accepted: 05/14/2020] [Indexed: 12/23/2022]
Abstract
Protein science has moved from a focus on individual molecules to an integrated perspective in which proteins emerge as dynamic players with multiple functions, rather than monofunctional specialists. Annotation of the full functional repertoire of proteins has impacted the fields of biochemistry and genetics, and will continue to influence basic and applied science questions, from the genotype-to-phenotype problem to our understanding of human pathologies and drug design. In this review, we address the phenomena of pleiotropy, multidomain proteins, promiscuity, and protein moonlighting, providing examples of multitasking biomolecules that underlie specific mechanisms of human disease. In doing so, we place different types of multifunctional proteins in context, highlighting useful attributes for their systematic definition and classification in future research.
Affiliation(s)
- Adriana Espinosa-Cantú
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
- Erika Cruz-Bonilla
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
- Lianet Noda-Garcia
- Department of Plant Pathology and Microbiology, Robert H. Smith Faculty of Agriculture, Food, and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Alexander DeLuna
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
4
Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019; 18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA; Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
5
Macossay-Castillo M, Marvelli G, Guharoy M, Jain A, Kihara D, Tompa P, Wodak SJ. The Balancing Act of Intrinsically Disordered Proteins: Enabling Functional Diversity while Minimizing Promiscuity. J Mol Biol 2019; 431:1650-1670. [PMID: 30878482 DOI: 10.1016/j.jmb.2019.03.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 11/23/2018] [Revised: 02/25/2019] [Accepted: 03/03/2019] [Indexed: 10/27/2022]
Abstract
Intrinsically disordered proteins (IDPs) or regions (IDRs) perform diverse cellular functions, but are also prone to forming promiscuous and potentially deleterious interactions. We investigate the extent to which the properties of, and content in, IDRs have adapted to enable functional diversity while limiting interference from promiscuous interactions in the crowded cellular environment. Information on protein sequences, their predicted intrinsic disorder, and 3D structure contents is related to data on protein cellular concentrations, gene co-expression, and protein-protein interactions in the well-studied yeast Saccharomyces cerevisiae. Results reveal that both the protein IDR content and the frequency of "sticky" amino acids in IDRs (those more frequently involved in protein interfaces) decrease with increasing protein cellular concentration. This implies that the IDR content and the amino acid composition of IDRs experience negative selection as the protein concentration increases. In the S. cerevisiae protein-protein interaction network, the higher a protein's IDR content, the more frequently it interacts with IDR-containing partners, and the more functionally diverse the partners are. Using a clustering analysis of Gene Ontology terms, we identify ~600 new putative multifunctional proteins in S. cerevisiae. Strikingly, these proteins are enriched in IDRs and contribute significantly to all the observed trends. In particular, IDRs of multifunctional proteins feature more sticky amino acids than IDRs of their non-multifunctional counterparts, or the surfaces of structured yeast proteins. This property likely affords sufficient binding affinity for the functional interactions, commonly mediated by short IDR segments, thereby counterbalancing the loss in overall IDR conformational entropy upon binding.
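The two per-protein quantities this abstract tracks, IDR content and the frequency of "sticky" amino acids within IDRs, can be sketched as below. This assumes a per-residue binary disorder prediction is already available, defines IDR content as the fraction of residues lying in disordered segments of at least `min_len` residues, and uses a hypothetical `STICKY` set; the study instead derives stickiness from how often each amino acid occurs in protein interfaces, and its exact definitions may differ.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical "sticky" residue set for illustration only (not the study's definition).
STICKY: FrozenSet[str] = frozenset("FILMVWY")


def disordered_segments(mask: Sequence[bool], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of disordered residues at least `min_len` long."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, disordered in enumerate(list(mask) + [False]):  # sentinel closes a trailing run
        if disordered and start is None:
            start = i
        elif not disordered and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


def idr_stats(seq: str, mask: Sequence[bool], sticky: FrozenSet[str] = STICKY,
              min_len: int = 1) -> Tuple[float, float]:
    """Return (IDR content, fraction of sticky residues within IDRs) for one protein."""
    if len(seq) != len(mask):
        raise ValueError("sequence and disorder mask must have the same length")
    idr_residues = [seq[i] for s, e in disordered_segments(mask, min_len) for i in range(s, e)]
    content = len(idr_residues) / len(seq) if seq else 0.0
    sticky_freq = (sum(r in sticky for r in idr_residues) / len(idr_residues)
                   if idr_residues else 0.0)
    return content, sticky_freq


seq = "MKTAYIAKQR"
mask = [c == "D" for c in "..DDDD...."]  # residues 2-5 (T, A, Y, I) predicted disordered
print(idr_stats(seq, mask))              # prints (0.4, 0.5)
```

Computing both values per protein is what allows the trends in the abstract to be tested, for example by correlating them with measured cellular concentrations across the proteome.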
Affiliation(s)
- Mauricio Macossay-Castillo
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Giulio Marvelli
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Mainak Guharoy
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA; Department of Biological Sciences, Purdue University, Hockmeyer Structural Biology Building, 249 S. Martin Jischke Dr West Lafayette, IN 47907, USA
- Peter Tompa
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudosok korutja 2, 1117 Budapest, Hungary
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie, Pleinlaan 2, 1050 Brussels, Belgium